[TCWG CI] 470.lbm slowed down by 15% after llvm: [AArch64] Make -mcpu=generic schedule for an in-order core

by ci_notify＠linaro.org

After llvm commit adec9223616477df023026b0269ccd008701cc94
Author: David Green <david.green(a)arm.com>

    [AArch64] Make -mcpu=generic schedule for an in-order core

the following benchmarks slowed down by more than 2%:
- 470.lbm slowed down by 15% from 16308 to 18676 perf samples
- 433.milc slowed down by 9% from 12206 to 13270 perf samples

The reproducer instructions below can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

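As a quick cross-check of the percentages quoted in the report, the sketch below recomputes them from the perf sample counts. It assumes the reported figure is the relative increase in samples between the last_good and first_bad builds, rounded to the nearest integer; that formula and the helper name `slowdown_percent` are assumptions for illustration, not something taken from the TCWG CI scripts.

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, in percent (assumed metric)."""
    return (after - before) / before * 100.0


# Sample counts as quoted in the report: (last_good, first_bad).
samples = {
    "470.lbm": (16308, 18676),
    "433.milc": (12206, 13270),
}

for name, (before, after) in samples.items():
    pct = slowdown_percent(before, after)
    # The report prints whole percentages, so round for comparison.
    print(f"{name}: {pct:.1f}% more samples (reported as {round(pct)}%)")
```

Under that assumption the two ratios round to the 15% and 9% figures given in the report, and both are well above its 2% reporting threshold.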
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-adec9223616477df023026b0269ccd008701cc94
cd investigate-llvm-adec9223616477df023026b0269ccd008701cc94

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach adec9223616477df023026b0269ccd008701cc94
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach e2a2e5475cbd370044474e132a1b5c58e6a3d458
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit adec9223616477df023026b0269ccd008701cc94
Author: David Green <david.green(a)arm.com>
Date:   Sat Oct 9 15:58:31 2021 +0100

    [AArch64] Make -mcpu=generic schedule for an in-order core

    We would like to start pushing -mcpu=generic towards enabling the set of
    features that improves performance for some CPUs, without hurting any
    others. A blend of the performance options hopefully beneficial to all
    CPUs. The largest part of that is enabling in-order scheduling using the
    Cortex-A55 schedule model. This is similar to the Arm backend change from
    eecb353d0e25ba which made -mcpu=generic perform in-order scheduling using
    the cortex-a8 schedule model.

    The idea is that in-order cpu's require the most help in instruction
    scheduling, whereas out-of-order cpus can for the most part out-of-order
    schedule around different codegen. Our benchmarking suggests that
    hypothesis holds. When running on an in-order core this improved
    performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some
    other embedded benchmark and between 1% and 1.8% on a set of singlecore
    and multicore workloads, all running on a Cortex-A55 cluster.

    On an out-of-order cpu the results are a lot more noisy but show flat
    performance or an improvement. On the set of DSP and embedded
    benchmarks, run on a Cortex-A78 there was a very noisy 1% speed
    improvement. Using the most detailed results I could find, SPEC2006 runs
    on a Neoverse N1 show a small increase in instruction count (+0.127%),
    but a decrease in cycle counts (-0.155%, on average). The instruction
    count is very low noise, the cycle count is more noisy with a 0.15%
    decrease not being significant. SPEC2k17 shows a small decrease (-0.2%)
    in instruction count leading to a -0.296% decrease in cycle count. These
    results are within noise margins but tend to show a small improvement in
    general.

    When specifying an Apple target, clang will set "-target-cpu apple-a7" on
    the command line, so should not be affected by this change when running
    from clang. This also doesn't enable more runtime unrolling like
    -mcpu=cortex-a55 does, only changing the schedule used.

    A lot of existing tests have updated. This is a summary of the important
    differences:
     - Most changes are the same instructions in a different order.
     - Sometimes this leads to very minor inefficiencies, such as requiring
       an extra mov to move variables into r0/v0 for the return value of a
       test function.
     - misched-fusion.ll was no longer fusing the pairs of instructions it
       should, as per D110561. I've changed the schedule used in the test
       for now.
     - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to
       the different latencies. This seems fine to me.
     - Some SVE tests do not always remove movprfx where they did before due
       to different register allocation giving different destructive forms.
     - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll
       produce two LDR where they previously produced an LDP due to
       store-pair-suppress kicking in.
     - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD.
     - Some tests such as arm64-neon-mul-div.ll and
       ragreedy-local-interval-cost.ll have more, less or just different
       spilling.
     - In aarch64_generated_funcs.ll.generated.expected one part of the
       function is no longer outlined. Interestingly if I switch this to use
       any other scheduled even less is outlined.

    Some of these are expected to happen, such as differences in outlining
    or register spilling. There will be places where these result in worse
    codegen, places where they are better, with the SPEC instruction counts
    suggesting it is not a decrease overall, on average.

    Differential Revision: https://reviews.llvm.org/D110830
---
 llvm/lib/Target/AArch64/AArch64.td | 2 +-
 .../Analysis/CostModel/AArch64/shuffle-select.ll | 2 +-
 .../Analysis/CostModel/AArch64/vector-select.ll | 4 +-
 llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll | 2 +-
 .../CodeGen/AArch64/GlobalISel/arm64-atomic.ll | 68 +-
 llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll | 4 +-
 .../call-translator-variadic-musttail.ll | 26 +-
 .../CodeGen/AArch64/GlobalISel/combine-udiv.ll | 308 +-
 .../AArch64/GlobalISel/merge-stores-truncating.ll | 10 +-
 llvm/test/CodeGen/AArch64/GlobalISel/swifterror.ll | 86 +-
 llvm/test/CodeGen/AArch64/aarch64-addv.ll | 2 +-
 llvm/test/CodeGen/AArch64/aarch64-be-bv.ll | 40 +-
 .../CodeGen/AArch64/aarch64-dup-ext-scalable.ll | 40 +-
 llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll | 18 +-
 llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll | 12 +-
 llvm/test/CodeGen/AArch64/aarch64-load-ext.ll | 36 +-
 .../CodeGen/AArch64/aarch64-matrix-umull-smull.ll | 24 +-
 llvm/test/CodeGen/AArch64/aarch64-smull.ll | 124 +-
 llvm/test/CodeGen/AArch64/aarch64-tail-dup-size.ll | 6 +-
 .../test/CodeGen/AArch64/aarch64_win64cc_vararg.ll | 4 +-
 llvm/test/CodeGen/AArch64/addimm-mulimm.ll | 32 +-
 .../CodeGen/AArch64/addsub-constant-folding.ll | 18 +-
 llvm/test/CodeGen/AArch64/addsub.ll | 2 +-
 llvm/test/CodeGen/AArch64/align-down.ll | 10 +-
 llvm/test/CodeGen/AArch64/and-mask-removal.ll | 12 +-
 .../AArch64/argument-blocks-array-of-struct.ll | 51 +-
 llvm/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll | 24 +-
 .../CodeGen/AArch64/arm64-addr-type-promotion.ll | 37 +-
 llvm/test/CodeGen/AArch64/arm64-addrmode.ll | 6 +-
 .../test/CodeGen/AArch64/arm64-bitfield-extract.ll | 14 +-
 llvm/test/CodeGen/AArch64/arm64-collect-loh.ll | 2 +-
 llvm/test/CodeGen/AArch64/arm64-convert-v4f64.ll | 22 +-
 llvm/test/CodeGen/AArch64/arm64-csel.ll | 16 +-
 llvm/test/CodeGen/AArch64/arm64-dup.ll | 10 +-
 llvm/test/CodeGen/AArch64/arm64-fcopysign.ll | 18 +-
 llvm/test/CodeGen/AArch64/arm64-fmadd.ll | 4 +-
 .../arm64-homogeneous-prolog-epilog-no-helper.ll | 18 +-
 llvm/test/CodeGen/AArch64/arm64-indexed-memory.ll | 54 +-
 .../CodeGen/AArch64/arm64-indexed-vector-ldst.ll | 180 +-
 llvm/test/CodeGen/AArch64/arm64-inline-asm.ll | 8 +-
 .../AArch64/arm64-instruction-mix-remarks.ll | 20 +-
 llvm/test/CodeGen/AArch64/arm64-ldp.ll | 20 +-
 llvm/test/CodeGen/AArch64/arm64-memset-inline.ll | 4 +-
 llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll | 64 +-
 llvm/test/CodeGen/AArch64/arm64-neon-aba-abd.ll | 6 +-
 llvm/test/CodeGen/AArch64/arm64-neon-copy.ll | 13 +-
 llvm/test/CodeGen/AArch64/arm64-neon-mul-div.ll | 1428 ++++----
 llvm/test/CodeGen/AArch64/arm64-nvcast.ll | 10 +-
 llvm/test/CodeGen/AArch64/arm64-popcnt.ll | 198 +-
 .../arm64-promote-const-complex-initializers.ll | 8 +-
 .../test/CodeGen/AArch64/arm64-register-pairing.ll | 4 +-
 llvm/test/CodeGen/AArch64/arm64-rev.ll | 14 +-
 .../AArch64/arm64-setcc-int-to-fp-combine.ll | 20 +-
 llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll | 92 +-
 llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll | 30 +-
 llvm/test/CodeGen/AArch64/arm64-srl-and.ll | 2 +-
 .../test/CodeGen/AArch64/arm64-subvector-extend.ll | 630 ++--
 llvm/test/CodeGen/AArch64/arm64-tls-dynamics.ll | 8 +-
 llvm/test/CodeGen/AArch64/arm64-tls-local-exec.ll | 8 +-
 llvm/test/CodeGen/AArch64/arm64-trunc-store.ll | 4 +-
 llvm/test/CodeGen/AArch64/arm64-vabs.ll | 446 ++-
 llvm/test/CodeGen/AArch64/arm64-vhadd.ll | 32 +-
 llvm/test/CodeGen/AArch64/arm64-vmul.ll | 226 +-
 llvm/test/CodeGen/AArch64/arm64-windows-calls.ll | 19 +-
 .../CodeGen/AArch64/arm64-zero-cycle-zeroing.ll | 8 +-
 llvm/test/CodeGen/AArch64/arm64_32-addrs.ll | 6 +-
 llvm/test/CodeGen/AArch64/arm64_32-atomics.ll | 2 +-
 llvm/test/CodeGen/AArch64/atomic-ops-lse.ll | 17 +-
 .../CodeGen/AArch64/atomic-ops-not-barriers.ll | 2 +-
 llvm/test/CodeGen/AArch64/bcmp-inline-small.ll | 4 +-
 llvm/test/CodeGen/AArch64/bitcast-promote-widen.ll | 8 +-
 llvm/test/CodeGen/AArch64/bitfield-insert.ll | 34 +-
 llvm/test/CodeGen/AArch64/build-one-lane.ll | 9 +-
 llvm/test/CodeGen/AArch64/build-vector-extract.ll | 126 +-
 llvm/test/CodeGen/AArch64/cgp-usubo.ll | 24 +-
 llvm/test/CodeGen/AArch64/cmp-select-sign.ll | 44 +-
 llvm/test/CodeGen/AArch64/cmpxchg-idioms.ll | 16 +-
 .../CodeGen/AArch64/combine-comparisons-by-cse.ll | 50 +-
 llvm/test/CodeGen/AArch64/cond-sel-value-prop.ll | 12 +-
 llvm/test/CodeGen/AArch64/consthoist-gep.ll | 32 +-
 llvm/test/CodeGen/AArch64/csr-split.ll | 4 +-
 llvm/test/CodeGen/AArch64/ctpop-nonean.ll | 30 +-
 llvm/test/CodeGen/AArch64/dag-combine-select.ll | 2 +-
 .../CodeGen/AArch64/dag-combine-trunc-build-vec.ll | 14 +-
 llvm/test/CodeGen/AArch64/dag-numsignbits.ll | 12 +-
 .../AArch64/div-rem-pair-recomposition-signed.ll | 210 +-
 .../AArch64/div-rem-pair-recomposition-unsigned.ll | 210 +-
 llvm/test/CodeGen/AArch64/emutls.ll | 6 +-
 llvm/test/CodeGen/AArch64/expand-select.ll | 50 +-
 llvm/test/CodeGen/AArch64/expand-vector-rot.ll | 12 +-
 llvm/test/CodeGen/AArch64/extract-bits.ll | 484 +--
 llvm/test/CodeGen/AArch64/extract-lowbits.ll | 116 +-
 llvm/test/CodeGen/AArch64/f16-instructions.ll | 18 +-
 llvm/test/CodeGen/AArch64/fabs.ll | 8 +-
 llvm/test/CodeGen/AArch64/fadd-combines.ll | 14 +-
 llvm/test/CodeGen/AArch64/faddp-half.ll | 8 +-
 .../CodeGen/AArch64/fast-isel-addressing-modes.ll | 6 +-
 .../CodeGen/AArch64/fast-isel-branch-cond-split.ll | 4 +-
 llvm/test/CodeGen/AArch64/fast-isel-gep.ll | 6 +-
 llvm/test/CodeGen/AArch64/fast-isel-memcpy.ll | 6 +-
 llvm/test/CodeGen/AArch64/fast-isel-shift.ll | 24 +-
 llvm/test/CodeGen/AArch64/fdiv_combine.ll | 6 +-
 llvm/test/CodeGen/AArch64/fold-global-offsets.ll | 10 +-
 llvm/test/CodeGen/AArch64/fp16-v8-instructions.ll | 1441 ++++----
 llvm/test/CodeGen/AArch64/fp16-vector-shuffle.ll | 2 +-
 llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll | 198 +-
 llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll | 958 +++---
 llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll | 114 +-
 llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll | 708 ++--
 .../CodeGen/AArch64/framelayout-frame-record.mir | 3 +-
 .../CodeGen/AArch64/framelayout-unaligned-fp.ll | 4 +-
 llvm/test/CodeGen/AArch64/func-calls.ll | 2 +-
 llvm/test/CodeGen/AArch64/funnel-shift-rot.ll | 30 +-
 llvm/test/CodeGen/AArch64/funnel-shift.ll | 108 +-
 llvm/test/CodeGen/AArch64/global-merge-3.ll | 24 +-
 llvm/test/CodeGen/AArch64/half.ll | 10 +-
 .../hoist-and-by-const-from-lshr-in-eqcmp-zero.ll | 6 +-
 .../test/CodeGen/AArch64/hwasan-check-memaccess.ll | 2 +-
 .../CodeGen/AArch64/i128_volatile_load_store.ll | 36 +-
 llvm/test/CodeGen/AArch64/implicit-null-check.ll | 12 +-
 .../AArch64/insert-subvector-res-legalization.ll | 70 +-
 llvm/test/CodeGen/AArch64/isinf.ll | 2 +-
 llvm/test/CodeGen/AArch64/known-never-nan.ll | 16 +-
 llvm/test/CodeGen/AArch64/ldst-opt.ll | 5 +-
 llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll | 163 +-
 llvm/test/CodeGen/AArch64/logical_shifted_reg.ll | 137 +-
 llvm/test/CodeGen/AArch64/lowerMUL-newload.ll | 24 +-
 .../CodeGen/AArch64/machine-licm-sink-instr.ll | 24 +-
 .../test/CodeGen/AArch64/machine-outliner-throw.ll | 4 +-
 .../AArch64/machine_cse_impdef_killflags.ll | 4 +-
 llvm/test/CodeGen/AArch64/madd-lohi.ll | 4 +-
 llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll | 50 +-
 llvm/test/CodeGen/AArch64/merge-trunc-store.ll | 72 +-
 llvm/test/CodeGen/AArch64/midpoint-int.ll | 308 +-
 llvm/test/CodeGen/AArch64/min-max.ll | 260 +-
 llvm/test/CodeGen/AArch64/minmax-of-minmax.ll | 256 +-
 llvm/test/CodeGen/AArch64/minmax.ll | 10 +-
 llvm/test/CodeGen/AArch64/misched-fusion-lit.ll | 5 +-
 llvm/test/CodeGen/AArch64/misched-fusion.ll | 4 +-
 .../CodeGen/AArch64/named-vector-shuffles-neon.ll | 18 +-
 .../CodeGen/AArch64/named-vector-shuffles-sve.ll | 408 +--
 llvm/test/CodeGen/AArch64/neg-abs.ll | 8 +-
 llvm/test/CodeGen/AArch64/neg-imm.ll | 3 +-
 .../CodeGen/AArch64/neon-bitwise-instructions.ll | 6 +-
 llvm/test/CodeGen/AArch64/neon-dotpattern.ll | 4 +-
 llvm/test/CodeGen/AArch64/neon-dotreduce.ll | 88 +-
 llvm/test/CodeGen/AArch64/neon-mla-mls.ll | 30 +-
 llvm/test/CodeGen/AArch64/neon-mov.ll | 2 +-
 llvm/test/CodeGen/AArch64/neon-reverseshuffle.ll | 2 +-
 llvm/test/CodeGen/AArch64/neon-shift-neg.ll | 24 +-
 llvm/test/CodeGen/AArch64/neon-truncstore.ll | 30 +-
 llvm/test/CodeGen/AArch64/nontemporal.ll | 74 +-
 llvm/test/CodeGen/AArch64/overeager_mla_fusing.ll | 10 +-
 llvm/test/CodeGen/AArch64/pow.ll | 12 +-
 .../pull-conditional-binop-through-shift.ll | 6 +-
 llvm/test/CodeGen/AArch64/qmovn.ll | 8 +-
 .../AArch64/ragreedy-local-interval-cost.ll | 187 +-
 llvm/test/CodeGen/AArch64/rand.ll | 10 +-
 llvm/test/CodeGen/AArch64/reduce-and.ll | 348 +-
 llvm/test/CodeGen/AArch64/reduce-or.ll | 348 +-
 llvm/test/CodeGen/AArch64/reduce-xor.ll | 164 +-
 llvm/test/CodeGen/AArch64/regress-tblgen-chains.ll | 4 +-
 llvm/test/CodeGen/AArch64/rotate-extract.ll | 14 +-
 .../rvmarker-pseudo-expansion-and-outlining.mir | 4 +-
 llvm/test/CodeGen/AArch64/sadd_sat.ll | 12 +-
 llvm/test/CodeGen/AArch64/sadd_sat_plus.ll | 36 +-
 llvm/test/CodeGen/AArch64/sadd_sat_vec.ll | 68 +-
 llvm/test/CodeGen/AArch64/sat-add.ll | 30 +-
 llvm/test/CodeGen/AArch64/sdivpow2.ll | 2 +-
 llvm/test/CodeGen/AArch64/seh-finally.ll | 8 +-
 llvm/test/CodeGen/AArch64/select-with-and-or.ll | 32 +-
 llvm/test/CodeGen/AArch64/select_const.ll | 112 +-
 llvm/test/CodeGen/AArch64/select_fmf.ll | 32 +-
 llvm/test/CodeGen/AArch64/selectcc-to-shiftand.ll | 16 +-
 llvm/test/CodeGen/AArch64/settag-merge-order.ll | 4 +-
 llvm/test/CodeGen/AArch64/settag-merge.ll | 8 +-
 llvm/test/CodeGen/AArch64/settag.ll | 10 +-
 llvm/test/CodeGen/AArch64/shift-amount-mod.ll | 168 +-
 llvm/test/CodeGen/AArch64/shift-by-signext.ll | 20 +-
 llvm/test/CodeGen/AArch64/shift-mod.ll | 2 +-
 llvm/test/CodeGen/AArch64/shrink-wrapping-vla.ll | 4 +-
 llvm/test/CodeGen/AArch64/sibling-call.ll | 2 +-
 llvm/test/CodeGen/AArch64/signbit-shift.ll | 8 +-
 llvm/test/CodeGen/AArch64/sink-addsub-of-const.ll | 48 +-
 llvm/test/CodeGen/AArch64/sitofp-fixed-legal.ll | 18 +-
 .../CodeGen/AArch64/speculation-hardening-loads.ll | 4 +-
 .../test/CodeGen/AArch64/speculation-hardening.mir | 2 +-
 llvm/test/CodeGen/AArch64/split-vector-insert.ll | 70 +-
 llvm/test/CodeGen/AArch64/sqrt-fastmath.ll | 254 +-
 llvm/test/CodeGen/AArch64/srem-lkk.ll | 2 +-
 .../CodeGen/AArch64/srem-seteq-illegal-types.ll | 90 +-
 llvm/test/CodeGen/AArch64/srem-seteq-optsize.ll | 16 +-
 .../CodeGen/AArch64/srem-seteq-vec-nonsplat.ll | 382 +--
 llvm/test/CodeGen/AArch64/srem-seteq-vec-splat.ll | 64 +-
 llvm/test/CodeGen/AArch64/srem-seteq.ll | 12 +-
 llvm/test/CodeGen/AArch64/srem-vector-lkk.ll | 446 +--
 llvm/test/CodeGen/AArch64/ssub_sat.ll | 12 +-
 llvm/test/CodeGen/AArch64/ssub_sat_plus.ll | 36 +-
 llvm/test/CodeGen/AArch64/ssub_sat_vec.ll | 68 +-
 .../CodeGen/AArch64/stack-guard-remat-bitcast.ll | 12 +-
 llvm/test/CodeGen/AArch64/stack-guard-sysreg.ll | 30 +-
 .../CodeGen/AArch64/statepoint-call-lowering.ll | 6 +-
 .../AArch64/sve-calling-convention-mixed.ll | 16 +-
 llvm/test/CodeGen/AArch64/sve-expand-div.ll | 12 +-
 llvm/test/CodeGen/AArch64/sve-extract-element.ll | 4 +-
 .../CodeGen/AArch64/sve-extract-fixed-vector.ll | 64 +-
 .../CodeGen/AArch64/sve-extract-scalable-vector.ll | 60 +-
 llvm/test/CodeGen/AArch64/sve-fcopysign.ll | 18 +-
 llvm/test/CodeGen/AArch64/sve-fcvt.ll | 64 +-
 .../CodeGen/AArch64/sve-fixed-length-concat.ll | 28 +-
 .../AArch64/sve-fixed-length-extract-vector-elt.ll | 12 +-
 .../AArch64/sve-fixed-length-float-compares.ll | 28 +-
 .../AArch64/sve-fixed-length-fp-extend-trunc.ll | 54 +-
 .../CodeGen/AArch64/sve-fixed-length-fp-select.ll | 48 +-
 .../CodeGen/AArch64/sve-fixed-length-fp-to-int.ll | 54 +-
 .../CodeGen/AArch64/sve-fixed-length-fp-vselect.ll | 1716 +++++-----
 .../AArch64/sve-fixed-length-insert-vector-elt.ll | 148 +-
 .../CodeGen/AArch64/sve-fixed-length-int-div.ll | 216 +-
 .../AArch64/sve-fixed-length-int-extends.ll | 56 +-
 .../AArch64/sve-fixed-length-int-immediates.ll | 56 +-
 .../CodeGen/AArch64/sve-fixed-length-int-mulh.ll | 30 +-
 .../CodeGen/AArch64/sve-fixed-length-int-rem.ll | 282 +-
 .../CodeGen/AArch64/sve-fixed-length-int-select.ll | 144 +-
 .../CodeGen/AArch64/sve-fixed-length-int-to-fp.ll | 108 +-
 .../AArch64/sve-fixed-length-int-vselect.ll | 3584 ++++++++++----------
 .../AArch64/sve-fixed-length-masked-gather.ll | 296 +-
 .../AArch64/sve-fixed-length-masked-loads.ll | 46 +-
 .../AArch64/sve-fixed-length-masked-scatter.ll | 342 +-
 .../AArch64/sve-fixed-length-masked-stores.ll | 82 +-
 .../AArch64/sve-fixed-length-vector-shuffle.ll | 78 +-
 llvm/test/CodeGen/AArch64/sve-forward-st-to-ld.ll | 7 +-
 llvm/test/CodeGen/AArch64/sve-fptrunc-store.ll | 4 +-
 llvm/test/CodeGen/AArch64/sve-gep.ll | 4 +-
 .../CodeGen/AArch64/sve-implicit-zero-filling.ll | 13 +-
 llvm/test/CodeGen/AArch64/sve-insert-element.ll | 192 +-
 llvm/test/CodeGen/AArch64/sve-insert-vector.ll | 80 +-
 llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll | 30 +-
 llvm/test/CodeGen/AArch64/sve-int-arith.ll | 2 +-
 llvm/test/CodeGen/AArch64/sve-intrinsics-index.ll | 10 +-
 .../CodeGen/AArch64/sve-intrinsics-int-arith.ll | 4 +-
 llvm/test/CodeGen/AArch64/sve-ld-post-inc.ll | 6 +-
 llvm/test/CodeGen/AArch64/sve-ld1r.ll | 2 +-
 .../sve-lsr-scaled-index-addressing-mode.ll | 1 +
 .../CodeGen/AArch64/sve-masked-gather-legalize.ll | 6 +-
 .../CodeGen/AArch64/sve-masked-scatter-legalize.ll | 2 +-
 llvm/test/CodeGen/AArch64/sve-masked-scatter.ll | 2 +-
 llvm/test/CodeGen/AArch64/sve-pred-arith.ll | 16 +-
 llvm/test/CodeGen/AArch64/sve-sext-zext.ll | 12 +-
 llvm/test/CodeGen/AArch64/sve-split-extract-elt.ll | 100 +-
 llvm/test/CodeGen/AArch64/sve-split-fcvt.ll | 40 +-
 llvm/test/CodeGen/AArch64/sve-split-fp-reduce.ll | 2 +-
 llvm/test/CodeGen/AArch64/sve-split-insert-elt.ll | 72 +-
 llvm/test/CodeGen/AArch64/sve-split-int-reduce.ll | 10 +-
 llvm/test/CodeGen/AArch64/sve-split-load.ll | 6 +-
 llvm/test/CodeGen/AArch64/sve-split-store.ll | 6 +-
 .../AArch64/sve-st1-addressing-mode-reg-imm.ll | 12 +-
 llvm/test/CodeGen/AArch64/sve-stepvector.ll | 22 +-
 llvm/test/CodeGen/AArch64/sve-trunc.ll | 30 +-
 llvm/test/CodeGen/AArch64/sve-vscale-attr.ll | 40 +-
 llvm/test/CodeGen/AArch64/sve-vscale.ll | 2 +-
 llvm/test/CodeGen/AArch64/sve-vselect-imm.ll | 12 +-
 llvm/test/CodeGen/AArch64/swift-async.ll | 20 +-
 llvm/test/CodeGen/AArch64/swift-return.ll | 2 +-
 llvm/test/CodeGen/AArch64/swifterror.ll | 6 +-
 llvm/test/CodeGen/AArch64/tiny-model-pic.ll | 12 +-
 llvm/test/CodeGen/AArch64/tiny-model-static.ll | 12 +-
 .../test/CodeGen/AArch64/typepromotion-overflow.ll | 136 +-
 llvm/test/CodeGen/AArch64/typepromotion-signed.ll | 38 +-
 llvm/test/CodeGen/AArch64/uadd_sat.ll | 6 +-
 llvm/test/CodeGen/AArch64/uadd_sat_plus.ll | 30 +-
 llvm/test/CodeGen/AArch64/uadd_sat_vec.ll | 72 +-
 .../AArch64/umulo-128-legalisation-lowering.ll | 27 +-
 ...old-masked-merge-scalar-constmask-innerouter.ll | 18 +-
 ...asked-merge-scalar-constmask-interleavedbits.ll | 12 +-
 ...merge-scalar-constmask-interleavedbytehalves.ll | 12 +-
 ...unfold-masked-merge-scalar-constmask-lowhigh.ll | 2 +-
 .../unfold-masked-merge-scalar-variablemask.ll | 98 +-
 llvm/test/CodeGen/AArch64/urem-lkk.ll | 20 +-
 .../CodeGen/AArch64/urem-seteq-illegal-types.ll | 28 +-
 llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll | 46 +-
 llvm/test/CodeGen/AArch64/urem-seteq-optsize.ll | 14 +-
 .../CodeGen/AArch64/urem-seteq-vec-nonsplat.ll | 340 +-
 .../test/CodeGen/AArch64/urem-seteq-vec-nonzero.ll | 56 +-
 llvm/test/CodeGen/AArch64/urem-seteq-vec-splat.ll | 38 +-
 .../CodeGen/AArch64/urem-seteq-vec-tautological.ll | 56 +-
 llvm/test/CodeGen/AArch64/urem-seteq.ll | 14 +-
 llvm/test/CodeGen/AArch64/urem-vector-lkk.ll | 330 +-
 .../AArch64/use-cr-result-of-dom-icmp-st.ll | 8 +-
 llvm/test/CodeGen/AArch64/usub_sat_plus.ll | 20 +-
 llvm/test/CodeGen/AArch64/usub_sat_vec.ll | 48 +-
 llvm/test/CodeGen/AArch64/vcvt-oversize.ll | 4 +-
 llvm/test/CodeGen/AArch64/vec-libcalls.ll | 34 +-
 llvm/test/CodeGen/AArch64/vec_cttz.ll | 8 +-
 llvm/test/CodeGen/AArch64/vec_uaddo.ll | 168 +-
 llvm/test/CodeGen/AArch64/vec_umulo.ll | 296 +-
 .../CodeGen/AArch64/vecreduce-and-legalization.ll | 36 +-
 .../AArch64/vecreduce-fadd-legalization-strict.ll | 96 +-
 .../CodeGen/AArch64/vecreduce-fadd-legalization.ll | 6 +-
 llvm/test/CodeGen/AArch64/vecreduce-fadd.ll | 188 +-
 .../CodeGen/AArch64/vecreduce-fmax-legalization.ll | 246 +-
 .../CodeGen/AArch64/vecreduce-fmin-legalization.ll | 246 +-
 .../CodeGen/AArch64/vecreduce-umax-legalization.ll | 14 +-
 llvm/test/CodeGen/AArch64/vector-fcopysign.ll | 346 +-
 llvm/test/CodeGen/AArch64/vector-gep.ll | 6 +-
 .../CodeGen/AArch64/vector-popcnt-128-ult-ugt.ll | 680 ++--
 llvm/test/CodeGen/AArch64/vldn_shuffle.ll | 6 +-
 llvm/test/CodeGen/AArch64/vselect-constants.ll | 42 +-
 llvm/test/CodeGen/AArch64/win-tls.ll | 6 +-
 llvm/test/CodeGen/AArch64/win64_vararg.ll | 32 +-
 llvm/test/CodeGen/AArch64/win64_vararg_float.ll | 12 +-
 llvm/test/CodeGen/AArch64/win64_vararg_float_cc.ll | 12 +-
 llvm/test/CodeGen/AArch64/xor.ll | 8 +-
 llvm/test/MC/AArch64/elf-globaladdress.ll | 6 +-
 .../CanonicalizeFreezeInLoops/aarch64.ll | 2 +-
 .../CodeGenPrepare/AArch64/large-offset-gep.ll | 30 +-
 .../AArch64/lsr-pre-inc-offset-check.ll | 12 +-
 .../LoopStrengthReduce/AArch64/small-constant.ll | 2 +-
 .../aarch64_generated_funcs.ll.generated.expected | 30 +-
 ...aarch64_generated_funcs.ll.nogenerated.expected | 24 +-
 319 files changed, 14045 insertions(+), 13817 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index 5c1bf783ba2a..cb52532343fe 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -1156,7 +1156,7 @@ def ProcTSV110 : SubtargetFeature<"tsv110", "ARMProcFamily", "TSV110",
 FeatureFP16FML,
 FeatureDotProd]>;
-def : ProcessorModel<"generic", NoSchedModel, [
+def : ProcessorModel<"generic", CortexA55Model, [
 FeatureFPARMv8,
 FeatureFuseAES,
 FeatureNEON,
diff --git a/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll b/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll
index 5008c7f5c847..cb8ec7ba6f21 100644
--- a/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll
@@ -4,7 +4,7 @@
 ; COST-LABEL: sel.v8i8
 ; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
 ; CODE-LABEL: sel.v8i8
-; CODE: tbl v0.8b, { v0.16b }, v2.8b
+; CODE: tbl v0.8b, { v0.16b }, v1.8b
 define <8 x i8> @sel.v8i8(<8 x i8> %v0, <8 x i8> %v1) {
 %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
 ret <8 x i8> %tmp0
diff --git a/llvm/test/Analysis/CostModel/AArch64/vector-select.ll b/llvm/test/Analysis/CostModel/AArch64/vector-select.ll
index f2271c4ed71f..6e77612815f4 100644
--- a/llvm/test/Analysis/CostModel/AArch64/vector-select.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/vector-select.ll
@@ -119,15 +119,15 @@ define <2 x i64> @v2i64_select_sle(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) {
 ; CODE-LABEL: v3i64_select_sle
 ; CODE: bb.0
-; CODE: ldr
 ; CODE: mov
+; CODE: ldr
 ; CODE: mov
 ; CODE: mov
 ; CODE: cmge
 ; CODE: cmge
 ; CODE: bif
-; CODE: ext
 ; CODE: bif
+; CODE: ext
 ; CODE: ret
 define <3 x i64> @v3i64_select_sle(<3 x i64> %a, <3 x i64> %b, <3 x i64> %c) {
diff --git a/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll b/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll
index 6fe73e067e1a..c2436ccecc75 100644
--- a/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll
+++ b/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll
@@ -51,8 +51,8 @@ define <vscale x 4 x i32> @ashr_add_shl_nxv4i8(<vscale x 4 x i32> %a) {
 ; CHECK-LABEL: ashr_add_shl_nxv4i8:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: mov w8, #16777216
-; CHECK-NEXT: mov z1.s, w8
 ; CHECK-NEXT: lsl z0.s, z0.s, #24
+; CHECK-NEXT: mov z1.s, w8
 ; CHECK-NEXT: add z0.s, z0.s, z1.s
 ; CHECK-NEXT: asr z0.s, z0.s, #24
 ; CHECK-NEXT: ret
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
index fd3a0072d2a8..4385e3ede36f 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
@@ -705,14 +705,14 @@ define i32 @atomic_load(i32* %p) #0 {
 define i8 @atomic_load_relaxed_8(i8* %p, i32 %off32) #0 {
 ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_8:
 ; CHECK-NOLSE-O1: ; %bb.0:
-; CHECK-NOLSE-O1-NEXT: ldrb w8, [x0, #4095]
-; CHECK-NOLSE-O1-NEXT: ldrb w9, [x0, w1, sxtw]
-; CHECK-NOLSE-O1-NEXT: ldurb w10, [x0, #-256]
-; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936
-; CHECK-NOLSE-O1-NEXT: ldrb w11, [x11]
-; CHECK-NOLSE-O1-NEXT: add w8, w8, w9
-; CHECK-NOLSE-O1-NEXT: add w8, w8, w10
-; CHECK-NOLSE-O1-NEXT: add w0, w8, w11
+; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936
+; CHECK-NOLSE-O1-NEXT: ldrb w9, [x0, #4095]
+; CHECK-NOLSE-O1-NEXT: ldrb w10, [x0, w1, sxtw]
+; CHECK-NOLSE-O1-NEXT: ldurb w11, [x0, #-256]
+; CHECK-NOLSE-O1-NEXT: ldrb w8, [x8]
+; CHECK-NOLSE-O1-NEXT: add w9, w9, w10
+; CHECK-NOLSE-O1-NEXT: add w9, w9, w11
+; CHECK-NOLSE-O1-NEXT: add w0, w9, w8
 ; CHECK-NOLSE-O1-NEXT: ret
 ;
 ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_8:
@@ -775,14 +775,14 @@ define i8 @atomic_load_relaxed_8(i8* %p, i32 %off32) #0 {
 define i16 @atomic_load_relaxed_16(i16* %p, i32 %off32) #0 {
 ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_16:
 ; CHECK-NOLSE-O1: ; %bb.0:
-; CHECK-NOLSE-O1-NEXT: ldrh w8, [x0, #8190]
-; CHECK-NOLSE-O1-NEXT: ldrh w9, [x0, w1, sxtw #1]
-; CHECK-NOLSE-O1-NEXT: ldurh w10, [x0, #-256]
-; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936
-; CHECK-NOLSE-O1-NEXT: ldrh w11, [x11]
-; CHECK-NOLSE-O1-NEXT: add w8, w8, w9
-; CHECK-NOLSE-O1-NEXT: add w8, w8, w10
-; CHECK-NOLSE-O1-NEXT: add w0, w8, w11
+; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936
+; CHECK-NOLSE-O1-NEXT: ldrh w9, [x0, #8190]
+; CHECK-NOLSE-O1-NEXT: ldrh w10, [x0, w1, sxtw #1]
+; CHECK-NOLSE-O1-NEXT: ldurh w11, [x0, #-256]
+; CHECK-NOLSE-O1-NEXT: ldrh w8, [x8]
+; CHECK-NOLSE-O1-NEXT: add w9, w9, w10
+; CHECK-NOLSE-O1-NEXT: add w9, w9, w11
+; CHECK-NOLSE-O1-NEXT: add w0, w9, w8
 ; CHECK-NOLSE-O1-NEXT: ret
 ;
 ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_16:
@@ -845,14 +845,14 @@ define i16 @atomic_load_relaxed_16(i16* %p, i32 %off32) #0 {
 define i32 @atomic_load_relaxed_32(i32* %p, i32 %off32) #0 {
 ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_32:
 ; CHECK-NOLSE-O1: ; %bb.0:
-; CHECK-NOLSE-O1-NEXT: ldr w8, [x0, #16380]
-; CHECK-NOLSE-O1-NEXT: ldr w9, [x0, w1, sxtw #2]
-; CHECK-NOLSE-O1-NEXT: ldur w10, [x0, #-256]
-; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936
-; CHECK-NOLSE-O1-NEXT: ldr w11, [x11]
-; CHECK-NOLSE-O1-NEXT: add w8, w8, w9
-; CHECK-NOLSE-O1-NEXT: add w8, w8, w10
-; CHECK-NOLSE-O1-NEXT: add w0, w8, w11
+; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936
+; CHECK-NOLSE-O1-NEXT: ldr w9, [x0, #16380]
+; CHECK-NOLSE-O1-NEXT: ldr w10, [x0, w1, sxtw #2]
+; CHECK-NOLSE-O1-NEXT: ldur w11, [x0, #-256]
+; CHECK-NOLSE-O1-NEXT: ldr w8, [x8]
+; CHECK-NOLSE-O1-NEXT: add w9, w9, w10
+; CHECK-NOLSE-O1-NEXT: add w9, w9, w11
+; CHECK-NOLSE-O1-NEXT: add w0, w9, w8
 ; CHECK-NOLSE-O1-NEXT: ret
 ;
 ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_32:
@@ -911,14 +911,14 @@ define i32 @atomic_load_relaxed_32(i32* %p, i32 %off32) #0 {
 define i64 @atomic_load_relaxed_64(i64* %p, i32 %off32) #0 {
 ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_64:
 ; CHECK-NOLSE-O1: ; %bb.0:
-; CHECK-NOLSE-O1-NEXT: ldr x8, [x0, #32760]
-; CHECK-NOLSE-O1-NEXT: ldr x9, [x0, w1, sxtw #3]
-; CHECK-NOLSE-O1-NEXT: ldur x10, [x0, #-256]
-; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936
-; CHECK-NOLSE-O1-NEXT: ldr x11, [x11]
-; CHECK-NOLSE-O1-NEXT: add x8, x8, x9
-; CHECK-NOLSE-O1-NEXT: add x8, x8, x10
-; CHECK-NOLSE-O1-NEXT: add x0, x8, x11
+; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936
+; CHECK-NOLSE-O1-NEXT: ldr x9, [x0, #32760]
+; CHECK-NOLSE-O1-NEXT: ldr x10, [x0, w1, sxtw #3]
+; CHECK-NOLSE-O1-NEXT: ldur x11, [x0, #-256]
+; CHECK-NOLSE-O1-NEXT: ldr x8, [x8]
+; CHECK-NOLSE-O1-NEXT: add x9, x9, x10
+; CHECK-NOLSE-O1-NEXT: add x9, x9, x11
+; CHECK-NOLSE-O1-NEXT: add x0, x9, x8
 ; CHECK-NOLSE-O1-NEXT: ret
 ;
 ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_64:
@@ -2717,8 +2717,8 @@ define { i8, i1 } @cmpxchg_i8(i8* %ptr, i8 %desired, i8 %new) {
 ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0
 ; CHECK-NOLSE-O1-NEXT: ret
 ; CHECK-NOLSE-O1-NEXT: LBB47_4: ; %cmpxchg.nostore
-; CHECK-NOLSE-O1-NEXT: clrex
 ; CHECK-NOLSE-O1-NEXT: mov w1, wzr
+; CHECK-NOLSE-O1-NEXT: clrex
 ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0
 ; CHECK-NOLSE-O1-NEXT: ret
 ;
@@ -2783,8 +2783,8 @@ define { i16, i1 } @cmpxchg_i16(i16* %ptr, i16 %desired, i16 %new) {
 ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0
 ; CHECK-NOLSE-O1-NEXT: ret
 ; CHECK-NOLSE-O1-NEXT: LBB48_4: ; %cmpxchg.nostore
-; CHECK-NOLSE-O1-NEXT: clrex
 ; CHECK-NOLSE-O1-NEXT: mov w1, wzr
+; CHECK-NOLSE-O1-NEXT: clrex
 ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0
 ; CHECK-NOLSE-O1-NEXT: ret
 ;
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll b/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll
index f8d4731d3249..651ca31ae555 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll
@@ -27,8 +27,8 @@ define void @call_byval_a64i32([64 x i32]* %incoming) {
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: sub sp, sp, #288
 ; CHECK-NEXT: stp x29, x30, [sp, #256] // 16-byte Folded Spill
-; CHECK-NEXT: str x28, [sp, #272] // 8-byte Folded Spill
 ; CHECK-NEXT: add x29, sp, #256
+; CHECK-NEXT: str x28, [sp, #272] // 8-byte Folded Spill
 ; CHECK-NEXT: .cfi_def_cfa w29, 32
 ; CHECK-NEXT: .cfi_offset w28, -16
 ; CHECK-NEXT: .cfi_offset w30, -24
@@ -66,8 +66,8 @@ define void @call_byval_a64i32([64 x i32]* %incoming) {
 ; CHECK-NEXT: ldr q0, [x0, #240]
 ; CHECK-NEXT: str q0, [sp, #240]
 ; CHECK-NEXT: bl byval_a64i32
-; CHECK-NEXT: ldr x28, [sp, #272] // 8-byte Folded Reload
 ; CHECK-NEXT: ldp x29, x30, [sp, #256] // 16-byte Folded Reload
+; CHECK-NEXT: ldr x28, [sp, #272] // 8-byte Folded Reload
 ; CHECK-NEXT: add sp, sp, #288
 ; CHECK-NEXT: ret
 call void @byval_a64i32([64 x i32]* byval([64 x i32]) %incoming)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll
index 42e91f631822..44c0854ea03d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll
@@ -63,15 +63,12 @@ define i32 @test_musttail_variadic_spill(i32 %arg0, ...) {
 ; CHECK-NEXT: mov x25, x6
 ; CHECK-NEXT: mov x26, x7
 ; CHECK-NEXT: stp q1, q0, [sp, #96] ; 32-byte Folded Spill
+; CHECK-NEXT: mov x27, x8
 ; CHECK-NEXT: stp q3, q2, [sp, #64] ; 32-byte Folded Spill
 ; CHECK-NEXT: stp q5, q4, [sp, #32] ; 32-byte Folded Spill
 ; CHECK-NEXT: stp q7, q6, [sp] ; 32-byte Folded Spill
-; CHECK-NEXT: mov x27, x8
 ; CHECK-NEXT: bl _puts
 ; CHECK-NEXT: ldp q1, q0, [sp, #96] ; 32-byte Folded Reload
-; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload
-; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload
-; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload
 ; CHECK-NEXT: mov w0, w19
 ; CHECK-NEXT: mov x1, x20
 ; CHECK-NEXT: mov x2, x21
@@ -81,6 +78,9 @@ define i32 @test_musttail_variadic_spill(i32 %arg0, ...) {
 ; CHECK-NEXT: mov x6, x25
 ; CHECK-NEXT: mov x7, x26
 ; CHECK-NEXT: mov x8, x27
+; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload
+; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload
+; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload
 ; CHECK-NEXT: ldp x29, x30, [sp, #208] ; 16-byte Folded Reload
 ; CHECK-NEXT: ldp x20, x19, [sp, #192] ; 16-byte Folded Reload
 ; CHECK-NEXT: ldp x22, x21, [sp, #176] ; 16-byte Folded Reload
@@ -122,9 +122,8 @@ define void @f_thunk(i8* %this, ...) {
 ; CHECK-NEXT: .cfi_offset w26, -80
 ; CHECK-NEXT: .cfi_offset w27, -88
 ; CHECK-NEXT: .cfi_offset w28, -96
-; CHECK-NEXT: mov x27, x8
-; CHECK-NEXT: add x8, sp, #128
-; CHECK-NEXT: add x9, sp, #256
+; CHECK-NEXT: add x9, sp, #128
+; CHECK-NEXT: add x10, sp, #256
 ; CHECK-NEXT: mov x19, x0
 ; CHECK-NEXT: mov x20, x1
 ; CHECK-NEXT: mov x21, x2
@@ -134,16 +133,14 @@ define void @f_thunk(i8* %this, ...) {
 ; CHECK-NEXT: mov x25, x6
 ; CHECK-NEXT: mov x26, x7
 ; CHECK-NEXT: stp q1, q0, [sp, #96] ; 32-byte Folded Spill
+; CHECK-NEXT: mov x27, x8
 ; CHECK-NEXT: stp q3, q2, [sp, #64] ; 32-byte Folded Spill
 ; CHECK-NEXT: stp q5, q4, [sp, #32] ; 32-byte Folded Spill
 ; CHECK-NEXT: stp q7, q6, [sp] ; 32-byte Folded Spill
-; CHECK-NEXT: str x9, [x8]
+; CHECK-NEXT: str x10, [x9]
 ; CHECK-NEXT: bl _get_f
-; CHECK-NEXT: mov x9, x0
 ; CHECK-NEXT: ldp q1, q0, [sp, #96] ; 32-byte Folded Reload
-; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload
-; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload
-; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload
+; CHECK-NEXT: mov x9, x0
 ; CHECK-NEXT: mov x0, x19
 ; CHECK-NEXT: mov x1, x20
 ; CHECK-NEXT: mov x2, x21
@@ -153,6 +150,9 @@ define void @f_thunk(i8* %this, ...) {
 ; CHECK-NEXT: mov x6, x25
 ; CHECK-NEXT: mov x7, x26
 ; CHECK-NEXT: mov x8, x27
+; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload
+; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload
+; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload
 ; CHECK-NEXT: ldp x29, x30, [sp, #240] ; 16-byte Folded Reload
 ; CHECK-NEXT: ldp x20, x19, [sp, #224] ; 16-byte Folded Reload
 ; CHECK-NEXT: ldp x22, x21, [sp, #208] ; 16-byte Folded Reload
@@ -195,9 +195,9 @@ define void @h_thunk(%struct.Foo* %this, ...)
{ ; CHECK-NEXT: Lloh2: ; CHECK-NEXT: adrp x10, _g@GOTPAGE ; CHECK-NEXT: ldr x9, [x0, #16] +; CHECK-NEXT: mov w11, #42 ; CHECK-NEXT: Lloh3: ; CHECK-NEXT: ldr x10, [x10, _g@GOTPAGEOFF] -; CHECK-NEXT: mov w11, #42 ; CHECK-NEXT: Lloh4: ; CHECK-NEXT: str w11, [x10] ; CHECK-NEXT: br x9 diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll index 6d9dad450ef1..3dc45e4cf5a7 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll @@ -18,20 +18,20 @@ define <8 x i16> @combine_vec_udiv_uniform(<8 x i16> %x) { ; ; GISEL-LABEL: combine_vec_udiv_uniform: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI0_1 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI0_1] -; GISEL-NEXT: adrp x8, .LCPI0_0 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI0_0] ; GISEL-NEXT: adrp x8, .LCPI0_2 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI0_2] -; GISEL-NEXT: sub v1.8h, v2.8h, v1.8h -; GISEL-NEXT: neg v1.8h, v1.8h -; GISEL-NEXT: umull2 v2.4s, v0.8h, v3.8h -; GISEL-NEXT: umull v3.4s, v0.4h, v3.4h -; GISEL-NEXT: uzp2 v2.8h, v3.8h, v2.8h -; GISEL-NEXT: sub v0.8h, v0.8h, v2.8h -; GISEL-NEXT: ushl v0.8h, v0.8h, v1.8h -; GISEL-NEXT: add v0.8h, v0.8h, v2.8h +; GISEL-NEXT: adrp x9, .LCPI0_0 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI0_2] +; GISEL-NEXT: adrp x8, .LCPI0_1 +; GISEL-NEXT: ldr q4, [x9, :lo12:.LCPI0_0] +; GISEL-NEXT: umull2 v2.4s, v0.8h, v1.8h +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI0_1] +; GISEL-NEXT: umull v1.4s, v0.4h, v1.4h +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v2.8h +; GISEL-NEXT: sub v2.8h, v4.8h, v3.8h +; GISEL-NEXT: sub v0.8h, v0.8h, v1.8h +; GISEL-NEXT: neg v2.8h, v2.8h +; GISEL-NEXT: ushl v0.8h, v0.8h, v2.8h +; GISEL-NEXT: add v0.8h, v0.8h, v1.8h ; GISEL-NEXT: ushr v0.8h, v0.8h, #4 ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 23, i16 23, i16 23, i16 23, i16 23, i16 23, i16 23, i16 23> @@ -44,53 +44,53 @@ define <8 x i16> @combine_vec_udiv_nonuniform(<8 x i16> %x) { ; SDAG-NEXT: adrp x8, 
.LCPI1_0 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI1_0] ; SDAG-NEXT: adrp x8, .LCPI1_1 +; SDAG-NEXT: ushl v1.8h, v0.8h, v1.8h ; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_1] ; SDAG-NEXT: adrp x8, .LCPI1_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI1_2] -; SDAG-NEXT: ushl v1.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v4.4s, v1.8h, v2.8h +; SDAG-NEXT: umull2 v3.4s, v1.8h, v2.8h ; SDAG-NEXT: umull v1.4s, v1.4h, v2.4h +; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_2] ; SDAG-NEXT: adrp x8, .LCPI1_3 -; SDAG-NEXT: uzp2 v1.8h, v1.8h, v4.8h -; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_3] +; SDAG-NEXT: uzp2 v1.8h, v1.8h, v3.8h ; SDAG-NEXT: sub v0.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v4.4s, v0.8h, v3.8h -; SDAG-NEXT: umull v0.4s, v0.4h, v3.4h -; SDAG-NEXT: uzp2 v0.8h, v0.8h, v4.8h +; SDAG-NEXT: umull2 v3.4s, v0.8h, v2.8h +; SDAG-NEXT: umull v0.4s, v0.4h, v2.4h +; SDAG-NEXT: uzp2 v0.8h, v0.8h, v3.8h ; SDAG-NEXT: add v0.8h, v0.8h, v1.8h -; SDAG-NEXT: ushl v0.8h, v0.8h, v2.8h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI1_3] +; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI1_5 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI1_5] ; GISEL-NEXT: adrp x8, .LCPI1_4 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI1_4] +; GISEL-NEXT: adrp x10, .LCPI1_0 +; GISEL-NEXT: adrp x9, .LCPI1_1 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI1_4] ; GISEL-NEXT: adrp x8, .LCPI1_3 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_3] -; GISEL-NEXT: adrp x8, .LCPI1_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI1_1] -; GISEL-NEXT: adrp x8, .LCPI1_0 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI1_0] +; GISEL-NEXT: ldr q5, [x10, :lo12:.LCPI1_0] +; GISEL-NEXT: ldr q6, [x9, :lo12:.LCPI1_1] +; GISEL-NEXT: neg v1.8h, v1.8h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI1_3] ; GISEL-NEXT: adrp x8, .LCPI1_2 -; GISEL-NEXT: neg v2.8h, v2.8h -; GISEL-NEXT: ldr q6, [x8, :lo12:.LCPI1_2] -; GISEL-NEXT: ushl v2.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v5.8h -; GISEL-NEXT: umull2 v5.4s, v2.8h, v3.8h 
+; GISEL-NEXT: ushl v1.8h, v0.8h, v1.8h +; GISEL-NEXT: umull2 v3.4s, v1.8h, v2.8h +; GISEL-NEXT: umull v1.4s, v1.4h, v2.4h +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v3.8h +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_2] +; GISEL-NEXT: adrp x8, .LCPI1_5 +; GISEL-NEXT: sub v2.8h, v0.8h, v1.8h +; GISEL-NEXT: umull2 v4.4s, v2.8h, v3.8h ; GISEL-NEXT: umull v2.4s, v2.4h, v3.4h -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v5.8h -; GISEL-NEXT: sub v3.8h, v0.8h, v2.8h -; GISEL-NEXT: umull2 v5.4s, v3.8h, v6.8h -; GISEL-NEXT: umull v3.4s, v3.4h, v6.4h -; GISEL-NEXT: uzp2 v3.8h, v3.8h, v5.8h -; GISEL-NEXT: neg v4.8h, v4.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: add v2.8h, v3.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v4.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_5] +; GISEL-NEXT: cmeq v3.8h, v3.8h, v5.8h +; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h +; GISEL-NEXT: neg v4.8h, v6.8h +; GISEL-NEXT: add v1.8h, v2.8h, v1.8h +; GISEL-NEXT: shl v2.8h, v3.8h, #15 +; GISEL-NEXT: ushl v1.8h, v1.8h, v4.8h +; GISEL-NEXT: sshr v2.8h, v2.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 23, i16 34, i16 -23, i16 56, i16 128, i16 -1, i16 -256, i16 -32768> ret <8 x i16> %1 @@ -100,41 +100,41 @@ define <8 x i16> @combine_vec_udiv_nonuniform2(<8 x i16> %x) { ; SDAG-LABEL: combine_vec_udiv_nonuniform2: ; SDAG: // %bb.0: ; SDAG-NEXT: adrp x8, .LCPI2_0 -; SDAG-NEXT: adrp x9, .LCPI2_1 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_0] -; SDAG-NEXT: ldr q2, [x9, :lo12:.LCPI2_1] +; SDAG-NEXT: adrp x8, .LCPI2_1 +; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_1] ; SDAG-NEXT: adrp x8, .LCPI2_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI2_2] +; SDAG-NEXT: umull2 v2.4s, v0.8h, v1.8h +; SDAG-NEXT: umull v0.4s, v0.4h, v1.4h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_2] +; SDAG-NEXT: uzp2 v0.8h, v0.8h, v2.8h ; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v1.4s, v0.8h, v2.8h 
-; SDAG-NEXT: umull v0.4s, v0.4h, v2.4h -; SDAG-NEXT: uzp2 v0.8h, v0.8h, v1.8h -; SDAG-NEXT: ushl v0.8h, v0.8h, v3.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform2: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI2_4 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI2_4] ; GISEL-NEXT: adrp x8, .LCPI2_3 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_3] -; GISEL-NEXT: adrp x8, .LCPI2_1 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI2_1] -; GISEL-NEXT: adrp x8, .LCPI2_0 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI2_0] +; GISEL-NEXT: adrp x9, .LCPI2_4 +; GISEL-NEXT: adrp x10, .LCPI2_0 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI2_3] ; GISEL-NEXT: adrp x8, .LCPI2_2 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI2_2] +; GISEL-NEXT: ldr q3, [x9, :lo12:.LCPI2_4] +; GISEL-NEXT: ldr q4, [x10, :lo12:.LCPI2_0] +; GISEL-NEXT: neg v1.8h, v1.8h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_2] +; GISEL-NEXT: adrp x8, .LCPI2_1 +; GISEL-NEXT: cmeq v3.8h, v3.8h, v4.8h +; GISEL-NEXT: ushl v1.8h, v0.8h, v1.8h +; GISEL-NEXT: shl v3.8h, v3.8h, #15 +; GISEL-NEXT: umull2 v5.4s, v1.8h, v2.8h +; GISEL-NEXT: umull v1.4s, v1.4h, v2.4h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_1] ; GISEL-NEXT: neg v2.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v4.8h -; GISEL-NEXT: umull2 v4.4s, v2.8h, v5.8h -; GISEL-NEXT: umull v2.4s, v2.4h, v5.4h -; GISEL-NEXT: neg v3.8h, v3.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v3.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v5.8h +; GISEL-NEXT: ushl v1.8h, v1.8h, v2.8h +; GISEL-NEXT: sshr v2.8h, v3.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 -34, i16 35, i16 36, i16 -37, i16 38, i16 -39, i16 40, i16 -41> ret <8 x i16> %1 @@ -146,43 +146,43 @@ define <8 x i16> @combine_vec_udiv_nonuniform3(<8 x i16> %x) { ; SDAG-NEXT: adrp x8, .LCPI3_0 ; SDAG-NEXT: ldr q1, [x8, 
:lo12:.LCPI3_0] ; SDAG-NEXT: adrp x8, .LCPI3_1 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI3_1] ; SDAG-NEXT: umull2 v2.4s, v0.8h, v1.8h ; SDAG-NEXT: umull v1.4s, v0.4h, v1.4h ; SDAG-NEXT: uzp2 v1.8h, v1.8h, v2.8h ; SDAG-NEXT: sub v0.8h, v0.8h, v1.8h ; SDAG-NEXT: usra v1.8h, v0.8h, #1 -; SDAG-NEXT: ushl v0.8h, v1.8h, v3.8h +; SDAG-NEXT: ldr q0, [x8, :lo12:.LCPI3_1] +; SDAG-NEXT: ushl v0.8h, v1.8h, v0.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform3: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI3_5 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI3_5] ; GISEL-NEXT: adrp x8, .LCPI3_4 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI3_4] -; GISEL-NEXT: adrp x8, .LCPI3_2 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI3_2] -; GISEL-NEXT: adrp x8, .LCPI3_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI3_1] -; GISEL-NEXT: adrp x8, .LCPI3_3 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI3_3] -; GISEL-NEXT: adrp x8, .LCPI3_0 -; GISEL-NEXT: ldr q6, [x8, :lo12:.LCPI3_0] -; GISEL-NEXT: sub v3.8h, v4.8h, v3.8h -; GISEL-NEXT: umull2 v4.4s, v0.8h, v2.8h -; GISEL-NEXT: umull v2.4s, v0.4h, v2.4h -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h -; GISEL-NEXT: neg v3.8h, v3.8h -; GISEL-NEXT: sub v4.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v6.8h -; GISEL-NEXT: ushl v3.8h, v4.8h, v3.8h -; GISEL-NEXT: neg v5.8h, v5.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: add v2.8h, v3.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v5.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: adrp x9, .LCPI3_2 +; GISEL-NEXT: adrp x10, .LCPI3_1 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI3_4] +; GISEL-NEXT: adrp x8, .LCPI3_5 +; GISEL-NEXT: ldr q2, [x9, :lo12:.LCPI3_2] +; GISEL-NEXT: adrp x9, .LCPI3_3 +; GISEL-NEXT: ldr q3, [x10, :lo12:.LCPI3_1] +; GISEL-NEXT: adrp x10, .LCPI3_0 +; GISEL-NEXT: umull2 v4.4s, v0.8h, v1.8h +; GISEL-NEXT: umull v1.4s, v0.4h, v1.4h +; GISEL-NEXT: ldr q6, [x9, :lo12:.LCPI3_3] +; GISEL-NEXT: sub v2.8h, v3.8h, v2.8h +; GISEL-NEXT: ldr q5, [x10, :lo12:.LCPI3_0] +; 
GISEL-NEXT: uzp2 v1.8h, v1.8h, v4.8h +; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI3_5] +; GISEL-NEXT: neg v2.8h, v2.8h +; GISEL-NEXT: sub v3.8h, v0.8h, v1.8h +; GISEL-NEXT: ushl v2.8h, v3.8h, v2.8h +; GISEL-NEXT: cmeq v3.8h, v4.8h, v5.8h +; GISEL-NEXT: neg v4.8h, v6.8h +; GISEL-NEXT: add v1.8h, v2.8h, v1.8h +; GISEL-NEXT: shl v2.8h, v3.8h, #15 +; GISEL-NEXT: ushl v1.8h, v1.8h, v4.8h +; GISEL-NEXT: sshr v2.8h, v2.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 7, i16 23, i16 25, i16 27, i16 31, i16 47, i16 63, i16 127> ret <8 x i16> %1 @@ -192,39 +192,39 @@ define <16 x i8> @combine_vec_udiv_nonuniform4(<16 x i8> %x) { ; SDAG-LABEL: combine_vec_udiv_nonuniform4: ; SDAG: // %bb.0: ; SDAG-NEXT: adrp x8, .LCPI4_0 +; SDAG-NEXT: adrp x9, .LCPI4_3 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI4_0] ; SDAG-NEXT: adrp x8, .LCPI4_1 +; SDAG-NEXT: ldr q3, [x9, :lo12:.LCPI4_3] +; SDAG-NEXT: umull2 v2.8h, v0.16b, v1.16b +; SDAG-NEXT: umull v1.8h, v0.8b, v1.8b +; SDAG-NEXT: and v0.16b, v0.16b, v3.16b +; SDAG-NEXT: uzp2 v1.16b, v1.16b, v2.16b ; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI4_1] ; SDAG-NEXT: adrp x8, .LCPI4_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI4_2] -; SDAG-NEXT: adrp x8, .LCPI4_3 -; SDAG-NEXT: ldr q4, [x8, :lo12:.LCPI4_3] -; SDAG-NEXT: umull2 v5.8h, v0.16b, v1.16b -; SDAG-NEXT: umull v1.8h, v0.8b, v1.8b -; SDAG-NEXT: uzp2 v1.16b, v1.16b, v5.16b ; SDAG-NEXT: ushl v1.16b, v1.16b, v2.16b -; SDAG-NEXT: and v1.16b, v1.16b, v3.16b -; SDAG-NEXT: and v0.16b, v0.16b, v4.16b +; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI4_2] +; SDAG-NEXT: and v1.16b, v1.16b, v2.16b ; SDAG-NEXT: orr v0.16b, v0.16b, v1.16b ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform4: ; GISEL: // %bb.0: ; GISEL-NEXT: adrp x8, .LCPI4_3 +; GISEL-NEXT: adrp x9, .LCPI4_2 +; GISEL-NEXT: adrp x10, .LCPI4_1 ; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI4_3] ; GISEL-NEXT: adrp x8, .LCPI4_0 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI4_0] -; GISEL-NEXT: adrp x8, .LCPI4_2 -; GISEL-NEXT: ldr q3, 
[x8, :lo12:.LCPI4_2] -; GISEL-NEXT: adrp x8, .LCPI4_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI4_1] -; GISEL-NEXT: cmeq v1.16b, v1.16b, v2.16b -; GISEL-NEXT: umull2 v2.8h, v0.16b, v3.16b -; GISEL-NEXT: umull v3.8h, v0.8b, v3.8b -; GISEL-NEXT: neg v4.16b, v4.16b -; GISEL-NEXT: uzp2 v2.16b, v3.16b, v2.16b +; GISEL-NEXT: ldr q2, [x9, :lo12:.LCPI4_2] +; GISEL-NEXT: ldr q3, [x10, :lo12:.LCPI4_1] +; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI4_0] +; GISEL-NEXT: umull2 v5.8h, v0.16b, v2.16b +; GISEL-NEXT: umull v2.8h, v0.8b, v2.8b +; GISEL-NEXT: cmeq v1.16b, v1.16b, v4.16b +; GISEL-NEXT: neg v3.16b, v3.16b +; GISEL-NEXT: uzp2 v2.16b, v2.16b, v5.16b ; GISEL-NEXT: shl v1.16b, v1.16b, #7 -; GISEL-NEXT: ushl v2.16b, v2.16b, v4.16b +; GISEL-NEXT: ushl v2.16b, v2.16b, v3.16b ; GISEL-NEXT: sshr v1.16b, v1.16b, #7 ; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b ; GISEL-NEXT: ret @@ -236,55 +236,55 @@ define <8 x i16> @pr38477(<8 x i16> %a0) { </cut>

4 years, 8 months


[TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

by ci_notify＠linaro.org

After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>

    Avoid invalid loop transformations in jump threading registry.

the following benchmarks slowed down by more than 2%:
- 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O3 -marm
- Hardware: NVidia TK1 4x Cortex-A15

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
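As a worked check of the figure quoted in this report, the 8% slowdown is consistent with the relative increase in perf samples, rounded to the nearest whole percent. The sketch below assumes that formula and that rounding; the function names and the way the 2% threshold is applied are illustrative and are not taken from the TCWG CI scripts.

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, in percent (assumed formula)."""
    return (after - before) / before * 100.0


def is_reported(before: int, after: int, threshold_percent: float = 2.0) -> bool:
    """True if the slowdown exceeds the reporting threshold (2% in this report)."""
    return slowdown_percent(before, after) > threshold_percent


# 471.omnetpp: 6348 -> 6828 perf samples
pct = slowdown_percent(6348, 6828)
print(f"{pct:.2f}% (reported as {round(pct)}%)")
print("above threshold:", is_reported(6348, 6828))
```

The same arithmetic can be applied to any "from N to M perf samples" line in these reports to sanity-check the rounded percentage.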
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…

Reproduce builds:
<cut>
mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date:   Thu Sep 23 10:59:24 2021 +0200

    Avoid invalid loop transformations in jump threading registry.

    My upcoming improvements to the forward jump threader make it thread more aggressively.  In investigating some "regressions", I noticed that it has always allowed threading through empty latches and across loop boundaries.  As we have discussed recently, this should be avoided until after loop optimizations have run their course.

    Note that this wasn't much of a problem before because DOM/VRP couldn't find these opportunities, but with a smarter solver, we trip over them more easily.

    Because the forward threader doesn't have an independent localized cost model like the new threader (profitable_path_p), it is difficult to catch these things at discovery.  However, we can catch them at registration time, with the added benefit that all the threaders (forward and backward) can share the handcuffs.

    This patch is an adaptation of what we do in the backward threader, but it is not meant to catch everything we do there, as some of the restrictions there are due to limitations of the different block copiers (for example, the generic copier does not re-use existing threading paths).

    We could ideally remove the now redundant bits in profitable_path_p, but I would prefer not to for two reasons.  First, the backward threader uses profitable_path_p as it discovers paths to avoid discovering paths in unprofitable directions.  Second, I would like to merge all the forward cost restrictions into the profitability class in the backward threader, not the other way around.  Alas, that reshuffling will have to wait for the next release.

    As usual, there are quite a few tests that needed adjustments.  It seems we were quite happily threading improper scenarios.  With most of them, as can be seen in pr77445-2.c, we're merely shifting the threading to after loop optimizations.

    Tested on x86-64 Linux.

    gcc/ChangeLog:

            * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): New.
            (jt_path_registry::register_jump_thread): Call cancel_invalid_paths.
            * tree-ssa-threadupdate.h (class jt_path_registry): Add cancel_invalid_paths.

    gcc/testsuite/ChangeLog:

            * gcc.dg/tree-ssa/20030714-2.c: Adjust.
            * gcc.dg/tree-ssa/pr66752-3.c: Adjust.
            * gcc.dg/tree-ssa/pr77445-2.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
            * gcc.dg/vect/bb-slp-16.c: Adjust.
---
 gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c        |  7 ++-
 gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c         | 19 ++++---
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c         |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c  |  4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-16.c             |  7 ---
 gcc/tree-ssa-threadupdate.c                       | 67 ++++++++++++++++++-----
 gcc/tree-ssa-threadupdate.h                       |  1 +
 8 files changed, 78 insertions(+), 35 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
index eb663f2ff5b..9585ff11307 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
@@ -32,7 +32,8 @@ get_alias_set (t)
     }
 }

-/* There should be exactly three IF conditionals if we thread jumps
-   properly.  */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+   properly.  There used to be 3, but one thread was crossing
+   loops.  */
+/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index e1464e21170..922a331b217 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */

 extern int status, pt;
 extern int count;
@@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a)
     pt--;
 }

-/* There are 4 jump threading opportunities, all of which will be
-   realized, which will eliminate testing of FLAG, completely.  */
-/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
+/* There are 2 jump threading opportunities (which don't cross loops),
+   all of which will be realized, which will eliminate testing of
+   FLAG, completely.  */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */

-/* There should be no assignments or references to FLAG, verify they're
-   eliminated as early as possible.  */
-/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
+/* We used to remove references to FLAG by DCE2, but this was
+   depending on early threaders threading through loop boundaries
+   (which we shouldn't do).  However, the late threading passes, which
+   run after loop optimizations , can successfully eliminate the
+   references to FLAG.  Verify that ther are no references by the late
+   threading passes.  */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index f9fc212f49e..01a0f1f197d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
    aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
    to change decisions in switch expansion which in turn can expose new
    jump threading opportunities.  Skip the later tests on aarch64.  */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 60d4f76f076..2d78d045516 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -21,5 +21,7 @@
    condition.

    All the cases are picked up by VRP1 as jump threads.  */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
+
+/* There used to be 6 jump threads found by thread1, but they all
+   depended on threading through distinct loops in ethread.  */
 /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index e3d4b311c03..16abcde5053 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */

-/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */

 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 664e93e9b60..e68a9b62535 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,8 +1,5 @@
 /* { dg-require-effective-target vect_int } */

-/* See note below as to why we disable threading.  */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
-
 #include <stdarg.h>
 #include "tree-vect.h"

@@ -30,10 +27,6 @@ main1 (int dummy)
       *pout++ = *pin++ + a;
       *pout++ = *pin++ + a;
       *pout++ = *pin++ + a;
-      /* In some architectures like ppc64, jump threading may thread
-         the iteration where i==0 such that we no longer optimize the
-         BB.  Another alternative to disable jump threading would be
-         to wrap the read from `i' into a function returning i.  */
       if (arr[i] = i)
         a = i;
       else
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index baac11280fa..2b9b8f81274 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
   return retval;
 }

+bool
+jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
+{
+  gcc_checking_assert (!path.is_empty ());
+  edge taken_edge = path[path.length () - 1]->e;
+  loop_p loop = taken_edge->src->loop_father;
+  bool seen_latch = false;
+  bool path_crosses_loops = false;
+
+  for (unsigned int i = 0; i < path.length (); i++)
+    {
+      edge e = path[i]->e;
+
+      if (e == NULL)
+        {
+          // NULL outgoing edges on a path can happen for jumping to a
+          // constant address.
+          cancel_thread (&path, "Found NULL edge in jump threading path");
+          return true;
+        }
+
+      if (loop->latch == e->src || loop->latch == e->dest)
+        seen_latch = true;
+
+      // The first entry represents the block with an outgoing edge
+      // that we will redirect to the jump threading path.  Thus we
+      // don't care about that block's loop father.
+      if ((i > 0 && e->src->loop_father != loop)
+          || e->dest->loop_father != loop)
+        path_crosses_loops = true;
+
+      if (flag_checking && !m_backedge_threads)
+        gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
+    }
+
+  if (cfun->curr_properties & PROP_loop_opts_done)
+    return false;
+
+  if (seen_latch && empty_block_p (loop->latch))
+    {
+      cancel_thread (&path, "Threading through latch before loop opts "
+                     "would create non-empty latch");
+      return true;
+    }
+  if (path_crosses_loops)
+    {
+      cancel_thread (&path, "Path crosses loops");
+      return true;
+    }
+  return false;
+}
+
 /* Register a jump threading opportunity.  We queue up all the jump
    threading opportunities discovered by a pass and update the CFG
    and SSA form all at once.
@@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
       return false;
     }

-  /* First make sure there are no NULL outgoing edges on the jump threading
-     path.  That can happen for jumping to a constant address.  */
-  for (unsigned int i = 0; i < path->length (); i++)
-    {
-      if ((*path)[i]->e == NULL)
-        {
-          cancel_thread (path, "Found NULL edge in jump threading path");
-          return false;
-        }
-
-      if (flag_checking && !m_backedge_threads)
-        gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
-    }
+  if (cancel_invalid_paths (*path))
+    return false;

   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_jump_thread_path (dump_file, *path, true);
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 8b48a671212..d68795c9f27 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -75,6 +75,7 @@ protected:
   unsigned long m_num_threaded_edges;
 private:
   virtual bool update_cfg (bool peel_loop_headers) = 0;
+  bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
   jump_thread_path_allocator m_allocator;
   // True if threading through back edges is allowed.  This is only
   // allowed in the generic copier in the backward threader.
</cut>
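
For readers who want to experiment with the rejection rules in the patch above without building GCC, here is a minimal Python model of the decision order in cancel_invalid_paths. It is only a sketch: Block, Edge, Loop and cancel_invalid_path are hypothetical stand-ins (not GCC types), loop membership is reduced to a string id, the flag_checking assertion on EDGE_DFS_BACK is omitted, and NULL edges are checked up front so the model never dereferences one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    name: str
    loop: str  # hypothetical id of the innermost loop (models loop_father)


@dataclass(frozen=True)
class Edge:
    src: Block
    dest: Block


@dataclass(frozen=True)
class Loop:
    latch: Block
    latch_is_empty: bool  # models empty_block_p (loop->latch)


def cancel_invalid_path(
    path: List[Optional[Edge]],
    loops: Dict[str, Loop],
    loop_opts_done: bool,
) -> Tuple[bool, Optional[str]]:
    """Return (cancelled, reason), following the order of checks in the patch."""
    if not path:
        raise ValueError("path must not be empty")

    # A None entry models a NULL outgoing edge (jump to a constant address).
    if any(e is None for e in path):
        return True, "Found NULL edge in jump threading path"

    # The loop of interest is the one containing the source of the last edge.
    loop_id = path[-1].src.loop
    loop = loops[loop_id]
    seen_latch = False
    path_crosses_loops = False

    for i, e in enumerate(path):
        if loop.latch in (e.src, e.dest):
            seen_latch = True
        # The first entry's source block is the one being redirected,
        # so its loop membership is ignored.
        if (i > 0 and e.src.loop != loop_id) or e.dest.loop != loop_id:
            path_crosses_loops = True

    # After loop optimizations have run, both restrictions are lifted.
    if loop_opts_done:
        return False, None
    if seen_latch and loop.latch_is_empty:
        return True, ("Threading through latch before loop opts "
                      "would create non-empty latch")
    if path_crosses_loops:
        return True, "Path crosses loops"
    return False, None


# Example: a path that leaves loop L1 for loop L2 before loop opts are done.
a, b = Block("a", "L1"), Block("b", "L1")
latch = Block("latch", "L1")
c = Block("c", "L2")
loops = {"L1": Loop(latch, True)}
print(cancel_invalid_path([Edge(a, b), Edge(b, c)], loops, False))
# prints (True, 'Path crosses loops')
```

Passing loop_opts_done=True for the same path returns (False, None), which matches the commit message's point that most affected tests merely shift the threading to after loop optimizations.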

4 years, 8 months


[TCWG CI] Regression caused by gcc: tree-optimization/102570 - teach VN about internal functions

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: tree-optimization/102570 - teach VN about internal functions:
commit 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
Author: Richard Biener <rguenther(a)suse.de>

    tree-optimization/102570 - teach VN about internal functions

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
18360
# First few build errors in logs:
# 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized]
# 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized]
# 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized]
# 00:01:21 make[2]: *** [scripts/Makefile.build:288: arch/arm64/hyperv/hv_core.o] Error 1
# 00:01:22 crypto/asymmetric_keys/asymmetric_type.c:481:15: error: ‘restrict_method’ is used uninitialized [-Werror=uninitialized]
# 00:01:22 make[2]: *** [scripts/Makefile.build:288: crypto/asymmetric_keys/asymmetric_type.o] Error 1
# 00:01:22 ./include/trace/perf.h:38:25: error: ‘entry’ is used uninitialized [-Werror=uninitialized]
# 00:01:22 ./include/trace/perf.h:44:13: error: ‘__entry_size’ is used uninitialized [-Werror=uninitialized]
# 00:01:23 security/keys/encrypted-keys/encrypted.c:660:19: error: ‘mkey’ is used uninitialized [-Werror=uninitialized]
# 00:01:23 security/keys/encrypted-keys/encrypted.c:905:19: error: ‘epayload’ is used uninitialized [-Werror=uninitialized]

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21404

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-aarch64-next-allmodconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…

Reproduce builds:
<cut>
mkdir investigate-gcc-55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
cd investigate-gcc-55a3be2f5255d69e740d61b2c5aaba1ccbc643b8

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 22d34a2a50651d01669b6fbcdb9677c18d2197c5
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
Author: Richard Biener <rguenther(a)suse.de>
Date:   Mon Oct 4 10:57:45 2021 +0200

    tree-optimization/102570 - teach VN about internal functions

    We're now using internal functions for a lot of stuff but there's
    still missing VN support out of laziness.  The following instantiates
    support and adds testcases for FRE and PRE (hoisting).

    2021-10-04  Richard Biener  <rguenther(a)suse.de>

            PR tree-optimization/102570
            * tree-ssa-sccvn.h (vn_reference_op_struct): Document
            we are using clique for the internal function code.
            * tree-ssa-sccvn.c (vn_reference_op_eq): Compare the
            internal function code.
            (print_vn_reference_ops): Print the internal function code.
            (vn_reference_op_compute_hash): Hash it.
            (copy_reference_ops_from_call): Record it.
            (visit_stmt): Remove the restriction around internal function
            calls.
            (fully_constant_vn_reference_p): Use fold_const_call and handle
            internal functions.
            (vn_reference_eq): Compare call return types.
            * tree-ssa-pre.c (create_expression_by_pieces): Handle
            generating calls to internal functions.
            (compute_avail): Remove the restriction around internal function
            calls.

            * gcc.dg/tree-ssa/ssa-fre-96.c: New testcase.
            * gcc.dg/tree-ssa/ssa-pre-33.c: Likewise.
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c | 14 +++++ gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c | 15 +++++ gcc/tree-ssa-pre.c | 27 +++++---- gcc/tree-ssa-sccvn.c | 91 ++++++++++++++++++------------ gcc/tree-ssa-sccvn.h | 3 +- 5 files changed, 103 insertions(+), 47 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c new file mode 100644 index 00000000000..fd1d5713b5f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-fre1" } */ + +_Bool f1(unsigned x, unsigned y, unsigned *res) +{ + _Bool t = __builtin_add_overflow(x, y, res); + unsigned res1; + _Bool t1 = __builtin_add_overflow(x, y, &res1); + *res -= res1; + return t==t1; +} + +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "fre1" } } */ +/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c new file mode 100644 index 00000000000..3b3bd629bc2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + +_Bool f1(unsigned x, unsigned y, unsigned *res, int flag, _Bool *t) +{ + if (flag) + *t = __builtin_add_overflow(x, y, res); + unsigned res1; + _Bool t1 = __builtin_add_overflow(x, y, &res1); + *res -= res1; + return *t==t1; +} + +/* We should hoist the .ADD_OVERFLOW to before the check. 
*/ +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "pre" } } */ diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index 08755847f66..1cc1aae694f 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -2855,9 +2855,13 @@ create_expression_by_pieces (basic_block block, pre_expr expr, unsigned int operand = 1; vn_reference_op_t currop = &ref->operands[0]; tree sc = NULL_TREE; - tree fn = find_or_generate_expression (block, currop->op0, stmts); - if (!fn) - return NULL_TREE; + tree fn = NULL_TREE; + if (currop->op0) + { + fn = find_or_generate_expression (block, currop->op0, stmts); + if (!fn) + return NULL_TREE; + } if (currop->op1) { sc = find_or_generate_expression (block, currop->op1, stmts); @@ -2873,12 +2877,19 @@ create_expression_by_pieces (basic_block block, pre_expr expr, return NULL_TREE; args.quick_push (arg); } - gcall *call = gimple_build_call_vec (fn, args); + gcall *call; + if (currop->op0) + { + call = gimple_build_call_vec (fn, args); + gimple_call_set_fntype (call, currop->type); + } + else + call = gimple_build_call_internal_vec ((internal_fn)currop->clique, + args); gimple_set_location (call, expr->loc); - gimple_call_set_fntype (call, currop->type); if (sc) gimple_call_set_chain (call, sc); - tree forcedname = make_ssa_name (TREE_TYPE (currop->type)); + tree forcedname = make_ssa_name (ref->type); gimple_call_set_lhs (call, forcedname); /* There's no CCP pass after PRE which would re-compute alignment information so make sure we re-materialize this here. */ @@ -4004,10 +4015,6 @@ compute_avail (function *fun) vn_reference_s ref1; pre_expr result = NULL; - /* We can value number only calls to real functions. */ - if (gimple_call_internal_p (stmt)) - continue; - vn_reference_lookup_call (as_a <gcall *> (stmt), &ref, &ref1); /* There is no point to PRE a call without a value. 
*/ if (!ref || !ref->result) diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c index 416a5252144..0d942218279 100644 --- a/gcc/tree-ssa-sccvn.c +++ b/gcc/tree-ssa-sccvn.c @@ -70,6 +70,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "tree-ssa-loop-niter.h" #include "builtins.h" +#include "fold-const-call.h" #include "tree-ssa-sccvn.h" /* This algorithm is based on the SCC algorithm presented by Keith @@ -212,7 +213,8 @@ vn_reference_op_eq (const void *p1, const void *p2) TYPE_MAIN_VARIANT (vro2->type)))) && expressions_equal_p (vro1->op0, vro2->op0) && expressions_equal_p (vro1->op1, vro2->op1) - && expressions_equal_p (vro1->op2, vro2->op2)); + && expressions_equal_p (vro1->op2, vro2->op2) + && (vro1->opcode != CALL_EXPR || vro1->clique == vro2->clique)); } /* Free a reference operation structure VP. */ @@ -264,15 +266,18 @@ print_vn_reference_ops (FILE *outfile, const vec<vn_reference_op_s> ops) && TREE_CODE_CLASS (vro->opcode) != tcc_declaration) { fprintf (outfile, "%s", get_tree_code_name (vro->opcode)); - if (vro->op0) + if (vro->op0 || vro->opcode == CALL_EXPR) { fprintf (outfile, "<"); closebrace = true; } } - if (vro->op0) + if (vro->op0 || vro->opcode == CALL_EXPR) { - print_generic_expr (outfile, vro->op0); + if (!vro->op0) + fprintf (outfile, internal_fn_name ((internal_fn)vro->clique)); + else + print_generic_expr (outfile, vro->op0); if (vro->op1) { fprintf (outfile, ","); @@ -684,6 +689,8 @@ static void vn_reference_op_compute_hash (const vn_reference_op_t vro1, inchash::hash &hstate) { hstate.add_int (vro1->opcode); + if (vro1->opcode == CALL_EXPR && !vro1->op0) + hstate.add_int (vro1->clique); if (vro1->op0) inchash::add_expr (vro1->op0, hstate); if (vro1->op1) @@ -769,11 +776,16 @@ vn_reference_eq (const_vn_reference_t const vr1, const_vn_reference_t const vr2) if (vr1->type != vr2->type) return false; } + else if (vr1->type == vr2->type) + ; else if (COMPLETE_TYPE_P (vr1->type) != 
COMPLETE_TYPE_P (vr2->type) || (COMPLETE_TYPE_P (vr1->type) && !expressions_equal_p (TYPE_SIZE (vr1->type), TYPE_SIZE (vr2->type)))) return false; + else if (vr1->operands[0].opcode == CALL_EXPR + && !types_compatible_p (vr1->type, vr2->type)) + return false; else if (INTEGRAL_TYPE_P (vr1->type) && INTEGRAL_TYPE_P (vr2->type)) { @@ -1270,6 +1282,8 @@ copy_reference_ops_from_call (gcall *call, temp.type = gimple_call_fntype (call); temp.opcode = CALL_EXPR; temp.op0 = gimple_call_fn (call); + if (gimple_call_internal_p (call)) + temp.clique = gimple_call_internal_fn (call); temp.op1 = gimple_call_chain (call); if (stmt_could_throw_p (cfun, call) && (lr = lookup_stmt_eh_lp (call)) > 0) temp.op2 = size_int (lr); @@ -1459,9 +1473,11 @@ fully_constant_vn_reference_p (vn_reference_t ref) a call to a builtin function with at most two arguments. */ op = &operands[0]; if (op->opcode == CALL_EXPR - && TREE_CODE (op->op0) == ADDR_EXPR - && TREE_CODE (TREE_OPERAND (op->op0, 0)) == FUNCTION_DECL - && fndecl_built_in_p (TREE_OPERAND (op->op0, 0)) + && (!op->op0 + || (TREE_CODE (op->op0) == ADDR_EXPR + && TREE_CODE (TREE_OPERAND (op->op0, 0)) == FUNCTION_DECL + && fndecl_built_in_p (TREE_OPERAND (op->op0, 0), + BUILT_IN_NORMAL))) && operands.length () >= 2 && operands.length () <= 3) { @@ -1481,13 +1497,17 @@ fully_constant_vn_reference_p (vn_reference_t ref) anyconst = true; if (anyconst) { - tree folded = build_call_expr (TREE_OPERAND (op->op0, 0), - arg1 ? 2 : 1, - arg0->op0, - arg1 ? 
arg1->op0 : NULL); - if (folded - && TREE_CODE (folded) == NOP_EXPR) - folded = TREE_OPERAND (folded, 0); + combined_fn fn; + if (op->op0) + fn = as_combined_fn (DECL_FUNCTION_CODE + (TREE_OPERAND (op->op0, 0))); + else + fn = as_combined_fn ((internal_fn) op->clique); + tree folded; + if (arg1) + folded = fold_const_call (fn, ref->type, arg0->op0, arg1->op0); + else + folded = fold_const_call (fn, ref->type, arg0->op0); if (folded && is_gimple_min_invariant (folded)) return folded; @@ -5648,28 +5668,27 @@ visit_stmt (gimple *stmt, bool backedges_varying_p = false) && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL) extra_fnflags = flags_from_decl_or_type (TREE_OPERAND (fn, 0)); } - if (!gimple_call_internal_p (call_stmt) - && (/* Calls to the same function with the same vuse - and the same operands do not necessarily return the same - value, unless they're pure or const. */ - ((gimple_call_flags (call_stmt) | extra_fnflags) - & (ECF_PURE | ECF_CONST)) - /* If calls have a vdef, subsequent calls won't have - the same incoming vuse. So, if 2 calls with vdef have the - same vuse, we know they're not subsequent. - We can value number 2 calls to the same function with the - same vuse and the same operands which are not subsequent - the same, because there is no code in the program that can - compare the 2 values... */ - || (gimple_vdef (call_stmt) - /* ... unless the call returns a pointer which does - not alias with anything else. In which case the - information that the values are distinct are encoded - in the IL. */ - && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS) - /* Only perform the following when being called from PRE - which embeds tail merging. */ - && default_vn_walk_kind == VN_WALK))) + if (/* Calls to the same function with the same vuse + and the same operands do not necessarily return the same + value, unless they're pure or const. 
*/ + ((gimple_call_flags (call_stmt) | extra_fnflags) + & (ECF_PURE | ECF_CONST)) + /* If calls have a vdef, subsequent calls won't have + the same incoming vuse. So, if 2 calls with vdef have the + same vuse, we know they're not subsequent. + We can value number 2 calls to the same function with the + same vuse and the same operands which are not subsequent + the same, because there is no code in the program that can + compare the 2 values... */ + || (gimple_vdef (call_stmt) + /* ... unless the call returns a pointer which does + not alias with anything else. In which case the + information that the values are distinct are encoded + in the IL. */ + && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS) + /* Only perform the following when being called from PRE + which embeds tail merging. */ + && default_vn_walk_kind == VN_WALK)) changed = visit_reference_op_call (lhs, call_stmt); else changed = defs_to_varying (call_stmt); diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h index 96100596d2e..8a1b649c726 100644 --- a/gcc/tree-ssa-sccvn.h +++ b/gcc/tree-ssa-sccvn.h @@ -106,7 +106,8 @@ typedef const struct vn_phi_s *const_vn_phi_t; typedef struct vn_reference_op_struct { ENUM_BITFIELD(tree_code) opcode : 16; - /* Dependence info, used for [TARGET_]MEM_REF only. */ + /* Dependence info, used for [TARGET_]MEM_REF only. For internal + function calls clique is also used for the internal function code. */ unsigned short clique; unsigned short base; unsigned reverse : 1; </cut>
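The new FRE testcase in the commit above can also be tried as ordinary C outside the testsuite. The following is a minimal sketch, assuming a GCC or Clang compiler that provides `__builtin_add_overflow`; the names `add_overflow_twice` and `check_add_overflow_twice` are illustrative and do not appear in the commit. It has the same shape as `f1` in `ssa-fre-96.c`: the second call repeats the first with identical operands, so once value numbering treats the two internal `.ADD_OVERFLOW` calls as equal, the subtraction stores 0 and the comparison is always true.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same shape as f1 in gcc.dg/tree-ssa/ssa-fre-96.c: the second
   __builtin_add_overflow call repeats the first with identical operands.  */
bool
add_overflow_twice (unsigned x, unsigned y, unsigned *res)
{
  bool t = __builtin_add_overflow (x, y, res);
  unsigned res1;
  bool t1 = __builtin_add_overflow (x, y, &res1);
  *res -= res1;   /* Both wrapped sums are equal, so this stores 0.  */
  return t == t1; /* Both overflow flags are equal, so this is true.  */
}

/* Hypothetical self-check, not part of the commit: one case without
   overflow and one case that wraps around.  */
void
check_add_overflow_twice (void)
{
  unsigned r = 123u;
  assert (add_overflow_twice (1u, 2u, &r) && r == 0u);
  assert (add_overflow_twice (UINT_MAX, 1u, &r) && r == 0u);
}
```

According to the `dg-final` directives in the testcase, compiling `f1` with `-O -fdump-tree-fre1` on a compiler that includes this commit should leave a single `ADD_OVERFLOW` and a `return 1;` in the `fre1` dump.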

4 years, 8 months


[TCWG CI] 464.h264ref slowed down by 6% after llvm: [SCEV] Infer flags from add/gep in any block

by ci_notify＠linaro.org

After llvm commit 0658bab870c89d81678f1f37aac0396ddd0913b3
Author: Philip Reames <listmail(a)philipreames.com>

    [SCEV] Infer flags from add/gep in any block

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 6% from 11124 to 11783 perf samples
  - 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 41% from 1504 to 2116 perf samples

The reproducer instructions below can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org .  Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-0658bab870c89d81678f1f37aac0396ddd0913b3
cd investigate-llvm-0658bab870c89d81678f1f37aac0396ddd0913b3

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 0658bab870c89d81678f1f37aac0396ddd0913b3
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 2ced9a42be8aba4533225fdb8ed02fe6f50060b6
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 0658bab870c89d81678f1f37aac0396ddd0913b3 Author: Philip Reames <listmail(a)philipreames.com> Date: Wed Oct 6 10:35:01 2021 -0700 [SCEV] Infer flags from add/gep in any block This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*. Differential Revision: https://reviews.llvm.org/D111186 --- llvm/lib/Analysis/ScalarEvolution.cpp | 10 --------- .../Analysis/DependenceAnalysis/Preliminary.ll | 2 +- .../Analysis/ScalarEvolution/flags-from-poison.ll | 8 +++---- .../SLPVectorizer/X86/consecutive-access.ll | 25 +++++++++------------- 4 files changed, 15 insertions(+), 30 deletions(-) diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 6683b1a5205c..4fb2266b6e89 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -6645,16 +6645,6 @@ bool ScalarEvolution::isGuaranteedToTransferExecutionTo(const Instruction *A, bool ScalarEvolution::isSCEVExprNeverPoison(const Instruction *I) { - // Here we check that I is in the header of the innermost loop containing I, - // since we only deal with instructions in the loop header. The actual loop we - // need to check later will come from an add recurrence, but getting that - // requires computing the SCEV of the operands, which can be expensive. This - // check we can do cheaply to rule out some cases early. - Loop *InnermostContainingLoop = LI.getLoopFor(I->getParent()); - if (InnermostContainingLoop == nullptr || - InnermostContainingLoop->getHeader() != I->getParent()) - return false; - // Only proceed if we can prove that I does not yield poison. 
if (!programUndefinedIfPoison(I)) return false; diff --git a/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll b/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll index 0899f67d6914..91827f3231ba 100644 --- a/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll +++ b/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll @@ -623,7 +623,7 @@ entry: ; CHECK-LABEL: p9 ; CHECK: da analyze - none! -; CHECK: da analyze - flow [|<]! +; CHECK: da analyze - none! ; CHECK: da analyze - confused! ; CHECK: da analyze - none! ; CHECK: da analyze - confused! diff --git a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll index f0bda26edb38..0423854bbc3b 100644 --- a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll +++ b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll @@ -1628,9 +1628,9 @@ define noundef i32 @add-basic(i32 %a, i32 %b) { ; CHECK-LABEL: 'add-basic' ; CHECK-NEXT: Classifying expressions for: @add-basic ; CHECK-NEXT: %res = add nuw nsw i32 %a, %b -; CHECK-NEXT: --> (%a + %b) U: full-set S: full-set +; CHECK-NEXT: --> (%a + %b)<nuw><nsw> U: full-set S: full-set ; CHECK-NEXT: %res2 = udiv i32 255, %res -; CHECK-NEXT: --> (255 /u (%a + %b)) U: [0,256) S: [0,256) +; CHECK-NEXT: --> (255 /u (%a + %b)<nuw><nsw>) U: [0,256) S: [0,256) ; CHECK-NEXT: Determining loop execution counts for: @add-basic ; %res = add nuw nsw i32 %a, %b @@ -1656,9 +1656,9 @@ define noundef i32 @mul-basic(i32 %a, i32 %b) { ; CHECK-LABEL: 'mul-basic' ; CHECK-NEXT: Classifying expressions for: @mul-basic ; CHECK-NEXT: %res = mul nuw nsw i32 %a, %b -; CHECK-NEXT: --> (%a * %b) U: full-set S: full-set +; CHECK-NEXT: --> (%a * %b)<nuw><nsw> U: full-set S: full-set ; CHECK-NEXT: %res2 = udiv i32 255, %res -; CHECK-NEXT: --> (255 /u (%a * %b)) U: [0,256) S: [0,256) +; CHECK-NEXT: --> (255 /u (%a * %b)<nuw><nsw>) U: [0,256) S: [0,256) ; CHECK-NEXT: Determining loop execution counts for: @mul-basic ; %res = mul nuw nsw i32 %a, 
%b diff --git a/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll b/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll index e4000b52c4a9..8f57fe6866bd 100644 --- a/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll +++ b/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll @@ -8,10 +8,6 @@ target triple = "x86_64-apple-macosx10.9.0" @C = common global [2000 x float] zeroinitializer, align 16 @D = common global [2000 x float] zeroinitializer, align 16 -; Currently SCEV isn't smart enough to figure out that accesses -; A[3*i], A[3*i+1] and A[3*i+2] are consecutive, but in future -; that would hopefully be fixed. For now, check that this isn't -; vectorized. ; Function Attrs: nounwind ssp uwtable define void @foo_3double(i32 %u) #0 { ; CHECK-LABEL: @foo_3double( @@ -21,26 +17,25 @@ define void @foo_3double(i32 %u) #0 { ; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[U]], 3 ; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[MUL]] to i64 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]] -; CHECK-NEXT: [[TMP0:%.*]] = load double, double* [[ARRAYIDX]], align 8 ; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]] -; CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[ARRAYIDX4]], align 8 -; CHECK-NEXT: [[ADD5:%.*]] = fadd double [[TMP0]], [[TMP1]] -; CHECK-NEXT: store double [[ADD5]], double* [[ARRAYIDX]], align 8 ; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[MUL]], 1 ; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD11]] to i64 ; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]] -; CHECK-NEXT: [[TMP2:%.*]] = load double, double* [[ARRAYIDX13]], align 8 +; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>* +; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8 ; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr 
inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]] -; CHECK-NEXT: [[TMP3:%.*]] = load double, double* [[ARRAYIDX17]], align 8 -; CHECK-NEXT: [[ADD18:%.*]] = fadd double [[TMP2]], [[TMP3]] -; CHECK-NEXT: store double [[ADD18]], double* [[ARRAYIDX13]], align 8 +; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>* +; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8 +; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]] +; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>* +; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8 ; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[MUL]], 2 ; CHECK-NEXT: [[IDXPROM25:%.*]] = sext i32 [[ADD24]] to i64 ; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM25]] -; CHECK-NEXT: [[TMP4:%.*]] = load double, double* [[ARRAYIDX26]], align 8 +; CHECK-NEXT: [[TMP6:%.*]] = load double, double* [[ARRAYIDX26]], align 8 ; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM25]] -; CHECK-NEXT: [[TMP5:%.*]] = load double, double* [[ARRAYIDX30]], align 8 -; CHECK-NEXT: [[ADD31:%.*]] = fadd double [[TMP4]], [[TMP5]] +; CHECK-NEXT: [[TMP7:%.*]] = load double, double* [[ARRAYIDX30]], align 8 +; CHECK-NEXT: [[ADD31:%.*]] = fadd double [[TMP6]], [[TMP7]] ; CHECK-NEXT: store double [[ADD31]], double* [[ARRAYIDX26]], align 8 ; CHECK-NEXT: ret void ; </cut>
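The change to `consecutive-access.ll` above is easier to follow with the loop body written as C. The sketch below is a reconstruction from the CHECK lines of `@foo_3double`, not the source the test was generated from (which the diff does not show); the array size comes from the `[2000 x double]` globals, and `check_foo_3double` is a hypothetical helper added only for illustration. The removed comment said SCEV could not yet tell that `A[3*i]`, `A[3*i+1]` and `A[3*i+2]` are consecutive; the updated CHECK lines show the first two elements of each array now being loaded and stored as one `<2 x double>`, while the third stays scalar.

```c
#include <assert.h>

enum { N = 2000 };
double A[N];
double B[N];

/* Reconstructed from the CHECK lines: three adjacent elements starting
   at index 3*u are updated in place.  With nsw known on 3*u and on
   3*u + 1, the first two accesses can be recognised as consecutive.  */
void
foo_3double (int u)
{
  int base = u * 3;
  A[base] += B[base];
  A[base + 1] += B[base + 1];
  A[base + 2] += B[base + 2];
}

/* Hypothetical self-check, not part of the LLVM test.  */
void
check_foo_3double (void)
{
  for (int i = 0; i < 6; i++)
    {
      A[i] = i;
      B[i] = 10.0 * i;
    }
  foo_3double (1);
  /* Elements 3..5 are updated, element 2 is untouched.  */
  assert (A[3] == 33.0 && A[4] == 44.0 && A[5] == 55.0);
  assert (A[2] == 2.0);
}
```

Whether a given compiler actually vectorizes this function depends on its version, target and flags; the sketch only fixes the access pattern that the test exercises.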

4 years, 8 months


[TCWG CI] 464.h264ref slowed down by 6% after llvm: [SCEV] Use full logic when infering flags on add and gep

by ci_notify＠linaro.org

After llvm commit d02db32644b7360bcda54cdf739fa42abe450fcd
Author: Philip Reames <listmail(a)philipreames.com>

    [SCEV] Use full logic when infering flags on add and gep

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 6% from 10842 to 11545 perf samples

The reproducer instructions below can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3 -flto
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org .  Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-d02db32644b7360bcda54cdf739fa42abe450fcd
cd investigate-llvm-d02db32644b7360bcda54cdf739fa42abe450fcd

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach d02db32644b7360bcda54cdf739fa42abe450fcd
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach f39978b84f1d3a1da6c32db48f64c8daae64b3ad
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit d02db32644b7360bcda54cdf739fa42abe450fcd Author: Philip Reames <listmail(a)philipreames.com> Date: Sun Oct 3 15:32:15 2021 -0700 [SCEV] Use full logic when infering flags on add and gep This is a followon to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path. We can still do much better here (on both paths), but this is our first step. Differential Revision: https://reviews.llvm.org/D111003 --- llvm/lib/Analysis/ScalarEvolution.cpp | 10 ++-------- .../Delinearization/multidim_ivs_and_integer_offsets_3d.ll | 2 +- .../Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll | 2 +- llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll | 4 ++-- llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll | 8 ++++---- llvm/test/Analysis/ScalarEvolution/load.ll | 2 +- llvm/test/Analysis/ScalarEvolution/ptrtoint.ll | 2 +- polly/test/IstAstInfo/simple-run-time-condition.ll | 2 +- 8 files changed, 13 insertions(+), 19 deletions(-) diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 75cecbf48c08..70bf9aee6e0a 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -6657,14 +6657,8 @@ bool ScalarEvolution::isSCEVExprNeverPoison(const Instruction *I) { // TODO: We can do better here in some cases. if (!isSCEVable(Op->getType())) return false; - // TODO: the following two lines should be: - // if (auto *DefI = getDefinedScopeRoot(getSCEV(Op))) - // if (isGuaranteedToTransferExecutionTo(DefI, I)) - // We use the following instead for the purposes of seperating a bugfix - // change from an optimization change. Once pr51817 is fully addressed, - // we should unlock this power. 
- if (auto *AddRecS = dyn_cast<SCEVAddRecExpr>(getSCEV(Op))) - if (isGuaranteedToExecuteForEveryIteration(I, AddRecS->getLoop())) + if (auto *DefI = getDefinedScopeRoot(getSCEV(Op))) + if (isGuaranteedToTransferExecutionTo(DefI, I)) return true; } return false; diff --git a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll index 712a52927dcb..77982c786e6e 100644 --- a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll +++ b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll @@ -11,7 +11,7 @@ ; AddRec: {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k> ; CHECK: Base offset: %A ; CHECK: ArrayDecl[UnknownSize][%m][%o] with elements of 8 bytes. -; CHECK: ArrayRef[{3,+,1}<nuw><%for.i>][{-4,+,1}<nw><%for.j>][{7,+,1}<nuw><nsw><%for.k>] +; CHECK: ArrayRef[{3,+,1}<nuw><%for.i>][{-4,+,1}<nsw><%for.j>][{7,+,1}<nuw><nsw><%for.k>] define void @foo(i64 %n, i64 %m, i64 %o, double* %A) { entry: diff --git a/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll b/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll index e3fdb0642211..8ecd498ea211 100644 --- a/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll +++ b/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll @@ -11,7 +11,7 @@ ; AddRec: {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * %o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k> ; CHECK: Base offset: %A ; CHECK: ArrayDecl[UnknownSize][%m][%o] with elements of 8 bytes. 
-; CHECK: ArrayRef[{%p,+,1}<nw><%for.i>][{%q,+,1}<nw><%for.j>][{%r,+,1}<nsw><%for.k>] +; CHECK: ArrayRef[{%p,+,1}<nw><%for.i>][{%q,+,1}<nsw><%for.j>][{%r,+,1}<nsw><%for.k>] define void @foo(i64 %n, i64 %m, i64 %o, double* %A, i64 %p, i64 %q, i64 %r) { entry: diff --git a/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll b/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll index 821513199546..1f1515435e1a 100644 --- a/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll +++ b/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll @@ -11,8 +11,8 @@ target triple = "powerpc64le-unknown-linux-gnu" ; } ; } -; CHECK-DAG: Loop 'for.i' has cost = 20300 -; CHECK-DAG: Loop 'for.j' has cost = 700 +; CHECK-DAG: Loop 'for.i' has cost = 20600 +; CHECK-DAG: Loop 'for.j' has cost = 800 define void @foo(i64 %n, i64 %m, i32* %A, i32* %B, i32* %C) { entry: diff --git a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll index c8d3137f8dc9..5ab24159c250 100644 --- a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll +++ b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll @@ -273,9 +273,9 @@ define void @test-add-scope-bound-unkn-header(i32* %input, i32 %needle) { ; CHECK-NEXT: %offset = load i32, i32* %gep, align 4 ; CHECK-NEXT: --> %offset U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %i.next = add nuw i32 %i, %offset -; CHECK-NEXT: --> (%offset + %i) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> (%offset + %i)<nuw> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %gep2 = getelementptr i32, i32* %input, i32 %i.next -; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i) to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i)<nuw> to i64))<nsw> + %input) U: 
full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: Determining loop execution counts for: @test-add-scope-bound-unkn-header ; CHECK-NEXT: Loop %loop: Unpredictable backedge-taken count. ; CHECK-NEXT: Loop %loop: Unpredictable max backedge-taken count. @@ -307,9 +307,9 @@ define void @test-add-scope-bound-unkn-header2(i32* %input, i32 %needle) { ; CHECK-NEXT: %offset = load i32, i32* %gep, align 4 ; CHECK-NEXT: --> %offset U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %i.next = add nuw i32 %i, %offset -; CHECK-NEXT: --> (%offset + %i) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> (%offset + %i)<nuw> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %gep2 = getelementptr i32, i32* %input, i32 %i.next -; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i) to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i)<nuw> to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: Determining loop execution counts for: @test-add-scope-bound-unkn-header2 ; CHECK-NEXT: Loop %loop: Unpredictable backedge-taken count. ; CHECK-NEXT: Loop %loop: Unpredictable max backedge-taken count. 
diff --git a/llvm/test/Analysis/ScalarEvolution/load.ll b/llvm/test/Analysis/ScalarEvolution/load.ll index c0d671342af7..e95a093b2a8b 100644 --- a/llvm/test/Analysis/ScalarEvolution/load.ll +++ b/llvm/test/Analysis/ScalarEvolution/load.ll @@ -73,7 +73,7 @@ define i32 @test2() nounwind uwtable readonly { ; CHECK-NEXT: %n.01 = phi %struct.ListNode* [ bitcast ({ %struct.ListNode*, i32, [4 x i8] }* @node5 to %struct.ListNode*), %entry ], [ %1, %for.body ] ; CHECK-NEXT: --> %n.01 U: full-set S: full-set Exits: @node1 LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %i = getelementptr inbounds %struct.ListNode, %struct.ListNode* %n.01, i64 0, i32 1 -; CHECK-NEXT: --> (4 + %n.01) U: full-set S: full-set Exits: (4 + @node1)<nuw><nsw> LoopDispositions: { %for.body: Variant } +; CHECK-NEXT: --> (4 + %n.01)<nuw> U: [4,0) S: [4,0) Exits: (4 + @node1)<nuw><nsw> LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %0 = load i32, i32* %i, align 4 ; CHECK-NEXT: --> %0 U: full-set S: full-set Exits: 0 LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %add = add nsw i32 %0, %sum.02 diff --git a/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll b/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll index 93d8782f373e..cb40ddda9369 100644 --- a/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll +++ b/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll @@ -502,7 +502,7 @@ define void @pr46786_c26_int(i32* %arg, i32* %arg1, i32* %arg2) { ; X32-NEXT: %i11 = ashr exact i64 %i10, 2 ; X32-NEXT: --> %i11 U: [-2147483648,2147483648) S: [-2147483648,2147483648) Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i12 = getelementptr inbounds i32, i32* %arg2, i64 %i11 -; X32-NEXT: --> ((4 * (trunc i64 %i11 to i32)) + %arg2) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } +; X32-NEXT: --> ((4 * (trunc i64 %i11 to i32))<nsw> + %arg2) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i13 = load i32, i32* %i12, align 
4 ; X32-NEXT: --> %i13 U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i14 = add nsw i32 %i13, %i8 diff --git a/polly/test/IstAstInfo/simple-run-time-condition.ll b/polly/test/IstAstInfo/simple-run-time-condition.ll index aba5d9e34f50..0d167566291b 100644 --- a/polly/test/IstAstInfo/simple-run-time-condition.ll +++ b/polly/test/IstAstInfo/simple-run-time-condition.ll @@ -20,7 +20,7 @@ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3 ; for the delinearization is simplified such that conditions that would not ; cause any code to be executed are not generated. -; CHECK: if (((o >= 1 && q <= 0 && m + q >= 0) || (o <= 0 && m + q >= 100 && q <= 100)) && 0 == ((m >= 1 && n + p >= 9223372036854775809) || (o <= 0 && n >= 1 && m + q >= 9223372036854775909) || (o <= 0 && m >= 1 && n >= 1 && q <= -9223372036854775709))) +; CHECK: if (((o >= 1 && q <= 0 && m + q >= 0) || (o <= 0 && m + q >= 100 && q <= 100)) && 0 == ((o <= 0 && n >= 1 && m + q >= 9223372036854775909) || (o <= 0 && m >= 1 && n >= 1 && q <= -9223372036854775709))) ; CHECK: if (o <= 0) { ; CHECK: for (int c0 = 0; c0 < n; c0 += 1) </cut>

4 years, 8 months


[TCWG CI] Regression caused by gcc: Enhance -Waddress to detect more suspicious expressions [PR102103].

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Enhance -Waddress to detect more suspicious expressions [PR102103].:
commit 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868
Author: Martin Sebor <msebor(a)redhat.com>

    Enhance -Waddress to detect more suspicious expressions [PR102103].

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21397
# First few build errors in logs:
# 00:02:12 sound/core/oss/mixer_oss.c:1035:21: error: ‘slot’ is used uninitialized [-Werror=uninitialized]
# 00:02:15 make[3]: *** [scripts/Makefile.build:288: sound/core/oss/mixer_oss.o] Error 1
# 00:02:15 sound/core/oss/pcm_oss.c:108:29: error: ‘t’ is used uninitialized [-Werror=uninitialized]
# 00:02:15 sound/core/oss/pcm_oss.c:2475:34: error: ‘setup’ is used uninitialized [-Werror=uninitialized]
# 00:02:15 sound/core/oss/pcm_oss.c:2985:51: error: ‘template’ is used uninitialized [-Werror=uninitialized]
# 00:02:15 sound/core/seq/oss/seq_oss_init.c:350:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized]
# 00:02:15 sound/core/seq/oss/seq_oss_init.c:370:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized]
# 00:02:16 make[4]: *** [scripts/Makefile.build:288: sound/core/seq/oss/seq_oss_init.o] Error 1
# 00:02:22 make[3]: *** [scripts/Makefile.build:288: sound/core/oss/pcm_oss.o] Error 1
# 00:02:28 make[3]: *** [scripts/Makefile.build:571: sound/core/seq/oss] Error 2

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21403

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
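For orientation, here is a minimal sketch of the kind of code the commit named above targets. It assumes the behaviour described in the commit's own documentation and tests further down: comparing the address of an array to null is diagnosed, and a cast to intptr_t suppresses the diagnostic. The names slots, first_slot_is_empty, table_address_is_zero and run_checks are hypothetical and come neither from the kernel sources nor from GCC. The sketch does not reproduce the -Werror=uninitialized errors listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table, loosely modelled on the arrays used in the
   commit's new Waddress-3.c test.  */
static const void *slots[2];

/* Tests the first element, which can legitimately be null.
   Writing `slots == NULL` or `&slots[0] == NULL` instead would compare
   a never-null address, which is what the enhanced -Waddress reports.  */
static int first_slot_is_empty (void)
{
  return slots[0] == NULL;
}

/* Casting the address to an integer type is the suppression the commit
   documents for an intentional address test.  */
static int table_address_is_zero (void)
{
  return (intptr_t) slots == 0;
}

/* Small self-check; leaves the table in its initial (empty) state.  */
void run_checks (void)
{
  slots[0] = NULL;
  assert (first_slot_is_empty ());
  /* Assumption: a valid object address converts to a nonzero integer,
     as it does on common platforms.  */
  assert (!table_address_is_zero ());
  slots[0] = slots;
  assert (!first_slot_is_empty ());
  slots[0] = NULL;
}
```

Calling run_checks () from a test driver exercises both helpers. Only the forms shown in the comments are expected to draw the new warning with -Wall on a compiler that contains this commit.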
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-aarch64-next-allmodconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…

Reproduce builds:
<cut>
mkdir investigate-gcc-4dc7ce6fb3917958d1a6036d8acf2953b9c1b868
cd investigate-gcc-4dc7ce6fb3917958d1a6036d8acf2953b9c1b868

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach f1710910087fb1f4a7706e9ce838163ffcbc50b4
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868
Author: Martin Sebor <msebor(a)redhat.com>
Date: Fri Oct 1 11:50:25 2021 -0600

    Enhance -Waddress to detect more suspicious expressions [PR102103].

    Resolves:
    PR c/102103 - missing warning comparing array address to null

    gcc/ChangeLog:

        PR c/102103
        * doc/invoke.texi (-Waddress): Update.
        * gengtype.c (write_types): Avoid -Waddress.
        * poly-int.h (POLY_SET_COEFF): Avoid using null.

    gcc/c-family/ChangeLog:

        PR c/102103
        * c-common.c (decl_with_nonnull_addr_p): Handle members.
        Check and perform warning suppression.
        (c_common_truthvalue_conversion): Enhance warning suppression.

    gcc/c/ChangeLog:

        PR c/102103
        * c-typeck.c (maybe_warn_for_null_address): New function.
        (build_binary_op): Call it.

    gcc/cp/ChangeLog:

        PR c/102103
        * typeck.c (warn_for_null_address): Enhance.
        (cp_build_binary_op): Call it also for member pointers.

    gcc/fortran/ChangeLog:

        PR c/102103
        * array.c: Remove an unnecessary test.
        * trans-array.c: Same.

    gcc/testsuite/ChangeLog:

        PR c/102103
        * g++.dg/cpp0x/constexpr-array-ptr10.C: Suppress a valid warning.
        * g++.dg/warn/Wreturn-local-addr-6.C: Correct a cast.
        * gcc.dg/Waddress.c: Expect a warning.
        * c-c++-common/Waddress-3.c: New test.
        * c-c++-common/Waddress-4.c: New test.
        * g++.dg/warn/Waddress-5.C: New test.
        * g++.dg/warn/Waddress-6.C: New test.
        * g++.dg/warn/pr101219.C: Expect a warning.
        * gcc.dg/Waddress-3.c: New test.
--- gcc/c-family/c-common.c | 29 +++-- gcc/c/c-typeck.c | 140 ++++++++++++++++----- gcc/cp/typeck.c | 94 ++++++++++++-- gcc/doc/invoke.texi | 48 +++++-- gcc/fortran/array.c | 2 +- gcc/fortran/trans-array.c | 1 - gcc/gengtype.c | 4 +- gcc/poly-int.h | 4 +- gcc/testsuite/c-c++-common/Waddress-3.c | 125 ++++++++++++++++++ gcc/testsuite/c-c++-common/Waddress-4.c | 106 ++++++++++++++++ gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C | 5 +- gcc/testsuite/g++.dg/warn/Waddress-5.C | 115 +++++++++++++++++ gcc/testsuite/g++.dg/warn/Waddress-6.C | 79 ++++++++++++ gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C | 4 +- gcc/testsuite/g++.dg/warn/pr101219.C | 4 +- gcc/testsuite/gcc.dg/Waddress-3.c | 35 ++++++ gcc/testsuite/gcc.dg/Waddress.c | 2 +- 17 files changed, 722 insertions(+), 75 deletions(-) diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 5845c675e85..9d19e352725 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -3393,14 +3393,16 @@ c_wrap_maybe_const (tree expr, bool non_const) return expr; } -/* Return whether EXPR is a declaration whose address can never be - NULL. */ +/* Return whether EXPR is a declaration whose address can never be NULL. + The address of the first struct member could be NULL only if it were + accessed through a NULL pointer, and such an access would be invalid. */ bool decl_with_nonnull_addr_p (const_tree expr) { return (DECL_P (expr) - && (TREE_CODE (expr) == PARM_DECL + && (TREE_CODE (expr) == FIELD_DECL + || TREE_CODE (expr) == PARM_DECL || TREE_CODE (expr) == LABEL_DECL || !DECL_WEAK (expr))); } @@ -3488,13 +3490,17 @@ c_common_truthvalue_conversion (location_t location, tree expr) case ADDR_EXPR: { tree inner = TREE_OPERAND (expr, 0); - if (decl_with_nonnull_addr_p (inner)) + if (decl_with_nonnull_addr_p (inner) + /* Check both EXPR and INNER for suppression. */ + && !warning_suppressed_p (expr, OPT_Waddress) + && !warning_suppressed_p (inner, OPT_Waddress)) { - /* Common Ada programmer's mistake. 
*/ + /* Common Ada programmer's mistake. */ warning_at (location, OPT_Waddress, "the address of %qD will always evaluate as %<true%>", inner); + suppress_warning (inner, OPT_Waddress); return truthvalue_true_node; } break; @@ -3627,8 +3633,17 @@ c_common_truthvalue_conversion (location_t location, tree expr) break; /* If this isn't narrowing the argument, we can ignore it. */ if (TYPE_PRECISION (totype) >= TYPE_PRECISION (fromtype)) - return c_common_truthvalue_conversion (location, - TREE_OPERAND (expr, 0)); + { + tree op0 = TREE_OPERAND (expr, 0); + if ((TREE_CODE (fromtype) == POINTER_TYPE + && TREE_CODE (totype) == INTEGER_TYPE) + || warning_suppressed_p (expr, OPT_Waddress)) + /* Suppress -Waddress for casts to intptr_t, propagating + any suppression from the enclosing expression to its + operand. */ + suppress_warning (op0, OPT_Waddress); + return c_common_truthvalue_conversion (location, op0); + } } break; diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index c74f876e667..33963d7555a 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -11554,6 +11554,110 @@ build_vec_cmp (tree_code code, tree type, return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec); } +/* Possibly warn about an address of OP never being NULL in a comparison + operation CODE involving null. */ + +static void +maybe_warn_for_null_address (location_t loc, tree op, tree_code code) +{ + if (!warn_address || warning_suppressed_p (op, OPT_Waddress)) + return; + + if (TREE_CODE (op) == NOP_EXPR) + { + /* Allow casts to intptr_t to suppress the warning. */ + tree type = TREE_TYPE (op); + if (TREE_CODE (type) == INTEGER_TYPE) + return; + op = TREE_OPERAND (op, 0); + } + + if (TREE_CODE (op) == POINTER_PLUS_EXPR) + { + /* Allow a cast to void* to suppress the warning. */ + tree type = TREE_TYPE (TREE_TYPE (op)); + if (VOID_TYPE_P (type)) + return; + + /* Adding any value to a null pointer, including zero, is undefined + in C. 
This includes the expression &p[0] where p is the null + pointer, although &p[0] will have been folded to p by this point + and so not diagnosed. */ + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the pointer operand in %qE must not be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the pointer operand in %qE must not be NULL", + op); + + return; + } + + if (TREE_CODE (op) != ADDR_EXPR) + return; + + op = TREE_OPERAND (op, 0); + + if (TREE_CODE (op) == IMAGPART_EXPR + || TREE_CODE (op) == REALPART_EXPR) + { + /* The address of either complex part may not be null. */ + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the address of %qE will never be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the address of %qE will never be NULL", + op); + return; + } + + /* Set to true in the loop below if OP dereferences is operand. + In such a case the ultimate target need not be a decl for + the null [in]equality test to be constant. */ + bool deref = false; + + /* Get the outermost array or object, or member. */ + while (handled_component_p (op)) + { + if (TREE_CODE (op) == COMPONENT_REF) + { + /* Get the member (its address is never null). */ + op = TREE_OPERAND (op, 1); + break; + } + + /* Get the outer array/object to refer to in the warning. 
*/ + op = TREE_OPERAND (op, 0); + deref = true; + } + + if ((!deref && !decl_with_nonnull_addr_p (op)) + || from_macro_expansion_at (loc)) + return; + + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the address of %qE will never be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the address of %qE will never be NULL", + op); + + if (DECL_P (op)) + inform (DECL_SOURCE_LOCATION (op), "%qD declared here", op); +} + /* Build a binary-operation expression without default conversions. CODE is the kind of expression to build. LOCATION is the operator's location. @@ -12189,44 +12293,12 @@ build_binary_op (location_t location, enum tree_code code, short_compare = 1; else if (code0 == POINTER_TYPE && null_pointer_constant_p (orig_op1)) { - if (TREE_CODE (op0) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (op0, 0)) - && !from_macro_expansion_at (location)) - { - if (code == EQ_EXPR) - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<false%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op0, 0)); - else - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<true%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op0, 0)); - } + maybe_warn_for_null_address (location, op0, code); result_type = type0; } else if (code1 == POINTER_TYPE && null_pointer_constant_p (orig_op0)) { - if (TREE_CODE (op1) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (op1, 0)) - && !from_macro_expansion_at (location)) - { - if (code == EQ_EXPR) - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<false%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op1, 0)); - else - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<true%> " - "for the address of %qD will never be NULL", - 
TREE_OPERAND (op1, 0)); - } + maybe_warn_for_null_address (location, op1, code); result_type = type1; } else if (code0 == POINTER_TYPE && code1 == POINTER_TYPE) diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c index cd130f16a66..e880d34dcfe 100644 --- a/gcc/cp/typeck.c +++ b/gcc/cp/typeck.c @@ -4603,25 +4603,93 @@ warn_for_null_address (location_t location, tree op, tsubst_flags_t complain) || warning_suppressed_p (op, OPT_Waddress)) return; + if (TREE_CODE (op) == NON_DEPENDENT_EXPR) + op = TREE_OPERAND (op, 0); + tree cop = fold_for_warn (op); - if (TREE_CODE (cop) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (cop, 0)) - && !warning_suppressed_p (cop, OPT_Waddress)) - warning_at (location, OPT_Waddress, "the address of %qD will never " - "be NULL", TREE_OPERAND (cop, 0)); + if (TREE_CODE (cop) == NON_LVALUE_EXPR) + /* Unwrap the expression for C++ 98. */ + cop = TREE_OPERAND (cop, 0); - if (CONVERT_EXPR_P (op) + if (TREE_CODE (cop) == PTRMEM_CST) + { + /* The address of a nonstatic data member is never null. */ + warning_at (location, OPT_Waddress, + "the address %qE will never be NULL", + cop); + return; + } + + if (TREE_CODE (cop) == NOP_EXPR) + { + /* Allow casts to intptr_t to suppress the warning. */ + tree type = TREE_TYPE (cop); + if (TREE_CODE (type) == INTEGER_TYPE) + return; + + STRIP_NOPS (cop); + } + + bool warned = false; + if (TREE_CODE (cop) == ADDR_EXPR) + { + cop = TREE_OPERAND (cop, 0); + + /* Set to true in the loop below if OP dereferences its operand. + In such a case the ultimate target need not be a decl for + the null [in]equality test to be necessarily constant. */ + bool deref = false; + + /* Get the outermost array or object, or member. */ + while (handled_component_p (cop)) + { + if (TREE_CODE (cop) == COMPONENT_REF) + { + /* Get the member (its address is never null). */ + cop = TREE_OPERAND (cop, 1); + break; + } + + /* Get the outer array/object to refer to in the warning. 
*/ + cop = TREE_OPERAND (cop, 0); + deref = true; + } + + if ((!deref && !decl_with_nonnull_addr_p (cop)) + || from_macro_expansion_at (location) + || warning_suppressed_p (cop, OPT_Waddress)) + return; + + warned = warning_at (location, OPT_Waddress, + "the address of %qD will never be NULL", cop); + op = cop; + } + else if (TREE_CODE (cop) == POINTER_PLUS_EXPR) + { + /* Adding zero to the null pointer is well-defined in C++. When + the offset is unknown (i.e., not a constant) warn anyway since + it's less likely that the pointer operand is null than not. */ + tree off = TREE_OPERAND (cop, 1); + if (!integer_zerop (off) + && !warning_suppressed_p (cop, OPT_Waddress)) + warning_at (location, OPT_Waddress, "comparing the result of pointer " + "addition %qE and NULL", cop); + return; + } + else if (CONVERT_EXPR_P (op) && TYPE_REF_P (TREE_TYPE (TREE_OPERAND (op, 0)))) { - tree inner_op = op; - STRIP_NOPS (inner_op); + STRIP_NOPS (op); - if (DECL_P (inner_op)) - warning_at (location, OPT_Waddress, - "the compiler can assume that the address of " - "%qD will never be NULL", inner_op); + if (DECL_P (op)) + warned = warning_at (location, OPT_Waddress, + "the compiler can assume that the address of " + "%qD will never be NULL", op); } + + if (warned && DECL_P (op)) + inform (DECL_SOURCE_LOCATION (op), "%qD declared here", op); } /* Warn about [expr.arith.conv]/2: If one operand is of enumeration type and @@ -5411,6 +5479,8 @@ cp_build_binary_op (const op_location_t &location, op1 = cp_convert (TREE_TYPE (op0), op1, complain); } result_type = TREE_TYPE (op0); + + warn_for_null_address (location, orig_op0, complain); } else if (TYPE_PTRMEMFUNC_P (type1) && null_ptr_cst_p (orig_op0)) return cp_build_binary_op (location, code, op1, op0, complain); diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 5f39b208049..d35114c0727 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8551,17 +8551,43 @@ by @option{-Wall}. 
@item -Waddress @opindex Waddress @opindex Wno-address -Warn about suspicious uses of memory addresses. These include using -the address of a function in a conditional expression, such as -@code{void func(void); if (func)}, and comparisons against the memory -address of a string literal, such as @code{if (x == "abc")}. Such -uses typically indicate a programmer error: the address of a function -always evaluates to true, so their use in a conditional usually -indicate that the programmer forgot the parentheses in a function -call; and comparisons against string literals result in unspecified -behavior and are not portable in C, so they usually indicate that the -programmer intended to use @code{strcmp}. This warning is enabled by -@option{-Wall}. +Warn about suspicious uses of address expressions. These include comparing +the address of a function or a declared object to the null pointer constant +such as in +@smallexample +void f (void); +void g (void) +@{ + if (!func) // warning: expression evaluates to false + abort (); +@} +@end smallexample +comparisons of a pointer to a string literal, such as in +@smallexample +void f (const char *x) +@{ + if (x == "abc") // warning: expression evaluates to false + puts ("equal"); +@} +@end smallexample +and tests of the results of pointer addition or subtraction for equality +to null, such as in +@smallexample +void f (const int *p, int i) +@{ + return p + i == NULL; +@} +@end smallexample +Such uses typically indicate a programmer error: the address of most +functions and objects necessarily evaluates to true (the exception are +weak symbols), so their use in a conditional might indicate missing +parentheses in a function call or a missing dereference in an array +expression. The subset of the warning for object pointers can be +suppressed by casting the pointer operand to an integer type such +as @code{inptr_t} or @code{uinptr_t}. 
+Comparisons against string literals result in unspecified behavior +and are not portable, and suggest the intent was to call @code{strcmp}. +@option{-Waddress} warning is enabled by @option{-Wall}. @item -Wno-address-of-packed-member @opindex Waddress-of-packed-member diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c index a4d1cb4c72d..6552eaf3b0c 100644 --- a/gcc/fortran/array.c +++ b/gcc/fortran/array.c @@ -2581,7 +2581,7 @@ gfc_array_dimen_size (gfc_expr *array, int dimen, mpz_t *result) } } - if (array->shape && array->shape[dimen]) + if (array->shape) { mpz_init_set (*result, array->shape[dimen]); return true; diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index b8061f37772..e2f59e0823c 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -5104,7 +5104,6 @@ set_loop_bounds (gfc_loopinfo *loop) if (info->shape) { - gcc_assert (info->shape[dim]); /* The frontend has worked out the size for us. */ if (!loopspec[n] || !specinfo->shape diff --git a/gcc/gengtype.c b/gcc/gengtype.c index 31d4bf4e5d0..a77cfd92bfa 100644 --- a/gcc/gengtype.c +++ b/gcc/gengtype.c @@ -3685,8 +3685,8 @@ write_types (outf_p output_header, type_p structures, output_mangled_typename (output_header, s); oprintf (output_header, "(X) do { \\\n"); oprintf (output_header, - " if (X != NULL) gt_%sx_%s (X);\\\n", wtd->prefix, - s_id_for_tag); + " if ((intptr_t)(X) != 0) gt_%sx_%s (X);\\\n", + wtd->prefix, s_id_for_tag); oprintf (output_header, " } while (0)\n"); for (opt = s->u.s.opt; opt; opt = opt->next) diff --git a/gcc/poly-int.h b/gcc/poly-int.h index f47f9e436a8..94e7b701f64 100644 --- a/gcc/poly-int.h +++ b/gcc/poly-int.h @@ -324,10 +324,10 @@ struct poly_result<T1, T2, 2> routine can take the address of RES rather than the address of a temporary. - The dummy comparison against a null C * is just a way of checking + The dummy self-comparison against C * is just a way of checking that C gives the right type. 
*/ #define POLY_SET_COEFF(C, RES, I, VALUE) \ - ((void) (&(RES).coeffs[0] == (C *) 0), \ + ((void) (&(RES).coeffs[0] == (C *) (void *) &(RES).coeffs[0]), \ wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \ ? (void) ((RES).coeffs[I] = VALUE) \ : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE))) diff --git a/gcc/testsuite/c-c++-common/Waddress-3.c b/gcc/testsuite/c-c++-common/Waddress-3.c new file mode 100644 index 00000000000..9a13a444045 --- /dev/null +++ b/gcc/testsuite/c-c++-common/Waddress-3.c @@ -0,0 +1,125 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +typedef __INTPTR_TYPE__ intptr_t; +typedef __UINTPTR_TYPE__ uintptr_t; + +#ifndef __cplusplus +# define bool _Bool +#endif + +struct S { void *p, *a1[2], *a2[2][2]; } s, *p; + +extern const void *a1[2]; +extern void *a2[2][2], *ax[]; + +void T (bool); + +void test_array_eq_0 (int i) +{ + // Verify that casts intptr_t suppress the warning. + T ((intptr_t)a1 == 0); + T ((uintptr_t)a1 == 0); + T (a1 == 0); // { dg-warning "-Waddress" } + T (0 == &a1); // { dg-warning "-Waddress" } + // Verify that casts to other pointer types don't suppress it. 
+ T ((void *)a1 == 0); // { dg-warning "-Waddress" } + T ((char *)a1 == 0); // { dg-warning "-Waddress" } + T (a1[0] == 0); + T (0 == (intptr_t)&a1[0]); + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (a1[i] == 0); + T (0 == (uintptr_t)&a1[i]); + T (0 == &a1[i]); // { dg-warning "-Waddress" } + + T ((intptr_t)a2 == 0); + T (a2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2); // { dg-warning "-Waddress" } + T (a2[0] == 0); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (a2[i] == 0); // { dg-warning "-Waddress" } + T (0 == &a2[i]); // { dg-warning "-Waddress" } + T (a2[0][0] == 0); + T (0 == &a2[0][0]); // { dg-warning "-Waddress" } + T (&ax == 0); // { dg-warning "-Waddress" } + T (0 == &ax); // { dg-warning "-Waddress" } + T (&ax[0] == 0); // { dg-warning "-Waddress" } + T (0 == ax[0]); +} + + +void test_array_neq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((uintptr_t)a1); + + T (a1); // { dg-warning "-Waddress" } + T ((void *)a1); // { dg-warning "-Waddress" } + T (&a1 != 0); // { dg-warning "-Waddress" } + T (a1[0]); + T (&a1[0] != 0); // { dg-warning "-Waddress" } + T (a1[i]); + T (&a1[i] != 0); // { dg-warning "-Waddress" } + + T ((intptr_t)a2); + T (a2); // { dg-warning "-Waddress" } + T ((void *)a2); // { dg-warning "-Waddress" } + T ((char *)a2); // { dg-warning "-Waddress" } + T (&a2 != 0); // { dg-warning "-Waddress" } + T (a2[0]); // { dg-warning "-Waddress" } + T (&a1[0] != 0); // { dg-warning "-Waddress" } + T (a2[i]); // { dg-warning "-Waddress" } + T (&a2[i] != 0); // { dg-warning "-Waddress" } + T (a2[0][0]); + T (&a2[0][0] != 0); // { dg-warning "-Waddress" } +} + + +void test_member_array_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. 
+ T ((intptr_t)s.a1 == 0); + T (s.a1 == 0); // { dg-warning "-Waddress" } + T (0 == &a1); // { dg-warning "-Waddress" } + T (s.a1[0] == 0); + T ((void*)s.a1); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (s.a1[i] == 0); + T (0 == &a1[i]); // { dg-warning "-Waddress" } + + T ((uintptr_t)s.a2 == 0); + T (s.a2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2); // { dg-warning "-Waddress" } + T ((void *)s.a2 == 0);// { dg-warning "-Waddress" } + T (s.a2[0] == 0); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (s.a2[i] == 0); // { dg-warning "-Waddress" } + T (0 == &a2[i]); // { dg-warning "-Waddress" } + T (s.a2[0][0] == 0); + T (0 == &a2[0][0]); // { dg-warning "-Waddress" } +} + + +void test_member_array_neq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((uintptr_t)s.a1); + T (s.a1); // { dg-warning "-Waddress" } + T (&s.a1 != 0); // { dg-warning "-Waddress" } + T ((void *)&s.a1[0]); // { dg-warning "-Waddress" } + T (s.a1[0]); + T (&s.a1[0] != 0); // { dg-warning "-Waddress" } + T (s.a1[i]); + T (&s.a1[i] != 0); // { dg-warning "-Waddress" } + + T ((intptr_t)s.a2); + T (s.a2); // { dg-warning "-Waddress" } + T (&s.a2 != 0); // { dg-warning "-Waddress" } + T (s.a2[0]); // { dg-warning "-Waddress" } + T (&s.a1[0] != 0); // { dg-warning "-Waddress" } + T (s.a2[i]); // { dg-warning "-Waddress" } + T (&s.a2[i] != 0); // { dg-warning "-Waddress" } + T (s.a2[0][0]); + T (&s.a2[0][0] != 0); // { dg-warning "-Waddress" } +} diff --git a/gcc/testsuite/c-c++-common/Waddress-4.c b/gcc/testsuite/c-c++-common/Waddress-4.c new file mode 100644 index 00000000000..489a0cd717c --- /dev/null +++ b/gcc/testsuite/c-c++-common/Waddress-4.c @@ -0,0 +1,106 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +typedef __INTPTR_TYPE__ intptr_t; +typedef __INTPTR_TYPE__ uintptr_t; + +extern char *ax[], *a2[][2]; + +void T 
(int); + +void test_ax_plus_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((intptr_t)(ax + 0) == 0); + T ((uintptr_t)(ax + 1) == 0); + + T (ax + 0 == 0); // { dg-warning "-Waddress" } + T (&ax[0] == 0); // { dg-warning "-Waddress" } + T (ax - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &ax[-1]); // { dg-warning "-Waddress" } + T ((void *)(&ax[0] + 2) == 0); // { dg-warning "-Waddress" } + T (&ax[0] + 2 == 0); // { dg-warning "-Waddress" } + T (ax + 3 == 0); // { dg-warning "-Waddress" } + T (0 == &ax[-4]); // { dg-warning "-Waddress" } + T (ax - i == 0); // { dg-warning "-Waddress" } + T (&ax[i] == 0); // { dg-warning "-Waddress" } + T (0 == &ax[1] + i); // { dg-warning "-Waddress" } +} + +void test_a2_plus_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((intptr_t)(a2 + 0) == 0); + T ((uintptr_t)(a2 + 1) == 0); + + T (a2 + 0 == 0); // { dg-warning "-Waddress" } + // Verify that a cast to another pointer type doesn't suppress it. + T ((void*)(a2 + 0) == 0); // { dg-warning "-Waddress" } + T ((char*)a2 + 1 == 0); // { dg-warning "-Waddress" } + T (&a2[0] == 0); // { dg-warning "-Waddress" } + T (a2 - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &a2[-1]); // { dg-warning "-Waddress" } + T (a2 + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2[-2]); // { dg-warning "-Waddress" } + T (a2 - i == 0); // { dg-warning "-Waddress" } + T (&a2[i] == 0); // { dg-warning "-Waddress" } +} + +// Exercise a pointer. +void test_p_plus_eq_0 (int *p, int i) +{ + /* P + 0 and equivalently &P[0] are invalid for a null P but they're + folded to p before the warning has a chance to trigger. 
*/ + T (p + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&p[0] == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + + T (p - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &p[-1]); // { dg-warning "-Waddress" } + T (p + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &p[-2]); // { dg-warning "-Waddress" } + T (p - i == 0); // { dg-warning "-Waddress" } + T (&p[i] == 0); // { dg-warning "-Waddress" } +} + +// Exercise pointer to array. +void test_pa_plus_eq_0 (int (*p)[], int (*p2)[][2], int i) +{ + // The array pointer may be null. + T (*p == 0); + /* &**P is equivalent to *P and might be the result od macro expansion. + Verify it doesn't cause a warning. */ + T (&**p == 0); + + /* *P + 0 is invalid but folded to *P before the warning has a chance + to trigger. */ + T (*p + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + + T (&(*p)[0] == 0); // { dg-warning "-Waddress" } + T (*p - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p)[-1]); // { dg-warning "-Waddress" } + T (*p + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p)[-2]); // { dg-warning "-Waddress" } + T (*p - i == 0); // { dg-warning "-Waddress" } + T (&(*p)[i] == 0); // { dg-warning "-Waddress" } + + + /* Similar to the above but for a pointer to a two-dimensional array, + referring to the higher-level element (i.e., an array itself). 
*/ + T (*p2 == 0); + T (**p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&**p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&***p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&**p2 == 0); + + T (*p2 + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&(*p2)[0] == 0); // { dg-warning "-Waddress" } + T (&(*p2)[0][1] == 0); // { dg-warning "-Waddress" } + T (*p2 - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p2)[-1]); // { dg-warning "-Waddress" } + T (0 == &(*p2)[1][2]); // { dg-warning "-Waddress" } + T (*p2 + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p2)[-2]); // { dg-warning "-Waddress" } + T (*p2 - i == 0); // { dg-warning "-Waddress" } + T (&(*p2)[i] == 0); // { dg-warning "-Waddress" } +} diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C index 5224bb14234..63295230d51 100644 --- a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C @@ -85,8 +85,11 @@ extern __attribute__ ((weak)) int i; constexpr int *p1 = &i + 1; #pragma GCC diagnostic push +// Suppress warning: ordered comparison of pointer with integer zero. #pragma GCC diagnostic ignored "-Wextra" -// Suppress warning: ordered comparison of pointer with integer zero +// Also suppress -Waddress for comparisons of constant addresses to +// to null. 
+#pragma GCC diagnostic ignored "-Waddress" constexpr bool b0 = p1; // { dg-error "not a constant expression" } constexpr bool b1 = p1 == 0; // { dg-error "not a constant expression" } diff --git a/gcc/testsuite/g++.dg/warn/Waddress-5.C b/gcc/testsuite/g++.dg/warn/Waddress-5.C new file mode 100644 index 00000000000..b1ad38a8112 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Waddress-5.C @@ -0,0 +1,115 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +#if __cplusplus < 201103L +# define nullptr __null +#endif + +struct A +{ + void f (); + virtual void vf (); + virtual void pvf () = 0; + + void sf (); + + int *p; + int a[2]; +}; + +void T (bool); + +void warn_memptr_if () +{ + // Exercise warnings for addresses of nonstatic member functions. + if (&A::f == 0) // { dg-warning "the address '&A::f'" } + T (0); + + if (&A::vf) // { dg-warning "-Waddress" } + T (0); + + if (&A::pvf != 0) // { dg-warning "-Waddress" } + T (0); + + // Exercise warnings for addresses of static member functions. + if (&A::sf == 0) // { dg-warning "-Waddress" } + T (0); + + if (&A::sf) // { dg-warning "-Waddress" } + T (0); + + // Exercise warnings for addresses of nonstatic data members. + if (&A::p == 0) // { dg-warning "the address '&A::p'" } + T (0); + + if (&A::a == nullptr) // { dg-warning "-Waddress" } + T (0); +} + +void warn_memptr_bool () +{ + // Exercise warnings for addresses of nonstatic member functions. + T (&A::f == 0); // { dg-warning "-Waddress" } + T (&A::vf); // { dg-warning "-Waddress" } + T (&A::pvf != 0); // { dg-warning "-Waddress" } + + // Exercise warnings for addresses of static member functions. + T (&A::sf == 0); // { dg-warning "-Waddress" } + T (&A::sf); // { dg-warning "-Waddress" } + + // Exercise warnings for addresses of nonstatic data members. 
+ T (&A::p == 0); // { dg-warning "-Waddress" } + T (&A::a == nullptr); // { dg-warning "-Waddress" } +} + + +/* Verify that no warnings are issued for a dependent expression in + a template. */ + +template <int> +struct B +{ + // This is why. + struct F { void* operator& () const { return 0; } } f; +}; + +template <class Type, int N> +void nowarn_dependent (Type targ) +{ + T (&Type::x == 0); + T (&targ == 0); + + Type tarr[1]; + T (&tarr[0] == nullptr); + + T (&B<N>::f == 0); + + /* Like in the case above, the address-of operator could be a member + of B<N>::vf that returns zero. */ + T (&B<N>::vf); + T (&B<N>::pvf != 0); + T (&B<N>::p == 0); + T (&B<N>::a == 0); +} + + +/* Verify that in an uninstantiated template warnings are not issued + for dependent expressions but are issued otherwise. */ + +template <class Type> +void warn_non_dependent (Type targ, Type *tptr, int i) +{ + /* The address of a pointer to a dependent type cannot be null but + the warning doesn't have a chance to see it. */ + T (&tptr == 0); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } + T (&i == 0); // { dg-warning "-Waddress" } + + int iarr[1]; + T (&iarr == 0); // { dg-warning "-Waddress" } + T (&*iarr != 0); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } + T (&iarr[0] == 0); // { dg-warning "-Waddress" } + + Type tarr[1]; + T (&tarr == nullptr); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } +} diff --git a/gcc/testsuite/g++.dg/warn/Waddress-6.C b/gcc/testsuite/g++.dg/warn/Waddress-6.C new file mode 100644 index 00000000000..c22a83a0dd7 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Waddress-6.C @@ -0,0 +1,79 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + Verify -Waddress for member arrays of structs and notes. 
+ { dg-options "-Wall" } */ + +#if __cplusplus < 201103L +# define nullptr __null +#endif + +void T (bool); + +struct A +{ + int n; + int ia[]; // { dg-message "'A::ia' declared here" } +}; + +struct B +{ + A a[3]; // { dg-message "'B::a' declared here" } +}; + +struct C +{ + B b[3]; // { dg-message "'C::b' declared here" } +}; + +struct D +{ + C c[3]; // { dg-message "'D::c' declared here" } +}; + + +void test_waddress_1d () +{ + D d[2]; // { dg-message "'d' declared here" } + + T (d); // { dg-warning "address of 'd'" } + T (d == nullptr); // { dg-warning "address of 'd'" } + T (&d); // { dg-warning "address of 'd'" } + T (d->c); // { dg-warning "address of 'D::c'" } + T (d->c != nullptr); // { dg-warning "address of 'D::c'" } + T (d->c->b); // { dg-warning "address of 'C::b'" } + T (d->c[1].b->a); // { dg-warning "address of 'B::a'" } + T (d->c->b[2].a->ia); // { dg-warning "address of 'A::ia'" } + + if (d->c->b[2].a[1].ia) // { dg-warning "address of 'A::ia'" } + T (0); + + if (bool b = d->c->b[1].a) // { dg-warning "address of 'B::a'" } + T (b); + + /* The following is represented as a declaration of P followed + by an if statement and so it isn't diagnosed. It's not clear + that it should be since the pointer is then used. + void *p = d->c->b[2].a; + if (p) ... 
+ */ + if (void *p = d->c->b[2].a) // { dg-warning "address of 'A::ia'" "" { xfail *-*-* } } + T (p); +} + + +void test_waddress_2d (int i) +{ + D d[2][3]; // { dg-message "'d' declared here" } + + T (d); // { dg-warning "address of 'd'" } + T (d == nullptr); // { dg-warning "address of 'd'" } + T (&d); // { dg-warning "address of 'd'" } + T (*d); // { dg-warning "address of 'd'" } + T (d[1] != nullptr); // { dg-warning "address of 'd'" } + T (&d[1]->c); // { dg-warning "address of 'D::c'" } + T (d[1]->c); // { dg-warning "address of 'D::c'" } + T (d[1]->c == nullptr); // { dg-warning "address of 'D::c'" } + T (d[i]->c[1].b); // { dg-warning "address of 'C::b'" } + T ((*(d + i))->c->b->a); // { dg-warning "address of 'B::a'" } + T (d[1][2].c->b->a->ia); // { dg-warning "address of 'A::ia'" } +} </cut>

4 years, 8 months


[TCWG CI] 458.sjeng grew in size by 4% after gcc: aarch64: Improve size heuristic for cpymem expansion

by ci_notify＠linaro.org

After gcc commit a459ee44c0a74b0df0485ed7a56683816c02aae9
Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com>

aarch64: Improve size heuristic for cpymem expansion

the following benchmarks grew in size by more than 1%:
- 458.sjeng grew in size by 4% from 105780 to 109944 bytes
- 459.GemsFDTD grew in size by 2% from 247504 to 251468 bytes

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9
cd investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach a459ee44c0a74b0df0485ed7a56683816c02aae9
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 8f95e3c04d659d541ca4937b3df2f1175a1c5f05
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit a459ee44c0a74b0df0485ed7a56683816c02aae9 Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> Date: Wed Sep 29 11:21:45 2021 +0100 aarch64: Improve size heuristic for cpymem expansion Similar to my previous patch for setmem this one does the same for the cpymem expansion. We count the number of ops emitted and compare it against the alternative of just calling the library function when optimising for size. For the code: void cpy_127 (char *out, char *in) { __builtin_memcpy (out, in, 127); } void cpy_128 (char *out, char *in) { __builtin_memcpy (out, in, 128); } we now emit a call to memcpy (with an extra MOV-immediate instruction for the size) instead of: cpy_127(char*, char*): ldp q0, q1, [x1] stp q0, q1, [x0] ldp q0, q1, [x1, 32] stp q0, q1, [x0, 32] ldp q0, q1, [x1, 64] stp q0, q1, [x0, 64] ldr q0, [x1, 96] str q0, [x0, 96] ldr q0, [x1, 111] str q0, [x0, 111] ret cpy_128(char*, char*): ldp q0, q1, [x1] stp q0, q1, [x0] ldp q0, q1, [x1, 32] stp q0, q1, [x0, 32] ldp q0, q1, [x1, 64] stp q0, q1, [x0, 64] ldp q0, q1, [x1, 96] stp q0, q1, [x0, 96] ret which is a clear code size win. Speed optimisation heuristics remain unchanged. 2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> * config/aarch64/aarch64.c (aarch64_expand_cpymem): Count number of emitted operations and adjust heuristic for code size. 2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> * gcc.target/aarch64/cpymem-size.c: New test. --- gcc/config/aarch64/aarch64.c | 36 ++++++++++++++++++-------- gcc/testsuite/gcc.target/aarch64/cpymem-size.c | 29 +++++++++++++++++++++ 2 files changed, 54 insertions(+), 11 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ac17c1c88fb..a9a1800af53 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -23390,7 +23390,8 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst, } /* Expand cpymem, as if from a __builtin_memcpy. 
Return true if - we succeed, otherwise return false. */ + we succeed, otherwise return false, indicating that a libcall to + memcpy should be emitted. */ bool aarch64_expand_cpymem (rtx *operands) @@ -23407,11 +23408,13 @@ aarch64_expand_cpymem (rtx *operands) unsigned HOST_WIDE_INT size = INTVAL (operands[2]); - /* Inline up to 256 bytes when optimizing for speed. */ + /* Try to inline up to 256 bytes. */ unsigned HOST_WIDE_INT max_copy_size = 256; - if (optimize_function_for_size_p (cfun)) - max_copy_size = 128; + bool size_p = optimize_function_for_size_p (cfun); + + if (size > max_copy_size) + return false; int copy_bits = 256; @@ -23421,13 +23424,14 @@ aarch64_expand_cpymem (rtx *operands) || !TARGET_SIMD || (aarch64_tune_params.extra_tuning_flags & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)) - { - copy_bits = 128; - max_copy_size = max_copy_size / 2; - } + copy_bits = 128; - if (size > max_copy_size) - return false; + /* Emit an inline load+store sequence and count the number of operations + involved. We use a simple count of just the loads and stores emitted + rather than rtx_insn count as all the pointer adjustments and reg copying + in this function will get optimized away later in the pipeline. */ + start_sequence (); + unsigned nops = 0; base = copy_to_mode_reg (Pmode, XEXP (dst, 0)); dst = adjust_automodify_address (dst, VOIDmode, base, 0); @@ -23456,7 +23460,8 @@ aarch64_expand_cpymem (rtx *operands) cur_mode = V4SImode; aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode); - + /* A single block copy is 1 load + 1 store. */ + nops += 2; n -= mode_bits; /* Emit trailing copies using overlapping unaligned accesses - this is @@ -23471,7 +23476,16 @@ aarch64_expand_cpymem (rtx *operands) n = n_bits; } } + rtx_insn *seq = get_insns (); + end_sequence (); + + /* A memcpy libcall in the worst case takes 3 instructions to prepare the + arguments + 1 for the call. 
*/ + unsigned libcall_cost = 4; + if (size_p && libcall_cost < nops) + return false; + emit_insn (seq); return true; } diff --git a/gcc/testsuite/gcc.target/aarch64/cpymem-size.c b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c new file mode 100644 index 00000000000..4d488b74301 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ + +#include <stdlib.h> + +/* +** cpy_127: +** mov x2, 127 +** b memcpy +*/ +void +cpy_127 (char *out, char *in) +{ + __builtin_memcpy (out, in, 127); +} + +/* +** cpy_128: +** mov x2, 128 +** b memcpy +*/ +void +cpy_128 (char *out, char *in) +{ + __builtin_memcpy (out, in, 128); +} + +/* { dg-final { check-function-bodies "**" "" "" } } */ + </cut>
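The size decision described in the commit above can be sketched in isolation. This is a minimal model, not GCC code: `prefer_libcall` and `LIBCALL_COST` are hypothetical names introduced for illustration, and the operation counts used in the note below are read off the assembly listings in the commit message rather than computed by the compiler.

```c
#include <stdbool.h>

/* Hypothetical constant mirroring the patch's libcall_cost: in the
   worst case 3 instructions to prepare the arguments plus 1 for the
   call itself.  */
static const unsigned LIBCALL_COST = 4;

/* Return true when this model would give up on inline expansion and
   let a memcpy libcall be emitted.  NOPS is the number of loads and
   stores the inline sequence would need; SIZE_P says whether the
   function is being optimized for size.  */
static bool
prefer_libcall (unsigned nops, bool size_p)
{
  return size_p && LIBCALL_COST < nops;
}
```

With the listings from the commit message, cpy_127 needs 10 loads and stores and cpy_128 needs 8, so `prefer_libcall (10, true)` and `prefer_libcall (8, true)` both hold, which matches the new test expecting a tail call to memcpy at -Os. When not optimizing for size the function returns false regardless of the count, consistent with the statement that the speed heuristics are unchanged.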

4 years, 8 months


[TCWG CI] Regression caused by gcc: [PR102546] X << Y being non-zero implies X is also non-zero.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: [PR102546] X << Y being non-zero implies X is also non-zero.:
commit 5f9ccf17de7f7581412c6bffd4a37beca9a79836
Author: Aldy Hernandez <aldyh(a)redhat.com>

[PR102546] X << Y being non-zero implies X is also non-zero.

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
18603
# First few build errors in logs:
# 00:01:53 arch/arm/vfp/vfpdouble.c:1206:1: internal compiler error: in upper_bound, at value-range.h:531
# 00:01:53 arch/arm/vfp/vfpsingle.c:1246:1: internal compiler error: in upper_bound, at value-range.h:531
# 00:01:53 make[2]: *** [scripts/Makefile.build:271: arch/arm/vfp/vfpdouble.o] Error 1
# 00:01:54 make[2]: *** [scripts/Makefile.build:271: arch/arm/vfp/vfpsingle.o] Error 1
# 00:01:55 make[1]: *** [scripts/Makefile.build:514: arch/arm/vfp] Error 2
# 00:01:56 arch/arm/nwfpe/softfloat.c:3432:1: internal compiler error: in upper_bound, at value-range.h:531
# 00:01:57 make[2]: *** [scripts/Makefile.build:271: arch/arm/nwfpe/softfloat.o] Error 1
# 00:01:57 make[1]: *** [scripts/Makefile.build:514: arch/arm/nwfpe] Error 2
# 00:02:14 arch/arm/kernel/smp.c:857:1: internal compiler error: in upper_bound, at value-range.h:531
# 00:02:15 make[2]: *** [scripts/Makefile.build:271: arch/arm/kernel/smp.o] Error 1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
19709
# linux build successful:
all

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-arm-stable-allyesconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally…

Reproduce builds:
<cut>
mkdir investigate-gcc-5f9ccf17de7f7581412c6bffd4a37beca9a79836
cd investigate-gcc-5f9ccf17de7f7581412c6bffd4a37beca9a79836

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 5f9ccf17de7f7581412c6bffd4a37beca9a79836
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 257d2890a769a8aa564d079170377e637e07acb1
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 5f9ccf17de7f7581412c6bffd4a37beca9a79836
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Fri Oct 1 13:05:36 2021 +0200

[PR102546] X << Y being non-zero implies X is also non-zero.
This patch teaches this to range-ops. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/102546 * range-op.cc (operator_lshift::op1_range): Teach range-ops that X << Y is non-zero implies X is also non-zero. --- gcc/range-op.cc | 18 ++++++++++++++---- gcc/testsuite/gcc.dg/tree-ssa/pr102546.c | 23 +++++++++++++++++++++++ 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/gcc/range-op.cc b/gcc/range-op.cc index 5e37133026d..2baca4a197f 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -2078,6 +2078,12 @@ operator_lshift::op1_range (irange &r, relation_kind rel ATTRIBUTE_UNUSED) const { tree shift_amount; + + if (!lhs.contains_p (build_zero_cst (type))) + r.set_nonzero (type); + else + r.set_varying (type); + if (op2.singleton_p (&shift_amount)) { wide_int shift = wi::to_wide (shift_amount); @@ -2089,21 +2095,24 @@ operator_lshift::op1_range (irange &r, return false; if (shift == 0) { - r = lhs; + r.intersect (lhs); return true; } // Work completely in unsigned mode to start. tree utype = type; + int_range_max tmp_range; if (TYPE_SIGN (type) == SIGNED) { int_range_max tmp = lhs; utype = unsigned_type_for (type); range_cast (tmp, utype); - op_rshift.fold_range (r, utype, tmp, op2); + op_rshift.fold_range (tmp_range, utype, tmp, op2); } else - op_rshift.fold_range (r, utype, lhs, op2); + op_rshift.fold_range (tmp_range, utype, lhs, op2); + + r.intersect (tmp_range); // Start with ranges which can produce the LHS by right shifting the // result by the shift amount. 
@@ -2128,7 +2137,8 @@ operator_lshift::op1_range (irange &r, range_cast (r, type); return true; } - return false; + + return !r.varying_p (); } bool diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c new file mode 100644 index 00000000000..4bd98747732 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c @@ -0,0 +1,23 @@ +// { dg-do compile } +// { dg-options "-O3 -fdump-tree-optimized" } + +static int a; +static char b, c, d; +void bar(void); +void foo(void); + +int main() { + int f = 0; + for (; f <= 5; f++) { + bar(); + b = b && f; + d = f << f; + if (!(a >= d || f)) + foo(); + c = 1; + for (; c; c = 0) + ; + } +} + +// { dg-final { scan-tree-dump-not "foo" "optimized" } } </cut>
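The implication named in the commit title can be checked by brute force on a small type. The sketch below is only an illustration on 8-bit unsigned values; it does not model range-ops, irange, or the internal compiler error reported above, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exhaustively check, for 8-bit unsigned values and shift counts
   0..7, that a non-zero result of x << y implies a non-zero x.
   Returns true if no counterexample exists.  */
static bool
lshift_nonzero_implies_op1_nonzero (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y < 8; y++)
      {
        uint8_t result = (uint8_t) (x << y);
        if (result != 0 && x == 0)
          return false;
      }
  return true;
}

/* The converse does not hold: a non-zero x can shift to zero once
   its set bits leave the 8-bit result.  Returns true if X is
   non-zero but (uint8_t) (X << Y) is zero.  */
static bool
nonzero_op1_shifts_to_zero (uint8_t x, unsigned y)
{
  return x != 0 && (uint8_t) ((unsigned) x << y) == 0;
}
```

For example, `nonzero_op1_shifts_to_zero (0x80, 1)` holds, so knowing that X is non-zero says nothing about X << Y, whereas a non-zero X << Y always rules out X == 0.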

4 years, 8 months


Re: [TCWG CI] 400.perlbench slowed down by 6% after llvm: [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest

by Maxim Kuvyrkov

Hi Arthur,

Thanks for looking into this!

The flags to compile regexec.c were:
-O3 --target=aarch64-linux-gnu -fgnu89-inline

Clang was configured with (on x86_64-linux-gnu host):
cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=AArch64

Please let me know if the above doesn’t work for you.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 29 Sep 2021, at 20:47, Arthur Eubanks <aeubanks(a)google.com> wrote:
>
> Do you know the flags passed to Clang to compile the sources? I tried compiling the preprocessed sources but ran into the below, and couldn't find the flags in any of the logs.
>
> In file included from regexec.c:93:
> In file included from ./perl.h:384:
> In file included from /home/tcwg-buildslave/workspace/tcwg_bmk_0/abe/builds/destdir/x86_64-pc-linux-gnu/aarch64-linux-gnu/libc/usr/include/sys/types.h:144:
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/llvm-install/lib/clang/14.0.0/include/stddef.h:46:27: error: typedef redefinition with different types ('unsigned long' vs 'unsigned long long')
> typedef long unsigned int size_t;
> ^
> 1 error generated.
>
> And yeah just moving the code around could cause major performance regressions, I've had other patches do the same for various benchmarks, there's not much we can do about that if that's actually the root cause. If I can compile the file I can check if the optimization actually created worse IR or not.
>
> On Wed, Sep 29, 2021 at 5:59 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote:
> Hi Arthur,
>
> Pre-processed source is in the save-temps tarballs linked below; S_regmatch() is in regexec.i .
>
> The save-temps also have .s assembly files for before and after your patch, and the only code-gen difference is in the S_reginclass() function (see the attached screenshot #1).
>
> Looking into the profile of S_regmatch(), some of the extra cycles come from the hot loop starting with "cbz w19,..." getting misaligned: before your patch it started at "2bce10", and after it starts at "2bce6c".
>
> Maybe the added instructions in S_reginclass() pushed the loop in S_regmatch() in an unfortunate way?
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeubanks(a)google.com> wrote:
>>
>> Could I get the source file with S_regmatch()?
>>
>> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote:
>> Hi Arthur,
>>
>> Your patch seems to be slowing down 400.perlbench by 6%, due to a slowdown of its hot function S_regmatch() by 14%.
>>
>> Could you take a look if this is easily fixable, please?
>>
>> Regards,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>> > On 24 Sep 2021, at 15:07, ci_notify(a)linaro.org wrote:
>> >
>> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
>> > Author: Arthur Eubanks <aeubanks(a)google.com>
>> >
>> > [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
>> >
>> > the following benchmarks slowed down by more than 2%:
>> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
>> > - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf samples
>> >
>> > Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
>> >
>> > For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
>> > - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> > - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> > - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> >
>> > Configuration:
>> > - Benchmark: SPEC CPU2006
>> > - Toolchain: Clang + Glibc + LLVM Linker
>> > - Version: all components were built from their tip of trunk
>> > - Target: aarch64-linux-gnu
>> > - Compiler flags: -O3
>> > - Hardware: NVidia TX1 4x Cortex-A57
>> >
>> > This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
>
> <2021-09-29_15-44-27.png><2021-09-29_15-53-20.png>
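The alignment observation in the thread can be made concrete with a little arithmetic. The helper below is a hypothetical illustration, assuming the two loop start values quoted in the thread ("2bce10" and "2bce6c") are hexadecimal code addresses. Which window size actually matters on Cortex-A57 is not stated in the thread, so the 16- and 64-byte windows in the note are assumptions, not a claim about the hardware.

```c
#include <stdint.h>

/* Return the byte offset of ADDR inside an ALIGN-byte window.
   ALIGN must be a power of two.  An offset of 0 means ADDR sits
   exactly on an ALIGN-byte boundary.  */
static unsigned
offset_in_window (uint64_t addr, unsigned align)
{
  return (unsigned) (addr & (uint64_t) (align - 1));
}
```

Under those assumptions, `offset_in_window (0x2bce10, 16)` is 0, so the loop started on a 16-byte boundary before the patch, while `offset_in_window (0x2bce6c, 16)` is 12 afterwards. For a 64-byte window the offsets are 16 and 44 respectively. This only quantifies the shift in placement; it does not by itself show that alignment is the cause of the slowdown.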

4 years, 8 months


[TCWG CI] 456.hmmer slowed down by 5% after llvm: Revert "Allow rematerialization of virtual reg uses"

by ci_notify＠linaro.org

After llvm commit 08d7eec06e8cf5c15a96ce11f311f1480291a441
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>

Revert "Allow rematerialization of virtual reg uses"

the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 5% from 7649 to 8028 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O2 -flto -marm
- Hardware: NVidia TK1 4x Cortex-A15

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441
cd investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 08d7eec06e8cf5c15a96ce11f311f1480291a441
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 08d7eec06e8cf5c15a96ce11f311f1480291a441
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>
Date:   Fri Sep 24 09:53:51 2021 -0700

    Revert "Allow rematerialization of virtual reg uses"

    Reverted due to two distcint performance regression reports.

    This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39.
---
 llvm/include/llvm/CodeGen/TargetInstrInfo.h        |   12 +-
 llvm/lib/CodeGen/TargetInstrInfo.cpp               |    9 +-
 llvm/test/CodeGen/AMDGPU/remat-sop.mir             |   60 -
 llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll |   28 +-
 llvm/test/CodeGen/ARM/funnel-shift-rot.ll          |   32 +-
 llvm/test/CodeGen/ARM/funnel-shift.ll              |   30 +-
 .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll |   30 +-
 llvm/test/CodeGen/ARM/neon-copy.ll                 |   10 +-
 llvm/test/CodeGen/Mips/llvm-ir/ashr.ll             |  227 +-
 llvm/test/CodeGen/Mips/llvm-ir/lshr.ll             |  206 +-
 llvm/test/CodeGen/Mips/llvm-ir/shl.ll              |   95 +-
 llvm/test/CodeGen/Mips/llvm-ir/sub.ll              |   31 +-
 llvm/test/CodeGen/Mips/tls.ll                      |    4 +-
 llvm/test/CodeGen/RISCV/atomic-rmw.ll              |  120 +-
 llvm/test/CodeGen/RISCV/atomic-signext.ll          |   24 +-
 llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll   |   96 +-
 llvm/test/CodeGen/RISCV/mul.ll                     |   72 +-
 llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll        |   12 +-
 llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll             |  270 +-
 llvm/test/CodeGen/RISCV/rv32zbb.ll                 |   94 +-
 llvm/test/CodeGen/RISCV/rv32zbp.ll                 |  262 +-
 llvm/test/CodeGen/RISCV/rv32zbt.ll                 |  206 +-
 .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll  |  150 +-
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll |  146 +-
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll  | 3584 ++++++++++----------
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll  |  664 ++--
 llvm/test/CodeGen/RISCV/shifts.ll                  |  308 +-
 llvm/test/CodeGen/RISCV/srem-vector-lkk.ll         |  208 +-
 llvm/test/CodeGen/RISCV/urem-vector-lkk.ll         |  190 +-
 llvm/test/CodeGen/Thumb/dyn-stackalloc.ll          |    7 +-
 .../tail-pred-disabled-in-loloops.ll               |   14 +-
 .../LowOverheadLoops/varying-outer-2d-reduction.ll |   64 +-
 .../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll |   67 +-
 llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll          |   30 +-
 llvm/test/CodeGen/Thumb2/mve-float16regloops.ll    |   82 +-
 llvm/test/CodeGen/Thumb2/mve-float32regloops.ll    |   98 +-
 llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll        |  529 +--
 llvm/test/CodeGen/X86/addcarry.ll                  |   20 +-
 llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll |   12 +-
 llvm/test/CodeGen/X86/dag-update-nodetomatch.ll    |   17 +-
 .../X86/delete-dead-instrs-with-live-uses.mir      |    4 +-
 llvm/test/CodeGen/X86/inalloca-invoke.ll           |    2 +-
 llvm/test/CodeGen/X86/licm-regpressure.ll          |   28 +-
 llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll      |   40 +-
 llvm/test/CodeGen/X86/sdiv_fix.ll                  |    5 +-
 45 files changed, 4093 insertions(+), 4106 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index a0c52e2f1a13..c394ac910be1 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -117,11 +117,10 @@ public:
                                          const MachineFunction &MF) const;
 
   /// Return true if the instruction is trivially rematerializable, meaning it
-  /// has no side effects. Uses of constants and unallocatable physical
-  /// registers are always trivial to rematerialize so that the instructions
-  /// result is independent of the place in the function. Uses of virtual
-  /// registers are allowed but it is caller's responsility to ensure these
-  /// operands are valid at the point the instruction is beeing moved.
+  /// has no side effects and requires no operands that aren't always available.
+  /// This means the only allowed uses are constants and unallocatable physical
+  /// registers so that the instructions result is independent of the place
+  /// in the function.
   bool isTriviallyReMaterializable(const MachineInstr &MI,
                                    AAResults *AA = nullptr) const {
     return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF ||
@@ -141,7 +140,8 @@ protected:
   /// set, this hook lets the target specify whether the instruction is actually
   /// trivially rematerializable, taking into consideration its operands. This
   /// predicate must return false if the instruction has any side effects other
-  /// than producing a value.
+  /// than producing a value, or if it requres any address registers that are
+  /// not always available.
   /// Requirements must be check as stated in isTriviallyReMaterializable() .
   virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI,
                                                  AAResults *AA) const {
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index fe7d60e0b7e2..1eab8e7443a7 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -921,8 +921,7 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(
   const MachineRegisterInfo &MRI = MF.getRegInfo();
 
   // Remat clients assume operand 0 is the defined register.
-  if (!MI.getNumOperands() || !MI.getOperand(0).isReg() ||
-      MI.getOperand(0).isTied())
+  if (!MI.getNumOperands() || !MI.getOperand(0).isReg())
     return false;
   Register DefReg = MI.getOperand(0).getReg();
 
@@ -984,6 +983,12 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(
     // same virtual register, though.
     if (MO.isDef() && Reg != DefReg)
       return false;
+
+    // Don't allow any virtual-register uses. Rematting an instruction with
+    // virtual register uses would length the live ranges of the uses, which
+    // is not necessarily a good idea, certainly not "trivial".
+    if (MO.isUse())
+      return false;
   }
 
   // Everything checked out.
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir index c9915aaabfde..ed799bfca028 100644 --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir @@ -51,66 +51,6 @@ body: | S_NOP 0, implicit %2 S_ENDPGM 0 ... -# The liverange of %0 covers a point of rematerialization, source value is -# availabe. ---- -name: test_remat_s_mov_b32_vreg_src_long_lr -tracksRegLiveness: true -machineFunctionInfo: - stackPtrOffsetReg: $sgpr32 -body: | - bb.0: - ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr - ; GCN: renamable $sgpr0 = IMPLICIT_DEF - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: S_NOP 0, implicit killed renamable $sgpr0 - ; GCN: S_ENDPGM 0 - %0:sreg_32 = IMPLICIT_DEF - %1:sreg_32 = S_MOV_B32 %0:sreg_32 - %2:sreg_32 = S_MOV_B32 %0:sreg_32 - %3:sreg_32 = S_MOV_B32 %0:sreg_32 - S_NOP 0, implicit %1 - S_NOP 0, implicit %2 - S_NOP 0, implicit %3 - S_NOP 0, implicit %0 - S_ENDPGM 0 -... -# The liverange of %0 does not cover a point of rematerialization, source value is -# unavailabe and we do not want to artificially extend the liverange. 
---- -name: test_no_remat_s_mov_b32_vreg_src_short_lr -tracksRegLiveness: true -machineFunctionInfo: - stackPtrOffsetReg: $sgpr32 -body: | - bb.0: - ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr - ; GCN: renamable $sgpr0 = IMPLICIT_DEF - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5) - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5) - ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0 - ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5) - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5) - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: S_NOP 0, implicit killed renamable $sgpr0 - ; GCN: S_ENDPGM 0 - %0:sreg_32 = IMPLICIT_DEF - %1:sreg_32 = S_MOV_B32 %0:sreg_32 - %2:sreg_32 = S_MOV_B32 %0:sreg_32 - %3:sreg_32 = S_MOV_B32 %0:sreg_32 - S_NOP 0, implicit %1 - S_NOP 0, implicit %2 - S_NOP 0, implicit %3 - S_ENDPGM 0 -... 
--- name: test_remat_s_mov_b64 tracksRegLiveness: true diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll index 175a2069a441..a4243276c70a 100644 --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; ENABLE-NEXT: pophs {r11, pc} ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader ; ENABLE-NEXT: movw r12, :lower16:skip -; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: sub r1, r1, #1 ; ENABLE-NEXT: movt r12, :upper16:skip ; ENABLE-NEXT: .LBB0_4: @ %while.body ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; ENABLE-NEXT: ldrb r1, [r0] -; ENABLE-NEXT: ldrb r1, [r12, r1] -; ENABLE-NEXT: add r0, r0, r1 -; ENABLE-NEXT: sub r1, r3, #1 -; ENABLE-NEXT: cmp r1, r3 +; ENABLE-NEXT: ldrb r3, [r0] +; ENABLE-NEXT: ldrb r3, [r12, r3] +; ENABLE-NEXT: add r0, r0, r3 +; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: cmp r3, r1 ; ENABLE-NEXT: bhs .LBB0_6 ; ENABLE-NEXT: @ %bb.5: @ %while.body ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; ENABLE-NEXT: cmp r0, r2 -; ENABLE-NEXT: mov r3, r1 +; ENABLE-NEXT: mov r1, r3 ; ENABLE-NEXT: blo .LBB0_4 ; ENABLE-NEXT: .LBB0_6: @ %if.end29 ; ENABLE-NEXT: pop {r11, pc} @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; DISABLE-NEXT: pophs {r11, pc} ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader ; DISABLE-NEXT: movw r12, :lower16:skip -; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: sub r1, r1, #1 ; DISABLE-NEXT: movt r12, :upper16:skip ; DISABLE-NEXT: .LBB0_4: @ %while.body ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; DISABLE-NEXT: ldrb r1, [r0] -; DISABLE-NEXT: ldrb r1, [r12, r1] -; DISABLE-NEXT: add r0, r0, r1 -; DISABLE-NEXT: sub r1, r3, #1 -; DISABLE-NEXT: cmp r1, r3 +; DISABLE-NEXT: ldrb r3, [r0] +; DISABLE-NEXT: ldrb r3, [r12, r3] +; DISABLE-NEXT: add r0, 
r0, r3 +; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: cmp r3, r1 ; DISABLE-NEXT: bhs .LBB0_6 ; DISABLE-NEXT: @ %bb.5: @ %while.body ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; DISABLE-NEXT: cmp r0, r2 -; DISABLE-NEXT: mov r3, r1 +; DISABLE-NEXT: mov r1, r3 ; DISABLE-NEXT: blo .LBB0_4 ; DISABLE-NEXT: .LBB0_6: @ %if.end29 ; DISABLE-NEXT: pop {r11, pc} diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll index ea15fcc5c824..55157875d355 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: push {r4, r5, r11, lr} ; SCALAR-NEXT: rsb r3, r2, #0 ; SCALAR-NEXT: and r4, r2, #63 -; SCALAR-NEXT: and r12, r3, #63 -; SCALAR-NEXT: rsb r3, r12, #32 +; SCALAR-NEXT: and lr, r3, #63 +; SCALAR-NEXT: rsb r3, lr, #32 ; SCALAR-NEXT: lsl r2, r0, r4 -; SCALAR-NEXT: lsr lr, r0, r12 -; SCALAR-NEXT: orr r3, lr, r1, lsl r3 -; SCALAR-NEXT: subs lr, r12, #32 -; SCALAR-NEXT: lsrpl r3, r1, lr +; SCALAR-NEXT: lsr r12, r0, lr +; SCALAR-NEXT: orr r3, r12, r1, lsl r3 +; SCALAR-NEXT: subs r12, lr, #32 +; SCALAR-NEXT: lsrpl r3, r1, r12 ; SCALAR-NEXT: subs r5, r4, #32 ; SCALAR-NEXT: movwpl r2, #0 ; SCALAR-NEXT: cmp r5, #0 @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: lsr r3, r0, r3 ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 ; SCALAR-NEXT: lslpl r3, r0, r5 -; SCALAR-NEXT: lsr r0, r1, r12 -; SCALAR-NEXT: cmp lr, #0 +; SCALAR-NEXT: lsr r0, r1, lr +; SCALAR-NEXT: cmp r12, #0 ; SCALAR-NEXT: movwpl r0, #0 ; SCALAR-NEXT: orr r1, r3, r0 ; SCALAR-NEXT: mov r0, r2 @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK: @ %bb.0: ; CHECK-NEXT: .save {r4, r5, r11, lr} ; CHECK-NEXT: push {r4, r5, r11, lr} -; CHECK-NEXT: and r12, r2, #63 +; CHECK-NEXT: and lr, r2, #63 ; CHECK-NEXT: rsb r2, r2, #0 -; CHECK-NEXT: rsb r3, r12, #32 +; CHECK-NEXT: rsb r3, lr, #32 ; CHECK-NEXT: and r4, r2, #63 -; CHECK-NEXT: lsr lr, 
r0, r12 -; CHECK-NEXT: orr r3, lr, r1, lsl r3 -; CHECK-NEXT: subs lr, r12, #32 +; CHECK-NEXT: lsr r12, r0, lr +; CHECK-NEXT: orr r3, r12, r1, lsl r3 +; CHECK-NEXT: subs r12, lr, #32 ; CHECK-NEXT: lsl r2, r0, r4 -; CHECK-NEXT: lsrpl r3, r1, lr +; CHECK-NEXT: lsrpl r3, r1, r12 ; CHECK-NEXT: subs r5, r4, #32 ; CHECK-NEXT: movwpl r2, #0 ; CHECK-NEXT: cmp r5, #0 @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK-NEXT: lsr r3, r0, r3 ; CHECK-NEXT: orr r3, r3, r1, lsl r4 ; CHECK-NEXT: lslpl r3, r0, r5 -; CHECK-NEXT: lsr r0, r1, r12 -; CHECK-NEXT: cmp lr, #0 +; CHECK-NEXT: lsr r0, r1, lr +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r0, #0 ; CHECK-NEXT: orr r1, r0, r3 ; CHECK-NEXT: mov r0, r2 diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll index 6372f9be2ca3..54c93b493c98 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { ; CHECK-NEXT: mov r3, #0 ; CHECK-NEXT: bl __aeabi_uldivmod ; CHECK-NEXT: add r0, r2, #27 -; CHECK-NEXT: lsl r2, r7, #27 -; CHECK-NEXT: and r12, r0, #63 ; CHECK-NEXT: lsl r6, r6, #27 +; CHECK-NEXT: and r1, r0, #63 +; CHECK-NEXT: lsl r2, r7, #27 ; CHECK-NEXT: orr r7, r6, r7, lsr #5 -; CHECK-NEXT: rsb r3, r12, #32 -; CHECK-NEXT: lsr r2, r2, r12 ; CHECK-NEXT: mov r6, #63 -; CHECK-NEXT: orr r2, r2, r7, lsl r3 -; CHECK-NEXT: subs r3, r12, #32 +; CHECK-NEXT: rsb r3, r1, #32 +; CHECK-NEXT: lsr r2, r2, r1 +; CHECK-NEXT: subs r12, r1, #32 ; CHECK-NEXT: bic r6, r6, r0 +; CHECK-NEXT: orr r2, r2, r7, lsl r3 ; CHECK-NEXT: lsl r5, r9, #1 -; CHECK-NEXT: lsrpl r2, r7, r3 -; CHECK-NEXT: subs r1, r6, #32 +; CHECK-NEXT: lsrpl r2, r7, r12 ; CHECK-NEXT: lsl r0, r5, r6 -; CHECK-NEXT: lsl r4, r8, #1 +; CHECK-NEXT: subs r4, r6, #32 +; CHECK-NEXT: lsl r3, r8, #1 ; CHECK-NEXT: movwpl r0, #0 -; CHECK-NEXT: orr r4, r4, r9, lsr #31 +; CHECK-NEXT: orr r3, r3, r9, lsr #31 ; CHECK-NEXT: orr r0, r0, r2 ; CHECK-NEXT: 
rsb r2, r6, #32 -; CHECK-NEXT: cmp r1, #0 +; CHECK-NEXT: cmp r4, #0 +; CHECK-NEXT: lsr r1, r7, r1 ; CHECK-NEXT: lsr r2, r5, r2 -; CHECK-NEXT: orr r2, r2, r4, lsl r6 -; CHECK-NEXT: lslpl r2, r5, r1 -; CHECK-NEXT: lsr r1, r7, r12 -; CHECK-NEXT: cmp r3, #0 +; CHECK-NEXT: orr r2, r2, r3, lsl r6 +; CHECK-NEXT: lslpl r2, r5, r4 +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r1, #0 ; CHECK-NEXT: orr r1, r2, r1 ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll index 0a0bb62b0a09..2922e0ed5423 100644 --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { ; BE-LABEL: i56_or: ; BE: @ %bb.0: ; BE-NEXT: mov r1, r0 +; BE-NEXT: ldr r12, [r0] ; BE-NEXT: ldrh r2, [r1, #4]! ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: orr r2, r3, r2, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: strb r12, [r1, #2] -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: strb r2, [r1, #2] +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr %aa = load i56, i56* %a @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: strb r2, [r1, #2] ; BE-NEXT: orr r2, r3, r12, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: ldr r12, [r0] +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; 
BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll index 46490efb6631..09a991da2e59 100644 --- a/llvm/test/CodeGen/ARM/neon-copy.ll +++ b/llvm/test/CodeGen/ARM/neon-copy.ll @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { ; CHECK-NEXT: .pad #8 ; CHECK-NEXT: sub sp, sp, #8 ; CHECK-NEXT: vmov.u16 r1, d0[1] -; CHECK-NEXT: and r12, r0, #3 +; CHECK-NEXT: and r0, r0, #3 ; CHECK-NEXT: vmov.u16 r2, d0[2] -; CHECK-NEXT: mov r0, sp -; CHECK-NEXT: vmov.u16 r3, d0[3] -; CHECK-NEXT: orr r0, r0, r12, lsl #1 +; CHECK-NEXT: mov r3, sp +; CHECK-NEXT: vmov.u16 r12, d0[3] +; CHECK-NEXT: orr r0, r3, r0, lsl #1 ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] ; CHECK-NEXT: vldr d0, [sp] ; CHECK-NEXT: vmov.16 d0[1], r1 ; CHECK-NEXT: vmov.16 d0[2], r2 -; CHECK-NEXT: vmov.16 d0[3], r3 +; CHECK-NEXT: vmov.16 d0[3], r12 ; CHECK-NEXT: add sp, sp, #8 ; CHECK-NEXT: bx lr %tmp = extractelement <8 x i16> %x, i32 0 diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll index a125446b27c3..8be7100d368b 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll @@ -766,85 +766,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: move $2, $6 -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 76($sp) -; MMR3-NEXT: srlv $3, $7, $16 -; MMR3-NEXT: not16 $6, $16 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill -; MMR3-NEXT: 
sll16 $2, $2, 1 -; MMR3-NEXT: sllv $2, $2, $6 -; MMR3-NEXT: li16 $6, 64 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: srlv $4, $4, $16 -; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill -; MMR3-NEXT: subu16 $7, $6, $16 +; MMR3-NEXT: srlv $4, $7, $16 +; MMR3-NEXT: not16 $3, $16 +; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: sll16 $2, $6, 1 +; MMR3-NEXT: sllv $3, $2, $3 +; MMR3-NEXT: li16 $2, 64 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: srlv $6, $6, $16 +; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: subu16 $7, $2, $16 ; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: andi16 $5, $7, 32 -; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $6, $16, 32 -; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $3, $9 +; MMR3-NEXT: andi16 $2, $7, 32 +; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $5, $16, 32 +; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $3, $17, $5 -; MMR3-NEXT: movn $2, $4, $6 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $4, $17, $4 -; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $6, 1 -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $3 -; MMR3-NEXT: addiu $3, $16, -64 -; MMR3-NEXT: srav $1, $6, $3 -; MMR3-NEXT: andi16 $3, $3, 32 -; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $3 -; MMR3-NEXT: sllv $3, $6, $7 -; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill -; MMR3-NEXT: not16 $3, $7 -; MMR3-NEXT: srl16 $4, $17, 1 -; MMR3-NEXT: srlv $3, $4, $3 +; MMR3-NEXT: movn $4, $17, $2 +; MMR3-NEXT: movn $3, $6, $5 +; MMR3-NEXT: addiu $2, $16, -64 +; MMR3-NEXT: lw 
$5, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $5, $5, $2 +; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $17, 1 +; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: not16 $5, $2 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $4 +; MMR3-NEXT: srav $1, $17, $2 +; MMR3-NEXT: andi16 $2, $2, 32 +; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $2 +; MMR3-NEXT: sllv $2, $17, $7 +; MMR3-NEXT: not16 $4, $7 +; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $6, $7, 1 +; MMR3-NEXT: srlv $6, $6, $4 ; MMR3-NEXT: sltiu $10, $16, 64 -; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $5, $3, $10 +; MMR3-NEXT: or16 $6, $2 +; MMR3-NEXT: srlv $2, $7, $16 +; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: sllv $3, $4, $3 ; MMR3-NEXT: or16 $3, $2 -; MMR3-NEXT: srlv $2, $17, $16 -; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: sllv $17, $7, $4 -; MMR3-NEXT: or16 $17, $2 -; MMR3-NEXT: srav $11, $6, $16 -; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $17, $11, $2 -; MMR3-NEXT: sra $2, $6, 31 +; MMR3-NEXT: srav $11, $17, $16 +; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $11, $4 +; MMR3-NEXT: sra $2, $17, 31 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: movn $4, $17, $10 -; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $9, $6 -; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $3 +; MMR3-NEXT: move $8, $2 +; MMR3-NEXT: movn $8, $3, $10 +; MMR3-NEXT: lw 
$3, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $6, $9, $3 +; MMR3-NEXT: li16 $3, 0 +; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $7, $3, $4 +; MMR3-NEXT: or16 $7, $6 ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload ; MMR3-NEXT: movn $1, $2, $3 ; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $3, $16 -; MMR3-NEXT: movn $11, $2, $6 +; MMR3-NEXT: movn $11, $2, $4 ; MMR3-NEXT: movn $2, $11, $10 -; MMR3-NEXT: move $3, $4 +; MMR3-NEXT: move $3, $8 ; MMR3-NEXT: move $4, $1 ; MMR3-NEXT: lwp $16, 40($sp) ; MMR3-NEXT: addiusp 48 @@ -858,80 +852,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 -; MMR6-NEXT: move $12, $7 +; MMR6-NEXT: move $1, $7 ; MMR6-NEXT: lw $3, 44($sp) ; MMR6-NEXT: li16 $2, 64 -; MMR6-NEXT: subu16 $16, $2, $3 -; MMR6-NEXT: sllv $1, $5, $16 -; MMR6-NEXT: andi16 $2, $16, 32 -; MMR6-NEXT: selnez $8, $1, $2 -; MMR6-NEXT: sllv $9, $4, $16 -; MMR6-NEXT: not16 $16, $16 -; MMR6-NEXT: srl16 $17, $5, 1 -; MMR6-NEXT: srlv $10, $17, $16 -; MMR6-NEXT: or $9, $9, $10 -; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $8, $8, $9 -; MMR6-NEXT: srlv $9, $7, $3 -; MMR6-NEXT: not16 $7, $3 -; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: subu16 $7, $2, $3 +; MMR6-NEXT: sllv $8, $5, $7 +; MMR6-NEXT: andi16 $2, $7, 32 +; MMR6-NEXT: selnez $9, $8, $2 +; MMR6-NEXT: sllv $10, $4, $7 +; MMR6-NEXT: not16 $7, $7 +; MMR6-NEXT: srl16 $16, $5, 1 +; MMR6-NEXT: srlv $7, $16, $7 +; MMR6-NEXT: or $7, $10, $7 +; MMR6-NEXT: seleqz $7, $7, $2 +; MMR6-NEXT: or $7, $9, $7 +; MMR6-NEXT: srlv $9, $1, $3 +; MMR6-NEXT: not16 $16, $3 +; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $17, $6, 1 -; MMR6-NEXT: sllv $10, $17, $7 +; MMR6-NEXT: sllv $10, $17, $16 ; MMR6-NEXT: or $9, $10, $9 ; MMR6-NEXT: andi16 $17, $3, 32 ; MMR6-NEXT: seleqz $9, $9, 
$17 ; MMR6-NEXT: srlv $10, $6, $3 ; MMR6-NEXT: selnez $11, $10, $17 ; MMR6-NEXT: seleqz $10, $10, $17 -; MMR6-NEXT: or $8, $10, $8 -; MMR6-NEXT: seleqz $1, $1, $2 -; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: or $10, $10, $7 +; MMR6-NEXT: seleqz $12, $8, $2 +; MMR6-NEXT: or $8, $11, $9 ; MMR6-NEXT: addiu $2, $3, -64 -; MMR6-NEXT: srlv $10, $5, $2 +; MMR6-NEXT: srlv $9, $5, $2 ; MMR6-NEXT: sll16 $7, $4, 1 ; MMR6-NEXT: not16 $16, $2 ; MMR6-NEXT: sllv $11, $7, $16 ; MMR6-NEXT: sltiu $13, $3, 64 -; MMR6-NEXT: or $1, $9, $1 -; MMR6-NEXT: selnez $8, $8, $13 -; MMR6-NEXT: or $9, $11, $10 -; MMR6-NEXT: srav $10, $4, $2 +; MMR6-NEXT: or $8, $8, $12 +; MMR6-NEXT: selnez $10, $10, $13 +; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: srav $11, $4, $2 ; MMR6-NEXT: andi16 $2, $2, 32 -; MMR6-NEXT: seleqz $11, $10, $2 +; MMR6-NEXT: seleqz $12, $11, $2 ; MMR6-NEXT: sra $14, $4, 31 ; MMR6-NEXT: selnez $15, $14, $2 ; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $11, $15, $11 -; MMR6-NEXT: seleqz $11, $11, $13 -; MMR6-NEXT: selnez $2, $10, $2 -; MMR6-NEXT: seleqz $10, $14, $13 -; MMR6-NEXT: or $8, $8, $11 -; MMR6-NEXT: selnez $8, $8, $3 -; MMR6-NEXT: selnez $1, $1, $13 +; MMR6-NEXT: or $12, $15, $12 +; MMR6-NEXT: seleqz $12, $12, $13 +; MMR6-NEXT: selnez $2, $11, $2 +; MMR6-NEXT: seleqz $11, $14, $13 +; MMR6-NEXT: or $10, $10, $12 +; MMR6-NEXT: selnez $10, $10, $3 +; MMR6-NEXT: selnez $8, $8, $13 ; MMR6-NEXT: or $2, $2, $9 ; MMR6-NEXT: srav $9, $4, $3 ; MMR6-NEXT: seleqz $4, $9, $17 -; MMR6-NEXT: selnez $11, $14, $17 -; MMR6-NEXT: or $4, $11, $4 -; MMR6-NEXT: selnez $11, $4, $13 +; MMR6-NEXT: selnez $12, $14, $17 +; MMR6-NEXT: or $4, $12, $4 +; MMR6-NEXT: selnez $12, $4, $13 ; MMR6-NEXT: seleqz $2, $2, $13 ; MMR6-NEXT: seleqz $4, $6, $3 -; MMR6-NEXT: seleqz $6, $12, $3 +; MMR6-NEXT: seleqz $1, $1, $3 +; MMR6-NEXT: or $2, $8, $2 +; MMR6-NEXT: selnez $2, $2, $3 ; MMR6-NEXT: or $1, $1, $2 -; MMR6-NEXT: selnez $1, $1, $3 -; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: or $4, $4, $8 -; MMR6-NEXT: or $6, 
$11, $10 -; MMR6-NEXT: srlv $2, $5, $3 -; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $3, $7, $3 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: seleqz $2, $2, $17 -; MMR6-NEXT: selnez $3, $9, $17 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: selnez $2, $2, $13 -; MMR6-NEXT: or $3, $2, $10 -; MMR6-NEXT: move $2, $6 +; MMR6-NEXT: or $4, $4, $10 +; MMR6-NEXT: or $2, $12, $11 +; MMR6-NEXT: srlv $3, $5, $3 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $5, $7, $5 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: seleqz $3, $3, $17 +; MMR6-NEXT: selnez $5, $9, $17 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: selnez $3, $3, $13 +; MMR6-NEXT: or $3, $3, $11 ; MMR6-NEXT: move $5, $1 ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll index e4b4b3ae1d0f..ed2bfc9fcf60 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll @@ -776,77 +776,76 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 68($sp) ; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: subu16 $17, $2, $16 -; MMR3-NEXT: sllv $9, $5, $17 -; MMR3-NEXT: andi16 $3, $17, 32 +; MMR3-NEXT: subu16 $7, $2, $16 +; MMR3-NEXT: sllv $9, $5, $7 +; MMR3-NEXT: move $17, $5 +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $3, $7, 32 ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill ; MMR3-NEXT: li16 $2, 0 ; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: movn $4, $2, $3 -; MMR3-NEXT: srlv $5, $7, $16 +; MMR3-NEXT: srlv $5, $8, $16 ; MMR3-NEXT: not16 $3, $16 ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill ; MMR3-NEXT: sll16 $2, $6, 1 -; MMR3-NEXT: 
sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sllv $2, $2, $3 ; MMR3-NEXT: or16 $2, $5 -; MMR3-NEXT: srlv $7, $6, $16 +; MMR3-NEXT: srlv $5, $6, $16 +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill ; MMR3-NEXT: andi16 $3, $16, 32 ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $2, $7, $3 +; MMR3-NEXT: movn $2, $5, $3 ; MMR3-NEXT: addiu $3, $16, -64 ; MMR3-NEXT: or16 $2, $4 -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $3, $6, $3 -; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $3, 1 -; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $4 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: srlv $1, $3, $4 -; MMR3-NEXT: andi16 $4, $4, 32 +; MMR3-NEXT: srlv $4, $17, $3 ; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $4 +; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $4, 1 +; MMR3-NEXT: not16 $5, $3 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $17 +; MMR3-NEXT: srlv $1, $4, $3 +; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $3 ; MMR3-NEXT: sltiu $10, $16, 64 ; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: sllv $2, $3, $17 -; MMR3-NEXT: not16 $3, $17 -; MMR3-NEXT: srl16 $4, $6, 1 +; MMR3-NEXT: sllv $2, $4, $7 +; MMR3-NEXT: not16 $3, $7 +; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $4, $7, 1 ; MMR3-NEXT: srlv $4, $4, $3 ; MMR3-NEXT: or16 $4, $2 -; MMR3-NEXT: srlv $2, $6, $16 +; MMR3-NEXT: srlv $2, $7, $16 ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload ; MMR3-NEXT: sllv $3, $6, $3 ; MMR3-NEXT: or16 $3, $2 ; MMR3-NEXT: lw $2, 28($sp) # 4-byte 
Folded Reload ; MMR3-NEXT: srlv $2, $2, $16 -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $2, $6 +; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $2, $17 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movz $3, $17, $10 -; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $4, $9, $17 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $4 +; MMR3-NEXT: li16 $6, 0 +; MMR3-NEXT: movz $3, $6, $10 +; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $4, $9, $7 +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $6, $7, $17 +; MMR3-NEXT: or16 $6, $4 ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $1, $17, $4 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $1, $7, $10 +; MMR3-NEXT: movn $1, $7, $4 +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $1, $6, $10 ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $4, $16 -; MMR3-NEXT: movn $2, $17, $6 +; MMR3-NEXT: movn $2, $7, $17 ; MMR3-NEXT: li16 $4, 0 ; MMR3-NEXT: movz $2, $4, $10 ; MMR3-NEXT: move $4, $1 @@ -856,91 +855,98 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; ; MMR6-LABEL: lshr_i128: ; MMR6: # %bb.0: # %entry -; MMR6-NEXT: addiu $sp, $sp, -24 -; MMR6-NEXT: .cfi_def_cfa_offset 24 -; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill -; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill +; MMR6-NEXT: addiu $sp, $sp, -32 +; MMR6-NEXT: .cfi_def_cfa_offset 32 +; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill +; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 ; MMR6-NEXT: move $1, $7 -; MMR6-NEXT: move $7, $4 -; MMR6-NEXT: lw $3, 52($sp) +; MMR6-NEXT: move $7, $5 +; MMR6-NEXT: lw $3, 60($sp) ; MMR6-NEXT: srlv $2, $1, $3 -; MMR6-NEXT: not16 $16, $3 -; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill -; MMR6-NEXT: move 
$4, $6 -; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $5, $3 +; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: move $17, $6 +; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $6, $6, 1 -; MMR6-NEXT: sllv $6, $6, $16 +; MMR6-NEXT: sllv $6, $6, $5 ; MMR6-NEXT: or $8, $6, $2 -; MMR6-NEXT: addiu $6, $3, -64 -; MMR6-NEXT: srlv $9, $5, $6 -; MMR6-NEXT: sll16 $2, $7, 1 -; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill -; MMR6-NEXT: not16 $16, $6 +; MMR6-NEXT: addiu $5, $3, -64 +; MMR6-NEXT: srlv $9, $7, $5 +; MMR6-NEXT: move $6, $4 +; MMR6-NEXT: sll16 $2, $4, 1 +; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $5 ; MMR6-NEXT: sllv $10, $2, $16 ; MMR6-NEXT: andi16 $16, $3, 32 ; MMR6-NEXT: seleqz $8, $8, $16 ; MMR6-NEXT: or $9, $10, $9 -; MMR6-NEXT: srlv $10, $4, $3 +; MMR6-NEXT: srlv $10, $17, $3 ; MMR6-NEXT: selnez $11, $10, $16 ; MMR6-NEXT: li16 $17, 64 ; MMR6-NEXT: subu16 $2, $17, $3 -; MMR6-NEXT: sllv $12, $5, $2 +; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: move $17, $7 ; MMR6-NEXT: andi16 $4, $2, 32 -; MMR6-NEXT: andi16 $17, $6, 32 -; MMR6-NEXT: seleqz $9, $9, $17 +; MMR6-NEXT: andi16 $7, $5, 32 +; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill +; MMR6-NEXT: seleqz $9, $9, $7 ; MMR6-NEXT: seleqz $13, $12, $4 ; MMR6-NEXT: or $8, $11, $8 ; MMR6-NEXT: selnez $11, $12, $4 -; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: sllv $12, $6, $2 +; MMR6-NEXT: move $7, $6 +; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: not16 $2, $2 -; MMR6-NEXT: srl16 $6, $5, 1 +; MMR6-NEXT: srl16 $6, $17, 1 ; MMR6-NEXT: srlv $2, $6, $2 ; MMR6-NEXT: or $2, $12, $2 ; MMR6-NEXT: seleqz $2, $2, $4 -; MMR6-NEXT: addiu $4, $3, -64 -; MMR6-NEXT: srlv $4, $7, $4 -; MMR6-NEXT: or $12, $11, $2 -; MMR6-NEXT: or $6, $8, $13 -; MMR6-NEXT: srlv $5, $5, $3 -; MMR6-NEXT: selnez $8, $4, $17 -; MMR6-NEXT: sltiu $11, $3, 64 -; MMR6-NEXT: selnez $13, $6, $11 -; MMR6-NEXT: or $8, $8, $9 +; MMR6-NEXT: srlv $4, $7, $5 +; 
MMR6-NEXT: or $11, $11, $2 +; MMR6-NEXT: or $5, $8, $13 +; MMR6-NEXT: srlv $6, $17, $3 +; MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: selnez $7, $4, $2 +; MMR6-NEXT: sltiu $8, $3, 64 +; MMR6-NEXT: selnez $12, $5, $8 +; MMR6-NEXT: or $7, $7, $9 +; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $9, $6, $2 +; MMR6-NEXT: sllv $9, $2, $5 ; MMR6-NEXT: seleqz $10, $10, $16 -; MMR6-NEXT: li16 $2, 0 -; MMR6-NEXT: or $10, $10, $12 -; MMR6-NEXT: or $9, $9, $5 -; MMR6-NEXT: seleqz $5, $8, $11 -; MMR6-NEXT: seleqz $8, $2, $11 -; MMR6-NEXT: srlv $7, $7, $3 -; MMR6-NEXT: seleqz $2, $7, $16 -; MMR6-NEXT: selnez $2, $2, $11 +; MMR6-NEXT: li16 $5, 0 +; MMR6-NEXT: or $10, $10, $11 +; MMR6-NEXT: or $6, $9, $6 +; MMR6-NEXT: seleqz $2, $7, $8 +; MMR6-NEXT: seleqz $7, $5, $8 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: srlv $9, $5, $3 +; MMR6-NEXT: seleqz $11, $9, $16 +; MMR6-NEXT: selnez $11, $11, $8 ; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $5, $13, $5 -; MMR6-NEXT: selnez $5, $5, $3 -; MMR6-NEXT: or $5, $1, $5 -; MMR6-NEXT: or $2, $8, $2 -; MMR6-NEXT: seleqz $1, $9, $16 -; MMR6-NEXT: selnez $6, $7, $16 -; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $7, $7, $3 -; MMR6-NEXT: selnez $9, $10, $11 -; MMR6-NEXT: seleqz $4, $4, $17 -; MMR6-NEXT: seleqz $4, $4, $11 -; MMR6-NEXT: or $4, $9, $4 +; MMR6-NEXT: or $2, $12, $2 +; MMR6-NEXT: selnez $2, $2, $3 +; MMR6-NEXT: or $5, $1, $2 +; MMR6-NEXT: or $2, $7, $11 +; MMR6-NEXT: seleqz $1, $6, $16 +; MMR6-NEXT: selnez $6, $9, $16 +; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $9, $16, $3 +; MMR6-NEXT: selnez $10, $10, $8 +; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $4, $4, $16 +; MMR6-NEXT: seleqz $4, $4, $8 +; MMR6-NEXT: or $4, $10, $4 ; MMR6-NEXT: selnez $3, $4, $3 -; MMR6-NEXT: or $4, $7, $3 +; 
MMR6-NEXT: or $4, $9, $3 ; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: selnez $1, $1, $11 -; MMR6-NEXT: or $3, $8, $1 -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: addiu $sp, $sp, 24 +; MMR6-NEXT: selnez $1, $1, $8 +; MMR6-NEXT: or $3, $7, $1 +; MMR6-NEXT: lw $16, 24($sp) # 4-byte Folded Reload +; MMR6-NEXT: lw $17, 28($sp) # 4-byte Folded Reload +; MMR6-NEXT: addiu $sp, $sp, 32 ; MMR6-NEXT: jrc $ra </cut>

4 years, 8 months


[ACTIVITY] report week ending 1 Oct

by Peter Maydell

Progress
* UM-2 [QEMU upstream maintainership]
  + Worked through my code-review backlog
  + Noticed that we never got round to making our emulated GICv3 support having redistributors in more than one contiguous region; this prevents using more than 123 CPUs with the virt board. Sent out a patchset which adds the necessary handling.
  + Generally trying to tie off loose ends pre-holiday :-)

-- PMM
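
For readers wondering where a limit like 123 CPUs could come from, here is a minimal arithmetic sketch. It assumes that the virt board's single redistributor region is 0x00F60000 bytes and that each CPU's GICv3 redistributor occupies two 64 KiB frames (0x20000 bytes). Both constants are recalled from QEMU's sources, are not stated in the report above, and should be treated as assumptions.

```shell
#!/bin/sh
# Assumed values (not taken from the report above):
REDIST_REGION_SIZE=$((0x00F60000))  # size of the one contiguous redistributor region
REDIST_PER_CPU_SIZE=$((0x20000))    # two 64 KiB frames per CPU redistributor

# How many per-CPU redistributors fit in one region?
# $1 = region size in bytes, $2 = per-CPU redistributor size in bytes
max_cpus_in_region() {
    echo $(( $1 / $2 ))
}

max_cpus_in_region "$REDIST_REGION_SIZE" "$REDIST_PER_CPU_SIZE"   # prints 123
```

Under these assumptions, going past 123 CPUs needs a second redistributor region, which would match the non-contiguous handling the report describes.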

4 years, 8 months


test-mail

by Prasanth Nair

test-mail

4 years, 8 months


[TCWG CI] Regression caused by linux:30f349097897c115345beabeecc5e710b479ff1e

by ci_notify＠linaro.org

Identified regression caused by *linux:30f349097897c115345beabeecc5e710b479ff1e*: commit 30f349097897c115345beabeecc5e710b479ff1e Merge: 9c566611ac5c f76c87e8c337 Author: Linus Torvalds <torvalds(a)linux-foundation.org> Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Results regressed to (for first_bad == 30f349097897c115345beabeecc5e710b479ff1e) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21782 # First few build errors in logs: from (for last_good == 9c566611ac5cc7b45af943632f7a9b1b6a642991) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 29893 # linux build successful: all This commit has regressed these CI configurations: - tcwg_kernel/gnu-release-arm-mainline-allmodconfig Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… Reproduce builds: <cut> mkdir investigate-linux-30f349097897c115345beabeecc5e710b479ff1e cd investigate-linux-30f349097897c115345beabeecc5e710b479ff1e # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save 
baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/ cd linux # Reproduce first_bad build git checkout --detach 30f349097897c115345beabeecc5e710b479ff1e ../artifacts/test.sh # Reproduce last_good build git checkout --detach 9c566611ac5cc7b45af943632f7a9b1b6a642991 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 30f349097897c115345beabeecc5e710b479ff1e Merge: 9c566611ac5c f76c87e8c337 Author: Linus Torvalds <torvalds(a)linux-foundation.org> Date: Wed Sep 8 16:38:25 2021 -0700 Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These are mostly ARM cpufreq driver updates, including one new MediaTek driver that has just passed all of the reviews, with the addition of a revert of a recent intel_pstate commit, some core cpufreq changes and a DT-related update of the operating performance points (OPP) support code. Specifics: - Add new cpufreq driver for the MediaTek MT6779 platform called mediatek-hw along with corresponding DT bindings (Hector.Yuan). - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara Gopinath). - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu policy flag (Taniya Das). - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn Andersson). - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV flag (Viresh Kumar). - Add new cpufreq driver callback to allow drivers to register with the Energy Model in a consistent way and make several drivers use it (Viresh Kumar). - Change the remaining users of the .ready() cpufreq driver callback to move the code from it elsewhere and drop it from the cpufreq core (Viresh Kumar). 
- Revert recent intel_pstate change adding HWP guaranteed performance change notification support to it that led to problems, because the notification in question is triggered prematurely on some systems (Rafael Wysocki). - Convert the OPP DT bindings to DT schema and clean them up while at it (Rob Herring)" * tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification" cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW cpufreq: Remove ready() callback cpufreq: sh: Remove sh_cpufreq_cpu_ready() cpufreq: acpi: Remove acpi_cpufreq_cpu_ready() cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support cpufreq: scmi: Use .register_em() to register with energy model cpufreq: vexpress: Use .register_em() to register with energy model cpufreq: scpi: Use .register_em() to register with energy model dt-bindings: opp: Convert to DT schema dt-bindings: Clean-up OPP binding node names in examples ARM: dts: omap: Drop references to opp.txt cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model cpufreq: omap: Use .register_em() to register with energy model cpufreq: mediatek: Use .register_em() to register with energy model cpufreq: imx6q: Use .register_em() to register with energy model ... 
Documentation/cpu-freq/cpu-drivers.rst | 3 - .../devicetree/bindings/cpufreq/cpufreq-dt.txt | 2 +- .../bindings/cpufreq/cpufreq-mediatek-hw.yaml | 70 +++ .../bindings/cpufreq/cpufreq-mediatek.txt | 2 +- .../devicetree/bindings/cpufreq/cpufreq-st.txt | 6 +- .../bindings/cpufreq/nvidia,tegra20-cpufreq.txt | 2 +- .../devicetree/bindings/devfreq/rk3399_dmc.txt | 2 +- .../devicetree/bindings/gpu/arm,mali-bifrost.yaml | 2 +- .../devicetree/bindings/gpu/arm,mali-midgard.yaml | 2 +- .../bindings/interconnect/fsl,imx8m-noc.yaml | 4 +- .../opp/allwinner,sun50i-h6-operating-points.yaml | 4 + Documentation/devicetree/bindings/opp/opp-v1.yaml | 51 ++ .../devicetree/bindings/opp/opp-v2-base.yaml | 214 +++++++ Documentation/devicetree/bindings/opp/opp-v2.yaml | 475 ++++++++++++++++ Documentation/devicetree/bindings/opp/opp.txt | 622 --------------------- Documentation/devicetree/bindings/opp/qcom-opp.txt | 2 +- .../bindings/opp/ti-omap5-opp-supply.txt | 2 +- .../devicetree/bindings/power/power-domain.yaml | 2 +- .../translations/zh_CN/cpu-freq/cpu-drivers.rst | 2 - arch/arm/boot/dts/omap34xx.dtsi | 1 - arch/arm/boot/dts/omap36xx.dtsi | 1 - drivers/base/arch_topology.c | 2 + drivers/cpufreq/Kconfig.arm | 12 + drivers/cpufreq/Makefile | 1 + drivers/cpufreq/acpi-cpufreq.c | 14 +- drivers/cpufreq/cpufreq-dt-platdev.c | 4 + drivers/cpufreq/cpufreq-dt.c | 3 +- drivers/cpufreq/cpufreq.c | 17 +- drivers/cpufreq/imx6q-cpufreq.c | 2 +- drivers/cpufreq/intel_pstate.c | 39 -- drivers/cpufreq/mediatek-cpufreq-hw.c | 308 ++++++++++ drivers/cpufreq/mediatek-cpufreq.c | 3 +- drivers/cpufreq/omap-cpufreq.c | 2 +- drivers/cpufreq/qcom-cpufreq-hw.c | 151 ++++- drivers/cpufreq/scmi-cpufreq.c | 65 ++- drivers/cpufreq/scpi-cpufreq.c | 3 +- drivers/cpufreq/sh-cpufreq.c | 11 - drivers/cpufreq/vexpress-spc-cpufreq.c | 25 +- include/linux/cpufreq.h | 75 ++- 39 files changed, 1441 insertions(+), 767 deletions(-) </cut>
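
The report above lists no build errors for the first_bad commit, so the only quantitative signal is the drop in linux_n_obj. A small sketch of how the two counts compare, using only the figures quoted in the report (the helper names are made up for illustration):

```shell
#!/bin/sh
# Object counts copied from the report above.
LAST_GOOD_N_OBJ=29893
FIRST_BAD_N_OBJ=21782

# Number of objects that were no longer built: $1 = last_good, $2 = first_bad
objs_missing() {
    echo $(( $1 - $2 ))
}

# Integer percentage (rounded down) of last_good objects still built
percent_built() {
    echo $(( $2 * 100 / $1 ))
}

objs_missing "$LAST_GOOD_N_OBJ" "$FIRST_BAD_N_OBJ"    # prints 8111
percent_built "$LAST_GOOD_N_OBJ" "$FIRST_BAD_N_OBJ"   # prints 72
```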

4 years, 9 months


[TCWG CI] Regression caused by linux: scripts/gcc-plugins: consistently use HOSTCC

by ci_notify＠linaro.org

[TCWG CI] Regression caused by linux: scripts/gcc-plugins: consistently use HOSTCC:
commit e554fdf7141e9edc05e7ece258f45b471af87494
Author: Ross Burton <ross.burton(a)arm.com>

    scripts/gcc-plugins: consistently use HOSTCC

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 21723
# First few build errors in logs:
# 00:02:07 drivers/char/ipmi/ipmi_msghandler.c:4880:1: error: the frame size of 1232 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
# 00:02:07 make[2]: *** [drivers/char/ipmi/ipmi_msghandler.o] Error 1
# 00:02:43 make[1]: *** [drivers/char/ipmi] Error 2
# 00:03:47 lib/crypto/curve25519-fiat32.c:864:1: error: the frame size of 1288 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
# 00:03:47 make[2]: *** [lib/crypto/curve25519-fiat32.o] Error 1
# 00:03:54 make[1]: *** [lib/crypto] Error 2
# 00:03:54 fs/reiserfs/namei.c:1646:1: error: the frame size of 1176 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
# 00:03:54 make[2]: *** [fs/reiserfs/namei.o] Error 1
# 00:04:32 drivers/tty/serial/8250/8250_aspeed_vuart.c:568:1: error: the frame size of 1048 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
# 00:04:32 make[4]: *** [drivers/tty/serial/8250/8250_aspeed_vuart.o] Error 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 29916
# linux build successful: all

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations: - tcwg_kernel/gnu-release-arm-next-allmodconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… Reproduce builds: <cut> mkdir investigate-linux-e554fdf7141e9edc05e7ece258f45b471af87494 cd investigate-linux-e554fdf7141e9edc05e7ece258f45b471af87494 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/ cd linux # Reproduce first_bad build git checkout --detach e554fdf7141e9edc05e7ece258f45b471af87494 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 86455276585996fe5b43972aa8f31afcbafabc40 ../artifacts/test.sh cd .. 
</cut> Full commit (up to 1000 lines): <cut> commit e554fdf7141e9edc05e7ece258f45b471af87494 Author: Ross Burton <ross.burton(a)arm.com> Date: Thu Sep 23 16:28:11 2021 +0100 scripts/gcc-plugins: consistently use HOSTCC The GCC plugins are built using HOSTCC, but the path to the GCC plugins headers is obtained using CC. This can lead to interesting failures if the host compiler and cross compiler are different versions, and the host compiler uses the cross headers. Signed-off-by: Ross Burton <ross.burton(a)arm.com> Signed-off-by: Kees Cook <keescook(a)chromium.org> Link: https://lore.kernel.org/r/20210923152811.406516-1-ross.burton@arm.com --- scripts/gcc-plugins/Kconfig | 2 +- scripts/gcc-plugins/Makefile | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/gcc-plugins/Kconfig b/scripts/gcc-plugins/Kconfig index ab9eb4cbe33a..5dad6d780138 100644 --- a/scripts/gcc-plugins/Kconfig +++ b/scripts/gcc-plugins/Kconfig @@ -9,7 +9,7 @@ menuconfig GCC_PLUGINS bool "GCC plugins" depends on HAVE_GCC_PLUGINS depends on CC_IS_GCC - depends on $(success,test -e $(shell,$(CC) -print-file-name=plugin)/include/plugin-version.h) + depends on $(success,test -e $(shell,$(HOSTCC) -print-file-name=plugin)/include/plugin-version.h) default y help GCC plugins are loadable modules that provide extra features to the diff --git a/scripts/gcc-plugins/Makefile b/scripts/gcc-plugins/Makefile index 1952d3bb80c6..6aac404344a6 100644 --- a/scripts/gcc-plugins/Makefile +++ b/scripts/gcc-plugins/Makefile @@ -19,7 +19,7 @@ targets += randomize_layout_seed.h randomize_layout_hash.h always-y += $(GCC_PLUGIN) -GCC_PLUGINS_DIR = $(shell $(CC) -print-file-name=plugin) +GCC_PLUGINS_DIR = $(shell $(HOSTCXX) -print-file-name=plugin) plugin_cxxflags = -Wp,-MMD,$(depfile) $(KBUILD_HOSTCXXFLAGS) -fPIC \ -include $(srctree)/include/linux/compiler-version.h \ </cut>
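
The commit message above turns on which compiler is asked for the plugin directory. A minimal sketch for checking this by hand on a build machine: it uses the same `-print-file-name=plugin` query and the same `plugin-version.h` test that appear in the diff. The function names and the compiler names in the usage comment are placeholders, not something taken from the CI setup.

```shell
#!/bin/sh
# Print the plugin directory that a given compiler reports.
# $1 = compiler command (host or cross)
plugin_dir() {
    "$1" -print-file-name=plugin
}

# Mirror the Kconfig test from the diff: does that directory
# contain include/plugin-version.h?
has_plugin_headers() {
    test -e "$(plugin_dir "$1")/include/plugin-version.h"
}

# Report, for a host compiler and a cross compiler, whether each
# one can see GCC plugin headers.
compare_plugin_dirs() {
    for cc in "$1" "$2"; do
        if has_plugin_headers "$cc"; then
            echo "$cc: plugin headers found in $(plugin_dir "$cc")"
        else
            echo "$cc: no plugin headers"
        fi
    done
}

# Usage (compiler names are hypothetical):
#   compare_plugin_dirs gcc arm-linux-gnueabihf-gcc
```

If the two compilers give different answers, the commit changes whether `GCC_PLUGINS` can be enabled, which would change what an allmodconfig build compiles. Whether that is what happened in this CI run is not established by the report.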

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-release-aarch64-next-allnoconfig - Build # 10 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-release-aarch64-next-allnoconfig. So far, this commit has regressed CI configurations: - tcwg_kernel/llvm-release-aarch64-next-allnoconfig Culprit: <cut> commit 8633ef82f101c040427b57d4df7b706261420b94 Author: Javier Martinez Canillas <javierm(a)redhat.com> Date: Fri Jun 25 15:13:59 2021 +0200 drivers/firmware: consolidate EFI framebuffer setup for all arches The register_gop_device() function registers an "efi-framebuffer" platform device to match against the efifb driver, to have an early framebuffer for EFI platforms. But there is already support to do exactly the same by the Generic System Framebuffers (sysfb) driver. This used to be only for X86 but it has been moved to drivers/firmware and could be reused by other architectures. Also, besides supporting registering an "efi-framebuffer", this driver can register a "simple-framebuffer" allowing to use the siple{fb,drm} drivers on non-X86 EFI platforms. For example, on aarch64 these drivers can only be used with DT and doesn't have code to register a "simple-frambuffer" platform device when booting with EFI. For these reasons, let's remove the register_gop_device() duplicated code and instead move the platform specific logic that's there to sysfb driver. 
Signed-off-by: Javier Martinez Canillas <javierm(a)redhat.com> Acked-by: Borislav Petkov <bp(a)suse.de> Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch> Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210625131359.1804394-1-javi… </cut> Results regressed to (for first_bad == 8633ef82f101c040427b57d4df7b706261420b94) # reset_artifacts: -10 # build_abe binutils: -9 # build_llvm: -5 # build_abe qemu: -2 # linux_n_obj: 600 # First few build errors in logs: # 00:00:38 ld.lld: error: undefined symbol: screen_info # 00:00:38 make: *** [vmlinux] Error 1 from (for last_good == d391c58271072d0b0fad93c82018d495b2633448) # reset_artifacts: -10 # build_abe binutils: -9 # build_llvm: -5 # build_abe qemu: -2 # linux_n_obj: 601 # linux build successful: all # linux boot successful: boot Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… Configuration details: rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#ff11764…" Reproduce builds: <cut> mkdir investigate-linux-8633ef82f101c040427b57d4df7b706261420b94 cd investigate-linux-8633ef82f101c040427b57d4df7b706261420b94 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) 
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/ cd linux # Reproduce first_bad build git checkout --detach 8633ef82f101c040427b57d4df7b706261420b94 ../artifacts/test.sh # Reproduce last_good build git checkout --detach d391c58271072d0b0fad93c82018d495b2633448 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… Full commit (up to 1000 lines): <cut> commit 8633ef82f101c040427b57d4df7b706261420b94 Author: Javier Martinez Canillas <javierm(a)redhat.com> Date: Fri Jun 25 15:13:59 2021 +0200 drivers/firmware: consolidate EFI framebuffer setup for all arches The register_gop_device() function registers an "efi-framebuffer" platform device to match against the efifb driver, to have an early framebuffer for EFI platforms. But there is already support to do exactly the same by the Generic System Framebuffers (sysfb) driver. This used to be only for X86 but it has been moved to drivers/firmware and could be reused by other architectures. Also, besides supporting registering an "efi-framebuffer", this driver can register a "simple-framebuffer" allowing to use the siple{fb,drm} drivers on non-X86 EFI platforms. For example, on aarch64 these drivers can only be used with DT and doesn't have code to register a "simple-frambuffer" platform device when booting with EFI. For these reasons, let's remove the register_gop_device() duplicated code and instead move the platform specific logic that's there to sysfb driver. 
Signed-off-by: Javier Martinez Canillas <javierm(a)redhat.com> Acked-by: Borislav Petkov <bp(a)suse.de> Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch> Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210625131359.1804394-1-javi… --- arch/arm/include/asm/efi.h | 5 +-- arch/arm64/include/asm/efi.h | 5 +-- arch/riscv/include/asm/efi.h | 5 +-- drivers/firmware/Kconfig | 8 ++-- drivers/firmware/Makefile | 2 +- drivers/firmware/efi/efi-init.c | 90 --------------------------------------- drivers/firmware/efi/sysfb_efi.c | 76 ++++++++++++++++++++++++++++++++- drivers/firmware/sysfb.c | 35 ++++++++++----- drivers/firmware/sysfb_simplefb.c | 31 ++++++++++---- drivers/gpu/drm/tiny/Kconfig | 4 +- include/linux/sysfb.h | 26 +++++------ 11 files changed, 143 insertions(+), 144 deletions(-) diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h index 9de7ab2ce05d..a6f3b179e8a9 100644 --- a/arch/arm/include/asm/efi.h +++ b/arch/arm/include/asm/efi.h @@ -17,6 +17,7 @@ #ifdef CONFIG_EFI void efi_init(void); +extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt); int efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md); int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md); @@ -52,10 +53,6 @@ void efi_virtmap_unload(void); struct screen_info *alloc_screen_info(void); void free_screen_info(struct screen_info *si); -static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt) -{ -} - /* * A reasonable upper bound for the uncompressed kernel size is 32 MBytes, * so we will reserve that amount of memory. 
We have no easy way to tell what diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h index 3578aba9c608..42d673a011c8 100644 --- a/arch/arm64/include/asm/efi.h +++ b/arch/arm64/include/asm/efi.h @@ -14,6 +14,7 @@ #ifdef CONFIG_EFI extern void efi_init(void); +extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt); #else #define efi_init() #endif @@ -85,10 +86,6 @@ static inline void free_screen_info(struct screen_info *si) { } -static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt) -{ -} - #define EFI_ALLOC_ALIGN SZ_64K /* diff --git a/arch/riscv/include/asm/efi.h b/arch/riscv/include/asm/efi.h index 6d98cd999680..7a8f0d45b13a 100644 --- a/arch/riscv/include/asm/efi.h +++ b/arch/riscv/include/asm/efi.h @@ -13,6 +13,7 @@ #ifdef CONFIG_EFI extern void efi_init(void); +extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt); #else #define efi_init() #endif @@ -39,10 +40,6 @@ static inline void free_screen_info(struct screen_info *si) { } -static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt) -{ -} - void efi_virtmap_load(void); void efi_virtmap_unload(void); diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig index 71f3d97f0c39..af6719cc576b 100644 --- a/drivers/firmware/Kconfig +++ b/drivers/firmware/Kconfig @@ -254,9 +254,9 @@ config QCOM_SCM_DOWNLOAD_MODE_DEFAULT config SYSFB bool default y - depends on X86 || COMPILE_TEST + depends on X86 || ARM || ARM64 || RISCV || COMPILE_TEST -config X86_SYSFB +config SYSFB_SIMPLEFB bool "Mark VGA/VBE/EFI FB as generic system framebuffer" depends on SYSFB help @@ -264,10 +264,10 @@ config X86_SYSFB bootloader or kernel can show basic video-output during boot for user-guidance and debugging. Historically, x86 used the VESA BIOS Extensions and EFI-framebuffers for this, which are mostly limited - to x86. + to x86 BIOS or EFI systems. 
This option, if enabled, marks VGA/VBE/EFI framebuffers as generic framebuffers so the new generic system-framebuffer drivers can be - used on x86. If the framebuffer is not compatible with the generic + used instead. If the framebuffer is not compatible with the generic modes, it is advertised as fallback platform framebuffer so legacy drivers like efifb, vesafb and uvesafb can pick it up. If this option is not selected, all system framebuffers are always diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile index ad78f78ffa8d..6ac637e422b9 100644 --- a/drivers/firmware/Makefile +++ b/drivers/firmware/Makefile @@ -19,7 +19,7 @@ obj-$(CONFIG_RASPBERRYPI_FIRMWARE) += raspberrypi.o obj-$(CONFIG_FW_CFG_SYSFS) += qemu_fw_cfg.o obj-$(CONFIG_QCOM_SCM) += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o obj-$(CONFIG_SYSFB) += sysfb.o -obj-$(CONFIG_X86_SYSFB) += sysfb_simplefb.o +obj-$(CONFIG_SYSFB_SIMPLEFB) += sysfb_simplefb.o obj-$(CONFIG_TI_SCI_PROTOCOL) += ti_sci.o obj-$(CONFIG_TRUSTED_FOUNDATIONS) += trusted_foundations.o obj-$(CONFIG_TURRIS_MOX_RWTM) += turris-mox-rwtm.o diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c index a552a08a1741..b19ce1a83f91 100644 --- a/drivers/firmware/efi/efi-init.c +++ b/drivers/firmware/efi/efi-init.c @@ -275,93 +275,3 @@ void __init efi_init(void) } #endif } - -static bool efifb_overlaps_pci_range(const struct of_pci_range *range) -{ - u64 fb_base = screen_info.lfb_base; - - if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE) - fb_base |= (u64)(unsigned long)screen_info.ext_lfb_base << 32; - - return fb_base >= range->cpu_addr && - fb_base < (range->cpu_addr + range->size); -} - -static struct device_node *find_pci_overlap_node(void) -{ - struct device_node *np; - - for_each_node_by_type(np, "pci") { - struct of_pci_range_parser parser; - struct of_pci_range range; - int err; - - err = of_pci_range_parser_init(&parser, np); - if (err) { - pr_warn("of_pci_range_parser_init() failed: %d\n", 
err); - continue; - } - - for_each_of_pci_range(&parser, &range) - if (efifb_overlaps_pci_range(&range)) - return np; - } - return NULL; -} - -/* - * If the efifb framebuffer is backed by a PCI graphics controller, we have - * to ensure that this relation is expressed using a device link when - * running in DT mode, or the probe order may be reversed, resulting in a - * resource reservation conflict on the memory window that the efifb - * framebuffer steals from the PCIe host bridge. - */ -static int efifb_add_links(struct fwnode_handle *fwnode) -{ - struct device_node *sup_np; - - sup_np = find_pci_overlap_node(); - - /* - * If there's no PCI graphics controller backing the efifb, we are - * done here. - */ - if (!sup_np) - return 0; - - fwnode_link_add(fwnode, of_fwnode_handle(sup_np)); - of_node_put(sup_np); - - return 0; -} - -static const struct fwnode_operations efifb_fwnode_ops = { - .add_links = efifb_add_links, -}; - -static struct fwnode_handle efifb_fwnode; - -static int __init register_gop_device(void) -{ - struct platform_device *pd; - int err; - - if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI) - return 0; - - pd = platform_device_alloc("efi-framebuffer", 0); - if (!pd) - return -ENOMEM; - - if (IS_ENABLED(CONFIG_PCI)) { - fwnode_init(&efifb_fwnode, &efifb_fwnode_ops); - pd->dev.fwnode = &efifb_fwnode; - } - - err = platform_device_add_data(pd, &screen_info, sizeof(screen_info)); - if (err) - return err; - - return platform_device_add(pd); -} -subsys_initcall(register_gop_device); diff --git a/drivers/firmware/efi/sysfb_efi.c b/drivers/firmware/efi/sysfb_efi.c index 9f035b15501c..f51865e1b876 100644 --- a/drivers/firmware/efi/sysfb_efi.c +++ b/drivers/firmware/efi/sysfb_efi.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-or-later /* - * Generic System Framebuffers on x86 + * Generic System Framebuffers * Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com> * * EFI Quirks Copyright (c) 2006 Edgar Hucek <gimli(a)dark-green.com> @@ 
-19,7 +19,9 @@ #include <linux/init.h> #include <linux/kernel.h> #include <linux/mm.h> +#include <linux/of_address.h> #include <linux/pci.h> +#include <linux/platform_device.h> #include <linux/screen_info.h> #include <linux/sysfb.h> #include <video/vga.h> @@ -267,7 +269,72 @@ static const struct dmi_system_id efifb_dmi_swap_width_height[] __initconst = { {}, }; -__init void sysfb_apply_efi_quirks(void) +static bool efifb_overlaps_pci_range(const struct of_pci_range *range) +{ + u64 fb_base = screen_info.lfb_base; + + if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE) + fb_base |= (u64)(unsigned long)screen_info.ext_lfb_base << 32; + + return fb_base >= range->cpu_addr && + fb_base < (range->cpu_addr + range->size); +} + +static struct device_node *find_pci_overlap_node(void) +{ + struct device_node *np; + + for_each_node_by_type(np, "pci") { + struct of_pci_range_parser parser; + struct of_pci_range range; + int err; + + err = of_pci_range_parser_init(&parser, np); + if (err) { + pr_warn("of_pci_range_parser_init() failed: %d\n", err); + continue; + } + + for_each_of_pci_range(&parser, &range) + if (efifb_overlaps_pci_range(&range)) + return np; + } + return NULL; +} + +/* + * If the efifb framebuffer is backed by a PCI graphics controller, we have + * to ensure that this relation is expressed using a device link when + * running in DT mode, or the probe order may be reversed, resulting in a + * resource reservation conflict on the memory window that the efifb + * framebuffer steals from the PCIe host bridge. + */ +static int efifb_add_links(struct fwnode_handle *fwnode) +{ + struct device_node *sup_np; + + sup_np = find_pci_overlap_node(); + + /* + * If there's no PCI graphics controller backing the efifb, we are + * done here. 
+ */ + if (!sup_np) + return 0; + + fwnode_link_add(fwnode, of_fwnode_handle(sup_np)); + of_node_put(sup_np); + + return 0; +} + +static const struct fwnode_operations efifb_fwnode_ops = { + .add_links = efifb_add_links, +}; + +static struct fwnode_handle efifb_fwnode; + +__init void sysfb_apply_efi_quirks(struct platform_device *pd) { if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI || !(screen_info.capabilities & VIDEO_CAPABILITY_SKIP_QUIRKS)) @@ -281,4 +348,9 @@ __init void sysfb_apply_efi_quirks(void) screen_info.lfb_height = temp; screen_info.lfb_linelength = 4 * screen_info.lfb_width; } + + if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI && IS_ENABLED(CONFIG_PCI)) { + fwnode_init(&efifb_fwnode, &efifb_fwnode_ops); + pd->dev.fwnode = &efifb_fwnode; + } } diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c index 1337515963d5..2bfbb05f7d89 100644 --- a/drivers/firmware/sysfb.c +++ b/drivers/firmware/sysfb.c @@ -1,11 +1,11 @@ // SPDX-License-Identifier: GPL-2.0-or-later /* - * Generic System Framebuffers on x86 + * Generic System Framebuffers * Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com> */ /* - * Simple-Framebuffer support for x86 systems + * Simple-Framebuffer support * Create a platform-device for any available boot framebuffer. The * simple-framebuffer platform device is already available on DT systems, so * this module parses the global "screen_info" object and creates a suitable @@ -16,12 +16,12 @@ * to pick these devices up without messing with simple-framebuffer drivers. * The global "screen_info" is still valid at all times. * - * If CONFIG_X86_SYSFB is not selected, we never register "simple-framebuffer" + * If CONFIG_SYSFB_SIMPLEFB is not selected, never register "simple-framebuffer" * platform devices, but only use legacy framebuffer devices for * backwards compatibility. * * TODO: We set the dev_id field of all platform-devices to 0. This allows - * other x86 OF/DT parsers to create such devices, too. 
However, they must + * other OF/DT parsers to create such devices, too. However, they must * start at offset 1 for this to work. */ @@ -43,12 +43,10 @@ static __init int sysfb_init(void) bool compatible; int ret; - sysfb_apply_efi_quirks(); - /* try to create a simple-framebuffer device */ - compatible = parse_mode(si, &mode); + compatible = sysfb_parse_mode(si, &mode); if (compatible) { - ret = create_simplefb(si, &mode); + ret = sysfb_create_simplefb(si, &mode); if (!ret) return 0; } @@ -61,9 +59,24 @@ static __init int sysfb_init(void) else name = "platform-framebuffer"; - pd = platform_device_register_resndata(NULL, name, 0, - NULL, 0, si, sizeof(*si)); - return PTR_ERR_OR_ZERO(pd); + pd = platform_device_alloc(name, 0); + if (!pd) + return -ENOMEM; + + sysfb_apply_efi_quirks(pd); + + ret = platform_device_add_data(pd, si, sizeof(*si)); + if (ret) + goto err; + + ret = platform_device_add(pd); + if (ret) + goto err; + + return 0; +err: + platform_device_put(pd); + return ret; } /* must execute after PCI subsystem for EFI quirks */ diff --git a/drivers/firmware/sysfb_simplefb.c b/drivers/firmware/sysfb_simplefb.c index df892444ea17..b86761904949 100644 --- a/drivers/firmware/sysfb_simplefb.c +++ b/drivers/firmware/sysfb_simplefb.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-or-later /* - * Generic System Framebuffers on x86 + * Generic System Framebuffers * Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com> */ @@ -23,9 +23,9 @@ static const char simplefb_resname[] = "BOOTFB"; static const struct simplefb_format formats[] = SIMPLEFB_FORMATS; -/* try parsing x86 screen_info into a simple-framebuffer mode struct */ -__init bool parse_mode(const struct screen_info *si, - struct simplefb_platform_data *mode) +/* try parsing screen_info into a simple-framebuffer mode struct */ +__init bool sysfb_parse_mode(const struct screen_info *si, + struct simplefb_platform_data *mode) { const struct simplefb_format *f; __u8 type; @@ -57,13 +57,14 @@ __init 
bool parse_mode(const struct screen_info *si, return false; } -__init int create_simplefb(const struct screen_info *si, - const struct simplefb_platform_data *mode) +__init int sysfb_create_simplefb(const struct screen_info *si, + const struct simplefb_platform_data *mode) { struct platform_device *pd; struct resource res; u64 base, size; u32 length; + int ret; /* * If the 64BIT_BASE capability is set, ext_lfb_base will contain the @@ -105,7 +106,19 @@ __init int create_simplefb(const struct screen_info *si, if (res.end <= res.start) return -EINVAL; - pd = platform_device_register_resndata(NULL, "simple-framebuffer", 0, - &res, 1, mode, sizeof(*mode)); - return PTR_ERR_OR_ZERO(pd); + pd = platform_device_alloc("simple-framebuffer", 0); + if (!pd) + return -ENOMEM; + + sysfb_apply_efi_quirks(pd); + + ret = platform_device_add_resources(pd, &res, 1); + if (ret) + return ret; + + ret = platform_device_add_data(pd, mode, sizeof(*mode)); + if (ret) + return ret; + + return platform_device_add(pd); } diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig index 5593128eeff9..d31be274a2bd 100644 --- a/drivers/gpu/drm/tiny/Kconfig +++ b/drivers/gpu/drm/tiny/Kconfig @@ -64,8 +64,8 @@ config DRM_SIMPLEDRM buffer, size, and display format must be provided via device tree, UEFI, VESA, etc. - On x86 and compatible, you should also select CONFIG_X86_SYSFB to - use UEFI and VESA framebuffers. + On x86 BIOS or UEFI systems, you should also select SYSFB_SIMPLEFB + to use UEFI and VESA framebuffers. 
config TINYDRM_HX8357D tristate "DRM support for HX8357D display panels" diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h index 3e5355769dc3..b0dcfa26d07b 100644 --- a/include/linux/sysfb.h +++ b/include/linux/sysfb.h @@ -58,37 +58,37 @@ struct efifb_dmi_info { #ifdef CONFIG_EFI extern struct efifb_dmi_info efifb_dmi_list[]; -void sysfb_apply_efi_quirks(void); +void sysfb_apply_efi_quirks(struct platform_device *pd); #else /* CONFIG_EFI */ -static inline void sysfb_apply_efi_quirks(void) +static inline void sysfb_apply_efi_quirks(struct platform_device *pd) { } #endif /* CONFIG_EFI */ -#ifdef CONFIG_X86_SYSFB +#ifdef CONFIG_SYSFB_SIMPLEFB -bool parse_mode(const struct screen_info *si, - struct simplefb_platform_data *mode); -int create_simplefb(const struct screen_info *si, - const struct simplefb_platform_data *mode); +bool sysfb_parse_mode(const struct screen_info *si, + struct simplefb_platform_data *mode); +int sysfb_create_simplefb(const struct screen_info *si, + const struct simplefb_platform_data *mode); -#else /* CONFIG_X86_SYSFB */ +#else /* CONFIG_SYSFB_SIMPLE */ -static inline bool parse_mode(const struct screen_info *si, - struct simplefb_platform_data *mode) +static inline bool sysfb_parse_mode(const struct screen_info *si, + struct simplefb_platform_data *mode) { return false; } -static inline int create_simplefb(const struct screen_info *si, - const struct simplefb_platform_data *mode) +static inline int sysfb_create_simplefb(const struct screen_info *si, + const struct simplefb_platform_data *mode) { return -EINVAL; } -#endif /* CONFIG_X86_SYSFB */ +#endif /* CONFIG_SYSFB_SIMPLE */ #endif /* _LINUX_SYSFB_H */ </cut>
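
The sysfb hunks above replace a single platform_device_register_resndata() call with separate allocate, add-data and add steps, releasing the device with platform_device_put() at the `err:` label in sysfb_init(). The sysfb_create_simplefb() hunk, as quoted, returns directly on failure instead. A minimal user-space sketch of the allocate/add/put-on-error shape follows. The `fake_device_*` helpers and `register_framebuffer()` are hypothetical stand-ins invented for illustration, not kernel API, and the sketch assumes a non-zero payload length.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct platform_device; not the kernel type. */
struct fake_device {
	const char *name;
	void *data;
	size_t data_len;
	int registered;
};

/* Number of devices allocated and not yet released. */
static int live_devices;

static struct fake_device *fake_device_alloc(const char *name)
{
	struct fake_device *pd = calloc(1, sizeof(*pd));

	if (!pd)
		return NULL;
	pd->name = name;
	live_devices++;
	return pd;
}

/* Copies the caller's payload into the device, like a platform-data copy. */
static int fake_device_add_data(struct fake_device *pd, const void *data,
				size_t len)
{
	pd->data = malloc(len);
	if (!pd->data)
		return -ENOMEM;
	memcpy(pd->data, data, len);
	pd->data_len = len;
	return 0;
}

/* A non-zero `fail` simulates a registration failure. */
static int fake_device_add(struct fake_device *pd, int fail)
{
	if (fail)
		return -EBUSY;
	pd->registered = 1;
	return 0;
}

/* Releases the device and everything attached to it. */
static void fake_device_put(struct fake_device *pd)
{
	free(pd->data);
	free(pd);
	live_devices--;
}

/* Every failure after allocation funnels through one cleanup label. */
int register_framebuffer(const char *name, const void *data, size_t len,
			 int fail_add, struct fake_device **out)
{
	struct fake_device *pd;
	int ret;

	pd = fake_device_alloc(name);
	if (!pd)
		return -ENOMEM;

	ret = fake_device_add_data(pd, data, len);
	if (ret)
		goto err;

	ret = fake_device_add(pd, fail_add);
	if (ret)
		goto err;

	*out = pd;
	return 0;
err:
	fake_device_put(pd);
	return ret;
}
```

With a single exit label, a step added later between allocation and registration cannot forget to release the device.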

4 years, 9 months


[TCWG CI] 400.perlbench slowed down by 6% after llvm: [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest

by ci_notify＠linaro.org

After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
Author: Arthur Eubanks <aeubanks(a)google.com>

    [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest

the following benchmarks slowed down by more than 2%:
- 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
- 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf samples

The reproducer instructions below can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-e7249e4acf3cf9438d6d9e02edecebd5b622a4dc cd investigate-llvm-e7249e4acf3cf9438d6d9e02edecebd5b622a4dc # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach e7249e4acf3cf9438d6d9e02edecebd5b622a4dc ../artifacts/test.sh # Reproduce last_good build git checkout --detach 32a50078657dd8beead327a3478ede4e9d730432 ../artifacts/test.sh cd .. 
</cut> Full commit (up to 1000 lines): <cut> commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc Author: Arthur Eubanks <aeubanks(a)google.com> Date: Fri Aug 27 12:32:59 2021 -0700 [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest When determining whether to fold branches to a common destination by merging two blocks, SimplifyCFG will count the number of instructions to be moved into the first basic block. However, there's no reason to count free instructions like bitcasts and other similar instructions. This resolves missed branch foldings with -fstrict-vtable-pointers in llvm-test-suite's lambda benchmark. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108837 --- llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 17 ++++++----- llvm/test/CodeGen/AArch64/csr-split.ll | 34 +++++++++++----------- .../fold-branch-to-common-dest-free-cost.ll | 5 ++-- 3 files changed, 29 insertions(+), 27 deletions(-) diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index 2ff98b238de0..a3bd89e72af9 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -3258,13 +3258,16 @@ bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU, SawVectorOp |= isVectorOp(I); // Account for the cost of duplicating this instruction into each - // predecessor. - NumBonusInsts += PredCount; - - // Early exits once we reach the limit. - if (NumBonusInsts > - BonusInstThreshold * BranchFoldToCommonDestVectorMultiplier) - return false; + // predecessor. Ignore free instructions. + if (!TTI || + TTI->getUserCost(&I, CostKind) != TargetTransformInfo::TCC_Free) { + NumBonusInsts += PredCount; + + // Early exits once we reach the limit. 
+ if (NumBonusInsts > + BonusInstThreshold * BranchFoldToCommonDestVectorMultiplier) + return false; + } auto IsBCSSAUse = [BB, &I](Use &U) { auto *UI = cast<Instruction>(U.getUser()); diff --git a/llvm/test/CodeGen/AArch64/csr-split.ll b/llvm/test/CodeGen/AArch64/csr-split.ll index 1bee7f05acec..de85b4313433 100644 --- a/llvm/test/CodeGen/AArch64/csr-split.ll +++ b/llvm/test/CodeGen/AArch64/csr-split.ll @@ -82,22 +82,22 @@ define dso_local signext i32 @test2(i32* %p1) local_unnamed_addr { ; CHECK-NEXT: .cfi_def_cfa_offset 16 ; CHECK-NEXT: .cfi_offset w19, -8 ; CHECK-NEXT: .cfi_offset w30, -16 -; CHECK-NEXT: cbz x0, .LBB1_2 -; CHECK-NEXT: // %bb.1: // %if.end +; CHECK-NEXT: cbz x0, .LBB1_3 +; CHECK-NEXT: // %bb.1: // %entry ; CHECK-NEXT: adrp x8, a ; CHECK-NEXT: ldrsw x8, [x8, :lo12:a] ; CHECK-NEXT: mov x19, x0 ; CHECK-NEXT: cmp x8, x0 -; CHECK-NEXT: b.eq .LBB1_3 -; CHECK-NEXT: .LBB1_2: // %return -; CHECK-NEXT: mov w0, wzr -; CHECK-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload -; CHECK-NEXT: ret -; CHECK-NEXT: .LBB1_3: // %if.then2 +; CHECK-NEXT: b.ne .LBB1_3 +; CHECK-NEXT: // %bb.2: // %if.then2 ; CHECK-NEXT: bl callVoid ; CHECK-NEXT: mov x0, x19 ; CHECK-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload ; CHECK-NEXT: b callNonVoid +; CHECK-NEXT: .LBB1_3: // %return +; CHECK-NEXT: mov w0, wzr +; CHECK-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload +; CHECK-NEXT: ret ; ; CHECK-APPLE-LABEL: test2: ; CHECK-APPLE: ; %bb.0: ; %entry @@ -108,26 +108,26 @@ define dso_local signext i32 @test2(i32* %p1) local_unnamed_addr { ; CHECK-APPLE-NEXT: .cfi_offset w29, -16 ; CHECK-APPLE-NEXT: .cfi_offset w19, -24 ; CHECK-APPLE-NEXT: .cfi_offset w20, -32 -; CHECK-APPLE-NEXT: cbz x0, LBB1_2 -; CHECK-APPLE-NEXT: ; %bb.1: ; %if.end +; CHECK-APPLE-NEXT: cbz x0, LBB1_3 +; CHECK-APPLE-NEXT: ; %bb.1: ; %entry ; CHECK-APPLE-NEXT: Lloh2: ; CHECK-APPLE-NEXT: adrp x8, _a@PAGE ; CHECK-APPLE-NEXT: Lloh3: ; CHECK-APPLE-NEXT: ldrsw x8, [x8, _a@PAGEOFF] ; CHECK-APPLE-NEXT: mov 
x19, x0 ; CHECK-APPLE-NEXT: cmp x8, x0 -; CHECK-APPLE-NEXT: b.eq LBB1_3 -; CHECK-APPLE-NEXT: LBB1_2: ; %return -; CHECK-APPLE-NEXT: ldp x29, x30, [sp, #16] ; 16-byte Folded Reload -; CHECK-APPLE-NEXT: mov w0, wzr -; CHECK-APPLE-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload -; CHECK-APPLE-NEXT: ret -; CHECK-APPLE-NEXT: LBB1_3: ; %if.then2 +; CHECK-APPLE-NEXT: b.ne LBB1_3 +; CHECK-APPLE-NEXT: ; %bb.2: ; %if.then2 ; CHECK-APPLE-NEXT: bl _callVoid ; CHECK-APPLE-NEXT: ldp x29, x30, [sp, #16] ; 16-byte Folded Reload ; CHECK-APPLE-NEXT: mov x0, x19 ; CHECK-APPLE-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload ; CHECK-APPLE-NEXT: b _callNonVoid +; CHECK-APPLE-NEXT: LBB1_3: ; %return +; CHECK-APPLE-NEXT: ldp x29, x30, [sp, #16] ; 16-byte Folded Reload +; CHECK-APPLE-NEXT: mov w0, wzr +; CHECK-APPLE-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload +; CHECK-APPLE-NEXT: ret ; CHECK-APPLE-NEXT: .loh AdrpLdr Lloh2, Lloh3 entry: %tobool = icmp eq i32* %p1, null diff --git a/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-free-cost.ll b/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-free-cost.ll index ace2a5ed35ca..27df5ec44582 100644 --- a/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-free-cost.ll +++ b/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-free-cost.ll @@ -8,12 +8,11 @@ declare void @g2() define void @f(i8* %a, i8* %b, i1 %c, i1 %d, i1 %e) { ; CHECK-LABEL: @f( -; CHECK-NEXT: br i1 [[C:%.*]], label [[L1:%.*]], label [[L3:%.*]] -; CHECK: l1: ; CHECK-NEXT: [[A1:%.*]] = call i8* @llvm.strip.invariant.group.p0i8(i8* [[A:%.*]]) ; CHECK-NEXT: [[B1:%.*]] = call i8* @llvm.strip.invariant.group.p0i8(i8* [[B:%.*]]) ; CHECK-NEXT: [[I:%.*]] = icmp eq i8* [[A1]], [[B1]] -; CHECK-NEXT: br i1 [[I]], label [[L2:%.*]], label [[L3]] +; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[C:%.*]], i1 [[I]], i1 false +; CHECK-NEXT: br i1 [[OR_COND]], label [[L2:%.*]], label [[L3:%.*]] ; CHECK: l2: ; CHECK-NEXT: call void @g1() ; 
CHECK-NEXT: br label [[RET:%.*]] </cut>
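
The SimplifyCFG.cpp hunk above changes which instructions count against the bonus-instruction budget: an instruction that the target reports as free no longer adds PredCount to NumBonusInsts. The following is a simplified stand-alone model of that counting, not LLVM code. `enum insn_cost`, `within_bonus_budget()` and the `ignore_free` switch are hypothetical names, and `limit` stands in for BonusInstThreshold * BranchFoldToCommonDestVectorMultiplier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost classes, loosely modelled on TCC_Free versus anything else. */
enum insn_cost { COST_FREE = 0, COST_BASIC = 1 };

/*
 * Returns true if the instructions to be duplicated stay within the budget.
 *
 * costs       cost class of each bonus instruction in the block being merged
 * n           number of bonus instructions
 * pred_count  number of predecessors each instruction is duplicated into
 * limit       stand-in for the threshold used by the early exit
 * ignore_free false models the counting before the commit, true after it
 */
bool within_bonus_budget(const enum insn_cost *costs, size_t n,
			 unsigned pred_count, unsigned limit,
			 bool ignore_free)
{
	unsigned num_bonus_insts = 0;

	for (size_t i = 0; i < n; i++) {
		/* After the commit, free instructions do not consume budget. */
		if (ignore_free && costs[i] == COST_FREE)
			continue;

		num_bonus_insts += pred_count;

		/* Early exit once the limit is exceeded. */
		if (num_bonus_insts > limit)
			return false;
	}
	return true;
}
```

With two free instructions followed by one basic instruction, one predecessor and a limit of 1, the old counting rejects the fold and the new counting accepts it. More folds being accepted is a plausible route to the code-layout changes seen in the csr-split.ll test, though the report itself does not identify the cause of the slowdown.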

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-master-aarch64-next-allyesconfig - Build # 14 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-next-allyesconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-master-aarch64-next-allyesconfig

Culprit:
<cut>
commit 3d463dd5023b5a58b3c37207d65eeb5acbac2be3
Author: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Date: Thu Jul 29 12:40:19 2021 +0200

    nfc: fdp: constify several pointers

    Several functions do not modify pointed data so arguments and local variables can be const for correctness and safety. This allows also making file-scope nci_core_get_config_otp_ram_version array const.

    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
    Signed-off-by: David S. Miller <davem(a)davemloft.net>
</cut>

Results regressed to (for first_bad == 3d463dd5023b5a58b3c37207d65eeb5acbac2be3)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
19928
# First few build errors in logs:
# 00:02:04 drivers/nfc/fdp/fdp.c:116:60: error: passing 'const char *' to parameter of type '__u8 *' (aka 'unsigned char *') discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
# 00:02:04 make[3]: *** [scripts/Makefile.build:271: drivers/nfc/fdp/fdp.o] Error 1
# 00:02:05 make[2]: *** [scripts/Makefile.build:514: drivers/nfc/fdp] Error 2
# 00:02:23 make[1]: *** [scripts/Makefile.build:514: drivers/nfc] Error 2
# 00:04:16 make: *** [Makefile:1842: drivers] Error 2

from (for last_good == c3e26b6dc1b4e3e8f57be4f004b1f2a410c5c468)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
20000
# linux build successful:
all

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-next-…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-next-…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-next-…
Configuration details: rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#cb16362…" Reproduce builds: <cut> mkdir investigate-linux-3d463dd5023b5a58b3c37207d65eeb5acbac2be3 cd investigate-linux-3d463dd5023b5a58b3c37207d65eeb5acbac2be3 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-next-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-next-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-next-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/ cd linux # Reproduce first_bad build git checkout --detach 3d463dd5023b5a58b3c37207d65eeb5acbac2be3 ../artifacts/test.sh # Reproduce last_good build git checkout --detach c3e26b6dc1b4e3e8f57be4f004b1f2a410c5c468 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-next-… Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-next-… Full commit (up to 1000 lines): <cut> commit 3d463dd5023b5a58b3c37207d65eeb5acbac2be3 Author: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com> Date: Thu Jul 29 12:40:19 2021 +0200 nfc: fdp: constify several pointers Several functions do not modify pointed data so arguments and local variables can be const for correctness and safety. 
This allows also making file-scope nci_core_get_config_otp_ram_version array const. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> --- drivers/nfc/fdp/fdp.c | 18 +++++++++--------- drivers/nfc/fdp/fdp.h | 2 +- drivers/nfc/fdp/i2c.c | 6 +++--- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/nfc/fdp/fdp.c b/drivers/nfc/fdp/fdp.c index 3f5fba922c4d..c6b3334f24c9 100644 --- a/drivers/nfc/fdp/fdp.c +++ b/drivers/nfc/fdp/fdp.c @@ -52,7 +52,7 @@ struct fdp_nci_info { u32 limited_otp_version; u8 key_index; - u8 *fw_vsc_cfg; + const u8 *fw_vsc_cfg; u8 clock_type; u32 clock_freq; @@ -65,7 +65,7 @@ struct fdp_nci_info { wait_queue_head_t setup_wq; }; -static u8 nci_core_get_config_otp_ram_version[5] = { +static const u8 nci_core_get_config_otp_ram_version[5] = { 0x04, NCI_PARAM_ID_FW_RAM_VERSION, NCI_PARAM_ID_FW_OTP_VERSION, @@ -111,7 +111,7 @@ static inline int fdp_nci_patch_cmd(struct nci_dev *ndev, u8 type) } static inline int fdp_nci_set_production_data(struct nci_dev *ndev, u8 len, - char *data) + const char *data) { return nci_prop_cmd(ndev, NCI_OP_PROP_SET_PDATA_OID, len, data); } @@ -236,7 +236,7 @@ static int fdp_nci_send_patch(struct nci_dev *ndev, u8 conn_id, u8 type) static int fdp_nci_open(struct nci_dev *ndev) { - struct fdp_nci_info *info = nci_get_drvdata(ndev); + const struct fdp_nci_info *info = nci_get_drvdata(ndev); return info->phy_ops->enable(info->phy); } @@ -260,7 +260,7 @@ static int fdp_nci_request_firmware(struct nci_dev *ndev) { struct fdp_nci_info *info = nci_get_drvdata(ndev); struct device *dev = &info->phy->i2c_dev->dev; - u8 *data; + const u8 *data; int r; r = request_firmware(&info->ram_patch, FDP_RAM_PATCH_NAME, dev); @@ -269,7 +269,7 @@ static int fdp_nci_request_firmware(struct nci_dev *ndev) return r; } - data = (u8 *) info->ram_patch->data; + data = info->ram_patch->data; info->ram_patch_version = data[FDP_FW_HEADER_SIZE] | 
(data[FDP_FW_HEADER_SIZE + 1] << 8) | @@ -610,9 +610,9 @@ static int fdp_nci_core_get_config_rsp_packet(struct nci_dev *ndev, { struct fdp_nci_info *info = nci_get_drvdata(ndev); struct device *dev = &info->phy->i2c_dev->dev; - struct nci_core_get_config_rsp *rsp = (void *) skb->data; + const struct nci_core_get_config_rsp *rsp = (void *) skb->data; unsigned int i; - u8 *p; + const u8 *p; if (rsp->status == NCI_STATUS_OK) { @@ -691,7 +691,7 @@ static const struct nci_ops nci_ops = { int fdp_nci_probe(struct fdp_i2c_phy *phy, const struct nfc_phy_ops *phy_ops, struct nci_dev **ndevp, int tx_headroom, int tx_tailroom, u8 clock_type, u32 clock_freq, - u8 *fw_vsc_cfg) + const u8 *fw_vsc_cfg) { struct device *dev = &phy->i2c_dev->dev; struct fdp_nci_info *info; diff --git a/drivers/nfc/fdp/fdp.h b/drivers/nfc/fdp/fdp.h index dc048d4b977e..2e9161a4d7bf 100644 --- a/drivers/nfc/fdp/fdp.h +++ b/drivers/nfc/fdp/fdp.h @@ -23,7 +23,7 @@ struct fdp_i2c_phy { int fdp_nci_probe(struct fdp_i2c_phy *phy, const struct nfc_phy_ops *phy_ops, struct nci_dev **ndev, int tx_headroom, int tx_tailroom, - u8 clock_type, u32 clock_freq, u8 *fw_vsc_cfg); + u8 clock_type, u32 clock_freq, const u8 *fw_vsc_cfg); void fdp_nci_remove(struct nci_dev *ndev); #endif /* __LOCAL_FDP_H_ */ diff --git a/drivers/nfc/fdp/i2c.c b/drivers/nfc/fdp/i2c.c index 98e1876c9468..051c43a2a52f 100644 --- a/drivers/nfc/fdp/i2c.c +++ b/drivers/nfc/fdp/i2c.c @@ -36,7 +36,7 @@ print_hex_dump(KERN_DEBUG, prefix": ", DUMP_PREFIX_OFFSET, \ 16, 1, (skb)->data, (skb)->len, 0) -static void fdp_nci_i2c_reset(struct fdp_i2c_phy *phy) +static void fdp_nci_i2c_reset(const struct fdp_i2c_phy *phy) { /* Reset RST/WakeUP for at least 100 micro-second */ gpiod_set_value_cansleep(phy->power_gpio, FDP_POWER_OFF); @@ -47,7 +47,7 @@ static void fdp_nci_i2c_reset(struct fdp_i2c_phy *phy) static int fdp_nci_i2c_enable(void *phy_id) { - struct fdp_i2c_phy *phy = phy_id; + const struct fdp_i2c_phy *phy = phy_id; fdp_nci_i2c_reset(phy); @@ 
-56,7 +56,7 @@ static int fdp_nci_i2c_enable(void *phy_id) static void fdp_nci_i2c_disable(void *phy_id) { - struct fdp_i2c_phy *phy = phy_id; + const struct fdp_i2c_phy *phy = phy_id; fdp_nci_i2c_reset(phy); } </cut>
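
The build error quoted in this report comes from a constified wrapper passing its `const char *` argument to a callee whose parameter is still declared `__u8 *`. The sketch below reproduces that shape in plain C and shows one possible way to resolve it, on the assumption that the callee only reads the buffer. `payload_checksum()` and `set_production_data()` are hypothetical names made up for illustration. This is not the change that went into the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical callee. It only reads the payload, so its parameter is a
 * pointer to const. Declaring it as plain `uint8_t *` would make the call
 * in set_production_data() discard the const qualifier, which is what
 * -Wincompatible-pointer-types-discards-qualifiers reports.
 */
unsigned payload_checksum(const uint8_t *payload, size_t len)
{
	unsigned sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += payload[i];
	return sum;
}

/* Hypothetical wrapper whose data parameter has been constified. */
unsigned set_production_data(const char *data, size_t len)
{
	/* The cast changes signedness only; constness is preserved. */
	return payload_checksum((const uint8_t *)data, len);
}
```

Constifying from the innermost callee outwards keeps every intermediate commit buildable. Adding const to a wrapper first, as in this sketch's starting point, requires either a cast that drops the qualifier or a follow-up change to the callee.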

4 years, 9 months


Re: [CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-master-aarch64-lts-allmodconfig - Build # 6 - Successful!

by Maxim Kuvyrkov

Hi Greg,

This appears to have been a fluke. Boot-testing succeeded before the merge and failed after. Boot-testing on allmodconfig doesn’t seem to be stable, so we are going to disable it.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 18 Aug 2021, at 08:38, Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> wrote:
>
> On Wed, Aug 18, 2021 at 05:22:07AM +0000, ci_notify(a)linaro.org wrote:
>> Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-lts-allmodconfig. So far, this commit has regressed CI configurations:
>> - tcwg_kernel/llvm-master-aarch64-lts-allmodconfig
>>
>> Culprit:
>> <cut>
>> commit 132a8267adabd645476b542b3b132c1b91988fe8
>> Author: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>> Date: Thu Aug 12 13:22:21 2021 +0200
>>
>> Linux 5.10.58
>
> <snip>
>
> And what am I supposed to do with this information?
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe(a)googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/YRyczv2OCq51edQh%40kroa….

4 years, 9 months


clang-aarch64-full-2stage buildbot timeout

by Maxim Kuvyrkov

> > On Sep 22, 2021, at 11:23, Florian Hahn <florian_hahn at apple.com> wrote:
> >
> > Hi,
> >
> > It looks like a lot of the recent builds of clang-aarch64-full-2stage are timing out.
> >
> > E.g. https://lab.llvm.org/buildbot/#/builders/179/builds/1078 while checking out sources
> > https://lab.llvm.org/buildbot/#/builders/179/builds/1076 during building stage2
> >
> > Is there anything that could be done to avoid such timeouts and avoid false positive failure emails?
> >
> > Cheers,
> > Florian

Hi Florian,

Thanks for the heads up. We’ve noticed these timeouts too, and have reduced the load on the machine. It appears to have helped.

> > Looks like other bots are also hit by timeouts, including clang-arm64-windows-msvc-2stage (https://lab.llvm.org/buildbot/#/builders/120/builds/1197)

This one looks like a legitimate failure, and appears to have been fixed by https://github.com/llvm/llvm-project/commit/c6013f71a4555f6d9ef9c60e6bc4376… in build https://lab.llvm.org/buildbot/#/builders/120/builds/1200 .

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

4 years, 9 months


[TCWG CI] 464.h264ref slowed down by 6% after llvm: Revert "Allow rematerialization of virtual reg uses"

by ci_notify＠linaro.org

After llvm commit 08d7eec06e8cf5c15a96ce11f311f1480291a441
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>

    Revert "Allow rematerialization of virtual reg uses"

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 6% from 11014 to 11697 perf samples
- 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 40% from 1513 to 2118 perf samples

The reproducer instructions below can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
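
The percentages in this report can be checked against the raw sample counts. The sketch below assumes the slowdown is the relative increase in perf samples, rounded to a whole percent. That reading matches the figures quoted above but is not stated by the CI itself. `slowdown_percent()` is a hypothetical helper name.

```c
/*
 * Relative slowdown in percent when the perf sample count grows from
 * `before` to `after`. Assumes before is non-zero.
 */
double slowdown_percent(unsigned long before, unsigned long after)
{
	return 100.0 * ((double)after - (double)before) / (double)before;
}
```

For 464.h264ref as a whole, 11014 to 11697 samples gives roughly 6.2%. For FastFullPelBlockMotionSearch, 1513 to 2118 samples gives just under 40%. Both are consistent with the reported 6% and 40%.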
This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441 cd investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 08d7eec06e8cf5c15a96ce11f311f1480291a441 ../artifacts/test.sh # Reproduce last_good build git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af ../artifacts/test.sh cd .. 
</cut> Full commit (up to 1000 lines): <cut> commit 08d7eec06e8cf5c15a96ce11f311f1480291a441 Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> Date: Fri Sep 24 09:53:51 2021 -0700 Revert "Allow rematerialization of virtual reg uses" Reverted due to two distcint performance regression reports. This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39. --- llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +- llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +- llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 - llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +- llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +- llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +- .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +- llvm/test/CodeGen/ARM/neon-copy.ll | 10 +- llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +- llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +- llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +- llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +- llvm/test/CodeGen/Mips/tls.ll | 4 +- llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +- llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +- llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +- llvm/test/CodeGen/RISCV/mul.ll | 72 +- llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +- llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 270 +- llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +- llvm/test/CodeGen/RISCV/rv32zbp.ll | 262 +- llvm/test/CodeGen/RISCV/rv32zbt.ll | 206 +- .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 150 +- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3584 ++++++++++---------- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 664 ++-- llvm/test/CodeGen/RISCV/shifts.ll | 308 +- llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +- llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +- llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +- .../tail-pred-disabled-in-loloops.ll | 14 +- .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +- 
.../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +- llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +- llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +- llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +- llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 +-- llvm/test/CodeGen/X86/addcarry.ll | 20 +- llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +- llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +- .../X86/delete-dead-instrs-with-live-uses.mir | 4 +- llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +- llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +- llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +- llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +- 45 files changed, 4093 insertions(+), 4106 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h index a0c52e2f1a13..c394ac910be1 100644 --- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h +++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h @@ -117,11 +117,10 @@ public: const MachineFunction &MF) const; /// Return true if the instruction is trivially rematerializable, meaning it - /// has no side effects. Uses of constants and unallocatable physical - /// registers are always trivial to rematerialize so that the instructions - /// result is independent of the place in the function. Uses of virtual - /// registers are allowed but it is caller's responsility to ensure these - /// operands are valid at the point the instruction is beeing moved. + /// has no side effects and requires no operands that aren't always available. + /// This means the only allowed uses are constants and unallocatable physical + /// registers so that the instructions result is independent of the place + /// in the function. 
bool isTriviallyReMaterializable(const MachineInstr &MI, AAResults *AA = nullptr) const { return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF || @@ -141,7 +140,8 @@ protected: /// set, this hook lets the target specify whether the instruction is actually /// trivially rematerializable, taking into consideration its operands. This /// predicate must return false if the instruction has any side effects other - /// than producing a value. + /// than producing a value, or if it requres any address registers that are + /// not always available. /// Requirements must be check as stated in isTriviallyReMaterializable() . virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI, AAResults *AA) const { diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp index fe7d60e0b7e2..1eab8e7443a7 100644 --- a/llvm/lib/CodeGen/TargetInstrInfo.cpp +++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp @@ -921,8 +921,7 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( const MachineRegisterInfo &MRI = MF.getRegInfo(); // Remat clients assume operand 0 is the defined register. - if (!MI.getNumOperands() || !MI.getOperand(0).isReg() || - MI.getOperand(0).isTied()) + if (!MI.getNumOperands() || !MI.getOperand(0).isReg()) return false; Register DefReg = MI.getOperand(0).getReg(); @@ -984,6 +983,12 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( // same virtual register, though. if (MO.isDef() && Reg != DefReg) return false; + + // Don't allow any virtual-register uses. Rematting an instruction with + // virtual register uses would length the live ranges of the uses, which + // is not necessarily a good idea, certainly not "trivial". + if (MO.isUse()) + return false; } // Everything checked out. 
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir index c9915aaabfde..ed799bfca028 100644 --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir @@ -51,66 +51,6 @@ body: | S_NOP 0, implicit %2 S_ENDPGM 0 ... -# The liverange of %0 covers a point of rematerialization, source value is -# availabe. ---- -name: test_remat_s_mov_b32_vreg_src_long_lr -tracksRegLiveness: true -machineFunctionInfo: - stackPtrOffsetReg: $sgpr32 -body: | - bb.0: - ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr - ; GCN: renamable $sgpr0 = IMPLICIT_DEF - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: S_NOP 0, implicit killed renamable $sgpr0 - ; GCN: S_ENDPGM 0 - %0:sreg_32 = IMPLICIT_DEF - %1:sreg_32 = S_MOV_B32 %0:sreg_32 - %2:sreg_32 = S_MOV_B32 %0:sreg_32 - %3:sreg_32 = S_MOV_B32 %0:sreg_32 - S_NOP 0, implicit %1 - S_NOP 0, implicit %2 - S_NOP 0, implicit %3 - S_NOP 0, implicit %0 - S_ENDPGM 0 -... -# The liverange of %0 does not cover a point of rematerialization, source value is -# unavailabe and we do not want to artificially extend the liverange. 
---- -name: test_no_remat_s_mov_b32_vreg_src_short_lr -tracksRegLiveness: true -machineFunctionInfo: - stackPtrOffsetReg: $sgpr32 -body: | - bb.0: - ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr - ; GCN: renamable $sgpr0 = IMPLICIT_DEF - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5) - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5) - ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0 - ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5) - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5) - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: S_NOP 0, implicit killed renamable $sgpr0 - ; GCN: S_ENDPGM 0 - %0:sreg_32 = IMPLICIT_DEF - %1:sreg_32 = S_MOV_B32 %0:sreg_32 - %2:sreg_32 = S_MOV_B32 %0:sreg_32 - %3:sreg_32 = S_MOV_B32 %0:sreg_32 - S_NOP 0, implicit %1 - S_NOP 0, implicit %2 - S_NOP 0, implicit %3 - S_ENDPGM 0 -... 
--- name: test_remat_s_mov_b64 tracksRegLiveness: true diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll index 175a2069a441..a4243276c70a 100644 --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; ENABLE-NEXT: pophs {r11, pc} ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader ; ENABLE-NEXT: movw r12, :lower16:skip -; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: sub r1, r1, #1 ; ENABLE-NEXT: movt r12, :upper16:skip ; ENABLE-NEXT: .LBB0_4: @ %while.body ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; ENABLE-NEXT: ldrb r1, [r0] -; ENABLE-NEXT: ldrb r1, [r12, r1] -; ENABLE-NEXT: add r0, r0, r1 -; ENABLE-NEXT: sub r1, r3, #1 -; ENABLE-NEXT: cmp r1, r3 +; ENABLE-NEXT: ldrb r3, [r0] +; ENABLE-NEXT: ldrb r3, [r12, r3] +; ENABLE-NEXT: add r0, r0, r3 +; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: cmp r3, r1 ; ENABLE-NEXT: bhs .LBB0_6 ; ENABLE-NEXT: @ %bb.5: @ %while.body ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; ENABLE-NEXT: cmp r0, r2 -; ENABLE-NEXT: mov r3, r1 +; ENABLE-NEXT: mov r1, r3 ; ENABLE-NEXT: blo .LBB0_4 ; ENABLE-NEXT: .LBB0_6: @ %if.end29 ; ENABLE-NEXT: pop {r11, pc} @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; DISABLE-NEXT: pophs {r11, pc} ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader ; DISABLE-NEXT: movw r12, :lower16:skip -; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: sub r1, r1, #1 ; DISABLE-NEXT: movt r12, :upper16:skip ; DISABLE-NEXT: .LBB0_4: @ %while.body ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; DISABLE-NEXT: ldrb r1, [r0] -; DISABLE-NEXT: ldrb r1, [r12, r1] -; DISABLE-NEXT: add r0, r0, r1 -; DISABLE-NEXT: sub r1, r3, #1 -; DISABLE-NEXT: cmp r1, r3 +; DISABLE-NEXT: ldrb r3, [r0] +; DISABLE-NEXT: ldrb r3, [r12, r3] +; DISABLE-NEXT: add r0, 
r0, r3 +; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: cmp r3, r1 ; DISABLE-NEXT: bhs .LBB0_6 ; DISABLE-NEXT: @ %bb.5: @ %while.body ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; DISABLE-NEXT: cmp r0, r2 -; DISABLE-NEXT: mov r3, r1 +; DISABLE-NEXT: mov r1, r3 ; DISABLE-NEXT: blo .LBB0_4 ; DISABLE-NEXT: .LBB0_6: @ %if.end29 ; DISABLE-NEXT: pop {r11, pc} diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll index ea15fcc5c824..55157875d355 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: push {r4, r5, r11, lr} ; SCALAR-NEXT: rsb r3, r2, #0 ; SCALAR-NEXT: and r4, r2, #63 -; SCALAR-NEXT: and r12, r3, #63 -; SCALAR-NEXT: rsb r3, r12, #32 +; SCALAR-NEXT: and lr, r3, #63 +; SCALAR-NEXT: rsb r3, lr, #32 ; SCALAR-NEXT: lsl r2, r0, r4 -; SCALAR-NEXT: lsr lr, r0, r12 -; SCALAR-NEXT: orr r3, lr, r1, lsl r3 -; SCALAR-NEXT: subs lr, r12, #32 -; SCALAR-NEXT: lsrpl r3, r1, lr +; SCALAR-NEXT: lsr r12, r0, lr +; SCALAR-NEXT: orr r3, r12, r1, lsl r3 +; SCALAR-NEXT: subs r12, lr, #32 +; SCALAR-NEXT: lsrpl r3, r1, r12 ; SCALAR-NEXT: subs r5, r4, #32 ; SCALAR-NEXT: movwpl r2, #0 ; SCALAR-NEXT: cmp r5, #0 @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: lsr r3, r0, r3 ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 ; SCALAR-NEXT: lslpl r3, r0, r5 -; SCALAR-NEXT: lsr r0, r1, r12 -; SCALAR-NEXT: cmp lr, #0 +; SCALAR-NEXT: lsr r0, r1, lr +; SCALAR-NEXT: cmp r12, #0 ; SCALAR-NEXT: movwpl r0, #0 ; SCALAR-NEXT: orr r1, r3, r0 ; SCALAR-NEXT: mov r0, r2 @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK: @ %bb.0: ; CHECK-NEXT: .save {r4, r5, r11, lr} ; CHECK-NEXT: push {r4, r5, r11, lr} -; CHECK-NEXT: and r12, r2, #63 +; CHECK-NEXT: and lr, r2, #63 ; CHECK-NEXT: rsb r2, r2, #0 -; CHECK-NEXT: rsb r3, r12, #32 +; CHECK-NEXT: rsb r3, lr, #32 ; CHECK-NEXT: and r4, r2, #63 -; CHECK-NEXT: lsr lr, 
r0, r12 -; CHECK-NEXT: orr r3, lr, r1, lsl r3 -; CHECK-NEXT: subs lr, r12, #32 +; CHECK-NEXT: lsr r12, r0, lr +; CHECK-NEXT: orr r3, r12, r1, lsl r3 +; CHECK-NEXT: subs r12, lr, #32 ; CHECK-NEXT: lsl r2, r0, r4 -; CHECK-NEXT: lsrpl r3, r1, lr +; CHECK-NEXT: lsrpl r3, r1, r12 ; CHECK-NEXT: subs r5, r4, #32 ; CHECK-NEXT: movwpl r2, #0 ; CHECK-NEXT: cmp r5, #0 @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK-NEXT: lsr r3, r0, r3 ; CHECK-NEXT: orr r3, r3, r1, lsl r4 ; CHECK-NEXT: lslpl r3, r0, r5 -; CHECK-NEXT: lsr r0, r1, r12 -; CHECK-NEXT: cmp lr, #0 +; CHECK-NEXT: lsr r0, r1, lr +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r0, #0 ; CHECK-NEXT: orr r1, r0, r3 ; CHECK-NEXT: mov r0, r2 diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll index 6372f9be2ca3..54c93b493c98 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { ; CHECK-NEXT: mov r3, #0 ; CHECK-NEXT: bl __aeabi_uldivmod ; CHECK-NEXT: add r0, r2, #27 -; CHECK-NEXT: lsl r2, r7, #27 -; CHECK-NEXT: and r12, r0, #63 ; CHECK-NEXT: lsl r6, r6, #27 +; CHECK-NEXT: and r1, r0, #63 +; CHECK-NEXT: lsl r2, r7, #27 ; CHECK-NEXT: orr r7, r6, r7, lsr #5 -; CHECK-NEXT: rsb r3, r12, #32 -; CHECK-NEXT: lsr r2, r2, r12 ; CHECK-NEXT: mov r6, #63 -; CHECK-NEXT: orr r2, r2, r7, lsl r3 -; CHECK-NEXT: subs r3, r12, #32 +; CHECK-NEXT: rsb r3, r1, #32 +; CHECK-NEXT: lsr r2, r2, r1 +; CHECK-NEXT: subs r12, r1, #32 ; CHECK-NEXT: bic r6, r6, r0 +; CHECK-NEXT: orr r2, r2, r7, lsl r3 ; CHECK-NEXT: lsl r5, r9, #1 -; CHECK-NEXT: lsrpl r2, r7, r3 -; CHECK-NEXT: subs r1, r6, #32 +; CHECK-NEXT: lsrpl r2, r7, r12 ; CHECK-NEXT: lsl r0, r5, r6 -; CHECK-NEXT: lsl r4, r8, #1 +; CHECK-NEXT: subs r4, r6, #32 +; CHECK-NEXT: lsl r3, r8, #1 ; CHECK-NEXT: movwpl r0, #0 -; CHECK-NEXT: orr r4, r4, r9, lsr #31 +; CHECK-NEXT: orr r3, r3, r9, lsr #31 ; CHECK-NEXT: orr r0, r0, r2 ; CHECK-NEXT: 
rsb r2, r6, #32 -; CHECK-NEXT: cmp r1, #0 +; CHECK-NEXT: cmp r4, #0 +; CHECK-NEXT: lsr r1, r7, r1 ; CHECK-NEXT: lsr r2, r5, r2 -; CHECK-NEXT: orr r2, r2, r4, lsl r6 -; CHECK-NEXT: lslpl r2, r5, r1 -; CHECK-NEXT: lsr r1, r7, r12 -; CHECK-NEXT: cmp r3, #0 +; CHECK-NEXT: orr r2, r2, r3, lsl r6 +; CHECK-NEXT: lslpl r2, r5, r4 +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r1, #0 ; CHECK-NEXT: orr r1, r2, r1 ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll index 0a0bb62b0a09..2922e0ed5423 100644 --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { ; BE-LABEL: i56_or: ; BE: @ %bb.0: ; BE-NEXT: mov r1, r0 +; BE-NEXT: ldr r12, [r0] ; BE-NEXT: ldrh r2, [r1, #4]! ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: orr r2, r3, r2, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: strb r12, [r1, #2] -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: strb r2, [r1, #2] +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr %aa = load i56, i56* %a @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: strb r2, [r1, #2] ; BE-NEXT: orr r2, r3, r12, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: ldr r12, [r0] +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; 
BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll index 46490efb6631..09a991da2e59 100644 --- a/llvm/test/CodeGen/ARM/neon-copy.ll +++ b/llvm/test/CodeGen/ARM/neon-copy.ll @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { ; CHECK-NEXT: .pad #8 ; CHECK-NEXT: sub sp, sp, #8 ; CHECK-NEXT: vmov.u16 r1, d0[1] -; CHECK-NEXT: and r12, r0, #3 +; CHECK-NEXT: and r0, r0, #3 ; CHECK-NEXT: vmov.u16 r2, d0[2] -; CHECK-NEXT: mov r0, sp -; CHECK-NEXT: vmov.u16 r3, d0[3] -; CHECK-NEXT: orr r0, r0, r12, lsl #1 +; CHECK-NEXT: mov r3, sp +; CHECK-NEXT: vmov.u16 r12, d0[3] +; CHECK-NEXT: orr r0, r3, r0, lsl #1 ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] ; CHECK-NEXT: vldr d0, [sp] ; CHECK-NEXT: vmov.16 d0[1], r1 ; CHECK-NEXT: vmov.16 d0[2], r2 -; CHECK-NEXT: vmov.16 d0[3], r3 +; CHECK-NEXT: vmov.16 d0[3], r12 ; CHECK-NEXT: add sp, sp, #8 ; CHECK-NEXT: bx lr %tmp = extractelement <8 x i16> %x, i32 0 diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll index a125446b27c3..8be7100d368b 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll @@ -766,85 +766,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: move $2, $6 -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 76($sp) -; MMR3-NEXT: srlv $3, $7, $16 -; MMR3-NEXT: not16 $6, $16 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill -; MMR3-NEXT: 
sll16 $2, $2, 1 -; MMR3-NEXT: sllv $2, $2, $6 -; MMR3-NEXT: li16 $6, 64 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: srlv $4, $4, $16 -; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill -; MMR3-NEXT: subu16 $7, $6, $16 +; MMR3-NEXT: srlv $4, $7, $16 +; MMR3-NEXT: not16 $3, $16 +; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: sll16 $2, $6, 1 +; MMR3-NEXT: sllv $3, $2, $3 +; MMR3-NEXT: li16 $2, 64 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: srlv $6, $6, $16 +; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: subu16 $7, $2, $16 ; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: andi16 $5, $7, 32 -; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $6, $16, 32 -; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $3, $9 +; MMR3-NEXT: andi16 $2, $7, 32 +; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $5, $16, 32 +; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $3, $17, $5 -; MMR3-NEXT: movn $2, $4, $6 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $4, $17, $4 -; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $6, 1 -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $3 -; MMR3-NEXT: addiu $3, $16, -64 -; MMR3-NEXT: srav $1, $6, $3 -; MMR3-NEXT: andi16 $3, $3, 32 -; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $3 -; MMR3-NEXT: sllv $3, $6, $7 -; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill -; MMR3-NEXT: not16 $3, $7 -; MMR3-NEXT: srl16 $4, $17, 1 -; MMR3-NEXT: srlv $3, $4, $3 +; MMR3-NEXT: movn $4, $17, $2 +; MMR3-NEXT: movn $3, $6, $5 +; MMR3-NEXT: addiu $2, $16, -64 +; MMR3-NEXT: lw 
$5, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $5, $5, $2 +; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $17, 1 +; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: not16 $5, $2 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $4 +; MMR3-NEXT: srav $1, $17, $2 +; MMR3-NEXT: andi16 $2, $2, 32 +; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $2 +; MMR3-NEXT: sllv $2, $17, $7 +; MMR3-NEXT: not16 $4, $7 +; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $6, $7, 1 +; MMR3-NEXT: srlv $6, $6, $4 ; MMR3-NEXT: sltiu $10, $16, 64 -; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $5, $3, $10 +; MMR3-NEXT: or16 $6, $2 +; MMR3-NEXT: srlv $2, $7, $16 +; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: sllv $3, $4, $3 ; MMR3-NEXT: or16 $3, $2 -; MMR3-NEXT: srlv $2, $17, $16 -; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: sllv $17, $7, $4 -; MMR3-NEXT: or16 $17, $2 -; MMR3-NEXT: srav $11, $6, $16 -; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $17, $11, $2 -; MMR3-NEXT: sra $2, $6, 31 +; MMR3-NEXT: srav $11, $17, $16 +; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $11, $4 +; MMR3-NEXT: sra $2, $17, 31 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: movn $4, $17, $10 -; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $9, $6 -; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $3 +; MMR3-NEXT: move $8, $2 +; MMR3-NEXT: movn $8, $3, $10 +; MMR3-NEXT: lw 
$3, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $6, $9, $3 +; MMR3-NEXT: li16 $3, 0 +; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $7, $3, $4 +; MMR3-NEXT: or16 $7, $6 ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload ; MMR3-NEXT: movn $1, $2, $3 ; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $3, $16 -; MMR3-NEXT: movn $11, $2, $6 +; MMR3-NEXT: movn $11, $2, $4 ; MMR3-NEXT: movn $2, $11, $10 -; MMR3-NEXT: move $3, $4 +; MMR3-NEXT: move $3, $8 ; MMR3-NEXT: move $4, $1 ; MMR3-NEXT: lwp $16, 40($sp) ; MMR3-NEXT: addiusp 48 @@ -858,80 +852,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 -; MMR6-NEXT: move $12, $7 +; MMR6-NEXT: move $1, $7 ; MMR6-NEXT: lw $3, 44($sp) ; MMR6-NEXT: li16 $2, 64 -; MMR6-NEXT: subu16 $16, $2, $3 -; MMR6-NEXT: sllv $1, $5, $16 -; MMR6-NEXT: andi16 $2, $16, 32 -; MMR6-NEXT: selnez $8, $1, $2 -; MMR6-NEXT: sllv $9, $4, $16 -; MMR6-NEXT: not16 $16, $16 -; MMR6-NEXT: srl16 $17, $5, 1 -; MMR6-NEXT: srlv $10, $17, $16 -; MMR6-NEXT: or $9, $9, $10 -; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $8, $8, $9 -; MMR6-NEXT: srlv $9, $7, $3 -; MMR6-NEXT: not16 $7, $3 -; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: subu16 $7, $2, $3 +; MMR6-NEXT: sllv $8, $5, $7 +; MMR6-NEXT: andi16 $2, $7, 32 +; MMR6-NEXT: selnez $9, $8, $2 +; MMR6-NEXT: sllv $10, $4, $7 +; MMR6-NEXT: not16 $7, $7 +; MMR6-NEXT: srl16 $16, $5, 1 +; MMR6-NEXT: srlv $7, $16, $7 +; MMR6-NEXT: or $7, $10, $7 +; MMR6-NEXT: seleqz $7, $7, $2 +; MMR6-NEXT: or $7, $9, $7 +; MMR6-NEXT: srlv $9, $1, $3 +; MMR6-NEXT: not16 $16, $3 +; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $17, $6, 1 -; MMR6-NEXT: sllv $10, $17, $7 +; MMR6-NEXT: sllv $10, $17, $16 ; MMR6-NEXT: or $9, $10, $9 ; MMR6-NEXT: andi16 $17, $3, 32 ; MMR6-NEXT: seleqz $9, $9, 
$17 ; MMR6-NEXT: srlv $10, $6, $3 ; MMR6-NEXT: selnez $11, $10, $17 ; MMR6-NEXT: seleqz $10, $10, $17 -; MMR6-NEXT: or $8, $10, $8 -; MMR6-NEXT: seleqz $1, $1, $2 -; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: or $10, $10, $7 +; MMR6-NEXT: seleqz $12, $8, $2 +; MMR6-NEXT: or $8, $11, $9 ; MMR6-NEXT: addiu $2, $3, -64 -; MMR6-NEXT: srlv $10, $5, $2 +; MMR6-NEXT: srlv $9, $5, $2 ; MMR6-NEXT: sll16 $7, $4, 1 ; MMR6-NEXT: not16 $16, $2 ; MMR6-NEXT: sllv $11, $7, $16 ; MMR6-NEXT: sltiu $13, $3, 64 -; MMR6-NEXT: or $1, $9, $1 -; MMR6-NEXT: selnez $8, $8, $13 -; MMR6-NEXT: or $9, $11, $10 -; MMR6-NEXT: srav $10, $4, $2 +; MMR6-NEXT: or $8, $8, $12 +; MMR6-NEXT: selnez $10, $10, $13 +; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: srav $11, $4, $2 ; MMR6-NEXT: andi16 $2, $2, 32 -; MMR6-NEXT: seleqz $11, $10, $2 +; MMR6-NEXT: seleqz $12, $11, $2 ; MMR6-NEXT: sra $14, $4, 31 ; MMR6-NEXT: selnez $15, $14, $2 ; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $11, $15, $11 -; MMR6-NEXT: seleqz $11, $11, $13 -; MMR6-NEXT: selnez $2, $10, $2 -; MMR6-NEXT: seleqz $10, $14, $13 -; MMR6-NEXT: or $8, $8, $11 -; MMR6-NEXT: selnez $8, $8, $3 -; MMR6-NEXT: selnez $1, $1, $13 +; MMR6-NEXT: or $12, $15, $12 +; MMR6-NEXT: seleqz $12, $12, $13 +; MMR6-NEXT: selnez $2, $11, $2 +; MMR6-NEXT: seleqz $11, $14, $13 +; MMR6-NEXT: or $10, $10, $12 +; MMR6-NEXT: selnez $10, $10, $3 +; MMR6-NEXT: selnez $8, $8, $13 ; MMR6-NEXT: or $2, $2, $9 ; MMR6-NEXT: srav $9, $4, $3 ; MMR6-NEXT: seleqz $4, $9, $17 -; MMR6-NEXT: selnez $11, $14, $17 -; MMR6-NEXT: or $4, $11, $4 -; MMR6-NEXT: selnez $11, $4, $13 +; MMR6-NEXT: selnez $12, $14, $17 +; MMR6-NEXT: or $4, $12, $4 +; MMR6-NEXT: selnez $12, $4, $13 ; MMR6-NEXT: seleqz $2, $2, $13 ; MMR6-NEXT: seleqz $4, $6, $3 -; MMR6-NEXT: seleqz $6, $12, $3 +; MMR6-NEXT: seleqz $1, $1, $3 +; MMR6-NEXT: or $2, $8, $2 +; MMR6-NEXT: selnez $2, $2, $3 ; MMR6-NEXT: or $1, $1, $2 -; MMR6-NEXT: selnez $1, $1, $3 -; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: or $4, $4, $8 -; MMR6-NEXT: or $6, 
$11, $10 -; MMR6-NEXT: srlv $2, $5, $3 -; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $3, $7, $3 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: seleqz $2, $2, $17 -; MMR6-NEXT: selnez $3, $9, $17 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: selnez $2, $2, $13 -; MMR6-NEXT: or $3, $2, $10 -; MMR6-NEXT: move $2, $6 +; MMR6-NEXT: or $4, $4, $10 +; MMR6-NEXT: or $2, $12, $11 +; MMR6-NEXT: srlv $3, $5, $3 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $5, $7, $5 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: seleqz $3, $3, $17 +; MMR6-NEXT: selnez $5, $9, $17 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: selnez $3, $3, $13 +; MMR6-NEXT: or $3, $3, $11 ; MMR6-NEXT: move $5, $1 ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll index e4b4b3ae1d0f..ed2bfc9fcf60 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll @@ -776,77 +776,76 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 68($sp) ; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: subu16 $17, $2, $16 -; MMR3-NEXT: sllv $9, $5, $17 -; MMR3-NEXT: andi16 $3, $17, 32 +; MMR3-NEXT: subu16 $7, $2, $16 +; MMR3-NEXT: sllv $9, $5, $7 +; MMR3-NEXT: move $17, $5 +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $3, $7, 32 ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill ; MMR3-NEXT: li16 $2, 0 ; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: movn $4, $2, $3 -; MMR3-NEXT: srlv $5, $7, $16 +; MMR3-NEXT: srlv $5, $8, $16 ; MMR3-NEXT: not16 $3, $16 ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill ; MMR3-NEXT: sll16 $2, $6, 1 -; MMR3-NEXT: 
sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sllv $2, $2, $3 ; MMR3-NEXT: or16 $2, $5 -; MMR3-NEXT: srlv $7, $6, $16 +; MMR3-NEXT: srlv $5, $6, $16 +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill ; MMR3-NEXT: andi16 $3, $16, 32 ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $2, $7, $3 +; MMR3-NEXT: movn $2, $5, $3 ; MMR3-NEXT: addiu $3, $16, -64 ; MMR3-NEXT: or16 $2, $4 -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $3, $6, $3 -; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $3, 1 -; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $4 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: srlv $1, $3, $4 -; MMR3-NEXT: andi16 $4, $4, 32 +; MMR3-NEXT: srlv $4, $17, $3 ; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $4 +; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $4, 1 +; MMR3-NEXT: not16 $5, $3 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $17 +; MMR3-NEXT: srlv $1, $4, $3 +; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $3 ; MMR3-NEXT: sltiu $10, $16, 64 ; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: sllv $2, $3, $17 -; MMR3-NEXT: not16 $3, $17 -; MMR3-NEXT: srl16 $4, $6, 1 +; MMR3-NEXT: sllv $2, $4, $7 +; MMR3-NEXT: not16 $3, $7 +; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $4, $7, 1 ; MMR3-NEXT: srlv $4, $4, $3 ; MMR3-NEXT: or16 $4, $2 -; MMR3-NEXT: srlv $2, $6, $16 +; MMR3-NEXT: srlv $2, $7, $16 ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload ; MMR3-NEXT: sllv $3, $6, $3 ; MMR3-NEXT: or16 $3, $2 ; MMR3-NEXT: lw $2, 28($sp) # 4-byte 
Folded Reload ; MMR3-NEXT: srlv $2, $2, $16 -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $2, $6 +; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $2, $17 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movz $3, $17, $10 -; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $4, $9, $17 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $4 +; MMR3-NEXT: li16 $6, 0 +; MMR3-NEXT: movz $3, $6, $10 +; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $4, $9, $7 +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $6, $7, $17 +; MMR3-NEXT: or16 $6, $4 ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $1, $17, $4 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $1, $7, $10 +; MMR3-NEXT: movn $1, $7, $4 +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $1, $6, $10 ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $4, $16 -; MMR3-NEXT: movn $2, $17, $6 +; MMR3-NEXT: movn $2, $7, $17 ; MMR3-NEXT: li16 $4, 0 ; MMR3-NEXT: movz $2, $4, $10 ; MMR3-NEXT: move $4, $1 @@ -856,91 +855,98 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; ; MMR6-LABEL: lshr_i128: ; MMR6: # %bb.0: # %entry -; MMR6-NEXT: addiu $sp, $sp, -24 -; MMR6-NEXT: .cfi_def_cfa_offset 24 -; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill -; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill +; MMR6-NEXT: addiu $sp, $sp, -32 +; MMR6-NEXT: .cfi_def_cfa_offset 32 +; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill +; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 ; MMR6-NEXT: move $1, $7 -; MMR6-NEXT: move $7, $4 -; MMR6-NEXT: lw $3, 52($sp) +; MMR6-NEXT: move $7, $5 +; MMR6-NEXT: lw $3, 60($sp) ; MMR6-NEXT: srlv $2, $1, $3 -; MMR6-NEXT: not16 $16, $3 -; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill -; MMR6-NEXT: move 
$4, $6 -; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $5, $3 +; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: move $17, $6 +; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $6, $6, 1 -; MMR6-NEXT: sllv $6, $6, $16 +; MMR6-NEXT: sllv $6, $6, $5 ; MMR6-NEXT: or $8, $6, $2 -; MMR6-NEXT: addiu $6, $3, -64 -; MMR6-NEXT: srlv $9, $5, $6 -; MMR6-NEXT: sll16 $2, $7, 1 -; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill -; MMR6-NEXT: not16 $16, $6 +; MMR6-NEXT: addiu $5, $3, -64 +; MMR6-NEXT: srlv $9, $7, $5 +; MMR6-NEXT: move $6, $4 +; MMR6-NEXT: sll16 $2, $4, 1 +; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $5 ; MMR6-NEXT: sllv $10, $2, $16 ; MMR6-NEXT: andi16 $16, $3, 32 ; MMR6-NEXT: seleqz $8, $8, $16 ; MMR6-NEXT: or $9, $10, $9 -; MMR6-NEXT: srlv $10, $4, $3 +; MMR6-NEXT: srlv $10, $17, $3 ; MMR6-NEXT: selnez $11, $10, $16 ; MMR6-NEXT: li16 $17, 64 ; MMR6-NEXT: subu16 $2, $17, $3 -; MMR6-NEXT: sllv $12, $5, $2 +; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: move $17, $7 ; MMR6-NEXT: andi16 $4, $2, 32 -; MMR6-NEXT: andi16 $17, $6, 32 -; MMR6-NEXT: seleqz $9, $9, $17 +; MMR6-NEXT: andi16 $7, $5, 32 +; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill +; MMR6-NEXT: seleqz $9, $9, $7 ; MMR6-NEXT: seleqz $13, $12, $4 ; MMR6-NEXT: or $8, $11, $8 ; MMR6-NEXT: selnez $11, $12, $4 -; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: sllv $12, $6, $2 +; MMR6-NEXT: move $7, $6 +; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: not16 $2, $2 -; MMR6-NEXT: srl16 $6, $5, 1 +; MMR6-NEXT: srl16 $6, $17, 1 ; MMR6-NEXT: srlv $2, $6, $2 ; MMR6-NEXT: or $2, $12, $2 ; MMR6-NEXT: seleqz $2, $2, $4 -; MMR6-NEXT: addiu $4, $3, -64 -; MMR6-NEXT: srlv $4, $7, $4 -; MMR6-NEXT: or $12, $11, $2 -; MMR6-NEXT: or $6, $8, $13 -; MMR6-NEXT: srlv $5, $5, $3 -; MMR6-NEXT: selnez $8, $4, $17 -; MMR6-NEXT: sltiu $11, $3, 64 -; MMR6-NEXT: selnez $13, $6, $11 -; MMR6-NEXT: or $8, $8, $9 +; MMR6-NEXT: srlv $4, $7, $5 +; 
MMR6-NEXT: or $11, $11, $2 +; MMR6-NEXT: or $5, $8, $13 +; MMR6-NEXT: srlv $6, $17, $3 +; MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: selnez $7, $4, $2 +; MMR6-NEXT: sltiu $8, $3, 64 +; MMR6-NEXT: selnez $12, $5, $8 +; MMR6-NEXT: or $7, $7, $9 +; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $9, $6, $2 +; MMR6-NEXT: sllv $9, $2, $5 ; MMR6-NEXT: seleqz $10, $10, $16 -; MMR6-NEXT: li16 $2, 0 -; MMR6-NEXT: or $10, $10, $12 -; MMR6-NEXT: or $9, $9, $5 -; MMR6-NEXT: seleqz $5, $8, $11 -; MMR6-NEXT: seleqz $8, $2, $11 -; MMR6-NEXT: srlv $7, $7, $3 -; MMR6-NEXT: seleqz $2, $7, $16 -; MMR6-NEXT: selnez $2, $2, $11 +; MMR6-NEXT: li16 $5, 0 +; MMR6-NEXT: or $10, $10, $11 +; MMR6-NEXT: or $6, $9, $6 +; MMR6-NEXT: seleqz $2, $7, $8 +; MMR6-NEXT: seleqz $7, $5, $8 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: srlv $9, $5, $3 +; MMR6-NEXT: seleqz $11, $9, $16 +; MMR6-NEXT: selnez $11, $11, $8 ; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $5, $13, $5 -; MMR6-NEXT: selnez $5, $5, $3 -; MMR6-NEXT: or $5, $1, $5 -; MMR6-NEXT: or $2, $8, $2 -; MMR6-NEXT: seleqz $1, $9, $16 -; MMR6-NEXT: selnez $6, $7, $16 -; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $7, $7, $3 -; MMR6-NEXT: selnez $9, $10, $11 -; MMR6-NEXT: seleqz $4, $4, $17 -; MMR6-NEXT: seleqz $4, $4, $11 -; MMR6-NEXT: or $4, $9, $4 +; MMR6-NEXT: or $2, $12, $2 +; MMR6-NEXT: selnez $2, $2, $3 +; MMR6-NEXT: or $5, $1, $2 +; MMR6-NEXT: or $2, $7, $11 +; MMR6-NEXT: seleqz $1, $6, $16 +; MMR6-NEXT: selnez $6, $9, $16 +; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $9, $16, $3 +; MMR6-NEXT: selnez $10, $10, $8 +; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $4, $4, $16 +; MMR6-NEXT: seleqz $4, $4, $8 +; MMR6-NEXT: or $4, $10, $4 ; MMR6-NEXT: selnez $3, $4, $3 -; MMR6-NEXT: or $4, $7, $3 +; 
MMR6-NEXT: or $4, $9, $3 ; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: selnez $1, $1, $11 -; MMR6-NEXT: or $3, $8, $1 -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: addiu $sp, $sp, 24 +; MMR6-NEXT: selnez $1, $1, $8 +; MMR6-NEXT: or $3, $7, $1 +; MMR6-NEXT: lw $16, 24($sp) # 4-byte Folded Reload +; MMR6-NEXT: lw $17, 28($sp) # 4-byte Folded Reload +; MMR6-NEXT: addiu $sp, $sp, 32 ; MMR6-NEXT: jrc $ra </cut>
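The TargetInstrInfo.cpp hunk in the commit above restores the rule that an instruction reading a virtual register is not considered trivially rematerializable. A rough way to picture the effect is the toy model below. It is only a sketch: the `Reg` class, the `is_trivially_remat` function and the `allow_vreg_uses` switch are hypothetical names invented for illustration and are not LLVM APIs, and the model deliberately ignores tied operands, physical registers, side effects and everything else the real check looks at.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Reg:
    """A register operand; names starting with '%' are virtual (MIR convention)."""
    name: str
    is_def: bool  # True when written by the instruction, False when read

    @property
    def is_virtual(self) -> bool:
        return self.name.startswith("%")


Operand = Union[Reg, int]  # an int stands in for an immediate operand


def is_trivially_remat(operands: Sequence[Operand], allow_vreg_uses: bool) -> bool:
    """Toy version of the operand checks touched by the revert (hypothetical helper)."""
    # Remat clients assume operand 0 is the defined register.
    if not operands or not isinstance(operands[0], Reg):
        return False
    def_reg = operands[0].name

    for op in operands:
        # Immediates pass; physical registers are simply ignored in this toy model.
        if not isinstance(op, Reg) or not op.is_virtual:
            continue
        # Only the defined virtual register may be written.
        if op.is_def and op.name != def_reg:
            return False
        # With the revert applied, any virtual-register use disqualifies the instruction.
        if not op.is_def and not allow_vreg_uses:
            return False
    return True


# %1:sreg_32 = S_MOV_B32 %0:sreg_32  (pattern from the deleted remat-sop.mir tests)
mov_from_vreg = [Reg("%1", is_def=True), Reg("%0", is_def=False)]
# A move of an immediate; the value 42 is made up for the example.
mov_from_imm = [Reg("%1", is_def=True), 42]

print(is_trivially_remat(mov_from_vreg, allow_vreg_uses=True))   # before the revert
print(is_trivially_remat(mov_from_vreg, allow_vreg_uses=False))  # after the revert
print(is_trivially_remat(mov_from_imm, allow_vreg_uses=False))   # unaffected by the revert
```

In this model only the move from a virtual register changes its answer when the switch is flipped, which is consistent with the register-allocation differences visible in the test updates quoted in the commit.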

4 years, 9 months


[TCWG CI] 401.bzip2 grew in size by 2% after llvm: Revert "Allow rematerialization of virtual reg uses"

by ci_notify＠linaro.org

After llvm commit 08d7eec06e8cf5c15a96ce11f311f1480291a441
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>

    Revert "Allow rematerialization of virtual reg uses"

the following benchmarks grew in size by more than 1%:
- 401.bzip2 grew in size by 2% from 46428 to 47368 bytes

The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441
cd investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 08d7eec06e8cf5c15a96ce11f311f1480291a441
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 08d7eec06e8cf5c15a96ce11f311f1480291a441
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>
Date:   Fri Sep 24 09:53:51 2021 -0700

    Revert "Allow rematerialization of virtual reg uses"

    Reverted due to two distcint performance regression reports.
    This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39.
---
 llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +-
 llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +-
 llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 -
 llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +-
 llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +-
 llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +-
 .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +-
 llvm/test/CodeGen/ARM/neon-copy.ll | 10 +-
 llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +-
 llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +-
 llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +-
 llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +-
 llvm/test/CodeGen/Mips/tls.ll | 4 +-
 llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +-
 llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +-
 llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +-
 llvm/test/CodeGen/RISCV/mul.ll | 72 +-
 llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +-
 llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 270 +-
 llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +-
 llvm/test/CodeGen/RISCV/rv32zbp.ll | 262 +-
 llvm/test/CodeGen/RISCV/rv32zbt.ll | 206 +-
 .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 150 +-
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +-
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3584 ++++++++++----------
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 664 ++--
 llvm/test/CodeGen/RISCV/shifts.ll | 308 +-
 llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +-
 llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +-
 llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +-
 .../tail-pred-disabled-in-loloops.ll | 14 +-
 .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +-
 .../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +-
 llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +-
 llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +-
 llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +-
 llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 +--
 llvm/test/CodeGen/X86/addcarry.ll | 20 +-
 llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +-
 llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +-
 .../X86/delete-dead-instrs-with-live-uses.mir | 4 +-
 llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +-
 llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +-
 llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +-
 llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +-
 45 files changed, 4093 insertions(+), 4106 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index a0c52e2f1a13..c394ac910be1 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -117,11 +117,10 @@ public:
                                          const MachineFunction &MF) const;
 
   /// Return true if the instruction is trivially rematerializable, meaning it
-  /// has no side effects. Uses of constants and unallocatable physical
-  /// registers are always trivial to rematerialize so that the instructions
-  /// result is independent of the place in the function. Uses of virtual
-  /// registers are allowed but it is caller's responsility to ensure these
-  /// operands are valid at the point the instruction is beeing moved.
+  /// has no side effects and requires no operands that aren't always available.
+  /// This means the only allowed uses are constants and unallocatable physical
+  /// registers so that the instructions result is independent of the place
+  /// in the function.
   bool isTriviallyReMaterializable(const MachineInstr &MI,
                                    AAResults *AA = nullptr) const {
     return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF ||
@@ -141,7 +140,8 @@ protected:
   /// set, this hook lets the target specify whether the instruction is actually
   /// trivially rematerializable, taking into consideration its operands. This
   /// predicate must return false if the instruction has any side effects other
-  /// than producing a value.
+  /// than producing a value, or if it requres any address registers that are
+  /// not always available.
   /// Requirements must be check as stated in isTriviallyReMaterializable() .
   virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI,
                                                  AAResults *AA) const {
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index fe7d60e0b7e2..1eab8e7443a7 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -921,8 +921,7 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(
   const MachineRegisterInfo &MRI = MF.getRegInfo();
 
   // Remat clients assume operand 0 is the defined register.
-  if (!MI.getNumOperands() || !MI.getOperand(0).isReg() ||
-      MI.getOperand(0).isTied())
+  if (!MI.getNumOperands() || !MI.getOperand(0).isReg())
     return false;
   Register DefReg = MI.getOperand(0).getReg();
 
@@ -984,6 +983,12 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(
     // same virtual register, though.
     if (MO.isDef() && Reg != DefReg)
       return false;
+
+    // Don't allow any virtual-register uses. Rematting an instruction with
+    // virtual register uses would length the live ranges of the uses, which
+    // is not necessarily a good idea, certainly not "trivial".
+    if (MO.isUse())
+      return false;
   }
 
   // Everything checked out.
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir
index c9915aaabfde..ed799bfca028 100644
--- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir
+++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir
@@ -51,66 +51,6 @@ body: |
     S_NOP 0, implicit %2
     S_ENDPGM 0
 ...
-# The liverange of %0 covers a point of rematerialization, source value is
-# availabe.
----
-name: test_remat_s_mov_b32_vreg_src_long_lr
-tracksRegLiveness: true
-machineFunctionInfo:
-  stackPtrOffsetReg: $sgpr32
-body: |
-  bb.0:
-    ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr
-    ; GCN: renamable $sgpr0 = IMPLICIT_DEF
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr0
-    ; GCN: S_ENDPGM 0
-    %0:sreg_32 = IMPLICIT_DEF
-    %1:sreg_32 = S_MOV_B32 %0:sreg_32
-    %2:sreg_32 = S_MOV_B32 %0:sreg_32
-    %3:sreg_32 = S_MOV_B32 %0:sreg_32
-    S_NOP 0, implicit %1
-    S_NOP 0, implicit %2
-    S_NOP 0, implicit %3
-    S_NOP 0, implicit %0
-    S_ENDPGM 0
-...
-# The liverange of %0 does not cover a point of rematerialization, source value is
-# unavailabe and we do not want to artificially extend the liverange.
----
-name: test_no_remat_s_mov_b32_vreg_src_short_lr
-tracksRegLiveness: true
-machineFunctionInfo:
-  stackPtrOffsetReg: $sgpr32
-body: |
-  bb.0:
-    ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr
-    ; GCN: renamable $sgpr0 = IMPLICIT_DEF
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5)
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5)
-    ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0
-    ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5)
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5)
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr0
-    ; GCN: S_ENDPGM 0
-    %0:sreg_32 = IMPLICIT_DEF
-    %1:sreg_32 = S_MOV_B32 %0:sreg_32
-    %2:sreg_32 = S_MOV_B32 %0:sreg_32
-    %3:sreg_32 = S_MOV_B32 %0:sreg_32
-    S_NOP 0, implicit %1
-    S_NOP 0, implicit %2
-    S_NOP 0, implicit %3
-    S_ENDPGM 0
-...
--- name: test_remat_s_mov_b64 tracksRegLiveness: true diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll index 175a2069a441..a4243276c70a 100644 --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; ENABLE-NEXT: pophs {r11, pc} ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader ; ENABLE-NEXT: movw r12, :lower16:skip -; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: sub r1, r1, #1 ; ENABLE-NEXT: movt r12, :upper16:skip ; ENABLE-NEXT: .LBB0_4: @ %while.body ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; ENABLE-NEXT: ldrb r1, [r0] -; ENABLE-NEXT: ldrb r1, [r12, r1] -; ENABLE-NEXT: add r0, r0, r1 -; ENABLE-NEXT: sub r1, r3, #1 -; ENABLE-NEXT: cmp r1, r3 +; ENABLE-NEXT: ldrb r3, [r0] +; ENABLE-NEXT: ldrb r3, [r12, r3] +; ENABLE-NEXT: add r0, r0, r3 +; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: cmp r3, r1 ; ENABLE-NEXT: bhs .LBB0_6 ; ENABLE-NEXT: @ %bb.5: @ %while.body ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; ENABLE-NEXT: cmp r0, r2 -; ENABLE-NEXT: mov r3, r1 +; ENABLE-NEXT: mov r1, r3 ; ENABLE-NEXT: blo .LBB0_4 ; ENABLE-NEXT: .LBB0_6: @ %if.end29 ; ENABLE-NEXT: pop {r11, pc} @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; DISABLE-NEXT: pophs {r11, pc} ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader ; DISABLE-NEXT: movw r12, :lower16:skip -; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: sub r1, r1, #1 ; DISABLE-NEXT: movt r12, :upper16:skip ; DISABLE-NEXT: .LBB0_4: @ %while.body ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; DISABLE-NEXT: ldrb r1, [r0] -; DISABLE-NEXT: ldrb r1, [r12, r1] -; DISABLE-NEXT: add r0, r0, r1 -; DISABLE-NEXT: sub r1, r3, #1 -; DISABLE-NEXT: cmp r1, r3 +; DISABLE-NEXT: ldrb r3, [r0] +; DISABLE-NEXT: ldrb r3, [r12, r3] +; DISABLE-NEXT: add r0, 
r0, r3 +; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: cmp r3, r1 ; DISABLE-NEXT: bhs .LBB0_6 ; DISABLE-NEXT: @ %bb.5: @ %while.body ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; DISABLE-NEXT: cmp r0, r2 -; DISABLE-NEXT: mov r3, r1 +; DISABLE-NEXT: mov r1, r3 ; DISABLE-NEXT: blo .LBB0_4 ; DISABLE-NEXT: .LBB0_6: @ %if.end29 ; DISABLE-NEXT: pop {r11, pc} diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll index ea15fcc5c824..55157875d355 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: push {r4, r5, r11, lr} ; SCALAR-NEXT: rsb r3, r2, #0 ; SCALAR-NEXT: and r4, r2, #63 -; SCALAR-NEXT: and r12, r3, #63 -; SCALAR-NEXT: rsb r3, r12, #32 +; SCALAR-NEXT: and lr, r3, #63 +; SCALAR-NEXT: rsb r3, lr, #32 ; SCALAR-NEXT: lsl r2, r0, r4 -; SCALAR-NEXT: lsr lr, r0, r12 -; SCALAR-NEXT: orr r3, lr, r1, lsl r3 -; SCALAR-NEXT: subs lr, r12, #32 -; SCALAR-NEXT: lsrpl r3, r1, lr +; SCALAR-NEXT: lsr r12, r0, lr +; SCALAR-NEXT: orr r3, r12, r1, lsl r3 +; SCALAR-NEXT: subs r12, lr, #32 +; SCALAR-NEXT: lsrpl r3, r1, r12 ; SCALAR-NEXT: subs r5, r4, #32 ; SCALAR-NEXT: movwpl r2, #0 ; SCALAR-NEXT: cmp r5, #0 @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: lsr r3, r0, r3 ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 ; SCALAR-NEXT: lslpl r3, r0, r5 -; SCALAR-NEXT: lsr r0, r1, r12 -; SCALAR-NEXT: cmp lr, #0 +; SCALAR-NEXT: lsr r0, r1, lr +; SCALAR-NEXT: cmp r12, #0 ; SCALAR-NEXT: movwpl r0, #0 ; SCALAR-NEXT: orr r1, r3, r0 ; SCALAR-NEXT: mov r0, r2 @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK: @ %bb.0: ; CHECK-NEXT: .save {r4, r5, r11, lr} ; CHECK-NEXT: push {r4, r5, r11, lr} -; CHECK-NEXT: and r12, r2, #63 +; CHECK-NEXT: and lr, r2, #63 ; CHECK-NEXT: rsb r2, r2, #0 -; CHECK-NEXT: rsb r3, r12, #32 +; CHECK-NEXT: rsb r3, lr, #32 ; CHECK-NEXT: and r4, r2, #63 -; CHECK-NEXT: lsr lr, 
r0, r12 -; CHECK-NEXT: orr r3, lr, r1, lsl r3 -; CHECK-NEXT: subs lr, r12, #32 +; CHECK-NEXT: lsr r12, r0, lr +; CHECK-NEXT: orr r3, r12, r1, lsl r3 +; CHECK-NEXT: subs r12, lr, #32 ; CHECK-NEXT: lsl r2, r0, r4 -; CHECK-NEXT: lsrpl r3, r1, lr +; CHECK-NEXT: lsrpl r3, r1, r12 ; CHECK-NEXT: subs r5, r4, #32 ; CHECK-NEXT: movwpl r2, #0 ; CHECK-NEXT: cmp r5, #0 @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK-NEXT: lsr r3, r0, r3 ; CHECK-NEXT: orr r3, r3, r1, lsl r4 ; CHECK-NEXT: lslpl r3, r0, r5 -; CHECK-NEXT: lsr r0, r1, r12 -; CHECK-NEXT: cmp lr, #0 +; CHECK-NEXT: lsr r0, r1, lr +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r0, #0 ; CHECK-NEXT: orr r1, r0, r3 ; CHECK-NEXT: mov r0, r2 diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll index 6372f9be2ca3..54c93b493c98 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { ; CHECK-NEXT: mov r3, #0 ; CHECK-NEXT: bl __aeabi_uldivmod ; CHECK-NEXT: add r0, r2, #27 -; CHECK-NEXT: lsl r2, r7, #27 -; CHECK-NEXT: and r12, r0, #63 ; CHECK-NEXT: lsl r6, r6, #27 +; CHECK-NEXT: and r1, r0, #63 +; CHECK-NEXT: lsl r2, r7, #27 ; CHECK-NEXT: orr r7, r6, r7, lsr #5 -; CHECK-NEXT: rsb r3, r12, #32 -; CHECK-NEXT: lsr r2, r2, r12 ; CHECK-NEXT: mov r6, #63 -; CHECK-NEXT: orr r2, r2, r7, lsl r3 -; CHECK-NEXT: subs r3, r12, #32 +; CHECK-NEXT: rsb r3, r1, #32 +; CHECK-NEXT: lsr r2, r2, r1 +; CHECK-NEXT: subs r12, r1, #32 ; CHECK-NEXT: bic r6, r6, r0 +; CHECK-NEXT: orr r2, r2, r7, lsl r3 ; CHECK-NEXT: lsl r5, r9, #1 -; CHECK-NEXT: lsrpl r2, r7, r3 -; CHECK-NEXT: subs r1, r6, #32 +; CHECK-NEXT: lsrpl r2, r7, r12 ; CHECK-NEXT: lsl r0, r5, r6 -; CHECK-NEXT: lsl r4, r8, #1 +; CHECK-NEXT: subs r4, r6, #32 +; CHECK-NEXT: lsl r3, r8, #1 ; CHECK-NEXT: movwpl r0, #0 -; CHECK-NEXT: orr r4, r4, r9, lsr #31 +; CHECK-NEXT: orr r3, r3, r9, lsr #31 ; CHECK-NEXT: orr r0, r0, r2 ; CHECK-NEXT: 
rsb r2, r6, #32 -; CHECK-NEXT: cmp r1, #0 +; CHECK-NEXT: cmp r4, #0 +; CHECK-NEXT: lsr r1, r7, r1 ; CHECK-NEXT: lsr r2, r5, r2 -; CHECK-NEXT: orr r2, r2, r4, lsl r6 -; CHECK-NEXT: lslpl r2, r5, r1 -; CHECK-NEXT: lsr r1, r7, r12 -; CHECK-NEXT: cmp r3, #0 +; CHECK-NEXT: orr r2, r2, r3, lsl r6 +; CHECK-NEXT: lslpl r2, r5, r4 +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r1, #0 ; CHECK-NEXT: orr r1, r2, r1 ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll index 0a0bb62b0a09..2922e0ed5423 100644 --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { ; BE-LABEL: i56_or: ; BE: @ %bb.0: ; BE-NEXT: mov r1, r0 +; BE-NEXT: ldr r12, [r0] ; BE-NEXT: ldrh r2, [r1, #4]! ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: orr r2, r3, r2, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: strb r12, [r1, #2] -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: strb r2, [r1, #2] +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr %aa = load i56, i56* %a @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: strb r2, [r1, #2] ; BE-NEXT: orr r2, r3, r12, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: ldr r12, [r0] +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; 
BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll index 46490efb6631..09a991da2e59 100644 --- a/llvm/test/CodeGen/ARM/neon-copy.ll +++ b/llvm/test/CodeGen/ARM/neon-copy.ll @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { ; CHECK-NEXT: .pad #8 ; CHECK-NEXT: sub sp, sp, #8 ; CHECK-NEXT: vmov.u16 r1, d0[1] -; CHECK-NEXT: and r12, r0, #3 +; CHECK-NEXT: and r0, r0, #3 ; CHECK-NEXT: vmov.u16 r2, d0[2] -; CHECK-NEXT: mov r0, sp -; CHECK-NEXT: vmov.u16 r3, d0[3] -; CHECK-NEXT: orr r0, r0, r12, lsl #1 +; CHECK-NEXT: mov r3, sp +; CHECK-NEXT: vmov.u16 r12, d0[3] +; CHECK-NEXT: orr r0, r3, r0, lsl #1 ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] ; CHECK-NEXT: vldr d0, [sp] ; CHECK-NEXT: vmov.16 d0[1], r1 ; CHECK-NEXT: vmov.16 d0[2], r2 -; CHECK-NEXT: vmov.16 d0[3], r3 +; CHECK-NEXT: vmov.16 d0[3], r12 ; CHECK-NEXT: add sp, sp, #8 ; CHECK-NEXT: bx lr %tmp = extractelement <8 x i16> %x, i32 0 diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll index a125446b27c3..8be7100d368b 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll @@ -766,85 +766,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: move $2, $6 -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 76($sp) -; MMR3-NEXT: srlv $3, $7, $16 -; MMR3-NEXT: not16 $6, $16 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill -; MMR3-NEXT: 
sll16 $2, $2, 1 -; MMR3-NEXT: sllv $2, $2, $6 -; MMR3-NEXT: li16 $6, 64 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: srlv $4, $4, $16 -; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill -; MMR3-NEXT: subu16 $7, $6, $16 +; MMR3-NEXT: srlv $4, $7, $16 +; MMR3-NEXT: not16 $3, $16 +; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: sll16 $2, $6, 1 +; MMR3-NEXT: sllv $3, $2, $3 +; MMR3-NEXT: li16 $2, 64 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: srlv $6, $6, $16 +; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: subu16 $7, $2, $16 ; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: andi16 $5, $7, 32 -; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $6, $16, 32 -; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $3, $9 +; MMR3-NEXT: andi16 $2, $7, 32 +; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $5, $16, 32 +; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $3, $17, $5 -; MMR3-NEXT: movn $2, $4, $6 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $4, $17, $4 -; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $6, 1 -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $3 -; MMR3-NEXT: addiu $3, $16, -64 -; MMR3-NEXT: srav $1, $6, $3 -; MMR3-NEXT: andi16 $3, $3, 32 -; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $3 -; MMR3-NEXT: sllv $3, $6, $7 -; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill -; MMR3-NEXT: not16 $3, $7 -; MMR3-NEXT: srl16 $4, $17, 1 -; MMR3-NEXT: srlv $3, $4, $3 +; MMR3-NEXT: movn $4, $17, $2 +; MMR3-NEXT: movn $3, $6, $5 +; MMR3-NEXT: addiu $2, $16, -64 +; MMR3-NEXT: lw 
$5, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $5, $5, $2 +; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $17, 1 +; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: not16 $5, $2 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $4 +; MMR3-NEXT: srav $1, $17, $2 +; MMR3-NEXT: andi16 $2, $2, 32 +; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $2 +; MMR3-NEXT: sllv $2, $17, $7 +; MMR3-NEXT: not16 $4, $7 +; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $6, $7, 1 +; MMR3-NEXT: srlv $6, $6, $4 ; MMR3-NEXT: sltiu $10, $16, 64 -; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $5, $3, $10 +; MMR3-NEXT: or16 $6, $2 +; MMR3-NEXT: srlv $2, $7, $16 +; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: sllv $3, $4, $3 ; MMR3-NEXT: or16 $3, $2 -; MMR3-NEXT: srlv $2, $17, $16 -; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: sllv $17, $7, $4 -; MMR3-NEXT: or16 $17, $2 -; MMR3-NEXT: srav $11, $6, $16 -; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $17, $11, $2 -; MMR3-NEXT: sra $2, $6, 31 +; MMR3-NEXT: srav $11, $17, $16 +; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $11, $4 +; MMR3-NEXT: sra $2, $17, 31 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: movn $4, $17, $10 -; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $9, $6 -; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $3 +; MMR3-NEXT: move $8, $2 +; MMR3-NEXT: movn $8, $3, $10 +; MMR3-NEXT: lw 
$3, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $6, $9, $3 +; MMR3-NEXT: li16 $3, 0 +; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $7, $3, $4 +; MMR3-NEXT: or16 $7, $6 ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload ; MMR3-NEXT: movn $1, $2, $3 ; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $3, $16 -; MMR3-NEXT: movn $11, $2, $6 +; MMR3-NEXT: movn $11, $2, $4 ; MMR3-NEXT: movn $2, $11, $10 -; MMR3-NEXT: move $3, $4 +; MMR3-NEXT: move $3, $8 ; MMR3-NEXT: move $4, $1 ; MMR3-NEXT: lwp $16, 40($sp) ; MMR3-NEXT: addiusp 48 @@ -858,80 +852,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 -; MMR6-NEXT: move $12, $7 +; MMR6-NEXT: move $1, $7 ; MMR6-NEXT: lw $3, 44($sp) ; MMR6-NEXT: li16 $2, 64 -; MMR6-NEXT: subu16 $16, $2, $3 -; MMR6-NEXT: sllv $1, $5, $16 -; MMR6-NEXT: andi16 $2, $16, 32 -; MMR6-NEXT: selnez $8, $1, $2 -; MMR6-NEXT: sllv $9, $4, $16 -; MMR6-NEXT: not16 $16, $16 -; MMR6-NEXT: srl16 $17, $5, 1 -; MMR6-NEXT: srlv $10, $17, $16 -; MMR6-NEXT: or $9, $9, $10 -; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $8, $8, $9 -; MMR6-NEXT: srlv $9, $7, $3 -; MMR6-NEXT: not16 $7, $3 -; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: subu16 $7, $2, $3 +; MMR6-NEXT: sllv $8, $5, $7 +; MMR6-NEXT: andi16 $2, $7, 32 +; MMR6-NEXT: selnez $9, $8, $2 +; MMR6-NEXT: sllv $10, $4, $7 +; MMR6-NEXT: not16 $7, $7 +; MMR6-NEXT: srl16 $16, $5, 1 +; MMR6-NEXT: srlv $7, $16, $7 +; MMR6-NEXT: or $7, $10, $7 +; MMR6-NEXT: seleqz $7, $7, $2 +; MMR6-NEXT: or $7, $9, $7 +; MMR6-NEXT: srlv $9, $1, $3 +; MMR6-NEXT: not16 $16, $3 +; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $17, $6, 1 -; MMR6-NEXT: sllv $10, $17, $7 +; MMR6-NEXT: sllv $10, $17, $16 ; MMR6-NEXT: or $9, $10, $9 ; MMR6-NEXT: andi16 $17, $3, 32 ; MMR6-NEXT: seleqz $9, $9, 
$17 ; MMR6-NEXT: srlv $10, $6, $3 ; MMR6-NEXT: selnez $11, $10, $17 ; MMR6-NEXT: seleqz $10, $10, $17 -; MMR6-NEXT: or $8, $10, $8 -; MMR6-NEXT: seleqz $1, $1, $2 -; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: or $10, $10, $7 +; MMR6-NEXT: seleqz $12, $8, $2 +; MMR6-NEXT: or $8, $11, $9 ; MMR6-NEXT: addiu $2, $3, -64 -; MMR6-NEXT: srlv $10, $5, $2 +; MMR6-NEXT: srlv $9, $5, $2 ; MMR6-NEXT: sll16 $7, $4, 1 ; MMR6-NEXT: not16 $16, $2 ; MMR6-NEXT: sllv $11, $7, $16 ; MMR6-NEXT: sltiu $13, $3, 64 -; MMR6-NEXT: or $1, $9, $1 -; MMR6-NEXT: selnez $8, $8, $13 -; MMR6-NEXT: or $9, $11, $10 -; MMR6-NEXT: srav $10, $4, $2 +; MMR6-NEXT: or $8, $8, $12 +; MMR6-NEXT: selnez $10, $10, $13 +; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: srav $11, $4, $2 ; MMR6-NEXT: andi16 $2, $2, 32 -; MMR6-NEXT: seleqz $11, $10, $2 +; MMR6-NEXT: seleqz $12, $11, $2 ; MMR6-NEXT: sra $14, $4, 31 ; MMR6-NEXT: selnez $15, $14, $2 ; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $11, $15, $11 -; MMR6-NEXT: seleqz $11, $11, $13 -; MMR6-NEXT: selnez $2, $10, $2 -; MMR6-NEXT: seleqz $10, $14, $13 -; MMR6-NEXT: or $8, $8, $11 -; MMR6-NEXT: selnez $8, $8, $3 -; MMR6-NEXT: selnez $1, $1, $13 +; MMR6-NEXT: or $12, $15, $12 +; MMR6-NEXT: seleqz $12, $12, $13 +; MMR6-NEXT: selnez $2, $11, $2 +; MMR6-NEXT: seleqz $11, $14, $13 +; MMR6-NEXT: or $10, $10, $12 +; MMR6-NEXT: selnez $10, $10, $3 +; MMR6-NEXT: selnez $8, $8, $13 ; MMR6-NEXT: or $2, $2, $9 ; MMR6-NEXT: srav $9, $4, $3 ; MMR6-NEXT: seleqz $4, $9, $17 -; MMR6-NEXT: selnez $11, $14, $17 -; MMR6-NEXT: or $4, $11, $4 -; MMR6-NEXT: selnez $11, $4, $13 +; MMR6-NEXT: selnez $12, $14, $17 +; MMR6-NEXT: or $4, $12, $4 +; MMR6-NEXT: selnez $12, $4, $13 ; MMR6-NEXT: seleqz $2, $2, $13 ; MMR6-NEXT: seleqz $4, $6, $3 -; MMR6-NEXT: seleqz $6, $12, $3 +; MMR6-NEXT: seleqz $1, $1, $3 +; MMR6-NEXT: or $2, $8, $2 +; MMR6-NEXT: selnez $2, $2, $3 ; MMR6-NEXT: or $1, $1, $2 -; MMR6-NEXT: selnez $1, $1, $3 -; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: or $4, $4, $8 -; MMR6-NEXT: or $6, 
$11, $10 -; MMR6-NEXT: srlv $2, $5, $3 -; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $3, $7, $3 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: seleqz $2, $2, $17 -; MMR6-NEXT: selnez $3, $9, $17 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: selnez $2, $2, $13 -; MMR6-NEXT: or $3, $2, $10 -; MMR6-NEXT: move $2, $6 +; MMR6-NEXT: or $4, $4, $10 +; MMR6-NEXT: or $2, $12, $11 +; MMR6-NEXT: srlv $3, $5, $3 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $5, $7, $5 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: seleqz $3, $3, $17 +; MMR6-NEXT: selnez $5, $9, $17 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: selnez $3, $3, $13 +; MMR6-NEXT: or $3, $3, $11 ; MMR6-NEXT: move $5, $1 ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll index e4b4b3ae1d0f..ed2bfc9fcf60 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll @@ -776,77 +776,76 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 68($sp) ; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: subu16 $17, $2, $16 -; MMR3-NEXT: sllv $9, $5, $17 -; MMR3-NEXT: andi16 $3, $17, 32 +; MMR3-NEXT: subu16 $7, $2, $16 +; MMR3-NEXT: sllv $9, $5, $7 +; MMR3-NEXT: move $17, $5 +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $3, $7, 32 ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill ; MMR3-NEXT: li16 $2, 0 ; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: movn $4, $2, $3 -; MMR3-NEXT: srlv $5, $7, $16 +; MMR3-NEXT: srlv $5, $8, $16 ; MMR3-NEXT: not16 $3, $16 ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill ; MMR3-NEXT: sll16 $2, $6, 1 -; MMR3-NEXT: 
sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sllv $2, $2, $3 ; MMR3-NEXT: or16 $2, $5 -; MMR3-NEXT: srlv $7, $6, $16 +; MMR3-NEXT: srlv $5, $6, $16 +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill ; MMR3-NEXT: andi16 $3, $16, 32 ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $2, $7, $3 +; MMR3-NEXT: movn $2, $5, $3 ; MMR3-NEXT: addiu $3, $16, -64 ; MMR3-NEXT: or16 $2, $4 -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $3, $6, $3 -; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $3, 1 -; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $4 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: srlv $1, $3, $4 -; MMR3-NEXT: andi16 $4, $4, 32 +; MMR3-NEXT: srlv $4, $17, $3 ; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $4 +; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $4, 1 +; MMR3-NEXT: not16 $5, $3 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $17 +; MMR3-NEXT: srlv $1, $4, $3 +; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $3 ; MMR3-NEXT: sltiu $10, $16, 64 ; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: sllv $2, $3, $17 -; MMR3-NEXT: not16 $3, $17 -; MMR3-NEXT: srl16 $4, $6, 1 +; MMR3-NEXT: sllv $2, $4, $7 +; MMR3-NEXT: not16 $3, $7 +; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $4, $7, 1 ; MMR3-NEXT: srlv $4, $4, $3 ; MMR3-NEXT: or16 $4, $2 -; MMR3-NEXT: srlv $2, $6, $16 +; MMR3-NEXT: srlv $2, $7, $16 ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload ; MMR3-NEXT: sllv $3, $6, $3 ; MMR3-NEXT: or16 $3, $2 ; MMR3-NEXT: lw $2, 28($sp) # 4-byte 
Folded Reload ; MMR3-NEXT: srlv $2, $2, $16 -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $2, $6 +; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $2, $17 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movz $3, $17, $10 -; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $4, $9, $17 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $4 +; MMR3-NEXT: li16 $6, 0 +; MMR3-NEXT: movz $3, $6, $10 +; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $4, $9, $7 +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $6, $7, $17 +; MMR3-NEXT: or16 $6, $4 ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $1, $17, $4 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $1, $7, $10 +; MMR3-NEXT: movn $1, $7, $4 +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $1, $6, $10 ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $4, $16 -; MMR3-NEXT: movn $2, $17, $6 +; MMR3-NEXT: movn $2, $7, $17 ; MMR3-NEXT: li16 $4, 0 ; MMR3-NEXT: movz $2, $4, $10 ; MMR3-NEXT: move $4, $1 @@ -856,91 +855,98 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; ; MMR6-LABEL: lshr_i128: ; MMR6: # %bb.0: # %entry -; MMR6-NEXT: addiu $sp, $sp, -24 -; MMR6-NEXT: .cfi_def_cfa_offset 24 -; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill -; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill +; MMR6-NEXT: addiu $sp, $sp, -32 +; MMR6-NEXT: .cfi_def_cfa_offset 32 +; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill +; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 ; MMR6-NEXT: move $1, $7 -; MMR6-NEXT: move $7, $4 -; MMR6-NEXT: lw $3, 52($sp) +; MMR6-NEXT: move $7, $5 +; MMR6-NEXT: lw $3, 60($sp) ; MMR6-NEXT: srlv $2, $1, $3 -; MMR6-NEXT: not16 $16, $3 -; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill -; MMR6-NEXT: move 
$4, $6 -; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $5, $3 +; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: move $17, $6 +; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $6, $6, 1 -; MMR6-NEXT: sllv $6, $6, $16 +; MMR6-NEXT: sllv $6, $6, $5 ; MMR6-NEXT: or $8, $6, $2 -; MMR6-NEXT: addiu $6, $3, -64 -; MMR6-NEXT: srlv $9, $5, $6 -; MMR6-NEXT: sll16 $2, $7, 1 -; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill -; MMR6-NEXT: not16 $16, $6 +; MMR6-NEXT: addiu $5, $3, -64 +; MMR6-NEXT: srlv $9, $7, $5 +; MMR6-NEXT: move $6, $4 +; MMR6-NEXT: sll16 $2, $4, 1 +; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $5 ; MMR6-NEXT: sllv $10, $2, $16 ; MMR6-NEXT: andi16 $16, $3, 32 ; MMR6-NEXT: seleqz $8, $8, $16 ; MMR6-NEXT: or $9, $10, $9 -; MMR6-NEXT: srlv $10, $4, $3 +; MMR6-NEXT: srlv $10, $17, $3 ; MMR6-NEXT: selnez $11, $10, $16 ; MMR6-NEXT: li16 $17, 64 ; MMR6-NEXT: subu16 $2, $17, $3 -; MMR6-NEXT: sllv $12, $5, $2 +; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: move $17, $7 ; MMR6-NEXT: andi16 $4, $2, 32 -; MMR6-NEXT: andi16 $17, $6, 32 -; MMR6-NEXT: seleqz $9, $9, $17 +; MMR6-NEXT: andi16 $7, $5, 32 +; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill +; MMR6-NEXT: seleqz $9, $9, $7 ; MMR6-NEXT: seleqz $13, $12, $4 ; MMR6-NEXT: or $8, $11, $8 ; MMR6-NEXT: selnez $11, $12, $4 -; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: sllv $12, $6, $2 +; MMR6-NEXT: move $7, $6 +; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: not16 $2, $2 -; MMR6-NEXT: srl16 $6, $5, 1 +; MMR6-NEXT: srl16 $6, $17, 1 ; MMR6-NEXT: srlv $2, $6, $2 ; MMR6-NEXT: or $2, $12, $2 ; MMR6-NEXT: seleqz $2, $2, $4 -; MMR6-NEXT: addiu $4, $3, -64 -; MMR6-NEXT: srlv $4, $7, $4 -; MMR6-NEXT: or $12, $11, $2 -; MMR6-NEXT: or $6, $8, $13 -; MMR6-NEXT: srlv $5, $5, $3 -; MMR6-NEXT: selnez $8, $4, $17 -; MMR6-NEXT: sltiu $11, $3, 64 -; MMR6-NEXT: selnez $13, $6, $11 -; MMR6-NEXT: or $8, $8, $9 +; MMR6-NEXT: srlv $4, $7, $5 +; 
MMR6-NEXT: or $11, $11, $2 +; MMR6-NEXT: or $5, $8, $13 +; MMR6-NEXT: srlv $6, $17, $3 +; MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: selnez $7, $4, $2 +; MMR6-NEXT: sltiu $8, $3, 64 +; MMR6-NEXT: selnez $12, $5, $8 +; MMR6-NEXT: or $7, $7, $9 +; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $9, $6, $2 +; MMR6-NEXT: sllv $9, $2, $5 ; MMR6-NEXT: seleqz $10, $10, $16 -; MMR6-NEXT: li16 $2, 0 -; MMR6-NEXT: or $10, $10, $12 -; MMR6-NEXT: or $9, $9, $5 -; MMR6-NEXT: seleqz $5, $8, $11 -; MMR6-NEXT: seleqz $8, $2, $11 -; MMR6-NEXT: srlv $7, $7, $3 -; MMR6-NEXT: seleqz $2, $7, $16 -; MMR6-NEXT: selnez $2, $2, $11 +; MMR6-NEXT: li16 $5, 0 +; MMR6-NEXT: or $10, $10, $11 +; MMR6-NEXT: or $6, $9, $6 +; MMR6-NEXT: seleqz $2, $7, $8 +; MMR6-NEXT: seleqz $7, $5, $8 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: srlv $9, $5, $3 +; MMR6-NEXT: seleqz $11, $9, $16 +; MMR6-NEXT: selnez $11, $11, $8 ; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $5, $13, $5 -; MMR6-NEXT: selnez $5, $5, $3 -; MMR6-NEXT: or $5, $1, $5 -; MMR6-NEXT: or $2, $8, $2 -; MMR6-NEXT: seleqz $1, $9, $16 -; MMR6-NEXT: selnez $6, $7, $16 -; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $7, $7, $3 -; MMR6-NEXT: selnez $9, $10, $11 -; MMR6-NEXT: seleqz $4, $4, $17 -; MMR6-NEXT: seleqz $4, $4, $11 -; MMR6-NEXT: or $4, $9, $4 +; MMR6-NEXT: or $2, $12, $2 +; MMR6-NEXT: selnez $2, $2, $3 +; MMR6-NEXT: or $5, $1, $2 +; MMR6-NEXT: or $2, $7, $11 +; MMR6-NEXT: seleqz $1, $6, $16 +; MMR6-NEXT: selnez $6, $9, $16 +; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $9, $16, $3 +; MMR6-NEXT: selnez $10, $10, $8 +; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $4, $4, $16 +; MMR6-NEXT: seleqz $4, $4, $8 +; MMR6-NEXT: or $4, $10, $4 ; MMR6-NEXT: selnez $3, $4, $3 -; MMR6-NEXT: or $4, $7, $3 +; 
MMR6-NEXT: or $4, $9, $3 ; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: selnez $1, $1, $11 -; MMR6-NEXT: or $3, $8, $1 -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: addiu $sp, $sp, 24 +; MMR6-NEXT: selnez $1, $1, $8 +; MMR6-NEXT: or $3, $7, $1 +; MMR6-NEXT: lw $16, 24($sp) # 4-byte Folded Reload +; MMR6-NEXT: lw $17, 28($sp) # 4-byte Folded Reload +; MMR6-NEXT: addiu $sp, $sp, 32 ; MMR6-NEXT: jrc $ra </cut>

4 years, 9 months


[TCWG CI] 462.libquantum grew in size by 3% after llvm: [JumpThreading] Ignore free instructions

by ci_notify＠linaro.org

After llvm commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
Author: Nikita Popov <nikita.ppv(a)gmail.com>

    [JumpThreading] Ignore free instructions

the following benchmarks grew in size by more than 1%:
- 462.libquantum grew in size by 3% from 14035 to 14398 bytes

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -Os -flto -mthumb
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
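The size figures quoted in the report can be checked with a few lines of arithmetic. The following is only a sketch: the helper name `relative_change_percent` is made up for illustration, and rounding to the nearest whole percent is an assumption about how the CI prints its summary, not something the report states.

```python
def relative_change_percent(before: int, after: int) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Threshold taken from the report text: "grew in size by more than 1%".
SIZE_THRESHOLD_PERCENT = 1.0

# 462.libquantum sizes in bytes, as quoted in the report.
before_bytes, after_bytes = 14035, 14398

growth = relative_change_percent(before_bytes, after_bytes)
is_regression = growth > SIZE_THRESHOLD_PERCENT

# Assumption: the "3%" in the subject line is the growth rounded to a whole percent.
print(f"462.libquantum: {growth:.2f}% growth, rounded {round(growth)}%, flagged={is_regression}")
```

The absolute growth is 363 bytes, which is well above the 1% threshold of roughly 140 bytes for a binary of this size.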
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
cd investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff Author: Nikita Popov <nikita.ppv(a)gmail.com> Date: Wed Sep 22 21:34:24 2021 +0200 [JumpThreading] Ignore free instructions This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2. Differential Revision: https://reviews.llvm.org/D110290 --- .../include/llvm/Transforms/Scalar/JumpThreading.h | 8 +-- llvm/lib/Transforms/Scalar/JumpThreading.cpp | 61 ++++++++++------------ .../Transforms/JumpThreading/free_instructions.ll | 24 +++++---- .../inlining-alignment-assumptions.ll | 12 ++--- 4 files changed, 52 insertions(+), 53 deletions(-) diff --git a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h index 816ea1071e52..0ac7d7c62b7a 100644 --- a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h +++ b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h @@ -44,6 +44,7 @@ class PHINode; class SelectInst; class SwitchInst; class TargetLibraryInfo; +class TargetTransformInfo; class Value; /// A private "module" namespace for types and utilities used by @@ -78,6 +79,7 @@ enum ConstantPreference { WantInteger, WantBlockAddress }; /// revectored to the false side of the second if. class JumpThreadingPass : public PassInfoMixin<JumpThreadingPass> { TargetLibraryInfo *TLI; + TargetTransformInfo *TTI; LazyValueInfo *LVI; AAResults *AA; DomTreeUpdater *DTU; @@ -99,9 +101,9 @@ public: JumpThreadingPass(bool InsertFreezeWhenUnfoldingSelect = false, int T = -1); // Glue for old PM. 
- bool runImpl(Function &F, TargetLibraryInfo *TLI, LazyValueInfo *LVI, - AAResults *AA, DomTreeUpdater *DTU, bool HasProfileData, - std::unique_ptr<BlockFrequencyInfo> BFI, + bool runImpl(Function &F, TargetLibraryInfo *TLI, TargetTransformInfo *TTI, + LazyValueInfo *LVI, AAResults *AA, DomTreeUpdater *DTU, + bool HasProfileData, std::unique_ptr<BlockFrequencyInfo> BFI, std::unique_ptr<BranchProbabilityInfo> BPI); PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM); diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp index 688902ecb9ff..fe9a7211967c 100644 --- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp +++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp @@ -331,7 +331,7 @@ bool JumpThreading::runOnFunction(Function &F) { BFI.reset(new BlockFrequencyInfo(F, *BPI, LI)); } - bool Changed = Impl.runImpl(F, TLI, LVI, AA, &DTU, F.hasProfileData(), + bool Changed = Impl.runImpl(F, TLI, TTI, LVI, AA, &DTU, F.hasProfileData(), std::move(BFI), std::move(BPI)); if (PrintLVIAfterJumpThreading) { dbgs() << "LVI for function '" << F.getName() << "':\n"; @@ -360,7 +360,7 @@ PreservedAnalyses JumpThreadingPass::run(Function &F, BFI.reset(new BlockFrequencyInfo(F, *BPI, LI)); } - bool Changed = runImpl(F, &TLI, &LVI, &AA, &DTU, F.hasProfileData(), + bool Changed = runImpl(F, &TLI, &TTI, &LVI, &AA, &DTU, F.hasProfileData(), std::move(BFI), std::move(BPI)); if (PrintLVIAfterJumpThreading) { @@ -377,12 +377,14 @@ PreservedAnalyses JumpThreadingPass::run(Function &F, } bool JumpThreadingPass::runImpl(Function &F, TargetLibraryInfo *TLI_, - LazyValueInfo *LVI_, AliasAnalysis *AA_, - DomTreeUpdater *DTU_, bool HasProfileData_, + TargetTransformInfo *TTI_, LazyValueInfo *LVI_, + AliasAnalysis *AA_, DomTreeUpdater *DTU_, + bool HasProfileData_, std::unique_ptr<BlockFrequencyInfo> BFI_, std::unique_ptr<BranchProbabilityInfo> BPI_) { LLVM_DEBUG(dbgs() << "Jump threading on function '" << F.getName() << "'\n"); TLI = TLI_; + 
TTI = TTI_; LVI = LVI_; AA = AA_; DTU = DTU_; @@ -514,7 +516,8 @@ static void replaceFoldableUses(Instruction *Cond, Value *ToVal) { /// Return the cost of duplicating a piece of this block from first non-phi /// and before StopAt instruction to thread across it. Stop scanning the block /// when exceeding the threshold. If duplication is impossible, returns ~0U. -static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, +static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI, + BasicBlock *BB, Instruction *StopAt, unsigned Threshold) { assert(StopAt->getParent() == BB && "Not an instruction from proper BB?"); @@ -550,26 +553,21 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, if (Size > Threshold) return Size; - // Debugger intrinsics don't incur code size. - if (isa<DbgInfoIntrinsic>(I)) continue; - - // Pseudo-probes don't incur code size. - if (isa<PseudoProbeInst>(I)) - continue; - - // If this is a pointer->pointer bitcast, it is free. - if (isa<BitCastInst>(I) && I->getType()->isPointerTy()) - continue; - - // Freeze instruction is free, too. - if (isa<FreezeInst>(I)) - continue; - // Bail out if this instruction gives back a token type, it is not possible // to duplicate it if it is used outside this BB. if (I->getType()->isTokenTy() && I->isUsedOutsideOfBlock(BB)) return ~0U; + // Blocks with NoDuplicate are modelled as having infinite cost, so they + // are never duplicated. + if (const CallInst *CI = dyn_cast<CallInst>(I)) + if (CI->cannotDuplicate() || CI->isConvergent()) + return ~0U; + + if (TTI->getUserCost(&*I, TargetTransformInfo::TCK_SizeAndLatency) + == TargetTransformInfo::TCC_Free) + continue; + // All other instructions count for at least one unit. ++Size; @@ -578,11 +576,7 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, // as having cost of 2 total, and if they are a vector intrinsic, we model // them as having cost 1. 
if (const CallInst *CI = dyn_cast<CallInst>(I)) { - if (CI->cannotDuplicate() || CI->isConvergent()) - // Blocks with NoDuplicate are modelled as having infinite cost, so they - // are never duplicated. - return ~0U; - else if (!isa<IntrinsicInst>(CI)) + if (!isa<IntrinsicInst>(CI)) Size += 3; else if (!CI->getType()->isVectorTy()) Size += 1; @@ -2234,10 +2228,10 @@ bool JumpThreadingPass::maybethreadThroughTwoBasicBlocks(BasicBlock *BB, } // Compute the cost of duplicating BB and PredBB. - unsigned BBCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned BBCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); unsigned PredBBCost = getJumpThreadDuplicationCost( - PredBB, PredBB->getTerminator(), BBDupThreshold); + TTI, PredBB, PredBB->getTerminator(), BBDupThreshold); // Give up if costs are too high. We need to check BBCost and PredBBCost // individually before checking their sum because getJumpThreadDuplicationCost @@ -2345,8 +2339,8 @@ bool JumpThreadingPass::tryThreadEdge( return false; } - unsigned JumpThreadCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned JumpThreadCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); if (JumpThreadCost > BBDupThreshold) { LLVM_DEBUG(dbgs() << " Not threading BB '" << BB->getName() << "' - Cost is too high: " << JumpThreadCost << "\n"); @@ -2614,8 +2608,8 @@ bool JumpThreadingPass::duplicateCondBranchOnPHIIntoPred( return false; } - unsigned DuplicationCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned DuplicationCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); if (DuplicationCost > BBDupThreshold) { LLVM_DEBUG(dbgs() << " Not duplicating BB '" << BB->getName() << "' - Cost is too high: " << DuplicationCost << "\n"); @@ -3031,7 +3025,8 @@ bool JumpThreadingPass::threadGuard(BasicBlock *BB, IntrinsicInst *Guard, 
ValueToValueMapTy UnguardedMapping, GuardedMapping; Instruction *AfterGuard = Guard->getNextNode(); - unsigned Cost = getJumpThreadDuplicationCost(BB, AfterGuard, BBDupThreshold); + unsigned Cost = + getJumpThreadDuplicationCost(TTI, BB, AfterGuard, BBDupThreshold); if (Cost > BBDupThreshold) return false; // Duplicate all instructions before the guard and the guard itself to the diff --git a/llvm/test/Transforms/JumpThreading/free_instructions.ll b/llvm/test/Transforms/JumpThreading/free_instructions.ll index f768ec996779..76392af77d33 100644 --- a/llvm/test/Transforms/JumpThreading/free_instructions.ll +++ b/llvm/test/Transforms/JumpThreading/free_instructions.ll @@ -5,26 +5,28 @@ ; the jump threading threshold, as everything else are free instructions. define i32 @free_instructions(i1 %c, i32* %p) { ; CHECK-LABEL: @free_instructions( -; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]] -; CHECK: if: +; CHECK-NEXT: br i1 [[C:%.*]], label [[IF2:%.*]], label [[ELSE2:%.*]] +; CHECK: if2: ; CHECK-NEXT: store i32 -1, i32* [[P:%.*]], align 4 -; CHECK-NEXT: br label [[JOIN:%.*]] -; CHECK: else: -; CHECK-NEXT: store i32 -2, i32* [[P]], align 4 -; CHECK-NEXT: br label [[JOIN]] -; CHECK: join: ; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl(metadata [[META0:![0-9]+]]) ; CHECK-NEXT: store i32 1, i32* [[P]], align 4, !noalias !0 ; CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ] ; CHECK-NEXT: store i32 2, i32* [[P]], align 4 +; CHECK-NEXT: [[P21:%.*]] = bitcast i32* [[P]] to i8* +; CHECK-NEXT: [[P32:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P21]]) +; CHECK-NEXT: [[P43:%.*]] = bitcast i8* [[P32]] to i32* +; CHECK-NEXT: store i32 3, i32* [[P43]], align 4, !invariant.group !3 +; CHECK-NEXT: ret i32 0 +; CHECK: else2: +; CHECK-NEXT: store i32 -2, i32* [[P]], align 4 +; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl(metadata [[META4:![0-9]+]]) +; CHECK-NEXT: store i32 1, i32* [[P]], align 4, 
!noalias !4 +; CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ] +; CHECK-NEXT: store i32 2, i32* [[P]], align 4 ; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P]] to i8* ; CHECK-NEXT: [[P3:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P2]]) ; CHECK-NEXT: [[P4:%.*]] = bitcast i8* [[P3]] to i32* ; CHECK-NEXT: store i32 3, i32* [[P4]], align 4, !invariant.group !3 -; CHECK-NEXT: br i1 [[C]], label [[IF2:%.*]], label [[ELSE2:%.*]] -; CHECK: if2: -; CHECK-NEXT: ret i32 0 -; CHECK: else2: ; CHECK-NEXT: ret i32 1 ; br i1 %c, label %if, label %else diff --git a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll index 57014e856a09..f764a59dd8a2 100644 --- a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll +++ b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll @@ -32,13 +32,10 @@ define void @caller1(i1 %c, i64* align 1 %ptr) { ; ASSUMPTIONS-OFF-NEXT: br label [[COMMON_RET]] ; ; ASSUMPTIONS-ON-LABEL: @caller1( -; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE1:%.*]] -; ASSUMPTIONS-ON: false1: -; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR:%.*]], align 4 -; ASSUMPTIONS-ON-NEXT: br label [[COMMON_RET]] +; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE2:%.*]] ; ASSUMPTIONS-ON: common.ret: -; ASSUMPTIONS-ON-NEXT: [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE1]] ], [ 2, [[TMP0:%.*]] ] -; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 true) [ "align"(i64* [[PTR]], i64 8) ] +; ASSUMPTIONS-ON-NEXT: [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE2]] ], [ 2, [[TMP0:%.*]] ] +; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 true) [ "align"(i64* [[PTR:%.*]], i64 8) ] ; ASSUMPTIONS-ON-NEXT: store volatile i64 0, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 @@ 
-47,6 +44,9 @@ define void @caller1(i1 %c, i64* align 1 %ptr) { ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 [[DOTSINK]], i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: ret void +; ASSUMPTIONS-ON: false2: +; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR]], align 4 +; ASSUMPTIONS-ON-NEXT: br label [[COMMON_RET]] ; br i1 %c, label %true1, label %false1 </cut>
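To see why this commit can increase code size, it may help to model the duplication-cost change from the diff above in miniature. The sketch below is a toy Python model, not LLVM code: the `Inst` class and its `is_free` flag are hypothetical stand-ins for an LLVM instruction and for the target cost query returning `TCC_Free`, and the threshold early-exit, token-type and noduplicate/convergent checks are left out. Which intrinsics a real target reports as free is assumed here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    kind: str                # "call", "intrinsic", "bitcast_ptr", "freeze", "dbg", "other"
    is_free: bool = False    # hypothetical stand-in for a TCC_Free cost answer
    is_vector: bool = False  # whether an intrinsic returns a vector type

# Instruction kinds the old code skipped explicitly.
OLD_SKIPPED = {"dbg", "pseudo_probe", "bitcast_ptr", "freeze"}


def call_surcharge(inst: Inst) -> int:
    """Extra cost on top of the base unit, as in the diff: +3 for calls, +1 for scalar intrinsics."""
    if inst.kind == "call":
        return 3
    if inst.kind == "intrinsic" and not inst.is_vector:
        return 1
    return 0


def old_cost(block: list[Inst]) -> int:
    """Cost before the commit: only a fixed list of instruction kinds is free."""
    return sum(1 + call_surcharge(i) for i in block if i.kind not in OLD_SKIPPED)


def new_cost(block: list[Inst]) -> int:
    """Cost after the commit: anything the cost model calls free is skipped."""
    return sum(1 + call_surcharge(i) for i in block if not i.is_free)


# Modelled loosely on the `join` block of free_instructions.ll:
# three stores, three intrinsics assumed free, two pointer bitcasts.
join_block = [
    Inst("intrinsic", is_free=True),    # noalias.scope.decl
    Inst("other"),                      # store
    Inst("intrinsic", is_free=True),    # assume
    Inst("other"),                      # store
    Inst("bitcast_ptr", is_free=True),
    Inst("intrinsic", is_free=True),    # launder.invariant.group
    Inst("bitcast_ptr", is_free=True),
    Inst("other"),                      # store
]

print("old cost:", old_cost(join_block))
print("new cost:", new_cost(join_block))
```

In this model the block's cost drops from 9 to 3, so a block that used to exceed a given duplication threshold may now fall under it and get duplicated into its predecessors. More duplication means more code, which would be consistent with the size growth reported for 462.libquantum at -Os, though the report itself does not identify the affected blocks.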

4 years, 9 months


[TCWG CI] Regression caused by llvm: Revert "Allow rematerialization of virtual reg uses"

by ci_notify＠linaro.org

[TCWG CI] Regression caused by llvm: Revert "Allow rematerialization of virtual reg uses":
commit 08d7eec06e8cf5c15a96ce11f311f1480291a441
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>

    Revert "Allow rematerialization of virtual reg uses"

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
21880
# First few build errors in logs:
# 00:04:00 arch/arm/lib/xor-neon.c:30:2: error: This code requires at least version 4.6 of GCC [-Werror,-W#warnings]
# 00:04:00 make[1]: *** [scripts/Makefile.build:277: arch/arm/lib/xor-neon.o] Error 1
# 00:04:00 make: *** [Makefile:1868: arch/arm/lib] Error 2
# 00:05:21 crypto/wp512.c:782:13: error: stack frame size (1176) exceeds limit (1024) in function 'wp512_process_buffer' [-Werror,-Wframe-larger-than]
# 00:05:21 make[1]: *** [scripts/Makefile.build:277: crypto/wp512.o] Error 1
# 00:08:06 make: *** [Makefile:1868: crypto] Error 2
# 00:18:48 drivers/gpu/drm/selftests/test-drm_mm.c:372:12: error: stack frame size (1032) exceeds limit (1024) in function '__igt_reserve' [-Werror,-Wframe-larger-than]
# 00:18:49 make[4]: *** [scripts/Makefile.build:277: drivers/gpu/drm/selftests/test-drm_mm.o] Error 1
# 00:19:07 make[3]: *** [scripts/Makefile.build:540: drivers/gpu/drm/selftests] Error 2
# 00:30:18 drivers/firmware/tegra/bpmp-debugfs.c:357:16: error: stack frame size (1248) exceeds limit (1024) in function 'bpmp_debug_store' [-Werror,-Wframe-larger-than]

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
21881

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
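When triaging a report like this one, it can be handy to pull the stack-frame-size errors out of the log excerpt and see how far each function is over the limit. The sketch below assumes the log lines have exactly the shape quoted in the report; the regular expression and the helper name `frame_size_errors` are illustrative and not part of the TCWG scripts.

```python
import re

# Matches the clang diagnostic as it appears in the quoted log excerpt.
FRAME_RE = re.compile(
    r"(?P<file>\S+):(?P<line>\d+):(?P<col>\d+): error: stack frame size "
    r"\((?P<size>\d+)\) exceeds limit \((?P<limit>\d+)\) in function '(?P<func>[^']+)'"
)

# Log lines copied from the report above.
LOG = """\
# 00:04:00 arch/arm/lib/xor-neon.c:30:2: error: This code requires at least version 4.6 of GCC [-Werror,-W#warnings]
# 00:05:21 crypto/wp512.c:782:13: error: stack frame size (1176) exceeds limit (1024) in function 'wp512_process_buffer' [-Werror,-Wframe-larger-than]
# 00:18:48 drivers/gpu/drm/selftests/test-drm_mm.c:372:12: error: stack frame size (1032) exceeds limit (1024) in function '__igt_reserve' [-Werror,-Wframe-larger-than]
# 00:30:18 drivers/firmware/tegra/bpmp-debugfs.c:357:16: error: stack frame size (1248) exceeds limit (1024) in function 'bpmp_debug_store' [-Werror,-Wframe-larger-than]
"""


def frame_size_errors(log: str) -> list[tuple[str, int, int]]:
    """Return (function, frame size, limit) for every frame-size error found in `log`."""
    return [
        (m["func"], int(m["size"]), int(m["limit"]))
        for m in FRAME_RE.finditer(log)
    ]


errors = frame_size_errors(LOG)
for func, size, limit in errors:
    print(f"{func}: {size} bytes, {size - limit} over the {limit}-byte limit")
```

Note that the report only lists the first few errors of the failing build and does not say which of them is new relative to the last good build. The object count dropped by one (21881 to 21880), so a function that is only slightly over the limit, such as `__igt_reserve` at 8 bytes over, is a plausible candidate, but that is a guess rather than something the log confirms.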
This commit has regressed these CI configurations:
- tcwg_kernel/llvm-master-arm-mainline-allmodconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-mainline-…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-mainline-…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-mainline-…
Even more details: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-mainline-…

Reproduce builds:
<cut>
mkdir investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441
cd investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-mainline-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-mainline-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-mainline-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 08d7eec06e8cf5c15a96ce11f311f1480291a441
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 08d7eec06e8cf5c15a96ce11f311f1480291a441 Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> Date: Fri Sep 24 09:53:51 2021 -0700 Revert "Allow rematerialization of virtual reg uses" Reverted due to two distcint performance regression reports. This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39. --- llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +- llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +- llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 - llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +- llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +- llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +- .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +- llvm/test/CodeGen/ARM/neon-copy.ll | 10 +- llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +- llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +- llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +- llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +- llvm/test/CodeGen/Mips/tls.ll | 4 +- llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +- llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +- llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +- llvm/test/CodeGen/RISCV/mul.ll | 72 +- llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +- llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 270 +- llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +- llvm/test/CodeGen/RISCV/rv32zbp.ll | 262 +- llvm/test/CodeGen/RISCV/rv32zbt.ll | 206 +- .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 150 +- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3584 ++++++++++---------- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 664 ++-- llvm/test/CodeGen/RISCV/shifts.ll | 308 +- llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +- llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +- llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +- .../tail-pred-disabled-in-loloops.ll | 14 +- .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +- 
.../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +- llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +- llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +- llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +- llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 +-- llvm/test/CodeGen/X86/addcarry.ll | 20 +- llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +- llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +- .../X86/delete-dead-instrs-with-live-uses.mir | 4 +- llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +- llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +- llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +- llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +- 45 files changed, 4093 insertions(+), 4106 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h index a0c52e2f1a13..c394ac910be1 100644 --- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h +++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h @@ -117,11 +117,10 @@ public: const MachineFunction &MF) const; /// Return true if the instruction is trivially rematerializable, meaning it - /// has no side effects. Uses of constants and unallocatable physical - /// registers are always trivial to rematerialize so that the instructions - /// result is independent of the place in the function. Uses of virtual - /// registers are allowed but it is caller's responsility to ensure these - /// operands are valid at the point the instruction is beeing moved. + /// has no side effects and requires no operands that aren't always available. + /// This means the only allowed uses are constants and unallocatable physical + /// registers so that the instructions result is independent of the place + /// in the function. 
bool isTriviallyReMaterializable(const MachineInstr &MI, AAResults *AA = nullptr) const { return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF || @@ -141,7 +140,8 @@ protected: /// set, this hook lets the target specify whether the instruction is actually /// trivially rematerializable, taking into consideration its operands. This /// predicate must return false if the instruction has any side effects other - /// than producing a value. + /// than producing a value, or if it requres any address registers that are + /// not always available. /// Requirements must be check as stated in isTriviallyReMaterializable() . virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI, AAResults *AA) const { diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp index fe7d60e0b7e2..1eab8e7443a7 100644 --- a/llvm/lib/CodeGen/TargetInstrInfo.cpp +++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp @@ -921,8 +921,7 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( const MachineRegisterInfo &MRI = MF.getRegInfo(); // Remat clients assume operand 0 is the defined register. - if (!MI.getNumOperands() || !MI.getOperand(0).isReg() || - MI.getOperand(0).isTied()) + if (!MI.getNumOperands() || !MI.getOperand(0).isReg()) return false; Register DefReg = MI.getOperand(0).getReg(); @@ -984,6 +983,12 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( // same virtual register, though. if (MO.isDef() && Reg != DefReg) return false; + + // Don't allow any virtual-register uses. Rematting an instruction with + // virtual register uses would length the live ranges of the uses, which + // is not necessarily a good idea, certainly not "trivial". + if (MO.isUse()) + return false; } // Everything checked out. 
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir
index c9915aaabfde..ed799bfca028 100644
--- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir
+++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir
@@ -51,66 +51,6 @@ body: |
     S_NOP 0, implicit %2
     S_ENDPGM 0
 ...
-# The liverange of %0 covers a point of rematerialization, source value is
-# availabe.
----
-name: test_remat_s_mov_b32_vreg_src_long_lr
-tracksRegLiveness: true
-machineFunctionInfo:
-  stackPtrOffsetReg: $sgpr32
-body: |
-  bb.0:
-    ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr
-    ; GCN: renamable $sgpr0 = IMPLICIT_DEF
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr0
-    ; GCN: S_ENDPGM 0
-    %0:sreg_32 = IMPLICIT_DEF
-    %1:sreg_32 = S_MOV_B32 %0:sreg_32
-    %2:sreg_32 = S_MOV_B32 %0:sreg_32
-    %3:sreg_32 = S_MOV_B32 %0:sreg_32
-    S_NOP 0, implicit %1
-    S_NOP 0, implicit %2
-    S_NOP 0, implicit %3
-    S_NOP 0, implicit %0
-    S_ENDPGM 0
-...
-# The liverange of %0 does not cover a point of rematerialization, source value is
-# unavailabe and we do not want to artificially extend the liverange.
----
-name: test_no_remat_s_mov_b32_vreg_src_short_lr
-tracksRegLiveness: true
-machineFunctionInfo:
-  stackPtrOffsetReg: $sgpr32
-body: |
-  bb.0:
-    ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr
-    ; GCN: renamable $sgpr0 = IMPLICIT_DEF
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5)
-    ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
-    ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5)
-    ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0
-    ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5)
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5)
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr1
-    ; GCN: S_NOP 0, implicit killed renamable $sgpr0
-    ; GCN: S_ENDPGM 0
-    %0:sreg_32 = IMPLICIT_DEF
-    %1:sreg_32 = S_MOV_B32 %0:sreg_32
-    %2:sreg_32 = S_MOV_B32 %0:sreg_32
-    %3:sreg_32 = S_MOV_B32 %0:sreg_32
-    S_NOP 0, implicit %1
-    S_NOP 0, implicit %2
-    S_NOP 0, implicit %3
-    S_ENDPGM 0
-...
--- name: test_remat_s_mov_b64 tracksRegLiveness: true diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll index 175a2069a441..a4243276c70a 100644 --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; ENABLE-NEXT: pophs {r11, pc} ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader ; ENABLE-NEXT: movw r12, :lower16:skip -; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: sub r1, r1, #1 ; ENABLE-NEXT: movt r12, :upper16:skip ; ENABLE-NEXT: .LBB0_4: @ %while.body ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; ENABLE-NEXT: ldrb r1, [r0] -; ENABLE-NEXT: ldrb r1, [r12, r1] -; ENABLE-NEXT: add r0, r0, r1 -; ENABLE-NEXT: sub r1, r3, #1 -; ENABLE-NEXT: cmp r1, r3 +; ENABLE-NEXT: ldrb r3, [r0] +; ENABLE-NEXT: ldrb r3, [r12, r3] +; ENABLE-NEXT: add r0, r0, r3 +; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: cmp r3, r1 ; ENABLE-NEXT: bhs .LBB0_6 ; ENABLE-NEXT: @ %bb.5: @ %while.body ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; ENABLE-NEXT: cmp r0, r2 -; ENABLE-NEXT: mov r3, r1 +; ENABLE-NEXT: mov r1, r3 ; ENABLE-NEXT: blo .LBB0_4 ; ENABLE-NEXT: .LBB0_6: @ %if.end29 ; ENABLE-NEXT: pop {r11, pc} @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; DISABLE-NEXT: pophs {r11, pc} ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader ; DISABLE-NEXT: movw r12, :lower16:skip -; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: sub r1, r1, #1 ; DISABLE-NEXT: movt r12, :upper16:skip ; DISABLE-NEXT: .LBB0_4: @ %while.body ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; DISABLE-NEXT: ldrb r1, [r0] -; DISABLE-NEXT: ldrb r1, [r12, r1] -; DISABLE-NEXT: add r0, r0, r1 -; DISABLE-NEXT: sub r1, r3, #1 -; DISABLE-NEXT: cmp r1, r3 +; DISABLE-NEXT: ldrb r3, [r0] +; DISABLE-NEXT: ldrb r3, [r12, r3] +; DISABLE-NEXT: add r0, 
r0, r3 +; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: cmp r3, r1 ; DISABLE-NEXT: bhs .LBB0_6 ; DISABLE-NEXT: @ %bb.5: @ %while.body ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; DISABLE-NEXT: cmp r0, r2 -; DISABLE-NEXT: mov r3, r1 +; DISABLE-NEXT: mov r1, r3 ; DISABLE-NEXT: blo .LBB0_4 ; DISABLE-NEXT: .LBB0_6: @ %if.end29 ; DISABLE-NEXT: pop {r11, pc} diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll index ea15fcc5c824..55157875d355 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: push {r4, r5, r11, lr} ; SCALAR-NEXT: rsb r3, r2, #0 ; SCALAR-NEXT: and r4, r2, #63 -; SCALAR-NEXT: and r12, r3, #63 -; SCALAR-NEXT: rsb r3, r12, #32 +; SCALAR-NEXT: and lr, r3, #63 +; SCALAR-NEXT: rsb r3, lr, #32 ; SCALAR-NEXT: lsl r2, r0, r4 -; SCALAR-NEXT: lsr lr, r0, r12 -; SCALAR-NEXT: orr r3, lr, r1, lsl r3 -; SCALAR-NEXT: subs lr, r12, #32 -; SCALAR-NEXT: lsrpl r3, r1, lr +; SCALAR-NEXT: lsr r12, r0, lr +; SCALAR-NEXT: orr r3, r12, r1, lsl r3 +; SCALAR-NEXT: subs r12, lr, #32 +; SCALAR-NEXT: lsrpl r3, r1, r12 ; SCALAR-NEXT: subs r5, r4, #32 ; SCALAR-NEXT: movwpl r2, #0 ; SCALAR-NEXT: cmp r5, #0 @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: lsr r3, r0, r3 ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 ; SCALAR-NEXT: lslpl r3, r0, r5 -; SCALAR-NEXT: lsr r0, r1, r12 -; SCALAR-NEXT: cmp lr, #0 +; SCALAR-NEXT: lsr r0, r1, lr +; SCALAR-NEXT: cmp r12, #0 ; SCALAR-NEXT: movwpl r0, #0 ; SCALAR-NEXT: orr r1, r3, r0 ; SCALAR-NEXT: mov r0, r2 @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK: @ %bb.0: ; CHECK-NEXT: .save {r4, r5, r11, lr} ; CHECK-NEXT: push {r4, r5, r11, lr} -; CHECK-NEXT: and r12, r2, #63 +; CHECK-NEXT: and lr, r2, #63 ; CHECK-NEXT: rsb r2, r2, #0 -; CHECK-NEXT: rsb r3, r12, #32 +; CHECK-NEXT: rsb r3, lr, #32 ; CHECK-NEXT: and r4, r2, #63 -; CHECK-NEXT: lsr lr, 
r0, r12 -; CHECK-NEXT: orr r3, lr, r1, lsl r3 -; CHECK-NEXT: subs lr, r12, #32 +; CHECK-NEXT: lsr r12, r0, lr +; CHECK-NEXT: orr r3, r12, r1, lsl r3 +; CHECK-NEXT: subs r12, lr, #32 ; CHECK-NEXT: lsl r2, r0, r4 -; CHECK-NEXT: lsrpl r3, r1, lr +; CHECK-NEXT: lsrpl r3, r1, r12 ; CHECK-NEXT: subs r5, r4, #32 ; CHECK-NEXT: movwpl r2, #0 ; CHECK-NEXT: cmp r5, #0 @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK-NEXT: lsr r3, r0, r3 ; CHECK-NEXT: orr r3, r3, r1, lsl r4 ; CHECK-NEXT: lslpl r3, r0, r5 -; CHECK-NEXT: lsr r0, r1, r12 -; CHECK-NEXT: cmp lr, #0 +; CHECK-NEXT: lsr r0, r1, lr +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r0, #0 ; CHECK-NEXT: orr r1, r0, r3 ; CHECK-NEXT: mov r0, r2 diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll index 6372f9be2ca3..54c93b493c98 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { ; CHECK-NEXT: mov r3, #0 ; CHECK-NEXT: bl __aeabi_uldivmod ; CHECK-NEXT: add r0, r2, #27 -; CHECK-NEXT: lsl r2, r7, #27 -; CHECK-NEXT: and r12, r0, #63 ; CHECK-NEXT: lsl r6, r6, #27 +; CHECK-NEXT: and r1, r0, #63 +; CHECK-NEXT: lsl r2, r7, #27 ; CHECK-NEXT: orr r7, r6, r7, lsr #5 -; CHECK-NEXT: rsb r3, r12, #32 -; CHECK-NEXT: lsr r2, r2, r12 ; CHECK-NEXT: mov r6, #63 -; CHECK-NEXT: orr r2, r2, r7, lsl r3 -; CHECK-NEXT: subs r3, r12, #32 +; CHECK-NEXT: rsb r3, r1, #32 +; CHECK-NEXT: lsr r2, r2, r1 +; CHECK-NEXT: subs r12, r1, #32 ; CHECK-NEXT: bic r6, r6, r0 +; CHECK-NEXT: orr r2, r2, r7, lsl r3 ; CHECK-NEXT: lsl r5, r9, #1 -; CHECK-NEXT: lsrpl r2, r7, r3 -; CHECK-NEXT: subs r1, r6, #32 +; CHECK-NEXT: lsrpl r2, r7, r12 ; CHECK-NEXT: lsl r0, r5, r6 -; CHECK-NEXT: lsl r4, r8, #1 +; CHECK-NEXT: subs r4, r6, #32 +; CHECK-NEXT: lsl r3, r8, #1 ; CHECK-NEXT: movwpl r0, #0 -; CHECK-NEXT: orr r4, r4, r9, lsr #31 +; CHECK-NEXT: orr r3, r3, r9, lsr #31 ; CHECK-NEXT: orr r0, r0, r2 ; CHECK-NEXT: 
rsb r2, r6, #32 -; CHECK-NEXT: cmp r1, #0 +; CHECK-NEXT: cmp r4, #0 +; CHECK-NEXT: lsr r1, r7, r1 ; CHECK-NEXT: lsr r2, r5, r2 -; CHECK-NEXT: orr r2, r2, r4, lsl r6 -; CHECK-NEXT: lslpl r2, r5, r1 -; CHECK-NEXT: lsr r1, r7, r12 -; CHECK-NEXT: cmp r3, #0 +; CHECK-NEXT: orr r2, r2, r3, lsl r6 +; CHECK-NEXT: lslpl r2, r5, r4 +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r1, #0 ; CHECK-NEXT: orr r1, r2, r1 ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll index 0a0bb62b0a09..2922e0ed5423 100644 --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { ; BE-LABEL: i56_or: ; BE: @ %bb.0: ; BE-NEXT: mov r1, r0 +; BE-NEXT: ldr r12, [r0] ; BE-NEXT: ldrh r2, [r1, #4]! ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: orr r2, r3, r2, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: strb r12, [r1, #2] -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: strb r2, [r1, #2] +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr %aa = load i56, i56* %a @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: strb r2, [r1, #2] ; BE-NEXT: orr r2, r3, r12, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: ldr r12, [r0] +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; 
BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll index 46490efb6631..09a991da2e59 100644 --- a/llvm/test/CodeGen/ARM/neon-copy.ll +++ b/llvm/test/CodeGen/ARM/neon-copy.ll @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { ; CHECK-NEXT: .pad #8 ; CHECK-NEXT: sub sp, sp, #8 ; CHECK-NEXT: vmov.u16 r1, d0[1] -; CHECK-NEXT: and r12, r0, #3 +; CHECK-NEXT: and r0, r0, #3 ; CHECK-NEXT: vmov.u16 r2, d0[2] -; CHECK-NEXT: mov r0, sp -; CHECK-NEXT: vmov.u16 r3, d0[3] -; CHECK-NEXT: orr r0, r0, r12, lsl #1 +; CHECK-NEXT: mov r3, sp +; CHECK-NEXT: vmov.u16 r12, d0[3] +; CHECK-NEXT: orr r0, r3, r0, lsl #1 ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] ; CHECK-NEXT: vldr d0, [sp] ; CHECK-NEXT: vmov.16 d0[1], r1 ; CHECK-NEXT: vmov.16 d0[2], r2 -; CHECK-NEXT: vmov.16 d0[3], r3 +; CHECK-NEXT: vmov.16 d0[3], r12 ; CHECK-NEXT: add sp, sp, #8 ; CHECK-NEXT: bx lr %tmp = extractelement <8 x i16> %x, i32 0 diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll index a125446b27c3..8be7100d368b 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll @@ -766,85 +766,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: move $2, $6 -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 76($sp) -; MMR3-NEXT: srlv $3, $7, $16 -; MMR3-NEXT: not16 $6, $16 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill -; MMR3-NEXT: 
sll16 $2, $2, 1 -; MMR3-NEXT: sllv $2, $2, $6 -; MMR3-NEXT: li16 $6, 64 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: srlv $4, $4, $16 -; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill -; MMR3-NEXT: subu16 $7, $6, $16 +; MMR3-NEXT: srlv $4, $7, $16 +; MMR3-NEXT: not16 $3, $16 +; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: sll16 $2, $6, 1 +; MMR3-NEXT: sllv $3, $2, $3 +; MMR3-NEXT: li16 $2, 64 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: srlv $6, $6, $16 +; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: subu16 $7, $2, $16 ; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: andi16 $5, $7, 32 -; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $6, $16, 32 -; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $3, $9 +; MMR3-NEXT: andi16 $2, $7, 32 +; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $5, $16, 32 +; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $3, $17, $5 -; MMR3-NEXT: movn $2, $4, $6 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $4, $17, $4 -; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $6, 1 -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $3 -; MMR3-NEXT: addiu $3, $16, -64 -; MMR3-NEXT: srav $1, $6, $3 -; MMR3-NEXT: andi16 $3, $3, 32 -; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $3 -; MMR3-NEXT: sllv $3, $6, $7 -; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill -; MMR3-NEXT: not16 $3, $7 -; MMR3-NEXT: srl16 $4, $17, 1 -; MMR3-NEXT: srlv $3, $4, $3 +; MMR3-NEXT: movn $4, $17, $2 +; MMR3-NEXT: movn $3, $6, $5 +; MMR3-NEXT: addiu $2, $16, -64 +; MMR3-NEXT: lw 
$5, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $5, $5, $2 +; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $17, 1 +; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: not16 $5, $2 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $4 +; MMR3-NEXT: srav $1, $17, $2 +; MMR3-NEXT: andi16 $2, $2, 32 +; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $2 +; MMR3-NEXT: sllv $2, $17, $7 +; MMR3-NEXT: not16 $4, $7 +; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $6, $7, 1 +; MMR3-NEXT: srlv $6, $6, $4 ; MMR3-NEXT: sltiu $10, $16, 64 -; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $5, $3, $10 +; MMR3-NEXT: or16 $6, $2 +; MMR3-NEXT: srlv $2, $7, $16 +; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: sllv $3, $4, $3 ; MMR3-NEXT: or16 $3, $2 -; MMR3-NEXT: srlv $2, $17, $16 -; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: sllv $17, $7, $4 -; MMR3-NEXT: or16 $17, $2 -; MMR3-NEXT: srav $11, $6, $16 -; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $17, $11, $2 -; MMR3-NEXT: sra $2, $6, 31 +; MMR3-NEXT: srav $11, $17, $16 +; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $11, $4 +; MMR3-NEXT: sra $2, $17, 31 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: movn $4, $17, $10 -; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $9, $6 -; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $3 +; MMR3-NEXT: move $8, $2 +; MMR3-NEXT: movn $8, $3, $10 +; MMR3-NEXT: lw 
$3, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $6, $9, $3 +; MMR3-NEXT: li16 $3, 0 +; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $7, $3, $4 +; MMR3-NEXT: or16 $7, $6 ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload ; MMR3-NEXT: movn $1, $2, $3 ; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $3, $16 -; MMR3-NEXT: movn $11, $2, $6 +; MMR3-NEXT: movn $11, $2, $4 ; MMR3-NEXT: movn $2, $11, $10 -; MMR3-NEXT: move $3, $4 +; MMR3-NEXT: move $3, $8 ; MMR3-NEXT: move $4, $1 ; MMR3-NEXT: lwp $16, 40($sp) ; MMR3-NEXT: addiusp 48 @@ -858,80 +852,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 -; MMR6-NEXT: move $12, $7 +; MMR6-NEXT: move $1, $7 ; MMR6-NEXT: lw $3, 44($sp) ; MMR6-NEXT: li16 $2, 64 -; MMR6-NEXT: subu16 $16, $2, $3 -; MMR6-NEXT: sllv $1, $5, $16 -; MMR6-NEXT: andi16 $2, $16, 32 -; MMR6-NEXT: selnez $8, $1, $2 -; MMR6-NEXT: sllv $9, $4, $16 -; MMR6-NEXT: not16 $16, $16 -; MMR6-NEXT: srl16 $17, $5, 1 -; MMR6-NEXT: srlv $10, $17, $16 -; MMR6-NEXT: or $9, $9, $10 -; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $8, $8, $9 -; MMR6-NEXT: srlv $9, $7, $3 -; MMR6-NEXT: not16 $7, $3 -; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: subu16 $7, $2, $3 +; MMR6-NEXT: sllv $8, $5, $7 +; MMR6-NEXT: andi16 $2, $7, 32 +; MMR6-NEXT: selnez $9, $8, $2 +; MMR6-NEXT: sllv $10, $4, $7 +; MMR6-NEXT: not16 $7, $7 +; MMR6-NEXT: srl16 $16, $5, 1 +; MMR6-NEXT: srlv $7, $16, $7 +; MMR6-NEXT: or $7, $10, $7 +; MMR6-NEXT: seleqz $7, $7, $2 +; MMR6-NEXT: or $7, $9, $7 +; MMR6-NEXT: srlv $9, $1, $3 +; MMR6-NEXT: not16 $16, $3 +; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $17, $6, 1 -; MMR6-NEXT: sllv $10, $17, $7 +; MMR6-NEXT: sllv $10, $17, $16 ; MMR6-NEXT: or $9, $10, $9 ; MMR6-NEXT: andi16 $17, $3, 32 ; MMR6-NEXT: seleqz $9, $9, 
$17 ; MMR6-NEXT: srlv $10, $6, $3 ; MMR6-NEXT: selnez $11, $10, $17 ; MMR6-NEXT: seleqz $10, $10, $17 -; MMR6-NEXT: or $8, $10, $8 -; MMR6-NEXT: seleqz $1, $1, $2 -; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: or $10, $10, $7 +; MMR6-NEXT: seleqz $12, $8, $2 +; MMR6-NEXT: or $8, $11, $9 ; MMR6-NEXT: addiu $2, $3, -64 -; MMR6-NEXT: srlv $10, $5, $2 +; MMR6-NEXT: srlv $9, $5, $2 ; MMR6-NEXT: sll16 $7, $4, 1 ; MMR6-NEXT: not16 $16, $2 ; MMR6-NEXT: sllv $11, $7, $16 ; MMR6-NEXT: sltiu $13, $3, 64 -; MMR6-NEXT: or $1, $9, $1 -; MMR6-NEXT: selnez $8, $8, $13 -; MMR6-NEXT: or $9, $11, $10 -; MMR6-NEXT: srav $10, $4, $2 +; MMR6-NEXT: or $8, $8, $12 +; MMR6-NEXT: selnez $10, $10, $13 +; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: srav $11, $4, $2 ; MMR6-NEXT: andi16 $2, $2, 32 -; MMR6-NEXT: seleqz $11, $10, $2 +; MMR6-NEXT: seleqz $12, $11, $2 ; MMR6-NEXT: sra $14, $4, 31 ; MMR6-NEXT: selnez $15, $14, $2 ; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $11, $15, $11 -; MMR6-NEXT: seleqz $11, $11, $13 -; MMR6-NEXT: selnez $2, $10, $2 -; MMR6-NEXT: seleqz $10, $14, $13 -; MMR6-NEXT: or $8, $8, $11 -; MMR6-NEXT: selnez $8, $8, $3 -; MMR6-NEXT: selnez $1, $1, $13 +; MMR6-NEXT: or $12, $15, $12 +; MMR6-NEXT: seleqz $12, $12, $13 +; MMR6-NEXT: selnez $2, $11, $2 +; MMR6-NEXT: seleqz $11, $14, $13 +; MMR6-NEXT: or $10, $10, $12 +; MMR6-NEXT: selnez $10, $10, $3 +; MMR6-NEXT: selnez $8, $8, $13 ; MMR6-NEXT: or $2, $2, $9 ; MMR6-NEXT: srav $9, $4, $3 ; MMR6-NEXT: seleqz $4, $9, $17 -; MMR6-NEXT: selnez $11, $14, $17 -; MMR6-NEXT: or $4, $11, $4 -; MMR6-NEXT: selnez $11, $4, $13 +; MMR6-NEXT: selnez $12, $14, $17 +; MMR6-NEXT: or $4, $12, $4 +; MMR6-NEXT: selnez $12, $4, $13 ; MMR6-NEXT: seleqz $2, $2, $13 ; MMR6-NEXT: seleqz $4, $6, $3 -; MMR6-NEXT: seleqz $6, $12, $3 +; MMR6-NEXT: seleqz $1, $1, $3 +; MMR6-NEXT: or $2, $8, $2 +; MMR6-NEXT: selnez $2, $2, $3 ; MMR6-NEXT: or $1, $1, $2 -; MMR6-NEXT: selnez $1, $1, $3 -; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: or $4, $4, $8 -; MMR6-NEXT: or $6, 
$11, $10 -; MMR6-NEXT: srlv $2, $5, $3 -; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $3, $7, $3 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: seleqz $2, $2, $17 -; MMR6-NEXT: selnez $3, $9, $17 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: selnez $2, $2, $13 -; MMR6-NEXT: or $3, $2, $10 -; MMR6-NEXT: move $2, $6 +; MMR6-NEXT: or $4, $4, $10 +; MMR6-NEXT: or $2, $12, $11 +; MMR6-NEXT: srlv $3, $5, $3 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $5, $7, $5 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: seleqz $3, $3, $17 +; MMR6-NEXT: selnez $5, $9, $17 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: selnez $3, $3, $13 +; MMR6-NEXT: or $3, $3, $11 ; MMR6-NEXT: move $5, $1 ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll index e4b4b3ae1d0f..ed2bfc9fcf60 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll @@ -776,77 +776,76 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 68($sp) ; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: subu16 $17, $2, $16 -; MMR3-NEXT: sllv $9, $5, $17 -; MMR3-NEXT: andi16 $3, $17, 32 +; MMR3-NEXT: subu16 $7, $2, $16 +; MMR3-NEXT: sllv $9, $5, $7 +; MMR3-NEXT: move $17, $5 +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $3, $7, 32 ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill ; MMR3-NEXT: li16 $2, 0 ; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: movn $4, $2, $3 -; MMR3-NEXT: srlv $5, $7, $16 +; MMR3-NEXT: srlv $5, $8, $16 ; MMR3-NEXT: not16 $3, $16 ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill ; MMR3-NEXT: sll16 $2, $6, 1 -; MMR3-NEXT: 
sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sllv $2, $2, $3 ; MMR3-NEXT: or16 $2, $5 -; MMR3-NEXT: srlv $7, $6, $16 +; MMR3-NEXT: srlv $5, $6, $16 +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill ; MMR3-NEXT: andi16 $3, $16, 32 ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $2, $7, $3 +; MMR3-NEXT: movn $2, $5, $3 ; MMR3-NEXT: addiu $3, $16, -64 ; MMR3-NEXT: or16 $2, $4 -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $3, $6, $3 -; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $3, 1 -; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $4 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: srlv $1, $3, $4 -; MMR3-NEXT: andi16 $4, $4, 32 +; MMR3-NEXT: srlv $4, $17, $3 ; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $4 +; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $4, 1 +; MMR3-NEXT: not16 $5, $3 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $17 +; MMR3-NEXT: srlv $1, $4, $3 +; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $3 ; MMR3-NEXT: sltiu $10, $16, 64 ; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: sllv $2, $3, $17 -; MMR3-NEXT: not16 $3, $17 -; MMR3-NEXT: srl16 $4, $6, 1 +; MMR3-NEXT: sllv $2, $4, $7 +; MMR3-NEXT: not16 $3, $7 +; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $4, $7, 1 ; MMR3-NEXT: srlv $4, $4, $3 ; MMR3-NEXT: or16 $4, $2 -; MMR3-NEXT: srlv $2, $6, $16 +; MMR3-NEXT: srlv $2, $7, $16 ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload ; MMR3-NEXT: sllv $3, $6, $3 ; MMR3-NEXT: or16 $3, $2 ; MMR3-NEXT: lw $2, 28($sp) # 4-byte 
Folded Reload ; MMR3-NEXT: srlv $2, $2, $16 -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $2, $6 +; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $2, $17 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movz $3, $17, $10 -; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $4, $9, $17 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $4 +; MMR3-NEXT: li16 $6, 0 +; MMR3-NEXT: movz $3, $6, $10 +; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $4, $9, $7 +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $6, $7, $17 +; MMR3-NEXT: or16 $6, $4 ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $1, $17, $4 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $1, $7, $10 +; MMR3-NEXT: movn $1, $7, $4 +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $1, $6, $10 ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $4, $16 -; MMR3-NEXT: movn $2, $17, $6 +; MMR3-NEXT: movn $2, $7, $17 ; MMR3-NEXT: li16 $4, 0 ; MMR3-NEXT: movz $2, $4, $10 ; MMR3-NEXT: move $4, $1 @@ -856,91 +855,98 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; ; MMR6-LABEL: lshr_i128: ; MMR6: # %bb.0: # %entry -; MMR6-NEXT: addiu $sp, $sp, -24 -; MMR6-NEXT: .cfi_def_cfa_offset 24 -; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill -; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill +; MMR6-NEXT: addiu $sp, $sp, -32 +; MMR6-NEXT: .cfi_def_cfa_offset 32 +; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill +; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 ; MMR6-NEXT: move $1, $7 -; MMR6-NEXT: move $7, $4 -; MMR6-NEXT: lw $3, 52($sp) +; MMR6-NEXT: move $7, $5 +; MMR6-NEXT: lw $3, 60($sp) ; MMR6-NEXT: srlv $2, $1, $3 -; MMR6-NEXT: not16 $16, $3 -; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill -; MMR6-NEXT: move 
$4, $6 -; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $5, $3 +; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: move $17, $6 +; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $6, $6, 1 -; MMR6-NEXT: sllv $6, $6, $16 +; MMR6-NEXT: sllv $6, $6, $5 ; MMR6-NEXT: or $8, $6, $2 -; MMR6-NEXT: addiu $6, $3, -64 -; MMR6-NEXT: srlv $9, $5, $6 -; MMR6-NEXT: sll16 $2, $7, 1 -; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill -; MMR6-NEXT: not16 $16, $6 +; MMR6-NEXT: addiu $5, $3, -64 +; MMR6-NEXT: srlv $9, $7, $5 +; MMR6-NEXT: move $6, $4 +; MMR6-NEXT: sll16 $2, $4, 1 +; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $5 ; MMR6-NEXT: sllv $10, $2, $16 ; MMR6-NEXT: andi16 $16, $3, 32 ; MMR6-NEXT: seleqz $8, $8, $16 ; MMR6-NEXT: or $9, $10, $9 -; MMR6-NEXT: srlv $10, $4, $3 +; MMR6-NEXT: srlv $10, $17, $3 ; MMR6-NEXT: selnez $11, $10, $16 ; MMR6-NEXT: li16 $17, 64 ; MMR6-NEXT: subu16 $2, $17, $3 -; MMR6-NEXT: sllv $12, $5, $2 +; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: move $17, $7 ; MMR6-NEXT: andi16 $4, $2, 32 -; MMR6-NEXT: andi16 $17, $6, 32 -; MMR6-NEXT: seleqz $9, $9, $17 +; MMR6-NEXT: andi16 $7, $5, 32 +; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill +; MMR6-NEXT: seleqz $9, $9, $7 ; MMR6-NEXT: seleqz $13, $12, $4 ; MMR6-NEXT: or $8, $11, $8 ; MMR6-NEXT: selnez $11, $12, $4 -; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: sllv $12, $6, $2 +; MMR6-NEXT: move $7, $6 +; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: not16 $2, $2 -; MMR6-NEXT: srl16 $6, $5, 1 +; MMR6-NEXT: srl16 $6, $17, 1 ; MMR6-NEXT: srlv $2, $6, $2 ; MMR6-NEXT: or $2, $12, $2 ; MMR6-NEXT: seleqz $2, $2, $4 -; MMR6-NEXT: addiu $4, $3, -64 -; MMR6-NEXT: srlv $4, $7, $4 -; MMR6-NEXT: or $12, $11, $2 -; MMR6-NEXT: or $6, $8, $13 -; MMR6-NEXT: srlv $5, $5, $3 -; MMR6-NEXT: selnez $8, $4, $17 -; MMR6-NEXT: sltiu $11, $3, 64 -; MMR6-NEXT: selnez $13, $6, $11 -; MMR6-NEXT: or $8, $8, $9 +; MMR6-NEXT: srlv $4, $7, $5 +; 
MMR6-NEXT: or $11, $11, $2 +; MMR6-NEXT: or $5, $8, $13 +; MMR6-NEXT: srlv $6, $17, $3 +; MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: selnez $7, $4, $2 +; MMR6-NEXT: sltiu $8, $3, 64 +; MMR6-NEXT: selnez $12, $5, $8 +; MMR6-NEXT: or $7, $7, $9 +; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $9, $6, $2 +; MMR6-NEXT: sllv $9, $2, $5 ; MMR6-NEXT: seleqz $10, $10, $16 -; MMR6-NEXT: li16 $2, 0 -; MMR6-NEXT: or $10, $10, $12 -; MMR6-NEXT: or $9, $9, $5 -; MMR6-NEXT: seleqz $5, $8, $11 -; MMR6-NEXT: seleqz $8, $2, $11 -; MMR6-NEXT: srlv $7, $7, $3 -; MMR6-NEXT: seleqz $2, $7, $16 -; MMR6-NEXT: selnez $2, $2, $11 +; MMR6-NEXT: li16 $5, 0 +; MMR6-NEXT: or $10, $10, $11 +; MMR6-NEXT: or $6, $9, $6 +; MMR6-NEXT: seleqz $2, $7, $8 +; MMR6-NEXT: seleqz $7, $5, $8 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: srlv $9, $5, $3 +; MMR6-NEXT: seleqz $11, $9, $16 +; MMR6-NEXT: selnez $11, $11, $8 ; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $5, $13, $5 -; MMR6-NEXT: selnez $5, $5, $3 -; MMR6-NEXT: or $5, $1, $5 -; MMR6-NEXT: or $2, $8, $2 -; MMR6-NEXT: seleqz $1, $9, $16 -; MMR6-NEXT: selnez $6, $7, $16 -; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $7, $7, $3 -; MMR6-NEXT: selnez $9, $10, $11 -; MMR6-NEXT: seleqz $4, $4, $17 -; MMR6-NEXT: seleqz $4, $4, $11 -; MMR6-NEXT: or $4, $9, $4 +; MMR6-NEXT: or $2, $12, $2 +; MMR6-NEXT: selnez $2, $2, $3 +; MMR6-NEXT: or $5, $1, $2 +; MMR6-NEXT: or $2, $7, $11 +; MMR6-NEXT: seleqz $1, $6, $16 +; MMR6-NEXT: selnez $6, $9, $16 +; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $9, $16, $3 +; MMR6-NEXT: selnez $10, $10, $8 +; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $4, $4, $16 +; MMR6-NEXT: seleqz $4, $4, $8 +; MMR6-NEXT: or $4, $10, $4 ; MMR6-NEXT: selnez $3, $4, $3 -; MMR6-NEXT: or $4, $7, $3 +; 
MMR6-NEXT: or $4, $9, $3 ; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: selnez $1, $1, $11 -; MMR6-NEXT: or $3, $8, $1 -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: addiu $sp, $sp, 24 +; MMR6-NEXT: selnez $1, $1, $8 +; MMR6-NEXT: or $3, $7, $1 +; MMR6-NEXT: lw $16, 24($sp) # 4-byte Folded Reload +; MMR6-NEXT: lw $17, 28($sp) # 4-byte Folded Reload +; MMR6-NEXT: addiu $sp, $sp, 32 ; MMR6-NEXT: jrc $ra </cut>
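
Note on the TargetInstrInfo.cpp hunks in the commit above: the patch drops the isTied() check on operand 0 and adds an early "return false" for any virtual-register use. The following is a minimal sketch of that operand walk, not LLVM code. ToyOperand, ToyInstr, the helper constructors, and toy_is_trivially_rematerializable are hypothetical stand-ins invented for illustration, and the sketch assumes physical-register, memory, and side-effect checks (which the real function also performs) can be ignored.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a machine operand. NOT llvm::MachineOperand.
struct ToyOperand {
    bool is_reg;      // register operand (false: immediate or similar)
    bool is_virtual;  // virtual register (only meaningful when is_reg)
    bool is_def;      // written by the instruction (false: a use)
    unsigned reg;     // register number
};

// Toy stand-in for a machine instruction. NOT llvm::MachineInstr.
struct ToyInstr {
    std::vector<ToyOperand> operands;  // operand 0 is expected to be the def
};

// Hypothetical helpers to keep the examples readable.
inline ToyOperand vreg_def(unsigned r) { return {true, true, true, r}; }
inline ToyOperand vreg_use(unsigned r) { return {true, true, false, r}; }
inline ToyOperand imm() { return {false, false, false, 0}; }

// Sketch of the post-patch operand checks:
//  - operand 0 must exist and be a register (it is taken as the def);
//  - a virtual-register def of any other register disqualifies;
//  - any virtual-register use disqualifies (the check the patch adds).
// Non-register and physical-register operands are skipped here, which is
// a simplification of this toy model only.
inline bool toy_is_trivially_rematerializable(const ToyInstr &mi) {
    if (mi.operands.empty() || !mi.operands[0].is_reg)
        return false;
    const unsigned def_reg = mi.operands[0].reg;

    for (const ToyOperand &mo : mi.operands) {
        if (!mo.is_reg || !mo.is_virtual)
            continue;
        if (mo.is_def && mo.reg != def_reg)
            return false;
        if (!mo.is_def)  // virtual-register use
            return false;
    }
    return true;
}
```

In this model, ToyInstr{{vreg_def(1), imm()}} (a move of an immediate) still qualifies, while ToyInstr{{vreg_def(1), vreg_use(0)}} (the shape of "%1 = S_MOV_B32 %0") does not. That is consistent with the commit removing the two vreg-source test cases from remat-sop.mir.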

4 years, 9 months


[TCWG CI] Regression caused by binutils: [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp:
commit b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
Author: Tom de Vries <tdevries(a)suse.de>

    [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_VECT_mthumb artifacts/build-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1/results_id: 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations: - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-master-arm_eabi-coremark-O3_VECT First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… Reproduce builds: <cut> mkdir investigate-binutils-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1 cd investigate-binutils-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/ cd binutils # Reproduce first_bad build git checkout --detach b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 3814a9e1fe77c01c7e872c25afa198537d4ac780 ../artifacts/test.sh cd .. 
</cut> Full commit (up to 1000 lines): <cut> commit b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1 Author: Tom de Vries <tdevries(a)suse.de> Date: Fri Sep 24 12:39:14 2021 +0200 [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp When interpreting the testsuite results, it's often relevant what kind of machine the testsuite ran on. On a local machine one can just do /proc/cpuinfo, but in case of running tests using a remote system that distributes test runs to other remote systems that are not directly accessible, that's not possible. Fix this by dumping /proc/cpuinfo into the gdb.log, as well as lsb_release -a and uname -a. We could do this at the start of each test run, by putting it into unix.exp or some such. However, this might be too verbose, so we choose to put it into its own test-case, such that it get triggered in a full testrun, but not when running one or a subset of tests. We put the test-case into the gdb.testsuite directory, which is currently the only place in the testsuite where we do not test gdb. [ Though perhaps this could be put into a new gdb.info directory, since the test-case doesn't actually test the testsuite. ] Tested on x86_64-linux. --- gdb/testsuite/gdb.testsuite/dump-system-info.exp | 48 ++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/gdb/testsuite/gdb.testsuite/dump-system-info.exp b/gdb/testsuite/gdb.testsuite/dump-system-info.exp new file mode 100644 index 00000000000..bf181469bd5 --- /dev/null +++ b/gdb/testsuite/gdb.testsuite/dump-system-info.exp @@ -0,0 +1,48 @@ +# Copyright 2021 Free Software Foundation, Inc. +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# The purpose of this test-case is to dump /proc/cpuinfo and similar system +# info into gdb.log. + +# Check if /proc/cpuinfo is available. +set res [remote_exec target "test -r /proc/cpuinfo"] +set status [lindex $res 0] +set output [lindex $res 1] + +if { $status == 0 && $output == "" } { + verbose -log "Cpuinfo available, dumping:" + remote_exec target "cat /proc/cpuinfo" +} else { + verbose -log "Cpuinfo not available" +} + +set res [remote_exec target "lsb_release -a"] +set status [lindex $res 0] +set output [lindex $res 1] + +if { $status == 0 } { + verbose -log "lsb_release -a availabe, dumping:\n$output" +} else { + verbose -log "lsb_release -a not available" +} + +set res [remote_exec target "uname -a"] +set status [lindex $res 0] +set output [lindex $res 1] + +if { $status == 0 } { + verbose -log "uname -a availabe, dumping:\n$output" +} else { + verbose -log "uname -a not available" +} </cut>
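The commit message above describes dumping /proc/cpuinfo, lsb_release -a and uname -a into gdb.log. For a quick local check of what such a dump would contain, the same three probes can be run from a plain shell. This is only a sketch and not part of the commit: the function name dump_system_info is made up here, and it runs the commands on the local machine instead of going through DejaGnu's remote_exec.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): print the same system
# information that dump-system-info.exp writes into gdb.log.
dump_system_info() {
    # /proc/cpuinfo only exists on Linux-like systems, so test readability first.
    if [ -r /proc/cpuinfo ]; then
        echo "Cpuinfo available, dumping:"
        cat /proc/cpuinfo
    else
        echo "Cpuinfo not available"
    fi

    # lsb_release is optional on many distributions.
    if out=$(lsb_release -a 2>/dev/null); then
        printf 'lsb_release -a available, dumping:\n%s\n' "$out"
    else
        echo "lsb_release -a not available"
    fi

    if out=$(uname -a 2>/dev/null); then
        printf 'uname -a available, dumping:\n%s\n' "$out"
    else
        echo "uname -a not available"
    fi
}

dump_system_info
```

Each probe reports either its output or a "not available" line, so the result always contains one entry per probe regardless of the host.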

4 years, 9 months


[TCWG CI] 464.h264ref slowed down by 3% after llvm: Fix test from 8dd42f, capitalization in test

by ci_notify＠linaro.org

After llvm commit e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
Author: Erich Keane <erich.keane(a)intel.com>

    Fix test from 8dd42f, capitalization in test

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 3% from 10973 to 11249 perf samples
- 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 12% from 1446 to 1619 perf samples

The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
cd investigate-llvm-e8e2edd8ca88f8b0a7dba141349b2aa83284f3af

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 77d200a546136c2855063613ff4bca1f682fb23a
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit e8e2edd8ca88f8b0a7dba141349b2aa83284f3af Author: Erich Keane <erich.keane(a)intel.com> Date: Fri Sep 24 10:24:17 2021 -0700 Fix test from 8dd42f, capitalization in test --- clang/test/CXX/drs/dr17xx.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clang/test/CXX/drs/dr17xx.cpp b/clang/test/CXX/drs/dr17xx.cpp index 42303c83ae3c..c8648908ebda 100644 --- a/clang/test/CXX/drs/dr17xx.cpp +++ b/clang/test/CXX/drs/dr17xx.cpp @@ -129,7 +129,7 @@ namespace dr1778 { // dr1778: 9 namespace dr1762 { // dr1762: 14 #if __cplusplus >= 201103L float operator ""_E(const char *); - // expected-error@+2 {{invalid suffix on literal; c++11 requires a space between literal and identifier}} + // expected-error@+2 {{invalid suffix on literal; C++11 requires a space between literal and identifier}} // expected-warning@+1 {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}} float operator ""E(const char *); #endif </cut>
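The slowdown percentages quoted in this report can be re-derived from the perf sample counts it lists. A minimal sketch, assuming the CI rounds to the nearest whole percent (the report does not say how it rounds, and the helper name slowdown_pct is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: percentage slowdown between two perf sample counts,
# rounded to the nearest whole percent using integer arithmetic only.
# Rounding to nearest is an assumption; it is consistent with the figures
# in the report but is not documented there.
slowdown_pct() {
    old=$1
    new=$2
    echo $(( ((new - old) * 200 + old) / (2 * old) ))
}

slowdown_pct 10973 11249   # 464.h264ref as a whole
slowdown_pct 1446 1619     # FastFullPelBlockMotionSearch
```

With the sample counts from the report, this prints 3 and 12, matching the quoted 3% and 12%.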

4 years, 9 months


Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

by Maxim Kuvyrkov

Thanks, Stanislav,

FWIW, it will probably be easier for you to just rebuild the compiler; it is an x86_64-linux-gnu -> arm-linux-gnueabihf cross. This link has the build log [1].

cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=ARM

Then compile the pre-processed source with plain -O2 or -O3 optimisation settings.

[1] https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 24 Sep 2021, at 20:30, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
>
> [AMD Official Use Only]
>
> I have reverted the whole change. There was yet another perf regression report.
>
> Stas
>
> From: Mekhanoshin, Stanislav
> Sent: Thursday, September 23, 2021 11:48
> To: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
>
> Thanks. I see the reload. There shall not be extra pressure since that is the whole idea, make pressure less. However, I see more spills in that specific file, fast_algorithms.s if I get it right.
> Can I get the IR for it? Something to feed llc.
>
> Stas
>
> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> Sent: Thursday, September 23, 2021 2:31
> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
>
> [CAUTION: External Email]
>
> Thanks, Stanislav.
>
> I’ve looked into profile dumps, and 456.hmmer’s hot loop gets several additional reloads. E.g., "ldr r1, [sp, #84]" generates 203 additional samples, which translates into 20 seconds of time just for that one instruction.
>
> See the attached profile dumps and the screenshot with the hot loop highlighted.
>
> Maybe your patch increases register pressure too much?
>
> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
> > On 22 Sep 2021, at 22:35, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >
> > [AMD Official Use Only]
> >
> > There are actually a couple of things worth trying if that is easy:
> >
> > https://reviews.llvm.org/D109077
> > https://reviews.llvm.org/differential/diff/374324/
> >
> > Both may slightly change spill weights and then spilling pattern.
> >
> > Stas
> >
> > -----Original Message-----
> > From: Mekhanoshin, Stanislav
> > Sent: Wednesday, September 22, 2021 12:09
> > To: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> > Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> > Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >
> > I assume some of the newly rematerialized instructions caused perf drops. Probably some very specific ones. I would appreciate if you could point them to me.
> > In addition I believe I would need to have a linked or optimized bitcode to feed into llc.
> >
> > Stas
> >
> > -----Original Message-----
> > From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> > Sent: Wednesday, September 22, 2021 12:06
> > To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> > Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> > Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >
> > [CAUTION: External Email]
> >
> > Hi Stanislav,
> >
> > That's fair; I or someone from Linaro will try to analyze this and follow up here.
> >
> > On a more general note, what info would you like to see in these benchmarking regression reports?
> >
> > Thanks,
> >
> > --
> > Maxim Kuvyrkov
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >
> >
> >> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >>
> >> [AMD Official Use Only]
> >>
> >> Hm... I'd really like to help, but I do not think I can do anything with megabytes of code in an asm which I do not understand and tons of differences in 48 asm files.
> >> What I can see there is overall less spilling code which was the intent in the first place: hmmer has 4 less spill opcodes overall and sphinx has 27 less of them.
> >> I doubt I could say much more without someone pointing to the actual root cause.
> >>
> >> Stas
> >>
> >> -----Original Message-----
> >> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> >> Sent: Wednesday, September 22, 2021 5:16
> >> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> >> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> >> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >>
> >> [CAUTION: External Email]
> >>
> >> Hi Stanislav,
> >>
> >> Attached is a tarball with -save-temps output (pre-processed source and generated assembly) for first-bad run (your commit) and last-good run (immediate parent of your commit).
> >>
> >> --
> >> Maxim Kuvyrkov
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >>
> >>> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >>>
> >>> [AMD Official Use Only]
> >>>
> >>> Thanks for letting me know. Some regressions are inevitable, however do you happen to have any analysis and dumps? I myself do not understand ARM ISA well...
> >>>
> >>> Stas
> >>>
> >>> -----Original Message-----
> >>> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> >>> Sent: Wednesday, September 15, 2021 5:52
> >>> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> >>> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> >>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >>>
> >>> [CAUTION: External Email]
> >>>
> >>> Hi Stanislav,
> >>>
> >>> FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit ARM at -O2 and -O3 optimization levels.
> >>>
> >>> --
> >>> Maxim Kuvyrkov
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >>>
> >>
> >>
> >
> <image001.png>
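The thread compares spill and reload counts between the first_bad and last_good assembly, with "ldr r1, [sp, #84]" named as one hot reload. A rough way to compare two -save-temps assembly files is to count sp-relative loads in each. This is a sketch under stated assumptions, not the method used in the thread: the helper name count_sp_loads and the directory names in the usage comment are made up, and the pattern only covers single-register ldr forms.

```shell
#!/bin/sh
# Hypothetical helper: count sp-relative loads such as "ldr r1, [sp, #84]"
# in an ARM assembly file. Such loads are only a rough proxy for reloads:
# ordinary stack accesses (locals, arguments) match the pattern as well,
# and two-register forms such as ldrd are not counted.
count_sp_loads() {
    grep -Ec '^[[:space:]]*ldr[a-z.]*[[:space:]]+[a-z0-9]+,[[:space:]]*\[sp' "$1"
}

# Usage sketch; the directory names are assumptions about where the
# save-temps tarballs were unpacked:
#   count_sp_loads last_good/fast_algorithms.s
#   count_sp_loads first_bad/fast_algorithms.s
```

A static count like this says nothing about how hot each load is, so it complements the perf profile rather than replacing it.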

4 years, 9 months


[TCWG CI] Regression caused by binutils: Automatic date update in version.in

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: Automatic date update in version.in:
commit fcc561a54de2beb19cb325094fbd3ec76f96e520
Author: GDB Administrator <gdbadmin(a)sourceware.org>

    Automatic date update in version.in

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-fcc561a54de2beb19cb325094fbd3ec76f96e520/results_id:
1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id:
1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-binutils-fcc561a54de2beb19cb325094fbd3ec76f96e520
cd investigate-binutils-fcc561a54de2beb19cb325094fbd3ec76f96e520

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach fcc561a54de2beb19cb325094fbd3ec76f96e520
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 27439f0edab99c6870cf7fe042074e47632f3fbd
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit fcc561a54de2beb19cb325094fbd3ec76f96e520 Author: GDB Administrator <gdbadmin(a)sourceware.org> Date: Wed Sep 22 00:00:31 2021 +0000 Automatic date update in version.in --- bfd/version.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bfd/version.h b/bfd/version.h index 338c1288a22..c45d963473c 100644 --- a/bfd/version.h +++ b/bfd/version.h @@ -16,7 +16,7 @@ In releases, the date is not included in either version strings or sonames. */ -#define BFD_VERSION_DATE 20210921 +#define BFD_VERSION_DATE 20210922 #define BFD_VERSION @bfd_version@ #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@ #define REPORT_BUGS_TO @report_bugs_to@ </cut>

4 years, 9 months


[TCWG CI] Regression caused by gcc: Daily bump.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Daily bump.:
commit d4b84aefe696a5783a58a30b3fb8dc4617cd147a
Author: GCC Administrator <gccadmin(a)gcc.gnu.org>

    Daily bump.

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-d4b84aefe696a5783a58a30b3fb8dc4617cd147a/results_id:
1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id:
1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-gcc-d4b84aefe696a5783a58a30b3fb8dc4617cd147a
cd investigate-gcc-d4b84aefe696a5783a58a30b3fb8dc4617cd147a

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach d4b84aefe696a5783a58a30b3fb8dc4617cd147a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b1dc26d3543d79805751c26ba5b142eeeb1f55b8
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit d4b84aefe696a5783a58a30b3fb8dc4617cd147a
Author: GCC Administrator <gccadmin(a)gcc.gnu.org>
Date: Tue Sep 21 00:17:57 2021 +0000

    Daily bump.
--- gcc/DATESTAMP | 2 +- gcc/fortran/ChangeLog | 5 +++++ gcc/testsuite/ChangeLog | 4 ++++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/gcc/DATESTAMP b/gcc/DATESTAMP index c1155ef2341..ed865cb70ab 100644 --- a/gcc/DATESTAMP +++ b/gcc/DATESTAMP @@ -1 +1 @@ -20210920 +20210921 diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog index f6863fb900a..3d53ed99f33 100644 --- a/gcc/fortran/ChangeLog +++ b/gcc/fortran/ChangeLog @@ -1,3 +1,8 @@ +2021-09-20 Tobias Burnus <tobias(a)codesourcery.com> + + * trans-openmp.c (gfc_split_omp_clauses): Don't put 'order(concurrent)' + on 'distribute' for combined directives, matching OpenMP 5.0 + 2021-09-19 Harald Anlauf <anlauf(a)gmx.de> Backported from master: diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 2ea65ee2d7f..7f8d142942a 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2021-09-20 Tobias Burnus <tobias(a)codesourcery.com> + + * gfortran.dg/gomp/distribute-order-concurrent.f90: New test. + 2021-09-19 Harald Anlauf <anlauf(a)gmx.de> Backported from master: </cut>

4 years, 9 months


[TCWG CI] Regression caused by newlib: Cygwin: allow open_setup to fail

by ci_notify＠linaro.org

[TCWG CI] Regression caused by newlib: Cygwin: allow open_setup to fail:
commit e5fcb021cc9dcb1f19d45030457be86b4a226e65
Author: Ken Brown <kbrown(a)cornell.edu>

    Cygwin: allow open_setup to fail

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-e5fcb021cc9dcb1f19d45030457be86b4a226e65/results_id:
1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id:
1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-newlib-e5fcb021cc9dcb1f19d45030457be86b4a226e65
cd investigate-newlib-e5fcb021cc9dcb1f19d45030457be86b4a226e65

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /newlib/ ./ ./bisect/baseline/

cd newlib

# Reproduce first_bad build
git checkout --detach e5fcb021cc9dcb1f19d45030457be86b4a226e65
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 9b0841aa789e74b6778744b89af76b60bd1a78bc
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit e5fcb021cc9dcb1f19d45030457be86b4a226e65 Author: Ken Brown <kbrown(a)cornell.edu> Date: Sat Sep 18 08:13:55 2021 -0400 Cygwin: allow open_setup to fail Convert fhandler_base::open_setup to a (virtual) method that returns a bool result. For the moment, it and its overrides always return true. --- winsup/cygwin/fhandler.cc | 3 ++- winsup/cygwin/fhandler.h | 10 +++++----- winsup/cygwin/fhandler_console.cc | 4 ++-- winsup/cygwin/fhandler_pipe.cc | 9 +++++++-- winsup/cygwin/fhandler_tty.cc | 8 ++++---- 5 files changed, 20 insertions(+), 14 deletions(-) diff --git a/winsup/cygwin/fhandler.cc b/winsup/cygwin/fhandler.cc index 9dfe70be38..1af469e0c9 100644 --- a/winsup/cygwin/fhandler.cc +++ b/winsup/cygwin/fhandler.cc @@ -789,9 +789,10 @@ fhandler_base::fd_reopen (int, mode_t) return NULL; } -void +bool fhandler_base::open_setup (int) { + return true; } /* states: diff --git a/winsup/cygwin/fhandler.h b/winsup/cygwin/fhandler.h index 61113e6981..3471e95b97 100644 --- a/winsup/cygwin/fhandler.h +++ b/winsup/cygwin/fhandler.h @@ -355,7 +355,7 @@ class fhandler_base int open_null (int flags); virtual int open (int, mode_t); virtual fhandler_base *fd_reopen (int, mode_t); - virtual void open_setup (int flags); + virtual bool open_setup (int flags); void set_unique_id (int64_t u) { unique_id = u; } void set_unique_id () { NtAllocateLocallyUniqueId ((PLUID) &unique_id); } @@ -1206,7 +1206,7 @@ public: select_record *select_except (select_stuff *); char *get_proc_fd_name (char *buf); int open (int flags, mode_t mode = 0); - void open_setup (int flags); + bool open_setup (int flags); void fixup_after_fork (HANDLE); int dup (fhandler_base *child, int); void set_close_on_exec (bool val); @@ -2132,7 +2132,7 @@ private: bool use_archetype () const {return true;} int open (int flags, mode_t mode); - void open_setup (int flags); + bool open_setup (int flags); int dup (fhandler_base *, int); void __reg3 read (void *ptr, size_t& len); 
@@ -2300,7 +2300,7 @@ class fhandler_pty_slave: public fhandler_pty_common HANDLE& get_handle_nat () { return io_handle_nat; } int open (int flags, mode_t mode = 0); - void open_setup (int flags); + bool open_setup (int flags); ssize_t __stdcall write (const void *ptr, size_t len); void __reg3 read (void *ptr, size_t& len); int init (HANDLE, DWORD, mode_t); @@ -2399,7 +2399,7 @@ public: void doecho (const void *str, DWORD len); int accept_input (); int open (int flags, mode_t mode = 0); - void open_setup (int flags); + bool open_setup (int flags); ssize_t __stdcall write (const void *ptr, size_t len); void __reg3 read (void *ptr, size_t& len); int close (); diff --git a/winsup/cygwin/fhandler_console.cc b/winsup/cygwin/fhandler_console.cc index e00f2cdbcc..ee862b17d1 100644 --- a/winsup/cygwin/fhandler_console.cc +++ b/winsup/cygwin/fhandler_console.cc @@ -1366,13 +1366,13 @@ fhandler_console::open (int flags, mode_t) return 1; } -void +bool fhandler_console::open_setup (int flags) { set_flags ((flags & ~O_TEXT) | O_BINARY); if (myself->set_ctty (this, flags) && !myself->cygstarted) init_console_handler (true); - fhandler_base::open_setup (flags); + return fhandler_base::open_setup (flags); } int diff --git a/winsup/cygwin/fhandler_pipe.cc b/winsup/cygwin/fhandler_pipe.cc index 73ace3ac53..590ecf6670 100644 --- a/winsup/cygwin/fhandler_pipe.cc +++ b/winsup/cygwin/fhandler_pipe.cc @@ -191,10 +191,11 @@ out: return 0; } -void +bool fhandler_pipe::open_setup (int flags) { - fhandler_base::open_setup (flags); + if (!fhandler_base::open_setup (flags)) + goto err; if (get_dev () == FH_PIPER && !read_mtx) { SECURITY_ATTRIBUTES *sa = sec_none_cloexec (flags); @@ -211,6 +212,10 @@ fhandler_pipe::open_setup (int flags) } if (get_dev () == FH_PIPEW && !query_hdl) set_pipe_non_blocking (is_nonblocking ()); + return true; + +err: + return false; } off_t diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 1ea9a47ac5..05fe5348af 100644 --- 
a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -964,13 +964,13 @@ err_no_msg: return 0; } -void +bool fhandler_pty_slave::open_setup (int flags) { set_flags ((flags & ~O_TEXT) | O_BINARY); myself->set_ctty (this, flags); report_tty_counts (this, "opened", ""); - fhandler_base::open_setup (flags); + return fhandler_base::open_setup (flags); } void @@ -1947,14 +1947,14 @@ fhandler_pty_master::open (int flags, mode_t) return 1; } -void +bool fhandler_pty_master::open_setup (int flags) { set_flags ((flags & ~O_TEXT) | O_BINARY); char buf[sizeof ("opened pty master for ptyNNNNNNNNNNN")]; __small_sprintf (buf, "opened pty master for pty%d", get_minor ()); report_tty_counts (this, buf, ""); - fhandler_base::open_setup (flags); + return fhandler_base::open_setup (flags); } off_t </cut>

4 years, 9 months


gcc-linaro-6.3.1-2017.05-i686_aarch64-elf.tar.xz

by maytte sanchez

I’m trying to import these files into DS-5. After unzipping the files, they still do not show up in the DS-5 search. Below is the error that I keep receiving:

4 years, 9 months


[TCWG CI] Regression caused by binutils: Automatic date update in version.in

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: Automatic date update in version.in:
commit 27439f0edab99c6870cf7fe042074e47632f3fbd
Author: GDB Administrator <gdbadmin(a)sourceware.org>

    Automatic date update in version.in

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-27439f0edab99c6870cf7fe042074e47632f3fbd/results_id: 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-binutils-27439f0edab99c6870cf7fe042074e47632f3fbd
cd investigate-binutils-27439f0edab99c6870cf7fe042074e47632f3fbd

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach 27439f0edab99c6870cf7fe042074e47632f3fbd
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 6060c2f3373e18f76fa9e3e4d7cf2f3d5983da03
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 27439f0edab99c6870cf7fe042074e47632f3fbd
Author: GDB Administrator <gdbadmin(a)sourceware.org>
Date:   Tue Sep 21 00:00:39 2021 +0000

    Automatic date update in version.in
---
 bfd/version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bfd/version.h b/bfd/version.h
index 72a41aba322..338c1288a22 100644
--- a/bfd/version.h
+++ b/bfd/version.h
@@ -16,7 +16,7 @@

    In releases, the date is not included in either version strings or
    sonames. */
-#define BFD_VERSION_DATE 20210920
+#define BFD_VERSION_DATE 20210921
 #define BFD_VERSION @bfd_version@
 #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@
 #define REPORT_BUGS_TO @report_bugs_to@
</cut>

4 years, 9 months


[TCWG CI] Regression caused by gcc: GCC11 - Fortran: combined directives - order(concurrent) not on distribute

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: GCC11 - Fortran: combined directives - order(concurrent) not on distribute:
commit b1dc26d3543d79805751c26ba5b142eeeb1f55b8
Author: Tobias Burnus <tobias(a)codesourcery.com>

    GCC11 - Fortran: combined directives - order(concurrent) not on distribute

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-b1dc26d3543d79805751c26ba5b142eeeb1f55b8/results_id: 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-gcc-b1dc26d3543d79805751c26ba5b142eeeb1f55b8
cd investigate-gcc-b1dc26d3543d79805751c26ba5b142eeeb1f55b8

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach b1dc26d3543d79805751c26ba5b142eeeb1f55b8
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 79c523d40de1b7ce1dd0f4865c0855ab2bf6744b
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit b1dc26d3543d79805751c26ba5b142eeeb1f55b8
Author: Tobias Burnus <tobias(a)codesourcery.com>
Date:   Mon Sep 20 17:24:56 2021 +0200

    GCC11 - Fortran: combined directives - order(concurrent) not on distribute

    While OpenMP 5.1 and GCC 12 permits 'order(concurrent)' on distribute,
    OpenMP 5.0 and GCC 11 don't. This patch for GCC 11 ensures the clause also
    does not end up on 'distribute' when splitting combined directives.

    gcc/fortran/ChangeLog:

            * trans-openmp.c (gfc_split_omp_clauses): Don't put 'order(concurrent)'
            on 'distribute' for combined directives, matching OpenMP 5.0

    gcc/testsuite/ChangeLog:

            * gfortran.dg/gomp/distribute-order-concurrent.f90: New test.
---
 gcc/fortran/trans-openmp.c | 2 --
 .../gomp/distribute-order-concurrent.f90 | 25 ++++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 7e931bf4bc7..973d916b4a2 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -5176,8 +5176,6 @@ gfc_split_omp_clauses (gfc_code *code,
           /* Duplicate collapse. */
           clausesa[GFC_OMP_SPLIT_DISTRIBUTE].collapse
             = code->ext.omp_clauses->collapse;
-          clausesa[GFC_OMP_SPLIT_DISTRIBUTE].order_concurrent
-            = code->ext.omp_clauses->order_concurrent;
         }
       if (mask & GFC_OMP_MASK_PARALLEL)
         {
diff --git a/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90 b/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90
new file mode 100644
index 00000000000..9597d913684
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90
@@ -0,0 +1,25 @@
+! { dg-additional-options "-fdump-tree-original" }
+!
+! In OpenMP 5.0, 'order(concurrent)' does not apply to distribute
+! Ensure that it is rejected in GCC 11.
+!
+! Note: OpenMP 5.1 allows it; the GCC 12 testcase for it is gfortran.dg/gomp/order-5.f90
+
+subroutine f(a)
+implicit none
+integer :: i, thr
+!save :: thr
+integer :: a(:)
+
+!$omp distribute parallel do order(concurrent) private(thr)
+  do i = 1, 10
+    thr = 5
+    a(i) = thr
+  end do
+!$omp end distribute parallel do
+end
+
+! { dg-final { scan-tree-dump-not "omp distribute\[^\n\r]*order" "original" } }
+! { dg-final { scan-tree-dump "#pragma omp distribute\[\n\r\]" "original" } }
+! { dg-final { scan-tree-dump "#pragma omp parallel private\\(thr\\)" "original" } }
+! { dg-final { scan-tree-dump "#pragma omp for nowait order\\(concurrent\\)" "original" } }
</cut>
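The effect of the commit above, as described in its message, can be modelled with a small sketch: when a combined "distribute parallel do" directive is split, order(concurrent) is kept on the loop part but is copied to distribute only under OpenMP 5.1 rules. This is not GCC's code. The types omp_clauses, split_result and omp_version and the function split_distribute_parallel_do are hypothetical, and copying collapse to both parts is an assumption based on the "Duplicate collapse" comment in the diff.

```cpp
// Hypothetical, heavily simplified clause set of a combined directive.
struct omp_clauses
{
  int collapse = 0;
  bool order_concurrent = false;
};

// Clauses after splitting into the distribute part and the loop part.
struct split_result
{
  omp_clauses distribute;
  omp_clauses loop;
};

enum class omp_version { v5_0, v5_1 };

split_result
split_distribute_parallel_do (const omp_clauses &combined, omp_version ver)
{
  split_result out;
  // collapse is duplicated onto both constructs (assumption, see lead-in).
  out.distribute.collapse = combined.collapse;
  out.loop.collapse = combined.collapse;
  // order(concurrent) always stays on the loop construct.
  out.loop.order_concurrent = combined.order_concurrent;
  // Only OpenMP 5.1 permits it on distribute as well.
  if (ver == omp_version::v5_1)
    out.distribute.order_concurrent = combined.order_concurrent;
  return out;
}
```

Under this model, the GCC 11 behavior after the patch corresponds to omp_version::v5_0, which matches the test's expectation that no "order" appears on the "omp distribute" line of the dump.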

4 years, 9 months


[TCWG CI] 450.soplex grew in size by 2% after gcc: Avoid invalid loop transformations in jump threading registry.

by ci_notify＠linaro.org

After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>

    Avoid invalid loop transformations in jump threading registry.

the following benchmarks grew in size by more than 1%:
- 450.soplex grew in size by 2% from 207260 to 211436 bytes

The reproducer instructions below can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date:   Thu Sep 23 10:59:24 2021 +0200

    Avoid invalid loop transformations in jump threading registry.

    My upcoming improvements to the forward jump threader make it thread more aggressively. In investigating some "regressions", I noticed that it has always allowed threading through empty latches and across loop boundaries. As we have discussed recently, this should be avoided until after loop optimizations have run their course.

    Note that this wasn't much of a problem before because DOM/VRP couldn't find these opportunities, but with a smarter solver, we trip over them more easily.

    Because the forward threader doesn't have an independent localized cost model like the new threader (profitable_path_p), it is difficult to catch these things at discovery. However, we can catch them at registration time, with the added benefit that all the threaders (forward and backward) can share the handcuffs.

    This patch is an adaptation of what we do in the backward threader, but it is not meant to catch everything we do there, as some of the restrictions there are due to limitations of the different block copiers (for example, the generic copier does not re-use existing threading paths).

    We could ideally remove the now redundant bits in profitable_path_p, but I would prefer not to for two reasons. First, the backward threader uses profitable_path_p as it discovers paths to avoid discovering paths in unprofitable directions. Second, I would like to merge all the forward cost restrictions into the profitability class in the backward threader, not the other way around. Alas, that reshuffling will have to wait for the next release.

    As usual, there are quite a few tests that needed adjustments. It seems we were quite happily threading improper scenarios. With most of them, as can be seen in pr77445-2.c, we're merely shifting the threading to after loop optimizations.

    Tested on x86-64 Linux.

    gcc/ChangeLog:

            * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): New.
            (jt_path_registry::register_jump_thread): Call cancel_invalid_paths.
            * tree-ssa-threadupdate.h (class jt_path_registry): Add cancel_invalid_paths.

    gcc/testsuite/ChangeLog:

            * gcc.dg/tree-ssa/20030714-2.c: Adjust.
            * gcc.dg/tree-ssa/pr66752-3.c: Adjust.
            * gcc.dg/tree-ssa/pr77445-2.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
            * gcc.dg/vect/bb-slp-16.c: Adjust.
---
 gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c | 7 ++-
 gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 19 ++++---
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 7 ---
 gcc/tree-ssa-threadupdate.c | 67 ++++++++++++++++++-----
 gcc/tree-ssa-threadupdate.h | 1 +
 8 files changed, 78 insertions(+), 35 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
index eb663f2ff5b..9585ff11307 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
@@ -32,7 +32,8 @@ get_alias_set (t)
     }
 }

-/* There should be exactly three IF conditionals if we thread jumps
-   properly. */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+   properly. There used to be 3, but one thread was crossing
+   loops. */
+/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index e1464e21170..922a331b217 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */

 extern int status, pt;
 extern int count;
@@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a)
   pt--;
 }

-/* There are 4 jump threading opportunities, all of which will be
-   realized, which will eliminate testing of FLAG, completely. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
+/* There are 2 jump threading opportunities (which don't cross loops),
+   all of which will be realized, which will eliminate testing of
+   FLAG, completely. */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */

-/* There should be no assignments or references to FLAG, verify they're
-   eliminated as early as possible. */
-/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
+/* We used to remove references to FLAG by DCE2, but this was
+   depending on early threaders threading through loop boundaries
+   (which we shouldn't do). However, the late threading passes, which
+   run after loop optimizations , can successfully eliminate the
+   references to FLAG. Verify that ther are no references by the late
+   threading passes. */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index f9fc212f49e..01a0f1f197d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
    aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
    to change decisions in switch expansion which in turn can expose new
    jump threading opportunities. Skip the later tests on aarch64. */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 60d4f76f076..2d78d045516 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -21,5 +21,7 @@
    condition.

    All the cases are picked up by VRP1 as jump threads. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
+
+/* There used to be 6 jump threads found by thread1, but they all
+   depended on threading through distinct loops in ethread. */
 /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index e3d4b311c03..16abcde5053 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */

-/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */

 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 664e93e9b60..e68a9b62535 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,8 +1,5 @@
 /* { dg-require-effective-target vect_int } */

-/* See note below as to why we disable threading. */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
-
 #include <stdarg.h>
 #include "tree-vect.h"

@@ -30,10 +27,6 @@ main1 (int dummy)
       *pout++ = *pin++ + a;
       *pout++ = *pin++ + a;
       *pout++ = *pin++ + a;
-      /* In some architectures like ppc64, jump threading may thread
-         the iteration where i==0 such that we no longer optimize the
-         BB. Another alternative to disable jump threading would be
-         to wrap the read from `i' into a function returning i. */
       if (arr[i] = i)
         a = i;
       else
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index baac11280fa..2b9b8f81274 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
   return retval;
 }

+bool
+jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
+{
+  gcc_checking_assert (!path.is_empty ());
+  edge taken_edge = path[path.length () - 1]->e;
+  loop_p loop = taken_edge->src->loop_father;
+  bool seen_latch = false;
+  bool path_crosses_loops = false;
+
+  for (unsigned int i = 0; i < path.length (); i++)
+    {
+      edge e = path[i]->e;
+
+      if (e == NULL)
+        {
+          // NULL outgoing edges on a path can happen for jumping to a
+          // constant address.
+          cancel_thread (&path, "Found NULL edge in jump threading path");
+          return true;
+        }
+
+      if (loop->latch == e->src || loop->latch == e->dest)
+        seen_latch = true;
+
+      // The first entry represents the block with an outgoing edge
+      // that we will redirect to the jump threading path. Thus we
+      // don't care about that block's loop father.
+      if ((i > 0 && e->src->loop_father != loop)
+          || e->dest->loop_father != loop)
+        path_crosses_loops = true;
+
+      if (flag_checking && !m_backedge_threads)
+        gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
+    }
+
+  if (cfun->curr_properties & PROP_loop_opts_done)
+    return false;
+
+  if (seen_latch && empty_block_p (loop->latch))
+    {
+      cancel_thread (&path, "Threading through latch before loop opts "
+                            "would create non-empty latch");
+      return true;
+    }
+  if (path_crosses_loops)
+    {
+      cancel_thread (&path, "Path crosses loops");
+      return true;
+    }
+  return false;
+}
+
 /* Register a jump threading opportunity. We queue up all the jump
    threading opportunities discovered by a pass and update the CFG
    and SSA form all at once.
@@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
       return false;
     }

-  /* First make sure there are no NULL outgoing edges on the jump threading
-     path. That can happen for jumping to a constant address. */
-  for (unsigned int i = 0; i < path->length (); i++)
-    {
-      if ((*path)[i]->e == NULL)
-        {
-          cancel_thread (path, "Found NULL edge in jump threading path");
-          return false;
-        }
-
-      if (flag_checking && !m_backedge_threads)
-        gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
-    }
+  if (cancel_invalid_paths (*path))
+    return false;

   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_jump_thread_path (dump_file, *path, true);
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 8b48a671212..d68795c9f27 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -75,6 +75,7 @@ protected:
   unsigned long m_num_threaded_edges;
 private:
   virtual bool update_cfg (bool peel_loop_headers) = 0;
+  bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
   jump_thread_path_allocator m_allocator;
   // True if threading through back edges is allowed. This is only
   // allowed in the generic copier in the backward threader.
</cut>
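The checks that cancel_invalid_paths performs at registration time can be illustrated on a toy control-flow model. The sketch below is not GCC code and makes simplifying assumptions: toy_block, toy_edge and check_path are hypothetical names, loops are identified by a plain integer, NULL edges and the EDGE_DFS_BACK assertion are left out, and the function returns the rejection reason as a string (empty when the path is accepted) instead of calling cancel_thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of a basic block: which loop it belongs to,
// whether it is that loop's latch, and whether it contains no statements.
struct toy_block
{
  int loop_id;
  bool is_latch;
  bool is_empty;
};

struct toy_edge
{
  const toy_block *src;
  const toy_block *dest;
};

// Returns an empty string if the path may be registered, otherwise the reason.
std::string
check_path (const std::vector<toy_edge> &path, bool loop_opts_done)
{
  if (path.empty ())
    return "empty path";

  // The loop of the final (taken) edge's source is the reference loop.
  const int loop = path.back ().src->loop_id;
  const toy_block *seen_latch = nullptr;
  bool crosses_loops = false;

  auto is_loop_latch = [loop] (const toy_block *b)
    { return b->is_latch && b->loop_id == loop; };

  for (std::size_t i = 0; i < path.size (); ++i)
    {
      const toy_edge &e = path[i];
      if (is_loop_latch (e.src))
        seen_latch = e.src;
      if (is_loop_latch (e.dest))
        seen_latch = e.dest;
      // The source of the first edge is the block being redirected,
      // so its loop membership is ignored.
      if ((i > 0 && e.src->loop_id != loop) || e.dest->loop_id != loop)
        crosses_loops = true;
    }

  // After loop optimizations have run, both restrictions are lifted.
  if (loop_opts_done)
    return "";
  if (seen_latch && seen_latch->is_empty)
    return "Threading through latch before loop opts would create non-empty latch";
  if (crosses_loops)
    return "Path crosses loops";
  return "";
}
```

As in the patch, the same path can be rejected early and accepted later: calling check_path with loop_opts_done set to true accepts a path through an empty latch that is rejected when the flag is false.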

4 years, 9 months


[TCWG CI] 464.h264ref slowed down by 7% after llvm: [JumpThreading] Ignore free instructions

by ci_notify＠linaro.org

After llvm commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
Author: Nikita Popov <nikita.ppv(a)gmail.com>

    [JumpThreading] Ignore free instructions

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 10715 to 11434 perf samples

The reproducer instructions below can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3 -flto
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
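The percentage quoted in the report can be checked with a few lines of arithmetic. The sketch below is an illustration only: percent_change and exceeds_threshold are hypothetical helper names, and how the CI rounds the reported figure is an assumption, since the report does not say.

```cpp
// Relative change from `before` to `after`, in percent.
double
percent_change (double before, double after)
{
  return (after - before) / before * 100.0;
}

// True when the slowdown is larger than the reporting threshold
// (the report above uses "more than 2%").
bool
exceeds_threshold (double before, double after, double threshold_percent)
{
  return percent_change (before, after) > threshold_percent;
}
```

For the figures above, percent_change (10715, 11434) is roughly 6.7, which agrees with the reported 7% if the value is rounded to the nearest integer, and it is well above the 2% threshold.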
This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
cd investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date:   Wed Sep 22 21:34:24 2021 +0200

    [JumpThreading] Ignore free instructions

    This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2.

    Differential Revision: https://reviews.llvm.org/D110290
---
 .../include/llvm/Transforms/Scalar/JumpThreading.h | 8 +--
 llvm/lib/Transforms/Scalar/JumpThreading.cpp | 61 ++++++++++------------
 .../Transforms/JumpThreading/free_instructions.ll | 24 +++++----
 .../inlining-alignment-assumptions.ll | 12 ++---
 4 files changed, 52 insertions(+), 53 deletions(-)

diff --git a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
index 816ea1071e52..0ac7d7c62b7a 100644
--- a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
+++ b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
@@ -44,6 +44,7 @@ class PHINode;
 class SelectInst;
 class SwitchInst;
 class TargetLibraryInfo;
+class TargetTransformInfo;
 class Value;

 /// A private "module" namespace for types and utilities used by
@@ -78,6 +79,7 @@ enum ConstantPreference { WantInteger, WantBlockAddress };
 /// revectored to the false side of the second if.
 class JumpThreadingPass : public PassInfoMixin<JumpThreadingPass> {
   TargetLibraryInfo *TLI;
+  TargetTransformInfo *TTI;
   LazyValueInfo *LVI;
   AAResults *AA;
   DomTreeUpdater *DTU;
@@ -99,9 +101,9 @@ public:
   JumpThreadingPass(bool InsertFreezeWhenUnfoldingSelect = false, int T = -1);

   // Glue for old PM.
-  bool runImpl(Function &F, TargetLibraryInfo *TLI, LazyValueInfo *LVI,
-               AAResults *AA, DomTreeUpdater *DTU, bool HasProfileData,
-               std::unique_ptr<BlockFrequencyInfo> BFI,
+  bool runImpl(Function &F, TargetLibraryInfo *TLI, TargetTransformInfo *TTI,
+               LazyValueInfo *LVI, AAResults *AA, DomTreeUpdater *DTU,
+               bool HasProfileData, std::unique_ptr<BlockFrequencyInfo> BFI,
                std::unique_ptr<BranchProbabilityInfo> BPI);

   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
index 688902ecb9ff..fe9a7211967c 100644
--- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
@@ -331,7 +331,7 @@ bool JumpThreading::runOnFunction(Function &F) {
     BFI.reset(new BlockFrequencyInfo(F, *BPI, LI));
   }

-  bool Changed = Impl.runImpl(F, TLI, LVI, AA, &DTU, F.hasProfileData(),
+  bool Changed = Impl.runImpl(F, TLI, TTI, LVI, AA, &DTU, F.hasProfileData(),
                               std::move(BFI), std::move(BPI));
   if (PrintLVIAfterJumpThreading) {
     dbgs() << "LVI for function '" << F.getName() << "':\n";
@@ -360,7 +360,7 @@ PreservedAnalyses JumpThreadingPass::run(Function &F,
     BFI.reset(new BlockFrequencyInfo(F, *BPI, LI));
   }

-  bool Changed = runImpl(F, &TLI, &LVI, &AA, &DTU, F.hasProfileData(),
+  bool Changed = runImpl(F, &TLI, &TTI, &LVI, &AA, &DTU, F.hasProfileData(),
                          std::move(BFI), std::move(BPI));

   if (PrintLVIAfterJumpThreading) {
@@ -377,12 +377,14 @@ PreservedAnalyses JumpThreadingPass::run(Function &F,
 }

 bool JumpThreadingPass::runImpl(Function &F, TargetLibraryInfo *TLI_,
-                                LazyValueInfo *LVI_, AliasAnalysis *AA_,
-                                DomTreeUpdater *DTU_, bool HasProfileData_,
+                                TargetTransformInfo *TTI_, LazyValueInfo *LVI_,
+                                AliasAnalysis *AA_, DomTreeUpdater *DTU_,
+                                bool HasProfileData_,
                                 std::unique_ptr<BlockFrequencyInfo> BFI_,
                                 std::unique_ptr<BranchProbabilityInfo> BPI_) {
   LLVM_DEBUG(dbgs() << "Jump threading on function '" << F.getName() << "'\n");
   TLI = TLI_;
+  TTI = TTI_;
   LVI = LVI_;
   AA = AA_;
   DTU = DTU_;
@@ -514,7 +516,8 @@ static void replaceFoldableUses(Instruction *Cond, Value *ToVal) {
 /// Return the cost of duplicating a piece of this block from first non-phi
 /// and before StopAt instruction to thread across it. Stop scanning the block
 /// when exceeding the threshold. If duplication is impossible, returns ~0U.
-static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
+static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
+                                             BasicBlock *BB,
                                              Instruction *StopAt,
                                              unsigned Threshold) {
   assert(StopAt->getParent() == BB && "Not an instruction from proper BB?");
@@ -550,26 +553,21 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
     if (Size > Threshold)
       return Size;

-    // Debugger intrinsics don't incur code size.
-    if (isa<DbgInfoIntrinsic>(I)) continue;
-
-    // Pseudo-probes don't incur code size.
-    if (isa<PseudoProbeInst>(I))
-      continue;
-
-    // If this is a pointer->pointer bitcast, it is free.
-    if (isa<BitCastInst>(I) && I->getType()->isPointerTy())
-      continue;
-
-    // Freeze instruction is free, too.
-    if (isa<FreezeInst>(I))
-      continue;
-
     // Bail out if this instruction gives back a token type, it is not possible
     // to duplicate it if it is used outside this BB.
     if (I->getType()->isTokenTy() && I->isUsedOutsideOfBlock(BB))
       return ~0U;

+    // Blocks with NoDuplicate are modelled as having infinite cost, so they
+    // are never duplicated.
+    if (const CallInst *CI = dyn_cast<CallInst>(I))
+      if (CI->cannotDuplicate() || CI->isConvergent())
+        return ~0U;
+
+    if (TTI->getUserCost(&*I, TargetTransformInfo::TCK_SizeAndLatency)
+        == TargetTransformInfo::TCC_Free)
+      continue;
+
     // All other instructions count for at least one unit.
     ++Size;

@@ -578,11 +576,7 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
     // as having cost of 2 total, and if they are a vector intrinsic, we model
     // them as having cost 1.
     if (const CallInst *CI = dyn_cast<CallInst>(I)) {
-      if (CI->cannotDuplicate() || CI->isConvergent())
-        // Blocks with NoDuplicate are modelled as having infinite cost, so they
-        // are never duplicated.
-        return ~0U;
-      else if (!isa<IntrinsicInst>(CI))
+      if (!isa<IntrinsicInst>(CI))
         Size += 3;
       else if (!CI->getType()->isVectorTy())
         Size += 1;
@@ -2234,10 +2228,10 @@ bool JumpThreadingPass::maybethreadThroughTwoBasicBlocks(BasicBlock *BB,
   }

   // Compute the cost of duplicating BB and PredBB.
-  unsigned BBCost =
-      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
+  unsigned BBCost = getJumpThreadDuplicationCost(
+      TTI, BB, BB->getTerminator(), BBDupThreshold);
   unsigned PredBBCost = getJumpThreadDuplicationCost(
-      PredBB, PredBB->getTerminator(), BBDupThreshold);
+      TTI, PredBB, PredBB->getTerminator(), BBDupThreshold);

   // Give up if costs are too high. We need to check BBCost and PredBBCost
   // individually before checking their sum because getJumpThreadDuplicationCost
@@ -2345,8 +2339,8 @@ bool JumpThreadingPass::tryThreadEdge(
     return false;
   }

-  unsigned JumpThreadCost =
-      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
+  unsigned JumpThreadCost = getJumpThreadDuplicationCost(
+      TTI, BB, BB->getTerminator(), BBDupThreshold);
   if (JumpThreadCost > BBDupThreshold) {
     LLVM_DEBUG(dbgs() << " Not threading BB '" << BB->getName()
                       << "' - Cost is too high: " << JumpThreadCost << "\n");
@@ -2614,8 +2608,8 @@ bool JumpThreadingPass::duplicateCondBranchOnPHIIntoPred(
     return false;
   }

-  unsigned DuplicationCost =
-      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
+  unsigned DuplicationCost = getJumpThreadDuplicationCost(
+      TTI, BB, BB->getTerminator(), BBDupThreshold);
   if (DuplicationCost > BBDupThreshold) {
     LLVM_DEBUG(dbgs() << " Not duplicating BB '" << BB->getName()
                       << "' - Cost is too high: " << DuplicationCost << "\n");
@@ -3031,7 +3025,8 @@ bool JumpThreadingPass::threadGuard(BasicBlock *BB, IntrinsicInst *Guard,
ValueToValueMapTy UnguardedMapping, GuardedMapping; Instruction *AfterGuard = Guard->getNextNode(); - unsigned Cost = getJumpThreadDuplicationCost(BB, AfterGuard, BBDupThreshold); + unsigned Cost = + getJumpThreadDuplicationCost(TTI, BB, AfterGuard, BBDupThreshold); if (Cost > BBDupThreshold) return false; // Duplicate all instructions before the guard and the guard itself to the diff --git a/llvm/test/Transforms/JumpThreading/free_instructions.ll b/llvm/test/Transforms/JumpThreading/free_instructions.ll index f768ec996779..76392af77d33 100644 --- a/llvm/test/Transforms/JumpThreading/free_instructions.ll +++ b/llvm/test/Transforms/JumpThreading/free_instructions.ll @@ -5,26 +5,28 @@ ; the jump threading threshold, as everything else are free instructions. define i32 @free_instructions(i1 %c, i32* %p) { ; CHECK-LABEL: @free_instructions( -; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]] -; CHECK: if: +; CHECK-NEXT: br i1 [[C:%.*]], label [[IF2:%.*]], label [[ELSE2:%.*]] +; CHECK: if2: ; CHECK-NEXT: store i32 -1, i32* [[P:%.*]], align 4 -; CHECK-NEXT: br label [[JOIN:%.*]] -; CHECK: else: -; CHECK-NEXT: store i32 -2, i32* [[P]], align 4 -; CHECK-NEXT: br label [[JOIN]] -; CHECK: join: ; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl(metadata [[META0:![0-9]+]]) ; CHECK-NEXT: store i32 1, i32* [[P]], align 4, !noalias !0 ; CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ] ; CHECK-NEXT: store i32 2, i32* [[P]], align 4 +; CHECK-NEXT: [[P21:%.*]] = bitcast i32* [[P]] to i8* +; CHECK-NEXT: [[P32:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P21]]) +; CHECK-NEXT: [[P43:%.*]] = bitcast i8* [[P32]] to i32* +; CHECK-NEXT: store i32 3, i32* [[P43]], align 4, !invariant.group !3 +; CHECK-NEXT: ret i32 0 +; CHECK: else2: +; CHECK-NEXT: store i32 -2, i32* [[P]], align 4 +; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl(metadata [[META4:![0-9]+]]) +; CHECK-NEXT: store i32 1, i32* [[P]], align 4, 
!noalias !4 +; CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ] +; CHECK-NEXT: store i32 2, i32* [[P]], align 4 ; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P]] to i8* ; CHECK-NEXT: [[P3:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P2]]) ; CHECK-NEXT: [[P4:%.*]] = bitcast i8* [[P3]] to i32* ; CHECK-NEXT: store i32 3, i32* [[P4]], align 4, !invariant.group !3 -; CHECK-NEXT: br i1 [[C]], label [[IF2:%.*]], label [[ELSE2:%.*]] -; CHECK: if2: -; CHECK-NEXT: ret i32 0 -; CHECK: else2: ; CHECK-NEXT: ret i32 1 ; br i1 %c, label %if, label %else diff --git a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll index 57014e856a09..f764a59dd8a2 100644 --- a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll +++ b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll @@ -32,13 +32,10 @@ define void @caller1(i1 %c, i64* align 1 %ptr) { ; ASSUMPTIONS-OFF-NEXT: br label [[COMMON_RET]] ; ; ASSUMPTIONS-ON-LABEL: @caller1( -; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE1:%.*]] -; ASSUMPTIONS-ON: false1: -; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR:%.*]], align 4 -; ASSUMPTIONS-ON-NEXT: br label [[COMMON_RET]] +; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE2:%.*]] ; ASSUMPTIONS-ON: common.ret: -; ASSUMPTIONS-ON-NEXT: [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE1]] ], [ 2, [[TMP0:%.*]] ] -; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 true) [ "align"(i64* [[PTR]], i64 8) ] +; ASSUMPTIONS-ON-NEXT: [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE2]] ], [ 2, [[TMP0:%.*]] ] +; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 true) [ "align"(i64* [[PTR:%.*]], i64 8) ] ; ASSUMPTIONS-ON-NEXT: store volatile i64 0, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 @@ 
-47,6 +44,9 @@ define void @caller1(i1 %c, i64* align 1 %ptr) { ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 [[DOTSINK]], i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: ret void +; ASSUMPTIONS-ON: false2: +; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR]], align 4 +; ASSUMPTIONS-ON-NEXT: br label [[COMMON_RET]] ; br i1 %c, label %true1, label %false1 </cut>
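
For readers skimming the JumpThreading hunk above, the following is a minimal, self-contained sketch of the cost accumulation that the patched getJumpThreadDuplicationCost() appears to perform in the lines shown. It is a toy model and not LLVM code: ToyInst and toyDuplicationCost are hypothetical names invented for illustration, the IsFree flag stands in for the target cost query reporting a free instruction, and any logic outside the quoted hunks (for example terminator handling) is deliberately left out.

```cpp
#include <vector>

// Hypothetical stand-in for an IR instruction; this is not an LLVM type.
struct ToyInst {
  bool IsFree = false;        // assumed result of a "is this free?" cost query
  bool IsCall = false;        // instruction is a call
  bool IsIntrinsic = false;   // call targets an intrinsic
  bool ReturnsVector = false; // call returns a vector type
  bool NoDuplicate = false;   // call must not be duplicated (or is convergent)
};

// Toy version of the scan: returns the accumulated size, stops early once the
// threshold is exceeded, and returns ~0U when duplication is impossible.
unsigned toyDuplicationCost(const std::vector<ToyInst> &Block,
                            unsigned Threshold) {
  unsigned Size = 0;
  for (const ToyInst &I : Block) {
    // Stop scanning as soon as the block is already too expensive.
    if (Size > Threshold)
      return Size;

    // Non-duplicable calls are modelled as infinite cost. As in the patch,
    // this check runs before the "free" check.
    if (I.IsCall && I.NoDuplicate)
      return ~0U;

    // Free instructions add nothing to the size.
    if (I.IsFree)
      continue;

    // Every other instruction counts for at least one unit.
    ++Size;

    // Calls are penalised further: a regular call costs 4 in total, a
    // non-vector intrinsic 2, and a vector intrinsic 1.
    if (I.IsCall) {
      if (!I.IsIntrinsic)
        Size += 3;
      else if (!I.ReturnsVector)
        Size += 1;
    }
  }
  return Size;
}
```

In this model the order of the checks matters: a call marked NoDuplicate blocks duplication even when it would otherwise be reported as free, which matches the hunk that moves the cannotDuplicate()/isConvergent() test ahead of the cost query.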

4 years, 9 months


[ACTIVITY] report week ending 24 Sep

by Peter Maydell

Progress
* UM-2 [QEMU upstream maintainership]
  + Still looking at the mess that is non-unique bus names. Worked through exactly which devices and machine types are affected for the i2c bus.
  + Sent a patchset which tries to make the "create a bus" function names a bit more regular across different bus types.
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
  + Luis figured out why GDB was crashing when fed the MVE XML by QEMU's gdbstub; this was a combination of QEMU giving GDB some non-standard extra registers in its "vfp" XML feature and GDB not being robust enough against those unexpected extras. Sent out a patchset which cleans up QEMU's XML in this area and also implements the extra XML for MVE. (This will only go into QEMU once the GDB patches have landed and the XML format is nailed down.)

-- PMM

4 years, 9 months


Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

by Maxim Kuvyrkov

Hi Stanislav,

FYI, your patch seems to be slowing down two SPEC CPU2006 tests on 32-bit ARM at the -O2 and -O3 optimization levels.

--
Maxim Kuvyrkov
https://www.linaro.org

> On 15 Sep 2021, at 12:54, ci_notify(a)linaro.org wrote:
>
> After llvm commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
> Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>
>
> Allow rematerialization of virtual reg uses
>
> the following benchmarks slowed down by more than 2%:
> - 456.hmmer slowed down by 6%
> - 482.sphinx3 slowed down by 3%
>
> Benchmark:
> Toolchain: Clang + Glibc + LLVM Linker
> Version: all components were built from their tip of trunk
> Target: arm-linux-gnueabihf
> Compiler flags: -O3 -marm
> Hardware: NVidia TK1 4x Cortex-A15
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2
> - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3
>
> First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
> Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
> Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
> Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
>
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
> cd investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>
> cd llvm
>
> # Reproduce first_bad build
> git checkout --detach 92c1fd19abb15bc68b1127a26137a69e033cdb39
> ../artifacts/test.sh
>
> # Reproduce last_good build
> git checkout --detach 1d02a8bcd393ea9c50f0212797059888efc78002
> ../artifacts/test.sh
>
> cd ..
> </cut>
>
> Full commit (up to 1000 lines):
> <cut>
> commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
> Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>
> Date: Thu Aug 19 11:42:09 2021 -0700
>
> Allow rematerialization of virtual reg uses
>
> Currently isReallyTriviallyReMaterializableGeneric() implementation
> prevents rematerialization on any virtual register use on the grounds
> that is not a trivial rematerialization and that we do not want to
> extend liveranges.
>
> It appears that LRE logic does not attempt to extend a liverange of
> a source register for rematerialization so that is not an issue.
> That is checked in the LiveRangeEdit::allUsesAvailableAt().
>
> The only non-trivial aspect of it is accounting for tied-defs which
> normally represent a read-modify-write operation and not rematerializable.
>
> The test for a tied-def situation already exists in the
> /CodeGen/AMDGPU/remat-vop.mir,
> test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
>
> The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
> where I more or less understand the asm it seems to reduce spilling
> (as expected) or be neutral. However, it needs a review by all targets'
> specialists.
> > Differential Revision: https://reviews.llvm.org/D106408 > --- > llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +- > llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +- > llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 + > llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +- > llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +- > llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +- > .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +- > llvm/test/CodeGen/ARM/neon-copy.ll | 10 +- > llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +- > llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +- > llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +- > llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +- > llvm/test/CodeGen/Mips/tls.ll | 4 +- > llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +- > llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +- > llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +- > llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +- > llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 526 +-- > llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +- > llvm/test/CodeGen/RISCV/rv32zbp.ll | 282 +- > llvm/test/CodeGen/RISCV/rv32zbt.ll | 348 +- > .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 324 +- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3540 ++++++++++---------- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 720 ++-- > llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +- > llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +- > llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +- > .../tail-pred-disabled-in-loloops.ll | 14 +- > .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +- > .../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +- > llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +- > llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +- > llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +- > llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 ++- > llvm/test/CodeGen/X86/addcarry.ll | 20 +- > 
llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +- > llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +- > llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +- > llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +- > llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +- > llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +- > 42 files changed, 4217 insertions(+), 4202 deletions(-) > > diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h > index 2f853a2c6f9f..1c05afba730d 100644 > --- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h > +++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h > @@ -117,10 +117,11 @@ public: > const MachineFunction &MF) const; > > /// Return true if the instruction is trivially rematerializable, meaning it > - /// has no side effects and requires no operands that aren't always available. > - /// This means the only allowed uses are constants and unallocatable physical > - /// registers so that the instructions result is independent of the place > - /// in the function. > + /// has no side effects. Uses of constants and unallocatable physical > + /// registers are always trivial to rematerialize so that the instructions > + /// result is independent of the place in the function. Uses of virtual > + /// registers are allowed but it is caller's responsility to ensure these > + /// operands are valid at the point the instruction is beeing moved. > bool isTriviallyReMaterializable(const MachineInstr &MI, > AAResults *AA = nullptr) const { > return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF || > @@ -140,8 +141,7 @@ protected: > /// set, this hook lets the target specify whether the instruction is actually > /// trivially rematerializable, taking into consideration its operands. This > /// predicate must return false if the instruction has any side effects other > - /// than producing a value, or if it requres any address registers that are > - /// not always available. > + /// than producing a value. 
> /// Requirements must be check as stated in isTriviallyReMaterializable() . > virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI, > AAResults *AA) const { > diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp > index 1eab8e7443a7..fe7d60e0b7e2 100644 > --- a/llvm/lib/CodeGen/TargetInstrInfo.cpp > +++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp > @@ -921,7 +921,8 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( > const MachineRegisterInfo &MRI = MF.getRegInfo(); > > // Remat clients assume operand 0 is the defined register. > - if (!MI.getNumOperands() || !MI.getOperand(0).isReg()) > + if (!MI.getNumOperands() || !MI.getOperand(0).isReg() || > + MI.getOperand(0).isTied()) > return false; > Register DefReg = MI.getOperand(0).getReg(); > > @@ -983,12 +984,6 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( > // same virtual register, though. > if (MO.isDef() && Reg != DefReg) > return false; > - > - // Don't allow any virtual-register uses. Rematting an instruction with > - // virtual register uses would length the live ranges of the uses, which > - // is not necessarily a good idea, certainly not "trivial". > - if (MO.isUse()) > - return false; > } > > // Everything checked out. > diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir > index ed799bfca028..c9915aaabfde 100644 > --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir > +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir > @@ -51,6 +51,66 @@ body: | > S_NOP 0, implicit %2 > S_ENDPGM 0 > ... > +# The liverange of %0 covers a point of rematerialization, source value is > +# availabe. 
> +--- > +name: test_remat_s_mov_b32_vreg_src_long_lr > +tracksRegLiveness: true > +machineFunctionInfo: > + stackPtrOffsetReg: $sgpr32 > +body: | > + bb.0: > + ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr > + ; GCN: renamable $sgpr0 = IMPLICIT_DEF > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr0 > + ; GCN: S_ENDPGM 0 > + %0:sreg_32 = IMPLICIT_DEF > + %1:sreg_32 = S_MOV_B32 %0:sreg_32 > + %2:sreg_32 = S_MOV_B32 %0:sreg_32 > + %3:sreg_32 = S_MOV_B32 %0:sreg_32 > + S_NOP 0, implicit %1 > + S_NOP 0, implicit %2 > + S_NOP 0, implicit %3 > + S_NOP 0, implicit %0 > + S_ENDPGM 0 > +... > +# The liverange of %0 does not cover a point of rematerialization, source value is > +# unavailabe and we do not want to artificially extend the liverange. 
> +--- > +name: test_no_remat_s_mov_b32_vreg_src_short_lr > +tracksRegLiveness: true > +machineFunctionInfo: > + stackPtrOffsetReg: $sgpr32 > +body: | > + bb.0: > + ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr > + ; GCN: renamable $sgpr0 = IMPLICIT_DEF > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5) > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5) > + ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0 > + ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5) > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5) > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr0 > + ; GCN: S_ENDPGM 0 > + %0:sreg_32 = IMPLICIT_DEF > + %1:sreg_32 = S_MOV_B32 %0:sreg_32 > + %2:sreg_32 = S_MOV_B32 %0:sreg_32 > + %3:sreg_32 = S_MOV_B32 %0:sreg_32 > + S_NOP 0, implicit %1 > + S_NOP 0, implicit %2 > + S_NOP 0, implicit %3 > + S_ENDPGM 0 > +... 
> --- > name: test_remat_s_mov_b64 > tracksRegLiveness: true > diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > index a4243276c70a..175a2069a441 100644 > --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon > ; ENABLE-NEXT: pophs {r11, pc} > ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader > ; ENABLE-NEXT: movw r12, :lower16:skip > -; ENABLE-NEXT: sub r1, r1, #1 > +; ENABLE-NEXT: sub r3, r1, #1 > ; ENABLE-NEXT: movt r12, :upper16:skip > ; ENABLE-NEXT: .LBB0_4: @ %while.body > ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 > -; ENABLE-NEXT: ldrb r3, [r0] > -; ENABLE-NEXT: ldrb r3, [r12, r3] > -; ENABLE-NEXT: add r0, r0, r3 > -; ENABLE-NEXT: sub r3, r1, #1 > -; ENABLE-NEXT: cmp r3, r1 > +; ENABLE-NEXT: ldrb r1, [r0] > +; ENABLE-NEXT: ldrb r1, [r12, r1] > +; ENABLE-NEXT: add r0, r0, r1 > +; ENABLE-NEXT: sub r1, r3, #1 > +; ENABLE-NEXT: cmp r1, r3 > ; ENABLE-NEXT: bhs .LBB0_6 > ; ENABLE-NEXT: @ %bb.5: @ %while.body > ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 > ; ENABLE-NEXT: cmp r0, r2 > -; ENABLE-NEXT: mov r1, r3 > +; ENABLE-NEXT: mov r3, r1 > ; ENABLE-NEXT: blo .LBB0_4 > ; ENABLE-NEXT: .LBB0_6: @ %if.end29 > ; ENABLE-NEXT: pop {r11, pc} > @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon > ; DISABLE-NEXT: pophs {r11, pc} > ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader > ; DISABLE-NEXT: movw r12, :lower16:skip > -; DISABLE-NEXT: sub r1, r1, #1 > +; DISABLE-NEXT: sub r3, r1, #1 > ; DISABLE-NEXT: movt r12, :upper16:skip > ; DISABLE-NEXT: .LBB0_4: @ %while.body > ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 > -; DISABLE-NEXT: ldrb r3, [r0] > -; DISABLE-NEXT: ldrb r3, [r12, r3] > -; DISABLE-NEXT: add r0, r0, r3 > -; DISABLE-NEXT: sub r3, r1, #1 > -; DISABLE-NEXT: cmp r3, 
r1 > +; DISABLE-NEXT: ldrb r1, [r0] > +; DISABLE-NEXT: ldrb r1, [r12, r1] > +; DISABLE-NEXT: add r0, r0, r1 > +; DISABLE-NEXT: sub r1, r3, #1 > +; DISABLE-NEXT: cmp r1, r3 > ; DISABLE-NEXT: bhs .LBB0_6 > ; DISABLE-NEXT: @ %bb.5: @ %while.body > ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 > ; DISABLE-NEXT: cmp r0, r2 > -; DISABLE-NEXT: mov r1, r3 > +; DISABLE-NEXT: mov r3, r1 > ; DISABLE-NEXT: blo .LBB0_4 > ; DISABLE-NEXT: .LBB0_6: @ %if.end29 > ; DISABLE-NEXT: pop {r11, pc} > diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > index 55157875d355..ea15fcc5c824 100644 > --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { > ; SCALAR-NEXT: push {r4, r5, r11, lr} > ; SCALAR-NEXT: rsb r3, r2, #0 > ; SCALAR-NEXT: and r4, r2, #63 > -; SCALAR-NEXT: and lr, r3, #63 > -; SCALAR-NEXT: rsb r3, lr, #32 > +; SCALAR-NEXT: and r12, r3, #63 > +; SCALAR-NEXT: rsb r3, r12, #32 > ; SCALAR-NEXT: lsl r2, r0, r4 > -; SCALAR-NEXT: lsr r12, r0, lr > -; SCALAR-NEXT: orr r3, r12, r1, lsl r3 > -; SCALAR-NEXT: subs r12, lr, #32 > -; SCALAR-NEXT: lsrpl r3, r1, r12 > +; SCALAR-NEXT: lsr lr, r0, r12 > +; SCALAR-NEXT: orr r3, lr, r1, lsl r3 > +; SCALAR-NEXT: subs lr, r12, #32 > +; SCALAR-NEXT: lsrpl r3, r1, lr > ; SCALAR-NEXT: subs r5, r4, #32 > ; SCALAR-NEXT: movwpl r2, #0 > ; SCALAR-NEXT: cmp r5, #0 > @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { > ; SCALAR-NEXT: lsr r3, r0, r3 > ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 > ; SCALAR-NEXT: lslpl r3, r0, r5 > -; SCALAR-NEXT: lsr r0, r1, lr > -; SCALAR-NEXT: cmp r12, #0 > +; SCALAR-NEXT: lsr r0, r1, r12 > +; SCALAR-NEXT: cmp lr, #0 > ; SCALAR-NEXT: movwpl r0, #0 > ; SCALAR-NEXT: orr r1, r3, r0 > ; SCALAR-NEXT: mov r0, r2 > @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { > ; CHECK: @ %bb.0: > ; CHECK-NEXT: .save {r4, r5, r11, lr} > ; CHECK-NEXT: push {r4, r5, r11, lr} > -; 
CHECK-NEXT: and lr, r2, #63 > +; CHECK-NEXT: and r12, r2, #63 > ; CHECK-NEXT: rsb r2, r2, #0 > -; CHECK-NEXT: rsb r3, lr, #32 > +; CHECK-NEXT: rsb r3, r12, #32 > ; CHECK-NEXT: and r4, r2, #63 > -; CHECK-NEXT: lsr r12, r0, lr > -; CHECK-NEXT: orr r3, r12, r1, lsl r3 > -; CHECK-NEXT: subs r12, lr, #32 > +; CHECK-NEXT: lsr lr, r0, r12 > +; CHECK-NEXT: orr r3, lr, r1, lsl r3 > +; CHECK-NEXT: subs lr, r12, #32 > ; CHECK-NEXT: lsl r2, r0, r4 > -; CHECK-NEXT: lsrpl r3, r1, r12 > +; CHECK-NEXT: lsrpl r3, r1, lr > ; CHECK-NEXT: subs r5, r4, #32 > ; CHECK-NEXT: movwpl r2, #0 > ; CHECK-NEXT: cmp r5, #0 > @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { > ; CHECK-NEXT: lsr r3, r0, r3 > ; CHECK-NEXT: orr r3, r3, r1, lsl r4 > ; CHECK-NEXT: lslpl r3, r0, r5 > -; CHECK-NEXT: lsr r0, r1, lr > -; CHECK-NEXT: cmp r12, #0 > +; CHECK-NEXT: lsr r0, r1, r12 > +; CHECK-NEXT: cmp lr, #0 > ; CHECK-NEXT: movwpl r0, #0 > ; CHECK-NEXT: orr r1, r0, r3 > ; CHECK-NEXT: mov r0, r2 > diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll > index 54c93b493c98..6372f9be2ca3 100644 > --- a/llvm/test/CodeGen/ARM/funnel-shift.ll > +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll > @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { > ; CHECK-NEXT: mov r3, #0 > ; CHECK-NEXT: bl __aeabi_uldivmod > ; CHECK-NEXT: add r0, r2, #27 > -; CHECK-NEXT: lsl r6, r6, #27 > -; CHECK-NEXT: and r1, r0, #63 > ; CHECK-NEXT: lsl r2, r7, #27 > +; CHECK-NEXT: and r12, r0, #63 > +; CHECK-NEXT: lsl r6, r6, #27 > ; CHECK-NEXT: orr r7, r6, r7, lsr #5 > +; CHECK-NEXT: rsb r3, r12, #32 > +; CHECK-NEXT: lsr r2, r2, r12 > ; CHECK-NEXT: mov r6, #63 > -; CHECK-NEXT: rsb r3, r1, #32 > -; CHECK-NEXT: lsr r2, r2, r1 > -; CHECK-NEXT: subs r12, r1, #32 > -; CHECK-NEXT: bic r6, r6, r0 > ; CHECK-NEXT: orr r2, r2, r7, lsl r3 > +; CHECK-NEXT: subs r3, r12, #32 > +; CHECK-NEXT: bic r6, r6, r0 > ; CHECK-NEXT: lsl r5, r9, #1 > -; CHECK-NEXT: lsrpl r2, r7, r12 > +; CHECK-NEXT: lsrpl r2, r7, 
r3 > +; CHECK-NEXT: subs r1, r6, #32 > ; CHECK-NEXT: lsl r0, r5, r6 > -; CHECK-NEXT: subs r4, r6, #32 > -; CHECK-NEXT: lsl r3, r8, #1 > +; CHECK-NEXT: lsl r4, r8, #1 > ; CHECK-NEXT: movwpl r0, #0 > -; CHECK-NEXT: orr r3, r3, r9, lsr #31 > +; CHECK-NEXT: orr r4, r4, r9, lsr #31 > ; CHECK-NEXT: orr r0, r0, r2 > ; CHECK-NEXT: rsb r2, r6, #32 > -; CHECK-NEXT: cmp r4, #0 > -; CHECK-NEXT: lsr r1, r7, r1 > +; CHECK-NEXT: cmp r1, #0 > ; CHECK-NEXT: lsr r2, r5, r2 > -; CHECK-NEXT: orr r2, r2, r3, lsl r6 > -; CHECK-NEXT: lslpl r2, r5, r4 > -; CHECK-NEXT: cmp r12, #0 > +; CHECK-NEXT: orr r2, r2, r4, lsl r6 > +; CHECK-NEXT: lslpl r2, r5, r1 > +; CHECK-NEXT: lsr r1, r7, r12 > +; CHECK-NEXT: cmp r3, #0 > ; CHECK-NEXT: movwpl r1, #0 > ; CHECK-NEXT: orr r1, r2, r1 > ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} > diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > index 2922e0ed5423..0a0bb62b0a09 100644 > --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { > ; BE-LABEL: i56_or: > ; BE: @ %bb.0: > ; BE-NEXT: mov r1, r0 > -; BE-NEXT: ldr r12, [r0] > ; BE-NEXT: ldrh r2, [r1, #4]! 
> ; BE-NEXT: ldrb r3, [r1, #2] > ; BE-NEXT: orr r2, r3, r2, lsl #8 > -; BE-NEXT: orr r2, r2, r12, lsl #24 > -; BE-NEXT: orr r2, r2, #384 > -; BE-NEXT: strb r2, [r1, #2] > -; BE-NEXT: lsr r3, r2, #8 > -; BE-NEXT: strh r3, [r1] > -; BE-NEXT: bic r1, r12, #255 > -; BE-NEXT: orr r1, r1, r2, lsr #24 > +; BE-NEXT: ldr r3, [r0] > +; BE-NEXT: orr r2, r2, r3, lsl #24 > +; BE-NEXT: orr r12, r2, #384 > +; BE-NEXT: strb r12, [r1, #2] > +; BE-NEXT: lsr r2, r12, #8 > +; BE-NEXT: strh r2, [r1] > +; BE-NEXT: bic r1, r3, #255 > +; BE-NEXT: orr r1, r1, r12, lsr #24 > ; BE-NEXT: str r1, [r0] > ; BE-NEXT: mov pc, lr > %aa = load i56, i56* %a > @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { > ; BE-NEXT: ldrb r3, [r1, #2] > ; BE-NEXT: strb r2, [r1, #2] > ; BE-NEXT: orr r2, r3, r12, lsl #8 > -; BE-NEXT: ldr r12, [r0] > -; BE-NEXT: orr r2, r2, r12, lsl #24 > -; BE-NEXT: orr r2, r2, #384 > -; BE-NEXT: lsr r3, r2, #8 > -; BE-NEXT: strh r3, [r1] > -; BE-NEXT: bic r1, r12, #255 > -; BE-NEXT: orr r1, r1, r2, lsr #24 > +; BE-NEXT: ldr r3, [r0] > +; BE-NEXT: orr r2, r2, r3, lsl #24 > +; BE-NEXT: orr r12, r2, #384 > +; BE-NEXT: lsr r2, r12, #8 > +; BE-NEXT: strh r2, [r1] > +; BE-NEXT: bic r1, r3, #255 > +; BE-NEXT: orr r1, r1, r12, lsr #24 > ; BE-NEXT: str r1, [r0] > ; BE-NEXT: mov pc, lr > > diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll > index 09a991da2e59..46490efb6631 100644 > --- a/llvm/test/CodeGen/ARM/neon-copy.ll > +++ b/llvm/test/CodeGen/ARM/neon-copy.ll > @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { > ; CHECK-NEXT: .pad #8 > ; CHECK-NEXT: sub sp, sp, #8 > ; CHECK-NEXT: vmov.u16 r1, d0[1] > -; CHECK-NEXT: and r0, r0, #3 > +; CHECK-NEXT: and r12, r0, #3 > ; CHECK-NEXT: vmov.u16 r2, d0[2] > -; CHECK-NEXT: mov r3, sp > -; CHECK-NEXT: vmov.u16 r12, d0[3] > -; CHECK-NEXT: orr r0, r3, r0, lsl #1 > +; CHECK-NEXT: mov r0, sp > +; CHECK-NEXT: vmov.u16 r3, d0[3] > +; CHECK-NEXT: orr r0, r0, 
r12, lsl #1 > ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] > ; CHECK-NEXT: vldr d0, [sp] > ; CHECK-NEXT: vmov.16 d0[1], r1 > ; CHECK-NEXT: vmov.16 d0[2], r2 > -; CHECK-NEXT: vmov.16 d0[3], r12 > +; CHECK-NEXT: vmov.16 d0[3], r3 > ; CHECK-NEXT: add sp, sp, #8 > ; CHECK-NEXT: bx lr > %tmp = extractelement <8 x i16> %x, i32 0 > diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > index 8be7100d368b..a125446b27c3 100644 > --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > @@ -766,79 +766,85 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { > ; MMR3-NEXT: .cfi_offset 17, -4 > ; MMR3-NEXT: .cfi_offset 16, -8 > ; MMR3-NEXT: move $8, $7 > -; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $2, $6 > +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill > ; MMR3-NEXT: lw $16, 76($sp) > -; MMR3-NEXT: srlv $4, $7, $16 > -; MMR3-NEXT: not16 $3, $16 > -; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sll16 $2, $6, 1 > -; MMR3-NEXT: sllv $3, $2, $3 > -; MMR3-NEXT: li16 $2, 64 > -; MMR3-NEXT: or16 $3, $4 > -; MMR3-NEXT: srlv $6, $6, $16 > -; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill > -; MMR3-NEXT: subu16 $7, $2, $16 > +; MMR3-NEXT: srlv $3, $7, $16 > +; MMR3-NEXT: not16 $6, $16 > +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $4, $2 > +; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sll16 $2, $2, 1 > +; MMR3-NEXT: sllv $2, $2, $6 > +; MMR3-NEXT: li16 $6, 64 > +; MMR3-NEXT: or16 $2, $3 > +; MMR3-NEXT: srlv $4, $4, $16 > +; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill > +; MMR3-NEXT: subu16 $7, $6, $16 > ; MMR3-NEXT: sllv $9, $5, $7 > -; MMR3-NEXT: andi16 $2, $7, 32 > -; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill > -; MMR3-NEXT: andi16 $5, 
$16, 32 > -; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill > -; MMR3-NEXT: move $4, $9 > +; MMR3-NEXT: andi16 $5, $7, 32 > +; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill > +; MMR3-NEXT: andi16 $6, $16, 32 > +; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $3, $9 > ; MMR3-NEXT: li16 $17, 0 > -; MMR3-NEXT: movn $4, $17, $2 > -; MMR3-NEXT: movn $3, $6, $5 > -; MMR3-NEXT: addiu $2, $16, -64 > -; MMR3-NEXT: lw $5, 36($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srlv $5, $5, $2 > -; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill > -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sll16 $6, $17, 1 > -; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill > -; MMR3-NEXT: not16 $5, $2 > -; MMR3-NEXT: sllv $5, $6, $5 > -; MMR3-NEXT: or16 $3, $4 > -; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload > -; MMR3-NEXT: or16 $5, $4 > -; MMR3-NEXT: srav $1, $17, $2 > -; MMR3-NEXT: andi16 $2, $2, 32 > -; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $5, $1, $2 > -; MMR3-NEXT: sllv $2, $17, $7 > -; MMR3-NEXT: not16 $4, $7 > -; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srl16 $6, $7, 1 > -; MMR3-NEXT: srlv $6, $6, $4 > +; MMR3-NEXT: movn $3, $17, $5 > +; MMR3-NEXT: movn $2, $4, $6 > +; MMR3-NEXT: addiu $4, $16, -64 > +; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload > +; MMR3-NEXT: srlv $4, $17, $4 > +; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill > +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sll16 $4, $6, 1 > +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: addiu $5, $16, -64 > +; MMR3-NEXT: not16 $5, $5 > +; MMR3-NEXT: sllv $5, $4, $5 > +; MMR3-NEXT: or16 $2, $3 > +; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload > +; MMR3-NEXT: or16 $5, $3 > +; MMR3-NEXT: addiu $3, $16, -64 > +; MMR3-NEXT: srav $1, $6, $3 > +; MMR3-NEXT: andi16 $3, $3, 32 > +; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill > +; MMR3-NEXT: movn $5, $1, $3 > +; MMR3-NEXT: sllv $3, $6, 
$7 > +; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill > +; MMR3-NEXT: not16 $3, $7 > +; MMR3-NEXT: srl16 $4, $17, 1 > +; MMR3-NEXT: srlv $3, $4, $3 > ; MMR3-NEXT: sltiu $10, $16, 64 > -; MMR3-NEXT: movn $5, $3, $10 > -; MMR3-NEXT: or16 $6, $2 > -; MMR3-NEXT: srlv $2, $7, $16 > -; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload > -; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sllv $3, $4, $3 > +; MMR3-NEXT: movn $5, $2, $10 > +; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload > ; MMR3-NEXT: or16 $3, $2 > -; MMR3-NEXT: srav $11, $17, $16 > -; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $3, $11, $4 > -; MMR3-NEXT: sra $2, $17, 31 > +; MMR3-NEXT: srlv $2, $17, $16 > +; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload > +; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sllv $17, $7, $4 > +; MMR3-NEXT: or16 $17, $2 > +; MMR3-NEXT: srav $11, $6, $16 > +; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $17, $11, $2 > +; MMR3-NEXT: sra $2, $6, 31 > ; MMR3-NEXT: movz $5, $8, $16 > -; MMR3-NEXT: move $8, $2 > -; MMR3-NEXT: movn $8, $3, $10 > -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $6, $9, $3 > -; MMR3-NEXT: li16 $3, 0 > -; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $7, $3, $4 > -; MMR3-NEXT: or16 $7, $6 > +; MMR3-NEXT: move $4, $2 > +; MMR3-NEXT: movn $4, $17, $10 > +; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $3, $9, $6 > +; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $7, $17, $6 > +; MMR3-NEXT: or16 $7, $3 > ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movn $1, $2, $3 > ; MMR3-NEXT: movn $1, $7, $10 > ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movz $1, $3, $16 > -; MMR3-NEXT: movn $11, $2, $4 > +; MMR3-NEXT: movn $11, $2, $6 > ; MMR3-NEXT: movn $2, $11, 
$10 > -; MMR3-NEXT: move $3, $8 > +; MMR3-NEXT: move $3, $4 > ; MMR3-NEXT: move $4, $1 > ; MMR3-NEXT: lwp $16, 40($sp) > ; MMR3-NEXT: addiusp 48 > @@ -852,79 +858,80 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { > ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill > ; MMR6-NEXT: .cfi_offset 17, -4 > ; MMR6-NEXT: .cfi_offset 16, -8 > -; MMR6-NEXT: move $1, $7 > +; MMR6-NEXT: move $12, $7 > ; MMR6-NEXT: lw $3, 44($sp) > ; MMR6-NEXT: li16 $2, 64 > -; MMR6-NEXT: subu16 $7, $2, $3 > -; MMR6-NEXT: sllv $8, $5, $7 > -; MMR6-NEXT: andi16 $2, $7, 32 > -; MMR6-NEXT: selnez $9, $8, $2 > -; MMR6-NEXT: sllv $10, $4, $7 > -; MMR6-NEXT: not16 $7, $7 > -; MMR6-NEXT: srl16 $16, $5, 1 > -; MMR6-NEXT: srlv $7, $16, $7 > -; MMR6-NEXT: or $7, $10, $7 > -; MMR6-NEXT: seleqz $7, $7, $2 > -; MMR6-NEXT: or $7, $9, $7 > -; MMR6-NEXT: srlv $9, $1, $3 > -; MMR6-NEXT: not16 $16, $3 > -; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: subu16 $16, $2, $3 > +; MMR6-NEXT: sllv $1, $5, $16 > +; MMR6-NEXT: andi16 $2, $16, 32 > +; MMR6-NEXT: selnez $8, $1, $2 > +; MMR6-NEXT: sllv $9, $4, $16 > +; MMR6-NEXT: not16 $16, $16 > +; MMR6-NEXT: srl16 $17, $5, 1 > +; MMR6-NEXT: srlv $10, $17, $16 > +; MMR6-NEXT: or $9, $9, $10 > +; MMR6-NEXT: seleqz $9, $9, $2 > +; MMR6-NEXT: or $8, $8, $9 > +; MMR6-NEXT: srlv $9, $7, $3 > +; MMR6-NEXT: not16 $7, $3 > +; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill > ; MMR6-NEXT: sll16 $17, $6, 1 > -; MMR6-NEXT: sllv $10, $17, $16 > +; MMR6-NEXT: sllv $10, $17, $7 > ; MMR6-NEXT: or $9, $10, $9 > ; MMR6-NEXT: andi16 $17, $3, 32 > ; MMR6-NEXT: seleqz $9, $9, $17 > ; MMR6-NEXT: srlv $10, $6, $3 > ; MMR6-NEXT: selnez $11, $10, $17 > ; MMR6-NEXT: seleqz $10, $10, $17 > -; MMR6-NEXT: or $10, $10, $7 > -; MMR6-NEXT: seleqz $12, $8, $2 > -; MMR6-NEXT: or $8, $11, $9 > +; MMR6-NEXT: or $8, $10, $8 > +; MMR6-NEXT: seleqz $1, $1, $2 > +; MMR6-NEXT: or $9, $11, $9 > ; MMR6-NEXT: addiu $2, $3, -64 > -; MMR6-NEXT: srlv $9, $5, $2 > +; MMR6-NEXT: 
srlv $10, $5, $2 > ; MMR6-NEXT: sll16 $7, $4, 1 > ; MMR6-NEXT: not16 $16, $2 > ; MMR6-NEXT: sllv $11, $7, $16 > ; MMR6-NEXT: sltiu $13, $3, 64 > -; MMR6-NEXT: or $8, $8, $12 > -; MMR6-NEXT: selnez $10, $10, $13 > -; MMR6-NEXT: or $9, $11, $9 > -; MMR6-NEXT: srav $11, $4, $2 > +; MMR6-NEXT: or $1, $9, $1 > +; MMR6-NEXT: selnez $8, $8, $13 > +; MMR6-NEXT: or $9, $11, $10 > +; MMR6-NEXT: srav $10, $4, $2 > ; MMR6-NEXT: andi16 $2, $2, 32 > -; MMR6-NEXT: seleqz $12, $11, $2 > +; MMR6-NEXT: seleqz $11, $10, $2 > ; MMR6-NEXT: sra $14, $4, 31 > ; MMR6-NEXT: selnez $15, $14, $2 > ; MMR6-NEXT: seleqz $9, $9, $2 > -; MMR6-NEXT: or $12, $15, $12 > -; MMR6-NEXT: seleqz $12, $12, $13 > -; MMR6-NEXT: selnez $2, $11, $2 > -; MMR6-NEXT: seleqz $11, $14, $13 > -; MMR6-NEXT: or $10, $10, $12 > -; MMR6-NEXT: selnez $10, $10, $3 > -; MMR6-NEXT: selnez $8, $8, $13 > +; MMR6-NEXT: or $11, $15, $11 > +; MMR6-NEXT: seleqz $11, $11, $13 > +; MMR6-NEXT: selnez $2, $10, $2 > +; MMR6-NEXT: seleqz $10, $14, $13 > +; MMR6-NEXT: or $8, $8, $11 > +; MMR6-NEXT: selnez $8, $8, $3 > +; MMR6-NEXT: selnez $1, $1, $13 > ; MMR6-NEXT: or $2, $2, $9 > ; MMR6-NEXT: srav $9, $4, $3 > ; MMR6-NEXT: seleqz $4, $9, $17 > -; MMR6-NEXT: selnez $12, $14, $17 > -; MMR6-NEXT: or $4, $12, $4 > -; MMR6-NEXT: selnez $12, $4, $13 > +; MMR6-NEXT: selnez $11, $14, $17 > +; MMR6-NEXT: or $4, $11, $4 > +; MMR6-NEXT: selnez $11, $4, $13 > ; MMR6-NEXT: seleqz $2, $2, $13 > ; MMR6-NEXT: seleqz $4, $6, $3 > -; MMR6-NEXT: seleqz $1, $1, $3 > -; MMR6-NEXT: or $2, $8, $2 > -; MMR6-NEXT: selnez $2, $2, $3 > +; MMR6-NEXT: seleqz $6, $12, $3 > ; MMR6-NEXT: or $1, $1, $2 > -; MMR6-NEXT: or $4, $4, $10 > -; MMR6-NEXT: or $2, $12, $11 > -; MMR6-NEXT: srlv $3, $5, $3 > -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload > -; MMR6-NEXT: sllv $5, $7, $5 > -; MMR6-NEXT: or $3, $5, $3 > -; MMR6-NEXT: seleqz $3, $3, $17 > -; MMR6-NEXT: selnez $5, $9, $17 > -; MMR6-NEXT: or $3, $5, $3 > -; MMR6-NEXT: selnez $3, $3, $13 > -; MMR6-NEXT: or $3, $3, 
$11 > +; MMR6-NEXT: selnez $1, $1, $3 > +; MMR6-NEXT: or $1, $6, $1 > +; MMR6-NEXT: or $4, $4, $8 > +; MMR6-NEXT: or $6, $11, $10 > +; MMR6-NEXT: srlv $2, $5, $3 > +; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload > +; MMR6-NEXT: sllv $3, $7, $3 > +; MMR6-NEXT: or $2, $3, $2 > +; MMR6-NEXT: seleqz $2, $2, $17 > +; MMR6-NEXT: selnez $3, $9, $17 > +; MMR6-NEXT: or $2, $3, $2 > +; MMR6-NEXT: selnez $2, $2, $13 > +; MMR6-NEXT: or $3, $2, $10 > +; MMR6-NEXT: move $2, $6 > ; MMR6-NEXT: move $5, $1 > ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload > ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload > diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > index ed2bfc9fcf60..e4b4b3ae1d0f 100644 > --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > @@ -776,76 +776,77 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { > ; MMR3-NEXT: .cfi_offset 17, -4 > ; MMR3-NEXT: .cfi_offset 16, -8 > ; MMR3-NEXT: move $8, $7 > -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill > ; MMR3-NEXT: lw $16, 68($sp) > ; MMR3-NEXT: li16 $2, 64 > -; MMR3-NEXT: subu16 $7, $2, $16 > -; MMR3-NEXT: sllv $9, $5, $7 > -; MMR3-NEXT: move $17, $5 > -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill > -; MMR3-NEXT: andi16 $3, $7, 32 > +; MMR3-NEXT: subu16 $17, $2, $16 > +; MMR3-NEXT: sllv $9, $5, $17 > +; MMR3-NEXT: andi16 $3, $17, 32 > ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill > ; MMR3-NEXT: li16 $2, 0 > ; MMR3-NEXT: move $4, $9 > ; MMR3-NEXT: movn $4, $2, $3 > -; MMR3-NEXT: srlv $5, $8, $16 > +; MMR3-NEXT: srlv $5, $7, $16 > ; MMR3-NEXT: not16 $3, $16 > ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sll16 $2, $6, 1 > +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sllv $2, $2, $3 > ; MMR3-NEXT: or16 $2, $5 > -; MMR3-NEXT: srlv $5, $6, $16 > -; MMR3-NEXT: sw 
$5, 4($sp) # 4-byte Folded Spill > +; MMR3-NEXT: srlv $7, $6, $16 > ; MMR3-NEXT: andi16 $3, $16, 32 > ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $2, $5, $3 > +; MMR3-NEXT: movn $2, $7, $3 > ; MMR3-NEXT: addiu $3, $16, -64 > ; MMR3-NEXT: or16 $2, $4 > -; MMR3-NEXT: srlv $4, $17, $3 > -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > -; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sll16 $6, $4, 1 > -; MMR3-NEXT: not16 $5, $3 > -; MMR3-NEXT: sllv $5, $6, $5 > -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: or16 $5, $17 > -; MMR3-NEXT: srlv $1, $4, $3 > -; MMR3-NEXT: andi16 $3, $3, 32 > +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > +; MMR3-NEXT: srlv $3, $6, $3 > ; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $5, $1, $3 > +; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sll16 $4, $3, 1 > +; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill > +; MMR3-NEXT: addiu $5, $16, -64 > +; MMR3-NEXT: not16 $5, $5 > +; MMR3-NEXT: sllv $5, $4, $5 > +; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload > +; MMR3-NEXT: or16 $5, $4 > +; MMR3-NEXT: addiu $4, $16, -64 > +; MMR3-NEXT: srlv $1, $3, $4 > +; MMR3-NEXT: andi16 $4, $4, 32 > +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: movn $5, $1, $4 > ; MMR3-NEXT: sltiu $10, $16, 64 > ; MMR3-NEXT: movn $5, $2, $10 > -; MMR3-NEXT: sllv $2, $4, $7 > -; MMR3-NEXT: not16 $3, $7 > -; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srl16 $4, $7, 1 > +; MMR3-NEXT: sllv $2, $3, $17 > +; MMR3-NEXT: not16 $3, $17 > +; MMR3-NEXT: srl16 $4, $6, 1 > ; MMR3-NEXT: srlv $4, $4, $3 > ; MMR3-NEXT: or16 $4, $2 > -; MMR3-NEXT: srlv $2, $7, $16 > +; MMR3-NEXT: srlv $2, $6, $16 > ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload > +; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload > ; MMR3-NEXT: sllv $3, $6, $3 > ; MMR3-NEXT: or16 $3, $2 > ; MMR3-NEXT: lw $2, 28($sp) # 4-byte Folded Reload > ; MMR3-NEXT: srlv $2, 
$2, $16 > -; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $3, $2, $17 > +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $3, $2, $6 > ; MMR3-NEXT: movz $5, $8, $16 > -; MMR3-NEXT: li16 $6, 0 > -; MMR3-NEXT: movz $3, $6, $10 > -; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $4, $9, $7 > -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > -; MMR3-NEXT: li16 $7, 0 > -; MMR3-NEXT: movn $6, $7, $17 > -; MMR3-NEXT: or16 $6, $4 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movz $3, $17, $10 > +; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $4, $9, $17 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movn $7, $17, $6 > +; MMR3-NEXT: or16 $7, $4 > ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $1, $7, $4 > -; MMR3-NEXT: li16 $7, 0 > -; MMR3-NEXT: movn $1, $6, $10 > +; MMR3-NEXT: movn $1, $17, $4 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movn $1, $7, $10 > ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movz $1, $4, $16 > -; MMR3-NEXT: movn $2, $7, $17 > +; MMR3-NEXT: movn $2, $17, $6 > ; MMR3-NEXT: li16 $4, 0 > ; MMR3-NEXT: movz $2, $4, $10 > ; MMR3-NEXT: move $4, $1 > @@ -855,98 +856,91 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { > ; > ; MMR6-LABEL: lshr_i128: > ; MMR6: # %bb.0: # %entry > -; MMR6-NEXT: addiu $sp, $sp, -32 > -; MMR6-NEXT: .cfi_def_cfa_offset 32 > -; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill > -; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill > +; MMR6-NEXT: addiu $sp, $sp, -24 > +; MMR6-NEXT: .cfi_def_cfa_offset 24 > +; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill > +; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill > ; MMR6-NEXT: .cfi_offset 17, -4 > ; MMR6-NEXT: .cfi_offset 16, -8 > ; MMR6-NEXT: move $1, $7 > -; MMR6-NEXT: move $7, $5 > -; MMR6-NEXT: lw $3, 60($sp) > +; MMR6-NEXT: move $7, $4 > +; MMR6-NEXT: lw $3, 52($sp) > ; MMR6-NEXT: srlv $2, $1, $3 > -; MMR6-NEXT: not16 $5, 
$3 > -; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill > -; MMR6-NEXT: move $17, $6 > -; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill > +; MMR6-NEXT: not16 $16, $3 > +; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill > +; MMR6-NEXT: move $4, $6 > +; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill > ; MMR6-NEXT: sll16 $6, $6, 1 > -; MMR6-NEXT: sllv $6, $6, $5 > +; MMR6-NEXT: sllv $6, $6, $16 > ; MMR6-NEXT: or $8, $6, $2 > -; MMR6-NEXT: addiu $5, $3, -64 > -; MMR6-NEXT: srlv $9, $7, $5 > -; MMR6-NEXT: move $6, $4 > -; MMR6-NEXT: sll16 $2, $4, 1 > -; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill > -; MMR6-NEXT: not16 $16, $5 > +; MMR6-NEXT: addiu $6, $3, -64 > +; MMR6-NEXT: srlv $9, $5, $6 > +; MMR6-NEXT: sll16 $2, $7, 1 > +; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: not16 $16, $6 > ; MMR6-NEXT: sllv $10, $2, $16 > ; MMR6-NEXT: andi16 $16, $3, 32 > ; MMR6-NEXT: seleqz $8, $8, $16 > ; MMR6-NEXT: or $9, $10, $9 > -; MMR6-NEXT: srlv $10, $17, $3 > +; MMR6-NEXT: srlv $10, $4, $3 > ; MMR6-NEXT: selnez $11, $10, $16 > ; MMR6-NEXT: li16 $17, 64 > ; MMR6-NEXT: subu16 $2, $17, $3 > -; MMR6-NEXT: sllv $12, $7, $2 > -; MMR6-NEXT: move $17, $7 > +; MMR6-NEXT: sllv $12, $5, $2 > ; MMR6-NEXT: andi16 $4, $2, 32 > -; MMR6-NEXT: andi16 $7, $5, 32 > -; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill > -; MMR6-NEXT: seleqz $9, $9, $7 > +; MMR6-NEXT: andi16 $17, $6, 32 > +; MMR6-NEXT: seleqz $9, $9, $17 > ; MMR6-NEXT: seleqz $13, $12, $4 > ; MMR6-NEXT: or $8, $11, $8 > ; MMR6-NEXT: selnez $11, $12, $4 > -; MMR6-NEXT: sllv $12, $6, $2 > -; MMR6-NEXT: move $7, $6 > -; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: sllv $12, $7, $2 > ; MMR6-NEXT: not16 $2, $2 > -; MMR6-NEXT: srl16 $6, $17, 1 > +; MMR6-NEXT: srl16 $6, $5, 1 > ; MMR6-NEXT: srlv $2, $6, $2 > ; MMR6-NEXT: or $2, $12, $2 > ; MMR6-NEXT: seleqz $2, $2, $4 > -; MMR6-NEXT: srlv $4, $7, $5 > -; MMR6-NEXT: or $11, $11, $2 > -; MMR6-NEXT: or $5, $8, $13 > -; MMR6-NEXT: srlv $6, $17, $3 > -; 
MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload > -; MMR6-NEXT: selnez $7, $4, $2 > -; MMR6-NEXT: sltiu $8, $3, 64 > -; MMR6-NEXT: selnez $12, $5, $8 > -; MMR6-NEXT: or $7, $7, $9 > -; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload > +; MMR6-NEXT: addiu $4, $3, -64 > +; MMR6-NEXT: srlv $4, $7, $4 > +; MMR6-NEXT: or $12, $11, $2 > +; MMR6-NEXT: or $6, $8, $13 > +; MMR6-NEXT: srlv $5, $5, $3 > +; MMR6-NEXT: selnez $8, $4, $17 > +; MMR6-NEXT: sltiu $11, $3, 64 > +; MMR6-NEXT: selnez $13, $6, $11 > +; MMR6-NEXT: or $8, $8, $9 > ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload > -; MMR6-NEXT: sllv $9, $2, $5 > +; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > +; MMR6-NEXT: sllv $9, $6, $2 > ; MMR6-NEXT: seleqz $10, $10, $16 > -; MMR6-NEXT: li16 $5, 0 > -; MMR6-NEXT: or $10, $10, $11 > -; MMR6-NEXT: or $6, $9, $6 > -; MMR6-NEXT: seleqz $2, $7, $8 > -; MMR6-NEXT: seleqz $7, $5, $8 > -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload > -; MMR6-NEXT: srlv $9, $5, $3 > -; MMR6-NEXT: seleqz $11, $9, $16 > -; MMR6-NEXT: selnez $11, $11, $8 > +; MMR6-NEXT: li16 $2, 0 > +; MMR6-NEXT: or $10, $10, $12 > +; MMR6-NEXT: or $9, $9, $5 > +; MMR6-NEXT: seleqz $5, $8, $11 > +; MMR6-NEXT: seleqz $8, $2, $11 > +; MMR6-NEXT: srlv $7, $7, $3 > +; MMR6-NEXT: seleqz $2, $7, $16 > +; MMR6-NEXT: selnez $2, $2, $11 > ; MMR6-NEXT: seleqz $1, $1, $3 > -; MMR6-NEXT: or $2, $12, $2 > -; MMR6-NEXT: selnez $2, $2, $3 > -; MMR6-NEXT: or $5, $1, $2 > -; MMR6-NEXT: or $2, $7, $11 > -; MMR6-NEXT: seleqz $1, $6, $16 > -; MMR6-NEXT: selnez $6, $9, $16 > -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload > -; MMR6-NEXT: seleqz $9, $16, $3 > -; MMR6-NEXT: selnez $10, $10, $8 > -; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload > -; MMR6-NEXT: seleqz $4, $4, $16 > -; MMR6-NEXT: seleqz $4, $4, $8 > -; MMR6-NEXT: or $4, $10, $4 > +; MMR6-NEXT: or $5, $13, $5 > +; MMR6-NEXT: selnez $5, $5, $3 > +; MMR6-NEXT: or $5, $1, $5 > +; MMR6-NEXT: or $2, $8, $2 > +; MMR6-NEXT: seleqz $1, $9, $16 > +; MMR6-NEXT: 
selnez $6, $7, $16 > +; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload > +; MMR6-NEXT: seleqz $7, $7, $3 > +; MMR6-NEXT: selnez $9, $10, $11 > +; MMR6-NEXT: seleqz $4, $4, $17 > +; MMR6-NEXT: seleqz $4, $4, $11 > </cut>


[TCWG CI] 482.sphinx3 slowed down by 4% after gcc: tree-optimization/65206 - dependence analysis on mixed pointer/array

by ci_notify＠linaro.org

After gcc commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>

    tree-optimization/65206 - dependence analysis on mixed pointer/array

the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 4% from 20816 to 21661 perf samples

The reproducer instructions below can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
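
As a rough cross-check of the 4% figure reported above for 482.sphinx3, the slowdown can be recomputed from the two perf sample counts. The sketch below is only an illustration: slowdown_pct is a hypothetical helper, not part of the TCWG CI scripts, and rounding to the nearest whole percent is an assumption, since the report does not show the formula the CI actually uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of the TCWG CI scripts): percentage change
# between two perf sample counts, rounded to the nearest whole percent.
# Usage: slowdown_pct LAST_GOOD_SAMPLES FIRST_BAD_SAMPLES
slowdown_pct() {
    last_good=$1
    first_bad=$2
    # Integer arithmetic only: adding half the divisor before dividing
    # rounds to nearest (assumes first_bad >= last_good, i.e. a slowdown).
    echo $(( ((first_bad - last_good) * 100 + last_good / 2) / last_good ))
}

# 482.sphinx3: 20816 -> 21661 perf samples
slowdown_pct 20816 21661   # prints 4
```

The result is above the 2% threshold mentioned in the report, which is consistent with this benchmark being flagged.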
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-f92901a508305f291fcf2acae0825379477724de
cd investigate-gcc-f92901a508305f291fcf2acae0825379477724de

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach f92901a508305f291fcf2acae0825379477724de
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach abdf63d782cba82b5ecf264248518cbb065650ed
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>
Date:   Wed Sep 8 14:42:31 2021 +0200

    tree-optimization/65206 - dependence analysis on mixed pointer/array

    This adds the capability to analyze the dependence of mixed
    pointer/array accesses. The example is from where using a masked
    load/store creates the pointer-based access when an otherwise
    unconditional access is array based. Other examples would include
    accesses to an array mixed with accesses from inlined helpers
    that work on pointers.

    The idea is quite simple and old - analyze the data-ref indices
    as if the reference was pointer-based. The following change does
    this by changing dr_analyze_indices to work on the indices
    sub-structure and storing an alternate indices substructure in
    each data reference. That alternate set of indices is analyzed
    lazily by initialize_data_dependence_relation when it fails to
    match-up the main set of indices of two data references.
    initialize_data_dependence_relation is refactored into a head
    and a tail worker and changed to work on one of the indices
    structures and thus away from using DR_* access macros which
    continue to reference the main indices substructure.

    There are quite some vectorization and loop distribution opportunities
    unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
    510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
    544.nab_r see amendments in what they report with -fopt-info-loop while
    the rest of the specrate set sees no changes there. Measuring runtime
    for the set where changes were reported reveals nothing off-noise
    besides 511.povray_r which seems to regress slightly for me
    (on a Zen2 machine with -Ofast -march=native).

    2021-09-08 Richard Biener <rguenther(a)suse.de>

            PR tree-optimization/65206
            * tree-data-ref.h (struct data_reference): Add alt_indices,
            order it last.
            * tree-data-ref.c (free_data_ref): Release alt_indices.
            (dr_analyze_indices): Work on struct indices and get DR_REF as tree.
            (create_data_ref): Adjust.
            (initialize_data_dependence_relation): Split into head
            and tail. When the base objects fail to match up try
            again with pointer-based analysis of indices.
            * tree-vectorizer.c (vec_info_shared::check_datarefs): Do
            not compare the lazily computed alternate set of indices.

            * gcc.dg/torture/20210916.c: New testcase.
            * gcc.dg/vect/pr65206.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/20210916.c |  20 ++++
 gcc/testsuite/gcc.dg/vect/pr65206.c     |  22 ++++
 gcc/tree-data-ref.c                     | 174 +++++++++++++++++++++-----------
 gcc/tree-data-ref.h                     |   9 +-
 gcc/tree-vectorizer.c                   |   3 +-
 5 files changed, 168 insertions(+), 60 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
new file mode 100644
index 00000000000..0ea6d45e463
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20210916.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+typedef union tree_node *tree;
+struct tree_base {
+  unsigned : 1;
+  unsigned lang_flag_2 : 1;
+};
+struct tree_type {
+  tree main_variant;
+};
+union tree_node {
+  struct tree_base base;
+  struct tree_type type;
+};
+tree finish_struct_t, finish_struct_x;
+void finish_struct()
+{
+  for (; finish_struct_t->type.main_variant;)
+    finish_struct_x->base.lang_flag_2 = 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
new file mode 100644
index 00000000000..3b6262622c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+#define N 1024
+
+double a[N], b[N];
+
+void foo ()
+{
+  for (int i = 0; i < N; ++i)
+    if (b[i] < 3.)
+      a[i] += b[i];
+}
+
+/* We get a .MASK_STORE because while the load of a[i] does not trap
+   the store would introduce store data races.  Make sure we still
+   can handle the data dependence with zero distance.  */
+
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e061baa7c20..18307a554fc 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -99,6 +99,7 @@ along with GCC; see the file COPYING3. If not see
 #include "internal-fn.h"
 #include "vr-values.h"
 #include "range-op.h"
+#include "tree-ssa-loop-ivopts.h"
 
 static struct datadep_stats
 {
@@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
    DR, analyzed in LOOP and instantiated before NEST.  */
 
 static void
-dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
+dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
 {
-  vec<tree> access_fns = vNULL;
-  tree ref, op;
-  tree base, off, access_fn;
-
   /* If analyzing a basic-block there are no indices to analyze
      and thus no access functions.  */
   if (!nest)
     {
-      DR_BASE_OBJECT (dr) = DR_REF (dr);
-      DR_ACCESS_FNS (dr).create (0);
+      dri->base_object = ref;
+      dri->access_fns.create (0);
       return;
     }
 
-  ref = DR_REF (dr);
+  vec<tree> access_fns = vNULL;
 
   /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
      into a two element array with a constant index.  The base is
@@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
     {
       if (TREE_CODE (ref) == ARRAY_REF)
         {
-          op = TREE_OPERAND (ref, 1);
-          access_fn = analyze_scalar_evolution (loop, op);
+          tree op = TREE_OPERAND (ref, 1);
+          tree access_fn = analyze_scalar_evolution (loop, op);
           access_fn = instantiate_scev (nest, loop, access_fn);
           access_fns.safe_push (access_fn);
         }
@@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
      analyzed nest, add it as an additional independent access-function.  */
   if (TREE_CODE (ref) == MEM_REF)
     {
-      op = TREE_OPERAND (ref, 0);
-      access_fn = analyze_scalar_evolution (loop, op);
+      tree op = TREE_OPERAND (ref, 0);
+      tree access_fn = analyze_scalar_evolution (loop, op);
       access_fn = instantiate_scev (nest, loop, access_fn);
       if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
         {
-          tree orig_type;
           tree memoff = TREE_OPERAND (ref, 1);
-          base = initial_condition (access_fn);
-          orig_type = TREE_TYPE (base);
+          tree base = initial_condition (access_fn);
+          tree orig_type = TREE_TYPE (base);
           STRIP_USELESS_TYPE_CONVERSION (base);
+          tree off;
           split_constant_offset (base, &base, &off);
           STRIP_USELESS_TYPE_CONVERSION (base);
           /* Fold the MEM_REF offset into the evolutions initial
@@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
                                  base, memoff);
           MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
           MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
-          DR_UNCONSTRAINED_BASE (dr) = true;
+          dri->unconstrained_base = true;
           access_fns.safe_push (access_fn);
         }
     }
@@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
                     build_int_cst (reference_alias_ptr_type (ref), 0));
     }
 
-  DR_BASE_OBJECT (dr) = ref;
-  DR_ACCESS_FNS (dr) = access_fns;
+  dri->base_object = ref;
+  dri->access_fns = access_fns;
 }
 
 /* Extracts the alias analysis information from the memory reference DR.  */
@@ -1463,6 +1460,8 @@ void
 free_data_ref (data_reference_p dr)
 {
   DR_ACCESS_FNS (dr).release ();
+  if (dr->alt_indices.base_object)
+    dr->alt_indices.access_fns.release ();
   free (dr);
 }
 
@@ -1497,7 +1496,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
 
   dr_analyze_innermost (&DR_INNERMOST (dr), memref,
                         nest != NULL ? loop : NULL, stmt);
-  dr_analyze_indices (dr, nest, loop);
+  dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
   dr_analyze_alias (dr);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3066,41 +3065,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b)
                              TREE_TYPE (TREE_OPERAND (ref_b, 0)));
 }
 
-/* Initialize a data dependence relation between data accesses A and
-   B.  NB_LOOPS is the number of loops surrounding the references: the
-   size of the classic distance/direction vectors.  */
+/* Initialize a data dependence relation RES in LOOP_NEST.  USE_ALT_INDICES
+   is true when the main indices of A and B were not comparable so we try again
+   with alternate indices computed on an indirect reference.  */
 
 struct data_dependence_relation *
-initialize_data_dependence_relation (struct data_reference *a,
-                                     struct data_reference *b,
-                                     vec<loop_p> loop_nest)
+initialize_data_dependence_relation (struct data_dependence_relation *res,
+                                     vec<loop_p> loop_nest,
+                                     bool use_alt_indices)
 {
-  struct data_dependence_relation *res;
+  struct data_reference *a = DDR_A (res);
+  struct data_reference *b = DDR_B (res);
   unsigned int i;
 
-  res = XCNEW (struct data_dependence_relation);
-  DDR_A (res) = a;
-  DDR_B (res) = b;
-  DDR_LOOP_NEST (res).create (0);
-  DDR_SUBSCRIPTS (res).create (0);
-  DDR_DIR_VECTS (res).create (0);
-  DDR_DIST_VECTS (res).create (0);
-
-  if (a == NULL || b == NULL)
+  struct indices *indices_a = &a->indices;
+  struct indices *indices_b = &b->indices;
+  if (use_alt_indices)
     {
-      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
-      return res;
+      if (TREE_CODE (DR_REF (a)) != MEM_REF)
+        indices_a = &a->alt_indices;
+      if (TREE_CODE (DR_REF (b)) != MEM_REF)
+        indices_b = &b->alt_indices;
     }
-
-  /* If the data references do not alias, then they are independent.  */
-  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
-    {
-      DDR_ARE_DEPENDENT (res) = chrec_known;
-      return res;
-    }
-
-  unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
-  unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
+  unsigned int num_dimensions_a = indices_a->access_fns.length ();
+  unsigned int num_dimensions_b = indices_b->access_fns.length ();
   if (num_dimensions_a == 0 || num_dimensions_b == 0)
     {
       DDR_ARE_DEPENDENT (res) = chrec_dont_know;
@@ -3125,9 +3113,9 @@ initialize_data_dependence_relation (struct data_reference *a,
      the a and b accesses have a single ARRAY_REF component reference [0]
      but have two subscripts.  */
 
-  if (DR_UNCONSTRAINED_BASE (a))
+  if (indices_a->unconstrained_base)
     num_dimensions_a -= 1;
-  if (DR_UNCONSTRAINED_BASE (b))
+  if (indices_b->unconstrained_base)
     num_dimensions_b -= 1;
 
   /* These structures describe sequences of component references in
@@ -3210,6 +3198,10 @@ initialize_data_dependence_relation (struct data_reference *a,
         B: [3, 4] (i.e. s.e)  */
   while (index_a < num_dimensions_a && index_b < num_dimensions_b)
     {
+      /* The alternate indices form always has a single dimension
+         with unconstrained base.  */
+      gcc_assert (!use_alt_indices);
+
       /* REF_A and REF_B must be one of the component access types
          allowed by dr_analyze_indices.  */
       gcc_checking_assert (access_fn_component_p (ref_a));
@@ -3280,11 +3272,12 @@ initialize_data_dependence_relation (struct data_reference *a,
   /* See whether FULL_SEQ ends at the base and whether the two bases
      are equal.  We do not care about TBAA or alignment info so we can
      use OEP_ADDRESS_OF to avoid false negatives.  */
-  tree base_a = DR_BASE_OBJECT (a);
-  tree base_b = DR_BASE_OBJECT (b);
+  tree base_a = indices_a->base_object;
+  tree base_b = indices_b->base_object;
   bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
                       && full_seq.start_b + full_seq.length == num_dimensions_b
-                      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
+                      && (indices_a->unconstrained_base
+                          == indices_b->unconstrained_base)
                       && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
                       && (types_compatible_p (TREE_TYPE (base_a),
                                               TREE_TYPE (base_b))
@@ -3323,7 +3316,7 @@ initialize_data_dependence_relation (struct data_reference *a,
      both lvalues are distinct from the object's declared type.  */
   if (same_base_p)
     {
-      if (DR_UNCONSTRAINED_BASE (a))
+      if (indices_a->unconstrained_base)
         full_seq.length += 1;
     }
   else
@@ -3332,8 +3325,41 @@ initialize_data_dependence_relation (struct data_reference *a,
   /* Punt if we didn't find a suitable sequence.  */
   if (full_seq.length == 0)
     {
-      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
-      return res;
+      if (use_alt_indices
+          || (TREE_CODE (DR_REF (a)) == MEM_REF
+              && TREE_CODE (DR_REF (b)) == MEM_REF)
+          || may_be_nonaddressable_p (DR_REF (a))
+          || may_be_nonaddressable_p (DR_REF (b)))
+        {
+          /* Fully exhausted possibilities.  */
+          DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+          return res;
+        }
+
+      /* Try evaluating both DRs as dereferences of pointers.  */
+      if (!a->alt_indices.base_object
+          && TREE_CODE (DR_REF (a)) != MEM_REF)
+        {
+          tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
+                                 build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
+                                 build_int_cst
+                                   (reference_alias_ptr_type (DR_REF (a)), 0));
+          dr_analyze_indices (&a->alt_indices, alt_ref,
+                              loop_preheader_edge (loop_nest[0]),
+                              loop_containing_stmt (DR_STMT (a)));
+        }
+      if (!b->alt_indices.base_object
+          && TREE_CODE (DR_REF (b)) != MEM_REF)
+        {
+          tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
+                                 build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
+                                 build_int_cst
+                                   (reference_alias_ptr_type (DR_REF (b)), 0));
+          dr_analyze_indices (&b->alt_indices, alt_ref,
+                              loop_preheader_edge (loop_nest[0]),
+                              loop_containing_stmt (DR_STMT (b)));
+        }
+      return initialize_data_dependence_relation (res, loop_nest, true);
     }
 
   if (!same_base_p)
@@ -3381,8 +3407,8 @@ initialize_data_dependence_relation (struct data_reference *a,
       struct subscript *subscript;
 
       subscript = XNEW (struct subscript);
-      SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
-      SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
+      SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
+      SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
       SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
       SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
       SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
@@ -3393,6 +3419,40 @@ initialize_data_dependence_relation (struct data_reference *a,
   return res;
 }
 
+/* Initialize a data dependence relation between data accesses A and
+   B.  NB_LOOPS is the number of loops surrounding the references: the
+   size of the classic distance/direction vectors.  */
+
+struct data_dependence_relation *
+initialize_data_dependence_relation (struct data_reference *a,
+                                     struct data_reference *b,
+                                     vec<loop_p> loop_nest)
+{
+  data_dependence_relation *res = XCNEW (struct data_dependence_relation);
+  DDR_A (res) = a;
+  DDR_B (res) = b;
+  DDR_LOOP_NEST (res).create (0);
+  DDR_SUBSCRIPTS (res).create (0);
+  DDR_DIR_VECTS (res).create (0);
+  DDR_DIST_VECTS (res).create (0);
+
+  if (a == NULL || b == NULL)
+    {
+      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+      return res;
+    }
+
+  /* If the data references do not alias, then they are independent.  */
+  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
+    {
+      DDR_ARE_DEPENDENT (res) = chrec_known;
+      return res;
+    }
+
+  return initialize_data_dependence_relation (res, loop_nest, false);
+}
+
+
 /* Frees memory used by the conflict function F.  */
 
 static void
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 685f33d85ae..74f579c9f3f 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -166,14 +166,19 @@ struct data_reference
      and runs to completion.  */
   bool is_conditional_in_stmt;
 
+  /* Alias information for the data reference.  */
+  struct dr_alias alias;
+
   /* Behavior of the memory reference in the innermost loop.  */
   struct innermost_loop_behavior innermost;
 
   /* Subscripts of this data reference.  */
   struct indices indices;
 
-  /* Alias information for the data reference.  */
-  struct dr_alias alias;
+  /* Alternate subscripts initialized lazily and used by data-dependence
+     analysis only when the main indices of two DRs are not comparable.
+     Keep last to keep vec_info_shared::check_datarefs happy.
*/ + struct indices alt_indices; }; #define DR_STMT(DR) (DR)->stmt diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 3aa3e2a6783..20daa31187d 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -507,7 +507,8 @@ vec_info_shared::check_datarefs () return; gcc_assert (datarefs.length () == datarefs_copy.length ()); for (unsigned i = 0; i < datarefs.length (); ++i) - if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0) + if (memcmp (&datarefs_copy[i], datarefs[i], + offsetof (data_reference, alt_indices)) != 0) gcc_unreachable (); } </cut>
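
The last hunk of the commit above narrows the memcmp in vec_info_shared::check_datarefs from sizeof (data_reference) to offsetof (data_reference, alt_indices), so the lazily initialized trailing member is left out of the consistency check. A minimal standalone sketch of that prefix-comparison idea follows. The record type and the same_stable_prefix and prefix_compare_demo helpers are hypothetical stand-ins, not GCC code, and the struct is deliberately padding-free so that a byte-wise comparison is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for data_reference: members that must stay
// unchanged come first, the lazily filled member comes last.
struct record
{
  int id;
  int stride;
  int lazy_cache;  // filled on demand; excluded from the comparison
};

// Compare only the bytes that precede lazy_cache.
inline bool
same_stable_prefix (const record &a, const record &b)
{
  return std::memcmp (&a, &b, offsetof (record, lazy_cache)) == 0;
}

// Usage: a differing lazy_cache is ignored, a differing stride is not.
inline void
prefix_compare_demo ()
{
  record a = {1, 8, 0};
  record b = {1, 8, 42};
  assert (same_stable_prefix (a, b));
  b.stride = 16;
  assert (!same_stable_prefix (a, b));
}
```

This only works when the excluded member is declared last, which matches the "Keep last" note in the tree-data-ref.h comment.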

4 years, 9 months


[TCWG CI] Regression caused by gcc: Factor predidacte analysis out of tree-ssa-uninit.c into its own module.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Factor predidacte analysis out of tree-ssa-uninit.c into its own module.:
commit 94c12ffac234b29a702aa7b6730f2678265857c8
Author: Martin Sebor <msebor(a)redhat.com>

    Factor predidacte analysis out of tree-ssa-uninit.c into its own module.

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
6240
# First few build errors in logs:

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
6999
# linux build successful:
all
# linux boot successful:
boot

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_kernel/gnu-master-arm-mainline-defconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de…

Reproduce builds:
<cut>
mkdir investigate-gcc-94c12ffac234b29a702aa7b6730f2678265857c8
cd investigate-gcc-94c12ffac234b29a702aa7b6730f2678265857c8

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 94c12ffac234b29a702aa7b6730f2678265857c8
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 51166eb2c534692c3c7779def24f83c8c3811b98
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 94c12ffac234b29a702aa7b6730f2678265857c8
Author: Martin Sebor <msebor(a)redhat.com>
Date:   Fri Sep 17 15:39:13 2021 -0600

    Factor predidacte analysis out of tree-ssa-uninit.c into its own module.

    gcc/ChangeLog:

            * Makefile.in (OBJS): Add gimple-predicate-analysis.o.
            * tree-ssa-uninit.c (max_phi_args): Move to gimple-predicate-analysis.
            (MASK_SET_BIT, MASK_TEST_BIT, MASK_EMPTY): Same.
            (check_defs): Add comment.
            (can_skip_redundant_opnd): Update comment.
            (compute_uninit_opnds_pos): Adjust to namespace change.
            (find_pdom): Move to gimple-predicate-analysis.cc.
            (find_dom): Same.
            (struct uninit_undef_val_t): New.
            (is_non_loop_exit_postdominating): Move to gimple-predicate-analysis.cc.
            (find_control_equiv_block): Same.
            (MAX_NUM_CHAINS, MAX_CHAIN_LEN, MAX_POSTDOM_CHECK): Same.
            (MAX_SWITCH_CASES): Same.
            (compute_control_dep_chain): Same.
            (find_uninit_use): Use predicate analyzer.
            (struct pred_info): Move to gimple-predicate-analysis.
            (convert_control_dep_chain_into_preds): Same.
            (find_predicates): Same.
            (collect_phi_def_edges): Same.
            (warn_uninitialized_phi): Use predicate analyzer.
            (find_def_preds): Move to gimple-predicate-analysis.
            (dump_pred_info): Same.
            (dump_pred_chain): Same.
            (dump_predicates): Same.
            (destroy_predicate_vecs): Remove.
            (execute_late_warn_uninitialized): New.
            (get_cmp_code): Move to gimple-predicate-analysis.
            (is_value_included_in): Same.
            (value_sat_pred_p): Same.
            (find_matching_predicate_in_rest_chains): Same.
            (is_use_properly_guarded): Same.
            (prune_uninit_phi_opnds): Same.
            (find_var_cmp_const): Same.
            (use_pred_not_overlap_with_undef_path_pred): Same.
            (pred_equal_p): Same.
            (is_neq_relop_p): Same.
            (is_neq_zero_form_p): Same.
            (pred_expr_equal_p): Same.
            (is_pred_expr_subset_of): Same.
            (is_pred_chain_subset_of): Same.
            (is_included_in): Same.
            (is_superset_of): Same.
            (pred_neg_p): Same.
            (simplify_pred): Same.
            (simplify_preds_2): Same.
            (simplify_preds_3): Same.
            (simplify_preds_4): Same.
            (simplify_preds): Same.
            (push_pred): Same.
            (push_to_worklist): Same.
            (get_pred_info_from_cmp): Same.
            (is_degenerated_phi): Same.
            (normalize_one_pred_1): Same.
            (normalize_one_pred): Same.
            (normalize_one_pred_chain): Same.
            (normalize_preds): Same.
            (can_one_predicate_be_invalidated_p): Same.
            (can_chain_union_be_invalidated_p): Same.
            (uninit_uses_cannot_happen): Same.
            (pass_late_warn_uninitialized::execute): Define.
            * gimple-predicate-analysis.cc: New file.
            * gimple-predicate-analysis.h: New file.
---
 gcc/Makefile.in                  |    1 +
 gcc/gimple-predicate-analysis.cc | 2400 +++++++++++++++++++++++++++++++++++++
 gcc/gimple-predicate-analysis.h  |  158 +++
 gcc/tree-ssa-uninit.c            | 2431 +++-----------------------------------
 4 files changed, 2741 insertions(+), 2249 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index b8229adf580..f36ffa4740b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1394,6 +1394,7 @@ OBJS = \
     gimple-loop-jam.o \
     gimple-loop-versioning.o \
     gimple-low.o \
+    gimple-predicate-analysis.o \
     gimple-pretty-print.o \
     gimple-range.o \
     gimple-range-cache.o \
diff --git a/gcc/gimple-predicate-analysis.cc b/gcc/gimple-predicate-analysis.cc
new file mode 100644
index 00000000000..3404f2d630a
--- /dev/null
+++ b/gcc/gimple-predicate-analysis.cc
@@ -0,0 +1,2400 @@
+/* Support for simple predicate analysis.
+
+   Copyright (C) 2001-2021 Free Software Foundation, Inc.
+ Contributed by Xinliang David Li <davidxl(a)google.com> + Generalized by Martin Sebor <msebor(a)redhat.com> + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#define INCLUDE_STRING +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "tree.h" +#include "gimple.h" +#include "tree-pass.h" +#include "ssa.h" +#include "gimple-pretty-print.h" +#include "diagnostic-core.h" +#include "fold-const.h" +#include "gimple-iterator.h" +#include "tree-ssa.h" +#include "tree-cfg.h" +#include "cfghooks.h" +#include "attribs.h" +#include "builtins.h" +#include "calls.h" +#include "value-query.h" + +#include "gimple-predicate-analysis.h" + +#define DEBUG_PREDICATE_ANALYZER 1 + +/* Find the immediate postdominator of the specified basic block BB. */ + +static inline basic_block +find_pdom (basic_block bb) +{ + basic_block exit_bb = EXIT_BLOCK_PTR_FOR_FN (cfun); + if (bb == exit_bb) + return exit_bb; + + if (basic_block pdom = get_immediate_dominator (CDI_POST_DOMINATORS, bb)) + return pdom; + + return exit_bb; +} + +/* Find the immediate dominator of the specified basic block BB. 
*/ + +static inline basic_block +find_dom (basic_block bb) +{ + basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun); + if (bb == entry_bb) + return entry_bb; + + if (basic_block dom = get_immediate_dominator (CDI_DOMINATORS, bb)) + return dom; + + return entry_bb; +} + +/* Return true if BB1 is postdominating BB2 and BB1 is not a loop exit + bb. The loop exit bb check is simple and does not cover all cases. */ + +static bool +is_non_loop_exit_postdominating (basic_block bb1, basic_block bb2) +{ + if (!dominated_by_p (CDI_POST_DOMINATORS, bb2, bb1)) + return false; + + if (single_pred_p (bb1) && !single_succ_p (bb2)) + return false; + + return true; +} + +/* Find BB's closest postdominator that is its control equivalent (i.e., + that's controlled by the same predicate). */ + +static inline basic_block +find_control_equiv_block (basic_block bb) +{ + basic_block pdom = find_pdom (bb); + + /* Skip the postdominating bb that is also a loop exit. */ + if (!is_non_loop_exit_postdominating (pdom, bb)) + return NULL; + + /* If the postdominator is dominated by BB, return it. */ + if (dominated_by_p (CDI_DOMINATORS, pdom, bb)) + return pdom; + + return NULL; +} + +/* Return true if X1 is the negation of X2. */ + +static inline bool +pred_neg_p (const pred_info &x1, const pred_info &x2) +{ + if (!operand_equal_p (x1.pred_lhs, x2.pred_lhs, 0) + || !operand_equal_p (x1.pred_rhs, x2.pred_rhs, 0)) + return false; + + tree_code c1 = x1.cond_code, c2; + if (x1.invert == x2.invert) + c2 = invert_tree_comparison (x2.cond_code, false); + else + c2 = x2.cond_code; + + return c1 == c2; +} + +/* Return whether the condition (VAL CMPC BOUNDARY) is true. */ + +static bool +is_value_included_in (tree val, tree boundary, tree_code cmpc) +{ + /* Only handle integer constant here. 
*/ + if (TREE_CODE (val) != INTEGER_CST || TREE_CODE (boundary) != INTEGER_CST) + return true; + + bool inverted = false; + if (cmpc == GE_EXPR || cmpc == GT_EXPR || cmpc == NE_EXPR) + { + cmpc = invert_tree_comparison (cmpc, false); + inverted = true; + } + + bool result; + if (cmpc == EQ_EXPR) + result = tree_int_cst_equal (val, boundary); + else if (cmpc == LT_EXPR) + result = tree_int_cst_lt (val, boundary); + else + { + gcc_assert (cmpc == LE_EXPR); + result = tree_int_cst_le (val, boundary); + } + + if (inverted) + result ^= 1; + + return result; +} + +/* Format the vector of edges EV as a string. */ + +static std::string +format_edge_vec (const vec<edge> &ev) +{ + std::string str; + + unsigned n = ev.length (); + for (unsigned i = 0; i < n; ++i) + { + char es[32]; + const_edge e = ev[i]; + sprintf (es, "%u", e->src->index); + str += es; + if (i + 1 < n) + str += " -> "; + } + return str; +} + +/* Format the first N elements of the array of vector of edges EVA as + a string. */ + +static std::string +format_edge_vecs (const vec<edge> eva[], unsigned n) +{ + std::string str; + + for (unsigned i = 0; i != n; ++i) + { + str += '{'; + str += format_edge_vec (eva[i]); + str += '}'; + if (i + 1 < n) + str += ", "; + } + return str; +} + +/* Dump a single pred_info to DUMP_FILE. */ + +static void +dump_pred_info (const pred_info &pred) +{ + if (pred.invert) + fprintf (dump_file, "NOT ("); + print_generic_expr (dump_file, pred.pred_lhs); + fprintf (dump_file, " %s ", op_symbol_code (pred.cond_code)); + print_generic_expr (dump_file, pred.pred_rhs); + if (pred.invert) + fputc (')', dump_file); +} + +/* Dump a pred_chain to DUMP_FILE. 
*/ + +static void +dump_pred_chain (const pred_chain &chain) +{ + unsigned np = chain.length (); + if (np > 1) + fprintf (dump_file, "AND ("); + + for (unsigned j = 0; j < np; j++) + { + dump_pred_info (chain[j]); + if (j < np - 1) + fprintf (dump_file, ", "); + else if (j > 0) + fputc (')', dump_file); + } +} + +/* Dump the predicate chain PREDS for STMT, prefixed by MSG. */ + +static void +dump_predicates (gimple *stmt, const pred_chain_union &preds, const char *msg) +{ + fprintf (dump_file, "%s", msg); + if (stmt) + { + print_gimple_stmt (dump_file, stmt, 0); + fprintf (dump_file, "is guarded by:\n"); + } + + unsigned np = preds.length (); + if (np > 1) + fprintf (dump_file, "OR ("); + for (unsigned i = 0; i < np; i++) + { + dump_pred_chain (preds[i]); + if (i < np - 1) + fprintf (dump_file, ", "); + else if (i > 0) + fputc (')', dump_file); + } + fputc ('\n', dump_file); +} + +/* Dump the first NCHAINS elements of the DEP_CHAINS array into DUMP_FILE. */ + +static void +dump_dep_chains (const auto_vec<edge> dep_chains[], unsigned nchains) +{ + if (!dump_file) + return; + + for (unsigned i = 0; i != nchains; ++i) + { + const auto_vec<edge> &v = dep_chains[i]; + unsigned n = v.length (); + for (unsigned j = 0; j != n; ++j) + { + fprintf (dump_file, "%u", v[j]->src->index); + if (j + 1 < n) + fprintf (dump_file, " -> "); + } + fputc ('\n', dump_file); + } +} + +/* Return the 'normalized' conditional code with operand swapping + and condition inversion controlled by SWAP_COND and INVERT. 
*/ + +static tree_code +get_cmp_code (tree_code orig_cmp_code, bool swap_cond, bool invert) +{ + tree_code tc = orig_cmp_code; + + if (swap_cond) + tc = swap_tree_comparison (orig_cmp_code); + if (invert) + tc = invert_tree_comparison (tc, false); + + switch (tc) + { + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: + break; + default: + return ERROR_MARK; + } + return tc; +} + +/* Return true if PRED is common among all predicate chains in PREDS + (and therefore can be factored out). */ + +static bool +find_matching_predicate_in_rest_chains (const pred_info &pred, + const pred_chain_union &preds) +{ + /* Trival case. */ + if (preds.length () == 1) + return true; + + for (unsigned i = 1; i < preds.length (); i++) + { + bool found = false; + const pred_chain &chain = preds[i]; + unsigned n = chain.length (); + for (unsigned j = 0; j < n; j++) + { + const pred_info &pred2 = chain[j]; + /* Can relax the condition comparison to not use address + comparison. However, the most common case is that + multiple control dependent paths share a common path + prefix, so address comparison should be ok. */ + if (operand_equal_p (pred2.pred_lhs, pred.pred_lhs, 0) + && operand_equal_p (pred2.pred_rhs, pred.pred_rhs, 0) + && pred2.invert == pred.invert) + { + found = true; + break; + } + } + if (!found) + return false; + } + return true; +} + +/* Find a predicate to examine against paths of interest. If there + is no predicate of the "FLAG_VAR CMP CONST" form, try to find one + of that's the form "FLAG_VAR CMP FLAG_VAR" with value range info. + PHI is the phi node whose incoming (interesting) paths need to be + examined. On success, return the comparison code, set defintion + gimple of FLAG_DEF and BOUNDARY_CST. Otherwise return ERROR_MARK. 
*/ + +static tree_code +find_var_cmp_const (pred_chain_union preds, gphi *phi, gimple **flag_def, + tree *boundary_cst) +{ + tree_code vrinfo_code = ERROR_MARK; + gimple *vrinfo_def = NULL; + tree vrinfo_cst = NULL; + + gcc_assert (preds.length () > 0); + pred_chain chain = preds[0]; + for (unsigned i = 0; i < chain.length (); i++) + { + bool use_vrinfo_p = false; + const pred_info &pred = chain[i]; + tree cond_lhs = pred.pred_lhs; + tree cond_rhs = pred.pred_rhs; + if (cond_lhs == NULL_TREE || cond_rhs == NULL_TREE) + continue; + + tree_code code = get_cmp_code (pred.cond_code, false, pred.invert); + if (code == ERROR_MARK) + continue; + + /* Convert to the canonical form SSA_NAME CMP CONSTANT. */ + if (TREE_CODE (cond_lhs) == SSA_NAME + && is_gimple_constant (cond_rhs)) + ; + else if (TREE_CODE (cond_rhs) == SSA_NAME + && is_gimple_constant (cond_lhs)) + { + std::swap (cond_lhs, cond_rhs); + if ((code = get_cmp_code (code, true, false)) == ERROR_MARK) + continue; + } + /* Check if we can take advantage of FLAG_VAR COMP FLAG_VAR predicate + with value range info. Note only first of such case is handled. */ + else if (vrinfo_code == ERROR_MARK + && TREE_CODE (cond_lhs) == SSA_NAME + && TREE_CODE (cond_rhs) == SSA_NAME) + { + gimple* lhs_def = SSA_NAME_DEF_STMT (cond_lhs); + if (!lhs_def || gimple_code (lhs_def) != GIMPLE_PHI + || gimple_bb (lhs_def) != gimple_bb (phi)) + { + std::swap (cond_lhs, cond_rhs); + if ((code = get_cmp_code (code, true, false)) == ERROR_MARK) + continue; + } + + /* Check value range info of rhs, do following transforms: + flag_var < [min, max] -> flag_var < max + flag_var > [min, max] -> flag_var > min + + We can also transform LE_EXPR/GE_EXPR to LT_EXPR/GT_EXPR: + flag_var <= [min, max] -> flag_var < [min, max+1] + flag_var >= [min, max] -> flag_var > [min-1, max] + if no overflow/wrap. 
*/ + tree type = TREE_TYPE (cond_lhs); + value_range r; + if (!INTEGRAL_TYPE_P (type) + || !get_range_query (cfun)->range_of_expr (r, cond_rhs) + || r.kind () != VR_RANGE) + continue; + + wide_int min = r.lower_bound (); + wide_int max = r.upper_bound (); + if (code == LE_EXPR + && max != wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type))) + { + code = LT_EXPR; + max = max + 1; + } + if (code == GE_EXPR + && min != wi::min_value (TYPE_PRECISION (type), TYPE_SIGN (type))) + { + code = GT_EXPR; + min = min - 1; + } + if (code == LT_EXPR) + cond_rhs = wide_int_to_tree (type, max); + else if (code == GT_EXPR) + cond_rhs = wide_int_to_tree (type, min); + else + continue; + + use_vrinfo_p = true; + } + else + continue; + + if ((*flag_def = SSA_NAME_DEF_STMT (cond_lhs)) == NULL) + continue; + + if (gimple_code (*flag_def) != GIMPLE_PHI + || gimple_bb (*flag_def) != gimple_bb (phi) + || !find_matching_predicate_in_rest_chains (pred, preds)) + continue; + + /* Return if any "flag_var comp const" predicate is found. */ + if (!use_vrinfo_p) + { + *boundary_cst = cond_rhs; + return code; + } + /* Record if any "flag_var comp flag_var[vinfo]" predicate is found. */ + else if (vrinfo_code == ERROR_MARK) + { + vrinfo_code = code; + vrinfo_def = *flag_def; + vrinfo_cst = cond_rhs; + } + } + /* Return the "flag_var cmp flag_var[vinfo]" predicate we found. */ + if (vrinfo_code != ERROR_MARK) + { + *flag_def = vrinfo_def; + *boundary_cst = vrinfo_cst; + } + return vrinfo_code; +} + +/* Return true if all interesting opnds are pruned, false otherwise. 
+ PHI is the phi node with interesting operands, OPNDS is the bitmap + of the interesting operand positions, FLAG_DEF is the statement + defining the flag guarding the use of the PHI output, BOUNDARY_CST + is the const value used in the predicate associated with the flag, + CMP_CODE is the comparison code used in the predicate, VISITED_PHIS + is the pointer set of phis visited, and VISITED_FLAG_PHIS is + the pointer to the pointer set of flag definitions that are also + phis. + + Example scenario: + + BB1: + flag_1 = phi <0, 1> // (1) + var_1 = phi <undef, some_val> + + + BB2: + flag_2 = phi <0, flag_1, flag_1> // (2) + var_2 = phi <undef, var_1, var_1> + if (flag_2 == 1) + goto BB3; + + BB3: + use of var_2 // (3) + + Because some flag arg in (1) is not constant, if we do not look into + the flag phis recursively, it is conservatively treated as unknown and + var_1 is thought to flow into use at (3). Since var_1 is potentially + uninitialized a false warning will be emitted. + Checking recursively into (1), the compiler can find out that only + some_val (which is defined) can flow into (3) which is OK. */ + +static bool +prune_phi_opnds (gphi *phi, unsigned opnds, gphi *flag_def, + tree boundary_cst, tree_code cmp_code, + predicate::func_t &eval, + hash_set<gphi *> *visited_phis, + bitmap *visited_flag_phis) +{ + /* The Boolean predicate guarding the PHI definition. Initialized + lazily from PHI in the first call to is_use_guarded() and cached + for subsequent iterations. 
*/ + predicate def_preds (eval); + + unsigned n = MIN (eval.max_phi_args, gimple_phi_num_args (flag_def)); + for (unsigned i = 0; i < n; i++) + { + if (!MASK_TEST_BIT (opnds, i)) + continue; + + tree flag_arg = gimple_phi_arg_def (flag_def, i); + if (!is_gimple_constant (flag_arg)) + { + if (TREE_CODE (flag_arg) != SSA_NAME) + return false; + + gphi *flag_arg_def = dyn_cast<gphi *> (SSA_NAME_DEF_STMT (flag_arg)); + if (!flag_arg_def) + return false; + + tree phi_arg = gimple_phi_arg_def (phi, i); + if (TREE_CODE (phi_arg) != SSA_NAME) + return false; + + gphi *phi_arg_def = dyn_cast<gphi *> (SSA_NAME_DEF_STMT (phi_arg)); + if (!phi_arg_def) + return false; + + if (gimple_bb (phi_arg_def) != gimple_bb (flag_arg_def)) + return false; + + if (!*visited_flag_phis) + *visited_flag_phis = BITMAP_ALLOC (NULL); + + tree phi_result = gimple_phi_result (flag_arg_def); + if (bitmap_bit_p (*visited_flag_phis, SSA_NAME_VERSION (phi_result))) + return false; + + bitmap_set_bit (*visited_flag_phis, SSA_NAME_VERSION (phi_result)); + + /* Now recursively try to prune the interesting phi args. */ + unsigned opnds_arg_phi = eval.phi_arg_set (phi_arg_def); + if (!prune_phi_opnds (phi_arg_def, opnds_arg_phi, flag_arg_def, + boundary_cst, cmp_code, eval, visited_phis, + visited_flag_phis)) + return false; + + bitmap_clear_bit (*visited_flag_phis, SSA_NAME_VERSION (phi_result)); + continue; + } + + /* Now check if the constant is in the guarded range. */ + if (is_value_included_in (flag_arg, boundary_cst, cmp_code)) + { + /* Now that we know that this undefined edge is not pruned. + If the operand is defined by another phi, we can further + prune the incoming edges of that phi by checking + the predicates of this operands. 
*/ + + tree opnd = gimple_phi_arg_def (phi, i); + gimple *opnd_def = SSA_NAME_DEF_STMT (opnd); + if (gphi *opnd_def_phi = dyn_cast <gphi *> (opnd_def)) + { + unsigned opnds2 = eval.phi_arg_set (opnd_def_phi); + if (!MASK_EMPTY (opnds2)) + { + edge opnd_edge = gimple_phi_arg_edge (phi, i); + if (def_preds.is_use_guarded (phi, opnd_edge->src, + opnd_def_phi, opnds2, + visited_phis)) + return false; + } + } + else + return false; + } + } + + return true; +} + +/* Recursively compute the set PHI's incoming edges with "uninteresting" + operands of a phi chain, i.e., those for which EVAL returns false. + CD_ROOT is the control dependence root from which edges are collected + up the CFG nodes that it's dominated by. *EDGES holds the result, and + VISITED is used for detecting cycles. */ + +static void +collect_phi_def_edges (gphi *phi, basic_block cd_root, auto_vec<edge> *edges, + predicate::func_t &eval, hash_set<gimple *> *visited) +{ + if (visited->elements () == 0 + && DEBUG_PREDICATE_ANALYZER + && dump_file) + { + fprintf (dump_file, "%s for cd_root %u and ", + __func__, cd_root->index); + print_gimple_stmt (dump_file, phi, 0); + + } + + if (visited->add (phi)) + return; + + unsigned n = gimple_phi_num_args (phi); + for (unsigned i = 0; i < n; i++) + { + edge opnd_edge = gimple_phi_arg_edge (phi, i); + tree opnd = gimple_phi_arg_def (phi, i); + + if (TREE_CODE (opnd) == SSA_NAME) + { + gimple *def = SSA_NAME_DEF_STMT (opnd); + + if (gimple_code (def) == GIMPLE_PHI + && dominated_by_p (CDI_DOMINATORS, gimple_bb (def), cd_root)) + collect_phi_def_edges (as_a<gphi *> (def), cd_root, edges, eval, + visited); + else if (!eval (opnd)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + "\tFound def edge %i -> %i for cd_root %i " + "and operand %u of: ", + opnd_edge->src->index, opnd_edge->dest->index, + cd_root->index, i); + print_gimple_stmt (dump_file, phi, 0); + } + edges->safe_push (opnd_edge); + } + } + else + { + if (dump_file && 
(dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + "\tFound def edge %i -> %i for cd_root %i " + "and operand %u of: ", + opnd_edge->src->index, opnd_edge->dest->index, + cd_root->index, i); + print_gimple_stmt (dump_file, phi, 0); + } + + if (!eval (opnd)) + edges->safe_push (opnd_edge); + } + } +} + +/* Return an expression corresponding to the predicate PRED. */ + +static tree +build_pred_expr (const pred_info &pred) +{ + tree_code cond_code = pred.cond_code; + tree lhs = pred.pred_lhs; + tree rhs = pred.pred_rhs; + + if (pred.invert) + cond_code = invert_tree_comparison (cond_code, false); + + return build2 (cond_code, TREE_TYPE (lhs), lhs, rhs); +} + +/* Return an expression corresponding to PREDS. */ + +static tree +build_pred_expr (const pred_chain_union &preds, bool invert = false) +{ + tree_code code = invert ? TRUTH_AND_EXPR : TRUTH_OR_EXPR; + tree_code subcode = invert ? TRUTH_OR_EXPR : TRUTH_AND_EXPR; + + tree expr = NULL_TREE; + for (unsigned i = 0; i != preds.length (); ++i) + { + tree subexpr = NULL_TREE; + for (unsigned j = 0; j != preds[i].length (); ++j) + { + const pred_info &pi = preds[i][j]; + tree cond = build_pred_expr (pi); + if (invert) + cond = invert_truthvalue (cond); + subexpr = subexpr ? build2 (subcode, boolean_type_node, + subexpr, cond) : cond; + } + if (expr) + expr = build2 (code, boolean_type_node, expr, subexpr); + else + expr = subexpr; + } + + return expr; +} + +/* Return a bitset of all PHI arguments or zero if there are too many. */ + +unsigned +predicate::func_t::phi_arg_set (gphi *phi) +{ + unsigned n = gimple_phi_num_args (phi); + + if (max_phi_args < n) + return 0; + + /* Set the least significant N bits. */ + return (1U << n) - 1; +} + +/* Determine if the predicate set of the use does not overlap with that + of the interesting paths. The most common senario of guarded use is + in Example 1: + Example 1: + if (some_cond) + { + x = ...; // set x to valid + flag = true; + } + + ... some code ... 
+ + if (flag) + use (x); // use when x is valid + + The real world examples are usually more complicated, but similar + and usually result from inlining: + + bool init_func (int * x) + { + if (some_cond) + return false; + *x = ...; // set *x to valid + return true; + } + + void foo (..) + { + int x; + + if (!init_func (&x)) + return; + + .. some_code ... + use (x); // use when x is valid + } + + Another possible use scenario is in the following trivial example: + + Example 2: + if (n > 0) + x = 1; + ... + if (n > 0) + { + if (m < 2) + ... = x; + } + + Predicate analysis needs to compute the composite predicate: + + 1) 'x' use predicate: (n > 0) .AND. (m < 2) + 2) 'x' default value (non-def) predicate: .NOT. (n > 0) + (the predicate chain for phi operand defs can be computed + starting from a bb that is control equivalent to the phi's + bb and is dominating the operand def.) + + and check overlapping: + (n > 0) .AND. (m < 2) .AND. (.NOT. (n > 0)) + <==> false + + This implementation provides a framework that can handle different + scenarios. (Note that many simple cases are handled properly without + the predicate analysis if jump threading eliminates the merge point + thus makes path-sensitive analysis unnecessary.) + + PHI is the phi node whose incoming (undefined) paths need to be + pruned, and OPNDS is the bitmap holding interesting operand + positions. VISITED is the pointer set of phi stmts being + checked. */ + +bool +predicate::overlap (gphi *phi, unsigned opnds, hash_set<gphi *> *visited) +{ + gimple *flag_def = NULL; + tree boundary_cst = NULL_TREE; + bitmap visited_flag_phis = NULL; + + /* Find within the common prefix of multiple predicate chains + a predicate that is a comparison of a flag variable against + a constant. 
*/ + tree_code cmp_code = find_var_cmp_const (m_preds, phi, &flag_def, + &boundary_cst); + if (cmp_code == ERROR_MARK) + return true; + + /* Now check all the uninit incoming edges have a constant flag + value that is in conflict with the use guard/predicate. */ + gphi *phi_def = as_a<gphi *> (flag_def); + bool all_pruned = prune_phi_opnds (phi, opnds, phi_def, boundary_cst, + cmp_code, m_eval, visited, + &visited_flag_phis); + + if (visited_flag_phis) + BITMAP_FREE (visited_flag_phis); + + return !all_pruned; +} + +/* Return true if two predicates PRED1 and X2 are equivalent. Assume + the expressions have already properly re-associated. */ + +static inline bool +pred_equal_p (const pred_info &pred1, const pred_info &pred2) +{ + if (!operand_equal_p (pred1.pred_lhs, pred2.pred_lhs, 0) + || !operand_equal_p (pred1.pred_rhs, pred2.pred_rhs, 0)) + return false; + + tree_code c1 = pred1.cond_code, c2; + if (pred1.invert != pred2.invert + && TREE_CODE_CLASS (pred2.cond_code) == tcc_comparison) + c2 = invert_tree_comparison (pred2.cond_code, false); + else + c2 = pred2.cond_code; + + return c1 == c2; +} + +/* Return true if PRED tests inequality (i.e., X != Y). */ + +static inline bool +is_neq_relop_p (const pred_info &pred) +{ + + return ((pred.cond_code == NE_EXPR && !pred.invert) + || (pred.cond_code == EQ_EXPR && pred.invert)); +} + +/* Returns true if PRED is of the form X != 0. */ + +static inline bool +is_neq_zero_form_p (const pred_info &pred) +{ + if (!is_neq_relop_p (pred) || !integer_zerop (pred.pred_rhs) + || TREE_CODE (pred.pred_lhs) != SSA_NAME) + return false; + return true; +} + +/* Return true if PRED is equivalent to X != 0. */ + +static inline bool +pred_expr_equal_p (const pred_info &pred, tree expr) +{ + if (!is_neq_zero_form_p (pred)) + return false; + + return operand_equal_p (pred.pred_lhs, expr, 0); +} + +/* Return true if VAL satisfies (x CMPC BOUNDARY) predicate. 
CMPC can + be either one of the range comparison codes ({GE,LT,EQ,NE}_EXPR and + the like), or BIT_AND_EXPR. EXACT_P is only meaningful for the latter. + Modify the question from VAL & BOUNDARY != 0 to VAL & BOUNDARY == VAL. + For other values of CMPC, EXACT_P is ignored. */ + +static bool +value_sat_pred_p (tree val, tree boundary, tree_code cmpc, + bool exact_p = false) +{ + if (cmpc != BIT_AND_EXPR) + return is_value_included_in (val, boundary, cmpc); + + wide_int andw = wi::to_wide (val) & wi::to_wide (boundary); + if (exact_p) + return andw == wi::to_wide (val); + + return andw.to_uhwi (); +} + +/* Return true if the domain of single predicate expression PRED1 + is a subset of that of PRED2, and false if it cannot be proved. */ + +static bool +subset_of (const pred_info &pred1, const pred_info &pred2) +{ + if (pred_equal_p (pred1, pred2)) + return true; + </cut>
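For readers following the quoted diff, the BIT_AND_EXPR branch of value_sat_pred_p can be restated on plain integers. This is only a sketch under stated assumptions: the name bit_and_sat is made up for illustration, and std::uint64_t stands in for GCC's tree and wide_int types, so it is not the GCC code itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the BIT_AND_EXPR branch of value_sat_pred_p.
// Plain 64-bit integers replace GCC's tree / wide_int operands.
inline bool bit_and_sat(std::uint64_t val, std::uint64_t boundary,
                        bool exact_p = false)
{
    const std::uint64_t andw = val & boundary;
    if (exact_p)
        return andw == val;  // every bit set in val is also set in boundary
    return andw != 0;        // val and boundary share at least one set bit
}
```

With val = 0x6 and boundary = 0x4 the default question (val & boundary != 0) holds, while the exact question (val & boundary == val) does not, because bit 1 of val is missing from boundary.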

4 years, 9 months


clang-aarch64-full-2stage buildbot timeout

by Florian Hahn

Hi,

It looks like a lot of the recent builds of clang-aarch64-full-2stage are timing out, e.g.:

https://lab.llvm.org/buildbot/#/builders/179/builds/1078 (while checking out sources)
https://lab.llvm.org/buildbot/#/builders/179/builds/1076 (while building stage2)

Is there anything that could be done to avoid such timeouts and the false-positive failure emails they cause?

Cheers,
Florian

4 years, 9 months


[TCWG CI] 403.gcc grew in size by 2% after llvm: Turn on the new pass manager by default

by ci_notify＠linaro.org

After llvm commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>

    Turn on the new pass manager by default

the following benchmarks grew in size by more than 1%:
- 403.gcc grew in size by 2% from 2586180 to 2648252 bytes

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date:   Mon Jan 25 11:00:56 2021 -0800

    Turn on the new pass manager by default

    This turns on the new pass manager by default for the optimization
    pipeline in Clang and ThinLTO in various LLD backends. This also makes
    uses of `opt -instcombine` use the new pass manager (unless specifically
    opted out).

    This does not affect the backend target-dependent codegen pipeline.

    If this causes regressions, you can opt out of the new pass manager
    either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
    while building LLVM, or via various compiler flags, e.g.
    -flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
    ELF LLD. Please file bugs for any regressions.

    Major differences:
    * The inliner works slightly differently
    * -O1 does some amount of inlining
    * LCSSA and LoopSimplify are run before all loop passes
    * Loop unswitching is implemented slightly differently
    * A new SpeculateAroundPHIs pass is added to the pipeline

    https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html

    Reviewed By: asbirlea, ychen, MaskRay, echristo

    Differential Revision: https://reviews.llvm.org/D95380
---
 llvm/CMakeLists.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
 endif()
 option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})

-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
-  "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+  "Enable the new pass manager by default.")

 include(HandleLLVMOptions)
</cut>

4 years, 9 months


[TCWG CI] Regression caused by linux: parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled

by ci_notify＠linaro.org

[TCWG CI] Regression caused by linux: parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled:
commit 9caea0007601d3bc6debec04f8b4cd6f4c2394be
Author: Helge Deller <deller(a)gmx.de>

    parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_llvm: -5
# build_abe qemu: -2
# linux_n_obj: 37
# First few build errors in logs:

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_llvm: -5
# build_abe qemu: -2
# linux_n_obj: 20151

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Even more details: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…

Reproduce builds:
<cut>
mkdir investigate-linux-9caea0007601d3bc6debec04f8b4cd6f4c2394be
cd investigate-linux-9caea0007601d3bc6debec04f8b4cd6f4c2394be

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach 9caea0007601d3bc6debec04f8b4cd6f4c2394be
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 31ad37bd6faf871c070650f72ac9488ceeeceeb0
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 9caea0007601d3bc6debec04f8b4cd6f4c2394be
Author: Helge Deller <deller(a)gmx.de>
Date:   Sun Sep 19 10:36:09 2021 -0700

    parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled

    Linus noticed odd declaration rules for pci_iounmap() in iomap.h and
    pci_iomap.h, where it dependend on either NO_GENERIC_PCI_IOPORT_MAP or
    GENERIC_IOMAP when CONFIG_PCI was disabled.

    Testing on parisc seems to indicate that we need pci_iounmap() only when
    CONFIG_PCI is enabled, so the declaration of pci_iounmap() can be moved
    cleanly into pci_iomap.h in sync with the declarations of pci_iomap().

    Link: https://lore.kernel.org/all/CAHk-=wjRrh98pZoQ+AzfWmsTZacWxTJKXZ9eKU2X_0+jM=…
    Signed-off-by: Helge Deller <deller(a)gmx.de>
    Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
    Fixes: 97a29d59fc22 ("[PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional")
    Cc: Arnd Bergmann <arnd(a)arndb.de>
    Cc: Guenter Roeck <linux(a)roeck-us.net>
    Cc: Ulrich Teichert <krypton(a)ulrich-teichert.org>
    Cc: James Bottomley <James.Bottomley(a)hansenpartnership.com>
    Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
---
 arch/parisc/lib/iomap.c         |  4 +++-
 include/asm-generic/iomap.h     | 10 ----------
 include/asm-generic/pci_iomap.h |  3 +++
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/parisc/lib/iomap.c b/arch/parisc/lib/iomap.c
index f03adb1999e7..367f6397bda7 100644
--- a/arch/parisc/lib/iomap.c
+++ b/arch/parisc/lib/iomap.c
@@ -513,12 +513,15 @@ void ioport_unmap(void __iomem *addr)
 	}
 }

+#ifdef CONFIG_PCI
 void pci_iounmap(struct pci_dev *dev, void __iomem * addr)
 {
 	if (!INDIRECT_ADDR(addr)) {
 		iounmap(addr);
 	}
 }
+EXPORT_SYMBOL(pci_iounmap);
+#endif

 EXPORT_SYMBOL(ioread8);
 EXPORT_SYMBOL(ioread16);
@@ -544,4 +547,3 @@ EXPORT_SYMBOL(iowrite16_rep);
 EXPORT_SYMBOL(iowrite32_rep);
 EXPORT_SYMBOL(ioport_map);
 EXPORT_SYMBOL(ioport_unmap);
-EXPORT_SYMBOL(pci_iounmap);
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index 9b3eb6d86200..08237ae8b840 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -110,16 +110,6 @@ static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
 }
 #endif

-#ifdef CONFIG_PCI
-/* Destroy a virtual mapping cookie for a PCI BAR (memory or IO) */
-struct pci_dev;
-extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
-#elif defined(CONFIG_GENERIC_IOMAP)
-struct pci_dev;
-static inline void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
-{ }
-#endif
-
 #include <asm-generic/pci_iomap.h>

 #endif
diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
index df636c6d8e6c..5a2f9bf53384 100644
--- a/include/asm-generic/pci_iomap.h
+++ b/include/asm-generic/pci_iomap.h
@@ -18,6 +18,7 @@ extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
 extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
 					unsigned long offset,
 					unsigned long maxlen);
+extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
 /* Create a virtual mapping cookie for a port on a given PCI device.
  * Do not call this directly, it exists to make it easier for architectures
  * to override */
@@ -50,6 +51,8 @@ static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
 {
 	return NULL;
 }
+static inline void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
+{ }
 #endif

 #endif /* __ASM_GENERIC_PCI_IOMAP_H */
</cut>
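The header change in the commit above follows a common pattern: callers see a real function when a config option is enabled and an empty inline stub otherwise, so call sites need no #ifdef of their own. A minimal sketch of that pattern, assuming a simple #ifdef/#else split that may not match the actual conditionals in pci_iomap.h; DEMO_CONFIG_PCI, demo_dev, demo_iounmap and release_mapping are hypothetical names, not kernel identifiers:

```cpp
#include <cassert>

// Hypothetical device type; it only counts how often a mapping was released.
struct demo_dev { int unmap_calls = 0; };

#ifdef DEMO_CONFIG_PCI
// Option enabled: callers get a real function. In a multi-file project this
// would be an extern declaration in the header with the definition elsewhere;
// it is defined inline here only to keep the sketch in one file.
inline void demo_iounmap(demo_dev *dev, void *addr)
{
    (void)addr;
    ++dev->unmap_calls;
}
#else
// Option disabled: an empty inline stub, so callers still compile and the
// call does nothing.
static inline void demo_iounmap(demo_dev *, void *) { }
#endif

// Caller code is identical in both configurations.
inline int release_mapping(demo_dev *dev, void *addr)
{
    demo_iounmap(dev, addr);
    return dev->unmap_calls;
}
```

Built without -DDEMO_CONFIG_PCI, release_mapping() leaves the counter at 0 because the stub is selected; built with it, each call increments the counter.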

4 years, 9 months


[TCWG CI] 447.dealII:libstdc++.so.6.0.29 grew in size by 12% after gcc: libstdc++: Add floating-point std::to_chars implementation

by ci_notify＠linaro.org

After gcc commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>

    libstdc++: Add floating-point std::to_chars implementation

the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII:libstdc++.so.6.0.29 grew in size by 12% from 1245370 to 1391240 bytes

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: arm-linux-gnueabihf
- Compiler flags: -Os -mthumb
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
cd investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 3c57e692357c79ee7623dfc1586652aee2aefb8f
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 5033506993ef92589373270a8e8dbbf50e3ebef1
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
Date:   Thu Dec 17 23:11:34 2020 -0500

    libstdc++: Add floating-point std::to_chars implementation

    This implements the floating-point std::to_chars overloads for float,
    double and long double. We use the Ryu library to compute the shortest
    round-trippable fixed and scientific forms for float, double and long
    double. We also use Ryu for performing explicit-precision fixed and
    scientific formatting for float and double. For explicit-precision
    formatting for long double we fall back to using printf. Hexadecimal
    formatting for float, double and long double is implemented from
    scratch.

    The supported long double binary formats are binary64, binary80 (x86
    80-bit extended precision), binary128 and ibm128.

    Much of the complexity of the implementation is in computing the exact
    output length before handing it off to Ryu (which doesn't do bounds
    checking). In some cases it's hard to compute the output length
    beforehand, so in these cases we instead compute an upper bound on the
    output length and use a sufficiently-sized intermediate buffer only if
    necessary.

    Another source of complexity is in the general-with-precision formatting
    mode, where we need to do zero-trimming of the string returned by Ryu,
    and where we also take care to avoid having to format the number through
    Ryu a second time when the general formatting mode resolves to fixed
    (which we determine by doing a scientific formatting first and
    inspecting the scientific exponent). We avoid going through Ryu twice
    by instead transforming the scientific form to the corresponding fixed
    form via in-place string manipulation.

    This implementation is non-conforming in a couple of ways:

    1. For the shortest hexadecimal formatting, we currently follow the
       Microsoft implementation's decision to be consistent with the
       output of printf's '%a' specifier at the expense of sometimes not
       printing the shortest representation. For example, the shortest hex
       form for the number 1.08p+0 is 2.1p-1, but we output the former
       instead of the latter, as does printf.

    2. The Ryu routine generic_binary_to_decimal that we use for performing
       shortest formatting for large floating point types is implemented
       using the __int128 type, but some targets with a large long double
       type lack __int128 (e.g. i686), so we can't perform shortest
       formatting of long double on such targets through Ryu. As a
       temporary stopgap this patch makes the long double to_chars overloads
       just dispatch to the double overloads on these targets, which means
       we lose precision in the output. (We could potentially fix this by
       writing a specialized version of Ryu's generic_binary_to_decimal
       routine that uses uint64_t instead of __int128.) [Though I wonder if
       there's a better way to work around the lack of __int128 on i686
       specifically?]

    3. Our shortest formatting for __ibm128 doesn't guarantee the round-trip
       property if the difference between the high- and low-order exponent
       is large. This is because we treat __ibm128 as if it has a
       contiguous 105-bit mantissa by merging the mantissas of the high-
       and low-order parts (using code extracted from glibc), so we
       potentially lose precision from the low-order part. This seems to be
       consistent with how glibc printf formats __ibm128.

    libstdc++-v3/ChangeLog:

    	* config/abi/pre/gnu.ver: Add new exports.
    	* include/std/charconv (to_chars): Declare the floating-point
    	overloads for float, double and long double.
    	* src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
    	* src/c++17/Makefile.in: Regenerate.
    	* src/c++17/floating_to_chars.cc: New file.
    	(to_chars): Define for float, double and long double.
    	* testsuite/20_util/to_chars/long_double.cc: New test.
--- libstdc++-v3/config/abi/pre/gnu.ver | 7 + libstdc++-v3/include/std/charconv | 24 + libstdc++-v3/src/c++17/Makefile.am | 1 + libstdc++-v3/src/c++17/Makefile.in | 3 +- libstdc++-v3/src/c++17/floating_to_chars.cc | 1563 ++++++++++++++++++++ .../testsuite/20_util/to_chars/long_double.cc | 199 +++ 6 files changed, 1796 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 4b4bd8ab6da..05e0a512247 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -2393,6 +2393,13 @@ GLIBCXX_3.4.29 { # std::once_flag::_M_finish(bool) _ZNSt9once_flag9_M_finishEb; + # std::to_chars(char*, char*, [float|double|long double]) + _ZSt8to_charsPcS_[defg]; + # std::to_chars(char*, char*, [float|double|long double], chars_format) + _ZSt8to_charsPcS_[defg]St12chars_format; + # std::to_chars(char*, char*, [float|double|long double], chars_format, int) + _ZSt8to_charsPcS_[defg]St12chars_formati; + } GLIBCXX_3.4.28; # Symbols in the support library (libsupc++) have their own tag. diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index dd1ebdf8322..b57b0a16db2 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -702,6 +702,30 @@ namespace __detail chars_format __fmt = chars_format::general) noexcept; #endif + // Floating-point std::to_chars + + // Overloads for float. + to_chars_result to_chars(char* __first, char* __last, float __value) noexcept; + to_chars_result to_chars(char* __first, char* __last, float __value, + chars_format __fmt) noexcept; + to_chars_result to_chars(char* __first, char* __last, float __value, + chars_format __fmt, int __precision) noexcept; + + // Overloads for double. 
+ to_chars_result to_chars(char* __first, char* __last, double __value) noexcept; + to_chars_result to_chars(char* __first, char* __last, double __value, + chars_format __fmt) noexcept; + to_chars_result to_chars(char* __first, char* __last, double __value, + chars_format __fmt, int __precision) noexcept; + + // Overloads for long double. + to_chars_result to_chars(char* __first, char* __last, long double __value) + noexcept; + to_chars_result to_chars(char* __first, char* __last, long double __value, + chars_format __fmt) noexcept; + to_chars_result to_chars(char* __first, char* __last, long double __value, + chars_format __fmt, int __precision) noexcept; + _GLIBCXX_END_NAMESPACE_VERSION } // namespace std #endif // C++14 diff --git a/libstdc++-v3/src/c++17/Makefile.am b/libstdc++-v3/src/c++17/Makefile.am index 37cdb53c076..2ec5ed621ca 100644 --- a/libstdc++-v3/src/c++17/Makefile.am +++ b/libstdc++-v3/src/c++17/Makefile.am @@ -51,6 +51,7 @@ endif sources = \ floating_from_chars.cc \ + floating_to_chars.cc \ fs_dir.cc \ fs_ops.cc \ fs_path.cc \ diff --git a/libstdc++-v3/src/c++17/Makefile.in b/libstdc++-v3/src/c++17/Makefile.in index ccae721ab3f..9b36b7a916c 100644 --- a/libstdc++-v3/src/c++17/Makefile.in +++ b/libstdc++-v3/src/c++17/Makefile.in @@ -124,7 +124,7 @@ LTLIBRARIES = $(noinst_LTLIBRARIES) libc__17convenience_la_LIBADD = @ENABLE_DUAL_ABI_TRUE@am__objects_1 = cow-fs_dir.lo cow-fs_ops.lo \ @ENABLE_DUAL_ABI_TRUE@ cow-fs_path.lo -am__objects_2 = floating_from_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \ +am__objects_2 = floating_from_chars.lo floating_to_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \ memory_resource.lo $(am__objects_1) @ENABLE_DUAL_ABI_TRUE@am__objects_3 = cow-string-inst.lo @ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_4 = ostream-inst.lo \ @@ -440,6 +440,7 @@ headers = sources = \ floating_from_chars.cc \ + floating_to_chars.cc \ fs_dir.cc \ fs_ops.cc \ fs_path.cc \ diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc 
b/libstdc++-v3/src/c++17/floating_to_chars.cc new file mode 100644 index 00000000000..dd83f5eea93 --- /dev/null +++ b/libstdc++-v3/src/c++17/floating_to_chars.cc @@ -0,0 +1,1563 @@ +// std::to_chars implementation for floating-point types -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +// Activate __glibcxx_assert within this file to shake out any bugs. +#define _GLIBCXX_ASSERTIONS 1 + +#include <charconv> + +#include <bit> +#include <cfenv> +#include <cassert> +#include <cmath> +#include <cstdio> +#include <cstring> +#include <langinfo.h> +#include <optional> +#include <string_view> +#include <type_traits> + +// Determine the binary format of 'long double'. + +// We support the binary64, float80 (i.e. x86 80-bit extended precision), +// binary128, and ibm128 formats. 
+#define LDK_UNSUPPORTED 0 +#define LDK_BINARY64 1 +#define LDK_FLOAT80 2 +#define LDK_BINARY128 3 +#define LDK_IBM128 4 + +#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ +# define LONG_DOUBLE_KIND LDK_BINARY64 +#elif defined(__SIZEOF_INT128__) +// The Ryu routines need a 128-bit integer type in order to do shortest +// formatting of types larger than 64-bit double, so without __int128 we can't +// support any large long double format. This is the case for e.g. i386. +# if __LDBL_MANT_DIG__ == 64 +# define LONG_DOUBLE_KIND LDK_FLOAT80 +# elif __LDBL_MANT_DIG__ == 113 +# define LONG_DOUBLE_KIND LDK_BINARY128 +# elif __LDBL_MANT_DIG__ == 106 +# define LONG_DOUBLE_KIND LDK_IBM128 +# endif +#endif +#if !defined(LONG_DOUBLE_KIND) +# define LONG_DOUBLE_KIND LDK_UNSUPPORTED +#endif + +namespace +{ + namespace ryu + { +#include "ryu/common.h" +#include "ryu/digit_table.h" +#include "ryu/d2s_intrinsics.h" +#include "ryu/d2s_full_table.h" +#include "ryu/d2fixed_full_table.h" +#include "ryu/f2s_intrinsics.h" +#include "ryu/d2s.c" +#include "ryu/d2fixed.c" +#include "ryu/f2s.c" + +#ifdef __SIZEOF_INT128__ + namespace generic128 + { + // Put the generic Ryu bits in their own namespace to avoid name conflicts. +# include "ryu/generic_128.h" +# include "ryu/ryu_generic_128.h" +# include "ryu/generic_128.c" + } // namespace generic128 + + using generic128::floating_decimal_128; + using generic128::generic_binary_to_decimal; + + int + to_chars(const floating_decimal_128 v, char* const result) + { return generic128::generic_to_chars(v, result); } +#endif + } // namespace ryu + + // A traits class that contains pertinent information about the binary + // format of each of the floating-point types we support. + template<typename T> + struct floating_type_traits + { }; + + template<> + struct floating_type_traits<float> + { + // We (and Ryu) assume float has the IEEE binary32 format. 
+ static_assert(__FLT_MANT_DIG__ == 24); + static constexpr int mantissa_bits = 23; + static constexpr int exponent_bits = 8; + static constexpr bool has_implicit_leading_bit = true; + using mantissa_t = uint32_t; + using shortest_scientific_t = ryu::floating_decimal_32; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000011101011100110101100101101110000000000000000000000000 }; + }; + + template<> + struct floating_type_traits<double> + { + // We (and Ryu) assume double has the IEEE binary64 format. + static_assert(__DBL_MANT_DIG__ == 53); + static constexpr int mantissa_bits = 52; + static constexpr int exponent_bits = 11; + static constexpr bool has_implicit_leading_bit = true; + using mantissa_t = uint64_t; + using shortest_scientific_t = ryu::floating_decimal_64; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000000000000000011000110101110111000001100101110000111100, + 0b0111100011110101011000011110000000110110010101011000001110011111, + 0b0101101100000000011100100100111100110110110100010001010101110000, + 0b0011110010111000101111110101100011101100010001010000000101100111, + 0b0001010000011001011100100001010000010101101000001101000000000000 }; + }; + +#if LONG_DOUBLE_KIND == LDK_BINARY64 + // When long double is equivalent to double, we just forward the long double + // overloads to the double overloads, so we don't need to define a a + // floating_type_traits<long double> specialization in this case. 
+#elif LONG_DOUBLE_KIND == LDK_FLOAT80 + template<> + struct floating_type_traits<long double> + { + static constexpr int mantissa_bits = 64; + static constexpr int exponent_bits = 15; + static constexpr bool has_implicit_leading_bit = false; + using mantissa_t = uint64_t; + using shortest_scientific_t = ryu::floating_decimal_128; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000000000000000000000110101011111110100010100110000011101, + 0b1001100101001111010011011111101000101111110001011001011101110000, + 0b0000101111111011110010001000001010111101011110111111010100011001, + 0b0011100000011111001101101011111001111100100010000101001111101001, + 0b0100100100000000100111010010101110011000110001101101110011001010, + 0b0111100111100010100000010011000010010110101111110101000011110100, + 0b1010100111100010011110000011011101101100010110000110101010101010, + 0b0000001111001111000000101100111011011000101000110011101100110010, + 0b0111000011100100101101010100001101111110101111001000010011111111, + 0b0010111000100110100100100010101100111010110001101010010111001000, + 0b0000100000010110000011001001000111000001111010100101101000001111, + 0b0010101011101000111100001011000010011101000101010010010000101111, + 0b1011111011101101110010101011010001111000101000101101011001100011, + 0b1010111011011011110111110011001010000010011001110100101101000101, + 0b0011000001110110011010010000011100100011001011001100001101010110, + 0b0100011111011000111111101000011110000010111110101001000000001001, + 0b1110000001110001001101101110011000100000001010000111100010111010, + 0b1110001001010011101000111000001000010100110000010110100011110000, + 0b0000011010110000110001111000011111000011001101001101001001000110, + 0b1010010111001000101001100101010110100100100010010010000101000010, + 0b1011001110000111100010100110000011100011111001110111001100000101, + 0b0110101001001000010110001000010001010101110101100001111100011001, + 0b1111100011110101011110011010101001010010100011000010110001101001, 
+ 0b0100000100001000111101011100010011011111011001000000001100011000, + 0b1110111111000111100101110111110000000011001110011100011011011001, + 0b1100001100100000010001100011011000111011110000110011010101000011, + 0b1111111011100111011101001111111000010000001111010111110010000100, + 0b1110111001111110101111000101000000001010001110011010001000111010, + 0b1000010001011000101111111010110011111101110101101001111000111010, + 0b0100000111101001000111011001101000001010111011101001101111000100, + 0b0000011100110001000111011100111100110001101111111010110111100000, + 0b0000011101011100100110010011110101010100010011110010010111010000, + 0b0011011001100111110101111100001001101110101101001110110011110110, + 0b1011000101000001110100111001100100111100110011110000000001101000, + 0b1011100011110100001001110101010110111001000000001011101001011110, + 0b1111001010010010100000010110101010101011101000101000000000001100, + 0b1000001111100100111001110101100001010011111111000001000011110000, + 0b0001011101001000010000101101111000001110101100110011001100110111, + 0b1110011100000010101011011111001010111101111110100000011100000011, + 0b1001110110011100101010011110100010110001001110110000101011100110, + 0b1001101000100011100111010000011011100001000000110101100100001001, + 0b1010111000101000101101010111000010001100001010100011111100000100, + 0b0111101000100011000101101011111011100010001101110111001111001011, + 0b1110100111010110001110110110000000010110100011110000010001111100, + 0b1100010100011010001011001000111001010101011110100101011001000000, + 0b0000110001111001100110010110111010101101001101000000000010010101, + 0b0001110111101000001111101010110010010000111110111100000111110100, + 0b0111110111001001111000110001101101001010101110110101111110000100, + 0b0000111110111010101111100010111010011100010110011011011001000001, + 0b1010010100100100101110111111111000101100000010111111101101000110, + 0b1000100111111101100011001101000110001000000100010101010100001101, + 
0b1100101010101000111100101100001000110001110010100000000010110101, + 0b1010000100111101100100101010010110100010000000110101101110000100, + 0b1011111011110001110000100100000000001010111010001101100000100100, + 0b0111101101100011001110011100000001000101101101111000100111011111, + 0b0100111010010011011001010011110100001100111010010101111111100011, + 0b0010001001011000111000001100110111110111110010100011000110110110, + 0b0101010110000000010000100000110100111011111101000100000111010010, + 0b0110000011011101000001010100110101101110011100110101000000001001, + 0b1101100110100000011000001111000100100100110001100110101010101100, + 0b0010100101010110010010001010101000011111111111001011001010001111, + 0b0111001010001111001100111001010101001000110101000011110000001000, + 0b0110010011001001001111110001010010001011010010001101110110110011, + 0b0110010100111011000100111000001001101011111001110010111110111111, + 0b0101110111001001101100110100101001110010101110011001101110001000, + 0b0100110101010111011010001100010111100011010011111001010100111000, + 0b0111000110110111011110100100010111000110000110110110110001111110, + 0b1000101101010100100100111110100011110110110010011001110011110101, + 0b1001101110101001010100111101101011000101000010110101101111110000, + 0b0100100101001011011001001011000010001101001010010001010110101000, + 0b0010100001001011100110101000010110000111000111000011100101011011, + 0b0110111000011001111101101011111010001000000010101000101010011110, + 0b1000110110100001111011000001111100001001000000010110010100100100, + 0b1001110100011111100111101011010000010101011100101000010010100110, + 0b0001010110101110100010101010001110110110100011101010001001111100, + 0b1010100101101100000010110011100110100010010000100100001110000100, + 0b0001000000010000001010000010100110000001110100111001110111101101, + 0b1100000000000000000000000000000000000000000000000000000000000000 }; + }; +#elif LONG_DOUBLE_KIND == LDK_BINARY128 + template<> + struct floating_type_traits<long double> + { + 
static constexpr int mantissa_bits = 112; + static constexpr int exponent_bits = 15; + static constexpr bool has_implicit_leading_bit = true; + using mantissa_t = unsigned __int128; + using shortest_scientific_t = ryu::floating_decimal_128; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000000000000000000000000000000000000000000100000010000000, + 0b1011001111110100000100010101101110011100100110000110010110011000, + 0b1010100010001101111111000000001101010010100010010000111011110111, + 0b1011111001110001111000011111000010110111000111110100101010100101, + 0b0110100110011110011011000011000010011001110001001001010011100011, + 0b0000011111110010101111101011101010000110011111100111001110100111, + 0b0100010101010110000010111011110100000010011001001010001110111101, + 0b1101110111000010001101100000110100000111001001101011000101011011, + 0b0100111011101101010000001101011000101100101110010010110000101011, + 0b0100000110111000000110101000010011101000110100010110000011101101, + 0b1011001101001000100001010001100100001111011101010101110001010110, + 0b1000000001000000101001110010110010001111101101010101001100000110, + 0b0101110110100110000110000001001010111110001110010000111111010011, + 0b1010001111100111000100011100100100111100100101000001011001000111, + 0b1010011000011100110101100111001011100101111111100001110100000100, + 0b1100011100100010100000110001001010000000100000001001010111011101, + 0b0101110000100011001111101101000000100110000010010111010001111010, + 0b0100111100011010110111101000100110000111001001101100000001111100, + 0b1100100100111110101011000100000101011010110111000111110100110101, + 0b0110010000010111010100110011000000111010000010111011010110000100, + 0b0101001001010010110111010111000101011100000111100111000001110010, + 0b1101111111001011101010110001000111011010111101001011010110100100, + 0b0001000100110000011111101011001101110010110110010000000011100100, + 0b0001000000000101001001001000000000011000100011001110101001001110, + 
0b0010010010001000111010011011100001000110011011011110110100111000, + 0b0000100110101100000111100010100100011100110111011100001111001100, + 0b1011111010001110001100000011110111111111100000001011111111101100, + 0b0000011100001111010101110000100110111100101101110111101001000001, + 0b1100010001110110111100001001001101101000011100000010110101001011, + 0b0100101001101011111001011110101101100011011111011100101010101111, + 0b0001101001111001110000101101101100001011010001011110011101000010, + 0b1111000000101001101111011010110011101110100001011011001011100010, + 0b0101001010111101101100001111100010010110001101001000001101100100, + 0b0101100101011110001100101011111000111001111001001001101101100001, + 0b1111001101010010100100011011000110110010001111000111010001001101, + 0b0001110010011000000001000110110111011000011100001000011001110111, + 0b0100001011011011011011110011101100100101111111101100101000001110, + 0b0101011110111101010111100111101111000101111111111110100011011010, + 0b1110101010001001110100000010110111010111111010111110100110010110, + 0b1010001111100001001100101000110100001100011100110010000011010111, + 0b1111111101101111000100111100000101011000001110011011101010111001, + 0b1111101100001110100101111101011001000100000101110000110010100011, + 0b1001010110110101101101000101010001010000101011011111010011010000, + 0b0111001110110011101001100111000001000100001010110000010000001101, + 0b0101111100111110100111011001111001111011011110010111010011101010, + 0b1110111000000001100100111001100100110001011011001110101111110111, + 0b0001010001001101010111101010011111000011110001101101011001111111, + 0b0101000011100011010010001101100001011101011010100110101100100010, + 0b0001000101011000100101111100110110000101101101111000110001001011, + 0b0101100101001011011000010101000000010100011100101101000010011111, + 0b1000010010001011101001011010100010111011110100110011011000100111, + 0b1000011011100001010111010111010011101100100010010010100100101001, + 
0b1001001001010111110101000010111010000000101111010100001010010010, + 0b0011011110110010010101111011000001000000000011011111000011111011, + 0b1011000110100011001110000001000100000001011100010111010010011110, + 0b0111101110110101110111110000011000000100011100011000101101101110, + 0b1001100101111011011100011110101011001111100111101010101010110111, + 0b1100110010010001100011001111010000000100011101001111011101001111, + 0b1000111001111010100101000010000100000001001100101010001011001101, + 0b0011101011110000110010100101010100110010100001000010101011111101, + 0b1100000000000110000010101011000000011101000110011111100010111111, + 0b0010100110000011011100010110111100010110101100110011101110001101, + 0b0010111101010011111000111001111100110111111100100011110001101110, + 0b1001110111001001101001001001011000010100110001000000100011010110, + 0b0011110101100111011011111100001000011001010100111100100101111010, + 0b0010001101000011000010100101110000010101101000100110000100001010, + 0b0010000010100110010101100101110011101111000111111111001001100001, + 0b0100111111011011011011100111111011000010011101101111011111110110, + 0b1111111111010110101011101000100101110100001110001001101011100111, + 0b1011111101000101110000111100100010111010100001010000010010110010, + 0b1111010101001011101011101010000100110110001110111100100110111111, + 0b1011001101000001001101000010101010010110010001100001011100011010, + 0b0101001011011101010001110100010000010001111100100100100001001101, + 0b0010100000111001100011000101100101000001111100111001101000000010, + 0b1011001111010101011001000100100110100100110111110100000110111000, + 0b0101011111010011100011010010111101110010100001111111100010001001, + 0b0010111011101100100000000000001111111010011101100111100001001101, + 0b1101000000000000000000000000000000000000000000000000000000000000 }; + }; +#elif LONG_DOUBLE_KIND == LDK_IBM128 + template<> + struct floating_type_traits<long double> + { + static constexpr int mantissa_bits = 105; + static constexpr int 
exponent_bits = 11; + static constexpr bool has_implicit_leading_bit = true; + using mantissa_t = unsigned __int128; + using shortest_scientific_t = ryu::floating_decimal_128; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000000000000000000000000000000000000000001000000100000000, + 0b0000000000000000000100000000000000000000001000000000000000000010, + 0b0000100000000000000000001001000000000000000001100100000000000000, + 0b0011000000000000000000000000000001110000010000000000000000000000, + 0b0000100000000000001000000000000000000000000000100000000000000000 }; + }; +#endif + + // An IEEE-style decomposition of a floating-point value of type T. + template<typename T> + struct ieee_t + { + typename floating_type_traits<T>::mantissa_t mantissa; + uint32_t biased_exponent; + bool sign; + }; + + // Decompose the floating-point value into its IEEE components. + template<typename T> + ieee_t<T> + get_ieee_repr(const T value) + { + constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits; + constexpr int exponent_bits = floating_type_traits<T>::exponent_bits; + constexpr int total_bits = mantissa_bits + exponent_bits + 1; + + constexpr auto get_uint_t = [] { + if constexpr (total_bits <= 32) + return uint32_t{}; + else if constexpr (total_bits <= 64) + return uint64_t{}; +#ifdef __SIZEOF_INT128__ + else if constexpr (total_bits <= 128) + return (unsigned __int128){}; +#endif + }; + using uint_t = decltype(get_uint_t()); + uint_t value_bits = 0; + memcpy(&value_bits, &value, sizeof(value)); + + ieee_t<T> ieee_repr; + ieee_repr.mantissa = value_bits & ((uint_t{1} << mantissa_bits) - 1u); + ieee_repr.biased_exponent + = (value_bits >> mantissa_bits) & ((uint_t{1} << exponent_bits) - 1u); + ieee_repr.sign = (value_bits >> (mantissa_bits + exponent_bits)) & 1; + return ieee_repr; + } + +#if LONG_DOUBLE_KIND == LDK_IBM128 + template<> + ieee_t<long double> + get_ieee_repr(const long double value) + { + // The layout of __ibm128 isn't compatible with 
the standard IEEE format. + // So we transform it into an IEEE-compatible format, suitable for + // consumption by the generic Ryu API, with an 11-bit exponent and 105-bit + // mantissa (plus an implicit leading bit). We use the exponent and sign + // of the high part, and we merge the mantissa of the high part with the + // mantissa (and the implicit leading bit) of the low part. + using uint_t = unsigned __int128; + uint_t value_bits = 0; + memcpy(&value_bits, &value, sizeof(value_bits)); + + const uint64_t value_hi = value_bits; + const uint64_t value_lo = value_bits >> 64; + + uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1); + unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1); + const int sign_hi = (value_hi >> 63) & 1; + + uint64_t mantissa_lo = value_lo & ((1ull << 52) - 1); + const unsigned exponent_lo = (value_lo >> 52) & ((1ull << 11) - 1); + const int sign_lo = (value_lo >> 63) & 1; + + { + // The following code for adjusting the low-part mantissa to combine + // it with the high-part mantissa is taken from the glibc source file + // sysdeps/ieee754/ldbl-128ibm/printf_fphex.c. 
+ mantissa_lo <<= 7; + if (exponent_lo != 0) + mantissa_lo |= (1ull << (52 + 7)); + else + mantissa_lo <<= 1; + + const int ediff = exponent_hi - exponent_lo - 53; + if (ediff > 63) + mantissa_lo = 0; + else if (ediff > 0) + mantissa_lo >>= ediff; + else if (ediff < 0) + mantissa_lo <<= -ediff; + + if (sign_lo != sign_hi && mantissa_lo != 0) + { + mantissa_lo = (1ull << 60) - mantissa_lo; + if (mantissa_hi == 0) + { + mantissa_hi = 0xffffffffffffeLL | (mantissa_lo >> 59); + mantissa_lo = 0xfffffffffffffffLL & (mantissa_lo << 1); + exponent_hi--; + } + else + mantissa_hi--; + } + } + + ieee_t<long double> ieee_repr; + ieee_repr.mantissa = ((uint_t{mantissa_hi} << 64) + | (uint_t{mantissa_lo} << 4)) >> 11; + ieee_repr.biased_exponent = exponent_hi; + ieee_repr.sign = sign_hi; + return ieee_repr; + } +#endif + + // Invoke Ryu to obtain the shortest scientific form for the given + // floating-point number. + template<typename T> + typename floating_type_traits<T>::shortest_scientific_t + floating_to_shortest_scientific(const T value) + { + if constexpr (std::is_same_v<T, float>) + return ryu::floating_to_fd32(value); + else if constexpr (std::is_same_v<T, double>) + return ryu::floating_to_fd64(value); +#ifdef __SIZEOF_INT128__ + else if constexpr (std::is_same_v<T, long double>) + { + constexpr int mantissa_bits + = floating_type_traits<T>::mantissa_bits; + constexpr int exponent_bits + = floating_type_traits<T>::exponent_bits; + constexpr bool has_implicit_leading_bit + = floating_type_traits<T>::has_implicit_leading_bit; + + const auto [mantissa, exponent, sign] = get_ieee_repr(value); + return ryu::generic_binary_to_decimal(mantissa, exponent, sign, + mantissa_bits, exponent_bits, + !has_implicit_leading_bit); + } +#endif + } + + // This subroutine returns true if the shortest scientific form fd is a + // positive power of 10, and the floating-point number that has this shortest + // scientific form is smaller than this power of 10. 
+ // + // For instance, the exactly-representable 64-bit number + // 99999999999999991611392.0 has the shortest scientific form 1e23, so its + // exact value is smaller than its shortest scientific form. + // + // For these powers of 10 the length of the fixed form is one digit less + // than what the scientific exponent suggests. + // + // This subroutine inspects a lookup table to detect when fd is such a + // "rounded up" power of 10. + template<typename T> + bool + is_rounded_up_pow10_p(const typename + floating_type_traits<T>::shortest_scientific_t fd) + { + if (fd.exponent < 0 || fd.mantissa != 1) [[likely]] + return false; + + constexpr auto& pow10_adjustment_tab + = floating_type_traits<T>::pow10_adjustment_tab; + __glibcxx_assert(fd.exponent/64 < (int)std::size(pow10_adjustment_tab)); + return (pow10_adjustment_tab[fd.exponent/64] + & (1ull << (63 - fd.exponent%64))); + } + + int + get_mantissa_length(const ryu::floating_decimal_32 fd) + { return ryu::decimalLength9(fd.mantissa); } + + int + get_mantissa_length(const ryu::floating_decimal_64 fd) + { return ryu::decimalLength17(fd.mantissa); } + +#ifdef __SIZEOF_INT128__ + int + get_mantissa_length(const ryu::floating_decimal_128 fd) + { return ryu::generic128::decimalLength(fd.mantissa); } +#endif +} // anon namespace + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + +// This subroutine of __floating_to_chars_* handles writing nan, inf and 0 in +// all formatting modes. 
+template<typename T> + static optional<to_chars_result> + __handle_special_value(char* first, char* const last, const T value, + const chars_format fmt, const int precision) + { + __glibcxx_assert(precision >= 0); + + string_view str; + switch (__builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, + FP_ZERO, value)) + { + case FP_INFINITE: + str = "-inf"; + break; + + case FP_NAN: + str = "-nan"; + break; + + case FP_ZERO: + break; + + default: + case FP_SUBNORMAL: + case FP_NORMAL: [[likely]] + return nullopt; + } + + if (!str.empty()) + { + // We're formatting +-inf or +-nan. + if (!__builtin_signbit(value)) + str.remove_prefix(strlen("-")); + + if (last - first < (int)str.length()) + return {{last, errc::value_too_large}}; + + memcpy(first, &str[0], str.length()); + first += str.length(); + return {{first, errc{}}}; + } + + // We're formatting 0. + __glibcxx_assert(value == 0); + const auto orig_first = first; + const bool sign = __builtin_signbit(value); + int expected_output_length; + switch (fmt) + { + case chars_format::fixed: + case chars_format::scientific: + case chars_format::hex: + expected_output_length = sign + 1; + if (precision) + expected_output_length += strlen(".") + precision; + if (fmt == chars_format::scientific) + expected_output_length += strlen("e+00"); + else if (fmt == chars_format::hex) + expected_output_length += strlen("p+0"); + if (last - first < expected_output_length) + return {{last, errc::value_too_large}}; + + if (sign) + *first++ = '-'; + *first++ = '0'; + if (precision) + { + *first++ = '.'; + memset(first, '0', precision); + first += precision; + } + if (fmt == chars_format::scientific) + { + memcpy(first, "e+00", 4); + first += 4; + } + else if (fmt == chars_format::hex) + { + memcpy(first, "p+0", 3); + first += 3; + } + break; + + case chars_format::general: + default: // case chars_format{}: + expected_output_length = sign + 1; + if (last - first < expected_output_length) + return {{last, 
errc::value_too_large}}; + + if (sign) + *first++ = '-'; + *first++ = '0'; + break; + } + __glibcxx_assert(first - orig_first == expected_output_length); + return {{first, errc{}}}; + } + +// This subroutine of the floating-point to_chars overloads performs +// hexadecimal formatting. +template<typename T> + static to_chars_result + __floating_to_chars_hex(char* first, char* const last, const T value, + const optional<int> precision) + { + if (precision.has_value() && precision.value() < 0) [[unlikely]] + // A negative precision argument is treated as if it were omitted. + return __floating_to_chars_hex(first, last, value, nullopt); + + __glibcxx_requires_valid_range(first, last); + + constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits; + constexpr bool has_implicit_leading_bit + = floating_type_traits<T>::has_implicit_leading_bit; + constexpr int exponent_bits = floating_type_traits<T>::exponent_bits; + constexpr int exponent_bias = (1u << (exponent_bits - 1)) - 1; + using mantissa_t = typename floating_type_traits<T>::mantissa_t; + constexpr int mantissa_t_width = sizeof(mantissa_t) * __CHAR_BIT__; + + if (auto result = __handle_special_value(first, last, value, + chars_format::hex, + precision.value_or(0))) + return *result; + + // Extract the sign, mantissa and exponent from the value. + const auto [ieee_mantissa, biased_exponent, sign] = get_ieee_repr(value); + const bool is_normal_number = (biased_exponent != 0); + + // Calculate the unbiased exponent. + const int32_t unbiased_exponent = (is_normal_number + ? biased_exponent - exponent_bias + : 1 - exponent_bias); + + // Shift the mantissa so that its bitwidth is a multiple of 4. 
+ constexpr unsigned rounded_mantissa_bits = (mantissa_bits + 3) / 4 * 4; + static_assert(mantissa_t_width >= rounded_mantissa_bits); + mantissa_t effective_mantissa + = ieee_mantissa << (rounded_mantissa_bits - mantissa_bits); + if (is_normal_number) + { + if constexpr (has_implicit_leading_bit) + // Restore the mantissa's implicit leading bit. + effective_mantissa |= mantissa_t{1} << rounded_mantissa_bits; + else + // The explicit mantissa bit should already be set. + __glibcxx_assert(effective_mantissa & (mantissa_t{1} << (mantissa_bits + - 1u))); + } + + // Compute the shortest precision needed to print this value exactly, + // disregarding trailing zeros. + constexpr int full_hex_precision = (has_implicit_leading_bit + ? (mantissa_bits + 3) / 4 + // With an explicit leading bit, we + // use the four leading nibbles as the + // hexit before the decimal point. + : (mantissa_bits - 4 + 3) / 4); + const int trailing_zeros = __countr_zero(effective_mantissa) / 4; + const int shortest_full_precision = full_hex_precision - trailing_zeros; + __glibcxx_assert(shortest_full_precision >= 0); + + int written_exponent = unbiased_exponent; + const int effective_precision = precision.value_or(shortest_full_precision); + if (effective_precision < shortest_full_precision) + { + // When limiting the precision, we need to determine how to round the + // least significant printed hexit. The following branchless + // bit-level-parallel technique computes whether to round up the + // mantissa bit at index N (according to round-to-nearest rules) when + // dropping N bits of precision, for each index N in the bit vector. + // This technique is borrowed from the MSVC implementation. 
+ using bitvec = mantissa_t; + const bitvec round_bit = effective_mantissa << 1; + const bitvec has_tail_bits = round_bit - 1; + const bitvec lsb_bit = effective_mantissa; + const bitvec should_round = round_bit & (has_tail_bits | lsb_bit); + + const int dropped_bits = 4*(full_hex_precision - effective_precision); + // Mask out the dropped nibbles. + effective_mantissa >>= dropped_bits; + effective_mantissa <<= dropped_bits; + if (should_round & (mantissa_t{1} << dropped_bits)) + { + // Round up the least significant nibble. + effective_mantissa += mantissa_t{1} << dropped_bits; + // Check and adjust for overflow of the leading nibble. When the + // type has an implicit leading bit, then the leading nibble + // before rounding is either 0 or 1, so it can't overflow. + if constexpr (!has_implicit_leading_bit) + { + // The only supported floating-point type with explicit + // leading mantissa bit is LDK_FLOAT80, i.e. x86 80-bit + // extended precision, and so we hardcode the below overflow + // check+adjustment for this type. + static_assert(mantissa_t_width == 64 + && rounded_mantissa_bits == 64); + if (effective_mantissa == 0) + { + // We rounded up the least significant nibble and the + // mantissa overflowed, e.g f.fcp+10 with precision=1 + // became 10.0p+10. Absorb this extra hexit into the + // exponent to obtain 1.0p+14. + effective_mantissa + = mantissa_t{1} << (rounded_mantissa_bits - 4); + written_exponent += 4; + } + } + } + } + + // Compute the leading hexit and mask it out from the mantissa. 
+ char leading_hexit; + if constexpr (has_implicit_leading_bit) + { + const unsigned nibble = effective_mantissa >> rounded_mantissa_bits; + __glibcxx_assert(nibble <= 2); + leading_hexit = '0' + nibble; + effective_mantissa &= ~(mantissa_t{0b11} << rounded_mantissa_bits); + } + else + { + const unsigned nibble = effective_mantissa >> (rounded_mantissa_bits-4); + __glibcxx_assert(nibble < 16); + leading_hexit = "0123456789abcdef"[nibble]; + effective_mantissa &= ~(mantissa_t{0b1111} << (rounded_mantissa_bits-4)); + written_exponent -= 3; + } + + // Now before we start writing the string, determine the total length of + // the output string and perform a single bounds check. + int expected_output_length = sign + 1; + if (effective_precision != 0) + expected_output_length += strlen(".") + effective_precision; + const int abs_written_exponent = abs(written_exponent); + expected_output_length += (abs_written_exponent >= 10000 ? strlen("p+ddddd") + : abs_written_exponent >= 1000 ? strlen("p+dddd") + : abs_written_exponent >= 100 ? strlen("p+ddd") + : abs_written_exponent >= 10 ? strlen("p+dd") + : strlen("p+d")); + if (last - first < expected_output_length) + return {last, errc::value_too_large}; + + const auto saved_first = first; + // Write the negative sign and the leading hexit. + if (sign) + *first++ = '-'; + *first++ = leading_hexit; + + if (effective_precision > 0) + { + *first++ = '.'; + int written_hexits = 0; + // Extract and mask out the leading nibble after the decimal point, + // write its corresponding hexit, and repeat until the mantissa is + // empty. + int nibble_offset = rounded_mantissa_bits; + if constexpr (!has_implicit_leading_bit) + // We already printed the entire leading hexit. 
+ nibble_offset -= 4; + while (effective_mantissa != 0) + { + nibble_offset -= 4; + const unsigned nibble = effective_mantissa >> nibble_offset; + __glibcxx_assert(nibble < 16); + *first++ = "0123456789abcdef"[nibble]; + ++written_hexits; + effective_mantissa &= ~(mantissa_t{0b1111} << nibble_offset); + } + __glibcxx_assert(nibble_offset >= 0); + __glibcxx_assert(written_hexits <= effective_precision); + // Since the mantissa is now empty, every hexit hereafter must be '0'. + if (int remaining_hexits = effective_precision - written_hexits) + { + memset(first, '0', remaining_hexits); + first += remaining_hexits; + } + } + + // Finally, write the exponent. + *first++ = 'p'; + if (written_exponent >= 0) + *first++ = '+'; + const to_chars_result result = to_chars(first, last, written_exponent); + __glibcxx_assert(result.ec == errc{} + && result.ptr == saved_first + expected_output_length); + return result; + } + +template<typename T> + static to_chars_result + __floating_to_chars_shortest(char* first, char* const last, const T value, + chars_format fmt) + { + if (fmt == chars_format::hex) + return __floating_to_chars_hex(first, last, value, nullopt); + + __glibcxx_assert(fmt == chars_format::fixed + || fmt == chars_format::scientific </cut>
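
The bit-table lookup described in the comment on is_rounded_up_pow10_p in the quoted commit can be sketched on its own. This is a minimal illustration, not the libstdc++ code: the names demo_pow10_adjustment_tab, demo_tab_words, demo_is_rounded_up_pow10 and demo_self_check are hypothetical, and the one-word table marks only exponent 23, taken from the commit's own example that the 64-bit value 99999999999999991611392.0 has the shortest scientific form 1e23. The real tables are per-type and much longer. Unlike the quoted code, which asserts that the index is in range, this sketch returns false for exponents the demo table does not cover.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-word table, for illustration only. Bit (63 - e % 64) of
// word e / 64 is set when 10^e is a "rounded up" power of 10. Only e = 23 is
// marked, following the 1e23 example given in the quoted commit.
constexpr std::uint64_t demo_pow10_adjustment_tab[] = {
  1ull << (63 - 23)
};

constexpr int demo_tab_words
  = sizeof(demo_pow10_adjustment_tab) / sizeof(demo_pow10_adjustment_tab[0]);

// Returns true if the shortest scientific form mantissa * 10^exponent is a
// positive power of 10 that the demo table marks as rounded up.
bool
demo_is_rounded_up_pow10(int exponent, std::uint64_t mantissa)
{
  // Anything other than 1eN with N >= 0 cannot be a rounded-up power of 10.
  if (exponent < 0 || mantissa != 1)
    return false;

  // Assumption of this sketch: exponents beyond the demo table are unmarked.
  const int word = exponent / 64;
  if (word >= demo_tab_words)
    return false;

  // Most significant bit of each word corresponds to the lowest exponent.
  const std::uint64_t mask = 1ull << (63 - exponent % 64);
  return (demo_pow10_adjustment_tab[word] & mask) != 0;
}

// Small self-check exercising the marked and unmarked cases.
void
demo_self_check()
{
  assert(demo_is_rounded_up_pow10(23, 1));
  assert(!demo_is_rounded_up_pow10(22, 1));
}
```

When the lookup reports true, the fixed-notation output is one digit shorter than the scientific exponent suggests, which is the adjustment the quoted comment describes.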

4 years, 9 months


[TCWG CI] Regression caused by llvm: Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""

by ci_notify＠linaro.org

After llvm commit 8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614
Author: Jun Ma <JunMa(a)linux.alibaba.com>

    Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""

the following benchmarks grew in size by more than 1%:
- 401.bzip2 grew in size by 4% from 36214 to 37510 bytes
- 401.bzip2:[.] BZ2_decompress grew in size by 19% from 7260 to 8660 bytes

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_mthumb artifacts/build-8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614/results_id:
1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_mthumb artifacts/build-baseline/results_id:
1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614
cd investigate-llvm-8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach d1280f6967db1ca8fa4e0c39414003e717b40feb
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614 Author: Jun Ma <JunMa(a)linux.alibaba.com> Date: Fri Aug 20 17:27:00 2021 +0800 Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."" Differential Revision: https://reviews.llvm.org/D106056 --- llvm/include/llvm/Transforms/Utils/Local.h | 5 ++++ .../Scalar/CorrelatedValuePropagation.cpp | 27 +++++++++++++++++++++- llvm/lib/Transforms/Utils/Local.cpp | 20 ++++++++++++++++ llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 20 ---------------- .../Transforms/CorrelatedValuePropagation/basic.ll | 11 +++++---- 5 files changed, 57 insertions(+), 26 deletions(-) diff --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h index 97686d7d5f2f..f003615eca78 100644 --- a/llvm/include/llvm/Transforms/Utils/Local.h +++ b/llvm/include/llvm/Transforms/Utils/Local.h @@ -55,6 +55,7 @@ class MDNode; class MemorySSAUpdater; class PHINode; class StoreInst; +class SwitchInst; class TargetLibraryInfo; class TargetTransformInfo; @@ -236,6 +237,10 @@ CallInst *createCallMatchingInvoke(InvokeInst *II); /// This function converts the specified invoek into a normall call. void changeToCall(InvokeInst *II, DomTreeUpdater *DTU = nullptr); +/// This function removes the default destination from the specified switch. 
+void createUnreachableSwitchDefault(SwitchInst *Switch, + DomTreeUpdater *DTU = nullptr); + ///===---------------------------------------------------------------------===// /// Dbg Intrinsic utilities /// diff --git a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp index 36cbd42a5fdd..cd38ce96e287 100644 --- a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp +++ b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp @@ -341,7 +341,13 @@ static bool processSwitch(SwitchInst *I, LazyValueInfo *LVI, // ConstantFoldTerminator() as the underlying SwitchInst can be changed. SwitchInstProfUpdateWrapper SI(*I); - for (auto CI = SI->case_begin(), CE = SI->case_end(); CI != CE;) { + APInt Low = + APInt::getSignedMaxValue(Cond->getType()->getScalarSizeInBits()); + APInt High = + APInt::getSignedMinValue(Cond->getType()->getScalarSizeInBits()); + + SwitchInst::CaseIt CI = SI->case_begin(); + for (auto CE = SI->case_end(); CI != CE;) { ConstantInt *Case = CI->getCaseValue(); LazyValueInfo::Tristate State = LVI->getPredicateAt(CmpInst::ICMP_EQ, Cond, Case, I, @@ -374,9 +380,28 @@ static bool processSwitch(SwitchInst *I, LazyValueInfo *LVI, break; } + // Get Lower/Upper bound from switch cases. + Low = APIntOps::smin(Case->getValue(), Low); + High = APIntOps::smax(Case->getValue(), High); + // Increment the case iterator since we didn't delete it. ++CI; } + + // Try to simplify default case as unreachable + if (CI == SI->case_end() && SI->getNumCases() != 0 && + !isa<UnreachableInst>(SI->getDefaultDest()->getFirstNonPHIOrDbg())) { + const ConstantRange SIRange = + LVI->getConstantRange(SI->getCondition(), SI); + + // If the numbered switch cases cover the entire range of the condition, + // then the default case is not reachable. 
+ if (SIRange.getSignedMin() == Low && SIRange.getSignedMax() == High && + SI->getNumCases() == High - Low + 1) { + createUnreachableSwitchDefault(SI, &DTU); + Changed = true; + } + } } if (Changed) diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp index 3d6ffded9b19..6d7eca8e3678 100644 --- a/llvm/lib/Transforms/Utils/Local.cpp +++ b/llvm/lib/Transforms/Utils/Local.cpp @@ -2182,6 +2182,26 @@ void llvm::changeToCall(InvokeInst *II, DomTreeUpdater *DTU) { DTU->applyUpdates({{DominatorTree::Delete, BB, UnwindDestBB}}); } +void llvm::createUnreachableSwitchDefault(SwitchInst *Switch, + DomTreeUpdater *DTU) { + LLVM_DEBUG(dbgs() << "SimplifyCFG: switch default is dead.\n"); + auto *BB = Switch->getParent(); + auto *OrigDefaultBlock = Switch->getDefaultDest(); + OrigDefaultBlock->removePredecessor(BB); + BasicBlock *NewDefaultBlock = BasicBlock::Create( + BB->getContext(), BB->getName() + ".unreachabledefault", BB->getParent(), + OrigDefaultBlock); + new UnreachableInst(Switch->getContext(), NewDefaultBlock); + Switch->setDefaultDest(&*NewDefaultBlock); + if (DTU) { + SmallVector<DominatorTree::UpdateType, 2> Updates; + Updates.push_back({DominatorTree::Insert, BB, &*NewDefaultBlock}); + if (!is_contained(successors(BB), OrigDefaultBlock)) + Updates.push_back({DominatorTree::Delete, BB, &*OrigDefaultBlock}); + DTU->applyUpdates(Updates); + } +} + BasicBlock *llvm::changeToInvokeAndSplitBasicBlock(CallInst *CI, BasicBlock *UnwindEdge, DomTreeUpdater *DTU) { diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index 737b4f97a97a..70297e471a7a 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -4743,26 +4743,6 @@ static bool CasesAreContiguous(SmallVectorImpl<ConstantInt *> &Cases) { return true; } -static void createUnreachableSwitchDefault(SwitchInst *Switch, - DomTreeUpdater *DTU) { - LLVM_DEBUG(dbgs() << "SimplifyCFG: switch default 
is dead.\n"); - auto *BB = Switch->getParent(); - auto *OrigDefaultBlock = Switch->getDefaultDest(); - OrigDefaultBlock->removePredecessor(BB); - BasicBlock *NewDefaultBlock = BasicBlock::Create( - BB->getContext(), BB->getName() + ".unreachabledefault", BB->getParent(), - OrigDefaultBlock); - new UnreachableInst(Switch->getContext(), NewDefaultBlock); - Switch->setDefaultDest(&*NewDefaultBlock); - if (DTU) { - SmallVector<DominatorTree::UpdateType, 2> Updates; - Updates.push_back({DominatorTree::Insert, BB, &*NewDefaultBlock}); - if (!is_contained(successors(BB), OrigDefaultBlock)) - Updates.push_back({DominatorTree::Delete, BB, &*OrigDefaultBlock}); - DTU->applyUpdates(Updates); - } -} - /// Turn a switch with two reachable destinations into an integer range /// comparison and branch. bool SimplifyCFGOpt::TurnSwitchRangeIntoICmp(SwitchInst *SI, diff --git a/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll b/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll index 5abbcbc90e01..a620c8468d4d 100644 --- a/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll +++ b/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll @@ -382,7 +382,7 @@ define i32 @switch_range(i32 %cond) { ; CHECK-NEXT: entry: ; CHECK-NEXT: [[S:%.*]] = urem i32 [[COND:%.*]], 3 ; CHECK-NEXT: [[S1:%.*]] = add nuw nsw i32 [[S]], 1 -; CHECK-NEXT: switch i32 [[S1]], label [[UNREACHABLE:%.*]] [ +; CHECK-NEXT: switch i32 [[S1]], label [[ENTRY_UNREACHABLEDEFAULT:%.*]] [ ; CHECK-NEXT: i32 1, label [[EXIT1:%.*]] ; CHECK-NEXT: i32 2, label [[EXIT2:%.*]] ; CHECK-NEXT: i32 3, label [[EXIT1]] @@ -391,6 +391,8 @@ define i32 @switch_range(i32 %cond) { ; CHECK-NEXT: ret i32 1 ; CHECK: exit2: ; CHECK-NEXT: ret i32 2 +; CHECK: entry.unreachabledefault: +; CHECK-NEXT: unreachable ; CHECK: unreachable: ; CHECK-NEXT: ret i32 0 ; @@ -453,10 +455,9 @@ define i8 @switch_defaultdest_multipleuse(i8 %t0) { ; CHECK-NEXT: entry: ; CHECK-NEXT: [[O:%.*]] = or i8 [[T0:%.*]], 1 ; CHECK-NEXT: [[R:%.*]] = srem i8 
1, [[O]] -; CHECK-NEXT: switch i8 [[R]], label [[EXIT:%.*]] [ -; CHECK-NEXT: i8 0, label [[EXIT]] -; CHECK-NEXT: i8 1, label [[EXIT]] -; CHECK-NEXT: ] +; CHECK-NEXT: br label [[EXIT:%.*]] +; CHECK: entry.unreachabledefault: +; CHECK-NEXT: unreachable ; CHECK: exit: ; CHECK-NEXT: ret i8 0 ; </cut>

4 years, 9 months


[TCWG CI] Regression caused by binutils: PR28149, debug info with wrong file association

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: PR28149, debug info with wrong file association:
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>

PR28149, debug info with wrong file association

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 6363
# First few build errors in logs:

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 7116
# linux build successful: all
# linux boot successful: boot

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-aarch64-lts-defconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…

Reproduce builds:
<cut>
mkdir investigate-binutils-51298b330327a568358da069d9808f51c6cb1672
cd investigate-binutils-51298b330327a568358da069d9808f51c6cb1672

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save
baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/ cd binutils # Reproduce first_bad build git checkout --detach 51298b330327a568358da069d9808f51c6cb1672 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 5cdb4f14426a99ec8fcba843fa503efdc55fa078 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 51298b330327a568358da069d9808f51c6cb1672 Author: Alan Modra <amodra(a)gmail.com> Date: Fri Sep 17 09:08:15 2021 +0930 PR28149, debug info with wrong file association gcc-11 and gcc-12 pass -gdwarf-5 to gas, in order to prime gas for DWARF 5 level debug info. Unfortunately it seems there are cases where the compiler does not emit a .file or .loc dwarf debug directive before any machine instructions. (Note that the .file directive typically emitted as the first line of assembly output doesn't count as a dwarf debug directive. The dwarf .file has a file number before the file name string.) This patch delays allocation of file numbers for gas generated line debug info until the end of assembly, thus avoiding any clashes with compiler generated file numbers. Two fixes for test case source are necessary; A .loc can't use a file number that hasn't already been specified with .file. A followup patch will remove all the gas generated line info on seeing a .file directive. PR 28149 * dwarf2dbg.c (num_of_auto_assigned): Delete. (current): Update initialisation. (set_or_check_view): Replace all accesses to view with u.view. (dwarf2_consume_line_info): Likewise. (dwarf2_directive_loc): Likewise. Assert that we aren't generating line info. (dwarf2_gen_line_info_1): Don't call set_or_check_view on gas generated line entries. (dwarf2_gen_line_info): Set and track filenames for gas generated line entries. Simplify generation of labels. 
(get_directory_table_entry): Use filename_cmp when comparing dirs. (do_allocate_filenum): New function. (dwarf2_where): Set u.filename and filenum to -1 for gas generated line entries. (dwarf2_directive_filename): Remove num_of_auto_assigned handling. (process_entries): Update view field access. Call do_allocate_filenum. * dwarf2dbg.h (struct dwarf2_line_info): Add filename field in union aliasing view. * testsuite/gas/i386/dwarf2-line-3.s: Add .file directive. * testsuite/gas/i386/dwarf2-line-4.s: Likewise. * testsuite/gas/i386/dwarf2-line-4.d: Update expected output. * testsuite/gas/i386/dwarf4-line-1.d: Likewise. * testsuite/gas/i386/dwarf5-line-1.d: Likewise. * testsuite/gas/i386/dwarf5-line-2.d: Likewise. --- gas/dwarf2dbg.c | 152 ++++++++++++++++++--------------- gas/dwarf2dbg.h | 7 +- gas/testsuite/gas/i386/dwarf2-line-3.s | 1 + gas/testsuite/gas/i386/dwarf2-line-4.d | 5 +- gas/testsuite/gas/i386/dwarf2-line-4.s | 1 + gas/testsuite/gas/i386/dwarf4-line-1.d | 4 +- gas/testsuite/gas/i386/dwarf5-line-1.d | 4 +- gas/testsuite/gas/i386/dwarf5-line-2.d | 3 +- 8 files changed, 105 insertions(+), 72 deletions(-) diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c index 9e3437b8948..c6303ba94a6 100644 --- a/gas/dwarf2dbg.c +++ b/gas/dwarf2dbg.c @@ -207,7 +207,6 @@ struct file_entry static struct file_entry *files; static unsigned int files_in_use; static unsigned int files_allocated; -static unsigned int num_of_auto_assigned; /* Table of directories used by .debug_line. */ static char ** dirs = NULL; @@ -233,7 +232,7 @@ static struct dwarf2_line_info current = { 1, 1, 0, 0, DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0, - 0, NULL + 0, { NULL } }; /* This symbol is used to recognize view number forced resets in loc @@ -342,7 +341,7 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, /* First, compute !(E->label > P->label), to tell whether or not we're to reset the view number. If we can't resolve it to a constant, keep it symbolic. 
*/ - if (!p || (e->loc.view == force_reset_view && force_reset_view)) + if (!p || (e->loc.u.view == force_reset_view && force_reset_view)) { viewx.X_op = O_constant; viewx.X_add_number = 0; @@ -367,9 +366,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, } } - if (S_IS_DEFINED (e->loc.view) && symbol_constant_p (e->loc.view)) + if (S_IS_DEFINED (e->loc.u.view) && symbol_constant_p (e->loc.u.view)) { - expressionS *value = symbol_get_value_expression (e->loc.view); + expressionS *value = symbol_get_value_expression (e->loc.u.view); /* We can't compare the view numbers at this point, because in VIEWX we've only determined whether we're to reset it so far. */ @@ -404,16 +403,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, { expressionS incv; - if (!p->loc.view) + if (!p->loc.u.view) { - p->loc.view = symbol_temp_make (); - gas_assert (!S_IS_DEFINED (p->loc.view)); + p->loc.u.view = symbol_temp_make (); + gas_assert (!S_IS_DEFINED (p->loc.u.view)); } memset (&incv, 0, sizeof (incv)); incv.X_unsigned = 1; incv.X_op = O_symbol; - incv.X_add_symbol = p->loc.view; + incv.X_add_symbol = p->loc.u.view; incv.X_add_number = 1; if (viewx.X_op == O_constant) @@ -430,16 +429,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, } } - if (!S_IS_DEFINED (e->loc.view)) + if (!S_IS_DEFINED (e->loc.u.view)) { - symbol_set_value_expression (e->loc.view, &viewx); - S_SET_SEGMENT (e->loc.view, expr_section); - symbol_set_frag (e->loc.view, &zero_address_frag); + symbol_set_value_expression (e->loc.u.view, &viewx); + S_SET_SEGMENT (e->loc.u.view, expr_section); + symbol_set_frag (e->loc.u.view, &zero_address_frag); } /* Define and attempt to simplify any earlier views needed to compute E's. 
*/ - if (h && p && p->loc.view && !S_IS_DEFINED (p->loc.view)) + if (h && p && p->loc.u.view && !S_IS_DEFINED (p->loc.u.view)) { struct line_entry *h2; /* Reverse the list to avoid quadratic behavior going backwards @@ -459,7 +458,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, break; set_or_check_view (r, r->next, NULL); } - while (r->next && r->next->loc.view && !S_IS_DEFINED (r->next->loc.view) + while (r->next + && r->next->loc.u.view + && !S_IS_DEFINED (r->next->loc.u.view) && (r = r->next)); /* Unreverse the list, so that we can go forward again. */ @@ -475,14 +476,14 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, view of the previous subsegment. */ if (r == h) continue; - gas_assert (S_IS_DEFINED (r->loc.view)); - resolve_expression (symbol_get_value_expression (r->loc.view)); + gas_assert (S_IS_DEFINED (r->loc.u.view)); + resolve_expression (symbol_get_value_expression (r->loc.u.view)); } while (r != p && (r = r->next)); /* Now that we've defined and computed all earlier views that might be needed to compute E's, attempt to simplify it. */ - resolve_expression (symbol_get_value_expression (e->loc.view)); + resolve_expression (symbol_get_value_expression (e->loc.u.view)); } } @@ -518,10 +519,8 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc) /* Subseg heads are chained to previous subsegs in dwarf2_finish. */ - if (loc->view && lss->head) - set_or_check_view (e, - (struct line_entry *)lss->ptail, - lss->head); + if (loc->filenum != -1u && loc->u.view && lss->head) + set_or_check_view (e, (struct line_entry *) lss->ptail, lss->head); *lss->ptail = e; lss->ptail = &e->next; @@ -532,9 +531,6 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc) void dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc) { - static unsigned int line = -1; - static unsigned int filenum = -1; - symbolS *sym; /* Early out for as-yet incomplete location information. 
*/ @@ -552,20 +548,35 @@ dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc) symbols apply to assembler code. It is necessary to emit duplicate line symbols when a compiler asks for them, because GDB uses them to determine the end of the prologue. */ - if (debug_type == DEBUG_DWARF2 - && line == loc->line && filenum == loc->filenum) - return; + if (debug_type == DEBUG_DWARF2) + { + static unsigned int line = -1; + static const char *filename = NULL; + + if (line == loc->line) + { + if (filename == loc->u.filename) + return; + if (filename_cmp (filename, loc->u.filename) == 0) + { + filename = loc->u.filename; + return; + } + } - line = loc->line; - filenum = loc->filenum; + line = loc->line; + filename = loc->u.filename; + } if (linkrelax) { - char name[120]; + static int label_num = 0; + char name[32]; /* Use a non-fake name for the line number location, so that it can be referred to by relocations. */ - sprintf (name, ".Loc.%u.%u", line, filenum); + sprintf (name, ".Loc.%u", label_num); + label_num++; sym = symbol_new (name, now_seg, frag_now, ofs); } else @@ -624,13 +635,15 @@ get_directory_table_entry (const char *dirname, { const char * pwd = file0_dirname ? file0_dirname : getpwd (); - if (dwarf_level >= 5 && strcmp (dirname, pwd) != 0) + if (dwarf_level >= 5 && filename_cmp (dirname, pwd) != 0) { - /* In DWARF-5 the 0 entry in the directory table is expected to be - the same as the DW_AT_comp_dir (which is set to the current build - directory). Since we are about to create a directory entry that - is not the same, allocate the current directory first. - FIXME: Alternatively we could generate an error message here. */ + /* In DWARF-5 the 0 entry in the directory table is + expected to be the same as the DW_AT_comp_dir (which + is set to the current build directory). Since we are + about to create a directory entry that is not the + same, allocate the current directory first. + FIXME: Alternatively we could generate an error + message here. 
*/ (void) get_directory_table_entry (pwd, NULL, strlen (pwd), true); d = 1; @@ -745,14 +758,30 @@ allocate_filenum (const char * pathname) if (!assign_file_to_slot (i, file, dir)) return -1; - num_of_auto_assigned++; - last_used = i; last_used_dir_len = dir_len; return i; } +/* Run through the list of line entries starting at E, allocating + file entries for gas generated debug. */ + +static void +do_allocate_filenum (struct line_entry *e) +{ + do + { + if (e->loc.filenum == -1u) + { + e->loc.filenum = allocate_filenum (e->loc.u.filename); + e->loc.u.view = NULL; + } + e = e->next; + } + while (e); +} + /* Allocate slot NUM in the .debug_line file table to FILENAME. If DIRNAME is not NULL or there is a directory component to FILENAME then this will be stored in the directory table, if not already present. @@ -929,17 +958,12 @@ dwarf2_where (struct dwarf2_line_info *line) { if (debug_type == DEBUG_DWARF2) { - const char *filename; - - memset (line, 0, sizeof (*line)); - filename = as_where (&line->line); - line->filenum = allocate_filenum (filename); - /* FIXME: We should check the return value from allocate_filenum. */ + line->u.filename = as_where (&line->line); + line->filenum = -1u; line->column = 0; line->flags = DWARF2_FLAG_IS_STMT; line->isa = current.isa; line->discriminator = current.discriminator; - line->view = NULL; } else *line = current; @@ -1018,7 +1042,7 @@ dwarf2_consume_line_info (void) | DWARF2_FLAG_PROLOGUE_END | DWARF2_FLAG_EPILOGUE_BEGIN); current.discriminator = 0; - current.view = NULL; + current.u.view = NULL; } /* Called for each (preferably code) label. If dwarf2_loc_mark_labels @@ -1060,7 +1084,6 @@ dwarf2_directive_filename (void) char *filename; const char * dirname = NULL; int filename_len; - unsigned int i; /* Continue to accept a bare string and pass it off. 
*/ SKIP_WHITESPACE (); @@ -1132,18 +1155,6 @@ dwarf2_directive_filename (void) return NULL; } - if (num_of_auto_assigned) - { - /* Clear slots auto-assigned before the first .file <NUMBER> - directive was seen. */ - if (files_in_use != (num_of_auto_assigned + 1)) - abort (); - for (i = 1; i < files_in_use; i++) - files[i].filename = NULL; - files_in_use = 0; - num_of_auto_assigned = 0; - } - if (! allocate_filename_to_slot (dirname, filename, (unsigned int) num, with_md5)) return NULL; @@ -1191,6 +1202,11 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) return; } + /* debug_type will be turned off by dwarf2_directive_filename, and + if we don't have a dwarf style .file then files_in_use will be + zero and the above error will trigger. */ + gas_assert (debug_type == DEBUG_NONE); + current.filenum = filenum; current.line = line; current.discriminator = 0; @@ -1333,7 +1349,7 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) S_SET_VALUE (sym, 0); symbol_set_frag (sym, &zero_address_frag); } - current.view = sym; + current.u.view = sym; } else { @@ -1347,10 +1363,9 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) demand_empty_rest_of_line (); dwarf2_any_loc_directive_seen = dwarf2_loc_directive_seen = true; - debug_type = DEBUG_NONE; /* If we were given a view id, emit the row right away. */ - if (current.view) + if (current.u.view) dwarf2_emit_insn (0); } @@ -1984,7 +1999,7 @@ process_entries (segT seg, struct line_entry *e) frag_ofs = S_GET_VALUE (lab); if (last_frag == NULL - || (e->loc.view == force_reset_view && force_reset_view + || (e->loc.u.view == force_reset_view && force_reset_view /* If we're going to reset the view, but we know we're advancing the PC, we don't have to force with set_address. 
We know we do when we're at the same @@ -2850,16 +2865,19 @@ dwarf2_finish (void) struct line_subseg *lss = s->head; struct line_entry **ptail = lss->ptail; + if (lss->head && SEG_NORMAL (s->seg)) + do_allocate_filenum (lss->head); + /* Reset the initial view of the first subsection of the section. */ - if (lss->head && lss->head->loc.view) + if (lss->head && lss->head->loc.u.view) set_or_check_view (lss->head, NULL, NULL); while ((lss = lss->next) != NULL) { /* Link the first view of subsequent subsections to the previous view. */ - if (lss->head && lss->head->loc.view) + if (lss->head && lss->head->loc.u.view) set_or_check_view (lss->head, !s->head ? NULL : (struct line_entry *)ptail, s->head ? s->head->head : NULL); diff --git a/gas/dwarf2dbg.h b/gas/dwarf2dbg.h index 14d770c40dd..700d9dec5cb 100644 --- a/gas/dwarf2dbg.h +++ b/gas/dwarf2dbg.h @@ -36,7 +36,12 @@ struct dwarf2_line_info unsigned int isa; unsigned int flags; unsigned int discriminator; - symbolS *view; + /* filenum == -1u chooses filename, otherwise view. */ + union + { + symbolS *view; + const char *filename; + } u; }; /* Implements the .file FILENO "FILENAME" directive. 
FILENO can be 0 diff --git a/gas/testsuite/gas/i386/dwarf2-line-3.s b/gas/testsuite/gas/i386/dwarf2-line-3.s index 2085ef93940..e933719fbc3 100644 --- a/gas/testsuite/gas/i386/dwarf2-line-3.s +++ b/gas/testsuite/gas/i386/dwarf2-line-3.s @@ -7,6 +7,7 @@ main: .cfi_startproc nop + .file 1 "dwarf2-test.c" .loc 1 1 ret .cfi_endproc diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.d b/gas/testsuite/gas/i386/dwarf2-line-4.d index c0c85f4639f..a01fd0540f3 100644 --- a/gas/testsuite/gas/i386/dwarf2-line-4.d +++ b/gas/testsuite/gas/i386/dwarf2-line-4.d @@ -33,11 +33,14 @@ Raw dump of debug contents of section \.z?debug_line: The File Name Table $offset 0x.*$: Entry Dir Time Size Name - 1 1 0 0 dwarf2-line-4.s + 1 0 0 0 dwarf2-test.c + 2 1 0 0 dwarf2-line-4.s Line Number Statements: + \[0x.*\] Set File Name to entry 2 in the File Name Table \[0x.*\] Extended opcode 2: set Address to 0x0 \[0x.*\] Special opcode 13: advance Address by 0 to 0x0 and Line by 8 to 9 + \[0x.*\] Set File Name to entry 1 in the File Name Table \[0x.*\] Advance Line by -8 to 1 \[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 1 \[0x.*\] Advance PC by 1 to 0x2 diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.s b/gas/testsuite/gas/i386/dwarf2-line-4.s index 89bb62d9db7..7348f4be62c 100644 --- a/gas/testsuite/gas/i386/dwarf2-line-4.s +++ b/gas/testsuite/gas/i386/dwarf2-line-4.s @@ -7,6 +7,7 @@ main: .cfi_startproc nop + .file 1 "dwarf2-test.c" .loc 1 1 ret .cfi_endproc diff --git a/gas/testsuite/gas/i386/dwarf4-line-1.d b/gas/testsuite/gas/i386/dwarf4-line-1.d index 4f8321e9bfd..8199efbb0c2 100644 --- a/gas/testsuite/gas/i386/dwarf4-line-1.d +++ b/gas/testsuite/gas/i386/dwarf4-line-1.d @@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line: Entry Dir Time Size Name 1 0 0 0 foo.c 2 0 0 0 foo.h + 3 1 0 0 dwarf4-line-1.s Line Number Statements: + \[0x.*\] Set File Name to entry 2 in the File Name Table \[0x.*\] Extended opcode 2: set Address to 0x0 \[0x.*\] Advance 
Line by 81 to 82 \[0x.*\] Copy - \[0x.*\] Set File Name to entry 2 in the File Name Table + \[0x.*\] Set File Name to entry 3 in the File Name Table \[0x.*\] Advance Line by -73 to 9 \[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 9 \[0x.*\] Advance PC by 3 to 0x4 diff --git a/gas/testsuite/gas/i386/dwarf5-line-1.d b/gas/testsuite/gas/i386/dwarf5-line-1.d index f57fc47d269..2c2cf5696c4 100644 --- a/gas/testsuite/gas/i386/dwarf5-line-1.d +++ b/gas/testsuite/gas/i386/dwarf5-line-1.d @@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line: 0 $indirect line string, offset: 0x.*$: .*/gas/testsuite 1 $indirect line string, offset: 0x.*$: .*/gas/testsuite/gas/i386 - The File Name Table $offset 0x.*, lines 2, columns 3$: + The File Name Table $offset 0x.*, lines 3, columns 3$: Entry Dir MD5 Name 0 0 0xbbd69fc03ce253b2dbaab2522dd519ae $indirect line string, offset: 0x.*$: core.c 1 0 0x0 $indirect line string, offset: 0x.*$: types.h + 2 1 0x0 $indirect line string, offset: 0x.*$: dwarf5-line-1.s Line Number Statements: + \[0x.*\] Set File Name to entry 2 in the File Name Table \[0x.*\] Extended opcode 2: set Address to 0x0 \[0x.*\] Special opcode 8: advance Address by 0 to 0x0 and Line by 3 to 4 \[0x.*\] Advance PC by 1 to 0x1 diff --git a/gas/testsuite/gas/i386/dwarf5-line-2.d b/gas/testsuite/gas/i386/dwarf5-line-2.d index 2f96df510d0..85f98c8ab9c 100644 --- a/gas/testsuite/gas/i386/dwarf5-line-2.d +++ b/gas/testsuite/gas/i386/dwarf5-line-2.d @@ -36,9 +36,10 @@ Raw dump of debug contents of section \.z?debug_line: 0 $indirect line string, offset: 0x.*$: .*/gas/testsuite 1 $indirect line string, offset: 0x.*$: .*/gas/testsuite/gas/i386 - The File Name Table $offset 0x.*, lines 1, columns 3$: + The File Name Table $offset 0x.*, lines 2, columns 3$: Entry Dir MD5 Name 0 0 0xbbd69fc03ce253b2dbaab2522dd519ae $indirect line string, offset: 0x.*$: core.c + 1 1 0x0 $indirect line string, offset: .*$: dwarf5-line-2.s Line Number Statements: 
\[0x.*\] Extended opcode 2: set Address to 0x0 </cut>
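
The commit message above describes delaying file-number allocation for gas-generated line entries until the end of assembly, so that they cannot clash with numbers the compiler assigns through `.file N "name"`. The following is a minimal C sketch of that idea only. It is not the gas implementation: `struct line_info`, `file_table`, `directive_file`, `resolve_deferred` and `NO_FILENUM` are hypothetical names made up for illustration, and the table is a fixed-size array rather than the growable one gas uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sentinel meaning "no file number yet"; mirrors the "filenum == -1u"
   convention in the patch.  The name is an assumption of this sketch.  */
#define NO_FILENUM ((unsigned int) -1)
#define MAX_FILES 16

/* Hypothetical, simplified stand-in for a line entry.  While filenum is
   NO_FILENUM, only the file name is known.  */
struct line_info
{
  unsigned int filenum;
  unsigned int line;
  const char *filename;
  struct line_info *next;
};

/* Toy file table; slot 0 is left unused in this sketch.  */
static const char *file_table[MAX_FILES];

/* Model of an explicit ".file NUM "NAME"" directive from the compiler.
   Returns 1 on success, 0 if the slot is out of range or already taken.  */
static int
directive_file (unsigned int num, const char *name)
{
  if (num == 0 || num >= MAX_FILES || file_table[num] != NULL)
    return 0;
  file_table[num] = name;
  return 1;
}

/* Reuse an existing slot holding NAME, else take the first free slot.  */
static unsigned int
allocate_filenum (const char *name)
{
  unsigned int i;

  for (i = 1; i < MAX_FILES; i++)
    if (file_table[i] != NULL && strcmp (file_table[i], name) == 0)
      return i;
  for (i = 1; i < MAX_FILES; i++)
    if (file_table[i] == NULL)
      {
        file_table[i] = name;
        return i;
      }
  return NO_FILENUM; /* Table full.  */
}

/* Run once at the end of assembly: give every deferred entry a number.
   By now all explicit .file slots are known, so no clash is possible.  */
static void
resolve_deferred (struct line_info *e)
{
  for (; e != NULL; e = e->next)
    if (e->filenum == NO_FILENUM)
      {
        e->filenum = allocate_filenum (e->filename);
        assert (e->filenum != NO_FILENUM);
        e->filename = NULL;
      }
}
```

In this sketch an entry recorded for `foo.s` before the compiler's `.file 1 "foo.c"` ends up in slot 2 instead of occupying slot 1, which is the same shape as the updated `dwarf2-line-4.d` expectation in the diff (entry 1 for the `.c` file, entry 2 for the `.s` file).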

4 years, 9 months


[TCWG CI] 447.dealII grew in size by 2% after gcc: libstdc++: Fix and improve std::vector<bool> implementation.

by ci_notify＠linaro.org

After gcc commit 6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f
Author: François Dumont <fdumont(a)gcc.gnu.org>

libstdc++: Fix and improve std::vector<bool> implementation.

the following benchmarks grew in size by more than 1%:
- 447.dealII grew in size by 2% from 348834 to 356146 bytes

The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-gcc-6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f
cd investigate-gcc-6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 5f9669d9e23a1116e040c80e0f3d4f43639bda52
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f
Author: François Dumont <fdumont(a)gcc.gnu.org>
Date: Tue Jan 21 07:18:08 2020 +0100

libstdc++: Fix and improve std::vector<bool> implementation.

Do not consider allocator noexcept qualification for vector<bool> move constructor. Improve swap performance using TBAA like in main vector implementation. Bypass _M_initialize_dispatch/_M_assign_dispatch in post-c++11 modes. libstdc++-v3/ChangeLog: * include/bits/stl_bvector.h [_GLIBCXX_INLINE_VERSION](_Bvector_impl_data::_M_start): Define as _Bit_type*. (_Bvector_impl_data(const _Bvector_impl_data&)): Default. (_Bvector_impl_data(_Bvector_impl_data&&)): Delegate to latter. (_Bvector_impl_data::operator=(const _Bvector_impl_data&)): Default. (_Bvector_impl_data::_M_move_data(_Bvector_impl_data&&)): Use latter. (_Bvector_impl_data::_M_reset()): Likewise. (_Bvector_impl_data::_M_swap_data): New. (_Bvector_impl::_Bvector_impl(_Bvector_impl&&)): Implement explicitely. (_Bvector_impl::_Bvector_impl(_Bit_alloc_type&&, _Bvector_impl&&)): New. (_Bvector_base::_Bvector_base(_Bvector_base&&, const allocator_type&)): New, use latter. (vector::vector(vector&&, const allocator_type&, true_type)): New, use latter. (vector::vector(vector&&, const allocator_type&, false_type)): New. (vector::vector(vector&&, const allocator_type&)): Use latters. (vector::vector(const vector&, const allocator_type&)): Adapt. [__cplusplus >= 201103](vector::vector(_InputIt, _InputIt, const allocator_type&)): Use _M_initialize_range. (vector::operator[](size_type)): Use iterator operator[]. (vector::operator[](size_type) const): Use const_iterator operator[]. (vector::swap(vector&)): Add assertions on allocators. Use _M_swap_data. [__cplusplus >= 201103](vector::insert(const_iterator, _InputIt, _InputIt)): Use _M_insert_range. (vector::_M_initialize(size_type)): Adapt. [__cplusplus >= 201103](vector::_M_initialize_dispatch): Remove. [__cplusplus >= 201103](vector::_M_insert_dispatch): Remove. * python/libstdcxx/v6/printers.py (StdVectorPrinter._iterator): Stop using start _M_offset. (StdVectorPrinter.to_string): Likewise. * testsuite/23_containers/vector/bool/allocator/swap.cc: Adapt. 
* testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc: Add check. --- libstdc++-v3/include/bits/stl_bvector.h | 140 +++++++++++++-------- libstdc++-v3/python/libstdcxx/v6/printers.py | 5 +- .../23_containers/vector/bool/allocator/swap.cc | 22 ++-- .../vector/bool/cons/noexcept_move_construct.cc | 32 ++++- 4 files changed, 130 insertions(+), 69 deletions(-) diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h index a365e7182eb..d6f5435bdfb 100644 --- a/libstdc++-v3/include/bits/stl_bvector.h +++ b/libstdc++-v3/include/bits/stl_bvector.h @@ -427,53 +427,75 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER struct _Bvector_impl_data { - _Bit_iterator _M_start; - _Bit_iterator _M_finish; - _Bit_pointer _M_end_of_storage; +#if !_GLIBCXX_INLINE_VERSION + _Bit_iterator _M_start; +#else + // We don't need the offset field for the start, it's always zero. + struct { + _Bit_type* _M_p; + // Allow assignment from iterators (assume offset is zero): + void operator=(_Bit_iterator __it) { _M_p = __it._M_p; } + } _M_start; +#endif + _Bit_iterator _M_finish; + _Bit_pointer _M_end_of_storage; _Bvector_impl_data() _GLIBCXX_NOEXCEPT : _M_start(), _M_finish(), _M_end_of_storage() { } #if __cplusplus >= 201103L + _Bvector_impl_data(const _Bvector_impl_data&) = default; + _Bvector_impl_data& + operator=(const _Bvector_impl_data&) = default; + _Bvector_impl_data(_Bvector_impl_data&& __x) noexcept - : _M_start(__x._M_start), _M_finish(__x._M_finish) - , _M_end_of_storage(__x._M_end_of_storage) + : _Bvector_impl_data(__x) { __x._M_reset(); } void _M_move_data(_Bvector_impl_data&& __x) noexcept { - this->_M_start = __x._M_start; - this->_M_finish = __x._M_finish; - this->_M_end_of_storage = __x._M_end_of_storage; + *this = __x; __x._M_reset(); } #endif void _M_reset() _GLIBCXX_NOEXCEPT + { *this = _Bvector_impl_data(); } + + void + _M_swap_data(_Bvector_impl_data& __x) _GLIBCXX_NOEXCEPT { - _M_start = _M_finish = _Bit_iterator(); - 
_M_end_of_storage = _Bit_pointer(); + // Do not use std::swap(_M_start, __x._M_start), etc as it loses + // information used by TBAA. + std::swap(*this, __x); } }; struct _Bvector_impl : public _Bit_alloc_type, public _Bvector_impl_data - { - public: - _Bvector_impl() _GLIBCXX_NOEXCEPT_IF( - is_nothrow_default_constructible<_Bit_alloc_type>::value) - : _Bit_alloc_type() - { } + { + _Bvector_impl() _GLIBCXX_NOEXCEPT_IF( + is_nothrow_default_constructible<_Bit_alloc_type>::value) + : _Bit_alloc_type() + { } - _Bvector_impl(const _Bit_alloc_type& __a) _GLIBCXX_NOEXCEPT - : _Bit_alloc_type(__a) - { } + _Bvector_impl(const _Bit_alloc_type& __a) _GLIBCXX_NOEXCEPT + : _Bit_alloc_type(__a) + { } #if __cplusplus >= 201103L - _Bvector_impl(_Bvector_impl&&) = default; + // Not defaulted, to enforce noexcept(true) even when + // !is_nothrow_move_constructible<_Bit_alloc_type>. + _Bvector_impl(_Bvector_impl&& __x) noexcept + : _Bit_alloc_type(std::move(__x)), _Bvector_impl_data(std::move(__x)) + { } + + _Bvector_impl(_Bit_alloc_type&& __a, _Bvector_impl&& __x) noexcept + : _Bit_alloc_type(std::move(__a)), _Bvector_impl_data(std::move(__x)) + { } #endif _Bit_type* @@ -511,6 +533,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #if __cplusplus >= 201103L _Bvector_base(_Bvector_base&&) = default; + + _Bvector_base(_Bvector_base&& __x, const allocator_type& __a) noexcept + : _M_impl(_Bit_alloc_type(__a), std::move(__x._M_impl)) + { } #endif ~_Bvector_base() @@ -647,14 +673,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER : _Base(_Bit_alloc_traits::_S_select_on_copy(__x._M_get_Bit_allocator())) { _M_initialize(__x.size()); - _M_copy_aligned(__x.begin(), __x.end(), this->_M_impl._M_start); + _M_copy_aligned(__x.begin(), __x.end(), begin()); } #if __cplusplus >= 201103L vector(vector&&) = default; - vector(vector&& __x, const allocator_type& __a) - noexcept(_Bit_alloc_traits::_S_always_equal()) + private: + vector(vector&& __x, const allocator_type& __a, true_type) noexcept + : _Base(std::move(__x), 
__a) + { } + + vector(vector&& __x, const allocator_type& __a, false_type) : _Base(__a) { if (__x.get_allocator() == __a) @@ -667,11 +697,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER } } + public: + vector(vector&& __x, const allocator_type& __a) + noexcept(_Bit_alloc_traits::_S_always_equal()) + : vector(std::move(__x), __a, + typename _Bit_alloc_traits::is_always_equal{}) + { } + vector(const vector& __x, const allocator_type& __a) : _Base(__a) { _M_initialize(__x.size()); - _M_copy_aligned(__x.begin(), __x.end(), this->_M_impl._M_start); + _M_copy_aligned(__x.begin(), __x.end(), begin()); } vector(initializer_list<bool> __l, @@ -689,13 +726,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER vector(_InputIterator __first, _InputIterator __last, const allocator_type& __a = allocator_type()) : _Base(__a) - { _M_initialize_dispatch(__first, __last, __false_type()); } + { + _M_initialize_range(__first, __last, + std::__iterator_category(__first)); + } #else template<typename _InputIterator> vector(_InputIterator __first, _InputIterator __last, const allocator_type& __a = allocator_type()) : _Base(__a) { + // Check whether it's an integral type. If so, it's not an iterator. typedef typename std::__is_integer<_InputIterator>::__type _Integral; _M_initialize_dispatch(__first, __last, _Integral()); } @@ -762,7 +803,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER vector& operator=(initializer_list<bool> __l) { - this->assign (__l.begin(), __l.end()); + this->assign(__l.begin(), __l.end()); return *this; } #endif @@ -786,6 +827,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void assign(_InputIterator __first, _InputIterator __last) { + // Check whether it's an integral type. If so, it's not an iterator. 
typedef typename std::__is_integer<_InputIterator>::__type _Integral; _M_assign_dispatch(__first, __last, _Integral()); } @@ -874,17 +916,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER reference operator[](size_type __n) - { - return *iterator(this->_M_impl._M_start._M_p - + __n / int(_S_word_bit), __n % int(_S_word_bit)); - } + { return begin()[__n]; } const_reference operator[](size_type __n) const - { - return *const_iterator(this->_M_impl._M_start._M_p - + __n / int(_S_word_bit), __n % int(_S_word_bit)); - } + { return begin()[__n]; } protected: void @@ -951,10 +987,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void swap(vector& __x) _GLIBCXX_NOEXCEPT { - std::swap(this->_M_impl._M_start, __x._M_impl._M_start); - std::swap(this->_M_impl._M_finish, __x._M_impl._M_finish); - std::swap(this->_M_impl._M_end_of_storage, - __x._M_impl._M_end_of_storage); +#if __cplusplus >= 201103L + __glibcxx_assert(_Bit_alloc_traits::propagate_on_container_swap::value + || _M_get_Bit_allocator() == __x._M_get_Bit_allocator()); +#endif + this->_M_impl._M_swap_data(__x._M_impl); _Bit_alloc_traits::_S_on_swap(_M_get_Bit_allocator(), __x._M_get_Bit_allocator()); } @@ -992,8 +1029,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _InputIterator __first, _InputIterator __last) { difference_type __offset = __position - cbegin(); - _M_insert_dispatch(__position._M_const_cast(), - __first, __last, __false_type()); + _M_insert_range(__position._M_const_cast(), + __first, __last, + std::__iterator_category(__first)); return begin() + __offset; } #else @@ -1002,6 +1040,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER insert(iterator __position, _InputIterator __first, _InputIterator __last) { + // Check whether it's an integral type. If so, it's not an iterator. 
typedef typename std::__is_integer<_InputIterator>::__type _Integral; _M_insert_dispatch(__position, __first, __last, _Integral()); } @@ -1113,15 +1152,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER { _Bit_pointer __q = this->_M_allocate(__n); this->_M_impl._M_end_of_storage = __q + _S_nword(__n); - this->_M_impl._M_start = iterator(std::__addressof(*__q), 0); + iterator __start = iterator(std::__addressof(*__q), 0); + this->_M_impl._M_start = __start; + this->_M_impl._M_finish = __start + difference_type(__n); } - else - { - this->_M_impl._M_end_of_storage = _Bit_pointer(); - this->_M_impl._M_start = iterator(0, 0); - } - this->_M_impl._M_finish = this->_M_impl._M_start + difference_type(__n); - } void @@ -1141,8 +1175,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _M_shrink_to_fit(); #endif - // Check whether it's an integral type. If so, it's not an iterator. - +#if __cplusplus < 201103L // _GLIBCXX_RESOLVE_LIB_DEFECTS // 438. Ambiguity in the "do the right thing" clause template<typename _Integer> @@ -1159,6 +1192,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER __false_type) { _M_initialize_range(__first, __last, std::__iterator_category(__first)); } +#endif template<typename _InputIterator> void @@ -1176,7 +1210,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER { const size_type __n = std::distance(__first, __last); _M_initialize(__n); - std::copy(__first, __last, this->_M_impl._M_start); + std::copy(__first, __last, begin()); } #if __cplusplus < 201103L @@ -1240,8 +1274,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER } } - // Check whether it's an integral type. If so, it's not an iterator. - +#if __cplusplus < 201103L // _GLIBCXX_RESOLVE_LIB_DEFECTS // 438. 
Ambiguity in the "do the right thing" clause template<typename _Integer> @@ -1257,6 +1290,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER __false_type) { _M_insert_range(__pos, __first, __last, std::__iterator_category(__first)); } +#endif void _M_fill_insert(iterator __position, size_type __n, bool __x); diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py index e4da8dfe5b6..0bf307b8e5f 100644 --- a/libstdc++-v3/python/libstdcxx/v6/printers.py +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py @@ -405,7 +405,7 @@ class StdVectorPrinter: self.bitvec = bitvec if bitvec: self.item = start['_M_p'] - self.so = start['_M_offset'] + self.so = 0 self.finish = finish['_M_p'] self.fo = finish['_M_offset'] itype = self.item.dereference().type @@ -453,12 +453,11 @@ class StdVectorPrinter: end = self.val['_M_impl']['_M_end_of_storage'] if self.is_bool: start = self.val['_M_impl']['_M_start']['_M_p'] - so = self.val['_M_impl']['_M_start']['_M_offset'] finish = self.val['_M_impl']['_M_finish']['_M_p'] fo = self.val['_M_impl']['_M_finish']['_M_offset'] itype = start.dereference().type bl = 8 * itype.sizeof - length = (bl - so) + bl * ((finish - start) - 1) + fo + length = bl * (finish - start) + fo capacity = bl * (end - start) return ('%s<bool> of length %d, capacity %d' % (self.typename, int (length), int (capacity))) diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/allocator/swap.cc b/libstdc++-v3/testsuite/23_containers/vector/bool/allocator/swap.cc index a8107145c58..793115b473e 100644 --- a/libstdc++-v3/testsuite/23_containers/vector/bool/allocator/swap.cc +++ b/libstdc++-v3/testsuite/23_containers/vector/bool/allocator/swap.cc @@ -28,19 +28,17 @@ namespace __gnu_test // It is undefined behaviour to swap() containers with unequal allocators // if the allocator doesn't propagate, so ensure the allocators compare // equal, while still being able to test propagation via get_personality(). 
- bool - operator==(const propagating_allocator<T, false>&, - const propagating_allocator<T, false>&) - { - return true; - } + template<typename Type> + bool + operator==(const propagating_allocator<Type, false>&, + const propagating_allocator<Type, false>&) + { return true; } - bool - operator!=(const propagating_allocator<T, false>&, - const propagating_allocator<T, false>&) - { - return false; - } + template<typename Type> + bool + operator!=(const propagating_allocator<Type, false>&, + const propagating_allocator<Type, false>&) + { return false; } } using __gnu_test::propagating_allocator; diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc b/libstdc++-v3/testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc index 03794d8ebd8..296ba33bba8 100644 --- a/libstdc++-v3/testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc +++ b/libstdc++-v3/testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc @@ -23,4 +23,34 @@ typedef std::vector<bool> vbtype; -static_assert(std::is_nothrow_move_constructible<vbtype>::value, "Error"); +static_assert( std::is_nothrow_move_constructible<vbtype>::value, + "noexcept move constructor" ); +static_assert( std::is_nothrow_constructible<vbtype, + vbtype&&, const typename vbtype::allocator_type&>::value, + "noexcept move constructor with allocator" ); + +template<typename Type> + class not_noexcept_move_constructor_alloc : public std::allocator<Type> + { + public: + not_noexcept_move_constructor_alloc() noexcept { } + + not_noexcept_move_constructor_alloc( + const not_noexcept_move_constructor_alloc& x) noexcept + : std::allocator<Type>(x) + { } + + not_noexcept_move_constructor_alloc( + not_noexcept_move_constructor_alloc&& x) noexcept(false) + : std::allocator<Type>(std::move(x)) + { } + + template<typename _Tp1> + struct rebind + { typedef not_noexcept_move_constructor_alloc<_Tp1> other; }; + }; + +typedef std::vector<bool, 
not_noexcept_move_constructor_alloc<bool>> vbtype2; + +static_assert( std::is_nothrow_move_constructible<vbtype2>::value, + "noexcept move constructor with not noexcept alloc" ); </cut>
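
A note on the printers.py hunk in the diff above: it drops the start offset from the vector<bool> length computation (self.so becomes 0, and the length is bl * (finish - start) + fo). The following is a minimal sketch of why the two formulas agree once the start offset is taken to be zero. It models the word pointers as plain integer word indices instead of gdb.Value objects, and the function names are hypothetical, chosen for illustration only.

```python
def bitvector_length_old(word_bits, start_word, start_offset,
                         finish_word, finish_offset):
    """Pre-patch formula: allows a non-zero bit offset in the first word."""
    return ((word_bits - start_offset)
            + word_bits * ((finish_word - start_word) - 1)
            + finish_offset)


def bitvector_length_new(word_bits, start_word, finish_word, finish_offset):
    """Post-patch formula: the start offset is assumed to be zero."""
    return word_bits * (finish_word - start_word) + finish_offset


if __name__ == "__main__":
    # 130 bits in 64-bit words: two full words plus 2 bits in the third.
    print(bitvector_length_new(64, 0, 2, 2))
    # With a zero start offset the old formula gives the same answer.
    print(bitvector_length_old(64, 0, 0, 2, 2))
```

With start_offset fixed at 0, the old expression reduces to word_bits + word_bits * (n - 1) + finish_offset, which is the new expression, so the printer output should be unchanged for any vector whose first bit sits at offset 0.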

4 years, 9 months


[TCWG CI] Regression caused by gcc:9e58de3ce00fc2385c9efb7faf321e0c601f0b0c

by ci_notify＠linaro.org

Identified regression caused by *gcc:9e58de3ce00fc2385c9efb7faf321e0c601f0b0c*:
commit 9e58de3ce00fc2385c9efb7faf321e0c601f0b0c
Author: Andrew Pinski <apinski(a)marvell.com>

    Fix PR lto/49664: liblto_plugin.so exports too many symbols

Results regressed to (for first_bad == 9e58de3ce00fc2385c9efb7faf321e0c601f0b0c)
# reset_artifacts: -10
# true: 0
# build_abe binutils: 1
# First few build errors in logs:

from (for last_good == 512b383534785f9fc021e700a1fdda86cf0f3fe7)
# reset_artifacts: -10
# true: 0
# build_abe binutils: 1
# build_abe bootstrap_lto: 2

This commit has regressed these CI configurations:
 - tcwg_gcc_bootstrap/master-aarch64-bootstrap_lto

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…

Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…

Reproduce builds:
<cut>
mkdir investigate-gcc-9e58de3ce00fc2385c9efb7faf321e0c601f0b0c
cd investigate-gcc-9e58de3ce00fc2385c9efb7faf321e0c601f0b0c

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 9e58de3ce00fc2385c9efb7faf321e0c601f0b0c
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 512b383534785f9fc021e700a1fdda86cf0f3fe7
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 9e58de3ce00fc2385c9efb7faf321e0c601f0b0c
Author: Andrew Pinski <apinski(a)marvell.com>
Date:   Sun Sep 12 08:58:16 2021 +0000

    Fix PR lto/49664: liblto_plugin.so exports too many symbols

    So right now liblto_plugin.so exports many libiberty symbols
    and simple_object file symbols but really it just needs to
    export onload. This fixes the problem by using
    "-export-symbols-regex onload" on the libtool link line.

    lto-plugin/ChangeLog:

            PR lto/49664
            * Makefile.am: Export only onload.
            * Makefile.in: Regenerate.
---
 lto-plugin/Makefile.am | 3 ++-
 lto-plugin/Makefile.in | 7 ++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index 8b20e1d1d87..988d7a78294 100644
--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -21,7 +21,8 @@ in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
 liblto_plugin_la_SOURCES = lto-plugin.c
 # Note that we intentionally override the bindir supplied by ACX_LT_HOST_FLAGS.
 liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) \
-        $(lt_host_flags) -module -avoid-version -bindir $(libexecsubdir)
+        $(lt_host_flags) -module -avoid-version -bindir $(libexecsubdir) \
+        -export-symbols-regex onload
 # Can be simplified when libiberty becomes a normal convenience library.
 libiberty = $(with_libiberty)/libiberty.a
 libiberty_noasan = $(with_libiberty)/noasan/libiberty.a
diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index 20611c6b1e6..f8df31bb1e8 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -323,6 +323,7 @@ prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
 real_target_noncanonical = @real_target_noncanonical@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
@@ -350,9 +351,9 @@ libexecsub_LTLIBRARIES = liblto_plugin.la
 in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
 liblto_plugin_la_SOURCES = lto-plugin.c
 # Note that we intentionally override the bindir supplied by ACX_LT_HOST_FLAGS.
-liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) $(lt_host_flags) -module -avoid-version \
-        -bindir $(libexecsubdir) $(if $(wildcard \
-        $(libiberty_noasan)),, $(if $(wildcard \
+liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) $(lt_host_flags) -module \
+        -avoid-version -bindir $(libexecsubdir) -export-symbols-regex \
+        onload $(if $(wildcard $(libiberty_noasan)),, $(if $(wildcard \
         $(libiberty_pic)),,-Wc,$(libiberty)))
 # Can be simplified when libiberty becomes a normal convenience library.
 libiberty = $(with_libiberty)/libiberty.a
</cut>
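
The first_bad and last_good result lists in the report above differ only in which build steps were reached. A minimal sketch of comparing two such lists follows. It assumes each step is written as "# name: value" exactly as it appears in this email; the helper names parse_steps and missing_steps are hypothetical and are not part of the jenkins-scripts repository.

```python
def parse_steps(report):
    """Turn '# name: value # name: value ...' into an ordered dict.

    Entries with no value (for example the 'First few build errors in
    logs:' marker) are skipped.
    """
    steps = {}
    for chunk in report.split("#"):
        name, sep, value = chunk.partition(":")
        name, value = name.strip(), value.strip()
        if sep and name and value:
            steps[name] = value
    return steps


def missing_steps(last_good, first_bad):
    """Return the steps reached by last_good but not by first_bad."""
    good, bad = parse_steps(last_good), parse_steps(first_bad)
    return [name for name in good if name not in bad]


# Result lists copied from the report above.
FIRST_BAD = ("# reset_artifacts: -10 # true: 0 # build_abe binutils: 1 "
             "# First few build errors in logs:")
LAST_GOOD = ("# reset_artifacts: -10 # true: 0 # build_abe binutils: 1 "
             "# build_abe bootstrap_lto: 2")

if __name__ == "__main__":
    print(missing_steps(LAST_GOOD, FIRST_BAD))
```

For the lists in this report the only step missing from first_bad is "build_abe bootstrap_lto", which matches the regressed configuration named in the email.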

4 years, 9 months


[TCWG CI] Regression caused by binutils: Automatic date update in version.in

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: Automatic date update in version.in:
commit dc746ef741993a7aed1f7fc0083cd7a9636481a3
Author: GDB Administrator <gdbadmin(a)sourceware.org>

    Automatic date update in version.in

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 28705
# linux build successful: all
# First few build errors in logs:

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 28705
# linux build successful: all
# linux boot successful: boot

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_kernel/gnu-master-aarch64-lts-allmodconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…

Reproduce builds:
<cut>
mkdir investigate-binutils-dc746ef741993a7aed1f7fc0083cd7a9636481a3
cd investigate-binutils-dc746ef741993a7aed1f7fc0083cd7a9636481a3

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach dc746ef741993a7aed1f7fc0083cd7a9636481a3
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach f677852bbdaeac38c7d8ef859905879a21d5bb71
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit dc746ef741993a7aed1f7fc0083cd7a9636481a3
Author: GDB Administrator <gdbadmin(a)sourceware.org>
Date:   Thu Sep 16 00:00:07 2021 +0000

    Automatic date update in version.in
---
 bfd/version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bfd/version.h b/bfd/version.h
index 3fc5b8197cf..f7ae1790855 100644
--- a/bfd/version.h
+++ b/bfd/version.h
@@ -16,7 +16,7 @@
 In releases, the date is not included in either version strings or sonames. */
-#define BFD_VERSION_DATE 20210915
+#define BFD_VERSION_DATE 20210916
 #define BFD_VERSION @bfd_version@
 #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@
 #define REPORT_BUGS_TO @report_bugs_to@
</cut>

4 years, 9 months


[ACTIVITY] report week ending 16 Sep

by Peter Maydell

Progress (short week, 3 days)

* UM-2 [QEMU upstream maintainership]
  + more code review, notably the Apple Silicon hvf support, which is nearly ready to go in
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
  + Sent out v2 of the "optimized code gen for MVE" patchset; this now covers all the insns that have an easy optimized version.
  + Fixed a bug where we weren't correctly setting up FPSCR.LTPSIZE when using QEMU's user-mode-only emulator
  + Wrote some code to add support for the (not yet finalized) gdbstub XML that tells GDB that the guest CPU has MVE. This causes a GDB with the MVE handling to crash, so one or the other of us has got something wrong :-)

KVM Forum was this week, as a 2-day virtual conference. I felt the programme was comparatively a bit small this year, but there were some interesting talks. Also a BoF session on whether/how we should consider adding Rust code to QEMU: I am pushing for (a) a clearer medium-to-long-term vision of where we would be going and why we'd be doing this and (b) more design-sketch type work of "what would XYZ in rust look like", which would hopefully both (a) make the benefit/lack thereof a bit more clear and (b) demonstrate that there are enough people enthusiastic enough about the prospect to make it a success...

-- PMM

4 years, 9 months


[TCWG CI] 447.dealII,[.] grew in size by 164% after llvm: [libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.

by ci_notify＠linaro.org

After llvm commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091 Author: Amy Kwan <amy.kwan1(a)ibm.com> [libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out. the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%): - 447.dealII,[.] contract<3> grew in size by 164% Benchmark: Toolchain: Clang + Glibc + LLVM Linker Version: all components were built from their latest release branch Target: aarch64-linux-gnu Compiler flags: -Oz Hardware: APM Mustang 8x X-Gene1 This commit has regressed these CI configurations: - tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Oz First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… Reproduce builds: <cut> mkdir investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091 cd investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded 
--exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091 ../artifacts/test.sh # Reproduce last_good build git checkout --detach c8905f1bb304f1cfe297312ae0dda9946cb27594 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091 Author: Amy Kwan <amy.kwan1(a)ibm.com> Date: Fri Sep 3 14:53:57 2021 -0400 [libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out. It appears when testing LLVM 13 on Power, we run into failures with the `libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp` test case optimizing values out. Despite some the functions in the test already being marked with optnone, adding the `MarkAsLive()` calls inside of the pretty printer comparison functions resolves the issues of the values being optimized out. This patch aims to address https://llvm.org/PR51675. 
Differential Revision: https://reviews.llvm.org/D109204 (cherry picked from commit 217c6d643124be312f4a99b203118744edb9d54c) --- libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp index 2d8e9620089a..7c8d307d19fb 100644 --- a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp +++ b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp @@ -92,24 +92,28 @@ void MarkAsLive(Type &&) {} template <typename TypeToPrint> void ComparePrettyPrintToChars( TypeToPrint value, const char *expectation) { + MarkAsLive(value); StopForDebugger(&value, &expectation); } template <typename TypeToPrint> void ComparePrettyPrintToRegex( TypeToPrint value, const char *expectation) { + MarkAsLive(value); StopForDebugger(&value, &expectation); } void CompareExpressionPrettyPrintToChars( std::string value, const char *expectation) { + MarkAsLive(value); StopForDebugger(&value, &expectation); } void CompareExpressionPrettyPrintToRegex( std::string value, const char *expectation) { + MarkAsLive(value); StopForDebugger(&value, &expectation); } </cut>

4 years, 9 months


[TCWG CI] 456.hmmer slowed down by 4% after gcc: c++ ICE with nested requirement as default tpl parm[PR94827]

by ci_notify＠linaro.org

After gcc commit c416c52bcdb120db5e8c53a51bd78c4360daf79b Author: Nathan Sidwell <nathan(a)acm.org> c++ ICE with nested requirement as default tpl parm[PR94827] the following benchmarks slowed down by more than 2%: - 456.hmmer slowed down by 4% Benchmark: Toolchain: GCC + Glibc + GNU Linker Version: all components were built from their latest release branch Target: aarch64-linux-gnu Compiler flags: -O3 -flto Hardware: NVidia TX1 4x Cortex-A57 This commit has regressed these CI configurations: - tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… Reproduce builds: <cut> mkdir investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b cd investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git 
checkout --detach c416c52bcdb120db5e8c53a51bd78c4360daf79b ../artifacts/test.sh # Reproduce last_good build git checkout --detach b1983f4582bbe060b7da83578acb9ed653681fc8 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit c416c52bcdb120db5e8c53a51bd78c4360daf79b Author: Nathan Sidwell <nathan(a)acm.org> Date: Thu Apr 30 08:23:16 2020 -0700 c++ ICE with nested requirement as default tpl parm[PR94827] Template headers are not incrementally updated as we parse its parameters. We maintain a dummy level until the closing > when we replace the dummy with a real parameter set. requires processing was expecting a properly populated arg_vec in current_template_parms, and then creates a self-mapping of parameters from that. But we don't need to do that, just teach map_arguments to look at TREE_VALUE when args is NULL. * constraint.cc (map_arguments): If ARGS is null, it's a self-mapping of parms. (finish_nested_requirement): Do not pass argified current_template_parms to normalization. (tsubst_nested_requirement): Don't assert no template parms. --- gcc/cp/ChangeLog | 10 ++++++++++ gcc/cp/constraint.cc | 27 ++++++++++++++++----------- gcc/testsuite/g++.dg/concepts/pr94827.C | 15 +++++++++++++++ 3 files changed, 41 insertions(+), 11 deletions(-) diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog index 1fa0e123cb1..3c57945cecf 100644 --- a/gcc/cp/ChangeLog +++ b/gcc/cp/ChangeLog @@ -1,3 +1,13 @@ +2020-04-30 Jason Merrill <jason(a)redhat.com> + Nathan Sidwell <nathan(a)acm.org> + + PR c++/94827 + * constraint.cc (map_arguments): If ARGS is null, it's a + self-mapping of parms. + (finish_nested_requirement): Do not pass argified + current_template_parms to normalization. + (tsubst_nested_requirement): Don't assert no template parms. 
+ 2020-04-30 Iain Sandoe <iain(a)sandoe.co.uk> PR c++/94886 diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index 866b0f51b05..85513fecf43 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -546,12 +546,16 @@ static tree map_arguments (tree parms, tree args) { for (tree p = parms; p; p = TREE_CHAIN (p)) - { - int level; - int index; - template_parm_level_and_index (TREE_VALUE (p), &level, &index); - TREE_PURPOSE (p) = TMPL_ARG (args, level, index); - } + if (args) + { + int level; + int index; + template_parm_level_and_index (TREE_VALUE (p), &level, &index); + TREE_PURPOSE (p) = TMPL_ARG (args, level, index); + } + else + TREE_PURPOSE (p) = TREE_VALUE (p); + return parms; } @@ -2005,8 +2009,6 @@ tsubst_compound_requirement (tree t, tree args, subst_info info) static tree tsubst_nested_requirement (tree t, tree args, subst_info info) { - gcc_assert (!uses_template_parms (args)); - /* Ensure that we're in an evaluation context prior to satisfaction. */ tree norm = TREE_VALUE (TREE_TYPE (t)); tree result = satisfy_constraint (norm, args, info); @@ -2953,12 +2955,15 @@ finish_compound_requirement (location_t loc, tree expr, tree type, bool noexcept tree finish_nested_requirement (location_t loc, tree expr) { + /* Currently open template headers have dummy arg vectors, so don't + pass into normalization. */ + tree norm = normalize_constraint_expression (expr, NULL_TREE, false); + tree args = current_template_parms + ? template_parms_to_args (current_template_parms) : NULL_TREE; + /* Save the normalized constraint and complete set of normalization arguments with the requirement. We keep the complete set of arguments around for re-normalization during diagnostics. */ - tree args = current_template_parms - ? template_parms_to_args (current_template_parms) : NULL_TREE; - tree norm = normalize_constraint_expression (expr, args, false); tree info = build_tree_list (args, norm); /* Build the constraint, saving its normalization as its type. 
*/ diff --git a/gcc/testsuite/g++.dg/concepts/pr94827.C b/gcc/testsuite/g++.dg/concepts/pr94827.C new file mode 100644 index 00000000000..f14ec2551a1 --- /dev/null +++ b/gcc/testsuite/g++.dg/concepts/pr94827.C @@ -0,0 +1,15 @@ +// PR 94287 ICE looking inside open template-parm level +// { dg-do run { target c++17 } } +// { dg-options -fconcepts } + +template <typename T, + bool X = requires { requires (sizeof(T)==1); } > + int foo(T) { return X; } + +int main() { + if (!foo('4')) + return 1; + if (foo (4)) + return 2; + return 0; +} </cut>

4 years, 9 months


[TCWG CI] Regression caused by llvm: Inform pass manager when child loops are deleted

by ci_notify＠linaro.org

After llvm commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>

    Inform pass manager when child loops are deleted

The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
cd investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach
f17d60d620283b5d53286056ceeaeb8c27b6530a ../artifacts/test.sh # Reproduce last_good build git checkout --detach f56129fe78d5c849971017976c71333b6b1a27c6 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit f17d60d620283b5d53286056ceeaeb8c27b6530a Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com> Date: Fri Sep 3 20:50:33 2021 +0200 Inform pass manager when child loops are deleted As part of the nontrivial unswitching we could end up removing child loops. This patch add a notification to the pass manager when that happens (using the markLoopAsDeleted callback). Without this there could be stale LoopAccessAnalysis results cached in the analysis manager. Those analysis results are cached based on a Loop* as key. Since the BumpPtrAllocator used to allocate Loop objects could be resetted between different runs of for example the loop-distribute pass (running on different functions), a new Loop object could be created using the same Loop pointer. And then when requiring the LoopAccessAnalysis for the loop we got the stale (corrupt) result from the destroyed loop. 
Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D109257 (fixes PR51754) (cherry-picked from commit 0f0344dd1e3b53387bb396070916e67f4c426da6) --- llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | 43 +++++++++---- .../nontrivial-unswitch-markloopasdeleted.ll | 71 ++++++++++++++++++++++ 2 files changed, 102 insertions(+), 12 deletions(-) diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp index b9cccc2af309..b1c105258027 100644 --- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp +++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp @@ -1587,10 +1587,12 @@ deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks, BB->eraseFromParent(); } -static void deleteDeadBlocksFromLoop(Loop &L, - SmallVectorImpl<BasicBlock *> &ExitBlocks, - DominatorTree &DT, LoopInfo &LI, - MemorySSAUpdater *MSSAU) { +static void +deleteDeadBlocksFromLoop(Loop &L, + SmallVectorImpl<BasicBlock *> &ExitBlocks, + DominatorTree &DT, LoopInfo &LI, + MemorySSAUpdater *MSSAU, + function_ref<void(Loop &, StringRef)> DestroyLoopCB) { // Find all the dead blocks tied to this loop, and remove them from their // successors. SmallSetVector<BasicBlock *, 8> DeadBlockSet; @@ -1640,6 +1642,7 @@ static void deleteDeadBlocksFromLoop(Loop &L, }) && "If the child loop header is dead all blocks in the child loop must " "be dead as well!"); + DestroyLoopCB(*ChildL, ChildL->getName()); LI.destroy(ChildL); return true; }); @@ -1980,6 +1983,8 @@ static bool rebuildLoopAfterUnswitch(Loop &L, ArrayRef<BasicBlock *> ExitBlocks, ParentL->removeChildLoop(llvm::find(*ParentL, &L)); else LI.removeLoop(llvm::find(LI, &L)); + // markLoopAsDeleted for L should be triggered by the caller (it is typically + // done by using the UnswitchCB callback). 
LI.destroy(&L); return false; } @@ -2019,7 +2024,8 @@ static void unswitchNontrivialInvariants( SmallVectorImpl<BasicBlock *> &ExitBlocks, IVConditionInfo &PartialIVInfo, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC, function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB, - ScalarEvolution *SE, MemorySSAUpdater *MSSAU) { + ScalarEvolution *SE, MemorySSAUpdater *MSSAU, + function_ref<void(Loop &, StringRef)> DestroyLoopCB) { auto *ParentBB = TI.getParent(); BranchInst *BI = dyn_cast<BranchInst>(&TI); SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI); @@ -2319,7 +2325,7 @@ static void unswitchNontrivialInvariants( // Now that our cloned loops have been built, we can update the original loop. // First we delete the dead blocks from it and then we rebuild the loop // structure taking these deletions into account. - deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU); + deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU, DestroyLoopCB); if (MSSAU && VerifyMemorySSA) MSSAU->getMemorySSA()->verifyMemorySSA(); @@ -2670,7 +2676,8 @@ static bool unswitchBestCondition( Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC, AAResults &AA, TargetTransformInfo &TTI, function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB, - ScalarEvolution *SE, MemorySSAUpdater *MSSAU) { + ScalarEvolution *SE, MemorySSAUpdater *MSSAU, + function_ref<void(Loop &, StringRef)> DestroyLoopCB) { // Collect all invariant conditions within this loop (as opposed to an inner // loop which would be handled when visiting that inner loop). 
SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4> @@ -2958,7 +2965,7 @@ static bool unswitchBestCondition( << "\n"); unswitchNontrivialInvariants(L, *BestUnswitchTI, BestUnswitchInvariants, ExitBlocks, PartialIVInfo, DT, LI, AC, - UnswitchCB, SE, MSSAU); + UnswitchCB, SE, MSSAU, DestroyLoopCB); return true; } @@ -2988,7 +2995,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC, AAResults &AA, TargetTransformInfo &TTI, bool Trivial, bool NonTrivial, function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB, - ScalarEvolution *SE, MemorySSAUpdater *MSSAU) { + ScalarEvolution *SE, MemorySSAUpdater *MSSAU, + function_ref<void(Loop &, StringRef)> DestroyLoopCB) { assert(L.isRecursivelyLCSSAForm(DT, LI) && "Loops must be in LCSSA form before unswitching."); @@ -3036,7 +3044,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC, // Try to unswitch the best invariant condition. We prefer this full unswitch to // a partial unswitch when possible below the threshold. - if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU)) + if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU, + DestroyLoopCB)) return true; // No other opportunities to unswitch. @@ -3083,6 +3092,10 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM, U.markLoopAsDeleted(L, LoopName); }; + auto DestroyLoopCB = [&U](Loop &L, StringRef Name) { + U.markLoopAsDeleted(L, Name); + }; + Optional<MemorySSAUpdater> MSSAU; if (AR.MSSA) { MSSAU = MemorySSAUpdater(AR.MSSA); @@ -3091,7 +3104,8 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM, } if (!unswitchLoop(L, AR.DT, AR.LI, AR.AC, AR.AA, AR.TTI, Trivial, NonTrivial, UnswitchCB, &AR.SE, - MSSAU.hasValue() ? MSSAU.getPointer() : nullptr)) + MSSAU.hasValue() ? 
MSSAU.getPointer() : nullptr, + DestroyLoopCB)) return PreservedAnalyses::all(); if (AR.MSSA && VerifyMemorySSA) @@ -3179,12 +3193,17 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) { LPM.markLoopAsDeleted(*L); }; + auto DestroyLoopCB = [&LPM](Loop &L, StringRef /* Name */) { + LPM.markLoopAsDeleted(L); + }; + if (MSSA && VerifyMemorySSA) MSSA->verifyMemorySSA(); bool Changed = unswitchLoop(*L, DT, LI, AC, AA, TTI, true, NonTrivial, UnswitchCB, SE, - MSSAU.hasValue() ? MSSAU.getPointer() : nullptr); + MSSAU.hasValue() ? MSSAU.getPointer() : nullptr, + DestroyLoopCB); if (MSSA && VerifyMemorySSA) MSSA->verifyMemorySSA(); diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll new file mode 100644 index 000000000000..455a38535576 --- /dev/null +++ b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll @@ -0,0 +1,71 @@ +; RUN: opt < %s -enable-loop-distribute -passes='loop-distribute,loop-mssa(simple-loop-unswitch<nontrivial>),loop-distribute' -o /dev/null -S -debug-pass-manager=verbose 2>&1 | FileCheck %s + + +; Running loop-distribute will result in LoopAccessAnalysis being required and +; cached in the LoopAnalysisManagerFunctionProxy. +; +; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner<header><latch><exiting> + + +; Then simple-loop-unswitch is removing/replacing some loops (resulting in +; Loop objects used as key in the analyses cache is destroyed). So here we +; want to see that any analysis results cached on the destroyed loop is +; cleared. A special case here is that loop_a_inner is destroyed when +; unswitching the parent loop. +; +; The bug solved and verified by this test case was related to the +; SimpleLoopUnswitch not marking the Loop as removed, so we missed clearing +; the analysis caches. 
+; +; CHECK: Running pass: SimpleLoopUnswitchPass on Loop at depth 1 containing: %loop_begin<header>,%loop_b,%loop_b_inner,%loop_b_inner_exit,%loop_a,%loop_a_inner,%loop_a_inner_exit,%latch<latch><exiting> +; CHECK-NEXT: Clearing all analysis results for: loop_a_inner + + +; When running loop-distribute the second time we can see that loop_a_inner +; isn't analysed because the loop no longer exists (instead we find a new loop, +; loop_a_inner.us). This kind of verifies that it was correct to remove the +; loop_a_inner related analysis above. +; +; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner.us<header><latch><exiting> + + +define i32 @test6(i1* %ptr, i1 %cond1, i32* %a.ptr, i32* %b.ptr) { +entry: + br label %loop_begin + +loop_begin: + %v = load i1, i1* %ptr + br i1 %cond1, label %loop_a, label %loop_b + +loop_a: + br label %loop_a_inner + +loop_a_inner: + %va = load i1, i1* %ptr + %a = load i32, i32* %a.ptr + br i1 %va, label %loop_a_inner, label %loop_a_inner_exit + +loop_a_inner_exit: + %a.lcssa = phi i32 [ %a, %loop_a_inner ] + br label %latch + +loop_b: + br label %loop_b_inner + +loop_b_inner: + %vb = load i1, i1* %ptr + %b = load i32, i32* %b.ptr + br i1 %vb, label %loop_b_inner, label %loop_b_inner_exit + +loop_b_inner_exit: + %b.lcssa = phi i32 [ %b, %loop_b_inner ] + br label %latch + +latch: + %ab.phi = phi i32 [ %a.lcssa, %loop_a_inner_exit ], [ %b.lcssa, %loop_b_inner_exit ] + br i1 %v, label %loop_begin, label %loop_exit + +loop_exit: + %ab.lcssa = phi i32 [ %ab.phi, %latch ] + ret i32 %ab.lcssa +} </cut>
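
The commit message above describes analysis results cached under a Loop* key going stale once the allocator hands the same address to a new Loop. A minimal sketch of that failure mode follows, assuming a toy cache; the `Loop` struct, `AnalysisCache`, and `analyseReusedSlot` here are hypothetical stand-ins for illustration and are not LLVM's classes or API.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a loop object; NOT llvm::Loop.
struct Loop { std::string name; };

// Hypothetical cache that, like the one described in the commit message,
// uses the object's address as its key.
struct AnalysisCache {
  std::unordered_map<const Loop *, std::string> results;

  // Compute the "analysis" on first request, then serve it from the cache.
  const std::string &get(const Loop &L) {
    auto it = results.find(&L);
    if (it == results.end())
      it = results.emplace(&L, "analysis of " + L.name).first;
    return it->second;
  }

  // Plays the role of a "loop deleted" notification: drop the cached entry.
  void markAsDeleted(const Loop &L) { results.erase(&L); }
};

// Builds a loop, caches its analysis, destroys it, then builds a second
// loop in the very same storage and returns what the cache reports for it.
std::string analyseReusedSlot(bool notifyOnDelete) {
  alignas(Loop) unsigned char slot[sizeof(Loop)];
  AnalysisCache cache;

  Loop *first = new (slot) Loop{"loop_a_inner"};
  cache.get(*first);
  if (notifyOnDelete)
    cache.markAsDeleted(*first);
  first->~Loop();

  // Same address as before, which mimics an allocator reusing memory.
  Loop *second = new (slot) Loop{"loop_a_inner.us"};
  assert(static_cast<void *>(second) == static_cast<void *>(slot));
  std::string result = cache.get(*second);
  second->~Loop();
  return result;
}
```

Without the notification the second loop is served the first loop's cached result; with it, the entry is recomputed. This only mirrors the idea behind the markLoopAsDeleted callback, not its actual implementation.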

4 years, 9 months


[TCWG CI] Regression caused by llvm:d2bb6d512c0f7da8749de51522a0a98f3f08242a

by ci_notify＠linaro.org

Identified regression caused by *llvm:d2bb6d512c0f7da8749de51522a0a98f3f08242a*: commit d2bb6d512c0f7da8749de51522a0a98f3f08242a Author: Min-Yih Hsu <minyihh(a)uci.edu> [X86] Add explicit library dependency on LLVMInstrumentation Results regressed to (for first_bad == d2bb6d512c0f7da8749de51522a0a98f3f08242a) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2 artifacts/build-d2bb6d512c0f7da8749de51522a0a98f3f08242a/results_id: 1 # 433.milc,milc_base.default regressed by 103 # 433.milc,[.] mult_su3_mat_vec regressed by 123 from (for last_good == ef8707574bbc7d264644d9e6730118cc0addd871) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2 artifacts/build-ef8707574bbc7d264644d9e6730118cc0addd871/results_id: 1 This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-d2bb6d512c0f7da8749de51522a0a98f3f08242a cd investigate-llvm-d2bb6d512c0f7da8749de51522a0a98f3f08242a # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach d2bb6d512c0f7da8749de51522a0a98f3f08242a ../artifacts/test.sh # Reproduce last_good build git checkout --detach ef8707574bbc7d264644d9e6730118cc0addd871 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit d2bb6d512c0f7da8749de51522a0a98f3f08242a Author: Min-Yih Hsu <minyihh(a)uci.edu> Date: Tue Aug 24 14:07:21 2021 -0700 [X86] Add explicit library dependency on LLVMInstrumentation Patch 9588b685c6b2 introduced dependency on ASAN. But it didn't explicitly put LLVMInstrumentation as one of the library dependencies such that the build will fail if we're building LLVM as shared libraries (i.e. -DBUILD_SHARED_LIBS=ON). This patch explicitly links X86CodeGen against the Instrumentation component. Differential Revision: https://reviews.llvm.org/D108662 --- llvm/lib/Target/X86/CMakeLists.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/llvm/lib/Target/X86/CMakeLists.txt b/llvm/lib/Target/X86/CMakeLists.txt index a2816f6e5e84..d13c5d250554 100644 --- a/llvm/lib/Target/X86/CMakeLists.txt +++ b/llvm/lib/Target/X86/CMakeLists.txt @@ -91,6 +91,7 @@ add_llvm_target(X86CodeGen ${sources} AsmPrinter CodeGen Core + Instrumentation MC SelectionDAG Support </cut>

4 years, 9 months


[TCWG CI] Regression caused by gcc:392aa7d7adfbd84253121d2ef779bf3c627e8d0b

by ci_notify＠linaro.org

Identified regression caused by *gcc:392aa7d7adfbd84253121d2ef779bf3c627e8d0b*: commit 392aa7d7adfbd84253121d2ef779bf3c627e8d0b Author: Jeff Law <law(a)torsion.usersys.redhat.com> Fix some testsuite failures for H8/SX multilibs where short branches where used when long branches were necessary. Results regressed to (for first_bad == 392aa7d7adfbd84253121d2ef779bf3c627e8d0b) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # true: 0 # benchmark -- -O3_LTO artifacts/build-392aa7d7adfbd84253121d2ef779bf3c627e8d0b/results_id: 1 # 456.hmmer,hmmer_base.default regressed by 104 from (for last_good == c7137fcc7cbc1f1f14f9fed75adcc6bd8f1d418c) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # true: 0 # benchmark -- -O3_LTO artifacts/build-c7137fcc7cbc1f1f14f9fed75adcc6bd8f1d418c/results_id: 1 This commit has regressed these CI configurations: - tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… Reproduce builds: <cut> mkdir investigate-gcc-392aa7d7adfbd84253121d2ef779bf3c627e8d0b cd investigate-gcc-392aa7d7adfbd84253121d2ef779bf3c627e8d0b # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 392aa7d7adfbd84253121d2ef779bf3c627e8d0b ../artifacts/test.sh # Reproduce last_good build git checkout --detach c7137fcc7cbc1f1f14f9fed75adcc6bd8f1d418c ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 392aa7d7adfbd84253121d2ef779bf3c627e8d0b Author: Jeff Law <law(a)torsion.usersys.redhat.com> Date: Wed Apr 29 10:19:22 2020 -0400 Fix some testsuite failures for H8/SX multilibs where short branches where used when long branches were necessary. * config/h8300/h8300.md (H8/SX div patterns): All H8/SX specific division instructions are 4 bytes long. --- gcc/ChangeLog | 5 +++++ gcc/config/h8300/h8300.md | 4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 80064da83ce..a2d4a1b82f4 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2020-04-29 Jeff Law <law(a)redhat.com> + + * config/h8300/h8300.md (H8/SX div patterns): All H8/SX specific + division instructions are 4 bytes long. 
+ 2020-04-29 Jakub Jelinek <jakub(a)redhat.com> PR target/94826 diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md index 3e5cdbeeebe..a86b8ea2074 100644 --- a/gcc/config/h8300/h8300.md +++ b/gcc/config/h8300/h8300.md @@ -1218,7 +1218,7 @@ (match_operand:HSI 2 "reg_or_nibble_operand" "r IP4>X")))] "TARGET_H8300SX" { return <MODE>mode == HImode ? "divu.w\\t%T2,%T0" : "divu.l\\t%S2,%S0"; } - [(set_attr "length" "2")]) + [(set_attr "length" "4")]) (define_insn "div<mode>3" [(set (match_operand:HSI 0 "register_operand" "=r") @@ -1226,7 +1226,7 @@ (match_operand:HSI 2 "reg_or_nibble_operand" "r IP4>X")))] "TARGET_H8300SX" { return <MODE>mode == HImode ? "divs.w\\t%T2,%T0" : "divs.l\\t%S2,%S0"; } - [(set_attr "length" "2")]) + [(set_attr "length" "4")]) (define_insn "udivmodqi4" [(set (match_operand:QI 0 "register_operand" "=r") </cut>
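
The commit above changes the declared length of the H8/SX division patterns from 2 to 4 bytes because short branches were being chosen where long ones were needed. A rough sketch of why an understated length can cause that, assuming branch selection sums declared instruction lengths between a branch and its target: the reach constant and helper names below are hypothetical and are not taken from GCC or the H8/SX manual.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical short-branch reach in bytes; illustrative value only.
constexpr int kShortBranchReach = 126;

// Total size of the instructions between a branch and its target,
// computed from the per-instruction lengths the compiler believes in.
int spanBytes(const std::vector<int> &lengths) {
  return std::accumulate(lengths.begin(), lengths.end(), 0);
}

// A short branch is only valid if the whole span fits within its reach.
bool fitsShortBranch(const std::vector<int> &lengths) {
  return spanBytes(lengths) <= kShortBranchReach;
}

// A run of `count` division instructions, each with the given length.
std::vector<int> divRun(int count, int lengthPerInsn) {
  return std::vector<int>(static_cast<std::size_t>(count), lengthPerInsn);
}
```

With 40 such instructions, a declared length of 2 gives an 80-byte span that appears to fit, while the real 4-byte encoding gives 160 bytes and does not; that mismatch is the kind of error the length fix removes.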

4 years, 9 months


[TCWG CI] Regression caused by llvm:2eb554a9feafff5188d8b924908205c87d7f2fee

by ci_notify＠linaro.org

Identified regression caused by *llvm:2eb554a9feafff5188d8b924908205c87d7f2fee*: commit 2eb554a9feafff5188d8b924908205c87d7f2fee Author: Roman Lebedev <lebedev.ri(a)gmail.com> Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" Results regressed to (for first_bad == 2eb554a9feafff5188d8b924908205c87d7f2fee) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_marm artifacts/build-2eb554a9feafff5188d8b924908205c87d7f2fee/results_id: 1 # 483.xalancbmk,libc.so.6 regressed by 81400 from (for last_good == 7142eb17fb3419a76c9ac4afce0df986ff08d61c) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_marm artifacts/build-7142eb17fb3419a76c9ac4afce0df986ff08d61c/results_id: 1 This commit has regressed these CI configurations: - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-2eb554a9feafff5188d8b924908205c87d7f2fee cd investigate-llvm-2eb554a9feafff5188d8b924908205c87d7f2fee # Fetch 
scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 2eb554a9feafff5188d8b924908205c87d7f2fee ../artifacts/test.sh # Reproduce last_good build git checkout --detach 7142eb17fb3419a76c9ac4afce0df986ff08d61c ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 2eb554a9feafff5188d8b924908205c87d7f2fee Author: Roman Lebedev <lebedev.ri(a)gmail.com> Date: Mon Aug 16 10:53:15 2021 +0300 Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" This is still wrong, as failing bots suggest. This reverts commit 3d9beefc7d713ad8462d92427ccd17b9532ce904. 
--- llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 75 +++------------------- .../SimplifyCFG/fold-branch-to-common-dest.ll | 18 +++--- 2 files changed, 18 insertions(+), 75 deletions(-) diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index 68a0388398fc..847fdd760d2f 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -1095,24 +1095,17 @@ static void CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses( // Update (liveout) uses of bonus instructions, // now that the bonus instruction has been cloned into predecessor. - // Note that we expect to be in a block-closed SSA form for this to work! + SSAUpdater SSAUpdate; + SSAUpdate.Initialize(BonusInst.getType(), + (NewBonusInst->getName() + ".merge").str()); + SSAUpdate.AddAvailableValue(BB, &BonusInst); + SSAUpdate.AddAvailableValue(PredBlock, NewBonusInst); for (Use &U : make_early_inc_range(BonusInst.uses())) { auto *UI = cast<Instruction>(U.getUser()); - auto *PN = dyn_cast<PHINode>(UI); - if (!PN) { - assert(UI->getParent() == BB && BonusInst.comesBefore(UI) && - "If the user is not a PHI node, then it should be in the same " - "block as, and come after, the original bonus instruction."); - continue; // Keep using the original bonus instruction. - } - // Is this the block-closed SSA form PHI node? - if (PN->getIncomingBlock(U) == BB) - continue; // Great, keep using the original bonus instruction. - // The only other alternative is an "use" when coming from - // the predecessor block - here we should refer to the cloned bonus instr. - assert(PN->getIncomingBlock(U) == PredBlock && - "Not in block-closed SSA form?"); - U.set(NewBonusInst); + if (UI->getParent() != PredBlock) + SSAUpdate.RewriteUseAfterInsertions(U); + else // Use is in the same block as, and comes before, NewBonusInst. 
+ SSAUpdate.RewriteUse(U); } } } @@ -3039,56 +3032,6 @@ static bool performBranchToCommonDestFolding(BranchInst *BI, BranchInst *PBI, LLVM_DEBUG(dbgs() << "FOLDING BRANCH TO COMMON DEST:\n" << *PBI << *BB); - // We want to duplicate all the bonus instructions in this block, - // and rewrite their uses, but in some cases with self-loops, - // the naive use rewrite approach won't work (will result in miscompilations). - // To avoid this problem, let's form block-closed SSA form. - for (Instruction &BonusInst : - reverse(iterator_range<BasicBlock::iterator>(*BB))) { - auto IsBCSSAUse = [BB, &BonusInst](Use &U) { - auto *UI = cast<Instruction>(U.getUser()); - if (auto *PN = dyn_cast<PHINode>(UI)) - return PN->getIncomingBlock(U) == BB; - return UI->getParent() == BB && BonusInst.comesBefore(UI); - }; - - // Does this instruction require rewriting of uses? - if (all_of(BonusInst.uses(), IsBCSSAUse)) - continue; - - SSAUpdater SSAUpdate; - Type *Ty = BonusInst.getType(); - SmallVector<PHINode *, 8> BCSSAPHIs; - SSAUpdate.Initialize(Ty, BonusInst.getName()); - - // Into each successor block of BB, insert a PHI node, that receives - // the BonusInst when coming from it's basic block, or poison otherwise. - for (BasicBlock *Succ : successors(BB)) { - // The block may have the same successor multiple times. Do it only once. - if (SSAUpdate.HasValueForBlock(Succ)) - continue; - BCSSAPHIs.emplace_back(PHINode::Create( - Ty, 0, BonusInst.getName() + ".bcssa", &Succ->front())); - PHINode *PN = BCSSAPHIs.back(); - for (BasicBlock *PredOfSucc : predecessors(Succ)) - PN->addIncoming(PredOfSucc == BB ? (Value *)&BonusInst - : PoisonValue::get(Ty), - PredOfSucc); - SSAUpdate.AddAvailableValue(Succ, PN); - } - - // And rewrite all uses that break block-closed SSA form. 
- for (Use &U : make_early_inc_range(BonusInst.uses())) - if (!IsBCSSAUse(U)) - SSAUpdate.RewriteUseAfterInsertions(U); - - // We might not have ended up needing PHI's in all of the succ blocks, - // drop the ones that are certainly unused, but don't bother otherwise. - for (PHINode *PN : BCSSAPHIs) - if (PN->use_empty()) - PN->eraseFromParent(); - } - IRBuilder<> Builder(PBI); // The builder is used to create instructions to eliminate the branch in BB. // If BB's terminator has !annotation metadata, add it to the new diff --git a/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll b/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll index d948b61d65a0..2ff041826077 100644 --- a/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll +++ b/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll @@ -834,7 +834,7 @@ define void @pr48450() { ; CHECK-NEXT: entry: ; CHECK-NEXT: br label [[FOR_BODY:%.*]] ; CHECK: for.body: -; CHECK-NEXT: [[COUNTDOWN:%.*]] = phi i8 [ 8, [[ENTRY:%.*]] ], [ [[DEC_BCSSA1:%.*]], [[FOR_BODYTHREAD_PRE_SPLIT:%.*]] ] +; CHECK-NEXT: [[COUNTDOWN:%.*]] = phi i8 [ 8, [[ENTRY:%.*]] ], [ [[DEC_MERGE:%.*]], [[FOR_BODYTHREAD_PRE_SPLIT:%.*]] ] ; CHECK-NEXT: [[C:%.*]] = call i1 @gen1() ; CHECK-NEXT: br i1 [[C]], label [[FOR_INC:%.*]], label [[IF_THEN:%.*]] ; CHECK: for.inc: @@ -849,7 +849,7 @@ define void @pr48450() { ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[C2_NOT]], i1 true, i1 [[CMP_NOT]] ; CHECK-NEXT: br i1 [[OR_COND]], label [[IF_END_LOOPEXIT]], label [[FOR_BODYTHREAD_PRE_SPLIT]] ; CHECK: for.bodythread-pre-split: -; CHECK-NEXT: [[DEC_BCSSA1]] = phi i8 [ [[DEC_OLD]], [[FOR_INC]] ], [ [[DEC]], [[IF_THEN]] ] +; CHECK-NEXT: [[DEC_MERGE]] = phi i8 [ [[DEC]], [[IF_THEN]] ], [ [[DEC_OLD]], [[FOR_INC]] ] ; CHECK-NEXT: call void @sideeffect0() ; CHECK-NEXT: br label [[FOR_BODY]] ; CHECK: if.end.loopexit: @@ -885,7 +885,7 @@ define void @pr48450_2(i1 %enable_loopback) { ; CHECK-NEXT: entry: ; CHECK-NEXT: br label 
[[FOR_BODY:%.*]] ; CHECK: for.body: -; CHECK-NEXT: [[COUNTDOWN:%.*]] = phi i8 [ 8, [[ENTRY:%.*]] ], [ [[DEC_BCSSA1:%.*]], [[FOR_BODYTHREAD_PRE_SPLIT:%.*]] ] +; CHECK-NEXT: [[COUNTDOWN:%.*]] = phi i8 [ 8, [[ENTRY:%.*]] ], [ [[DEC_MERGE:%.*]], [[FOR_BODYTHREAD_PRE_SPLIT:%.*]] ] ; CHECK-NEXT: [[C:%.*]] = call i1 @gen1() ; CHECK-NEXT: br i1 [[C]], label [[FOR_INC:%.*]], label [[IF_THEN:%.*]] ; CHECK: for.inc: @@ -900,7 +900,7 @@ define void @pr48450_2(i1 %enable_loopback) { ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[C2_NOT]], i1 true, i1 [[CMP_NOT]] ; CHECK-NEXT: br i1 [[OR_COND]], label [[IF_END_LOOPEXIT]], label [[FOR_BODYTHREAD_PRE_SPLIT]] ; CHECK: for.bodythread-pre-split: -; CHECK-NEXT: [[DEC_BCSSA1]] = phi i8 [ poison, [[FOR_BODYTHREAD_PRE_SPLIT_LOOPBACK:%.*]] ], [ [[DEC_OLD]], [[FOR_INC]] ], [ [[DEC]], [[IF_THEN]] ] +; CHECK-NEXT: [[DEC_MERGE]] = phi i8 [ [[DEC_OLD]], [[FOR_INC]] ], [ [[DEC_MERGE]], [[FOR_BODYTHREAD_PRE_SPLIT_LOOPBACK:%.*]] ], [ [[DEC]], [[IF_THEN]] ] ; CHECK-NEXT: [[SHOULD_LOOPBACK:%.*]] = phi i1 [ true, [[FOR_INC]] ], [ false, [[FOR_BODYTHREAD_PRE_SPLIT_LOOPBACK]] ], [ true, [[IF_THEN]] ] ; CHECK-NEXT: [[DO_LOOPBACK:%.*]] = and i1 [[SHOULD_LOOPBACK]], [[ENABLE_LOOPBACK:%.*]] ; CHECK-NEXT: call void @sideeffect0() @@ -1005,8 +1005,8 @@ define void @pr49510() { ; CHECK-NEXT: [[TOBOOL_OLD:%.*]] = icmp ne i16 [[DOTOLD]], 0 ; CHECK-NEXT: br i1 [[TOBOOL_OLD]], label [[LAND_RHS:%.*]], label [[FOR_END:%.*]] ; CHECK: land.rhs: -; CHECK-NEXT: [[DOTBCSSA:%.*]] = phi i16 [ [[DOTOLD]], [[ENTRY:%.*]] ], [ [[TMP0:%.*]], [[LAND_RHS]] ] -; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[DOTBCSSA]], 0 +; CHECK-NEXT: [[DOTMERGE:%.*]] = phi i16 [ [[TMP0:%.*]], [[LAND_RHS]] ], [ [[DOTOLD]], [[ENTRY:%.*]] ] +; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[DOTMERGE]], 0 ; CHECK-NEXT: [[TMP0]] = load i16, i16* @global_pr49510, align 1 ; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i16 [[TMP0]], 0 ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 [[TOBOOL]], i1 false @@ -1043,15 
+1043,15 @@ define i32 @pr51125() { ; CHECK-NEXT: [[ISZERO_OLD:%.*]] = icmp eq i32 [[LD_OLD]], 0 ; CHECK-NEXT: br i1 [[ISZERO_OLD]], label [[EXIT:%.*]], label [[L2:%.*]] ; CHECK: L2: -; CHECK-NEXT: [[LD_BCSSA1:%.*]] = phi i32 [ [[LD_OLD]], [[ENTRY:%.*]] ], [ [[LD:%.*]], [[L2]] ] +; CHECK-NEXT: [[LD_MERGE:%.*]] = phi i32 [ [[LD:%.*]], [[L2]] ], [ [[LD_OLD]], [[ENTRY:%.*]] ] ; CHECK-NEXT: store i32 -1, i32* @global_pr51125, align 4 -; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[LD_BCSSA1]], -1 +; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[LD_MERGE]], -1 ; CHECK-NEXT: [[LD]] = load i32, i32* @global_pr51125, align 4 ; CHECK-NEXT: [[ISZERO:%.*]] = icmp eq i32 [[LD]], 0 ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 true, i1 [[ISZERO]] ; CHECK-NEXT: br i1 [[OR_COND]], label [[EXIT]], label [[L2]] ; CHECK: exit: -; CHECK-NEXT: [[R:%.*]] = phi i32 [ [[LD_BCSSA1]], [[L2]] ], [ [[LD_OLD]], [[ENTRY]] ] +; CHECK-NEXT: [[R:%.*]] = phi i32 [ [[LD]], [[L2]] ], [ [[LD_OLD]], [[ENTRY]] ] ; CHECK-NEXT: ret i32 [[R]] ; entry: </cut>

4 years, 9 months


[TCWG CI] Regression caused by gcc:76b75018b3d053a890ebe155e47814de14b3c9fb

by ci_notify＠linaro.org

Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*: commit 76b75018b3d053a890ebe155e47814de14b3c9fb Author: Jason Merrill <jason(a)redhat.com> c++: implement C++17 hardware interference size Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe stage1: 2 # build_abe linux: 3 # build_abe glibc: 4 # First few build errors in logs: from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe stage1: 2 # build_abe linux: 3 # build_abe glibc: 4 # build_abe stage2: 5 # build_abe gdb: 6 # build_abe qemu: 7 This commit has regressed these CI configurations: - tcwg_gnu_cross_build/master-aarch64 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… Even more details: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… Reproduce builds: <cut> mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) 
mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb ../artifacts/test.sh # Reproduce last_good build git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 76b75018b3d053a890ebe155e47814de14b3c9fb Author: Jason Merrill <jason(a)redhat.com> Date: Thu Jul 15 15:30:17 2021 -0400 c++: implement C++17 hardware interference size The last missing piece of the C++17 standard library is the hardware intereference size constants. Much of the delay in implementing these has been due to uncertainty about what the right values are, and even whether there is a single constant value that is suitable; the destructive interference size is intended to be used in structure layout, so program ABIs will depend on it. In principle, both of these values should be the same as the target's L1 cache line size. When compiling for a generic target that is intended to support a range of target CPUs with different cache line sizes, the constructive size should probably be the minimum size, and the destructive size the maximum, unless you are constrained by ABI compatibility with previous code. From discussion on gcc-patches, I've come to the conclusion that the solution to the difficulty of choosing stable values is to give up on it, and instead encourage only uses where ABI stability is unimportant: in particular, uses where the ABI is shared at most between translation units built at the same time with the same flags. To that end, I've added a warning for any use of the constant value of std::hardware_destructive_interference_size in a header or module export. Appropriate uses within a project can disable the warning. 
A previous iteration of this patch included an -finterference-tune flag to make the value vary with -mtune; this iteration makes that the default behavior, which should be appropriate for all reasonable uses of the variable. The previous default of "stable-ish" seems to me likely to have been more of an attractive nuisance; since we can't promise actual stability, we should instead make proper uses more convenient. JF Bastien's implementation proposal is summarized at https://github.com/itanium-cxx-abi/cxx-abi/issues/74 I implement this by adding new --params for the two sizes. Targets can override these values in targetm.target_option.override() to support a range of values for the generic target; otherwise, both will default to the L1 cache line size. 64 bytes still seems correct for all x86. I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex A9 has a 32-byte cache line, so I'd think 32/64 would make more sense. He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B cache line, I've changed that to 64/256. Other arch maintainers are invited to set ranges for their generic targets if that seems better than using the default cache line size for both values. With the above choice to reject stability as a goal, getting these values "right" is now just a matter of what we want the default optimization to be, and we can feel free to adjust them as CPUs with different cache lines become more and less common. gcc/ChangeLog: * params.opt: Add destructive-interference-size and constructive-interference-size. * doc/invoke.texi: Document them. * config/aarch64/aarch64.c (aarch64_override_options_internal): Set them. * config/arm/arm.c (arm_option_override): Set them. * config/i386/i386-options.c (ix86_option_override_internal): Set them. gcc/c-family/ChangeLog: * c.opt: Add -Winterference-size. * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE and __GCC_CONSTRUCTIVE_SIZE. 
gcc/cp/ChangeLog: * constexpr.c (maybe_warn_about_constant_value): Complain about std::hardware_destructive_interference_size. (cxx_eval_constant_expression): Call it. * decl.c (cxx_init_decl_processing): Check --param *-interference-size values. libstdc++-v3/ChangeLog: * include/std/version: Define __cpp_lib_hardware_interference_size. * libsupc++/new: Define hardware interference size variables. gcc/testsuite/ChangeLog: * g++.dg/warn/Winterference.H: New file. * g++.dg/warn/Winterference.C: New test. * g++.target/aarch64/interference.C: New test. * g++.target/arm/interference.C: New test. * g++.target/i386/interference.C: New test. --- gcc/c-family/c-cppbuiltin.c | 14 ++++++ gcc/c-family/c.opt | 5 ++ gcc/config/aarch64/aarch64.c | 22 +++++++++ gcc/config/arm/arm.c | 22 +++++++++ gcc/config/i386/i386-options.c | 6 +++ gcc/cp/constexpr.c | 33 +++++++++++++ gcc/cp/decl.c | 32 ++++++++++++ gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++ gcc/params.opt | 16 ++++++ gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++ gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++ gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++ gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++ gcc/testsuite/g++.target/arm/interference.C | 9 ++++ gcc/testsuite/g++.target/i386/interference.C | 8 +++ libstdc++-v3/include/std/version | 3 ++ libstdc++-v3/libsupc++/new | 10 +++- 17 files changed, 279 insertions(+), 2 deletions(-) diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index 48cbefd8bf8..ce88e707127 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile) builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL", targetm.atomic_test_and_set_trueval); + /* Macros for C++17 hardware interference size constants. Either both or + neither should be set. 
*/ + gcc_assert (!param_destruct_interfere_size + == !param_construct_interfere_size); + if (param_destruct_interfere_size) + { + /* FIXME The way of communicating these values to the library should be + part of the C++ ABI, whether macro or builtin. */ + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", + param_destruct_interfere_size); + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", + param_construct_interfere_size); + } + /* ptr_type_node can't be used here since ptr_mode is only set when toplev calls backend_init which is not done with -E or pch. */ psize = POINTER_SIZE_UNITS; diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index c5fe90003f2..9c151d19870 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -722,6 +722,11 @@ Winit-list-lifetime C++ ObjC++ Var(warn_init_list) Warning Init(1) Warn about uses of std::initializer_list that can result in dangling pointers. +Winterference-size +C++ ObjC++ Var(warn_interference_size) Warning Init(1) +Warn about nonsensical values of --param destructive-interference-size or +constructive-interference-size. + Wimplicit C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) Warn about implicit declarations. 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 30d9a0b7a3d..36519ccc5a5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_line_size, aarch64_tune_params.prefetch->l1_cache_line_size); + + if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + } + else + { + /* For a generic AArch64 target, cover the current range of cache line + sizes. */ + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + 256); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + 64); + } + if (aarch64_tune_params.prefetch->l2_cache_size >= 0) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l2_cache_size, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f1e628253d0..6c6e77fab66 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3669,6 +3669,28 @@ arm_option_override (void) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_line_size, current_tune->prefetch.l1_cache_line_size); + if (current_tune->prefetch.l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + } + else + { + /* For a generic ARM target, JF Bastien proposed using 64 for both. */ + /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for + constructive? 
*/ + /* More recent Cortex chips have a 64-byte cache line, but are marked + ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + } + if (current_tune->prefetch.l1_cache_size >= 0) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_size, diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 2cb87cedec0..c0006b3674b 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p, SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, ix86_tune_cost->l2_cache_size); + /* 64B is the accepted value for these for all x86. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 7772fe62d95..0c2498aee22 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc) "%<constexpr%> function in C++20"); } +/* We're getting the constant value of DECL in a manifestly constant-evaluated + context; maybe complain about that. 
*/ + +static void +maybe_warn_about_constant_value (location_t loc, tree decl) +{ + static bool explained = false; + if (cxx_dialect >= cxx17 + && warn_interference_size + && !global_options_set.x_param_destruct_interfere_size + && DECL_CONTEXT (decl) == std_node + && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size") + && (LOCATION_FILE (input_location) != main_input_filename + || module_exporting_p ()) + && warning_at (loc, OPT_Winterference_size, "use of %qD", decl) + && !explained) + { + explained = true; + inform (loc, "its value can vary between compiler versions or " + "with different %<-mtune%> or %<-mcpu%> flags"); + inform (loc, "if this use is part of a public ABI, change it to " + "instead use a constant variable you define"); + inform (loc, "the default value for the current CPU tuning " + "is %d bytes", param_destruct_interfere_size); + inform (loc, "you can stabilize this value with %<--param " + "hardware_destructive_interference_size=%d%>, or disable " + "this warning with %<-Wno-interference-size%>", + param_destruct_interfere_size); + } +} + /* Attempt to reduce the expression T to a constant value. On failure, issue diagnostic and return error_mark_node. */ /* FIXME unify with c_fully_fold */ @@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, r = *p; break; } + if (ctx->manifestly_const_eval) + maybe_warn_about_constant_value (loc, t); if (COMPLETE_TYPE_P (TREE_TYPE (t)) && is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false)) { diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index bce62ad202a..c2065027369 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void) /* Show we use EH for cleanups. */ if (flag_exceptions) using_eh_for_cleanups (); + + /* Check that the hardware interference sizes are at least + alignof(max_align_t), as required by the standard. 
*/ + const int max_align = max_align_t_align () / BITS_PER_UNIT; + if (param_destruct_interfere_size) + { + if (param_destruct_interfere_size < max_align) + error ("%<--param destructive-interference-size=%d%> is less than " + "%d", param_destruct_interfere_size, max_align); + else if (param_destruct_interfere_size < param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param destructive-interference-size=%d%> " + "is less than %<--param l1-cache-line-size=%d%>", + param_destruct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_destruct_interfere_size = param_l1_cache_line_size; + /* else leave it unset. */ + + if (param_construct_interfere_size) + { + if (param_construct_interfere_size < max_align) + error ("%<--param constructive-interference-size=%d%> is less than " + "%d", param_construct_interfere_size, max_align); + else if (param_construct_interfere_size > param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param constructive-interference-size=%d%> " + "is greater than %<--param l1-cache-line-size=%d%>", + param_construct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_construct_interfere_size = param_l1_cache_line_size; } /* Enter an abi node in global-module context. returns a cookie to diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 23cc68f92b5..78cfc100ac2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by @option{-Winline} to appear or disappear. +@item -Winterference-size +@opindex Winterference-size +Warn about use of C++17 @code{std::hardware_destructive_interference_size} +without specifying its value with @option{--param destructive-interference-size}. +Also warn about questionable values for that option. 
+ +This variable is intended to be used for controlling class layout, to +avoid false sharing in concurrent code: + +@smallexample +struct independent_fields @{ + alignas(std::hardware_destructive_interference_size) std::atomic<int> one; + alignas(std::hardware_destructive_interference_size) std::atomic<int> two; +@}; +@end smallexample + +Here @samp{one} and @samp{two} are intended to be far enough apart +that stores to one won't require accesses to the other to reload the +cache line. + +By default, @option{--param destructive-interference-size} and +@option{--param constructive-interference-size} are set based on the +current @option{-mtune} option, typically to the L1 cache line size +for the particular target CPU, sometimes to a range if tuning for a +generic target. So all translation units that depend on ABI +compatibility for the use of these variables must be compiled with +the same @option{-mtune} (or @option{-mcpu}). + +If ABI stability is important, such as if the use is in a header for a +library, you should probably not use the hardware interference size +variables at all. Alternatively, you can force a particular value +with @option{--param}. + +If you are confident that your use of the variable does not affect ABI +outside a single build of your project, you can turn off the warning +with @option{-Wno-interference-size}. + @item -Wint-in-bool-context @opindex Wint-in-bool-context @opindex Wno-int-in-bool-context @@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride. This setting is only useful for strides that are known and constant. +@item destructive-interference-size +@item constructive-interference-size +The values for the C++17 variables +@code{std::hardware_destructive_interference_size} and +@code{std::hardware_constructive_interference_size}. 
The destructive +interference size is the minimum recommended offset between two +independent concurrently-accessed objects; the constructive +interference size is the maximum recommended size of contiguous memory +accessed together. Typically both will be the size of an L1 cache +line for the target, in bytes. For a generic target covering a range of L1 +cache line sizes, typically the constructive interference size will be +the small end of the range and the destructive size will be the large +end. + +The destructive interference size is intended to be used for layout, +and thus has ABI impact. The default value is not expected to be +stable, and on some targets varies with @option{-mtune}, so use of +this variable in a context where ABI stability is important, such as +the public interface of a library, is strongly discouraged; if it is +used in that context, users can stabilize the value using this +option. + +The constructive interference size is less sensitive, as it is +typically only used in a @samp{static_assert} to make sure that a type +fits within a cache line. + +See also @option{-Winterference-size}. + @item loop-interchange-max-num-stmts The maximum number of stmts in a loop to be interchanged. diff --git a/gcc/params.opt b/gcc/params.opt index 3a701e22c46..658ca028851 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent) Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization The size of L1 cache line. +-param=destructive-interference-size= +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization +The minimum recommended offset between two concurrently-accessed objects to +avoid additional performance degradation due to contention introduced by the +implementation. Typically the L1 cache line size, but can be larger to +accommodate a variety of target processors with different cache line sizes. 
+C++17 code might use this value in structure layout, but is strongly +discouraged from doing so in public ABIs. + +-param=constructive-interference-size= +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization +The maximum recommended size of contiguous memory occupied by two objects +accessed with temporal locality by concurrent threads. Typically the L1 cache +line size, but can be smaller to accommodate a variety of target processors with +different cache line sizes. + -param=l1-cache-size= Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization The size of L1 cache. diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C new file mode 100644 index 00000000000..2af75c63f83 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C @@ -0,0 +1,14 @@ +// { dg-do compile { target c++20 } } +// { dg-additional-options -fmodules-ts } + +module ; + +#include <new> + +export module foo; + +export { + struct A { + alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size } + }; +} diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C new file mode 100644 index 00000000000..57c001bc032 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.C @@ -0,0 +1,6 @@ +// Test that we warn about use of std::hardware_destructive_interference_size +// in a header. 
+// { dg-do compile { target c++17 } } + +// { dg-warning Winterference-size "" { target *-*-* } 0 } +#include "Winterference.H" diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H new file mode 100644 index 00000000000..36f0ad5f6d1 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.H @@ -0,0 +1,7 @@ +#include <new> + +struct A +{ + alignas(std::hardware_destructive_interference_size) int i; + alignas(std::hardware_destructive_interference_size) int j; +}; diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C new file mode 100644 index 00000000000..0fc01655223 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use +// 128 or even 256. +static_assert(std::hardware_destructive_interference_size == 256); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C new file mode 100644 index 00000000000..34fe8a52bff --- /dev/null +++ b/gcc/testsuite/g++.target/arm/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Recent ARM CPUs have a cache line size of 64. Older ones have +// a size of 32, but I guess they're old enough that we don't care? 
+static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C new file mode 100644 index 00000000000..c7b910e3ada --- /dev/null +++ b/gcc/testsuite/g++.target/i386/interference.C @@ -0,0 +1,8 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// It is generally agreed that these are the right values for all x86. +static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version index f950bf0f0db..f41004b5911 100644 --- a/libstdc++-v3/include/std/version +++ b/libstdc++-v3/include/std/version @@ -140,6 +140,9 @@ #define __cpp_lib_filesystem 201703 #define __cpp_lib_gcd 201606 #define __cpp_lib_gcd_lcm 201606 +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L +#endif #define __cpp_lib_hypot 201603 #define __cpp_lib_invoke 201411L #define __cpp_lib_lcm 201606 diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new index 3349b13fd1b..7bc67a6cb02 100644 --- a/libstdc++-v3/libsupc++/new +++ b/libstdc++-v3/libsupc++/new @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { } } // extern "C++" #if __cplusplus >= 201703L -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER namespace std { +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER #define __cpp_lib_launder 201606 /// Pointer optimization barrier [ptr.launder] template<typename _Tp> @@ -205,8 +205,14 @@ namespace std void launder(const void*) = delete; void launder(volatile void*) = delete; void launder(const volatile void*) = delete; -} #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER + +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L + inline 
constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE; + inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE; +#endif // __GCC_DESTRUCTIVE_SIZE +} #endif // C++17 #if __cplusplus > 201703L </cut>
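
For readers who run into the new -Winterference-size warning, the advice quoted in the diff above ("change it to instead use a constant variable you define") could be sketched roughly as follows. This is only a minimal sketch and not code from the patch: the name `kAbiStableAlign` and the value 64 are assumptions picked for illustration, and the struct simply mirrors the `independent_fields` example from the invoke.texi hunk.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical project-owned constant (not part of GCC or libstdc++).
// Because the project fixes it, the value does not change with -mtune,
// -mcpu, or the compiler version. 64 is an assumed value for illustration.
inline constexpr std::size_t kAbiStableAlign = 64;

// Same shape as the invoke.texi example, but aligned with the project
// constant instead of std::hardware_destructive_interference_size.
struct independent_fields {
  alignas(kAbiStableAlign) std::atomic<int> one{0};
  alignas(kAbiStableAlign) std::atomic<int> two{0};
};

// The layout is now pinned: each member starts its own 64-byte slot.
static_assert(alignof(independent_fields) == kAbiStableAlign);
static_assert(sizeof(independent_fields) == 2 * kAbiStableAlign);
```

The trade-off is the one the commit message describes: a fixed value keeps the layout stable across builds, but it may be smaller than the cache line of some CPUs (the patch itself picks 256 as the destructive size for generic AArch64), so it can leave some false sharing on those targets.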

4 years, 9 months


[TCWG CI] Regression caused by gcc:76b75018b3d053a890ebe155e47814de14b3c9fb

by ci_notify＠linaro.org

Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>

c++: implement C++17 hardware interference size

Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts: -10
# true: 0
# build_abe binutils: 1
# First few build errors in logs:

from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts: -10
# true: 0
# build_abe binutils: 1
# build_abe bootstrap: 2

This commit has regressed these CI configurations:
- tcwg_gcc_bootstrap/master-aarch64-bootstrap

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…

Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400

c++: implement C++17 hardware interference size

The last missing piece of the C++17 standard library is the hardware interference size constants. Much of the delay in implementing these has been due to uncertainty about what the right values are, and even whether there is a single constant value that is suitable; the destructive interference size is intended to be used in structure layout, so program ABIs will depend on it.

In principle, both of these values should be the same as the target's L1 cache line size. When compiling for a generic target that is intended to support a range of target CPUs with different cache line sizes, the constructive size should probably be the minimum size, and the destructive size the maximum, unless you are constrained by ABI compatibility with previous code.

From discussion on gcc-patches, I've come to the conclusion that the solution to the difficulty of choosing stable values is to give up on it, and instead encourage only uses where ABI stability is unimportant: in particular, uses where the ABI is shared at most between translation units built at the same time with the same flags. To that end, I've added a warning for any use of the constant value of std::hardware_destructive_interference_size in a header or module export. Appropriate uses within a project can disable the warning.

A previous iteration of this patch included an -finterference-tune flag to make the value vary with -mtune; this iteration makes that the default behavior, which should be appropriate for all reasonable uses of the variable.
The previous default of "stable-ish" seems to me likely to have been more of an attractive nuisance; since we can't promise actual stability, we should instead make proper uses more convenient. JF Bastien's implementation proposal is summarized at https://github.com/itanium-cxx-abi/cxx-abi/issues/74 I implement this by adding new --params for the two sizes. Targets can override these values in targetm.target_option.override() to support a range of values for the generic target; otherwise, both will default to the L1 cache line size. 64 bytes still seems correct for all x86. I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex A9 has a 32-byte cache line, so I'd think 32/64 would make more sense. He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B cache line, I've changed that to 64/256. Other arch maintainers are invited to set ranges for their generic targets if that seems better than using the default cache line size for both values. With the above choice to reject stability as a goal, getting these values "right" is now just a matter of what we want the default optimization to be, and we can feel free to adjust them as CPUs with different cache lines become more and less common. gcc/ChangeLog: * params.opt: Add destructive-interference-size and constructive-interference-size. * doc/invoke.texi: Document them. * config/aarch64/aarch64.c (aarch64_override_options_internal): Set them. * config/arm/arm.c (arm_option_override): Set them. * config/i386/i386-options.c (ix86_option_override_internal): Set them. gcc/c-family/ChangeLog: * c.opt: Add -Winterference-size. * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE and __GCC_CONSTRUCTIVE_SIZE. gcc/cp/ChangeLog: * constexpr.c (maybe_warn_about_constant_value): Complain about std::hardware_destructive_interference_size. (cxx_eval_constant_expression): Call it. * decl.c (cxx_init_decl_processing): Check --param *-interference-size values. 
libstdc++-v3/ChangeLog: * include/std/version: Define __cpp_lib_hardware_interference_size. * libsupc++/new: Define hardware interference size variables. gcc/testsuite/ChangeLog: * g++.dg/warn/Winterference.H: New file. * g++.dg/warn/Winterference.C: New test. * g++.target/aarch64/interference.C: New test. * g++.target/arm/interference.C: New test. * g++.target/i386/interference.C: New test. --- gcc/c-family/c-cppbuiltin.c | 14 ++++++ gcc/c-family/c.opt | 5 ++ gcc/config/aarch64/aarch64.c | 22 +++++++++ gcc/config/arm/arm.c | 22 +++++++++ gcc/config/i386/i386-options.c | 6 +++ gcc/cp/constexpr.c | 33 +++++++++++++ gcc/cp/decl.c | 32 ++++++++++++ gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++ gcc/params.opt | 16 ++++++ gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++ gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++ gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++ gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++ gcc/testsuite/g++.target/arm/interference.C | 9 ++++ gcc/testsuite/g++.target/i386/interference.C | 8 +++ libstdc++-v3/include/std/version | 3 ++ libstdc++-v3/libsupc++/new | 10 +++- 17 files changed, 279 insertions(+), 2 deletions(-) diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index 48cbefd8bf8..ce88e707127 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile) builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL", targetm.atomic_test_and_set_trueval); + /* Macros for C++17 hardware interference size constants. Either both or + neither should be set. */ + gcc_assert (!param_destruct_interfere_size + == !param_construct_interfere_size); + if (param_destruct_interfere_size) + { + /* FIXME The way of communicating these values to the library should be + part of the C++ ABI, whether macro or builtin. 
*/ + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", + param_destruct_interfere_size); + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", + param_construct_interfere_size); + } + /* ptr_type_node can't be used here since ptr_mode is only set when toplev calls backend_init which is not done with -E or pch. */ psize = POINTER_SIZE_UNITS; diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index c5fe90003f2..9c151d19870 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -722,6 +722,11 @@ Winit-list-lifetime C++ ObjC++ Var(warn_init_list) Warning Init(1) Warn about uses of std::initializer_list that can result in dangling pointers. +Winterference-size +C++ ObjC++ Var(warn_interference_size) Warning Init(1) +Warn about nonsensical values of --param destructive-interference-size or +constructive-interference-size. + Wimplicit C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) Warn about implicit declarations. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 30d9a0b7a3d..36519ccc5a5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_line_size, aarch64_tune_params.prefetch->l1_cache_line_size); + + if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + } + else + { + /* For a generic AArch64 target, cover the current range of cache line + sizes. 
*/ + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + 256); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + 64); + } + if (aarch64_tune_params.prefetch->l2_cache_size >= 0) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l2_cache_size, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f1e628253d0..6c6e77fab66 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3669,6 +3669,28 @@ arm_option_override (void) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_line_size, current_tune->prefetch.l1_cache_line_size); + if (current_tune->prefetch.l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + } + else + { + /* For a generic ARM target, JF Bastien proposed using 64 for both. */ + /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for + constructive? */ + /* More recent Cortex chips have a 64-byte cache line, but are marked + ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. 
*/ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + } + if (current_tune->prefetch.l1_cache_size >= 0) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_size, diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 2cb87cedec0..c0006b3674b 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p, SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, ix86_tune_cost->l2_cache_size); + /* 64B is the accepted value for these for all x86. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 7772fe62d95..0c2498aee22 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc) "%<constexpr%> function in C++20"); } +/* We're getting the constant value of DECL in a manifestly constant-evaluated + context; maybe complain about that. 
*/ + +static void +maybe_warn_about_constant_value (location_t loc, tree decl) +{ + static bool explained = false; + if (cxx_dialect >= cxx17 + && warn_interference_size + && !global_options_set.x_param_destruct_interfere_size + && DECL_CONTEXT (decl) == std_node + && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size") + && (LOCATION_FILE (input_location) != main_input_filename + || module_exporting_p ()) + && warning_at (loc, OPT_Winterference_size, "use of %qD", decl) + && !explained) + { + explained = true; + inform (loc, "its value can vary between compiler versions or " + "with different %<-mtune%> or %<-mcpu%> flags"); + inform (loc, "if this use is part of a public ABI, change it to " + "instead use a constant variable you define"); + inform (loc, "the default value for the current CPU tuning " + "is %d bytes", param_destruct_interfere_size); + inform (loc, "you can stabilize this value with %<--param " + "hardware_destructive_interference_size=%d%>, or disable " + "this warning with %<-Wno-interference-size%>", + param_destruct_interfere_size); + } +} + /* Attempt to reduce the expression T to a constant value. On failure, issue diagnostic and return error_mark_node. */ /* FIXME unify with c_fully_fold */ @@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, r = *p; break; } + if (ctx->manifestly_const_eval) + maybe_warn_about_constant_value (loc, t); if (COMPLETE_TYPE_P (TREE_TYPE (t)) && is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false)) { diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index bce62ad202a..c2065027369 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void) /* Show we use EH for cleanups. */ if (flag_exceptions) using_eh_for_cleanups (); + + /* Check that the hardware interference sizes are at least + alignof(max_align_t), as required by the standard. 
*/ + const int max_align = max_align_t_align () / BITS_PER_UNIT; + if (param_destruct_interfere_size) + { + if (param_destruct_interfere_size < max_align) + error ("%<--param destructive-interference-size=%d%> is less than " + "%d", param_destruct_interfere_size, max_align); + else if (param_destruct_interfere_size < param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param destructive-interference-size=%d%> " + "is less than %<--param l1-cache-line-size=%d%>", + param_destruct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_destruct_interfere_size = param_l1_cache_line_size; + /* else leave it unset. */ + + if (param_construct_interfere_size) + { + if (param_construct_interfere_size < max_align) + error ("%<--param constructive-interference-size=%d%> is less than " + "%d", param_construct_interfere_size, max_align); + else if (param_construct_interfere_size > param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param constructive-interference-size=%d%> " + "is greater than %<--param l1-cache-line-size=%d%>", + param_construct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_construct_interfere_size = param_l1_cache_line_size; } /* Enter an abi node in global-module context. returns a cookie to diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 23cc68f92b5..78cfc100ac2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by @option{-Winline} to appear or disappear. +@item -Winterference-size +@opindex Winterference-size +Warn about use of C++17 @code{std::hardware_destructive_interference_size} +without specifying its value with @option{--param destructive-interference-size}. +Also warn about questionable values for that option. 
+ +This variable is intended to be used for controlling class layout, to +avoid false sharing in concurrent code: + +@smallexample +struct independent_fields @{ + alignas(std::hardware_destructive_interference_size) std::atomic<int> one; + alignas(std::hardware_destructive_interference_size) std::atomic<int> two; +@}; +@end smallexample + +Here @samp{one} and @samp{two} are intended to be far enough apart +that stores to one won't require accesses to the other to reload the +cache line. + +By default, @option{--param destructive-interference-size} and +@option{--param constructive-interference-size} are set based on the +current @option{-mtune} option, typically to the L1 cache line size +for the particular target CPU, sometimes to a range if tuning for a +generic target. So all translation units that depend on ABI +compatibility for the use of these variables must be compiled with +the same @option{-mtune} (or @option{-mcpu}). + +If ABI stability is important, such as if the use is in a header for a +library, you should probably not use the hardware interference size +variables at all. Alternatively, you can force a particular value +with @option{--param}. + +If you are confident that your use of the variable does not affect ABI +outside a single build of your project, you can turn off the warning +with @option{-Wno-interference-size}. + @item -Wint-in-bool-context @opindex Wint-in-bool-context @opindex Wno-int-in-bool-context @@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride. This setting is only useful for strides that are known and constant. +@item destructive-interference-size +@item constructive-interference-size +The values for the C++17 variables +@code{std::hardware_destructive_interference_size} and +@code{std::hardware_constructive_interference_size}. 
The destructive +interference size is the minimum recommended offset between two +independent concurrently-accessed objects; the constructive +interference size is the maximum recommended size of contiguous memory +accessed together. Typically both will be the size of an L1 cache +line for the target, in bytes. For a generic target covering a range of L1 +cache line sizes, typically the constructive interference size will be +the small end of the range and the destructive size will be the large +end. + +The destructive interference size is intended to be used for layout, +and thus has ABI impact. The default value is not expected to be +stable, and on some targets varies with @option{-mtune}, so use of +this variable in a context where ABI stability is important, such as +the public interface of a library, is strongly discouraged; if it is +used in that context, users can stabilize the value using this +option. + +The constructive interference size is less sensitive, as it is +typically only used in a @samp{static_assert} to make sure that a type +fits within a cache line. + +See also @option{-Winterference-size}. + @item loop-interchange-max-num-stmts The maximum number of stmts in a loop to be interchanged. diff --git a/gcc/params.opt b/gcc/params.opt index 3a701e22c46..658ca028851 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent) Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization The size of L1 cache line. +-param=destructive-interference-size= +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization +The minimum recommended offset between two concurrently-accessed objects to +avoid additional performance degradation due to contention introduced by the +implementation. Typically the L1 cache line size, but can be larger to +accommodate a variety of target processors with different cache line sizes. 
+C++17 code might use this value in structure layout, but is strongly +discouraged from doing so in public ABIs. + +-param=constructive-interference-size= +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization +The maximum recommended size of contiguous memory occupied by two objects +accessed with temporal locality by concurrent threads. Typically the L1 cache +line size, but can be smaller to accommodate a variety of target processors with +different cache line sizes. + -param=l1-cache-size= Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization The size of L1 cache. diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C new file mode 100644 index 00000000000..2af75c63f83 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C @@ -0,0 +1,14 @@ +// { dg-do compile { target c++20 } } +// { dg-additional-options -fmodules-ts } + +module ; + +#include <new> + +export module foo; + +export { + struct A { + alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size } + }; +} diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C new file mode 100644 index 00000000000..57c001bc032 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.C @@ -0,0 +1,6 @@ +// Test that we warn about use of std::hardware_destructive_interference_size +// in a header. 
+// { dg-do compile { target c++17 } } + +// { dg-warning Winterference-size "" { target *-*-* } 0 } +#include "Winterference.H" diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H new file mode 100644 index 00000000000..36f0ad5f6d1 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.H @@ -0,0 +1,7 @@ +#include <new> + +struct A +{ + alignas(std::hardware_destructive_interference_size) int i; + alignas(std::hardware_destructive_interference_size) int j; +}; diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C new file mode 100644 index 00000000000..0fc01655223 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use +// 128 or even 256. +static_assert(std::hardware_destructive_interference_size == 256); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C new file mode 100644 index 00000000000..34fe8a52bff --- /dev/null +++ b/gcc/testsuite/g++.target/arm/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Recent ARM CPUs have a cache line size of 64. Older ones have +// a size of 32, but I guess they're old enough that we don't care? 
+static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C new file mode 100644 index 00000000000..c7b910e3ada --- /dev/null +++ b/gcc/testsuite/g++.target/i386/interference.C @@ -0,0 +1,8 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// It is generally agreed that these are the right values for all x86. +static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version index f950bf0f0db..f41004b5911 100644 --- a/libstdc++-v3/include/std/version +++ b/libstdc++-v3/include/std/version @@ -140,6 +140,9 @@ #define __cpp_lib_filesystem 201703 #define __cpp_lib_gcd 201606 #define __cpp_lib_gcd_lcm 201606 +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L +#endif #define __cpp_lib_hypot 201603 #define __cpp_lib_invoke 201411L #define __cpp_lib_lcm 201606 diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new index 3349b13fd1b..7bc67a6cb02 100644 --- a/libstdc++-v3/libsupc++/new +++ b/libstdc++-v3/libsupc++/new @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { } } // extern "C++" #if __cplusplus >= 201703L -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER namespace std { +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER #define __cpp_lib_launder 201606 /// Pointer optimization barrier [ptr.launder] template<typename _Tp> @@ -205,8 +205,14 @@ namespace std void launder(const void*) = delete; void launder(volatile void*) = delete; void launder(const volatile void*) = delete; -} #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER + +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L + inline 
constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE; + inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE; +#endif // __GCC_DESTRUCTIVE_SIZE +} #endif // C++17 #if __cplusplus > 201703L </cut>
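The invoke.texi text in the patch above advises against using std::hardware_destructive_interference_size where ABI stability matters, and the new diagnostic suggests using "a constant variable you define" instead. A minimal sketch of that advice follows, assuming a C++17 compiler. The names kFalseSharingPad, IndependentCounters and HotPair are hypothetical, and 64 is only an illustrative value, not something the patch prescribes.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Hypothetical project-owned constant (an assumption, not part of the patch).
// Pinning the value here keeps the layout below identical across -mtune/-mcpu
// settings and compiler versions, which the std:: constant does not promise.
inline constexpr std::size_t kFalseSharingPad = 64;

// Each counter gets its own kFalseSharingPad-aligned slot, so a store to
// `one` should not invalidate the cache line holding `two`
// (on CPUs whose cache line is at most kFalseSharingPad bytes).
struct IndependentCounters {
  alignas(kFalseSharingPad) std::atomic<int> one{0};
  alignas(kFalseSharingPad) std::atomic<int> two{0};
};

static_assert(alignof(IndependentCounters) == kFalseSharingPad);
static_assert(sizeof(IndependentCounters) == 2 * kFalseSharingPad);

// The constructive size is typically used only in a check like this one,
// so it has no layout (ABI) impact. The guard covers libraries that do not
// provide the constants.
struct HotPair {
  int key;
  int value;
};

#ifdef __cpp_lib_hardware_interference_size
static_assert(sizeof(HotPair) <= std::hardware_constructive_interference_size);
#endif
```

Because the padding comes from a project constant rather than the std:: variable, a header containing IndependentCounters should not trigger -Winterference-size, at the cost of not tracking the tuned cache line size automatically.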

4 years, 9 months


[TCWG CI] Regression caused by gcc:76b75018b3d053a890ebe155e47814de14b3c9fb

by ci_notify＠linaro.org

Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>

    c++: implement C++17 hardware interference size

Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts: -10
# true: 0
# build_abe binutils: 1
# First few build errors in logs:

from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts: -10
# true: 0
# build_abe binutils: 1
# build_abe gcc: 2
# build_abe linux: 4
# build_abe glibc: 5
# build_abe gdb: 6

This commit has regressed these CI configurations:
 - tcwg_gnu_native_build/master-arm

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Even more details: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…

Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date:   Thu Jul 15 15:30:17 2021 -0400

    c++: implement C++17 hardware interference size

    The last missing piece of the C++17 standard library is the hardware
    interference size constants. Much of the delay in implementing these has
    been due to uncertainty about what the right values are, and even whether
    there is a single constant value that is suitable; the destructive
    interference size is intended to be used in structure layout, so program
    ABIs will depend on it.

    In principle, both of these values should be the same as the target's L1
    cache line size. When compiling for a generic target that is intended to
    support a range of target CPUs with different cache line sizes, the
    constructive size should probably be the minimum size, and the destructive
    size the maximum, unless you are constrained by ABI compatibility with
    previous code.

    From discussion on gcc-patches, I've come to the conclusion that the
    solution to the difficulty of choosing stable values is to give up on it,
    and instead encourage only uses where ABI stability is unimportant: in
    particular, uses where the ABI is shared at most between translation units
    built at the same time with the same flags.

    To that end, I've added a warning for any use of the constant value of
    std::hardware_destructive_interference_size in a header or module export.
    Appropriate uses within a project can disable the warning.

    A previous iteration of this patch included an -finterference-tune flag to
    make the value vary with -mtune; this iteration makes that the default
    behavior, which should be appropriate for all reasonable uses of the
    variable.
    The previous default of "stable-ish" seems to me likely to have
    been more of an attractive nuisance; since we can't promise actual
    stability, we should instead make proper uses more convenient.

    JF Bastien's implementation proposal is summarized at
    https://github.com/itanium-cxx-abi/cxx-abi/issues/74

    I implement this by adding new --params for the two sizes. Targets can
    override these values in targetm.target_option.override() to support a
    range of values for the generic target; otherwise, both will default to the
    L1 cache line size.

    64 bytes still seems correct for all x86.

    I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
    A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.

    He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
    cache line, I've changed that to 64/256.

    Other arch maintainers are invited to set ranges for their generic targets
    if that seems better than using the default cache line size for both values.

    With the above choice to reject stability as a goal, getting these values
    "right" is now just a matter of what we want the default optimization to
    be, and we can feel free to adjust them as CPUs with different cache lines
    become more and less common.

    gcc/ChangeLog:

            * params.opt: Add destructive-interference-size and
            constructive-interference-size.
            * doc/invoke.texi: Document them.
            * config/aarch64/aarch64.c (aarch64_override_options_internal): Set them.
            * config/arm/arm.c (arm_option_override): Set them.
            * config/i386/i386-options.c (ix86_option_override_internal):
            Set them.

    gcc/c-family/ChangeLog:

            * c.opt: Add -Winterference-size.
            * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
            and __GCC_CONSTRUCTIVE_SIZE.

    gcc/cp/ChangeLog:

            * constexpr.c (maybe_warn_about_constant_value): Complain about
            std::hardware_destructive_interference_size.
            (cxx_eval_constant_expression): Call it.
            * decl.c (cxx_init_decl_processing): Check
            --param *-interference-size values.
    libstdc++-v3/ChangeLog:

            * include/std/version: Define __cpp_lib_hardware_interference_size.
            * libsupc++/new: Define hardware interference size variables.

    gcc/testsuite/ChangeLog:

            * g++.dg/warn/Winterference.H: New file.
            * g++.dg/warn/Winterference.C: New test.
            * g++.target/aarch64/interference.C: New test.
            * g++.target/arm/interference.C: New test.
            * g++.target/i386/interference.C: New test.
---
 gcc/c-family/c-cppbuiltin.c                     | 14 ++++++
 gcc/c-family/c.opt                              |  5 ++
 gcc/config/aarch64/aarch64.c                    | 22 +++++++++
 gcc/config/arm/arm.c                            | 22 +++++++++
 gcc/config/i386/i386-options.c                  |  6 +++
 gcc/cp/constexpr.c                              | 33 +++++++++++++
 gcc/cp/decl.c                                   | 32 ++++++++++++
 gcc/doc/invoke.texi                             | 65 +++++++++++++++++++++++++
 gcc/params.opt                                  | 16 ++++++
 gcc/testsuite/g++.dg/warn/Winterference-2.C     | 14 ++++++
 gcc/testsuite/g++.dg/warn/Winterference.C       |  6 +++
 gcc/testsuite/g++.dg/warn/Winterference.H       |  7 +++
 gcc/testsuite/g++.target/aarch64/interference.C |  9 ++++
 gcc/testsuite/g++.target/arm/interference.C     |  9 ++++
 gcc/testsuite/g++.target/i386/interference.C    |  8 +++
 libstdc++-v3/include/std/version                |  3 ++
 libstdc++-v3/libsupc++/new                      | 10 +++-
 17 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 48cbefd8bf8..ce88e707127 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile)
   builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
                                  targetm.atomic_test_and_set_trueval);
 
+  /* Macros for C++17 hardware interference size constants. Either both or
+     neither should be set. */
+  gcc_assert (!param_destruct_interfere_size
+              == !param_construct_interfere_size);
+  if (param_destruct_interfere_size)
+    {
+      /* FIXME The way of communicating these values to the library should be
+         part of the C++ ABI, whether macro or builtin.
*/ + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", + param_destruct_interfere_size); + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", + param_construct_interfere_size); + } + /* ptr_type_node can't be used here since ptr_mode is only set when toplev calls backend_init which is not done with -E or pch. */ psize = POINTER_SIZE_UNITS; diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index c5fe90003f2..9c151d19870 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -722,6 +722,11 @@ Winit-list-lifetime C++ ObjC++ Var(warn_init_list) Warning Init(1) Warn about uses of std::initializer_list that can result in dangling pointers. +Winterference-size +C++ ObjC++ Var(warn_interference_size) Warning Init(1) +Warn about nonsensical values of --param destructive-interference-size or +constructive-interference-size. + Wimplicit C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) Warn about implicit declarations. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 30d9a0b7a3d..36519ccc5a5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_line_size, aarch64_tune_params.prefetch->l1_cache_line_size); + + if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + } + else + { + /* For a generic AArch64 target, cover the current range of cache line + sizes. 
*/ + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + 256); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + 64); + } + if (aarch64_tune_params.prefetch->l2_cache_size >= 0) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l2_cache_size, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f1e628253d0..6c6e77fab66 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3669,6 +3669,28 @@ arm_option_override (void) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_line_size, current_tune->prefetch.l1_cache_line_size); + if (current_tune->prefetch.l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + } + else + { + /* For a generic ARM target, JF Bastien proposed using 64 for both. */ + /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for + constructive? */ + /* More recent Cortex chips have a 64-byte cache line, but are marked + ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. 
*/ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + } + if (current_tune->prefetch.l1_cache_size >= 0) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_size, diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 2cb87cedec0..c0006b3674b 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p, SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, ix86_tune_cost->l2_cache_size); + /* 64B is the accepted value for these for all x86. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 7772fe62d95..0c2498aee22 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc) "%<constexpr%> function in C++20"); } +/* We're getting the constant value of DECL in a manifestly constant-evaluated + context; maybe complain about that. 
*/ + +static void +maybe_warn_about_constant_value (location_t loc, tree decl) +{ + static bool explained = false; + if (cxx_dialect >= cxx17 + && warn_interference_size + && !global_options_set.x_param_destruct_interfere_size + && DECL_CONTEXT (decl) == std_node + && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size") + && (LOCATION_FILE (input_location) != main_input_filename + || module_exporting_p ()) + && warning_at (loc, OPT_Winterference_size, "use of %qD", decl) + && !explained) + { + explained = true; + inform (loc, "its value can vary between compiler versions or " + "with different %<-mtune%> or %<-mcpu%> flags"); + inform (loc, "if this use is part of a public ABI, change it to " + "instead use a constant variable you define"); + inform (loc, "the default value for the current CPU tuning " + "is %d bytes", param_destruct_interfere_size); + inform (loc, "you can stabilize this value with %<--param " + "hardware_destructive_interference_size=%d%>, or disable " + "this warning with %<-Wno-interference-size%>", + param_destruct_interfere_size); + } +} + /* Attempt to reduce the expression T to a constant value. On failure, issue diagnostic and return error_mark_node. */ /* FIXME unify with c_fully_fold */ @@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, r = *p; break; } + if (ctx->manifestly_const_eval) + maybe_warn_about_constant_value (loc, t); if (COMPLETE_TYPE_P (TREE_TYPE (t)) && is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false)) { diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index bce62ad202a..c2065027369 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void) /* Show we use EH for cleanups. */ if (flag_exceptions) using_eh_for_cleanups (); + + /* Check that the hardware interference sizes are at least + alignof(max_align_t), as required by the standard. 
*/ + const int max_align = max_align_t_align () / BITS_PER_UNIT; + if (param_destruct_interfere_size) + { + if (param_destruct_interfere_size < max_align) + error ("%<--param destructive-interference-size=%d%> is less than " + "%d", param_destruct_interfere_size, max_align); + else if (param_destruct_interfere_size < param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param destructive-interference-size=%d%> " + "is less than %<--param l1-cache-line-size=%d%>", + param_destruct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_destruct_interfere_size = param_l1_cache_line_size; + /* else leave it unset. */ + + if (param_construct_interfere_size) + { + if (param_construct_interfere_size < max_align) + error ("%<--param constructive-interference-size=%d%> is less than " + "%d", param_construct_interfere_size, max_align); + else if (param_construct_interfere_size > param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param constructive-interference-size=%d%> " + "is greater than %<--param l1-cache-line-size=%d%>", + param_construct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_construct_interfere_size = param_l1_cache_line_size; } /* Enter an abi node in global-module context. returns a cookie to diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 23cc68f92b5..78cfc100ac2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by @option{-Winline} to appear or disappear. +@item -Winterference-size +@opindex Winterference-size +Warn about use of C++17 @code{std::hardware_destructive_interference_size} +without specifying its value with @option{--param destructive-interference-size}. +Also warn about questionable values for that option. 
+ +This variable is intended to be used for controlling class layout, to +avoid false sharing in concurrent code: + +@smallexample +struct independent_fields @{ + alignas(std::hardware_destructive_interference_size) std::atomic<int> one; + alignas(std::hardware_destructive_interference_size) std::atomic<int> two; +@}; +@end smallexample + +Here @samp{one} and @samp{two} are intended to be far enough apart +that stores to one won't require accesses to the other to reload the +cache line. + +By default, @option{--param destructive-interference-size} and +@option{--param constructive-interference-size} are set based on the +current @option{-mtune} option, typically to the L1 cache line size +for the particular target CPU, sometimes to a range if tuning for a +generic target. So all translation units that depend on ABI +compatibility for the use of these variables must be compiled with +the same @option{-mtune} (or @option{-mcpu}). + +If ABI stability is important, such as if the use is in a header for a +library, you should probably not use the hardware interference size +variables at all. Alternatively, you can force a particular value +with @option{--param}. + +If you are confident that your use of the variable does not affect ABI +outside a single build of your project, you can turn off the warning +with @option{-Wno-interference-size}. + @item -Wint-in-bool-context @opindex Wint-in-bool-context @opindex Wno-int-in-bool-context @@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride. This setting is only useful for strides that are known and constant. +@item destructive-interference-size +@item constructive-interference-size +The values for the C++17 variables +@code{std::hardware_destructive_interference_size} and +@code{std::hardware_constructive_interference_size}. 
The destructive +interference size is the minimum recommended offset between two +independent concurrently-accessed objects; the constructive +interference size is the maximum recommended size of contiguous memory +accessed together. Typically both will be the size of an L1 cache +line for the target, in bytes. For a generic target covering a range of L1 +cache line sizes, typically the constructive interference size will be +the small end of the range and the destructive size will be the large +end. + +The destructive interference size is intended to be used for layout, +and thus has ABI impact. The default value is not expected to be +stable, and on some targets varies with @option{-mtune}, so use of +this variable in a context where ABI stability is important, such as +the public interface of a library, is strongly discouraged; if it is +used in that context, users can stabilize the value using this +option. + +The constructive interference size is less sensitive, as it is +typically only used in a @samp{static_assert} to make sure that a type +fits within a cache line. + +See also @option{-Winterference-size}. + @item loop-interchange-max-num-stmts The maximum number of stmts in a loop to be interchanged. diff --git a/gcc/params.opt b/gcc/params.opt index 3a701e22c46..658ca028851 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent) Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization The size of L1 cache line. +-param=destructive-interference-size= +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization +The minimum recommended offset between two concurrently-accessed objects to +avoid additional performance degradation due to contention introduced by the +implementation. Typically the L1 cache line size, but can be larger to +accommodate a variety of target processors with different cache line sizes. 
+C++17 code might use this value in structure layout, but is strongly +discouraged from doing so in public ABIs. + +-param=constructive-interference-size= +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization +The maximum recommended size of contiguous memory occupied by two objects +accessed with temporal locality by concurrent threads. Typically the L1 cache +line size, but can be smaller to accommodate a variety of target processors with +different cache line sizes. + -param=l1-cache-size= Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization The size of L1 cache. diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C new file mode 100644 index 00000000000..2af75c63f83 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C @@ -0,0 +1,14 @@ +// { dg-do compile { target c++20 } } +// { dg-additional-options -fmodules-ts } + +module ; + +#include <new> + +export module foo; + +export { + struct A { + alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size } + }; +} diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C new file mode 100644 index 00000000000..57c001bc032 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.C @@ -0,0 +1,6 @@ +// Test that we warn about use of std::hardware_destructive_interference_size +// in a header. 
+// { dg-do compile { target c++17 } } + +// { dg-warning Winterference-size "" { target *-*-* } 0 } +#include "Winterference.H" diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H new file mode 100644 index 00000000000..36f0ad5f6d1 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.H @@ -0,0 +1,7 @@ +#include <new> + +struct A +{ + alignas(std::hardware_destructive_interference_size) int i; + alignas(std::hardware_destructive_interference_size) int j; +}; diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C new file mode 100644 index 00000000000..0fc01655223 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use +// 128 or even 256. +static_assert(std::hardware_destructive_interference_size == 256); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C new file mode 100644 index 00000000000..34fe8a52bff --- /dev/null +++ b/gcc/testsuite/g++.target/arm/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Recent ARM CPUs have a cache line size of 64. Older ones have +// a size of 32, but I guess they're old enough that we don't care? 
+static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C new file mode 100644 index 00000000000..c7b910e3ada --- /dev/null +++ b/gcc/testsuite/g++.target/i386/interference.C @@ -0,0 +1,8 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// It is generally agreed that these are the right values for all x86. +static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version index f950bf0f0db..f41004b5911 100644 --- a/libstdc++-v3/include/std/version +++ b/libstdc++-v3/include/std/version @@ -140,6 +140,9 @@ #define __cpp_lib_filesystem 201703 #define __cpp_lib_gcd 201606 #define __cpp_lib_gcd_lcm 201606 +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L +#endif #define __cpp_lib_hypot 201603 #define __cpp_lib_invoke 201411L #define __cpp_lib_lcm 201606 diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new index 3349b13fd1b..7bc67a6cb02 100644 --- a/libstdc++-v3/libsupc++/new +++ b/libstdc++-v3/libsupc++/new @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { } } // extern "C++" #if __cplusplus >= 201703L -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER namespace std { +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER #define __cpp_lib_launder 201606 /// Pointer optimization barrier [ptr.launder] template<typename _Tp> @@ -205,8 +205,14 @@ namespace std void launder(const void*) = delete; void launder(volatile void*) = delete; void launder(const volatile void*) = delete; -} #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER + +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L + inline 
constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE; + inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE; +#endif // __GCC_DESTRUCTIVE_SIZE +} #endif // C++17 #if __cplusplus > 201703L </cut>
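For readers of the invoke.texi text in the patch above, here is a minimal sketch of the layout idiom it describes. It is not part of the commit. The name destructive_size, the fallback value 64, and the helper check_layout are assumptions made for illustration; pinning one project-wide constant is just one way to follow the documentation's advice to keep the value stable where ABI matters.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Project-local constant (hypothetical name). Uses the C++17 value when the
// library provides it, otherwise a fixed fallback. The fallback of 64 is an
// assumption for illustration, not a GCC default.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t destructive_size =
    std::hardware_destructive_interference_size;
#else
constexpr std::size_t destructive_size = 64;
#endif

// Same shape as the documentation example: each counter starts on its own
// destructive_size boundary, so the two are kept at least that far apart.
struct independent_fields {
  alignas(destructive_size) std::atomic<int> one{0};
  alignas(destructive_size) std::atomic<int> two{0};
};

// Distance in bytes between the two members.
constexpr std::size_t field_gap =
    offsetof(independent_fields, two) - offsetof(independent_fields, one);

// Runtime sanity check a caller might run once at startup.
inline void check_layout() {
  assert(field_gap >= destructive_size);
}
```

Because the struct's size and alignment follow destructive_size, every translation unit that shares independent_fields has to see the same value, which is the ABI concern the new warning is about.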

4 years, 9 months


[TCWG CI] Regression caused by gcc:5485bbebb3679245dd4bc7c149bbc940f8b2e632

by ci_notify＠linaro.org

Identified regression caused by *gcc:5485bbebb3679245dd4bc7c149bbc940f8b2e632*:

commit 5485bbebb3679245dd4bc7c149bbc940f8b2e632
Author: Aldy Hernandez <aldyh(a)redhat.com>

    Refactor jump_thread_path_registry.

Results regressed to (for first_bad == 5485bbebb3679245dd4bc7c149bbc940f8b2e632)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -Os artifacts/build-5485bbebb3679245dd4bc7c149bbc940f8b2e632/results_id: 1
# 458.sjeng,sjeng_base.default regressed by 104

from (for last_good == 3fca63b0b6faf6a30ed735b86b8eb59944701fc1)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -Os artifacts/build-3fca63b0b6faf6a30ed735b86b8eb59944701fc1/results_id: 1

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-5485bbebb3679245dd4bc7c149bbc940f8b2e632
cd investigate-gcc-5485bbebb3679245dd4bc7c149bbc940f8b2e632

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 5485bbebb3679245dd4bc7c149bbc940f8b2e632
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 3fca63b0b6faf6a30ed735b86b8eb59944701fc1
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 5485bbebb3679245dd4bc7c149bbc940f8b2e632
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date:   Sat Sep 11 09:37:39 2021 +0200

    Refactor jump_thread_path_registry.

    In an attempt to refactor thread_through_all_blocks(), I've realized
    that there is a mess of code dealing with coexisting forward and
    backward thread types. However, this is an impossible scenario, as
    the registry contains either forward/old-style threads, or backward
    threads (EDGE_FSM_THREADs), never both.

    The fact that both types of threads cannot coexist simplifies the
    code considerably. For that matter, it splits things up nicely
    because there are some common bits that can go into a base class, and
    some differing code that can go into derived classes.

    Dividing things in this way makes it very obvious which parts belong
    in the old-style copier and which parts belong to the generic copier.
    Doing all this provided some nice cleanups, as well as fixing a latent
    bug in adjust_paths_after_duplication.

    The diff is somewhat hard to read, so perhaps looking at the final
    output would be easier.

    A general overview of what this patch achieves can be seen by just
    looking at this simplified class layout:

    // Abstract class for the jump thread registry.

    class jt_path_registry
    {
    public:
      jt_path_registry ();
      virtual ~jt_path_registry ();
      bool register_jump_thread (vec<jump_thread_edge *> *);
      bool thread_through_all_blocks (bool peel_loop_headers);
      jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
      vec<jump_thread_edge *> *allocate_thread_path ();
    protected:
      vec<vec<jump_thread_edge *> *> m_paths;
      unsigned long m_num_threaded_edges;
    private:
      virtual bool update_cfg (bool peel_loop_headers) = 0;
    };

    // Forward threader path registry using a custom BB copier.

    class fwd_jt_path_registry : public jt_path_registry
    {
    public:
      fwd_jt_path_registry ();
      ~fwd_jt_path_registry ();
      void remove_jump_threads_including (edge);
    private:
      bool update_cfg (bool peel_loop_headers) override;
      void mark_threaded_blocks (bitmap threaded_blocks);
      bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
      bool thread_block (basic_block, bool noloop_only);
      bool thread_through_loop_header (class loop *loop,
                                       bool may_peel_loop_headers);
      class redirection_data *lookup_redirection_data (edge e,
                                                       enum insert_option);
      hash_table<struct removed_edges> *m_removed_edges;
      hash_table<redirection_data> *m_redirection_data;
    };

    // Backward threader path registry using a generic BB copier.

    class back_jt_path_registry : public jt_path_registry
    {
    private:
      bool update_cfg (bool peel_loop_headers) override;
      void adjust_paths_after_duplication (unsigned curr_path_num);
      bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
                                  unsigned n_region, unsigned current_path_no);
      bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
    };

    That is, the forward and backward bits have been completely split,
    while deriving from a base class for the common functionality.

    Most everything is mechanical, but there are a few gotchas:

    a) back_jt_path_registry::update_cfg(), which contains the backward
    threading specific bits, is rather simple, since most of the code in
    the original thread_through_all_blocks() only applied to the forward
    threader: removed edges, mark_threaded_blocks,
    thread_through_loop_header, the copy tables (*).

    (*) The back threader has its own copy tables in
    duplicate_thread_path.

    b) In some cases, adjust_paths_after_duplication() was commoning out
    so many blocks that it was removing the initial EDGE_FSM_THREAD
    marker. I've fixed this.

    c) AFAICT, when run from the forward threader,
    thread_through_all_blocks() attempts to remove threads starting with
    an edge already seen, but it would never see anything because the loop
    doing the checking only has a visited_starting_edges.contains(), and
    no corresponding visited_starting_edges.add(). The add() method in
    thread_through_all_blocks belongs to the backward threading bits, and
    as I've explained, both types cannot coexist. I've removed the checks
    in the forward bits since they don't appear to do anything. If this
    was an oversight, and we want to avoid threading already seen edges in
    the forward threader, I can move this functionality to the base class.

    Ultimately I would like to move all the registry code to
    tree-ssa-threadregistry.*. I've avoided this in this patch to aid in
    review.

    My apologies for this longass explanation, but I want to make sure
    we're covering all of our bases.

    Tested on x86-64 Linux by a very tedious process of moving chunks
    around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
    repeating ad nauseam. And of course, by running a full bootstrap and
    tests.

    OK?

    p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
    type.

    gcc/ChangeLog:

            * tree-ssa-threadbackward.c (class back_threader_registry): Use
            back_jt_path_registry.
            * tree-ssa-threadedge.c (jump_threader::jump_threader): Use
            fwd_jt_path_registry.
            * tree-ssa-threadedge.h (class jump_threader): Same.
            * tree-ssa-threadupdate.c
            (jump_thread_path_registry::jump_thread_path_registry): Rename...
            (jt_path_registry::jt_path_registry): ...to this.
            (jump_thread_path_registry::~jump_thread_path_registry): Rename...
            (jt_path_registry::~jt_path_registry): ...this.
            (fwd_jt_path_registry::fwd_jt_path_registry): New.
            (fwd_jt_path_registry::~fwd_jt_path_registry): New.
            (jump_thread_path_registry::allocate_thread_edge): Rename...
            (jt_path_registry::allocate_thread_edge): ...to this.
            (jump_thread_path_registry::allocate_thread_path): Rename...
            (jt_path_registry::allocate_thread_path): ...to this.
            (jump_thread_path_registry::lookup_redirection_data): Rename...
            (fwd_jt_path_registry::lookup_redirection_data): ...to this.
            (jump_thread_path_registry::thread_block_1): Rename...
            (fwd_jt_path_registry::thread_block_1): ...to this.
            (jump_thread_path_registry::thread_block): Rename...
            (fwd_jt_path_registry::thread_block): ...to this.
            (jt_path_registry::thread_through_loop_header): Rename...
            (fwd_jt_path_registry::thread_through_loop_header): ...to this.
            (jump_thread_path_registry::mark_threaded_blocks): Rename...
            (fwd_jt_path_registry::mark_threaded_blocks): ...to this.
            (jump_thread_path_registry::debug_path): Rename...
            (jt_path_registry::debug_path): ...to this.
            (jump_thread_path_registry::dump): Rename...
            (jt_path_registry::debug): ...to this.
            (jump_thread_path_registry::rewire_first_differing_edge): Rename...
            (back_jt_path_registry::rewire_first_differing_edge): ...to this.
            (jump_thread_path_registry::adjust_paths_after_duplication): Rename...
            (back_jt_path_registry::adjust_paths_after_duplication): ...to this.
            (jump_thread_path_registry::duplicate_thread_path): Rename...
            (back_jt_path_registry::duplicate_thread_path): ...to this. Also,
            drop ill-formed candidates.
            (jump_thread_path_registry::remove_jump_threads_including): Rename...
            (fwd_jt_path_registry::remove_jump_threads_including): ...to this.
(jt_path_registry::thread_through_all_blocks): New. (back_jt_path_registry::update_cfg): New. (fwd_jt_path_registry::update_cfg): New. (jump_thread_path_registry::register_jump_thread): Rename... (jt_path_registry::register_jump_thread): ...to this. * tree-ssa-threadupdate.h (class jump_thread_path_registry): Abstract to... (class jt_path_registry): ...here. (class fwd_jt_path_registry): New. (class back_jt_path_registry): New. --- gcc/tree-ssa-threadbackward.c | 2 +- gcc/tree-ssa-threadedge.c | 2 +- gcc/tree-ssa-threadedge.h | 2 +- gcc/tree-ssa-threadupdate.c | 213 +++++++++++++++++++++--------------------- gcc/tree-ssa-threadupdate.h | 60 +++++++----- 5 files changed, 149 insertions(+), 130 deletions(-) diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c index e72992328de..7ff5cecbdab 100644 --- a/gcc/tree-ssa-threadbackward.c +++ b/gcc/tree-ssa-threadbackward.c @@ -56,7 +56,7 @@ public: bool register_path (const vec<basic_block> &, edge taken); bool thread_through_all_blocks (bool may_peel_loop_headers); private: - jump_thread_path_registry m_lowlevel_registry; + back_jt_path_registry m_lowlevel_registry; const int m_max_allowable_paths; int m_threaded_paths; }; diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index 3c7cdc58b93..422cb89401b 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -71,7 +71,7 @@ jump_threader::jump_threader (jump_threader_simplifier *simplifier, dummy_cond = gimple_build_cond (NE_EXPR, integer_zero_node, integer_zero_node, NULL, NULL); - m_registry = new jump_thread_path_registry (); + m_registry = new fwd_jt_path_registry (); m_simplifier = simplifier; m_state = state; } diff --git a/gcc/tree-ssa-threadedge.h b/gcc/tree-ssa-threadedge.h index 0002b200d8b..18e6bd41aaa 100644 --- a/gcc/tree-ssa-threadedge.h +++ b/gcc/tree-ssa-threadedge.h @@ -75,7 +75,7 @@ private: // Dummy condition to avoid creating lots of throw away statements. 
gcond *dummy_cond; - class jump_thread_path_registry *m_registry; + class fwd_jt_path_registry *m_registry; jump_threader_simplifier *m_simplifier; jt_state *m_state; }; diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 18f16efbb7a..93538104fdf 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -167,29 +167,36 @@ jump_thread_path_allocator::allocate_thread_path () return new (r) vec<jump_thread_edge *> (); } -jump_thread_path_registry::jump_thread_path_registry () +jt_path_registry::jt_path_registry () { m_paths.create (5); - m_removed_edges = new hash_table<struct removed_edges> (17); m_num_threaded_edges = 0; - m_redirection_data = NULL; } -jump_thread_path_registry::~jump_thread_path_registry () +jt_path_registry::~jt_path_registry () { m_paths.release (); +} + +fwd_jt_path_registry::fwd_jt_path_registry () +{ + m_removed_edges = new hash_table<struct removed_edges> (17); + m_redirection_data = NULL; +} + +fwd_jt_path_registry::~fwd_jt_path_registry () +{ delete m_removed_edges; } jump_thread_edge * -jump_thread_path_registry::allocate_thread_edge (edge e, - jump_thread_edge_type t) +jt_path_registry::allocate_thread_edge (edge e, jump_thread_edge_type t) { return m_allocator.allocate_thread_edge (e, t); } vec<jump_thread_edge *> * -jump_thread_path_registry::allocate_thread_path () +jt_path_registry::allocate_thread_path () { return m_allocator.allocate_thread_path (); } @@ -426,8 +433,7 @@ create_block_for_threading (basic_block bb, edges associated with E in the hash table. */ redirection_data * -jump_thread_path_registry::lookup_redirection_data (edge e, - enum insert_option insert) +fwd_jt_path_registry::lookup_redirection_data (edge e, insert_option insert) { struct redirection_data **slot; struct redirection_data *elt; @@ -1413,9 +1419,9 @@ redirection_block_p (basic_block bb) If JOINERS is true, then thread through joiner blocks as well. 
*/ bool -jump_thread_path_registry::thread_block_1 (basic_block bb, - bool noloop_only, - bool joiners) +fwd_jt_path_registry::thread_block_1 (basic_block bb, + bool noloop_only, + bool joiners) { /* E is an incoming edge into BB that we may or may not want to redirect to a duplicate of BB. */ @@ -1594,7 +1600,7 @@ jump_thread_path_registry::thread_block_1 (basic_block bb, opportunity. */ bool -jump_thread_path_registry::thread_block (basic_block bb, bool noloop_only) +fwd_jt_path_registry::thread_block (basic_block bb, bool noloop_only) { bool retval; retval = thread_block_1 (bb, noloop_only, false); @@ -1675,9 +1681,8 @@ determine_bb_domination_status (class loop *loop, basic_block bb) to the inside of the loop. */ bool -jump_thread_path_registry::thread_through_loop_header - (class loop *loop, - bool may_peel_loop_headers) +fwd_jt_path_registry::thread_through_loop_header (class loop *loop, + bool may_peel_loop_headers) { basic_block header = loop->header; edge e, tgt_edge, latch = loop_latch_edge (loop); @@ -1932,7 +1937,7 @@ count_stmts_and_phis_in_block (basic_block bb) hash table lookups to map from threaded edge to new target. */ void -jump_thread_path_registry::mark_threaded_blocks (bitmap threaded_blocks) +fwd_jt_path_registry::mark_threaded_blocks (bitmap threaded_blocks) { unsigned int i; bitmap_iterator bi; @@ -2197,7 +2202,7 @@ bb_in_bbs (basic_block bb, basic_block *bbs, int n) } void -jump_thread_path_registry::debug_path (FILE *dump_file, int pathno) +jt_path_registry::debug_path (FILE *dump_file, int pathno) { vec<jump_thread_edge *> *p = m_paths[pathno]; fprintf (dump_file, "path: "); @@ -2208,7 +2213,7 @@ jump_thread_path_registry::debug_path (FILE *dump_file, int pathno) } void -jump_thread_path_registry::dump () +jt_path_registry::debug () { for (unsigned i = 0; i < m_paths.length (); ++i) debug_path (stderr, i); @@ -2223,8 +2228,8 @@ jump_thread_path_registry::dump () Returns TRUE if we were able to successfully rewire the edge. 
*/ bool -jump_thread_path_registry::rewire_first_differing_edge (unsigned path_num, - unsigned edge_num) +back_jt_path_registry::rewire_first_differing_edge (unsigned path_num, + unsigned edge_num) { vec<jump_thread_edge *> *path = m_paths[path_num]; edge &e = (*path)[edge_num]->e; @@ -2269,11 +2274,9 @@ jump_thread_path_registry::rewire_first_differing_edge (unsigned path_num, specifies the path that was just threaded. */ void -jump_thread_path_registry::adjust_paths_after_duplication - (unsigned curr_path_num) +back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num) { vec<jump_thread_edge *> *curr_path = m_paths[curr_path_num]; - gcc_assert ((*curr_path)[0]->type == EDGE_FSM_THREAD); if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2347,8 +2350,16 @@ jump_thread_path_registry::adjust_paths_after_duplication m_paths.unordered_remove (cand_path_num); continue; } - /* Otherwise, just remove the redundant sub-path. */ - cand_path->block_remove (0, j); + if ((*cand_path)[j]->type != EDGE_FSM_THREAD) + { + /* If all the EDGE_FSM_THREADs are common, all that's + left is the final EDGE_NO_COPY_SRC_BLOCK. */ + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Dropping illformed candidate.\n"); + } + else + /* Otherwise, just remove the redundant sub-path. */ + cand_path->block_remove (0, j); } if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2372,11 +2383,11 @@ jump_thread_path_registry::adjust_paths_after_duplication Returns false if it is unable to copy the region, true otherwise. 
*/ bool -jump_thread_path_registry::duplicate_thread_path (edge entry, - edge exit, - basic_block *region, - unsigned n_region, - unsigned current_path_no) +back_jt_path_registry::duplicate_thread_path (edge entry, + edge exit, + basic_block *region, + unsigned n_region, + unsigned current_path_no) { unsigned i; class loop *loop = entry->dest->loop_father; @@ -2551,7 +2562,7 @@ valid_jump_thread_path (vec<jump_thread_edge *> *path) DOM/VRP rather than for every case where DOM optimizes away a COND_EXPR. */ void -jump_thread_path_registry::remove_jump_threads_including (edge_def *e) +fwd_jt_path_registry::remove_jump_threads_including (edge_def *e) { if (!m_paths.exists ()) return; @@ -2560,69 +2571,52 @@ jump_thread_path_registry::remove_jump_threads_including (edge_def *e) *slot = e; } -/* Walk through all blocks and thread incoming edges to the appropriate - outgoing edge for each edge pair recorded in THREADED_EDGES. +/* Thread all paths that have been queued for jump threading, and + update the CFG accordingly. It is the caller's responsibility to fix the dominance information and rewrite duplicated SSA_NAMEs back into SSA form. - If MAY_PEEL_LOOP_HEADERS is false, we avoid threading edges through - loop headers if it does not simplify the loop. + If PEEL_LOOP_HEADERS is false, avoid threading edges through loop + headers if it does not simplify the loop. - Returns true if one or more edges were threaded, false otherwise. */ + Returns true if one or more edges were threaded. */ bool -jump_thread_path_registry::thread_through_all_blocks - (bool may_peel_loop_headers) +jt_path_registry::thread_through_all_blocks (bool peel_loop_headers) { - bool retval = false; - unsigned int i; - auto_bitmap threaded_blocks; - hash_set<edge> visited_starting_edges; - - if (!m_paths.exists ()) - { - retval = false; - goto out; - } + if (m_paths.length () == 0) + return false; m_num_threaded_edges = 0; - /* Remove any paths that referenced removed edges. 
*/ - if (m_removed_edges) - for (i = 0; i < m_paths.length (); ) - { - unsigned int j; - vec<jump_thread_edge *> *path = m_paths[i]; + bool retval = update_cfg (peel_loop_headers); - for (j = 0; j < path->length (); j++) - { - edge e = (*path)[j]->e; - if (m_removed_edges->find_slot (e, NO_INSERT)) - break; - } + statistics_counter_event (cfun, "Jumps threaded", m_num_threaded_edges); - if (j != path->length ()) - { - cancel_thread (path, "Thread references removed edge"); - m_paths.unordered_remove (i); - continue; - } - i++; - } + if (retval) + { + loops_state_set (LOOPS_NEED_FIXUP); + return true; + } + return false; +} - /* Jump-thread all FSM threads before other jump-threads. */ - for (i = 0; i < m_paths.length ();) +/* This is the backward threader version of thread_through_all_blocks + using a generic BB copier. */ + +bool +back_jt_path_registry::update_cfg (bool /*peel_loop_headers*/) +{ + bool retval = false; + hash_set<edge> visited_starting_edges; + + while (m_paths.length ()) { - vec<jump_thread_edge *> *path = m_paths[i]; + vec<jump_thread_edge *> *path = m_paths[0]; edge entry = (*path)[0]->e; - /* Only code-generate FSM jump-threads in this loop. */ - if ((*path)[0]->type != EDGE_FSM_THREAD) - { - i++; - continue; - } + gcc_checking_assert ((*path)[0]->type == EDGE_FSM_THREAD); /* Do not jump-thread twice from the same starting edge. @@ -2638,8 +2632,8 @@ jump_thread_path_registry::thread_through_all_blocks || !valid_jump_thread_path (path)) { /* Remove invalid FSM jump-thread paths. 
*/ - cancel_thread (path, "Invalid FSM jump-thread path"); - m_paths.unordered_remove (i); + cancel_thread (path, "Avoiding threading twice from same edge"); + m_paths.unordered_remove (0); continue; } @@ -2650,7 +2644,7 @@ jump_thread_path_registry::thread_through_all_blocks for (unsigned int j = 0; j < len - 1; j++) region[j] = (*path)[j]->e->dest; - if (duplicate_thread_path (entry, exit, region, len - 1, i)) + if (duplicate_thread_path (entry, exit, region, len - 1, 0)) { /* We do not update dominance info. */ free_dominance_info (CDI_DOMINATORS); @@ -2660,27 +2654,44 @@ jump_thread_path_registry::thread_through_all_blocks } path->release (); - m_paths.unordered_remove (i); + m_paths.unordered_remove (0); free (region); } + return retval; +} - /* Remove from PATHS all the jump-threads starting with an edge already - jump-threaded. */ - for (i = 0; i < m_paths.length ();) - { - vec<jump_thread_edge *> *path = m_paths[i]; - edge entry = (*path)[0]->e; +/* This is the forward threader version of thread_through_all_blocks, + using a custom BB copier. */ - /* Do not jump-thread twice from the same block. */ - if (visited_starting_edges.contains (entry)) - { - cancel_thread (path, "Avoiding threading twice from same BB"); - m_paths.unordered_remove (i); - } - else +bool +fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers) +{ + bool retval = false; + + /* Remove any paths that referenced removed edges. 
*/ + if (m_removed_edges) + for (unsigned i = 0; i < m_paths.length (); ) + { + unsigned int j; + vec<jump_thread_edge *> *path = m_paths[i]; + + for (j = 0; j < path->length (); j++) + { + edge e = (*path)[j]->e; + if (m_removed_edges->find_slot (e, NO_INSERT)) + break; + } + + if (j != path->length ()) + { + cancel_thread (path, "Thread references removed edge"); + m_paths.unordered_remove (i); + continue; + } i++; - } + } + auto_bitmap threaded_blocks; mark_threaded_blocks (threaded_blocks); initialize_original_copy_tables (); @@ -2737,16 +2748,8 @@ jump_thread_path_registry::thread_through_all_blocks gcc_assert (e->aux == NULL); } - statistics_counter_event (cfun, "Jumps threaded", m_num_threaded_edges); - free_original_copy_tables (); - m_paths.release (); - - if (retval) - loops_state_set (LOOPS_NEED_FIXUP); - - out: return retval; } @@ -2761,7 +2764,7 @@ jump_thread_path_registry::thread_through_all_blocks Return TRUE if PATH was successfully threaded. */ bool -jump_thread_path_registry::register_jump_thread (vec<jump_thread_edge *> *path) +jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path) { if (!dbg_cnt (registered_jump_thread)) { diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h index 2030bda15af..58e3a38e0c5 100644 --- a/gcc/tree-ssa-threadupdate.h +++ b/gcc/tree-ssa-threadupdate.h @@ -54,49 +54,65 @@ private: obstack m_obstack; }; -// This is the underlying jump thread registry. When all candidates -// have been registered with register_jump_thread(), -// thread_through_all_blocks() is called to actually change the CFG. +// Abstract class for the jump thread registry. +// +// When all candidates have been registered with +// register_jump_thread(), thread_through_all_blocks() is called to +// update the CFG. 
-class jump_thread_path_registry +class jt_path_registry { public: - jump_thread_path_registry (); - ~jump_thread_path_registry (); + jt_path_registry (); + virtual ~jt_path_registry (); bool register_jump_thread (vec<jump_thread_edge *> *); - void remove_jump_threads_including (edge); - bool thread_through_all_blocks (bool); + bool thread_through_all_blocks (bool peel_loop_headers); jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t); vec<jump_thread_edge *> *allocate_thread_path (); - void dump (); + void debug (); +protected: + void debug_path (FILE *, int pathno); + vec<vec<jump_thread_edge *> *> m_paths; + unsigned long m_num_threaded_edges; +private: + virtual bool update_cfg (bool peel_loop_headers) = 0; + jump_thread_path_allocator m_allocator; + DISABLE_COPY_AND_ASSIGN (jt_path_registry); +}; + +// Forward threader path registry using a custom BB copier. +class fwd_jt_path_registry : public jt_path_registry +{ +public: + fwd_jt_path_registry (); + ~fwd_jt_path_registry (); + void remove_jump_threads_including (edge); private: - void debug_path (FILE *, int pathno); + bool update_cfg (bool peel_loop_headers) override; void mark_threaded_blocks (bitmap threaded_blocks); - bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num); - void adjust_paths_after_duplication (unsigned curr_path_num); - bool duplicate_thread_path (edge entry, - edge exit, - basic_block *region, - unsigned n_region, - unsigned current_path_no); bool thread_block_1 (basic_block, bool noloop_only, bool joiners); bool thread_block (basic_block, bool noloop_only); bool thread_through_loop_header (class loop *loop, bool may_peel_loop_headers); class redirection_data *lookup_redirection_data (edge e, enum insert_option); - vec<vec<jump_thread_edge *> *> m_paths; - hash_table<struct removed_edges> *m_removed_edges; // Main data structure to hold information for duplicates of BB. 
hash_table<redirection_data> *m_redirection_data; +}; - // Jump threading statistics. - unsigned long m_num_threaded_edges; +// Backward threader path registry using a generic BB copier. - jump_thread_path_allocator m_allocator; +class back_jt_path_registry : public jt_path_registry +{ +private: + bool update_cfg (bool peel_loop_headers) override; + void adjust_paths_after_duplication (unsigned curr_path_num); + bool duplicate_thread_path (edge entry, edge exit, basic_block *region, + unsigned n_region, unsigned current_path_no); + bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num); }; // Rather than search all the edges in jump thread paths each time DOM </cut>
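The header change above splits the registry into an abstract base with a private pure-virtual `update_cfg` hook and two concrete registries. A minimal standalone sketch of that shape follows. It assumes that the public `thread_through_all_blocks` in the base class delegates to `update_cfg` and then does the shared cleanup; this part of the diff does not show that body. All `*_sketch` names, the `int`-based path type, and the counting logic are hypothetical placeholders, not GCC code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a registered path; GCC uses
// vec<jump_thread_edge *> * instead.
using thread_path = std::vector<int>;

// Simplified analogue of jt_path_registry: one public entry point that
// defers to a private virtual hook implemented by each subclass.
class registry_sketch
{
public:
  virtual ~registry_sketch () = default;

  void register_jump_thread (thread_path p) { m_paths.push_back (std::move (p)); }

  // Shared driver (assumed behavior): run the subclass-specific update,
  // then release the registered paths.
  bool thread_through_all_blocks (bool peel_loop_headers)
  {
    if (m_paths.empty ())
      return false;
    bool changed = update_cfg (peel_loop_headers);
    m_paths.clear ();
    return changed;
  }

  unsigned long num_threaded_edges () const { return m_num_threaded_edges; }

protected:
  std::vector<thread_path> m_paths;
  unsigned long m_num_threaded_edges = 0;

private:
  virtual bool update_cfg (bool peel_loop_headers) = 0;
};

// Analogue of fwd_jt_path_registry: placeholder that "threads" every path.
class fwd_registry_sketch : public registry_sketch
{
private:
  bool update_cfg (bool) override
  {
    m_num_threaded_edges += m_paths.size ();
    return !m_paths.empty ();
  }
};

// Analogue of back_jt_path_registry: placeholder that only accepts paths
// with at least two edges (an arbitrary rule chosen for illustration).
class back_registry_sketch : public registry_sketch
{
private:
  bool update_cfg (bool) override
  {
    unsigned long before = m_num_threaded_edges;
    for (const thread_path &p : m_paths)
      if (p.size () >= 2)
        ++m_num_threaded_edges;
    return m_num_threaded_edges != before;
  }
};
```

Callers only ever see `register_jump_thread` and `thread_through_all_blocks`; which copier runs is decided by the concrete registry type they instantiate.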

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig - Build # 10 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig

Culprit:
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date:   Tue Aug 31 10:40:25 2021 +0800

    net: Add depends on OF_NET for LiteX's LiteETH

    Current settings may produce a build error when CONFIG_OF_NET is
    disabled. The CONFIG_OF_NET controls a headfile <linux/of.h> and some
    functions in <linux/of_net.h>.

    Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
    Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
</cut>

Results regressed to (for first_bad == c3496da580b0fc10fdeba8f6a5e6aef4c78b5598)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# First few build errors in logs:

from (for last_good == a9e7c3cedc2914f63cd135b75832b9bf850af782)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# linux boot successful:
boot

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
cd investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach a9e7c3cedc2914f63cd135b75832b9bf850af782
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…

Full commit (up to 1000 lines):
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date:   Tue Aug 31 10:40:25 2021 +0800

    net: Add depends on OF_NET for LiteX's LiteETH

    Current settings may produce a build error when CONFIG_OF_NET is
    disabled. The CONFIG_OF_NET controls a headfile <linux/of.h> and some
    functions in <linux/of_net.h>.

    Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
    Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
 drivers/net/ethernet/litex/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/litex/Kconfig b/drivers/net/ethernet/litex/Kconfig
index 265dba414b41..63bf01d28f0c 100644
--- a/drivers/net/ethernet/litex/Kconfig
+++ b/drivers/net/ethernet/litex/Kconfig
@@ -17,6 +17,7 @@ if NET_VENDOR_LITEX
 
 config LITEX_LITEETH
 	tristate "LiteX Ethernet support"
+	depends on OF_NET
 	help
 	  If you wish to compile a kernel for hardware with a LiteX LiteEth
 	  device then you should answer Y to this.
</cut>

4 years, 9 months


[TCWG CI] Regression caused by llvm:09507b53250dc266632c204558cb1c2b56e8ddea

by ci_notify＠linaro.org

Identified regression caused by *llvm:09507b53250dc266632c204558cb1c2b56e8ddea*:

commit 09507b53250dc266632c204558cb1c2b56e8ddea
Author: Cullen Rhodes <cullen.rhodes(a)arm.com>

    [AArch64][SME] Disable NEON in streaming mode

Results regressed to (for first_bad == 09507b53250dc266632c204558cb1c2b56e8ddea)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-09507b53250dc266632c204558cb1c2b56e8ddea/results_id:
1
# 456.hmmer,hmmer_base.default regressed by 103

from (for last_good == 93c55d5ea24b8f455b0621bac373f142e0008739)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-93c55d5ea24b8f455b0621bac373f142e0008739/results_id:
1

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-09507b53250dc266632c204558cb1c2b56e8ddea
cd investigate-llvm-09507b53250dc266632c204558cb1c2b56e8ddea

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 09507b53250dc266632c204558cb1c2b56e8ddea
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 93c55d5ea24b8f455b0621bac373f142e0008739
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 09507b53250dc266632c204558cb1c2b56e8ddea
Author: Cullen Rhodes <cullen.rhodes(a)arm.com>
Date:   Mon Aug 16 07:31:55 2021 +0000

    [AArch64][SME] Disable NEON in streaming mode

    In streaming mode most of the NEON instruction set is illegal, disable
    NEON when compiling with `+streaming-sve`, unless NEON is explictly
    requested.

    Subsequent patches will add support for the small subset of NEON
    instructions that are legal in streaming mode.

    Reviewed By: paulwalker-arm, david-arm

    Differential Revision: https://reviews.llvm.org/D107902
---
 llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp | 11 ++++++++++-
 llvm/test/MC/AArch64/SME/streaming-sve-feature.s             |  8 ++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp
index 3c2df1621e11..987cabce6cc9 100644
--- a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp
+++ b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp
@@ -57,7 +57,16 @@ createAArch64MCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {
       CPU = "apple-a12";
   }
 
-  return createAArch64MCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS);
+  // Most of the NEON instruction set isn't supported in streaming mode on SME
+  // targets, disable NEON unless explicitly requested.
+  bool RequestedNEON = FS.contains("neon");
+  bool RequestedStreamingSVE = FS.contains("streaming-sve");
+  MCSubtargetInfo *STI =
+      createAArch64MCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS);
+  if (RequestedStreamingSVE && !RequestedNEON &&
+      STI->hasFeature(AArch64::FeatureNEON))
+    STI->ToggleFeature(AArch64::FeatureNEON);
+  return STI;
 }
 
 void AArch64_MC::initLLVMToCVRegMapping(MCRegisterInfo *MRI) {
diff --git a/llvm/test/MC/AArch64/SME/streaming-sve-feature.s b/llvm/test/MC/AArch64/SME/streaming-sve-feature.s
new file mode 100644
index 000000000000..e35505ca39c5
--- /dev/null
+++ b/llvm/test/MC/AArch64/SME/streaming-sve-feature.s
@@ -0,0 +1,8 @@
+// RUN: llvm-mc -triple=aarch64 -mattr=+streaming-sve,+neon < %s 2>&1 | FileCheck %s
+// RUN: not llvm-mc -triple=aarch64 -mattr=+streaming-sve < %s 2>&1 | FileCheck %s --check-prefix=CHECK-ERROR
+
+// Verify NEON is disabled when targeting streaming mode, if it's not
+// explicitly requested.
+add v0.8b, v1.8b, v2.8b
+// CHECK: add v0.8b, v1.8b, v2.8b
+// CHECK-ERROR: error: instruction requires: neon
</cut>
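The feature-selection rule in the commit above can be modelled outside LLVM as a small standalone sketch. It is not LLVM code: `subtarget_sketch`, `contains`, and `make_subtarget` are hypothetical names, the `+neon`/`-neon` handling is a crude stand-in for what `createAArch64MCSubtargetInfoImpl` derives from the CPU and feature string, and the default of NEON being on for the CPU is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for MCSubtargetInfo: tracks NEON only.
struct subtarget_sketch
{
  bool has_neon;
};

// Substring test standing in for StringRef::contains.
static bool contains (const std::string &haystack, const std::string &needle)
{
  return haystack.find (needle) != std::string::npos;
}

// Mirrors the shape of the patched function: build the subtarget from the
// feature string first, then switch NEON off if streaming SVE was requested
// and the string never mentions NEON.
subtarget_sketch make_subtarget (const std::string &features,
                                 bool cpu_default_neon = true)
{
  bool requested_neon = contains (features, "neon");
  bool requested_streaming_sve = contains (features, "streaming-sve");

  // Stand-in for the real subtarget construction (assumed behavior).
  subtarget_sketch sti{cpu_default_neon};
  if (contains (features, "+neon"))
    sti.has_neon = true;
  if (contains (features, "-neon"))
    sti.has_neon = false;

  // The rule added by the commit: only toggle when NEON is currently on.
  if (requested_streaming_sve && !requested_neon && sti.has_neon)
    sti.has_neon = false;
  return sti;
}
```

Because the check is a plain substring match, a feature string containing `-neon` also counts as "mentions NEON" in this sketch; in that case NEON is already off after construction, so the toggle is never reached.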

4 years, 9 months


[TCWG CI] Regression caused by llvm:4389a413e2129d7d55ee779638b649aa852b6f8a

by ci_notify＠linaro.org

Identified regression caused by *llvm:4389a413e2129d7d55ee779638b649aa852b6f8a*: commit 4389a413e2129d7d55ee779638b649aa852b6f8a Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com> Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on" Results regressed to (for first_bad == 4389a413e2129d7d55ee779638b649aa852b6f8a) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -Oz_LTO artifacts/build-4389a413e2129d7d55ee779638b649aa852b6f8a/results_id: 1 # 470.lbm,lbm_base.default regressed by 104 from (for last_good == dfce2909ee1ea1523ec27b834a0e56429e9c2beb) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -Oz_LTO artifacts/build-dfce2909ee1ea1523ec27b834a0e56429e9c2beb/results_id: 1 This commit has regressed these CI configurations: - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-4389a413e2129d7d55ee779638b649aa852b6f8a cd investigate-llvm-4389a413e2129d7d55ee779638b649aa852b6f8a # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 4389a413e2129d7d55ee779638b649aa852b6f8a ../artifacts/test.sh # Reproduce last_good build git checkout --detach dfce2909ee1ea1523ec27b834a0e56429e9c2beb ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 4389a413e2129d7d55ee779638b649aa852b6f8a Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com> Date: Fri Aug 6 12:01:47 2021 -0700 Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on" This reverts commit 48ad446a0fb2c9b98cb7047e4daf8a84c29cef8f. --- clang/docs/UsersManual.rst | 48 ++----------------------- clang/lib/Driver/ToolChains/Clang.cpp | 33 ++++++++--------- clang/test/CodeGen/ffp-contract-option.c | 47 +++--------------------- clang/test/CodeGen/ppc-emmintrin.c | 4 +-- clang/test/CodeGen/ppc-xmmintrin.c | 4 +-- clang/test/Driver/fp-model.c | 61 +++++++++++++++----------------- 6 files changed, 58 insertions(+), 139 deletions(-) diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst index 838669794ea8..980d0ab45975 100644 --- a/clang/docs/UsersManual.rst +++ b/clang/docs/UsersManual.rst @@ -1260,50 +1260,8 @@ installed. 
Controlling Floating Point Behavior ----------------------------------- -Clang provides a number of ways to control floating point behavior, including -with command line options and source pragmas. This section -describes the various floating point semantic modes and the corresponding options. - -.. csv-table:: Floating Point Semantic Modes - :header: "Mode", "Values" - :widths: 15, 30, 30 - - "except_behavior", "{ignore, strict, may_trap}", "ffp-exception-behavior" - "fenv_access", "{off, on}", "(none)" - "rounding_mode", "{dynamic, tonearest, downward, upward, towardzero}", "frounding-math" - "contract", "{on, off, fast}", "ffp-contract" - "denormal_fp_math", "{IEEE, PreserveSign, PositiveZero}", "fdenormal-fp-math" - "denormal_fp32_math", "{IEEE, PreserveSign, PositiveZero}", "fdenormal-fp-math-fp32" - "support_math_errno", "{on, off}", "fmath-errno" - "no_honor_nans", "{on, off}", "fhonor-nans" - "no_honor_infinities", "{on, off}", "fhonor-infinities" - "no_signed_zeros", "{on, off}", "fsigned-zeros" - "allow_reciprocal", "{on, off}", "freciprocal-math" - "allow_approximate_fns", "{on, off}", "(none)" - "allow_reassociation", "{on, off}", "fassociative-math" - - -This table describes the option settings that correspond to the three -floating point semantic models: precise (the default), strict, and fast. - - -.. 
csv-table:: Floating Point Models - :header: "Mode", "Precise", "Strict", "Fast" - :widths: 25, 15, 15, 15 - - "except_behavior", "ignore", "strict", "ignore" - "fenv_access", "off", "on", "off" - "rounding_mode", "tonearest", "dynamic", "tonearest" - "contract", "on", "off", "fast" - "denormal_fp_math", "IEEE", "IEEE", "PreserveSign" - "denormal_fp32_math", "IEEE","IEEE", "PreserveSign" - "support_math_errno", "on", "on", "off" - "no_honor_nans", "off", "off", "on" - "no_honor_infinities", "off", "off", "on" - "no_signed_zeros", "off", "off", "on" - "allow_reciprocal", "off", "off", "on" - "allow_approximate_fns", "off", "off", "on" - "allow_reassociation", "off", "off", "on" +Clang provides a number of ways to control floating point behavior. The options +are listed below. .. option:: -ffast-math @@ -1498,7 +1456,7 @@ Note that floating-point operations performed as part of constant initialization and ``fast``. Details: - * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=on``). This is the default behavior. + * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=fast``). This is the default behavior. * ``strict`` Enables ``-frounding-math`` and ``-ffp-exception-behavior=strict``, and disables contractions (FMA). All of the ``-ffast-math`` enablements are disabled. Enables ``STDC FENV_ACCESS``: by default ``FENV_ACCESS`` is disabled. This option setting behaves as though ``#pragma STDC FENV_ACESS ON`` appeared at the top of the source file. 
* ``fast`` Behaves identically to specifying both ``-ffast-math`` and ``ffp-contract=fast`` diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 1c79640be80f..96bbc0250126 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -2641,7 +2641,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, llvm::DenormalMode DenormalFPMath = DefaultDenormalFPMath; llvm::DenormalMode DenormalFP32Math = DefaultDenormalFP32Math; - StringRef FPContract = "on"; + StringRef FPContract = ""; bool StrictFPModel = false; @@ -2666,7 +2666,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, ReciprocalMath = false; SignedZeros = true; // -fno_fast_math restores default denormal and fpcontract handling - FPContract = "on"; + FPContract = ""; DenormalFPMath = llvm::DenormalMode::getIEEE(); // FIXME: The target may have picked a non-IEEE default mode here based on @@ -2686,18 +2686,20 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, // ffp-model= is a Driver option, it is entirely rewritten into more // granular options before being passed into cc1. // Use the gcc option in the switch below. 
- if (!FPModel.empty() && !FPModel.equals(Val)) + if (!FPModel.empty() && !FPModel.equals(Val)) { D.Diag(clang::diag::warn_drv_overriding_flag_option) << Args.MakeArgString("-ffp-model=" + FPModel) << Args.MakeArgString("-ffp-model=" + Val); + FPContract = ""; + } if (Val.equals("fast")) { optID = options::OPT_ffast_math; FPModel = Val; - FPContract = Val; + FPContract = "fast"; } else if (Val.equals("precise")) { optID = options::OPT_ffp_contract; FPModel = Val; - FPContract = "on"; + FPContract = "fast"; PreciseFPModel = true; } else if (Val.equals("strict")) { StrictFPModel = true; @@ -2783,11 +2785,9 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, case options::OPT_ffp_contract: { StringRef Val = A->getValue(); if (PreciseFPModel) { - // When -ffp-model=precise is seen on the command line, - // the boolean PreciseFPModel is set to true which indicates - // "the current option is actually PreciseFPModel". The optID - // is changed to OPT_ffp_contract and FPContract is set to "on". - // the argument Val string is "precise": it shouldn't be checked. + // -ffp-model=precise enables ffp-contract=fast as a side effect + // the FPContract value has already been set to a string literal + // and the Val string isn't a pertinent value. ; } else if (Val.equals("fast") || Val.equals("on") || Val.equals("off")) FPContract = Val; @@ -2897,17 +2897,18 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, // -fno_fast_math restores default denormal and fpcontract handling DenormalFPMath = DefaultDenormalFPMath; DenormalFP32Math = llvm::DenormalMode::getIEEE(); - FPContract = "on"; + FPContract = ""; break; } if (StrictFPModel) { // If -ffp-model=strict has been specified on command line but // subsequent options conflict then emit warning diagnostic. 
- if (HonorINFs && HonorNaNs && !AssociativeMath && !ReciprocalMath && - SignedZeros && TrappingMath && RoundingFPMath && - DenormalFPMath == llvm::DenormalMode::getIEEE() && - DenormalFP32Math == llvm::DenormalMode::getIEEE() && - FPContract.equals("off")) + if (HonorINFs && HonorNaNs && + !AssociativeMath && !ReciprocalMath && + SignedZeros && TrappingMath && RoundingFPMath && + (FPContract.equals("off") || FPContract.empty()) && + DenormalFPMath == llvm::DenormalMode::getIEEE() && + DenormalFP32Math == llvm::DenormalMode::getIEEE()) // OK: Current Arg doesn't conflict with -ffp-model=strict ; else { diff --git a/clang/test/CodeGen/ffp-contract-option.c b/clang/test/CodeGen/ffp-contract-option.c index efc72c2b5461..52b750795940 100644 --- a/clang/test/CodeGen/ffp-contract-option.c +++ b/clang/test/CodeGen/ffp-contract-option.c @@ -1,46 +1,9 @@ -// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck --check-prefix=CHECK-FMADD %s +// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck %s // REQUIRES: aarch64-registered-target float fma_test1(float a, float b, float c) { -// CHECK-FMADD: fmadd - float x = a * b; - float y = x + c; - return y; -} - -// RUN: %clang_cc1 -triple=x86_64 %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULT %s -// -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=off %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULT %s -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=on %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-ON %s -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=fast %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-CONTRACTFAST %s -// -// RUN: %clang_cc1 -triple=x86_64 -ffast-math %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULTFAST %s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=off %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULTFAST 
%s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=on %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-ONFAST %s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=fast %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-FASTFAST %s -float mymuladd( float x, float y, float z ) { - return x * y + z; - // CHECK-DEFAULT: = fmul float - // CHECK-DEFAULT: = fadd float - - // CHECK-ON: = call float @llvm.fmuladd.f32 - - // CHECK-CONTRACTFAST: = fmul contract float - // CHECK-CONTRACTFAST: = fadd contract float - - // CHECK-DEFAULTFAST: = fmul reassoc nnan ninf nsz arcp afn float - // CHECK-DEFAULTFAST: = fadd reassoc nnan ninf nsz arcp afn float - - // CHECK-ONFAST: = call reassoc nnan ninf nsz arcp afn float @llvm.fmuladd.f32 - - // CHECK-FASTFAST: = fmul fast float - // CHECK-FASTFAST: = fadd fast float +// CHECK: fmadd + float x = a * b; + float y = x + c; + return y; } diff --git a/clang/test/CodeGen/ppc-emmintrin.c b/clang/test/CodeGen/ppc-emmintrin.c index 4a246ff92d76..fa3801f50a01 100644 --- a/clang/test/CodeGen/ppc-emmintrin.c +++ b/clang/test/CodeGen/ppc-emmintrin.c @@ -2,9 +2,9 @@ // REQUIRES: powerpc-registered-target // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE // 
CHECK-BE-DAG: @_mm_movemask_pd.perm_mask = internal constant <4 x i32> <i32 -2139062144, i32 -2139062144, i32 -2139062144, i32 -2139078656>, align 16 // CHECK-BE-DAG: @_mm_shuffle_epi32.permute_selectors = internal constant [4 x i32] [i32 66051, i32 67438087, i32 134810123, i32 202182159], align 4 diff --git a/clang/test/CodeGen/ppc-xmmintrin.c b/clang/test/CodeGen/ppc-xmmintrin.c index a7f6ed6e0e67..d3f18bfbb1e5 100644 --- a/clang/test/CodeGen/ppc-xmmintrin.c +++ b/clang/test/CodeGen/ppc-xmmintrin.c @@ -2,11 +2,11 @@ // REQUIRES: powerpc-registered-target // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE // RUN: %clang -x c++ -fsyntax-only -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ // RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE // RUN: %clang -x c++ -fsyntax-only -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ // RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns diff --git a/clang/test/Driver/fp-model.c b/clang/test/Driver/fp-model.c index c6d683e25c0b..5fa9d110dd83 100644 --- a/clang/test/Driver/fp-model.c +++ b/clang/test/Driver/fp-model.c @@ -1,90 +1,88 @@ // Test that incompatible combinations of 
-ffp-model= options // and other floating point options get a warning diagnostic. +// +// REQUIRES: clang-driver -// RUN: %clang -target x86_64 -### -ffp-model=fast -ffp-contract=off -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=fast -ffp-contract=off -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN %s // WARN: warning: overriding '-ffp-model=fast' option with '-ffp-contract=off' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=fast -ffp-contract=on -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=fast -ffp-contract=on -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN1 %s // WARN1: warning: overriding '-ffp-model=fast' option with '-ffp-contract=on' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fassociative-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fassociative-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN2 %s // WARN2: warning: overriding '-ffp-model=strict' option with '-fassociative-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffast-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffast-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN3 %s // WARN3: warning: overriding '-ffp-model=strict' option with '-ffast-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffinite-math-only -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffinite-math-only -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN4 %s // WARN4: warning: overriding '-ffp-model=strict' option with '-ffinite-math-only' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN5 %s // WARN5: warning: overriding '-ffp-model=strict' option with '-ffp-contract=fast' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=fast -c %s 
2>&1 \ -// RUN: | FileCheck --check-prefix=WARN6 %s -// WARN6: warning: overriding '-ffp-model=strict' option with '-ffp-contract=fast' [-Woverriding-t-option] - -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=on -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffp-contract=on -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN7 %s // WARN7: warning: overriding '-ffp-model=strict' option with '-ffp-contract=on' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-honor-infinities -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-honor-infinities -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN8 %s // WARN8: warning: overriding '-ffp-model=strict' option with '-fno-honor-infinities' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-honor-nans -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-honor-nans -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN9 %s // WARN9: warning: overriding '-ffp-model=strict' option with '-fno-honor-nans' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-rounding-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-rounding-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNa %s // WARNa: warning: overriding '-ffp-model=strict' option with '-fno-rounding-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-signed-zeros -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-signed-zeros -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNb %s // WARNb: warning: overriding '-ffp-model=strict' option with '-fno-signed-zeros' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-trapping-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-trapping-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNc %s // WARNc: warning: overriding '-ffp-model=strict' option with '-fno-trapping-math' 
[-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -freciprocal-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -freciprocal-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNd %s // WARNd: warning: overriding '-ffp-model=strict' option with '-freciprocal-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -funsafe-math-optimizations -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -funsafe-math-optimizations -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNe %s // WARNe: warning: overriding '-ffp-model=strict' option with '-funsafe-math-optimizations' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -Ofast -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -Ofast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNf %s // WARNf: warning: overriding '-ffp-model=strict' option with '-Ofast' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fdenormal-fp-math=preserve-sign,preserve-sign -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fdenormal-fp-math=preserve-sign,preserve-sign -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN10 %s // WARN10: warning: overriding '-ffp-model=strict' option with '-fdenormal-fp-math=preserve-sign,preserve-sign' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -c %s 2>&1 \ +// RUN: %clang -### -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-NOROUND %s // CHECK-NOROUND: "-cc1" // CHECK-NOROUND: "-fno-rounding-math" -// RUN: %clang -target x86_64 -### -frounding-math -c %s 2>&1 \ +// RUN: %clang -### -frounding-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-ROUND --implicit-check-not ffp-exception-behavior=strict %s // CHECK-ROUND: "-cc1" // CHECK-ROUND: "-frounding-math" -// RUN: %clang -target x86_64 -### -ftrapping-math -c %s 2>&1 \ +// RUN: %clang -### -ftrapping-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-TRAP %s // CHECK-TRAP: "-cc1" // 
CHECK-TRAP: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=fast -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=fast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-FAST %s // CHECK-FPM-FAST: "-cc1" // CHECK-FPM-FAST: "-menable-no-infs" @@ -98,35 +96,34 @@ // CHECK-FPM-FAST: "-ffast-math" // CHECK-FPM-FAST: "-ffinite-math-only" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=precise -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=precise -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-PRECISE %s // CHECK-FPM-PRECISE: "-cc1" -// CHECK-FPM-PRECISE: "-ffp-contract=on" +// CHECK-FPM-PRECISE: "-ffp-contract=fast" // CHECK-FPM-PRECISE: "-fno-rounding-math" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=strict -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=strict -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-STRICT %s // CHECK-FPM-STRICT: "-cc1" -// CHECK-FPM-STRICT: "-fmath-errno" -// CHECK-FPM-STRICT: "-ffp-contract=off" // CHECK-FPM-STRICT: "-frounding-math" // CHECK-FPM-STRICT: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=strict -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=strict -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FEB-STRICT %s // CHECK-FEB-STRICT: "-cc1" // CHECK-FEB-STRICT: "-fno-rounding-math" // CHECK-FEB-STRICT: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=maytrap -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=maytrap -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FEB-MAYTRAP %s // CHECK-FEB-MAYTRAP: "-cc1" // CHECK-FEB-MAYTRAP: "-fno-rounding-math" // CHECK-FEB-MAYTRAP: "-ffp-exception-behavior=maytrap" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=ignore -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=ignore -c %s 2>&1 \ 
// RUN: | FileCheck --check-prefix=CHECK-FEB-IGNORE %s // CHECK-FEB-IGNORE: "-cc1" // CHECK-FEB-IGNORE: "-fno-rounding-math" // CHECK-FEB-IGNORE: "-ffp-exception-behavior=ignore" + </cut>

4 years, 9 months


[TCWG CI] Regression caused by gcc:01b5038718056b024b370b74a874fbd92c5bbab3

by ci_notify＠linaro.org

Identified regression caused by *gcc:01b5038718056b024b370b74a874fbd92c5bbab3*:
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>

Disable threading through latches until after loop optimizations.

Results regressed to (for first_bad == 01b5038718056b024b370b74a874fbd92c5bbab3)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -Os artifacts/build-01b5038718056b024b370b74a874fbd92c5bbab3/results_id: 1
# 459.GemsFDTD,GemsFDTD_base.default regressed by 102
# 464.h264ref,h264ref_base.default regressed by 102

from (for last_good == fb88bf9931f17d137eb50c001e1c924aa1e34e83)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -Os artifacts/build-fb88bf9931f17d137eb50c001e1c924aa1e34e83/results_id: 1

This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
cd investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 01b5038718056b024b370b74a874fbd92c5bbab3
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach fb88bf9931f17d137eb50c001e1c924aa1e34e83
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 9 20:30:28 2021 +0200

Disable threading through latches until after loop optimizations.

The motivation for this patch was enabling the use of global ranges
in the path solver, but this caused certain properties of loops being
destroyed which made subsequent loop optimizations to fail.
Consequently, this patch's mail goal is to disable jump threading
involving the latch until after loop optimizations have run.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12] to the late threaders
thread[34]). I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite. They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.
I have added a reminder to the function to keep this in mind. Finally, perhaps as a follow-up, we should apply the same restrictions to the forward threader. At some point I'd like to combine the cost models. Tested on x86-64 Linux. p.s. There is a thorough discussion involving the limitations of jump threading involving loops here: https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html gcc/ChangeLog: * tree-pass.h (PROP_loop_opts_done): New. * gimple-range-path.cc (path_range_query::internal_range_of_expr): Intersect with global range. * tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done. * tree-ssa-threadbackward.c (back_threader_profitability::profitable_path_p): Disable threading through latches until after loop optimizations have run. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of threading through latches. * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same. Co-authored-by: Michael Matz <matz(a)suse.de> --- gcc/gimple-range-path.cc | 3 ++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c | 4 +-- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 37 ++--------------------- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 17 +---------- gcc/tree-pass.h | 2 ++ gcc/tree-ssa-loop.c | 2 +- gcc/tree-ssa-threadbackward.c | 28 +++++++++++++++-- 7 files changed, 37 insertions(+), 56 deletions(-) diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc index a4fa3b296ff..c616b65756f 100644 --- a/gcc/gimple-range-path.cc +++ b/gcc/gimple-range-path.cc @@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt) basic_block bb = stmt ? 
gimple_bb (stmt) : exit_bb (); if (stmt && range_defined_in_block (r, name, bb)) { + if (TREE_CODE (name) == SSA_NAME) + r.intersect (gimple_range_global (name)); + set_cache (r, name); return true; } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c index e1c33e86cd7..823ada982ff 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */ +/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */ void foo(); void bla(); @@ -26,4 +26,4 @@ void thread_latch_through_header (void) case. And we want to thread through the header as well. These are both caught by threading in DOM. */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */ -/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */ +/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c index c7bf867b084..ee46759bacc 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c @@ -1,41 +1,8 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */ +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */ -/* All the threads in the thread1 dump start on a X->BB12 edge, as can - be seen in the dump: - - Registering FSM jump thread: (x, 12) incoming edge; ... 
- etc - etc - - Before the new evrp, we were threading paths that started at the - following edges: - - Registering FSM jump thread: (10, 12) incoming edge - Registering FSM jump thread: (6, 12) incoming edge - Registering FSM jump thread: (9, 12) incoming edge - - This was because the PHI at BB12 had constant values coming in from - BB10, BB6, and BB9: - - # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)> - - Now with the new evrp, we get: - - # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)> - - Thus, we have 3 more paths that are known to be constant and can be - threaded. Which means that by the second threading pass, we can - only find one profitable path. - - For the record, all these extra constants are better paths coming - out of switches. For example: - - SWITCH_BB -> BBx -> BBy -> BBz -> PHI - - We now know the value of the switch index at PHI. */ /* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */ -/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */ +/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */ int sum0, sum1, sum2, sum3; int foo (char *s, char **ret) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c index 5fc2145a432..ba07942f9dd 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c @@ -1,23 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */ -/* Here we have the same issue as was commented in ssa-dom-thread-6.c. - The PHI coming into the threader has a lot more constants, so the - threader can thread more paths. 
- -$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2 -252c252 -< # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)> ---- -> # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)> -272a273 - - I spot checked a few and they all have the same pattern. We are - basically tracking the switch index better through multiple - paths. */ - /* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */ /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 83941bc0cee..eb75eb17951 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -225,6 +225,8 @@ protected: been optimized. */ #define PROP_gimple_lomp_dev (1 << 16) /* done omp_device_lower */ #define PROP_rtl_split_insns (1 << 17) /* RTL has insns split. */ +#define PROP_loop_opts_done (1 << 18) /* SSA loop optimizations + have completed. 
*/ #define PROP_gimple \ (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp) diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c index 0cc4b3bbccf..1bbf2f1fb2c 100644 --- a/gcc/tree-ssa-loop.c +++ b/gcc/tree-ssa-loop.c @@ -540,7 +540,7 @@ const pass_data pass_data_tree_loop_done = OPTGROUP_LOOP, /* optinfo_flags */ TV_NONE, /* tv_id */ PROP_cfg, /* properties_required */ - 0, /* properties_provided */ + PROP_loop_opts_done, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ TODO_cleanup_cfg, /* todo_flags_finish */ diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c index 449232c7715..e72992328de 100644 --- a/gcc/tree-ssa-threadbackward.c +++ b/gcc/tree-ssa-threadbackward.c @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see #include "ssa.h" #include "tree-cfgcleanup.h" #include "tree-pretty-print.h" +#include "cfghooks.h" // Path registry for the backwards threader. After all paths have been // registered with register_path(), thread_through_all_blocks() is called @@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers) TAKEN_EDGE, otherwise it is NULL. CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path - would create an irreducible loop. */ + would create an irreducible loop. + + ?? It seems we should be able to loosen some of the restrictions in + this function after loop optimizations have run. */ bool back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path, @@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path, the last entry in the array when determining if we thread through the loop latch. 
*/ if (loop->latch == bb) - threaded_through_latch = true; + { + threaded_through_latch = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " (latch)"); + } } gimple *stmt = get_gimple_control_stmt (m_path[0]); @@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path, "a multiway branch.\n"); return false; } + + /* Threading through an empty latch would cause code to be added to + the latch. This could alter the loop form sufficiently to cause + loop optimizations to fail. Disable these threads until after + loop optimizations have run. */ + if ((threaded_through_latch + || (taken_edge && taken_edge->dest == loop->latch)) + && !(cfun->curr_properties & PROP_loop_opts_done) + && empty_block_p (loop->latch)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + " FAIL: FSM Thread through latch before loop opts would create non-empty latch\n"); + return false; + + } return true; } </cut>
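
For readers skimming the commit above, the new rejection test added to profitable_path_p can be modelled in a few lines. This is only a rough sketch of the condition as it appears in the diff, not GCC code: the function name and its boolean parameters are hypothetical stand-ins for threaded_through_latch, the taken-edge check and empty_block_p (loop->latch), and only the bit position of PROP_loop_opts_done is taken from the tree-pass.h hunk.

```python
# Same bit position as the PROP_loop_opts_done define added in tree-pass.h.
PROP_LOOP_OPTS_DONE = 1 << 18


def rejects_latch_thread(threaded_through_latch, taken_edge_to_latch,
                         latch_is_empty, curr_properties):
    """Hypothetical model of the new check in profitable_path_p.

    Returns True when the candidate thread would be refused because it
    involves an empty loop latch and loop optimizations have not run yet.
    """
    touches_latch = threaded_through_latch or taken_edge_to_latch
    loop_opts_done = bool(curr_properties & PROP_LOOP_OPTS_DONE)
    return bool(touches_latch and not loop_opts_done and latch_is_empty)


if __name__ == "__main__":
    # Early threaders (ethread, thread1, thread2): property bit not yet set.
    print(rejects_latch_thread(True, False, True, 0))
    # Late threaders (thread3, thread4): the loop-done pass has set the bit.
    print(rejects_latch_thread(True, False, True, PROP_LOOP_OPTS_DONE))
```

Under these assumptions the same path is refused by the early threaders and accepted by the late ones, which matches the testsuite adjustments that move the expected threads from thread1/thread2 to thread3.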

4 years, 9 months


[TCWG CI] Regression caused by llvm:50f4ae58eb136bc9d802cb98f02b6ff237eb61e0

by ci_notify＠linaro.org

Identified regression caused by *llvm:50f4ae58eb136bc9d802cb98f02b6ff237eb61e0*:
commit 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
Author: David Green <david.green(a)arm.com>

[AArch64] Correct store ReadAdrBase operand

Results regressed to (for first_bad == 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-50f4ae58eb136bc9d802cb98f02b6ff237eb61e0/results_id: 1
# 447.dealII,[.] _ZNK9MappingQ1ILi3EE12compute_fillERK12TriaIte regressed by 114

from (for last_good == 955c9437fd605216445fbd608de4ef1d96f825e9)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-955c9437fd605216445fbd608de4ef1d96f825e9/results_id: 1

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
cd investigate-llvm-50f4ae58eb136bc9d802cb98f02b6ff237eb61e0

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 955c9437fd605216445fbd608de4ef1d96f825e9
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
Author: David Green <david.green(a)arm.com>
Date: Mon Aug 23 21:07:55 2021 +0100

[AArch64] Correct store ReadAdrBase operand

It appears that the Read operand for stores was being placed on the
first operand (the stored value) not the address base. This adds a
ReadST for the stored value operand, allowing the ReadAdrBase to
correctly act upon the address.
Differential Revision: https://reviews.llvm.org/D108287 --- llvm/lib/Target/AArch64/AArch64InstrFormats.td | 20 +- llvm/lib/Target/AArch64/AArch64SchedA53.td | 1 + llvm/lib/Target/AArch64/AArch64SchedA55.td | 1 + llvm/lib/Target/AArch64/AArch64SchedA57.td | 1 + llvm/lib/Target/AArch64/AArch64SchedA64FX.td | 1 + llvm/lib/Target/AArch64/AArch64SchedCyclone.td | 1 + llvm/lib/Target/AArch64/AArch64SchedExynosM3.td | 1 + llvm/lib/Target/AArch64/AArch64SchedExynosM4.td | 1 + llvm/lib/Target/AArch64/AArch64SchedExynosM5.td | 1 + llvm/lib/Target/AArch64/AArch64SchedFalkor.td | 1 + llvm/lib/Target/AArch64/AArch64SchedKryo.td | 1 + llvm/lib/Target/AArch64/AArch64SchedTSV110.td | 1 + llvm/lib/Target/AArch64/AArch64SchedThunderX.td | 1 + .../lib/Target/AArch64/AArch64SchedThunderX2T99.td | 1 + .../Target/AArch64/AArch64SchedThunderX3T110.td | 1 + llvm/lib/Target/AArch64/AArch64Schedule.td | 1 + .../llvm-mca/AArch64/Cortex/A55-store-readadv.s | 246 ++++++++++----------- 17 files changed, 148 insertions(+), 133 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td index 10c6fcd5cacd..ea0c62d2045f 100644 --- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td +++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td @@ -3482,7 +3482,7 @@ multiclass Store8RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Windexed8 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend8:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3492,7 +3492,7 @@ multiclass Store8RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Xindexed8 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend8:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } @@ -3554,7 +3554,7 @@ multiclass Store16RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Windexed16 
GPR64sp:$Rn, GPR32:$Rm, ro_Wextend16:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3564,7 +3564,7 @@ multiclass Store16RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Xindexed16 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend16:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } @@ -3626,7 +3626,7 @@ multiclass Store32RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3636,7 +3636,7 @@ multiclass Store32RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } @@ -3698,7 +3698,7 @@ multiclass Store64RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3708,7 +3708,7 @@ multiclass Store64RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } @@ -3768,7 +3768,7 @@ multiclass Store128RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, def roW : LoadStore128RO<sz, V, opc, regtype, asm, (outs), (ins regtype:$Rt, GPR64sp:$Rn, GPR32:$Rm, ro_Wextend128:$extend), []>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3776,7 +3776,7 @@ multiclass Store128RO<bits<2> sz, 
bit V, bits<2> opc, DAGOperand regtype, def roX : LoadStore128RO<sz, V, opc, regtype, asm, (outs), (ins regtype:$Rt, GPR64sp:$Rn, GPR64:$Rm, ro_Xextend128:$extend), []>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } diff --git a/llvm/lib/Target/AArch64/AArch64SchedA53.td b/llvm/lib/Target/AArch64/AArch64SchedA53.td index 65c84b1f39c0..3fef369a4e2b 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedA53.td +++ b/llvm/lib/Target/AArch64/AArch64SchedA53.td @@ -149,6 +149,7 @@ def A53WriteFSqrtDP : SchedWriteRes<[A53UnitFPMDS]> { let Latency = 32; // No forwarding for these reads. def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; +def : ReadAdvance<ReadST, 0>; def : ReadAdvance<ReadVLD, 0>; // ALU - Most operands in the ALU pipes are not needed for two cycles. Shiftable diff --git a/llvm/lib/Target/AArch64/AArch64SchedA55.td b/llvm/lib/Target/AArch64/AArch64SchedA55.td index 0e680078c348..34d6fb5fb306 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedA55.td +++ b/llvm/lib/Target/AArch64/AArch64SchedA55.td @@ -182,6 +182,7 @@ def CortexA55WriteFSqrtDP : SchedWriteRes<[CortexA55UnitFPDIV]> { let Latency = def : ReadAdvance<ReadVLD, 0>; def : ReadAdvance<ReadExtrHi, 1>; def : ReadAdvance<ReadAdrBase, 1>; +def : ReadAdvance<ReadST, 1>; // ALU - ALU input operands are generally needed in EX1. 
An operand produced in // in say EX2 can be forwarded for consumption to ALU in EX1, thereby diff --git a/llvm/lib/Target/AArch64/AArch64SchedA57.td b/llvm/lib/Target/AArch64/AArch64SchedA57.td index c1eacca8cc1f..c9addac18ba7 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedA57.td +++ b/llvm/lib/Target/AArch64/AArch64SchedA57.td @@ -116,6 +116,7 @@ def : ReadAdvance<ReadIM, 0>; def : ReadAdvance<ReadIMA, 2, [WriteIM32, WriteIM64]>; def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; +def : ReadAdvance<ReadST, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; diff --git a/llvm/lib/Target/AArch64/AArch64SchedA64FX.td b/llvm/lib/Target/AArch64/AArch64SchedA64FX.td index 6df800487ce2..dc551364ed6f 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedA64FX.td +++ b/llvm/lib/Target/AArch64/AArch64SchedA64FX.td @@ -761,6 +761,7 @@ def : ReadAdvance<ReadIMA, 0>; def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; +def : ReadAdvance<ReadST, 0>; def : ReadAdvance<ReadVLD, 0>; //===----------------------------------------------------------------------===// diff --git a/llvm/lib/Target/AArch64/AArch64SchedCyclone.td b/llvm/lib/Target/AArch64/AArch64SchedCyclone.td index 11df304a974c..310c240966f9 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedCyclone.td +++ b/llvm/lib/Target/AArch64/AArch64SchedCyclone.td @@ -258,6 +258,7 @@ def CyReadAdrBase : SchedReadVariant<[ SchedVar<ScaledIdxPred, [ReadBaseRS]>, // Read base reg after shifting offset. SchedVar<NoSchedPred, [ReadDefault]>]>; // Read base reg with no shift. def : SchedAlias<ReadAdrBase, CyReadAdrBase>; // Map AArch64->Cyclone type. +def : ReadAdvance<ReadST, 0>; //--- // 7.8.9,7.8.11. 
Load/Store, paired diff --git a/llvm/lib/Target/AArch64/AArch64SchedExynosM3.td b/llvm/lib/Target/AArch64/AArch64SchedExynosM3.td index 6a33258be02c..a96917c9364a 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedExynosM3.td +++ b/llvm/lib/Target/AArch64/AArch64SchedExynosM3.td @@ -277,6 +277,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // Finer scheduling model. diff --git a/llvm/lib/Target/AArch64/AArch64SchedExynosM4.td b/llvm/lib/Target/AArch64/AArch64SchedExynosM4.td index db066a19b0b6..8c5d6bbf0ceb 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedExynosM4.td +++ b/llvm/lib/Target/AArch64/AArch64SchedExynosM4.td @@ -581,6 +581,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // Finer scheduling model. diff --git a/llvm/lib/Target/AArch64/AArch64SchedExynosM5.td b/llvm/lib/Target/AArch64/AArch64SchedExynosM5.td index 0429b6ab2ee2..64f88d719aa9 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedExynosM5.td +++ b/llvm/lib/Target/AArch64/AArch64SchedExynosM5.td @@ -616,6 +616,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // Finer scheduling model. 
diff --git a/llvm/lib/Target/AArch64/AArch64SchedFalkor.td b/llvm/lib/Target/AArch64/AArch64SchedFalkor.td index 8bb95e442249..8c40efd07e8a 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedFalkor.td +++ b/llvm/lib/Target/AArch64/AArch64SchedFalkor.td @@ -111,6 +111,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; // Detailed Refinements // ----------------------------------------------------------------------------- diff --git a/llvm/lib/Target/AArch64/AArch64SchedKryo.td b/llvm/lib/Target/AArch64/AArch64SchedKryo.td index 45964e1ed6de..f824ce462fe0 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedKryo.td +++ b/llvm/lib/Target/AArch64/AArch64SchedKryo.td @@ -117,6 +117,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// diff --git a/llvm/lib/Target/AArch64/AArch64SchedTSV110.td b/llvm/lib/Target/AArch64/AArch64SchedTSV110.td index 438371c1b6a8..4a1b5167e89d 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedTSV110.td +++ b/llvm/lib/Target/AArch64/AArch64SchedTSV110.td @@ -113,6 +113,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; def : InstRW<[WriteI], (instrs COPY)>; diff --git a/llvm/lib/Target/AArch64/AArch64SchedThunderX.td b/llvm/lib/Target/AArch64/AArch64SchedThunderX.td index 125eb284cfd1..f41f12733e69 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedThunderX.td +++ b/llvm/lib/Target/AArch64/AArch64SchedThunderX.td @@ -192,6 +192,7 @@ def THXT8XWriteFSqrtDP : SchedWriteRes<[THXT8XUnitFPMDS]> { def : ReadAdvance<ReadExtrHi, 1>; def : ReadAdvance<ReadAdrBase, 2>; def : ReadAdvance<ReadVLD, 2>; +def : ReadAdvance<ReadST, 2>; // FIXME: This 
needs more targeted benchmarking. // ALU - Most operands in the ALU pipes are not needed for two cycles. Shiftable diff --git a/llvm/lib/Target/AArch64/AArch64SchedThunderX2T99.td b/llvm/lib/Target/AArch64/AArch64SchedThunderX2T99.td index 8d8675b7ac6f..0da286e942a0 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedThunderX2T99.td +++ b/llvm/lib/Target/AArch64/AArch64SchedThunderX2T99.td @@ -362,6 +362,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // 3. Instruction Tables. diff --git a/llvm/lib/Target/AArch64/AArch64SchedThunderX3T110.td b/llvm/lib/Target/AArch64/AArch64SchedThunderX3T110.td index 00838cc4b9bd..8f03be9be0dd 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedThunderX3T110.td +++ b/llvm/lib/Target/AArch64/AArch64SchedThunderX3T110.td @@ -621,6 +621,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // 3. Instruction Tables. diff --git a/llvm/lib/Target/AArch64/AArch64Schedule.td b/llvm/lib/Target/AArch64/AArch64Schedule.td index 49c0c1782236..4e5a67a3a394 100644 --- a/llvm/lib/Target/AArch64/AArch64Schedule.td +++ b/llvm/lib/Target/AArch64/AArch64Schedule.td @@ -47,6 +47,7 @@ def WriteAdr : SchedWrite; // Address pre/post increment. def WriteLDIdx : SchedWrite; // Load from a register index (maybe scaled). def WriteSTIdx : SchedWrite; // Store to a register index (maybe scaled). +def ReadST : SchedRead; // Read the stored value. def ReadAdrBase : SchedRead; // Read the base resister of a reg-offset LD/ST. // Serialized two-level address load. 
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/A55-store-readadv.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/A55-store-readadv.s index ff45caf46e21..ad49a96c27c5 100644 --- a/llvm/test/tools/llvm-mca/AArch64/Cortex/A55-store-readadv.s +++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/A55-store-readadv.s @@ -125,12 +125,12 @@ stp x0, x1, [x2], #16 # CHECK: Iterations: 100 # CHECK-NEXT: Instructions: 11800 -# CHECK-NEXT: Total Cycles: 20301 +# CHECK-NEXT: Total Cycles: 19801 # CHECK-NEXT: Total uOps: 14400 # CHECK: Dispatch Width: 2 -# CHECK-NEXT: uOps Per Cycle: 0.71 -# CHECK-NEXT: IPC: 0.58 +# CHECK-NEXT: uOps Per Cycle: 0.73 +# CHECK-NEXT: IPC: 0.60 # CHECK-NEXT: Block RThroughput: 72.0 # CHECK: Instruction Info: @@ -401,127 +401,127 @@ stp x0, x1, [x2], #16 # CHECK-NEXT: - - - - - - - - - - - 1.00 stp x0, x1, [x2], #16 # CHECK: Timeline view: -# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 -# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123 +# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 012345678 +# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 -# CHECK: [0,0] DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,1] . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2, #16] -# CHECK-NEXT: [0,2] . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,3] . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2, #16]! -# CHECK-NEXT: [0,4] . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
add x2, x3, #1 -# CHECK-NEXT: [0,5] . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2], #16 -# CHECK-NEXT: [0,6] . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,7] . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2], #16 -# CHECK-NEXT: [0,8] . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,9] . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2, #16]! -# CHECK-NEXT: [0,10] . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,11] . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2, #16] -# CHECK-NEXT: [0,12] . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,13] . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2], #16 -# CHECK-NEXT: [0,14] . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,15] . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2, #16]! -# CHECK-NEXT: [0,16] . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,17] . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2, #16] -# CHECK-NEXT: [0,18] . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,19] . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2], #16 -# CHECK-NEXT: [0,20] . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
add x2, x3, #1 -# CHECK-NEXT: [0,21] . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2, #16]! -# CHECK-NEXT: [0,22] . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,23] . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2, #16] -# CHECK-NEXT: [0,24] . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,25] . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2], #16 -# CHECK-NEXT: [0,26] . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,27] . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2, #16]! -# CHECK-NEXT: [0,28] . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,29] . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2, #16] -# CHECK-NEXT: [0,30] . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,31] . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2], #16 -# CHECK-NEXT: [0,32] . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,33] . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2, #16]! -# CHECK-NEXT: [0,34] . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,35] . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2, #16] -# CHECK-NEXT: [0,36] . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . 
. add x2, x3, #1 -# CHECK-NEXT: [0,37] . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2], #16 -# CHECK-NEXT: [0,38] . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,39] . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2, #16]! -# CHECK-NEXT: [0,40] . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,41] . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2, #16] -# CHECK-NEXT: [0,42] . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,43] . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2], #16 -# CHECK-NEXT: [0,44] . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,45] . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2, #16]! -# CHECK-NEXT: [0,46] . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,47] . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2, #16] -# CHECK-NEXT: [0,48] . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,49] . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . strh w0, [x2], #16 -# CHECK-NEXT: [0,50] . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,51] . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . strh w0, [x2, #16]! -# CHECK-NEXT: [0,52] . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . 
. . . . add x2, x3, #1 -# CHECK-NEXT: [0,53] . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . strh w0, [x2, #16] -# CHECK-NEXT: [0,54] . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,55] . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . str d0, [x2, x2, lsl #3] -# CHECK-NEXT: [0,56] . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,57] . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . str q0, [x2, w0, sxtw] -# CHECK-NEXT: [0,58] . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,59] . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . str w0, [x2, w0, sxtw] -# CHECK-NEXT: [0,60] . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,61] . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . str x0, [x2, w0, sxtw] -# CHECK-NEXT: [0,62] . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,63] . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . strb w0, [x2, w0, sxtw] -# CHECK-NEXT: [0,64] . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,65] . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . strh w0, [x2, w0, sxtw] -# CHECK-NEXT: [0,66] . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,67] . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . stur b0, [x2, #255] -# CHECK-NEXT: [0,68] . . . . . . . . . . . . . . . . . . . . . . 
. . DeeE . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,69] . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . stur d0, [x2, #255] -# CHECK-NEXT: [0,70] . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,71] . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . stur h0, [x2, #255] -# CHECK-NEXT: [0,72] . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,73] . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . stur q0, [x2, #255] -# CHECK-NEXT: [0,74] . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,75] . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . stur s0, [x2, #255] -# CHECK-NEXT: [0,76] . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,77] . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . stur w0, [x2, #255] -# CHECK-NEXT: [0,78] . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,79] . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . sturb w0, [x2, #255] -# CHECK-NEXT: [0,80] . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,81] . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . sturh w0, [x2, #255] -# CHECK-NEXT: [0,82] . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,83] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . stnp d0, d1, [x2, #16] -# CHECK-NEXT: [0,84] . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,85] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . stnp q0, q1, [x2, #16] -# CHECK-NEXT: [0,86] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,87] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . stnp s0, s1, [x2, #16] -# CHECK-NEXT: [0,88] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,89] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . stnp s0, s1, [x2, #16] -# CHECK-NEXT: [0,90] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,91] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . stnp w0, w1, [x2, #16] -# CHECK-NEXT: [0,92] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,93] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . stnp x0, x1, [x2, #16] -# CHECK-NEXT: [0,94] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,95] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . stp d0, d1, [x2, #16] -# CHECK-NEXT: [0,96] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,97] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . stp d0, d1, [x2, #16]! -# CHECK-NEXT: [0,98] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,99] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . 
stp d0, d1, [x2], #16 -# CHECK-NEXT: [0,100] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,101] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . stp q0, q1, [x2, #16] -# CHECK-NEXT: [0,102] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,103] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . stp q0, q1, [x2, #16]! -# CHECK-NEXT: [0,104] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,105] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . stp q0, q1, [x2], #16 -# CHECK-NEXT: [0,106] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . add x2, x3, #1 -# CHECK-NEXT: [0,107] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . stp s0, s1, [x2, #16] -# CHECK-NEXT: [0,108] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . add x2, x3, #1 -# CHECK-NEXT: [0,109] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . stp s0, s1, [x2, #16]! -# CHECK-NEXT: [0,110] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . add x2, x3, #1 -# CHECK-NEXT: [0,111] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . stp s0, s1, [x2], #16 -# CHECK-NEXT: [0,112] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . add x2, x3, #1 -# CHECK-NEXT: [0,113] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . stp x0, x1, [x2, #16] -# CHECK-NEXT: [0,114] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . add x2, x3, #1 -# CHECK-NEXT: [0,115] . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . DE. . stp x0, x1, [x2, #16]! -# CHECK-NEXT: [0,116] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE. add x2, x3, #1 -# CHECK-NEXT: [0,117] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE stp x0, x1, [x2], #16 +# CHECK: [0,0] DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,1] . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2, #16] +# CHECK-NEXT: [0,2] . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,3] . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2, #16]! +# CHECK-NEXT: [0,4] . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,5] . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2], #16 +# CHECK-NEXT: [0,6] . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,7] . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2], #16 +# CHECK-NEXT: [0,8] . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,9] . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2, #16]! +# CHECK-NEXT: [0,10] . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,11] . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2, #16] +# CHECK-NEXT: [0,12] . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,13] . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . str h0, [x2], #16 +# CHECK-NEXT: [0,14] . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,15] . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2, #16]! +# CHECK-NEXT: [0,16] . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,17] . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2, #16] +# CHECK-NEXT: [0,18] . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,19] . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2], #16 +# CHECK-NEXT: [0,20] . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,21] . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2, #16]! +# CHECK-NEXT: [0,22] . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,23] . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2, #16] +# CHECK-NEXT: [0,24] . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,25] . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2], #16 +# CHECK-NEXT: [0,26] . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,27] . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2, #16]! +# CHECK-NEXT: [0,28] . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,29] . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
str s0, [x2, #16] +# CHECK-NEXT: [0,30] . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,31] . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2], #16 +# CHECK-NEXT: [0,32] . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,33] . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2, #16]! +# CHECK-NEXT: [0,34] . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,35] . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2, #16] +# CHECK-NEXT: [0,36] . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,37] . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2], #16 +# CHECK-NEXT: [0,38] . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,39] . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2, #16]! +# CHECK-NEXT: [0,40] . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,41] . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2, #16] +# CHECK-NEXT: [0,42] . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,43] . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2], #16 +# CHECK-NEXT: [0,44] . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,45] . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2, #16]! 
+# CHECK-NEXT: [0,46] . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,47] . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2, #16] +# CHECK-NEXT: [0,48] . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,49] . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . strh w0, [x2], #16 +# CHECK-NEXT: [0,50] . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,51] . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . strh w0, [x2, #16]! +# CHECK-NEXT: [0,52] . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,53] . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . strh w0, [x2, #16] +# CHECK-NEXT: [0,54] . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,55] . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . str d0, [x2, x2, lsl #3] +# CHECK-NEXT: [0,56] . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,57] . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . str q0, [x2, w0, sxtw] +# CHECK-NEXT: [0,58] . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,59] . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . str w0, [x2, w0, sxtw] +# CHECK-NEXT: [0,60] . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,61] . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . 
str x0, [x2, w0, sxtw] +# CHECK-NEXT: [0,62] . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,63] . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . strb w0, [x2, w0, sxtw] +# CHECK-NEXT: [0,64] . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,65] . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . strh w0, [x2, w0, sxtw] +# CHECK-NEXT: [0,66] . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,67] . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . stur b0, [x2, #255] +# CHECK-NEXT: [0,68] . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,69] . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . stur d0, [x2, #255] +# CHECK-NEXT: [0,70] . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,71] . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . stur h0, [x2, #255] +# CHECK-NEXT: [0,72] . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,73] . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . stur q0, [x2, #255] +# CHECK-NEXT: [0,74] . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,75] . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . stur s0, [x2, #255] +# CHECK-NEXT: [0,76] . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,77] . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . 
stur w0, [x2, #255] +# CHECK-NEXT: [0,78] . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,79] . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . sturb w0, [x2, #255] +# CHECK-NEXT: [0,80] . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,81] . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . sturh w0, [x2, #255] +# CHECK-NEXT: [0,82] . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,83] . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . stnp d0, d1, [x2, #16] +# CHECK-NEXT: [0,84] . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,85] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . stnp q0, q1, [x2, #16] +# CHECK-NEXT: [0,86] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,87] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . stnp s0, s1, [x2, #16] +# CHECK-NEXT: [0,88] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,89] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . stnp s0, s1, [x2, #16] +# CHECK-NEXT: [0,90] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,91] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . stnp w0, w1, [x2, #16] +# CHECK-NEXT: [0,92] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,93] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . 
stnp x0, x1, [x2, #16] +# CHECK-NEXT: [0,94] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,95] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . stp d0, d1, [x2, #16] +# CHECK-NEXT: [0,96] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,97] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . stp d0, d1, [x2, #16]! +# CHECK-NEXT: [0,98] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,99] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . stp d0, d1, [x2], #16 +# CHECK-NEXT: [0,100] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,101] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . stp q0, q1, [x2, #16] +# CHECK-NEXT: [0,102] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,103] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . stp q0, q1, [x2, #16]! +# CHECK-NEXT: [0,104] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,105] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . stp q0, q1, [x2], #16 +# CHECK-NEXT: [0,106] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . add x2, x3, #1 +# CHECK-NEXT: [0,107] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . stp s0, s1, [x2, #16] +# CHECK-NEXT: [0,108] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . add x2, x3, #1 +# CHECK-NEXT: [0,109] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
DE . . . . stp s0, s1, [x2, #16]! +# CHECK-NEXT: [0,110] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . add x2, x3, #1 +# CHECK-NEXT: [0,111] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . stp s0, s1, [x2], #16 +# CHECK-NEXT: [0,112] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . add x2, x3, #1 +# CHECK-NEXT: [0,113] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . stp x0, x1, [x2, #16] +# CHECK-NEXT: [0,114] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . add x2, x3, #1 +# CHECK-NEXT: [0,115] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . stp x0, x1, [x2, #16]! +# CHECK-NEXT: [0,116] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE. add x2, x3, #1 +# CHECK-NEXT: [0,117] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE stp x0, x1, [x2], #16 # CHECK: Average Wait times (based on the timeline view): # CHECK-NEXT: [0]: Executions </cut>
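The summary hunk of the A55-store-readadv.s test diff above changes "Total Cycles" from 20301 to 19801 while "Instructions" and "Total uOps" stay the same, so the "IPC" and "uOps Per Cycle" lines should follow from a simple division. The sketch below recomputes them as a sanity check. It is only an illustration: the function names are made up for this example, and rounding to two decimals is an assumption about how the figures are displayed, not something taken from the llvm-mca sources.

```python
# Figures copied from the "-"/"+" CHECK lines of the llvm-mca summary hunk.
INSTRUCTIONS = 11800      # "Instructions:" (unchanged by the commit)
TOTAL_UOPS = 14400        # "Total uOps:" (unchanged by the commit)
ITERATIONS = 100          # "Iterations:"
CYCLES_BEFORE = 20301     # "Total Cycles:" on the removed line
CYCLES_AFTER = 19801      # "Total Cycles:" on the added line


def per_cycle(count: int, cycles: int) -> float:
    """Ratio count/cycles rounded to two decimals (assumed display rounding)."""
    return round(count / cycles, 2)


def summarize(cycles: int) -> dict:
    """Recompute the derived summary fields for a given total cycle count."""
    return {
        "uops_per_cycle": per_cycle(TOTAL_UOPS, cycles),
        "ipc": per_cycle(INSTRUCTIONS, cycles),
    }


before = summarize(CYCLES_BEFORE)
after = summarize(CYCLES_AFTER)

# Cycles saved per loop iteration by the updated scheduling model.
saved_per_iteration = (CYCLES_BEFORE - CYCLES_AFTER) / ITERATIONS

print("before:", before)
print("after: ", after)
print("cycles saved per iteration:", saved_per_iteration)
```

Under these assumptions the recomputed values agree with the CHECK lines in the diff (0.71 and 0.58 before, 0.73 and 0.60 after), and the 500-cycle difference over 100 iterations works out to 5 cycles per iteration of the test sequence.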

4 years, 9 months


[TCWG CI] Regression caused by llvm:a26f1bf67ec70f72e64101cf483b26466928fc38

by ci_notify＠linaro.org

Identified regression caused by *llvm:a26f1bf67ec70f72e64101cf483b26466928fc38*: commit a26f1bf67ec70f72e64101cf483b26466928fc38 Author: Roman Lebedev <lebedev.ri(a)gmail.com> [PassManager] Run additional LICM before LoopRotate Results regressed to (for first_bad == a26f1bf67ec70f72e64101cf483b26466928fc38) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -Os_mthumb artifacts/build-a26f1bf67ec70f72e64101cf483b26466928fc38/results_id: 1 # 447.dealII,[.] SparseMatrix<double>::vmult<Vector<double>. Ve regressed by 111 from (for last_good == bb1e5399e4586239d6424f5eea5a9f06c52ebe9b) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -Os_mthumb artifacts/build-bb1e5399e4586239d6424f5eea5a9f06c52ebe9b/results_id: 1 This commit has regressed these CI configurations: - tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… Reproduce builds: <cut> mkdir investigate-llvm-a26f1bf67ec70f72e64101cf483b26466928fc38 cd investigate-llvm-a26f1bf67ec70f72e64101cf483b26466928fc38 # Fetch scripts git clone 
https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach a26f1bf67ec70f72e64101cf483b26466928fc38 ../artifacts/test.sh # Reproduce last_good build git checkout --detach bb1e5399e4586239d6424f5eea5a9f06c52ebe9b ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit a26f1bf67ec70f72e64101cf483b26466928fc38 Author: Roman Lebedev <lebedev.ri(a)gmail.com> Date: Fri Apr 2 10:40:12 2021 +0300 [PassManager] Run additional LICM before LoopRotate Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes. >>! In D99204, @thopre wrote: > > With loop peeling, it is important that unnecessary PHIs be avoided or > it will leads to spurious peeling. One source of such PHIs is loop > rotation which creates PHIs for invariant loads. Those PHIs are > particularly problematic since loop peeling is now run as part of simple > loop unrolling before GVN is run, and are thus a source of spurious > peeling. > > Note that while some of the load can be hoisted and eventually > eliminated by instruction combine, this is not always possible due to > alignment issue. 
> In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/atta…

Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation, that is basically free from compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b9…

But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:

| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?

| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e…
but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded.
Differential Revision: https://reviews.llvm.org/D99249
---
 llvm/lib/Passes/PassBuilder.cpp                    | 10 +++
 llvm/lib/Transforms/IPO/PassManagerBuilder.cpp     |  4 +
 llvm/test/CodeGen/AMDGPU/opt-pipeline.ll           | 30 +++++---
 llvm/test/Other/new-pm-defaults.ll                 |  7 +-
 llvm/test/Other/new-pm-thinlto-defaults.ll         |  7 +-
 .../Other/new-pm-thinlto-postlink-pgo-defaults.ll  |  9 ++-
 .../new-pm-thinlto-postlink-samplepgo-defaults.ll  |  7 +-
 .../Other/new-pm-thinlto-prelink-pgo-defaults.ll   |  9 ++-
 .../new-pm-thinlto-prelink-samplepgo-defaults.ll   |  5 +-
 llvm/test/Other/opt-O2-pipeline.ll                 | 10 ++-
 llvm/test/Other/opt-O3-pipeline-enable-matrix.ll   | 10 ++-
 llvm/test/Other/opt-O3-pipeline.ll                 | 10 ++-
 llvm/test/Other/opt-Os-pipeline.ll                 | 10 ++-
 llvm/test/Other/pass-pipelines.ll                  |  3 +
 llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll | 25 ++++---
 .../PhaseOrdering/X86/spurious-peeling.ll          | 87 +++++++++-------------
 llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll     | 78 +++++++++----------
 .../loop-rotation-vs-common-code-hoisting.ll       | 22 +++---
 18 files changed, 193 insertions(+), 150 deletions(-)

diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 3a325277e370..5a2285215769 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -568,6 +568,11 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   LPM1.addPass(LoopInstSimplifyPass());
   LPM1.addPass(LoopSimplifyCFGPass());

+  // Try to remove as much code from the loop header as possible,
+  // to reduce amount of IR that will have to be duplicated.
+  // TODO: Investigate promotion cap for O1.
+  LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap));
+
   LPM1.addPass(LoopRotatePass(/* Disable header duplication */ true,
                               isLTOPreLink(Phase)));
   // TODO: Investigate promotion cap for O1.
@@ -736,6 +741,11 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   LPM1.addPass(LoopInstSimplifyPass());
   LPM1.addPass(LoopSimplifyCFGPass());

+  // Try to remove as much code from the loop header as possible,
+  // to reduce amount of IR that will have to be duplicated.
+  // TODO: Investigate promotion cap for O1.
+  LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap));
+
   // Disable header duplication in loop rotation at -Oz.
   LPM1.addPass(
       LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase)));
diff --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
index 109e7c97ff1b..2c80a16febef 100644
--- a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -431,6 +431,10 @@ void PassManagerBuilder::addFunctionSimplificationPasses(
     MPM.add(createLoopInstSimplifyPass());
     MPM.add(createLoopSimplifyCFGPass());
   }
+  // Try to remove as much code from the loop header as possible,
+  // to reduce amount of IR that will have to be duplicated.
+  // TODO: Investigate promotion cap for O1.
+  MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
   // Rotate Loop - disable header duplication at -Oz
   MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1, PrepareForLTO));
   // TODO: Investigate promotion cap for O1.
diff --git a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll index 34e5e6c647da..5e33d968c710 100644 --- a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll +++ b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll @@ -129,16 +129,20 @@ ; GCN-O1-NEXT: Simplify the CFG ; GCN-O1-NEXT: Reassociate expressions ; GCN-O1-NEXT: Dominator Tree Construction +; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O1-NEXT: Function Alias Analysis Results +; GCN-O1-NEXT: Memory SSA ; GCN-O1-NEXT: Natural Loop Information ; GCN-O1-NEXT: Canonicalize natural loops ; GCN-O1-NEXT: LCSSA Verifier ; GCN-O1-NEXT: Loop-Closed SSA Form Pass -; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O1-NEXT: Function Alias Analysis Results ; GCN-O1-NEXT: Scalar Evolution Analysis +; GCN-O1-NEXT: Lazy Branch Probability Analysis +; GCN-O1-NEXT: Lazy Block Frequency Analysis +; GCN-O1-NEXT: Loop Pass Manager +; GCN-O1-NEXT: Loop Invariant Code Motion ; GCN-O1-NEXT: Loop Pass Manager ; GCN-O1-NEXT: Rotate Loops -; GCN-O1-NEXT: Memory SSA ; GCN-O1-NEXT: Lazy Branch Probability Analysis ; GCN-O1-NEXT: Lazy Block Frequency Analysis ; GCN-O1-NEXT: Loop Pass Manager @@ -451,16 +455,20 @@ ; GCN-O2-NEXT: Simplify the CFG ; GCN-O2-NEXT: Reassociate expressions ; GCN-O2-NEXT: Dominator Tree Construction +; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O2-NEXT: Function Alias Analysis Results +; GCN-O2-NEXT: Memory SSA ; GCN-O2-NEXT: Natural Loop Information ; GCN-O2-NEXT: Canonicalize natural loops ; GCN-O2-NEXT: LCSSA Verifier ; GCN-O2-NEXT: Loop-Closed SSA Form Pass -; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O2-NEXT: Function Alias Analysis Results ; GCN-O2-NEXT: Scalar Evolution Analysis +; GCN-O2-NEXT: Lazy Branch Probability Analysis +; GCN-O2-NEXT: Lazy Block Frequency Analysis +; GCN-O2-NEXT: Loop Pass Manager +; GCN-O2-NEXT: Loop Invariant Code Motion ; GCN-O2-NEXT: Loop Pass Manager ; GCN-O2-NEXT: Rotate Loops -; GCN-O2-NEXT: 
Memory SSA ; GCN-O2-NEXT: Lazy Branch Probability Analysis ; GCN-O2-NEXT: Lazy Block Frequency Analysis ; GCN-O2-NEXT: Loop Pass Manager @@ -810,16 +818,20 @@ ; GCN-O3-NEXT: Simplify the CFG ; GCN-O3-NEXT: Reassociate expressions ; GCN-O3-NEXT: Dominator Tree Construction +; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O3-NEXT: Function Alias Analysis Results +; GCN-O3-NEXT: Memory SSA ; GCN-O3-NEXT: Natural Loop Information ; GCN-O3-NEXT: Canonicalize natural loops ; GCN-O3-NEXT: LCSSA Verifier ; GCN-O3-NEXT: Loop-Closed SSA Form Pass -; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O3-NEXT: Function Alias Analysis Results ; GCN-O3-NEXT: Scalar Evolution Analysis +; GCN-O3-NEXT: Lazy Branch Probability Analysis +; GCN-O3-NEXT: Lazy Block Frequency Analysis +; GCN-O3-NEXT: Loop Pass Manager +; GCN-O3-NEXT: Loop Invariant Code Motion ; GCN-O3-NEXT: Loop Pass Manager ; GCN-O3-NEXT: Rotate Loops -; GCN-O3-NEXT: Memory SSA ; GCN-O3-NEXT: Lazy Branch Probability Analysis ; GCN-O3-NEXT: Lazy Block Frequency Analysis ; GCN-O3-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll index 01b02b8fd482..337a0857701c 100644 --- a/llvm/test/Other/new-pm-defaults.ll +++ b/llvm/test/Other/new-pm-defaults.ll @@ -113,9 +113,9 @@ ; CHECK-O-NEXT: Running analysis: CallGraphAnalysis ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis -; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy -; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis -; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy +; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy +; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis +; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}> ; CHECK-O-NEXT: Running 
pass: DevirtSCCRepeatedPass ; CHECK-O-NEXT: Starting CGSCC pass manager run. @@ -156,6 +156,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-defaults.ll b/llvm/test/Other/new-pm-thinlto-defaults.ll index fbf47de87eeb..bba43dd50e7a 100644 --- a/llvm/test/Other/new-pm-thinlto-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-defaults.ll @@ -98,9 +98,9 @@ ; CHECK-O-NEXT: Running analysis: CallGraphAnalysis ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-PRELINK-O-NEXT: Running analysis: ProfileSummaryAnalysis -; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy -; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis -; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy +; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy +; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis +; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy ; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass ; CHECK-O-NEXT: Starting CGSCC pass manager run. @@ -139,6 +139,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll index 4bcf70e15a5b..57f0e0da73b6 100644 --- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll @@ -68,10 +68,10 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running analysis: CallGraphAnalysis -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running analysis: CallGraphAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy @@ -112,6 +112,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll index 1071d28432b9..0e0e2854b8df 100644 --- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll @@ -78,9 +78,9 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy @@ -121,6 +121,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll index e2f1385cf52b..4cfb9825c97e 100644 --- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll @@ -93,10 +93,10 @@ ; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running analysis: CallGraphAnalysis -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running analysis: CallGraphAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo @@ -158,6 +158,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll index d4dc552aea01..a05555c57003 100644 --- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll @@ -73,8 +73,8 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis @@ -116,6 +116,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/opt-O2-pipeline.ll b/llvm/test/Other/opt-O2-pipeline.ll index f7217c122fdb..a3b01e5464d4 100644 --- a/llvm/test/Other/opt-O2-pipeline.ll +++ b/llvm/test/Other/opt-O2-pipeline.ll @@ -101,16 +101,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll b/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll index 6b98c1f80d9e..fafd5c8fdcb8 100644 --- a/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll +++ b/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll @@ -106,16 +106,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize 
natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-O3-pipeline.ll b/llvm/test/Other/opt-O3-pipeline.ll index 00a1d61ac058..103d49bbbbab 100644 --- a/llvm/test/Other/opt-O3-pipeline.ll +++ b/llvm/test/Other/opt-O3-pipeline.ll @@ -106,16 +106,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-Os-pipeline.ll b/llvm/test/Other/opt-Os-pipeline.ll index 21f9b8c6009e..508c21edbc68 100644 --- a/llvm/test/Other/opt-Os-pipeline.ll +++ b/llvm/test/Other/opt-Os-pipeline.ll @@ -87,16 +87,20 @@ ; CHECK-NEXT: Simplify the CFG ; 
CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/pass-pipelines.ll b/llvm/test/Other/pass-pipelines.ll index ccd364d5d740..768e8343529e 100644 --- a/llvm/test/Other/pass-pipelines.ll +++ b/llvm/test/Other/pass-pipelines.ll @@ -53,6 +53,9 @@ ; CHECK-O2-NEXT: FunctionPass Manager ; CHECK-O2-NOT: Manager ; CHECK-O2: Loop Pass Manager +; CHECK-O2-NOT: Manager +; CHECK-O2: Loop Pass Manager +; CHECK-O2-NOT: Manager ; CHECK-O2: Loop Pass Manager ; CHECK-O2-NOT: Manager ; FIXME: We shouldn't be pulling out to simplify-cfg and instcombine and diff --git a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll index 82deee9f367b..8f43029fa303 100644 --- a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll +++ b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll @@ -22,30 +22,33 @@ define dso_local i32 @main() { ; CHECK-NEXT: bb: ; CHECK-NEXT: [[I6:%.*]] = load i32, i32* @a, align 4 ; CHECK-NEXT: [[I24:%.*]] = load i32, i32* @b, align 4 -; CHECK-NEXT: [[D_PROMOTED9:%.*]] = load i32, i32* @d, align 4 -; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED9]], [[I6]] +; CHECK-NEXT: 
[[D_PROMOTED7:%.*]] = load i32, i32* @d, align 4 +; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED7]], [[I6]] ; CHECK-NEXT: [[I21:%.*]] = icmp eq i32 [[TMP0]], 0 -; CHECK-NEXT: br label [[BB1:%.*]] -; CHECK: bb1: -; CHECK-NEXT: br i1 [[I21]], label [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE:%.*]], label [[BB19_PREHEADER:%.*]] -; CHECK: bb19.preheader: +; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD:%.*]], label [[BB27_PREHEADER:%.*]] +; CHECK: bb27.preheader: ; CHECK-NEXT: [[I26:%.*]] = urem i32 [[I24]], [[TMP0]] ; CHECK-NEXT: store i32 [[I26]], i32* @e, align 4 ; CHECK-NEXT: [[I30_NOT:%.*]] = icmp eq i32 [[I26]], 0 -; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB1]] -; CHECK: bb13.preheader.bb27.thread.split_crit_edge: -; CHECK-NEXT: store i32 -1, i32* @f, align 4 +; CHECK-NEXT: br label [[BB27:%.*]] +; CHECK: bb27.thread: ; CHECK-NEXT: store i32 0, i32* @d, align 4 +; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: store i32 0, i32* @c, align 4 ; CHECK-NEXT: br label [[BB32:%.*]] +; CHECK: bb27: +; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB36:%.*]] ; CHECK: bb32.loopexit: -; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: store i32 [[TMP0]], i32* @d, align 4 +; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: br label [[BB32]] ; CHECK: bb32: -; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE]] ] +; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB27_THREAD]] ] ; CHECK-NEXT: store i32 0, i32* [[C_SINK]], align 4 ; CHECK-NEXT: ret i32 0 +; CHECK: bb36: +; CHECK-NEXT: store i32 1, i32* @c, align 4 +; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD]], label [[BB27]] ; bb: %i = alloca i32, align 4 diff --git a/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll b/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll index 3e659414d982..4661bd8a36cc 100644 --- 
a/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll +++ b/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll @@ -16,32 +16,28 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV ; OLDPM-NEXT: entry: ; OLDPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0 ; OLDPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]] -; OLDPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 -; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]] -; OLDPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 -; OLDPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] +; OLDPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 +; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]] +; OLDPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; OLDPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] ; OLDPM: for.body7.lr.ph.i: ; OLDPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0 -; OLDPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]] -; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0 -; OLDPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8:![0-9]+]] -; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef -; OLDPM-NEXT: 
[[BASE_I6_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 -; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I6_PEEL_I]], align 8, !tbaa [[TBAA8]] -; OLDPM-NEXT: [[ARRAYIDX_I7_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef -; OLDPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]] -; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]] -; OLDPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1 -; OLDPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I:%.*]] +; OLDPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 +; OLDPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I6_I]], align 8, !tbaa [[TBAA8:![0-9]+]] +; OLDPM-NEXT: [[ARRAYIDX_I7_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef +; OLDPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]] +; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0 +; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8]] +; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef +; OLDPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9:![0-9]+]] +; OLDPM-NEXT: br label [[FOR_BODY7_I:%.*]] ; OLDPM: for.body7.i: -; OLDPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I]] ] -; OLDPM-NEXT: [[J_012_I:%.*]] = phi i32 [ [[INC_I:%.*]], 
[[FOR_BODY7_I]] ], [ 1, [[FOR_BODY7_LR_PH_I]] ] -; OLDPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]] +; OLDPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ] +; OLDPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ] +; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I7_I]], align 4, !tbaa [[TBAA9]] +; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]] ; OLDPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_012_I]], 1 +; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1 ; OLDPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]] ; OLDPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]] ; OLDPM: _ZN12FloatVecPair6vecIncEv.exit: @@ -51,39 +47,30 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV ; NEWPM-NEXT: entry: ; NEWPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0 ; NEWPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]] -; NEWPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 -; NEWPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]] -; NEWPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 -; NEWPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] +; NEWPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 +; NEWPM-NEXT: [[TMP1:%.*]] = 
load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]] +; NEWPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; NEWPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] ; NEWPM: for.body7.lr.ph.i: ; NEWPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0 -; NEWPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]] -; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0 -; NEWPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8:![0-9]+]] -; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef -; NEWPM-NEXT: [[BASE_I4_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 -; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I4_PEEL_I]], align 8, !tbaa [[TBAA8]] -; NEWPM-NEXT: [[ARRAYIDX_I5_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef -; NEWPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]] -; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]] -; NEWPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1 -; NEWPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE:%.*]] -; NEWPM: for.body7.lr.ph.i.for.body7.i_crit_edge: -; NEWPM-NEXT: [[INC_I_1:%.*]] = add nuw i32 1, 1 +; NEWPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 
undef, i32 0 +; NEWPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I4_I]], align 8, !tbaa [[TBAA8:![0-9]+]] +; NEWPM-NEXT: [[ARRAYIDX_I5_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef +; NEWPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]] +; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0 +; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8]] +; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef +; NEWPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9:![0-9]+]] ; NEWPM-NEXT: br label [[FOR_BODY7_I:%.*]] ; NEWPM: for.body7.i: -; NEWPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE:%.*]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ] -; NEWPM-NEXT: [[INC_I_PHI:%.*]] = phi i32 [ [[INC_I_0:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]] ], [ [[INC_I_1]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ] -; NEWPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]] +; NEWPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ] +; NEWPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ] +; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I5_I]], align 4, !tbaa [[TBAA9]] +; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]] ; NEWPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I_PHI]], [[TMP1]] -; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]], !llvm.loop 
[[LOOP11:![0-9]+]] -; NEWPM: for.body7.i.for.body7.i_crit_edge: -; NEWPM-NEXT: [[INC_I_0]] = add nuw i32 [[INC_I_PHI]], 1 -; NEWPM-NEXT: br label [[FOR_BODY7_I]] +; NEWPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1 +; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]] +; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]] ; NEWPM: _ZN12FloatVecPair6vecIncEv.exit: ; NEWPM-NEXT: ret void ; diff --git a/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll b/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll index 280f849dbb35..8b8b535f1a77 100644 --- a/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll +++ b/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll @@ -15,18 +15,18 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-LABEL: @vdiv( ; CHECK-NEXT: entry: ; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0 -; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_LR_PH:%.*]], label [[FOR_END:%.*]] -; CHECK: for.body.lr.ph: +; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]] +; CHECK: for.body.preheader: ; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64 ; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4 -; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER:%.*]], label [[VECTOR_MEMCHECK:%.*]] +; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER8:%.*]], label [[VECTOR_MEMCHECK:%.*]] ; CHECK: vector.memcheck: ; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[X:%.*]], i64 [[WIDE_TRIP_COUNT]] ; CHECK-NEXT: [[SCEVGEP6:%.*]] = getelementptr double, double* [[Y:%.*]], i64 [[WIDE_TRIP_COUNT]] ; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt double* [[SCEVGEP6]], [[X]] ; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt double* [[SCEVGEP]], [[Y]] ; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]] -; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER]], label 
[[VECTOR_PH:%.*]] +; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER8]], label [[VECTOR_PH:%.*]] ; CHECK: vector.ph: ; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967292 ; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x double> poison, double [[A:%.*]], i32 0 @@ -49,39 +49,39 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[VECTOR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX]] ; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP8]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, [[TBAA3:!tbaa !.*]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, !tbaa [[TBAA3:![0-9]+]], !alias.scope !7 ; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[TMP4]] ; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX]] ; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX]], 4 ; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT]] ; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP13]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_1]], [[TMP5]] ; CHECK-NEXT: [[TMP16:%.*]] = getelementptr 
inbounds double, double* [[X]], i64 [[INDEX_NEXT]] ; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP16]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX]], 8 ; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_1]] ; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP18]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP20:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_2]], [[TMP6]] ; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_1]] ; CHECK-NEXT: [[TMP22:%.*]] = bitcast double* [[TMP21]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX]], 12 ; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_2]] ; CHECK-NEXT: [[TMP24:%.*]] = bitcast double* [[TMP23]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP25:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_3]], [[TMP7]] ; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_2]] ; CHECK-NEXT: 
[[TMP27:%.*]] = bitcast double* [[TMP26]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_3]] = add i64 [[INDEX]], 16 ; CHECK-NEXT: [[NITER_NSUB_3]] = add i64 [[NITER]], -4 ; CHECK-NEXT: [[NITER_NCMP_3:%.*]] = icmp eq i64 [[NITER_NSUB_3]], 0 -; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]] ; CHECK: middle.block.unr-lcssa: ; CHECK-NEXT: [[INDEX_UNR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT_3]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0 @@ -94,78 +94,78 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[XTRAITER]], [[VECTOR_BODY_EPIL_PREHEADER]] ], [ [[EPIL_ITER_SUB:%.*]], [[VECTOR_BODY_EPIL]] ] ; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_EPIL]] ; CHECK-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP31:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_EPIL]], [[TMP28]] ; CHECK-NEXT: [[TMP32:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_EPIL]] ; CHECK-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x double>* [[TMP33]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x 
double>* [[TMP33]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_EPIL]] = add i64 [[INDEX_EPIL]], 4 ; CHECK-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1 ; CHECK-NEXT: [[EPIL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0 -; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], [[LOOP14:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], !llvm.loop [[LOOP14:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER]] -; CHECK: for.body.preheader: -; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_LR_PH]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ] +; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8]] +; CHECK: for.body.preheader8: +; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ] ; CHECK-NEXT: [[TMP34:%.*]] = xor i64 [[INDVARS_IV_PH]], -1 ; CHECK-NEXT: [[TMP35:%.*]] = add nsw i64 [[TMP34]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: [[XTRAITER8:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3 -; CHECK-NEXT: [[LCMP_MOD9_NOT:%.*]] = icmp eq i64 [[XTRAITER8]], 0 -; CHECK-NEXT: br i1 [[LCMP_MOD9_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]] +; CHECK-NEXT: [[XTRAITER9:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3 +; CHECK-NEXT: [[LCMP_MOD10_NOT:%.*]] = icmp eq i64 [[XTRAITER9]], 0 +; CHECK-NEXT: br i1 [[LCMP_MOD10_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]] ; CHECK: for.body.prol.preheader: ; CHECK-NEXT: [[TMP36:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: br label [[FOR_BODY_PROL:%.*]] ; CHECK: for.body.prol: ; CHECK-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], 
[[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PROL_PREHEADER]] ] -; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER8]], [[FOR_BODY_PROL_PREHEADER]] ] +; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER9]], [[FOR_BODY_PROL_PREHEADER]] ] ; CHECK-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_PROL]] -; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP37:%.*]] = fmul fast double [[T0_PROL]], [[TMP36]] ; CHECK-NEXT: [[ARRAYIDX2_PROL:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_PROL]] -; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1 ; CHECK-NEXT: [[PROL_ITER_SUB]] = add i64 [[PROL_ITER]], -1 ; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_SUB]], 0 -; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], [[LOOP16:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP16:![0-9]+]] ; CHECK: for.body.prol.loopexit: -; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ] +; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER8]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ] ; CHECK-NEXT: [[TMP38:%.*]] = icmp ult i64 [[TMP35]], 3 -; CHECK-NEXT: br i1 [[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER_NEW:%.*]] -; CHECK: for.body.preheader.new: +; CHECK-NEXT: br i1 
[[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8_NEW:%.*]] +; CHECK: for.body.preheader8.new: ; CHECK-NEXT: [[TMP39:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP40:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP41:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP42:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: br label [[FOR_BODY:%.*]] ; CHECK: for.body: -; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ] +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER8_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ] ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV]] -; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP43:%.*]] = fmul fast double [[T0]], [[TMP39]] ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV]] -; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1 ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT]] -; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP44:%.*]] = fmul fast double [[T0_1]], [[TMP40]] ; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT]] -; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, 
!tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2 ; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_1]] -; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP45:%.*]] = fmul fast double [[T0_2]], [[TMP41]] ; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_1]] -; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3 ; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_2]] -; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP46:%.*]] = fmul fast double [[T0_3]], [[TMP42]] ; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_2]] -; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_3]] = add nuw nsw i64 [[INDVARS_IV]], 4 ; CHECK-NEXT: [[EXITCOND_NOT_3:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_3]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP17:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]] ; CHECK: for.end: ; CHECK-NEXT: ret void ; diff --git a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll 
b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll index 7d7d18a5247d..bb320af193e3 100644 --- a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll +++ b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll @@ -76,18 +76,20 @@ define void @_Z4loopi(i32 %width) { ; ROTATED_LATER_OLDPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1 ; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]] ; ROTATED_LATER_OLDPM: for.cond.preheader: +; ROTATED_LATER_OLDPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1 +; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]] +; ROTATED_LATER_OLDPM: for.body.preheader: ; ROTATED_LATER_OLDPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1 -; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0 -; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY:%.*]] +; ROTATED_LATER_OLDPM-NEXT: br label [[FOR_BODY:%.*]] ; ROTATED_LATER_OLDPM: for.cond.cleanup: ; ROTATED_LATER_OLDPM-NEXT: tail call void @f0() ; ROTATED_LATER_OLDPM-NEXT: tail call void @f2() ; ROTATED_LATER_OLDPM-NEXT: br label [[RETURN]] ; ROTATED_LATER_OLDPM: for.body: -; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_COND_PREHEADER]] ] +; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ] ; ROTATED_LATER_OLDPM-NEXT: tail call void @f0() ; ROTATED_LATER_OLDPM-NEXT: tail call void @f1() -; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw i32 [[I_04]], 1 +; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw nsw i32 [[I_04]], 1 ; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[TMP0]] ; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]] ; ROTATED_LATER_OLDPM: return: @@ -98,24 +100,24 @@ 
define void @_Z4loopi(i32 %width) { ; ROTATED_LATER_NEWPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1 ; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]] ; ROTATED_LATER_NEWPM: for.cond.preheader: +; ROTATED_LATER_NEWPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1 +; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]] +; ROTATED_LATER_NEWPM: for.body.preheader: ; ROTATED_LATER_NEWPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1 -; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0 -; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE:%.*]] -; ROTATED_LATER_NEWPM: for.cond.preheader.for.body_crit_edge: -; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw i32 0, 1 +; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw nsw i32 0, 1 ; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY:%.*]] ; ROTATED_LATER_NEWPM: for.cond.cleanup: ; ROTATED_LATER_NEWPM-NEXT: tail call void @f0() ; ROTATED_LATER_NEWPM-NEXT: tail call void @f2() ; ROTATED_LATER_NEWPM-NEXT: br label [[RETURN]] ; ROTATED_LATER_NEWPM: for.body: -; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE]] ] +; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_BODY_PREHEADER]] ] ; ROTATED_LATER_NEWPM-NEXT: tail call void @f0() ; ROTATED_LATER_NEWPM-NEXT: tail call void @f1() ; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC_PHI]], [[TMP0]] ; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ; ROTATED_LATER_NEWPM: for.body.for.body_crit_edge: -; ROTATED_LATER_NEWPM-NEXT: [[INC_0]] = add nuw i32 [[INC_PHI]], 1 +; ROTATED_LATER_NEWPM-NEXT: 
[[INC_0]] = add nuw nsw i32 [[INC_PHI]], 1 ; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY]] ; ROTATED_LATER_NEWPM: return: ; ROTATED_LATER_NEWPM-NEXT: ret void </cut>

4 years, 9 months


[ACTIVITY] report week ending 10 Sep

by Peter Maydell

Progress
 * UM-2 [QEMU upstream maintainership]
   + Respin of a linux-user cleanup patchset
   + Code review, as usual
 * QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
   + Working on version 2 of the "optimized code gen for MVE" patchset; this now covers all the insns that have an easy optimized version.

-- PMM

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/gnu-master-aarch64-lts-allyesconfig - Build # 13 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_kernel/gnu-master-aarch64-lts-allyesconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/gnu-master-aarch64-lts-allyesconfig

Culprit:
<cut>
commit a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
Author: qing zhao <qing.zhao(a)oracle.com>
Date:   Thu Sep 9 15:44:49 2021 -0700

    Add -ftrivial-auto-var-init option and uninitialized variable attribute.

    Initialize automatic variables with either a pattern or with zeroes to increase the security and predictability of a program by preventing uninitialized memory disclosure and use. GCC still considers an automatic variable that doesn't have an explicit initializer as uninitialized, -Wuninitialized will still report warning messages on such automatic variables.
    With this option, GCC will also initialize any padding of automatic variables that have structure or union types to zeroes.
    You can control this behavior for a specific variable by using the variable attribute "uninitialized" to control runtime overhead.

    gcc/ChangeLog:

    2021-09-09  qing zhao  <qing.zhao(a)oracle.com>

        * builtins.c (expand_builtin_memset): Make external visible.
        * builtins.h (expand_builtin_memset): Declare extern.
        * common.opt (ftrivial-auto-var-init=): New option.
        * doc/extend.texi: Document the uninitialized attribute.
        * doc/invoke.texi: Document -ftrivial-auto-var-init.
        * flag-types.h (enum auto_init_type): New enumerated type auto_init_type.
        * gimple-fold.c (clear_padding_type): Add one new parameter.
        (clear_padding_union): Likewise.
        (clear_padding_emit_loop): Likewise.
        (clear_type_padding_in_mask): Likewise.
        (gimple_fold_builtin_clear_padding): Handle this new parameter.
        * gimplify.c (gimple_add_init_for_auto_var): New function.
        (gimple_add_padding_init_for_auto_var): New function.
        (is_var_need_auto_init): New function.
        (gimplify_decl_expr): Add initialization to automatic variables per users' requests.
        (gimplify_call_expr): Add one new parameter for call to __builtin_clear_padding.
        (gimplify_init_constructor): Add padding initialization in the end.
        * internal-fn.c (INIT_PATTERN_VALUE): New macro.
        (expand_DEFERRED_INIT): New function.
        * internal-fn.def (DEFERRED_INIT): New internal function.
        * tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
        * tree-sra.c (generate_subtree_deferred_init): New function.
        (scan_function): Avoid setting cannot_scalarize_away_bitmap for calls to .DEFERRED_INIT.
        (sra_modify_deferred_init): New function.
        (sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
        * tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
        * tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT specially.
        (check_defs): Likewise.
        (warn_uninitialized_vars): Likewise.
        * tree-ssa.c (ssa_undefined_value_p): Likewise.
        * tree.c (build_common_builtin_nodes): Build tree node for BUILT_IN_CLEAR_PADDING when needed.

    gcc/c-family/ChangeLog:

    2021-09-09  qing zhao  <qing.zhao(a)oracle.com>

        * c-attribs.c (handle_uninitialized_attribute): New function.
        (c_common_attribute_table): Add "uninitialized" attribute.

    gcc/testsuite/ChangeLog:

    2021-09-09  qing zhao  <qing.zhao(a)oracle.com>

        * c-c++-common/auto-init-1.c: New test.
        * c-c++-common/auto-init-10.c: New test.
        * c-c++-common/auto-init-11.c: New test.
        * c-c++-common/auto-init-12.c: New test.
        * c-c++-common/auto-init-13.c: New test.
        * c-c++-common/auto-init-14.c: New test.
        * c-c++-common/auto-init-15.c: New test.
        * c-c++-common/auto-init-16.c: New test.
        * c-c++-common/auto-init-2.c: New test.
        * c-c++-common/auto-init-3.c: New test.
        * c-c++-common/auto-init-4.c: New test.
        * c-c++-common/auto-init-5.c: New test.
        * c-c++-common/auto-init-6.c: New test.
        * c-c++-common/auto-init-7.c: New test.
        * c-c++-common/auto-init-8.c: New test.
        * c-c++-common/auto-init-9.c: New test.
        * c-c++-common/auto-init-esra.c: New test.
        * c-c++-common/auto-init-padding-1.c: New test.
        * c-c++-common/auto-init-padding-2.c: New test.
        * c-c++-common/auto-init-padding-3.c: New test.
        * g++.dg/auto-init-uninit-pred-1_a.C: New test.
        * g++.dg/auto-init-uninit-pred-2_a.C: New test.
        * g++.dg/auto-init-uninit-pred-3_a.C: New test.
        * g++.dg/auto-init-uninit-pred-4.C: New test.
        * gcc.dg/auto-init-sra-1.c: New test.
        * gcc.dg/auto-init-sra-2.c: New test.
        * gcc.dg/auto-init-uninit-1.c: New test.
        * gcc.dg/auto-init-uninit-12.c: New test.
        * gcc.dg/auto-init-uninit-13.c: New test.
        * gcc.dg/auto-init-uninit-14.c: New test.
        * gcc.dg/auto-init-uninit-15.c: New test.
        * gcc.dg/auto-init-uninit-16.c: New test.
        * gcc.dg/auto-init-uninit-17.c: New test.
        * gcc.dg/auto-init-uninit-18.c: New test.
        * gcc.dg/auto-init-uninit-19.c: New test.
        * gcc.dg/auto-init-uninit-2.c: New test.
        * gcc.dg/auto-init-uninit-20.c: New test.
        * gcc.dg/auto-init-uninit-21.c: New test.
        * gcc.dg/auto-init-uninit-22.c: New test.
        * gcc.dg/auto-init-uninit-23.c: New test.
        * gcc.dg/auto-init-uninit-24.c: New test.
        * gcc.dg/auto-init-uninit-25.c: New test.
        * gcc.dg/auto-init-uninit-26.c: New test.
        * gcc.dg/auto-init-uninit-3.c: New test.
        * gcc.dg/auto-init-uninit-34.c: New test.
        * gcc.dg/auto-init-uninit-36.c: New test.
        * gcc.dg/auto-init-uninit-37.c: New test.
        * gcc.dg/auto-init-uninit-4.c: New test.
        * gcc.dg/auto-init-uninit-5.c: New test.
        * gcc.dg/auto-init-uninit-6.c: New test.
        * gcc.dg/auto-init-uninit-8.c: New test.
        * gcc.dg/auto-init-uninit-9.c: New test.
        * gcc.dg/auto-init-uninit-A.c: New test.
        * gcc.dg/auto-init-uninit-B.c: New test.
        * gcc.dg/auto-init-uninit-C.c: New test.
        * gcc.dg/auto-init-uninit-H.c: New test.
        * gcc.dg/auto-init-uninit-I.c: New test.
        * gcc.target/aarch64/auto-init-1.c: New test.
        * gcc.target/aarch64/auto-init-2.c: New test.
        * gcc.target/aarch64/auto-init-3.c: New test.
        * gcc.target/aarch64/auto-init-4.c: New test.
        * gcc.target/aarch64/auto-init-5.c: New test.
        * gcc.target/aarch64/auto-init-6.c: New test.
        * gcc.target/aarch64/auto-init-7.c: New test.
        * gcc.target/aarch64/auto-init-8.c: New test.
        * gcc.target/aarch64/auto-init-padding-1.c: New test.
        * gcc.target/aarch64/auto-init-padding-10.c: New test.
        * gcc.target/aarch64/auto-init-padding-11.c: New test.
        * gcc.target/aarch64/auto-init-padding-12.c: New test.
        * gcc.target/aarch64/auto-init-padding-2.c: New test.
        * gcc.target/aarch64/auto-init-padding-3.c: New test.
        * gcc.target/aarch64/auto-init-padding-4.c: New test.
        * gcc.target/aarch64/auto-init-padding-5.c: New test.
        * gcc.target/aarch64/auto-init-padding-6.c: New test.
        * gcc.target/aarch64/auto-init-padding-7.c: New test.
        * gcc.target/aarch64/auto-init-padding-8.c: New test.
        * gcc.target/aarch64/auto-init-padding-9.c: New test.
        * gcc.target/i386/auto-init-1.c: New test.
        * gcc.target/i386/auto-init-2.c: New test.
        * gcc.target/i386/auto-init-21.c: New test.
        * gcc.target/i386/auto-init-22.c: New test.
        * gcc.target/i386/auto-init-23.c: New test.
        * gcc.target/i386/auto-init-24.c: New test.
        * gcc.target/i386/auto-init-3.c: New test.
        * gcc.target/i386/auto-init-4.c: New test.
        * gcc.target/i386/auto-init-5.c: New test.
        * gcc.target/i386/auto-init-6.c: New test.
        * gcc.target/i386/auto-init-7.c: New test.
        * gcc.target/i386/auto-init-8.c: New test.
        * gcc.target/i386/auto-init-padding-1.c: New test.
        * gcc.target/i386/auto-init-padding-10.c: New test.
        * gcc.target/i386/auto-init-padding-11.c: New test.
        * gcc.target/i386/auto-init-padding-12.c: New test.
        * gcc.target/i386/auto-init-padding-2.c: New test.
        * gcc.target/i386/auto-init-padding-3.c: New test.
        * gcc.target/i386/auto-init-padding-4.c: New test.
        * gcc.target/i386/auto-init-padding-5.c: New test.
        * gcc.target/i386/auto-init-padding-6.c: New test.
        * gcc.target/i386/auto-init-padding-7.c: New test.
        * gcc.target/i386/auto-init-padding-8.c: New test.
        * gcc.target/i386/auto-init-padding-9.c: New test.
</cut>

Results regressed to (for first_bad == a25e0b5e6ac8a77a71c229e0a7b744603365b0e9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
29
# First few build errors in logs:

from (for last_good == 5fe0865ab788bdc387b284a3ad57e5a95a767b18)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
19270
# linux build successful:
all

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
cd investigate-gcc-a25e0b5e6ac8a77a71c229e0a7b744603365b0e9

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 5fe0865ab788bdc387b284a3ad57e5a95a767b18
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Build log: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…

Full commit (up to 1000 lines):
<cut>
commit a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
Author: qing zhao <qing.zhao(a)oracle.com>
Date: Thu Sep 9 15:44:49 2021 -0700

    Add -ftrivial-auto-var-init option and uninitialized variable attribute.

    Initialize automatic variables with either a pattern or with zeroes to increase
    the security and predictability of a program by preventing uninitialized memory
    disclosure and use. GCC still considers an automatic variable that doesn't have
    an explicit initializer as uninitialized, -Wuninitialized will still report
    warning messages on such automatic variables.
    With this option, GCC will also initialize any padding of automatic variables
    that have structure or union types to zeroes.
    You can control this behavior for a specific variable by using the variable
    attribute "uninitialized" to control runtime overhead.

    gcc/ChangeLog:

    2021-09-09 qing zhao <qing.zhao(a)oracle.com>

    * builtins.c (expand_builtin_memset): Make external visible.
    * builtins.h (expand_builtin_memset): Declare extern.
    * common.opt (ftrivial-auto-var-init=): New option.
    * doc/extend.texi: Document the uninitialized attribute.
    * doc/invoke.texi: Document -ftrivial-auto-var-init.
    * flag-types.h (enum auto_init_type): New enumerated type auto_init_type.
    * gimple-fold.c (clear_padding_type): Add one new parameter.
    (clear_padding_union): Likewise.
    (clear_padding_emit_loop): Likewise.
    (clear_type_padding_in_mask): Likewise.
    (gimple_fold_builtin_clear_padding): Handle this new parameter.
    * gimplify.c (gimple_add_init_for_auto_var): New function.
    (gimple_add_padding_init_for_auto_var): New function.
    (is_var_need_auto_init): New function.
(gimplify_decl_expr): Add initialization to automatic variables per users' requests. (gimplify_call_expr): Add one new parameter for call to __builtin_clear_padding. (gimplify_init_constructor): Add padding initialization in the end. * internal-fn.c (INIT_PATTERN_VALUE): New macro. (expand_DEFERRED_INIT): New function. * internal-fn.def (DEFERRED_INIT): New internal function. * tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT. * tree-sra.c (generate_subtree_deferred_init): New function. (scan_function): Avoid setting cannot_scalarize_away_bitmap for calls to .DEFERRED_INIT. (sra_modify_deferred_init): New function. (sra_modify_function_body): Handle calls to DEFERRED_INIT specially. * tree-ssa-structalias.c (find_func_aliases_for_call): Likewise. * tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT specially. (check_defs): Likewise. (warn_uninitialized_vars): Likewise. * tree-ssa.c (ssa_undefined_value_p): Likewise. * tree.c (build_common_builtin_nodes): Build tree node for BUILT_IN_CLEAR_PADDING when needed. gcc/c-family/ChangeLog: 2021-09-09 qing zhao <qing.zhao(a)oracle.com> * c-attribs.c (handle_uninitialized_attribute): New function. (c_common_attribute_table): Add "uninitialized" attribute. gcc/testsuite/ChangeLog: 2021-09-09 qing zhao <qing.zhao(a)oracle.com> * c-c++-common/auto-init-1.c: New test. * c-c++-common/auto-init-10.c: New test. * c-c++-common/auto-init-11.c: New test. * c-c++-common/auto-init-12.c: New test. * c-c++-common/auto-init-13.c: New test. * c-c++-common/auto-init-14.c: New test. * c-c++-common/auto-init-15.c: New test. * c-c++-common/auto-init-16.c: New test. * c-c++-common/auto-init-2.c: New test. * c-c++-common/auto-init-3.c: New test. * c-c++-common/auto-init-4.c: New test. * c-c++-common/auto-init-5.c: New test. * c-c++-common/auto-init-6.c: New test. * c-c++-common/auto-init-7.c: New test. * c-c++-common/auto-init-8.c: New test. * c-c++-common/auto-init-9.c: New test. 
* c-c++-common/auto-init-esra.c: New test. * c-c++-common/auto-init-padding-1.c: New test. * c-c++-common/auto-init-padding-2.c: New test. * c-c++-common/auto-init-padding-3.c: New test. * g++.dg/auto-init-uninit-pred-1_a.C: New test. * g++.dg/auto-init-uninit-pred-2_a.C: New test. * g++.dg/auto-init-uninit-pred-3_a.C: New test. * g++.dg/auto-init-uninit-pred-4.C: New test. * gcc.dg/auto-init-sra-1.c: New test. * gcc.dg/auto-init-sra-2.c: New test. * gcc.dg/auto-init-uninit-1.c: New test. * gcc.dg/auto-init-uninit-12.c: New test. * gcc.dg/auto-init-uninit-13.c: New test. * gcc.dg/auto-init-uninit-14.c: New test. * gcc.dg/auto-init-uninit-15.c: New test. * gcc.dg/auto-init-uninit-16.c: New test. * gcc.dg/auto-init-uninit-17.c: New test. * gcc.dg/auto-init-uninit-18.c: New test. * gcc.dg/auto-init-uninit-19.c: New test. * gcc.dg/auto-init-uninit-2.c: New test. * gcc.dg/auto-init-uninit-20.c: New test. * gcc.dg/auto-init-uninit-21.c: New test. * gcc.dg/auto-init-uninit-22.c: New test. * gcc.dg/auto-init-uninit-23.c: New test. * gcc.dg/auto-init-uninit-24.c: New test. * gcc.dg/auto-init-uninit-25.c: New test. * gcc.dg/auto-init-uninit-26.c: New test. * gcc.dg/auto-init-uninit-3.c: New test. * gcc.dg/auto-init-uninit-34.c: New test. * gcc.dg/auto-init-uninit-36.c: New test. * gcc.dg/auto-init-uninit-37.c: New test. * gcc.dg/auto-init-uninit-4.c: New test. * gcc.dg/auto-init-uninit-5.c: New test. * gcc.dg/auto-init-uninit-6.c: New test. * gcc.dg/auto-init-uninit-8.c: New test. * gcc.dg/auto-init-uninit-9.c: New test. * gcc.dg/auto-init-uninit-A.c: New test. * gcc.dg/auto-init-uninit-B.c: New test. * gcc.dg/auto-init-uninit-C.c: New test. * gcc.dg/auto-init-uninit-H.c: New test. * gcc.dg/auto-init-uninit-I.c: New test. * gcc.target/aarch64/auto-init-1.c: New test. * gcc.target/aarch64/auto-init-2.c: New test. * gcc.target/aarch64/auto-init-3.c: New test. * gcc.target/aarch64/auto-init-4.c: New test. * gcc.target/aarch64/auto-init-5.c: New test. 
* gcc.target/aarch64/auto-init-6.c: New test. * gcc.target/aarch64/auto-init-7.c: New test. * gcc.target/aarch64/auto-init-8.c: New test. * gcc.target/aarch64/auto-init-padding-1.c: New test. * gcc.target/aarch64/auto-init-padding-10.c: New test. * gcc.target/aarch64/auto-init-padding-11.c: New test. * gcc.target/aarch64/auto-init-padding-12.c: New test. * gcc.target/aarch64/auto-init-padding-2.c: New test. * gcc.target/aarch64/auto-init-padding-3.c: New test. * gcc.target/aarch64/auto-init-padding-4.c: New test. * gcc.target/aarch64/auto-init-padding-5.c: New test. * gcc.target/aarch64/auto-init-padding-6.c: New test. * gcc.target/aarch64/auto-init-padding-7.c: New test. * gcc.target/aarch64/auto-init-padding-8.c: New test. * gcc.target/aarch64/auto-init-padding-9.c: New test. * gcc.target/i386/auto-init-1.c: New test. * gcc.target/i386/auto-init-2.c: New test. * gcc.target/i386/auto-init-21.c: New test. * gcc.target/i386/auto-init-22.c: New test. * gcc.target/i386/auto-init-23.c: New test. * gcc.target/i386/auto-init-24.c: New test. * gcc.target/i386/auto-init-3.c: New test. * gcc.target/i386/auto-init-4.c: New test. * gcc.target/i386/auto-init-5.c: New test. * gcc.target/i386/auto-init-6.c: New test. * gcc.target/i386/auto-init-7.c: New test. * gcc.target/i386/auto-init-8.c: New test. * gcc.target/i386/auto-init-padding-1.c: New test. * gcc.target/i386/auto-init-padding-10.c: New test. * gcc.target/i386/auto-init-padding-11.c: New test. * gcc.target/i386/auto-init-padding-12.c: New test. * gcc.target/i386/auto-init-padding-2.c: New test. * gcc.target/i386/auto-init-padding-3.c: New test. * gcc.target/i386/auto-init-padding-4.c: New test. * gcc.target/i386/auto-init-padding-5.c: New test. * gcc.target/i386/auto-init-padding-6.c: New test. * gcc.target/i386/auto-init-padding-7.c: New test. * gcc.target/i386/auto-init-padding-8.c: New test. * gcc.target/i386/auto-init-padding-9.c: New test. 
--- gcc/builtins.c | 3 +- gcc/builtins.h | 1 + gcc/c-family/c-attribs.c | 27 +++ gcc/common.opt | 16 ++ gcc/doc/extend.texi | 16 ++ gcc/doc/invoke.texi | 41 +++- gcc/flag-types.h | 7 + gcc/gimple-fold.c | 54 +++-- gcc/gimplify.c | 151 ++++++++++++- gcc/internal-fn.c | 99 +++++++++ gcc/internal-fn.def | 4 + gcc/testsuite/c-c++-common/auto-init-1.c | 39 ++++ gcc/testsuite/c-c++-common/auto-init-10.c | 22 ++ gcc/testsuite/c-c++-common/auto-init-11.c | 14 ++ gcc/testsuite/c-c++-common/auto-init-12.c | 14 ++ gcc/testsuite/c-c++-common/auto-init-13.c | 23 ++ gcc/testsuite/c-c++-common/auto-init-14.c | 23 ++ gcc/testsuite/c-c++-common/auto-init-15.c | 13 ++ gcc/testsuite/c-c++-common/auto-init-16.c | 13 ++ gcc/testsuite/c-c++-common/auto-init-2.c | 39 ++++ gcc/testsuite/c-c++-common/auto-init-3.c | 19 ++ gcc/testsuite/c-c++-common/auto-init-4.c | 19 ++ gcc/testsuite/c-c++-common/auto-init-5.c | 21 ++ gcc/testsuite/c-c++-common/auto-init-6.c | 21 ++ gcc/testsuite/c-c++-common/auto-init-7.c | 35 +++ gcc/testsuite/c-c++-common/auto-init-8.c | 35 +++ gcc/testsuite/c-c++-common/auto-init-9.c | 20 ++ gcc/testsuite/c-c++-common/auto-init-esra.c | 35 +++ gcc/testsuite/c-c++-common/auto-init-padding-1.c | 23 ++ gcc/testsuite/c-c++-common/auto-init-padding-2.c | 114 ++++++++++ gcc/testsuite/c-c++-common/auto-init-padding-3.c | 114 ++++++++++ gcc/testsuite/g++.dg/auto-init-uninit-pred-1_a.C | 3 + gcc/testsuite/g++.dg/auto-init-uninit-pred-2_a.C | 3 + gcc/testsuite/g++.dg/auto-init-uninit-pred-3_a.C | 3 + gcc/testsuite/g++.dg/auto-init-uninit-pred-4.C | 3 + gcc/testsuite/gcc.dg/auto-init-sra-1.c | 24 +++ gcc/testsuite/gcc.dg/auto-init-sra-2.c | 24 +++ gcc/testsuite/gcc.dg/auto-init-uninit-1.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-12.c | 4 + gcc/testsuite/gcc.dg/auto-init-uninit-13.c | 10 + gcc/testsuite/gcc.dg/auto-init-uninit-14.c | 4 + gcc/testsuite/gcc.dg/auto-init-uninit-15.c | 26 +++ gcc/testsuite/gcc.dg/auto-init-uninit-16.c | 25 +++ 
gcc/testsuite/gcc.dg/auto-init-uninit-17.c | 15 ++ gcc/testsuite/gcc.dg/auto-init-uninit-18.c | 3 + gcc/testsuite/gcc.dg/auto-init-uninit-19.c | 26 +++ gcc/testsuite/gcc.dg/auto-init-uninit-2.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-20.c | 4 + gcc/testsuite/gcc.dg/auto-init-uninit-21.c | 4 + gcc/testsuite/gcc.dg/auto-init-uninit-22.c | 3 + gcc/testsuite/gcc.dg/auto-init-uninit-23.c | 27 +++ gcc/testsuite/gcc.dg/auto-init-uninit-24.c | 3 + gcc/testsuite/gcc.dg/auto-init-uninit-25.c | 23 ++ gcc/testsuite/gcc.dg/auto-init-uninit-26.c | 23 ++ gcc/testsuite/gcc.dg/auto-init-uninit-3.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-34.c | 60 ++++++ gcc/testsuite/gcc.dg/auto-init-uninit-36.c | 238 +++++++++++++++++++++ gcc/testsuite/gcc.dg/auto-init-uninit-37.c | 156 ++++++++++++++ gcc/testsuite/gcc.dg/auto-init-uninit-4.c | 10 + gcc/testsuite/gcc.dg/auto-init-uninit-5.c | 6 + gcc/testsuite/gcc.dg/auto-init-uninit-6.c | 7 + gcc/testsuite/gcc.dg/auto-init-uninit-8.c | 8 + gcc/testsuite/gcc.dg/auto-init-uninit-9.c | 8 + gcc/testsuite/gcc.dg/auto-init-uninit-A.c | 7 + gcc/testsuite/gcc.dg/auto-init-uninit-B.c | 17 ++ gcc/testsuite/gcc.dg/auto-init-uninit-C.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-H.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-I.c | 3 + gcc/testsuite/gcc.target/aarch64/auto-init-1.c | 32 +++ gcc/testsuite/gcc.target/aarch64/auto-init-2.c | 35 +++ gcc/testsuite/gcc.target/aarch64/auto-init-3.c | 19 ++ gcc/testsuite/gcc.target/aarch64/auto-init-4.c | 19 ++ gcc/testsuite/gcc.target/aarch64/auto-init-5.c | 19 ++ gcc/testsuite/gcc.target/aarch64/auto-init-6.c | 18 ++ gcc/testsuite/gcc.target/aarch64/auto-init-7.c | 32 +++ gcc/testsuite/gcc.target/aarch64/auto-init-8.c | 32 +++ .../gcc.target/aarch64/auto-init-padding-1.c | 17 ++ .../gcc.target/aarch64/auto-init-padding-10.c | 22 ++ .../gcc.target/aarch64/auto-init-padding-11.c | 27 +++ .../gcc.target/aarch64/auto-init-padding-12.c | 27 +++ .../gcc.target/aarch64/auto-init-padding-2.c | 18 ++ 
.../gcc.target/aarch64/auto-init-padding-3.c | 27 +++ .../gcc.target/aarch64/auto-init-padding-4.c | 27 +++ .../gcc.target/aarch64/auto-init-padding-5.c | 22 ++ .../gcc.target/aarch64/auto-init-padding-6.c | 20 ++ .../gcc.target/aarch64/auto-init-padding-7.c | 20 ++ .../gcc.target/aarch64/auto-init-padding-8.c | 22 ++ .../gcc.target/aarch64/auto-init-padding-9.c | 21 ++ gcc/testsuite/gcc.target/i386/auto-init-1.c | 32 +++ gcc/testsuite/gcc.target/i386/auto-init-2.c | 36 ++++ gcc/testsuite/gcc.target/i386/auto-init-21.c | 14 ++ gcc/testsuite/gcc.target/i386/auto-init-22.c | 14 ++ gcc/testsuite/gcc.target/i386/auto-init-23.c | 13 ++ gcc/testsuite/gcc.target/i386/auto-init-24.c | 13 ++ gcc/testsuite/gcc.target/i386/auto-init-3.c | 17 ++ gcc/testsuite/gcc.target/i386/auto-init-4.c | 20 ++ gcc/testsuite/gcc.target/i386/auto-init-5.c | 20 ++ gcc/testsuite/gcc.target/i386/auto-init-6.c | 19 ++ gcc/testsuite/gcc.target/i386/auto-init-7.c | 33 +++ gcc/testsuite/gcc.target/i386/auto-init-8.c | 35 +++ .../gcc.target/i386/auto-init-padding-1.c | 19 ++ .../gcc.target/i386/auto-init-padding-10.c | 21 ++ .../gcc.target/i386/auto-init-padding-11.c | 26 +++ .../gcc.target/i386/auto-init-padding-12.c | 26 +++ .../gcc.target/i386/auto-init-padding-2.c | 19 ++ .../gcc.target/i386/auto-init-padding-3.c | 30 +++ .../gcc.target/i386/auto-init-padding-4.c | 28 +++ .../gcc.target/i386/auto-init-padding-5.c | 21 ++ .../gcc.target/i386/auto-init-padding-6.c | 22 ++ .../gcc.target/i386/auto-init-padding-7.c | 22 ++ .../gcc.target/i386/auto-init-padding-8.c | 22 ++ .../gcc.target/i386/auto-init-padding-9.c | 22 ++ gcc/tree-cfg.c | 47 +++- gcc/tree-sra.c | 124 ++++++++++- gcc/tree-ssa-structalias.c | 3 + gcc/tree-ssa-uninit.c | 48 +++++ gcc/tree-ssa.c | 40 ++++ gcc/tree.c | 13 ++ 118 files changed, 3131 insertions(+), 44 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 99548627761..3e57eb03af0 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -142,7 +142,6 @@ static rtx 
expand_builtin_strcpy (tree, rtx); static rtx expand_builtin_strcpy_args (tree, tree, tree, rtx); static rtx expand_builtin_stpcpy (tree, rtx, machine_mode); static rtx expand_builtin_strncpy (tree, rtx); -static rtx expand_builtin_memset (tree, rtx, machine_mode); static rtx expand_builtin_memset_args (tree, tree, tree, rtx, machine_mode, tree); static rtx expand_builtin_bzero (tree); static rtx expand_builtin_strlen (tree, rtx, machine_mode); @@ -3872,7 +3871,7 @@ builtin_memset_gen_str (void *data, void *prev, try to get the result in TARGET, if convenient (and in mode MODE if that's convenient). */ -static rtx +rtx expand_builtin_memset (tree exp, rtx target, machine_mode mode) { if (!validate_arglist (exp, diff --git a/gcc/builtins.h b/gcc/builtins.h index 16b47ac1a7b..d330b78e591 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -114,6 +114,7 @@ extern rtx builtin_strncpy_read_str (void *, void *, HOST_WIDE_INT, fixed_size_mode); extern rtx builtin_memset_read_str (void *, void *, HOST_WIDE_INT, fixed_size_mode); +extern rtx expand_builtin_memset (tree, rtx, machine_mode); extern rtx expand_builtin_saveregs (void); extern tree std_build_builtin_va_list (void); extern tree std_fn_abi_va_list (tree); diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index d14e9c441b3..007b928c54b 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -83,6 +83,7 @@ static tree handle_artificial_attribute (tree *, tree, tree, int, bool *); static tree handle_flatten_attribute (tree *, tree, tree, int, bool *); static tree handle_error_attribute (tree *, tree, tree, int, bool *); static tree handle_used_attribute (tree *, tree, tree, int, bool *); +static tree handle_uninitialized_attribute (tree *, tree, tree, int, bool *); static tree handle_externally_visible_attribute (tree *, tree, tree, int, bool *); static tree handle_no_reorder_attribute (tree *, tree, tree, int, @@ -333,6 +334,8 @@ const struct attribute_spec c_common_attribute_table[] = 
handle_used_attribute, NULL }, { "unused", 0, 0, false, false, false, false, handle_unused_attribute, NULL }, + { "uninitialized", 0, 0, true, false, false, false, + handle_uninitialized_attribute, NULL }, { "retain", 0, 0, true, false, false, false, handle_retain_attribute, NULL }, { "externally_visible", 0, 0, true, false, false, false, @@ -1617,6 +1620,30 @@ handle_retain_attribute (tree *pnode, tree name, tree ARG_UNUSED (args), return NULL_TREE; } +/* Handle an "uninitialized" attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_uninitialized_attribute (tree *node, tree name, tree ARG_UNUSED (args), + int ARG_UNUSED (flags), bool *no_add_attrs) +{ + tree decl = *node; + if (!VAR_P (decl)) + { + warning (OPT_Wattributes, "%qE attribute ignored because %qD " + "is not a variable", name, decl); + *no_add_attrs = true; + } + else if (TREE_STATIC (decl) || DECL_EXTERNAL (decl)) + { + warning (OPT_Wattributes, "%qE attribute ignored because %qD " + "is not a local variable", name, decl); + *no_add_attrs = true; + } + + return NULL_TREE; +} + /* Handle a "externally_visible" attribute; arguments as in struct attribute_spec.handler. */ diff --git a/gcc/common.opt b/gcc/common.opt index f103a7de004..b921f5e3b25 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -3081,6 +3081,22 @@ ftree-scev-cprop Common Var(flag_tree_scev_cprop) Init(1) Optimization Enable copy propagation of scalar-evolution information. +ftrivial-auto-var-init= +Common Joined RejectNegative Enum(auto_init_type) Var(flag_auto_var_init) Init(AUTO_INIT_UNINITIALIZED) Optimization +-ftrivial-auto-var-init=[uninitialized|pattern|zero] Add initializations to automatic variables. 
+ +Enum +Name(auto_init_type) Type(enum auto_init_type) UnknownError(unrecognized automatic variable initialization type %qs) + +EnumValue +Enum(auto_init_type) String(uninitialized) Value(AUTO_INIT_UNINITIALIZED) + +EnumValue +Enum(auto_init_type) String(pattern) Value(AUTO_INIT_PATTERN) + +EnumValue +Enum(auto_init_type) String(zero) Value(AUTO_INIT_ZERO) + ; -fverbose-asm causes extra commentary information to be produced in ; the generated assembly code (to make it more readable). This option ; is generally only of use to those who actually need to read the diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 52bc4e5b76e..8b324a097a4 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7610,6 +7610,22 @@ will be placed in new, unique sections. This additional functionality requires Binutils version 2.36 or later. +@item uninitialized +@cindex @code{uninitialized} variable attribute +This attribute, attached to a variable with automatic storage, means that +the variable should not be automatically initialized by the compiler when +the option @code{-ftrivial-auto-var-init} presents. + +With the option @code{-ftrivial-auto-var-init}, all the automatic variables +that do not have explicit initializers will be initialized by the compiler. +These additional compiler initializations might incur run-time overhead, +sometimes dramatically. This attribute can be used to mark some variables +to be excluded from such automatical initialization in order to reduce runtime +overhead. + +This attribute has no effect when the option @code{-ftrivial-auto-var-init} +does not present. + @item vector_size (@var{bytes}) @cindex @code{vector_size} variable attribute This attribute specifies the vector size for the type of the declared diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d4b3a66ee4f..b08a5eb4d73 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -573,9 +573,9 @@ Objective-C and Objective-C++ Dialects}. 
-ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra @gol -ftree-switch-conversion -ftree-tail-merge @gol --ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol --funit-at-a-time -funroll-all-loops -funroll-loops @gol --funsafe-math-optimizations -funswitch-loops @gol +-ftree-ter -ftree-vectorize -ftree-vrp -ftrivial-auto-var-init @gol +-funconstrained-commons -funit-at-a-time -funroll-all-loops @gol +-funroll-loops -funsafe-math-optimizations -funswitch-loops @gol -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol -fweb -fwhole-program -fwpa -fuse-linker-plugin -fzero-call-used-regs @gol --param @var{name}=@var{value} @@ -11843,6 +11843,41 @@ Perform basic block vectorization on trees. This flag is enabled by default at @option{-O3} and by @option{-ftree-vectorize}, @option{-fprofile-use}, and @option{-fauto-profile}. +@item -ftrivial-auto-var-init=@var{choice} +@opindex ftrivial-auto-var-init +Initialize automatic variables with either a pattern or with zeroes to increase +the security and predictability of a program by preventing uninitialized memory +disclosure and use. +GCC still considers an automatic variable that doesn't have an explicit +initializer as uninitialized, -Wuninitialized will still report warning messages +on such automatic variables. +With this option, GCC will also initialize any padding of automatic variables +that have structure or union types to zeroes. + +The three values of @var{choice} are: + +@itemize @bullet +@item +@samp{uninitialized} doesn't initialize any automatic variables. +This is C and C++'s default. + +@item +@samp{pattern} Initialize automatic variables with values which will likely +transform logic bugs into crashes down the line, are easily recognized in a +crash dump and without being values that programmers can rely on for useful +program semantics. 
+The current value is byte-repeatable pattern with byte "0xFE". +The values used for pattern initialization might be changed in the future. + +@item +@samp{zero} Initialize automatic variables with zeroes. +@end itemize + +The default is @samp{uninitialized}. + +You can control this behavior for a specific variable by using the variable +attribute @code{uninitialized} (@pxref{Variable Attributes}). + @item -fvect-cost-model=@var{model} @opindex fvect-cost-model Alter the cost model used for vectorization. The @var{model} argument diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 45a2338d5f6..5bd1f771c8b 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -281,6 +281,13 @@ enum vect_cost_model { VECT_COST_MODEL_DEFAULT = 1 }; +/* Automatic variable initialization type. */ +enum auto_init_type { + AUTO_INIT_UNINITIALIZED = 0, + AUTO_INIT_PATTERN = 1, + AUTO_INIT_ZERO = 2 +}; + /* Different instrumentation modes. */ enum sanitize_code { /* AddressSanitizer. */ diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index 3f2c176cff6..dd0e6b5daff 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -4518,12 +4518,14 @@ clear_padding_add_padding (clear_padding_struct *buf, } } -static void clear_padding_type (clear_padding_struct *, tree, HOST_WIDE_INT); +static void clear_padding_type (clear_padding_struct *, tree, + HOST_WIDE_INT, bool); /* Clear padding bits of union type TYPE. 
*/ static void -clear_padding_union (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) +clear_padding_union (clear_padding_struct *buf, tree type, + HOST_WIDE_INT sz, bool for_auto_init) { clear_padding_struct *union_buf; HOST_WIDE_INT start_off = 0, next_off = 0; @@ -4568,7 +4570,7 @@ clear_padding_union (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) continue; gcc_assert (TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE && !COMPLETE_TYPE_P (TREE_TYPE (field))); - if (!buf->clear_in_mask) + if (!buf->clear_in_mask && !for_auto_init) error_at (buf->loc, "flexible array member %qD does not have " "well defined padding bits for %qs", field, "__builtin_clear_padding"); @@ -4579,7 +4581,7 @@ clear_padding_union (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) union_buf->off = start_off; union_buf->size = start_size; memset (union_buf->buf, ~0, start_size); - clear_padding_type (union_buf, TREE_TYPE (field), fldsz); + clear_padding_type (union_buf, TREE_TYPE (field), fldsz, for_auto_init); clear_padding_add_padding (union_buf, sz - fldsz); clear_padding_flush (union_buf, true); } @@ -4649,7 +4651,8 @@ clear_padding_type_may_have_padding_p (tree type) __builtin_clear_padding (buf.base); */ static void -clear_padding_emit_loop (clear_padding_struct *buf, tree type, tree end) +clear_padding_emit_loop (clear_padding_struct *buf, tree type, + tree end, bool for_auto_init) { tree l1 = create_artificial_label (buf->loc); tree l2 = create_artificial_label (buf->loc); @@ -4660,7 +4663,7 @@ clear_padding_emit_loop (clear_padding_struct *buf, tree type, tree end) g = gimple_build_label (l1); gimple_set_location (g, buf->loc); gsi_insert_before (buf->gsi, g, GSI_SAME_STMT); - clear_padding_type (buf, type, buf->sz); + clear_padding_type (buf, type, buf->sz, for_auto_init); clear_padding_flush (buf, true); g = gimple_build_assign (buf->base, POINTER_PLUS_EXPR, buf->base, size_int (buf->sz)); @@ -4678,10 +4681,16 @@ clear_padding_emit_loop (clear_padding_struct 
*buf, tree type, tree end) } /* Clear padding bits for TYPE. Called recursively from - gimple_fold_builtin_clear_padding. */ + gimple_fold_builtin_clear_padding. If FOR_AUTO_INIT is true, + the __builtin_clear_padding is not called by the end user, + instead, it's inserted by the compiler to initialize the + paddings of automatic variable. Therefore, we should not + emit the error messages for flexible array members to confuse + the end user. */ static void -clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) +clear_padding_type (clear_padding_struct *buf, tree type, + HOST_WIDE_INT sz, bool for_auto_init) { switch (TREE_CODE (type)) { @@ -4765,7 +4774,7 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) continue; gcc_assert (TREE_CODE (ftype) == ARRAY_TYPE && !COMPLETE_TYPE_P (ftype)); - if (!buf->clear_in_mask) + if (!buf->clear_in_mask && !for_auto_init) error_at (buf->loc, "flexible array member %qD does not " "have well defined padding bits for %qs", field, "__builtin_clear_padding"); @@ -4781,7 +4790,8 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) gcc_assert (pos >= 0 && fldsz >= 0 && pos >= cur_pos); clear_padding_add_padding (buf, pos - cur_pos); cur_pos = pos; - clear_padding_type (buf, TREE_TYPE (field), fldsz); + clear_padding_type (buf, TREE_TYPE (field), + fldsz, for_auto_init); cur_pos += fldsz; } } @@ -4821,7 +4831,7 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) buf->align = TYPE_ALIGN (elttype); buf->off = 0; buf->size = 0; - clear_padding_emit_loop (buf, elttype, end); + clear_padding_emit_loop (buf, elttype, end, for_auto_init); buf->base = base; buf->sz = prev_sz; buf->align = prev_align; @@ -4831,10 +4841,10 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) break; } for (HOST_WIDE_INT i = 0; i < nelts; i++) - clear_padding_type (buf, TREE_TYPE (type), fldsz); + clear_padding_type (buf, TREE_TYPE 
(type), fldsz, for_auto_init); break; case UNION_TYPE: - clear_padding_union (buf, type, sz); + clear_padding_union (buf, type, sz, for_auto_init); break; case REAL_TYPE: gcc_assert ((size_t) sz <= clear_padding_unit); @@ -4858,14 +4868,14 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) break; case COMPLEX_TYPE: fldsz = int_size_in_bytes (TREE_TYPE (type)); - clear_padding_type (buf, TREE_TYPE (type), fldsz); - clear_padding_type (buf, TREE_TYPE (type), fldsz); + clear_padding_type (buf, TREE_TYPE (type), fldsz, for_auto_init); + clear_padding_type (buf, TREE_TYPE (type), fldsz, for_auto_init); break; case VECTOR_TYPE: nelts = TYPE_VECTOR_SUBPARTS (type).to_constant (); fldsz = int_size_in_bytes (TREE_TYPE (type)); for (HOST_WIDE_INT i = 0; i < nelts; i++) - clear_padding_type (buf, TREE_TYPE (type), fldsz); + clear_padding_type (buf, TREE_TYPE (type), fldsz, for_auto_init); break; case NULLPTR_TYPE: gcc_assert ((size_t) sz <= clear_padding_unit); @@ -4901,7 +4911,7 @@ clear_type_padding_in_mask (tree type, unsigned char *mask) buf.sz = int_size_in_bytes (type); buf.size = 0; buf.union_ptr = mask; - clear_padding_type (&buf, type, buf.sz); + clear_padding_type (&buf, type, buf.sz, false); clear_padding_flush (&buf, true); } @@ -4911,9 +4921,13 @@ static bool gimple_fold_builtin_clear_padding (gimple_stmt_iterator *gsi) { gimple *stmt = gsi_stmt (*gsi); - gcc_assert (gimple_call_num_args (stmt) == 2); + gcc_assert (gimple_call_num_args (stmt) == 3); tree ptr = gimple_call_arg (stmt, 0); tree typearg = gimple_call_arg (stmt, 1); + /* the 3rd argument of __builtin_clear_padding is to distinguish whether + this call is made by the user or by the compiler for automatic variable + initialization. 
*/ + bool for_auto_init = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2)); tree type = TREE_TYPE (TREE_TYPE (typearg)); location_t loc = gimple_location (stmt); clear_padding_struct buf; @@ -4970,7 +4984,7 @@ gimple_fold_builtin_clear_padding (gimple_stmt_iterator *gsi) buf.sz = eltsz; buf.align = TYPE_ALIGN (elttype); buf.alias_type = build_pointer_type (elttype); - clear_padding_emit_loop (&buf, elttype, end); + clear_padding_emit_loop (&buf, elttype, end, for_auto_init); } } else @@ -4983,7 +4997,7 @@ gimple_fold_builtin_clear_padding (gimple_stmt_iterator *gsi) gsi_insert_before (gsi, g, GSI_SAME_STMT); } buf.alias_type = build_pointer_type (type); - clear_padding_type (&buf, type, buf.sz); + clear_padding_type (&buf, type, buf.sz, for_auto_init); clear_padding_flush (&buf, true); } diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 99d1c7fcce4..3314f76cf3f 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1743,6 +1743,94 @@ force_labels_r (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) return NULL_TREE; } +/* Generate an initialization to automatic variable DECL based on INIT_TYPE. + Build a call to internal const function DEFERRED_INIT: + 1st argument: SIZE of the DECL; + 2nd argument: INIT_TYPE; + 3rd argument: IS_VLA, 0 NO, 1 YES; + + as LHS = DEFERRED_INIT (SIZE of the DECL, INIT_TYPE, IS_VLA) + if IS_VLA is false, the LHS is the DECL itself, + if IS_VLA is true, the LHS is a MEM_REF whose address is the pointer + to this DECL. 
*/ +static void +gimple_add_init_for_auto_var (tree decl, + enum auto_init_type init_type, + bool is_vla, + gimple_seq *seq_p) +{ + gcc_assert (auto_var_p (decl)); + gcc_assert (init_type > AUTO_INIT_UNINITIALIZED); + location_t loc = EXPR_LOCATION (decl); + tree decl_size = TYPE_SIZE_UNIT (TREE_TYPE (decl)); + + tree init_type_node + = build_int_cst (integer_type_node, (int) init_type); + tree is_vla_node + = build_int_cst (integer_type_node, (int) is_vla); + + tree call = build_call_expr_internal_loc (loc, IFN_DEFERRED_INIT, + TREE_TYPE (decl), 3, + decl_size, init_type_node, + is_vla_node); + + gimplify_assign (decl, call, seq_p); +} + +/* Generate padding initialization for automatic vairable DECL. + C guarantees that brace-init with fewer initializers than members + aggregate will initialize the rest of the aggregate as-if it were + static initialization. In turn static initialization guarantees + that padding is initialized to zero. So, we always initialize paddings + to zeroes regardless INIT_TYPE. + To do the padding initialization, we insert a call to + __BUILTIN_CLEAR_PADDING (&decl, 0, for_auto_init = true). + Note, we add an additional dummy argument for __BUILTIN_CLEAR_PADDING, + 'for_auto_init' to distinguish whether this call is for automatic + variable initialization or not. + */ +static void +gimple_add_padding_init_for_auto_var (tree decl, bool is_vla, + gimple_seq *seq_p) +{ + tree addr_of_decl = NULL_TREE; + bool for_auto_init = true; + tree fn = builtin_decl_explicit (BUILT_IN_CLEAR_PADDING); + + if (is_vla) + { + /* The temporary address variable for this vla should be + created in gimplify_vla_decl. 
*/ + gcc_assert (DECL_HAS_VALUE_EXPR_P (decl)); + gcc_assert (TREE_CODE (DECL_VALUE_EXPR (decl)) == INDIRECT_REF); + addr_of_decl = TREE_OPERAND (DECL_VALUE_EXPR (decl), 0); + } + else + { + mark_addressable (decl); + addr_of_decl = build_fold_addr_expr (decl); + } + + gimple *call = gimple_build_call (fn, + 3, addr_of_decl, + build_zero_cst (TREE_TYPE (addr_of_decl)), + build_int_cst (integer_type_node, + (int) for_auto_init)); + gimplify_seq_add_stmt (seq_p, call); +} + +/* Return true if the DECL need to be automaticly initialized by the + compiler. */ +static bool +is_var_need_auto_init (tree decl) +{ + if (auto_var_p (decl) + && (flag_auto_var_init > AUTO_INIT_UNINITIALIZED) + && (!lookup_attribute ("uninitialized", DECL_ATTRIBUTES (decl)))) + return true; + return false; +} + /* Gimplify a DECL_EXPR node *STMT_P by making any necessary allocation and initialization explicit. */ @@ -1840,6 +1928,26 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p) as they may contain a label address. */ walk_tree (&init, force_labels_r, NULL, NULL); } + /* When there is no explicit initializer, if the user requested, + We should insert an artifical initializer for this automatic + variable. */ + else if (is_var_need_auto_init (decl)) + { + gimple_add_init_for_auto_var (decl, + flag_auto_var_init, + is_vla, + seq_p); + /* The expanding of a call to the above .DEFERRED_INIT will apply + block initialization to the whole space covered by this variable. + As a result, all the paddings will be initialized to zeroes + for zero initialization and 0xFE byte-repeatable patterns for + pattern initialization. + In order to make the paddings as zeroes for pattern init, We + should add a call to __builtin_clear_padding to clear the + paddings to zero in compatiple with CLANG. 
*/ + if (flag_auto_var_init == AUTO_INIT_PATTERN) + gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p); + } } return GS_ALL_DONE; @@ -3411,11 +3519,15 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) { /* Remember the original type of the argument in an internal dummy second argument, as in GIMPLE pointer conversions are - useless. */ + useless. also mark this call as not for automatic initialization + in the internal dummy third argument. */ p = CALL_EXPR_ARG (*expr_p, 0); + bool for_auto_init = false; *expr_p - = build_call_expr_loc (EXPR_LOCATION (*expr_p), fndecl, 2, p, - build_zero_cst (TREE_TYPE (p))); + = build_call_expr_loc (EXPR_LOCATION (*expr_p), fndecl, 3, p, + build_zero_cst (TREE_TYPE (p)), + build_int_cst (integer_type_node, + (int) for_auto_init)); return GS_OK; } break; @@ -4872,6 +4984,9 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, tree object, ctor, type; enum gimplify_status ret; vec<constructor_elt, va_gc> *elts; + bool cleared = false; + bool is_empty_ctor = false; + bool is_init_expr = (TREE_CODE (*expr_p) == INIT_EXPR); gcc_assert (TREE_CODE (TREE_OPERAND (*expr_p, 1)) == CONSTRUCTOR); @@ -4914,7 +5029,7 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, struct gimplify_init_ctor_preeval_data preeval_data; HOST_WIDE_INT num_ctor_elements, num_nonzero_elements; HOST_WIDE_INT num_unique_nonzero_elements; - bool cleared, complete_p, valid_const_initializer; + bool complete_p, valid_const_initializer; /* Aggregate types must lower constructors to initialization of individual elements. 
The exception is that a CONSTRUCTOR node @@ -4923,6 +5038,7 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, { if (notify_temp_creation) return GS_OK; + is_empty_ctor = true; break; } @@ -5248,13 +5364,28 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, if (want_value) { *expr_p = object; - return GS_OK; + ret = GS_OK; } else { *expr_p = NULL; - return GS_ALL_DONE; - } + ret = GS_ALL_DONE; + } + + /* If the user requests to initialize automatic variables, we + should initialize paddings inside the variable. Add a call to + __BUILTIN_CLEAR_PADDING (&object, 0, for_auto_init = true) to + initialize paddings of object always to zero regardless of + INIT_TYPE. Note, we will not insert this call if the aggregate + variable has be completely cleared already or it's initialized + with an empty constructor. */ + if (is_init_expr + && ((AGGREGATE_TYPE_P (type) && !cleared && !is_empty_ctor) + || !AGGREGATE_TYPE_P (type)) + && is_var_need_auto_init (object)) + gimple_add_padding_init_for_auto_var (object, false, pre_p); + + return ret; } /* Given a pointer value OP0, return a simplified version of an @@ -5395,10 +5526,12 @@ gimplify_modify_expr_rhs (tree *expr_p, tree *from_p, tree *to_p, crack at this before we break it down. */ if (ret != GS_UNHANDLED) break; + /* If we're initializing from a CONSTRUCTOR, break this into individual MODIFY_EXPRs. */ - return gimplify_init_constructor (expr_p, pre_p, post_p, want_value, - false); + ret = gimplify_init_constructor (expr_p, pre_p, post_p, want_value, + false); + return ret; case COND_EXPR: /* If we're assigning to a non-register type, push the assignment diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 1360a00f0b9..ada2a820ff1 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -53,6 +53,9 @@ along with GCC; see the file COPYING3. If not see #include "rtl-iter.h" #include "gimple-range.h" +/* For lang_hooks.types.type_for_mode. 
*/ +#include "langhooks.h" + /* The names of each internal function, indexed by function number. */ const char *const internal_fn_name_array[] = { #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) #CODE, @@ -2977,6 +2980,102 @@ expand_UNIQUE (internal_fn, gcall *stmt) emit_insn (pattern); } +/* Expand the IFN_DEFERRED_INIT function: + LHS = DEFERRED_INIT (SIZE of the DECL, INIT_TYPE, IS_VLA); + + if IS_VLA is false, the LHS is the DECL itself, + if IS_VLA is true, the LHS is a MEM_REF whose address is the pointer + to this DECL. + + Initialize the LHS with zero/pattern according to its second argument + INIT_TYPE: + if INIT_TYPE is AUTO_INIT_ZERO, use zeroes to initialize; + if INIT_TYPE is AUTO_INIT_PATTERN, use 0xFE byte-repeatable pattern + to initialize; + The LHS variable is initialized including paddings. + The reasons to choose 0xFE for pattern initialization are: + 1. It is a non-canonical virtual address on x86_64, and at the + high end of the i386 kernel address space. + 2. It is a very large float value (-1.694739530317379e+38). + 3. It is also an unusual number for integers. */ +#define INIT_PATTERN_VALUE 0xFE +static void +expand_DEFERRED_INIT (internal_fn, gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + tree var_size = gimple_call_arg (stmt, 0); + enum auto_init_type init_type + = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); + bool is_vla = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2)); + bool reg_lhs = true; + + tree var_type = TREE_TYPE (lhs); + gcc_assert (init_type > AUTO_INIT_UNINITIALIZED); + + if (DECL_P (lhs)) + { + rtx tem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + reg_lhs = !MEM_P (tem); + } + else if (TREE_CODE (lhs) == SSA_NAME) + reg_lhs = true; + else + { + gcc_assert (is_vla); + reg_lhs = false; + } + + + if (!reg_lhs) + { + /* If this is a VLA or the variable is not in register, + expand to a memset to initialize it. 
*/ + + mark_addressable (lhs); + tree var_addr = build_fold_addr_expr (lhs); + + tree value = (init_type == AUTO_INIT_PATTERN) ? + build_int_cst (integer_type_node, + INIT_PATTERN_VALUE) : + integer_zero_node; + tree m_call = build_call_expr (builtin_decl_implicit (BUILT_IN_MEMSET), + 3, var_addr, value, var_size); + /* Expand this memset call. */ + expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type)); + } + else + { + /* If this variable is in a register, use expand_assignment might + generate better code. */ + tree init = build_zero_cst (var_type); + unsigned HOST_WIDE_INT total_bytes + = tree_to_uhwi (TYPE_SIZE_UNIT (var_type)); + + if (init_type == AUTO_INIT_PATTERN) + { + tree alt_type = NULL_TREE; + if (!can_native_interpret_type_p (var_type)) + { + alt_type </cut>
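For readers following the truncated GCC patch above: its comments state that pattern initialization fills the whole variable with the byte 0xFE, and that a follow-up call to __builtin_clear_padding then sets the padding back to zero. Below is a minimal sketch of that end result, done by hand for one fixed struct. The names `struct sample` and `pattern_init_sample` are hypothetical and not part of the patch, and the compiler does this work in GIMPLE rather than with offsetof arithmetic, so treat this as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same byte value the patch names INIT_PATTERN_VALUE.  */
#define INIT_PATTERN_VALUE 0xFE

/* Hypothetical example type: on most ABIs there are padding bytes
   between 'tag' and 'value'.  */
struct sample
{
  char tag;
  int value;
};

/* Fill *S with the 0xFE pattern, then zero every byte that does not
   belong to a member, mimicking "pattern init + clear padding".  */
static void
pattern_init_sample (struct sample *s)
{
  unsigned char *bytes = (unsigned char *) s;
  size_t pad_start = offsetof (struct sample, tag) + sizeof s->tag;
  size_t pad_end = offsetof (struct sample, value);
  size_t tail_start = pad_end + sizeof s->value;

  assert (pad_start <= pad_end && tail_start <= sizeof *s);

  /* Step 1: block-initialize the whole object, padding included.  */
  memset (bytes, INIT_PATTERN_VALUE, sizeof *s);
  /* Step 2: clear the padding between the members and after the
     last member (the latter is usually empty for this struct).  */
  memset (bytes + pad_start, 0, pad_end - pad_start);
  memset (bytes + tail_start, 0, sizeof *s - tail_start);
}
```

After the call, the bytes of `tag` and `value` hold 0xFE while any padding bytes read as zero, which is the layout the patch comments describe for pattern initialization.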

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O3 - Build # 22 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

Culprit:
<cut>
commit 10c982e0b3e6d46d1fe288d7dbe0a393c65a640f
Author: Simon Pilgrim <llvm-dev(a)redking.me.uk>
Date: Mon Aug 23 21:06:06 2021 +0100

    Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)

    Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated

    As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

    I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

    https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

    Differential Revision: https://reviews.llvm.org/D106450
</cut>

Results regressed to (for first_bad == 10c982e0b3e6d46d1fe288d7dbe0a393c65a640f)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3 artifacts/build-10c982e0b3e6d46d1fe288d7dbe0a393c65a640f/results_id: 1
# 447.dealII,[.] _ZNK13LaplaceSolver6SolverILi3EE15assemble_mat regressed by 120
# 464.h264ref,h264ref_base.default regressed by 104
# 464.h264ref,[.] FastFullPelBlockMotionSearch regressed by 135

from (for last_good == 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3 artifacts/build-50f4ae58eb136bc9d802cb98f02b6ff237eb61e0/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4963
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4956
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-10c982e0b3e6d46d1fe288d7dbe0a393c65a640f
cd investigate-llvm-10c982e0b3e6d46d1fe288d7dbe0a393c65a640f

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 10c982e0b3e6d46d1fe288d7dbe0a393c65a640f
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit 10c982e0b3e6d46d1fe288d7dbe0a393c65a640f
Author: Simon Pilgrim <llvm-dev(a)redking.me.uk>
Date: Mon Aug 23 21:06:06 2021 +0100

    Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)

    Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated

    As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

    I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

    https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

Differential Revision: https://reviews.llvm.org/D106450 --- .../InstCombine/InstructionCombining.cpp | 21 ----- .../InstCombine/gep-combine-loop-invariant.ll | 12 +-- llvm/test/Transforms/InstCombine/gep-custom-dl.ll | 4 +- llvm/test/Transforms/InstCombine/getelementptr.ll | 4 +- llvm/test/Transforms/InstCombine/select-gep.ll | 12 +-- llvm/test/Transforms/InstCombine/shift.ll | 4 +- .../LoopVectorize/AArch64/sve-vector-reverse.ll | 100 ++++++++++----------- .../LoopVectorize/AArch64/vector-reverse-mask4.ll | 54 +++++------ .../Transforms/LoopVectorize/ARM/mve-reductions.ll | 26 +++--- .../X86/x86-interleaved-accesses-masked-group.ll | 60 +++++++------ .../x86-interleaved-store-accesses-with-gaps.ll | 58 ++++++------ .../LoopVectorize/consecutive-ptr-uniforms.ll | 4 +- .../LoopVectorize/interleaved-accesses.ll | 62 +++++++------ 13 files changed, 210 insertions(+), 211 deletions(-) diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp index 1026b9da44e9..48645b484fd2 100644 --- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -2131,27 +2131,6 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) { } } } - - // Guard the gep(gep) fold so we don't create an add inside a loop - // when there wasn't an equivalent instruction there before. 
- bool DifferentLoops = false; - if (LI) - if (auto *GEPLoop = LI->getLoopFor(GEP.getParent())) - if (auto *SrcOpI = dyn_cast<Instruction>(Src)) - if (LI->getLoopFor(SrcOpI->getParent()) != GEPLoop) - DifferentLoops = true; - - // Fold (gep(gep(Ptr,Idx0),Idx1) -> gep(Ptr,add(Idx0,Idx1)) - if (!DifferentLoops && GO1->getType() == SO1->getType()) { - bool NewInBounds = GEP.isInBounds() && Src->isInBounds(); - auto *NewIdx = - Builder.CreateAdd(GO1, SO1, GEP.getName() + ".idx", - /*HasNUW*/ false, /*HasNSW*/ NewInBounds); - auto *NewGEP = GetElementPtrInst::Create( - GEPEltType, Src->getPointerOperand(), {NewIdx}); - NewGEP->setIsInBounds(NewInBounds); - return NewGEP; - } } // Note that if our source is a gep chain itself then we wait for that diff --git a/llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll b/llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll index dfa664fde208..f9aac12cfb1f 100644 --- a/llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll +++ b/llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll @@ -8,10 +8,10 @@ define i32 @foo(i8* nocapture readnone %match, i32 %cur_match, i32 %best_len, i3 ; CHECK-LABEL: @foo( ; CHECK-NEXT: entry: ; CHECK-NEXT: [[IDX_EXT2:%.*]] = zext i32 [[CUR_MATCH:%.*]] to i64 +; CHECK-NEXT: [[ADD_PTR4:%.*]] = getelementptr inbounds i8, i8* [[WIN:%.*]], i64 [[IDX_EXT2]] ; CHECK-NEXT: [[IDX_EXT1:%.*]] = zext i32 [[BEST_LEN:%.*]] to i64 -; CHECK-NEXT: [[ADD_PTR25_IDX:%.*]] = add nuw nsw i64 [[IDX_EXT1]], [[IDX_EXT2]] -; CHECK-NEXT: [[ADD_PTR36_IDX:%.*]] = add nsw i64 [[ADD_PTR25_IDX]], -1 -; CHECK-NEXT: [[ADD_PTR36:%.*]] = getelementptr inbounds i8, i8* [[WIN:%.*]], i64 [[ADD_PTR36_IDX]] +; CHECK-NEXT: [[ADD_PTR25:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR4]], i64 [[IDX_EXT1]] +; CHECK-NEXT: [[ADD_PTR36:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR25]], i64 -1 ; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ADD_PTR36]] to i32* ; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TMP0]], 
align 4 ; CHECK-NEXT: [[CMP7:%.*]] = icmp eq i32 [[TMP1]], [[SCAN_END:%.*]] @@ -20,9 +20,9 @@ define i32 @foo(i8* nocapture readnone %match, i32 %cur_match, i32 %best_len, i3 ; CHECK-NEXT: br label [[IF_THEN:%.*]] ; CHECK: do.body: ; CHECK-NEXT: [[IDX_EXT:%.*]] = zext i32 [[TMP4:%.*]] to i64 -; CHECK-NEXT: [[ADD_PTR2_IDX:%.*]] = add nuw nsw i64 [[IDX_EXT]], [[IDX_EXT1]] -; CHECK-NEXT: [[ADD_PTR3_IDX:%.*]] = add nsw i64 [[ADD_PTR2_IDX]], -1 -; CHECK-NEXT: [[ADD_PTR3:%.*]] = getelementptr inbounds i8, i8* [[WIN]], i64 [[ADD_PTR3_IDX]] +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[WIN]], i64 [[IDX_EXT1]] +; CHECK-NEXT: [[ADD_PTR2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 -1 +; CHECK-NEXT: [[ADD_PTR3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR2]], i64 [[IDX_EXT]] ; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ADD_PTR3]] to i32* ; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP2]], align 4 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP3]], [[SCAN_END]] diff --git a/llvm/test/Transforms/InstCombine/gep-custom-dl.ll b/llvm/test/Transforms/InstCombine/gep-custom-dl.ll index 0980451d8ec7..3de70f3c151c 100644 --- a/llvm/test/Transforms/InstCombine/gep-custom-dl.ll +++ b/llvm/test/Transforms/InstCombine/gep-custom-dl.ll @@ -75,8 +75,8 @@ define void @test_evaluate_gep_as_ptrs_array(i8 addrspace(2)* %B) { define i32* @test4(i32* %I, i32 %C, i32 %D) { ; CHECK-LABEL: @test4( -; CHECK-NEXT: [[B_IDX:%.*]] = add i32 [[D:%.*]], [[C:%.*]] -; CHECK-NEXT: [[B:%.*]] = getelementptr i32, i32* [[I:%.*]], i32 [[B_IDX]] +; CHECK-NEXT: [[A:%.*]] = getelementptr i32, i32* [[I:%.*]], i32 [[C:%.*]] +; CHECK-NEXT: [[B:%.*]] = getelementptr i32, i32* [[A]], i32 [[D:%.*]] ; CHECK-NEXT: ret i32* [[B]] ; %A = getelementptr i32, i32* %I, i32 %C diff --git a/llvm/test/Transforms/InstCombine/getelementptr.ll b/llvm/test/Transforms/InstCombine/getelementptr.ll index 688303d308c1..f2a336767fda 100644 --- a/llvm/test/Transforms/InstCombine/getelementptr.ll +++ 
b/llvm/test/Transforms/InstCombine/getelementptr.ll @@ -115,8 +115,8 @@ define void @test_evaluate_gep_as_ptrs_array(i8 addrspace(2)* %B) { define i32* @test7(i32* %I, i64 %C, i64 %D) { ; CHECK-LABEL: @test7( -; CHECK-NEXT: [[B_IDX:%.*]] = add i64 [[D:%.*]], [[C:%.*]] -; CHECK-NEXT: [[B:%.*]] = getelementptr i32, i32* [[I:%.*]], i64 [[B_IDX]] +; CHECK-NEXT: [[A:%.*]] = getelementptr i32, i32* [[I:%.*]], i64 [[C:%.*]] +; CHECK-NEXT: [[B:%.*]] = getelementptr i32, i32* [[A]], i64 [[D:%.*]] ; CHECK-NEXT: ret i32* [[B]] ; %A = getelementptr i32, i32* %I, i64 %C diff --git a/llvm/test/Transforms/InstCombine/select-gep.ll b/llvm/test/Transforms/InstCombine/select-gep.ll index 2e112fe93a4c..519f0a94a136 100644 --- a/llvm/test/Transforms/InstCombine/select-gep.ll +++ b/llvm/test/Transforms/InstCombine/select-gep.ll @@ -102,10 +102,10 @@ define i32* @test2b(i32* %p, i64 %x, i64 %y) { ; PR51069 define i32* @test2c(i32* %p, i64 %x, i64 %y) { ; CHECK-LABEL: @test2c( -; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[X:%.*]], [[Y:%.*]] +; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 [[X:%.*]] +; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[X]], [[Y:%.*]] ; CHECK-NEXT: [[SEL_IDX:%.*]] = select i1 [[ICMP]], i64 0, i64 6 -; CHECK-NEXT: [[SEL_IDX1:%.*]] = add i64 [[SEL_IDX]], [[X]] -; CHECK-NEXT: [[SEL:%.*]] = getelementptr i32, i32* [[P:%.*]], i64 [[SEL_IDX1]] +; CHECK-NEXT: [[SEL:%.*]] = getelementptr i32, i32* [[GEP1]], i64 [[SEL_IDX]] ; CHECK-NEXT: ret i32* [[SEL]] ; %gep1 = getelementptr inbounds i32, i32* %p, i64 %x @@ -118,10 +118,10 @@ define i32* @test2c(i32* %p, i64 %x, i64 %y) { ; PR51069 define i32* @test2d(i32* %p, i64 %x, i64 %y) { ; CHECK-LABEL: @test2d( -; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[X:%.*]], [[Y:%.*]] +; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 [[X:%.*]] +; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[X]], [[Y:%.*]] ; CHECK-NEXT: [[SEL_IDX:%.*]] = select i1 [[ICMP]], i64 6, i64 0 -; CHECK-NEXT: 
[[SEL_IDX1:%.*]] = add i64 [[SEL_IDX]], [[X]] -; CHECK-NEXT: [[SEL:%.*]] = getelementptr i32, i32* [[P:%.*]], i64 [[SEL_IDX1]] +; CHECK-NEXT: [[SEL:%.*]] = getelementptr i32, i32* [[GEP1]], i64 [[SEL_IDX]] ; CHECK-NEXT: ret i32* [[SEL]] ; %gep1 = getelementptr inbounds i32, i32* %p, i64 %x diff --git a/llvm/test/Transforms/InstCombine/shift.ll b/llvm/test/Transforms/InstCombine/shift.ll index f87de574bc99..2c5c4a7dbe1c 100644 --- a/llvm/test/Transforms/InstCombine/shift.ll +++ b/llvm/test/Transforms/InstCombine/shift.ll @@ -1774,10 +1774,10 @@ define void @ashr_out_of_range(i177* %A) { define void @ashr_out_of_range_1(i177* %A) { ; CHECK-LABEL: @ashr_out_of_range_1( ; CHECK-NEXT: [[L:%.*]] = load i177, i177* [[A:%.*]], align 4 +; CHECK-NEXT: [[G11:%.*]] = getelementptr i177, i177* [[A]], i64 -1 ; CHECK-NEXT: [[B24_LOBIT:%.*]] = ashr i177 [[L]], 175 ; CHECK-NEXT: [[TMP1:%.*]] = trunc i177 [[B24_LOBIT]] to i64 -; CHECK-NEXT: [[G62_IDX:%.*]] = add i64 [[TMP1]], -1 -; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, i177* [[A]], i64 [[G62_IDX]] +; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, i177* [[G11]], i64 [[TMP1]] ; CHECK-NEXT: store i177 0, i177* [[G62]], align 4 ; CHECK-NEXT: ret void ; diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll index d406c6de1571..5cd5af5dd9e6 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll @@ -34,30 +34,30 @@ define void @vector_reverse_f64(i64 %N, double* %a, double* %b) #0{ ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[INDEX]], -1 ; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[N]] -; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vscale.i32() -; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP6]], -8 -; CHECK-NEXT: [[TMP7:%.*]] = or i32 [[DOTNEG]], 1 -; 
CHECK-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64 -; CHECK-NEXT: [[DOTIDX:%.*]] = add nsw i64 [[TMP5]], [[TMP8]] -; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[DOTIDX]] -; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[TMP9]] to <vscale x 8 x double>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x double>, <vscale x 8 x double>* [[TMP10]], align 8, !alias.scope !0 +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP5]] +; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vscale.i32() +; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP7]], -8 +; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[DOTNEG]], 1 +; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[TMP6]], i64 [[TMP9]] +; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[TMP10]] to <vscale x 8 x double>* +; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x double>, <vscale x 8 x double>* [[TMP11]], align 8, !alias.scope !0 ; CHECK-NEXT: [[REVERSE:%.*]] = call <vscale x 8 x double> @llvm.experimental.vector.reverse.nxv8f64(<vscale x 8 x double> [[WIDE_LOAD]]) -; CHECK-NEXT: [[TMP11:%.*]] = fadd <vscale x 8 x double> [[REVERSE]], shufflevector (<vscale x 8 x double> insertelement (<vscale x 8 x double> poison, double 1.000000e+00, i32 0), <vscale x 8 x double> poison, <vscale x 8 x i32> zeroinitializer) -; CHECK-NEXT: [[REVERSE6:%.*]] = call <vscale x 8 x double> @llvm.experimental.vector.reverse.nxv8f64(<vscale x 8 x double> [[TMP11]]) -; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.vscale.i32() -; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP12]], -8 -; CHECK-NEXT: [[TMP13:%.*]] = or i32 [[DOTNEG7]], 1 -; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64 -; CHECK-NEXT: [[DOTIDX8:%.*]] = add nsw i64 [[TMP5]], [[TMP14]] -; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[DOTIDX8]] -; CHECK-NEXT: [[TMP16:%.*]] = bitcast double* [[TMP15]] to <vscale x 8 x 
double>*
-; CHECK-NEXT: store <vscale x 8 x double> [[REVERSE6]], <vscale x 8 x double>* [[TMP16]], align 8, !alias.scope !3, !noalias !0
-; CHECK-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP18:%.*]] = shl i64 [[TMP17]], 3
-; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP18]]
-; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: [[TMP12:%.*]] = fadd <vscale x 8 x double> [[REVERSE]], shufflevector (<vscale x 8 x double> insertelement (<vscale x 8 x double> poison, double 1.000000e+00, i32 0), <vscale x 8 x double> poison, <vscale x 8 x i32> zeroinitializer)
+; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[TMP5]]
+; CHECK-NEXT: [[REVERSE6:%.*]] = call <vscale x 8 x double> @llvm.experimental.vector.reverse.nxv8f64(<vscale x 8 x double> [[TMP12]])
+; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32()
+; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP14]], -8
+; CHECK-NEXT: [[TMP15:%.*]] = or i32 [[DOTNEG7]], 1
+; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64
+; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds double, double* [[TMP13]], i64 [[TMP16]]
+; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP17]] to <vscale x 8 x double>*
+; CHECK-NEXT: store <vscale x 8 x double> [[REVERSE6]], <vscale x 8 x double>* [[TMP18]], align 8, !alias.scope !3, !noalias !0
+; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP20:%.*]] = shl i64 [[TMP19]], 3
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]]
+; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
@@ -72,8 +72,8 @@ define void @vector_reverse_f64(i64 %N, double* %a, double* %b) #0{
 ; CHECK-NEXT: [[I_08_IN:%.*]] = phi i64 [ [[I_08:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[I_08]] = add nsw i64 [[I_08_IN]], -1
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[I_08]]
-; CHECK-NEXT: [[TMP20:%.*]] = load double, double* [[ARRAYIDX]], align 8
-; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP20]], 1.000000e+00
+; CHECK-NEXT: [[TMP22:%.*]] = load double, double* [[ARRAYIDX]], align 8
+; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP22]], 1.000000e+00
 ; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[I_08]]
 ; CHECK-NEXT: store double [[ADD]], double* [[ARRAYIDX1]], align 8
 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i64 [[I_08_IN]], 1
@@ -126,30 +126,30 @@ define void @vector_reverse_i64(i64 %N, i64* %a, i64* %b) #0 {
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[INDEX]], -1
 ; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[N]]
-; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vscale.i32()
-; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP6]], -8
-; CHECK-NEXT: [[TMP7:%.*]] = or i32 [[DOTNEG]], 1
-; CHECK-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64
-; CHECK-NEXT: [[DOTIDX:%.*]] = add nsw i64 [[TMP5]], [[TMP8]]
-; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, i64* [[B]], i64 [[DOTIDX]]
-; CHECK-NEXT: [[TMP10:%.*]] = bitcast i64* [[TMP9]] to <vscale x 8 x i64>*
-; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i64>, <vscale x 8 x i64>* [[TMP10]], align 8, !alias.scope !9
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, i64* [[B]], i64 [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vscale.i32()
+; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP7]], -8
+; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[DOTNEG]], 1
+; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64
+; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, i64* [[TMP6]], i64 [[TMP9]]
+; CHECK-NEXT: [[TMP11:%.*]] = bitcast i64* [[TMP10]] to <vscale x 8 x i64>*
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i64>, <vscale x 8 x i64>* [[TMP11]], align 8, !alias.scope !9
 ; CHECK-NEXT: [[REVERSE:%.*]] = call <vscale x 8 x i64> @llvm.experimental.vector.reverse.nxv8i64(<vscale x 8 x i64> [[WIDE_LOAD]])
-; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 8 x i64> [[REVERSE]], shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 1, i32 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer)
-; CHECK-NEXT: [[REVERSE6:%.*]] = call <vscale x 8 x i64> @llvm.experimental.vector.reverse.nxv8i64(<vscale x 8 x i64> [[TMP11]])
-; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.vscale.i32()
-; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP12]], -8
-; CHECK-NEXT: [[TMP13:%.*]] = or i32 [[DOTNEG7]], 1
-; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64
-; CHECK-NEXT: [[DOTIDX8:%.*]] = add nsw i64 [[TMP5]], [[TMP14]]
-; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 [[DOTIDX8]]
-; CHECK-NEXT: [[TMP16:%.*]] = bitcast i64* [[TMP15]] to <vscale x 8 x i64>*
-; CHECK-NEXT: store <vscale x 8 x i64> [[REVERSE6]], <vscale x 8 x i64>* [[TMP16]], align 8, !alias.scope !12, !noalias !9
-; CHECK-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP18:%.*]] = shl i64 [[TMP17]], 3
-; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP18]]
-; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
+; CHECK-NEXT: [[TMP12:%.*]] = add <vscale x 8 x i64> [[REVERSE]], shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 1, i32 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer)
+; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 [[TMP5]]
+; CHECK-NEXT: [[REVERSE6:%.*]] = call <vscale x 8 x i64> @llvm.experimental.vector.reverse.nxv8i64(<vscale x 8 x i64> [[TMP12]])
+; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32()
+; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP14]], -8
+; CHECK-NEXT: [[TMP15:%.*]] = or i32 [[DOTNEG7]], 1
+; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64
+; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, i64* [[TMP13]], i64 [[TMP16]]
+; CHECK-NEXT: [[TMP18:%.*]] = bitcast i64* [[TMP17]] to <vscale x 8 x i64>*
+; CHECK-NEXT: store <vscale x 8 x i64> [[REVERSE6]], <vscale x 8 x i64>* [[TMP18]], align 8, !alias.scope !12, !noalias !9
+; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP20:%.*]] = shl i64 [[TMP19]], 3
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]]
+; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
@@ -164,8 +164,8 @@ define void @vector_reverse_i64(i64 %N, i64* %a, i64* %b) #0 {
 ; CHECK-NEXT: [[I_09_IN:%.*]] = phi i64 [ [[I_09:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[I_09]] = add nsw i64 [[I_09_IN]], -1
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[B]], i64 [[I_09]]
-; CHECK-NEXT: [[TMP20:%.*]] = load i64, i64* [[ARRAYIDX]], align 8
-; CHECK-NEXT: [[ADD:%.*]] = add i64 [[TMP20]], 1
+; CHECK-NEXT: [[TMP22:%.*]] = load i64, i64* [[ARRAYIDX]], align 8
+; CHECK-NEXT: [[ADD:%.*]] = add i64 [[TMP22]], 1
 ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 [[I_09]]
 ; CHECK-NEXT: store i64 [[ADD]], i64* [[ARRAYIDX2]], align 8
 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i64 [[I_09_IN]], 1
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll b/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
index 4233760333ac..077d3c1f71b3 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
@@ -44,30 +44,32 @@ define void @vector_reverse_mask_v4i1(double* %a, double* %cond, i64 %N) #0 {
 ; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TMP3]] to <4 x double>*
 ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP4]], align 8, !alias.scope !0
 ; CHECK-NEXT: [[REVERSE:%.*]] = shufflevector <4 x double> [[WIDE_LOAD]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -7
-; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[TMP5]] to <4 x double>*
-; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x double>, <4 x double>* [[TMP6]], align 8, !alias.scope !0
+; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -4
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[TMP5]], i64 -3
+; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[TMP6]] to <4 x double>*
+; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x double>, <4 x double>* [[TMP7]], align 8, !alias.scope !0
 ; CHECK-NEXT: [[REVERSE7:%.*]] = shufflevector <4 x double> [[WIDE_LOAD6]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[TMP7:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer
-; CHECK-NEXT: [[TMP8:%.*]] = fcmp une <4 x double> [[REVERSE7]], zeroinitializer
-; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[TMP1]]
-; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[TMP9]], i64 -3
-; CHECK-NEXT: [[REVERSE8:%.*]] = shufflevector <4 x i1> [[TMP7]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[TMP10]] to <4 x double>*
-; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* nonnull [[TMP11]], i32 8, <4 x i1> [[REVERSE8]], <4 x double> poison), !alias.scope !3, !noalias !0
-; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds double, double* [[TMP9]], i64 -7
-; CHECK-NEXT: [[REVERSE10:%.*]] = shufflevector <4 x i1> [[TMP8]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[TMP13:%.*]] = bitcast double* [[TMP12]] to <4 x double>*
-; CHECK-NEXT: [[WIDE_MASKED_LOAD11:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* nonnull [[TMP13]], i32 8, <4 x i1> [[REVERSE10]], <4 x double> poison), !alias.scope !3, !noalias !0
-; CHECK-NEXT: [[TMP14:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
-; CHECK-NEXT: [[TMP15:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD11]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
-; CHECK-NEXT: [[TMP16:%.*]] = bitcast double* [[TMP10]] to <4 x double>*
-; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP14]], <4 x double>* [[TMP16]], i32 8, <4 x i1> [[REVERSE8]]), !alias.scope !3, !noalias !0
-; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP12]] to <4 x double>*
-; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP15]], <4 x double>* [[TMP17]], i32 8, <4 x i1> [[REVERSE10]]), !alias.scope !3, !noalias !0
+; CHECK-NEXT: [[TMP8:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer
+; CHECK-NEXT: [[TMP9:%.*]] = fcmp une <4 x double> [[REVERSE7]], zeroinitializer
+; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[TMP1]]
+; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds double, double* [[TMP10]], i64 -3
+; CHECK-NEXT: [[REVERSE8:%.*]] = shufflevector <4 x i1> [[TMP8]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>*
+; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* nonnull [[TMP12]], i32 8, <4 x i1> [[REVERSE8]], <4 x double> poison), !alias.scope !3, !noalias !0
+; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[TMP10]], i64 -4
+; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds double, double* [[TMP13]], i64 -3
+; CHECK-NEXT: [[REVERSE10:%.*]] = shufflevector <4 x i1> [[TMP9]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[TMP14]] to <4 x double>*
+; CHECK-NEXT: [[WIDE_MASKED_LOAD11:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* nonnull [[TMP15]], i32 8, <4 x i1> [[REVERSE10]], <4 x double> poison), !alias.scope !3, !noalias !0
+; CHECK-NEXT: [[TMP16:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
+; CHECK-NEXT: [[TMP17:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD11]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
+; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP11]] to <4 x double>*
+; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP16]], <4 x double>* [[TMP18]], i32 8, <4 x i1> [[REVERSE8]]), !alias.scope !3, !noalias !0
+; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP14]] to <4 x double>*
+; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP17]], <4 x double>* [[TMP19]], i32 8, <4 x i1> [[REVERSE10]]), !alias.scope !3, !noalias !0
 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
-; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
@@ -82,13 +84,13 @@ define void @vector_reverse_mask_v4i1(double* %a, double* %cond, i64 %N) #0 {
 ; CHECK-NEXT: [[I_08_IN:%.*]] = phi i64 [ [[I_08:%.*]], [[FOR_INC:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[I_08]] = add nsw i64 [[I_08_IN]], -1
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[COND]], i64 [[I_08]]
-; CHECK-NEXT: [[TMP19:%.*]] = load double, double* [[ARRAYIDX]], align 8
-; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une double [[TMP19]], 0.000000e+00
+; CHECK-NEXT: [[TMP21:%.*]] = load double, double* [[ARRAYIDX]], align 8
+; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une double [[TMP21]], 0.000000e+00
 ; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[FOR_INC]]
 ; CHECK: if.then:
 ; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[I_08]]
-; CHECK-NEXT: [[TMP20:%.*]] = load double, double* [[ARRAYIDX1]], align 8
-; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP20]], 1.000000e+00
+; CHECK-NEXT: [[TMP22:%.*]] = load double, double* [[ARRAYIDX1]], align 8
+; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP22]], 1.000000e+00
 ; CHECK-NEXT: store double [[ADD]], double* [[ARRAYIDX1]], align 8
 ; CHECK-NEXT: br label [[FOR_INC]]
 ; CHECK: for.inc:
diff --git a/llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll b/llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll
index 3e8ac1bad93c..e66fbede57b7 100644
--- a/llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll
@@ -1367,26 +1367,28 @@ define i32 @reduction_interleave_group(i32 %n, i32* %arr) #0 {
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i32 [[INDEX]], 1
-; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i32 [[OFFSET_IDX]]
-; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <8 x i32>*
-; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP4]], align 4
+; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[OFFSET_IDX]], 1
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i32 -1
+; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP5]] to <8 x i32>*
+; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP6]], align 4
 ; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
 ; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[STRIDED_VEC1]])
-; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], [[VEC_PHI]]
-; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[STRIDED_VEC]])
-; CHECK-NEXT: [[TMP8]] = add i32 [[TMP7]], [[TMP6]]
+; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[STRIDED_VEC1]])
+; CHECK-NEXT: [[TMP8:%.*]] = add i32 [[TMP7]], [[VEC_PHI]]
+; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[STRIDED_VEC]])
+; CHECK-NEXT: [[TMP10]] = add i32 [[TMP9]], [[TMP8]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
-; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]
+; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
-; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP8]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP10]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
@@ -1402,7 +1404,7 @@ define i32 @reduction_interleave_group(i32 %n, i32* %arr) #0 {
 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[IV_NEXT]], [[N]]
 ; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[EXIT]], !llvm.loop [[LOOP31:![0-9]+]]
 ; CHECK: exit:
-; CHECK-NEXT: [[RET_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[RED_2]], [[FOR_BODY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: [[RET_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[RED_2]], [[FOR_BODY]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: ret i32 [[RET_LCSSA]]
 ;
 entry:
diff --git a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
index a80140fea413..884d743a1bad 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
@@ -1439,17 +1439,19 @@ define dso_local void @masked_strided2(i8* noalias nocapture readonly %p, i8* no
 ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP3]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC1]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = select <8 x i1> [[TMP4]], <8 x i8> [[STRIDED_VEC1]], <8 x i8> [[STRIDED_VEC]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = sub <8 x i8> zeroinitializer, [[TMP5]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP1]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = bitcast i8* [[TMP7]] to <16 x i8>*
-; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP5]], <8 x i8> [[TMP6]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP8]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = or i32 [[TMP1]], 1
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC1]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = select <8 x i1> [[TMP5]], <8 x i8> [[STRIDED_VEC1]], <8 x i8> [[STRIDED_VEC]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = sub <8 x i8> zeroinitializer, [[TMP6]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 -1
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[TMP8]], i32 [[TMP4]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = bitcast i8* [[TMP9]] to <16 x i8>*
+; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP6]], <8 x i8> [[TMP7]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP10]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
 ; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
-; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP9]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
+; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP11]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
 ; ENABLED_MASKED_STRIDED: for.end:
 ; ENABLED_MASKED_STRIDED-NEXT: ret void
 ;
@@ -1875,17 +1877,19 @@ define dso_local void @masked_strided2_unknown_tc(i8* noalias nocapture readonly
 ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP5]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC3]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = select <8 x i1> [[TMP6]], <8 x i8> [[STRIDED_VEC3]], <8 x i8> [[STRIDED_VEC]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = sub <8 x i8> zeroinitializer, [[TMP7]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP2]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = bitcast i8* [[TMP9]] to <16 x i8>*
-; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP7]], <8 x i8> [[TMP8]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP10]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = or i32 [[TMP2]], 1
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC3]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = select <8 x i1> [[TMP7]], <8 x i8> [[STRIDED_VEC3]], <8 x i8> [[STRIDED_VEC]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = sub <8 x i8> zeroinitializer, [[TMP8]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 -1
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, i8* [[TMP10]], i32 [[TMP6]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP12:%.*]] = bitcast i8* [[TMP11]] to <16 x i8>*
+; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP8]], <8 x i8> [[TMP9]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP12]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
 ; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP11]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP13]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
 ; ENABLED_MASKED_STRIDED: for.end:
 ; ENABLED_MASKED_STRIDED-NEXT: ret void
 ;
@@ -2311,16 +2315,18 @@ define dso_local void @unconditional_masked_strided2_unknown_tc(i8* noalias noca
 ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP3]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC3]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = select <8 x i1> [[TMP4]], <8 x i8> [[STRIDED_VEC3]], <8 x i8> [[STRIDED_VEC]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = sub <8 x i8> zeroinitializer, [[TMP5]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP1]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = bitcast i8* [[TMP7]] to <16 x i8>*
-; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP5]], <8 x i8> [[TMP6]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP8]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = or i32 [[TMP1]], 1
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC3]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = select <8 x i1> [[TMP5]], <8 x i8> [[STRIDED_VEC3]], <8 x i8> [[STRIDED_VEC]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = sub <8 x i8> zeroinitializer, [[TMP6]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 -1
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[TMP8]], i32 [[TMP4]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = bitcast i8* [[TMP9]] to <16 x i8>*
+; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP6]], <8 x i8> [[TMP7]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP10]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP9]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP11]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
 ; ENABLED_MASKED_STRIDED: for.end:
 ; ENABLED_MASKED_STRIDED-NEXT: ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll
index 65838c1f4b02..24bedad51ae1 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll
@@ -74,23 +74,25 @@ define dso_local void @test1(i16* noalias nocapture %points, i16* noalias nocapt
 ;
 ; ENABLED_MASKED_STRIDED-LABEL: @test1(
 ; ENABLED_MASKED_STRIDED-NEXT: entry:
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = getelementptr inbounds i16, i16* [[POINTS:%.*]], i64 -1
 ; ENABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
 ; ENABLED_MASKED_STRIDED: vector.body:
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[INDEX]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = bitcast i16* [[TMP0]] to <4 x i16>*
-; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 2
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[INDEX]], 2
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[INDEX]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = bitcast i16* [[TMP3]] to <4 x i16>*
-; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP4]], align 2
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, i16* [[POINTS:%.*]], i64 [[TMP2]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = bitcast i16* [[TMP5]] to <16 x i16>*
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[INDEX]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = bitcast i16* [[TMP1]] to <4 x i16>*
+; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP2]], align 2
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[INDEX]], 2
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[INDEX]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*
+; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = or i64 [[TMP3]], 1
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 [[TMP6]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <16 x i16>*
 ; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD1]], <16 x i32> <i32 0, i32 4, i32 undef, i32 undef, i32 1, i32 5, i32 undef, i32 undef, i32 2, i32 6, i32 undef, i32 undef, i32 3, i32 7, i32 undef, i32 undef>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> [[INTERLEAVED_VEC]], <16 x i16>* [[TMP6]], i32 2, <16 x i1> <i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false>)
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> [[INTERLEAVED_VEC]], <16 x i16>* [[TMP8]], i32 2, <16 x i1> <i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false>)
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
-; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP7]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
+; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP9]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; ENABLED_MASKED_STRIDED: for.end:
 ; ENABLED_MASKED_STRIDED-NEXT: ret void
 ;
@@ -244,29 +246,31 @@ define dso_local void @test2(i16* noalias nocapture %points, i32 %numPoints, i16
 ; ENABLED_MASKED_STRIDED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = add nsw i64 [[WIDE_TRIP_COUNT]], -1
 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i32 0
 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = getelementptr inbounds i16, i16* [[POINTS:%.*]], i64 -1
 ; ENABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
 ; ENABLED_MASKED_STRIDED: vector.body:
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i64> poison, i64 [[INDEX]], i32 0
 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT1]], <4 x i64> poison, <4 x i32> zeroinitializer
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDUCTION:%.*]] = or <4 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 1, i64 2, i64 3>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ule <4 x i64> [[INDUCTION]], [[BROADCAST_SPLAT]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[INDEX]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = bitcast i16* [[TMP1]] to <4 x i16>*
-; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP2]], i32 2, <4 x i1> [[TMP0]], <4 x i16> poison)
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = shl nsw i64 [[INDEX]], 2
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[INDEX]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*
-; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_LOAD3:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP5]], i32 2, <4 x i1> [[TMP0]], <4 x i16> poison)
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[POINTS:%.*]], i64 [[TMP3]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = bitcast i16* [[TMP6]] to <16 x i16>*
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = icmp ule <4 x i64> [[INDUCTION]], [[BROADCAST_SPLAT]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[INDEX]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = bitcast i16* [[TMP2]] to <4 x i16>*
+; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP3]], i32 2, <4 x i1> [[TMP1]], <4 x i16> poison)
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = shl nsw i64 [[INDEX]], 2
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[INDEX]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = bitcast i16* [[TMP5]] to <4 x i16>*
+; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_LOAD3:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP6]], i32 2, <4 x i1> [[TMP1]], <4 x i16> poison)
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], 1
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 [[TMP7]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = bitcast i16* [[TMP8]] to <16 x i16>*
 ; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i16> [[WIDE_MASKED_LOAD]], <4 x i16> [[WIDE_MASKED_LOAD3]], <16 x i32> <i32 0, i32 4, i32 undef, i32 undef, i32 1, i32 5, i32 undef, i32 undef, i32 2, i32 6, i32 undef, i32 undef, i32 3, i32 7, i32 undef, i32 undef>
-; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_MASK:%.*]] = shufflevector <4 x i1> [[TMP0]], <4 x i1> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1, i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = and <16 x i1> [[INTERLEAVED_MASK]], <i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> [[INTERLEAVED_VEC]], <16 x i16>* [[TMP7]], i32 2, <16 x i1> [[TMP8]])
+; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_MASK:%.*]] = shufflevector <4 x i1> [[TMP1]], <4 x i1> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1, i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = and <16 x i1> [[INTERLEAVED_MASK]], <i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false>
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> [[INTERLEAVED_VEC]], <16 x i16>* [[TMP9]], i32 2, <16 x i1> [[TMP10]])
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP9]], label [[FOR_END_LOOPEXIT:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP11]], label [[FOR_END_LOOPEXIT:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
 ; ENABLED_MASKED_STRIDED: for.end.loopexit:
 ; ENABLED_MASKED_STRIDED-NEXT: br label [[FOR_END]]
 ; ENABLED_MASKED_STRIDED: for.end:
diff --git a/llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll b/llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll
index 89c6efa6945c..0a127ad4ef88 100644
--- a/llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll
+++ b/llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll
@@ -50,8 +50,8 @@ for.end:
 ; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
 ; CHECK: %offset.idx = sub i64 %n, %index
 ; CHECK-NOT: getelementptr
-; CHECK: %[[G0IDX:.+]] = add nsw i64 %offset.idx, -3
-; CHECK: getelementptr inbounds i32, i32* %a, i64 %[[G0IDX]]
+; CHECK: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 -3
+; CHECK: getelementptr inbounds i32, i32* %[[G0]], i64 %offset.idx
 ; CHECK-NOT: getelementptr
 ; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
index e56b607342e6..3e77d76a26a7 100644
--- a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
@@ -686,17 +686,19 @@ define void @mixed_load2_store2(i32* noalias nocapture readonly %A, i32* noalias
 ; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP1]], align 4
 ; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
 ; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[TMP2:%.*]] = mul nsw <4 x i32> [[STRIDED_VEC1]], [[STRIDED_VEC]]
+; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 1
+; CHECK-NEXT: [[TMP3:%.*]] = mul nsw <4 x i32> [[STRIDED_VEC1]], [[STRIDED_VEC]]
 ; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
 ; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[TMP3:%.*]] = add nsw <4 x i32> [[STRIDED_VEC4]], [[STRIDED_VEC3]]
-; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[OFFSET_IDX]]
-; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <8 x i32>*
-; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
-; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP5]], align 4
+; CHECK-NEXT: [[TMP4:%.*]]
= add nsw <4 x i32> [[STRIDED_VEC4]], [[STRIDED_VEC3]] +; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 -1 +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i64 [[TMP2]] +; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>* +; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7> +; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP7]], align 4 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 -; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512 -; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]] +; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512 +; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]] ; CHECK: scalar.ph: @@ -760,17 +762,19 @@ define void @mixed_load3_store3(i32* nocapture %A) { ; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 1, i32 4, i32 7, i32 10> ; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 2, i32 5, i32 8, i32 11> ; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[STRIDED_VEC]], [[VEC_IND]] -; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[STRIDED_VEC2]], [[VEC_IND]] -; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[VEC_IND]] -; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[NEXT_GEP]] to <12 x i32>* -; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7> -; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, 
i32 undef> -; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> [[TMP7]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11> -; CHECK-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[TMP5]], align 4 +; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[NEXT_GEP]], i64 2 +; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[STRIDED_VEC2]], [[VEC_IND]] +; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[VEC_IND]] +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP3]], i64 -2 +; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <12 x i32>* +; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7> +; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> +; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11> +; CHECK-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[TMP7]], align 4 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4> -; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024 -; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]] +; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024 +; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]] ; CHECK: scalar.ph: @@ -1315,21 +1319,23 @@ define void @PR27626_4(i32 *%a, i32 %x, i32 %y, i32 %z, i64 %n) { ; CHECK-NEXT: [[TMP3:%.*]] = or 
i64 [[OFFSET_IDX]], 2 ; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 4 ; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 6 -; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[OFFSET_IDX]] -; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]] -; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]] -; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP5]] -; CHECK-NEXT: store i32 [[X:%.*]], i32* [[TMP6]], align 4 -; CHECK-NEXT: store i32 [[X]], i32* [[TMP7]], align 4 +; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 1 +; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[OFFSET_IDX]] +; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]] +; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]] +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP5]] +; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 -1 +; CHECK-NEXT: store i32 [[X:%.*]], i32* [[TMP7]], align 4 ; CHECK-NEXT: store i32 [[X]], i32* [[TMP8]], align 4 ; CHECK-NEXT: store i32 [[X]], i32* [[TMP9]], align 4 -; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[OFFSET_IDX]] -; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>* +; CHECK-NEXT: store i32 [[X]], i32* [[TMP10]], align 4 +; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i64 [[TMP6]] +; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>* ; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLAT]], <4 x i32> [[BROADCAST_SPLAT2]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7> -; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP11]], align 4 +; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP13]], align 4 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 
[[INDEX]], 4 -; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] -; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]] +; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]] ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]] </cut>
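The CHECK-line updates in the diff above replace a single address computation with a constant-offset `getelementptr` followed by a variable-index one. The sketch below is only an arithmetic illustration of why both forms reach the same element. It models pointers as plain integers in element units, all function names are hypothetical and not part of LLVM, and it assumes the index has its low bit clear wherever the tests use `or ..., 1` as an increment (an inference from the test output, not something the email states).

```python
def old_consecutive(base: int, offset_idx: int) -> int:
    """Old form: fold -3 into the index, then index from the base pointer."""
    g0_idx = offset_idx + (-3)
    return base + g0_idx


def new_consecutive(base: int, offset_idx: int) -> int:
    """New form: step the base pointer back by 3, then apply the index."""
    g0 = base + (-3)
    return g0 + offset_idx


def old_interleaved(base: int, offset_idx: int) -> int:
    """Old form: index the interleaved store directly from the base pointer."""
    return base + offset_idx


def new_interleaved(base: int, offset_idx: int) -> int:
    """New form: (base - 1) indexed by (offset_idx | 1)."""
    member_idx = offset_idx | 1  # equals offset_idx + 1 only when offset_idx is even
    return (base - 1) + member_idx


def forms_agree(base: int = 1000, limit: int = 64) -> bool:
    """Check both rewrites over a range of indices (even ones for the interleaved case)."""
    consecutive_ok = all(
        old_consecutive(base, i) == new_consecutive(base, i) for i in range(limit)
    )
    interleaved_ok = all(
        old_interleaved(base, i) == new_interleaved(base, i)
        for i in range(0, limit, 2)
    )
    return consecutive_ok and interleaved_ok
```

With an odd index the interleaved forms would differ, which is why the even-index assumption matters for this sketch.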

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/gnu-release-arm-next-allyesconfig - Build # 25 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/gnu-release-arm-next-allyesconfig. So far, this commit has regressed CI configurations:
 - tcwg_kernel/gnu-release-arm-next-allyesconfig

Culprit:
<cut>
commit ee1ba5beab143f3afcc89720bb18ac438c3241b3
Merge: 2fdba6b39b9b 54f9cb2466e1
Author: Stephen Rothwell <sfr(a)canb.auug.org.au>
Date:   Wed Sep 8 10:07:28 2021 +1000

    Merge remote-tracking branch 'pm/linux-next'
</cut>

Results regressed to (for first_bad == ee1ba5beab143f3afcc89720bb18ac438c3241b3)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 19801
# First few build errors in logs:

from (for last_good == 2fdba6b39b9bd8deafe182764414eb075032c31d)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 19887
# linux build successful: all

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allye…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allye…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allye…

Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#626bf91…"

Reproduce builds:
<cut>
mkdir investigate-linux-ee1ba5beab143f3afcc89720bb18ac438c3241b3
cd investigate-linux-ee1ba5beab143f3afcc89720bb18ac438c3241b3

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allye… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allye… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allye… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach ee1ba5beab143f3afcc89720bb18ac438c3241b3
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 2fdba6b39b9bd8deafe182764414eb075032c31d
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allye…
Build log: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allye…

Full commit (up to 1000 lines):
<cut>
commit ee1ba5beab143f3afcc89720bb18ac438c3241b3
Merge: 2fdba6b39b9b 54f9cb2466e1
Author: Stephen Rothwell <sfr(a)canb.auug.org.au>
Date:   Wed Sep 8 10:07:28 2021 +1000

    Merge remote-tracking branch 'pm/linux-next'

 Documentation/admin-guide/acpi/ssdt-overlays.rst | 49 +-
 Documentation/cpu-freq/cpu-drivers.rst | 3 -
 .../devicetree/bindings/cpufreq/cpufreq-dt.txt | 2 +-
 .../bindings/cpufreq/cpufreq-mediatek-hw.yaml | 70 +++
 .../bindings/cpufreq/cpufreq-mediatek.txt | 2 +-
 .../devicetree/bindings/cpufreq/cpufreq-st.txt | 6 +-
 .../bindings/cpufreq/nvidia,tegra20-cpufreq.txt | 2 +-
 .../devicetree/bindings/devfreq/rk3399_dmc.txt | 2 +-
 .../devicetree/bindings/gpu/arm,mali-bifrost.yaml | 2 +-
 .../devicetree/bindings/gpu/arm,mali-midgard.yaml | 2 +-
 .../bindings/interconnect/fsl,imx8m-noc.yaml | 4 +-
 .../opp/allwinner,sun50i-h6-operating-points.yaml | 4 +
 Documentation/devicetree/bindings/opp/opp-v1.yaml | 51 ++
 .../devicetree/bindings/opp/opp-v2-base.yaml | 214 +++++++
 Documentation/devicetree/bindings/opp/opp-v2.yaml | 475 ++++++++++++++++
 Documentation/devicetree/bindings/opp/opp.txt | 622 ---------------------
 Documentation/devicetree/bindings/opp/qcom-opp.txt | 2 +-
 .../bindings/opp/ti-omap5-opp-supply.txt | 2 +-
 .../devicetree/bindings/power/power-domain.yaml | 2 +-
 .../translations/zh_CN/cpu-freq/cpu-drivers.rst | 2 -
 arch/arm/boot/dts/omap34xx.dtsi | 1 -
 arch/arm/boot/dts/omap36xx.dtsi | 1 -
 drivers/acpi/x86/s2idle.c | 67 ++-
 drivers/base/arch_topology.c | 2 +
 drivers/cpufreq/Kconfig.arm | 12 +
 drivers/cpufreq/Makefile | 1 +
 drivers/cpufreq/acpi-cpufreq.c | 14 +-
 drivers/cpufreq/cpufreq-dt-platdev.c | 4 +
 drivers/cpufreq/cpufreq-dt.c | 3 +-
 drivers/cpufreq/cpufreq.c | 17 +-
 drivers/cpufreq/imx6q-cpufreq.c | 2 +-
 drivers/cpufreq/intel_pstate.c | 39 --
 drivers/cpufreq/mediatek-cpufreq-hw.c | 308 ++++++++++
 drivers/cpufreq/mediatek-cpufreq.c | 3 +-
 drivers/cpufreq/omap-cpufreq.c | 2 +-
 drivers/cpufreq/qcom-cpufreq-hw.c | 151 ++++-
 drivers/cpufreq/scmi-cpufreq.c | 65 ++-
 drivers/cpufreq/scpi-cpufreq.c | 3 +-
 drivers/cpufreq/sh-cpufreq.c | 11 -
 drivers/cpufreq/vexpress-spc-cpufreq.c | 25 +-
 drivers/pci/controller/vmd.c | 55 ++
 drivers/pci/host-bridge.c | 1 +
 drivers/pci/pci-acpi.c | 74 +++
 include/linux/cpufreq.h | 75 ++-
 include/linux/pci-acpi.h | 3 +
 45 files changed, 1638 insertions(+), 819 deletions(-)
</cut>
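The two result blocks in this report differ only in the `linux_n_obj` step, so a reader comparing several such reports may want to extract the numeric steps mechanically. The following is a minimal sketch under stated assumptions: the function names are hypothetical (they are not part of the TCWG jenkins-scripts), the `# step: value` layout is taken from the report text as it appears here, and reading `linux_n_obj` as a count of built object files is my interpretation rather than something the email defines.

```python
from __future__ import annotations

import re

# Matches "# <step name>: <integer>"; steps with no numeric value
# (for example "# First few build errors in logs:") are skipped.
STEP_RE = re.compile(r"#\s*([^#:]+):\s*(-?\d+)")


def parse_steps(report: str) -> dict[str, int]:
    """Return a mapping of step name to integer result for one result block."""
    return {name.strip(): int(value) for name, value in STEP_RE.findall(report)}


def changed_steps(bad: dict[str, int], good: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return steps whose value differs, as (last_good, first_bad) pairs."""
    return {
        step: (good[step], bad[step])
        for step in good
        if step in bad and good[step] != bad[step]
    }


FIRST_BAD = (
    "# reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 "
    "# build_abe qemu: -2 # linux_n_obj: 19801 # First few build errors in logs:"
)
LAST_GOOD = (
    "# reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 "
    "# build_abe qemu: -2 # linux_n_obj: 19887 # linux build successful: all"
)

diff = changed_steps(parse_steps(FIRST_BAD), parse_steps(LAST_GOOD))
```

For the values quoted in this report, `diff` contains a single entry for `linux_n_obj`, a difference of 86 between the two builds.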

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/gnu-master-arm-mainline-allmodconfig - Build # 28 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/gnu-master-arm-mainline-allmodconfig. So far, this commit has regressed CI configurations:
 - tcwg_kernel/gnu-master-arm-mainline-allmodconfig

Culprit:
<cut>
commit 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Date:   Sun Sep 5 11:24:05 2021 -0700

    Enable '-Werror' by default for all kernel builds

    ... but make it a config option so that broken environments can disable
    it when required.

    We really should always have a clean build, and will disable specific
    over-eager warnings as required, if we can't fix them. But while I
    fairly religiously enforce that in my own tree, it doesn't get enforced
    by various build robots that don't necessarily report warnings.

    So this just makes '-Werror' a default compiler flag, but allows people
    to disable it for their configuration if they have some particular
    issues.

    Occasionally, new compiler versions end up enabling new warnings, and it
    can take a while before we have them fixed (or the warnings disabled if
    that is what it takes), so the config option allows for that situation.

    Hopefully this will mean that I get fewer pull requests that have new
    warnings that were not noticed by various automation we have in place.

    Knock wood.

    Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
</cut>

Results regressed to (for first_bad == 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 21769
# First few build errors in logs:

from (for last_good == fd47ff55c9c31101fcc06d20cb381da3d4089bd5)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 29880
# linux build successful: all

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-linux-3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
cd investigate-linux-3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach fd47ff55c9c31101fcc06d20cb381da3d4089bd5
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…
Build log: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…

Full commit (up to 1000 lines):
<cut>
commit 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Date:   Sun Sep 5 11:24:05 2021 -0700

    Enable '-Werror' by default for all kernel builds

    ... but make it a config option so that broken environments can disable
    it when required.

    We really should always have a clean build, and will disable specific
    over-eager warnings as required, if we can't fix them. But while I
    fairly religiously enforce that in my own tree, it doesn't get enforced
    by various build robots that don't necessarily report warnings.

    So this just makes '-Werror' a default compiler flag, but allows people
    to disable it for their configuration if they have some particular
    issues.

    Occasionally, new compiler versions end up enabling new warnings, and it
    can take a while before we have them fixed (or the warnings disabled if
    that is what it takes), so the config option allows for that situation.

    Hopefully this will mean that I get fewer pull requests that have new
    warnings that were not noticed by various automation we have in place.

    Knock wood.

    Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
---
 Makefile     |  3 +++
 init/Kconfig | 14 ++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/Makefile b/Makefile
index 6bc1c5b17a62..d45fc2edf186 100644
--- a/Makefile
+++ b/Makefile
@@ -785,6 +785,9 @@ stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG) := -fstack-protector-strong
 
 KBUILD_CFLAGS += $(stackp-flags-y)
 
+KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror
+KBUILD_CFLAGS += $(KBUILD_CFLAGS-y)
+
 ifdef CONFIG_CC_IS_CLANG
 KBUILD_CPPFLAGS += -Qunused-arguments
 # The kernel builds with '-std=gnu89' so use of GNU extensions is acceptable.
diff --git a/init/Kconfig b/init/Kconfig
index e708180e9a59..8cb97f141b70 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -137,6 +137,20 @@ config COMPILE_TEST
 	  here. If you are a user/distributor, say N here to exclude useless
 	  drivers to be distributed.
 
+config WERROR
+	bool "Compile the kernel with warnings as errors"
+	default y
+	help
+	  A kernel build should not cause any compiler warnings, and this
+	  enables the '-Werror' flag to enforce that rule by default.
+
+	  However, if you have a new (or very old) compiler with odd and
+	  unusual warnings, or you have some architecture with problems,
+	  you may need to disable this config option in order to
+	  successfully build the kernel.
+
+	  If in doubt, say Y.
+
 config UAPI_HEADER_TEST
 	bool "Compile test UAPI headers"
 	depends on HEADERS_INSTALL && CC_CAN_LINK
</cut>
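The Makefile hunk above relies on a common Kbuild idiom: the config value becomes part of a variable name, and only the `-y` variant is ever read back. The sketch below models that idiom in Python purely as an illustration. The function name and the dictionary-based model of make variables are my own assumptions, not kernel code.

```python
from __future__ import annotations

from collections import defaultdict


def kbuild_cflags(config: dict[str, str], base_flags: list[str]) -> list[str]:
    """Model the two added Makefile lines for a given .config mapping."""
    variables: defaultdict[str, list[str]] = defaultdict(list)
    variables["KBUILD_CFLAGS"] = list(base_flags)

    # KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror
    # An unset option expands to the empty string, so the flag lands in
    # "KBUILD_CFLAGS-", a variable that nothing reads afterwards.
    suffix = config.get("CONFIG_WERROR", "")
    variables["KBUILD_CFLAGS-" + suffix].append("-Werror")

    # KBUILD_CFLAGS += $(KBUILD_CFLAGS-y)
    variables["KBUILD_CFLAGS"] += variables["KBUILD_CFLAGS-y"]
    return variables["KBUILD_CFLAGS"]


with_werror = kbuild_cflags({"CONFIG_WERROR": "y"}, ["-O2"])
without_werror = kbuild_cflags({}, ["-O2"])
```

Because the Kconfig entry uses `default y`, configurations that leave the option untouched take the first path, which is consistent with the drop in `linux_n_obj` reported for this commit: any warning now stops the affected object from being built.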

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_apm/llvm-master-arm-spec2k6-Oz - Build # 7 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *binutils* in CI configuration tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz

Culprit:
<cut>
commit f947f96797f8ec33aabf9cd7234c850778068445
Author: Tom de Vries <tdevries(a)suse.de>
Date:   Mon Aug 30 14:34:03 2021 +0200

    [gdb/cli] Don't assert on empty string for core-file

    With current gdb we run into:
    ...
    $ gdb -batch '' ''
    : No such file or directory.
    pathstuff.cc:132: internal-error: \
      gdb::unique_xmalloc_ptr<char> gdb_abspath(const char*): \
      Assertion `path != NULL && path[0] != '\0'' failed.
    ...

    Fix this by skipping the call to gdb_abspath in core_target_open in the
    empty-string case, such that we have instead:
    ...
    $ gdb -batch '' ''
    : No such file or directory.
    : No such file or directory.
    $
    ...

    Tested on x86_64-linux.

    gdb/ChangeLog:

    2021-08-30 Tom de Vries <tdevries(a)suse.de>

        PR cli/28290
        * gdb/corelow.c (core_target_open): Skip call to gdb_abspath in the
        empty-string case.

    gdb/testsuite/ChangeLog:

    2021-08-30 Tom de Vries <tdevries(a)suse.de>

        PR cli/28290
        * gdb.base/batch-exit-status.exp: Add gdb '' and gdb '' '' tests.
</cut>

Results regressed to (for first_bad == f947f96797f8ec33aabf9cd7234c850778068445)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_mthumb artifacts/build-f947f96797f8ec33aabf9cd7234c850778068445/results_id: 1
# 447.dealII,[.] contract<3> regressed by 200

from (for last_good == 9b9b1092f0a8e6b7d240ea05a74968a883b8a05c)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_mthumb artifacts/build-9b9b1092f0a8e6b7d240ea05a74968a883b8a05c/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of last_good: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-master-arm-spec2k6-Oz/4909
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of first_bad: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-master-arm-spec2k6-Oz/4905
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-binutils-f947f96797f8ec33aabf9cd7234c850778068445
cd investigate-binutils-f947f96797f8ec33aabf9cd7234c850778068445

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach f947f96797f8ec33aabf9cd7234c850778068445
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 9b9b1092f0a8e6b7d240ea05a74968a883b8a05c
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit f947f96797f8ec33aabf9cd7234c850778068445
Author: Tom de Vries <tdevries(a)suse.de>
Date:   Mon Aug 30 14:34:03 2021 +0200

    [gdb/cli] Don't assert on empty string for core-file

    With current gdb we run into:
    ...
    $ gdb -batch '' ''
    : No such file or directory.
    pathstuff.cc:132: internal-error: \
      gdb::unique_xmalloc_ptr<char> gdb_abspath(const char*): \
      Assertion `path != NULL && path[0] != '\0'' failed.
    ...

    Fix this by skipping the call to gdb_abspath in core_target_open in the
    empty-string case, such that we have instead:
    ...
    $ gdb -batch '' ''
    : No such file or directory.
    : No such file or directory.
    $
    ...

    Tested on x86_64-linux.

    gdb/ChangeLog:

    2021-08-30 Tom de Vries <tdevries(a)suse.de>

        PR cli/28290
        * gdb/corelow.c (core_target_open): Skip call to gdb_abspath in the
        empty-string case.

    gdb/testsuite/ChangeLog:

    2021-08-30 Tom de Vries <tdevries(a)suse.de>

        PR cli/28290
        * gdb.base/batch-exit-status.exp: Add gdb '' and gdb '' '' tests.
---
 gdb/corelow.c                                | 3 ++-
 gdb/testsuite/gdb.base/batch-exit-status.exp | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gdb/corelow.c b/gdb/corelow.c
index eb785a08633..711e86c4cd4 100644
--- a/gdb/corelow.c
+++ b/gdb/corelow.c
@@ -428,7 +428,8 @@ core_target_open (const char *arg, int from_tty)
     }
 
   gdb::unique_xmalloc_ptr<char> filename (tilde_expand (arg));
-  if (!IS_ABSOLUTE_PATH (filename.get ()))
+  if (strlen (filename.get ()) != 0
+      && !IS_ABSOLUTE_PATH (filename.get ()))
     filename = gdb_abspath (filename.get ());
 
   flags = O_BINARY | O_LARGEFILE;
diff --git a/gdb/testsuite/gdb.base/batch-exit-status.exp b/gdb/testsuite/gdb.base/batch-exit-status.exp
index 085dfc6ad56..9a080196bd6 100644
--- a/gdb/testsuite/gdb.base/batch-exit-status.exp
+++ b/gdb/testsuite/gdb.base/batch-exit-status.exp
@@ -76,3 +76,7 @@ test_exit_status 1 "-batch -x $good_commands -x $bad_commands" \
     "-batch -x good-commands -x bad-commands"
 test_exit_status 1 "-batch -x $good_commands -ex \"set not-a-thing 4\"" \
     "-batch -x good-commands -ex \"set not-a-thing 4\""
+
+set no_such_re ": No such file or directory\\."
+test_exit_status 1 "-batch \"\"" $no_such_re
+test_exit_status 1 "-batch \"\" \"\"" [multi_line $no_such_re $no_such_re]
</cut>
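The `corelow.c` change above adds an empty-string guard in front of the relative-to-absolute path conversion. The sketch below restates that guard in Python as an illustration only. `resolve_core_path` and its `cwd` parameter are hypothetical names, tilde expansion is left out, and `posixpath` stands in for gdb's own path helpers.

```python
import posixpath


def resolve_core_path(filename: str, cwd: str) -> str:
    """Make a core-file name absolute, leaving empty and absolute names untouched."""
    # Mirrors the patched condition: only non-empty, relative names are
    # converted. Before the patch, the empty string reached the conversion
    # helper and tripped its non-empty-path assertion.
    if filename != "" and not posixpath.isabs(filename):
        return posixpath.join(cwd, filename)
    return filename


empty_case = resolve_core_path("", "/work")
relative_case = resolve_core_path("core.1234", "/work")
absolute_case = resolve_core_path("/tmp/core", "/work")
```

With the guard in place, the empty name is passed through unchanged, so the later open attempt fails with the ordinary "No such file or directory" message shown in the commit log instead of an internal error.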

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/gnu-release-arm-next-allmodconfig - Build # 20 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/gnu-release-arm-next-allmodconfig. So far, this commit has regressed CI configurations:
 - tcwg_kernel/gnu-release-arm-next-allmodconfig

Culprit:
<cut>
commit 4d3b252a0a3aed2f6fc70aec3c37275a9ca179a4
Merge: 907f2745370d 6f65d2319f21
Author: Stephen Rothwell <sfr(a)canb.auug.org.au>
Date:   Tue Sep 7 10:00:35 2021 +1000

    Merge remote-tracking branch 'pm/linux-next'
</cut>

Results regressed to (for first_bad == 4d3b252a0a3aed2f6fc70aec3c37275a9ca179a4)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 21778
# First few build errors in logs:

from (for last_good == 907f2745370d3cfcc6efe7772def37c4eee4b960)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 29889
# linux build successful: all

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo…

Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#4b93c54…"

Reproduce builds:
<cut>
mkdir investigate-linux-4d3b252a0a3aed2f6fc70aec3c37275a9ca179a4
cd investigate-linux-4d3b252a0a3aed2f6fc70aec3c37275a9ca179a4

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach 4d3b252a0a3aed2f6fc70aec3c37275a9ca179a4
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 907f2745370d3cfcc6efe7772def37c4eee4b960
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo…
Build log: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-next-allmo…

Full commit (up to 1000 lines):
<cut>
commit 4d3b252a0a3aed2f6fc70aec3c37275a9ca179a4
Merge: 907f2745370d 6f65d2319f21
Author: Stephen Rothwell <sfr(a)canb.auug.org.au>
Date:   Tue Sep 7 10:00:35 2021 +1000

    Merge remote-tracking branch 'pm/linux-next'

 Documentation/admin-guide/acpi/ssdt-overlays.rst | 49 +-
 Documentation/cpu-freq/cpu-drivers.rst | 3 -
 .../devicetree/bindings/cpufreq/cpufreq-dt.txt | 2 +-
 .../bindings/cpufreq/cpufreq-mediatek.txt | 2 +-
 .../devicetree/bindings/cpufreq/cpufreq-st.txt | 6 +-
 .../bindings/cpufreq/nvidia,tegra20-cpufreq.txt | 2 +-
 .../devicetree/bindings/devfreq/rk3399_dmc.txt | 2 +-
 .../devicetree/bindings/gpu/arm,mali-bifrost.yaml | 2 +-
 .../devicetree/bindings/gpu/arm,mali-midgard.yaml | 2 +-
 .../bindings/interconnect/fsl,imx8m-noc.yaml | 4 +-
 .../opp/allwinner,sun50i-h6-operating-points.yaml | 4 +
 Documentation/devicetree/bindings/opp/opp-v1.yaml | 51 ++
 .../devicetree/bindings/opp/opp-v2-base.yaml | 214 +++++++
 Documentation/devicetree/bindings/opp/opp-v2.yaml | 475 ++++++++++++++++
 Documentation/devicetree/bindings/opp/opp.txt | 622 ---------------------
 Documentation/devicetree/bindings/opp/qcom-opp.txt | 2 +-
 .../bindings/opp/ti-omap5-opp-supply.txt | 2 +-
 .../devicetree/bindings/power/power-domain.yaml | 2 +-
 .../translations/zh_CN/cpu-freq/cpu-drivers.rst | 2 -
 arch/arm/boot/dts/omap34xx.dtsi | 1 -
 arch/arm/boot/dts/omap36xx.dtsi | 1 -
 drivers/acpi/x86/s2idle.c | 67 ++-
 drivers/base/arch_topology.c | 2 +
 drivers/cpufreq/acpi-cpufreq.c | 14 +-
 drivers/cpufreq/cpufreq-dt-platdev.c | 4 +
 drivers/cpufreq/cpufreq-dt.c | 3 +-
 drivers/cpufreq/cpufreq.c | 17 +-
 drivers/cpufreq/imx6q-cpufreq.c | 2 +-
 drivers/cpufreq/mediatek-cpufreq.c | 3 +-
 drivers/cpufreq/omap-cpufreq.c | 2 +-
 drivers/cpufreq/qcom-cpufreq-hw.c | 151 ++++-
 drivers/cpufreq/scmi-cpufreq.c | 65 ++-
 drivers/cpufreq/scpi-cpufreq.c | 3 +-
 drivers/cpufreq/sh-cpufreq.c | 11 -
 drivers/cpufreq/vexpress-spc-cpufreq.c | 25 +-
 drivers/pci/controller/vmd.c | 55 ++
 drivers/pci/host-bridge.c | 1 +
 drivers/pci/pci-acpi.c | 74 +++
 include/linux/cpufreq.h | 17 +-
 include/linux/pci-acpi.h | 3 +
 40 files changed, 1190 insertions(+), 779 deletions(-)
</cut>


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-release-aarch64-spec2k6-O3 - Build # 8 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3

Culprit:
<cut>
commit 34f839fc9d4c0638e09c81e9981d4dacf69c3ed6
Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com>
Date:   Fri Aug 6 12:01:47 2021 -0700

    Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on"

    This reverts commit 48ad446a0fb2c9b98cb7047e4daf8a84c29cef8f.

    (cherry picked from commit 4389a413e2129d7d55ee779638b649aa852b6f8a)
</cut>

Results regressed to (for first_bad == 34f839fc9d4c0638e09c81e9981d4dacf69c3ed6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-34f839fc9d4c0638e09c81e9981d4dacf69c3ed6/results_id:
1
# 470.lbm,lbm_base.default regressed by 109
# 444.namd,namd_base.default regressed by 104
# 447.dealII,dealII_base.default regressed by 106
# 447.dealII,[.] _ZNK12SparseMatrixIdE5vmultI6VectorIdES3_EEvRT regressed by 115
# 447.dealII,[.] _ZN16ConstraintMatrix8add_lineEj regressed by 112
# 433.milc,milc_base.default regressed by 104
# 433.milc,[.] mult_su3_mat_vec regressed by 115

from (for last_good == b643ee1b9c1a8e0b81e31908a066c71851292890)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-b643ee1b9c1a8e0b81e31908a066c71851292890/results_id:
1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3/4881
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3/4887
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-34f839fc9d4c0638e09c81e9981d4dacf69c3ed6
cd investigate-llvm-34f839fc9d4c0638e09c81e9981d4dacf69c3ed6

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 34f839fc9d4c0638e09c81e9981d4dacf69c3ed6
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b643ee1b9c1a8e0b81e31908a066c71851292890
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…

Full commit (up to 1000 lines):
<cut>
commit 34f839fc9d4c0638e09c81e9981d4dacf69c3ed6
Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com>
Date:   Fri Aug 6 12:01:47 2021 -0700

    Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on"

    This reverts commit 48ad446a0fb2c9b98cb7047e4daf8a84c29cef8f.

    (cherry picked from commit 4389a413e2129d7d55ee779638b649aa852b6f8a)
---
 clang/docs/UsersManual.rst | 48 ++-----------------------
 clang/lib/Driver/ToolChains/Clang.cpp | 33 ++++++++---------
 clang/test/CodeGen/ffp-contract-option.c | 47 +++---------------------
 clang/test/CodeGen/ppc-emmintrin.c | 4 +--
 clang/test/CodeGen/ppc-xmmintrin.c | 4 +--
 clang/test/Driver/fp-model.c | 61 +++++++++++++++-----------------
 6 files changed, 58 insertions(+), 139 deletions(-)

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index aecd28e5e12a..20be01a5f40a 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -1260,50 +1260,8 @@ installed.
 Controlling Floating Point Behavior
 -----------------------------------
-Clang provides a number of ways to control floating point behavior, including
-with command line options and source pragmas. This section
-describes the various floating point semantic modes and the corresponding options.
-
-.. csv-table:: Floating Point Semantic Modes
-  :header: "Mode", "Values"
-  :widths: 15, 30, 30
-
-  "except_behavior", "{ignore, strict, may_trap}", "ffp-exception-behavior"
-  "fenv_access", "{off, on}", "(none)"
-  "rounding_mode", "{dynamic, tonearest, downward, upward, towardzero}", "frounding-math"
-  "contract", "{on, off, fast}", "ffp-contract"
-  "denormal_fp_math", "{IEEE, PreserveSign, PositiveZero}", "fdenormal-fp-math"
-  "denormal_fp32_math", "{IEEE, PreserveSign, PositiveZero}", "fdenormal-fp-math-fp32"
-  "support_math_errno", "{on, off}", "fmath-errno"
-  "no_honor_nans", "{on, off}", "fhonor-nans"
-  "no_honor_infinities", "{on, off}", "fhonor-infinities"
-  "no_signed_zeros", "{on, off}", "fsigned-zeros"
-  "allow_reciprocal", "{on, off}", "freciprocal-math"
-  "allow_approximate_fns", "{on, off}", "(none)"
-  "allow_reassociation", "{on, off}", "fassociative-math"
-
-
-This table describes the option settings that correspond to the three
-floating point semantic models: precise (the default), strict, and fast.
-
-
-.. csv-table:: Floating Point Models
-  :header: "Mode", "Precise", "Strict", "Fast"
-  :widths: 25, 15, 15, 15
-
-  "except_behavior", "ignore", "strict", "ignore"
-  "fenv_access", "off", "on", "off"
-  "rounding_mode", "tonearest", "dynamic", "tonearest"
-  "contract", "on", "off", "fast"
-  "denormal_fp_math", "IEEE", "IEEE", "PreserveSign"
-  "denormal_fp32_math", "IEEE","IEEE", "PreserveSign"
-  "support_math_errno", "on", "on", "off"
-  "no_honor_nans", "off", "off", "on"
-  "no_honor_infinities", "off", "off", "on"
-  "no_signed_zeros", "off", "off", "on"
-  "allow_reciprocal", "off", "off", "on"
-  "allow_approximate_fns", "off", "off", "on"
-  "allow_reassociation", "off", "off", "on"
+Clang provides a number of ways to control floating point behavior. The options
+are listed below.
 .. option:: -ffast-math
@@ -1498,7 +1456,7 @@ Note that floating-point operations performed as part of constant initialization
    and ``fast``.
    Details:
-   * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=on``). This is the default behavior.
+   * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=fast``). This is the default behavior.
    * ``strict`` Enables ``-frounding-math`` and ``-ffp-exception-behavior=strict``, and disables contractions (FMA). All of the ``-ffast-math`` enablements are disabled. Enables ``STDC FENV_ACCESS``: by default ``FENV_ACCESS`` is disabled. This option setting behaves as though ``#pragma STDC FENV_ACESS ON`` appeared at the top of the source file.
    * ``fast`` Behaves identically to specifying both ``-ffast-math`` and ``ffp-contract=fast``
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 0e129e6f2fac..4c8ba8cdcd29 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -2637,7 +2637,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
   llvm::DenormalMode DenormalFPMath = DefaultDenormalFPMath;
   llvm::DenormalMode DenormalFP32Math = DefaultDenormalFP32Math;
-  StringRef FPContract = "on";
+  StringRef FPContract = "";
   bool StrictFPModel = false;
@@ -2662,7 +2662,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       ReciprocalMath = false;
       SignedZeros = true;
       // -fno_fast_math restores default denormal and fpcontract handling
-      FPContract = "on";
+      FPContract = "";
       DenormalFPMath = llvm::DenormalMode::getIEEE();
       // FIXME: The target may have picked a non-IEEE default mode here based on
@@ -2682,18 +2682,20 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       // ffp-model= is a Driver option, it is entirely rewritten into more
       // granular options before being passed into cc1.
       // Use the gcc option in the switch below.
-      if (!FPModel.empty() && !FPModel.equals(Val))
+      if (!FPModel.empty() && !FPModel.equals(Val)) {
         D.Diag(clang::diag::warn_drv_overriding_flag_option)
           << Args.MakeArgString("-ffp-model=" + FPModel)
           << Args.MakeArgString("-ffp-model=" + Val);
+        FPContract = "";
+      }
       if (Val.equals("fast")) {
         optID = options::OPT_ffast_math;
         FPModel = Val;
-        FPContract = Val;
+        FPContract = "fast";
       } else if (Val.equals("precise")) {
         optID = options::OPT_ffp_contract;
         FPModel = Val;
-        FPContract = "on";
+        FPContract = "fast";
         PreciseFPModel = true;
       } else if (Val.equals("strict")) {
         StrictFPModel = true;
@@ -2779,11 +2781,9 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
     case options::OPT_ffp_contract: {
       StringRef Val = A->getValue();
       if (PreciseFPModel) {
-        // When -ffp-model=precise is seen on the command line,
-        // the boolean PreciseFPModel is set to true which indicates
-        // "the current option is actually PreciseFPModel". The optID
-        // is changed to OPT_ffp_contract and FPContract is set to "on".
-        // the argument Val string is "precise": it shouldn't be checked.
+        // -ffp-model=precise enables ffp-contract=fast as a side effect
+        // the FPContract value has already been set to a string literal
+        // and the Val string isn't a pertinent value.
         ;
       } else if (Val.equals("fast") || Val.equals("on") || Val.equals("off"))
         FPContract = Val;
@@ -2881,17 +2881,18 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       // -fno_fast_math restores default denormal and fpcontract handling
       DenormalFPMath = DefaultDenormalFPMath;
       DenormalFP32Math = llvm::DenormalMode::getIEEE();
-      FPContract = "on";
+      FPContract = "";
       break;
     }
     if (StrictFPModel) {
       // If -ffp-model=strict has been specified on command line but
       // subsequent options conflict then emit warning diagnostic.
-      if (HonorINFs && HonorNaNs && !AssociativeMath && !ReciprocalMath &&
-          SignedZeros && TrappingMath && RoundingFPMath &&
-          DenormalFPMath == llvm::DenormalMode::getIEEE() &&
-          DenormalFP32Math == llvm::DenormalMode::getIEEE() &&
-          FPContract.equals("off"))
+      if (HonorINFs && HonorNaNs &&
+        !AssociativeMath && !ReciprocalMath &&
+        SignedZeros && TrappingMath && RoundingFPMath &&
+        (FPContract.equals("off") || FPContract.empty()) &&
+        DenormalFPMath == llvm::DenormalMode::getIEEE() &&
+        DenormalFP32Math == llvm::DenormalMode::getIEEE())
         // OK: Current Arg doesn't conflict with -ffp-model=strict
         ;
       else {
diff --git a/clang/test/CodeGen/ffp-contract-option.c b/clang/test/CodeGen/ffp-contract-option.c
index efc72c2b5461..52b750795940 100644
--- a/clang/test/CodeGen/ffp-contract-option.c
+++ b/clang/test/CodeGen/ffp-contract-option.c
@@ -1,46 +1,9 @@
-// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck --check-prefix=CHECK-FMADD %s
+// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck %s
 // REQUIRES: aarch64-registered-target
 float fma_test1(float a, float b, float c) {
-// CHECK-FMADD: fmadd
-  float x = a * b;
-  float y = x + c;
-  return y;
-}
-
-// RUN: %clang_cc1 -triple=x86_64 %s -emit-llvm -o - \
-// RUN:| FileCheck --check-prefix=CHECK-DEFAULT %s
-//
-// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=off %s -emit-llvm -o - \
-// RUN:| FileCheck --check-prefix=CHECK-DEFAULT %s
-// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=on %s -emit-llvm -o - \
-// RUN:| FileCheck --check-prefix=CHECK-ON %s
-// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=fast %s -emit-llvm -o - \
-// RUN:| FileCheck --check-prefix=CHECK-CONTRACTFAST %s
-//
-// RUN: %clang_cc1 -triple=x86_64 -ffast-math %s -emit-llvm -o - \
-// RUN:| FileCheck --check-prefix=CHECK-DEFAULTFAST %s
-// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=off %s -emit-llvm -o - \
-// RUN:| FileCheck --check-prefix=CHECK-DEFAULTFAST %s
-// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=on %s -emit-llvm -o - \
-// RUN:| FileCheck --check-prefix=CHECK-ONFAST %s
-// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=fast %s -emit-llvm -o - \
-// RUN:| FileCheck --check-prefix=CHECK-FASTFAST %s
-float mymuladd( float x, float y, float z ) {
-  return x * y + z;
-  // CHECK-DEFAULT: = fmul float
-  // CHECK-DEFAULT: = fadd float
-
-  // CHECK-ON: = call float @llvm.fmuladd.f32
-
-  // CHECK-CONTRACTFAST: = fmul contract float
-  // CHECK-CONTRACTFAST: = fadd contract float
-
-  // CHECK-DEFAULTFAST: = fmul reassoc nnan ninf nsz arcp afn float
-  // CHECK-DEFAULTFAST: = fadd reassoc nnan ninf nsz arcp afn float
-
-  // CHECK-ONFAST: = call reassoc nnan ninf nsz arcp afn float @llvm.fmuladd.f32
-
-  // CHECK-FASTFAST: = fmul fast float
-  // CHECK-FASTFAST: = fadd fast float
+// CHECK: fmadd
+  float x = a * b;
+  float y = x + c;
+  return y;
 }
diff --git a/clang/test/CodeGen/ppc-emmintrin.c b/clang/test/CodeGen/ppc-emmintrin.c
index 4a246ff92d76..fa3801f50a01 100644
--- a/clang/test/CodeGen/ppc-emmintrin.c
+++ b/clang/test/CodeGen/ppc-emmintrin.c
@@ -2,9 +2,9 @@
 // REQUIRES: powerpc-registered-target
 // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
-// RUN:   -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE
+// RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE
 // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
-// RUN:   -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE
+// RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE
 // CHECK-BE-DAG: @_mm_movemask_pd.perm_mask = internal constant <4 x i32> <i32 -2139062144, i32 -2139062144, i32 -2139062144, i32 -2139078656>, align 16
 // CHECK-BE-DAG: @_mm_shuffle_epi32.permute_selectors = internal constant [4 x i32] [i32 66051, i32 67438087, i32 134810123, i32 202182159], align 4
diff --git a/clang/test/CodeGen/ppc-xmmintrin.c b/clang/test/CodeGen/ppc-xmmintrin.c
index a7f6ed6e0e67..d3f18bfbb1e5 100644
--- a/clang/test/CodeGen/ppc-xmmintrin.c
+++ b/clang/test/CodeGen/ppc-xmmintrin.c
@@ -2,11 +2,11 @@
 // REQUIRES: powerpc-registered-target
 // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
-// RUN:   -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE
+// RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE
 // RUN: %clang -x c++ -fsyntax-only -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
 // RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns
 // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
-// RUN:   -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE
+// RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE
 // RUN: %clang -x c++ -fsyntax-only -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
 // RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns
diff --git a/clang/test/Driver/fp-model.c b/clang/test/Driver/fp-model.c
index c6d683e25c0b..5fa9d110dd83 100644
--- a/clang/test/Driver/fp-model.c
+++ b/clang/test/Driver/fp-model.c
@@ -1,90 +1,88 @@
 // Test that incompatible combinations of -ffp-model= options
 // and other floating point options get a warning diagnostic.
+//
+// REQUIRES: clang-driver
-// RUN: %clang -target x86_64 -### -ffp-model=fast -ffp-contract=off -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=fast -ffp-contract=off -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN %s
 // WARN: warning: overriding '-ffp-model=fast' option with '-ffp-contract=off' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=fast -ffp-contract=on -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=fast -ffp-contract=on -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN1 %s
 // WARN1: warning: overriding '-ffp-model=fast' option with '-ffp-contract=on' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -fassociative-math -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -fassociative-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN2 %s
 // WARN2: warning: overriding '-ffp-model=strict' option with '-fassociative-math' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -ffast-math -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -ffast-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN3 %s
 // WARN3: warning: overriding '-ffp-model=strict' option with '-ffast-math' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -ffinite-math-only -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -ffinite-math-only -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN4 %s
 // WARN4: warning: overriding '-ffp-model=strict' option with '-ffinite-math-only' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN5 %s
 // WARN5: warning: overriding '-ffp-model=strict' option with '-ffp-contract=fast' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \
-// RUN:   | FileCheck --check-prefix=WARN6 %s
-// WARN6: warning: overriding '-ffp-model=strict' option with '-ffp-contract=fast' [-Woverriding-t-option]
-
-// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=on -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -ffp-contract=on -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN7 %s
 // WARN7: warning: overriding '-ffp-model=strict' option with '-ffp-contract=on' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-honor-infinities -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -fno-honor-infinities -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN8 %s
 // WARN8: warning: overriding '-ffp-model=strict' option with '-fno-honor-infinities' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-honor-nans -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -fno-honor-nans -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN9 %s
 // WARN9: warning: overriding '-ffp-model=strict' option with '-fno-honor-nans' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-rounding-math -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -fno-rounding-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARNa %s
 // WARNa: warning: overriding '-ffp-model=strict' option with '-fno-rounding-math' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-signed-zeros -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -fno-signed-zeros -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARNb %s
 // WARNb: warning: overriding '-ffp-model=strict' option with '-fno-signed-zeros' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-trapping-math -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -fno-trapping-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARNc %s
 // WARNc: warning: overriding '-ffp-model=strict' option with '-fno-trapping-math' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -freciprocal-math -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -freciprocal-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARNd %s
 // WARNd: warning: overriding '-ffp-model=strict' option with '-freciprocal-math' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -funsafe-math-optimizations -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -funsafe-math-optimizations -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARNe %s
 // WARNe: warning: overriding '-ffp-model=strict' option with '-funsafe-math-optimizations' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -Ofast -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -Ofast -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARNf %s
 // WARNf: warning: overriding '-ffp-model=strict' option with '-Ofast' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -ffp-model=strict -fdenormal-fp-math=preserve-sign,preserve-sign -c %s 2>&1 \
+// RUN: %clang -### -ffp-model=strict -fdenormal-fp-math=preserve-sign,preserve-sign -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=WARN10 %s
 // WARN10: warning: overriding '-ffp-model=strict' option with '-fdenormal-fp-math=preserve-sign,preserve-sign' [-Woverriding-t-option]
-// RUN: %clang -target x86_64 -### -c %s 2>&1 \
+// RUN: %clang -### -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-NOROUND %s
 // CHECK-NOROUND: "-cc1"
 // CHECK-NOROUND: "-fno-rounding-math"
-// RUN: %clang -target x86_64 -### -frounding-math -c %s 2>&1 \
+// RUN: %clang -### -frounding-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-ROUND --implicit-check-not ffp-exception-behavior=strict %s
 // CHECK-ROUND: "-cc1"
 // CHECK-ROUND: "-frounding-math"
-// RUN: %clang -target x86_64 -### -ftrapping-math -c %s 2>&1 \
+// RUN: %clang -### -ftrapping-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-TRAP %s
 // CHECK-TRAP: "-cc1"
 // CHECK-TRAP: "-ffp-exception-behavior=strict"
-// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=fast -c %s 2>&1 \
+// RUN: %clang -### -nostdinc -ffp-model=fast -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-FPM-FAST %s
 // CHECK-FPM-FAST: "-cc1"
 // CHECK-FPM-FAST: "-menable-no-infs"
@@ -98,35 +96,34 @@
 // CHECK-FPM-FAST: "-ffast-math"
 // CHECK-FPM-FAST: "-ffinite-math-only"
-// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=precise -c %s 2>&1 \
+// RUN: %clang -### -nostdinc -ffp-model=precise -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-FPM-PRECISE %s
 // CHECK-FPM-PRECISE: "-cc1"
-// CHECK-FPM-PRECISE: "-ffp-contract=on"
+// CHECK-FPM-PRECISE: "-ffp-contract=fast"
 // CHECK-FPM-PRECISE: "-fno-rounding-math"
-// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=strict -c %s 2>&1 \
+// RUN: %clang -### -nostdinc -ffp-model=strict -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-FPM-STRICT %s
 // CHECK-FPM-STRICT: "-cc1"
-// CHECK-FPM-STRICT: "-fmath-errno"
-// CHECK-FPM-STRICT: "-ffp-contract=off"
 // CHECK-FPM-STRICT: "-frounding-math"
 // CHECK-FPM-STRICT: "-ffp-exception-behavior=strict"
-// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=strict -c %s 2>&1 \
+// RUN: %clang -### -nostdinc -ffp-exception-behavior=strict -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-FEB-STRICT %s
 // CHECK-FEB-STRICT: "-cc1"
 // CHECK-FEB-STRICT: "-fno-rounding-math"
 // CHECK-FEB-STRICT: "-ffp-exception-behavior=strict"
-// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=maytrap -c %s 2>&1 \
+// RUN: %clang -### -nostdinc -ffp-exception-behavior=maytrap -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-FEB-MAYTRAP %s
 // CHECK-FEB-MAYTRAP: "-cc1"
 // CHECK-FEB-MAYTRAP: "-fno-rounding-math"
 // CHECK-FEB-MAYTRAP: "-ffp-exception-behavior=maytrap"
-// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=ignore -c %s 2>&1 \
+// RUN: %clang -### -nostdinc -ffp-exception-behavior=ignore -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-FEB-IGNORE %s
 // CHECK-FEB-IGNORE: "-cc1"
 // CHECK-FEB-IGNORE: "-fno-rounding-math"
 // CHECK-FEB-IGNORE: "-ffp-exception-behavior=ignore"
+
</cut>
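For readers less familiar with what `-ffp-contract` controls: contraction lets the compiler fuse `a * b + c` into one fused multiply-add, which rounds once instead of twice, so the numeric result (and the generated code) can differ. The sketch below is not part of the commit or of the CI output; the function names are hypothetical. It forces both behaviours explicitly, using a `volatile` store for the unfused path and `std::fma` for the fused path, so the outcome should not depend on the `-ffp-contract` setting in effect.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: multiply, round to double, then add (two roundings).
// The volatile store forces the product to be rounded to double even when
// the compiler is allowed to contract (e.g. under -ffp-contract=fast).
double unfused_muladd(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Hypothetical helper: fused multiply-add, a single rounding at the end.
double fused_muladd(double a, double b, double c) {
  return std::fma(a, b, c);
}

// Inputs chosen so the two paths disagree: a * b is exactly 1 - 2^-54,
// which is not representable as a double and rounds to 1.0.
void demo() {
  const double a = 1.0 + std::ldexp(1.0, -27);
  const double b = 1.0 - std::ldexp(1.0, -27);
  const double c = -1.0;
  assert(unfused_muladd(a, b, c) == 0.0);                  // rounding error lost
  assert(fused_muladd(a, b, c) == -std::ldexp(1.0, -54));  // exact result kept
}
```

Calling `demo()` from a small `main` exercises both paths. Per the diff above, the revert makes `-ffp-model=precise` select `-ffp-contract=fast` again rather than `on`; whether that accounts for the reported benchmark changes is not established by this sketch.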


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_apm/llvm-master-aarch64-spec2k6-Oz_LTO - Build # 6 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO

Culprit:
<cut>
commit 131b4620ee7847102479f399ce3e35a3c1cb5461
Author: Corentin Jabot <corentin.jabot(a)gmail.com>
Date:   Fri Aug 6 10:29:28 2021 -0400

    Implement P1937 consteval in unevaluated contexts

    In an unevaluated contexts, consteval functions
    should not be immediately evaluated.
</cut>

Results regressed to (for first_bad == 131b4620ee7847102479f399ce3e35a3c1cb5461)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_LTO artifacts/build-131b4620ee7847102479f399ce3e35a3c1cb5461/results_id:
1
# 470.lbm,lbm_base.default regressed by 104

from (for last_good == 3c8e94bc20e5829ab5167d21d242b6b624dd934e)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_LTO artifacts/build-3c8e94bc20e5829ab5167d21d242b6b624dd934e/results_id:
1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of last_good: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/4879
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of first_bad: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/4868
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-131b4620ee7847102479f399ce3e35a3c1cb5461
cd investigate-llvm-131b4620ee7847102479f399ce3e35a3c1cb5461

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 131b4620ee7847102479f399ce3e35a3c1cb5461
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 3c8e94bc20e5829ab5167d21d242b6b624dd934e
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit 131b4620ee7847102479f399ce3e35a3c1cb5461
Author: Corentin Jabot <corentin.jabot(a)gmail.com>
Date:   Fri Aug 6 10:29:28 2021 -0400

    Implement P1937 consteval in unevaluated contexts

    In an unevaluated contexts, consteval functions
    should not be immediately evaluated.
---
 clang/lib/Sema/SemaExpr.cpp | 7 ++---
 clang/test/CXX/basic/basic.def.odr/p2-typeid.cpp | 33 +++++++++++++++++++++++-
 clang/test/SemaCXX/cxx2a-consteval.cpp | 18 +++++++++++++
 clang/www/cxx_status.html | 3 ++-
 4 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index d316687c4cd8..8ef4a9d96320 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -16641,7 +16641,8 @@ void Sema::CheckUnusedVolatileAssignment(Expr *E) {
 }
 ExprResult Sema::CheckForImmediateInvocation(ExprResult E, FunctionDecl *Decl) {
-  if (!E.isUsable() || !Decl || !Decl->isConsteval() || isConstantEvaluated() ||
+  if (isUnevaluatedContext() || !E.isUsable() || !Decl ||
+      !Decl->isConsteval() || isConstantEvaluated() ||
       RebuildingImmediateInvocation)
     return E;
@@ -18758,8 +18759,8 @@ void Sema::MarkDeclRefReferenced(DeclRefExpr *E, const Expr *Base) {
       OdrUse = false;
   if (auto *FD = dyn_cast<FunctionDecl>(E->getDecl()))
-    if (!isConstantEvaluated() && FD->isConsteval() &&
-        !RebuildingImmediateInvocation)
+    if (!isUnevaluatedContext() && !isConstantEvaluated() &&
+        FD->isConsteval() && !RebuildingImmediateInvocation)
       ExprEvalContexts.back().ReferenceToConsteval.insert(E);
   MarkExprReferenced(*this, E->getLocation(), E->getDecl(), E, OdrUse,
                      RefsMinusAssignments);
diff --git a/clang/test/CXX/basic/basic.def.odr/p2-typeid.cpp b/clang/test/CXX/basic/basic.def.odr/p2-typeid.cpp
index 55debe3ca731..fafcd127feec 100644
--- a/clang/test/CXX/basic/basic.def.odr/p2-typeid.cpp
+++ b/clang/test/CXX/basic/basic.def.odr/p2-typeid.cpp
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 -fsyntax-only -verify %s
+// RUN: %clang_cc1 -std=c++20 -fsyntax-only -verify %s
 // C++ [basic.def.odr]p2:
 //   An expression is potentially evaluated unless it [...] is the
@@ -16,7 +17,7 @@ struct Poly {
 struct NonPoly { };
-template<typename T, typename Result = T>
+template<typename T, typename Result = T>
 struct X {
   Result f(T t) { return t + t; } // expected-error{{invalid operands}}
@@ -34,3 +35,33 @@ void test(X<Poly> xp, X<Poly, Poly&> xpr, X<NonPoly> xnp, X<NonPoly, NonPoly&> x
   // Triggers an error (as it should);
   xpr.g(Poly()); // expected-note{{instantiation of member function}}
 }
+
+#if __cplusplus >= 202002L
+
+namespace unevaluated {
+
+struct S {
+  void f();
+};
+struct T {
+  virtual void f();
+};
+
+consteval S *null_s() { return nullptr; }
+consteval S *make_s() { return new S; }
+consteval T *null_t() { return nullptr; }
+consteval T *make_t() { return new T; } // #alloc
+
+void func() {
+  (void)typeid(*null_s());
+  (void)typeid(*make_s());
+  (void)typeid(*null_t()); // expected-warning {{expression with side effects will be evaluated despite being used as an operand to 'typeid'}}
+  (void)typeid(*make_t()); // expected-error {{call to consteval function 'unevaluated::make_t' is not a constant expression}} \
+                              expected-note {{pointer to heap-allocated object is not a constant expression}} \
+                              expected-note@#alloc {{heap allocation performed here}} \
+                              expected-warning {{expression with side effects will be evaluated despite being used as an operand to 'typeid'}}
+}
+
+} // namespace unevaluated
+
+#endif
diff --git a/clang/test/SemaCXX/cxx2a-consteval.cpp b/clang/test/SemaCXX/cxx2a-consteval.cpp
index ecf8c1e0f5bd..04c8898aa5ba 100644
--- a/clang/test/SemaCXX/cxx2a-consteval.cpp
+++ b/clang/test/SemaCXX/cxx2a-consteval.cpp
@@ -594,3 +594,21 @@ void test() {
 }
 } // namespace special_ctor
+
+namespace unevaluated {
+
+template <typename T, typename U> struct is_same { static const bool value = false; };
+template <typename T> struct is_same<T, T> { static const bool value = true; };
+
+long f(); // expected-note {{declared here}}
+auto consteval g(auto a) {
+  return a;
+}
+
+auto e = g(f()); // expected-error {{is not a constant expression}}
+                 // expected-note@-1 {{non-constexpr function 'f' cannot be used in a constant expression}}
+
+using T = decltype(g(f()));
+static_assert(is_same<long, T>::value);
+
+} // namespace unevaluated
diff --git a/clang/www/cxx_status.html b/clang/www/cxx_status.html
index 60ce69db9922..3cbee7026c5c 100755
--- a/clang/www/cxx_status.html
+++ b/clang/www/cxx_status.html
@@ -1105,10 +1105,11 @@ code. This issue is expected to be rectified soon.
     <tr>
       <td rowspan=2>Immediate functions (<tt>consteval</tt>)</td>
       <td><a href="https://wg21.link/p1073r3">P1073R3</a></td>
-      <td rowspan=2 class="none" align="center">No</td>
+      <td class="partial" align="center">Partial</td>
     </tr>
     <tr>
       <td><a href="https://wg21.link/p1937r2">P1937R2</a></td>
+      <td class="unreleased" align="center">Clang 14</td>
     </tr>
     <tr>
       <td><tt>std::is_constant_evaluated</tt></td>
</cut>

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/gnu-release-aarch64-spec2k6-O3 - Build # 32 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3

Culprit:
<cut>
commit ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0
Author: Richard Biener <rguenther(a)suse.de>
Date: Tue Aug 17 08:38:35 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls

    This backports a fix for the omission of a check of trapping mems when hoisting them across calls that might not return. This was originally done as part of a fix to handle const functions that throw properly.

    2021-08-17 Richard Biener <rguenther(a)suse.de>

    PR tree-optimization/101373
    PR tree-optimization/101868
    * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping references when the BB may not return.

    * gcc.dg/lto/pr101868_0.c: New testcase.
    * gcc.dg/lto/pr101868_1.c: Likewise.
    * gcc.dg/lto/pr101868_2.c: Likewise.
    * gcc.dg/lto/pr101868_3.c: Likewise.
</cut>

Results regressed to (for first_bad == ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -O3 artifacts/build-ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0/results_id: 1
# 429.mcf,mcf_base.default regressed by 106
# 429.mcf,[.] price_out_impl regressed by 174

from (for last_good == a0a0499b8bb920fdd98e791804812f001f0b4fe8)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -O3 artifacts/build-a0a0499b8bb920fdd98e791804812f001f0b4fe8/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O3/4846
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O3/4851
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0
cd investigate-gcc-ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach a0a0499b8bb920fdd98e791804812f001f0b4fe8
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Full commit (up to 1000 lines):
<cut>
commit ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0
Author: Richard Biener <rguenther(a)suse.de>
Date: Tue Aug 17 08:38:35 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls

    This backports a fix for the omission of a check of trapping mems when hoisting them across calls that might not return. This was originally done as part of a fix to handle const functions that throw properly.

    2021-08-17 Richard Biener <rguenther(a)suse.de>

    PR tree-optimization/101373
    PR tree-optimization/101868
    * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping references when the BB may not return.

    * gcc.dg/lto/pr101868_0.c: New testcase.
    * gcc.dg/lto/pr101868_1.c: Likewise.
    * gcc.dg/lto/pr101868_2.c: Likewise.
    * gcc.dg/lto/pr101868_3.c: Likewise.
--- gcc/testsuite/gcc.dg/lto/pr101868_0.c | 33 +++++++++++++++++++++++++++++++++ gcc/testsuite/gcc.dg/lto/pr101868_1.c | 23 +++++++++++++++++++++++ gcc/testsuite/gcc.dg/lto/pr101868_2.c | 11 +++++++++++ gcc/testsuite/gcc.dg/lto/pr101868_3.c | 8 ++++++++ gcc/tree-ssa-pre.c | 7 +++++++ 5 files changed, 82 insertions(+) diff --git a/gcc/testsuite/gcc.dg/lto/pr101868_0.c b/gcc/testsuite/gcc.dg/lto/pr101868_0.c new file mode 100644 index 00000000000..c84d19b0267 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr101868_0.c @@ -0,0 +1,33 @@ +/* { dg-lto-do run } */ +/* { dg-lto-options { "-O2 -fno-strict-aliasing -flto" } } */ + +typedef unsigned long VALUE; + +__attribute__ ((cold)) +void rb_check_type(VALUE, int); + +static VALUE +repro(VALUE dummy, VALUE hash) +{ + if (hash == 0) { + rb_check_type(hash, 1); + } + else if (*(long *)hash) { + rb_check_type(hash, 1); + } + + + return *(long *)hash; +} + +static VALUE (*that)(VALUE dummy, VALUE hash) = repro; + +int +main(int argc, char **argv) +{ + argc--; + that(0, argc); + + rb_check_type(argc, argc); + +} diff --git a/gcc/testsuite/gcc.dg/lto/pr101868_1.c b/gcc/testsuite/gcc.dg/lto/pr101868_1.c new file mode 100644 index 00000000000..146c14abc76 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr101868_1.c @@ -0,0 +1,23 @@ +typedef unsigned long VALUE; + + +__attribute__ ((noreturn)) void rexc_raise(VALUE mesg); + +VALUE rb_donothing(VALUE klass); + +static void +funexpected_type(VALUE x, int xt, int t) +{ + rexc_raise(rb_donothing(0)); +} + +__attribute__ ((cold)) +void +rb_check_type(VALUE x, int t) +{ + int xt; + + if (x == 0) { + funexpected_type(x, xt, t); + } +} diff --git a/gcc/testsuite/gcc.dg/lto/pr101868_2.c b/gcc/testsuite/gcc.dg/lto/pr101868_2.c new file mode 100644 index 00000000000..e6f01b23f45 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr101868_2.c @@ -0,0 +1,11 @@ +typedef unsigned long VALUE; + +static void thing(void) {} +static void (*ptr)(void) = &thing; + +VALUE +rb_donothing(VALUE klass) +{ + ptr(); + 
return 0; +} diff --git a/gcc/testsuite/gcc.dg/lto/pr101868_3.c b/gcc/testsuite/gcc.dg/lto/pr101868_3.c new file mode 100644 index 00000000000..61217625be7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr101868_3.c @@ -0,0 +1,8 @@ +typedef unsigned long VALUE; + +__attribute__((noreturn)) +void +rexc_raise(VALUE mesg) +{ + __builtin_exit(0); +} diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index 04ec4fbaeec..2aedc31e1d7 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -2070,6 +2070,13 @@ prune_clobbered_mems (bitmap_set_t set, basic_block block) && value_dies_in_block_x (expr, block)))) to_remove = i; } + /* If the REFERENCE may trap make sure the block does not contain + a possible exit point. + ??? This is overly conservative if we translate AVAIL_OUT + as the available expression might be after the exit point. */ + if (BB_MAY_NOTRETURN (block) + && vn_reference_may_trap (ref)) + to_remove = i; } else if (expr->kind == NARY) { </cut>
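The commit message above says a possibly trapping memory reference must not be hoisted across a call that might not return. The following is a toy model in Python of why the order matters, not GCC code: `Trap`, `Exit`, `MEMORY`, `load`, `check_type`, `original`, `hoisted` and `outcome` are all hypothetical names, and the model assumes only one mapped address.

```python
class Trap(Exception):
    """Models a fault from dereferencing an unmapped address."""


class Exit(Exception):
    """Models a call that ends the program and never returns."""


# Assumption for this sketch: address 8 is the only mapped location.
MEMORY = {8: 42}


def load(addr):
    # A load traps when the address is not mapped.
    if addr not in MEMORY:
        raise Trap(addr)
    return MEMORY[addr]


def check_type(addr):
    # Loosely modelled on rb_check_type in the testcase: it does not
    # return when handed a null value.
    if addr == 0:
        raise Exit()


def original(addr):
    check_type(addr)   # may not return
    return load(addr)  # only reached when the check returned


def hoisted(addr):
    value = load(addr)  # load moved above the call: traps before the exit
    check_type(addr)
    return value


def outcome(fn, addr):
    # Report either the loaded value or the kind of abnormal termination.
    try:
        return fn(addr)
    except (Trap, Exit) as exc:
        return type(exc).__name__
```

For a valid address both orders return the same value. For address 0 the original order exits cleanly while the hoisted order traps, which is the kind of behaviour change the added `BB_MAY_NOTRETURN` check is meant to rule out.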

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O3_LTO - Build # 32 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO

Culprit:
<cut>
commit 19dc02e99f802922a3af69e802465bee0723b57a
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date: Sun Aug 22 18:15:55 2021 +0200

    [MergeICmps] Allow sinking past non-load/store

    This is a followup to D106591. MergeICmps currently only allows sinking the loads past either instructions that don't write to memory at all, or simple loads/stores that don't modify the memory the loads access.

    The "simple loads/stores" part of this check doesn't seem necessary to me -- AA isModRef() already accurately models any operation that may clobber the memory.

    For example, in the adjusted test case the transform is still fine if the call to @foo() isn't readonly, but inaccessiblememonly -- in both cases, the call cannot modify the loaded memory.

    Differential Revision: https://reviews.llvm.org/D108517
</cut>

Results regressed to (for first_bad == 19dc02e99f802922a3af69e802465bee0723b57a)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3_LTO artifacts/build-19dc02e99f802922a3af69e802465bee0723b57a/results_id: 1
# 464.h264ref,h264ref_base.default regressed by 105

from (for last_good == da12d88b1c5fc42b49b92fcf94917ca489dd677f)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3_LTO artifacts/build-da12d88b1c5fc42b49b92fcf94917ca489dd677f/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3_LTO/4822
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3_LTO/4807
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-19dc02e99f802922a3af69e802465bee0723b57a
cd investigate-llvm-19dc02e99f802922a3af69e802465bee0723b57a

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 19dc02e99f802922a3af69e802465bee0723b57a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach da12d88b1c5fc42b49b92fcf94917ca489dd677f
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit 19dc02e99f802922a3af69e802465bee0723b57a
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date: Sun Aug 22 18:15:55 2021 +0200

    [MergeICmps] Allow sinking past non-load/store

    This is a followup to D106591. MergeICmps currently only allows sinking the loads past either instructions that don't write to memory at all, or simple loads/stores that don't modify the memory the loads access.

    The "simple loads/stores" part of this check doesn't seem necessary to me -- AA isModRef() already accurately models any operation that may clobber the memory.

    For example, in the adjusted test case the transform is still fine if the call to @foo() isn't readonly, but inaccessiblememonly -- in both cases, the call cannot modify the loaded memory.
Differential Revision: https://reviews.llvm.org/D108517 --- llvm/lib/Transforms/Scalar/MergeICmps.cpp | 14 +------------- .../Transforms/MergeICmps/X86/split-block-does-work.ll | 2 +- 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp index f13f24ad2027..34465c76dd3d 100644 --- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp +++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp @@ -66,15 +66,6 @@ namespace { #define DEBUG_TYPE "mergeicmps" -// Returns true if the instruction is a simple load or a simple store -static bool isSimpleLoadOrStore(const Instruction *I) { - if (const LoadInst *LI = dyn_cast<LoadInst>(I)) - return LI->isSimple(); - if (const StoreInst *SI = dyn_cast<StoreInst>(I)) - return SI->isSimple(); - return false; -} - // A BCE atom "Binary Compare Expression Atom" represents an integer load // that is a constant offset from a base value, e.g. `a` or `o.c` in the example // at the top. @@ -244,10 +235,7 @@ bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst, // If this instruction may clobber the loads and is in middle of the BCE cmp // block instructions, then bail for now. if (Inst->mayWriteToMemory()) { - // Bail if this is not a simple load or store - if (!isSimpleLoadOrStore(Inst)) - return false; - // Disallow stores that might alias the BCE operands + // Disallow instructions that might modify the BCE operands MemoryLocation LLoc = MemoryLocation::get(Cmp.Lhs.LoadI); MemoryLocation RLoc = MemoryLocation::get(Cmp.Rhs.LoadI); if (isModSet(AA.getModRefInfo(Inst, LLoc)) || diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll index 0b9663f44980..1e341b92918d 100644 --- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll +++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll @@ -3,7 +3,7 @@ %S = type { i32, i32, i32, i32 } -declare void @foo(...) 
readonly +declare void @foo(...) inaccessiblememonly ; We can split %entry and create a memcmp(16 bytes). define zeroext i1 @opeq1( </cut>
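The change to the write check in `canSinkBCECmpInst` can be pictured with a small toy model. This is a Python sketch, not the LLVM API: `Inst`, `blocked_before` and `blocked_after` are hypothetical names, alias analysis is reduced to a set of locations an instruction may modify, and only the memory-write part of the check quoted in the diff is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical stand-in for an instruction between the compared loads."""
    name: str
    may_write: bool = False
    simple_load_or_store: bool = False
    may_modify: frozenset = frozenset()  # locations the instruction may clobber


def blocked_before(inst, lhs, rhs):
    """Write check as it was before the commit: True means sinking is refused."""
    if not inst.may_write:
        return False
    if not inst.simple_load_or_store:
        return True  # anything other than a simple load/store bailed out
    return lhs in inst.may_modify or rhs in inst.may_modify


def blocked_after(inst, lhs, rhs):
    """Write check after the commit: rely on the mod/ref answer alone."""
    if not inst.may_write:
        return False
    return lhs in inst.may_modify or rhs in inst.may_modify


# A call that may write, but only to memory the loads cannot observe,
# loosely mirroring the inaccessiblememonly @foo in the adjusted test.
call_foo = Inst("call @foo", may_write=True)
# A simple store that may clobber the left-hand operand of the comparison.
store_lhs = Inst("store", may_write=True, simple_load_or_store=True,
                 may_modify=frozenset({"lhs"}))
```

In this model the call blocks sinking under the old check and no longer blocks it under the new one, while a store that may clobber an operand blocks sinking in both versions.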

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_apm/llvm-release-aarch64-spec2k6-Os - Build # 1 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os

Culprit:
<cut>
commit 1828e57eb58685a6a7f6d4f4f698dfebf98ef789
Author: Sami Tolvanen <samitolvanen(a)google.com>
Date: Tue Aug 3 10:56:56 2021 -0700

    ThinLTO: Fix inline assembly references to static functions with CFI

    Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them.

    Relands 700d07f8ce6f2879610fd6b6968b05c6f17bb915 with -msvc targets fixed.

    Link: https://github.com/ClangBuiltLinux/linux/issues/1354

    Reviewed By: nickdesaulniers, pcc

    Differential Revision: https://reviews.llvm.org/D104058

    (cherry picked from commit 7ce1c4da7726577986535cb7766d782f325145fe)
</cut>

Results regressed to (for first_bad == 1828e57eb58685a6a7f6d4f4f698dfebf98ef789)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Os artifacts/build-1828e57eb58685a6a7f6d4f4f698dfebf98ef789/results_id: 1
# 453.povray,povray_base.default regressed by 102
# 470.lbm,lbm_base.default regressed by 103
# 470.lbm,[.] LBM_performStreamCollide regressed by 118

from (for last_good == 7161e4f3345fda1b640a8250a4b34d23c74b0489)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Os artifacts/build-7161e4f3345fda1b640a8250a4b34d23c74b0489/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Results ID of last_good: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-release-aarch64-spec2k6-Os/4779
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Results ID of first_bad: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-release-aarch64-spec2k6-Os/4788
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-1828e57eb58685a6a7f6d4f4f698dfebf98ef789
cd investigate-llvm-1828e57eb58685a6a7f6d4f4f698dfebf98ef789

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 1828e57eb58685a6a7f6d4f4f698dfebf98ef789
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 7161e4f3345fda1b640a8250a4b34d23c74b0489
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Full commit (up to 1000 lines):
<cut>
commit 1828e57eb58685a6a7f6d4f4f698dfebf98ef789
Author: Sami Tolvanen <samitolvanen(a)google.com>
Date: Tue Aug 3 10:56:56 2021 -0700

    ThinLTO: Fix inline assembly references to static functions with CFI

    Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them.

    Relands 700d07f8ce6f2879610fd6b6968b05c6f17bb915 with -msvc targets fixed.

    Link: https://github.com/ClangBuiltLinux/linux/issues/1354

    Reviewed By: nickdesaulniers, pcc

    Differential Revision: https://reviews.llvm.org/D104058

    (cherry picked from commit 7ce1c4da7726577986535cb7766d782f325145fe)
--- llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | 21 +++++++++++++++++++++ llvm/test/ThinLTO/X86/devirt2.ll | 4 ++++ .../cfi-icall-static-inline-asm.ll | 22 ++++++++++++++++++++++ .../ThinLTOBitcodeWriter/split-internal2.ll | 3 +++ .../ThinLTOBitcodeWriter/split-vfunc-internal.ll | 3 +++ 5 files changed, 53 insertions(+) diff --git a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp index 37329b489555..eea848d3eb2f 100644 --- a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp +++ b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp @@ -33,6 +33,19 @@ using namespace llvm; namespace { +// Determine if a promotion alias should be created for a symbol name. 
+static bool allowPromotionAlias(const std::string &Name) { + // Promotion aliases are used only in inline assembly. It's safe to + // simply skip unusual names. Subset of MCAsmInfo::isAcceptableChar() + // and MCAsmInfoXCOFF::isAcceptableChar(). + for (const char &C : Name) { + if (isAlnum(C) || C == '_' || C == '.') + continue; + return false; + } + return true; +} + // Promote each local-linkage entity defined by ExportM and used by ImportM by // changing visibility and appending the given ModuleId. void promoteInternals(Module &ExportM, Module &ImportM, StringRef ModuleId, @@ -55,6 +68,7 @@ void promoteInternals(Module &ExportM, Module &ImportM, StringRef ModuleId, } } + std::string OldName = Name.str(); std::string NewName = (Name + ModuleId).str(); if (const auto *C = ExportGV.getComdat()) @@ -69,6 +83,13 @@ void promoteInternals(Module &ExportM, Module &ImportM, StringRef ModuleId, ImportGV->setName(NewName); ImportGV->setVisibility(GlobalValue::HiddenVisibility); } + + if (isa<Function>(&ExportGV) && allowPromotionAlias(OldName)) { + // Create a local alias with the original name to avoid breaking + // references from inline assembly. 
+ std::string Alias = ".set " + OldName + "," + NewName + "\n"; + ExportM.appendModuleInlineAsm(Alias); + } } if (!RenamedComdats.empty()) diff --git a/llvm/test/ThinLTO/X86/devirt2.ll b/llvm/test/ThinLTO/X86/devirt2.ll index 42c15f1c1df5..6501a01a39df 100644 --- a/llvm/test/ThinLTO/X86/devirt2.ll +++ b/llvm/test/ThinLTO/X86/devirt2.ll @@ -131,10 +131,12 @@ ; RUN: -r=%t1.o,_ZN1D1mEi, \ ; RUN: -r=%t1.o,test2, \ ; RUN: -r=%t2.o,_ZN1A1nEi,p \ +; RUN: -r=%t2.o,_ZN1A1nEi, \ ; RUN: -r=%t2.o,_ZN1B1fEi,p \ ; RUN: -r=%t2.o,_ZN1C1fEi,p \ ; RUN: -r=%t2.o,_ZN1D1mEi,p \ ; RUN: -r=%t2.o,_ZN1E1mEi,p \ +; RUN: -r=%t2.o,_ZN1E1mEi, \ ; RUN: -r=%t2.o,_ZTV1B, \ ; RUN: -r=%t2.o,_ZTV1C, \ ; RUN: -r=%t2.o,_ZTV1D, \ @@ -167,10 +169,12 @@ ; RUN: -r=%t1.o,_ZN1D1mEi, \ ; RUN: -r=%t1.o,test2, \ ; RUN: -r=%t2.o,_ZN1A1nEi,p \ +; RUN: -r=%t2.o,_ZN1A1nEi, \ ; RUN: -r=%t2.o,_ZN1B1fEi,p \ ; RUN: -r=%t2.o,_ZN1C1fEi,p \ ; RUN: -r=%t2.o,_ZN1D1mEi,p \ ; RUN: -r=%t2.o,_ZN1E1mEi,p \ +; RUN: -r=%t2.o,_ZN1E1mEi, \ ; RUN: -r=%t2.o,_ZTV1B, \ ; RUN: -r=%t2.o,_ZTV1C, \ ; RUN: -r=%t2.o,_ZTV1D, \ diff --git a/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-icall-static-inline-asm.ll b/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-icall-static-inline-asm.ll new file mode 100644 index 000000000000..c2de21ed4562 --- /dev/null +++ b/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-icall-static-inline-asm.ll @@ -0,0 +1,22 @@ +; REQUIRES: x86-registered-target +; RUN: opt -thinlto-bc -thinlto-split-lto-unit -o - %s | llvm-modextract -b -n 0 -o - | llvm-dis | FileCheck %s + +target triple = "x86_64-unknown-linux-gnu" + +; CHECK: module asm ".set a,a.[[HASH:[0-9a-f]+]]" + +define void @b() { + %f = alloca void ()*, align 8 + ; CHECK: store{{.*}} @a.[[HASH]],{{.*}} %f + store void ()* @a, void ()** %f, align 8 + ; CHECK: %1 = call void ()* asm sideeffect "leaq a(%rip) + %1 = call void ()* asm sideeffect "leaq a(%rip), $0\0A\09", "=r,~{dirflag},~{fpsr},~{flags}"() + ret void +} + +; CHECK: define{{.*}} @a.[[HASH]](){{.*}} 
!type +define internal void @a() !type !0 { + ret void +} + +!0 = !{i64 0, !"typeid1"} diff --git a/llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll b/llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll index 98cc80e557f9..f50fe3f93b08 100644 --- a/llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll +++ b/llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll @@ -1,3 +1,4 @@ +; REQUIRES: x86-registered-target ; RUN: opt -thinlto-bc -thinlto-split-lto-unit -o %t %s ; RUN: llvm-modextract -b -n 0 -o %t0 %t ; RUN: llvm-modextract -b -n 1 -o %t1 %t @@ -7,6 +8,8 @@ ; RUN: llvm-bcanalyzer -dump %t0 | FileCheck --check-prefix=BCA0 %s ; RUN: llvm-bcanalyzer -dump %t1 | FileCheck --check-prefix=BCA1 %s +target triple = "x86_64-unknown-linux-gnu" + ; ERROR: llvm-modextract: error: module index out of range; bitcode file contains 2 module(s) ; BCA0: <GLOBALVAL_SUMMARY_BLOCK diff --git a/llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll b/llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll index d17cbefb0fb1..0d67b74ca5fc 100644 --- a/llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll +++ b/llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll @@ -1,7 +1,10 @@ +; REQUIRES: x86-registered-target ; RUN: opt -thinlto-bc -thinlto-split-lto-unit -o %t %s ; RUN: llvm-modextract -b -n 0 -o - %t | llvm-dis | FileCheck --check-prefix=M0 %s ; RUN: llvm-modextract -b -n 1 -o - %t | llvm-dis | FileCheck --check-prefix=M1 %s +target triple = "x86_64-unknown-linux-gnu" + define [1 x i8*]* @source() { ret [1 x i8*]* @g } </cut>
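The alias logic in the ThinLTOBitcodeWriter.cpp hunk above can be sketched outside LLVM as follows. This is an illustrative Python rendering under stated assumptions: `allow_promotion_alias` and `promotion_alias` are hypothetical names, the module ID string in the usage is made up, and the `isa<Function>` restriction from the diff is left out.

```python
import string

# Characters the diff accepts in a symbol name: ASCII alphanumerics, '_' and '.'.
ALLOWED = set(string.ascii_letters + string.digits + "_.")


def allow_promotion_alias(name):
    # Mirrors the loop in the diff: reject the name on the first unusual character.
    return all(ch in ALLOWED for ch in name)


def promotion_alias(old_name, module_id):
    """Return the module-level inline asm line for a renamed symbol, or None."""
    new_name = old_name + module_id
    if not allow_promotion_alias(old_name):
        return None  # unusual names are skipped, as the diff's comment says
    return ".set " + old_name + "," + new_name + "\n"
```

With a hypothetical module ID of `.0123abcd`, `promotion_alias("a", ".0123abcd")` yields `.set a,a.0123abcd` followed by a newline, which has the shape the new `cfi-icall-static-inline-asm.ll` test checks for.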

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O2 - Build # 17 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2

Culprit:
<cut>
commit d39d3a327b1303012370e47d991459ffbfce45ef
Author: Peyton, Jonathan L <jonathan.l.peyton(a)intel.com>
Date: Fri Aug 20 16:06:13 2021 -0500

    [OpenMP][test] fix omp_get_wtime.c test to be more accommodating

    The omp_get_wtime.c test fails intermittently if the recorded times are off by too much which can happen when many tests are run in parallel.

    Instead of failing if one timing is a little off, take average of 100 timings minus the 10 worst.

    Differential Revision: https://reviews.llvm.org/D108488
</cut>

Results regressed to (for first_bad == d39d3a327b1303012370e47d991459ffbfce45ef)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-d39d3a327b1303012370e47d991459ffbfce45ef/results_id: 1
# 447.dealII,dealII_base.default regressed by 105

from (for last_good == f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/4734
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/4757
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-d39d3a327b1303012370e47d991459ffbfce45ef
cd investigate-llvm-d39d3a327b1303012370e47d991459ffbfce45ef

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach d39d3a327b1303012370e47d991459ffbfce45ef
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit d39d3a327b1303012370e47d991459ffbfce45ef
Author: Peyton, Jonathan L <jonathan.l.peyton(a)intel.com>
Date: Fri Aug 20 16:06:13 2021 -0500

    [OpenMP][test] fix omp_get_wtime.c test to be more accommodating

    The omp_get_wtime.c test fails intermittently if the recorded times are off by too much which can happen when many tests are run in parallel.

    Instead of failing if one timing is a little off, take average of 100 timings minus the 10 worst.

    Differential Revision: https://reviews.llvm.org/D108488
--- openmp/runtime/test/api/omp_get_wtime.c | 75 ++++++++++++++++++++++++++------- 1 file changed, 59 insertions(+), 16 deletions(-) diff --git a/openmp/runtime/test/api/omp_get_wtime.c b/openmp/runtime/test/api/omp_get_wtime.c index e2bb211e0ce4..a862e07fc5a2 100644 --- a/openmp/runtime/test/api/omp_get_wtime.c +++ b/openmp/runtime/test/api/omp_get_wtime.c @@ -4,30 +4,73 @@ #include "omp_testsuite.h" #include "omp_my_sleep.h" -int test_omp_get_wtime() -{ +#define NTIMES 100 + +// This is the error % threshold. Be generous with the error threshold since +// this test may be run in parallel with many other tests it may throw off the +// sleep timing. 
+#define THRESHOLD 33.0 + +double test_omp_get_wtime(double desired_wait_time) { double start; double end; - double measured_time; - double wait_time = 0.2; start = 0; end = 0; start = omp_get_wtime(); - my_sleep (wait_time); + my_sleep(desired_wait_time); end = omp_get_wtime(); - measured_time = end-start; - return ((measured_time > 0.97 * wait_time) && (measured_time < 1.03 * wait_time)) ; + return end - start; } -int main() -{ - int i; - int num_failed=0; +int compare_times(const void *lhs, const void *rhs) { + const double *a = (const double *)lhs; + const double *b = (const double *)rhs; + return *a - *b; +} + +int main() { + int i, final_count; + double percent_off; + double *begin, *end, *ptr; + double wait_time = 0.01; + double average = 0.0; + double n = 0.0; + double *times = (double *)malloc(sizeof(double) * NTIMES); + + // Get each timing + for (i = 0; i < NTIMES; i++) { + times[i] = test_omp_get_wtime(wait_time); + } + + // Remove approx the "worst" tenth of the timings + qsort(times, NTIMES, sizeof(double), compare_times); + begin = times; + end = times + NTIMES; + for (i = 0; i < NTIMES / 10; ++i) { + if (i % 2 == 0) + begin++; + else + end--; + } + + // Get the average of the remaining timings + for (ptr = begin, final_count = 0; ptr != end; ++ptr, ++final_count) + average += times[i]; + average /= (double)final_count; + free(times); + + // Calculate the percent off of desired wait time + percent_off = (average - wait_time) / wait_time * 100.0; + // Should always be positive, but just in case + if (percent_off < 0) + percent_off = -percent_off; - for(i = 0; i < REPETITIONS; i++) { - if(!test_omp_get_wtime()) { - num_failed++; - } + if (percent_off > (double)THRESHOLD) { + fprintf(stderr, "error: average of %d runs (%lf) is of by %lf%%\n", NTIMES, + average, percent_off); + return EXIT_FAILURE; } - return num_failed; + printf("pass: average of %d runs (%lf) is only off by %lf%%\n", NTIMES, + average, percent_off); + return EXIT_SUCCESS; } </cut>
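The averaging scheme described in the commit message above (sort the timings, discard roughly a tenth of them alternately from the low and high ends, average what remains) can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not the committed test: the names `trimmed_mean`, `percent_off` and `compare_doubles` are hypothetical, the comparator returns an explicit -1/0/1 where the quoted diff returns `*a - *b` converted to `int`, and the summation reads the retained range directly where the quoted diff indexes `times[i]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles. Returning (a > b) - (a < b) avoids
 * converting a fractional difference to int, which would truncate
 * sub-unit differences to 0. */
static int compare_doubles(const void *lhs, const void *rhs) {
  double a = *(const double *)lhs;
  double b = *(const double *)rhs;
  return (a > b) - (a < b);
}

/* Hypothetical helper: sorts `samples` in place, discards `drop_count`
 * values alternately from the low end and the high end, and returns the
 * mean of the values that remain. */
double trimmed_mean(double *samples, size_t count, size_t drop_count) {
  assert(samples != NULL && drop_count < count);
  qsort(samples, count, sizeof(double), compare_doubles);

  size_t begin = 0;
  size_t end = count;
  for (size_t i = 0; i < drop_count; ++i) {
    if (i % 2 == 0)
      ++begin; /* drop the smallest remaining sample */
    else
      --end;   /* drop the largest remaining sample */
  }

  double sum = 0.0;
  for (size_t i = begin; i < end; ++i)
    sum += samples[i];
  return sum / (double)(end - begin);
}

/* Hypothetical helper: absolute deviation of `measured` from `expected`,
 * expressed as a percentage of `expected`. */
double percent_off(double measured, double expected) {
  double percent = (measured - expected) / expected * 100.0;
  return percent < 0 ? -percent : percent;
}
```

With 100 samples and a drop count of 10, a call such as `trimmed_mean(times, 100, 10)` would average the middle 90 timings, and the result of `percent_off` could then be compared against a tolerance such as the 33.0 used in the quoted diff.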

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_apm/llvm-release-arm-spec2k6-Oz - Build # 4 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Oz. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Oz

Culprit:
<cut>
commit 876de062f94650f9ded56a22b062236f711fcd18
Author: Marius Brehler <marius.brehler(a)iml.fraunhofer.de>
Date:   Wed Jun 9 13:38:10 2021 +0000

    [mlir] Add EmitC dialect

    This upstreams the EmitC dialect and the corresponding Cpp target, both
    initially presented with [1], from [2] to MLIR core. For the related
    discussion, see [3].

    [1] https://reviews.llvm.org/D76571
    [2] https://github.com/iml130/mlir-emitc
    [3] https://llvm.discourse.group/t/emitc-generating-c-c-from-mlir/3388

    Co-authored-by: Jacques Pienaar <jpienaar(a)google.com>
    Co-authored-by: Simon Camphausen <simon.camphausen(a)iml.fraunhofer.de>
    Co-authored-by: Oliver Scherf <oliver.scherf(a)iml.fraunhofer.de>

    Reviewed By: rriddle

    Differential Revision: https://reviews.llvm.org/D103969
</cut>

Results regressed to (for first_bad == 876de062f94650f9ded56a22b062236f711fcd18)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_mthumb artifacts/build-876de062f94650f9ded56a22b062236f711fcd18/results_id: 1
# 470.lbm,lbm_base.default regressed by 107

from (for last_good == 1bd4085e0bbc14ec61ab69c83464098622b2df56)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_mthumb artifacts/build-1bd4085e0bbc14ec61ab69c83464098622b2df56/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Results ID of last_good: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-release-arm-spec2k6-Oz/4752
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Results ID of first_bad: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-release-arm-spec2k6-Oz/4718
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-876de062f94650f9ded56a22b062236f711fcd18
cd investigate-llvm-876de062f94650f9ded56a22b062236f711fcd18

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 876de062f94650f9ded56a22b062236f711fcd18
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 1bd4085e0bbc14ec61ab69c83464098622b2df56
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Full commit (up to 1000 lines):
<cut>
commit 876de062f94650f9ded56a22b062236f711fcd18
Author: Marius Brehler <marius.brehler(a)iml.fraunhofer.de>
Date:   Wed Jun 9 13:38:10 2021 +0000

    [mlir] Add EmitC dialect

    This upstreams the EmitC dialect and the corresponding Cpp target, both
    initially presented with [1], from [2] to MLIR core. For the related
    discussion, see [3].

    [1] https://reviews.llvm.org/D76571
    [2] https://github.com/iml130/mlir-emitc
    [3] https://llvm.discourse.group/t/emitc-generating-c-c-from-mlir/3388

    Co-authored-by: Jacques Pienaar <jpienaar(a)google.com>
    Co-authored-by: Simon Camphausen <simon.camphausen(a)iml.fraunhofer.de>
    Co-authored-by: Oliver Scherf <oliver.scherf(a)iml.fraunhofer.de>

    Reviewed By: rriddle

    Differential Revision: https://reviews.llvm.org/D103969
---
 mlir/include/mlir/Dialect/CMakeLists.txt | 1 +
 mlir/include/mlir/Dialect/EmitC/CMakeLists.txt | 1 +
 mlir/include/mlir/Dialect/EmitC/IR/CMakeLists.txt | 7 +
 mlir/include/mlir/Dialect/EmitC/IR/EmitC.h | 32 ++++
 mlir/include/mlir/Dialect/EmitC/IR/EmitC.td | 148 ++++++++++++++
 .../mlir/Dialect/EmitC/IR/EmitCAttributes.td | 45 +++++
 mlir/include/mlir/Dialect/EmitC/IR/EmitCBase.td | 28 +++
 mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td | 46 +++++
 mlir/include/mlir/InitAllDialects.h | 2 +
 mlir/lib/Dialect/CMakeLists.txt | 1 +
 mlir/lib/Dialect/EmitC/CMakeLists.txt | 1 +
 mlir/lib/Dialect/EmitC/IR/CMakeLists.txt | 14 ++
 mlir/lib/Dialect/EmitC/IR/EmitC.cpp | 212 +++++++++++++++++++++
 mlir/test/Dialect/EmitC/invalid_ops.mlir | 79 ++++++++
 mlir/test/Dialect/EmitC/ops.mlir | 24 +++
 mlir/test/Dialect/EmitC/types.mlir | 18 ++
 mlir/test/mlir-opt/commandline.mlir | 1 +
 17 files changed, 660 insertions(+)

diff --git a/mlir/include/mlir/Dialect/CMakeLists.txt b/mlir/include/mlir/Dialect/CMakeLists.txt index 2d6d04a52a9d..44a9249cef83 100644 --- a/mlir/include/mlir/Dialect/CMakeLists.txt +++ b/mlir/include/mlir/Dialect/CMakeLists.txt @@ -5,6 +5,7 @@ add_subdirectory(ArmSVE) add_subdirectory(AMX) add_subdirectory(Complex) add_subdirectory(DLTI) +add_subdirectory(EmitC) add_subdirectory(GPU) add_subdirectory(Math) add_subdirectory(Linalg) diff --git a/mlir/include/mlir/Dialect/EmitC/CMakeLists.txt b/mlir/include/mlir/Dialect/EmitC/CMakeLists.txt new file mode 100644 index 000000000000..f33061b2d87c --- /dev/null +++ b/mlir/include/mlir/Dialect/EmitC/CMakeLists.txt @@ -0,0 +1 @@ +add_subdirectory(IR) diff --git a/mlir/include/mlir/Dialect/EmitC/IR/CMakeLists.txt b/mlir/include/mlir/Dialect/EmitC/IR/CMakeLists.txt new file mode 100644 index 000000000000..09a9f7a2ec1c --- /dev/null +++ b/mlir/include/mlir/Dialect/EmitC/IR/CMakeLists.txt @@ -0,0 +1,7 @@ +add_mlir_dialect(EmitC emitc) +add_mlir_doc(EmitC EmitC Dialects/ -gen-dialect-doc) + +set(LLVM_TARGET_DEFINITIONS EmitCAttributes.td) +mlir_tablegen(EmitCAttributes.h.inc -gen-attrdef-decls) +mlir_tablegen(EmitCAttributes.cpp.inc -gen-attrdef-defs) +add_public_tablegen_target(MLIREmitCAttributesIncGen) diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.h b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.h new file mode 100644 index 000000000000..857d1430f941 --- /dev/null +++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.h @@ -0,0 +1,32 @@ +//===- EmitC.h - EmitC Dialect ----------------------------------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This file declares EmitC in MLIR. 
+// +//===----------------------------------------------------------------------===// + +#ifndef MLIR_DIALECT_EMITC_IR_EMITC_H +#define MLIR_DIALECT_EMITC_IR_EMITC_H + +#include "mlir/IR/BuiltinOps.h" +#include "mlir/IR/BuiltinTypes.h" +#include "mlir/IR/Dialect.h" +#include "mlir/Interfaces/SideEffectInterfaces.h" + +#include "mlir/Dialect/EmitC/IR/EmitCDialect.h.inc" + +#define GET_ATTRDEF_CLASSES +#include "mlir/Dialect/EmitC/IR/EmitCAttributes.h.inc" + +#define GET_TYPEDEF_CLASSES +#include "mlir/Dialect/EmitC/IR/EmitCTypes.h.inc" + +#define GET_OP_CLASSES +#include "mlir/Dialect/EmitC/IR/EmitC.h.inc" + +#endif // MLIR_DIALECT_EMITC_IR_EMITC_H diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td new file mode 100644 index 000000000000..78c682a80671 --- /dev/null +++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td @@ -0,0 +1,148 @@ +//===- EmitC.td - EmitC operations--------------------------*- tablegen -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// Defines the MLIR EmitC operations. +// +//===----------------------------------------------------------------------===// + +#ifndef MLIR_DIALECT_EMITC_IR_EMITC +#define MLIR_DIALECT_EMITC_IR_EMITC + +include "mlir/Dialect/EmitC/IR/EmitCAttributes.td" +include "mlir/Dialect/EmitC/IR/EmitCTypes.td" + +include "mlir/Interfaces/SideEffectInterfaces.td" + +//===----------------------------------------------------------------------===// +// EmitC op definitions +//===----------------------------------------------------------------------===// + +// Base class for EmitC dialect ops. 
+class EmitC_Op<string mnemonic, list<OpTrait> traits = []> + : Op<EmitC_Dialect, mnemonic, traits> { + let verifier = "return ::verify(*this);"; +} + +def EmitC_ApplyOp : EmitC_Op<"apply", []> { + let summary = "Apply operation"; + let description = [{ + With the `apply` operation the operators & (address of) and * (contents of) + can be applied to a single operand. + + Example: + + ```mlir + // Custom form of applying the & operator. + %0 = emitc.apply "&"(%arg0) : (i32) -> !emitc.opaque<"int32_t*"> + + // Generic form of the same operation. + %0 = "emitc.apply"(%arg0) {applicableOperator = "&"} + : (i32) -> !emitc.opaque<"int32_t*"> + + ``` + }]; + let arguments = (ins + Arg<StrAttr, "the operator to apply">:$applicableOperator, + AnyType:$operand + ); + let results = (outs AnyType:$result); + let assemblyFormat = [{ + $applicableOperator `(` $operand `)` attr-dict `:` functional-type($operand, results) + }]; +} + +def EmitC_CallOp : EmitC_Op<"call", []> { + let summary = "Call operation"; + let description = [{ + The `call` operation represents a C++ function call. The call allows + specifying order of operands and attributes in the call as follows: + + - integer value of index type refers to an operand; + - attribute which will get lowered to constant value in call; + + Example: + + ```mlir + // Custom form defining a call to `foo()`. + %0 = emitc.call "foo" () : () -> i32 + + // Generic form of the same operation. 
+ %0 = "emitc.call"() {callee = "foo"} : () -> i32 + ``` + }]; + let arguments = (ins + Arg<StrAttr, "the C++ function to call">:$callee, + Arg<OptionalAttr<ArrayAttr>, "the order of operands and further attributes">:$args, + Arg<OptionalAttr<ArrayAttr>, "template arguments">:$template_args, + Variadic<AnyType>:$operands + ); + let results = (outs Variadic<AnyType>); + let assemblyFormat = [{ + $callee `(` $operands `)` attr-dict `:` functional-type($operands, results) + }]; +} + +def EmitC_ConstantOp : EmitC_Op<"constant", [ConstantLike]> { + let summary = "Constant operation"; + let description = [{ + The `constant` operation produces an SSA value equal to some constant + specified by an attribute. This can be used to form simple integer and + floating point constants, as well as more exotic things like tensor + constants. The `constant` operation also supports the EmitC opaque + attribute and the EmitC opaque type. + + Example: + + ```mlir + // Integer constant + %0 = "emitc.constant"(){value = 42 : i32} : () -> i32 + + // Constant emitted as `int32_t* = NULL;` + %1 = "emitc.constant"() + {value = #emitc.opaque<"NULL"> : !emitc.opaque<"int32_t*">} + : () -> !emitc.opaque<"int32_t*"> + ``` + }]; + + let arguments = (ins AnyAttr:$value); + let results = (outs AnyType); + + let hasFolder = 1; +} + +def EmitC_IncludeOp + : EmitC_Op<"include", [NoSideEffect, HasParent<"ModuleOp">]> { + let summary = "Include operation"; + let description = [{ + The `include` operation allows to define a source file inclusion via the + `#include` directive. + + Example: + + ```mlir + // Custom form defining the inclusion of `<myheader>`. + emitc.include "myheader.h" is_standard_include + + // Generic form of the same operation. + "emitc.include" (){include = "myheader.h", is_standard_include} : () -> () + + // Generic form defining the inclusion of `"myheader"`. 
+ "emitc.include" (){include = "myheader.h"} : () -> () + ``` + }]; + let arguments = (ins + Arg<StrAttr, "source file to include">:$include, + UnitAttr:$is_standard_include + ); + let assemblyFormat = [{ + $include attr-dict (`is_standard_include` $is_standard_include^)? + }]; + let verifier = ?; +} + +#endif // MLIR_DIALECT_EMITC_IR_EMITC diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitCAttributes.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitCAttributes.td new file mode 100644 index 000000000000..2dd782ba49bf --- /dev/null +++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitCAttributes.td @@ -0,0 +1,45 @@ +//===- EmitCAttributes.td - EmitC attributes ---------------*- tablegen -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// Defines the MLIR EmitC attributes. +// +//===----------------------------------------------------------------------===// + +#ifndef MLIR_DIALECT_EMITC_IR_EMITCATTRIBUTES +#define MLIR_DIALECT_EMITC_IR_EMITCATTRIBUTES + +include "mlir/Dialect/EmitC/IR/EmitCBase.td" + +//===----------------------------------------------------------------------===// +// EmitC attribute definitions +//===----------------------------------------------------------------------===// + +class EmitC_Attr<string name, string attrMnemonic> + : AttrDef<EmitC_Dialect, name> { + let mnemonic = attrMnemonic; +} + +def EmitC_OpaqueAttr : EmitC_Attr<"Opaque", "opaque"> { + let summary = "An opaque attribute"; + + let description = [{ + An opaque attribute of which the value gets emitted as is. 
+ + Example: + + ```mlir + #emitc.opaque<""> + #emitc.opaque<"NULL"> + #emitc.opaque<"nullptr"> + ``` + }]; + + let parameters = (ins StringRefParameter<"the opaque value">:$value); +} + +#endif // MLIR_DIALECT_EMITC_IR_EMITCATTRIBUTES diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitCBase.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitCBase.td new file mode 100644 index 000000000000..5b7e81e2833a --- /dev/null +++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitCBase.td @@ -0,0 +1,28 @@ +//===- EmitCBase.td - EmitC dialect ------------------------*- tablegen -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// Defines the MLIR EmitC dialect. +// +//===----------------------------------------------------------------------===// + +#ifndef MLIR_DIALECT_EMITC_IR_EMITCBASE +#define MLIR_DIALECT_EMITC_IR_EMITCBASE + +include "mlir/IR/OpBase.td" + +//===----------------------------------------------------------------------===// +// EmitC dialect definition +//===----------------------------------------------------------------------===// + +def EmitC_Dialect : Dialect { + let name = "emitc"; + let cppNamespace = "::mlir::emitc"; + let hasConstantMaterializer = 1; +} + +#endif // MLIR_DIALECT_EMITC_IR_EMITCBASE diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td new file mode 100644 index 000000000000..d6fdd0fbf82d --- /dev/null +++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td @@ -0,0 +1,46 @@ +//===- EmitCTypes.td - EmitC types -------------------------*- tablegen -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// Defines the MLIR EmitC types. +// +//===----------------------------------------------------------------------===// + + +#ifndef MLIR_DIALECT_EMITC_IR_EMITCTYPES +#define MLIR_DIALECT_EMITC_IR_EMITCTYPES + +include "mlir/Dialect/EmitC/IR/EmitCBase.td" + +//===----------------------------------------------------------------------===// +// EmitC type definitions +//===----------------------------------------------------------------------===// + +class EmitC_Type<string name, string typeMnemonic> + : TypeDef<EmitC_Dialect, name> { + let mnemonic = typeMnemonic; +} + +def EmitC_OpaqueType : EmitC_Type<"Opaque", "opaque"> { + let summary = "An opaque type"; + + let description = [{ + An opaque data type of which the value gets emitted as is. + + Example: + + ```mlir + !emitc.opaque<"int"> + !emitc.opaque<"float *"> + !emitc.opaque<"std::vector<std::string>"> + ``` + }]; + + let parameters = (ins StringRefParameter<"the opaque value">:$value); +} + +#endif // MLIR_DIALECT_EMITC_IR_EMITCTYPES diff --git a/mlir/include/mlir/InitAllDialects.h b/mlir/include/mlir/InitAllDialects.h index e44f2e8f1ae0..c52dae3fd1b5 100644 --- a/mlir/include/mlir/InitAllDialects.h +++ b/mlir/include/mlir/InitAllDialects.h @@ -21,6 +21,7 @@ #include "mlir/Dialect/Async/IR/Async.h" #include "mlir/Dialect/Complex/IR/Complex.h" #include "mlir/Dialect/DLTI/DLTI.h" +#include "mlir/Dialect/EmitC/IR/EmitC.h" #include "mlir/Dialect/GPU/GPUDialect.h" #include "mlir/Dialect/LLVMIR/LLVMDialect.h" #include "mlir/Dialect/LLVMIR/NVVMDialect.h" @@ -57,6 +58,7 @@ inline void registerAllDialects(DialectRegistry &registry) { async::AsyncDialect, complex::ComplexDialect, DLTIDialect, + emitc::EmitCDialect, gpu::GPUDialect, LLVM::LLVMDialect, linalg::LinalgDialect, diff --git a/mlir/lib/Dialect/CMakeLists.txt b/mlir/lib/Dialect/CMakeLists.txt index 
f5124f7d138f..de946beef0d9 100644 --- a/mlir/lib/Dialect/CMakeLists.txt +++ b/mlir/lib/Dialect/CMakeLists.txt @@ -5,6 +5,7 @@ add_subdirectory(Async) add_subdirectory(AMX) add_subdirectory(Complex) add_subdirectory(DLTI) +add_subdirectory(EmitC) add_subdirectory(GPU) add_subdirectory(Linalg) add_subdirectory(LLVMIR) diff --git a/mlir/lib/Dialect/EmitC/CMakeLists.txt b/mlir/lib/Dialect/EmitC/CMakeLists.txt new file mode 100644 index 000000000000..f33061b2d87c --- /dev/null +++ b/mlir/lib/Dialect/EmitC/CMakeLists.txt @@ -0,0 +1 @@ +add_subdirectory(IR) diff --git a/mlir/lib/Dialect/EmitC/IR/CMakeLists.txt b/mlir/lib/Dialect/EmitC/IR/CMakeLists.txt new file mode 100644 index 000000000000..6283441fdadf --- /dev/null +++ b/mlir/lib/Dialect/EmitC/IR/CMakeLists.txt @@ -0,0 +1,14 @@ +add_mlir_dialect_library(MLIREmitC + EmitC.cpp + + ADDITIONAL_HEADER_DIRS + ${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/EmitC + + DEPENDS + MLIREmitCIncGen + MLIREmitCAttributesIncGen + + LINK_LIBS PUBLIC + MLIRIR + MLIRSideEffectInterfaces + ) diff --git a/mlir/lib/Dialect/EmitC/IR/EmitC.cpp b/mlir/lib/Dialect/EmitC/IR/EmitC.cpp new file mode 100644 index 000000000000..364c247f75e4 --- /dev/null +++ b/mlir/lib/Dialect/EmitC/IR/EmitC.cpp @@ -0,0 +1,212 @@ +//===- EmitC.cpp - EmitC Dialect ------------------------------------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "mlir/Dialect/EmitC/IR/EmitC.h" +#include "mlir/IR/Builders.h" +#include "mlir/IR/DialectImplementation.h" +#include "llvm/ADT/TypeSwitch.h" + +using namespace mlir; +using namespace mlir::emitc; + +//===----------------------------------------------------------------------===// +// EmitCDialect +//===----------------------------------------------------------------------===// + +void EmitCDialect::initialize() { + addOperations< +#define GET_OP_LIST +#include "mlir/Dialect/EmitC/IR/EmitC.cpp.inc" + >(); + addTypes< +#define GET_TYPEDEF_LIST +#include "mlir/Dialect/EmitC/IR/EmitCTypes.cpp.inc" + >(); + addAttributes< +#define GET_ATTRDEF_LIST +#include "mlir/Dialect/EmitC/IR/EmitCAttributes.cpp.inc" + >(); +} + +/// Materialize a single constant operation from a given attribute value with +/// the desired resultant type. +Operation *EmitCDialect::materializeConstant(OpBuilder &builder, + Attribute value, Type type, + Location loc) { + return builder.create<ConstantOp>(loc, type, value); +} + +//===----------------------------------------------------------------------===// +// ApplyOp +//===----------------------------------------------------------------------===// + +static LogicalResult verify(ApplyOp op) { + StringRef applicableOperator = op.applicableOperator(); + + // Applicable operator must not be empty. + if (applicableOperator.empty()) + return op.emitOpError("applicable operator must not be empty"); + + // Only `*` and `&` are supported. 
+ if (applicableOperator != "&" && applicableOperator != "*") + return op.emitOpError("applicable operator is illegal"); + + return success(); +} + +//===----------------------------------------------------------------------===// +// CallOp +//===----------------------------------------------------------------------===// + +static LogicalResult verify(emitc::CallOp op) { + // Callee must not be empty. + if (op.callee().empty()) + return op.emitOpError("callee must not be empty"); + + if (Optional<ArrayAttr> argsAttr = op.args()) { + for (Attribute arg : argsAttr.getValue()) { + if (arg.getType().isa<IndexType>()) { + int64_t index = arg.cast<IntegerAttr>().getInt(); + // Args with elements of type index must be in range + // [0..operands.size). + if ((index < 0) || (index >= static_cast<int64_t>(op.getNumOperands()))) + return op.emitOpError("index argument is out of range"); + + // Args with elements of type ArrayAttr must have a type. + } else if (arg.isa<ArrayAttr>() && arg.getType().isa<NoneType>()) { + return op.emitOpError("array argument has no type"); + } + } + } + + if (Optional<ArrayAttr> templateArgsAttr = op.template_args()) { + for (Attribute tArg : templateArgsAttr.getValue()) { + if (!tArg.isa<TypeAttr>() && !tArg.isa<IntegerAttr>() && + !tArg.isa<FloatAttr>() && !tArg.isa<emitc::OpaqueAttr>()) + return op.emitOpError("template argument has invalid type"); + } + } + + return success(); +} + +//===----------------------------------------------------------------------===// +// ConstantOp +//===----------------------------------------------------------------------===// + +/// The constant op requires that the attribute's type matches the return type. 
+static LogicalResult verify(emitc::ConstantOp &op) { + Attribute value = op.value(); + Type type = op.getType(); + if (!value.getType().isa<NoneType>() && type != value.getType()) + return op.emitOpError() << "requires attribute's type (" << value.getType() + << ") to match op's return type (" << type << ")"; + return success(); +} + +OpFoldResult emitc::ConstantOp::fold(ArrayRef<Attribute> operands) { + assert(operands.empty() && "constant has no operands"); + return value(); +} + +//===----------------------------------------------------------------------===// +// TableGen'd op method definitions +//===----------------------------------------------------------------------===// + +#define GET_OP_CLASSES +#include "mlir/Dialect/EmitC/IR/EmitC.cpp.inc" + +//===----------------------------------------------------------------------===// +// EmitC Attributes +//===----------------------------------------------------------------------===// + +#define GET_ATTRDEF_CLASSES +#include "mlir/Dialect/EmitC/IR/EmitCAttributes.cpp.inc" + +Attribute emitc::OpaqueAttr::parse(MLIRContext *context, + DialectAsmParser &parser, Type type) { + if (parser.parseLess()) + return Attribute(); + StringRef value; + llvm::SMLoc loc = parser.getCurrentLocation(); + if (parser.parseOptionalString(&value)) { + parser.emitError(loc) << "expected string"; + return Attribute(); + } + if (parser.parseGreater()) + return Attribute(); + return get(context, value); +} + +Attribute EmitCDialect::parseAttribute(DialectAsmParser &parser, + Type type) const { + llvm::SMLoc typeLoc = parser.getCurrentLocation(); + StringRef mnemonic; + if (parser.parseKeyword(&mnemonic)) + return Attribute(); + Attribute genAttr; + OptionalParseResult parseResult = + generatedAttributeParser(getContext(), parser, mnemonic, type, genAttr); + if (parseResult.hasValue()) + return genAttr; + parser.emitError(typeLoc, "unknown attribute in EmitC dialect"); + return Attribute(); +} + +void EmitCDialect::printAttribute(Attribute 
attr, DialectAsmPrinter &os) const { + if (failed(generatedAttributePrinter(attr, os))) + llvm_unreachable("unexpected 'EmitC' attribute kind"); +} + +void emitc::OpaqueAttr::print(DialectAsmPrinter &printer) const { + printer << "opaque<\"" << getValue() << "\">"; +} + +//===----------------------------------------------------------------------===// +// EmitC Types +//===----------------------------------------------------------------------===// + +#define GET_TYPEDEF_CLASSES +#include "mlir/Dialect/EmitC/IR/EmitCTypes.cpp.inc" + +Type emitc::OpaqueType::parse(MLIRContext *context, DialectAsmParser &parser) { + if (parser.parseLess()) + return Type(); + StringRef value; + llvm::SMLoc loc = parser.getCurrentLocation(); + if (parser.parseOptionalString(&value) || value.empty()) { + parser.emitError(loc) << "expected non empty string"; + return Type(); + } + if (parser.parseGreater()) + return Type(); + return get(context, value); +} + +Type EmitCDialect::parseType(DialectAsmParser &parser) const { + llvm::SMLoc typeLoc = parser.getCurrentLocation(); + StringRef mnemonic; + if (parser.parseKeyword(&mnemonic)) + return Type(); + Type genType; + OptionalParseResult parseResult = + generatedTypeParser(getContext(), parser, mnemonic, genType); + if (parseResult.hasValue()) + return genType; + parser.emitError(typeLoc, "unknown type in EmitC dialect"); + return Type(); +} + +void EmitCDialect::printType(Type type, DialectAsmPrinter &os) const { + if (failed(generatedTypePrinter(type, os))) + llvm_unreachable("unexpected 'EmitC' type kind"); +} + +void emitc::OpaqueType::print(DialectAsmPrinter &printer) const { + printer << "opaque<\"" << getValue() << "\">"; +} diff --git a/mlir/test/Dialect/EmitC/invalid_ops.mlir b/mlir/test/Dialect/EmitC/invalid_ops.mlir new file mode 100644 index 000000000000..e86664627c36 --- /dev/null +++ b/mlir/test/Dialect/EmitC/invalid_ops.mlir @@ -0,0 +1,79 @@ +// RUN: mlir-opt %s -split-input-file -verify-diagnostics + +func 
@const_attribute_return_type_1() {
+  // expected-error @+1 {{'emitc.constant' op requires attribute's type ('i64') to match op's return type ('i32')}}
+  %c0 = "emitc.constant"(){value = 42: i64} : () -> i32
+  return
+}
+
+// -----
+
+func @const_attribute_return_type_2() {
+  // expected-error @+1 {{'emitc.constant' op requires attribute's type ('!emitc.opaque<"int32_t*">') to match op's return type ('!emitc.opaque<"int32_t">')}}
+  %c0 = "emitc.constant"(){value = "nullptr" : !emitc.opaque<"int32_t*">} : () -> !emitc.opaque<"int32_t">
+  return
+}
+
+// -----
+
+func @index_args_out_of_range_1() {
+  // expected-error @+1 {{'emitc.call' op index argument is out of range}}
+  emitc.call "test" () {args = [0 : index]} : () -> ()
+  return
+}
+
+// -----
+
+func @index_args_out_of_range_2(%arg : i32) {
+  // expected-error @+1 {{'emitc.call' op index argument is out of range}}
+  emitc.call "test" (%arg, %arg) {args = [2 : index]} : (i32, i32) -> ()
+  return
+}
+
+// -----
+
+func @empty_callee() {
+  // expected-error @+1 {{'emitc.call' op callee must not be empty}}
+  emitc.call "" () : () -> ()
+  return
+}
+
+// -----
+
+func @nonetype_arg(%arg : i32) {
+  // expected-error @+1 {{'emitc.call' op array argument has no type}}
+  emitc.call "nonetype_arg"(%arg) {args = [0 : index, [0, 1, 2]]} : (i32) -> i32
+  return
+}
+
+// -----
+
+func @array_template_arg(%arg : i32) {
+  // expected-error @+1 {{'emitc.call' op template argument has invalid type}}
+  emitc.call "nonetype_template_arg"(%arg) {template_args = [[0, 1, 2]]} : (i32) -> i32
+  return
+}
+
+// -----
+
+func @dense_template_argument(%arg : i32) {
+  // expected-error @+1 {{'emitc.call' op template argument has invalid type}}
+  emitc.call "dense_template_argument"(%arg) {template_args = [dense<[1.0, 1.0]> : tensor<2xf32>]} : (i32) -> i32
+  return
+}
+
+// -----
+
+func @empty_operator(%arg : i32) {
+  // expected-error @+1 {{'emitc.apply' op applicable operator must not be empty}}
+  %2 = emitc.apply ""(%arg) : (i32) -> !emitc.opaque<"int32_t*">
+  return
+}
+
+// -----
+
+func @illegal_operator(%arg : i32) {
+  // expected-error @+1 {{'emitc.apply' op applicable operator is illegal}}
+  %2 = emitc.apply "+"(%arg) : (i32) -> !emitc.opaque<"int32_t*">
+  return
+}
diff --git a/mlir/test/Dialect/EmitC/ops.mlir b/mlir/test/Dialect/EmitC/ops.mlir
new file mode 100644
index 000000000000..3a48ff447e1c
--- /dev/null
+++ b/mlir/test/Dialect/EmitC/ops.mlir
@@ -0,0 +1,24 @@
+// RUN: mlir-opt -verify-diagnostics %s | FileCheck %s
+
+"emitc.include" (){include = "test.h", is_standard_include} : () -> ()
+emitc.include "test.h" is_standard_include
+
+// CHECK-LABEL: func @f(%{{.*}}: i32, %{{.*}}: !emitc.opaque<"int32_t">) {
+func @f(%arg0: i32, %f: !emitc.opaque<"int32_t">) {
+  %1 = "emitc.call"() {callee = "blah"} : () -> i64
+  emitc.call "foo" (%1) {args = [
+    0 : index, dense<[0, 1]> : tensor<2xi32>, 0 : index
+  ]} : (i64) -> ()
+  return
+}
+
+func @c(%arg0: i32) {
+  %1 = "emitc.constant"(){value = 42 : i32} : () -> i32
+  return
+}
+
+func @a(%arg0: i32, %arg1: i32) {
+  %1 = "emitc.apply"(%arg0) {applicableOperator = "&"} : (i32) -> !emitc.opaque<"int32_t*">
+  %2 = emitc.apply "&"(%arg1) : (i32) -> !emitc.opaque<"int32_t*">
+  return
+}
diff --git a/mlir/test/Dialect/EmitC/types.mlir b/mlir/test/Dialect/EmitC/types.mlir
new file mode 100644
index 000000000000..f1ffce74e4c2
--- /dev/null
+++ b/mlir/test/Dialect/EmitC/types.mlir
@@ -0,0 +1,18 @@
+// RUN: mlir-opt -verify-diagnostics %s | FileCheck %s
+// check parser
+// RUN: mlir-opt -verify-diagnostics %s | mlir-opt -verify-diagnostics | FileCheck %s
+
+// CHECK-LABEL: func @opaque_types() {
+func @opaque_types() {
+  // CHECK-NEXT: !emitc.opaque<"int">
+  emitc.call "f"() {args = [!emitc<"opaque<\"int\">">]} : () -> ()
+  // CHECK-NEXT: !emitc.opaque<"byte">
+  emitc.call "f"() {args = [!emitc<"opaque<\"byte\">">]} : () -> ()
+  // CHECK-NEXT: !emitc.opaque<"unsigned">
+  emitc.call "f"() {args = [!emitc<"opaque<\"unsigned\">">]} : () -> ()
+  // CHECK-NEXT: !emitc.opaque<"status_t">
+  emitc.call "f"() {args = [!emitc<"opaque<\"status_t\">">]} : () -> ()
+  // CHECK-NEXT: !emitc.opaque<"std::vector<std::string>">
+  emitc.call "f"() {args = [!emitc.opaque<"std::vector<std::string>">]} : () -> ()
+  return
+}
diff --git a/mlir/test/mlir-opt/commandline.mlir b/mlir/test/mlir-opt/commandline.mlir
index 125d6b1d950b..95c476a84163 100644
--- a/mlir/test/mlir-opt/commandline.mlir
+++ b/mlir/test/mlir-opt/commandline.mlir
@@ -8,6 +8,7 @@
 // CHECK-NEXT: async
 // CHECK-NEXT: complex
 // CHECK-NEXT: dlti
+// CHECK-NEXT: emitc
 // CHECK-NEXT: gpu
 // CHECK-NEXT: linalg
 // CHECK-NEXT: llvm
</cut>

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/gnu-release-aarch64-spec2k6-O3_LTO - Build # 35 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
 - tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO

Culprit:
<cut>
commit 7a3248463c2095ba112a31809f2965d04bed03b3
Author: Mark Eggleston <markeggleston(a)gcc.gnu.org>
Date:   Mon Oct 7 09:13:16 2019 +0000

    Delete auto-in_equiv.f90

    forgot to use svn delete the first time.

    From-SVN: r276651
</cut>

Results regressed to (for first_bad == 7a3248463c2095ba112a31809f2965d04bed03b3)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O3_LTO artifacts/build-7a3248463c2095ba112a31809f2965d04bed03b3/results_id:
1
# 436.cactusADM,cactusADM_base.default regressed by 104

from (for last_good == 9b0365879b3c4917f5a2485a1fca8bb678484bfe)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O3_LTO artifacts/build-9b0365879b3c4917f5a2485a1fca8bb678484bfe/results_id:
1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O3_LTO/4731
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O3_LTO/4733
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-7a3248463c2095ba112a31809f2965d04bed03b3
cd investigate-gcc-7a3248463c2095ba112a31809f2965d04bed03b3

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 7a3248463c2095ba112a31809f2965d04bed03b3
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 9b0365879b3c4917f5a2485a1fca8bb678484bfe
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Full commit (up to 1000 lines):
<cut>
commit 7a3248463c2095ba112a31809f2965d04bed03b3
Author: Mark Eggleston <markeggleston(a)gcc.gnu.org>
Date:   Mon Oct 7 09:13:16 2019 +0000

    Delete auto-in_equiv.f90

    forgot to use svn delete the first time.

    From-SVN: r276651
---
 gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90 | 63 ---------------------------
 1 file changed, 63 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90 b/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90
deleted file mode 100644
index 57c384d1772..00000000000
--- a/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90
+++ /dev/null
@@ -1,63 +0,0 @@
-! { dg-do run }
-! { dg-options "-fdec-static -fno-automatic" }
-
-! Contributed by Mark Eggleston <mark.eggleston(a)codethink.com>
-
-! Storage is NOT on the static unless explicitly specified using the
-! DEC extension "automatic". The address of the first local variable
-! is used to determine that storage for the automatic local variable
-! is different to that of a local variable with no attributes. The
-! contents of the local variable in suba should be overwritten by the
-! call to subb.
-!
-program test
-  integer :: dummy
-  integer, parameter :: address = kind(loc(dummy))
-  integer(address) :: ad1
-  integer(address) :: ad2
-  integer(address) :: ad3
-  logical :: ok
-
-  call suba(0, ad1)
-  call subb(0, ad2)
-  call suba(1, ad1)
-  call subc(0, ad3)
-  ok = (ad1.eq.ad3).and.(ad1.ne.ad2)
-  if (.not.ok) stop 4
-
-contains
-  subroutine suba(option, addr)
-    integer, intent(in) :: option
-    integer(address), intent(out) :: addr
-    integer, automatic :: a
-    integer :: b
-    equivalence (a, b)
-    addr = loc(a)
-    if (option.eq.0) then
-      ! initialise a and c
-      a = 9
-      if (a.ne.b) stop 1
-      if (loc(a).ne.loc(b)) stop 2
-    else
-      ! a should've been overwritten
-      if (a.eq.9) stop 3
-    end if
-  end subroutine suba
-
-  subroutine subb(dummy, addr)
-    integer, intent(in) :: dummy
-    integer(address), intent(out) :: addr
-    integer :: x
-    addr = loc(x)
-    x = 77
-  end subroutine subb
-
-  subroutine subc(dummy, addr)
-    integer, intent(in) :: dummy
-    integer(address), intent(out) :: addr
-    integer, automatic :: y
-    addr = loc(y)
-    y = 77
-  end subroutine subc
-
-end program test
</cut>

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O3 - Build # 21 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

Culprit:
<cut>
commit 4aeeb91a9249282231cdd35773c17110e05a870d
Author: MaheshRavishankar <ravishankarm(a)google.com>
Date:   Mon Aug 23 10:15:35 2021 -0700

    [mlir][Linalg] Allow all build methods of Structured ops to specify additional attributes.

    Differential Revision: https://reviews.llvm.org/D108338
</cut>

Results regressed to (for first_bad == 4aeeb91a9249282231cdd35773c17110e05a870d)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-4aeeb91a9249282231cdd35773c17110e05a870d/results_id:
1
# 464.h264ref,[.] FastFullPelBlockMotionSearch regressed by 111

from (for last_good == 19dc02e99f802922a3af69e802465bee0723b57a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-19dc02e99f802922a3af69e802465bee0723b57a/results_id:
1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4687
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4669
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-4aeeb91a9249282231cdd35773c17110e05a870d
cd investigate-llvm-4aeeb91a9249282231cdd35773c17110e05a870d

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 4aeeb91a9249282231cdd35773c17110e05a870d
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 19dc02e99f802922a3af69e802465bee0723b57a
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit 4aeeb91a9249282231cdd35773c17110e05a870d
Author: MaheshRavishankar <ravishankarm(a)google.com>
Date:   Mon Aug 23 10:15:35 2021 -0700

    [mlir][Linalg] Allow all build methods of Structured ops to specify additional attributes.
Differential Revision: https://reviews.llvm.org/D108338 --- .../mlir/Dialect/Linalg/IR/LinalgStructuredOps.td | 12 ++++++++---- mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp | 19 ++++++++++++------- mlir/test/mlir-linalg-ods-gen/test-linalg-ods-gen.tc | 3 ++- .../tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp | 12 +++++++++--- .../mlir-linalg-ods-gen/mlir-linalg-ods-yaml-gen.cpp | 8 ++++++-- 5 files changed, 37 insertions(+), 17 deletions(-) diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td index 33f4992e41f9..1d4e6d546067 100644 --- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td +++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td @@ -620,18 +620,22 @@ def GenericOp : LinalgStructuredBase_Op<"generic", [ "ValueRange":$outputs, "ArrayRef<AffineMap>":$indexingMaps, "ArrayRef<StringRef>":$iteratorTypes, "StringRef":$doc, "StringRef":$libraryCall, - CArg<"function_ref<void(OpBuilder &, Location, ValueRange)>", "nullptr">)>, + CArg<"function_ref<void(OpBuilder &, Location, ValueRange)>", "nullptr">, + CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes)>, OpBuilder<(ins "ValueRange":$inputs, "ValueRange":$outputBuffers, "ArrayRef<AffineMap>":$indexingMaps, "ArrayRef<StringRef>":$iteratorTypes, "StringRef":$doc, "StringRef":$libraryCall, - CArg<"function_ref<void(OpBuilder &, Location, ValueRange)>", "nullptr">)>, + CArg<"function_ref<void(OpBuilder &, Location, ValueRange)>", "nullptr">, + CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes)>, OpBuilder<(ins "TypeRange":$resultTensorTypes, "ValueRange":$inputs, "ValueRange":$outputs, "ArrayRef<AffineMap>":$indexingMaps, "ArrayRef<StringRef>":$iteratorTypes, - CArg<"function_ref<void(OpBuilder &, Location, ValueRange)>", "nullptr">)>, + CArg<"function_ref<void(OpBuilder &, Location, ValueRange)>", "nullptr">, + CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes)>, OpBuilder<(ins "ValueRange":$inputs, 
"ValueRange":$outputBuffers, "ArrayRef<AffineMap>":$indexingMaps, "ArrayRef<StringRef>":$iteratorTypes, - CArg<"function_ref<void(OpBuilder &, Location, ValueRange)>", "nullptr">)> + CArg<"function_ref<void(OpBuilder &, Location, ValueRange)>", "nullptr">, + CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes)> ]; let extraClassDeclaration = structuredOpsBaseDecls # [{ diff --git a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp index 6b65a9ecd9e5..f4750ca390a8 100644 --- a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp +++ b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp @@ -502,13 +502,15 @@ void GenericOp::build( OpBuilder &builder, OperationState &result, TypeRange resultTensorTypes, ValueRange inputs, ValueRange outputs, ArrayRef<AffineMap> indexingMaps, ArrayRef<StringRef> iteratorTypes, StringRef doc, StringRef libraryCall, - function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuild) { + function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuild, + ArrayRef<NamedAttribute> attributes) { build(builder, result, resultTensorTypes, inputs, outputs, builder.getAffineMapArrayAttr(indexingMaps), builder.getStrArrayAttr(iteratorTypes), doc.empty() ? StringAttr() : builder.getStringAttr(doc), libraryCall.empty() ? 
StringAttr() : builder.getStringAttr(libraryCall)); + result.addAttributes(attributes); if (!bodyBuild) return; @@ -527,30 +529,33 @@ void GenericOp::build( OpBuilder &builder, OperationState &result, ValueRange inputs, ValueRange outputs, ArrayRef<AffineMap> indexingMaps, ArrayRef<StringRef> iteratorTypes, StringRef doc, StringRef libraryCall, - function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuild) { + function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuild, + ArrayRef<NamedAttribute> attributes) { build(builder, result, TypeRange{}, inputs, outputs, indexingMaps, - iteratorTypes, doc, libraryCall, bodyBuild); + iteratorTypes, doc, libraryCall, bodyBuild, attributes); } void GenericOp::build( OpBuilder &builder, OperationState &result, ValueRange inputs, ValueRange outputs, ArrayRef<AffineMap> indexingMaps, ArrayRef<StringRef> iteratorTypes, - function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuild) { + function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuild, + ArrayRef<NamedAttribute> attributes) { build(builder, result, inputs, outputs, indexingMaps, iteratorTypes, /*doc=*/"", - /*libraryCall=*/"", bodyBuild); + /*libraryCall=*/"", bodyBuild, attributes); } void GenericOp::build( OpBuilder &builder, OperationState &result, TypeRange resultTensorTypes, ValueRange inputs, ValueRange outputs, ArrayRef<AffineMap> indexingMaps, ArrayRef<StringRef> iteratorTypes, - function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuild) { + function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuild, + ArrayRef<NamedAttribute> attributes) { build(builder, result, resultTensorTypes, inputs, outputs, indexingMaps, iteratorTypes, /*doc=*/"", - /*libraryCall=*/"", bodyBuild); + /*libraryCall=*/"", bodyBuild, attributes); } static void print(OpAsmPrinter &p, GenericOp op) { diff --git a/mlir/test/mlir-linalg-ods-gen/test-linalg-ods-gen.tc b/mlir/test/mlir-linalg-ods-gen/test-linalg-ods-gen.tc index 471961f837bf..743bdbdb12d6 100644 --- 
a/mlir/test/mlir-linalg-ods-gen/test-linalg-ods-gen.tc +++ b/mlir/test/mlir-linalg-ods-gen/test-linalg-ods-gen.tc @@ -169,7 +169,8 @@ It has one output. // ODS-LABEL: def Test7Op // ODS: OpBuilder< // ODS: (ins "TypeRange":$resultTensorTypes, "ValueRange":$inputs, -// ODS: "ValueRange":$outputs, "Attribute":$attr_a, "Attribute":$attr_b) +// ODS: "ValueRange":$outputs, "Attribute":$attr_a, "Attribute":$attr_b, +// ODS: CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes) // ODS: $_state.addAttribute("attr_a", attr_a); // ODS: $_state.addAttribute("attr_b", attr_b); // diff --git a/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp b/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp index 1bdb5b8806d0..590f17fdedfa 100644 --- a/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp +++ b/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp @@ -1910,7 +1910,8 @@ void TCParser::printODS(llvm::raw_ostream &os, StringRef cppOpName, let skipDefaultBuilders = 1; let builders = [ OpBuilder< - (ins "ValueRange":$inputs, "ValueRange":$outputs), + (ins "ValueRange":$inputs, "ValueRange":$outputs, + CArg<"ArrayRef<NamedAttribute>", "{{}">:$attributes), [{{ $_state.addOperands(inputs); $_state.addOperands(outputs); @@ -1919,6 +1920,7 @@ void TCParser::printODS(llvm::raw_ostream &os, StringRef cppOpName, $_builder.getI32VectorAttr({{ static_cast<int32_t>(inputs.size()), static_cast<int32_t>(outputs.size())})); + $_state.addAttributes(attributes); createAndFillStructuredOpRegion<{0}>( $_builder, $_state, @@ -1927,7 +1929,8 @@ void TCParser::printODS(llvm::raw_ostream &os, StringRef cppOpName, }]>, OpBuilder< (ins "TypeRange":$resultTensorTypes, "ValueRange":$inputs, - "ValueRange":$outputs), + "ValueRange":$outputs, + CArg<"ArrayRef<NamedAttribute>", "{{}">:$attributes), [{{ $_state.addOperands(inputs); $_state.addOperands(outputs); @@ -1937,6 +1940,7 @@ void TCParser::printODS(llvm::raw_ostream &os, StringRef cppOpName, $_builder.getI32VectorAttr({{ 
static_cast<int32_t>(inputs.size()), static_cast<int32_t>(outputs.size())})); + $_state.addAttributes(attributes); createAndFillStructuredOpRegion<{0}>( $_builder, $_state, @@ -2020,7 +2024,8 @@ void TCParser::printODS(llvm::raw_ostream &os, StringRef cppOpName, const char *builderFmt = R"FMT( , OpBuilder< (ins "TypeRange":$resultTensorTypes, "ValueRange":$inputs, - "ValueRange":$outputs, {1}), + "ValueRange":$outputs, {1}, + CArg<"ArrayRef<NamedAttribute>", "{{}">:$attributes), [{{ $_state.addOperands(inputs); $_state.addOperands(outputs); @@ -2030,6 +2035,7 @@ void TCParser::printODS(llvm::raw_ostream &os, StringRef cppOpName, $_builder.getI32VectorAttr({{ static_cast<int32_t>(inputs.size()), static_cast<int32_t>(outputs.size())})); + $_state.addAttributes(attributes); createAndFillStructuredOpRegion<{0}>( $_builder, $_state, diff --git a/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-yaml-gen.cpp b/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-yaml-gen.cpp index a0eb1dea8860..98e90b69d631 100644 --- a/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-yaml-gen.cpp +++ b/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-yaml-gen.cpp @@ -457,7 +457,8 @@ def {0} : LinalgStructuredBase_Op<"{1}", !listconcat([ let skipDefaultBuilders = 1; let builders = [ OpBuilder< - (ins "ValueRange":$inputs, "ValueRange":$outputs), + (ins "ValueRange":$inputs, "ValueRange":$outputs, + CArg<"ArrayRef<NamedAttribute>", "{{}">:$attributes), [{{ $_state.addOperands(inputs); $_state.addOperands(outputs); @@ -471,6 +472,7 @@ def {0} : LinalgStructuredBase_Op<"{1}", !listconcat([ $_builder.getI32VectorAttr({{ static_cast<int32_t>(inputs.size()), static_cast<int32_t>(outputs.size())})); + $_state.addAttributes(attributes); createAndFillStructuredOpRegion<{0}>( $_builder, $_state, @@ -539,7 +541,8 @@ def {0} : LinalgStructuredBase_Op<"{1}", !listconcat([ static const char structuredOpBuilderFormat[] = R"FMT( , OpBuilder< (ins "TypeRange":$resultTensorTypes, "ValueRange":$inputs, - 
"ValueRange":$outputs, {1}), + "ValueRange":$outputs, {1}, + CArg<"ArrayRef<NamedAttribute>", "{{}">:$attributes), [{{ $_state.addOperands(inputs); $_state.addOperands(outputs); @@ -555,6 +558,7 @@ static const char structuredOpBuilderFormat[] = R"FMT( TypeRange(inputs), TypeRange(outputs)); {2} + $_state.addAttributes(attributes); }]> )FMT"; </cut>

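The commit above threads an optional trailing list of extra attributes through every GenericOp::build overload and applies it with result.addAttributes(attributes). The pattern can be sketched in isolation as below. This is only an illustration under stated assumptions: it does not use the MLIR API, and NamedAttr, OpState and both build functions are hypothetical stand-ins, not the real mlir::NamedAttribute, mlir::OperationState or GenericOp::build.

```cpp
#include <cassert>  // assert, handy when exercising the sketch
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a named attribute: a (name, value) pair.
using NamedAttr = std::pair<std::string, std::string>;

// Hypothetical stand-in for the state a builder fills in.
struct OpState {
  std::vector<std::string> operands;
  std::vector<NamedAttr> attributes;

  // Append caller-supplied attributes after the ones already recorded.
  void addAttributes(const std::vector<NamedAttr> &extra) {
    attributes.insert(attributes.end(), extra.begin(), extra.end());
  }
};

// Most general builder: records operands and the op's own attribute,
// then appends whatever extra attributes the caller passed.
// The new trailing parameter defaults to an empty list.
void build(OpState &state, const std::vector<std::string> &inputs,
           const std::vector<std::string> &outputs, const std::string &doc,
           const std::vector<NamedAttr> &attributes = {}) {
  state.operands.insert(state.operands.end(), inputs.begin(), inputs.end());
  state.operands.insert(state.operands.end(), outputs.begin(), outputs.end());
  if (!doc.empty())
    state.attributes.emplace_back("doc", doc);
  state.addAttributes(attributes);
}

// Convenience overload: no doc string. It only forwards, so the extra
// attributes have to be passed along explicitly or they would be dropped.
void build(OpState &state, const std::vector<std::string> &inputs,
           const std::vector<std::string> &outputs,
           const std::vector<NamedAttr> &attributes = {}) {
  build(state, inputs, outputs, /*doc=*/std::string(), attributes);
}
```

In this sketch, calling build(state, ins, outs, extra) with an explicitly typed std::vector<NamedAttr> extra appends the extra pairs to state.attributes, while calling build(state, ins, outs) behaves as it would without the new parameter. Passing a typed variable rather than a bare braced list keeps overload resolution unambiguous between the two functions.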
4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/gnu-release-aarch64-spec2k6-O3_LTO - Build # 34 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
 - tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO

Culprit:
<cut>
commit 3694418a6d57c5b48383af8a5c6d1b1c2e3cec9b
Author: Ian Lance Taylor <ian(a)gcc.gnu.org>
Date:   Fri Oct 4 13:50:07 2019 +0000

    compiler: adjust code to avoid shadowing local variables

    Also add a couple of missing calls to free after mpz_get_str.

    This should make the code clean with respect to -Wshadow=local.

    Based on patch by Bernd Edlinger.

    Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/198837

    From-SVN: r276579
</cut>

Results regressed to (for first_bad == 3694418a6d57c5b48383af8a5c6d1b1c2e3cec9b)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O3_LTO artifacts/build-3694418a6d57c5b48383af8a5c6d1b1c2e3cec9b/results_id:
1
# 456.hmmer,hmmer_base.default regressed by 103

from (for last_good == 0ced79bc4c9925c574177cb6345c26e4aad4155f)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O3_LTO artifacts/build-0ced79bc4c9925c574177cb6345c26e4aad4155f/results_id:
1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O3_LTO/4662
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O3_LTO/4665
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-3694418a6d57c5b48383af8a5c6d1b1c2e3cec9b
cd investigate-gcc-3694418a6d57c5b48383af8a5c6d1b1c2e3cec9b

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 3694418a6d57c5b48383af8a5c6d1b1c2e3cec9b
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 0ced79bc4c9925c574177cb6345c26e4aad4155f
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Full commit (up to 1000 lines):
<cut>
commit 3694418a6d57c5b48383af8a5c6d1b1c2e3cec9b
Author: Ian Lance Taylor <ian(a)gcc.gnu.org>
Date:   Fri Oct 4 13:50:07 2019 +0000

    compiler: adjust code to avoid shadowing local variables

    Also add a couple of missing calls to free after mpz_get_str.

    This should make the code clean with respect to -Wshadow=local.
Based on patch by Bernd Edlinger. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/198837 From-SVN: r276579 --- gcc/go/gofrontend/MERGE | 2 +- gcc/go/gofrontend/ast-dump.cc | 8 +++--- gcc/go/gofrontend/escape.cc | 1 - gcc/go/gofrontend/expressions.cc | 54 ++++++++++++++++++---------------------- gcc/go/gofrontend/gogo.cc | 28 ++++++++++----------- gcc/go/gofrontend/parse.cc | 26 +++++++++---------- gcc/go/gofrontend/statements.cc | 17 +++++++------ gcc/go/gofrontend/types.cc | 40 ++++++++++++++--------------- 8 files changed, 85 insertions(+), 91 deletions(-) diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index 54c682a8b78..bb509943d6e 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -a3aef6b6df932ea6c7094d074695bc0b033a3d17 +441f3f1f350b532707c48273d7f454cf1c4e959f The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/gcc/go/gofrontend/ast-dump.cc b/gcc/go/gofrontend/ast-dump.cc index b20f7e4e725..a3cbda9debc 100644 --- a/gcc/go/gofrontend/ast-dump.cc +++ b/gcc/go/gofrontend/ast-dump.cc @@ -135,11 +135,11 @@ Ast_dump_traverse_blocks_and_functions::function(Named_object* no) { if (it != res->begin()) this->ast_dump_context_->ostream() << ","; - Named_object* no = (*it); + Named_object* rno = (*it); - this->ast_dump_context_->ostream() << no->name() << " "; - go_assert(no->is_result_variable()); - Result_variable* resvar = no->result_var_value(); + this->ast_dump_context_->ostream() << rno->name() << " "; + go_assert(rno->is_result_variable()); + Result_variable* resvar = rno->result_var_value(); this->ast_dump_context_->dump_type(resvar->type()); diff --git a/gcc/go/gofrontend/escape.cc b/gcc/go/gofrontend/escape.cc index bfd1a39d7e4..f8e07f73cd2 100644 --- a/gcc/go/gofrontend/escape.cc +++ b/gcc/go/gofrontend/escape.cc @@ -1541,7 +1541,6 @@ Escape_analysis_assign::expression(Expression** pexpr) if (debug_level > 1) { - Node* n = 
Node::make_node(*pexpr); std::string fn_name = this->context_->current_function_name(); go_debug((*pexpr)->location(), "[%d] %s esc: %s", this->context_->loop_depth(), fn_name.c_str(), diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc index a72ba243f37..9babc348595 100644 --- a/gcc/go/gofrontend/expressions.cc +++ b/gcc/go/gofrontend/expressions.cc @@ -4104,9 +4104,11 @@ Type_conversion_expression::do_get_backend(Translate_context* context) x = mpz_get_ui(intval); else { - char* s = mpz_get_str(NULL, 16, intval); + char* ms = mpz_get_str(NULL, 16, intval); go_warning_at(loc, 0, - "unicode code point 0x%s out of range in string", s); + "unicode code point 0x%s out of range in string", + ms); + free(ms); x = 0xfffd; } Lex::append_char(x, true, &s, loc); @@ -8016,14 +8018,14 @@ Bound_method_expression::do_flatten(Gogo* gogo, Named_object*, Expression* ret = Expression::make_struct_composite_literal(st, vals, loc); ret = Expression::make_heap_expression(ret, loc); - Node* n = Node::make_node(this); - if ((n->encoding() & ESCAPE_MASK) == Node::ESCAPE_NONE) + Node* node = Node::make_node(this); + if ((node->encoding() & ESCAPE_MASK) == Node::ESCAPE_NONE) ret->heap_expression()->set_allocate_on_stack(); else if (gogo->compiling_runtime() && gogo->package_name() == "runtime" && !saw_errors()) go_error_at(loc, "%s escapes to heap, not allowed in runtime", - n->ast_format(gogo).c_str()); + node->ast_format(gogo).c_str()); // If necessary, check whether the expression or any embedded // pointers are nil. 
@@ -8741,8 +8743,6 @@ Builtin_call_expression::lower_make(Statement_inserter* inserter) Expression::make_nil(loc)); else { - Numeric_constant nclen; - unsigned long vlen; if (len_arg->numeric_constant_value(&nclen) && nclen.to_unsigned_long(&vlen) == Numeric_constant::NC_UL_VALID && vlen <= Map_type::bucket_size) @@ -9053,8 +9053,7 @@ Builtin_call_expression::flatten_append(Gogo* gogo, Named_object* function, else { Type* int32_type = Type::lookup_integer_type("int32"); - Expression* zero = - Expression::make_integer_ul(0, int32_type, loc); + zero = Expression::make_integer_ul(0, int32_type, loc); call = Runtime::make_call(Runtime::BUILTIN_MEMSET, loc, 3, a1, zero, a2); } @@ -9064,15 +9063,12 @@ Builtin_call_expression::flatten_append(Gogo* gogo, Named_object* function, // For a slice containing pointers, growslice already zeroed // the memory. We only need to zero in non-growing case. // Note: growslice does not zero the memory in non-pointer case. - Expression* left = - Expression::make_temporary_reference(ntmp, loc); - left = Expression::make_cast(uint_type, left, loc); - Expression* right = - Expression::make_temporary_reference(c1tmp, loc); - right = Expression::make_cast(uint_type, right, loc); - Expression* cond = - Expression::make_binary(OPERATOR_GT, left, right, loc); - Expression* zero = Expression::make_integer_ul(0, int_type, loc); + ref = Expression::make_temporary_reference(ntmp, loc); + ref = Expression::make_cast(uint_type, ref, loc); + ref2 = Expression::make_temporary_reference(c1tmp, loc); + ref2 = Expression::make_cast(uint_type, ref2, loc); + cond = Expression::make_binary(OPERATOR_GT, ref, ref2, loc); + zero = Expression::make_integer_ul(0, int_type, loc); call = Expression::make_conditional(cond, call, zero, loc); } } @@ -10877,9 +10873,7 @@ Call_expression::do_lower(Gogo* gogo, Named_object* function, if (this->result_count() > 1 && this->call_temp_ == NULL) { Struct_field_list* sfl = new Struct_field_list(); - Function_type* fntype = 
this->get_function_type();
   const Typed_identifier_list* results = fntype->results();
-  Location loc = this->location();
   int i = 0;
   char buf[20];
@@ -12295,10 +12289,10 @@ Call_expression::do_get_backend(Translate_context* context)
     }
   else
     {
-      Expression* first_arg;
-      fn = this->interface_method_function(interface_method, &first_arg,
+      Expression* arg0;
+      fn = this->interface_method_function(interface_method, &arg0,
                                            location);
-      fn_args[0] = first_arg->get_backend(context);
+      fn_args[0] = arg0->get_backend(context);
     }

   Bexpression* bclosure = NULL;
@@ -16453,11 +16447,11 @@ Composite_literal_expression::lower_array(Type* type)
       traverse_order = new std::vector<unsigned long>();
       traverse_order->reserve(v.size());

-      for (V::const_iterator p = v.begin(); p != v.end(); ++p)
+      for (V::const_iterator pv = v.begin(); pv != v.end(); ++pv)
         {
-          indexes->push_back(p->index);
-          vals->push_back(p->expr);
-          traverse_order->push_back(p->traversal_order);
+          indexes->push_back(pv->index);
+          vals->push_back(pv->expr);
+          traverse_order->push_back(pv->traversal_order);
         }
     }

@@ -17771,9 +17765,9 @@ Interface_info_expression::do_type()

       Interface_type* itype = this->iface_->type()->interface_type();

-      Hashtable::const_iterator p = result_types.find(itype);
-      if (p != result_types.end())
-        return p->second;
+      Hashtable::const_iterator pr = result_types.find(itype);
+      if (pr != result_types.end())
+        return pr->second;

       Type* pdt = Type::make_type_descriptor_ptr_type();
       if (itype->is_empty())
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index e7af673c8df..a79cfc3a9a7 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -518,11 +518,11 @@ Gogo::import_package(const std::string& filename,
   else if (ln == ".")
     {
       Bindings* bindings = package->bindings();
-      for (Bindings::const_declarations_iterator p =
+      for (Bindings::const_declarations_iterator pd =
              bindings->begin_declarations();
-           p != bindings->end_declarations();
-           ++p)
-        this->add_dot_import_object(p->second);
+           pd != bindings->end_declarations();
+           ++pd)
+        this->add_dot_import_object(pd->second);
       std::string dot_alias = "." + package->package_name();
       package->add_alias(dot_alias, location);
     }
@@ -678,8 +678,8 @@ Gogo::recompute_init_priorities()
            pci != ii->precursors().end();
            ++pci)
         {
-          Import_init* ii = this->lookup_init(*pci);
-          nonroots.insert(ii);
+          Import_init* ii_init = this->lookup_init(*pci);
+          nonroots.insert(ii_init);
         }
     }

@@ -2613,11 +2613,11 @@ Gogo::define_global_names()
         {
           if (no->type_declaration_value()->has_methods())
             {
-              for (std::vector<Named_object*>::const_iterator p =
+              for (std::vector<Named_object*>::const_iterator pm =
                      no->type_declaration_value()->methods()->begin();
-                   p != no->type_declaration_value()->methods()->end();
-                   p++)
-                go_error_at((*p)->location(),
+                   pm != no->type_declaration_value()->methods()->end();
+                   pm++)
+                go_error_at((*pm)->location(),
                             "may not define methods on non-local type");
             }
           no->set_type_value(global_no->type_value());
@@ -6550,8 +6550,8 @@ Function::build(Gogo* gogo, Named_object* named_function)

       // Build the backend representation for all the statements in the
       // function.
-      Translate_context context(gogo, named_function, NULL, NULL);
-      Bblock* code_block = this->block_->get_backend(&context);
+      Translate_context bcontext(gogo, named_function, NULL, NULL);
+      Bblock* code_block = this->block_->get_backend(&bcontext);

       // Initialize variables if necessary.
       Translate_context icontext(gogo, named_function, this->block_,
@@ -6608,8 +6608,8 @@ Function::build(Gogo* gogo, Named_object* named_function)
   // If we created a descriptor for the function, make sure we emit it.
   if (this->descriptor_ != NULL)
     {
-      Translate_context context(gogo, NULL, NULL, NULL);
-      this->descriptor_->get_backend(&context);
+      Translate_context dcontext(gogo, NULL, NULL, NULL);
+      this->descriptor_->get_backend(&dcontext);
     }
 }

diff --git a/gcc/go/gofrontend/parse.cc b/gcc/go/gofrontend/parse.cc
index 52371b2b032..e50af616421 100644
--- a/gcc/go/gofrontend/parse.cc
+++ b/gcc/go/gofrontend/parse.cc
@@ -836,7 +836,7 @@ Parse::parameter_list(bool* is_varargs)
     {
       std::string name = token->identifier();
       bool is_exported = token->is_identifier_exported();
-      Location location = token->location();
+      Location id_location = token->location();
       token = this->advance_token();
       if (!token->is_op(OPERATOR_COMMA))
         {
@@ -861,7 +861,7 @@ Parse::parameter_list(bool* is_varargs)
             }

           this->unget_token(Token::make_identifier_token(name, is_exported,
-                                                         location));
+                                                         id_location));
         }
       else
         {
@@ -872,15 +872,15 @@ Parse::parameter_list(bool* is_varargs)
           // commas as we can.
           std::string id_name = this->gogo_->pack_hidden_name(name,
                                                               is_exported);
-          ret->push_back(Typed_identifier(id_name, NULL, location));
+          ret->push_back(Typed_identifier(id_name, NULL, id_location));
           bool just_saw_comma = true;
           while (this->advance_token()->is_identifier())
             {
               name = this->peek_token()->identifier();
               is_exported = this->peek_token()->is_identifier_exported();
-              location = this->peek_token()->location();
+              id_location = this->peek_token()->location();
               id_name = this->gogo_->pack_hidden_name(name, is_exported);
-              ret->push_back(Typed_identifier(id_name, NULL, location));
+              ret->push_back(Typed_identifier(id_name, NULL, id_location));
               if (!this->advance_token()->is_op(OPERATOR_COMMA))
                 {
                   just_saw_comma = false;
@@ -909,7 +909,7 @@ Parse::parameter_list(bool* is_varargs)
               // names.
               parameters_have_names = false;
               this->unget_token(Token::make_identifier_token(name, is_exported,
-                                                             location));
+                                                             id_location));
               ret->pop_back();
               just_saw_comma = true;
             }
@@ -2808,7 +2808,7 @@ Parse::composite_lit(Type* type, int depth, Location location)
         {
           std::string identifier = token->identifier();
           bool is_exported = token->is_identifier_exported();
-          Location location = token->location();
+          Location id_location = token->location();

           if (this->advance_token()->is_op(OPERATOR_COLON))
             {
@@ -2820,14 +2820,14 @@ Parse::composite_lit(Type* type, int depth, Location location)
               Gogo* gogo = this->gogo_;
               val = this->id_to_expression(gogo->pack_hidden_name(identifier,
                                                                   is_exported),
-                                           location, false);
+                                           id_location, false);
               is_name = true;
             }
           else
             {
               this->unget_token(Token::make_identifier_token(identifier,
                                                              is_exported,
-                                                             location));
+                                                             id_location));
               val = this->expression(PRECEDENCE_NORMAL, false, true,
                                      NULL, NULL);
             }
@@ -2923,14 +2923,14 @@ Parse::composite_lit(Type* type, int depth, Location location)
       go_error_at(this->location(), "expected %<,%> or %<}%>");

       this->gogo_->mark_locals_used();
-      int depth = 0;
+      int edepth = 0;
       while (!token->is_eof()
-             && (depth > 0 || !token->is_op(OPERATOR_RCURLY)))
+             && (edepth > 0 || !token->is_op(OPERATOR_RCURLY)))
         {
           if (token->is_op(OPERATOR_LCURLY))
-            ++depth;
+            ++edepth;
           else if (token->is_op(OPERATOR_RCURLY))
-            --depth;
+            --edepth;
           token = this->advance_token();
         }
       if (token->is_op(OPERATOR_RCURLY))
diff --git a/gcc/go/gofrontend/statements.cc b/gcc/go/gofrontend/statements.cc
index 3dc394ab32b..f52b33d665c 100644
--- a/gcc/go/gofrontend/statements.cc
+++ b/gcc/go/gofrontend/statements.cc
@@ -2938,7 +2938,7 @@ Thunk_statement::build_thunk(Gogo* gogo, const std::string& thunk_name)
   go_assert(call_statement->classification() == STATEMENT_EXPRESSION);
   Expression_statement* es =
     static_cast<Expression_statement*>(call_statement);
-  Call_expression* ce = es->expr()->call_expression();
+  ce = es->expr()->call_expression();
   if (ce == NULL)
     go_assert(saw_errors());
   else
@@ -5972,10 +5972,11 @@ Select_statement::lower_two_case(Block* b)
       // if selectnbrecv2(&lhs, &ok, chan) { body } else { default body }
       Type* booltype = Type::make_boolean_type();
-      Temporary_statement* ts = Statement::make_temporary(booltype, NULL, loc);
-      b->add_statement(ts);
+      Temporary_statement* okts = Statement::make_temporary(booltype, NULL,
+                                                            loc);
+      b->add_statement(okts);

-      okref = Expression::make_temporary_reference(ts, loc);
+      okref = Expression::make_temporary_reference(okts, loc);
       Expression* okaddr = Expression::make_unary(OPERATOR_AND, okref, loc);
       call = Runtime::make_call(Runtime::SELECTNBRECV2, loc, 3, addr, okaddr,
                                 chanref);
@@ -6595,7 +6596,7 @@ For_range_statement::lower_range_array(Gogo* gogo,
   iter_init = new Block(body_block, loc);

   ref = Expression::make_temporary_reference(range_temp, loc);
-  Expression* ref2 = Expression::make_temporary_reference(index_temp, loc);
+  ref2 = Expression::make_temporary_reference(index_temp, loc);
   Expression* index = Expression::make_index(ref, ref2, NULL, NULL, loc);

   tref = Expression::make_temporary_reference(value_temp, loc);
@@ -6693,7 +6694,7 @@ For_range_statement::lower_range_slice(Gogo* gogo,
   iter_init = new Block(body_block, loc);

   ref = Expression::make_temporary_reference(for_temp, loc);
-  Expression* ref2 = Expression::make_temporary_reference(index_temp, loc);
+  ref2 = Expression::make_temporary_reference(index_temp, loc);
   Expression* index = Expression::make_index(ref, ref2, NULL, NULL, loc);

   tref = Expression::make_temporary_reference(value_temp, loc);
@@ -7179,9 +7180,9 @@ For_range_statement::lower_array_range_clear(Gogo* gogo,
   else
     {
       Type* int32_type = Type::lookup_integer_type("int32");
-      Expression* zero = Expression::make_integer_ul(0, int32_type, loc);
+      Expression* zero32 = Expression::make_integer_ul(0, int32_type, loc);
       call = Runtime::make_call(Runtime::BUILTIN_MEMSET, loc, 3, ptr_arg,
-                                zero, sz_arg);
+                                zero32, sz_arg);
     }
   Statement* cs3 = Statement::make_statement(call, true);
   b->add_statement(cs3);
diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc
index eeae9fa4c0e..e02b832df14 100644
--- a/gcc/go/gofrontend/types.cc
+++ b/gcc/go/gofrontend/types.cc
@@ -6410,12 +6410,11 @@ Struct_type::do_type_descriptor(Gogo* gogo, Named_type* name)
         fvals->push_back(Expression::make_nil(bloc));
       else
         {
-          std::string n;
           if (is_embedded_builtin)
             n = gogo->package_name();
           else
             n = Gogo::hidden_name_pkgpath(pf->field_name());
-          Expression* s = Expression::make_string(n, bloc);
+          s = Expression::make_string(n, bloc);
           fvals->push_back(Expression::make_unary(OPERATOR_AND, s, bloc));
         }

@@ -6429,7 +6428,7 @@ Struct_type::do_type_descriptor(Gogo* gogo, Named_type* name)
         fvals->push_back(Expression::make_nil(bloc));
       else
         {
-          Expression* s = Expression::make_string(pf->tag(), bloc);
+          s = Expression::make_string(pf->tag(), bloc);
           fvals->push_back(Expression::make_unary(OPERATOR_AND, s, bloc));
         }

@@ -6635,22 +6634,22 @@ Struct_type::do_reflection(Gogo* gogo, std::string* ret) const
         {
           const std::string& tag(p->tag());
           ret->append(" \"");
-          for (std::string::const_iterator p = tag.begin();
-               p != tag.end();
-               ++p)
+          for (std::string::const_iterator pt = tag.begin();
+               pt != tag.end();
+               ++pt)
             {
-              if (*p == '\0')
+              if (*pt == '\0')
                 ret->append("\\x00");
-              else if (*p == '\n')
+              else if (*pt == '\n')
                 ret->append("\\n");
-              else if (*p == '\t')
+              else if (*pt == '\t')
                 ret->append("\\t");
-              else if (*p == '"')
+              else if (*pt == '"')
                 ret->append("\\\"");
-              else if (*p == '\\')
+              else if (*pt == '\\')
                 ret->append("\\\\");
               else
-                ret->push_back(*p);
+                ret->push_back(*pt);
             }
           ret->push_back('"');
         }
@@ -7197,11 +7196,11 @@ Array_type::verify_length()
       return false;
     case Numeric_constant::NC_UL_BIG:
       {
-        mpz_t val;
-        if (!nc.to_int(&val))
+        mpz_t mval;
+        if (!nc.to_int(&mval))
           go_unreachable();
-        unsigned int bits = mpz_sizeinbase(val, 2);
-        mpz_clear(val);
+        unsigned int bits = mpz_sizeinbase(mval, 2);
+        mpz_clear(mval);
         if (bits >= tbits)
           {
             go_error_at(this->length_->location(), "array bound overflows");
@@ -7704,6 +7703,7 @@ Array_type::do_export(Export* exp) const
         }
       char* s = mpz_get_str(NULL, 10, val);
       exp->write_string(s);
+      free(s);
       exp->write_string(" ");
       mpz_clear(val);
     }
@@ -9752,7 +9752,7 @@ Interface_type::do_import(Import* imp)
           parameters = new Typed_identifier_list;
           while (true)
             {
-              std::string name = imp->read_name();
+              std::string pname = imp->read_name();
               imp->require_c_string(" ");

               if (imp->match_c_string("..."))
@@ -9764,7 +9764,7 @@ Interface_type::do_import(Import* imp)
               Type* ptype = imp->read_type();
               if (is_varargs)
                 ptype = Type::make_array_type(ptype, NULL);
-              parameters->push_back(Typed_identifier(name, ptype,
+              parameters->push_back(Typed_identifier(pname, ptype,
                                                      imp->location()));
               if (imp->peek_char() != ',')
                 break;
@@ -9791,10 +9791,10 @@ Interface_type::do_import(Import* imp)
           imp->advance(1);
           while (true)
             {
-              std::string name = imp->read_name();
+              std::string rname = imp->read_name();
               imp->require_c_string(" ");
               Type* rtype = imp->read_type();
-              results->push_back(Typed_identifier(name, rtype,
+              results->push_back(Typed_identifier(rname, rtype,
                                                   imp->location()));
               if (imp->peek_char() != ',')
                 break;
</cut>
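
For readers skimming the patch: the Struct_type::do_reflection hunk above renames the inner loop iterator from p to pt, apparently so that it no longer shares a name with the outer p whose tag is being printed. A minimal standalone sketch of that escaping loop follows. It assumes a free function named escapeTag, which is hypothetical and not part of gofrontend; only the per-character cases are taken from the diff.

```cpp
#include <string>

// Hypothetical helper (not in gofrontend): escape a struct tag the way the
// loop in the Struct_type::do_reflection hunk does, one character at a time.
std::string escapeTag(const std::string &tag) {
  std::string out;
  // The iterator is called pt, as in the patched code, so it cannot be
  // confused with an enclosing variable named p.
  for (std::string::const_iterator pt = tag.begin(); pt != tag.end(); ++pt) {
    if (*pt == '\0')
      out.append("\\x00");   // NUL byte becomes the four characters \x00
    else if (*pt == '\n')
      out.append("\\n");     // newline becomes \n
    else if (*pt == '\t')
      out.append("\\t");     // tab becomes \t
    else if (*pt == '"')
      out.append("\\\"");    // double quote becomes \"
    else if (*pt == '\\')
      out.append("\\\\");    // backslash is doubled
    else
      out.push_back(*pt);    // everything else is copied unchanged
  }
  return out;
}
```

In the original code the result is appended to *ret between a pair of double quotes; the sketch returns a fresh string only to keep the example self-contained.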

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-release-aarch64-spec2k6-O2 - Build # 13 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2

Culprit:
<cut>
commit 50b523cb2ceee4ca7279b4ce22ddb0d0b05df313
Author: Stephen Kelly <steveire(a)gmail.com>
Date:   Mon Apr 26 18:28:50 2021 +0100

    [AST] Fix DeclarationNameInfo introspection

    Some AST classes return `const DeclarationNameInfo &` instead of
    returning by value (eg CXXDependentScopeMemberExpr).
</cut>

Results regressed to (for first_bad == 50b523cb2ceee4ca7279b4ce22ddb0d0b05df313)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-50b523cb2ceee4ca7279b4ce22ddb0d0b05df313/results_id: 1
# 433.milc,[.] mult_su3_mat_vec regressed by 115

from (for last_good == 10038d0b3dfcfa6abf8a710612899f859ef1534b)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-10038d0b3dfcfa6abf8a710612899f859ef1534b/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O2/4634
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O2/4631
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-50b523cb2ceee4ca7279b4ce22ddb0d0b05df313
cd investigate-llvm-50b523cb2ceee4ca7279b4ce22ddb0d0b05df313

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 50b523cb2ceee4ca7279b4ce22ddb0d0b05df313
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 10038d0b3dfcfa6abf8a710612899f859ef1534b
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…

Full commit (up to 1000 lines):
<cut>
commit 50b523cb2ceee4ca7279b4ce22ddb0d0b05df313
Author: Stephen Kelly <steveire(a)gmail.com>
Date:   Mon Apr 26 18:28:50 2021 +0100

    [AST] Fix DeclarationNameInfo introspection

    Some AST classes return `const DeclarationNameInfo &` instead of
    returning by value (eg CXXDependentScopeMemberExpr).
---
 clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp  |  3 +
 .../unittests/Introspection/IntrospectionTest.cpp  | 66 ++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp b/clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
index 0aeb3a7703f7..0a7fb9b52f23 100644
--- a/clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
+++ b/clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
@@ -225,6 +225,9 @@ void ASTSrcLocProcessor::run(const MatchFinder::MatchResult &Result) {
       CaptureMethods("class clang::NestedNameSpecifierLoc", ASTClass, Result);
   CD.DeclNameInfos =
       CaptureMethods("struct clang::DeclarationNameInfo", ASTClass, Result);
+  auto DI = CaptureMethods("const struct clang::DeclarationNameInfo &",
+                           ASTClass, Result);
+  CD.DeclNameInfos.insert(CD.DeclNameInfos.end(), DI.begin(), DI.end());

   if (const auto *DerivedFrom =
           Result.Nodes.getNodeAs<clang::CXXRecordDecl>("derivedFrom")) {
diff --git a/clang/unittests/Introspection/IntrospectionTest.cpp b/clang/unittests/Introspection/IntrospectionTest.cpp
index 521520c9a7c7..d4f626bfeb74 100644
--- a/clang/unittests/Introspection/IntrospectionTest.cpp
+++ b/clang/unittests/Introspection/IntrospectionTest.cpp
@@ -1456,6 +1456,72 @@ getNamedTypeInfo()->getTypeLoc().getAs<clang::TypeSpecTypeLoc>().getNameLoc()),
           STRING_LOCATION_PAIR((&NI), getSourceRange())));
 }

+TEST(Introspection, SourceLocations_DeclarationNameInfo_CRef) {
+  if (!NodeIntrospection::hasIntrospectionSupport())
+    return;
+
+  auto AST = buildASTFromCodeWithArgs(
+      R"cpp(
+template<typename T>
+struct MyContainer
+{
+    template <typename U>
+    void pushBack();
+};
+
+template<typename T>
+void foo()
+{
+    MyContainer<T> mc;
+    mc.template pushBack<int>();
+}
+)cpp",
+      {"-fno-delayed-template-parsing"}, "foo.cpp", "clang-tool",
+      std::make_shared<PCHContainerOperations>());
+
+  auto &Ctx = AST->getASTContext();
+  auto &TU = *Ctx.getTranslationUnitDecl();
+
+  auto BoundNodes = ast_matchers::match(
+      decl(hasDescendant(cxxDependentScopeMemberExpr(hasMemberName("pushBack")).bind("member"))), TU,
+      Ctx);
+
+  EXPECT_EQ(BoundNodes.size(), 1u);
+
+  const auto *Member = BoundNodes[0].getNodeAs<CXXDependentScopeMemberExpr>("member");
+  auto Result = NodeIntrospection::GetLocations(Member);
+
+  auto ExpectedLocations =
+      FormatExpected<SourceLocation>(Result.LocationAccessors);
+
+  llvm::sort(ExpectedLocations);
+
+  EXPECT_EQ(
+      llvm::makeArrayRef(ExpectedLocations),
+      (ArrayRef<std::pair<std::string, SourceLocation>>{
+    STRING_LOCATION_STDPAIR(Member, getBeginLoc()),
+    STRING_LOCATION_STDPAIR(Member, getEndLoc()),
+    STRING_LOCATION_STDPAIR(Member, getExprLoc()),
+    STRING_LOCATION_STDPAIR(Member, getLAngleLoc()),
+    STRING_LOCATION_STDPAIR(Member, getMemberLoc()),
+    STRING_LOCATION_STDPAIR(Member, getMemberNameInfo().getBeginLoc()),
+    STRING_LOCATION_STDPAIR(Member, getMemberNameInfo().getEndLoc()),
+    STRING_LOCATION_STDPAIR(Member, getMemberNameInfo().getLoc()),
+    STRING_LOCATION_STDPAIR(Member, getOperatorLoc()),
+    STRING_LOCATION_STDPAIR(Member, getRAngleLoc()),
+    STRING_LOCATION_STDPAIR(Member, getTemplateKeywordLoc())
+  }));
+
+  auto ExpectedRanges = FormatExpected<SourceRange>(Result.RangeAccessors);
+
+  EXPECT_THAT(
+      ExpectedRanges,
+      UnorderedElementsAre(
+          STRING_LOCATION_PAIR(Member, getMemberNameInfo().getSourceRange()),
+          STRING_LOCATION_PAIR(Member, getSourceRange())
+      ));
+}
+
 TEST(Introspection, SourceLocations_DeclarationNameInfo_ConvOp) {
   if (!NodeIntrospection::hasIntrospectionSupport())
     return;
</cut>
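
The functional part of the commit above is a single idiom: the accessors captured for the `const DeclarationNameInfo &` return type are appended to the list already captured for the by-value return type. A minimal sketch of that append step is shown below, assuming a plain vector of strings as the list type. MethodList, appendMethods and mergeByValueAndByRef are hypothetical names used only for illustration; the real CaptureMethods return type is not shown in this message.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the list of captured accessor names.
using MethodList = std::vector<std::string>;

// Append every element of `extra` to the end of `base`, keeping order.
// This is the insert(end, begin, end) idiom used in the patch.
// `base` and `extra` must be distinct objects.
void appendMethods(MethodList &base, const MethodList &extra) {
  base.insert(base.end(), extra.begin(), extra.end());
}

// Illustrative only: accessors found for the by-value and the
// by-const-reference return types are gathered separately, then merged.
MethodList mergeByValueAndByRef(MethodList byValue, const MethodList &byRef) {
  appendMethods(byValue, byRef);
  return byValue;
}
```

Appending rather than replacing keeps the accessors that were already found, which matches the new test expecting getMemberNameInfo() entries alongside the existing ones.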

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_apm/llvm-master-aarch64-spec2k6-Oz_LTO - Build # 5 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO

Culprit:
<cut>
commit 02b1c3f0529e525a4ffa671478050f4704b3f472
Author: Dmitry Preobrazhensky <dmitry.preobrazhensky(a)amd.com>
Date:   Fri Aug 6 15:49:52 2021 +0300

    [AMDGPU][MC][NFC][DOC] Updated AMD GPU assembler syntax description.

    Corrected sendmsg description (bug https://bugs.llvm.org/show_bug.cgi?id=49648).
</cut>

Results regressed to (for first_bad == 02b1c3f0529e525a4ffa671478050f4704b3f472)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_LTO artifacts/build-02b1c3f0529e525a4ffa671478050f4704b3f472/results_id: 1
# 470.lbm,lbm_base.default regressed by 104

from (for last_good == 4aafd5f00c2a772337ec065d4542ef158453a343)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_LTO artifacts/build-baseline/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of last_good: apm_64/tcwg_bmk_llvm_apm/baseline-llvm-master-aarch64-spec2k6-Oz_LTO/4560
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of first_bad: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/4618
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-02b1c3f0529e525a4ffa671478050f4704b3f472
cd investigate-llvm-02b1c3f0529e525a4ffa671478050f4704b3f472

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 02b1c3f0529e525a4ffa671478050f4704b3f472
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 4aafd5f00c2a772337ec065d4542ef158453a343
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit 02b1c3f0529e525a4ffa671478050f4704b3f472
Author: Dmitry Preobrazhensky <dmitry.preobrazhensky(a)amd.com>
Date:   Fri Aug 6 15:49:52 2021 +0300

    [AMDGPU][MC][NFC][DOC] Updated AMD GPU assembler syntax description.

    Corrected sendmsg description (bug https://bugs.llvm.org/show_bug.cgi?id=49648).
---
 llvm/docs/AMDGPU/gfx10_msg.rst  | 41 +++++++++++++++++++++++------------------
 llvm/docs/AMDGPU/gfx8_msg.rst   |  1 +
 llvm/docs/AMDGPU/gfx90a_msg.rst | 41 +++++++++++++++++++++++------------------
 llvm/docs/AMDGPU/gfx9_msg.rst   | 41 +++++++++++++++++++++++------------------
 4 files changed, 70 insertions(+), 54 deletions(-)

diff --git a/llvm/docs/AMDGPU/gfx10_msg.rst b/llvm/docs/AMDGPU/gfx10_msg.rst
index 3e6c532dd85a..c0774d85a62e 100644
--- a/llvm/docs/AMDGPU/gfx10_msg.rst
+++ b/llvm/docs/AMDGPU/gfx10_msg.rst
@@ -47,24 +47,29 @@ or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

 Each message type supports specific operations:

-    ================= ========== ============================== ============ ==========
-    Message name      Message Id Supported Operations           Operation Id Stream Id
-    ================= ========== ============================== ============ ==========
-    MSG_INTERRUPT     1          \-                             \-           \-
-    MSG_GS            2          GS_OP_CUT                      1            Optional
-    \                            GS_OP_EMIT                     2            Optional
-    \                            GS_OP_EMIT_CUT                 3            Optional
-    MSG_GS_DONE       3          GS_OP_NOP                      0            \-
-    \                            GS_OP_CUT                      1            Optional
-    \                            GS_OP_EMIT                     2            Optional
-    \                            GS_OP_EMIT_CUT                 3            Optional
-    MSG_GS_ALLOC_REQ  9          \-                             \-           \-
-    MSG_GET_DOORBELL  10         \-                             \-           \-
-    MSG_SYSMSG        15         SYSMSG_OP_ECC_ERR_INTERRUPT    1            \-
-    \                            SYSMSG_OP_REG_RD               2            \-
-    \                            SYSMSG_OP_HOST_TRAP_ACK        3            \-
-    \                            SYSMSG_OP_TTRACE_PC            4            \-
-    ================= ========== ============================== ============ ==========
+    =================== ========== ============================== ============ ==========
+    Message name        Message Id Supported Operations           Operation Id Stream Id
+    =================== ========== ============================== ============ ==========
+    MSG_INTERRUPT       1          \-                             \-           \-
+    MSG_GS              2          GS_OP_CUT                      1            Optional
+    \                              GS_OP_EMIT                     2            Optional
+    \                              GS_OP_EMIT_CUT                 3            Optional
+    MSG_GS_DONE         3          GS_OP_NOP                      0            \-
+    \                              GS_OP_CUT                      1            Optional
+    \                              GS_OP_EMIT                     2            Optional
+    \                              GS_OP_EMIT_CUT                 3            Optional
+    MSG_SAVEWAVE        4          \-                             \-           \-
+    MSG_STALL_WAVE_GEN  5          \-                             \-           \-
+    MSG_HALT_WAVES      6          \-                             \-           \-
+    MSG_ORDERED_PS_DONE 7          \-                             \-           \-
+    MSG_GS_ALLOC_REQ    9          \-                             \-           \-
+    MSG_GET_DOORBELL    10         \-                             \-           \-
+    MSG_GET_DDID        11         \-                             \-           \-
+    MSG_SYSMSG          15         SYSMSG_OP_ECC_ERR_INTERRUPT    1            \-
+    \                              SYSMSG_OP_REG_RD               2            \-
+    \                              SYSMSG_OP_HOST_TRAP_ACK        3            \-
+    \                              SYSMSG_OP_TTRACE_PC            4            \-
+    =================== ========== ============================== ============ ==========

 *Sendmsg* arguments are validated depending on how *type* value is specified:

diff --git a/llvm/docs/AMDGPU/gfx8_msg.rst b/llvm/docs/AMDGPU/gfx8_msg.rst
index 0b0b2f307482..f32033dd944c 100644
--- a/llvm/docs/AMDGPU/gfx8_msg.rst
+++ b/llvm/docs/AMDGPU/gfx8_msg.rst
@@ -58,6 +58,7 @@ Each message type supports specific operations:
     \                            GS_OP_CUT                      1            Optional
     \                            GS_OP_EMIT                     2            Optional
     \                            GS_OP_EMIT_CUT                 3            Optional
+    MSG_SAVEWAVE      4          \-                             \-           \-
     MSG_SYSMSG        15         SYSMSG_OP_ECC_ERR_INTERRUPT    1            \-
     \                            SYSMSG_OP_REG_RD               2            \-
     \                            SYSMSG_OP_HOST_TRAP_ACK        3            \-
diff --git a/llvm/docs/AMDGPU/gfx90a_msg.rst b/llvm/docs/AMDGPU/gfx90a_msg.rst
index aa44d3b64f49..37f945464e58 100644
--- a/llvm/docs/AMDGPU/gfx90a_msg.rst
+++ b/llvm/docs/AMDGPU/gfx90a_msg.rst
@@ -47,24 +47,29 @@ or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

 Each message type supports specific operations:

-    ================= ========== ============================== ============ ==========
-    Message name      Message Id Supported Operations           Operation Id Stream Id
-    ================= ========== ============================== ============ ==========
-    MSG_INTERRUPT     1          \-                             \-           \-
-    MSG_GS            2          GS_OP_CUT                      1            Optional
-    \                            GS_OP_EMIT                     2            Optional
-    \                            GS_OP_EMIT_CUT                 3            Optional
-    MSG_GS_DONE       3          GS_OP_NOP                      0            \-
-    \                            GS_OP_CUT                      1            Optional
-    \                            GS_OP_EMIT                     2            Optional
-    \                            GS_OP_EMIT_CUT                 3            Optional
-    MSG_GS_ALLOC_REQ  9          \-                             \-           \-
-    MSG_GET_DOORBELL  10         \-                             \-           \-
-    MSG_SYSMSG        15         SYSMSG_OP_ECC_ERR_INTERRUPT    1            \-
-    \                            SYSMSG_OP_REG_RD               2            \-
-    \                            SYSMSG_OP_HOST_TRAP_ACK        3            \-
-    \                            SYSMSG_OP_TTRACE_PC            4            \-
-    ================= ========== ============================== ============ ==========
+    ====================== ========== ============================== ============ ==========
+    Message name           Message Id Supported Operations           Operation Id Stream Id
+    ====================== ========== ============================== ============ ==========
+    MSG_INTERRUPT          1          \-                             \-           \-
+    MSG_GS                 2          GS_OP_CUT                      1            Optional
+    \                                 GS_OP_EMIT                     2            Optional
+    \                                 GS_OP_EMIT_CUT                 3            Optional
+    MSG_GS_DONE            3          GS_OP_NOP                      0            \-
+    \                                 GS_OP_CUT                      1            Optional
+    \                                 GS_OP_EMIT                     2            Optional
+    \                                 GS_OP_EMIT_CUT                 3            Optional
+    MSG_SAVEWAVE           4          \-                             \-           \-
+    MSG_STALL_WAVE_GEN     5          \-                             \-           \-
+    MSG_HALT_WAVES         6          \-                             \-           \-
+    MSG_ORDERED_PS_DONE    7          \-                             \-           \-
+    MSG_EARLY_PRIM_DEALLOC 8          \-                             \-           \-
+    MSG_GS_ALLOC_REQ       9          \-                             \-           \-
+    MSG_GET_DOORBELL       10         \-                             \-           \-
+    MSG_SYSMSG             15         SYSMSG_OP_ECC_ERR_INTERRUPT    1            \-
+    \                                 SYSMSG_OP_REG_RD               2            \-
+    \                                 SYSMSG_OP_HOST_TRAP_ACK        3            \-
+    \                                 SYSMSG_OP_TTRACE_PC            4            \-
+    ====================== ========== ============================== ============ ==========

 *Sendmsg* arguments are validated depending on how *type* value is specified:

diff --git a/llvm/docs/AMDGPU/gfx9_msg.rst b/llvm/docs/AMDGPU/gfx9_msg.rst
index efb95e5a97db..34be1c8a24c5 100644
--- a/llvm/docs/AMDGPU/gfx9_msg.rst
+++ b/llvm/docs/AMDGPU/gfx9_msg.rst
@@ -47,24 +47,29 @@ or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

 Each message type supports specific operations:

-    ================= ========== ============================== ============ ==========
-    Message name      Message Id Supported Operations           Operation Id Stream Id
-    ================= ========== ============================== ============ ==========
-    MSG_INTERRUPT     1          \-                             \-           \-
-    MSG_GS            2          GS_OP_CUT                      1            Optional
-    \                            GS_OP_EMIT                     2            Optional
-    \                            GS_OP_EMIT_CUT                 3            Optional
-    MSG_GS_DONE       3          GS_OP_NOP                      0            \-
-    \                            GS_OP_CUT                      1            Optional
-    \                            GS_OP_EMIT                     2            Optional
-    \                            GS_OP_EMIT_CUT                 3            Optional
-    MSG_GS_ALLOC_REQ  9          \-                             \-           \-
-    MSG_GET_DOORBELL  10         \-                             \-           \-
-    MSG_SYSMSG        15         SYSMSG_OP_ECC_ERR_INTERRUPT    1            \-
-    \                            SYSMSG_OP_REG_RD               2            \-
-    \                            SYSMSG_OP_HOST_TRAP_ACK        3            \-
-    \                            SYSMSG_OP_TTRACE_PC            4            \-
-    ================= ========== ============================== ============ ==========
+    ====================== ========== ============================== ============ ==========
+    Message name           Message Id Supported Operations           Operation Id Stream Id
+    ====================== ========== ============================== ============ ==========
+    MSG_INTERRUPT          1          \-                             \-           \-
+    MSG_GS                 2          GS_OP_CUT                      1            Optional
+    \                                 GS_OP_EMIT                     2            Optional
+    \                                 GS_OP_EMIT_CUT                 3            Optional
+    MSG_GS_DONE            3          GS_OP_NOP                      0            \-
+    \                                 GS_OP_CUT                      1            Optional
+    \                                 GS_OP_EMIT                     2            Optional
+    \                                 GS_OP_EMIT_CUT                 3            Optional
+    MSG_SAVEWAVE           4          \-                             \-           \-
+    MSG_STALL_WAVE_GEN     5          \-                             \-           \-
+    MSG_HALT_WAVES         6          \-                             \-           \-
+    MSG_ORDERED_PS_DONE    7          \-                             \-           \-
+    MSG_EARLY_PRIM_DEALLOC 8          \-                             \-           \-
+    MSG_GS_ALLOC_REQ       9          \-                             \-           \-
+    MSG_GET_DOORBELL       10         \-                             \-           \-
+    MSG_SYSMSG             15         SYSMSG_OP_ECC_ERR_INTERRUPT    1            \-
+    \                                 SYSMSG_OP_REG_RD               2            \-
+    \                                 SYSMSG_OP_HOST_TRAP_ACK        3            \-
+    \                                 SYSMSG_OP_TTRACE_PC            4            \-
+    ====================== ========== ============================== ============ ==========

 *Sendmsg* arguments are validated depending on how *type* value is specified:

</cut>
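
As a reading aid for the documentation change above, the updated gfx10 table can be viewed as a name-to-id map. The sketch below copies the message ids from that table; the container and the lookupGfx10MessageId helper are hypothetical illustrations and are not LLVM code or any part of the assembler.

```cpp
#include <map>
#include <string>

// Message ids as listed in the updated gfx10 table of the patch.
// The map itself is only an illustration of the table's contents.
const std::map<std::string, int> &gfx10MessageIds() {
  static const std::map<std::string, int> ids = {
      {"MSG_INTERRUPT", 1},       {"MSG_GS", 2},
      {"MSG_GS_DONE", 3},         {"MSG_SAVEWAVE", 4},
      {"MSG_STALL_WAVE_GEN", 5},  {"MSG_HALT_WAVES", 6},
      {"MSG_ORDERED_PS_DONE", 7}, {"MSG_GS_ALLOC_REQ", 9},
      {"MSG_GET_DOORBELL", 10},   {"MSG_GET_DDID", 11},
      {"MSG_SYSMSG", 15},
  };
  return ids;
}

// Hypothetical helper: return the id for `name`, or -1 if the gfx10
// table has no row with that message name.
int lookupGfx10MessageId(const std::string &name) {
  const std::map<std::string, int> &ids = gfx10MessageIds();
  std::map<std::string, int>::const_iterator it = ids.find(name);
  return it == ids.end() ? -1 : it->second;
}
```

Note that MSG_EARLY_PRIM_DEALLOC (id 8) appears only in the gfx9 and gfx90a tables of the patch and MSG_GET_DDID (id 11) only in the gfx10 table, so a per-target map would differ in those rows.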

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/gnu-release-aarch64-spec2k6-O3_LTO - Build # 33 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
 - tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO

Culprit:
<cut>
commit 53329d29274fa4af5af7ab155947fe84b9684e39
Author: Rainer Orth <ro(a)CeBiTec.Uni-Bielefeld.DE>
Date:   Tue May 21 16:59:39 2019 +0000

    Fix dg-require-* syntax

            * gcc.dg/Wattribute-alias.c: Pass emtpy arg to dg-require-ifunc.

            * gcc.c-torture/execute/20030125-1.c: Pass emtpy arg to dg-require-weak.

            * gcc.dg/torture/ftrapv-2.c: Pass empty arg to dg-require-fork.

            * gcc.target/i386/pr84723-1.c: Remove dg-require-ifunc.
            * gcc.target/i386/pr84723-2.c: Likewise.
            * gcc.target/i386/pr84723-3.c: Likewise.
            * gcc.target/i386/pr84723-4.c: Likewise.
            * gcc.target/i386/pr84723-5.c: Likewise.

    From-SVN: r271476
</cut>

Results regressed to (for first_bad == 53329d29274fa4af5af7ab155947fe84b9684e39)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -O3_LTO artifacts/build-53329d29274fa4af5af7ab155947fe84b9684e39/results_id: 1
# 456.hmmer,hmmer_base.default regressed by 104

from (for last_good == b33a3c6451ecc09ac5f1c7ccdac9b19eb0bd1a48)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -O3_LTO artifacts/build-b33a3c6451ecc09ac5f1c7ccdac9b19eb0bd1a48/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O3_LTO/4568
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O3_LTO/4564
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-53329d29274fa4af5af7ab155947fe84b9684e39
cd investigate-gcc-53329d29274fa4af5af7ab155947fe84b9684e39

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 53329d29274fa4af5af7ab155947fe84b9684e39
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b33a3c6451ecc09ac5f1c7ccdac9b19eb0bd1a48
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Full commit (up to 1000 lines):
<cut>
commit 53329d29274fa4af5af7ab155947fe84b9684e39
Author: Rainer Orth <ro(a)CeBiTec.Uni-Bielefeld.DE>
Date:   Tue May 21 16:59:39 2019 +0000

    Fix dg-require-* syntax

            * gcc.dg/Wattribute-alias.c: Pass emtpy arg to dg-require-ifunc.

            * gcc.c-torture/execute/20030125-1.c: Pass emtpy arg to dg-require-weak.

            * gcc.dg/torture/ftrapv-2.c: Pass empty arg to dg-require-fork.

            * gcc.target/i386/pr84723-1.c: Remove dg-require-ifunc.
            * gcc.target/i386/pr84723-2.c: Likewise.
            * gcc.target/i386/pr84723-3.c: Likewise.
            * gcc.target/i386/pr84723-4.c: Likewise.
            * gcc.target/i386/pr84723-5.c: Likewise.

    From-SVN: r271476
---
 gcc/testsuite/ChangeLog                          | 14 ++++++++++++++
 gcc/testsuite/gcc.c-torture/execute/20030125-1.c |  2 +-
 gcc/testsuite/gcc.dg/Wattribute-alias.c          |  2 +-
 gcc/testsuite/gcc.dg/torture/ftrapv-2.c          |  2 +-
 gcc/testsuite/gcc.target/i386/pr84723-1.c        |  1 -
 gcc/testsuite/gcc.target/i386/pr84723-2.c        |  1 -
 gcc/testsuite/gcc.target/i386/pr84723-3.c        |  1 -
 gcc/testsuite/gcc.target/i386/pr84723-4.c        |  1 -
 gcc/testsuite/gcc.target/i386/pr84723-5.c        |  1 -
 9 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8ec3ed1a513..4e8e73cb52f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,17 @@
+2019-05-21  Rainer Orth  <ro(a)CeBiTec.Uni-Bielefeld.DE>
+
+	* gcc.dg/Wattribute-alias.c: Pass emtpy arg to dg-require-ifunc.
+
+	* gcc.c-torture/execute/20030125-1.c: Pass emtpy arg to dg-require-weak.
+
+	* gcc.dg/torture/ftrapv-2.c: Pass empty arg to dg-require-fork.
+
+	* gcc.target/i386/pr84723-1.c: Remove dg-require-ifunc.
+	* gcc.target/i386/pr84723-2.c: Likewise.
+	* gcc.target/i386/pr84723-3.c: Likewise.
+	* gcc.target/i386/pr84723-4.c: Likewise.
+	* gcc.target/i386/pr84723-5.c: Likewise.
+
 2019-05-21  Iain Sandoe  <iain(a)sandoe.co.uk>

 	PR testsuite/67958
diff --git a/gcc/testsuite/gcc.c-torture/execute/20030125-1.c b/gcc/testsuite/gcc.c-torture/execute/20030125-1.c
index 960552c3c3a..39578e51d15 100644
--- a/gcc/testsuite/gcc.c-torture/execute/20030125-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/20030125-1.c
@@ -1,6 +1,6 @@
 /* Verify whether math functions are simplified. */
 /* { dg-require-effective-target c99_runtime } */
-/* { dg-require-weak } */
+/* { dg-require-weak "" } */
 double sin(double);
 double floor(double);
 float
diff --git a/gcc/testsuite/gcc.dg/Wattribute-alias.c b/gcc/testsuite/gcc.dg/Wattribute-alias.c
index 228c1be82fc..12774e82834 100644
--- a/gcc/testsuite/gcc.dg/Wattribute-alias.c
+++ b/gcc/testsuite/gcc.dg/Wattribute-alias.c
@@ -1,6 +1,6 @@
 /* PR middle-end/81824 - Warn for missing attributes with function aliases
    { dg-do compile }
-   { dg-require-ifunc "require ifunc support" }
+   { dg-require-ifunc "" }
    { dg-options "-Wall -Wattribute-alias=2" } */

 #define ATTR(...) __attribute__ ((__VA_ARGS__))
diff --git a/gcc/testsuite/gcc.dg/torture/ftrapv-2.c b/gcc/testsuite/gcc.dg/torture/ftrapv-2.c
index 8065ee0461a..75e464fe557 100644
--- a/gcc/testsuite/gcc.dg/torture/ftrapv-2.c
+++ b/gcc/testsuite/gcc.dg/torture/ftrapv-2.c
@@ -3,7 +3,7 @@
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
 /* { dg-additional-options "-ftrapv" } */
 /* { dg-require-effective-target trapping } */
-/* { dg-require-fork unused } */
+/* { dg-require-fork "" } */

 #include <stdlib.h>
 #include <unistd.h>
diff --git a/gcc/testsuite/gcc.target/i386/pr84723-1.c b/gcc/testsuite/gcc.target/i386/pr84723-1.c
index 0264ecb1159..1357b1d5f46 100644
--- a/gcc/testsuite/gcc.target/i386/pr84723-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr84723-1.c
@@ -1,6 +1,5 @@
 /* PR middle-end/84723 */
 /* { dg-do compile } */
-/* { dg-require-ifunc } */
 /* { dg-options "-O2" } */

 __attribute__((target_clones ("avx", "default")))
diff --git a/gcc/testsuite/gcc.target/i386/pr84723-2.c b/gcc/testsuite/gcc.target/i386/pr84723-2.c
index 6456d6d256f..d092e676b62 100644
--- a/gcc/testsuite/gcc.target/i386/pr84723-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr84723-2.c
@@ -1,6 +1,5 @@
 /* PR middle-end/84723 */
 /* { dg-do compile } */
-/* { dg-require-ifunc } */
 /* { dg-options "-O2" } */

 __attribute__((target_clones ("avx", "default")))
diff --git a/gcc/testsuite/gcc.target/i386/pr84723-3.c b/gcc/testsuite/gcc.target/i386/pr84723-3.c
index bb8e7cabc88..7bb8eb29815 100644
--- a/gcc/testsuite/gcc.target/i386/pr84723-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr84723-3.c
@@ -1,6 +1,5 @@
 /* PR middle-end/84723 */
 /* { dg-do compile } */
-/* { dg-require-ifunc } */
 /* { dg-options "-O2" } */

 __attribute__((target_clones ("avx", "default")))
diff --git a/gcc/testsuite/gcc.target/i386/pr84723-4.c b/gcc/testsuite/gcc.target/i386/pr84723-4.c
index 9df1008497c..f30567dfae3 100644
--- a/gcc/testsuite/gcc.target/i386/pr84723-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr84723-4.c
@@ -1,6 +1,5 @@
 /* PR
middle-end/84723 */ /* { dg-do compile } */ -/* { dg-require-ifunc } */ /* { dg-options "-O2" } */ __attribute__((target_clones ("avx", "default"))) diff --git a/gcc/testsuite/gcc.target/i386/pr84723-5.c b/gcc/testsuite/gcc.target/i386/pr84723-5.c index c7aa92804fa..0167df39850 100644 --- a/gcc/testsuite/gcc.target/i386/pr84723-5.c +++ b/gcc/testsuite/gcc.target/i386/pr84723-5.c @@ -1,6 +1,5 @@ /* PR middle-end/84723 */ /* { dg-do compile } */ -/* { dg-require-ifunc } */ /* { dg-options "-O2" } */ __attribute__((target_clones ("avx", "default"))) </cut>
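
The reproduction steps quoted above follow a fixed pattern: check out one commit in detached-HEAD mode, run ../artifacts/test.sh, then repeat for the other commit. A minimal Python sketch of that pattern is given here as a dry-run planner. The function name reproduction_plan is hypothetical and not part of the Linaro scripts; the two hashes and both commands are copied from the report. The sketch only lists the commands and executes nothing.

```python
from typing import List

# Commit hashes quoted in the report above.
FIRST_BAD = "53329d29274fa4af5af7ab155947fe84b9684e39"
LAST_GOOD = "b33a3c6451ecc09ac5f1c7ccdac9b19eb0bd1a48"


def reproduction_plan(first_bad: str, last_good: str) -> List[List[str]]:
    """Return the commands to run, in order, from inside the component checkout.

    Hypothetical helper: it mirrors the quoted steps (detached checkout of
    each commit, then ../artifacts/test.sh) but does not execute anything.
    """
    plan: List[List[str]] = []
    for sha in (first_bad, last_good):
        plan.append(["git", "checkout", "--detach", sha])
        plan.append(["../artifacts/test.sh"])
    return plan


if __name__ == "__main__":
    # Print the plan so it can be compared with the quoted script.
    for cmd in reproduction_plan(FIRST_BAD, LAST_GOOD):
        print(" ".join(cmd))
```

Keeping the plan as data rather than running it directly makes it easy to compare against the quoted script before anything touches the checkout; as the report notes, the real scripts fail at the benchmarking step without access to Linaro TCWG CI.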

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O2 - Build # 16 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2

Culprit:
<cut>
commit 7de439b2be4a046da541b625812f2fe34c54c4b9
Author: Rob Suderman <rob.suderman(a)gmail.com>
Date: Wed Aug 11 11:05:08 2021 -0700

[mlir][tosa] Migrate tosa to more efficient linalg.conv

Existing linalg.conv2d is not well optimized for performance. Changed to a version that is more aligned for optimziation. Include the corresponding transposes to use this optimized version.

This also splits the conv and depthwise conv into separate implementations to avoid overly complex lowerings.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D107504
</cut>

Results regressed to (for first_bad == 7de439b2be4a046da541b625812f2fe34c54c4b9)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-7de439b2be4a046da541b625812f2fe34c54c4b9/results_id: 1
# 447.dealII,[.] _ZNK12SparseMatrixIdE5vmultI6VectorIdES3_EEvRT regressed by 111
# 433.milc,[.] mult_su3_mat_vec regressed by 112

from (for last_good == c1a8f12873783e8f4827437f6b2dddadfc58109d)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-c1a8f12873783e8f4827437f6b2dddadfc58109d/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/4556
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/4553
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-7de439b2be4a046da541b625812f2fe34c54c4b9
cd investigate-llvm-7de439b2be4a046da541b625812f2fe34c54c4b9

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 7de439b2be4a046da541b625812f2fe34c54c4b9
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach c1a8f12873783e8f4827437f6b2dddadfc58109d
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit 7de439b2be4a046da541b625812f2fe34c54c4b9
Author: Rob Suderman <rob.suderman(a)gmail.com>
Date: Wed Aug 11 11:05:08 2021 -0700

[mlir][tosa] Migrate tosa to more efficient linalg.conv

Existing linalg.conv2d is not well optimized for performance. Changed to a version that is more aligned for optimziation. Include the corresponding transposes to use this optimized version.

This also splits the conv and depthwise conv into separate implementations to avoid overly complex lowerings.
Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D107504 --- .../Linalg/IR/LinalgNamedStructuredOps.yaml | 261 +++++++++--------- mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp | 303 +++++++++++++-------- .../dialects/linalg/opdsl/ops/core_named_ops.py | 79 +++--- .../Conversion/TosaToLinalg/tosa-to-linalg.mlir | 40 ++- mlir/test/Dialect/Linalg/named-ops.mlir | 14 - 5 files changed, 378 insertions(+), 319 deletions(-) diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml b/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml index 53b54e1bff9f..3e1fcabc8cb9 100644 --- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml +++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml @@ -628,10 +628,10 @@ structured_op: !LinalgStructuredOpConfig scalar_arg: B --- !LinalgOpConfig metadata: !LinalgOpMetadata - name: conv_2d_input_nhwc_filter_ohwi_poly - cpp_class_name: Conv2DInputNhwcFilterOhwiPolyOp + name: conv_2d_nchw + cpp_class_name: Conv2DNchwOp doc: |- - Performs a 2-D convolution. + Performs 2-D convolution. Numeric casting is performed on the operands to the inner multiply, promoting them to the same data type as the accumulator/output. 
@@ -648,13 +648,13 @@ structured_op: !LinalgStructuredOpConfig usage: InputOperand type_var: T2 shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] - -> (s4, s5, s6, s3)> + -> (s4, s1, s5, s6)> - !LinalgOperandDefConfig name: O usage: OutputOperand type_var: U shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] - -> (s0, s7, s8, s4)> + -> (s0, s4, s7, s8, s1)> - !LinalgOperandDefConfig name: strides usage: IndexAttribute @@ -670,18 +670,18 @@ structured_op: !LinalgStructuredOpConfig indexing_maps: !LinalgIndexingMapsConfig static_indexing_maps: - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d0, d1 * s9 + d3 * s11, d2 * s10 + d4 * s12, d6)> + s9, s10, s11, s12] -> (d0, d4, d2 * s9 + d5 * s11, d3 * s10 + d6 * s12)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d5, d3, d4, d6)> + s9, s10, s11, s12] -> (d1, d4, d5, d6)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d0, d1, d2, d5)> + s9, s10, s11, s12] -> (d0, d1, d2, d3)> iterator_types: - parallel - parallel - parallel + - parallel - reduction - reduction - - parallel - reduction assignments: - !ScalarAssign @@ -710,14 +710,13 @@ structured_op: !LinalgStructuredOpConfig scalar_arg: K --- !LinalgOpConfig metadata: !LinalgOpMetadata - name: conv_2d_input_nhwc_filter_ohwi_poly_q - cpp_class_name: Conv2DInputNhwcFilterOhwiPolyQOp + name: conv_2d_nhwc_hwcf + cpp_class_name: Conv2DNhwcHwcfOp doc: |- - Performs a 2-D quantized convolution. + Performs 2-D convolution. Numeric casting is performed on the operands to the inner multiply, promoting - them to the same data type as the accumulator/output. Includes zero point - adjustment for quantization. + them to the same data type as the accumulator/output. 
structured_op: !LinalgStructuredOpConfig args: - !LinalgOperandDefConfig @@ -731,21 +730,13 @@ structured_op: !LinalgStructuredOpConfig usage: InputOperand type_var: T2 shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] - -> (s4, s5, s6, s3)> - - !LinalgOperandDefConfig - name: IZp - usage: InputOperand - type_var: I32 - - !LinalgOperandDefConfig - name: KZp - usage: InputOperand - type_var: I32 + -> (s4, s5, s3, s6)> - !LinalgOperandDefConfig name: O usage: OutputOperand type_var: U shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] - -> (s0, s7, s8, s4)> + -> (s0, s7, s8, s6)> - !LinalgOperandDefConfig name: strides usage: IndexAttribute @@ -761,22 +752,18 @@ structured_op: !LinalgStructuredOpConfig indexing_maps: !LinalgIndexingMapsConfig static_indexing_maps: - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d0, d1 * s9 + d3 * s11, d2 * s10 + d4 * s12, d6)> - - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d5, d3, d4, d6)> + s9, s10, s11, s12] -> (d0, d1 * s9 + d4 * s11, d2 * s10 + d5 * s12, d6)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> ()> - - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> ()> + s9, s10, s11, s12] -> (d4, d5, d6, d3)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d0, d1, d2, d5)> + s9, s10, s11, s12] -> (d0, d1, d2, d3)> iterator_types: - parallel - parallel - parallel + - parallel - reduction - reduction - - parallel - reduction assignments: - !ScalarAssign @@ -792,37 +779,17 @@ structured_op: !LinalgStructuredOpConfig fn_name: mul operands: - !ScalarExpression - scalar_apply: - fn_name: sub + symbolic_cast: + type_var: U operands: - !ScalarExpression - symbolic_cast: - type_var: U - operands: - - 
!ScalarExpression - scalar_arg: I - - !ScalarExpression - symbolic_cast: - type_var: U - operands: - - !ScalarExpression - scalar_arg: IZp + scalar_arg: I - !ScalarExpression - scalar_apply: - fn_name: sub + symbolic_cast: + type_var: U operands: - !ScalarExpression - symbolic_cast: - type_var: U - operands: - - !ScalarExpression - scalar_arg: K - - !ScalarExpression - symbolic_cast: - type_var: U - operands: - - !ScalarExpression - scalar_arg: KZp + scalar_arg: K --- !LinalgOpConfig metadata: !LinalgOpMetadata name: depthwise_conv_2d_input_nhwc_filter_hwc_poly @@ -906,13 +873,14 @@ structured_op: !LinalgStructuredOpConfig scalar_arg: K --- !LinalgOpConfig metadata: !LinalgOpMetadata - name: depthwise_conv_2D_nchw - cpp_class_name: DepthwiseConv2DNchwOp + name: conv_2d_nhwc_hwcf_q + cpp_class_name: Conv2DNhwcHwcfQOp doc: |- - Performs depth-wise 2-D convolution. + Performs 2-D convolution with zero point offsets. Numeric casting is performed on the operands to the inner multiply, promoting - them to the same data type as the accumulator/output. + them to the same data type as the accumulator/output. This includes the zero + point offsets common to quantized operations. 
structured_op: !LinalgStructuredOpConfig args: - !LinalgOperandDefConfig @@ -927,12 +895,20 @@ structured_op: !LinalgStructuredOpConfig type_var: T2 shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] -> (s4, s5, s3, s6)> + - !LinalgOperandDefConfig + name: IZp + usage: InputOperand + type_var: I32 + - !LinalgOperandDefConfig + name: KZp + usage: InputOperand + type_var: I32 - !LinalgOperandDefConfig name: O usage: OutputOperand type_var: U shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] - -> (s0, s7, s8, s3, s6)> + -> (s0, s7, s8, s6)> - !LinalgOperandDefConfig name: strides usage: IndexAttribute @@ -948,19 +924,23 @@ structured_op: !LinalgStructuredOpConfig indexing_maps: !LinalgIndexingMapsConfig static_indexing_maps: - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d0, d1 * s9 + d3 * s11, d2 * s10 + d4 * s12, d5)> + s9, s10, s11, s12] -> (d0, d1 * s9 + d4 * s11, d2 * s10 + d5 * s12, d6)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d3, d4, d5, d6)> + s9, s10, s11, s12] -> (d4, d5, d6, d3)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d0, d1, d2, d5, d6)> + s9, s10, s11, s12] -> ()> + - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, + s9, s10, s11, s12] -> ()> + - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, + s9, s10, s11, s12] -> (d0, d1, d2, d3)> iterator_types: - parallel - parallel - parallel + - parallel + - reduction - reduction - reduction - - parallel - - parallel assignments: - !ScalarAssign arg: O @@ -975,21 +955,41 @@ structured_op: !LinalgStructuredOpConfig fn_name: mul operands: - !ScalarExpression - symbolic_cast: - type_var: U + scalar_apply: + fn_name: sub operands: - !ScalarExpression - scalar_arg: I + symbolic_cast: + type_var: U + operands: + - 
!ScalarExpression + scalar_arg: I + - !ScalarExpression + symbolic_cast: + type_var: U + operands: + - !ScalarExpression + scalar_arg: IZp - !ScalarExpression - symbolic_cast: - type_var: U + scalar_apply: + fn_name: sub operands: - !ScalarExpression - scalar_arg: K + symbolic_cast: + type_var: U + operands: + - !ScalarExpression + scalar_arg: K + - !ScalarExpression + symbolic_cast: + type_var: U + operands: + - !ScalarExpression + scalar_arg: KZp --- !LinalgOpConfig metadata: !LinalgOpMetadata - name: depthwise_conv2D_nchw_q - cpp_class_name: DepthwiseConv2DNchwQOp + name: depthwise_conv2D_nchw + cpp_class_name: DepthwiseConv2DNchwOp doc: |- Performs depth-wise 2-D convolution. @@ -1009,14 +1009,6 @@ structured_op: !LinalgStructuredOpConfig type_var: T2 shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] -> (s4, s5, s3, s6)> - - !LinalgOperandDefConfig - name: IZp - usage: InputOperand - type_var: I32 - - !LinalgOperandDefConfig - name: KZp - usage: InputOperand - type_var: I32 - !LinalgOperandDefConfig name: O usage: OutputOperand @@ -1041,10 +1033,6 @@ structured_op: !LinalgStructuredOpConfig s9, s10, s11, s12] -> (d0, d1 * s9 + d3 * s11, d2 * s10 + d4 * s12, d5)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] -> (d3, d4, d5, d6)> - - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> ()> - - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> ()> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] -> (d0, d1, d2, d5, d6)> iterator_types: @@ -1069,43 +1057,23 @@ structured_op: !LinalgStructuredOpConfig fn_name: mul operands: - !ScalarExpression - scalar_apply: - fn_name: sub + symbolic_cast: + type_var: U operands: - !ScalarExpression - symbolic_cast: - type_var: U - operands: - - !ScalarExpression - scalar_arg: I - - !ScalarExpression - 
symbolic_cast: - type_var: U - operands: - - !ScalarExpression - scalar_arg: IZp + scalar_arg: I - !ScalarExpression - scalar_apply: - fn_name: sub + symbolic_cast: + type_var: U operands: - !ScalarExpression - symbolic_cast: - type_var: U - operands: - - !ScalarExpression - scalar_arg: K - - !ScalarExpression - symbolic_cast: - type_var: U - operands: - - !ScalarExpression - scalar_arg: KZp + scalar_arg: K --- !LinalgOpConfig metadata: !LinalgOpMetadata - name: conv_2d_nchw - cpp_class_name: Conv2DNchwOp + name: depthwise_conv2D_nchw_q + cpp_class_name: DepthwiseConv2DNchwQOp doc: |- - Performs 2-D convolution. + Performs depth-wise 2-D convolution. Numeric casting is performed on the operands to the inner multiply, promoting them to the same data type as the accumulator/output. @@ -1122,13 +1090,21 @@ structured_op: !LinalgStructuredOpConfig usage: InputOperand type_var: T2 shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] - -> (s4, s1, s5, s6)> + -> (s4, s5, s3, s6)> + - !LinalgOperandDefConfig + name: IZp + usage: InputOperand + type_var: I32 + - !LinalgOperandDefConfig + name: KZp + usage: InputOperand + type_var: I32 - !LinalgOperandDefConfig name: O usage: OutputOperand type_var: U shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12] - -> (s0, s4, s7, s8, s1)> + -> (s0, s7, s8, s3, s6)> - !LinalgOperandDefConfig name: strides usage: IndexAttribute @@ -1144,19 +1120,23 @@ structured_op: !LinalgStructuredOpConfig indexing_maps: !LinalgIndexingMapsConfig static_indexing_maps: - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d0, d4, d2 * s9 + d5 * s11, d3 * s10 + d6 * s12)> + s9, s10, s11, s12] -> (d0, d1 * s9 + d3 * s11, d2 * s10 + d4 * s12, d5)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d1, d4, d5, d6)> + s9, s10, s11, s12] -> (d3, d4, d5, d6)> - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, 
s1, s2, s3, s4, s5, s6, s7, s8, - s9, s10, s11, s12] -> (d0, d1, d2, d3)> + s9, s10, s11, s12] -> ()> + - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, + s9, s10, s11, s12] -> ()> + - affine_map<(d0, d1, d2, d3, d4, d5, d6)[s0, s1, s2, s3, s4, s5, s6, s7, s8, + s9, s10, s11, s12] -> (d0, d1, d2, d5, d6)> iterator_types: - parallel - parallel - parallel - - parallel - - reduction - reduction - reduction + - parallel + - parallel assignments: - !ScalarAssign arg: O @@ -1171,17 +1151,37 @@ structured_op: !LinalgStructuredOpConfig fn_name: mul operands: - !ScalarExpression - symbolic_cast: - type_var: U + scalar_apply: + fn_name: sub operands: - !ScalarExpression - scalar_arg: I + symbolic_cast: + type_var: U + operands: + - !ScalarExpression + scalar_arg: I + - !ScalarExpression + symbolic_cast: + type_var: U + operands: + - !ScalarExpression + scalar_arg: IZp - !ScalarExpression - symbolic_cast: - type_var: U + scalar_apply: + fn_name: sub operands: - !ScalarExpression - scalar_arg: K + symbolic_cast: + type_var: U + operands: + - !ScalarExpression + scalar_arg: K + - !ScalarExpression + symbolic_cast: + type_var: U + operands: + - !ScalarExpression + scalar_arg: KZp --- !LinalgOpConfig metadata: !LinalgOpMetadata name: pooling_nhwc_sum @@ -1896,3 +1896,4 @@ structured_op: !LinalgStructuredOpConfig operands: - !ScalarExpression scalar_arg: I + diff --git a/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp b/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp index 37687337e10b..8e24f03a0dac 100644 --- a/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp +++ b/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp @@ -849,104 +849,213 @@ static LogicalResult reduceMatchAndRewriteHelper(Operation *op, uint64_t axis, return success(); } -static LogicalResult -convolutionMatchAndRewriterHelper(Operation *op, - ConversionPatternRewriter &rewriter) { - Location loc = op->getLoc(); - Value input = op->getOperand(0); - Value weight = op->getOperand(1); 
- Value bias = op->getOperand(2); +namespace { - ShapedType inputTy = input.getType().cast<ShapedType>(); - ShapedType weightTy = weight.getType().cast<ShapedType>(); - ShapedType biasTy = bias.getType().cast<ShapedType>(); - ShapedType resultTy = op->getResult(0).getType().cast<ShapedType>(); +template <typename SrcOp> +class PointwiseConverter : public OpRewritePattern<SrcOp> { +public: + using OpRewritePattern<SrcOp>::OpRewritePattern; - Type inputETy = inputTy.getElementType(); - Type resultETy = resultTy.getElementType(); - - auto padAttr = op->getAttr("pad").cast<ArrayAttr>(); - auto strideTosaAttr = op->getAttr("stride").cast<ArrayAttr>(); - auto dilationTosaAttr = op->getAttr("dilation").cast<ArrayAttr>(); - - bool isQuantized = op->hasAttr("quantization_info"); - IntegerAttr iZp; - IntegerAttr kZp; - if (isQuantized) { - auto quantizationInfo = - op->getAttr("quantization_info").cast<tosa::ConvOpQuantizationAttr>(); - iZp = rewriter.getI32IntegerAttr( - quantizationInfo.input_zp().getValue().getSExtValue()); - kZp = rewriter.getI32IntegerAttr( - quantizationInfo.weight_zp().getValue().getSExtValue()); + LogicalResult matchAndRewrite(SrcOp op, + PatternRewriter &rewriter) const final { + return elementwiseMatchAndRewriteHelper(op, rewriter); } +}; - if (!inputTy.hasStaticShape() || !weightTy.hasStaticShape() || - !biasTy.hasStaticShape() || !resultTy.hasStaticShape()) - return rewriter.notifyMatchFailure(op, - "tosa.conv ops require static shapes"); +class ConvConverter : public OpConversionPattern<tosa::Conv2DOp> { +public: + using OpConversionPattern<tosa::Conv2DOp>::OpConversionPattern; + LogicalResult + matchAndRewrite(tosa::Conv2DOp op, ArrayRef<Value> args, + ConversionPatternRewriter &rewriter) const final { + Location loc = op->getLoc(); + Value input = op->getOperand(0); + Value weight = op->getOperand(1); + Value bias = op->getOperand(2); - auto weightShape = weightTy.getShape(); - auto resultShape = resultTy.getShape(); + ShapedType inputTy = 
input.getType().cast<ShapedType>(); + ShapedType weightTy = weight.getType().cast<ShapedType>(); + ShapedType biasTy = bias.getType().cast<ShapedType>(); + ShapedType resultTy = op->getResult(0).getType().cast<ShapedType>(); - // Apply padding as necessary. - Attribute zeroAttr = rewriter.getZeroAttr(inputETy); - llvm::SmallVector<int64_t> pad; - pad.resize(2, 0); - getValuesFromIntArrayAttribute(padAttr, pad); - pad.resize(pad.size() + 2, 0); + Type inputETy = inputTy.getElementType(); + Type resultETy = resultTy.getElementType(); - input = applyPad(loc, input, pad, zeroAttr, rewriter); + auto padAttr = op->getAttr("pad").cast<ArrayAttr>(); + auto strideTosaAttr = op->getAttr("stride").cast<ArrayAttr>(); + auto dilationTosaAttr = op->getAttr("dilation").cast<ArrayAttr>(); + bool isQuantized = op->hasAttr("quantization_info"); - // Broadcast the initial value to the output tensor before convolving. - SmallVector<AffineMap, 4> indexingMaps; - indexingMaps.push_back(AffineMap::get( - /*dimCount=*/resultTy.getRank(), /*symbolCount=*/0, - {rewriter.getAffineDimExpr(3)}, rewriter.getContext())); - indexingMaps.push_back(rewriter.getMultiDimIdentityMap(resultTy.getRank())); + if (!inputTy.hasStaticShape() || !weightTy.hasStaticShape() || + !biasTy.hasStaticShape() || !resultTy.hasStaticShape()) + return rewriter.notifyMatchFailure(op, + "tosa.conv ops require static shapes"); - Value initTensor = rewriter.create<linalg::InitTensorOp>( - loc, resultTy.getShape(), resultTy.getElementType()); + auto weightShape = weightTy.getShape(); - Value biasBroadcast = - rewriter - .create<linalg::GenericOp>( - loc, resultTy, bias, initTensor, indexingMaps, - getNParallelLoopsAttrs(resultTy.getRank()), - [&](OpBuilder &nestedBuilder, Location nestedLoc, - ValueRange args) { - nestedBuilder.create<linalg::YieldOp>(nestedLoc, args[0]); - }) - .getResult(0); - - // Extract the attributes for convolution. 
- llvm::SmallVector<int64_t> stride, dilation; - getValuesFromIntArrayAttribute(strideTosaAttr, stride); - getValuesFromIntArrayAttribute(dilationTosaAttr, dilation); - - // Create the convolution op. - auto strideAttr = DenseIntElementsAttr::get( - RankedTensorType::get({2}, rewriter.getI64Type()), stride); - auto dilationAttr = DenseIntElementsAttr::get( - RankedTensorType::get({2}, rewriter.getI64Type()), dilation); - - if (isa<tosa::Conv2DOp>(op) && !isQuantized) { - rewriter.replaceOpWithNewOp<linalg::Conv2DInputNhwcFilterOhwiPolyOp>( + // Apply padding as necessary. + Attribute zeroAttr = rewriter.getZeroAttr(inputETy); + llvm::SmallVector<int64_t> pad; + pad.resize(2, 0); + getValuesFromIntArrayAttribute(padAttr, pad); + pad.resize(pad.size() + 2, 0); + input = applyPad(loc, input, pad, zeroAttr, rewriter); + + // Transpose the kernel to match dimension ordering of the linalg + // convolution operation. + // TODO(suderman): See if this can be efficiently folded - check whether + // the input is used anywhere else, if not fold the constant. + SmallVector<int64_t> weightPerm{1, 2, 3, 0}; + SmallVector<int64_t> newWeightShape{weightShape[1], weightShape[2], + weightShape[3], weightShape[0]}; + auto weightPermAttr = DenseIntElementsAttr::get( + RankedTensorType::get({4}, rewriter.getI64Type()), weightPerm); + Value weightPermValue = rewriter.create<ConstantOp>(loc, weightPermAttr); + Type newWeightTy = + RankedTensorType::get(newWeightShape, weightTy.getElementType()); + weight = rewriter.create<tosa::TransposeOp>(loc, newWeightTy, weight, + weightPermValue); + + // Broadcast the initial value to the output tensor before convolving. 
+ SmallVector<AffineMap, 4> indexingMaps; + indexingMaps.push_back(AffineMap::get( + /*dimCount=*/resultTy.getRank(), /*symbolCount=*/0, + {rewriter.getAffineDimExpr(3)}, rewriter.getContext())); + indexingMaps.push_back(rewriter.getMultiDimIdentityMap(resultTy.getRank())); + + Value initTensor = rewriter.create<linalg::InitTensorOp>( + loc, resultTy.getShape(), resultETy); + + Value biasBroadcast = + rewriter + .create<linalg::GenericOp>( + loc, resultTy, bias, initTensor, indexingMaps, + getNParallelLoopsAttrs(resultTy.getRank()), + [&](OpBuilder &nestedBuilder, Location nestedLoc, + ValueRange args) { + nestedBuilder.create<linalg::YieldOp>(nestedLoc, args[0]); + }) + .getResult(0); + + // Extract the attributes for convolution. + llvm::SmallVector<int64_t> stride, dilation; + getValuesFromIntArrayAttribute(strideTosaAttr, stride); + getValuesFromIntArrayAttribute(dilationTosaAttr, dilation); + + // Create the convolution op. + auto strideAttr = DenseIntElementsAttr::get( + RankedTensorType::get({2}, rewriter.getI64Type()), stride); + auto dilationAttr = DenseIntElementsAttr::get( + RankedTensorType::get({2}, rewriter.getI64Type()), dilation); + + Value conv; + if (isQuantized) { + auto quantizationInfo = + op->getAttr("quantization_info").cast<tosa::ConvOpQuantizationAttr>(); + auto iZp = rewriter.getI32IntegerAttr( + quantizationInfo.input_zp().getValue().getSExtValue()); + auto kZp = rewriter.getI32IntegerAttr( + quantizationInfo.weight_zp().getValue().getSExtValue()); + + auto iZpVal = rewriter.create<ConstantOp>(loc, iZp); + auto kZpVal = rewriter.create<ConstantOp>(loc, kZp); + rewriter.replaceOpWithNewOp<linalg::Conv2DNhwcHwcfQOp>( + op, resultTy, ValueRange{input, weight, iZpVal, kZpVal}, + ValueRange{biasBroadcast}, strideAttr, dilationAttr); + return success(); + } + + rewriter.replaceOpWithNewOp<linalg::Conv2DNhwcHwcfOp>( op, resultTy, ValueRange{input, weight}, ValueRange{biasBroadcast}, strideAttr, dilationAttr); return success(); } +}; - if 
(isa<tosa::Conv2DOp>(op) && isQuantized) { - auto iZpVal = rewriter.create<ConstantOp>(loc, iZp); - auto kZpVal = rewriter.create<ConstantOp>(loc, kZp); - rewriter.replaceOpWithNewOp<linalg::Conv2DInputNhwcFilterOhwiPolyQOp>( - op, resultTy, ValueRange{input, weight, iZpVal, kZpVal}, - ValueRange{biasBroadcast}, strideAttr, dilationAttr); - return success(); - } +class DepthwiseConvConverter + : public OpConversionPattern<tosa::DepthwiseConv2DOp> { +public: + using OpConversionPattern<tosa::DepthwiseConv2DOp>::OpConversionPattern; + LogicalResult + matchAndRewrite(tosa::DepthwiseConv2DOp op, ArrayRef<Value> args, + ConversionPatternRewriter &rewriter) const final { + Location loc = op->getLoc(); + Value input = op->getOperand(0); + Value weight = op->getOperand(1); + Value bias = op->getOperand(2); + + ShapedType inputTy = input.getType().cast<ShapedType>(); + ShapedType weightTy = weight.getType().cast<ShapedType>(); + ShapedType biasTy = bias.getType().cast<ShapedType>(); + ShapedType resultTy = op->getResult(0).getType().cast<ShapedType>(); - if (isa<tosa::DepthwiseConv2DOp>(op)) { + Type inputETy = inputTy.getElementType(); + Type resultETy = resultTy.getElementType(); + + auto padAttr = op->getAttr("pad").cast<ArrayAttr>(); + auto strideTosaAttr = op->getAttr("stride").cast<ArrayAttr>(); + auto dilationTosaAttr = op->getAttr("dilation").cast<ArrayAttr>(); + + bool isQuantized = op->hasAttr("quantization_info"); + IntegerAttr iZp; + IntegerAttr kZp; + if (isQuantized) { + auto quantizationInfo = + op->getAttr("quantization_info").cast<tosa::ConvOpQuantizationAttr>(); + iZp = rewriter.getI32IntegerAttr( + quantizationInfo.input_zp().getValue().getSExtValue()); + kZp = rewriter.getI32IntegerAttr( + quantizationInfo.weight_zp().getValue().getSExtValue()); + } + + if (!inputTy.hasStaticShape() || !weightTy.hasStaticShape() || + !biasTy.hasStaticShape() || !resultTy.hasStaticShape()) + return rewriter.notifyMatchFailure(op, + "tosa.conv ops require static shapes"); 
+ + auto weightShape = weightTy.getShape(); + auto resultShape = resultTy.getShape(); + + // Apply padding as necessary. + Attribute zeroAttr = rewriter.getZeroAttr(inputETy); + llvm::SmallVector<int64_t> pad; + pad.resize(2, 0); + getValuesFromIntArrayAttribute(padAttr, pad); + pad.resize(pad.size() + 2, 0); + + input = applyPad(loc, input, pad, zeroAttr, rewriter); + + // Broadcast the initial value to the output tensor before convolving. + SmallVector<AffineMap, 4> indexingMaps; + indexingMaps.push_back(AffineMap::get( + /*dimCount=*/resultTy.getRank(), /*symbolCount=*/0, + {rewriter.getAffineDimExpr(3)}, rewriter.getContext())); + indexingMaps.push_back(rewriter.getMultiDimIdentityMap(resultTy.getRank())); + + Value initTensor = + rewriter.create<linalg::InitTensorOp>(loc, resultShape, resultETy); + + Value biasBroadcast = + rewriter + .create<linalg::GenericOp>( + loc, resultTy, bias, initTensor, indexingMaps, + getNParallelLoopsAttrs(resultTy.getRank()), + [&](OpBuilder &nestedBuilder, Location nestedLoc, + ValueRange args) { + nestedBuilder.create<linalg::YieldOp>(nestedLoc, args[0]); + }) + .getResult(0); + + // Extract the attributes for convolution. + llvm::SmallVector<int64_t> stride, dilation; + getValuesFromIntArrayAttribute(strideTosaAttr, stride); + getValuesFromIntArrayAttribute(dilationTosaAttr, dilation); + + // Create the convolution op. 
+ auto strideAttr = DenseIntElementsAttr::get( + RankedTensorType::get({2}, rewriter.getI64Type()), stride); + auto dilationAttr = DenseIntElementsAttr::get( + RankedTensorType::get({2}, rewriter.getI64Type()), dilation); ShapedType linalgConvTy = RankedTensorType::get({resultShape[0], resultShape[1], resultShape[2], weightShape[2], weightShape[3]}, @@ -976,32 +1085,6 @@ convolutionMatchAndRewriterHelper(Operation *op, rewriter.replaceOp(op, reshape); return success(); } - - return failure(); -} - -namespace { - -template <typename SrcOp> -class PointwiseConverter : public OpRewritePattern<SrcOp> { -public: - using OpRewritePattern<SrcOp>::OpRewritePattern; - - LogicalResult matchAndRewrite(SrcOp op, - PatternRewriter &rewriter) const final { - return elementwiseMatchAndRewriteHelper(op, rewriter); - } -}; - -template <typename T> -class ConvConverter : public OpConversionPattern<T> { -public: - using OpConversionPattern<T>::OpConversionPattern; - LogicalResult - matchAndRewrite(T op, ArrayRef<Value> args, - ConversionPatternRewriter &rewriter) const final { - return convolutionMatchAndRewriterHelper(op, rewriter); - } }; class TransposeConvConverter @@ -2528,8 +2611,8 @@ void mlir::tosa::populateTosaToLinalgOnTensorsConversionPatterns( ReduceConverter<tosa::ReduceProdOp>, ArgMaxConverter, ConcatConverter, - ConvConverter<tosa::Conv2DOp>, - ConvConverter<tosa::DepthwiseConv2DOp>, + ConvConverter, + DepthwiseConvConverter, TransposeConvConverter, GatherConverter, PadConverter, diff --git a/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py b/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py index fc92c196a059..b9faeeb831df 100644 --- a/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py +++ b/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py @@ -144,49 +144,39 @@ def dot( implements(ContractionOpInterface) C[None] += cast(U, A[D.m]) * cast(U, B[D.m]) - @linalg_structured_op -def conv_2d_input_nhwc_filter_ohwi_poly( - 
I=TensorDef(T1, S.N, S.IH, S.IW, S.IC), - K=TensorDef(T2, S.OC, S.KH, S.KW, S.IC), - O=TensorDef(U, S.N, S.OH, S.OW, S.OC, output=True), +def conv_2d_nchw( + I=TensorDef(T1, S.N, S.C, S.IH, S.IW), + K=TensorDef(T2, S.F, S.C, S.KH, S.KW), + O=TensorDef(U, S.N, S.F, S.OH, S.OW, S.C, output=True), strides=AttributeDef(S.SH, S.SW), dilations=AttributeDef(S.DH, S.DW)): - """Performs a 2-D convolution. + """Performs 2-D convolution. Numeric casting is performed on the operands to the inner multiply, promoting them to the same data type as the accumulator/output. """ - domain(D.n, D.oh, D.ow, D.kh, D.kw, D.oc, D.ic) - O[D.n, D.oh, D.ow, D.oc] += cast( - U, I[D.n, - D.oh * S.SH + D.kh * S.DH, - D.ow * S.SW + D.kw * S.DW, - D.ic]) * cast(U, K[D.oc, D.kh, D.kw, D.ic]) + domain(D.n, D.f, D.oh, D.ow, D.c, D.kh, D.kw) + O[D.n, D.f, D.oh, D.ow] += cast( + U, I[D.n, D.c, D.oh * S.SH + D.kh * S.DH, D.ow * S.SW + D.kw * S.DW, + ]) * cast(U, K[D.f, D.c, D.kh, D.kw]) @linalg_structured_op -def conv_2d_input_nhwc_filter_ohwi_poly_q( - I=TensorDef(T1, S.N, S.IH, S.IW, S.IC), - K=TensorDef(T2, S.OC, S.KH, S.KW, S.IC), - IZp=ScalarDef(I32), - KZp=ScalarDef(I32), - O=TensorDef(U, S.N, S.OH, S.OW, S.OC, output=True), +def conv_2d_nhwc_hwcf( + I=TensorDef(T1, S.N, S.IH, S.IW, S.C), + K=TensorDef(T2, S.KH, S.KW, S.C, S.F), + O=TensorDef(U, S.N, S.OH, S.OW, S.F, output=True), strides=AttributeDef(S.SH, S.SW), dilations=AttributeDef(S.DH, S.DW)): - """Performs a 2-D quantized convolution. + """Performs 2-D convolution. Numeric casting is performed on the operands to the inner multiply, promoting - them to the same data type as the accumulator/output. Includes zero point - adjustment for quantization. + them to the same data type as the accumulator/output. 
""" - domain(D.n, D.oh, D.ow, D.kh, D.kw, D.oc, D.ic) - O[D.n, D.oh, D.ow, D.oc] += ((cast( - U, I[D.n, - D.oh * S.SH + D.kh * S.DH, - D.ow * S.SW + D.kw * S.DW, - D.ic]) - cast(U, IZp)) * - (cast(U, K[D.oc, D.kh, D.kw, D.ic]) - cast(U, KZp))) - + domain(D.n, D.oh, D.ow, D.f, D.kh, D.kw, D.c) + O[D.n, D.oh, D.ow, D.f] += cast( + U, I[D.n, D.oh * S.SH + D.kh * S.DH, D.ow * S.SW + D.kw * S.DW, D.c + ]) * cast(U, K[D.kh, D.kw, D.c, D.f]) @linalg_structured_op def depthwise_conv_2d_input_nhwc_filter_hwc_poly( @@ -206,24 +196,27 @@ def depthwise_conv_2d_input_nhwc_filter_hwc_poly( D.c]) * cast(U, K[D.kh, D.kw, D.c]) @linalg_structured_op -def conv_2d_nchw( - I=TensorDef(T1, S.N, S.C, S.IH, S.IW), - K=TensorDef(T2, S.F, S.C, S.KH, S.KW), - O=TensorDef(U, S.N, S.F, S.OH, S.OW, S.C, output=True), +def conv_2d_nhwc_hwcf_q( + I=TensorDef(T1, S.N, S.IH, S.IW, S.C), + K=TensorDef(T2, S.KH, S.KW, S.C, S.F), + IZp=ScalarDef(I32), + KZp=ScalarDef(I32), + O=TensorDef(U, S.N, S.OH, S.OW, S.F, output=True), strides=AttributeDef(S.SH, S.SW), dilations=AttributeDef(S.DH, S.DW)): - """Performs 2-D convolution. + """Performs 2-D convolution with zero point offsets. Numeric casting is performed on the operands to the inner multiply, promoting - them to the same data type as the accumulator/output. + them to the same data type as the accumulator/output. This includes the zero + point offsets common to quantized operations. 
""" - domain(D.n, D.f, D.oh, D.ow, D.c, D.kh, D.kw) - O[D.n, D.f, D.oh, D.ow] += cast( - U, I[D.n, D.c, D.oh * S.SH + D.kh * S.DH, D.ow * S.SW + D.kw * S.DW, - ]) * cast(U, K[D.f, D.c, D.kh, D.kw]) + domain(D.n, D.oh, D.ow, D.f, D.kh, D.kw, D.c) + O[D.n, D.oh, D.ow, D.f] += (cast( + U, I[D.n, D.oh * S.SH + D.kh * S.DH, D.ow * S.SW + D.kw * S.DW, D.c + ]) - cast(U, IZp)) * (cast(U, K[D.kh, D.kw, D.c, D.f]) - cast(U, KZp)) - -def depthwise_conv2D_nchw( #TODO: Fix name +@linalg_structured_op +def depthwise_conv2D_nchw( I=TensorDef(T1, S.N, S.IH, S.IW, S.IC), K=TensorDef(T2, S.KH, S.KW, S.IC, S.CM), O=TensorDef(U, S.N, S.OH, S.OW, S.IC, S.CM, output=True), @@ -239,8 +232,8 @@ def depthwise_conv2D_nchw( #TODO: Fix name U, I[D.n, D.oh * S.SH + D.kh * S.DH, D.ow * S.SW + D.kw * S.DW, D.ic]) * cast(U, K[D.kh, D.kw, D.ic, D.cm]) - -def depthwise_conv2D_nchw_q( #TODO: Fix name +@linalg_structured_op +def depthwise_conv2D_nchw_q( I=TensorDef(T1, S.N, S.IH, S.IW, S.IC), K=TensorDef(T2, S.KH, S.KW, S.IC, S.CM), IZp=ScalarDef(I32), diff --git a/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg.mlir b/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg.mlir index 309846d66c94..3c89de395187 100644 --- a/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg.mlir +++ b/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg.mlir @@ -1176,14 +1176,19 @@ func @avg_pool(%arg0: tensor<1x6x34x62xf32>) -> (tensor<1x5x33x62xf32>) { // ----- -// CHECK-DAG: #[[$MAP1:.*]] = affine_map<(d0, d1, d2, d3) -> (d3)> -// CHECK-DAG: #[[$MAP2:.*]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)> +// CHECK: #[[MAP0:.+]] = affine_map<(d0, d1, d2, d3) -> (d3, d0, d1, d2)> +// CHECK: #[[MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)> +// CHECK: #[[MAP2:.+]] = affine_map<(d0, d1, d2, d3) -> (d3)> -// CHECK-LABEL: @conv2d_f32 +// CHECK-LABEL @conv2d_f32 func @conv2d_f32(%input: tensor<1x49x42x27xf32>, %weights: tensor<28x3x3x27xf32>, %bias: tensor<28xf32>) -> () { - // CHECK: %[[INIT:.+]] = 
linalg.init_tensor [1, 45, 40, 28] - // CHECK: %[[BROADCAST:.+]] = linalg.generic {indexing_maps = [#[[$MAP1]], #[[$MAP2]]], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg2 : tensor<28xf32>) outs(%[[INIT]] : tensor<1x45x40x28xf32>) - // CHECK: linalg.conv_2d_input_nhwc_filter_ohwi_poly {dilations = dense<[2, 1]> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%arg0, %arg1 : tensor<1x49x42x27xf32>, tensor<28x3x3x27xf32>) outs(%[[BROADCAST]] : tensor<1x45x40x28xf32>) + // CHECK: %[[W_IN:.+]] = linalg.init_tensor [3, 3, 27, 28] + // CHECK: %[[W:.+]] = linalg.generic {indexing_maps = [#[[MAP0]], #[[MAP1]]], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg1 : tensor<28x3x3x27xf32>) outs(%[[W_IN]] : tensor<3x3x27x28xf32>) + // CHECK: linalg.yield %arg3 : f32 + // CHECK: %[[B_IN:.+]] = linalg.init_tensor [1, 45, 40, 28] + // CHECK: %[[B:.+]] = linalg.generic {indexing_maps = [#[[MAP2]], #[[MAP1]]], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg2 : tensor<28xf32>) outs(%[[B_IN]] : tensor<1x45x40x28xf32>) + // CHECK: linalg.yield %arg3 : f32 + // CHECK: %[[CONV:.+]] = linalg.conv_2d_nhwc_hwcf {dilations = dense<[2, 1]> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%arg0, %1 : tensor<1x49x42x27xf32>, tensor<3x3x27x28xf32>) outs(%[[B]] : tensor<1x45x40x28xf32>) %0 = "tosa.conv2d"(%input, %weights, %bias) {pad = [0, 0, 0, 0], stride = [1, 1], dilation = [2, 1]} : (tensor<1x49x42x27xf32>, tensor<28x3x3x27xf32>, tensor<28xf32>) -> (tensor<1x45x40x28xf32>) return } @@ -1192,26 +1197,17 @@ func @conv2d_f32(%input: tensor<1x49x42x27xf32>, %weights: tensor<28x3x3x27xf32> // CHECK-LABEL: @conv2d_padded_f32 func @conv2d_padded_f32(%input: tensor<1x47x40x28xf32>, %weights: tensor<28x3x3x28xf32>, %bias: tensor<28xf32>) -> () { - // CHECK: linalg.pad_tensor %arg0 - // CHECK: linalg.conv_2d_input_nhwc_filter_ohwi_poly </cut>
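The result shape in the conv2d_f32 test above follows from the index expression `oh * SH + kh * DH` used in the op definitions. Below is a minimal C sketch of that arithmetic, assuming no padding. `conv_out_dim` is a hypothetical helper written for illustration and is not an MLIR API.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MLIR): number of valid output positions
 * along one spatial axis of an unpadded convolution. The largest input index
 * read is (out - 1) * stride + (kernel - 1) * dilation, and it has to stay
 * below `in`. */
static long conv_out_dim(long in, long kernel, long stride, long dilation)
{
    long span = (kernel - 1) * dilation + 1; /* input extent covered by one window */

    if (in < span)
        return 0; /* the window does not fit at all */
    return (in - span) / stride + 1;
}
```

With the values from the test (49x42 input, 3x3 kernel, stride [1, 1], dilation [2, 1]), `conv_out_dim(49, 3, 1, 2)` and `conv_out_dim(42, 3, 1, 1)` give the 45 and 40 seen in `tensor<1x45x40x28xf32>`.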

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig - Build # 16 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig.  So far, this commit has regressed CI configurations:
 - tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig

Culprit:
<cut>
commit 342f43af70dbc74f8629381998f92c060e1763a2
Author: Maurizio Lombardi <mlombard(a)redhat.com>
Date:   Thu Jul 29 15:52:50 2021 +0200

    iscsi_ibft: fix crash due to KASLR physical memory remapping

    Starting with commit a799c2bd29d1
    ("x86/setup: Consolidate early memory reservations")
    memory reservations have been moved earlier during the boot process,
    before the execution of the Kernel Address Space Layout Randomization code.

    setup_arch() calls the iscsi_ibft's find_ibft_region() function
    to find and reserve the memory dedicated to the iBFT and this function
    also saves a virtual pointer to the iBFT table for later use.

    The problem is that if KALSR is active, the physical memory gets
    remapped somewhere else in the virtual address space and the pointer is
    no longer valid, this will cause a kernel panic when the iscsi driver tries
    to dereference it.

     iBFT detected.
     BUG: unable to handle page fault for address: ffff888000099fd8
     #PF: supervisor read access in kernel mode
     #PF: error_code(0x0000) - not-present page
     PGD 0 P4D 0
     Oops: 0000 [#1] SMP PTI

    ..snip..

     Call Trace:
      ? ibft_create_kobject+0x1d2/0x1d2 [iscsi_ibft]
      do_one_initcall+0x44/0x1d0
      ? kmem_cache_alloc_trace+0x119/0x220
      do_init_module+0x5c/0x270
      __do_sys_init_module+0x12e/0x1b0
      do_syscall_64+0x40/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

    Fix this bug by saving the address of the physical location
    of the ibft; later the driver will use isa_bus_to_virt() to get
    the correct virtual address.

    N.B. On each reboot KASLR randomizes the virtual addresses so
    assuming phys_to_virt before KASLR does its deed is incorrect.

    Simplify the code by renaming find_ibft_region()
    to reserve_ibft_region() and remove all the wrappers.

    Signed-off-by: Maurizio Lombardi <mlombard(a)redhat.com>
    Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad(a)kernel.org>
</cut>

Results regressed to (for first_bad == 342f43af70dbc74f8629381998f92c060e1763a2)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
19722
# First few build errors in logs:

from (for last_good == 62fb9874f5da54fdb243003b386128037319b219)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
19795
# linux build successful:
all

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-linux-342f43af70dbc74f8629381998f92c060e1763a2
cd investigate-linux-342f43af70dbc74f8629381998f92c060e1763a2

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach 342f43af70dbc74f8629381998f92c060e1763a2
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 62fb9874f5da54fdb243003b386128037319b219
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…

Full commit (up to 1000 lines):
<cut>
commit 342f43af70dbc74f8629381998f92c060e1763a2
Author: Maurizio Lombardi <mlombard(a)redhat.com>
Date:   Thu Jul 29 15:52:50 2021 +0200

    iscsi_ibft: fix crash due to KASLR physical memory remapping

    Starting with commit a799c2bd29d1
    ("x86/setup: Consolidate early memory reservations")
    memory reservations have been moved earlier during the boot process,
    before the execution of the Kernel Address Space Layout Randomization code.

    setup_arch() calls the iscsi_ibft's find_ibft_region() function
    to find and reserve the memory dedicated to the iBFT and this function
    also saves a virtual pointer to the iBFT table for later use.

    The problem is that if KALSR is active, the physical memory gets
    remapped somewhere else in the virtual address space and the pointer is
    no longer valid, this will cause a kernel panic when the iscsi driver tries
    to dereference it.

     iBFT detected.
     BUG: unable to handle page fault for address: ffff888000099fd8
     #PF: supervisor read access in kernel mode
     #PF: error_code(0x0000) - not-present page
     PGD 0 P4D 0
     Oops: 0000 [#1] SMP PTI

    ..snip..

     Call Trace:
      ? ibft_create_kobject+0x1d2/0x1d2 [iscsi_ibft]
      do_one_initcall+0x44/0x1d0
      ? kmem_cache_alloc_trace+0x119/0x220
      do_init_module+0x5c/0x270
      __do_sys_init_module+0x12e/0x1b0
      do_syscall_64+0x40/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

    Fix this bug by saving the address of the physical location
    of the ibft; later the driver will use isa_bus_to_virt() to get
    the correct virtual address.

    N.B. On each reboot KASLR randomizes the virtual addresses so
    assuming phys_to_virt before KASLR does its deed is incorrect.

    Simplify the code by renaming find_ibft_region()
    to reserve_ibft_region() and remove all the wrappers.

    Signed-off-by: Maurizio Lombardi <mlombard(a)redhat.com>
    Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad(a)kernel.org>
---
 arch/x86/kernel/setup.c            | 10 --------
 drivers/firmware/iscsi_ibft.c      | 10 +++++---
 drivers/firmware/iscsi_ibft_find.c | 48 ++++++++++++++------------------------
 include/linux/iscsi_ibft.h         | 18 ++++++--------
 4 files changed, 32 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1e720626069a..b6a62af06a9f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -571,16 +571,6 @@ void __init reserve_standard_io_resources(void)
 
 }
 
-static __init void reserve_ibft_region(void)
-{
-	unsigned long addr, size = 0;
-
-	addr = find_ibft_region(&size);
-
-	if (size)
-		memblock_reserve(addr, size);
-}
-
 static bool __init snb_gfx_workaround_needed(void)
 {
 #ifdef CONFIG_PCI
diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
index 7127a04bca19..612a59e213df 100644
--- a/drivers/firmware/iscsi_ibft.c
+++ b/drivers/firmware/iscsi_ibft.c
@@ -84,8 +84,10 @@ MODULE_DESCRIPTION("sysfs interface to BIOS iBFT information");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(IBFT_ISCSI_VERSION);
 
+static struct acpi_table_ibft *ibft_addr;
+
 #ifndef CONFIG_ISCSI_IBFT_FIND
-struct acpi_table_ibft *ibft_addr;
+phys_addr_t ibft_phys_addr;
 #endif
 
 struct ibft_hdr {
@@ -858,11 +860,13 @@ static int __init ibft_init(void)
 	int rc = 0;
 
 	/*
-	   As on UEFI systems the setup_arch()/find_ibft_region()
+	   As on UEFI systems the setup_arch()/reserve_ibft_region()
 	   is called before ACPI tables are parsed and it only does
 	   legacy finding.
 	*/
-	if (!ibft_addr)
+	if (ibft_phys_addr)
+		ibft_addr = isa_bus_to_virt(ibft_phys_addr);
+	else
 		acpi_find_ibft_region();
 
 	if (ibft_addr) {
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb94523281..a0594590847d 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -31,8 +31,8 @@
 /*
  * Physical location of iSCSI Boot Format Table.
  */
-struct acpi_table_ibft *ibft_addr;
-EXPORT_SYMBOL_GPL(ibft_addr);
+phys_addr_t ibft_phys_addr;
+EXPORT_SYMBOL_GPL(ibft_phys_addr);
 
 static const struct {
 	char *sign;
@@ -47,13 +47,24 @@ static const struct {
 #define VGA_MEM 0xA0000 /* VGA buffer */
 #define VGA_SIZE 0x20000 /* 128kB */
 
-static int __init find_ibft_in_mem(void)
+/*
+ * Routine used to find and reserve the iSCSI Boot Format Table
+ */
+void __init reserve_ibft_region(void)
 {
 	unsigned long pos;
 	unsigned int len = 0;
 	void *virt;
 	int i;
 
+	ibft_phys_addr = 0;
+
+	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
+	 * only use ACPI for this
+	 */
+	if (efi_enabled(EFI_BOOT))
+		return;
+
 	for (pos = IBFT_START; pos < IBFT_END; pos += 16) {
 		/* The table can't be inside the VGA BIOS reserved space,
 		 * so skip that area */
@@ -70,35 +81,12 @@ static int __init find_ibft_in_mem(void)
 				/* if the length of the table extends past 1M,
 				 * the table cannot be valid. */
 				if (pos + len <= (IBFT_END-1)) {
-					ibft_addr = (struct acpi_table_ibft *)virt;
-					pr_info("iBFT found at 0x%lx.\n", pos);
-					goto done;
+					ibft_phys_addr = pos;
+					memblock_reserve(ibft_phys_addr, PAGE_ALIGN(len));
+					pr_info("iBFT found at 0x%lx.\n", ibft_phys_addr);
+					return;
 				}
 			}
 		}
 	}
-done:
-	return len;
-}
-/*
- * Routine used to find the iSCSI Boot Format Table. The logical
- * kernel address is set in the ibft_addr global variable.
- */
-unsigned long __init find_ibft_region(unsigned long *sizep)
-{
-	ibft_addr = NULL;
-
-	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
-	 * only use ACPI for this */
-
-	if (!efi_enabled(EFI_BOOT))
-		find_ibft_in_mem();
-
-	if (ibft_addr) {
-		*sizep = PAGE_ALIGN(ibft_addr->header.length);
-		return (u64)virt_to_phys(ibft_addr);
-	}
-
-	*sizep = 0;
-	return 0;
 }
diff --git a/include/linux/iscsi_ibft.h b/include/linux/iscsi_ibft.h
index b7b45ca82bea..790e7fcfc1a6 100644
--- a/include/linux/iscsi_ibft.h
+++ b/include/linux/iscsi_ibft.h
@@ -13,26 +13,22 @@
 #ifndef ISCSI_IBFT_H
 #define ISCSI_IBFT_H
 
-#include <linux/acpi.h>
+#include <linux/types.h>
 
 /*
- * Logical location of iSCSI Boot Format Table.
- * If the value is NULL there is no iBFT on the machine.
+ * Physical location of iSCSI Boot Format Table.
+ * If the value is 0 there is no iBFT on the machine.
  */
-extern struct acpi_table_ibft *ibft_addr;
+extern phys_addr_t ibft_phys_addr;
 
 /*
  * Routine used to find and reserve the iSCSI Boot Format Table. The
- * mapped address is set in the ibft_addr variable.
+ * physical address is set in the ibft_phys_addr variable.
  */
 #ifdef CONFIG_ISCSI_IBFT_FIND
-unsigned long find_ibft_region(unsigned long *sizep);
+void reserve_ibft_region(void);
 #else
-static inline unsigned long find_ibft_region(unsigned long *sizep)
-{
-	*sizep = 0;
-	return 0;
-}
+static inline void reserve_ibft_region(void) {}
 #endif
 
 #endif /* ISCSI_IBFT_H */
</cut>
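The idea behind the fix (remember where the table lives physically, and translate to a virtual address only when it is used) can be illustrated with a small userspace analogy. This is only a sketch under stated assumptions: `struct mapping`, `toy_phys_to_virt` and `find_table_phys` are hypothetical names, the "physical memory" is just a byte buffer, and none of this is kernel code.

```c
#include <stddef.h>
#include <string.h>

#define SCAN_STEP 16 /* the kernel loop in the diff also advances 16 bytes at a time */

/* Hypothetical stand-in for the kernel's mapping of physical memory: `base`
 * plays the role of the virtual address at which physical address 0 is
 * visible. A remap (as KASLR does) changes `base`, not the offsets. */
struct mapping {
    unsigned char *base;
};

/* Toy translation, loosely analogous to isa_bus_to_virt() in the patch. */
static void *toy_phys_to_virt(const struct mapping *m, size_t phys)
{
    return m->base + phys;
}

/* Scan [start, end) for a 4-byte signature and return its "physical" offset,
 * or 0 when nothing is found (mirroring ibft_phys_addr == 0 meaning "no iBFT").
 * Callers are assumed to pass start > 0 so that 0 stays unambiguous. */
static size_t find_table_phys(const struct mapping *m, size_t start, size_t end,
                              const char *sign)
{
    for (size_t pos = start; pos + 4 <= end; pos += SCAN_STEP)
        if (memcmp(toy_phys_to_virt(m, pos), sign, 4) == 0)
            return pos;
    return 0;
}
```

A pointer obtained from the early mapping would be stale once the base changes, whereas the offset returned by `find_table_phys` can be passed to `toy_phys_to_virt` with whichever mapping is current at the time of use.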

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O3 - Build # 20 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3.  So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

Culprit:
<cut>
commit 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date:   Sun Aug 22 13:46:52 2021 -0500

    [scudo][standalone] Link tests against libatomic if libatomic exists

    It is possible that libatomic does not exist on some systems. This patch updates
    the scudo standalone tests to link against libatomic if the library exists.

    This is an update to the original patch: https://reviews.llvm.org/D64134 and
    aims to resolve https://bugs.llvm.org/show_bug.cgi?id=51431.

    Differential Revision: https://reviews.llvm.org/D108503
</cut>

Results regressed to (for first_bad == 4cd8dd3fe05e099792e1494dedd074eb5ba289b6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-4cd8dd3fe05e099792e1494dedd074eb5ba289b6/results_id:
1
# 447.dealII,dealII_base.default regressed by 103

from (for last_good == d8d84c9df82fc114f2b22a533a8183065ca1a2e0)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-d8d84c9df82fc114f2b22a533a8183065ca1a2e0/results_id:
1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4515
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4510
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-4cd8dd3fe05e099792e1494dedd074eb5ba289b6
cd investigate-llvm-4cd8dd3fe05e099792e1494dedd074eb5ba289b6

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach d8d84c9df82fc114f2b22a533a8183065ca1a2e0
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date:   Sun Aug 22 13:46:52 2021 -0500

    [scudo][standalone] Link tests against libatomic if libatomic exists

    It is possible that libatomic does not exist on some systems. This patch updates
    the scudo standalone tests to link against libatomic if the library exists.

    This is an update to the original patch: https://reviews.llvm.org/D64134 and
    aims to resolve https://bugs.llvm.org/show_bug.cgi?id=51431.

    Differential Revision: https://reviews.llvm.org/D108503
---
 compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt b/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
index f4186eba1688..eaa47a04a179 100644
--- a/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
+++ b/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
@@ -39,7 +39,10 @@ foreach(lib ${SANITIZER_TEST_CXX_LIBRARIES})
 endforeach()
 list(APPEND LINK_FLAGS -pthread)
 # Linking against libatomic is required with some compilers
-list(APPEND LINK_FLAGS -latomic)
+check_library_exists(atomic __atomic_load_8 "" COMPILER_RT_HAS_LIBATOMIC)
+if (COMPILER_RT_HAS_LIBATOMIC)
+  list(APPEND LINK_FLAGS -latomic)
+endif()
 
 set(SCUDO_TEST_HEADERS
   scudo_unit_test.h
</cut>

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_apm/gnu-release-aarch64-spec2k6-Os_LTO - Build # 3 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_apm/gnu-release-aarch64-spec2k6-Os_LTO. So far, this commit has regressed CI configurations: - tcwg_bmk_gnu_apm/gnu-release-aarch64-spec2k6-Os_LTO Culprit: <cut> commit ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0 Author: Richard Biener <rguenther(a)suse.de> Date: Tue Aug 17 08:38:35 2021 +0200 tree-optimization/101868 - avoid PRE of trapping mems across calls This backports a fix for the omission of a check of trapping mems when hoisting them across calls that might not return. This was originally done as part of a fix to handle const functions that throw properly. 2021-08-17 Richard Biener <rguenther(a)suse.de> PR tree-optimization/101373 PR tree-optimization/101868 * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping references when the BB may not return. * gcc.dg/lto/pr101868_0.c: New testcase. * gcc.dg/lto/pr101868_1.c: Likewise. * gcc.dg/lto/pr101868_2.c: Likewise. * gcc.dg/lto/pr101868_3.c: Likewise. 
</cut> Results regressed to (for first_bad == ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # true: 0 # benchmark -- -Os_LTO artifacts/build-ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0/results_id: 1 # 450.soplex,soplex_base.default regressed by 102 from (for last_good == a0a0499b8bb920fdd98e791804812f001f0b4fe8) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # true: 0 # benchmark -- -Os_LTO artifacts/build-a0a0499b8bb920fdd98e791804812f001f0b4fe8/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-release-a… Results ID of last_good: apm_64/tcwg_bmk_gnu_apm/bisect-gnu-release-aarch64-spec2k6-Os_LTO/4497 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-release-a… Results ID of first_bad: apm_64/tcwg_bmk_gnu_apm/bisect-gnu-release-aarch64-spec2k6-Os_LTO/4482 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-release-a… Configuration details: Reproduce builds: <cut> mkdir investigate-gcc-ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0 cd investigate-gcc-ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-release-a… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-release-a… --fail curl -o artifacts/test.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-release-a… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0 ../artifacts/test.sh # Reproduce last_good build git checkout --detach a0a0499b8bb920fdd98e791804812f001f0b4fe8 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-release-a… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-release-a… Full commit (up to 1000 lines): <cut> commit ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0 Author: Richard Biener <rguenther(a)suse.de> Date: Tue Aug 17 08:38:35 2021 +0200 tree-optimization/101868 - avoid PRE of trapping mems across calls This backports a fix for the omission of a check of trapping mems when hoisting them across calls that might not return. This was originally done as part of a fix to handle const functions that throw properly. 2021-08-17 Richard Biener <rguenther(a)suse.de> PR tree-optimization/101373 PR tree-optimization/101868 * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping references when the BB may not return. * gcc.dg/lto/pr101868_0.c: New testcase. * gcc.dg/lto/pr101868_1.c: Likewise. * gcc.dg/lto/pr101868_2.c: Likewise. * gcc.dg/lto/pr101868_3.c: Likewise. 
--- gcc/testsuite/gcc.dg/lto/pr101868_0.c | 33 +++++++++++++++++++++++++++++++++ gcc/testsuite/gcc.dg/lto/pr101868_1.c | 23 +++++++++++++++++++++++ gcc/testsuite/gcc.dg/lto/pr101868_2.c | 11 +++++++++++ gcc/testsuite/gcc.dg/lto/pr101868_3.c | 8 ++++++++ gcc/tree-ssa-pre.c | 7 +++++++ 5 files changed, 82 insertions(+) diff --git a/gcc/testsuite/gcc.dg/lto/pr101868_0.c b/gcc/testsuite/gcc.dg/lto/pr101868_0.c new file mode 100644 index 00000000000..c84d19b0267 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr101868_0.c @@ -0,0 +1,33 @@ +/* { dg-lto-do run } */ +/* { dg-lto-options { "-O2 -fno-strict-aliasing -flto" } } */ + +typedef unsigned long VALUE; + +__attribute__ ((cold)) +void rb_check_type(VALUE, int); + +static VALUE +repro(VALUE dummy, VALUE hash) +{ + if (hash == 0) { + rb_check_type(hash, 1); + } + else if (*(long *)hash) { + rb_check_type(hash, 1); + } + + + return *(long *)hash; +} + +static VALUE (*that)(VALUE dummy, VALUE hash) = repro; + +int +main(int argc, char **argv) +{ + argc--; + that(0, argc); + + rb_check_type(argc, argc); + +} diff --git a/gcc/testsuite/gcc.dg/lto/pr101868_1.c b/gcc/testsuite/gcc.dg/lto/pr101868_1.c new file mode 100644 index 00000000000..146c14abc76 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr101868_1.c @@ -0,0 +1,23 @@ +typedef unsigned long VALUE; + + +__attribute__ ((noreturn)) void rexc_raise(VALUE mesg); + +VALUE rb_donothing(VALUE klass); + +static void +funexpected_type(VALUE x, int xt, int t) +{ + rexc_raise(rb_donothing(0)); +} + +__attribute__ ((cold)) +void +rb_check_type(VALUE x, int t) +{ + int xt; + + if (x == 0) { + funexpected_type(x, xt, t); + } +} diff --git a/gcc/testsuite/gcc.dg/lto/pr101868_2.c b/gcc/testsuite/gcc.dg/lto/pr101868_2.c new file mode 100644 index 00000000000..e6f01b23f45 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr101868_2.c @@ -0,0 +1,11 @@ +typedef unsigned long VALUE; + +static void thing(void) {} +static void (*ptr)(void) = &thing; + +VALUE +rb_donothing(VALUE klass) +{ + ptr(); + 
return 0; +} diff --git a/gcc/testsuite/gcc.dg/lto/pr101868_3.c b/gcc/testsuite/gcc.dg/lto/pr101868_3.c new file mode 100644 index 00000000000..61217625be7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr101868_3.c @@ -0,0 +1,8 @@ +typedef unsigned long VALUE; + +__attribute__((noreturn)) +void +rexc_raise(VALUE mesg) +{ + __builtin_exit(0); +} diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index 04ec4fbaeec..2aedc31e1d7 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -2070,6 +2070,13 @@ prune_clobbered_mems (bitmap_set_t set, basic_block block) && value_dies_in_block_x (expr, block)))) to_remove = i; } + /* If the REFERENCE may trap make sure the block does not contain + a possible exit point. + ??? This is overly conservative if we translate AVAIL_OUT + as the available expression might be after the exit point. */ + if (BB_MAY_NOTRETURN (block) + && vn_reference_may_trap (ref)) + to_remove = i; } else if (expr->kind == NARY) { </cut>
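
The comment added to prune_clobbered_mems above can be illustrated with a small model. This is only a sketch in Python, not GCC code: the names `Trap`, `ProgramExit`, `load`, `check_type`, `original`, `hoisted` and `outcome` are all hypothetical, and exceptions stand in for a hardware trap and for a call that does not return. It shows why a possibly trapping reference should not be treated as movable across a possible exit point.

```python
class Trap(Exception):
    """Stands in for a faulting memory access (assumption of this model)."""


class ProgramExit(Exception):
    """Stands in for a call that never returns, e.g. one that exits."""


def load(memory, addr):
    # A reference that "may trap": it faults when the address is not mapped.
    if addr not in memory:
        raise Trap()
    return memory[addr]


def check_type(x):
    # Possible exit point: leaves the program instead of returning when x == 0.
    if x == 0:
        raise ProgramExit()


def original(memory, addr):
    # Source order: the possible exit comes first, the load second.
    check_type(addr)
    return load(memory, addr)


def hoisted(memory, addr):
    # What an unsafe transformation would produce: the load is moved
    # above the possible exit point.
    value = load(memory, addr)
    check_type(addr)
    return value


def outcome(fn, memory, addr):
    # Classify how a run ends: clean exit, trap, or a returned value.
    try:
        return fn(memory, addr)
    except ProgramExit:
        return "exit"
    except Trap:
        return "trap"
```

With an unmapped address 0, `original` ends in a clean exit while `hoisted` traps, which is the kind of behaviour change the patch guards against; for a mapped, non-zero address both versions return the same value.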

4 years, 9 months


[ACTIVITY] report week ending 3 Sep

by Peter Maydell

Progress (short week, 2 days)

* UM-2 [QEMU upstream maintainership]
  + Lots of code review and getting things upstream after trunk reopened
  + Wrote up a first draft of how to handle merging pullreqs, so that other people can share this job with me
  + Sent a patchset that allows board models to mark some buses as not suitable for plugging in user-created devices -- this avoids problems with i2c devices appearing on buses that are supposed to be for on-board devices only in the MPS2/MPS3 machines
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
  + MVE is now enabled upstream. (There are still some loose ends to do under this JIRA task, though.)
  + Sent a patchset that makes some of the easier codegen optimizations for the no-predication case. (Code review spotted an issue which might be painful to sort out -- we'll see next week...)

-- PMM

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_apm/llvm-master-arm-spec2k6-Oz - Build # 4 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *binutils* in CI configuration tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz Culprit: <cut> commit 590d3faada8a12bf0937bbf68413956dc6a339a9 Author: Tom de Vries <tdevries(a)suse.de> Date: Mon Aug 30 10:30:26 2021 +0200 [gdb/testsuite] Improve argument syntax of proc arange The current syntax of proc arange is: ... proc arange { arange_start arange_length {comment ""} {seg_sel ""} } { ... and a typical call looks like: ... arange $start $len ... This style is somewhat annoying because if you want to specify the last parameter, you need to give the default values of all the other optional ones before as well: ... arange $start $len "" $seg_sel ... Update the syntax to: ... proc arange { options arange_start arange_length } { parse_options { { comment "" } { seg_sel "" } } ... such that a typical call looks like: ... arange {} $start $len ... and a call using seg_sel looks like: ... arange { seg_sel $seg_sel } $start $len ... Also update proc aranges, which already has an options argument, to use the new proc parse_options. Tested on x86_64-linux. Co-Authored-By: Simon Marchi <simon.marchi(a)polymtl.ca> </cut> Results regressed to (for first_bad == 590d3faada8a12bf0937bbf68413956dc6a339a9) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -Oz_mthumb artifacts/build-590d3faada8a12bf0937bbf68413956dc6a339a9/results_id: 1 # 447.dealII,[.] 
contract<3> regressed by 200 from (for last_good == cb03dd22b36b7bd21a81137005ec42dab8355b62) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -Oz_mthumb artifacts/build-cb03dd22b36b7bd21a81137005ec42dab8355b62/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Results ID of last_good: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-master-arm-spec2k6-Oz/4418 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Results ID of first_bad: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-master-arm-spec2k6-Oz/4431 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Configuration details: Reproduce builds: <cut> mkdir investigate-binutils-590d3faada8a12bf0937bbf68413956dc6a339a9 cd investigate-binutils-590d3faada8a12bf0937bbf68413956dc6a339a9 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude 
/bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/ cd binutils # Reproduce first_bad build git checkout --detach 590d3faada8a12bf0937bbf68413956dc6a339a9 ../artifacts/test.sh # Reproduce last_good build git checkout --detach cb03dd22b36b7bd21a81137005ec42dab8355b62 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Full commit (up to 1000 lines): <cut> commit 590d3faada8a12bf0937bbf68413956dc6a339a9 Author: Tom de Vries <tdevries(a)suse.de> Date: Mon Aug 30 10:30:26 2021 +0200 [gdb/testsuite] Improve argument syntax of proc arange The current syntax of proc arange is: ... proc arange { arange_start arange_length {comment ""} {seg_sel ""} } { ... and a typical call looks like: ... arange $start $len ... This style is somewhat annoying because if you want to specify the last parameter, you need to give the default values of all the other optional ones before as well: ... arange $start $len "" $seg_sel ... Update the syntax to: ... proc arange { options arange_start arange_length } { parse_options { { comment "" } { seg_sel "" } } ... such that a typical call looks like: ... arange {} $start $len ... and a call using seg_sel looks like: ... arange { seg_sel $seg_sel } $start $len ... Also update proc aranges, which already has an options argument, to use the new proc parse_options. Tested on x86_64-linux. 
Co-Authored-By: Simon Marchi <simon.marchi(a)polymtl.ca> --- gdb/testsuite/gdb.dlang/watch-loc.exp | 2 +- gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp | 6 +- .../gdb.dwarf2/frame-inlined-in-outer-frame.exp | 2 +- .../template-specification-full-name.exp | 2 +- gdb/testsuite/gdb.testsuite/parse_options_args.exp | 59 ++++++++++++ gdb/testsuite/lib/dwarf.exp | 31 +++--- gdb/testsuite/lib/gdb.exp | 104 ++++++++++++++------- 7 files changed, 150 insertions(+), 56 deletions(-) diff --git a/gdb/testsuite/gdb.dlang/watch-loc.exp b/gdb/testsuite/gdb.dlang/watch-loc.exp index 6e8b26e3109..e13400ed479 100644 --- a/gdb/testsuite/gdb.dlang/watch-loc.exp +++ b/gdb/testsuite/gdb.dlang/watch-loc.exp @@ -68,7 +68,7 @@ Dwarf::assemble $asm_file { } aranges {} cu_start { - arange $dmain_start $dmain_length + arange {} $dmain_start $dmain_length } } diff --git a/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp b/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp index e65b4c8610a..d55b7fd150e 100644 --- a/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp +++ b/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp @@ -125,9 +125,9 @@ Dwarf::assemble $asm_file { } aranges {} cu_label { - arange [lindex $main_func 0] [lindex $main_func 1] - arange [lindex $frame2_func 0] [lindex $frame2_func 1] - arange [lindex $frame3_func 0] [lindex $frame3_func 1] + arange {} [lindex $main_func 0] [lindex $main_func 1] + arange {} [lindex $frame2_func 0] [lindex $frame2_func 1] + arange {} [lindex $frame3_func 0] [lindex $frame3_func 1] } } diff --git a/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp b/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp index ff12cd79f19..f95558dffef 100644 --- a/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp +++ b/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp @@ -95,7 +95,7 @@ Dwarf::assemble $dwarf_asm { } aranges {} cu_label { - arange __cu_low_pc __cu_high_pc + arange {} __cu_low_pc __cu_high_pc } } diff --git 
a/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp b/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp index 5c59777e1b6..6e736f2c8ef 100644 --- a/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp +++ b/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp @@ -69,7 +69,7 @@ Dwarf::assemble $asm_file { } aranges {} cu_start { - arange "$main_start" "$main_length" + arange {} "$main_start" "$main_length" } } diff --git a/gdb/testsuite/gdb.testsuite/parse_options_args.exp b/gdb/testsuite/gdb.testsuite/parse_options_args.exp new file mode 100644 index 00000000000..ce14fc3cd7c --- /dev/null +++ b/gdb/testsuite/gdb.testsuite/parse_options_args.exp @@ -0,0 +1,59 @@ +# Copyright 2021 Free Software Foundation, Inc. +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# Testsuite self-tests for parse_options and parse_args. 
+ +with_test_prefix parse_options { + proc test1 { options a b } { + set v2 "defval2" + parse_options { + { opt1 defval1 } + { opt2 $v2 } + { opt3 } + { opt4 } + } + + gdb_assert { [string equal $a "vala"] } + gdb_assert { [string equal $b "valb"] } + gdb_assert { [string equal $opt1 "val1"] } + gdb_assert { [string equal $opt2 "defval2"] } + gdb_assert { $opt3 == 1 } + gdb_assert { $opt4 == 0 } + } + + set v1 "val1" + test1 { opt1 $v1 opt3 } "vala" "valb" +} + +with_test_prefix parse_args { + proc test2 { args } { + parse_args { + { opt1 defval1 } + { opt2 defval2 } + { opt3 } + { opt4 } + } + gdb_assert { [llength $args] == 2 } + lassign $args a b + gdb_assert { [string equal $a "vala"] } + gdb_assert { [string equal $b "valb"] } + gdb_assert { [string equal $opt1 "val1"] } + gdb_assert { [string equal $opt2 "defval2"] } + gdb_assert { $opt3 == 1 } + gdb_assert { $opt4 == 0 } + } + + set v1 "val1" + test2 -opt1 $v1 -opt3 "vala" "valb" +} diff --git a/gdb/testsuite/lib/dwarf.exp b/gdb/testsuite/lib/dwarf.exp index 120fa418201..7fb3561a443 100644 --- a/gdb/testsuite/lib/dwarf.exp +++ b/gdb/testsuite/lib/dwarf.exp @@ -2212,7 +2212,12 @@ namespace eval Dwarf { # Emit a DWARF .debug_aranges entry. - proc arange { arange_start arange_length {comment ""} {seg_sel ""} } { + proc arange { options arange_start arange_length } { + parse_options { + { comment "" } + { seg_sel "" } + } + if { $comment != "" } { # Wrap set comment " ($comment)" @@ -2270,22 +2275,14 @@ namespace eval Dwarf { variable _addr_size variable _seg_size - # Establish the defaults. - set is_64 0 - set cu_is_64 0 - set section_version 2 - set _seg_size 0 - # Handle options. 
- foreach { name value } $options { - switch -exact -- $name { - is_64 { set is_64 $value } - cu_is_64 { set cu_is_64 $value } - section_version {set section_version $value } - seg_size { set _seg_size $value } - default { error "unknown option $name" } - } + parse_options { + { is_64 0 } + { cu_is_64 0 } + { section_version 2 } + { seg_size 0 } } + set _seg_size $seg_size if { [is_64_target] } { set _addr_size 8 @@ -2354,9 +2351,9 @@ namespace eval Dwarf { # Terminator tuple. set comment "Terminator" if { $_seg_size == 0 } { - arange 0 0 $comment + arange {comment $comment} 0 0 } else { - arange 0 0 $comment 0 + arange {comment $comment seg_sel 0} 0 0 } # End label. diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp index 093392709b4..3aea7baaab0 100644 --- a/gdb/testsuite/lib/gdb.exp +++ b/gdb/testsuite/lib/gdb.exp @@ -7293,8 +7293,8 @@ proc using_fission { } { return [regexp -- "-gsplit-dwarf" $debug_flags] } -# Search the caller's ARGS list and set variables according to the list of -# valid options described by ARGSET. +# Search LISTNAME in uplevel LEVEL caller and set variables according to the +# list of valid options with prefix PREFIX described by ARGSET. # # The first member of each one- or two-element list in ARGSET defines the # name of a variable that will be added to the caller's scope. @@ -7305,13 +7305,15 @@ proc using_fission { } { # # If two elements are given, the second element is the default value of # the variable. This is then overwritten if the option exists in ARGS. +# If EVAL, then subst is called on the value, which allows variables +# to be used. # # Any parse_args elements in (the caller's) ARGS will be removed, leaving # any optional components. - +# # Example: # proc myproc {foo args} { -# parse_args {{bar} {baz "abc"} {qux}} +# parse_list args 1 {{bar} {baz "abc"} {qux}} "-" false # # ... 
# } # myproc ABC -bar -baz DEF peanut butter @@ -7319,43 +7321,79 @@ proc using_fission { } { # foo (=ABC), bar (=1), baz (=DEF), and qux (=0) # args will be the list {peanut butter} -proc parse_args { argset } { - upvar args args +proc parse_list { level listname argset prefix eval } { + upvar $level $listname args foreach argument $argset { - if {[llength $argument] == 1} { - # No default specified, so we assume that we should set - # the value to 1 if the arg is present and 0 if it's not. - # It is assumed that no value is given with the argument. - set result [lsearch -exact $args "-$argument"] - if {$result != -1} then { - uplevel 1 [list set $argument 1] - set args [lreplace $args $result $result] - } else { - uplevel 1 [list set $argument 0] - } - } elseif {[llength $argument] == 2} { - # There are two items in the argument. The second is a - # default value to use if the item is not present. - # Otherwise, the variable is set to whatever is provided - # after the item in the args. - set arg [lindex $argument 0] - set result [lsearch -exact $args "-[lindex $arg 0]"] - if {$result != -1} then { - uplevel 1 [list set $arg [lindex $args [expr $result+1]]] - set args [lreplace $args $result [expr $result+1]] - } else { - uplevel 1 [list set $arg [lindex $argument 1]] - } - } else { - error "Badly formatted argument \"$argument\" in argument set" - } + if {[llength $argument] == 1} { + # Normalize argument, strip leading/trailing whitespace. + # Allows us to treat {foo} and { foo } the same. + set argument [string trim $argument] + + # No default specified, so we assume that we should set + # the value to 1 if the arg is present and 0 if it's not. + # It is assumed that no value is given with the argument. 
+ set pattern "$prefix$argument" + set result [lsearch -exact $args $pattern] + + if {$result != -1} then { + set value 1 + set args [lreplace $args $result $result] + } else { + set value 0 + } + uplevel $level [list set $argument $value] + } elseif {[llength $argument] == 2} { + # There are two items in the argument. The second is a + # default value to use if the item is not present. + # Otherwise, the variable is set to whatever is provided + # after the item in the args. + set arg [lindex $argument 0] + set pattern "$prefix[lindex $arg 0]" + set result [lsearch -exact $args $pattern] + + if {$result != -1} then { + set value [lindex $args [expr $result+1]] + if { $eval } { + set value [uplevel [expr $level + 1] [list subst $value]] + } + set args [lreplace $args $result [expr $result+1]] + } else { + set value [lindex $argument 1] + if { $eval } { + set value [uplevel $level [list subst $value]] + } + } + uplevel $level [list set $arg $value] + } else { + error "Badly formatted argument \"$argument\" in argument set" + } } +} + +# Search the caller's args variable and set variables according to the list of +# valid options described by ARGSET. + +proc parse_args { argset } { + parse_list 2 args $argset "-" false # The remaining args should be checked to see that they match the # number of items expected to be passed into the procedure... } +# Process the caller's options variable and set variables according +# to the list of valid options described by OPTIONSET. + +proc parse_options { optionset } { + parse_list 2 options $optionset "" true + + # Require no remaining options. + upvar 1 options options + if { [llength $options] != 0 } { + error "Options left unparsed: $options" + } +} + # Capture the output of COMMAND in a string ignoring PREFIX (a regexp); # return that string. </cut>
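
The option handling described in the commit message above can be modelled in a few lines. This is a minimal sketch in Python rather than the Tcl used by the gdb testsuite, and it is not the testsuite's implementation: the function name `parse_options`, the tuple-based option set and the returned dictionary are assumptions made for illustration (the real proc sets variables in the caller's scope and also runs `subst` on values, which is not modelled here).

```python
def parse_options(options, optionset):
    """Model of the behaviour described for the Tcl proc parse_options.

    options:   flat list such as ["opt1", "val1", "opt3"]
    optionset: list of tuples; ("name",) is a flag that becomes 1 or 0,
               ("name", default) is an option that takes a value.
    Returns a dict of option values (hypothetical interface).
    """
    remaining = list(options)
    result = {}
    for spec in optionset:
        name = spec[0]
        if len(spec) == 1:
            # Flag: 1 if present, 0 otherwise; no value follows it.
            if name in remaining:
                remaining.remove(name)
                result[name] = 1
            else:
                result[name] = 0
        elif len(spec) == 2:
            # Valued option: take the element after the name, else the default.
            if name in remaining:
                idx = remaining.index(name)
                # A missing value becomes "" here, similar to Tcl's lindex.
                has_value = idx + 1 < len(remaining)
                result[name] = remaining[idx + 1] if has_value else ""
                del remaining[idx:idx + 2]
            else:
                result[name] = spec[1]
        else:
            raise ValueError(f"Badly formatted argument {spec!r} in argument set")
    # Like the new Tcl proc, refuse anything that was not consumed.
    if remaining:
        raise ValueError(f"Options left unparsed: {remaining}")
    return result
```

Called with the same inputs as the `test1` self-test in the patch, `parse_options(["opt1", "val1", "opt3"], [("opt1", "defval1"), ("opt2", "defval2"), ("opt3",), ("opt4",)])` yields opt1 = "val1", opt2 = "defval2", opt3 = 1 and opt4 = 0.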

4 years, 9 months


[ACTIVITY] week ending 29 Aug 2021

by Richard Henderson

(PSA: On holiday through 11 September.)

[ UM-2 ]
* Some patch review
* Revise riscv tcg_constant cleanup
* Cleanup tcg/optimize.c
* Optimize repeat sign-extensions.

r~

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_gcc_bootstrap/master-arm-bootstrap_O3 - Build # 2 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap_O3. So far, this commit has regressed CI configurations: - tcwg_gcc_bootstrap/master-arm-bootstrap_O3 Culprit: <cut> commit cad36f38576a6a781e3c62ab061c68f5b8dab13a Author: Roger Sayle <roger(a)nextmovesoftware.com> Date: Tue Aug 31 11:45:07 2021 +0100 Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))). SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg is correctly zero-extended or sign-extended in the parent register. For example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the byte x is zero extended in reg:SI 23, which is useful for optimization. An example is that zero extending the above QImode value to HImode can simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0). This patch addresses the oversight/missed optimization opportunity that the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P annotation as its value is guaranteed to be correctly extended in the SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing from one or two (precisely three) places that (accidentally) strip it. Whilst there I also added another optimization. If we need to extend the above QImode value beyond the SImode register holding it, say to DImode, we can eliminate the SUBREG and simply extend from the SImode register to DImode. 2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com> gcc/ChangeLog * expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. * simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical. 
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg would be paradoxical. </cut> Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # First few build errors in logs: from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe bootstrap_O3: 2 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… Configuration details: Reproduce builds: <cut> mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a ../artifacts/test.sh # Reproduce last_good build git 
checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… Full commit (up to 1000 lines): <cut> commit cad36f38576a6a781e3c62ab061c68f5b8dab13a Author: Roger Sayle <roger(a)nextmovesoftware.com> Date: Tue Aug 31 11:45:07 2021 +0100 Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))). SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg is correctly zero-extended or sign-extended in the parent register. For example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the byte x is zero extended in reg:SI 23, which is useful for optimization. An example is that zero extending the above QImode value to HImode can simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0). This patch addresses the oversight/missed optimization opportunity that the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P annotation as its value is guaranteed to be correctly extended in the SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing from one or two (precisely three) places that (accidentally) strip it. Whilst there I also added another optimization. If we need to extend the above QImode value beyond the SImode register holding it, say to DImode, we can eliminate the SUBREG and simply extend from the SImode register to DImode. 2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com> gcc/ChangeLog * expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. 
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical. [ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg would be paradoxical. --- gcc/expr.c | 19 ++++++++++++++++++- gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++---------- 2 files changed, 60 insertions(+), 11 deletions(-) diff --git a/gcc/expr.c b/gcc/expr.c index 096c0315ecc..5dd98a9bccc 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp) && (GET_MODE_PRECISION (subreg_promoted_mode (x)) >= GET_MODE_PRECISION (int_mode)) && SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp)) - x = gen_lowpart (int_mode, SUBREG_REG (x)); + { + scalar_int_mode int_orig_mode; + machine_mode orig_mode = GET_MODE (x); + x = gen_lowpart (int_mode, SUBREG_REG (x)); + + /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than + the original mode, but narrower than the inner mode. */ + if (GET_CODE (x) == SUBREG + && GET_MODE_PRECISION (subreg_promoted_mode (x)) + > GET_MODE_PRECISION (int_mode) + && is_a <scalar_int_mode> (orig_mode, &int_orig_mode) + && GET_MODE_PRECISION (int_mode) + > GET_MODE_PRECISION (int_orig_mode)) + { + SUBREG_PROMOTED_VAR_P (x) = 1; + SUBREG_PROMOTED_SET (x, unsignedp); + } + } if (GET_MODE (x) != VOIDmode) oldmode = GET_MODE (x); diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index e431e0c19d7..ebad5cb5a79 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, target mode is the same as the variable's promotion. 
*/ if (GET_CODE (op) == SUBREG && SUBREG_PROMOTED_VAR_P (op) - && SUBREG_PROMOTED_SIGNED_P (op) - && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op)))) + && SUBREG_PROMOTED_SIGNED_P (op)) { - temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op)); - if (temp) - return temp; + rtx subreg = SUBREG_REG (op); + machine_mode subreg_mode = GET_MODE (subreg); + if (!paradoxical_subreg_p (mode, subreg_mode)) + { + temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg); + if (temp) + { + /* Preserve SUBREG_PROMOTED_VAR_P. */ + if (partial_subreg_p (temp)) + { + SUBREG_PROMOTED_VAR_P (temp) = 1; + SUBREG_PROMOTED_SET (temp, 1); + } + return temp; + } + } + else + /* Sign-extending a sign-extended subreg. */ + return simplify_gen_unary (SIGN_EXTEND, mode, + subreg, subreg_mode); } /* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>). @@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, target mode is the same as the variable's promotion. */ if (GET_CODE (op) == SUBREG && SUBREG_PROMOTED_VAR_P (op) - && SUBREG_PROMOTED_UNSIGNED_P (op) - && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op)))) + && SUBREG_PROMOTED_UNSIGNED_P (op)) { - temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op)); - if (temp) - return temp; + rtx subreg = SUBREG_REG (op); + machine_mode subreg_mode = GET_MODE (subreg); + if (!paradoxical_subreg_p (mode, subreg_mode)) + { + temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg); + if (temp) + { + /* Preserve SUBREG_PROMOTED_VAR_P. */ + if (partial_subreg_p (temp)) + { + SUBREG_PROMOTED_VAR_P (temp) = 1; + SUBREG_PROMOTED_SET (temp, 0); + } + return temp; + } + } + else + /* Zero-extending a zero-extended subreg. */ + return simplify_gen_unary (ZERO_EXTEND, mode, + subreg, subreg_mode); } /* Extending a widening multiplication should be canonicalized to </cut>
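
The arithmetic behind the commit message above can be checked numerically. The following is a sketch in Python, not GCC code: registers are modelled as plain integers, and the helper names `lowpart`, `sign_extend`, `zero_extend` and `identities_hold` are hypothetical. It assumes an 8-bit value that is already correctly extended in a 32-bit register, and checks that (a) extending the byte to 16 bits equals the low 16 bits of that register, and (b) extending the byte to 64 bits equals extending the whole 32-bit register.

```python
def lowpart(value, bits):
    """Keep only the low `bits` bits (models taking a narrower subreg)."""
    return value & ((1 << bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """Sign-extend the low `from_bits` bits of value to `to_bits` bits."""
    value = lowpart(value, from_bits)
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return lowpart(value, to_bits)


def zero_extend(value, from_bits, to_bits):
    """Zero-extend the low `from_bits` bits of value to `to_bits` bits."""
    return lowpart(lowpart(value, from_bits), to_bits)


def identities_hold():
    """Check both identities for every possible byte value."""
    for byte in range(256):
        reg_s = sign_extend(byte, 8, 32)  # byte promoted as signed in SImode
        reg_u = zero_extend(byte, 8, 32)  # byte promoted as unsigned in SImode
        # (a) QImode -> HImode extension is just a wider lowpart of the register.
        if sign_extend(byte, 8, 16) != lowpart(reg_s, 16):
            return False
        if zero_extend(byte, 8, 16) != lowpart(reg_u, 16):
            return False
        # (b) QImode -> DImode extension equals extending the SImode register.
        if sign_extend(byte, 8, 64) != sign_extend(reg_s, 32, 64):
            return False
        if zero_extend(byte, 8, 64) != zero_extend(reg_u, 32, 64):
            return False
    return True
```

Both identities depend on the register really holding a correctly extended value, which is the guarantee SUBREG_PROMOTED_VAR_P records; the model says nothing about why the bootstrap regressed.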

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-master-arm-spec2k6-O2 - Build # 11 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2 Culprit: <cut> commit 92c1fd19abb15bc68b1127a26137a69e033cdb39 Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> Date: Thu Aug 19 11:42:09 2021 -0700 Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. 
Differential Revision: https://reviews.llvm.org/D106408 </cut> Results regressed to (for first_bad == 92c1fd19abb15bc68b1127a26137a69e033cdb39) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_marm artifacts/build-92c1fd19abb15bc68b1127a26137a69e033cdb39/results_id: 1 # 456.hmmer,hmmer_base.default regressed by 103 from (for last_good == 1d02a8bcd393ea9c50f0212797059888efc78002) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_marm artifacts/build-1d02a8bcd393ea9c50f0212797059888efc78002/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-master-arm-spec2k6-O2/4381 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-master-arm-spec2k6-O2/4378 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39 cd investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 92c1fd19abb15bc68b1127a26137a69e033cdb39 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 1d02a8bcd393ea9c50f0212797059888efc78002 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Full commit (up to 1000 lines): <cut> commit 92c1fd19abb15bc68b1127a26137a69e033cdb39 Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> Date: Thu Aug 19 11:42:09 2021 -0700 Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). 

    The only non-trivial aspect of it is accounting for tied-defs which
    normally represent a read-modify-write operation and not rematerializable.

    The test for a tied-def situation already exists in the
    /CodeGen/AMDGPU/remat-vop.mir,
    test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

    The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
    where I more or less understand the asm it seems to reduce spilling
    (as expected) or be neutral. However, it needs a review by all targets'
    specialists.

    Differential Revision: https://reviews.llvm.org/D106408
---
 llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +-
 llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +-
 llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 +
 llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +-
 llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +-
 llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +-
 .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +-
 llvm/test/CodeGen/ARM/neon-copy.ll | 10 +-
 llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +-
 llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +-
 llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +-
 llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +-
 llvm/test/CodeGen/Mips/tls.ll | 4 +-
 llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +-
 llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +-
 llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +-
 llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +-
 llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 526 +--
 llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +-
 llvm/test/CodeGen/RISCV/rv32zbp.ll | 282 +-
 llvm/test/CodeGen/RISCV/rv32zbt.ll | 348 +-
 .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 324 +-
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +-
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3540 ++++++++++----------
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 720 ++--
 llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +-
 llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +-
 llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +-
 .../tail-pred-disabled-in-loloops.ll | 14 +-
 .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +-
 .../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +-
 llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +-
 llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +-
 llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +-
 llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 ++-
 llvm/test/CodeGen/X86/addcarry.ll | 20 +-
 llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +-
 llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +-
 llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +-
 llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +-
 llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +-
 llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +-
 42 files changed, 4217 insertions(+), 4202 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index 2f853a2c6f9f..1c05afba730d 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -117,10 +117,11 @@ public:
                          const MachineFunction &MF) const;

   /// Return true if the instruction is trivially rematerializable, meaning it
-  /// has no side effects and requires no operands that aren't always available.
-  /// This means the only allowed uses are constants and unallocatable physical
-  /// registers so that the instructions result is independent of the place
-  /// in the function.
+  /// has no side effects. Uses of constants and unallocatable physical
+  /// registers are always trivial to rematerialize so that the instructions
+  /// result is independent of the place in the function. Uses of virtual
+  /// registers are allowed but it is caller's responsility to ensure these
+  /// operands are valid at the point the instruction is beeing moved.
   bool isTriviallyReMaterializable(const MachineInstr &MI,
                                    AAResults *AA = nullptr) const {
     return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF ||
@@ -140,8 +141,7 @@ protected:
   /// set, this hook lets the target specify whether the instruction is actually
   /// trivially rematerializable, taking into consideration its operands. This
   /// predicate must return false if the instruction has any side effects other
-  /// than producing a value, or if it requres any address registers that are
-  /// not always available.
+  /// than producing a value.
   /// Requirements must be check as stated in isTriviallyReMaterializable() .
   virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI,
                                                  AAResults *AA) const {
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 1eab8e7443a7..fe7d60e0b7e2 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -921,7 +921,8 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(
   const MachineRegisterInfo &MRI = MF.getRegInfo();

   // Remat clients assume operand 0 is the defined register.
-  if (!MI.getNumOperands() || !MI.getOperand(0).isReg())
+  if (!MI.getNumOperands() || !MI.getOperand(0).isReg() ||
+      MI.getOperand(0).isTied())
     return false;
   Register DefReg = MI.getOperand(0).getReg();

@@ -983,12 +984,6 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(
       // same virtual register, though.
       if (MO.isDef() && Reg != DefReg)
         return false;
-
-      // Don't allow any virtual-register uses. Rematting an instruction with
-      // virtual register uses would length the live ranges of the uses, which
-      // is not necessarily a good idea, certainly not "trivial".
-      if (MO.isUse())
-        return false;
   }

   // Everything checked out.
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir index ed799bfca028..c9915aaabfde 100644 --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir @@ -51,6 +51,66 @@ body: | S_NOP 0, implicit %2 S_ENDPGM 0 ... +# The liverange of %0 covers a point of rematerialization, source value is +# availabe. +--- +name: test_remat_s_mov_b32_vreg_src_long_lr +tracksRegLiveness: true +machineFunctionInfo: + stackPtrOffsetReg: $sgpr32 +body: | + bb.0: + ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr + ; GCN: renamable $sgpr0 = IMPLICIT_DEF + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 + ; GCN: S_NOP 0, implicit killed renamable $sgpr0 + ; GCN: S_ENDPGM 0 + %0:sreg_32 = IMPLICIT_DEF + %1:sreg_32 = S_MOV_B32 %0:sreg_32 + %2:sreg_32 = S_MOV_B32 %0:sreg_32 + %3:sreg_32 = S_MOV_B32 %0:sreg_32 + S_NOP 0, implicit %1 + S_NOP 0, implicit %2 + S_NOP 0, implicit %3 + S_NOP 0, implicit %0 + S_ENDPGM 0 +... +# The liverange of %0 does not cover a point of rematerialization, source value is +# unavailabe and we do not want to artificially extend the liverange. 
+--- +name: test_no_remat_s_mov_b32_vreg_src_short_lr +tracksRegLiveness: true +machineFunctionInfo: + stackPtrOffsetReg: $sgpr32 +body: | + bb.0: + ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr + ; GCN: renamable $sgpr0 = IMPLICIT_DEF + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 + ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5) + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 + ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5) + ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0 + ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5) + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 + ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5) + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 + ; GCN: S_NOP 0, implicit killed renamable $sgpr0 + ; GCN: S_ENDPGM 0 + %0:sreg_32 = IMPLICIT_DEF + %1:sreg_32 = S_MOV_B32 %0:sreg_32 + %2:sreg_32 = S_MOV_B32 %0:sreg_32 + %3:sreg_32 = S_MOV_B32 %0:sreg_32 + S_NOP 0, implicit %1 + S_NOP 0, implicit %2 + S_NOP 0, implicit %3 + S_ENDPGM 0 +... 
--- name: test_remat_s_mov_b64 tracksRegLiveness: true diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll index a4243276c70a..175a2069a441 100644 --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; ENABLE-NEXT: pophs {r11, pc} ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader ; ENABLE-NEXT: movw r12, :lower16:skip -; ENABLE-NEXT: sub r1, r1, #1 +; ENABLE-NEXT: sub r3, r1, #1 ; ENABLE-NEXT: movt r12, :upper16:skip ; ENABLE-NEXT: .LBB0_4: @ %while.body ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; ENABLE-NEXT: ldrb r3, [r0] -; ENABLE-NEXT: ldrb r3, [r12, r3] -; ENABLE-NEXT: add r0, r0, r3 -; ENABLE-NEXT: sub r3, r1, #1 -; ENABLE-NEXT: cmp r3, r1 +; ENABLE-NEXT: ldrb r1, [r0] +; ENABLE-NEXT: ldrb r1, [r12, r1] +; ENABLE-NEXT: add r0, r0, r1 +; ENABLE-NEXT: sub r1, r3, #1 +; ENABLE-NEXT: cmp r1, r3 ; ENABLE-NEXT: bhs .LBB0_6 ; ENABLE-NEXT: @ %bb.5: @ %while.body ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; ENABLE-NEXT: cmp r0, r2 -; ENABLE-NEXT: mov r1, r3 +; ENABLE-NEXT: mov r3, r1 ; ENABLE-NEXT: blo .LBB0_4 ; ENABLE-NEXT: .LBB0_6: @ %if.end29 ; ENABLE-NEXT: pop {r11, pc} @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; DISABLE-NEXT: pophs {r11, pc} ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader ; DISABLE-NEXT: movw r12, :lower16:skip -; DISABLE-NEXT: sub r1, r1, #1 +; DISABLE-NEXT: sub r3, r1, #1 ; DISABLE-NEXT: movt r12, :upper16:skip ; DISABLE-NEXT: .LBB0_4: @ %while.body ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; DISABLE-NEXT: ldrb r3, [r0] -; DISABLE-NEXT: ldrb r3, [r12, r3] -; DISABLE-NEXT: add r0, r0, r3 -; DISABLE-NEXT: sub r3, r1, #1 -; DISABLE-NEXT: cmp r3, r1 +; DISABLE-NEXT: ldrb r1, [r0] +; DISABLE-NEXT: ldrb r1, [r12, r1] +; DISABLE-NEXT: add r0, 
r0, r1 +; DISABLE-NEXT: sub r1, r3, #1 +; DISABLE-NEXT: cmp r1, r3 ; DISABLE-NEXT: bhs .LBB0_6 ; DISABLE-NEXT: @ %bb.5: @ %while.body ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; DISABLE-NEXT: cmp r0, r2 -; DISABLE-NEXT: mov r1, r3 +; DISABLE-NEXT: mov r3, r1 ; DISABLE-NEXT: blo .LBB0_4 ; DISABLE-NEXT: .LBB0_6: @ %if.end29 ; DISABLE-NEXT: pop {r11, pc} diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll index 55157875d355..ea15fcc5c824 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: push {r4, r5, r11, lr} ; SCALAR-NEXT: rsb r3, r2, #0 ; SCALAR-NEXT: and r4, r2, #63 -; SCALAR-NEXT: and lr, r3, #63 -; SCALAR-NEXT: rsb r3, lr, #32 +; SCALAR-NEXT: and r12, r3, #63 +; SCALAR-NEXT: rsb r3, r12, #32 ; SCALAR-NEXT: lsl r2, r0, r4 -; SCALAR-NEXT: lsr r12, r0, lr -; SCALAR-NEXT: orr r3, r12, r1, lsl r3 -; SCALAR-NEXT: subs r12, lr, #32 -; SCALAR-NEXT: lsrpl r3, r1, r12 +; SCALAR-NEXT: lsr lr, r0, r12 +; SCALAR-NEXT: orr r3, lr, r1, lsl r3 +; SCALAR-NEXT: subs lr, r12, #32 +; SCALAR-NEXT: lsrpl r3, r1, lr ; SCALAR-NEXT: subs r5, r4, #32 ; SCALAR-NEXT: movwpl r2, #0 ; SCALAR-NEXT: cmp r5, #0 @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: lsr r3, r0, r3 ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 ; SCALAR-NEXT: lslpl r3, r0, r5 -; SCALAR-NEXT: lsr r0, r1, lr -; SCALAR-NEXT: cmp r12, #0 +; SCALAR-NEXT: lsr r0, r1, r12 +; SCALAR-NEXT: cmp lr, #0 ; SCALAR-NEXT: movwpl r0, #0 ; SCALAR-NEXT: orr r1, r3, r0 ; SCALAR-NEXT: mov r0, r2 @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK: @ %bb.0: ; CHECK-NEXT: .save {r4, r5, r11, lr} ; CHECK-NEXT: push {r4, r5, r11, lr} -; CHECK-NEXT: and lr, r2, #63 +; CHECK-NEXT: and r12, r2, #63 ; CHECK-NEXT: rsb r2, r2, #0 -; CHECK-NEXT: rsb r3, lr, #32 +; CHECK-NEXT: rsb r3, r12, #32 ; CHECK-NEXT: and r4, r2, #63 -; CHECK-NEXT: lsr 
r12, r0, lr -; CHECK-NEXT: orr r3, r12, r1, lsl r3 -; CHECK-NEXT: subs r12, lr, #32 +; CHECK-NEXT: lsr lr, r0, r12 +; CHECK-NEXT: orr r3, lr, r1, lsl r3 +; CHECK-NEXT: subs lr, r12, #32 ; CHECK-NEXT: lsl r2, r0, r4 -; CHECK-NEXT: lsrpl r3, r1, r12 +; CHECK-NEXT: lsrpl r3, r1, lr ; CHECK-NEXT: subs r5, r4, #32 ; CHECK-NEXT: movwpl r2, #0 ; CHECK-NEXT: cmp r5, #0 @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK-NEXT: lsr r3, r0, r3 ; CHECK-NEXT: orr r3, r3, r1, lsl r4 ; CHECK-NEXT: lslpl r3, r0, r5 -; CHECK-NEXT: lsr r0, r1, lr -; CHECK-NEXT: cmp r12, #0 +; CHECK-NEXT: lsr r0, r1, r12 +; CHECK-NEXT: cmp lr, #0 ; CHECK-NEXT: movwpl r0, #0 ; CHECK-NEXT: orr r1, r0, r3 ; CHECK-NEXT: mov r0, r2 diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll index 54c93b493c98..6372f9be2ca3 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { ; CHECK-NEXT: mov r3, #0 ; CHECK-NEXT: bl __aeabi_uldivmod ; CHECK-NEXT: add r0, r2, #27 -; CHECK-NEXT: lsl r6, r6, #27 -; CHECK-NEXT: and r1, r0, #63 ; CHECK-NEXT: lsl r2, r7, #27 +; CHECK-NEXT: and r12, r0, #63 +; CHECK-NEXT: lsl r6, r6, #27 ; CHECK-NEXT: orr r7, r6, r7, lsr #5 +; CHECK-NEXT: rsb r3, r12, #32 +; CHECK-NEXT: lsr r2, r2, r12 ; CHECK-NEXT: mov r6, #63 -; CHECK-NEXT: rsb r3, r1, #32 -; CHECK-NEXT: lsr r2, r2, r1 -; CHECK-NEXT: subs r12, r1, #32 -; CHECK-NEXT: bic r6, r6, r0 ; CHECK-NEXT: orr r2, r2, r7, lsl r3 +; CHECK-NEXT: subs r3, r12, #32 +; CHECK-NEXT: bic r6, r6, r0 ; CHECK-NEXT: lsl r5, r9, #1 -; CHECK-NEXT: lsrpl r2, r7, r12 +; CHECK-NEXT: lsrpl r2, r7, r3 +; CHECK-NEXT: subs r1, r6, #32 ; CHECK-NEXT: lsl r0, r5, r6 -; CHECK-NEXT: subs r4, r6, #32 -; CHECK-NEXT: lsl r3, r8, #1 +; CHECK-NEXT: lsl r4, r8, #1 ; CHECK-NEXT: movwpl r0, #0 -; CHECK-NEXT: orr r3, r3, r9, lsr #31 +; CHECK-NEXT: orr r4, r4, r9, lsr #31 ; CHECK-NEXT: orr r0, r0, r2 ; CHECK-NEXT: rsb 
r2, r6, #32 -; CHECK-NEXT: cmp r4, #0 -; CHECK-NEXT: lsr r1, r7, r1 +; CHECK-NEXT: cmp r1, #0 ; CHECK-NEXT: lsr r2, r5, r2 -; CHECK-NEXT: orr r2, r2, r3, lsl r6 -; CHECK-NEXT: lslpl r2, r5, r4 -; CHECK-NEXT: cmp r12, #0 +; CHECK-NEXT: orr r2, r2, r4, lsl r6 +; CHECK-NEXT: lslpl r2, r5, r1 +; CHECK-NEXT: lsr r1, r7, r12 +; CHECK-NEXT: cmp r3, #0 ; CHECK-NEXT: movwpl r1, #0 ; CHECK-NEXT: orr r1, r2, r1 ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll index 2922e0ed5423..0a0bb62b0a09 100644 --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { ; BE-LABEL: i56_or: ; BE: @ %bb.0: ; BE-NEXT: mov r1, r0 -; BE-NEXT: ldr r12, [r0] ; BE-NEXT: ldrh r2, [r1, #4]! ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: orr r2, r3, r2, lsl #8 -; BE-NEXT: orr r2, r2, r12, lsl #24 -; BE-NEXT: orr r2, r2, #384 -; BE-NEXT: strb r2, [r1, #2] -; BE-NEXT: lsr r3, r2, #8 -; BE-NEXT: strh r3, [r1] -; BE-NEXT: bic r1, r12, #255 -; BE-NEXT: orr r1, r1, r2, lsr #24 +; BE-NEXT: ldr r3, [r0] +; BE-NEXT: orr r2, r2, r3, lsl #24 +; BE-NEXT: orr r12, r2, #384 +; BE-NEXT: strb r12, [r1, #2] +; BE-NEXT: lsr r2, r12, #8 +; BE-NEXT: strh r2, [r1] +; BE-NEXT: bic r1, r3, #255 +; BE-NEXT: orr r1, r1, r12, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr %aa = load i56, i56* %a @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: strb r2, [r1, #2] ; BE-NEXT: orr r2, r3, r12, lsl #8 -; BE-NEXT: ldr r12, [r0] -; BE-NEXT: orr r2, r2, r12, lsl #24 -; BE-NEXT: orr r2, r2, #384 -; BE-NEXT: lsr r3, r2, #8 -; BE-NEXT: strh r3, [r1] -; BE-NEXT: bic r1, r12, #255 -; BE-NEXT: orr r1, r1, r2, lsr #24 +; BE-NEXT: ldr r3, [r0] +; BE-NEXT: orr r2, r2, r3, lsl #24 +; BE-NEXT: orr r12, r2, #384 +; BE-NEXT: lsr r2, r12, #8 +; BE-NEXT: strh r2, [r1] +; 
BE-NEXT: bic r1, r3, #255 +; BE-NEXT: orr r1, r1, r12, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll index 09a991da2e59..46490efb6631 100644 --- a/llvm/test/CodeGen/ARM/neon-copy.ll +++ b/llvm/test/CodeGen/ARM/neon-copy.ll @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { ; CHECK-NEXT: .pad #8 ; CHECK-NEXT: sub sp, sp, #8 ; CHECK-NEXT: vmov.u16 r1, d0[1] -; CHECK-NEXT: and r0, r0, #3 +; CHECK-NEXT: and r12, r0, #3 ; CHECK-NEXT: vmov.u16 r2, d0[2] -; CHECK-NEXT: mov r3, sp -; CHECK-NEXT: vmov.u16 r12, d0[3] -; CHECK-NEXT: orr r0, r3, r0, lsl #1 +; CHECK-NEXT: mov r0, sp +; CHECK-NEXT: vmov.u16 r3, d0[3] +; CHECK-NEXT: orr r0, r0, r12, lsl #1 ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] ; CHECK-NEXT: vldr d0, [sp] ; CHECK-NEXT: vmov.16 d0[1], r1 ; CHECK-NEXT: vmov.16 d0[2], r2 -; CHECK-NEXT: vmov.16 d0[3], r12 +; CHECK-NEXT: vmov.16 d0[3], r3 ; CHECK-NEXT: add sp, sp, #8 ; CHECK-NEXT: bx lr %tmp = extractelement <8 x i16> %x, i32 0 diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll index 8be7100d368b..a125446b27c3 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll @@ -766,79 +766,85 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $2, $6 +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 76($sp) -; MMR3-NEXT: srlv $4, $7, $16 -; MMR3-NEXT: not16 $3, $16 -; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill -; MMR3-NEXT: sll16 $2, $6, 1 -; MMR3-NEXT: sllv $3, $2, $3 -; MMR3-NEXT: li16 $2, 64 -; 
MMR3-NEXT: or16 $3, $4 -; MMR3-NEXT: srlv $6, $6, $16 -; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: subu16 $7, $2, $16 +; MMR3-NEXT: srlv $3, $7, $16 +; MMR3-NEXT: not16 $6, $16 +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $4, $2 +; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill +; MMR3-NEXT: sll16 $2, $2, 1 +; MMR3-NEXT: sllv $2, $2, $6 +; MMR3-NEXT: li16 $6, 64 +; MMR3-NEXT: or16 $2, $3 +; MMR3-NEXT: srlv $4, $4, $16 +; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill +; MMR3-NEXT: subu16 $7, $6, $16 ; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: andi16 $2, $7, 32 -; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $5, $16, 32 -; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $4, $9 +; MMR3-NEXT: andi16 $5, $7, 32 +; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $6, $16, 32 +; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $3, $9 ; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $4, $17, $2 -; MMR3-NEXT: movn $3, $6, $5 -; MMR3-NEXT: addiu $2, $16, -64 -; MMR3-NEXT: lw $5, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $5, $5, $2 -; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $6, $17, 1 -; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill -; MMR3-NEXT: not16 $5, $2 -; MMR3-NEXT: sllv $5, $6, $5 -; MMR3-NEXT: or16 $3, $4 -; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $4 -; MMR3-NEXT: srav $1, $17, $2 -; MMR3-NEXT: andi16 $2, $2, 32 -; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $2 -; MMR3-NEXT: sllv $2, $17, $7 -; MMR3-NEXT: not16 $4, $7 -; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: srl16 $6, $7, 1 -; MMR3-NEXT: srlv $6, $6, $4 +; MMR3-NEXT: movn $3, $17, $5 +; MMR3-NEXT: movn $2, $4, $6 +; MMR3-NEXT: addiu $4, $16, -64 +; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $4, 
$17, $4 +; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $4, $6, 1 +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: addiu $5, $16, -64 +; MMR3-NEXT: not16 $5, $5 +; MMR3-NEXT: sllv $5, $4, $5 +; MMR3-NEXT: or16 $2, $3 +; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $3 +; MMR3-NEXT: addiu $3, $16, -64 +; MMR3-NEXT: srav $1, $6, $3 +; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $3 +; MMR3-NEXT: sllv $3, $6, $7 +; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: not16 $3, $7 +; MMR3-NEXT: srl16 $4, $17, 1 +; MMR3-NEXT: srlv $3, $4, $3 ; MMR3-NEXT: sltiu $10, $16, 64 -; MMR3-NEXT: movn $5, $3, $10 -; MMR3-NEXT: or16 $6, $2 -; MMR3-NEXT: srlv $2, $7, $16 -; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: sllv $3, $4, $3 +; MMR3-NEXT: movn $5, $2, $10 +; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload ; MMR3-NEXT: or16 $3, $2 -; MMR3-NEXT: srav $11, $17, $16 -; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $11, $4 -; MMR3-NEXT: sra $2, $17, 31 +; MMR3-NEXT: srlv $2, $17, $16 +; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: sllv $17, $7, $4 +; MMR3-NEXT: or16 $17, $2 +; MMR3-NEXT: srav $11, $6, $16 +; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $17, $11, $2 +; MMR3-NEXT: sra $2, $6, 31 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: move $8, $2 -; MMR3-NEXT: movn $8, $3, $10 -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $6, $9, $3 -; MMR3-NEXT: li16 $3, 0 -; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $7, $3, $4 -; MMR3-NEXT: or16 $7, $6 +; MMR3-NEXT: move $4, $2 +; MMR3-NEXT: movn $4, $17, $10 +; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: 
movn $3, $9, $6 +; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: li16 $17, 0 +; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $7, $17, $6 +; MMR3-NEXT: or16 $7, $3 ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload ; MMR3-NEXT: movn $1, $2, $3 ; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $3, $16 -; MMR3-NEXT: movn $11, $2, $4 +; MMR3-NEXT: movn $11, $2, $6 ; MMR3-NEXT: movn $2, $11, $10 -; MMR3-NEXT: move $3, $8 +; MMR3-NEXT: move $3, $4 ; MMR3-NEXT: move $4, $1 ; MMR3-NEXT: lwp $16, 40($sp) ; MMR3-NEXT: addiusp 48 @@ -852,79 +858,80 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 -; MMR6-NEXT: move $1, $7 +; MMR6-NEXT: move $12, $7 ; MMR6-NEXT: lw $3, 44($sp) ; MMR6-NEXT: li16 $2, 64 -; MMR6-NEXT: subu16 $7, $2, $3 -; MMR6-NEXT: sllv $8, $5, $7 -; MMR6-NEXT: andi16 $2, $7, 32 -; MMR6-NEXT: selnez $9, $8, $2 -; MMR6-NEXT: sllv $10, $4, $7 -; MMR6-NEXT: not16 $7, $7 -; MMR6-NEXT: srl16 $16, $5, 1 -; MMR6-NEXT: srlv $7, $16, $7 -; MMR6-NEXT: or $7, $10, $7 -; MMR6-NEXT: seleqz $7, $7, $2 -; MMR6-NEXT: or $7, $9, $7 -; MMR6-NEXT: srlv $9, $1, $3 -; MMR6-NEXT: not16 $16, $3 -; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: subu16 $16, $2, $3 +; MMR6-NEXT: sllv $1, $5, $16 +; MMR6-NEXT: andi16 $2, $16, 32 +; MMR6-NEXT: selnez $8, $1, $2 +; MMR6-NEXT: sllv $9, $4, $16 +; MMR6-NEXT: not16 $16, $16 +; MMR6-NEXT: srl16 $17, $5, 1 +; MMR6-NEXT: srlv $10, $17, $16 +; MMR6-NEXT: or $9, $9, $10 +; MMR6-NEXT: seleqz $9, $9, $2 +; MMR6-NEXT: or $8, $8, $9 +; MMR6-NEXT: srlv $9, $7, $3 +; MMR6-NEXT: not16 $7, $3 +; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $17, $6, 1 -; MMR6-NEXT: sllv $10, $17, $16 +; MMR6-NEXT: sllv $10, $17, $7 ; MMR6-NEXT: or $9, $10, $9 ; MMR6-NEXT: andi16 $17, $3, 32 ; MMR6-NEXT: seleqz $9, 
$9, $17 ; MMR6-NEXT: srlv $10, $6, $3 ; MMR6-NEXT: selnez $11, $10, $17 ; MMR6-NEXT: seleqz $10, $10, $17 -; MMR6-NEXT: or $10, $10, $7 -; MMR6-NEXT: seleqz $12, $8, $2 -; MMR6-NEXT: or $8, $11, $9 +; MMR6-NEXT: or $8, $10, $8 +; MMR6-NEXT: seleqz $1, $1, $2 +; MMR6-NEXT: or $9, $11, $9 ; MMR6-NEXT: addiu $2, $3, -64 -; MMR6-NEXT: srlv $9, $5, $2 +; MMR6-NEXT: srlv $10, $5, $2 ; MMR6-NEXT: sll16 $7, $4, 1 ; MMR6-NEXT: not16 $16, $2 ; MMR6-NEXT: sllv $11, $7, $16 ; MMR6-NEXT: sltiu $13, $3, 64 -; MMR6-NEXT: or $8, $8, $12 -; MMR6-NEXT: selnez $10, $10, $13 -; MMR6-NEXT: or $9, $11, $9 -; MMR6-NEXT: srav $11, $4, $2 +; MMR6-NEXT: or $1, $9, $1 +; MMR6-NEXT: selnez $8, $8, $13 +; MMR6-NEXT: or $9, $11, $10 +; MMR6-NEXT: srav $10, $4, $2 ; MMR6-NEXT: andi16 $2, $2, 32 -; MMR6-NEXT: seleqz $12, $11, $2 +; MMR6-NEXT: seleqz $11, $10, $2 ; MMR6-NEXT: sra $14, $4, 31 ; MMR6-NEXT: selnez $15, $14, $2 ; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $12, $15, $12 -; MMR6-NEXT: seleqz $12, $12, $13 -; MMR6-NEXT: selnez $2, $11, $2 -; MMR6-NEXT: seleqz $11, $14, $13 -; MMR6-NEXT: or $10, $10, $12 -; MMR6-NEXT: selnez $10, $10, $3 -; MMR6-NEXT: selnez $8, $8, $13 +; MMR6-NEXT: or $11, $15, $11 +; MMR6-NEXT: seleqz $11, $11, $13 +; MMR6-NEXT: selnez $2, $10, $2 +; MMR6-NEXT: seleqz $10, $14, $13 +; MMR6-NEXT: or $8, $8, $11 +; MMR6-NEXT: selnez $8, $8, $3 +; MMR6-NEXT: selnez $1, $1, $13 ; MMR6-NEXT: or $2, $2, $9 ; MMR6-NEXT: srav $9, $4, $3 ; MMR6-NEXT: seleqz $4, $9, $17 -; MMR6-NEXT: selnez $12, $14, $17 -; MMR6-NEXT: or $4, $12, $4 -; MMR6-NEXT: selnez $12, $4, $13 +; MMR6-NEXT: selnez $11, $14, $17 +; MMR6-NEXT: or $4, $11, $4 +; MMR6-NEXT: selnez $11, $4, $13 ; MMR6-NEXT: seleqz $2, $2, $13 ; MMR6-NEXT: seleqz $4, $6, $3 -; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $2, $8, $2 -; MMR6-NEXT: selnez $2, $2, $3 +; MMR6-NEXT: seleqz $6, $12, $3 ; MMR6-NEXT: or $1, $1, $2 -; MMR6-NEXT: or $4, $4, $10 -; MMR6-NEXT: or $2, $12, $11 -; MMR6-NEXT: srlv $3, $5, $3 -; MMR6-NEXT: 
lw $5, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $5, $7, $5 -; MMR6-NEXT: or $3, $5, $3 -; MMR6-NEXT: seleqz $3, $3, $17 -; MMR6-NEXT: selnez $5, $9, $17 -; MMR6-NEXT: or $3, $5, $3 -; MMR6-NEXT: selnez $3, $3, $13 -; MMR6-NEXT: or $3, $3, $11 +; MMR6-NEXT: selnez $1, $1, $3 +; MMR6-NEXT: or $1, $6, $1 +; MMR6-NEXT: or $4, $4, $8 +; MMR6-NEXT: or $6, $11, $10 +; MMR6-NEXT: srlv $2, $5, $3 +; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $3, $7, $3 +; MMR6-NEXT: or $2, $3, $2 +; MMR6-NEXT: seleqz $2, $2, $17 +; MMR6-NEXT: selnez $3, $9, $17 +; MMR6-NEXT: or $2, $3, $2 +; MMR6-NEXT: selnez $2, $2, $13 +; MMR6-NEXT: or $3, $2, $10 +; MMR6-NEXT: move $2, $6 ; MMR6-NEXT: move $5, $1 ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll index ed2bfc9fcf60..e4b4b3ae1d0f 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll @@ -776,76 +776,77 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 68($sp) ; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: subu16 $7, $2, $16 -; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: move $17, $5 -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $3, $7, 32 +; MMR3-NEXT: subu16 $17, $2, $16 +; MMR3-NEXT: sllv $9, $5, $17 +; MMR3-NEXT: andi16 $3, $17, 32 ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill ; MMR3-NEXT: li16 $2, 0 ; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: movn $4, $2, $3 -; MMR3-NEXT: srlv $5, $8, $16 +; MMR3-NEXT: srlv $5, $7, $16 ; MMR3-NEXT: not16 $3, $16 ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill ; MMR3-NEXT: sll16 $2, $6, 1 +; 
MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sllv $2, $2, $3 ; MMR3-NEXT: or16 $2, $5 -; MMR3-NEXT: srlv $5, $6, $16 -; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: srlv $7, $6, $16 ; MMR3-NEXT: andi16 $3, $16, 32 ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $2, $5, $3 +; MMR3-NEXT: movn $2, $7, $3 ; MMR3-NEXT: addiu $3, $16, -64 ; MMR3-NEXT: or16 $2, $4 -; MMR3-NEXT: srlv $4, $17, $3 -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $6, $4, 1 -; MMR3-NEXT: not16 $5, $3 -; MMR3-NEXT: sllv $5, $6, $5 -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $17 -; MMR3-NEXT: srlv $1, $4, $3 -; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $3, $6, $3 ; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $3 +; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $4, $3, 1 +; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: addiu $5, $16, -64 +; MMR3-NEXT: not16 $5, $5 +; MMR3-NEXT: sllv $5, $4, $5 +; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $4 +; MMR3-NEXT: addiu $4, $16, -64 +; MMR3-NEXT: srlv $1, $3, $4 +; MMR3-NEXT: andi16 $4, $4, 32 +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $4 ; MMR3-NEXT: sltiu $10, $16, 64 ; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: sllv $2, $4, $7 -; MMR3-NEXT: not16 $3, $7 -; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload -; MMR3-NEXT: srl16 $4, $7, 1 +; MMR3-NEXT: sllv $2, $3, $17 +; MMR3-NEXT: not16 $3, $17 +; MMR3-NEXT: srl16 $4, $6, 1 ; MMR3-NEXT: srlv $4, $4, $3 ; MMR3-NEXT: or16 $4, $2 -; MMR3-NEXT: srlv $2, $7, $16 +; MMR3-NEXT: srlv $2, $6, $16 ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload ; MMR3-NEXT: sllv $3, $6, $3 ; MMR3-NEXT: or16 $3, $2 ; MMR3-NEXT: lw $2, 28($sp) # 
4-byte Folded Reload ; MMR3-NEXT: srlv $2, $2, $16 -; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $2, $17 +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $2, $6 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: li16 $6, 0 -; MMR3-NEXT: movz $3, $6, $10 -; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $4, $9, $7 -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: li16 $7, 0 -; MMR3-NEXT: movn $6, $7, $17 -; MMR3-NEXT: or16 $6, $4 +; MMR3-NEXT: li16 $17, 0 +; MMR3-NEXT: movz $3, $17, $10 +; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $4, $9, $17 +; MMR3-NEXT: li16 $17, 0 +; MMR3-NEXT: movn $7, $17, $6 +; MMR3-NEXT: or16 $7, $4 ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $1, $7, $4 -; MMR3-NEXT: li16 $7, 0 -; MMR3-NEXT: movn $1, $6, $10 +; MMR3-NEXT: movn $1, $17, $4 +; MMR3-NEXT: li16 $17, 0 +; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $4, $16 -; MMR3-NEXT: movn $2, $7, $17 +; MMR3-NEXT: movn $2, $17, $6 ; MMR3-NEXT: li16 $4, 0 ; MMR3-NEXT: movz $2, $4, $10 ; MMR3-NEXT: move $4, $1 @@ -855,98 +856,91 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; ; MMR6-LABEL: lshr_i128: ; MMR6: # %bb.0: # %entry -; MMR6-NEXT: addiu $sp, $sp, -32 -; MMR6-NEXT: .cfi_def_cfa_offset 32 -; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill -; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill +; MMR6-NEXT: addiu $sp, $sp, -24 +; MMR6-NEXT: .cfi_def_cfa_offset 24 +; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill +; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 ; MMR6-NEXT: move $1, $7 -; MMR6-NEXT: move $7, $5 -; MMR6-NEXT: lw $3, 60($sp) +; MMR6-NEXT: move $7, $4 +; MMR6-NEXT: lw $3, 52($sp) ; MMR6-NEXT: srlv $2, $1, $3 -; MMR6-NEXT: not16 $5, $3 -; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill -; 
MMR6-NEXT: move $17, $6 -; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $3 +; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill +; MMR6-NEXT: move $4, $6 +; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $6, $6, 1 -; MMR6-NEXT: sllv $6, $6, $5 +; MMR6-NEXT: sllv $6, $6, $16 ; MMR6-NEXT: or $8, $6, $2 -; MMR6-NEXT: addiu $5, $3, -64 -; MMR6-NEXT: srlv $9, $7, $5 -; MMR6-NEXT: move $6, $4 -; MMR6-NEXT: sll16 $2, $4, 1 -; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill -; MMR6-NEXT: not16 $16, $5 +; MMR6-NEXT: addiu $6, $3, -64 +; MMR6-NEXT: srlv $9, $5, $6 +; MMR6-NEXT: sll16 $2, $7, 1 +; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $6 ; MMR6-NEXT: sllv $10, $2, $16 ; MMR6-NEXT: andi16 $16, $3, 32 ; MMR6-NEXT: seleqz $8, $8, $16 ; MMR6-NEXT: or $9, $10, $9 -; MMR6-NEXT: srlv $10, $17, $3 +; MMR6-NEXT: srlv $10, $4, $3 ; MMR6-NEXT: selnez $11, $10, $16 ; MMR6-NEXT: li16 $17, 64 ; MMR6-NEXT: subu16 $2, $17, $3 -; MMR6-NEXT: sllv $12, $7, $2 -; MMR6-NEXT: move $17, $7 +; MMR6-NEXT: sllv $12, $5, $2 ; MMR6-NEXT: andi16 $4, $2, 32 -; MMR6-NEXT: andi16 $7, $5, 32 -; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill -; MMR6-NEXT: seleqz $9, $9, $7 +; MMR6-NEXT: andi16 $17, $6, 32 +; MMR6-NEXT: seleqz $9, $9, $17 ; MMR6-NEXT: seleqz $13, $12, $4 ; MMR6-NEXT: or $8, $11, $8 ; MMR6-NEXT: selnez $11, $12, $4 -; MMR6-NEXT: sllv $12, $6, $2 -; MMR6-NEXT: move $7, $6 -; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: sllv $12, $7, $2 ; MMR6-NEXT: not16 $2, $2 -; MMR6-NEXT: srl16 $6, $17, 1 +; MMR6-NEXT: srl16 $6, $5, 1 ; MMR6-NEXT: srlv $2, $6, $2 ; MMR6-NEXT: or $2, $12, $2 ; MMR6-NEXT: seleqz $2, $2, $4 -; MMR6-NEXT: srlv $4, $7, $5 -; MMR6-NEXT: or $11, $11, $2 -; MMR6-NEXT: or $5, $8, $13 -; MMR6-NEXT: srlv $6, $17, $3 -; MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: selnez $7, $4, $2 -; MMR6-NEXT: sltiu $8, $3, 64 -; MMR6-NEXT: selnez $12, $5, $8 -; MMR6-NEXT: or $7, $7, $9 -; 
MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload +; MMR6-NEXT: addiu $4, $3, -64 +; MMR6-NEXT: srlv $4, $7, $4 +; MMR6-NEXT: or $12, $11, $2 +; MMR6-NEXT: or $6, $8, $13 +; MMR6-NEXT: srlv $5, $5, $3 +; MMR6-NEXT: selnez $8, $4, $17 +; MMR6-NEXT: sltiu $11, $3, 64 +; MMR6-NEXT: selnez $13, $6, $11 +; MMR6-NEXT: or $8, $8, $9 ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $9, $2, $5 +; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $9, $6, $2 ; MMR6-NEXT: seleqz $10, $10, $16 -; MMR6-NEXT: li16 $5, 0 -; MMR6-NEXT: or $10, $10, $11 -; MMR6-NEXT: or $6, $9, $6 -; MMR6-NEXT: seleqz $2, $7, $8 -; MMR6-NEXT: seleqz $7, $5, $8 -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: srlv $9, $5, $3 -; MMR6-NEXT: seleqz $11, $9, $16 -; MMR6-NEXT: selnez $11, $11, $8 +; MMR6-NEXT: li16 $2, 0 +; MMR6-NEXT: or $10, $10, $12 +; MMR6-NEXT: or $9, $9, $5 +; MMR6-NEXT: seleqz $5, $8, $11 +; MMR6-NEXT: seleqz $8, $2, $11 +; MMR6-NEXT: srlv $7, $7, $3 +; MMR6-NEXT: seleqz $2, $7, $16 +; MMR6-NEXT: selnez $2, $2, $11 ; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $2, $12, $2 -; MMR6-NEXT: selnez $2, $2, $3 -; MMR6-NEXT: or $5, $1, $2 -; MMR6-NEXT: or $2, $7, $11 -; MMR6-NEXT: seleqz $1, $6, $16 -; MMR6-NEXT: selnez $6, $9, $16 -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $9, $16, $3 -; MMR6-NEXT: selnez $10, $10, $8 -; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $4, $4, $16 -; MMR6-NEXT: seleqz $4, $4, $8 -; MMR6-NEXT: or $4, $10, $4 +; MMR6-NEXT: or $5, $13, $5 +; MMR6-NEXT: selnez $5, $5, $3 +; MMR6-NEXT: or $5, $1, $5 +; MMR6-NEXT: or $2, $8, $2 +; MMR6-NEXT: seleqz $1, $9, $16 +; MMR6-NEXT: selnez $6, $7, $16 +; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $7, $7, $3 +; MMR6-NEXT: selnez $9, $10, $11 +; MMR6-NEXT: seleqz $4, $4, $17 +; MMR6-NEXT: seleqz $4, $4, $11 </cut>

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_gnu_native_build/master-aarch64 - Build # 1 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gnu_native_build/master-aarch64. So far, this commit has regressed CI configurations:
 - tcwg_gnu_native_build/master-aarch64

Culprit:
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date:   Tue Aug 31 11:45:07 2021 +0100

    Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).

    SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
    is correctly zero-extended or sign-extended in the parent register. For
    example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
    byte x is zero extended in reg:SI 23, which is useful for optimization.
    An example is that zero extending the above QImode value to HImode can
    simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).

    This patch addresses the oversight/missed optimization opportunity that
    the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
    annotation as its value is guaranteed to be correctly extended in the
    SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
    present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
    from one or two (precisely three) places that (accidentally) strip it.

    Whilst there I also added another optimization. If we need to extend
    the above QImode value beyond the SImode register holding it, say to
    DImode, we can eliminate the SUBREG and simply extend from the SImode
    register to DImode.

    2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>

    gcc/ChangeLog
            * expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
            creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
            subreg.
            * simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
            Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
            partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
            SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
            [ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
            creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
            subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
            would be paradoxical.
</cut>

Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/shared-object.mk:14: trunctfhf2.o] Error 1

from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Build log: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/con…

Full commit (up to 1000 lines):
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date:   Tue Aug 31 11:45:07 2021 +0100

    Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).

    SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
    is correctly zero-extended or sign-extended in the parent register. For
    example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
    byte x is zero extended in reg:SI 23, which is useful for optimization.
    An example is that zero extending the above QImode value to HImode can
    simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).

    This patch addresses the oversight/missed optimization opportunity that
    the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
    annotation as its value is guaranteed to be correctly extended in the
    SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
    present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
    from one or two (precisely three) places that (accidentally) strip it.

    Whilst there I also added another optimization. If we need to extend
    the above QImode value beyond the SImode register holding it, say to
    DImode, we can eliminate the SUBREG and simply extend from the SImode
    register to DImode.

    2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>

    gcc/ChangeLog
            * expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
            creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
            subreg.
            * simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
            Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
            partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
            SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
            [ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
            creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
            subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
            would be paradoxical.
---
 gcc/expr.c         | 19 ++++++++++++++++++-
 gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 096c0315ecc..5dd98a9bccc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
       && (GET_MODE_PRECISION (subreg_promoted_mode (x))
           >= GET_MODE_PRECISION (int_mode))
       && SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp))
-    x = gen_lowpart (int_mode, SUBREG_REG (x));
+    {
+      scalar_int_mode int_orig_mode;
+      machine_mode orig_mode = GET_MODE (x);
+      x = gen_lowpart (int_mode, SUBREG_REG (x));
+
+      /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+         the original mode, but narrower than the inner mode. */
+      if (GET_CODE (x) == SUBREG
+          && GET_MODE_PRECISION (subreg_promoted_mode (x))
+             > GET_MODE_PRECISION (int_mode)
+          && is_a <scalar_int_mode> (orig_mode, &int_orig_mode)
+          && GET_MODE_PRECISION (int_mode)
+             > GET_MODE_PRECISION (int_orig_mode))
+        {
+          SUBREG_PROMOTED_VAR_P (x) = 1;
+          SUBREG_PROMOTED_SET (x, unsignedp);
+        }
+    }
 
   if (GET_MODE (x) != VOIDmode)
     oldmode = GET_MODE (x);
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e431e0c19d7..ebad5cb5a79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
          target mode is the same as the variable's promotion. */
       if (GET_CODE (op) == SUBREG
           && SUBREG_PROMOTED_VAR_P (op)
-          && SUBREG_PROMOTED_SIGNED_P (op)
-          && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+          && SUBREG_PROMOTED_SIGNED_P (op))
         {
-          temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
-          if (temp)
-            return temp;
+          rtx subreg = SUBREG_REG (op);
+          machine_mode subreg_mode = GET_MODE (subreg);
+          if (!paradoxical_subreg_p (mode, subreg_mode))
+            {
+              temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+              if (temp)
+                {
+                  /* Preserve SUBREG_PROMOTED_VAR_P. */
+                  if (partial_subreg_p (temp))
+                    {
+                      SUBREG_PROMOTED_VAR_P (temp) = 1;
+                      SUBREG_PROMOTED_SET (temp, 1);
+                    }
+                  return temp;
+                }
+            }
+          else
+            /* Sign-extending a sign-extended subreg. */
+            return simplify_gen_unary (SIGN_EXTEND, mode,
+                                       subreg, subreg_mode);
         }
 
       /* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>).
@@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
          target mode is the same as the variable's promotion. */
       if (GET_CODE (op) == SUBREG
           && SUBREG_PROMOTED_VAR_P (op)
-          && SUBREG_PROMOTED_UNSIGNED_P (op)
-          && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+          && SUBREG_PROMOTED_UNSIGNED_P (op))
         {
-          temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
-          if (temp)
-            return temp;
+          rtx subreg = SUBREG_REG (op);
+          machine_mode subreg_mode = GET_MODE (subreg);
+          if (!paradoxical_subreg_p (mode, subreg_mode))
+            {
+              temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+              if (temp)
+                {
+                  /* Preserve SUBREG_PROMOTED_VAR_P. */
+                  if (partial_subreg_p (temp))
+                    {
+                      SUBREG_PROMOTED_VAR_P (temp) = 1;
+                      SUBREG_PROMOTED_SET (temp, 0);
+                    }
+                  return temp;
+                }
+            }
+          else
+            /* Zero-extending a zero-extended subreg. */
+            return simplify_gen_unary (ZERO_EXTEND, mode,
+                                       subreg, subreg_mode);
         }
 
       /* Extending a widening multiplication should be canonicalized to
</cut>
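
The zero-extension argument in the commit message can be illustrated outside of RTL. The following is a minimal sketch, assuming a 32-bit integer stands in for reg:SI 23 and an 8-bit value stands in for the QImode variable x. Every function name below is hypothetical and none of this is GCC code; it only shows why the "wider subreg" is safe when the promotion guarantee holds, and why it is not safe without it.

```cpp
#include <cstdint>

// Hypothetical model, not GCC code: a 32-bit "register" that holds an 8-bit
// variable already zero-extended into it (the promoted-subreg guarantee).
std::uint32_t promote_u8 (std::uint8_t x)
{
  return x;
}

// Explicit zero extension of the low byte to 16 bits.
std::uint16_t zero_extend_qi_to_hi (std::uint32_t reg)
{
  return static_cast<std::uint16_t> (static_cast<std::uint8_t> (reg));
}

// The "wider subreg": simply the low 16 bits of the register.
std::uint16_t lowpart_hi (std::uint32_t reg)
{
  return static_cast<std::uint16_t> (reg);
}

// True when both forms agree for every promoted byte value.
bool wider_subreg_matches_for_all_bytes ()
{
  for (unsigned v = 0; v < 256; ++v)
    {
      std::uint32_t reg = promote_u8 (static_cast<std::uint8_t> (v));
      if (zero_extend_qi_to_hi (reg) != lowpart_hi (reg))
        return false;
    }
  return true;
}
```

For a register that was not produced by promote_u8, such as 0x1234, the two functions disagree (0x34 versus 0x1234), which is why the annotation has to be tracked rather than assumed.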

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_gnu_cross_build/master-aarch64 - Build # 1 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gnu_cross_build/master-aarch64. So far, this commit has regressed CI configurations:
 - tcwg_gnu_cross_build/master-aarch64

Culprit:
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date:   Tue Aug 31 11:45:07 2021 +0100

    Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).

    SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
    is correctly zero-extended or sign-extended in the parent register. For
    example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
    byte x is zero extended in reg:SI 23, which is useful for optimization.
    An example is that zero extending the above QImode value to HImode can
    simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).

    This patch addresses the oversight/missed optimization opportunity that
    the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
    annotation as its value is guaranteed to be correctly extended in the
    SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
    present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
    from one or two (precisely three) places that (accidentally) strip it.

    Whilst there I also added another optimization. If we need to extend
    the above QImode value beyond the SImode register holding it, say to
    DImode, we can eliminate the SUBREG and simply extend from the SImode
    register to DImode.

    2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>

    gcc/ChangeLog
            * expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
            creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
            subreg.
            * simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
            Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
            partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
            SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
            [ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
            creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
            subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
            would be paradoxical.
</cut>

Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:04:40 cc1: error: no include path in which to search for stdc-predef.h
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/static-object.mk:17: floatsitf.o] Error 1
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132

from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Build log: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/cons…

Full commit (up to 1000 lines):
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date:   Tue Aug 31 11:45:07 2021 +0100

    Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).

    SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
    is correctly zero-extended or sign-extended in the parent register. For
    example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
    byte x is zero extended in reg:SI 23, which is useful for optimization.
    An example is that zero extending the above QImode value to HImode can
    simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).

    This patch addresses the oversight/missed optimization opportunity that
    the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
    annotation as its value is guaranteed to be correctly extended in the
    SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
    present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
    from one or two (precisely three) places that (accidentally) strip it.

    Whilst there I also added another optimization. If we need to extend
    the above QImode value beyond the SImode register holding it, say to
    DImode, we can eliminate the SUBREG and simply extend from the SImode
    register to DImode.

    2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>

    gcc/ChangeLog
            * expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
            creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
            subreg.
            * simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
            Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
            partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
            SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
            [ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
            creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
            subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
            would be paradoxical.
---
 gcc/expr.c         | 19 ++++++++++++++++++-
 gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 096c0315ecc..5dd98a9bccc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
       && (GET_MODE_PRECISION (subreg_promoted_mode (x))
           >= GET_MODE_PRECISION (int_mode))
       && SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp))
-    x = gen_lowpart (int_mode, SUBREG_REG (x));
+    {
+      scalar_int_mode int_orig_mode;
+      machine_mode orig_mode = GET_MODE (x);
+      x = gen_lowpart (int_mode, SUBREG_REG (x));
+
+      /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+         the original mode, but narrower than the inner mode. */
+      if (GET_CODE (x) == SUBREG
+          && GET_MODE_PRECISION (subreg_promoted_mode (x))
+             > GET_MODE_PRECISION (int_mode)
+          && is_a <scalar_int_mode> (orig_mode, &int_orig_mode)
+          && GET_MODE_PRECISION (int_mode)
+             > GET_MODE_PRECISION (int_orig_mode))
+        {
+          SUBREG_PROMOTED_VAR_P (x) = 1;
+          SUBREG_PROMOTED_SET (x, unsignedp);
+        }
+    }
 
   if (GET_MODE (x) != VOIDmode)
     oldmode = GET_MODE (x);
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e431e0c19d7..ebad5cb5a79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
          target mode is the same as the variable's promotion. */
       if (GET_CODE (op) == SUBREG
           && SUBREG_PROMOTED_VAR_P (op)
-          && SUBREG_PROMOTED_SIGNED_P (op)
-          && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+          && SUBREG_PROMOTED_SIGNED_P (op))
         {
-          temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
-          if (temp)
-            return temp;
+          rtx subreg = SUBREG_REG (op);
+          machine_mode subreg_mode = GET_MODE (subreg);
+          if (!paradoxical_subreg_p (mode, subreg_mode))
+            {
+              temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+              if (temp)
+                {
+                  /* Preserve SUBREG_PROMOTED_VAR_P. */
+                  if (partial_subreg_p (temp))
+                    {
+                      SUBREG_PROMOTED_VAR_P (temp) = 1;
+                      SUBREG_PROMOTED_SET (temp, 1);
+                    }
+                  return temp;
+                }
+            }
+          else
+            /* Sign-extending a sign-extended subreg. */
+            return simplify_gen_unary (SIGN_EXTEND, mode,
+                                       subreg, subreg_mode);
         }
 
       /* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>).
@@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
          target mode is the same as the variable's promotion. */
       if (GET_CODE (op) == SUBREG
           && SUBREG_PROMOTED_VAR_P (op)
-          && SUBREG_PROMOTED_UNSIGNED_P (op)
-          && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+          && SUBREG_PROMOTED_UNSIGNED_P (op))
         {
-          temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
-          if (temp)
-            return temp;
+          rtx subreg = SUBREG_REG (op);
+          machine_mode subreg_mode = GET_MODE (subreg);
+          if (!paradoxical_subreg_p (mode, subreg_mode))
+            {
+              temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+              if (temp)
+                {
+                  /* Preserve SUBREG_PROMOTED_VAR_P. */
+                  if (partial_subreg_p (temp))
+                    {
+                      SUBREG_PROMOTED_VAR_P (temp) = 1;
+                      SUBREG_PROMOTED_SET (temp, 0);
+                    }
+                  return temp;
+                }
+            }
+          else
+            /* Zero-extending a zero-extended subreg. */
+            return simplify_gen_unary (ZERO_EXTEND, mode,
+                                       subreg, subreg_mode);
         }
 
       /* Extending a widening multiplication should be canonicalized to
</cut>
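
The second optimization in the commit message (extending beyond the register that holds the promoted value) can also be checked with ordinary integers. This is a rough sketch under the assumption that a 32-bit signed integer models the SImode register and a 64-bit integer models DImode; the function names are hypothetical and this is not how GCC represents any of it internally.

```cpp
#include <cstdint>

// Hypothetical model, not GCC code: an 8-bit signed variable that has been
// sign-extended into a 32-bit "register".
std::int32_t promote_s8 (std::int8_t x)
{
  return x;
}

// Sign-extend the low byte of the register all the way to 64 bits.
std::int64_t sign_extend_qi_to_di (std::int32_t reg)
{
  return static_cast<std::int64_t> (static_cast<std::int8_t> (reg));
}

// Skip the byte-sized view and sign-extend the whole 32-bit register.
std::int64_t sign_extend_si_to_di (std::int32_t reg)
{
  return static_cast<std::int64_t> (reg);
}

// True when both forms agree for every promoted signed byte value.
bool register_extension_matches_for_all_bytes ()
{
  for (int v = -128; v <= 127; ++v)
    {
      std::int32_t reg = promote_s8 (static_cast<std::int8_t> (v));
      if (sign_extend_qi_to_di (reg) != sign_extend_si_to_di (reg))
        return false;
    }
  return true;
}
```

The equality only holds because promote_s8 produced the register contents; that precondition is what the SUBREG_PROMOTED_VAR_P annotation records.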

4 years, 9 months


[CI-NOTIFY]: TCWG Bisect tcwg_gnu_native_build/master-arm - Build # 1 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gdb* in CI configuration tcwg_gnu_native_build/master-arm. So far, this commit has regressed CI configurations: - tcwg_gnu_native_build/master-arm Culprit: <cut> commit 282aa4f7d292eb4bc213d028465a3b96f5af2f22 Author: Tom Tromey <tom(a)tromey.com> Date: Sat Aug 28 13:16:50 2021 -0600 Add some parallel_for_each tests Tom de Vries noticed that a patch in the DWARF scanner rewrite series caused a regression in parallel_for_each -- it started crashing in the case where the number of threads is 0 (there was an unchecked use of "n-1" that was used to size an array). He also pointed out that there were no tests of parallel_for_each. This adds a few tests of parallel_for_each, primarily testing that different settings for the number of threads will work. This test catches the bug that he found in that series. </cut> Results regressed to (for first_bad == 282aa4f7d292eb4bc213d028465a3b96f5af2f22) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe gcc: 2 # build_abe linux: 4 # build_abe glibc: 5 # First few build errors in logs: # 00:03:45 ../../../../../../gdb/gdb/unittests/parallel-for-selftests.c:53:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’ # 00:03:45 make[1]: *** [unittests/parallel-for-selftests.o] Error 1 # 00:03:46 make: *** [all-gdb] Error 2 from (for last_good == ee8b88452c1cb1be97199942aee7a76bbca210ee) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe gcc: 2 # build_abe linux: 4 # build_abe glibc: 5 # build_abe gdb: 6 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/ Configuration details: Reproduce builds: <cut> mkdir investigate-gdb-282aa4f7d292eb4bc213d028465a3b96f5af2f22 cd 
investigate-gdb-282aa4f7d292eb4bc213d028465a3b96f5af2f22

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gdb/ ./ ./bisect/baseline/

cd gdb

# Reproduce first_bad build
git checkout --detach 282aa4f7d292eb4bc213d028465a3b96f5af2f22
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach ee8b88452c1cb1be97199942aee7a76bbca210ee
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac…
Build log: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/console…

Full commit (up to 1000 lines):
<cut>
commit 282aa4f7d292eb4bc213d028465a3b96f5af2f22
Author: Tom Tromey <tom(a)tromey.com>
Date:   Sat Aug 28 13:16:50 2021 -0600

    Add some parallel_for_each tests

    Tom de Vries noticed that a patch in the DWARF scanner rewrite series caused a regression in parallel_for_each -- it started crashing in the case where the number of threads is 0 (there was an unchecked use of "n-1" that was used to size an array).

    He also pointed out that there were no tests of parallel_for_each.  This adds a few tests of parallel_for_each, primarily testing that different settings for the number of threads will work.  This test catches the bug that he found in that series.
---
 gdb/Makefile.in                        |  1 +
 gdb/unittests/parallel-for-selftests.c | 86 ++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 73a1bf83c85..320d3326a81 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -456,6 +456,7 @@ SELFTESTS_SRCS = \
     unittests/offset-type-selftests.c \
     unittests/observable-selftests.c \
     unittests/optional-selftests.c \
+    unittests/parallel-for-selftests.c \
     unittests/parse-connection-spec-selftests.c \
     unittests/ptid-selftests.c \
     unittests/main-thread-selftests.c \
diff --git a/gdb/unittests/parallel-for-selftests.c b/gdb/unittests/parallel-for-selftests.c
new file mode 100644
index 00000000000..7f61b709fa7
--- /dev/null
+++ b/gdb/unittests/parallel-for-selftests.c
@@ -0,0 +1,86 @@
+/* Self tests for parallel_for_each
+
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "gdbsupport/selftest.h"
+#include "gdbsupport/parallel-for.h"
+#include "gdbsupport/thread-pool.h"
+
+#if CXX_STD_THREAD
+
+namespace selftests {
+namespace parallel_for {
+
+struct save_restore_n_threads
+{
+  save_restore_n_threads ()
+    : n_threads (gdb::thread_pool::g_thread_pool->thread_count ())
+  {
+  }
+
+  ~save_restore_n_threads ()
+  {
+    gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
+  }
+
+  int n_threads;
+};
+
+static void
+test (int n_threads)
+{
+  save_restore_n_threads saver;
+  gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
+
+#define NUMBER 10000
+
+  std::atomic<int> counter = 0;
+  gdb::parallel_for_each (0, NUMBER,
+                          [&] (int start, int end)
+                            {
+                              counter += end - start;
+                            });
+
+  SELF_CHECK (counter == NUMBER);
+
+#undef NUMBER
+}
+
+static void
+test_n_threads ()
+{
+  test (0);
+  test (1);
+  test (3);
+}
+
+}
+}
+
+#endif /* CXX_STD_THREAD */
+
+void _initialize_parallel_for_selftests ();
+void
+_initialize_parallel_for_selftests ()
+{
+#ifdef CXX_STD_THREAD
+  selftests::register_test ("parallel_for",
+                            selftests::parallel_for::test_n_threads);
+#endif /* CXX_STD_THREAD */
+}
</cut>
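The failure mode described in the commit message above (an unchecked "n-1" used to size an array when the thread count is 0) can be illustrated with a small C sketch. This is not GDB's parallel_for_each implementation: `split_range`, `add_size` and `count_elements` are hypothetical names, and the chunks are processed sequentially instead of on a thread pool. It only shows one way to divide a range so that a thread count of 0 is handled like "the caller does all the work", and it checks the same property as the self test (every element is counted exactly once).

```c
#include <assert.h>

/* Hypothetical callback type: process the half-open range [start, end).  */
typedef void (*range_fn) (int start, int end, void *data);

/* Sketch only: split [first, last) into chunks.  A thread count of 0 or 1
   yields a single chunk, so nothing is ever sized or indexed with
   "n_threads - 1" when n_threads is 0.  */
static void
split_range (int n_threads, int first, int last, range_fn fn, void *data)
{
  int n_elements = last - first;
  int n_chunks = n_threads > 1 ? n_threads : 1;
  int per_chunk = n_elements / n_chunks;

  /* All chunks but the last get exactly per_chunk elements.  */
  for (int i = 0; i < n_chunks - 1; i++)
    {
      fn (first, first + per_chunk, data);
      first += per_chunk;
    }

  /* The final chunk also absorbs the remainder of the division.  */
  fn (first, last, data);
}

/* Accumulate the size of each chunk into the int pointed to by DATA.  */
static void
add_size (int start, int end, void *data)
{
  *(int *) data += end - start;
}

/* Count the elements visited for a given thread setting.  */
static int
count_elements (int n_threads, int number)
{
  int counter = 0;
  split_range (n_threads, 0, number, add_size, &counter);
  return counter;
}

/* Same settings as the self test in the commit: 0, 1 and 3 threads.  */
static void
self_check (void)
{
  assert (count_elements (0, 10000) == 10000);
  assert (count_elements (1, 10000) == 10000);
  assert (count_elements (3, 10000) == 10000);
}
```

Calling `self_check ()` from a `main` function exercises the three thread settings used by the commit's test.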

4 years, 10 months


[CI-NOTIFY]: TCWG Bisect tcwg_gnu_cross_build/master-arm - Build # 1 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gnu_cross_build/master-arm.  So far, this commit has regressed CI configurations:
 - tcwg_gnu_cross_build/master-arm

Culprit:
<cut>
commit caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
Author: Sebastian Huber <sebastian.huber(a)embedded-brains.de>
Date:   Tue Aug 17 09:53:43 2021 +0200

    Use __builtin_trap() for abort() if inhibit_libc

    abort() is used in gcc_assert() and gcc_unreachable() which is used by target libraries such as libgcov.a.  This patch changes the abort() definition under certain conditions.  If inhibit_libc is defined and abort is not already defined, then abort() is defined to __builtin_trap().

    The inhibit_libc define is usually defined if GCC is built for targets running in embedded systems which may optionally use a C standard library.  If inhibit_libc is defined, then there may be still a full featured abort() available.  abort() is a heavy weight function which depends on signals and file streams.  For statically linked applications, this means that a dependency on gcc_assert() pulls in the support for signals and file streams.  This could prevent using gcov to test low end targets for example.  Using __builtin_trap() avoids these dependencies if the target implements a "trap" instruction.  The application or operating system could use a trap handler to react to failed GCC runtime checks which caused a trap.

    gcc/

            * tsystem.h (abort): Define abort() if inhibit_libc is defined and it is not already defined.
</cut>

Results regressed to (for first_bad == caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:01:44 cc1: error: no include path in which to search for stdc-predef.h
# 00:02:05 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/unwind-arm-common.inc:55:24: error: macro passed 1 arguments, but takes just 0
# 00:02:05 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/static-object.mk:17: unwind-arm.o] Error 1
# 00:02:06 make[1]: *** [Makefile:12484: all-target-libgcc] Error 2
# 00:02:06 make: *** [Makefile:953: all] Error 2

from (for last_good == d7e56b084d0b230ae5ee280f569d679fa0f09f4d)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
cd investigate-gcc-caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach d7e56b084d0b230ae5ee280f569d679fa0f09f4d
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Build log: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/consoleT…

Full commit (up to 1000 lines):
<cut>
commit caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
Author: Sebastian Huber <sebastian.huber(a)embedded-brains.de>
Date:   Tue Aug 17 09:53:43 2021 +0200

    Use __builtin_trap() for abort() if inhibit_libc

    abort() is used in gcc_assert() and gcc_unreachable() which is used by target libraries such as libgcov.a.  This patch changes the abort() definition under certain conditions.  If inhibit_libc is defined and abort is not already defined, then abort() is defined to __builtin_trap().

    The inhibit_libc define is usually defined if GCC is built for targets running in embedded systems which may optionally use a C standard library.  If inhibit_libc is defined, then there may be still a full featured abort() available.  abort() is a heavy weight function which depends on signals and file streams.  For statically linked applications, this means that a dependency on gcc_assert() pulls in the support for signals and file streams.  This could prevent using gcov to test low end targets for example.  Using __builtin_trap() avoids these dependencies if the target implements a "trap" instruction.  The application or operating system could use a trap handler to react to failed GCC runtime checks which caused a trap.

    gcc/

            * tsystem.h (abort): Define abort() if inhibit_libc is defined and it is not already defined.
---
 gcc/tsystem.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tsystem.h b/gcc/tsystem.h
index e1e6a96a4f4..5c72c69ff3e 100644
--- a/gcc/tsystem.h
+++ b/gcc/tsystem.h
@@ -59,7 +59,7 @@ extern int atexit (void (*)(void));
 #endif
 
 #ifndef abort
-extern void abort (void) __attribute__ ((__noreturn__));
+#define abort() __builtin_trap ()
 #endif
 
 #ifndef strlen
</cut>
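The rule stated in the commit message above (when inhibit_libc is defined and abort is not already defined, abort() becomes __builtin_trap()) can be sketched in a standalone C file. This is an illustration under assumptions, not the contents of tsystem.h: the `#define inhibit_libc 1` line stands in for what a GCC build would normally pass on the command line, and `checked_index` is a hypothetical runtime check in the style of gcc_assert(). `__builtin_trap` is a GCC and Clang builtin.

```c
#include <assert.h>

/* Assumption for this sketch: pretend the build defined inhibit_libc.  */
#define inhibit_libc 1

#ifdef inhibit_libc
# ifndef abort
/* No libc abort() is referenced, so no signal or stdio support is pulled
   into a statically linked program.  */
#  define abort() __builtin_trap ()
# endif
#endif

/* Hypothetical runtime check: trap if I is outside [0, N).  */
static int
checked_index (int i, int n)
{
  if (i < 0 || i >= n)
    abort ();	/* Expands to __builtin_trap ().  */
  return i;
}

/* Only the passing path is exercised; a failing check would trap.  */
static void
self_check (void)
{
  assert (checked_index (2, 4) == 2);
}
```

On a failed check the program executes the target's trap instruction, which the application or operating system can handle, as the commit message describes.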

4 years, 10 months


[CI-NOTIFY]: TCWG Bisect tcwg_gcc_bootstrap/master-aarch64-bootstrap_profiled - Build # 3 - Fixed!

by ci_notify＠linaro.org

Successfully identified regression in *binutils* in CI configuration tcwg_gcc_bootstrap/master-aarch64-bootstrap_profiled.  So far, this commit has regressed CI configurations:
 - tcwg_gcc_bootstrap/master-aarch64-bootstrap_profiled

Culprit:
<cut>
commit a12ea97b9dab8eedf411fc5052ffaa8be29f5d36
Author: GDB Administrator <gdbadmin(a)sourceware.org>
Date:   Mon Aug 23 00:00:07 2021 +0000

    Automatic date update in version.in
</cut>

Results regressed to (for first_bad == a12ea97b9dab8eedf411fc5052ffaa8be29f5d36)
# reset_artifacts:
-10
# true:
0
# First few build errors in logs:

from (for last_good == fe7f0b013526b30ef657c5ad34a3c622a54499ac)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_profiled:
2

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-binutils-a12ea97b9dab8eedf411fc5052ffaa8be29f5d36
cd investigate-binutils-a12ea97b9dab8eedf411fc5052ffaa8be29f5d36

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach a12ea97b9dab8eedf411fc5052ffaa8be29f5d36
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach fe7f0b013526b30ef657c5ad34a3c622a54499ac
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…

Full commit (up to 1000 lines):
<cut>
commit a12ea97b9dab8eedf411fc5052ffaa8be29f5d36
Author: GDB Administrator <gdbadmin(a)sourceware.org>
Date:   Mon Aug 23 00:00:07 2021 +0000

    Automatic date update in version.in
---
 bfd/version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bfd/version.h b/bfd/version.h
index d22e02c7fd4..66d43d6f3a1 100644
--- a/bfd/version.h
+++ b/bfd/version.h
@@ -16,7 +16,7 @@
 
    In releases, the date is not included in either version strings or
    sonames.  */
-#define BFD_VERSION_DATE 20210822
+#define BFD_VERSION_DATE 20210823
 #define BFD_VERSION @bfd_version@
 #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@
 #define REPORT_BUGS_TO @report_bugs_to@
</cut>

4 years, 10 months


[ACTIVITY] report week ending 26 Aug

by Peter Maydell

Progress (short week, 3 days):

* UM-2 [QEMU upstream maintainership]
  + QEMU 6.1.0 has now been released
  + Sent out the first arm pullreq for the 6.2 cycle, including another slice of the MVE patches
  + tried to work through some of the codereview backlog

-- PMM

4 years, 10 months


[CI-NOTIFY]: TCWG Bisect tcwg_gcc_bootstrap/master-arm-bootstrap_debug - Build # 2 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap_debug.  So far, this commit has regressed CI configurations:
 - tcwg_gcc_bootstrap/master-arm-bootstrap_debug

Culprit:
<cut>
commit 1d244020246cb155e4de62ca3b302b920a1f513f
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date:   Mon Aug 23 12:37:04 2021 +0100

    Fold sign of LSHIFT_EXPR to eliminate no-op conversions.

    This short patch teaches fold that it is "safe" to change the sign of a left shift, to reduce the number of type conversions in gimple.  As an example:

    unsigned int foo(unsigned int i) {
      return (int)i << 8;
    }

    is currently optimized to:

    unsigned int foo (unsigned int i)
    {
      int i.0_1;
      int _2;
      unsigned int _4;

      <bb 2> [local count: 1073741824]:
      i.0_1 = (int) i_3(D);
      _2 = i.0_1 << 8;
      _4 = (unsigned int) _2;
      return _4;
    }

    with this patch, this now becomes:

    unsigned int foo (unsigned int i)
    {
      unsigned int _2;

      <bb 2> [local count: 1073741824]:
      _2 = i_1(D) << 8;
      return _2;
    }

    which generates exactly the same assembly language.  Aside from the reduced memory usage, the real benefit is that no-op conversions tend to interfere with many folding optimizations.  For example,

    unsigned int bar(unsigned char i) {
        return (i ^ (i<<16)) | (i<<8);
    }

    currently gets (tangled in conversions and) optimized to:

    unsigned int bar (unsigned char i)
    {
      unsigned int _1;
      unsigned int _2;
      int _3;
      int _4;
      unsigned int _6;
      unsigned int _8;

      <bb 2> [local count: 1073741824]:
      _1 = (unsigned int) i_5(D);
      _2 = _1 * 65537;
      _3 = (int) i_5(D);
      _4 = _3 << 8;
      _8 = (unsigned int) _4;
      _6 = _2 | _8;
      return _6;
    }

    but with this patch, bar now optimizes down to:

    unsigned int bar(unsigned char i)
    {
      unsigned int _1;
      unsigned int _4;

      <bb 2> [local count: 1073741824]:
      _1 = (unsigned int) i_3(D);
      _4 = _1 * 65793;
      return _4;
    }

    2021-08-23  Roger Sayle  <roger(a)nextmovesoftware.com>

    gcc/ChangeLog
            * match.pd (shift transformations): Change the sign of an LSHIFT_EXPR if it reduces the number of explicit conversions.

    gcc/testsuite/ChangeLog
            * gcc.dg/fold-convlshift-1.c: New test case.
            * gcc.dg/fold-convlshift-2.c: New test case.
</cut>

Results regressed to (for first_bad == 1d244020246cb155e4de62ca3b302b920a1f513f)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:06:26 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored)
# 00:25:39 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored)
# 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: error: type mismatch in ‘lshift_expr’
# 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: internal compiler error: ‘verify_gimple’ failed
# 00:29:38 make[3]: *** [bitmap.o] Error 1
# 00:34:06 make[2]: *** [all-stage3-gcc] Error 2
# 00:34:06 make[1]: *** [stage3-bubble] Error 2
# 00:34:07 make: *** [all] Error 2

from (for last_good == b320edc0c29c838b0090c3c9be14187d132f73f2)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_debug:
2

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f
cd investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 1d244020246cb155e4de62ca3b302b920a1f513f
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b320edc0c29c838b0090c3c9be14187d132f73f2
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…

Full commit (up to 1000 lines):
<cut>
commit 1d244020246cb155e4de62ca3b302b920a1f513f
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date:   Mon Aug 23 12:37:04 2021 +0100

    Fold sign of LSHIFT_EXPR to eliminate no-op conversions.

    This short patch teaches fold that it is "safe" to change the sign of a left shift, to reduce the number of type conversions in gimple.  As an example:

    unsigned int foo(unsigned int i) {
      return (int)i << 8;
    }

    is currently optimized to:

    unsigned int foo (unsigned int i)
    {
      int i.0_1;
      int _2;
      unsigned int _4;

      <bb 2> [local count: 1073741824]:
      i.0_1 = (int) i_3(D);
      _2 = i.0_1 << 8;
      _4 = (unsigned int) _2;
      return _4;
    }

    with this patch, this now becomes:

    unsigned int foo (unsigned int i)
    {
      unsigned int _2;

      <bb 2> [local count: 1073741824]:
      _2 = i_1(D) << 8;
      return _2;
    }

    which generates exactly the same assembly language.  Aside from the reduced memory usage, the real benefit is that no-op conversions tend to interfere with many folding optimizations.  For example,

    unsigned int bar(unsigned char i) {
        return (i ^ (i<<16)) | (i<<8);
    }

    currently gets (tangled in conversions and) optimized to:

    unsigned int bar (unsigned char i)
    {
      unsigned int _1;
      unsigned int _2;
      int _3;
      int _4;
      unsigned int _6;
      unsigned int _8;

      <bb 2> [local count: 1073741824]:
      _1 = (unsigned int) i_5(D);
      _2 = _1 * 65537;
      _3 = (int) i_5(D);
      _4 = _3 << 8;
      _8 = (unsigned int) _4;
      _6 = _2 | _8;
      return _6;
    }

    but with this patch, bar now optimizes down to:

    unsigned int bar(unsigned char i)
    {
      unsigned int _1;
      unsigned int _4;

      <bb 2> [local count: 1073741824]:
      _1 = (unsigned int) i_3(D);
      _4 = _1 * 65793;
      return _4;
    }

    2021-08-23  Roger Sayle  <roger(a)nextmovesoftware.com>

    gcc/ChangeLog
            * match.pd (shift transformations): Change the sign of an LSHIFT_EXPR if it reduces the number of explicit conversions.

    gcc/testsuite/ChangeLog
            * gcc.dg/fold-convlshift-1.c: New test case.
            * gcc.dg/fold-convlshift-2.c: New test case.
---
 gcc/match.pd                             |  9 +++++++++
 gcc/testsuite/gcc.dg/fold-convlshift-1.c | 20 ++++++++++++++++++++
 gcc/testsuite/gcc.dg/fold-convlshift-2.c | 20 ++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0fcfd0ea62c..978a1b0172e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3385,6 +3385,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (if (integer_zerop (@2) || integer_all_onesp (@2))
      (cmp @0 @2)))))
 
+/* Both signed and unsigned lshift produce the same result, so use
+   the form that minimizes the number of conversions.  */
+(simplify
+ (convert (lshift:s@0 (convert:s@1 @2) INTEGER_CST@3))
+ (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+      && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+      && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type))
+  (lshift (convert @2) @3)))
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-1.c b/gcc/testsuite/gcc.dg/fold-convlshift-1.c
new file mode 100644
index 00000000000..b6f57f81e72
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned int i)
+{
+  int t1 = i;
+  int t2 = t1 << 8;
+  return t2;
+}
+
+int bar(int i)
+{
+  unsigned int t1 = i;
+  unsigned int t2 = t1 << 8;
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(unsigned int\\)" "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-2.c b/gcc/testsuite/gcc.dg/fold-convlshift-2.c
new file mode 100644
index 00000000000..f21358c4584
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned char c)
+{
+  int t1 = c;
+  int t2 = t1 << 8;
+  return t2;
+}
+
+int bar(unsigned char c)
+{
+  unsigned int t1 = c;
+  unsigned int t2 = t1 << 8;
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\(int\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 1 "optimized" } } */
+
</cut>
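The arithmetic behind the `bar` example in the commit message above can be checked directly. For an 8-bit input the three terms `i`, `i << 8` and `i << 16` occupy disjoint bits, so XOR and OR both act like addition and the expression equals `i * (1 + 256 + 65536) = i * 65793`. The C sketch below is only a sanity check of that identity; the function names `bar_original`, `bar_folded` and `bar_forms_agree` are hypothetical, and nothing here reproduces GCC's match.pd machinery or the bootstrap failure reported in this email.

```c
#include <assert.h>

/* The expression exactly as written in the commit message.  */
static unsigned int
bar_original (unsigned char i)
{
  return (i ^ (i << 16)) | (i << 8);
}

/* The single multiplication quoted as the optimized form.  */
static unsigned int
bar_folded (unsigned char i)
{
  return (unsigned int) i * 65793u;
}

/* Return 1 if both forms agree for every possible 8-bit input.  */
static int
bar_forms_agree (void)
{
  for (unsigned int i = 0; i < 256; i++)
    if (bar_original ((unsigned char) i) != bar_folded ((unsigned char) i))
      return 0;
  return 1;
}

/* Exhaustive check over the whole input domain.  */
static void
self_check (void)
{
  assert (bar_forms_agree ());
}
```

Because the input domain has only 256 values, the check is exhaustive rather than sampled.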

4 years, 10 months

