linaro-toolchain

linaro-toolchain@lists.linaro.org

14 participants
5758 discussions

[ACTIVITY] week ending Nov. 28 2021

by Alex Bennée

VirtIO Initiative ([STR-9]) =========================== - [this weeks sync], topics on AF_XDP, virtio-video and virtio-watchdog [upstream rust-vmm sync meeting] <https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…> QEMU Upstream Work ([UM-2]) =========================== - posted [PATCH for 6.2 v2 0/7] more tcg, plugin, test and build fixes Message-Id: <20211125154144.2904741-1-alex.bennee(a)linaro.org> Upstream MTTCG tests ([QEMU-52]) - posted [kvm-unit-tests PATCH v8 00/10] MTTCG sanity tests for ARM Message-Id: <20211118184650.661575-1-alex.bennee(a)linaro.org> [mttcg tests to current state and fixed up] <https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8> Other ===== - renewal feedback Completed Reviews [2/2] ======================= [PATCH v2 0/3] KVM: qemu patches for few KVM features I developed Message-Id: <20211101132300.192584-1-mlevitsk(a)redhat.com> [PATCH v2] hw/intc/arm_gicv3: Update cached state after LPI state changes Message-Id: <20211124202005.989935-1-peter.maydell(a)linaro.org> Absences ======== - off 2 days sick Current Review Queue ==================== TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64 Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com> =============================================================================================================== TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com> =================================================================================================================== TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option Message-Id: <20211013010120.96851-1-sjg(a)chromium.org> =========================================================================================================== TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org> ================================================================================================================================== -- Alex Bennée

4 years, 7 months

[ACTIVITY] report week ending 26 Nov

by Peter Maydell

Progress: * QEMU-420 [GICv4 emulation] - Tracked down and fixed a bug in our ITS emulation which would (intermittently?) result in a Linux guest reporting "irq 54: nobody cared" and hanging, because we were not correctly recalculating the highest priority pending interrupt when the guest acknowledged a pending LPI. This fix will go into 6.2. - Set up a test environment for GICv4 work -- because the major feature of GICv4 is support for directly injecting interrupts into a VM, the test setup needs to be nested virtualization, where an outer L1 guest runs on pure emulated QEMU, the inner L2 guest uses KVM (as provided by L1), and we pass a PCI device (emulated by QEMU) through from L1 to L2. I think I have this correctly set up now, but... - ...the L2 guest hangs because it apparently never sees an interrupt from the passed-through PCI device. This implies a bug in our current GICv3 emulation somewhere: need to track this down before starting in on GICv4 work. - Separately, I found through code inspection a bug where we do the wrong thing in the non-passthrough case when the L1 guest sets a virtual interrupt for the L2 guest in the GIC list registers and that interrupt has an ID > 1023 (ie it is an LPI). We got this wrong both for acknowledging and ending an interrupt, so the two bugs cancel each other out except that we don't set the vCPU priority and so the L2 guest might get an unexpected interrupt while it was servicing the LPI. Patches sent. -- PMM

4 years, 7 months

[TCWG CI] 433.milc slowed down by 5% after llvm: [AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32

by ci_notify＠linaro.org

After llvm commit d7e03df719464354b20a845b7853be57da863924 Author: Jay Foad <jay.foad(a)amd.com> [AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32 the following benchmarks slowed down by more than 2%: - 433.milc slowed down by 5% from 12335 to 12997 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O2 -flto - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-d7e03df719464354b20a845b7853be57da863924 cd investigate-llvm-d7e03df719464354b20a845b7853be57da863924 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach d7e03df719464354b20a845b7853be57da863924 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 8a52bd82e36855b3ad842f2535d0c78a97db55dc ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit d7e03df719464354b20a845b7853be57da863924 Author: Jay Foad <jay.foad(a)amd.com> Date: Fri Nov 12 18:02:58 2021 +0000 [AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32 Select SelectionDAG ops smul_lohi/umul_lohi to v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0. v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions so it is better to use one instruction than two. Further improvements are possible to make better use of the addend operand, but this is already a strict improvement over what we have now. Differential Revision: https://reviews.llvm.org/D113986 --- llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | 29 + llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h | 1 + llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | 49 + llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h | 1 + llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 23 + llvm/lib/Target/AMDGPU/SIISelLowering.h | 1 + .../AMDGPU/atomic_optimizations_global_pointer.ll | 104 +- .../AMDGPU/atomic_optimizations_local_pointer.ll | 108 +- llvm/test/CodeGen/AMDGPU/bypass-div.ll | 1064 +++++++++----------- llvm/test/CodeGen/AMDGPU/llvm.mulo.ll | 178 ++-- llvm/test/CodeGen/AMDGPU/mad_64_32.ll | 110 +- llvm/test/CodeGen/AMDGPU/mul.ll | 55 +- llvm/test/CodeGen/AMDGPU/mul_int24.ll | 9 +- llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll | 24 +- llvm/test/CodeGen/AMDGPU/udiv.ll | 358 +++---- llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll | 126 +-- llvm/test/CodeGen/AMDGPU/wwm-reserved.ll | 16 +- 17 files changed, 1126 insertions(+), 1130 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp index 2e571ad01c1c..8236e6672247 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp @@ -654,6 +654,9 @@ void AMDGPUDAGToDAGISel::Select(SDNode *N) { SelectMAD_64_32(N); return; } + case ISD::SMUL_LOHI: + case ISD::UMUL_LOHI: + return SelectMUL_LOHI(N); case ISD::CopyToReg: { const SITargetLowering& Lowering = *static_cast<const SITargetLowering*>(getTargetLowering()); @@ -1013,6 +1016,32 @@ void AMDGPUDAGToDAGISel::SelectMAD_64_32(SDNode *N) { CurDAG->SelectNodeTo(N, Opc, N->getVTList(), Ops); } +// We need to handle this here because tablegen doesn't support matching +// instructions with multiple outputs. +void AMDGPUDAGToDAGISel::SelectMUL_LOHI(SDNode *N) { + SDLoc SL(N); + bool Signed = N->getOpcode() == ISD::SMUL_LOHI; + unsigned Opc = Signed ? AMDGPU::V_MAD_I64_I32_e64 : AMDGPU::V_MAD_U64_U32_e64; + + SDValue Zero = CurDAG->getTargetConstant(0, SL, MVT::i64); + SDValue Clamp = CurDAG->getTargetConstant(0, SL, MVT::i1); + SDValue Ops[] = {N->getOperand(0), N->getOperand(1), Zero, Clamp}; + SDNode *Mad = CurDAG->getMachineNode(Opc, SL, N->getVTList(), Ops); + if (!SDValue(N, 0).use_empty()) { + SDValue Sub0 = CurDAG->getTargetConstant(AMDGPU::sub0, SL, MVT::i32); + SDNode *Lo = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, SL, + MVT::i32, SDValue(Mad, 0), Sub0); + ReplaceUses(SDValue(N, 0), SDValue(Lo, 0)); + } + if (!SDValue(N, 1).use_empty()) { + SDValue Sub1 = CurDAG->getTargetConstant(AMDGPU::sub1, SL, MVT::i32); + SDNode *Hi = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, SL, + MVT::i32, SDValue(Mad, 0), Sub1); + ReplaceUses(SDValue(N, 1), SDValue(Hi, 0)); + } + CurDAG->RemoveDeadNode(N); +} + bool AMDGPUDAGToDAGISel::isDSOffsetLegal(SDValue Base, unsigned Offset) const { if (!isUInt<16>(Offset)) return false; diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h index 74aff9e406c9..d638d9877a9b 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h @@ -235,6 +235,7 @@ private: void SelectUADDO_USUBO(SDNode *N); void SelectDIV_SCALE(SDNode *N); void SelectMAD_64_32(SDNode *N); + void SelectMUL_LOHI(SDNode *N); void SelectFMA_W_CHAIN(SDNode *N); void SelectFMUL_W_CHAIN(SDNode *N); SDNode *getBFE32(bool IsSigned, const SDLoc &DL, SDValue Val, uint32_t Offset, diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp index 523fa2d3724b..54177564afbc 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp @@ -594,6 +594,8 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM, setTargetDAGCombine(ISD::SRL); setTargetDAGCombine(ISD::TRUNCATE); setTargetDAGCombine(ISD::MUL); + setTargetDAGCombine(ISD::SMUL_LOHI); + setTargetDAGCombine(ISD::UMUL_LOHI); setTargetDAGCombine(ISD::MULHU); setTargetDAGCombine(ISD::MULHS); setTargetDAGCombine(ISD::SELECT); @@ -3462,6 +3464,50 @@ SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N, return DAG.getSExtOrTrunc(Mul, DL, VT); } +SDValue +AMDGPUTargetLowering::performMulLoHiCombine(SDNode *N, + DAGCombinerInfo &DCI) const { + if (N->getValueType(0) != MVT::i32) + return SDValue(); + + SelectionDAG &DAG = DCI.DAG; + SDLoc DL(N); + + SDValue N0 = N->getOperand(0); + SDValue N1 = N->getOperand(1); + + // SimplifyDemandedBits has the annoying habit of turning useful zero_extends + // in the source into any_extends if the result of the mul is truncated. Since + // we can assume the high bits are whatever we want, use the underlying value + // to avoid the unknown high bits from interfering. + if (N0.getOpcode() == ISD::ANY_EXTEND) + N0 = N0.getOperand(0); + if (N1.getOpcode() == ISD::ANY_EXTEND) + N1 = N1.getOperand(0); + + // Try to use two fast 24-bit multiplies (one for each half of the result) + // instead of one slow extending multiply. + unsigned LoOpcode, HiOpcode; + if (Subtarget->hasMulU24() && isU24(N0, DAG) && isU24(N1, DAG)) { + N0 = DAG.getZExtOrTrunc(N0, DL, MVT::i32); + N1 = DAG.getZExtOrTrunc(N1, DL, MVT::i32); + LoOpcode = AMDGPUISD::MUL_U24; + HiOpcode = AMDGPUISD::MULHI_U24; + } else if (Subtarget->hasMulI24() && isI24(N0, DAG) && isI24(N1, DAG)) { + N0 = DAG.getSExtOrTrunc(N0, DL, MVT::i32); + N1 = DAG.getSExtOrTrunc(N1, DL, MVT::i32); + LoOpcode = AMDGPUISD::MUL_I24; + HiOpcode = AMDGPUISD::MULHI_I24; + } else { + return SDValue(); + } + + SDValue Lo = DAG.getNode(LoOpcode, DL, MVT::i32, N0, N1); + SDValue Hi = DAG.getNode(HiOpcode, DL, MVT::i32, N0, N1); + DCI.CombineTo(N, Lo, Hi); + return SDValue(N, 0); +} + SDValue AMDGPUTargetLowering::performMulhsCombine(SDNode *N, DAGCombinerInfo &DCI) const { EVT VT = N->getValueType(0); @@ -4103,6 +4149,9 @@ SDValue AMDGPUTargetLowering::PerformDAGCombine(SDNode *N, return performTruncateCombine(N, DCI); case ISD::MUL: return performMulCombine(N, DCI); + case ISD::SMUL_LOHI: + case ISD::UMUL_LOHI: + return performMulLoHiCombine(N, DCI); case ISD::MULHS: return performMulhsCombine(N, DCI); case ISD::MULHU: diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h index 03632ac18598..daaca8737c5d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h @@ -91,6 +91,7 @@ protected: SDValue performSrlCombine(SDNode *N, DAGCombinerInfo &DCI) const; SDValue performTruncateCombine(SDNode *N, DAGCombinerInfo &DCI) const; SDValue performMulCombine(SDNode *N, DAGCombinerInfo &DCI) const; + SDValue performMulLoHiCombine(SDNode *N, DAGCombinerInfo &DCI) const; SDValue performMulhsCombine(SDNode *N, DAGCombinerInfo &DCI) const; SDValue performMulhuCombine(SDNode *N, DAGCombinerInfo &DCI) const; SDValue performCtlz_CttzCombine(const SDLoc &SL, SDValue Cond, SDValue LHS, diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 519c5b936536..02440044d6e2 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -809,6 +809,11 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM, setOperationAction(ISD::SMULO, MVT::i64, Custom); setOperationAction(ISD::UMULO, MVT::i64, Custom); + if (Subtarget->hasMad64_32()) { + setOperationAction(ISD::SMUL_LOHI, MVT::i32, Custom); + setOperationAction(ISD::UMUL_LOHI, MVT::i32, Custom); + } + setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom); setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::f32, Custom); setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v4f32, Custom); @@ -4691,6 +4696,9 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { case ISD::SMULO: case ISD::UMULO: return lowerXMULO(Op, DAG); + case ISD::SMUL_LOHI: + case ISD::UMUL_LOHI: + return lowerXMUL_LOHI(Op, DAG); case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG); } @@ -5304,6 +5312,21 @@ SDValue SITargetLowering::lowerXMULO(SDValue Op, SelectionDAG &DAG) const { return DAG.getMergeValues({ Result, Overflow }, SL); } +SDValue SITargetLowering::lowerXMUL_LOHI(SDValue Op, SelectionDAG &DAG) const { + if (Op->isDivergent()) { + // Select to V_MAD_[IU]64_[IU]32. + return Op; + } + if (Subtarget->hasSMulHi()) { + // Expand to S_MUL_I32 + S_MUL_HI_[IU]32. + return SDValue(); + } + // The multiply is uniform but we would have to use V_MUL_HI_[IU]32 to + // calculate the high part, so we might as well do the whole thing with + // V_MAD_[IU]64_[IU]32. + return Op; +} + SDValue SITargetLowering::lowerTRAP(SDValue Op, SelectionDAG &DAG) const { if (!Subtarget->isTrapHandlerEnabled() || Subtarget->getTrapHandlerAbi() != GCNSubtarget::TrapHandlerAbi::AMDHSA) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h index 1e48c96ad3c8..ea6ca3f48827 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.h +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h @@ -135,6 +135,7 @@ private: SDValue lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const; SDValue lowerFMINNUM_FMAXNUM(SDValue Op, SelectionDAG &DAG) const; SDValue lowerXMULO(SDValue Op, SelectionDAG &DAG) const; + SDValue lowerXMUL_LOHI(SDValue Op, SelectionDAG &DAG) const; SDValue getSegmentAperture(unsigned AS, const SDLoc &DL, SelectionDAG &DAG) const; diff --git a/llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll b/llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll index 49f05fceb8ed..4ad774db6686 100644 --- a/llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll +++ b/llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll @@ -818,32 +818,29 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 addrspace ; GFX8-NEXT: s_mov_b32 s12, s6 ; GFX8-NEXT: s_bcnt1_i32_b64 s6, s[8:9] ; GFX8-NEXT: v_mov_b32_e32 v0, s6 -; GFX8-NEXT: v_mul_hi_u32 v0, s0, v0 -; GFX8-NEXT: s_mov_b32 s13, s7 -; GFX8-NEXT: s_mul_i32 s7, s1, s6 -; GFX8-NEXT: s_mul_i32 s6, s0, s6 +; GFX8-NEXT: v_mad_u64_u32 v[0:1], s[8:9], s0, v0, 0 +; GFX8-NEXT: s_mul_i32 s6, s1, s6 ; GFX8-NEXT: s_mov_b32 s15, 0xf000 ; GFX8-NEXT: s_mov_b32 s14, -1 -; GFX8-NEXT: v_add_u32_e32 v1, vcc, s7, v0 -; GFX8-NEXT: v_mov_b32_e32 v0, s6 +; GFX8-NEXT: s_mov_b32 s13, s7 +; GFX8-NEXT: v_add_u32_e32 v1, vcc, s6, v1 ; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX8-NEXT: buffer_atomic_add_x2 v[0:1], off, s[12:15], 0 glc ; GFX8-NEXT: s_waitcnt vmcnt(0) ; GFX8-NEXT: buffer_wbinvl1_vol ; GFX8-NEXT: .LBB4_2: ; GFX8-NEXT: s_or_b64 exec, exec, s[2:3] -; GFX8-NEXT: v_readfirstlane_b32 s2, v0 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) -; GFX8-NEXT: v_mul_lo_u32 v0, s1, v2 -; GFX8-NEXT: v_mul_hi_u32 v3, s0, v2 +; GFX8-NEXT: v_mul_lo_u32 v4, s1, v2 +; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s0, v2, 0 +; GFX8-NEXT: v_readfirstlane_b32 s0, v0 ; GFX8-NEXT: v_readfirstlane_b32 s1, v1 -; GFX8-NEXT: v_mul_lo_u32 v1, s0, v2 -; GFX8-NEXT: s_mov_b32 s7, 0xf000 -; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v0 +; GFX8-NEXT: v_add_u32_e32 v1, vcc, v3, v4 ; GFX8-NEXT: v_mov_b32_e32 v3, s1 -; GFX8-NEXT: v_add_u32_e32 v0, vcc, s2, v1 +; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v2 +; GFX8-NEXT: s_mov_b32 s7, 0xf000 ; GFX8-NEXT: s_mov_b32 s6, -1 -; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v3, v2, vcc +; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v3, v1, vcc ; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX8-NEXT: s_endpgm ; @@ -878,17 +875,16 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 addrspace ; GFX9-NEXT: .LBB4_2: ; GFX9-NEXT: s_or_b64 exec, exec, s[0:1] ; GFX9-NEXT: s_waitcnt lgkmcnt(0) -; GFX9-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, s2, v2 +; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0 ; GFX9-NEXT: v_readfirstlane_b32 s0, v0 -; GFX9-NEXT: v_mul_lo_u32 v0, s2, v2 ; GFX9-NEXT: v_readfirstlane_b32 s1, v1 -; GFX9-NEXT: v_add_u32_e32 v1, v4, v3 -; GFX9-NEXT: v_mov_b32_e32 v2, s1 -; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v0 +; GFX9-NEXT: v_add_u32_e32 v1, v3, v4 +; GFX9-NEXT: v_mov_b32_e32 v3, s1 +; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2 ; GFX9-NEXT: s_mov_b32 s7, 0xf000 ; GFX9-NEXT: s_mov_b32 s6, -1 -; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc +; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v3, v1, vcc ; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX9-NEXT: s_endpgm ; @@ -927,14 +923,13 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 addrspace ; GFX1064-NEXT: s_waitcnt_depctr 0xffe3 ; GFX1064-NEXT: s_or_b64 exec, exec, s[0:1] ; GFX1064-NEXT: s_waitcnt lgkmcnt(0) -; GFX1064-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX1064-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX1064-NEXT: v_mul_lo_u32 v2, s2, v2 +; GFX1064-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX1064-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0 ; GFX1064-NEXT: v_readfirstlane_b32 s0, v0 ; GFX1064-NEXT: v_readfirstlane_b32 s1, v1 ; GFX1064-NEXT: s_mov_b32 s7, 0x31016000 ; GFX1064-NEXT: s_mov_b32 s6, -1 -; GFX1064-NEXT: v_add_nc_u32_e32 v1, v4, v3 +; GFX1064-NEXT: v_add_nc_u32_e32 v1, v3, v4 ; GFX1064-NEXT: v_add_co_u32 v0, vcc, s0, v2 ; GFX1064-NEXT: v_add_co_ci_u32_e32 v1, vcc, s1, v1, vcc ; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 @@ -974,14 +969,13 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 addrspace ; GFX1032-NEXT: s_waitcnt_depctr 0xffe3 ; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s0 ; GFX1032-NEXT: s_waitcnt lgkmcnt(0) -; GFX1032-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX1032-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX1032-NEXT: v_mul_lo_u32 v2, s2, v2 +; GFX1032-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX1032-NEXT: v_mad_u64_u32 v[2:3], s0, s2, v2, 0 ; GFX1032-NEXT: v_readfirstlane_b32 s0, v0 ; GFX1032-NEXT: v_readfirstlane_b32 s1, v1 ; GFX1032-NEXT: s_mov_b32 s7, 0x31016000 ; GFX1032-NEXT: s_mov_b32 s6, -1 -; GFX1032-NEXT: v_add_nc_u32_e32 v1, v4, v3 +; GFX1032-NEXT: v_add_nc_u32_e32 v1, v3, v4 ; GFX1032-NEXT: v_add_co_u32 v0, vcc_lo, s0, v2 ; GFX1032-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, s1, v1, vcc_lo ; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 @@ -1955,32 +1949,29 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 addrspace ; GFX8-NEXT: s_mov_b32 s12, s6 ; GFX8-NEXT: s_bcnt1_i32_b64 s6, s[8:9] ; GFX8-NEXT: v_mov_b32_e32 v0, s6 -; GFX8-NEXT: v_mul_hi_u32 v0, s0, v0 -; GFX8-NEXT: s_mov_b32 s13, s7 -; GFX8-NEXT: s_mul_i32 s7, s1, s6 -; GFX8-NEXT: s_mul_i32 s6, s0, s6 +; GFX8-NEXT: v_mad_u64_u32 v[0:1], s[8:9], s0, v0, 0 +; GFX8-NEXT: s_mul_i32 s6, s1, s6 ; GFX8-NEXT: s_mov_b32 s15, 0xf000 ; GFX8-NEXT: s_mov_b32 s14, -1 -; GFX8-NEXT: v_add_u32_e32 v1, vcc, s7, v0 -; GFX8-NEXT: v_mov_b32_e32 v0, s6 +; GFX8-NEXT: s_mov_b32 s13, s7 +; GFX8-NEXT: v_add_u32_e32 v1, vcc, s6, v1 ; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX8-NEXT: buffer_atomic_sub_x2 v[0:1], off, s[12:15], 0 glc ; GFX8-NEXT: s_waitcnt vmcnt(0) ; GFX8-NEXT: buffer_wbinvl1_vol ; GFX8-NEXT: .LBB10_2: ; GFX8-NEXT: s_or_b64 exec, exec, s[2:3] -; GFX8-NEXT: v_readfirstlane_b32 s2, v0 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) -; GFX8-NEXT: v_mul_lo_u32 v0, s1, v2 -; GFX8-NEXT: v_mul_hi_u32 v3, s0, v2 +; GFX8-NEXT: v_mul_lo_u32 v4, s1, v2 +; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s0, v2, 0 +; GFX8-NEXT: v_readfirstlane_b32 s0, v0 ; GFX8-NEXT: v_readfirstlane_b32 s1, v1 -; GFX8-NEXT: v_mul_lo_u32 v1, s0, v2 -; GFX8-NEXT: s_mov_b32 s7, 0xf000 -; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v0 +; GFX8-NEXT: v_add_u32_e32 v1, vcc, v3, v4 ; GFX8-NEXT: v_mov_b32_e32 v3, s1 -; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s2, v1 +; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v2 +; GFX8-NEXT: s_mov_b32 s7, 0xf000 ; GFX8-NEXT: s_mov_b32 s6, -1 -; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v3, v2, vcc +; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v3, v1, vcc ; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX8-NEXT: s_endpgm ; @@ -2015,17 +2006,16 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 addrspace ; GFX9-NEXT: .LBB10_2: ; GFX9-NEXT: s_or_b64 exec, exec, s[0:1] ; GFX9-NEXT: s_waitcnt lgkmcnt(0) -; GFX9-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, s2, v2 +; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0 ; GFX9-NEXT: v_readfirstlane_b32 s0, v0 -; GFX9-NEXT: v_mul_lo_u32 v0, s2, v2 ; GFX9-NEXT: v_readfirstlane_b32 s1, v1 -; GFX9-NEXT: v_add_u32_e32 v1, v4, v3 -; GFX9-NEXT: v_mov_b32_e32 v2, s1 -; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s0, v0 +; GFX9-NEXT: v_add_u32_e32 v1, v3, v4 +; GFX9-NEXT: v_mov_b32_e32 v3, s1 +; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s0, v2 ; GFX9-NEXT: s_mov_b32 s7, 0xf000 ; GFX9-NEXT: s_mov_b32 s6, -1 -; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v2, v1, vcc +; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v3, v1, vcc ; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX9-NEXT: s_endpgm ; @@ -2064,14 +2054,13 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 addrspace ; GFX1064-NEXT: s_waitcnt_depctr 0xffe3 ; GFX1064-NEXT: s_or_b64 exec, exec, s[0:1] ; GFX1064-NEXT: s_waitcnt lgkmcnt(0) -; GFX1064-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX1064-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX1064-NEXT: v_mul_lo_u32 v2, s2, v2 +; GFX1064-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX1064-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0 ; GFX1064-NEXT: v_readfirstlane_b32 s0, v0 ; GFX1064-NEXT: v_readfirstlane_b32 s1, v1 ; GFX1064-NEXT: s_mov_b32 s7, 0x31016000 ; GFX1064-NEXT: s_mov_b32 s6, -1 -; GFX1064-NEXT: v_add_nc_u32_e32 v1, v4, v3 +; GFX1064-NEXT: v_add_nc_u32_e32 v1, v3, v4 ; GFX1064-NEXT: v_sub_co_u32 v0, vcc, s0, v2 ; GFX1064-NEXT: v_sub_co_ci_u32_e32 v1, vcc, s1, v1, vcc ; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 @@ -2111,14 +2100,13 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 addrspace ; GFX1032-NEXT: s_waitcnt_depctr 0xffe3 ; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s0 ; GFX1032-NEXT: s_waitcnt lgkmcnt(0) -; GFX1032-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX1032-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX1032-NEXT: v_mul_lo_u32 v2, s2, v2 +; GFX1032-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX1032-NEXT: v_mad_u64_u32 v[2:3], s0, s2, v2, 0 ; GFX1032-NEXT: v_readfirstlane_b32 s0, v0 ; GFX1032-NEXT: v_readfirstlane_b32 s1, v1 ; GFX1032-NEXT: s_mov_b32 s7, 0x31016000 ; GFX1032-NEXT: s_mov_b32 s6, -1 -; GFX1032-NEXT: v_add_nc_u32_e32 v1, v4, v3 +; GFX1032-NEXT: v_add_nc_u32_e32 v1, v3, v4 ; GFX1032-NEXT: v_sub_co_u32 v0, vcc_lo, s0, v2 ; GFX1032-NEXT: v_sub_co_ci_u32_e32 v1, vcc_lo, s1, v1, vcc_lo ; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 diff --git a/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll b/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll index 455f9de836ba..bf91960537a4 100644 --- a/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll +++ b/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll @@ -954,15 +954,13 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 %additive ; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc ; GFX8-NEXT: s_cbranch_execz .LBB5_2 ; GFX8-NEXT: ; %bb.1: -; GFX8-NEXT: s_bcnt1_i32_b64 s6, s[6:7] -; GFX8-NEXT: v_mov_b32_e32 v0, s6 +; GFX8-NEXT: s_bcnt1_i32_b64 s8, s[6:7] +; GFX8-NEXT: v_mov_b32_e32 v0, s8 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) -; GFX8-NEXT: v_mul_hi_u32 v0, s2, v0 -; GFX8-NEXT: s_mul_i32 s7, s3, s6 -; GFX8-NEXT: s_mul_i32 s6, s2, s6 +; GFX8-NEXT: v_mad_u64_u32 v[0:1], s[6:7], s2, v0, 0 +; GFX8-NEXT: s_mul_i32 s6, s3, s8 ; GFX8-NEXT: v_mov_b32_e32 v3, 0 -; GFX8-NEXT: v_add_u32_e32 v1, vcc, s7, v0 -; GFX8-NEXT: v_mov_b32_e32 v0, s6 +; GFX8-NEXT: v_add_u32_e32 v1, vcc, s6, v1 ; GFX8-NEXT: s_mov_b32 m0, -1 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) ; GFX8-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1] @@ -971,18 +969,17 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 %additive ; GFX8-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX8-NEXT: s_waitcnt lgkmcnt(0) ; GFX8-NEXT: s_mov_b32 s4, s0 -; GFX8-NEXT: v_readfirstlane_b32 s0, v0 -; GFX8-NEXT: v_mul_lo_u32 v0, s3, v2 -; GFX8-NEXT: v_mul_hi_u32 v3, s2, v2 ; GFX8-NEXT: s_mov_b32 s5, s1 +; GFX8-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0 +; GFX8-NEXT: v_readfirstlane_b32 s0, v0 ; GFX8-NEXT: v_readfirstlane_b32 s1, v1 -; GFX8-NEXT: v_mul_lo_u32 v1, s2, v2 -; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v0 +; GFX8-NEXT: v_add_u32_e32 v1, vcc, v3, v4 ; GFX8-NEXT: v_mov_b32_e32 v3, s1 -; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v1 +; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v2 ; GFX8-NEXT: s_mov_b32 s7, 0xf000 ; GFX8-NEXT: s_mov_b32 s6, -1 -; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v3, v2, vcc +; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v3, v1, vcc ; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX8-NEXT: s_endpgm ; @@ -1012,19 +1009,18 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 %additive ; GFX9-NEXT: .LBB5_2: ; GFX9-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX9-NEXT: s_waitcnt lgkmcnt(0) +; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0 ; GFX9-NEXT: s_mov_b32 s4, s0 -; GFX9-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX9-NEXT: v_readfirstlane_b32 s0, v0 -; GFX9-NEXT: v_mul_lo_u32 v0, s2, v2 ; GFX9-NEXT: s_mov_b32 s5, s1 +; GFX9-NEXT: v_readfirstlane_b32 s0, v0 ; GFX9-NEXT: v_readfirstlane_b32 s1, v1 -; GFX9-NEXT: v_add_u32_e32 v1, v4, v3 -; GFX9-NEXT: v_mov_b32_e32 v2, s1 -; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v0 +; GFX9-NEXT: v_add_u32_e32 v1, v3, v4 +; GFX9-NEXT: v_mov_b32_e32 v3, s1 +; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2 ; GFX9-NEXT: s_mov_b32 s7, 0xf000 ; GFX9-NEXT: s_mov_b32 s6, -1 -; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc +; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v3, v1, vcc ; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX9-NEXT: s_endpgm ; @@ -1057,13 +1053,12 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 %additive ; GFX1064-NEXT: s_waitcnt_depctr 0xffe3 ; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX1064-NEXT: s_waitcnt lgkmcnt(0) -; GFX1064-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX1064-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX1064-NEXT: v_mul_lo_u32 v2, s2, v2 +; GFX1064-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX1064-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0 ; GFX1064-NEXT: v_readfirstlane_b32 s2, v0 ; GFX1064-NEXT: v_readfirstlane_b32 s4, v1 ; GFX1064-NEXT: s_mov_b32 s3, 0x31016000 -; GFX1064-NEXT: v_add_nc_u32_e32 v1, v4, v3 +; GFX1064-NEXT: v_add_nc_u32_e32 v1, v3, v4 ; GFX1064-NEXT: v_add_co_u32 v0, vcc, s2, v2 ; GFX1064-NEXT: s_mov_b32 s2, -1 ; GFX1064-NEXT: v_add_co_ci_u32_e32 v1, vcc, s4, v1, vcc @@ -1098,13 +1093,12 @@ define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 %additive ; GFX1032-NEXT: s_waitcnt_depctr 0xffe3 ; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4 ; GFX1032-NEXT: s_waitcnt lgkmcnt(0) -; GFX1032-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX1032-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX1032-NEXT: v_mul_lo_u32 v2, s2, v2 +; GFX1032-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX1032-NEXT: v_mad_u64_u32 v[2:3], s2, s2, v2, 0 ; GFX1032-NEXT: v_readfirstlane_b32 s2, v0 ; GFX1032-NEXT: v_readfirstlane_b32 s4, v1 ; GFX1032-NEXT: s_mov_b32 s3, 0x31016000 -; GFX1032-NEXT: v_add_nc_u32_e32 v1, v4, v3 +; GFX1032-NEXT: v_add_nc_u32_e32 v1, v3, v4 ; GFX1032-NEXT: v_add_co_u32 v0, vcc_lo, s2, v2 ; GFX1032-NEXT: s_mov_b32 s2, -1 ; GFX1032-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, s4, v1, vcc_lo @@ -2133,15 +2127,13 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 %subitive ; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc ; GFX8-NEXT: s_cbranch_execz .LBB12_2 ; GFX8-NEXT: ; %bb.1: -; GFX8-NEXT: s_bcnt1_i32_b64 s6, s[6:7] -; GFX8-NEXT: v_mov_b32_e32 v0, s6 +; GFX8-NEXT: s_bcnt1_i32_b64 s8, s[6:7] +; GFX8-NEXT: v_mov_b32_e32 v0, s8 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) -; GFX8-NEXT: v_mul_hi_u32 v0, s2, v0 -; GFX8-NEXT: s_mul_i32 s7, s3, s6 -; GFX8-NEXT: s_mul_i32 s6, s2, s6 +; GFX8-NEXT: v_mad_u64_u32 v[0:1], s[6:7], s2, v0, 0 +; GFX8-NEXT: s_mul_i32 s6, s3, s8 ; GFX8-NEXT: v_mov_b32_e32 v3, 0 -; GFX8-NEXT: v_add_u32_e32 v1, vcc, s7, v0 -; GFX8-NEXT: v_mov_b32_e32 v0, s6 +; GFX8-NEXT: v_add_u32_e32 v1, vcc, s6, v1 ; GFX8-NEXT: s_mov_b32 m0, -1 ; GFX8-NEXT: s_waitcnt lgkmcnt(0) ; GFX8-NEXT: ds_sub_rtn_u64 v[0:1], v3, v[0:1] @@ -2150,18 +2142,17 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 %subitive ; GFX8-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX8-NEXT: s_waitcnt lgkmcnt(0) ; GFX8-NEXT: s_mov_b32 s4, s0 -; GFX8-NEXT: v_readfirstlane_b32 s0, v0 -; GFX8-NEXT: v_mul_lo_u32 v0, s3, v2 -; GFX8-NEXT: v_mul_hi_u32 v3, s2, v2 ; GFX8-NEXT: s_mov_b32 s5, s1 +; GFX8-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0 +; GFX8-NEXT: v_readfirstlane_b32 s0, v0 ; GFX8-NEXT: v_readfirstlane_b32 s1, v1 -; GFX8-NEXT: v_mul_lo_u32 v1, s2, v2 -; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v0 +; GFX8-NEXT: v_add_u32_e32 v1, vcc, v3, v4 ; GFX8-NEXT: v_mov_b32_e32 v3, s1 -; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v1 +; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v2 ; GFX8-NEXT: s_mov_b32 s7, 0xf000 ; GFX8-NEXT: s_mov_b32 s6, -1 -; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v3, v2, vcc +; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v3, v1, vcc ; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX8-NEXT: s_endpgm ; @@ -2191,19 +2182,18 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 %subitive ; GFX9-NEXT: .LBB12_2: ; GFX9-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX9-NEXT: s_waitcnt lgkmcnt(0) +; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0 ; GFX9-NEXT: s_mov_b32 s4, s0 -; GFX9-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX9-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX9-NEXT: v_readfirstlane_b32 s0, v0 -; GFX9-NEXT: v_mul_lo_u32 v0, s2, v2 ; GFX9-NEXT: s_mov_b32 s5, s1 +; GFX9-NEXT: v_readfirstlane_b32 s0, v0 ; GFX9-NEXT: v_readfirstlane_b32 s1, v1 -; GFX9-NEXT: v_add_u32_e32 v1, v4, v3 -; GFX9-NEXT: v_mov_b32_e32 v2, s1 -; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s0, v0 +; GFX9-NEXT: v_add_u32_e32 v1, v3, v4 +; GFX9-NEXT: v_mov_b32_e32 v3, s1 +; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s0, v2 ; GFX9-NEXT: s_mov_b32 s7, 0xf000 ; GFX9-NEXT: s_mov_b32 s6, -1 -; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v2, v1, vcc +; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v3, v1, vcc ; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 ; GFX9-NEXT: s_endpgm ; @@ -2236,13 +2226,12 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 %subitive ; GFX1064-NEXT: s_waitcnt_depctr 0xffe3 ; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5] ; GFX1064-NEXT: s_waitcnt lgkmcnt(0) -; GFX1064-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX1064-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX1064-NEXT: v_mul_lo_u32 v2, s2, v2 +; GFX1064-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX1064-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0 ; GFX1064-NEXT: v_readfirstlane_b32 s2, v0 ; GFX1064-NEXT: v_readfirstlane_b32 s4, v1 ; GFX1064-NEXT: s_mov_b32 s3, 0x31016000 -; GFX1064-NEXT: v_add_nc_u32_e32 v1, v4, v3 +; GFX1064-NEXT: v_add_nc_u32_e32 v1, v3, v4 ; GFX1064-NEXT: v_sub_co_u32 v0, vcc, s2, v2 ; GFX1064-NEXT: s_mov_b32 s2, -1 ; GFX1064-NEXT: v_sub_co_ci_u32_e32 v1, vcc, s4, v1, vcc @@ -2277,13 +2266,12 @@ define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 %subitive ; GFX1032-NEXT: s_waitcnt_depctr 0xffe3 ; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4 ; GFX1032-NEXT: s_waitcnt lgkmcnt(0) -; GFX1032-NEXT: v_mul_lo_u32 v3, s3, v2 -; GFX1032-NEXT: v_mul_hi_u32 v4, s2, v2 -; GFX1032-NEXT: v_mul_lo_u32 v2, s2, v2 +; GFX1032-NEXT: v_mul_lo_u32 v4, s3, v2 +; GFX1032-NEXT: v_mad_u64_u32 v[2:3], s2, s2, v2, 0 ; GFX1032-NEXT: v_readfirstlane_b32 s2, v0 ; GFX1032-NEXT: v_readfirstlane_b32 s4, v1 ; GFX1032-NEXT: s_mov_b32 s3, 0x31016000 -; GFX1032-NEXT: v_add_nc_u32_e32 v1, v4, v3 +; GFX1032-NEXT: v_add_nc_u32_e32 v1, v3, v4 ; GFX1032-NEXT: v_sub_co_u32 v0, vcc_lo, s2, v2 ; GFX1032-NEXT: s_mov_b32 s2, -1 ; GFX1032-NEXT: v_sub_co_ci_u32_e32 v1, vcc_lo, s4, v1, vcc_lo diff --git a/llvm/test/CodeGen/AMDGPU/bypass-div.ll b/llvm/test/CodeGen/AMDGPU/bypass-div.ll index 4ff9f6159cae..907ba8dd3086 100644 --- a/llvm/test/CodeGen/AMDGPU/bypass-div.ll +++ b/llvm/test/CodeGen/AMDGPU/bypass-div.ll @@ -16,119 +16,107 @@ define i64 @sdiv64(i64 %a, i64 %b) { ; GFX9-NEXT: s_xor_b64 s[6:7], exec, s[4:5] ; GFX9-NEXT: s_cbranch_execz .LBB0_2 ; GFX9-NEXT: ; %bb.1: -; GFX9-NEXT: v_ashrrev_i32_e32 v4, 31, v3 -; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v2, v4 -; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v4, vcc -; GFX9-NEXT: v_xor_b32_e32 v3, v3, v4 -; GFX9-NEXT: v_xor_b32_e32 v2, v2, v4 -; GFX9-NEXT: v_cvt_f32_u32_e32 v5, v2 -; GFX9-NEXT: v_cvt_f32_u32_e32 v6, v3 -; GFX9-NEXT: v_sub_co_u32_e32 v7, vcc, 0, v2 -; GFX9-NEXT: v_subb_co_u32_e32 v8, vcc, 0, v3, vcc -; GFX9-NEXT: v_mac_f32_e32 v5, 0x4f800000, v6 -; GFX9-NEXT: v_rcp_f32_e32 v5, v5 +; GFX9-NEXT: v_ashrrev_i32_e32 v9, 31, v3 +; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v2, v9 +; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v9, vcc +; GFX9-NEXT: v_xor_b32_e32 v10, v3, v9 +; GFX9-NEXT: v_xor_b32_e32 v11, v2, v9 +; GFX9-NEXT: v_cvt_f32_u32_e32 v2, v11 +; GFX9-NEXT: v_cvt_f32_u32_e32 v3, v10 +; GFX9-NEXT: v_sub_co_u32_e32 v7, vcc, 0, v11 +; GFX9-NEXT: v_subb_co_u32_e32 v8, vcc, 0, v10, vcc +; GFX9-NEXT: v_mac_f32_e32 v2, 0x4f800000, v3 +; GFX9-NEXT: v_rcp_f32_e32 v2, v2 ; GFX9-NEXT: v_mov_b32_e32 v14, 0 -; GFX9-NEXT: v_mul_f32_e32 v5, 0x5f7ffffc, v5 -; GFX9-NEXT: v_mul_f32_e32 v6, 0x2f800000, v5 -; GFX9-NEXT: v_trunc_f32_e32 v6, v6 -; GFX9-NEXT: v_mac_f32_e32 v5, 0xcf800000, v6 -; GFX9-NEXT: v_cvt_u32_f32_e32 v6, v6 -; GFX9-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GFX9-NEXT: v_mul_lo_u32 v11, v7, v6 -; GFX9-NEXT: v_mul_lo_u32 v9, v8, v5 -; GFX9-NEXT: v_mul_hi_u32 v10, v7, v5 -; GFX9-NEXT: v_mul_lo_u32 v12, v7, v5 -; GFX9-NEXT: v_add3_u32 v9, v10, v11, v9 -; GFX9-NEXT: v_mul_lo_u32 v10, v5, v9 -; GFX9-NEXT: v_mul_hi_u32 v11, v5, v12 -; GFX9-NEXT: v_mul_hi_u32 v13, v5, v9 -; GFX9-NEXT: v_mul_hi_u32 v15, v6, v9 -; GFX9-NEXT: v_mul_lo_u32 v9, v6, v9 -; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, v11, v10 -; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v13, vcc -; GFX9-NEXT: v_mul_lo_u32 v13, v6, v12 -; GFX9-NEXT: v_mul_hi_u32 v12, v6, v12 -; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, v10, v13 -; GFX9-NEXT: v_addc_co_u32_e32 v10, vcc, v11, v12, vcc -; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, v15, v14, vcc -; GFX9-NEXT: v_add_co_u32_e32 v9, vcc, v10, v9 -; GFX9-NEXT: v_addc_co_u32_e32 v10, vcc, 0, v11, vcc -; GFX9-NEXT: v_add_co_u32_e32 v5, vcc, v5, v9 -; GFX9-NEXT: v_addc_co_u32_e32 v6, vcc, v6, v10, vcc -; GFX9-NEXT: v_mul_lo_u32 v9, v7, v6 -; GFX9-NEXT: v_mul_lo_u32 v8, v8, v5 -; GFX9-NEXT: v_mul_hi_u32 v10, v7, v5 -; GFX9-NEXT: v_mul_lo_u32 v7, v7, v5 -; GFX9-NEXT: v_add3_u32 v8, v10, v9, v8 -; GFX9-NEXT: v_mul_lo_u32 v11, v5, v8 -; GFX9-NEXT: v_mul_hi_u32 v12, v5, v7 -; GFX9-NEXT: v_mul_hi_u32 v13, v5, v8 -; GFX9-NEXT: v_mul_hi_u32 v10, v6, v7 -; GFX9-NEXT: v_mul_lo_u32 v7, v6, v7 -; GFX9-NEXT: v_mul_hi_u32 v9, v6, v8 -; GFX9-NEXT: v_add_co_u32_e32 v11, vcc, v12, v11 -; GFX9-NEXT: v_addc_co_u32_e32 v12, vcc, 0, v13, vcc -; GFX9-NEXT: v_mul_lo_u32 v8, v6, v8 -; GFX9-NEXT: v_add_co_u32_e32 v7, vcc, v11, v7 -; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, v12, v10, vcc -; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, v9, v14, vcc -; GFX9-NEXT: v_add_co_u32_e32 v7, vcc, v7, v8 -; GFX9-NEXT: v_addc_co_u32_e32 v8, vcc, 0, v9, vcc -; GFX9-NEXT: v_add_co_u32_e32 v5, vcc, v5, v7 -; GFX9-NEXT: v_addc_co_u32_e32 v6, vcc, v6, v8, vcc -; GFX9-NEXT: v_ashrrev_i32_e32 v7, 31, v1 -; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, v0, v7 -; GFX9-NEXT: v_xor_b32_e32 v0, v0, v7 -; GFX9-NEXT: v_mul_lo_u32 v8, v0, v6 -; GFX9-NEXT: v_mul_hi_u32 v9, v0, v5 -; GFX9-NEXT: v_mul_hi_u32 v10, v0, v6 -; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v7, vcc -; GFX9-NEXT: v_xor_b32_e32 v1, v1, v7 -; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, v9, v8 -; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v10, vcc -; GFX9-NEXT: v_mul_lo_u32 v10, v1, v5 -; GFX9-NEXT: v_mul_hi_u32 v5, v1, v5 -; GFX9-NEXT: v_mul_hi_u32 v11, v1, v6 -; GFX9-NEXT: v_mul_lo_u32 v6, v1, v6 -; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, v8, v10 -; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, v9, v5, vcc -; GFX9-NEXT: v_addc_co_u32_e32 v8, vcc, v11, v14, vcc -; GFX9-NEXT: v_add_co_u32_e32 v5, vcc, v5, v6 -; GFX9-NEXT: v_addc_co_u32_e32 v6, vcc, 0, v8, vcc -; GFX9-NEXT: v_mul_lo_u32 v8, v3, v5 -; GFX9-NEXT: v_mul_lo_u32 v9, v2, v6 -; GFX9-NEXT: v_mul_hi_u32 v10, v2, v5 -; GFX9-NEXT: v_mul_lo_u32 v11, v2, v5 -; GFX9-NEXT: v_add3_u32 v8, v10, v9, v8 -; GFX9-NEXT: v_sub_u32_e32 v9, v1, v8 -; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, v0, v11 -; GFX9-NEXT: v_subb_co_u32_e64 v9, s[4:5], v9, v3, vcc -; GFX9-NEXT: v_sub_co_u32_e64 v10, s[4:5], v0, v2 -; GFX9-NEXT: v_subbrev_co_u32_e64 v9, s[4:5], 0, v9, s[4:5] -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v9, v3 -; GFX9-NEXT: v_cndmask_b32_e64 v11, 0, -1, s[4:5] -; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v10, v2 -; GFX9-NEXT: v_cndmask_b32_e64 v10, 0, -1, s[4:5] -; GFX9-NEXT: v_cmp_eq_u32_e64 s[4:5], v9, v3 -; GFX9-NEXT: v_cndmask_b32_e64 v9, v11, v10, s[4:5] -; GFX9-NEXT: v_add_co_u32_e64 v10, s[4:5], 2, v5 -; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v1, v8, vcc -; GFX9-NEXT: v_addc_co_u32_e64 v11, s[4:5], 0, v6, s[4:5] -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v1, v3 -; GFX9-NEXT: v_add_co_u32_e64 v12, s[4:5], 1, v5 -; GFX9-NEXT: v_cndmask_b32_e64 v8, 0, -1, vcc -; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v2 -; GFX9-NEXT: v_addc_co_u32_e64 v13, s[4:5], 0, v6, s[4:5] +; GFX9-NEXT: v_mul_f32_e32 v2, 0x5f7ffffc, v2 +; GFX9-NEXT: v_mul_f32_e32 v3, 0x2f800000, v2 +; GFX9-NEXT: v_trunc_f32_e32 v3, v3 +; GFX9-NEXT: v_mac_f32_e32 v2, 0xcf800000, v3 +; GFX9-NEXT: v_cvt_u32_f32_e32 v6, v2 +; GFX9-NEXT: v_cvt_u32_f32_e32 v12, v3 +; GFX9-NEXT: v_mul_lo_u32 v4, v8, v6 +; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[4:5], v7, v6, 0 +; GFX9-NEXT: v_mul_lo_u32 v5, v7, v12 +; GFX9-NEXT: v_mul_hi_u32 v13, v6, v2 +; GFX9-NEXT: v_add3_u32 v5, v3, v5, v4 +; GFX9-NEXT: v_mad_u64_u32 v[3:4], s[4:5], v6, v5, 0 +; GFX9-NEXT: v_add_co_u32_e32 v13, vcc, v13, v3 +; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[4:5], v12, v2, 0 +; GFX9-NEXT: v_addc_co_u32_e32 v15, vcc, 0, v4, vcc +; GFX9-NEXT: v_mad_u64_u32 v[4:5], s[4:5], v12, v5, 0 +; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v13, v2 +; GFX9-NEXT: v_addc_co_u32_e32 v2, vcc, v15, v3, vcc +; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v14, vcc +; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v2, v4 +; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc +; GFX9-NEXT: v_add_co_u32_e32 v13, vcc, v6, v2 +; GFX9-NEXT: v_addc_co_u32_e32 v12, vcc, v12, v3, vcc +; GFX9-NEXT: v_mul_lo_u32 v4, v7, v12 +; GFX9-NEXT: v_mul_lo_u32 v5, v8, v13 +; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[4:5], v7, v13, 0 +; GFX9-NEXT: v_add3_u32 v5, v3, v4, v5 +; GFX9-NEXT: v_mad_u64_u32 v[3:4], s[4:5], v12, v5, 0 +; GFX9-NEXT: v_mad_u64_u32 v[5:6], s[4:5], v13, v5, 0 +; GFX9-NEXT: v_mul_hi_u32 v15, v13, v2 +; GFX9-NEXT: v_mad_u64_u32 v[7:8], s[4:5], v12, v2, 0 +; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v15, v5 +; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v6, vcc +; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v2, v7 +; GFX9-NEXT: v_addc_co_u32_e32 v2, vcc, v5, v8, vcc +; GFX9-NEXT: v_addc_co_u32_e32 v4, vcc, v4, v14, vcc +; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v2, v3 +; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v4, vcc +; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v13, v2 +; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, v12, v3, vcc +; GFX9-NEXT: v_ashrrev_i32_e32 v4, 31, v1 +; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, v0, v4 +; GFX9-NEXT: v_xor_b32_e32 v6, v0, v4 +; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, v1, v4, vcc +; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v6, v3, 0 +; GFX9-NEXT: v_mul_hi_u32 v7, v6, v2 +; GFX9-NEXT: v_xor_b32_e32 v5, v5, v4 +; GFX9-NEXT: v_add_co_u32_e32 v7, vcc, v7, v0 +; GFX9-NEXT: v_addc_co_u32_e32 v8, vcc, 0, v1, vcc +; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v5, v2, 0 +; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[4:5], v5, v3, 0 +; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, v7, v0 +; GFX9-NEXT: v_addc_co_u32_e32 v0, vcc, v8, v1, vcc +; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v3, v14, vcc +; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v0, v2 +; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v1, vcc +; GFX9-NEXT: v_mul_lo_u32 v7, v10, v2 +; GFX9-NEXT: v_mul_lo_u32 v8, v11, v3 +; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v11, v2, 0 +; GFX9-NEXT: v_add3_u32 v1, v1, v8, v7 +; GFX9-NEXT: v_sub_u32_e32 v7, v5, v1 +; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, v6, v0 +; GFX9-NEXT: v_subb_co_u32_e64 v6, s[4:5], v7, v10, vcc +; GFX9-NEXT: v_sub_co_u32_e64 v7, s[4:5], v0, v11 +; GFX9-NEXT: v_subbrev_co_u32_e64 v6, s[4:5], 0, v6, s[4:5] +; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v10 +; GFX9-NEXT: v_cndmask_b32_e64 v8, 0, -1, s[4:5] +; GFX9-NEXT: v_cmp_ge_u32_e64 s[4:5], v7, v11 +; GFX9-NEXT: v_cndmask_b32_e64 v7, 0, -1, s[4:5] +; GFX9-NEXT: v_cmp_eq_u32_e64 s[4:5], v6, v10 +; GFX9-NEXT: v_cndmask_b32_e64 v6, v8, v7, s[4:5] +; GFX9-NEXT: v_add_co_u32_e64 v7, s[4:5], 2, v2 +; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v5, v1, vcc +; GFX9-NEXT: v_addc_co_u32_e64 v8, s[4:5], 0, v3, s[4:5] +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v1, v10 +; GFX9-NEXT: v_add_co_u32_e64 v12, s[4:5], 1, v2 +; GFX9-NEXT: v_cndmask_b32_e64 v5, 0, -1, vcc +; GFX9-NEXT: v_cmp_ge_u32_e32 vcc, v0, v11 +; GFX9-NEXT: v_addc_co_u32_e64 v13, s[4:5], 0, v3, s[4:5] ; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, -1, vcc -; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, v1, v3 -; GFX9-NEXT: v_cmp_ne_u32_e64 s[4:5], 0, v9 -; GFX9-NEXT: v_cndmask_b32_e32 v0, v8, v0, vcc +; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, v1, v10 +; GFX9-NEXT: v_cmp_ne_u32_e64 s[4:5], 0, v6 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v0, vcc ; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0 -; GFX9-NEXT: v_cndmask_b32_e64 v1, v12, v10, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e64 v9, v13, v11, s[4:5] -; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc -; GFX9-NEXT: v_xor_b32_e32 v2, v7, v4 -; GFX9-NEXT: v_cndmask_b32_e32 v0, v6, v9, vcc +; GFX9-NEXT: v_cndmask_b32_e64 v1, v12, v7, s[4:5] +; GFX9-NEXT: v_cndmask_b32_e64 v6, v13, v8, s[4:5] +; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc +; GFX9-NEXT: v_xor_b32_e32 v2, v4, v9 +; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v6, vcc ; GFX9-NEXT: v_xor_b32_e32 v1, v1, v2 ; GFX9-NEXT: v_xor_b32_e32 v0, v0, v2 ; GFX9-NEXT: v_sub_co_u32_e32 v4, vcc, v1, v2 @@ -183,106 +171,94 @@ define i64 @udiv64(i64 %a, i64 %b) { ; GFX9-NEXT: ; %bb.1: ; GFX9-NEXT: v_cvt_f32_u32_e32 v4, v2 ; GFX9-NEXT: v_cvt_f32_u32_e32 v5, v3 -; GFX9-NEXT: v_sub_co_u32_e32 v6, vcc, 0, v2 -; GFX9-NEXT: v_subb_co_u32_e32 v7, vcc, 0, v3, vcc +; GFX9-NEXT: v_sub_co_u32_e32 v10, vcc, 0, v2 +; GFX9-NEXT: v_subb_co_u32_e32 v11, vcc, 0, v3, vcc ; GFX9-NEXT: v_mac_f32_e32 v4, 0x4f800000, v5 ; GFX9-NEXT: v_rcp_f32_e32 v4, v4 -; GFX9-NEXT: v_mov_b32_e32 v12, 0 +; GFX9-NEXT: v_mov_b32_e32 v13, 0 ; GFX9-NEXT: v_mul_f32_e32 v4, 0x5f7ffffc, v4 ; GFX9-NEXT: v_mul_f32_e32 v5, 0x2f800000, v4 ; GFX9-NEXT: v_trunc_f32_e32 v5, v5 ; GFX9-NEXT: v_mac_f32_e32 v4, 0xcf800000, v5 -; GFX9-NEXT: v_cvt_u32_f32_e32 v5, v5 -; GFX9-NEXT: v_cvt_u32_f32_e32 v4, v4 -; GFX9-NEXT: v_mul_lo_u32 v8, v6, v5 -; GFX9-NEXT: v_mul_lo_u32 v9, v7, v4 -; GFX9-NEXT: v_mul_hi_u32 v10, v6, v4 -; GFX9-NEXT: v_mul_lo_u32 v11, v6, v4 -; GFX9-NEXT: v_add3_u32 v8, v10, v8, v9 -; GFX9-NEXT: v_mul_hi_u32 v9, v4, v11 -; GFX9-NEXT: v_mul_lo_u32 v10, v4, v8 -; GFX9-NEXT: v_mul_hi_u32 v13, v4, v8 -; GFX9-NEXT: v_mul_hi_u32 v14, v5, v8 -; GFX9-NEXT: v_mul_lo_u32 v8, v5, v8 -; GFX9-NEXT: v_add_co_u32_e32 v9, vcc, v9, v10 -; GFX9-NEXT: v_addc_co_u32_e32 v10, vcc, 0, v13, vcc -; GFX9-NEXT: v_mul_lo_u32 v13, v5, v11 -; GFX9-NEXT: v_mul_hi_u32 v11, v5, v11 -; GFX9-NEXT: v_add_co_u32_e32 v9, vcc, v9, v13 -; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, v10, v11, vcc -; GFX9-NEXT: v_addc_co_u32_e32 v10, vcc, v14, v12, vcc -; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, v9, v8 -; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, 0, v10, vcc -; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, v4, v8 -; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, v5, v9, vcc -; GFX9-NEXT: v_mul_lo_u32 v8, v6, v5 -; GFX9-NEXT: v_mul_lo_u32 v7, v7, v4 -; GFX9-NEXT: v_mul_hi_u32 v9, v6, v4 -; GFX9-NEXT: v_mul_lo_u32 v6, v6, v4 -; GFX9-NEXT: v_add3_u32 v7, v9, v8, v7 -; GFX9-NEXT: v_mul_lo_u32 v10, v4, v7 -; GFX9-NEXT: v_mul_hi_u32 v11, v4, v6 -; GFX9-NEXT: v_mul_hi_u32 v13, v4, v7 -; GFX9-NEXT: v_mul_hi_u32 v9, v5, v6 -; GFX9-NEXT: v_mul_lo_u32 v6, v5, v6 -; GFX9-NEXT: v_mul_hi_u32 v8, v5, v7 -; GFX9-NEXT: v_add_co_u32_e32 v10, vcc, v11, v10 -; GFX9-NEXT: v_addc_co_u32_e32 v11, vcc, 0, v13, vcc -; GFX9-NEXT: v_mul_lo_u32 v7, v5, v7 -; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, v10, v6 -; GFX9-NEXT: v_addc_co_u32_e32 v6, vcc, v11, v9, vcc -; GFX9-NEXT: v_addc_co_u32_e32 v8, vcc, v8, v12, vcc -; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, v6, v7 -; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v8, vcc +; GFX9-NEXT: v_cvt_u32_f32_e32 v8, v5 +; GFX9-NEXT: v_cvt_u32_f32_e32 v9, v4 +; GFX9-NEXT: v_mul_lo_u32 v6, v10, v8 +; GFX9-NEXT: v_mul_lo_u32 v7, v11, v9 +; GFX9-NEXT: v_mad_u64_u32 v[4:5], s[4:5], v10, v9, 0 +; GFX9-NEXT: v_add3_u32 v7, v5, v6, v7 +; GFX9-NEXT: v_mul_hi_u32 v12, v9, v4 +; GFX9-NEXT: v_mad_u64_u32 v[5:6], s[4:5], v9, v7, 0 +; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, v12, v5 +; GFX9-NEXT: v_mad_u64_u32 v[4:5], s[4:5], v8, v4, 0 +; GFX9-NEXT: v_addc_co_u32_e32 v14, vcc, 0, v6, vcc +; GFX9-NEXT: v_mad_u64_u32 v[6:7], s[4:5], v8, v7, 0 +; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, v12, v4 +; GFX9-NEXT: v_addc_co_u32_e32 v4, vcc, v14, v5, vcc +; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, v7, v13, vcc ; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, v4, v6 -; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, v5, v7, vcc -; GFX9-NEXT: v_mul_lo_u32 v6, v0, v5 -; GFX9-NEXT: v_mul_hi_u32 v7, v0, v4 -; GFX9-NEXT: v_mul_hi_u32 v8, v0, v5 -; GFX9-NEXT: v_mul_hi_u32 v9, v1, v5 -; GFX9-NEXT: v_mul_lo_u32 v5, v1, v5 -; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, v7, v6 +; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc +; GFX9-NEXT: v_add_co_u32_e32 v12, vcc, v9, v4 +; GFX9-NEXT: v_addc_co_u32_e32 v14, vcc, v8, v5, vcc +; GFX9-NEXT: v_mul_lo_u32 v6, v10, v14 +; GFX9-NEXT: v_mul_lo_u32 v7, v11, v12 +; GFX9-NEXT: v_mad_u64_u32 v[4:5], s[4:5], v10, v12, 0 +; GFX9-NEXT: v_add3_u32 v7, v5, v6, v7 +; GFX9-NEXT: v_mad_u64_u32 v[5:6], s[4:5], v14, v7, 0 +; GFX9-NEXT: v_mad_u64_u32 v[7:8], s[4:5], v12, v7, 0 +; GFX9-NEXT: v_mul_hi_u32 v11, v12, v4 +; GFX9-NEXT: v_mad_u64_u32 v[9:10], s[4:5], v14, v4, 0 +; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, v11, v7 ; GFX9-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v8, vcc -; GFX9-NEXT: v_mul_lo_u32 v8, v1, v4 -; GFX9-NEXT: v_mul_hi_u32 v4, v1, v4 -; GFX9-NEXT: v_add_co_u32_e32 v6, vcc, v6, v8 -; GFX9-NEXT: v_addc_co_u32_e32 v4, vcc, v7, v4, vcc -; GFX9-NEXT: v_addc_co_u32_e32 v6, vcc, v9, v12, vcc +; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, v4, v9 +; GFX9-NEXT: v_addc_co_u32_e32 v4, vcc, v7, v10, vcc +; GFX9-NEXT: v_addc_co_u32_e32 v6, vcc, v6, v13, vcc ; GFX9-NEXT: v_add_co_u32_e32 v4, vcc, v4, v5 ; GFX9-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v6, vcc -; GFX9-NEXT: v_mul_lo_u32 v6, v3, v4 -; GFX9-NEXT: v_mul_lo_u32 v7, v2, v5 -; GFX9-NEXT: v_mul_hi_u32 v8, v2, v4 </cut>

4 years, 7 months

[ACTIVITY] week ending 21 Nov 2021

by Richard Henderson

[UM-2] * release work * revived some 6month old ppc fpu fixes * reviews: loongarch, riscv, watchpoints, gdbstub. r~

4 years, 7 months

[ACTIVITY] week ending Nov. 21 2021

by Alex Bennée

VirtIO Initiative ([STR-9]) =========================== - posted Initial thoughts for test scenarios for AF_XDP epic Message-Id: <87k0h5v6ju.fsf(a)linaro.org> vhost-device maintainer effort ([UM-196]) - more review - [did some more noodling with rust] to get comfortable with generics [UM-196] <https://linaro.atlassian.net/browse/UM-196> [vhost-device crate] <https://github.com/rust-vmm/vhost-device> [did some more noodling with rust] <https://gitlab.com/stsquad/softfloat.rs> QEMU Upstream Work ([UM-2]) =========================== - posted [PULL for 6.2 0/7] misc build and test fixes Message-Id: <20211116162515.4100231-1-alex.bennee(a)linaro.org> - posted [RFC PATCH] tests/avocado: fix tcg_plugin mem access count test Message-Id: <20211117095448.136558-1-alex.bennee(a)linaro.org> - posted Re: [RFC PATCH] plugins/meson.build: fix linker issue with weird paths (for v6.2?) Message-Id: <20211117111924.179776-1-alex.bennee(a)linaro.org> - posted Re: [PATCH v2 1/3] icount: preserve cflags when custom tb is about to execute Message-Id: <87h7cbw1tx.fsf(a)linaro.org> - posted [RFC PATCH] gdbstub: handle a potentially racing TaskState Message-Id: <20211119145124.942390-1-alex.bennee(a)linaro.org> [UM-2] <https://linaro.atlassian.net/browse/UM-2> Upstream MTTCG tests ([QEMU-52]) - posted [kvm-unit-tests PATCH v3 0/3] GIC ITS tests Message-Id: <20211112114734.3058678-1-alex.bennee(a)linaro.org> - posted [kvm-unit-tests PATCH v8 00/10] MTTCG sanity tests for ARM Message-Id: <20211118184650.661575-1-alex.bennee(a)linaro.org> [QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52> [mttcg tests to current state and fixed up] <https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8> Completed Reviews [2/2] ======================= [PATCH v2 0/3] Some watchpoint-related patches Message-Id: <163662450348.125458.5494710452733592356.stgit@pasha-ThinkPad-X280> [PATCH 0/5] Update linux-headers + NOIRQ support for KVM gdbstub Message-Id: <20211111110604.207376-1-pbonzini(a)redhat.com> Absences ======== - none Current Review Queue ==================== TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64 Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com> =============================================================================================================== TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com> =================================================================================================================== TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option Message-Id: <20211013010120.96851-1-sjg(a)chromium.org> =========================================================================================================== TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org> ================================================================================================================================== -- Alex Bennée

4 years, 7 months

[ACTIVITY] report week ending 19 Nov

by Peter Maydell

Progress (short week, 3 days) * UM-2 [QEMU upstream maintainership] - Still trying to sort out the regression of booting EL3 guest code on the imx7 board. I got most of the way through prototyping a cleanup which would fix this, but then spotted that the highbank board has a more awkward-to-fix similar problem. We're going to revert the PSCI emulation change for 6.2 so we can take the time to get the cleanup right and land it in 7.0. - Usual patch accumulation, review, etc during release cycle -- PMM

4 years, 7 months

[TCWG CI] Regression caused by gcc: tree-optimization/102880 - make PHI-OPT recognize more CFGs

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: tree-optimization/102880 - make PHI-OPT recognize more CFGs: commit f98f373dd822b35c52356b753d528924e9f89678 Author: Richard Biener <rguenther(a)suse.de> tree-optimization/102880 - make PHI-OPT recognize more CFGs Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21059 # First few build errors in logs: # 00:20:13 drivers/net/wireless/realtek/rtlwifi/rtl8192se/hw.c:2521:1: error: definition in block 22 does not dominate use in block 21 # 00:20:13 drivers/net/wireless/realtek/rtlwifi/rtl8192se/hw.c:2521:1: internal compiler error: verify_ssa failed # 00:20:13 make[6]: *** [scripts/Makefile.build:280: drivers/net/wireless/realtek/rtlwifi/rtl8192se/hw.o] Error 1 # 00:20:20 make[5]: *** [scripts/Makefile.build:497: drivers/net/wireless/realtek/rtlwifi/rtl8192se] Error 2 # 00:24:02 make[4]: *** [scripts/Makefile.build:497: drivers/net/wireless/realtek/rtlwifi] Error 2 # 00:24:02 make[3]: *** [scripts/Makefile.build:497: drivers/net/wireless/realtek] Error 2 # 00:24:02 make[2]: *** [scripts/Makefile.build:497: drivers/net/wireless] Error 2 # 00:25:21 drivers/staging/comedi/drivers/addi_apci_3120.c:1117:1: error: definition in block 10 does not dominate use in block 11 # 00:25:21 drivers/staging/comedi/drivers/addi_apci_3120.c:1117:1: internal compiler error: verify_ssa failed # 00:25:22 make[4]: *** [scripts/Makefile.build:280: drivers/staging/comedi/drivers/addi_apci_3120.o] Error 1 from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 28893 # linux build successful: all THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-arm-lts-allmodconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-lts-allmodc… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-lts-allmodc… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-lts-allmodc… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-lts-allmodc… Reproduce builds: <cut> mkdir investigate-gcc-f98f373dd822b35c52356b753d528924e9f89678 cd investigate-gcc-f98f373dd822b35c52356b753d528924e9f89678 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-lts-allmodc… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-lts-allmodc… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-lts-allmodc… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach f98f373dd822b35c52356b753d528924e9f89678 ../artifacts/test.sh # Reproduce last_good build git checkout --detach d699f03720fce57b319276226ac4a463a8538e9f ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit f98f373dd822b35c52356b753d528924e9f89678 Author: Richard Biener <rguenther(a)suse.de> Date: Mon Nov 15 15:19:36 2021 +0100 tree-optimization/102880 - make PHI-OPT recognize more CFGs This allows extra edges into the middle BB for the PHI-OPT transforms using replace_phi_edge_with_variable that do not end up moving stmts from that middle BB. This avoids regressing gcc.dg/tree-ssa/ssa-hoist-4.c with the actual fix for PR102880 where CFG cleanup has the choice to remove two forwarders and picks "the wrong" leading to if (a > b) / /\ / / <BB> / | # PHI <a, b> rather than if (a > b) | /\ | <BB> \ | / \ | # PHI <a, b, b> but it's relatively straight-forward to support extra edges into the middle-BB in paths ending in replace_phi_edge_with_variable and that do not require moving stmts. That's because we really only want to remove the edge from the condition to the middle BB. Of course actually doing that means updating dominators in non-trival ways which is why I kept the original code for the single edge case and simply defer to CFG cleanup by adjusting the condition for the complicated case. The testcase needs to be a GIMPLE one since it's quite unreliable to produce the desired CFG. 2021-11-15 Richard Biener <rguenther(a)suse.de> PR tree-optimization/102880 * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Push single_pred (bb1) condition to places that really need it. (match_simplify_replacement): Likewise. (value_replacement): Likewise. (replace_phi_edge_with_variable): Deal with extra edges into the middle BB. * gcc.dg/tree-ssa/phi-opt-26.c: New testcase. --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-26.c | 31 +++++++++++++ gcc/tree-ssa-phiopt.c | 71 +++++++++++++++++------------- 2 files changed, 72 insertions(+), 30 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-26.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-26.c new file mode 100644 index 00000000000..21aa66e38b8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-26.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fgimple -fdump-tree-phiopt1" } */ + +int __GIMPLE (ssa,startwith("phiopt")) +foo (int a, int b, int flag) +{ + int res; + + __BB(2): + if (flag_2(D) != 0) + goto __BB6; + else + goto __BB4; + + __BB(4): + if (a_3(D) > b_4(D)) + goto __BB7; + else + goto __BB6; + + __BB(6): + goto __BB7; + + __BB(7): + res_1 = __PHI (__BB4: a_3(D), __BB6: b_4(D)); + return res_1; +} + +/* We should be able to detect MAX despite the extra edge into + the middle BB. */ +/* { dg-final { scan-tree-dump "MAX" "phiopt1" } } */ diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index 173ac835ca6..6b22f6bedd4 100644 --- a/gcc/tree-ssa-phiopt.c +++ b/gcc/tree-ssa-phiopt.c @@ -220,7 +220,6 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p) /* If either bb1's succ or bb2 or bb2's succ is non NULL. */ if (EDGE_COUNT (bb1->succs) == 0 - || bb2 == NULL || EDGE_COUNT (bb2->succs) == 0) continue; @@ -276,14 +275,14 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p) || (e1->flags & EDGE_FALLTHRU) == 0) continue; - /* Also make sure that bb1 only have one predecessor and that it - is bb. */ - if (!single_pred_p (bb1) - || single_pred (bb1) != bb) - continue; - if (do_store_elim) { + /* Also make sure that bb1 only have one predecessor and that it + is bb. */ + if (!single_pred_p (bb1) + || single_pred (bb1) != bb) + continue; + /* bb1 is the middle block, bb2 the join block, bb the split block, e1 the fallthrough edge from bb1 to bb2. We can't do the optimization if the join block has more than two predecessors. */ @@ -328,10 +327,11 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p) node. */ gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE); - gphi *newphi = factor_out_conditional_conversion (e1, e2, phi, - arg0, arg1, - cond_stmt); - if (newphi != NULL) + gphi *newphi; + if (single_pred_p (bb1) + && (newphi = factor_out_conditional_conversion (e1, e2, phi, + arg0, arg1, + cond_stmt))) { phi = newphi; /* factor_out_conditional_conversion may create a new PHI in @@ -350,12 +350,14 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p) early_p)) cfgchanged = true; else if (!early_p + && single_pred_p (bb1) && cond_removal_in_builtin_zero_pattern (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; - else if (spaceship_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) + else if (single_pred_p (bb1) + && spaceship_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; } } @@ -386,7 +388,6 @@ replace_phi_edge_with_variable (basic_block cond_block, edge e, gphi *phi, tree new_tree) { basic_block bb = gimple_bb (phi); - basic_block block_to_remove; gimple_stmt_iterator gsi; tree phi_result = PHI_RESULT (phi); @@ -422,28 +423,33 @@ replace_phi_edge_with_variable (basic_block cond_block, SET_USE (PHI_ARG_DEF_PTR (phi, e->dest_idx), new_tree); /* Remove the empty basic block. */ + edge edge_to_remove; if (EDGE_SUCC (cond_block, 0)->dest == bb) + edge_to_remove = EDGE_SUCC (cond_block, 1); + else + edge_to_remove = EDGE_SUCC (cond_block, 0); + if (EDGE_COUNT (edge_to_remove->dest->preds) == 1) { - EDGE_SUCC (cond_block, 0)->flags |= EDGE_FALLTHRU; - EDGE_SUCC (cond_block, 0)->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); - EDGE_SUCC (cond_block, 0)->probability = profile_probability::always (); + e->flags |= EDGE_FALLTHRU; + e->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); + e->probability = profile_probability::always (); + delete_basic_block (edge_to_remove->dest); - block_to_remove = EDGE_SUCC (cond_block, 1)->dest; + /* Eliminate the COND_EXPR at the end of COND_BLOCK. */ + gsi = gsi_last_bb (cond_block); + gsi_remove (&gsi, true); } else { - EDGE_SUCC (cond_block, 1)->flags |= EDGE_FALLTHRU; - EDGE_SUCC (cond_block, 1)->flags - &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); - EDGE_SUCC (cond_block, 1)->probability = profile_probability::always (); - - block_to_remove = EDGE_SUCC (cond_block, 0)->dest; + /* If there are other edges into the middle block make + CFG cleanup deal with the edge removal to avoid + updating dominators here in a non-trivial way. */ + gcond *cond = as_a <gcond *> (last_stmt (cond_block)); + if (edge_to_remove->flags & EDGE_TRUE_VALUE) + gimple_cond_make_false (cond); + else + gimple_cond_make_true (cond); } - delete_basic_block (block_to_remove); - - /* Eliminate the COND_EXPR at the end of COND_BLOCK. */ - gsi = gsi_last_bb (cond_block); - gsi_remove (&gsi, true); statistics_counter_event (cfun, "Replace PHI with variable", 1); @@ -959,6 +965,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb, allow it and move it once the transformation is done. */ if (!empty_block_p (middle_bb)) { + if (!single_pred_p (middle_bb)) + return false; + stmt_to_move = last_and_only_stmt (middle_bb); if (!stmt_to_move) return false; @@ -1351,7 +1360,10 @@ value_replacement (basic_block cond_bb, basic_block middle_bb, } else { - statistics_counter_event (cfun, "Replace PHI with variable/value_replacement", 1); + if (!single_pred_p (middle_bb)) + return 0; + statistics_counter_event (cfun, "Replace PHI with " + "variable/value_replacement", 1); /* Replace the PHI arguments with arg. */ SET_PHI_ARG_DEF (phi, e0->dest_idx, arg); @@ -1367,7 +1379,6 @@ value_replacement (basic_block cond_bb, basic_block middle_bb, } return 1; } - } /* Now optimize (x != 0) ? x + y : y to just x + y. */ </cut>

4 years, 8 months

[TCWG CI] 464.h264ref slowed down by 6% after llvm: [unroll] Keep unrolled iterations with initial iteration

by ci_notify＠linaro.org

After llvm commit de2fed61528a5584dc54c47f6754408597be24de Author: Philip Reames <listmail(a)philipreames.com> [unroll] Keep unrolled iterations with initial iteration the following benchmarks slowed down by more than 2%: - 464.h264ref slowed down by 6% from 10902 to 11518 perf samples - 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 43% from 1494 to 2141 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O3 - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-de2fed61528a5584dc54c47f6754408597be24de cd investigate-llvm-de2fed61528a5584dc54c47f6754408597be24de # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach de2fed61528a5584dc54c47f6754408597be24de ../artifacts/test.sh # Reproduce last_good build git checkout --detach da25f968a90ad4560fc920a6d18fc2a0221d2750 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit de2fed61528a5584dc54c47f6754408597be24de Author: Philip Reames <listmail(a)philipreames.com> Date: Fri Nov 12 11:35:28 2021 -0800 [unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious. --- llvm/lib/Transforms/Utils/LoopUnroll.cpp | 6 +- llvm/test/DebugInfo/unrolled-loop-remainder.ll | 86 +- .../Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll | 66 +- .../Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll | 24 +- .../LoopUnroll/AArch64/runtime-unroll-generic.ll | 4 +- .../LoopUnroll/AArch64/thresholdO3-cost-model.ll | 8 +- .../LoopUnroll/AArch64/unroll-upperbound.ll | 4 +- .../Transforms/LoopUnroll/ARM/loop-unrolling.ll | 4 +- .../test/Transforms/LoopUnroll/ARM/multi-blocks.ll | 230 +- llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll | 10 +- .../LoopUnroll/full-unroll-keep-first-exit.ll | 16 +- .../full-unroll-one-unpredictable-exit.ll | 16 +- llvm/test/Transforms/LoopUnroll/multiple-exits.ll | 8 +- llvm/test/Transforms/LoopUnroll/nonlatchcondbr.ll | 20 +- .../LoopUnroll/partial-unroll-non-latch-exit.ll | 14 +- .../partially-unroll-unconditional-latch.ll | 4 +- .../LoopUnroll/runtime-loop-at-most-two-exits.ll | 120 +- .../runtime-loop-multiexit-dom-verify.ll | 206 +- .../LoopUnroll/runtime-loop-multiple-exits.ll | 2560 ++++++++++---------- llvm/test/Transforms/LoopUnroll/runtime-loop5.ll | 34 +- .../LoopUnroll/runtime-multiexit-heuristic.ll | 122 +- .../LoopUnroll/runtime-small-upperbound.ll | 8 +- .../LoopUnroll/runtime-unroll-remainder.ll | 62 +- llvm/test/Transforms/LoopUnroll/scevunroll.ll | 48 +- .../Transforms/LoopUnroll/shifted-tripcount.ll | 4 +- ...er-exiting-with-phis-multiple-exiting-blocks.ll | 20 +- .../LoopUnroll/unroll-unconditional-latch.ll | 12 +- .../Transforms/LoopUnrollAndJam/unroll-and-jam.ll | 68 +- .../PhaseOrdering/AArch64/matrix-extract-insert.ll | 4 +- 29 files changed, 1896 insertions(+), 1892 deletions(-) diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp index ce463927fd50..b0c622b98d5e 100644 --- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp @@ -514,6 +514,10 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI, SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes; identifyNoAliasScopesToClone(L->getBlocks(), LoopLocalNoAliasDeclScopes); + // We place the unrolled iterations immediately after the original loop + // latch. This is a reasonable default placement if we don't have block + // frequencies, and if we do, well the layout will be adjusted later. + auto BlockInsertPt = std::next(LatchBlock->getIterator()); for (unsigned It = 1; It != ULO.Count; ++It) { SmallVector<BasicBlock *, 8> NewBlocks; SmallDenseMap<const Loop *, Loop *, 4> NewLoops; @@ -522,7 +526,7 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI, for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) { ValueToValueMapTy VMap; BasicBlock *New = CloneBasicBlock(*BB, VMap, "." + Twine(It)); - Header->getParent()->getBasicBlockList().push_back(New); + Header->getParent()->getBasicBlockList().insert(BlockInsertPt, New); assert((*BB != Header || LI->getLoopFor(*BB) == L) && "Header should not be in a sub-loop"); diff --git a/llvm/test/DebugInfo/unrolled-loop-remainder.ll b/llvm/test/DebugInfo/unrolled-loop-remainder.ll index 83c30dec780d..ba4ce1f409f6 100644 --- a/llvm/test/DebugInfo/unrolled-loop-remainder.ll +++ b/llvm/test/DebugInfo/unrolled-loop-remainder.ll @@ -38,71 +38,71 @@ define i32 @func_c() local_unnamed_addr #0 !dbg !14 { ; CHECK-NEXT: [[PROL_ITER_SUB:%.*]] = sub i32 [[XTRAITER]], 1, !dbg [[DBG24]] ; CHECK-NEXT: [[PROL_ITER_CMP:%.*]] = icmp ne i32 [[PROL_ITER_SUB]], 0, !dbg [[DBG24]] ; CHECK-NEXT: br i1 [[PROL_ITER_CMP]], label [[FOR_BODY_PROL_1:%.*]], label [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA:%.*]], !dbg [[DBG24]] +; CHECK: for.body.prol.1: +; CHECK-NEXT: [[ARRAYIDX_PROL_1:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i64 1, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX_PROL_1]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] +; CHECK-NEXT: [[CONV_PROL_1:%.*]] = sext i32 [[TMP7]] to i64, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP8:%.*]] = inttoptr i64 [[CONV_PROL_1]] to i32*, !dbg [[DBG28]] +; CHECK-NEXT: [[ADD_PROL_1:%.*]] = add nsw i32 [[ADD_PROL]], 2, !dbg [[DBG29]] +; CHECK-NEXT: [[PROL_ITER_SUB_1:%.*]] = sub i32 [[PROL_ITER_SUB]], 1, !dbg [[DBG24]] +; CHECK-NEXT: [[PROL_ITER_CMP_1:%.*]] = icmp ne i32 [[PROL_ITER_SUB_1]], 0, !dbg [[DBG24]] +; CHECK-NEXT: br i1 [[PROL_ITER_CMP_1]], label [[FOR_BODY_PROL_2:%.*]], label [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]], !dbg [[DBG24]] +; CHECK: for.body.prol.2: +; CHECK-NEXT: [[ARRAYIDX_PROL_2:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i64 1, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX_PROL_2]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] +; CHECK-NEXT: [[CONV_PROL_2:%.*]] = sext i32 [[TMP9]] to i64, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP10:%.*]] = inttoptr i64 [[CONV_PROL_2]] to i32*, !dbg [[DBG28]] +; CHECK-NEXT: [[ADD_PROL_2:%.*]] = add nsw i32 [[ADD_PROL_1]], 2, !dbg [[DBG29]] +; CHECK-NEXT: br label [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]] ; CHECK: for.body.prol.loopexit.unr-lcssa: -; CHECK-NEXT: [[DOTLCSSA_UNR_PH:%.*]] = phi i32* [ [[TMP6]], [[FOR_BODY_PROL]] ], [ [[TMP20:%.*]], [[FOR_BODY_PROL_1]] ], [ [[TMP22:%.*]], [[FOR_BODY_PROL_2:%.*]] ] -; CHECK-NEXT: [[DOTUNR_PH:%.*]] = phi i32* [ [[TMP6]], [[FOR_BODY_PROL]] ], [ [[TMP20]], [[FOR_BODY_PROL_1]] ], [ [[TMP22]], [[FOR_BODY_PROL_2]] ] -; CHECK-NEXT: [[DOTUNR1_PH:%.*]] = phi i32 [ [[ADD_PROL]], [[FOR_BODY_PROL]] ], [ [[ADD_PROL_1:%.*]], [[FOR_BODY_PROL_1]] ], [ [[ADD_PROL_2:%.*]], [[FOR_BODY_PROL_2]] ] +; CHECK-NEXT: [[DOTLCSSA_UNR_PH:%.*]] = phi i32* [ [[TMP6]], [[FOR_BODY_PROL]] ], [ [[TMP8]], [[FOR_BODY_PROL_1]] ], [ [[TMP10]], [[FOR_BODY_PROL_2]] ] +; CHECK-NEXT: [[DOTUNR_PH:%.*]] = phi i32* [ [[TMP6]], [[FOR_BODY_PROL]] ], [ [[TMP8]], [[FOR_BODY_PROL_1]] ], [ [[TMP10]], [[FOR_BODY_PROL_2]] ] +; CHECK-NEXT: [[DOTUNR1_PH:%.*]] = phi i32 [ [[ADD_PROL]], [[FOR_BODY_PROL]] ], [ [[ADD_PROL_1]], [[FOR_BODY_PROL_1]] ], [ [[ADD_PROL_2]], [[FOR_BODY_PROL_2]] ] ; CHECK-NEXT: br label [[FOR_BODY_PROL_LOOPEXIT]], !dbg [[DBG24]] ; CHECK: for.body.prol.loopexit: ; CHECK-NEXT: [[DOTLCSSA_UNR:%.*]] = phi i32* [ undef, [[FOR_BODY_LR_PH]] ], [ [[DOTLCSSA_UNR_PH]], [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]] ] ; CHECK-NEXT: [[DOTUNR:%.*]] = phi i32* [ [[A_PROMOTED]], [[FOR_BODY_LR_PH]] ], [ [[DOTUNR_PH]], [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]] ] ; CHECK-NEXT: [[DOTUNR1:%.*]] = phi i32 [ [[DOTPR]], [[FOR_BODY_LR_PH]] ], [ [[DOTUNR1_PH]], [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]] ] -; CHECK-NEXT: [[TMP7:%.*]] = icmp ult i32 [[TMP3]], 3, !dbg [[DBG24]] -; CHECK-NEXT: br i1 [[TMP7]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY_LR_PH_NEW:%.*]], !dbg [[DBG24]] +; CHECK-NEXT: [[TMP11:%.*]] = icmp ult i32 [[TMP3]], 3, !dbg [[DBG24]] +; CHECK-NEXT: br i1 [[TMP11]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY_LR_PH_NEW:%.*]], !dbg [[DBG24]] ; CHECK: for.body.lr.ph.new: ; CHECK-NEXT: br label [[FOR_BODY:%.*]], !dbg [[DBG24]] ; CHECK: for.body: -; CHECK-NEXT: [[TMP8:%.*]] = phi i32* [ [[DOTUNR]], [[FOR_BODY_LR_PH_NEW]] ], [ [[TMP17:%.*]], [[FOR_BODY]] ], !dbg [[DBG28]] -; CHECK-NEXT: [[TMP9:%.*]] = phi i32 [ [[DOTUNR1]], [[FOR_BODY_LR_PH_NEW]] ], [ [[ADD_3:%.*]], [[FOR_BODY]] ] -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i64 1, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] -; CHECK-NEXT: [[CONV:%.*]] = sext i32 [[TMP10]] to i64, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP11:%.*]] = inttoptr i64 [[CONV]] to i32*, !dbg [[DBG28]] -; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 2, !dbg [[DBG29]] -; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i64 1, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] -; CHECK-NEXT: [[CONV_1:%.*]] = sext i32 [[TMP12]] to i64, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP13:%.*]] = inttoptr i64 [[CONV_1]] to i32*, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP12:%.*]] = phi i32* [ [[DOTUNR]], [[FOR_BODY_LR_PH_NEW]] ], [ [[TMP21:%.*]], [[FOR_BODY]] ], !dbg [[DBG28]] +; CHECK-NEXT: [[TMP13:%.*]] = phi i32 [ [[DOTUNR1]], [[FOR_BODY_LR_PH_NEW]] ], [ [[ADD_3:%.*]], [[FOR_BODY]] ] +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[TMP12]], i64 1, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] +; CHECK-NEXT: [[CONV:%.*]] = sext i32 [[TMP14]] to i64, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP15:%.*]] = inttoptr i64 [[CONV]] to i32*, !dbg [[DBG28]] +; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP13]], 2, !dbg [[DBG29]] +; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[TMP15]], i64 1, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] +; CHECK-NEXT: [[CONV_1:%.*]] = sext i32 [[TMP16]] to i64, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP17:%.*]] = inttoptr i64 [[CONV_1]] to i32*, !dbg [[DBG28]] ; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 [[ADD]], 2, !dbg [[DBG29]] -; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[TMP13]], i64 1, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX_2]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] -; CHECK-NEXT: [[CONV_2:%.*]] = sext i32 [[TMP14]] to i64, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP15:%.*]] = inttoptr i64 [[CONV_2]] to i32*, !dbg [[DBG28]] +; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[TMP17]], i64 1, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP18:%.*]] = load i32, i32* [[ARRAYIDX_2]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] +; CHECK-NEXT: [[CONV_2:%.*]] = sext i32 [[TMP18]] to i64, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP19:%.*]] = inttoptr i64 [[CONV_2]] to i32*, !dbg [[DBG28]] ; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[ADD_1]], 2, !dbg [[DBG29]] -; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[TMP15]], i64 1, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* [[ARRAYIDX_3]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] -; CHECK-NEXT: [[CONV_3:%.*]] = sext i32 [[TMP16]] to i64, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP17]] = inttoptr i64 [[CONV_3]] to i32*, !dbg [[DBG28]] +; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i64 1, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[ARRAYIDX_3]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] +; CHECK-NEXT: [[CONV_3:%.*]] = sext i32 [[TMP20]] to i64, !dbg [[DBG28]] +; CHECK-NEXT: [[TMP21]] = inttoptr i64 [[CONV_3]] to i32*, !dbg [[DBG28]] ; CHECK-NEXT: [[ADD_3]] = add nsw i32 [[ADD_2]], 2, !dbg [[DBG29]] ; CHECK-NEXT: [[TOBOOL_3:%.*]] = icmp eq i32 [[ADD_3]], 0, !dbg [[DBG24]] ; CHECK-NEXT: br i1 [[TOBOOL_3]], label [[FOR_COND_FOR_END_CRIT_EDGE_UNR_LCSSA:%.*]], label [[FOR_BODY]], !dbg [[DBG24]], !llvm.loop [[LOOP30:![0-9]+]] ; CHECK: for.cond.for.end_crit_edge.unr-lcssa: -; CHECK-NEXT: [[DOTLCSSA_PH:%.*]] = phi i32* [ [[TMP17]], [[FOR_BODY]] ] +; CHECK-NEXT: [[DOTLCSSA_PH:%.*]] = phi i32* [ [[TMP21]], [[FOR_BODY]] ] ; CHECK-NEXT: br label [[FOR_COND_FOR_END_CRIT_EDGE]], !dbg [[DBG24]] ; CHECK: for.cond.for.end_crit_edge: ; CHECK-NEXT: [[DOTLCSSA:%.*]] = phi i32* [ [[DOTLCSSA_UNR]], [[FOR_BODY_PROL_LOOPEXIT]] ], [ [[DOTLCSSA_PH]], [[FOR_COND_FOR_END_CRIT_EDGE_UNR_LCSSA]] ], !dbg [[DBG28]] -; CHECK-NEXT: [[TMP18:%.*]] = add i32 [[TMP2]], 2, !dbg [[DBG24]] +; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[TMP2]], 2, !dbg [[DBG24]] ; CHECK-NEXT: store i32* [[DOTLCSSA]], i32** @a, align 8, !dbg [[DBG25]], !tbaa [[TBAA26]] -; CHECK-NEXT: store i32 [[TMP18]], i32* @b, align 4, !dbg [[DBG33:![0-9]+]], !tbaa [[TBAA20]] +; CHECK-NEXT: store i32 [[TMP22]], i32* @b, align 4, !dbg [[DBG33:![0-9]+]], !tbaa [[TBAA20]] ; CHECK-NEXT: br label [[FOR_END]], !dbg [[DBG24]] ; CHECK: for.end: ; CHECK-NEXT: ret i32 undef, !dbg [[DBG34:![0-9]+]] -; CHECK: for.body.prol.1: -; CHECK-NEXT: [[ARRAYIDX_PROL_1:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i64 1, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP19:%.*]] = load i32, i32* [[ARRAYIDX_PROL_1]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] -; CHECK-NEXT: [[CONV_PROL_1:%.*]] = sext i32 [[TMP19]] to i64, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP20]] = inttoptr i64 [[CONV_PROL_1]] to i32*, !dbg [[DBG28]] -; CHECK-NEXT: [[ADD_PROL_1]] = add nsw i32 [[ADD_PROL]], 2, !dbg [[DBG29]] -; CHECK-NEXT: [[PROL_ITER_SUB_1:%.*]] = sub i32 [[PROL_ITER_SUB]], 1, !dbg [[DBG24]] -; CHECK-NEXT: [[PROL_ITER_CMP_1:%.*]] = icmp ne i32 [[PROL_ITER_SUB_1]], 0, !dbg [[DBG24]] -; CHECK-NEXT: br i1 [[PROL_ITER_CMP_1]], label [[FOR_BODY_PROL_2]], label [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]], !dbg [[DBG24]] -; CHECK: for.body.prol.2: -; CHECK-NEXT: [[ARRAYIDX_PROL_2:%.*]] = getelementptr inbounds i32, i32* [[TMP20]], i64 1, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* [[ARRAYIDX_PROL_2]], align 4, !dbg [[DBG28]], !tbaa [[TBAA20]] -; CHECK-NEXT: [[CONV_PROL_2:%.*]] = sext i32 [[TMP21]] to i64, !dbg [[DBG28]] -; CHECK-NEXT: [[TMP22]] = inttoptr i64 [[CONV_PROL_2]] to i32*, !dbg [[DBG28]] -; CHECK-NEXT: [[ADD_PROL_2]] = add nsw i32 [[ADD_PROL_1]], 2, !dbg [[DBG29]] -; CHECK-NEXT: br label [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]] ; entry: %.pr = load i32, i32* @b, align 4, !dbg !17, !tbaa !20 diff --git a/llvm/test/Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll b/llvm/test/Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll index 3e611430d69e..7bb2d732195a 100644 --- a/llvm/test/Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll +++ b/llvm/test/Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll @@ -17,24 +17,24 @@ define void @test1(i32 %i, i32 %j) nounwind uwtable ssp { ; CHECK-NEXT: [[SUB5:%.*]] = sub i32 [[SUB]], [[J:%.*]] ; CHECK-NEXT: [[COND2:%.*]] = call zeroext i1 @check() ; CHECK-NEXT: br i1 [[COND2]], label [[IF_THEN_LOOPEXIT:%.*]], label [[IF_ELSE_1:%.*]] -; CHECK: if.then.loopexit: -; CHECK-NEXT: [[SUB5_LCSSA:%.*]] = phi i32 [ [[SUB5]], [[IF_ELSE]] ], [ [[SUB5_1:%.*]], [[IF_ELSE_1]] ], [ [[SUB5_2:%.*]], [[IF_ELSE_2:%.*]] ], [ [[SUB5_3]], [[IF_ELSE_3]] ] -; CHECK-NEXT: br label [[IF_THEN]] -; CHECK: if.then: -; CHECK-NEXT: [[I_TR:%.*]] = phi i32 [ [[I]], [[ENTRY:%.*]] ], [ [[SUB5_LCSSA]], [[IF_THEN_LOOPEXIT]] ] -; CHECK-NEXT: ret void ; CHECK: if.else.1: -; CHECK-NEXT: [[SUB5_1]] = sub i32 [[SUB5]], [[J]] +; CHECK-NEXT: [[SUB5_1:%.*]] = sub i32 [[SUB5]], [[J]] ; CHECK-NEXT: [[COND2_1:%.*]] = call zeroext i1 @check() -; CHECK-NEXT: br i1 [[COND2_1]], label [[IF_THEN_LOOPEXIT]], label [[IF_ELSE_2]] +; CHECK-NEXT: br i1 [[COND2_1]], label [[IF_THEN_LOOPEXIT]], label [[IF_ELSE_2:%.*]] ; CHECK: if.else.2: -; CHECK-NEXT: [[SUB5_2]] = sub i32 [[SUB5_1]], [[J]] +; CHECK-NEXT: [[SUB5_2:%.*]] = sub i32 [[SUB5_1]], [[J]] ; CHECK-NEXT: [[COND2_2:%.*]] = call zeroext i1 @check() ; CHECK-NEXT: br i1 [[COND2_2]], label [[IF_THEN_LOOPEXIT]], label [[IF_ELSE_3]] ; CHECK: if.else.3: ; CHECK-NEXT: [[SUB5_3]] = sub i32 [[SUB5_2]], [[J]] ; CHECK-NEXT: [[COND2_3:%.*]] = call zeroext i1 @check() ; CHECK-NEXT: br i1 [[COND2_3]], label [[IF_THEN_LOOPEXIT]], label [[IF_ELSE]], !llvm.loop [[LOOP0:![0-9]+]] +; CHECK: if.then.loopexit: +; CHECK-NEXT: [[SUB5_LCSSA:%.*]] = phi i32 [ [[SUB5]], [[IF_ELSE]] ], [ [[SUB5_1]], [[IF_ELSE_1]] ], [ [[SUB5_2]], [[IF_ELSE_2]] ], [ [[SUB5_3]], [[IF_ELSE_3]] ] +; CHECK-NEXT: br label [[IF_THEN]] +; CHECK: if.then: +; CHECK-NEXT: [[I_TR:%.*]] = phi i32 [ [[I]], [[ENTRY:%.*]] ], [ [[SUB5_LCSSA]], [[IF_THEN_LOOPEXIT]] ] +; CHECK-NEXT: ret void ; entry: %cond1 = call zeroext i1 @check() @@ -77,17 +77,11 @@ define i32 @test2(i32* nocapture %p, i32 %n) nounwind readonly { ; CHECK-NEXT: [[INDVAR_NEXT:%.*]] = add nuw nsw i64 [[INDVAR]], 1 ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVAR_NEXT]], [[TMP]] ; CHECK-NEXT: br i1 [[EXITCOND]], label [[BB_1:%.*]], label [[BB1_BB2_CRIT_EDGE:%.*]] -; CHECK: bb1.bb2_crit_edge: -; CHECK-NEXT: [[DOTLCSSA:%.*]] = phi i32 [ [[TMP2]], [[BB1]] ], [ [[TMP4:%.*]], [[BB1_1:%.*]] ], [ [[TMP6:%.*]], [[BB1_2:%.*]] ], [ [[TMP8]], [[BB1_3]] ] -; CHECK-NEXT: br label [[BB2]] -; CHECK: bb2: -; CHECK-NEXT: [[S_0_LCSSA:%.*]] = phi i32 [ [[DOTLCSSA]], [[BB1_BB2_CRIT_EDGE]] ], [ 0, [[ENTRY:%.*]] ] -; CHECK-NEXT: ret i32 [[S_0_LCSSA]] ; CHECK: bb.1: ; CHECK-NEXT: [[SCEVGEP_1:%.*]] = getelementptr i32, i32* [[P]], i64 [[INDVAR_NEXT]] ; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[SCEVGEP_1]], align 1 -; CHECK-NEXT: [[TMP4]] = add nsw i32 [[TMP3]], [[TMP2]] -; CHECK-NEXT: br label [[BB1_1]] +; CHECK-NEXT: [[TMP4:%.*]] = add nsw i32 [[TMP3]], [[TMP2]] +; CHECK-NEXT: br label [[BB1_1:%.*]] ; CHECK: bb1.1: ; CHECK-NEXT: [[INDVAR_NEXT_1:%.*]] = add nuw nsw i64 [[INDVAR_NEXT]], 1 ; CHECK-NEXT: [[EXITCOND_1:%.*]] = icmp ne i64 [[INDVAR_NEXT_1]], [[TMP]] @@ -95,8 +89,8 @@ define i32 @test2(i32* nocapture %p, i32 %n) nounwind readonly { ; CHECK: bb.2: ; CHECK-NEXT: [[SCEVGEP_2:%.*]] = getelementptr i32, i32* [[P]], i64 [[INDVAR_NEXT_1]] ; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[SCEVGEP_2]], align 1 -; CHECK-NEXT: [[TMP6]] = add nsw i32 [[TMP5]], [[TMP4]] -; CHECK-NEXT: br label [[BB1_2]] +; CHECK-NEXT: [[TMP6:%.*]] = add nsw i32 [[TMP5]], [[TMP4]] +; CHECK-NEXT: br label [[BB1_2:%.*]] ; CHECK: bb1.2: ; CHECK-NEXT: [[INDVAR_NEXT_2:%.*]] = add nuw nsw i64 [[INDVAR_NEXT_1]], 1 ; CHECK-NEXT: [[EXITCOND_2:%.*]] = icmp ne i64 [[INDVAR_NEXT_2]], [[TMP]] @@ -110,6 +104,12 @@ define i32 @test2(i32* nocapture %p, i32 %n) nounwind readonly { ; CHECK-NEXT: [[INDVAR_NEXT_3]] = add i64 [[INDVAR_NEXT_2]], 1 ; CHECK-NEXT: [[EXITCOND_3:%.*]] = icmp ne i64 [[INDVAR_NEXT_3]], [[TMP]] ; CHECK-NEXT: br i1 [[EXITCOND_3]], label [[BB]], label [[BB1_BB2_CRIT_EDGE]], !llvm.loop [[LOOP2:![0-9]+]] +; CHECK: bb1.bb2_crit_edge: +; CHECK-NEXT: [[DOTLCSSA:%.*]] = phi i32 [ [[TMP2]], [[BB1]] ], [ [[TMP4]], [[BB1_1]] ], [ [[TMP6]], [[BB1_2]] ], [ [[TMP8]], [[BB1_3]] ] +; CHECK-NEXT: br label [[BB2]] +; CHECK: bb2: +; CHECK-NEXT: [[S_0_LCSSA:%.*]] = phi i32 [ [[DOTLCSSA]], [[BB1_BB2_CRIT_EDGE]] ], [ 0, [[ENTRY:%.*]] ] +; CHECK-NEXT: ret i32 [[S_0_LCSSA]] ; entry: %0 = icmp sgt i32 %n, 0 ; <i1> [#uses=1] @@ -162,20 +162,12 @@ define i32 @test3() nounwind uwtable ssp align 2 { ; CHECK: do.cond: ; CHECK-NEXT: [[COND3:%.*]] = call zeroext i1 @check() ; CHECK-NEXT: br i1 [[COND3]], label [[DO_END:%.*]], label [[DO_BODY_1:%.*]] -; CHECK: do.end: -; CHECK-NEXT: br label [[RETURN]] -; CHECK: return.loopexit: -; CHECK-NEXT: [[TMP7_I_LCSSA:%.*]] = phi i32 [ [[TMP7_I]], [[LAND_LHS_TRUE]] ], [ [[TMP7_I_1:%.*]], [[LAND_LHS_TRUE_1:%.*]] ], [ [[TMP7_I_2:%.*]], [[LAND_LHS_TRUE_2:%.*]] ], [ [[TMP7_I_3:%.*]], [[LAND_LHS_TRUE_3:%.*]] ] -; CHECK-NEXT: br label [[RETURN]] -; CHECK: return: -; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ 0, [[DO_END]] ], [ 0, [[ENTRY:%.*]] ], [ [[TMP7_I_LCSSA]], [[RETURN_LOOPEXIT]] ] -; CHECK-NEXT: ret i32 [[RETVAL_0]] ; CHECK: do.body.1: ; CHECK-NEXT: [[COND2_1:%.*]] = call zeroext i1 @check() ; CHECK-NEXT: br i1 [[COND2_1]], label [[EXIT_1:%.*]], label [[DO_COND_1:%.*]] ; CHECK: exit.1: -; CHECK-NEXT: [[TMP7_I_1]] = load i32, i32* undef, align 8 -; CHECK-NEXT: br i1 undef, label [[DO_COND_1]], label [[LAND_LHS_TRUE_1]] +; CHECK-NEXT: [[TMP7_I_1:%.*]] = load i32, i32* undef, align 8 +; CHECK-NEXT: br i1 undef, label [[DO_COND_1]], label [[LAND_LHS_TRUE_1:%.*]] ; CHECK: land.lhs.true.1: ; CHECK-NEXT: br i1 true, label [[RETURN_LOOPEXIT]], label [[DO_COND_1]] ; CHECK: do.cond.1: @@ -185,8 +177,8 @@ define i32 @test3() nounwind uwtable ssp align 2 { ; CHECK-NEXT: [[COND2_2:%.*]] = call zeroext i1 @check() ; CHECK-NEXT: br i1 [[COND2_2]], label [[EXIT_2:%.*]], label [[DO_COND_2:%.*]] ; CHECK: exit.2: -; CHECK-NEXT: [[TMP7_I_2]] = load i32, i32* undef, align 8 -; CHECK-NEXT: br i1 undef, label [[DO_COND_2]], label [[LAND_LHS_TRUE_2]] +; CHECK-NEXT: [[TMP7_I_2:%.*]] = load i32, i32* undef, align 8 +; CHECK-NEXT: br i1 undef, label [[DO_COND_2]], label [[LAND_LHS_TRUE_2:%.*]] ; CHECK: land.lhs.true.2: ; CHECK-NEXT: br i1 true, label [[RETURN_LOOPEXIT]], label [[DO_COND_2]] ; CHECK: do.cond.2: @@ -196,13 +188,21 @@ define i32 @test3() nounwind uwtable ssp align 2 { ; CHECK-NEXT: [[COND2_3:%.*]] = call zeroext i1 @check() ; CHECK-NEXT: br i1 [[COND2_3]], label [[EXIT_3:%.*]], label [[DO_COND_3:%.*]] ; CHECK: exit.3: -; CHECK-NEXT: [[TMP7_I_3]] = load i32, i32* undef, align 8 -; CHECK-NEXT: br i1 undef, label [[DO_COND_3]], label [[LAND_LHS_TRUE_3]] +; CHECK-NEXT: [[TMP7_I_3:%.*]] = load i32, i32* undef, align 8 +; CHECK-NEXT: br i1 undef, label [[DO_COND_3]], label [[LAND_LHS_TRUE_3:%.*]] ; CHECK: land.lhs.true.3: ; CHECK-NEXT: br i1 true, label [[RETURN_LOOPEXIT]], label [[DO_COND_3]] ; CHECK: do.cond.3: ; CHECK-NEXT: [[COND3_3:%.*]] = call zeroext i1 @check() ; CHECK-NEXT: br i1 [[COND3_3]], label [[DO_END]], label [[DO_BODY]], !llvm.loop [[LOOP3:![0-9]+]] +; CHECK: do.end: +; CHECK-NEXT: br label [[RETURN]] +; CHECK: return.loopexit: +; CHECK-NEXT: [[TMP7_I_LCSSA:%.*]] = phi i32 [ [[TMP7_I]], [[LAND_LHS_TRUE]] ], [ [[TMP7_I_1]], [[LAND_LHS_TRUE_1]] ], [ [[TMP7_I_2]], [[LAND_LHS_TRUE_2]] ], [ [[TMP7_I_3]], [[LAND_LHS_TRUE_3]] ] +; CHECK-NEXT: br label [[RETURN]] +; CHECK: return: +; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ 0, [[DO_END]] ], [ 0, [[ENTRY:%.*]] ], [ [[TMP7_I_LCSSA]], [[RETURN_LOOPEXIT]] ] +; CHECK-NEXT: ret i32 [[RETVAL_0]] ; entry: %cond1 = call zeroext i1 @check() diff --git a/llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll b/llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll index be4b6ff64fdd..af648bae8642 100644 --- a/llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll +++ b/llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll @@ -33,16 +33,13 @@ define i32 @foo() uwtable ssp align 2 { ; CHECK: do.cond: ; CHECK-NEXT: [[CMP18:%.*]] = icmp sgt i32 [[CALL2]], -1 ; CHECK-NEXT: br i1 [[CMP18]], label [[LAND_LHS_TRUE_I_1:%.*]], label [[RETURN]] -; CHECK: return: -; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ [[TMP7_I]], [[LAND_LHS_TRUE]] ], [ 0, [[DO_COND]] ], [ [[TMP7_I_1:%.*]], [[LAND_LHS_TRUE_1:%.*]] ], [ 0, [[DO_COND_1:%.*]] ], [ [[TMP7_I_2:%.*]], [[LAND_LHS_TRUE_2:%.*]] ], [ 0, [[DO_COND_2:%.*]] ], [ [[TMP7_I_3:%.*]], [[LAND_LHS_TRUE_3:%.*]] ], [ 0, [[DO_COND_3:%.*]] ] -; CHECK-NEXT: ret i32 [[RETVAL_0]] ; CHECK: land.lhs.true.i.1: ; CHECK-NEXT: [[CMP4_I_1:%.*]] = call zeroext i1 @check() #[[ATTR0]] -; CHECK-NEXT: br i1 [[CMP4_I_1]], label [[BAR_EXIT_1:%.*]], label [[DO_COND_1]] +; CHECK-NEXT: br i1 [[CMP4_I_1]], label [[BAR_EXIT_1:%.*]], label [[DO_COND_1:%.*]] ; CHECK: bar.exit.1: -; CHECK-NEXT: [[TMP7_I_1]] = call i32 @getval() #[[ATTR0]] +; CHECK-NEXT: [[TMP7_I_1:%.*]] = call i32 @getval() #[[ATTR0]] ; CHECK-NEXT: [[CMP_NOT_1:%.*]] = icmp eq i32 [[TMP7_I_1]], 0 -; CHECK-NEXT: br i1 [[CMP_NOT_1]], label [[DO_COND_1]], label [[LAND_LHS_TRUE_1]] +; CHECK-NEXT: br i1 [[CMP_NOT_1]], label [[DO_COND_1]], label [[LAND_LHS_TRUE_1:%.*]] ; CHECK: land.lhs.true.1: ; CHECK-NEXT: [[CALL10_1:%.*]] = call i32 @getval() ; CHECK-NEXT: [[CMP11_1:%.*]] = icmp eq i32 [[CALL10_1]], 0 @@ -52,11 +49,11 @@ define i32 @foo() uwtable ssp align 2 { ; CHECK-NEXT: br i1 [[CMP18_1]], label [[LAND_LHS_TRUE_I_2:%.*]], label [[RETURN]] ; CHECK: land.lhs.true.i.2: ; CHECK-NEXT: [[CMP4_I_2:%.*]] = call zeroext i1 @check() #[[ATTR0]] -; CHECK-NEXT: br i1 [[CMP4_I_2]], label [[BAR_EXIT_2:%.*]], label [[DO_COND_2]] +; CHECK-NEXT: br i1 [[CMP4_I_2]], label [[BAR_EXIT_2:%.*]], label [[DO_COND_2:%.*]] ; CHECK: bar.exit.2: -; CHECK-NEXT: [[TMP7_I_2]] = call i32 @getval() #[[ATTR0]] +; CHECK-NEXT: [[TMP7_I_2:%.*]] = call i32 @getval() #[[ATTR0]] ; CHECK-NEXT: [[CMP_NOT_2:%.*]] = icmp eq i32 [[TMP7_I_2]], 0 -; CHECK-NEXT: br i1 [[CMP_NOT_2]], label [[DO_COND_2]], label [[LAND_LHS_TRUE_2]] +; CHECK-NEXT: br i1 [[CMP_NOT_2]], label [[DO_COND_2]], label [[LAND_LHS_TRUE_2:%.*]] ; CHECK: land.lhs.true.2: ; CHECK-NEXT: [[CALL10_2:%.*]] = call i32 @getval() ; CHECK-NEXT: [[CMP11_2:%.*]] = icmp eq i32 [[CALL10_2]], 0 @@ -66,11 +63,11 @@ define i32 @foo() uwtable ssp align 2 { ; CHECK-NEXT: br i1 [[CMP18_2]], label [[LAND_LHS_TRUE_I_3:%.*]], label [[RETURN]] ; CHECK: land.lhs.true.i.3: ; CHECK-NEXT: [[CMP4_I_3:%.*]] = call zeroext i1 @check() #[[ATTR0]] -; CHECK-NEXT: br i1 [[CMP4_I_3]], label [[BAR_EXIT_3:%.*]], label [[DO_COND_3]] +; CHECK-NEXT: br i1 [[CMP4_I_3]], label [[BAR_EXIT_3:%.*]], label [[DO_COND_3:%.*]] ; CHECK: bar.exit.3: -; CHECK-NEXT: [[TMP7_I_3]] = call i32 @getval() #[[ATTR0]] +; CHECK-NEXT: [[TMP7_I_3:%.*]] = call i32 @getval() #[[ATTR0]] ; CHECK-NEXT: [[CMP_NOT_3:%.*]] = icmp eq i32 [[TMP7_I_3]], 0 -; CHECK-NEXT: br i1 [[CMP_NOT_3]], label [[DO_COND_3]], label [[LAND_LHS_TRUE_3]] +; CHECK-NEXT: br i1 [[CMP_NOT_3]], label [[DO_COND_3]], label [[LAND_LHS_TRUE_3:%.*]] ; CHECK: land.lhs.true.3: ; CHECK-NEXT: [[CALL10_3:%.*]] = call i32 @getval() ; CHECK-NEXT: [[CMP11_3:%.*]] = icmp eq i32 [[CALL10_3]], 0 @@ -78,6 +75,9 @@ define i32 @foo() uwtable ssp align 2 { ; CHECK: do.cond.3: ; CHECK-NEXT: [[CMP18_3:%.*]] = icmp sgt i32 [[CALL2]], -1 ; CHECK-NEXT: br i1 [[CMP18_3]], label [[LAND_LHS_TRUE_I]], label [[RETURN]], !llvm.loop [[LOOP0:![0-9]+]] +; CHECK: return: +; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ [[TMP7_I]], [[LAND_LHS_TRUE]] ], [ 0, [[DO_COND]] ], [ [[TMP7_I_1]], [[LAND_LHS_TRUE_1]] ], [ 0, [[DO_COND_1]] ], [ [[TMP7_I_2]], [[LAND_LHS_TRUE_2]] ], [ 0, [[DO_COND_2]] ], [ [[TMP7_I_3]], [[LAND_LHS_TRUE_3]] ], [ 0, [[DO_COND_3]] ] +; CHECK-NEXT: ret i32 [[RETVAL_0]] ; entry: br i1 undef, label %return, label %if.end diff --git a/llvm/test/Transforms/LoopUnroll/AArch64/runtime-unroll-generic.ll b/llvm/test/Transforms/LoopUnroll/AArch64/runtime-unroll-generic.ll index 5bbab929c936..5c8f9ca01679 100644 --- a/llvm/test/Transforms/LoopUnroll/AArch64/runtime-unroll-generic.ll +++ b/llvm/test/Transforms/LoopUnroll/AArch64/runtime-unroll-generic.ll @@ -67,8 +67,6 @@ define void @runtime_unroll_generic(i32 %arg_0, i32* %arg_1, i16* %arg_2, i16* % ; CHECK-A55-NEXT: store i32 [[ADD21_EPIL]], i32* [[ARRAYIDX20]], align 4 ; CHECK-A55-NEXT: [[EPIL_ITER_CMP_NOT:%.*]] = icmp eq i32 [[XTRAITER]], 1 ; CHECK-A55-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[FOR_END]], label [[FOR_BODY6_EPIL_1:%.*]] -; CHECK-A55: for.end: -; CHECK-A55-NEXT: ret void ; CHECK-A55: for.body6.epil.1: ; CHECK-A55-NEXT: [[TMP14:%.*]] = load i16, i16* [[ARRAYIDX10]], align 2 ; CHECK-A55-NEXT: [[CONV_EPIL_1:%.*]] = sext i16 [[TMP14]] to i32 @@ -90,6 +88,8 @@ define void @runtime_unroll_generic(i32 %arg_0, i32* %arg_1, i16* %arg_2, i16* % ; CHECK-A55-NEXT: [[ADD21_EPIL_2:%.*]] = add nsw i32 [[MUL16_EPIL_2]], [[TMP19]] ; CHECK-A55-NEXT: store i32 [[ADD21_EPIL_2]], i32* [[ARRAYIDX20]], align 4 ; CHECK-A55-NEXT: br label [[FOR_END]] +; CHECK-A55: for.end: +; CHECK-A55-NEXT: ret void ; ; CHECK-GENERIC-LABEL: @runtime_unroll_generic( ; CHECK-GENERIC-NEXT: entry: diff --git a/llvm/test/Transforms/LoopUnroll/AArch64/thresholdO3-cost-model.ll b/llvm/test/Transforms/LoopUnroll/AArch64/thresholdO3-cost-model.ll index ee07518f8cac..5c6ac690c0ca 100644 --- a/llvm/test/Transforms/LoopUnroll/AArch64/thresholdO3-cost-model.ll +++ b/llvm/test/Transforms/LoopUnroll/AArch64/thresholdO3-cost-model.ll @@ -21,10 +21,6 @@ define i32 @tripcount_11() { ; CHECK-NEXT: br label [[DO_BODY6:%.*]] ; CHECK: for.cond: ; CHECK-NEXT: br i1 true, label [[FOR_COND_1:%.*]], label [[IF_THEN11:%.*]] -; CHECK: do.body6: -; CHECK-NEXT: br i1 true, label [[FOR_COND:%.*]], label [[IF_THEN11]] -; CHECK: if.then11: -; CHECK-NEXT: unreachable ; CHECK: for.cond.1: ; CHECK-NEXT: br i1 true, label [[FOR_COND_2:%.*]], label [[IF_THEN11]] ; CHECK: for.cond.2: @@ -45,6 +41,10 @@ define i32 @tripcount_11() { ; CHECK-NEXT: br i1 true, label [[FOR_COND_10:%.*]], label [[IF_THEN11]] ; CHECK: for.cond.10: ; CHECK-NEXT: ret i32 0 +; CHECK: do.body6: +; CHECK-NEXT: br i1 true, label [[FOR_COND:%.*]], label [[IF_THEN11]] +; CHECK: if.then11: +; CHECK-NEXT: unreachable ; do.body6.preheader: br label %do.body6 diff --git a/llvm/test/Transforms/LoopUnroll/AArch64/unroll-upperbound.ll b/llvm/test/Transforms/LoopUnroll/AArch64/unroll-upperbound.ll index 3b82365d1a6e..ee905e5b10fe 100644 --- a/llvm/test/Transforms/LoopUnroll/AArch64/unroll-upperbound.ll +++ b/llvm/test/Transforms/LoopUnroll/AArch64/unroll-upperbound.ll @@ -18,8 +18,6 @@ define void @test(i1 %cond) { ; CHECK-NEXT: br label [[LATCH]] ; CHECK: latch: ; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[FOR_BODY_1:%.*]] -; CHECK: for.end: -; CHECK-NEXT: ret void ; CHECK: for.body.1: ; CHECK-NEXT: switch i32 1, label [[SW_DEFAULT_1:%.*]] [ ; CHECK-NEXT: i32 2, label [[LATCH_1:%.*]] @@ -38,6 +36,8 @@ define void @test(i1 %cond) { ; CHECK-NEXT: br label [[LATCH_2]] ; CHECK: latch.2: ; CHECK-NEXT: br label [[FOR_END]] +; CHECK: for.end: +; CHECK-NEXT: ret void ; entry: %0 = select i1 %cond, i32 2, i32 3 diff --git a/llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll b/llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll index f2e748ade0a2..e12dbf031b3b 100644 --- a/llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll +++ b/llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll @@ -121,14 +121,14 @@ for.body4: ; CHECK-NOUNROLL: br ; CHECK-UNROLL: for.body4.epil: +; CHECK-UNROLL: for.body4.epil.1: +; CHECK-UNROLL: for.body4.epil.2: ; CHECK-UNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z0-9.]+]] ], [ [[IV4:%[a-z.0-9]+]], %for.body4 ] ; CHECK-UNROLL: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1 ; CHECK-UNROLL: [[IV2:%[a-z.0-9]+]] = add nuw nsw i32 [[IV1]], 1 ; CHECK-UNROLL: [[IV3:%[a-z.0-9]+]] = add nuw nsw i32 [[IV2]], 1 ; CHECK-UNROLL: [[IV4]] = add nuw i32 [[IV3]], 1 ; CHECK-UNROLL: br -; CHECK-UNROLL: for.body4.epil.1: -; CHECK-UNROLL: for.body4.epil.2: %w.024 = phi i32 [ 0, %for.body4.lr.ph ], [ %inc, %for.body4 ] %add = add i32 %w.024, %mul diff --git a/llvm/test/Transforms/LoopUnroll/ARM/multi-blocks.ll b/llvm/test/Transforms/LoopUnroll/ARM/multi-blocks.ll index 156c0ab10658..8c4257698ab7 100644 --- a/llvm/test/Transforms/LoopUnroll/ARM/multi-blocks.ll +++ b/llvm/test/Transforms/LoopUnroll/ARM/multi-blocks.ll @@ -45,8 +45,37 @@ define void @test_three_blocks(i32* nocapture %Output, ; CHECK-NEXT: [[EPIL_ITER_SUB:%.*]] = sub i32 [[XTRAITER]], 1 ; CHECK-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp ne i32 [[EPIL_ITER_SUB]], 0 ; CHECK-NEXT: br i1 [[EPIL_ITER_CMP]], label [[FOR_BODY_EPIL_1:%.*]], label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA:%.*]] +; CHECK: for.body.epil.1: +; CHECK-NEXT: [[ARRAYIDX_EPIL_1:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC_EPIL]] +; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_1]], align 4 +; CHECK-NEXT: [[TOBOOL_EPIL_1:%.*]] = icmp eq i32 [[TMP4]], 0 +; CHECK-NEXT: br i1 [[TOBOOL_EPIL_1]], label [[FOR_INC_EPIL_1:%.*]], label [[IF_THEN_EPIL_1:%.*]] +; CHECK: if.then.epil.1: +; CHECK-NEXT: [[ARRAYIDX1_EPIL_1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_EPIL]] +; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX1_EPIL_1]], align 4 +; CHECK-NEXT: [[ADD_EPIL_1:%.*]] = add i32 [[TMP5]], [[TEMP_1_EPIL]] +; CHECK-NEXT: br label [[FOR_INC_EPIL_1]] +; CHECK: for.inc.epil.1: +; CHECK-NEXT: [[TEMP_1_EPIL_1:%.*]] = phi i32 [ [[ADD_EPIL_1]], [[IF_THEN_EPIL_1]] ], [ [[TEMP_1_EPIL]], [[FOR_BODY_EPIL_1]] ] +; CHECK-NEXT: [[INC_EPIL_1:%.*]] = add nuw i32 [[INC_EPIL]], 1 +; CHECK-NEXT: [[EPIL_ITER_SUB_1:%.*]] = sub i32 [[EPIL_ITER_SUB]], 1 +; CHECK-NEXT: [[EPIL_ITER_CMP_1:%.*]] = icmp ne i32 [[EPIL_ITER_SUB_1]], 0 +; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_1]], label [[FOR_BODY_EPIL_2:%.*]], label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] +; CHECK: for.body.epil.2: +; CHECK-NEXT: [[ARRAYIDX_EPIL_2:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC_EPIL_1]] +; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_2]], align 4 +; CHECK-NEXT: [[TOBOOL_EPIL_2:%.*]] = icmp eq i32 [[TMP6]], 0 +; CHECK-NEXT: br i1 [[TOBOOL_EPIL_2]], label [[FOR_INC_EPIL_2:%.*]], label [[IF_THEN_EPIL_2:%.*]] +; CHECK: if.then.epil.2: +; CHECK-NEXT: [[ARRAYIDX1_EPIL_2:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_EPIL_1]] +; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX1_EPIL_2]], align 4 +; CHECK-NEXT: [[ADD_EPIL_2:%.*]] = add i32 [[TMP7]], [[TEMP_1_EPIL_1]] +; CHECK-NEXT: br label [[FOR_INC_EPIL_2]] +; CHECK: for.inc.epil.2: +; CHECK-NEXT: [[TEMP_1_EPIL_2:%.*]] = phi i32 [ [[ADD_EPIL_2]], [[IF_THEN_EPIL_2]] ], [ [[TEMP_1_EPIL_1]], [[FOR_BODY_EPIL_2]] ] +; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ; CHECK: for.cond.cleanup.loopexit.epilog-lcssa: -; CHECK-NEXT: [[TEMP_1_LCSSA_PH1:%.*]] = phi i32 [ [[TEMP_1_EPIL]], [[FOR_INC_EPIL]] ], [ [[TEMP_1_EPIL_1:%.*]], [[FOR_INC_EPIL_1:%.*]] ], [ [[TEMP_1_EPIL_2:%.*]], [[FOR_INC_EPIL_2:%.*]] ] +; CHECK-NEXT: [[TEMP_1_LCSSA_PH1:%.*]] = phi i32 [ [[TEMP_1_EPIL]], [[FOR_INC_EPIL]] ], [ [[TEMP_1_EPIL_1]], [[FOR_INC_EPIL_1]] ], [ [[TEMP_1_EPIL_2]], [[FOR_INC_EPIL_2]] ] ; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT]] ; CHECK: for.cond.cleanup.loopexit: ; CHECK-NEXT: [[TEMP_1_LCSSA:%.*]] = phi i32 [ [[TEMP_1_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[TEMP_1_LCSSA_PH1]], [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ] @@ -60,51 +89,22 @@ define void @test_three_blocks(i32* nocapture %Output, ; CHECK-NEXT: [[TEMP_09:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER_NEW]] ], [ [[TEMP_1_3]], [[FOR_INC_3]] ] ; CHECK-NEXT: [[NITER:%.*]] = phi i32 [ [[UNROLL_ITER]], [[FOR_BODY_PREHEADER_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[FOR_INC_3]] ] ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[J_010]] -; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 -; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[TMP4]], 0 +; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 +; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[TMP8]], 0 ; CHECK-NEXT: br i1 [[TOBOOL]], label [[FOR_INC:%.*]], label [[IF_THEN:%.*]] ; CHECK: if.then: ; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[J_010]] -; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX1]], align 4 -; CHECK-NEXT: [[ADD:%.*]] = add i32 [[TMP5]], [[TEMP_09]] +; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX1]], align 4 +; CHECK-NEXT: [[ADD:%.*]] = add i32 [[TMP9]], [[TEMP_09]] ; CHECK-NEXT: br label [[FOR_INC]] ; CHECK: for.inc: ; CHECK-NEXT: [[TEMP_1:%.*]] = phi i32 [ [[ADD]], [[IF_THEN]] ], [ [[TEMP_09]], [[FOR_BODY]] ] ; CHECK-NEXT: [[INC:%.*]] = add nuw nsw i32 [[J_010]], 1 ; CHECK-NEXT: [[NITER_NSUB:%.*]] = sub i32 [[NITER]], 1 ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC]] -; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4 -; CHECK-NEXT: [[TOBOOL_1:%.*]] = icmp eq i32 [[TMP6]], 0 +; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4 +; CHECK-NEXT: [[TOBOOL_1:%.*]] = icmp eq i32 [[TMP10]], 0 ; CHECK-NEXT: br i1 [[TOBOOL_1]], label [[FOR_INC_1:%.*]], label [[IF_THEN_1:%.*]] -; CHECK: for.body.epil.1: -; CHECK-NEXT: [[ARRAYIDX_EPIL_1:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC_EPIL]] -; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_1]], align 4 -; CHECK-NEXT: [[TOBOOL_EPIL_1:%.*]] = icmp eq i32 [[TMP7]], 0 -; CHECK-NEXT: br i1 [[TOBOOL_EPIL_1]], label [[FOR_INC_EPIL_1]], label [[IF_THEN_EPIL_1:%.*]] -; CHECK: if.then.epil.1: -; CHECK-NEXT: [[ARRAYIDX1_EPIL_1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_EPIL]] -; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX1_EPIL_1]], align 4 -; CHECK-NEXT: [[ADD_EPIL_1:%.*]] = add i32 [[TMP8]], [[TEMP_1_EPIL]] -; CHECK-NEXT: br label [[FOR_INC_EPIL_1]] -; CHECK: for.inc.epil.1: -; CHECK-NEXT: [[TEMP_1_EPIL_1]] = phi i32 [ [[ADD_EPIL_1]], [[IF_THEN_EPIL_1]] ], [ [[TEMP_1_EPIL]], [[FOR_BODY_EPIL_1]] ] -; CHECK-NEXT: [[INC_EPIL_1:%.*]] = add nuw i32 [[INC_EPIL]], 1 -; CHECK-NEXT: [[EPIL_ITER_SUB_1:%.*]] = sub i32 [[EPIL_ITER_SUB]], 1 -; CHECK-NEXT: [[EPIL_ITER_CMP_1:%.*]] = icmp ne i32 [[EPIL_ITER_SUB_1]], 0 -; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_1]], label [[FOR_BODY_EPIL_2:%.*]], label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] -; CHECK: for.body.epil.2: -; CHECK-NEXT: [[ARRAYIDX_EPIL_2:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC_EPIL_1]] -; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_2]], align 4 -; CHECK-NEXT: [[TOBOOL_EPIL_2:%.*]] = icmp eq i32 [[TMP9]], 0 -; CHECK-NEXT: br i1 [[TOBOOL_EPIL_2]], label [[FOR_INC_EPIL_2]], label [[IF_THEN_EPIL_2:%.*]] -; CHECK: if.then.epil.2: -; CHECK-NEXT: [[ARRAYIDX1_EPIL_2:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_EPIL_1]] -; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX1_EPIL_2]], align 4 -; CHECK-NEXT: [[ADD_EPIL_2:%.*]] = add i32 [[TMP10]], [[TEMP_1_EPIL_1]] -; CHECK-NEXT: br label [[FOR_INC_EPIL_2]] -; CHECK: for.inc.epil.2: -; CHECK-NEXT: [[TEMP_1_EPIL_2]] = phi i32 [ [[ADD_EPIL_2]], [[IF_THEN_EPIL_2]] ], [ [[TEMP_1_EPIL_1]], [[FOR_BODY_EPIL_2]] ] -; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ; CHECK: if.then.1: ; CHECK-NEXT: [[ARRAYIDX1_1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC]] ; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* [[ARRAYIDX1_1]], align 4 @@ -203,41 +203,34 @@ define void @test_two_exits(i32* nocapture %Output, ; CHECK-NEXT: [[INC:%.*]] = add nuw nsw i32 [[J_016]], 1 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[INC]], [[MAXJ]] ; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY_1:%.*]], label [[CLEANUP_LOOPEXIT]] -; CHECK: cleanup.loopexit: -; CHECK-NEXT: [[TEMP_0_LCSSA_PH:%.*]] = phi i32 [ [[TEMP_0_ADD]], [[IF_END]] ], [ [[TEMP_015]], [[FOR_BODY]] ], [ [[TEMP_0_ADD]], [[FOR_BODY_1]] ], [ [[TEMP_0_ADD_1:%.*]], [[IF_END_1:%.*]] ], [ [[TEMP_0_ADD_1]], [[FOR_BODY_2:%.*]] ], [ [[TEMP_0_ADD_2:%.*]], [[IF_END_2:%.*]] ], [ [[TEMP_0_ADD_2]], [[FOR_BODY_3:%.*]] ], [ [[TEMP_0_ADD_3]], [[IF_END_3]] ] -; CHECK-NEXT: br label [[CLEANUP]] -; CHECK: cleanup: -; CHECK-NEXT: [[TEMP_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TEMP_0_LCSSA_PH]], [[CLEANUP_LOOPEXIT]] ] -; CHECK-NEXT: store i32 [[TEMP_0_LCSSA]], i32* [[OUTPUT:%.*]], align 4 -; CHECK-NEXT: ret void ; CHECK: for.body.1: ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC]] ; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4 ; CHECK-NEXT: [[CMP1_1:%.*]] = icmp ugt i32 [[TMP2]], 65535 -; CHECK-NEXT: br i1 [[CMP1_1]], label [[CLEANUP_LOOPEXIT]], label [[IF_END_1]] +; CHECK-NEXT: br i1 [[CMP1_1]], label [[CLEANUP_LOOPEXIT]], label [[IF_END_1:%.*]] ; CHECK: if.end.1: ; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC]] ; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[ARRAYIDX2_1]], align 4 ; CHECK-NEXT: [[TOBOOL_1:%.*]] = icmp eq i32 [[TMP3]], 0 ; CHECK-NEXT: [[ADD_1:%.*]] = select i1 [[TOBOOL_1]], i32 0, i32 [[TMP2]] -; CHECK-NEXT: [[TEMP_0_ADD_1]] = add i32 [[ADD_1]], [[TEMP_0_ADD]] +; CHECK-NEXT: [[TEMP_0_ADD_1:%.*]] = add i32 [[ADD_1]], [[TEMP_0_ADD]] ; CHECK-NEXT: [[INC_1:%.*]] = add nuw nsw i32 [[INC]], 1 ; CHECK-NEXT: [[CMP_1:%.*]] = icmp ult i32 [[INC_1]], [[MAXJ]] -; CHECK-NEXT: br i1 [[CMP_1]], label [[FOR_BODY_2]], label [[CLEANUP_LOOPEXIT]] +; CHECK-NEXT: br i1 [[CMP_1]], label [[FOR_BODY_2:%.*]], label [[CLEANUP_LOOPEXIT]] ; CHECK: for.body.2: ; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_1]] ; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX_2]], align 4 ; CHECK-NEXT: [[CMP1_2:%.*]] = icmp ugt i32 [[TMP4]], 65535 -; CHECK-NEXT: br i1 [[CMP1_2]], label [[CLEANUP_LOOPEXIT]], label [[IF_END_2]] +; CHECK-NEXT: br i1 [[CMP1_2]], label [[CLEANUP_LOOPEXIT]], label [[IF_END_2:%.*]] ; CHECK: if.end.2: ; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC_1]] ; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX2_2]], align 4 ; CHECK-NEXT: [[TOBOOL_2:%.*]] = icmp eq i32 [[TMP5]], 0 ; CHECK-NEXT: [[ADD_2:%.*]] = select i1 [[TOBOOL_2]], i32 0, i32 [[TMP4]] -; CHECK-NEXT: [[TEMP_0_ADD_2]] = add i32 [[ADD_2]], [[TEMP_0_ADD_1]] +; CHECK-NEXT: [[TEMP_0_ADD_2:%.*]] = add i32 [[ADD_2]], [[TEMP_0_ADD_1]] ; CHECK-NEXT: [[INC_2:%.*]] = add nuw nsw i32 [[INC_1]], 1 ; CHECK-NEXT: [[CMP_2:%.*]] = icmp ult i32 [[INC_2]], [[MAXJ]] -; CHECK-NEXT: br i1 [[CMP_2]], label [[FOR_BODY_3]], label [[CLEANUP_LOOPEXIT]] +; CHECK-NEXT: br i1 [[CMP_2]], label [[FOR_BODY_3:%.*]], label [[CLEANUP_LOOPEXIT]] ; CHECK: for.body.3: ; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_2]] ; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_3]], align 4 @@ -252,6 +245,13 @@ define void @test_two_exits(i32* nocapture %Output, ; CHECK-NEXT: [[INC_3]] = add nuw i32 [[INC_2]], 1 ; CHECK-NEXT: [[CMP_3:%.*]] = icmp ult i32 [[INC_3]], [[MAXJ]] ; CHECK-NEXT: br i1 [[CMP_3]], label [[FOR_BODY]], label [[CLEANUP_LOOPEXIT]] +; CHECK: cleanup.loopexit: +; CHECK-NEXT: [[TEMP_0_LCSSA_PH:%.*]] = phi i32 [ [[TEMP_0_ADD]], [[IF_END]] ], [ [[TEMP_015]], [[FOR_BODY]] ], [ [[TEMP_0_ADD]], [[FOR_BODY_1]] ], [ [[TEMP_0_ADD_1]], [[IF_END_1]] ], [ [[TEMP_0_ADD_1]], [[FOR_BODY_2]] ], [ [[TEMP_0_ADD_2]], [[IF_END_2]] ], [ [[TEMP_0_ADD_2]], [[FOR_BODY_3]] ], [ [[TEMP_0_ADD_3]], [[IF_END_3]] ] +; CHECK-NEXT: br label [[CLEANUP]] +; CHECK: cleanup: +; CHECK-NEXT: [[TEMP_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TEMP_0_LCSSA_PH]], [[CLEANUP_LOOPEXIT]] ] +; CHECK-NEXT: store i32 [[TEMP_0_LCSSA]], i32* [[OUTPUT:%.*]], align 4 +; CHECK-NEXT: ret void ; i32* nocapture readonly %Condition, i32* nocapture readonly %Input, @@ -417,100 +417,100 @@ define void @test_four_blocks(i32* nocapture %Output, ; CHECK-NEXT: [[EPIL_ITER_SUB:%.*]] = sub i32 [[XTRAITER]], 1 ; CHECK-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp ne i32 [[EPIL_ITER_SUB]], 0 ; CHECK-NEXT: br i1 [[EPIL_ITER_CMP]], label [[FOR_BODY_EPIL_1:%.*]], label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA:%.*]] -; CHECK: for.cond.cleanup.loopexit.epilog-lcssa: -; CHECK-NEXT: [[TEMP_1_LCSSA_PH1:%.*]] = phi i32 [ [[TEMP_1_EPIL]], [[FOR_INC_EPIL]] ], [ [[TEMP_1_EPIL_1:%.*]], [[FOR_INC_EPIL_1:%.*]] ], [ [[TEMP_1_EPIL_2:%.*]], [[FOR_INC_EPIL_2:%.*]] ] -; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT]] -; CHECK: for.cond.cleanup.loopexit: -; CHECK-NEXT: [[TEMP_1_LCSSA:%.*]] = phi i32 [ [[TEMP_1_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[TEMP_1_LCSSA_PH1]], [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ] -; CHECK-NEXT: br label [[FOR_COND_CLEANUP]] -; CHECK: for.cond.cleanup: -; CHECK-NEXT: [[TEMP_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TEMP_1_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ] -; CHECK-NEXT: store i32 [[TEMP_0_LCSSA]], i32* [[OUTPUT:%.*]], align 4 -; CHECK-NEXT: ret void -; CHECK: for.body: -; CHECK-NEXT: [[TMP6:%.*]] = phi i32 [ [[DOTPRE]], [[FOR_BODY_LR_PH_NEW]] ], [ [[TMP23]], [[FOR_INC_3]] ] -; CHECK-NEXT: [[J_027:%.*]] = phi i32 [ 1, [[FOR_BODY_LR_PH_NEW]] ], [ [[INC_3]], [[FOR_INC_3]] ] -; CHECK-NEXT: [[TEMP_026:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH_NEW]] ], [ [[TEMP_1_3]], [[FOR_INC_3]] ] -; CHECK-NEXT: [[NITER:%.*]] = phi i32 [ [[UNROLL_ITER]], [[FOR_BODY_LR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[FOR_INC_3]] ] -; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[J_027]] -; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 -; CHECK-NEXT: [[CMP1:%.*]] = icmp ugt i32 [[TMP7]], 65535 -; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[J_027]] -; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4 -; CHECK-NEXT: [[CMP4:%.*]] = icmp ugt i32 [[TMP8]], [[TMP6]] -; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]] -; CHECK: if.then: -; CHECK-NEXT: [[COND:%.*]] = zext i1 [[CMP4]] to i32 -; CHECK-NEXT: [[ADD:%.*]] = add i32 [[TEMP_026]], [[COND]] -; CHECK-NEXT: br label [[FOR_INC:%.*]] -; CHECK: if.else: -; CHECK-NEXT: [[NOT_CMP4:%.*]] = xor i1 [[CMP4]], true -; CHECK-NEXT: [[SUB:%.*]] = sext i1 [[NOT_CMP4]] to i32 -; CHECK-NEXT: [[SUB10_SINK:%.*]] = add i32 [[J_027]], [[SUB]] -; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[SUB10_SINK]] -; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX11]], align 4 -; CHECK-NEXT: [[SUB13:%.*]] = sub i32 [[TEMP_026]], [[TMP9]] -; CHECK-NEXT: br label [[FOR_INC]] -; CHECK: for.inc: -; CHECK-NEXT: [[TEMP_1:%.*]] = phi i32 [ [[ADD]], [[IF_THEN]] ], [ [[SUB13]], [[IF_ELSE]] ] -; CHECK-NEXT: [[INC:%.*]] = add nuw nsw i32 [[J_027]], 1 -; CHECK-NEXT: [[NITER_NSUB:%.*]] = sub i32 [[NITER]], 1 -; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC]] -; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4 -; CHECK-NEXT: [[CMP1_1:%.*]] = icmp ugt i32 [[TMP10]], 65535 -; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC]] -; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* [[ARRAYIDX2_1]], align 4 -; CHECK-NEXT: [[CMP4_1:%.*]] = icmp ugt i32 [[TMP11]], [[TMP8]] -; CHECK-NEXT: br i1 [[CMP1_1]], label [[IF_THEN_1:%.*]], label [[IF_ELSE_1:%.*]] ; CHECK: for.body.epil.1: ; CHECK-NEXT: [[ARRAYIDX_EPIL_1:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC_EPIL]] -; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_1]], align 4 -; CHECK-NEXT: [[CMP1_EPIL_1:%.*]] = icmp ugt i32 [[TMP12]], 65535 +; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_1]], align 4 +; CHECK-NEXT: [[CMP1_EPIL_1:%.*]] = icmp ugt i32 [[TMP6]], 65535 ; CHECK-NEXT: [[ARRAYIDX2_EPIL_1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_EPIL]] -; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* [[ARRAYIDX2_EPIL_1]], align 4 -; CHECK-NEXT: [[CMP4_EPIL_1:%.*]] = icmp ugt i32 [[TMP13]], [[TMP4]] +; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX2_EPIL_1]], align 4 +; CHECK-NEXT: [[CMP4_EPIL_1:%.*]] = icmp ugt i32 [[TMP7]], [[TMP4]] ; CHECK-NEXT: br i1 [[CMP1_EPIL_1]], label [[IF_THEN_EPIL_1:%.*]], label [[IF_ELSE_EPIL_1:%.*]] ; CHECK: if.else.epil.1: ; CHECK-NEXT: [[NOT_CMP4_EPIL_1:%.*]] = xor i1 [[CMP4_EPIL_1]], true ; CHECK-NEXT: [[SUB_EPIL_1:%.*]] = sext i1 [[NOT_CMP4_EPIL_1]] to i32 ; CHECK-NEXT: [[SUB10_SINK_EPIL_1:%.*]] = add i32 [[INC_EPIL]], [[SUB_EPIL_1]] ; CHECK-NEXT: [[ARRAYIDX11_EPIL_1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[SUB10_SINK_EPIL_1]] -; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX11_EPIL_1]], align 4 -; CHECK-NEXT: [[SUB13_EPIL_1:%.*]] = sub i32 [[TEMP_1_EPIL]], [[TMP14]] -; CHECK-NEXT: br label [[FOR_INC_EPIL_1]] +; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX11_EPIL_1]], align 4 +; CHECK-NEXT: [[SUB13_EPIL_1:%.*]] = sub i32 [[TEMP_1_EPIL]], [[TMP8]] +; CHECK-NEXT: br label [[FOR_INC_EPIL_1:%.*]] ; CHECK: if.then.epil.1: ; CHECK-NEXT: [[COND_EPIL_1:%.*]] = zext i1 [[CMP4_EPIL_1]] to i32 ; CHECK-NEXT: [[ADD_EPIL_1:%.*]] = add i32 [[TEMP_1_EPIL]], [[COND_EPIL_1]] ; CHECK-NEXT: br label [[FOR_INC_EPIL_1]] ; CHECK: for.inc.epil.1: -; CHECK-NEXT: [[TEMP_1_EPIL_1]] = phi i32 [ [[ADD_EPIL_1]], [[IF_THEN_EPIL_1]] ], [ [[SUB13_EPIL_1]], [[IF_ELSE_EPIL_1]] ] +; CHECK-NEXT: [[TEMP_1_EPIL_1:%.*]] = phi i32 [ [[ADD_EPIL_1]], [[IF_THEN_EPIL_1]] ], [ [[SUB13_EPIL_1]], [[IF_ELSE_EPIL_1]] ] ; CHECK-NEXT: [[INC_EPIL_1:%.*]] = add nuw i32 [[INC_EPIL]], 1 ; CHECK-NEXT: [[EPIL_ITER_SUB_1:%.*]] = sub i32 [[EPIL_ITER_SUB]], 1 ; CHECK-NEXT: [[EPIL_ITER_CMP_1:%.*]] = icmp ne i32 [[EPIL_ITER_SUB_1]], 0 ; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_1]], label [[FOR_BODY_EPIL_2:%.*]], label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ; CHECK: for.body.epil.2: ; CHECK-NEXT: [[ARRAYIDX_EPIL_2:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC_EPIL_1]] -; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_2]], align 4 -; CHECK-NEXT: [[CMP1_EPIL_2:%.*]] = icmp ugt i32 [[TMP15]], 65535 +; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_2]], align 4 +; CHECK-NEXT: [[CMP1_EPIL_2:%.*]] = icmp ugt i32 [[TMP9]], 65535 ; CHECK-NEXT: [[ARRAYIDX2_EPIL_2:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_EPIL_1]] -; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* [[ARRAYIDX2_EPIL_2]], align 4 -; CHECK-NEXT: [[CMP4_EPIL_2:%.*]] = icmp ugt i32 [[TMP16]], [[TMP13]] +; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX2_EPIL_2]], align 4 +; CHECK-NEXT: [[CMP4_EPIL_2:%.*]] = icmp ugt i32 [[TMP10]], [[TMP7]] ; CHECK-NEXT: br i1 [[CMP1_EPIL_2]], label [[IF_THEN_EPIL_2:%.*]], label [[IF_ELSE_EPIL_2:%.*]] ; CHECK: if.else.epil.2: ; CHECK-NEXT: [[NOT_CMP4_EPIL_2:%.*]] = xor i1 [[CMP4_EPIL_2]], true ; CHECK-NEXT: [[SUB_EPIL_2:%.*]] = sext i1 [[NOT_CMP4_EPIL_2]] to i32 ; CHECK-NEXT: [[SUB10_SINK_EPIL_2:%.*]] = add i32 [[INC_EPIL_1]], [[SUB_EPIL_2]] ; CHECK-NEXT: [[ARRAYIDX11_EPIL_2:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[SUB10_SINK_EPIL_2]] -; CHECK-NEXT: [[TMP17:%.*]] = load i32, i32* [[ARRAYIDX11_EPIL_2]], align 4 -; CHECK-NEXT: [[SUB13_EPIL_2:%.*]] = sub i32 [[TEMP_1_EPIL_1]], [[TMP17]] -; CHECK-NEXT: br label [[FOR_INC_EPIL_2]] +; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* [[ARRAYIDX11_EPIL_2]], align 4 +; CHECK-NEXT: [[SUB13_EPIL_2:%.*]] = sub i32 [[TEMP_1_EPIL_1]], [[TMP11]] +; CHECK-NEXT: br label [[FOR_INC_EPIL_2:%.*]] ; CHECK: if.then.epil.2: ; CHECK-NEXT: [[COND_EPIL_2:%.*]] = zext i1 [[CMP4_EPIL_2]] to i32 ; CHECK-NEXT: [[ADD_EPIL_2:%.*]] = add i32 [[TEMP_1_EPIL_1]], [[COND_EPIL_2]] ; CHECK-NEXT: br label [[FOR_INC_EPIL_2]] ; CHECK: for.inc.epil.2: -; CHECK-NEXT: [[TEMP_1_EPIL_2]] = phi i32 [ [[ADD_EPIL_2]], [[IF_THEN_EPIL_2]] ], [ [[SUB13_EPIL_2]], [[IF_ELSE_EPIL_2]] ] +; CHECK-NEXT: [[TEMP_1_EPIL_2:%.*]] = phi i32 [ [[ADD_EPIL_2]], [[IF_THEN_EPIL_2]] ], [ [[SUB13_EPIL_2]], [[IF_ELSE_EPIL_2]] ] ; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] +; CHECK: for.cond.cleanup.loopexit.epilog-lcssa: +; CHECK-NEXT: [[TEMP_1_LCSSA_PH1:%.*]] = phi i32 [ [[TEMP_1_EPIL]], [[FOR_INC_EPIL]] ], [ [[TEMP_1_EPIL_1]], [[FOR_INC_EPIL_1]] ], [ [[TEMP_1_EPIL_2]], [[FOR_INC_EPIL_2]] ] +; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT]] +; CHECK: for.cond.cleanup.loopexit: +; CHECK-NEXT: [[TEMP_1_LCSSA:%.*]] = phi i32 [ [[TEMP_1_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[TEMP_1_LCSSA_PH1]], [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ] +; CHECK-NEXT: br label [[FOR_COND_CLEANUP]] +; CHECK: for.cond.cleanup: +; CHECK-NEXT: [[TEMP_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TEMP_1_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ] +; CHECK-NEXT: store i32 [[TEMP_0_LCSSA]], i32* [[OUTPUT:%.*]], align 4 +; CHECK-NEXT: ret void +; CHECK: for.body: +; CHECK-NEXT: [[TMP12:%.*]] = phi i32 [ [[DOTPRE]], [[FOR_BODY_LR_PH_NEW]] ], [ [[TMP23]], [[FOR_INC_3]] ] +; CHECK-NEXT: [[J_027:%.*]] = phi i32 [ 1, [[FOR_BODY_LR_PH_NEW]] ], [ [[INC_3]], [[FOR_INC_3]] ] +; CHECK-NEXT: [[TEMP_026:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH_NEW]] ], [ [[TEMP_1_3]], [[FOR_INC_3]] ] +; CHECK-NEXT: [[NITER:%.*]] = phi i32 [ [[UNROLL_ITER]], [[FOR_BODY_LR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[FOR_INC_3]] ] +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[J_027]] +; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 +; CHECK-NEXT: [[CMP1:%.*]] = icmp ugt i32 [[TMP13]], 65535 +; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[J_027]] +; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4 +; CHECK-NEXT: [[CMP4:%.*]] = icmp ugt i32 [[TMP14]], [[TMP12]] +; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]] +; CHECK: if.then: +; CHECK-NEXT: [[COND:%.*]] = zext i1 [[CMP4]] to i32 +; CHECK-NEXT: [[ADD:%.*]] = add i32 [[TEMP_026]], [[COND]] +; CHECK-NEXT: br label [[FOR_INC:%.*]] +; CHECK: if.else: +; CHECK-NEXT: [[NOT_CMP4:%.*]] = xor i1 [[CMP4]], true +; CHECK-NEXT: [[SUB:%.*]] = sext i1 [[NOT_CMP4]] to i32 +; CHECK-NEXT: [[SUB10_SINK:%.*]] = add i32 [[J_027]], [[SUB]] +; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[SUB10_SINK]] +; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* [[ARRAYIDX11]], align 4 +; CHECK-NEXT: [[SUB13:%.*]] = sub i32 [[TEMP_026]], [[TMP15]] +; CHECK-NEXT: br label [[FOR_INC]] +; CHECK: for.inc: +; CHECK-NEXT: [[TEMP_1:%.*]] = phi i32 [ [[ADD]], [[IF_THEN]] ], [ [[SUB13]], [[IF_ELSE]] ] +; CHECK-NEXT: [[INC:%.*]] = add nuw nsw i32 [[J_027]], 1 +; CHECK-NEXT: [[NITER_NSUB:%.*]] = sub i32 [[NITER]], 1 +; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[CONDITION]], i32 [[INC]] +; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4 +; CHECK-NEXT: [[CMP1_1:%.*]] = icmp ugt i32 [[TMP16]], 65535 +; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC]] +; CHECK-NEXT: [[TMP17:%.*]] = load i32, i32* [[ARRAYIDX2_1]], align 4 +; CHECK-NEXT: [[CMP4_1:%.*]] = icmp ugt i32 [[TMP17]], [[TMP14]] +; CHECK-NEXT: br i1 [[CMP1_1]], label [[IF_THEN_1:%.*]], label [[IF_ELSE_1:%.*]] ; CHECK: if.else.1: ; CHECK-NEXT: [[NOT_CMP4_1:%.*]] = xor i1 [[CMP4_1]], true ; CHECK-NEXT: [[SUB_1:%.*]] = sext i1 [[NOT_CMP4_1]] to i32 @@ -532,7 +532,7 @@ define void @test_four_blocks(i32* nocapture %Output, ; CHECK-NEXT: [[CMP1_2:%.*]] = icmp ugt i32 [[TMP19]], 65535 ; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i32, i32* [[INPUT]], i32 [[INC_1]] ; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[ARRAYIDX2_2]], align 4 -; CHECK-NEXT: [[CMP4_2:%.*]] = icmp ugt i32 [[TMP20]], [[TMP11]] +; CHECK-NEXT: [[CMP4_2:%.*]] = icmp ugt i32 [[TMP20]], [[TMP17]] ; CHECK-NEXT: br i1 [[CMP1_2]], label [[IF_THEN_2:%.*]], label [[IF_ELSE_2:%.*]] ; CHECK: if.else.2: ; CHECK-NEXT: [[NOT_CMP4_2:%.*]] = xor i1 [[CMP4_2]], true @@ -742,10 +742,6 @@ define void @iterate_inc(%struct.Node* %n, i32 %limit) { ; CHECK-NEXT: [[TMP2:%.*]] = load %struct.Node*, %struct.Node** [[TMP1]], align 4 ; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq %struct.Node* [[TMP2]], null ; CHECK-NEXT: br i1 [[TOBOOL]], label [[WHILE_END_LOOPEXIT]], label [[LAND_RHS_1:%.*]] -; CHECK: while.end.loopexit: -; CHECK-NEXT: br label [[WHILE_END]] -; CHECK: while.end: -; CHECK-NEXT: ret void ; CHECK: land.rhs.1: ; CHECK-NEXT: [[VAL_1:%.*]] = getelementptr inbounds [[STRUCT_NODE]], %struct.Node* [[TMP2]], i32 0, i32 1 ; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[VAL_1]], align 4 @@ -782,6 +778,10 @@ define void @iterate_inc(%struct.Node* %n, i32 %limit) { ; CHECK-NEXT: [[TMP11]] = load %struct.Node*, %struct.Node** [[TMP10]], align 4 ; CHECK-NEXT: [[TOBOOL_3:%.*]] = icmp eq %struct.Node* [[TMP11]], null ; CHECK-NEXT: br i1 [[TOBOOL_3]], label [[WHILE_END_LOOPEXIT]], label [[LAND_RHS]] +; CHECK: while.end.loopexit: +; CHECK-NEXT: br label [[WHILE_END]] +; CHECK: while.end: +; CHECK-NEXT: ret void ; entry: %tobool5 = icmp eq %struct.Node* %n, null diff --git a/llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll b/llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll index ea18d3aa1054..33151c68b319 100644 --- a/llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll +++ b/llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll @@ -20,8 +20,6 @@ define void @test(i32* %x, i32 %n) { ; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 1 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i32 [[REM]], 1 ; CHECK-NEXT: br i1 [[CMP]], label [[WHILE_BODY_1:%.*]], label [[WHILE_END]] -; CHECK: while.end: -; CHECK-NEXT: ret void ; CHECK: while.body.1: ; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4 ; CHECK-NEXT: [[CMP1_1:%.*]] = icmp slt i32 [[TMP1]], 10 @@ -40,6 +38,8 @@ define void @test(i32* %x, i32 %n) { ; CHECK: if.then.2: ; CHECK-NEXT: store i32 0, i32* [[INCDEC_PTR_1]], align 4 ; CHECK-NEXT: br label [[WHILE_END]] +; CHECK: while.end: +; CHECK-NEXT: ret void ; entry: %sub = add nsw i32 %n, -1 @@ -76,9 +76,9 @@ define i32 @test2(i32 %l86) { ; CHECK-NEXT: [[L86_OFF:%.*]] = add i32 [[L86:%.*]], -1 ; CHECK-NEXT: [[SWITCH:%.*]] = icmp ult i32 [[L86_OFF]], 24 ; CHECK-NEXT: [[DOTNOT30:%.*]] = icmp ne i32 [[L86]], 25 -; CHECK-NEXT: [[SPEC_SELECT24:%.*]] = zext i1 [[DOTNOT30]] to i32 -; CHECK-NEXT: [[COMMON_RET31_OP:%.*]] = select i1 [[SWITCH]], i32 0, i32 [[SPEC_SELECT24]] -; CHECK-NEXT: ret i32 [[COMMON_RET31_OP]] +; CHECK-NEXT: [[SPEC_SELECT:%.*]] = zext i1 [[DOTNOT30]] to i32 +; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = select i1 [[SWITCH]], i32 0, i32 [[SPEC_SELECT]] +; CHECK-NEXT: ret i32 [[COMMON_RET_OP]] ; entry: br label %for.body.i.i diff --git a/llvm/test/Transforms/LoopUnroll/full-unroll-keep-first-exit.ll b/llvm/test/Transforms/LoopUnroll/full-unroll-keep-first-exit.ll index 316051715584..cdc8e944715e 100644 --- a/llvm/test/Transforms/LoopUnroll/full-unroll-keep-first-exit.ll +++ b/llvm/test/Transforms/LoopUnroll/full-unroll-keep-first-exit.ll @@ -15,12 +15,12 @@ define void @s32_max1(i32 %n, i32* %p) { ; CHECK-NEXT: [[INC:%.*]] = add i32 [[N]], 1 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[N]], [[ADD]] ; CHECK-NEXT: br i1 [[CMP]], label [[DO_BODY_1:%.*]], label [[DO_END:%.*]] -; CHECK: do.end: -; CHECK-NEXT: ret void ; CHECK: do.body.1: ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC]] ; CHECK-NEXT: store i32 [[INC]], i32* [[ARRAYIDX_1]], align 4 ; CHECK-NEXT: br label [[DO_END]] +; CHECK: do.end: +; CHECK-NEXT: ret void ; entry: %add = add i32 %n, 1 @@ -51,8 +51,6 @@ define void @s32_max2(i32 %n, i32* %p) { ; CHECK-NEXT: [[INC:%.*]] = add i32 [[N]], 1 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[N]], [[ADD]] ; CHECK-NEXT: br i1 [[CMP]], label [[DO_BODY_1:%.*]], label [[DO_END:%.*]] -; CHECK: do.end: -; CHECK-NEXT: ret void ; CHECK: do.body.1: ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC]] ; CHECK-NEXT: store i32 [[INC]], i32* [[ARRAYIDX_1]], align 4 @@ -60,6 +58,8 @@ define void @s32_max2(i32 %n, i32* %p) { ; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC_1]] ; CHECK-NEXT: store i32 [[INC_1]], i32* [[ARRAYIDX_2]], align 4 ; CHECK-NEXT: br label [[DO_END]] +; CHECK: do.end: +; CHECK-NEXT: ret void ; entry: %add = add i32 %n, 2 @@ -163,12 +163,12 @@ define void @u32_max1(i32 %n, i32* %p) { ; CHECK-NEXT: [[INC:%.*]] = add i32 [[N]], 1 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[N]], [[ADD]] ; CHECK-NEXT: br i1 [[CMP]], label [[DO_BODY_1:%.*]], label [[DO_END:%.*]] -; CHECK: do.end: -; CHECK-NEXT: ret void ; CHECK: do.body.1: ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC]] ; CHECK-NEXT: store i32 [[INC]], i32* [[ARRAYIDX_1]], align 4 ; CHECK-NEXT: br label [[DO_END]] +; CHECK: do.end: +; CHECK-NEXT: ret void ; entry: %add = add i32 %n, 1 @@ -199,8 +199,6 @@ define void @u32_max2(i32 %n, i32* %p) { ; CHECK-NEXT: [[INC:%.*]] = add i32 [[N]], 1 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[N]], [[ADD]] ; CHECK-NEXT: br i1 [[CMP]], label [[DO_BODY_1:%.*]], label [[DO_END:%.*]] -; CHECK: do.end: -; CHECK-NEXT: ret void ; CHECK: do.body.1: ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC]] ; CHECK-NEXT: store i32 [[INC]], i32* [[ARRAYIDX_1]], align 4 @@ -208,6 +206,8 @@ define void @u32_max2(i32 %n, i32* %p) { ; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC_1]] ; CHECK-NEXT: store i32 [[INC_1]], i32* [[ARRAYIDX_2]], align 4 ; CHECK-NEXT: br label [[DO_END]] +; CHECK: do.end: +; CHECK-NEXT: ret void ; entry: %add = add i32 %n, 2 diff --git a/llvm/test/Transforms/LoopUnroll/full-unroll-one-unpredictable-exit.ll b/llvm/test/Transforms/LoopUnroll/full-unroll-one-unpredictable-exit.ll index 095a7c1e1dd1..b7d7e00fa0c9 100644 --- a/llvm/test/Transforms/LoopUnroll/full-unroll-one-unpredictable-exit.ll +++ b/llvm/test/Transforms/LoopUnroll/full-unroll-one-unpredictable-exit.ll @@ -34,11 +34,11 @@ define i1 @test_latch() { ; CHECK-NEXT: [[LOAD2_1:%.*]] = load i64, i64* [[GEP2_1]], align 8 ; CHECK-NEXT: [[EXITCOND2_1:%.*]] = icmp eq i64 [[LOAD1_1]], [[LOAD2_1]] ; CHECK-NEXT: br i1 [[EXITCOND2_1]], label [[LATCH_1:%.*]], label [[EXIT]] +; CHECK: latch.1: +; CHECK-NEXT: br label [[EXIT]] ; CHECK: exit: ; CHECK-NEXT: [[EXIT_VAL:%.*]] = phi i1 [ false, [[LOOP]] ], [ false, [[LATCH]] ], [ true, [[LATCH_1]] ] ; CHECK-NEXT: ret i1 [[EXIT_VAL]] -; CHECK: latch.1: -; CHECK-NEXT: br label [[EXIT]] ; start: %a1 = alloca [2 x i64], align 8 @@ -95,22 +95,22 @@ define i1 @test_non_latch() { ; CHECK-NEXT: [[LOAD2:%.*]] = load i64, i64* [[GEP2]], align 8 ; CHECK-NEXT: [[EXITCOND2:%.*]] = icmp eq i64 [[LOAD1]], [[LOAD2]] ; CHECK-NEXT: br i1 [[EXITCOND2]], label [[LOOP_1:%.*]], label [[EXIT:%.*]] -; CHECK: exit: -; CHECK-NEXT: [[EXIT_VAL:%.*]] = phi i1 [ false, [[LATCH]] ], [ false, [[LATCH_1:%.*]] ], [ true, [[LOOP_2:%.*]] ], [ false, [[LATCH_2:%.*]] ] -; CHECK-NEXT: ret i1 [[EXIT_VAL]] ; CHECK: loop.1: -; CHECK-NEXT: br label [[LATCH_1]] +; CHECK-NEXT: br label [[LATCH_1:%.*]] ; CHECK: latch.1: ; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr inbounds [2 x i64], [2 x i64]* [[A1]], i64 0, i64 1 ; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr inbounds [2 x i64], [2 x i64]* [[A2]], i64 0, i64 1 ; CHECK-NEXT: [[LOAD1_1:%.*]] = load i64, i64* [[GEP1_1]], align 8 ; CHECK-NEXT: [[LOAD2_1:%.*]] = load i64, i64* [[GEP2_1]], align 8 ; CHECK-NEXT: [[EXITCOND2_1:%.*]] = icmp eq i64 [[LOAD1_1]], [[LOAD2_1]] -; CHECK-NEXT: br i1 [[EXITCOND2_1]], label [[LOOP_2]], label [[EXIT]] +; CHECK-NEXT: br i1 [[EXITCOND2_1]], label [[LOOP_2:%.*]], label [[EXIT]] ; CHECK: loop.2: -; CHECK-NEXT: br i1 true, label [[EXIT]], label [[LATCH_2]] +; CHECK-NEXT: br i1 true, label [[EXIT]], label [[LATCH_2:%.*]] ; CHECK: latch.2: ; CHECK-NEXT: br label [[EXIT]] +; CHECK: exit: +; CHECK-NEXT: [[EXIT_VAL:%.*]] = phi i1 [ false, [[LATCH]] ], [ false, [[LATCH_1]] ], [ true, [[LOOP_2]] ], [ false, [[LATCH_2]] ] +; CHECK-NEXT: ret i1 [[EXIT_VAL]] ; start: %a1 = alloca [2 x i64], align 8 diff --git a/llvm/test/Transforms/LoopUnroll/multiple-exits.ll b/llvm/test/Transforms/LoopUnroll/multiple-exits.ll index 0bea86350b99..9f40f51c10e6 100644 --- a/llvm/test/Transforms/LoopUnroll/multiple-exits.ll +++ b/llvm/test/Transforms/LoopUnroll/multiple-exits.ll @@ -14,8 +14,6 @@ define void @test1() { ; CHECK-NEXT: call void @bar() ; CHECK-NEXT: call void @bar() ; CHECK-NEXT: br label [[LATCH_1:%.*]] -; CHECK: exit: -; CHECK-NEXT: ret void ; CHECK: latch.1: </cut>

4 years, 8 months

[TCWG CI] 482.sphinx3 slowed down by 6% after llvm: Making the code compliant to the documentation about Floating Point support default values for C/C++. FPP-MODEL=PRECISE enables FFP-CONTRACT(FMA is enabled).

by ci_notify＠linaro.org

After llvm commit f04e387055e495e3e14570087d68e93593cf2918 Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com> Making the code compliant to the documentation about Floating Point support default values for C/C++. FPP-MODEL=PRECISE enables FFP-CONTRACT(FMA is enabled). the following benchmarks slowed down by more than 2%: - 482.sphinx3 slowed down by 6% from 25875 to 27484 perf samples - 482.sphinx3:[.] mgau_eval slowed down by 12% from 9996 to 11165 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O2 - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-f04e387055e495e3e14570087d68e93593cf2918 cd investigate-llvm-f04e387055e495e3e14570087d68e93593cf2918 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach f04e387055e495e3e14570087d68e93593cf2918 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 491beae71d69960a3bb0298b17d4ef1f3119b767 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit f04e387055e495e3e14570087d68e93593cf2918 Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com> Date: Tue Nov 9 09:35:25 2021 -0500 Making the code compliant to the documentation about Floating Point support default values for C/C++. FPP-MODEL=PRECISE enables FFP-CONTRACT(FMA is enabled). Fix for https://bugs.llvm.org/show_bug.cgi?id=50222 --- clang/docs/ReleaseNotes.rst | 10 +++ clang/docs/UsersManual.rst | 47 +++++++++++- clang/lib/Driver/ToolChains/Clang.cpp | 48 +++++++----- clang/test/CodeGen/ffp-contract-option.c | 127 +++++++++++++++++++++++++++++-- clang/test/CodeGen/ffp-model.c | 48 ++++++++++++ clang/test/CodeGen/ppc-emmintrin.c | 4 +- clang/test/CodeGen/ppc-xmmintrin.c | 4 +- clang/test/Driver/fp-model.c | 2 +- clang/test/Misc/ffp-contract.c | 10 +++ 9 files changed, 263 insertions(+), 37 deletions(-) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 57c5150becae..00582b689862 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -202,6 +202,16 @@ Arm and AArch64 Support in Clang architecture features, but will enable certain optimizations specific to Cortex-A57 CPUs and enable the use of a more accurate scheduling model. + +Floating Point Support in Clang +------------------------------- +- The -ffp-model=precise now implies -ffp-contract=on rather than + -ffp-contract=fast, and the documentation of these features has been + clarified. Previously, the documentation claimed that -ffp-model=precise was + the default, but this was incorrect because the precise model implied + -ffp-contract=fast, whereas the default behavior is -ffp-contract=on. + -ffp-model=precise is now exactly the default mode of the compiler. + Internal API Changes -------------------- diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst index 8c6922db6b37..406efb093d55 100644 --- a/clang/docs/UsersManual.rst +++ b/clang/docs/UsersManual.rst @@ -1260,8 +1260,49 @@ installed. Controlling Floating Point Behavior ----------------------------------- -Clang provides a number of ways to control floating point behavior. The options -are listed below. +Clang provides a number of ways to control floating point behavior, including +with command line options and source pragmas. This section +describes the various floating point semantic modes and the corresponding options. + +.. csv-table:: Floating Point Semantic Modes + :header: "Mode", "Values" + :widths: 15, 30, 30 + + "ffp-exception-behavior", "{ignore, strict, may_trap}", + "fenv_access", "{off, on}", "(none)" + "frounding-math", "{dynamic, tonearest, downward, upward, towardzero}" + "ffp-contract", "{on, off, fast, fast-honor-pragmas}" + "fdenormal-fp-math", "{IEEE, PreserveSign, PositiveZero}" + "fdenormal-fp-math-fp32", "{IEEE, PreserveSign, PositiveZero}" + "fmath-errno", "{on, off}" + "fhonor-nans", "{on, off}" + "fhonor-infinities", "{on, off}" + "fsigned-zeros", "{on, off}" + "freciprocal-math", "{on, off}" + "allow_approximate_fns", "{on, off}" + "fassociative-math", "{on, off}" + +This table describes the option settings that correspond to the three +floating point semantic models: precise (the default), strict, and fast. + + +.. csv-table:: Floating Point Models + :header: "Mode", "Precise", "Strict", "Fast" + :widths: 25, 15, 15, 15 + + "except_behavior", "ignore", "strict", "ignore" + "fenv_access", "off", "on", "off" + "rounding_mode", "tonearest", "dynamic", "tonearest" + "contract", "on", "off", "fast" + "denormal_fp_math", "IEEE", "IEEE", "PreserveSign" + "denormal_fp32_math", "IEEE","IEEE", "PreserveSign" + "support_math_errno", "on", "on", "off" + "no_honor_nans", "off", "off", "on" + "no_honor_infinities", "off", "off", "on" + "no_signed_zeros", "off", "off", "on" + "allow_reciprocal", "off", "off", "on" + "allow_approximate_fns", "off", "off", "on" + "allow_reassociation", "off", "off", "on" .. option:: -ffast-math @@ -1467,7 +1508,7 @@ Note that floating-point operations performed as part of constant initialization and ``fast``. Details: - * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=fast``). This is the default behavior. + * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=on``). This is the default behavior. * ``strict`` Enables ``-frounding-math`` and ``-ffp-exception-behavior=strict``, and disables contractions (FMA). All of the ``-ffast-math`` enablements are disabled. Enables ``STDC FENV_ACCESS``: by default ``FENV_ACCESS`` is disabled. This option setting behaves as though ``#pragma STDC FENV_ACESS ON`` appeared at the top of the source file. * ``fast`` Behaves identically to specifying both ``-ffast-math`` and ``ffp-contract=fast`` diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index e8ad105a7829..5d6f8e9fba0e 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -2666,10 +2666,14 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, llvm::DenormalMode DenormalFPMath = DefaultDenormalFPMath; llvm::DenormalMode DenormalFP32Math = DefaultDenormalFP32Math; - StringRef FPContract = ""; + // CUDA and HIP don't rely on the frontend to pass an ffp-contract option. + // If one wasn't given by the user, don't pass it here. + StringRef FPContract; + if (!JA.isDeviceOffloading(Action::OFK_Cuda) && + !JA.isOffloading(Action::OFK_HIP)) + FPContract = "on"; bool StrictFPModel = false; - if (const Arg *A = Args.getLastArg(options::OPT_flimited_precision_EQ)) { CmdArgs.push_back("-mlimit-float-precision"); CmdArgs.push_back(A->getValue()); @@ -2691,7 +2695,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, ReciprocalMath = false; SignedZeros = true; // -fno_fast_math restores default denormal and fpcontract handling - FPContract = ""; + FPContract = "on"; DenormalFPMath = llvm::DenormalMode::getIEEE(); // FIXME: The target may have picked a non-IEEE default mode here based on @@ -2711,12 +2715,10 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, // ffp-model= is a Driver option, it is entirely rewritten into more // granular options before being passed into cc1. // Use the gcc option in the switch below. - if (!FPModel.empty() && !FPModel.equals(Val)) { + if (!FPModel.empty() && !FPModel.equals(Val)) D.Diag(clang::diag::warn_drv_overriding_flag_option) - << Args.MakeArgString("-ffp-model=" + FPModel) - << Args.MakeArgString("-ffp-model=" + Val); - FPContract = ""; - } + << Args.MakeArgString("-ffp-model=" + FPModel) + << Args.MakeArgString("-ffp-model=" + Val); if (Val.equals("fast")) { optID = options::OPT_ffast_math; FPModel = Val; @@ -2724,7 +2726,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, } else if (Val.equals("precise")) { optID = options::OPT_ffp_contract; FPModel = Val; - FPContract = "fast"; + FPContract = "on"; PreciseFPModel = true; } else if (Val.equals("strict")) { StrictFPModel = true; @@ -2812,9 +2814,9 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, case options::OPT_ffp_contract: { StringRef Val = A->getValue(); if (PreciseFPModel) { - // -ffp-model=precise enables ffp-contract=fast as a side effect - // the FPContract value has already been set to a string literal - // and the Val string isn't a pertinent value. + // -ffp-model=precise enables ffp-contract=on. + // -ffp-model=precise sets PreciseFPModel to on and Val to + // "precise". FPContract is set. ; } else if (Val.equals("fast") || Val.equals("on") || Val.equals("off")) FPContract = Val; @@ -2907,23 +2909,27 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, AssociativeMath = false; ReciprocalMath = false; SignedZeros = true; - TrappingMath = false; - RoundingFPMath = false; // -fno_fast_math restores default denormal and fpcontract handling DenormalFPMath = DefaultDenormalFPMath; DenormalFP32Math = llvm::DenormalMode::getIEEE(); - FPContract = ""; + if (!JA.isDeviceOffloading(Action::OFK_Cuda) && + !JA.isOffloading(Action::OFK_HIP)) + if (FPContract == "fast") { + FPContract = "on"; + D.Diag(clang::diag::warn_drv_overriding_flag_option) + << "-ffp-contract=fast" + << "-ffp-contract=on"; + } break; } if (StrictFPModel) { // If -ffp-model=strict has been specified on command line but // subsequent options conflict then emit warning diagnostic. - if (HonorINFs && HonorNaNs && - !AssociativeMath && !ReciprocalMath && - SignedZeros && TrappingMath && RoundingFPMath && - (FPContract.equals("off") || FPContract.empty()) && - DenormalFPMath == llvm::DenormalMode::getIEEE() && - DenormalFP32Math == llvm::DenormalMode::getIEEE()) + if (HonorINFs && HonorNaNs && !AssociativeMath && !ReciprocalMath && + SignedZeros && TrappingMath && RoundingFPMath && + DenormalFPMath == llvm::DenormalMode::getIEEE() && + DenormalFP32Math == llvm::DenormalMode::getIEEE() && + FPContract.equals("off")) // OK: Current Arg doesn't conflict with -ffp-model=strict ; else { diff --git a/clang/test/CodeGen/ffp-contract-option.c b/clang/test/CodeGen/ffp-contract-option.c index 52b750795940..857a9c2369ef 100644 --- a/clang/test/CodeGen/ffp-contract-option.c +++ b/clang/test/CodeGen/ffp-contract-option.c @@ -1,9 +1,120 @@ -// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck %s -// REQUIRES: aarch64-registered-target - -float fma_test1(float a, float b, float c) { -// CHECK: fmadd - float x = a * b; - float y = x + c; - return y; +// REQUIRES: x86-registered-target +// RUN: %clang_cc1 -triple=x86_64 %s -emit-llvm -o - \ +// RUN:| FileCheck --check-prefixes CHECK,CHECK-DEFAULT %s + +// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=off %s -emit-llvm -o - \ +// RUN:| FileCheck --check-prefixes CHECK,CHECK-DEFAULT %s + +// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=on %s -emit-llvm -o - \ +// RUN:| FileCheck --check-prefixes CHECK,CHECK-ON %s + +// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=fast %s -emit-llvm -o - \ +// RUN:| FileCheck --check-prefixes CHECK,CHECK-CONTRACTFAST %s + +// RUN: %clang_cc1 -triple=x86_64 -ffast-math %s -emit-llvm -o - \ +// RUN:| FileCheck --check-prefixes CHECK,CHECK-CONTRACTOFF %s + +// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=off %s -emit-llvm \ +// RUN: -o - | FileCheck --check-prefixes CHECK,CHECK-CONTRACTOFF %s + +// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=on %s -emit-llvm \ +// RUN: -o - | FileCheck --check-prefixes CHECK,CHECK-ONFAST %s + +// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=fast %s -emit-llvm \ +// RUN: -o - | FileCheck --check-prefixes CHECK,CHECK-FASTFAST %s + +// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=fast -ffast-math %s \ +// RUN: -emit-llvm \ +// RUN: -o - | FileCheck --check-prefixes CHECK,CHECK-FASTFAST %s + +// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=off -fmath-errno \ +// RUN: -fno-rounding-math %s -emit-llvm -o - \ +// RUN: -o - | FileCheck --check-prefixes CHECK,CHECK-NOFAST %s + +// RUN: %clang -S -emit-llvm -fno-fast-math %s -o - \ +// RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON + +// RUN: %clang -S -emit-llvm -ffp-contract=fast -fno-fast-math \ +// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON + +// RUN: %clang -S -emit-llvm -ffp-contract=on -fno-fast-math \ +// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON + +// RUN: %clang -S -emit-llvm -ffp-contract=off -fno-fast-math \ +// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-OFF + +// RUN: %clang -S -emit-llvm -ffp-model=fast -fno-fast-math \ +// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON + +// RUN: %clang -S -emit-llvm -ffp-model=precise -fno-fast-math \ +// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON + +// RUN: %clang -S -emit-llvm -ffp-model=strict -fno-fast-math \ +// RUN: -target x86_64 %s -o - | FileCheck %s \ +// RUN: --check-prefixes=CHECK,CHECK-FPSC-OFF + +// RUN: %clang -S -emit-llvm -ffast-math -fno-fast-math \ +// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON + +float mymuladd(float x, float y, float z) { + // CHECK: define{{.*}} float @mymuladd + return x * y + z; + // expected-warning{{overriding '-ffp-contract=fast' option with '-ffp-contract=on'}} + + // CHECK-DEFAULT: load float, float* + // CHECK-DEFAULT: fmul float + // CHECK-DEFAULT: load float, float* + // CHECK-DEFAULT: fadd float + + // CHECK-ON: load float, float* + // CHECK-ON: load float, float* + // CHECK-ON: load float, float* + // CHECK-ON: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}}) + + // CHECK-CONTRACTFAST: load float, float* + // CHECK-CONTRACTFAST: load float, float* + // CHECK-CONTRACTFAST: fmul contract float + // CHECK-CONTRACTFAST: load float, float* + // CHECK-CONTRACTFAST: fadd contract float + + // CHECK-CONTRACTOFF: load float, float* + // CHECK-CONTRACTOFF: load float, float* + // CHECK-CONTRACTOFF: fmul reassoc nnan ninf nsz arcp afn float + // CHECK-CONTRACTOFF: load float, float* + // CHECK-CONTRACTOFF: fadd reassoc nnan ninf nsz arcp afn float {{.*}}, {{.*}} + + // CHECK-ONFAST: load float, float* + // CHECK-ONFAST: load float, float* + // CHECK-ONFAST: load float, float* + // CHECK-ONFAST: call reassoc nnan ninf nsz arcp afn float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}}) + + // CHECK-FASTFAST: load float, float* + // CHECK-FASTFAST: load float, float* + // CHECK-FASTFAST: fmul fast float + // CHECK-FASTFAST: load float, float* + // CHECK-FASTFAST: fadd fast float {{.*}}, {{.*}} + + // CHECK-NOFAST: load float, float* + // CHECK-NOFAST: load float, float* + // CHECK-NOFAST: fmul float {{.*}}, {{.*}} + // CHECK-NOFAST: load float, float* + // CHECK-NOFAST: fadd float {{.*}}, {{.*}} + + // CHECK-FPC-ON: load float, float* + // CHECK-FPC-ON: load float, float* + // CHECK-FPC-ON: load float, float* + // CHECK-FPC-ON: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}}) + + // CHECK-FPC-OFF: load float, float* + // CHECK-FPC-OFF: load float, float* + // CHECK-FPC-OFF: fmul float + // CHECK-FPC-OFF: load float, float* + // CHECK-FPC-OFF: fadd float {{.*}}, {{.*}} + + // CHECK-FFPC-OFF: load float, float* + // CHECK-FFPC-OFF: load float, float* + // CHECK-FPSC-OFF: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, {{.*}}) + // CHECK-FPSC-OFF: load float, float* + // CHECK-FPSC-OFF: [[RES:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, {{.*}}) + } diff --git a/clang/test/CodeGen/ffp-model.c b/clang/test/CodeGen/ffp-model.c new file mode 100644 index 000000000000..21dbc8de99aa --- /dev/null +++ b/clang/test/CodeGen/ffp-model.c @@ -0,0 +1,48 @@ +// REQUIRES: x86-registered-target +// RUN: %clang -S -emit-llvm -ffp-model=fast -emit-llvm %s -o - \ +// RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-FAST + +// RUN: %clang -S -emit-llvm -ffp-model=precise %s -o - \ +// RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-PRECISE + +// RUN: %clang -S -emit-llvm -ffp-model=strict %s -o - \ +// RUN: -target x86_64 | FileCheck %s --check-prefixes=CHECK,CHECK-STRICT + +// RUN: %clang -S -emit-llvm -ffp-model=strict -ffast-math \ +// RUN: -target x86_64 %s -o - | FileCheck %s \ +// RUN: --check-prefixes CHECK,CHECK-STRICT-FAST + +// RUN: %clang -S -emit-llvm -ffp-model=precise -ffast-math \ +// RUN: %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-FAST1 + +float mymuladd(float x, float y, float z) { + // CHECK: define{{.*}} float @mymuladd + return x * y + z; + + // CHECK-FAST: fmul fast float + // CHECK-FAST: load float, float* + // CHECK-FAST: fadd fast float + + // CHECK-PRECISE: load float, float* + // CHECK-PRECISE: load float, float* + // CHECK-PRECISE: load float, float* + // CHECK-PRECISE: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}}) + + // CHECK-STRICT: load float, float* + // CHECK-STRICT: load float, float* + // CHECK-STRICT: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, {{.*}}) + // CHECK-STRICT: load float, float* + // CHECK-STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, {{.*}}) + + // CHECK-STRICT-FAST: load float, float* + // CHECK-STRICT-FAST: load float, float* + // CHECK-STRICT-FAST: call fast float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, {{.*}}) + // CHECK-STRICT-FAST: load float, float* + // CHECK-STRICT-FAST: call fast float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, {{.*}} + + // CHECK-FAST1: load float, float* + // CHECK-FAST1: load float, float* + // CHECK-FAST1: fmul fast float {{.*}}, {{.*}} + // CHECK-FAST1: load float, float* {{.*}} + // CHECK-FAST1: fadd fast float {{.*}}, {{.*}} +} diff --git a/clang/test/CodeGen/ppc-emmintrin.c b/clang/test/CodeGen/ppc-emmintrin.c index fa3801f50a01..4a246ff92d76 100644 --- a/clang/test/CodeGen/ppc-emmintrin.c +++ b/clang/test/CodeGen/ppc-emmintrin.c @@ -2,9 +2,9 @@ // REQUIRES: powerpc-registered-target // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE +// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE +// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE // CHECK-BE-DAG: @_mm_movemask_pd.perm_mask = internal constant <4 x i32> <i32 -2139062144, i32 -2139062144, i32 -2139062144, i32 -2139078656>, align 16 // CHECK-BE-DAG: @_mm_shuffle_epi32.permute_selectors = internal constant [4 x i32] [i32 66051, i32 67438087, i32 134810123, i32 202182159], align 4 diff --git a/clang/test/CodeGen/ppc-xmmintrin.c b/clang/test/CodeGen/ppc-xmmintrin.c index d3f18bfbb1e5..4aff7a7c3dda 100644 --- a/clang/test/CodeGen/ppc-xmmintrin.c +++ b/clang/test/CodeGen/ppc-xmmintrin.c @@ -2,11 +2,11 @@ // REQUIRES: powerpc-registered-target // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE +// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE // RUN: %clang -x c++ -fsyntax-only -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ // RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE +// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE // RUN: %clang -x c++ -fsyntax-only -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ // RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns diff --git a/clang/test/Driver/fp-model.c b/clang/test/Driver/fp-model.c index 5fa9d110dd83..0824b3e2c596 100644 --- a/clang/test/Driver/fp-model.c +++ b/clang/test/Driver/fp-model.c @@ -99,7 +99,7 @@ // RUN: %clang -### -nostdinc -ffp-model=precise -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-PRECISE %s // CHECK-FPM-PRECISE: "-cc1" -// CHECK-FPM-PRECISE: "-ffp-contract=fast" +// CHECK-FPM-PRECISE: "-ffp-contract=on" // CHECK-FPM-PRECISE: "-fno-rounding-math" // RUN: %clang -### -nostdinc -ffp-model=strict -c %s 2>&1 \ diff --git a/clang/test/Misc/ffp-contract.c b/clang/test/Misc/ffp-contract.c new file mode 100644 index 000000000000..0d26905d4ef2 --- /dev/null +++ b/clang/test/Misc/ffp-contract.c @@ -0,0 +1,10 @@ +// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin \ +// RUN: -S -o - %s | FileCheck --check-prefix=CHECK-FMADD %s +// REQUIRES: aarch64-registered-target + +float fma_test1(float a, float b, float c) { + // CHECK-FMADD: fmadd + float x = a * b; + float y = x + c; + return y; +} </cut>

4 years, 8 months

[ACTIVITY] week ending Nov. 14 2021

by Alex Bennée

VirtIO Initiative ([STR-9]) =========================== - project admin [STR-9] <https://linaro.atlassian.net/browse/STR-9> [upstream rust-vmm sync meeting] <https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…> [proposal] <https://github.com/rust-vmm/vhost-device/pull/57> vhost-device maintainer effort ([UM-196]) - did a bunch of review on [vhost-device crate] [UM-196] <https://linaro.atlassian.net/browse/UM-196> [vhost-device crate] <https://github.com/rust-vmm/vhost-device> QEMU Upstream Work ([UM-2]) =========================== [UM-2] <https://linaro.atlassian.net/browse/UM-2> Upstream MTTCG tests ([QEMU-52]) - posted [kvm-unit-tests PATCH v3 0/3] GIC ITS tests Message-Id: <20211112114734.3058678-1-alex.bennee(a)linaro.org> - might as well flush the tree state as I left it - posted [RFC PATCH] hw/intc: clean-up error reporting for failed ITS cmd Message-Id: <20211112170454.3158925-1-alex.bennee(a)linaro.org> - re-based [mttcg tests to current state and fixed up] [QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52> [mttcg tests to current state and fixed up] <https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8> Completed Reviews [2/2] ======================= [PATCH v2 0/3] Some watchpoint-related patches Message-Id: <163662450348.125458.5494710452733592356.stgit@pasha-ThinkPad-X280> [PATCH 0/5] Update linux-headers + NOIRQ support for KVM gdbstub Message-Id: <20211111110604.207376-1-pbonzini(a)redhat.com> Absences ======== - none Current Review Queue ==================== TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com> =================================================================================================================== TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option Message-Id: <20211013010120.96851-1-sjg(a)chromium.org> =========================================================================================================== TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org> ================================================================================================================================== TODO [PATCH] softmmu: fix watchpoint processing in icount mode Message-Id: <163101424137.678744.18360776310711795413.stgit@pasha-ThinkPad-X280> ============================================================================================================================================== -- Alex Bennée

4 years, 8 months

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain