- linaro-toolchain - lists.linaro.org

Interest in reproducing gcc-linaro-4.9-2016.02 arm-linux-gnueabihf target under darwin and linux aarch64 host

by jhgorse＠gmail.com

Hello, I see this release gcc-linaro-4.9-2016.02 for 86_64_arm-linux-gnueabihf: https://releases.linaro.org/components/toolchain/binaries/4.9-2016.02/arm-l… and would like to reproduce the toolchain for aarch64 hosts. I see that it was built with ABE, though I have generally been unsuccessful in getting ABE to work on aarch64 for this. I was looking for some build or ci breadcrumbs or documentation. What can you recommend? The motivation here is to support legacy development/testing from modern aarch64 hardware. Cheers, Joe Gorse

3 years, 3 months

1
0
0 0

[Activity] Week #10

by Thiago Jung Bauermann

Hello, # GDB support for ARMv9 Scalable Matrix Extension (SME) - Synced with Luis Machado to learn what the current status is. Read discussions in the linux-arm-kernel mailing list which he pointed to. - Read Arm architecture documentation about Neon, SVE, SVE2 and SME to familiarise myself with these features. - Basic setup / onboarding - Joined some internal and external mailing lists, IRC and Slack channels. - Read some company policy documents. - Researched models and got a quote for a work laptop. - Set up aarch64 cross-compilation environment on my laptop. - Set up emulated aarch64 machine with Fedora on my laptop. - Attempted setting up emulated aarch64 machine with Ubuntu on my laptop, but ran into problems with the Ubuntu Server installer. -- Thiago

3 years, 3 months

1
0
0 0

[ACTIVITY] week ending Mar. 6 2022

by Alex Bennée

Project Stratos =============== - spent some time talking through design approaches for xen vhost-master with Viresh Linux RPMB Sub-system and virtio-driver ([STR-40]) - continued working on [Linux driver] - discovered a bug in vhost-user config handling in QEMU as well [STR-40] <https://linaro.atlassian.net/browse/STR-40> [Linux driver] <http://git.linaro.org/people/alex.bennee/linux.git/shortlog/refs/heads/rpmb…> QEMU Upstream Work ([UM-2]) =========================== - posted [PULL 00/18] testing and semihosting updates Message-Id: <20220301094715.550871-1-alex.bennee(a)linaro.org> Other ===== - started work on presentation for LTD Completed Reviews [5/5] ======================= [PATCH] gdbstub.c: add support for info proc mappings Message-Id: <20220221030910.3203063-1-dominik.b.czarnota(a)gmail.com> [PATCH] tests/Makefile.include: Let "make clean" remove the TCG tests, too Message-Id: <20220301085900.1443232-1-thuth(a)redhat.com> [PATCH 0/3] gdbstub: add support for switchable endianness Message-Id: <20210823142004.17935-1-changbin.du(a)gmail.com> [PATCH 0/6] More record/replay acceptance tests Message-Id: <162332427732.194926.7555369160312506539.stgit@pasha-ThinkPad-X280> [PATCH v6 00/43] CXl 2.0 emulation Support Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com> Absences ======== Current Review Queue ==================== TODO [PATCH v4 00/18] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220301215958.157011-1-richard.henderson(a)linaro.org> ===================================================================================================================================== TODO [RFC PATCH 00/27] Virtio sound card implementation Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com> ============================================================================================================================ TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org> ================================================================================================================================== -- Alex Bennée

3 years, 4 months

1
0
0 0

[ACTIVITY] report week ending 4 Mar

by Peter Maydell

Progress * UM-2 [QEMU upstream maintainership] + Looked at and sent patches to fix a minor decode error for Neon VLD1/VST1 that RTH found + softfreeze is next Tuesday -- sent out last big Arm pullreq before freeze, though there will probably need to be another smaller one + code review, respinning previously sent patches, looking at bug reports, all to get things in before freeze * QEMU-420 [GICv4 emulation] + All the GICv4.0 stuff is now code-complete, but testing and loose ends (like plumbing it into the virt board) will take a while still. -- PMM

3 years, 4 months

1
0
0 0

[weekly][linaro] report week ending 25 Feb

by Peter Maydell

Progress: * UM-2 [QEMU upstream maintainership] + Respins of a few patchsets that needed v2 + Looked at a few bugs since softfreeze for 7.0 is near + Amazingly my to-review queue is now almost empty * QEMU-420 [GICv4 emulation] + Implemented more of the redistributor code -- the last missing big piece is its handling of VMOVI, though there are also probably some loose ends to tidy up + Note that this isn't going to be in time for 7.0, so will likely go on the back-burner a bit in favour of release-critical items thanks -- PMM

3 years, 4 months

1
0
0 0

[ACTIVITY] week ending Feb. 27 2022

by Alex Bennée

Project Stratos =============== - spent more time troubleshooting Xen builds with Viresh Linux RPMB Sub-system and virtio-driver ([STR-40]) - continued working on [Linux driver] - discovered a bug in vhost-user config handling in QEMU as well [STR-40] <https://linaro.atlassian.net/browse/STR-40> [Linux driver] <http://git.linaro.org/people/alex.bennee/linux.git/shortlog/refs/heads/rpmb…> QEMU Upstream Work ([UM-2]) =========================== - follow-up on Analysis of slow distro boots in check-avocado (BootLinuxAarch64.test_virt_tcg*) Message-Id: <874k4xbqvp.fsf(a)linaro.org> - posted [PATCH v2 00/18] testing and semihosting pre-PR Message-Id: <20220225172021.3493923-1-alex.bennee(a)linaro.org> [UM-2] <https://linaro.atlassian.net/browse/UM-2> Current Review Queue ==================== TODO [RFC PATCH 00/27] Virtio sound card implementation Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com> ============================================================================================================================ TODO [PATCH v6 00/43] CXl 2.0 emulation Support Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com> ============================================================================================================== TODO [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220210040423.95120-1-richard.henderson(a)linaro.org> ==================================================================================================================================== -- Alex Bennée

3 years, 4 months

1
0
0 0

[ACTIVITY] week ending Feb. 20 2022

by Alex Bennée

Project Stratos =============== - spent more time troubleshooting Xen builds with Viresh Linux RPMB Sub-system and virtio-driver ([STR-40]) - started working on v2 of the Linux driver QEMU Upstream Work ([UM-2]) =========================== - posted Analysis of slow distro boots in check-avocado (BootLinuxAarch64.test_virt_tcg*) Message-Id: <874k4xbqvp.fsf(a)linaro.org> [UM-2] <https://linaro.atlassian.net/browse/UM-2> Current Review Queue ==================== TODO [RFC PATCH 00/27] Virtio sound card implementation Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com> ============================================================================================================================ TODO [PATCH v6 00/43] CXl 2.0 emulation Support Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com> ============================================================================================================== TODO [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220210040423.95120-1-richard.henderson(a)linaro.org> ==================================================================================================================================== -- Alex Bennée

3 years, 4 months

1
0
0 0

[ACTIVITY] report week ending 18 Feb

by Peter Maydell

Progress (a report covering two half-weeks) * UM-2 [QEMU upstream maintainership] - lots of code review - fixed another bug in the armv7m clock framework code - refactoring patchset to trim some fat from a header that gets included by every C file in the build * QEMU-420 [GICv4 emulation] - CPU interface parts of GICv4 work are code-complete - started on the redistributor work -- PMM

3 years, 4 months

1
0
0 0

[ACTIVITY] week ending Feb. 13 2022

by Alex Bennée

Project Stratos =============== - posted Metadata and signalling channels for Zephyr virtio-backends on Xen Message-Id: <87h79bgd1m.fsf(a)linaro.org> - spent some time troubleshooting Xen builds with Viresh vhost-device maintainer effort ([UM-196]) - posted [a pull request in rust-vmm/community] [a pull request in rust-vmm/community] <elfeed:github.com#tag:github.com,2008:PullRequestEvent/20180885703> QEMU Upstream Work ([UM-2]) =========================== - posted [RFC PATCH] tcg/optimize: only read val after const check Message-Id: <20220209112142.3367525-1-alex.bennee(a)linaro.org> - posted [PULL 00/28] testing and plugin updates Message-Id: <20220209141529.3418384-1-alex.bennee(a)linaro.org> - triage for [qemu-x86_64 uses host libraries instead of emulated system libraries] - triage for [linux-user: substantial memory leak when threads are created and destroyed] - posted [RFC PATCH] linux-user: trap internal SIGABRT's Message-Id: <20220209112207.3368139-1-alex.bennee(a)linaro.org> - posted [PATCH v5 0/2] semihosting/next (SYS_HEAPINFO) Message-Id: <20220210113021.3799514-2-alex.bennee(a)linaro.org> - posted [PATCH v1 00/11] testing/next (docker, lcitool, ci, tcg) Message-Id: <20220211160309.335014-1-alex.bennee(a)linaro.org> [UM-2] <https://linaro.atlassian.net/browse/UM-2> [qemu-x86_64 uses host libraries instead of emulated system libraries] <elfeed:gitlab.com#https://gitlab.com/qemu-project/qemu/-/issues/857> [linux-user: substantial memory leak when threads are created and destroyed] <elfeed:gitlab.com#https://gitlab.com/qemu-project/qemu/-/issues/866> Upstream MTTCG tests ([QEMU-52]) - still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org> [QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52> Completed Reviews [0/0] ======================= Absences ======== Current Review Queue ==================== TODO [PATCH v6 00/43] CXl 2.0 emulation Support Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com> ============================================================================================================== TODO [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220210040423.95120-1-richard.henderson(a)linaro.org> ==================================================================================================================================== TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org> ================================================================================================================================== -- Alex Bennée

3 years, 4 months

1
0
0 0

Re: [TCWG CI] 401.bzip2 grew in size by 9% after llvm: [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`

by Maxim Kuvyrkov

Hi Roman, Your below patch increased code-size of 401.bzip2 by 9% on 32-bit ARM when compiled with -Os. That’s quite a lot, would you please investigate whether this regression can be avoided? Please let me know if this doesn’t reproduce for you and I’ll try to help. Thank you, -- Maxim Kuvyrkov https://www.linaro.org > On 9 Feb 2022, at 17:10, ci_notify(a)linaro.org wrote: > > After llvm commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9 > Author: Roman Lebedev <lebedev.ri(a)gmail.com> > > [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` > > the following benchmarks grew in size by more than 1%: > - 401.bzip2 grew in size by 9% from 37909 to 41405 bytes > - 401.bzip2:[.] BZ2_decompress grew in size by 42% from 7664 to 10864 bytes > - 429.mcf grew in size by 2% from 7732 to 7908 bytes > > Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. > > For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: > - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > > Configuration: > - Benchmark: SPEC CPU2006 > - Toolchain: Clang + Glibc + LLVM Linker > - Version: all components were built from their tip of trunk > - Target: arm-linux-gnueabihf > - Compiler flags: -Os -mthumb > - Hardware: APM Mustang 8x X-Gene1 > > This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. > > THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. > > This commit has regressed these CI configurations: > - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os_LTO > - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os > - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os_LTO > > First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > > Reproduce builds: > <cut> > mkdir investigate-llvm-77a0da926c9ea86afa9baf28158d79c7678fc6b9 > cd investigate-llvm-77a0da926c9ea86afa9baf28158d79c7678fc6b9 > > # Fetch scripts > git clone https://git.linaro.org/toolchain/jenkins-scripts > > # Fetch manifests and test.sh script > mkdir -p artifacts/manifests > curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail > curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail > curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail > chmod +x artifacts/test.sh > > # Reproduce the baseline build (build all pre-requisites) > ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh > > # Save baseline build state (which is then restored in artifacts/test.sh) > mkdir -p ./bisect > rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ > > cd llvm > > # Reproduce first_bad build > git checkout --detach 77a0da926c9ea86afa9baf28158d79c7678fc6b9 > ../artifacts/test.sh > > # Reproduce last_good build > git checkout --detach f59787084e09aeb787cb3be3103b2419ccd14163 > ../artifacts/test.sh > > cd .. > </cut> > > Full commit (up to 1000 lines): > <cut> > commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9 > Author: Roman Lebedev <lebedev.ri(a)gmail.com> > Date: Mon Feb 7 16:03:40 2022 +0300 > > [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` > > D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model. > What it essentially does is prevents scalarized vectorization of masked memory operations: > ``` > // TODO: Cost model for emulated masked load/store is completely > // broken. This hack guides the cost model to use an artificially > // high enough value to practically disable vectorization with such > // operations, except where previously deployed legality hack allowed > // using very low cost values. This is to avoid regressions coming simply > // from moving "masked load/store" check from legality to cost model. > // Masked Load/Gather emulation was previously never allowed. > // Limited number of Masked Store/Scatter emulation was allowed. > ``` > > While i don't really understand about what specifically `is completely broken` > was talking about, i believe that at least on X86 with AVX2-or-later, > this is no longer true. (or at least, i would like to know what is still broken). > So i would like to follow suit after D111460, and like wise disable that hack for AVX2+. > > But since this was added for X86 specifically, let's just instead completely remove this hack. > > Reviewed By: RKSimon > > Differential Revision: https://reviews.llvm.org/D114779 > --- > llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 34 +- > .../X86/masked-gather-i32-with-i8-index.ll | 40 +- > .../X86/masked-gather-i64-with-i8-index.ll | 40 +- > .../CostModel/X86/masked-interleaved-load-i16.ll | 36 +- > .../CostModel/X86/masked-interleaved-store-i16.ll | 24 +- > .../test/Analysis/CostModel/X86/masked-load-i16.ll | 46 +- > .../test/Analysis/CostModel/X86/masked-load-i32.ll | 16 +- > .../test/Analysis/CostModel/X86/masked-load-i64.ll | 16 +- > llvm/test/Analysis/CostModel/X86/masked-load-i8.ll | 46 +- > .../AArch64/tail-fold-uniform-memops.ll | 159 ++- > .../Transforms/LoopVectorize/X86/gather_scatter.ll | 1176 ++++++++++++++++---- > .../X86/x86-interleaved-accesses-masked-group.ll | 1041 ++++++++--------- > .../Transforms/LoopVectorize/if-pred-stores.ll | 6 +- > .../Transforms/LoopVectorize/memdep-fold-tail.ll | 6 +- > llvm/test/Transforms/LoopVectorize/optsize.ll | 837 +++++++++++--- > llvm/test/Transforms/LoopVectorize/tripcount.ll | 673 ++++++++++- > .../LoopVectorize/vplan-sink-scalars-and-merge.ll | 4 +- > 17 files changed, 3064 insertions(+), 1136 deletions(-) > > diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp > index bfe08d42c883..ccce2c2a7b15 100644 > --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp > +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp > @@ -307,11 +307,6 @@ static cl::opt<bool> InterleaveSmallLoopScalarReduction( > cl::desc("Enable interleaving for loops with small iteration counts that " > "contain scalar reductions to expose ILP.")); > > -/// The number of stores in a loop that are allowed to need predication. > -static cl::opt<unsigned> NumberOfStoresToPredicate( > - "vectorize-num-stores-pred", cl::init(1), cl::Hidden, > - cl::desc("Max number of stores to be predicated behind an if.")); > - > static cl::opt<bool> EnableIndVarRegisterHeur( > "enable-ind-var-reg-heur", cl::init(true), cl::Hidden, > cl::desc("Count the induction variable only once when interleaving")); > @@ -1797,10 +1792,6 @@ private: > /// as a vector operation. > bool isConsecutiveLoadOrStore(Instruction *I); > > - /// Returns true if an artificially high cost for emulated masked memrefs > - /// should be used. > - bool useEmulatedMaskMemRefHack(Instruction *I, ElementCount VF); > - > /// Map of scalar integer values to the smallest bitwidth they can be legally > /// represented as. The vector equivalents of these values should be truncated > /// to this type. > @@ -6437,22 +6428,6 @@ LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) { > return RUs; > } > > -bool LoopVectorizationCostModel::useEmulatedMaskMemRefHack(Instruction *I, > - ElementCount VF) { > - // TODO: Cost model for emulated masked load/store is completely > - // broken. This hack guides the cost model to use an artificially > - // high enough value to practically disable vectorization with such > - // operations, except where previously deployed legality hack allowed > - // using very low cost values. This is to avoid regressions coming simply > - // from moving "masked load/store" check from legality to cost model. > - // Masked Load/Gather emulation was previously never allowed. > - // Limited number of Masked Store/Scatter emulation was allowed. > - assert(isPredicatedInst(I, VF) && "Expecting a scalar emulated instruction"); > - return isa<LoadInst>(I) || > - (isa<StoreInst>(I) && > - NumPredStores > NumberOfStoresToPredicate); > -} > - > void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) { > // If we aren't vectorizing the loop, or if we've already collected the > // instructions to scalarize, there's nothing to do. Collection may already > @@ -6478,9 +6453,7 @@ void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) { > ScalarCostsTy ScalarCosts; > // Do not apply discount if scalable, because that would lead to > // invalid scalarization costs. > - // Do not apply discount logic if hacked cost is needed > - // for emulated masked memrefs. > - if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I, VF) && > + if (!VF.isScalable() && > computePredInstDiscount(&I, ScalarCosts, VF) >= 0) > ScalarCostsVF.insert(ScalarCosts.begin(), ScalarCosts.end()); > // Remember that BB will remain after vectorization. > @@ -6736,11 +6709,6 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I, > Vec_i1Ty, APInt::getAllOnes(VF.getKnownMinValue()), > /*Insert=*/false, /*Extract=*/true); > Cost += TTI.getCFInstrCost(Instruction::Br, TTI::TCK_RecipThroughput); > - > - if (useEmulatedMaskMemRefHack(I, VF)) > - // Artificially setting to a high enough value to practically disable > - // vectorization with such operations. > - Cost = 3000000; > } > > return Cost; > diff --git a/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll b/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll > index 62412a5d1af0..c52755b7d65c 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll > @@ -17,30 +17,30 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 9 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 9 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > @@ -50,8 +50,8 @@ target triple = "x86_64-unknown-linux-gnu" > ; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX512: LV: Found an estimated cost of 10 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX512: LV: Found an estimated cost of 22 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX512: LV: Found an estimated cost of 11 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX512: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX512: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll b/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll > index b8eba8b0327b..b38026c824b5 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll > @@ -17,30 +17,30 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > @@ -50,8 +50,8 @@ target triple = "x86_64-unknown-linux-gnu" > ; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX512: LV: Found an estimated cost of 10 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX512: LV: Found an estimated cost of 24 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX512: LV: Found an estimated cost of 12 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX512: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX512: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll > index d6bfdf9d3848..184e23a0128b 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll > @@ -89,30 +89,30 @@ for.end: > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > > ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2" > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > @@ -164,17 +164,17 @@ for.end: > ; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test" > ; > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > > ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test" > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 7 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 9 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 9 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > > define void @test(i16* noalias nocapture %points, i16* noalias nocapture readonly %x, i16* noalias nocapture readnone %y) { > diff --git a/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll > index 5f67026737fc..224dd75a4dc5 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll > @@ -89,17 +89,17 @@ for.end: > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %0, i16* %arrayidx2, align 2 > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 5 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 23 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 50 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > > ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2" > ; > @@ -107,16 +107,16 @@ for.end: > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 10 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 27 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > > define void @test2(i16* noalias nocapture %points, i32 %numPoints, i16* noalias nocapture readonly %x, i16* noalias nocapture readonly %y) { > entry: > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > index c8c3078f1625..2722a52c3d96 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > @@ -16,37 +16,37 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll > index f74c9f044d0b..16c00cfc03b5 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll > @@ -16,16 +16,16 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX1: LV: Found an estimated cost of 3 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll > index c5a7825348e9..1baeff242304 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll > @@ -16,16 +16,16 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll > index fc540da58700..99d0f28a03f8 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll > @@ -16,37 +16,37 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE2: LV: Found an estimated cost of 23 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE42: LV: Found an estimated cost of 23 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll b/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll > index bf0aba1931d1..8ce310962b48 100644 > --- a/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll > +++ b/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll > @@ -1,3 +1,4 @@ > +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py > ; RUN: opt -loop-vectorize -scalable-vectorization=off -force-vector-width=4 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s | FileCheck %s > > ; NOTE: These tests aren't really target-specific, but it's convenient to target AArch64 > @@ -9,21 +10,43 @@ target triple = "aarch64-linux-gnu" > ; we don't artificially create new predicated blocks for the load. > define void @uniform_load(i32* noalias %dst, i32* noalias readonly %src, i64 %n) #0 { > ; CHECK-LABEL: @uniform_load( > +; CHECK-NEXT: entry: > +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] > +; CHECK: vector.ph: > +; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], 3 > +; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4 > +; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]] > +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] > ; CHECK: vector.body: > -; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ] > -; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0 > -; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n) > -; CHECK-NEXT: [[LOAD_VAL:%.*]] = load i32, i32* %src, align 4 > -; CHECK-NOT: load i32, i32* %src, align 4 > -; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[LOAD_VAL]], i32 0 > -; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> zeroinitializer > -; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* %dst, i64 [[TMP3]] > -; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i32 0 > -; CHECK-NEXT: [[STORE_PTR:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>* > -; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[TMP5]], <4 x i32>* [[STORE_PTR]], i32 4, <4 x i1> [[LOOP_PRED]]) > -; CHECK-NEXT: [[IDX_NEXT]] = add i64 [[IDX]], 4 > -; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[IDX_NEXT]], %n.vec > -; CHECK-NEXT: br i1 [[CMP]], label %middle.block, label %vector.body > +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] > +; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0 > +; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP0]], i64 [[N]]) > +; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[SRC:%.*]], align 4 > +; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0 > +; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer > +; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 [[TMP0]] > +; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i32 0 > +; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>* > +; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[BROADCAST_SPLAT]], <4 x i32>* [[TMP4]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]]) > +; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4 > +; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] > +; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > +; CHECK: middle.block: > +; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]] > +; CHECK: scalar.ph: > +; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] > +; CHECK-NEXT: br label [[FOR_BODY:%.*]] > +; CHECK: for.body: > +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ] > +; CHECK-NEXT: [[VAL:%.*]] = load i32, i32* [[SRC]], align 4 > +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDVARS_IV]] > +; CHECK-NEXT: store i32 [[VAL]], i32* [[ARRAYIDX]], align 4 > +; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1 > +; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]] > +; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]] > +; CHECK: for.end: > +; CHECK-NEXT: ret void > +; > > entry: > br label %for.body > @@ -47,18 +70,108 @@ for.end: ; preds = %for.body, %entry > ; and the original condition. > define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 { > ; CHECK-LABEL: @cond_uniform_load( > +; CHECK-NEXT: entry: > +; CHECK-NEXT: [[DST1:%.*]] = bitcast i32* [[DST:%.*]] to i8* > +; CHECK-NEXT: [[COND3:%.*]] = bitcast i32* [[COND:%.*]] to i8* > +; CHECK-NEXT: [[SRC6:%.*]] = bitcast i32* [[SRC:%.*]] to i8* > +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]] > +; CHECK: vector.memcheck: > +; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[DST]], i64 [[N:%.*]] > +; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8* > +; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[COND]], i64 [[N]] > +; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8* > +; CHECK-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32* [[SRC]], i64 1 > +; CHECK-NEXT: [[SCEVGEP78:%.*]] = bitcast i32* [[SCEVGEP7]] to i8* > +; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[DST1]], [[SCEVGEP45]] > +; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[COND3]], [[SCEVGEP2]] > +; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]] > +; CHECK-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[DST1]], [[SCEVGEP78]] > +; CHECK-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[SRC6]], [[SCEVGEP2]] > +; CHECK-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]] > +; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]] > +; CHECK-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]] > ; CHECK: vector.ph: > -; CHECK: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* %src, i32 0 > -; CHECK-NEXT: [[SRC_SPLAT:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer > +; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], 3 > +; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4 > +; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]] > +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] > ; CHECK: vector.body: > -; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ] > -; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0 > -; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n) > -; CHECK: [[COND_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* {{%.*}}, i32 4, <4 x i1> [[LOOP_PRED]], <4 x i32> poison) > -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[COND_LOAD]], zeroinitializer > +; CHECK-NEXT: [[INDEX12:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT19:%.*]], [[PRED_LOAD_CONTINUE18:%.*]] ] > +; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX12]], 0 > +; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP0]], i64 [[N]]) > +; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[COND]], i64 [[TMP0]] > +; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0 > +; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>* > +; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison), !alias.scope !4 > +; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[WIDE_MASKED_LOAD]], zeroinitializer > ; CHECK-NEXT: [[TMP5:%.*]] = xor <4 x i1> [[TMP4]], <i1 true, i1 true, i1 true, i1 true> > -; CHECK-NEXT: [[MASK:%.*]] = select <4 x i1> [[LOOP_PRED]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer > -; CHECK-NEXT: call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[SRC_SPLAT]], i32 4, <4 x i1> [[MASK]], <4 x i32> undef) > +; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer > +; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP6]], i32 0 > +; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]] > +; CHECK: pred.load.if: > +; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7 > +; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP8]], i32 0 > +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]] > +; CHECK: pred.load.continue: > +; CHECK-NEXT: [[TMP10:%.*]] = phi <4 x i32> [ poison, [[VECTOR_BODY]] ], [ [[TMP9]], [[PRED_LOAD_IF]] ] > +; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i1> [[TMP6]], i32 1 > +; CHECK-NEXT: br i1 [[TMP11]], label [[PRED_LOAD_IF13:%.*]], label [[PRED_LOAD_CONTINUE14:%.*]] > +; CHECK: pred.load.if13: > +; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7 > +; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP12]], i32 1 > +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE14]] > +; CHECK: pred.load.continue14: > +; CHECK-NEXT: [[TMP14:%.*]] = phi <4 x i32> [ [[TMP10]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP13]], [[PRED_LOAD_IF13]] ] > +; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i1> [[TMP6]], i32 2 > +; CHECK-NEXT: br i1 [[TMP15]], label [[PRED_LOAD_IF15:%.*]], label [[PRED_LOAD_CONTINUE16:%.*]] > +; CHECK: pred.load.if15: > +; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7 > +; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x i32> [[TMP14]], i32 [[TMP16]], i32 2 > +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE16]] > +; CHECK: pred.load.continue16: > +; CHECK-NEXT: [[TMP18:%.*]] = phi <4 x i32> [ [[TMP14]], [[PRED_LOAD_CONTINUE14]] ], [ [[TMP17]], [[PRED_LOAD_IF15]] ] > +; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i1> [[TMP6]], i32 3 > +; CHECK-NEXT: br i1 [[TMP19]], label [[PRED_LOAD_IF17:%.*]], label [[PRED_LOAD_CONTINUE18]] > +; CHECK: pred.load.if17: > +; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7 > +; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i32> [[TMP18]], i32 [[TMP20]], i32 3 > +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE18]] > +; CHECK: pred.load.continue18: > +; CHECK-NEXT: [[TMP22:%.*]] = phi <4 x i32> [ [[TMP18]], [[PRED_LOAD_CONTINUE16]] ], [ [[TMP21]], [[PRED_LOAD_IF17]] ] > +; CHECK-NEXT: [[TMP23:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i1> [[TMP4]], <4 x i1> zeroinitializer > +; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP23]], <4 x i32> zeroinitializer, <4 x i32> [[TMP22]] > +; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[TMP0]] > +; CHECK-NEXT: [[TMP25:%.*]] = or <4 x i1> [[TMP6]], [[TMP23]] > +; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, i32* [[TMP24]], i32 0 > +; CHECK-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP26]] to <4 x i32>* > +; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[PREDPHI]], <4 x i32>* [[TMP27]], i32 4, <4 x i1> [[TMP25]]), !alias.scope !9, !noalias !11 > +; CHECK-NEXT: [[INDEX_NEXT19]] = add i64 [[INDEX12]], 4 > +; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT19]], [[N_VEC]] > +; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]] > +; CHECK: middle.block: > +; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]] > +; CHECK: scalar.ph: > +; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ] > +; CHECK-NEXT: br label [[FOR_BODY:%.*]] > +; CHECK: for.body: > +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[IF_END:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] > +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[COND]], i64 [[INDEX]] > +; CHECK-NEXT: [[TMP29:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 > +; CHECK-NEXT: [[TOBOOL_NOT:%.*]] = icmp eq i32 [[TMP29]], 0 > +; CHECK-NEXT: br i1 [[TOBOOL_NOT]], label [[IF_END]], label [[IF_THEN:%.*]] > +; CHECK: if.then: > +; CHECK-NEXT: [[TMP30:%.*]] = load i32, i32* [[SRC]], align 4 > +; CHECK-NEXT: br label [[IF_END]] > +; CHECK: if.end: > +; CHECK-NEXT: [[VAL_0:%.*]] = phi i32 [ [[TMP30]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ] > +; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDEX]] > +; CHECK-NEXT: store i32 [[VAL_0]], i32* [[ARRAYIDX1]], align 4 > +; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1 > +; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N]] > +; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]] > +; CHECK: for.end: > +; CHECK-NEXT: ret void > +; > entry: > br label %for.body > > diff --git a/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll b/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll > index def98e03030f..d13942e85466 100644 > --- a/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll > +++ b/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll > @@ -25,22 +25,22 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: iter.check: > ; AVX512-NEXT: br label [[VECTOR_BODY:%.*]] > ; AVX512: vector.body: > -; AVX512-NEXT: [[INDEX8:%.*]] = phi i64 [ 0, [[ITER_CHECK:%.*]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY]] ] > -; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX8]] > +; AVX512-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ITER_CHECK:%.*]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY]] ] > +; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32>* [[TMP1]], align 4 > ; AVX512-NEXT: [[TMP2:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD]], zeroinitializer > -; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX8]] > +; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP4]], i32 4, <16 x i1> [[TMP2]], <16 x i32> poison) > ; AVX512-NEXT: [[TMP5:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD]] to <16 x i64> > ; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <16 x i64> [[TMP5]] > ; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP6]], i32 4, <16 x i1> [[TMP2]], <16 x float> undef) > ; AVX512-NEXT: [[TMP7:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01> > -; AVX512-NEXT: [[TMP8:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX8]] > +; AVX512-NEXT: [[TMP8:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP8]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP7]], <16 x float>* [[TMP9]], i32 4, <16 x i1> [[TMP2]]) > -; AVX512-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX8]], 16 > +; AVX512-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX7]], 16 > ; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT]] > ; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_1:%.*]] = load <16 x i32>, <16 x i32>* [[TMP11]], align 4 > @@ -55,7 +55,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP18:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT]] > ; AVX512-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP18]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP17]], <16 x float>* [[TMP19]], i32 4, <16 x i1> [[TMP12]]) > -; AVX512-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX8]], 32 > +; AVX512-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX7]], 32 > ; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_1]] > ; AVX512-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_2:%.*]] = load <16 x i32>, <16 x i32>* [[TMP21]], align 4 > @@ -70,7 +70,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_1]] > ; AVX512-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP27]], <16 x float>* [[TMP29]], i32 4, <16 x i1> [[TMP22]]) > -; AVX512-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX8]], 48 > +; AVX512-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX7]], 48 > ; AVX512-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_2]] > ; AVX512-NEXT: [[TMP31:%.*]] = bitcast i32* [[TMP30]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_3:%.*]] = load <16 x i32>, <16 x i32>* [[TMP31]], align 4 > @@ -85,7 +85,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_2]] > ; AVX512-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP37]], <16 x float>* [[TMP39]], i32 4, <16 x i1> [[TMP32]]) > -; AVX512-NEXT: [[INDEX_NEXT_3]] = add nuw nsw i64 [[INDEX8]], 64 > +; AVX512-NEXT: [[INDEX_NEXT_3]] = add nuw nsw i64 [[INDEX7]], 64 > ; AVX512-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT_3]], 4096 > ; AVX512-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > ; AVX512: for.end: > @@ -95,8 +95,8 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: entry: > ; FVW2-NEXT: br label [[VECTOR_BODY:%.*]] > ; FVW2: vector.body: > -; FVW2-NEXT: [[INDEX17:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] > -; FVW2-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX17]] > +; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE27:%.*]] ] > +; FVW2-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX7]] > ; FVW2-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <2 x i32>* > ; FVW2-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4 > ; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2 > @@ -112,7 +112,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: [[TMP9:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD8]], zeroinitializer > ; FVW2-NEXT: [[TMP10:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD9]], zeroinitializer > ; FVW2-NEXT: [[TMP11:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD10]], zeroinitializer > -; FVW2-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX17]] > +; FVW2-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX7]] > ; FVW2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <2 x i32>* > ; FVW2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP13]], i32 4, <2 x i1> [[TMP8]], <2 x i32> poison) > ; FVW2-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP12]], i64 2 > @@ -128,33 +128,105 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: [[TMP21:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD11]] to <2 x i64> > ; FVW2-NEXT: [[TMP22:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD12]] to <2 x i64> > ; FVW2-NEXT: [[TMP23:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD13]] to <2 x i64> > -; FVW2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <2 x i64> [[TMP20]] > -; FVW2-NEXT: [[TMP25:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP21]] > -; FVW2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP22]] > -; FVW2-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP23]] > -; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP24]], i32 4, <2 x i1> [[TMP8]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER14:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP25]], i32 4, <2 x i1> [[TMP9]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER15:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP26]], i32 4, <2 x i1> [[TMP10]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER16:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP27]], i32 4, <2 x i1> [[TMP11]], <2 x float> undef) > -; FVW2-NEXT: [[TMP28:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP29:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER14]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP30:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER15]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP31:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER16]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP32:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX17]] > -; FVW2-NEXT: [[TMP33:%.*]] = bitcast float* [[TMP32]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP28]], <2 x float>* [[TMP33]], i32 4, <2 x i1> [[TMP8]]) > -; FVW2-NEXT: [[TMP34:%.*]] = getelementptr float, float* [[TMP32]], i64 2 > -; FVW2-NEXT: [[TMP35:%.*]] = bitcast float* [[TMP34]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP29]], <2 x float>* [[TMP35]], i32 4, <2 x i1> [[TMP9]]) > -; FVW2-NEXT: [[TMP36:%.*]] = getelementptr float, float* [[TMP32]], i64 4 > -; FVW2-NEXT: [[TMP37:%.*]] = bitcast float* [[TMP36]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP30]], <2 x float>* [[TMP37]], i32 4, <2 x i1> [[TMP10]]) > -; FVW2-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[TMP32]], i64 6 > -; FVW2-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP31]], <2 x float>* [[TMP39]], i32 4, <2 x i1> [[TMP11]]) > -; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX17]], 8 > -; FVW2-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096 > -; FVW2-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > +; FVW2-NEXT: [[TMP24:%.*]] = extractelement <2 x i1> [[TMP8]], i64 0 > +; FVW2-NEXT: br i1 [[TMP24]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]] > +; FVW2: pred.load.if: > +; FVW2-NEXT: [[TMP25:%.*]] = extractelement <2 x i64> [[TMP20]], i64 0 > +; FVW2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], i64 [[TMP25]] > +; FVW2-NEXT: [[TMP27:%.*]] = load float, float* [[TMP26]], align 4 > +; FVW2-NEXT: [[TMP28:%.*]] = insertelement <2 x float> poison, float [[TMP27]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]] > +; FVW2: pred.load.continue: > +; FVW2-NEXT: [[TMP29:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP28]], [[PRED_LOAD_IF]] ] > +; FVW2-NEXT: [[TMP30:%.*]] = extractelement <2 x i1> [[TMP8]], i64 1 > +; FVW2-NEXT: br i1 [[TMP30]], label [[PRED_LOAD_IF14:%.*]], label [[PRED_LOAD_CONTINUE15:%.*]] > +; FVW2: pred.load.if14: > +; FVW2-NEXT: [[TMP31:%.*]] = extractelement <2 x i64> [[TMP20]], i64 1 > +; FVW2-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP31]] > +; FVW2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4 > +; FVW2-NEXT: [[TMP34:%.*]] = insertelement <2 x float> [[TMP29]], float [[TMP33]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE15]] > +; FVW2: pred.load.continue15: > +; FVW2-NEXT: [[TMP35:%.*]] = phi <2 x float> [ [[TMP29]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP34]], [[PRED_LOAD_IF14]] ] > +; FVW2-NEXT: [[TMP36:%.*]] = extractelement <2 x i1> [[TMP9]], i64 0 > +; FVW2-NEXT: br i1 [[TMP36]], label [[PRED_LOAD_IF16:%.*]], label [[PRED_LOAD_CONTINUE17:%.*]] > +; FVW2: pred.load.if16: > +; FVW2-NEXT: [[TMP37:%.*]] = extractelement <2 x i64> [[TMP21]], i64 0 > +; FVW2-NEXT: [[TMP38:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP37]] > +; FVW2-NEXT: [[TMP39:%.*]] = load float, float* [[TMP38]], align 4 > +; FVW2-NEXT: [[TMP40:%.*]] = insertelement <2 x float> poison, float [[TMP39]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE17]] > +; FVW2: pred.load.continue17: > +; FVW2-NEXT: [[TMP41:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE15]] ], [ [[TMP40]], [[PRED_LOAD_IF16]] ] > +; FVW2-NEXT: [[TMP42:%.*]] = extractelement <2 x i1> [[TMP9]], i64 1 > +; FVW2-NEXT: br i1 [[TMP42]], label [[PRED_LOAD_IF18:%.*]], label [[PRED_LOAD_CONTINUE19:%.*]] > +; FVW2: pred.load.if18: > +; FVW2-NEXT: [[TMP43:%.*]] = extractelement <2 x i64> [[TMP21]], i64 1 > +; FVW2-NEXT: [[TMP44:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP43]] > +; FVW2-NEXT: [[TMP45:%.*]] = load float, float* [[TMP44]], align 4 > +; FVW2-NEXT: [[TMP46:%.*]] = insertelement <2 x float> [[TMP41]], float [[TMP45]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE19]] > +; FVW2: pred.load.continue19: > +; FVW2-NEXT: [[TMP47:%.*]] = phi <2 x float> [ [[TMP41]], [[PRED_LOAD_CONTINUE17]] ], [ [[TMP46]], [[PRED_LOAD_IF18]] ] > +; FVW2-NEXT: [[TMP48:%.*]] = extractelement <2 x i1> [[TMP10]], i64 0 > +; FVW2-NEXT: br i1 [[TMP48]], label [[PRED_LOAD_IF20:%.*]], label [[PRED_LOAD_CONTINUE21:%.*]] > +; FVW2: pred.load.if20: > +; FVW2-NEXT: [[TMP49:%.*]] = extractelement <2 x i64> [[TMP22]], i64 0 > +; FVW2-NEXT: [[TMP50:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP49]] > +; FVW2-NEXT: [[TMP51:%.*]] = load float, float* [[TMP50]], align 4 > +; FVW2-NEXT: [[TMP52:%.*]] = insertelement <2 x float> poison, float [[TMP51]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE21]] > +; FVW2: pred.load.continue21: > +; FVW2-NEXT: [[TMP53:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE19]] ], [ [[TMP52]], [[PRED_LOAD_IF20]] ] > +; FVW2-NEXT: [[TMP54:%.*]] = extractelement <2 x i1> [[TMP10]], i64 1 > +; FVW2-NEXT: br i1 [[TMP54]], label [[PRED_LOAD_IF22:%.*]], label [[PRED_LOAD_CONTINUE23:%.*]] > +; FVW2: pred.load.if22: > +; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i64> [[TMP22]], i64 1 > +; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP55]] > +; FVW2-NEXT: [[TMP57:%.*]] = load float, float* [[TMP56]], align 4 > +; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> [[TMP53]], float [[TMP57]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE23]] > +; FVW2: pred.load.continue23: > +; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ [[TMP53]], [[PRED_LOAD_CONTINUE21]] ], [ [[TMP58]], [[PRED_LOAD_IF22]] ] > +; FVW2-NEXT: [[TMP60:%.*]] = extractelement <2 x i1> [[TMP11]], i64 0 > +; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF24:%.*]], label [[PRED_LOAD_CONTINUE25:%.*]] > +; FVW2: pred.load.if24: > +; FVW2-NEXT: [[TMP61:%.*]] = extractelement <2 x i64> [[TMP23]], i64 0 > +; FVW2-NEXT: [[TMP62:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP61]] > +; FVW2-NEXT: [[TMP63:%.*]] = load float, float* [[TMP62]], align 4 > +; FVW2-NEXT: [[TMP64:%.*]] = insertelement <2 x float> poison, float [[TMP63]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE25]] > +; FVW2: pred.load.continue25: > +; FVW2-NEXT: [[TMP65:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE23]] ], [ [[TMP64]], [[PRED_LOAD_IF24]] ] > +; FVW2-NEXT: [[TMP66:%.*]] = extractelement <2 x i1> [[TMP11]], i64 1 > +; FVW2-NEXT: br i1 [[TMP66]], label [[PRED_LOAD_IF26:%.*]], label [[PRED_LOAD_CONTINUE27]] > +; FVW2: pred.load.if26: > +; FVW2-NEXT: [[TMP67:%.*]] = extractelement <2 x i64> [[TMP23]], i64 1 > +; FVW2-NEXT: [[TMP68:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP67]] > +; FVW2-NEXT: [[TMP69:%.*]] = load float, float* [[TMP68]], align 4 > +; FVW2-NEXT: [[TMP70:%.*]] = insertelement <2 x float> [[TMP65]], float [[TMP69]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE27]] > +; FVW2: pred.load.continue27: > +; FVW2-NEXT: [[TMP71:%.*]] = phi <2 x float> [ [[TMP65]], [[PRED_LOAD_CONTINUE25]] ], [ [[TMP70]], [[PRED_LOAD_IF26]] ] > +; FVW2-NEXT: [[TMP72:%.*]] = fadd <2 x float> [[TMP35]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP73:%.*]] = fadd <2 x float> [[TMP47]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP74:%.*]] = fadd <2 x float> [[TMP59]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP71]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP76:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX7]] > +; FVW2-NEXT: [[TMP77:%.*]] = bitcast float* [[TMP76]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP72]], <2 x float>* [[TMP77]], i32 4, <2 x i1> [[TMP8]]) > +; FVW2-NEXT: [[TMP78:%.*]] = getelementptr float, float* [[TMP76]], i64 2 > +; FVW2-NEXT: [[TMP79:%.*]] = bitcast float* [[TMP78]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP73]], <2 x float>* [[TMP79]], i32 4, <2 x i1> [[TMP9]]) > +; FVW2-NEXT: [[TMP80:%.*]] = getelementptr float, float* [[TMP76]], i64 4 > +; FVW2-NEXT: [[TMP81:%.*]] = bitcast float* [[TMP80]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP74]], <2 x float>* [[TMP81]], i32 4, <2 x i1> [[TMP10]]) > +; FVW2-NEXT: [[TMP82:%.*]] = getelementptr float, float* [[TMP76]], i64 6 > +; FVW2-NEXT: [[TMP83:%.*]] = bitcast float* [[TMP82]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP75]], <2 x float>* [[TMP83]], i32 4, <2 x i1> [[TMP11]]) > +; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX7]], 8 > +; FVW2-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096 > +; FVW2-NEXT: br i1 [[TMP84]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > ; FVW2: for.end: > ; FVW2-NEXT: ret void > ; > @@ -365,40 +437,186 @@ define void @foo2(%struct.In* noalias %in, float* noalias %out, i32* noalias %tr > ; FVW2-NEXT: entry: > ; FVW2-NEXT: br label [[VECTOR_BODY:%.*]] > ; FVW2: vector.body: > -; FVW2-NEXT: [[INDEX10:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE9:%.*]] ] > -; FVW2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 16>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE9]] ] > -; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX10]], 4 > +; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE35:%.*]] ] > +; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX7]], 4 > ; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16 > -; FVW2-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]] > -; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]] > -; FVW2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4 > -; FVW2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4 > -; FVW2-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i64 0 > -; FVW2-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP4]], i64 1 > -; FVW2-NEXT: [[TMP7:%.*]] = icmp sgt <2 x i32> [[TMP6]], zeroinitializer > -; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], <2 x i64> [[VEC_IND]], i32 1 > -; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP8]], i32 4, <2 x i1> [[TMP7]], <2 x float> undef) > -; FVW2-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP7]], i64 0 > -; FVW2-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]] > +; FVW2-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 32 > +; FVW2-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 48 > +; FVW2-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 64 > +; FVW2-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 80 > +; FVW2-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 96 > +; FVW2-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 112 > +; FVW2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]] > +; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]] > +; FVW2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]] > +; FVW2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]] > +; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]] > +; FVW2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP4]] > +; FVW2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP5]] > +; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP6]] > +; FVW2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4 > +; FVW2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4 > +; FVW2-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i64 0 > +; FVW2-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP16]], i64 1 > +; FVW2-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP9]], align 4 > +; FVW2-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP10]], align 4 > +; FVW2-NEXT: [[TMP21:%.*]] = insertelement <2 x i32> poison, i32 [[TMP19]], i64 0 > </cut>

3 years, 4 months

1
0
0 0

[ACTIVITY] week ending Feb. 6 2022

by Alex Bennée

Project Stratos =============== - posted [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working) Message-Id: <20220121151534.3654562-1-alex.bennee(a)linaro.org> - need to increase coverage of the QEMU boilerplate to get it merged - discussions on next steps with SCMI backend with Vincent (moving from the QEMU->QEMU PoC) QEMU Upstream Work ([UM-2]) =========================== - posted [PATCH v2 00/25] testing and plugin updates Message-Id: <20220201182050.15087-1-alex.bennee(a)linaro.org> - posted [RFC PATCH 0/4] improve coverage of vector backend Message-Id: <20220202191242.652607-2-alex.bennee(a)linaro.org> - posted [PATCH v3 00/26] testing and plugins pre-PR Message-Id: <20220204204335.1689602-1-alex.bennee(a)linaro.org> - posted [RFC PATCH] arm: force flag recalculation when messing with DAIF Message-Id: <20220202122353.457084-1-alex.bennee(a)linaro.org> - trying to track down a weird TLS bug: <https://gitlab.com/stsquad/qemu/-/jobs/2056025874#L3532> - on aarch64 HW, running qemu-s390x with a simple test case fails every 100/200 times - seems TLS memory gets made non-accessible (rw-p -> ---p, except to gdb) - strace doesn't show a culprit, possible kernel bug? [UM-2] <https://linaro.atlassian.net/browse/UM-2> Upstream MTTCG tests ([QEMU-52]) - still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org> [QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52> Other ===== - planning and brainstorming for Linaro Tech Day Completed Reviews [5/5] ======================= [PATCH v4 00/42] CXl 2.0 emulation Support Message-Id: <20220124171705.10432-1-Jonathan.Cameron(a)huawei.com> [PATCH] gitlab: fall back to commit hash in qemu-setup filename Message-Id: <20220125173454.10381-1-stefanha(a)redhat.com> [PATCH for-7.0] gitlab-ci: Add cirrus-ci based tests for NetBSD and OpenBSD Message-Id: <20211209103124.121942-1-thuth(a)redhat.com> [PATCH 00/20] tcg: vector improvements Message-Id: <20211218194250.247633-1-richard.henderson(a)linaro.org> Absences ======== Current Review Queue ==================== TODO [PATCH 0/4] target/arm: SVE fixes versus VHE Message-Id: <20220127063428.30212-1-richard.henderson(a)linaro.org> ================================================================================================================== TODO [PATCH 00/14] arm_gicv3_its: Implement MOVI and MOVALL commands Message-Id: <20220122182444.724087-1-peter.maydell(a)linaro.org> ================================================================================================================================== TODO [PATCH v11 0/8] hmp,qmp: Add commands to introspect virtio devices Message-Id: <1642678168-20447-1-git-send-email-jonah.palmer(a)oracle.com> ============================================================================================================================================== TODO [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org> ====================================================================================================================================== -- Alex Bennée

3 years, 5 months

1
0
0 0

[ACTIVITY] report week ending 4 Feb

by Peter Maydell

Progress: * UM-2 [QEMU upstream maintainership] - Fixed some minor issues with the hvf accelerator and sent out a patchset + '-cpu max' didn't act like '-cpu host' + we weren't exposing PAuth to the guest * QEMU-420 [GICv4 emulation] - Sent out a patchset with more cleanups and fixes to the existing ITS code - The ITS parts of the GICv4 work are now code-complete; moving on to the redistributor end of things next week. -- PMM

3 years, 5 months

1
0
0 0

[ACTIVITY] report week ending 28 Jan

by Peter Maydell

Progress: * UM-2 [QEMU upstream maintainership] - Before the QEMU 7.0 release we tried to land a bug fix which corrected the handling in our PSCI emulation of calls where the function ID is unrecognized -- these are supposed to return an error code. The bugfix turned out to cause regressions for some boards when running guest code at EL3 (because those boards were incorrectly enabling PSCI emulation in that situation). Sent a patchset that fixed those boards so we don't enable PSCI when running EL3 guests, and re-introduced the original PSCI bugfix. - Fixed various bugs in the highbank/midway boards discovered in the process of writing and testing the above patchset. (These two boards were the most complicated to fix.) - More code review, and sent out an arm pullrequest - Small handful of other minor patches -- PMM

3 years, 5 months

1
0
0 0

tsan buildbot failure possibly due to DWARFv5 switch

by David Blaikie

Seems like my change to make Clang default to DWARFv5 might've caused a buildbot failure on your build worker here: https://lab.llvm.org/buildbot/#/builders/185/builds/1295 But I seem to be able to run this test successfully locally on my Linux machine - so I'm wondering if you can offer any help diagnosing the issue showing up on your builder/worker?

3 years, 5 months

2
2
0 0

[ACTIVITY] week ending Jan. 23 2022

by Alex Bennée

Project Stratos =============== - [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working) Message-Id: <20220121151534.3654562-1-alex.bennee(a)linaro.org> - trying to clear the way for merging virtio-gpio to QEMU vhost-device maintainer effort ([UM-196]) - reviewed vhost-device [pr7 with the vm-virtio vsock abstraction] [UM-196] <https://linaro.atlassian.net/browse/UM-196> [pr7 with the vm-virtio vsock abstraction] <https://github.com/stsquad/vhost-device/tree/review/pr7-with-laurat-abstrac…> QEMU Upstream Work ([UM-2]) =========================== - posted [PULL v2 00/31] testing/next and other misc fixes Message-Id: <20220118190043.1427303-1-alex.bennee(a)linaro.org> Upstream MTTCG tests ([QEMU-52]) - still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org> Completed Reviews [2/2] ======================= [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org> [PATCH v2 0/6] qtests/libqos: Allow PCI tests to be run with virt-machine Message-Id: <20220118203833.316741-7-eric.auger(a)redhat.com> Absences ======== Current Review Queue ==================== TODO [PATCH v11 0/8] hmp,qmp: Add commands to introspect virtio devices Message-Id: <1642678168-20447-1-git-send-email-jonah.palmer(a)oracle.com> ============================================================================================================================================== TODO [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org> ====================================================================================================================================== TODO [PATCH v2 00/11] Atomic cleanup + clang-12 build fix Message-Id: <20210717014121.1784956-1-richard.henderson(a)linaro.org> ============================================================================================================================ TODO [PATCH 0/7] tcg: some small towards more modular tcg Message-Id: <20210804143826.3402872-1-kraxel(a)redhat.com> ================================================================================================================= -- Alex Bennée

3 years, 5 months

1
0
0 0

[ACTIVITY] report week ending 21 Jan

by Peter Maydell

Progress: * UM-2 [QEMU upstream maintainership] - Sent patches for some reported bugs to do with state save/load * QEMU-420 [GICv4 emulation] - Wrote patches to implement the missing MOVALL and MOVI commands - Fixed a few minor bugs noticed along the way - Should be able to send out a patchset early next week and then can get back to the new-in-GICv4 work -- PMM

3 years, 5 months

1
0
0 0

[TCWG CI] Regression caused by gcc: Add -Wdangling-pointer [PR63272].

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Add -Wdangling-pointer [PR63272].: commit 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6 Author: Martin Sebor <msebor(a)redhat.com> Add -Wdangling-pointer [PR63272]. Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21324 # First few build errors in logs: # 00:03:31 sound/core/oss/mixer_oss.c:1057:21: error: ‘slot’ is used uninitialized [-Werror=uninitialized] # 00:03:32 sound/core/oss/pcm_oss.c:108:29: error: ‘t’ is used uninitialized [-Werror=uninitialized] # 00:03:32 sound/core/oss/pcm_oss.c:2488:34: error: ‘setup’ is used uninitialized [-Werror=uninitialized] # 00:03:32 sound/core/oss/pcm_oss.c:2998:51: error: ‘template’ is used uninitialized [-Werror=uninitialized] # 00:03:35 make[3]: *** [scripts/Makefile.build:277: sound/core/oss/mixer_oss.o] Error 1 # 00:03:35 sound/core/seq/oss/seq_oss_init.c:350:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized] # 00:03:35 sound/core/seq/oss/seq_oss_init.c:370:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized] # 00:03:36 make[4]: *** [scripts/Makefile.build:277: sound/core/seq/oss/seq_oss_init.o] Error 1 # 00:03:40 make[3]: *** [scripts/Makefile.build:277: sound/core/oss/pcm_oss.o] Error 1 # 00:03:50 make[3]: *** [scripts/Makefile.build:540: sound/core/seq/oss] Error 2 from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21354 THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-aarch64-stable-allmodconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… Reproduce builds: <cut> mkdir investigate-gcc-9d6a0f388eb048f8d87f47af78f07b5ce513bfe6 cd investigate-gcc-9d6a0f388eb048f8d87f47af78f07b5ce513bfe6 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 671a283636de75f7ed638ee6b01ed2d44361b8b6 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6 Author: Martin Sebor <msebor(a)redhat.com> Date: Sat Jan 15 16:41:40 2022 -0700 Add -Wdangling-pointer [PR63272]. Resolves: PR c/63272 - GCC should warn when using pointer to dead scoped variable with in the same function gcc/c-family/ChangeLog: PR c/63272 * c.opt (-Wdangling-pointer): New option. gcc/ChangeLog: PR c/63272 * diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle -Wdangling-pointer. * doc/invoke.texi (-Wdangling-pointer): Document new option. * gimple-ssa-warn-access.cc (pass_waccess::clone): Set new member. (pass_waccess::check_pointer_uses): New function. (pass_waccess::gimple_call_return_arg): New function. (pass_waccess::gimple_call_return_arg_ref): New function. (pass_waccess::check_call_dangling): New function. (pass_waccess::check_dangling_uses): New function overloads. (pass_waccess::check_dangling_stores): New function. (pass_waccess::check_dangling_stores): New function. (pass_waccess::m_clobbers): New data member. (pass_waccess::m_func): New data member. (pass_waccess::m_run_number): New data member. (pass_waccess::m_check_dangling_p): New data member. (pass_waccess::check_alloca): Check m_early_checks_p. (pass_waccess::check_alloc_size_call): Same. (pass_waccess::check_strcat): Same. (pass_waccess::check_strncat): Same. (pass_waccess::check_stxcpy): Same. (pass_waccess::check_stxncpy): Same. (pass_waccess::check_strncmp): Same. (pass_waccess::check_memop_access): Same. (pass_waccess::check_read_access): Same. (pass_waccess::check_builtin): Call check_pointer_uses. (pass_waccess::warn_invalid_pointer): Add arguments. (is_auto_decl): New function. (pass_waccess::check_stmt): New function. (pass_waccess::check_block): Call check_stmt. (pass_waccess::execute): Call check_dangling_uses, check_dangling_stores. Empty m_clobbers. * passes.def (pass_warn_access): Invoke pass two more times. gcc/testsuite/ChangeLog: PR c/63272 * g++.dg/warn/Wfree-nonheap-object-6.C: Disable valid warnings. * g++.dg/warn/ref-temp1.C: Prune expected warning. * gcc.dg/uninit-pr50476.c: Expect a new warning. * c-c++-common/Wdangling-pointer-2.c: New test. * c-c++-common/Wdangling-pointer-3.c: New test. * c-c++-common/Wdangling-pointer-4.c: New test. * c-c++-common/Wdangling-pointer-5.c: New test. * c-c++-common/Wdangling-pointer-6.c: New test. * c-c++-common/Wdangling-pointer.c: New test. * g++.dg/warn/Wdangling-pointer-2.C: New test. * g++.dg/warn/Wdangling-pointer.C: New test. * gcc.dg/Wdangling-pointer-2.c: New test. * gcc.dg/Wdangling-pointer.c: New test. --- gcc/c-family/c.opt | 8 + gcc/diagnostic-spec.c | 1 + gcc/doc/invoke.texi | 62 +- gcc/gimple-ssa-warn-access.cc | 635 +++++++++++++++++++-- gcc/passes.def | 5 +- gcc/testsuite/c-c++-common/Wdangling-pointer-2.c | 437 ++++++++++++++ gcc/testsuite/c-c++-common/Wdangling-pointer-3.c | 64 +++ gcc/testsuite/c-c++-common/Wdangling-pointer-4.c | 73 +++ gcc/testsuite/c-c++-common/Wdangling-pointer-5.c | 90 +++ gcc/testsuite/c-c++-common/Wdangling-pointer-6.c | 32 ++ gcc/testsuite/c-c++-common/Wdangling-pointer.c | 434 ++++++++++++++ gcc/testsuite/g++.dg/warn/Wdangling-pointer-2.C | 23 + gcc/testsuite/g++.dg/warn/Wdangling-pointer.C | 74 +++ gcc/testsuite/g++.dg/warn/Wfree-nonheap-object-6.C | 4 +- gcc/testsuite/g++.dg/warn/ref-temp1.C | 3 + gcc/testsuite/gcc.dg/Wdangling-pointer-2.c | 82 +++ gcc/testsuite/gcc.dg/Wdangling-pointer.c | 75 +++ gcc/testsuite/gcc.dg/uninit-pr50476.c | 2 +- 18 files changed, 2043 insertions(+), 61 deletions(-) diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 28363643664..db65c14a7a5 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -548,6 +548,14 @@ Wdangling-else C ObjC C++ ObjC++ Var(warn_dangling_else) Warning LangEnabledBy(C ObjC C++ ObjC++,Wparentheses) Warn about dangling else. +Wdangling-pointer +C ObjC C++ LTO ObjC++ Alias(Wdangling-pointer=, 2, 0) Warning +Warn for uses of pointers to auto variables whose lifetime has ended. + +Wdangling-pointer= +C ObjC C++ ObjC++ Joined RejectNegative UInteger Var(warn_dangling_pointer) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall, 2, 0) IntegerRange(0, 2) +Warn for uses of pointers to auto variables whose lifetime has ended. + Wdate-time C ObjC C++ ObjC++ CPP(warn_date_time) CppReason(CPP_W_DATE_TIME) Var(cpp_warn_date_time) Init(0) Warning Warn about __TIME__, __DATE__ and __TIMESTAMP__ usage. diff --git a/gcc/diagnostic-spec.c b/gcc/diagnostic-spec.c index c9e1c1be91d..a8af229d677 100644 --- a/gcc/diagnostic-spec.c +++ b/gcc/diagnostic-spec.c @@ -99,6 +99,7 @@ nowarn_spec_t::nowarn_spec_t (opt_code opt) m_bits = NW_UNINIT; break; + case OPT_Wdangling_pointer_: case OPT_Wreturn_local_addr: case OPT_Wuse_after_free_: m_bits = NW_DANGLING; diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 121c8ea827f..7f2205e4a85 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -341,7 +341,8 @@ Objective-C and Objective-C++ Dialects}. -Wchar-subscripts @gol -Wclobbered -Wcomment @gol -Wconversion -Wno-coverage-mismatch -Wno-cpp @gol --Wdangling-else -Wdate-time @gol +-Wdangling-else -Wdangling-pointer -Wdangling-pointer=@var{n} @gol +-Wdate-time @gol -Wno-deprecated -Wno-deprecated-declarations -Wno-designated-init @gol -Wdisabled-optimization @gol -Wno-discarded-array-qualifiers -Wno-discarded-qualifiers @gol @@ -4389,6 +4390,8 @@ Warn about overriding virtual functions that are not marked with the @opindex Wno-use-after-free Warn about uses of pointers to dynamically allocated objects that have been rendered indeterminate by a call to a deallocation function. +The warning is enabled at all optimization levels but may yield different +results with optimization than without. @table @gcctabopt @item -Wuse-after-free=1 @@ -5714,6 +5717,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}. -Wcatch-value @r{(C++ and Objective-C++ only)} @gol -Wchar-subscripts @gol -Wcomment @gol +-Wdangling-pointer=2 @gol -Wduplicate-decl-specifier @r{(C and Objective-C only)} @gol -Wenum-compare @r{(in C/ObjC; this is on by default in C++)} @gol -Wformat @gol @@ -8587,6 +8591,62 @@ looks like this: This warning is enabled by @option{-Wparentheses}. +@item -Wdangling-pointer +@itemx -Wdangling-pointer=@var{n} +@opindex Wdangling-pointer +@opindex Wno-dangling-pointer +Warn about uses of pointers (or C++ references) to objects with automatic +storage duration after their lifetime has ended. This includes local +variables declared in nested blocks, compound literals and other unnamed +temporary objects. In addition, warn about storing the address of such +objects in escaped pointers. The warning is enabled at all optimization +levels but may yield different results with optimization than without. + +@table @gcctabopt +@item -Wdangling-pointer=1 +At level 1 the warning diagnoses only unconditional uses of dangling pointers. +For example +@smallexample +int f (int c1, int c2, x) +@{ + char *p = strchr ((char[])@{ c1, c2 @}, c3); + return p ? *p : 'x'; // warning: dangling pointer to a compound literal +@} +@end smallexample +In the following function the store of the address of the local variable +@code{x} in the escaped pointer @code{*p} also triggers the warning. +@smallexample +void g (int **p) +@{ + int x = 7; + *p = &x; // warning: storing the address of a local variable in *p +@} +@end smallexample + +@item -Wdangling-pointer=2 +At level 2, in addition to unconditional uses the warning also diagnoses +conditional uses of dangling pointers. + +For example, because the array @var{a} in the following function is out of +scope when the pointer @var{s} that was set to point is used, the warning +triggers at this level. + +@smallexample +void f (char *s) +@{ + if (!s) + @{ + char a[12] = "tmpname"; + s = a; + @} + strcat (s, ".tmp"); // warning: dangling pointer to a may be used + ... +@} +@end smallexample +@end table + +@option{-Wdangling-pointer=2} is included in @option{-Wall}. + @item -Wdate-time @opindex Wdate-time @opindex Wno-date-time diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc index 882129143a1..f639807a78a 100644 --- a/gcc/gimple-ssa-warn-access.cc +++ b/gcc/gimple-ssa-warn-access.cc @@ -2069,10 +2069,12 @@ class pass_waccess : public gimple_opt_pass ~pass_waccess (); - opt_pass *clone () { return new pass_waccess (m_ctxt); } + opt_pass *clone (); virtual bool gate (function *); + void set_pass_param (unsigned, bool); + virtual unsigned int execute (function *); private: @@ -2089,6 +2091,9 @@ private: /* Check a call to an ordinary function for invalid accesses. */ bool check_call_access (gcall *); + /* Check a non-call statement. */ + void check_stmt (gimple *); + /* Check statements in a basic block. */ void check_block (basic_block); @@ -2112,26 +2117,41 @@ private: void check_atomic_memmodel (gimple *, tree, tree, const unsigned char *); /* Check for uses of indeterminate pointers. */ - void check_pointer_uses (gimple *, tree); + void check_pointer_uses (gimple *, tree, tree = NULL_TREE, bool = false); /* Return the argument that a call returns. */ tree gimple_call_return_arg (gcall *); + tree gimple_call_return_arg_ref (gcall *); + + /* Check a call for uses of a dangling pointer arguments. */ + void check_call_dangling (gcall *); + + /* Check uses of a dangling pointer or those derived from it. */ + void check_dangling_uses (tree, tree, bool = false, bool = false); + void check_dangling_uses (); + void check_dangling_stores (); + void check_dangling_stores (basic_block, hash_set<tree> &, auto_bitmap &); - void warn_invalid_pointer (tree, gimple *, gimple *, bool, bool = false); + void warn_invalid_pointer (tree, gimple *, gimple *, tree, bool, bool = false); /* Return true if use follows an invalidating statement. */ - bool use_after_inval_p (gimple *, gimple *); + bool use_after_inval_p (gimple *, gimple *, bool = false); /* A pointer_query object and its cache to store information about pointers and their targets in. */ pointer_query m_ptr_qry; pointer_query::cache_type m_var_cache; - + /* Mapping from DECLs and their clobber statements in the function. */ + hash_map<tree, gimple *> m_clobbers; /* A bit is set for each basic block whose statements have been assigned valid UIDs. */ bitmap m_bb_uids_set; /* The current function. */ function *m_func; + /* True to run checks for uses of dangling pointers. */ + bool m_check_dangling_p; + /* True to run checks early on in the optimization pipeline. */ + bool m_early_checks_p; }; /* Construct the pass. */ @@ -2140,11 +2160,22 @@ pass_waccess::pass_waccess (gcc::context *ctxt) : gimple_opt_pass (pass_data_waccess, ctxt), m_ptr_qry (NULL, &m_var_cache), m_var_cache (), + m_clobbers (), m_bb_uids_set (), - m_func () + m_func (), + m_check_dangling_p (), + m_early_checks_p () { } +/* Return a copy of the pass with RUN_NUMBER one greater than THIS. */ + +opt_pass* +pass_waccess::clone () +{ + return new pass_waccess (m_ctxt); +} + /* Release pointer_query cache. */ pass_waccess::~pass_waccess () @@ -2152,6 +2183,14 @@ pass_waccess::~pass_waccess () m_ptr_qry.flush_cache (); } +void +pass_waccess::set_pass_param (unsigned int n, bool early) +{ + gcc_assert (n == 0); + + m_early_checks_p = early; +} + /* Return true when any checks performed by the pass are enabled. */ bool @@ -2340,6 +2379,9 @@ maybe_warn_alloc_args_overflow (gimple *stmt, const tree args[2], void pass_waccess::check_alloca (gcall *stmt) { + if (m_early_checks_p) + return; + if ((warn_vla_limit >= HOST_WIDE_INT_MAX && warn_alloc_size_limit < warn_vla_limit) || (warn_alloca_limit >= HOST_WIDE_INT_MAX @@ -2361,6 +2403,13 @@ pass_waccess::check_alloca (gcall *stmt) void pass_waccess::check_alloc_size_call (gcall *stmt) { + if (m_early_checks_p) + return; + + if (gimple_call_num_args (stmt) < 1) + /* Avoid invalid calls to functions without a prototype. */ + return; + tree fndecl = gimple_call_fndecl (stmt); if (fndecl && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)) { @@ -2413,6 +2462,9 @@ pass_waccess::check_alloc_size_call (gcall *stmt) void pass_waccess::check_strcat (gcall *stmt) { + if (m_early_checks_p) + return; + if (!warn_stringop_overflow && !warn_stringop_overread) return; @@ -2438,6 +2490,9 @@ pass_waccess::check_strcat (gcall *stmt) void pass_waccess::check_strncat (gcall *stmt) { + if (m_early_checks_p) + return; + if (!warn_stringop_overflow && !warn_stringop_overread) return; @@ -2507,6 +2562,9 @@ pass_waccess::check_strncat (gcall *stmt) void pass_waccess::check_stxcpy (gcall *stmt) { + if (m_early_checks_p) + return; + tree dst = call_arg (stmt, 0); tree src = call_arg (stmt, 1); @@ -2545,7 +2603,7 @@ pass_waccess::check_stxcpy (gcall *stmt) void pass_waccess::check_stxncpy (gcall *stmt) { - if (!warn_stringop_overflow) + if (m_early_checks_p || !warn_stringop_overflow) return; tree dst = call_arg (stmt, 0); @@ -2569,7 +2627,7 @@ pass_waccess::check_stxncpy (gcall *stmt) void pass_waccess::check_strncmp (gcall *stmt) { - if (!warn_stringop_overread) + if (m_early_checks_p || !warn_stringop_overread) return; tree arg1 = call_arg (stmt, 0); @@ -2674,6 +2732,9 @@ pass_waccess::check_strncmp (gcall *stmt) void pass_waccess::check_memop_access (gimple *stmt, tree dest, tree src, tree size) { + if (m_early_checks_p) + return; + /* For functions like memset and memcpy that operate on raw memory try to determine the size of the largest source and destination object using type-0 Object Size regardless of the object size @@ -2695,7 +2756,7 @@ pass_waccess::check_read_access (gimple *stmt, tree src, tree bound /* = NULL_TREE */, int ost /* = 1 */) { - if (!warn_stringop_overread) + if (m_early_checks_p || !warn_stringop_overread) return; if (bound && !useless_type_conversion_p (size_type_node, TREE_TYPE (bound))) @@ -2938,7 +2999,7 @@ pass_waccess::check_atomic_memmodel (gimple *stmt, tree ord_sucs, if (warning_suppressed_p (stmt, OPT_Winvalid_memory_model)) return; - if (maybe_warn_memmodel (stmt, ord_sucs, ord_fail, valid)) + if (!maybe_warn_memmodel (stmt, ord_sucs, ord_fail, valid)) return; suppress_warning (stmt, OPT_Winvalid_memory_model); @@ -3094,11 +3155,12 @@ pass_waccess::check_builtin (gcall *stmt) case BUILT_IN_FREE: case BUILT_IN_REALLOC: - { - tree arg = call_arg (stmt, 0); - if (TREE_CODE (arg) == SSA_NAME) - check_pointer_uses (stmt, arg); - } + if (!m_early_checks_p) + { + tree arg = call_arg (stmt, 0); + if (TREE_CODE (arg) == SSA_NAME) + check_pointer_uses (stmt, arg); + } return true; case BUILT_IN_GETTEXT: @@ -3725,16 +3787,67 @@ pass_waccess::maybe_check_dealloc_call (gcall *call) /* Return true if either USE_STMT's basic block (that of a pointer's use) is dominated by INVAL_STMT's (that of a pointer's invalidating statement, - or if they're in the same block, USE_STMT follows INVAL_STMT. */ + which is either a clobber or a deallocation call), or if they're in + the same block, USE_STMT follows INVAL_STMT. */ bool -pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt) +pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt, + bool last_block /* = false */) { + tree clobvar = + gimple_clobber_p (inval_stmt) ? gimple_assign_lhs (inval_stmt) : NULL_TREE; + basic_block inval_bb = gimple_bb (inval_stmt); basic_block use_bb = gimple_bb (use_stmt); + if (!inval_bb || !use_bb) + return false; + if (inval_bb != use_bb) - return dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb); + { + if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb)) + return true; + + if (!clobvar || !last_block) + return false; + + /* Proceed only when looking for uses of dangling pointers. */ + auto gsi = gsi_for_stmt (use_stmt); + + auto_bitmap visited; + + /* A use statement in the last basic block in a function or one that + falls through to it is after any other prior clobber of the used + variable unless it's followed by a clobber of the same variable. */ + basic_block bb = use_bb; + while (bb != inval_bb + && single_succ_p (bb) + && !(single_succ_edge (bb)->flags & (EDGE_EH|EDGE_DFS_BACK))) + { + if (!bitmap_set_bit (visited, bb->index)) + /* Avoid cycles. */ + return true; + + for (; !gsi_end_p (gsi); gsi_next_nondebug (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (gimple_clobber_p (stmt)) + { + if (clobvar == gimple_assign_lhs (stmt)) + /* The use is followed by a clobber. */ + return false; + } + } + + bb = single_succ (bb); + gsi = gsi_start_bb (bb); + } + + /* The use is one of a dangling pointer if a clobber of the variable + [the pointer points to] has not been found before the function exit + point. */ + return bb == EXIT_BLOCK_PTR_FOR_FN (cfun); + } if (bitmap_set_bit (m_bb_uids_set, inval_bb->index)) /* The first time this basic block is visited assign increasing ids @@ -3752,27 +3865,30 @@ pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt) return gimple_uid (inval_stmt) < gimple_uid (use_stmt); } -/* Issue a warning for the USE_STMT of pointer PTR rendered invalid - by INVAL_STMT. PTR may be null when it's been optimized away. - MAYBE is true to issue the "maybe" kind of warning. EQUALITY is - true when the pointer is used in an equality expression. */ +/* Issue a warning for the USE_STMT of pointer or reference REF rendered + invalid by INVAL_STMT. REF may be null when it's been optimized away. + When nonnull, INVAL_STMT is the deallocation function that rendered + the pointer or reference dangling. Otherwise, VAR is the auto variable + (including an unnamed temporary such as a compound literal) whose + lifetime's rended it dangling. MAYBE is true to issue the "maybe" + kind of warning. EQUALITY is true when the pointer is used in + an equality expression. */ void -pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt, - gimple *inval_stmt, - bool maybe, - bool equality /* = false */) +pass_waccess::warn_invalid_pointer (tree ref, gimple *use_stmt, + gimple *inval_stmt, tree var, + bool maybe, bool equality /* = false */) { /* Avoid printing the unhelpful "<unknown>" in the diagnostics. */ - if (ptr && TREE_CODE (ptr) == SSA_NAME - && (!SSA_NAME_VAR (ptr) || DECL_ARTIFICIAL (SSA_NAME_VAR (ptr)))) - ptr = NULL_TREE; + if (ref && TREE_CODE (ref) == SSA_NAME + && (!SSA_NAME_VAR (ref) || DECL_ARTIFICIAL (SSA_NAME_VAR (ref)))) + ref = NULL_TREE; location_t use_loc = gimple_location (use_stmt); if (use_loc == UNKNOWN_LOCATION) { - use_loc = cfun->function_end_locus; - if (!ptr) + use_loc = m_func->function_end_locus; + if (!ref) /* Avoid issuing a warning with no context other than the function. That would make it difficult to debug in any but very simple cases. */ @@ -3788,12 +3904,12 @@ pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt, const tree inval_decl = gimple_call_fndecl (inval_stmt); - if ((ptr && warning_at (use_loc, OPT_Wuse_after_free, + if ((ref && warning_at (use_loc, OPT_Wuse_after_free, (maybe ? G_("pointer %qE may be used after %qD") : G_("pointer %qE used after %qD")), - ptr, inval_decl)) - || (!ptr && warning_at (use_loc, OPT_Wuse_after_free, + ref, inval_decl)) + || (!ref && warning_at (use_loc, OPT_Wuse_after_free, (maybe ? G_("pointer may be used after %qD") : G_("pointer used after %qD")), @@ -3805,6 +3921,52 @@ pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt, } return; } + + if ((maybe && warn_dangling_pointer < 2) + || warning_suppressed_p (use_stmt, OPT_Wdangling_pointer_)) + return; + + if (DECL_NAME (var)) + { + if ((ref + && warning_at (use_loc, OPT_Wdangling_pointer_, + (maybe + ? G_("dangling pointer %qE to %qD may be used") + : G_("using dangling pointer %qE to %qD")), + ref, var)) + || (!ref + && warning_at (use_loc, OPT_Wdangling_pointer_, + (maybe + ? G_("dangling pointer to %qD may be used") + : G_("using a dangling pointer to %qD")), + var))) + inform (DECL_SOURCE_LOCATION (var), + "%qD declared here", var); + suppress_warning (use_stmt, OPT_Wdangling_pointer_); + return; + } + + if ((ref + && warning_at (use_loc, OPT_Wdangling_pointer_, + (maybe + ? G_("dangling pointer %qE to an unnamed temporary " + "may be used") + : G_("using dangling pointer %qE to an unnamed " + "temporary")), + ref, var)) + || (!ref + && warning_at (use_loc, OPT_Wdangling_pointer_, + (maybe + ? G_("dangling pointer to an unnamed temporary " + "may be used") + : G_("using a dangling pointer to an unnamed " + "temporary")), + var))) + { + inform (DECL_SOURCE_LOCATION (var), + "unnamed temporary defined here"); + suppress_warning (use_stmt, OPT_Wdangling_pointer_); + } } /* If STMT is a call to either the standard realloc or to a user-defined @@ -3927,10 +4089,14 @@ pointers_related_p (gimple *stmt, tree p, tree q, pointer_query &qry) /* For a STMT either a call to a deallocation function or a clobber, warn for uses of the pointer PTR it was called with (including its copies - or others derived from it by pointer arithmetic). */ + or others derived from it by pointer arithmetic). If STMT is a clobber, + VAR is the decl of the clobbered variable. When MAYBE is true use + a "maybe" form of diagnostic. */ void -pass_waccess::check_pointer_uses (gimple *stmt, tree ptr) +pass_waccess::check_pointer_uses (gimple *stmt, tree ptr, + tree var /* = NULL_TREE */, + bool maybe /* = false */) { gcc_assert (TREE_CODE (ptr) == SSA_NAME); @@ -4013,18 +4179,25 @@ pass_waccess::check_pointer_uses (gimple *stmt, tree ptr) /* Warn if USE_STMT is dominated by the deallocation STMT. Otherwise, add the pointer to POINTERS so that the uses of any other pointers derived from it can be checked. */ - if (use_after_inval_p (stmt, use_stmt)) + if (use_after_inval_p (stmt, use_stmt, check_dangling)) { - /* TODO: Handle PHIs but careful of false positives. */ - if (gimple_code (use_stmt) != GIMPLE_PHI) + if (gimple_code (use_stmt) == GIMPLE_PHI) { - basic_block use_bb = gimple_bb (use_stmt); - bool this_maybe - = !dominated_by_p (CDI_POST_DOMINATORS, use_bb, stmt_bb); - warn_invalid_pointer (*use_p->use, use_stmt, stmt, - this_maybe, equality); - continue; + tree lhs = gimple_phi_result (use_stmt); + if (TREE_CODE (lhs) == SSA_NAME) + { + pointers.safe_push (lhs); + continue; + } } + + basic_block use_bb = gimple_bb (use_stmt); + bool this_maybe + = (maybe + || !dominated_by_p (CDI_POST_DOMINATORS, use_bb, stmt_bb)); + warn_invalid_pointer (*use_p->use, use_stmt, stmt, var, + this_maybe, equality); + continue; } if (is_gimple_assign (use_stmt)) @@ -4059,26 +4232,100 @@ pass_waccess::check_call (gcall *stmt) if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)) check_builtin (stmt); - if (tree callee = gimple_call_fndecl (stmt)) - { - /* Check for uses of the pointer passed to either a standard - or a user-defined deallocation function. */ - unsigned argno = fndecl_dealloc_argno (callee); - if (argno < (unsigned) call_nargs (stmt)) - { - tree arg = call_arg (stmt, argno); - if (TREE_CODE (arg) == SSA_NAME) - check_pointer_uses (stmt, arg); - } - } + if (!m_early_checks_p) + if (tree callee = gimple_call_fndecl (stmt)) + { + /* Check for uses of the pointer passed to either a standard + or a user-defined deallocation function. */ + unsigned argno = fndecl_dealloc_argno (callee); + if (argno < (unsigned) call_nargs (stmt)) + { + tree arg = call_arg (stmt, argno); + if (TREE_CODE (arg) == SSA_NAME) + check_pointer_uses (stmt, arg); + } + } check_call_access (stmt); + check_call_dangling (stmt); + + if (m_early_checks_p) + return; maybe_check_dealloc_call (stmt); check_nonstring_args (stmt); } +/* Return true of X is a DECL with automatic storage duration. */ + +static inline bool +is_auto_decl (tree x) +{ + return DECL_P (x) && !DECL_EXTERNAL (x) && !TREE_STATIC (x); +} + +/* Check non-call STMT for invalid accesses. */ + +void +pass_waccess::check_stmt (gimple *stmt) +{ + if (m_check_dangling_p && gimple_clobber_p (stmt)) + { + /* Ignore clobber statemts in blocks with exceptional edges. */ + basic_block bb = gimple_bb (stmt); + edge e = EDGE_PRED (bb, 0); + if (e->flags & EDGE_EH) + return; + + tree var = gimple_assign_lhs (stmt); + m_clobbers.put (var, stmt); + return; + } + + if (is_gimple_assign (stmt)) + { + /* Clobbered unnamed temporaries such as compound literals can be + revived. Check for an assignment to one and remove it from + M_CLOBBERS. */ + tree lhs = gimple_assign_lhs (stmt); + while (handled_component_p (lhs)) + lhs = TREE_OPERAND (lhs, 0); + + if (is_auto_decl (lhs)) + m_clobbers.remove (lhs); + return; + } + + if (greturn *ret = dyn_cast <greturn *> (stmt)) + { + if (optimize && flag_isolate_erroneous_paths_dereference) + /* Avoid interfering with -Wreturn-local-addr (which runs only + with optimization enabled). */ + return; + + tree arg = gimple_return_retval (ret); + if (!arg || TREE_CODE (arg) != ADDR_EXPR) + return; + + arg = TREE_OPERAND (arg, 0); + while (handled_component_p (arg)) + arg = TREE_OPERAND (arg, 0); + + if (!is_auto_decl (arg)) + return; + + gimple **pclobber = m_clobbers.get (arg); + if (!pclobber) + return; + + if (!use_after_inval_p (*pclobber, stmt)) + return; + + warn_invalid_pointer (NULL_TREE, stmt, *pclobber, arg, false); + } +} + /* Check basic block BB for invalid accesses. */ void @@ -4091,6 +4338,8 @@ pass_waccess::check_block (basic_block bb) gimple *stmt = gsi_stmt (si); if (gcall *call = dyn_cast <gcall *> (stmt)) check_call (call); + else + check_stmt (stmt); } } @@ -4139,6 +4388,262 @@ pass_waccess::gimple_call_return_arg (gcall *call) return gimple_call_arg (call, argno); } +/* Return the decl referenced by the argument that the call STMT to + a built-in function returns (including with an offset) or null if + it doesn't. */ + +tree +pass_waccess::gimple_call_return_arg_ref (gcall *call) +{ + if (tree arg = gimple_call_return_arg (call)) + { + access_ref aref; + if (m_ptr_qry.get_ref (arg, call, &aref, 0) + && DECL_P (aref.ref)) + return aref.ref; + } + + return NULL_TREE; +} + +/* Check for and diagnose all uses of the dangling pointer VAR to the auto + object DECL whose lifetime has ended. OBJREF is true when VAR denotes + an access to a DECL that may have been clobbered. */ + +void +pass_waccess::check_dangling_uses (tree var, tree decl, bool maybe /* = false */, + bool objref /* = false */) +{ + if (!decl || !is_auto_decl (decl)) + return; + + gimple **pclob = m_clobbers.get (decl); + if (!pclob) + return; + + if (!objref) + { + check_pointer_uses (*pclob, var, decl, maybe); + return; + } + + gimple *use_stmt = SSA_NAME_DEF_STMT (var); + if (!use_after_inval_p (*pclob, use_stmt, true)) + return; + + basic_block use_bb = gimple_bb (use_stmt); + basic_block clob_bb = gimple_bb (*pclob); + maybe = maybe || !dominated_by_p (CDI_POST_DOMINATORS, use_bb, clob_bb); + warn_invalid_pointer (var, use_stmt, *pclob, decl, maybe, false); +} + +/* Diagnose stores in BB and (recursively) its predecessors of the addresses + of local variables into nonlocal pointers that are left dangling after + the function returns. BBS is a bitmap of basic blocks visited. */ + +void +pass_waccess::check_dangling_stores (basic_block bb, + hash_set<tree> &stores, + auto_bitmap &bbs) +{ + if (!bitmap_set_bit (bbs, bb->index)) + /* Avoid cycles. */ + return; + + /* Iterate backwards over the statements looking for a store of + the address of a local variable into a nonlocal pointer. */ + for (auto gsi = gsi_last_nondebug_bb (bb); ; gsi_prev_nondebug (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (!stmt) + break; + + if (is_gimple_call (stmt) + && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))) + /* Avoid looking before nonconst, nonpure calls since those might + use the escaped locals. */ + return; + + if (!is_gimple_assign (stmt) || gimple_clobber_p (stmt)) + continue; + + access_ref lhs_ref; + tree lhs = gimple_assign_lhs (stmt); + if (!m_ptr_qry.get_ref (lhs, stmt, &lhs_ref, 0)) + continue; + + if (is_auto_decl (lhs_ref.ref)) + continue; + + if (DECL_P (lhs_ref.ref)) + { + if (!POINTER_TYPE_P (TREE_TYPE (lhs_ref.ref)) + || lhs_ref.deref > 0) + continue; + } + else if (TREE_CODE (lhs_ref.ref) == SSA_NAME) + { + /* Avoid looking at or before stores into unknown objects. */ + gimple *def_stmt = SSA_NAME_DEF_STMT (lhs_ref.ref); + if (!gimple_nop_p (def_stmt)) + return; + } + else if (TREE_CODE (lhs_ref.ref) == MEM_REF) + { + tree arg = TREE_OPERAND (lhs_ref.ref, 0); + if (TREE_CODE (arg) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (arg); + if (!gimple_nop_p (def_stmt)) + return; + } + } + else + continue; + + if (stores.add (lhs_ref.ref)) + continue; + + /* FIXME: Handle stores of alloca() and VLA. */ + access_ref rhs_ref; + tree rhs = gimple_assign_rhs1 (stmt); + if (!m_ptr_qry.get_ref (rhs, stmt, &rhs_ref, 0) + || rhs_ref.deref != -1) + continue; + + if (!is_auto_decl (rhs_ref.ref)) + continue; + + location_t loc = gimple_location (stmt); + if (warning_at (loc, OPT_Wdangling_pointer_, + "storing the address of local variable %qD in %qE", + rhs_ref.ref, lhs)) + { + location_t loc = DECL_SOURCE_LOCATION (rhs_ref.ref); + inform (loc, "%qD declared here", rhs_ref.ref); + + if (DECL_P (lhs_ref.ref)) + loc = DECL_SOURCE_LOCATION (lhs_ref.ref); + else if (EXPR_HAS_LOCATION (lhs_ref.ref)) + loc = EXPR_LOCATION (lhs_ref.ref); + + if (loc != UNKNOWN_LOCATION) + inform (loc, "%qE declared here", lhs_ref.ref); + } + } + + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, bb->preds) + { + basic_block pred = e->src; + check_dangling_stores (pred, stores, bbs); + } +} + +/* Diagnose stores of the addresses of local variables into nonlocal + pointers that are left dangling after the function returns. */ + +void +pass_waccess::check_dangling_stores () +{ + auto_bitmap bbs; + hash_set<tree> stores; + check_dangling_stores (EXIT_BLOCK_PTR_FOR_FN (m_func), stores, bbs); +} + +/* Check for and diagnose uses of dangling pointers to auto objects + whose lifetime has ended. */ + +void +pass_waccess::check_dangling_uses () +{ + tree var; + unsigned i; + FOR_EACH_SSA_NAME (i, var, m_func) + { + /* For each SSA_NAME pointer VAR find the DECL it points to. + If the DECL is a clobbered local variable, check to see + if any of VAR's uses (or those of other pointers derived + from VAR) happens after the clobber. If so, warn. */ + tree decl = NULL_TREE; + + gimple *def_stmt = SSA_NAME_DEF_STMT (var); + if (is_gimple_assign (def_stmt)) + { + tree rhs = gimple_assign_rhs1 (def_stmt); + if (TREE_CODE (rhs) == ADDR_EXPR) + { + if (!POINTER_TYPE_P (TREE_TYPE (var))) + continue; + decl = TREE_OPERAND (rhs, 0); + } + else + { + /* For other expressions, check the base DECL to see + if it's been clobbered, most likely as a result of </cut>

3 years, 5 months

1
0
0 0

[ACTIVITY] week ending Jan. 16 2022

by Alex Bennée

Project Stratos =============== - reviewed Peter's virtio-video patches for QEMU [PR to clean up some typos in EDK2] <https://github.com/tianocore/edk2-platforms/pull/34> vhost-device maintainer effort ([UM-196]) - started reviewing https://github.com/rust-vmm/vhost-device/pull/7 - looking pretty good, see how https://github.com/rust-vmm/vm-virtio/commit/463dd20552fc32139bbbb56e9152df… would work with it [UM-196] <https://linaro.atlassian.net/browse/UM-196> QEMU Upstream Work ([UM-2]) =========================== - posted [RFC PATCH 0/6] Basic skeleton of RP2040 Raspbery Pi Pico Message-Id: <20220110175104.2908956-1-alex.bennee(a)linaro.org> - posted [PATCH v1 00/34] testing/next and other misc fixes Message-Id: <20220105135009.1584676-1-alex.bennee(a)linaro.org> - and the eventual [PULL 00/31] testing/next and other misc fixes Message-Id: <20220112112722.3641051-1-alex.bennee(a)linaro.org> - and the inevitable fixup [RFC PATCH] linux-user: expand reserved brk space for 64bit guests Message-Id: <20220113165550.4184455-1-alex.bennee(a)linaro.org> [UM-2] <https://linaro.atlassian.net/browse/UM-2> Upstream MTTCG tests ([QEMU-52]) - still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org> [QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52> Completed Reviews [6/6] ======================= [PATCH] tests/docker: Add gentoo-loongarch64-cross image and run cross builds in GitLab Message-Id: <20211229062204.3726981-1-git(a)xen0n.name> [PATCH 0/2] tests/tcg: Fix float_{convs,madds} Message-Id: <20211224035541.2159966-1-richard.henderson(a)linaro.org> [PATCH v5 00/18] tests/docker: start using libvirt-ci's "lcitool" for dockerfiles Message-Id: <20211215141949.3512719-1-berrange(a)redhat.com> [PATCH] tests/tcg: Unconditionally use 90 second timeout Message-Id: <20211230235424.49155-1-richard.henderson(a)linaro.org> [PATCH] gitlab-ci: Speed up the msys2-64bit job by using --without-default-devices Message-Id: <20211216082253.43899-1-thuth(a)redhat.com> [PATCH 0/8] virtio: Add vhost-user based Video decode Message-Id: <20211209145601.331477-1-peter.griffin(a)linaro.org> Absences ======== Current Review Queue ==================== TODO [PATCH v2 00/11] Atomic cleanup + clang-12 build fix Message-Id: <20210717014121.1784956-1-richard.henderson(a)linaro.org> ============================================================================================================================ TODO [PATCH 0/7] tcg: some small towards more modular tcg Message-Id: <20210804143826.3402872-1-kraxel(a)redhat.com> ================================================================================================================= TODO [PATCH 0/6] Introduce CanoKey QEMU Message-Id: <YcSupUSXWDXOAkas@Sun> ========================================================================= TODO [PATCH] target/arm: Add missing FEAT_TLBIOS instructions Message-Id: <20211231103928.1455657-1-idan.horowitz(a)gmail.com> =========================================================================================================================== -- Alex Bennée

3 years, 5 months

1
0
0 0

[ACTIVITY] week ending Jan. 16 2022

by Alex Bennée

Project Stratos =============== - reviewed Peter's virtio-video patches for QEMU [PR to clean up some typos in EDK2] <https://github.com/tianocore/edk2-platforms/pull/34> vhost-device maintainer effort ([UM-196]) - started reviewing https://github.com/rust-vmm/vhost-device/pull/7 - looking pretty good, see how https://github.com/rust-vmm/vm-virtio/commit/463dd20552fc32139bbbb56e9152df… would work with it [UM-196] <https://linaro.atlassian.net/browse/UM-196> QEMU Upstream Work ([UM-2]) =========================== - posted [RFC PATCH 0/6] Basic skeleton of RP2040 Raspbery Pi Pico Message-Id: <20220110175104.2908956-1-alex.bennee(a)linaro.org> - posted [PATCH v1 00/34] testing/next and other misc fixes Message-Id: <20220105135009.1584676-1-alex.bennee(a)linaro.org> - and the eventual [PULL 00/31] testing/next and other misc fixes Message-Id: <20220112112722.3641051-1-alex.bennee(a)linaro.org> - and the inevitable fixup [RFC PATCH] linux-user: expand reserved brk space for 64bit guests Message-Id: <20220113165550.4184455-1-alex.bennee(a)linaro.org> [UM-2] <https://linaro.atlassian.net/browse/UM-2> Upstream MTTCG tests ([QEMU-52]) - still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org> [QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52> Completed Reviews [6/6] ======================= [PATCH] tests/docker: Add gentoo-loongarch64-cross image and run cross builds in GitLab Message-Id: <20211229062204.3726981-1-git(a)xen0n.name> [PATCH 0/2] tests/tcg: Fix float_{convs,madds} Message-Id: <20211224035541.2159966-1-richard.henderson(a)linaro.org> [PATCH v5 00/18] tests/docker: start using libvirt-ci's "lcitool" for dockerfiles Message-Id: <20211215141949.3512719-1-berrange(a)redhat.com> [PATCH] tests/tcg: Unconditionally use 90 second timeout Message-Id: <20211230235424.49155-1-richard.henderson(a)linaro.org> [PATCH] gitlab-ci: Speed up the msys2-64bit job by using --without-default-devices Message-Id: <20211216082253.43899-1-thuth(a)redhat.com> [PATCH 0/8] virtio: Add vhost-user based Video decode Message-Id: <20211209145601.331477-1-peter.griffin(a)linaro.org> Absences ======== Current Review Queue ==================== TODO [PATCH 0/6] Introduce CanoKey QEMU Message-Id: <YcSupUSXWDXOAkas@Sun> ========================================================================= TODO [PATCH] target/arm: Add missing FEAT_TLBIOS instructions Message-Id: <20211231103928.1455657-1-idan.horowitz(a)gmail.com> ======================================================================================================================== TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64 Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com> =============================================================================================================== -- Alex Bennée

3 years, 5 months

1
0
0 0

[ACTIVITY] report week ending 14 Jan

by Peter Maydell

Progress: * UM-2 [QEMU upstream maintainership] - Most of this week was spent on continuing to work through my code-review queue :-/ - Sent a few minor cleanup patches for linux-user nits I noticed while reading the code as part of reviewing a big bsd-user patchset * QEMU-420 [GICv4 emulation] - got some reviewed ITS cleanup patches upstream - rerolled and sent v2 patchset for the rest of the cleanup patches - got back up to speed with where I left my GICv4 ITS patches before Christmas, and dealt with some minor loose ends I'd left in the last patch or two I was working on. -- PMM

3 years, 5 months

1
0
0 0

On holiday through 22 Jan

by Richard Henderson

Hi Peter, Welcome back, hope you had a good Christmas break. I'm off oh holiday myself for the next two weeks, so this would be an ideal time to pass back merge control to you. The board is mostly green now, with occasional allowed failures for centos-stream and freebsd for upstream package manager failures. See yall in a couple of weeks. r~

3 years, 6 months

1
0
0 0

[ACTIVITY] week ending 9 Jan 2022

by Richard Henderson

[UM-2] * Re-greening of gitlab-ci. - There are continuing issues with cross-i386-tci. Occasionally I see *really* long test times: https://gitlab.com/qemu-project/qemu/-/jobs/1941996332 with qtest-aarch64/qom-test taking 1738s, or 28 of the 60 minute budget. More often it's merely slow: https://gitlab.com/qemu-project/qemu/-/jobs/1954634840 with qtest-aarch64/qom-test taking 538s. Note that locally this test runs in about 100s, and I have been unable to determine why it runs so much slower on gitlab. - Worked on a ppc64-softmmu slowdown leading to timeouts. - Fixes for meson regressions affecting testing. * Refresh tcg unaligned user patch sets. r~

3 years, 6 months

1
0
0 0

[ACTIVITY] report week ending 7 Jan

by Peter Maydell

Progress (short week, 2 days): * UM-2 [QEMU upstream maintainership] - Catching up with email and codereview backlog from 3 weeks holiday :-) (Have got the codereview queue down to less than a dozen things so should be able to do some more GICv4 development next week.) -- PMM

3 years, 6 months

1
0
0 0

[ACTIVITY] week ending Dec. 19 2021

by Alex Bennée

Project Stratos =============== - got Xen working on the MachiatoBin - posted Configuring the host GIC for guest to guest IPI Message-Id: <87fsqwn2sd.fsf(a)linaro.org> QEMU Upstream Work ([UM-2]) =========================== - posted [RFC PATCH] linux-user: don't adjust base of found hole Message-Id: <20211216144442.2270605-1-alex.bennee(a)linaro.org> - posted [PATCH] hw/arm: add control knob to disable kaslr_seed via DTB Message-Id: <20211215120926.1696302-1-alex.bennee(a)linaro.org> Completed Reviews [3/3] ======================= [PATCH 00/26] arm gicv3 ITS: Various bug fixes and refactorings Message-Id: <20211211191135.1764649-1-peter.maydell(a)linaro.org> [PATCH for-7.0 0/6] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20211208231154.392029-1-richard.henderson(a)linaro.org> [PATCH-for-6.2? v2 0/5] docs/devel/style: Improve rST rendering Message-Id: <20211118145716.4116731-1-philmd(a)redhat.com> Absences ======== Off for holidays, back in the new year. Merry Christmas everyone! -- Alex Bennée

3 years, 6 months

1
0
0 0

[TCWG CI] Regression caused by gcc: tree-object-size: Use trees and support negative offsets

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: tree-object-size: Use trees and support negative offsets: commit 422f9eb7011b76c12ff00ffaee2bcc9cdddf16d5 Author: Siddhesh Poyarekar <siddhesh(a)gotplt.org> tree-object-size: Use trees and support negative offsets Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 550 # First few build errors in logs: # 00:01:37 ./include/linux/thread_info.h:213:25: error: call to ‘__bad_copy_to’ declared with attribute error: copy destination size is too small # 00:01:37 ./include/linux/thread_info.h:213:25: error: call to ‘__bad_copy_to’ declared with attribute error: copy destination size is too small # 00:01:37 make[1]: *** [scripts/Makefile.build:287: fs/io_uring.o] Error 1 # 00:01:37 make: *** [Makefile:1846: fs] Error 2 from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 567 # linux build successful: all THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-arm-mainline-allnoconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… Reproduce builds: <cut> mkdir investigate-gcc-422f9eb7011b76c12ff00ffaee2bcc9cdddf16d5 cd investigate-gcc-422f9eb7011b76c12ff00ffaee2bcc9cdddf16d5 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 422f9eb7011b76c12ff00ffaee2bcc9cdddf16d5 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 871504b0dd5cd023d3a28cf9e5ccbda75928b102 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 422f9eb7011b76c12ff00ffaee2bcc9cdddf16d5 Author: Siddhesh Poyarekar <siddhesh(a)gotplt.org> Date: Fri Dec 17 07:07:18 2021 +0530 tree-object-size: Use trees and support negative offsets Transform tree-object-size to operate on tree objects instead of host wide integers. This makes it easier to extend to dynamic expressions for object sizes. The compute_builtin_object_size interface also now returns a tree expression instead of HOST_WIDE_INT, so callers have been adjusted to account for that. The trees in object_sizes are each an object_size object with members size (the bytes from the pointer to the end of the object) and wholesize (the size of the whole object). This allows analysis of negative offsets, which can now be allowed to the extent of the object bounds. Tests have been added to verify that it actually works. gcc/ChangeLog: * tree-object-size.h (compute_builtin_object_size): Return tree instead of HOST_WIDE_INT. * builtins.c (fold_builtin_object_size): Adjust. * gimple-fold.c (gimple_fold_builtin_strncat): Likewise. * ubsan.c (instrument_object_size): Likewise. * tree-object-size.c (object_size): New structure. (object_sizes): Change type to vec<object_size>. (initval): New function. (unknown): Use it. (size_unknown_p, size_initval, size_unknown): New functions. (object_sizes_unknown_p): Use it. (object_sizes_get): Return tree. (object_sizes_initialize): Rename from object_sizes_set_force and set VAL parameter type as tree. Add new parameter WHOLEVAL. (object_sizes_set): Set VAL parameter type as tree and adjust implementation. Add new parameter WHOLEVAL. (size_for_offset): New function. (decl_init_size): Adjust comment. (addr_object_size): Change PSIZE parameter to tree and adjust implementation. Add new parameter PWHOLESIZE. (alloc_object_size): Return tree. (compute_builtin_object_size): Return tree in PSIZE. (expr_object_size, call_object_size, unknown_object_size): Adjust for object_sizes_set change. (merge_object_sizes): Drop OFFSET parameter and adjust implementation for tree change. (plus_stmt_object_size): Call collect_object_sizes_for directly instead of merge_object_size and call size_for_offset to get net size. (cond_expr_object_size, collect_object_sizes_for, object_sizes_execute): Adjust for change of type from HOST_WIDE_INT to tree. (check_for_plus_in_loops_1): Likewise and skip non-positive offsets. gcc/testsuite/ChangeLog: * gcc.dg/builtin-object-size-1.c (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-2.c (test8): New test. (main): Call it. * gcc.dg/builtin-object-size-3.c (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-4.c (test8): New test. (main): Call it. * gcc.dg/builtin-object-size-5.c (test5, test6, test7): New tests. Signed-off-by: Siddhesh Poyarekar <siddhesh(a)gotplt.org> --- gcc/builtins.c | 10 +- gcc/gimple-fold.c | 11 +- gcc/testsuite/gcc.dg/builtin-object-size-1.c | 30 ++ gcc/testsuite/gcc.dg/builtin-object-size-2.c | 30 ++ gcc/testsuite/gcc.dg/builtin-object-size-3.c | 31 +++ gcc/testsuite/gcc.dg/builtin-object-size-4.c | 30 ++ gcc/testsuite/gcc.dg/builtin-object-size-5.c | 25 ++ gcc/tree-object-size.c | 394 +++++++++++++++++---------- gcc/tree-object-size.h | 2 +- gcc/ubsan.c | 5 +- 10 files changed, 409 insertions(+), 159 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index cd8947b4de2..abe342e111d 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -10255,7 +10255,7 @@ maybe_emit_sprintf_chk_warning (tree exp, enum built_in_function fcode) static tree fold_builtin_object_size (tree ptr, tree ost) { - unsigned HOST_WIDE_INT bytes; + tree bytes; int object_size_type; if (!validate_arg (ptr, POINTER_TYPE) @@ -10280,8 +10280,8 @@ fold_builtin_object_size (tree ptr, tree ost) if (TREE_CODE (ptr) == ADDR_EXPR) { compute_builtin_object_size (ptr, object_size_type, &bytes); - if (wi::fits_to_tree_p (bytes, size_type_node)) - return build_int_cstu (size_type_node, bytes); + if (int_fits_type_p (bytes, size_type_node)) + return fold_convert (size_type_node, bytes); } else if (TREE_CODE (ptr) == SSA_NAME) { @@ -10289,8 +10289,8 @@ fold_builtin_object_size (tree ptr, tree ost) later. Maybe subsequent passes will help determining it. */ if (compute_builtin_object_size (ptr, object_size_type, &bytes) - && wi::fits_to_tree_p (bytes, size_type_node)) - return build_int_cstu (size_type_node, bytes); + && int_fits_type_p (bytes, size_type_node)) + return fold_convert (size_type_node, bytes); } return NULL_TREE; diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index 1d8fd74f72c..64515aabc04 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -2493,17 +2493,16 @@ gimple_fold_builtin_strncat (gimple_stmt_iterator *gsi) if (!src_len || known_lower (stmt, len, src_len, true)) return false; - unsigned HOST_WIDE_INT dstsize; - bool found_dstsize = compute_builtin_object_size (dst, 1, &dstsize); - /* Warn on constant LEN. */ if (TREE_CODE (len) == INTEGER_CST) { bool nowarn = warning_suppressed_p (stmt, OPT_Wstringop_overflow_); + tree dstsize; - if (!nowarn && found_dstsize) + if (!nowarn && compute_builtin_object_size (dst, 1, &dstsize) + && TREE_CODE (dstsize) == INTEGER_CST) { - int cmpdst = compare_tree_int (len, dstsize); + int cmpdst = tree_int_cst_compare (len, dstsize); if (cmpdst >= 0) { @@ -2519,7 +2518,7 @@ gimple_fold_builtin_strncat (gimple_stmt_iterator *gsi) ? G_("%qD specified bound %E equals " "destination size") : G_("%qD specified bound %E exceeds " - "destination size %wu"), + "destination size %E"), fndecl, len, dstsize); if (nowarn) suppress_warning (stmt, OPT_Wstringop_overflow_); diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-1.c b/gcc/testsuite/gcc.dg/builtin-object-size-1.c index 8cdae49a6b1..0154f4e9695 100644 --- a/gcc/testsuite/gcc.dg/builtin-object-size-1.c +++ b/gcc/testsuite/gcc.dg/builtin-object-size-1.c @@ -424,6 +424,35 @@ test8 (void) abort (); } +void +__attribute__ ((noinline)) +test9 (unsigned cond) +{ + char *buf2 = malloc (10); + char *p; + + if (cond) + p = &buf2[8]; + else + p = &buf2[4]; + + if (__builtin_object_size (&p[-4], 0) != 10) + abort (); + + for (unsigned i = cond; i > 0; i--) + p--; + + if (__builtin_object_size (p, 0) != 10) + abort (); + + p = &y.c[8]; + for (unsigned i = cond; i > 0; i--) + p--; + + if (__builtin_object_size (p, 0) != sizeof (y)) + abort (); +} + int main (void) { @@ -437,5 +466,6 @@ main (void) test6 (4); test7 (); test8 (); + test9 (1); exit (0); } diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-2.c b/gcc/testsuite/gcc.dg/builtin-object-size-2.c index ad2dd296a9a..5cf29291aff 100644 --- a/gcc/testsuite/gcc.dg/builtin-object-size-2.c +++ b/gcc/testsuite/gcc.dg/builtin-object-size-2.c @@ -382,6 +382,35 @@ test7 (void) abort (); } +void +__attribute__ ((noinline)) +test8 (unsigned cond) +{ + char *buf2 = malloc (10); + char *p; + + if (cond) + p = &buf2[8]; + else + p = &buf2[4]; + + if (__builtin_object_size (&p[-4], 1) != 10) + abort (); + + for (unsigned i = cond; i > 0; i--) + p--; + + if (__builtin_object_size (p, 1) != 10) + abort (); + + p = &y.c[8]; + for (unsigned i = cond; i > 0; i--) + p--; + + if (__builtin_object_size (p, 1) != sizeof (y.c)) + abort (); +} + int main (void) { @@ -394,5 +423,6 @@ main (void) test5 (4); test6 (); test7 (); + test8 (1); exit (0); } diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-3.c b/gcc/testsuite/gcc.dg/builtin-object-size-3.c index d5ca5047ee9..3a692c4e3d2 100644 --- a/gcc/testsuite/gcc.dg/builtin-object-size-3.c +++ b/gcc/testsuite/gcc.dg/builtin-object-size-3.c @@ -430,6 +430,36 @@ test8 (void) abort (); } +void +__attribute__ ((noinline)) +test9 (unsigned cond) +{ + char *buf2 = malloc (10); + char *p; + + if (cond) + p = &buf2[8]; + else + p = &buf2[4]; + + if (__builtin_object_size (&p[-4], 2) != 6) + abort (); + + for (unsigned i = cond; i > 0; i--) + p--; + + if (__builtin_object_size (p, 2) != 2) + abort (); + + p = &y.c[8]; + for (unsigned i = cond; i > 0; i--) + p--; + + if (__builtin_object_size (p, 2) + != sizeof (y) - __builtin_offsetof (struct A, c) - 8) + abort (); +} + int main (void) { @@ -443,5 +473,6 @@ main (void) test6 (4); test7 (); test8 (); + test9 (1); exit (0); } diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-4.c b/gcc/testsuite/gcc.dg/builtin-object-size-4.c index 9f159e36a0f..87381620cc9 100644 --- a/gcc/testsuite/gcc.dg/builtin-object-size-4.c +++ b/gcc/testsuite/gcc.dg/builtin-object-size-4.c @@ -395,6 +395,35 @@ test7 (void) abort (); } +void +__attribute__ ((noinline)) +test8 (unsigned cond) +{ + char *buf2 = malloc (10); + char *p; + + if (cond) + p = &buf2[8]; + else + p = &buf2[4]; + + if (__builtin_object_size (&p[-4], 3) != 6) + abort (); + + for (unsigned i = cond; i > 0; i--) + p--; + + if (__builtin_object_size (p, 3) != 2) + abort (); + + p = &y.c[8]; + for (unsigned i = cond; i > 0; i--) + p--; + + if (__builtin_object_size (p, 3) != sizeof (y.c) - 8) + abort (); +} + int main (void) { @@ -407,5 +436,6 @@ main (void) test5 (4); test6 (); test7 (); + test8 (1); exit (0); } diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-5.c b/gcc/testsuite/gcc.dg/builtin-object-size-5.c index 7c274cdfd42..8e63d9c7a5e 100644 --- a/gcc/testsuite/gcc.dg/builtin-object-size-5.c +++ b/gcc/testsuite/gcc.dg/builtin-object-size-5.c @@ -53,4 +53,29 @@ test4 (size_t x) abort (); } +void +test5 (void) +{ + char *p = &buf[0x90000004]; + if (__builtin_object_size (p + 2, 0) != 0) + abort (); +} + +void +test6 (void) +{ + char *p = &buf[-4]; + if (__builtin_object_size (p + 2, 0) != 0) + abort (); +} + +void +test7 (void) +{ + char *buf2 = __builtin_malloc (8); + char *p = &buf2[0x90000004]; + if (__builtin_object_size (p + 2, 0) != 0) + abort (); +} + /* { dg-final { scan-assembler-not "abort" } } */ diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index b4881ef198f..32ef6dd5133 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -45,6 +45,14 @@ struct object_size_info unsigned int *stack, *tos; }; +struct GTY(()) object_size +{ + /* Estimate of bytes till the end of the object. */ + tree size; + /* Estimate of the size of the whole object. */ + tree wholesize; +}; + enum { OST_SUBOBJECT = 1, @@ -54,13 +62,12 @@ enum static tree compute_object_offset (const_tree, const_tree); static bool addr_object_size (struct object_size_info *, - const_tree, int, unsigned HOST_WIDE_INT *); -static unsigned HOST_WIDE_INT alloc_object_size (const gcall *, int); + const_tree, int, tree *, tree *t = NULL); +static tree alloc_object_size (const gcall *, int); static tree pass_through_call (const gcall *); static void collect_object_sizes_for (struct object_size_info *, tree); static void expr_object_size (struct object_size_info *, tree, tree); -static bool merge_object_sizes (struct object_size_info *, tree, tree, - unsigned HOST_WIDE_INT); +static bool merge_object_sizes (struct object_size_info *, tree, tree); static bool plus_stmt_object_size (struct object_size_info *, tree, gimple *); static bool cond_expr_object_size (struct object_size_info *, tree, gimple *); static void init_offset_limit (void); @@ -68,13 +75,13 @@ static void check_for_plus_in_loops (struct object_size_info *, tree); static void check_for_plus_in_loops_1 (struct object_size_info *, tree, unsigned int); -/* object_sizes[0] is upper bound for number of bytes till the end of - the object. - object_sizes[1] is upper bound for number of bytes till the end of - the subobject (innermost array or field with address taken). - object_sizes[2] is lower bound for number of bytes till the end of - the object and object_sizes[3] lower bound for subobject. */ -static vec<unsigned HOST_WIDE_INT> object_sizes[OST_END]; +/* object_sizes[0] is upper bound for the object size and number of bytes till + the end of the object. + object_sizes[1] is upper bound for the object size and number of bytes till + the end of the subobject (innermost array or field with address taken). + object_sizes[2] is lower bound for the object size and number of bytes till + the end of the object and object_sizes[3] lower bound for subobject. */ +static vec<object_size> object_sizes[OST_END]; /* Bitmaps what object sizes have been computed already. */ static bitmap computed[OST_END]; @@ -82,10 +89,46 @@ static bitmap computed[OST_END]; /* Maximum value of offset we consider to be addition. */ static unsigned HOST_WIDE_INT offset_limit; +/* Initial value of object sizes; zero for maximum and SIZE_MAX for minimum + object size. */ + +static inline unsigned HOST_WIDE_INT +initval (int object_size_type) +{ + return (object_size_type & OST_MINIMUM) ? HOST_WIDE_INT_M1U : 0; +} + +/* Unknown object size value; it's the opposite of initval. */ + static inline unsigned HOST_WIDE_INT unknown (int object_size_type) { - return ((unsigned HOST_WIDE_INT) -((object_size_type >> 1) ^ 1)); + return ~initval (object_size_type); +} + +/* Return true if VAL is represents an unknown size for OBJECT_SIZE_TYPE. */ + +static inline bool +size_unknown_p (tree val, int object_size_type) +{ + return (tree_fits_uhwi_p (val) + && tree_to_uhwi (val) == unknown (object_size_type)); +} + +/* Return a tree with initial value for OBJECT_SIZE_TYPE. */ + +static inline tree +size_initval (int object_size_type) +{ + return size_int (initval (object_size_type)); +} + +/* Return a tree with unknown value for OBJECT_SIZE_TYPE. */ + +static inline tree +size_unknown (int object_size_type) +{ + return size_int (unknown (object_size_type)); } /* Grow object_sizes[OBJECT_SIZE_TYPE] to num_ssa_names. */ @@ -110,47 +153,57 @@ object_sizes_release (int object_size_type) static inline bool object_sizes_unknown_p (int object_size_type, unsigned varno) { - return (object_sizes[object_size_type][varno] - == unknown (object_size_type)); + return size_unknown_p (object_sizes[object_size_type][varno].size, + object_size_type); } -/* Return size for VARNO corresponding to OSI. */ +/* Return size for VARNO corresponding to OSI. If WHOLE is true, return the + whole object size. */ -static inline unsigned HOST_WIDE_INT -object_sizes_get (struct object_size_info *osi, unsigned varno) +static inline tree +object_sizes_get (struct object_size_info *osi, unsigned varno, + bool whole = false) { - return object_sizes[osi->object_size_type][varno]; + if (whole) + return object_sizes[osi->object_size_type][varno].wholesize; + else + return object_sizes[osi->object_size_type][varno].size; } /* Set size for VARNO corresponding to OSI to VAL. */ -static inline bool -object_sizes_set_force (struct object_size_info *osi, unsigned varno, - unsigned HOST_WIDE_INT val) +static inline void +object_sizes_initialize (struct object_size_info *osi, unsigned varno, + tree val, tree wholeval) { - object_sizes[osi->object_size_type][varno] = val; - return true; + int object_size_type = osi->object_size_type; + + object_sizes[object_size_type][varno].size = val; + object_sizes[object_size_type][varno].wholesize = wholeval; } /* Set size for VARNO corresponding to OSI to VAL if it is the new minimum or maximum. */ static inline bool -object_sizes_set (struct object_size_info *osi, unsigned varno, - unsigned HOST_WIDE_INT val) +object_sizes_set (struct object_size_info *osi, unsigned varno, tree val, + tree wholeval) { int object_size_type = osi->object_size_type; - if ((object_size_type & OST_MINIMUM) == 0) - { - if (object_sizes[object_size_type][varno] < val) - return object_sizes_set_force (osi, varno, val); - } - else - { - if (object_sizes[object_size_type][varno] > val) - return object_sizes_set_force (osi, varno, val); - } - return false; + object_size osize = object_sizes[object_size_type][varno]; + + tree oldval = osize.size; + tree old_wholeval = osize.wholesize; + + enum tree_code code = object_size_type & OST_MINIMUM ? MIN_EXPR : MAX_EXPR; + + val = size_binop (code, val, oldval); + wholeval = size_binop (code, wholeval, old_wholeval); + + object_sizes[object_size_type][varno].size = val; + object_sizes[object_size_type][varno].wholesize = wholeval; + return (tree_int_cst_compare (oldval, val) != 0 + || tree_int_cst_compare (old_wholeval, wholeval) != 0); } /* Initialize OFFSET_LIMIT variable. */ @@ -164,6 +217,48 @@ init_offset_limit (void) offset_limit /= 2; } +/* Bytes at end of the object with SZ from offset OFFSET. If WHOLESIZE is not + NULL_TREE, use it to get the net offset of the pointer, which should always + be positive and hence, be within OFFSET_LIMIT for valid offsets. */ + +static tree +size_for_offset (tree sz, tree offset, tree wholesize = NULL_TREE) +{ + gcc_checking_assert (TREE_CODE (offset) == INTEGER_CST); + gcc_checking_assert (TREE_CODE (sz) == INTEGER_CST); + gcc_checking_assert (types_compatible_p (TREE_TYPE (sz), sizetype)); + + /* For negative offsets, if we have a distinct WHOLESIZE, use it to get a net + offset from the whole object. */ + if (wholesize && tree_int_cst_compare (sz, wholesize)) + { + gcc_checking_assert (TREE_CODE (wholesize) == INTEGER_CST); + gcc_checking_assert (types_compatible_p (TREE_TYPE (wholesize), + sizetype)); + + /* Restructure SZ - OFFSET as + WHOLESIZE - (WHOLESIZE + OFFSET - SZ) so that the offset part, i.e. + WHOLESIZE + OFFSET - SZ is only allowed to be positive. */ + tree tmp = size_binop (MAX_EXPR, wholesize, sz); + offset = fold_build2 (PLUS_EXPR, sizetype, tmp, offset); + offset = fold_build2 (MINUS_EXPR, sizetype, offset, sz); + sz = tmp; + } + + /* Safe to convert now, since a valid net offset should be non-negative. */ + if (!types_compatible_p (TREE_TYPE (offset), sizetype)) + fold_convert (sizetype, offset); + + if (integer_zerop (offset)) + return sz; + + /* Negative or too large offset even after adjustment, cannot be within + bounds of an object. */ + if (compare_tree_int (offset, offset_limit) > 0) + return size_zero_node; + + return size_binop (MINUS_EXPR, size_binop (MAX_EXPR, sz, offset), offset); +} /* Compute offset of EXPR within VAR. Return error_mark_node if unknown. */ @@ -274,19 +369,22 @@ decl_init_size (tree decl, bool min) /* Compute __builtin_object_size for PTR, which is a ADDR_EXPR. OBJECT_SIZE_TYPE is the second argument from __builtin_object_size. - If unknown, return unknown (object_size_type). */ + If unknown, return size_unknown (object_size_type). */ static bool addr_object_size (struct object_size_info *osi, const_tree ptr, - int object_size_type, unsigned HOST_WIDE_INT *psize) + int object_size_type, tree *psize, tree *pwholesize) { - tree pt_var, pt_var_size = NULL_TREE, var_size, bytes; + tree pt_var, pt_var_size = NULL_TREE, pt_var_wholesize = NULL_TREE; + tree var_size, bytes, wholebytes; gcc_assert (TREE_CODE (ptr) == ADDR_EXPR); /* Set to unknown and overwrite just before returning if the size could be determined. */ - *psize = unknown (object_size_type); + *psize = size_unknown (object_size_type); + if (pwholesize) + *pwholesize = size_unknown (object_size_type); pt_var = TREE_OPERAND (ptr, 0); while (handled_component_p (pt_var)) @@ -297,13 +395,14 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, if (TREE_CODE (pt_var) == MEM_REF) { - unsigned HOST_WIDE_INT sz; + tree sz, wholesize; if (!osi || (object_size_type & OST_SUBOBJECT) != 0 || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME) { compute_builtin_object_size (TREE_OPERAND (pt_var, 0), object_size_type & ~OST_SUBOBJECT, &sz); + wholesize = sz; } else { @@ -312,46 +411,47 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, collect_object_sizes_for (osi, var); if (bitmap_bit_p (computed[object_size_type], SSA_NAME_VERSION (var))) - sz = object_sizes_get (osi, SSA_NAME_VERSION (var)); + { + sz = object_sizes_get (osi, SSA_NAME_VERSION (var)); + wholesize = object_sizes_get (osi, SSA_NAME_VERSION (var), true); + } else - sz = unknown (object_size_type); + sz = wholesize = size_unknown (object_size_type); } - if (sz != unknown (object_size_type)) + if (!size_unknown_p (sz, object_size_type)) { - offset_int mem_offset; - if (mem_ref_offset (pt_var).is_constant (&mem_offset)) - { - offset_int dsz = wi::sub (sz, mem_offset); - if (wi::neg_p (dsz)) - sz = 0; - else if (wi::fits_uhwi_p (dsz)) - sz = dsz.to_uhwi (); - else - sz = unknown (object_size_type); - } + tree offset = TREE_OPERAND (pt_var, 1); + if (TREE_CODE (offset) != INTEGER_CST + || TREE_CODE (sz) != INTEGER_CST) + sz = wholesize = size_unknown (object_size_type); else - sz = unknown (object_size_type); + sz = size_for_offset (sz, offset, wholesize); } - if (sz != unknown (object_size_type) && sz < offset_limit) - pt_var_size = size_int (sz); + if (!size_unknown_p (sz, object_size_type) + && TREE_CODE (sz) == INTEGER_CST + && compare_tree_int (sz, offset_limit) < 0) + { + pt_var_size = sz; + pt_var_wholesize = wholesize; + } } else if (DECL_P (pt_var)) { - pt_var_size = decl_init_size (pt_var, object_size_type & OST_MINIMUM); + pt_var_size = pt_var_wholesize + = decl_init_size (pt_var, object_size_type & OST_MINIMUM); if (!pt_var_size) return false; } else if (TREE_CODE (pt_var) == STRING_CST) - pt_var_size = TYPE_SIZE_UNIT (TREE_TYPE (pt_var)); + pt_var_size = pt_var_wholesize = TYPE_SIZE_UNIT (TREE_TYPE (pt_var)); else return false; if (pt_var_size) { /* Validate the size determined above. */ - if (!tree_fits_uhwi_p (pt_var_size) - || tree_to_uhwi (pt_var_size) >= offset_limit) + if (compare_tree_int (pt_var_size, offset_limit) >= 0) return false; } @@ -496,28 +596,35 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, bytes = size_binop (MIN_EXPR, bytes, bytes2); } } + + wholebytes + = object_size_type & OST_SUBOBJECT ? var_size : pt_var_wholesize; } else if (!pt_var_size) return false; else - bytes = pt_var_size; - - if (tree_fits_uhwi_p (bytes)) { - *psize = tree_to_uhwi (bytes); - return true; + bytes = pt_var_size; + wholebytes = pt_var_wholesize; } - return false; + if (TREE_CODE (bytes) != INTEGER_CST + || TREE_CODE (wholebytes) != INTEGER_CST) + return false; + + *psize = bytes; + if (pwholesize) + *pwholesize = wholebytes; + return true; } /* Compute __builtin_object_size for CALL, which is a GIMPLE_CALL. Handles calls to functions declared with attribute alloc_size. OBJECT_SIZE_TYPE is the second argument from __builtin_object_size. - If unknown, return unknown (object_size_type). */ + If unknown, return size_unknown (object_size_type). */ -static unsigned HOST_WIDE_INT +static tree alloc_object_size (const gcall *call, int object_size_type) { gcc_assert (is_gimple_call (call)); @@ -529,7 +636,7 @@ alloc_object_size (const gcall *call, int object_size_type) calltype = gimple_call_fntype (call); if (!calltype) - return unknown (object_size_type); + return size_unknown (object_size_type); /* Set to positions of alloc_size arguments. */ int arg1 = -1, arg2 = -1; @@ -549,7 +656,7 @@ alloc_object_size (const gcall *call, int object_size_type) || (arg2 >= 0 && (arg2 >= (int)gimple_call_num_args (call) || TREE_CODE (gimple_call_arg (call, arg2)) != INTEGER_CST))) - return unknown (object_size_type); + return size_unknown (object_size_type); tree bytes = NULL_TREE; if (arg2 >= 0) @@ -559,10 +666,7 @@ alloc_object_size (const gcall *call, int object_size_type) else if (arg1 >= 0) bytes = fold_convert (sizetype, gimple_call_arg (call, arg1)); - if (bytes && tree_fits_uhwi_p (bytes)) - return tree_to_uhwi (bytes); - - return unknown (object_size_type); + return bytes; } @@ -598,13 +702,13 @@ pass_through_call (const gcall *call) bool compute_builtin_object_size (tree ptr, int object_size_type, - unsigned HOST_WIDE_INT *psize) + tree *psize) { gcc_assert (object_size_type >= 0 && object_size_type < OST_END); /* Set to unknown and overwrite just before returning if the size could be determined. */ - *psize = unknown (object_size_type); + *psize = size_unknown (object_size_type); if (! offset_limit) init_offset_limit (); @@ -638,8 +742,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, psize)) { /* Return zero when the offset is out of bounds. */ - unsigned HOST_WIDE_INT off = tree_to_shwi (offset); - *psize = off < *psize ? *psize - off : 0; + *psize = size_for_offset (*psize, offset); return true; } } @@ -747,12 +850,13 @@ compute_builtin_object_size (tree ptr, int object_size_type, print_generic_expr (dump_file, ssa_name (i), dump_flags); fprintf (dump_file, - ": %s %sobject size " - HOST_WIDE_INT_PRINT_UNSIGNED "\n", + ": %s %sobject size ", ((object_size_type & OST_MINIMUM) ? "minimum" : "maximum"), - (object_size_type & OST_SUBOBJECT) ? "sub" : "", - object_sizes_get (&osi, i)); + (object_size_type & OST_SUBOBJECT) ? "sub" : ""); + print_generic_expr (dump_file, object_sizes_get (&osi, i), + dump_flags); + fprintf (dump_file, "\n"); } } @@ -761,7 +865,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, } *psize = object_sizes_get (&osi, SSA_NAME_VERSION (ptr)); - return *psize != unknown (object_size_type); + return !size_unknown_p (*psize, object_size_type); } /* Compute object_sizes for PTR, defined to VALUE, which is not an SSA_NAME. */ @@ -771,7 +875,7 @@ expr_object_size (struct object_size_info *osi, tree ptr, tree value) { int object_size_type = osi->object_size_type; unsigned int varno = SSA_NAME_VERSION (ptr); - unsigned HOST_WIDE_INT bytes; + tree bytes, wholesize; gcc_assert (!object_sizes_unknown_p (object_size_type, varno)); gcc_assert (osi->pass == 0); @@ -784,11 +888,11 @@ expr_object_size (struct object_size_info *osi, tree ptr, tree value) || !POINTER_TYPE_P (TREE_TYPE (value))); if (TREE_CODE (value) == ADDR_EXPR) - addr_object_size (osi, value, object_size_type, &bytes); + addr_object_size (osi, value, object_size_type, &bytes, &wholesize); else - bytes = unknown (object_size_type); + bytes = wholesize = size_unknown (object_size_type); - object_sizes_set (osi, varno, bytes); + object_sizes_set (osi, varno, bytes, wholesize); } @@ -799,16 +903,14 @@ call_object_size (struct object_size_info *osi, tree ptr, gcall *call) { int object_size_type = osi->object_size_type; unsigned int varno = SSA_NAME_VERSION (ptr); - unsigned HOST_WIDE_INT bytes; gcc_assert (is_gimple_call (call)); gcc_assert (!object_sizes_unknown_p (object_size_type, varno)); gcc_assert (osi->pass == 0); + tree bytes = alloc_object_size (call, object_size_type); - bytes = alloc_object_size (call, object_size_type); - - object_sizes_set (osi, varno, bytes); + object_sizes_set (osi, varno, bytes, bytes); } @@ -822,8 +924,9 @@ unknown_object_size (struct object_size_info *osi, tree ptr) gcc_checking_assert (!object_sizes_unknown_p (object_size_type, varno)); gcc_checking_assert (osi->pass == 0); + tree bytes = size_unknown (object_size_type); - object_sizes_set (osi, varno, unknown (object_size_type)); + object_sizes_set (osi, varno, bytes, bytes); } @@ -831,30 +934,22 @@ unknown_object_size (struct object_size_info *osi, tree ptr) the object size might need reexamination later. */ static bool -merge_object_sizes (struct object_size_info *osi, tree dest, tree orig, - unsigned HOST_WIDE_INT offset) +merge_object_sizes (struct object_size_info *osi, tree dest, tree orig) { int object_size_type = osi->object_size_type; unsigned int varno = SSA_NAME_VERSION (dest); - unsigned HOST_WIDE_INT orig_bytes; + tree orig_bytes, wholesize; if (object_sizes_unknown_p (object_size_type, varno)) return false; - if (offset >= offset_limit) - { - object_sizes_set (osi, varno, unknown (object_size_type)); - return false; - } if (osi->pass == 0) collect_object_sizes_for (osi, orig); orig_bytes = object_sizes_get (osi, SSA_NAME_VERSION (orig)); - if (orig_bytes != unknown (object_size_type)) - orig_bytes = (offset > orig_bytes) - ? HOST_WIDE_INT_0U : orig_bytes - offset; + wholesize = object_sizes_get (osi, SSA_NAME_VERSION (orig), true); - if (object_sizes_set (osi, varno, orig_bytes)) + if (object_sizes_set (osi, varno, orig_bytes, wholesize)) osi->changed = true; return bitmap_bit_p (osi->reexamine, SSA_NAME_VERSION (orig)); @@ -870,8 +965,9 @@ plus_stmt_object_size (struct object_size_info *osi, tree var, gimple *stmt) { int object_size_type = osi->object_size_type; unsigned int varno = SSA_NAME_VERSION (var); - unsigned HOST_WIDE_INT bytes; + tree bytes, wholesize; tree op0, op1; + bool reexamine = false; if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR) { @@ -896,31 +992,38 @@ plus_stmt_object_size (struct object_size_info *osi, tree var, gimple *stmt) && (TREE_CODE (op0) == SSA_NAME || TREE_CODE (op0) == ADDR_EXPR)) { - if (! tree_fits_uhwi_p (op1)) - bytes = unknown (object_size_type); - else if (TREE_CODE (op0) == SSA_NAME) - return merge_object_sizes (osi, var, op0, tree_to_uhwi (op1)); + if (TREE_CODE (op0) == SSA_NAME) + { + if (osi->pass == 0) + collect_object_sizes_for (osi, op0); + + bytes = object_sizes_get (osi, SSA_NAME_VERSION (op0)); + wholesize = object_sizes_get (osi, SSA_NAME_VERSION (op0), true); + reexamine = bitmap_bit_p (osi->reexamine, SSA_NAME_VERSION (op0)); + } else { - unsigned HOST_WIDE_INT off = tree_to_uhwi (op1); - - /* op0 will be ADDR_EXPR here. */ - addr_object_size (osi, op0, object_size_type, &bytes); - if (bytes == unknown (object_size_type)) - ; - else if (off > offset_limit) - bytes = unknown (object_size_type); - else if (off > bytes) - bytes = 0; - else - bytes -= off; + /* op0 will be ADDR_EXPR here. We should never come here during + reexamination. */ + gcc_checking_assert (osi->pass == 0); + addr_object_size (osi, op0, object_size_type, &bytes, &wholesize); } + + /* In the first pass, do not compute size for offset if either the + maximum size is unknown or the minimum size is not initialized yet; + the latter indicates a dependency loop and will be resolved in + subsequent passes. We attempt to compute offset for 0 minimum size + too because a negative offset could be within bounds of WHOLESIZE, + giving a non-zero result for VAR. */ + if (osi->pass != 0 || !size_unknown_p (bytes, 0)) + bytes = size_for_offset (bytes, op1, wholesize); } else - bytes = unknown (object_size_type); </cut>

3 years, 6 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain