linaro-toolchain - lists.linaro.org

[ACTIVITY] week ending Apr. 3 2022

by Alex Bennée

Project Stratos
===============

  - more follow-up on Re: Understanding osdep_xenforeignmemory_map mmap behaviour
    Message-Id: <alpine.DEB.2.22.394.2203231838130.2910984@ubuntu-linux-20-04-desktop>
  - various Stratos sync-ups

vhost-device maintainer effort ([UM-196])

  - bit of maintainer review work

Linux RPMB Sub-system and virtio-driver ([STR-40])

  - got [kernel] and [vhost-user] daemon passing all tests with
    multi-block reads/writes
  - will clean-up series next week for posting to the lists

[STR-40] <https://linaro.atlassian.net/browse/STR-40>

[kernel] <https://git.linaro.org/people/alex.bennee/linux.git/tag/?h=testing/vrpmb-re…>

[vhost-user] <https://github.com/stsquad/qemu/tree/virtio/vhost-user-rpmb-v2>

QEMU Upstream Work ([UM-2])
===========================

  - posted [PATCH v1 0/2] some tests and plugin tweaks for SVE
    Message-Id: <20220328152614.2452259-1-alex.bennee(a)linaro.org>
  - posted [PATCH v3] tests/avocado: update aarch64_virt test to exercise -cpu max
    Message-Id: <20220328161357.2464572-1-alex.bennee(a)linaro.org>
  - posted [RFC PATCH] docs/devel: add some notes on the binfmt-image-debian targets
    Message-Id: <20220329095041.2758355-1-alex.bennee(a)linaro.org>

[UM-2] <https://linaro.atlassian.net/browse/UM-2>

Other
=====

  - Presented [LTD 2022 QEMU talk]
  - attended some others

[LTD 2022 QEMU talk] <https://resources.linaro.org/en/resource/tL9M2yyti73StqK1d9ap8f>

Completed Reviews [1/1]
=======================

[PATCH 00/15] tests/docker and tests/tcg cleanup and diet
  Message-Id: <87czi6xbzo.fsf(a)linaro.org>

Absences
========

Current Review Queue
====================

TODO [PATCH 00/17] tests/docker and tests/tcg cleanup and diet Message-Id: <20220401141326.1244422-1-pbonzini(a)redhat.com>
========================================================================================================================

TODO [RFC PATCH 0/6] softfloat 128-bit integer support Message-Id: <20220328201442.175206-1-matheus.ferst(a)eldorado.org.br>
=========================================================================================================================

TODO [PATCH for-7.1 v2 00/39] Logging cleanup and per-thread logfiles Message-Id: <20220326132534.543738-1-richard.henderson(a)linaro.org>
=======================================================================================================================================

--
Alex Bennée


[ACTIVITY] report week ending 1 Apr

by Peter Maydell

Progress (short week, 2 days):
 * UM-2 [QEMU upstream maintainership]
   + More of the usual freeze-related work
   + Tracked down and fixed assertion when running with clang sanitizers
   + We finally got a Coverity Scan run through for the first time in
     a month or two, and it was full of new issues. Spent some time
     going through them and marking false positives or reporting the
     problems back to original code authors to be fixed
   + Looking at the tangle of interrupt lines in our exynos4210 SoC
     model -- this needs a refactoring and cleanup so we can get rid
     of its uses of an obsolete function

-- PMM


Re: [TCWG CI] 456.hmmer grew in size by 9% after llvm: Extend the `uwtable` attribute with unwind table kind

by Maxim Kuvyrkov

Hi Momchil,

Your patch seems to significantly increase the code size of several benchmarks, by up to 9%. Would you please investigate whether this can be avoided?

Please let us know if you need assistance with reproducing the regressions.

Thank you,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 25 Mar 2022, at 14:45, ci_notify(a)linaro.org wrote:
>
> After llvm commit 6398903ac8c141820a84f3063b7956abe1742500
> Author: Momchil Velikov <momchil.velikov(a)arm.com>
>
>     Extend the `uwtable` attribute with unwind table kind
>
> the following benchmarks grew in size by more than 1%:
> - 456.hmmer grew in size by 9% from 104960 to 113912 bytes
> - 403.gcc grew in size by 9% from 2191632 to 2394404 bytes
> - 400.perlbench grew in size by 9% from 803690 to 879478 bytes
> - 458.sjeng grew in size by 5% from 102355 to 107719 bytes
> - 401.bzip2 grew in size by 3% from 43428 to 44772 bytes
>
> Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
>
> For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
> - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
>
> Configuration:
> - Benchmark: SPEC CPU2006
> - Toolchain: Clang + Glibc + LLVM Linker
> - Version: all components were built from their tip of trunk
> - Target: aarch64-linux-gnu
> - Compiler flags: -Oz -flto
> - Hardware: APM Mustang 8x X-Gene1
>
> This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
>
> THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO
>
> First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
>
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-6398903ac8c141820a84f3063b7956abe1742500
> cd investigate-llvm-6398903ac8c141820a84f3063b7956abe1742500
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>
> cd llvm
>
> # Reproduce first_bad build
> git checkout --detach 6398903ac8c141820a84f3063b7956abe1742500
> ../artifacts/test.sh
>
> # Reproduce last_good build
> git checkout --detach 48f188433335846bba4cf3e5e9fa2150d4c0253b
> ../artifacts/test.sh
>
> cd ..
> </cut>
>
> Full commit (up to 1000 lines):
> <cut>
> commit 6398903ac8c141820a84f3063b7956abe1742500
> Author: Momchil Velikov <momchil.velikov(a)arm.com>
> Date:   Mon Feb 14 13:41:34 2022 +0000
>
>     Extend the `uwtable` attribute with unwind table kind
>
>     We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
>     the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
>     tables (1) or asynchronous unwind tables (2)`. However, this is
>     encoded in LLVM IR by the presence or the absence of the `uwtable`
>     attribute, i.e. we lose the information whether to generate want just
>     some unwind tables or asynchronous unwind tables.
>
>     Asynchronous unwind tables take more space in the runtime image, I'd
>     estimate something like 80-90% more, as the difference is adding
>     roughly the same number of CFI directives as for prologues, only a bit
>     simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
>     more, if you consider tail duplication of epilogue blocks.
>     Asynchronous unwind tables could also restrict code generation to
>     having only a finite number of frame pointer adjustments (an example
>     of *not* having a finite number of `SP` adjustments is on AArch64 when
>     untagging the stack (MTE) in some cases the compiler can modify `SP`
>     in a loop).
>     Having the CFI precise up to an instruction generally also means one
>     cannot bundle together CFI instructions once the prologue is done,
>     they need to be interspersed with ordinary instructions, which means
>     extra `DW_CFA_advance_loc` commands, further increasing the unwind
>     tables size.
>
>     That is to say, async unwind tables impose a non-negligible overhead,
>     yet for the most common use cases (like C++ exceptions), they are not
>     even needed.
> > This patch extends the `uwtable` attribute with an optional > value: > - `uwtable` (default to `async`) > - `uwtable(sync)`, synchronous unwind tables > - `uwtable(async)`, asynchronous (instruction precise) unwind tables > > Reviewed By: MaskRay > > Differential Revision: https://reviews.llvm.org/D114543 > --- > clang/lib/CodeGen/CGExpr.cpp | 2 +- > clang/lib/CodeGen/CodeGenModule.cpp | 4 +- > clang/test/CodeGen/asan-globals.cpp | 2 +- > clang/test/CodeGen/uwtable-attr.c | 30 +++++ > llvm/bindings/go/llvm/ir_test.go | 1 - > llvm/docs/LangRef.rst | 12 +- > llvm/include/llvm/AsmParser/LLParser.h | 1 + > llvm/include/llvm/AsmParser/LLToken.h | 2 + > llvm/include/llvm/IR/Attributes.h | 13 +++ > llvm/include/llvm/IR/Attributes.td | 2 +- > llvm/include/llvm/IR/Function.h | 12 +- > llvm/include/llvm/IR/Module.h | 4 +- > llvm/include/llvm/Support/CodeGen.h | 8 +- > llvm/lib/AsmParser/LLLexer.cpp | 2 + > llvm/lib/AsmParser/LLParser.cpp | 23 ++++ > llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 4 + > llvm/lib/CodeGen/MachineOutliner.cpp | 9 ++ > llvm/lib/IR/AttributeImpl.h | 1 + > llvm/lib/IR/Attributes.cpp | 50 ++++++++ > llvm/lib/IR/Function.cpp | 5 +- > llvm/lib/IR/Module.cpp | 11 +- > llvm/test/Assembler/uwtable-1.ll | 7 ++ > llvm/test/Assembler/uwtable-2.ll | 4 + > llvm/test/Bitcode/attributes.ll | 11 ++ > llvm/test/CodeGen/Thumb2/pacbti-m-outliner-1.ll | 4 +- > llvm/test/CodeGen/Thumb2/pacbti-m-outliner-3.ll | 5 +- > .../AddressSanitizer/module-flags.ll | 2 +- > .../Attributor/ArgumentPromotion/X86/attributes.ll | 34 +++--- > .../X86/min-legal-vector-width.ll | 128 ++++++++++----------- > llvm/test/Transforms/Attributor/align.ll | 48 ++++---- > llvm/test/Transforms/Attributor/allow_list.ll | 4 +- > .../Transforms/Attributor/cb_liveness_disabled.ll | 4 +- > .../Transforms/Attributor/cb_liveness_enabled.ll | 4 +- > .../test/Transforms/Attributor/internal-noalias.ll | 28 ++--- > llvm/test/Transforms/Attributor/liveness.ll | 12 +- > 
llvm/test/Transforms/Attributor/nocapture-2.ll | 48 ++++---- > llvm/test/Transforms/Attributor/nofree.ll | 38 +++--- > llvm/test/Transforms/Attributor/noreturn.ll | 12 +- > llvm/test/Transforms/Attributor/nosync.ll | 28 ++--- > llvm/test/Transforms/Attributor/returned.ll | 48 ++++---- > .../Attributor/value-simplify-pointer-info.ll | 14 +-- > llvm/test/Transforms/Attributor/willreturn.ll | 56 ++++----- > llvm/test/Transforms/FunctionAttrs/atomic.ll | 4 +- > .../Transforms/FunctionAttrs/nofree-attributor.ll | 2 +- > llvm/test/Transforms/FunctionAttrs/nofree.ll | 2 +- > llvm/test/Transforms/FunctionAttrs/nosync.ll | 16 +-- > llvm/test/Transforms/GCOVProfiling/module-flags.ll | 2 +- > .../Inputs/check_attrs.ll.funcattrs.expected | 2 +- > llvm/unittests/IR/VerifierTest.cpp | 3 +- > 49 files changed, 473 insertions(+), 295 deletions(-) > > diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp > index fa45554bb54f..e0e1dd5df586 100644 > --- a/clang/lib/CodeGen/CGExpr.cpp > +++ b/clang/lib/CodeGen/CGExpr.cpp > @@ -3188,7 +3188,7 @@ static void emitCheckHandlerCall(CodeGenFunction &CGF, > B.addAttribute(llvm::Attribute::NoReturn) > .addAttribute(llvm::Attribute::NoUnwind); > } > - B.addAttribute(llvm::Attribute::UWTable); > + B.addUWTableAttr(llvm::UWTableKind::Default); > > llvm::FunctionCallee Fn = CGF.CGM.CreateRuntimeFunction( > FnType, FnName, > diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp > index 772059a436d1..c99fd899ac93 100644 > --- a/clang/lib/CodeGen/CodeGenModule.cpp > +++ b/clang/lib/CodeGen/CodeGenModule.cpp > @@ -828,7 +828,7 @@ void CodeGenModule::Release() { > if (CodeGenOpts.NoPLT) > getModule().setRtLibUseGOT(); > if (CodeGenOpts.UnwindTables) > - getModule().setUwtable(); > + getModule().setUwtable(llvm::UWTableKind(CodeGenOpts.UnwindTables)); > > switch (CodeGenOpts.getFramePointer()) { > case CodeGenOptions::FramePointerKind::None: > @@ -1839,7 +1839,7 @@ void 
CodeGenModule::SetLLVMFunctionAttributesForDefinition(const Decl *D, > llvm::AttrBuilder B(F->getContext()); > > if (CodeGenOpts.UnwindTables) > - B.addAttribute(llvm::Attribute::UWTable); > + B.addUWTableAttr(llvm::UWTableKind(CodeGenOpts.UnwindTables)); > > if (CodeGenOpts.StackClashProtector) > B.addAttribute("probe-stack", "inline-asm"); > diff --git a/clang/test/CodeGen/asan-globals.cpp b/clang/test/CodeGen/asan-globals.cpp > index a77060b124e9..2cea167d0ea5 100644 > --- a/clang/test/CodeGen/asan-globals.cpp > +++ b/clang/test/CodeGen/asan-globals.cpp > @@ -48,7 +48,7 @@ void func() { > // RUN: %clang_cc1 -emit-llvm -fsanitize=address -funwind-tables=2 -o - %s | FileCheck %s --check-prefixes=UWTABLE > // UWTABLE: define internal void @asan.module_dtor() #[[#ATTR:]] { > // UWTABLE: attributes #[[#ATTR]] = { nounwind uwtable } > -// UWTABLE: ![[#]] = !{i32 7, !"uwtable", i32 1} > +// UWTABLE: ![[#]] = !{i32 7, !"uwtable", i32 2} > > // CHECK: !llvm.asan.globals = !{![[EXTRA_GLOBAL:[0-9]+]], ![[GLOBAL:[0-9]+]], ![[DYN_INIT_GLOBAL:[0-9]+]], ![[ATTR_GLOBAL:[0-9]+]], ![[IGNORELISTED_GLOBAL:[0-9]+]], ![[SECTIONED_GLOBAL:[0-9]+]], ![[SPECIAL_GLOBAL:[0-9]+]], ![[STATIC_VAR:[0-9]+]], ![[LITERAL:[0-9]+]]} > // CHECK: ![[EXTRA_GLOBAL]] = !{{{.*}} ![[EXTRA_GLOBAL_LOC:[0-9]+]], !"extra_global", i1 false, i1 false} > diff --git a/clang/test/CodeGen/uwtable-attr.c b/clang/test/CodeGen/uwtable-attr.c > new file mode 100644 > index 000000000000..7436db979b6b > --- /dev/null > +++ b/clang/test/CodeGen/uwtable-attr.c > @@ -0,0 +1,30 @@ > +// Test that function and modules attributes react on the command-line options, > +// it does not state the current behaviour makes sense in all cases (it does not). 
> + > +// RUN: %clang -S -emit-llvm -o - %s | FileCheck %s -check-prefixes=CHECK,DEFAULT > +// RUN: %clang -S -emit-llvm -o - %s -funwind-tables -fno-asynchronous-unwind-tables | FileCheck %s -check-prefixes=CHECK,TABLES > +// RUN: %clang -S -emit-llvm -o - %s -fno-unwind-tables -fno-asynchronous-unwind-tables | FileCheck %s -check-prefixes=CHECK,NO_TABLES > + > +// RUN: %clang -S -emit-llvm -o - -x c++ %s | FileCheck %s -check-prefixes=CHECK,DEFAULT > +// RUN: %clang -S -emit-llvm -o - -x c++ %s -funwind-tables -fno-asynchronous-unwind-tables | FileCheck %s -check-prefixes=CHECK,TABLES > +// RUN: %clang -S -emit-llvm -o - -x c++ %s -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables | FileCheck %s -check-prefixes=CHECK,NO_TABLES > + > +#ifdef __cplusplus > +extern "C" void g(void); > +struct S { ~S(); }; > +extern "C" int f() { S s; g(); return 0;}; > +#else > +void g(void); > +int f() { g(); return 0; }; > +#endif > + > +// CHECK: define {{.*}} @f() #[[#F:]] > +// CHECK: declare {{.*}} @g() #[[#]] > + > +// DEFAULT: attributes #[[#F]] = { {{.*}} uwtable{{ }}{{.*}} } > +// DEFAULT: ![[#]] = !{i32 7, !"uwtable", i32 2} > + > +// TABLES: attributes #[[#F]] = { {{.*}} uwtable(sync){{.*}} } > +// TABLES: ![[#]] = !{i32 7, !"uwtable", i32 1} > + > +// NO_TABLES-NOT: uwtable > diff --git a/llvm/bindings/go/llvm/ir_test.go b/llvm/bindings/go/llvm/ir_test.go > index 71c47d94a0ec..61b482f2ef9a 100644 > --- a/llvm/bindings/go/llvm/ir_test.go > +++ b/llvm/bindings/go/llvm/ir_test.go > @@ -83,7 +83,6 @@ func TestAttributes(t *testing.T) { > "sspstrong", > "sanitize_thread", > "sanitize_memory", > - "uwtable", > "zeroext", > "cold", > "nocf_check", > diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst > index 6b44b7e7355c..1a212c661597 100644 > --- a/llvm/docs/LangRef.rst > +++ b/llvm/docs/LangRef.rst > @@ -2108,12 +2108,15 @@ example: > function with a tail call. The prototype of a thunk should not be used for > optimization purposes. 
The caller is expected to cast the thunk prototype to > match the thunk target prototype. > -``uwtable`` > +``uwtable[(sync|async)]`` > This attribute indicates that the ABI being targeted requires that > an unwind table entry be produced for this function even if we can > show that no exceptions passes by it. This is normally the case for > the ELF x86-64 abi, but it can be disabled for some compilation > - units. > + units. The optional parameter describes what kind of unwind tables > + to generate: ``sync`` for normal unwind tables, ``async`` for asynchronous > + (instruction precise) unwind tables. Without the parameter, the attribute > + ``uwtable`` is equivalent to ``uwtable(async)``. > ``nocf_check`` > This attribute indicates that no control-flow check will be performed on > the attributed entity. It disables -fcf-protection=<> for a specific > @@ -7215,8 +7218,9 @@ functions is small. > - "frame-pointer": **Max**. The value can be 0, 1, or 2. A synthesized function > will get the "frame-pointer" function attribute, with value being "none", > "non-leaf", or "all", respectively. > -- "uwtable": **Max**. The value can be 0 or 1. If the value is 1, a synthesized > - function will get the ``uwtable`` function attribute. > +- "uwtable": **Max**. The value can be 0, 1, or 2. If the value is 1, a synthesized > + function will get the ``uwtable(sync)`` function attribute, if the value is 2, > + a synthesized function will get the ``uwtable(async)`` function attribute. 
> > Objective-C Garbage Collection Module Flags Metadata > ---------------------------------------------------- > diff --git a/llvm/include/llvm/AsmParser/LLParser.h b/llvm/include/llvm/AsmParser/LLParser.h > index 62af3afbc142..b2f7b9ebb721 100644 > --- a/llvm/include/llvm/AsmParser/LLParser.h > +++ b/llvm/include/llvm/AsmParser/LLParser.h > @@ -263,6 +263,7 @@ namespace llvm { > bool parseOptionalAlignment(MaybeAlign &Alignment, > bool AllowParens = false); > bool parseOptionalDerefAttrBytes(lltok::Kind AttrKind, uint64_t &Bytes); > + bool parseOptionalUWTableKind(UWTableKind &Kind); > bool parseScopeAndOrdering(bool IsAtomic, SyncScope::ID &SSID, > AtomicOrdering &Ordering); > bool parseScope(SyncScope::ID &SSID); > diff --git a/llvm/include/llvm/AsmParser/LLToken.h b/llvm/include/llvm/AsmParser/LLToken.h > index 78ebb35e0ea4..faac67ebbab9 100644 > --- a/llvm/include/llvm/AsmParser/LLToken.h > +++ b/llvm/include/llvm/AsmParser/LLToken.h > @@ -252,6 +252,8 @@ enum Kind { > kw_immarg, > kw_byref, > kw_mustprogress, > + kw_sync, > + kw_async, > > kw_type, > kw_opaque, > diff --git a/llvm/include/llvm/IR/Attributes.h b/llvm/include/llvm/IR/Attributes.h > index 74b60f1e3d05..61819b1a07fa 100644 > --- a/llvm/include/llvm/IR/Attributes.h > +++ b/llvm/include/llvm/IR/Attributes.h > @@ -22,6 +22,7 @@ > #include "llvm/ADT/StringRef.h" > #include "llvm/Config/llvm-config.h" > #include "llvm/Support/Alignment.h" > +#include "llvm/Support/CodeGen.h" > #include "llvm/Support/PointerLikeTypeTraits.h" > #include <bitset> > #include <cassert> > @@ -130,6 +131,7 @@ public: > static Attribute getWithByRefType(LLVMContext &Context, Type *Ty); > static Attribute getWithPreallocatedType(LLVMContext &Context, Type *Ty); > static Attribute getWithInAllocaType(LLVMContext &Context, Type *Ty); > + static Attribute getWithUWTableKind(LLVMContext &Context, UWTableKind Kind); > > /// For a typed attribute, return the equivalent attribute with the type > /// changed to \p ReplacementTy. 
> @@ -223,6 +225,9 @@ public: > /// unknown. > Optional<unsigned> getVScaleRangeMax() const; > > + // Returns the unwind table kind. > + UWTableKind getUWTableKind() const; > + > /// The Attribute is converted to a string of equivalent mnemonic. This > /// is, presumably, for writing out the mnemonics for the assembly writer. > std::string getAsString(bool InAttrGrp = false) const; > @@ -353,6 +358,7 @@ public: > std::pair<unsigned, Optional<unsigned>> getAllocSizeArgs() const; > unsigned getVScaleRangeMin() const; > Optional<unsigned> getVScaleRangeMax() const; > + UWTableKind getUWTableKind() const; > std::string getAsString(bool InAttrGrp = false) const; > > /// Return true if this attribute set belongs to the LLVMContext. > @@ -841,6 +847,9 @@ public: > /// arg. > uint64_t getParamDereferenceableOrNullBytes(unsigned ArgNo) const; > > + /// Get the unwind table kind requested for the function. > + UWTableKind getUWTableKind() const; > + > /// Return the attributes at the index as a string. > std::string getAsString(unsigned Index, bool InAttrGrp = false) const; > > @@ -1190,6 +1199,10 @@ public: > /// Attribute.getIntValue(). > AttrBuilder &addVScaleRangeAttrFromRawRepr(uint64_t RawVScaleRangeRepr); > > + /// This turns the unwind table kind into the form used internally in > + /// Attribute. > + AttrBuilder &addUWTableAttr(UWTableKind Kind); > + > ArrayRef<Attribute> attrs() const { return Attrs; } > > bool operator==(const AttrBuilder &B) const; > diff --git a/llvm/include/llvm/IR/Attributes.td b/llvm/include/llvm/IR/Attributes.td > index a03e5441827c..d7a79f90e05e 100644 > --- a/llvm/include/llvm/IR/Attributes.td > +++ b/llvm/include/llvm/IR/Attributes.td > @@ -273,7 +273,7 @@ def SwiftSelf : EnumAttr<"swiftself", [ParamAttr]>; > def SwiftAsync : EnumAttr<"swiftasync", [ParamAttr]>; > > /// Function must be in a unwind table. 
> -def UWTable : EnumAttr<"uwtable", [FnAttr]>; > +def UWTable : IntAttr<"uwtable", [FnAttr]>; > > /// Minimum/Maximum vscale value for function. > def VScaleRange : IntAttr<"vscale_range", [FnAttr]>; > diff --git a/llvm/include/llvm/IR/Function.h b/llvm/include/llvm/IR/Function.h > index 90095cd1bc77..1b9843e08b28 100644 > --- a/llvm/include/llvm/IR/Function.h > +++ b/llvm/include/llvm/IR/Function.h > @@ -623,15 +623,19 @@ public: > bool willReturn() const { return hasFnAttribute(Attribute::WillReturn); } > void setWillReturn() { addFnAttr(Attribute::WillReturn); } > > + /// Get what kind of unwind table entry to generate for this function. > + UWTableKind getUWTableKind() const { > + return AttributeSets.getUWTableKind(); > + } > + > /// True if the ABI mandates (or the user requested) that this > /// function be in a unwind table. > bool hasUWTable() const { > - return hasFnAttribute(Attribute::UWTable); > + return getUWTableKind() != UWTableKind::None; > } > - void setHasUWTable() { > - addFnAttr(Attribute::UWTable); > + void setUWTableKind(UWTableKind K) { > + addFnAttr(Attribute::getWithUWTableKind(getContext(), K)); > } > - > /// True if this function needs an unwind table. > bool needsUnwindTableEntry() const { > return hasUWTable() || !doesNotThrow() || hasPersonalityFn(); > diff --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h > index 9385ecab83d2..0414adfaee4d 100644 > --- a/llvm/include/llvm/IR/Module.h > +++ b/llvm/include/llvm/IR/Module.h > @@ -888,8 +888,8 @@ public: > void setRtLibUseGOT(); > > /// Get/set whether synthesized functions should get the uwtable attribute. > - bool getUwtable() const; > - void setUwtable(); > + UWTableKind getUwtable() const; > + void setUwtable(UWTableKind Kind); > > /// Get/set whether synthesized functions should get the "frame-pointer" > /// attribute. 
> diff --git a/llvm/include/llvm/Support/CodeGen.h b/llvm/include/llvm/Support/CodeGen.h > index ef5cc5d19fc5..71d0ddbfe05e 100644 > --- a/llvm/include/llvm/Support/CodeGen.h > +++ b/llvm/include/llvm/Support/CodeGen.h > @@ -97,6 +97,12 @@ namespace llvm { > }; > } // namespace ZeroCallUsedRegs > > -} // end llvm namespace > + enum class UWTableKind { > + None = 0, ///< No unwind table requested > + Sync = 1, ///< "Synchronous" unwind tables > + Async = 2, ///< "Asynchronous" unwind tables (instr precise) > + Default = 2, > + }; > + } // namespace llvm > > #endif > diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp > index e3bf41c9721b..a508660edfa5 100644 > --- a/llvm/lib/AsmParser/LLLexer.cpp > +++ b/llvm/lib/AsmParser/LLLexer.cpp > @@ -708,6 +708,8 @@ lltok::Kind LLLexer::LexIdentifier() { > KEYWORD(immarg); > KEYWORD(byref); > KEYWORD(mustprogress); > + KEYWORD(sync); > + KEYWORD(async); > > KEYWORD(type); > KEYWORD(opaque); > diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp > index 4281193caf85..769601c7e633 100644 > --- a/llvm/lib/AsmParser/LLParser.cpp > +++ b/llvm/lib/AsmParser/LLParser.cpp > @@ -1333,6 +1333,13 @@ bool LLParser::parseEnumAttribute(Attribute::AttrKind Attr, AttrBuilder &B, > B.addDereferenceableOrNullAttr(Bytes); > return false; > } > + case Attribute::UWTable: { > + UWTableKind Kind; > + if (parseOptionalUWTableKind(Kind)) > + return true; > + B.addUWTableAttr(Kind); > + return false; > + } > default: > B.addAttribute(Attr); > Lex.Lex(); > @@ -1996,6 +2003,22 @@ bool LLParser::parseOptionalDerefAttrBytes(lltok::Kind AttrKind, > return false; > } > > +bool LLParser::parseOptionalUWTableKind(UWTableKind &Kind) { > + Lex.Lex(); > + Kind = UWTableKind::Default; > + if (!EatIfPresent(lltok::lparen)) > + return false; > + LocTy KindLoc = Lex.getLoc(); > + if (Lex.getKind() == lltok::kw_sync) > + Kind = UWTableKind::Sync; > + else if (Lex.getKind() == lltok::kw_async) > + Kind = 
UWTableKind::Async; > + else > + return error(KindLoc, "expected unwind table kind"); > + Lex.Lex(); > + return parseToken(lltok::rparen, "expected ')'"); > +} > + > /// parseOptionalCommaAlign > /// ::= > /// ::= ',' align 4 > diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp > index 3d4b1f64b11c..5f6d980708a5 100644 > --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp > +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp > @@ -1628,6 +1628,8 @@ Error BitcodeReader::parseAttributeGroupBlock() { > B.addStructRetAttr(nullptr); > else if (Kind == Attribute::InAlloca) > B.addInAllocaAttr(nullptr); > + else if (Kind == Attribute::UWTable) > + B.addUWTableAttr(UWTableKind::Default); > else if (Attribute::isEnumAttrKind(Kind)) > B.addAttribute(Kind); > else > @@ -1650,6 +1652,8 @@ Error BitcodeReader::parseAttributeGroupBlock() { > B.addAllocSizeAttrFromRawRepr(Record[++i]); > else if (Kind == Attribute::VScaleRange) > B.addVScaleRangeAttrFromRawRepr(Record[++i]); > + else if (Kind == Attribute::UWTable) > + B.addUWTableAttr(UWTableKind(Record[++i])); > } else if (Record[i] == 3 || Record[i] == 4) { // String attribute > bool HasValue = (Record[i++] == 4); > SmallString<64> KindStr; > diff --git a/llvm/lib/CodeGen/MachineOutliner.cpp b/llvm/lib/CodeGen/MachineOutliner.cpp > index 7783b5e0d3cc..d7d098278d2a 100644 > --- a/llvm/lib/CodeGen/MachineOutliner.cpp > +++ b/llvm/lib/CodeGen/MachineOutliner.cpp > @@ -623,6 +623,15 @@ MachineFunction *MachineOutliner::createOutlinedFunction( > > TII.mergeOutliningCandidateAttributes(*F, OF.Candidates); > > + // Set uwtable, so we generate eh_frame. 
> + UWTableKind UW = std::accumulate( > + OF.Candidates.cbegin(), OF.Candidates.cend(), UWTableKind::None, > + [](UWTableKind K, const outliner::Candidate &C) { > + return std::max(K, C.getMF()->getFunction().getUWTableKind()); > + }); > + if (UW != UWTableKind::None) > + F->setUWTableKind(UW); > + > BasicBlock *EntryBB = BasicBlock::Create(C, "entry", F); > IRBuilder<> Builder(EntryBB); > Builder.CreateRetVoid(); > diff --git a/llvm/lib/IR/AttributeImpl.h b/llvm/lib/IR/AttributeImpl.h > index 1153fb827b56..adf8a4d34a0a 100644 > --- a/llvm/lib/IR/AttributeImpl.h > +++ b/llvm/lib/IR/AttributeImpl.h > @@ -255,6 +255,7 @@ public: > std::pair<unsigned, Optional<unsigned>> getAllocSizeArgs() const; > unsigned getVScaleRangeMin() const; > Optional<unsigned> getVScaleRangeMax() const; > + UWTableKind getUWTableKind() const; > std::string getAsString(bool InAttrGrp) const; > Type *getAttributeType(Attribute::AttrKind Kind) const; > > diff --git a/llvm/lib/IR/Attributes.cpp b/llvm/lib/IR/Attributes.cpp > index 43fde64c3734..5751b99a2807 100644 > --- a/llvm/lib/IR/Attributes.cpp > +++ b/llvm/lib/IR/Attributes.cpp > @@ -205,6 +205,11 @@ Attribute Attribute::getWithInAllocaType(LLVMContext &Context, Type *Ty) { > return get(Context, InAlloca, Ty); > } > > +Attribute Attribute::getWithUWTableKind(LLVMContext &Context, > + UWTableKind Kind) { > + return get(Context, UWTable, uint64_t(Kind)); > +} > + > Attribute > Attribute::getWithAllocSizeArgs(LLVMContext &Context, unsigned ElemSizeArg, > const Optional<unsigned> &NumElemsArg) { > @@ -366,6 +371,12 @@ Optional<unsigned> Attribute::getVScaleRangeMax() const { > return unpackVScaleRangeArgs(pImpl->getValueAsInt()).second; > } > > +UWTableKind Attribute::getUWTableKind() const { > + assert(hasAttribute(Attribute::UWTable) && > + "Trying to get unwind table kind from non-uwtable attribute"); > + return UWTableKind(pImpl->getValueAsInt()); > +} > + > std::string Attribute::getAsString(bool InAttrGrp) const { > if (!pImpl) return 
{}; > > @@ -426,6 +437,25 @@ std::string Attribute::getAsString(bool InAttrGrp) const { > .str(); > } > > + if (hasAttribute(Attribute::UWTable)) { > + UWTableKind Kind = getUWTableKind(); > + if (Kind != UWTableKind::None) { > + return Kind == UWTableKind::Default > + ? "uwtable" > + : ("uwtable(" + > + Twine(Kind == UWTableKind::Sync ? "sync" : "async") + ")") > + .str(); > + } > + > + if (Kind != UWTableKind::None) { > + if (Kind == UWTableKind::Default) > + return "uwtable"; > + return ("uwtable(" + Twine(Kind == UWTableKind::Sync ? "sync" : "async") + > + ")") > + .str(); > + } > + } > + > // Convert target-dependent attributes to strings of the form: > // > // "kind" > @@ -710,6 +740,10 @@ Optional<unsigned> AttributeSet::getVScaleRangeMax() const { > return SetNode ? SetNode->getVScaleRangeMax() : None; > } > > +UWTableKind AttributeSet::getUWTableKind() const { > + return SetNode ? SetNode->getUWTableKind() : UWTableKind::None; > +} > + > std::string AttributeSet::getAsString(bool InAttrGrp) const { > return SetNode ? 
SetNode->getAsString(InAttrGrp) : ""; > } > @@ -876,6 +910,12 @@ Optional<unsigned> AttributeSetNode::getVScaleRangeMax() const { > return None; > } > > +UWTableKind AttributeSetNode::getUWTableKind() const { > + if (auto A = findEnumAttribute(Attribute::UWTable)) > + return A->getUWTableKind(); > + return UWTableKind::None; > +} > + > std::string AttributeSetNode::getAsString(bool InAttrGrp) const { > std::string Str; > for (iterator I = begin(), E = end(); I != E; ++I) { > @@ -1428,6 +1468,10 @@ AttributeList::getParamDereferenceableOrNullBytes(unsigned Index) const { > return getParamAttrs(Index).getDereferenceableOrNullBytes(); > } > > +UWTableKind AttributeList::getUWTableKind() const { > + return getFnAttrs().getUWTableKind(); > +} > + > std::string AttributeList::getAsString(unsigned Index, bool InAttrGrp) const { > return getAttributes(Index).getAsString(InAttrGrp); > } > @@ -1649,6 +1693,12 @@ AttrBuilder &AttrBuilder::addVScaleRangeAttrFromRawRepr(uint64_t RawArgs) { > return addRawIntAttr(Attribute::VScaleRange, RawArgs); > } > > +AttrBuilder &AttrBuilder::addUWTableAttr(UWTableKind Kind) { > + if (Kind == UWTableKind::None) > + return *this; > + return addRawIntAttr(Attribute::UWTable, uint64_t(Kind)); > +} > + > Type *AttrBuilder::getTypeAttr(Attribute::AttrKind Kind) const { > assert(Attribute::isTypeAttrKind(Kind) && "Not a type attribute"); > Attribute A = getAttribute(Kind); > diff --git a/llvm/lib/IR/Function.cpp b/llvm/lib/IR/Function.cpp > index 726ba80da41b..6ae3d0b4dcb9 100644 > --- a/llvm/lib/IR/Function.cpp > +++ b/llvm/lib/IR/Function.cpp > @@ -339,8 +339,9 @@ Function *Function::createWithDefaultAttr(FunctionType *Ty, > Module *M) { > auto *F = new Function(Ty, Linkage, AddrSpace, N, M); > AttrBuilder B(F->getContext()); > - if (M->getUwtable()) > - B.addAttribute(Attribute::UWTable); > + UWTableKind UWTable = M->getUwtable(); > + if (UWTable != UWTableKind::None) > + B.addUWTableAttr(UWTable); > switch (M->getFramePointer()) { > case 
FramePointerKind::None: > // 0 ("none") is the default. > diff --git a/llvm/lib/IR/Module.cpp b/llvm/lib/IR/Module.cpp > index 6156edd99790..b66a99ba17b0 100644 > --- a/llvm/lib/IR/Module.cpp > +++ b/llvm/lib/IR/Module.cpp > @@ -671,12 +671,15 @@ void Module::setRtLibUseGOT() { > addModuleFlag(ModFlagBehavior::Max, "RtLibUseGOT", 1); > } > > -bool Module::getUwtable() const { > - auto *Val = cast_or_null<ConstantAsMetadata>(getModuleFlag("uwtable")); > - return Val && (cast<ConstantInt>(Val->getValue())->getZExtValue() > 0); > +UWTableKind Module::getUwtable() const { > + if (auto *Val = cast_or_null<ConstantAsMetadata>(getModuleFlag("uwtable"))) > + return UWTableKind(cast<ConstantInt>(Val->getValue())->getZExtValue()); > + return UWTableKind::None; > } > > -void Module::setUwtable() { addModuleFlag(ModFlagBehavior::Max, "uwtable", 1); } > +void Module::setUwtable(UWTableKind Kind) { > + addModuleFlag(ModFlagBehavior::Max, "uwtable", uint32_t(Kind)); > +} > > FramePointerKind Module::getFramePointer() const { > auto *Val = cast_or_null<ConstantAsMetadata>(getModuleFlag("frame-pointer")); > diff --git a/llvm/test/Assembler/uwtable-1.ll b/llvm/test/Assembler/uwtable-1.ll > new file mode 100644 > index 000000000000..2e9e3f0cab6d > --- /dev/null > +++ b/llvm/test/Assembler/uwtable-1.ll > @@ -0,0 +1,7 @@ > +; RUN: not llvm-as %s -o /dev/null 2>&1 | FileCheck %s > + > +declare void @f0() uwtable > +declare void @f1() uwtable(sync) > +declare void @f2() uwtable(async) > +declare void @f3() uwtable(unsync) > +; CHECK: :[[#@LINE-1]]:28: error: expected unwind table kind > diff --git a/llvm/test/Assembler/uwtable-2.ll b/llvm/test/Assembler/uwtable-2.ll > new file mode 100644 > index 000000000000..c04228dbf157 > --- /dev/null > +++ b/llvm/test/Assembler/uwtable-2.ll > @@ -0,0 +1,4 @@ > +; RUN: not llvm-as %s -o /dev/null 2>&1 | FileCheck %s > + > +declare void @f() uwtable(sync x > +; CHECK: :[[#@LINE-1]]:32: error: expected ')' > diff --git a/llvm/test/Bitcode/attributes.ll 
b/llvm/test/Bitcode/attributes.ll > index b2b92bb6e12d..5d3828d2762d 100644 > --- a/llvm/test/Bitcode/attributes.ll > +++ b/llvm/test/Bitcode/attributes.ll > @@ -516,6 +516,16 @@ define void @f83(<4 x i8*> align 32 %0, <vscale x 1 x double*> align 64 %1) { > ret void > } > > +; CHECK: define void @f84() #51 > +define void @f84() uwtable(sync) { > + ret void; > +} > + > +; CHECK: define void @f85() #15 > +define void @f85() uwtable(async) { > + ret void; > +} > + > ; CHECK: attributes #0 = { noreturn } > ; CHECK: attributes #1 = { nounwind } > ; CHECK: attributes #2 = { readnone } > @@ -567,4 +577,5 @@ define void @f83(<4 x i8*> align 32 %0, <vscale x 1 x double*> align 64 %1) { > ; CHECK: attributes #48 = { nosanitize_coverage } > ; CHECK: attributes #49 = { noprofile } > ; CHECK: attributes #50 = { disable_sanitizer_instrumentation } > +; CHECK: attributes #51 = { uwtable(sync) } > ; CHECK: attributes #[[NOBUILTIN]] = { nobuiltin } > diff --git a/llvm/test/CodeGen/Thumb2/pacbti-m-outliner-1.ll b/llvm/test/CodeGen/Thumb2/pacbti-m-outliner-1.ll > index af761a4c37ec..84983411e86c 100644 > --- a/llvm/test/CodeGen/Thumb2/pacbti-m-outliner-1.ll > +++ b/llvm/test/CodeGen/Thumb2/pacbti-m-outliner-1.ll > @@ -137,7 +137,9 @@ attributes #0 = { minsize nofree norecurse nounwind optsize uwtable} > ; UNWIND-NEXT: 0xB0 ; finish > > ; UNWIND-LABEL: FunctionAddress: 0x40 > -; UNWIND: Model: CantUnwind > +; UNWIND: Opcodes [ > +; UNWIND-NEXT: 0xB0 ; finish > + > > ; UNWINND-LABEL: 00000041 {{.*}} OUTLINED_FUNCTION_0 > ; UNWINND-LABEL: 00000001 {{.*}} x > diff --git a/llvm/test/CodeGen/Thumb2/pacbti-m-outliner-3.ll b/llvm/test/CodeGen/Thumb2/pacbti-m-outliner-3.ll > index 9251e1b4ddf6..edbae593ee84 100644 > --- a/llvm/test/CodeGen/Thumb2/pacbti-m-outliner-3.ll > +++ b/llvm/test/CodeGen/Thumb2/pacbti-m-outliner-3.ll > @@ -154,8 +154,9 @@ attributes #0 = { minsize noinline norecurse nounwind optsize readnone uwtable } > ; UNWIND-NEXT: 0xAA ; pop {r4, r5, r6, lr} > > ; UNWIND-LABEL: 
FunctionAddress: 0x5C > -; UNWIND: Model: CantUnwind > - > +; UNWIND: 0xB4 ; pop ra_auth_code > +; UNWIND: 0x84 0x00 ; pop {lr} > + > ; UNWIND-LABEL: 0000005d {{.*}} OUTLINED_FUNCTION_0 > ; UNWIND-LABEL: 00000005 {{.*}} f > ; UNWIND-LABEL: 00000031 {{.*}} g > diff --git a/llvm/test/Instrumentation/AddressSanitizer/module-flags.ll b/llvm/test/Instrumentation/AddressSanitizer/module-flags.ll > index ca3c6f3051f1..c046592890b4 100644 > --- a/llvm/test/Instrumentation/AddressSanitizer/module-flags.ll > +++ b/llvm/test/Instrumentation/AddressSanitizer/module-flags.ll > @@ -13,7 +13,7 @@ entry: > !llvm.module.flags = !{!0, !1} > > ;; Due to -fasynchronous-unwind-tables. > -!0 = !{i32 7, !"uwtable", i32 1} > +!0 = !{i32 7, !"uwtable", i32 2} > > ;; Due to -fno-omit-frame-pointer. > !1 = !{i32 7, !"frame-pointer", i32 2} > diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll > index b077dd388780..90437f5876d7 100644 > --- a/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll > +++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/attributes.ll > @@ -9,7 +9,7 @@ > target triple = "x86_64-unknown-linux-gnu" > > define internal fastcc void @no_promote_avx2(<4 x i64>* %arg, <4 x i64>* readonly %arg1) #0 { > -; IS________OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS________OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS________OPM-LABEL: define {{[^@]+}}@no_promote_avx2 > ; IS________OPM-SAME: (<4 x i64>* nocapture nofree noundef nonnull writeonly align 32 dereferenceable(32) [[ARG:%.*]], <4 x i64>* nocapture nofree noundef nonnull readonly align 32 dereferenceable(32) [[ARG1:%.*]]) #[[ATTR0:[0-9]+]] { > ; IS________OPM-NEXT: bb: > @@ -17,7 +17,7 @@ define internal fastcc void @no_promote_avx2(<4 x i64>* %arg, <4 x i64>* readonl > ; 
IS________OPM-NEXT: store <4 x i64> [[TMP]], <4 x i64>* [[ARG]], align 32 > ; IS________OPM-NEXT: ret void > ; > -; IS________NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS________NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS________NPM-LABEL: define {{[^@]+}}@no_promote_avx2 > ; IS________NPM-SAME: (<4 x i64>* noalias nocapture nofree noundef nonnull writeonly align 32 dereferenceable(32) [[ARG:%.*]], <4 x i64>* noalias nocapture nofree noundef nonnull readonly align 32 dereferenceable(32) [[ARG1:%.*]]) #[[ATTR0:[0-9]+]] { > ; IS________NPM-NEXT: bb: > @@ -32,7 +32,7 @@ bb: > } > > define void @no_promote(<4 x i64>* %arg) #1 { > -; IS__TUNIT_OPM: Function Attrs: argmemonly nofree nosync nounwind uwtable willreturn > +; IS__TUNIT_OPM: Function Attrs: argmemonly nofree nosync nounwind willreturn uwtable > ; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@no_promote > ; IS__TUNIT_OPM-SAME: (<4 x i64>* nocapture nofree writeonly [[ARG:%.*]]) #[[ATTR1:[0-9]+]] { > ; IS__TUNIT_OPM-NEXT: bb: > @@ -45,7 +45,7 @@ define void @no_promote(<4 x i64>* %arg) #1 { > ; IS__TUNIT_OPM-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[ARG]], align 2 > ; IS__TUNIT_OPM-NEXT: ret void > ; > -; IS__TUNIT_NPM: Function Attrs: argmemonly nofree nosync nounwind uwtable willreturn > +; IS__TUNIT_NPM: Function Attrs: argmemonly nofree nosync nounwind willreturn uwtable > ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@no_promote > ; IS__TUNIT_NPM-SAME: (<4 x i64>* nocapture nofree writeonly [[ARG:%.*]]) #[[ATTR1:[0-9]+]] { > ; IS__TUNIT_NPM-NEXT: bb: > @@ -58,7 +58,7 @@ define void @no_promote(<4 x i64>* %arg) #1 { > ; IS__TUNIT_NPM-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[ARG]], align 2 > ; IS__TUNIT_NPM-NEXT: ret void > ; > -; IS__CGSCC_OPM: Function Attrs: argmemonly nofree nosync nounwind uwtable willreturn > +; IS__CGSCC_OPM: Function Attrs: argmemonly nofree nosync nounwind willreturn uwtable > 
; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@no_promote > ; IS__CGSCC_OPM-SAME: (<4 x i64>* nocapture nofree noundef nonnull writeonly align 2 dereferenceable(32) [[ARG:%.*]]) #[[ATTR1:[0-9]+]] { > ; IS__CGSCC_OPM-NEXT: bb: > @@ -71,7 +71,7 @@ define void @no_promote(<4 x i64>* %arg) #1 { > ; IS__CGSCC_OPM-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[ARG]], align 2 > ; IS__CGSCC_OPM-NEXT: ret void > ; > -; IS__CGSCC_NPM: Function Attrs: argmemonly nofree nosync nounwind uwtable willreturn > +; IS__CGSCC_NPM: Function Attrs: argmemonly nofree nosync nounwind willreturn uwtable > ; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@no_promote > ; IS__CGSCC_NPM-SAME: (<4 x i64>* nocapture nofree noundef nonnull writeonly align 2 dereferenceable(32) [[ARG:%.*]]) #[[ATTR1:[0-9]+]] { > ; IS__CGSCC_NPM-NEXT: bb: > @@ -96,7 +96,7 @@ bb: > } > > define internal fastcc void @promote_avx2(<4 x i64>* %arg, <4 x i64>* readonly %arg1) #0 { > -; IS________OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS________OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS________OPM-LABEL: define {{[^@]+}}@promote_avx2 > ; IS________OPM-SAME: (<4 x i64>* nocapture nofree noundef nonnull writeonly align 32 dereferenceable(32) [[ARG:%.*]], <4 x i64>* nocapture nofree noundef nonnull readonly align 32 dereferenceable(32) [[ARG1:%.*]]) #[[ATTR0]] { > ; IS________OPM-NEXT: bb: > @@ -104,7 +104,7 @@ define internal fastcc void @promote_avx2(<4 x i64>* %arg, <4 x i64>* readonly % > ; IS________OPM-NEXT: store <4 x i64> [[TMP]], <4 x i64>* [[ARG]], align 32 > ; IS________OPM-NEXT: ret void > ; > -; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@promote_avx2 > ; IS__TUNIT_NPM-SAME: (<4 x i64>* noalias 
nocapture nofree noundef nonnull writeonly align 32 dereferenceable(32) [[ARG:%.*]], <4 x i64> [[TMP0:%.*]]) #[[ATTR0]] { > ; IS__TUNIT_NPM-NEXT: bb: > @@ -114,7 +114,7 @@ define internal fastcc void @promote_avx2(<4 x i64>* %arg, <4 x i64>* readonly % > ; IS__TUNIT_NPM-NEXT: store <4 x i64> [[TMP]], <4 x i64>* [[ARG]], align 32 > ; IS__TUNIT_NPM-NEXT: ret void > ; > -; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@promote_avx2 > ; IS__CGSCC_NPM-SAME: (<4 x i64>* noalias nocapture nofree noundef nonnull writeonly align 32 dereferenceable(32) [[ARG:%.*]], <4 x i64> [[TMP0:%.*]]) #[[ATTR0]] { > ; IS__CGSCC_NPM-NEXT: bb: > @@ -131,7 +131,7 @@ bb: > } > > define void @promote(<4 x i64>* %arg) #0 { > -; IS__TUNIT_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@promote > ; IS__TUNIT_OPM-SAME: (<4 x i64>* nocapture nofree writeonly [[ARG:%.*]]) #[[ATTR0]] { > ; IS__TUNIT_OPM-NEXT: bb: > @@ -144,7 +144,7 @@ define void @promote(<4 x i64>* %arg) #0 { > ; IS__TUNIT_OPM-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[ARG]], align 2 > ; IS__TUNIT_OPM-NEXT: ret void > ; > -; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@promote > ; IS__TUNIT_NPM-SAME: (<4 x i64>* nocapture nofree writeonly [[ARG:%.*]]) #[[ATTR0]] { > ; IS__TUNIT_NPM-NEXT: bb: > @@ -158,7 +158,7 @@ define void @promote(<4 x i64>* %arg) #0 { > ; IS__TUNIT_NPM-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* 
[[ARG]], align 2 > ; IS__TUNIT_NPM-NEXT: ret void > ; > -; IS__CGSCC_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@promote > ; IS__CGSCC_OPM-SAME: (<4 x i64>* nocapture nofree noundef nonnull writeonly align 2 dereferenceable(32) [[ARG:%.*]]) #[[ATTR0]] { > ; IS__CGSCC_OPM-NEXT: bb: > @@ -171,7 +171,7 @@ define void @promote(<4 x i64>* %arg) #0 { > ; IS__CGSCC_OPM-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[ARG]], align 2 > ; IS__CGSCC_OPM-NEXT: ret void > ; > -; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@promote > ; IS__CGSCC_NPM-SAME: (<4 x i64>* nocapture nofree noundef nonnull writeonly align 2 dereferenceable(32) [[ARG:%.*]]) #[[ATTR0]] { > ; IS__CGSCC_NPM-NEXT: bb: > @@ -203,14 +203,14 @@ attributes #0 = { inlinehint norecurse nounwind uwtable "target-features"="+avx2 > attributes #1 = { nounwind uwtable } > attributes #2 = { argmemonly nounwind } > ;. 
> -; IS__TUNIT____: attributes #[[ATTR0:[0-9]+]] = { argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn "target-features"="+avx2" } > -; IS__TUNIT____: attributes #[[ATTR1:[0-9]+]] = { argmemonly nofree nosync nounwind uwtable willreturn } > +; IS__TUNIT____: attributes #[[ATTR0:[0-9]+]] = { argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable "target-features"="+avx2" } > +; IS__TUNIT____: attributes #[[ATTR1:[0-9]+]] = { argmemonly nofree nosync nounwind willreturn uwtable } > ; IS__TUNIT____: attributes #[[ATTR2:[0-9]+]] = { argmemonly nofree nounwind willreturn writeonly } > ; IS__TUNIT____: attributes #[[ATTR3:[0-9]+]] = { willreturn writeonly } > ; IS__TUNIT____: attributes #[[ATTR4:[0-9]+]] = { nofree nosync nounwind willreturn } > ;. > -; IS__CGSCC____: attributes #[[ATTR0:[0-9]+]] = { argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn "target-features"="+avx2" } > -; IS__CGSCC____: attributes #[[ATTR1:[0-9]+]] = { argmemonly nofree nosync nounwind uwtable willreturn } > +; IS__CGSCC____: attributes #[[ATTR0:[0-9]+]] = { argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable "target-features"="+avx2" } > +; IS__CGSCC____: attributes #[[ATTR1:[0-9]+]] = { argmemonly nofree nosync nounwind willreturn uwtable } > ; IS__CGSCC____: attributes #[[ATTR2:[0-9]+]] = { argmemonly nofree nounwind willreturn writeonly } > ; IS__CGSCC____: attributes #[[ATTR3:[0-9]+]] = { willreturn writeonly } > ; IS__CGSCC____: attributes #[[ATTR4:[0-9]+]] = { nosync nounwind willreturn } > diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll > index a7bcf1e42252..7a2b796cb321 100644 > --- a/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll > +++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/X86/min-legal-vector-width.ll > @@ -11,7 +11,7 @@ 
target triple = "x86_64-unknown-linux-gnu" > ; This should promote > define internal fastcc void @callee_avx512_legal512_prefer512_call_avx512_legal512_prefer512(<8 x i64>* %arg, <8 x i64>* readonly %arg1) #0 { > ; > -; IS________OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS________OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS________OPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer512_call_avx512_legal512_prefer512 > ; IS________OPM-SAME: (<8 x i64>* nocapture nofree noundef nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64>* nocapture nofree noundef nonnull readonly align 64 dereferenceable(64) [[ARG1:%.*]]) #[[ATTR0:[0-9]+]] { > ; IS________OPM-NEXT: bb: > @@ -19,7 +19,7 @@ define internal fastcc void @callee_avx512_legal512_prefer512_call_avx512_legal5 > ; IS________OPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 > ; IS________OPM-NEXT: ret void > ; > -; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer512_call_avx512_legal512_prefer512 > ; IS__TUNIT_NPM-SAME: (<8 x i64>* noalias nocapture nofree noundef nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] { > ; IS__TUNIT_NPM-NEXT: bb: > @@ -29,7 +29,7 @@ define internal fastcc void @callee_avx512_legal512_prefer512_call_avx512_legal5 > ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 > ; IS__TUNIT_NPM-NEXT: ret void > ; > -; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; 
IS__CGSCC_NPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer512_call_avx512_legal512_prefer512 > ; IS__CGSCC_NPM-SAME: (<8 x i64>* noalias nocapture nofree noundef nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] { > ; IS__CGSCC_NPM-NEXT: bb: > @@ -47,7 +47,7 @@ bb: > > define void @avx512_legal512_prefer512_call_avx512_legal512_prefer512(<8 x i64>* %arg) #0 { > ; > -; IS__TUNIT_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@avx512_legal512_prefer512_call_avx512_legal512_prefer512 > ; IS__TUNIT_OPM-SAME: (<8 x i64>* nocapture nofree writeonly [[ARG:%.*]]) #[[ATTR0]] { > ; IS__TUNIT_OPM-NEXT: bb: > @@ -60,7 +60,7 @@ define void @avx512_legal512_prefer512_call_avx512_legal512_prefer512(<8 x i64>* > ; IS__TUNIT_OPM-NEXT: store <8 x i64> [[TMP4]], <8 x i64>* [[ARG]], align 2 > ; IS__TUNIT_OPM-NEXT: ret void > ; > -; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@avx512_legal512_prefer512_call_avx512_legal512_prefer512 > ; IS__TUNIT_NPM-SAME: (<8 x i64>* nocapture nofree writeonly [[ARG:%.*]]) #[[ATTR0]] { > ; IS__TUNIT_NPM-NEXT: bb: > @@ -74,7 +74,7 @@ define void @avx512_legal512_prefer512_call_avx512_legal512_prefer512(<8 x i64>* > ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP4]], <8 x i64>* [[ARG]], align 2 > ; IS__TUNIT_NPM-NEXT: ret void > ; > -; IS__CGSCC_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__CGSCC_OPM-LABEL: define 
{{[^@]+}}@avx512_legal512_prefer512_call_avx512_legal512_prefer512 > ; IS__CGSCC_OPM-SAME: (<8 x i64>* nocapture nofree noundef nonnull writeonly align 2 dereferenceable(64) [[ARG:%.*]]) #[[ATTR0]] { > ; IS__CGSCC_OPM-NEXT: bb: > @@ -87,7 +87,7 @@ define void @avx512_legal512_prefer512_call_avx512_legal512_prefer512(<8 x i64>* > ; IS__CGSCC_OPM-NEXT: store <8 x i64> [[TMP4]], <8 x i64>* [[ARG]], align 2 > ; IS__CGSCC_OPM-NEXT: ret void > ; > -; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@avx512_legal512_prefer512_call_avx512_legal512_prefer512 > ; IS__CGSCC_NPM-SAME: (<8 x i64>* nocapture nofree noundef nonnull writeonly align 2 dereferenceable(64) [[ARG:%.*]]) #[[ATTR0]] { > ; IS__CGSCC_NPM-NEXT: bb: > @@ -115,7 +115,7 @@ bb: > ; This should promote > define internal fastcc void @callee_avx512_legal512_prefer256_call_avx512_legal512_prefer256(<8 x i64>* %arg, <8 x i64>* readonly %arg1) #1 { > ; > -; IS________OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS________OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS________OPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer256_call_avx512_legal512_prefer256 > ; IS________OPM-SAME: (<8 x i64>* nocapture nofree noundef nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64>* nocapture nofree noundef nonnull readonly align 64 dereferenceable(64) [[ARG1:%.*]]) #[[ATTR1:[0-9]+]] { > ; IS________OPM-NEXT: bb: > @@ -123,7 +123,7 @@ define internal fastcc void @callee_avx512_legal512_prefer256_call_avx512_legal5 > ; IS________OPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 > ; IS________OPM-NEXT: ret void > ; > -; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree 
norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer256_call_avx512_legal512_prefer256 > ; IS__TUNIT_NPM-SAME: (<8 x i64>* noalias nocapture nofree noundef nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) #[[ATTR1:[0-9]+]] { > ; IS__TUNIT_NPM-NEXT: bb: > @@ -133,7 +133,7 @@ define internal fastcc void @callee_avx512_legal512_prefer256_call_avx512_legal5 > ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP]], <8 x i64>* [[ARG]], align 64 > ; IS__TUNIT_NPM-NEXT: ret void > ; > -; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@callee_avx512_legal512_prefer256_call_avx512_legal512_prefer256 > ; IS__CGSCC_NPM-SAME: (<8 x i64>* noalias nocapture nofree noundef nonnull writeonly align 64 dereferenceable(64) [[ARG:%.*]], <8 x i64> [[TMP0:%.*]]) #[[ATTR1:[0-9]+]] { > ; IS__CGSCC_NPM-NEXT: bb: > @@ -151,7 +151,7 @@ bb: > > define void @avx512_legal512_prefer256_call_avx512_legal512_prefer256(<8 x i64>* %arg) #1 { > ; > -; IS__TUNIT_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_OPM-LABEL: define {{[^@]+}}@avx512_legal512_prefer256_call_avx512_legal512_prefer256 > ; IS__TUNIT_OPM-SAME: (<8 x i64>* nocapture nofree writeonly [[ARG:%.*]]) #[[ATTR1]] { > ; IS__TUNIT_OPM-NEXT: bb: > @@ -164,7 +164,7 @@ define void @avx512_legal512_prefer256_call_avx512_legal512_prefer256(<8 x i64>* > ; IS__TUNIT_OPM-NEXT: store <8 x i64> [[TMP4]], <8 x i64>* [[ARG]], align 2 > ; IS__TUNIT_OPM-NEXT: ret void > ; > -; 
IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__TUNIT_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__TUNIT_NPM-LABEL: define {{[^@]+}}@avx512_legal512_prefer256_call_avx512_legal512_prefer256 > ; IS__TUNIT_NPM-SAME: (<8 x i64>* nocapture nofree writeonly [[ARG:%.*]]) #[[ATTR1]] { > ; IS__TUNIT_NPM-NEXT: bb: > @@ -178,7 +178,7 @@ define void @avx512_legal512_prefer256_call_avx512_legal512_prefer256(<8 x i64>* > ; IS__TUNIT_NPM-NEXT: store <8 x i64> [[TMP4]], <8 x i64>* [[ARG]], align 2 > ; IS__TUNIT_NPM-NEXT: ret void > ; > -; IS__CGSCC_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_OPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__CGSCC_OPM-LABEL: define {{[^@]+}}@avx512_legal512_prefer256_call_avx512_legal512_prefer256 > ; IS__CGSCC_OPM-SAME: (<8 x i64>* nocapture nofree noundef nonnull writeonly align 2 dereferenceable(64) [[ARG:%.*]]) #[[ATTR1]] { > ; IS__CGSCC_OPM-NEXT: bb: > @@ -191,7 +191,7 @@ define void @avx512_legal512_prefer256_call_avx512_legal512_prefer256(<8 x i64>* > ; IS__CGSCC_OPM-NEXT: store <8 x i64> [[TMP4]], <8 x i64>* [[ARG]], align 2 > ; IS__CGSCC_OPM-NEXT: ret void > ; > -; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind uwtable willreturn > +; IS__CGSCC_NPM: Function Attrs: argmemonly inlinehint nofree norecurse nosync nounwind willreturn uwtable > ; IS__CGSCC_NPM-LABEL: define {{[^@]+}}@avx512_legal512_prefer256_call_avx512_legal512_prefer256 > ; IS__CGSCC_NPM-SAME: (<8 x i64>* nocapture nofree noundef nonnull writeonly align 2 dereferenceable(64) [[ARG:%.*]]) #[[ATTR1]] { > ; IS__CGSCC_NPM-NEXT: bb: > @@ -219,7 +219,7 @@ bb: > ; This should promote > define internal fastcc void @callee_avx512_legal512_prefer512_call_avx512_legal512_prefer256(<8 x i64>* %arg, <8 x i64>* 
readonly %arg1) #1 { > </cut>
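The `getAsString`, `getUwtable` and `addUWTableAttr` hunks in the quoted commit can be summarized with a small model. The sketch below is Python, not LLVM code. The numeric values (None = 0, Sync = 1, Async = 2, Default aliasing Async) are an assumption: the quoted text does not show the enum definition, but the `i32 1` to `i32 2` module-flag change for `-fasynchronous-unwind-tables` and the test expecting `uwtable(async)` to reuse an existing attribute group (`#15`) are consistent with it. The function names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class UWTableKind(IntEnum):
    """Model of the unwind-table kinds; numeric values are assumed, see lead-in."""
    NONE = 0
    SYNC = 1
    ASYNC = 2
    DEFAULT = 2  # same value as ASYNC, so Python treats it as an alias


def uwtable_as_string(kind: UWTableKind) -> str:
    """Mirror the printing logic of the quoted Attribute::getAsString hunk."""
    if kind == UWTableKind.NONE:
        # Hypothetical: models "no attribute is added", as in addUWTableAttr.
        return ""
    if kind == UWTableKind.DEFAULT:
        return "uwtable"
    return "uwtable(" + ("sync" if kind == UWTableKind.SYNC else "async") + ")"


def kind_from_module_flag(flag: Optional[int]) -> UWTableKind:
    """Mirror the quoted Module::getUwtable: a missing flag means NONE."""
    return UWTableKind(flag) if flag is not None else UWTableKind.NONE


for kind in (UWTableKind.SYNC, UWTableKind.ASYNC):
    print(kind.name, "->", uwtable_as_string(kind))
```

Under that assumption, an async table prints as plain `uwtable` because it compares equal to the default kind, and only the sync kind gets the parenthesized spelling.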

4 years, 3 months


Re: [TCWG CI] 401.bzip2 grew in size by 11% after llvm: [MachineSink] Disable if there are any irreducible cycles

by Maxim Kuvyrkov

Hi Nikita,

Your patch seems to increase the code size of 401.bzip2 by 11% at -Oz. This is due to the BZ2_decompress() function growing by 56%.

Would you please investigate and see if this regression can be avoided?

Please let us know if you need help reproducing or analyzing the problem.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On Mar 27, 2022, at 11:26 AM, ci_notify(a)linaro.org wrote:
>
> After llvm commit 6fde0439512580df793f3f48f95757b47de40d2b
> Author: Nikita Popov <npopov(a)redhat.com>
>
> [MachineSink] Disable if there are any irreducible cycles
>
> the following benchmarks grew in size by more than 1%:
> - 401.bzip2 grew in size by 11% from 36213 to 40325 bytes
> - 401.bzip2:[.] BZ2_decompress grew in size by 56% from 7400 to 11560 bytes
>
> The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
>
> For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
> - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
>
> Configuration:
> - Benchmark: SPEC CPU2006
> - Toolchain: Clang + Glibc + LLVM Linker
> - Version: all components were built from their tip of trunk
> - Target: arm-linux-gnueabihf
> - Compiler flags: -Oz -mthumb
> - Hardware: APM Mustang 8x X-Gene1
>
> This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
>
> THIS IS THE END OF INTERESTING STUFF.
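The percentages quoted in the report can be checked directly from the byte counts. The following is a minimal Python sketch, not part of the TCWG CI scripts; the function name `growth_percent` is hypothetical, and rounding to the nearest whole percent is an assumption about how the report formats its numbers.

```python
def growth_percent(old_size: int, new_size: int) -> float:
    """Return the relative size growth from old_size to new_size, in percent."""
    return (new_size - old_size) * 100.0 / old_size


# Byte counts taken from the CI report quoted above.
measurements = [
    ("401.bzip2", 36213, 40325),
    ("401.bzip2:[.] BZ2_decompress", 7400, 11560),
]

for name, old_size, new_size in measurements:
    delta = new_size - old_size
    print(f"{name}: +{delta} bytes, about {round(growth_percent(old_size, new_size))}%")
```

The absolute deltas are 4112 bytes for the whole benchmark and 4160 bytes for `BZ2_decompress`, which suggests that essentially all of the regression comes from that one function.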
> BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz
> - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz_LTO
>
> First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
>
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-6fde0439512580df793f3f48f95757b47de40d2b
> cd investigate-llvm-6fde0439512580df793f3f48f95757b47de40d2b
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>
> cd llvm
>
> # Reproduce first_bad build
> git checkout --detach 6fde0439512580df793f3f48f95757b47de40d2b
> ../artifacts/test.sh
>
> # Reproduce last_good build
> git checkout --detach eb27da7dec67f1a36505b589b786ba1a499c274a
> ../artifacts/test.sh
>
> cd ..
> </cut>
>
> Full commit (up to 1000 lines):
> <cut>
> commit 6fde0439512580df793f3f48f95757b47de40d2b
> Author: Nikita Popov <npopov(a)redhat.com>
> Date: Thu Feb 24 10:09:49 2022 +0100
>
> [MachineSink] Disable if there are any irreducible cycles
>
> This is an alternative to D120330, which disables MachineSink for
> functions with irreducible cycles entirely. This avoids both the
> correctness problem, and ensures we don't perform non-profitable
> sinks into cycles. At the same time, it may also disable
> profitable sinks in the same function. This can be made more
> precise by using MachineCycleInfo in the future.
>
> Fixes https://github.com/llvm/llvm-project/issues/53990.
>
> Differential Revision: https://reviews.llvm.org/D120800
> ---
> llvm/lib/CodeGen/MachineSink.cpp | 12 +++
> llvm/test/CodeGen/X86/callbr-asm-branch-folding.ll | 22 +++--
> llvm/test/CodeGen/X86/pr38795.ll | 93 +++++++++++-----------
> .../CodeGen/X86/pr53990-incorrect-machine-sink.ll | 9 +--
> llvm/test/CodeGen/X86/x86-shrink-wrapping.ll | 36 ++++-----
> 5 files changed, 87 insertions(+), 85 deletions(-)
>
> diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
> index 7ed33f9fdeac..301cc73a0530 100644
> --- a/llvm/lib/CodeGen/MachineSink.cpp
> +++ b/llvm/lib/CodeGen/MachineSink.cpp
> @@ -18,12 +18,14 @@
> #include "llvm/ADT/DenseSet.h"
> #include "llvm/ADT/MapVector.h"
> #include "llvm/ADT/PointerIntPair.h"
> +#include "llvm/ADT/PostOrderIterator.h"
> #include "llvm/ADT/SetVector.h"
> #include "llvm/ADT/SmallSet.h"
> #include "llvm/ADT/SmallVector.h"
> #include "llvm/ADT/SparseBitVector.h"
> #include "llvm/ADT/Statistic.h"
> #include "llvm/Analysis/AliasAnalysis.h"
> +#include "llvm/Analysis/CFG.h"
> #include "llvm/CodeGen/MachineBasicBlock.h"
> #include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
> #include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
> @@ -429,6 +431,16 @@ bool MachineSinking::runOnMachineFunction(MachineFunction &MF) {
> AA =
&getAnalysis<AAResultsWrapperPass>().getAAResults(); > RegClassInfo.runOnMachineFunction(MF); > > + // MachineSink currently uses MachineLoopInfo, which only recognizes natural > + // loops. As such, we could sink instructions into irreducible cycles, which > + // would be non-profitable. > + // WARNING: The current implementation of hasStoreBetween() is incorrect for > + // sinking into irreducible cycles (PR53990), this bailout is currently > + // necessary for correctness, not just profitability. > + ReversePostOrderTraversal<MachineBasicBlock *> RPOT(&*MF.begin()); > + if (containsIrreducibleCFG<MachineBasicBlock *>(RPOT, *LI)) > + return false; > + > bool EverMadeChange = false; > > while (true) { > diff --git a/llvm/test/CodeGen/X86/callbr-asm-branch-folding.ll b/llvm/test/CodeGen/X86/callbr-asm-branch-folding.ll > index 024b6c608aba..f93e181d157c 100644 > --- a/llvm/test/CodeGen/X86/callbr-asm-branch-folding.ll > +++ b/llvm/test/CodeGen/X86/callbr-asm-branch-folding.ll > @@ -24,7 +24,7 @@ define dso_local void @n(i32* %o, i32 %p, i32 %u) nounwind { > ; CHECK-NEXT: movq %r15, %rdi > ; CHECK-NEXT: callq l > ; CHECK-NEXT: testl %eax, %eax > -; CHECK-NEXT: jne .LBB0_10 > +; CHECK-NEXT: jne .LBB0_9 > ; CHECK-NEXT: # %bb.1: # %if.end > ; CHECK-NEXT: movl %ebx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill > ; CHECK-NEXT: cmpl $0, e(%rip) > @@ -44,21 +44,19 @@ define dso_local void @n(i32* %o, i32 %p, i32 %u) nounwind { > ; CHECK-NEXT: callq i > ; CHECK-NEXT: movl %eax, %ebp > ; CHECK-NEXT: orl %r14d, %ebp > -; CHECK-NEXT: testl %r13d, %r13d > -; CHECK-NEXT: je .LBB0_6 > -; CHECK-NEXT: # %bb.5: > ; CHECK-NEXT: andl $4, %ebx > -; CHECK-NEXT: jmp .LBB0_3 > -; CHECK-NEXT: .LBB0_6: # %if.end12 > +; CHECK-NEXT: testl %r13d, %r13d > +; CHECK-NEXT: jne .LBB0_3 > +; CHECK-NEXT: # %bb.5: # %if.end12 > ; CHECK-NEXT: testl %ebp, %ebp > -; CHECK-NEXT: je .LBB0_9 > -; CHECK-NEXT: # %bb.7: # %if.then14 > +; CHECK-NEXT: je .LBB0_8 > +; CHECK-NEXT: # %bb.6: # %if.then14 > ; CHECK-NEXT: 
movl {{[-0-9]+}}(%r{{[sb]}}p), %eax # 4-byte Reload > ; CHECK-NEXT: #APP > ; CHECK-NEXT: #NO_APP > -; CHECK-NEXT: jmp .LBB0_10 > +; CHECK-NEXT: jmp .LBB0_9 > ; CHECK-NEXT: .Ltmp0: # Block address taken > -; CHECK-NEXT: # %bb.8: # %if.then20.critedge > +; CHECK-NEXT: # %bb.7: # %if.then20.critedge > ; CHECK-NEXT: movl j(%rip), %edi > ; CHECK-NEXT: movslq %eax, %rcx > ; CHECK-NEXT: movl $1, %esi > @@ -71,9 +69,9 @@ define dso_local void @n(i32* %o, i32 %p, i32 %u) nounwind { > ; CHECK-NEXT: popq %r15 > ; CHECK-NEXT: popq %rbp > ; CHECK-NEXT: jmp k # TAILCALL > -; CHECK-NEXT: .LBB0_9: # %if.else > +; CHECK-NEXT: .LBB0_8: # %if.else > ; CHECK-NEXT: incq 0 > -; CHECK-NEXT: .LBB0_10: # %cleanup > +; CHECK-NEXT: .LBB0_9: # %cleanup > ; CHECK-NEXT: addq $8, %rsp > ; CHECK-NEXT: popq %rbx > ; CHECK-NEXT: popq %r12 > diff --git a/llvm/test/CodeGen/X86/pr38795.ll b/llvm/test/CodeGen/X86/pr38795.ll > index d805dcad8b6e..b526e4f471b1 100644 > --- a/llvm/test/CodeGen/X86/pr38795.ll > +++ b/llvm/test/CodeGen/X86/pr38795.ll > @@ -32,13 +32,14 @@ define dso_local void @fn() { > ; CHECK-NEXT: # implicit-def: $ebp > ; CHECK-NEXT: jmp .LBB0_1 > ; CHECK-NEXT: .p2align 4, 0x90 > -; CHECK-NEXT: .LBB0_16: # %for.inc > +; CHECK-NEXT: .LBB0_15: # %for.inc > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > +; CHECK-NEXT: movl %esi, %ecx > ; CHECK-NEXT: movb %dl, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill > ; CHECK-NEXT: movb %dh, %dl > ; CHECK-NEXT: .LBB0_1: # %for.cond > ; CHECK-NEXT: # =>This Loop Header: Depth=1 > -; CHECK-NEXT: # Child Loop BB0_20 Depth 2 > +; CHECK-NEXT: # Child Loop BB0_19 Depth 2 > ; CHECK-NEXT: cmpb $8, %dl > ; CHECK-NEXT: movb %dl, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill > ; CHECK-NEXT: ja .LBB0_3 > @@ -55,7 +56,7 @@ define dso_local void @fn() { > ; CHECK-NEXT: movb %cl, %dh > ; CHECK-NEXT: movl $0, h > ; CHECK-NEXT: cmpb $8, %dl > -; CHECK-NEXT: jg .LBB0_8 > +; CHECK-NEXT: jg .LBB0_9 > ; CHECK-NEXT: # %bb.5: # %if.then13 > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 
> ; CHECK-NEXT: movl %eax, %esi > @@ -64,12 +65,10 @@ define dso_local void @fn() { > ; CHECK-NEXT: calll printf > ; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %dh # 1-byte Reload > ; CHECK-NEXT: testb %bl, %bl > -; CHECK-NEXT: movl %esi, %ecx > ; CHECK-NEXT: # implicit-def: $eax > -; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %dl # 1-byte Reload > -; CHECK-NEXT: movb %dl, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill > +; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %cl # 1-byte Reload > ; CHECK-NEXT: movb %dh, %dl > -; CHECK-NEXT: jne .LBB0_16 > +; CHECK-NEXT: jne .LBB0_15 > ; CHECK-NEXT: jmp .LBB0_6 > ; CHECK-NEXT: .p2align 4, 0x90 > ; CHECK-NEXT: .LBB0_3: # %if.then > @@ -78,82 +77,82 @@ define dso_local void @fn() { > ; CHECK-NEXT: calll printf > ; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %dl # 1-byte Reload > ; CHECK-NEXT: # implicit-def: $eax > +; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %cl # 1-byte Reload > +; CHECK-NEXT: jmp .LBB0_6 > +; CHECK-NEXT: .p2align 4, 0x90 > +; CHECK-NEXT: .LBB0_9: # %if.end21 > +; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > +; CHECK-NEXT: # implicit-def: $ebp > +; CHECK-NEXT: jmp .LBB0_10 > +; CHECK-NEXT: .p2align 4, 0x90 > ; CHECK-NEXT: .LBB0_6: # %for.cond35 > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > +; CHECK-NEXT: movb %dl, %dh > ; CHECK-NEXT: testl %edi, %edi > -; CHECK-NEXT: je .LBB0_7 > -; CHECK-NEXT: .LBB0_11: # %af > +; CHECK-NEXT: movl %edi, %esi > +; CHECK-NEXT: movl $0, %edi > +; CHECK-NEXT: movb %cl, %dl > +; CHECK-NEXT: je .LBB0_19 > +; CHECK-NEXT: # %bb.7: # %af > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > ; CHECK-NEXT: testb %bl, %bl > -; CHECK-NEXT: jne .LBB0_12 > -; CHECK-NEXT: .LBB0_17: # %if.end39 > +; CHECK-NEXT: jne .LBB0_8 > +; CHECK-NEXT: .LBB0_16: # %if.end39 > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > ; CHECK-NEXT: testl %eax, %eax > -; CHECK-NEXT: je .LBB0_19 > -; CHECK-NEXT: # %bb.18: # %if.then41 > +; CHECK-NEXT: je .LBB0_18 > +; CHECK-NEXT: # %bb.17: # %if.then41 > ; CHECK-NEXT: # in 
Loop: Header=BB0_1 Depth=1 > ; CHECK-NEXT: movl $0, {{[0-9]+}}(%esp) > ; CHECK-NEXT: movl $fn, {{[0-9]+}}(%esp) > ; CHECK-NEXT: movl $.str, (%esp) > ; CHECK-NEXT: calll printf > -; CHECK-NEXT: .LBB0_19: # %for.end46 > +; CHECK-NEXT: .LBB0_18: # %for.end46 > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > +; CHECK-NEXT: movl %esi, %edi > ; CHECK-NEXT: # implicit-def: $dl > ; CHECK-NEXT: # implicit-def: $dh > ; CHECK-NEXT: # implicit-def: $ebp > -; CHECK-NEXT: jmp .LBB0_20 > -; CHECK-NEXT: .p2align 4, 0x90 > -; CHECK-NEXT: .LBB0_8: # %if.end21 > -; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > -; CHECK-NEXT: # implicit-def: $ebp > -; CHECK-NEXT: jmp .LBB0_9 > ; CHECK-NEXT: .p2align 4, 0x90 > -; CHECK-NEXT: .LBB0_7: # in Loop: Header=BB0_1 Depth=1 > -; CHECK-NEXT: xorl %edi, %edi > -; CHECK-NEXT: movb %dl, %dh > -; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %dl # 1-byte Reload > -; CHECK-NEXT: .p2align 4, 0x90 > -; CHECK-NEXT: .LBB0_20: # %for.cond47 > +; CHECK-NEXT: .LBB0_19: # %for.cond47 > ; CHECK-NEXT: # Parent Loop BB0_1 Depth=1 > ; CHECK-NEXT: # => This Inner Loop Header: Depth=2 > ; CHECK-NEXT: testb %bl, %bl > -; CHECK-NEXT: jne .LBB0_20 > -; CHECK-NEXT: # %bb.21: # %for.cond47 > -; CHECK-NEXT: # in Loop: Header=BB0_20 Depth=2 > +; CHECK-NEXT: jne .LBB0_19 > +; CHECK-NEXT: # %bb.20: # %for.cond47 > +; CHECK-NEXT: # in Loop: Header=BB0_19 Depth=2 > ; CHECK-NEXT: testb %bl, %bl > -; CHECK-NEXT: jne .LBB0_20 > -; CHECK-NEXT: .LBB0_9: # %ae > +; CHECK-NEXT: jne .LBB0_19 > +; CHECK-NEXT: .LBB0_10: # %ae > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > ; CHECK-NEXT: testb %bl, %bl > -; CHECK-NEXT: jne .LBB0_10 > -; CHECK-NEXT: # %bb.13: # %if.end26 > +; CHECK-NEXT: jne .LBB0_11 > +; CHECK-NEXT: # %bb.12: # %if.end26 > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > -; CHECK-NEXT: xorl %ecx, %ecx > +; CHECK-NEXT: xorl %esi, %esi > ; CHECK-NEXT: testb %dl, %dl > -; CHECK-NEXT: je .LBB0_16 > -; CHECK-NEXT: # %bb.14: # %if.end26 > +; CHECK-NEXT: je .LBB0_15 > +; 
CHECK-NEXT: # %bb.13: # %if.end26 > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > ; CHECK-NEXT: testl %ebp, %ebp > -; CHECK-NEXT: jne .LBB0_16 > -; CHECK-NEXT: # %bb.15: # %if.then31 > +; CHECK-NEXT: jne .LBB0_15 > +; CHECK-NEXT: # %bb.14: # %if.then31 > ; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1 > -; CHECK-NEXT: xorl %ecx, %ecx > +; CHECK-NEXT: xorl %esi, %esi > ; CHECK-NEXT: xorl %ebp, %ebp > -; CHECK-NEXT: jmp .LBB0_16 > +; CHECK-NEXT: jmp .LBB0_15 > ; CHECK-NEXT: .p2align 4, 0x90 > -; CHECK-NEXT: .LBB0_10: # in Loop: Header=BB0_1 Depth=1 > +; CHECK-NEXT: .LBB0_11: # in Loop: Header=BB0_1 Depth=1 > +; CHECK-NEXT: movl %edi, %esi > ; CHECK-NEXT: # implicit-def: $eax > ; CHECK-NEXT: testb %bl, %bl > -; CHECK-NEXT: je .LBB0_17 > -; CHECK-NEXT: .LBB0_12: # in Loop: Header=BB0_1 Depth=1 > +; CHECK-NEXT: je .LBB0_16 > +; CHECK-NEXT: .LBB0_8: # in Loop: Header=BB0_1 Depth=1 > ; CHECK-NEXT: # implicit-def: $edi > ; CHECK-NEXT: # implicit-def: $cl > -; CHECK-NEXT: # kill: killed $cl > ; CHECK-NEXT: # implicit-def: $dl > ; CHECK-NEXT: # implicit-def: $ebp > -; CHECK-NEXT: testl %edi, %edi > -; CHECK-NEXT: jne .LBB0_11 > -; CHECK-NEXT: jmp .LBB0_7 > +; CHECK-NEXT: jmp .LBB0_6 > entry: > br label %for.cond > > diff --git a/llvm/test/CodeGen/X86/pr53990-incorrect-machine-sink.ll b/llvm/test/CodeGen/X86/pr53990-incorrect-machine-sink.ll > index 3d7ff6cbe676..4f56d7b16a87 100644 > --- a/llvm/test/CodeGen/X86/pr53990-incorrect-machine-sink.ll > +++ b/llvm/test/CodeGen/X86/pr53990-incorrect-machine-sink.ll > @@ -7,18 +7,15 @@ define void @test(i1 %c, i64* %p, i64* noalias %p2) nounwind { > ; CHECK-LABEL: test: > ; CHECK: # %bb.0: # %entry > ; CHECK-NEXT: pushq %rbp > -; CHECK-NEXT: pushq %r15 > ; CHECK-NEXT: pushq %r14 > ; CHECK-NEXT: pushq %rbx > -; CHECK-NEXT: pushq %rax > ; CHECK-NEXT: movq %rdx, %rbx > -; CHECK-NEXT: movq %rsi, %r14 > -; CHECK-NEXT: movl %edi, %r15d > +; CHECK-NEXT: movl %edi, %r14d > +; CHECK-NEXT: movq (%rsi), %rbp > ; CHECK-NEXT: xorl %eax, %eax > ; 
CHECK-NEXT: jmpq *.LJTI0_0(,%rax,8) > ; CHECK-NEXT: .LBB0_1: # %split.3 > -; CHECK-NEXT: movq (%r14), %rbp > -; CHECK-NEXT: testb $1, %r15b > +; CHECK-NEXT: testb $1, %r14b > ; CHECK-NEXT: je .LBB0_3 > ; CHECK-NEXT: # %bb.2: # %clobber > ; CHECK-NEXT: callq clobber@PLT > diff --git a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll > index 0f8bb837f82a..b44895293b41 100644 > --- a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll > +++ b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll > @@ -1377,6 +1377,8 @@ define i32 @irreducibleCFG() #4 { > ; ENABLE-NEXT: pushq %rbx > ; ENABLE-NEXT: pushq %rax > ; ENABLE-NEXT: .cfi_offset %rbx, -24 > +; ENABLE-NEXT: movq _irreducibleCFGa@GOTPCREL(%rip), %rax > +; ENABLE-NEXT: movl (%rax), %edi > ; ENABLE-NEXT: movq _irreducibleCFGf@GOTPCREL(%rip), %rax > ; ENABLE-NEXT: cmpb $0, (%rax) > ; ENABLE-NEXT: je LBB16_2 > @@ -1386,24 +1388,20 @@ define i32 @irreducibleCFG() #4 { > ; ENABLE-NEXT: jmp LBB16_1 > ; ENABLE-NEXT: LBB16_2: ## %split > ; ENABLE-NEXT: movq _irreducibleCFGb@GOTPCREL(%rip), %rax > +; ENABLE-NEXT: xorl %ebx, %ebx > ; ENABLE-NEXT: cmpl $0, (%rax) > -; ENABLE-NEXT: je LBB16_3 > -; ENABLE-NEXT: ## %bb.4: ## %for.body4.i > -; ENABLE-NEXT: movq _irreducibleCFGa@GOTPCREL(%rip), %rax > -; ENABLE-NEXT: movl (%rax), %edi > +; ENABLE-NEXT: je LBB16_4 > +; ENABLE-NEXT: ## %bb.3: ## %for.body4.i > ; ENABLE-NEXT: xorl %ebx, %ebx > ; ENABLE-NEXT: xorl %eax, %eax > ; ENABLE-NEXT: callq _something > -; ENABLE-NEXT: jmp LBB16_5 > -; ENABLE-NEXT: LBB16_3: > -; ENABLE-NEXT: xorl %ebx, %ebx > ; ENABLE-NEXT: .p2align 4, 0x90 > -; ENABLE-NEXT: LBB16_5: ## %for.inc > +; ENABLE-NEXT: LBB16_4: ## %for.inc > ; ENABLE-NEXT: ## =>This Inner Loop Header: Depth=1 > ; ENABLE-NEXT: incl %ebx > ; ENABLE-NEXT: cmpl $7, %ebx > -; ENABLE-NEXT: jl LBB16_5 > -; ENABLE-NEXT: ## %bb.6: ## %fn1.exit > +; ENABLE-NEXT: jl LBB16_4 > +; ENABLE-NEXT: ## %bb.5: ## %fn1.exit > ; ENABLE-NEXT: xorl %eax, %eax > ; ENABLE-NEXT: 
addq $8, %rsp > ; ENABLE-NEXT: popq %rbx > @@ -1420,6 +1418,8 @@ define i32 @irreducibleCFG() #4 { > ; DISABLE-NEXT: pushq %rbx > ; DISABLE-NEXT: pushq %rax > ; DISABLE-NEXT: .cfi_offset %rbx, -24 > +; DISABLE-NEXT: movq _irreducibleCFGa@GOTPCREL(%rip), %rax > +; DISABLE-NEXT: movl (%rax), %edi > ; DISABLE-NEXT: movq _irreducibleCFGf@GOTPCREL(%rip), %rax > ; DISABLE-NEXT: cmpb $0, (%rax) > ; DISABLE-NEXT: je LBB16_2 > @@ -1429,24 +1429,20 @@ define i32 @irreducibleCFG() #4 { > ; DISABLE-NEXT: jmp LBB16_1 > ; DISABLE-NEXT: LBB16_2: ## %split > ; DISABLE-NEXT: movq _irreducibleCFGb@GOTPCREL(%rip), %rax > +; DISABLE-NEXT: xorl %ebx, %ebx > ; DISABLE-NEXT: cmpl $0, (%rax) > -; DISABLE-NEXT: je LBB16_3 > -; DISABLE-NEXT: ## %bb.4: ## %for.body4.i > -; DISABLE-NEXT: movq _irreducibleCFGa@GOTPCREL(%rip), %rax > -; DISABLE-NEXT: movl (%rax), %edi > +; DISABLE-NEXT: je LBB16_4 > +; DISABLE-NEXT: ## %bb.3: ## %for.body4.i > ; DISABLE-NEXT: xorl %ebx, %ebx > ; DISABLE-NEXT: xorl %eax, %eax > ; DISABLE-NEXT: callq _something > -; DISABLE-NEXT: jmp LBB16_5 > -; DISABLE-NEXT: LBB16_3: > -; DISABLE-NEXT: xorl %ebx, %ebx > ; DISABLE-NEXT: .p2align 4, 0x90 > -; DISABLE-NEXT: LBB16_5: ## %for.inc > +; DISABLE-NEXT: LBB16_4: ## %for.inc > ; DISABLE-NEXT: ## =>This Inner Loop Header: Depth=1 > ; DISABLE-NEXT: incl %ebx > ; DISABLE-NEXT: cmpl $7, %ebx > -; DISABLE-NEXT: jl LBB16_5 > -; DISABLE-NEXT: ## %bb.6: ## %fn1.exit > +; DISABLE-NEXT: jl LBB16_4 > +; DISABLE-NEXT: ## %bb.5: ## %fn1.exit > ; DISABLE-NEXT: xorl %eax, %eax > ; DISABLE-NEXT: addq $8, %rsp > ; DISABLE-NEXT: popq %rbx > </cut>
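The MachineSink hunk quoted above bails out when the function contains an irreducible cycle. As a rough illustration of what "irreducible" means here, the following sketch uses the textbook dominator-based test: a CFG is reducible exactly when every retreating edge of a depth-first search targets a block that dominates the edge's source. This is not LLVM's `containsIrreducibleCFG` implementation (which works from a reverse post-order traversal and loop info); the function names and the dictionary-based CFG encoding are assumptions made for this example.

```python
"""Sketch: detect irreducible control flow with dominators (not LLVM's code)."""


def retreating_edges(cfg, entry):
    """Return (edges, reachable): DFS edges that point at a block still on
    the DFS stack, plus the set of blocks reachable from entry."""
    on_stack, done = set(), set()
    edges = []
    on_stack.add(entry)
    stack = [(entry, iter(cfg.get(entry, ())))]
    while stack:
        node, successors = stack[-1]
        for succ in successors:
            if succ in on_stack:
                edges.append((node, succ))      # retreating edge
            elif succ not in done:
                on_stack.add(succ)
                stack.append((succ, iter(cfg.get(succ, ()))))
                break
        else:
            # All successors visited: this block is finished.
            on_stack.discard(node)
            done.add(node)
            stack.pop()
    return edges, done


def dominators(cfg, entry, reachable):
    """Iterative dataflow: dom[n] = {n} | intersection of dom[p] over preds."""
    preds = {n: set() for n in reachable}
    for n in reachable:
        for succ in cfg.get(n, ()):
            preds[succ].add(n)
    dom = {n: set(reachable) for n in reachable}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in reachable:
            if n == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def is_irreducible(cfg, entry):
    """True if some retreating edge enters a block that does not dominate
    the edge's source, i.e. a cycle has more than one entry point."""
    edges, reachable = retreating_edges(cfg, entry)
    dom = dominators(cfg, entry, reachable)
    return any(target not in dom[source] for source, target in edges)


# A natural loop: "head" is the only way into the cycle head <-> body.
natural_loop = {
    "entry": ["head"],
    "head": ["body", "exit"],
    "body": ["head"],
    "exit": [],
}

# Two-entry cycle: both "a" and "b" can be entered directly from "entry".
two_entry_cycle = {
    "entry": ["a", "b"],
    "a": ["b"],
    "b": ["a"],
}
```

With these definitions, `is_irreducible(natural_loop, "entry")` is false and `is_irreducible(two_entry_cycle, "entry")` is true. The second shape is the kind of cycle that loop analyses restricted to natural loops do not report as a loop.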


[ACTIVITY] Report for week #12

by Thiago Jung Bauermann

Hello,

# [GNU-732] GDB support for ARMv9 Scalable Matrix Extension (SME)

* Continued working on gdbserver and remote protocol support for programs
  that change the SVE vector length during execution:
  - Studied the 2020 discussion around Luis' proposal to support changed VL
    size in the Remote Serial Protocol / gdbserver, as well as relevant
    parts of the RSP itself.
  - Started studying GDB's type system and how it handles dynamic types, and
    also the target description code to assess the feasibility of making
    the vector registers use a dynamically sized type.
  - Luis made a second proposal in that mailing list discussion where the
    target description wouldn't be transferred via the remote protocol,
    just the vector length. Then both GDB and gdbserver could locally
    update their own descriptions based on that. Starting to think about
    this alternative. It would probably be simpler than changing the target
    description to use a dynamically sized type for the Z registers.

# Basic setup / onboarding

* Work laptop arrived. Set it up.

--
Thiago
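To illustrate why sending only the vector length could be enough, here is a minimal sketch of how both sides might recompute SVE register sizes locally. It relies on the architectural facts that the vector length is a multiple of 128 bits between 128 and 2048, that `vg` counts 64-bit granules, and that each predicate register holds one bit per vector byte. The `RegisterLayout` class and function names are hypothetical; this is not GDB's target description API and it does not model any actual remote protocol packet.

```python
"""Sketch: deriving SVE register sizes from the vector length alone."""

NUM_Z_REGS = 32  # Z0-Z31, scalable vector registers
NUM_P_REGS = 16  # P0-P15, predicate registers


def sve_register_sizes(vg):
    """Return register sizes in bytes for a vector length of `vg` granules.

    `vg` is the vector length in 64-bit granules, so valid values are the
    even numbers from 2 (128 bits) to 32 (2048 bits).
    """
    if vg < 2 or vg > 32 or vg % 2 != 0:
        raise ValueError(f"invalid vg: {vg}")
    z_bytes = vg * 8          # one Z register holds the whole vector
    p_bytes = z_bytes // 8    # one predicate bit per vector byte
    return {
        "vl_bits": z_bytes * 8,
        "z_bytes": z_bytes,
        "p_bytes": p_bytes,
        "ffr_bytes": p_bytes,  # FFR has the same size as a predicate
    }


class RegisterLayout:
    """Hypothetical per-side register description (debugger or stub)."""

    def __init__(self, vg):
        self.update(vg)

    def update(self, vg):
        # Called whenever the other side reports a new vector length.
        self.vg = vg
        self.sizes = sve_register_sizes(vg)

    def total_sve_bytes(self):
        s = self.sizes
        return (NUM_Z_REGS * s["z_bytes"]
                + NUM_P_REGS * s["p_bytes"]
                + s["ffr_bytes"])


# Both sides start at 128 bits, then the inferior switches to 256 bits.
debugger = RegisterLayout(vg=2)
stub = RegisterLayout(vg=2)
for side in (debugger, stub):
    side.update(4)  # only the number 4 would need to cross the wire
```

After the update both layouts agree on 32-byte Z registers and 4-byte predicates without any register description having been transferred, which is the appeal of the second proposal.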


[ACTIVITY] week ending Mar. 27 2022

by Alex Bennée

Project Stratos
===============

  - spent some time talking through design approaches for xen
    vhost-master with Viresh
  - posted Re: Understanding osdep_xenforeignmemory_map mmap behaviour
    Message-Id: <alpine.DEB.2.22.394.2203231838130.2910984@ubuntu-linux-20-04-desktop>

Linux RPMB Sub-system and virtio-driver ([STR-40])

  - continued working on [Linux driver]
  - discovered a bug in vhost-user config handling in QEMU as well
  - posted [PATCH v1 00/13] various virtio docs, fixes and tweaks
    Message-Id: <20220321153037.3622127-1-alex.bennee(a)linaro.org>

[STR-40] <https://linaro.atlassian.net/browse/STR-40>

[Linux driver]
<http://git.linaro.org/people/alex.bennee/linux.git/shortlog/refs/heads/rpmb…>

QEMU Upstream Work ([UM-2])
===========================

  - posted [PULL 00/18] testing and semihosting updates
    Message-Id: <20220301094715.550871-1-alex.bennee(a)linaro.org>
  - posted [PULL for 7.0 0/8] i386, docs, gitlab fixes
    Message-Id: <20220323112711.440376-1-alex.bennee(a)linaro.org>

[UM-2] <https://linaro.atlassian.net/browse/UM-2>

Other
=====

  - Finished work on presentation for LTD

Completed Reviews [11/11]
=========================

[PATCH] gdbstub.c: add support for info proc mappings
  Message-Id: <20220221030910.3203063-1-dominik.b.czarnota(a)gmail.com>

[PATCH] tests/Makefile.include: Let "make clean" remove the TCG tests, too
  Message-Id: <20220301085900.1443232-1-thuth(a)redhat.com>

[PATCH 0/3] gdbstub: add support for switchable endianness
  Message-Id: <20210823142004.17935-1-changbin.du(a)gmail.com>

[PATCH 0/6] More record/replay acceptance tests
  Message-Id: <162332427732.194926.7555369160312506539.stgit@pasha-ThinkPad-X280>

[PATCH v6 00/43] CXl 2.0 emulation Support
  Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com>

[PATCH 00/11] edk2: update to stable202202
  Message-Id: <20220308145521.3106395-1-kraxel(a)redhat.com>

[PATCH 0/3] Use g_new() & friends where that makes obvious
  Message-Id: <20220314160108.1440470-3-armbru(a)redhat.com>

[PATCH 2/2] target/arm: Log fault address for M-profile faults
  Message-Id: <20220315204306.2797684-3-peter.maydell(a)linaro.org>

[RFC PATCH 0/6] Port PPC64/PowerNV MMU tests to QEMU
  Message-Id: <20220324190854.156898-1-leandro.lupori(a)eldorado.org.br>

[PULL for-7.1 00/36] Logging cleanup and per-thread logfiles
  Message-Id: <20220320171135.2704502-1-richard.henderson(a)linaro.org>

[PATCH 1/2] gdbstub: Set current_cpu for memory read write
  Message-Id: <20220322154213.86475-1-bmeng.cn(a)gmail.com>

Absences
========

Current Review Queue
====================

TODO [PULL for-7.1 00/36] Logging cleanup and per-thread logfiles Message-Id: <20220320171135.2704502-1-richard.henderson(a)linaro.org>
====================================================================================================================================

TODO [PATCH 00/11] edk2: update to stable202202 Message-Id: <20220308145521.3106395-1-kraxel(a)redhat.com>
=======================================================================================================

TODO [RFC PATCH 00/27] Virtio sound card implementation Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com>
============================================================================================================================

--
Alex Bennée


[ACTIVITY] report week ending 25 Mar

by Peter Maydell

Progress:
 * UM-2 [QEMU upstream maintainership]
   + More of the usual freeze-related work
 * QEMU-420 [GICv4 emulation]
   + I think the code is more or less bug-free now; still need to
     figure out the best way for a board to request a GICv4
     (eg do we want a 'revision' property specifying 3, 3.1, 4, 4.1,
     or just 3 vs 4 with some optional booleans for extra features?)

-- PMM


[ACTIVITY] Report for week #11

by Thiago Jung Bauermann

Hello,

# [GNU-732] GDB support for ARMv9 Scalable Matrix Extension (SME)

* Continued reading patches from Mark Brown's v12 patch set adding SME
  support to the Linux kernel. Sent a few trivial review comments.
* After conversation with Luis, decided to work on gdbserver and remote
  protocol support for programs which change the SVE vector length during
  execution (native GDB already supports it). This issue will most likely
  be relevant for SME as well. Started by studying Luis' proposal from 2020
  and background information provided by him.

# Basic setup / onboarding

* Bought a work laptop.
* Set up access to the team's machines.

--
Thiago


[ACTIVITY] report week ending 18 Mar

by Peter Maydell

Progress (for a week-and-a-half)
 * UM-2 [QEMU upstream maintainership]
   + Lots of freeze-related work (softfreeze was last week and we
     tagged rc0 this week)
   + Code review of other people's stuff to go into the release
   + Assembling arm pullreqs
   + Investigating an intermittent failure in one of our test cases
     on s390 host, which seems like it may be a bug in the s390
     h/w-accelerated zlib
 * QEMU-420 [GICv4 emulation]
   + Still debugging...

-- PMM


Interest in reproducing gcc-linaro-4.9-2016.02 arm-linux-gnueabihf target under darwin and linux aarch64 host

by jhgorse＠gmail.com

Hello,

I see this release gcc-linaro-4.9-2016.02 for x86_64_arm-linux-gnueabihf:
https://releases.linaro.org/components/toolchain/binaries/4.9-2016.02/arm-l…
and would like to reproduce the toolchain for aarch64 hosts.

I see that it was built with ABE, though I have generally been unsuccessful
in getting ABE to work on aarch64 for this. I was looking for some build or
CI breadcrumbs or documentation. What can you recommend?

The motivation here is to support legacy development/testing from modern
aarch64 hardware.

Cheers,
Joe Gorse


[Activity] Week #10

by Thiago Jung Bauermann

Hello,

# GDB support for ARMv9 Scalable Matrix Extension (SME)

- Synced with Luis Machado to learn what the current status is. Read
  discussions in the linux-arm-kernel mailing list which he pointed to.
- Read Arm architecture documentation about Neon, SVE, SVE2 and SME to
  familiarise myself with these features.
- Basic setup / onboarding
  - Joined some internal and external mailing lists, IRC and Slack
    channels.
  - Read some company policy documents.
  - Researched models and got a quote for a work laptop.
  - Set up aarch64 cross-compilation environment on my laptop.
  - Set up emulated aarch64 machine with Fedora on my laptop.
  - Attempted setting up emulated aarch64 machine with Ubuntu on my
    laptop, but ran into problems with the Ubuntu Server installer.

--
Thiago


[ACTIVITY] week ending Mar. 6 2022

by Alex Bennée

Project Stratos
===============

  - spent some time talking through design approaches for xen
    vhost-master with Viresh

Linux RPMB Sub-system and virtio-driver ([STR-40])

  - continued working on [Linux driver]
  - discovered a bug in vhost-user config handling in QEMU as well

[STR-40] <https://linaro.atlassian.net/browse/STR-40>

[Linux driver]
<http://git.linaro.org/people/alex.bennee/linux.git/shortlog/refs/heads/rpmb…>

QEMU Upstream Work ([UM-2])
===========================

  - posted [PULL 00/18] testing and semihosting updates
    Message-Id: <20220301094715.550871-1-alex.bennee(a)linaro.org>

Other
=====

  - started work on presentation for LTD

Completed Reviews [5/5]
=======================

[PATCH] gdbstub.c: add support for info proc mappings
  Message-Id: <20220221030910.3203063-1-dominik.b.czarnota(a)gmail.com>

[PATCH] tests/Makefile.include: Let "make clean" remove the TCG tests, too
  Message-Id: <20220301085900.1443232-1-thuth(a)redhat.com>

[PATCH 0/3] gdbstub: add support for switchable endianness
  Message-Id: <20210823142004.17935-1-changbin.du(a)gmail.com>

[PATCH 0/6] More record/replay acceptance tests
  Message-Id: <162332427732.194926.7555369160312506539.stgit@pasha-ThinkPad-X280>

[PATCH v6 00/43] CXl 2.0 emulation Support
  Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com>

Absences
========

Current Review Queue
====================

TODO [PATCH v4 00/18] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220301215958.157011-1-richard.henderson(a)linaro.org>
=====================================================================================================================================

TODO [RFC PATCH 00/27] Virtio sound card implementation Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com>
============================================================================================================================

TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================

--
Alex Bennée


[ACTIVITY] report week ending 4 Mar

by Peter Maydell

Progress
 * UM-2 [QEMU upstream maintainership]
   + Looked at and sent patches to fix a minor decode error for
     Neon VLD1/VST1 that RTH found
   + softfreeze is next Tuesday -- sent out last big Arm pullreq
     before freeze, though there will probably need to be another
     smaller one
   + code review, respinning previously sent patches, looking at
     bug reports, all to get things in before freeze
 * QEMU-420 [GICv4 emulation]
   + All the GICv4.0 stuff is now code-complete, but testing and
     loose ends (like plumbing it into the virt board) will take
     a while still.

-- PMM


[weekly][linaro] report week ending 25 Feb

by Peter Maydell

Progress:
 * UM-2 [QEMU upstream maintainership]
   + Respins of a few patchsets that needed v2
   + Looked at a few bugs since softfreeze for 7.0 is near
   + Amazingly my to-review queue is now almost empty
 * QEMU-420 [GICv4 emulation]
   + Implemented more of the redistributor code -- the last missing
     big piece is its handling of VMOVI, though there are also
     probably some loose ends to tidy up
   + Note that this isn't going to be in time for 7.0, so will
     likely go on the back-burner a bit in favour of release-critical
     items

thanks
-- PMM


[ACTIVITY] week ending Feb. 27 2022

by Alex Bennée

Project Stratos
===============

  - spent more time troubleshooting Xen builds with Viresh

Linux RPMB Sub-system and virtio-driver ([STR-40])

  - continued working on [Linux driver]
  - discovered a bug in vhost-user config handling in QEMU as well

[STR-40] <https://linaro.atlassian.net/browse/STR-40>

[Linux driver]
<http://git.linaro.org/people/alex.bennee/linux.git/shortlog/refs/heads/rpmb…>

QEMU Upstream Work ([UM-2])
===========================

  - follow-up on Analysis of slow distro boots in check-avocado
    (BootLinuxAarch64.test_virt_tcg*)
    Message-Id: <874k4xbqvp.fsf(a)linaro.org>
  - posted [PATCH v2 00/18] testing and semihosting pre-PR
    Message-Id: <20220225172021.3493923-1-alex.bennee(a)linaro.org>

[UM-2] <https://linaro.atlassian.net/browse/UM-2>

Current Review Queue
====================

TODO [RFC PATCH 00/27] Virtio sound card implementation Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com>
============================================================================================================================

TODO [PATCH v6 00/43] CXl 2.0 emulation Support Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com>
==============================================================================================================

TODO [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220210040423.95120-1-richard.henderson(a)linaro.org>
====================================================================================================================================

--
Alex Bennée


[ACTIVITY] week ending Feb. 20 2022

by Alex Bennée

Project Stratos
===============

  - spent more time troubleshooting Xen builds with Viresh

Linux RPMB Sub-system and virtio-driver ([STR-40])

  - started working on v2 of the Linux driver

QEMU Upstream Work ([UM-2])
===========================

  - posted Analysis of slow distro boots in check-avocado
    (BootLinuxAarch64.test_virt_tcg*)
    Message-Id: <874k4xbqvp.fsf(a)linaro.org>

[UM-2] <https://linaro.atlassian.net/browse/UM-2>

Current Review Queue
====================

TODO [RFC PATCH 00/27] Virtio sound card implementation Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com>
============================================================================================================================

TODO [PATCH v6 00/43] CXl 2.0 emulation Support Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com>
==============================================================================================================

TODO [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220210040423.95120-1-richard.henderson(a)linaro.org>
====================================================================================================================================

--
Alex Bennée


[ACTIVITY] report week ending 18 Feb

by Peter Maydell

Progress (a report covering two half-weeks)
 * UM-2 [QEMU upstream maintainership]
   - lots of code review
   - fixed another bug in the armv7m clock framework code
   - refactoring patchset to trim some fat from a header that
     gets included by every C file in the build
 * QEMU-420 [GICv4 emulation]
   - CPU interface parts of GICv4 work are code-complete
   - started on the redistributor work

-- PMM


[ACTIVITY] week ending Feb. 13 2022

by Alex Bennée

Project Stratos
===============

  - posted Metadata and signalling channels for Zephyr virtio-backends
    on Xen
    Message-Id: <87h79bgd1m.fsf(a)linaro.org>
  - spent some time troubleshooting Xen builds with Viresh

vhost-device maintainer effort ([UM-196])

  - posted [a pull request in rust-vmm/community]

[a pull request in rust-vmm/community]
<elfeed:github.com#tag:github.com,2008:PullRequestEvent/20180885703>

QEMU Upstream Work ([UM-2])
===========================

  - posted [RFC PATCH] tcg/optimize: only read val after const check
    Message-Id: <20220209112142.3367525-1-alex.bennee(a)linaro.org>
  - posted [PULL 00/28] testing and plugin updates
    Message-Id: <20220209141529.3418384-1-alex.bennee(a)linaro.org>
  - triage for [qemu-x86_64 uses host libraries instead of emulated
    system libraries]
  - triage for [linux-user: substantial memory leak when threads are
    created and destroyed]
  - posted [RFC PATCH] linux-user: trap internal SIGABRT's
    Message-Id: <20220209112207.3368139-1-alex.bennee(a)linaro.org>
  - posted [PATCH v5 0/2] semihosting/next (SYS_HEAPINFO)
    Message-Id: <20220210113021.3799514-2-alex.bennee(a)linaro.org>
  - posted [PATCH v1 00/11] testing/next (docker, lcitool, ci, tcg)
    Message-Id: <20220211160309.335014-1-alex.bennee(a)linaro.org>

[UM-2] <https://linaro.atlassian.net/browse/UM-2>

[qemu-x86_64 uses host libraries instead of emulated system libraries]
<elfeed:gitlab.com#https://gitlab.com/qemu-project/qemu/-/issues/857>

[linux-user: substantial memory leak when threads are created and
destroyed]
<elfeed:gitlab.com#https://gitlab.com/qemu-project/qemu/-/issues/866>

Upstream MTTCG tests ([QEMU-52])

  - still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG
    sanity tests for ARM
    Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>

[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>

Completed Reviews [0/0]
=======================

Absences
========

Current Review Queue
====================

TODO [PATCH v6 00/43] CXl 2.0 emulation Support Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com>
==============================================================================================================

TODO [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220210040423.95120-1-richard.henderson(a)linaro.org>
====================================================================================================================================

TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================

--
Alex Bennée


Re: [TCWG CI] 401.bzip2 grew in size by 9% after llvm: [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`

by Maxim Kuvyrkov

Hi Roman, Your below patch increased code-size of 401.bzip2 by 9% on 32-bit ARM when compiled with -Os. That’s quite a lot, would you please investigate whether this regression can be avoided? Please let me know if this doesn’t reproduce for you and I’ll try to help. Thank you, -- Maxim Kuvyrkov https://www.linaro.org > On 9 Feb 2022, at 17:10, ci_notify(a)linaro.org wrote: > > After llvm commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9 > Author: Roman Lebedev <lebedev.ri(a)gmail.com> > > [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` > > the following benchmarks grew in size by more than 1%: > - 401.bzip2 grew in size by 9% from 37909 to 41405 bytes > - 401.bzip2:[.] BZ2_decompress grew in size by 42% from 7664 to 10864 bytes > - 429.mcf grew in size by 2% from 7732 to 7908 bytes > > Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. > > For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: > - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… > > Configuration: > - Benchmark: SPEC CPU2006 > - Toolchain: Clang + Glibc + LLVM Linker > - Version: all components were built from their tip of trunk > - Target: arm-linux-gnueabihf > - Compiler flags: -Os -mthumb > - Hardware: APM Mustang 8x X-Gene1 > > This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. 
>
> THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os_LTO
> - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os
> - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os_LTO
>
> First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
>
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-77a0da926c9ea86afa9baf28158d79c7678fc6b9
> cd investigate-llvm-77a0da926c9ea86afa9baf28158d79c7678fc6b9
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>
> cd llvm
>
> # Reproduce first_bad build
> git checkout --detach 77a0da926c9ea86afa9baf28158d79c7678fc6b9
> ../artifacts/test.sh
>
> # Reproduce last_good build
> git checkout --detach f59787084e09aeb787cb3be3103b2419ccd14163
> ../artifacts/test.sh
>
> cd ..
> </cut>
>
> Full commit (up to 1000 lines):
> <cut>
> commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9
> Author: Roman Lebedev <lebedev.ri(a)gmail.com>
> Date:   Mon Feb 7 16:03:40 2022 +0300
>
>     [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`
>
>     D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
>     What it essentially does is prevents scalarized vectorization of masked memory operations:
>     ```
>       // TODO: Cost model for emulated masked load/store is completely
>       // broken. This hack guides the cost model to use an artificially
>       // high enough value to practically disable vectorization with such
>       // operations, except where previously deployed legality hack allowed
>       // using very low cost values. This is to avoid regressions coming simply
>       // from moving "masked load/store" check from legality to cost model.
>       // Masked Load/Gather emulation was previously never allowed.
>       // Limited number of Masked Store/Scatter emulation was allowed.
>     ```
>
>     While i don't really understand about what specifically `is completely broken`
>     was talking about, i believe that at least on X86 with AVX2-or-later,
>     this is no longer true. (or at least, i would like to know what is still broken).
>     So i would like to follow suit after D111460, and like wise disable that hack for AVX2+.
>
>     But since this was added for X86 specifically, let's just instead completely remove this hack.
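To make the effect of the removed hack concrete, here is a rough Python model of the decision described in the commit message. It is a simplified sketch, not LLVM code: the predicate mirrors the `return` expression of the removed `useEmulatedMaskMemRefHack()` shown in the diff further down, the constant 3000000 and the default threshold of 1 (`vectorize-num-stores-pred`) are taken from that diff, and all Python names (`scalarization_cost`, `hack_enabled`, and so on) are hypothetical. The real function also asserts that the instruction is predicated, which this model ignores.

```python
HACK_COST = 3_000_000  # cost the removed code assigned to "practically disable" vectorization


def use_emulated_mask_memref_hack(is_load: bool, is_store: bool,
                                  num_pred_stores: int,
                                  max_stores_to_predicate: int = 1) -> bool:
    """Model of the removed predicate: every predicated load, and any
    predicated store once the loop holds more such stores than the limit."""
    return is_load or (is_store and num_pred_stores > max_stores_to_predicate)


def scalarization_cost(real_cost: int, *, is_load: bool = False,
                       is_store: bool = False, num_pred_stores: int = 0,
                       hack_enabled: bool = True) -> int:
    """Return the cost the vectorizer would see for one emulated masked access."""
    if hack_enabled and use_emulated_mask_memref_hack(
            is_load, is_store, num_pred_stores):
        return HACK_COST  # before the commit: the real estimate is discarded
    return real_cost      # after the commit: the real estimate is used


# 5 is the post-commit SSE2 estimate for VF 4 in the first test diff below.
print(scalarization_cost(5, is_load=True, hack_enabled=True))   # before the commit
print(scalarization_cost(5, is_load=True, hack_enabled=False))  # after the commit
```

With the hack gone, predicated loads receive ordinary cost estimates, so loops containing them can now be selected for vectorization where they previously could not. That is one plausible route to the larger -Os code reported above, though the model itself does not prove it.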
>
>     Reviewed By: RKSimon
>
>     Differential Revision: https://reviews.llvm.org/D114779
> ---
>  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp    |   34 +-
>  .../X86/masked-gather-i32-with-i8-index.ll         |   40 +-
>  .../X86/masked-gather-i64-with-i8-index.ll         |   40 +-
>  .../CostModel/X86/masked-interleaved-load-i16.ll   |   36 +-
>  .../CostModel/X86/masked-interleaved-store-i16.ll  |   24 +-
>  .../test/Analysis/CostModel/X86/masked-load-i16.ll |   46 +-
>  .../test/Analysis/CostModel/X86/masked-load-i32.ll |   16 +-
>  .../test/Analysis/CostModel/X86/masked-load-i64.ll |   16 +-
>  llvm/test/Analysis/CostModel/X86/masked-load-i8.ll |   46 +-
>  .../AArch64/tail-fold-uniform-memops.ll            |  159 ++-
>  .../Transforms/LoopVectorize/X86/gather_scatter.ll | 1176 ++++++++++++++++----
>  .../X86/x86-interleaved-accesses-masked-group.ll   | 1041 ++++++++---------
>  .../Transforms/LoopVectorize/if-pred-stores.ll     |    6 +-
>  .../Transforms/LoopVectorize/memdep-fold-tail.ll   |    6 +-
>  llvm/test/Transforms/LoopVectorize/optsize.ll      |  837 +++++++++++---
>  llvm/test/Transforms/LoopVectorize/tripcount.ll    |  673 ++++++++++-
>  .../LoopVectorize/vplan-sink-scalars-and-merge.ll  |    4 +-
>  17 files changed, 3064 insertions(+), 1136 deletions(-)
>
> diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> index bfe08d42c883..ccce2c2a7b15 100644
> --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> @@ -307,11 +307,6 @@ static cl::opt<bool> InterleaveSmallLoopScalarReduction(
>      cl::desc("Enable interleaving for loops with small iteration counts that "
>               "contain scalar reductions to expose ILP."));
>
> -/// The number of stores in a loop that are allowed to need predication.
> -static cl::opt<unsigned> NumberOfStoresToPredicate(
> -    "vectorize-num-stores-pred", cl::init(1), cl::Hidden,
> -    cl::desc("Max number of stores to be predicated behind an if."));
> -
>  static cl::opt<bool> EnableIndVarRegisterHeur(
>      "enable-ind-var-reg-heur", cl::init(true), cl::Hidden,
>      cl::desc("Count the induction variable only once when interleaving"));
> @@ -1797,10 +1792,6 @@ private:
>    /// as a vector operation.
>    bool isConsecutiveLoadOrStore(Instruction *I);
>
> -  /// Returns true if an artificially high cost for emulated masked memrefs
> -  /// should be used.
> -  bool useEmulatedMaskMemRefHack(Instruction *I, ElementCount VF);
> -
>    /// Map of scalar integer values to the smallest bitwidth they can be legally
>    /// represented as. The vector equivalents of these values should be truncated
>    /// to this type.
> @@ -6437,22 +6428,6 @@ LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) {
>    return RUs;
>  }
>
> -bool LoopVectorizationCostModel::useEmulatedMaskMemRefHack(Instruction *I,
> -                                                           ElementCount VF) {
> -  // TODO: Cost model for emulated masked load/store is completely
> -  // broken. This hack guides the cost model to use an artificially
> -  // high enough value to practically disable vectorization with such
> -  // operations, except where previously deployed legality hack allowed
> -  // using very low cost values. This is to avoid regressions coming simply
> -  // from moving "masked load/store" check from legality to cost model.
> -  // Masked Load/Gather emulation was previously never allowed.
> -  // Limited number of Masked Store/Scatter emulation was allowed.
> -  assert(isPredicatedInst(I, VF) && "Expecting a scalar emulated instruction");
> -  return isa<LoadInst>(I) ||
> -         (isa<StoreInst>(I) &&
> -          NumPredStores > NumberOfStoresToPredicate);
> -}
> -
>  void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
>    // If we aren't vectorizing the loop, or if we've already collected the
>    // instructions to scalarize, there's nothing to do. Collection may already
> @@ -6478,9 +6453,7 @@ void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
>          ScalarCostsTy ScalarCosts;
>          // Do not apply discount if scalable, because that would lead to
>          // invalid scalarization costs.
> -        // Do not apply discount logic if hacked cost is needed
> -        // for emulated masked memrefs.
> -        if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I, VF) &&
> +        if (!VF.isScalable() &&
>              computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
>            ScalarCostsVF.insert(ScalarCosts.begin(), ScalarCosts.end());
>          // Remember that BB will remain after vectorization.
> @@ -6736,11 +6709,6 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
>          Vec_i1Ty, APInt::getAllOnes(VF.getKnownMinValue()),
>          /*Insert=*/false, /*Extract=*/true);
>      Cost += TTI.getCFInstrCost(Instruction::Br, TTI::TCK_RecipThroughput);
> -
> -    if (useEmulatedMaskMemRefHack(I, VF))
> -      // Artificially setting to a high enough value to practically disable
> -      // vectorization with such operations.
> - Cost = 3000000; > } > > return Cost; > diff --git a/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll b/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll > index 62412a5d1af0..c52755b7d65c 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll > @@ -17,30 +17,30 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: 
%valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 9 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX1: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For 
instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 9 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > @@ -50,8 +50,8 @@ target triple = "x86_64-unknown-linux-gnu" > ; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; AVX512: LV: Found an estimated cost of 10 for VF 2 For instruction: %valB.loaded = load 
i32, i32* %inB, align 4 > -; AVX512: LV: Found an estimated cost of 22 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; AVX512: LV: Found an estimated cost of 11 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX512: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX512: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll b/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll > index b8eba8b0327b..b38026c824b5 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll > @@ -17,30 +17,30 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, 
align 8 > +; SSE2: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load 
i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX1: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* 
%inB, align 8 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > @@ -50,8 +50,8 @@ target triple = "x86_64-unknown-linux-gnu" > ; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX512: LV: Found an estimated cost of 10 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; AVX512: LV: Found an estimated cost of 24 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; AVX512: LV: Found an estimated cost of 12 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX512: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX512: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll > index d6bfdf9d3848..184e23a0128b 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll > @@ -89,30 +89,30 @@ for.end: > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* 
%arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: 
%i2 = load i16, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > > ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2" > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2 > @@ -164,17 +164,17 @@ for.end: > ; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test" > ; > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, 
i16* %arrayidx6, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > > ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test" > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 7 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 9 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 9 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 
For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2 > > define void @test(i16* noalias nocapture %points, i16* noalias nocapture readonly %x, i16* noalias nocapture readnone %y) { > diff --git a/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll > index 5f67026737fc..224dd75a4dc5 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll > @@ -89,17 +89,17 @@ for.end: > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %0, i16* %arrayidx2, align 2 > ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 5 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > -; 
DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 23 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 50 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2 > +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > > ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2" > ; > @@ -107,16 +107,16 @@ for.end: > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 10 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: 
store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 27 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > > define void @test2(i16* noalias nocapture %points, i32 %numPoints, i16* noalias nocapture readonly %x, i16* noalias nocapture readonly %y) { > entry: > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > index c8c3078f1625..2722a52c3d96 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > @@ -16,37 +16,37 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 2 
for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 
3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 17 for VF 16 
For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll > index f74c9f044d0b..16c00cfc03b5 100644 
> --- a/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll > @@ -16,16 +16,16 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE2: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 
4 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > +; SSE42: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > ; AVX1: LV: Found an estimated cost of 3 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll > index c5a7825348e9..1baeff242304 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll > @@ -16,16 +16,16 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE2: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = 
load i64, i64* %inB, align 8 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > +; SSE42: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > ; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll > index fc540da58700..99d0f28a03f8 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll > @@ -16,37 +16,37 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 
4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE2: LV: Found an estimated cost of 23 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; SSE42: LV: Found an estimated cost of 23 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: 
%valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX1: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: 
LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > ; AVX512: LV: 
Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1 > diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll b/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll > index bf0aba1931d1..8ce310962b48 100644 > --- a/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll > +++ b/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll > @@ -1,3 +1,4 @@ > +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py > ; RUN: opt -loop-vectorize -scalable-vectorization=off -force-vector-width=4 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s | FileCheck %s > > ; NOTE: These tests aren't really target-specific, but it's convenient to target AArch64 > @@ -9,21 +10,43 @@ target triple = "aarch64-linux-gnu" > ; we don't artificially create new predicated blocks for the load. > define void @uniform_load(i32* noalias %dst, i32* noalias readonly %src, i64 %n) #0 { > ; CHECK-LABEL: @uniform_load( > +; CHECK-NEXT: entry: > +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] > +; CHECK: vector.ph: > +; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], 3 > +; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4 > +; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]] > +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] > ; CHECK: vector.body: > -; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ] > -; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0 > -; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n) > -; CHECK-NEXT: [[LOAD_VAL:%.*]] = load i32, i32* %src, align 4 > -; CHECK-NOT: load i32, i32* %src, align 4 > -; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[LOAD_VAL]], i32 0 > -; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> zeroinitializer > -; 
CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* %dst, i64 [[TMP3]] > -; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i32 0 > -; CHECK-NEXT: [[STORE_PTR:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>* > -; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[TMP5]], <4 x i32>* [[STORE_PTR]], i32 4, <4 x i1> [[LOOP_PRED]]) > -; CHECK-NEXT: [[IDX_NEXT]] = add i64 [[IDX]], 4 > -; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[IDX_NEXT]], %n.vec > -; CHECK-NEXT: br i1 [[CMP]], label %middle.block, label %vector.body > +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] > +; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0 > +; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP0]], i64 [[N]]) > +; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[SRC:%.*]], align 4 > +; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0 > +; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer > +; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 [[TMP0]] > +; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i32 0 > +; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>* > +; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[BROADCAST_SPLAT]], <4 x i32>* [[TMP4]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]]) > +; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4 > +; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] > +; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > +; CHECK: middle.block: > +; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]] > +; CHECK: scalar.ph: > +; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] > +; 
CHECK-NEXT: br label [[FOR_BODY:%.*]] > +; CHECK: for.body: > +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ] > +; CHECK-NEXT: [[VAL:%.*]] = load i32, i32* [[SRC]], align 4 > +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDVARS_IV]] > +; CHECK-NEXT: store i32 [[VAL]], i32* [[ARRAYIDX]], align 4 > +; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1 > +; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]] > +; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]] > +; CHECK: for.end: > +; CHECK-NEXT: ret void > +; > > entry: > br label %for.body > @@ -47,18 +70,108 @@ for.end: ; preds = %for.body, %entry > ; and the original condition. > define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 { > ; CHECK-LABEL: @cond_uniform_load( > +; CHECK-NEXT: entry: > +; CHECK-NEXT: [[DST1:%.*]] = bitcast i32* [[DST:%.*]] to i8* > +; CHECK-NEXT: [[COND3:%.*]] = bitcast i32* [[COND:%.*]] to i8* > +; CHECK-NEXT: [[SRC6:%.*]] = bitcast i32* [[SRC:%.*]] to i8* > +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]] > +; CHECK: vector.memcheck: > +; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[DST]], i64 [[N:%.*]] > +; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8* > +; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[COND]], i64 [[N]] > +; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8* > +; CHECK-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32* [[SRC]], i64 1 > +; CHECK-NEXT: [[SCEVGEP78:%.*]] = bitcast i32* [[SCEVGEP7]] to i8* > +; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[DST1]], [[SCEVGEP45]] > +; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[COND3]], [[SCEVGEP2]] > +; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], 
[[BOUND1]] > +; CHECK-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[DST1]], [[SCEVGEP78]] > +; CHECK-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[SRC6]], [[SCEVGEP2]] > +; CHECK-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]] > +; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]] > +; CHECK-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]] > ; CHECK: vector.ph: > -; CHECK: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* %src, i32 0 > -; CHECK-NEXT: [[SRC_SPLAT:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer > +; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], 3 > +; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4 > +; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]] > +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] > ; CHECK: vector.body: > -; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ] > -; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0 > -; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n) > -; CHECK: [[COND_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* {{%.*}}, i32 4, <4 x i1> [[LOOP_PRED]], <4 x i32> poison) > -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[COND_LOAD]], zeroinitializer > +; CHECK-NEXT: [[INDEX12:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT19:%.*]], [[PRED_LOAD_CONTINUE18:%.*]] ] > +; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX12]], 0 > +; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP0]], i64 [[N]]) > +; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[COND]], i64 [[TMP0]] > +; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0 > +; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>* > +; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], 
i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison), !alias.scope !4 > +; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[WIDE_MASKED_LOAD]], zeroinitializer > ; CHECK-NEXT: [[TMP5:%.*]] = xor <4 x i1> [[TMP4]], <i1 true, i1 true, i1 true, i1 true> > -; CHECK-NEXT: [[MASK:%.*]] = select <4 x i1> [[LOOP_PRED]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer > -; CHECK-NEXT: call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[SRC_SPLAT]], i32 4, <4 x i1> [[MASK]], <4 x i32> undef) > +; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer > +; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP6]], i32 0 > +; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]] > +; CHECK: pred.load.if: > +; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7 > +; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP8]], i32 0 > +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]] > +; CHECK: pred.load.continue: > +; CHECK-NEXT: [[TMP10:%.*]] = phi <4 x i32> [ poison, [[VECTOR_BODY]] ], [ [[TMP9]], [[PRED_LOAD_IF]] ] > +; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i1> [[TMP6]], i32 1 > +; CHECK-NEXT: br i1 [[TMP11]], label [[PRED_LOAD_IF13:%.*]], label [[PRED_LOAD_CONTINUE14:%.*]] > +; CHECK: pred.load.if13: > +; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7 > +; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP12]], i32 1 > +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE14]] > +; CHECK: pred.load.continue14: > +; CHECK-NEXT: [[TMP14:%.*]] = phi <4 x i32> [ [[TMP10]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP13]], [[PRED_LOAD_IF13]] ] > +; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i1> [[TMP6]], i32 2 > +; CHECK-NEXT: br i1 [[TMP15]], label [[PRED_LOAD_IF15:%.*]], label [[PRED_LOAD_CONTINUE16:%.*]] > +; CHECK: pred.load.if15: > +; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* [[SRC]], align 4, 
!alias.scope !7 > +; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x i32> [[TMP14]], i32 [[TMP16]], i32 2 > +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE16]] > +; CHECK: pred.load.continue16: > +; CHECK-NEXT: [[TMP18:%.*]] = phi <4 x i32> [ [[TMP14]], [[PRED_LOAD_CONTINUE14]] ], [ [[TMP17]], [[PRED_LOAD_IF15]] ] > +; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i1> [[TMP6]], i32 3 > +; CHECK-NEXT: br i1 [[TMP19]], label [[PRED_LOAD_IF17:%.*]], label [[PRED_LOAD_CONTINUE18]] > +; CHECK: pred.load.if17: > +; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7 > +; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i32> [[TMP18]], i32 [[TMP20]], i32 3 > +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE18]] > +; CHECK: pred.load.continue18: > +; CHECK-NEXT: [[TMP22:%.*]] = phi <4 x i32> [ [[TMP18]], [[PRED_LOAD_CONTINUE16]] ], [ [[TMP21]], [[PRED_LOAD_IF17]] ] > +; CHECK-NEXT: [[TMP23:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i1> [[TMP4]], <4 x i1> zeroinitializer > +; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP23]], <4 x i32> zeroinitializer, <4 x i32> [[TMP22]] > +; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[TMP0]] > +; CHECK-NEXT: [[TMP25:%.*]] = or <4 x i1> [[TMP6]], [[TMP23]] > +; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, i32* [[TMP24]], i32 0 > +; CHECK-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP26]] to <4 x i32>* > +; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[PREDPHI]], <4 x i32>* [[TMP27]], i32 4, <4 x i1> [[TMP25]]), !alias.scope !9, !noalias !11 > +; CHECK-NEXT: [[INDEX_NEXT19]] = add i64 [[INDEX12]], 4 > +; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT19]], [[N_VEC]] > +; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]] > +; CHECK: middle.block: > +; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]] > +; CHECK: scalar.ph: > +; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 
[[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ] > +; CHECK-NEXT: br label [[FOR_BODY:%.*]] > +; CHECK: for.body: > +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[IF_END:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] > +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[COND]], i64 [[INDEX]] > +; CHECK-NEXT: [[TMP29:%.*]] = load i32, i32* [[ARRAYIDX]], align 4 > +; CHECK-NEXT: [[TOBOOL_NOT:%.*]] = icmp eq i32 [[TMP29]], 0 > +; CHECK-NEXT: br i1 [[TOBOOL_NOT]], label [[IF_END]], label [[IF_THEN:%.*]] > +; CHECK: if.then: > +; CHECK-NEXT: [[TMP30:%.*]] = load i32, i32* [[SRC]], align 4 > +; CHECK-NEXT: br label [[IF_END]] > +; CHECK: if.end: > +; CHECK-NEXT: [[VAL_0:%.*]] = phi i32 [ [[TMP30]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ] > +; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDEX]] > +; CHECK-NEXT: store i32 [[VAL_0]], i32* [[ARRAYIDX1]], align 4 > +; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1 > +; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N]] > +; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]] > +; CHECK: for.end: > +; CHECK-NEXT: ret void > +; > entry: > br label %for.body > > diff --git a/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll b/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll > index def98e03030f..d13942e85466 100644 > --- a/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll > +++ b/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll > @@ -25,22 +25,22 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: iter.check: > ; AVX512-NEXT: br label [[VECTOR_BODY:%.*]] > ; AVX512: vector.body: > -; AVX512-NEXT: [[INDEX8:%.*]] = phi i64 [ 0, [[ITER_CHECK:%.*]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY]] ] > -; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 
[[INDEX8]] > +; AVX512-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ITER_CHECK:%.*]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY]] ] > +; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32>* [[TMP1]], align 4 > ; AVX512-NEXT: [[TMP2:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD]], zeroinitializer > -; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX8]] > +; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP4]], i32 4, <16 x i1> [[TMP2]], <16 x i32> poison) > ; AVX512-NEXT: [[TMP5:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD]] to <16 x i64> > ; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <16 x i64> [[TMP5]] > ; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP6]], i32 4, <16 x i1> [[TMP2]], <16 x float> undef) > ; AVX512-NEXT: [[TMP7:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01> > -; AVX512-NEXT: [[TMP8:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX8]] > +; AVX512-NEXT: [[TMP8:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP8]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP7]], <16 x float>* [[TMP9]], i32 4, <16 x i1> [[TMP2]]) 
> -; AVX512-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX8]], 16 > +; AVX512-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX7]], 16 > ; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT]] > ; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_1:%.*]] = load <16 x i32>, <16 x i32>* [[TMP11]], align 4 > @@ -55,7 +55,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP18:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT]] > ; AVX512-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP18]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP17]], <16 x float>* [[TMP19]], i32 4, <16 x i1> [[TMP12]]) > -; AVX512-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX8]], 32 > +; AVX512-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX7]], 32 > ; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_1]] > ; AVX512-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_2:%.*]] = load <16 x i32>, <16 x i32>* [[TMP21]], align 4 > @@ -70,7 +70,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_1]] > ; AVX512-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP27]], <16 x float>* [[TMP29]], i32 4, <16 x i1> [[TMP22]]) > -; AVX512-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX8]], 48 > +; AVX512-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX7]], 48 > ; AVX512-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_2]] > ; AVX512-NEXT: [[TMP31:%.*]] = bitcast i32* [[TMP30]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_3:%.*]] = load <16 x i32>, <16 x i32>* [[TMP31]], align 4 > @@ -85,7 +85,7 @@ define void 
@foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_2]] > ; AVX512-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP37]], <16 x float>* [[TMP39]], i32 4, <16 x i1> [[TMP32]]) > -; AVX512-NEXT: [[INDEX_NEXT_3]] = add nuw nsw i64 [[INDEX8]], 64 > +; AVX512-NEXT: [[INDEX_NEXT_3]] = add nuw nsw i64 [[INDEX7]], 64 > ; AVX512-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT_3]], 4096 > ; AVX512-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > ; AVX512: for.end: > @@ -95,8 +95,8 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: entry: > ; FVW2-NEXT: br label [[VECTOR_BODY:%.*]] > ; FVW2: vector.body: > -; FVW2-NEXT: [[INDEX17:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] > -; FVW2-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX17]] > +; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE27:%.*]] ] > +; FVW2-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX7]] > ; FVW2-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <2 x i32>* > ; FVW2-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4 > ; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2 > @@ -112,7 +112,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: [[TMP9:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD8]], zeroinitializer > ; FVW2-NEXT: [[TMP10:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD9]], zeroinitializer > ; FVW2-NEXT: [[TMP11:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD10]], zeroinitializer > -; FVW2-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX17]] > +; FVW2-NEXT: 
[[TMP12:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX7]] > ; FVW2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <2 x i32>* > ; FVW2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP13]], i32 4, <2 x i1> [[TMP8]], <2 x i32> poison) > ; FVW2-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP12]], i64 2 > @@ -128,33 +128,105 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: [[TMP21:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD11]] to <2 x i64> > ; FVW2-NEXT: [[TMP22:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD12]] to <2 x i64> > ; FVW2-NEXT: [[TMP23:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD13]] to <2 x i64> > -; FVW2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <2 x i64> [[TMP20]] > -; FVW2-NEXT: [[TMP25:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP21]] > -; FVW2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP22]] > -; FVW2-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP23]] > -; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP24]], i32 4, <2 x i1> [[TMP8]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER14:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP25]], i32 4, <2 x i1> [[TMP9]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER15:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP26]], i32 4, <2 x i1> [[TMP10]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER16:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP27]], i32 4, <2 x i1> [[TMP11]], <2 x float> undef) > -; FVW2-NEXT: [[TMP28:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP29:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER14]], <float 5.000000e-01, float 
5.000000e-01> > -; FVW2-NEXT: [[TMP30:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER15]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP31:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER16]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP32:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX17]] > -; FVW2-NEXT: [[TMP33:%.*]] = bitcast float* [[TMP32]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP28]], <2 x float>* [[TMP33]], i32 4, <2 x i1> [[TMP8]]) > -; FVW2-NEXT: [[TMP34:%.*]] = getelementptr float, float* [[TMP32]], i64 2 > -; FVW2-NEXT: [[TMP35:%.*]] = bitcast float* [[TMP34]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP29]], <2 x float>* [[TMP35]], i32 4, <2 x i1> [[TMP9]]) > -; FVW2-NEXT: [[TMP36:%.*]] = getelementptr float, float* [[TMP32]], i64 4 > -; FVW2-NEXT: [[TMP37:%.*]] = bitcast float* [[TMP36]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP30]], <2 x float>* [[TMP37]], i32 4, <2 x i1> [[TMP10]]) > -; FVW2-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[TMP32]], i64 6 > -; FVW2-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP31]], <2 x float>* [[TMP39]], i32 4, <2 x i1> [[TMP11]]) > -; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX17]], 8 > -; FVW2-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096 > -; FVW2-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > +; FVW2-NEXT: [[TMP24:%.*]] = extractelement <2 x i1> [[TMP8]], i64 0 > +; FVW2-NEXT: br i1 [[TMP24]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]] > +; FVW2: pred.load.if: > +; FVW2-NEXT: [[TMP25:%.*]] = extractelement <2 x i64> [[TMP20]], i64 0 > +; FVW2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], i64 [[TMP25]] > +; FVW2-NEXT: 
[[TMP27:%.*]] = load float, float* [[TMP26]], align 4 > +; FVW2-NEXT: [[TMP28:%.*]] = insertelement <2 x float> poison, float [[TMP27]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]] > +; FVW2: pred.load.continue: > +; FVW2-NEXT: [[TMP29:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP28]], [[PRED_LOAD_IF]] ] > +; FVW2-NEXT: [[TMP30:%.*]] = extractelement <2 x i1> [[TMP8]], i64 1 > +; FVW2-NEXT: br i1 [[TMP30]], label [[PRED_LOAD_IF14:%.*]], label [[PRED_LOAD_CONTINUE15:%.*]] > +; FVW2: pred.load.if14: > +; FVW2-NEXT: [[TMP31:%.*]] = extractelement <2 x i64> [[TMP20]], i64 1 > +; FVW2-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP31]] > +; FVW2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4 > +; FVW2-NEXT: [[TMP34:%.*]] = insertelement <2 x float> [[TMP29]], float [[TMP33]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE15]] > +; FVW2: pred.load.continue15: > +; FVW2-NEXT: [[TMP35:%.*]] = phi <2 x float> [ [[TMP29]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP34]], [[PRED_LOAD_IF14]] ] > +; FVW2-NEXT: [[TMP36:%.*]] = extractelement <2 x i1> [[TMP9]], i64 0 > +; FVW2-NEXT: br i1 [[TMP36]], label [[PRED_LOAD_IF16:%.*]], label [[PRED_LOAD_CONTINUE17:%.*]] > +; FVW2: pred.load.if16: > +; FVW2-NEXT: [[TMP37:%.*]] = extractelement <2 x i64> [[TMP21]], i64 0 > +; FVW2-NEXT: [[TMP38:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP37]] > +; FVW2-NEXT: [[TMP39:%.*]] = load float, float* [[TMP38]], align 4 > +; FVW2-NEXT: [[TMP40:%.*]] = insertelement <2 x float> poison, float [[TMP39]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE17]] > +; FVW2: pred.load.continue17: > +; FVW2-NEXT: [[TMP41:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE15]] ], [ [[TMP40]], [[PRED_LOAD_IF16]] ] > +; FVW2-NEXT: [[TMP42:%.*]] = extractelement <2 x i1> [[TMP9]], i64 1 > +; FVW2-NEXT: br i1 [[TMP42]], label [[PRED_LOAD_IF18:%.*]], label [[PRED_LOAD_CONTINUE19:%.*]] > +; FVW2: pred.load.if18: > +; FVW2-NEXT: 
[[TMP43:%.*]] = extractelement <2 x i64> [[TMP21]], i64 1 > +; FVW2-NEXT: [[TMP44:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP43]] > +; FVW2-NEXT: [[TMP45:%.*]] = load float, float* [[TMP44]], align 4 > +; FVW2-NEXT: [[TMP46:%.*]] = insertelement <2 x float> [[TMP41]], float [[TMP45]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE19]] > +; FVW2: pred.load.continue19: > +; FVW2-NEXT: [[TMP47:%.*]] = phi <2 x float> [ [[TMP41]], [[PRED_LOAD_CONTINUE17]] ], [ [[TMP46]], [[PRED_LOAD_IF18]] ] > +; FVW2-NEXT: [[TMP48:%.*]] = extractelement <2 x i1> [[TMP10]], i64 0 > +; FVW2-NEXT: br i1 [[TMP48]], label [[PRED_LOAD_IF20:%.*]], label [[PRED_LOAD_CONTINUE21:%.*]] > +; FVW2: pred.load.if20: > +; FVW2-NEXT: [[TMP49:%.*]] = extractelement <2 x i64> [[TMP22]], i64 0 > +; FVW2-NEXT: [[TMP50:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP49]] > +; FVW2-NEXT: [[TMP51:%.*]] = load float, float* [[TMP50]], align 4 > +; FVW2-NEXT: [[TMP52:%.*]] = insertelement <2 x float> poison, float [[TMP51]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE21]] > +; FVW2: pred.load.continue21: > +; FVW2-NEXT: [[TMP53:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE19]] ], [ [[TMP52]], [[PRED_LOAD_IF20]] ] > +; FVW2-NEXT: [[TMP54:%.*]] = extractelement <2 x i1> [[TMP10]], i64 1 > +; FVW2-NEXT: br i1 [[TMP54]], label [[PRED_LOAD_IF22:%.*]], label [[PRED_LOAD_CONTINUE23:%.*]] > +; FVW2: pred.load.if22: > +; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i64> [[TMP22]], i64 1 > +; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP55]] > +; FVW2-NEXT: [[TMP57:%.*]] = load float, float* [[TMP56]], align 4 > +; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> [[TMP53]], float [[TMP57]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE23]] > +; FVW2: pred.load.continue23: > +; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ [[TMP53]], [[PRED_LOAD_CONTINUE21]] ], [ [[TMP58]], [[PRED_LOAD_IF22]] ] > +; FVW2-NEXT: 
[[TMP60:%.*]] = extractelement <2 x i1> [[TMP11]], i64 0 > +; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF24:%.*]], label [[PRED_LOAD_CONTINUE25:%.*]] > +; FVW2: pred.load.if24: > +; FVW2-NEXT: [[TMP61:%.*]] = extractelement <2 x i64> [[TMP23]], i64 0 > +; FVW2-NEXT: [[TMP62:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP61]] > +; FVW2-NEXT: [[TMP63:%.*]] = load float, float* [[TMP62]], align 4 > +; FVW2-NEXT: [[TMP64:%.*]] = insertelement <2 x float> poison, float [[TMP63]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE25]] > +; FVW2: pred.load.continue25: > +; FVW2-NEXT: [[TMP65:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE23]] ], [ [[TMP64]], [[PRED_LOAD_IF24]] ] > +; FVW2-NEXT: [[TMP66:%.*]] = extractelement <2 x i1> [[TMP11]], i64 1 > +; FVW2-NEXT: br i1 [[TMP66]], label [[PRED_LOAD_IF26:%.*]], label [[PRED_LOAD_CONTINUE27]] > +; FVW2: pred.load.if26: > +; FVW2-NEXT: [[TMP67:%.*]] = extractelement <2 x i64> [[TMP23]], i64 1 > +; FVW2-NEXT: [[TMP68:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP67]] > +; FVW2-NEXT: [[TMP69:%.*]] = load float, float* [[TMP68]], align 4 > +; FVW2-NEXT: [[TMP70:%.*]] = insertelement <2 x float> [[TMP65]], float [[TMP69]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE27]] > +; FVW2: pred.load.continue27: > +; FVW2-NEXT: [[TMP71:%.*]] = phi <2 x float> [ [[TMP65]], [[PRED_LOAD_CONTINUE25]] ], [ [[TMP70]], [[PRED_LOAD_IF26]] ] > +; FVW2-NEXT: [[TMP72:%.*]] = fadd <2 x float> [[TMP35]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP73:%.*]] = fadd <2 x float> [[TMP47]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP74:%.*]] = fadd <2 x float> [[TMP59]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP71]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP76:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX7]] > +; FVW2-NEXT: [[TMP77:%.*]] = bitcast float* [[TMP76]] to <2 x 
float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP72]], <2 x float>* [[TMP77]], i32 4, <2 x i1> [[TMP8]]) > +; FVW2-NEXT: [[TMP78:%.*]] = getelementptr float, float* [[TMP76]], i64 2 > +; FVW2-NEXT: [[TMP79:%.*]] = bitcast float* [[TMP78]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP73]], <2 x float>* [[TMP79]], i32 4, <2 x i1> [[TMP9]]) > +; FVW2-NEXT: [[TMP80:%.*]] = getelementptr float, float* [[TMP76]], i64 4 > +; FVW2-NEXT: [[TMP81:%.*]] = bitcast float* [[TMP80]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP74]], <2 x float>* [[TMP81]], i32 4, <2 x i1> [[TMP10]]) > +; FVW2-NEXT: [[TMP82:%.*]] = getelementptr float, float* [[TMP76]], i64 6 > +; FVW2-NEXT: [[TMP83:%.*]] = bitcast float* [[TMP82]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP75]], <2 x float>* [[TMP83]], i32 4, <2 x i1> [[TMP11]]) > +; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX7]], 8 > +; FVW2-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096 > +; FVW2-NEXT: br i1 [[TMP84]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > ; FVW2: for.end: > ; FVW2-NEXT: ret void > ; > @@ -365,40 +437,186 @@ define void @foo2(%struct.In* noalias %in, float* noalias %out, i32* noalias %tr > ; FVW2-NEXT: entry: > ; FVW2-NEXT: br label [[VECTOR_BODY:%.*]] > ; FVW2: vector.body: > -; FVW2-NEXT: [[INDEX10:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE9:%.*]] ] > -; FVW2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 16>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE9]] ] > -; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX10]], 4 > +; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE35:%.*]] ] > +; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX7]], 4 > ; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 
16 > -; FVW2-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]] > -; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]] > -; FVW2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4 > -; FVW2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4 > -; FVW2-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i64 0 > -; FVW2-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP4]], i64 1 > -; FVW2-NEXT: [[TMP7:%.*]] = icmp sgt <2 x i32> [[TMP6]], zeroinitializer > -; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], <2 x i64> [[VEC_IND]], i32 1 > -; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP8]], i32 4, <2 x i1> [[TMP7]], <2 x float> undef) > -; FVW2-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP7]], i64 0 > -; FVW2-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]] > +; FVW2-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 32 > +; FVW2-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 48 > +; FVW2-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 64 > +; FVW2-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 80 > +; FVW2-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 96 > +; FVW2-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 112 > +; FVW2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]] > +; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]] > +; FVW2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]] > +; FVW2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]] > +; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]] > +; FVW2-NEXT: [[TMP12:%.*]] = getelementptr inbounds 
i32, i32* [[TRIGGER]], i64 [[TMP4]] > +; FVW2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP5]] > +; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP6]] > +; FVW2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4 > +; FVW2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4 > +; FVW2-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i64 0 > +; FVW2-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP16]], i64 1 > +; FVW2-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP9]], align 4 > +; FVW2-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP10]], align 4 > +; FVW2-NEXT: [[TMP21:%.*]] = insertelement <2 x i32> poison, i32 [[TMP19]], i64 0 > </cut>
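
For readers following the CHECK lines above, here is a rough Python sketch of what the `foo1` test loop appears to compute, reconstructed from the IR (compare `trigger` against zero, load `index` under that mask, gather from `in`, add 0.5, store to `out` under the same mask). The function names are illustrative only, and the "predicated" variant merely mimics the per-lane `pred.load.if` / `pred.load.continue` structure that replaces `llvm.masked.gather` in the FVW2 output; it is not taken from the patch.

```python
def foo1_scalar(inp, out, trigger, index, n):
    """Scalar form of the loop: conditional indexed load, add 0.5, store."""
    for i in range(n):
        if trigger[i] > 0:
            out[i] = inp[index[i]] + 0.5


def masked_gather(base, indices, mask, passthru):
    """Lane-wise gather: active lanes load base[idx], inactive keep passthru."""
    return [base[idx] if m else p for idx, m, p in zip(indices, mask, passthru)]


def foo1_gather(inp, out, trigger, index, n, vf=2):
    """Vector form using a masked gather (n is assumed to be a multiple of vf)."""
    for i in range(0, n, vf):
        mask = [trigger[i + lane] > 0 for lane in range(vf)]
        # Masked load of the indices: inactive lanes get a dummy index of 0.
        idx = [index[i + lane] if mask[lane] else 0 for lane in range(vf)]
        vals = masked_gather(inp, idx, mask, [None] * vf)
        for lane in range(vf):
            if mask[lane]:  # masked store
                out[i + lane] = vals[lane] + 0.5


def foo1_predicated(inp, out, trigger, index, n, vf=2):
    """Vector form with the gather scalarized into one guarded load per lane."""
    for i in range(0, n, vf):
        mask = [trigger[i + lane] > 0 for lane in range(vf)]
        vals = [None] * vf
        for lane in range(vf):
            if mask[lane]:  # corresponds to a pred.load.if block
                vals[lane] = inp[index[i + lane]]
            # falling through corresponds to pred.load.continue
        for lane in range(vf):
            if mask[lane]:  # masked store
                out[i + lane] = vals[lane] + 0.5
```

All three variants should produce the same `out` array; the difference the diff shows is only in how the conditional loads are emitted.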


[ACTIVITY] week ending Feb. 6 2022

by Alex Bennée

Project Stratos
===============

- posted [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working)
  Message-Id: <20220121151534.3654562-1-alex.bennee(a)linaro.org>
  - need to increase coverage of the QEMU boilerplate to get it merged
- discussions on next steps with SCMI backend with Vincent (moving from the QEMU->QEMU PoC)

QEMU Upstream Work ([UM-2])
===========================

- posted [PATCH v2 00/25] testing and plugin updates
  Message-Id: <20220201182050.15087-1-alex.bennee(a)linaro.org>
- posted [RFC PATCH 0/4] improve coverage of vector backend
  Message-Id: <20220202191242.652607-2-alex.bennee(a)linaro.org>
- posted [PATCH v3 00/26] testing and plugins pre-PR
  Message-Id: <20220204204335.1689602-1-alex.bennee(a)linaro.org>
- posted [RFC PATCH] arm: force flag recalculation when messing with DAIF
  Message-Id: <20220202122353.457084-1-alex.bennee(a)linaro.org>
- trying to track down a weird TLS bug: <https://gitlab.com/stsquad/qemu/-/jobs/2056025874#L3532>
  - on aarch64 HW, running qemu-s390x with a simple test case fails every 100/200 times
  - seems TLS memory gets made non-accessible (rw-p -> ---p, except to gdb)
  - strace doesn't show a culprit, possible kernel bug?

[UM-2] <https://linaro.atlassian.net/browse/UM-2>

Upstream MTTCG tests ([QEMU-52])

- still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM
  Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>

[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>

Other
=====

- planning and brainstorming for Linaro Tech Day

Completed Reviews [5/5]
=======================

[PATCH v4 00/42] CXl 2.0 emulation Support
  Message-Id: <20220124171705.10432-1-Jonathan.Cameron(a)huawei.com>
[PATCH] gitlab: fall back to commit hash in qemu-setup filename
  Message-Id: <20220125173454.10381-1-stefanha(a)redhat.com>
[PATCH for-7.0] gitlab-ci: Add cirrus-ci based tests for NetBSD and OpenBSD
  Message-Id: <20211209103124.121942-1-thuth(a)redhat.com>
[PATCH 00/20] tcg: vector improvements
  Message-Id: <20211218194250.247633-1-richard.henderson(a)linaro.org>

Absences
========

Current Review Queue
====================

TODO [PATCH 0/4] target/arm: SVE fixes versus VHE Message-Id: <20220127063428.30212-1-richard.henderson(a)linaro.org>
==================================================================================================================

TODO [PATCH 00/14] arm_gicv3_its: Implement MOVI and MOVALL commands Message-Id: <20220122182444.724087-1-peter.maydell(a)linaro.org>
==================================================================================================================================

TODO [PATCH v11 0/8] hmp,qmp: Add commands to introspect virtio devices Message-Id: <1642678168-20447-1-git-send-email-jonah.palmer(a)oracle.com>
==============================================================================================================================================

TODO [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>
======================================================================================================================================

--
Alex Bennée
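
The `rw-p -> ---p` observation in the TLS item above refers to the permissions column of `/proc/<pid>/maps`. As a minimal sketch of how one might spot such regions, assuming the usual Linux maps line layout (address range, permissions, offset, device, inode, optional path), the helper below lists mappings with no read, write or execute permission. The function name and the sample addresses are made up for illustration and are not taken from the bug report.

```python
def inaccessible_mappings(maps_text):
    """Return (start, end, perms) for mappings whose permissions start with '---'."""
    result = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        addr_range, perms = fields[0], fields[1]
        if perms.startswith("---"):
            start, end = (int(part, 16) for part in addr_range.split("-"))
            result.append((start, end, perms))
    return result


# Hypothetical snapshot: the second region has lost all access rights.
SAMPLE_MAPS = (
    "7f0000000000-7f0000001000 rw-p 00000000 00:00 0\n"
    "7f0000001000-7f0000002000 ---p 00000000 00:00 0\n"
)

for start, end, perms in inaccessible_mappings(SAMPLE_MAPS):
    print(f"{start:#x}-{end:#x} {perms}")
```

Comparing two snapshots of the real file (for example, read from `/proc/<pid>/maps` before and after the failure) would show whether the region holding TLS changed from `rw-p` to `---p`.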


[ACTIVITY] report week ending 4 Feb

by Peter Maydell

Progress:
 * UM-2 [QEMU upstream maintainership]
   - Fixed some minor issues with the hvf accelerator and sent out a patchset
     + '-cpu max' didn't act like '-cpu host'
     + we weren't exposing PAuth to the guest
 * QEMU-420 [GICv4 emulation]
   - Sent out a patchset with more cleanups and fixes to the existing ITS code
   - The ITS parts of the GICv4 work are now code-complete; moving on to the
     redistributor end of things next week.

-- PMM


[ACTIVITY] report week ending 28 Jan

by Peter Maydell

Progress:
 * UM-2 [QEMU upstream maintainership]
   - Before the QEMU 7.0 release we tried to land a bug fix which corrected
     the handling in our PSCI emulation of calls where the function ID is
     unrecognized -- these are supposed to return an error code. The bugfix
     turned out to cause regressions for some boards when running guest code
     at EL3 (because those boards were incorrectly enabling PSCI emulation in
     that situation). Sent a patchset that fixed those boards so we don't
     enable PSCI when running EL3 guests, and re-introduced the original PSCI
     bugfix.
   - Fixed various bugs in the highbank/midway boards discovered in the
     process of writing and testing the above patchset. (These two boards
     were the most complicated to fix.)
   - More code review, and sent out an arm pull request
   - Small handful of other minor patches

-- PMM
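
As an illustration of the PSCI behaviour described above, here is a small Python sketch of a dispatcher that returns an error for unrecognized function IDs. It is not QEMU's code: the handler table and function names are hypothetical, and the choice of NOT_SUPPORTED (-1) as the error value, like the function ID constants, is taken from the Arm PSCI specification rather than from the report itself.

```python
# Values from the Arm PSCI specification (SMC32 function IDs).
PSCI_RET_SUCCESS = 0
PSCI_RET_NOT_SUPPORTED = -1
PSCI_VERSION = 0x84000000
PSCI_CPU_OFF = 0x84000002


def psci_version():
    """Report PSCI 1.1: major version in bits [31:16], minor in bits [15:0]."""
    return (1 << 16) | 1


def cpu_off():
    """Hypothetical stand-in; a real implementation would power down the CPU."""
    return PSCI_RET_SUCCESS


# Hypothetical table of the calls this toy emulation implements.
HANDLERS = {
    PSCI_VERSION: psci_version,
    PSCI_CPU_OFF: cpu_off,
}


def handle_psci_call(function_id, handlers=HANDLERS):
    """Dispatch a PSCI call; unknown function IDs yield NOT_SUPPORTED."""
    handler = handlers.get(function_id)
    if handler is None:
        # Unrecognized ID: report an error instead of treating the call
        # as something else or silently ignoring it.
        return PSCI_RET_NOT_SUPPORTED
    return handler()
```

The sketch deliberately leaves out the other half of the fix, which is deciding whether the emulation should be active at all (for example, not when the guest itself runs at EL3).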


tsan buildbot failure possibly due to DWARFv5 switch

by David Blaikie

Seems like my change to make Clang default to DWARFv5 might've caused a buildbot failure on your build worker here: https://lab.llvm.org/buildbot/#/builders/185/builds/1295 But I seem to be able to run this test successfully locally on my Linux machine - so I'm wondering if you can offer any help diagnosing the issue showing up on your builder/worker?


[ACTIVITY] week ending Jan. 23 2022

by Alex Bennée

Project Stratos
===============

- [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working)
  Message-Id: <20220121151534.3654562-1-alex.bennee(a)linaro.org>
- trying to clear the way for merging virtio-gpio to QEMU

vhost-device maintainer effort ([UM-196])

- reviewed vhost-device [pr7 with the vm-virtio vsock abstraction]

[UM-196] <https://linaro.atlassian.net/browse/UM-196>
[pr7 with the vm-virtio vsock abstraction] <https://github.com/stsquad/vhost-device/tree/review/pr7-with-laurat-abstrac…>

QEMU Upstream Work ([UM-2])
===========================

- posted [PULL v2 00/31] testing/next and other misc fixes
  Message-Id: <20220118190043.1427303-1-alex.bennee(a)linaro.org>

Upstream MTTCG tests ([QEMU-52])

- still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM
  Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>

Completed Reviews [2/2]
=======================

[PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings
  Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>
[PATCH v2 0/6] qtests/libqos: Allow PCI tests to be run with virt-machine
  Message-Id: <20220118203833.316741-7-eric.auger(a)redhat.com>

Absences
========

Current Review Queue
====================

TODO [PATCH v11 0/8] hmp,qmp: Add commands to introspect virtio devices Message-Id: <1642678168-20447-1-git-send-email-jonah.palmer(a)oracle.com>
==============================================================================================================================================

TODO [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>
======================================================================================================================================

TODO [PATCH v2 00/11] Atomic cleanup + clang-12 build fix Message-Id: <20210717014121.1784956-1-richard.henderson(a)linaro.org>
============================================================================================================================

TODO [PATCH 0/7] tcg: some small towards more modular tcg Message-Id: <20210804143826.3402872-1-kraxel(a)redhat.com>
=================================================================================================================

--
Alex Bennée


[ACTIVITY] report week ending 21 Jan

by Peter Maydell

Progress:
 * UM-2 [QEMU upstream maintainership]
   - Sent patches for some reported bugs to do with state save/load
 * QEMU-420 [GICv4 emulation]
   - Wrote patches to implement the missing MOVALL and MOVI commands
   - Fixed a few minor bugs noticed along the way
   - Should be able to send out a patchset early next week and then can get
     back to the new-in-GICv4 work

-- PMM

