linaro-toolchain

linaro-toolchain@lists.linaro.org


[CI-NOTIFY]: TCWG Bisect tcwg_gcc_bootstrap/master-arm-bootstrap_debug - Build # 2 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap_debug. So far, this commit has regressed CI configurations:
 - tcwg_gcc_bootstrap/master-arm-bootstrap_debug

Culprit:
<cut>
commit 1d244020246cb155e4de62ca3b302b920a1f513f
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date:   Mon Aug 23 12:37:04 2021 +0100

    Fold sign of LSHIFT_EXPR to eliminate no-op conversions.

    This short patch teaches fold that it is "safe" to change the sign
    of a left shift, to reduce the number of type conversions in gimple.
    As an example:

    unsigned int foo(unsigned int i) {
      return (int)i << 8;
    }

    is currently optimized to:

    unsigned int foo (unsigned int i)
    {
      int i.0_1;
      int _2;
      unsigned int _4;

      <bb 2> [local count: 1073741824]:
      i.0_1 = (int) i_3(D);
      _2 = i.0_1 << 8;
      _4 = (unsigned int) _2;
      return _4;
    }

    with this patch, this now becomes:

    unsigned int foo (unsigned int i)
    {
      unsigned int _2;

      <bb 2> [local count: 1073741824]:
      _2 = i_1(D) << 8;
      return _2;
    }

    which generates exactly the same assembly language.  Aside from the
    reduced memory usage, the real benefit is that no-op conversions tend
    to interfere with many folding optimizations.  For example,

    unsigned int bar(unsigned char i) {
        return (i ^ (i<<16)) | (i<<8);
    }

    currently gets (tangled in conversions and) optimized to:

    unsigned int bar (unsigned char i)
    {
      unsigned int _1;
      unsigned int _2;
      int _3;
      int _4;
      unsigned int _6;
      unsigned int _8;

      <bb 2> [local count: 1073741824]:
      _1 = (unsigned int) i_5(D);
      _2 = _1 * 65537;
      _3 = (int) i_5(D);
      _4 = _3 << 8;
      _8 = (unsigned int) _4;
      _6 = _2 | _8;
      return _6;
    }

    but with this patch, bar now optimizes down to:

    unsigned int bar(unsigned char i)
    {
      unsigned int _1;
      unsigned int _4;

      <bb 2> [local count: 1073741824]:
      _1 = (unsigned int) i_3(D);
      _4 = _1 * 65793;
      return _4;
    }

    2021-08-23  Roger Sayle  <roger(a)nextmovesoftware.com>

    gcc/ChangeLog
            * match.pd (shift transformations): Change the sign of an
            LSHIFT_EXPR if it reduces the number of explicit conversions.

    gcc/testsuite/ChangeLog
            * gcc.dg/fold-convlshift-1.c: New test case.
            * gcc.dg/fold-convlshift-2.c: New test case.
</cut>

Results regressed to (for first_bad == 1d244020246cb155e4de62ca3b302b920a1f513f)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:06:26 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored)
# 00:25:39 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored)
# 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: error: type mismatch in ‘lshift_expr’
# 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: internal compiler error: ‘verify_gimple’ failed
# 00:29:38 make[3]: *** [bitmap.o] Error 1
# 00:34:06 make[2]: *** [all-stage3-gcc] Error 2
# 00:34:06 make[1]: *** [stage3-bubble] Error 2
# 00:34:07 make: *** [all] Error 2

from (for last_good == b320edc0c29c838b0090c3c9be14187d132f73f2)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_debug:
2

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f
cd investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 1d244020246cb155e4de62ca3b302b920a1f513f
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b320edc0c29c838b0090c3c9be14187d132f73f2
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…

Full commit (up to 1000 lines):
<cut>
commit 1d244020246cb155e4de62ca3b302b920a1f513f
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date:   Mon Aug 23 12:37:04 2021 +0100

    Fold sign of LSHIFT_EXPR to eliminate no-op conversions.

    This short patch teaches fold that it is "safe" to change the sign
    of a left shift, to reduce the number of type conversions in gimple.
    As an example:

    unsigned int foo(unsigned int i) {
      return (int)i << 8;
    }

    is currently optimized to:

    unsigned int foo (unsigned int i)
    {
      int i.0_1;
      int _2;
      unsigned int _4;

      <bb 2> [local count: 1073741824]:
      i.0_1 = (int) i_3(D);
      _2 = i.0_1 << 8;
      _4 = (unsigned int) _2;
      return _4;
    }

    with this patch, this now becomes:

    unsigned int foo (unsigned int i)
    {
      unsigned int _2;

      <bb 2> [local count: 1073741824]:
      _2 = i_1(D) << 8;
      return _2;
    }

    which generates exactly the same assembly language.  Aside from the
    reduced memory usage, the real benefit is that no-op conversions tend
    to interfere with many folding optimizations.  For example,

    unsigned int bar(unsigned char i) {
        return (i ^ (i<<16)) | (i<<8);
    }

    currently gets (tangled in conversions and) optimized to:

    unsigned int bar (unsigned char i)
    {
      unsigned int _1;
      unsigned int _2;
      int _3;
      int _4;
      unsigned int _6;
      unsigned int _8;

      <bb 2> [local count: 1073741824]:
      _1 = (unsigned int) i_5(D);
      _2 = _1 * 65537;
      _3 = (int) i_5(D);
      _4 = _3 << 8;
      _8 = (unsigned int) _4;
      _6 = _2 | _8;
      return _6;
    }

    but with this patch, bar now optimizes down to:

    unsigned int bar(unsigned char i)
    {
      unsigned int _1;
      unsigned int _4;

      <bb 2> [local count: 1073741824]:
      _1 = (unsigned int) i_3(D);
      _4 = _1 * 65793;
      return _4;
    }

    2021-08-23  Roger Sayle  <roger(a)nextmovesoftware.com>

    gcc/ChangeLog
            * match.pd (shift transformations): Change the sign of an
            LSHIFT_EXPR if it reduces the number of explicit conversions.

    gcc/testsuite/ChangeLog
            * gcc.dg/fold-convlshift-1.c: New test case.
            * gcc.dg/fold-convlshift-2.c: New test case.
---
 gcc/match.pd                             |  9 +++++++++
 gcc/testsuite/gcc.dg/fold-convlshift-1.c | 20 ++++++++++++++++++++
 gcc/testsuite/gcc.dg/fold-convlshift-2.c | 20 ++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0fcfd0ea62c..978a1b0172e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3385,6 +3385,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (if (integer_zerop (@2) || integer_all_onesp (@2))
      (cmp @0 @2)))))
 
+/* Both signed and unsigned lshift produce the same result, so use
+   the form that minimizes the number of conversions.  */
+(simplify
+ (convert (lshift:s@0 (convert:s@1 @2) INTEGER_CST@3))
+ (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+      && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+      && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type))
+  (lshift (convert @2) @3)))
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-1.c b/gcc/testsuite/gcc.dg/fold-convlshift-1.c
new file mode 100644
index 00000000000..b6f57f81e72
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned int i)
+{
+  int t1 = i;
+  int t2 = t1 << 8;
+  return t2;
+}
+
+int bar(int i)
+{
+  unsigned int t1 = i;
+  unsigned int t2 = t1 << 8;
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(unsigned int\\)" "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-2.c b/gcc/testsuite/gcc.dg/fold-convlshift-2.c
new file mode 100644
index 00000000000..f21358c4584
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned char c)
+{
+  int t1 = c;
+  int t2 = t1 << 8;
+  return t2;
+}
+
+int bar(unsigned char c)
+{
+  unsigned int t1 = c;
+  unsigned int t2 = t1 << 8;
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\(int\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 1 "optimized" } } */
+
</cut>
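Editorial note on the commit message above: the claim that the signed and unsigned shift forms of foo() agree, and that bar() collapses to a single multiplication by 65793, can be checked at the source level with a small C sketch. This is only an illustration, not part of the patch or the CI report. It assumes a 32-bit int and 8-bit char, and every function name other than foo and bar is made up for the example. Inputs to the signed shift are kept below 2^23 so that the shift is well defined in ISO C.

```c
#include <assert.h>
#include <limits.h>

/* bar() exactly as written in the commit message. */
unsigned int bar_original(unsigned char i)
{
  return (i ^ (i << 16)) | (i << 8);
}

/* The form the patched compiler reduces bar() to:
   65793 == (1 << 16) + (1 << 8) + 1. */
unsigned int bar_folded(unsigned char i)
{
  return (unsigned int)i * 65793u;
}

/* foo() with the shift done on the signed value, as in the source.
   Hypothetical precondition: i < 2^23, so the signed shift cannot overflow. */
unsigned int foo_signed(unsigned int i)
{
  assert(i < (1u << 23));
  return (int)i << 8;
}

/* foo() with the shift done on the unsigned value, as after the fold. */
unsigned int foo_unsigned(unsigned int i)
{
  return i << 8;
}

/* Returns 1 if both bar variants agree for every unsigned char value. */
int bar_forms_agree(void)
{
  for (unsigned int v = 0; v <= UCHAR_MAX; ++v)
    if (bar_original((unsigned char)v) != bar_folded((unsigned char)v))
      return 0;
  return 1;
}

/* Returns 1 if both foo variants agree on a sample of inputs below 2^23. */
int foo_forms_agree(void)
{
  for (unsigned int v = 0; v < (1u << 23); v += 4099u)
    if (foo_signed(v) != foo_unsigned(v))
      return 0;
  return 1;
}
```

The bar identity holds because the three shifted copies of an 8-bit value occupy disjoint bit ranges, so XOR, OR and addition all give the same result. For example, bar_original(0xAB) and bar_folded(0xAB) both yield 0xABABAB.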

4 years, 8 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O2_LTO - Build # 14 - Successful!

by ci_notify＠linaro.org

4 years, 8 months

Moving to llvm@lists.linux.dev

by Nathan Chancellor

Hi everyone,

We are shifting the ClangBuiltLinux mailing list from clang-built-linux(a)googlegroups.com to llvm(a)lists.linux.dev. Google Groups has served us well but moving to lists.linux.dev allows for easier archival (as we will be on lore.kernel.org automatically) and allows for people to subscribe to us easier, as they only need an email address, rather than a Google account.

Please follow these directions to subscribe to the new mailing list:

https://subspace.kernel.org/index.html#subscribing

Some more information about lists.linux.dev:

https://www.kernel.org/lists-linux-dev.html
https://subspace.kernel.org/lists.linux.dev.html

I have added CI maintainers/mailing lists that send us regular reports to this announcement. Please continue to send us emails about build results, just switch the email from clang-built-linux(a)googlegroups.com to llvm(a)lists.linux.dev so that they get archived as a part of lore and can be easily searched, especially with the upcoming https://x-lore.kernel.org/all/.

I will send a patch shortly to update MAINTAINERS.

Cheers,
Nathan

4 years, 8 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O2_LTO - Build # 13 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO Culprit: <cut> commit 9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 Author: Fangrui Song <i(a)maskray.me> Date: Thu Mar 25 21:55:27 2021 -0700 [sanitizer] Simplify GetTls with dl_iterate_phdr GetTls is the range of * thread control block and optional TLS_PRE_TCB_SIZE * static TLS blocks plus static TLS surplus On glibc, lsan requires the range to include `pthread::{specific_1stblock,specific}` so that allocations only referenced by `pthread_setspecific` can be scanned. This patch uses `dl_iterate_phdr` to collect TLS ranges. Find the one with `dlpi_tls_modid==1` as one of the initially loaded module, then find consecutive ranges. The boundaries give us addr and size. This allows us to drop the glibc internal `_dl_get_tls_static_info` and `InitTlsSize` entirely. Use the simplified method with non-Android Linux for now, but in theory this can be used with *BSD and potentially other ELF OSes. In the future, we can move `ThreadDescriptorSize` code to lsan (and consider intercepting `pthread_setspecific`) to avoid hacks in generic code. See https://reviews.llvm.org/D93972#2480556 for analysis on GetTls usage across various sanitizers. 
Differential Revision: https://reviews.llvm.org/D98926 </cut> Results regressed to (for first_bad == 9be8f8b34d9b150cd1811e3556fe9d0cd735ae29) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_LTO_marm artifacts/build-9be8f8b34d9b150cd1811e3556fe9d0cd735ae29/results_id: 1 # 456.hmmer,hmmer_base.default regressed by 103 from (for last_good == 9d375a40c3df90dd48edc0e1b1115c702c55d716) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_LTO_marm artifacts/build-9d375a40c3df90dd48edc0e1b1115c702c55d716/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4304 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4302 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 cd investigate-llvm-9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 9d375a40c3df90dd48edc0e1b1115c702c55d716 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Full commit (up to 1000 lines): <cut> commit 9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 Author: Fangrui Song <i(a)maskray.me> Date: Thu Mar 25 21:55:27 2021 -0700 [sanitizer] Simplify GetTls with dl_iterate_phdr GetTls is the range of * thread control block and optional TLS_PRE_TCB_SIZE * static TLS blocks plus static TLS surplus On glibc, lsan requires the range to include `pthread::{specific_1stblock,specific}` so that allocations only referenced by `pthread_setspecific` can be scanned. This patch uses `dl_iterate_phdr` to collect TLS ranges. Find the one with `dlpi_tls_modid==1` as one of the initially loaded module, then find consecutive ranges. The boundaries give us addr and size. 
This allows us to drop the glibc internal `_dl_get_tls_static_info` and `InitTlsSize` entirely. Use the simplified method with non-Android Linux for now, but in theory this can be used with *BSD and potentially other ELF OSes. In the future, we can move `ThreadDescriptorSize` code to lsan (and consider intercepting `pthread_setspecific`) to avoid hacks in generic code. See https://reviews.llvm.org/D93972#2480556 for analysis on GetTls usage across various sanitizers. Differential Revision: https://reviews.llvm.org/D98926 --- compiler-rt/lib/asan/asan_rtl.cpp | 5 +- compiler-rt/lib/asan/asan_thread.cpp | 2 +- compiler-rt/lib/hwasan/hwasan.cpp | 2 - compiler-rt/lib/lsan/lsan.cpp | 1 - compiler-rt/lib/memprof/memprof_rtl.cpp | 3 - compiler-rt/lib/msan/msan.cpp | 1 - .../lib/sanitizer_common/sanitizer_common.h | 1 - .../lib/sanitizer_common/sanitizer_fuchsia.cpp | 1 - compiler-rt/lib/sanitizer_common/sanitizer_linux.h | 1 - .../sanitizer_common/sanitizer_linux_libcdep.cpp | 231 ++++++++------------- compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp | 3 - .../lib/sanitizer_common/sanitizer_rtems.cpp | 1 - compiler-rt/lib/sanitizer_common/sanitizer_win.cpp | 3 - .../tests/sanitizer_common_test.cpp | 2 - .../tests/sanitizer_linux_test.cpp | 17 +- compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp | 1 - 16 files changed, 91 insertions(+), 184 deletions(-) diff --git a/compiler-rt/lib/asan/asan_rtl.cpp b/compiler-rt/lib/asan/asan_rtl.cpp index 7b5a929963c6..106a52607631 100644 --- a/compiler-rt/lib/asan/asan_rtl.cpp +++ b/compiler-rt/lib/asan/asan_rtl.cpp @@ -490,9 +490,6 @@ static void AsanInitInternal() { if (flags()->start_deactivated) AsanDeactivate(); - // interceptors - InitTlsSize(); - // Create main thread. 
AsanThread *main_thread = CreateMainThread(); CHECK_EQ(0, main_thread->tid()); @@ -568,7 +565,7 @@ void UnpoisonStack(uptr bottom, uptr top, const char *type) { type, top, bottom, top - bottom, top - bottom); return; } - PoisonShadow(bottom, top - bottom, 0); + PoisonShadow(bottom, RoundUpTo(top - bottom, SHADOW_GRANULARITY), 0); } static void UnpoisonDefaultStack() { diff --git a/compiler-rt/lib/asan/asan_thread.cpp b/compiler-rt/lib/asan/asan_thread.cpp index ae3bcba204c6..f7778c0f1e34 100644 --- a/compiler-rt/lib/asan/asan_thread.cpp +++ b/compiler-rt/lib/asan/asan_thread.cpp @@ -307,7 +307,7 @@ void AsanThread::SetThreadStackAndTls(const InitOptions *options) { uptr stack_size = 0; GetThreadStackAndTls(tid() == 0, &stack_bottom_, &stack_size, &tls_begin_, &tls_size); - stack_top_ = stack_bottom_ + stack_size; + stack_top_ = RoundDownTo(stack_bottom_ + stack_size, SHADOW_GRANULARITY); tls_end_ = tls_begin_ + tls_size; dtls_ = DTLS_Get(); diff --git a/compiler-rt/lib/hwasan/hwasan.cpp b/compiler-rt/lib/hwasan/hwasan.cpp index 5c0d804561d2..ce08ec3508c4 100644 --- a/compiler-rt/lib/hwasan/hwasan.cpp +++ b/compiler-rt/lib/hwasan/hwasan.cpp @@ -265,8 +265,6 @@ void __hwasan_init() { hwasan_init_is_running = 1; SanitizerToolName = "HWAddressSanitizer"; - InitTlsSize(); - CacheBinaryName(); InitializeFlags(); diff --git a/compiler-rt/lib/lsan/lsan.cpp b/compiler-rt/lib/lsan/lsan.cpp index 2c0a3bf0787c..b264be0ba792 100644 --- a/compiler-rt/lib/lsan/lsan.cpp +++ b/compiler-rt/lib/lsan/lsan.cpp @@ -98,7 +98,6 @@ extern "C" void __lsan_init() { InitCommonLsan(); InitializeAllocator(); ReplaceSystemMalloc(); - InitTlsSize(); InitializeInterceptors(); InitializeThreadRegistry(); InstallDeadlySignalHandlers(LsanOnDeadlySignal); diff --git a/compiler-rt/lib/memprof/memprof_rtl.cpp b/compiler-rt/lib/memprof/memprof_rtl.cpp index d6d606f666ee..05759e406f7a 100644 --- a/compiler-rt/lib/memprof/memprof_rtl.cpp +++ b/compiler-rt/lib/memprof/memprof_rtl.cpp @@ -214,9 +214,6 @@ 
static void MemprofInitInternal() { InitializeCoverage(common_flags()->coverage, common_flags()->coverage_dir); - // interceptors - InitTlsSize(); - // Create main thread. MemprofThread *main_thread = CreateMainThread(); CHECK_EQ(0, main_thread->tid()); diff --git a/compiler-rt/lib/msan/msan.cpp b/compiler-rt/lib/msan/msan.cpp index 4be1630cd302..4ee7e2ec4dd6 100644 --- a/compiler-rt/lib/msan/msan.cpp +++ b/compiler-rt/lib/msan/msan.cpp @@ -436,7 +436,6 @@ void __msan_init() { InitializeInterceptors(); CheckASLR(); - InitTlsSize(); InstallDeadlySignalHandlers(MsanOnDeadlySignal); InstallAtExitHandler(); // Needs __cxa_atexit interceptor. diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_common.h b/compiler-rt/lib/sanitizer_common/sanitizer_common.h index dcd625d30f77..2b2629fc12dd 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_common.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_common.h @@ -284,7 +284,6 @@ void SetSandboxingCallback(void (*f)()); void InitializeCoverage(bool enabled, const char *coverage_dir); -void InitTlsSize(); uptr GetTlsSize(); // Other diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_fuchsia.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_fuchsia.cpp index 4f692f99c207..5d68ad8ee8e4 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_fuchsia.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_fuchsia.cpp @@ -103,7 +103,6 @@ void DisableCoreDumperIfNecessary() {} void InstallDeadlySignalHandlers(SignalHandlerType handler) {} void SetAlternateSignalStack() {} void UnsetAlternateSignalStack() {} -void InitTlsSize() {} bool SignalContext::IsStackOverflow() const { return false; } void SignalContext::DumpAllRegisters(void *context) { UNIMPLEMENTED(); } diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_linux.h b/compiler-rt/lib/sanitizer_common/sanitizer_linux.h index 41ae072d6cac..9a23fcfb3b93 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_linux.h +++ 
b/compiler-rt/lib/sanitizer_common/sanitizer_linux.h @@ -98,7 +98,6 @@ class ThreadLister { // Exposed for testing. uptr ThreadDescriptorSize(); uptr ThreadSelf(); -uptr ThreadSelfOffset(); // Matches a library's file name against a base name (stripping path and version // information). diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp index 613658147bbd..1177a1ceb14f 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp @@ -184,80 +184,8 @@ __attribute__((unused)) static bool GetLibcVersion(int *major, int *minor, #endif } -#if SANITIZER_GLIBC && !SANITIZER_GO -static uptr g_tls_size; - -#ifdef __i386__ -#define CHECK_GET_TLS_STATIC_INFO_VERSION (!__GLIBC_PREREQ(2, 27)) -#else -#define CHECK_GET_TLS_STATIC_INFO_VERSION 0 -#endif - -#if CHECK_GET_TLS_STATIC_INFO_VERSION -#define DL_INTERNAL_FUNCTION __attribute__((regparm(3), stdcall)) -#else -#define DL_INTERNAL_FUNCTION -#endif - -namespace { -struct GetTlsStaticInfoCall { - typedef void (*get_tls_func)(size_t*, size_t*); -}; -struct GetTlsStaticInfoRegparmCall { - typedef void (*get_tls_func)(size_t*, size_t*) DL_INTERNAL_FUNCTION; -}; - -template <typename T> -void CallGetTls(void* ptr, size_t* size, size_t* align) { - typename T::get_tls_func get_tls; - CHECK_EQ(sizeof(get_tls), sizeof(ptr)); - internal_memcpy(&get_tls, &ptr, sizeof(ptr)); - CHECK_NE(get_tls, 0); - get_tls(size, align); -} - -bool CmpLibcVersion(int major, int minor, int patch) { - int ma; - int mi; - int pa; - if (!GetLibcVersion(&ma, &mi, &pa)) - return false; - if (ma > major) - return true; - if (ma < major) - return false; - if (mi > minor) - return true; - if (mi < minor) - return false; - return pa >= patch; -} - -} // namespace - -void InitTlsSize() { - // all current supported platforms have 16 bytes stack alignment - const size_t kStackAlign = 16; - void 
*get_tls_static_info_ptr = dlsym(RTLD_NEXT, "_dl_get_tls_static_info"); - size_t tls_size = 0; - size_t tls_align = 0; - // On i?86, _dl_get_tls_static_info used to be internal_function, i.e. - // __attribute__((regparm(3), stdcall)) before glibc 2.27 and is normal - // function in 2.27 and later. - if (CHECK_GET_TLS_STATIC_INFO_VERSION && !CmpLibcVersion(2, 27, 0)) - CallGetTls<GetTlsStaticInfoRegparmCall>(get_tls_static_info_ptr, - &tls_size, &tls_align); - else - CallGetTls<GetTlsStaticInfoCall>(get_tls_static_info_ptr, - &tls_size, &tls_align); - if (tls_align < kStackAlign) - tls_align = kStackAlign; - g_tls_size = RoundUpTo(tls_size, tls_align); -} -#else -void InitTlsSize() { } -#endif // SANITIZER_GLIBC && !SANITIZER_GO - +// ThreadDescriptorSize() is only used by lsan to get the pointer to +// thread-specific data keys in the thread control block. #if (defined(__x86_64__) || defined(__i386__) || defined(__mips__) || \ defined(__aarch64__) || defined(__powerpc64__) || defined(__s390__) || \ defined(__arm__) || SANITIZER_RISCV64) && \ @@ -330,13 +258,6 @@ uptr ThreadDescriptorSize() { return val; } -// The offset at which pointer to self is located in the thread descriptor. -const uptr kThreadSelfOffset = FIRST_32_SECOND_64(8, 16); - -uptr ThreadSelfOffset() { - return kThreadSelfOffset; -} - #if defined(__mips__) || defined(__powerpc64__) || SANITIZER_RISCV64 // TlsPreTcbSize includes size of struct pthread_descr and size of tcb // head structure. It lies before the static tls blocks. @@ -355,48 +276,61 @@ static uptr TlsPreTcbSize() { } #endif -uptr ThreadSelf() { - uptr descr_addr; -#if defined(__i386__) - asm("mov %%gs:%c1,%0" : "=r"(descr_addr) : "i"(kThreadSelfOffset)); -#elif defined(__x86_64__) - asm("mov %%fs:%c1,%0" : "=r"(descr_addr) : "i"(kThreadSelfOffset)); -#elif defined(__mips__) - // MIPS uses TLS variant I. The thread pointer (in hardware register $29) - // points to the end of the TCB + 0x7000. 
The pthread_descr structure is - // immediately in front of the TCB. TlsPreTcbSize() includes the size of the - // TCB and the size of pthread_descr. - const uptr kTlsTcbOffset = 0x7000; - uptr thread_pointer; - asm volatile(".set push;\ - .set mips64r2;\ - rdhwr %0,$29;\ - .set pop" : "=r" (thread_pointer)); - descr_addr = thread_pointer - kTlsTcbOffset - TlsPreTcbSize(); -#elif defined(__aarch64__) || defined(__arm__) - descr_addr = reinterpret_cast<uptr>(__builtin_thread_pointer()) - - ThreadDescriptorSize(); -#elif SANITIZER_RISCV64 - // https://github.com/riscv/riscv-elf-psabi-doc/issues/53 - uptr thread_pointer = reinterpret_cast<uptr>(__builtin_thread_pointer()); - descr_addr = thread_pointer - TlsPreTcbSize(); -#elif defined(__s390__) - descr_addr = reinterpret_cast<uptr>(__builtin_thread_pointer()); -#elif defined(__powerpc64__) - // PPC64LE uses TLS variant I. The thread pointer (in GPR 13) - // points to the end of the TCB + 0x7000. The pthread_descr structure is - // immediately in front of the TCB. TlsPreTcbSize() includes the size of the - // TCB and the size of pthread_descr. 
- const uptr kTlsTcbOffset = 0x7000; - uptr thread_pointer; - asm("addi %0,13,%1" : "=r"(thread_pointer) : "I"(-kTlsTcbOffset)); - descr_addr = thread_pointer - TlsPreTcbSize(); -#else -#error "unsupported CPU arch" -#endif - return descr_addr; +#if !SANITIZER_GO +namespace { +struct TlsRange { + uptr begin, end, align; + size_t tls_modid; + bool operator<(const TlsRange &rhs) const { return begin < rhs.begin; } +}; +} // namespace + +static int CollectStaticTlsRanges(struct dl_phdr_info *info, size_t size, + void *data) { + if (!info->dlpi_tls_data) + return 0; + const uptr begin = (uptr)info->dlpi_tls_data; + for (unsigned i = 0; i != info->dlpi_phnum; ++i) + if (info->dlpi_phdr[i].p_type == PT_TLS) { + static_cast<InternalMmapVector<TlsRange> *>(data)->push_back( + TlsRange{begin, begin + info->dlpi_phdr[i].p_memsz, + info->dlpi_phdr[i].p_align, info->dlpi_tls_modid}); + break; + } + return 0; } -#endif // (x86_64 || i386 || MIPS) && SANITIZER_LINUX + +static void GetStaticTlsRange(uptr *addr, uptr *size) { + InternalMmapVector<TlsRange> ranges; + dl_iterate_phdr(CollectStaticTlsRanges, &ranges); + uptr len = ranges.size(); + Sort(ranges.begin(), len); + // Find the range with tls_modid=1. For glibc, because libc.so uses PT_TLS, + // this module is guaranteed to exist and is one of the initially loaded + // modules. + uptr one = 0; + while (one != len && ranges[one].tls_modid != 1) ++one; + if (one == len) { + // This may happen with musl if no module uses PT_TLS. + *addr = 0; + *size = 0; + return; + } + // Find the maximum consecutive ranges. We consider two modules consecutive if + // the gap is smaller than the alignment. The dynamic loader places static TLS + // blocks this way not to waste space. 
+ uptr l = one; + while (l != 0 && ranges[l].begin < ranges[l - 1].end + ranges[l - 1].align) + --l; + uptr r = one + 1; + while (r != len && ranges[r].begin < ranges[r - 1].end + ranges[r - 1].align) + ++r; + *addr = ranges[l].begin; + *size = ranges[r - 1].end - ranges[l].begin; +} +#endif // !SANITIZER_GO +#endif // (x86_64 || i386 || mips || ...) && SANITIZER_LINUX && + // !SANITIZER_ANDROID #if SANITIZER_FREEBSD static void **ThreadSelfSegbase() { @@ -468,18 +402,36 @@ static void GetTls(uptr *addr, uptr *size) { *size = 0; } #elif SANITIZER_LINUX + GetStaticTlsRange(addr, size); #if defined(__x86_64__) || defined(__i386__) || defined(__s390__) - *addr = ThreadSelf(); - *size = GetTlsSize(); - *addr -= *size; - *addr += ThreadDescriptorSize(); -#elif defined(__mips__) || defined(__aarch64__) || defined(__powerpc64__) || \ - defined(__arm__) || SANITIZER_RISCV64 - *addr = ThreadSelf(); - *size = GetTlsSize(); + // lsan requires the range to additionally cover the static TLS surplus + // (elf/dl-tls.c defines 1664). Otherwise there may be false positives for + // allocations only referenced by tls in dynamically loaded modules. + if (SANITIZER_GLIBC) { + *addr -= 1664; + *size += 1664; + } + // Extend the range to include the thread control block. On glibc, lsan needs + // the range to include pthread::{specific_1stblock,specific} so that + // allocations only referenced by pthread_setspecific can be scanned. This may + // underestimate by at most TLS_TCB_ALIGN-1 bytes but it should be fine + // because the number of bytes after pthread::specific is larger. + *size += ThreadDescriptorSize(); #else - *addr = 0; - *size = 0; + if (SANITIZER_GLIBC) + *size += 1664; +#if defined(__mips__) || defined(__powerpc64__) || SANITIZER_RISCV64 + const uptr pre_tcb_size = TlsPreTcbSize(); + *addr -= pre_tcb_size; + *size += pre_tcb_size; +#else + // arm and aarch64 reserve two words at TP, so this underestimates the range. 
+ // However, this is sufficient for the purpose of finding the pointers to + // thread-specific data keys. + const uptr tcb_size = ThreadDescriptorSize(); + *addr -= tcb_size; + *size += tcb_size; +#endif #endif #elif SANITIZER_FREEBSD void** segbase = ThreadSelfSegbase(); @@ -520,17 +472,11 @@ static void GetTls(uptr *addr, uptr *size) { #if !SANITIZER_GO uptr GetTlsSize() { -#if SANITIZER_FREEBSD || SANITIZER_ANDROID || SANITIZER_NETBSD || \ +#if SANITIZER_FREEBSD || SANITIZER_LINUX || SANITIZER_NETBSD || \ SANITIZER_SOLARIS uptr addr, size; GetTls(&addr, &size); return size; -#elif SANITIZER_GLIBC -#if defined(__mips__) || defined(__powerpc64__) || SANITIZER_RISCV64 - return RoundUpTo(g_tls_size + TlsPreTcbSize(), 16); -#else - return g_tls_size; -#endif #else return 0; #endif @@ -553,10 +499,9 @@ void GetThreadStackAndTls(bool main, uptr *stk_addr, uptr *stk_size, if (!main) { // If stack and tls intersect, make them non-intersecting. if (*tls_addr > *stk_addr && *tls_addr < *stk_addr + *stk_size) { - CHECK_GT(*tls_addr + *tls_size, *stk_addr); - CHECK_LE(*tls_addr + *tls_size, *stk_addr + *stk_size); - *stk_size -= *tls_size; - *tls_addr = *stk_addr + *stk_size; + if (*stk_addr + *stk_size < *tls_addr + *tls_size) + *tls_size = *stk_addr + *stk_size - *tls_addr; + *stk_size = *tls_addr - *stk_addr; } } #endif diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp index d7b0bde173c8..5055df1ec29a 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp @@ -548,9 +548,6 @@ uptr GetTlsSize() { return 0; } -void InitTlsSize() { -} - uptr TlsBaseAddr() { uptr segbase = 0; #if defined(__x86_64__) diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_rtems.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_rtems.cpp index d58bd08fb1a8..01554349cc04 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_rtems.cpp +++ 
b/compiler-rt/lib/sanitizer_common/sanitizer_rtems.cpp @@ -106,7 +106,6 @@ void DisableCoreDumperIfNecessary() {} void InstallDeadlySignalHandlers(SignalHandlerType handler) {} void SetAlternateSignalStack() {} void UnsetAlternateSignalStack() {} -void InitTlsSize() {} void SignalContext::DumpAllRegisters(void *context) {} const char *DescribeSignalOrException(int signo) { UNIMPLEMENTED(); } diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp index f383e130fa59..d47ccad1764d 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp @@ -846,9 +846,6 @@ uptr GetTlsSize() { return 0; } -void InitTlsSize() { -} - void GetThreadStackAndTls(bool main, uptr *stk_addr, uptr *stk_size, uptr *tls_addr, uptr *tls_size) { #if SANITIZER_GO diff --git a/compiler-rt/lib/sanitizer_common/tests/sanitizer_common_test.cpp b/compiler-rt/lib/sanitizer_common/tests/sanitizer_common_test.cpp index 80df9b497b2d..21c6b036b956 100644 --- a/compiler-rt/lib/sanitizer_common/tests/sanitizer_common_test.cpp +++ b/compiler-rt/lib/sanitizer_common/tests/sanitizer_common_test.cpp @@ -210,12 +210,10 @@ static void *WorkerThread(void *arg) { } TEST(SanitizerCommon, ThreadStackTlsMain) { - InitTlsSize(); TestThreadInfo(true); } TEST(SanitizerCommon, ThreadStackTlsWorker) { - InitTlsSize(); pthread_t t; PTHREAD_CREATE(&t, 0, WorkerThread, 0); PTHREAD_JOIN(t, 0); diff --git a/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp b/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp index cb6c0724ac88..025cba922d2d 100644 --- a/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp +++ b/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp @@ -188,24 +188,9 @@ TEST(SanitizerCommon, SetEnvTest) { } #if (defined(__x86_64__) || defined(__i386__)) && !SANITIZER_ANDROID -void *thread_self_offset_test_func(void *arg) { - bool result = - 
*(uptr *)((char *)ThreadSelf() + ThreadSelfOffset()) == ThreadSelf(); - return (void *)result; -} - -TEST(SanitizerLinux, ThreadSelfOffset) { - EXPECT_TRUE((bool)thread_self_offset_test_func(0)); - pthread_t tid; - void *result; - ASSERT_EQ(0, pthread_create(&tid, 0, thread_self_offset_test_func, 0)); - ASSERT_EQ(0, pthread_join(tid, &result)); - EXPECT_TRUE((bool)result); -} - // libpthread puts the thread descriptor at the end of stack space. void *thread_descriptor_size_test_func(void *arg) { - uptr descr_addr = ThreadSelf(); + uptr descr_addr = (uptr)pthread_self(); pthread_attr_t attr; pthread_getattr_np(pthread_self(), &attr); void *stackaddr; diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp b/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp index 45acfe66ff3f..0d26f497f2bd 100644 --- a/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp +++ b/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp @@ -318,7 +318,6 @@ void InitializePlatform() { } CheckAndProtect(); - InitTlsSize(); #endif // !SANITIZER_GO } </cut>
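The `GetThreadStackAndTls` hunk in the diff above drops the two `CHECK`s and clamps instead: when the TLS block starts inside the stack mapping, the TLS size is trimmed to the end of the stack and the stack is cut short at the TLS start. A minimal, self-contained sketch of that arithmetic follows. `Range` and `SplitStackAndTls` are hypothetical names used only for illustration and are not part of sanitizer_common; `uptr` is assumed to be a pointer-sized unsigned integer, as it is there.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uintptr_t;  // assumption: pointer-sized, like sanitizer's uptr

// Hypothetical helper type: a half-open address range [addr, addr + size).
struct Range {
  uptr addr;
  uptr size;
};

// Hypothetical helper mirroring the post-patch logic: if the TLS block
// begins strictly inside the stack range, make the two ranges disjoint.
void SplitStackAndTls(Range &stk, Range &tls) {
  const uptr stk_end = stk.addr + stk.size;
  if (tls.addr > stk.addr && tls.addr < stk_end) {
    // TLS runs past the end of the stack mapping: trim it to the stack end.
    if (stk_end < tls.addr + tls.size)
      tls.size = stk_end - tls.addr;
    // The stack now ends where the TLS block begins.
    stk.size = tls.addr - stk.addr;
    assert(stk.addr + stk.size == tls.addr);
  }
}
```

For example, a stack at [0x1000, 0x2000) with a TLS block at [0x1c00, 0x2400) becomes a stack of 0xc00 bytes followed directly by a TLS block of 0x400 bytes.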

4 years, 8 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O3_LTO - Build # 9 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO

Culprit:
<cut>
commit a26f1bf67ec70f72e64101cf483b26466928fc38
Author: Roman Lebedev <lebedev.ri(a)gmail.com>
Date: Fri Apr 2 10:40:12 2021 +0300

[PassManager] Run additional LICM before LoopRotate

Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/atta…

Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.
We could try to move it to before LoopRotation, that is basically free from compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b9…

But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:

| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?
| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e…
but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded.
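The Δ and % columns in the tables above appear to be the plain difference and the relative change with respect to the first data column (for instance 3536 to 3544 gives 8 and 0.23%). A small sketch of that arithmetic, under this assumed interpretation; `StatDelta` and `CompareStat` are hypothetical names for illustration, not LLVM APIs:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type: absolute and relative change of one statistic.
struct StatDelta {
  std::int64_t delta;  // after - before
  double percent;      // 100 * (after - before) / before
};

// Hypothetical helper: compare one statistic between two pipeline variants.
// `before` is the baseline column and must be non-zero.
StatDelta CompareStat(std::int64_t before, std::int64_t after) {
  assert(before != 0);
  const std::int64_t delta = after - before;
  return {delta,
          100.0 * static_cast<double>(delta) / static_cast<double>(before)};
}
```

With the `licm.NumMovedLoads` row of the first table (35899 vs. 31821) this gives a Δ of -4078 and roughly -11.36%, which agrees with the listed values.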
Differential Revision: https://reviews.llvm.org/D99249
</cut>

Results regressed to (for first_bad == a26f1bf67ec70f72e64101cf483b26466928fc38)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO_marm artifacts/build-a26f1bf67ec70f72e64101cf483b26466928fc38/results_id:
1
# 462.libquantum,libquantum_base.default regressed by 104

from (for last_good == bb1e5399e4586239d6424f5eea5a9f06c52ebe9b)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO_marm artifacts/build-baseline/results_id:
1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/baseline-llvm-release-arm-spec2k6-O3_LTO/4271
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3_LTO/4289
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-a26f1bf67ec70f72e64101cf483b26466928fc38
cd investigate-llvm-a26f1bf67ec70f72e64101cf483b26466928fc38

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach a26f1bf67ec70f72e64101cf483b26466928fc38
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach bb1e5399e4586239d6424f5eea5a9f06c52ebe9b
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…

Full commit (up to 1000 lines):
<cut>
commit a26f1bf67ec70f72e64101cf483b26466928fc38
Author: Roman Lebedev <lebedev.ri(a)gmail.com>
Date: Fri Apr 2 10:40:12 2021 +0300

[PassManager] Run additional LICM before LoopRotate

Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/atta…

Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation, that is basically free from compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b9…

But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:

| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?
| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e…
but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded.
Differential Revision: https://reviews.llvm.org/D99249 --- llvm/lib/Passes/PassBuilder.cpp | 10 +++ llvm/lib/Transforms/IPO/PassManagerBuilder.cpp | 4 + llvm/test/CodeGen/AMDGPU/opt-pipeline.ll | 30 +++++--- llvm/test/Other/new-pm-defaults.ll | 7 +- llvm/test/Other/new-pm-thinlto-defaults.ll | 7 +- .../Other/new-pm-thinlto-postlink-pgo-defaults.ll | 9 ++- .../new-pm-thinlto-postlink-samplepgo-defaults.ll | 7 +- .../Other/new-pm-thinlto-prelink-pgo-defaults.ll | 9 ++- .../new-pm-thinlto-prelink-samplepgo-defaults.ll | 5 +- llvm/test/Other/opt-O2-pipeline.ll | 10 ++- llvm/test/Other/opt-O3-pipeline-enable-matrix.ll | 10 ++- llvm/test/Other/opt-O3-pipeline.ll | 10 ++- llvm/test/Other/opt-Os-pipeline.ll | 10 ++- llvm/test/Other/pass-pipelines.ll | 3 + llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll | 25 ++++--- .../PhaseOrdering/X86/spurious-peeling.ll | 87 +++++++++------------- llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll | 78 +++++++++---------- .../loop-rotation-vs-common-code-hoisting.ll | 22 +++--- 18 files changed, 193 insertions(+), 150 deletions(-) diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 3a325277e370..5a2285215769 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -568,6 +568,11 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level, LPM1.addPass(LoopInstSimplifyPass()); LPM1.addPass(LoopSimplifyCFGPass()); + // Try to remove as much code from the loop header as possible, + // to reduce amount of IR that will have to be duplicated. + // TODO: Investigate promotion cap for O1. + LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap)); + LPM1.addPass(LoopRotatePass(/* Disable header duplication */ true, isLTOPreLink(Phase))); // TODO: Investigate promotion cap for O1. 
@@ -736,6 +741,11 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level, LPM1.addPass(LoopInstSimplifyPass()); LPM1.addPass(LoopSimplifyCFGPass()); + // Try to remove as much code from the loop header as possible, + // to reduce amount of IR that will have to be duplicated. + // TODO: Investigate promotion cap for O1. + LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap)); + // Disable header duplication in loop rotation at -Oz. LPM1.addPass( LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase))); diff --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp index 109e7c97ff1b..2c80a16febef 100644 --- a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp +++ b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp @@ -431,6 +431,10 @@ void PassManagerBuilder::addFunctionSimplificationPasses( MPM.add(createLoopInstSimplifyPass()); MPM.add(createLoopSimplifyCFGPass()); } + // Try to remove as much code from the loop header as possible, + // to reduce amount of IR that will have to be duplicated. + // TODO: Investigate promotion cap for O1. + MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap)); // Rotate Loop - disable header duplication at -Oz MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1, PrepareForLTO)); // TODO: Investigate promotion cap for O1. 
diff --git a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll index 34e5e6c647da..5e33d968c710 100644 --- a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll +++ b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll @@ -129,16 +129,20 @@ ; GCN-O1-NEXT: Simplify the CFG ; GCN-O1-NEXT: Reassociate expressions ; GCN-O1-NEXT: Dominator Tree Construction +; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O1-NEXT: Function Alias Analysis Results +; GCN-O1-NEXT: Memory SSA ; GCN-O1-NEXT: Natural Loop Information ; GCN-O1-NEXT: Canonicalize natural loops ; GCN-O1-NEXT: LCSSA Verifier ; GCN-O1-NEXT: Loop-Closed SSA Form Pass -; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O1-NEXT: Function Alias Analysis Results ; GCN-O1-NEXT: Scalar Evolution Analysis +; GCN-O1-NEXT: Lazy Branch Probability Analysis +; GCN-O1-NEXT: Lazy Block Frequency Analysis +; GCN-O1-NEXT: Loop Pass Manager +; GCN-O1-NEXT: Loop Invariant Code Motion ; GCN-O1-NEXT: Loop Pass Manager ; GCN-O1-NEXT: Rotate Loops -; GCN-O1-NEXT: Memory SSA ; GCN-O1-NEXT: Lazy Branch Probability Analysis ; GCN-O1-NEXT: Lazy Block Frequency Analysis ; GCN-O1-NEXT: Loop Pass Manager @@ -451,16 +455,20 @@ ; GCN-O2-NEXT: Simplify the CFG ; GCN-O2-NEXT: Reassociate expressions ; GCN-O2-NEXT: Dominator Tree Construction +; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O2-NEXT: Function Alias Analysis Results +; GCN-O2-NEXT: Memory SSA ; GCN-O2-NEXT: Natural Loop Information ; GCN-O2-NEXT: Canonicalize natural loops ; GCN-O2-NEXT: LCSSA Verifier ; GCN-O2-NEXT: Loop-Closed SSA Form Pass -; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O2-NEXT: Function Alias Analysis Results ; GCN-O2-NEXT: Scalar Evolution Analysis +; GCN-O2-NEXT: Lazy Branch Probability Analysis +; GCN-O2-NEXT: Lazy Block Frequency Analysis +; GCN-O2-NEXT: Loop Pass Manager +; GCN-O2-NEXT: Loop Invariant Code Motion ; GCN-O2-NEXT: Loop Pass Manager ; GCN-O2-NEXT: Rotate Loops -; GCN-O2-NEXT: 
Memory SSA ; GCN-O2-NEXT: Lazy Branch Probability Analysis ; GCN-O2-NEXT: Lazy Block Frequency Analysis ; GCN-O2-NEXT: Loop Pass Manager @@ -810,16 +818,20 @@ ; GCN-O3-NEXT: Simplify the CFG ; GCN-O3-NEXT: Reassociate expressions ; GCN-O3-NEXT: Dominator Tree Construction +; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O3-NEXT: Function Alias Analysis Results +; GCN-O3-NEXT: Memory SSA ; GCN-O3-NEXT: Natural Loop Information ; GCN-O3-NEXT: Canonicalize natural loops ; GCN-O3-NEXT: LCSSA Verifier ; GCN-O3-NEXT: Loop-Closed SSA Form Pass -; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O3-NEXT: Function Alias Analysis Results ; GCN-O3-NEXT: Scalar Evolution Analysis +; GCN-O3-NEXT: Lazy Branch Probability Analysis +; GCN-O3-NEXT: Lazy Block Frequency Analysis +; GCN-O3-NEXT: Loop Pass Manager +; GCN-O3-NEXT: Loop Invariant Code Motion ; GCN-O3-NEXT: Loop Pass Manager ; GCN-O3-NEXT: Rotate Loops -; GCN-O3-NEXT: Memory SSA ; GCN-O3-NEXT: Lazy Branch Probability Analysis ; GCN-O3-NEXT: Lazy Block Frequency Analysis ; GCN-O3-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll index 01b02b8fd482..337a0857701c 100644 --- a/llvm/test/Other/new-pm-defaults.ll +++ b/llvm/test/Other/new-pm-defaults.ll @@ -113,9 +113,9 @@ ; CHECK-O-NEXT: Running analysis: CallGraphAnalysis ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis -; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy -; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis -; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy +; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy +; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis +; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}> ; CHECK-O-NEXT: Running 
pass: DevirtSCCRepeatedPass ; CHECK-O-NEXT: Starting CGSCC pass manager run. @@ -156,6 +156,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-defaults.ll b/llvm/test/Other/new-pm-thinlto-defaults.ll index fbf47de87eeb..bba43dd50e7a 100644 --- a/llvm/test/Other/new-pm-thinlto-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-defaults.ll @@ -98,9 +98,9 @@ ; CHECK-O-NEXT: Running analysis: CallGraphAnalysis ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-PRELINK-O-NEXT: Running analysis: ProfileSummaryAnalysis -; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy -; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis -; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy +; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy +; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis +; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy ; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass ; CHECK-O-NEXT: Starting CGSCC pass manager run. @@ -139,6 +139,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll index 4bcf70e15a5b..57f0e0da73b6 100644 --- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll @@ -68,10 +68,10 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running analysis: CallGraphAnalysis -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running analysis: CallGraphAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy @@ -112,6 +112,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll index 1071d28432b9..0e0e2854b8df 100644 --- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll @@ -78,9 +78,9 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy @@ -121,6 +121,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll index e2f1385cf52b..4cfb9825c97e 100644 --- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll @@ -93,10 +93,10 @@ ; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running analysis: CallGraphAnalysis -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running analysis: CallGraphAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo @@ -158,6 +158,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll index d4dc552aea01..a05555c57003 100644 --- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll @@ -73,8 +73,8 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis @@ -116,6 +116,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/opt-O2-pipeline.ll b/llvm/test/Other/opt-O2-pipeline.ll index f7217c122fdb..a3b01e5464d4 100644 --- a/llvm/test/Other/opt-O2-pipeline.ll +++ b/llvm/test/Other/opt-O2-pipeline.ll @@ -101,16 +101,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll b/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll index 6b98c1f80d9e..fafd5c8fdcb8 100644 --- a/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll +++ b/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll @@ -106,16 +106,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize 
natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-O3-pipeline.ll b/llvm/test/Other/opt-O3-pipeline.ll index 00a1d61ac058..103d49bbbbab 100644 --- a/llvm/test/Other/opt-O3-pipeline.ll +++ b/llvm/test/Other/opt-O3-pipeline.ll @@ -106,16 +106,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-Os-pipeline.ll b/llvm/test/Other/opt-Os-pipeline.ll index 21f9b8c6009e..508c21edbc68 100644 --- a/llvm/test/Other/opt-Os-pipeline.ll +++ b/llvm/test/Other/opt-Os-pipeline.ll @@ -87,16 +87,20 @@ ; CHECK-NEXT: Simplify the CFG ; 
CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/pass-pipelines.ll b/llvm/test/Other/pass-pipelines.ll index ccd364d5d740..768e8343529e 100644 --- a/llvm/test/Other/pass-pipelines.ll +++ b/llvm/test/Other/pass-pipelines.ll @@ -53,6 +53,9 @@ ; CHECK-O2-NEXT: FunctionPass Manager ; CHECK-O2-NOT: Manager ; CHECK-O2: Loop Pass Manager +; CHECK-O2-NOT: Manager +; CHECK-O2: Loop Pass Manager +; CHECK-O2-NOT: Manager ; CHECK-O2: Loop Pass Manager ; CHECK-O2-NOT: Manager ; FIXME: We shouldn't be pulling out to simplify-cfg and instcombine and diff --git a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll index 82deee9f367b..8f43029fa303 100644 --- a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll +++ b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll @@ -22,30 +22,33 @@ define dso_local i32 @main() { ; CHECK-NEXT: bb: ; CHECK-NEXT: [[I6:%.*]] = load i32, i32* @a, align 4 ; CHECK-NEXT: [[I24:%.*]] = load i32, i32* @b, align 4 -; CHECK-NEXT: [[D_PROMOTED9:%.*]] = load i32, i32* @d, align 4 -; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED9]], [[I6]] +; CHECK-NEXT: 
[[D_PROMOTED7:%.*]] = load i32, i32* @d, align 4 +; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED7]], [[I6]] ; CHECK-NEXT: [[I21:%.*]] = icmp eq i32 [[TMP0]], 0 -; CHECK-NEXT: br label [[BB1:%.*]] -; CHECK: bb1: -; CHECK-NEXT: br i1 [[I21]], label [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE:%.*]], label [[BB19_PREHEADER:%.*]] -; CHECK: bb19.preheader: +; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD:%.*]], label [[BB27_PREHEADER:%.*]] +; CHECK: bb27.preheader: ; CHECK-NEXT: [[I26:%.*]] = urem i32 [[I24]], [[TMP0]] ; CHECK-NEXT: store i32 [[I26]], i32* @e, align 4 ; CHECK-NEXT: [[I30_NOT:%.*]] = icmp eq i32 [[I26]], 0 -; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB1]] -; CHECK: bb13.preheader.bb27.thread.split_crit_edge: -; CHECK-NEXT: store i32 -1, i32* @f, align 4 +; CHECK-NEXT: br label [[BB27:%.*]] +; CHECK: bb27.thread: ; CHECK-NEXT: store i32 0, i32* @d, align 4 +; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: store i32 0, i32* @c, align 4 ; CHECK-NEXT: br label [[BB32:%.*]] +; CHECK: bb27: +; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB36:%.*]] ; CHECK: bb32.loopexit: -; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: store i32 [[TMP0]], i32* @d, align 4 +; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: br label [[BB32]] ; CHECK: bb32: -; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE]] ] +; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB27_THREAD]] ] ; CHECK-NEXT: store i32 0, i32* [[C_SINK]], align 4 ; CHECK-NEXT: ret i32 0 +; CHECK: bb36: +; CHECK-NEXT: store i32 1, i32* @c, align 4 +; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD]], label [[BB27]] ; bb: %i = alloca i32, align 4 diff --git a/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll b/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll index 3e659414d982..4661bd8a36cc 100644 --- 
a/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll +++ b/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll @@ -16,32 +16,28 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV ; OLDPM-NEXT: entry: ; OLDPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0 ; OLDPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]] -; OLDPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 -; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]] -; OLDPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 -; OLDPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] +; OLDPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 +; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]] +; OLDPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; OLDPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] ; OLDPM: for.body7.lr.ph.i: ; OLDPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0 -; OLDPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]] -; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0 -; OLDPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8:![0-9]+]] -; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef -; OLDPM-NEXT: 
[[BASE_I6_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 -; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I6_PEEL_I]], align 8, !tbaa [[TBAA8]] -; OLDPM-NEXT: [[ARRAYIDX_I7_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef -; OLDPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]] -; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]] -; OLDPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1 -; OLDPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I:%.*]] +; OLDPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 +; OLDPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I6_I]], align 8, !tbaa [[TBAA8:![0-9]+]] +; OLDPM-NEXT: [[ARRAYIDX_I7_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef +; OLDPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]] +; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0 +; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8]] +; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef +; OLDPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9:![0-9]+]] +; OLDPM-NEXT: br label [[FOR_BODY7_I:%.*]] ; OLDPM: for.body7.i: -; OLDPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I]] ] -; OLDPM-NEXT: [[J_012_I:%.*]] = phi i32 [ [[INC_I:%.*]], 
[[FOR_BODY7_I]] ], [ 1, [[FOR_BODY7_LR_PH_I]] ] -; OLDPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]] +; OLDPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ] +; OLDPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ] +; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I7_I]], align 4, !tbaa [[TBAA9]] +; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]] ; OLDPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_012_I]], 1 +; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1 ; OLDPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]] ; OLDPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]] ; OLDPM: _ZN12FloatVecPair6vecIncEv.exit: @@ -51,39 +47,30 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV ; NEWPM-NEXT: entry: ; NEWPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0 ; NEWPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]] -; NEWPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 -; NEWPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]] -; NEWPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 -; NEWPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] +; NEWPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 +; NEWPM-NEXT: [[TMP1:%.*]] = 
load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]] +; NEWPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; NEWPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] ; NEWPM: for.body7.lr.ph.i: ; NEWPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0 -; NEWPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]] -; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0 -; NEWPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8:![0-9]+]] -; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef -; NEWPM-NEXT: [[BASE_I4_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 -; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I4_PEEL_I]], align 8, !tbaa [[TBAA8]] -; NEWPM-NEXT: [[ARRAYIDX_I5_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef -; NEWPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]] -; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]] -; NEWPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1 -; NEWPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE:%.*]] -; NEWPM: for.body7.lr.ph.i.for.body7.i_crit_edge: -; NEWPM-NEXT: [[INC_I_1:%.*]] = add nuw i32 1, 1 +; NEWPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 
undef, i32 0 +; NEWPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I4_I]], align 8, !tbaa [[TBAA8:![0-9]+]] +; NEWPM-NEXT: [[ARRAYIDX_I5_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef +; NEWPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]] +; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0 +; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8]] +; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef +; NEWPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9:![0-9]+]] ; NEWPM-NEXT: br label [[FOR_BODY7_I:%.*]] ; NEWPM: for.body7.i: -; NEWPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE:%.*]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ] -; NEWPM-NEXT: [[INC_I_PHI:%.*]] = phi i32 [ [[INC_I_0:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]] ], [ [[INC_I_1]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ] -; NEWPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]] +; NEWPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ] +; NEWPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ] +; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I5_I]], align 4, !tbaa [[TBAA9]] +; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]] ; NEWPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I_PHI]], [[TMP1]] -; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]], !llvm.loop 
[[LOOP11:![0-9]+]] -; NEWPM: for.body7.i.for.body7.i_crit_edge: -; NEWPM-NEXT: [[INC_I_0]] = add nuw i32 [[INC_I_PHI]], 1 -; NEWPM-NEXT: br label [[FOR_BODY7_I]] +; NEWPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1 +; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]] +; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]] ; NEWPM: _ZN12FloatVecPair6vecIncEv.exit: ; NEWPM-NEXT: ret void ; diff --git a/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll b/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll index 280f849dbb35..8b8b535f1a77 100644 --- a/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll +++ b/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll @@ -15,18 +15,18 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-LABEL: @vdiv( ; CHECK-NEXT: entry: ; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0 -; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_LR_PH:%.*]], label [[FOR_END:%.*]] -; CHECK: for.body.lr.ph: +; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]] +; CHECK: for.body.preheader: ; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64 ; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4 -; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER:%.*]], label [[VECTOR_MEMCHECK:%.*]] +; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER8:%.*]], label [[VECTOR_MEMCHECK:%.*]] ; CHECK: vector.memcheck: ; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[X:%.*]], i64 [[WIDE_TRIP_COUNT]] ; CHECK-NEXT: [[SCEVGEP6:%.*]] = getelementptr double, double* [[Y:%.*]], i64 [[WIDE_TRIP_COUNT]] ; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt double* [[SCEVGEP6]], [[X]] ; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt double* [[SCEVGEP]], [[Y]] ; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]] -; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER]], label 
[[VECTOR_PH:%.*]] +; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER8]], label [[VECTOR_PH:%.*]] ; CHECK: vector.ph: ; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967292 ; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x double> poison, double [[A:%.*]], i32 0 @@ -49,39 +49,39 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[VECTOR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX]] ; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP8]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, [[TBAA3:!tbaa !.*]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, !tbaa [[TBAA3:![0-9]+]], !alias.scope !7 ; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[TMP4]] ; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX]] ; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX]], 4 ; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT]] ; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP13]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_1]], [[TMP5]] ; CHECK-NEXT: [[TMP16:%.*]] = getelementptr 
inbounds double, double* [[X]], i64 [[INDEX_NEXT]] ; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP16]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX]], 8 ; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_1]] ; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP18]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP20:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_2]], [[TMP6]] ; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_1]] ; CHECK-NEXT: [[TMP22:%.*]] = bitcast double* [[TMP21]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX]], 12 ; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_2]] ; CHECK-NEXT: [[TMP24:%.*]] = bitcast double* [[TMP23]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP25:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_3]], [[TMP7]] ; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_2]] ; CHECK-NEXT: 
[[TMP27:%.*]] = bitcast double* [[TMP26]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_3]] = add i64 [[INDEX]], 16 ; CHECK-NEXT: [[NITER_NSUB_3]] = add i64 [[NITER]], -4 ; CHECK-NEXT: [[NITER_NCMP_3:%.*]] = icmp eq i64 [[NITER_NSUB_3]], 0 -; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]] ; CHECK: middle.block.unr-lcssa: ; CHECK-NEXT: [[INDEX_UNR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT_3]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0 @@ -94,78 +94,78 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[XTRAITER]], [[VECTOR_BODY_EPIL_PREHEADER]] ], [ [[EPIL_ITER_SUB:%.*]], [[VECTOR_BODY_EPIL]] ] ; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_EPIL]] ; CHECK-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP31:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_EPIL]], [[TMP28]] ; CHECK-NEXT: [[TMP32:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_EPIL]] ; CHECK-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x double>* [[TMP33]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x 
double>* [[TMP33]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_EPIL]] = add i64 [[INDEX_EPIL]], 4 ; CHECK-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1 ; CHECK-NEXT: [[EPIL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0 -; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], [[LOOP14:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], !llvm.loop [[LOOP14:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER]] -; CHECK: for.body.preheader: -; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_LR_PH]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ] +; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8]] +; CHECK: for.body.preheader8: +; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ] ; CHECK-NEXT: [[TMP34:%.*]] = xor i64 [[INDVARS_IV_PH]], -1 ; CHECK-NEXT: [[TMP35:%.*]] = add nsw i64 [[TMP34]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: [[XTRAITER8:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3 -; CHECK-NEXT: [[LCMP_MOD9_NOT:%.*]] = icmp eq i64 [[XTRAITER8]], 0 -; CHECK-NEXT: br i1 [[LCMP_MOD9_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]] +; CHECK-NEXT: [[XTRAITER9:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3 +; CHECK-NEXT: [[LCMP_MOD10_NOT:%.*]] = icmp eq i64 [[XTRAITER9]], 0 +; CHECK-NEXT: br i1 [[LCMP_MOD10_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]] ; CHECK: for.body.prol.preheader: ; CHECK-NEXT: [[TMP36:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: br label [[FOR_BODY_PROL:%.*]] ; CHECK: for.body.prol: ; CHECK-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], 
[[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PROL_PREHEADER]] ] -; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER8]], [[FOR_BODY_PROL_PREHEADER]] ] +; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER9]], [[FOR_BODY_PROL_PREHEADER]] ] ; CHECK-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_PROL]] -; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP37:%.*]] = fmul fast double [[T0_PROL]], [[TMP36]] ; CHECK-NEXT: [[ARRAYIDX2_PROL:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_PROL]] -; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1 ; CHECK-NEXT: [[PROL_ITER_SUB]] = add i64 [[PROL_ITER]], -1 ; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_SUB]], 0 -; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], [[LOOP16:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP16:![0-9]+]] ; CHECK: for.body.prol.loopexit: -; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ] +; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER8]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ] ; CHECK-NEXT: [[TMP38:%.*]] = icmp ult i64 [[TMP35]], 3 -; CHECK-NEXT: br i1 [[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER_NEW:%.*]] -; CHECK: for.body.preheader.new: +; CHECK-NEXT: br i1 
[[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8_NEW:%.*]] +; CHECK: for.body.preheader8.new: ; CHECK-NEXT: [[TMP39:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP40:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP41:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP42:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: br label [[FOR_BODY:%.*]] ; CHECK: for.body: -; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ] +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER8_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ] ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV]] -; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP43:%.*]] = fmul fast double [[T0]], [[TMP39]] ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV]] -; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1 ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT]] -; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP44:%.*]] = fmul fast double [[T0_1]], [[TMP40]] ; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT]] -; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, 
!tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2 ; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_1]] -; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP45:%.*]] = fmul fast double [[T0_2]], [[TMP41]] ; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_1]] -; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3 ; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_2]] -; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP46:%.*]] = fmul fast double [[T0_3]], [[TMP42]] ; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_2]] -; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_3]] = add nuw nsw i64 [[INDVARS_IV]], 4 ; CHECK-NEXT: [[EXITCOND_NOT_3:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_3]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP17:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]] ; CHECK: for.end: ; CHECK-NEXT: ret void ; diff --git a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll 
b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll index 7d7d18a5247d..bb320af193e3 100644 --- a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll +++ b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll @@ -76,18 +76,20 @@ define void @_Z4loopi(i32 %width) { ; ROTATED_LATER_OLDPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1 ; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]] ; ROTATED_LATER_OLDPM: for.cond.preheader: +; ROTATED_LATER_OLDPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1 +; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]] +; ROTATED_LATER_OLDPM: for.body.preheader: ; ROTATED_LATER_OLDPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1 -; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0 -; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY:%.*]] +; ROTATED_LATER_OLDPM-NEXT: br label [[FOR_BODY:%.*]] ; ROTATED_LATER_OLDPM: for.cond.cleanup: ; ROTATED_LATER_OLDPM-NEXT: tail call void @f0() ; ROTATED_LATER_OLDPM-NEXT: tail call void @f2() ; ROTATED_LATER_OLDPM-NEXT: br label [[RETURN]] ; ROTATED_LATER_OLDPM: for.body: -; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_COND_PREHEADER]] ] +; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ] ; ROTATED_LATER_OLDPM-NEXT: tail call void @f0() ; ROTATED_LATER_OLDPM-NEXT: tail call void @f1() -; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw i32 [[I_04]], 1 +; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw nsw i32 [[I_04]], 1 ; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[TMP0]] ; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]] ; ROTATED_LATER_OLDPM: return: @@ -98,24 +100,24 @@ 
define void @_Z4loopi(i32 %width) { ; ROTATED_LATER_NEWPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1 ; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]] ; ROTATED_LATER_NEWPM: for.cond.preheader: +; ROTATED_LATER_NEWPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1 +; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]] +; ROTATED_LATER_NEWPM: for.body.preheader: ; ROTATED_LATER_NEWPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1 -; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0 -; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE:%.*]] -; ROTATED_LATER_NEWPM: for.cond.preheader.for.body_crit_edge: -; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw i32 0, 1 +; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw nsw i32 0, 1 ; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY:%.*]] ; ROTATED_LATER_NEWPM: for.cond.cleanup: ; ROTATED_LATER_NEWPM-NEXT: tail call void @f0() ; ROTATED_LATER_NEWPM-NEXT: tail call void @f2() ; ROTATED_LATER_NEWPM-NEXT: br label [[RETURN]] ; ROTATED_LATER_NEWPM: for.body: -; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE]] ] +; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_BODY_PREHEADER]] ] ; ROTATED_LATER_NEWPM-NEXT: tail call void @f0() ; ROTATED_LATER_NEWPM-NEXT: tail call void @f1() ; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC_PHI]], [[TMP0]] ; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ; ROTATED_LATER_NEWPM: for.body.for.body_crit_edge: -; ROTATED_LATER_NEWPM-NEXT: [[INC_0]] = add nuw i32 [[INC_PHI]], 1 +; ROTATED_LATER_NEWPM-NEXT: 
[[INC_0]] = add nuw nsw i32 [[INC_PHI]], 1 ; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY]] ; ROTATED_LATER_NEWPM: return: ; ROTATED_LATER_NEWPM-NEXT: ret void </cut>

4 years, 8 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O2_LTO - Build # 12 - Successful!

by ci_notify＠linaro.org

4 years, 8 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-release-aarch64-spec2k6-O3_LTO - Build # 9 - Successful!

by ci_notify＠linaro.org

4 years, 8 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O3_LTO - Build # 8 - Successful!

by ci_notify＠linaro.org

4 years, 8 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-release-aarch64-spec2k6-O3_LTO - Build # 8 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO

Culprit:
<cut>
commit ad558a4ff7cd61081cfeaabff1dbc8c0a9afa92b
Author: Carl Ritson <carl.ritson(a)amd.com>
Date:   Tue May 11 12:14:01 2021 +0900

    [AMDGPU] Pre-commit tests for D102211
</cut>

Results regressed to (for first_bad == ad558a4ff7cd61081cfeaabff1dbc8c0a9afa92b)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3_LTO artifacts/build-ad558a4ff7cd61081cfeaabff1dbc8c0a9afa92b/results_id: 1
# 473.astar,astar_base.default regressed by 104

from (for last_good == d8ec2b183e9243366e3a0cd1116dbe879856b333)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3_LTO artifacts/build-d8ec2b183e9243366e3a0cd1116dbe879856b333/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/4218
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/4219

Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-ad558a4ff7cd61081cfeaabff1dbc8c0a9afa92b
cd investigate-llvm-ad558a4ff7cd61081cfeaabff1dbc8c0a9afa92b

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach ad558a4ff7cd61081cfeaabff1dbc8c0a9afa92b
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach d8ec2b183e9243366e3a0cd1116dbe879856b333
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…

Full commit (up to 1000 lines):
<cut>
commit ad558a4ff7cd61081cfeaabff1dbc8c0a9afa92b
Author: Carl Ritson <carl.ritson(a)amd.com>
Date:   Tue May 11 12:14:01 2021 +0900

    [AMDGPU] Pre-commit tests for D102211
---
 llvm/test/CodeGen/AMDGPU/hard-clauses.mir | 36 +++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/llvm/test/CodeGen/AMDGPU/hard-clauses.mir b/llvm/test/CodeGen/AMDGPU/hard-clauses.mir
index 506f9a77c177..e6ca33341bfb 100644
--- a/llvm/test/CodeGen/AMDGPU/hard-clauses.mir
+++ b/llvm/test/CodeGen/AMDGPU/hard-clauses.mir
@@ -209,3 +209,39 @@ body: |
     $vgpr79 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 316, 0, 0, 0, implicit $exec
     $vgpr80 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 320, 0, 0, 0, implicit $exec
 ...
+
+---
+name: mimg_nsa
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8
+    ; CHECK-LABEL: name: mimg_nsa
+    ; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8
+    ; CHECK: BUNDLE implicit-def $vgpr10_vgpr11_vgpr12_vgpr13, implicit-def $vgpr10, implicit-def $vgpr10_lo16, implicit-def $vgpr10_hi16, implicit-def $vgpr11, implicit-def $vgpr11_lo16, implicit-def $vgpr11_hi16, implicit-def $vgpr12, implicit-def $vgpr12_lo16, implicit-def $vgpr12_hi16, implicit-def $vgpr13, implicit-def $vgpr13_lo16, implicit-def $vgpr13_hi16, implicit-def $vgpr10_vgpr11, implicit-def $vgpr10_vgpr11_vgpr12, implicit-def $vgpr11_vgpr12, implicit-def $vgpr11_vgpr12_vgpr13, implicit-def $vgpr12_vgpr13, implicit-def $vgpr20_vgpr21_vgpr22_vgpr23, implicit-def $vgpr20, implicit-def $vgpr20_lo16, implicit-def $vgpr20_hi16, implicit-def $vgpr21, implicit-def $vgpr21_lo16, implicit-def $vgpr21_hi16, implicit-def $vgpr22, implicit-def $vgpr22_lo16, implicit-def $vgpr22_hi16, implicit-def $vgpr23, implicit-def $vgpr23_lo16, implicit-def $vgpr23_hi16, implicit-def $vgpr20_vgpr21, implicit-def $vgpr20_vgpr21_vgpr22, implicit-def $vgpr21_vgpr22, implicit-def $vgpr21_vgpr22_vgpr23, implicit-def $vgpr22_vgpr23, implicit $vgpr3, implicit $vgpr8, implicit $vgpr7, implicit $vgpr5, implicit $vgpr4, implicit $vgpr6, implicit $vgpr0, implicit $vgpr2, implicit $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, implicit $sgpr8_sgpr9_sgpr10_sgpr11, implicit $exec {
+    ; CHECK: S_CLAUSE 1
+    ; CHECK: $vgpr10_vgpr11_vgpr12_vgpr13 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 $vgpr3, $vgpr8, $vgpr7, $vgpr5, $vgpr4, $vgpr6, $vgpr0, $vgpr2, $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
+    ; CHECK: $vgpr20_vgpr21_vgpr22_vgpr23 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 $vgpr3, $vgpr8, $vgpr7, $vgpr5, $vgpr4, $vgpr6, $vgpr0, $vgpr2, $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
+    ; CHECK: }
+    $vgpr10_vgpr11_vgpr12_vgpr13 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 $vgpr3, $vgpr8, $vgpr7, $vgpr5, $vgpr4, $vgpr6, $vgpr0, $vgpr2, $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
+    $vgpr20_vgpr21_vgpr22_vgpr23 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 $vgpr3, $vgpr8, $vgpr7, $vgpr5, $vgpr4, $vgpr6, $vgpr0, $vgpr2, $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
+...
+
+---
+name: mimg_nsa_mixed
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8
+    ; CHECK-LABEL: name: mimg_nsa_mixed
+    ; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8
+    ; CHECK: BUNDLE implicit-def $vgpr10_vgpr11_vgpr12_vgpr13, implicit-def $vgpr10, implicit-def $vgpr10_lo16, implicit-def $vgpr10_hi16, implicit-def $vgpr11, implicit-def $vgpr11_lo16, implicit-def $vgpr11_hi16, implicit-def $vgpr12, implicit-def $vgpr12_lo16, implicit-def $vgpr12_hi16, implicit-def $vgpr13, implicit-def $vgpr13_lo16, implicit-def $vgpr13_hi16, implicit-def $vgpr10_vgpr11, implicit-def $vgpr10_vgpr11_vgpr12, implicit-def $vgpr11_vgpr12, implicit-def $vgpr11_vgpr12_vgpr13, implicit-def $vgpr12_vgpr13, implicit-def $vgpr14, implicit-def $vgpr14_lo16, implicit-def $vgpr14_hi16, implicit-def $vgpr20_vgpr21_vgpr22_vgpr23, implicit-def $vgpr20, implicit-def $vgpr20_lo16, implicit-def $vgpr20_hi16, implicit-def $vgpr21, implicit-def $vgpr21_lo16, implicit-def $vgpr21_hi16, implicit-def $vgpr22, implicit-def $vgpr22_lo16, implicit-def $vgpr22_hi16, implicit-def $vgpr23, implicit-def $vgpr23_lo16, implicit-def $vgpr23_hi16, implicit-def $vgpr20_vgpr21, implicit-def $vgpr20_vgpr21_vgpr22, implicit-def $vgpr21_vgpr22, implicit-def $vgpr21_vgpr22_vgpr23, implicit-def $vgpr22_vgpr23, implicit $vgpr3, implicit $vgpr8, implicit $vgpr7, implicit $vgpr5, implicit $vgpr4, implicit $vgpr6, implicit $vgpr0, implicit $vgpr2, implicit $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, implicit $sgpr8_sgpr9_sgpr10_sgpr11, implicit $exec, implicit $vgpr5_vgpr6 {
+    ; CHECK: S_CLAUSE 2
+    ; CHECK: $vgpr10_vgpr11_vgpr12_vgpr13 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 $vgpr3, $vgpr8, $vgpr7, $vgpr5, $vgpr4, $vgpr6, $vgpr0, $vgpr2, $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
+    ; CHECK: $vgpr14 = IMAGE_SAMPLE_LZ_V1_V2_gfx10 $vgpr5_vgpr6, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 1, 1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 16 from custom "ImageResource")
+    ; CHECK: $vgpr20_vgpr21_vgpr22_vgpr23 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 $vgpr3, $vgpr8, $vgpr7, $vgpr5, $vgpr4, $vgpr6, $vgpr0, $vgpr2, $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
+    ; CHECK: }
+    $vgpr10_vgpr11_vgpr12_vgpr13 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 $vgpr3, $vgpr8, $vgpr7, $vgpr5, $vgpr4, $vgpr6, $vgpr0, $vgpr2, $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
+    $vgpr14 = IMAGE_SAMPLE_LZ_V1_V2_gfx10 $vgpr5_vgpr6, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 1, 1, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 16 from custom "ImageResource")
+    $vgpr20_vgpr21_vgpr22_vgpr23 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 $vgpr3, $vgpr8, $vgpr7, $vgpr5, $vgpr4, $vgpr6, $vgpr0, $vgpr2, $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
+...
</cut>

4 years, 8 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O3 - Build # 11 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3

Culprit:
<cut>
commit 99203f2004d031f2ef22f01e3c569d2775de1836
Author: Alexey Bataev <a.bataev(a)outlook.com>
Date:   Tue Mar 23 13:22:58 2021 -0700

    [Analysis]Add getPointersDiff function to improve compile time.

    Added getPointersDiff function to LoopAccessAnalysis and used it instead
    direct calculatoin of the distance between pointers and/or
    isConsecutiveAccess function in SLP vectorizer to improve compile time
    and detection of stores consecutive chains.

    Part of D57059

    Differential Revision: https://reviews.llvm.org/D98967
</cut>

Results regressed to (for first_bad == 99203f2004d031f2ef22f01e3c569d2775de1836)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3_marm artifacts/build-99203f2004d031f2ef22f01e3c569d2775de1836/results_id: 1
# 458.sjeng,sjeng_base.default regressed by 103

from (for last_good == 4157a079afbf7fa5c3ce3ac0e9f4541f89188ae2)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3_marm artifacts/build-4157a079afbf7fa5c3ce3ac0e9f4541f89188ae2/results_id: 1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/4192
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/4194

Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-99203f2004d031f2ef22f01e3c569d2775de1836
cd investigate-llvm-99203f2004d031f2ef22f01e3c569d2775de1836

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 99203f2004d031f2ef22f01e3c569d2775de1836
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 4157a079afbf7fa5c3ce3ac0e9f4541f89188ae2
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…

Full commit (up to 1000 lines):
<cut>
commit 99203f2004d031f2ef22f01e3c569d2775de1836
Author: Alexey Bataev <a.bataev(a)outlook.com>
Date:   Tue Mar 23 13:22:58 2021 -0700

    [Analysis]Add getPointersDiff function to improve compile time.

    Added getPointersDiff function to LoopAccessAnalysis and used it instead
    direct calculatoin of the distance between pointers and/or
    isConsecutiveAccess function in SLP vectorizer to improve compile time
    and detection of stores consecutive chains.

    Part of D57059

    Differential Revision: https://reviews.llvm.org/D98967
---
 llvm/include/llvm/Analysis/LoopAccessAnalysis.h   |   9 +
 llvm/lib/Analysis/LoopAccessAnalysis.cpp          | 198 ++++++++++------------
 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp   |  79 ++++++---
 llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll |  17 +-
 4 files changed, 163 insertions(+), 140 deletions(-)

diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index 13fbe884eddf..39acfd5bbbee 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -679,6 +679,15 @@ int64_t getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr, const Loop *Lp,
                      const ValueToValueMap &StridesMap = ValueToValueMap(),
                      bool Assume = false, bool ShouldCheckWrap = true);
 
+/// Returns the distance between the pointers \p PtrA and \p PtrB iff they are
+/// compatible and it is possible to calculate the distance between them. This
+/// is a simple API that does not depend on the analysis pass.
+/// \param StrictCheck Ensure that the calculated distance matches the
+/// type-based one after all the bitcasts removal in the provided pointers.
+Optional<int> getPointersDiff(Value *PtrA, Value *PtrB, const DataLayout &DL,
+                              ScalarEvolution &SE, bool StrictCheck = false,
+                              bool CheckType = true);
+
 /// Attempt to sort the pointers in \p VL and return the sorted indices
 /// in \p SortedIndices, if reordering is required.
 ///
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index e632fe25c24c..997d4474a448 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1124,139 +1124,123 @@ int64_t llvm::getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr,
   return Stride;
 }
 
+Optional<int> llvm::getPointersDiff(Value *PtrA, Value *PtrB,
+                                    const DataLayout &DL, ScalarEvolution &SE,
+                                    bool StrictCheck, bool CheckType) {
+  assert(PtrA && PtrB && "Expected non-nullptr pointers.");
+  // Make sure that A and B are different pointers.
+  if (PtrA == PtrB)
+    return 0;
+
+  // Make sure that PtrA and PtrB have the same type if required
+  if (CheckType && PtrA->getType() != PtrB->getType())
+    return None;
+
+  unsigned ASA = PtrA->getType()->getPointerAddressSpace();
+  unsigned ASB = PtrB->getType()->getPointerAddressSpace();
+
+  // Check that the address spaces match.
+  if (ASA != ASB)
+    return None;
+  unsigned IdxWidth = DL.getIndexSizeInBits(ASA);
+
+  APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0);
+  Value *PtrA1 = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);
+  Value *PtrB1 = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);
+
+  int Val;
+  if (PtrA1 == PtrB1) {
+    // Retrieve the address space again as pointer stripping now tracks through
+    // `addrspacecast`.
+    ASA = cast<PointerType>(PtrA1->getType())->getAddressSpace();
+    ASB = cast<PointerType>(PtrB1->getType())->getAddressSpace();
+    // Check that the address spaces match and that the pointers are valid.
+    if (ASA != ASB)
+      return None;
+
+    IdxWidth = DL.getIndexSizeInBits(ASA);
+    OffsetA = OffsetA.sextOrTrunc(IdxWidth);
+    OffsetB = OffsetB.sextOrTrunc(IdxWidth);
+
+    OffsetB -= OffsetA;
+    Val = OffsetB.getSExtValue();
+  } else {
+    // Otherwise compute the distance with SCEV between the base pointers.
+    const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
+    const SCEV *PtrSCEVB = SE.getSCEV(PtrB);
+    const auto *Diff =
+        dyn_cast<SCEVConstant>(SE.getMinusSCEV(PtrSCEVB, PtrSCEVA));
+    if (!Diff)
+      return None;
+    Val = Diff->getAPInt().getSExtValue();
+  }
+  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
+  int Size = DL.getTypeStoreSize(Ty);
+  int Dist = Val / Size;
+
+  // Ensure that the calculated distance matches the type-based one after all
+  // the bitcasts removal in the provided pointers.
+  if (!StrictCheck || Dist * Size == Val)
+    return Dist;
+  return None;
+}
+
 bool llvm::sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
                            ScalarEvolution &SE,
                            SmallVectorImpl<unsigned> &SortedIndices) {
   assert(llvm::all_of(
              VL, [](const Value *V) { return V->getType()->isPointerTy(); }) &&
          "Expected list of pointer operands.");
-  SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
-  OffValPairs.reserve(VL.size());
-
   // Walk over the pointers, and map each of them to an offset relative to
   // first pointer in the array.
   Value *Ptr0 = VL[0];
-  const SCEV *Scev0 = SE.getSCEV(Ptr0);
-  Value *Obj0 = getUnderlyingObject(Ptr0);
-
-  llvm::SmallSet<int64_t, 4> Offsets;
-  for (auto *Ptr : VL) {
-    // TODO: Outline this code as a special, more time consuming, version of
-    // computeConstantDifference() function.
-    if (Ptr->getType()->getPointerAddressSpace() !=
-        Ptr0->getType()->getPointerAddressSpace())
-      return false;
-    // If a pointer refers to a different underlying object, bail - the
-    // pointers are by definition incomparable.
-    Value *CurrObj = getUnderlyingObject(Ptr);
-    if (CurrObj != Obj0)
-      return false;
-    const SCEV *Scev = SE.getSCEV(Ptr);
-    const auto *Diff = dyn_cast<SCEVConstant>(SE.getMinusSCEV(Scev, Scev0));
-    // The pointers may not have a constant offset from each other, or SCEV
-    // may just not be smart enough to figure out they do. Regardless,
-    // there's nothing we can do.
+  using DistOrdPair = std::pair<int64_t, int>;
+  auto Compare = [](const DistOrdPair &L, const DistOrdPair &R) {
+    return L.first < R.first;
+  };
+  std::set<DistOrdPair, decltype(Compare)> Offsets(Compare);
+  Offsets.emplace(0, 0);
+  int Cnt = 1;
+  bool IsConsecutive = true;
+  for (auto *Ptr : VL.drop_front()) {
+    Optional<int> Diff = getPointersDiff(Ptr0, Ptr, DL, SE);
     if (!Diff)
       return false;
     // Check if the pointer with the same offset is found.
-    int64_t Offset = Diff->getAPInt().getSExtValue();
-    if (!Offsets.insert(Offset).second)
+    int64_t Offset = *Diff;
+    auto Res = Offsets.emplace(Offset, Cnt);
+    if (!Res.second)
       return false;
-    OffValPairs.emplace_back(Offset, Ptr);
+    // Consecutive order if the inserted element is the last one.
+    IsConsecutive = IsConsecutive && std::next(Res.first) == Offsets.end();
+    ++Cnt;
   }
   SortedIndices.clear();
-  SortedIndices.resize(VL.size());
-  std::iota(SortedIndices.begin(), SortedIndices.end(), 0);
-
-  // Sort the memory accesses and keep the order of their uses in UseOrder.
-  llvm::stable_sort(SortedIndices, [&](unsigned Left, unsigned Right) {
-    return OffValPairs[Left].first < OffValPairs[Right].first;
-  });
-
-  // Check if the order is consecutive already.
-  if (llvm::all_of(SortedIndices, [&SortedIndices](const unsigned I) {
-        return I == SortedIndices[I];
-      }))
-    SortedIndices.clear();
-
+  if (!IsConsecutive) {
+    // Fill SortedIndices array only if it is non-consecutive.
+ SortedIndices.resize(VL.size()); + Cnt = 0; + for (const std::pair<int64_t, int> &Pair : Offsets) { + IsConsecutive = IsConsecutive && Cnt == Pair.second; + SortedIndices[Cnt] = Pair.second; + ++Cnt; + } + } return true; } -/// Take the address space operand from the Load/Store instruction. -/// Returns -1 if this is not a valid Load/Store instruction. -static unsigned getAddressSpaceOperand(Value *I) { - if (LoadInst *L = dyn_cast<LoadInst>(I)) - return L->getPointerAddressSpace(); - if (StoreInst *S = dyn_cast<StoreInst>(I)) - return S->getPointerAddressSpace(); - return -1; -} - /// Returns true if the memory operations \p A and \p B are consecutive. bool llvm::isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL, ScalarEvolution &SE, bool CheckType) { Value *PtrA = getLoadStorePointerOperand(A); Value *PtrB = getLoadStorePointerOperand(B); - unsigned ASA = getAddressSpaceOperand(A); - unsigned ASB = getAddressSpaceOperand(B); - - // Check that the address spaces match and that the pointers are valid. - if (!PtrA || !PtrB || (ASA != ASB)) - return false; - - // Make sure that A and B are different pointers. - if (PtrA == PtrB) - return false; - - // Make sure that A and B have the same type if required. - if (CheckType && PtrA->getType() != PtrB->getType()) + if (!PtrA || !PtrB) return false; - - unsigned IdxWidth = DL.getIndexSizeInBits(ASA); - Type *Ty = cast<PointerType>(PtrA->getType())->getElementType(); - - APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0); - PtrA = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA); - PtrB = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB); - - // Retrieve the address space again as pointer stripping now tracks through - // `addrspacecast`. - ASA = cast<PointerType>(PtrA->getType())->getAddressSpace(); - ASB = cast<PointerType>(PtrB->getType())->getAddressSpace(); - // Check that the address spaces match and that the pointers are valid. 
- if (ASA != ASB) - return false; - - IdxWidth = DL.getIndexSizeInBits(ASA); - OffsetA = OffsetA.sextOrTrunc(IdxWidth); - OffsetB = OffsetB.sextOrTrunc(IdxWidth); - - APInt Size(IdxWidth, DL.getTypeStoreSize(Ty)); - - // OffsetDelta = OffsetB - OffsetA; - const SCEV *OffsetSCEVA = SE.getConstant(OffsetA); - const SCEV *OffsetSCEVB = SE.getConstant(OffsetB); - const SCEV *OffsetDeltaSCEV = SE.getMinusSCEV(OffsetSCEVB, OffsetSCEVA); - const APInt &OffsetDelta = cast<SCEVConstant>(OffsetDeltaSCEV)->getAPInt(); - - // Check if they are based on the same pointer. That makes the offsets - // sufficient. - if (PtrA == PtrB) - return OffsetDelta == Size; - - // Compute the necessary base pointer delta to have the necessary final delta - // equal to the size. - // BaseDelta = Size - OffsetDelta; - const SCEV *SizeSCEV = SE.getConstant(Size); - const SCEV *BaseDelta = SE.getMinusSCEV(SizeSCEV, OffsetDeltaSCEV); - - // Otherwise compute the distance with SCEV between the base pointers. - const SCEV *PtrSCEVA = SE.getSCEV(PtrA); - const SCEV *PtrSCEVB = SE.getSCEV(PtrB); - const SCEV *X = SE.getAddExpr(PtrSCEVA, BaseDelta); - return X == PtrSCEVB; + Optional<int> Diff = + getPointersDiff(PtrA, PtrB, DL, SE, /*StrictCheck=*/true, CheckType); + return Diff && *Diff == 1; } MemoryDepChecker::VectorizationSafetyStatus diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp index 385b6f30dc0f..78d2ea0032db 100644 --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp @@ -941,10 +941,16 @@ public: ScalarEvolution &SE) { auto *LI1 = dyn_cast<LoadInst>(V1); auto *LI2 = dyn_cast<LoadInst>(V2); - if (LI1 && LI2) - return isConsecutiveAccess(LI1, LI2, DL, SE) - ? 
VLOperands::ScoreConsecutiveLoads - : VLOperands::ScoreFail; + if (LI1 && LI2) { + if (LI1->getParent() != LI2->getParent()) + return VLOperands::ScoreFail; + + Optional<int> Dist = + getPointersDiff(LI1->getPointerOperand(), LI2->getPointerOperand(), + DL, SE, /*StrictCheck=*/true); + return (Dist && *Dist == 1) ? VLOperands::ScoreConsecutiveLoads + : VLOperands::ScoreFail; + } auto *C1 = dyn_cast<Constant>(V1); auto *C2 = dyn_cast<Constant>(V2); @@ -2871,13 +2877,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth, Ptr0 = PointerOps[CurrentOrder.front()]; PtrN = PointerOps[CurrentOrder.back()]; } - const SCEV *Scev0 = SE->getSCEV(Ptr0); - const SCEV *ScevN = SE->getSCEV(PtrN); - const auto *Diff = - dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0)); - uint64_t Size = DL->getTypeAllocSize(ScalarTy); + Optional<int> Diff = getPointersDiff(Ptr0, PtrN, *DL, *SE); // Check that the sorted loads are consecutive. - if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) { + if (static_cast<unsigned>(*Diff) == VL.size() - 1) { if (CurrentOrder.empty()) { // Original loads are consecutive and does not require reordering. ++NumOpsWantToKeepOriginalOrder; @@ -3150,13 +3152,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth, Ptr0 = PointerOps[CurrentOrder.front()]; PtrN = PointerOps[CurrentOrder.back()]; } - const SCEV *Scev0 = SE->getSCEV(Ptr0); - const SCEV *ScevN = SE->getSCEV(PtrN); - const auto *Diff = - dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0)); - uint64_t Size = DL->getTypeAllocSize(ScalarTy); + Optional<int> Dist = getPointersDiff(Ptr0, PtrN, *DL, *SE); // Check that the sorted pointer operands are consecutive. - if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) { + if (static_cast<unsigned>(*Dist) == VL.size() - 1) { if (CurrentOrder.empty()) { // Original stores are consecutive and does not require reordering. 
++NumOpsWantToKeepOriginalOrder; @@ -6107,20 +6105,41 @@ bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores, int E = Stores.size(); SmallBitVector Tails(E, false); - SmallVector<int, 16> ConsecutiveChain(E, E + 1); int MaxIter = MaxStoreLookup.getValue(); + SmallVector<std::pair<int, int>, 16> ConsecutiveChain( + E, std::make_pair(E, INT_MAX)); + SmallVector<SmallBitVector, 4> CheckedPairs(E, SmallBitVector(E, false)); int IterCnt; auto &&FindConsecutiveAccess = [this, &Stores, &Tails, &IterCnt, MaxIter, + &CheckedPairs, &ConsecutiveChain](int K, int Idx) { if (IterCnt >= MaxIter) return true; + if (CheckedPairs[Idx].test(K)) + return ConsecutiveChain[K].second == 1 && + ConsecutiveChain[K].first == Idx; ++IterCnt; - if (!isConsecutiveAccess(Stores[K], Stores[Idx], *DL, *SE)) + CheckedPairs[Idx].set(K); + CheckedPairs[K].set(Idx); + Optional<int> Diff = getPointersDiff(Stores[K]->getPointerOperand(), + Stores[Idx]->getPointerOperand(), *DL, + *SE, /*StrictCheck=*/true); + if (!Diff || *Diff == 0) + return false; + int Val = *Diff; + if (Val < 0) { + if (ConsecutiveChain[Idx].second > -Val) { + Tails.set(K); + ConsecutiveChain[Idx] = std::make_pair(K, -Val); + } + return false; + } + if (ConsecutiveChain[K].second <= Val) return false; Tails.set(Idx); - ConsecutiveChain[K] = Idx; - return true; + ConsecutiveChain[K] = std::make_pair(Idx, Val); + return Val == 1; }; // Do a quadratic search on all of the given stores in reverse order and find // all of the pairs of stores that follow each other. @@ -6140,17 +6159,31 @@ bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores, // For stores that start but don't end a link in the chain: for (int Cnt = E; Cnt > 0; --Cnt) { int I = Cnt - 1; - if (ConsecutiveChain[I] == E + 1 || Tails.test(I)) + if (ConsecutiveChain[I].first == E || Tails.test(I)) continue; // We found a store instr that starts a chain. Now follow the chain and try // to vectorize it. 
BoUpSLP::ValueList Operands; // Collect the chain into a list. - while (I != E + 1 && !VectorizedStores.count(Stores[I])) { + while (I != E && !VectorizedStores.count(Stores[I])) { Operands.push_back(Stores[I]); + Tails.set(I); + if (ConsecutiveChain[I].second != 1) { + // Mark the new end in the chain and go back, if required. It might be + // required if the original stores come in reversed order, for example. + if (ConsecutiveChain[I].first != E && + Tails.test(ConsecutiveChain[I].first) && + !VectorizedStores.count(Stores[ConsecutiveChain[I].first])) { + Tails.reset(ConsecutiveChain[I].first); + if (Cnt < ConsecutiveChain[I].first + 2) + Cnt = ConsecutiveChain[I].first + 2; + } + break; + } // Move to the next value in the chain. - I = ConsecutiveChain[I]; + I = ConsecutiveChain[I].first; } + assert(!Operands.empty() && "Expected non-empty list of stores."); unsigned MaxVecRegSize = R.getMaxVecRegSize(); unsigned EltSize = R.getVectorElementSize(Operands[0]); diff --git a/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll b/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll index 267cf1a02c29..e28362894910 100644 --- a/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll +++ b/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll @@ -1,7 +1,7 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+sse2 -S | FileCheck %s --check-prefix=SSE -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+avx -S | FileCheck %s --check-prefix=AVX -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+avx2 -S | FileCheck %s --check-prefix=AVX +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+sse2 -S | FileCheck %s --check-prefix=SSE +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+avx -S | FileCheck %s --check-prefix=AVX +; RUN: opt < %s 
-mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+avx2 -S | FileCheck %s --check-prefix=AVX %class.1 = type { %class.2 } %class.2 = type { %"class.3" } @@ -117,13 +117,10 @@ define void @pr35497() local_unnamed_addr #0 { ; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0 ; AVX-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>* ; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1 -; AVX-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP4]], i32 0 -; AVX-NEXT: [[TMP12:%.*]] = insertelement <2 x i64> poison, i64 [[TMP11]], i32 0 -; AVX-NEXT: [[TMP13:%.*]] = insertelement <2 x i64> [[TMP12]], i64 [[TMP5]], i32 1 -; AVX-NEXT: [[TMP14:%.*]] = lshr <2 x i64> [[TMP13]], <i64 6, i64 6> -; AVX-NEXT: [[TMP15:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP14]] -; AVX-NEXT: [[TMP16:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>* -; AVX-NEXT: store <2 x i64> [[TMP15]], <2 x i64>* [[TMP16]], align 1 +; AVX-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6> +; AVX-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]] +; AVX-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>* +; AVX-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1 ; AVX-NEXT: ret void ; entry: </cut>
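
Editor's note: the last step of getPointersDiff in the commit above (byte delta to element distance, with the optional StrictCheck) can be modelled outside LLVM. The following is only a rough sketch under stated assumptions: the names elementDistance and isConsecutive are hypothetical, plain integer byte offsets stand in for the APInt/SCEV values, and the address-space and type checks of the real function are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-alone model (not LLVM code) of the tail of
// getPointersDiff: convert a byte delta into a distance in elements.
std::optional<int> elementDistance(std::int64_t offsetA, std::int64_t offsetB,
                                   int elemSize, bool strictCheck = false) {
  assert(elemSize > 0 && "element size must be positive");
  const std::int64_t val = offsetB - offsetA; // byte delta, may be negative
  const std::int64_t dist = val / elemSize;   // truncates toward zero
  // With strictCheck, reject deltas that are not a whole number of elements.
  if (strictCheck && dist * elemSize != val)
    return std::nullopt;
  return static_cast<int>(dist);
}

// Mirrors the shape of the rewritten isConsecutiveAccess: B directly
// follows A when the strict element distance is exactly 1.
bool isConsecutive(std::int64_t offsetA, std::int64_t offsetB, int elemSize) {
  const std::optional<int> dist =
      elementDistance(offsetA, offsetB, elemSize, /*strictCheck=*/true);
  return dist && *dist == 1;
}
```

With 8-byte elements, offsets 0 and 8 give a distance of 1, so the pair counts as consecutive. Offsets 0 and 12 give no result under the strict check, because 12 is not a multiple of 8, and a truncated distance of 1 without it.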

4 years, 8 months