linaro-toolchain

linaro-toolchain@lists.linaro.org

5599 discussions

[CI-NOTIFY]: TCWG Bisect tcwg_gcc_bootstrap/master-aarch64-bootstrap_profiled - Build # 3 - Fixed!

by ci_notify＠linaro.org

Successfully identified regression in *binutils* in CI configuration tcwg_gcc_bootstrap/master-aarch64-bootstrap_profiled. So far, this commit has regressed CI configurations: - tcwg_gcc_bootstrap/master-aarch64-bootstrap_profiled Culprit: <cut> commit a12ea97b9dab8eedf411fc5052ffaa8be29f5d36 Author: GDB Administrator <gdbadmin(a)sourceware.org> Date: Mon Aug 23 00:00:07 2021 +0000 Automatic date update in version.in </cut> Results regressed to (for first_bad == a12ea97b9dab8eedf411fc5052ffaa8be29f5d36) # reset_artifacts: -10 # true: 0 # First few build errors in logs: from (for last_good == fe7f0b013526b30ef657c5ad34a3c622a54499ac) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe bootstrap_profiled: 2 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Configuration details: Reproduce builds: <cut> mkdir investigate-binutils-a12ea97b9dab8eedf411fc5052ffaa8be29f5d36 cd investigate-binutils-a12ea97b9dab8eedf411fc5052ffaa8be29f5d36 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/ cd binutils # Reproduce first_bad build git checkout --detach a12ea97b9dab8eedf411fc5052ffaa8be29f5d36 ../artifacts/test.sh # Reproduce last_good build git checkout --detach fe7f0b013526b30ef657c5ad34a3c622a54499ac ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Full commit (up to 1000 lines): <cut> commit a12ea97b9dab8eedf411fc5052ffaa8be29f5d36 Author: GDB Administrator <gdbadmin(a)sourceware.org> Date: Mon Aug 23 00:00:07 2021 +0000 Automatic date update in version.in --- bfd/version.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bfd/version.h b/bfd/version.h index d22e02c7fd4..66d43d6f3a1 100644 --- a/bfd/version.h +++ b/bfd/version.h @@ -16,7 +16,7 @@ In releases, the date is not included in either version strings or sonames. */ -#define BFD_VERSION_DATE 20210822 +#define BFD_VERSION_DATE 20210823 #define BFD_VERSION @bfd_version@ #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@ #define REPORT_BUGS_TO @report_bugs_to@ </cut>

3 years, 10 months

[ACTIVITY] report week ending 26 Aug

by Peter Maydell

Progress (short week, 3 days): * UM-2 [QEMU upstream maintainership] + QEMU 6.1.0 has now been released + Sent out the first arm pullreq for the 6.2 cycle, including another slice of the MVE patches + tried to work through some of the codereview backlog -- PMM

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_gcc_bootstrap/master-arm-bootstrap_debug - Build # 2 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap_debug. So far, this commit has regressed CI configurations: - tcwg_gcc_bootstrap/master-arm-bootstrap_debug Culprit: <cut> commit 1d244020246cb155e4de62ca3b302b920a1f513f Author: Roger Sayle <roger(a)nextmovesoftware.com> Date: Mon Aug 23 12:37:04 2021 +0100 Fold sign of LSHIFT_EXPR to eliminate no-op conversions. This short patch teaches fold that it is "safe" to change the sign of a left shift, to reduce the number of type conversions in gimple. As an example: unsigned int foo(unsigned int i) { return (int)i << 8; } is currently optimized to: unsigned int foo (unsigned int i) { int i.0_1; int _2; unsigned int _4; <bb 2> [local count: 1073741824]: i.0_1 = (int) i_3(D); _2 = i.0_1 << 8; _4 = (unsigned int) _2; return _4; } with this patch, this now becomes: unsigned int foo (unsigned int i) { unsigned int _2; <bb 2> [local count: 1073741824]: _2 = i_1(D) << 8; return _2; } which generates exactly the same assembly language. Aside from the reduced memory usage, the real benefit is that no-op conversions tend to interfere with many folding optimizations. For example, unsigned int bar(unsigned char i) { return (i ^ (i<<16)) | (i<<8); } currently gets (tangled in conversions and) optimized to: unsigned int bar (unsigned char i) { unsigned int _1; unsigned int _2; int _3; int _4; unsigned int _6; unsigned int _8; <bb 2> [local count: 1073741824]: _1 = (unsigned int) i_5(D); _2 = _1 * 65537; _3 = (int) i_5(D); _4 = _3 << 8; _8 = (unsigned int) _4; _6 = _2 | _8; return _6; } but with this patch, bar now optimizes down to: unsigned int bar(unsigned char i) { unsigned int _1; unsigned int _4; <bb 2> [local count: 1073741824]: _1 = (unsigned int) i_3(D); _4 = _1 * 65793; return _4; } 2021-08-23 Roger Sayle <roger(a)nextmovesoftware.com> gcc/ChangeLog * match.pd (shift transformations): Change the sign of an LSHIFT_EXPR if it reduces the number of explicit conversions. gcc/testsuite/ChangeLog * gcc.dg/fold-convlshift-1.c: New test case. * gcc.dg/fold-convlshift-2.c: New test case. </cut> Results regressed to (for first_bad == 1d244020246cb155e4de62ca3b302b920a1f513f) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # First few build errors in logs: # 00:06:26 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored) # 00:25:39 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored) # 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: error: type mismatch in ‘lshift_expr’ # 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: internal compiler error: ‘verify_gimple’ failed # 00:29:38 make[3]: *** [bitmap.o] Error 1 # 00:34:06 make[2]: *** [all-stage3-gcc] Error 2 # 00:34:06 make[1]: *** [stage3-bubble] Error 2 # 00:34:07 make: *** [all] Error 2 from (for last_good == b320edc0c29c838b0090c3c9be14187d132f73f2) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe bootstrap_debug: 2 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… Configuration details: Reproduce builds: <cut> mkdir investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f cd investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 1d244020246cb155e4de62ca3b302b920a1f513f ../artifacts/test.sh # Reproduce last_good build git checkout --detach b320edc0c29c838b0090c3c9be14187d132f73f2 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… Full commit (up to 1000 lines): <cut> commit 1d244020246cb155e4de62ca3b302b920a1f513f Author: Roger Sayle <roger(a)nextmovesoftware.com> Date: Mon Aug 23 12:37:04 2021 +0100 Fold sign of LSHIFT_EXPR to eliminate no-op conversions. This short patch teaches fold that it is "safe" to change the sign of a left shift, to reduce the number of type conversions in gimple. As an example: unsigned int foo(unsigned int i) { return (int)i << 8; } is currently optimized to: unsigned int foo (unsigned int i) { int i.0_1; int _2; unsigned int _4; <bb 2> [local count: 1073741824]: i.0_1 = (int) i_3(D); _2 = i.0_1 << 8; _4 = (unsigned int) _2; return _4; } with this patch, this now becomes: unsigned int foo (unsigned int i) { unsigned int _2; <bb 2> [local count: 1073741824]: _2 = i_1(D) << 8; return _2; } which generates exactly the same assembly language. Aside from the reduced memory usage, the real benefit is that no-op conversions tend to interfere with many folding optimizations. For example, unsigned int bar(unsigned char i) { return (i ^ (i<<16)) | (i<<8); } currently gets (tangled in conversions and) optimized to: unsigned int bar (unsigned char i) { unsigned int _1; unsigned int _2; int _3; int _4; unsigned int _6; unsigned int _8; <bb 2> [local count: 1073741824]: _1 = (unsigned int) i_5(D); _2 = _1 * 65537; _3 = (int) i_5(D); _4 = _3 << 8; _8 = (unsigned int) _4; _6 = _2 | _8; return _6; } but with this patch, bar now optimizes down to: unsigned int bar(unsigned char i) { unsigned int _1; unsigned int _4; <bb 2> [local count: 1073741824]: _1 = (unsigned int) i_3(D); _4 = _1 * 65793; return _4; } 2021-08-23 Roger Sayle <roger(a)nextmovesoftware.com> gcc/ChangeLog * match.pd (shift transformations): Change the sign of an LSHIFT_EXPR if it reduces the number of explicit conversions. gcc/testsuite/ChangeLog * gcc.dg/fold-convlshift-1.c: New test case. * gcc.dg/fold-convlshift-2.c: New test case. --- gcc/match.pd | 9 +++++++++ gcc/testsuite/gcc.dg/fold-convlshift-1.c | 20 ++++++++++++++++++++ gcc/testsuite/gcc.dg/fold-convlshift-2.c | 20 ++++++++++++++++++++ 3 files changed, 49 insertions(+) diff --git a/gcc/match.pd b/gcc/match.pd index 0fcfd0ea62c..978a1b0172e 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3385,6 +3385,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (integer_zerop (@2) || integer_all_onesp (@2)) (cmp @0 @2))))) +/* Both signed and unsigned lshift produce the same result, so use + the form that minimizes the number of conversions. */ +(simplify + (convert (lshift:s@0 (convert:s@1 @2) INTEGER_CST@3)) + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (TREE_TYPE (@2)) + && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)) + (lshift (convert @2) @3))) + /* Simplifications of conversions. */ /* Basic strip-useless-type-conversions / strip_nops. */ diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-1.c b/gcc/testsuite/gcc.dg/fold-convlshift-1.c new file mode 100644 index 00000000000..b6f57f81e72 --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-convlshift-1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +unsigned int foo(unsigned int i) +{ + int t1 = i; + int t2 = t1 << 8; + return t2; +} + +int bar(int i) +{ + unsigned int t1 = i; + unsigned int t2 = t1 << 8; + return t2; +} + +/* { dg-final { scan-tree-dump-not "\$int\$" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "\$unsigned int\$" "optimized" } } */ + diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-2.c b/gcc/testsuite/gcc.dg/fold-convlshift-2.c new file mode 100644 index 00000000000..f21358c4584 --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-convlshift-2.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +unsigned int foo(unsigned char c) +{ + int t1 = c; + int t2 = t1 << 8; + return t2; +} + +int bar(unsigned char c) +{ + unsigned int t1 = c; + unsigned int t2 = t1 << 8; + return t2; +} + +/* { dg-final { scan-tree-dump-times "\$int\$" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\$unsigned int\$" 1 "optimized" } } */ + </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O2_LTO - Build # 14 - Successful!

by ci_notify＠linaro.org

3 years, 10 months

Moving to llvm@lists.linux.dev

by Nathan Chancellor

Hi everyone, We are shifting the ClangBuiltLinux mailing list from clang-built-linux(a)googlegroups.com to llvm(a)lists.linux.dev. Google Groups has served us well but moving to lists.linux.dev allows for easier archival (as we will be on lore.kernel.org automatically) and allows for people to subscribe to us easier, as they only need an email address, rather than a Google account. Please follow these directions to subscribe to the new mailing list: https://subspace.kernel.org/index.html#subscribing Some more information about lists.linux.dev: https://www.kernel.org/lists-linux-dev.html https://subspace.kernel.org/lists.linux.dev.html I have added CI maintainers/mailing lists that send us regular reports to this announcement. Please continue to send us emails about build results, just switch the email from clang-built-linux(a)googlegroups.com to llvm(a)lists.linux.dev so that they get archived as a part of lore and can be easily searched, especially with the upcoming https://x-lore.kernel.org/all/. I will send a patch shortly to update MAINTAINERS. Cheers, Nathan

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O2_LTO - Build # 13 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO Culprit: <cut> commit 9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 Author: Fangrui Song <i(a)maskray.me> Date: Thu Mar 25 21:55:27 2021 -0700 [sanitizer] Simplify GetTls with dl_iterate_phdr GetTls is the range of * thread control block and optional TLS_PRE_TCB_SIZE * static TLS blocks plus static TLS surplus On glibc, lsan requires the range to include `pthread::{specific_1stblock,specific}` so that allocations only referenced by `pthread_setspecific` can be scanned. This patch uses `dl_iterate_phdr` to collect TLS ranges. Find the one with `dlpi_tls_modid==1` as one of the initially loaded module, then find consecutive ranges. The boundaries give us addr and size. This allows us to drop the glibc internal `_dl_get_tls_static_info` and `InitTlsSize` entirely. Use the simplified method with non-Android Linux for now, but in theory this can be used with *BSD and potentially other ELF OSes. In the future, we can move `ThreadDescriptorSize` code to lsan (and consider intercepting `pthread_setspecific`) to avoid hacks in generic code. See https://reviews.llvm.org/D93972#2480556 for analysis on GetTls usage across various sanitizers. Differential Revision: https://reviews.llvm.org/D98926 </cut> Results regressed to (for first_bad == 9be8f8b34d9b150cd1811e3556fe9d0cd735ae29) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_LTO_marm artifacts/build-9be8f8b34d9b150cd1811e3556fe9d0cd735ae29/results_id: 1 # 456.hmmer,hmmer_base.default regressed by 103 from (for last_good == 9d375a40c3df90dd48edc0e1b1115c702c55d716) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_LTO_marm artifacts/build-9d375a40c3df90dd48edc0e1b1115c702c55d716/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4304 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4302 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 cd investigate-llvm-9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 9d375a40c3df90dd48edc0e1b1115c702c55d716 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Full commit (up to 1000 lines): <cut> commit 9be8f8b34d9b150cd1811e3556fe9d0cd735ae29 Author: Fangrui Song <i(a)maskray.me> Date: Thu Mar 25 21:55:27 2021 -0700 [sanitizer] Simplify GetTls with dl_iterate_phdr GetTls is the range of * thread control block and optional TLS_PRE_TCB_SIZE * static TLS blocks plus static TLS surplus On glibc, lsan requires the range to include `pthread::{specific_1stblock,specific}` so that allocations only referenced by `pthread_setspecific` can be scanned. This patch uses `dl_iterate_phdr` to collect TLS ranges. Find the one with `dlpi_tls_modid==1` as one of the initially loaded module, then find consecutive ranges. The boundaries give us addr and size. This allows us to drop the glibc internal `_dl_get_tls_static_info` and `InitTlsSize` entirely. Use the simplified method with non-Android Linux for now, but in theory this can be used with *BSD and potentially other ELF OSes. In the future, we can move `ThreadDescriptorSize` code to lsan (and consider intercepting `pthread_setspecific`) to avoid hacks in generic code. See https://reviews.llvm.org/D93972#2480556 for analysis on GetTls usage across various sanitizers. Differential Revision: https://reviews.llvm.org/D98926 --- compiler-rt/lib/asan/asan_rtl.cpp | 5 +- compiler-rt/lib/asan/asan_thread.cpp | 2 +- compiler-rt/lib/hwasan/hwasan.cpp | 2 - compiler-rt/lib/lsan/lsan.cpp | 1 - compiler-rt/lib/memprof/memprof_rtl.cpp | 3 - compiler-rt/lib/msan/msan.cpp | 1 - .../lib/sanitizer_common/sanitizer_common.h | 1 - .../lib/sanitizer_common/sanitizer_fuchsia.cpp | 1 - compiler-rt/lib/sanitizer_common/sanitizer_linux.h | 1 - .../sanitizer_common/sanitizer_linux_libcdep.cpp | 231 ++++++++------------- compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp | 3 - .../lib/sanitizer_common/sanitizer_rtems.cpp | 1 - compiler-rt/lib/sanitizer_common/sanitizer_win.cpp | 3 - .../tests/sanitizer_common_test.cpp | 2 - .../tests/sanitizer_linux_test.cpp | 17 +- compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp | 1 - 16 files changed, 91 insertions(+), 184 deletions(-) diff --git a/compiler-rt/lib/asan/asan_rtl.cpp b/compiler-rt/lib/asan/asan_rtl.cpp index 7b5a929963c6..106a52607631 100644 --- a/compiler-rt/lib/asan/asan_rtl.cpp +++ b/compiler-rt/lib/asan/asan_rtl.cpp @@ -490,9 +490,6 @@ static void AsanInitInternal() { if (flags()->start_deactivated) AsanDeactivate(); - // interceptors - InitTlsSize(); - // Create main thread. AsanThread *main_thread = CreateMainThread(); CHECK_EQ(0, main_thread->tid()); @@ -568,7 +565,7 @@ void UnpoisonStack(uptr bottom, uptr top, const char *type) { type, top, bottom, top - bottom, top - bottom); return; } - PoisonShadow(bottom, top - bottom, 0); + PoisonShadow(bottom, RoundUpTo(top - bottom, SHADOW_GRANULARITY), 0); } static void UnpoisonDefaultStack() { diff --git a/compiler-rt/lib/asan/asan_thread.cpp b/compiler-rt/lib/asan/asan_thread.cpp index ae3bcba204c6..f7778c0f1e34 100644 --- a/compiler-rt/lib/asan/asan_thread.cpp +++ b/compiler-rt/lib/asan/asan_thread.cpp @@ -307,7 +307,7 @@ void AsanThread::SetThreadStackAndTls(const InitOptions *options) { uptr stack_size = 0; GetThreadStackAndTls(tid() == 0, &stack_bottom_, &stack_size, &tls_begin_, &tls_size); - stack_top_ = stack_bottom_ + stack_size; + stack_top_ = RoundDownTo(stack_bottom_ + stack_size, SHADOW_GRANULARITY); tls_end_ = tls_begin_ + tls_size; dtls_ = DTLS_Get(); diff --git a/compiler-rt/lib/hwasan/hwasan.cpp b/compiler-rt/lib/hwasan/hwasan.cpp index 5c0d804561d2..ce08ec3508c4 100644 --- a/compiler-rt/lib/hwasan/hwasan.cpp +++ b/compiler-rt/lib/hwasan/hwasan.cpp @@ -265,8 +265,6 @@ void __hwasan_init() { hwasan_init_is_running = 1; SanitizerToolName = "HWAddressSanitizer"; - InitTlsSize(); - CacheBinaryName(); InitializeFlags(); diff --git a/compiler-rt/lib/lsan/lsan.cpp b/compiler-rt/lib/lsan/lsan.cpp index 2c0a3bf0787c..b264be0ba792 100644 --- a/compiler-rt/lib/lsan/lsan.cpp +++ b/compiler-rt/lib/lsan/lsan.cpp @@ -98,7 +98,6 @@ extern "C" void __lsan_init() { InitCommonLsan(); InitializeAllocator(); ReplaceSystemMalloc(); - InitTlsSize(); InitializeInterceptors(); InitializeThreadRegistry(); InstallDeadlySignalHandlers(LsanOnDeadlySignal); diff --git a/compiler-rt/lib/memprof/memprof_rtl.cpp b/compiler-rt/lib/memprof/memprof_rtl.cpp index d6d606f666ee..05759e406f7a 100644 --- a/compiler-rt/lib/memprof/memprof_rtl.cpp +++ b/compiler-rt/lib/memprof/memprof_rtl.cpp @@ -214,9 +214,6 @@ static void MemprofInitInternal() { InitializeCoverage(common_flags()->coverage, common_flags()->coverage_dir); - // interceptors - InitTlsSize(); - // Create main thread. MemprofThread *main_thread = CreateMainThread(); CHECK_EQ(0, main_thread->tid()); diff --git a/compiler-rt/lib/msan/msan.cpp b/compiler-rt/lib/msan/msan.cpp index 4be1630cd302..4ee7e2ec4dd6 100644 --- a/compiler-rt/lib/msan/msan.cpp +++ b/compiler-rt/lib/msan/msan.cpp @@ -436,7 +436,6 @@ void __msan_init() { InitializeInterceptors(); CheckASLR(); - InitTlsSize(); InstallDeadlySignalHandlers(MsanOnDeadlySignal); InstallAtExitHandler(); // Needs __cxa_atexit interceptor. diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_common.h b/compiler-rt/lib/sanitizer_common/sanitizer_common.h index dcd625d30f77..2b2629fc12dd 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_common.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_common.h @@ -284,7 +284,6 @@ void SetSandboxingCallback(void (*f)()); void InitializeCoverage(bool enabled, const char *coverage_dir); -void InitTlsSize(); uptr GetTlsSize(); // Other diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_fuchsia.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_fuchsia.cpp index 4f692f99c207..5d68ad8ee8e4 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_fuchsia.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_fuchsia.cpp @@ -103,7 +103,6 @@ void DisableCoreDumperIfNecessary() {} void InstallDeadlySignalHandlers(SignalHandlerType handler) {} void SetAlternateSignalStack() {} void UnsetAlternateSignalStack() {} -void InitTlsSize() {} bool SignalContext::IsStackOverflow() const { return false; } void SignalContext::DumpAllRegisters(void *context) { UNIMPLEMENTED(); } diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_linux.h b/compiler-rt/lib/sanitizer_common/sanitizer_linux.h index 41ae072d6cac..9a23fcfb3b93 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_linux.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_linux.h @@ -98,7 +98,6 @@ class ThreadLister { // Exposed for testing. uptr ThreadDescriptorSize(); uptr ThreadSelf(); -uptr ThreadSelfOffset(); // Matches a library's file name against a base name (stripping path and version // information). diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp index 613658147bbd..1177a1ceb14f 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp @@ -184,80 +184,8 @@ __attribute__((unused)) static bool GetLibcVersion(int *major, int *minor, #endif } -#if SANITIZER_GLIBC && !SANITIZER_GO -static uptr g_tls_size; - -#ifdef __i386__ -#define CHECK_GET_TLS_STATIC_INFO_VERSION (!__GLIBC_PREREQ(2, 27)) -#else -#define CHECK_GET_TLS_STATIC_INFO_VERSION 0 -#endif - -#if CHECK_GET_TLS_STATIC_INFO_VERSION -#define DL_INTERNAL_FUNCTION __attribute__((regparm(3), stdcall)) -#else -#define DL_INTERNAL_FUNCTION -#endif - -namespace { -struct GetTlsStaticInfoCall { - typedef void (*get_tls_func)(size_t*, size_t*); -}; -struct GetTlsStaticInfoRegparmCall { - typedef void (*get_tls_func)(size_t*, size_t*) DL_INTERNAL_FUNCTION; -}; - -template <typename T> -void CallGetTls(void* ptr, size_t* size, size_t* align) { - typename T::get_tls_func get_tls; - CHECK_EQ(sizeof(get_tls), sizeof(ptr)); - internal_memcpy(&get_tls, &ptr, sizeof(ptr)); - CHECK_NE(get_tls, 0); - get_tls(size, align); -} - -bool CmpLibcVersion(int major, int minor, int patch) { - int ma; - int mi; - int pa; - if (!GetLibcVersion(&ma, &mi, &pa)) - return false; - if (ma > major) - return true; - if (ma < major) - return false; - if (mi > minor) - return true; - if (mi < minor) - return false; - return pa >= patch; -} - -} // namespace - -void InitTlsSize() { - // all current supported platforms have 16 bytes stack alignment - const size_t kStackAlign = 16; - void *get_tls_static_info_ptr = dlsym(RTLD_NEXT, "_dl_get_tls_static_info"); - size_t tls_size = 0; - size_t tls_align = 0; - // On i?86, _dl_get_tls_static_info used to be internal_function, i.e. - // __attribute__((regparm(3), stdcall)) before glibc 2.27 and is normal - // function in 2.27 and later. - if (CHECK_GET_TLS_STATIC_INFO_VERSION && !CmpLibcVersion(2, 27, 0)) - CallGetTls<GetTlsStaticInfoRegparmCall>(get_tls_static_info_ptr, - &tls_size, &tls_align); - else - CallGetTls<GetTlsStaticInfoCall>(get_tls_static_info_ptr, - &tls_size, &tls_align); - if (tls_align < kStackAlign) - tls_align = kStackAlign; - g_tls_size = RoundUpTo(tls_size, tls_align); -} -#else -void InitTlsSize() { } -#endif // SANITIZER_GLIBC && !SANITIZER_GO - +// ThreadDescriptorSize() is only used by lsan to get the pointer to +// thread-specific data keys in the thread control block. #if (defined(__x86_64__) || defined(__i386__) || defined(__mips__) || \ defined(__aarch64__) || defined(__powerpc64__) || defined(__s390__) || \ defined(__arm__) || SANITIZER_RISCV64) && \ @@ -330,13 +258,6 @@ uptr ThreadDescriptorSize() { return val; } -// The offset at which pointer to self is located in the thread descriptor. -const uptr kThreadSelfOffset = FIRST_32_SECOND_64(8, 16); - -uptr ThreadSelfOffset() { - return kThreadSelfOffset; -} - #if defined(__mips__) || defined(__powerpc64__) || SANITIZER_RISCV64 // TlsPreTcbSize includes size of struct pthread_descr and size of tcb // head structure. It lies before the static tls blocks. @@ -355,48 +276,61 @@ static uptr TlsPreTcbSize() { } #endif -uptr ThreadSelf() { - uptr descr_addr; -#if defined(__i386__) - asm("mov %%gs:%c1,%0" : "=r"(descr_addr) : "i"(kThreadSelfOffset)); -#elif defined(__x86_64__) - asm("mov %%fs:%c1,%0" : "=r"(descr_addr) : "i"(kThreadSelfOffset)); -#elif defined(__mips__) - // MIPS uses TLS variant I. The thread pointer (in hardware register $29) - // points to the end of the TCB + 0x7000. The pthread_descr structure is - // immediately in front of the TCB. TlsPreTcbSize() includes the size of the - // TCB and the size of pthread_descr. - const uptr kTlsTcbOffset = 0x7000; - uptr thread_pointer; - asm volatile(".set push;\ - .set mips64r2;\ - rdhwr %0,$29;\ - .set pop" : "=r" (thread_pointer)); - descr_addr = thread_pointer - kTlsTcbOffset - TlsPreTcbSize(); -#elif defined(__aarch64__) || defined(__arm__) - descr_addr = reinterpret_cast<uptr>(__builtin_thread_pointer()) - - ThreadDescriptorSize(); -#elif SANITIZER_RISCV64 - // https://github.com/riscv/riscv-elf-psabi-doc/issues/53 - uptr thread_pointer = reinterpret_cast<uptr>(__builtin_thread_pointer()); - descr_addr = thread_pointer - TlsPreTcbSize(); -#elif defined(__s390__) - descr_addr = reinterpret_cast<uptr>(__builtin_thread_pointer()); -#elif defined(__powerpc64__) - // PPC64LE uses TLS variant I. The thread pointer (in GPR 13) - // points to the end of the TCB + 0x7000. The pthread_descr structure is - // immediately in front of the TCB. TlsPreTcbSize() includes the size of the - // TCB and the size of pthread_descr. - const uptr kTlsTcbOffset = 0x7000; - uptr thread_pointer; - asm("addi %0,13,%1" : "=r"(thread_pointer) : "I"(-kTlsTcbOffset)); - descr_addr = thread_pointer - TlsPreTcbSize(); -#else -#error "unsupported CPU arch" -#endif - return descr_addr; +#if !SANITIZER_GO +namespace { +struct TlsRange { + uptr begin, end, align; + size_t tls_modid; + bool operator<(const TlsRange &rhs) const { return begin < rhs.begin; } +}; +} // namespace + +static int CollectStaticTlsRanges(struct dl_phdr_info *info, size_t size, + void *data) { + if (!info->dlpi_tls_data) + return 0; + const uptr begin = (uptr)info->dlpi_tls_data; + for (unsigned i = 0; i != info->dlpi_phnum; ++i) + if (info->dlpi_phdr[i].p_type == PT_TLS) { + static_cast<InternalMmapVector<TlsRange> *>(data)->push_back( + TlsRange{begin, begin + info->dlpi_phdr[i].p_memsz, + info->dlpi_phdr[i].p_align, info->dlpi_tls_modid}); + break; + } + return 0; } -#endif // (x86_64 || i386 || MIPS) && SANITIZER_LINUX + +static void GetStaticTlsRange(uptr *addr, uptr *size) { + InternalMmapVector<TlsRange> ranges; + dl_iterate_phdr(CollectStaticTlsRanges, &ranges); + uptr len = ranges.size(); + Sort(ranges.begin(), len); + // Find the range with tls_modid=1. For glibc, because libc.so uses PT_TLS, + // this module is guaranteed to exist and is one of the initially loaded + // modules. + uptr one = 0; + while (one != len && ranges[one].tls_modid != 1) ++one; + if (one == len) { + // This may happen with musl if no module uses PT_TLS. + *addr = 0; + *size = 0; + return; + } + // Find the maximum consecutive ranges. We consider two modules consecutive if + // the gap is smaller than the alignment. The dynamic loader places static TLS + // blocks this way not to waste space. + uptr l = one; + while (l != 0 && ranges[l].begin < ranges[l - 1].end + ranges[l - 1].align) + --l; + uptr r = one + 1; + while (r != len && ranges[r].begin < ranges[r - 1].end + ranges[r - 1].align) + ++r; + *addr = ranges[l].begin; + *size = ranges[r - 1].end - ranges[l].begin; +} +#endif // !SANITIZER_GO +#endif // (x86_64 || i386 || mips || ...) && SANITIZER_LINUX && + // !SANITIZER_ANDROID #if SANITIZER_FREEBSD static void **ThreadSelfSegbase() { @@ -468,18 +402,36 @@ static void GetTls(uptr *addr, uptr *size) { *size = 0; } #elif SANITIZER_LINUX + GetStaticTlsRange(addr, size); #if defined(__x86_64__) || defined(__i386__) || defined(__s390__) - *addr = ThreadSelf(); - *size = GetTlsSize(); - *addr -= *size; - *addr += ThreadDescriptorSize(); -#elif defined(__mips__) || defined(__aarch64__) || defined(__powerpc64__) || \ - defined(__arm__) || SANITIZER_RISCV64 - *addr = ThreadSelf(); - *size = GetTlsSize(); + // lsan requires the range to additionally cover the static TLS surplus + // (elf/dl-tls.c defines 1664). Otherwise there may be false positives for + // allocations only referenced by tls in dynamically loaded modules. + if (SANITIZER_GLIBC) { + *addr -= 1664; + *size += 1664; + } + // Extend the range to include the thread control block. On glibc, lsan needs + // the range to include pthread::{specific_1stblock,specific} so that + // allocations only referenced by pthread_setspecific can be scanned. This may + // underestimate by at most TLS_TCB_ALIGN-1 bytes but it should be fine + // because the number of bytes after pthread::specific is larger. + *size += ThreadDescriptorSize(); #else - *addr = 0; - *size = 0; + if (SANITIZER_GLIBC) + *size += 1664; +#if defined(__mips__) || defined(__powerpc64__) || SANITIZER_RISCV64 + const uptr pre_tcb_size = TlsPreTcbSize(); + *addr -= pre_tcb_size; + *size += pre_tcb_size; +#else + // arm and aarch64 reserve two words at TP, so this underestimates the range. + // However, this is sufficient for the purpose of finding the pointers to + // thread-specific data keys. + const uptr tcb_size = ThreadDescriptorSize(); + *addr -= tcb_size; + *size += tcb_size; +#endif #endif #elif SANITIZER_FREEBSD void** segbase = ThreadSelfSegbase(); @@ -520,17 +472,11 @@ static void GetTls(uptr *addr, uptr *size) { #if !SANITIZER_GO uptr GetTlsSize() { -#if SANITIZER_FREEBSD || SANITIZER_ANDROID || SANITIZER_NETBSD || \ +#if SANITIZER_FREEBSD || SANITIZER_LINUX || SANITIZER_NETBSD || \ SANITIZER_SOLARIS uptr addr, size; GetTls(&addr, &size); return size; -#elif SANITIZER_GLIBC -#if defined(__mips__) || defined(__powerpc64__) || SANITIZER_RISCV64 - return RoundUpTo(g_tls_size + TlsPreTcbSize(), 16); -#else - return g_tls_size; -#endif #else return 0; #endif @@ -553,10 +499,9 @@ void GetThreadStackAndTls(bool main, uptr *stk_addr, uptr *stk_size, if (!main) { // If stack and tls intersect, make them non-intersecting. if (*tls_addr > *stk_addr && *tls_addr < *stk_addr + *stk_size) { - CHECK_GT(*tls_addr + *tls_size, *stk_addr); - CHECK_LE(*tls_addr + *tls_size, *stk_addr + *stk_size); - *stk_size -= *tls_size; - *tls_addr = *stk_addr + *stk_size; + if (*stk_addr + *stk_size < *tls_addr + *tls_size) + *tls_size = *stk_addr + *stk_size - *tls_addr; + *stk_size = *tls_addr - *stk_addr; } } #endif diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp index d7b0bde173c8..5055df1ec29a 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_mac.cpp @@ -548,9 +548,6 @@ uptr GetTlsSize() { return 0; } -void InitTlsSize() { -} - uptr TlsBaseAddr() { uptr segbase = 0; #if defined(__x86_64__) diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_rtems.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_rtems.cpp index d58bd08fb1a8..01554349cc04 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_rtems.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_rtems.cpp @@ -106,7 +106,6 @@ void DisableCoreDumperIfNecessary() {} void InstallDeadlySignalHandlers(SignalHandlerType handler) {} void SetAlternateSignalStack() {} void UnsetAlternateSignalStack() {} -void InitTlsSize() {} void SignalContext::DumpAllRegisters(void *context) {} const char *DescribeSignalOrException(int signo) { UNIMPLEMENTED(); } diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp index f383e130fa59..d47ccad1764d 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp +++ b/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp @@ -846,9 +846,6 @@ uptr GetTlsSize() { return 0; } -void InitTlsSize() { -} - void GetThreadStackAndTls(bool main, uptr *stk_addr, uptr *stk_size, uptr *tls_addr, uptr *tls_size) { #if SANITIZER_GO diff --git a/compiler-rt/lib/sanitizer_common/tests/sanitizer_common_test.cpp b/compiler-rt/lib/sanitizer_common/tests/sanitizer_common_test.cpp index 80df9b497b2d..21c6b036b956 100644 --- a/compiler-rt/lib/sanitizer_common/tests/sanitizer_common_test.cpp +++ b/compiler-rt/lib/sanitizer_common/tests/sanitizer_common_test.cpp @@ -210,12 +210,10 @@ static void *WorkerThread(void *arg) { } TEST(SanitizerCommon, ThreadStackTlsMain) { - InitTlsSize(); TestThreadInfo(true); } TEST(SanitizerCommon, ThreadStackTlsWorker) { - InitTlsSize(); pthread_t t; PTHREAD_CREATE(&t, 0, WorkerThread, 0); PTHREAD_JOIN(t, 0); diff --git a/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp b/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp index cb6c0724ac88..025cba922d2d 100644 --- a/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp +++ b/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp @@ -188,24 +188,9 @@ TEST(SanitizerCommon, SetEnvTest) { } #if (defined(__x86_64__) || defined(__i386__)) && !SANITIZER_ANDROID -void *thread_self_offset_test_func(void *arg) { - bool result = - *(uptr *)((char *)ThreadSelf() + ThreadSelfOffset()) == ThreadSelf(); - return (void *)result; -} - -TEST(SanitizerLinux, ThreadSelfOffset) { - EXPECT_TRUE((bool)thread_self_offset_test_func(0)); - pthread_t tid; - void *result; - ASSERT_EQ(0, pthread_create(&tid, 0, thread_self_offset_test_func, 0)); - ASSERT_EQ(0, pthread_join(tid, &result)); - EXPECT_TRUE((bool)result); -} - // libpthread puts the thread descriptor at the end of stack space. void *thread_descriptor_size_test_func(void *arg) { - uptr descr_addr = ThreadSelf(); + uptr descr_addr = (uptr)pthread_self(); pthread_attr_t attr; pthread_getattr_np(pthread_self(), &attr); void *stackaddr; diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp b/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp index 45acfe66ff3f..0d26f497f2bd 100644 --- a/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp +++ b/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp @@ -318,7 +318,6 @@ void InitializePlatform() { } CheckAndProtect(); - InitTlsSize(); #endif // !SANITIZER_GO } </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O3_LTO - Build # 9 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO Culprit: <cut> commit a26f1bf67ec70f72e64101cf483b26466928fc38 Author: Roman Lebedev <lebedev.ri(a)gmail.com> Date: Fri Apr 2 10:40:12 2021 +0300 [PassManager] Run additional LICM before LoopRotate Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes. >>! In D99204, @thopre wrote: > > With loop peeling, it is important that unnecessary PHIs be avoided or > it will leads to spurious peeling. One source of such PHIs is loop > rotation which creates PHIs for invariant loads. Those PHIs are > particularly problematic since loop peeling is now run as part of simple > loop unrolling before GVN is run, and are thus a source of spurious > peeling. > > Note that while some of the load can be hoisted and eventually > eliminated by instruction combine, this is not always possible due to > alignment issue. In particular, the motivating example [1] was a load > inside a class instance which cannot be hoisted because the `this' > pointer has an alignment of 1. > > [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/atta… Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*) We have LICM, and in fact we already run it right after LoopRotation. We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b9… But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular: | statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% | | indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% | | indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% | | indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% | | indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% | | indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% | | indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% | | indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% | | indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% | | indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% | | instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% | | instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% | | instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% | | lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% | | licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% | | licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% | | licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% | | licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% | | licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% | | loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% | | loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% | | loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% | | loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% | | loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% | | loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% | | loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% | | loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% | | loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% | | loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% | | mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% | | mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% | | mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% | | scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% | | scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% | | scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% | | simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% | | simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% | | simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% | ... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate? | statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% | | indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% | | indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% | | indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% | | indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% | | indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% | | indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% | | indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% | | indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% | | indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% | | instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% | | instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% | | instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% | | lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% | | licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% | | licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% | | licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% | | licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% | | licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% | | loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% | | loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% | | loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% | | loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% | | loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% | | loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% | | loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% | | loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% | | loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% | | mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% | | mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% | | scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% | | scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% | | simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% | | simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% | | simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% | | simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% | | simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% | I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely: | statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% | | indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% | | indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% | | indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% | | indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% | | indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% | | indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% | | indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% | | instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% | | instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% | | lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% | | licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% | | licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% | | licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% | | licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% | | licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% | | loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% | | loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% | | loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% | | loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% | | loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% | | loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% | | loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% | | loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% | | loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% | | mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% | | mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% | | mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% | | scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% | | scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% | | scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% | | simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% | | simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% | | simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% | | simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% | {F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable) As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e… but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded. Differential Revision: https://reviews.llvm.org/D99249 </cut> Results regressed to (for first_bad == a26f1bf67ec70f72e64101cf483b26466928fc38) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_LTO_marm artifacts/build-a26f1bf67ec70f72e64101cf483b26466928fc38/results_id: 1 # 462.libquantum,libquantum_base.default regressed by 104 from (for last_good == bb1e5399e4586239d6424f5eea5a9f06c52ebe9b) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_LTO_marm artifacts/build-baseline/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/baseline-llvm-release-arm-spec2k6-O3_LTO/4271 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3_LTO/4289 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-a26f1bf67ec70f72e64101cf483b26466928fc38 cd investigate-llvm-a26f1bf67ec70f72e64101cf483b26466928fc38 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach a26f1bf67ec70f72e64101cf483b26466928fc38 ../artifacts/test.sh # Reproduce last_good build git checkout --detach bb1e5399e4586239d6424f5eea5a9f06c52ebe9b ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Full commit (up to 1000 lines): <cut> commit a26f1bf67ec70f72e64101cf483b26466928fc38 Author: Roman Lebedev <lebedev.ri(a)gmail.com> Date: Fri Apr 2 10:40:12 2021 +0300 [PassManager] Run additional LICM before LoopRotate Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes. >>! In D99204, @thopre wrote: > > With loop peeling, it is important that unnecessary PHIs be avoided or > it will leads to spurious peeling. One source of such PHIs is loop > rotation which creates PHIs for invariant loads. Those PHIs are > particularly problematic since loop peeling is now run as part of simple > loop unrolling before GVN is run, and are thus a source of spurious > peeling. > > Note that while some of the load can be hoisted and eventually > eliminated by instruction combine, this is not always possible due to > alignment issue. In particular, the motivating example [1] was a load > inside a class instance which cannot be hoisted because the `this' > pointer has an alignment of 1. > > [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/atta… Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*) We have LICM, and in fact we already run it right after LoopRotation. We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b9… But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular: | statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% | | indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% | | indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% | | indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% | | indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% | | indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% | | indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% | | indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% | | indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% | | indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% | | instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% | | instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% | | instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% | | lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% | | licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% | | licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% | | licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% | | licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% | | licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% | | loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% | | loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% | | loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% | | loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% | | loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% | | loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% | | loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% | | loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% | | loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% | | loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% | | mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% | | mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% | | mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% | | scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% | | scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% | | scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% | | simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% | | simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% | | simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% | ... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate? | statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% | | indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% | | indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% | | indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% | | indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% | | indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% | | indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% | | indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% | | indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% | | indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% | | instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% | | instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% | | instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% | | lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% | | licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% | | licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% | | licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% | | licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% | | licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% | | loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% | | loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% | | loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% | | loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% | | loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% | | loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% | | loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% | | loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% | | loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% | | mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% | | mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% | | scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% | | scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% | | simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% | | simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% | | simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% | | simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% | | simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% | I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely: | statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% | | indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% | | indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% | | indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% | | indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% | | indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% | | indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% | | indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% | | instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% | | instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% | | lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% | | licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% | | licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% | | licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% | | licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% | | licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% | | loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% | | loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% | | loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% | | loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% | | loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% | | loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% | | loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% | | loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% | | loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% | | mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% | | mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% | | mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% | | scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% | | scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% | | scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% | | simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% | | simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% | | simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% | | simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% | {F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable) As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e… but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded. Differential Revision: https://reviews.llvm.org/D99249 --- llvm/lib/Passes/PassBuilder.cpp | 10 +++ llvm/lib/Transforms/IPO/PassManagerBuilder.cpp | 4 + llvm/test/CodeGen/AMDGPU/opt-pipeline.ll | 30 +++++--- llvm/test/Other/new-pm-defaults.ll | 7 +- llvm/test/Other/new-pm-thinlto-defaults.ll | 7 +- .../Other/new-pm-thinlto-postlink-pgo-defaults.ll | 9 ++- .../new-pm-thinlto-postlink-samplepgo-defaults.ll | 7 +- .../Other/new-pm-thinlto-prelink-pgo-defaults.ll | 9 ++- .../new-pm-thinlto-prelink-samplepgo-defaults.ll | 5 +- llvm/test/Other/opt-O2-pipeline.ll | 10 ++- llvm/test/Other/opt-O3-pipeline-enable-matrix.ll | 10 ++- llvm/test/Other/opt-O3-pipeline.ll | 10 ++- llvm/test/Other/opt-Os-pipeline.ll | 10 ++- llvm/test/Other/pass-pipelines.ll | 3 + llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll | 25 ++++--- .../PhaseOrdering/X86/spurious-peeling.ll | 87 +++++++++------------- llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll | 78 +++++++++---------- .../loop-rotation-vs-common-code-hoisting.ll | 22 +++--- 18 files changed, 193 insertions(+), 150 deletions(-) diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 3a325277e370..5a2285215769 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -568,6 +568,11 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level, LPM1.addPass(LoopInstSimplifyPass()); LPM1.addPass(LoopSimplifyCFGPass()); + // Try to remove as much code from the loop header as possible, + // to reduce amount of IR that will have to be duplicated. + // TODO: Investigate promotion cap for O1. + LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap)); + LPM1.addPass(LoopRotatePass(/* Disable header duplication */ true, isLTOPreLink(Phase))); // TODO: Investigate promotion cap for O1. @@ -736,6 +741,11 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level, LPM1.addPass(LoopInstSimplifyPass()); LPM1.addPass(LoopSimplifyCFGPass()); + // Try to remove as much code from the loop header as possible, + // to reduce amount of IR that will have to be duplicated. + // TODO: Investigate promotion cap for O1. + LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap)); + // Disable header duplication in loop rotation at -Oz. LPM1.addPass( LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase))); diff --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp index 109e7c97ff1b..2c80a16febef 100644 --- a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp +++ b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp @@ -431,6 +431,10 @@ void PassManagerBuilder::addFunctionSimplificationPasses( MPM.add(createLoopInstSimplifyPass()); MPM.add(createLoopSimplifyCFGPass()); } + // Try to remove as much code from the loop header as possible, + // to reduce amount of IR that will have to be duplicated. + // TODO: Investigate promotion cap for O1. + MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap)); // Rotate Loop - disable header duplication at -Oz MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1, PrepareForLTO)); // TODO: Investigate promotion cap for O1. diff --git a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll index 34e5e6c647da..5e33d968c710 100644 --- a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll +++ b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll @@ -129,16 +129,20 @@ ; GCN-O1-NEXT: Simplify the CFG ; GCN-O1-NEXT: Reassociate expressions ; GCN-O1-NEXT: Dominator Tree Construction +; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O1-NEXT: Function Alias Analysis Results +; GCN-O1-NEXT: Memory SSA ; GCN-O1-NEXT: Natural Loop Information ; GCN-O1-NEXT: Canonicalize natural loops ; GCN-O1-NEXT: LCSSA Verifier ; GCN-O1-NEXT: Loop-Closed SSA Form Pass -; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O1-NEXT: Function Alias Analysis Results ; GCN-O1-NEXT: Scalar Evolution Analysis +; GCN-O1-NEXT: Lazy Branch Probability Analysis +; GCN-O1-NEXT: Lazy Block Frequency Analysis +; GCN-O1-NEXT: Loop Pass Manager +; GCN-O1-NEXT: Loop Invariant Code Motion ; GCN-O1-NEXT: Loop Pass Manager ; GCN-O1-NEXT: Rotate Loops -; GCN-O1-NEXT: Memory SSA ; GCN-O1-NEXT: Lazy Branch Probability Analysis ; GCN-O1-NEXT: Lazy Block Frequency Analysis ; GCN-O1-NEXT: Loop Pass Manager @@ -451,16 +455,20 @@ ; GCN-O2-NEXT: Simplify the CFG ; GCN-O2-NEXT: Reassociate expressions ; GCN-O2-NEXT: Dominator Tree Construction +; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O2-NEXT: Function Alias Analysis Results +; GCN-O2-NEXT: Memory SSA ; GCN-O2-NEXT: Natural Loop Information ; GCN-O2-NEXT: Canonicalize natural loops ; GCN-O2-NEXT: LCSSA Verifier ; GCN-O2-NEXT: Loop-Closed SSA Form Pass -; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O2-NEXT: Function Alias Analysis Results ; GCN-O2-NEXT: Scalar Evolution Analysis +; GCN-O2-NEXT: Lazy Branch Probability Analysis +; GCN-O2-NEXT: Lazy Block Frequency Analysis +; GCN-O2-NEXT: Loop Pass Manager +; GCN-O2-NEXT: Loop Invariant Code Motion ; GCN-O2-NEXT: Loop Pass Manager ; GCN-O2-NEXT: Rotate Loops -; GCN-O2-NEXT: Memory SSA ; GCN-O2-NEXT: Lazy Branch Probability Analysis ; GCN-O2-NEXT: Lazy Block Frequency Analysis ; GCN-O2-NEXT: Loop Pass Manager @@ -810,16 +818,20 @@ ; GCN-O3-NEXT: Simplify the CFG ; GCN-O3-NEXT: Reassociate expressions ; GCN-O3-NEXT: Dominator Tree Construction +; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O3-NEXT: Function Alias Analysis Results +; GCN-O3-NEXT: Memory SSA ; GCN-O3-NEXT: Natural Loop Information ; GCN-O3-NEXT: Canonicalize natural loops ; GCN-O3-NEXT: LCSSA Verifier ; GCN-O3-NEXT: Loop-Closed SSA Form Pass -; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O3-NEXT: Function Alias Analysis Results ; GCN-O3-NEXT: Scalar Evolution Analysis +; GCN-O3-NEXT: Lazy Branch Probability Analysis +; GCN-O3-NEXT: Lazy Block Frequency Analysis +; GCN-O3-NEXT: Loop Pass Manager +; GCN-O3-NEXT: Loop Invariant Code Motion ; GCN-O3-NEXT: Loop Pass Manager ; GCN-O3-NEXT: Rotate Loops -; GCN-O3-NEXT: Memory SSA ; GCN-O3-NEXT: Lazy Branch Probability Analysis ; GCN-O3-NEXT: Lazy Block Frequency Analysis ; GCN-O3-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll index 01b02b8fd482..337a0857701c 100644 --- a/llvm/test/Other/new-pm-defaults.ll +++ b/llvm/test/Other/new-pm-defaults.ll @@ -113,9 +113,9 @@ ; CHECK-O-NEXT: Running analysis: CallGraphAnalysis ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis -; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy -; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis -; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy +; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy +; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis +; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}> ; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass ; CHECK-O-NEXT: Starting CGSCC pass manager run. @@ -156,6 +156,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-defaults.ll b/llvm/test/Other/new-pm-thinlto-defaults.ll index fbf47de87eeb..bba43dd50e7a 100644 --- a/llvm/test/Other/new-pm-thinlto-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-defaults.ll @@ -98,9 +98,9 @@ ; CHECK-O-NEXT: Running analysis: CallGraphAnalysis ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-PRELINK-O-NEXT: Running analysis: ProfileSummaryAnalysis -; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy -; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis -; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy +; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy +; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis +; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy ; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass ; CHECK-O-NEXT: Starting CGSCC pass manager run. @@ -139,6 +139,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll index 4bcf70e15a5b..57f0e0da73b6 100644 --- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll @@ -68,10 +68,10 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running analysis: CallGraphAnalysis -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running analysis: CallGraphAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy @@ -112,6 +112,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll index 1071d28432b9..0e0e2854b8df 100644 --- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll @@ -78,9 +78,9 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy @@ -121,6 +121,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll index e2f1385cf52b..4cfb9825c97e 100644 --- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll @@ -93,10 +93,10 @@ ; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA -; CHECK-O-NEXT: Running analysis: CallGraphAnalysis -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running analysis: CallGraphAnalysis +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis ; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo @@ -158,6 +158,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll index d4dc552aea01..a05555c57003 100644 --- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll @@ -73,8 +73,8 @@ ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run. -; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA -; CHECK-O-NEXT: Running analysis: GlobalsAA +; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA +; CHECK-O-NEXT: Running analysis: GlobalsAA ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis @@ -116,6 +116,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/opt-O2-pipeline.ll b/llvm/test/Other/opt-O2-pipeline.ll index f7217c122fdb..a3b01e5464d4 100644 --- a/llvm/test/Other/opt-O2-pipeline.ll +++ b/llvm/test/Other/opt-O2-pipeline.ll @@ -101,16 +101,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll b/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll index 6b98c1f80d9e..fafd5c8fdcb8 100644 --- a/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll +++ b/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll @@ -106,16 +106,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-O3-pipeline.ll b/llvm/test/Other/opt-O3-pipeline.ll index 00a1d61ac058..103d49bbbbab 100644 --- a/llvm/test/Other/opt-O3-pipeline.ll +++ b/llvm/test/Other/opt-O3-pipeline.ll @@ -106,16 +106,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-Os-pipeline.ll b/llvm/test/Other/opt-Os-pipeline.ll index 21f9b8c6009e..508c21edbc68 100644 --- a/llvm/test/Other/opt-Os-pipeline.ll +++ b/llvm/test/Other/opt-Os-pipeline.ll @@ -87,16 +87,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/pass-pipelines.ll b/llvm/test/Other/pass-pipelines.ll index ccd364d5d740..768e8343529e 100644 --- a/llvm/test/Other/pass-pipelines.ll +++ b/llvm/test/Other/pass-pipelines.ll @@ -53,6 +53,9 @@ ; CHECK-O2-NEXT: FunctionPass Manager ; CHECK-O2-NOT: Manager ; CHECK-O2: Loop Pass Manager +; CHECK-O2-NOT: Manager +; CHECK-O2: Loop Pass Manager +; CHECK-O2-NOT: Manager ; CHECK-O2: Loop Pass Manager ; CHECK-O2-NOT: Manager ; FIXME: We shouldn't be pulling out to simplify-cfg and instcombine and diff --git a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll index 82deee9f367b..8f43029fa303 100644 --- a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll +++ b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll @@ -22,30 +22,33 @@ define dso_local i32 @main() { ; CHECK-NEXT: bb: ; CHECK-NEXT: [[I6:%.*]] = load i32, i32* @a, align 4 ; CHECK-NEXT: [[I24:%.*]] = load i32, i32* @b, align 4 -; CHECK-NEXT: [[D_PROMOTED9:%.*]] = load i32, i32* @d, align 4 -; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED9]], [[I6]] +; CHECK-NEXT: [[D_PROMOTED7:%.*]] = load i32, i32* @d, align 4 +; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED7]], [[I6]] ; CHECK-NEXT: [[I21:%.*]] = icmp eq i32 [[TMP0]], 0 -; CHECK-NEXT: br label [[BB1:%.*]] -; CHECK: bb1: -; CHECK-NEXT: br i1 [[I21]], label [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE:%.*]], label [[BB19_PREHEADER:%.*]] -; CHECK: bb19.preheader: +; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD:%.*]], label [[BB27_PREHEADER:%.*]] +; CHECK: bb27.preheader: ; CHECK-NEXT: [[I26:%.*]] = urem i32 [[I24]], [[TMP0]] ; CHECK-NEXT: store i32 [[I26]], i32* @e, align 4 ; CHECK-NEXT: [[I30_NOT:%.*]] = icmp eq i32 [[I26]], 0 -; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB1]] -; CHECK: bb13.preheader.bb27.thread.split_crit_edge: -; CHECK-NEXT: store i32 -1, i32* @f, align 4 +; CHECK-NEXT: br label [[BB27:%.*]] +; CHECK: bb27.thread: ; CHECK-NEXT: store i32 0, i32* @d, align 4 +; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: store i32 0, i32* @c, align 4 ; CHECK-NEXT: br label [[BB32:%.*]] +; CHECK: bb27: +; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB36:%.*]] ; CHECK: bb32.loopexit: -; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: store i32 [[TMP0]], i32* @d, align 4 +; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: br label [[BB32]] ; CHECK: bb32: -; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE]] ] +; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB27_THREAD]] ] ; CHECK-NEXT: store i32 0, i32* [[C_SINK]], align 4 ; CHECK-NEXT: ret i32 0 +; CHECK: bb36: +; CHECK-NEXT: store i32 1, i32* @c, align 4 +; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD]], label [[BB27]] ; bb: %i = alloca i32, align 4 diff --git a/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll b/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll index 3e659414d982..4661bd8a36cc 100644 --- a/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll +++ b/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll @@ -16,32 +16,28 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV ; OLDPM-NEXT: entry: ; OLDPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0 ; OLDPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]] -; OLDPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 -; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]] -; OLDPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 -; OLDPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] +; OLDPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 +; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]] +; OLDPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; OLDPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] ; OLDPM: for.body7.lr.ph.i: ; OLDPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0 -; OLDPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]] -; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0 -; OLDPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8:![0-9]+]] -; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef -; OLDPM-NEXT: [[BASE_I6_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 -; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I6_PEEL_I]], align 8, !tbaa [[TBAA8]] -; OLDPM-NEXT: [[ARRAYIDX_I7_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef -; OLDPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]] -; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]] -; OLDPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1 -; OLDPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I:%.*]] +; OLDPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 +; OLDPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I6_I]], align 8, !tbaa [[TBAA8:![0-9]+]] +; OLDPM-NEXT: [[ARRAYIDX_I7_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef +; OLDPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]] +; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0 +; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8]] +; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef +; OLDPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9:![0-9]+]] +; OLDPM-NEXT: br label [[FOR_BODY7_I:%.*]] ; OLDPM: for.body7.i: -; OLDPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I]] ] -; OLDPM-NEXT: [[J_012_I:%.*]] = phi i32 [ [[INC_I:%.*]], [[FOR_BODY7_I]] ], [ 1, [[FOR_BODY7_LR_PH_I]] ] -; OLDPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]] +; OLDPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ] +; OLDPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ] +; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I7_I]], align 4, !tbaa [[TBAA9]] +; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]] ; OLDPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_012_I]], 1 +; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1 ; OLDPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]] ; OLDPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]] ; OLDPM: _ZN12FloatVecPair6vecIncEv.exit: @@ -51,39 +47,30 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV ; NEWPM-NEXT: entry: ; NEWPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0 ; NEWPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]] -; NEWPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 -; NEWPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]] -; NEWPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 -; NEWPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] +; NEWPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 +; NEWPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]] +; NEWPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; NEWPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] ; NEWPM: for.body7.lr.ph.i: ; NEWPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0 -; NEWPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]] -; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0 -; NEWPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8:![0-9]+]] -; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef -; NEWPM-NEXT: [[BASE_I4_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 -; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I4_PEEL_I]], align 8, !tbaa [[TBAA8]] -; NEWPM-NEXT: [[ARRAYIDX_I5_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef -; NEWPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]] -; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]] -; NEWPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1 -; NEWPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE:%.*]] -; NEWPM: for.body7.lr.ph.i.for.body7.i_crit_edge: -; NEWPM-NEXT: [[INC_I_1:%.*]] = add nuw i32 1, 1 +; NEWPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 +; NEWPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I4_I]], align 8, !tbaa [[TBAA8:![0-9]+]] +; NEWPM-NEXT: [[ARRAYIDX_I5_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef +; NEWPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]] +; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0 +; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8]] +; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef +; NEWPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9:![0-9]+]] ; NEWPM-NEXT: br label [[FOR_BODY7_I:%.*]] ; NEWPM: for.body7.i: -; NEWPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE:%.*]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ] -; NEWPM-NEXT: [[INC_I_PHI:%.*]] = phi i32 [ [[INC_I_0:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]] ], [ [[INC_I_1]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ] -; NEWPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]] +; NEWPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ] +; NEWPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ] +; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I5_I]], align 4, !tbaa [[TBAA9]] +; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]] ; NEWPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I_PHI]], [[TMP1]] -; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]], !llvm.loop [[LOOP11:![0-9]+]] -; NEWPM: for.body7.i.for.body7.i_crit_edge: -; NEWPM-NEXT: [[INC_I_0]] = add nuw i32 [[INC_I_PHI]], 1 -; NEWPM-NEXT: br label [[FOR_BODY7_I]] +; NEWPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1 +; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]] +; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]] ; NEWPM: _ZN12FloatVecPair6vecIncEv.exit: ; NEWPM-NEXT: ret void ; diff --git a/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll b/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll index 280f849dbb35..8b8b535f1a77 100644 --- a/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll +++ b/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll @@ -15,18 +15,18 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-LABEL: @vdiv( ; CHECK-NEXT: entry: ; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0 -; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_LR_PH:%.*]], label [[FOR_END:%.*]] -; CHECK: for.body.lr.ph: +; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]] +; CHECK: for.body.preheader: ; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64 ; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4 -; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER:%.*]], label [[VECTOR_MEMCHECK:%.*]] +; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER8:%.*]], label [[VECTOR_MEMCHECK:%.*]] ; CHECK: vector.memcheck: ; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[X:%.*]], i64 [[WIDE_TRIP_COUNT]] ; CHECK-NEXT: [[SCEVGEP6:%.*]] = getelementptr double, double* [[Y:%.*]], i64 [[WIDE_TRIP_COUNT]] ; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt double* [[SCEVGEP6]], [[X]] ; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt double* [[SCEVGEP]], [[Y]] ; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]] -; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER]], label [[VECTOR_PH:%.*]] +; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER8]], label [[VECTOR_PH:%.*]] ; CHECK: vector.ph: ; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967292 ; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x double> poison, double [[A:%.*]], i32 0 @@ -49,39 +49,39 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[VECTOR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX]] ; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP8]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, [[TBAA3:!tbaa !.*]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, !tbaa [[TBAA3:![0-9]+]], !alias.scope !7 ; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[TMP4]] ; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX]] ; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX]], 4 ; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT]] ; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP13]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_1]], [[TMP5]] ; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT]] ; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP16]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX]], 8 ; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_1]] ; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP18]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP20:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_2]], [[TMP6]] ; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_1]] ; CHECK-NEXT: [[TMP22:%.*]] = bitcast double* [[TMP21]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX]], 12 ; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_2]] ; CHECK-NEXT: [[TMP24:%.*]] = bitcast double* [[TMP23]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP25:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_3]], [[TMP7]] ; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_2]] ; CHECK-NEXT: [[TMP27:%.*]] = bitcast double* [[TMP26]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_3]] = add i64 [[INDEX]], 16 ; CHECK-NEXT: [[NITER_NSUB_3]] = add i64 [[NITER]], -4 ; CHECK-NEXT: [[NITER_NCMP_3:%.*]] = icmp eq i64 [[NITER_NSUB_3]], 0 -; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]] ; CHECK: middle.block.unr-lcssa: ; CHECK-NEXT: [[INDEX_UNR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT_3]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0 @@ -94,78 +94,78 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[XTRAITER]], [[VECTOR_BODY_EPIL_PREHEADER]] ], [ [[EPIL_ITER_SUB:%.*]], [[VECTOR_BODY_EPIL]] ] ; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_EPIL]] ; CHECK-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP31:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_EPIL]], [[TMP28]] ; CHECK-NEXT: [[TMP32:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_EPIL]] ; CHECK-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x double>* [[TMP33]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x double>* [[TMP33]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_EPIL]] = add i64 [[INDEX_EPIL]], 4 ; CHECK-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1 ; CHECK-NEXT: [[EPIL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0 -; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], [[LOOP14:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], !llvm.loop [[LOOP14:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER]] -; CHECK: for.body.preheader: -; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_LR_PH]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ] +; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8]] +; CHECK: for.body.preheader8: +; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ] ; CHECK-NEXT: [[TMP34:%.*]] = xor i64 [[INDVARS_IV_PH]], -1 ; CHECK-NEXT: [[TMP35:%.*]] = add nsw i64 [[TMP34]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: [[XTRAITER8:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3 -; CHECK-NEXT: [[LCMP_MOD9_NOT:%.*]] = icmp eq i64 [[XTRAITER8]], 0 -; CHECK-NEXT: br i1 [[LCMP_MOD9_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]] +; CHECK-NEXT: [[XTRAITER9:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3 +; CHECK-NEXT: [[LCMP_MOD10_NOT:%.*]] = icmp eq i64 [[XTRAITER9]], 0 +; CHECK-NEXT: br i1 [[LCMP_MOD10_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]] ; CHECK: for.body.prol.preheader: ; CHECK-NEXT: [[TMP36:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: br label [[FOR_BODY_PROL:%.*]] ; CHECK: for.body.prol: ; CHECK-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], [[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PROL_PREHEADER]] ] -; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER8]], [[FOR_BODY_PROL_PREHEADER]] ] +; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER9]], [[FOR_BODY_PROL_PREHEADER]] ] ; CHECK-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_PROL]] -; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP37:%.*]] = fmul fast double [[T0_PROL]], [[TMP36]] ; CHECK-NEXT: [[ARRAYIDX2_PROL:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_PROL]] -; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1 ; CHECK-NEXT: [[PROL_ITER_SUB]] = add i64 [[PROL_ITER]], -1 ; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_SUB]], 0 -; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], [[LOOP16:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP16:![0-9]+]] ; CHECK: for.body.prol.loopexit: -; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ] +; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER8]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ] ; CHECK-NEXT: [[TMP38:%.*]] = icmp ult i64 [[TMP35]], 3 -; CHECK-NEXT: br i1 [[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER_NEW:%.*]] -; CHECK: for.body.preheader.new: +; CHECK-NEXT: br i1 [[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8_NEW:%.*]] +; CHECK: for.body.preheader8.new: ; CHECK-NEXT: [[TMP39:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP40:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP41:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP42:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: br label [[FOR_BODY:%.*]] ; CHECK: for.body: -; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ] +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER8_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ] ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV]] -; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP43:%.*]] = fmul fast double [[T0]], [[TMP39]] ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV]] -; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1 ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT]] -; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP44:%.*]] = fmul fast double [[T0_1]], [[TMP40]] ; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT]] -; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2 ; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_1]] -; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP45:%.*]] = fmul fast double [[T0_2]], [[TMP41]] ; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_1]] -; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3 ; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_2]] -; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP46:%.*]] = fmul fast double [[T0_3]], [[TMP42]] ; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_2]] -; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_3]] = add nuw nsw i64 [[INDVARS_IV]], 4 ; CHECK-NEXT: [[EXITCOND_NOT_3:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_3]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP17:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]] ; CHECK: for.end: ; CHECK-NEXT: ret void ; diff --git a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll index 7d7d18a5247d..bb320af193e3 100644 --- a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll +++ b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll @@ -76,18 +76,20 @@ define void @_Z4loopi(i32 %width) { ; ROTATED_LATER_OLDPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1 ; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]] ; ROTATED_LATER_OLDPM: for.cond.preheader: +; ROTATED_LATER_OLDPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1 +; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]] +; ROTATED_LATER_OLDPM: for.body.preheader: ; ROTATED_LATER_OLDPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1 -; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0 -; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY:%.*]] +; ROTATED_LATER_OLDPM-NEXT: br label [[FOR_BODY:%.*]] ; ROTATED_LATER_OLDPM: for.cond.cleanup: ; ROTATED_LATER_OLDPM-NEXT: tail call void @f0() ; ROTATED_LATER_OLDPM-NEXT: tail call void @f2() ; ROTATED_LATER_OLDPM-NEXT: br label [[RETURN]] ; ROTATED_LATER_OLDPM: for.body: -; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_COND_PREHEADER]] ] +; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ] ; ROTATED_LATER_OLDPM-NEXT: tail call void @f0() ; ROTATED_LATER_OLDPM-NEXT: tail call void @f1() -; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw i32 [[I_04]], 1 +; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw nsw i32 [[I_04]], 1 ; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[TMP0]] ; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]] ; ROTATED_LATER_OLDPM: return: @@ -98,24 +100,24 @@ define void @_Z4loopi(i32 %width) { ; ROTATED_LATER_NEWPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1 ; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]] ; ROTATED_LATER_NEWPM: for.cond.preheader: +; ROTATED_LATER_NEWPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1 +; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]] +; ROTATED_LATER_NEWPM: for.body.preheader: ; ROTATED_LATER_NEWPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1 -; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0 -; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE:%.*]] -; ROTATED_LATER_NEWPM: for.cond.preheader.for.body_crit_edge: -; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw i32 0, 1 +; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw nsw i32 0, 1 ; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY:%.*]] ; ROTATED_LATER_NEWPM: for.cond.cleanup: ; ROTATED_LATER_NEWPM-NEXT: tail call void @f0() ; ROTATED_LATER_NEWPM-NEXT: tail call void @f2() ; ROTATED_LATER_NEWPM-NEXT: br label [[RETURN]] ; ROTATED_LATER_NEWPM: for.body: -; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE]] ] +; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_BODY_PREHEADER]] ] ; ROTATED_LATER_NEWPM-NEXT: tail call void @f0() ; ROTATED_LATER_NEWPM-NEXT: tail call void @f1() ; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC_PHI]], [[TMP0]] ; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ; ROTATED_LATER_NEWPM: for.body.for.body_crit_edge: -; ROTATED_LATER_NEWPM-NEXT: [[INC_0]] = add nuw i32 [[INC_PHI]], 1 +; ROTATED_LATER_NEWPM-NEXT: [[INC_0]] = add nuw nsw i32 [[INC_PHI]], 1 ; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY]] ; ROTATED_LATER_NEWPM: return: ; ROTATED_LATER_NEWPM-NEXT: ret void </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O2_LTO - Build # 12 - Successful!

by ci_notify＠linaro.org

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-release-aarch64-spec2k6-O3_LTO - Build # 9 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO Culprit: <cut> commit 0237dbfdd38053cc190f814b6f92e311ae3509c6 Author: Chuanqi Xu <yedeng.yd(a)linux.alibaba.com> Date: Tue Jul 27 13:13:39 2021 +0800 [Coroutine] Record the elided coroutines Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105606 </cut> Results regressed to (for first_bad == 0237dbfdd38053cc190f814b6f92e311ae3509c6) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_LTO artifacts/build-0237dbfdd38053cc190f814b6f92e311ae3509c6/results_id: 1 # 458.sjeng,sjeng_base.default regressed by 104 from (for last_good == dbefcde6da1b58eb181dcbd8d7913175b2ec8350) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_LTO artifacts/build-dbefcde6da1b58eb181dcbd8d7913175b2ec8350/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/4264 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/4262 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-0237dbfdd38053cc190f814b6f92e311ae3509c6 cd investigate-llvm-0237dbfdd38053cc190f814b6f92e311ae3509c6 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 0237dbfdd38053cc190f814b6f92e311ae3509c6 ../artifacts/test.sh # Reproduce last_good build git checkout --detach dbefcde6da1b58eb181dcbd8d7913175b2ec8350 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… Full commit (up to 1000 lines): <cut> commit 0237dbfdd38053cc190f814b6f92e311ae3509c6 Author: Chuanqi Xu <yedeng.yd(a)linux.alibaba.com> Date: Tue Jul 27 13:13:39 2021 +0800 [Coroutine] Record the elided coroutines Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105606 --- llvm/lib/Transforms/Coroutines/CoroElide.cpp | 28 ++++++++++++++++++++++ .../{coro-elide-count.ll => coro-elide-stat.ll} | 7 ++++++ 2 files changed, 35 insertions(+) diff --git a/llvm/lib/Transforms/Coroutines/CoroElide.cpp b/llvm/lib/Transforms/Coroutines/CoroElide.cpp index d35a5de6d4bd..84bebb7bf42d 100644 --- a/llvm/lib/Transforms/Coroutines/CoroElide.cpp +++ b/llvm/lib/Transforms/Coroutines/CoroElide.cpp @@ -17,6 +17,7 @@ #include "llvm/InitializePasses.h" #include "llvm/Pass.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/FileSystem.h" using namespace llvm; @@ -24,6 +25,12 @@ using namespace llvm; STATISTIC(NumOfCoroElided, "The # of coroutine get elided."); +#ifndef NDEBUG +static cl::opt<std::string> CoroElideInfoOutputFilename( + "coro-elide-info-output-file", cl::value_desc("filename"), + cl::desc("File to record the coroutines got elided"), cl::Hidden); +#endif + namespace { // Created on demand if the coro-elide pass has work to do. struct Lowerer : coro::LowererBase { @@ -121,6 +128,21 @@ static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) { llvm_unreachable("no terminator in the entry block"); } +#ifndef NDEBUG +static std::unique_ptr<raw_fd_ostream> getOrCreateLogFile() { + assert(!CoroElideInfoOutputFilename.empty() && + "coro-elide-info-output-file shouldn't be empty"); + std::error_code EC; + auto Result = std::make_unique<raw_fd_ostream>(CoroElideInfoOutputFilename, + EC, sys::fs::OF_Append); + if (!EC) + return Result; + llvm::errs() << "Error opening coro-elide-info-output-file '" + << CoroElideInfoOutputFilename << " for appending!\n"; + return std::make_unique<raw_fd_ostream>(2, false); // stderr. +} +#endif + // To elide heap allocations we need to suppress code blocks guarded by // llvm.coro.alloc and llvm.coro.free instructions. void Lowerer::elideHeapAllocations(Function *F, uint64_t FrameSize, @@ -344,6 +366,12 @@ bool Lowerer::processCoroId(CoroIdInst *CoroId, AAResults &AA, FrameSizeAndAlign.second, AA); coro::replaceCoroFree(CoroId, /*Elide=*/true); NumOfCoroElided++; +#ifndef NDEBUG + if (!CoroElideInfoOutputFilename.empty()) + *getOrCreateLogFile() + << "Elide " << CoroId->getCoroutine()->getName() << " in " + << CoroId->getFunction()->getName() << "\n"; +#endif } return true; diff --git a/llvm/test/Transforms/Coroutines/coro-elide-count.ll b/llvm/test/Transforms/Coroutines/coro-elide-stat.ll similarity index 92% rename from llvm/test/Transforms/Coroutines/coro-elide-count.ll rename to llvm/test/Transforms/Coroutines/coro-elide-stat.ll index ae40a74f41d5..e92d484786c5 100644 --- a/llvm/test/Transforms/Coroutines/coro-elide-count.ll +++ b/llvm/test/Transforms/Coroutines/coro-elide-stat.ll @@ -4,8 +4,15 @@ ; RUN: opt < %s -S \ ; RUN: -passes='cgscc(repeat<2>(inline,function(coro-elide,dce)))' -stats 2>&1 \ ; RUN: | FileCheck %s +; RUN: opt < %s --disable-output \ +; RUN: -passes='cgscc(repeat<2>(inline,function(coro-elide,dce)))' \ +; RUN: -coro-elide-info-output-file=%t && \ +; RUN: cat %t \ +; RUN: | FileCheck %s --check-prefix=FILE ; CHECK: 2 coro-elide - The # of coroutine get elided. +; FILE: Elide f in callResume +; FILE: Elide f in callResumeMultiRetDommmed declare void @print(i32) nounwind </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O3_LTO - Build # 8 - Successful!

by ci_notify＠linaro.org

3 years, 10 months

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain