linaro-toolchain - lists.linaro.org

[TCWG CI] Regression caused by binutils: [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp:
commit b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
Author: Tom de Vries <tdevries(a)suse.de>

    [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_VECT_mthumb artifacts/build-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1/results_id:
1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_VECT_mthumb artifacts/build-baseline/results_id:
1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-master-arm_eabi-coremark-O3_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-binutils-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
cd investigate-binutils-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 3814a9e1fe77c01c7e872c25afa198537d4ac780
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
Author: Tom de Vries <tdevries(a)suse.de>
Date:   Fri Sep 24 12:39:14 2021 +0200

    [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp

    When interpreting the testsuite results, it's often relevant what kind of machine the testsuite ran on. On a local machine one can just do /proc/cpuinfo, but in case of running tests using a remote system that distributes test runs to other remote systems that are not directly accessible, that's not possible.

    Fix this by dumping /proc/cpuinfo into the gdb.log, as well as lsb_release -a and uname -a.

    We could do this at the start of each test run, by putting it into unix.exp or some such. However, this might be too verbose, so we choose to put it into its own test-case, such that it get triggered in a full testrun, but not when running one or a subset of tests.

    We put the test-case into the gdb.testsuite directory, which is currently the only place in the testsuite where we do not test gdb. [ Though perhaps this could be put into a new gdb.info directory, since the test-case doesn't actually test the testsuite. ]

    Tested on x86_64-linux.
---
 gdb/testsuite/gdb.testsuite/dump-system-info.exp | 48 ++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/gdb/testsuite/gdb.testsuite/dump-system-info.exp b/gdb/testsuite/gdb.testsuite/dump-system-info.exp
new file mode 100644
index 00000000000..bf181469bd5
--- /dev/null
+++ b/gdb/testsuite/gdb.testsuite/dump-system-info.exp
@@ -0,0 +1,48 @@
+# Copyright 2021 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# The purpose of this test-case is to dump /proc/cpuinfo and similar system
+# info into gdb.log.
+
+# Check if /proc/cpuinfo is available.
+set res [remote_exec target "test -r /proc/cpuinfo"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 && $output == "" } {
+    verbose -log "Cpuinfo available, dumping:"
+    remote_exec target "cat /proc/cpuinfo"
+} else {
+    verbose -log "Cpuinfo not available"
+}
+
+set res [remote_exec target "lsb_release -a"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 } {
+    verbose -log "lsb_release -a availabe, dumping:\n$output"
+} else {
+    verbose -log "lsb_release -a not available"
+}
+
+set res [remote_exec target "uname -a"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 } {
+    verbose -log "uname -a availabe, dumping:\n$output"
+} else {
+    verbose -log "uname -a not available"
+}
</cut>
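The test-case unpacks the `{status output}` pair returned by `remote_exec` and logs a command's output only when the probe succeeds. As a rough local analogue, here is a minimal C++ sketch of the same probe-then-dump logic. It assumes a POSIX host and runs commands locally through `popen`, not on a DejaGnu remote target; `run_command` and `describe` are hypothetical names, not part of the GDB testsuite.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Exit status plus captured stdout, loosely mirroring the {status output}
// list that the test-case unpacks with lindex.
struct CommandResult {
  int status;
  std::string output;
};

// Run COMMAND with /bin/sh on the local host via POSIX popen and capture
// its standard output. Unlike remote_exec, this never talks to a remote
// target. status stays -1 if the command could not be started or did not
// exit normally.
CommandResult run_command(const std::string& command) {
  CommandResult result{-1, ""};
  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) return result;
  std::array<char, 256> buffer{};
  while (std::fgets(buffer.data(), static_cast<int>(buffer.size()), pipe) != nullptr)
    result.output += buffer.data();
  int raw = pclose(pipe);
  if (raw != -1 && WIFEXITED(raw)) result.status = WEXITSTATUS(raw);
  return result;
}

// Same shape as the lsb_release and uname blocks: report the output only
// when the command exits with status 0.
std::string describe(const std::string& label, const std::string& command) {
  assert(!command.empty());
  CommandResult r = run_command(command);
  if (r.status == 0) return label + " available, dumping:\n" + r.output;
  return label + " not available";
}
```

For example, `describe("uname -a", "uname -a")` would return the "available, dumping:" message followed by the kernel string on most Linux hosts, and the "not available" message if the command fails.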

[TCWG CI] 464.h264ref slowed down by 3% after llvm: Fix test from 8dd42f, capitalization in test

by ci_notify＠linaro.org

After llvm commit e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
Author: Erich Keane <erich.keane(a)intel.com>

    Fix test from 8dd42f, capitalization in test

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 3% from 10973 to 11249 perf samples
- 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 12% from 1446 to 1619 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
cd investigate-llvm-e8e2edd8ca88f8b0a7dba141349b2aa83284f3af

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 77d200a546136c2855063613ff4bca1f682fb23a
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
Author: Erich Keane <erich.keane(a)intel.com>
Date:   Fri Sep 24 10:24:17 2021 -0700

    Fix test from 8dd42f, capitalization in test
---
 clang/test/CXX/drs/dr17xx.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/test/CXX/drs/dr17xx.cpp b/clang/test/CXX/drs/dr17xx.cpp
index 42303c83ae3c..c8648908ebda 100644
--- a/clang/test/CXX/drs/dr17xx.cpp
+++ b/clang/test/CXX/drs/dr17xx.cpp
@@ -129,7 +129,7 @@ namespace dr1778 { // dr1778: 9
 namespace dr1762 { // dr1762: 14
 #if __cplusplus >= 201103L
   float operator ""_E(const char *);
-  // expected-error@+2 {{invalid suffix on literal; c++11 requires a space between literal and identifier}}
+  // expected-error@+2 {{invalid suffix on literal; C++11 requires a space between literal and identifier}}
   // expected-warning@+1 {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
   float operator ""E(const char *);
 #endif
</cut>
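The headline percentages in the 464.h264ref report can be checked from the raw perf-sample counts: 10973 to 11249 samples is about a 2.5% increase, and 1446 to 1619 is about 11.96%. A small C++ sketch of that arithmetic follows. The rounding to a whole percent and the strict "more than 2%" comparison are assumptions about how the CI presents its numbers; the report does not state its exact rule.

```cpp
#include <cassert>
#include <cmath>

// Percentage change in perf samples between two builds. A positive value
// means the newer build collected more samples, i.e. it ran slower.
double slowdown_percent(long before_samples, long after_samples) {
  assert(before_samples > 0);  // a zero baseline has no meaningful ratio
  return 100.0 * static_cast<double>(after_samples - before_samples) /
         static_cast<double>(before_samples);
}

// Assumption: the headline figure is the change rounded to the nearest
// whole percent.
long reported_percent(long before_samples, long after_samples) {
  return std::lround(slowdown_percent(before_samples, after_samples));
}

// Assumption: "slowed down by more than 2%" is a strict comparison on the
// unrounded value.
bool exceeds_threshold(long before_samples, long after_samples,
                       double threshold_percent = 2.0) {
  return slowdown_percent(before_samples, after_samples) > threshold_percent;
}
```

Under these assumptions, `reported_percent(10973, 11249)` gives 3 and `reported_percent(1446, 1619)` gives 12, matching the report, and the whole-benchmark change clears the 2% threshold only narrowly.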

Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

by Maxim Kuvyrkov

Thanks, Stanislav,

FWIW, it will probably be easier for you to just rebuild the compiler; it is an x86_64-linux-gnu -> arm-linux-gnueabihf cross. This link has the build log [1].

cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=ARM

Then compile the pre-processed source with plain -O2 or -O3 optimisation settings.

[1] https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 24 Sep 2021, at 20:30, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
>
> [AMD Official Use Only]
>
> I have reverted the whole change. There was yet another perf regression report.
>
> Stas
>
> From: Mekhanoshin, Stanislav
> Sent: Thursday, September 23, 2021 11:48
> To: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
>
> Thanks. I see the reload. There should not be extra pressure, since that is the whole idea: to make pressure less. However, I see more spills in that specific file, fast_algorithms.s, if I get it right.
> Can I get the IR for it? Something to feed llc.
>
> Stas
>
> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> Sent: Thursday, September 23, 2021 2:31
> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
>
> [CAUTION: External Email]
>
> Thanks, Stanislav.
>
> I've looked into profile dumps, and 456.hmmer's hot loop gets several additional reloads. E.g., "ldr r1, [sp, #84]" generates 203 additional samples, which translates into 20 seconds of time just for that one instruction.
>
> See the attached profile dumps and the screenshot with the hot loop highlighted.
>
> Maybe your patch increases register pressure too much?
>
> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
> > On 22 Sep 2021, at 22:35, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >
> > There are actually a couple of things worth trying, if that is easy:
> >
> > https://reviews.llvm.org/D109077
> > https://reviews.llvm.org/differential/diff/374324/
> >
> > Both may slightly change spill weights and therefore the spilling pattern.
> >
> > Stas
> >
> > -----Original Message-----
> > From: Mekhanoshin, Stanislav
> > Sent: Wednesday, September 22, 2021 12:09
> > To: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> > Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> > Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >
> > I assume some of the newly rematerialized instructions caused the perf drops, probably some very specific ones. I would appreciate it if you could point them out to me.
> > In addition, I believe I would need linked or optimized bitcode to feed into llc.
> >
> > Stas
> >
> > -----Original Message-----
> > From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> > Sent: Wednesday, September 22, 2021 12:06
> > To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> > Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> > Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >
> > Hi Stanislav,
> >
> > That's fair; I or someone from Linaro will try to analyze this and follow up here.
> >
> > On a more general note, what info would you like to see in these benchmarking regression reports?
> >
> > Thanks,
> >
> > --
> > Maxim Kuvyrkov
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >
> >> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >>
> >> Hm... I'd really like to help, but I do not think I can do anything with megabytes of asm code that I do not understand and tons of differences across 48 asm files.
> >> What I can see there is less spilling code overall, which was the intent in the first place: hmmer has 4 fewer spill opcodes overall and sphinx has 27 fewer.
> >> I doubt I could say much more without someone pointing to the actual root cause.
> >>
> >> Stas
> >>
> >> -----Original Message-----
> >> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> >> Sent: Wednesday, September 22, 2021 5:16
> >> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> >> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> >> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >>
> >> Hi Stanislav,
> >>
> >> Attached is a tarball with -save-temps output (pre-processed source and generated assembly) for the first-bad run (your commit) and the last-good run (immediate parent of your commit).
> >>
> >> --
> >> Maxim Kuvyrkov
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >>
> >>> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >>>
> >>> Thanks for letting me know. Some regressions are inevitable; however, do you happen to have any analysis and dumps? I myself do not understand the ARM ISA well...
> >>>
> >>> Stas
> >>>
> >>> -----Original Message-----
> >>> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> >>> Sent: Wednesday, September 15, 2021 5:52
> >>> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> >>> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> >>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >>>
> >>> Hi Stanislav,
> >>>
> >>> FYI, your patch seems to be slowing down two SPEC CPU2006 tests on 32-bit ARM at -O2 and -O3 optimization levels.
> >>>
> >>> --
> >>> Maxim Kuvyrkov
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
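The thread compares spill opcode counts between the first-bad and last-good assembly and points at an extra reload of the form `ldr r1, [sp, #84]`. One crude way to get comparable numbers from the `-save-temps` assembly is to count stack-pointer-relative `ldr`/`str` instructions. The sketch below is a heuristic of that kind, not the method anyone in the thread used: it also counts ordinary stack locals, and it ignores `vldr`/`vstr`, `ldm`/`stm`, `push`/`pop` and conditional forms such as `ldrne`.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Counts of stack-pointer-relative loads and stores in an assembly listing.
struct StackAccessCounts {
  int loads = 0;   // e.g. "ldr r1, [sp, #84]" (candidate reloads)
  int stores = 0;  // e.g. "str r1, [sp, #84]" (candidate spills)
};

// Heuristic: treat every plain ldr/str of a core register whose address is
// [sp] or [sp, #imm] as a stack-slot access.
StackAccessCounts count_stack_accesses(const std::string& asm_text) {
  static const std::regex pattern(
      R"(^\s*(ldr|str)\s+r\d+,\s*\[sp(,\s*#-?\d+)?\])");
  StackAccessCounts counts;
  std::istringstream lines(asm_text);
  std::string line;
  std::smatch match;
  while (std::getline(lines, line)) {
    if (std::regex_search(line, match, pattern)) {
      if (match.str(1) == "ldr")
        ++counts.loads;
      else
        ++counts.stores;
    }
  }
  return counts;
}
```

Running this over the same `.s` file from the first-bad and last-good tarballs and comparing the two counts gives a rough per-file picture; as the thread shows, fewer spills overall can still coexist with an extra reload in a hot loop, so a profile remains the better guide.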

[TCWG CI] Regression caused by binutils: Automatic date update in version.in

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: Automatic date update in version.in:
commit fcc561a54de2beb19cb325094fbd3ec76f96e520
Author: GDB Administrator <gdbadmin(a)sourceware.org>

    Automatic date update in version.in

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-fcc561a54de2beb19cb325094fbd3ec76f96e520/results_id:
1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id:
1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-binutils-fcc561a54de2beb19cb325094fbd3ec76f96e520
cd investigate-binutils-fcc561a54de2beb19cb325094fbd3ec76f96e520

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach fcc561a54de2beb19cb325094fbd3ec76f96e520
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 27439f0edab99c6870cf7fe042074e47632f3fbd
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit fcc561a54de2beb19cb325094fbd3ec76f96e520
Author: GDB Administrator <gdbadmin(a)sourceware.org>
Date:   Wed Sep 22 00:00:31 2021 +0000

    Automatic date update in version.in
---
 bfd/version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bfd/version.h b/bfd/version.h
index 338c1288a22..c45d963473c 100644
--- a/bfd/version.h
+++ b/bfd/version.h
@@ -16,7 +16,7 @@
    In releases, the date is not included in either version strings or
    sonames. */
-#define BFD_VERSION_DATE 20210921
+#define BFD_VERSION_DATE 20210922
 #define BFD_VERSION @bfd_version@
 #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@
 #define REPORT_BUGS_TO @report_bugs_to@
</cut>
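Each of these reports names an adjacent first_bad/last_good pair of commits. The reports do not show how jenkins-scripts performs the bisection, so the following is only a generic C++ sketch of the textbook binary search that narrows an ordered commit range to such a pair. `bisect` and `is_bad` are hypothetical names; `is_bad` stands in for "build the toolchain at this commit and run the benchmark", and the search assumes the outcome is monotonic (all good, then all bad).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// COMMITS is ordered oldest to newest; commits.front() is known good and
// commits.back() is known bad. Returns the adjacent (last_good, first_bad)
// pair, evaluating is_bad only on midpoints.
std::pair<std::string, std::string> bisect(
    const std::vector<std::string>& commits,
    const std::function<bool(const std::string&)>& is_bad) {
  assert(commits.size() >= 2);
  std::size_t good = 0;
  std::size_t bad = commits.size() - 1;
  while (bad - good > 1) {
    std::size_t mid = good + (bad - good) / 2;
    if (is_bad(commits[mid]))
      bad = mid;   // regression already present: first_bad is at or before mid
    else
      good = mid;  // still good: first_bad is after mid
  }
  return {commits[good], commits[bad]};
}
```

Because the predicate only sees build outcomes, a noisy benchmark can make an innocuous commit, such as a date bump, land in the first_bad slot; the search itself cannot distinguish noise from a real regression.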

[TCWG CI] Regression caused by gcc: Daily bump.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Daily bump.:
commit d4b84aefe696a5783a58a30b3fb8dc4617cd147a
Author: GCC Administrator <gccadmin(a)gcc.gnu.org>

    Daily bump.

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-d4b84aefe696a5783a58a30b3fb8dc4617cd147a/results_id:
1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id:
1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-gcc-d4b84aefe696a5783a58a30b3fb8dc4617cd147a
cd investigate-gcc-d4b84aefe696a5783a58a30b3fb8dc4617cd147a

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach d4b84aefe696a5783a58a30b3fb8dc4617cd147a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b1dc26d3543d79805751c26ba5b142eeeb1f55b8
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit d4b84aefe696a5783a58a30b3fb8dc4617cd147a
Author: GCC Administrator <gccadmin(a)gcc.gnu.org>
Date:   Tue Sep 21 00:17:57 2021 +0000

    Daily bump.
---
 gcc/DATESTAMP           | 2 +-
 gcc/fortran/ChangeLog   | 5 +++++
 gcc/testsuite/ChangeLog | 4 ++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/DATESTAMP b/gcc/DATESTAMP
index c1155ef2341..ed865cb70ab 100644
--- a/gcc/DATESTAMP
+++ b/gcc/DATESTAMP
@@ -1 +1 @@
-20210920
+20210921
diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index f6863fb900a..3d53ed99f33 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,8 @@
+2021-09-20 Tobias Burnus <tobias(a)codesourcery.com>
+
+  * trans-openmp.c (gfc_split_omp_clauses): Don't put 'order(concurrent)'
+  on 'distribute' for combined directives, matching OpenMP 5.0
+
 2021-09-19 Harald Anlauf <anlauf(a)gmx.de>
 
   Backported from master:
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 2ea65ee2d7f..7f8d142942a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2021-09-20 Tobias Burnus <tobias(a)codesourcery.com>
+
+  * gfortran.dg/gomp/distribute-order-concurrent.f90: New test.
+
 2021-09-19 Harald Anlauf <anlauf(a)gmx.de>
 
   Backported from master:
</cut>

[TCWG CI] Regression caused by newlib: Cygwin: allow open_setup to fail

by ci_notify＠linaro.org

[TCWG CI] Regression caused by newlib: Cygwin: allow open_setup to fail:
commit e5fcb021cc9dcb1f19d45030457be86b4a226e65
Author: Ken Brown <kbrown(a)cornell.edu>

    Cygwin: allow open_setup to fail

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-e5fcb021cc9dcb1f19d45030457be86b4a226e65/results_id:
1

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id:
1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-newlib-e5fcb021cc9dcb1f19d45030457be86b4a226e65
cd investigate-newlib-e5fcb021cc9dcb1f19d45030457be86b4a226e65

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /newlib/ ./ ./bisect/baseline/

cd newlib

# Reproduce first_bad build
git checkout --detach e5fcb021cc9dcb1f19d45030457be86b4a226e65
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 9b0841aa789e74b6778744b89af76b60bd1a78bc
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit e5fcb021cc9dcb1f19d45030457be86b4a226e65 Author: Ken Brown <kbrown(a)cornell.edu> Date: Sat Sep 18 08:13:55 2021 -0400 Cygwin: allow open_setup to fail Convert fhandler_base::open_setup to a (virtual) method that returns a bool result. For the moment, it and its overrides always return true. --- winsup/cygwin/fhandler.cc | 3 ++- winsup/cygwin/fhandler.h | 10 +++++----- winsup/cygwin/fhandler_console.cc | 4 ++-- winsup/cygwin/fhandler_pipe.cc | 9 +++++++-- winsup/cygwin/fhandler_tty.cc | 8 ++++---- 5 files changed, 20 insertions(+), 14 deletions(-) diff --git a/winsup/cygwin/fhandler.cc b/winsup/cygwin/fhandler.cc index 9dfe70be38..1af469e0c9 100644 --- a/winsup/cygwin/fhandler.cc +++ b/winsup/cygwin/fhandler.cc @@ -789,9 +789,10 @@ fhandler_base::fd_reopen (int, mode_t) return NULL; } -void +bool fhandler_base::open_setup (int) { + return true; } /* states: diff --git a/winsup/cygwin/fhandler.h b/winsup/cygwin/fhandler.h index 61113e6981..3471e95b97 100644 --- a/winsup/cygwin/fhandler.h +++ b/winsup/cygwin/fhandler.h @@ -355,7 +355,7 @@ class fhandler_base int open_null (int flags); virtual int open (int, mode_t); virtual fhandler_base *fd_reopen (int, mode_t); - virtual void open_setup (int flags); + virtual bool open_setup (int flags); void set_unique_id (int64_t u) { unique_id = u; } void set_unique_id () { NtAllocateLocallyUniqueId ((PLUID) &unique_id); } @@ -1206,7 +1206,7 @@ public: select_record *select_except (select_stuff *); char *get_proc_fd_name (char *buf); int open (int flags, mode_t mode = 0); - void open_setup (int flags); + bool open_setup (int flags); void fixup_after_fork (HANDLE); int dup (fhandler_base *child, int); void set_close_on_exec (bool val); @@ -2132,7 +2132,7 @@ private: bool use_archetype () const {return true;} int open (int flags, mode_t mode); - void open_setup (int flags); + bool open_setup (int flags); int dup (fhandler_base *, int); void __reg3 read (void *ptr, size_t& len); 
@@ -2300,7 +2300,7 @@ class fhandler_pty_slave: public fhandler_pty_common
 HANDLE& get_handle_nat () { return io_handle_nat; }
 int open (int flags, mode_t mode = 0);
- void open_setup (int flags);
+ bool open_setup (int flags);
 ssize_t __stdcall write (const void *ptr, size_t len);
 void __reg3 read (void *ptr, size_t& len);
 int init (HANDLE, DWORD, mode_t);
@@ -2399,7 +2399,7 @@ public:
 void doecho (const void *str, DWORD len);
 int accept_input ();
 int open (int flags, mode_t mode = 0);
- void open_setup (int flags);
+ bool open_setup (int flags);
 ssize_t __stdcall write (const void *ptr, size_t len);
 void __reg3 read (void *ptr, size_t& len);
 int close ();
diff --git a/winsup/cygwin/fhandler_console.cc b/winsup/cygwin/fhandler_console.cc
index e00f2cdbcc..ee862b17d1 100644
--- a/winsup/cygwin/fhandler_console.cc
+++ b/winsup/cygwin/fhandler_console.cc
@@ -1366,13 +1366,13 @@ fhandler_console::open (int flags, mode_t)
 return 1;
 }
-void
+bool
 fhandler_console::open_setup (int flags)
 {
 set_flags ((flags & ~O_TEXT) | O_BINARY);
 if (myself->set_ctty (this, flags) && !myself->cygstarted)
 init_console_handler (true);
- fhandler_base::open_setup (flags);
+ return fhandler_base::open_setup (flags);
 }
 int
diff --git a/winsup/cygwin/fhandler_pipe.cc b/winsup/cygwin/fhandler_pipe.cc
index 73ace3ac53..590ecf6670 100644
--- a/winsup/cygwin/fhandler_pipe.cc
+++ b/winsup/cygwin/fhandler_pipe.cc
@@ -191,10 +191,11 @@ out:
 return 0;
 }
-void
+bool
 fhandler_pipe::open_setup (int flags)
 {
- fhandler_base::open_setup (flags);
+ if (!fhandler_base::open_setup (flags))
+ goto err;
 if (get_dev () == FH_PIPER && !read_mtx)
 {
 SECURITY_ATTRIBUTES *sa = sec_none_cloexec (flags);
@@ -211,6 +212,10 @@ fhandler_pipe::open_setup (int flags)
 }
 if (get_dev () == FH_PIPEW && !query_hdl)
 set_pipe_non_blocking (is_nonblocking ());
+ return true;
+
+err:
+ return false;
 }
 off_t
diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 1ea9a47ac5..05fe5348af 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -964,13 +964,13 @@ err_no_msg:
 return 0;
 }
-void
+bool
 fhandler_pty_slave::open_setup (int flags)
 {
 set_flags ((flags & ~O_TEXT) | O_BINARY);
 myself->set_ctty (this, flags);
 report_tty_counts (this, "opened", "");
- fhandler_base::open_setup (flags);
+ return fhandler_base::open_setup (flags);
 }
 void
@@ -1947,14 +1947,14 @@ fhandler_pty_master::open (int flags, mode_t)
 return 1;
 }
-void
+bool
 fhandler_pty_master::open_setup (int flags)
 {
 set_flags ((flags & ~O_TEXT) | O_BINARY);
 char buf[sizeof ("opened pty master for ptyNNNNNNNNNNN")];
 __small_sprintf (buf, "opened pty master for pty%d", get_minor ());
 report_tty_counts (this, buf, "");
- fhandler_base::open_setup (flags);
+ return fhandler_base::open_setup (flags);
 }
 off_t
</cut>
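The commit above only changes the contract: `open_setup` now returns `bool`, although every override still returns `true` for the moment. The following is a minimal sketch, assuming simplified hypothetical classes `handler_base` and `pipe_handler` (these are not Cygwin's real `fhandler` classes), of how a bool-returning virtual setup hook lets an override propagate a base-class failure and lets a caller react to it. The real `fhandler_pipe::open_setup` uses a `goto err` label; the sketch uses early returns, which behave the same way here.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for Cygwin's fhandler classes;
// only the open_setup success/failure contract from the commit is modeled.
class handler_base {
 public:
  virtual ~handler_base() = default;

  // Reports whether per-open setup succeeded.
  // The base implementation always succeeds.
  virtual bool open_setup(int flags) {
    (void)flags;  // unused in this sketch
    return true;
  }
};

class pipe_handler : public handler_base {
 public:
  // 'simulate_failure' is a hypothetical knob standing in for a real
  // resource-creation error (for example, failing to create a mutex).
  explicit pipe_handler(bool simulate_failure)
      : simulate_failure_(simulate_failure) {}

  bool open_setup(int flags) override {
    // Propagate a base-class failure first.
    if (!handler_base::open_setup(flags))
      return false;
    if (simulate_failure_)
      return false;
    return true;
  }

 private:
  bool simulate_failure_;
};

// Hypothetical caller: the open only counts as complete if setup succeeded.
bool finish_open(handler_base &h, int flags) {
  return h.open_setup(flags);
}
```

With `pipe_handler ok(false)` and `pipe_handler bad(true)`, `finish_open(ok, 0)` returns `true` and `finish_open(bad, 0)` returns `false`. Returning the status through the virtual call, rather than signalling failure some other way, keeps the override chain (derived calls base, then adds its own checks) uniform across all handlers.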

4 years, 9 months


gcc-linaro-6.3.1-2017.05-i686_aarch64-elf.tar.xz

by maytte sanchez

I’m trying to import these files into DS-5. After unzipping the files, the toolchain still does not show up in the DS-5 search. Below is the error that I keep receiving:


[TCWG CI] Regression caused by binutils: Automatic date update in version.in

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: Automatic date update in version.in:
commit 27439f0edab99c6870cf7fe042074e47632f3fbd
Author: GDB Administrator <gdbadmin(a)sourceware.org>

Automatic date update in version.in

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-27439f0edab99c6870cf7fe042074e47632f3fbd/results_id: 1
from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-binutils-27439f0edab99c6870cf7fe042074e47632f3fbd
cd investigate-binutils-27439f0edab99c6870cf7fe042074e47632f3fbd

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach 27439f0edab99c6870cf7fe042074e47632f3fbd
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 6060c2f3373e18f76fa9e3e4d7cf2f3d5983da03
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 27439f0edab99c6870cf7fe042074e47632f3fbd
Author: GDB Administrator <gdbadmin(a)sourceware.org>
Date: Tue Sep 21 00:00:39 2021 +0000

Automatic date update in version.in
---
 bfd/version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bfd/version.h b/bfd/version.h
index 72a41aba322..338c1288a22 100644
--- a/bfd/version.h
+++ b/bfd/version.h
@@ -16,7 +16,7 @@
 In releases, the date is not included in either version strings or sonames. */
-#define BFD_VERSION_DATE 20210920
+#define BFD_VERSION_DATE 20210921
 #define BFD_VERSION @bfd_version@
 #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@
 #define REPORT_BUGS_TO @report_bugs_to@
</cut>


[TCWG CI] Regression caused by gcc: GCC11 - Fortran: combined directives - order(concurrent) not on distribute

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: GCC11 - Fortran: combined directives - order(concurrent) not on distribute:
commit b1dc26d3543d79805751c26ba5b142eeeb1f55b8
Author: Tobias Burnus <tobias(a)codesourcery.com>

GCC11 - Fortran: combined directives - order(concurrent) not on distribute

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-b1dc26d3543d79805751c26ba5b142eeeb1f55b8/results_id: 1
from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-gcc-b1dc26d3543d79805751c26ba5b142eeeb1f55b8
cd investigate-gcc-b1dc26d3543d79805751c26ba5b142eeeb1f55b8

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach b1dc26d3543d79805751c26ba5b142eeeb1f55b8
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 79c523d40de1b7ce1dd0f4865c0855ab2bf6744b
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit b1dc26d3543d79805751c26ba5b142eeeb1f55b8
Author: Tobias Burnus <tobias(a)codesourcery.com>
Date: Mon Sep 20 17:24:56 2021 +0200

GCC11 - Fortran: combined directives - order(concurrent) not on distribute

While OpenMP 5.1 and GCC 12 permits 'order(concurrent)' on distribute, OpenMP 5.0 and GCC 11 don't. This patch for GCC 11 ensures the clause also does not end up on 'distribute' when splitting combined directives.

gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_split_omp_clauses): Don't put 'order(concurrent)' on 'distribute' for combined directives, matching OpenMP 5.0

gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/distribute-order-concurrent.f90: New test.
---
 gcc/fortran/trans-openmp.c | 2 --
 .../gomp/distribute-order-concurrent.f90 | 25 ++++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 7e931bf4bc7..973d916b4a2 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -5176,8 +5176,6 @@ gfc_split_omp_clauses (gfc_code *code,
 /* Duplicate collapse. */
 clausesa[GFC_OMP_SPLIT_DISTRIBUTE].collapse
 = code->ext.omp_clauses->collapse;
- clausesa[GFC_OMP_SPLIT_DISTRIBUTE].order_concurrent
- = code->ext.omp_clauses->order_concurrent;
 }
 if (mask & GFC_OMP_MASK_PARALLEL)
 {
diff --git a/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90 b/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90
new file mode 100644
index 00000000000..9597d913684
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90
@@ -0,0 +1,25 @@
+! { dg-additional-options "-fdump-tree-original" }
+!
+! In OpenMP 5.0, 'order(concurrent)' does not apply to distribute
+! Ensure that it is rejected in GCC 11.
+!
+! Note: OpenMP 5.1 allows it; the GCC 12 testcase for it is gfortran.dg/gomp/order-5.f90
+
+subroutine f(a)
+implicit none
+integer :: i, thr
+!save :: thr
+integer :: a(:)
+
+!$omp distribute parallel do order(concurrent) private(thr)
+ do i = 1, 10
+ thr = 5
+ a(i) = thr
+ end do
+!$omp end distribute parallel do
+end
+
+! { dg-final { scan-tree-dump-not "omp distribute\[^\n\r]*order" "original" } }
+! { dg-final { scan-tree-dump "#pragma omp distribute\[\n\r\]" "original" } }
+! { dg-final { scan-tree-dump "#pragma omp parallel private\\(thr\\)" "original" } }
+! { dg-final { scan-tree-dump "#pragma omp for nowait order\\(concurrent\\)" "original" } }
</cut>
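The new test expects `distribute` to carry no `order` clause, `parallel` to carry `private(thr)`, and the worksharing loop to carry `order(concurrent)`. The following is a minimal sketch of that clause-routing rule under OpenMP 5.0 semantics. The types `clause_set` and `split_result` and the function `split_distribute_parallel_do` are hypothetical and far simpler than GCC's `gfc_split_omp_clauses`; copying `collapse` to the loop as well as to `distribute` is a choice made for this sketch only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified clause set; not GCC's gfc_omp_clauses.
struct clause_set {
  int collapse = 0;
  bool order_concurrent = false;
  std::vector<std::string> private_vars;
};

// Clauses after splitting "distribute parallel do" into its three parts.
struct split_result {
  clause_set distribute;
  clause_set parallel;
  clause_set do_loop;
};

// OpenMP 5.0 (GCC 11) rule modeled here: order(concurrent) must not be
// placed on 'distribute', so it is routed only to the worksharing loop.
split_result split_distribute_parallel_do(const clause_set &c) {
  split_result s;
  s.distribute.collapse = c.collapse;               // collapse is duplicated
  s.do_loop.collapse = c.collapse;                  // (sketch: loop gets it too)
  s.parallel.private_vars = c.private_vars;         // private(...) on parallel
  s.do_loop.order_concurrent = c.order_concurrent;  // loop only, never distribute
  return s;
}
```

Under OpenMP 5.1 (GCC 12), the commit message notes that `order(concurrent)` is also permitted on `distribute`, so a 5.1 variant of this sketch would additionally set `s.distribute.order_concurrent`.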


[TCWG CI] 450.soplex grew in size by 2% after gcc: Avoid invalid loop transformations in jump threading registry.

by ci_notify＠linaro.org

After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>

Avoid invalid loop transformations in jump threading registry.

the following benchmarks grew in size by more than 1%:
- 450.soplex grew in size by 2% from 207260 to 211436 bytes

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 23 10:59:24 2021 +0200

Avoid invalid loop transformations in jump threading registry.
My upcoming improvements to the forward jump threader make it thread more aggressively. In investigating some "regressions", I noticed that it has always allowed threading through empty latches and across loop boundaries. As we have discussed recently, this should be avoided until after loop optimizations have run their course.

Note that this wasn't much of a problem before because DOM/VRP couldn't find these opportunities, but with a smarter solver, we trip over them more easily.

Because the forward threader doesn't have an independent localized cost model like the new threader (profitable_path_p), it is difficult to catch these things at discovery. However, we can catch them at registration time, with the added benefit that all the threaders (forward and backward) can share the handcuffs.

This patch is an adaptation of what we do in the backward threader, but it is not meant to catch everything we do there, as some of the restrictions there are due to limitations of the different block copiers (for example, the generic copier does not re-use existing threading paths).

We could ideally remove the now redundant bits in profitable_path_p, but I would prefer not to for two reasons. First, the backward threader uses profitable_path_p as it discovers paths to avoid discovering paths in unprofitable directions. Second, I would like to merge all the forward cost restrictions into the profitability class in the backward threader, not the other way around. Alas, that reshuffling will have to wait for the next release.

As usual, there are quite a few tests that needed adjustments. It seems we were quite happily threading improper scenarios. With most of them, as can be seen in pr77445-2.c, we're merely shifting the threading to after loop optimizations.

Tested on x86-64 Linux.

gcc/ChangeLog:
* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): New.
(jt_path_registry::register_jump_thread): Call cancel_invalid_paths.
* tree-ssa-threadupdate.h (class jt_path_registry): Add cancel_invalid_paths.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/20030714-2.c: Adjust.
* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
* gcc.dg/vect/bb-slp-16.c: Adjust.
---
 gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c | 7 ++-
 gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 19 ++++---
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 7 ---
 gcc/tree-ssa-threadupdate.c | 67 ++++++++++++++++++-----
 gcc/tree-ssa-threadupdate.h | 1 +
 8 files changed, 78 insertions(+), 35 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
index eb663f2ff5b..9585ff11307 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
@@ -32,7 +32,8 @@ get_alias_set (t)
 }
 }
-/* There should be exactly three IF conditionals if we thread jumps
- properly. */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+ properly. There used to be 3, but one thread was crossing
+ loops. */
+/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index e1464e21170..922a331b217 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */
 extern int status, pt;
 extern int count;
@@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a)
 pt--;
 }
-/* There are 4 jump threading opportunities, all of which will be
- realized, which will eliminate testing of FLAG, completely. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
+/* There are 2 jump threading opportunities (which don't cross loops),
+ all of which will be realized, which will eliminate testing of
+ FLAG, completely. */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */
-/* There should be no assignments or references to FLAG, verify they're
- eliminated as early as possible. */
-/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
+/* We used to remove references to FLAG by DCE2, but this was
+ depending on early threaders threading through loop boundaries
+ (which we shouldn't do). However, the late threading passes, which
+ run after loop optimizations , can successfully eliminate the
+ references to FLAG. Verify that ther are no references by the late
+ threading passes. */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index f9fc212f49e..01a0f1f197d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
 aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough to change decisions in switch expansion which in turn can expose new jump threading opportunities. Skip the later tests on aarch64. */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 60d4f76f076..2d78d045516 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -21,5 +21,7 @@
 condition.
 All the cases are picked up by VRP1 as jump threads. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
+
+/* There used to be 6 jump threads found by thread1, but they all
+ depended on threading through distinct loops in ethread. */
 /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index e3d4b311c03..16abcde5053 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 664e93e9b60..e68a9b62535 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,8 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* See note below as to why we disable threading. */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
-
 #include <stdarg.h>
 #include "tree-vect.h"
@@ -30,10 +27,6 @@ main1 (int dummy)
 *pout++ = *pin++ + a;
 *pout++ = *pin++ + a;
 *pout++ = *pin++ + a;
- /* In some architectures like ppc64, jump threading may thread
- the iteration where i==0 such that we no longer optimize the
- BB. Another alternative to disable jump threading would be
- to wrap the read from `i' into a function returning i. */
 if (arr[i] = i)
 a = i;
 else
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index baac11280fa..2b9b8f81274 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
 return retval;
 }
+bool
+jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
+{
+ gcc_checking_assert (!path.is_empty ());
+ edge taken_edge = path[path.length () - 1]->e;
+ loop_p loop = taken_edge->src->loop_father;
+ bool seen_latch = false;
+ bool path_crosses_loops = false;
+
+ for (unsigned int i = 0; i < path.length (); i++)
+ {
+ edge e = path[i]->e;
+
+ if (e == NULL)
+ {
+ // NULL outgoing edges on a path can happen for jumping to a
+ // constant address.
+ cancel_thread (&path, "Found NULL edge in jump threading path");
+ return true;
+ }
+
+ if (loop->latch == e->src || loop->latch == e->dest)
+ seen_latch = true;
+
+ // The first entry represents the block with an outgoing edge
+ // that we will redirect to the jump threading path. Thus we
+ // don't care about that block's loop father.
+ if ((i > 0 && e->src->loop_father != loop)
+ || e->dest->loop_father != loop)
+ path_crosses_loops = true;
+
+ if (flag_checking && !m_backedge_threads)
+ gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
+ }
+
+ if (cfun->curr_properties & PROP_loop_opts_done)
+ return false;
+
+ if (seen_latch && empty_block_p (loop->latch))
+ {
+ cancel_thread (&path, "Threading through latch before loop opts "
+ "would create non-empty latch");
+ return true;
+ }
+ if (path_crosses_loops)
+ {
+ cancel_thread (&path, "Path crosses loops");
+ return true;
+ }
+ return false;
+}
+
 /* Register a jump threading opportunity. We queue up all the jump
 threading opportunities discovered by a pass and update the CFG
 and SSA form all at once.
@@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
 return false;
 }
- /* First make sure there are no NULL outgoing edges on the jump threading
- path. That can happen for jumping to a constant address. */
- for (unsigned int i = 0; i < path->length (); i++)
- {
- if ((*path)[i]->e == NULL)
- {
- cancel_thread (path, "Found NULL edge in jump threading path");
- return false;
- }
-
- if (flag_checking && !m_backedge_threads)
- gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
- }
+ if (cancel_invalid_paths (*path))
+ return false;
 if (dump_file && (dump_flags & TDF_DETAILS))
 dump_jump_thread_path (dump_file, *path, true);
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 8b48a671212..d68795c9f27 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -75,6 +75,7 @@ protected:
 unsigned long m_num_threaded_edges;
 private:
 virtual bool update_cfg (bool peel_loop_headers) = 0;
+ bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
 jump_thread_path_allocator m_allocator;
 // True if threading through back edges is allowed. This is only
 // allowed in the generic copier in the backward threader.
</cut>
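The core of `cancel_invalid_paths` is the loop-crossing test: every edge on the path must stay inside the loop of the taken (last) edge's source block, except that the first edge's source block is only redirected, so its loop does not matter. The test is also skipped once loop optimizations are done. The following minimal sketch models only that logic with hypothetical types (`block`, `cfg_edge`, `path_crosses_loops`, `should_cancel`); it is not GCC's basic-block/edge API, and it deliberately omits the NULL-edge and empty-latch checks from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG model: each block records the loop it belongs to
// (its "loop father"). This is not GCC's basic_block/edge API.
struct block {
  int id;
  int loop_father;
};

struct cfg_edge {
  const block *src;
  const block *dest;
};

// Mirrors the loop-crossing test in jt_path_registry::cancel_invalid_paths:
// all blocks must stay in the loop of the taken (last) edge's source, except
// the source of the first edge, which is only the block being redirected.
bool path_crosses_loops(const std::vector<cfg_edge> &path) {
  assert(!path.empty());
  int loop = path.back().src->loop_father;
  for (std::size_t i = 0; i < path.size(); ++i) {
    const cfg_edge &e = path[i];
    if ((i > 0 && e.src->loop_father != loop) || e.dest->loop_father != loop)
      return true;
  }
  return false;
}

// 'loop_opts_done' stands in for PROP_loop_opts_done: once loop
// optimizations have run, paths are no longer cancelled for crossing loops.
bool should_cancel(const std::vector<cfg_edge> &path, bool loop_opts_done) {
  if (loop_opts_done)
    return false;
  return path_crosses_loops(path);
}
```

For example, with blocks `a` and `b` in loop 0 and `c` in loop 1, the path `a->b, b->c` is cancelled before loop optimizations but accepted afterwards, which matches the commit message's point that such threads are shifted to the late threading passes rather than lost.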


[TCWG CI] 464.h264ref slowed down by 7% after llvm: [JumpThreading] Ignore free instructions

by ci_notify＠linaro.org

After llvm commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
Author: Nikita Popov <nikita.ppv(a)gmail.com>

[JumpThreading] Ignore free instructions

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 10715 to 11434 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3 -flto
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
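The commit message for [JumpThreading] Ignore free instructions says the pass already skipped some free instructions (such as pointer bitcasts) but charged various free intrinsics a cost of 2, and that free instructions should instead be ignored for the threading decision. The following minimal sketch, assuming a hypothetical `inst_kind` enum in place of LLVM's real target cost model, compares the two counting rules. A lower estimate lets more blocks fit under a duplication threshold, so the pass may thread (and duplicate code) in places it previously rejected. The sketch only illustrates that mechanism; it does not show that this is what slowed down 464.h264ref.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction categories; a real pass would query the target
// cost model instead of using a fixed enum.
enum class inst_kind { ordinary, free_cast, free_intrinsic };

// Before the commit (per its message): some free instructions such as
// pointer bitcasts were skipped, but free intrinsics were charged 2.
unsigned cost_before(const std::vector<inst_kind> &block) {
  unsigned size = 0;
  for (inst_kind k : block) {
    if (k == inst_kind::free_cast)
      continue;
    size += (k == inst_kind::free_intrinsic) ? 2 : 1;
  }
  return size;
}

// After the commit: everything the cost model reports as free is ignored.
unsigned cost_after(const std::vector<inst_kind> &block) {
  unsigned size = 0;
  for (inst_kind k : block)
    if (k == inst_kind::ordinary)
      ++size;
  return size;
}

// In this sketch a block is duplicated only if its cost fits the threshold.
bool would_thread(unsigned cost, unsigned threshold) {
  return cost <= threshold;
}
```

For a block containing two ordinary instructions, one pointer bitcast and one free intrinsic, `cost_before` returns 4 and `cost_after` returns 2, so with a hypothetical threshold of 3 the block is rejected under the old rule and threaded under the new one.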
This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
cd investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff Author: Nikita Popov <nikita.ppv(a)gmail.com> Date: Wed Sep 22 21:34:24 2021 +0200 [JumpThreading] Ignore free instructions This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2. Differential Revision: https://reviews.llvm.org/D110290 --- .../include/llvm/Transforms/Scalar/JumpThreading.h | 8 +-- llvm/lib/Transforms/Scalar/JumpThreading.cpp | 61 ++++++++++------------ .../Transforms/JumpThreading/free_instructions.ll | 24 +++++---- .../inlining-alignment-assumptions.ll | 12 ++--- 4 files changed, 52 insertions(+), 53 deletions(-) diff --git a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h index 816ea1071e52..0ac7d7c62b7a 100644 --- a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h +++ b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h @@ -44,6 +44,7 @@ class PHINode; class SelectInst; class SwitchInst; class TargetLibraryInfo; +class TargetTransformInfo; class Value; /// A private "module" namespace for types and utilities used by @@ -78,6 +79,7 @@ enum ConstantPreference { WantInteger, WantBlockAddress }; /// revectored to the false side of the second if. class JumpThreadingPass : public PassInfoMixin<JumpThreadingPass> { TargetLibraryInfo *TLI; + TargetTransformInfo *TTI; LazyValueInfo *LVI; AAResults *AA; DomTreeUpdater *DTU; @@ -99,9 +101,9 @@ public: JumpThreadingPass(bool InsertFreezeWhenUnfoldingSelect = false, int T = -1); // Glue for old PM. 
- bool runImpl(Function &F, TargetLibraryInfo *TLI, LazyValueInfo *LVI, - AAResults *AA, DomTreeUpdater *DTU, bool HasProfileData, - std::unique_ptr<BlockFrequencyInfo> BFI, + bool runImpl(Function &F, TargetLibraryInfo *TLI, TargetTransformInfo *TTI, + LazyValueInfo *LVI, AAResults *AA, DomTreeUpdater *DTU, + bool HasProfileData, std::unique_ptr<BlockFrequencyInfo> BFI, std::unique_ptr<BranchProbabilityInfo> BPI); PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM); diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp index 688902ecb9ff..fe9a7211967c 100644 --- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp +++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp @@ -331,7 +331,7 @@ bool JumpThreading::runOnFunction(Function &F) { BFI.reset(new BlockFrequencyInfo(F, *BPI, LI)); } - bool Changed = Impl.runImpl(F, TLI, LVI, AA, &DTU, F.hasProfileData(), + bool Changed = Impl.runImpl(F, TLI, TTI, LVI, AA, &DTU, F.hasProfileData(), std::move(BFI), std::move(BPI)); if (PrintLVIAfterJumpThreading) { dbgs() << "LVI for function '" << F.getName() << "':\n"; @@ -360,7 +360,7 @@ PreservedAnalyses JumpThreadingPass::run(Function &F, BFI.reset(new BlockFrequencyInfo(F, *BPI, LI)); } - bool Changed = runImpl(F, &TLI, &LVI, &AA, &DTU, F.hasProfileData(), + bool Changed = runImpl(F, &TLI, &TTI, &LVI, &AA, &DTU, F.hasProfileData(), std::move(BFI), std::move(BPI)); if (PrintLVIAfterJumpThreading) { @@ -377,12 +377,14 @@ PreservedAnalyses JumpThreadingPass::run(Function &F, } bool JumpThreadingPass::runImpl(Function &F, TargetLibraryInfo *TLI_, - LazyValueInfo *LVI_, AliasAnalysis *AA_, - DomTreeUpdater *DTU_, bool HasProfileData_, + TargetTransformInfo *TTI_, LazyValueInfo *LVI_, + AliasAnalysis *AA_, DomTreeUpdater *DTU_, + bool HasProfileData_, std::unique_ptr<BlockFrequencyInfo> BFI_, std::unique_ptr<BranchProbabilityInfo> BPI_) { LLVM_DEBUG(dbgs() << "Jump threading on function '" << F.getName() << "'\n"); TLI = TLI_; + 
TTI = TTI_; LVI = LVI_; AA = AA_; DTU = DTU_; @@ -514,7 +516,8 @@ static void replaceFoldableUses(Instruction *Cond, Value *ToVal) { /// Return the cost of duplicating a piece of this block from first non-phi /// and before StopAt instruction to thread across it. Stop scanning the block /// when exceeding the threshold. If duplication is impossible, returns ~0U. -static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, +static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI, + BasicBlock *BB, Instruction *StopAt, unsigned Threshold) { assert(StopAt->getParent() == BB && "Not an instruction from proper BB?"); @@ -550,26 +553,21 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, if (Size > Threshold) return Size; - // Debugger intrinsics don't incur code size. - if (isa<DbgInfoIntrinsic>(I)) continue; - - // Pseudo-probes don't incur code size. - if (isa<PseudoProbeInst>(I)) - continue; - - // If this is a pointer->pointer bitcast, it is free. - if (isa<BitCastInst>(I) && I->getType()->isPointerTy()) - continue; - - // Freeze instruction is free, too. - if (isa<FreezeInst>(I)) - continue; - // Bail out if this instruction gives back a token type, it is not possible // to duplicate it if it is used outside this BB. if (I->getType()->isTokenTy() && I->isUsedOutsideOfBlock(BB)) return ~0U; + // Blocks with NoDuplicate are modelled as having infinite cost, so they + // are never duplicated. + if (const CallInst *CI = dyn_cast<CallInst>(I)) + if (CI->cannotDuplicate() || CI->isConvergent()) + return ~0U; + + if (TTI->getUserCost(&*I, TargetTransformInfo::TCK_SizeAndLatency) + == TargetTransformInfo::TCC_Free) + continue; + // All other instructions count for at least one unit. ++Size; @@ -578,11 +576,7 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, // as having cost of 2 total, and if they are a vector intrinsic, we model // them as having cost 1. 
if (const CallInst *CI = dyn_cast<CallInst>(I)) { - if (CI->cannotDuplicate() || CI->isConvergent()) - // Blocks with NoDuplicate are modelled as having infinite cost, so they - // are never duplicated. - return ~0U; - else if (!isa<IntrinsicInst>(CI)) + if (!isa<IntrinsicInst>(CI)) Size += 3; else if (!CI->getType()->isVectorTy()) Size += 1; @@ -2234,10 +2228,10 @@ bool JumpThreadingPass::maybethreadThroughTwoBasicBlocks(BasicBlock *BB, } // Compute the cost of duplicating BB and PredBB. - unsigned BBCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned BBCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); unsigned PredBBCost = getJumpThreadDuplicationCost( - PredBB, PredBB->getTerminator(), BBDupThreshold); + TTI, PredBB, PredBB->getTerminator(), BBDupThreshold); // Give up if costs are too high. We need to check BBCost and PredBBCost // individually before checking their sum because getJumpThreadDuplicationCost @@ -2345,8 +2339,8 @@ bool JumpThreadingPass::tryThreadEdge( return false; } - unsigned JumpThreadCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned JumpThreadCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); if (JumpThreadCost > BBDupThreshold) { LLVM_DEBUG(dbgs() << " Not threading BB '" << BB->getName() << "' - Cost is too high: " << JumpThreadCost << "\n"); @@ -2614,8 +2608,8 @@ bool JumpThreadingPass::duplicateCondBranchOnPHIIntoPred( return false; } - unsigned DuplicationCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned DuplicationCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); if (DuplicationCost > BBDupThreshold) { LLVM_DEBUG(dbgs() << " Not duplicating BB '" << BB->getName() << "' - Cost is too high: " << DuplicationCost << "\n"); @@ -3031,7 +3025,8 @@ bool JumpThreadingPass::threadGuard(BasicBlock *BB, IntrinsicInst *Guard, 
ValueToValueMapTy UnguardedMapping, GuardedMapping; Instruction *AfterGuard = Guard->getNextNode(); - unsigned Cost = getJumpThreadDuplicationCost(BB, AfterGuard, BBDupThreshold); + unsigned Cost = + getJumpThreadDuplicationCost(TTI, BB, AfterGuard, BBDupThreshold); if (Cost > BBDupThreshold) return false; // Duplicate all instructions before the guard and the guard itself to the diff --git a/llvm/test/Transforms/JumpThreading/free_instructions.ll b/llvm/test/Transforms/JumpThreading/free_instructions.ll index f768ec996779..76392af77d33 100644 --- a/llvm/test/Transforms/JumpThreading/free_instructions.ll +++ b/llvm/test/Transforms/JumpThreading/free_instructions.ll @@ -5,26 +5,28 @@ ; the jump threading threshold, as everything else are free instructions. define i32 @free_instructions(i1 %c, i32* %p) { ; CHECK-LABEL: @free_instructions( -; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]] -; CHECK: if: +; CHECK-NEXT: br i1 [[C:%.*]], label [[IF2:%.*]], label [[ELSE2:%.*]] +; CHECK: if2: ; CHECK-NEXT: store i32 -1, i32* [[P:%.*]], align 4 -; CHECK-NEXT: br label [[JOIN:%.*]] -; CHECK: else: -; CHECK-NEXT: store i32 -2, i32* [[P]], align 4 -; CHECK-NEXT: br label [[JOIN]] -; CHECK: join: ; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl(metadata [[META0:![0-9]+]]) ; CHECK-NEXT: store i32 1, i32* [[P]], align 4, !noalias !0 ; CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ] ; CHECK-NEXT: store i32 2, i32* [[P]], align 4 +; CHECK-NEXT: [[P21:%.*]] = bitcast i32* [[P]] to i8* +; CHECK-NEXT: [[P32:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P21]]) +; CHECK-NEXT: [[P43:%.*]] = bitcast i8* [[P32]] to i32* +; CHECK-NEXT: store i32 3, i32* [[P43]], align 4, !invariant.group !3 +; CHECK-NEXT: ret i32 0 +; CHECK: else2: +; CHECK-NEXT: store i32 -2, i32* [[P]], align 4 +; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl(metadata [[META4:![0-9]+]]) +; CHECK-NEXT: store i32 1, i32* [[P]], align 4, 
!noalias !4 +; CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ] +; CHECK-NEXT: store i32 2, i32* [[P]], align 4 ; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P]] to i8* ; CHECK-NEXT: [[P3:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P2]]) ; CHECK-NEXT: [[P4:%.*]] = bitcast i8* [[P3]] to i32* ; CHECK-NEXT: store i32 3, i32* [[P4]], align 4, !invariant.group !3 -; CHECK-NEXT: br i1 [[C]], label [[IF2:%.*]], label [[ELSE2:%.*]] -; CHECK: if2: -; CHECK-NEXT: ret i32 0 -; CHECK: else2: ; CHECK-NEXT: ret i32 1 ; br i1 %c, label %if, label %else diff --git a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll index 57014e856a09..f764a59dd8a2 100644 --- a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll +++ b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll @@ -32,13 +32,10 @@ define void @caller1(i1 %c, i64* align 1 %ptr) { ; ASSUMPTIONS-OFF-NEXT: br label [[COMMON_RET]] ; ; ASSUMPTIONS-ON-LABEL: @caller1( -; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE1:%.*]] -; ASSUMPTIONS-ON: false1: -; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR:%.*]], align 4 -; ASSUMPTIONS-ON-NEXT: br label [[COMMON_RET]] +; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE2:%.*]] ; ASSUMPTIONS-ON: common.ret: -; ASSUMPTIONS-ON-NEXT: [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE1]] ], [ 2, [[TMP0:%.*]] ] -; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 true) [ "align"(i64* [[PTR]], i64 8) ] +; ASSUMPTIONS-ON-NEXT: [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE2]] ], [ 2, [[TMP0:%.*]] ] +; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 true) [ "align"(i64* [[PTR:%.*]], i64 8) ] ; ASSUMPTIONS-ON-NEXT: store volatile i64 0, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 @@ 
-47,6 +44,9 @@ define void @caller1(i1 %c, i64* align 1 %ptr) { ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 [[DOTSINK]], i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: ret void +; ASSUMPTIONS-ON: false2: +; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR]], align 4 +; ASSUMPTIONS-ON-NEXT: br label [[COMMON_RET]] ; br i1 %c, label %true1, label %false1 </cut>
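The cost logic changed by this commit can be summarized with a simplified model. The C++ sketch below is an approximation, not the pass itself:
- `Instr` and its flags are hypothetical stand-ins for `llvm::Instruction` queries.
- `is_free` stands in for the new `TTI->getUserCost(&*I, TargetTransformInfo::TCK_SizeAndLatency) == TargetTransformInfo::TCC_Free` check.
- The token-type check and the rest of the real function are omitted.

It follows the post-patch order in `getJumpThreadDuplicationCost()`:
- A noduplicate or convergent call still makes duplication impossible (`~0U`).
- Free instructions are skipped.
- Every remaining instruction costs one unit, and calls add a surcharge. An ordinary call costs 4 units in total, a scalar intrinsic 2, and a vector intrinsic 1.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical, simplified stand-in for the llvm::Instruction properties
// that the duplication-cost scan looks at.
struct Instr {
  bool is_free = false;           // models TTI getUserCost(...) == TCC_Free
  bool cannot_duplicate = false;  // models CI->cannotDuplicate() || CI->isConvergent()
  bool is_call = false;
  bool is_intrinsic = false;
  bool returns_vector = false;
};

// Simplified model of the post-patch getJumpThreadDuplicationCost() loop.
// Returns UINT_MAX (the pass uses ~0U) when duplication is impossible.
unsigned duplication_cost(const std::vector<Instr>& block, unsigned threshold) {
  unsigned size = 0;
  for (const Instr& inst : block) {
    // Stop scanning once the threshold has been exceeded.
    if (size > threshold)
      return size;
    // Non-duplicable calls are rejected before the free-instruction test.
    if (inst.is_call && inst.cannot_duplicate)
      return UINT_MAX;
    // Free instructions no longer count toward the threading decision.
    if (inst.is_free)
      continue;
    // Every other instruction costs at least one unit.
    ++size;
    if (inst.is_call) {
      if (!inst.is_intrinsic)
        size += 3;  // ordinary call: 4 units in total
      else if (!inst.returns_vector)
        size += 1;  // scalar intrinsic: 2 units in total
    }
  }
  return size;
}
```

In this model, take a block holding an intrinsic call such as `llvm.assume` plus one store. It costs 2 + 1 = 3 units while the intrinsic is counted, and 1 unit once TTI reports the intrinsic as free. A lower cost lets more blocks fit under `BBDupThreshold`. This is consistent with the updated `free_instructions.ll` checks, where the join block is now duplicated into both predecessors.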

4 years, 9 months

[ACTIVITY] report week ending 24 Sep

by Peter Maydell

Progress
* UM-2 [QEMU upstream maintainership]
 + Still looking at the mess that is non-unique bus names. Worked through exactly which devices and machine types are affected for the i2c bus.
 + Sent a patchset which tries to make the "create a bus" function names a bit more regular across different bus types.
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
 + Luis figured out why GDB was crashing when fed the MVE XML by QEMU's gdbstub; this was a combination of QEMU giving GDB some non-standard extra registers in its "vfp" XML feature and GDB not being robust enough against those unexpected extras. Sent out a patchset which cleans up QEMU's XML in this area and also implements the extra XML for MVE. (This will only go into QEMU once the GDB patches have landed and the XML format is nailed down.)

-- PMM

4 years, 9 months

Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

by Maxim Kuvyrkov

Hi Stanislav,

FYI, your patch seems to be slowing down two SPEC CPU2006 tests on 32-bit ARM at the -O2 and -O3 optimization levels.

--
Maxim Kuvyrkov
https://www.linaro.org

> On 15 Sep 2021, at 12:54, ci_notify(a)linaro.org wrote:
>
> After llvm commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
> Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>
>
> Allow rematerialization of virtual reg uses
>
> the following benchmarks slowed down by more than 2%:
> - 456.hmmer slowed down by 6%
> - 482.sphinx3 slowed down by 3%
>
> Benchmark:
> Toolchain: Clang + Glibc + LLVM Linker
> Version: all components were built from their tip of trunk
> Target: arm-linux-gnueabihf
> Compiler flags: -O3 -marm
> Hardware: NVidia TK1 4x Cortex-A15
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2
> - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3
>
> First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
> Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
> Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
> Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
>
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
> cd investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>
> cd llvm
>
> # Reproduce first_bad build
> git checkout --detach 92c1fd19abb15bc68b1127a26137a69e033cdb39
> ../artifacts/test.sh
>
> # Reproduce last_good build
> git checkout --detach 1d02a8bcd393ea9c50f0212797059888efc78002
> ../artifacts/test.sh
>
> cd ..
> </cut>
>
> Full commit (up to 1000 lines):
> <cut>
> commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
> Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>
> Date: Thu Aug 19 11:42:09 2021 -0700
>
> Allow rematerialization of virtual reg uses
>
> Currently isReallyTriviallyReMaterializableGeneric() implementation
> prevents rematerialization on any virtual register use on the grounds
> that is not a trivial rematerialization and that we do not want to
> extend liveranges.
>
> It appears that LRE logic does not attempt to extend a liverange of
> a source register for rematerialization so that is not an issue.
> That is checked in the LiveRangeEdit::allUsesAvailableAt().
>
> The only non-trivial aspect of it is accounting for tied-defs which
> normally represent a read-modify-write operation and not rematerializable.
>
> The test for a tied-def situation already exists in the
> /CodeGen/AMDGPU/remat-vop.mir,
> test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
>
> The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
> where I more or less understand the asm it seems to reduce spilling
> (as expected) or be neutral. However, it needs a review by all targets'
> specialists.
> > Differential Revision: https://reviews.llvm.org/D106408 > --- > llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +- > llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +- > llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 + > llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +- > llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +- > llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +- > .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +- > llvm/test/CodeGen/ARM/neon-copy.ll | 10 +- > llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +- > llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +- > llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +- > llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +- > llvm/test/CodeGen/Mips/tls.ll | 4 +- > llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +- > llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +- > llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +- > llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +- > llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 526 +-- > llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +- > llvm/test/CodeGen/RISCV/rv32zbp.ll | 282 +- > llvm/test/CodeGen/RISCV/rv32zbt.ll | 348 +- > .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 324 +- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3540 ++++++++++---------- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 720 ++-- > llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +- > llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +- > llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +- > .../tail-pred-disabled-in-loloops.ll | 14 +- > .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +- > .../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +- > llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +- > llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +- > llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +- > llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 ++- > llvm/test/CodeGen/X86/addcarry.ll | 20 +- > 
llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +- > llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +- > llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +- > llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +- > llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +- > llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +- > 42 files changed, 4217 insertions(+), 4202 deletions(-) > > diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h > index 2f853a2c6f9f..1c05afba730d 100644 > --- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h > +++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h > @@ -117,10 +117,11 @@ public: > const MachineFunction &MF) const; > > /// Return true if the instruction is trivially rematerializable, meaning it > - /// has no side effects and requires no operands that aren't always available. > - /// This means the only allowed uses are constants and unallocatable physical > - /// registers so that the instructions result is independent of the place > - /// in the function. > + /// has no side effects. Uses of constants and unallocatable physical > + /// registers are always trivial to rematerialize so that the instructions > + /// result is independent of the place in the function. Uses of virtual > + /// registers are allowed but it is caller's responsility to ensure these > + /// operands are valid at the point the instruction is beeing moved. > bool isTriviallyReMaterializable(const MachineInstr &MI, > AAResults *AA = nullptr) const { > return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF || > @@ -140,8 +141,7 @@ protected: > /// set, this hook lets the target specify whether the instruction is actually > /// trivially rematerializable, taking into consideration its operands. This > /// predicate must return false if the instruction has any side effects other > - /// than producing a value, or if it requres any address registers that are > - /// not always available. > + /// than producing a value. 
> /// Requirements must be check as stated in isTriviallyReMaterializable() . > virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI, > AAResults *AA) const { > diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp > index 1eab8e7443a7..fe7d60e0b7e2 100644 > --- a/llvm/lib/CodeGen/TargetInstrInfo.cpp > +++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp > @@ -921,7 +921,8 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( > const MachineRegisterInfo &MRI = MF.getRegInfo(); > > // Remat clients assume operand 0 is the defined register. > - if (!MI.getNumOperands() || !MI.getOperand(0).isReg()) > + if (!MI.getNumOperands() || !MI.getOperand(0).isReg() || > + MI.getOperand(0).isTied()) > return false; > Register DefReg = MI.getOperand(0).getReg(); > > @@ -983,12 +984,6 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( > // same virtual register, though. > if (MO.isDef() && Reg != DefReg) > return false; > - > - // Don't allow any virtual-register uses. Rematting an instruction with > - // virtual register uses would length the live ranges of the uses, which > - // is not necessarily a good idea, certainly not "trivial". > - if (MO.isUse()) > - return false; > } > > // Everything checked out. > diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir > index ed799bfca028..c9915aaabfde 100644 > --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir > +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir > @@ -51,6 +51,66 @@ body: | > S_NOP 0, implicit %2 > S_ENDPGM 0 > ... > +# The liverange of %0 covers a point of rematerialization, source value is > +# availabe. 
> +--- > +name: test_remat_s_mov_b32_vreg_src_long_lr > +tracksRegLiveness: true > +machineFunctionInfo: > + stackPtrOffsetReg: $sgpr32 > +body: | > + bb.0: > + ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr > + ; GCN: renamable $sgpr0 = IMPLICIT_DEF > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr0 > + ; GCN: S_ENDPGM 0 > + %0:sreg_32 = IMPLICIT_DEF > + %1:sreg_32 = S_MOV_B32 %0:sreg_32 > + %2:sreg_32 = S_MOV_B32 %0:sreg_32 > + %3:sreg_32 = S_MOV_B32 %0:sreg_32 > + S_NOP 0, implicit %1 > + S_NOP 0, implicit %2 > + S_NOP 0, implicit %3 > + S_NOP 0, implicit %0 > + S_ENDPGM 0 > +... > +# The liverange of %0 does not cover a point of rematerialization, source value is > +# unavailabe and we do not want to artificially extend the liverange. 
> +--- > +name: test_no_remat_s_mov_b32_vreg_src_short_lr > +tracksRegLiveness: true > +machineFunctionInfo: > + stackPtrOffsetReg: $sgpr32 > +body: | > + bb.0: > + ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr > + ; GCN: renamable $sgpr0 = IMPLICIT_DEF > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5) > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5) > + ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0 > + ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5) > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5) > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr0 > + ; GCN: S_ENDPGM 0 > + %0:sreg_32 = IMPLICIT_DEF > + %1:sreg_32 = S_MOV_B32 %0:sreg_32 > + %2:sreg_32 = S_MOV_B32 %0:sreg_32 > + %3:sreg_32 = S_MOV_B32 %0:sreg_32 > + S_NOP 0, implicit %1 > + S_NOP 0, implicit %2 > + S_NOP 0, implicit %3 > + S_ENDPGM 0 > +... 
> --- > name: test_remat_s_mov_b64 > tracksRegLiveness: true > diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > index a4243276c70a..175a2069a441 100644 > --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon > ; ENABLE-NEXT: pophs {r11, pc} > ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader > ; ENABLE-NEXT: movw r12, :lower16:skip > -; ENABLE-NEXT: sub r1, r1, #1 > +; ENABLE-NEXT: sub r3, r1, #1 > ; ENABLE-NEXT: movt r12, :upper16:skip > ; ENABLE-NEXT: .LBB0_4: @ %while.body > ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 > -; ENABLE-NEXT: ldrb r3, [r0] > -; ENABLE-NEXT: ldrb r3, [r12, r3] > -; ENABLE-NEXT: add r0, r0, r3 > -; ENABLE-NEXT: sub r3, r1, #1 > -; ENABLE-NEXT: cmp r3, r1 > +; ENABLE-NEXT: ldrb r1, [r0] > +; ENABLE-NEXT: ldrb r1, [r12, r1] > +; ENABLE-NEXT: add r0, r0, r1 > +; ENABLE-NEXT: sub r1, r3, #1 > +; ENABLE-NEXT: cmp r1, r3 > ; ENABLE-NEXT: bhs .LBB0_6 > ; ENABLE-NEXT: @ %bb.5: @ %while.body > ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 > ; ENABLE-NEXT: cmp r0, r2 > -; ENABLE-NEXT: mov r1, r3 > +; ENABLE-NEXT: mov r3, r1 > ; ENABLE-NEXT: blo .LBB0_4 > ; ENABLE-NEXT: .LBB0_6: @ %if.end29 > ; ENABLE-NEXT: pop {r11, pc} > @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon > ; DISABLE-NEXT: pophs {r11, pc} > ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader > ; DISABLE-NEXT: movw r12, :lower16:skip > -; DISABLE-NEXT: sub r1, r1, #1 > +; DISABLE-NEXT: sub r3, r1, #1 > ; DISABLE-NEXT: movt r12, :upper16:skip > ; DISABLE-NEXT: .LBB0_4: @ %while.body > ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 > -; DISABLE-NEXT: ldrb r3, [r0] > -; DISABLE-NEXT: ldrb r3, [r12, r3] > -; DISABLE-NEXT: add r0, r0, r3 > -; DISABLE-NEXT: sub r3, r1, #1 > -; DISABLE-NEXT: cmp r3, 
r1 > +; DISABLE-NEXT: ldrb r1, [r0] > +; DISABLE-NEXT: ldrb r1, [r12, r1] > +; DISABLE-NEXT: add r0, r0, r1 > +; DISABLE-NEXT: sub r1, r3, #1 > +; DISABLE-NEXT: cmp r1, r3 > ; DISABLE-NEXT: bhs .LBB0_6 > ; DISABLE-NEXT: @ %bb.5: @ %while.body > ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 > ; DISABLE-NEXT: cmp r0, r2 > -; DISABLE-NEXT: mov r1, r3 > +; DISABLE-NEXT: mov r3, r1 > ; DISABLE-NEXT: blo .LBB0_4 > ; DISABLE-NEXT: .LBB0_6: @ %if.end29 > ; DISABLE-NEXT: pop {r11, pc} > diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > index 55157875d355..ea15fcc5c824 100644 > --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { > ; SCALAR-NEXT: push {r4, r5, r11, lr} > ; SCALAR-NEXT: rsb r3, r2, #0 > ; SCALAR-NEXT: and r4, r2, #63 > -; SCALAR-NEXT: and lr, r3, #63 > -; SCALAR-NEXT: rsb r3, lr, #32 > +; SCALAR-NEXT: and r12, r3, #63 > +; SCALAR-NEXT: rsb r3, r12, #32 > ; SCALAR-NEXT: lsl r2, r0, r4 > -; SCALAR-NEXT: lsr r12, r0, lr > -; SCALAR-NEXT: orr r3, r12, r1, lsl r3 > -; SCALAR-NEXT: subs r12, lr, #32 > -; SCALAR-NEXT: lsrpl r3, r1, r12 > +; SCALAR-NEXT: lsr lr, r0, r12 > +; SCALAR-NEXT: orr r3, lr, r1, lsl r3 > +; SCALAR-NEXT: subs lr, r12, #32 > +; SCALAR-NEXT: lsrpl r3, r1, lr > ; SCALAR-NEXT: subs r5, r4, #32 > ; SCALAR-NEXT: movwpl r2, #0 > ; SCALAR-NEXT: cmp r5, #0 > @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { > ; SCALAR-NEXT: lsr r3, r0, r3 > ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 > ; SCALAR-NEXT: lslpl r3, r0, r5 > -; SCALAR-NEXT: lsr r0, r1, lr > -; SCALAR-NEXT: cmp r12, #0 > +; SCALAR-NEXT: lsr r0, r1, r12 > +; SCALAR-NEXT: cmp lr, #0 > ; SCALAR-NEXT: movwpl r0, #0 > ; SCALAR-NEXT: orr r1, r3, r0 > ; SCALAR-NEXT: mov r0, r2 > @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { > ; CHECK: @ %bb.0: > ; CHECK-NEXT: .save {r4, r5, r11, lr} > ; CHECK-NEXT: push {r4, r5, r11, lr} > -; 
CHECK-NEXT: and lr, r2, #63 > +; CHECK-NEXT: and r12, r2, #63 > ; CHECK-NEXT: rsb r2, r2, #0 > -; CHECK-NEXT: rsb r3, lr, #32 > +; CHECK-NEXT: rsb r3, r12, #32 > ; CHECK-NEXT: and r4, r2, #63 > -; CHECK-NEXT: lsr r12, r0, lr > -; CHECK-NEXT: orr r3, r12, r1, lsl r3 > -; CHECK-NEXT: subs r12, lr, #32 > +; CHECK-NEXT: lsr lr, r0, r12 > +; CHECK-NEXT: orr r3, lr, r1, lsl r3 > +; CHECK-NEXT: subs lr, r12, #32 > ; CHECK-NEXT: lsl r2, r0, r4 > -; CHECK-NEXT: lsrpl r3, r1, r12 > +; CHECK-NEXT: lsrpl r3, r1, lr > ; CHECK-NEXT: subs r5, r4, #32 > ; CHECK-NEXT: movwpl r2, #0 > ; CHECK-NEXT: cmp r5, #0 > @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { > ; CHECK-NEXT: lsr r3, r0, r3 > ; CHECK-NEXT: orr r3, r3, r1, lsl r4 > ; CHECK-NEXT: lslpl r3, r0, r5 > -; CHECK-NEXT: lsr r0, r1, lr > -; CHECK-NEXT: cmp r12, #0 > +; CHECK-NEXT: lsr r0, r1, r12 > +; CHECK-NEXT: cmp lr, #0 > ; CHECK-NEXT: movwpl r0, #0 > ; CHECK-NEXT: orr r1, r0, r3 > ; CHECK-NEXT: mov r0, r2 > diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll > index 54c93b493c98..6372f9be2ca3 100644 > --- a/llvm/test/CodeGen/ARM/funnel-shift.ll > +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll > @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { > ; CHECK-NEXT: mov r3, #0 > ; CHECK-NEXT: bl __aeabi_uldivmod > ; CHECK-NEXT: add r0, r2, #27 > -; CHECK-NEXT: lsl r6, r6, #27 > -; CHECK-NEXT: and r1, r0, #63 > ; CHECK-NEXT: lsl r2, r7, #27 > +; CHECK-NEXT: and r12, r0, #63 > +; CHECK-NEXT: lsl r6, r6, #27 > ; CHECK-NEXT: orr r7, r6, r7, lsr #5 > +; CHECK-NEXT: rsb r3, r12, #32 > +; CHECK-NEXT: lsr r2, r2, r12 > ; CHECK-NEXT: mov r6, #63 > -; CHECK-NEXT: rsb r3, r1, #32 > -; CHECK-NEXT: lsr r2, r2, r1 > -; CHECK-NEXT: subs r12, r1, #32 > -; CHECK-NEXT: bic r6, r6, r0 > ; CHECK-NEXT: orr r2, r2, r7, lsl r3 > +; CHECK-NEXT: subs r3, r12, #32 > +; CHECK-NEXT: bic r6, r6, r0 > ; CHECK-NEXT: lsl r5, r9, #1 > -; CHECK-NEXT: lsrpl r2, r7, r12 > +; CHECK-NEXT: lsrpl r2, r7, 
r3 > +; CHECK-NEXT: subs r1, r6, #32 > ; CHECK-NEXT: lsl r0, r5, r6 > -; CHECK-NEXT: subs r4, r6, #32 > -; CHECK-NEXT: lsl r3, r8, #1 > +; CHECK-NEXT: lsl r4, r8, #1 > ; CHECK-NEXT: movwpl r0, #0 > -; CHECK-NEXT: orr r3, r3, r9, lsr #31 > +; CHECK-NEXT: orr r4, r4, r9, lsr #31 > ; CHECK-NEXT: orr r0, r0, r2 > ; CHECK-NEXT: rsb r2, r6, #32 > -; CHECK-NEXT: cmp r4, #0 > -; CHECK-NEXT: lsr r1, r7, r1 > +; CHECK-NEXT: cmp r1, #0 > ; CHECK-NEXT: lsr r2, r5, r2 > -; CHECK-NEXT: orr r2, r2, r3, lsl r6 > -; CHECK-NEXT: lslpl r2, r5, r4 > -; CHECK-NEXT: cmp r12, #0 > +; CHECK-NEXT: orr r2, r2, r4, lsl r6 > +; CHECK-NEXT: lslpl r2, r5, r1 > +; CHECK-NEXT: lsr r1, r7, r12 > +; CHECK-NEXT: cmp r3, #0 > ; CHECK-NEXT: movwpl r1, #0 > ; CHECK-NEXT: orr r1, r2, r1 > ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} > diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > index 2922e0ed5423..0a0bb62b0a09 100644 > --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { > ; BE-LABEL: i56_or: > ; BE: @ %bb.0: > ; BE-NEXT: mov r1, r0 > -; BE-NEXT: ldr r12, [r0] > ; BE-NEXT: ldrh r2, [r1, #4]! 
> ; BE-NEXT: ldrb r3, [r1, #2] > ; BE-NEXT: orr r2, r3, r2, lsl #8 > -; BE-NEXT: orr r2, r2, r12, lsl #24 > -; BE-NEXT: orr r2, r2, #384 > -; BE-NEXT: strb r2, [r1, #2] > -; BE-NEXT: lsr r3, r2, #8 > -; BE-NEXT: strh r3, [r1] > -; BE-NEXT: bic r1, r12, #255 > -; BE-NEXT: orr r1, r1, r2, lsr #24 > +; BE-NEXT: ldr r3, [r0] > +; BE-NEXT: orr r2, r2, r3, lsl #24 > +; BE-NEXT: orr r12, r2, #384 > +; BE-NEXT: strb r12, [r1, #2] > +; BE-NEXT: lsr r2, r12, #8 > +; BE-NEXT: strh r2, [r1] > +; BE-NEXT: bic r1, r3, #255 > +; BE-NEXT: orr r1, r1, r12, lsr #24 > ; BE-NEXT: str r1, [r0] > ; BE-NEXT: mov pc, lr > %aa = load i56, i56* %a > @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { > ; BE-NEXT: ldrb r3, [r1, #2] > ; BE-NEXT: strb r2, [r1, #2] > ; BE-NEXT: orr r2, r3, r12, lsl #8 > -; BE-NEXT: ldr r12, [r0] > -; BE-NEXT: orr r2, r2, r12, lsl #24 > -; BE-NEXT: orr r2, r2, #384 > -; BE-NEXT: lsr r3, r2, #8 > -; BE-NEXT: strh r3, [r1] > -; BE-NEXT: bic r1, r12, #255 > -; BE-NEXT: orr r1, r1, r2, lsr #24 > +; BE-NEXT: ldr r3, [r0] > +; BE-NEXT: orr r2, r2, r3, lsl #24 > +; BE-NEXT: orr r12, r2, #384 > +; BE-NEXT: lsr r2, r12, #8 > +; BE-NEXT: strh r2, [r1] > +; BE-NEXT: bic r1, r3, #255 > +; BE-NEXT: orr r1, r1, r12, lsr #24 > ; BE-NEXT: str r1, [r0] > ; BE-NEXT: mov pc, lr > > diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll > index 09a991da2e59..46490efb6631 100644 > --- a/llvm/test/CodeGen/ARM/neon-copy.ll > +++ b/llvm/test/CodeGen/ARM/neon-copy.ll > @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { > ; CHECK-NEXT: .pad #8 > ; CHECK-NEXT: sub sp, sp, #8 > ; CHECK-NEXT: vmov.u16 r1, d0[1] > -; CHECK-NEXT: and r0, r0, #3 > +; CHECK-NEXT: and r12, r0, #3 > ; CHECK-NEXT: vmov.u16 r2, d0[2] > -; CHECK-NEXT: mov r3, sp > -; CHECK-NEXT: vmov.u16 r12, d0[3] > -; CHECK-NEXT: orr r0, r3, r0, lsl #1 > +; CHECK-NEXT: mov r0, sp > +; CHECK-NEXT: vmov.u16 r3, d0[3] > +; CHECK-NEXT: orr r0, r0, 
r12, lsl #1 > ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] > ; CHECK-NEXT: vldr d0, [sp] > ; CHECK-NEXT: vmov.16 d0[1], r1 > ; CHECK-NEXT: vmov.16 d0[2], r2 > -; CHECK-NEXT: vmov.16 d0[3], r12 > +; CHECK-NEXT: vmov.16 d0[3], r3 > ; CHECK-NEXT: add sp, sp, #8 > ; CHECK-NEXT: bx lr > %tmp = extractelement <8 x i16> %x, i32 0 > diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > index 8be7100d368b..a125446b27c3 100644 > --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > @@ -766,79 +766,85 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { > ; MMR3-NEXT: .cfi_offset 17, -4 > ; MMR3-NEXT: .cfi_offset 16, -8 > ; MMR3-NEXT: move $8, $7 > -; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $2, $6 > +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill > ; MMR3-NEXT: lw $16, 76($sp) > -; MMR3-NEXT: srlv $4, $7, $16 > -; MMR3-NEXT: not16 $3, $16 > -; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sll16 $2, $6, 1 > -; MMR3-NEXT: sllv $3, $2, $3 > -; MMR3-NEXT: li16 $2, 64 > -; MMR3-NEXT: or16 $3, $4 > -; MMR3-NEXT: srlv $6, $6, $16 > -; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill > -; MMR3-NEXT: subu16 $7, $2, $16 > +; MMR3-NEXT: srlv $3, $7, $16 > +; MMR3-NEXT: not16 $6, $16 > +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $4, $2 > +; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sll16 $2, $2, 1 > +; MMR3-NEXT: sllv $2, $2, $6 > +; MMR3-NEXT: li16 $6, 64 > +; MMR3-NEXT: or16 $2, $3 > +; MMR3-NEXT: srlv $4, $4, $16 > +; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill > +; MMR3-NEXT: subu16 $7, $6, $16 > ; MMR3-NEXT: sllv $9, $5, $7 > -; MMR3-NEXT: andi16 $2, $7, 32 > -; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill > -; MMR3-NEXT: andi16 $5, 
$16, 32 > -; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill > -; MMR3-NEXT: move $4, $9 > +; MMR3-NEXT: andi16 $5, $7, 32 > +; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill > +; MMR3-NEXT: andi16 $6, $16, 32 > +; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $3, $9 > ; MMR3-NEXT: li16 $17, 0 > -; MMR3-NEXT: movn $4, $17, $2 > -; MMR3-NEXT: movn $3, $6, $5 > -; MMR3-NEXT: addiu $2, $16, -64 > -; MMR3-NEXT: lw $5, 36($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srlv $5, $5, $2 > -; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill > -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sll16 $6, $17, 1 > -; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill > -; MMR3-NEXT: not16 $5, $2 > -; MMR3-NEXT: sllv $5, $6, $5 > -; MMR3-NEXT: or16 $3, $4 > -; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload > -; MMR3-NEXT: or16 $5, $4 > -; MMR3-NEXT: srav $1, $17, $2 > -; MMR3-NEXT: andi16 $2, $2, 32 > -; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $5, $1, $2 > -; MMR3-NEXT: sllv $2, $17, $7 > -; MMR3-NEXT: not16 $4, $7 > -; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srl16 $6, $7, 1 > -; MMR3-NEXT: srlv $6, $6, $4 > +; MMR3-NEXT: movn $3, $17, $5 > +; MMR3-NEXT: movn $2, $4, $6 > +; MMR3-NEXT: addiu $4, $16, -64 > +; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload > +; MMR3-NEXT: srlv $4, $17, $4 > +; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill > +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sll16 $4, $6, 1 > +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: addiu $5, $16, -64 > +; MMR3-NEXT: not16 $5, $5 > +; MMR3-NEXT: sllv $5, $4, $5 > +; MMR3-NEXT: or16 $2, $3 > +; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload > +; MMR3-NEXT: or16 $5, $3 > +; MMR3-NEXT: addiu $3, $16, -64 > +; MMR3-NEXT: srav $1, $6, $3 > +; MMR3-NEXT: andi16 $3, $3, 32 > +; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill > +; MMR3-NEXT: movn $5, $1, $3 > +; MMR3-NEXT: sllv $3, $6, 
$7 > +; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill > +; MMR3-NEXT: not16 $3, $7 > +; MMR3-NEXT: srl16 $4, $17, 1 > +; MMR3-NEXT: srlv $3, $4, $3 > ; MMR3-NEXT: sltiu $10, $16, 64 > -; MMR3-NEXT: movn $5, $3, $10 > -; MMR3-NEXT: or16 $6, $2 > -; MMR3-NEXT: srlv $2, $7, $16 > -; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload > -; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sllv $3, $4, $3 > +; MMR3-NEXT: movn $5, $2, $10 > +; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload > ; MMR3-NEXT: or16 $3, $2 > -; MMR3-NEXT: srav $11, $17, $16 > -; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $3, $11, $4 > -; MMR3-NEXT: sra $2, $17, 31 > +; MMR3-NEXT: srlv $2, $17, $16 > +; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload > +; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sllv $17, $7, $4 > +; MMR3-NEXT: or16 $17, $2 > +; MMR3-NEXT: srav $11, $6, $16 > +; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $17, $11, $2 > +; MMR3-NEXT: sra $2, $6, 31 > ; MMR3-NEXT: movz $5, $8, $16 > -; MMR3-NEXT: move $8, $2 > -; MMR3-NEXT: movn $8, $3, $10 > -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $6, $9, $3 > -; MMR3-NEXT: li16 $3, 0 > -; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $7, $3, $4 > -; MMR3-NEXT: or16 $7, $6 > +; MMR3-NEXT: move $4, $2 > +; MMR3-NEXT: movn $4, $17, $10 > +; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $3, $9, $6 > +; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $7, $17, $6 > +; MMR3-NEXT: or16 $7, $3 > ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movn $1, $2, $3 > ; MMR3-NEXT: movn $1, $7, $10 > ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movz $1, $3, $16 > -; MMR3-NEXT: movn $11, $2, $4 > +; MMR3-NEXT: movn $11, $2, $6 > ; MMR3-NEXT: movn $2, $11, 
$10 > -; MMR3-NEXT: move $3, $8 > +; MMR3-NEXT: move $3, $4 > ; MMR3-NEXT: move $4, $1 > ; MMR3-NEXT: lwp $16, 40($sp) > ; MMR3-NEXT: addiusp 48 > @@ -852,79 +858,80 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { > ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill > ; MMR6-NEXT: .cfi_offset 17, -4 > ; MMR6-NEXT: .cfi_offset 16, -8 > -; MMR6-NEXT: move $1, $7 > +; MMR6-NEXT: move $12, $7 > ; MMR6-NEXT: lw $3, 44($sp) > ; MMR6-NEXT: li16 $2, 64 > -; MMR6-NEXT: subu16 $7, $2, $3 > -; MMR6-NEXT: sllv $8, $5, $7 > -; MMR6-NEXT: andi16 $2, $7, 32 > -; MMR6-NEXT: selnez $9, $8, $2 > -; MMR6-NEXT: sllv $10, $4, $7 > -; MMR6-NEXT: not16 $7, $7 > -; MMR6-NEXT: srl16 $16, $5, 1 > -; MMR6-NEXT: srlv $7, $16, $7 > -; MMR6-NEXT: or $7, $10, $7 > -; MMR6-NEXT: seleqz $7, $7, $2 > -; MMR6-NEXT: or $7, $9, $7 > -; MMR6-NEXT: srlv $9, $1, $3 > -; MMR6-NEXT: not16 $16, $3 > -; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: subu16 $16, $2, $3 > +; MMR6-NEXT: sllv $1, $5, $16 > +; MMR6-NEXT: andi16 $2, $16, 32 > +; MMR6-NEXT: selnez $8, $1, $2 > +; MMR6-NEXT: sllv $9, $4, $16 > +; MMR6-NEXT: not16 $16, $16 > +; MMR6-NEXT: srl16 $17, $5, 1 > +; MMR6-NEXT: srlv $10, $17, $16 > +; MMR6-NEXT: or $9, $9, $10 > +; MMR6-NEXT: seleqz $9, $9, $2 > +; MMR6-NEXT: or $8, $8, $9 > +; MMR6-NEXT: srlv $9, $7, $3 > +; MMR6-NEXT: not16 $7, $3 > +; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill > ; MMR6-NEXT: sll16 $17, $6, 1 > -; MMR6-NEXT: sllv $10, $17, $16 > +; MMR6-NEXT: sllv $10, $17, $7 > ; MMR6-NEXT: or $9, $10, $9 > ; MMR6-NEXT: andi16 $17, $3, 32 > ; MMR6-NEXT: seleqz $9, $9, $17 > ; MMR6-NEXT: srlv $10, $6, $3 > ; MMR6-NEXT: selnez $11, $10, $17 > ; MMR6-NEXT: seleqz $10, $10, $17 > -; MMR6-NEXT: or $10, $10, $7 > -; MMR6-NEXT: seleqz $12, $8, $2 > -; MMR6-NEXT: or $8, $11, $9 > +; MMR6-NEXT: or $8, $10, $8 > +; MMR6-NEXT: seleqz $1, $1, $2 > +; MMR6-NEXT: or $9, $11, $9 > ; MMR6-NEXT: addiu $2, $3, -64 > -; MMR6-NEXT: srlv $9, $5, $2 > +; MMR6-NEXT: 
srlv $10, $5, $2 > ; MMR6-NEXT: sll16 $7, $4, 1 > ; MMR6-NEXT: not16 $16, $2 > ; MMR6-NEXT: sllv $11, $7, $16 > ; MMR6-NEXT: sltiu $13, $3, 64 > -; MMR6-NEXT: or $8, $8, $12 > -; MMR6-NEXT: selnez $10, $10, $13 > -; MMR6-NEXT: or $9, $11, $9 > -; MMR6-NEXT: srav $11, $4, $2 > +; MMR6-NEXT: or $1, $9, $1 > +; MMR6-NEXT: selnez $8, $8, $13 > +; MMR6-NEXT: or $9, $11, $10 > +; MMR6-NEXT: srav $10, $4, $2 > ; MMR6-NEXT: andi16 $2, $2, 32 > -; MMR6-NEXT: seleqz $12, $11, $2 > +; MMR6-NEXT: seleqz $11, $10, $2 > ; MMR6-NEXT: sra $14, $4, 31 > ; MMR6-NEXT: selnez $15, $14, $2 > ; MMR6-NEXT: seleqz $9, $9, $2 > -; MMR6-NEXT: or $12, $15, $12 > -; MMR6-NEXT: seleqz $12, $12, $13 > -; MMR6-NEXT: selnez $2, $11, $2 > -; MMR6-NEXT: seleqz $11, $14, $13 > -; MMR6-NEXT: or $10, $10, $12 > -; MMR6-NEXT: selnez $10, $10, $3 > -; MMR6-NEXT: selnez $8, $8, $13 > +; MMR6-NEXT: or $11, $15, $11 > +; MMR6-NEXT: seleqz $11, $11, $13 > +; MMR6-NEXT: selnez $2, $10, $2 > +; MMR6-NEXT: seleqz $10, $14, $13 > +; MMR6-NEXT: or $8, $8, $11 > +; MMR6-NEXT: selnez $8, $8, $3 > +; MMR6-NEXT: selnez $1, $1, $13 > ; MMR6-NEXT: or $2, $2, $9 > ; MMR6-NEXT: srav $9, $4, $3 > ; MMR6-NEXT: seleqz $4, $9, $17 > -; MMR6-NEXT: selnez $12, $14, $17 > -; MMR6-NEXT: or $4, $12, $4 > -; MMR6-NEXT: selnez $12, $4, $13 > +; MMR6-NEXT: selnez $11, $14, $17 > +; MMR6-NEXT: or $4, $11, $4 > +; MMR6-NEXT: selnez $11, $4, $13 > ; MMR6-NEXT: seleqz $2, $2, $13 > ; MMR6-NEXT: seleqz $4, $6, $3 > -; MMR6-NEXT: seleqz $1, $1, $3 > -; MMR6-NEXT: or $2, $8, $2 > -; MMR6-NEXT: selnez $2, $2, $3 > +; MMR6-NEXT: seleqz $6, $12, $3 > ; MMR6-NEXT: or $1, $1, $2 > -; MMR6-NEXT: or $4, $4, $10 > -; MMR6-NEXT: or $2, $12, $11 > -; MMR6-NEXT: srlv $3, $5, $3 > -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload > -; MMR6-NEXT: sllv $5, $7, $5 > -; MMR6-NEXT: or $3, $5, $3 > -; MMR6-NEXT: seleqz $3, $3, $17 > -; MMR6-NEXT: selnez $5, $9, $17 > -; MMR6-NEXT: or $3, $5, $3 > -; MMR6-NEXT: selnez $3, $3, $13 > -; MMR6-NEXT: or $3, $3, 
$11 > +; MMR6-NEXT: selnez $1, $1, $3 > +; MMR6-NEXT: or $1, $6, $1 > +; MMR6-NEXT: or $4, $4, $8 > +; MMR6-NEXT: or $6, $11, $10 > +; MMR6-NEXT: srlv $2, $5, $3 > +; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload > +; MMR6-NEXT: sllv $3, $7, $3 > +; MMR6-NEXT: or $2, $3, $2 > +; MMR6-NEXT: seleqz $2, $2, $17 > +; MMR6-NEXT: selnez $3, $9, $17 > +; MMR6-NEXT: or $2, $3, $2 > +; MMR6-NEXT: selnez $2, $2, $13 > +; MMR6-NEXT: or $3, $2, $10 > +; MMR6-NEXT: move $2, $6 > ; MMR6-NEXT: move $5, $1 > ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload > ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload > diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > index ed2bfc9fcf60..e4b4b3ae1d0f 100644 > --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > @@ -776,76 +776,77 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { > ; MMR3-NEXT: .cfi_offset 17, -4 > ; MMR3-NEXT: .cfi_offset 16, -8 > ; MMR3-NEXT: move $8, $7 > -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill > ; MMR3-NEXT: lw $16, 68($sp) > ; MMR3-NEXT: li16 $2, 64 > -; MMR3-NEXT: subu16 $7, $2, $16 > -; MMR3-NEXT: sllv $9, $5, $7 > -; MMR3-NEXT: move $17, $5 > -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill > -; MMR3-NEXT: andi16 $3, $7, 32 > +; MMR3-NEXT: subu16 $17, $2, $16 > +; MMR3-NEXT: sllv $9, $5, $17 > +; MMR3-NEXT: andi16 $3, $17, 32 > ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill > ; MMR3-NEXT: li16 $2, 0 > ; MMR3-NEXT: move $4, $9 > ; MMR3-NEXT: movn $4, $2, $3 > -; MMR3-NEXT: srlv $5, $8, $16 > +; MMR3-NEXT: srlv $5, $7, $16 > ; MMR3-NEXT: not16 $3, $16 > ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sll16 $2, $6, 1 > +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sllv $2, $2, $3 > ; MMR3-NEXT: or16 $2, $5 > -; MMR3-NEXT: srlv $5, $6, $16 > -; MMR3-NEXT: sw 
$5, 4($sp) # 4-byte Folded Spill > +; MMR3-NEXT: srlv $7, $6, $16 > ; MMR3-NEXT: andi16 $3, $16, 32 > ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $2, $5, $3 > +; MMR3-NEXT: movn $2, $7, $3 > ; MMR3-NEXT: addiu $3, $16, -64 > ; MMR3-NEXT: or16 $2, $4 > -; MMR3-NEXT: srlv $4, $17, $3 > -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > -; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sll16 $6, $4, 1 > -; MMR3-NEXT: not16 $5, $3 > -; MMR3-NEXT: sllv $5, $6, $5 > -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: or16 $5, $17 > -; MMR3-NEXT: srlv $1, $4, $3 > -; MMR3-NEXT: andi16 $3, $3, 32 > +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > +; MMR3-NEXT: srlv $3, $6, $3 > ; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $5, $1, $3 > +; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sll16 $4, $3, 1 > +; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill > +; MMR3-NEXT: addiu $5, $16, -64 > +; MMR3-NEXT: not16 $5, $5 > +; MMR3-NEXT: sllv $5, $4, $5 > +; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload > +; MMR3-NEXT: or16 $5, $4 > +; MMR3-NEXT: addiu $4, $16, -64 > +; MMR3-NEXT: srlv $1, $3, $4 > +; MMR3-NEXT: andi16 $4, $4, 32 > +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: movn $5, $1, $4 > ; MMR3-NEXT: sltiu $10, $16, 64 > ; MMR3-NEXT: movn $5, $2, $10 > -; MMR3-NEXT: sllv $2, $4, $7 > -; MMR3-NEXT: not16 $3, $7 > -; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srl16 $4, $7, 1 > +; MMR3-NEXT: sllv $2, $3, $17 > +; MMR3-NEXT: not16 $3, $17 > +; MMR3-NEXT: srl16 $4, $6, 1 > ; MMR3-NEXT: srlv $4, $4, $3 > ; MMR3-NEXT: or16 $4, $2 > -; MMR3-NEXT: srlv $2, $7, $16 > +; MMR3-NEXT: srlv $2, $6, $16 > ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload > +; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload > ; MMR3-NEXT: sllv $3, $6, $3 > ; MMR3-NEXT: or16 $3, $2 > ; MMR3-NEXT: lw $2, 28($sp) # 4-byte Folded Reload > ; MMR3-NEXT: srlv $2, 
$2, $16 > -; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $3, $2, $17 > +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $3, $2, $6 > ; MMR3-NEXT: movz $5, $8, $16 > -; MMR3-NEXT: li16 $6, 0 > -; MMR3-NEXT: movz $3, $6, $10 > -; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $4, $9, $7 > -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > -; MMR3-NEXT: li16 $7, 0 > -; MMR3-NEXT: movn $6, $7, $17 > -; MMR3-NEXT: or16 $6, $4 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movz $3, $17, $10 > +; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $4, $9, $17 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movn $7, $17, $6 > +; MMR3-NEXT: or16 $7, $4 > ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $1, $7, $4 > -; MMR3-NEXT: li16 $7, 0 > -; MMR3-NEXT: movn $1, $6, $10 > +; MMR3-NEXT: movn $1, $17, $4 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movn $1, $7, $10 > ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movz $1, $4, $16 > -; MMR3-NEXT: movn $2, $7, $17 > +; MMR3-NEXT: movn $2, $17, $6 > ; MMR3-NEXT: li16 $4, 0 > ; MMR3-NEXT: movz $2, $4, $10 > ; MMR3-NEXT: move $4, $1 > @@ -855,98 +856,91 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { > ; > ; MMR6-LABEL: lshr_i128: > ; MMR6: # %bb.0: # %entry > -; MMR6-NEXT: addiu $sp, $sp, -32 > -; MMR6-NEXT: .cfi_def_cfa_offset 32 > -; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill > -; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill > +; MMR6-NEXT: addiu $sp, $sp, -24 > +; MMR6-NEXT: .cfi_def_cfa_offset 24 > +; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill > +; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill > ; MMR6-NEXT: .cfi_offset 17, -4 > ; MMR6-NEXT: .cfi_offset 16, -8 > ; MMR6-NEXT: move $1, $7 > -; MMR6-NEXT: move $7, $5 > -; MMR6-NEXT: lw $3, 60($sp) > +; MMR6-NEXT: move $7, $4 > +; MMR6-NEXT: lw $3, 52($sp) > ; MMR6-NEXT: srlv $2, $1, $3 > -; MMR6-NEXT: not16 $5, 
$3 > -; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill > -; MMR6-NEXT: move $17, $6 > -; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill > +; MMR6-NEXT: not16 $16, $3 > +; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill > +; MMR6-NEXT: move $4, $6 > +; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill > ; MMR6-NEXT: sll16 $6, $6, 1 > -; MMR6-NEXT: sllv $6, $6, $5 > +; MMR6-NEXT: sllv $6, $6, $16 > ; MMR6-NEXT: or $8, $6, $2 > -; MMR6-NEXT: addiu $5, $3, -64 > -; MMR6-NEXT: srlv $9, $7, $5 > -; MMR6-NEXT: move $6, $4 > -; MMR6-NEXT: sll16 $2, $4, 1 > -; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill > -; MMR6-NEXT: not16 $16, $5 > +; MMR6-NEXT: addiu $6, $3, -64 > +; MMR6-NEXT: srlv $9, $5, $6 > +; MMR6-NEXT: sll16 $2, $7, 1 > +; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: not16 $16, $6 > ; MMR6-NEXT: sllv $10, $2, $16 > ; MMR6-NEXT: andi16 $16, $3, 32 > ; MMR6-NEXT: seleqz $8, $8, $16 > ; MMR6-NEXT: or $9, $10, $9 > -; MMR6-NEXT: srlv $10, $17, $3 > +; MMR6-NEXT: srlv $10, $4, $3 > ; MMR6-NEXT: selnez $11, $10, $16 > ; MMR6-NEXT: li16 $17, 64 > ; MMR6-NEXT: subu16 $2, $17, $3 > -; MMR6-NEXT: sllv $12, $7, $2 > -; MMR6-NEXT: move $17, $7 > +; MMR6-NEXT: sllv $12, $5, $2 > ; MMR6-NEXT: andi16 $4, $2, 32 > -; MMR6-NEXT: andi16 $7, $5, 32 > -; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill > -; MMR6-NEXT: seleqz $9, $9, $7 > +; MMR6-NEXT: andi16 $17, $6, 32 > +; MMR6-NEXT: seleqz $9, $9, $17 > ; MMR6-NEXT: seleqz $13, $12, $4 > ; MMR6-NEXT: or $8, $11, $8 > ; MMR6-NEXT: selnez $11, $12, $4 > -; MMR6-NEXT: sllv $12, $6, $2 > -; MMR6-NEXT: move $7, $6 > -; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: sllv $12, $7, $2 > ; MMR6-NEXT: not16 $2, $2 > -; MMR6-NEXT: srl16 $6, $17, 1 > +; MMR6-NEXT: srl16 $6, $5, 1 > ; MMR6-NEXT: srlv $2, $6, $2 > ; MMR6-NEXT: or $2, $12, $2 > ; MMR6-NEXT: seleqz $2, $2, $4 > -; MMR6-NEXT: srlv $4, $7, $5 > -; MMR6-NEXT: or $11, $11, $2 > -; MMR6-NEXT: or $5, $8, $13 > -; MMR6-NEXT: srlv $6, $17, $3 > -; 
MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload > -; MMR6-NEXT: selnez $7, $4, $2 > -; MMR6-NEXT: sltiu $8, $3, 64 > -; MMR6-NEXT: selnez $12, $5, $8 > -; MMR6-NEXT: or $7, $7, $9 > -; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload > +; MMR6-NEXT: addiu $4, $3, -64 > +; MMR6-NEXT: srlv $4, $7, $4 > +; MMR6-NEXT: or $12, $11, $2 > +; MMR6-NEXT: or $6, $8, $13 > +; MMR6-NEXT: srlv $5, $5, $3 > +; MMR6-NEXT: selnez $8, $4, $17 > +; MMR6-NEXT: sltiu $11, $3, 64 > +; MMR6-NEXT: selnez $13, $6, $11 > +; MMR6-NEXT: or $8, $8, $9 > ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload > -; MMR6-NEXT: sllv $9, $2, $5 > +; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > +; MMR6-NEXT: sllv $9, $6, $2 > ; MMR6-NEXT: seleqz $10, $10, $16 > -; MMR6-NEXT: li16 $5, 0 > -; MMR6-NEXT: or $10, $10, $11 > -; MMR6-NEXT: or $6, $9, $6 > -; MMR6-NEXT: seleqz $2, $7, $8 > -; MMR6-NEXT: seleqz $7, $5, $8 > -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload > -; MMR6-NEXT: srlv $9, $5, $3 > -; MMR6-NEXT: seleqz $11, $9, $16 > -; MMR6-NEXT: selnez $11, $11, $8 > +; MMR6-NEXT: li16 $2, 0 > +; MMR6-NEXT: or $10, $10, $12 > +; MMR6-NEXT: or $9, $9, $5 > +; MMR6-NEXT: seleqz $5, $8, $11 > +; MMR6-NEXT: seleqz $8, $2, $11 > +; MMR6-NEXT: srlv $7, $7, $3 > +; MMR6-NEXT: seleqz $2, $7, $16 > +; MMR6-NEXT: selnez $2, $2, $11 > ; MMR6-NEXT: seleqz $1, $1, $3 > -; MMR6-NEXT: or $2, $12, $2 > -; MMR6-NEXT: selnez $2, $2, $3 > -; MMR6-NEXT: or $5, $1, $2 > -; MMR6-NEXT: or $2, $7, $11 > -; MMR6-NEXT: seleqz $1, $6, $16 > -; MMR6-NEXT: selnez $6, $9, $16 > -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload > -; MMR6-NEXT: seleqz $9, $16, $3 > -; MMR6-NEXT: selnez $10, $10, $8 > -; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload > -; MMR6-NEXT: seleqz $4, $4, $16 > -; MMR6-NEXT: seleqz $4, $4, $8 > -; MMR6-NEXT: or $4, $10, $4 > +; MMR6-NEXT: or $5, $13, $5 > +; MMR6-NEXT: selnez $5, $5, $3 > +; MMR6-NEXT: or $5, $1, $5 > +; MMR6-NEXT: or $2, $8, $2 > +; MMR6-NEXT: seleqz $1, $9, $16 > +; MMR6-NEXT: 
selnez $6, $7, $16 > +; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload > +; MMR6-NEXT: seleqz $7, $7, $3 > +; MMR6-NEXT: selnez $9, $10, $11 > +; MMR6-NEXT: seleqz $4, $4, $17 > +; MMR6-NEXT: seleqz $4, $4, $11 > </cut>

[TCWG CI] 482.sphinx3 slowed down by 4% after gcc: tree-optimization/65206 - dependence analysis on mixed pointer/array

by ci_notify＠linaro.org

After gcc commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>

tree-optimization/65206 - dependence analysis on mixed pointer/array

the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 4% from 20816 to 21661 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
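The headline figure can be checked directly from the two perf sample counts quoted in the report. The following is a minimal C sketch. It assumes perf samples at a fixed frequency, so the sample count is proportional to run time. The helper names `slowdown_percent` and `is_reported_regression` are hypothetical and are not part of the Linaro CI scripts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: relative slowdown in percent between the
   last_good and first_bad builds. Assumes perf samples at a fixed
   frequency, so sample counts are proportional to run time. */
double slowdown_percent(long last_good_samples, long first_bad_samples)
{
    assert(last_good_samples > 0); /* baseline must be non-empty */
    return 100.0 * (double)(first_bad_samples - last_good_samples)
           / (double)last_good_samples;
}

/* Hypothetical helper mirroring the report's "more than 2%" rule. */
bool is_reported_regression(long last_good_samples, long first_bad_samples,
                            double threshold_percent)
{
    return slowdown_percent(last_good_samples, first_bad_samples)
           > threshold_percent;
}
```

With these helpers, `slowdown_percent(20816, 21661)` evaluates to roughly 4.06, which comes from 845 extra samples over a baseline of 20816. The report rounds this to 4%. Because that value exceeds 2, `is_reported_regression(20816, 21661, 2.0)` is true.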
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-f92901a508305f291fcf2acae0825379477724de
cd investigate-gcc-f92901a508305f291fcf2acae0825379477724de

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach f92901a508305f291fcf2acae0825379477724de
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach abdf63d782cba82b5ecf264248518cbb065650ed
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>
Date: Wed Sep 8 14:42:31 2021 +0200

tree-optimization/65206 - dependence analysis on mixed pointer/array

This adds the capability to analyze the dependence of mixed pointer/array accesses. The example is from where using a masked load/store creates the pointer-based access when an otherwise unconditional access is array based. Other examples would include accesses to an array mixed with accesses from inlined helpers that work on pointers.

The idea is quite simple and old - analyze the data-ref indices as if the reference was pointer-based. The following change does this by changing dr_analyze_indices to work on the indices sub-structure and storing an alternate indices substructure in each data reference. That alternate set of indices is analyzed lazily by initialize_data_dependence_relation when it fails to match-up the main set of indices of two data references. initialize_data_dependence_relation is refactored into a head and a tail worker and changed to work on one of the indices structures and thus away from using DR_* access macros which continue to reference the main indices substructure.

There are quite some vectorization and loop distribution opportunities unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r, 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and 544.nab_r see amendments in what they report with -fopt-info-loop while the rest of the specrate set sees no changes there. Measuring runtime for the set where changes were reported reveals nothing off-noise besides 511.povray_r which seems to regress slightly for me (on a Zen2 machine with -Ofast -march=native).

2021-09-08 Richard Biener <rguenther(a)suse.de>

PR tree-optimization/65206
* tree-data-ref.h (struct data_reference): Add alt_indices, order it last.
* tree-data-ref.c (free_data_ref): Release alt_indices.
(dr_analyze_indices): Work on struct indices and get DR_REF as tree.
(create_data_ref): Adjust.
(initialize_data_dependence_relation): Split into head and tail. When the base objects fail to match up try again with pointer-based analysis of indices.
* tree-vectorizer.c (vec_info_shared::check_datarefs): Do not compare the lazily computed alternate set of indices.

* gcc.dg/torture/20210916.c: New testcase.
* gcc.dg/vect/pr65206.c: Likewise.
---
gcc/testsuite/gcc.dg/torture/20210916.c | 20 ++++
gcc/testsuite/gcc.dg/vect/pr65206.c | 22 ++++
gcc/tree-data-ref.c | 174 +++++++++++++++++++++-----------
gcc/tree-data-ref.h | 9 +-
gcc/tree-vectorizer.c | 3 +-
5 files changed, 168 insertions(+), 60 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
new file mode 100644
index 00000000000..0ea6d45e463
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20210916.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+typedef union tree_node *tree;
+struct tree_base {
+ unsigned : 1;
+ unsigned lang_flag_2 : 1;
+};
+struct tree_type {
+ tree main_variant;
+};
+union tree_node {
+ struct tree_base base;
+ struct tree_type type;
+};
+tree finish_struct_t, finish_struct_x;
+void finish_struct()
+{
+ for (; finish_struct_t->type.main_variant;)
+ finish_struct_x->base.lang_flag_2 = 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
new file mode 100644
index 00000000000..3b6262622c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+#define N 1024
+
+double a[N], b[N];
+
+void foo ()
+{
+ for (int i = 0; i < N; ++i)
+ if (b[i] < 3.)
+ a[i] += b[i];
+}
+
+/* We get a .MASK_STORE because while the load of a[i] does not trap
+ the store would introduce store data races. Make sure we still
+ can handle the data dependence with zero distance. */
+
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e061baa7c20..18307a554fc 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -99,6 +99,7 @@ along with GCC; see the file COPYING3. If not see
 #include "internal-fn.h"
 #include "vr-values.h"
 #include "range-op.h"
+#include "tree-ssa-loop-ivopts.h"

 static struct datadep_stats
 {
@@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
 DR, analyzed in LOOP and instantiated before NEST. */

 static void
-dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
+dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
 {
- vec<tree> access_fns = vNULL;
- tree ref, op;
- tree base, off, access_fn;
-
 /* If analyzing a basic-block there are no indices to analyze
 and thus no access functions. */
 if (!nest)
 {
- DR_BASE_OBJECT (dr) = DR_REF (dr);
- DR_ACCESS_FNS (dr).create (0);
+ dri->base_object = ref;
+ dri->access_fns.create (0);
 return;
 }

- ref = DR_REF (dr);
+ vec<tree> access_fns = vNULL;

 /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
 into a two element array with a constant index.
The base is
@@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 {
 if (TREE_CODE (ref) == ARRAY_REF)
 {
- op = TREE_OPERAND (ref, 1);
- access_fn = analyze_scalar_evolution (loop, op);
+ tree op = TREE_OPERAND (ref, 1);
+ tree access_fn = analyze_scalar_evolution (loop, op);
 access_fn = instantiate_scev (nest, loop, access_fn);
 access_fns.safe_push (access_fn);
 }
@@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 analyzed nest, add it as an additional independent access-function. */
 if (TREE_CODE (ref) == MEM_REF)
 {
- op = TREE_OPERAND (ref, 0);
- access_fn = analyze_scalar_evolution (loop, op);
+ tree op = TREE_OPERAND (ref, 0);
+ tree access_fn = analyze_scalar_evolution (loop, op);
 access_fn = instantiate_scev (nest, loop, access_fn);
 if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
 {
- tree orig_type;
 tree memoff = TREE_OPERAND (ref, 1);
- base = initial_condition (access_fn);
- orig_type = TREE_TYPE (base);
+ tree base = initial_condition (access_fn);
+ tree orig_type = TREE_TYPE (base);
 STRIP_USELESS_TYPE_CONVERSION (base);
+ tree off;
 split_constant_offset (base, &base, &off);
 STRIP_USELESS_TYPE_CONVERSION (base);
 /* Fold the MEM_REF offset into the evolutions initial
@@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 base, memoff);
 MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
 MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
- DR_UNCONSTRAINED_BASE (dr) = true;
+ dri->unconstrained_base = true;
 access_fns.safe_push (access_fn);
 }
 }
@@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 build_int_cst (reference_alias_ptr_type (ref), 0));
 }

- DR_BASE_OBJECT (dr) = ref;
- DR_ACCESS_FNS (dr) = access_fns;
+ dri->base_object = ref;
+ dri->access_fns = access_fns;
 }

 /* Extracts the alias analysis information from the memory reference DR.
*/ @@ -1463,6 +1460,8 @@ void free_data_ref (data_reference_p dr) { DR_ACCESS_FNS (dr).release (); + if (dr->alt_indices.base_object) + dr->alt_indices.access_fns.release (); free (dr); } @@ -1497,7 +1496,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt, dr_analyze_innermost (&DR_INNERMOST (dr), memref, nest != NULL ? loop : NULL, stmt); - dr_analyze_indices (dr, nest, loop); + dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop); dr_analyze_alias (dr); if (dump_file && (dump_flags & TDF_DETAILS)) @@ -3066,41 +3065,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b) TREE_TYPE (TREE_OPERAND (ref_b, 0))); } -/* Initialize a data dependence relation between data accesses A and - B. NB_LOOPS is the number of loops surrounding the references: the - size of the classic distance/direction vectors. */ +/* Initialize a data dependence relation RES in LOOP_NEST. USE_ALT_INDICES + is true when the main indices of A and B were not comparable so we try again + with alternate indices computed on an indirect reference. 
*/ struct data_dependence_relation * -initialize_data_dependence_relation (struct data_reference *a, - struct data_reference *b, - vec<loop_p> loop_nest) +initialize_data_dependence_relation (struct data_dependence_relation *res, + vec<loop_p> loop_nest, + bool use_alt_indices) { - struct data_dependence_relation *res; + struct data_reference *a = DDR_A (res); + struct data_reference *b = DDR_B (res); unsigned int i; - res = XCNEW (struct data_dependence_relation); - DDR_A (res) = a; - DDR_B (res) = b; - DDR_LOOP_NEST (res).create (0); - DDR_SUBSCRIPTS (res).create (0); - DDR_DIR_VECTS (res).create (0); - DDR_DIST_VECTS (res).create (0); - - if (a == NULL || b == NULL) + struct indices *indices_a = &a->indices; + struct indices *indices_b = &b->indices; + if (use_alt_indices) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (TREE_CODE (DR_REF (a)) != MEM_REF) + indices_a = &a->alt_indices; + if (TREE_CODE (DR_REF (b)) != MEM_REF) + indices_b = &b->alt_indices; } - - /* If the data references do not alias, then they are independent. */ - if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL)) - { - DDR_ARE_DEPENDENT (res) = chrec_known; - return res; - } - - unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); - unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); + unsigned int num_dimensions_a = indices_a->access_fns.length (); + unsigned int num_dimensions_b = indices_b->access_fns.length (); if (num_dimensions_a == 0 || num_dimensions_b == 0) { DDR_ARE_DEPENDENT (res) = chrec_dont_know; @@ -3125,9 +3113,9 @@ initialize_data_dependence_relation (struct data_reference *a, the a and b accesses have a single ARRAY_REF component reference [0] but have two subscripts. 
*/ - if (DR_UNCONSTRAINED_BASE (a)) + if (indices_a->unconstrained_base) num_dimensions_a -= 1; - if (DR_UNCONSTRAINED_BASE (b)) + if (indices_b->unconstrained_base) num_dimensions_b -= 1; /* These structures describe sequences of component references in @@ -3210,6 +3198,10 @@ initialize_data_dependence_relation (struct data_reference *a, B: [3, 4] (i.e. s.e) */ while (index_a < num_dimensions_a && index_b < num_dimensions_b) { + /* The alternate indices form always has a single dimension + with unconstrained base. */ + gcc_assert (!use_alt_indices); + /* REF_A and REF_B must be one of the component access types allowed by dr_analyze_indices. */ gcc_checking_assert (access_fn_component_p (ref_a)); @@ -3280,11 +3272,12 @@ initialize_data_dependence_relation (struct data_reference *a, /* See whether FULL_SEQ ends at the base and whether the two bases are equal. We do not care about TBAA or alignment info so we can use OEP_ADDRESS_OF to avoid false negatives. */ - tree base_a = DR_BASE_OBJECT (a); - tree base_b = DR_BASE_OBJECT (b); + tree base_a = indices_a->base_object; + tree base_b = indices_b->base_object; bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a && full_seq.start_b + full_seq.length == num_dimensions_b - && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) + && (indices_a->unconstrained_base + == indices_b->unconstrained_base) && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) && (types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) @@ -3323,7 +3316,7 @@ initialize_data_dependence_relation (struct data_reference *a, both lvalues are distinct from the object's declared type. */ if (same_base_p) { - if (DR_UNCONSTRAINED_BASE (a)) + if (indices_a->unconstrained_base) full_seq.length += 1; } else @@ -3332,8 +3325,41 @@ initialize_data_dependence_relation (struct data_reference *a, /* Punt if we didn't find a suitable sequence. 
*/ if (full_seq.length == 0) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (use_alt_indices + || (TREE_CODE (DR_REF (a)) == MEM_REF + && TREE_CODE (DR_REF (b)) == MEM_REF) + || may_be_nonaddressable_p (DR_REF (a)) + || may_be_nonaddressable_p (DR_REF (b))) + { + /* Fully exhausted possibilities. */ + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* Try evaluating both DRs as dereferences of pointers. */ + if (!a->alt_indices.base_object + && TREE_CODE (DR_REF (a)) != MEM_REF) + { + tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)), + build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)), + build_int_cst + (reference_alias_ptr_type (DR_REF (a)), 0)); + dr_analyze_indices (&a->alt_indices, alt_ref, + loop_preheader_edge (loop_nest[0]), + loop_containing_stmt (DR_STMT (a))); + } + if (!b->alt_indices.base_object + && TREE_CODE (DR_REF (b)) != MEM_REF) + { + tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)), + build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)), + build_int_cst + (reference_alias_ptr_type (DR_REF (b)), 0)); + dr_analyze_indices (&b->alt_indices, alt_ref, + loop_preheader_edge (loop_nest[0]), + loop_containing_stmt (DR_STMT (b))); + } + return initialize_data_dependence_relation (res, loop_nest, true); } if (!same_base_p) @@ -3381,8 +3407,8 @@ initialize_data_dependence_relation (struct data_reference *a, struct subscript *subscript; subscript = XNEW (struct subscript); - SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); - SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); + SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i]; + SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i]; SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); SUB_LAST_CONFLICT (subscript) = chrec_dont_know; @@ -3393,6 +3419,40 @@ initialize_data_dependence_relation (struct data_reference *a, 
return res; } +/* Initialize a data dependence relation between data accesses A and + B. NB_LOOPS is the number of loops surrounding the references: the + size of the classic distance/direction vectors. */ + +struct data_dependence_relation * +initialize_data_dependence_relation (struct data_reference *a, + struct data_reference *b, + vec<loop_p> loop_nest) +{ + data_dependence_relation *res = XCNEW (struct data_dependence_relation); + DDR_A (res) = a; + DDR_B (res) = b; + DDR_LOOP_NEST (res).create (0); + DDR_SUBSCRIPTS (res).create (0); + DDR_DIR_VECTS (res).create (0); + DDR_DIST_VECTS (res).create (0); + + if (a == NULL || b == NULL) + { + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* If the data references do not alias, then they are independent. */ + if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL)) + { + DDR_ARE_DEPENDENT (res) = chrec_known; + return res; + } + + return initialize_data_dependence_relation (res, loop_nest, false); +} + + /* Frees memory used by the conflict function F. */ static void diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index 685f33d85ae..74f579c9f3f 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -166,14 +166,19 @@ struct data_reference and runs to completion. */ bool is_conditional_in_stmt; + /* Alias information for the data reference. */ + struct dr_alias alias; + /* Behavior of the memory reference in the innermost loop. */ struct innermost_loop_behavior innermost; /* Subscripts of this data reference. */ struct indices indices; - /* Alias information for the data reference. */ - struct dr_alias alias; + /* Alternate subscripts initialized lazily and used by data-dependence + analysis only when the main indices of two DRs are not comparable. + Keep last to keep vec_info_shared::check_datarefs happy. 
*/ + struct indices alt_indices; }; #define DR_STMT(DR) (DR)->stmt diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 3aa3e2a6783..20daa31187d 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -507,7 +507,8 @@ vec_info_shared::check_datarefs () return; gcc_assert (datarefs.length () == datarefs_copy.length ()); for (unsigned i = 0; i < datarefs.length (); ++i) - if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0) + if (memcmp (&datarefs_copy[i], datarefs[i], + offsetof (data_reference, alt_indices)) != 0) gcc_unreachable (); } </cut>
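The tree-vectorizer.c hunk above narrows the memcmp in vec_info_shared::check_datarefs from sizeof (data_reference) to offsetof (data_reference, alt_indices). That only works because the lazily filled alt_indices member is kept last in struct data_reference, so a later lazy update can never touch a byte inside the compared prefix. The C sketch below illustrates that layout idea in isolation; struct toy_dataref, struct toy_indices and every helper function are hypothetical stand-ins, not GCC's real types or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GCC's struct indices.  */
struct toy_indices
{
  const void *base_object;  /* NULL until the indices are computed.  */
  unsigned n_access_fns;
};

/* Hypothetical stand-in for struct data_reference: the lazily
   computed member is deliberately placed last.  */
struct toy_dataref
{
  int stmt_id;
  struct toy_indices indices;      /* Computed eagerly.  */
  struct toy_indices alt_indices;  /* Filled in lazily, on demand.  */
};

/* Byte-compare the whole record, like the old sizeof () bound.  */
static int
full_equal_p (const struct toy_dataref *a, const struct toy_dataref *b)
{
  return memcmp (a, b, sizeof (struct toy_dataref)) == 0;
}

/* Byte-compare only the bytes before the lazy member, mirroring the
   offsetof () bound used after the patch.  */
static int
prefix_equal_p (const struct toy_dataref *a, const struct toy_dataref *b)
{
  return memcmp (a, b, offsetof (struct toy_dataref, alt_indices)) == 0;
}

/* Fill in the alternate indices once, on first use.  */
static void
compute_alt_indices (struct toy_dataref *dr, const void *alt_base)
{
  assert (alt_base != NULL);
  if (!dr->alt_indices.base_object)
    {
      dr->alt_indices.base_object = alt_base;
      dr->alt_indices.n_access_fns = 1;
    }
}

/* Snapshot a record, lazily update it, then compare both ways.
   Returns 1 when the prefix check still passes but the full-size
   check does not.  */
static int
lazy_update_tolerated_p (void)
{
  static const int obj = 0;
  struct toy_dataref dr, snapshot;

  memset (&dr, 0, sizeof dr);          /* Zero padding bytes too.  */
  dr.stmt_id = 7;
  dr.indices.base_object = &obj;
  dr.indices.n_access_fns = 2;
  memcpy (&snapshot, &dr, sizeof dr);  /* Byte-exact copy.  */

  compute_alt_indices (&dr, &obj);

  return prefix_equal_p (&snapshot, &dr) && !full_equal_p (&snapshot, &dr);
}
```

If alt_indices were declared before indices instead, the lazy update would land inside the compared prefix and the shortened check would fire as well; that is the reason for the "Keep last" comment in the tree-data-ref.h hunk. The memset matters because memcmp also compares padding bytes.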

[TCWG CI] Regression caused by gcc: Factor predidacte analysis out of tree-ssa-uninit.c into its own module.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Factor predidacte analysis out of tree-ssa-uninit.c into its own module.:
commit 94c12ffac234b29a702aa7b6730f2678265857c8
Author: Martin Sebor <msebor(a)redhat.com>

    Factor predidacte analysis out of tree-ssa-uninit.c into its own module.

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
6240
# First few build errors in logs:
from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
6999
# linux build successful:
all
# linux boot successful:
boot

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_kernel/gnu-master-arm-mainline-defconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de…

Reproduce builds:
<cut>
mkdir investigate-gcc-94c12ffac234b29a702aa7b6730f2678265857c8
cd investigate-gcc-94c12ffac234b29a702aa7b6730f2678265857c8

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-de… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 94c12ffac234b29a702aa7b6730f2678265857c8
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 51166eb2c534692c3c7779def24f83c8c3811b98
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 94c12ffac234b29a702aa7b6730f2678265857c8
Author: Martin Sebor <msebor(a)redhat.com>
Date:   Fri Sep 17 15:39:13 2021 -0600

    Factor predidacte analysis out of tree-ssa-uninit.c into its own module.

    gcc/ChangeLog:

            * Makefile.in (OBJS): Add gimple-predicate-analysis.o.
            * tree-ssa-uninit.c (max_phi_args): Move to gimple-predicate-analysis.
            (MASK_SET_BIT, MASK_TEST_BIT, MASK_EMPTY): Same.
            (check_defs): Add comment.
            (can_skip_redundant_opnd): Update comment.
            (compute_uninit_opnds_pos): Adjust to namespace change.
            (find_pdom): Move to gimple-predicate-analysis.cc.
            (find_dom): Same.
            (struct uninit_undef_val_t): New.
            (is_non_loop_exit_postdominating): Move to gimple-predicate-analysis.cc.
            (find_control_equiv_block): Same.
            (MAX_NUM_CHAINS, MAX_CHAIN_LEN, MAX_POSTDOM_CHECK): Same.
            (MAX_SWITCH_CASES): Same.
            (compute_control_dep_chain): Same.
            (find_uninit_use): Use predicate analyzer.
            (struct pred_info): Move to gimple-predicate-analysis.
            (convert_control_dep_chain_into_preds): Same.
            (find_predicates): Same.
            (collect_phi_def_edges): Same.
            (warn_uninitialized_phi): Use predicate analyzer.
            (find_def_preds): Move to gimple-predicate-analysis.
            (dump_pred_info): Same.
            (dump_pred_chain): Same.
            (dump_predicates): Same.
            (destroy_predicate_vecs): Remove.
            (execute_late_warn_uninitialized): New.
            (get_cmp_code): Move to gimple-predicate-analysis.
            (is_value_included_in): Same.
            (value_sat_pred_p): Same.
            (find_matching_predicate_in_rest_chains): Same.
            (is_use_properly_guarded): Same.
            (prune_uninit_phi_opnds): Same.
            (find_var_cmp_const): Same.
            (use_pred_not_overlap_with_undef_path_pred): Same.
            (pred_equal_p): Same.
            (is_neq_relop_p): Same.
            (is_neq_zero_form_p): Same.
            (pred_expr_equal_p): Same.
            (is_pred_expr_subset_of): Same.
            (is_pred_chain_subset_of): Same.
            (is_included_in): Same.
            (is_superset_of): Same.
            (pred_neg_p): Same.
            (simplify_pred): Same.
            (simplify_preds_2): Same.
            (simplify_preds_3): Same.
            (simplify_preds_4): Same.
            (simplify_preds): Same.
            (push_pred): Same.
            (push_to_worklist): Same.
            (get_pred_info_from_cmp): Same.
            (is_degenerated_phi): Same.
            (normalize_one_pred_1): Same.
            (normalize_one_pred): Same.
            (normalize_one_pred_chain): Same.
            (normalize_preds): Same.
            (can_one_predicate_be_invalidated_p): Same.
            (can_chain_union_be_invalidated_p): Same.
            (uninit_uses_cannot_happen): Same.
            (pass_late_warn_uninitialized::execute): Define.
            * gimple-predicate-analysis.cc: New file.
            * gimple-predicate-analysis.h: New file.
---
 gcc/Makefile.in                  |    1 +
 gcc/gimple-predicate-analysis.cc | 2400 +++++++++++++++++++++++++++++++++++++
 gcc/gimple-predicate-analysis.h  |  158 +++
 gcc/tree-ssa-uninit.c            | 2431 +++-----------------------------------
 4 files changed, 2741 insertions(+), 2249 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index b8229adf580..f36ffa4740b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1394,6 +1394,7 @@ OBJS = \
         gimple-loop-jam.o \
         gimple-loop-versioning.o \
         gimple-low.o \
+        gimple-predicate-analysis.o \
         gimple-pretty-print.o \
         gimple-range.o \
         gimple-range-cache.o \
diff --git a/gcc/gimple-predicate-analysis.cc b/gcc/gimple-predicate-analysis.cc
new file mode 100644
index 00000000000..3404f2d630a
--- /dev/null
+++ b/gcc/gimple-predicate-analysis.cc
@@ -0,0 +1,2400 @@
+/* Support for simple predicate analysis.
+
+   Copyright (C) 2001-2021 Free Software Foundation, Inc.
+   Contributed by Xinliang David Li <davidxl(a)google.com>
+   Generalized by Martin Sebor <msebor(a)redhat.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3. If not see
+   <http://www.gnu.org/licenses/>. */
+
+#define INCLUDE_STRING
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-pretty-print.h"
+#include "diagnostic-core.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "tree-ssa.h"
+#include "tree-cfg.h"
+#include "cfghooks.h"
+#include "attribs.h"
+#include "builtins.h"
+#include "calls.h"
+#include "value-query.h"
+
+#include "gimple-predicate-analysis.h"
+
+#define DEBUG_PREDICATE_ANALYZER 1
+
+/* Find the immediate postdominator of the specified basic block BB. */
+
+static inline basic_block
+find_pdom (basic_block bb)
+{
+  basic_block exit_bb = EXIT_BLOCK_PTR_FOR_FN (cfun);
+  if (bb == exit_bb)
+    return exit_bb;
+
+  if (basic_block pdom = get_immediate_dominator (CDI_POST_DOMINATORS, bb))
+    return pdom;
+
+  return exit_bb;
+}
+
+/* Find the immediate dominator of the specified basic block BB. */
+
+static inline basic_block
+find_dom (basic_block bb)
+{
+  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  if (bb == entry_bb)
+    return entry_bb;
+
+  if (basic_block dom = get_immediate_dominator (CDI_DOMINATORS, bb))
+    return dom;
+
+  return entry_bb;
+}
+
+/* Return true if BB1 is postdominating BB2 and BB1 is not a loop exit
+   bb. The loop exit bb check is simple and does not cover all cases. */
+
+static bool
+is_non_loop_exit_postdominating (basic_block bb1, basic_block bb2)
+{
+  if (!dominated_by_p (CDI_POST_DOMINATORS, bb2, bb1))
+    return false;
+
+  if (single_pred_p (bb1) && !single_succ_p (bb2))
+    return false;
+
+  return true;
+}
+
+/* Find BB's closest postdominator that is its control equivalent (i.e.,
+   that's controlled by the same predicate). */
+
+static inline basic_block
+find_control_equiv_block (basic_block bb)
+{
+  basic_block pdom = find_pdom (bb);
+
+  /* Skip the postdominating bb that is also a loop exit. */
+  if (!is_non_loop_exit_postdominating (pdom, bb))
+    return NULL;
+
+  /* If the postdominator is dominated by BB, return it. */
+  if (dominated_by_p (CDI_DOMINATORS, pdom, bb))
+    return pdom;
+
+  return NULL;
+}
+
+/* Return true if X1 is the negation of X2. */
+
+static inline bool
+pred_neg_p (const pred_info &x1, const pred_info &x2)
+{
+  if (!operand_equal_p (x1.pred_lhs, x2.pred_lhs, 0)
+      || !operand_equal_p (x1.pred_rhs, x2.pred_rhs, 0))
+    return false;
+
+  tree_code c1 = x1.cond_code, c2;
+  if (x1.invert == x2.invert)
+    c2 = invert_tree_comparison (x2.cond_code, false);
+  else
+    c2 = x2.cond_code;
+
+  return c1 == c2;
+}
+
+/* Return whether the condition (VAL CMPC BOUNDARY) is true. */
+
+static bool
+is_value_included_in (tree val, tree boundary, tree_code cmpc)
+{
+  /* Only handle integer constant here. */
+  if (TREE_CODE (val) != INTEGER_CST || TREE_CODE (boundary) != INTEGER_CST)
+    return true;
+
+  bool inverted = false;
+  if (cmpc == GE_EXPR || cmpc == GT_EXPR || cmpc == NE_EXPR)
+    {
+      cmpc = invert_tree_comparison (cmpc, false);
+      inverted = true;
+    }
+
+  bool result;
+  if (cmpc == EQ_EXPR)
+    result = tree_int_cst_equal (val, boundary);
+  else if (cmpc == LT_EXPR)
+    result = tree_int_cst_lt (val, boundary);
+  else
+    {
+      gcc_assert (cmpc == LE_EXPR);
+      result = tree_int_cst_le (val, boundary);
+    }
+
+  if (inverted)
+    result ^= 1;
+
+  return result;
+}
+
+/* Format the vector of edges EV as a string. */
+
+static std::string
+format_edge_vec (const vec<edge> &ev)
+{
+  std::string str;
+
+  unsigned n = ev.length ();
+  for (unsigned i = 0; i < n; ++i)
+    {
+      char es[32];
+      const_edge e = ev[i];
+      sprintf (es, "%u", e->src->index);
+      str += es;
+      if (i + 1 < n)
+        str += " -> ";
+    }
+  return str;
+}
+
+/* Format the first N elements of the array of vector of edges EVA as
+   a string. */
+
+static std::string
+format_edge_vecs (const vec<edge> eva[], unsigned n)
+{
+  std::string str;
+
+  for (unsigned i = 0; i != n; ++i)
+    {
+      str += '{';
+      str += format_edge_vec (eva[i]);
+      str += '}';
+      if (i + 1 < n)
+        str += ", ";
+    }
+  return str;
+}
+
+/* Dump a single pred_info to DUMP_FILE. */
+
+static void
+dump_pred_info (const pred_info &pred)
+{
+  if (pred.invert)
+    fprintf (dump_file, "NOT (");
+  print_generic_expr (dump_file, pred.pred_lhs);
+  fprintf (dump_file, " %s ", op_symbol_code (pred.cond_code));
+  print_generic_expr (dump_file, pred.pred_rhs);
+  if (pred.invert)
+    fputc (')', dump_file);
+}
+
+/* Dump a pred_chain to DUMP_FILE. */
+
+static void
+dump_pred_chain (const pred_chain &chain)
+{
+  unsigned np = chain.length ();
+  if (np > 1)
+    fprintf (dump_file, "AND (");
+
+  for (unsigned j = 0; j < np; j++)
+    {
+      dump_pred_info (chain[j]);
+      if (j < np - 1)
+        fprintf (dump_file, ", ");
+      else if (j > 0)
+        fputc (')', dump_file);
+    }
+}
+
+/* Dump the predicate chain PREDS for STMT, prefixed by MSG. */
+
+static void
+dump_predicates (gimple *stmt, const pred_chain_union &preds, const char *msg)
+{
+  fprintf (dump_file, "%s", msg);
+  if (stmt)
+    {
+      print_gimple_stmt (dump_file, stmt, 0);
+      fprintf (dump_file, "is guarded by:\n");
+    }
+
+  unsigned np = preds.length ();
+  if (np > 1)
+    fprintf (dump_file, "OR (");
+  for (unsigned i = 0; i < np; i++)
+    {
+      dump_pred_chain (preds[i]);
+      if (i < np - 1)
+        fprintf (dump_file, ", ");
+      else if (i > 0)
+        fputc (')', dump_file);
+    }
+  fputc ('\n', dump_file);
+}
+
+/* Dump the first NCHAINS elements of the DEP_CHAINS array into DUMP_FILE. */
+
+static void
+dump_dep_chains (const auto_vec<edge> dep_chains[], unsigned nchains)
+{
+  if (!dump_file)
+    return;
+
+  for (unsigned i = 0; i != nchains; ++i)
+    {
+      const auto_vec<edge> &v = dep_chains[i];
+      unsigned n = v.length ();
+      for (unsigned j = 0; j != n; ++j)
+        {
+          fprintf (dump_file, "%u", v[j]->src->index);
+          if (j + 1 < n)
+            fprintf (dump_file, " -> ");
+        }
+      fputc ('\n', dump_file);
+    }
+}
+
+/* Return the 'normalized' conditional code with operand swapping
+   and condition inversion controlled by SWAP_COND and INVERT. */
+
+static tree_code
+get_cmp_code (tree_code orig_cmp_code, bool swap_cond, bool invert)
+{
+  tree_code tc = orig_cmp_code;
+
+  if (swap_cond)
+    tc = swap_tree_comparison (orig_cmp_code);
+  if (invert)
+    tc = invert_tree_comparison (tc, false);
+
+  switch (tc)
+    {
+    case LT_EXPR:
+    case LE_EXPR:
+    case GT_EXPR:
+    case GE_EXPR:
+    case EQ_EXPR:
+    case NE_EXPR:
+      break;
+    default:
+      return ERROR_MARK;
+    }
+  return tc;
+}
+
+/* Return true if PRED is common among all predicate chains in PREDS
+   (and therefore can be factored out). */
+
+static bool
+find_matching_predicate_in_rest_chains (const pred_info &pred,
+                                        const pred_chain_union &preds)
+{
+  /* Trival case. */
+  if (preds.length () == 1)
+    return true;
+
+  for (unsigned i = 1; i < preds.length (); i++)
+    {
+      bool found = false;
+      const pred_chain &chain = preds[i];
+      unsigned n = chain.length ();
+      for (unsigned j = 0; j < n; j++)
+        {
+          const pred_info &pred2 = chain[j];
+          /* Can relax the condition comparison to not use address
+             comparison. However, the most common case is that
+             multiple control dependent paths share a common path
+             prefix, so address comparison should be ok. */
+          if (operand_equal_p (pred2.pred_lhs, pred.pred_lhs, 0)
+              && operand_equal_p (pred2.pred_rhs, pred.pred_rhs, 0)
+              && pred2.invert == pred.invert)
+            {
+              found = true;
+              break;
+            }
+        }
+      if (!found)
+        return false;
+    }
+  return true;
+}
+
+/* Find a predicate to examine against paths of interest. If there
+   is no predicate of the "FLAG_VAR CMP CONST" form, try to find one
+   of that's the form "FLAG_VAR CMP FLAG_VAR" with value range info.
+   PHI is the phi node whose incoming (interesting) paths need to be
+   examined. On success, return the comparison code, set defintion
+   gimple of FLAG_DEF and BOUNDARY_CST. Otherwise return ERROR_MARK. */
+
+static tree_code
+find_var_cmp_const (pred_chain_union preds, gphi *phi, gimple **flag_def,
+                    tree *boundary_cst)
+{
+  tree_code vrinfo_code = ERROR_MARK;
+  gimple *vrinfo_def = NULL;
+  tree vrinfo_cst = NULL;
+
+  gcc_assert (preds.length () > 0);
+  pred_chain chain = preds[0];
+  for (unsigned i = 0; i < chain.length (); i++)
+    {
+      bool use_vrinfo_p = false;
+      const pred_info &pred = chain[i];
+      tree cond_lhs = pred.pred_lhs;
+      tree cond_rhs = pred.pred_rhs;
+      if (cond_lhs == NULL_TREE || cond_rhs == NULL_TREE)
+        continue;
+
+      tree_code code = get_cmp_code (pred.cond_code, false, pred.invert);
+      if (code == ERROR_MARK)
+        continue;
+
+      /* Convert to the canonical form SSA_NAME CMP CONSTANT. */
+      if (TREE_CODE (cond_lhs) == SSA_NAME
+          && is_gimple_constant (cond_rhs))
+        ;
+      else if (TREE_CODE (cond_rhs) == SSA_NAME
+               && is_gimple_constant (cond_lhs))
+        {
+          std::swap (cond_lhs, cond_rhs);
+          if ((code = get_cmp_code (code, true, false)) == ERROR_MARK)
+            continue;
+        }
+      /* Check if we can take advantage of FLAG_VAR COMP FLAG_VAR predicate
+         with value range info. Note only first of such case is handled. */
+      else if (vrinfo_code == ERROR_MARK
+               && TREE_CODE (cond_lhs) == SSA_NAME
+               && TREE_CODE (cond_rhs) == SSA_NAME)
+        {
+          gimple* lhs_def = SSA_NAME_DEF_STMT (cond_lhs);
+          if (!lhs_def || gimple_code (lhs_def) != GIMPLE_PHI
+              || gimple_bb (lhs_def) != gimple_bb (phi))
+            {
+              std::swap (cond_lhs, cond_rhs);
+              if ((code = get_cmp_code (code, true, false)) == ERROR_MARK)
+                continue;
+            }
+
+          /* Check value range info of rhs, do following transforms:
+               flag_var < [min, max] -> flag_var < max
+               flag_var > [min, max] -> flag_var > min
+
+             We can also transform LE_EXPR/GE_EXPR to LT_EXPR/GT_EXPR:
+               flag_var <= [min, max] -> flag_var < [min, max+1]
+               flag_var >= [min, max] -> flag_var > [min-1, max]
+             if no overflow/wrap. */
+          tree type = TREE_TYPE (cond_lhs);
+          value_range r;
+          if (!INTEGRAL_TYPE_P (type)
+              || !get_range_query (cfun)->range_of_expr (r, cond_rhs)
+              || r.kind () != VR_RANGE)
+            continue;
+
+          wide_int min = r.lower_bound ();
+          wide_int max = r.upper_bound ();
+          if (code == LE_EXPR
+              && max != wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type)))
+            {
+              code = LT_EXPR;
+              max = max + 1;
+            }
+          if (code == GE_EXPR
+              && min != wi::min_value (TYPE_PRECISION (type), TYPE_SIGN (type)))
+            {
+              code = GT_EXPR;
+              min = min - 1;
+            }
+          if (code == LT_EXPR)
+            cond_rhs = wide_int_to_tree (type, max);
+          else if (code == GT_EXPR)
+            cond_rhs = wide_int_to_tree (type, min);
+          else
+            continue;
+
+          use_vrinfo_p = true;
+        }
+      else
+        continue;
+
+      if ((*flag_def = SSA_NAME_DEF_STMT (cond_lhs)) == NULL)
+        continue;
+
+      if (gimple_code (*flag_def) != GIMPLE_PHI
+          || gimple_bb (*flag_def) != gimple_bb (phi)
+          || !find_matching_predicate_in_rest_chains (pred, preds))
+        continue;
+
+      /* Return if any "flag_var comp const" predicate is found. */
+      if (!use_vrinfo_p)
+        {
+          *boundary_cst = cond_rhs;
+          return code;
+        }
+      /* Record if any "flag_var comp flag_var[vinfo]" predicate is found. */
+      else if (vrinfo_code == ERROR_MARK)
+        {
+          vrinfo_code = code;
+          vrinfo_def = *flag_def;
+          vrinfo_cst = cond_rhs;
+        }
+    }
+  /* Return the "flag_var cmp flag_var[vinfo]" predicate we found. */
+  if (vrinfo_code != ERROR_MARK)
+    {
+      *flag_def = vrinfo_def;
+      *boundary_cst = vrinfo_cst;
+    }
+  return vrinfo_code;
+}
+
+/* Return true if all interesting opnds are pruned, false otherwise.
+   PHI is the phi node with interesting operands, OPNDS is the bitmap
+   of the interesting operand positions, FLAG_DEF is the statement
+   defining the flag guarding the use of the PHI output, BOUNDARY_CST
+   is the const value used in the predicate associated with the flag,
+   CMP_CODE is the comparison code used in the predicate, VISITED_PHIS
+   is the pointer set of phis visited, and VISITED_FLAG_PHIS is
+   the pointer to the pointer set of flag definitions that are also
+   phis.
+
+   Example scenario:
+
+   BB1:
+     flag_1 = phi <0, 1> // (1)
+     var_1 = phi <undef, some_val>
+
+
+   BB2:
+     flag_2 = phi <0, flag_1, flag_1> // (2)
+     var_2 = phi <undef, var_1, var_1>
+     if (flag_2 == 1)
+       goto BB3;
+
+   BB3:
+     use of var_2 // (3)
+
+   Because some flag arg in (1) is not constant, if we do not look into
+   the flag phis recursively, it is conservatively treated as unknown and
+   var_1 is thought to flow into use at (3). Since var_1 is potentially
+   uninitialized a false warning will be emitted.
+   Checking recursively into (1), the compiler can find out that only
+   some_val (which is defined) can flow into (3) which is OK. */
+
+static bool
+prune_phi_opnds (gphi *phi, unsigned opnds, gphi *flag_def,
+                 tree boundary_cst, tree_code cmp_code,
+                 predicate::func_t &eval,
+                 hash_set<gphi *> *visited_phis,
+                 bitmap *visited_flag_phis)
+{
+  /* The Boolean predicate guarding the PHI definition. Initialized
+     lazily from PHI in the first call to is_use_guarded() and cached
+     for subsequent iterations. */
+  predicate def_preds (eval);
+
+  unsigned n = MIN (eval.max_phi_args, gimple_phi_num_args (flag_def));
+  for (unsigned i = 0; i < n; i++)
+    {
+      if (!MASK_TEST_BIT (opnds, i))
+        continue;
+
+      tree flag_arg = gimple_phi_arg_def (flag_def, i);
+      if (!is_gimple_constant (flag_arg))
+        {
+          if (TREE_CODE (flag_arg) != SSA_NAME)
+            return false;
+
+          gphi *flag_arg_def = dyn_cast<gphi *> (SSA_NAME_DEF_STMT (flag_arg));
+          if (!flag_arg_def)
+            return false;
+
+          tree phi_arg = gimple_phi_arg_def (phi, i);
+          if (TREE_CODE (phi_arg) != SSA_NAME)
+            return false;
+
+          gphi *phi_arg_def = dyn_cast<gphi *> (SSA_NAME_DEF_STMT (phi_arg));
+          if (!phi_arg_def)
+            return false;
+
+          if (gimple_bb (phi_arg_def) != gimple_bb (flag_arg_def))
+            return false;
+
+          if (!*visited_flag_phis)
+            *visited_flag_phis = BITMAP_ALLOC (NULL);
+
+          tree phi_result = gimple_phi_result (flag_arg_def);
+          if (bitmap_bit_p (*visited_flag_phis, SSA_NAME_VERSION (phi_result)))
+            return false;
+
+          bitmap_set_bit (*visited_flag_phis, SSA_NAME_VERSION (phi_result));
+
+          /* Now recursively try to prune the interesting phi args. */
+          unsigned opnds_arg_phi = eval.phi_arg_set (phi_arg_def);
+          if (!prune_phi_opnds (phi_arg_def, opnds_arg_phi, flag_arg_def,
+                                boundary_cst, cmp_code, eval, visited_phis,
+                                visited_flag_phis))
+            return false;
+
+          bitmap_clear_bit (*visited_flag_phis, SSA_NAME_VERSION (phi_result));
+          continue;
+        }
+
+      /* Now check if the constant is in the guarded range. */
+      if (is_value_included_in (flag_arg, boundary_cst, cmp_code))
+        {
+          /* Now that we know that this undefined edge is not pruned.
+             If the operand is defined by another phi, we can further
+             prune the incoming edges of that phi by checking
+             the predicates of this operands. */
+
+          tree opnd = gimple_phi_arg_def (phi, i);
+          gimple *opnd_def = SSA_NAME_DEF_STMT (opnd);
+          if (gphi *opnd_def_phi = dyn_cast <gphi *> (opnd_def))
+            {
+              unsigned opnds2 = eval.phi_arg_set (opnd_def_phi);
+              if (!MASK_EMPTY (opnds2))
+                {
+                  edge opnd_edge = gimple_phi_arg_edge (phi, i);
+                  if (def_preds.is_use_guarded (phi, opnd_edge->src,
+                                                opnd_def_phi, opnds2,
+                                                visited_phis))
+                    return false;
+                }
+            }
+          else
+            return false;
+        }
+    }
+
+  return true;
+}
+
+/* Recursively compute the set PHI's incoming edges with "uninteresting"
+   operands of a phi chain, i.e., those for which EVAL returns false.
+   CD_ROOT is the control dependence root from which edges are collected
+   up the CFG nodes that it's dominated by. *EDGES holds the result, and
+   VISITED is used for detecting cycles. */
+
+static void
+collect_phi_def_edges (gphi *phi, basic_block cd_root, auto_vec<edge> *edges,
+                       predicate::func_t &eval, hash_set<gimple *> *visited)
+{
+  if (visited->elements () == 0
+      && DEBUG_PREDICATE_ANALYZER
+      && dump_file)
+    {
+      fprintf (dump_file, "%s for cd_root %u and ",
+               __func__, cd_root->index);
+      print_gimple_stmt (dump_file, phi, 0);
+
+    }
+
+  if (visited->add (phi))
+    return;
+
+  unsigned n = gimple_phi_num_args (phi);
+  for (unsigned i = 0; i < n; i++)
+    {
+      edge opnd_edge = gimple_phi_arg_edge (phi, i);
+      tree opnd = gimple_phi_arg_def (phi, i);
+
+      if (TREE_CODE (opnd) == SSA_NAME)
+        {
+          gimple *def = SSA_NAME_DEF_STMT (opnd);
+
+          if (gimple_code (def) == GIMPLE_PHI
+              && dominated_by_p (CDI_DOMINATORS, gimple_bb (def), cd_root))
+            collect_phi_def_edges (as_a<gphi *> (def), cd_root, edges, eval,
+                                   visited);
+          else if (!eval (opnd))
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                {
+                  fprintf (dump_file,
+                           "\tFound def edge %i -> %i for cd_root %i "
+                           "and operand %u of: ",
+                           opnd_edge->src->index, opnd_edge->dest->index,
+                           cd_root->index, i);
+                  print_gimple_stmt (dump_file, phi, 0);
+                }
+              edges->safe_push (opnd_edge);
+            }
+        }
+      else
+        {
+          if (dump_file && (dump_flags & TDF_DETAILS))
+            {
+              fprintf (dump_file,
+                       "\tFound def edge %i -> %i for cd_root %i "
+                       "and operand %u of: ",
+                       opnd_edge->src->index, opnd_edge->dest->index,
+                       cd_root->index, i);
+              print_gimple_stmt (dump_file, phi, 0);
+            }
+
+          if (!eval (opnd))
+            edges->safe_push (opnd_edge);
+        }
+    }
+}
+
+/* Return an expression corresponding to the predicate PRED. */
+
+static tree
+build_pred_expr (const pred_info &pred)
+{
+  tree_code cond_code = pred.cond_code;
+  tree lhs = pred.pred_lhs;
+  tree rhs = pred.pred_rhs;
+
+  if (pred.invert)
+    cond_code = invert_tree_comparison (cond_code, false);
+
+  return build2 (cond_code, TREE_TYPE (lhs), lhs, rhs);
+}
+
+/* Return an expression corresponding to PREDS. */
+
+static tree
+build_pred_expr (const pred_chain_union &preds, bool invert = false)
+{
+  tree_code code = invert ? TRUTH_AND_EXPR : TRUTH_OR_EXPR;
+  tree_code subcode = invert ? TRUTH_OR_EXPR : TRUTH_AND_EXPR;
+
+  tree expr = NULL_TREE;
+  for (unsigned i = 0; i != preds.length (); ++i)
+    {
+      tree subexpr = NULL_TREE;
+      for (unsigned j = 0; j != preds[i].length (); ++j)
+        {
+          const pred_info &pi = preds[i][j];
+          tree cond = build_pred_expr (pi);
+          if (invert)
+            cond = invert_truthvalue (cond);
+          subexpr = subexpr ? build2 (subcode, boolean_type_node,
+                                      subexpr, cond) : cond;
+        }
+      if (expr)
+        expr = build2 (code, boolean_type_node, expr, subexpr);
+      else
+        expr = subexpr;
+    }
+
+  return expr;
+}
+
+/* Return a bitset of all PHI arguments or zero if there are too many. */
+
+unsigned
+predicate::func_t::phi_arg_set (gphi *phi)
+{
+  unsigned n = gimple_phi_num_args (phi);
+
+  if (max_phi_args < n)
+    return 0;
+
+  /* Set the least significant N bits. */
+  return (1U << n) - 1;
+}
+
+/* Determine if the predicate set of the use does not overlap with that
+   of the interesting paths. The most common senario of guarded use is
+   in Example 1:
+     Example 1:
+       if (some_cond)
+         {
+           x = ...; // set x to valid
+           flag = true;
+         }
+
+       ... some code ...
+ + if (flag) + use (x); // use when x is valid + + The real world examples are usually more complicated, but similar + and usually result from inlining: + + bool init_func (int * x) + { + if (some_cond) + return false; + *x = ...; // set *x to valid + return true; + } + + void foo (..) + { + int x; + + if (!init_func (&x)) + return; + + .. some_code ... + use (x); // use when x is valid + } + + Another possible use scenario is in the following trivial example: + + Example 2: + if (n > 0) + x = 1; + ... + if (n > 0) + { + if (m < 2) + ... = x; + } + + Predicate analysis needs to compute the composite predicate: + + 1) 'x' use predicate: (n > 0) .AND. (m < 2) + 2) 'x' default value (non-def) predicate: .NOT. (n > 0) + (the predicate chain for phi operand defs can be computed + starting from a bb that is control equivalent to the phi's + bb and is dominating the operand def.) + + and check overlapping: + (n > 0) .AND. (m < 2) .AND. (.NOT. (n > 0)) + <==> false + + This implementation provides a framework that can handle different + scenarios. (Note that many simple cases are handled properly without + the predicate analysis if jump threading eliminates the merge point + thus makes path-sensitive analysis unnecessary.) + + PHI is the phi node whose incoming (undefined) paths need to be + pruned, and OPNDS is the bitmap holding interesting operand + positions. VISITED is the pointer set of phi stmts being + checked. */ + +bool +predicate::overlap (gphi *phi, unsigned opnds, hash_set<gphi *> *visited) +{ + gimple *flag_def = NULL; + tree boundary_cst = NULL_TREE; + bitmap visited_flag_phis = NULL; + + /* Find within the common prefix of multiple predicate chains + a predicate that is a comparison of a flag variable against + a constant. 
*/ + tree_code cmp_code = find_var_cmp_const (m_preds, phi, &flag_def, + &boundary_cst); + if (cmp_code == ERROR_MARK) + return true; + + /* Now check all the uninit incoming edges have a constant flag + value that is in conflict with the use guard/predicate. */ + gphi *phi_def = as_a<gphi *> (flag_def); + bool all_pruned = prune_phi_opnds (phi, opnds, phi_def, boundary_cst, + cmp_code, m_eval, visited, + &visited_flag_phis); + + if (visited_flag_phis) + BITMAP_FREE (visited_flag_phis); + + return !all_pruned; +} + +/* Return true if two predicates PRED1 and X2 are equivalent. Assume + the expressions have already properly re-associated. */ + +static inline bool +pred_equal_p (const pred_info &pred1, const pred_info &pred2) +{ + if (!operand_equal_p (pred1.pred_lhs, pred2.pred_lhs, 0) + || !operand_equal_p (pred1.pred_rhs, pred2.pred_rhs, 0)) + return false; + + tree_code c1 = pred1.cond_code, c2; + if (pred1.invert != pred2.invert + && TREE_CODE_CLASS (pred2.cond_code) == tcc_comparison) + c2 = invert_tree_comparison (pred2.cond_code, false); + else + c2 = pred2.cond_code; + + return c1 == c2; +} + +/* Return true if PRED tests inequality (i.e., X != Y). */ + +static inline bool +is_neq_relop_p (const pred_info &pred) +{ + + return ((pred.cond_code == NE_EXPR && !pred.invert) + || (pred.cond_code == EQ_EXPR && pred.invert)); +} + +/* Returns true if PRED is of the form X != 0. */ + +static inline bool +is_neq_zero_form_p (const pred_info &pred) +{ + if (!is_neq_relop_p (pred) || !integer_zerop (pred.pred_rhs) + || TREE_CODE (pred.pred_lhs) != SSA_NAME) + return false; + return true; +} + +/* Return true if PRED is equivalent to X != 0. */ + +static inline bool +pred_expr_equal_p (const pred_info &pred, tree expr) +{ + if (!is_neq_zero_form_p (pred)) + return false; + + return operand_equal_p (pred.pred_lhs, expr, 0); +} + +/* Return true if VAL satisfies (x CMPC BOUNDARY) predicate. 
CMPC can + be either one of the range comparison codes ({GE,LT,EQ,NE}_EXPR and + the like), or BIT_AND_EXPR. EXACT_P is only meaningful for the latter. + Modify the question from VAL & BOUNDARY != 0 to VAL & BOUNDARY == VAL. + For other values of CMPC, EXACT_P is ignored. */ + +static bool +value_sat_pred_p (tree val, tree boundary, tree_code cmpc, + bool exact_p = false) +{ + if (cmpc != BIT_AND_EXPR) + return is_value_included_in (val, boundary, cmpc); + + wide_int andw = wi::to_wide (val) & wi::to_wide (boundary); + if (exact_p) + return andw == wi::to_wide (val); + + return andw.to_uhwi (); +} + +/* Return true if the domain of single predicate expression PRED1 + is a subset of that of PRED2, and false if it cannot be proved. */ + +static bool +subset_of (const pred_info &pred1, const pred_info &pred2) +{ + if (pred_equal_p (pred1, pred2)) + return true; + </cut>

4 years, 9 months

clang-aarch64-full-2stage buildbot timeout

by Florian Hahn

Hi,

It looks like many recent builds of clang-aarch64-full-2stage are timing out, e.g.:
- https://lab.llvm.org/buildbot/#/builders/179/builds/1078 (while checking out sources)
- https://lab.llvm.org/buildbot/#/builders/179/builds/1076 (while building stage2)

Is there anything that could be done to avoid such timeouts and the false-positive failure emails they cause?

Cheers,
Florian

4 years, 9 months

[TCWG CI] 403.gcc grew in size by 2% after llvm: Turn on the new pass manager by default

by ci_notify＠linaro.org

After llvm commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>

Turn on the new pass manager by default

the following benchmarks grew in size by more than 1%:
- 403.gcc grew in size by 2% from 2586180 to 2648252 bytes

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh

cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date:   Mon Jan 25 11:00:56 2021 -0800

Turn on the new pass manager by default

This turns on the new pass manager by default for the optimization pipeline in Clang and ThinLTO in various LLD backends. This also makes uses of `opt -instcombine` use the new pass manager (unless specifically opted out).

This does not affect the backend target-dependent codegen pipeline.

If this causes regressions, you can opt out of the new pass manager either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag while building LLVM, or via various compiler flags, e.g. -flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for ELF LLD. Please file bugs for any regressions.

Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline

https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html

Reviewed By: asbirlea, ychen, MaskRay, echristo

Differential Revision: https://reviews.llvm.org/D95380
---
 llvm/CMakeLists.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
 endif()
 option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})

-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
-  "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+  "Enable the new pass manager by default.")

 include(HandleLLVMOptions)
</cut>

4 years, 9 months

[TCWG CI] Regression caused by linux: parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled

by ci_notify＠linaro.org

[TCWG CI] Regression caused by linux: parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled:
commit 9caea0007601d3bc6debec04f8b4cd6f4c2394be
Author: Helge Deller <deller(a)gmx.de>

parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_llvm: -5
# build_abe qemu: -2
# linux_n_obj: 37
# First few build errors in logs:

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_llvm: -5
# build_abe qemu: -2
# linux_n_obj: 20151

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Even more details: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…

Reproduce builds:
<cut>
mkdir investigate-linux-9caea0007601d3bc6debec04f8b4cd6f4c2394be
cd investigate-linux-9caea0007601d3bc6debec04f8b4cd6f4c2394be

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach 9caea0007601d3bc6debec04f8b4cd6f4c2394be
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 31ad37bd6faf871c070650f72ac9488ceeeceeb0
../artifacts/test.sh

cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 9caea0007601d3bc6debec04f8b4cd6f4c2394be
Author: Helge Deller <deller(a)gmx.de>
Date:   Sun Sep 19 10:36:09 2021 -0700

parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled

Linus noticed odd declaration rules for pci_iounmap() in iomap.h and pci_iomap.h, where it dependend on either NO_GENERIC_PCI_IOPORT_MAP or GENERIC_IOMAP when CONFIG_PCI was disabled.

Testing on parisc seems to indicate that we need pci_iounmap() only when CONFIG_PCI is enabled, so the declaration of pci_iounmap() can be moved cleanly into pci_iomap.h in sync with the declarations of pci_iomap().
Link: https://lore.kernel.org/all/CAHk-=wjRrh98pZoQ+AzfWmsTZacWxTJKXZ9eKU2X_0+jM=… Signed-off-by: Helge Deller <deller(a)gmx.de> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Fixes: 97a29d59fc22 ("[PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional") Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: Guenter Roeck <linux(a)roeck-us.net> Cc: Ulrich Teichert <krypton(a)ulrich-teichert.org> Cc: James Bottomley <James.Bottomley(a)hansenpartnership.com> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> --- arch/parisc/lib/iomap.c | 4 +++- include/asm-generic/iomap.h | 10 ---------- include/asm-generic/pci_iomap.h | 3 +++ 3 files changed, 6 insertions(+), 11 deletions(-) diff --git a/arch/parisc/lib/iomap.c b/arch/parisc/lib/iomap.c index f03adb1999e7..367f6397bda7 100644 --- a/arch/parisc/lib/iomap.c +++ b/arch/parisc/lib/iomap.c @@ -513,12 +513,15 @@ void ioport_unmap(void __iomem *addr) } } +#ifdef CONFIG_PCI void pci_iounmap(struct pci_dev *dev, void __iomem * addr) { if (!INDIRECT_ADDR(addr)) { iounmap(addr); } } +EXPORT_SYMBOL(pci_iounmap); +#endif EXPORT_SYMBOL(ioread8); EXPORT_SYMBOL(ioread16); @@ -544,4 +547,3 @@ EXPORT_SYMBOL(iowrite16_rep); EXPORT_SYMBOL(iowrite32_rep); EXPORT_SYMBOL(ioport_map); EXPORT_SYMBOL(ioport_unmap); -EXPORT_SYMBOL(pci_iounmap); diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h index 9b3eb6d86200..08237ae8b840 100644 --- a/include/asm-generic/iomap.h +++ b/include/asm-generic/iomap.h @@ -110,16 +110,6 @@ static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size) } #endif -#ifdef CONFIG_PCI -/* Destroy a virtual mapping cookie for a PCI BAR (memory or IO) */ -struct pci_dev; -extern void pci_iounmap(struct pci_dev *dev, void __iomem *); -#elif defined(CONFIG_GENERIC_IOMAP) -struct pci_dev; -static inline void pci_iounmap(struct pci_dev *dev, void __iomem *addr) -{ } -#endif - #include <asm-generic/pci_iomap.h> #endif diff --git 
a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h index df636c6d8e6c..5a2f9bf53384 100644 --- a/include/asm-generic/pci_iomap.h +++ b/include/asm-generic/pci_iomap.h @@ -18,6 +18,7 @@ extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, unsigned long offset, unsigned long maxlen); +extern void pci_iounmap(struct pci_dev *dev, void __iomem *); /* Create a virtual mapping cookie for a port on a given PCI device. * Do not call this directly, it exists to make it easier for architectures * to override */ @@ -50,6 +51,8 @@ static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, { return NULL; } +static inline void pci_iounmap(struct pci_dev *dev, void __iomem *addr) +{ } #endif #endif /* __ASM_GENERIC_PCI_IOMAP_H */ </cut>

4 years, 9 months

[TCWG CI] 447.dealII:libstdc++.so.6.0.29 grew in size by 12% after gcc: libstdc++: Add floating-point std::to_chars implementation

by ci_notify＠linaro.org

After gcc commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>

libstdc++: Add floating-point std::to_chars implementation

the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII:libstdc++.so.6.0.29 grew in size by 12% from 1245370 to 1391240 bytes

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: arm-linux-gnueabihf
- Compiler flags: -Os -mthumb
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
cd investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 3c57e692357c79ee7623dfc1586652aee2aefb8f
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 5033506993ef92589373270a8e8dbbf50e3ebef1
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 3c57e692357c79ee7623dfc1586652aee2aefb8f Author: Patrick Palka <ppalka(a)redhat.com> Date: Thu Dec 17 23:11:34 2020 -0500 libstdc++: Add floating-point std::to_chars implementation This implements the floating-point std::to_chars overloads for float, double and long double. We use the Ryu library to compute the shortest round-trippable fixed and scientific forms for float, double and long double. We also use Ryu for performing explicit-precision fixed and scientific formatting for float and double. For explicit-precision formatting for long double we fall back to using printf. Hexadecimal formatting for float, double and long double is implemented from scratch. The supported long double binary formats are binary64, binary80 (x86 80-bit extended precision), binary128 and ibm128. Much of the complexity of the implementation is in computing the exact output length before handing it off to Ryu (which doesn't do bounds checking). In some cases it's hard to compute the output length beforehand, so in these cases we instead compute an upper bound on the output length and use a sufficiently-sized intermediate buffer only if necessary. Another source of complexity is in the general-with-precision formatting mode, where we need to do zero-trimming of the string returned by Ryu, and where we also take care to avoid having to format the number through Ryu a second time when the general formatting mode resolves to fixed (which we determine by doing a scientific formatting first and inspecting the scientific exponent). We avoid going through Ryu twice by instead transforming the scientific form to the corresponding fixed form via in-place string manipulation. This implementation is non-conforming in a couple of ways: 1. 
For the shortest hexadecimal formatting, we currently follow the Microsoft implementation's decision to be consistent with the output of printf's '%a' specifier at the expense of sometimes not printing the shortest representation. For example, the shortest hex form for the number 1.08p+0 is 2.1p-1, but we output the former instead of the latter, as does printf. 2. The Ryu routine generic_binary_to_decimal that we use for performing shortest formatting for large floating point types is implemented using the __int128 type, but some targets with a large long double type lack __int128 (e.g. i686), so we can't perform shortest formatting of long double on such targets through Ryu. As a temporary stopgap this patch makes the long double to_chars overloads just dispatch to the double overloads on these targets, which means we lose precision in the output. (We could potentially fix this by writing a specialized version of Ryu's generic_binary_to_decimal routine that uses uint64_t instead of __int128.) [Though I wonder if there's a better way to work around the lack of __int128 on i686 specifically?] 3. Our shortest formatting for __ibm128 doesn't guarantee the round-trip property if the difference between the high- and low-order exponent is large. This is because we treat __ibm128 as if it has a contiguous 105-bit mantissa by merging the mantissas of the high- and low-order parts (using code extracted from glibc), so we potentially lose precision from the low-order part. This seems to be consistent with how glibc printf formats __ibm128. libstdc++-v3/ChangeLog: * config/abi/pre/gnu.ver: Add new exports. * include/std/charconv (to_chars): Declare the floating-point overloads for float, double and long double. * src/c++17/Makefile.am (sources): Add floating_to_chars.cc. * src/c++17/Makefile.in: Regenerate. * src/c++17/floating_to_chars.cc: New file. (to_chars): Define for float, double and long double. * testsuite/20_util/to_chars/long_double.cc: New test. 
--- libstdc++-v3/config/abi/pre/gnu.ver | 7 + libstdc++-v3/include/std/charconv | 24 + libstdc++-v3/src/c++17/Makefile.am | 1 + libstdc++-v3/src/c++17/Makefile.in | 3 +- libstdc++-v3/src/c++17/floating_to_chars.cc | 1563 ++++++++++++++++++++ .../testsuite/20_util/to_chars/long_double.cc | 199 +++ 6 files changed, 1796 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 4b4bd8ab6da..05e0a512247 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -2393,6 +2393,13 @@ GLIBCXX_3.4.29 { # std::once_flag::_M_finish(bool) _ZNSt9once_flag9_M_finishEb; + # std::to_chars(char*, char*, [float|double|long double]) + _ZSt8to_charsPcS_[defg]; + # std::to_chars(char*, char*, [float|double|long double], chars_format) + _ZSt8to_charsPcS_[defg]St12chars_format; + # std::to_chars(char*, char*, [float|double|long double], chars_format, int) + _ZSt8to_charsPcS_[defg]St12chars_formati; + } GLIBCXX_3.4.28; # Symbols in the support library (libsupc++) have their own tag. diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index dd1ebdf8322..b57b0a16db2 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -702,6 +702,30 @@ namespace __detail chars_format __fmt = chars_format::general) noexcept; #endif + // Floating-point std::to_chars + + // Overloads for float. + to_chars_result to_chars(char* __first, char* __last, float __value) noexcept; + to_chars_result to_chars(char* __first, char* __last, float __value, + chars_format __fmt) noexcept; + to_chars_result to_chars(char* __first, char* __last, float __value, + chars_format __fmt, int __precision) noexcept; + + // Overloads for double. 
+ to_chars_result to_chars(char* __first, char* __last, double __value) noexcept; + to_chars_result to_chars(char* __first, char* __last, double __value, + chars_format __fmt) noexcept; + to_chars_result to_chars(char* __first, char* __last, double __value, + chars_format __fmt, int __precision) noexcept; + + // Overloads for long double. + to_chars_result to_chars(char* __first, char* __last, long double __value) + noexcept; + to_chars_result to_chars(char* __first, char* __last, long double __value, + chars_format __fmt) noexcept; + to_chars_result to_chars(char* __first, char* __last, long double __value, + chars_format __fmt, int __precision) noexcept; + _GLIBCXX_END_NAMESPACE_VERSION } // namespace std #endif // C++14 diff --git a/libstdc++-v3/src/c++17/Makefile.am b/libstdc++-v3/src/c++17/Makefile.am index 37cdb53c076..2ec5ed621ca 100644 --- a/libstdc++-v3/src/c++17/Makefile.am +++ b/libstdc++-v3/src/c++17/Makefile.am @@ -51,6 +51,7 @@ endif sources = \ floating_from_chars.cc \ + floating_to_chars.cc \ fs_dir.cc \ fs_ops.cc \ fs_path.cc \ diff --git a/libstdc++-v3/src/c++17/Makefile.in b/libstdc++-v3/src/c++17/Makefile.in index ccae721ab3f..9b36b7a916c 100644 --- a/libstdc++-v3/src/c++17/Makefile.in +++ b/libstdc++-v3/src/c++17/Makefile.in @@ -124,7 +124,7 @@ LTLIBRARIES = $(noinst_LTLIBRARIES) libc__17convenience_la_LIBADD = @ENABLE_DUAL_ABI_TRUE@am__objects_1 = cow-fs_dir.lo cow-fs_ops.lo \ @ENABLE_DUAL_ABI_TRUE@ cow-fs_path.lo -am__objects_2 = floating_from_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \ +am__objects_2 = floating_from_chars.lo floating_to_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \ memory_resource.lo $(am__objects_1) @ENABLE_DUAL_ABI_TRUE@am__objects_3 = cow-string-inst.lo @ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_4 = ostream-inst.lo \ @@ -440,6 +440,7 @@ headers = sources = \ floating_from_chars.cc \ + floating_to_chars.cc \ fs_dir.cc \ fs_ops.cc \ fs_path.cc \ diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc 
b/libstdc++-v3/src/c++17/floating_to_chars.cc new file mode 100644 index 00000000000..dd83f5eea93 --- /dev/null +++ b/libstdc++-v3/src/c++17/floating_to_chars.cc @@ -0,0 +1,1563 @@ +// std::to_chars implementation for floating-point types -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +// Activate __glibcxx_assert within this file to shake out any bugs. +#define _GLIBCXX_ASSERTIONS 1 + +#include <charconv> + +#include <bit> +#include <cfenv> +#include <cassert> +#include <cmath> +#include <cstdio> +#include <cstring> +#include <langinfo.h> +#include <optional> +#include <string_view> +#include <type_traits> + +// Determine the binary format of 'long double'. + +// We support the binary64, float80 (i.e. x86 80-bit extended precision), +// binary128, and ibm128 formats. 
+#define LDK_UNSUPPORTED 0 +#define LDK_BINARY64 1 +#define LDK_FLOAT80 2 +#define LDK_BINARY128 3 +#define LDK_IBM128 4 + +#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ +# define LONG_DOUBLE_KIND LDK_BINARY64 +#elif defined(__SIZEOF_INT128__) +// The Ryu routines need a 128-bit integer type in order to do shortest +// formatting of types larger than 64-bit double, so without __int128 we can't +// support any large long double format. This is the case for e.g. i386. +# if __LDBL_MANT_DIG__ == 64 +# define LONG_DOUBLE_KIND LDK_FLOAT80 +# elif __LDBL_MANT_DIG__ == 113 +# define LONG_DOUBLE_KIND LDK_BINARY128 +# elif __LDBL_MANT_DIG__ == 106 +# define LONG_DOUBLE_KIND LDK_IBM128 +# endif +#endif +#if !defined(LONG_DOUBLE_KIND) +# define LONG_DOUBLE_KIND LDK_UNSUPPORTED +#endif + +namespace +{ + namespace ryu + { +#include "ryu/common.h" +#include "ryu/digit_table.h" +#include "ryu/d2s_intrinsics.h" +#include "ryu/d2s_full_table.h" +#include "ryu/d2fixed_full_table.h" +#include "ryu/f2s_intrinsics.h" +#include "ryu/d2s.c" +#include "ryu/d2fixed.c" +#include "ryu/f2s.c" + +#ifdef __SIZEOF_INT128__ + namespace generic128 + { + // Put the generic Ryu bits in their own namespace to avoid name conflicts. +# include "ryu/generic_128.h" +# include "ryu/ryu_generic_128.h" +# include "ryu/generic_128.c" + } // namespace generic128 + + using generic128::floating_decimal_128; + using generic128::generic_binary_to_decimal; + + int + to_chars(const floating_decimal_128 v, char* const result) + { return generic128::generic_to_chars(v, result); } +#endif + } // namespace ryu + + // A traits class that contains pertinent information about the binary + // format of each of the floating-point types we support. + template<typename T> + struct floating_type_traits + { }; + + template<> + struct floating_type_traits<float> + { + // We (and Ryu) assume float has the IEEE binary32 format. 
+ static_assert(__FLT_MANT_DIG__ == 24); + static constexpr int mantissa_bits = 23; + static constexpr int exponent_bits = 8; + static constexpr bool has_implicit_leading_bit = true; + using mantissa_t = uint32_t; + using shortest_scientific_t = ryu::floating_decimal_32; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000011101011100110101100101101110000000000000000000000000 }; + }; + + template<> + struct floating_type_traits<double> + { + // We (and Ryu) assume double has the IEEE binary64 format. + static_assert(__DBL_MANT_DIG__ == 53); + static constexpr int mantissa_bits = 52; + static constexpr int exponent_bits = 11; + static constexpr bool has_implicit_leading_bit = true; + using mantissa_t = uint64_t; + using shortest_scientific_t = ryu::floating_decimal_64; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000000000000000011000110101110111000001100101110000111100, + 0b0111100011110101011000011110000000110110010101011000001110011111, + 0b0101101100000000011100100100111100110110110100010001010101110000, + 0b0011110010111000101111110101100011101100010001010000000101100111, + 0b0001010000011001011100100001010000010101101000001101000000000000 }; + }; + +#if LONG_DOUBLE_KIND == LDK_BINARY64 + // When long double is equivalent to double, we just forward the long double + // overloads to the double overloads, so we don't need to define a a + // floating_type_traits<long double> specialization in this case. 
+#elif LONG_DOUBLE_KIND == LDK_FLOAT80 + template<> + struct floating_type_traits<long double> + { + static constexpr int mantissa_bits = 64; + static constexpr int exponent_bits = 15; + static constexpr bool has_implicit_leading_bit = false; + using mantissa_t = uint64_t; + using shortest_scientific_t = ryu::floating_decimal_128; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000000000000000000000110101011111110100010100110000011101, + 0b1001100101001111010011011111101000101111110001011001011101110000, + 0b0000101111111011110010001000001010111101011110111111010100011001, + 0b0011100000011111001101101011111001111100100010000101001111101001, + 0b0100100100000000100111010010101110011000110001101101110011001010, + 0b0111100111100010100000010011000010010110101111110101000011110100, + 0b1010100111100010011110000011011101101100010110000110101010101010, + 0b0000001111001111000000101100111011011000101000110011101100110010, + 0b0111000011100100101101010100001101111110101111001000010011111111, + 0b0010111000100110100100100010101100111010110001101010010111001000, + 0b0000100000010110000011001001000111000001111010100101101000001111, + 0b0010101011101000111100001011000010011101000101010010010000101111, + 0b1011111011101101110010101011010001111000101000101101011001100011, + 0b1010111011011011110111110011001010000010011001110100101101000101, + 0b0011000001110110011010010000011100100011001011001100001101010110, + 0b0100011111011000111111101000011110000010111110101001000000001001, + 0b1110000001110001001101101110011000100000001010000111100010111010, + 0b1110001001010011101000111000001000010100110000010110100011110000, + 0b0000011010110000110001111000011111000011001101001101001001000110, + 0b1010010111001000101001100101010110100100100010010010000101000010, + 0b1011001110000111100010100110000011100011111001110111001100000101, + 0b0110101001001000010110001000010001010101110101100001111100011001, + 0b1111100011110101011110011010101001010010100011000010110001101001, 
+ 0b0100000100001000111101011100010011011111011001000000001100011000, + 0b1110111111000111100101110111110000000011001110011100011011011001, + 0b1100001100100000010001100011011000111011110000110011010101000011, + 0b1111111011100111011101001111111000010000001111010111110010000100, + 0b1110111001111110101111000101000000001010001110011010001000111010, + 0b1000010001011000101111111010110011111101110101101001111000111010, + 0b0100000111101001000111011001101000001010111011101001101111000100, + 0b0000011100110001000111011100111100110001101111111010110111100000, + 0b0000011101011100100110010011110101010100010011110010010111010000, + 0b0011011001100111110101111100001001101110101101001110110011110110, + 0b1011000101000001110100111001100100111100110011110000000001101000, + 0b1011100011110100001001110101010110111001000000001011101001011110, + 0b1111001010010010100000010110101010101011101000101000000000001100, + 0b1000001111100100111001110101100001010011111111000001000011110000, + 0b0001011101001000010000101101111000001110101100110011001100110111, + 0b1110011100000010101011011111001010111101111110100000011100000011, + 0b1001110110011100101010011110100010110001001110110000101011100110, + 0b1001101000100011100111010000011011100001000000110101100100001001, + 0b1010111000101000101101010111000010001100001010100011111100000100, + 0b0111101000100011000101101011111011100010001101110111001111001011, + 0b1110100111010110001110110110000000010110100011110000010001111100, + 0b1100010100011010001011001000111001010101011110100101011001000000, + 0b0000110001111001100110010110111010101101001101000000000010010101, + 0b0001110111101000001111101010110010010000111110111100000111110100, + 0b0111110111001001111000110001101101001010101110110101111110000100, + 0b0000111110111010101111100010111010011100010110011011011001000001, + 0b1010010100100100101110111111111000101100000010111111101101000110, + 0b1000100111111101100011001101000110001000000100010101010100001101, + 
0b1100101010101000111100101100001000110001110010100000000010110101, + 0b1010000100111101100100101010010110100010000000110101101110000100, + 0b1011111011110001110000100100000000001010111010001101100000100100, + 0b0111101101100011001110011100000001000101101101111000100111011111, + 0b0100111010010011011001010011110100001100111010010101111111100011, + 0b0010001001011000111000001100110111110111110010100011000110110110, + 0b0101010110000000010000100000110100111011111101000100000111010010, + 0b0110000011011101000001010100110101101110011100110101000000001001, + 0b1101100110100000011000001111000100100100110001100110101010101100, + 0b0010100101010110010010001010101000011111111111001011001010001111, + 0b0111001010001111001100111001010101001000110101000011110000001000, + 0b0110010011001001001111110001010010001011010010001101110110110011, + 0b0110010100111011000100111000001001101011111001110010111110111111, + 0b0101110111001001101100110100101001110010101110011001101110001000, + 0b0100110101010111011010001100010111100011010011111001010100111000, + 0b0111000110110111011110100100010111000110000110110110110001111110, + 0b1000101101010100100100111110100011110110110010011001110011110101, + 0b1001101110101001010100111101101011000101000010110101101111110000, + 0b0100100101001011011001001011000010001101001010010001010110101000, + 0b0010100001001011100110101000010110000111000111000011100101011011, + 0b0110111000011001111101101011111010001000000010101000101010011110, + 0b1000110110100001111011000001111100001001000000010110010100100100, + 0b1001110100011111100111101011010000010101011100101000010010100110, + 0b0001010110101110100010101010001110110110100011101010001001111100, + 0b1010100101101100000010110011100110100010010000100100001110000100, + 0b0001000000010000001010000010100110000001110100111001110111101101, + 0b1100000000000000000000000000000000000000000000000000000000000000 }; + }; +#elif LONG_DOUBLE_KIND == LDK_BINARY128 + template<> + struct floating_type_traits<long double> + { + 
static constexpr int mantissa_bits = 112; + static constexpr int exponent_bits = 15; + static constexpr bool has_implicit_leading_bit = true; + using mantissa_t = unsigned __int128; + using shortest_scientific_t = ryu::floating_decimal_128; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000000000000000000000000000000000000000000100000010000000, + 0b1011001111110100000100010101101110011100100110000110010110011000, + 0b1010100010001101111111000000001101010010100010010000111011110111, + 0b1011111001110001111000011111000010110111000111110100101010100101, + 0b0110100110011110011011000011000010011001110001001001010011100011, + 0b0000011111110010101111101011101010000110011111100111001110100111, + 0b0100010101010110000010111011110100000010011001001010001110111101, + 0b1101110111000010001101100000110100000111001001101011000101011011, + 0b0100111011101101010000001101011000101100101110010010110000101011, + 0b0100000110111000000110101000010011101000110100010110000011101101, + 0b1011001101001000100001010001100100001111011101010101110001010110, + 0b1000000001000000101001110010110010001111101101010101001100000110, + 0b0101110110100110000110000001001010111110001110010000111111010011, + 0b1010001111100111000100011100100100111100100101000001011001000111, + 0b1010011000011100110101100111001011100101111111100001110100000100, + 0b1100011100100010100000110001001010000000100000001001010111011101, + 0b0101110000100011001111101101000000100110000010010111010001111010, + 0b0100111100011010110111101000100110000111001001101100000001111100, + 0b1100100100111110101011000100000101011010110111000111110100110101, + 0b0110010000010111010100110011000000111010000010111011010110000100, + 0b0101001001010010110111010111000101011100000111100111000001110010, + 0b1101111111001011101010110001000111011010111101001011010110100100, + 0b0001000100110000011111101011001101110010110110010000000011100100, + 0b0001000000000101001001001000000000011000100011001110101001001110, + 
0b0010010010001000111010011011100001000110011011011110110100111000, + 0b0000100110101100000111100010100100011100110111011100001111001100, + 0b1011111010001110001100000011110111111111100000001011111111101100, + 0b0000011100001111010101110000100110111100101101110111101001000001, + 0b1100010001110110111100001001001101101000011100000010110101001011, + 0b0100101001101011111001011110101101100011011111011100101010101111, + 0b0001101001111001110000101101101100001011010001011110011101000010, + 0b1111000000101001101111011010110011101110100001011011001011100010, + 0b0101001010111101101100001111100010010110001101001000001101100100, + 0b0101100101011110001100101011111000111001111001001001101101100001, + 0b1111001101010010100100011011000110110010001111000111010001001101, + 0b0001110010011000000001000110110111011000011100001000011001110111, + 0b0100001011011011011011110011101100100101111111101100101000001110, + 0b0101011110111101010111100111101111000101111111111110100011011010, + 0b1110101010001001110100000010110111010111111010111110100110010110, + 0b1010001111100001001100101000110100001100011100110010000011010111, + 0b1111111101101111000100111100000101011000001110011011101010111001, + 0b1111101100001110100101111101011001000100000101110000110010100011, + 0b1001010110110101101101000101010001010000101011011111010011010000, + 0b0111001110110011101001100111000001000100001010110000010000001101, + 0b0101111100111110100111011001111001111011011110010111010011101010, + 0b1110111000000001100100111001100100110001011011001110101111110111, + 0b0001010001001101010111101010011111000011110001101101011001111111, + 0b0101000011100011010010001101100001011101011010100110101100100010, + 0b0001000101011000100101111100110110000101101101111000110001001011, + 0b0101100101001011011000010101000000010100011100101101000010011111, + 0b1000010010001011101001011010100010111011110100110011011000100111, + 0b1000011011100001010111010111010011101100100010010010100100101001, + 
0b1001001001010111110101000010111010000000101111010100001010010010, + 0b0011011110110010010101111011000001000000000011011111000011111011, + 0b1011000110100011001110000001000100000001011100010111010010011110, + 0b0111101110110101110111110000011000000100011100011000101101101110, + 0b1001100101111011011100011110101011001111100111101010101010110111, + 0b1100110010010001100011001111010000000100011101001111011101001111, + 0b1000111001111010100101000010000100000001001100101010001011001101, + 0b0011101011110000110010100101010100110010100001000010101011111101, + 0b1100000000000110000010101011000000011101000110011111100010111111, + 0b0010100110000011011100010110111100010110101100110011101110001101, + 0b0010111101010011111000111001111100110111111100100011110001101110, + 0b1001110111001001101001001001011000010100110001000000100011010110, + 0b0011110101100111011011111100001000011001010100111100100101111010, + 0b0010001101000011000010100101110000010101101000100110000100001010, + 0b0010000010100110010101100101110011101111000111111111001001100001, + 0b0100111111011011011011100111111011000010011101101111011111110110, + 0b1111111111010110101011101000100101110100001110001001101011100111, + 0b1011111101000101110000111100100010111010100001010000010010110010, + 0b1111010101001011101011101010000100110110001110111100100110111111, + 0b1011001101000001001101000010101010010110010001100001011100011010, + 0b0101001011011101010001110100010000010001111100100100100001001101, + 0b0010100000111001100011000101100101000001111100111001101000000010, + 0b1011001111010101011001000100100110100100110111110100000110111000, + 0b0101011111010011100011010010111101110010100001111111100010001001, + 0b0010111011101100100000000000001111111010011101100111100001001101, + 0b1101000000000000000000000000000000000000000000000000000000000000 }; + }; +#elif LONG_DOUBLE_KIND == LDK_IBM128 + template<> + struct floating_type_traits<long double> + { + static constexpr int mantissa_bits = 105; + static constexpr int 
exponent_bits = 11; + static constexpr bool has_implicit_leading_bit = true; + using mantissa_t = unsigned __int128; + using shortest_scientific_t = ryu::floating_decimal_128; + + static constexpr uint64_t pow10_adjustment_tab[] + = { 0b0000000000000000000000000000000000000000000000001000000100000000, + 0b0000000000000000000100000000000000000000001000000000000000000010, + 0b0000100000000000000000001001000000000000000001100100000000000000, + 0b0011000000000000000000000000000001110000010000000000000000000000, + 0b0000100000000000001000000000000000000000000000100000000000000000 }; + }; +#endif + + // An IEEE-style decomposition of a floating-point value of type T. + template<typename T> + struct ieee_t + { + typename floating_type_traits<T>::mantissa_t mantissa; + uint32_t biased_exponent; + bool sign; + }; + + // Decompose the floating-point value into its IEEE components. + template<typename T> + ieee_t<T> + get_ieee_repr(const T value) + { + constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits; + constexpr int exponent_bits = floating_type_traits<T>::exponent_bits; + constexpr int total_bits = mantissa_bits + exponent_bits + 1; + + constexpr auto get_uint_t = [] { + if constexpr (total_bits <= 32) + return uint32_t{}; + else if constexpr (total_bits <= 64) + return uint64_t{}; +#ifdef __SIZEOF_INT128__ + else if constexpr (total_bits <= 128) + return (unsigned __int128){}; +#endif + }; + using uint_t = decltype(get_uint_t()); + uint_t value_bits = 0; + memcpy(&value_bits, &value, sizeof(value)); + + ieee_t<T> ieee_repr; + ieee_repr.mantissa = value_bits & ((uint_t{1} << mantissa_bits) - 1u); + ieee_repr.biased_exponent + = (value_bits >> mantissa_bits) & ((uint_t{1} << exponent_bits) - 1u); + ieee_repr.sign = (value_bits >> (mantissa_bits + exponent_bits)) & 1; + return ieee_repr; + } + +#if LONG_DOUBLE_KIND == LDK_IBM128 + template<> + ieee_t<long double> + get_ieee_repr(const long double value) + { + // The layout of __ibm128 isn't compatible with 
the standard IEEE format. + // So we transform it into an IEEE-compatible format, suitable for + // consumption by the generic Ryu API, with an 11-bit exponent and 105-bit + // mantissa (plus an implicit leading bit). We use the exponent and sign + // of the high part, and we merge the mantissa of the high part with the + // mantissa (and the implicit leading bit) of the low part. + using uint_t = unsigned __int128; + uint_t value_bits = 0; + memcpy(&value_bits, &value, sizeof(value_bits)); + + const uint64_t value_hi = value_bits; + const uint64_t value_lo = value_bits >> 64; + + uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1); + unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1); + const int sign_hi = (value_hi >> 63) & 1; + + uint64_t mantissa_lo = value_lo & ((1ull << 52) - 1); + const unsigned exponent_lo = (value_lo >> 52) & ((1ull << 11) - 1); + const int sign_lo = (value_lo >> 63) & 1; + + { + // The following code for adjusting the low-part mantissa to combine + // it with the high-part mantissa is taken from the glibc source file + // sysdeps/ieee754/ldbl-128ibm/printf_fphex.c. 
+ mantissa_lo <<= 7; + if (exponent_lo != 0) + mantissa_lo |= (1ull << (52 + 7)); + else + mantissa_lo <<= 1; + + const int ediff = exponent_hi - exponent_lo - 53; + if (ediff > 63) + mantissa_lo = 0; + else if (ediff > 0) + mantissa_lo >>= ediff; + else if (ediff < 0) + mantissa_lo <<= -ediff; + + if (sign_lo != sign_hi && mantissa_lo != 0) + { + mantissa_lo = (1ull << 60) - mantissa_lo; + if (mantissa_hi == 0) + { + mantissa_hi = 0xffffffffffffeLL | (mantissa_lo >> 59); + mantissa_lo = 0xfffffffffffffffLL & (mantissa_lo << 1); + exponent_hi--; + } + else + mantissa_hi--; + } + } + + ieee_t<long double> ieee_repr; + ieee_repr.mantissa = ((uint_t{mantissa_hi} << 64) + | (uint_t{mantissa_lo} << 4)) >> 11; + ieee_repr.biased_exponent = exponent_hi; + ieee_repr.sign = sign_hi; + return ieee_repr; + } +#endif + + // Invoke Ryu to obtain the shortest scientific form for the given + // floating-point number. + template<typename T> + typename floating_type_traits<T>::shortest_scientific_t + floating_to_shortest_scientific(const T value) + { + if constexpr (std::is_same_v<T, float>) + return ryu::floating_to_fd32(value); + else if constexpr (std::is_same_v<T, double>) + return ryu::floating_to_fd64(value); +#ifdef __SIZEOF_INT128__ + else if constexpr (std::is_same_v<T, long double>) + { + constexpr int mantissa_bits + = floating_type_traits<T>::mantissa_bits; + constexpr int exponent_bits + = floating_type_traits<T>::exponent_bits; + constexpr bool has_implicit_leading_bit + = floating_type_traits<T>::has_implicit_leading_bit; + + const auto [mantissa, exponent, sign] = get_ieee_repr(value); + return ryu::generic_binary_to_decimal(mantissa, exponent, sign, + mantissa_bits, exponent_bits, + !has_implicit_leading_bit); + } +#endif + } + + // This subroutine returns true if the shortest scientific form fd is a + // positive power of 10, and the floating-point number that has this shortest + // scientific form is smaller than this power of 10. 
+ // + // For instance, the exactly-representable 64-bit number + // 99999999999999991611392.0 has the shortest scientific form 1e23, so its + // exact value is smaller than its shortest scientific form. + // + // For these powers of 10 the length of the fixed form is one digit less + // than what the scientific exponent suggests. + // + // This subroutine inspects a lookup table to detect when fd is such a + // "rounded up" power of 10. + template<typename T> + bool + is_rounded_up_pow10_p(const typename + floating_type_traits<T>::shortest_scientific_t fd) + { + if (fd.exponent < 0 || fd.mantissa != 1) [[likely]] + return false; + + constexpr auto& pow10_adjustment_tab + = floating_type_traits<T>::pow10_adjustment_tab; + __glibcxx_assert(fd.exponent/64 < (int)std::size(pow10_adjustment_tab)); + return (pow10_adjustment_tab[fd.exponent/64] + & (1ull << (63 - fd.exponent%64))); + } + + int + get_mantissa_length(const ryu::floating_decimal_32 fd) + { return ryu::decimalLength9(fd.mantissa); } + + int + get_mantissa_length(const ryu::floating_decimal_64 fd) + { return ryu::decimalLength17(fd.mantissa); } + +#ifdef __SIZEOF_INT128__ + int + get_mantissa_length(const ryu::floating_decimal_128 fd) + { return ryu::generic128::decimalLength(fd.mantissa); } +#endif +} // anon namespace + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + +// This subroutine of __floating_to_chars_* handles writing nan, inf and 0 in +// all formatting modes. 
+template<typename T> + static optional<to_chars_result> + __handle_special_value(char* first, char* const last, const T value, + const chars_format fmt, const int precision) + { + __glibcxx_assert(precision >= 0); + + string_view str; + switch (__builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, + FP_ZERO, value)) + { + case FP_INFINITE: + str = "-inf"; + break; + + case FP_NAN: + str = "-nan"; + break; + + case FP_ZERO: + break; + + default: + case FP_SUBNORMAL: + case FP_NORMAL: [[likely]] + return nullopt; + } + + if (!str.empty()) + { + // We're formatting +-inf or +-nan. + if (!__builtin_signbit(value)) + str.remove_prefix(strlen("-")); + + if (last - first < (int)str.length()) + return {{last, errc::value_too_large}}; + + memcpy(first, &str[0], str.length()); + first += str.length(); + return {{first, errc{}}}; + } + + // We're formatting 0. + __glibcxx_assert(value == 0); + const auto orig_first = first; + const bool sign = __builtin_signbit(value); + int expected_output_length; + switch (fmt) + { + case chars_format::fixed: + case chars_format::scientific: + case chars_format::hex: + expected_output_length = sign + 1; + if (precision) + expected_output_length += strlen(".") + precision; + if (fmt == chars_format::scientific) + expected_output_length += strlen("e+00"); + else if (fmt == chars_format::hex) + expected_output_length += strlen("p+0"); + if (last - first < expected_output_length) + return {{last, errc::value_too_large}}; + + if (sign) + *first++ = '-'; + *first++ = '0'; + if (precision) + { + *first++ = '.'; + memset(first, '0', precision); + first += precision; + } + if (fmt == chars_format::scientific) + { + memcpy(first, "e+00", 4); + first += 4; + } + else if (fmt == chars_format::hex) + { + memcpy(first, "p+0", 3); + first += 3; + } + break; + + case chars_format::general: + default: // case chars_format{}: + expected_output_length = sign + 1; + if (last - first < expected_output_length) + return {{last, 
errc::value_too_large}}; + + if (sign) + *first++ = '-'; + *first++ = '0'; + break; + } + __glibcxx_assert(first - orig_first == expected_output_length); + return {{first, errc{}}}; + } + +// This subroutine of the floating-point to_chars overloads performs +// hexadecimal formatting. +template<typename T> + static to_chars_result + __floating_to_chars_hex(char* first, char* const last, const T value, + const optional<int> precision) + { + if (precision.has_value() && precision.value() < 0) [[unlikely]] + // A negative precision argument is treated as if it were omitted. + return __floating_to_chars_hex(first, last, value, nullopt); + + __glibcxx_requires_valid_range(first, last); + + constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits; + constexpr bool has_implicit_leading_bit + = floating_type_traits<T>::has_implicit_leading_bit; + constexpr int exponent_bits = floating_type_traits<T>::exponent_bits; + constexpr int exponent_bias = (1u << (exponent_bits - 1)) - 1; + using mantissa_t = typename floating_type_traits<T>::mantissa_t; + constexpr int mantissa_t_width = sizeof(mantissa_t) * __CHAR_BIT__; + + if (auto result = __handle_special_value(first, last, value, + chars_format::hex, + precision.value_or(0))) + return *result; + + // Extract the sign, mantissa and exponent from the value. + const auto [ieee_mantissa, biased_exponent, sign] = get_ieee_repr(value); + const bool is_normal_number = (biased_exponent != 0); + + // Calculate the unbiased exponent. + const int32_t unbiased_exponent = (is_normal_number + ? biased_exponent - exponent_bias + : 1 - exponent_bias); + + // Shift the mantissa so that its bitwidth is a multiple of 4. 
+ constexpr unsigned rounded_mantissa_bits = (mantissa_bits + 3) / 4 * 4; + static_assert(mantissa_t_width >= rounded_mantissa_bits); + mantissa_t effective_mantissa + = ieee_mantissa << (rounded_mantissa_bits - mantissa_bits); + if (is_normal_number) + { + if constexpr (has_implicit_leading_bit) + // Restore the mantissa's implicit leading bit. + effective_mantissa |= mantissa_t{1} << rounded_mantissa_bits; + else + // The explicit mantissa bit should already be set. + __glibcxx_assert(effective_mantissa & (mantissa_t{1} << (mantissa_bits + - 1u))); + } + + // Compute the shortest precision needed to print this value exactly, + // disregarding trailing zeros. + constexpr int full_hex_precision = (has_implicit_leading_bit + ? (mantissa_bits + 3) / 4 + // With an explicit leading bit, we + // use the four leading nibbles as the + // hexit before the decimal point. + : (mantissa_bits - 4 + 3) / 4); + const int trailing_zeros = __countr_zero(effective_mantissa) / 4; + const int shortest_full_precision = full_hex_precision - trailing_zeros; + __glibcxx_assert(shortest_full_precision >= 0); + + int written_exponent = unbiased_exponent; + const int effective_precision = precision.value_or(shortest_full_precision); + if (effective_precision < shortest_full_precision) + { + // When limiting the precision, we need to determine how to round the + // least significant printed hexit. The following branchless + // bit-level-parallel technique computes whether to round up the + // mantissa bit at index N (according to round-to-nearest rules) when + // dropping N bits of precision, for each index N in the bit vector. + // This technique is borrowed from the MSVC implementation. 
+ using bitvec = mantissa_t; + const bitvec round_bit = effective_mantissa << 1; + const bitvec has_tail_bits = round_bit - 1; + const bitvec lsb_bit = effective_mantissa; + const bitvec should_round = round_bit & (has_tail_bits | lsb_bit); + + const int dropped_bits = 4*(full_hex_precision - effective_precision); + // Mask out the dropped nibbles. + effective_mantissa >>= dropped_bits; + effective_mantissa <<= dropped_bits; + if (should_round & (mantissa_t{1} << dropped_bits)) + { + // Round up the least significant nibble. + effective_mantissa += mantissa_t{1} << dropped_bits; + // Check and adjust for overflow of the leading nibble. When the + // type has an implicit leading bit, then the leading nibble + // before rounding is either 0 or 1, so it can't overflow. + if constexpr (!has_implicit_leading_bit) + { + // The only supported floating-point type with explicit + // leading mantissa bit is LDK_FLOAT80, i.e. x86 80-bit + // extended precision, and so we hardcode the below overflow + // check+adjustment for this type. + static_assert(mantissa_t_width == 64 + && rounded_mantissa_bits == 64); + if (effective_mantissa == 0) + { + // We rounded up the least significant nibble and the + // mantissa overflowed, e.g f.fcp+10 with precision=1 + // became 10.0p+10. Absorb this extra hexit into the + // exponent to obtain 1.0p+14. + effective_mantissa + = mantissa_t{1} << (rounded_mantissa_bits - 4); + written_exponent += 4; + } + } + } + } + + // Compute the leading hexit and mask it out from the mantissa. 
+ char leading_hexit; + if constexpr (has_implicit_leading_bit) + { + const unsigned nibble = effective_mantissa >> rounded_mantissa_bits; + __glibcxx_assert(nibble <= 2); + leading_hexit = '0' + nibble; + effective_mantissa &= ~(mantissa_t{0b11} << rounded_mantissa_bits); + } + else + { + const unsigned nibble = effective_mantissa >> (rounded_mantissa_bits-4); + __glibcxx_assert(nibble < 16); + leading_hexit = "0123456789abcdef"[nibble]; + effective_mantissa &= ~(mantissa_t{0b1111} << (rounded_mantissa_bits-4)); + written_exponent -= 3; + } + + // Now before we start writing the string, determine the total length of + // the output string and perform a single bounds check. + int expected_output_length = sign + 1; + if (effective_precision != 0) + expected_output_length += strlen(".") + effective_precision; + const int abs_written_exponent = abs(written_exponent); + expected_output_length += (abs_written_exponent >= 10000 ? strlen("p+ddddd") + : abs_written_exponent >= 1000 ? strlen("p+dddd") + : abs_written_exponent >= 100 ? strlen("p+ddd") + : abs_written_exponent >= 10 ? strlen("p+dd") + : strlen("p+d")); + if (last - first < expected_output_length) + return {last, errc::value_too_large}; + + const auto saved_first = first; + // Write the negative sign and the leading hexit. + if (sign) + *first++ = '-'; + *first++ = leading_hexit; + + if (effective_precision > 0) + { + *first++ = '.'; + int written_hexits = 0; + // Extract and mask out the leading nibble after the decimal point, + // write its corresponding hexit, and repeat until the mantissa is + // empty. + int nibble_offset = rounded_mantissa_bits; + if constexpr (!has_implicit_leading_bit) + // We already printed the entire leading hexit. 
+ nibble_offset -= 4; + while (effective_mantissa != 0) + { + nibble_offset -= 4; + const unsigned nibble = effective_mantissa >> nibble_offset; + __glibcxx_assert(nibble < 16); + *first++ = "0123456789abcdef"[nibble]; + ++written_hexits; + effective_mantissa &= ~(mantissa_t{0b1111} << nibble_offset); + } + __glibcxx_assert(nibble_offset >= 0); + __glibcxx_assert(written_hexits <= effective_precision); + // Since the mantissa is now empty, every hexit hereafter must be '0'. + if (int remaining_hexits = effective_precision - written_hexits) + { + memset(first, '0', remaining_hexits); + first += remaining_hexits; + } + } + + // Finally, write the exponent. + *first++ = 'p'; + if (written_exponent >= 0) + *first++ = '+'; + const to_chars_result result = to_chars(first, last, written_exponent); + __glibcxx_assert(result.ec == errc{} + && result.ptr == saved_first + expected_output_length); + return result; + } + +template<typename T> + static to_chars_result + __floating_to_chars_shortest(char* first, char* const last, const T value, + chars_format fmt) + { + if (fmt == chars_format::hex) + return __floating_to_chars_hex(first, last, value, nullopt); + + __glibcxx_assert(fmt == chars_format::fixed + || fmt == chars_format::scientific </cut>
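The get_ieee_repr helper in the patch above splits a floating-point value into its stored mantissa, biased exponent and sign. It does this by copying the bits into an unsigned integer and masking. Below is a minimal sketch of the same masking scheme for double alone. It assumes double is IEEE binary64, which the patch also asserts. The names decompose_double and Binary64Parts are illustrative only and are not part of libstdc++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative (not libstdc++) decomposition of an IEEE binary64 value.
struct Binary64Parts {
  uint64_t mantissa;         // the 52 stored fraction bits, no implicit bit
  uint32_t biased_exponent;  // 11 bits; 0 means zero or subnormal
  bool sign;
};

Binary64Parts decompose_double(double value) {
  constexpr int mantissa_bits = 52;
  constexpr int exponent_bits = 11;
  static_assert(sizeof(double) == sizeof(uint64_t),
                "sketch assumes a 64-bit IEEE double");
  uint64_t bits = 0;
  std::memcpy(&bits, &value, sizeof value);  // well-defined type punning
  Binary64Parts parts;
  parts.mantissa = bits & ((uint64_t{1} << mantissa_bits) - 1);
  parts.biased_exponent = static_cast<uint32_t>(
      (bits >> mantissa_bits) & ((uint64_t{1} << exponent_bits) - 1));
  parts.sign = (bits >> (mantissa_bits + exponent_bits)) & 1;
  return parts;
}
```

The hex formatter in the patch then derives the unbiased exponent. For normal numbers it is biased_exponent - 1023, and for subnormals it is 1 - 1023.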

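The comment on is_rounded_up_pow10_p notes that the double nearest to 1e23 is exactly 99999999999999991611392.0. Its fixed form therefore has one digit fewer than the scientific exponent suggests. The sketch below demonstrates this with snprintf. fixed_integer_digits is a hypothetical helper. The sketch assumes a C library that prints the exact decimal expansion for "%.0f", as glibc does.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: number of characters "%.0f" produces for a
// non-negative double with no fractional part, i.e. the digit count of its
// exact value (assuming the C library prints exact digits, as glibc does).
int fixed_integer_digits(double value) {
  char buf[400];  // DBL_MAX needs 309 digits, so this is large enough
  return std::snprintf(buf, sizeof buf, "%.0f", value);
}
```

Since 1e21 and 1e22 are exactly representable, they print with 22 and 23 digits. 1e23 also prints with 23 digits rather than 24, because the stored double lies just below 10^23. Ryu's shortest form for it is still 1e23. This is why the patch consults a lookup table instead of trusting the scientific exponent alone.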

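When the hex formatter limits precision, it uses a branchless bit-vector trick, which the commit credits to the MSVC implementation. The trick computes, for every bit position N at once, whether dropping the low N bits should round up. Working through the expression shows that it rounds to nearest with ties to even. The sketch below isolates that trick on a plain uint64_t and checks it against a straightforward reference. Both function names are hypothetical. The patch masks the dropped nibbles in place, whereas this version returns the shifted-down result.

```cpp
#include <cassert>
#include <cstdint>

// Drops the low n bits of m (requires 0 <= n < 64) and rounds to nearest,
// ties to even, using the same bit-parallel expression as the patch.
uint64_t drop_bits_round_nearest_even(uint64_t m, int n) {
  // Bit k of round_bit is m's bit k-1: the highest bit dropped when
  // dropping k bits.
  const uint64_t round_bit = m << 1;
  // Where round_bit has bit k set, bit k of round_bit - 1 stays set only if
  // some lower bit of round_bit is set, i.e. the bits of m below the round
  // bit (the "sticky" tail) are nonzero.
  const uint64_t has_tail_bits = round_bit - 1;
  // Bit k of m is the lowest kept bit when dropping k bits.
  const uint64_t lsb_bit = m;
  // Round up iff the round bit is set and either the tail is nonzero or
  // the kept value is odd (an exact tie goes to even).
  const uint64_t should_round = round_bit & (has_tail_bits | lsb_bit);
  return (m >> n) + ((should_round >> n) & 1);
}

// Straightforward reference implementation for comparison.
uint64_t drop_bits_reference(uint64_t m, int n) {
  if (n == 0) return m;
  const uint64_t kept = m >> n;
  const uint64_t rem = m & ((uint64_t{1} << n) - 1);
  const uint64_t half = uint64_t{1} << (n - 1);
  if (rem > half || (rem == half && (kept & 1) != 0)) return kept + 1;
  return kept;
}
```

The patch only ever tests positions that are multiples of 4, because it drops whole hexits. Because should_round is computed once, any drop count can be tested with a single shift and mask.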
[TCWG CI] Regression caused by llvm: Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""

by ci_notify＠linaro.org

After llvm commit 8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614
Author: Jun Ma <JunMa(a)linux.alibaba.com>

    Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""

the following benchmarks grew in size by more than 1%:
- 401.bzip2 grew in size by 4% from 36214 to 37510 bytes
  - [.] BZ2_decompress grew in size by 19% from 7260 to 8660 bytes

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_mthumb artifacts/build-8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614/results_id: 1
from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
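In the commit below, processSwitch records the smallest and largest case values (Low and High). It then asks LazyValueInfo for the condition's ConstantRange to decide whether the default destination can be marked unreachable, namely when the cases cover every value the condition can take. The sketch below shows only that coverage idea, using a std::set over a known inclusive range. It is not LLVM's code. default_is_unreachable is a hypothetical name, and the sketch does not model wrapping ConstantRanges or how LazyValueInfo derives the range.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper (not LLVM code): returns true if the case values cover
// every value in the inclusive signed range [lo, hi] that the switch
// condition is assumed to lie in, so the default destination can never run.
bool default_is_unreachable(const std::vector<int64_t>& cases, int64_t lo,
                            int64_t hi) {
  if (cases.empty() || lo > hi) return false;
  std::set<int64_t> covered;
  for (int64_t c : cases)
    if (c >= lo && c <= hi) covered.insert(c);  // cases outside are dead
  // Compute hi - lo + 1 in unsigned arithmetic to avoid signed overflow.
  // For the full 64-bit range this wraps to 0; covered is then non-empty,
  // so the comparison correctly reports that the range is not covered.
  const uint64_t range_size =
      static_cast<uint64_t>(hi) - static_cast<uint64_t>(lo) + 1;
  return covered.size() == range_size;
}
```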
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614
cd investigate-llvm-8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach d1280f6967db1ca8fa4e0c39414003e717b40feb
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614 Author: Jun Ma <JunMa(a)linux.alibaba.com> Date: Fri Aug 20 17:27:00 2021 +0800 Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."" Differential Revision: https://reviews.llvm.org/D106056 --- llvm/include/llvm/Transforms/Utils/Local.h | 5 ++++ .../Scalar/CorrelatedValuePropagation.cpp | 27 +++++++++++++++++++++- llvm/lib/Transforms/Utils/Local.cpp | 20 ++++++++++++++++ llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 20 ---------------- .../Transforms/CorrelatedValuePropagation/basic.ll | 11 +++++---- 5 files changed, 57 insertions(+), 26 deletions(-) diff --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h index 97686d7d5f2f..f003615eca78 100644 --- a/llvm/include/llvm/Transforms/Utils/Local.h +++ b/llvm/include/llvm/Transforms/Utils/Local.h @@ -55,6 +55,7 @@ class MDNode; class MemorySSAUpdater; class PHINode; class StoreInst; +class SwitchInst; class TargetLibraryInfo; class TargetTransformInfo; @@ -236,6 +237,10 @@ CallInst *createCallMatchingInvoke(InvokeInst *II); /// This function converts the specified invoek into a normall call. void changeToCall(InvokeInst *II, DomTreeUpdater *DTU = nullptr); +/// This function removes the default destination from the specified switch. 
+void createUnreachableSwitchDefault(SwitchInst *Switch, + DomTreeUpdater *DTU = nullptr); + ///===---------------------------------------------------------------------===// /// Dbg Intrinsic utilities /// diff --git a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp index 36cbd42a5fdd..cd38ce96e287 100644 --- a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp +++ b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp @@ -341,7 +341,13 @@ static bool processSwitch(SwitchInst *I, LazyValueInfo *LVI, // ConstantFoldTerminator() as the underlying SwitchInst can be changed. SwitchInstProfUpdateWrapper SI(*I); - for (auto CI = SI->case_begin(), CE = SI->case_end(); CI != CE;) { + APInt Low = + APInt::getSignedMaxValue(Cond->getType()->getScalarSizeInBits()); + APInt High = + APInt::getSignedMinValue(Cond->getType()->getScalarSizeInBits()); + + SwitchInst::CaseIt CI = SI->case_begin(); + for (auto CE = SI->case_end(); CI != CE;) { ConstantInt *Case = CI->getCaseValue(); LazyValueInfo::Tristate State = LVI->getPredicateAt(CmpInst::ICMP_EQ, Cond, Case, I, @@ -374,9 +380,28 @@ static bool processSwitch(SwitchInst *I, LazyValueInfo *LVI, break; } + // Get Lower/Upper bound from switch cases. + Low = APIntOps::smin(Case->getValue(), Low); + High = APIntOps::smax(Case->getValue(), High); + // Increment the case iterator since we didn't delete it. ++CI; } + + // Try to simplify default case as unreachable + if (CI == SI->case_end() && SI->getNumCases() != 0 && + !isa<UnreachableInst>(SI->getDefaultDest()->getFirstNonPHIOrDbg())) { + const ConstantRange SIRange = + LVI->getConstantRange(SI->getCondition(), SI); + + // If the numbered switch cases cover the entire range of the condition, + // then the default case is not reachable. 
+ if (SIRange.getSignedMin() == Low && SIRange.getSignedMax() == High && + SI->getNumCases() == High - Low + 1) { + createUnreachableSwitchDefault(SI, &DTU); + Changed = true; + } + } } if (Changed) diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp index 3d6ffded9b19..6d7eca8e3678 100644 --- a/llvm/lib/Transforms/Utils/Local.cpp +++ b/llvm/lib/Transforms/Utils/Local.cpp @@ -2182,6 +2182,26 @@ void llvm::changeToCall(InvokeInst *II, DomTreeUpdater *DTU) { DTU->applyUpdates({{DominatorTree::Delete, BB, UnwindDestBB}}); } +void llvm::createUnreachableSwitchDefault(SwitchInst *Switch, + DomTreeUpdater *DTU) { + LLVM_DEBUG(dbgs() << "SimplifyCFG: switch default is dead.\n"); + auto *BB = Switch->getParent(); + auto *OrigDefaultBlock = Switch->getDefaultDest(); + OrigDefaultBlock->removePredecessor(BB); + BasicBlock *NewDefaultBlock = BasicBlock::Create( + BB->getContext(), BB->getName() + ".unreachabledefault", BB->getParent(), + OrigDefaultBlock); + new UnreachableInst(Switch->getContext(), NewDefaultBlock); + Switch->setDefaultDest(&*NewDefaultBlock); + if (DTU) { + SmallVector<DominatorTree::UpdateType, 2> Updates; + Updates.push_back({DominatorTree::Insert, BB, &*NewDefaultBlock}); + if (!is_contained(successors(BB), OrigDefaultBlock)) + Updates.push_back({DominatorTree::Delete, BB, &*OrigDefaultBlock}); + DTU->applyUpdates(Updates); + } +} + BasicBlock *llvm::changeToInvokeAndSplitBasicBlock(CallInst *CI, BasicBlock *UnwindEdge, DomTreeUpdater *DTU) { diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index 737b4f97a97a..70297e471a7a 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -4743,26 +4743,6 @@ static bool CasesAreContiguous(SmallVectorImpl<ConstantInt *> &Cases) { return true; } -static void createUnreachableSwitchDefault(SwitchInst *Switch, - DomTreeUpdater *DTU) { - LLVM_DEBUG(dbgs() << "SimplifyCFG: switch default 
is dead.\n"); - auto *BB = Switch->getParent(); - auto *OrigDefaultBlock = Switch->getDefaultDest(); - OrigDefaultBlock->removePredecessor(BB); - BasicBlock *NewDefaultBlock = BasicBlock::Create( - BB->getContext(), BB->getName() + ".unreachabledefault", BB->getParent(), - OrigDefaultBlock); - new UnreachableInst(Switch->getContext(), NewDefaultBlock); - Switch->setDefaultDest(&*NewDefaultBlock); - if (DTU) { - SmallVector<DominatorTree::UpdateType, 2> Updates; - Updates.push_back({DominatorTree::Insert, BB, &*NewDefaultBlock}); - if (!is_contained(successors(BB), OrigDefaultBlock)) - Updates.push_back({DominatorTree::Delete, BB, &*OrigDefaultBlock}); - DTU->applyUpdates(Updates); - } -} - /// Turn a switch with two reachable destinations into an integer range /// comparison and branch. bool SimplifyCFGOpt::TurnSwitchRangeIntoICmp(SwitchInst *SI, diff --git a/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll b/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll index 5abbcbc90e01..a620c8468d4d 100644 --- a/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll +++ b/llvm/test/Transforms/CorrelatedValuePropagation/basic.ll @@ -382,7 +382,7 @@ define i32 @switch_range(i32 %cond) { ; CHECK-NEXT: entry: ; CHECK-NEXT: [[S:%.*]] = urem i32 [[COND:%.*]], 3 ; CHECK-NEXT: [[S1:%.*]] = add nuw nsw i32 [[S]], 1 -; CHECK-NEXT: switch i32 [[S1]], label [[UNREACHABLE:%.*]] [ +; CHECK-NEXT: switch i32 [[S1]], label [[ENTRY_UNREACHABLEDEFAULT:%.*]] [ ; CHECK-NEXT: i32 1, label [[EXIT1:%.*]] ; CHECK-NEXT: i32 2, label [[EXIT2:%.*]] ; CHECK-NEXT: i32 3, label [[EXIT1]] @@ -391,6 +391,8 @@ define i32 @switch_range(i32 %cond) { ; CHECK-NEXT: ret i32 1 ; CHECK: exit2: ; CHECK-NEXT: ret i32 2 +; CHECK: entry.unreachabledefault: +; CHECK-NEXT: unreachable ; CHECK: unreachable: ; CHECK-NEXT: ret i32 0 ; @@ -453,10 +455,9 @@ define i8 @switch_defaultdest_multipleuse(i8 %t0) { ; CHECK-NEXT: entry: ; CHECK-NEXT: [[O:%.*]] = or i8 [[T0:%.*]], 1 ; CHECK-NEXT: [[R:%.*]] = srem i8 
1, [[O]] -; CHECK-NEXT: switch i8 [[R]], label [[EXIT:%.*]] [ -; CHECK-NEXT: i8 0, label [[EXIT]] -; CHECK-NEXT: i8 1, label [[EXIT]] -; CHECK-NEXT: ] +; CHECK-NEXT: br label [[EXIT:%.*]] +; CHECK: entry.unreachabledefault: +; CHECK-NEXT: unreachable ; CHECK: exit: ; CHECK-NEXT: ret i8 0 ; </cut>
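The new default-case logic in `processSwitch` boils down to a coverage test: track the smallest and largest case values, then compare them, together with the case count, against the signed range that LazyValueInfo proves for the condition. The C++ below is a minimal standalone sketch of that test, not LLVM's code: `SignedRange` and `defaultIsUnreachable` are hypothetical names, plain `int64_t` stands in for `APInt`, case values are assumed distinct (as they are in a valid `switch`), and the real pass's extra guards (every case survived the folding loop, the default is not already unreachable) are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the signed [min, max] range that LazyValueInfo
// proves for the switch condition (LLVM itself uses ConstantRange).
struct SignedRange {
  int64_t min;
  int64_t max;
};

// Returns true when the (distinct) case values exactly tile the condition's
// range, so the default destination can never be taken. Like processSwitch,
// it requires at least one case, smallest case == range minimum, largest
// case == range maximum, and NumCases == High - Low + 1 (no gaps).
bool defaultIsUnreachable(const std::vector<int64_t>& cases, SignedRange r) {
  assert(r.min <= r.max && "expected a non-empty signed range");
  if (cases.empty())
    return false;
  const int64_t low = *std::min_element(cases.begin(), cases.end());
  const int64_t high = *std::max_element(cases.begin(), cases.end());
  // Unsigned arithmetic avoids signed overflow when the span is very wide.
  const uint64_t span =
      static_cast<uint64_t>(high) - static_cast<uint64_t>(low) + 1;
  return r.min == low && r.max == high && span == cases.size();
}
```

For the `switch_range` test in basic.ll, `urem i32 %cond, 3` plus 1 lies in [1, 3] and the cases are 1, 2 and 3, so the sketch returns true; dropping case 2 leaves a gap and it returns false.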


[TCWG CI] Regression caused by binutils: PR28149, debug info with wrong file association

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: PR28149, debug info with wrong file association:
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>

    PR28149, debug info with wrong file association

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 6363
# First few build errors in logs:
from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 7116
# linux build successful: all
# linux boot successful: boot

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_kernel/gnu-master-aarch64-lts-defconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…

Reproduce builds:
<cut>
mkdir investigate-binutils-51298b330327a568358da069d9808f51c6cb1672
cd investigate-binutils-51298b330327a568358da069d9808f51c6cb1672

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach 51298b330327a568358da069d9808f51c6cb1672
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 5cdb4f14426a99ec8fcba843fa503efdc55fa078
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>
Date: Fri Sep 17 09:08:15 2021 +0930

PR28149, debug info with wrong file association

gcc-11 and gcc-12 pass -gdwarf-5 to gas, in order to prime gas for DWARF 5 level debug info. Unfortunately it seems there are cases where the compiler does not emit a .file or .loc dwarf debug directive before any machine instructions. (Note that the .file directive typically emitted as the first line of assembly output doesn't count as a dwarf debug directive. The dwarf .file has a file number before the file name string.)

This patch delays allocation of file numbers for gas generated line debug info until the end of assembly, thus avoiding any clashes with compiler generated file numbers. Two fixes for test case source are necessary; A .loc can't use a file number that hasn't already been specified with .file.

A followup patch will remove all the gas generated line info on seeing a .file directive.

PR 28149
* dwarf2dbg.c (num_of_auto_assigned): Delete.
(current): Update initialisation.
(set_or_check_view): Replace all accesses to view with u.view.
(dwarf2_consume_line_info): Likewise.
(dwarf2_directive_loc): Likewise. Assert that we aren't generating line info.
(dwarf2_gen_line_info_1): Don't call set_or_check_view on gas generated line entries.
(dwarf2_gen_line_info): Set and track filenames for gas generated line entries. Simplify generation of labels.
(get_directory_table_entry): Use filename_cmp when comparing dirs. (do_allocate_filenum): New function. (dwarf2_where): Set u.filename and filenum to -1 for gas generated line entries. (dwarf2_directive_filename): Remove num_of_auto_assigned handling. (process_entries): Update view field access. Call do_allocate_filenum. * dwarf2dbg.h (struct dwarf2_line_info): Add filename field in union aliasing view. * testsuite/gas/i386/dwarf2-line-3.s: Add .file directive. * testsuite/gas/i386/dwarf2-line-4.s: Likewise. * testsuite/gas/i386/dwarf2-line-4.d: Update expected output. * testsuite/gas/i386/dwarf4-line-1.d: Likewise. * testsuite/gas/i386/dwarf5-line-1.d: Likewise. * testsuite/gas/i386/dwarf5-line-2.d: Likewise. --- gas/dwarf2dbg.c | 152 ++++++++++++++++++--------------- gas/dwarf2dbg.h | 7 +- gas/testsuite/gas/i386/dwarf2-line-3.s | 1 + gas/testsuite/gas/i386/dwarf2-line-4.d | 5 +- gas/testsuite/gas/i386/dwarf2-line-4.s | 1 + gas/testsuite/gas/i386/dwarf4-line-1.d | 4 +- gas/testsuite/gas/i386/dwarf5-line-1.d | 4 +- gas/testsuite/gas/i386/dwarf5-line-2.d | 3 +- 8 files changed, 105 insertions(+), 72 deletions(-) diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c index 9e3437b8948..c6303ba94a6 100644 --- a/gas/dwarf2dbg.c +++ b/gas/dwarf2dbg.c @@ -207,7 +207,6 @@ struct file_entry static struct file_entry *files; static unsigned int files_in_use; static unsigned int files_allocated; -static unsigned int num_of_auto_assigned; /* Table of directories used by .debug_line. */ static char ** dirs = NULL; @@ -233,7 +232,7 @@ static struct dwarf2_line_info current = { 1, 1, 0, 0, DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0, - 0, NULL + 0, { NULL } }; /* This symbol is used to recognize view number forced resets in loc @@ -342,7 +341,7 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, /* First, compute !(E->label > P->label), to tell whether or not we're to reset the view number. If we can't resolve it to a constant, keep it symbolic. 
*/ - if (!p || (e->loc.view == force_reset_view && force_reset_view)) + if (!p || (e->loc.u.view == force_reset_view && force_reset_view)) { viewx.X_op = O_constant; viewx.X_add_number = 0; @@ -367,9 +366,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, } } - if (S_IS_DEFINED (e->loc.view) && symbol_constant_p (e->loc.view)) + if (S_IS_DEFINED (e->loc.u.view) && symbol_constant_p (e->loc.u.view)) { - expressionS *value = symbol_get_value_expression (e->loc.view); + expressionS *value = symbol_get_value_expression (e->loc.u.view); /* We can't compare the view numbers at this point, because in VIEWX we've only determined whether we're to reset it so far. */ @@ -404,16 +403,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, { expressionS incv; - if (!p->loc.view) + if (!p->loc.u.view) { - p->loc.view = symbol_temp_make (); - gas_assert (!S_IS_DEFINED (p->loc.view)); + p->loc.u.view = symbol_temp_make (); + gas_assert (!S_IS_DEFINED (p->loc.u.view)); } memset (&incv, 0, sizeof (incv)); incv.X_unsigned = 1; incv.X_op = O_symbol; - incv.X_add_symbol = p->loc.view; + incv.X_add_symbol = p->loc.u.view; incv.X_add_number = 1; if (viewx.X_op == O_constant) @@ -430,16 +429,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, } } - if (!S_IS_DEFINED (e->loc.view)) + if (!S_IS_DEFINED (e->loc.u.view)) { - symbol_set_value_expression (e->loc.view, &viewx); - S_SET_SEGMENT (e->loc.view, expr_section); - symbol_set_frag (e->loc.view, &zero_address_frag); + symbol_set_value_expression (e->loc.u.view, &viewx); + S_SET_SEGMENT (e->loc.u.view, expr_section); + symbol_set_frag (e->loc.u.view, &zero_address_frag); } /* Define and attempt to simplify any earlier views needed to compute E's. 
*/ - if (h && p && p->loc.view && !S_IS_DEFINED (p->loc.view)) + if (h && p && p->loc.u.view && !S_IS_DEFINED (p->loc.u.view)) { struct line_entry *h2; /* Reverse the list to avoid quadratic behavior going backwards @@ -459,7 +458,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, break; set_or_check_view (r, r->next, NULL); } - while (r->next && r->next->loc.view && !S_IS_DEFINED (r->next->loc.view) + while (r->next + && r->next->loc.u.view + && !S_IS_DEFINED (r->next->loc.u.view) && (r = r->next)); /* Unreverse the list, so that we can go forward again. */ @@ -475,14 +476,14 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, view of the previous subsegment. */ if (r == h) continue; - gas_assert (S_IS_DEFINED (r->loc.view)); - resolve_expression (symbol_get_value_expression (r->loc.view)); + gas_assert (S_IS_DEFINED (r->loc.u.view)); + resolve_expression (symbol_get_value_expression (r->loc.u.view)); } while (r != p && (r = r->next)); /* Now that we've defined and computed all earlier views that might be needed to compute E's, attempt to simplify it. */ - resolve_expression (symbol_get_value_expression (e->loc.view)); + resolve_expression (symbol_get_value_expression (e->loc.u.view)); } } @@ -518,10 +519,8 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc) /* Subseg heads are chained to previous subsegs in dwarf2_finish. */ - if (loc->view && lss->head) - set_or_check_view (e, - (struct line_entry *)lss->ptail, - lss->head); + if (loc->filenum != -1u && loc->u.view && lss->head) + set_or_check_view (e, (struct line_entry *) lss->ptail, lss->head); *lss->ptail = e; lss->ptail = &e->next; @@ -532,9 +531,6 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc) void dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc) { - static unsigned int line = -1; - static unsigned int filenum = -1; - symbolS *sym; /* Early out for as-yet incomplete location information. 
*/ @@ -552,20 +548,35 @@ dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc) symbols apply to assembler code. It is necessary to emit duplicate line symbols when a compiler asks for them, because GDB uses them to determine the end of the prologue. */ - if (debug_type == DEBUG_DWARF2 - && line == loc->line && filenum == loc->filenum) - return; + if (debug_type == DEBUG_DWARF2) + { + static unsigned int line = -1; + static const char *filename = NULL; + + if (line == loc->line) + { + if (filename == loc->u.filename) + return; + if (filename_cmp (filename, loc->u.filename) == 0) + { + filename = loc->u.filename; + return; + } + } - line = loc->line; - filenum = loc->filenum; + line = loc->line; + filename = loc->u.filename; + } if (linkrelax) { - char name[120]; + static int label_num = 0; + char name[32]; /* Use a non-fake name for the line number location, so that it can be referred to by relocations. */ - sprintf (name, ".Loc.%u.%u", line, filenum); + sprintf (name, ".Loc.%u", label_num); + label_num++; sym = symbol_new (name, now_seg, frag_now, ofs); } else @@ -624,13 +635,15 @@ get_directory_table_entry (const char *dirname, { const char * pwd = file0_dirname ? file0_dirname : getpwd (); - if (dwarf_level >= 5 && strcmp (dirname, pwd) != 0) + if (dwarf_level >= 5 && filename_cmp (dirname, pwd) != 0) { - /* In DWARF-5 the 0 entry in the directory table is expected to be - the same as the DW_AT_comp_dir (which is set to the current build - directory). Since we are about to create a directory entry that - is not the same, allocate the current directory first. - FIXME: Alternatively we could generate an error message here. */ + /* In DWARF-5 the 0 entry in the directory table is + expected to be the same as the DW_AT_comp_dir (which + is set to the current build directory). Since we are + about to create a directory entry that is not the + same, allocate the current directory first. + FIXME: Alternatively we could generate an error + message here. 
*/ (void) get_directory_table_entry (pwd, NULL, strlen (pwd), true); d = 1; @@ -745,14 +758,30 @@ allocate_filenum (const char * pathname) if (!assign_file_to_slot (i, file, dir)) return -1; - num_of_auto_assigned++; - last_used = i; last_used_dir_len = dir_len; return i; } +/* Run through the list of line entries starting at E, allocating + file entries for gas generated debug. */ + +static void +do_allocate_filenum (struct line_entry *e) +{ + do + { + if (e->loc.filenum == -1u) + { + e->loc.filenum = allocate_filenum (e->loc.u.filename); + e->loc.u.view = NULL; + } + e = e->next; + } + while (e); +} + /* Allocate slot NUM in the .debug_line file table to FILENAME. If DIRNAME is not NULL or there is a directory component to FILENAME then this will be stored in the directory table, if not already present. @@ -929,17 +958,12 @@ dwarf2_where (struct dwarf2_line_info *line) { if (debug_type == DEBUG_DWARF2) { - const char *filename; - - memset (line, 0, sizeof (*line)); - filename = as_where (&line->line); - line->filenum = allocate_filenum (filename); - /* FIXME: We should check the return value from allocate_filenum. */ + line->u.filename = as_where (&line->line); + line->filenum = -1u; line->column = 0; line->flags = DWARF2_FLAG_IS_STMT; line->isa = current.isa; line->discriminator = current.discriminator; - line->view = NULL; } else *line = current; @@ -1018,7 +1042,7 @@ dwarf2_consume_line_info (void) | DWARF2_FLAG_PROLOGUE_END | DWARF2_FLAG_EPILOGUE_BEGIN); current.discriminator = 0; - current.view = NULL; + current.u.view = NULL; } /* Called for each (preferably code) label. If dwarf2_loc_mark_labels @@ -1060,7 +1084,6 @@ dwarf2_directive_filename (void) char *filename; const char * dirname = NULL; int filename_len; - unsigned int i; /* Continue to accept a bare string and pass it off. 
*/ SKIP_WHITESPACE (); @@ -1132,18 +1155,6 @@ dwarf2_directive_filename (void) return NULL; } - if (num_of_auto_assigned) - { - /* Clear slots auto-assigned before the first .file <NUMBER> - directive was seen. */ - if (files_in_use != (num_of_auto_assigned + 1)) - abort (); - for (i = 1; i < files_in_use; i++) - files[i].filename = NULL; - files_in_use = 0; - num_of_auto_assigned = 0; - } - if (! allocate_filename_to_slot (dirname, filename, (unsigned int) num, with_md5)) return NULL; @@ -1191,6 +1202,11 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) return; } + /* debug_type will be turned off by dwarf2_directive_filename, and + if we don't have a dwarf style .file then files_in_use will be + zero and the above error will trigger. */ + gas_assert (debug_type == DEBUG_NONE); + current.filenum = filenum; current.line = line; current.discriminator = 0; @@ -1333,7 +1349,7 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) S_SET_VALUE (sym, 0); symbol_set_frag (sym, &zero_address_frag); } - current.view = sym; + current.u.view = sym; } else { @@ -1347,10 +1363,9 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) demand_empty_rest_of_line (); dwarf2_any_loc_directive_seen = dwarf2_loc_directive_seen = true; - debug_type = DEBUG_NONE; /* If we were given a view id, emit the row right away. */ - if (current.view) + if (current.u.view) dwarf2_emit_insn (0); } @@ -1984,7 +1999,7 @@ process_entries (segT seg, struct line_entry *e) frag_ofs = S_GET_VALUE (lab); if (last_frag == NULL - || (e->loc.view == force_reset_view && force_reset_view + || (e->loc.u.view == force_reset_view && force_reset_view /* If we're going to reset the view, but we know we're advancing the PC, we don't have to force with set_address. 
We know we do when we're at the same @@ -2850,16 +2865,19 @@ dwarf2_finish (void) struct line_subseg *lss = s->head; struct line_entry **ptail = lss->ptail; + if (lss->head && SEG_NORMAL (s->seg)) + do_allocate_filenum (lss->head); + /* Reset the initial view of the first subsection of the section. */ - if (lss->head && lss->head->loc.view) + if (lss->head && lss->head->loc.u.view) set_or_check_view (lss->head, NULL, NULL); while ((lss = lss->next) != NULL) { /* Link the first view of subsequent subsections to the previous view. */ - if (lss->head && lss->head->loc.view) + if (lss->head && lss->head->loc.u.view) set_or_check_view (lss->head, !s->head ? NULL : (struct line_entry *)ptail, s->head ? s->head->head : NULL); diff --git a/gas/dwarf2dbg.h b/gas/dwarf2dbg.h index 14d770c40dd..700d9dec5cb 100644 --- a/gas/dwarf2dbg.h +++ b/gas/dwarf2dbg.h @@ -36,7 +36,12 @@ struct dwarf2_line_info unsigned int isa; unsigned int flags; unsigned int discriminator; - symbolS *view; + /* filenum == -1u chooses filename, otherwise view. */ + union + { + symbolS *view; + const char *filename; + } u; }; /* Implements the .file FILENO "FILENAME" directive. 
FILENO can be 0 diff --git a/gas/testsuite/gas/i386/dwarf2-line-3.s b/gas/testsuite/gas/i386/dwarf2-line-3.s index 2085ef93940..e933719fbc3 100644 --- a/gas/testsuite/gas/i386/dwarf2-line-3.s +++ b/gas/testsuite/gas/i386/dwarf2-line-3.s @@ -7,6 +7,7 @@ main: .cfi_startproc nop + .file 1 "dwarf2-test.c" .loc 1 1 ret .cfi_endproc diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.d b/gas/testsuite/gas/i386/dwarf2-line-4.d index c0c85f4639f..a01fd0540f3 100644 --- a/gas/testsuite/gas/i386/dwarf2-line-4.d +++ b/gas/testsuite/gas/i386/dwarf2-line-4.d @@ -33,11 +33,14 @@ Raw dump of debug contents of section \.z?debug_line: The File Name Table $offset 0x.*$: Entry Dir Time Size Name - 1 1 0 0 dwarf2-line-4.s + 1 0 0 0 dwarf2-test.c + 2 1 0 0 dwarf2-line-4.s Line Number Statements: + \[0x.*\] Set File Name to entry 2 in the File Name Table \[0x.*\] Extended opcode 2: set Address to 0x0 \[0x.*\] Special opcode 13: advance Address by 0 to 0x0 and Line by 8 to 9 + \[0x.*\] Set File Name to entry 1 in the File Name Table \[0x.*\] Advance Line by -8 to 1 \[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 1 \[0x.*\] Advance PC by 1 to 0x2 diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.s b/gas/testsuite/gas/i386/dwarf2-line-4.s index 89bb62d9db7..7348f4be62c 100644 --- a/gas/testsuite/gas/i386/dwarf2-line-4.s +++ b/gas/testsuite/gas/i386/dwarf2-line-4.s @@ -7,6 +7,7 @@ main: .cfi_startproc nop + .file 1 "dwarf2-test.c" .loc 1 1 ret .cfi_endproc diff --git a/gas/testsuite/gas/i386/dwarf4-line-1.d b/gas/testsuite/gas/i386/dwarf4-line-1.d index 4f8321e9bfd..8199efbb0c2 100644 --- a/gas/testsuite/gas/i386/dwarf4-line-1.d +++ b/gas/testsuite/gas/i386/dwarf4-line-1.d @@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line: Entry Dir Time Size Name 1 0 0 0 foo.c 2 0 0 0 foo.h + 3 1 0 0 dwarf4-line-1.s Line Number Statements: + \[0x.*\] Set File Name to entry 2 in the File Name Table \[0x.*\] Extended opcode 2: set Address to 0x0 \[0x.*\] Advance 
Line by 81 to 82 \[0x.*\] Copy - \[0x.*\] Set File Name to entry 2 in the File Name Table + \[0x.*\] Set File Name to entry 3 in the File Name Table \[0x.*\] Advance Line by -73 to 9 \[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 9 \[0x.*\] Advance PC by 3 to 0x4 diff --git a/gas/testsuite/gas/i386/dwarf5-line-1.d b/gas/testsuite/gas/i386/dwarf5-line-1.d index f57fc47d269..2c2cf5696c4 100644 --- a/gas/testsuite/gas/i386/dwarf5-line-1.d +++ b/gas/testsuite/gas/i386/dwarf5-line-1.d @@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line: 0 $indirect line string, offset: 0x.*$: .*/gas/testsuite 1 $indirect line string, offset: 0x.*$: .*/gas/testsuite/gas/i386 - The File Name Table $offset 0x.*, lines 2, columns 3$: + The File Name Table $offset 0x.*, lines 3, columns 3$: Entry Dir MD5 Name 0 0 0xbbd69fc03ce253b2dbaab2522dd519ae $indirect line string, offset: 0x.*$: core.c 1 0 0x0 $indirect line string, offset: 0x.*$: types.h + 2 1 0x0 $indirect line string, offset: 0x.*$: dwarf5-line-1.s Line Number Statements: + \[0x.*\] Set File Name to entry 2 in the File Name Table \[0x.*\] Extended opcode 2: set Address to 0x0 \[0x.*\] Special opcode 8: advance Address by 0 to 0x0 and Line by 3 to 4 \[0x.*\] Advance PC by 1 to 0x1 diff --git a/gas/testsuite/gas/i386/dwarf5-line-2.d b/gas/testsuite/gas/i386/dwarf5-line-2.d index 2f96df510d0..85f98c8ab9c 100644 --- a/gas/testsuite/gas/i386/dwarf5-line-2.d +++ b/gas/testsuite/gas/i386/dwarf5-line-2.d @@ -36,9 +36,10 @@ Raw dump of debug contents of section \.z?debug_line: 0 $indirect line string, offset: 0x.*$: .*/gas/testsuite 1 $indirect line string, offset: 0x.*$: .*/gas/testsuite/gas/i386 - The File Name Table $offset 0x.*, lines 1, columns 3$: + The File Name Table $offset 0x.*, lines 2, columns 3$: Entry Dir MD5 Name 0 0 0xbbd69fc03ce253b2dbaab2522dd519ae $indirect line string, offset: 0x.*$: core.c + 1 1 0x0 $indirect line string, offset: .*$: dwarf5-line-2.s Line Number Statements: 
\[0x.*\] Extended opcode 2: set Address to 0x0 </cut>
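The core idea of the gas fix, recording a file *name* for assembler-generated line entries and assigning the file *number* only at the end of assembly, can be modelled in a few lines. This is a hedged C++ sketch, not gas's C implementation: `FileTable`, `LineEntry`, `allocateDeferred` and `kUnassigned` are hypothetical names, separate fields stand in for the patch's `view`/`filename` union, and the "lowest unused number" allocation rule is a simplifying assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a .debug_line file table (slot number -> name);
// gas's real tables (files[], dirs[]) carry more information.
struct FileTable {
  std::map<unsigned, std::string> slots;

  // A compiler-emitted `.file N "name"` claims slot N directly.
  void setExplicit(unsigned num, const std::string& name) {
    slots[num] = name;
  }

  // Assembler-generated entries reuse a slot with the same name, or else
  // take the lowest unused number starting at 1 (a simplification).
  unsigned allocate(const std::string& name) {
    for (const auto& [num, existing] : slots)
      if (existing == name)
        return num;
    unsigned num = 1;
    while (slots.count(num) != 0)
      ++num;
    slots[num] = name;
    return num;
  }
};

// Sentinel playing the role of `filenum == -1u` in the patch.
constexpr unsigned kUnassigned = ~0u;

struct LineEntry {
  unsigned filenum;      // kUnassigned until the end of assembly
  std::string filename;  // consulted only while filenum == kUnassigned
  unsigned line;
};

// Analogue of do_allocate_filenum(): resolve deferred file numbers only
// after every explicit .file directive has been processed.
void allocateDeferred(std::vector<LineEntry>& entries, FileTable& table) {
  for (LineEntry& e : entries)
    if (e.filenum == kUnassigned)
      e.filenum = table.allocate(e.filename);
}
```

Applied to the dwarf2-line-4 test: gas creates a line entry for dwarf2-line-4.s (line 9) before the source's `.file 1 "dwarf2-test.c"`. Allocating a number at that point would have taken slot 1, which the explicit `.file 1` then also claims; calling `allocateDeferred` after `setExplicit(1, "dwarf2-test.c")` instead gives the generated entry number 2, matching the updated expected output (entry 1 dwarf2-test.c, entry 2 dwarf2-line-4.s).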


[TCWG CI] 447.dealII grew in size by 2% after gcc: libstdc++: Fix and improve std::vector<bool> implementation.

by ci_notify＠linaro.org

After gcc commit 6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f
Author: François Dumont <fdumont(a)gcc.gnu.org>

    libstdc++: Fix and improve std::vector<bool> implementation.

the following benchmarks grew in size by more than 1%:
- 447.dealII grew in size by 2% from 348834 to 356146 bytes

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
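The size figures in these reports are relative changes against the earlier build. A minimal C++ sketch of that arithmetic follows, assuming growth is measured against the last_good size and that the 1% cut-off is a strict comparison; `growthPercent` and `exceedsThreshold` are hypothetical names, not code from the Linaro CI scripts.

```cpp
#include <cassert>
#include <cstdint>

// Relative size change in percent, measured against the earlier build.
double growthPercent(uint64_t before, uint64_t after) {
  assert(before > 0 && "baseline size must be non-zero");
  return 100.0 * (static_cast<double>(after) - static_cast<double>(before)) /
         static_cast<double>(before);
}

// Hypothetical reporting rule: flag growth strictly above the threshold
// (1% in these notifications).
bool exceedsThreshold(uint64_t before, uint64_t after,
                      double thresholdPercent = 1.0) {
  return growthPercent(before, after) > thresholdPercent;
}
```

With these inputs, 447.dealII (348834 to 356146 bytes) comes out at about 2.1% and 401.bzip2 (36214 to 37510 bytes) at about 3.6%, which the reports appear to round to 2% and 4%.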
This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-gcc-6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f
cd investigate-gcc-6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 5f9669d9e23a1116e040c80e0f3d4f43639bda52
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 6f00ccbad3d72a39d9e2bc0d500dbd62d1abc60f
Author: François Dumont <fdumont(a)gcc.gnu.org>
Date: Tue Jan 21 07:18:08 2020 +0100

libstdc++: Fix and improve std::vector<bool> implementation.
Do not consider allocator noexcept qualification for vector<bool> move constructor.
Improve swap performance using TBAA like in main vector implementation.
Bypass _M_initialize_dispatch/_M_assign_dispatch in post-c++11 modes.

libstdc++-v3/ChangeLog:

* include/bits/stl_bvector.h
[_GLIBCXX_INLINE_VERSION](_Bvector_impl_data::_M_start): Define as _Bit_type*.
(_Bvector_impl_data(const _Bvector_impl_data&)): Default.
(_Bvector_impl_data(_Bvector_impl_data&&)): Delegate to latter.
(_Bvector_impl_data::operator=(const _Bvector_impl_data&)): Default.
(_Bvector_impl_data::_M_move_data(_Bvector_impl_data&&)): Use latter.
(_Bvector_impl_data::_M_reset()): Likewise.
(_Bvector_impl_data::_M_swap_data): New.
(_Bvector_impl::_Bvector_impl(_Bvector_impl&&)): Implement explicitely.
(_Bvector_impl::_Bvector_impl(_Bit_alloc_type&&, _Bvector_impl&&)): New.
(_Bvector_base::_Bvector_base(_Bvector_base&&, const allocator_type&)): New, use latter.
(vector::vector(vector&&, const allocator_type&, true_type)): New, use latter.
(vector::vector(vector&&, const allocator_type&, false_type)): New.
(vector::vector(vector&&, const allocator_type&)): Use latters.
(vector::vector(const vector&, const allocator_type&)): Adapt.
[__cplusplus >= 201103](vector::vector(_InputIt, _InputIt, const allocator_type&)): Use _M_initialize_range.
(vector::operator[](size_type)): Use iterator operator[].
(vector::operator[](size_type) const): Use const_iterator operator[].
(vector::swap(vector&)): Add assertions on allocators. Use _M_swap_data.
[__cplusplus >= 201103](vector::insert(const_iterator, _InputIt, _InputIt)): Use _M_insert_range.
(vector::_M_initialize(size_type)): Adapt.
[__cplusplus >= 201103](vector::_M_initialize_dispatch): Remove.
[__cplusplus >= 201103](vector::_M_insert_dispatch): Remove.
* python/libstdcxx/v6/printers.py (StdVectorPrinter._iterator): Stop using start _M_offset.
(StdVectorPrinter.to_string): Likewise.
* testsuite/23_containers/vector/bool/allocator/swap.cc: Adapt.
* testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc: Add check. --- libstdc++-v3/include/bits/stl_bvector.h | 140 +++++++++++++-------- libstdc++-v3/python/libstdcxx/v6/printers.py | 5 +- .../23_containers/vector/bool/allocator/swap.cc | 22 ++-- .../vector/bool/cons/noexcept_move_construct.cc | 32 ++++- 4 files changed, 130 insertions(+), 69 deletions(-) diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h index a365e7182eb..d6f5435bdfb 100644 --- a/libstdc++-v3/include/bits/stl_bvector.h +++ b/libstdc++-v3/include/bits/stl_bvector.h @@ -427,53 +427,75 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER struct _Bvector_impl_data { - _Bit_iterator _M_start; - _Bit_iterator _M_finish; - _Bit_pointer _M_end_of_storage; +#if !_GLIBCXX_INLINE_VERSION + _Bit_iterator _M_start; +#else + // We don't need the offset field for the start, it's always zero. + struct { + _Bit_type* _M_p; + // Allow assignment from iterators (assume offset is zero): + void operator=(_Bit_iterator __it) { _M_p = __it._M_p; } + } _M_start; +#endif + _Bit_iterator _M_finish; + _Bit_pointer _M_end_of_storage; _Bvector_impl_data() _GLIBCXX_NOEXCEPT : _M_start(), _M_finish(), _M_end_of_storage() { } #if __cplusplus >= 201103L + _Bvector_impl_data(const _Bvector_impl_data&) = default; + _Bvector_impl_data& + operator=(const _Bvector_impl_data&) = default; + _Bvector_impl_data(_Bvector_impl_data&& __x) noexcept - : _M_start(__x._M_start), _M_finish(__x._M_finish) - , _M_end_of_storage(__x._M_end_of_storage) + : _Bvector_impl_data(__x) { __x._M_reset(); } void _M_move_data(_Bvector_impl_data&& __x) noexcept { - this->_M_start = __x._M_start; - this->_M_finish = __x._M_finish; - this->_M_end_of_storage = __x._M_end_of_storage; + *this = __x; __x._M_reset(); } #endif void _M_reset() _GLIBCXX_NOEXCEPT + { *this = _Bvector_impl_data(); } + + void + _M_swap_data(_Bvector_impl_data& __x) _GLIBCXX_NOEXCEPT { - _M_start = _M_finish = _Bit_iterator(); - 
_M_end_of_storage = _Bit_pointer(); + // Do not use std::swap(_M_start, __x._M_start), etc as it loses + // information used by TBAA. + std::swap(*this, __x); } }; struct _Bvector_impl : public _Bit_alloc_type, public _Bvector_impl_data - { - public: - _Bvector_impl() _GLIBCXX_NOEXCEPT_IF( - is_nothrow_default_constructible<_Bit_alloc_type>::value) - : _Bit_alloc_type() - { } + { + _Bvector_impl() _GLIBCXX_NOEXCEPT_IF( + is_nothrow_default_constructible<_Bit_alloc_type>::value) + : _Bit_alloc_type() + { } - _Bvector_impl(const _Bit_alloc_type& __a) _GLIBCXX_NOEXCEPT - : _Bit_alloc_type(__a) - { } + _Bvector_impl(const _Bit_alloc_type& __a) _GLIBCXX_NOEXCEPT + : _Bit_alloc_type(__a) + { } #if __cplusplus >= 201103L - _Bvector_impl(_Bvector_impl&&) = default; + // Not defaulted, to enforce noexcept(true) even when + // !is_nothrow_move_constructible<_Bit_alloc_type>. + _Bvector_impl(_Bvector_impl&& __x) noexcept + : _Bit_alloc_type(std::move(__x)), _Bvector_impl_data(std::move(__x)) + { } + + _Bvector_impl(_Bit_alloc_type&& __a, _Bvector_impl&& __x) noexcept + : _Bit_alloc_type(std::move(__a)), _Bvector_impl_data(std::move(__x)) + { } #endif _Bit_type* @@ -511,6 +533,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #if __cplusplus >= 201103L _Bvector_base(_Bvector_base&&) = default; + + _Bvector_base(_Bvector_base&& __x, const allocator_type& __a) noexcept + : _M_impl(_Bit_alloc_type(__a), std::move(__x._M_impl)) + { } #endif ~_Bvector_base() @@ -647,14 +673,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER : _Base(_Bit_alloc_traits::_S_select_on_copy(__x._M_get_Bit_allocator())) { _M_initialize(__x.size()); - _M_copy_aligned(__x.begin(), __x.end(), this->_M_impl._M_start); + _M_copy_aligned(__x.begin(), __x.end(), begin()); } #if __cplusplus >= 201103L vector(vector&&) = default; - vector(vector&& __x, const allocator_type& __a) - noexcept(_Bit_alloc_traits::_S_always_equal()) + private: + vector(vector&& __x, const allocator_type& __a, true_type) noexcept + : _Base(std::move(__x), 
__a) + { } + + vector(vector&& __x, const allocator_type& __a, false_type) : _Base(__a) { if (__x.get_allocator() == __a) @@ -667,11 +697,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER } } + public: + vector(vector&& __x, const allocator_type& __a) + noexcept(_Bit_alloc_traits::_S_always_equal()) + : vector(std::move(__x), __a, + typename _Bit_alloc_traits::is_always_equal{}) + { } + vector(const vector& __x, const allocator_type& __a) : _Base(__a) { _M_initialize(__x.size()); - _M_copy_aligned(__x.begin(), __x.end(), this->_M_impl._M_start); + _M_copy_aligned(__x.begin(), __x.end(), begin()); } vector(initializer_list<bool> __l, @@ -689,13 +726,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER vector(_InputIterator __first, _InputIterator __last, const allocator_type& __a = allocator_type()) : _Base(__a) - { _M_initialize_dispatch(__first, __last, __false_type()); } + { + _M_initialize_range(__first, __last, + std::__iterator_category(__first)); + } #else template<typename _InputIterator> vector(_InputIterator __first, _InputIterator __last, const allocator_type& __a = allocator_type()) : _Base(__a) { + // Check whether it's an integral type. If so, it's not an iterator. typedef typename std::__is_integer<_InputIterator>::__type _Integral; _M_initialize_dispatch(__first, __last, _Integral()); } @@ -762,7 +803,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER vector& operator=(initializer_list<bool> __l) { - this->assign (__l.begin(), __l.end()); + this->assign(__l.begin(), __l.end()); return *this; } #endif @@ -786,6 +827,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void assign(_InputIterator __first, _InputIterator __last) { + // Check whether it's an integral type. If so, it's not an iterator. 
typedef typename std::__is_integer<_InputIterator>::__type _Integral; _M_assign_dispatch(__first, __last, _Integral()); } @@ -874,17 +916,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER reference operator[](size_type __n) - { - return *iterator(this->_M_impl._M_start._M_p - + __n / int(_S_word_bit), __n % int(_S_word_bit)); - } + { return begin()[__n]; } const_reference operator[](size_type __n) const - { - return *const_iterator(this->_M_impl._M_start._M_p - + __n / int(_S_word_bit), __n % int(_S_word_bit)); - } + { return begin()[__n]; } protected: void @@ -951,10 +987,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void swap(vector& __x) _GLIBCXX_NOEXCEPT { - std::swap(this->_M_impl._M_start, __x._M_impl._M_start); - std::swap(this->_M_impl._M_finish, __x._M_impl._M_finish); - std::swap(this->_M_impl._M_end_of_storage, - __x._M_impl._M_end_of_storage); +#if __cplusplus >= 201103L + __glibcxx_assert(_Bit_alloc_traits::propagate_on_container_swap::value + || _M_get_Bit_allocator() == __x._M_get_Bit_allocator()); +#endif + this->_M_impl._M_swap_data(__x._M_impl); _Bit_alloc_traits::_S_on_swap(_M_get_Bit_allocator(), __x._M_get_Bit_allocator()); } @@ -992,8 +1029,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _InputIterator __first, _InputIterator __last) { difference_type __offset = __position - cbegin(); - _M_insert_dispatch(__position._M_const_cast(), - __first, __last, __false_type()); + _M_insert_range(__position._M_const_cast(), + __first, __last, + std::__iterator_category(__first)); return begin() + __offset; } #else @@ -1002,6 +1040,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER insert(iterator __position, _InputIterator __first, _InputIterator __last) { + // Check whether it's an integral type. If so, it's not an iterator. 
typedef typename std::__is_integer<_InputIterator>::__type _Integral; _M_insert_dispatch(__position, __first, __last, _Integral()); } @@ -1113,15 +1152,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER { _Bit_pointer __q = this->_M_allocate(__n); this->_M_impl._M_end_of_storage = __q + _S_nword(__n); - this->_M_impl._M_start = iterator(std::__addressof(*__q), 0); + iterator __start = iterator(std::__addressof(*__q), 0); + this->_M_impl._M_start = __start; + this->_M_impl._M_finish = __start + difference_type(__n); } - else - { - this->_M_impl._M_end_of_storage = _Bit_pointer(); - this->_M_impl._M_start = iterator(0, 0); - } - this->_M_impl._M_finish = this->_M_impl._M_start + difference_type(__n); - } void @@ -1141,8 +1175,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _M_shrink_to_fit(); #endif - // Check whether it's an integral type. If so, it's not an iterator. - +#if __cplusplus < 201103L // _GLIBCXX_RESOLVE_LIB_DEFECTS // 438. Ambiguity in the "do the right thing" clause template<typename _Integer> @@ -1159,6 +1192,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER __false_type) { _M_initialize_range(__first, __last, std::__iterator_category(__first)); } +#endif template<typename _InputIterator> void @@ -1176,7 +1210,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER { const size_type __n = std::distance(__first, __last); _M_initialize(__n); - std::copy(__first, __last, this->_M_impl._M_start); + std::copy(__first, __last, begin()); } #if __cplusplus < 201103L @@ -1240,8 +1274,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER } } - // Check whether it's an integral type. If so, it's not an iterator. - +#if __cplusplus < 201103L // _GLIBCXX_RESOLVE_LIB_DEFECTS // 438. 
Ambiguity in the "do the right thing" clause template<typename _Integer> @@ -1257,6 +1290,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER __false_type) { _M_insert_range(__pos, __first, __last, std::__iterator_category(__first)); } +#endif void _M_fill_insert(iterator __position, size_type __n, bool __x); diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py index e4da8dfe5b6..0bf307b8e5f 100644 --- a/libstdc++-v3/python/libstdcxx/v6/printers.py +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py @@ -405,7 +405,7 @@ class StdVectorPrinter: self.bitvec = bitvec if bitvec: self.item = start['_M_p'] - self.so = start['_M_offset'] + self.so = 0 self.finish = finish['_M_p'] self.fo = finish['_M_offset'] itype = self.item.dereference().type @@ -453,12 +453,11 @@ class StdVectorPrinter: end = self.val['_M_impl']['_M_end_of_storage'] if self.is_bool: start = self.val['_M_impl']['_M_start']['_M_p'] - so = self.val['_M_impl']['_M_start']['_M_offset'] finish = self.val['_M_impl']['_M_finish']['_M_p'] fo = self.val['_M_impl']['_M_finish']['_M_offset'] itype = start.dereference().type bl = 8 * itype.sizeof - length = (bl - so) + bl * ((finish - start) - 1) + fo + length = bl * (finish - start) + fo capacity = bl * (end - start) return ('%s<bool> of length %d, capacity %d' % (self.typename, int (length), int (capacity))) diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/allocator/swap.cc b/libstdc++-v3/testsuite/23_containers/vector/bool/allocator/swap.cc index a8107145c58..793115b473e 100644 --- a/libstdc++-v3/testsuite/23_containers/vector/bool/allocator/swap.cc +++ b/libstdc++-v3/testsuite/23_containers/vector/bool/allocator/swap.cc @@ -28,19 +28,17 @@ namespace __gnu_test // It is undefined behaviour to swap() containers with unequal allocators // if the allocator doesn't propagate, so ensure the allocators compare // equal, while still being able to test propagation via get_personality(). 
- bool - operator==(const propagating_allocator<T, false>&, - const propagating_allocator<T, false>&) - { - return true; - } + template<typename Type> + bool + operator==(const propagating_allocator<Type, false>&, + const propagating_allocator<Type, false>&) + { return true; } - bool - operator!=(const propagating_allocator<T, false>&, - const propagating_allocator<T, false>&) - { - return false; - } + template<typename Type> + bool + operator!=(const propagating_allocator<Type, false>&, + const propagating_allocator<Type, false>&) + { return false; } } using __gnu_test::propagating_allocator; diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc b/libstdc++-v3/testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc index 03794d8ebd8..296ba33bba8 100644 --- a/libstdc++-v3/testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc +++ b/libstdc++-v3/testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc @@ -23,4 +23,34 @@ typedef std::vector<bool> vbtype; -static_assert(std::is_nothrow_move_constructible<vbtype>::value, "Error"); +static_assert( std::is_nothrow_move_constructible<vbtype>::value, + "noexcept move constructor" ); +static_assert( std::is_nothrow_constructible<vbtype, + vbtype&&, const typename vbtype::allocator_type&>::value, + "noexcept move constructor with allocator" ); + +template<typename Type> + class not_noexcept_move_constructor_alloc : public std::allocator<Type> + { + public: + not_noexcept_move_constructor_alloc() noexcept { } + + not_noexcept_move_constructor_alloc( + const not_noexcept_move_constructor_alloc& x) noexcept + : std::allocator<Type>(x) + { } + + not_noexcept_move_constructor_alloc( + not_noexcept_move_constructor_alloc&& x) noexcept(false) + : std::allocator<Type>(std::move(x)) + { } + + template<typename _Tp1> + struct rebind + { typedef not_noexcept_move_constructor_alloc<_Tp1> other; }; + }; + +typedef std::vector<bool, 
not_noexcept_move_constructor_alloc<bool>> vbtype2; + +static_assert( std::is_nothrow_move_constructible<vbtype2>::value, + "noexcept move constructor with not noexcept alloc" ); </cut>
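The printers.py hunk drops the start offset from the gdb pretty-printer's length computation; the diff's own comment notes that the offset of `_M_start` is always zero. A minimal Python sketch of that arithmetic follows. It models `start` and `finish` as plain integer word indices instead of gdb.Value pointers (an assumption for illustration), and the helper names `bit_position`, `old_length` and `new_length` are hypothetical. `bit_position` mirrors the `__n / _S_word_bit`, `__n % _S_word_bit` split used by the removed `operator[]` bodies. Algebraically the old formula equals the new one minus the start offset, so the two agree exactly when that offset is zero.

```python
# Sketch of the std::vector<bool> length arithmetic from the printers.py hunk.
# Assumption: `start` and `finish` are word indices (ints), not gdb.Value
# pointers, so `finish - start` is the number of whole words in between.


def bit_position(n, bl):
    """Split element index n into (word index, bit offset) for bl-bit words,
    like the removed operator[] did with __n / _S_word_bit, __n % _S_word_bit."""
    return divmod(n, bl)


def old_length(bl, start, so, finish, fo):
    """Formula removed by the commit: uses the start offset `so`."""
    return (bl - so) + bl * ((finish - start) - 1) + fo


def new_length(bl, start, finish, fo):
    """Formula added by the commit: assumes the start offset is always 0."""
    return bl * (finish - start) + fo


if __name__ == "__main__":
    bl = 64  # bits per word; the printer computes this as 8 * itype.sizeof
    for n in range(300):
        # A vector of n bits starting at word 0, offset 0 ends at bit_position(n).
        finish, fo = bit_position(n, bl)
        assert old_length(bl, 0, 0, finish, fo) == n
        assert new_length(bl, 0, finish, fo) == n
    print(bit_position(130, bl))  # (2, 2)
```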

4 years, 9 months

[TCWG CI] Regression caused by gcc:9e58de3ce00fc2385c9efb7faf321e0c601f0b0c

by ci_notify＠linaro.org

Identified regression caused by *gcc:9e58de3ce00fc2385c9efb7faf321e0c601f0b0c*:
commit 9e58de3ce00fc2385c9efb7faf321e0c601f0b0c
Author: Andrew Pinski <apinski(a)marvell.com>

Fix PR lto/49664: liblto_plugin.so exports too many symbols

Results regressed to (for first_bad == 9e58de3ce00fc2385c9efb7faf321e0c601f0b0c)
# reset_artifacts: -10
# true: 0
# build_abe binutils: 1
# First few build errors in logs:

from (for last_good == 512b383534785f9fc021e700a1fdda86cf0f3fe7)
# reset_artifacts: -10
# true: 0
# build_abe binutils: 1
# build_abe bootstrap_lto: 2

This commit has regressed these CI configurations:
 - tcwg_gcc_bootstrap/master-aarch64-bootstrap_lto

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…

Reproduce builds:
<cut>
mkdir investigate-gcc-9e58de3ce00fc2385c9efb7faf321e0c601f0b0c
cd investigate-gcc-9e58de3ce00fc2385c9efb7faf321e0c601f0b0c

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 9e58de3ce00fc2385c9efb7faf321e0c601f0b0c
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 512b383534785f9fc021e700a1fdda86cf0f3fe7
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 9e58de3ce00fc2385c9efb7faf321e0c601f0b0c
Author: Andrew Pinski <apinski(a)marvell.com>
Date: Sun Sep 12 08:58:16 2021 +0000

Fix PR lto/49664: liblto_plugin.so exports too many symbols

So right now liblto_plugin.so exports many libiberty symbols and simple_object file symbols but really it just needs to export onload.
This fixes the problem by using "-export-symbols-regex onload" on the libtool link line.

lto-plugin/ChangeLog:

	PR lto/49664
	* Makefile.am: Export only onload.
	* Makefile.in: Regenerate.
---
 lto-plugin/Makefile.am | 3 ++-
 lto-plugin/Makefile.in | 7 ++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index 8b20e1d1d87..988d7a78294 100644
--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -21,7 +21,8 @@ in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
 liblto_plugin_la_SOURCES = lto-plugin.c
 # Note that we intentionally override the bindir supplied by ACX_LT_HOST_FLAGS.
 liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) \
-	$(lt_host_flags) -module -avoid-version -bindir $(libexecsubdir)
+	$(lt_host_flags) -module -avoid-version -bindir $(libexecsubdir) \
+	-export-symbols-regex onload
 # Can be simplified when libiberty becomes a normal convenience library.
 libiberty = $(with_libiberty)/libiberty.a
 libiberty_noasan = $(with_libiberty)/noasan/libiberty.a
diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index 20611c6b1e6..f8df31bb1e8 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -323,6 +323,7 @@ prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
 real_target_noncanonical = @real_target_noncanonical@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
@@ -350,9 +351,9 @@ libexecsub_LTLIBRARIES = liblto_plugin.la
 in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
 liblto_plugin_la_SOURCES = lto-plugin.c
 # Note that we intentionally override the bindir supplied by ACX_LT_HOST_FLAGS.
-liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) $(lt_host_flags) -module -avoid-version \
-	-bindir $(libexecsubdir) $(if $(wildcard \
-	$(libiberty_noasan)),, $(if $(wildcard \
+liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) $(lt_host_flags) -module \
+	-avoid-version -bindir $(libexecsubdir) -export-symbols-regex \
+	onload $(if $(wildcard $(libiberty_noasan)),, $(if $(wildcard \
 	$(libiberty_pic)),,-Wc,$(libiberty)))
 # Can be simplified when libiberty becomes a normal convenience library.
 libiberty = $(with_libiberty)/libiberty.a
</cut>
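The commit restricts the plugin's exported symbols by passing `-export-symbols-regex onload` on the libtool link line. A minimal Python sketch of that filtering idea follows. It assumes libtool applies the regex as an unanchored, egrep-style search over symbol names; that is an assumption, not something stated in the commit. The symbol list is illustrative: `onload` is the entry point named in the commit message, the other names stand in for the libiberty and simple_object symbols it mentions, and `exported_symbols` is a hypothetical helper.

```python
import re


def exported_symbols(symbols, regex):
    """Return the symbols kept for -export-symbols-regex REGEX, assuming an
    unanchored search (any symbol containing a match is kept)."""
    pattern = re.compile(regex)
    return [name for name in symbols if pattern.search(name)]


# Illustrative symbol names only; a real list would come from the object files.
symbols = [
    "onload",                      # plugin entry point named in the commit
    "xmalloc",                     # stand-in for libiberty symbols
    "simple_object_start_read",    # stand-ins for simple_object symbols
    "simple_object_release_read",
]

if __name__ == "__main__":
    print(exported_symbols(symbols, "onload"))  # ['onload']
```

Under this assumption the pattern is unanchored, so any other symbol whose name merely contains `onload` would also be exported; an anchored pattern such as `^onload$` would be stricter.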

4 years, 9 months

[TCWG CI] Regression caused by binutils: Automatic date update in version.in

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: Automatic date update in version.in:
commit dc746ef741993a7aed1f7fc0083cd7a9636481a3
Author: GDB Administrator <gdbadmin(a)sourceware.org>

Automatic date update in version.in

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 28705
# linux build successful: all
# First few build errors in logs:

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 28705
# linux build successful: all
# linux boot successful: boot

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_kernel/gnu-master-aarch64-lts-allmodconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…

Reproduce builds:
<cut>
mkdir investigate-binutils-dc746ef741993a7aed1f7fc0083cd7a9636481a3
cd investigate-binutils-dc746ef741993a7aed1f7fc0083cd7a9636481a3

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach dc746ef741993a7aed1f7fc0083cd7a9636481a3
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach f677852bbdaeac38c7d8ef859905879a21d5bb71
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit dc746ef741993a7aed1f7fc0083cd7a9636481a3
Author: GDB Administrator <gdbadmin(a)sourceware.org>
Date: Thu Sep 16 00:00:07 2021 +0000

Automatic date update in version.in
---
 bfd/version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bfd/version.h b/bfd/version.h
index 3fc5b8197cf..f7ae1790855 100644
--- a/bfd/version.h
+++ b/bfd/version.h
@@ -16,7 +16,7 @@
 
    In releases, the date is not included in either version strings or
    sonames. */
-#define BFD_VERSION_DATE 20210915
+#define BFD_VERSION_DATE 20210916
 #define BFD_VERSION @bfd_version@
 #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@
 #define REPORT_BUGS_TO @report_bugs_to@
</cut>

4 years, 9 months

[ACTIVITY] report week ending 16 Sep

by Peter Maydell

Progress (short week, 3 days)

 * UM-2 [QEMU upstream maintainership]
   + more code review, notably the Apple Silicon hvf support, which is nearly ready to go in
 * QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
   + Sent out v2 of the "optimized code gen for MVE" patchset; this now covers all the insns that have an easy optimized version.
   + Fixed a bug where we weren't correctly setting up FPSCR.LTPSIZE when using QEMU's user-mode-only emulator
   + Wrote some code to add support for the (not yet finalized) gdbstub XML that tells GDB that the guest CPU has MVE. This causes a GDB with the MVE handling to crash, so one or the other of us has got something wrong :-)

KVM Forum was this week, as a 2-day virtual conference. I felt the programme was comparatively a bit small this year, but there were some interesting talks. Also a BoF session on whether/how we should consider adding Rust code to QEMU: I am pushing for (a) a clearer medium-to-long-term vision of where we would be going and why we'd be doing this and (b) more design-sketch type work of "what would XYZ in rust look like", which would hopefully both (a) make the benefit/lack thereof a bit more clear and (b) demonstrate that there are enough people enthusiastic enough about the prospect to make it a success...

-- PMM

4 years, 9 months
