After llvm commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Turn on the new pass manager by default
the following benchmarks grew in size by more than 1%:
- 403.gcc grew in size by 2% from 2586180 to 2648252 bytes
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
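The following is a hedged sketch of one way to use those save-temps tarballs to pinpoint where the code growth comes from; the URLs above are truncated, and the tarball and directory names below are placeholders:
<cut>
# Download and unpack the first_bad and last_good save-temps tarballs
# (substitute the full URLs from the links above; file names are assumptions).
curl -o first_bad-save-temps.tar.xz "<First_bad save-temps URL>" --fail
curl -o last_good-save-temps.tar.xz "<Last_good save-temps URL>" --fail
mkdir -p first_bad last_good
tar -xf first_bad-save-temps.tar.xz -C first_bad
tar -xf last_good-save-temps.tar.xz -C last_good
# Diff the generated assembly to see which functions changed or grew.
diff -ru last_good first_bad > asm.diff
</cut>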
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1
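For reference, here is a minimal sketch of the kind of compiler invocation this configuration implies (assuming a cross Clang for aarch64-linux-gnu with a suitable sysroot; the exact command lines used by the CI are recorded in the build manifests linked below, and the source file and sysroot path here are placeholders):
<cut>
# -Os -flto with the LLVM linker, targeting aarch64-linux-gnu (sketch only).
clang --target=aarch64-linux-gnu --sysroot=/path/to/sysroot -Os -flto -c foo.c -o foo.o
clang --target=aarch64-linux-gnu --sysroot=/path/to/sysroot -Os -flto -fuse-ld=lld foo.o -o foo
</cut>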
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh
cd ..
</cut>
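After each ../artifacts/test.sh run above, the reported size delta can be sanity-checked locally; a hedged sketch (the benchmark binary paths are placeholders and depend on the CI build tree layout):
<cut>
# Compare section sizes of the 403.gcc binaries produced by the last_good and
# first_bad runs (paths are placeholders).
size /path/to/last_good/403.gcc /path/to/first_bad/403.gcc
</cut>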
Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
---
llvm/CMakeLists.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
endif()
option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})
-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
- "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+ "Enable the new pass manager by default.")
include(HandleLLVMOptions)
</cut>
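For local experiments, the opt-out mechanisms named in the commit message above can be combined with this report's benchmark flags; a hedged sketch (source file, sysroot and build directory are placeholders, and the exact CI invocations are in the manifests):
<cut>
# Opt back into the legacy pass manager in Clang and in LTO via ELF LLD,
# per the commit message above (sketch only).
clang --target=aarch64-linux-gnu -Os -flto -flegacy-pass-manager -c foo.c -o foo.o
clang --target=aarch64-linux-gnu -Os -flto -fuse-ld=lld \
      -Wl,--lto-legacy-pass-manager foo.o -o foo
# Or configure LLVM itself with the old default (run from an LLVM build directory):
cmake -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF ../llvm
</cut>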
After gcc commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
libstdc++: Add floating-point std::to_chars implementation
the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII:libstdc++.so.6.0.29 grew in size by 12% from 1245370 to 1391240 bytes
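A hedged sketch of how this library size delta might be double-checked from the two builds (paths are placeholders, and the exact size metric used by the CI is not restated here):
<cut>
# Compare file size and section sizes of the two copies of libstdc++.so
# (paths are placeholders).
stat -c %s /path/to/last_good/libstdc++.so.6.0.29 /path/to/first_bad/libstdc++.so.6.0.29
size /path/to/last_good/libstdc++.so.6.0.29 /path/to/first_bad/libstdc++.so.6.0.29
</cut>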
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: arm-linux-gnueabihf
- Compiler flags: -Os -mthumb
- Hardware: APM Mustang 8x X-Gene1
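As in the previous report, a minimal sketch of the implied compiler invocation for this configuration (a cross Clang for arm-linux-gnueabihf is assumed; source file and sysroot path are placeholders):
<cut>
# -Os -mthumb, no LTO, targeting arm-linux-gnueabihf (sketch only).
clang --target=arm-linux-gnueabihf --sysroot=/path/to/sysroot -Os -mthumb -c foo.c -o foo.o
clang --target=arm-linux-gnueabihf --sysroot=/path/to/sysroot -Os -mthumb -fuse-ld=lld foo.o -o foo
</cut>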
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
cd investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 3c57e692357c79ee7623dfc1586652aee2aefb8f
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5033506993ef92589373270a8e8dbbf50e3ebef1
../artifacts/test.sh
cd ..
</cut>
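Since the growth is attributed to libstdc++.so rather than to the benchmark code itself, a symbol-level comparison can show how much of the delta comes from the new exports; a hedged sketch (the library path is a placeholder):
<cut>
# List dynamic symbols with their sizes, sorted by size, and pick out the new
# floating-point to_chars exports added by the commit below.
nm -D -S --size-sort /path/to/first_bad/libstdc++.so.6.0.29 | grep to_chars
</cut>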
Full commit (up to 1000 lines):
<cut>
commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
Date: Thu Dec 17 23:11:34 2020 -0500
libstdc++: Add floating-point std::to_chars implementation
This implements the floating-point std::to_chars overloads for float,
double and long double. We use the Ryu library to compute the shortest
round-trippable fixed and scientific forms for float, double and long
double. We also use Ryu for performing explicit-precision fixed and
scientific formatting for float and double. For explicit-precision
formatting for long double we fall back to using printf. Hexadecimal
formatting for float, double and long double is implemented from
scratch.
The supported long double binary formats are binary64, binary80 (x86
80-bit extended precision), binary128 and ibm128.
Much of the complexity of the implementation is in computing the exact
output length before handing it off to Ryu (which doesn't do bounds
checking). In some cases it's hard to compute the output length
beforehand, so in these cases we instead compute an upper bound on the
output length and use a sufficiently-sized intermediate buffer only if
necessary.
Another source of complexity is in the general-with-precision formatting
mode, where we need to do zero-trimming of the string returned by Ryu,
and where we also take care to avoid having to format the number through
Ryu a second time when the general formatting mode resolves to fixed
(which we determine by doing a scientific formatting first and
inspecting the scientific exponent). We avoid going through Ryu twice
by instead transforming the scientific form to the corresponding fixed
form via in-place string manipulation.
This implementation is non-conforming in a couple of ways:
1. For the shortest hexadecimal formatting, we currently follow the
Microsoft implementation's decision to be consistent with the
output of printf's '%a' specifier at the expense of sometimes not
printing the shortest representation. For example, the shortest hex
form for the number 1.08p+0 is 2.1p-1, but we output the former
instead of the latter, as does printf.
2. The Ryu routine generic_binary_to_decimal that we use for performing
shortest formatting for large floating point types is implemented
using the __int128 type, but some targets with a large long double
type lack __int128 (e.g. i686), so we can't perform shortest
formatting of long double on such targets through Ryu. As a
temporary stopgap this patch makes the long double to_chars overloads
just dispatch to the double overloads on these targets, which means
we lose precision in the output. (We could potentially fix this by
writing a specialized version of Ryu's generic_binary_to_decimal
routine that uses uint64_t instead of __int128.) [Though I wonder if
there's a better way to work around the lack of __int128 on i686
specifically?]
3. Our shortest formatting for __ibm128 doesn't guarantee the round-trip
property if the difference between the high- and low-order exponent
is large. This is because we treat __ibm128 as if it has a
contiguous 105-bit mantissa by merging the mantissas of the high-
and low-order parts (using code extracted from glibc), so we
potentially lose precision from the low-order part. This seems to be
consistent with how glibc printf formats __ibm128.
libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver: Add new exports.
* include/std/charconv (to_chars): Declare the floating-point
overloads for float, double and long double.
* src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
* src/c++17/Makefile.in: Regenerate.
* src/c++17/floating_to_chars.cc: New file.
(to_chars): Define for float, double and long double.
* testsuite/20_util/to_chars/long_double.cc: New test.
---
libstdc++-v3/config/abi/pre/gnu.ver | 7 +
libstdc++-v3/include/std/charconv | 24 +
libstdc++-v3/src/c++17/Makefile.am | 1 +
libstdc++-v3/src/c++17/Makefile.in | 3 +-
libstdc++-v3/src/c++17/floating_to_chars.cc | 1563 ++++++++++++++++++++
.../testsuite/20_util/to_chars/long_double.cc | 199 +++
6 files changed, 1796 insertions(+), 1 deletion(-)
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 4b4bd8ab6da..05e0a512247 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2393,6 +2393,13 @@ GLIBCXX_3.4.29 {
# std::once_flag::_M_finish(bool)
_ZNSt9once_flag9_M_finishEb;
+ # std::to_chars(char*, char*, [float|double|long double])
+ _ZSt8to_charsPcS_[defg];
+ # std::to_chars(char*, char*, [float|double|long double], chars_format)
+ _ZSt8to_charsPcS_[defg]St12chars_format;
+ # std::to_chars(char*, char*, [float|double|long double], chars_format, int)
+ _ZSt8to_charsPcS_[defg]St12chars_formati;
+
} GLIBCXX_3.4.28;
# Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index dd1ebdf8322..b57b0a16db2 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -702,6 +702,30 @@ namespace __detail
chars_format __fmt = chars_format::general) noexcept;
#endif
+ // Floating-point std::to_chars
+
+ // Overloads for float.
+ to_chars_result to_chars(char* __first, char* __last, float __value) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, float __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, float __value,
+ chars_format __fmt, int __precision) noexcept;
+
+ // Overloads for double.
+ to_chars_result to_chars(char* __first, char* __last, double __value) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, double __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, double __value,
+ chars_format __fmt, int __precision) noexcept;
+
+ // Overloads for long double.
+ to_chars_result to_chars(char* __first, char* __last, long double __value)
+ noexcept;
+ to_chars_result to_chars(char* __first, char* __last, long double __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, long double __value,
+ chars_format __fmt, int __precision) noexcept;
+
_GLIBCXX_END_NAMESPACE_VERSION
} // namespace std
#endif // C++14
diff --git a/libstdc++-v3/src/c++17/Makefile.am b/libstdc++-v3/src/c++17/Makefile.am
index 37cdb53c076..2ec5ed621ca 100644
--- a/libstdc++-v3/src/c++17/Makefile.am
+++ b/libstdc++-v3/src/c++17/Makefile.am
@@ -51,6 +51,7 @@ endif
sources = \
floating_from_chars.cc \
+ floating_to_chars.cc \
fs_dir.cc \
fs_ops.cc \
fs_path.cc \
diff --git a/libstdc++-v3/src/c++17/Makefile.in b/libstdc++-v3/src/c++17/Makefile.in
index ccae721ab3f..9b36b7a916c 100644
--- a/libstdc++-v3/src/c++17/Makefile.in
+++ b/libstdc++-v3/src/c++17/Makefile.in
@@ -124,7 +124,7 @@ LTLIBRARIES = $(noinst_LTLIBRARIES)
libc__17convenience_la_LIBADD =
@ENABLE_DUAL_ABI_TRUE@am__objects_1 = cow-fs_dir.lo cow-fs_ops.lo \
@ENABLE_DUAL_ABI_TRUE@ cow-fs_path.lo
-am__objects_2 = floating_from_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
+am__objects_2 = floating_from_chars.lo floating_to_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
memory_resource.lo $(am__objects_1)
@ENABLE_DUAL_ABI_TRUE@am__objects_3 = cow-string-inst.lo
@ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_4 = ostream-inst.lo \
@@ -440,6 +440,7 @@ headers =
sources = \
floating_from_chars.cc \
+ floating_to_chars.cc \
fs_dir.cc \
fs_ops.cc \
fs_path.cc \
diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc b/libstdc++-v3/src/c++17/floating_to_chars.cc
new file mode 100644
index 00000000000..dd83f5eea93
--- /dev/null
+++ b/libstdc++-v3/src/c++17/floating_to_chars.cc
@@ -0,0 +1,1563 @@
+// std::to_chars implementation for floating-point types -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+// Activate __glibcxx_assert within this file to shake out any bugs.
+#define _GLIBCXX_ASSERTIONS 1
+
+#include <charconv>
+
+#include <bit>
+#include <cfenv>
+#include <cassert>
+#include <cmath>
+#include <cstdio>
+#include <cstring>
+#include <langinfo.h>
+#include <optional>
+#include <string_view>
+#include <type_traits>
+
+// Determine the binary format of 'long double'.
+
+// We support the binary64, float80 (i.e. x86 80-bit extended precision),
+// binary128, and ibm128 formats.
+#define LDK_UNSUPPORTED 0
+#define LDK_BINARY64 1
+#define LDK_FLOAT80 2
+#define LDK_BINARY128 3
+#define LDK_IBM128 4
+
+#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
+# define LONG_DOUBLE_KIND LDK_BINARY64
+#elif defined(__SIZEOF_INT128__)
+// The Ryu routines need a 128-bit integer type in order to do shortest
+// formatting of types larger than 64-bit double, so without __int128 we can't
+// support any large long double format. This is the case for e.g. i386.
+# if __LDBL_MANT_DIG__ == 64
+# define LONG_DOUBLE_KIND LDK_FLOAT80
+# elif __LDBL_MANT_DIG__ == 113
+# define LONG_DOUBLE_KIND LDK_BINARY128
+# elif __LDBL_MANT_DIG__ == 106
+# define LONG_DOUBLE_KIND LDK_IBM128
+# endif
+#endif
+#if !defined(LONG_DOUBLE_KIND)
+# define LONG_DOUBLE_KIND LDK_UNSUPPORTED
+#endif
+
+namespace
+{
+ namespace ryu
+ {
+#include "ryu/common.h"
+#include "ryu/digit_table.h"
+#include "ryu/d2s_intrinsics.h"
+#include "ryu/d2s_full_table.h"
+#include "ryu/d2fixed_full_table.h"
+#include "ryu/f2s_intrinsics.h"
+#include "ryu/d2s.c"
+#include "ryu/d2fixed.c"
+#include "ryu/f2s.c"
+
+#ifdef __SIZEOF_INT128__
+ namespace generic128
+ {
+ // Put the generic Ryu bits in their own namespace to avoid name conflicts.
+# include "ryu/generic_128.h"
+# include "ryu/ryu_generic_128.h"
+# include "ryu/generic_128.c"
+ } // namespace generic128
+
+ using generic128::floating_decimal_128;
+ using generic128::generic_binary_to_decimal;
+
+ int
+ to_chars(const floating_decimal_128 v, char* const result)
+ { return generic128::generic_to_chars(v, result); }
+#endif
+ } // namespace ryu
+
+ // A traits class that contains pertinent information about the binary
+ // format of each of the floating-point types we support.
+ template<typename T>
+ struct floating_type_traits
+ { };
+
+ template<>
+ struct floating_type_traits<float>
+ {
+ // We (and Ryu) assume float has the IEEE binary32 format.
+ static_assert(__FLT_MANT_DIG__ == 24);
+ static constexpr int mantissa_bits = 23;
+ static constexpr int exponent_bits = 8;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = uint32_t;
+ using shortest_scientific_t = ryu::floating_decimal_32;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000011101011100110101100101101110000000000000000000000000 };
+ };
+
+ template<>
+ struct floating_type_traits<double>
+ {
+ // We (and Ryu) assume double has the IEEE binary64 format.
+ static_assert(__DBL_MANT_DIG__ == 53);
+ static constexpr int mantissa_bits = 52;
+ static constexpr int exponent_bits = 11;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = uint64_t;
+ using shortest_scientific_t = ryu::floating_decimal_64;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000011000110101110111000001100101110000111100,
+ 0b0111100011110101011000011110000000110110010101011000001110011111,
+ 0b0101101100000000011100100100111100110110110100010001010101110000,
+ 0b0011110010111000101111110101100011101100010001010000000101100111,
+ 0b0001010000011001011100100001010000010101101000001101000000000000 };
+ };
+
+#if LONG_DOUBLE_KIND == LDK_BINARY64
+ // When long double is equivalent to double, we just forward the long double
+ // overloads to the double overloads, so we don't need to define a a
+ // floating_type_traits<long double> specialization in this case.
+#elif LONG_DOUBLE_KIND == LDK_FLOAT80
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 64;
+ static constexpr int exponent_bits = 15;
+ static constexpr bool has_implicit_leading_bit = false;
+ using mantissa_t = uint64_t;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000110101011111110100010100110000011101,
+ 0b1001100101001111010011011111101000101111110001011001011101110000,
+ 0b0000101111111011110010001000001010111101011110111111010100011001,
+ 0b0011100000011111001101101011111001111100100010000101001111101001,
+ 0b0100100100000000100111010010101110011000110001101101110011001010,
+ 0b0111100111100010100000010011000010010110101111110101000011110100,
+ 0b1010100111100010011110000011011101101100010110000110101010101010,
+ 0b0000001111001111000000101100111011011000101000110011101100110010,
+ 0b0111000011100100101101010100001101111110101111001000010011111111,
+ 0b0010111000100110100100100010101100111010110001101010010111001000,
+ 0b0000100000010110000011001001000111000001111010100101101000001111,
+ 0b0010101011101000111100001011000010011101000101010010010000101111,
+ 0b1011111011101101110010101011010001111000101000101101011001100011,
+ 0b1010111011011011110111110011001010000010011001110100101101000101,
+ 0b0011000001110110011010010000011100100011001011001100001101010110,
+ 0b0100011111011000111111101000011110000010111110101001000000001001,
+ 0b1110000001110001001101101110011000100000001010000111100010111010,
+ 0b1110001001010011101000111000001000010100110000010110100011110000,
+ 0b0000011010110000110001111000011111000011001101001101001001000110,
+ 0b1010010111001000101001100101010110100100100010010010000101000010,
+ 0b1011001110000111100010100110000011100011111001110111001100000101,
+ 0b0110101001001000010110001000010001010101110101100001111100011001,
+ 0b1111100011110101011110011010101001010010100011000010110001101001,
+ 0b0100000100001000111101011100010011011111011001000000001100011000,
+ 0b1110111111000111100101110111110000000011001110011100011011011001,
+ 0b1100001100100000010001100011011000111011110000110011010101000011,
+ 0b1111111011100111011101001111111000010000001111010111110010000100,
+ 0b1110111001111110101111000101000000001010001110011010001000111010,
+ 0b1000010001011000101111111010110011111101110101101001111000111010,
+ 0b0100000111101001000111011001101000001010111011101001101111000100,
+ 0b0000011100110001000111011100111100110001101111111010110111100000,
+ 0b0000011101011100100110010011110101010100010011110010010111010000,
+ 0b0011011001100111110101111100001001101110101101001110110011110110,
+ 0b1011000101000001110100111001100100111100110011110000000001101000,
+ 0b1011100011110100001001110101010110111001000000001011101001011110,
+ 0b1111001010010010100000010110101010101011101000101000000000001100,
+ 0b1000001111100100111001110101100001010011111111000001000011110000,
+ 0b0001011101001000010000101101111000001110101100110011001100110111,
+ 0b1110011100000010101011011111001010111101111110100000011100000011,
+ 0b1001110110011100101010011110100010110001001110110000101011100110,
+ 0b1001101000100011100111010000011011100001000000110101100100001001,
+ 0b1010111000101000101101010111000010001100001010100011111100000100,
+ 0b0111101000100011000101101011111011100010001101110111001111001011,
+ 0b1110100111010110001110110110000000010110100011110000010001111100,
+ 0b1100010100011010001011001000111001010101011110100101011001000000,
+ 0b0000110001111001100110010110111010101101001101000000000010010101,
+ 0b0001110111101000001111101010110010010000111110111100000111110100,
+ 0b0111110111001001111000110001101101001010101110110101111110000100,
+ 0b0000111110111010101111100010111010011100010110011011011001000001,
+ 0b1010010100100100101110111111111000101100000010111111101101000110,
+ 0b1000100111111101100011001101000110001000000100010101010100001101,
+ 0b1100101010101000111100101100001000110001110010100000000010110101,
+ 0b1010000100111101100100101010010110100010000000110101101110000100,
+ 0b1011111011110001110000100100000000001010111010001101100000100100,
+ 0b0111101101100011001110011100000001000101101101111000100111011111,
+ 0b0100111010010011011001010011110100001100111010010101111111100011,
+ 0b0010001001011000111000001100110111110111110010100011000110110110,
+ 0b0101010110000000010000100000110100111011111101000100000111010010,
+ 0b0110000011011101000001010100110101101110011100110101000000001001,
+ 0b1101100110100000011000001111000100100100110001100110101010101100,
+ 0b0010100101010110010010001010101000011111111111001011001010001111,
+ 0b0111001010001111001100111001010101001000110101000011110000001000,
+ 0b0110010011001001001111110001010010001011010010001101110110110011,
+ 0b0110010100111011000100111000001001101011111001110010111110111111,
+ 0b0101110111001001101100110100101001110010101110011001101110001000,
+ 0b0100110101010111011010001100010111100011010011111001010100111000,
+ 0b0111000110110111011110100100010111000110000110110110110001111110,
+ 0b1000101101010100100100111110100011110110110010011001110011110101,
+ 0b1001101110101001010100111101101011000101000010110101101111110000,
+ 0b0100100101001011011001001011000010001101001010010001010110101000,
+ 0b0010100001001011100110101000010110000111000111000011100101011011,
+ 0b0110111000011001111101101011111010001000000010101000101010011110,
+ 0b1000110110100001111011000001111100001001000000010110010100100100,
+ 0b1001110100011111100111101011010000010101011100101000010010100110,
+ 0b0001010110101110100010101010001110110110100011101010001001111100,
+ 0b1010100101101100000010110011100110100010010000100100001110000100,
+ 0b0001000000010000001010000010100110000001110100111001110111101101,
+ 0b1100000000000000000000000000000000000000000000000000000000000000 };
+ };
+#elif LONG_DOUBLE_KIND == LDK_BINARY128
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 112;
+ static constexpr int exponent_bits = 15;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = unsigned __int128;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000000000000000000000000100000010000000,
+ 0b1011001111110100000100010101101110011100100110000110010110011000,
+ 0b1010100010001101111111000000001101010010100010010000111011110111,
+ 0b1011111001110001111000011111000010110111000111110100101010100101,
+ 0b0110100110011110011011000011000010011001110001001001010011100011,
+ 0b0000011111110010101111101011101010000110011111100111001110100111,
+ 0b0100010101010110000010111011110100000010011001001010001110111101,
+ 0b1101110111000010001101100000110100000111001001101011000101011011,
+ 0b0100111011101101010000001101011000101100101110010010110000101011,
+ 0b0100000110111000000110101000010011101000110100010110000011101101,
+ 0b1011001101001000100001010001100100001111011101010101110001010110,
+ 0b1000000001000000101001110010110010001111101101010101001100000110,
+ 0b0101110110100110000110000001001010111110001110010000111111010011,
+ 0b1010001111100111000100011100100100111100100101000001011001000111,
+ 0b1010011000011100110101100111001011100101111111100001110100000100,
+ 0b1100011100100010100000110001001010000000100000001001010111011101,
+ 0b0101110000100011001111101101000000100110000010010111010001111010,
+ 0b0100111100011010110111101000100110000111001001101100000001111100,
+ 0b1100100100111110101011000100000101011010110111000111110100110101,
+ 0b0110010000010111010100110011000000111010000010111011010110000100,
+ 0b0101001001010010110111010111000101011100000111100111000001110010,
+ 0b1101111111001011101010110001000111011010111101001011010110100100,
+ 0b0001000100110000011111101011001101110010110110010000000011100100,
+ 0b0001000000000101001001001000000000011000100011001110101001001110,
+ 0b0010010010001000111010011011100001000110011011011110110100111000,
+ 0b0000100110101100000111100010100100011100110111011100001111001100,
+ 0b1011111010001110001100000011110111111111100000001011111111101100,
+ 0b0000011100001111010101110000100110111100101101110111101001000001,
+ 0b1100010001110110111100001001001101101000011100000010110101001011,
+ 0b0100101001101011111001011110101101100011011111011100101010101111,
+ 0b0001101001111001110000101101101100001011010001011110011101000010,
+ 0b1111000000101001101111011010110011101110100001011011001011100010,
+ 0b0101001010111101101100001111100010010110001101001000001101100100,
+ 0b0101100101011110001100101011111000111001111001001001101101100001,
+ 0b1111001101010010100100011011000110110010001111000111010001001101,
+ 0b0001110010011000000001000110110111011000011100001000011001110111,
+ 0b0100001011011011011011110011101100100101111111101100101000001110,
+ 0b0101011110111101010111100111101111000101111111111110100011011010,
+ 0b1110101010001001110100000010110111010111111010111110100110010110,
+ 0b1010001111100001001100101000110100001100011100110010000011010111,
+ 0b1111111101101111000100111100000101011000001110011011101010111001,
+ 0b1111101100001110100101111101011001000100000101110000110010100011,
+ 0b1001010110110101101101000101010001010000101011011111010011010000,
+ 0b0111001110110011101001100111000001000100001010110000010000001101,
+ 0b0101111100111110100111011001111001111011011110010111010011101010,
+ 0b1110111000000001100100111001100100110001011011001110101111110111,
+ 0b0001010001001101010111101010011111000011110001101101011001111111,
+ 0b0101000011100011010010001101100001011101011010100110101100100010,
+ 0b0001000101011000100101111100110110000101101101111000110001001011,
+ 0b0101100101001011011000010101000000010100011100101101000010011111,
+ 0b1000010010001011101001011010100010111011110100110011011000100111,
+ 0b1000011011100001010111010111010011101100100010010010100100101001,
+ 0b1001001001010111110101000010111010000000101111010100001010010010,
+ 0b0011011110110010010101111011000001000000000011011111000011111011,
+ 0b1011000110100011001110000001000100000001011100010111010010011110,
+ 0b0111101110110101110111110000011000000100011100011000101101101110,
+ 0b1001100101111011011100011110101011001111100111101010101010110111,
+ 0b1100110010010001100011001111010000000100011101001111011101001111,
+ 0b1000111001111010100101000010000100000001001100101010001011001101,
+ 0b0011101011110000110010100101010100110010100001000010101011111101,
+ 0b1100000000000110000010101011000000011101000110011111100010111111,
+ 0b0010100110000011011100010110111100010110101100110011101110001101,
+ 0b0010111101010011111000111001111100110111111100100011110001101110,
+ 0b1001110111001001101001001001011000010100110001000000100011010110,
+ 0b0011110101100111011011111100001000011001010100111100100101111010,
+ 0b0010001101000011000010100101110000010101101000100110000100001010,
+ 0b0010000010100110010101100101110011101111000111111111001001100001,
+ 0b0100111111011011011011100111111011000010011101101111011111110110,
+ 0b1111111111010110101011101000100101110100001110001001101011100111,
+ 0b1011111101000101110000111100100010111010100001010000010010110010,
+ 0b1111010101001011101011101010000100110110001110111100100110111111,
+ 0b1011001101000001001101000010101010010110010001100001011100011010,
+ 0b0101001011011101010001110100010000010001111100100100100001001101,
+ 0b0010100000111001100011000101100101000001111100111001101000000010,
+ 0b1011001111010101011001000100100110100100110111110100000110111000,
+ 0b0101011111010011100011010010111101110010100001111111100010001001,
+ 0b0010111011101100100000000000001111111010011101100111100001001101,
+ 0b1101000000000000000000000000000000000000000000000000000000000000 };
+ };
+#elif LONG_DOUBLE_KIND == LDK_IBM128
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 105;
+ static constexpr int exponent_bits = 11;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = unsigned __int128;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000000000000000000000001000000100000000,
+ 0b0000000000000000000100000000000000000000001000000000000000000010,
+ 0b0000100000000000000000001001000000000000000001100100000000000000,
+ 0b0011000000000000000000000000000001110000010000000000000000000000,
+ 0b0000100000000000001000000000000000000000000000100000000000000000 };
+ };
+#endif
+
+ // An IEEE-style decomposition of a floating-point value of type T.
+ template<typename T>
+ struct ieee_t
+ {
+ typename floating_type_traits<T>::mantissa_t mantissa;
+ uint32_t biased_exponent;
+ bool sign;
+ };
+
+ // Decompose the floating-point value into its IEEE components.
+ template<typename T>
+ ieee_t<T>
+ get_ieee_repr(const T value)
+ {
+ constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
+ constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
+ constexpr int total_bits = mantissa_bits + exponent_bits + 1;
+
+ constexpr auto get_uint_t = [] {
+ if constexpr (total_bits <= 32)
+ return uint32_t{};
+ else if constexpr (total_bits <= 64)
+ return uint64_t{};
+#ifdef __SIZEOF_INT128__
+ else if constexpr (total_bits <= 128)
+ return (unsigned __int128){};
+#endif
+ };
+ using uint_t = decltype(get_uint_t());
+ uint_t value_bits = 0;
+ memcpy(&value_bits, &value, sizeof(value));
+
+ ieee_t<T> ieee_repr;
+ ieee_repr.mantissa = value_bits & ((uint_t{1} << mantissa_bits) - 1u);
+ ieee_repr.biased_exponent
+ = (value_bits >> mantissa_bits) & ((uint_t{1} << exponent_bits) - 1u);
+ ieee_repr.sign = (value_bits >> (mantissa_bits + exponent_bits)) & 1;
+ return ieee_repr;
+ }
+
+#if LONG_DOUBLE_KIND == LDK_IBM128
+ template<>
+ ieee_t<long double>
+ get_ieee_repr(const long double value)
+ {
+ // The layout of __ibm128 isn't compatible with the standard IEEE format.
+ // So we transform it into an IEEE-compatible format, suitable for
+ // consumption by the generic Ryu API, with an 11-bit exponent and 105-bit
+ // mantissa (plus an implicit leading bit). We use the exponent and sign
+ // of the high part, and we merge the mantissa of the high part with the
+ // mantissa (and the implicit leading bit) of the low part.
+ using uint_t = unsigned __int128;
+ uint_t value_bits = 0;
+ memcpy(&value_bits, &value, sizeof(value_bits));
+
+ const uint64_t value_hi = value_bits;
+ const uint64_t value_lo = value_bits >> 64;
+
+ uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1);
+ unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1);
+ const int sign_hi = (value_hi >> 63) & 1;
+
+ uint64_t mantissa_lo = value_lo & ((1ull << 52) - 1);
+ const unsigned exponent_lo = (value_lo >> 52) & ((1ull << 11) - 1);
+ const int sign_lo = (value_lo >> 63) & 1;
+
+ {
+ // The following code for adjusting the low-part mantissa to combine
+ // it with the high-part mantissa is taken from the glibc source file
+ // sysdeps/ieee754/ldbl-128ibm/printf_fphex.c.
+ mantissa_lo <<= 7;
+ if (exponent_lo != 0)
+ mantissa_lo |= (1ull << (52 + 7));
+ else
+ mantissa_lo <<= 1;
+
+ const int ediff = exponent_hi - exponent_lo - 53;
+ if (ediff > 63)
+ mantissa_lo = 0;
+ else if (ediff > 0)
+ mantissa_lo >>= ediff;
+ else if (ediff < 0)
+ mantissa_lo <<= -ediff;
+
+ if (sign_lo != sign_hi && mantissa_lo != 0)
+ {
+ mantissa_lo = (1ull << 60) - mantissa_lo;
+ if (mantissa_hi == 0)
+ {
+ mantissa_hi = 0xffffffffffffeLL | (mantissa_lo >> 59);
+ mantissa_lo = 0xfffffffffffffffLL & (mantissa_lo << 1);
+ exponent_hi--;
+ }
+ else
+ mantissa_hi--;
+ }
+ }
+
+ ieee_t<long double> ieee_repr;
+ ieee_repr.mantissa = ((uint_t{mantissa_hi} << 64)
+ | (uint_t{mantissa_lo} << 4)) >> 11;
+ ieee_repr.biased_exponent = exponent_hi;
+ ieee_repr.sign = sign_hi;
+ return ieee_repr;
+ }
+#endif
+
+ // Invoke Ryu to obtain the shortest scientific form for the given
+ // floating-point number.
+ template<typename T>
+ typename floating_type_traits<T>::shortest_scientific_t
+ floating_to_shortest_scientific(const T value)
+ {
+ if constexpr (std::is_same_v<T, float>)
+ return ryu::floating_to_fd32(value);
+ else if constexpr (std::is_same_v<T, double>)
+ return ryu::floating_to_fd64(value);
+#ifdef __SIZEOF_INT128__
+ else if constexpr (std::is_same_v<T, long double>)
+ {
+ constexpr int mantissa_bits
+ = floating_type_traits<T>::mantissa_bits;
+ constexpr int exponent_bits
+ = floating_type_traits<T>::exponent_bits;
+ constexpr bool has_implicit_leading_bit
+ = floating_type_traits<T>::has_implicit_leading_bit;
+
+ const auto [mantissa, exponent, sign] = get_ieee_repr(value);
+ return ryu::generic_binary_to_decimal(mantissa, exponent, sign,
+ mantissa_bits, exponent_bits,
+ !has_implicit_leading_bit);
+ }
+#endif
+ }
+
+ // This subroutine returns true if the shortest scientific form fd is a
+ // positive power of 10, and the floating-point number that has this shortest
+ // scientific form is smaller than this power of 10.
+ //
+ // For instance, the exactly-representable 64-bit number
+ // 99999999999999991611392.0 has the shortest scientific form 1e23, so its
+ // exact value is smaller than its shortest scientific form.
+ //
+ // For these powers of 10 the length of the fixed form is one digit less
+ // than what the scientific exponent suggests.
+ //
+ // This subroutine inspects a lookup table to detect when fd is such a
+ // "rounded up" power of 10.
+ template<typename T>
+ bool
+ is_rounded_up_pow10_p(const typename
+ floating_type_traits<T>::shortest_scientific_t fd)
+ {
+ if (fd.exponent < 0 || fd.mantissa != 1) [[likely]]
+ return false;
+
+ constexpr auto& pow10_adjustment_tab
+ = floating_type_traits<T>::pow10_adjustment_tab;
+ __glibcxx_assert(fd.exponent/64 < (int)std::size(pow10_adjustment_tab));
+ return (pow10_adjustment_tab[fd.exponent/64]
+ & (1ull << (63 - fd.exponent%64)));
+ }
+
+ int
+ get_mantissa_length(const ryu::floating_decimal_32 fd)
+ { return ryu::decimalLength9(fd.mantissa); }
+
+ int
+ get_mantissa_length(const ryu::floating_decimal_64 fd)
+ { return ryu::decimalLength17(fd.mantissa); }
+
+#ifdef __SIZEOF_INT128__
+ int
+ get_mantissa_length(const ryu::floating_decimal_128 fd)
+ { return ryu::generic128::decimalLength(fd.mantissa); }
+#endif
+} // anon namespace
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+// This subroutine of __floating_to_chars_* handles writing nan, inf and 0 in
+// all formatting modes.
+template<typename T>
+ static optional<to_chars_result>
+ __handle_special_value(char* first, char* const last, const T value,
+ const chars_format fmt, const int precision)
+ {
+ __glibcxx_assert(precision >= 0);
+
+ string_view str;
+ switch (__builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
+ FP_ZERO, value))
+ {
+ case FP_INFINITE:
+ str = "-inf";
+ break;
+
+ case FP_NAN:
+ str = "-nan";
+ break;
+
+ case FP_ZERO:
+ break;
+
+ default:
+ case FP_SUBNORMAL:
+ case FP_NORMAL: [[likely]]
+ return nullopt;
+ }
+
+ if (!str.empty())
+ {
+ // We're formatting +-inf or +-nan.
+ if (!__builtin_signbit(value))
+ str.remove_prefix(strlen("-"));
+
+ if (last - first < (int)str.length())
+ return {{last, errc::value_too_large}};
+
+ memcpy(first, &str[0], str.length());
+ first += str.length();
+ return {{first, errc{}}};
+ }
+
+ // We're formatting 0.
+ __glibcxx_assert(value == 0);
+ const auto orig_first = first;
+ const bool sign = __builtin_signbit(value);
+ int expected_output_length;
+ switch (fmt)
+ {
+ case chars_format::fixed:
+ case chars_format::scientific:
+ case chars_format::hex:
+ expected_output_length = sign + 1;
+ if (precision)
+ expected_output_length += strlen(".") + precision;
+ if (fmt == chars_format::scientific)
+ expected_output_length += strlen("e+00");
+ else if (fmt == chars_format::hex)
+ expected_output_length += strlen("p+0");
+ if (last - first < expected_output_length)
+ return {{last, errc::value_too_large}};
+
+ if (sign)
+ *first++ = '-';
+ *first++ = '0';
+ if (precision)
+ {
+ *first++ = '.';
+ memset(first, '0', precision);
+ first += precision;
+ }
+ if (fmt == chars_format::scientific)
+ {
+ memcpy(first, "e+00", 4);
+ first += 4;
+ }
+ else if (fmt == chars_format::hex)
+ {
+ memcpy(first, "p+0", 3);
+ first += 3;
+ }
+ break;
+
+ case chars_format::general:
+ default: // case chars_format{}:
+ expected_output_length = sign + 1;
+ if (last - first < expected_output_length)
+ return {{last, errc::value_too_large}};
+
+ if (sign)
+ *first++ = '-';
+ *first++ = '0';
+ break;
+ }
+ __glibcxx_assert(first - orig_first == expected_output_length);
+ return {{first, errc{}}};
+ }
+
+// This subroutine of the floating-point to_chars overloads performs
+// hexadecimal formatting.
+template<typename T>
+ static to_chars_result
+ __floating_to_chars_hex(char* first, char* const last, const T value,
+ const optional<int> precision)
+ {
+ if (precision.has_value() && precision.value() < 0) [[unlikely]]
+ // A negative precision argument is treated as if it were omitted.
+ return __floating_to_chars_hex(first, last, value, nullopt);
+
+ __glibcxx_requires_valid_range(first, last);
+
+ constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
+ constexpr bool has_implicit_leading_bit
+ = floating_type_traits<T>::has_implicit_leading_bit;
+ constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
+ constexpr int exponent_bias = (1u << (exponent_bits - 1)) - 1;
+ using mantissa_t = typename floating_type_traits<T>::mantissa_t;
+ constexpr int mantissa_t_width = sizeof(mantissa_t) * __CHAR_BIT__;
+
+ if (auto result = __handle_special_value(first, last, value,
+ chars_format::hex,
+ precision.value_or(0)))
+ return *result;
+
+ // Extract the sign, mantissa and exponent from the value.
+ const auto [ieee_mantissa, biased_exponent, sign] = get_ieee_repr(value);
+ const bool is_normal_number = (biased_exponent != 0);
+
+ // Calculate the unbiased exponent.
+ const int32_t unbiased_exponent = (is_normal_number
+ ? biased_exponent - exponent_bias
+ : 1 - exponent_bias);
+
+ // Shift the mantissa so that its bitwidth is a multiple of 4.
+ constexpr unsigned rounded_mantissa_bits = (mantissa_bits + 3) / 4 * 4;
+ static_assert(mantissa_t_width >= rounded_mantissa_bits);
+ mantissa_t effective_mantissa
+ = ieee_mantissa << (rounded_mantissa_bits - mantissa_bits);
+ if (is_normal_number)
+ {
+ if constexpr (has_implicit_leading_bit)
+ // Restore the mantissa's implicit leading bit.
+ effective_mantissa |= mantissa_t{1} << rounded_mantissa_bits;
+ else
+ // The explicit mantissa bit should already be set.
+ __glibcxx_assert(effective_mantissa & (mantissa_t{1} << (mantissa_bits
+ - 1u)));
+ }
+
+ // Compute the shortest precision needed to print this value exactly,
+ // disregarding trailing zeros.
+ constexpr int full_hex_precision = (has_implicit_leading_bit
+ ? (mantissa_bits + 3) / 4
+ // With an explicit leading bit, we
+ // use the four leading nibbles as the
+ // hexit before the decimal point.
+ : (mantissa_bits - 4 + 3) / 4);
+ const int trailing_zeros = __countr_zero(effective_mantissa) / 4;
+ const int shortest_full_precision = full_hex_precision - trailing_zeros;
+ __glibcxx_assert(shortest_full_precision >= 0);
+
+ int written_exponent = unbiased_exponent;
+ const int effective_precision = precision.value_or(shortest_full_precision);
+ if (effective_precision < shortest_full_precision)
+ {
+ // When limiting the precision, we need to determine how to round the
+ // least significant printed hexit. The following branchless
+ // bit-level-parallel technique computes whether to round up the
+ // mantissa bit at index N (according to round-to-nearest rules) when
+ // dropping N bits of precision, for each index N in the bit vector.
+ // This technique is borrowed from the MSVC implementation.
+ using bitvec = mantissa_t;
+ const bitvec round_bit = effective_mantissa << 1;
+ const bitvec has_tail_bits = round_bit - 1;
+ const bitvec lsb_bit = effective_mantissa;
+ const bitvec should_round = round_bit & (has_tail_bits | lsb_bit);
+
+ const int dropped_bits = 4*(full_hex_precision - effective_precision);
+ // Mask out the dropped nibbles.
+ effective_mantissa >>= dropped_bits;
+ effective_mantissa <<= dropped_bits;
+ if (should_round & (mantissa_t{1} << dropped_bits))
+ {
+ // Round up the least significant nibble.
+ effective_mantissa += mantissa_t{1} << dropped_bits;
+ // Check and adjust for overflow of the leading nibble. When the
+ // type has an implicit leading bit, then the leading nibble
+ // before rounding is either 0 or 1, so it can't overflow.
+ if constexpr (!has_implicit_leading_bit)
+ {
+ // The only supported floating-point type with explicit
+ // leading mantissa bit is LDK_FLOAT80, i.e. x86 80-bit
+ // extended precision, and so we hardcode the below overflow
+ // check+adjustment for this type.
+ static_assert(mantissa_t_width == 64
+ && rounded_mantissa_bits == 64);
+ if (effective_mantissa == 0)
+ {
+ // We rounded up the least significant nibble and the
+ // mantissa overflowed, e.g f.fcp+10 with precision=1
+ // became 10.0p+10. Absorb this extra hexit into the
+ // exponent to obtain 1.0p+14.
+ effective_mantissa
+ = mantissa_t{1} << (rounded_mantissa_bits - 4);
+ written_exponent += 4;
+ }
+ }
+ }
+ }
+
+ // Compute the leading hexit and mask it out from the mantissa.
+ char leading_hexit;
+ if constexpr (has_implicit_leading_bit)
+ {
+ const unsigned nibble = effective_mantissa >> rounded_mantissa_bits;
+ __glibcxx_assert(nibble <= 2);
+ leading_hexit = '0' + nibble;
+ effective_mantissa &= ~(mantissa_t{0b11} << rounded_mantissa_bits);
+ }
+ else
+ {
+ const unsigned nibble = effective_mantissa >> (rounded_mantissa_bits-4);
+ __glibcxx_assert(nibble < 16);
+ leading_hexit = "0123456789abcdef"[nibble];
+ effective_mantissa &= ~(mantissa_t{0b1111} << (rounded_mantissa_bits-4));
+ written_exponent -= 3;
+ }
+
+ // Now before we start writing the string, determine the total length of
+ // the output string and perform a single bounds check.
+ int expected_output_length = sign + 1;
+ if (effective_precision != 0)
+ expected_output_length += strlen(".") + effective_precision;
+ const int abs_written_exponent = abs(written_exponent);
+ expected_output_length += (abs_written_exponent >= 10000 ? strlen("p+ddddd")
+ : abs_written_exponent >= 1000 ? strlen("p+dddd")
+ : abs_written_exponent >= 100 ? strlen("p+ddd")
+ : abs_written_exponent >= 10 ? strlen("p+dd")
+ : strlen("p+d"));
+ if (last - first < expected_output_length)
+ return {last, errc::value_too_large};
+
+ const auto saved_first = first;
+ // Write the negative sign and the leading hexit.
+ if (sign)
+ *first++ = '-';
+ *first++ = leading_hexit;
+
+ if (effective_precision > 0)
+ {
+ *first++ = '.';
+ int written_hexits = 0;
+ // Extract and mask out the leading nibble after the decimal point,
+ // write its corresponding hexit, and repeat until the mantissa is
+ // empty.
+ int nibble_offset = rounded_mantissa_bits;
+ if constexpr (!has_implicit_leading_bit)
+ // We already printed the entire leading hexit.
+ nibble_offset -= 4;
+ while (effective_mantissa != 0)
+ {
+ nibble_offset -= 4;
+ const unsigned nibble = effective_mantissa >> nibble_offset;
+ __glibcxx_assert(nibble < 16);
+ *first++ = "0123456789abcdef"[nibble];
+ ++written_hexits;
+ effective_mantissa &= ~(mantissa_t{0b1111} << nibble_offset);
+ }
+ __glibcxx_assert(nibble_offset >= 0);
+ __glibcxx_assert(written_hexits <= effective_precision);
+ // Since the mantissa is now empty, every hexit hereafter must be '0'.
+ if (int remaining_hexits = effective_precision - written_hexits)
+ {
+ memset(first, '0', remaining_hexits);
+ first += remaining_hexits;
+ }
+ }
+
+ // Finally, write the exponent.
+ *first++ = 'p';
+ if (written_exponent >= 0)
+ *first++ = '+';
+ const to_chars_result result = to_chars(first, last, written_exponent);
+ __glibcxx_assert(result.ec == errc{}
+ && result.ptr == saved_first + expected_output_length);
+ return result;
+ }
+
+template<typename T>
+ static to_chars_result
+ __floating_to_chars_shortest(char* first, char* const last, const T value,
+ chars_format fmt)
+ {
+ if (fmt == chars_format::hex)
+ return __floating_to_chars_hex(first, last, value, nullopt);
+
+ __glibcxx_assert(fmt == chars_format::fixed
+ || fmt == chars_format::scientific
</cut>
[TCWG CI] Regression caused by binutils: PR28149, debug info with wrong file association:
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>
PR28149, debug info with wrong file association
Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
6363
# First few build errors in logs:
from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
7116
# linux build successful:
all
# linux boot successful:
boot
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-aarch64-lts-defconfig
First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Reproduce builds:
<cut>
mkdir investigate-binutils-51298b330327a568358da069d9808f51c6cb1672
cd investigate-binutils-51298b330327a568358da069d9808f51c6cb1672
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/
cd binutils
# Reproduce first_bad build
git checkout --detach 51298b330327a568358da069d9808f51c6cb1672
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5cdb4f14426a99ec8fcba843fa503efdc55fa078
../artifacts/test.sh
cd ..
</cut>
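The commit below changes how gas assigns file numbers for its own generated line info when the compiler passes -gdwarf-5; a hedged sketch of how the resulting line table can be inspected on a small test case (the cross-toolchain prefix and file names are assumptions):
<cut>
# Assemble a small file with DWARF 5 line info and dump the decoded line table
# to check which file each line entry is associated with.
aarch64-linux-gnu-gcc -gdwarf-5 -c test.c -o test.o
aarch64-linux-gnu-objdump --dwarf=decodedline test.o
</cut>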
Full commit (up to 1000 lines):
<cut>
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>
Date: Fri Sep 17 09:08:15 2021 +0930
PR28149, debug info with wrong file association
gcc-11 and gcc-12 pass -gdwarf-5 to gas, in order to prime gas for
DWARF 5 level debug info. Unfortunately it seems there are cases
where the compiler does not emit a .file or .loc dwarf debug directive
before any machine instructions. (Note that the .file directive
typically emitted as the first line of assembly output doesn't count as
a dwarf debug directive. The dwarf .file has a file number before the
file name string.)
This patch delays allocation of file numbers for gas generated line
debug info until the end of assembly, thus avoiding any clashes with
compiler generated file numbers. Two fixes for test case source are
necessary; A .loc can't use a file number that hasn't already been
specified with .file.
A followup patch will remove all the gas generated line info on
seeing a .file directive.
PR 28149
* dwarf2dbg.c (num_of_auto_assigned): Delete.
(current): Update initialisation.
(set_or_check_view): Replace all accesses to view with u.view.
(dwarf2_consume_line_info): Likewise.
(dwarf2_directive_loc): Likewise. Assert that we aren't generating
line info.
(dwarf2_gen_line_info_1): Don't call set_or_check_view on
gas generated line entries.
(dwarf2_gen_line_info): Set and track filenames for gas generated
line entries. Simplify generation of labels.
(get_directory_table_entry): Use filename_cmp when comparing dirs.
(do_allocate_filenum): New function.
(dwarf2_where): Set u.filename and filenum to -1 for gas generated
line entries.
(dwarf2_directive_filename): Remove num_of_auto_assigned handling.
(process_entries): Update view field access. Call
do_allocate_filenum.
* dwarf2dbg.h (struct dwarf2_line_info): Add filename field in
union aliasing view.
* testsuite/gas/i386/dwarf2-line-3.s: Add .file directive.
* testsuite/gas/i386/dwarf2-line-4.s: Likewise.
* testsuite/gas/i386/dwarf2-line-4.d: Update expected output.
* testsuite/gas/i386/dwarf4-line-1.d: Likewise.
* testsuite/gas/i386/dwarf5-line-1.d: Likewise.
* testsuite/gas/i386/dwarf5-line-2.d: Likewise.
---
gas/dwarf2dbg.c | 152 ++++++++++++++++++---------------
gas/dwarf2dbg.h | 7 +-
gas/testsuite/gas/i386/dwarf2-line-3.s | 1 +
gas/testsuite/gas/i386/dwarf2-line-4.d | 5 +-
gas/testsuite/gas/i386/dwarf2-line-4.s | 1 +
gas/testsuite/gas/i386/dwarf4-line-1.d | 4 +-
gas/testsuite/gas/i386/dwarf5-line-1.d | 4 +-
gas/testsuite/gas/i386/dwarf5-line-2.d | 3 +-
8 files changed, 105 insertions(+), 72 deletions(-)
diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c
index 9e3437b8948..c6303ba94a6 100644
--- a/gas/dwarf2dbg.c
+++ b/gas/dwarf2dbg.c
@@ -207,7 +207,6 @@ struct file_entry
static struct file_entry *files;
static unsigned int files_in_use;
static unsigned int files_allocated;
-static unsigned int num_of_auto_assigned;
/* Table of directories used by .debug_line. */
static char ** dirs = NULL;
@@ -233,7 +232,7 @@ static struct dwarf2_line_info current =
{
1, 1, 0, 0,
DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0,
- 0, NULL
+ 0, { NULL }
};
/* This symbol is used to recognize view number forced resets in loc
@@ -342,7 +341,7 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
/* First, compute !(E->label > P->label), to tell whether or not
we're to reset the view number. If we can't resolve it to a
constant, keep it symbolic. */
- if (!p || (e->loc.view == force_reset_view && force_reset_view))
+ if (!p || (e->loc.u.view == force_reset_view && force_reset_view))
{
viewx.X_op = O_constant;
viewx.X_add_number = 0;
@@ -367,9 +366,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
}
}
- if (S_IS_DEFINED (e->loc.view) && symbol_constant_p (e->loc.view))
+ if (S_IS_DEFINED (e->loc.u.view) && symbol_constant_p (e->loc.u.view))
{
- expressionS *value = symbol_get_value_expression (e->loc.view);
+ expressionS *value = symbol_get_value_expression (e->loc.u.view);
/* We can't compare the view numbers at this point, because in
VIEWX we've only determined whether we're to reset it so
far. */
@@ -404,16 +403,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
{
expressionS incv;
- if (!p->loc.view)
+ if (!p->loc.u.view)
{
- p->loc.view = symbol_temp_make ();
- gas_assert (!S_IS_DEFINED (p->loc.view));
+ p->loc.u.view = symbol_temp_make ();
+ gas_assert (!S_IS_DEFINED (p->loc.u.view));
}
memset (&incv, 0, sizeof (incv));
incv.X_unsigned = 1;
incv.X_op = O_symbol;
- incv.X_add_symbol = p->loc.view;
+ incv.X_add_symbol = p->loc.u.view;
incv.X_add_number = 1;
if (viewx.X_op == O_constant)
@@ -430,16 +429,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
}
}
- if (!S_IS_DEFINED (e->loc.view))
+ if (!S_IS_DEFINED (e->loc.u.view))
{
- symbol_set_value_expression (e->loc.view, &viewx);
- S_SET_SEGMENT (e->loc.view, expr_section);
- symbol_set_frag (e->loc.view, &zero_address_frag);
+ symbol_set_value_expression (e->loc.u.view, &viewx);
+ S_SET_SEGMENT (e->loc.u.view, expr_section);
+ symbol_set_frag (e->loc.u.view, &zero_address_frag);
}
/* Define and attempt to simplify any earlier views needed to
compute E's. */
- if (h && p && p->loc.view && !S_IS_DEFINED (p->loc.view))
+ if (h && p && p->loc.u.view && !S_IS_DEFINED (p->loc.u.view))
{
struct line_entry *h2;
/* Reverse the list to avoid quadratic behavior going backwards
@@ -459,7 +458,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
break;
set_or_check_view (r, r->next, NULL);
}
- while (r->next && r->next->loc.view && !S_IS_DEFINED (r->next->loc.view)
+ while (r->next
+ && r->next->loc.u.view
+ && !S_IS_DEFINED (r->next->loc.u.view)
&& (r = r->next));
/* Unreverse the list, so that we can go forward again. */
@@ -475,14 +476,14 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
view of the previous subsegment. */
if (r == h)
continue;
- gas_assert (S_IS_DEFINED (r->loc.view));
- resolve_expression (symbol_get_value_expression (r->loc.view));
+ gas_assert (S_IS_DEFINED (r->loc.u.view));
+ resolve_expression (symbol_get_value_expression (r->loc.u.view));
}
while (r != p && (r = r->next));
/* Now that we've defined and computed all earlier views that might
be needed to compute E's, attempt to simplify it. */
- resolve_expression (symbol_get_value_expression (e->loc.view));
+ resolve_expression (symbol_get_value_expression (e->loc.u.view));
}
}
@@ -518,10 +519,8 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc)
/* Subseg heads are chained to previous subsegs in
dwarf2_finish. */
- if (loc->view && lss->head)
- set_or_check_view (e,
- (struct line_entry *)lss->ptail,
- lss->head);
+ if (loc->filenum != -1u && loc->u.view && lss->head)
+ set_or_check_view (e, (struct line_entry *) lss->ptail, lss->head);
*lss->ptail = e;
lss->ptail = &e->next;
@@ -532,9 +531,6 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc)
void
dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc)
{
- static unsigned int line = -1;
- static unsigned int filenum = -1;
-
symbolS *sym;
/* Early out for as-yet incomplete location information. */
@@ -552,20 +548,35 @@ dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc)
symbols apply to assembler code. It is necessary to emit
duplicate line symbols when a compiler asks for them, because GDB
uses them to determine the end of the prologue. */
- if (debug_type == DEBUG_DWARF2
- && line == loc->line && filenum == loc->filenum)
- return;
+ if (debug_type == DEBUG_DWARF2)
+ {
+ static unsigned int line = -1;
+ static const char *filename = NULL;
+
+ if (line == loc->line)
+ {
+ if (filename == loc->u.filename)
+ return;
+ if (filename_cmp (filename, loc->u.filename) == 0)
+ {
+ filename = loc->u.filename;
+ return;
+ }
+ }
- line = loc->line;
- filenum = loc->filenum;
+ line = loc->line;
+ filename = loc->u.filename;
+ }
if (linkrelax)
{
- char name[120];
+ static int label_num = 0;
+ char name[32];
/* Use a non-fake name for the line number location,
so that it can be referred to by relocations. */
- sprintf (name, ".Loc.%u.%u", line, filenum);
+ sprintf (name, ".Loc.%u", label_num);
+ label_num++;
sym = symbol_new (name, now_seg, frag_now, ofs);
}
else
@@ -624,13 +635,15 @@ get_directory_table_entry (const char *dirname,
{
const char * pwd = file0_dirname ? file0_dirname : getpwd ();
- if (dwarf_level >= 5 && strcmp (dirname, pwd) != 0)
+ if (dwarf_level >= 5 && filename_cmp (dirname, pwd) != 0)
{
- /* In DWARF-5 the 0 entry in the directory table is expected to be
- the same as the DW_AT_comp_dir (which is set to the current build
- directory). Since we are about to create a directory entry that
- is not the same, allocate the current directory first.
- FIXME: Alternatively we could generate an error message here. */
+ /* In DWARF-5 the 0 entry in the directory table is
+ expected to be the same as the DW_AT_comp_dir (which
+ is set to the current build directory). Since we are
+ about to create a directory entry that is not the
+ same, allocate the current directory first.
+ FIXME: Alternatively we could generate an error
+ message here. */
(void) get_directory_table_entry (pwd, NULL, strlen (pwd),
true);
d = 1;
@@ -745,14 +758,30 @@ allocate_filenum (const char * pathname)
if (!assign_file_to_slot (i, file, dir))
return -1;
- num_of_auto_assigned++;
-
last_used = i;
last_used_dir_len = dir_len;
return i;
}
+/* Run through the list of line entries starting at E, allocating
+ file entries for gas generated debug. */
+
+static void
+do_allocate_filenum (struct line_entry *e)
+{
+ do
+ {
+ if (e->loc.filenum == -1u)
+ {
+ e->loc.filenum = allocate_filenum (e->loc.u.filename);
+ e->loc.u.view = NULL;
+ }
+ e = e->next;
+ }
+ while (e);
+}
+
/* Allocate slot NUM in the .debug_line file table to FILENAME.
If DIRNAME is not NULL or there is a directory component to FILENAME
then this will be stored in the directory table, if not already present.
@@ -929,17 +958,12 @@ dwarf2_where (struct dwarf2_line_info *line)
{
if (debug_type == DEBUG_DWARF2)
{
- const char *filename;
-
- memset (line, 0, sizeof (*line));
- filename = as_where (&line->line);
- line->filenum = allocate_filenum (filename);
- /* FIXME: We should check the return value from allocate_filenum. */
+ line->u.filename = as_where (&line->line);
+ line->filenum = -1u;
line->column = 0;
line->flags = DWARF2_FLAG_IS_STMT;
line->isa = current.isa;
line->discriminator = current.discriminator;
- line->view = NULL;
}
else
*line = current;
@@ -1018,7 +1042,7 @@ dwarf2_consume_line_info (void)
| DWARF2_FLAG_PROLOGUE_END
| DWARF2_FLAG_EPILOGUE_BEGIN);
current.discriminator = 0;
- current.view = NULL;
+ current.u.view = NULL;
}
/* Called for each (preferably code) label. If dwarf2_loc_mark_labels
@@ -1060,7 +1084,6 @@ dwarf2_directive_filename (void)
char *filename;
const char * dirname = NULL;
int filename_len;
- unsigned int i;
/* Continue to accept a bare string and pass it off. */
SKIP_WHITESPACE ();
@@ -1132,18 +1155,6 @@ dwarf2_directive_filename (void)
return NULL;
}
- if (num_of_auto_assigned)
- {
- /* Clear slots auto-assigned before the first .file <NUMBER>
- directive was seen. */
- if (files_in_use != (num_of_auto_assigned + 1))
- abort ();
- for (i = 1; i < files_in_use; i++)
- files[i].filename = NULL;
- files_in_use = 0;
- num_of_auto_assigned = 0;
- }
-
if (! allocate_filename_to_slot (dirname, filename, (unsigned int) num,
with_md5))
return NULL;
@@ -1191,6 +1202,11 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
return;
}
+ /* debug_type will be turned off by dwarf2_directive_filename, and
+ if we don't have a dwarf style .file then files_in_use will be
+ zero and the above error will trigger. */
+ gas_assert (debug_type == DEBUG_NONE);
+
current.filenum = filenum;
current.line = line;
current.discriminator = 0;
@@ -1333,7 +1349,7 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
S_SET_VALUE (sym, 0);
symbol_set_frag (sym, &zero_address_frag);
}
- current.view = sym;
+ current.u.view = sym;
}
else
{
@@ -1347,10 +1363,9 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
demand_empty_rest_of_line ();
dwarf2_any_loc_directive_seen = dwarf2_loc_directive_seen = true;
- debug_type = DEBUG_NONE;
/* If we were given a view id, emit the row right away. */
- if (current.view)
+ if (current.u.view)
dwarf2_emit_insn (0);
}
@@ -1984,7 +1999,7 @@ process_entries (segT seg, struct line_entry *e)
frag_ofs = S_GET_VALUE (lab);
if (last_frag == NULL
- || (e->loc.view == force_reset_view && force_reset_view
+ || (e->loc.u.view == force_reset_view && force_reset_view
/* If we're going to reset the view, but we know we're
advancing the PC, we don't have to force with
set_address. We know we do when we're at the same
@@ -2850,16 +2865,19 @@ dwarf2_finish (void)
struct line_subseg *lss = s->head;
struct line_entry **ptail = lss->ptail;
+ if (lss->head && SEG_NORMAL (s->seg))
+ do_allocate_filenum (lss->head);
+
/* Reset the initial view of the first subsection of the
section. */
- if (lss->head && lss->head->loc.view)
+ if (lss->head && lss->head->loc.u.view)
set_or_check_view (lss->head, NULL, NULL);
while ((lss = lss->next) != NULL)
{
/* Link the first view of subsequent subsections to the
previous view. */
- if (lss->head && lss->head->loc.view)
+ if (lss->head && lss->head->loc.u.view)
set_or_check_view (lss->head,
!s->head ? NULL : (struct line_entry *)ptail,
s->head ? s->head->head : NULL);
diff --git a/gas/dwarf2dbg.h b/gas/dwarf2dbg.h
index 14d770c40dd..700d9dec5cb 100644
--- a/gas/dwarf2dbg.h
+++ b/gas/dwarf2dbg.h
@@ -36,7 +36,12 @@ struct dwarf2_line_info
unsigned int isa;
unsigned int flags;
unsigned int discriminator;
- symbolS *view;
+ /* filenum == -1u chooses filename, otherwise view. */
+ union
+ {
+ symbolS *view;
+ const char *filename;
+ } u;
};
/* Implements the .file FILENO "FILENAME" directive. FILENO can be 0
diff --git a/gas/testsuite/gas/i386/dwarf2-line-3.s b/gas/testsuite/gas/i386/dwarf2-line-3.s
index 2085ef93940..e933719fbc3 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-3.s
+++ b/gas/testsuite/gas/i386/dwarf2-line-3.s
@@ -7,6 +7,7 @@
main:
.cfi_startproc
nop
+ .file 1 "dwarf2-test.c"
.loc 1 1
ret
.cfi_endproc
diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.d b/gas/testsuite/gas/i386/dwarf2-line-4.d
index c0c85f4639f..a01fd0540f3 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-4.d
+++ b/gas/testsuite/gas/i386/dwarf2-line-4.d
@@ -33,11 +33,14 @@ Raw dump of debug contents of section \.z?debug_line:
The File Name Table \(offset 0x.*\):
Entry Dir Time Size Name
- 1 1 0 0 dwarf2-line-4.s
+ 1 0 0 0 dwarf2-test.c
+ 2 1 0 0 dwarf2-line-4.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Special opcode 13: advance Address by 0 to 0x0 and Line by 8 to 9
+ \[0x.*\] Set File Name to entry 1 in the File Name Table
\[0x.*\] Advance Line by -8 to 1
\[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 1
\[0x.*\] Advance PC by 1 to 0x2
diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.s b/gas/testsuite/gas/i386/dwarf2-line-4.s
index 89bb62d9db7..7348f4be62c 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-4.s
+++ b/gas/testsuite/gas/i386/dwarf2-line-4.s
@@ -7,6 +7,7 @@
main:
.cfi_startproc
nop
+ .file 1 "dwarf2-test.c"
.loc 1 1
ret
.cfi_endproc
diff --git a/gas/testsuite/gas/i386/dwarf4-line-1.d b/gas/testsuite/gas/i386/dwarf4-line-1.d
index 4f8321e9bfd..8199efbb0c2 100644
--- a/gas/testsuite/gas/i386/dwarf4-line-1.d
+++ b/gas/testsuite/gas/i386/dwarf4-line-1.d
@@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line:
Entry Dir Time Size Name
1 0 0 0 foo.c
2 0 0 0 foo.h
+ 3 1 0 0 dwarf4-line-1.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Advance Line by 81 to 82
\[0x.*\] Copy
- \[0x.*\] Set File Name to entry 2 in the File Name Table
+ \[0x.*\] Set File Name to entry 3 in the File Name Table
\[0x.*\] Advance Line by -73 to 9
\[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 9
\[0x.*\] Advance PC by 3 to 0x4
diff --git a/gas/testsuite/gas/i386/dwarf5-line-1.d b/gas/testsuite/gas/i386/dwarf5-line-1.d
index f57fc47d269..2c2cf5696c4 100644
--- a/gas/testsuite/gas/i386/dwarf5-line-1.d
+++ b/gas/testsuite/gas/i386/dwarf5-line-1.d
@@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line:
0 \(indirect line string, offset: 0x.*\): .*/gas/testsuite
1 \(indirect line string, offset: 0x.*\): .*/gas/testsuite/gas/i386
- The File Name Table \(offset 0x.*, lines 2, columns 3\):
+ The File Name Table \(offset 0x.*, lines 3, columns 3\):
Entry Dir MD5 Name
0 0 0xbbd69fc03ce253b2dbaab2522dd519ae \(indirect line string, offset: 0x.*\): core.c
1 0 0x0 \(indirect line string, offset: 0x.*\): types.h
+ 2 1 0x0 \(indirect line string, offset: 0x.*\): dwarf5-line-1.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Special opcode 8: advance Address by 0 to 0x0 and Line by 3 to 4
\[0x.*\] Advance PC by 1 to 0x1
diff --git a/gas/testsuite/gas/i386/dwarf5-line-2.d b/gas/testsuite/gas/i386/dwarf5-line-2.d
index 2f96df510d0..85f98c8ab9c 100644
--- a/gas/testsuite/gas/i386/dwarf5-line-2.d
+++ b/gas/testsuite/gas/i386/dwarf5-line-2.d
@@ -36,9 +36,10 @@ Raw dump of debug contents of section \.z?debug_line:
0 \(indirect line string, offset: 0x.*\): .*/gas/testsuite
1 \(indirect line string, offset: 0x.*\): .*/gas/testsuite/gas/i386
- The File Name Table \(offset 0x.*, lines 1, columns 3\):
+ The File Name Table \(offset 0x.*, lines 2, columns 3\):
Entry Dir MD5 Name
0 0 0xbbd69fc03ce253b2dbaab2522dd519ae \(indirect line string, offset: 0x.*\): core.c
+ 1 1 0x0 \(indirect line string, offset: .*\): dwarf5-line-2.s
Line Number Statements:
\[0x.*\] Extended opcode 2: set Address to 0x0
</cut>
Progress (short week, 3 days)
* UM-2 [QEMU upstream maintainership]
+ more code review, notably the Apple Silicon hvf support, which is
nearly ready to go in
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Sent out v2 of the "optimized code gen for MVE" patchset;
this now covers all the insns that have an easy optimized version.
+ Fixed a bug where we weren't correctly setting up FPSCR.LTPSIZE
when using QEMU's user-mode-only emulator
+ Wrote some code to add support for the (not yet finalized) gdbstub
XML that tells GDB that the guest CPU has MVE. This causes a GDB
with the MVE handling to crash, so one or the other of us has
got something wrong :-)
KVM Forum was this week, as a 2-day virtual conference. I felt the
programme was comparatively a bit small this year, but there were some
interesting talks. There was also a BoF session on whether/how we should
consider adding Rust code to QEMU. I am pushing for (a) a clearer
medium-to-long-term vision of where we would be going and why we'd be
doing this, and (b) more design-sketch type work of "what would XYZ in
Rust look like", which would hopefully both make the benefit (or lack
thereof) a bit clearer and demonstrate that there are enough people
enthusiastic about the prospect to make it a success...
-- PMM
After llvm commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
[libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.
the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII,[.] contract<3> grew in size by 164%
Benchmark:
Toolchain: Clang + Glibc + LLVM Linker
Version: all components were built from their latest release branch
Target: aarch64-linux-gnu
Compiler flags: -Oz
Hardware: APM Mustang 8x X-Gene1
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Oz
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
cd investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c8905f1bb304f1cfe297312ae0dda9946cb27594
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date: Fri Sep 3 14:53:57 2021 -0400
[libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.
It appears when testing LLVM 13 on Power, we run into failures with the
`libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp` test case optimizing
values out.
Despite some of the functions in the test already being marked with optnone,
adding the `MarkAsLive()` calls inside of the pretty printer comparison functions
resolves the issues of the values being optimized out.
This patch aims to address https://llvm.org/PR51675.
Differential Revision: https://reviews.llvm.org/D109204
(cherry picked from commit 217c6d643124be312f4a99b203118744edb9d54c)
---
libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
index 2d8e9620089a..7c8d307d19fb 100644
--- a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
+++ b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
@@ -92,24 +92,28 @@ void MarkAsLive(Type &&) {}
template <typename TypeToPrint> void ComparePrettyPrintToChars(
TypeToPrint value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
template <typename TypeToPrint> void ComparePrettyPrintToRegex(
TypeToPrint value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
void CompareExpressionPrettyPrintToChars(
std::string value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
void CompareExpressionPrettyPrintToRegex(
std::string value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
</cut>
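The patch quoted above keeps the values alive by passing them to a helper call
before the debugger stops. As a rough illustration of why an extra use helps,
here is a minimal standalone C++ sketch (not the libc++ test itself); it uses
the common asm-based "do not optimize away" idiom in place of the test's
MarkAsLive() helper, and the function and variable names are illustrative only.

#include <string>

// Hypothetical helper, equivalent in spirit to MarkAsLive(): the empty asm
// pretends to read the object through its address, so the compiler must keep
// the value materialized at this point and a debugger breakpoint can print it.
template <typename T>
void keep_alive_for_debugger(T&& value) {
  asm volatile("" : : "r"(&value) : "memory");
}

static void compare_pretty_print(std::string value, const char *expectation) {
  keep_alive_for_debugger(value);       // without an opaque use, 'value' may be
  keep_alive_for_debugger(expectation); // optimized out before we stop here
  // place a debugger breakpoint here and print 'value'
}

int main() {
  compare_pretty_print("hello world", "std::string containing \"hello world\"");
}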
After gcc commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
c++ ICE with nested requirement as default tpl parm[PR94827]
the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 4%
Benchmark:
Toolchain: GCC + Glibc + GNU Linker
Version: all components were built from their latest release branch
Target: aarch64-linux-gnu
Compiler flags: -O3 -flto
Hardware: NVidia TX1 4x Cortex-A57
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Reproduce builds:
<cut>
mkdir investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
cd investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach c416c52bcdb120db5e8c53a51bd78c4360daf79b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b1983f4582bbe060b7da83578acb9ed653681fc8
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
Date: Thu Apr 30 08:23:16 2020 -0700
c++ ICE with nested requirement as default tpl parm[PR94827]
Template headers are not incrementally updated as we parse its parameters.
We maintain a dummy level until the closing > when we replace the dummy with
a real parameter set. requires processing was expecting a properly populated
arg_vec in current_template_parms, and then creates a self-mapping of parameters
from that. But we don't need to do that, just teach map_arguments to look at
TREE_VALUE when args is NULL.
* constraint.cc (map_arguments): If ARGS is null, it's a
self-mapping of parms.
(finish_nested_requirement): Do not pass argified
current_template_parms to normalization.
(tsubst_nested_requirement): Don't assert no template parms.
---
gcc/cp/ChangeLog | 10 ++++++++++
gcc/cp/constraint.cc | 27 ++++++++++++++++-----------
gcc/testsuite/g++.dg/concepts/pr94827.C | 15 +++++++++++++++
3 files changed, 41 insertions(+), 11 deletions(-)
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 1fa0e123cb1..3c57945cecf 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,13 @@
+2020-04-30 Jason Merrill <jason(a)redhat.com>
+ Nathan Sidwell <nathan(a)acm.org>
+
+ PR c++/94827
+ * constraint.cc (map_arguments): If ARGS is null, it's a
+ self-mapping of parms.
+ (finish_nested_requirement): Do not pass argified
+ current_template_parms to normalization.
+ (tsubst_nested_requirement): Don't assert no template parms.
+
2020-04-30 Iain Sandoe <iain(a)sandoe.co.uk>
PR c++/94886
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 866b0f51b05..85513fecf43 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -546,12 +546,16 @@ static tree
map_arguments (tree parms, tree args)
{
for (tree p = parms; p; p = TREE_CHAIN (p))
- {
- int level;
- int index;
- template_parm_level_and_index (TREE_VALUE (p), &level, &index);
- TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
- }
+ if (args)
+ {
+ int level;
+ int index;
+ template_parm_level_and_index (TREE_VALUE (p), &level, &index);
+ TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
+ }
+ else
+ TREE_PURPOSE (p) = TREE_VALUE (p);
+
return parms;
}
@@ -2005,8 +2009,6 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
static tree
tsubst_nested_requirement (tree t, tree args, subst_info info)
{
- gcc_assert (!uses_template_parms (args));
-
/* Ensure that we're in an evaluation context prior to satisfaction. */
tree norm = TREE_VALUE (TREE_TYPE (t));
tree result = satisfy_constraint (norm, args, info);
@@ -2953,12 +2955,15 @@ finish_compound_requirement (location_t loc, tree expr, tree type, bool noexcept
tree
finish_nested_requirement (location_t loc, tree expr)
{
+ /* Currently open template headers have dummy arg vectors, so don't
+ pass into normalization. */
+ tree norm = normalize_constraint_expression (expr, NULL_TREE, false);
+ tree args = current_template_parms
+ ? template_parms_to_args (current_template_parms) : NULL_TREE;
+
/* Save the normalized constraint and complete set of normalization
arguments with the requirement. We keep the complete set of arguments
around for re-normalization during diagnostics. */
- tree args = current_template_parms
- ? template_parms_to_args (current_template_parms) : NULL_TREE;
- tree norm = normalize_constraint_expression (expr, args, false);
tree info = build_tree_list (args, norm);
/* Build the constraint, saving its normalization as its type. */
diff --git a/gcc/testsuite/g++.dg/concepts/pr94827.C b/gcc/testsuite/g++.dg/concepts/pr94827.C
new file mode 100644
index 00000000000..f14ec2551a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/pr94827.C
@@ -0,0 +1,15 @@
+// PR 94287 ICE looking inside open template-parm level
+// { dg-do run { target c++17 } }
+// { dg-options -fconcepts }
+
+template <typename T,
+ bool X = requires { requires (sizeof(T)==1); } >
+ int foo(T) { return X; }
+
+int main() {
+ if (!foo('4'))
+ return 1;
+ if (foo (4))
+ return 2;
+ return 0;
+}
</cut>
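The construct that used to ICE is compact enough to restate outside the
testsuite harness: a requires-expression containing a nested requirement, used
as the default value of a non-type template parameter. This is essentially what
the new pr94827.C test above exercises, minus the DejaGnu directives; the names
below are illustrative only.

// Needs -std=c++20, or -std=c++17 -fconcepts as in the testcase; unfixed GCC
// revisions crashed while normalizing the nested requirement in the default
// template argument.
template <typename T,
          bool SingleByte = requires { requires (sizeof(T) == 1); }>
int classify(T) { return SingleByte; }

int main() {
  return (classify('x') == 1 && classify(42) == 0) ? 0 : 1;
}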
After llvm commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Inform pass manager when child loops are deleted
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
cd investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach f17d60d620283b5d53286056ceeaeb8c27b6530a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach f56129fe78d5c849971017976c71333b6b1a27c6
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Date: Fri Sep 3 20:50:33 2021 +0200
Inform pass manager when child loops are deleted
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).
Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be resetted between different runs of for
example the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109257
(fixes PR51754)
(cherry-picked from commit 0f0344dd1e3b53387bb396070916e67f4c426da6)
---
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | 43 +++++++++----
.../nontrivial-unswitch-markloopasdeleted.ll | 71 ++++++++++++++++++++++
2 files changed, 102 insertions(+), 12 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index b9cccc2af309..b1c105258027 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -1587,10 +1587,12 @@ deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
BB->eraseFromParent();
}
-static void deleteDeadBlocksFromLoop(Loop &L,
- SmallVectorImpl<BasicBlock *> &ExitBlocks,
- DominatorTree &DT, LoopInfo &LI,
- MemorySSAUpdater *MSSAU) {
+static void
+deleteDeadBlocksFromLoop(Loop &L,
+ SmallVectorImpl<BasicBlock *> &ExitBlocks,
+ DominatorTree &DT, LoopInfo &LI,
+ MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
// Find all the dead blocks tied to this loop, and remove them from their
// successors.
SmallSetVector<BasicBlock *, 8> DeadBlockSet;
@@ -1640,6 +1642,7 @@ static void deleteDeadBlocksFromLoop(Loop &L,
}) &&
"If the child loop header is dead all blocks in the child loop must "
"be dead as well!");
+ DestroyLoopCB(*ChildL, ChildL->getName());
LI.destroy(ChildL);
return true;
});
@@ -1980,6 +1983,8 @@ static bool rebuildLoopAfterUnswitch(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
ParentL->removeChildLoop(llvm::find(*ParentL, &L));
else
LI.removeLoop(llvm::find(LI, &L));
+ // markLoopAsDeleted for L should be triggered by the caller (it is typically
+ // done by using the UnswitchCB callback).
LI.destroy(&L);
return false;
}
@@ -2019,7 +2024,8 @@ static void unswitchNontrivialInvariants(
SmallVectorImpl<BasicBlock *> &ExitBlocks, IVConditionInfo &PartialIVInfo,
DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
auto *ParentBB = TI.getParent();
BranchInst *BI = dyn_cast<BranchInst>(&TI);
SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
@@ -2319,7 +2325,7 @@ static void unswitchNontrivialInvariants(
// Now that our cloned loops have been built, we can update the original loop.
// First we delete the dead blocks from it and then we rebuild the loop
// structure taking these deletions into account.
- deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU);
+ deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU, DestroyLoopCB);
if (MSSAU && VerifyMemorySSA)
MSSAU->getMemorySSA()->verifyMemorySSA();
@@ -2670,7 +2676,8 @@ static bool unswitchBestCondition(
Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
// Collect all invariant conditions within this loop (as opposed to an inner
// loop which would be handled when visiting that inner loop).
SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>
@@ -2958,7 +2965,7 @@ static bool unswitchBestCondition(
<< "\n");
unswitchNontrivialInvariants(L, *BestUnswitchTI, BestUnswitchInvariants,
ExitBlocks, PartialIVInfo, DT, LI, AC,
- UnswitchCB, SE, MSSAU);
+ UnswitchCB, SE, MSSAU, DestroyLoopCB);
return true;
}
@@ -2988,7 +2995,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI, bool Trivial,
bool NonTrivial,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
assert(L.isRecursivelyLCSSAForm(DT, LI) &&
"Loops must be in LCSSA form before unswitching.");
@@ -3036,7 +3044,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
// Try to unswitch the best invariant condition. We prefer this full unswitch to
// a partial unswitch when possible below the threshold.
- if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU))
+ if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU,
+ DestroyLoopCB))
return true;
// No other opportunities to unswitch.
@@ -3083,6 +3092,10 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
U.markLoopAsDeleted(L, LoopName);
};
+ auto DestroyLoopCB = [&U](Loop &L, StringRef Name) {
+ U.markLoopAsDeleted(L, Name);
+ };
+
Optional<MemorySSAUpdater> MSSAU;
if (AR.MSSA) {
MSSAU = MemorySSAUpdater(AR.MSSA);
@@ -3091,7 +3104,8 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
}
if (!unswitchLoop(L, AR.DT, AR.LI, AR.AC, AR.AA, AR.TTI, Trivial, NonTrivial,
UnswitchCB, &AR.SE,
- MSSAU.hasValue() ? MSSAU.getPointer() : nullptr))
+ MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+ DestroyLoopCB))
return PreservedAnalyses::all();
if (AR.MSSA && VerifyMemorySSA)
@@ -3179,12 +3193,17 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
LPM.markLoopAsDeleted(*L);
};
+ auto DestroyLoopCB = [&LPM](Loop &L, StringRef /* Name */) {
+ LPM.markLoopAsDeleted(L);
+ };
+
if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();
bool Changed =
unswitchLoop(*L, DT, LI, AC, AA, TTI, true, NonTrivial, UnswitchCB, SE,
- MSSAU.hasValue() ? MSSAU.getPointer() : nullptr);
+ MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+ DestroyLoopCB);
if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
new file mode 100644
index 000000000000..455a38535576
--- /dev/null
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
@@ -0,0 +1,71 @@
+; RUN: opt < %s -enable-loop-distribute -passes='loop-distribute,loop-mssa(simple-loop-unswitch<nontrivial>),loop-distribute' -o /dev/null -S -debug-pass-manager=verbose 2>&1 | FileCheck %s
+
+
+; Running loop-distribute will result in LoopAccessAnalysis being required and
+; cached in the LoopAnalysisManagerFunctionProxy.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner<header><latch><exiting>
+
+
+; Then simple-loop-unswitch is removing/replacing some loops (resulting in
+; Loop objects used as key in the analyses cache is destroyed). So here we
+; want to see that any analysis results cached on the destroyed loop is
+; cleared. A special case here is that loop_a_inner is destroyed when
+; unswitching the parent loop.
+;
+; The bug solved and verified by this test case was related to the
+; SimpleLoopUnswitch not marking the Loop as removed, so we missed clearing
+; the analysis caches.
+;
+; CHECK: Running pass: SimpleLoopUnswitchPass on Loop at depth 1 containing: %loop_begin<header>,%loop_b,%loop_b_inner,%loop_b_inner_exit,%loop_a,%loop_a_inner,%loop_a_inner_exit,%latch<latch><exiting>
+; CHECK-NEXT: Clearing all analysis results for: loop_a_inner
+
+
+; When running loop-distribute the second time we can see that loop_a_inner
+; isn't analysed because the loop no longer exists (instead we find a new loop,
+; loop_a_inner.us). This kind of verifies that it was correct to remove the
+; loop_a_inner related analysis above.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner.us<header><latch><exiting>
+
+
+define i32 @test6(i1* %ptr, i1 %cond1, i32* %a.ptr, i32* %b.ptr) {
+entry:
+ br label %loop_begin
+
+loop_begin:
+ %v = load i1, i1* %ptr
+ br i1 %cond1, label %loop_a, label %loop_b
+
+loop_a:
+ br label %loop_a_inner
+
+loop_a_inner:
+ %va = load i1, i1* %ptr
+ %a = load i32, i32* %a.ptr
+ br i1 %va, label %loop_a_inner, label %loop_a_inner_exit
+
+loop_a_inner_exit:
+ %a.lcssa = phi i32 [ %a, %loop_a_inner ]
+ br label %latch
+
+loop_b:
+ br label %loop_b_inner
+
+loop_b_inner:
+ %vb = load i1, i1* %ptr
+ %b = load i32, i32* %b.ptr
+ br i1 %vb, label %loop_b_inner, label %loop_b_inner_exit
+
+loop_b_inner_exit:
+ %b.lcssa = phi i32 [ %b, %loop_b_inner ]
+ br label %latch
+
+latch:
+ %ab.phi = phi i32 [ %a.lcssa, %loop_a_inner_exit ], [ %b.lcssa, %loop_b_inner_exit ]
+ br i1 %v, label %loop_begin, label %loop_exit
+
+loop_exit:
+ %ab.lcssa = phi i32 [ %ab.phi, %latch ]
+ ret i32 %ab.lcssa
+}
</cut>
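To make the failure mode described in that commit message concrete, here is a
small self-contained C++ sketch (plain standard C++, not LLVM's actual
LoopAnalysisManager or BumpPtrAllocator; names are illustrative) of how results
cached under an object's address go stale once the allocator hands the same
address to a new object, and why an explicit deleted-object notification is
needed:

#include <cassert>
#include <map>
#include <new>
#include <string>

struct Loop { std::string name; };                // stand-in for llvm::Loop

struct AnalysisCache {
  std::map<const Loop *, std::string> results;    // keyed by pointer, like Loop*
  void mark_deleted(const Loop *l) { results.erase(l); }  // the needed callback
};

int main() {
  AnalysisCache cache;
  alignas(Loop) unsigned char slot[sizeof(Loop)]; // one bump-allocator slot

  Loop *a = new (slot) Loop{"loop_a_inner"};
  cache.results[a] = "LoopAccessAnalysis for loop_a_inner";

  a->~Loop();                // the loop is destroyed during unswitching...
  // cache.mark_deleted(a);  // ...but without this call the entry stays behind

  Loop *b = new (slot) Loop{"loop_b"};            // the allocator reuses the address
  auto it = cache.results.find(b);                // "hit": stale result of a dead loop
  assert(it != cache.results.end()
         && it->second == "LoopAccessAnalysis for loop_a_inner");

  b->~Loop();
  return 0;
}

The patch wires the DestroyLoopCB callback through deleteDeadBlocksFromLoop so
that exactly this kind of notification (markLoopAsDeleted) is issued when a
child loop is destroyed, which clears the cached analysis results.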
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7
This commit has regressed these CI configurations:
- tcwg_gnu_cross_build/master-aarch64
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Even more details: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400
c++: implement C++17 hardware interference size
The last missing piece of the C++17 standard library is the hardware
intereference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.
In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.
To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable. The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.
JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74
I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.
64 bytes still seems correct for all x86.
I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.
He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.
Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.
With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.
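[As an aside for readers of this report, and not part of the quoted patch: a
minimal C++17 usage sketch of these constants, in the spirit of the example the
patch adds to invoke.texi further below, guarded by the library feature-test
macro since not every standard library ships the values.]

#include <atomic>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
struct counters {
  // Keep the two hot atomics on separate cache lines to avoid false sharing.
  alignas(std::hardware_destructive_interference_size) std::atomic<int> produced{0};
  alignas(std::hardware_destructive_interference_size) std::atomic<int> consumed{0};
};

// A single atomic should fit comfortably within one "constructive" unit.
static_assert(sizeof(std::atomic<int>)
              <= std::hardware_constructive_interference_size);
#endif

int main() { return 0; }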
gcc/ChangeLog:
* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.
gcc/c-family/ChangeLog:
* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.
gcc/cp/ChangeLog:
* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.
libstdc++-v3/ChangeLog:
* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
---
gcc/c-family/c-cppbuiltin.c | 14 ++++++
gcc/c-family/c.opt | 5 ++
gcc/config/aarch64/aarch64.c | 22 +++++++++
gcc/config/arm/arm.c | 22 +++++++++
gcc/config/i386/i386-options.c | 6 +++
gcc/cp/constexpr.c | 33 +++++++++++++
gcc/cp/decl.c | 32 ++++++++++++
gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++
gcc/params.opt | 16 ++++++
gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++
gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++
gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++
gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++
gcc/testsuite/g++.target/arm/interference.C | 9 ++++
gcc/testsuite/g++.target/i386/interference.C | 8 +++
libstdc++-v3/include/std/version | 3 ++
libstdc++-v3/libsupc++/new | 10 +++-
17 files changed, 279 insertions(+), 2 deletions(-)
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 48cbefd8bf8..ce88e707127 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile)
builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
targetm.atomic_test_and_set_trueval);
+ /* Macros for C++17 hardware interference size constants. Either both or
+ neither should be set. */
+ gcc_assert (!param_destruct_interfere_size
+ == !param_construct_interfere_size);
+ if (param_destruct_interfere_size)
+ {
+ /* FIXME The way of communicating these values to the library should be
+ part of the C++ ABI, whether macro or builtin. */
+ builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
+ param_destruct_interfere_size);
+ builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
+ param_construct_interfere_size);
+ }
+
/* ptr_type_node can't be used here since ptr_mode is only set when
toplev calls backend_init which is not done with -E or pch. */
psize = POINTER_SIZE_UNITS;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c5fe90003f2..9c151d19870 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -722,6 +722,11 @@ Winit-list-lifetime
C++ ObjC++ Var(warn_init_list) Warning Init(1)
Warn about uses of std::initializer_list that can result in dangling pointers.
+Winterference-size
+C++ ObjC++ Var(warn_interference_size) Warning Init(1)
+Warn about nonsensical values of --param destructive-interference-size or
+constructive-interference-size.
+
Wimplicit
C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
Warn about implicit declarations.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 30d9a0b7a3d..36519ccc5a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l1_cache_line_size,
aarch64_tune_params.prefetch->l1_cache_line_size);
+
+ if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic AArch64 target, cover the current range of cache line
+ sizes. */
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ 256);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ 64);
+ }
+
if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l2_cache_size,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1e628253d0..6c6e77fab66 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3669,6 +3669,28 @@ arm_option_override (void)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_line_size,
current_tune->prefetch.l1_cache_line_size);
+ if (current_tune->prefetch.l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic ARM target, JF Bastien proposed using 64 for both. */
+ /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for
+ constructive? */
+ /* More recent Cortex chips have a 64-byte cache line, but are marked
+ ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+ }
+
if (current_tune->prefetch.l1_cache_size >= 0)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_size,
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 2cb87cedec0..c0006b3674b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p,
SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
ix86_tune_cost->l2_cache_size);
+ /* 64B is the accepted value for these for all x86. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+
/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */
if (opts->x_flag_prefetch_loop_arrays < 0
&& HAVE_prefetch
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7772fe62d95..0c2498aee22 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc)
"%<constexpr%> function in C++20");
}
+/* We're getting the constant value of DECL in a manifestly constant-evaluated
+ context; maybe complain about that. */
+
+static void
+maybe_warn_about_constant_value (location_t loc, tree decl)
+{
+ static bool explained = false;
+ if (cxx_dialect >= cxx17
+ && warn_interference_size
+ && !global_options_set.x_param_destruct_interfere_size
+ && DECL_CONTEXT (decl) == std_node
+ && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size")
+ && (LOCATION_FILE (input_location) != main_input_filename
+ || module_exporting_p ())
+ && warning_at (loc, OPT_Winterference_size, "use of %qD", decl)
+ && !explained)
+ {
+ explained = true;
+ inform (loc, "its value can vary between compiler versions or "
+ "with different %<-mtune%> or %<-mcpu%> flags");
+ inform (loc, "if this use is part of a public ABI, change it to "
+ "instead use a constant variable you define");
+ inform (loc, "the default value for the current CPU tuning "
+ "is %d bytes", param_destruct_interfere_size);
+ inform (loc, "you can stabilize this value with %<--param "
+ "hardware_destructive_interference_size=%d%>, or disable "
+ "this warning with %<-Wno-interference-size%>",
+ param_destruct_interfere_size);
+ }
+}
+
/* Attempt to reduce the expression T to a constant value.
On failure, issue diagnostic and return error_mark_node. */
/* FIXME unify with c_fully_fold */
@@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
r = *p;
break;
}
+ if (ctx->manifestly_const_eval)
+ maybe_warn_about_constant_value (loc, t);
if (COMPLETE_TYPE_P (TREE_TYPE (t))
&& is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false))
{
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bce62ad202a..c2065027369 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void)
/* Show we use EH for cleanups. */
if (flag_exceptions)
using_eh_for_cleanups ();
+
+ /* Check that the hardware interference sizes are at least
+ alignof(max_align_t), as required by the standard. */
+ const int max_align = max_align_t_align () / BITS_PER_UNIT;
+ if (param_destruct_interfere_size)
+ {
+ if (param_destruct_interfere_size < max_align)
+ error ("%<--param destructive-interference-size=%d%> is less than "
+ "%d", param_destruct_interfere_size, max_align);
+ else if (param_destruct_interfere_size < param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param destructive-interference-size=%d%> "
+ "is less than %<--param l1-cache-line-size=%d%>",
+ param_destruct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_destruct_interfere_size = param_l1_cache_line_size;
+ /* else leave it unset. */
+
+ if (param_construct_interfere_size)
+ {
+ if (param_construct_interfere_size < max_align)
+ error ("%<--param constructive-interference-size=%d%> is less than "
+ "%d", param_construct_interfere_size, max_align);
+ else if (param_construct_interfere_size > param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param constructive-interference-size=%d%> "
+ "is greater than %<--param l1-cache-line-size=%d%>",
+ param_construct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_construct_interfere_size = param_l1_cache_line_size;
}
/* Enter an abi node in global-module context. returns a cookie to
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23cc68f92b5..78cfc100ac2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore,
seemingly insignificant changes in the source program can cause the
warnings produced by @option{-Winline} to appear or disappear.
+@item -Winterference-size
+@opindex Winterference-size
+Warn about use of C++17 @code{std::hardware_destructive_interference_size}
+without specifying its value with @option{--param destructive-interference-size}.
+Also warn about questionable values for that option.
+
+This variable is intended to be used for controlling class layout, to
+avoid false sharing in concurrent code:
+
+@smallexample
+struct independent_fields @{
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> one;
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> two;
+@};
+@end smallexample
+
+Here @samp{one} and @samp{two} are intended to be far enough apart
+that stores to one won't require accesses to the other to reload the
+cache line.
+
+By default, @option{--param destructive-interference-size} and
+@option{--param constructive-interference-size} are set based on the
+current @option{-mtune} option, typically to the L1 cache line size
+for the particular target CPU, sometimes to a range if tuning for a
+generic target. So all translation units that depend on ABI
+compatibility for the use of these variables must be compiled with
+the same @option{-mtune} (or @option{-mcpu}).
+
+If ABI stability is important, such as if the use is in a header for a
+library, you should probably not use the hardware interference size
+variables at all. Alternatively, you can force a particular value
+with @option{--param}.
+
+If you are confident that your use of the variable does not affect ABI
+outside a single build of your project, you can turn off the warning
+with @option{-Wno-interference-size}.
+
@item -Wint-in-bool-context
@opindex Wint-in-bool-context
@opindex Wno-int-in-bool-context
@@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride.
This setting is only useful for strides that are known and constant.
+@item destructive-interference-size
+@item constructive-interference-size
+The values for the C++17 variables
+@code{std::hardware_destructive_interference_size} and
+@code{std::hardware_constructive_interference_size}. The destructive
+interference size is the minimum recommended offset between two
+independent concurrently-accessed objects; the constructive
+interference size is the maximum recommended size of contiguous memory
+accessed together. Typically both will be the size of an L1 cache
+line for the target, in bytes. For a generic target covering a range of L1
+cache line sizes, typically the constructive interference size will be
+the small end of the range and the destructive size will be the large
+end.
+
+The destructive interference size is intended to be used for layout,
+and thus has ABI impact. The default value is not expected to be
+stable, and on some targets varies with @option{-mtune}, so use of
+this variable in a context where ABI stability is important, such as
+the public interface of a library, is strongly discouraged; if it is
+used in that context, users can stabilize the value using this
+option.
+
+The constructive interference size is less sensitive, as it is
+typically only used in a @samp{static_assert} to make sure that a type
+fits within a cache line.
+
+See also @option{-Winterference-size}.
+
@item loop-interchange-max-num-stmts
The maximum number of stmts in a loop to be interchanged.
diff --git a/gcc/params.opt b/gcc/params.opt
index 3a701e22c46..658ca028851 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
The size of L1 cache line.
+-param=destructive-interference-size=
+Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
+The minimum recommended offset between two concurrently-accessed objects to
+avoid additional performance degradation due to contention introduced by the
+implementation. Typically the L1 cache line size, but can be larger to
+accommodate a variety of target processors with different cache line sizes.
+C++17 code might use this value in structure layout, but is strongly
+discouraged from doing so in public ABIs.
+
+-param=constructive-interference-size=
+Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
+The maximum recommended size of contiguous memory occupied by two objects
+accessed with temporal locality by concurrent threads. Typically the L1 cache
+line size, but can be smaller to accommodate a variety of target processors with
+different cache line sizes.
+
-param=l1-cache-size=
Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
The size of L1 cache.
diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C
new file mode 100644
index 00000000000..2af75c63f83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+// { dg-additional-options -fmodules-ts }
+
+module ;
+
+#include <new>
+
+export module foo;
+
+export {
+ struct A {
+ alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size }
+ };
+}
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C
new file mode 100644
index 00000000000..57c001bc032
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.C
@@ -0,0 +1,6 @@
+// Test that we warn about use of std::hardware_destructive_interference_size
+// in a header.
+// { dg-do compile { target c++17 } }
+
+// { dg-warning Winterference-size "" { target *-*-* } 0 }
+#include "Winterference.H"
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H
new file mode 100644
index 00000000000..36f0ad5f6d1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.H
@@ -0,0 +1,7 @@
+#include <new>
+
+struct A
+{
+ alignas(std::hardware_destructive_interference_size) int i;
+ alignas(std::hardware_destructive_interference_size) int j;
+};
diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
new file mode 100644
index 00000000000..0fc01655223
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
+// 128 or even 256.
+static_assert(std::hardware_destructive_interference_size == 256);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
new file mode 100644
index 00000000000..34fe8a52bff
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Recent ARM CPUs have a cache line size of 64. Older ones have
+// a size of 32, but I guess they're old enough that we don't care?
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
new file mode 100644
index 00000000000..c7b910e3ada
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/interference.C
@@ -0,0 +1,8 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// It is generally agreed that these are the right values for all x86.
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index f950bf0f0db..f41004b5911 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -140,6 +140,9 @@
#define __cpp_lib_filesystem 201703
#define __cpp_lib_gcd 201606
#define __cpp_lib_gcd_lcm 201606
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+#endif
#define __cpp_lib_hypot 201603
#define __cpp_lib_invoke 201411L
#define __cpp_lib_lcm 201606
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 3349b13fd1b..7bc67a6cb02 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
} // extern "C++"
#if __cplusplus >= 201703L
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
namespace std
{
+#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
#define __cpp_lib_launder 201606
/// Pointer optimization barrier [ptr.launder]
template<typename _Tp>
@@ -205,8 +205,14 @@ namespace std
void launder(const void*) = delete;
void launder(volatile void*) = delete;
void launder(const volatile void*) = delete;
-}
#endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
+
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+ inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
+ inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
+#endif // __GCC_DESTRUCTIVE_SIZE
+}
#endif // C++17
#if __cplusplus > 201703L
</cut>
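As a usage note (not part of the patch above; the struct and field names are illustrative), here is a minimal self-contained C++17 sketch of how the constants this commit defines in <new> are intended to be used, following the doc/invoke.texi text quoted above:
<cut>
// Minimal sketch of the intended use of the C++17 interference-size constants.
// Assumes a libstdc++ built with this commit, so that <new> defines
// __cpp_lib_hardware_interference_size and the two std:: variables.
#include <atomic>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
// Destructive size: keep independently-updated hot fields on separate cache
// lines to avoid false sharing between threads.
struct counters {
  alignas(std::hardware_destructive_interference_size) std::atomic<int> reads{0};
  alignas(std::hardware_destructive_interference_size) std::atomic<int> writes{0};
};

// Constructive size: check that data meant to be accessed together can share
// one cache line.
struct hot_pair {
  int first;
  int second;
};
static_assert(sizeof(hot_pair) <= std::hardware_constructive_interference_size,
              "hot_pair is expected to fit within one cache line");
#endif

int main() { return 0; }
</cut>
This compiles with, for example, g++ -std=c++17. Because the alignas use is in the main source file, it does not trigger the new -Winterference-size warning; moving it into an included header would, unless the value is pinned with --param destructive-interference-size= or the warning is disabled with -Wno-interference-size (see the constexpr.c hunk above).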
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap:
2
This commit has regressed these CI configurations:
- tcwg_gcc_bootstrap/master-aarch64-bootstrap
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6
This commit has regressed these CI configurations:
- tcwg_gnu_native_build/master-arm
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Even more details: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig
Culprit:
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date: Tue Aug 31 10:40:25 2021 +0800
net: Add depends on OF_NET for LiteX's LiteETH
Current settings may produce a build error when CONFIG_OF_NET is disabled.
CONFIG_OF_NET controls the header file <linux/of.h> and some functions in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
</cut>
Results regressed to (for first_bad == c3496da580b0fc10fdeba8f6a5e6aef4c78b5598)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# First few build errors in logs:
from (for last_good == a9e7c3cedc2914f63cd135b75832b9bf850af782)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# linux boot successful:
boot
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
cd investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach a9e7c3cedc2914f63cd135b75832b9bf850af782
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Full commit (up to 1000 lines):
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date: Tue Aug 31 10:40:25 2021 +0800
net: Add depends on OF_NET for LiteX's LiteETH
Current settings may produce a build error when
CONFIG_OF_NET is disabled. CONFIG_OF_NET controls
a header file, <linux/of.h>, and some functions
in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
drivers/net/ethernet/litex/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/litex/Kconfig b/drivers/net/ethernet/litex/Kconfig
index 265dba414b41..63bf01d28f0c 100644
--- a/drivers/net/ethernet/litex/Kconfig
+++ b/drivers/net/ethernet/litex/Kconfig
@@ -17,6 +17,7 @@ if NET_VENDOR_LITEX
config LITEX_LITEETH
tristate "LiteX Ethernet support"
+ depends on OF_NET
help
If you wish to compile a kernel for hardware with a LiteX LiteEth
device then you should answer Y to this.
</cut>
Identified regression caused by *gcc:01b5038718056b024b370b74a874fbd92c5bbab3*:
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Disable threading through latches until after loop optimizations.
Results regressed to (for first_bad == 01b5038718056b024b370b74a874fbd92c5bbab3)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-01b5038718056b024b370b74a874fbd92c5bbab3/results_id:
1
# 459.GemsFDTD,GemsFDTD_base.default regressed by 102
# 464.h264ref,h264ref_base.default regressed by 102
from (for last_good == fb88bf9931f17d137eb50c001e1c924aa1e34e83)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-fb88bf9931f17d137eb50c001e1c924aa1e34e83/results_id:
1
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
cd investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 01b5038718056b024b370b74a874fbd92c5bbab3
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach fb88bf9931f17d137eb50c001e1c924aa1e34e83
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 9 20:30:28 2021 +0200
Disable threading through latches until after loop optimizations.
The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.
As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]). I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite. They're mostly noise
now.
Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point. I have added a reminder to the function to keep this
in mind.
Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader. At some point I'd like to combine the cost models.
Tested on x86-64 Linux.
p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:
https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html
gcc/ChangeLog:
* tree-pass.h (PROP_loop_opts_done): New.
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Intersect with global range.
* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Disable
threading through latches until after loop optimizations have run.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
threading through latches.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
Co-authored-by: Michael Matz <matz(a)suse.de>
---
gcc/gimple-range-path.cc | 3 ++
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c | 4 +--
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 37 ++---------------------
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 17 +----------
gcc/tree-pass.h | 2 ++
gcc/tree-ssa-loop.c | 2 +-
gcc/tree-ssa-threadbackward.c | 28 +++++++++++++++--
7 files changed, 37 insertions(+), 56 deletions(-)
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
if (stmt && range_defined_in_block (r, name, bb))
{
+ if (TREE_CODE (name) == SSA_NAME)
+ r.intersect (gimple_range_global (name));
+
set_cache (r, name);
return true;
}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index e1c33e86cd7..823ada982ff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
void foo();
void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
case. And we want to thread through the header as well. These
are both caught by threading in DOM. */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index c7bf867b084..ee46759bacc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,41 +1,8 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
-/* All the threads in the thread1 dump start on a X->BB12 edge, as can
- be seen in the dump:
-
- Registering FSM jump thread: (x, 12) incoming edge; ...
- etc
- etc
-
- Before the new evrp, we were threading paths that started at the
- following edges:
-
- Registering FSM jump thread: (10, 12) incoming edge
- Registering FSM jump thread: (6, 12) incoming edge
- Registering FSM jump thread: (9, 12) incoming edge
-
- This was because the PHI at BB12 had constant values coming in from
- BB10, BB6, and BB9:
-
- # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)>
-
- Now with the new evrp, we get:
-
- # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)>
-
- Thus, we have 3 more paths that are known to be constant and can be
- threaded. Which means that by the second threading pass, we can
- only find one profitable path.
-
- For the record, all these extra constants are better paths coming
- out of switches. For example:
-
- SWITCH_BB -> BBx -> BBy -> BBz -> PHI
-
- We now know the value of the switch index at PHI. */
/* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */
+/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */
int sum0, sum1, sum2, sum3;
int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 5fc2145a432..ba07942f9dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,23 +1,8 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* Here we have the same issue as was commented in ssa-dom-thread-6.c.
- The PHI coming into the threader has a lot more constants, so the
- threader can thread more paths.
-
-$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2
-252c252
-< # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)>
----
-> # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)>
-272a273
-
- I spot checked a few and they all have the same pattern. We are
- basically tracking the switch index better through multiple
- paths. */
-
/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 83941bc0cee..eb75eb17951 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -225,6 +225,8 @@ protected:
been optimized. */
#define PROP_gimple_lomp_dev (1 << 16) /* done omp_device_lower */
#define PROP_rtl_split_insns (1 << 17) /* RTL has insns split. */
+#define PROP_loop_opts_done (1 << 18) /* SSA loop optimizations
+ have completed. */
#define PROP_gimple \
(PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 0cc4b3bbccf..1bbf2f1fb2c 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -540,7 +540,7 @@ const pass_data pass_data_tree_loop_done =
OPTGROUP_LOOP, /* optinfo_flags */
TV_NONE, /* tv_id */
PROP_cfg, /* properties_required */
- 0, /* properties_provided */
+ PROP_loop_opts_done, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
TODO_cleanup_cfg, /* todo_flags_finish */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..e72992328de 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
#include "ssa.h"
#include "tree-cfgcleanup.h"
#include "tree-pretty-print.h"
+#include "cfghooks.h"
// Path registry for the backwards threader. After all paths have been
// registered with register_path(), thread_through_all_blocks() is called
@@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
TAKEN_EDGE, otherwise it is NULL.
CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
- would create an irreducible loop. */
+ would create an irreducible loop.
+
+ ?? It seems we should be able to loosen some of the restrictions in
+ this function after loop optimizations have run. */
bool
back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
@@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
the last entry in the array when determining if we thread
through the loop latch. */
if (loop->latch == bb)
- threaded_through_latch = true;
+ {
+ threaded_through_latch = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " (latch)");
+ }
}
gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
"a multiway branch.\n");
return false;
}
+
+ /* Threading through an empty latch would cause code to be added to
+ the latch. This could alter the loop form sufficiently to cause
+ loop optimizations to fail. Disable these threads until after
+ loop optimizations have run. */
+ if ((threaded_through_latch
+ || (taken_edge && taken_edge->dest == loop->latch))
+ && !(cfun->curr_properties & PROP_loop_opts_done)
+ && empty_block_p (loop->latch))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ " FAIL: FSM Thread through latch before loop opts would create non-empty latch\n");
+ return false;
+
+ }
return true;
}
</cut>
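To make the restriction above more concrete, here is an illustrative FSM-style loop of the kind the ssa-dom-thread tests exercise (the function and variable names are invented for this sketch, not taken from the commit). After one iteration, the loop header's PHI for `state` often carries a known constant, so the backwards threader can resolve the switch along a path that runs through the loop latch; realizing such a thread copies a jump into the latch, which is exactly what the patch defers until PROP_loop_opts_done is set (i.e. to the thread3/thread4 passes).
<cut>
int sum0, sum1;

void
fsm (const char *s)
{
  int state = 0;
  while (*s)                 /* loop header: `state` is a PHI here */
    {
      switch (state)         /* multiway branch the threader wants to resolve */
        {
        case 0:
          state = (*s == 'a') ? 1 : 0;
          break;
        case 1:
          sum0++;
          state = 2;
          break;
        default:
          sum1++;
          state = 0;
          break;
        }
      s++;                   /* control then flows through the loop latch
                                back to the header */
    }
}
</cut>
Before the patch, such header-to-latch paths could be threaded early (thread1/thread2); after it, they are rejected while the latch is still empty and picked up again once loop optimizations have completed, which is what the testsuite adjustments above (thread1/thread2 scans moving to thread3) reflect.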