Successfully identified regression in *gcc* in CI configuration tcwg_gnu/gnu-master-arm-check_bootstrap. So far, this commit has regressed CI configurations:
- tcwg_gnu/gnu-master-arm-check_bootstrap
Culprit:
<cut>
commit 8a8a7d332d5d01db5aea7336a36d9fd71a679fb1
Author: Ian Lance Taylor <iant(a)golang.org>
Date: Mon Jun 28 16:47:55 2021 -0700
compiler: in composite literals use temps only for interfaces
For a composite literal we only need to introduce a temporary variable
if we may be converting to an interface type, so only do it then.
This saves over 80% of compilation time when using gccgo to compile
cmd/internal/obj/x86, as the GCC middle-end spends a lot of time
pointlessly computing interactions between temporary variables.
For PR debug/101064
For golang/go#46600
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/331513
</cut>
Results regressed to (for first_bad == 8a8a7d332d5d01db5aea7336a36d9fd71a679fb1)
# reset_artifacts:
-10
# build_abe bootstrap:
0
# build_abe check_bootstrap:
1
# # Comparing directories
# # REFERENCE: base-artifacts/sumfiles
# # CURRENT: /home/tcwg-buildslave/workspace/tcwg_gnu_2/artifacts/build-8a8a7d332d5d01db5aea7336a36d9fd71a679fb1/sumfiles
#
# # Comparing 12 common sum files:
# g++.sum
# gcc.sum
# gfortran.sum
# go.sum
# gotools.sum
# libatomic.sum
# libffi.sum
# libgo.sum
# libgomp.sum
# libitm.sum
# libstdc++.sum
# objc.sum
# Comparing:
# REFERENCE:/tmp/gxx-sum1.1601595
# CURRENT: /tmp/gxx-sum2.1601595
#
#                                              +---------+---------+
# o  RUN STATUS:                               |   REF   |   RES   |
#   +------------------------------------------+---------+---------+
#   | Passes                            [PASS] |  460522 |  460519 |
#   | Unexpected fails                  [FAIL] |     194 |     197 |
#   | Errors                           [ERROR] |       0 |       0 |
#   | Unexpected passes                [XPASS] |      15 |      15 |
#   | Expected fails                   [XFAIL] |    2737 |    2737 |
#   | Unresolved                  [UNRESOLVED] |     104 |     104 |
#   | Unsupported                [UNSUPPORTED] |   22896 |   22896 |
#   | Untested                      [UNTESTED] |      10 |      10 |
#   +------------------------------------------+---------+---------+
#
# REF PASS ratio: 0.952271
# RES PASS ratio: 0.952265
#
# o REGRESSIONS:
#   +------------------------------------------+---------+
#   | PASS now FAIL             [PASS => FAIL] |       3 |
#   +------------------------------------------+---------+
#   | TOTAL_REGRESSIONS                        |       3 |
#   +------------------------------------------+---------+
#
# - PASS now FAIL [PASS => FAIL]:
#
# Executed from: go.test/go-test.exp
# go:go.test/test/fixedbugs/issue19182.go execution, -O2 -g
# Executed from: /home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/snapshots/gcc.git~master/libgo/libgo.exp
# libgo:os/signal
# libgo:sync/atomic
#
#
#
# o IMPROVEMENTS TO BE CHECKED:
#   +------------------------------------------+---------+
#   +------------------------------------------+---------+
#   | TOTAL_IMPROVEMENTS_TO_BE_CHECKED         |       0 |
#   +------------------------------------------+---------+
#
#
# # Regressions found
# # Regressions in 12 common sum files found
from (for last_good == c60d9160b4d966dbea5b1bbea4f817c64d0bee2d)
# reset_artifacts:
-10
# build_abe bootstrap:
0
# build_abe check_bootstrap:
1
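The REF and RES columns of the RUN STATUS table can be cross-checked against the reported regression count. The sketch below is a hypothetical helper, not part of jenkins-scripts: it hard-codes the counts from the table (REF taken as the base-artifacts reference, RES as the first_bad build) and checks that the only statuses that moved are PASS and FAIL, each by the reported total of 3.

```python
# Counts copied from the RUN STATUS table in this report.
# Assumption: REF is the reference (base-artifacts) run, RES the first_bad run.
REF = {"PASS": 460522, "FAIL": 194, "ERROR": 0, "XPASS": 15,
       "XFAIL": 2737, "UNRESOLVED": 104, "UNSUPPORTED": 22896, "UNTESTED": 10}
RES = {"PASS": 460519, "FAIL": 197, "ERROR": 0, "XPASS": 15,
       "XFAIL": 2737, "UNRESOLVED": 104, "UNSUPPORTED": 22896, "UNTESTED": 10}
TOTAL_REGRESSIONS = 3  # from the REGRESSIONS table


def status_deltas(ref, res):
    """Return {status: res - ref} for every status whose count changed."""
    return {key: res[key] - ref[key] for key in ref if res[key] != ref[key]}


deltas = status_deltas(REF, RES)
print(deltas)  # {'PASS': -3, 'FAIL': 3}

# Every lost PASS reappears as a FAIL, so the total test count is unchanged.
same_total = sum(REF.values()) == sum(RES.values())
print(same_total)  # True
```

The three tests listed under "PASS now FAIL" account for the whole delta; no test changed to any other status.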
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-arm-check_bootstrap/72…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-arm-check_bootstrap/72…
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-arm-check_bootstrap/72/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-8a8a7d332d5d01db5aea7336a36d9fd71a679fb1
cd investigate-gcc-8a8a7d332d5d01db5aea7336a36d9fd71a679fb1
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-arm-check_bootstrap/72… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-arm-check_bootstrap/72… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-arm-check_bootstrap/72… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
cd gcc
# Reproduce first_bad build
git checkout --detach 8a8a7d332d5d01db5aea7336a36d9fd71a679fb1
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c60d9160b4d966dbea5b1bbea4f817c64d0bee2d
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-arm-check_bootstrap/72…
Build log: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-arm-check_bootstrap/72…
Full commit (up to 1000 lines):
<cut>
commit 8a8a7d332d5d01db5aea7336a36d9fd71a679fb1
Author: Ian Lance Taylor <iant(a)golang.org>
Date: Mon Jun 28 16:47:55 2021 -0700
compiler: in composite literals use temps only for interfaces
For a composite literal we only need to introduce a temporary variable
if we may be converting to an interface type, so only do it then.
This saves over 80% of compilation time when using gccgo to compile
cmd/internal/obj/x86, as the GCC middle-end spends a lot of time
pointlessly computing interactions between temporary variables.
For PR debug/101064
For golang/go#46600
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/331513
---
gcc/go/gofrontend/MERGE | 2 +-
gcc/go/gofrontend/expressions.cc | 17 +++++++++++++----
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index f16fb9facc3..f7bcc8c484a 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-bcafcb3c39530bb325514d6377747eb3127d1a03
+cad187fe3aceb2a7d964b64c70dfa8c8ad24ce65
The first line of this file holds the git revision number of the last
merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 5d45e4baab4..94342b2f9b8 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -15148,7 +15148,7 @@ Struct_construction_expression::do_copy()
}
// Flatten a struct construction expression. Store the values into
-// temporaries in case they need interface conversion.
+// temporaries if they may need interface conversion.
Expression*
Struct_construction_expression::do_flatten(Gogo*, Named_object*,
@@ -15162,10 +15162,13 @@ Struct_construction_expression::do_flatten(Gogo*, Named_object*,
return this;
Location loc = this->location();
+ const Struct_field_list* fields = this->type_->struct_type()->fields();
+ Struct_field_list::const_iterator pf = fields->begin();
for (Expression_list::iterator pv = this->vals()->begin();
pv != this->vals()->end();
- ++pv)
+ ++pv, ++pf)
{
+ go_assert(pf != fields->end());
if (*pv != NULL)
{
if ((*pv)->is_error_expression() || (*pv)->type()->is_error_type())
@@ -15173,7 +15176,8 @@ Struct_construction_expression::do_flatten(Gogo*, Named_object*,
go_assert(saw_errors());
return Expression::make_error(loc);
}
- if (!(*pv)->is_multi_eval_safe())
+ if (pf->type()->interface_type() != NULL
+ && !(*pv)->is_multi_eval_safe())
{
Temporary_statement* temp =
Statement::make_temporary(NULL, *pv, loc);
@@ -15448,7 +15452,7 @@ Array_construction_expression::do_check_types(Gogo*)
}
// Flatten an array construction expression. Store the values into
-// temporaries in case they need interface conversion.
+// temporaries if they may need interface conversion.
Expression*
Array_construction_expression::do_flatten(Gogo*, Named_object*,
@@ -15467,6 +15471,11 @@ Array_construction_expression::do_flatten(Gogo*, Named_object*,
if (this->is_constant_array() || this->is_static_initializer())
return this;
+ // If the array element type is not an interface type, we don't need
+ // temporaries.
+ if (this->type_->array_type()->element_type()->interface_type() == NULL)
+ return this;
+
Location loc = this->location();
for (Expression_list::iterator pv = this->vals()->begin();
pv != this->vals()->end();
</cut>
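The behavioural change in Struct_construction_expression::do_flatten comes down to one extra condition on when a temporary is created. The following is a toy model of that condition only, written in Python for illustration; the Value class and both function names are hypothetical and do not exist in gofrontend.

```python
from dataclasses import dataclass


@dataclass
class Value:
    """Toy stand-in for a composite-literal element (hypothetical type)."""
    name: str
    multi_eval_safe: bool  # mirrors Expression::is_multi_eval_safe()


def needs_temp_before(field_is_interface: bool, value: Value) -> bool:
    # Old condition: any value that is unsafe to evaluate twice gets a temporary,
    # regardless of the destination field type.
    return not value.multi_eval_safe


def needs_temp_after(field_is_interface: bool, value: Value) -> bool:
    # New condition: a temporary is only introduced when the destination field
    # has interface type AND the value is unsafe to evaluate twice.
    return field_is_interface and not value.multi_eval_safe


# (destination field is an interface?, value)
elements = [
    (False, Value("f()", multi_eval_safe=False)),  # non-interface field, call
    (True, Value("g()", multi_eval_safe=False)),   # interface field, call
    (True, Value("x", multi_eval_safe=True)),      # interface field, variable
]

before = [needs_temp_before(is_iface, v) for is_iface, v in elements]
after = [needs_temp_after(is_iface, v) for is_iface, v in elements]
print(before)  # [True, True, False]
print(after)   # [False, True, False]
```

Only the first element differs: a non-interface field initialised from a call no longer gets a temporary. Whether that difference is what breaks issue19182.go, os/signal and sync/atomic on arm is not established by this report.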
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
Culprit:
<cut>
commit 6998f8ae2d14e096aff33968f226587b5c1a193a
Author: David Sherwood <david.sherwood(a)arm.com>
Date: Wed Mar 10 08:34:19 2021 +0000
[LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...
After some investigation it seems that there are only these cases that occur
in practice:
1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.
I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.
Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.
I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.
Differential Revision: https://reviews.llvm.org/D99718
</cut>
Results regressed to (for first_bad == 6998f8ae2d14e096aff33968f226587b5c1a193a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_LTO -- artifacts/build-6998f8ae2d14e096aff33968f226587b5c1a193a/results_id:
1
# 462.libquantum,libquantum_base.default regressed by 113
from (for last_good == c835630c25a4f9925517949579f66a43b113fbc9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_LTO -- artifacts/build-c835630c25a4f9925517949579f66a43b113fbc9/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/1050
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/1048
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-6998f8ae2d14e096aff33968f226587b5c1a193a
cd investigate-llvm-6998f8ae2d14e096aff33968f226587b5c1a193a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd llvm
# Reproduce first_bad build
git checkout --detach 6998f8ae2d14e096aff33968f226587b5c1a193a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c835630c25a4f9925517949579f66a43b113fbc9
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 6998f8ae2d14e096aff33968f226587b5c1a193a
Author: David Sherwood <david.sherwood(a)arm.com>
Date: Wed Mar 10 08:34:19 2021 +0000
[LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...
After some investigation it seems that there are only these cases that occur
in practice:
1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.
I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.
Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.
I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.
Differential Revision: https://reviews.llvm.org/D99718
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 73 +++++++++++++---------
.../AArch64/no_vector_instructions.ll | 2 +-
.../LoopVectorize/AArch64/predication_costs.ll | 35 +++++++++++
.../Transforms/LoopVectorize/scalarized-bitcast.ll | 40 ++++++++++++
4 files changed, 121 insertions(+), 29 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2b413fc49505..f25af23c86c2 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7383,10 +7383,39 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *RetTy = I->getType();
if (canTruncateToMinimalBitwidth(I, VF))
RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);
- VectorTy = isScalarAfterVectorization(I, VF) ? RetTy : ToVectorTy(RetTy, VF);
auto SE = PSE.getSE();
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+ auto hasSingleCopyAfterVectorization = [this](Instruction *I,
+ ElementCount VF) -> bool {
+ if (VF.isScalar())
+ return true;
+
+ auto Scalarized = InstsToScalarize.find(VF);
+ assert(Scalarized != InstsToScalarize.end() &&
+ "VF not yet analyzed for scalarization profitability");
+ return !Scalarized->second.count(I) &&
+ llvm::all_of(I->users(), [&](User *U) {
+ auto *UI = cast<Instruction>(U);
+ return !Scalarized->second.count(UI);
+ });
+ };
+
+ if (isScalarAfterVectorization(I, VF)) {
+ // With the exception of GEPs and PHIs, after scalarization there should
+ // only be one copy of the instruction generated in the loop. This is
+ // because the VF is either 1, or any instructions that need scalarizing
+ // have already been dealt with by the the time we get here. As a result,
+ // it means we don't have to multiply the instruction cost by VF.
+ assert(I->getOpcode() == Instruction::GetElementPtr ||
+ I->getOpcode() == Instruction::PHI ||
+ (I->getOpcode() == Instruction::BitCast &&
+ I->getType()->isPointerTy()) ||
+ hasSingleCopyAfterVectorization(I, VF));
+ VectorTy = RetTy;
+ } else
+ VectorTy = ToVectorTy(RetTy, VF);
+
// TODO: We need to estimate the cost of intrinsic calls.
switch (I->getOpcode()) {
case Instruction::GetElementPtr:
@@ -7514,21 +7543,16 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Op2VK = TargetTransformInfo::OK_UniformValue;
SmallVector<const Value *, 4> Operands(I->operand_values());
- unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
- return N * TTI.getArithmeticInstrCost(
- I->getOpcode(), VectorTy, CostKind,
- TargetTransformInfo::OK_AnyValue,
- Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
+ return TTI.getArithmeticInstrCost(
+ I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
+ Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
}
case Instruction::FNeg: {
assert(!VF.isScalable() && "VF is assumed to be non scalable.");
- unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
- return N * TTI.getArithmeticInstrCost(
- I->getOpcode(), VectorTy, CostKind,
- TargetTransformInfo::OK_AnyValue,
- TargetTransformInfo::OK_AnyValue,
- TargetTransformInfo::OP_None, TargetTransformInfo::OP_None,
- I->getOperand(0), I);
+ return TTI.getArithmeticInstrCost(
+ I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
+ TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None,
+ TargetTransformInfo::OP_None, I->getOperand(0), I);
}
case Instruction::Select: {
SelectInst *SI = cast<SelectInst>(I);
@@ -7583,6 +7607,10 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
VectorTy = ToVectorTy(getMemInstValueType(I), Width);
return getMemoryInstructionCost(I, VF);
}
+ case Instruction::BitCast:
+ if (I->getType()->isPointerTy())
+ return 0;
+ LLVM_FALLTHROUGH;
case Instruction::ZExt:
case Instruction::SExt:
case Instruction::FPToUI:
@@ -7593,8 +7621,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::SIToFP:
case Instruction::UIToFP:
case Instruction::Trunc:
- case Instruction::FPTrunc:
- case Instruction::BitCast: {
+ case Instruction::FPTrunc: {
// Computes the CastContextHint from a Load/Store instruction.
auto ComputeCCH = [&](Instruction *I) -> TTI::CastContextHint {
assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
@@ -7672,14 +7699,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
}
}
- unsigned N;
- if (isScalarAfterVectorization(I, VF)) {
- assert(!VF.isScalable() && "VF is assumed to be non scalable");
- N = VF.getKnownMinValue();
- } else
- N = 1;
- return N *
- TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
+ return TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
}
case Instruction::Call: {
bool NeedToScalarize;
@@ -7694,11 +7714,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::ExtractValue:
return TTI.getInstructionCost(I, TTI::TCK_RecipThroughput);
default:
- // The cost of executing VF copies of the scalar instruction. This opcode
- // is unknown. Assume that it is the same as 'mul'.
- return VF.getKnownMinValue() * TTI.getArithmeticInstrCost(
- Instruction::Mul, VectorTy, CostKind) +
- getScalarizationOverhead(I, VF);
+ // This opcode is unknown. Assume that it is the same as 'mul'.
+ return TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy, CostKind);
} // end of switch.
}
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
index 247ea35ff5d0..3061998518ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
@@ -6,7 +6,7 @@ target triple = "aarch64--linux-gnu"
; CHECK-LABEL: all_scalar
; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions
;
define void @all_scalar(i64* %a, i64 %n) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
index b0ebb4edf2ad..858b28ddd321 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
@@ -86,6 +86,41 @@ for.end:
ret void
}
+; CHECK-LABEL: predicated_store_phi
+;
+; Same as predicate_store except we use a pointer PHI to maintain the address
+;
+; CHECK: Found new scalar instruction: %addr = phi i32* [ %a, %entry ], [ %addr.next, %for.inc ]
+; CHECK: Found new scalar instruction: %addr.next = getelementptr inbounds i32, i32* %addr, i64 1
+; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %addr, align 4
+; CHECK: Found an estimated cost of 0 for VF 2 For instruction: %addr = phi i32* [ %a, %entry ], [ %addr.next, %for.inc ]
+; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %addr, align 4
+;
+define void @predicated_store_phi(i32* %a, i1 %c, i32 %x, i64 %n) {
+entry:
+ br label %for.body
+
+for.body:
+ %i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
+ %addr = phi i32 * [ %a, %entry ], [ %addr.next, %for.inc ]
+ %tmp1 = load i32, i32* %addr, align 4
+ %tmp2 = add nsw i32 %tmp1, %x
+ br i1 %c, label %if.then, label %for.inc
+
+if.then:
+ store i32 %tmp2, i32* %addr, align 4
+ br label %for.inc
+
+for.inc:
+ %i.next = add nuw nsw i64 %i, 1
+ %cond = icmp slt i64 %i.next, %n
+ %addr.next = getelementptr inbounds i32, i32* %addr, i64 1
+ br i1 %cond, label %for.body, label %for.end
+
+for.end:
+ ret void
+}
+
; CHECK-LABEL: predicated_udiv_scalarized_operand
;
; This test checks that we correctly compute the cost of the predicated udiv
diff --git a/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll b/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll
new file mode 100644
index 000000000000..0c97e6ac475e
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll
@@ -0,0 +1,40 @@
+; REQUIRES: asserts
+; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -S -o - < %s 2>&1 | FileCheck %s
+
+%struct.foo = type { i32, i64 }
+
+; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %0 = bitcast i64* %b to i32*
+
+; The bitcast below will be scalarized due to the predication in the loop. Bitcasts
+; between pointer types should be treated as free, despite the scalarization.
+define void @foo(%struct.foo* noalias nocapture %in, i32* noalias nocapture readnone %out, i64 %n) {
+entry:
+ br label %for.body
+
+for.body: ; preds = %entry, %if.end
+ %i.012 = phi i64 [ %inc, %if.end ], [ 0, %entry ]
+ %b = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 1
+ %0 = bitcast i64* %b to i32*
+ %a = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 0
+ %1 = load i32, i32* %a, align 8
+ %tobool.not = icmp eq i32 %1, 0
+ br i1 %tobool.not, label %if.end, label %land.lhs.true
+
+land.lhs.true: ; preds = %for.body
+ %2 = load i32, i32* %0, align 4
+ %cmp2 = icmp sgt i32 %2, 0
+ br i1 %cmp2, label %if.then, label %if.end
+
+if.then: ; preds = %land.lhs.true
+ %sub = add nsw i32 %2, -1
+ store i32 %sub, i32* %0, align 4
+ br label %if.end
+
+if.end: ; preds = %if.then, %land.lhs.true, %for.body
+ %inc = add nuw nsw i64 %i.012, 1
+ %exitcond.not = icmp eq i64 %inc, %n
+ br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end: ; preds = %if.end
+ ret void
+}
</cut>
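The effect of dropping the multiplier N can be seen with a small numeric model. This is a sketch in Python rather than the LLVM code itself; the function names are hypothetical, and the base cost of 1 for the scalar add is an assumption chosen to match the updated CHECK line in no_vector_instructions.ll.

```python
def old_scalar_cost(base_cost: int, vf: int, scalar_after_vectorization: bool) -> int:
    # Before the patch: N = VF when the instruction stays scalar, otherwise 1.
    n = vf if scalar_after_vectorization else 1
    return n * base_cost


def new_scalar_cost(base_cost: int, vf: int, scalar_after_vectorization: bool) -> int:
    # After the patch: N is assumed to always be 1, so VF no longer scales the cost.
    return base_cost


# "%i.next = add nuw nsw i64 %i, 2" at VF 2, assuming a base add cost of 1.
ADD_COST = 1  # assumption, not taken from a TTI query
old = old_scalar_cost(ADD_COST, vf=2, scalar_after_vectorization=True)
new = new_scalar_cost(ADD_COST, vf=2, scalar_after_vectorization=True)
print(old, new)  # 2 1
```

A lower estimated cost for scalarised instructions makes vector loops look cheaper to the cost model, which may change vectorisation decisions; whether that explains the 462.libquantum result is not shown by this report.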
Successfully identified regression in *glibc* in CI configuration tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2_LTO
Culprit:
<cut>
commit 8208be389bce84be0e1c35a3daa0c3467418f921
Author: Florian Weimer <fweimer(a)redhat.com>
Date: Mon Jun 28 08:33:57 2021 +0200
Install shared objects under their ABI names
Previously, the installed objects were named like libc-2.33.so,
and the ABI soname libc.so.6 was just a symbolic link.
The Makefile targets to install these symbolic links are no longer
needed after this, so they are removed with this commit. The more
general $(make-link) command (which invokes scripts/rellns-sh) is
retained because other symbolic links are still needed.
Reviewed-by: Carlos O'Donell <carlos(a)redhat.com>
Tested-by: Carlos O'Donell <carlos(a)rehdat.com>
</cut>
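The layout change described in the commit message can be checked on an installed tree by testing whether the ABI soname is a symbolic link or a regular file. The sketch below is a hypothetical helper, not part of glibc or jenkins-scripts; it builds two throwaway directories that imitate the old and new layouts and assumes a POSIX system where os.symlink is permitted.

```python
import os
import tempfile


def describe_soname(libdir: str, soname: str = "libc.so.6") -> str:
    """Report whether the soname in libdir is a symlink or a real file."""
    path = os.path.join(libdir, soname)
    if os.path.islink(path):
        return f"{soname} -> {os.readlink(path)} (symlink, old layout)"
    if os.path.isfile(path):
        return f"{soname} is a regular file (new layout)"
    return f"{soname} not found"


with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
    # Old layout: versioned file name plus a symlink for the ABI soname.
    open(os.path.join(old_dir, "libc-2.33.so"), "w").close()
    os.symlink("libc-2.33.so", os.path.join(old_dir, "libc.so.6"))

    # New layout: the shared object is installed under its ABI soname directly.
    open(os.path.join(new_dir, "libc.so.6"), "w").close()

    old_layout = describe_soname(old_dir)
    new_layout = describe_soname(new_dir)

print(old_layout)  # libc.so.6 -> libc-2.33.so (symlink, old layout)
print(new_layout)  # libc.so.6 is a regular file (new layout)
```

Build or benchmark scripts that look for libc-VERSION.so by name would stop finding it after this commit.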
Results regressed to (for first_bad == 8208be389bce84be0e1c35a3daa0c3467418f921)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -O2_LTO -- artifacts/build-8208be389bce84be0e1c35a3daa0c3467418f921/results_id:
1
# 434.zeusmp,zeusmp_base.default regressed by 7549591
# 435.gromacs,gromacs_base.default regressed by 10863956
# 447.dealII,dealII_base.default regressed by 12999253
# 454.calculix,calculix_base.default regressed by 3929138
# 465.tonto,tonto_base.default regressed by 12056000
# 459.GemsFDTD,GemsFDTD_base.default regressed by 7978538
# 410.bwaves,bwaves_base.default regressed by 8373106
# 416.gamess,gamess_base.default regressed by 4372732
# 481.wrf,wrf_base.default regressed by 8973237
# 436.cactusADM,cactusADM_base.default regressed by 4181826
# 437.leslie3d,leslie3d_base.default regressed by 7255644
from (for last_good == 6bf789d69e6be48419094ca98f064e00297a27d5)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -O2_LTO -- artifacts/build-6bf789d69e6be48419094ca98f064e00297a27d5/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-master-aarch64-spec2k6-O2_LTO/976
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-master-aarch64-spec2k6-O2_LTO/988
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-glibc-8208be389bce84be0e1c35a3daa0c3467418f921
cd investigate-glibc-8208be389bce84be0e1c35a3daa0c3467418f921
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd glibc
# Reproduce first_bad build
git checkout --detach 8208be389bce84be0e1c35a3daa0c3467418f921
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 6bf789d69e6be48419094ca98f064e00297a27d5
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Full commit (up to 1000 lines):
<cut>
commit 8208be389bce84be0e1c35a3daa0c3467418f921
Author: Florian Weimer <fweimer(a)redhat.com>
Date: Mon Jun 28 08:33:57 2021 +0200
Install shared objects under their ABI names
Previously, the installed objects were named like libc-2.33.so,
and the ABI soname libc.so.6 was just a symbolic link.
The Makefile targets to install these symbolic links are no longer
needed after this, so they are removed with this commit. The more
general $(make-link) command (which invokes scripts/rellns-sh) is
retained because other symbolic links are still needed.
Reviewed-by: Carlos O'Donell <carlos(a)redhat.com>
Tested-by: Carlos O'Donell <carlos(a)rehdat.com>
---
Makefile | 6 ------
Makerules | 45 +++++----------------------------------------
NEWS | 8 ++++++++
elf/Makefile | 10 ++--------
4 files changed, 15 insertions(+), 54 deletions(-)
diff --git a/Makefile b/Makefile
index 0157b53cb8..f98d5a9e67 100644
--- a/Makefile
+++ b/Makefile
@@ -109,12 +109,6 @@ elf/ldso_install:
# Ignore the error if we cannot update /etc/ld.so.cache.
ifeq (no,$(cross-compiling))
ifeq (yes,$(build-shared))
-install: install-symbolic-link
-.PHONY: install-symbolic-link
-install-symbolic-link: subdir_install
- $(symbolic-link-prog) $(symbolic-link-list)
- rm -f $(symbolic-link-list)
-
install:
-test ! -x $(elf-objpfx)ldconfig || LC_ALL=C \
$(elf-objpfx)ldconfig $(addprefix -r ,$(install_root)) \
diff --git a/Makerules b/Makerules
index f2587907fb..596fa68376 100644
--- a/Makerules
+++ b/Makerules
@@ -990,14 +990,12 @@ versioned := $(strip $(foreach so,$(install-lib.so),\
install-lib.so-versioned := $(filter $(versioned), $(install-lib.so))
install-lib.so-unversioned := $(filter-out $(versioned), $(install-lib.so))
-# For libraries whose soname have version numbers, we install three files:
+# For libraries whose soname have version numbers, we install two files:
# $(inst_libdir)/libfoo.so -- for linking, symlink or ld script
-# $(inst_slibdir)/libfoo.so.NN -- for loading by SONAME, symlink
-# $(inst_slibdir)/libfoo-X.Y.Z.so -- the real shared object file
+# $(inst_slibdir)/libfoo.so.NN -- for loading by SONAME
install-lib-nosubdir: $(install-lib.so-unversioned:%=$(inst_slibdir)/%) \
$(foreach L,$(install-lib.so-versioned),\
$(inst_libdir)/$L \
- $(inst_slibdir)/$(L:.so=)-$(version).so \
$(inst_slibdir)/$L$($L-version))
# Install all the unversioned shared libraries.
@@ -1030,35 +1028,10 @@ ln -f $(objpfx)/$(@F) $@
endef
endif
-ifeq (yes,$(build-shared))
-ifeq (no,$(cross-compiling))
-symbolic-link-prog := $(elf-objpfx)sln
-symbolic-link-list := $(elf-objpfx)symlink.list
-define make-shlib-link
-echo `$(..)scripts/rellns-sh -p $< $@` $@ >> $(symbolic-link-list)
-endef
-else # cross-compiling
-# We need a definition that can be used by elf/Makefile's install rules.
-symbolic-link-prog = $(LN_S)
-endif
-endif
-ifndef make-shlib-link
-define make-shlib-link
-rm -f $@
-$(LN_S) `$(..)scripts/rellns-sh -p $< $@` $@
-endef
-endif
-
ifdef libc.so-version
-# For a library specified to be version N, install three files:
-# libc.so -> libc.so.N (e.g. libc.so.6)
-# libc.so.6 -> libc-VERSION.so (e.g. libc-1.10.so)
-
-$(inst_slibdir)/libc.so$(libc.so-version): $(inst_slibdir)/libc-$(version).so \
- $(+force)
- $(make-shlib-link)
-$(inst_slibdir)/libc-$(version).so: $(common-objpfx)libc.so $(+force)
+$(inst_slibdir)/libc.so$(libc.so-version): $(common-objpfx)libc.so $(+force)
$(do-install-program)
+
install: $(inst_slibdir)/libc.so$(libc.so-version)
# This fragment of linker script gives the OUTPUT_FORMAT statement
@@ -1126,15 +1099,7 @@ include $(o-iterator)
generated += $(foreach o,$(versioned),$o$($o-version))
define o-iterator-doit
-$(inst_slibdir)/$o$($o-version): $(inst_slibdir)/$(o:.so=)-$(version).so \
- $(+force);
- $$(make-shlib-link)
-endef
-object-suffixes-left := $(versioned)
-include $(o-iterator)
-
-define o-iterator-doit
-$(inst_slibdir)/$(o:.so=)-$(version).so: $(objpfx)$o $(+force);
+$(inst_slibdir)/$o$($o-version): $(objpfx)$o $(+force);
$$(do-install-program)
endef
object-suffixes-left := $(versioned)
diff --git a/NEWS b/NEWS
index b24ebf9898..37ba4334c6 100644
--- a/NEWS
+++ b/NEWS
@@ -74,6 +74,14 @@ Deprecated and removed features, and other changes affecting compatibility:
buggy kernel interfaces (for instance some CIFS version) that could still
see spurious EINTR error when cancellation interrupts a blocking syscall.
+* Previously, glibc installed its various shared objects under versioned
+ file names such as libc-2.33.so. The ABI sonames (e.g., libc.so.6)
+ were provided as symbolic links. Starting with glibc 2.34, the shared
+ objects are installed under their ABI sonames directly, without
+ symbolic links. This increases compatibility with distribution
+ package managers that delete removed files late during the package
+ upgrade or downgrade process.
+
Changes to build and runtime requirements:
* On Linux, the shm_open, sem_open, and related functions now expect the
diff --git a/elf/Makefile b/elf/Makefile
index 62f7e8a225..cdbcc14087 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -630,20 +630,14 @@ $(objpfx)trusted-dirs.st: Makefile $(..)Makeconfig
CPPFLAGS-dl-load.c += -I$(objpfx). -I$(csu-objpfx).
ifeq (yes,$(build-shared))
-$(inst_slibdir)/$(rtld-version-installed-name): $(objpfx)ld.so $(+force)
+$(inst_rtlddir)/$(rtld-installed-name): $(objpfx)ld.so $(+force)
$(make-target-directory)
$(do-install-program)
-$(inst_rtlddir)/$(rtld-installed-name): \
- $(inst_slibdir)/$(rtld-version-installed-name) \
- $(inst_slibdir)/libc-$(version).so
- $(make-target-directory)
- $(make-shlib-link)
-
# Special target called by parent to install just the dynamic linker.
.PHONY: ldso_install
ldso_install: $(inst_rtlddir)/$(rtld-installed-name)
-endif
+endif # $(build-shared)
# Workarounds for ${exec_prefix} expansion in configure variables.
</cut>
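As a rough illustration of the install-layout change described in the Makeconfig comment and NEWS entry above, the file names per versioned library can be sketched as follows. This is only a sketch: `installedNames` and its parameters are hypothetical names introduced here, not anything glibc provides, and the model ignores which directory each file lands in.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of glibc: lists the file names installed for
// one versioned shared library, e.g. base "libc", soname suffix ".6",
// glibc version "2.33".
std::vector<std::string> installedNames(const std::string &base,
                                        const std::string &sonameSuffix,
                                        const std::string &glibcVersion,
                                        bool legacyLayout) {
  std::vector<std::string> names;
  // Link-time name: a symlink or linker script in both layouts.
  names.push_back(base + ".so");
  // Load-time name (the SONAME): a symlink in the old layout, the real
  // shared object in the new one.
  names.push_back(base + ".so" + sonameSuffix);
  if (legacyLayout) {
    // Old layout only: the real object under a glibc-versioned name.
    names.push_back(base + "-" + glibcVersion + ".so");
  }
  return names;
}
```

With `legacyLayout` set, the call for libc would yield three names (libc.so, libc.so.6, libc-2.33.so); without it, only the first two remain, which matches the "three files" versus "two files" wording in the patch.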
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os
Culprit:
<cut>
commit b6bf4d8a773cde07e751542f2911307d78b717fd
Author: Andreas Tobler <andreast(a)gcc.gnu.org>
Date: Thu Apr 25 22:03:35 2019 +0200
freebsd64.h: Add bits for 32-bit multilib support.
2019-04-25 Andreas Tobler <andreast(a)gcc.gnu.org>
* config/i386/freebsd64.h: Add bits for 32-bit multilib support.
* config/i386/t-freebsd64: New file.
* config.gcc: Add the t-freebsd64 for multilib support.
From-SVN: r270586
</cut>
Results regressed to (for first_bad == b6bf4d8a773cde07e751542f2911307d78b717fd)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -Os_mthumb -- artifacts/build-b6bf4d8a773cde07e751542f2911307d78b717fd/results_id:
1
# 429.mcf,mcf_base.default regressed by 104
# 470.lbm,lbm_base.default regressed by 103
from (for last_good == 8a55f9c57a1ffd900262aa2fc2015822dc059331)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -Os_mthumb -- artifacts/build-baseline/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/baseline-llvm-release-arm-spec2k6-Os/824
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/929
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-b6bf4d8a773cde07e751542f2911307d78b717fd
cd investigate-gcc-b6bf4d8a773cde07e751542f2911307d78b717fd
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd gcc
# Reproduce first_bad build
git checkout --detach b6bf4d8a773cde07e751542f2911307d78b717fd
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8a55f9c57a1ffd900262aa2fc2015822dc059331
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit b6bf4d8a773cde07e751542f2911307d78b717fd
Author: Andreas Tobler <andreast(a)gcc.gnu.org>
Date: Thu Apr 25 22:03:35 2019 +0200
freebsd64.h: Add bits for 32-bit multilib support.
2019-04-25 Andreas Tobler <andreast(a)gcc.gnu.org>
* config/i386/freebsd64.h: Add bits for 32-bit multilib support.
* config/i386/t-freebsd64: New file.
* config.gcc: Add the t-freebsd64 for multilib support.
From-SVN: r270586
---
gcc/ChangeLog | 6 ++++++
gcc/config.gcc | 5 ++++-
gcc/config/i386/freebsd64.h | 5 ++++-
gcc/config/i386/t-freebsd64 | 30 ++++++++++++++++++++++++++++++
4 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7cd02c850d3..3a927ed00d3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-04-25 Andreas Tobler <andreast(a)gcc.gnu.org>
+
+ * config/i386/freebsd64.h: Add bits for 32-bit multilib support.
+ * config/i386/t-freebsd64: New file.
+ * config.gcc: Add the t-freebsd64 for multilib support.
+
2019-04-25 Uroš Bizjak <ubizjak(a)gmail.com>
* doc/extend.texi (vector_size): Add missing comma after @xref.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 09fb9ecd2cd..c7a464c89d0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4927,7 +4927,10 @@ case ${target} in
;;
i[34567]86-*-dragonfly* | x86_64-*-dragonfly*)
;;
- i[34567]86-*-freebsd* | x86_64-*-freebsd*)
+ i[34567]86-*-freebsd*)
+ ;;
+ x86_64-*-freebsd*)
+ tmake_file="${tmake_file} i386/t-freebsd64"
;;
ia64*-*-linux*)
;;
diff --git a/gcc/config/i386/freebsd64.h b/gcc/config/i386/freebsd64.h
index 1f99e812f01..bffe19c14ff 100644
--- a/gcc/config/i386/freebsd64.h
+++ b/gcc/config/i386/freebsd64.h
@@ -31,7 +31,7 @@ along with GCC; see the file COPYING3. If not see
#undef LINK_SPEC
#define LINK_SPEC "\
- %{m32:-m elf_i386_fbsd} \
+ %{m32:-m elf_i386_fbsd}%{!m32:-m elf_x86_64_fbsd} \
%{p:%nconsider using '-pg' instead of '-p' with gprof(1)} \
%{v:-V} \
%{assert*} %{R*} %{rpath*} %{defsym*} \
@@ -42,3 +42,6 @@ along with GCC; see the file COPYING3. If not see
-dynamic-linker %(fbsd_dynamic_linker) } \
%{static:-Bstatic}} \
%{symbolic:-Bsymbolic}"
+
+#undef MULTILIB_DEFAULTS
+#define MULTILIB_DEFAULTS { "m64" }
diff --git a/gcc/config/i386/t-freebsd64 b/gcc/config/i386/t-freebsd64
new file mode 100644
index 00000000000..0dd05d479ac
--- /dev/null
+++ b/gcc/config/i386/t-freebsd64
@@ -0,0 +1,30 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+# The 32-bit libraries are found in /usr/lib32
+
+# To support i386 and x86-64, the directory structrue
+# should be:
+#
+# /lib has x86-64 libraries.
+# /lib32 has i386 libraries.
+#
+
+MULTILIB_OPTIONS = m32
+MULTILIB_DIRNAMES = 32
+MULTILIB_OSDIRNAMES = ../lib32
</cut>
Hey, Maxim,
You recently mentioned we have such a setup working, yes? Is there an easy
way to point the qemu portion to a custom branch?
I'm currently working on adding vdso support to qemu, and want to make sure
that the unwind info that I've written works correctly with libgcc and
llvm's libunwind.
Also, what arm CPU models are being used? My edge cases are v4, v4t, m0
(which is not really linux-user, but I think Christophe Lyon had made it work).
r~
Progress:
* UM-2 [QEMU upstream maintainership]
+ Code review:
- RTH's latest bswap series
- RTH's series converting nios2 to TranslatorOps
- Alex's SYS_HEAPINFO series
- RTH's series to avoid linux-user putting signal trampoline code
on the stack
- Renesas SCI device refactoring
+ Working on making the docs report the license, version, etc in each
page's footer (something I was asked for back when I did the initial
conversion to Sphinx and which I promised I would do afterwards...)
+ Investigated a bug reported by Maxim Uvarov where the virt board's
GPIO-driven shutdown/reset mechanism for secure firmware wasn't
working correctly. This turns out to be caused by our PL061 GPIO
implementation hardcoding that GPIO lines have pullup resistors,
whereas the virt board and the gpio-pwr device implicitly assume
pulldown. That bug was then masked/confused by a second bug, where
we only actually signal the line level implied by the pullup when
the guest first touches the PL061, rather than on reset. Sent a
series that fixes both of these things (and does a bit of other
PL061 cleanup in the process).
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Implemented all the non-vector shift insns
+ Sent out a second slice of patches for review
+ Progress: 112/210 (53%)
-- PMM
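To make the pullup/pulldown mismatch in the PL061 item above concrete, here is a minimal model of how a pull resistor decides the level of an undriven GPIO line. It is a sketch under assumptions, not QEMU's implementation: the names `Pull`, `lineLevel` and `lineLevels` are invented for illustration, and a set bit in `dir` is taken to mean "output".

```cpp
#include <cstdint>

// Illustrative model only; these names are not QEMU's.
enum class Pull { Up, Down };

// Level seen on one GPIO line. When the pin is an output, the data bit the
// guest wrote drives the line; when it is an input nothing drives it and
// the pull resistor decides the level.
bool lineLevel(bool isOutput, bool dataBit, Pull pull) {
  if (isOutput)
    return dataBit;
  return pull == Pull::Up;
}

// Same rule applied to eight lines at once. A set bit in `dir` marks that
// line as an output (assumption of this sketch).
std::uint8_t lineLevels(std::uint8_t dir, std::uint8_t data, Pull pull) {
  const std::uint8_t floating = (pull == Pull::Up) ? 0xFF : 0x00;
  return static_cast<std::uint8_t>((data & dir) | (floating & ~dir));
}
```

In this model, a device whose lines are all inputs reads all-high under `Pull::Up` and all-low under `Pull::Down`, so a board that expects the pulldown behaviour sees the opposite level on every undriven line.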
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os
Culprit:
<cut>
commit 3a2b05f9fe74fcf9560632cf2695058d47d8683b
Author: Evgeniy Brevnov <ybrevnov(a)azul.com>
Date: Fri Jul 24 18:57:10 2020 +0700
[BPI][NFC] Consolidate code to deal with SCCs under a dedicated data structure.
In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84514
</cut>
Results regressed to (for first_bad == 3a2b05f9fe74fcf9560632cf2695058d47d8683b)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -Os_mthumb -- artifacts/build-3a2b05f9fe74fcf9560632cf2695058d47d8683b/results_id:
1
# 401.bzip2,bzip2_base.default regressed by 104
from (for last_good == 7294ca3f6ecacd05a197bbf0637e10afcb99b6d6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -Os_mthumb -- artifacts/build-7294ca3f6ecacd05a197bbf0637e10afcb99b6d6/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/699
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/688
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-3a2b05f9fe74fcf9560632cf2695058d47d8683b
cd investigate-llvm-3a2b05f9fe74fcf9560632cf2695058d47d8683b
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd llvm
# Reproduce first_bad build
git checkout --detach 3a2b05f9fe74fcf9560632cf2695058d47d8683b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 7294ca3f6ecacd05a197bbf0637e10afcb99b6d6
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 3a2b05f9fe74fcf9560632cf2695058d47d8683b
Author: Evgeniy Brevnov <ybrevnov(a)azul.com>
Date: Fri Jul 24 18:57:10 2020 +0700
[BPI][NFC] Consolidate code to deal with SCCs under a dedicated data structure.
In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84514
---
llvm/include/llvm/Analysis/BranchProbabilityInfo.h | 71 ++++++++-
llvm/lib/Analysis/BranchProbabilityInfo.cpp | 164 +++++++++++++--------
2 files changed, 169 insertions(+), 66 deletions(-)
diff --git a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
index 3e72afba36c3..7feb5b625938 100644
--- a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
+++ b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
@@ -151,13 +151,66 @@ public:
/// Forget analysis results for the given basic block.
void eraseBlock(const BasicBlock *BB);
- // Use to track SCCs for handling irreducible loops.
- using SccMap = DenseMap<const BasicBlock *, int>;
- using SccHeaderMap = DenseMap<const BasicBlock *, bool>;
- using SccHeaderMaps = std::vector<SccHeaderMap>;
- struct SccInfo {
+ class SccInfo {
+ // Enum of types to classify basic blocks in SCC. Basic block belonging to
+ // SCC is 'Inner' until it is either 'Header' or 'Exiting'. Note that a
+ // basic block can be 'Header' and 'Exiting' at the same time.
+ enum SccBlockType {
+ Inner = 0x0,
+ Header = 0x1,
+ Exiting = 0x2,
+ };
+ // Map of basic blocks to SCC IDs they belong to. If basic block doesn't
+ // belong to any SCC it is not in the map.
+ using SccMap = DenseMap<const BasicBlock *, int>;
+ // Each basic block in SCC is attributed with one or several types from
+ // SccBlockType. Map value has uint32_t type (instead of SccBlockType)
+ // since basic block may be for example "Header" and "Exiting" at the same
+ // time and we need to be able to keep more than one value from
+ // SccBlockType.
+ using SccBlockTypeMap = DenseMap<const BasicBlock *, uint32_t>;
+ // Vector containing classification of basic blocks for all SCCs where i'th
+ // vector element corresponds to SCC with ID equal to i.
+ using SccBlockTypeMaps = std::vector<SccBlockTypeMap>;
+
SccMap SccNums;
- SccHeaderMaps SccHeaders;
+ SccBlockTypeMaps SccBlocks;
+
+ public:
+ explicit SccInfo(const Function &F);
+
+ /// If \p BB belongs to some SCC then ID of that SCC is returned, otherwise
+ /// -1 is returned. If \p BB belongs to more than one SCC at the same time
+ /// result is undefined.
+ int getSCCNum(const BasicBlock *BB) const;
+ /// Returns true if \p BB is a 'header' block in SCC with \p SccNum ID,
+ /// false otherwise.
+ bool isSCCHeader(const BasicBlock *BB, int SccNum) const {
+ return getSccBlockType(BB, SccNum) & Header;
+ }
+ /// Returns true if \p BB is an 'exiting' block in SCC with \p SccNum ID,
+ /// false otherwise.
+ bool isSCCExitingBlock(const BasicBlock *BB, int SccNum) const {
+ return getSccBlockType(BB, SccNum) & Exiting;
+ }
+ /// Fills in \p Enters vector with all such blocks that don't belong to
+ /// SCC with \p SccNum ID but there is an edge to a block belonging to the
+ /// SCC.
+ void getSccEnterBlocks(int SccNum,
+ SmallVectorImpl<BasicBlock *> &Enters) const;
+ /// Fills in \p Exits vector with all such blocks that don't belong to
+ /// SCC with \p SccNum ID but there is an edge from a block belonging to the
+ /// SCC.
+ void getSccExitBlocks(int SccNum,
+ SmallVectorImpl<BasicBlock *> &Exits) const;
+
+ private:
+ /// Returns \p BB's type according to classification given by SccBlockType
+ /// enum. Please note that \p BB must belong to SSC with \p SccNum ID.
+ uint32_t getSccBlockType(const BasicBlock *BB, int SccNum) const;
+ /// Calculates \p BB's type and stores it in internal data structures for
+ /// future use. Please note that \p BB must belong to SSC with \p SccNum ID.
+ void calculateSccBlockType(const BasicBlock *BB, int SccNum);
};
private:
@@ -196,6 +249,9 @@ private:
/// Track the last function we run over for printing.
const Function *LastF = nullptr;
+ /// Keeps information about all SCCs in a function.
+ std::unique_ptr<const SccInfo> SccI;
+
/// Track the set of blocks directly succeeded by a returning block.
SmallPtrSet<const BasicBlock *, 16> PostDominatedByUnreachable;
@@ -210,8 +266,7 @@ private:
bool calcMetadataWeights(const BasicBlock *BB);
bool calcColdCallHeuristics(const BasicBlock *BB);
bool calcPointerHeuristics(const BasicBlock *BB);
- bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI,
- SccInfo &SccI);
+ bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI);
bool calcZeroHeuristics(const BasicBlock *BB, const TargetLibraryInfo *TLI);
bool calcFloatingPointHeuristics(const BasicBlock *BB);
bool calcInvokeHeuristics(const BasicBlock *BB);
diff --git a/llvm/lib/Analysis/BranchProbabilityInfo.cpp b/llvm/lib/Analysis/BranchProbabilityInfo.cpp
index a396b5ad21c6..195fc69d9601 100644
--- a/llvm/lib/Analysis/BranchProbabilityInfo.cpp
+++ b/llvm/lib/Analysis/BranchProbabilityInfo.cpp
@@ -148,6 +148,105 @@ static const uint32_t IH_TAKEN_WEIGHT = 1024 * 1024 - 1;
/// instruction. This is essentially never taken.
static const uint32_t IH_NONTAKEN_WEIGHT = 1;
+BranchProbabilityInfo::SccInfo::SccInfo(const Function &F) {
+ // Record SCC numbers of blocks in the CFG to identify irreducible loops.
+ // FIXME: We could only calculate this if the CFG is known to be irreducible
+ // (perhaps cache this info in LoopInfo if we can easily calculate it there?).
+ int SccNum = 0;
+ for (scc_iterator<const Function *> It = scc_begin(&F); !It.isAtEnd();
+ ++It, ++SccNum) {
+ // Ignore single-block SCCs since they either aren't loops or LoopInfo will
+ // catch them.
+ const std::vector<const BasicBlock *> &Scc = *It;
+ if (Scc.size() == 1)
+ continue;
+
+ LLVM_DEBUG(dbgs() << "BPI: SCC " << SccNum << ":");
+ for (const auto *BB : Scc) {
+ LLVM_DEBUG(dbgs() << " " << BB->getName());
+ SccNums[BB] = SccNum;
+ calculateSccBlockType(BB, SccNum);
+ }
+ LLVM_DEBUG(dbgs() << "\n");
+ }
+}
+
+int BranchProbabilityInfo::SccInfo::getSCCNum(const BasicBlock *BB) const {
+ auto SccIt = SccNums.find(BB);
+ if (SccIt == SccNums.end())
+ return -1;
+ return SccIt->second;
+}
+
+void BranchProbabilityInfo::SccInfo::getSccEnterBlocks(
+ int SccNum, SmallVectorImpl<BasicBlock *> &Enters) const {
+
+ for (auto MapIt : SccBlocks[SccNum]) {
+ const auto *BB = MapIt.first;
+ if (isSCCHeader(BB, SccNum))
+ for (const auto *Pred : predecessors(BB))
+ if (getSCCNum(Pred) != SccNum)
+ Enters.push_back(const_cast<BasicBlock *>(BB));
+ }
+}
+
+void BranchProbabilityInfo::SccInfo::getSccExitBlocks(
+ int SccNum, SmallVectorImpl<BasicBlock *> &Exits) const {
+ for (auto MapIt : SccBlocks[SccNum]) {
+ const auto *BB = MapIt.first;
+ if (isSCCExitingBlock(BB, SccNum))
+ for (const auto *Succ : successors(BB))
+ if (getSCCNum(Succ) != SccNum)
+ Exits.push_back(const_cast<BasicBlock *>(BB));
+ }
+}
+
+uint32_t BranchProbabilityInfo::SccInfo::getSccBlockType(const BasicBlock *BB,
+ int SccNum) const {
+ assert(getSCCNum(BB) == SccNum);
+
+ assert(SccBlocks.size() > static_cast<unsigned>(SccNum) && "Unknown SCC");
+ const auto &SccBlockTypes = SccBlocks[SccNum];
+
+ auto It = SccBlockTypes.find(BB);
+ if (It != SccBlockTypes.end()) {
+ return It->second;
+ }
+ return Inner;
+}
+
+void BranchProbabilityInfo::SccInfo::calculateSccBlockType(const BasicBlock *BB,
+ int SccNum) {
+ assert(getSCCNum(BB) == SccNum);
+ uint32_t BlockType = Inner;
+
+ if (llvm::any_of(make_range(pred_begin(BB), pred_end(BB)),
+ [&](const BasicBlock *Pred) {
+ // Consider any block that is an entry point to the SCC as
+ // a header.
+ return getSCCNum(Pred) != SccNum;
+ }))
+ BlockType |= Header;
+
+ if (llvm::any_of(
+ make_range(succ_begin(BB), succ_end(BB)),
+ [&](const BasicBlock *Succ) { return getSCCNum(Succ) != SccNum; }))
+ BlockType |= Exiting;
+
+ // Lazily compute the set of headers for a given SCC and cache the results
+ // in the SccHeaderMap.
+ if (SccBlocks.size() <= static_cast<unsigned>(SccNum))
+ SccBlocks.resize(SccNum + 1);
+ auto &SccBlockTypes = SccBlocks[SccNum];
+
+ if (BlockType != Inner) {
+ bool IsInserted;
+ std::tie(std::ignore, IsInserted) =
+ SccBlockTypes.insert(std::make_pair(BB, BlockType));
+ assert(IsInserted && "Duplicated block in SCC");
+ }
+}
+
static void UpdatePDTWorklist(const BasicBlock *BB, PostDominatorTree *PDT,
SmallVectorImpl<const BasicBlock *> &WorkList,
SmallPtrSetImpl<const BasicBlock *> &TargetSet) {
@@ -511,38 +610,6 @@ bool BranchProbabilityInfo::calcPointerHeuristics(const BasicBlock *BB) {
return true;
}
-static int getSCCNum(const BasicBlock *BB,
- const BranchProbabilityInfo::SccInfo &SccI) {
- auto SccIt = SccI.SccNums.find(BB);
- if (SccIt == SccI.SccNums.end())
- return -1;
- return SccIt->second;
-}
-
-// Consider any block that is an entry point to the SCC as a header.
-static bool isSCCHeader(const BasicBlock *BB, int SccNum,
- BranchProbabilityInfo::SccInfo &SccI) {
- assert(getSCCNum(BB, SccI) == SccNum);
-
- // Lazily compute the set of headers for a given SCC and cache the results
- // in the SccHeaderMap.
- if (SccI.SccHeaders.size() <= static_cast<unsigned>(SccNum))
- SccI.SccHeaders.resize(SccNum + 1);
- auto &HeaderMap = SccI.SccHeaders[SccNum];
- bool Inserted;
- BranchProbabilityInfo::SccHeaderMap::iterator HeaderMapIt;
- std::tie(HeaderMapIt, Inserted) = HeaderMap.insert(std::make_pair(BB, false));
- if (Inserted) {
- bool IsHeader = llvm::any_of(make_range(pred_begin(BB), pred_end(BB)),
- [&](const BasicBlock *Pred) {
- return getSCCNum(Pred, SccI) != SccNum;
- });
- HeaderMapIt->second = IsHeader;
- return IsHeader;
- } else
- return HeaderMapIt->second;
-}
-
// Compute the unlikely successors to the block BB in the loop L, specifically
// those that are unlikely because this is a loop, and add them to the
// UnlikelyBlocks set.
@@ -653,12 +720,11 @@ computeUnlikelySuccessors(const BasicBlock *BB, Loop *L,
// Calculate Edge Weights using "Loop Branch Heuristics". Predict backedges
// as taken, exiting edges as not-taken.
bool BranchProbabilityInfo::calcLoopBranchHeuristics(const BasicBlock *BB,
- const LoopInfo &LI,
- SccInfo &SccI) {
+ const LoopInfo &LI) {
int SccNum;
Loop *L = LI.getLoopFor(BB);
if (!L) {
- SccNum = getSCCNum(BB, SccI);
+ SccNum = SccI->getSCCNum(BB);
if (SccNum < 0)
return false;
}
@@ -685,9 +751,9 @@ bool BranchProbabilityInfo::calcLoopBranchHeuristics(const BasicBlock *BB,
else
InEdges.push_back(I.getSuccessorIndex());
} else {
- if (getSCCNum(*I, SccI) != SccNum)
+ if (SccI->getSCCNum(*I) != SccNum)
ExitingEdges.push_back(I.getSuccessorIndex());
- else if (isSCCHeader(*I, SccNum, SccI))
+ else if (SccI->isSCCHeader(*I, SccNum))
BackEdges.push_back(I.getSuccessorIndex());
else
InEdges.push_back(I.getSuccessorIndex());
@@ -1072,26 +1138,7 @@ void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,
assert(PostDominatedByUnreachable.empty());
assert(PostDominatedByColdCall.empty());
- // Record SCC numbers of blocks in the CFG to identify irreducible loops.
- // FIXME: We could only calculate this if the CFG is known to be irreducible
- // (perhaps cache this info in LoopInfo if we can easily calculate it there?).
- int SccNum = 0;
- SccInfo SccI;
- for (scc_iterator<const Function *> It = scc_begin(&F); !It.isAtEnd();
- ++It, ++SccNum) {
- // Ignore single-block SCCs since they either aren't loops or LoopInfo will
- // catch them.
- const std::vector<const BasicBlock *> &Scc = *It;
- if (Scc.size() == 1)
- continue;
-
- LLVM_DEBUG(dbgs() << "BPI: SCC " << SccNum << ":");
- for (auto *BB : Scc) {
- LLVM_DEBUG(dbgs() << " " << BB->getName());
- SccI.SccNums[BB] = SccNum;
- }
- LLVM_DEBUG(dbgs() << "\n");
- }
+ SccI = std::make_unique<SccInfo>(F);
std::unique_ptr<PostDominatorTree> PDTPtr;
@@ -1119,7 +1166,7 @@ void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,
continue;
if (calcColdCallHeuristics(BB))
continue;
- if (calcLoopBranchHeuristics(BB, LI, SccI))
+ if (calcLoopBranchHeuristics(BB, LI))
continue;
if (calcPointerHeuristics(BB))
continue;
@@ -1131,6 +1178,7 @@ void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,
PostDominatedByUnreachable.clear();
PostDominatedByColdCall.clear();
+ SccI.release();
if (PrintBranchProb &&
(PrintBranchProbFuncName.empty() ||
</cut>
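The block classification that the patch above moves into SccInfo can be sketched without any LLVM types. This is a simplified, hedged illustration: only the flag values are taken from the SccBlockType enum in the diff, while `Graph` and `classify` are hypothetical stand-ins for the CFG and for calculateSccBlockType.

```cpp
#include <cstdint>
#include <vector>

// Flag values mirror the SccBlockType enum in the patch; everything else
// here (Graph, classify) is a hypothetical stand-in for LLVM's CFG types.
enum SccBlockType : std::uint32_t { Inner = 0x0, Header = 0x1, Exiting = 0x2 };

struct Graph {
  std::vector<std::vector<int>> succs; // succs[b] = successors of block b
  std::vector<int> sccOf;              // SCC id per block, -1 if in none
};

// Classify block b within its SCC. The result is a bit set because a block
// can be a header and an exiting block at the same time.
std::uint32_t classify(const Graph &g, int b) {
  const int scc = g.sccOf[b];
  std::uint32_t type = Inner;
  // Header: some predecessor lies outside the SCC.
  for (int p = 0; p < static_cast<int>(g.succs.size()); ++p)
    for (int s : g.succs[p])
      if (s == b && g.sccOf[p] != scc)
        type |= Header;
  // Exiting: some successor lies outside the SCC.
  for (int s : g.succs[b])
    if (g.sccOf[s] != scc)
      type |= Exiting;
  return type;
}
```

Storing the result as an integer rather than as the enum type follows the reasoning given in the patch's own comment on SccBlockTypeMap: one value has to hold more than one flag.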
Hi folks,
I'm hoping that I might be able to get some development help with
binutils for aarch64...
I'm maintaining the UEFI Secure Boot stack in Debian (shim etc.),
including for arm64/aarch64 (as I wanted to make that work too!). UEFI
binaries are awkward for those of us used to the Linux and ELF world -
they're PE/COFF format with different calling conventions to match the
Microsoft world. But we've made things work.
On x86 platforms, the shim build process uses objcopy
--target=efi-app-$(ARCH) to produce the final output binaries. We've
never had similar support for the aarch64 platform, and instead
somebody came up with a method using a locally-hacked linker script and
"-O binary" to generate the output binaries. That's worked well
enough for a while, but it's been annoying for various reasons
(particularly debugging problems).
*However*, recently for security reasons we've tweaked the layout of
Secure Boot binaries [1] and this has caused lots of problems. The
older hacks to hand-build the right sections etc. needed significant
extra work, and we're still dealing with awkward bugs related to
this. Based on these problems, I recently had to make the painful
decision to drop support for arm64 SB in Debian. I know that other
distributions are feeling similar pain. :-(
Rather than continuing to hack on things, I think it's (way past) time
that we did things correctly! We need aarch64 binary format support in
binutils so we can just use it like we do on x86. AFAICS, there is
already a bug open asking for this from last year [2]. Could I please
prevail on some friendly neighbourhood aarch64 toolchain engineer to
help with that?
Thanks for considering,
Steve
[1] https://github.com/rhboot/shim/blob/main/SBAT.md
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=26206#add_comment
--
Steve McIntyre, Cambridge, UK. steve(a)einval.com
"...In the UNIX world, people tend to interpret `non-technical user'
as meaning someone who's only ever written one device driver." -- Daniel Pead
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
Culprit:
<cut>
commit 8fe62b7af1127691d6732438b322e66ae6d39a03
Author: Max Kazantsev <mkazantsev(a)azul.com>
Date: Thu Apr 22 12:50:38 2021 +0700
[GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
</cut>
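The cost argument in the commit message above can be checked with a small counting model. This is a sketch under stated assumptions rather than anything taken from GVN: it assumes the original loop performs exactly one load of %p per iteration, that the loop body runs at least once, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// clobberTaken[i] is true when iteration i goes through the clobber block.
// Both functions are hypothetical cost models, not LLVM code.

// Before the transform: assume one load of %p per loop iteration.
std::size_t loadsBefore(const std::vector<bool> &clobberTaken) {
  return clobberTaken.size();
}

// After the transform: one load in the preheader, plus one reload in the
// clobber block each time that block runs.
std::size_t loadsAfter(const std::vector<bool> &clobberTaken) {
  std::size_t loads = 1;
  for (bool taken : clobberTaken)
    if (taken)
      ++loads;
  return loads;
}
```

For n iterations with k of them passing through the clobber block, the model gives n loads before and 1 + k after. The transformed loop is therefore worse by exactly one load only when k equals n, and it breaks even or wins as soon as at least one iteration skips the clobber block, which is the claim made in the commit message.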
Results regressed to (for first_bad == 8fe62b7af1127691d6732438b322e66ae6d39a03)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_LTO -- artifacts/build-8fe62b7af1127691d6732438b322e66ae6d39a03/results_id:
1
# 464.h264ref,h264ref_base.default regressed by 104
from (for last_good == 722d4d8e7585457d407d0639a4ae2610157e06a8)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_LTO -- artifacts/build-722d4d8e7585457d407d0639a4ae2610157e06a8/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/641
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/642
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-8fe62b7af1127691d6732438b322e66ae6d39a03
cd investigate-llvm-8fe62b7af1127691d6732438b322e66ae6d39a03
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd llvm
# Reproduce first_bad build
git checkout --detach 8fe62b7af1127691d6732438b322e66ae6d39a03
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 722d4d8e7585457d407d0639a4ae2610157e06a8
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 8fe62b7af1127691d6732438b322e66ae6d39a03
Author: Max Kazantsev <mkazantsev(a)azul.com>
Date: Thu Apr 22 12:50:38 2021 +0700
[GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst-case overhead is one extra load overall (in the preheader), incurred if the clobber
block is always taken. It only matters if the loop has very few iterations. If the clobber block
is skipped on at least one iteration, the transform is neutral or profitable.
This opens up several prospective improvements:
- We can sometimes be smarter in loop-exiting blocks by splitting critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
---
llvm/include/llvm/Transforms/Scalar/GVN.h | 6 ++
llvm/lib/Transforms/Scalar/GVN.cpp | 92 +++++++++++++++++++++++++--
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll | 9 ++-
3 files changed, 98 insertions(+), 9 deletions(-)
diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index 13f55ddcf2d6..70662ca213c3 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -328,6 +328,12 @@ private:
bool PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks);
+ /// Try to replace a load which executes on each loop iteraiton with Phi
+ /// translation of load in preheader and load(s) in conditionally executed
+ /// paths.
+ bool performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+ UnavailBlkVect &UnavailableBlocks);
+
/// Eliminates partially redundant \p Load, replacing it with \p
/// AvailableLoads (connected by Phis if needed).
void eliminatePartiallyRedundantLoad(
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 846a5cdb33d1..29da739fa16e 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -91,13 +91,14 @@ using namespace PatternMatch;
#define DEBUG_TYPE "gvn"
-STATISTIC(NumGVNInstr, "Number of instructions deleted");
-STATISTIC(NumGVNLoad, "Number of loads deleted");
-STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
+STATISTIC(NumGVNInstr, "Number of instructions deleted");
+STATISTIC(NumGVNLoad, "Number of loads deleted");
+STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
STATISTIC(NumGVNBlocks, "Number of blocks merged");
-STATISTIC(NumGVNSimpl, "Number of instructions simplified");
+STATISTIC(NumGVNSimpl, "Number of instructions simplified");
STATISTIC(NumGVNEqProp, "Number of equalities propagated");
-STATISTIC(NumPRELoad, "Number of loads PRE'd");
+STATISTIC(NumPRELoad, "Number of loads PRE'd");
+STATISTIC(NumPRELoopLoad, "Number of loop loads PRE'd");
STATISTIC(IsValueFullyAvailableInBlockNumSpeculationsMax,
"Number of blocks speculated as available in "
@@ -1447,6 +1448,84 @@ bool GVN::PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
return true;
}
+bool GVN::performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+ UnavailBlkVect &UnavailableBlocks) {
+ if (!LI)
+ return false;
+
+ const Loop *L = LI->getLoopFor(Load->getParent());
+ // TODO: Generalize to other loop blocks that dominate the latch.
+ if (!L || L->getHeader() != Load->getParent())
+ return false;
+
+ BasicBlock *Preheader = L->getLoopPreheader();
+ BasicBlock *Latch = L->getLoopLatch();
+ if (!Preheader || !Latch)
+ return false;
+
+ Value *LoadPtr = Load->getPointerOperand();
+ // Must be available in preheader.
+ if (!L->isLoopInvariant(LoadPtr))
+ return false;
+
+ // We plan to hoist the load to preheader without introducing a new fault.
+ // In order to do it, we need to prove that we cannot side-exit the loop
+ // once loop header is first entered before execution of the load.
+ if (ICF->isDominatedByICFIFromSameBlock(Load))
+ return false;
+
+ BasicBlock *LoopBlock = nullptr;
+ for (auto *Blocker : UnavailableBlocks) {
+ // Blockers from outside the loop are handled in preheader.
+ if (!L->contains(Blocker))
+ continue;
+
+ // Only allow one loop block. Loop header is not less frequently executed
+ // than each loop block, and likely it is much more frequently executed. But
+ // in case of multiple loop blocks, we need extra information (such as block
+ // frequency info) to understand whether it is profitable to PRE into
+ // multiple loop blocks.
+ if (LoopBlock)
+ return false;
+
+ // Do not sink into inner loops. This may be non-profitable.
+ if (L != LI->getLoopFor(Blocker))
+ return false;
+
+ // Blocks that dominate the latch execute on every single iteration, maybe
+ // except the last one. So PREing into these blocks doesn't make much sense
+ // in most cases. But the blocks that do not necessarily execute on each
+ // iteration are sometimes much colder than the header, and this is when
+ // PRE is potentially profitable.
+ if (DT->dominates(Blocker, Latch))
+ return false;
+
+ // Make sure that the terminator itself doesn't clobber.
+ if (Blocker->getTerminator()->mayWriteToMemory())
+ return false;
+
+ LoopBlock = Blocker;
+ }
+
+ if (!LoopBlock)
+ return false;
+
+ // Make sure the memory at this pointer cannot be freed, therefore we can
+ // safely reload from it after clobber.
+ if (LoadPtr->canBeFreed())
+ return false;
+
+ // TODO: Support critical edge splitting if blocker has more than 1 successor.
+ MapVector<BasicBlock *, Value *> AvailableLoads;
+ AvailableLoads[LoopBlock] = LoadPtr;
+ AvailableLoads[Preheader] = LoadPtr;
+
+ LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOOP LOAD: " << *Load << '\n');
+ eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, AvailableLoads);
+ ++NumPRELoopLoad;
+ return true;
+}
+
static void reportLoadElim(LoadInst *Load, Value *AvailableValue,
OptimizationRemarkEmitter *ORE) {
using namespace ore;
@@ -1544,7 +1623,8 @@ bool GVN::processNonLocalLoad(LoadInst *Load) {
if (!isLoadInLoopPREEnabled() && LI && LI->getLoopFor(Load->getParent()))
return Changed;
- return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
+ return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks) ||
+ performLoopLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
}
static bool impliesEquivalanceIfTrue(CmpInst* Cmp) {
diff --git a/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll b/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
index 15bc49e7ab9a..8ca12284d5c4 100644
--- a/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
+++ b/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
@@ -7,22 +7,25 @@ declare void @may_free_memory()
declare i32 @personality_function()
-; TODO: We can PRE the load from gc-managed memory away from the hot path.
+; We can PRE the load from gc-managed memory away from the hot path.
define i32 @test_load_on_cold_path_gc(i32 addrspace(1)* %p) gc "statepoint-example" personality i32 ()* @"personality_function" {
; CHECK-LABEL: @test_load_on_cold_path_gc(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32 addrspace(1)* [[P:%.*]], align 4
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
-; CHECK-NEXT: [[X:%.*]] = load i32, i32 addrspace(1)* [[P:%.*]], align 4
+; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE1]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0
; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
; CHECK: hot_path:
; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: cold_path:
; CHECK-NEXT: call void @may_free_memory()
+; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32 addrspace(1)* [[P]], align 4
; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: backedge:
+; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
</cut>
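For readers who want a source-level picture of the transform described in the commit message above, here is a minimal C sketch. It is not taken from the patch: `load_count`, `counted_load`, `clobber`, `sum_before` and `sum_after` are hypothetical names, and the rewrite is done by hand rather than by GVN. The loops are written as do/while with n >= 1 assumed, so that the header runs at least once, as it does in the IR example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instrumentation: number of reads of *p via counted_load(). */
unsigned long load_count = 0;

static int counted_load(const int *p) {
    ++load_count;
    return *p;
}

/* Stand-in for "call foo()" in the IR: a side effect that changes *p. */
static void clobber(int *p) {
    *p += 1;
}

/* Before: the loop header loads *p on every iteration. */
long sum_before(int *p, const unsigned char *take_cold, size_t n) {
    assert(n >= 1);
    long sum = 0;
    size_t i = 0;
    do {
        int x = counted_load(p);      /* load in the header */
        if (take_cold[i])
            clobber(p);               /* cold path */
        sum += x;
    } while (++i < n);
    return sum;
}

/* After: one load in the preheader, one reload after each clobber. */
long sum_after(int *p, const unsigned char *take_cold, size_t n) {
    assert(n >= 1);
    long sum = 0;
    size_t i = 0;
    int x = counted_load(p);          /* x0, loaded in the preheader */
    do {
        int x_pre = x;                /* x.pre = phi(x0, x2) */
        if (take_cold[i]) {
            clobber(p);
            x = counted_load(p);      /* x1, reload after the clobber */
        }
        sum += x_pre;
    } while (++i < n);
    return sum;
}
```

With a cold-path pattern of {0, 1, 0, 0} and *p starting at 5, both functions return 22 and leave *p at 6, but sum_before performs 4 counted loads while sum_after performs 2. If every iteration takes the cold path, sum_after performs n + 1 loads against n, which is the one extra load the commit message mentions.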
This bot failure appears to be unrelated to the fingered change. From
the commit log, I'd guess that 9eaf0d120 by Joel E. Denny
<jdenny.ornl(a)gmail.com> was the triggering change, but that's not in the
blame list.
Joel, FYI.
Linaro folks, as bot owner, you should investigate why the blame list is
wrong. JFYI, this is not the only bot failing with the wrong blame list,
so it might be a common problem.
Philip
On 6/25/21 10:43 AM, llvm.buildmaster(a)lab.llvm.org wrote:
> The Buildbot has detected a failed build on builder clang-armv7-quick while building llvm.
>
> Full details are available at:
> https://lab.llvm.org/buildbot#builders/171/builds/113
>
> Worker for this Build: linaro-clang-armv7-quick
> Blamelist:
> Philip Reames <listmail(a)philipreames.com>
>
> BUILD FAILED: failed 48104 expected passes 1 unexpected failures 72 expected failures Unexpected test result output SKIPPED 26089 unsupported tests (failure)
>
> Step 5 (ninja check 1) failure: 48104 expected passes 1 unexpected failures 72 expected failures Unexpected test result output SKIPPED 26089 unsupported tests (failure)
> ******************** TEST 'Clang :: utils/update_cc_test_checks/check-globals.test' FAILED ********************
> Script:
> --
> : 'RUN: at line 1'; rm -rf /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp && mkdir /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp
> : 'RUN: at line 5'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/check-globals.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c
> : 'RUN: at line 6'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py --clang /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang --opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c --check-globals
> : 'RUN: at line 7'; /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test --input-file=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c --match-full-lines -strict-whitespace -check-prefixes=BOTH,NRM
> : 'RUN: at line 10'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/check-globals.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c
> : 'RUN: at line 11'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py --clang /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang --opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c --check-globals --include-generated-funcs
> : 'RUN: at line 12'; /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test --input-file=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c --match-full-lines -strict-whitespace -check-prefixes=BOTH,IGF
> : 'RUN: at line 17'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c
> : 'RUN: at line 18'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py --clang /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang --opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c --check-globals
> : 'RUN: at line 19'; diff -u /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c
> : 'RUN: at line 20'; rm /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c
> : 'RUN: at line 22'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf-again.c
> : 'RUN: at line 23'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py --clang /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang --opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf-again.c --check-globals --include-generated-funcs
> : 'RUN: at line 25'; diff -u /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf-again.c
> : 'RUN: at line 26'; rm /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf-again.c
> : 'RUN: at line 31'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/lit.cfg.example /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/lit.cfg
> : 'RUN: at line 33'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/lit/lit.py -Dclang_obj_root=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang -j1 -vv /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp
> : 'RUN: at line 35'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/lit/lit.py -Dclang_obj_root=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang -j1 -vv /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp 2>&1 | /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck -check-prefix=LIT-RUN /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test
> --
> Exit Code: 1
>
> Command Output (stdout):
> --
> $ ":" "RUN: at line 1"
> $ "rm" "-rf" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp"
> $ "mkdir" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp"
> $ ":" "RUN: at line 5"
> $ "cp" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/check-globals.c" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c"
> $ ":" "RUN: at line 6"
> $ "/usr/bin/python3.6" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py" "--clang" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang" "--opt" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c" "--check-globals"
> # command stderr:
> NOTE: Executing non-FileChecked RUN line: true
>
> $ ":" "RUN: at line 7"
> $ "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test" "--input-file=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c" "--match-full-lines" "-strict-whitespace" "-check-prefixes=BOTH,NRM"
> $ ":" "RUN: at line 10"
> $ "cp" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/check-globals.c" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c"
> $ ":" "RUN: at line 11"
> $ "/usr/bin/python3.6" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py" "--clang" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang" "--opt" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c" "--check-globals" "--include-generated-funcs"
> # command stderr:
> NOTE: Executing non-FileChecked RUN line: true
>
> $ ":" "RUN: at line 12"
> $ "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test" "--input-file=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c" "--match-full-lines" "-strict-whitespace" "-check-prefixes=BOTH,IGF"
> $ ":" "RUN: at line 17"
> $ "cp" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c"
> $ ":" "RUN: at line 18"
> ...
>
> Sincerely,
> LLVM Buildbot
>
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTO
Culprit:
<cut>
commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6
Author: Richard Biener <rguenther(a)suse.de>
Date: Mon May 3 12:07:58 2021 +0200
Improve PHI handling in DSE
This improves handling of PHI defs when walking uses in
dse_classify_store to track two PHI defs. This happens
when there are CFG merges and one PHI feeds into another.
If we decide we want more, then using an sbitmap for this might be
the way to go.
2021-05-03 Richard Biener <rguenther(a)suse.de>
* tree-ssa-dse.c (dse_classify_store): Track two PHI defs.
* gcc.dg/tree-ssa/ssa-dse-42.c: New testcase.
* gcc.dg/pr81192.c: Disable DSE.
</cut>
Results regressed to (for first_bad == 32955416d8040b1fa1ba21cd4179b3264e6c5bd6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -O2_LTO_marm -- artifacts/build-32955416d8040b1fa1ba21cd4179b3264e6c5bd6/results_id:
1
# 483.xalancbmk,Xalan_base.default regressed by 103
from (for last_good == ed3c43224cc4e378dbab066122bc63536ccb1276)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -O2_LTO_marm -- artifacts/build-ed3c43224cc4e378dbab066122bc63536ccb1276/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Results ID of last_good: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O2_LTO/458
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Results ID of first_bad: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O2_LTO/480
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-32955416d8040b1fa1ba21cd4179b3264e6c5bd6
cd investigate-gcc-32955416d8040b1fa1ba21cd4179b3264e6c5bd6
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd gcc
# Reproduce first_bad build
git checkout --detach 32955416d8040b1fa1ba21cd4179b3264e6c5bd6
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach ed3c43224cc4e378dbab066122bc63536ccb1276
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Full commit (up to 1000 lines):
<cut>
commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6
Author: Richard Biener <rguenther(a)suse.de>
Date: Mon May 3 12:07:58 2021 +0200
Improve PHI handling in DSE
This improves handling of PHI defs when walking uses in
dse_classify_store to track two PHI defs. This happens
when there are CFG merges and one PHI feeds into another.
If we decide we want more, then using an sbitmap for this might be
the way to go.
2021-05-03 Richard Biener <rguenther(a)suse.de>
* tree-ssa-dse.c (dse_classify_store): Track two PHI defs.
* gcc.dg/tree-ssa/ssa-dse-42.c: New testcase.
* gcc.dg/pr81192.c: Disable DSE.
---
gcc/testsuite/gcc.dg/pr81192.c | 4 +++-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c | 20 ++++++++++++++++++++
gcc/tree-ssa-dse.c | 23 ++++++++++++++---------
3 files changed, 37 insertions(+), 10 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c
index 71bbc13a0e9..6cab6056558 100644
--- a/gcc/testsuite/gcc.dg/pr81192.c
+++ b/gcc/testsuite/gcc.dg/pr81192.c
@@ -1,4 +1,4 @@
-/* { dg-options "-Os -fdump-tree-pre-details -fdisable-tree-evrp" } */
+/* { dg-options "-Os -fdump-tree-pre-details -fdisable-tree-evrp -fno-tree-dse" } */
/* Disable tree-evrp because the new version of evrp sees
<bb 3> :
@@ -16,6 +16,8 @@ produces
# iftmp.2_12 = PHI <2147483647(3), iftmp.2_11(4)>
which causes the situation being tested to dissapear before we get to PRE. */
+/* Likewise disable DSE which also elides the tail merging "opportunity". */
+
#if __SIZEOF_INT__ == 2
#define unsigned __UINT32_TYPE__
#define int __INT32_TYPE__
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c
new file mode 100644
index 00000000000..34108c83828
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1" } */
+
+int a[2];
+void foo(int i, int k, int j)
+{
+ a[0] = i;
+ if (k)
+ a[0] = a[i] + k;
+ else
+ {
+ if (j)
+ a[1] = 1;
+ a[0] = a[i] + 3;
+ }
+ a[0] = 0;
+}
+
+/* The last stores to a[0] and a[1] remain. */
+/* { dg-final { scan-tree-dump-times " = " 2 "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index e0a944c704a..dfa6d314727 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -799,7 +799,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
return DSE_STORE_LIVE;
auto_vec<gimple *, 10> defs;
- gimple *phi_def = NULL;
+ gimple *first_phi_def = NULL;
+ gimple *last_phi_def = NULL;
FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
{
/* Limit stmt walking. */
@@ -825,7 +826,9 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
SSA_NAME_VERSION (PHI_RESULT (use_stmt))))
{
defs.safe_push (use_stmt);
- phi_def = use_stmt;
+ if (!first_phi_def)
+ first_phi_def = use_stmt;
+ last_phi_def = use_stmt;
}
}
/* If the statement is a use the store is not dead. */
@@ -889,6 +892,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
gimple *def = defs[i];
gimple *use_stmt;
use_operand_p use_p;
+ tree vdef = (gimple_code (def) == GIMPLE_PHI
+ ? gimple_phi_result (def) : gimple_vdef (def));
/* If the path to check starts with a kill we do not need to
process it further.
??? With byte tracking we need only kill the bytes currently
@@ -901,8 +906,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
}
/* If the path ends here we do not need to process it further.
This for example happens with calls to noreturn functions. */
- else if (gimple_code (def) != GIMPLE_PHI
- && has_zero_uses (gimple_vdef (def)))
+ else if (has_zero_uses (vdef))
{
/* But if the store is to global memory it is definitely
not dead. */
@@ -912,12 +916,13 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
}
/* In addition to kills we can remove defs whose only use
is another def in defs. That can only ever be PHIs of which
- we track a single for simplicity reasons (we fail for multiple
- PHIs anyways). We can also ignore defs that feed only into
+ we track two for simplicity reasons, the first and last in
+ {first,last}_phi_def (we fail for multiple PHIs anyways).
+ We can also ignore defs that feed only into
already visited PHIs. */
- else if (gimple_code (def) != GIMPLE_PHI
- && single_imm_use (gimple_vdef (def), &use_p, &use_stmt)
- && (use_stmt == phi_def
+ else if (single_imm_use (vdef, &use_p, &use_stmt)
+ && (use_stmt == first_phi_def
+ || use_stmt == last_phi_def
|| (gimple_code (use_stmt) == GIMPLE_PHI
&& bitmap_bit_p (visited,
SSA_NAME_VERSION
</cut>
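To make concrete which stores the new testcase ssa-dse-42.c expects to survive, here is a small C sketch. `foo` is copied from the testcase; `foo_live_stores_only` and `stores_agree` are hypothetical names, and the simplified function is written by hand from the testcase comment ("The last stores to a[0] and a[1] remain"), not taken from a GCC dump. The check covers only i in {0, 1}, where a[i] is in bounds.

```c
#include <assert.h>

int a[2];

/* The function from ssa-dse-42.c, copied from the new testcase. */
void foo(int i, int k, int j) {
    a[0] = i;
    if (k)
        a[0] = a[i] + k;
    else {
        if (j)
            a[1] = 1;
        a[0] = a[i] + 3;
    }
    a[0] = 0;
}

/* Hand-written equivalent that keeps only the two stores that stay live. */
void foo_live_stores_only(int i, int k, int j) {
    assert(i == 0 || i == 1);   /* same precondition as foo, which reads a[i] */
    if (!k && j)
        a[1] = 1;
    a[0] = 0;
}

/* Returns 1 if both versions leave a[] identical for all in-range inputs. */
int stores_agree(void) {
    for (int i = 0; i < 2; ++i)
        for (int k = 0; k < 2; ++k)
            for (int j = 0; j < 2; ++j) {
                a[0] = 7; a[1] = 9;
                foo(i, k, j);
                int r0 = a[0], r1 = a[1];
                a[0] = 7; a[1] = 9;
                foo_live_stores_only(i, k, j);
                if (a[0] != r0 || a[1] != r1)
                    return 0;
            }
    return 1;
}
```

Every store to a[0] before the final `a[0] = 0` is overwritten on all paths without an intervening observable use, which is why only the final store to a[0] and the conditional store to a[1] affect the result.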
Progress:
* UM-2 [QEMU upstream maintainership]
+ Not much this week. Reviewed rth's bswap improvement/cleanup series
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Implemented logical-immediate insns; various vector shifts; VADDLV;
some of the scalar shifts that work on general-purpose registers
+ Fixed a few bugs in already-implemented insns (widening/narrowing
load/store, and VRMLALDAVH, VRMLSLDAVH)
+ Progress: 102/210 (48%)
-- PMM