After llvm commit 3d549dddf75b6ff9e0ec8c053677750bde4226ea
Author: Sander de Smalen <sander.desmalen(a)arm.com>
[LV] Pass compare predicate to getCmpSelInstrCost.
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 11115 to 11846 perf samples
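The quoted percentage can be rechecked from the two perf-sample counts. The sketch below is only an illustration: the helper name is hypothetical, and it assumes the report rounds the relative increase to the nearest whole percent, which the report itself does not state.

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, expressed in percent."""
    return (after - before) / before * 100.0


# Sample counts quoted in the report for 464.h264ref.
before, after = 11115, 11846

# The report flags benchmarks that slowed down by more than 2%.
THRESHOLD = 2.0

pct = slowdown_percent(before, after)
flagged = pct > THRESHOLD
print(f"464.h264ref: {pct:.2f}% more samples, flagged: {flagged}")
```

Under that rounding assumption the result agrees with the 7% figure given above.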
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-3d549dddf75b6ff9e0ec8c053677750bde4226ea
cd investigate-llvm-3d549dddf75b6ff9e0ec8c053677750bde4226ea
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 3d549dddf75b6ff9e0ec8c053677750bde4226ea
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach ab31d003e16e483bff298ea2f28fec0f23e8eb79
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 3d549dddf75b6ff9e0ec8c053677750bde4226ea
Author: Sander de Smalen <sander.desmalen(a)arm.com>
Date: Mon Dec 6 11:14:27 2021 +0000
[LV] Pass compare predicate to getCmpSelInstrCost.
If the condition of a select is a compare, pass its predicate to
TTI::getCmpSelInstrCost to get a more accurate cost value instead
of passing BAD_ICMP_PREDICATE.
I noticed that the commit message from D90070 had a comment about the
vectorized select predicate possibly being composed of other compares with
different predicate values, but I wasn't able to construct an example
where this was an actual issue. If this is an issue, I guess we could
add another check that the block isn't predicated for any reason.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D114646
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 11 ++++++++---
llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll | 14 +++++++-------
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 050879144afd..c03e506b7474 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7570,8 +7570,12 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *CondTy = SI->getCondition()->getType();
if (!ScalarCond)
CondTy = VectorType::get(CondTy, VF);
- return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy,
- CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
+
+ CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
+ if (auto *Cmp = dyn_cast<CmpInst>(SI->getCondition()))
+ Pred = Cmp->getPredicate();
+ return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy, Pred,
+ CostKind, I);
}
case Instruction::ICmp:
case Instruction::FCmp: {
@@ -7581,7 +7585,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
ValTy = IntegerType::get(ValTy->getContext(), MinBWs[Op0AsInstruction]);
VectorTy = ToVectorTy(ValTy, VF);
return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, nullptr,
- CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
+ cast<CmpInst>(I)->getPredicate(), CostKind,
+ I);
}
case Instruction::Store:
case Instruction::Load: {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
index 62b18f44fbc5..20d2dc0b7cda 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
@@ -5,17 +5,17 @@ target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-ios5.0.0"
define void @selects_1(i32* nocapture %dst, i32 %A, i32 %B, i32 %C, i32 %N) {
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
; CHECK-LABEL: define void @selects_1(
; CHECK: vector.body:
-; CHECK: select <2 x i1>
+; CHECK: select <4 x i1>
entry:
%cmp26 = icmp sgt i32 %N, 0
</cut>
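The test change in the commit above (vector.body now contains "select <4 x i1>" rather than "select <2 x i1>") can be illustrated with a toy calculation using the cost estimates from the CHECK lines. This is a deliberately simplified sketch, not the vectorizer's actual algorithm: it assumes only the three selects contribute to the loop cost and that the preferred VF is the one with the lowest cost per lane. All names are hypothetical.

```python
# Per-select cost estimates quoted in the select-costs.ll CHECK lines,
# keyed by vectorization factor (VF).
OLD_COST = {2: 5, 4: 13}  # before the commit
NEW_COST = {2: 1, 4: 1}   # after the commit

# The test function selects_1 reports costs for three selects.
NUM_SELECTS = 3


def cost_per_lane(cost_by_vf):
    """Total select cost divided by VF (assumption: selects are the only cost)."""
    return {vf: NUM_SELECTS * cost / vf for vf, cost in cost_by_vf.items()}


def cheapest_vf(cost_by_vf):
    """Return the VF with the lowest per-lane cost in this toy model."""
    per_lane = cost_per_lane(cost_by_vf)
    return min(per_lane, key=per_lane.get)


print("old per-lane costs:", cost_per_lane(OLD_COST), "-> VF", cheapest_vf(OLD_COST))
print("new per-lane costs:", cost_per_lane(NEW_COST), "-> VF", cheapest_vf(NEW_COST))
```

In this model the old estimates favour VF 2 and the new ones favour VF 4, which matches the direction of the test update. The real cost model sums every instruction in the loop and applies further heuristics, so this only shows why a lower select cost can shift the choice.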
Dear Linaro Toolchain Working Group,
clang-thumbv7-full-2stage has been red for 20 days.
Could you take it to the staging area and make it green again, please?
Thanks
Galina
After llvm commit bd4c6a476fd037fb07a1c484f75d93ee40713d3d
Author: David Blaikie <dblaikie(a)gmail.com>
Add missing header
the following benchmarks slowed down by more than 2%:
- 433.milc slowed down by 4% from 12427 to 12916 perf samples
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-bd4c6a476fd037fb07a1c484f75d93ee40713d3d
cd investigate-llvm-bd4c6a476fd037fb07a1c484f75d93ee40713d3d
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach bd4c6a476fd037fb07a1c484f75d93ee40713d3d
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 7d4da4e1ab7f79e51db0d5c2a0f5ef1711122dd7
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit bd4c6a476fd037fb07a1c484f75d93ee40713d3d
Author: David Blaikie <dblaikie(a)gmail.com>
Date: Mon Nov 29 16:29:25 2021 -0800
Add missing header
---
llvm/lib/Demangle/DLangDemangle.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/llvm/lib/Demangle/DLangDemangle.cpp b/llvm/lib/Demangle/DLangDemangle.cpp
index faf91b239490..f380aa90035e 100644
--- a/llvm/lib/Demangle/DLangDemangle.cpp
+++ b/llvm/lib/Demangle/DLangDemangle.cpp
@@ -17,6 +17,7 @@
#include "llvm/Demangle/StringView.h"
#include "llvm/Demangle/Utility.h"
+#include <cctype>
#include <cstring>
#include <limits>
</cut>
VirtIO Initiative ([STR-9])
===========================
- synced up on [AF_XDP task with Akashi-san]
- synced on rust-vmm
[AF_XDP task with Akashi-san]
<https://linaro.atlassian.net/browse/STR-68>
vhost-device maintainer effort ([UM-196])
- started looking at https://github.com/rust-vmm/vhost-device/pull/4
QEMU Upstream Work ([UM-2])
===========================
- posted [PULL for 6.2 0/8] more tcg, plugin, test and build fixes
Message-Id: <20211129171449.4176301-1-alex.bennee(a)linaro.org>
- commented on Re: Follow-up on the CXL discussion at OFTC Message-Id:
<20211119015207.62fhk5mjmvaj5nz4(a)intel.com> to see if I can unblock
- posted [RFC PATCH] blog post: how to get your new feature
up-streamed Message-Id:
<20211126203319.3298089-1-alex.bennee(a)linaro.org>
- posted [PATCH for 6.2?] Revert "vga: don't abort when adding a
duplicate isa-vga device" Message-Id:
<20211202164929.1119036-1-alex.bennee(a)linaro.org>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM
Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Other
=====
- wrote [RFC PATCH 0/2] insn plugin tweaks for measuring frequency
Message-Id: <20211203144421.1445232-1-alex.bennee(a)linaro.org>
- might make a good basis for a TCG plugins blog post
Completed Reviews [2/2]
=======================
[PATCH] tests/plugin/syscall.c: fix compiler warnings
Message-Id: <20211128011551.2115468-1-juro.bystricky(a)intel.com>
[PATCH for-6.2? 0/2] arm_gicv3: Fix handling of LPIs in list registers
Message-Id: <20211126163915.1048353-2-peter.maydell(a)linaro.org>
Current Review Queue
====================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
  Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
  Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
  Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
  Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- Code review: worked through some of the backlog and accumulated
a list of series to take once the tree reopens for 7.0
- Wrote and sent some cleanup patches relating to the qemu-common.h
header file
- Fixed a bug where we miscalculated the length for TLB range
invalidations
* QEMU-420 [GICv4 emulation]
- Found the problem with PCI passthrough in my nested test setup:
apparently virtio PCI devices need an extra command line argument
to get them to honour the presence of an IOMMU. Everything is
now working and I've put some notes about the setup into
https://linaro.atlassian.net/browse/QEMU-447
- Started to implement the GICv4 redistributor changes
-- PMM
VirtIO Initiative ([STR-9])
===========================
- [this week's sync], topics on AF_XDP, virtio-video and
virtio-watchdog
[upstream rust-vmm sync meeting]
<https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…>
QEMU Upstream Work ([UM-2])
===========================
- posted [PATCH for 6.2 v2 0/7] more tcg, plugin, test and build fixes
Message-Id: <20211125154144.2904741-1-alex.bennee(a)linaro.org>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v8 00/10] MTTCG sanity tests for ARM
Message-Id: <20211118184650.661575-1-alex.bennee(a)linaro.org>
[mttcg tests to current state and fixed up]
<https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8>
Other
=====
- renewal feedback
Completed Reviews [2/2]
=======================
[PATCH v2 0/3] KVM: qemu patches for few KVM features I developed
Message-Id: <20211101132300.192584-1-mlevitsk(a)redhat.com>
[PATCH v2] hw/intc/arm_gicv3: Update cached state after LPI state changes
Message-Id: <20211124202005.989935-1-peter.maydell(a)linaro.org>
Absences
========
- off 2 days sick
Current Review Queue
====================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
  Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
  Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
  Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
  Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
--
Alex Bennée