Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2
Culprit:
<cut>
commit d181fd918d18cbd99768f025e14a69d35d275f14
Author: Simon Pilgrim <llvm-dev(a)redking.me.uk>
Date: Fri Jul 2 14:27:27 2021 +0100
[CostModel][X86] Drop some hard coded fp<->int scalarization costs
Scalarization costs handling is a lot better now, and the hard coded costs were higher than the worse case numbers from the script in D103695
</cut>
Results regressed to (for first_bad == d181fd918d18cbd99768f025e14a69d35d275f14)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_marm -- artifacts/build-d181fd918d18cbd99768f025e14a69d35d275f14/results_id:
1
# 400.perlbench,libc-2.33.9000.so regressed by 113
from (for last_good == 5df556ac8bb8c5f4ef3dff1a2039dd389d1d27c0)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_marm -- artifacts/build-5df556ac8bb8c5f4ef3dff1a2039dd389d1d27c0/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-master-arm-spec2k6-O2/1840
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-master-arm-spec2k6-O2/1837
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-d181fd918d18cbd99768f025e14a69d35d275f14
cd investigate-llvm-d181fd918d18cbd99768f025e14a69d35d275f14
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
rsync -a --del --delete-excluded --exclude bisect/ --exclude artifacts/ --exclude llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach d181fd918d18cbd99768f025e14a69d35d275f14
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5df556ac8bb8c5f4ef3dff1a2039dd389d1d27c0
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit d181fd918d18cbd99768f025e14a69d35d275f14
Author: Simon Pilgrim <llvm-dev(a)redking.me.uk>
Date: Fri Jul 2 14:27:27 2021 +0100
[CostModel][X86] Drop some hard coded fp<->int scalarization costs
Scalarization costs handling is a lot better now, and the hard coded costs were higher than the worse case numbers from the script in D103695
---
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | 13 -------------
llvm/test/Analysis/CostModel/X86/sitofp.ll | 6 +++---
2 files changed, 3 insertions(+), 16 deletions(-)
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index d55cd8a8c7a8..9eb5abe4dd9b 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -1977,13 +1977,6 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 10 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 6 },
- // The generic code to compute the scalar overhead is currently broken.
- // Workaround this limitation by estimating the scalarization overhead
- // here. We have roughly 10 instructions per scalar element.
- // Multiply that by the vector width.
- // FIXME: remove that when PR19268 is fixed.
- { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },
- { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },
{ ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 4 },
{ ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f64, 3 },
@@ -2003,12 +1996,6 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
{ ISD::FP_TO_UINT, MVT::v8i16, MVT::v8f32, 3 },
{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 9 },
{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f64, 19 },
- // This node is expanded into scalarized operations but BasicTTI is overly
- // optimistic estimating its cost. It computes 3 per element (one
- // vector-extract, one scalar conversion and one vector-insert). The
- // problem is that the inserts form a read-modify-write chain so latency
- // should be factored in too. Inflating the cost per element by 1.
- { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },
{ ISD::FP_EXTEND, MVT::v4f64, MVT::v4f32, 1 },
{ ISD::FP_ROUND, MVT::v4f32, MVT::v4f64, 1 },
diff --git a/llvm/test/Analysis/CostModel/X86/sitofp.ll b/llvm/test/Analysis/CostModel/X86/sitofp.ll
index b3c400c93b9f..b327454c1d09 100644
--- a/llvm/test/Analysis/CostModel/X86/sitofp.ll
+++ b/llvm/test/Analysis/CostModel/X86/sitofp.ll
@@ -122,14 +122,14 @@ define i32 @sitofp_i64_double() {
; AVX-LABEL: 'sitofp_i64_double'
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cvt_i64_f64 = sitofp i64 undef to double
; AVX-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %cvt_v2i64_v2f64 = sitofp <2 x i64> undef to <2 x double>
-; AVX-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
-; AVX-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %cvt_v8i64_v8f64 = sitofp <8 x i64> undef to <8 x double>
+; AVX-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
+; AVX-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cvt_v8i64_v8f64 = sitofp <8 x i64> undef to <8 x double>
; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; AVX512F-LABEL: 'sitofp_i64_double'
; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cvt_i64_f64 = sitofp i64 undef to double
; AVX512F-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %cvt_v2i64_v2f64 = sitofp <2 x i64> undef to <2 x double>
-; AVX512F-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
+; AVX512F-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
; AVX512F-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %cvt_v8i64_v8f64 = sitofp <8 x i64> undef to <8 x double>
; AVX512F-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
</cut>
All,
During Connect the suggestion was made that each working group should have
its own IRC Channel for discussions and topics relating to the group in
particular (as opposed to #linaro which is 'generic' Linaro conversations).
Therefore I have just set up #linaro-tcwg on Freenode for the Toolchain
Working Group.
This channel is public and open to anyone who wants to talk with the TCWG
group about anything toolchain related.
Thanks,
Matt
--
Matthew Gretton-Dann
Toolchain Working Group, Linaro
Progress:
* UM-2 [QEMU upstream maintainership]
+ Code review:
- ITS patchset v6
- RTH's series to allow usermode emulation users to set default vector length
+ Arm pullreq; shepherding stuff in for softfreeze
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Sent out patchset with the 3rd slice of MVE insns; I now have
all the non-floating-point insns done, I think. New insns:
scatter-gather loads/stores, interleaving loads/stores, VCTP,
realized I already did VMOVL as it is "VSHLL-by-0"
+ Implemented most of the floating point insns: implemented
VADD fp, VSUB fp, VABD fp, VMUL fp, VMAXNM, VMINNM, VCADD fp,
VFMA, VFMS, VCMUL, VCMLA, VMAXNMA, VMINNMA, VADD fp scalar,
VSUB fp scalar, VMUL fp scalar, VFMA fp scalar, VFMAS fp scalar,
VMAXNMV, VMAXNMAV, VMINNMV, VMINNMAV, VCMP and VPT fp vector and scalar,
VCVT fixed-point
+ Just 4 insns to go: three flavours of VCVT, plus VRINT (and then
all the other stuff that wasn't included in this simplistic
measure of progress :-))
+ Progress: 206/210 (98%)
-- PMM