Project Stratos
===============
- reviewed Peter's virtio-video patches for QEMU
[PR to clean up some typos in EDK2]
<https://github.com/tianocore/edk2-platforms/pull/34>
vhost-device maintainer effort ([UM-196])
- started reviewing https://github.com/rust-vmm/vhost-device/pull/7
- looking pretty good, see how
https://github.com/rust-vmm/vm-virtio/commit/463dd20552fc32139bbbb56e9152df…
would work with it
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
QEMU Upstream Work ([UM-2])
===========================
- posted [RFC PATCH 0/6] Basic skeleton of RP2040 Raspberry Pi Pico
Message-Id: <20220110175104.2908956-1-alex.bennee(a)linaro.org>
- posted [PATCH v1 00/34] testing/next and other misc fixes
Message-Id: <20220105135009.1584676-1-alex.bennee(a)linaro.org>
- and the eventual [PULL 00/31] testing/next and other misc fixes
Message-Id: <20220112112722.3641051-1-alex.bennee(a)linaro.org>
- and the inevitable fixup [RFC PATCH] linux-user: expand reserved
brk space for 64bit guests Message-Id:
<20220113165550.4184455-1-alex.bennee(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- still awaiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG
sanity tests for ARM Message-Id:
<20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Completed Reviews [6/6]
=======================
[PATCH] tests/docker: Add gentoo-loongarch64-cross image and run cross builds in GitLab
Message-Id: <20211229062204.3726981-1-git(a)xen0n.name>
[PATCH 0/2] tests/tcg: Fix float_{convs,madds}
Message-Id: <20211224035541.2159966-1-richard.henderson(a)linaro.org>
[PATCH v5 00/18] tests/docker: start using libvirt-ci's "lcitool" for dockerfiles
Message-Id: <20211215141949.3512719-1-berrange(a)redhat.com>
[PATCH] tests/tcg: Unconditionally use 90 second timeout
Message-Id: <20211230235424.49155-1-richard.henderson(a)linaro.org>
[PATCH] gitlab-ci: Speed up the msys2-64bit job by using --without-default-devices
Message-Id: <20211216082253.43899-1-thuth(a)redhat.com>
[PATCH 0/8] virtio: Add vhost-user based Video decode
Message-Id: <20211209145601.331477-1-peter.griffin(a)linaro.org>
Absences
========
Current Review Queue
====================
TODO [PATCH 0/6] Introduce CanoKey QEMU
Message-Id: <YcSupUSXWDXOAkas@Sun>
=========================================================================
TODO [PATCH] target/arm: Add missing FEAT_TLBIOS instructions
Message-Id: <20211231103928.1455657-1-idan.horowitz(a)gmail.com>
========================================================================================================================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- Most of this week was spent on continuing to work through
my code-review queue :-/
- Sent a few minor cleanup patches for linux-user nits I noticed while
reading the code as part of reviewing a big bsd-user patchset
* QEMU-420 [GICv4 emulation]
- got some reviewed ITS cleanup patches upstream
- rerolled and sent v2 patchset for the rest of the cleanup patches
- got back up to speed with where I left my GICv4 ITS patches
before Christmas, and dealt with some minor loose ends I'd
left in the last patch or two I was working on.
-- PMM
Hi Peter,
Welcome back, hope you had a good Christmas break. I'm off on holiday myself for the next
two weeks, so this would be an ideal time to pass back merge control to you.
The board is mostly green now, with occasional allowed failures for centos-stream and
freebsd for upstream package manager failures.
See y'all in a couple of weeks.
r~
[UM-2]
* Re-greening of gitlab-ci.
- There are continuing issues with cross-i386-tci.
Occasionally I see *really* long test times:
https://gitlab.com/qemu-project/qemu/-/jobs/1941996332
with qtest-aarch64/qom-test taking 1738s, or about 29 minutes of the 60-minute budget.
More often it's merely slow:
https://gitlab.com/qemu-project/qemu/-/jobs/1954634840
with qtest-aarch64/qom-test taking 538s. Note that locally this test
runs in about 100s, and I have been unable to determine why it runs so
much slower on gitlab.
- Worked on a ppc64-softmmu slowdown leading to timeouts.
- Fixes for meson regressions affecting testing.
* Refresh tcg unaligned user patch sets.
r~
Progress (short week, 2 days):
* UM-2 [QEMU upstream maintainership]
- Catching up with email and codereview backlog from 3 weeks holiday :-)
(Have got the codereview queue down to less than a dozen things
so should be able to do some more GICv4 development next week.)
-- PMM
Project Stratos
===============
- got Xen working on the MACCHIATObin
- posted Configuring the host GIC for guest to guest IPI Message-Id:
<87fsqwn2sd.fsf(a)linaro.org>
QEMU Upstream Work ([UM-2])
===========================
- posted [RFC PATCH] linux-user: don't adjust base of found hole
Message-Id: <20211216144442.2270605-1-alex.bennee(a)linaro.org>
- posted [PATCH] hw/arm: add control knob to disable kaslr_seed via
DTB Message-Id: <20211215120926.1696302-1-alex.bennee(a)linaro.org>
Completed Reviews [3/3]
=======================
[PATCH 00/26] arm gicv3 ITS: Various bug fixes and refactorings
Message-Id: <20211211191135.1764649-1-peter.maydell(a)linaro.org>
[PATCH for-7.0 0/6] target/arm: Implement LVA, LPA, LPA2 features
Message-Id: <20211208231154.392029-1-richard.henderson(a)linaro.org>
[PATCH-for-6.2? v2 0/5] docs/devel/style: Improve rST rendering
Message-Id: <20211118145716.4116731-1-philmd(a)redhat.com>
Absences
========
Off for holidays, back in the new year. Merry Christmas everyone!
--
Alex Bennée
Project Stratos
===============
- posted Potential demo setup for a TSN/XDP networking Message-Id:
<87wnkfkp2f.fsf(a)linaro.org>
- final Stratos call of the year
- CC and Arnd will look at fat virtq
- nice update from EPAM on Zephyr
- had another round of getting ACPI working on the MACCHIATObin
- posted [PR to clean up some typos in EDK2]
- might have a working Xen setup without needing SMC hacks
[PR to clean up some typos in EDK2]
<https://github.com/tianocore/edk2-platforms/pull/34>
vhost-device maintainer effort ([UM-196])
- finished review of https://github.com/rust-vmm/vhost-device/pull/4
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
QEMU Upstream Work ([UM-2])
===========================
- discussion around Suggestions for TCG performance improvements
Message-Id: <c76bde31-8f3b-2d03-b7c7-9e026d4b5873(a)huawei.com>
- did a bunch of bug triage and tagging
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- awaiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity
tests for ARM Message-Id:
<20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Completed Reviews [3/3]
=======================
[PATCH] tests/plugin/syscall.c: fix compiler warnings
Message-Id: <20211128011551.2115468-1-juro.bystricky(a)intel.com>
[PATCH for-6.2? 0/2] arm_gicv3: Fix handling of LPIs in list registers
Message-Id: <20211126163915.1048353-2-peter.maydell(a)linaro.org>
[PATCH] tests/docker: add libfuse3 development headers
Message-Id: <20211207160025.52466-1-stefanha(a)redhat.com>
Absences
========
Current Review Queue
====================
TODO [PATCH 0/8] virtio: Add vhost-user based Video decode
Message-Id: <20211209145601.331477-1-peter.griffin(a)linaro.org>
========================================================================================================================
TODO [PATCH for-7.0 0/6] target/arm: Implement LVA, LPA, LPA2 features
Message-Id: <20211208231154.392029-1-richard.henderson(a)linaro.org>
========================================================================================================================================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- More code review: now have a target-arm.next poised and ready to
send once 6.2 is released
* QEMU-420 [GICv4 emulation]
- Working on the ITS changes needed for GICv4 support (this turns
out to be a more tractable place to start than the redistributor)
- I have a preliminary set of 25 or so patches to the ITS which
clean up the code and fix some pre-existing bugs that I found
while working on the GICv4 changes
- have implemented the new VMAPI, VMAPTI, VMAPP ITS commands
-- PMM
After llvm commit 3d549dddf75b6ff9e0ec8c053677750bde4226ea
Author: Sander de Smalen <sander.desmalen(a)arm.com>
[LV] Pass compare predicate to getCmpSelInstrCost.
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 11115 to 11846 perf samples
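The percentage above follows directly from the two perf sample counts. As a rough cross-check, the Python sketch below recomputes it; the helper name `slowdown_percent` is made up for illustration and is not part of the TCWG CI scripts.

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, expressed in percent."""
    return (after - before) / before * 100.0


# Sample counts quoted in the report for 464.h264ref.
pct = slowdown_percent(11115, 11846)
print(f"464.h264ref: {pct:.1f}% more perf samples")
```

The reported figure is this value rounded to a whole percent.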
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-3d549dddf75b6ff9e0ec8c053677750bde4226ea
cd investigate-llvm-3d549dddf75b6ff9e0ec8c053677750bde4226ea
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 3d549dddf75b6ff9e0ec8c053677750bde4226ea
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach ab31d003e16e483bff298ea2f28fec0f23e8eb79
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 3d549dddf75b6ff9e0ec8c053677750bde4226ea
Author: Sander de Smalen <sander.desmalen(a)arm.com>
Date: Mon Dec 6 11:14:27 2021 +0000
[LV] Pass compare predicate to getCmpSelInstrCost.
If the condition of a select is a compare, pass its predicate to
TTI::getCmpSelInstrCost to get a more accurate cost value instead
of passing BAD_ICMP_PREDICATE.
I noticed that the commit message from D90070 had a comment about the
vectorized select predicate possibly being composed of other compares with
different predicate values, but I wasn't able to construct an example
where this was an actual issue. If this is an issue, I guess we could
add another check that the block isn't predicated for any reason.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D114646
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 11 ++++++++---
llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll | 14 +++++++-------
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 050879144afd..c03e506b7474 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7570,8 +7570,12 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *CondTy = SI->getCondition()->getType();
if (!ScalarCond)
CondTy = VectorType::get(CondTy, VF);
- return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy,
- CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
+
+ CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
+ if (auto *Cmp = dyn_cast<CmpInst>(SI->getCondition()))
+ Pred = Cmp->getPredicate();
+ return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy, Pred,
+ CostKind, I);
}
case Instruction::ICmp:
case Instruction::FCmp: {
@@ -7581,7 +7585,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
ValTy = IntegerType::get(ValTy->getContext(), MinBWs[Op0AsInstruction]);
VectorTy = ToVectorTy(ValTy, VF);
return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, nullptr,
- CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
+ cast<CmpInst>(I)->getPredicate(), CostKind,
+ I);
}
case Instruction::Store:
case Instruction::Load: {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
index 62b18f44fbc5..20d2dc0b7cda 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
@@ -5,17 +5,17 @@ target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-ios5.0.0"
define void @selects_1(i32* nocapture %dst, i32 %A, i32 %B, i32 %C, i32 %N) {
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
; CHECK-LABEL: define void @selects_1(
; CHECK: vector.body:
-; CHECK: select <2 x i1>
+; CHECK: select <4 x i1>
entry:
%cmp26 = icmp sgt i32 %N, 0
</cut>
Dear Linaro Toolchain Working Group,
clang-thumbv7-full-2stage has been red for 20 days.
Could you take it to the staging area and make it green again, please?
Thanks
Galina
After llvm commit bd4c6a476fd037fb07a1c484f75d93ee40713d3d
Author: David Blaikie <dblaikie(a)gmail.com>
Add missing header
the following benchmarks slowed down by more than 2%:
- 433.milc slowed down by 4% from 12427 to 12916 perf samples
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-bd4c6a476fd037fb07a1c484f75d93ee40713d3d
cd investigate-llvm-bd4c6a476fd037fb07a1c484f75d93ee40713d3d
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach bd4c6a476fd037fb07a1c484f75d93ee40713d3d
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 7d4da4e1ab7f79e51db0d5c2a0f5ef1711122dd7
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit bd4c6a476fd037fb07a1c484f75d93ee40713d3d
Author: David Blaikie <dblaikie(a)gmail.com>
Date: Mon Nov 29 16:29:25 2021 -0800
Add missing header
---
llvm/lib/Demangle/DLangDemangle.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/llvm/lib/Demangle/DLangDemangle.cpp b/llvm/lib/Demangle/DLangDemangle.cpp
index faf91b239490..f380aa90035e 100644
--- a/llvm/lib/Demangle/DLangDemangle.cpp
+++ b/llvm/lib/Demangle/DLangDemangle.cpp
@@ -17,6 +17,7 @@
#include "llvm/Demangle/StringView.h"
#include "llvm/Demangle/Utility.h"
+#include <cctype>
#include <cstring>
#include <limits>
</cut>
VirtIO Initiative ([STR-9])
===========================
- synced up on [AF_XDP task with Akashi-san]
- synced on rust-vmm
[AF_XDP task with Akashi-san]
<https://linaro.atlassian.net/browse/STR-68>
vhost-device maintainer effort ([UM-196])
- started looking at https://github.com/rust-vmm/vhost-device/pull/4
QEMU Upstream Work ([UM-2])
===========================
- posted [PULL for 6.2 0/8] more tcg, plugin, test and build fixes
Message-Id: <20211129171449.4176301-1-alex.bennee(a)linaro.org>
- commented on Re: Follow-up on the CXL discussion at OFTC Message-Id:
<20211119015207.62fhk5mjmvaj5nz4(a)intel.com> to see if I can unblock
- posted [RFC PATCH] blog post: how to get your new feature
up-streamed Message-Id:
<20211126203319.3298089-1-alex.bennee(a)linaro.org>
- posted [PATCH for 6.2?] Revert "vga: don't abort when adding a
duplicate isa-vga device" Message-Id:
<20211202164929.1119036-1-alex.bennee(a)linaro.org>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM
Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Other
=====
- wrote [RFC PATCH 0/2] insn plugin tweaks for measuring frequency
Message-Id: <20211203144421.1445232-1-alex.bennee(a)linaro.org>
- might make a good basis for a TCG plugins blog post
Completed Reviews [2/2]
=======================
[PATCH] tests/plugin/syscall.c: fix compiler warnings
Message-Id: <20211128011551.2115468-1-juro.bystricky(a)intel.com>
[PATCH for-6.2? 0/2] arm_gicv3: Fix handling of LPIs in list registers
Message-Id: <20211126163915.1048353-2-peter.maydell(a)linaro.org>
Current Review Queue
====================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- Code review: worked through some of the backlog and accumulated
a list of series to take once the tree reopens for 7.0
- Wrote and sent some cleanup patches relating to the qemu-common.h
header file
- Fixed a bug where we miscalculated the length for TLB range
invalidations
* QEMU-420 [GICv4 emulation]
- Found the problem with PCI passthrough in my nested test setup:
apparently virtio PCI devices need an extra command line argument
to get them to honour the presence of an IOMMU. Everything is
now working and I've put some notes about the setup into
https://linaro.atlassian.net/browse/QEMU-447
- started to implement the GICv4 redistributor changes
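On the TLB range invalidation fix listed under UM-2: the report does not describe the bug itself. For background, the Arm architecture encodes the length of a range invalidation in the NUM, SCALE and TG fields of the TLBI operand. The Python sketch below computes that length under the assumed architectural formula (NUM + 1) * 2^(5 * SCALE + 1) * granule size. It is an illustration only, not QEMU's code, and `tlbi_range_length` is a hypothetical name.

```python
def tlbi_range_length(num: int, scale: int, tg: int) -> int:
    """Bytes covered by a TLBI range operation (illustrative sketch).

    Assumed encoding: TG = 1, 2, 3 selects a 4K, 16K or 64K granule
    (TG = 0 is reserved); NUM is a 5-bit field and SCALE a 2-bit field.
    """
    if tg not in (1, 2, 3):
        raise ValueError("TG value 0 is reserved")
    if not 0 <= num < 32 or not 0 <= scale < 4:
        raise ValueError("NUM is a 5-bit field, SCALE a 2-bit field")
    page_shift = 12 + 2 * (tg - 1)  # 12, 14 or 16 bits
    return (num + 1) << (5 * scale + 1 + page_shift)


# Smallest possible range with a 4K granule: two pages.
smallest = tlbi_range_length(num=0, scale=0, tg=1)
```

Getting either the field width of NUM or the granule-to-shift mapping wrong changes the result by a large factor, which is why such a length calculation is easy to get subtly wrong.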
-- PMM
VirtIO Initiative ([STR-9])
===========================
- [this week's sync], topics on AF_XDP, virtio-video and
virtio-watchdog
[upstream rust-vmm sync meeting]
<https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…>
QEMU Upstream Work ([UM-2])
===========================
- posted [PATCH for 6.2 v2 0/7] more tcg, plugin, test and build fixes
Message-Id: <20211125154144.2904741-1-alex.bennee(a)linaro.org>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v8 00/10] MTTCG sanity tests for ARM
Message-Id: <20211118184650.661575-1-alex.bennee(a)linaro.org>
[mttcg tests to current state and fixed up]
<https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8>
Other
=====
- renewal feedback
Completed Reviews [2/2]
=======================
[PATCH v2 0/3] KVM: qemu patches for few KVM features I developed
Message-Id: <20211101132300.192584-1-mlevitsk(a)redhat.com>
[PATCH v2] hw/intc/arm_gicv3: Update cached state after LPI state changes
Message-Id: <20211124202005.989935-1-peter.maydell(a)linaro.org>
Absences
========
- off 2 days sick
Current Review Queue
====================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
Progress:
* QEMU-420 [GICv4 emulation]
- Tracked down and fixed a bug in our ITS emulation which would
(intermittently?) result in a Linux guest reporting "irq 54:
nobody cared" and hanging, because we were not correctly
recalculating the highest priority pending interrupt when the
guest acknowledged a pending LPI. This fix will go into 6.2.
- Set up a test environment for GICv4 work -- because the major
feature of GICv4 is support for directly injecting interrupts
into a VM, the test setup needs to be nested virtualization,
where an outer L1 guest runs on pure emulated QEMU, the inner
L2 guest uses KVM (as provided by L1), and we pass a PCI device
(emulated by QEMU) through from L1 to L2. I think I have this
correctly set up now, but...
- ...the L2 guest hangs because it apparently never sees an
interrupt from the passed-through PCI device. This implies a
bug in our current GICv3 emulation somewhere: need to track
this down before starting in on GICv4 work.
- Separately, I found through code inspection a bug where we
do the wrong thing in the non-passthrough case when the L1 guest
sets a virtual interrupt for the L2 guest in the GIC list
registers and that interrupt has an ID > 1023 (ie it is an LPI).
We got this wrong both for acknowledging and ending an interrupt,
so the two bugs cancel each other out except that we don't set
the vCPU priority and so the L2 guest might get an unexpected
interrupt while it was servicing the LPI. Patches sent.
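The first bug above is a stale-cache problem: the emulation keeps track of the highest priority pending interrupt, and that value has to be recalculated whenever the pending set changes, including when the guest acknowledges an LPI. A toy Python model of that invariant follows. It is not QEMU's data structures; the class and method names are invented for illustration, and the tie-break rule is an arbitrary choice.

```python
from typing import Dict, Optional


class PendingLPIs:
    """Toy model: a set of pending LPIs plus a cached best entry."""

    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}  # INTID -> priority (lower value wins)
        self.hppi: Optional[int] = None    # cached INTID of the best pending LPI

    def _recalculate(self) -> None:
        # Lowest priority value wins; ties go to the lower INTID
        # (an arbitrary choice for this model).
        self.hppi = min(
            self.pending,
            key=lambda intid: (self.pending[intid], intid),
            default=None,
        )

    def set_pending(self, intid: int, priority: int) -> None:
        self.pending[intid] = priority
        self._recalculate()

    def acknowledge(self) -> Optional[int]:
        intid = self.hppi
        if intid is not None:
            del self.pending[intid]
            # Skipping this step would leave hppi pointing at an
            # interrupt that is no longer pending.
            self._recalculate()
        return intid


lpis = PendingLPIs()
lpis.set_pending(8192, 0xA0)
lpis.set_pending(8193, 0x20)
first = lpis.acknowledge()
second = lpis.acknowledge()
```

With the recalculation in place, `first` is the LPI with the numerically lower priority value and `second` is the remaining one; without it, the second acknowledge would return an interrupt that is no longer pending.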
-- PMM
VirtIO Initiative ([STR-9])
===========================
- posted Initial thoughts for test scenarios for AF_XDP epic
Message-Id: <87k0h5v6ju.fsf(a)linaro.org>
vhost-device maintainer effort ([UM-196])
- more review
- [did some more noodling with rust] to get comfortable with generics
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
[vhost-device crate] <https://github.com/rust-vmm/vhost-device>
[did some more noodling with rust]
<https://gitlab.com/stsquad/softfloat.rs>
QEMU Upstream Work ([UM-2])
===========================
- posted [PULL for 6.2 0/7] misc build and test fixes Message-Id:
<20211116162515.4100231-1-alex.bennee(a)linaro.org>
- posted [RFC PATCH] tests/avocado: fix tcg_plugin mem access count
test Message-Id: <20211117095448.136558-1-alex.bennee(a)linaro.org>
- posted Re: [RFC PATCH] plugins/meson.build: fix linker issue with
weird paths (for v6.2?) Message-Id:
<20211117111924.179776-1-alex.bennee(a)linaro.org>
- posted Re: [PATCH v2 1/3] icount: preserve cflags when custom tb is
about to execute Message-Id: <87h7cbw1tx.fsf(a)linaro.org>
- posted [RFC PATCH] gdbstub: handle a potentially racing TaskState
Message-Id: <20211119145124.942390-1-alex.bennee(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v3 0/3] GIC ITS tests Message-Id:
<20211112114734.3058678-1-alex.bennee(a)linaro.org>
- posted [kvm-unit-tests PATCH v8 00/10] MTTCG sanity tests for ARM
Message-Id: <20211118184650.661575-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
[mttcg tests to current state and fixed up]
<https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8>
Completed Reviews [2/2]
=======================
[PATCH v2 0/3] Some watchpoint-related patches
Message-Id: <163662450348.125458.5494710452733592356.stgit@pasha-ThinkPad-X280>
[PATCH 0/5] Update linux-headers + NOIRQ support for KVM gdbstub
Message-Id: <20211111110604.207376-1-pbonzini(a)redhat.com>
Absences
========
- none
Current Review Queue
====================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
Progress (short week, 3 days)
* UM-2 [QEMU upstream maintainership]
- Still trying to sort out the regression of booting EL3 guest
code on the imx7 board. I got most of the way through prototyping
a cleanup which would fix this, but then spotted that the
highbank board has a more awkward-to-fix similar problem.
We're going to revert the PSCI emulation change for 6.2 so we
can take the time to get the cleanup right and land it in 7.0.
- Usual patch accumulation, review, etc during release cycle
-- PMM
VirtIO Initiative ([STR-9])
===========================
- project admin
[STR-9] <https://linaro.atlassian.net/browse/STR-9>
[upstream rust-vmm sync meeting]
<https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…>
[proposal] <https://github.com/rust-vmm/vhost-device/pull/57>
vhost-device maintainer effort ([UM-196])
- did a bunch of review on [vhost-device crate]
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
[vhost-device crate] <https://github.com/rust-vmm/vhost-device>
QEMU Upstream Work ([UM-2])
===========================
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v3 0/3] GIC ITS tests Message-Id:
<20211112114734.3058678-1-alex.bennee(a)linaro.org>
- might as well flush the tree state as I left it
- posted [RFC PATCH] hw/intc: clean-up error reporting for failed
ITS cmd Message-Id:
<20211112170454.3158925-1-alex.bennee(a)linaro.org>
- re-based [mttcg tests to current state and fixed up]
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
[mttcg tests to current state and fixed up]
<https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8>
Completed Reviews [2/2]
=======================
[PATCH v2 0/3] Some watchpoint-related patches
Message-Id: <163662450348.125458.5494710452733592356.stgit@pasha-ThinkPad-X280>
[PATCH 0/5] Update linux-headers + NOIRQ support for KVM gdbstub
Message-Id: <20211111110604.207376-1-pbonzini(a)redhat.com>
Absences
========
- none
Current Review Queue
====================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
TODO [PATCH] softmmu: fix watchpoint processing in icount mode
Message-Id: <163101424137.678744.18360776310711795413.stgit@pasha-ThinkPad-X280>
==============================================================================================================================================
--
Alex Bennée
Progress (short week, 3 days)
* UM-2 [QEMU upstream maintainership]
- recent changes to QEMU's PSCI emulation broke booting of guest code
at EL3 on the imx7 board, which was previously accidentally
relying on PSCI-emulation-via-SMC not getting in its way despite
being enabled. We need to make this board disable PSCI when the
guest code is booting to EL3, as the virt board does, but it's
trickier here because the CPU-creation code is hidden inside a
model of an SoC object. After some on-list discussion I have a
plan for how to restructure this, and need to write some code...
* QEMU-420 [GICv4 emulation]
- re-read the GIC architecture specification, acquired a better
understanding of the required work, and broke this epic down into
stories
- discussed with Leif how the ITS support should be landed in the
sbsa-ref board
Misc:
* higher-than-usual number of meetings and meeting-prep this week
-- PMM
After llvm commit f411c1dd95092139c8b992260705ac0b75c8583f
Author: Peter Klausler <pklausler(a)nvidia.com>
[flang] Fix crash in semantic error recovery situation
the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 3% from 7600 to 7806 perf samples
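As a rough check of the figure above, the slowdown can be recomputed from the two perf-sample counts. This is only a sketch: the helper name `slowdown_percent` is hypothetical, and rounding to the nearest percent is an assumption about how the CI formats its numbers, not something the report states.

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Sample counts taken from the 456.hmmer line of the report.
pct = slowdown_percent(7600, 7806)

# Assumed presentation: round to the nearest whole percent.
print(f"{pct:.2f}% slowdown, reported as {round(pct)}%")

# The report only lists benchmarks that slowed down by more than 2%.
print("exceeds 2% threshold:", pct > 2.0)
```

The unrounded value is a little under 3%, which still clears the 2% reporting threshold quoted in the email.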
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-f411c1dd95092139c8b992260705ac0b75c8583f
cd investigate-llvm-f411c1dd95092139c8b992260705ac0b75c8583f
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach f411c1dd95092139c8b992260705ac0b75c8583f
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c0b298fc213c1b33e97ca72fba58597365375875
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f411c1dd95092139c8b992260705ac0b75c8583f
Author: Peter Klausler <pklausler(a)nvidia.com>
Date: Tue Nov 2 16:41:15 2021 -0700
[flang] Fix crash in semantic error recovery situation
A CHECK() in semantics is triggering when analyzing a program
with an undefined derived type pointer because the CHECK is
expecting a new error message to have been issued in a function
but not allowing for the case that a diagnostic could have been
produced earlier. Adjust the predicate.
Differential Revision: https://reviews.llvm.org/D113307
---
flang/lib/Semantics/expression.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/flang/lib/Semantics/expression.cpp b/flang/lib/Semantics/expression.cpp
index 331b9b2cf5bc..8ee8c9a9c9ce 100644
--- a/flang/lib/Semantics/expression.cpp
+++ b/flang/lib/Semantics/expression.cpp
@@ -1916,7 +1916,7 @@ auto ExpressionAnalyzer::AnalyzeProcedureComponentRef(
"Base of procedure component reference is not a derived-type object"_err_en_US);
}
}
- CHECK(!GetContextualMessages().empty());
+ CHECK(context_.AnyFatalError());
return std::nullopt;
}
</cut>
Hello,
We have been using Linaro GCC 7.5-2019.12 for the A53.
As we move on to new tech there seems to be no support for
"-mcpu=cortex-a55".
Today, we use the aarch64-elf- toolchain.
What GCC do you suggest we start using for A55 ?
Thanks,
Stefan
VirtIO Initiative ([STR-9])
===========================
- various rust-vmm discussions
- [upstream rust-vmm sync meeting]
- how to deal with vhost-device/vm-virtio split: [proposal]
- synced with ARM on their interests
- got update on Fwd: FW: [App-services] Slides from the
hypervisor-less virtio status meeting Message-Id:
<CAHDbmO2G4hUyfxtaxwnbxsrMk+P41zbL-7VNe=Aa6DshxC-5zQ(a)mail.gmail.com>
[STR-9] <https://linaro.atlassian.net/browse/STR-9>
[upstream rust-vmm sync meeting]
<https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…>
[proposal] <https://github.com/rust-vmm/vhost-device/pull/57>
QEMU Upstream Work ([UM-2])
===========================
- did some bug triage and investigated [555] and [690] which might
intersect with earlier changes I made
- spent time on the PR from hell [PULL 00/30] testing, gdbstub and
semihosting Message-Id:
<20210115130828.23968-1-alex.bennee(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
[555] <https://gitlab.com/qemu-project/qemu/-/issues/555>
[690] <https://gitlab.com/qemu-project/qemu/-/issues/690>
Other
=====
- TSC report preparation for QEMU and Stratos
Completed Reviews [1/1]
=======================
[XEN PATCH v7 00/51] xen: Build system improvements, now with out-of-tree build!
Message-Id: <20210824105038.1257926-1-anthony.perard(a)citrix.com>
Absences
========
,----
| (save-excursion
| (goto-char (point-min))
| (when (re-search-forward "* Absences")
| (goto-char (match-beginning 0))
| (org-export-as 'ascii t nil t )))
`----
Current Review Queue
====================
TODO [PATCH v2 00/48] tcg: optimize redundant sign extensions
Message-Id: <20211007195456.1168070-1-richard.henderson(a)linaro.org>
================================================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
Progress
* UM-2 [QEMU upstream maintainership]
+ worked through the big pile of email that had built up while
I was on holiday...
+ some long-delayed sysadmin tasks on my work machines now that I have
an opportunity to go into the office and do things that would be
too risky with only remote access
+ triaged a bunch of Coverity issues
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ All work here has now gone upstream; closed!
-- PMM
After llvm commit fbc0c308d599fe3300ab6516650b65b41979446d
Author: Nikita Popov <nikita.ppv(a)gmail.com>
[BasicAA] Handle known bits as ranges
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 10899 to 11610 perf samples
- 464.h264ref:libc.so.6 slowed down by 11% from 3538 to 3922 perf samples
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-fbc0c308d599fe3300ab6516650b65b41979446d
cd investigate-llvm-fbc0c308d599fe3300ab6516650b65b41979446d
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach fbc0c308d599fe3300ab6516650b65b41979446d
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 30a3652b6ade43504087f6e3acd8dc879055f501
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit fbc0c308d599fe3300ab6516650b65b41979446d
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date: Mon Oct 25 15:47:21 2021 +0200
[BasicAA] Handle known bits as ranges
BasicAA currently tries to determine that the offset is positive by
checking whether all variable indices are positive based on known
bits, multiplied by a positive scale. However, this is incorrect
if the scale multiplication might overflow. In the modified test
case the original value is positive, but may be negative after a
left shift.
Fix this by converting known bits into a constant range and reusing
the range-based logic, which handles overflow correctly.
Differential Revision: https://reviews.llvm.org/D112611
---
llvm/lib/Analysis/BasicAliasAnalysis.cpp | 51 +++++-----------------
.../test/Analysis/BasicAA/assume-index-positive.ll | 4 +-
2 files changed, 12 insertions(+), 43 deletions(-)
diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index 0305732ca5d5..8cf947c43bf4 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -318,15 +318,6 @@ struct CastedValue {
return N;
}
- KnownBits evaluateWith(KnownBits N) const {
- assert(N.getBitWidth() == V->getType()->getPrimitiveSizeInBits() &&
- "Incompatible bit width");
- if (TruncBits) N = N.trunc(N.getBitWidth() - TruncBits);
- if (SExtBits) N = N.sext(N.getBitWidth() + SExtBits);
- if (ZExtBits) N = N.zext(N.getBitWidth() + ZExtBits);
- return N;
- }
-
ConstantRange evaluateWith(ConstantRange N) const {
assert(N.getBitWidth() == V->getType()->getPrimitiveSizeInBits() &&
"Incompatible bit width");
@@ -1250,8 +1241,6 @@ AliasResult BasicAAResult::aliasGEP(
if (!DecompGEP1.VarIndices.empty()) {
APInt GCD;
- bool AllNonNegative = DecompGEP1.Offset.isNonNegative();
- bool AllNonPositive = DecompGEP1.Offset.isNonPositive();
ConstantRange OffsetRange = ConstantRange(DecompGEP1.Offset);
for (unsigned i = 0, e = DecompGEP1.VarIndices.size(); i != e; ++i) {
const VariableGEPIndex &Index = DecompGEP1.VarIndices[i];
@@ -1266,24 +1255,19 @@ AliasResult BasicAAResult::aliasGEP(
else
GCD = APIntOps::GreatestCommonDivisor(GCD, ScaleForGCD.abs());
- if (AllNonNegative || AllNonPositive) {
- KnownBits Known = Index.Val.evaluateWith(
- computeKnownBits(Index.Val.V, DL, 0, &AC, Index.CxtI, DT));
- bool SignKnownZero = Known.isNonNegative();
- bool SignKnownOne = Known.isNegative();
- AllNonNegative &= (SignKnownZero && Scale.isNonNegative()) ||
- (SignKnownOne && Scale.isNonPositive());
- AllNonPositive &= (SignKnownZero && Scale.isNonPositive()) ||
- (SignKnownOne && Scale.isNonNegative());
- }
+ ConstantRange CR =
+ computeConstantRange(Index.Val.V, true, &AC, Index.CxtI);
+ KnownBits Known =
+ computeKnownBits(Index.Val.V, DL, 0, &AC, Index.CxtI, DT);
+ CR = CR.intersectWith(
+ ConstantRange::fromKnownBits(Known, /* Signed */ true),
+ ConstantRange::Signed);
assert(OffsetRange.getBitWidth() == Scale.getBitWidth() &&
"Bit widths are normalized to MaxPointerSize");
- OffsetRange = OffsetRange.add(Index.Val
- .evaluateWith(computeConstantRange(
- Index.Val.V, true, &AC, Index.CxtI))
- .sextOrTrunc(OffsetRange.getBitWidth())
- .smul_fast(ConstantRange(Scale)));
+ OffsetRange = OffsetRange.add(
+ Index.Val.evaluateWith(CR).sextOrTrunc(OffsetRange.getBitWidth())
+ .smul_fast(ConstantRange(Scale)));
}
// We now have accesses at two offsets from the same base:
@@ -1300,21 +1284,6 @@ AliasResult BasicAAResult::aliasGEP(
(GCD - ModOffset).uge(V1Size.getValue()))
return AliasResult::NoAlias;
- // If we know all the variables are non-negative, then the total offset is
- // also non-negative and >= DecompGEP1.Offset. We have the following layout:
- // [0, V2Size) ... [TotalOffset, TotalOffer+V1Size]
- // If DecompGEP1.Offset >= V2Size, the accesses don't alias.
- if (AllNonNegative && V2Size.hasValue() &&
- DecompGEP1.Offset.uge(V2Size.getValue()))
- return AliasResult::NoAlias;
- // Similarly, if the variables are non-positive, then the total offset is
- // also non-positive and <= DecompGEP1.Offset. We have the following layout:
- // [TotalOffset, TotalOffset+V1Size) ... [0, V2Size)
- // If -DecompGEP1.Offset >= V1Size, the accesses don't alias.
- if (AllNonPositive && V1Size.hasValue() &&
- (-DecompGEP1.Offset).uge(V1Size.getValue()))
- return AliasResult::NoAlias;
-
if (V1Size.hasValue() && V2Size.hasValue()) {
// Compute ranges of potentially accessed bytes for both accesses. If the
// interseciton is empty, there can be no overlap.
diff --git a/llvm/test/Analysis/BasicAA/assume-index-positive.ll b/llvm/test/Analysis/BasicAA/assume-index-positive.ll
index 451592067f4b..a53fff2c6009 100644
--- a/llvm/test/Analysis/BasicAA/assume-index-positive.ll
+++ b/llvm/test/Analysis/BasicAA/assume-index-positive.ll
@@ -130,12 +130,12 @@ define void @symmetry([0 x i8]* %ptr, i32 %a, i32 %b, i32 %c) {
ret void
}
-; TODO: %ptr.neg and %ptr.shl may alias, as the shl renders the previously
+; %ptr.neg and %ptr.shl may alias, as the shl renders the previously
; non-negative value potentially negative.
define void @shl_of_non_negative(i8* %ptr, i64 %a) {
; CHECK-LABEL: Function: shl_of_non_negative
; CHECK: NoAlias: i8* %ptr.a, i8* %ptr.neg
-; CHECK: NoAlias: i8* %ptr.neg, i8* %ptr.shl
+; CHECK: MayAlias: i8* %ptr.neg, i8* %ptr.shl
%a.cmp = icmp sge i64 %a, 0
call void @llvm.assume(i1 %a.cmp)
%ptr.neg = getelementptr i8, i8* %ptr, i64 -2
</cut>
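The overflow that the "[BasicAA] Handle known bits as ranges" commit message above describes (a value known to be non-negative may become negative after a left shift) can be illustrated with a small sketch. It simulates 64-bit two's-complement arithmetic in Python; the helper names `to_signed` and `shl`, the chosen input value, and the shift amount of 1 are all assumptions made for illustration and are not taken from the LLVM test case.

```python
BITS = 64
MASK = (1 << BITS) - 1


def to_signed(value: int) -> int:
    """Interpret the low 64 bits of value as a signed two's-complement i64."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def shl(value: int, amount: int) -> int:
    """Left shift with i64 wrap-around (hypothetical helper)."""
    return to_signed(value << amount)


a = 1 << 62          # non-negative, so "icmp sge a, 0" would hold
shifted = shl(a, 1)  # bit 63 becomes set, so the i64 result is negative

print("a is non-negative:", a >= 0)
print("a << 1 as i64 is negative:", shifted < 0)
```

Reasoning about the sign of the index alone is therefore not enough once a scale or shift is applied, which is why the commit switches to the range-based logic.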
After llvm commit adf55ac6657693f7bfbe3087b599b4031a765a44
Author: Lang Hames <lhames(a)gmail.com>
[ORC] Call ExecutorProcessControl::disconnect in unit tests that require it.
the following hot functions slowed down by more than 10% (but their benchmarks slowed down by less than 2%):
- 400.perlbench:[.] S_find_byclass slowed down by 12% from 644 to 721 perf samples
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-adf55ac6657693f7bfbe3087b599b4031a765a44
cd investigate-llvm-adf55ac6657693f7bfbe3087b599b4031a765a44
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach adf55ac6657693f7bfbe3087b599b4031a765a44
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach f526ee5b8517b60620cd03bb3e5945ed69d6bfaa
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit adf55ac6657693f7bfbe3087b599b4031a765a44
Author: Lang Hames <lhames(a)gmail.com>
Date: Tue Oct 12 14:55:49 2021 -0700
[ORC] Call ExecutorProcessControl::disconnect in unit tests that require it.
Another follow-up to 2815ed57e3c and 19b4e3cfc6a. For unit tests that don't use
an ExecutionSession we need to call ExecutorProcessControl::disconnect directly
to wait for the dispatcher to shut down.
https://llvm.org/PR52153
---
.../ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp | 2 ++
llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp | 2 ++
2 files changed, 4 insertions(+)
diff --git a/llvm/unittests/ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp b/llvm/unittests/ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp
index f2b157e424b6..a95435aec2a3 100644
--- a/llvm/unittests/ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp
+++ b/llvm/unittests/ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp
@@ -134,6 +134,8 @@ TEST(EPCGenericJITLinkMemoryManagerTest, AllocFinalizeFree) {
auto Err2 = MemMgr->deallocate(std::move(*FA));
EXPECT_THAT_ERROR(std::move(Err2), Succeeded());
+
+ cantFail(SelfEPC->disconnect());
}
} // namespace
diff --git a/llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp b/llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp
index 78024644ca8b..beb0fefa094a 100644
--- a/llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp
+++ b/llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp
@@ -93,6 +93,8 @@ TEST(EPCGenericMemoryAccessTest, MemWrites) {
{{pointerToJITTargetAddress(&Test_Buffer), TestMsg}});
EXPECT_THAT_ERROR(std::move(Err5), Succeeded());
EXPECT_EQ(StringRef(Test_Buffer, TestMsg.size()), TestMsg);
+
+ cantFail(SelfEPC->disconnect());
}
} // namespace
</cut>
== This Week ==
* GCC
- Committed a clean up patch to gimple-isel
- PR93183: Committed fix
- PR102376: Patch approved upstream
- PR83750: Patch approved upstream but it regresses one test-case.
== Next Week ==
- Continue with ongoing tasks
After llvm commit bc69dd62c04a70d29943c1c06c7effed150b70e1
Author: Alexey Bataev <a.bataev(a)outlook.com>
[SLP]Improve graph reordering.
the following benchmarks grew in size by more than 1%:
- 444.namd grew in size by 2% from 192302 to 195218 bytes
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os
- Hardware: APM Mustang 8x X-Gene1
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-bc69dd62c04a70d29943c1c06c7effed150b70e1
cd investigate-llvm-bc69dd62c04a70d29943c1c06c7effed150b70e1
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach bc69dd62c04a70d29943c1c06c7effed150b70e1
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5661317f864abf750cf893c6a4cc7a977be0995a
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit bc69dd62c04a70d29943c1c06c7effed150b70e1
Author: Alexey Bataev <a.bataev(a)outlook.com>
Date: Tue Aug 3 13:20:32 2021 -0700
[SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.
The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in place; it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.
The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. It then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with a larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and their users too.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle, because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.
Also, the patch improves the cost model for gathering of loads, which improves
the x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.
Differential Revision: https://reviews.llvm.org/D105020
---
.../llvm/Transforms/Vectorize/SLPVectorizer.h | 3 +-
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | 1364 ++++++++++++++------
.../AArch64/transpose-inseltpoison.ll | 84 +-
.../Transforms/SLPVectorizer/AArch64/transpose.ll | 84 +-
llvm/test/Transforms/SLPVectorizer/X86/addsub.ll | 42 +-
.../Transforms/SLPVectorizer/X86/crash_cmpop.ll | 6 +-
llvm/test/Transforms/SLPVectorizer/X86/extract.ll | 6 +-
.../SLPVectorizer/X86/jumbled-load-multiuse.ll | 12 +-
.../Transforms/SLPVectorizer/X86/jumbled-load.ll | 22 +-
.../SLPVectorizer/X86/jumbled_store_crash.ll | 29 +-
.../SLPVectorizer/X86/reorder_repeated_ops.ll | 4 +-
.../SLPVectorizer/X86/split-load8_2-unord.ll | 4 +-
.../X86/vectorize-reorder-alt-shuffle.ll | 9 +-
.../SLPVectorizer/X86/vectorize-reorder-reuse.ll | 52 +-
14 files changed, 1119 insertions(+), 602 deletions(-)
diff --git a/llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h b/llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
index f416a592d683..5e8c29913cad 100644
--- a/llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
+++ b/llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
@@ -95,8 +95,7 @@ private:
/// Try to vectorize a list of operands.
/// \returns true if a value was vectorized.
- bool tryToVectorizeList(ArrayRef<Value *> VL, slpvectorizer::BoUpSLP &R,
- bool AllowReorder = false);
+ bool tryToVectorizeList(ArrayRef<Value *> VL, slpvectorizer::BoUpSLP &R);
/// Try to vectorize a chain that may start at the operands of \p I.
bool tryToVectorize(Instruction *I, slpvectorizer::BoUpSLP &R);
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 9c0029484964..7400b3d8a503 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -21,6 +21,7 @@
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/PostOrderIterator.h"
+#include "llvm/ADT/PriorityQueue.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetOperations.h"
#include "llvm/ADT/SetVector.h"
@@ -535,13 +536,68 @@ static bool isSimple(Instruction *I) {
return true;
}
+/// Shuffles \p Mask in accordance with the given \p SubMask.
+static void addMask(SmallVectorImpl<int> &Mask, ArrayRef<int> SubMask) {
+ if (SubMask.empty())
+ return;
+ if (Mask.empty()) {
+ Mask.append(SubMask.begin(), SubMask.end());
+ return;
+ }
+ SmallVector<int> NewMask(SubMask.size(), UndefMaskElem);
+ int TermValue = std::min(Mask.size(), SubMask.size());
+ for (int I = 0, E = SubMask.size(); I < E; ++I) {
+ if (SubMask[I] >= TermValue || SubMask[I] == UndefMaskElem ||
+ Mask[SubMask[I]] >= TermValue)
+ continue;
+ NewMask[I] = Mask[SubMask[I]];
+ }
+ Mask.swap(NewMask);
+}
+
+/// Order may have elements assigned a special value (size) which is out of
+/// bounds. Such indices only appear in places which correspond to undef values
+/// (see canReuseExtract for details) and are used to keep undef values from
+/// affecting the operand ordering.
+/// The first loop below simply finds all unused indices and then the next loop
+/// nest assigns these indices to the undef value positions.
+/// As an example below Order has two undef positions and they have assigned
+/// values 3 and 7 respectively:
+/// before: 6 9 5 4 9 2 1 0
+/// after: 6 3 5 4 7 2 1 0
+/// The ordering is fixed up in place.
+static void fixupOrderingIndices(SmallVectorImpl<unsigned> &Order) {
+ const unsigned Sz = Order.size();
+ SmallBitVector UsedIndices(Sz);
+ SmallVector<int> MaskedIndices;
+ for (unsigned I = 0; I < Sz; ++I) {
+ if (Order[I] < Sz)
+ UsedIndices.set(Order[I]);
+ else
+ MaskedIndices.push_back(I);
+ }
+ if (MaskedIndices.empty())
+ return;
+ SmallVector<int> AvailableIndices(MaskedIndices.size());
+ unsigned Cnt = 0;
+ int Idx = UsedIndices.find_first();
+ do {
+ AvailableIndices[Cnt] = Idx;
+ Idx = UsedIndices.find_next(Idx);
+ ++Cnt;
+ } while (Idx > 0);
+ assert(Cnt == MaskedIndices.size() && "Non-synced masked/available indices.");
+ for (int I = 0, E = MaskedIndices.size(); I < E; ++I)
+ Order[MaskedIndices[I]] = AvailableIndices[I];
+}
+
namespace llvm {
static void inversePermutation(ArrayRef<unsigned> Indices,
SmallVectorImpl<int> &Mask) {
Mask.clear();
const unsigned E = Indices.size();
- Mask.resize(E, E + 1);
+ Mask.resize(E, UndefMaskElem);
for (unsigned I = 0; I < E; ++I)
Mask[Indices[I]] = I;
}
@@ -581,6 +637,22 @@ static Optional<int> getInsertIndex(Value *InsertInst, unsigned Offset) {
return Index;
}
+/// Reorders the list of scalars in accordance with the given \p Mask: the
+/// scalar at position I is moved to position Mask[I]. Positions whose mask
+/// element is UndefMaskElem are skipped, and destination lanes that receive
+/// no scalar are left as UndefValue of the type of the first original
+/// scalar.
+static void reorderScalars(SmallVectorImpl<Value *> &Scalars,
+ ArrayRef<int> Mask) {
+ assert(!Mask.empty() && "Expected non-empty mask.");
+ SmallVector<Value *> Prev(Scalars.size(),
+ UndefValue::get(Scalars.front()->getType()));
+ Prev.swap(Scalars);
+ for (unsigned I = 0, E = Prev.size(); I < E; ++I)
+ if (Mask[I] != UndefMaskElem)
+ Scalars[Mask[I]] = Prev[I];
+}
+
namespace slpvectorizer {
/// Bottom Up SLP Vectorizer.
@@ -645,13 +717,12 @@ public:
void buildTree(ArrayRef<Value *> Roots,
ArrayRef<Value *> UserIgnoreLst = None);
- /// Construct a vectorizable tree that starts at \p Roots, ignoring users for
- /// the purpose of scheduling and extraction in the \p UserIgnoreLst taking
- /// into account (and updating it, if required) list of externally used
- /// values stored in \p ExternallyUsedValues.
- void buildTree(ArrayRef<Value *> Roots,
- ExtraValueToDebugLocsMap &ExternallyUsedValues,
- ArrayRef<Value *> UserIgnoreLst = None);
+ /// Builds external uses of the vectorized scalars, i.e. the list of
+ /// vectorized scalars to be extracted, their lanes and their scalar users. \p
+ /// ExternallyUsedValues contains an additional list of external uses to
+ /// handle vectorization of reductions.
+ void
+ buildExternalUses(const ExtraValueToDebugLocsMap &ExternallyUsedValues = {});
/// Clear the internal data structures that are created by 'buildTree'.
void deleteTree() {
@@ -659,8 +730,6 @@ public:
ScalarToTreeEntry.clear();
MustGather.clear();
ExternalUses.clear();
- NumOpsWantToKeepOrder.clear();
- NumOpsWantToKeepOriginalOrder = 0;
for (auto &Iter : BlocksSchedules) {
BlockScheduling *BS = Iter.second.get();
BS->clear();
@@ -674,103 +743,22 @@ public:
/// Perform LICM and CSE on the newly generated gather sequences.
void optimizeGatherSequence();
- /// \returns The best order of instructions for vectorization.
- Optional<ArrayRef<unsigned>> bestOrder() const {
- assert(llvm::all_of(
- NumOpsWantToKeepOrder,
- [this](const decltype(NumOpsWantToKeepOrder)::value_type &D) {
- return D.getFirst().size() ==
- VectorizableTree[0]->Scalars.size();
- }) &&
- "All orders must have the same size as number of instructions in "
- "tree node.");
- auto I = std::max_element(
- NumOpsWantToKeepOrder.begin(), NumOpsWantToKeepOrder.end(),
- [](const decltype(NumOpsWantToKeepOrder)::value_type &D1,
- const decltype(NumOpsWantToKeepOrder)::value_type &D2) {
- return D1.second < D2.second;
- });
- if (I == NumOpsWantToKeepOrder.end() ||
- I->getSecond() <= NumOpsWantToKeepOriginalOrder)
- return None;
-
- return makeArrayRef(I->getFirst());
- }
-
- /// Builds the correct order for root instructions.
- /// If some leaves have the same instructions to be vectorized, we may
- /// incorrectly evaluate the best order for the root node (it is built for the
- /// vector of instructions without repeated instructions and, thus, has less
- /// elements than the root node). This function builds the correct order for
- /// the root node.
- /// For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
- /// are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
- /// leaf, it will be shrink to \<a, b\>. If instructions in this leaf should
- /// be reordered, the best order will be \<1, 0\>. We need to extend this
- /// order for the root node. For the root node this order should look like
- /// \<3, 0, 1, 2\>. This function extends the order for the reused
- /// instructions.
- void findRootOrder(OrdersType &Order) {
- // If the leaf has the same number of instructions to vectorize as the root
- // - order must be set already.
- unsigned RootSize = VectorizableTree[0]->Scalars.size();
- if (Order.size() == RootSize)
- return;
- SmallVector<unsigned, 4> RealOrder(Order.size());
- std::swap(Order, RealOrder);
- SmallVector<int, 4> Mask;
- inversePermutation(RealOrder, Mask);
- Order.assign(Mask.begin(), Mask.end());
- // The leaf has less number of instructions - need to find the true order of
- // the root.
- // Scan the nodes starting from the leaf back to the root.
- const TreeEntry *PNode = VectorizableTree.back().get();
- SmallVector<const TreeEntry *, 4> Nodes(1, PNode);
- SmallPtrSet<const TreeEntry *, 4> Visited;
- while (!Nodes.empty() && Order.size() != RootSize) {
- const TreeEntry *PNode = Nodes.pop_back_val();
- if (!Visited.insert(PNode).second)
- continue;
- const TreeEntry &Node = *PNode;
- for (const EdgeInfo &EI : Node.UserTreeIndices)
- if (EI.UserTE)
- Nodes.push_back(EI.UserTE);
- if (Node.ReuseShuffleIndices.empty())
- continue;
- // Build the order for the parent node.
- OrdersType NewOrder(Node.ReuseShuffleIndices.size(), RootSize);
- SmallVector<unsigned, 4> OrderCounter(Order.size(), 0);
- // The algorithm of the order extension is:
- // 1. Calculate the number of the same instructions for the order.
- // 2. Calculate the index of the new order: total number of instructions
- // with order less than the order of the current instruction + reuse
- // number of the current instruction.
- // 3. The new order is just the index of the instruction in the original
- // vector of the instructions.
- for (unsigned I : Node.ReuseShuffleIndices)
- ++OrderCounter[Order[I]];
- SmallVector<unsigned, 4> CurrentCounter(Order.size(), 0);
- for (unsigned I = 0, E = Node.ReuseShuffleIndices.size(); I < E; ++I) {
- unsigned ReusedIdx = Node.ReuseShuffleIndices[I];
- unsigned OrderIdx = Order[ReusedIdx];
- unsigned NewIdx = 0;
- for (unsigned J = 0; J < OrderIdx; ++J)
- NewIdx += OrderCounter[J];
- NewIdx += CurrentCounter[OrderIdx];
- ++CurrentCounter[OrderIdx];
- assert(NewOrder[NewIdx] == RootSize &&
- "The order index should not be written already.");
- NewOrder[NewIdx] = I;
- }
- std::swap(Order, NewOrder);
- }
- assert(Order.size() == RootSize &&
- "Root node is expected or the size of the order must be the same as "
- "the number of elements in the root node.");
- assert(llvm::all_of(Order,
- [RootSize](unsigned Val) { return Val != RootSize; }) &&
- "All indices must be initialized");
- }
+ /// Reorders the current graph to the most profitable order starting from the
+ /// root node to the leaf nodes. The best order is chosen only from the nodes
+ /// of the same size (vectorization factor). Smaller nodes are considered
+ /// parts of a subgraph with smaller VF and they are reordered independently.
+ /// We can do this because we still need to extend smaller nodes to the wider
+ /// VF and we can merge reordering shuffles with the widening shuffles.
+ void reorderTopToBottom();
+
+ /// Reorders the current graph to the most profitable order starting from
+ /// leaves to the root. It allows rotating small subgraphs and reduces the
+ /// number of reshuffles if the leaf nodes use the same order. In this case we
+ /// can merge the orders and just shuffle the user node instead of shuffling
+ /// its operands. Plus, even if the leaf nodes have different orders, it allows
+ /// sinking reordering in the graph closer to the root node and merging it
+ /// later during analysis.
+ void reorderBottomToTop();
/// \return The vector element size in bits to use when vectorizing the
/// expression tree ending at \p V. If V is a store, the size is the width of
@@ -793,6 +781,10 @@ public:
return MinVecRegSize;
}
+ unsigned getMinVF(unsigned Sz) const {
+ return std::max(2U, getMinVecRegSize() / Sz);
+ }
+
unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
unsigned MaxVF = MaxVFOption.getNumOccurrences() ?
MaxVFOption : TTI->getMaximumVF(ElemWidth, Opcode);
@@ -1621,12 +1613,29 @@ private:
/// \returns true if the scalars in VL are equal to this entry.
bool isSame(ArrayRef<Value *> VL) const {
- if (VL.size() == Scalars.size())
- return std::equal(VL.begin(), VL.end(), Scalars.begin());
- return VL.size() == ReuseShuffleIndices.size() &&
- std::equal(
- VL.begin(), VL.end(), ReuseShuffleIndices.begin(),
- [this](Value *V, int Idx) { return V == Scalars[Idx]; });
+ auto &&IsSame = [VL](ArrayRef<Value *> Scalars, ArrayRef<int> Mask) {
+ if (Mask.size() != VL.size() && VL.size() == Scalars.size())
+ return std::equal(VL.begin(), VL.end(), Scalars.begin());
+ return VL.size() == Mask.size() &&
+ std::equal(
+ VL.begin(), VL.end(), Mask.begin(),
+ [Scalars](Value *V, int Idx) { return V == Scalars[Idx]; });
+ };
+ if (!ReorderIndices.empty()) {
+ // TODO: implement matching if the nodes are just reordered, still can
+ // treat the vector as the same if the list of scalars matches VL
+ // directly, without reordering.
+ SmallVector<int> Mask;
+ inversePermutation(ReorderIndices, Mask);
+ if (VL.size() == Scalars.size())
+ return IsSame(Scalars, Mask);
+ if (VL.size() == ReuseShuffleIndices.size()) {
+ ::addMask(Mask, ReuseShuffleIndices);
+ return IsSame(Scalars, Mask);
+ }
+ return false;
+ }
+ return IsSame(Scalars, ReuseShuffleIndices);
}
/// A vector of scalars.
@@ -1701,6 +1710,12 @@ private:
}
}
+ /// Reorders operands of the node to the given mask \p Mask.
+ void reorderOperands(ArrayRef<int> Mask) {
+ for (ValueList &Operand : Operands)
+ reorderScalars(Operand, Mask);
+ }
+
/// \returns the \p OpIdx operand of this TreeEntry.
ValueList &getOperand(unsigned OpIdx) {
assert(OpIdx < Operands.size() && "Off bounds");
@@ -1760,19 +1775,14 @@ private:
return AltOp ? AltOp->getOpcode() : 0;
}
- /// Update operations state of this entry if reorder occurred.
- bool updateStateIfReorder() {
- if (ReorderIndices.empty())
- return false;
- InstructionsState S = getSameOpcode(Scalars, ReorderIndices.front());
- setOperations(S);
- return true;
- }
- /// When ReuseShuffleIndices is empty it just returns position of \p V
- /// within vector of Scalars. Otherwise, try to remap on its reuse index.
+ /// When ReuseReorderShuffleIndices is empty it just returns position of \p
+ /// V within vector of Scalars. Otherwise, try to remap on its reuse index.
int findLaneForValue(Value *V) const {
unsigned FoundLane = std::distance(Scalars.begin(), find(Scalars, V));
assert(FoundLane < Scalars.size() && "Couldn't find extract lane");
+ if (!ReorderIndices.empty())
+ FoundLane = ReorderIndices[FoundLane];
+ assert(FoundLane < Scalars.size() && "Couldn't find extract lane");
if (!ReuseShuffleIndices.empty()) {
FoundLane = std::distance(ReuseShuffleIndices.begin(),
find(ReuseShuffleIndices, FoundLane));
@@ -1856,7 +1866,7 @@ private:
TreeEntry *newTreeEntry(ArrayRef<Value *> VL, Optional<ScheduleData *> Bundle,
const InstructionsState &S,
const EdgeInfo &UserTreeIdx,
- ArrayRef<unsigned> ReuseShuffleIndices = None,
+ ArrayRef<int> ReuseShuffleIndices = None,
ArrayRef<unsigned> ReorderIndices = None) {
TreeEntry::EntryState EntryState =
Bundle ? TreeEntry::Vectorize : TreeEntry::NeedToGather;
@@ -1869,7 +1879,7 @@ private:
Optional<ScheduleData *> Bundle,
const InstructionsState &S,
const EdgeInfo &UserTreeIdx,
- ArrayRef<unsigned> ReuseShuffleIndices = None,
+ ArrayRef<int> ReuseShuffleIndices = None,
ArrayRef<unsigned> ReorderIndices = None) {
assert(((!Bundle && EntryState == TreeEntry::NeedToGather) ||
(Bundle && EntryState != TreeEntry::NeedToGather)) &&
@@ -1877,12 +1887,25 @@ private:
VectorizableTree.push_back(std::make_unique<TreeEntry>(VectorizableTree));
TreeEntry *Last = VectorizableTree.back().get();
Last->Idx = VectorizableTree.size() - 1;
- Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
Last->State = EntryState;
Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),
ReuseShuffleIndices.end());
- Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());
- Last->setOperations(S);
+ if (ReorderIndices.empty()) {
+ Last->Scalars.assign(VL.begin(), VL.end());
+ Last->setOperations(S);
+ } else {
+ // Reorder scalars and build final mask.
+ Last->Scalars.assign(VL.size(), nullptr);
+ transform(ReorderIndices, Last->Scalars.begin(),
+ [VL](unsigned Idx) -> Value * {
+ if (Idx >= VL.size())
+ return UndefValue::get(VL.front()->getType());
+ return VL[Idx];
+ });
+ InstructionsState S = getSameOpcode(Last->Scalars);
+ Last->setOperations(S);
+ Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());
+ }
if (Last->State != TreeEntry::NeedToGather) {
for (Value *V : VL) {
assert(!getTreeEntry(V) && "Scalar already in tree!");
@@ -2431,14 +2454,6 @@ private:
}
};
- /// Contains orders of operations along with the number of bundles that have
- /// operations in this order. It stores only those orders that require
- /// reordering, if reordering is not required it is counted using \a
- /// NumOpsWantToKeepOriginalOrder.
- DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo> NumOpsWantToKeepOrder;
- /// Number of bundles that do not require reordering.
- unsigned NumOpsWantToKeepOriginalOrder = 0;
-
// Analysis and block reference.
Function *F;
ScalarEvolution *SE;
@@ -2591,21 +2606,439 @@ void BoUpSLP::eraseInstructions(ArrayRef<Value *> AV) {
};
}
-void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
- ArrayRef<Value *> UserIgnoreLst) {
- ExtraValueToDebugLocsMap ExternallyUsedValues;
- buildTree(Roots, ExternallyUsedValues, UserIgnoreLst);
+/// Reorders the given \p Reuses mask according to the given \p Mask. \p Reuses
+/// contains the original mask for the scalars reused in the node. The
+/// procedure transforms this mask in accordance with the given \p Mask.
+static void reorderReuses(SmallVectorImpl<int> &Reuses, ArrayRef<int> Mask) {
+ assert(!Mask.empty() && Reuses.size() == Mask.size() &&
+ "Expected non-empty mask.");
+ SmallVector<int> Prev(Reuses.begin(), Reuses.end());
+ Prev.swap(Reuses);
+ for (unsigned I = 0, E = Prev.size(); I < E; ++I)
+ if (Mask[I] != UndefMaskElem)
+ Reuses[Mask[I]] = Prev[I];
}
-void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
- ExtraValueToDebugLocsMap &ExternallyUsedValues,
- ArrayRef<Value *> UserIgnoreLst) {
- deleteTree();
- UserIgnoreList = UserIgnoreLst;
- if (!allSameType(Roots))
+/// Reorders the given \p Order according to the given \p Mask. \p Order - is
+/// the original order of the scalars. Procedure transforms the provided order
+/// in accordance with the given \p Mask. If the resulting \p Order is just an
+/// identity order, \p Order is cleared.
+static void reorderOrder(SmallVectorImpl<unsigned> &Order, ArrayRef<int> Mask) {
+ assert(!Mask.empty() && "Expected non-empty mask.");
+ SmallVector<int> MaskOrder;
+ if (Order.empty()) {
+ MaskOrder.resize(Mask.size());
+ std::iota(MaskOrder.begin(), MaskOrder.end(), 0);
+ } else {
+ inversePermutation(Order, MaskOrder);
+ }
+ reorderReuses(MaskOrder, Mask);
+ if (ShuffleVectorInst::isIdentityMask(MaskOrder)) {
+ Order.clear();
return;
- buildTree_rec(Roots, 0, EdgeInfo());
+ }
+ Order.assign(Mask.size(), Mask.size());
+ for (unsigned I = 0, E = Mask.size(); I < E; ++I)
+ if (MaskOrder[I] != UndefMaskElem)
+ Order[MaskOrder[I]] = I;
+ fixupOrderingIndices(Order);
+}
+
+void BoUpSLP::reorderTopToBottom() {
+ // Maps VF to the graph nodes.
+ DenseMap<unsigned, SmallPtrSet<TreeEntry *, 4>> VFToOrderedEntries;
+ // ExtractElement gather nodes which can be vectorized and need to handle
+ // their ordering.
+ DenseMap<const TreeEntry *, OrdersType> GathersToOrders;
+ // Find all reorderable nodes with the given VF.
+ // Currently these are vectorized loads, extracts + some gathering of extracts.
+ for_each(VectorizableTree, [this, &VFToOrderedEntries, &GathersToOrders](
+ const std::unique_ptr<TreeEntry> &TE) {
+ // No need to reorder if need to shuffle reuses, still need to shuffle the
+ // node.
+ if (!TE->ReuseShuffleIndices.empty())
+ return;
+ if (TE->State == TreeEntry::Vectorize &&
+ isa<LoadInst, ExtractElementInst, ExtractValueInst, StoreInst,
+ InsertElementInst>(TE->getMainOp()) &&
+ !TE->isAltShuffle()) {
+ VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());
+ } else if (TE->State == TreeEntry::NeedToGather &&
+ TE->getOpcode() == Instruction::ExtractElement &&
+ !TE->isAltShuffle() &&
+ isa<FixedVectorType>(cast<ExtractElementInst>(TE->getMainOp())
+ ->getVectorOperandType()) &&
+ allSameType(TE->Scalars) && allSameBlock(TE->Scalars)) {
+ // Check that gather of extractelements can be represented as
+ // just a shuffle of a single vector.
+ OrdersType CurrentOrder;
+ bool Reuse = canReuseExtract(TE->Scalars, TE->getMainOp(), CurrentOrder);
+ if (Reuse || !CurrentOrder.empty()) {
+ VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());
+ GathersToOrders.try_emplace(TE.get(), CurrentOrder);
+ }
+ }
+ });
+
+ // Reorder the graph nodes according to their vectorization factor.
+ for (unsigned VF = VectorizableTree.front()->Scalars.size(); VF > 1;
+ VF /= 2) {
+ auto It = VFToOrderedEntries.find(VF);
+ if (It == VFToOrderedEntries.end())
+ continue;
+ // Try to find the most profitable order. We are just looking for the most
+ // used order and reorder scalar elements in the nodes according to this
+ // most used order.
+ const SmallPtrSetImpl<TreeEntry *> &OrderedEntries = It->getSecond();
+ // All operands are reordered and used only in this node - propagate the
+ // most used order to the user node.
+ DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo> OrdersUses;
+ SmallPtrSet<const TreeEntry *, 4> VisitedOps;
+ for (const TreeEntry *OpTE : OrderedEntries) {
+ // No need to reorder these nodes: they still need to be extended and
+ // shuffled, so just merge the reordering shuffle and the reuse shuffle.
+ if (!OpTE->ReuseShuffleIndices.empty())
+ continue;
+ // Count number of orders uses.
+ const auto &Order = [OpTE, &GathersToOrders]() -> const OrdersType & {
+ if (OpTE->State == TreeEntry::NeedToGather)
+ return GathersToOrders.find(OpTE)->second;
+ return OpTE->ReorderIndices;
+ }();
+ // Stores actually store the mask, not the order, need to invert.
+ if (OpTE->State == TreeEntry::Vectorize && !OpTE->isAltShuffle() &&
+ OpTE->getOpcode() == Instruction::Store && !Order.empty()) {
+ SmallVector<int> Mask;
+ inversePermutation(Order, Mask);
+ unsigned E = Order.size();
+ OrdersType CurrentOrder(E, E);
+ transform(Mask, CurrentOrder.begin(), [E](int Idx) {
+ return Idx == UndefMaskElem ? E : static_cast<unsigned>(Idx);
+ });
+ fixupOrderingIndices(CurrentOrder);
+ ++OrdersUses.try_emplace(CurrentOrder).first->getSecond();
+ } else {
+ ++OrdersUses.try_emplace(Order).first->getSecond();
+ }
+ }
+ // Set order of the user node.
+ if (OrdersUses.empty())
+ continue;
+ // Choose the most used order.
+ ArrayRef<unsigned> BestOrder = OrdersUses.begin()->first;
+ unsigned Cnt = OrdersUses.begin()->second;
+ for (const auto &Pair : llvm::drop_begin(OrdersUses)) {
+ if (Cnt < Pair.second || (Cnt == Pair.second && Pair.first.empty())) {
+ BestOrder = Pair.first;
+ Cnt = Pair.second;
+ }
+ }
+ // Set order of the user node.
+ if (BestOrder.empty())
+ continue;
+ SmallVector<int> Mask;
+ inversePermutation(BestOrder, Mask);
+ SmallVector<int> MaskOrder(BestOrder.size(), UndefMaskElem);
+ unsigned E = BestOrder.size();
+ transform(BestOrder, MaskOrder.begin(), [E](unsigned I) {
+ return I < E ? static_cast<int>(I) : UndefMaskElem;
+ });
+ // Do an actual reordering, if profitable.
+ for (std::unique_ptr<TreeEntry> &TE : VectorizableTree) {
+ // Just do the reordering for the nodes with the given VF.
+ if (TE->Scalars.size() != VF) {
+ if (TE->ReuseShuffleIndices.size() == VF) {
+ // Need to reorder the reuses masks of the operands with smaller VF to
+ // be able to find the match between the graph nodes and scalar
+ // operands of the given node during vectorization/cost estimation.
+ assert(all_of(TE->UserTreeIndices,
+ [VF, &TE](const EdgeInfo &EI) {
+ return EI.UserTE->Scalars.size() == VF ||
+ EI.UserTE->Scalars.size() ==
+ TE->Scalars.size();
+ }) &&
+ "All users must be of VF size.");
+ // Update ordering of the operands with the smaller VF than the given
+ // one.
+ reorderReuses(TE->ReuseShuffleIndices, Mask);
+ }
+ continue;
+ }
+ if (TE->State == TreeEntry::Vectorize &&
+ isa<ExtractElementInst, ExtractValueInst, LoadInst, StoreInst,
+ InsertElementInst>(TE->getMainOp()) &&
+ !TE->isAltShuffle()) {
+ // Build correct orders for extract{element,value}, loads and
+ // stores.
+ reorderOrder(TE->ReorderIndices, Mask);
+ if (isa<InsertElementInst, StoreInst>(TE->getMainOp()))
+ TE->reorderOperands(Mask);
+ } else {
+ // Reorder the node and its operands.
+ TE->reorderOperands(Mask);
+ assert(TE->ReorderIndices.empty() &&
+ "Expected empty reorder sequence.");
+ reorderScalars(TE->Scalars, Mask);
+ }
+ if (!TE->ReuseShuffleIndices.empty()) {
+ // Apply reversed order to keep the original ordering of the reused
+ // elements to avoid extra reorder indices shuffling.
+ OrdersType CurrentOrder;
+ reorderOrder(CurrentOrder, MaskOrder);
+ SmallVector<int> NewReuses;
+ inversePermutation(CurrentOrder, NewReuses);
+ addMask(NewReuses, TE->ReuseShuffleIndices);
+ TE->ReuseShuffleIndices.swap(NewReuses);
+ }
+ }
+ }
+}
+
+void BoUpSLP::reorderBottomToTop() {
+ SetVector<TreeEntry *> OrderedEntries;
+ DenseMap<const TreeEntry *, OrdersType> GathersToOrders;
+ // Find all reorderable leaf nodes with the given VF.
+ // Currently these are vectorized loads, extracts without alternate operands +
+ // some gathering of extracts.
+ SmallVector<TreeEntry *> NonVectorized;
+ for_each(VectorizableTree, [this, &OrderedEntries, &GathersToOrders,
+ &NonVectorized](
+ const std::unique_ptr<TreeEntry> &TE) {
+ // No need to reorder if need to shuffle reuses, still need to shuffle the
+ // node.
+ if (!TE->ReuseShuffleIndices.empty())
+ return;
+ if (TE->State == TreeEntry::Vectorize &&
+ isa<LoadInst, ExtractElementInst, ExtractValueInst>(TE->getMainOp()) &&
+ !TE->isAltShuffle()) {
+ OrderedEntries.insert(TE.get());
+ } else if (TE->State == TreeEntry::NeedToGather &&
+ TE->getOpcode() == Instruction::ExtractElement &&
+ !TE->isAltShuffle() &&
+ isa<FixedVectorType>(cast<ExtractElementInst>(TE->getMainOp())
+ ->getVectorOperandType()) &&
+ allSameType(TE->Scalars) && allSameBlock(TE->Scalars)) {
+ // Check that gather of extractelements can be represented as
+ // just a shuffle of a single vector with a single user only.
+ OrdersType CurrentOrder;
+ bool Reuse = canReuseExtract(TE->Scalars, TE->getMainOp(), CurrentOrder);
+ if ((Reuse || !CurrentOrder.empty()) &&
+ !any_of(
+ VectorizableTree, [&TE](const std::unique_ptr<TreeEntry> &Entry) {
+ return Entry->State == TreeEntry::NeedToGather &&
+ Entry.get() != TE.get() && Entry->isSame(TE->Scalars);
+ })) {
+ OrderedEntries.insert(TE.get());
+ GathersToOrders.try_emplace(TE.get(), CurrentOrder);
+ }
+ }
+ if (TE->State != TreeEntry::Vectorize)
+ NonVectorized.push_back(TE.get());
+ });
+
+ // Checks if the operands of the users are reorderable and have only a
+ // single use.
+ auto &&CheckOperands =
+ [this, &NonVectorized](const auto &Data,
+ SmallVectorImpl<TreeEntry *> &GatherOps) {
+ for (unsigned I = 0, E = Data.first->getNumOperands(); I < E; ++I) {
+ if (any_of(Data.second,
+ [I](const std::pair<unsigned, TreeEntry *> &OpData) {
+ return OpData.first == I &&
+ OpData.second->State == TreeEntry::Vectorize;
+ }))
+ continue;
+ ArrayRef<Value *> VL = Data.first->getOperand(I);
+ const TreeEntry *TE = nullptr;
+ const auto *It = find_if(VL, [this, &TE](Value *V) {
+ TE = getTreeEntry(V);
+ return TE;
+ });
+ if (It != VL.end() && TE->isSame(VL))
+ return false;
+ TreeEntry *Gather = nullptr;
+ if (count_if(NonVectorized, [VL, &Gather](TreeEntry *TE) {
+ assert(TE->State != TreeEntry::Vectorize &&
+ "Only non-vectorized nodes are expected.");
+ if (TE->isSame(VL)) {
+ Gather = TE;
+ return true;
+ }
+ return false;
+ }) > 1)
+ return false;
+ if (Gather)
+ GatherOps.push_back(Gather);
+ }
+ return true;
+ };
+ // 1. Propagate order to the graph nodes, which use only reordered nodes.
+ // I.e., if the node has operands, that are reordered, try to make at least
+ // one operand order in the natural order and reorder others + reorder the
+ // user node itself.
+ SmallPtrSet<const TreeEntry *, 4> Visited;
+ while (!OrderedEntries.empty()) {
+ // 1. Filter out only reordered nodes.
+ // 2. If the entry has multiple uses - skip it and jump to the next node.
+ MapVector<TreeEntry *, SmallVector<std::pair<unsigned, TreeEntry *>>> Users;
+ SmallVector<TreeEntry *> Filtered;
+ for (TreeEntry *TE : OrderedEntries) {
+ if (!(TE->State == TreeEntry::Vectorize ||
+ (TE->State == TreeEntry::NeedToGather &&
+ TE->getOpcode() == Instruction::ExtractElement)) ||
+ TE->UserTreeIndices.empty() || !TE->ReuseShuffleIndices.empty() ||
+ !all_of(drop_begin(TE->UserTreeIndices),
+ [TE](const EdgeInfo &EI) {
+ return EI.UserTE == TE->UserTreeIndices.front().UserTE;
+ }) ||
+ !Visited.insert(TE).second) {
+ Filtered.push_back(TE);
+ continue;
+ }
+ // Build a map between user nodes and their operands order to speedup
+ // search. The graph currently does not provide this dependency directly.
+ for (EdgeInfo &EI : TE->UserTreeIndices) {
+ TreeEntry *UserTE = EI.UserTE;
+ auto It = Users.find(UserTE);
+ if (It == Users.end())
+ It = Users.insert({UserTE, {}}).first;
+ It->second.emplace_back(EI.EdgeIdx, TE);
+ }
+ }
+ // Erase filtered entries.
+ for_each(Filtered,
+ [&OrderedEntries](TreeEntry *TE) { OrderedEntries.remove(TE); });
+ for (const auto &Data : Users) {
+ // Check that operands are used only in the User node.
+ SmallVector<TreeEntry *> GatherOps;
+ if (!CheckOperands(Data, GatherOps)) {
+ for_each(Data.second,
+ [&OrderedEntries](const std::pair<unsigned, TreeEntry *> &Op) {
+ OrderedEntries.remove(Op.second);
+ });
+ continue;
+ }
+ // All operands are reordered and used only in this node - propagate the
+ // most used order to the user node.
+ DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo> OrdersUses;
+ SmallPtrSet<const TreeEntry *, 4> VisitedOps;
+ for (const auto &Op : Data.second) {
+ TreeEntry *OpTE = Op.second;
+ if (!OpTE->ReuseShuffleIndices.empty())
+ continue;
+ const auto &Order = [OpTE, &GathersToOrders]() -> const OrdersType & {
+ if (OpTE->State == TreeEntry::NeedToGather)
+ return GathersToOrders.find(OpTE)->second;
+ return OpTE->ReorderIndices;
+ }();
+ // Stores actually store the mask, not the order, need to invert.
+ if (OpTE->State == TreeEntry::Vectorize && !OpTE->isAltShuffle() &&
+ OpTE->getOpcode() == Instruction::Store && !Order.empty()) {
+ SmallVector<int> Mask;
+ inversePermutation(Order, Mask);
+ unsigned E = Order.size();
+ OrdersType CurrentOrder(E, E);
+ transform(Mask, CurrentOrder.begin(), [E](int Idx) {
+ return Idx == UndefMaskElem ? E : static_cast<unsigned>(Idx);
+ });
+ fixupOrderingIndices(CurrentOrder);
+ ++OrdersUses.try_emplace(CurrentOrder).first->getSecond();
+ } else {
+ ++OrdersUses.try_emplace(Order).first->getSecond();
+ }
+ if (VisitedOps.insert(OpTE).second)
+ OrdersUses.try_emplace({}, 0).first->getSecond() +=
+ OpTE->UserTreeIndices.size();
+ --OrdersUses[{}];
+ }
+ // If no orders - skip current nodes and jump to the next one, if any.
+ if (OrdersUses.empty()) {
+ for_each(Data.second,
+ [&OrderedEntries](const std::pair<unsigned, TreeEntry *> &Op) {
+ OrderedEntries.remove(Op.second);
+ });
+ continue;
+ }
+ // Choose the best order.
+ ArrayRef<unsigned> BestOrder = OrdersUses.begin()->first;
+ unsigned Cnt = OrdersUses.begin()->second;
+ for (const auto &Pair : llvm::drop_begin(OrdersUses)) {
+ if (Cnt < Pair.second || (Cnt == Pair.second && Pair.first.empty())) {
+ BestOrder = Pair.first;
+ Cnt = Pair.second;
+ }
+ }
+ // Set order of the user node (reordering of operands and user nodes).
+ if (BestOrder.empty()) {
+ for_each(Data.second,
+ [&OrderedEntries](const std::pair<unsigned, TreeEntry *> &Op) {
+ OrderedEntries.remove(Op.second);
+ });
+ continue;
+ }
+ // Erase operands from OrderedEntries list and adjust their orders.
+ VisitedOps.clear();
+ SmallVector<int> Mask;
+ inversePermutation(BestOrder, Mask);
+ SmallVector<int> MaskOrder(BestOrder.size(), UndefMaskElem);
+ unsigned E = BestOrder.size();
+ transform(BestOrder, MaskOrder.begin(), [E](unsigned I) {
+ return I < E ? static_cast<int>(I) : UndefMaskElem;
+ });
+ for (const std::pair<unsigned, TreeEntry *> &Op : Data.second) {
+ TreeEntry *TE = Op.second;
+ OrderedEntries.remove(TE);
+ if (!VisitedOps.insert(TE).second)
+ continue;
+ if (!TE->ReuseShuffleIndices.empty() && TE->ReorderIndices.empty()) {
+ // Just reorder reuses indices.
+ reorderReuses(TE->ReuseShuffleIndices, Mask);
+ continue;
+ }
+ // Gathers are processed separately.
+ if (TE->State != TreeEntry::Vectorize)
+ continue;
+ assert((BestOrder.size() == TE->ReorderIndices.size() ||
+ TE->ReorderIndices.empty()) &&
+ "Non-matching sizes of user/operand entries.");
+ reorderOrder(TE->ReorderIndices, Mask);
+ }
+ // For gathers just need to reorder its scalars.
+ for (TreeEntry *Gather : GatherOps) {
+ if (!Gather->ReuseShuffleIndices.empty())
+ continue;
+ assert(Gather->ReorderIndices.empty() &&
+ "Unexpected reordering of gathers.");
+ reorderScalars(Gather->Scalars, Mask);
+ OrderedEntries.remove(Gather);
+ }
+ // Reorder operands of the user node and set the ordering for the user
+ // node itself.
+ if (Data.first->State != TreeEntry::Vectorize ||
+ !isa<ExtractElementInst, ExtractValueInst, LoadInst>(
+ Data.first->getMainOp()) ||
+ Data.first->isAltShuffle())
+ Data.first->reorderOperands(Mask);
+ if (!isa<InsertElementInst, StoreInst>(Data.first->getMainOp()) ||
+ Data.first->isAltShuffle()) {
+ reorderScalars(Data.first->Scalars, Mask);
+ reorderOrder(Data.first->ReorderIndices, MaskOrder);
+ if (Data.first->ReuseShuffleIndices.empty() &&
+ !Data.first->ReorderIndices.empty() &&
+ !Data.first->isAltShuffle()) {
+ // Insert user node to the list to try to sink reordering deeper in
+ // the graph.
+ OrderedEntries.insert(Data.first);
+ }
+ } else {
+ reorderOrder(Data.first->ReorderIndices, Mask);
+ }
+ }
+ }
+}
+void BoUpSLP::buildExternalUses(
+ const ExtraValueToDebugLocsMap &ExternallyUsedValues) {
// Collect the values that we need to extract from the tree.
for (auto &TEPtr : VectorizableTree) {
TreeEntry *Entry = TEPtr.get();
@@ -2664,6 +3097,80 @@ void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
}
}
+void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
+ ArrayRef<Value *> UserIgnoreLst) {
+ deleteTree();
+ UserIgnoreList = UserIgnoreLst;
+ if (!allSameType(Roots))
+ return;
+ buildTree_rec(Roots, 0, EdgeInfo());
+}
+
+namespace {
+/// Tracks the state we can represent the loads in the given sequence.
+enum class LoadsState { Gather, Vectorize, ScatterVectorize };
+} // anonymous namespace
+
+/// Checks if the given array of loads can be represented as a vectorized,
+/// scatter or just simple gather.
+static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
+ const TargetTransformInfo &TTI,
+ const DataLayout &DL, ScalarEvolution &SE,
+ SmallVectorImpl<unsigned> &Order,
+ SmallVectorImpl<Value *> &PointerOps) {
+ // Check that a vectorized load would load the same memory as a scalar
+ // load. For example, we don't want to vectorize loads that are smaller
+ // than 8-bit. Even though we have a packed struct {<i2, i2, i2, i2>} LLVM
+ // treats loading/storing it as an i8 struct. If we vectorize loads/stores
+ // from such a struct, we read/write packed bits disagreeing with the
+ // unvectorized version.
+ Type *ScalarTy = VL0->getType();
+
+ if (DL.getTypeSizeInBits(ScalarTy) != DL.getTypeAllocSizeInBits(ScalarTy))
+ return LoadsState::Gather;
+
+ // Make sure all loads in the bundle are simple - we can't vectorize
+ // atomic or volatile loads.
+ PointerOps.clear();
+ PointerOps.resize(VL.size());
+ auto *POIter = PointerOps.begin();
+ for (Value *V : VL) {
+ auto *L = cast<LoadInst>(V);
+ if (!L->isSimple())
+ return LoadsState::Gather;
+ *POIter = L->getPointerOperand();
+ ++POIter;
+ }
+
+ Order.clear();
+ // Check the order of pointer operands.
+ if (llvm::sortPtrAccesses(PointerOps, ScalarTy, DL, SE, Order)) {
+ Value *Ptr0;
+ Value *PtrN;
+ if (Order.empty()) {
+ Ptr0 = PointerOps.front();
+ PtrN = PointerOps.back();
+ } else {
+ Ptr0 = PointerOps[Order.front()];
+ PtrN = PointerOps[Order.back()];
+ }
+ Optional<int> Diff =
+ getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
+ // Check that the sorted loads are consecutive.
+ if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+ return LoadsState::Vectorize;
+ Align CommonAlignment = cast<LoadInst>(VL0)->getAlign();
</cut>
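The consecutive-load test in the canVectorizeLoads excerpt above (distance between the first and last sorted pointer equals VL.size() - 1) can be illustrated outside LLVM. The following is only a sketch under stated assumptions: loads_are_consecutive is a hypothetical helper, and each load is represented by a plain integer element offset from a common base, whereas the real code derives distances through getPointersDiff and ScalarEvolution.

```python
from typing import Sequence


def loads_are_consecutive(offsets: Sequence[int]) -> bool:
    """Return True if the offsets form a gap-free run once sorted.

    Hypothetical stand-in for the check in the excerpt: offsets are
    element indices relative to a common base pointer (an assumption).
    """
    if not offsets:
        return False
    # Duplicate addresses cannot be put into a strict order.
    if len(set(offsets)) != len(offsets):
        return False
    # For distinct integers, last - first == n - 1 implies no gaps.
    return max(offsets) - min(offsets) == len(offsets) - 1


print(loads_are_consecutive([2, 0, 3, 1]))  # True: sorts to 0, 1, 2, 3
print(loads_are_consecutive([0, 1, 3, 4]))  # False: 4 - 0 != 3
```

The shortcut of comparing only the end points is valid here because the offsets are distinct; with duplicates allowed, the same span could hide a gap.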
And the rest of the week I flushed my maintainer queues ;-)
Other
=====
[update-ticket] <file:~/org/team.org::update-ticket>
Update [update-ticket] to work with cloud JIRA
Completed Reviews [8/8]
=======================
[PATCH 0/7] tests: docker images for hexagon, nios2, microblaze
Message-Id: <20211014224435.2539547-1-richard.henderson(a)linaro.org>
[PATCH] gdbstub: Switch to the thread receiving a signal
Message-Id: <20210930095111.23205-1-pavel(a)labath.sk>
[PATCH] replay: improve determinism of virtio-net
Message-Id: <162125666020.1252655.9997723318921206001.stgit@pasha-ThinkPad-X280>
[PATCH RESEND v3 0/2] add APIs to handle alternative sNaN propagation for fmax/fmin
Message-Id: <20211015065500.3850513-1-frank.chang(a)sifive.com>
[PATCH v3 0/5] plugins/cache: multicore cache modelling and minor tweaks
Message-Id: <20210722065428.134608-1-ma.mandourr(a)gmail.com>
[PATCH v2 0/2] plugins: add a drcov plugin
Message-Id: <163429165642.439576.16356288759891202632.stgit@pc-System-Product-Name>
[PATCH v2 0/2] plugins: add a drcov plugin
Message-Id: <163429165642.439576.16356288759891202632.stgit@pc-System-Product-Name>
[PATCH 0/3] KVM: qemu patches for few KVM features I developed
Message-Id: <20210914155214.105415-1-mlevitsk(a)redhat.com>
Absences
========
- Off Friday next week
Current Review Queue
====================
TODO [PATCH v2 00/48] tcg: optimize redundant sign extensions
Message-Id: <20211007195456.1168070-1-richard.henderson(a)linaro.org>
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
--
Alex Bennée
OK I've fixed up my JIRA and email tooling so this is a bit of a flush
of stale data from my org-mode.
VirtIO Initiative ([STR-9])
===========================
- posted Enabling hypervisor agnosticism for VirtIO backends
Message-Id: <87v94ldrqq.fsf(a)linaro.org>
- posted [a PR to do some cleanups to vm-virtio]
[STR-9] <https://projects.linaro.org/browse/STR-9>
[a PR to do some cleanups to vm-virtio]
<https://github.com/rust-vmm/vm-virtio/pull/103>
VirtIO RPMB ([STR-5])
=====================
- made more progress and now have PROGRAM_KEY/WRITE_COUNTER done -
feels like it's getting faster
[STR-5] <https://projects.linaro.org/browse/STR-5>
[Rust version of virtio-rpmb] <https://github.com/stsquad/virtio-rpmb>
[fixes for the C daemon]
<https://github.com/ruchi393/qemu/tree/vhost-user-rpmb-fixes>
[hacking branch] <https://github.com/stsquad/virtio-rpmb/tree/hacking>
Fix VirtIO spec as per Rucha's email
QEMU Upstream Work ([UM-2])
===========================
- posted [PATCH for 6.1-rc3 v1 0/4] gitlab and plugins pre-PR
Message-Id: <20210806141015.2487502-1-alex.bennee(a)linaro.org>
- prepared a potential [pull request for testing issues] but looks
like it will wait for 6.2
[UM-2] <https://projects.linaro.org/browse/UM-2>
[this is the last iteration before Monday]
<https://patchew.org/QEMU/20210709143005.1554-1-alex.bennee@linaro.org/>
[pull request for testing issues]
<https://github.com/stsquad/qemu/tree/pr/120821-for-6.1-rc4-1>
Completed Reviews [4/4]
=======================
[RFC PATCH 0/1] QEMU TCG plugin interface extensions
Message-Id: <20210821094527.491232-1-florian.hauschild(a)fs.ei.tum.de>
[PATCH 0/8] tcg: support 32-bit guest addresses as signed
Message-Id: <20211010174401.141339-1-richard.henderson(a)linaro.org>
[PATCH 0/3] KVM: qemu patches for few KVM features I developed
Message-Id: <20210914155214.105415-1-mlevitsk(a)redhat.com>
[PATCH 0/6] More record/replay acceptance tests
Message-Id: <162332427732.194926.7555369160312506539.stgit@pasha-ThinkPad-X280>
[PATCH 0/3] Gitlab-CI improvements
Message-Id: <20210730143809.717079-1-thuth(a)redhat.com>
[PATCH v3 00/13] new plugin argument passing scheme
Message-Id: <20210722071236.139520-1-ma.mandourr(a)gmail.com>
[PATCH] contrib/plugins: add a drcov plugin
Message-Id: <20211011111130.170178-1-arkaisp2021(a)gmail.com>
[RFC PATCH v2] Add a post for the new TCG cache modelling plugin
Message-Id: <20210617121707.764126-1-ma.mandourr(a)gmail.com>
Current Review Queue
====================
TODO [PATCH v2 00/48] tcg: optimize redundant sign extensions
Message-Id: <20211007195456.1168070-1-richard.henderson(a)linaro.org>
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
TODO [PATCH v4 00/48]
Message-Id: <20211013024607.731881-1-richard.henderson(a)linaro.org>
--
Alex Bennée
After llvm commit 75127bce6de78b83b70b898a04473f213451f13e
Author: Qiongsi Wu <qwu(a)ibm.com>
[AIX][ZOS] Excluding merge-objc-interface.m from Tests
the following hot functions slowed down by more than 10% (but their benchmarks slowed down by less than 2%):
- 433.milc:[.] mult_su3_mat_vec slowed down by 16% from 1615 to 1871 perf samples
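The quoted percentage is consistent with a simple relative increase in perf samples. A minimal sketch of that arithmetic, assuming this is how the figure is derived (the report does not state the formula, and slowdown_percent is a hypothetical helper name):

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, in percent (assumed formula)."""
    if before <= 0:
        raise ValueError("before must be a positive sample count")
    return (after - before) / before * 100.0


# Sample counts for mult_su3_mat_vec taken from the report above.
print(round(slowdown_percent(1615, 1871)))  # 16
```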
The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-75127bce6de78b83b70b898a04473f213451f13e
cd investigate-llvm-75127bce6de78b83b70b898a04473f213451f13e
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 75127bce6de78b83b70b898a04473f213451f13e
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d01ae990e1fd6561ed86dc8004a7147dd09fb13c
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 75127bce6de78b83b70b898a04473f213451f13e
Author: Qiongsi Wu <qwu(a)ibm.com>
Date: Fri Oct 8 13:58:32 2021 +0000
[AIX][ZOS] Excluding merge-objc-interface.m from Tests
Objective C is not supported on AIX or ZOS. This patch excludes the newly added `clang/test/Modules/merge-objc-interface.m` (added by https://reviews.llvm.org/D110280) from AIX and ZOS testing.
Many existing tests are already disabled by https://reviews.llvm.org/D109060.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D111406
---
clang/test/Modules/merge-objc-interface.m | 1 +
1 file changed, 1 insertion(+)
diff --git a/clang/test/Modules/merge-objc-interface.m b/clang/test/Modules/merge-objc-interface.m
index fba06294a26a..f62f541c1a29 100644
--- a/clang/test/Modules/merge-objc-interface.m
+++ b/clang/test/Modules/merge-objc-interface.m
@@ -1,3 +1,4 @@
+// UNSUPPORTED: -zos, -aix
// RUN: rm -rf %t
// RUN: split-file %s %t
// RUN: %clang_cc1 -emit-llvm -o %t/test.bc -F%t/Frameworks %t/test.m \
</cut>
After llvm commit 483db1c706864d0940206228dfe64bdcd17faa4e
Author: Muhammad Omair Javaid <omair.javaid(a)linaro.org>
[LLDB] Remove xfail decorator TestInferiorAssert.py AArch64/Linux
the following benchmarks slowed down by more than 2%:
- 433.milc slowed down by 4% from 13309 to 13838 perf samples
- 433.milc:[.] mult_su3_mat_vec slowed down by 17% from 2058 to 2409 perf samples
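The two thresholds mentioned in these reports (more than 2% for a benchmark, more than 10% for a hot function) can be checked directly against the sample counts. This is only an illustrative sketch: exceeds_threshold is a hypothetical helper, and the assumption is that the CI compares the relative increase in perf samples against each threshold.

```python
def exceeds_threshold(before: int, after: int, threshold_pct: int) -> bool:
    """True if the sample count grew by more than threshold_pct percent.

    Uses integer arithmetic to avoid rounding at the boundary.
    """
    return (after - before) * 100 > threshold_pct * before


# Sample counts taken from the report above.
print(exceeds_threshold(13309, 13838, 2))   # 433.milc benchmark: True
print(exceeds_threshold(2058, 2409, 10))    # mult_su3_mat_vec: True
```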
The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-483db1c706864d0940206228dfe64bdcd17faa4e
cd investigate-llvm-483db1c706864d0940206228dfe64bdcd17faa4e
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 483db1c706864d0940206228dfe64bdcd17faa4e
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d11ec6f67e45c630ab87bfb6010dcc93e89542fc
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 483db1c706864d0940206228dfe64bdcd17faa4e
Author: Muhammad Omair Javaid <omair.javaid(a)linaro.org>
Date: Mon Oct 11 14:34:41 2021 +0500
[LLDB] Remove xfail decorator TestInferiorAssert.py AArch64/Linux
TestInferiorAssert.py test_inferior_asserting_disassemble passes after
upgrading LLDB AArch64/Linux buildbot to Ubuntu Focal.
---
lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py b/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py
index c533a1e29a12..5ac4eeb0514a 100644
--- a/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py
+++ b/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py
@@ -45,9 +45,7 @@ class AssertingInferiorTestCase(TestBase):
bugnumber="llvm.org/pr21793: need to implement support for detecting assertion / abort on Windows")
@expectedFailureAll(
oslist=["linux"],
- archs=[
- "aarch64",
- "arm"],
+ archs=["arm"],
triple=no_match(".*-android"),
bugnumber="llvm.org/pr25338")
@expectedFailureAll(bugnumber="llvm.org/pr26592", triple='^mips')
</cut>