Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO
Culprit:
<cut>
commit 19dc02e99f802922a3af69e802465bee0723b57a
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date: Sun Aug 22 18:15:55 2021 +0200
[MergeICmps] Allow sinking past non-load/store
This is a followup to D106591. MergeICmps currently only allows
sinking the loads past either instructions that don't write to
memory at all, or simple loads/stores that don't modify the memory
the loads access.
The "simple loads/stores" part of this check doesn't seem necessary
to me -- AA isModRef() already accurately models any operation
that may clobber the memory. For example, in the adjusted test case
the transform is still fine if the call to @foo() isn't readonly,
but inaccessiblememonly -- in both cases, the call cannot modify
the loaded memory.
Differential Revision: https://reviews.llvm.org/D108517
</cut>
Results regressed to (for first_bad == 19dc02e99f802922a3af69e802465bee0723b57a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-19dc02e99f802922a3af69e802465bee0723b57a/results_id:
1
# 464.h264ref,h264ref_base.default regressed by 105
from (for last_good == da12d88b1c5fc42b49b92fcf94917ca489dd677f)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-da12d88b1c5fc42b49b92fcf94917ca489dd677f/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3_LTO/4822
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3_LTO/4807
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-19dc02e99f802922a3af69e802465bee0723b57a
cd investigate-llvm-19dc02e99f802922a3af69e802465bee0723b57a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 19dc02e99f802922a3af69e802465bee0723b57a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach da12d88b1c5fc42b49b92fcf94917ca489dd677f
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 19dc02e99f802922a3af69e802465bee0723b57a
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date: Sun Aug 22 18:15:55 2021 +0200
[MergeICmps] Allow sinking past non-load/store
This is a followup to D106591. MergeICmps currently only allows
sinking the loads past either instructions that don't write to
memory at all, or simple loads/stores that don't modify the memory
the loads access.
The "simple loads/stores" part of this check doesn't seem necessary
to me -- AA isModRef() already accurately models any operation
that may clobber the memory. For example, in the adjusted test case
the transform is still fine if the call to @foo() isn't readonly,
but inaccessiblememonly -- in both cases, the call cannot modify
the loaded memory.
Differential Revision: https://reviews.llvm.org/D108517
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 14 +-------------
.../Transforms/MergeICmps/X86/split-block-does-work.ll | 2 +-
2 files changed, 2 insertions(+), 14 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index f13f24ad2027..34465c76dd3d 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -66,15 +66,6 @@ namespace {
#define DEBUG_TYPE "mergeicmps"
-// Returns true if the instruction is a simple load or a simple store
-static bool isSimpleLoadOrStore(const Instruction *I) {
- if (const LoadInst *LI = dyn_cast<LoadInst>(I))
- return LI->isSimple();
- if (const StoreInst *SI = dyn_cast<StoreInst>(I))
- return SI->isSimple();
- return false;
-}
-
// A BCE atom "Binary Compare Expression Atom" represents an integer load
// that is a constant offset from a base value, e.g. `a` or `o.c` in the example
// at the top.
@@ -244,10 +235,7 @@ bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
// If this instruction may clobber the loads and is in middle of the BCE cmp
// block instructions, then bail for now.
if (Inst->mayWriteToMemory()) {
- // Bail if this is not a simple load or store
- if (!isSimpleLoadOrStore(Inst))
- return false;
- // Disallow stores that might alias the BCE operands
+ // Disallow instructions that might modify the BCE operands
MemoryLocation LLoc = MemoryLocation::get(Cmp.Lhs.LoadI);
MemoryLocation RLoc = MemoryLocation::get(Cmp.Rhs.LoadI);
if (isModSet(AA.getModRefInfo(Inst, LLoc)) ||
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index 0b9663f44980..1e341b92918d 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -3,7 +3,7 @@
%S = type { i32, i32, i32, i32 }
-declare void @foo(...) readonly
+declare void @foo(...) inaccessiblememonly
; We can split %entry and create a memcmp(16 bytes).
define zeroext i1 @opeq1(
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2
Culprit:
<cut>
commit d39d3a327b1303012370e47d991459ffbfce45ef
Author: Peyton, Jonathan L <jonathan.l.peyton(a)intel.com>
Date: Fri Aug 20 16:06:13 2021 -0500
[OpenMP][test] fix omp_get_wtime.c test to be more accommodating
The omp_get_wtime.c test fails intermittently if the recorded times are
off by too much which can happen when many tests are run in parallel.
Instead of failing if one timing is a little off, take average of 100
timings minus the 10 worst.
Differential Revision: https://reviews.llvm.org/D108488
</cut>
Results regressed to (for first_bad == d39d3a327b1303012370e47d991459ffbfce45ef)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2 artifacts/build-d39d3a327b1303012370e47d991459ffbfce45ef/results_id:
1
# 447.dealII,dealII_base.default regressed by 105
from (for last_good == f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2 artifacts/build-f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/4734
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/4757
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-d39d3a327b1303012370e47d991459ffbfce45ef
cd investigate-llvm-d39d3a327b1303012370e47d991459ffbfce45ef
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach d39d3a327b1303012370e47d991459ffbfce45ef
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit d39d3a327b1303012370e47d991459ffbfce45ef
Author: Peyton, Jonathan L <jonathan.l.peyton(a)intel.com>
Date: Fri Aug 20 16:06:13 2021 -0500
[OpenMP][test] fix omp_get_wtime.c test to be more accommodating
The omp_get_wtime.c test fails intermittently if the recorded times are
off by too much which can happen when many tests are run in parallel.
Instead of failing if one timing is a little off, take average of 100
timings minus the 10 worst.
Differential Revision: https://reviews.llvm.org/D108488
---
openmp/runtime/test/api/omp_get_wtime.c | 75 ++++++++++++++++++++++++++-------
1 file changed, 59 insertions(+), 16 deletions(-)
diff --git a/openmp/runtime/test/api/omp_get_wtime.c b/openmp/runtime/test/api/omp_get_wtime.c
index e2bb211e0ce4..a862e07fc5a2 100644
--- a/openmp/runtime/test/api/omp_get_wtime.c
+++ b/openmp/runtime/test/api/omp_get_wtime.c
@@ -4,30 +4,73 @@
#include "omp_testsuite.h"
#include "omp_my_sleep.h"
-int test_omp_get_wtime()
-{
+#define NTIMES 100
+
+// This is the error % threshold. Be generous with the error threshold since
+// this test may be run in parallel with many other tests it may throw off the
+// sleep timing.
+#define THRESHOLD 33.0
+
+double test_omp_get_wtime(double desired_wait_time) {
double start;
double end;
- double measured_time;
- double wait_time = 0.2;
start = 0;
end = 0;
start = omp_get_wtime();
- my_sleep (wait_time);
+ my_sleep(desired_wait_time);
end = omp_get_wtime();
- measured_time = end-start;
- return ((measured_time > 0.97 * wait_time) && (measured_time < 1.03 * wait_time)) ;
+ return end - start;
}
-int main()
-{
- int i;
- int num_failed=0;
+int compare_times(const void *lhs, const void *rhs) {
+ const double *a = (const double *)lhs;
+ const double *b = (const double *)rhs;
+ return *a - *b;
+}
+
+int main() {
+ int i, final_count;
+ double percent_off;
+ double *begin, *end, *ptr;
+ double wait_time = 0.01;
+ double average = 0.0;
+ double n = 0.0;
+ double *times = (double *)malloc(sizeof(double) * NTIMES);
+
+ // Get each timing
+ for (i = 0; i < NTIMES; i++) {
+ times[i] = test_omp_get_wtime(wait_time);
+ }
+
+ // Remove approx the "worst" tenth of the timings
+ qsort(times, NTIMES, sizeof(double), compare_times);
+ begin = times;
+ end = times + NTIMES;
+ for (i = 0; i < NTIMES / 10; ++i) {
+ if (i % 2 == 0)
+ begin++;
+ else
+ end--;
+ }
+
+ // Get the average of the remaining timings
+ for (ptr = begin, final_count = 0; ptr != end; ++ptr, ++final_count)
+ average += times[i];
+ average /= (double)final_count;
+ free(times);
+
+ // Calculate the percent off of desired wait time
+ percent_off = (average - wait_time) / wait_time * 100.0;
+ // Should always be positive, but just in case
+ if (percent_off < 0)
+ percent_off = -percent_off;
- for(i = 0; i < REPETITIONS; i++) {
- if(!test_omp_get_wtime()) {
- num_failed++;
- }
+ if (percent_off > (double)THRESHOLD) {
+ fprintf(stderr, "error: average of %d runs (%lf) is of by %lf%%\n", NTIMES,
+ average, percent_off);
+ return EXIT_FAILURE;
}
- return num_failed;
+ printf("pass: average of %d runs (%lf) is only off by %lf%%\n", NTIMES,
+ average, percent_off);
+ return EXIT_SUCCESS;
}
</cut>
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig
Culprit:
<cut>
commit 342f43af70dbc74f8629381998f92c060e1763a2
Author: Maurizio Lombardi <mlombard(a)redhat.com>
Date: Thu Jul 29 15:52:50 2021 +0200
iscsi_ibft: fix crash due to KASLR physical memory remapping
Starting with commit a799c2bd29d1
("x86/setup: Consolidate early memory reservations")
memory reservations have been moved earlier during the boot process,
before the execution of the Kernel Address Space Layout Randomization code.
setup_arch() calls the iscsi_ibft's find_ibft_region() function
to find and reserve the memory dedicated to the iBFT and this function
also saves a virtual pointer to the iBFT table for later use.
The problem is that if KALSR is active, the physical memory gets
remapped somewhere else in the virtual address space and the pointer is
no longer valid, this will cause a kernel panic when the iscsi driver tries
to dereference it.
iBFT detected.
BUG: unable to handle page fault for address: ffff888000099fd8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
..snip..
Call Trace:
? ibft_create_kobject+0x1d2/0x1d2 [iscsi_ibft]
do_one_initcall+0x44/0x1d0
? kmem_cache_alloc_trace+0x119/0x220
do_init_module+0x5c/0x270
__do_sys_init_module+0x12e/0x1b0
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix this bug by saving the address of the physical location
of the ibft; later the driver will use isa_bus_to_virt() to get
the correct virtual address.
N.B. On each reboot KASLR randomizes the virtual addresses so
assuming phys_to_virt before KASLR does its deed is incorrect.
Simplify the code by renaming find_ibft_region()
to reserve_ibft_region() and remove all the wrappers.
Signed-off-by: Maurizio Lombardi <mlombard(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad(a)kernel.org>
</cut>
Results regressed to (for first_bad == 342f43af70dbc74f8629381998f92c060e1763a2)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
19722
# First few build errors in logs:
from (for last_good == 62fb9874f5da54fdb243003b386128037319b219)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
19795
# linux build successful:
all
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-linux-342f43af70dbc74f8629381998f92c060e1763a2
cd investigate-linux-342f43af70dbc74f8629381998f92c060e1763a2
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach 342f43af70dbc74f8629381998f92c060e1763a2
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 62fb9874f5da54fdb243003b386128037319b219
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Full commit (up to 1000 lines):
<cut>
commit 342f43af70dbc74f8629381998f92c060e1763a2
Author: Maurizio Lombardi <mlombard(a)redhat.com>
Date: Thu Jul 29 15:52:50 2021 +0200
iscsi_ibft: fix crash due to KASLR physical memory remapping
Starting with commit a799c2bd29d1
("x86/setup: Consolidate early memory reservations")
memory reservations have been moved earlier during the boot process,
before the execution of the Kernel Address Space Layout Randomization code.
setup_arch() calls the iscsi_ibft's find_ibft_region() function
to find and reserve the memory dedicated to the iBFT and this function
also saves a virtual pointer to the iBFT table for later use.
The problem is that if KALSR is active, the physical memory gets
remapped somewhere else in the virtual address space and the pointer is
no longer valid, this will cause a kernel panic when the iscsi driver tries
to dereference it.
iBFT detected.
BUG: unable to handle page fault for address: ffff888000099fd8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
..snip..
Call Trace:
? ibft_create_kobject+0x1d2/0x1d2 [iscsi_ibft]
do_one_initcall+0x44/0x1d0
? kmem_cache_alloc_trace+0x119/0x220
do_init_module+0x5c/0x270
__do_sys_init_module+0x12e/0x1b0
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix this bug by saving the address of the physical location
of the ibft; later the driver will use isa_bus_to_virt() to get
the correct virtual address.
N.B. On each reboot KASLR randomizes the virtual addresses so
assuming phys_to_virt before KASLR does its deed is incorrect.
Simplify the code by renaming find_ibft_region()
to reserve_ibft_region() and remove all the wrappers.
Signed-off-by: Maurizio Lombardi <mlombard(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad(a)kernel.org>
---
arch/x86/kernel/setup.c | 10 --------
drivers/firmware/iscsi_ibft.c | 10 +++++---
drivers/firmware/iscsi_ibft_find.c | 48 ++++++++++++++------------------------
include/linux/iscsi_ibft.h | 18 ++++++--------
4 files changed, 32 insertions(+), 54 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1e720626069a..b6a62af06a9f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -571,16 +571,6 @@ void __init reserve_standard_io_resources(void)
}
-static __init void reserve_ibft_region(void)
-{
- unsigned long addr, size = 0;
-
- addr = find_ibft_region(&size);
-
- if (size)
- memblock_reserve(addr, size);
-}
-
static bool __init snb_gfx_workaround_needed(void)
{
#ifdef CONFIG_PCI
diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
index 7127a04bca19..612a59e213df 100644
--- a/drivers/firmware/iscsi_ibft.c
+++ b/drivers/firmware/iscsi_ibft.c
@@ -84,8 +84,10 @@ MODULE_DESCRIPTION("sysfs interface to BIOS iBFT information");
MODULE_LICENSE("GPL");
MODULE_VERSION(IBFT_ISCSI_VERSION);
+static struct acpi_table_ibft *ibft_addr;
+
#ifndef CONFIG_ISCSI_IBFT_FIND
-struct acpi_table_ibft *ibft_addr;
+phys_addr_t ibft_phys_addr;
#endif
struct ibft_hdr {
@@ -858,11 +860,13 @@ static int __init ibft_init(void)
int rc = 0;
/*
- As on UEFI systems the setup_arch()/find_ibft_region()
+ As on UEFI systems the setup_arch()/reserve_ibft_region()
is called before ACPI tables are parsed and it only does
legacy finding.
*/
- if (!ibft_addr)
+ if (ibft_phys_addr)
+ ibft_addr = isa_bus_to_virt(ibft_phys_addr);
+ else
acpi_find_ibft_region();
if (ibft_addr) {
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb94523281..a0594590847d 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -31,8 +31,8 @@
/*
* Physical location of iSCSI Boot Format Table.
*/
-struct acpi_table_ibft *ibft_addr;
-EXPORT_SYMBOL_GPL(ibft_addr);
+phys_addr_t ibft_phys_addr;
+EXPORT_SYMBOL_GPL(ibft_phys_addr);
static const struct {
char *sign;
@@ -47,13 +47,24 @@ static const struct {
#define VGA_MEM 0xA0000 /* VGA buffer */
#define VGA_SIZE 0x20000 /* 128kB */
-static int __init find_ibft_in_mem(void)
+/*
+ * Routine used to find and reserve the iSCSI Boot Format Table
+ */
+void __init reserve_ibft_region(void)
{
unsigned long pos;
unsigned int len = 0;
void *virt;
int i;
+ ibft_phys_addr = 0;
+
+ /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
+ * only use ACPI for this
+ */
+ if (efi_enabled(EFI_BOOT))
+ return;
+
for (pos = IBFT_START; pos < IBFT_END; pos += 16) {
/* The table can't be inside the VGA BIOS reserved space,
* so skip that area */
@@ -70,35 +81,12 @@ static int __init find_ibft_in_mem(void)
/* if the length of the table extends past 1M,
* the table cannot be valid. */
if (pos + len <= (IBFT_END-1)) {
- ibft_addr = (struct acpi_table_ibft *)virt;
- pr_info("iBFT found at 0x%lx.\n", pos);
- goto done;
+ ibft_phys_addr = pos;
+ memblock_reserve(ibft_phys_addr, PAGE_ALIGN(len));
+ pr_info("iBFT found at 0x%lx.\n", ibft_phys_addr);
+ return;
}
}
}
}
-done:
- return len;
-}
-/*
- * Routine used to find the iSCSI Boot Format Table. The logical
- * kernel address is set in the ibft_addr global variable.
- */
-unsigned long __init find_ibft_region(unsigned long *sizep)
-{
- ibft_addr = NULL;
-
- /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
- * only use ACPI for this */
-
- if (!efi_enabled(EFI_BOOT))
- find_ibft_in_mem();
-
- if (ibft_addr) {
- *sizep = PAGE_ALIGN(ibft_addr->header.length);
- return (u64)virt_to_phys(ibft_addr);
- }
-
- *sizep = 0;
- return 0;
}
diff --git a/include/linux/iscsi_ibft.h b/include/linux/iscsi_ibft.h
index b7b45ca82bea..790e7fcfc1a6 100644
--- a/include/linux/iscsi_ibft.h
+++ b/include/linux/iscsi_ibft.h
@@ -13,26 +13,22 @@
#ifndef ISCSI_IBFT_H
#define ISCSI_IBFT_H
-#include <linux/acpi.h>
+#include <linux/types.h>
/*
- * Logical location of iSCSI Boot Format Table.
- * If the value is NULL there is no iBFT on the machine.
+ * Physical location of iSCSI Boot Format Table.
+ * If the value is 0 there is no iBFT on the machine.
*/
-extern struct acpi_table_ibft *ibft_addr;
+extern phys_addr_t ibft_phys_addr;
/*
* Routine used to find and reserve the iSCSI Boot Format Table. The
- * mapped address is set in the ibft_addr variable.
+ * physical address is set in the ibft_phys_addr variable.
*/
#ifdef CONFIG_ISCSI_IBFT_FIND
-unsigned long find_ibft_region(unsigned long *sizep);
+void reserve_ibft_region(void);
#else
-static inline unsigned long find_ibft_region(unsigned long *sizep)
-{
- *sizep = 0;
- return 0;
-}
+static inline void reserve_ibft_region(void) {}
#endif
#endif /* ISCSI_IBFT_H */
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
Culprit:
<cut>
commit 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date: Sun Aug 22 13:46:52 2021 -0500
[scudo][standalone] Link tests against libatomic if libatomic exists
It is possible that libatomic does not exist on some systems. This patch updates
the scudo standalone tests to link against libatomic if the library exists.
This is an update to the original patch: https://reviews.llvm.org/D64134 and
aims to resolve https://bugs.llvm.org/show_bug.cgi?id=51431.
Differential Revision: https://reviews.llvm.org/D108503
</cut>
Results regressed to (for first_bad == 4cd8dd3fe05e099792e1494dedd074eb5ba289b6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-4cd8dd3fe05e099792e1494dedd074eb5ba289b6/results_id:
1
# 447.dealII,dealII_base.default regressed by 103
from (for last_good == d8d84c9df82fc114f2b22a533a8183065ca1a2e0)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-d8d84c9df82fc114f2b22a533a8183065ca1a2e0/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4515
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4510
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-4cd8dd3fe05e099792e1494dedd074eb5ba289b6
cd investigate-llvm-4cd8dd3fe05e099792e1494dedd074eb5ba289b6
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d8d84c9df82fc114f2b22a533a8183065ca1a2e0
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date: Sun Aug 22 13:46:52 2021 -0500
[scudo][standalone] Link tests against libatomic if libatomic exists
It is possible that libatomic does not exist on some systems. This patch updates
the scudo standalone tests to link against libatomic if the library exists.
This is an update to the original patch: https://reviews.llvm.org/D64134 and
aims to resolve https://bugs.llvm.org/show_bug.cgi?id=51431.
Differential Revision: https://reviews.llvm.org/D108503
---
compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt b/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
index f4186eba1688..eaa47a04a179 100644
--- a/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
+++ b/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
@@ -39,7 +39,10 @@ foreach(lib ${SANITIZER_TEST_CXX_LIBRARIES})
endforeach()
list(APPEND LINK_FLAGS -pthread)
# Linking against libatomic is required with some compilers
-list(APPEND LINK_FLAGS -latomic)
+check_library_exists(atomic __atomic_load_8 "" COMPILER_RT_HAS_LIBATOMIC)
+if (COMPILER_RT_HAS_LIBATOMIC)
+ list(APPEND LINK_FLAGS -latomic)
+endif()
set(SCUDO_TEST_HEADERS
scudo_unit_test.h
</cut>
Progress (short week, 2 days)
* UM-2 [QEMU upstream maintainership]
+ Lots of code review and getting things upstream after trunk reopened
+ Wrote up a first draft of how to handle merging pullreqs, so that
other people can share this job with me
+ Sent a patchset that allows board models to mark some buses as
not suitable for plugging in user-created devices -- this avoids
problems with i2c devices appearing on buses that are supposed to
be for on-board devices only in the MPS2/MPS3 machines
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ MVE is now enabled upstream. (There are still some loose ends to
do under this JIRA task, though.)
+ Sent a patchset that makes some of the easier codegen optimizations
for the no-predication case. (Code review spotted an issue which
might be painful to sort out -- we'll see next week...)
-- PMM
Successfully identified regression in *binutils* in CI configuration tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz
Culprit:
<cut>
commit 590d3faada8a12bf0937bbf68413956dc6a339a9
Author: Tom de Vries <tdevries(a)suse.de>
Date: Mon Aug 30 10:30:26 2021 +0200
[gdb/testsuite] Improve argument syntax of proc arange
The current syntax of proc arange is:
...
proc arange { arange_start arange_length {comment ""} {seg_sel ""} } {
...
and a typical call looks like:
...
arange $start $len
...
This style is somewhat annoying because if you want to specify the last
parameter, you need to give the default values of all the other optional ones
before as well:
...
arange $start $len "" $seg_sel
...
Update the syntax to:
...
proc arange { options arange_start arange_length } {
parse_options {
{ comment "" }
{ seg_sel "" }
}
...
such that a typical call looks like:
...
arange {} $start $len
...
and a call using seg_sel looks like:
...
arange {
seg_sel $seg_sel
} $start $len
...
Also update proc aranges, which already has an options argument, to use the
new proc parse_options.
Tested on x86_64-linux.
Co-Authored-By: Simon Marchi <simon.marchi(a)polymtl.ca>
</cut>
Results regressed to (for first_bad == 590d3faada8a12bf0937bbf68413956dc6a339a9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_mthumb artifacts/build-590d3faada8a12bf0937bbf68413956dc6a339a9/results_id:
1
# 447.dealII,[.] contract<3> regressed by 200
from (for last_good == cb03dd22b36b7bd21a81137005ec42dab8355b62)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_mthumb artifacts/build-cb03dd22b36b7bd21a81137005ec42dab8355b62/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of last_good: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-master-arm-spec2k6-Oz/4418
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of first_bad: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-master-arm-spec2k6-Oz/4431
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-binutils-590d3faada8a12bf0937bbf68413956dc6a339a9
cd investigate-binutils-590d3faada8a12bf0937bbf68413956dc6a339a9
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/
cd binutils
# Reproduce first_bad build
git checkout --detach 590d3faada8a12bf0937bbf68413956dc6a339a9
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach cb03dd22b36b7bd21a81137005ec42dab8355b62
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 590d3faada8a12bf0937bbf68413956dc6a339a9
Author: Tom de Vries <tdevries(a)suse.de>
Date: Mon Aug 30 10:30:26 2021 +0200
[gdb/testsuite] Improve argument syntax of proc arange
The current syntax of proc arange is:
...
proc arange { arange_start arange_length {comment ""} {seg_sel ""} } {
...
and a typical call looks like:
...
arange $start $len
...
This style is somewhat annoying because if you want to specify the last
parameter, you need to give the default values of all the other optional ones
before as well:
...
arange $start $len "" $seg_sel
...
Update the syntax to:
...
proc arange { options arange_start arange_length } {
parse_options {
{ comment "" }
{ seg_sel "" }
}
...
such that a typical call looks like:
...
arange {} $start $len
...
and a call using seg_sel looks like:
...
arange {
seg_sel $seg_sel
} $start $len
...
Also update proc aranges, which already has an options argument, to use the
new proc parse_options.
Tested on x86_64-linux.
Co-Authored-By: Simon Marchi <simon.marchi(a)polymtl.ca>
---
gdb/testsuite/gdb.dlang/watch-loc.exp | 2 +-
gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp | 6 +-
.../gdb.dwarf2/frame-inlined-in-outer-frame.exp | 2 +-
.../template-specification-full-name.exp | 2 +-
gdb/testsuite/gdb.testsuite/parse_options_args.exp | 59 ++++++++++++
gdb/testsuite/lib/dwarf.exp | 31 +++---
gdb/testsuite/lib/gdb.exp | 104 ++++++++++++++-------
7 files changed, 150 insertions(+), 56 deletions(-)
diff --git a/gdb/testsuite/gdb.dlang/watch-loc.exp b/gdb/testsuite/gdb.dlang/watch-loc.exp
index 6e8b26e3109..e13400ed479 100644
--- a/gdb/testsuite/gdb.dlang/watch-loc.exp
+++ b/gdb/testsuite/gdb.dlang/watch-loc.exp
@@ -68,7 +68,7 @@ Dwarf::assemble $asm_file {
}
aranges {} cu_start {
- arange $dmain_start $dmain_length
+ arange {} $dmain_start $dmain_length
}
}
diff --git a/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp b/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp
index e65b4c8610a..d55b7fd150e 100644
--- a/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp
+++ b/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp
@@ -125,9 +125,9 @@ Dwarf::assemble $asm_file {
}
aranges {} cu_label {
- arange [lindex $main_func 0] [lindex $main_func 1]
- arange [lindex $frame2_func 0] [lindex $frame2_func 1]
- arange [lindex $frame3_func 0] [lindex $frame3_func 1]
+ arange {} [lindex $main_func 0] [lindex $main_func 1]
+ arange {} [lindex $frame2_func 0] [lindex $frame2_func 1]
+ arange {} [lindex $frame3_func 0] [lindex $frame3_func 1]
}
}
diff --git a/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp b/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp
index ff12cd79f19..f95558dffef 100644
--- a/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp
+++ b/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp
@@ -95,7 +95,7 @@ Dwarf::assemble $dwarf_asm {
}
aranges {} cu_label {
- arange __cu_low_pc __cu_high_pc
+ arange {} __cu_low_pc __cu_high_pc
}
}
diff --git a/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp b/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp
index 5c59777e1b6..6e736f2c8ef 100644
--- a/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp
+++ b/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp
@@ -69,7 +69,7 @@ Dwarf::assemble $asm_file {
}
aranges {} cu_start {
- arange "$main_start" "$main_length"
+ arange {} "$main_start" "$main_length"
}
}
diff --git a/gdb/testsuite/gdb.testsuite/parse_options_args.exp b/gdb/testsuite/gdb.testsuite/parse_options_args.exp
new file mode 100644
index 00000000000..ce14fc3cd7c
--- /dev/null
+++ b/gdb/testsuite/gdb.testsuite/parse_options_args.exp
@@ -0,0 +1,59 @@
+# Copyright 2021 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# Testsuite self-tests for parse_options and parse_args.
+
+with_test_prefix parse_options {
+ proc test1 { options a b } {
+ set v2 "defval2"
+ parse_options {
+ { opt1 defval1 }
+ { opt2 $v2 }
+ { opt3 }
+ { opt4 }
+ }
+
+ gdb_assert { [string equal $a "vala"] }
+ gdb_assert { [string equal $b "valb"] }
+ gdb_assert { [string equal $opt1 "val1"] }
+ gdb_assert { [string equal $opt2 "defval2"] }
+ gdb_assert { $opt3 == 1 }
+ gdb_assert { $opt4 == 0 }
+ }
+
+ set v1 "val1"
+ test1 { opt1 $v1 opt3 } "vala" "valb"
+}
+
+with_test_prefix parse_args {
+ proc test2 { args } {
+ parse_args {
+ { opt1 defval1 }
+ { opt2 defval2 }
+ { opt3 }
+ { opt4 }
+ }
+ gdb_assert { [llength $args] == 2 }
+ lassign $args a b
+ gdb_assert { [string equal $a "vala"] }
+ gdb_assert { [string equal $b "valb"] }
+ gdb_assert { [string equal $opt1 "val1"] }
+ gdb_assert { [string equal $opt2 "defval2"] }
+ gdb_assert { $opt3 == 1 }
+ gdb_assert { $opt4 == 0 }
+ }
+
+ set v1 "val1"
+ test2 -opt1 $v1 -opt3 "vala" "valb"
+}
diff --git a/gdb/testsuite/lib/dwarf.exp b/gdb/testsuite/lib/dwarf.exp
index 120fa418201..7fb3561a443 100644
--- a/gdb/testsuite/lib/dwarf.exp
+++ b/gdb/testsuite/lib/dwarf.exp
@@ -2212,7 +2212,12 @@ namespace eval Dwarf {
# Emit a DWARF .debug_aranges entry.
- proc arange { arange_start arange_length {comment ""} {seg_sel ""} } {
+ proc arange { options arange_start arange_length } {
+ parse_options {
+ { comment "" }
+ { seg_sel "" }
+ }
+
if { $comment != "" } {
# Wrap
set comment " ($comment)"
@@ -2270,22 +2275,14 @@ namespace eval Dwarf {
variable _addr_size
variable _seg_size
- # Establish the defaults.
- set is_64 0
- set cu_is_64 0
- set section_version 2
- set _seg_size 0
-
# Handle options.
- foreach { name value } $options {
- switch -exact -- $name {
- is_64 { set is_64 $value }
- cu_is_64 { set cu_is_64 $value }
- section_version {set section_version $value }
- seg_size { set _seg_size $value }
- default { error "unknown option $name" }
- }
+ parse_options {
+ { is_64 0 }
+ { cu_is_64 0 }
+ { section_version 2 }
+ { seg_size 0 }
}
+ set _seg_size $seg_size
if { [is_64_target] } {
set _addr_size 8
@@ -2354,9 +2351,9 @@ namespace eval Dwarf {
# Terminator tuple.
set comment "Terminator"
if { $_seg_size == 0 } {
- arange 0 0 $comment
+ arange {comment $comment} 0 0
} else {
- arange 0 0 $comment 0
+ arange {comment $comment seg_sel 0} 0 0
}
# End label.
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 093392709b4..3aea7baaab0 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -7293,8 +7293,8 @@ proc using_fission { } {
return [regexp -- "-gsplit-dwarf" $debug_flags]
}
-# Search the caller's ARGS list and set variables according to the list of
-# valid options described by ARGSET.
+# Search LISTNAME in uplevel LEVEL caller and set variables according to the
+# list of valid options with prefix PREFIX described by ARGSET.
#
# The first member of each one- or two-element list in ARGSET defines the
# name of a variable that will be added to the caller's scope.
@@ -7305,13 +7305,15 @@ proc using_fission { } {
#
# If two elements are given, the second element is the default value of
# the variable. This is then overwritten if the option exists in ARGS.
+# If EVAL, then subst is called on the value, which allows variables
+# to be used.
#
# Any parse_args elements in (the caller's) ARGS will be removed, leaving
# any optional components.
-
+#
# Example:
# proc myproc {foo args} {
-# parse_args {{bar} {baz "abc"} {qux}}
+# parse_list args 1 {{bar} {baz "abc"} {qux}} "-" false
# # ...
# }
# myproc ABC -bar -baz DEF peanut butter
@@ -7319,43 +7321,79 @@ proc using_fission { } {
# foo (=ABC), bar (=1), baz (=DEF), and qux (=0)
# args will be the list {peanut butter}
-proc parse_args { argset } {
- upvar args args
+proc parse_list { level listname argset prefix eval } {
+ upvar $level $listname args
foreach argument $argset {
- if {[llength $argument] == 1} {
- # No default specified, so we assume that we should set
- # the value to 1 if the arg is present and 0 if it's not.
- # It is assumed that no value is given with the argument.
- set result [lsearch -exact $args "-$argument"]
- if {$result != -1} then {
- uplevel 1 [list set $argument 1]
- set args [lreplace $args $result $result]
- } else {
- uplevel 1 [list set $argument 0]
- }
- } elseif {[llength $argument] == 2} {
- # There are two items in the argument. The second is a
- # default value to use if the item is not present.
- # Otherwise, the variable is set to whatever is provided
- # after the item in the args.
- set arg [lindex $argument 0]
- set result [lsearch -exact $args "-[lindex $arg 0]"]
- if {$result != -1} then {
- uplevel 1 [list set $arg [lindex $args [expr $result+1]]]
- set args [lreplace $args $result [expr $result+1]]
- } else {
- uplevel 1 [list set $arg [lindex $argument 1]]
- }
- } else {
- error "Badly formatted argument \"$argument\" in argument set"
- }
+ if {[llength $argument] == 1} {
+ # Normalize argument, strip leading/trailing whitespace.
+ # Allows us to treat {foo} and { foo } the same.
+ set argument [string trim $argument]
+
+ # No default specified, so we assume that we should set
+ # the value to 1 if the arg is present and 0 if it's not.
+ # It is assumed that no value is given with the argument.
+ set pattern "$prefix$argument"
+ set result [lsearch -exact $args $pattern]
+
+ if {$result != -1} then {
+ set value 1
+ set args [lreplace $args $result $result]
+ } else {
+ set value 0
+ }
+ uplevel $level [list set $argument $value]
+ } elseif {[llength $argument] == 2} {
+ # There are two items in the argument. The second is a
+ # default value to use if the item is not present.
+ # Otherwise, the variable is set to whatever is provided
+ # after the item in the args.
+ set arg [lindex $argument 0]
+ set pattern "$prefix[lindex $arg 0]"
+ set result [lsearch -exact $args $pattern]
+
+ if {$result != -1} then {
+ set value [lindex $args [expr $result+1]]
+ if { $eval } {
+ set value [uplevel [expr $level + 1] [list subst $value]]
+ }
+ set args [lreplace $args $result [expr $result+1]]
+ } else {
+ set value [lindex $argument 1]
+ if { $eval } {
+ set value [uplevel $level [list subst $value]]
+ }
+ }
+ uplevel $level [list set $arg $value]
+ } else {
+ error "Badly formatted argument \"$argument\" in argument set"
+ }
}
+}
+
+# Search the caller's args variable and set variables according to the list of
+# valid options described by ARGSET.
+
+proc parse_args { argset } {
+ parse_list 2 args $argset "-" false
# The remaining args should be checked to see that they match the
# number of items expected to be passed into the procedure...
}
+# Process the caller's options variable and set variables according
+# to the list of valid options described by OPTIONSET.
+
+proc parse_options { optionset } {
+ parse_list 2 options $optionset "" true
+
+ # Require no remaining options.
+ upvar 1 options options
+ if { [llength $options] != 0 } {
+ error "Options left unparsed: $options"
+ }
+}
+
# Capture the output of COMMAND in a string ignoring PREFIX (a regexp);
# return that string.
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap_O3. So far, this commit has regressed CI configurations:
- tcwg_gcc_bootstrap/master-arm-bootstrap_O3
Culprit:
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
</cut>
Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_O3:
2
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Full commit (up to 1000 lines):
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
---
gcc/expr.c | 19 ++++++++++++++++++-
gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
2 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/gcc/expr.c b/gcc/expr.c
index 096c0315ecc..5dd98a9bccc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
&& (GET_MODE_PRECISION (subreg_promoted_mode (x))
>= GET_MODE_PRECISION (int_mode))
&& SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp))
- x = gen_lowpart (int_mode, SUBREG_REG (x));
+ {
+ scalar_int_mode int_orig_mode;
+ machine_mode orig_mode = GET_MODE (x);
+ x = gen_lowpart (int_mode, SUBREG_REG (x));
+
+ /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+ the original mode, but narrower than the inner mode. */
+ if (GET_CODE (x) == SUBREG
+ && GET_MODE_PRECISION (subreg_promoted_mode (x))
+ > GET_MODE_PRECISION (int_mode)
+ && is_a <scalar_int_mode> (orig_mode, &int_orig_mode)
+ && GET_MODE_PRECISION (int_mode)
+ > GET_MODE_PRECISION (int_orig_mode))
+ {
+ SUBREG_PROMOTED_VAR_P (x) = 1;
+ SUBREG_PROMOTED_SET (x, unsignedp);
+ }
+ }
if (GET_MODE (x) != VOIDmode)
oldmode = GET_MODE (x);
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e431e0c19d7..ebad5cb5a79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_SIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_SIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 1);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Sign-extending a sign-extended subreg. */
+ return simplify_gen_unary (SIGN_EXTEND, mode,
+ subreg, subreg_mode);
}
/* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>).
@@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_UNSIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_UNSIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 0);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Zero-extending a zero-extended subreg. */
+ return simplify_gen_unary (ZERO_EXTEND, mode,
+ subreg, subreg_mode);
}
/* Extending a widening multiplication should be canonicalized to
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_gnu_native_build/master-aarch64. So far, this commit has regressed CI configurations:
- tcwg_gnu_native_build/master-aarch64
Culprit:
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
</cut>
Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/shared-object.mk:14: trunctfhf2.o] Error 1
from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Build log: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/con…
Full commit (up to 1000 lines):
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
---
gcc/expr.c | 19 ++++++++++++++++++-
gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
2 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/gcc/expr.c b/gcc/expr.c
index 096c0315ecc..5dd98a9bccc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
&& (GET_MODE_PRECISION (subreg_promoted_mode (x))
>= GET_MODE_PRECISION (int_mode))
&& SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp))
- x = gen_lowpart (int_mode, SUBREG_REG (x));
+ {
+ scalar_int_mode int_orig_mode;
+ machine_mode orig_mode = GET_MODE (x);
+ x = gen_lowpart (int_mode, SUBREG_REG (x));
+
+ /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+ the original mode, but narrower than the inner mode. */
+ if (GET_CODE (x) == SUBREG
+ && GET_MODE_PRECISION (subreg_promoted_mode (x))
+ > GET_MODE_PRECISION (int_mode)
+ && is_a <scalar_int_mode> (orig_mode, &int_orig_mode)
+ && GET_MODE_PRECISION (int_mode)
+ > GET_MODE_PRECISION (int_orig_mode))
+ {
+ SUBREG_PROMOTED_VAR_P (x) = 1;
+ SUBREG_PROMOTED_SET (x, unsignedp);
+ }
+ }
if (GET_MODE (x) != VOIDmode)
oldmode = GET_MODE (x);
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e431e0c19d7..ebad5cb5a79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_SIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_SIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 1);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Sign-extending a sign-extended subreg. */
+ return simplify_gen_unary (SIGN_EXTEND, mode,
+ subreg, subreg_mode);
}
/* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>).
@@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_UNSIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_UNSIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 0);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Zero-extending a zero-extended subreg. */
+ return simplify_gen_unary (ZERO_EXTEND, mode,
+ subreg, subreg_mode);
}
/* Extending a widening multiplication should be canonicalized to
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_gnu_cross_build/master-aarch64. So far, this commit has regressed CI configurations:
- tcwg_gnu_cross_build/master-aarch64
Culprit:
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
</cut>
Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:04:40 cc1: error: no include path in which to search for stdc-predef.h
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/static-object.mk:17: floatsitf.o] Error 1
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Build log: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/cons…
Full commit (up to 1000 lines):
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
---
gcc/expr.c | 19 ++++++++++++++++++-
gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
2 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/gcc/expr.c b/gcc/expr.c
index 096c0315ecc..5dd98a9bccc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
&& (GET_MODE_PRECISION (subreg_promoted_mode (x))
>= GET_MODE_PRECISION (int_mode))
&& SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp))
- x = gen_lowpart (int_mode, SUBREG_REG (x));
+ {
+ scalar_int_mode int_orig_mode;
+ machine_mode orig_mode = GET_MODE (x);
+ x = gen_lowpart (int_mode, SUBREG_REG (x));
+
+ /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+ the original mode, but narrower than the inner mode. */
+ if (GET_CODE (x) == SUBREG
+ && GET_MODE_PRECISION (subreg_promoted_mode (x))
+ > GET_MODE_PRECISION (int_mode)
+ && is_a <scalar_int_mode> (orig_mode, &int_orig_mode)
+ && GET_MODE_PRECISION (int_mode)
+ > GET_MODE_PRECISION (int_orig_mode))
+ {
+ SUBREG_PROMOTED_VAR_P (x) = 1;
+ SUBREG_PROMOTED_SET (x, unsignedp);
+ }
+ }
if (GET_MODE (x) != VOIDmode)
oldmode = GET_MODE (x);
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e431e0c19d7..ebad5cb5a79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_SIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_SIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 1);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Sign-extending a sign-extended subreg. */
+ return simplify_gen_unary (SIGN_EXTEND, mode,
+ subreg, subreg_mode);
}
/* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>).
@@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_UNSIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_UNSIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 0);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Zero-extending a zero-extended subreg. */
+ return simplify_gen_unary (ZERO_EXTEND, mode,
+ subreg, subreg_mode);
}
/* Extending a widening multiplication should be canonicalized to
</cut>
Successfully identified regression in *gdb* in CI configuration tcwg_gnu_native_build/master-arm. So far, this commit has regressed CI configurations:
- tcwg_gnu_native_build/master-arm
Culprit:
<cut>
commit 282aa4f7d292eb4bc213d028465a3b96f5af2f22
Author: Tom Tromey <tom(a)tromey.com>
Date: Sat Aug 28 13:16:50 2021 -0600
Add some parallel_for_each tests
Tom de Vries noticed that a patch in the DWARF scanner rewrite series
caused a regression in parallel_for_each -- it started crashing in the
case where the number of threads is 0 (there was an unchecked use of
"n-1" that was used to size an array).
He also pointed out that there were no tests of parallel_for_each.
This adds a few tests of parallel_for_each, primarily testing that
different settings for the number of threads will work. This test
catches the bug that he found in that series.
</cut>
Results regressed to (for first_bad == 282aa4f7d292eb4bc213d028465a3b96f5af2f22)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# First few build errors in logs:
# 00:03:45 ../../../../../../gdb/gdb/unittests/parallel-for-selftests.c:53:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’
# 00:03:45 make[1]: *** [unittests/parallel-for-selftests.o] Error 1
# 00:03:46 make: *** [all-gdb] Error 2
from (for last_good == ee8b88452c1cb1be97199942aee7a76bbca210ee)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gdb-282aa4f7d292eb4bc213d028465a3b96f5af2f22
cd investigate-gdb-282aa4f7d292eb4bc213d028465a3b96f5af2f22
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gdb/ ./ ./bisect/baseline/
cd gdb
# Reproduce first_bad build
git checkout --detach 282aa4f7d292eb4bc213d028465a3b96f5af2f22
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach ee8b88452c1cb1be97199942aee7a76bbca210ee
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac…
Build log: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/console…
Full commit (up to 1000 lines):
<cut>
commit 282aa4f7d292eb4bc213d028465a3b96f5af2f22
Author: Tom Tromey <tom(a)tromey.com>
Date: Sat Aug 28 13:16:50 2021 -0600
Add some parallel_for_each tests
Tom de Vries noticed that a patch in the DWARF scanner rewrite series
caused a regression in parallel_for_each -- it started crashing in the
case where the number of threads is 0 (there was an unchecked use of
"n-1" that was used to size an array).
He also pointed out that there were no tests of parallel_for_each.
This adds a few tests of parallel_for_each, primarily testing that
different settings for the number of threads will work. This test
catches the bug that he found in that series.
---
gdb/Makefile.in | 1 +
gdb/unittests/parallel-for-selftests.c | 86 ++++++++++++++++++++++++++++++++++
2 files changed, 87 insertions(+)
diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 73a1bf83c85..320d3326a81 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -456,6 +456,7 @@ SELFTESTS_SRCS = \
unittests/offset-type-selftests.c \
unittests/observable-selftests.c \
unittests/optional-selftests.c \
+ unittests/parallel-for-selftests.c \
unittests/parse-connection-spec-selftests.c \
unittests/ptid-selftests.c \
unittests/main-thread-selftests.c \
diff --git a/gdb/unittests/parallel-for-selftests.c b/gdb/unittests/parallel-for-selftests.c
new file mode 100644
index 00000000000..7f61b709fa7
--- /dev/null
+++ b/gdb/unittests/parallel-for-selftests.c
@@ -0,0 +1,86 @@
+/* Self tests for parallel_for_each
+
+ Copyright (C) 2021 Free Software Foundation, Inc.
+
+ This file is part of GDB.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include "defs.h"
+#include "gdbsupport/selftest.h"
+#include "gdbsupport/parallel-for.h"
+#include "gdbsupport/thread-pool.h"
+
+#if CXX_STD_THREAD
+
+namespace selftests {
+namespace parallel_for {
+
+struct save_restore_n_threads
+{
+ save_restore_n_threads ()
+ : n_threads (gdb::thread_pool::g_thread_pool->thread_count ())
+ {
+ }
+
+ ~save_restore_n_threads ()
+ {
+ gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
+ }
+
+ int n_threads;
+};
+
+static void
+test (int n_threads)
+{
+ save_restore_n_threads saver;
+ gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
+
+#define NUMBER 10000
+
+ std::atomic<int> counter = 0;
+ gdb::parallel_for_each (0, NUMBER,
+ [&] (int start, int end)
+ {
+ counter += end - start;
+ });
+
+ SELF_CHECK (counter == NUMBER);
+
+#undef NUMBER
+}
+
+static void
+test_n_threads ()
+{
+ test (0);
+ test (1);
+ test (3);
+}
+
+}
+}
+
+#endif /* CXX_STD_THREAD */
+
+void _initialize_parallel_for_selftests ();
+void
+_initialize_parallel_for_selftests ()
+{
+#ifdef CXX_STD_THREAD
+ selftests::register_test ("parallel_for",
+ selftests::parallel_for::test_n_threads);
+#endif /* CXX_STD_THREAD */
+}
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_gnu_cross_build/master-arm. So far, this commit has regressed CI configurations:
- tcwg_gnu_cross_build/master-arm
Culprit:
<cut>
commit caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
Author: Sebastian Huber <sebastian.huber(a)embedded-brains.de>
Date: Tue Aug 17 09:53:43 2021 +0200
Use __builtin_trap() for abort() if inhibit_libc
abort() is used in gcc_assert() and gcc_unreachable() which is used by target
libraries such as libgcov.a. This patch changes the abort() definition under
certain conditions. If inhibit_libc is defined and abort is not already
defined, then abort() is defined to __builtin_trap().
The inhibit_libc define is usually defined if GCC is built for targets running
in embedded systems which may optionally use a C standard library. If
inhibit_libc is defined, then there may be still a full featured abort()
available. abort() is a heavy weight function which depends on signals and
file streams. For statically linked applications, this means that a dependency
on gcc_assert() pulls in the support for signals and file streams. This could
prevent using gcov to test low end targets for example. Using __builtin_trap()
avoids these dependencies if the target implements a "trap" instruction. The
application or operating system could use a trap handler to react to failed GCC
runtime checks which caused a trap.
gcc/
* tsystem.h (abort): Define abort() if inhibit_libc is defined and it
is not already defined.
</cut>
Results regressed to (for first_bad == caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:01:44 cc1: error: no include path in which to search for stdc-predef.h
# 00:02:05 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/unwind-arm-common.inc:55:24: error: macro passed 1 arguments, but takes just 0
# 00:02:05 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/static-object.mk:17: unwind-arm.o] Error 1
# 00:02:06 make[1]: *** [Makefile:12484: all-target-libgcc] Error 2
# 00:02:06 make: *** [Makefile:953: all] Error 2
from (for last_good == d7e56b084d0b230ae5ee280f569d679fa0f09f4d)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
cd investigate-gcc-caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d7e56b084d0b230ae5ee280f569d679fa0f09f4d
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Build log: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/consoleT…
Full commit (up to 1000 lines):
<cut>
commit caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
Author: Sebastian Huber <sebastian.huber(a)embedded-brains.de>
Date: Tue Aug 17 09:53:43 2021 +0200
Use __builtin_trap() for abort() if inhibit_libc
abort() is used in gcc_assert() and gcc_unreachable() which is used by target
libraries such as libgcov.a. This patch changes the abort() definition under
certain conditions. If inhibit_libc is defined and abort is not already
defined, then abort() is defined to __builtin_trap().
The inhibit_libc define is usually defined if GCC is built for targets running
in embedded systems which may optionally use a C standard library. If
inhibit_libc is defined, then there may be still a full featured abort()
available. abort() is a heavy weight function which depends on signals and
file streams. For statically linked applications, this means that a dependency
on gcc_assert() pulls in the support for signals and file streams. This could
prevent using gcov to test low end targets for example. Using __builtin_trap()
avoids these dependencies if the target implements a "trap" instruction. The
application or operating system could use a trap handler to react to failed GCC
runtime checks which caused a trap.
gcc/
* tsystem.h (abort): Define abort() if inhibit_libc is defined and it
is not already defined.
---
gcc/tsystem.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/tsystem.h b/gcc/tsystem.h
index e1e6a96a4f4..5c72c69ff3e 100644
--- a/gcc/tsystem.h
+++ b/gcc/tsystem.h
@@ -59,7 +59,7 @@ extern int atexit (void (*)(void));
#endif
#ifndef abort
-extern void abort (void) __attribute__ ((__noreturn__));
+#define abort() __builtin_trap ()
#endif
#ifndef strlen
</cut>
Progress (short week, 3 days):
* UM-2 [QEMU upstream maintainership]
+ QEMU 6.1.0 has now been released
+ Sent out the first arm pullreq for the 6.2 cycle, including
another slice of the MVE patches
+ tried to work through some of the codereview backlog
-- PMM
Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap_debug. So far, this commit has regressed CI configurations:
- tcwg_gcc_bootstrap/master-arm-bootstrap_debug
Culprit:
<cut>
commit 1d244020246cb155e4de62ca3b302b920a1f513f
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Mon Aug 23 12:37:04 2021 +0100
Fold sign of LSHIFT_EXPR to eliminate no-op conversions.
This short patch teaches fold that it is "safe" to change the sign
of a left shift, to reduce the number of type conversions in gimple.
As an example:
unsigned int foo(unsigned int i) {
return (int)i << 8;
}
is currently optimized to:
unsigned int foo (unsigned int i)
{
int i.0_1;
int _2;
unsigned int _4;
<bb 2> [local count: 1073741824]:
i.0_1 = (int) i_3(D);
_2 = i.0_1 << 8;
_4 = (unsigned int) _2;
return _4;
}
with this patch, this now becomes:
unsigned int foo (unsigned int i)
{
unsigned int _2;
<bb 2> [local count: 1073741824]:
_2 = i_1(D) << 8;
return _2;
}
which generates exactly the same assembly language. Aside from the
reduced memory usage, the real benefit is that no-op conversions tend
to interfere with many folding optimizations. For example,
unsigned int bar(unsigned char i) {
return (i ^ (i<<16)) | (i<<8);
}
currently gets (tangled in conversions and) optimized to:
unsigned int bar (unsigned char i)
{
unsigned int _1;
unsigned int _2;
int _3;
int _4;
unsigned int _6;
unsigned int _8;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_5(D);
_2 = _1 * 65537;
_3 = (int) i_5(D);
_4 = _3 << 8;
_8 = (unsigned int) _4;
_6 = _2 | _8;
return _6;
}
but with this patch, bar now optimizes down to:
unsigned int bar(unsigned char i)
{
unsigned int _1;
unsigned int _4;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_3(D);
_4 = _1 * 65793;
return _4;
}
2021-08-23 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* match.pd (shift transformations): Change the sign of an
LSHIFT_EXPR if it reduces the number of explicit conversions.
gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-1.c: New test case.
* gcc.dg/fold-convlshift-2.c: New test case.
</cut>
Results regressed to (for first_bad == 1d244020246cb155e4de62ca3b302b920a1f513f)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:06:26 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored)
# 00:25:39 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored)
# 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: error: type mismatch in ‘lshift_expr’
# 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: internal compiler error: ‘verify_gimple’ failed
# 00:29:38 make[3]: *** [bitmap.o] Error 1
# 00:34:06 make[2]: *** [all-stage3-gcc] Error 2
# 00:34:06 make[1]: *** [stage3-bubble] Error 2
# 00:34:07 make: *** [all] Error 2
from (for last_good == b320edc0c29c838b0090c3c9be14187d132f73f2)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_debug:
2
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f
cd investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 1d244020246cb155e4de62ca3b302b920a1f513f
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b320edc0c29c838b0090c3c9be14187d132f73f2
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Full commit (up to 1000 lines):
<cut>
commit 1d244020246cb155e4de62ca3b302b920a1f513f
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Mon Aug 23 12:37:04 2021 +0100
Fold sign of LSHIFT_EXPR to eliminate no-op conversions.
This short patch teaches fold that it is "safe" to change the sign
of a left shift, to reduce the number of type conversions in gimple.
As an example:
unsigned int foo(unsigned int i) {
return (int)i << 8;
}
is currently optimized to:
unsigned int foo (unsigned int i)
{
int i.0_1;
int _2;
unsigned int _4;
<bb 2> [local count: 1073741824]:
i.0_1 = (int) i_3(D);
_2 = i.0_1 << 8;
_4 = (unsigned int) _2;
return _4;
}
with this patch, this now becomes:
unsigned int foo (unsigned int i)
{
unsigned int _2;
<bb 2> [local count: 1073741824]:
_2 = i_1(D) << 8;
return _2;
}
which generates exactly the same assembly language. Aside from the
reduced memory usage, the real benefit is that no-op conversions tend
to interfere with many folding optimizations. For example,
unsigned int bar(unsigned char i) {
return (i ^ (i<<16)) | (i<<8);
}
currently gets (tangled in conversions and) optimized to:
unsigned int bar (unsigned char i)
{
unsigned int _1;
unsigned int _2;
int _3;
int _4;
unsigned int _6;
unsigned int _8;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_5(D);
_2 = _1 * 65537;
_3 = (int) i_5(D);
_4 = _3 << 8;
_8 = (unsigned int) _4;
_6 = _2 | _8;
return _6;
}
but with this patch, bar now optimizes down to:
unsigned int bar(unsigned char i)
{
unsigned int _1;
unsigned int _4;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_3(D);
_4 = _1 * 65793;
return _4;
}
2021-08-23 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* match.pd (shift transformations): Change the sign of an
LSHIFT_EXPR if it reduces the number of explicit conversions.
gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-1.c: New test case.
* gcc.dg/fold-convlshift-2.c: New test case.
---
gcc/match.pd | 9 +++++++++
gcc/testsuite/gcc.dg/fold-convlshift-1.c | 20 ++++++++++++++++++++
gcc/testsuite/gcc.dg/fold-convlshift-2.c | 20 ++++++++++++++++++++
3 files changed, 49 insertions(+)
diff --git a/gcc/match.pd b/gcc/match.pd
index 0fcfd0ea62c..978a1b0172e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3385,6 +3385,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (integer_zerop (@2) || integer_all_onesp (@2))
(cmp @0 @2)))))
+/* Both signed and unsigned lshift produce the same result, so use
+ the form that minimizes the number of conversions. */
+(simplify
+ (convert (lshift:s@0 (convert:s@1 @2) INTEGER_CST@3))
+ (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+ && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+ && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type))
+ (lshift (convert @2) @3)))
+
/* Simplifications of conversions. */
/* Basic strip-useless-type-conversions / strip_nops. */
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-1.c b/gcc/testsuite/gcc.dg/fold-convlshift-1.c
new file mode 100644
index 00000000000..b6f57f81e72
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned int i)
+{
+ int t1 = i;
+ int t2 = t1 << 8;
+ return t2;
+}
+
+int bar(int i)
+{
+ unsigned int t1 = i;
+ unsigned int t2 = t1 << 8;
+ return t2;
+}
+
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(unsigned int\\)" "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-2.c b/gcc/testsuite/gcc.dg/fold-convlshift-2.c
new file mode 100644
index 00000000000..f21358c4584
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned char c)
+{
+ int t1 = c;
+ int t2 = t1 << 8;
+ return t2;
+}
+
+int bar(unsigned char c)
+{
+ unsigned int t1 = c;
+ unsigned int t2 = t1 << 8;
+ return t2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\(int\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 1 "optimized" } } */
+
</cut>
Hi everyone,
We are shifting the ClangBuiltLinux mailing list from
clang-built-linux(a)googlegroups.com to llvm(a)lists.linux.dev. Google
Groups has served us well but moving to lists.linux.dev allows for
easier archival (as we will be on lore.kernel.org automatically) and
allows for people to subscribe to us easier, as they only need an email
address, rather than a Google account.
Please follow these directions to subscribe to the new mailing list:
https://subspace.kernel.org/index.html#subscribing
Some more information about lists.linux.dev:
https://www.kernel.org/lists-linux-dev.htmlhttps://subspace.kernel.org/lists.linux.dev.html
I have added CI maintainers/mailing lists that send us regular reports
to this announcement. Please continue to send us emails about build
results, just switch the email from clang-built-linux(a)googlegroups.com
to llvm(a)lists.linux.dev so that they get archived as a part of lore and
can be easily searched, especially with the upcoming
https://x-lore.kernel.org/all/.
I will send a patch shortly to update MAINTAINERS.
Cheers,
Nathan
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO
Culprit:
<cut>
commit 880822255e21179e9706ebaf77fff9111d9d3844
Author: Tobias Gysi <gysit(a)google.com>
Date: Wed Mar 24 14:22:17 2021 +0000
[mlir][linalg] Do not call region builder during vectorization.
All linalg operations having a region builder shall call it during op creation. Calling it during vectorization is obsolete.
Differential Revision: https://reviews.llvm.org/D99168
</cut>
Results regressed to (for first_bad == 880822255e21179e9706ebaf77fff9111d9d3844)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO_marm artifacts/build-880822255e21179e9706ebaf77fff9111d9d3844/results_id:
1
# 456.hmmer,hmmer_base.default regressed by 104
from (for last_good == 92417ebbd10382436136ed5e755be567304ac139)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO_marm artifacts/build-92417ebbd10382436136ed5e755be567304ac139/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4267
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4268
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-880822255e21179e9706ebaf77fff9111d9d3844
cd investigate-llvm-880822255e21179e9706ebaf77fff9111d9d3844
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 880822255e21179e9706ebaf77fff9111d9d3844
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 92417ebbd10382436136ed5e755be567304ac139
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 880822255e21179e9706ebaf77fff9111d9d3844
Author: Tobias Gysi <gysit(a)google.com>
Date: Wed Mar 24 14:22:17 2021 +0000
[mlir][linalg] Do not call region builder during vectorization.
All linalg operations having a region builder shall call it during op creation. Calling it during vectorization is obsolete.
Differential Revision: https://reviews.llvm.org/D99168
---
.../Dialect/Linalg/Transforms/Vectorization.cpp | 31 ++++++----------------
1 file changed, 8 insertions(+), 23 deletions(-)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index d4581013ae69..10562d68a9e0 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -288,7 +288,7 @@ static AffineMap getTransferReadMap(LinalgOp linalgOp, unsigned argIndex) {
/// Generic vectorization function that rewrites the body of a `linalgOp` into
/// vector form. Generic vectorization proceeds as follows:
-/// 1. The region for the linalg op is created if necessary.
+/// 1. Verify the `linalgOp` has one non-empty region.
/// 2. Values defined above the region are mapped to themselves and will be
/// broadcasted on a per-need basis by their consumers.
/// 3. Each region argument is vectorized into a vector.transfer_read (or 0-d
@@ -299,36 +299,21 @@ static AffineMap getTransferReadMap(LinalgOp linalgOp, unsigned argIndex) {
LogicalResult vectorizeAsLinalgGeneric(
OpBuilder &builder, LinalgOp linalgOp, SmallVectorImpl<Value> &newResults,
ArrayRef<CustomVectorizationHook> customVectorizationHooks = {}) {
- // 1. Certain Linalg ops do not have a region but only a region builder.
- // If so, build the region so we can vectorize.
- std::unique_ptr<Region> owningRegion;
- Region *region;
- if (linalgOp->getNumRegions() > 0) {
- region = &linalgOp->getRegion(0);
- } else {
- // RAII avoid remaining in block.
- OpBuilder::InsertionGuard g(builder);
- owningRegion = std::make_unique<Region>();
- region = owningRegion.get();
- Block *block = builder.createBlock(region);
- auto elementTypes = llvm::to_vector<4>(
- llvm::map_range(linalgOp.getShapedOperandTypes(),
- [](ShapedType t) { return t.getElementType(); }));
- block->addArguments(elementTypes);
- linalgOp.getRegionBuilder()(*block, /*captures=*/{});
- }
- Block *block = ®ion->front();
+ // 1. Fail to vectorize if the operation does not have one non-empty region.
+ if (linalgOp->getNumRegions() != 1 || linalgOp->getRegion(0).empty())
+ return failure();
+ auto &block = linalgOp->getRegion(0).front();
BlockAndValueMapping bvm;
// 2. Values defined above the region can only be broadcast for now. Make them
// map to themselves.
llvm::SetVector<Value> valuesSet;
- mlir::getUsedValuesDefinedAbove(*region, valuesSet);
+ mlir::getUsedValuesDefinedAbove(linalgOp->getRegion(0), valuesSet);
bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());
// 3. Turn all BBArgs into vector.transfer_read / load.
SmallVector<AffineMap> indexings;
- for (auto bbarg : block->getArguments()) {
+ for (auto bbarg : block.getArguments()) {
Value vectorArg = linalgOp.getShapedOperand(bbarg.getArgNumber());
AffineMap map;
VectorType vectorType = extractVectorTypeFromShapedValue(vectorArg);
@@ -360,7 +345,7 @@ LogicalResult vectorizeAsLinalgGeneric(
hooks.push_back(vectorizeYield);
// 5. Iteratively call `vectorizeOneOp` to each op in the slice.
- for (Operation &op : block->getOperations()) {
+ for (Operation &op : block.getOperations()) {
VectorizationResult result = vectorizeOneOp(builder, &op, bvm, hooks);
if (result.status == VectorizationStatus::Failure) {
LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: failed to vectorize: " << op);
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
Culprit:
<cut>
commit 643ce61fb3c2c730b7ecead4a489eaeef3f053ea
Author: Akira Hatanaka <ahatanaka(a)apple.com>
Date: Wed Aug 11 12:55:28 2021 -0700
[ObjC][ARC] Don't form a StoreStrong call if it is unsafe to move the
release call
findSafeStoreForStoreStrongContraction checks whether it's safe to move
the release call to the store by inspecting all instructions between the
two, but was ignoring retain instructions. This was causing objects to
be released and deallocated before they were retained.
rdar://81668577
</cut>
Results regressed to (for first_bad == 643ce61fb3c2c730b7ecead4a489eaeef3f053ea)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO artifacts/build-643ce61fb3c2c730b7ecead4a489eaeef3f053ea/results_id:
1
# 433.milc,milc_base.default regressed by 103
from (for last_good == 767496d19cb9a1fbba57ff08095faa161998ee36)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO artifacts/build-767496d19cb9a1fbba57ff08095faa161998ee36/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/4172
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/4171
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-643ce61fb3c2c730b7ecead4a489eaeef3f053ea
cd investigate-llvm-643ce61fb3c2c730b7ecead4a489eaeef3f053ea
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 643ce61fb3c2c730b7ecead4a489eaeef3f053ea
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 767496d19cb9a1fbba57ff08095faa161998ee36
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 643ce61fb3c2c730b7ecead4a489eaeef3f053ea
Author: Akira Hatanaka <ahatanaka(a)apple.com>
Date: Wed Aug 11 12:55:28 2021 -0700
[ObjC][ARC] Don't form a StoreStrong call if it is unsafe to move the
release call
findSafeStoreForStoreStrongContraction checks whether it's safe to move
the release call to the store by inspecting all instructions between the
two, but was ignoring retain instructions. This was causing objects to
be released and deallocated before they were retained.
rdar://81668577
---
llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp | 21 ++++++++++++---------
.../test/Transforms/ObjCARC/contract-storestrong.ll | 19 +++++++++++++++++++
2 files changed, 31 insertions(+), 9 deletions(-)
diff --git a/llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp b/llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp
index 62161b5b6b40..577973c80601 100644
--- a/llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp
+++ b/llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp
@@ -226,13 +226,6 @@ static StoreInst *findSafeStoreForStoreStrongContraction(LoadInst *Load,
// of Inst.
ARCInstKind Class = GetBasicARCInstKind(Inst);
- // If Inst is an unrelated retain, we don't care about it.
- //
- // TODO: This is one area where the optimization could be made more
- // aggressive.
- if (IsRetain(Class))
- continue;
-
// If we have seen the store, but not the release...
if (Store) {
// We need to make sure that it is safe to move the release from its
@@ -248,8 +241,18 @@ static StoreInst *findSafeStoreForStoreStrongContraction(LoadInst *Load,
return nullptr;
}
- // Ok, now we know we have not seen a store yet. See if Inst can write to
- // our load location, if it can not, just ignore the instruction.
+ // Ok, now we know we have not seen a store yet.
+
+ // If Inst is a retain, we don't care about it as it doesn't prevent moving
+ // the load to the store.
+ //
+ // TODO: This is one area where the optimization could be made more
+ // aggressive.
+ if (IsRetain(Class))
+ continue;
+
+ // See if Inst can write to our load location, if it can not, just ignore
+ // the instruction.
if (!isModSet(AA->getModRefInfo(Inst, Loc)))
continue;
diff --git a/llvm/test/Transforms/ObjCARC/contract-storestrong.ll b/llvm/test/Transforms/ObjCARC/contract-storestrong.ll
index eff0a6fdf900..9c45e3334d83 100644
--- a/llvm/test/Transforms/ObjCARC/contract-storestrong.ll
+++ b/llvm/test/Transforms/ObjCARC/contract-storestrong.ll
@@ -256,6 +256,25 @@ define i8* @test13(i8* %a0, i8* %a1, i8** %addr, i8* %new) {
ret i8* %retained
}
+; Cannot form a storeStrong call because it's unsafe to move the release call to
+; the store.
+
+; CHECK-LABEL: define void @test14(
+; CHECK: %[[V0:.*]] = load i8*, i8** %a
+; CHECK: %[[V1:.*]] = call i8* @llvm.objc.retain(i8* %p)
+; CHECK: store i8* %[[V1]], i8** %a
+; CHECK: %[[V2:.*]] = call i8* @llvm.objc.retain(i8* %[[V0]])
+; CHECK: call void @llvm.objc.release(i8* %[[V2]])
+
+define void @test14(i8** %a, i8* %p) {
+ %v0 = load i8*, i8** %a, align 8
+ %v1 = call i8* @llvm.objc.retain(i8* %p)
+ store i8* %p, i8** %a, align 8
+ %v2 = call i8* @llvm.objc.retain(i8* %v0)
+ call void @llvm.objc.release(i8* %v0)
+ ret void
+}
+
!0 = !{}
; CHECK: attributes [[NUW]] = { nounwind }
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO
Culprit:
<cut>
commit 176379e0c8f9dbde2b357fb3b6a6802b83282e71
Author: Alex Zinenko <zinenko(a)google.com>
Date: Fri Feb 12 12:53:27 2021 +0100
[mlir] Use the interface-based translation for LLVM "intrinsic" dialects
Port the translation of five dialects that define LLVM IR intrinsics
(LLVMAVX512, LLVMArmNeon, LLVMArmSVE, NVVM, ROCDL) to the new dialect
interface-based mechanism. This allows us to remove individual translations
that were created for each of these dialects and just use one common
MLIR-to-LLVM-IR translation that potentially supports all dialects instead,
based on what is registered and including any combination of translatable
dialects. This removal was one of the main goals of the refactoring.
To support the addition of GPU-related metadata, the translation interface is
extended with the `amendOperation` function that allows the interface
implementation to post-process any translated operation with dialect attributes
from the dialect for which the interface is implemented regardless of the
operation's dialect. This is currently applied to "kernel" functions, but can
be used to construct other metadata in dialect-specific ways without
necessarily affecting operations.
Depends On D96591, D96504
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96592
</cut>
Results regressed to (for first_bad == 176379e0c8f9dbde2b357fb3b6a6802b83282e71)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO_marm artifacts/build-176379e0c8f9dbde2b357fb3b6a6802b83282e71/results_id:
1
# 482.sphinx3,sphinx_livepretend_base.default regressed by 109
from (for last_good == 2d728bbff5c688284b8b8306ecfd3000b0ab8bb1)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO_marm artifacts/build-2d728bbff5c688284b8b8306ecfd3000b0ab8bb1/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4128
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4125
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-176379e0c8f9dbde2b357fb3b6a6802b83282e71
cd investigate-llvm-176379e0c8f9dbde2b357fb3b6a6802b83282e71
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 176379e0c8f9dbde2b357fb3b6a6802b83282e71
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 2d728bbff5c688284b8b8306ecfd3000b0ab8bb1
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 176379e0c8f9dbde2b357fb3b6a6802b83282e71
Author: Alex Zinenko <zinenko(a)google.com>
Date: Fri Feb 12 12:53:27 2021 +0100
[mlir] Use the interface-based translation for LLVM "intrinsic" dialects
Port the translation of five dialects that define LLVM IR intrinsics
(LLVMAVX512, LLVMArmNeon, LLVMArmSVE, NVVM, ROCDL) to the new dialect
interface-based mechanism. This allows us to remove individual translations
that were created for each of these dialects and just use one common
MLIR-to-LLVM-IR translation that potentially supports all dialects instead,
based on what is registered and including any combination of translatable
dialects. This removal was one of the main goals of the refactoring.
To support the addition of GPU-related metadata, the translation interface is
extended with the `amendOperation` function that allows the interface
implementation to post-process any translated operation with dialect attributes
from the dialect for which the interface is implemented regardless of the
operation's dialect. This is currently applied to "kernel" functions, but can
be used to construct other metadata in dialect-specific ways without
necessarily affecting operations.
Depends On D96591, D96504
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96592
---
.../test/Standalone/standalone-translate.mlir | 3 -
mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td | 13 ++-
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | 3 +-
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | 2 +-
mlir/include/mlir/InitAllTranslations.h | 10 --
mlir/include/mlir/Target/LLVMIR.h | 4 +-
.../LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h | 37 ++++++
.../LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h | 37 ++++++
.../LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h | 37 ++++++
.../Dialect/LLVMIR/LLVMToLLVMIRTranslation.h | 4 +-
.../LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h | 42 +++++++
.../Dialect/OpenMP/OpenMPToLLVMIRTranslation.h | 4 +-
.../Dialect/ROCDL/ROCDLToLLVMIRTranslation.h | 42 +++++++
.../mlir/Target/LLVMIR/LLVMTranslationInterface.h | 28 ++++-
.../include/mlir/Target/LLVMIR/ModuleTranslation.h | 32 ++++--
mlir/include/mlir/Target/NVVMIR.h | 39 -------
mlir/include/mlir/Target/ROCDLIR.h | 40 -------
mlir/lib/Target/CMakeLists.txt | 107 +----------------
mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp | 35 +++++-
mlir/lib/Target/LLVMIR/ConvertToNVVMIR.cpp | 124 --------------------
mlir/lib/Target/LLVMIR/ConvertToROCDLIR.cpp | 127 ---------------------
mlir/lib/Target/LLVMIR/Dialect/CMakeLists.txt | 5 +
.../LLVMIR/Dialect/LLVMAVX512/CMakeLists.txt | 16 +++
.../LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.cpp | 33 ++++++
.../LLVMIR/Dialect/LLVMArmNeon/CMakeLists.txt | 16 +++
.../LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.cpp | 33 ++++++
.../LLVMIR/Dialect/LLVMArmSVE/CMakeLists.txt | 16 +++
.../LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.cpp | 33 ++++++
.../Dialect/LLVMIR/LLVMToLLVMIRTranslation.cpp | 71 ++++--------
mlir/lib/Target/LLVMIR/Dialect/NVVM/CMakeLists.txt | 16 +++
.../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp | 67 +++++++++++
.../lib/Target/LLVMIR/Dialect/ROCDL/CMakeLists.txt | 16 +++
.../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp | 69 +++++++++++
mlir/lib/Target/LLVMIR/LLVMAVX512Intr.cpp | 65 -----------
mlir/lib/Target/LLVMIR/LLVMArmNeonIntr.cpp | 65 -----------
mlir/lib/Target/LLVMIR/LLVMArmSVEIntr.cpp | 65 -----------
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | 65 +++++++----
mlir/test/Target/arm-neon.mlir | 2 +-
mlir/test/Target/arm-sve.mlir | 2 +-
mlir/test/Target/avx512.mlir | 2 +-
mlir/test/Target/nvvmir.mlir | 2 +-
mlir/test/Target/rocdl.mlir | 2 +-
mlir/test/lib/Transforms/CMakeLists.txt | 13 ++-
.../lib/Transforms/TestConvertGPUKernelToCubin.cpp | 25 +++-
.../lib/Transforms/TestConvertGPUKernelToHsaco.cpp | 22 +++-
mlir/tools/mlir-cuda-runner/mlir-cuda-runner.cpp | 6 +-
mlir/tools/mlir-rocm-runner/mlir-rocm-runner.cpp | 3 +
mlir/tools/mlir-tblgen/LLVMIRConversionGen.cpp | 10 +-
48 files changed, 750 insertions(+), 760 deletions(-)
diff --git a/mlir/examples/standalone/test/Standalone/standalone-translate.mlir b/mlir/examples/standalone/test/Standalone/standalone-translate.mlir
index 2a096c38e128..16d49785ee16 100644
--- a/mlir/examples/standalone/test/Standalone/standalone-translate.mlir
+++ b/mlir/examples/standalone/test/Standalone/standalone-translate.mlir
@@ -1,8 +1,5 @@
// RUN: standalone-translate --help | FileCheck %s
-// CHECK: --avx512-mlir-to-llvmir
// CHECK: --deserialize-spirv
// CHECK: --import-llvm
// CHECK: --mlir-to-llvmir
-// CHECK: --mlir-to-nvvmir
-// CHECK: --mlir-to-rocdlir
// CHECK: --serialize-spirv
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
index ad886e55b4e6..541f7ebfadfa 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
@@ -219,12 +219,13 @@ class ListIntSubst<string pattern, list<int> values> {
// or result in the operation.
def LLVM_IntrPatterns {
string operand =
- [{convertType(opInst.getOperand($0).getType())}];
+ [{moduleTranslation.convertType(opInst.getOperand($0).getType())}];
string result =
- [{convertType(opInst.getResult($0).getType())}];
+ [{moduleTranslation.convertType(opInst.getResult($0).getType())}];
string structResult =
- [{convertType(opInst.getResult(0).getType().cast<LLVM::LLVMStructType>()
- .getBody()[$0])}];
+ [{moduleTranslation.convertType(
+ opInst.getResult(0).getType().cast<LLVM::LLVMStructType>()
+ .getBody()[$0])}];
}
@@ -259,7 +260,7 @@ class LLVM_IntrOpBase<Dialect dialect, string opName, string enumName,
ListIntSubst<LLVM_IntrPatterns.operand,
overloadedOperands>.lst), ", ") # [{
});
- auto operands = lookupValues(opInst.getOperands());
+ auto operands = moduleTranslation.lookupValues(opInst.getOperands());
}] # !if(!gt(numResults, 0), "$res = ", "")
# [{builder.CreateCall(fn, operands);
}];
@@ -325,7 +326,7 @@ class LLVM_VectorReductionAcc<string mnem>
{ }] # !interleave(ListIntSubst<LLVM_IntrPatterns.operand, [1]>.lst,
", ") # [{
});
- auto operands = lookupValues(opInst.getOperands());
+ auto operands = moduleTranslation.lookupValues(opInst.getOperands());
llvm::FastMathFlags origFM = builder.getFastMathFlags();
llvm::FastMathFlags tempFM = origFM;
tempFM.setAllowReassoc($reassoc);
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
index 7a2152b9a481..7dca08a43f7a 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
@@ -1083,7 +1083,8 @@ def LLVM_UndefOp : LLVM_Op<"mlir.undef", [NoSideEffect]>,
def LLVM_ConstantOp
: LLVM_Op<"mlir.constant", [NoSideEffect]>,
- LLVM_Builder<"$res = getLLVMConstant($_resultType, $value, $_location);">
+ LLVM_Builder<[{$res = getLLVMConstant($_resultType, $value, $_location,
+ moduleTranslation);}]>
{
let summary = "Defines a constant of LLVM type.";
let description = [{
diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index d5eec3cebb4a..cfb08ff465a2 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -175,7 +175,7 @@ def ROCDL_MubufStoreOp :
LLVM_Type:$glc,
LLVM_Type:$slc)>{
string llvmBuilder = [{
- auto vdataType = convertType(op.vdata().getType());
+ auto vdataType = moduleTranslation.convertType(op.vdata().getType());
createIntrinsicCall(builder,
llvm::Intrinsic::amdgcn_buffer_store, {$vdata, $rsrc, $vindex,
$offset, $glc, $slc}, {vdataType});
diff --git a/mlir/include/mlir/InitAllTranslations.h b/mlir/include/mlir/InitAllTranslations.h
index 16dd113d14cd..fc319c09a8c8 100644
--- a/mlir/include/mlir/InitAllTranslations.h
+++ b/mlir/include/mlir/InitAllTranslations.h
@@ -20,11 +20,6 @@ void registerFromLLVMIRTranslation();
void registerFromSPIRVTranslation();
void registerToLLVMIRTranslation();
void registerToSPIRVTranslation();
-void registerToNVVMIRTranslation();
-void registerToROCDLIRTranslation();
-void registerArmNeonToLLVMIRTranslation();
-void registerAVX512ToLLVMIRTranslation();
-void registerArmSVEToLLVMIRTranslation();
// This function should be called before creating any MLIRContext if one
// expects all the possible translations to be made available to the context
@@ -35,11 +30,6 @@ inline void registerAllTranslations() {
registerFromSPIRVTranslation();
registerToLLVMIRTranslation();
registerToSPIRVTranslation();
- registerToNVVMIRTranslation();
- registerToROCDLIRTranslation();
- registerArmNeonToLLVMIRTranslation();
- registerAVX512ToLLVMIRTranslation();
- registerArmSVEToLLVMIRTranslation();
return true;
}();
(void)initOnce;
diff --git a/mlir/include/mlir/Target/LLVMIR.h b/mlir/include/mlir/Target/LLVMIR.h
index 2050c63df73f..10bec79f3506 100644
--- a/mlir/include/mlir/Target/LLVMIR.h
+++ b/mlir/include/mlir/Target/LLVMIR.h
@@ -28,14 +28,14 @@ namespace mlir {
class DialectRegistry;
class OwningModuleRef;
class MLIRContext;
-class ModuleOp;
+class Operation;
/// Convert the given MLIR module into LLVM IR. The LLVM context is extracted
/// from the registered LLVM IR dialect. In case of error, report it
/// to the error handler registered with the MLIR context, if any (obtained from
/// the MLIR module), and return `nullptr`.
std::unique_ptr<llvm::Module>
-translateModuleToLLVMIR(ModuleOp m, llvm::LLVMContext &llvmContext,
+translateModuleToLLVMIR(Operation *op, llvm::LLVMContext &llvmContext,
StringRef name = "LLVMDialectModule");
/// Convert the given LLVM module into MLIR's LLVM dialect. The LLVM context is
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h
new file mode 100644
index 000000000000..e591a95f0357
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h
@@ -0,0 +1,37 @@
+//===- LLVMAVX512ToLLVMIRTranslation.h - LLVMAVX512 to LLVM IR --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the LLVMAVX512
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_LLVMAVX512_LLVMAVX512TOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_LLVMAVX512_LLVMAVX512TOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the LLVMAVX512 dialect to LLVM IR.
+class LLVMAVX512DialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_LLVMAVX512_LLVMAVX512TOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h
new file mode 100644
index 000000000000..7d268d155083
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h
@@ -0,0 +1,37 @@
+//===- LLVMArmNeonToLLVMIRTranslation.h - LLVMArmNeon to LLVMIR -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the LLVMArmNeon
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_LLVMARMNEON_LLVMARMNEONTOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_LLVMARMNEON_LLVMARMNEONTOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the LLVMArmNeon dialect to LLVM IR.
+class LLVMArmNeonDialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_LLVMARMNEON_LLVMARMNEONTOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h
new file mode 100644
index 000000000000..9d4d05b9b9bd
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h
@@ -0,0 +1,37 @@
+//===- LLVMArmSVEToLLVMIRTranslation.h - LLVMArmSVE to LLVM IR --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the LLVMArmSVE
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_LLVMARMSVE_LLVMARMSVETOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_LLVMARMSVE_LLVMARMSVETOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the LLVMArmSVE dialect to LLVM IR.
+class LLVMArmSVEDialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_LLVMARMSVE_LLVMARMSVETOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h
index 8b72cedf1ff2..2af76e092917 100644
--- a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h
@@ -18,8 +18,8 @@
namespace mlir {
-/// Implementation of the dialect interface that converts operations beloning to
-/// the LLVM dialect to LLVM IR.
+/// Implementation of the dialect interface that converts operations belonging
+/// to the LLVM dialect to LLVM IR.
class LLVMDialectLLVMIRTranslationInterface
: public LLVMTranslationDialectInterface {
public:
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h
new file mode 100644
index 000000000000..3a8a01df84b0
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h
@@ -0,0 +1,42 @@
+//===- NVVMToLLVMIRTranslation.h - NVVM to LLVM IR --------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the NVVM
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_NVVM_NVVMTOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_NVVM_NVVMTOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the NVVM dialect to LLVM IR.
+class NVVMDialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+
+ /// Attaches module-level metadata for functions marked as kernels.
+ LogicalResult
+ amendOperation(Operation *op, NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_NVVM_NVVMTOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h
index 7d9eeea9462e..07721d089689 100644
--- a/mlir/include/mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h
@@ -18,8 +18,8 @@
namespace mlir {
-/// Implementation of the dialect interface that converts operations beloning to
-/// the OpenMP dialect to LLVM IR.
+/// Implementation of the dialect interface that converts operations belonging
+/// to the OpenMP dialect to LLVM IR.
class OpenMPDialectLLVMIRTranslationInterface
: public LLVMTranslationDialectInterface {
public:
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h
new file mode 100644
index 000000000000..e2211a59098f
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h
@@ -0,0 +1,42 @@
+//===- ROCDLToLLVMIRTranslation.h - ROCDL to LLVM IR ------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the ROCDL
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_ROCDL_ROCDLTOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_ROCDL_ROCDLTOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the ROCDL dialect to LLVM IR.
+class ROCDLDialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+
+ /// Attaches module-level metadata for functions marked as kernels.
+ LogicalResult
+ amendOperation(Operation *op, NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_ROCDL_ROCDLTOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/LLVMTranslationInterface.h b/mlir/include/mlir/Target/LLVMIR/LLVMTranslationInterface.h
index 0063beac2977..0c563e6e7d39 100644
--- a/mlir/include/mlir/Target/LLVMIR/LLVMTranslationInterface.h
+++ b/mlir/include/mlir/Target/LLVMIR/LLVMTranslationInterface.h
@@ -13,12 +13,14 @@
#ifndef MLIR_TARGET_LLVMIR_LLVMTRANSLATIONINTERFACE_H
#define MLIR_TARGET_LLVMIR_LLVMTRANSLATIONINTERFACE_H
+#include "mlir/IR/Attributes.h"
#include "mlir/IR/DialectInterface.h"
+#include "mlir/IR/Identifier.h"
#include "mlir/Support/LogicalResult.h"
namespace llvm {
class IRBuilderBase;
-}
+} // namespace llvm
namespace mlir {
namespace LLVM {
@@ -43,6 +45,18 @@ public:
LLVM::ModuleTranslation &moduleTranslation) const {
return failure();
}
+
+ /// Hook for derived dialect interface to act on an operation that has dialect
+ /// attributes from the derived dialect (the operation itself may be from a
+ /// different dialect). This gets called after the operation has been
+ /// translated. The hook is expected to use moduleTranslation to look up the
+ /// translation results and amend the corresponding IR constructs. Does
+ /// nothing and succeeds by default.
+ virtual LogicalResult
+ amendOperation(Operation *op, NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const {
+ return success();
+ }
};
/// Interface collection for translation to LLVM IR, dispatches to a concrete
@@ -61,6 +75,18 @@ public:
return iface->convertOperation(op, builder, moduleTranslation);
return failure();
}
+
+ /// Acts on the given operation using the interface implemented by the dialect
+ /// of one of the operation's dialect attributes.
+ virtual LogicalResult
+ amendOperation(Operation *op, NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const {
+ if (const LLVMTranslationDialectInterface *iface =
+ getInterfaceFor(attribute.first.getDialect())) {
+ return iface->amendOperation(op, attribute, moduleTranslation);
+ }
+ return success();
+ }
};
} // namespace mlir
diff --git a/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h b/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
index 03b7f5336461..004524f33fa4 100644
--- a/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
+++ b/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
@@ -142,18 +142,11 @@ public:
/// Looks up remapped a list of remapped values.
SmallVector<llvm::Value *, 8> lookupValues(ValueRange values);
- /// Create an LLVM IR constant of `llvmType` from the MLIR attribute `attr`.
- /// This currently supports integer, floating point, splat and dense element
- /// attributes and combinations thereof. In case of error, report it to `loc`
- /// and return nullptr.
- llvm::Constant *getLLVMConstant(llvm::Type *llvmType, Attribute attr,
- Location loc);
-
/// Returns the MLIR context of the module being translated.
MLIRContext &getContext() { return *mlirModule->getContext(); }
/// Returns the LLVM context in which the IR is being constructed.
- llvm::LLVMContext &getLLVMContext() { return llvmModule->getContext(); }
+ llvm::LLVMContext &getLLVMContext() const { return llvmModule->getContext(); }
/// Finds an LLVM IR global value that corresponds to the given MLIR operation
/// defining a global value.
@@ -184,6 +177,10 @@ public:
LogicalResult convertBlock(Block &bb, bool ignoreArguments,
llvm::IRBuilder<> &builder);
+ /// Gets the named metadata in the LLVM IR module being constructed, creating
+ /// it if it does not exist.
+ llvm::NamedMDNode *getOrInsertNamedModuleMetadata(StringRef name);
+
protected:
/// Translate the given MLIR module expressed in MLIR LLVM IR dialect into an
/// LLVM IR module. The MLIR LLVM IR dialect holds a pointer to an
@@ -208,6 +205,9 @@ private:
LogicalResult convertGlobals();
LogicalResult convertOneFunction(LLVMFuncOp func);
+ /// Translates dialect attributes attached to the given operation.
+ LogicalResult convertDialectAttributes(Operation *op);
+
/// Original and translated module.
Operation *mlirModule;
std::unique_ptr<llvm::Module> llvmModule;
@@ -228,6 +228,8 @@ private:
/// A stateful object used to translate types.
TypeToLLVMIRTranslator typeTranslator;
+ /// A dialect interface collection used for dispatching the translation to
+ /// specific dialects.
LLVMTranslationInterface iface;
/// Mappings between original and translated values, used for lookups.
@@ -249,6 +251,20 @@ void connectPHINodes(Region ®ion, const ModuleTranslation &state);
/// Get a topologically sorted list of blocks of the given region.
llvm::SetVector<Block *> getTopologicallySortedBlocks(Region ®ion);
+
+/// Create an LLVM IR constant of `llvmType` from the MLIR attribute `attr`.
+/// This currently supports integer, floating point, splat and dense element
+/// attributes and combinations thereof. In case of error, report it to `loc`
+/// and return nullptr.
+llvm::Constant *getLLVMConstant(llvm::Type *llvmType, Attribute attr,
+ Location loc,
+ const ModuleTranslation &moduleTranslation);
+
+/// Creates a call to an LLVM IR intrinsic function with the given arguments.
+llvm::Value *createIntrinsicCall(llvm::IRBuilderBase &builder,
+ llvm::Intrinsic::ID intrinsic,
+ ArrayRef<llvm::Value *> args = {},
+ ArrayRef<llvm::Type *> tys = {});
} // namespace detail
} // namespace LLVM
diff --git a/mlir/include/mlir/Target/NVVMIR.h b/mlir/include/mlir/Target/NVVMIR.h
deleted file mode 100644
index 0cd7688e275b..000000000000
--- a/mlir/include/mlir/Target/NVVMIR.h
+++ /dev/null
@@ -1,39 +0,0 @@
-//===- NVVMIR.h - MLIR to LLVM + NVVM IR conversion -------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file declares the entry point for the MLIR to LLVM + NVVM IR conversion.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef MLIR_TARGET_NVVMIR_H
-#define MLIR_TARGET_NVVMIR_H
-
-#include "llvm/ADT/StringRef.h"
-#include <memory>
-
-// Forward-declare LLVM classes.
-namespace llvm {
-class LLVMContext;
-class Module;
-} // namespace llvm
-
-namespace mlir {
-class Operation;
-
-/// Convert the given LLVM-module-like operation into NVVM IR. This conversion
-/// requires the registration of the LLVM IR dialect and will extract the LLVM
-/// context from the registered LLVM IR dialect. In case of error, report it to
-/// the error handler registered with the MLIR context, if any (obtained from
-/// the MLIR module), and return `nullptr`.
-std::unique_ptr<llvm::Module>
-translateModuleToNVVMIR(Operation *m, llvm::LLVMContext &llvmContext,
- llvm::StringRef name = "LLVMDialectModule");
-
-} // namespace mlir
-
-#endif // MLIR_TARGET_NVVMIR_H
diff --git a/mlir/include/mlir/Target/ROCDLIR.h b/mlir/include/mlir/Target/ROCDLIR.h
deleted file mode 100644
index e2cb812a173d..000000000000
--- a/mlir/include/mlir/Target/ROCDLIR.h
+++ /dev/null
@@ -1,40 +0,0 @@
-//===- ROCDLIR.h - MLIR to LLVM + ROCDL IR conversion -----------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file declares the entry point for the MLIR to LLVM + ROCDL IR
-// conversion.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef MLIR_TARGET_ROCDLIR_H
-#define MLIR_TARGET_ROCDLIR_H
-
-#include "llvm/ADT/StringRef.h"
-#include <memory>
-
-// Forward-declare LLVM classes.
-namespace llvm {
-class LLVMContext;
-class Module;
-} // namespace llvm
-
-namespace mlir {
-class Operation;
-
-/// Convert the given LLVM-module-like operation into ROCDL IR. This conversion
-/// requires the registration of the LLVM IR dialect and will extract the LLVM
-/// context from the registered LLVM IR dialect. In case of error, report it to
-/// the error handler registered with the MLIR context, if any (obtained from
-/// the MLIR module), and return `nullptr`.
-std::unique_ptr<llvm::Module>
-translateModuleToROCDLIR(Operation *m, llvm::LLVMContext &llvmContext,
- llvm::StringRef name = "LLVMDialectModule");
-
-} // namespace mlir
-
-#endif // MLIR_TARGET_ROCDLIR_H
diff --git a/mlir/lib/Target/CMakeLists.txt b/mlir/lib/Target/CMakeLists.txt
index e951ffade6aa..a23222d37ede 100644
--- a/mlir/lib/Target/CMakeLists.txt
+++ b/mlir/lib/Target/CMakeLists.txt
@@ -24,26 +24,6 @@ add_mlir_translation_library(MLIRTargetLLVMIRModuleTranslation
MLIRTranslation
)
-add_mlir_translation_library(MLIRTargetAVX512
- LLVMIR/LLVMAVX512Intr.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- MLIRLLVMAVX512ConversionsIncGen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRIR
- MLIRLLVMAVX512
- MLIRLLVMIR
- MLIRTargetLLVMIR
- MLIRTargetLLVMIRModuleTranslation
- )
-
add_mlir_translation_library(MLIRTargetLLVMIR
LLVMIR/ConvertFromLLVMIR.cpp
LLVMIR/ConvertToLLVMIR.cpp
@@ -56,89 +36,12 @@ add_mlir_translation_library(MLIRTargetLLVMIR
IRReader
LINK_LIBS PUBLIC
+ MLIRLLVMArmNeonToLLVMIRTranslation
+ MLIRLLVMArmSVEToLLVMIRTranslation
+ MLIRLLVMAVX512ToLLVMIRTranslation
MLIRLLVMToLLVMIRTranslation
+ MLIRNVVMToLLVMIRTranslation
MLIROpenMPToLLVMIRTranslation
- MLIRTargetLLVMIRModuleTranslation
- )
-
-add_mlir_translation_library(MLIRTargetArmNeon
- LLVMIR/LLVMArmNeonIntr.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- MLIRLLVMArmNeonConversionsIncGen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRIR
- MLIRLLVMArmNeon
- MLIRLLVMIR
- MLIRTargetLLVMIR
- MLIRTargetLLVMIRModuleTranslation
- )
-
-add_mlir_translation_library(MLIRTargetArmSVE
- LLVMIR/LLVMArmSVEIntr.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- MLIRLLVMArmSVEConversionsIncGen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRIR
- MLIRLLVMArmSVE
- MLIRLLVMIR
- MLIRTargetLLVMIR
- MLIRTargetLLVMIRModuleTranslation
- )
-
-add_mlir_translation_library(MLIRTargetNVVMIR
- LLVMIR/ConvertToNVVMIR.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- intrinsics_gen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRGPU
- MLIRIR
- MLIRLLVMIR
- MLIRNVVMIR
- MLIRTargetLLVMIR
- MLIRTargetLLVMIRModuleTranslation
- )
-
-add_mlir_translation_library(MLIRTargetROCDLIR
- LLVMIR/ConvertToROCDLIR.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- intrinsics_gen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRGPU
- MLIRIR
- MLIRLLVMIR
- MLIRROCDLIR
- MLIRTargetLLVMIR
+ MLIRROCDLToLLVMIRTranslation
MLIRTargetLLVMIRModuleTranslation
)
diff --git a/mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp b/mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp
index 6b30748cc79b..42391513bacf 100644
--- a/mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp
+++ b/mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp
@@ -12,9 +12,19 @@
#include "mlir/Target/LLVMIR.h"
+#include "mlir/Dialect/LLVMIR/LLVMAVX512Dialect.h"
+#include "mlir/Dialect/LLVMIR/LLVMArmNeonDialect.h"
+#include "mlir/Dialect/LLVMIR/LLVMArmSVEDialect.h"
+#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
+#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/Target/LLVMIR/Dialect/LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h"
+#include "mlir/Target/LLVMIR/Dialect/LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h"
+#include "mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
+#include "mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h"
+#include "mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/ModuleTranslation.h"
#include "mlir/Translation.h"
@@ -26,14 +36,14 @@
using namespace mlir;
std::unique_ptr<llvm::Module>
-mlir::translateModuleToLLVMIR(ModuleOp m, llvm::LLVMContext &llvmContext,
+mlir::translateModuleToLLVMIR(Operation *op, llvm::LLVMContext &llvmContext,
StringRef name) {
auto llvmModule =
- LLVM::ModuleTranslation::translateModule<>(m, llvmContext, name);
+ LLVM::ModuleTranslation::translateModule<>(op, llvmContext, name);
if (!llvmModule)
- emitError(m.getLoc(), "Fail to convert MLIR to LLVM IR");
+ emitError(op->getLoc(), "Fail to convert MLIR to LLVM IR");
else if (verifyModule(*llvmModule))
- emitError(m.getLoc(), "LLVM IR fails to verify");
+ emitError(op->getLoc(), "LLVM IR fails to verify");
return llvmModule;
}
@@ -70,9 +80,24 @@ void registerToLLVMIRTranslation() {
return success();
},
[](DialectRegistry ®istry) {
- registry.insert<omp::OpenMPDialect>();
+ registry.insert<omp::OpenMPDialect, LLVM::LLVMAVX512Dialect,
+ LLVM::LLVMArmSVEDialect, LLVM::LLVMArmNeonDialect,
+ NVVM::NVVMDialect, ROCDL::ROCDLDialect>();
registry.addDialectInterface<omp::OpenMPDialect,
OpenMPDialectLLVMIRTranslationInterface>();
+ registry
+ .addDialectInterface<LLVM::LLVMAVX512Dialect,
+ LLVMAVX512DialectLLVMIRTranslationInterface>();
+ registry.addDialectInterface<
+ LLVM::LLVMArmNeonDialect,
+ LLVMArmNeonDialectLLVMIRTranslationInterface>();
+ registry
+ .addDialectInterface<LLVM::LLVMArmSVEDialect,
+ LLVMArmSVEDialectLLVMIRTranslationInterface>();
+ registry.addDialectInterface<NVVM::NVVMDialect,
+ NVVMDialectLLVMIRTranslationInterface>();
+ registry.addDialectInterface<ROCDL::ROCDLDialect,
+ ROCDLDialectLLVMIRTranslationInterface>();
registerLLVMDialectTranslation(registry);
});
}
diff --git a/mlir/lib/Target/LLVMIR/ConvertToNVVMIR.cpp b/mlir/lib/Target/LLVMIR/ConvertToNVVMIR.cpp
deleted file mode 100644
index 7aee913a27d7..000000000000
--- a/mlir/lib/Target/LLVMIR/ConvertToNVVMIR.cpp
+++ /dev/null
@@ -1,124 +0,0 @@
-//===- ConvertToNVVMIR.cpp - MLIR to LLVM IR conversion -------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file implements a translation between the MLIR LLVM + NVVM dialects and
-// LLVM IR with NVVM intrinsics and metadata.
-//
-//===----------------------------------------------------------------------===//
-
-#include "mlir/Target/NVVMIR.h"
-
-#include "mlir/Dialect/GPU/GPUDialect.h"
-#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
-#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
-#include "mlir/IR/BuiltinOps.h"
-#include "mlir/Target/LLVMIR.h"
-#include "mlir/Target/LLVMIR/ModuleTranslation.h"
-#include "mlir/Translation.h"
-
-#include "llvm/ADT/StringRef.h"
-#include "llvm/IR/IntrinsicsNVPTX.h"
-#include "llvm/IR/Module.h"
-#include "llvm/Support/ToolOutputFile.h"
-
-using namespace mlir;
-
-static llvm::Value *createIntrinsicCall(llvm::IRBuilder<> &builder,
- llvm::Intrinsic::ID intrinsic,
- ArrayRef<llvm::Value *> args = {}) {
- llvm::Module *module = builder.GetInsertBlock()->getModule();
- llvm::Function *fn = llvm::Intrinsic::getDeclaration(module, intrinsic);
- return builder.CreateCall(fn, args);
-}
-
-static llvm::Intrinsic::ID getShflBflyIntrinsicId(llvm::Type *resultType,
- bool withPredicate) {
- if (withPredicate) {
- resultType = cast<llvm::StructType>(resultType)->getElementType(0);
- return resultType->isFloatTy() ? llvm::Intrinsic::nvvm_shfl_sync_bfly_f32p
- : llvm::Intrinsic::nvvm_shfl_sync_bfly_i32p;
- }
- return resultType->isFloatTy() ? llvm::Intrinsic::nvvm_shfl_sync_bfly_f32
- : llvm::Intrinsic::nvvm_shfl_sync_bfly_i32;
-}
-
-namespace {
-class ModuleTranslation : public LLVM::ModuleTranslation {
-public:
- using LLVM::ModuleTranslation::ModuleTranslation;
-
-protected:
- LogicalResult convertOperation(Operation &opInst,
- llvm::IRBuilder<> &builder) override {
-
-#include "mlir/Dialect/LLVMIR/NVVMConversions.inc"
-
- return LLVM::ModuleTranslation::convertOperation(opInst, builder);
- }
-
- /// Allow access to the constructor.
- friend LLVM::ModuleTranslation;
-};
-} // namespace
-
-std::unique_ptr<llvm::Module>
-mlir::translateModuleToNVVMIR(Operation *m, llvm::LLVMContext &llvmContext,
- StringRef name) {
- // Register the translation to LLVM IR if nobody else did before. This may
- // happen if this translation is called inside a pass pipeline that converts
- // GPU dialects to binary blobs without translating the rest of the code.
- registerLLVMDialectTranslation(*m->getContext());
-
- auto llvmModule = LLVM::ModuleTranslation::translateModule<ModuleTranslation>(
- m, llvmContext, name);
- if (!llvmModule)
- return llvmModule;
-
- // Insert the nvvm.annotations kernel so that the NVVM backend recognizes the
- // function as a kernel.
- for (auto func :
- ModuleTranslation::getModuleBody(m).getOps<LLVM::LLVMFuncOp>()) {
- if (!func->getAttrOfType<UnitAttr>(
- NVVM::NVVMDialect::getKernelFuncAttrName()))
- continue;
-
- auto *llvmFunc = llvmModule->getFunction(func.getName());
-
- llvm::Metadata *llvmMetadata[] = {
- llvm::ValueAsMetadata::get(llvmFunc),
- llvm::MDString::get(llvmModule->getContext(), "kernel"),
- llvm::ValueAsMetadata::get(llvm::ConstantInt::get(
- llvm::Type::getInt32Ty(llvmModule->getContext()), 1))};
- llvm::MDNode *llvmMetadataNode =
- llvm::MDNode::get(llvmModule->getContext(), llvmMetadata);
- llvmModule->getOrInsertNamedMetadata("nvvm.annotations")
- ->addOperand(llvmMetadataNode);
- }
-
- return llvmModule;
-}
-
-namespace mlir {
-void registerToNVVMIRTranslation() {
- TranslateFromMLIRRegistration registration(
- "mlir-to-nvvmir",
- [](ModuleOp module, raw_ostream &output) {
- llvm::LLVMContext llvmContext;
- auto llvmModule = mlir::translateModuleToNVVMIR(module, llvmContext);
- if (!llvmModule)
</cut>
Progress:
* UM-2 [QEMU upstream maintainership]
+ We needed an rc4 (as usual)
+ Tried to work through some of my code review patchlog, notably
some big alignment-related series from RTH
+ Sent patchseries for some small things:
+ implement last few bits of HSTR trap-to-hypervisor functionality
+ actually take an exception if PSTATE.IL gets set
+ don't assert if user asks for both an EL3 guest CPU and KVM
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ worked through rth's code review comments for fp insn patches
these are now ready to send out once QEMU makes its 6.1 release
and the previous slice of reviewed patches can get into the tree
-- PMM
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO
Culprit:
<cut>
commit 310b35304cdf5a230c042904655583c5532d3e91
Author: Rong Xu <xur(a)google.com>
Date: Tue Feb 16 10:53:38 2021 -0800
[SampleFDO][NFC] Refactor SampleProfile.cpp
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
Differential Revision: https://reviews.llvm.org/D96455
</cut>
Results regressed to (for first_bad == 310b35304cdf5a230c042904655583c5532d3e91)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO_marm artifacts/build-310b35304cdf5a230c042904655583c5532d3e91/results_id:
1
# 482.sphinx3,sphinx_livepretend_base.default regressed by 106
from (for last_good == cddc53ef088b68586094c9841a76b41bee3994a4)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO_marm artifacts/build-cddc53ef088b68586094c9841a76b41bee3994a4/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3_LTO/3917
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3_LTO/3930
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-310b35304cdf5a230c042904655583c5532d3e91
cd investigate-llvm-310b35304cdf5a230c042904655583c5532d3e91
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 310b35304cdf5a230c042904655583c5532d3e91
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach cddc53ef088b68586094c9841a76b41bee3994a4
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 310b35304cdf5a230c042904655583c5532d3e91
Author: Rong Xu <xur(a)google.com>
Date: Tue Feb 16 10:53:38 2021 -0800
[SampleFDO][NFC] Refactor SampleProfile.cpp
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
Differential Revision: https://reviews.llvm.org/D96455
---
.../llvm/ProfileData/SampleProfileLoaderBaseImpl.h | 862 +++++++++++++++++
.../llvm/ProfileData/SampleProfileLoaderBaseUtil.h | 97 ++
llvm/lib/ProfileData/CMakeLists.txt | 1 +
.../ProfileData/SampleProfileLoaderBaseUtil.cpp | 192 ++++
llvm/lib/Transforms/IPO/SampleProfile.cpp | 1001 +-------------------
5 files changed, 1161 insertions(+), 992 deletions(-)
diff --git a/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h
new file mode 100644
index 000000000000..f02bacb6edc3
--- /dev/null
+++ b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h
@@ -0,0 +1,862 @@
+////===- SampleProfileLoadBaseImpl.h - Profile loader base impl --*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// This file provides the interface for the sampled PGO profile loader base
+/// implementation.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H
+#define LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/Analysis/PostDominators.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/CFG.h"
+#include "llvm/IR/DebugInfoMetadata.h"
+#include "llvm/IR/DebugLoc.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Module.h"
+#include "llvm/ProfileData/SampleProf.h"
+#include "llvm/ProfileData/SampleProfReader.h"
+#include "llvm/ProfileData/SampleProfileLoaderBaseUtil.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/GenericDomTree.h"
+#include "llvm/Support/raw_ostream.h"
+
+namespace llvm {
+using namespace llvm;
+using namespace sampleprof;
+using ProfileCount = Function::ProfileCount;
+namespace sampleprofutil {
+bool callsiteIsHot(const SampleCoverageTracker *CT,
+ const FunctionSamples *CallsiteFS, ProfileSummaryInfo *PSI,
+ bool ProfAccForSymsInList);
+} // namespace sampleprofutil
+using namespace sampleprofutil;
+
+#define DEBUG_TYPE "sample-profile-impl"
+
+using BlockWeightMap = DenseMap<const BasicBlock *, uint64_t>;
+using EquivalenceClassMap = DenseMap<const BasicBlock *, const BasicBlock *>;
+using Edge = std::pair<const BasicBlock *, const BasicBlock *>;
+using EdgeWeightMap = DenseMap<Edge, uint64_t>;
+using BlockEdgeMap =
+ DenseMap<const BasicBlock *, SmallVector<const BasicBlock *, 8>>;
+
+extern cl::opt<unsigned> SampleProfileMaxPropagateIterations;
+extern cl::opt<unsigned> SampleProfileRecordCoverage;
+extern cl::opt<unsigned> SampleProfileSampleCoverage;
+extern cl::opt<bool> NoWarnSampleUnused;
+
+class SampleProfileLoaderBaseImpl {
+public:
+ SampleProfileLoaderBaseImpl(std::string Name) : Filename(Name) {}
+ void dump() { Reader->dump(); }
+
+protected:
+ friend class SampleCoverageTracker;
+
+ unsigned getFunctionLoc(Function &F);
+ virtual ErrorOr<uint64_t> getInstWeight(const Instruction &Inst);
+ ErrorOr<uint64_t> getInstWeightImpl(const Instruction &Inst);
+ ErrorOr<uint64_t> getBlockWeight(const BasicBlock *BB);
+ mutable DenseMap<const DILocation *, const FunctionSamples *>
+ DILocation2SampleMap;
+ virtual const FunctionSamples *
+ findFunctionSamples(const Instruction &I) const;
+ void printEdgeWeight(raw_ostream &OS, Edge E);
+ void printBlockWeight(raw_ostream &OS, const BasicBlock *BB) const;
+ void printBlockEquivalence(raw_ostream &OS, const BasicBlock *BB);
+ bool computeBlockWeights(Function &F);
+ void findEquivalenceClasses(Function &F);
+ template <bool IsPostDom>
+ void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,
+ DominatorTreeBase<BasicBlock, IsPostDom> *DomTree);
+
+ void propagateWeights(Function &F);
+ uint64_t visitEdge(Edge E, unsigned *NumUnknownEdges, Edge *UnknownEdge);
+ void buildEdges(Function &F);
+ bool propagateThroughEdges(Function &F, bool UpdateBlockCount);
+ void clearFunctionData();
+ void computeDominanceAndLoopInfo(Function &F);
+ bool
+ computeAndPropagateWeights(Function &F,
+ const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
+ void emitCoverageRemarks(Function &F);
+
+ /// Map basic blocks to their computed weights.
+ ///
+ /// The weight of a basic block is defined to be the maximum
+ /// of all the instruction weights in that block.
+ BlockWeightMap BlockWeights;
+
+ /// Map edges to their computed weights.
+ ///
+ /// Edge weights are computed by propagating basic block weights in
+ /// SampleProfile::propagateWeights.
+ EdgeWeightMap EdgeWeights;
+
+ /// Set of visited blocks during propagation.
+ SmallPtrSet<const BasicBlock *, 32> VisitedBlocks;
+
+ /// Set of visited edges during propagation.
+ SmallSet<Edge, 32> VisitedEdges;
+
+ /// Equivalence classes for block weights.
+ ///
+ /// Two blocks BB1 and BB2 are in the same equivalence class if they
+ /// dominate and post-dominate each other, and they are in the same loop
+ /// nest. When this happens, the two blocks are guaranteed to execute
+ /// the same number of times.
+ EquivalenceClassMap EquivalenceClass;
+
+ /// Dominance, post-dominance and loop information.
+ std::unique_ptr<DominatorTree> DT;
+ std::unique_ptr<PostDominatorTree> PDT;
+ std::unique_ptr<LoopInfo> LI;
+
+ /// Predecessors for each basic block in the CFG.
+ BlockEdgeMap Predecessors;
+
+ /// Successors for each basic block in the CFG.
+ BlockEdgeMap Successors;
+
+ /// Profile coverage tracker.
+ SampleCoverageTracker CoverageTracker;
+
+ /// Profile reader object.
+ std::unique_ptr<SampleProfileReader> Reader;
+
+ /// Samples collected for the body of this function.
+ FunctionSamples *Samples = nullptr;
+
+ /// Name of the profile file to load.
+ std::string Filename;
+
+ /// Profile Summary Info computed from sample profile.
+ ProfileSummaryInfo *PSI = nullptr;
+
+ /// Optimization Remark Emitter used to emit diagnostic remarks.
+ OptimizationRemarkEmitter *ORE = nullptr;
+};
+
+/// Clear all the per-function data used to load samples and propagate weights.
+void SampleProfileLoaderBaseImpl::clearFunctionData() {
+ BlockWeights.clear();
+ EdgeWeights.clear();
+ VisitedBlocks.clear();
+ VisitedEdges.clear();
+ EquivalenceClass.clear();
+ DT = nullptr;
+ PDT = nullptr;
+ LI = nullptr;
+ Predecessors.clear();
+ Successors.clear();
+ CoverageTracker.clear();
+}
+
+#ifndef NDEBUG
+/// Print the weight of edge \p E on stream \p OS.
+///
+/// \param OS Stream to emit the output to.
+/// \param E Edge to print.
+void SampleProfileLoaderBaseImpl::printEdgeWeight(raw_ostream &OS, Edge E) {
+ OS << "weight[" << E.first->getName() << "->" << E.second->getName()
+ << "]: " << EdgeWeights[E] << "\n";
+}
+
+/// Print the equivalence class of block \p BB on stream \p OS.
+///
+/// \param OS Stream to emit the output to.
+/// \param BB Block to print.
+void SampleProfileLoaderBaseImpl::printBlockEquivalence(raw_ostream &OS,
+ const BasicBlock *BB) {
+ const BasicBlock *Equiv = EquivalenceClass[BB];
+ OS << "equivalence[" << BB->getName()
+ << "]: " << ((Equiv) ? EquivalenceClass[BB]->getName() : "NONE") << "\n";
+}
+
+/// Print the weight of block \p BB on stream \p OS.
+///
+/// \param OS Stream to emit the output to.
+/// \param BB Block to print.
+void SampleProfileLoaderBaseImpl::printBlockWeight(raw_ostream &OS,
+ const BasicBlock *BB) const {
+ const auto &I = BlockWeights.find(BB);
+ uint64_t W = (I == BlockWeights.end() ? 0 : I->second);
+ OS << "weight[" << BB->getName() << "]: " << W << "\n";
+}
+#endif
+
+/// Get the weight for an instruction.
+///
+/// The "weight" of an instruction \p Inst is the number of samples
+/// collected on that instruction at runtime. To retrieve it, we
+/// need to compute the line number of \p Inst relative to the start of its
+/// function. We use HeaderLineno to compute the offset. We then
+/// look up the samples collected for \p Inst using BodySamples.
+///
+/// \param Inst Instruction to query.
+///
+/// \returns the weight of \p Inst.
+ErrorOr<uint64_t>
+SampleProfileLoaderBaseImpl::getInstWeight(const Instruction &Inst) {
+ return getInstWeightImpl(Inst);
+}
+
+ErrorOr<uint64_t>
+SampleProfileLoaderBaseImpl::getInstWeightImpl(const Instruction &Inst) {
+ const FunctionSamples *FS = findFunctionSamples(Inst);
+ if (!FS)
+ return std::error_code();
+
+ const DebugLoc &DLoc = Inst.getDebugLoc();
+ if (!DLoc)
+ return std::error_code();
+
+ const DILocation *DIL = DLoc;
+ uint32_t LineOffset = FunctionSamples::getOffset(DIL);
+ uint32_t Discriminator = DIL->getBaseDiscriminator();
+ ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator);
+ if (R) {
+ bool FirstMark =
+ CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get());
+ if (FirstMark) {
+ ORE->emit([&]() {
+ OptimizationRemarkAnalysis Remark(DEBUG_TYPE, "AppliedSamples", &Inst);
+ Remark << "Applied " << ore::NV("NumSamples", *R);
+ Remark << " samples from profile (offset: ";
+ Remark << ore::NV("LineOffset", LineOffset);
+ if (Discriminator) {
+ Remark << ".";
+ Remark << ore::NV("Discriminator", Discriminator);
+ }
+ Remark << ")";
+ return Remark;
+ });
+ }
+ LLVM_DEBUG(dbgs() << " " << DLoc.getLine() << "."
+ << DIL->getBaseDiscriminator() << ":" << Inst
+ << " (line offset: " << LineOffset << "."
+ << DIL->getBaseDiscriminator() << " - weight: " << R.get()
+ << ")\n");
+ }
+ return R;
+}
+
+/// Compute the weight of a basic block.
+///
+/// The weight of basic block \p BB is the maximum weight of all the
+/// instructions in BB.
+///
+/// \param BB The basic block to query.
+///
+/// \returns the weight for \p BB.
+ErrorOr<uint64_t>
+SampleProfileLoaderBaseImpl::getBlockWeight(const BasicBlock *BB) {
+ uint64_t Max = 0;
+ bool HasWeight = false;
+ for (auto &I : BB->getInstList()) {
+ const ErrorOr<uint64_t> &R = getInstWeight(I);
+ if (R) {
+ Max = std::max(Max, R.get());
+ HasWeight = true;
+ }
+ }
+ return HasWeight ? ErrorOr<uint64_t>(Max) : std::error_code();
+}
+
+/// Compute and store the weights of every basic block.
+///
+/// This populates the BlockWeights map by computing
+/// the weights of every basic block in the CFG.
+///
+/// \param F The function to query.
+bool SampleProfileLoaderBaseImpl::computeBlockWeights(Function &F) {
+ bool Changed = false;
+ LLVM_DEBUG(dbgs() << "Block weights\n");
+ for (const auto &BB : F) {
+ ErrorOr<uint64_t> Weight = getBlockWeight(&BB);
+ if (Weight) {
+ BlockWeights[&BB] = Weight.get();
+ VisitedBlocks.insert(&BB);
+ Changed = true;
+ }
+ LLVM_DEBUG(printBlockWeight(dbgs(), &BB));
+ }
+
+ return Changed;
+}
+
+/// Get the FunctionSamples for an instruction.
+///
+/// The FunctionSamples of an instruction \p Inst is the inlined instance
+/// in which that instruction is coming from. We traverse the inline stack
+/// of that instruction, and match it with the tree nodes in the profile.
+///
+/// \param Inst Instruction to query.
+///
+/// \returns the FunctionSamples pointer to the inlined instance.
+const FunctionSamples *SampleProfileLoaderBaseImpl::findFunctionSamples(
+ const Instruction &Inst) const {
+ const DILocation *DIL = Inst.getDebugLoc();
+ if (!DIL)
+ return Samples;
+
+ auto it = DILocation2SampleMap.try_emplace(DIL, nullptr);
+ if (it.second) {
+ it.first->second = Samples->findFunctionSamples(DIL, Reader->getRemapper());
+ }
+ return it.first->second;
+}
+
+/// Find equivalence classes for the given block.
+///
+/// This finds all the blocks that are guaranteed to execute the same
+/// number of times as \p BB1. To do this, it traverses all the
+/// descendants of \p BB1 in the dominator or post-dominator tree.
+///
+/// A block BB2 will be in the same equivalence class as \p BB1 if
+/// the following holds:
+///
+/// 1- \p BB1 is a descendant of BB2 in the opposite tree. So, if BB2
+/// is a descendant of \p BB1 in the dominator tree, then BB2 should
+/// dominate BB1 in the post-dominator tree.
+///
+/// 2- Both BB2 and \p BB1 must be in the same loop.
+///
+/// For every block BB2 that meets those two requirements, we set BB2's
+/// equivalence class to \p BB1.
+///
+/// \param BB1 Block to check.
+/// \param Descendants Descendants of \p BB1 in either the dom or pdom tree.
+/// \param DomTree Opposite dominator tree. If \p Descendants is filled
+/// with blocks from \p BB1's dominator tree, then
+/// this is the post-dominator tree, and vice versa.
+template <bool IsPostDom>
+void SampleProfileLoaderBaseImpl::findEquivalencesFor(
+ BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,
+ DominatorTreeBase<BasicBlock, IsPostDom> *DomTree) {
+ const BasicBlock *EC = EquivalenceClass[BB1];
+ uint64_t Weight = BlockWeights[EC];
+ for (const auto *BB2 : Descendants) {
+ bool IsDomParent = DomTree->dominates(BB2, BB1);
+ bool IsInSameLoop = LI->getLoopFor(BB1) == LI->getLoopFor(BB2);
+ if (BB1 != BB2 && IsDomParent && IsInSameLoop) {
+ EquivalenceClass[BB2] = EC;
+ // If BB2 is visited, then the entire EC should be marked as visited.
+ if (VisitedBlocks.count(BB2)) {
+ VisitedBlocks.insert(EC);
+ }
+
+ // If BB2 is heavier than BB1, make BB2 have the same weight
+ // as BB1.
+ //
+ // Note that we don't worry about the opposite situation here
+ // (when BB2 is lighter than BB1). We will deal with this
+ // during the propagation phase. Right now, we just want to
+ // make sure that BB1 has the largest weight of all the
+ // members of its equivalence set.
+ Weight = std::max(Weight, BlockWeights[BB2]);
+ }
+ }
+ if (EC == &EC->getParent()->getEntryBlock()) {
+ BlockWeights[EC] = Samples->getHeadSamples() + 1;
+ } else {
+ BlockWeights[EC] = Weight;
+ }
+}
+
+/// Find equivalence classes.
+///
+/// Since samples may be missing from blocks, we can fill in the gaps by setting
+/// the weights of all the blocks in the same equivalence class to the same
+/// weight. To compute the concept of equivalence, we use dominance and loop
+/// information. Two blocks B1 and B2 are in the same equivalence class if B1
+/// dominates B2, B2 post-dominates B1 and both are in the same loop.
+///
+/// \param F The function to query.
+void SampleProfileLoaderBaseImpl::findEquivalenceClasses(Function &F) {
+ SmallVector<BasicBlock *, 8> DominatedBBs;
+ LLVM_DEBUG(dbgs() << "\nBlock equivalence classes\n");
+ // Find equivalence sets based on dominance and post-dominance information.
+ for (auto &BB : F) {
+ BasicBlock *BB1 = &BB;
+
+ // Compute BB1's equivalence class once.
+ if (EquivalenceClass.count(BB1)) {
+ LLVM_DEBUG(printBlockEquivalence(dbgs(), BB1));
+ continue;
+ }
+
+ // By default, blocks are in their own equivalence class.
+ EquivalenceClass[BB1] = BB1;
+
+ // Traverse all the blocks dominated by BB1. We are looking for
+ // every basic block BB2 such that:
+ //
+ // 1- BB1 dominates BB2.
+ // 2- BB2 post-dominates BB1.
+ // 3- BB1 and BB2 are in the same loop nest.
+ //
+ // If all those conditions hold, it means that BB2 is executed
+ // as many times as BB1, so they are placed in the same equivalence
+ // class by making BB2's equivalence class be BB1.
+ DominatedBBs.clear();
+ DT->getDescendants(BB1, DominatedBBs);
+ findEquivalencesFor(BB1, DominatedBBs, PDT.get());
+
+ LLVM_DEBUG(printBlockEquivalence(dbgs(), BB1));
+ }
+
+ // Assign weights to equivalence classes.
+ //
+ // All the basic blocks in the same equivalence class will execute
+ // the same number of times. Since we know that the head block in
+ // each equivalence class has the largest weight, assign that weight
+ // to all the blocks in that equivalence class.
+ LLVM_DEBUG(
+ dbgs() << "\nAssign the same weight to all blocks in the same class\n");
+ for (auto &BI : F) {
+ const BasicBlock *BB = &BI;
+ const BasicBlock *EquivBB = EquivalenceClass[BB];
+ if (BB != EquivBB)
+ BlockWeights[BB] = BlockWeights[EquivBB];
+ LLVM_DEBUG(printBlockWeight(dbgs(), BB));
+ }
+}
+
+/// Visit the given edge to decide if it has a valid weight.
+///
+/// If \p E has not been visited before, we copy to \p UnknownEdge
+/// and increment the count of unknown edges.
+///
+/// \param E Edge to visit.
+/// \param NumUnknownEdges Current number of unknown edges.
+/// \param UnknownEdge Set if E has not been visited before.
+///
+/// \returns E's weight, if known. Otherwise, return 0.
+uint64_t SampleProfileLoaderBaseImpl::visitEdge(Edge E,
+ unsigned *NumUnknownEdges,
+ Edge *UnknownEdge) {
+ if (!VisitedEdges.count(E)) {
+ (*NumUnknownEdges)++;
+ *UnknownEdge = E;
+ return 0;
+ }
+
+ return EdgeWeights[E];
+}
+
+/// Propagate weights through incoming/outgoing edges.
+///
+/// If the weight of a basic block is known, and there is only one edge
+/// with an unknown weight, we can calculate the weight of that edge.
+///
+/// Similarly, if all the edges have a known count, we can calculate the
+/// count of the basic block, if needed.
+///
+/// \param F Function to process.
+/// \param UpdateBlockCount Whether we should update basic block counts that
+/// has already been annotated.
+///
+/// \returns True if new weights were assigned to edges or blocks.
+bool SampleProfileLoaderBaseImpl::propagateThroughEdges(Function &F,
+ bool UpdateBlockCount) {
+ bool Changed = false;
+ LLVM_DEBUG(dbgs() << "\nPropagation through edges\n");
+ for (const auto &BI : F) {
+ const BasicBlock *BB = &BI;
+ const BasicBlock *EC = EquivalenceClass[BB];
+
+ // Visit all the predecessor and successor edges to determine
+ // which ones have a weight assigned already. Note that it doesn't
+ // matter that we only keep track of a single unknown edge. The
+ // only case we are interested in handling is when only a single
+ // edge is unknown (see setEdgeOrBlockWeight).
+ for (unsigned i = 0; i < 2; i++) {
+ uint64_t TotalWeight = 0;
+ unsigned NumUnknownEdges = 0, NumTotalEdges = 0;
+ Edge UnknownEdge, SelfReferentialEdge, SingleEdge;
+
+ if (i == 0) {
+ // First, visit all predecessor edges.
+ NumTotalEdges = Predecessors[BB].size();
+ for (auto *Pred : Predecessors[BB]) {
+ Edge E = std::make_pair(Pred, BB);
+ TotalWeight += visitEdge(E, &NumUnknownEdges, &UnknownEdge);
+ if (E.first == E.second)
+ SelfReferentialEdge = E;
+ }
+ if (NumTotalEdges == 1) {
+ SingleEdge = std::make_pair(Predecessors[BB][0], BB);
+ }
+ } else {
+ // On the second round, visit all successor edges.
+ NumTotalEdges = Successors[BB].size();
+ for (auto *Succ : Successors[BB]) {
+ Edge E = std::make_pair(BB, Succ);
+ TotalWeight += visitEdge(E, &NumUnknownEdges, &UnknownEdge);
+ }
+ if (NumTotalEdges == 1) {
+ SingleEdge = std::make_pair(BB, Successors[BB][0]);
+ }
+ }
+
+ // After visiting all the edges, there are three cases that we
+ // can handle immediately:
+ //
+ // - All the edge weights are known (i.e., NumUnknownEdges == 0).
+ // In this case, we simply check that the sum of all the edges
+ // is the same as BB's weight. If not, we change BB's weight
+ // to match. Additionally, if BB had not been visited before,
+ // we mark it visited.
+ //
+ // - Only one edge is unknown and BB has already been visited.
+ // In this case, we can compute the weight of the edge by
+ // subtracting the total block weight from all the known
+ // edge weights. If the edges weight more than BB, then the
+ // edge of the last remaining edge is set to zero.
+ //
+ // - There exists a self-referential edge and the weight of BB is
+ // known. In this case, this edge can be based on BB's weight.
+ // We add up all the other known edges and set the weight on
+ // the self-referential edge as we did in the previous case.
+ //
+ // In any other case, we must continue iterating. Eventually,
+ // all edges will get a weight, or iteration will stop when
+ // it reaches SampleProfileMaxPropagateIterations.
+ if (NumUnknownEdges <= 1) {
+ uint64_t &BBWeight = BlockWeights[EC];
+ if (NumUnknownEdges == 0) {
+ if (!VisitedBlocks.count(EC)) {
+ // If we already know the weight of all edges, the weight of the
+ // basic block can be computed. It should be no larger than the sum
+ // of all edge weights.
+ if (TotalWeight > BBWeight) {
+ BBWeight = TotalWeight;
+ Changed = true;
+ LLVM_DEBUG(dbgs() << "All edge weights for " << BB->getName()
+ << " known. Set weight for block: ";
+ printBlockWeight(dbgs(), BB););
+ }
+ } else if (NumTotalEdges == 1 &&
+ EdgeWeights[SingleEdge] < BlockWeights[EC]) {
+ // If there is only one edge for the visited basic block, use the
+ // block weight to adjust edge weight if edge weight is smaller.
+ EdgeWeights[SingleEdge] = BlockWeights[EC];
+ Changed = true;
+ }
+ } else if (NumUnknownEdges == 1 && VisitedBlocks.count(EC)) {
+ // If there is a single unknown edge and the block has been
+ // visited, then we can compute E's weight.
+ if (BBWeight >= TotalWeight)
+ EdgeWeights[UnknownEdge] = BBWeight - TotalWeight;
+ else
+ EdgeWeights[UnknownEdge] = 0;
+ const BasicBlock *OtherEC;
+ if (i == 0)
+ OtherEC = EquivalenceClass[UnknownEdge.first];
+ else
+ OtherEC = EquivalenceClass[UnknownEdge.second];
+ // Edge weights should never exceed the BB weights it connects.
+ if (VisitedBlocks.count(OtherEC) &&
+ EdgeWeights[UnknownEdge] > BlockWeights[OtherEC])
+ EdgeWeights[UnknownEdge] = BlockWeights[OtherEC];
+ VisitedEdges.insert(UnknownEdge);
+ Changed = true;
+ LLVM_DEBUG(dbgs() << "Set weight for edge: ";
+ printEdgeWeight(dbgs(), UnknownEdge));
+ }
+ } else if (VisitedBlocks.count(EC) && BlockWeights[EC] == 0) {
+ // If a block Weights 0, all its in/out edges should weight 0.
+ if (i == 0) {
+ for (auto *Pred : Predecessors[BB]) {
+ Edge E = std::make_pair(Pred, BB);
+ EdgeWeights[E] = 0;
+ VisitedEdges.insert(E);
+ }
+ } else {
+ for (auto *Succ : Successors[BB]) {
+ Edge E = std::make_pair(BB, Succ);
+ EdgeWeights[E] = 0;
+ VisitedEdges.insert(E);
+ }
+ }
+ } else if (SelfReferentialEdge.first && VisitedBlocks.count(EC)) {
+ uint64_t &BBWeight = BlockWeights[BB];
+ // We have a self-referential edge and the weight of BB is known.
+ if (BBWeight >= TotalWeight)
+ EdgeWeights[SelfReferentialEdge] = BBWeight - TotalWeight;
+ else
+ EdgeWeights[SelfReferentialEdge] = 0;
+ VisitedEdges.insert(SelfReferentialEdge);
+ Changed = true;
+ LLVM_DEBUG(dbgs() << "Set self-referential edge weight to: ";
+ printEdgeWeight(dbgs(), SelfReferentialEdge));
+ }
+ if (UpdateBlockCount && !VisitedBlocks.count(EC) && TotalWeight > 0) {
+ BlockWeights[EC] = TotalWeight;
+ VisitedBlocks.insert(EC);
+ Changed = true;
+ }
+ }
+ }
+
+ return Changed;
+}
+
+/// Build in/out edge lists for each basic block in the CFG.
+///
+/// We are interested in unique edges. If a block B1 has multiple
+/// edges to another block B2, we only add a single B1->B2 edge.
+void SampleProfileLoaderBaseImpl::buildEdges(Function &F) {
+ for (auto &BI : F) {
+ BasicBlock *B1 = &BI;
+
+ // Add predecessors for B1.
+ SmallPtrSet<BasicBlock *, 16> Visited;
+ if (!Predecessors[B1].empty())
+ llvm_unreachable("Found a stale predecessors list in a basic block.");
+ for (BasicBlock *B2 : predecessors(B1))
+ if (Visited.insert(B2).second)
+ Predecessors[B1].push_back(B2);
+
+ // Add successors for B1.
+ Visited.clear();
+ if (!Successors[B1].empty())
+ llvm_unreachable("Found a stale successors list in a basic block.");
+ for (BasicBlock *B2 : successors(B1))
+ if (Visited.insert(B2).second)
+ Successors[B1].push_back(B2);
+ }
+}
+
+/// Propagate weights into edges
+///
+/// The following rules are applied to every block BB in the CFG:
+///
+/// - If BB has a single predecessor/successor, then the weight
+/// of that edge is the weight of the block.
+///
+/// - If all incoming or outgoing edges are known except one, and the
+/// weight of the block is already known, the weight of the unknown
+/// edge will be the weight of the block minus the sum of all the known
+/// edges. If the sum of all the known edges is larger than BB's weight,
+/// we set the unknown edge weight to zero.
+///
+/// - If there is a self-referential edge, and the weight of the block is
+/// known, the weight for that edge is set to the weight of the block
+/// minus the weight of the other incoming edges to that block (if
+/// known).
+void SampleProfileLoaderBaseImpl::propagateWeights(Function &F) {
+ bool Changed = true;
+ unsigned I = 0;
+
+ // If BB weight is larger than its corresponding loop's header BB weight,
+ // use the BB weight to replace the loop header BB weight.
+ for (auto &BI : F) {
+ BasicBlock *BB = &BI;
+ Loop *L = LI->getLoopFor(BB);
+ if (!L) {
+ continue;
+ }
+ BasicBlock *Header = L->getHeader();
+ if (Header && BlockWeights[BB] > BlockWeights[Header]) {
+ BlockWeights[Header] = BlockWeights[BB];
+ }
+ }
+
+ // Before propagation starts, build, for each block, a list of
+ // unique predecessors and successors. This is necessary to handle
+ // identical edges in multiway branches. Since we visit all blocks and all
+ // edges of the CFG, it is cleaner to build these lists once at the start
+ // of the pass.
+ buildEdges(F);
+
+ // Propagate until we converge or we go past the iteration limit.
+ while (Changed && I++ < SampleProfileMaxPropagateIterations) {
+ Changed = propagateThroughEdges(F, false);
+ }
+
+ // The first propagation propagates BB counts from annotated BBs to unknown
+ // BBs. The 2nd propagation pass resets edges weights, and use all BB weights
+ // to propagate edge weights.
+ VisitedEdges.clear();
+ Changed = true;
+ while (Changed && I++ < SampleProfileMaxPropagateIterations) {
+ Changed = propagateThroughEdges(F, false);
+ }
+
+ // The 3rd propagation pass allows adjust annotated BB weights that are
+ // obviously wrong.
+ Changed = true;
+ while (Changed && I++ < SampleProfileMaxPropagateIterations) {
+ Changed = propagateThroughEdges(F, true);
+ }
+}
+
+/// Generate branch weight metadata for all branches in \p F.
+///
+/// Branch weights are computed out of instruction samples using a
+/// propagation heuristic. Propagation proceeds in 3 phases:
+///
+/// 1- Assignment of block weights. All the basic blocks in the function
+/// are initial assigned the same weight as their most frequently
+/// executed instruction.
+///
+/// 2- Creation of equivalence classes. Since samples may be missing from
+/// blocks, we can fill in the gaps by setting the weights of all the
+/// blocks in the same equivalence class to the same weight. To compute
+/// the concept of equivalence, we use dominance and loop information.
+/// Two blocks B1 and B2 are in the same equivalence class if B1
+/// dominates B2, B2 post-dominates B1 and both are in the same loop.
+///
+/// 3- Propagation of block weights into edges. This uses a simple
+/// propagation heuristic. The following rules are applied to every
+/// block BB in the CFG:
+///
+/// - If BB has a single predecessor/successor, then the weight
+/// of that edge is the weight of the block.
+///
+/// - If all the edges are known except one, and the weight of the
+/// block is already known, the weight of the unknown edge will
+/// be the weight of the block minus the sum of all the known
+/// edges. If the sum of all the known edges is larger than BB's weight,
+/// we set the unknown edge weight to zero.
+///
+/// - If there is a self-referential edge, and the weight of the block is
+/// known, the weight for that edge is set to the weight of the block
+/// minus the weight of the other incoming edges to that block (if
+/// known).
+///
+/// Since this propagation is not guaranteed to finalize for every CFG, we
+/// only allow it to proceed for a limited number of iterations (controlled
+/// by -sample-profile-max-propagate-iterations).
+///
+/// FIXME: Try to replace this propagation heuristic with a scheme
+/// that is guaranteed to finalize. A work-list approach similar to
+/// the standard value propagation algorithm used by SSA-CCP might
+/// work here.
+///
+/// \param F The function to query.
+///
+/// \returns true if \p F was modified. Returns false, otherwise.
+bool SampleProfileLoaderBaseImpl::computeAndPropagateWeights(
+ Function &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
+ bool Changed = (InlinedGUIDs.size() != 0);
+
+ // Compute basic block weights.
+ Changed |= computeBlockWeights(F);
+
+ if (Changed) {
+ // Add an entry count to the function using the samples gathered at the
+ // function entry.
+ // Sets the GUIDs that are inlined in the profiled binary. This is used
+ // for ThinLink to make correct liveness analysis, and also make the IR
+ // match the profiled binary before annotation.
+ F.setEntryCount(
+ ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real),
+ &InlinedGUIDs);
+
+ // Compute dominance and loop info needed for propagation.
+ computeDominanceAndLoopInfo(F);
+
+ // Find equivalence classes.
+ findEquivalenceClasses(F);
+
+ // Propagate weights to all edges.
+ propagateWeights(F);
+ }
+
+ return Changed;
+}
+
+void SampleProfileLoaderBaseImpl::emitCoverageRemarks(Function &F) {
+ // If coverage checking was requested, compute it now.
+ if (SampleProfileRecordCoverage) {
+ unsigned Used = CoverageTracker.countUsedRecords(Samples, PSI);
+ unsigned Total = CoverageTracker.countBodyRecords(Samples, PSI);
+ unsigned Coverage = CoverageTracker.computeCoverage(Used, Total);
+ if (Coverage < SampleProfileRecordCoverage) {
+ F.getContext().diagnose(DiagnosticInfoSampleProfile(
+ F.getSubprogram()->getFilename(), getFunctionLoc(F),
+ Twine(Used) + " of " + Twine(Total) + " available profile records (" +
+ Twine(Coverage) + "%) were applied",
+ DS_Warning));
+ }
+ }
+
+ if (SampleProfileSampleCoverage) {
+ uint64_t Used = CoverageTracker.getTotalUsedSamples();
+ uint64_t Total = CoverageTracker.countBodySamples(Samples, PSI);
+ unsigned Coverage = CoverageTracker.computeCoverage(Used, Total);
+ if (Coverage < SampleProfileSampleCoverage) {
+ F.getContext().diagnose(DiagnosticInfoSampleProfile(
+ F.getSubprogram()->getFilename(), getFunctionLoc(F),
+ Twine(Used) + " of " + Twine(Total) + " available profile samples (" +
+ Twine(Coverage) + "%) were applied",
+ DS_Warning));
+ }
+ }
+}
+
+/// Get the line number for the function header.
+///
+/// This looks up function \p F in the current compilation unit and
+/// retrieves the line number where the function is defined. This is
+/// line 0 for all the samples read from the profile file. Every line
+/// number is relative to this line.
+///
+/// \param F Function object to query.
+///
+/// \returns the line number where \p F is defined. If it returns 0,
+/// it means that there is no debug information available for \p F.
+unsigned SampleProfileLoaderBaseImpl::getFunctionLoc(Function &F) {
+ if (DISubprogram *S = F.getSubprogram())
+ return S->getLine();
+
+ if (NoWarnSampleUnused)
+ return 0;
+
+ // If the start of \p F is missing, emit a diagnostic to inform the user
+ // about the missed opportunity.
+ F.getContext().diagnose(DiagnosticInfoSampleProfile(
+ "No debug information found in function " + F.getName() +
+ ": Function profile not used",
+ DS_Warning));
+ return 0;
+}
+
+void SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(Function &F) {
+ DT.reset(new DominatorTree);
+ DT->recalculate(F);
+
+ PDT.reset(new PostDominatorTree(F));
+
+ LI.reset(new LoopInfo);
+ LI->analyze(*DT);
+}
+
+#undef DEBUG_TYPE
+
+} // namespace llvm
+#endif // LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H
diff --git a/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h
new file mode 100644
index 000000000000..37dc8d8187d9
--- /dev/null
+++ b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h
@@ -0,0 +1,97 @@
+////===- SampleProfileLoadBaseUtil.h - Profile loader util func --*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// This file provides the utility functions for the sampled PGO loader base
+/// implementation.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H
+#define LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/CFG.h"
+#include "llvm/IR/DebugLoc.h"
+#include "llvm/IR/Function.h"
+#include "llvm/ProfileData/SampleProf.h"
+#include "llvm/Support/CommandLine.h"
+
+namespace llvm {
+using namespace sampleprof;
+
+extern cl::opt<unsigned> SampleProfileMaxPropagateIterations;
+extern cl::opt<unsigned> SampleProfileRecordCoverage;
+extern cl::opt<unsigned> SampleProfileSampleCoverage;
+extern cl::opt<bool> NoWarnSampleUnused;
+
+namespace sampleprofutil {
+
+class SampleCoverageTracker {
+public:
+ bool markSamplesUsed(const FunctionSamples *FS, uint32_t LineOffset,
+ uint32_t Discriminator, uint64_t Samples);
+ unsigned computeCoverage(unsigned Used, unsigned Total) const;
+ unsigned countUsedRecords(const FunctionSamples *FS,
+ ProfileSummaryInfo *PSI) const;
+ unsigned countBodyRecords(const FunctionSamples *FS,
+ ProfileSummaryInfo *PSI) const;
+ uint64_t getTotalUsedSamples() const { return TotalUsedSamples; }
+ uint64_t countBodySamples(const FunctionSamples *FS,
+ ProfileSummaryInfo *PSI) const;
+
+ void clear() {
+ SampleCoverage.clear();
+ TotalUsedSamples = 0;
+ }
+ void setProfAccForSymsInList(bool V) { ProfAccForSymsInList = V; }
+
+private:
+ using BodySampleCoverageMap = std::map<LineLocation, unsigned>;
+ using FunctionSamplesCoverageMap =
+ DenseMap<const FunctionSamples *, BodySampleCoverageMap>;
+
+ /// Coverage map for sampling records.
+ ///
+ /// This map keeps a record of sampling records that have been matched to
+ /// an IR instruction. This is used to detect some form of staleness in
+ /// profiles (see flag -sample-profile-check-coverage).
+ ///
+ /// Each entry in the map corresponds to a FunctionSamples instance. This is
+ /// another map that counts how many times the sample record at the
+ /// given location has been used.
+ FunctionSamplesCoverageMap SampleCoverage;
+
+ /// Number of samples used from the profile.
+ ///
+ /// When a sampling record is used for the first time, the samples from
+ /// that record are added to this accumulator. Coverage is later computed
+ /// based on the total number of samples available in this function and
+ /// its callsites.
+ ///
+ /// Note that this accumulator tracks samples used from a single function
+ /// and all the inlined callsites. Strictly, we should have a map of counters
+ /// keyed by FunctionSamples pointers, but these stats are cleared after
+ /// every function, so we just need to keep a single counter.
+ uint64_t TotalUsedSamples = 0;
+
+ // For symbol in profile symbol list, whether to regard their profiles
+ // to be accurate. This is passed from the SampleLoader instance.
+ bool ProfAccForSymsInList = false;
+};
+
+/// Return true if the given callsite is hot wrt to hot cutoff threshold.
+bool callsiteIsHot(const FunctionSamples *CallsiteFS, ProfileSummaryInfo *PSI,
+ bool ProfAccForSymsInList);
+
+} // end of namespace sampleprofutil
+} // end of namespace llvm
+
+#endif // LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H
diff --git a/llvm/lib/ProfileData/CMakeLists.txt b/llvm/lib/ProfileData/CMakeLists.txt
index 2a377e4d74d3..4125fac918ab 100644
--- a/llvm/lib/ProfileData/CMakeLists.txt
+++ b/llvm/lib/ProfileData/CMakeLists.txt
@@ -5,6 +5,7 @@ add_llvm_component_library(LLVMProfileData
InstrProfWriter.cpp
ProfileSummaryBuilder.cpp
</cut>
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-release-arm-next-allmodconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-release-arm-next-allmodconfig
Culprit:
<cut>
commit fad7cd3310db3099f95dd34312c77740fbc455e5
Author: Baokun Li <libaokun1(a)huawei.com>
Date: Wed Aug 4 10:12:12 2021 +0800
nbd: add the check to prevent overflow in __nbd_ioctl()
If user specify a large enough value of NBD blocks option, it may trigger
signed integer overflow which may lead to nbd->config->bytesize becomes a
large or small value, zero in particular.
UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31
signed integer overflow:
1024 * 4611686155866341414 cannot be represented in type 'long long int'
[...]
Call trace:
[...]
handle_overflow+0x188/0x1dc lib/ubsan.c:192
__ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
nbd_size_set drivers/block/nbd.c:325 [inline]
__nbd_ioctl drivers/block/nbd.c:1342 [inline]
nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395
__blkdev_driver_ioctl block/ioctl.c:311 [inline]
[...]
Although it is not a big deal, still silence the UBSAN by limit
the input value.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com
[axboe: dropped unlikely()]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
</cut>
Results regressed to (for first_bad == fad7cd3310db3099f95dd34312c77740fbc455e5)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
21709
# First few build errors in logs:
# 00:07:12 make[1]: *** [modules-only.symvers] Error 1
# 00:07:12 make: *** [modules] Error 2
from (for last_good == da20b58d5bbbb0d23ae9530992a37d0f0d1787a4)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29751
# linux build successful:
all
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#ecf9343…"
Reproduce builds:
<cut>
mkdir investigate-linux-fad7cd3310db3099f95dd34312c77740fbc455e5
cd investigate-linux-fad7cd3310db3099f95dd34312c77740fbc455e5
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach fad7cd3310db3099f95dd34312c77740fbc455e5
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach da20b58d5bbbb0d23ae9530992a37d0f0d1787a4
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Full commit (up to 1000 lines):
<cut>
commit fad7cd3310db3099f95dd34312c77740fbc455e5
Author: Baokun Li <libaokun1(a)huawei.com>
Date: Wed Aug 4 10:12:12 2021 +0800
nbd: add the check to prevent overflow in __nbd_ioctl()
If user specify a large enough value of NBD blocks option, it may trigger
signed integer overflow which may lead to nbd->config->bytesize becomes a
large or small value, zero in particular.
UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31
signed integer overflow:
1024 * 4611686155866341414 cannot be represented in type 'long long int'
[...]
Call trace:
[...]
handle_overflow+0x188/0x1dc lib/ubsan.c:192
__ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
nbd_size_set drivers/block/nbd.c:325 [inline]
__nbd_ioctl drivers/block/nbd.c:1342 [inline]
nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395
__blkdev_driver_ioctl block/ioctl.c:311 [inline]
[...]
Although it is not a big deal, still silence the UBSAN by limit
the input value.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com
[axboe: dropped unlikely()]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/nbd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index c38317979f74..f82264835794 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1384,6 +1384,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
unsigned int cmd, unsigned long arg)
{
struct nbd_config *config = nbd->config;
+ loff_t bytesize;
switch (cmd) {
case NBD_DISCONNECT:
@@ -1398,8 +1399,9 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
case NBD_SET_SIZE:
return nbd_set_size(nbd, arg, config->blksize);
case NBD_SET_SIZE_BLOCKS:
- return nbd_set_size(nbd, arg * config->blksize,
- config->blksize);
+ if (check_mul_overflow((loff_t)arg, config->blksize, &bytesize))
+ return -EINVAL;
+ return nbd_set_size(nbd, bytesize, config->blksize);
case NBD_SET_TIMEOUT:
nbd_set_cmd_timeout(nbd, arg);
return 0;
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3
Culprit:
<cut>
commit 6998f8ae2d14e096aff33968f226587b5c1a193a
Author: David Sherwood <david.sherwood(a)arm.com>
Date: Wed Mar 10 08:34:19 2021 +0000
[LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...
After some investigation it seems that there are only these cases that occur
in practice:
1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.
I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.
Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.
I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.
Differential Revision: https://reviews.llvm.org/D99718
</cut>
Results regressed to (for first_bad == 6998f8ae2d14e096aff33968f226587b5c1a193a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-6998f8ae2d14e096aff33968f226587b5c1a193a/results_id:
1
# 462.libquantum,libquantum_base.default regressed by 114
# 462.libquantum,[.] quantum_toffoli regressed by 123
# 462.libquantum,[.] quantum_cnot regressed by 115
from (for last_good == c835630c25a4f9925517949579f66a43b113fbc9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-c835630c25a4f9925517949579f66a43b113fbc9/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3/3744
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3/3755
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-6998f8ae2d14e096aff33968f226587b5c1a193a
cd investigate-llvm-6998f8ae2d14e096aff33968f226587b5c1a193a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 6998f8ae2d14e096aff33968f226587b5c1a193a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c835630c25a4f9925517949579f66a43b113fbc9
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 6998f8ae2d14e096aff33968f226587b5c1a193a
Author: David Sherwood <david.sherwood(a)arm.com>
Date: Wed Mar 10 08:34:19 2021 +0000
[LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...
After some investigation it seems that there are only these cases that occur
in practice:
1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.
I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.
Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.
I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.
Differential Revision: https://reviews.llvm.org/D99718
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 73 +++++++++++++---------
.../AArch64/no_vector_instructions.ll | 2 +-
.../LoopVectorize/AArch64/predication_costs.ll | 35 +++++++++++
.../Transforms/LoopVectorize/scalarized-bitcast.ll | 40 ++++++++++++
4 files changed, 121 insertions(+), 29 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2b413fc49505..f25af23c86c2 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7383,10 +7383,39 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *RetTy = I->getType();
if (canTruncateToMinimalBitwidth(I, VF))
RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);
- VectorTy = isScalarAfterVectorization(I, VF) ? RetTy : ToVectorTy(RetTy, VF);
auto SE = PSE.getSE();
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+ auto hasSingleCopyAfterVectorization = [this](Instruction *I,
+ ElementCount VF) -> bool {
+ if (VF.isScalar())
+ return true;
+
+ auto Scalarized = InstsToScalarize.find(VF);
+ assert(Scalarized != InstsToScalarize.end() &&
+ "VF not yet analyzed for scalarization profitability");
+ return !Scalarized->second.count(I) &&
+ llvm::all_of(I->users(), [&](User *U) {
+ auto *UI = cast<Instruction>(U);
+ return !Scalarized->second.count(UI);
+ });
+ };
+
+ if (isScalarAfterVectorization(I, VF)) {
+ // With the exception of GEPs and PHIs, after scalarization there should
+ // only be one copy of the instruction generated in the loop. This is
+ // because the VF is either 1, or any instructions that need scalarizing
+ // have already been dealt with by the the time we get here. As a result,
+ // it means we don't have to multiply the instruction cost by VF.
+ assert(I->getOpcode() == Instruction::GetElementPtr ||
+ I->getOpcode() == Instruction::PHI ||
+ (I->getOpcode() == Instruction::BitCast &&
+ I->getType()->isPointerTy()) ||
+ hasSingleCopyAfterVectorization(I, VF));
+ VectorTy = RetTy;
+ } else
+ VectorTy = ToVectorTy(RetTy, VF);
+
// TODO: We need to estimate the cost of intrinsic calls.
switch (I->getOpcode()) {
case Instruction::GetElementPtr:
@@ -7514,21 +7543,16 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Op2VK = TargetTransformInfo::OK_UniformValue;
SmallVector<const Value *, 4> Operands(I->operand_values());
- unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
- return N * TTI.getArithmeticInstrCost(
- I->getOpcode(), VectorTy, CostKind,
- TargetTransformInfo::OK_AnyValue,
- Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
+ return TTI.getArithmeticInstrCost(
+ I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
+ Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
}
case Instruction::FNeg: {
assert(!VF.isScalable() && "VF is assumed to be non scalable.");
- unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
- return N * TTI.getArithmeticInstrCost(
- I->getOpcode(), VectorTy, CostKind,
- TargetTransformInfo::OK_AnyValue,
- TargetTransformInfo::OK_AnyValue,
- TargetTransformInfo::OP_None, TargetTransformInfo::OP_None,
- I->getOperand(0), I);
+ return TTI.getArithmeticInstrCost(
+ I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
+ TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None,
+ TargetTransformInfo::OP_None, I->getOperand(0), I);
}
case Instruction::Select: {
SelectInst *SI = cast<SelectInst>(I);
@@ -7583,6 +7607,10 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
VectorTy = ToVectorTy(getMemInstValueType(I), Width);
return getMemoryInstructionCost(I, VF);
}
+ case Instruction::BitCast:
+ if (I->getType()->isPointerTy())
+ return 0;
+ LLVM_FALLTHROUGH;
case Instruction::ZExt:
case Instruction::SExt:
case Instruction::FPToUI:
@@ -7593,8 +7621,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::SIToFP:
case Instruction::UIToFP:
case Instruction::Trunc:
- case Instruction::FPTrunc:
- case Instruction::BitCast: {
+ case Instruction::FPTrunc: {
// Computes the CastContextHint from a Load/Store instruction.
auto ComputeCCH = [&](Instruction *I) -> TTI::CastContextHint {
assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
@@ -7672,14 +7699,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
}
}
- unsigned N;
- if (isScalarAfterVectorization(I, VF)) {
- assert(!VF.isScalable() && "VF is assumed to be non scalable");
- N = VF.getKnownMinValue();
- } else
- N = 1;
- return N *
- TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
+ return TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
}
case Instruction::Call: {
bool NeedToScalarize;
@@ -7694,11 +7714,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::ExtractValue:
return TTI.getInstructionCost(I, TTI::TCK_RecipThroughput);
default:
- // The cost of executing VF copies of the scalar instruction. This opcode
- // is unknown. Assume that it is the same as 'mul'.
- return VF.getKnownMinValue() * TTI.getArithmeticInstrCost(
- Instruction::Mul, VectorTy, CostKind) +
- getScalarizationOverhead(I, VF);
+ // This opcode is unknown. Assume that it is the same as 'mul'.
+ return TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy, CostKind);
} // end of switch.
}
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
index 247ea35ff5d0..3061998518ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
@@ -6,7 +6,7 @@ target triple = "aarch64--linux-gnu"
; CHECK-LABEL: all_scalar
; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions
;
define void @all_scalar(i64* %a, i64 %n) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
index b0ebb4edf2ad..858b28ddd321 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
@@ -86,6 +86,41 @@ for.end:
ret void
}
+; CHECK-LABEL: predicated_store_phi
+;
+; Same as predicate_store except we use a pointer PHI to maintain the address
+;
+; CHECK: Found new scalar instruction: %addr = phi i32* [ %a, %entry ], [ %addr.next, %for.inc ]
+; CHECK: Found new scalar instruction: %addr.next = getelementptr inbounds i32, i32* %addr, i64 1
+; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %addr, align 4
+; CHECK: Found an estimated cost of 0 for VF 2 For instruction: %addr = phi i32* [ %a, %entry ], [ %addr.next, %for.inc ]
+; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %addr, align 4
+;
+define void @predicated_store_phi(i32* %a, i1 %c, i32 %x, i64 %n) {
+entry:
+ br label %for.body
+
+for.body:
+ %i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
+ %addr = phi i32 * [ %a, %entry ], [ %addr.next, %for.inc ]
+ %tmp1 = load i32, i32* %addr, align 4
+ %tmp2 = add nsw i32 %tmp1, %x
+ br i1 %c, label %if.then, label %for.inc
+
+if.then:
+ store i32 %tmp2, i32* %addr, align 4
+ br label %for.inc
+
+for.inc:
+ %i.next = add nuw nsw i64 %i, 1
+ %cond = icmp slt i64 %i.next, %n
+ %addr.next = getelementptr inbounds i32, i32* %addr, i64 1
+ br i1 %cond, label %for.body, label %for.end
+
+for.end:
+ ret void
+}
+
; CHECK-LABEL: predicated_udiv_scalarized_operand
;
; This test checks that we correctly compute the cost of the predicated udiv
diff --git a/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll b/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll
new file mode 100644
index 000000000000..0c97e6ac475e
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll
@@ -0,0 +1,40 @@
+; REQUIRES: asserts
+; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -S -o - < %s 2>&1 | FileCheck %s
+
+%struct.foo = type { i32, i64 }
+
+; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %0 = bitcast i64* %b to i32*
+
+; The bitcast below will be scalarized due to the predication in the loop. Bitcasts
+; between pointer types should be treated as free, despite the scalarization.
+define void @foo(%struct.foo* noalias nocapture %in, i32* noalias nocapture readnone %out, i64 %n) {
+entry:
+ br label %for.body
+
+for.body: ; preds = %entry, %if.end
+ %i.012 = phi i64 [ %inc, %if.end ], [ 0, %entry ]
+ %b = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 1
+ %0 = bitcast i64* %b to i32*
+ %a = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 0
+ %1 = load i32, i32* %a, align 8
+ %tobool.not = icmp eq i32 %1, 0
+ br i1 %tobool.not, label %if.end, label %land.lhs.true
+
+land.lhs.true: ; preds = %for.body
+ %2 = load i32, i32* %0, align 4
+ %cmp2 = icmp sgt i32 %2, 0
+ br i1 %cmp2, label %if.then, label %if.end
+
+if.then: ; preds = %land.lhs.true
+ %sub = add nsw i32 %2, -1
+ store i32 %sub, i32* %0, align 4
+ br label %if.end
+
+if.end: ; preds = %if.then, %land.lhs.true, %for.body
+ %inc = add nuw nsw i64 %i.012, 1
+ %exitcond.not = icmp eq i64 %inc, %n
+ br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end: ; preds = %if.end
+ ret void
+}
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO
Culprit:
<cut>
commit 4aafd5f00c2a772337ec065d4542ef158453a343
Author: Jan Svoboda <jan_svoboda(a)apple.com>
Date: Fri Aug 6 14:46:41 2021 +0200
[clang] Remove misleading assertion in FullSourceLoc
D31709 added an assertion was added to `FullSourceLoc::hasManager()` that ensured a valid `SourceLocation` is always paired with a `SourceManager`, and missing `SourceManager` is always paired with an invalid `SourceLocation`.
This appears to be incorrect, since clients never cared about constructing `FullSourceLoc` to uphold that invariant, or always checking `isValid()` before calling `hasManager()`.
The assertion started failing when serializing diagnostics pointing into an explicit module. Explicit modules don't have valid `SourceLocation` for the `import` statement, since they are "imported" from the command-line argument `-fmodule-name=x.pcm`.
This patch removes the assertion, since `FullSourceLoc` was never intended to uphold any kind of invariants between the validity of `SourceLocation` and presence of `SourceManager`.
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D106862
</cut>
Results regressed to (for first_bad == 4aafd5f00c2a772337ec065d4542ef158453a343)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_LTO artifacts/build-4aafd5f00c2a772337ec065d4542ef158453a343/results_id:
1
# 470.lbm,lbm_base.default regressed by 104
from (for last_good == 3709822d2602b8b7db2d9bcc0e856f676582f25d)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_LTO artifacts/build-3709822d2602b8b7db2d9bcc0e856f676582f25d/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of last_good: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/3746
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of first_bad: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/3725
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-4aafd5f00c2a772337ec065d4542ef158453a343
cd investigate-llvm-4aafd5f00c2a772337ec065d4542ef158453a343
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 4aafd5f00c2a772337ec065d4542ef158453a343
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 3709822d2602b8b7db2d9bcc0e856f676582f25d
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 4aafd5f00c2a772337ec065d4542ef158453a343
Author: Jan Svoboda <jan_svoboda(a)apple.com>
Date: Fri Aug 6 14:46:41 2021 +0200
[clang] Remove misleading assertion in FullSourceLoc
D31709 added an assertion was added to `FullSourceLoc::hasManager()` that ensured a valid `SourceLocation` is always paired with a `SourceManager`, and missing `SourceManager` is always paired with an invalid `SourceLocation`.
This appears to be incorrect, since clients never cared about constructing `FullSourceLoc` to uphold that invariant, or always checking `isValid()` before calling `hasManager()`.
The assertion started failing when serializing diagnostics pointing into an explicit module. Explicit modules don't have valid `SourceLocation` for the `import` statement, since they are "imported" from the command-line argument `-fmodule-name=x.pcm`.
This patch removes the assertion, since `FullSourceLoc` was never intended to uphold any kind of invariants between the validity of `SourceLocation` and presence of `SourceManager`.
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D106862
---
clang/include/clang/Basic/SourceLocation.h | 13 +++++++------
clang/test/Modules/Inputs/explicit-build-diags/a.h | 1 +
.../Modules/Inputs/explicit-build-diags/module.modulemap | 1 +
clang/test/Modules/explicit-build-diags.cpp | 8 ++++++++
4 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h
index 540de23b9f55..ba2e9156a2b1 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -363,6 +363,10 @@ class FileEntry;
/// A SourceLocation and its associated SourceManager.
///
/// This is useful for argument passing to functions that expect both objects.
+///
+/// This class does not guarantee the presence of either the SourceManager or
+/// a valid SourceLocation. Clients should use `isValid()` and `hasManager()`
+/// before calling the member functions.
class FullSourceLoc : public SourceLocation {
const SourceManager *SrcMgr = nullptr;
@@ -373,13 +377,10 @@ public:
explicit FullSourceLoc(SourceLocation Loc, const SourceManager &SM)
: SourceLocation(Loc), SrcMgr(&SM) {}
- bool hasManager() const {
- bool hasSrcMgr = SrcMgr != nullptr;
- assert(hasSrcMgr == isValid() && "FullSourceLoc has location but no manager");
- return hasSrcMgr;
- }
+ /// Checks whether the SourceManager is present.
+ bool hasManager() const { return SrcMgr != nullptr; }
- /// \pre This FullSourceLoc has an associated SourceManager.
+ /// \pre hasManager()
const SourceManager &getManager() const {
assert(SrcMgr && "SourceManager is NULL.");
return *SrcMgr;
diff --git a/clang/test/Modules/Inputs/explicit-build-diags/a.h b/clang/test/Modules/Inputs/explicit-build-diags/a.h
new file mode 100644
index 000000000000..486941dde83b
--- /dev/null
+++ b/clang/test/Modules/Inputs/explicit-build-diags/a.h
@@ -0,0 +1 @@
+void a() __attribute__((deprecated));
diff --git a/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap b/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap
new file mode 100644
index 000000000000..bb00c840ce39
--- /dev/null
+++ b/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap
@@ -0,0 +1 @@
+module a { header "a.h" }
diff --git a/clang/test/Modules/explicit-build-diags.cpp b/clang/test/Modules/explicit-build-diags.cpp
new file mode 100644
index 000000000000..4a37dc108a68
--- /dev/null
+++ b/clang/test/Modules/explicit-build-diags.cpp
@@ -0,0 +1,8 @@
+// RUN: rm -rf %t && mkdir %t
+// RUN: %clang_cc1 -fmodules -x c++ %S/Inputs/explicit-build-diags/module.modulemap -fmodule-name=a -emit-module -o %t/a.pcm
+// RUN: %clang_cc1 -fmodules -Wdeprecated-declarations -fdiagnostics-show-note-include-stack -serialize-diagnostic-file %t/tu.dia \
+// RUN: -I %S/Inputs/explicit-build-diags -fmodule-file=%t/a.pcm -fsyntax-only %s
+
+#include "a.h"
+
+void foo() { a(); }
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3
Culprit:
<cut>
commit e771614bae0a05585f720812d5936a0b81dcddf0
Author: David Green <david.green(a)arm.com>
Date: Thu Feb 11 11:58:55 2021 +0000
[ARM] Change getScalarizationOverhead overload used in gather costs. NFC
This changes which of the getScalarizationOverhead overloads is used in
the gather/scatter cost to use the base variant directly, not relying on
the version using heuristics on the number of args with no args
provided. It should still produce the same costs for scalarized
gathers/scatters.
</cut>
Results regressed to (for first_bad == e771614bae0a05585f720812d5936a0b81dcddf0)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-e771614bae0a05585f720812d5936a0b81dcddf0/results_id:
1
# 445.gobmk,[.] fastlib regressed by 115
from (for last_good == a31eae840525e9292a3a42c1fdac3fc594f42949)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-a31eae840525e9292a3a42c1fdac3fc594f42949/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/3644
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/3642
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-e771614bae0a05585f720812d5936a0b81dcddf0
cd investigate-llvm-e771614bae0a05585f720812d5936a0b81dcddf0
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach e771614bae0a05585f720812d5936a0b81dcddf0
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach a31eae840525e9292a3a42c1fdac3fc594f42949
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit e771614bae0a05585f720812d5936a0b81dcddf0
Author: David Green <david.green(a)arm.com>
Date: Thu Feb 11 11:58:55 2021 +0000
[ARM] Change getScalarizationOverhead overload used in gather costs. NFC
This changes which of the getScalarizationOverhead overloads is used in
the gather/scatter cost to use the base variant directly, not relying on
the version using heuristics on the number of args with no args
provided. It should still produce the same costs for scalarized
gathers/scatters.
---
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index af67839c2d75..de2c0607d2ed 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -1416,8 +1416,9 @@ unsigned ARMTTIImpl::getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
unsigned VectorCost = NumElems * LT.first * ST->getMVEVectorCostFactor();
// The scalarization cost should be a lot higher. We use the number of vector
// elements plus the scalarization overhead.
- unsigned ScalarCost =
- NumElems * LT.first + BaseT::getScalarizationOverhead(VTy, {});
+ unsigned ScalarCost = NumElems * LT.first +
+ BaseT::getScalarizationOverhead(VTy, true, false) +
+ BaseT::getScalarizationOverhead(VTy, false, true);
if (EltSize < 8 || Alignment < EltSize / 8)
return ScalarCost;
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO
Culprit:
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
</cut>
Results regressed to (for first_bad == 669ddd1e9b1226432b003dbba05b99f8e992285b)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-669ddd1e9b1226432b003dbba05b99f8e992285b/results_id:
1
# 473.astar,astar_base.default regressed by 106
from (for last_good == b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/3543
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/3539
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
---
llvm/CMakeLists.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
endif()
option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})
-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
- "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+ "Enable the new pass manager by default.")
include(HandleLLVMOptions)
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O2
Culprit:
<cut>
commit df7c22831f1e48dba49479c5960c1c180d8eab2c
Author: Richard Sandiford <richard.sandiford(a)arm.com>
Date: Thu Nov 14 15:12:58 2019 +0000
Support vectorisation with mixed vector sizes
After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vector region, using
related_vector_mode to pick the right vector mode for a given
element mode. No port yet takes advantage of this, but I have
a follow-on patch for AArch64.
This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.
2019-11-14 Richard Sandiford <richard.sandiford(a)arm.com>
gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Delete.
(get_vectype_for_scalar_type_and_size): Declare.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode. Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
From-SVN: r278240
</cut>
Results regressed to (for first_bad == df7c22831f1e48dba49479c5960c1c180d8eab2c)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O2 artifacts/build-df7c22831f1e48dba49479c5960c1c180d8eab2c/results_id:
1
# 453.povray,[.] _ZN3povL24All_Sphere_IntersectionsEPNS_13Objec regressed by 114
# 482.sphinx3,[.] subvq_mgau_shortlist regressed by 112
from (for last_good == 7f52eb891b738337d5cf82c7c440a5eea8c7b0c9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O2 artifacts/build-7f52eb891b738337d5cf82c7c440a5eea8c7b0c9/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O2/3483
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O2/3492
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-df7c22831f1e48dba49479c5960c1c180d8eab2c
cd investigate-gcc-df7c22831f1e48dba49479c5960c1c180d8eab2c
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach df7c22831f1e48dba49479c5960c1c180d8eab2c
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 7f52eb891b738337d5cf82c7c440a5eea8c7b0c9
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Full commit (up to 1000 lines):
<cut>
commit df7c22831f1e48dba49479c5960c1c180d8eab2c
Author: Richard Sandiford <richard.sandiford(a)arm.com>
Date: Thu Nov 14 15:12:58 2019 +0000
Support vectorisation with mixed vector sizes
After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vector region, using
related_vector_mode to pick the right vector mode for a given
element mode. No port yet takes advantage of this, but I have
a follow-on patch for AArch64.
This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.
2019-11-14 Richard Sandiford <richard.sandiford(a)arm.com>
gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Delete.
(get_vectype_for_scalar_type_and_size): Declare.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode. Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
From-SVN: r278240
---
gcc/ChangeLog | 28 +++++++++++++++++++++++++
gcc/machmode.h | 3 +++
gcc/tree-vect-loop.c | 54 +++++++++++++++++++++++++++++++++++-------------
gcc/tree-vect-slp.c | 33 +++++++++++++++++++++++++----
gcc/tree-vect-stmts.c | 57 ++++++++++++++++++++++++++++++++++++---------------
gcc/tree-vectorizer.c | 2 +-
gcc/tree-vectorizer.h | 8 +++++---
7 files changed, 147 insertions(+), 38 deletions(-)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 41c94140b1a..680aa85121a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2019-11-14 Richard Sandiford <richard.sandiford(a)arm.com>
+
+ * machmode.h (opt_machine_mode::operator==): New function.
+ (opt_machine_mode::operator!=): Likewise.
+ * tree-vectorizer.h (vec_info::vector_mode): Update comment.
+ (get_related_vectype_for_scalar_type): Delete.
+ (get_vectype_for_scalar_type_and_size): Declare.
+ * tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
+ whether analysis passed or failed, and with what vector modes.
+ Use related_vector_mode to check whether trying a particular
+ vector mode would be redundant with the autodetected mode,
+ and print a dump message if we decide to skip it.
+ * tree-vect-loop.c (vect_analyze_loop): Likewise.
+ (vect_create_epilog_for_reduction): Use
+ get_related_vectype_for_scalar_type instead of
+ get_vectype_for_scalar_type_and_size.
+ * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
+ with...
+ (get_related_vectype_for_scalar_type): ...this new function.
+ Take a starting/"prevailing" vector mode rather than a vector size.
+ Take an optional nunits argument, with the same meaning as for
+ related_vector_mode. Use related_vector_mode when not
+ auto-detecting a mode, falling back to mode_for_vector if no
+ target mode exists.
+ (get_vectype_for_scalar_type): Update accordingly.
+ (get_same_sized_vectype): Likewise.
+ * tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
+
2019-11-14 Richard Sandiford <richard.sandiford(a)arm.com>
* tree-vect-stmts.c (vectorizable_call): Require the types
diff --git a/gcc/machmode.h b/gcc/machmode.h
index 6750833c2fe..a507ed66c3f 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -258,6 +258,9 @@ public:
bool exists () const;
template<typename U> bool exists (U *) const;
+ bool operator== (const T &m) const { return m_mode == m; }
+ bool operator!= (const T &m) const { return m_mode != m; }
+
private:
machine_mode m_mode;
};
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 213d620ed2c..e60c159d11a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2435,6 +2435,17 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
if (mode_i == 0)
autodetected_vector_mode = loop_vinfo->vector_mode;
+ if (dump_enabled_p ())
+ {
+ if (res)
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Analysis succeeded with vector mode %s\n",
+ GET_MODE_NAME (loop_vinfo->vector_mode));
+ else
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Analysis failed with vector mode %s\n",
+ GET_MODE_NAME (loop_vinfo->vector_mode));
+ }
loop->aux = NULL;
if (res)
@@ -2501,9 +2512,22 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
}
if (mode_i < vector_modes.length ()
- && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
- GET_MODE_SIZE (autodetected_vector_mode)))
- mode_i += 1;
+ && VECTOR_MODE_P (autodetected_vector_mode)
+ && (related_vector_mode (vector_modes[mode_i],
+ GET_MODE_INNER (autodetected_vector_mode))
+ == autodetected_vector_mode)
+ && (related_vector_mode (autodetected_vector_mode,
+ GET_MODE_INNER (vector_modes[mode_i]))
+ == vector_modes[mode_i]))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Skipping vector mode %s, which would"
+ " repeat the analysis for %s\n",
+ GET_MODE_NAME (vector_modes[mode_i]),
+ GET_MODE_NAME (autodetected_vector_mode));
+ mode_i += 1;
+ }
if (mode_i == vector_modes.length ()
|| autodetected_vector_mode == VOIDmode)
@@ -4898,13 +4922,14 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
halves against each other. */
enum machine_mode mode1 = mode;
tree stype = TREE_TYPE (vectype);
- unsigned sz = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
- unsigned sz1 = sz;
+ unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+ unsigned nunits1 = nunits;
if (!slp_reduc
&& (mode1 = targetm.vectorize.split_reduction (mode)) != mode)
- sz1 = GET_MODE_SIZE (mode1).to_constant ();
+ nunits1 = GET_MODE_NUNITS (mode1).to_constant ();
- tree vectype1 = get_vectype_for_scalar_type_and_size (stype, sz1);
+ tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+ stype, nunits1);
reduce_with_shift = have_whole_vector_shift (mode1);
if (!VECTOR_MODE_P (mode1))
reduce_with_shift = false;
@@ -4918,11 +4943,13 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
/* First reduce the vector to the desired vector size we should
do shift reduction on by combining upper and lower halves. */
new_temp = new_phi_result;
- while (sz > sz1)
+ while (nunits > nunits1)
{
gcc_assert (!slp_reduc);
- sz /= 2;
- vectype1 = get_vectype_for_scalar_type_and_size (stype, sz);
+ nunits /= 2;
+ vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+ stype, nunits);
+ unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
/* The target has to make sure we support lowpart/highpart
extraction, either via direct vector extract or through
@@ -4947,15 +4974,14 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
= gimple_build_assign (dst2, BIT_FIELD_REF,
build3 (BIT_FIELD_REF, vectype1,
new_temp, TYPE_SIZE (vectype1),
- bitsize_int (sz * BITS_PER_UNIT)));
+ bitsize_int (bitsize)));
gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
}
else
{
/* Extract via punning to appropriately sized integer mode
vector. */
- tree eltype = build_nonstandard_integer_type (sz * BITS_PER_UNIT,
- 1);
+ tree eltype = build_nonstandard_integer_type (bitsize, 1);
tree etype = build_vector_type (eltype, 2);
gcc_assert (convert_optab_handler (vec_extract_optab,
TYPE_MODE (etype),
@@ -4984,7 +5010,7 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
= gimple_build_assign (tem, BIT_FIELD_REF,
build3 (BIT_FIELD_REF, eltype,
new_temp, TYPE_SIZE (eltype),
- bitsize_int (sz * BITS_PER_UNIT)));
+ bitsize_int (bitsize)));
gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
dst2 = make_ssa_name (vectype1);
epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 3885d9cbe4a..1e00db5a326 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3203,7 +3203,12 @@ vect_slp_bb_region (gimple_stmt_iterator region_begin,
&& dbg_cnt (vect_slp))
{
if (dump_enabled_p ())
- dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+ {
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Analysis succeeded with vector mode"
+ " %s\n", GET_MODE_NAME (bb_vinfo->vector_mode));
+ dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+ }
bb_vinfo->shared->check_datarefs ();
vect_schedule_slp (bb_vinfo);
@@ -3223,6 +3228,13 @@ vect_slp_bb_region (gimple_stmt_iterator region_begin,
vectorized = true;
}
+ else
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Analysis failed with vector mode %s\n",
+ GET_MODE_NAME (bb_vinfo->vector_mode));
+ }
if (mode_i == 0)
autodetected_vector_mode = bb_vinfo->vector_mode;
@@ -3230,9 +3242,22 @@ vect_slp_bb_region (gimple_stmt_iterator region_begin,
delete bb_vinfo;
if (mode_i < vector_modes.length ()
- && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
- GET_MODE_SIZE (autodetected_vector_mode)))
- mode_i += 1;
+ && VECTOR_MODE_P (autodetected_vector_mode)
+ && (related_vector_mode (vector_modes[mode_i],
+ GET_MODE_INNER (autodetected_vector_mode))
+ == autodetected_vector_mode)
+ && (related_vector_mode (autodetected_vector_mode,
+ GET_MODE_INNER (vector_modes[mode_i]))
+ == vector_modes[mode_i]))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Skipping vector mode %s, which would"
+ " repeat the analysis for %s\n",
+ GET_MODE_NAME (vector_modes[mode_i]),
+ GET_MODE_NAME (autodetected_vector_mode));
+ mode_i += 1;
+ }
if (vectorized
|| mode_i == vector_modes.length ()
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 80f59accad7..36f832bb522 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -11138,18 +11138,28 @@ vect_remove_stores (stmt_vec_info first_stmt_info)
}
}
-/* Function get_vectype_for_scalar_type_and_size.
+/* If NUNITS is nonzero, return a vector type that contains NUNITS
+ elements of type SCALAR_TYPE, or null if the target doesn't support
+ such a type.
- Returns the vector type corresponding to SCALAR_TYPE and SIZE as supported
- by the target. */
+ If NUNITS is zero, return a vector type that contains elements of
+ type SCALAR_TYPE, choosing whichever vector size the target prefers.
+
+ If PREVAILING_MODE is VOIDmode, we have not yet chosen a vector mode
+ for this vectorization region and want to "autodetect" the best choice.
+ Otherwise, PREVAILING_MODE is a previously-chosen vector TYPE_MODE
+ and we want the new type to be interoperable with it. PREVAILING_MODE
+ in this case can be a scalar integer mode or a vector mode; when it
+ is a vector mode, the function acts like a tree-level version of
+ related_vector_mode. */
tree
-get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
+get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
+ tree scalar_type, poly_uint64 nunits)
{
tree orig_scalar_type = scalar_type;
scalar_mode inner_mode;
machine_mode simd_mode;
- poly_uint64 nunits;
tree vectype;
if (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
@@ -11189,10 +11199,11 @@ get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
if (scalar_type == NULL_TREE)
return NULL_TREE;
- /* If no size was supplied use the mode the target prefers. Otherwise
- lookup a vector mode of the specified size. */
- if (known_eq (size, 0U))
+ /* If no prevailing mode was supplied, use the mode the target prefers.
+ Otherwise lookup a vector mode based on the prevailing mode. */
+ if (prevailing_mode == VOIDmode)
{
+ gcc_assert (known_eq (nunits, 0U));
simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
if (SCALAR_INT_MODE_P (simd_mode))
{
@@ -11208,9 +11219,19 @@ get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
return NULL_TREE;
}
}
- else if (!multiple_p (size, nbytes, &nunits)
- || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
- return NULL_TREE;
+ else if (SCALAR_INT_MODE_P (prevailing_mode)
+ || !related_vector_mode (prevailing_mode,
+ inner_mode, nunits).exists (&simd_mode))
+ {
+ /* Fall back to using mode_for_vector, mostly in the hope of being
+ able to use an integer mode. */
+ if (known_eq (nunits, 0U)
+ && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
+ return NULL_TREE;
+
+ if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+ return NULL_TREE;
+ }
vectype = build_vector_type_for_mode (scalar_type, simd_mode);
@@ -11238,9 +11259,8 @@ get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
tree
get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
{
- tree vectype;
- poly_uint64 vector_size = GET_MODE_SIZE (vinfo->vector_mode);
- vectype = get_vectype_for_scalar_type_and_size (scalar_type, vector_size);
+ tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
+ scalar_type);
if (vectype && vinfo->vector_mode == VOIDmode)
vinfo->vector_mode = TYPE_MODE (vectype);
return vectype;
@@ -11273,8 +11293,13 @@ get_same_sized_vectype (tree scalar_type, tree vector_type)
if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
return truth_type_for (vector_type);
- return get_vectype_for_scalar_type_and_size
- (scalar_type, GET_MODE_SIZE (TYPE_MODE (vector_type)));
+ poly_uint64 nunits;
+ if (!multiple_p (GET_MODE_SIZE (TYPE_MODE (vector_type)),
+ GET_MODE_SIZE (TYPE_MODE (scalar_type)), &nunits))
+ return NULL_TREE;
+
+ return get_related_vectype_for_scalar_type (TYPE_MODE (vector_type),
+ scalar_type, nunits);
}
/* Function vect_is_simple_use.
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index d6de78350e6..7be81a0b27f 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -1359,7 +1359,7 @@ get_vec_alignment_for_array_type (tree type)
poly_uint64 array_size, vector_size;
tree scalar_type = strip_array_types (type);
- tree vectype = get_vectype_for_scalar_type_and_size (scalar_type, 0);
+ tree vectype = get_related_vectype_for_scalar_type (VOIDmode, scalar_type);
if (!vectype
|| !poly_int_tree_p (TYPE_SIZE (type), &array_size)
|| !poly_int_tree_p (TYPE_SIZE (vectype), &vector_size)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f6efed1f863..fadc4d89d16 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -335,8 +335,9 @@ public:
/* Cost data used by the target cost model. */
void *target_cost_data;
- /* If we've chosen a vector size for this vectorization region,
- this is one mode that has such a size, otherwise it is VOIDmode. */
+ /* The argument we should pass to related_vector_mode when looking up
+ the vector mode for a scalar mode, or VOIDmode if we haven't yet
+ made any decisions about which vector modes to use. */
machine_mode vector_mode;
private:
@@ -1624,8 +1625,9 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
/* In tree-vect-stmts.c. */
+extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
+ poly_uint64 = 0);
extern tree get_vectype_for_scalar_type (vec_info *, tree);
-extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
extern tree get_mask_type_for_scalar_type (vec_info *, tree);
extern tree get_same_sized_vectype (tree, tree);
extern bool vect_get_loop_mask_type (loop_vec_info);
</cut>
VirtIO Initiative ([STR-9])
===========================
- posted Enabling hypervisor agnosticism for VirtIO backends
Message-Id: <87v94ldrqq.fsf(a)linaro.org>
VirtIO RPMB ([STR-5])
- made more progress and now have PROGRAM_KEY/WRITE_COUNTER done -
feels like it's getting faster
[hacking branch] <https://github.com/stsquad/virtio-rpmb/tree/hacking>
Fix VirtIO spec as per Rucha's email
------------------------------------
QEMU Upstream Work ([UM-2])
===========================
- posted [PATCH for 6.1-rc3 v1 0/4] gitlab and plugins pre-PR
Message-Id: <20210806141015.2487502-1-alex.bennee(a)linaro.org>
- prepared a potential [pull request for testing issues] but looks
like it will wait for 6.2
[pull request for testing issues]
<https://github.com/stsquad/qemu/tree/pr/120821-for-6.1-rc4-1>
Write a generic overview of vhost user usage for the manual
Enable plugins by default on TCG builds
- [X] clean-up testing matrix
Completed Reviews [10/10]
=========================
[PATCH 00/13] new plugin argument passing scheme
Message-Id: <20210717100920.240793-1-ma.mandourr(a)gmail.com>
[PATCH 0/9] new plugin argument passing scheme
Message-Id: <20210716080345.136784-1-ma.mandourr(a)gmail.com>
[RFC PATCH] Subject: [RFC PATCH] plugins: Passed the parsed arguments directly to plugins
Message-Id: <20210623155553.481099-1-ma.mandourr(a)gmail.com>
[PATCH 3/6] plugins/cache: Fixed a use-after-free bug with multithreaded usermode
Message-Id: <20210714172151.8494-4-ma.mandourr(a)gmail.com>
[PATCH v8] tests/tcg/s390x: Test SIGILL and SIGSEGV handling
Message-Id: <20210804225146.154513-1-iii(a)linux.ibm.com>
[RFC PATCH v2] Add a post for the new TCG cache modelling plugin
Message-Id: <20210617121707.764126-1-ma.mandourr(a)gmail.com>
[PATCH for 6.1] plugins: do not limit exported symbols if modules are active
Message-Id: <20210811100550.54714-1-pbonzini(a)redhat.com>
[PATCH v4 00/13] new plugin argument passing scheme
Message-Id: <20210730135817.17816-1-ma.mandourr(a)gmail.com>
[PATCH 0/6] docs/devel: Organize devel manual into further subsections
Message-Id: <20210804005621.1577302-1-jsnow(a)redhat.com>
[PATCH] Makefile: Fix cscope issues on MacOS and soft links
Message-Id: <20210801171144.60412-1-peterx(a)redhat.com>
Absences
========
- Another partial week
- On holiday for rest of August
Current Review Queue
====================
TODO [PATCH v3] accel/tcg: Clear PAGE_WRITE before translation
Message-Id: <20210805204835.158918-1-iii(a)linux.ibm.com>
=====================================================================================================================
TODO [PATCH 0/7] tcg: some small towards more modular tcg
Message-Id: <20210804143826.3402872-1-kraxel(a)redhat.com>
=================================================================================================================
TODO [PATCH 0/2] Acceptance Tests: clean up of temporary dirs and MAINTAINERS entry
Message-Id: <20210803193447.3946219-1-crosa(a)redhat.com>
==========================================================================================================================================
TODO [PATCH v2 00/11] Atomic cleanup + clang-12 build fix
Message-Id: <20210717014121.1784956-1-richard.henderson(a)linaro.org>
============================================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
+ Getting rc3 out of the door
+ Finished the systick timer refactoring series and sent it out for
review (it ended up weighing in at 25 patches...)
+ Worked through some Coverity issue reports to analyze them and
either close as false-positive or send out patches fixing them
-- PMM
I'm compiling and running a bare metal AArch64 bootloader using 3
different compilers: the Linaro / ARM GCC 10.3.1 compiler, the Linaro /
ARM GCC 10.2.1 compiler, and an in-house built GCC 10.2.0 compiler.
GDB will single step using the either of the GCC 10.2 compilers; but
runs without halting when step is requested - or perhaps steps multiple
instructions - when built using the Linaro / ARM-supplied GCC 10.3.1.
Eclipse CDT (v4.20 aka 2021-06) is able to correlate debugging
information from binaries built with either of the gcc 10.2 toolchains,
and to single step correctly through the program. Breakpoints work as
expected. Registers display fine.
Eclipse CDT is not able to correlate current PC location to source code
using the binary built with Linaro / ARM 10.3, instead bringing up a
disassembly window. Breakpoints placed at assembly instructions in the
editor do not work.
I've tried three different GDB versions - ARM's supplied 10.2 and 10.3
GDB, and the in-house built GDB. Results are the same.
The same makefile is used to create the binaries, with just a few macro
definitions to switch. The only compiler flag of interest is
-march=armv8.2-a (and of course -g -O0). -mtune=cortex-a53 doesn't help.
The board is connected via JTAG using OpenOCD 0.11.0+ and an Olimex
ARM-USB-OCD-H adapter.
I'm building in a cygwin shell on Windows 10 version 21H1 using the
compilers:
gcc-arm-10.3-2021.07-mingw-w64-i686-aarch64-none-elf.tar.xz
gcc-arm-10.2-2020.11-mingw-w64-i686-aarch64-none-elf.tar.xz
downloaded from:
https://developer.arm.com/tools-and-software/open-source-software/developer…
Differences in compiler configuration (gcc -v) are:
Failing - Linaro / ARM GCC 10.3(.1):
--enable-checking=release
--target=aarch64-none-elf
--with-libiconv-prefix=/data/jenkins/workspace/GNU-toolchain/arm-10-4/build-mingw-aarch64-none-elf/host-tools
Working - in house GCC 10.2.1:
--build=x86_64-w64-mingw32
--disable-libffi
--disable-libgomp
--disable-libmudflap
--disable-libssp
--disable-libstdcxx-pch
--disable-lto
--disable-win32-registry
--enable-multilib
--target=aarch64-elf
--with-gcc
--with-gnu-as
--with-gnu-ld
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
--with-multilib-list=lp64,ilp32
--with-stabs
--with-sysroot=/build/aarch64-elf_10.2.0/cross-gcc/aarch64-elf
--with-zstd=/build/aarch64-elf_10.2.0/host
Has anyone been able to perform hardware debugging of binaries built
with the latest 10.3 builds using GDB (and maybe even Eclipse CDT)?
Any suggestions as to other steps to try?
Thanks.
== This Week ==
* GNU-708 (Attribute to mark param as const)
- Created prototype patch
- Discussions on gcc mailing list
* PR66791 (replace builtins in intrinsics with vector extensions)
- Fixed issue with PR98435 test-case as suggested by Christophe
- Pinged patches for review.
== Next Week ==
- GNU-708, PR66791
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3_LTO
Culprit:
<cut>
commit f31da42e047e8018ca6ad9809273bc7efb6ffcaf
Author: Richard Biener <rguenther(a)suse.de>
Date: Fri Aug 6 14:39:05 2021 +0200
tree-optimization/101801 - remove vect_worthwhile_without_simd_p
This removes the cost part of vect_worthwhile_without_simd_p, retaining
only the correctness bits. The reason is that the cost heuristic
do not properly account for SLP plus the check whether "without simd"
applies misfires for AVX512 mask vectors at the moment, leading to
missed vectorizations there.
Any costing decision should take place in the cost modeling, no
single stmt is to disable all vectorization on its own.
2021-08-06 Richard Biener <rguenther(a)suse.de>
PR tree-optimization/101801
* tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename...
(vect_can_vectorize_without_simd_p): ... to this.
* tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename...
(vect_can_vectorize_without_simd_p): ... to this and fold
in vect_min_worthwhile_factor.
(vect_min_worthwhile_factor): Remove.
(vectorizable_reduction): Adjust and remove the cost part.
* tree-vect-stmts.c (vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
</cut>
Results regressed to (for first_bad == f31da42e047e8018ca6ad9809273bc7efb6ffcaf)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O3_LTO_marm artifacts/build-f31da42e047e8018ca6ad9809273bc7efb6ffcaf/results_id:
1
# 482.sphinx3,sphinx_livepretend_base.default regressed by 105
from (for last_good == c2a984a3570b908a44a35e43bb48f0a05196156a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O3_LTO_marm artifacts/build-c2a984a3570b908a44a35e43bb48f0a05196156a/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Results ID of last_good: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O3_LTO/3203
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Results ID of first_bad: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O3_LTO/3211
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-f31da42e047e8018ca6ad9809273bc7efb6ffcaf
cd investigate-gcc-f31da42e047e8018ca6ad9809273bc7efb6ffcaf
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach f31da42e047e8018ca6ad9809273bc7efb6ffcaf
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c2a984a3570b908a44a35e43bb48f0a05196156a
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Full commit (up to 1000 lines):
<cut>
commit f31da42e047e8018ca6ad9809273bc7efb6ffcaf
Author: Richard Biener <rguenther(a)suse.de>
Date: Fri Aug 6 14:39:05 2021 +0200
tree-optimization/101801 - remove vect_worthwhile_without_simd_p
This removes the cost part of vect_worthwhile_without_simd_p, retaining
only the correctness bits. The reason is that the cost heuristic
do not properly account for SLP plus the check whether "without simd"
applies misfires for AVX512 mask vectors at the moment, leading to
missed vectorizations there.
Any costing decision should take place in the cost modeling, no
single stmt is to disable all vectorization on its own.
2021-08-06 Richard Biener <rguenther(a)suse.de>
PR tree-optimization/101801
* tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename...
(vect_can_vectorize_without_simd_p): ... to this.
* tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename...
(vect_can_vectorize_without_simd_p): ... to this and fold
in vect_min_worthwhile_factor.
(vect_min_worthwhile_factor): Remove.
(vectorizable_reduction): Adjust and remove the cost part.
* tree-vect-stmts.c (vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
---
gcc/tree-vect-loop.c | 43 +++++++------------------------------------
gcc/tree-vect-stmts.c | 26 ++------------------------
gcc/tree-vectorizer.h | 2 +-
3 files changed, 10 insertions(+), 61 deletions(-)
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 1e21fe6b13d..37c7daa7f9e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7227,24 +7227,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
if (dump_enabled_p ())
dump_printf (MSG_NOTE, "op not supported by target.\n");
if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
- || !vect_worthwhile_without_simd_p (loop_vinfo, code))
+ || !vect_can_vectorize_without_simd_p (code))
ok = false;
else
if (dump_enabled_p ())
dump_printf (MSG_NOTE, "proceeding using word mode.\n");
}
- /* Worthwhile without SIMD support? */
- if (ok
- && !VECTOR_MODE_P (TYPE_MODE (vectype_in))
- && !vect_worthwhile_without_simd_p (loop_vinfo, code))
- {
- if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "not worthwhile without SIMD support.\n");
- ok = false;
- }
-
/* lane-reducing operations have to go through vect_transform_reduction.
For the other cases try without the single cycle optimization. */
if (!ok)
@@ -7948,46 +7937,28 @@ vectorizable_phi (vec_info *,
}
-/* Function vect_min_worthwhile_factor.
+/* Return true if we can emulate CODE on an integer mode representation
+ of a vector. */
- For a loop where we could vectorize the operation indicated by CODE,
- return the minimum vectorization factor that makes it worthwhile
- to use generic vectors. */
-static unsigned int
-vect_min_worthwhile_factor (enum tree_code code)
+bool
+vect_can_vectorize_without_simd_p (tree_code code)
{
switch (code)
{
case PLUS_EXPR:
case MINUS_EXPR:
case NEGATE_EXPR:
- return 4;
-
case BIT_AND_EXPR:
case BIT_IOR_EXPR:
case BIT_XOR_EXPR:
case BIT_NOT_EXPR:
- return 2;
+ return true;
default:
- return INT_MAX;
+ return false;
}
}
-/* Return true if VINFO indicates we are doing loop vectorization and if
- it is worth decomposing CODE operations into scalar operations for
- that loop's vectorization factor. */
-
-bool
-vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code)
-{
- loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
- unsigned HOST_WIDE_INT value;
- return (loop_vinfo
- && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&value)
- && value >= vect_min_worthwhile_factor (code));
-}
-
/* Function vectorizable_induction
Check if STMT_INFO performs an induction computation that can be vectorized.
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 94bdb74ea8d..5b94d41e292 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5685,24 +5685,13 @@ vectorizable_shift (vec_info *vinfo,
/* Check only during analysis. */
if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
|| (!vec_stmt
- && !vect_worthwhile_without_simd_p (vinfo, code)))
+ && !vect_can_vectorize_without_simd_p (code)))
return false;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
"proceeding using word mode.\n");
}
- /* Worthwhile without SIMD support? Check only during analysis. */
- if (!vec_stmt
- && !VECTOR_MODE_P (TYPE_MODE (vectype))
- && !vect_worthwhile_without_simd_p (vinfo, code))
- {
- if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "not worthwhile without SIMD support.\n");
- return false;
- }
-
if (!vec_stmt) /* transformation not required. */
{
if (slp_node
@@ -6094,24 +6083,13 @@ vectorizable_operation (vec_info *vinfo,
"op not supported by target.\n");
/* Check only during analysis. */
if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
- || (!vec_stmt && !vect_worthwhile_without_simd_p (vinfo, code)))
+ || (!vec_stmt && !vect_can_vectorize_without_simd_p (code)))
return false;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
"proceeding using word mode.\n");
}
- /* Worthwhile without SIMD support? Check only during analysis. */
- if (!VECTOR_MODE_P (vec_mode)
- && !vec_stmt
- && !vect_worthwhile_without_simd_p (vinfo, code))
- {
- if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "not worthwhile without SIMD support.\n");
- return false;
- }
-
int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
internal_fn cond_fn = get_conditional_internal_fn (code);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 5571b3cce3b..de0ecf86478 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2061,7 +2061,7 @@ extern bool vectorizable_lc_phi (loop_vec_info, stmt_vec_info,
gimple **, slp_tree);
extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree,
stmt_vector_for_cost *);
-extern bool vect_worthwhile_without_simd_p (vec_info *, tree_code);
+extern bool vect_can_vectorize_without_simd_p (tree_code);
extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
stmt_vector_for_cost *,
stmt_vector_for_cost *,
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os
Culprit:
<cut>
commit b9bb6a5e12cae44a1cbf298b69f28fc6871f81c8
Author: Jakub Jelinek <jakub(a)redhat.com>
Date: Tue Aug 11 16:46:49 2020 +0200
c-family: Fix ICE in get_atomic_generic_size [PR96545]
As the testcase shows, we would ICE if the type of the first argument of
various atomic builtins was pointer to (non-void) incomplete type, we would
assume that TYPE_SIZE_UNIT must be non-NULL. This patch diagnoses it
instead. And also changes the TREE_CODE != INTEGER_CST check to
!tree_fits_uhwi_p, as we use tree_to_uhwi after this and at least in theory
the int could be too large and not fit.
2020-08-11 Jakub Jelinek <jakub(a)redhat.com>
PR c/96545
* c-common.c (get_atomic_generic_size): Require that first argument's
type points to a complete type and use tree_fits_uhwi_p instead of
just INTEGER_CST TREE_CODE check for the TYPE_SIZE_UNIT.
* c-c++-common/pr96545.c: New test.
(cherry picked from commit 7840b4dc05539cf5575b3e9ff57ff5f6c3da2cae)
</cut>
Results regressed to (for first_bad == b9bb6a5e12cae44a1cbf298b69f28fc6871f81c8)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Os_mthumb artifacts/build-b9bb6a5e12cae44a1cbf298b69f28fc6871f81c8/results_id:
1
# 429.mcf,mcf_base.default regressed by 104
# 470.lbm,lbm_base.default regressed by 103
from (for last_good == db00336a49707327552e678b59da8e85384bdae6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Os_mthumb artifacts/build-db00336a49707327552e678b59da8e85384bdae6/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Results ID of last_good: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-release-arm-spec2k6-Os/3201
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Results ID of first_bad: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-release-arm-spec2k6-Os/3141
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-b9bb6a5e12cae44a1cbf298b69f28fc6871f81c8
cd investigate-gcc-b9bb6a5e12cae44a1cbf298b69f28fc6871f81c8
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach b9bb6a5e12cae44a1cbf298b69f28fc6871f81c8
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach db00336a49707327552e678b59da8e85384bdae6
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit b9bb6a5e12cae44a1cbf298b69f28fc6871f81c8
Author: Jakub Jelinek <jakub(a)redhat.com>
Date: Tue Aug 11 16:46:49 2020 +0200
c-family: Fix ICE in get_atomic_generic_size [PR96545]
As the testcase shows, we would ICE if the type of the first argument of
various atomic builtins was pointer to (non-void) incomplete type, we would
assume that TYPE_SIZE_UNIT must be non-NULL. This patch diagnoses it
instead. And also changes the TREE_CODE != INTEGER_CST check to
!tree_fits_uhwi_p, as we use tree_to_uhwi after this and at least in theory
the int could be too large and not fit.
2020-08-11 Jakub Jelinek <jakub(a)redhat.com>
PR c/96545
* c-common.c (get_atomic_generic_size): Require that first argument's
type points to a complete type and use tree_fits_uhwi_p instead of
just INTEGER_CST TREE_CODE check for the TYPE_SIZE_UNIT.
* c-c++-common/pr96545.c: New test.
(cherry picked from commit 7840b4dc05539cf5575b3e9ff57ff5f6c3da2cae)
---
gcc/c-family/c-common.c | 9 ++++++++-
gcc/testsuite/c-c++-common/pr96545.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 20258c331af..b6eb40c8122 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6948,8 +6948,15 @@ get_atomic_generic_size (location_t loc, tree function,
return 0;
}
+ if (!COMPLETE_TYPE_P (TREE_TYPE (type_0)))
+ {
+ error_at (loc, "argument 1 of %qE must be a pointer to a complete type",
+ function);
+ return 0;
+ }
+
/* Types must be compile time constant sizes. */
- if (TREE_CODE ((TYPE_SIZE_UNIT (TREE_TYPE (type_0)))) != INTEGER_CST)
+ if (!tree_fits_uhwi_p ((TYPE_SIZE_UNIT (TREE_TYPE (type_0)))))
{
error_at (loc,
"argument 1 of %qE must be a pointer to a constant size type",
diff --git a/gcc/testsuite/c-c++-common/pr96545.c b/gcc/testsuite/c-c++-common/pr96545.c
new file mode 100644
index 00000000000..bc6b0cf345c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr96545.c
@@ -0,0 +1,31 @@
+/* PR c/96545 */
+/* { dg-do compile } */
+
+extern char x[], y[], z[];
+struct S;
+extern struct S s, t, u;
+int v, w;
+
+void
+foo (void)
+{
+ __atomic_exchange (&x, &y, &z, 0); /* { dg-error "must be a pointer to a complete type" } */
+}
+
+void
+bar (void)
+{
+ __atomic_exchange (&s, &t, &u, 0); /* { dg-error "must be a pointer to a complete type" } */
+}
+
+void
+baz (void)
+{
+ __atomic_exchange (&v, &t, &w, 0); /* { dg-error "size mismatch in argument 2 of" } */
+}
+
+void
+qux (void)
+{
+ __atomic_exchange (&v, &w, &t, 0); /* { dg-error "size mismatch in argument 3 of" } */
+}
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
Culprit:
<cut>
commit f1ab60e40d16970381a003e145be6d5932823597
Author: Tomasz Kamiński <tomasz.kaminski(a)sonarsource.com>
Date: Thu Jul 29 10:55:24 2021 +0200
Fix FindZ3.cmake to support static libraries and Windows
Use absolute path to link z3 to allow builds both on windows and linux
since the library name is platform dependent for Z3 (libz3 on Windows
and z3 on Linux) and MSVC does not recognized -L and -l options.
Fix CMAKE_CROSSCOMPILING that does not work correctly since it uses
Z3_BUILD_VERSION instead of Z3_BUILD_NUMBER
Fix building with the static version of z3 library (supersedes D80227).
- Build the Z3 version detection code as C++, since the static
library brings in libstdc++ symbols
- Detect threading support and link against threading, in the
(likely) case Z3 was built with threads
Exposed compilation error from building a program that is used to detect
z3 version in the warning message, to simplify troubleshooting.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D106131
</cut>
Results regressed to (for first_bad == f1ab60e40d16970381a003e145be6d5932823597)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-f1ab60e40d16970381a003e145be6d5932823597/results_id:
1
# 464.h264ref,h264ref_base.default regressed by 106
# 464.h264ref,[.] FastFullPelBlockMotionSearch regressed by 131
from (for last_good == 2df8bf9339e43de63d8d28e07182e1d6d7ffb843)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-2df8bf9339e43de63d8d28e07182e1d6d7ffb843/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/3087
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/3058
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-f1ab60e40d16970381a003e145be6d5932823597
cd investigate-llvm-f1ab60e40d16970381a003e145be6d5932823597
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach f1ab60e40d16970381a003e145be6d5932823597
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 2df8bf9339e43de63d8d28e07182e1d6d7ffb843
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit f1ab60e40d16970381a003e145be6d5932823597
Author: Tomasz Kamiński <tomasz.kaminski(a)sonarsource.com>
Date: Thu Jul 29 10:55:24 2021 +0200
Fix FindZ3.cmake to support static libraries and Windows
Use absolute path to link z3 to allow builds both on windows and linux
since the library name is platform dependent for Z3 (libz3 on Windows
and z3 on Linux) and MSVC does not recognized -L and -l options.
Fix CMAKE_CROSSCOMPILING that does not work correctly since it uses
Z3_BUILD_VERSION instead of Z3_BUILD_NUMBER
Fix building with the static version of z3 library (supersedes D80227).
- Build the Z3 version detection code as C++, since the static
library brings in libstdc++ symbols
- Detect threading support and link against threading, in the
(likely) case Z3 was built with threads
Exposed compilation error from building a program that is used to detect
z3 version in the warning message, to simplify troubleshooting.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D106131
---
llvm/cmake/modules/FindZ3.cmake | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/llvm/cmake/modules/FindZ3.cmake b/llvm/cmake/modules/FindZ3.cmake
index 95dd37789a87..118b1eac3b32 100644
--- a/llvm/cmake/modules/FindZ3.cmake
+++ b/llvm/cmake/modules/FindZ3.cmake
@@ -2,8 +2,21 @@ INCLUDE(CheckCXXSourceRuns)
# Function to check Z3's version
function(check_z3_version z3_include z3_lib)
+ # Get lib path
+ set(z3_link_libs "${z3_lib}")
+
+ # Try to find a threading module in case Z3 was built with threading support.
+ # Threads are required elsewhere in LLVM, but not marked as required here because
+ # Z3 could have been compiled without threading support.
+ find_package(Threads)
+ # CMAKE_THREAD_LIBS_INIT may be empty if the thread functions are provided by the
+ # system libraries and no special flags are needed.
+ if(CMAKE_THREAD_LIBS_INIT)
+ list(APPEND z3_link_libs "${CMAKE_THREAD_LIBS_INIT}")
+ endif()
+
# The program that will be executed to print Z3's version.
- file(WRITE ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeTmp/testz3.c
+ file(WRITE ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeTmp/testz3.cpp
"#include <assert.h>
#include <z3.h>
int main() {
@@ -13,16 +26,14 @@ function(check_z3_version z3_include z3_lib)
return 0;
}")
- # Get lib path
- get_filename_component(z3_lib_path ${z3_lib} PATH)
-
try_run(
Z3_RETURNCODE
Z3_COMPILED
${CMAKE_BINARY_DIR}
- ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeTmp/testz3.c
+ ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeTmp/testz3.cpp
COMPILE_DEFINITIONS -I"${z3_include}"
- LINK_LIBRARIES -L${z3_lib_path} -lz3
+ LINK_LIBRARIES ${z3_link_libs}
+ COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT
RUN_OUTPUT_VARIABLE SRC_OUTPUT
)
@@ -30,6 +41,9 @@ function(check_z3_version z3_include z3_lib)
string(REGEX REPLACE "([0-9]*\\.[0-9]*\\.[0-9]*)" "\\1"
z3_version "${SRC_OUTPUT}")
set(Z3_VERSION_STRING ${z3_version} PARENT_SCOPE)
+ else()
+ message(NOTICE "${COMPILE_OUTPUT}")
+ message(WARNING "Failed to compile Z3 program that is used to determine library version.")
endif()
endfunction(check_z3_version)
@@ -86,7 +100,7 @@ if(NOT Z3_VERSION_STRING AND (CMAKE_CROSSCOMPILING AND
file(STRINGS "${Z3_INCLUDE_DIR}/z3_version.h"
z3_version_str REGEX "^#define[\t ]+Z3_BUILD_NUMBER[\t ]+.*")
- string(REGEX REPLACE "^.*Z3_BUILD_VERSION[\t ]+([0-9]).*$" "\\1"
+ string(REGEX REPLACE "^.*Z3_BUILD_NUMBER[\t ]+([0-9]).*$" "\\1"
Z3_BUILD "${z3_version_str}")
set(Z3_VERSION_STRING ${Z3_MAJOR}.${Z3_MINOR}.${Z3_BUILD})
@@ -98,6 +112,7 @@ if(NOT Z3_VERSION_STRING)
# conservative and force the found version to 0.0.0 to make version
# checks always fail.
set(Z3_VERSION_STRING "0.0.0")
+ message(WARNING "Failed to determine Z3 library version, defaulting to 0.0.0.")
endif()
# handle the QUIETLY and REQUIRED arguments and set Z3_FOUND to TRUE if
</cut>
Progress:
* UM-2 [QEMU upstream maintainership]
+ Usual release work (rc2 now out) and code review
+ Continuing with systick timer refactoring. This has turned out a bit
more complicated than I expected: had to do a preliminary refactor
to move some stuff out of the NVIC device into the armv7m container;
also needed to add support in the Clock APIs for frequency multiply
and divide for the benefit of the stm32 SoCs which drive the systick
reference clock at 1/8 the speed of the main CPU clock
-- PMM
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O2
Culprit:
<cut>
commit 6ff0cdebb1bc281ba2374f3ecdbe358c4fa74093
Author: Richard Sandiford <richard.sandiford(a)arm.com>
Date: Thu Oct 31 17:16:31 2019 +0000
[AArch64] Fix build for non-default languages
The SVE PCS support broke go, D and Ada because those languages don't
call TARGET_INIT_BUILTINS. We therefore ended up trying to get the
TYPE_MAIN_VARIANT of a null __SVBool_t.
We shouldn't really need to apply TYPE_MAIN_VARIANT there anyway,
since the ABI-defined types are (and need to be) their own main
variants. This patch asserts for that instead.
2019-10-31 Richard Sandiford <richard.sandiford(a)arm.com>
gcc/
* config/aarch64/aarch64-sve-builtins.cc (register_builtin_types):
Assert that the type we store in abi_vector_types is its own
main variant.
(svbool_type_p): Don't apply TYPE_MAIN_VARIANT here.
From-SVN: r277680
</cut>
Results regressed to (for first_bad == 6ff0cdebb1bc281ba2374f3ecdbe358c4fa74093)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O2 artifacts/build-6ff0cdebb1bc281ba2374f3ecdbe358c4fa74093/results_id:
1
# 458.sjeng,[.] setup_attackers regressed by 111
# 458.sjeng,[.] search regressed by 215
from (for last_good == aaa80941e042d18dcd5add6e7bb28cb392767a39)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O2 artifacts/build-aaa80941e042d18dcd5add6e7bb28cb392767a39/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O2/2854
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O2/2851
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-6ff0cdebb1bc281ba2374f3ecdbe358c4fa74093
cd investigate-gcc-6ff0cdebb1bc281ba2374f3ecdbe358c4fa74093
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 6ff0cdebb1bc281ba2374f3ecdbe358c4fa74093
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach aaa80941e042d18dcd5add6e7bb28cb392767a39
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Full commit (up to 1000 lines):
<cut>
commit 6ff0cdebb1bc281ba2374f3ecdbe358c4fa74093
Author: Richard Sandiford <richard.sandiford(a)arm.com>
Date: Thu Oct 31 17:16:31 2019 +0000
[AArch64] Fix build for non-default languages
The SVE PCS support broke go, D and Ada because those languages don't
call TARGET_INIT_BUILTINS. We therefore ended up trying to get the
TYPE_MAIN_VARIANT of a null __SVBool_t.
We shouldn't really need to apply TYPE_MAIN_VARIANT there anyway,
since the ABI-defined types are (and need to be) their own main
variants. This patch asserts for that instead.
2019-10-31 Richard Sandiford <richard.sandiford(a)arm.com>
gcc/
* config/aarch64/aarch64-sve-builtins.cc (register_builtin_types):
Assert that the type we store in abi_vector_types is its own
main variant.
(svbool_type_p): Don't apply TYPE_MAIN_VARIANT here.
From-SVN: r277680
---
gcc/ChangeLog | 7 +++++++
gcc/config/aarch64/aarch64-sve-builtins.cc | 4 ++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 66b7a142251..affa74cdd25 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2019-10-31 Richard Sandiford <richard.sandiford(a)arm.com>
+
+ * config/aarch64/aarch64-sve-builtins.cc (register_builtin_types):
+ Assert that the type we store in abi_vector_types is its own
+ main variant.
+ (svbool_type_p): Don't apply TYPE_MAIN_VARIANT here.
+
2019-10-31 Richard Earnshaw <rearnsha(a)arm.com>
* config/arm/arm.c (arm_legitimize_address): Don't form negative offsets
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 70d7b1a165d..424f64adfef 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2993,6 +2993,7 @@ register_builtin_types ()
BITS_PER_SVE_VECTOR));
}
vectype = build_distinct_type_copy (vectype);
+ gcc_assert (vectype == TYPE_MAIN_VARIANT (vectype));
SET_TYPE_STRUCTURAL_EQUALITY (vectype);
TYPE_ARTIFICIAL (vectype) = 1;
abi_vector_types[i] = vectype;
@@ -3235,8 +3236,7 @@ bool
svbool_type_p (const_tree type)
{
tree abi_type = abi_vector_types[VECTOR_TYPE_svbool_t];
- return (type != error_mark_node
- && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (abi_type));
+ return type != error_mark_node && TYPE_MAIN_VARIANT (type) == abi_type;
}
/* If TYPE is a built-in type defined by the SVE ABI, return the mangled name,
</cut>