Project Stratos
===============
- spent more time troubleshooting Xen builds with Viresh
Linux RPMB Sub-system and virtio-driver ([STR-40])
- started working on v2 of the Linux driver
QEMU Upstream Work ([UM-2])
===========================
- posted Analysis of slow distro boots in check-avocado
(BootLinuxAarch64.test_virt_tcg*) Message-Id:
<874k4xbqvp.fsf(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Current Review Queue
====================
TODO [RFC PATCH 00/27] Virtio sound card implementation
Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com>
============================================================================================================================
TODO [PATCH v6 00/43] CXl 2.0 emulation Support
Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com>
==============================================================================================================
TODO [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features
Message-Id: <20220210040423.95120-1-richard.henderson(a)linaro.org>
====================================================================================================================================
--
Alex Bennée
Progress (a report covering two half-weeks)
* UM-2 [QEMU upstream maintainership]
- lots of code review
- fixed another bug in the armv7m clock framework code
- refactoring patchset to trim some fat from a header that
gets included by every C file in the build
* QEMU-420 [GICv4 emulation]
- CPU interface parts of GICv4 work are code-complete
- started on the redistributor work
-- PMM
Project Stratos
===============
- posted [RFC PATCH] tests/qtest: attempt to enable tests for
virtio-gpio (!working) Message-Id:
<20220121151534.3654562-1-alex.bennee(a)linaro.org>
- need to increase coverage of the QEMU boilerplate to get it merged
- discussions on next steps with SCMI backend with Vincent (moving
from the QEMU->QEMU PoC)
QEMU Upstream Work ([UM-2])
===========================
- posted [PATCH v2 00/25] testing and plugin updates Message-Id:
<20220201182050.15087-1-alex.bennee(a)linaro.org>
- posted [RFC PATCH 0/4] improve coverage of vector backend
Message-Id: <20220202191242.652607-2-alex.bennee(a)linaro.org>
- posted [PATCH v3 00/26] testing and plugins pre-PR Message-Id:
<20220204204335.1689602-1-alex.bennee(a)linaro.org>
- posted [RFC PATCH] arm: force flag recalculation when messing with
DAIF Message-Id: <20220202122353.457084-1-alex.bennee(a)linaro.org>
- trying to track down a weird TLS bug:
<https://gitlab.com/stsquad/qemu/-/jobs/2056025874#L3532>
- on aarch64 HW, running qemu-s390x with a simple test case fails
every 100/200 times
- seems TLS memory gets made non-accessible (rw-p -> ---p, except to
gdb)
- strace doesn't show a culprit, possible kernel bug?
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG
sanity tests for ARM Message-Id:
<20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Other
=====
- planning and brainstorming for Linaro Tech Day
Completed Reviews [5/5]
=======================
[PATCH v4 00/42] CXl 2.0 emulation Support
Message-Id: <20220124171705.10432-1-Jonathan.Cameron(a)huawei.com>
[PATCH] gitlab: fall back to commit hash in qemu-setup filename
Message-Id: <20220125173454.10381-1-stefanha(a)redhat.com>
[PATCH for-7.0] gitlab-ci: Add cirrus-ci based tests for NetBSD and OpenBSD
Message-Id: <20211209103124.121942-1-thuth(a)redhat.com>
[PATCH 00/20] tcg: vector improvements
Message-Id: <20211218194250.247633-1-richard.henderson(a)linaro.org>
Absences
========
Current Review Queue
====================
TODO [PATCH 0/4] target/arm: SVE fixes versus VHE
Message-Id: <20220127063428.30212-1-richard.henderson(a)linaro.org>
==================================================================================================================
TODO [PATCH 00/14] arm_gicv3_its: Implement MOVI and MOVALL commands
Message-Id: <20220122182444.724087-1-peter.maydell(a)linaro.org>
==================================================================================================================================
TODO [PATCH v11 0/8] hmp,qmp: Add commands to introspect virtio devices
Message-Id: <1642678168-20447-1-git-send-email-jonah.palmer(a)oracle.com>
==============================================================================================================================================
TODO [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings
Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>
======================================================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- Fixed some minor issues with the hvf accelerator and sent out a patchset
+ '-cpu max' didn't act like '-cpu host'
+ we weren't exposing PAuth to the guest
* QEMU-420 [GICv4 emulation]
- Sent out a patchset with more cleanups and fixes to the existing ITS code
- The ITS parts of the GICv4 work are now code-complete; moving on to
the redistributor end of things next week.
-- PMM
Progress:
* UM-2 [QEMU upstream maintainership]
- Before the QEMU 7.0 release we tried to land a bug fix which corrected
the handling in our PSCI emulation of calls where the function ID
is unrecognized -- these are supposed to return an error code.
The bugfix turned out to cause regressions for some boards when
running guest code at EL3 (because those boards were incorrectly
enabling PSCI emulation in that situation). Sent a patchset that
fixed those boards so we don't enable PSCI when running EL3 guests,
and re-introduced the original PSCI bugfix.
- Fixed various bugs in the highbank/midway boards discovered in
the process of writing and testing the above patchset. (These
two boards were the most complicated to fix.)
- More code review, and sent out an arm pullrequest
- Small handful of other minor patches
-- PMM
Seems like my change to make Clang default to DWARFv5 might've caused a
buildbot failure on your build worker here:
https://lab.llvm.org/buildbot/#/builders/185/builds/1295
But I seem to be able to run this test successfully locally on my Linux
machine - so I'm wondering if you can offer any help diagnosing the issue
showing up on your builder/worker?
Project Stratos
===============
- [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio
(!working) Message-Id:
<20220121151534.3654562-1-alex.bennee(a)linaro.org>
- trying to clear the way for merging virtio-gpio to QEMU
vhost-device maintainer effort ([UM-196])
- reviewed vhost-device [pr7 with the vm-virtio vsock abstraction]
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
[pr7 with the vm-virtio vsock abstraction]
<https://github.com/stsquad/vhost-device/tree/review/pr7-with-laurat-abstrac…>
QEMU Upstream Work ([UM-2])
===========================
- posted [PULL v2 00/31] testing/next and other misc fixes Message-Id:
<20220118190043.1427303-1-alex.bennee(a)linaro.org>
Upstream MTTCG tests ([QEMU-52])
- still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG
sanity tests for ARM Message-Id:
<20211202115352.951548-1-alex.bennee(a)linaro.org>
Completed Reviews [2/2]
=======================
[PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings
Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>
[PATCH v2 0/6] qtests/libqos: Allow PCI tests to be run with virt-machine
Message-Id: <20220118203833.316741-7-eric.auger(a)redhat.com>
Absences
========
Current Review Queue
====================
TODO [PATCH v11 0/8] hmp,qmp: Add commands to introspect virtio devices
Message-Id: <1642678168-20447-1-git-send-email-jonah.palmer(a)oracle.com>
==============================================================================================================================================
TODO [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings
Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>
======================================================================================================================================
TODO [PATCH v2 00/11] Atomic cleanup + clang-12 build fix
Message-Id: <20210717014121.1784956-1-richard.henderson(a)linaro.org>
============================================================================================================================
TODO [PATCH 0/7] tcg: some small towards more modular tcg
Message-Id: <20210804143826.3402872-1-kraxel(a)redhat.com>
=================================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- Sent patches for some reported bugs to do with state save/load
* QEMU-420 [GICv4 emulation]
- Wrote patches to implement the missing MOVALL and MOVI commands
- Fixed a few minor bugs noticed along the way
- Should be able to send out a patchset early next week and then can
get back to the new-in-GICv4 work
-- PMM
[TCWG CI] Regression caused by gcc: Add -Wdangling-pointer [PR63272].:
commit 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
Author: Martin Sebor <msebor(a)redhat.com>
Add -Wdangling-pointer [PR63272].
Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21324
# First few build errors in logs:
# 00:03:31 sound/core/oss/mixer_oss.c:1057:21: error: ‘slot’ is used uninitialized [-Werror=uninitialized]
# 00:03:32 sound/core/oss/pcm_oss.c:108:29: error: ‘t’ is used uninitialized [-Werror=uninitialized]
# 00:03:32 sound/core/oss/pcm_oss.c:2488:34: error: ‘setup’ is used uninitialized [-Werror=uninitialized]
# 00:03:32 sound/core/oss/pcm_oss.c:2998:51: error: ‘template’ is used uninitialized [-Werror=uninitialized]
# 00:03:35 make[3]: *** [scripts/Makefile.build:277: sound/core/oss/mixer_oss.o] Error 1
# 00:03:35 sound/core/seq/oss/seq_oss_init.c:350:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized]
# 00:03:35 sound/core/seq/oss/seq_oss_init.c:370:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized]
# 00:03:36 make[4]: *** [scripts/Makefile.build:277: sound/core/seq/oss/seq_oss_init.o] Error 1
# 00:03:40 make[3]: *** [scripts/Makefile.build:277: sound/core/oss/pcm_oss.o] Error 1
# 00:03:50 make[3]: *** [scripts/Makefile.build:540: sound/core/seq/oss] Error 2
from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21354
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-aarch64-stable-allmodconfig
First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-…
Reproduce builds:
<cut>
mkdir investigate-gcc-9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
cd investigate-gcc-9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 671a283636de75f7ed638ee6b01ed2d44361b8b6
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
Author: Martin Sebor <msebor(a)redhat.com>
Date: Sat Jan 15 16:41:40 2022 -0700
Add -Wdangling-pointer [PR63272].
Resolves:
PR c/63272 - GCC should warn when using pointer to dead scoped variable with
in the same function
gcc/c-family/ChangeLog:
PR c/63272
* c.opt (-Wdangling-pointer): New option.
gcc/ChangeLog:
PR c/63272
* diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle
-Wdangling-pointer.
* doc/invoke.texi (-Wdangling-pointer): Document new option.
* gimple-ssa-warn-access.cc (pass_waccess::clone): Set new member.
(pass_waccess::check_pointer_uses): New function.
(pass_waccess::gimple_call_return_arg): New function.
(pass_waccess::gimple_call_return_arg_ref): New function.
(pass_waccess::check_call_dangling): New function.
(pass_waccess::check_dangling_uses): New function overloads.
(pass_waccess::check_dangling_stores): New function.
(pass_waccess::check_dangling_stores): New function.
(pass_waccess::m_clobbers): New data member.
(pass_waccess::m_func): New data member.
(pass_waccess::m_run_number): New data member.
(pass_waccess::m_check_dangling_p): New data member.
(pass_waccess::check_alloca): Check m_early_checks_p.
(pass_waccess::check_alloc_size_call): Same.
(pass_waccess::check_strcat): Same.
(pass_waccess::check_strncat): Same.
(pass_waccess::check_stxcpy): Same.
(pass_waccess::check_stxncpy): Same.
(pass_waccess::check_strncmp): Same.
(pass_waccess::check_memop_access): Same.
(pass_waccess::check_read_access): Same.
(pass_waccess::check_builtin): Call check_pointer_uses.
(pass_waccess::warn_invalid_pointer): Add arguments.
(is_auto_decl): New function.
(pass_waccess::check_stmt): New function.
(pass_waccess::check_block): Call check_stmt.
(pass_waccess::execute): Call check_dangling_uses,
check_dangling_stores. Empty m_clobbers.
* passes.def (pass_warn_access): Invoke pass two more times.
gcc/testsuite/ChangeLog:
PR c/63272
* g++.dg/warn/Wfree-nonheap-object-6.C: Disable valid warnings.
* g++.dg/warn/ref-temp1.C: Prune expected warning.
* gcc.dg/uninit-pr50476.c: Expect a new warning.
* c-c++-common/Wdangling-pointer-2.c: New test.
* c-c++-common/Wdangling-pointer-3.c: New test.
* c-c++-common/Wdangling-pointer-4.c: New test.
* c-c++-common/Wdangling-pointer-5.c: New test.
* c-c++-common/Wdangling-pointer-6.c: New test.
* c-c++-common/Wdangling-pointer.c: New test.
* g++.dg/warn/Wdangling-pointer-2.C: New test.
* g++.dg/warn/Wdangling-pointer.C: New test.
* gcc.dg/Wdangling-pointer-2.c: New test.
* gcc.dg/Wdangling-pointer.c: New test.
---
gcc/c-family/c.opt | 8 +
gcc/diagnostic-spec.c | 1 +
gcc/doc/invoke.texi | 62 +-
gcc/gimple-ssa-warn-access.cc | 635 +++++++++++++++++++--
gcc/passes.def | 5 +-
gcc/testsuite/c-c++-common/Wdangling-pointer-2.c | 437 ++++++++++++++
gcc/testsuite/c-c++-common/Wdangling-pointer-3.c | 64 +++
gcc/testsuite/c-c++-common/Wdangling-pointer-4.c | 73 +++
gcc/testsuite/c-c++-common/Wdangling-pointer-5.c | 90 +++
gcc/testsuite/c-c++-common/Wdangling-pointer-6.c | 32 ++
gcc/testsuite/c-c++-common/Wdangling-pointer.c | 434 ++++++++++++++
gcc/testsuite/g++.dg/warn/Wdangling-pointer-2.C | 23 +
gcc/testsuite/g++.dg/warn/Wdangling-pointer.C | 74 +++
gcc/testsuite/g++.dg/warn/Wfree-nonheap-object-6.C | 4 +-
gcc/testsuite/g++.dg/warn/ref-temp1.C | 3 +
gcc/testsuite/gcc.dg/Wdangling-pointer-2.c | 82 +++
gcc/testsuite/gcc.dg/Wdangling-pointer.c | 75 +++
gcc/testsuite/gcc.dg/uninit-pr50476.c | 2 +-
18 files changed, 2043 insertions(+), 61 deletions(-)
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 28363643664..db65c14a7a5 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -548,6 +548,14 @@ Wdangling-else
C ObjC C++ ObjC++ Var(warn_dangling_else) Warning LangEnabledBy(C ObjC C++ ObjC++,Wparentheses)
Warn about dangling else.
+Wdangling-pointer
+C ObjC C++ LTO ObjC++ Alias(Wdangling-pointer=, 2, 0) Warning
+Warn for uses of pointers to auto variables whose lifetime has ended.
+
+Wdangling-pointer=
+C ObjC C++ ObjC++ Joined RejectNegative UInteger Var(warn_dangling_pointer) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall, 2, 0) IntegerRange(0, 2)
+Warn for uses of pointers to auto variables whose lifetime has ended.
+
Wdate-time
C ObjC C++ ObjC++ CPP(warn_date_time) CppReason(CPP_W_DATE_TIME) Var(cpp_warn_date_time) Init(0) Warning
Warn about __TIME__, __DATE__ and __TIMESTAMP__ usage.
diff --git a/gcc/diagnostic-spec.c b/gcc/diagnostic-spec.c
index c9e1c1be91d..a8af229d677 100644
--- a/gcc/diagnostic-spec.c
+++ b/gcc/diagnostic-spec.c
@@ -99,6 +99,7 @@ nowarn_spec_t::nowarn_spec_t (opt_code opt)
m_bits = NW_UNINIT;
break;
+ case OPT_Wdangling_pointer_:
case OPT_Wreturn_local_addr:
case OPT_Wuse_after_free_:
m_bits = NW_DANGLING;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 121c8ea827f..7f2205e4a85 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -341,7 +341,8 @@ Objective-C and Objective-C++ Dialects}.
-Wchar-subscripts @gol
-Wclobbered -Wcomment @gol
-Wconversion -Wno-coverage-mismatch -Wno-cpp @gol
--Wdangling-else -Wdate-time @gol
+-Wdangling-else -Wdangling-pointer -Wdangling-pointer=@var{n} @gol
+-Wdate-time @gol
-Wno-deprecated -Wno-deprecated-declarations -Wno-designated-init @gol
-Wdisabled-optimization @gol
-Wno-discarded-array-qualifiers -Wno-discarded-qualifiers @gol
@@ -4389,6 +4390,8 @@ Warn about overriding virtual functions that are not marked with the
@opindex Wno-use-after-free
Warn about uses of pointers to dynamically allocated objects that have
been rendered indeterminate by a call to a deallocation function.
+The warning is enabled at all optimization levels but may yield different
+results with optimization than without.
@table @gcctabopt
@item -Wuse-after-free=1
@@ -5714,6 +5717,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}.
-Wcatch-value @r{(C++ and Objective-C++ only)} @gol
-Wchar-subscripts @gol
-Wcomment @gol
+-Wdangling-pointer=2 @gol
-Wduplicate-decl-specifier @r{(C and Objective-C only)} @gol
-Wenum-compare @r{(in C/ObjC; this is on by default in C++)} @gol
-Wformat @gol
@@ -8587,6 +8591,62 @@ looks like this:
This warning is enabled by @option{-Wparentheses}.
+@item -Wdangling-pointer
+@itemx -Wdangling-pointer=@var{n}
+@opindex Wdangling-pointer
+@opindex Wno-dangling-pointer
+Warn about uses of pointers (or C++ references) to objects with automatic
+storage duration after their lifetime has ended. This includes local
+variables declared in nested blocks, compound literals and other unnamed
+temporary objects. In addition, warn about storing the address of such
+objects in escaped pointers. The warning is enabled at all optimization
+levels but may yield different results with optimization than without.
+
+@table @gcctabopt
+@item -Wdangling-pointer=1
+At level 1 the warning diagnoses only unconditional uses of dangling pointers.
+For example
+@smallexample
+int f (int c1, int c2, x)
+@{
+ char *p = strchr ((char[])@{ c1, c2 @}, c3);
+ return p ? *p : 'x'; // warning: dangling pointer to a compound literal
+@}
+@end smallexample
+In the following function the store of the address of the local variable
+@code{x} in the escaped pointer @code{*p} also triggers the warning.
+@smallexample
+void g (int **p)
+@{
+ int x = 7;
+ *p = &x; // warning: storing the address of a local variable in *p
+@}
+@end smallexample
+
+@item -Wdangling-pointer=2
+At level 2, in addition to unconditional uses the warning also diagnoses
+conditional uses of dangling pointers.
+
+For example, because the array @var{a} in the following function is out of
+scope when the pointer @var{s} that was set to point is used, the warning
+triggers at this level.
+
+@smallexample
+void f (char *s)
+@{
+ if (!s)
+ @{
+ char a[12] = "tmpname";
+ s = a;
+ @}
+ strcat (s, ".tmp"); // warning: dangling pointer to a may be used
+ ...
+@}
+@end smallexample
+@end table
+
+@option{-Wdangling-pointer=2} is included in @option{-Wall}.
+
@item -Wdate-time
@opindex Wdate-time
@opindex Wno-date-time
diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 882129143a1..f639807a78a 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -2069,10 +2069,12 @@ class pass_waccess : public gimple_opt_pass
~pass_waccess ();
- opt_pass *clone () { return new pass_waccess (m_ctxt); }
+ opt_pass *clone ();
virtual bool gate (function *);
+ void set_pass_param (unsigned, bool);
+
virtual unsigned int execute (function *);
private:
@@ -2089,6 +2091,9 @@ private:
/* Check a call to an ordinary function for invalid accesses. */
bool check_call_access (gcall *);
+ /* Check a non-call statement. */
+ void check_stmt (gimple *);
+
/* Check statements in a basic block. */
void check_block (basic_block);
@@ -2112,26 +2117,41 @@ private:
void check_atomic_memmodel (gimple *, tree, tree, const unsigned char *);
/* Check for uses of indeterminate pointers. */
- void check_pointer_uses (gimple *, tree);
+ void check_pointer_uses (gimple *, tree, tree = NULL_TREE, bool = false);
/* Return the argument that a call returns. */
tree gimple_call_return_arg (gcall *);
+ tree gimple_call_return_arg_ref (gcall *);
+
+ /* Check a call for uses of a dangling pointer arguments. */
+ void check_call_dangling (gcall *);
+
+ /* Check uses of a dangling pointer or those derived from it. */
+ void check_dangling_uses (tree, tree, bool = false, bool = false);
+ void check_dangling_uses ();
+ void check_dangling_stores ();
+ void check_dangling_stores (basic_block, hash_set<tree> &, auto_bitmap &);
- void warn_invalid_pointer (tree, gimple *, gimple *, bool, bool = false);
+ void warn_invalid_pointer (tree, gimple *, gimple *, tree, bool, bool = false);
/* Return true if use follows an invalidating statement. */
- bool use_after_inval_p (gimple *, gimple *);
+ bool use_after_inval_p (gimple *, gimple *, bool = false);
/* A pointer_query object and its cache to store information about
pointers and their targets in. */
pointer_query m_ptr_qry;
pointer_query::cache_type m_var_cache;
-
+ /* Mapping from DECLs and their clobber statements in the function. */
+ hash_map<tree, gimple *> m_clobbers;
/* A bit is set for each basic block whose statements have been assigned
valid UIDs. */
bitmap m_bb_uids_set;
/* The current function. */
function *m_func;
+ /* True to run checks for uses of dangling pointers. */
+ bool m_check_dangling_p;
+ /* True to run checks early on in the optimization pipeline. */
+ bool m_early_checks_p;
};
/* Construct the pass. */
@@ -2140,11 +2160,22 @@ pass_waccess::pass_waccess (gcc::context *ctxt)
: gimple_opt_pass (pass_data_waccess, ctxt),
m_ptr_qry (NULL, &m_var_cache),
m_var_cache (),
+ m_clobbers (),
m_bb_uids_set (),
- m_func ()
+ m_func (),
+ m_check_dangling_p (),
+ m_early_checks_p ()
{
}
+/* Return a copy of the pass with RUN_NUMBER one greater than THIS. */
+
+opt_pass*
+pass_waccess::clone ()
+{
+ return new pass_waccess (m_ctxt);
+}
+
/* Release pointer_query cache. */
pass_waccess::~pass_waccess ()
@@ -2152,6 +2183,14 @@ pass_waccess::~pass_waccess ()
m_ptr_qry.flush_cache ();
}
+void
+pass_waccess::set_pass_param (unsigned int n, bool early)
+{
+ gcc_assert (n == 0);
+
+ m_early_checks_p = early;
+}
+
/* Return true when any checks performed by the pass are enabled. */
bool
@@ -2340,6 +2379,9 @@ maybe_warn_alloc_args_overflow (gimple *stmt, const tree args[2],
void
pass_waccess::check_alloca (gcall *stmt)
{
+ if (m_early_checks_p)
+ return;
+
if ((warn_vla_limit >= HOST_WIDE_INT_MAX
&& warn_alloc_size_limit < warn_vla_limit)
|| (warn_alloca_limit >= HOST_WIDE_INT_MAX
@@ -2361,6 +2403,13 @@ pass_waccess::check_alloca (gcall *stmt)
void
pass_waccess::check_alloc_size_call (gcall *stmt)
{
+ if (m_early_checks_p)
+ return;
+
+ if (gimple_call_num_args (stmt) < 1)
+ /* Avoid invalid calls to functions without a prototype. */
+ return;
+
tree fndecl = gimple_call_fndecl (stmt);
if (fndecl && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
{
@@ -2413,6 +2462,9 @@ pass_waccess::check_alloc_size_call (gcall *stmt)
void
pass_waccess::check_strcat (gcall *stmt)
{
+ if (m_early_checks_p)
+ return;
+
if (!warn_stringop_overflow && !warn_stringop_overread)
return;
@@ -2438,6 +2490,9 @@ pass_waccess::check_strcat (gcall *stmt)
void
pass_waccess::check_strncat (gcall *stmt)
{
+ if (m_early_checks_p)
+ return;
+
if (!warn_stringop_overflow && !warn_stringop_overread)
return;
@@ -2507,6 +2562,9 @@ pass_waccess::check_strncat (gcall *stmt)
void
pass_waccess::check_stxcpy (gcall *stmt)
{
+ if (m_early_checks_p)
+ return;
+
tree dst = call_arg (stmt, 0);
tree src = call_arg (stmt, 1);
@@ -2545,7 +2603,7 @@ pass_waccess::check_stxcpy (gcall *stmt)
void
pass_waccess::check_stxncpy (gcall *stmt)
{
- if (!warn_stringop_overflow)
+ if (m_early_checks_p || !warn_stringop_overflow)
return;
tree dst = call_arg (stmt, 0);
@@ -2569,7 +2627,7 @@ pass_waccess::check_stxncpy (gcall *stmt)
void
pass_waccess::check_strncmp (gcall *stmt)
{
- if (!warn_stringop_overread)
+ if (m_early_checks_p || !warn_stringop_overread)
return;
tree arg1 = call_arg (stmt, 0);
@@ -2674,6 +2732,9 @@ pass_waccess::check_strncmp (gcall *stmt)
void
pass_waccess::check_memop_access (gimple *stmt, tree dest, tree src, tree size)
{
+ if (m_early_checks_p)
+ return;
+
/* For functions like memset and memcpy that operate on raw memory
try to determine the size of the largest source and destination
object using type-0 Object Size regardless of the object size
@@ -2695,7 +2756,7 @@ pass_waccess::check_read_access (gimple *stmt, tree src,
tree bound /* = NULL_TREE */,
int ost /* = 1 */)
{
- if (!warn_stringop_overread)
+ if (m_early_checks_p || !warn_stringop_overread)
return;
if (bound && !useless_type_conversion_p (size_type_node, TREE_TYPE (bound)))
@@ -2938,7 +2999,7 @@ pass_waccess::check_atomic_memmodel (gimple *stmt, tree ord_sucs,
if (warning_suppressed_p (stmt, OPT_Winvalid_memory_model))
return;
- if (maybe_warn_memmodel (stmt, ord_sucs, ord_fail, valid))
+ if (!maybe_warn_memmodel (stmt, ord_sucs, ord_fail, valid))
return;
suppress_warning (stmt, OPT_Winvalid_memory_model);
@@ -3094,11 +3155,12 @@ pass_waccess::check_builtin (gcall *stmt)
case BUILT_IN_FREE:
case BUILT_IN_REALLOC:
- {
- tree arg = call_arg (stmt, 0);
- if (TREE_CODE (arg) == SSA_NAME)
- check_pointer_uses (stmt, arg);
- }
+ if (!m_early_checks_p)
+ {
+ tree arg = call_arg (stmt, 0);
+ if (TREE_CODE (arg) == SSA_NAME)
+ check_pointer_uses (stmt, arg);
+ }
return true;
case BUILT_IN_GETTEXT:
@@ -3725,16 +3787,67 @@ pass_waccess::maybe_check_dealloc_call (gcall *call)
/* Return true if either USE_STMT's basic block (that of a pointer's use)
is dominated by INVAL_STMT's (that of a pointer's invalidating statement,
- or if they're in the same block, USE_STMT follows INVAL_STMT. */
+ which is either a clobber or a deallocation call), or if they're in
+ the same block, USE_STMT follows INVAL_STMT. */
bool
-pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt)
+pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt,
+ bool last_block /* = false */)
{
+ tree clobvar =
+ gimple_clobber_p (inval_stmt) ? gimple_assign_lhs (inval_stmt) : NULL_TREE;
+
basic_block inval_bb = gimple_bb (inval_stmt);
basic_block use_bb = gimple_bb (use_stmt);
+ if (!inval_bb || !use_bb)
+ return false;
+
if (inval_bb != use_bb)
- return dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb);
+ {
+ if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb))
+ return true;
+
+ if (!clobvar || !last_block)
+ return false;
+
+ /* Proceed only when looking for uses of dangling pointers. */
+ auto gsi = gsi_for_stmt (use_stmt);
+
+ auto_bitmap visited;
+
+ /* A use statement in the last basic block in a function or one that
+ falls through to it is after any other prior clobber of the used
+ variable unless it's followed by a clobber of the same variable. */
+ basic_block bb = use_bb;
+ while (bb != inval_bb
+ && single_succ_p (bb)
+ && !(single_succ_edge (bb)->flags & (EDGE_EH|EDGE_DFS_BACK)))
+ {
+ if (!bitmap_set_bit (visited, bb->index))
+ /* Avoid cycles. */
+ return true;
+
+ for (; !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
+ {
+ gimple *stmt = gsi_stmt (gsi);
+ if (gimple_clobber_p (stmt))
+ {
+ if (clobvar == gimple_assign_lhs (stmt))
+ /* The use is followed by a clobber. */
+ return false;
+ }
+ }
+
+ bb = single_succ (bb);
+ gsi = gsi_start_bb (bb);
+ }
+
+ /* The use is one of a dangling pointer if a clobber of the variable
+ [the pointer points to] has not been found before the function exit
+ point. */
+ return bb == EXIT_BLOCK_PTR_FOR_FN (cfun);
+ }
if (bitmap_set_bit (m_bb_uids_set, inval_bb->index))
/* The first time this basic block is visited assign increasing ids
@@ -3752,27 +3865,30 @@ pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt)
return gimple_uid (inval_stmt) < gimple_uid (use_stmt);
}
-/* Issue a warning for the USE_STMT of pointer PTR rendered invalid
- by INVAL_STMT. PTR may be null when it's been optimized away.
- MAYBE is true to issue the "maybe" kind of warning. EQUALITY is
- true when the pointer is used in an equality expression. */
+/* Issue a warning for the USE_STMT of pointer or reference REF rendered
+ invalid by INVAL_STMT. REF may be null when it's been optimized away.
+ When nonnull, INVAL_STMT is the deallocation function that rendered
+ the pointer or reference dangling. Otherwise, VAR is the auto variable
+ (including an unnamed temporary such as a compound literal) whose
+ lifetime's rended it dangling. MAYBE is true to issue the "maybe"
+ kind of warning. EQUALITY is true when the pointer is used in
+ an equality expression. */
void
-pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt,
- gimple *inval_stmt,
- bool maybe,
- bool equality /* = false */)
+pass_waccess::warn_invalid_pointer (tree ref, gimple *use_stmt,
+ gimple *inval_stmt, tree var,
+ bool maybe, bool equality /* = false */)
{
/* Avoid printing the unhelpful "<unknown>" in the diagnostics. */
- if (ptr && TREE_CODE (ptr) == SSA_NAME
- && (!SSA_NAME_VAR (ptr) || DECL_ARTIFICIAL (SSA_NAME_VAR (ptr))))
- ptr = NULL_TREE;
+ if (ref && TREE_CODE (ref) == SSA_NAME
+ && (!SSA_NAME_VAR (ref) || DECL_ARTIFICIAL (SSA_NAME_VAR (ref))))
+ ref = NULL_TREE;
location_t use_loc = gimple_location (use_stmt);
if (use_loc == UNKNOWN_LOCATION)
{
- use_loc = cfun->function_end_locus;
- if (!ptr)
+ use_loc = m_func->function_end_locus;
+ if (!ref)
/* Avoid issuing a warning with no context other than
the function. That would make it difficult to debug
in any but very simple cases. */
@@ -3788,12 +3904,12 @@ pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt,
const tree inval_decl = gimple_call_fndecl (inval_stmt);
- if ((ptr && warning_at (use_loc, OPT_Wuse_after_free,
+ if ((ref && warning_at (use_loc, OPT_Wuse_after_free,
(maybe
? G_("pointer %qE may be used after %qD")
: G_("pointer %qE used after %qD")),
- ptr, inval_decl))
- || (!ptr && warning_at (use_loc, OPT_Wuse_after_free,
+ ref, inval_decl))
+ || (!ref && warning_at (use_loc, OPT_Wuse_after_free,
(maybe
? G_("pointer may be used after %qD")
: G_("pointer used after %qD")),
@@ -3805,6 +3921,52 @@ pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt,
}
return;
}
+
+ if ((maybe && warn_dangling_pointer < 2)
+ || warning_suppressed_p (use_stmt, OPT_Wdangling_pointer_))
+ return;
+
+ if (DECL_NAME (var))
+ {
+ if ((ref
+ && warning_at (use_loc, OPT_Wdangling_pointer_,
+ (maybe
+ ? G_("dangling pointer %qE to %qD may be used")
+ : G_("using dangling pointer %qE to %qD")),
+ ref, var))
+ || (!ref
+ && warning_at (use_loc, OPT_Wdangling_pointer_,
+ (maybe
+ ? G_("dangling pointer to %qD may be used")
+ : G_("using a dangling pointer to %qD")),
+ var)))
+ inform (DECL_SOURCE_LOCATION (var),
+ "%qD declared here", var);
+ suppress_warning (use_stmt, OPT_Wdangling_pointer_);
+ return;
+ }
+
+ if ((ref
+ && warning_at (use_loc, OPT_Wdangling_pointer_,
+ (maybe
+ ? G_("dangling pointer %qE to an unnamed temporary "
+ "may be used")
+ : G_("using dangling pointer %qE to an unnamed "
+ "temporary")),
+ ref, var))
+ || (!ref
+ && warning_at (use_loc, OPT_Wdangling_pointer_,
+ (maybe
+ ? G_("dangling pointer to an unnamed temporary "
+ "may be used")
+ : G_("using a dangling pointer to an unnamed "
+ "temporary")),
+ var)))
+ {
+ inform (DECL_SOURCE_LOCATION (var),
+ "unnamed temporary defined here");
+ suppress_warning (use_stmt, OPT_Wdangling_pointer_);
+ }
}
/* If STMT is a call to either the standard realloc or to a user-defined
@@ -3927,10 +4089,14 @@ pointers_related_p (gimple *stmt, tree p, tree q, pointer_query &qry)
/* For a STMT either a call to a deallocation function or a clobber, warn
for uses of the pointer PTR it was called with (including its copies
- or others derived from it by pointer arithmetic). */
+ or others derived from it by pointer arithmetic). If STMT is a clobber,
+ VAR is the decl of the clobbered variable. When MAYBE is true use
+ a "maybe" form of diagnostic. */
void
-pass_waccess::check_pointer_uses (gimple *stmt, tree ptr)
+pass_waccess::check_pointer_uses (gimple *stmt, tree ptr,
+ tree var /* = NULL_TREE */,
+ bool maybe /* = false */)
{
gcc_assert (TREE_CODE (ptr) == SSA_NAME);
@@ -4013,18 +4179,25 @@ pass_waccess::check_pointer_uses (gimple *stmt, tree ptr)
/* Warn if USE_STMT is dominated by the deallocation STMT.
Otherwise, add the pointer to POINTERS so that the uses
of any other pointers derived from it can be checked. */
- if (use_after_inval_p (stmt, use_stmt))
+ if (use_after_inval_p (stmt, use_stmt, check_dangling))
{
- /* TODO: Handle PHIs but careful of false positives. */
- if (gimple_code (use_stmt) != GIMPLE_PHI)
+ if (gimple_code (use_stmt) == GIMPLE_PHI)
{
- basic_block use_bb = gimple_bb (use_stmt);
- bool this_maybe
- = !dominated_by_p (CDI_POST_DOMINATORS, use_bb, stmt_bb);
- warn_invalid_pointer (*use_p->use, use_stmt, stmt,
- this_maybe, equality);
- continue;
+ tree lhs = gimple_phi_result (use_stmt);
+ if (TREE_CODE (lhs) == SSA_NAME)
+ {
+ pointers.safe_push (lhs);
+ continue;
+ }
}
+
+ basic_block use_bb = gimple_bb (use_stmt);
+ bool this_maybe
+ = (maybe
+ || !dominated_by_p (CDI_POST_DOMINATORS, use_bb, stmt_bb));
+ warn_invalid_pointer (*use_p->use, use_stmt, stmt, var,
+ this_maybe, equality);
+ continue;
}
if (is_gimple_assign (use_stmt))
@@ -4059,26 +4232,100 @@ pass_waccess::check_call (gcall *stmt)
if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
check_builtin (stmt);
- if (tree callee = gimple_call_fndecl (stmt))
- {
- /* Check for uses of the pointer passed to either a standard
- or a user-defined deallocation function. */
- unsigned argno = fndecl_dealloc_argno (callee);
- if (argno < (unsigned) call_nargs (stmt))
- {
- tree arg = call_arg (stmt, argno);
- if (TREE_CODE (arg) == SSA_NAME)
- check_pointer_uses (stmt, arg);
- }
- }
+ if (!m_early_checks_p)
+ if (tree callee = gimple_call_fndecl (stmt))
+ {
+ /* Check for uses of the pointer passed to either a standard
+ or a user-defined deallocation function. */
+ unsigned argno = fndecl_dealloc_argno (callee);
+ if (argno < (unsigned) call_nargs (stmt))
+ {
+ tree arg = call_arg (stmt, argno);
+ if (TREE_CODE (arg) == SSA_NAME)
+ check_pointer_uses (stmt, arg);
+ }
+ }
check_call_access (stmt);
+ check_call_dangling (stmt);
+
+ if (m_early_checks_p)
+ return;
maybe_check_dealloc_call (stmt);
check_nonstring_args (stmt);
}
+/* Return true of X is a DECL with automatic storage duration. */
+
+static inline bool
+is_auto_decl (tree x)
+{
+ return DECL_P (x) && !DECL_EXTERNAL (x) && !TREE_STATIC (x);
+}
+
+/* Check non-call STMT for invalid accesses. */
+
+void
+pass_waccess::check_stmt (gimple *stmt)
+{
+ if (m_check_dangling_p && gimple_clobber_p (stmt))
+ {
+ /* Ignore clobber statemts in blocks with exceptional edges. */
+ basic_block bb = gimple_bb (stmt);
+ edge e = EDGE_PRED (bb, 0);
+ if (e->flags & EDGE_EH)
+ return;
+
+ tree var = gimple_assign_lhs (stmt);
+ m_clobbers.put (var, stmt);
+ return;
+ }
+
+ if (is_gimple_assign (stmt))
+ {
+ /* Clobbered unnamed temporaries such as compound literals can be
+ revived. Check for an assignment to one and remove it from
+ M_CLOBBERS. */
+ tree lhs = gimple_assign_lhs (stmt);
+ while (handled_component_p (lhs))
+ lhs = TREE_OPERAND (lhs, 0);
+
+ if (is_auto_decl (lhs))
+ m_clobbers.remove (lhs);
+ return;
+ }
+
+ if (greturn *ret = dyn_cast <greturn *> (stmt))
+ {
+ if (optimize && flag_isolate_erroneous_paths_dereference)
+ /* Avoid interfering with -Wreturn-local-addr (which runs only
+ with optimization enabled). */
+ return;
+
+ tree arg = gimple_return_retval (ret);
+ if (!arg || TREE_CODE (arg) != ADDR_EXPR)
+ return;
+
+ arg = TREE_OPERAND (arg, 0);
+ while (handled_component_p (arg))
+ arg = TREE_OPERAND (arg, 0);
+
+ if (!is_auto_decl (arg))
+ return;
+
+ gimple **pclobber = m_clobbers.get (arg);
+ if (!pclobber)
+ return;
+
+ if (!use_after_inval_p (*pclobber, stmt))
+ return;
+
+ warn_invalid_pointer (NULL_TREE, stmt, *pclobber, arg, false);
+ }
+}
+
/* Check basic block BB for invalid accesses. */
void
@@ -4091,6 +4338,8 @@ pass_waccess::check_block (basic_block bb)
gimple *stmt = gsi_stmt (si);
if (gcall *call = dyn_cast <gcall *> (stmt))
check_call (call);
+ else
+ check_stmt (stmt);
}
}
@@ -4139,6 +4388,262 @@ pass_waccess::gimple_call_return_arg (gcall *call)
return gimple_call_arg (call, argno);
}
+/* Return the decl referenced by the argument that the call STMT to
+ a built-in function returns (including with an offset) or null if
+ it doesn't. */
+
+tree
+pass_waccess::gimple_call_return_arg_ref (gcall *call)
+{
+ if (tree arg = gimple_call_return_arg (call))
+ {
+ access_ref aref;
+ if (m_ptr_qry.get_ref (arg, call, &aref, 0)
+ && DECL_P (aref.ref))
+ return aref.ref;
+ }
+
+ return NULL_TREE;
+}
+
+/* Check for and diagnose all uses of the dangling pointer VAR to the auto
+ object DECL whose lifetime has ended. OBJREF is true when VAR denotes
+ an access to a DECL that may have been clobbered. */
+
+void
+pass_waccess::check_dangling_uses (tree var, tree decl, bool maybe /* = false */,
+ bool objref /* = false */)
+{
+ if (!decl || !is_auto_decl (decl))
+ return;
+
+ gimple **pclob = m_clobbers.get (decl);
+ if (!pclob)
+ return;
+
+ if (!objref)
+ {
+ check_pointer_uses (*pclob, var, decl, maybe);
+ return;
+ }
+
+ gimple *use_stmt = SSA_NAME_DEF_STMT (var);
+ if (!use_after_inval_p (*pclob, use_stmt, true))
+ return;
+
+ basic_block use_bb = gimple_bb (use_stmt);
+ basic_block clob_bb = gimple_bb (*pclob);
+ maybe = maybe || !dominated_by_p (CDI_POST_DOMINATORS, use_bb, clob_bb);
+ warn_invalid_pointer (var, use_stmt, *pclob, decl, maybe, false);
+}
+
+/* Diagnose stores in BB and (recursively) its predecessors of the addresses
+ of local variables into nonlocal pointers that are left dangling after
+ the function returns. BBS is a bitmap of basic blocks visited. */
+
+void
+pass_waccess::check_dangling_stores (basic_block bb,
+ hash_set<tree> &stores,
+ auto_bitmap &bbs)
+{
+ if (!bitmap_set_bit (bbs, bb->index))
+ /* Avoid cycles. */
+ return;
+
+ /* Iterate backwards over the statements looking for a store of
+ the address of a local variable into a nonlocal pointer. */
+ for (auto gsi = gsi_last_nondebug_bb (bb); ; gsi_prev_nondebug (&gsi))
+ {
+ gimple *stmt = gsi_stmt (gsi);
+ if (!stmt)
+ break;
+
+ if (is_gimple_call (stmt)
+ && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE)))
+ /* Avoid looking before nonconst, nonpure calls since those might
+ use the escaped locals. */
+ return;
+
+ if (!is_gimple_assign (stmt) || gimple_clobber_p (stmt))
+ continue;
+
+ access_ref lhs_ref;
+ tree lhs = gimple_assign_lhs (stmt);
+ if (!m_ptr_qry.get_ref (lhs, stmt, &lhs_ref, 0))
+ continue;
+
+ if (is_auto_decl (lhs_ref.ref))
+ continue;
+
+ if (DECL_P (lhs_ref.ref))
+ {
+ if (!POINTER_TYPE_P (TREE_TYPE (lhs_ref.ref))
+ || lhs_ref.deref > 0)
+ continue;
+ }
+ else if (TREE_CODE (lhs_ref.ref) == SSA_NAME)
+ {
+ /* Avoid looking at or before stores into unknown objects. */
+ gimple *def_stmt = SSA_NAME_DEF_STMT (lhs_ref.ref);
+ if (!gimple_nop_p (def_stmt))
+ return;
+ }
+ else if (TREE_CODE (lhs_ref.ref) == MEM_REF)
+ {
+ tree arg = TREE_OPERAND (lhs_ref.ref, 0);
+ if (TREE_CODE (arg) == SSA_NAME)
+ {
+ gimple *def_stmt = SSA_NAME_DEF_STMT (arg);
+ if (!gimple_nop_p (def_stmt))
+ return;
+ }
+ }
+ else
+ continue;
+
+ if (stores.add (lhs_ref.ref))
+ continue;
+
+ /* FIXME: Handle stores of alloca() and VLA. */
+ access_ref rhs_ref;
+ tree rhs = gimple_assign_rhs1 (stmt);
+ if (!m_ptr_qry.get_ref (rhs, stmt, &rhs_ref, 0)
+ || rhs_ref.deref != -1)
+ continue;
+
+ if (!is_auto_decl (rhs_ref.ref))
+ continue;
+
+ location_t loc = gimple_location (stmt);
+ if (warning_at (loc, OPT_Wdangling_pointer_,
+ "storing the address of local variable %qD in %qE",
+ rhs_ref.ref, lhs))
+ {
+ location_t loc = DECL_SOURCE_LOCATION (rhs_ref.ref);
+ inform (loc, "%qD declared here", rhs_ref.ref);
+
+ if (DECL_P (lhs_ref.ref))
+ loc = DECL_SOURCE_LOCATION (lhs_ref.ref);
+ else if (EXPR_HAS_LOCATION (lhs_ref.ref))
+ loc = EXPR_LOCATION (lhs_ref.ref);
+
+ if (loc != UNKNOWN_LOCATION)
+ inform (loc, "%qE declared here", lhs_ref.ref);
+ }
+ }
+
+ edge e;
+ edge_iterator ei;
+ FOR_EACH_EDGE (e, ei, bb->preds)
+ {
+ basic_block pred = e->src;
+ check_dangling_stores (pred, stores, bbs);
+ }
+}
+
+/* Diagnose stores of the addresses of local variables into nonlocal
+ pointers that are left dangling after the function returns. */
+
+void
+pass_waccess::check_dangling_stores ()
+{
+ auto_bitmap bbs;
+ hash_set<tree> stores;
+ check_dangling_stores (EXIT_BLOCK_PTR_FOR_FN (m_func), stores, bbs);
+}
+
+/* Check for and diagnose uses of dangling pointers to auto objects
+ whose lifetime has ended. */
+
+void
+pass_waccess::check_dangling_uses ()
+{
+ tree var;
+ unsigned i;
+ FOR_EACH_SSA_NAME (i, var, m_func)
+ {
+ /* For each SSA_NAME pointer VAR find the DECL it points to.
+ If the DECL is a clobbered local variable, check to see
+ if any of VAR's uses (or those of other pointers derived
+ from VAR) happens after the clobber. If so, warn. */
+ tree decl = NULL_TREE;
+
+ gimple *def_stmt = SSA_NAME_DEF_STMT (var);
+ if (is_gimple_assign (def_stmt))
+ {
+ tree rhs = gimple_assign_rhs1 (def_stmt);
+ if (TREE_CODE (rhs) == ADDR_EXPR)
+ {
+ if (!POINTER_TYPE_P (TREE_TYPE (var)))
+ continue;
+ decl = TREE_OPERAND (rhs, 0);
+ }
+ else
+ {
+ /* For other expressions, check the base DECL to see
+ if it's been clobbered, most likely as a result of
</cut>
Project Stratos
===============
- reviewed Peter's virtio-video patches for QEMU
[PR to clean up some typos in EDK2]
<https://github.com/tianocore/edk2-platforms/pull/34>
vhost-device maintainer effort ([UM-196])
- started reviewing https://github.com/rust-vmm/vhost-device/pull/7
- looking pretty good, see how
https://github.com/rust-vmm/vm-virtio/commit/463dd20552fc32139bbbb56e9152df…
would work with it
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
QEMU Upstream Work ([UM-2])
===========================
- posted [RFC PATCH 0/6] Basic skeleton of RP2040 Raspbery Pi Pico
Message-Id: <20220110175104.2908956-1-alex.bennee(a)linaro.org>
- posted [PATCH v1 00/34] testing/next and other misc fixes
Message-Id: <20220105135009.1584676-1-alex.bennee(a)linaro.org>
- and the eventual [PULL 00/31] testing/next and other misc fixes
Message-Id: <20220112112722.3641051-1-alex.bennee(a)linaro.org>
- and the inevitable fixup [RFC PATCH] linux-user: expand reserved
brk space for 64bit guests Message-Id:
<20220113165550.4184455-1-alex.bennee(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG
sanity tests for ARM Message-Id:
<20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Completed Reviews [6/6]
=======================
[PATCH] tests/docker: Add gentoo-loongarch64-cross image and run cross builds in GitLab
Message-Id: <20211229062204.3726981-1-git(a)xen0n.name>
[PATCH 0/2] tests/tcg: Fix float_{convs,madds}
Message-Id: <20211224035541.2159966-1-richard.henderson(a)linaro.org>
[PATCH v5 00/18] tests/docker: start using libvirt-ci's "lcitool" for dockerfiles
Message-Id: <20211215141949.3512719-1-berrange(a)redhat.com>
[PATCH] tests/tcg: Unconditionally use 90 second timeout
Message-Id: <20211230235424.49155-1-richard.henderson(a)linaro.org>
[PATCH] gitlab-ci: Speed up the msys2-64bit job by using --without-default-devices
Message-Id: <20211216082253.43899-1-thuth(a)redhat.com>
[PATCH 0/8] virtio: Add vhost-user based Video decode
Message-Id: <20211209145601.331477-1-peter.griffin(a)linaro.org>
Absences
========
Current Review Queue
====================
TODO [PATCH v2 00/11] Atomic cleanup + clang-12 build fix
Message-Id: <20210717014121.1784956-1-richard.henderson(a)linaro.org>
============================================================================================================================
TODO [PATCH 0/7] tcg: some small towards more modular tcg
Message-Id: <20210804143826.3402872-1-kraxel(a)redhat.com>
=================================================================================================================
TODO [PATCH 0/6] Introduce CanoKey QEMU
Message-Id: <YcSupUSXWDXOAkas@Sun>
=========================================================================
TODO [PATCH] target/arm: Add missing FEAT_TLBIOS instructions
Message-Id: <20211231103928.1455657-1-idan.horowitz(a)gmail.com>
===========================================================================================================================
--
Alex Bennée
Project Stratos
===============
- reviewed Peter's virtio-video patches for QEMU
[PR to clean up some typos in EDK2]
<https://github.com/tianocore/edk2-platforms/pull/34>
vhost-device maintainer effort ([UM-196])
- started reviewing https://github.com/rust-vmm/vhost-device/pull/7
- looking pretty good, see how
https://github.com/rust-vmm/vm-virtio/commit/463dd20552fc32139bbbb56e9152df…
would work with it
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
QEMU Upstream Work ([UM-2])
===========================
- posted [RFC PATCH 0/6] Basic skeleton of RP2040 Raspbery Pi Pico
Message-Id: <20220110175104.2908956-1-alex.bennee(a)linaro.org>
- posted [PATCH v1 00/34] testing/next and other misc fixes
Message-Id: <20220105135009.1584676-1-alex.bennee(a)linaro.org>
- and the eventual [PULL 00/31] testing/next and other misc fixes
Message-Id: <20220112112722.3641051-1-alex.bennee(a)linaro.org>
- and the inevitable fixup [RFC PATCH] linux-user: expand reserved
brk space for 64bit guests Message-Id:
<20220113165550.4184455-1-alex.bennee(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG
sanity tests for ARM Message-Id:
<20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Completed Reviews [6/6]
=======================
[PATCH] tests/docker: Add gentoo-loongarch64-cross image and run cross builds in GitLab
Message-Id: <20211229062204.3726981-1-git(a)xen0n.name>
[PATCH 0/2] tests/tcg: Fix float_{convs,madds}
Message-Id: <20211224035541.2159966-1-richard.henderson(a)linaro.org>
[PATCH v5 00/18] tests/docker: start using libvirt-ci's "lcitool" for dockerfiles
Message-Id: <20211215141949.3512719-1-berrange(a)redhat.com>
[PATCH] tests/tcg: Unconditionally use 90 second timeout
Message-Id: <20211230235424.49155-1-richard.henderson(a)linaro.org>
[PATCH] gitlab-ci: Speed up the msys2-64bit job by using --without-default-devices
Message-Id: <20211216082253.43899-1-thuth(a)redhat.com>
[PATCH 0/8] virtio: Add vhost-user based Video decode
Message-Id: <20211209145601.331477-1-peter.griffin(a)linaro.org>
Absences
========
Current Review Queue
====================
TODO [PATCH 0/6] Introduce CanoKey QEMU
Message-Id: <YcSupUSXWDXOAkas@Sun>
=========================================================================
TODO [PATCH] target/arm: Add missing FEAT_TLBIOS instructions
Message-Id: <20211231103928.1455657-1-idan.horowitz(a)gmail.com>
========================================================================================================================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- Most of this week was spent on continuing to work through
my code-review queue :-/
- Sent a few minor cleanup patches for linux-user nits I noticed while
reading the code as part of reviewing a big bsd-user patchset
* QEMU-420 [GICv4 emulation]
- got some reviewed ITS cleanup patches upstream
- rerolled and sent v2 patchset for the rest of the cleanup patches
- got back up to speed with where I left my GICv4 ITS patches
before Christmas, and dealt with some minor loose ends I'd
left in the last patch or two I was working on.
-- PMM
Hi Peter,
Welcome back, hope you had a good Christmas break. I'm off oh holiday myself for the next
two weeks, so this would be an ideal time to pass back merge control to you.
The board is mostly green now, with occasional allowed failures for centos-stream and
freebsd for upstream package manager failures.
See yall in a couple of weeks.
r~
[UM-2]
* Re-greening of gitlab-ci.
- There are continuing issues with cross-i386-tci.
Occasionally I see *really* long test times:
https://gitlab.com/qemu-project/qemu/-/jobs/1941996332
with qtest-aarch64/qom-test taking 1738s, or 28 of the 60 minute budget.
More often it's merely slow:
https://gitlab.com/qemu-project/qemu/-/jobs/1954634840
with qtest-aarch64/qom-test taking 538s. Note that locally this test
runs in about 100s, and I have been unable to determine why it runs so
much slower on gitlab.
- Worked on a ppc64-softmmu slowdown leading to timeouts.
- Fixes for meson regressions affecting testing.
* Refresh tcg unaligned user patch sets.
r~
Progress (short week, 2 days):
* UM-2 [QEMU upstream maintainership]
- Catching up with email and codereview backlog from 3 weeks holiday :-)
(Have got the codereview queue down to less than a dozen things
so should be able to do some more GICv4 development next week.)
-- PMM
Project Stratos
===============
- got Xen working on the MachiatoBin
- posted Configuring the host GIC for guest to guest IPI Message-Id:
<87fsqwn2sd.fsf(a)linaro.org>
QEMU Upstream Work ([UM-2])
===========================
- posted [RFC PATCH] linux-user: don't adjust base of found hole
Message-Id: <20211216144442.2270605-1-alex.bennee(a)linaro.org>
- posted [PATCH] hw/arm: add control knob to disable kaslr_seed via
DTB Message-Id: <20211215120926.1696302-1-alex.bennee(a)linaro.org>
Completed Reviews [3/3]
=======================
[PATCH 00/26] arm gicv3 ITS: Various bug fixes and refactorings
Message-Id: <20211211191135.1764649-1-peter.maydell(a)linaro.org>
[PATCH for-7.0 0/6] target/arm: Implement LVA, LPA, LPA2 features
Message-Id: <20211208231154.392029-1-richard.henderson(a)linaro.org>
[PATCH-for-6.2? v2 0/5] docs/devel/style: Improve rST rendering
Message-Id: <20211118145716.4116731-1-philmd(a)redhat.com>
Absences
========
Off for holidays, back in the new year. Merry Christmas everyone!
--
Alex Bennée
Project Stratos
===============
- posted Potential demo setup for a TSN/XDP networking Message-Id:
<87wnkfkp2f.fsf(a)linaro.org>
- final Stratos call of the year
- CC and Arnd will look at fat virtq
- nice update from EPAM on Zephyr
- had another round of getting working ACPI on MachiatoBin
- posted [PR to clean up some typos in EDK2]
- might have a working Xen setup without needing SMC hacks
[PR to clean up some typos in EDK2]
<https://github.com/tianocore/edk2-platforms/pull/34>
vhost-device maintainer effort ([UM-196])
- finished review of https://github.com/rust-vmm/vhost-device/pull/4
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
QEMU Upstream Work ([UM-2])
===========================
- discussion around Suggestions for TCG performance improvements
Message-Id: <c76bde31-8f3b-2d03-b7c7-9e026d4b5873(a)huawei.com>
- did a bunch of bug triage and tagging
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- awaiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity
tests for ARM Message-Id:
<20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Completed Reviews [3/3]
=======================
[PATCH] tests/plugin/syscall.c: fix compiler warnings
Message-Id: <20211128011551.2115468-1-juro.bystricky(a)intel.com>
[PATCH for-6.2? 0/2] arm_gicv3: Fix handling of LPIs in list registers
Message-Id: <20211126163915.1048353-2-peter.maydell(a)linaro.org>
[PATCH] tests/docker: add libfuse3 development headers
Message-Id: <20211207160025.52466-1-stefanha(a)redhat.com>
Absences
========
Current Review Queue
====================
TODO [PATCH 0/8] virtio: Add vhost-user based Video decode
Message-Id: <20211209145601.331477-1-peter.griffin(a)linaro.org>
========================================================================================================================
TODO [PATCH for-7.0 0/6] target/arm: Implement LVA, LPA, LPA2 features
Message-Id: <20211208231154.392029-1-richard.henderson(a)linaro.org>
========================================================================================================================================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- More code review: now have a target-arm.next poised and ready to
send once 6.2 is released
* QEMU-420 [GICv4 emulation]
- Working on the ITS changes needed for GICv4 support (this turns
out to be a more tractable end to start than the redistributor)
- I have a preliminary set of 25 or so patches to the ITS which
clean up the code and fix some pre-existing bugs that I found
while working on the GICv4 changes
- have implemented the new VMAPI, VMAPTI, VMAPP ITS commands
-- PMM
After llvm commit 3d549dddf75b6ff9e0ec8c053677750bde4226ea
Author: Sander de Smalen <sander.desmalen(a)arm.com>
[LV] Pass compare predicate to getCmpSelInstrCost.
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 11115 to 11846 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-3d549dddf75b6ff9e0ec8c053677750bde4226ea
cd investigate-llvm-3d549dddf75b6ff9e0ec8c053677750bde4226ea
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 3d549dddf75b6ff9e0ec8c053677750bde4226ea
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach ab31d003e16e483bff298ea2f28fec0f23e8eb79
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 3d549dddf75b6ff9e0ec8c053677750bde4226ea
Author: Sander de Smalen <sander.desmalen(a)arm.com>
Date: Mon Dec 6 11:14:27 2021 +0000
[LV] Pass compare predicate to getCmpSelInstrCost.
If the condition of a select is a compare, pass its predicate to
TTI::getCmpSelInstrCost to get a more accurate cost value instead
of passing BAD_ICMP_PREDICATE.
I noticed that the commit message from D90070 had a comment about the
vectorized select predicate possibly being composed of other compares with
different predicate values, but I wasn't able to construct an example
where this was an actual issue. If this is an issue, I guess we could
add another check that the block isn't predicated for any reason.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D114646
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 11 ++++++++---
llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll | 14 +++++++-------
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 050879144afd..c03e506b7474 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7570,8 +7570,12 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *CondTy = SI->getCondition()->getType();
if (!ScalarCond)
CondTy = VectorType::get(CondTy, VF);
- return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy,
- CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
+
+ CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
+ if (auto *Cmp = dyn_cast<CmpInst>(SI->getCondition()))
+ Pred = Cmp->getPredicate();
+ return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy, Pred,
+ CostKind, I);
}
case Instruction::ICmp:
case Instruction::FCmp: {
@@ -7581,7 +7585,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
ValTy = IntegerType::get(ValTy->getContext(), MinBWs[Op0AsInstruction]);
VectorTy = ToVectorTy(ValTy, VF);
return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, nullptr,
- CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
+ cast<CmpInst>(I)->getPredicate(), CostKind,
+ I);
}
case Instruction::Store:
case Instruction::Load: {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
index 62b18f44fbc5..20d2dc0b7cda 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
@@ -5,17 +5,17 @@ target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-ios5.0.0"
define void @selects_1(i32* nocapture %dst, i32 %A, i32 %B, i32 %C, i32 %N) {
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
; CHECK-LABEL: define void @selects_1(
; CHECK: vector.body:
-; CHECK: select <2 x i1>
+; CHECK: select <4 x i1>
entry:
%cmp26 = icmp sgt i32 %N, 0
</cut>
Dear Linaro Toolchain Working Group,
clang-thumbv7-full-2stage is red for 20 days.
Could you take it to the staging area and make it green again, please?
Thanks
Galina
After llvm commit bd4c6a476fd037fb07a1c484f75d93ee40713d3d
Author: David Blaikie <dblaikie(a)gmail.com>
Add missing header
the following benchmarks slowed down by more than 2%:
- 433.milc slowed down by 4% from 12427 to 12916 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-bd4c6a476fd037fb07a1c484f75d93ee40713d3d
cd investigate-llvm-bd4c6a476fd037fb07a1c484f75d93ee40713d3d
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach bd4c6a476fd037fb07a1c484f75d93ee40713d3d
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 7d4da4e1ab7f79e51db0d5c2a0f5ef1711122dd7
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit bd4c6a476fd037fb07a1c484f75d93ee40713d3d
Author: David Blaikie <dblaikie(a)gmail.com>
Date: Mon Nov 29 16:29:25 2021 -0800
Add missing header
---
llvm/lib/Demangle/DLangDemangle.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/llvm/lib/Demangle/DLangDemangle.cpp b/llvm/lib/Demangle/DLangDemangle.cpp
index faf91b239490..f380aa90035e 100644
--- a/llvm/lib/Demangle/DLangDemangle.cpp
+++ b/llvm/lib/Demangle/DLangDemangle.cpp
@@ -17,6 +17,7 @@
#include "llvm/Demangle/StringView.h"
#include "llvm/Demangle/Utility.h"
+#include <cctype>
#include <cstring>
#include <limits>
</cut>
VirtIO Initiative ([STR-9])
===========================
- synced up on [AX_XDP task with Akashi-san]
- synced on rust-vmm
[AX_XDP task with Akashi-san]
<https://linaro.atlassian.net/browse/STR-68>
vhost-device maintainer effort ([UM-196])
- started looking at https://github.com/rust-vmm/vhost-device/pull/4
QEMU Upstream Work ([UM-2])
===========================
- posted [PULL for 6.2 0/8] more tcg, plugin, test and build fixes
Message-Id: <20211129171449.4176301-1-alex.bennee(a)linaro.org>
- commented on Re: Follow-up on the CXL discussion at OFTC Message-Id:
<20211119015207.62fhk5mjmvaj5nz4(a)intel.com> to see if I can unblock
- posted [RFC PATCH] blog post: how to get your new feature
up-streamed Message-Id:
<20211126203319.3298089-1-alex.bennee(a)linaro.org>
- posted [PATCH for 6.2?] Revert "vga: don't abort when adding a
duplicate isa-vga device" Message-Id:
<20211202164929.1119036-1-alex.bennee(a)linaro.org>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM
Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Other
=====
- wrote [RFC PATCH 0/2] insn plugin tweaks for measuring frequency
Message-Id: <20211203144421.1445232-1-alex.bennee(a)linaro.org>
- might make a good basis for a TCG plugins blog post
Completed Reviews [2/2]
=======================
[PATCH] tests/plugin/syscall.c: fix compiler warnings
Message-Id: <20211128011551.2115468-1-juro.bystricky(a)intel.com>
[PATCH for-6.2? 0/2] arm_gicv3: Fix handling of LPIs in list registers
Message-Id: <20211126163915.1048353-2-peter.maydell(a)linaro.org>
Current Review Queue
====================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- Code review: worked through some of the backlog and accumulated
a list of series to take once the tree reopens for 7.0
- Wrote and sent some cleanup patches relating to the qemu-common.h
header file
- Fixed a bug where we miscalculated the length for TLB range
invalidations
* QEMU-420 [GICv4 emulation]
- Found the problem with PCI passthrough in my nested test setup:
apparently virtio PCI devices need an extra command line argument
to get them to honour the presence of an IOMMU. Everything is
now working and I've put some notes about the setup into
https://linaro.atlassian.net/browse/QEMU-447
- started to implement the GICv4 redistributor changes
-- PMM
VirtIO Initiative ([STR-9])
===========================
- [this weeks sync], topics on AF_XDP, virtio-video and
virtio-watchdog
[upstream rust-vmm sync meeting]
<https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…>
QEMU Upstream Work ([UM-2])
===========================
- posted [PATCH for 6.2 v2 0/7] more tcg, plugin, test and build fixes
Message-Id: <20211125154144.2904741-1-alex.bennee(a)linaro.org>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v8 00/10] MTTCG sanity tests for ARM
Message-Id: <20211118184650.661575-1-alex.bennee(a)linaro.org>
[mttcg tests to current state and fixed up]
<https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8>
Other
=====
- renewal feedback
Completed Reviews [2/2]
=======================
[PATCH v2 0/3] KVM: qemu patches for few KVM features I developed
Message-Id: <20211101132300.192584-1-mlevitsk(a)redhat.com>
[PATCH v2] hw/intc/arm_gicv3: Update cached state after LPI state changes
Message-Id: <20211124202005.989935-1-peter.maydell(a)linaro.org>
Absences
========
- off 2 days sick
Current Review Queue
====================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
Progress:
* QEMU-420 [GICv4 emulation]
- Tracked down and fixed a bug in our ITS emulation which would
(intermittently?) result in a Linux guest reporting "irq 54:
nobody cared" and hanging, because we were not correctly
recalculating the highest priority pending interrupt when the
guest acknowledged a pending LPI. This fix will go into 6.2.
- Set up a test environment for GICv4 work -- because the major
feature of GICv4 is support for directly injecting interrupts
into a VM, the test setup needs to be nested virtualization,
where an outer L1 guest runs on pure emulated QEMU, the inner
L2 guest uses KVM (as provided by L1), and we pass a PCI device
(emulated by QEMU) through from L1 to L2. I think I have this
correctly set up now, but...
- ...the L2 guest hangs because it apparently never sees an
interrupt from the passed-through PCI device. This implies a
bug in our current GICv3 emulation somewhere: need to track
this down before starting in on GICv4 work.
- Separately, I found through code inspection a bug where we
do the wrong thing in the non-passthrough case when the L1 guest
sets a virtual interrupt for the L2 guest in the GIC list
registers and that interrupt has an ID > 1023 (ie it is an LPI).
We got this wrong both for acknowledging and ending an interrupt,
so the two bugs cancel each other out except that we don't set
the vCPU priority and so the L2 guest might get an unexpected
interrupt while it was servicing the LPI. Patches sent.
-- PMM
VirtIO Initiative ([STR-9])
===========================
- posted Initial thoughts for test scenarios for AF_XDP epic
Message-Id: <87k0h5v6ju.fsf(a)linaro.org>
vhost-device maintainer effort ([UM-196])
- more review
- [did some more noodling with rust] to get comfortable with generics
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
[vhost-device crate] <https://github.com/rust-vmm/vhost-device>
[did some more noodling with rust]
<https://gitlab.com/stsquad/softfloat.rs>
QEMU Upstream Work ([UM-2])
===========================
- posted [PULL for 6.2 0/7] misc build and test fixes Message-Id:
<20211116162515.4100231-1-alex.bennee(a)linaro.org>
- posted [RFC PATCH] tests/avocado: fix tcg_plugin mem access count
test Message-Id: <20211117095448.136558-1-alex.bennee(a)linaro.org>
- posted Re: [RFC PATCH] plugins/meson.build: fix linker issue with
weird paths (for v6.2?) Message-Id:
<20211117111924.179776-1-alex.bennee(a)linaro.org>
- posted Re: [PATCH v2 1/3] icount: preserve cflags when custom tb is
about to execute Message-Id: <87h7cbw1tx.fsf(a)linaro.org>
- posted [RFC PATCH] gdbstub: handle a potentially racing TaskState
Message-Id: <20211119145124.942390-1-alex.bennee(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v3 0/3] GIC ITS tests Message-Id:
<20211112114734.3058678-1-alex.bennee(a)linaro.org>
- posted [kvm-unit-tests PATCH v8 00/10] MTTCG sanity tests for ARM
Message-Id: <20211118184650.661575-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
[mttcg tests to current state and fixed up]
<https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8>
Completed Reviews [2/2]
=======================
[PATCH v2 0/3] Some watchpoint-related patches
Message-Id: <163662450348.125458.5494710452733592356.stgit@pasha-ThinkPad-X280>
[PATCH 0/5] Update linux-headers + NOIRQ support for KVM gdbstub
Message-Id: <20211111110604.207376-1-pbonzini(a)redhat.com>
Absences
========
- none
Current Review Queue
====================
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
===============================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
Progress (short week, 3 days)
* UM-2 [QEMU upstream maintainership]
- Still trying to sort out the regression of booting EL3 guest
code on the imx7 board. I got most of the way through prototyping
a cleanup which would fix this, but then spotted that the
highbank board has a more awkward-to-fix similar problem.
We're going to revert the PSCI emulation change for 6.2 so we
can take the time to get the cleanup right and land it in 7.0.
- Usual patch accumulation, review, etc during release cycle
-- PMM
VirtIO Initiative ([STR-9])
===========================
- project admin
[STR-9] <https://linaro.atlassian.net/browse/STR-9>
[upstream rust-vmm sync meeting]
<https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…>
[proposal] <https://github.com/rust-vmm/vhost-device/pull/57>
vhost-device maintainer effort ([UM-196])
- did a bunch of review on [vhost-device crate]
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
[vhost-device crate] <https://github.com/rust-vmm/vhost-device>
QEMU Upstream Work ([UM-2])
===========================
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- posted [kvm-unit-tests PATCH v3 0/3] GIC ITS tests Message-Id:
<20211112114734.3058678-1-alex.bennee(a)linaro.org>
- might as well flush the tree state as I left it
- posted [RFC PATCH] hw/intc: clean-up error reporting for failed
ITS cmd Message-Id:
<20211112170454.3158925-1-alex.bennee(a)linaro.org>
- re-based [mttcg tests to current state and fixed up]
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
[mttcg tests to current state and fixed up]
<https://github.com/stsquad/qemu/tree/mttcg/current-tests-v8>
Completed Reviews [2/2]
=======================
[PATCH v2 0/3] Some watchpoint-related patches
Message-Id: <163662450348.125458.5494710452733592356.stgit@pasha-ThinkPad-X280>
[PATCH 0/5] Update linux-headers + NOIRQ support for KVM gdbstub
Message-Id: <20211111110604.207376-1-pbonzini(a)redhat.com>
Absences
========
- none
Current Review Queue
====================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
TODO [PATCH] softmmu: fix watchpoint processing in icount mode
Message-Id: <163101424137.678744.18360776310711795413.stgit@pasha-ThinkPad-X280>
==============================================================================================================================================
--
Alex Bennée
Progress (short week, 3 days)
* UM-2 [QEMU upstream maintainership]
- recent changes to QEMU's PSCI emulation broke booting of guest code
at EL3 on the imx7 board, which was previously accidentally
relying on PSCI-emulation-via-SMC not getting in its way despite
being enabled. We need to make this board disable PSCI when the
guest code is booting to EL3, as the virt board does, but it's
trickier here because the CPU-creation code is hidden inside a
model of an SoC object. After some on-list discussion I have a
plan for how to restructure this, and need to write some code...
* QEMU-420 [GICv4 emulation]
- re-read the GIC architecture specification, acquired a better
understanding of the required work, and broke this epic down into
stories
- discussed with Leif how the ITS support should be landed in the
sbsa-ref board
Misc:
* higher-than-usual amount of meetings and meeting-prep this week
-- PMM
After llvm commit f411c1dd95092139c8b992260705ac0b75c8583f
Author: Peter Klausler <pklausler(a)nvidia.com>
[flang] Fix crash in semantic error recovery situation
the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 3% from 7600 to 7806 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-f411c1dd95092139c8b992260705ac0b75c8583f
cd investigate-llvm-f411c1dd95092139c8b992260705ac0b75c8583f
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach f411c1dd95092139c8b992260705ac0b75c8583f
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c0b298fc213c1b33e97ca72fba58597365375875
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f411c1dd95092139c8b992260705ac0b75c8583f
Author: Peter Klausler <pklausler(a)nvidia.com>
Date: Tue Nov 2 16:41:15 2021 -0700
[flang] Fix crash in semantic error recovery situation
A CHECK() in semantics is triggering when analyzing a program
with an undefined derived type pointer because the CHECK is
expecting a new error message to have been issued in a function
but not allowing for the case that a diagnostic could have been
produced earlier. Adjust the predicate.
Differential Revision: https://reviews.llvm.org/D113307
---
flang/lib/Semantics/expression.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/flang/lib/Semantics/expression.cpp b/flang/lib/Semantics/expression.cpp
index 331b9b2cf5bc..8ee8c9a9c9ce 100644
--- a/flang/lib/Semantics/expression.cpp
+++ b/flang/lib/Semantics/expression.cpp
@@ -1916,7 +1916,7 @@ auto ExpressionAnalyzer::AnalyzeProcedureComponentRef(
"Base of procedure component reference is not a derived-type object"_err_en_US);
}
}
- CHECK(!GetContextualMessages().empty());
+ CHECK(context_.AnyFatalError());
return std::nullopt;
}
</cut>
Hello,
We have been using Linaro GCC 7.5-2019.12 for the A53.
As we move on to new tech there seems to be no support for "-
mcpu=cortex-a55".
Today, we use the aarch64-elf- toolchain.
What GCC do you suggest we start using for A55 ?
Thanks,
Stefan
VirtIO Initiative ([STR-9])
===========================
- various rust-vmm discussions
- [upstream rust-vmm sync meeting]
- how to deal with vhost-device/vm-virtio split: [proposal]
- synced with ARM on their interests
- got update on Fwd: FW: [App-services] Slides from the
hypervisor-less virtio status meeting Message-Id:
<CAHDbmO2G4hUyfxtaxwnbxsrMk+P41zbL-7VNe=Aa6DshxC-5zQ(a)mail.gmail.com>
[STR-9] <https://linaro.atlassian.net/browse/STR-9>
[upstream rust-vmm sync meeting]
<https://etherpad.opendev.org/p/rust-vmm-sync-2021&sa=D&source=calendar&ust=…>
[proposal] <https://github.com/rust-vmm/vhost-device/pull/57>
QEMU Upstream Work ([UM-2])
===========================
- did some bug triage and investigated [555] and [690] which might
intersect with earlier changes I made
- spent time on the PR from hell [PULL 00/30] testing, gdbstub and
semihosting Message-Id:
<20210115130828.23968-1-alex.bennee(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
[555] <https://gitlab.com/qemu-project/qemu/-/issues/555>
[690] <https://gitlab.com/qemu-project/qemu/-/issues/690>
Other
=====
- TSC report preparation for QEMU and Stratos
Completed Reviews [1/1]
=======================
[XEN PATCH v7 00/51] xen: Build system improvements, now with out-of-tree build!
Message-Id: <20210824105038.1257926-1-anthony.perard(a)citrix.com>
Absences
========
,----
| (save-excursion
| (goto-char (point-min))
| (when (re-search-forward "* Absences")
| (goto-char (match-beginning 0))
| (org-export-as 'ascii t nil t )))
`----
Current Review Queue
====================
TODO [PATCH v2 00/48] tcg: optimize redundant sign extensions
Message-Id: <20211007195456.1168070-1-richard.henderson(a)linaro.org>
================================================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
Progress
* UM-2 [QEMU upstream maintainership]
+ worked through the big pile of email that had built up while
I was on holiday...
+ some long-delayed sysadmin tasks on my work machines now I have
an opportunity to go into the office and do things that would be
too risky with only remote access
+ triaged a bunch of Coverity issues
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ All work here has now gone upstream; closed!
-- PMM
After llvm commit fbc0c308d599fe3300ab6516650b65b41979446d
Author: Nikita Popov <nikita.ppv(a)gmail.com>
[BasicAA] Handle known bits as ranges
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 10899 to 11610 perf samples
- 464.h264ref:libc.so.6 slowed down by 11% from 3538 to 3922 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-fbc0c308d599fe3300ab6516650b65b41979446d
cd investigate-llvm-fbc0c308d599fe3300ab6516650b65b41979446d
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach fbc0c308d599fe3300ab6516650b65b41979446d
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 30a3652b6ade43504087f6e3acd8dc879055f501
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit fbc0c308d599fe3300ab6516650b65b41979446d
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date: Mon Oct 25 15:47:21 2021 +0200
[BasicAA] Handle known bits as ranges
BasicAA currently tries to determine that the offset is positive by
checking whether all variable indices are positive based on known
bits, multiplied by a positive scale. However, this is incorrect
if the scale multiplication might overflow. In the modified test
case the original value is positive, but may be negative after a
left shift.
Fix this by converting known bits into a constant range and reusing
the range-based logic, which handles overflow correctly.
Differential Revision: https://reviews.llvm.org/D112611
---
llvm/lib/Analysis/BasicAliasAnalysis.cpp | 51 +++++-----------------
.../test/Analysis/BasicAA/assume-index-positive.ll | 4 +-
2 files changed, 12 insertions(+), 43 deletions(-)
diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index 0305732ca5d5..8cf947c43bf4 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -318,15 +318,6 @@ struct CastedValue {
return N;
}
- KnownBits evaluateWith(KnownBits N) const {
- assert(N.getBitWidth() == V->getType()->getPrimitiveSizeInBits() &&
- "Incompatible bit width");
- if (TruncBits) N = N.trunc(N.getBitWidth() - TruncBits);
- if (SExtBits) N = N.sext(N.getBitWidth() + SExtBits);
- if (ZExtBits) N = N.zext(N.getBitWidth() + ZExtBits);
- return N;
- }
-
ConstantRange evaluateWith(ConstantRange N) const {
assert(N.getBitWidth() == V->getType()->getPrimitiveSizeInBits() &&
"Incompatible bit width");
@@ -1250,8 +1241,6 @@ AliasResult BasicAAResult::aliasGEP(
if (!DecompGEP1.VarIndices.empty()) {
APInt GCD;
- bool AllNonNegative = DecompGEP1.Offset.isNonNegative();
- bool AllNonPositive = DecompGEP1.Offset.isNonPositive();
ConstantRange OffsetRange = ConstantRange(DecompGEP1.Offset);
for (unsigned i = 0, e = DecompGEP1.VarIndices.size(); i != e; ++i) {
const VariableGEPIndex &Index = DecompGEP1.VarIndices[i];
@@ -1266,24 +1255,19 @@ AliasResult BasicAAResult::aliasGEP(
else
GCD = APIntOps::GreatestCommonDivisor(GCD, ScaleForGCD.abs());
- if (AllNonNegative || AllNonPositive) {
- KnownBits Known = Index.Val.evaluateWith(
- computeKnownBits(Index.Val.V, DL, 0, &AC, Index.CxtI, DT));
- bool SignKnownZero = Known.isNonNegative();
- bool SignKnownOne = Known.isNegative();
- AllNonNegative &= (SignKnownZero && Scale.isNonNegative()) ||
- (SignKnownOne && Scale.isNonPositive());
- AllNonPositive &= (SignKnownZero && Scale.isNonPositive()) ||
- (SignKnownOne && Scale.isNonNegative());
- }
+ ConstantRange CR =
+ computeConstantRange(Index.Val.V, true, &AC, Index.CxtI);
+ KnownBits Known =
+ computeKnownBits(Index.Val.V, DL, 0, &AC, Index.CxtI, DT);
+ CR = CR.intersectWith(
+ ConstantRange::fromKnownBits(Known, /* Signed */ true),
+ ConstantRange::Signed);
assert(OffsetRange.getBitWidth() == Scale.getBitWidth() &&
"Bit widths are normalized to MaxPointerSize");
- OffsetRange = OffsetRange.add(Index.Val
- .evaluateWith(computeConstantRange(
- Index.Val.V, true, &AC, Index.CxtI))
- .sextOrTrunc(OffsetRange.getBitWidth())
- .smul_fast(ConstantRange(Scale)));
+ OffsetRange = OffsetRange.add(
+ Index.Val.evaluateWith(CR).sextOrTrunc(OffsetRange.getBitWidth())
+ .smul_fast(ConstantRange(Scale)));
}
// We now have accesses at two offsets from the same base:
@@ -1300,21 +1284,6 @@ AliasResult BasicAAResult::aliasGEP(
(GCD - ModOffset).uge(V1Size.getValue()))
return AliasResult::NoAlias;
- // If we know all the variables are non-negative, then the total offset is
- // also non-negative and >= DecompGEP1.Offset. We have the following layout:
- // [0, V2Size) ... [TotalOffset, TotalOffer+V1Size]
- // If DecompGEP1.Offset >= V2Size, the accesses don't alias.
- if (AllNonNegative && V2Size.hasValue() &&
- DecompGEP1.Offset.uge(V2Size.getValue()))
- return AliasResult::NoAlias;
- // Similarly, if the variables are non-positive, then the total offset is
- // also non-positive and <= DecompGEP1.Offset. We have the following layout:
- // [TotalOffset, TotalOffset+V1Size) ... [0, V2Size)
- // If -DecompGEP1.Offset >= V1Size, the accesses don't alias.
- if (AllNonPositive && V1Size.hasValue() &&
- (-DecompGEP1.Offset).uge(V1Size.getValue()))
- return AliasResult::NoAlias;
-
if (V1Size.hasValue() && V2Size.hasValue()) {
// Compute ranges of potentially accessed bytes for both accesses. If the
// interseciton is empty, there can be no overlap.
diff --git a/llvm/test/Analysis/BasicAA/assume-index-positive.ll b/llvm/test/Analysis/BasicAA/assume-index-positive.ll
index 451592067f4b..a53fff2c6009 100644
--- a/llvm/test/Analysis/BasicAA/assume-index-positive.ll
+++ b/llvm/test/Analysis/BasicAA/assume-index-positive.ll
@@ -130,12 +130,12 @@ define void @symmetry([0 x i8]* %ptr, i32 %a, i32 %b, i32 %c) {
ret void
}
-; TODO: %ptr.neg and %ptr.shl may alias, as the shl renders the previously
+; %ptr.neg and %ptr.shl may alias, as the shl renders the previously
; non-negative value potentially negative.
define void @shl_of_non_negative(i8* %ptr, i64 %a) {
; CHECK-LABEL: Function: shl_of_non_negative
; CHECK: NoAlias: i8* %ptr.a, i8* %ptr.neg
-; CHECK: NoAlias: i8* %ptr.neg, i8* %ptr.shl
+; CHECK: MayAlias: i8* %ptr.neg, i8* %ptr.shl
%a.cmp = icmp sge i64 %a, 0
call void @llvm.assume(i1 %a.cmp)
%ptr.neg = getelementptr i8, i8* %ptr, i64 -2
</cut>
After llvm commit adf55ac6657693f7bfbe3087b599b4031a765a44
Author: Lang Hames <lhames(a)gmail.com>
[ORC] Call ExecutorProcessControl::disconnect in unit tests that require it.
the following hot functions slowed down by more than 10% (but their benchmarks slowed down by less than 2%):
- 400.perlbench:[.] S_find_byclass slowed down by 12% from 644 to 721 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-adf55ac6657693f7bfbe3087b599b4031a765a44
cd investigate-llvm-adf55ac6657693f7bfbe3087b599b4031a765a44
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach adf55ac6657693f7bfbe3087b599b4031a765a44
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach f526ee5b8517b60620cd03bb3e5945ed69d6bfaa
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit adf55ac6657693f7bfbe3087b599b4031a765a44
Author: Lang Hames <lhames(a)gmail.com>
Date: Tue Oct 12 14:55:49 2021 -0700
[ORC] Call ExecutorProcessControl::disconnect in unit tests that require it.
Another follow-up to 2815ed57e3c and 19b4e3cfc6a. For unit tests that don't use
an ExecutionSession we need to call ExecutorProcessControl::disconnect directly
to wait for the dispatcher to shut down.
https://llvm.org/PR52153
---
.../ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp | 2 ++
llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp | 2 ++
2 files changed, 4 insertions(+)
diff --git a/llvm/unittests/ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp b/llvm/unittests/ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp
index f2b157e424b6..a95435aec2a3 100644
--- a/llvm/unittests/ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp
+++ b/llvm/unittests/ExecutionEngine/Orc/EPCGenericJITLinkMemoryManagerTest.cpp
@@ -134,6 +134,8 @@ TEST(EPCGenericJITLinkMemoryManagerTest, AllocFinalizeFree) {
auto Err2 = MemMgr->deallocate(std::move(*FA));
EXPECT_THAT_ERROR(std::move(Err2), Succeeded());
+
+ cantFail(SelfEPC->disconnect());
}
} // namespace
diff --git a/llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp b/llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp
index 78024644ca8b..beb0fefa094a 100644
--- a/llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp
+++ b/llvm/unittests/ExecutionEngine/Orc/EPCGenericMemoryAccessTest.cpp
@@ -93,6 +93,8 @@ TEST(EPCGenericMemoryAccessTest, MemWrites) {
{{pointerToJITTargetAddress(&Test_Buffer), TestMsg}});
EXPECT_THAT_ERROR(std::move(Err5), Succeeded());
EXPECT_EQ(StringRef(Test_Buffer, TestMsg.size()), TestMsg);
+
+ cantFail(SelfEPC->disconnect());
}
} // namespace
</cut>
== This Week ==
* GCC
- Committed a clean up patch to gimple-isel
- PR93183: Committed fix
- PR102376: Patch approved upstream
- PR83750: Patch approved upstream but it regresses one test-case.
== Next Week ==
- Continue with ongoing tasks
After llvm commit bc69dd62c04a70d29943c1c06c7effed150b70e1
Author: Alexey Bataev <a.bataev(a)outlook.com>
[SLP]Improve graph reordering.
the following benchmarks grew in size by more than 1%:
- 444.namd grew in size by 2% from 192302 to 195218 bytes
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os
- Hardware: APM Mustang 8x X-Gene1
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-bc69dd62c04a70d29943c1c06c7effed150b70e1
cd investigate-llvm-bc69dd62c04a70d29943c1c06c7effed150b70e1
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach bc69dd62c04a70d29943c1c06c7effed150b70e1
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5661317f864abf750cf893c6a4cc7a977be0995a
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit bc69dd62c04a70d29943c1c06c7effed150b70e1
Author: Alexey Bataev <a.bataev(a)outlook.com>
Date: Tue Aug 3 13:20:32 2021 -0700
[SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.
Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.
The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.
The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.
Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.
Differential Revision: https://reviews.llvm.org/D105020
---
.../llvm/Transforms/Vectorize/SLPVectorizer.h | 3 +-
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | 1364 ++++++++++++++------
.../AArch64/transpose-inseltpoison.ll | 84 +-
.../Transforms/SLPVectorizer/AArch64/transpose.ll | 84 +-
llvm/test/Transforms/SLPVectorizer/X86/addsub.ll | 42 +-
.../Transforms/SLPVectorizer/X86/crash_cmpop.ll | 6 +-
llvm/test/Transforms/SLPVectorizer/X86/extract.ll | 6 +-
.../SLPVectorizer/X86/jumbled-load-multiuse.ll | 12 +-
.../Transforms/SLPVectorizer/X86/jumbled-load.ll | 22 +-
.../SLPVectorizer/X86/jumbled_store_crash.ll | 29 +-
.../SLPVectorizer/X86/reorder_repeated_ops.ll | 4 +-
.../SLPVectorizer/X86/split-load8_2-unord.ll | 4 +-
.../X86/vectorize-reorder-alt-shuffle.ll | 9 +-
.../SLPVectorizer/X86/vectorize-reorder-reuse.ll | 52 +-
14 files changed, 1119 insertions(+), 602 deletions(-)
diff --git a/llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h b/llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
index f416a592d683..5e8c29913cad 100644
--- a/llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
+++ b/llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
@@ -95,8 +95,7 @@ private:
/// Try to vectorize a list of operands.
/// \returns true if a value was vectorized.
- bool tryToVectorizeList(ArrayRef<Value *> VL, slpvectorizer::BoUpSLP &R,
- bool AllowReorder = false);
+ bool tryToVectorizeList(ArrayRef<Value *> VL, slpvectorizer::BoUpSLP &R);
/// Try to vectorize a chain that may start at the operands of \p I.
bool tryToVectorize(Instruction *I, slpvectorizer::BoUpSLP &R);
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 9c0029484964..7400b3d8a503 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -21,6 +21,7 @@
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/PostOrderIterator.h"
+#include "llvm/ADT/PriorityQueue.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetOperations.h"
#include "llvm/ADT/SetVector.h"
@@ -535,13 +536,68 @@ static bool isSimple(Instruction *I) {
return true;
}
+/// Shuffles \p Mask in accordance with the given \p SubMask.
+static void addMask(SmallVectorImpl<int> &Mask, ArrayRef<int> SubMask) {
+ if (SubMask.empty())
+ return;
+ if (Mask.empty()) {
+ Mask.append(SubMask.begin(), SubMask.end());
+ return;
+ }
+ SmallVector<int> NewMask(SubMask.size(), UndefMaskElem);
+ int TermValue = std::min(Mask.size(), SubMask.size());
+ for (int I = 0, E = SubMask.size(); I < E; ++I) {
+ if (SubMask[I] >= TermValue || SubMask[I] == UndefMaskElem ||
+ Mask[SubMask[I]] >= TermValue)
+ continue;
+ NewMask[I] = Mask[SubMask[I]];
+ }
+ Mask.swap(NewMask);
+}
+
+/// Order may have elements assigned special value (size) which is out of
+/// bounds. Such indices only appear on places which correspond to undef values
+/// (see canReuseExtract for details) and used in order to avoid undef values
+/// have effect on operands ordering.
+/// The first loop below simply finds all unused indices and then the next loop
+/// nest assigns these indices for undef values positions.
+/// As an example below Order has two undef positions and they have assigned
+/// values 3 and 7 respectively:
+/// before: 6 9 5 4 9 2 1 0
+/// after: 6 3 5 4 7 2 1 0
+/// \returns Fixed ordering.
+static void fixupOrderingIndices(SmallVectorImpl<unsigned> &Order) {
+ const unsigned Sz = Order.size();
+ SmallBitVector UsedIndices(Sz);
+ SmallVector<int> MaskedIndices;
+ for (unsigned I = 0; I < Sz; ++I) {
+ if (Order[I] < Sz)
+ UsedIndices.set(Order[I]);
+ else
+ MaskedIndices.push_back(I);
+ }
+ if (MaskedIndices.empty())
+ return;
+ SmallVector<int> AvailableIndices(MaskedIndices.size());
+ unsigned Cnt = 0;
+ int Idx = UsedIndices.find_first();
+ do {
+ AvailableIndices[Cnt] = Idx;
+ Idx = UsedIndices.find_next(Idx);
+ ++Cnt;
+ } while (Idx > 0);
+ assert(Cnt == MaskedIndices.size() && "Non-synced masked/available indices.");
+ for (int I = 0, E = MaskedIndices.size(); I < E; ++I)
+ Order[MaskedIndices[I]] = AvailableIndices[I];
+}
+
namespace llvm {
static void inversePermutation(ArrayRef<unsigned> Indices,
SmallVectorImpl<int> &Mask) {
Mask.clear();
const unsigned E = Indices.size();
- Mask.resize(E, E + 1);
+ Mask.resize(E, UndefMaskElem);
for (unsigned I = 0; I < E; ++I)
Mask[Indices[I]] = I;
}
@@ -581,6 +637,22 @@ static Optional<int> getInsertIndex(Value *InsertInst, unsigned Offset) {
return Index;
}
+/// Reorders the list of scalars in accordance with the given \p Order and then
+/// the \p Mask. \p Order - is the original order of the scalars, need to
+/// reorder scalars into an unordered state at first according to the given
+/// order. Then the ordered scalars are shuffled once again in accordance with
+/// the provided mask.
+static void reorderScalars(SmallVectorImpl<Value *> &Scalars,
+ ArrayRef<int> Mask) {
+ assert(!Mask.empty() && "Expected non-empty mask.");
+ SmallVector<Value *> Prev(Scalars.size(),
+ UndefValue::get(Scalars.front()->getType()));
+ Prev.swap(Scalars);
+ for (unsigned I = 0, E = Prev.size(); I < E; ++I)
+ if (Mask[I] != UndefMaskElem)
+ Scalars[Mask[I]] = Prev[I];
+}
+
namespace slpvectorizer {
/// Bottom Up SLP Vectorizer.
@@ -645,13 +717,12 @@ public:
void buildTree(ArrayRef<Value *> Roots,
ArrayRef<Value *> UserIgnoreLst = None);
- /// Construct a vectorizable tree that starts at \p Roots, ignoring users for
- /// the purpose of scheduling and extraction in the \p UserIgnoreLst taking
- /// into account (and updating it, if required) list of externally used
- /// values stored in \p ExternallyUsedValues.
- void buildTree(ArrayRef<Value *> Roots,
- ExtraValueToDebugLocsMap &ExternallyUsedValues,
- ArrayRef<Value *> UserIgnoreLst = None);
+ /// Builds external uses of the vectorized scalars, i.e. the list of
+ /// vectorized scalars to be extracted, their lanes and their scalar users. \p
+ /// ExternallyUsedValues contains additional list of external uses to handle
+ /// vectorization of reductions.
+ void
+ buildExternalUses(const ExtraValueToDebugLocsMap &ExternallyUsedValues = {});
/// Clear the internal data structures that are created by 'buildTree'.
void deleteTree() {
@@ -659,8 +730,6 @@ public:
ScalarToTreeEntry.clear();
MustGather.clear();
ExternalUses.clear();
- NumOpsWantToKeepOrder.clear();
- NumOpsWantToKeepOriginalOrder = 0;
for (auto &Iter : BlocksSchedules) {
BlockScheduling *BS = Iter.second.get();
BS->clear();
@@ -674,103 +743,22 @@ public:
/// Perform LICM and CSE on the newly generated gather sequences.
void optimizeGatherSequence();
- /// \returns The best order of instructions for vectorization.
- Optional<ArrayRef<unsigned>> bestOrder() const {
- assert(llvm::all_of(
- NumOpsWantToKeepOrder,
- [this](const decltype(NumOpsWantToKeepOrder)::value_type &D) {
- return D.getFirst().size() ==
- VectorizableTree[0]->Scalars.size();
- }) &&
- "All orders must have the same size as number of instructions in "
- "tree node.");
- auto I = std::max_element(
- NumOpsWantToKeepOrder.begin(), NumOpsWantToKeepOrder.end(),
- [](const decltype(NumOpsWantToKeepOrder)::value_type &D1,
- const decltype(NumOpsWantToKeepOrder)::value_type &D2) {
- return D1.second < D2.second;
- });
- if (I == NumOpsWantToKeepOrder.end() ||
- I->getSecond() <= NumOpsWantToKeepOriginalOrder)
- return None;
-
- return makeArrayRef(I->getFirst());
- }
-
- /// Builds the correct order for root instructions.
- /// If some leaves have the same instructions to be vectorized, we may
- /// incorrectly evaluate the best order for the root node (it is built for the
- /// vector of instructions without repeated instructions and, thus, has less
- /// elements than the root node). This function builds the correct order for
- /// the root node.
- /// For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
- /// are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
- /// leaf, it will be shrink to \<a, b\>. If instructions in this leaf should
- /// be reordered, the best order will be \<1, 0\>. We need to extend this
- /// order for the root node. For the root node this order should look like
- /// \<3, 0, 1, 2\>. This function extends the order for the reused
- /// instructions.
- void findRootOrder(OrdersType &Order) {
- // If the leaf has the same number of instructions to vectorize as the root
- // - order must be set already.
- unsigned RootSize = VectorizableTree[0]->Scalars.size();
- if (Order.size() == RootSize)
- return;
- SmallVector<unsigned, 4> RealOrder(Order.size());
- std::swap(Order, RealOrder);
- SmallVector<int, 4> Mask;
- inversePermutation(RealOrder, Mask);
- Order.assign(Mask.begin(), Mask.end());
- // The leaf has less number of instructions - need to find the true order of
- // the root.
- // Scan the nodes starting from the leaf back to the root.
- const TreeEntry *PNode = VectorizableTree.back().get();
- SmallVector<const TreeEntry *, 4> Nodes(1, PNode);
- SmallPtrSet<const TreeEntry *, 4> Visited;
- while (!Nodes.empty() && Order.size() != RootSize) {
- const TreeEntry *PNode = Nodes.pop_back_val();
- if (!Visited.insert(PNode).second)
- continue;
- const TreeEntry &Node = *PNode;
- for (const EdgeInfo &EI : Node.UserTreeIndices)
- if (EI.UserTE)
- Nodes.push_back(EI.UserTE);
- if (Node.ReuseShuffleIndices.empty())
- continue;
- // Build the order for the parent node.
- OrdersType NewOrder(Node.ReuseShuffleIndices.size(), RootSize);
- SmallVector<unsigned, 4> OrderCounter(Order.size(), 0);
- // The algorithm of the order extension is:
- // 1. Calculate the number of the same instructions for the order.
- // 2. Calculate the index of the new order: total number of instructions
- // with order less than the order of the current instruction + reuse
- // number of the current instruction.
- // 3. The new order is just the index of the instruction in the original
- // vector of the instructions.
- for (unsigned I : Node.ReuseShuffleIndices)
- ++OrderCounter[Order[I]];
- SmallVector<unsigned, 4> CurrentCounter(Order.size(), 0);
- for (unsigned I = 0, E = Node.ReuseShuffleIndices.size(); I < E; ++I) {
- unsigned ReusedIdx = Node.ReuseShuffleIndices[I];
- unsigned OrderIdx = Order[ReusedIdx];
- unsigned NewIdx = 0;
- for (unsigned J = 0; J < OrderIdx; ++J)
- NewIdx += OrderCounter[J];
- NewIdx += CurrentCounter[OrderIdx];
- ++CurrentCounter[OrderIdx];
- assert(NewOrder[NewIdx] == RootSize &&
- "The order index should not be written already.");
- NewOrder[NewIdx] = I;
- }
- std::swap(Order, NewOrder);
- }
- assert(Order.size() == RootSize &&
- "Root node is expected or the size of the order must be the same as "
- "the number of elements in the root node.");
- assert(llvm::all_of(Order,
- [RootSize](unsigned Val) { return Val != RootSize; }) &&
- "All indices must be initialized");
- }
+ /// Reorders the current graph to the most profitable order starting from the
+ /// root node to the leaf nodes. The best order is chosen only from the nodes
+ /// of the same size (vectorization factor). Smaller nodes are considered
+ /// parts of subgraph with smaller VF and they are reordered independently. We
+ /// can make it because we still need to extend smaller nodes to the wider VF
+ /// and we can merge reordering shuffles with the widening shuffles.
+ void reorderTopToBottom();
+
+ /// Reorders the current graph to the most profitable order starting from
+ /// leaves to the root. It allows to rotate small subgraphs and reduce the
+ /// number of reshuffles if the leaf nodes use the same order. In this case we
+ /// can merge the orders and just shuffle user node instead of shuffling its
+ /// operands. Plus, even the leaf nodes have different orders, it allows to
+ /// sink reordering in the graph closer to the root node and merge it later
+ /// during analysis.
+ void reorderBottomToTop();
/// \return The vector element size in bits to use when vectorizing the
/// expression tree ending at \p V. If V is a store, the size is the width of
@@ -793,6 +781,10 @@ public:
return MinVecRegSize;
}
+ unsigned getMinVF(unsigned Sz) const {
+ return std::max(2U, getMinVecRegSize() / Sz);
+ }
+
unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
unsigned MaxVF = MaxVFOption.getNumOccurrences() ?
MaxVFOption : TTI->getMaximumVF(ElemWidth, Opcode);
@@ -1621,12 +1613,29 @@ private:
/// \returns true if the scalars in VL are equal to this entry.
bool isSame(ArrayRef<Value *> VL) const {
- if (VL.size() == Scalars.size())
- return std::equal(VL.begin(), VL.end(), Scalars.begin());
- return VL.size() == ReuseShuffleIndices.size() &&
- std::equal(
- VL.begin(), VL.end(), ReuseShuffleIndices.begin(),
- [this](Value *V, int Idx) { return V == Scalars[Idx]; });
+ auto &&IsSame = [VL](ArrayRef<Value *> Scalars, ArrayRef<int> Mask) {
+ if (Mask.size() != VL.size() && VL.size() == Scalars.size())
+ return std::equal(VL.begin(), VL.end(), Scalars.begin());
+ return VL.size() == Mask.size() &&
+ std::equal(
+ VL.begin(), VL.end(), Mask.begin(),
+ [Scalars](Value *V, int Idx) { return V == Scalars[Idx]; });
+ };
+ if (!ReorderIndices.empty()) {
+ // TODO: implement matching if the nodes are just reordered, still can
+ // treat the vector as the same if the list of scalars matches VL
+ // directly, without reordering.
+ SmallVector<int> Mask;
+ inversePermutation(ReorderIndices, Mask);
+ if (VL.size() == Scalars.size())
+ return IsSame(Scalars, Mask);
+ if (VL.size() == ReuseShuffleIndices.size()) {
+ ::addMask(Mask, ReuseShuffleIndices);
+ return IsSame(Scalars, Mask);
+ }
+ return false;
+ }
+ return IsSame(Scalars, ReuseShuffleIndices);
}
/// A vector of scalars.
@@ -1701,6 +1710,12 @@ private:
}
}
+ /// Reorders operands of the node to the given mask \p Mask.
+ void reorderOperands(ArrayRef<int> Mask) {
+ for (ValueList &Operand : Operands)
+ reorderScalars(Operand, Mask);
+ }
+
/// \returns the \p OpIdx operand of this TreeEntry.
ValueList &getOperand(unsigned OpIdx) {
assert(OpIdx < Operands.size() && "Off bounds");
@@ -1760,19 +1775,14 @@ private:
return AltOp ? AltOp->getOpcode() : 0;
}
- /// Update operations state of this entry if reorder occurred.
- bool updateStateIfReorder() {
- if (ReorderIndices.empty())
- return false;
- InstructionsState S = getSameOpcode(Scalars, ReorderIndices.front());
- setOperations(S);
- return true;
- }
- /// When ReuseShuffleIndices is empty it just returns position of \p V
- /// within vector of Scalars. Otherwise, try to remap on its reuse index.
+ /// When ReuseReorderShuffleIndices is empty it just returns position of \p
+ /// V within vector of Scalars. Otherwise, try to remap on its reuse index.
int findLaneForValue(Value *V) const {
unsigned FoundLane = std::distance(Scalars.begin(), find(Scalars, V));
assert(FoundLane < Scalars.size() && "Couldn't find extract lane");
+ if (!ReorderIndices.empty())
+ FoundLane = ReorderIndices[FoundLane];
+ assert(FoundLane < Scalars.size() && "Couldn't find extract lane");
if (!ReuseShuffleIndices.empty()) {
FoundLane = std::distance(ReuseShuffleIndices.begin(),
find(ReuseShuffleIndices, FoundLane));
@@ -1856,7 +1866,7 @@ private:
TreeEntry *newTreeEntry(ArrayRef<Value *> VL, Optional<ScheduleData *> Bundle,
const InstructionsState &S,
const EdgeInfo &UserTreeIdx,
- ArrayRef<unsigned> ReuseShuffleIndices = None,
+ ArrayRef<int> ReuseShuffleIndices = None,
ArrayRef<unsigned> ReorderIndices = None) {
TreeEntry::EntryState EntryState =
Bundle ? TreeEntry::Vectorize : TreeEntry::NeedToGather;
@@ -1869,7 +1879,7 @@ private:
Optional<ScheduleData *> Bundle,
const InstructionsState &S,
const EdgeInfo &UserTreeIdx,
- ArrayRef<unsigned> ReuseShuffleIndices = None,
+ ArrayRef<int> ReuseShuffleIndices = None,
ArrayRef<unsigned> ReorderIndices = None) {
assert(((!Bundle && EntryState == TreeEntry::NeedToGather) ||
(Bundle && EntryState != TreeEntry::NeedToGather)) &&
@@ -1877,12 +1887,25 @@ private:
VectorizableTree.push_back(std::make_unique<TreeEntry>(VectorizableTree));
TreeEntry *Last = VectorizableTree.back().get();
Last->Idx = VectorizableTree.size() - 1;
- Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
Last->State = EntryState;
Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),
ReuseShuffleIndices.end());
- Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());
- Last->setOperations(S);
+ if (ReorderIndices.empty()) {
+ Last->Scalars.assign(VL.begin(), VL.end());
+ Last->setOperations(S);
+ } else {
+ // Reorder scalars and build final mask.
+ Last->Scalars.assign(VL.size(), nullptr);
+ transform(ReorderIndices, Last->Scalars.begin(),
+ [VL](unsigned Idx) -> Value * {
+ if (Idx >= VL.size())
+ return UndefValue::get(VL.front()->getType());
+ return VL[Idx];
+ });
+ InstructionsState S = getSameOpcode(Last->Scalars);
+ Last->setOperations(S);
+ Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());
+ }
if (Last->State != TreeEntry::NeedToGather) {
for (Value *V : VL) {
assert(!getTreeEntry(V) && "Scalar already in tree!");
@@ -2431,14 +2454,6 @@ private:
}
};
- /// Contains orders of operations along with the number of bundles that have
- /// operations in this order. It stores only those orders that require
- /// reordering, if reordering is not required it is counted using \a
- /// NumOpsWantToKeepOriginalOrder.
- DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo> NumOpsWantToKeepOrder;
- /// Number of bundles that do not require reordering.
- unsigned NumOpsWantToKeepOriginalOrder = 0;
-
// Analysis and block reference.
Function *F;
ScalarEvolution *SE;
@@ -2591,21 +2606,439 @@ void BoUpSLP::eraseInstructions(ArrayRef<Value *> AV) {
};
}
-void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
- ArrayRef<Value *> UserIgnoreLst) {
- ExtraValueToDebugLocsMap ExternallyUsedValues;
- buildTree(Roots, ExternallyUsedValues, UserIgnoreLst);
+/// Reorders the given \p Reuses mask according to the given \p Mask. \p Reuses
+/// contains original mask for the scalars reused in the node. Procedure
+/// transform this mask in accordance with the given \p Mask.
+static void reorderReuses(SmallVectorImpl<int> &Reuses, ArrayRef<int> Mask) {
+ assert(!Mask.empty() && Reuses.size() == Mask.size() &&
+ "Expected non-empty mask.");
+ SmallVector<int> Prev(Reuses.begin(), Reuses.end());
+ Prev.swap(Reuses);
+ for (unsigned I = 0, E = Prev.size(); I < E; ++I)
+ if (Mask[I] != UndefMaskElem)
+ Reuses[Mask[I]] = Prev[I];
}
-void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
- ExtraValueToDebugLocsMap &ExternallyUsedValues,
- ArrayRef<Value *> UserIgnoreLst) {
- deleteTree();
- UserIgnoreList = UserIgnoreLst;
- if (!allSameType(Roots))
+/// Reorders the given \p Order according to the given \p Mask. \p Order - is
+/// the original order of the scalars. Procedure transforms the provided order
+/// in accordance with the given \p Mask. If the resulting \p Order is just an
+/// identity order, \p Order is cleared.
+static void reorderOrder(SmallVectorImpl<unsigned> &Order, ArrayRef<int> Mask) {
+ assert(!Mask.empty() && "Expected non-empty mask.");
+ SmallVector<int> MaskOrder;
+ if (Order.empty()) {
+ MaskOrder.resize(Mask.size());
+ std::iota(MaskOrder.begin(), MaskOrder.end(), 0);
+ } else {
+ inversePermutation(Order, MaskOrder);
+ }
+ reorderReuses(MaskOrder, Mask);
+ if (ShuffleVectorInst::isIdentityMask(MaskOrder)) {
+ Order.clear();
return;
- buildTree_rec(Roots, 0, EdgeInfo());
+ }
+ Order.assign(Mask.size(), Mask.size());
+ for (unsigned I = 0, E = Mask.size(); I < E; ++I)
+ if (MaskOrder[I] != UndefMaskElem)
+ Order[MaskOrder[I]] = I;
+ fixupOrderingIndices(Order);
+}
+
+void BoUpSLP::reorderTopToBottom() {
+ // Maps VF to the graph nodes.
+ DenseMap<unsigned, SmallPtrSet<TreeEntry *, 4>> VFToOrderedEntries;
+ // ExtractElement gather nodes which can be vectorized and need to handle
+ // their ordering.
+ DenseMap<const TreeEntry *, OrdersType> GathersToOrders;
+ // Find all reorderable nodes with the given VF.
+ // Currently the are vectorized loads,extracts + some gathering of extracts.
+ for_each(VectorizableTree, [this, &VFToOrderedEntries, &GathersToOrders](
+ const std::unique_ptr<TreeEntry> &TE) {
+ // No need to reorder if need to shuffle reuses, still need to shuffle the
+ // node.
+ if (!TE->ReuseShuffleIndices.empty())
+ return;
+ if (TE->State == TreeEntry::Vectorize &&
+ isa<LoadInst, ExtractElementInst, ExtractValueInst, StoreInst,
+ InsertElementInst>(TE->getMainOp()) &&
+ !TE->isAltShuffle()) {
+ VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());
+ } else if (TE->State == TreeEntry::NeedToGather &&
+ TE->getOpcode() == Instruction::ExtractElement &&
+ !TE->isAltShuffle() &&
+ isa<FixedVectorType>(cast<ExtractElementInst>(TE->getMainOp())
+ ->getVectorOperandType()) &&
+ allSameType(TE->Scalars) && allSameBlock(TE->Scalars)) {
+ // Check that gather of extractelements can be represented as
+ // just a shuffle of a single vector.
+ OrdersType CurrentOrder;
+ bool Reuse = canReuseExtract(TE->Scalars, TE->getMainOp(), CurrentOrder);
+ if (Reuse || !CurrentOrder.empty()) {
+ VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());
+ GathersToOrders.try_emplace(TE.get(), CurrentOrder);
+ }
+ }
+ });
+
+ // Reorder the graph nodes according to their vectorization factor.
+ for (unsigned VF = VectorizableTree.front()->Scalars.size(); VF > 1;
+ VF /= 2) {
+ auto It = VFToOrderedEntries.find(VF);
+ if (It == VFToOrderedEntries.end())
+ continue;
+ // Try to find the most profitable order. We just are looking for the most
+ // used order and reorder scalar elements in the nodes according to this
+ // mostly used order.
+ const SmallPtrSetImpl<TreeEntry *> &OrderedEntries = It->getSecond();
+ // All operands are reordered and used only in this node - propagate the
+ // most used order to the user node.
+ DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo> OrdersUses;
+ SmallPtrSet<const TreeEntry *, 4> VisitedOps;
+ for (const TreeEntry *OpTE : OrderedEntries) {
+ // No need to reorder this nodes, still need to extend and to use shuffle,
+ // just need to merge reordering shuffle and the reuse shuffle.
+ if (!OpTE->ReuseShuffleIndices.empty())
+ continue;
+ // Count number of orders uses.
+ const auto &Order = [OpTE, &GathersToOrders]() -> const OrdersType & {
+ if (OpTE->State == TreeEntry::NeedToGather)
+ return GathersToOrders.find(OpTE)->second;
+ return OpTE->ReorderIndices;
+ }();
+ // Stores actually store the mask, not the order, need to invert.
+ if (OpTE->State == TreeEntry::Vectorize && !OpTE->isAltShuffle() &&
+ OpTE->getOpcode() == Instruction::Store && !Order.empty()) {
+ SmallVector<int> Mask;
+ inversePermutation(Order, Mask);
+ unsigned E = Order.size();
+ OrdersType CurrentOrder(E, E);
+ transform(Mask, CurrentOrder.begin(), [E](int Idx) {
+ return Idx == UndefMaskElem ? E : static_cast<unsigned>(Idx);
+ });
+ fixupOrderingIndices(CurrentOrder);
+ ++OrdersUses.try_emplace(CurrentOrder).first->getSecond();
+ } else {
+ ++OrdersUses.try_emplace(Order).first->getSecond();
+ }
+ }
+ // Set order of the user node.
+ if (OrdersUses.empty())
+ continue;
+ // Choose the most used order.
+ ArrayRef<unsigned> BestOrder = OrdersUses.begin()->first;
+ unsigned Cnt = OrdersUses.begin()->second;
+ for (const auto &Pair : llvm::drop_begin(OrdersUses)) {
+ if (Cnt < Pair.second || (Cnt == Pair.second && Pair.first.empty())) {
+ BestOrder = Pair.first;
+ Cnt = Pair.second;
+ }
+ }
+ // Set order of the user node.
+ if (BestOrder.empty())
+ continue;
+ SmallVector<int> Mask;
+ inversePermutation(BestOrder, Mask);
+ SmallVector<int> MaskOrder(BestOrder.size(), UndefMaskElem);
+ unsigned E = BestOrder.size();
+ transform(BestOrder, MaskOrder.begin(), [E](unsigned I) {
+ return I < E ? static_cast<int>(I) : UndefMaskElem;
+ });
+ // Do an actual reordering, if profitable.
+ for (std::unique_ptr<TreeEntry> &TE : VectorizableTree) {
+ // Just do the reordering for the nodes with the given VF.
+ if (TE->Scalars.size() != VF) {
+ if (TE->ReuseShuffleIndices.size() == VF) {
+ // Need to reorder the reuses masks of the operands with smaller VF to
+ // be able to find the match between the graph nodes and scalar
+ // operands of the given node during vectorization/cost estimation.
+ assert(all_of(TE->UserTreeIndices,
+ [VF, &TE](const EdgeInfo &EI) {
+ return EI.UserTE->Scalars.size() == VF ||
+ EI.UserTE->Scalars.size() ==
+ TE->Scalars.size();
+ }) &&
+ "All users must be of VF size.");
+ // Update ordering of the operands with the smaller VF than the given
+ // one.
+ reorderReuses(TE->ReuseShuffleIndices, Mask);
+ }
+ continue;
+ }
+ if (TE->State == TreeEntry::Vectorize &&
+ isa<ExtractElementInst, ExtractValueInst, LoadInst, StoreInst,
+ InsertElementInst>(TE->getMainOp()) &&
+ !TE->isAltShuffle()) {
+ // Build correct orders for extract{element,value}, loads and
+ // stores.
+ reorderOrder(TE->ReorderIndices, Mask);
+ if (isa<InsertElementInst, StoreInst>(TE->getMainOp()))
+ TE->reorderOperands(Mask);
+ } else {
+ // Reorder the node and its operands.
+ TE->reorderOperands(Mask);
+ assert(TE->ReorderIndices.empty() &&
+ "Expected empty reorder sequence.");
+ reorderScalars(TE->Scalars, Mask);
+ }
+ if (!TE->ReuseShuffleIndices.empty()) {
+ // Apply reversed order to keep the original ordering of the reused
+ // elements to avoid extra reorder indices shuffling.
+ OrdersType CurrentOrder;
+ reorderOrder(CurrentOrder, MaskOrder);
+ SmallVector<int> NewReuses;
+ inversePermutation(CurrentOrder, NewReuses);
+ addMask(NewReuses, TE->ReuseShuffleIndices);
+ TE->ReuseShuffleIndices.swap(NewReuses);
+ }
+ }
+ }
+}
+
+void BoUpSLP::reorderBottomToTop() {
+ SetVector<TreeEntry *> OrderedEntries;
+ DenseMap<const TreeEntry *, OrdersType> GathersToOrders;
+ // Find all reorderable leaf nodes with the given VF.
+ // Currently the are vectorized loads,extracts without alternate operands +
+ // some gathering of extracts.
+ SmallVector<TreeEntry *> NonVectorized;
+ for_each(VectorizableTree, [this, &OrderedEntries, &GathersToOrders,
+ &NonVectorized](
+ const std::unique_ptr<TreeEntry> &TE) {
+ // No need to reorder if need to shuffle reuses, still need to shuffle the
+ // node.
+ if (!TE->ReuseShuffleIndices.empty())
+ return;
+ if (TE->State == TreeEntry::Vectorize &&
+ isa<LoadInst, ExtractElementInst, ExtractValueInst>(TE->getMainOp()) &&
+ !TE->isAltShuffle()) {
+ OrderedEntries.insert(TE.get());
+ } else if (TE->State == TreeEntry::NeedToGather &&
+ TE->getOpcode() == Instruction::ExtractElement &&
+ !TE->isAltShuffle() &&
+ isa<FixedVectorType>(cast<ExtractElementInst>(TE->getMainOp())
+ ->getVectorOperandType()) &&
+ allSameType(TE->Scalars) && allSameBlock(TE->Scalars)) {
+ // Check that gather of extractelements can be represented as
+ // just a shuffle of a single vector with a single user only.
+ OrdersType CurrentOrder;
+ bool Reuse = canReuseExtract(TE->Scalars, TE->getMainOp(), CurrentOrder);
+ if ((Reuse || !CurrentOrder.empty()) &&
+ !any_of(
+ VectorizableTree, [&TE](const std::unique_ptr<TreeEntry> &Entry) {
+ return Entry->State == TreeEntry::NeedToGather &&
+ Entry.get() != TE.get() && Entry->isSame(TE->Scalars);
+ })) {
+ OrderedEntries.insert(TE.get());
+ GathersToOrders.try_emplace(TE.get(), CurrentOrder);
+ }
+ }
+ if (TE->State != TreeEntry::Vectorize)
+ NonVectorized.push_back(TE.get());
+ });
+
+ // Checks if the operands of the users are reordarable and have only single
+ // use.
+ auto &&CheckOperands =
+ [this, &NonVectorized](const auto &Data,
+ SmallVectorImpl<TreeEntry *> &GatherOps) {
+ for (unsigned I = 0, E = Data.first->getNumOperands(); I < E; ++I) {
+ if (any_of(Data.second,
+ [I](const std::pair<unsigned, TreeEntry *> &OpData) {
+ return OpData.first == I &&
+ OpData.second->State == TreeEntry::Vectorize;
+ }))
+ continue;
+ ArrayRef<Value *> VL = Data.first->getOperand(I);
+ const TreeEntry *TE = nullptr;
+ const auto *It = find_if(VL, [this, &TE](Value *V) {
+ TE = getTreeEntry(V);
+ return TE;
+ });
+ if (It != VL.end() && TE->isSame(VL))
+ return false;
+ TreeEntry *Gather = nullptr;
+ if (count_if(NonVectorized, [VL, &Gather](TreeEntry *TE) {
+ assert(TE->State != TreeEntry::Vectorize &&
+ "Only non-vectorized nodes are expected.");
+ if (TE->isSame(VL)) {
+ Gather = TE;
+ return true;
+ }
+ return false;
+ }) > 1)
+ return false;
+ if (Gather)
+ GatherOps.push_back(Gather);
+ }
+ return true;
+ };
+ // 1. Propagate order to the graph nodes, which use only reordered nodes.
+ // I.e., if the node has operands, that are reordered, try to make at least
+ // one operand order in the natural order and reorder others + reorder the
+ // user node itself.
+ SmallPtrSet<const TreeEntry *, 4> Visited;
+ while (!OrderedEntries.empty()) {
+ // 1. Filter out only reordered nodes.
+ // 2. If the entry has multiple uses - skip it and jump to the next node.
+ MapVector<TreeEntry *, SmallVector<std::pair<unsigned, TreeEntry *>>> Users;
+ SmallVector<TreeEntry *> Filtered;
+ for (TreeEntry *TE : OrderedEntries) {
+ if (!(TE->State == TreeEntry::Vectorize ||
+ (TE->State == TreeEntry::NeedToGather &&
+ TE->getOpcode() == Instruction::ExtractElement)) ||
+ TE->UserTreeIndices.empty() || !TE->ReuseShuffleIndices.empty() ||
+ !all_of(drop_begin(TE->UserTreeIndices),
+ [TE](const EdgeInfo &EI) {
+ return EI.UserTE == TE->UserTreeIndices.front().UserTE;
+ }) ||
+ !Visited.insert(TE).second) {
+ Filtered.push_back(TE);
+ continue;
+ }
+ // Build a map between user nodes and their operands order to speedup
+ // search. The graph currently does not provide this dependency directly.
+ for (EdgeInfo &EI : TE->UserTreeIndices) {
+ TreeEntry *UserTE = EI.UserTE;
+ auto It = Users.find(UserTE);
+ if (It == Users.end())
+ It = Users.insert({UserTE, {}}).first;
+ It->second.emplace_back(EI.EdgeIdx, TE);
+ }
+ }
+ // Erase filtered entries.
+ for_each(Filtered,
+ [&OrderedEntries](TreeEntry *TE) { OrderedEntries.remove(TE); });
+ for (const auto &Data : Users) {
+ // Check that operands are used only in the User node.
+ SmallVector<TreeEntry *> GatherOps;
+ if (!CheckOperands(Data, GatherOps)) {
+ for_each(Data.second,
+ [&OrderedEntries](const std::pair<unsigned, TreeEntry *> &Op) {
+ OrderedEntries.remove(Op.second);
+ });
+ continue;
+ }
+ // All operands are reordered and used only in this node - propagate the
+ // most used order to the user node.
+ DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo> OrdersUses;
+ SmallPtrSet<const TreeEntry *, 4> VisitedOps;
+ for (const auto &Op : Data.second) {
+ TreeEntry *OpTE = Op.second;
+ if (!OpTE->ReuseShuffleIndices.empty())
+ continue;
+ const auto &Order = [OpTE, &GathersToOrders]() -> const OrdersType & {
+ if (OpTE->State == TreeEntry::NeedToGather)
+ return GathersToOrders.find(OpTE)->second;
+ return OpTE->ReorderIndices;
+ }();
+ // Stores actually store the mask, not the order, need to invert.
+ if (OpTE->State == TreeEntry::Vectorize && !OpTE->isAltShuffle() &&
+ OpTE->getOpcode() == Instruction::Store && !Order.empty()) {
+ SmallVector<int> Mask;
+ inversePermutation(Order, Mask);
+ unsigned E = Order.size();
+ OrdersType CurrentOrder(E, E);
+ transform(Mask, CurrentOrder.begin(), [E](int Idx) {
+ return Idx == UndefMaskElem ? E : static_cast<unsigned>(Idx);
+ });
+ fixupOrderingIndices(CurrentOrder);
+ ++OrdersUses.try_emplace(CurrentOrder).first->getSecond();
+ } else {
+ ++OrdersUses.try_emplace(Order).first->getSecond();
+ }
+ if (VisitedOps.insert(OpTE).second)
+ OrdersUses.try_emplace({}, 0).first->getSecond() +=
+ OpTE->UserTreeIndices.size();
+ --OrdersUses[{}];
+ }
+ // If no orders - skip current nodes and jump to the next one, if any.
+ if (OrdersUses.empty()) {
+ for_each(Data.second,
+ [&OrderedEntries](const std::pair<unsigned, TreeEntry *> &Op) {
+ OrderedEntries.remove(Op.second);
+ });
+ continue;
+ }
+ // Choose the best order.
+ ArrayRef<unsigned> BestOrder = OrdersUses.begin()->first;
+ unsigned Cnt = OrdersUses.begin()->second;
+ for (const auto &Pair : llvm::drop_begin(OrdersUses)) {
+ if (Cnt < Pair.second || (Cnt == Pair.second && Pair.first.empty())) {
+ BestOrder = Pair.first;
+ Cnt = Pair.second;
+ }
+ }
+ // Set order of the user node (reordering of operands and user nodes).
+ if (BestOrder.empty()) {
+ for_each(Data.second,
+ [&OrderedEntries](const std::pair<unsigned, TreeEntry *> &Op) {
+ OrderedEntries.remove(Op.second);
+ });
+ continue;
+ }
+ // Erase operands from OrderedEntries list and adjust their orders.
+ VisitedOps.clear();
+ SmallVector<int> Mask;
+ inversePermutation(BestOrder, Mask);
+ SmallVector<int> MaskOrder(BestOrder.size(), UndefMaskElem);
+ unsigned E = BestOrder.size();
+ transform(BestOrder, MaskOrder.begin(), [E](unsigned I) {
+ return I < E ? static_cast<int>(I) : UndefMaskElem;
+ });
+ for (const std::pair<unsigned, TreeEntry *> &Op : Data.second) {
+ TreeEntry *TE = Op.second;
+ OrderedEntries.remove(TE);
+ if (!VisitedOps.insert(TE).second)
+ continue;
+ if (!TE->ReuseShuffleIndices.empty() && TE->ReorderIndices.empty()) {
+ // Just reorder reuses indices.
+ reorderReuses(TE->ReuseShuffleIndices, Mask);
+ continue;
+ }
+ // Gathers are processed separately.
+ if (TE->State != TreeEntry::Vectorize)
+ continue;
+ assert((BestOrder.size() == TE->ReorderIndices.size() ||
+ TE->ReorderIndices.empty()) &&
+ "Non-matching sizes of user/operand entries.");
+ reorderOrder(TE->ReorderIndices, Mask);
+ }
+ // For gathers just need to reorder its scalars.
+ for (TreeEntry *Gather : GatherOps) {
+ if (!Gather->ReuseShuffleIndices.empty())
+ continue;
+ assert(Gather->ReorderIndices.empty() &&
+ "Unexpected reordering of gathers.");
+ reorderScalars(Gather->Scalars, Mask);
+ OrderedEntries.remove(Gather);
+ }
+ // Reorder operands of the user node and set the ordering for the user
+ // node itself.
+ if (Data.first->State != TreeEntry::Vectorize ||
+ !isa<ExtractElementInst, ExtractValueInst, LoadInst>(
+ Data.first->getMainOp()) ||
+ Data.first->isAltShuffle())
+ Data.first->reorderOperands(Mask);
+ if (!isa<InsertElementInst, StoreInst>(Data.first->getMainOp()) ||
+ Data.first->isAltShuffle()) {
+ reorderScalars(Data.first->Scalars, Mask);
+ reorderOrder(Data.first->ReorderIndices, MaskOrder);
+ if (Data.first->ReuseShuffleIndices.empty() &&
+ !Data.first->ReorderIndices.empty() &&
+ !Data.first->isAltShuffle()) {
+ // Insert user node to the list to try to sink reordering deeper in
+ // the graph.
+ OrderedEntries.insert(Data.first);
+ }
+ } else {
+ reorderOrder(Data.first->ReorderIndices, Mask);
+ }
+ }
+ }
+}
+void BoUpSLP::buildExternalUses(
+ const ExtraValueToDebugLocsMap &ExternallyUsedValues) {
// Collect the values that we need to extract from the tree.
for (auto &TEPtr : VectorizableTree) {
TreeEntry *Entry = TEPtr.get();
@@ -2664,6 +3097,80 @@ void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
}
}
+void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
+ ArrayRef<Value *> UserIgnoreLst) {
+ deleteTree();
+ UserIgnoreList = UserIgnoreLst;
+ if (!allSameType(Roots))
+ return;
+ buildTree_rec(Roots, 0, EdgeInfo());
+}
+
+namespace {
+/// Tracks the state we can represent the loads in the given sequence.
+enum class LoadsState { Gather, Vectorize, ScatterVectorize };
+} // anonymous namespace
+
+/// Checks if the given array of loads can be represented as a vectorized,
+/// scatter or just simple gather.
+static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
+ const TargetTransformInfo &TTI,
+ const DataLayout &DL, ScalarEvolution &SE,
+ SmallVectorImpl<unsigned> &Order,
+ SmallVectorImpl<Value *> &PointerOps) {
+ // Check that a vectorized load would load the same memory as a scalar
+ // load. For example, we don't want to vectorize loads that are smaller
+ // than 8-bit. Even though we have a packed struct {<i2, i2, i2, i2>} LLVM
+ // treats loading/storing it as an i8 struct. If we vectorize loads/stores
+ // from such a struct, we read/write packed bits disagreeing with the
+ // unvectorized version.
+ Type *ScalarTy = VL0->getType();
+
+ if (DL.getTypeSizeInBits(ScalarTy) != DL.getTypeAllocSizeInBits(ScalarTy))
+ return LoadsState::Gather;
+
+ // Make sure all loads in the bundle are simple - we can't vectorize
+ // atomic or volatile loads.
+ PointerOps.clear();
+ PointerOps.resize(VL.size());
+ auto *POIter = PointerOps.begin();
+ for (Value *V : VL) {
+ auto *L = cast<LoadInst>(V);
+ if (!L->isSimple())
+ return LoadsState::Gather;
+ *POIter = L->getPointerOperand();
+ ++POIter;
+ }
+
+ Order.clear();
+ // Check the order of pointer operands.
+ if (llvm::sortPtrAccesses(PointerOps, ScalarTy, DL, SE, Order)) {
+ Value *Ptr0;
+ Value *PtrN;
+ if (Order.empty()) {
+ Ptr0 = PointerOps.front();
+ PtrN = PointerOps.back();
+ } else {
+ Ptr0 = PointerOps[Order.front()];
+ PtrN = PointerOps[Order.back()];
+ }
+ Optional<int> Diff =
+ getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
+ // Check that the sorted loads are consecutive.
+ if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+ return LoadsState::Vectorize;
+ Align CommonAlignment = cast<LoadInst>(VL0)->getAlign();
</cut>
And the rest of the week I flushed my maintainer queues ;-)
Other
=====
[update-ticket] <file:~/org/team.org::update-ticket>
Update [update-ticket] to work with cloud JIRA
Completed Reviews [8/8]
=======================
[PATCH 0/7] tests: docker images for hexagon, nios2, microblaze
Message-Id: <20211014224435.2539547-1-richard.henderson(a)linaro.org>
[PATCH] gdbstub: Switch to the thread receiving a signal
Message-Id: <20210930095111.23205-1-pavel(a)labath.sk>
[PATCH] replay: improve determinism of virtio-net
Message-Id: <162125666020.1252655.9997723318921206001.stgit@pasha-ThinkPad-X280>
[PATCH RESEND v3 0/2] add APIs to handle alternative sNaN propagation for fmax/fmin
Message-Id: <20211015065500.3850513-1-frank.chang(a)sifive.com>
[PATCH v3 0/5] plugins/cache: multicore cache modelling and minor tweaks
Message-Id: <20210722065428.134608-1-ma.mandourr(a)gmail.com>
[PATCH v2 0/2] plugins: add a drcov plugin
Message-Id: <163429165642.439576.16356288759891202632.stgit@pc-System-Product-Name>
[PATCH v2 0/2] plugins: add a drcov plugin
Message-Id: <163429165642.439576.16356288759891202632.stgit@pc-System-Product-Name>
[PATCH 0/3] KVM: qemu patches for few KVM features I developed
Message-Id: <20210914155214.105415-1-mlevitsk(a)redhat.com>
Absences
========
- Off Friday next week
Current Review Queue
====================
TODO [PATCH v2 00/48] tcg: optimize redundant sign extensions
Message-Id: <20211007195456.1168070-1-richard.henderson(a)linaro.org>
================================================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV
Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
==================================================================================================================================
--
Alex Bennée
OK I've fixed up my JIRA and email tooling so this is a bit of a flush
of stale data from my org-mode.
VirtIO Initiative ([STR-9])
===========================
- posted Enabling hypervisor agnosticism for VirtIO backends
Message-Id: <87v94ldrqq.fsf(a)linaro.org>
- posted [a PR to do some cleanups to vm-virtio]
[STR-9] <https://projects.linaro.org/browse/STR-9>
[a PR to do some cleanups to vm-virtio]
<https://github.com/rust-vmm/vm-virtio/pull/103>
VirtIO RPMB ([STR-5])
=====================
- made more progress and now have PROGRAM_KEY/WRITE_COUNTER done -
feels like it's getting faster
[STR-5] <https://projects.linaro.org/browse/STR-5>
[Rust version of virtio-rpmb] <https://github.com/stsquad/virtio-rpmb>
[fixes for the C daemon]
<https://github.com/ruchi393/qemu/tree/vhost-user-rpmb-fixes>
[hacking branch] <https://github.com/stsquad/virtio-rpmb/tree/hacking>
Fix VirtIO spec as per Rucha's email
QEMU Upstream Work ([UM-2])
===========================
- posted [PATCH for 6.1-rc3 v1 0/4] gitlab and plugins pre-PR
Message-Id: <20210806141015.2487502-1-alex.bennee(a)linaro.org>
- prepared a potential [pull request for testing issues] but looks
like it will wait for 6.2
[UM-2] <https://projects.linaro.org/browse/UM-2>
[this is the last iteration before Monday]
<https://patchew.org/QEMU/20210709143005.1554-1-alex.bennee@linaro.org/>
[pull request for testing issues]
<https://github.com/stsquad/qemu/tree/pr/120821-for-6.1-rc4-1>
Completed Reviews [4/4]
=======================
[RFC PATCH 0/1] QEMU TCG plugin interface extensions
Message-Id: <20210821094527.491232-1-florian.hauschild(a)fs.ei.tum.de>
[PATCH 0/8] tcg: support 32-bit guest addresses as signed
Message-Id: <20211010174401.141339-1-richard.henderson(a)linaro.org>
[PATCH 0/3] KVM: qemu patches for few KVM features I developed
Message-Id: <20210914155214.105415-1-mlevitsk(a)redhat.com>
[PATCH 0/6] More record/replay acceptance tests
Message-Id: <162332427732.194926.7555369160312506539.stgit@pasha-ThinkPad-X280>
[PATCH 0/3] Gitlab-CI improvements
Message-Id: <20210730143809.717079-1-thuth(a)redhat.com>
========================================================================================
[PATCH v3 00/13] new plugin argument passing scheme
Message-Id: <20210722071236.139520-1-ma.mandourr(a)gmail.com>
==============================================================================================================
[PATCH] contrib/plugins: add a drcov plugin
Message-Id: <20211011111130.170178-1-arkaisp2021(a)gmail.com>
======================================================================================================
[RFC PATCH v2] Add a post for the new TCG cache modelling plugin
Message-Id: <20210617121707.764126-1-ma.mandourr(a)gmail.com>
===========================================================================================================================
Current Review Queue
====================
TODO [PATCH v2 00/48] tcg: optimize redundant sign extensions
Message-Id: <20211007195456.1168070-1-richard.henderson(a)linaro.org>
================================================================================================================================
TODO [PATCH] cpu-models-x86.rst: Tidy up a couple of things
Message-Id: <20211015100718.17828-1-pbonzini(a)redhat.com>
===================================================================================================================
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
===========================================================================================================
TODO [PATCH v4 00/48]
Message-Id: <20211013024607.731881-1-richard.henderson(a)linaro.org>
=======================================================================================
--
Alex Bennée
After llvm commit 75127bce6de78b83b70b898a04473f213451f13e
Author: Qiongsi Wu <qwu(a)ibm.com>
[AIX][ZOS] Excluding merge-objc-interface.m from Tests
the following hot functions slowed down by more than 10% (but their benchmarks slowed down by less than 2%):
- 433.milc:[.] mult_su3_mat_vec slowed down by 16% from 1615 to 1871 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-75127bce6de78b83b70b898a04473f213451f13e
cd investigate-llvm-75127bce6de78b83b70b898a04473f213451f13e
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 75127bce6de78b83b70b898a04473f213451f13e
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d01ae990e1fd6561ed86dc8004a7147dd09fb13c
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 75127bce6de78b83b70b898a04473f213451f13e
Author: Qiongsi Wu <qwu(a)ibm.com>
Date: Fri Oct 8 13:58:32 2021 +0000
[AIX][ZOS] Excluding merge-objc-interface.m from Tests
Objective C is not supported on AIX or ZOS. This patch excludes the newly added `clang/test/Modules/merge-objc-interface.m` (added by https://reviews.llvm.org/D110280) from AIX and ZOS testing.
Many existing tests are already disabled by https://reviews.llvm.org/D109060.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D111406
---
clang/test/Modules/merge-objc-interface.m | 1 +
1 file changed, 1 insertion(+)
diff --git a/clang/test/Modules/merge-objc-interface.m b/clang/test/Modules/merge-objc-interface.m
index fba06294a26a..f62f541c1a29 100644
--- a/clang/test/Modules/merge-objc-interface.m
+++ b/clang/test/Modules/merge-objc-interface.m
@@ -1,3 +1,4 @@
+// UNSUPPORTED: -zos, -aix
// RUN: rm -rf %t
// RUN: split-file %s %t
// RUN: %clang_cc1 -emit-llvm -o %t/test.bc -F%t/Frameworks %t/test.m \
</cut>
After llvm commit 483db1c706864d0940206228dfe64bdcd17faa4e
Author: Muhammad Omair Javaid <omair.javaid(a)linaro.org>
[LLDB] Remove xfail decorator TestInferiorAssert.py AArch64/Linux
the following benchmarks slowed down by more than 2%:
- 433.milc slowed down by 4% from 13309 to 13838 perf samples
- 433.milc:[.] mult_su3_mat_vec slowed down by 17% from 2058 to 2409 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-483db1c706864d0940206228dfe64bdcd17faa4e
cd investigate-llvm-483db1c706864d0940206228dfe64bdcd17faa4e
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 483db1c706864d0940206228dfe64bdcd17faa4e
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d11ec6f67e45c630ab87bfb6010dcc93e89542fc
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 483db1c706864d0940206228dfe64bdcd17faa4e
Author: Muhammad Omair Javaid <omair.javaid(a)linaro.org>
Date: Mon Oct 11 14:34:41 2021 +0500
[LLDB] Remove xfail decorator TestInferiorAssert.py AArch64/Linux
TestInferiorAssert.py test_inferior_asserting_disassemble passes after
upgrading LLDB AArch64/Linux buildbot to Ubuntu Focal.
---
lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py b/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py
index c533a1e29a12..5ac4eeb0514a 100644
--- a/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py
+++ b/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py
@@ -45,9 +45,7 @@ class AssertingInferiorTestCase(TestBase):
bugnumber="llvm.org/pr21793: need to implement support for detecting assertion / abort on Windows")
@expectedFailureAll(
oslist=["linux"],
- archs=[
- "aarch64",
- "arm"],
+ archs=["arm"],
triple=no_match(".*-android"),
bugnumber="llvm.org/pr25338")
@expectedFailureAll(bugnumber="llvm.org/pr26592", triple='^mips')
</cut>
After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Avoid invalid loop transformations in jump threading registry.
the following benchmarks slowed down by more than 2%:
- 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O3 -marm
- Hardware: NVidia TK1 4x Cortex-A15
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Reproduce builds:
<cut>
mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 23 10:59:24 2021 +0200
Avoid invalid loop transformations in jump threading registry.
My upcoming improvements to the forward jump threader make it thread
more aggressively. In investigating some "regressions", I noticed
that it has always allowed threading through empty latches and across
loop boundaries. As we have discussed recently, this should be avoided
until after loop optimizations have run their course.
Note that this wasn't much of a problem before because DOM/VRP
couldn't find these opportunities, but with a smarter solver, we trip
over them more easily.
Because the forward threader doesn't have an independent localized cost
model like the new threader (profitable_path_p), it is difficult to
catch these things at discovery. However, we can catch them at
registration time, with the added benefit that all the threaders
(forward and backward) can share the handcuffs.
This patch is an adaptation of what we do in the backward threader, but
it is not meant to catch everything we do there, as some of the
restrictions there are due to limitations of the different block
copiers (for example, the generic copier does not re-use existing
threading paths).
We could ideally remove the now redundant bits in profitable_path_p, but
I would prefer not to for two reasons. First, the backward threader uses
profitable_path_p as it discovers paths to avoid discovering paths in
unprofitable directions. Second, I would like to merge all the forward
cost restrictions into the profitability class in the backward threader,
not the other way around. Alas, that reshuffling will have to wait for
the next release.
As usual, there are quite a few tests that needed adjustments. It seems
we were quite happily threading improper scenarios. With most of them,
as can be seen in pr77445-2.c, we're merely shifting the threading to
after loop optimizations.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
New.
(jt_path_registry::register_jump_thread): Call
cancel_invalid_paths.
* tree-ssa-threadupdate.h (class jt_path_registry): Add
cancel_invalid_paths.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/20030714-2.c: Adjust.
* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
* gcc.dg/vect/bb-slp-16.c: Adjust.
---
gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c | 7 ++-
gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 19 ++++---
gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 4 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 4 +-
gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 7 ---
gcc/tree-ssa-threadupdate.c | 67 ++++++++++++++++++-----
gcc/tree-ssa-threadupdate.h | 1 +
8 files changed, 78 insertions(+), 35 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
index eb663f2ff5b..9585ff11307 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
@@ -32,7 +32,8 @@ get_alias_set (t)
}
}
-/* There should be exactly three IF conditionals if we thread jumps
- properly. */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+ properly. There used to be 3, but one thread was crossing
+ loops. */
+/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index e1464e21170..922a331b217 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */
extern int status, pt;
extern int count;
@@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a)
pt--;
}
-/* There are 4 jump threading opportunities, all of which will be
- realized, which will eliminate testing of FLAG, completely. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
+/* There are 2 jump threading opportunities (which don't cross loops),
+ all of which will be realized, which will eliminate testing of
+ FLAG, completely. */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */
-/* There should be no assignments or references to FLAG, verify they're
- eliminated as early as possible. */
-/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
+/* We used to remove references to FLAG by DCE2, but this was
+ depending on early threaders threading through loop boundaries
+ (which we shouldn't do). However, the late threading passes, which
+ run after loop optimizations , can successfully eliminate the
+ references to FLAG. Verify that ther are no references by the late
+ threading passes. */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index f9fc212f49e..01a0f1f197d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
to change decisions in switch expansion which in turn can expose new
jump threading opportunities. Skip the later tests on aarch64. */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 60d4f76f076..2d78d045516 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -21,5 +21,7 @@
condition.
All the cases are picked up by VRP1 as jump threads. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
+
+/* There used to be 6 jump threads found by thread1, but they all
+ depended on threading through distinct loops in ethread. */
/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index e3d4b311c03..16abcde5053 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,8 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 664e93e9b60..e68a9b62535 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,8 +1,5 @@
/* { dg-require-effective-target vect_int } */
-/* See note below as to why we disable threading. */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
-
#include <stdarg.h>
#include "tree-vect.h"
@@ -30,10 +27,6 @@ main1 (int dummy)
*pout++ = *pin++ + a;
*pout++ = *pin++ + a;
*pout++ = *pin++ + a;
- /* In some architectures like ppc64, jump threading may thread
- the iteration where i==0 such that we no longer optimize the
- BB. Another alternative to disable jump threading would be
- to wrap the read from `i' into a function returning i. */
if (arr[i] = i)
a = i;
else
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index baac11280fa..2b9b8f81274 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
return retval;
}
+bool
+jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
+{
+ gcc_checking_assert (!path.is_empty ());
+ edge taken_edge = path[path.length () - 1]->e;
+ loop_p loop = taken_edge->src->loop_father;
+ bool seen_latch = false;
+ bool path_crosses_loops = false;
+
+ for (unsigned int i = 0; i < path.length (); i++)
+ {
+ edge e = path[i]->e;
+
+ if (e == NULL)
+ {
+ // NULL outgoing edges on a path can happen for jumping to a
+ // constant address.
+ cancel_thread (&path, "Found NULL edge in jump threading path");
+ return true;
+ }
+
+ if (loop->latch == e->src || loop->latch == e->dest)
+ seen_latch = true;
+
+ // The first entry represents the block with an outgoing edge
+ // that we will redirect to the jump threading path. Thus we
+ // don't care about that block's loop father.
+ if ((i > 0 && e->src->loop_father != loop)
+ || e->dest->loop_father != loop)
+ path_crosses_loops = true;
+
+ if (flag_checking && !m_backedge_threads)
+ gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
+ }
+
+ if (cfun->curr_properties & PROP_loop_opts_done)
+ return false;
+
+ if (seen_latch && empty_block_p (loop->latch))
+ {
+ cancel_thread (&path, "Threading through latch before loop opts "
+ "would create non-empty latch");
+ return true;
+ }
+ if (path_crosses_loops)
+ {
+ cancel_thread (&path, "Path crosses loops");
+ return true;
+ }
+ return false;
+}
+
/* Register a jump threading opportunity. We queue up all the jump
threading opportunities discovered by a pass and update the CFG
and SSA form all at once.
@@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
return false;
}
- /* First make sure there are no NULL outgoing edges on the jump threading
- path. That can happen for jumping to a constant address. */
- for (unsigned int i = 0; i < path->length (); i++)
- {
- if ((*path)[i]->e == NULL)
- {
- cancel_thread (path, "Found NULL edge in jump threading path");
- return false;
- }
-
- if (flag_checking && !m_backedge_threads)
- gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
- }
+ if (cancel_invalid_paths (*path))
+ return false;
if (dump_file && (dump_flags & TDF_DETAILS))
dump_jump_thread_path (dump_file, *path, true);
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 8b48a671212..d68795c9f27 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -75,6 +75,7 @@ protected:
unsigned long m_num_threaded_edges;
private:
virtual bool update_cfg (bool peel_loop_headers) = 0;
+ bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
jump_thread_path_allocator m_allocator;
// True if threading through back edges is allowed. This is only
// allowed in the generic copier in the backward threader.
</cut>
[TCWG CI] Regression caused by gcc: tree-optimization/102570 - teach VN about internal functions:
commit 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
Author: Richard Biener <rguenther(a)suse.de>
tree-optimization/102570 - teach VN about internal functions
Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
18360
# First few build errors in logs:
# 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized]
# 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized]
# 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized]
# 00:01:21 make[2]: *** [scripts/Makefile.build:288: arch/arm64/hyperv/hv_core.o] Error 1
# 00:01:22 crypto/asymmetric_keys/asymmetric_type.c:481:15: error: ‘restrict_method’ is used uninitialized [-Werror=uninitialized]
# 00:01:22 make[2]: *** [scripts/Makefile.build:288: crypto/asymmetric_keys/asymmetric_type.o] Error 1
# 00:01:22 ./include/trace/perf.h:38:25: error: ‘entry’ is used uninitialized [-Werror=uninitialized]
# 00:01:22 ./include/trace/perf.h:44:13: error: ‘__entry_size’ is used uninitialized [-Werror=uninitialized]
# 00:01:23 security/keys/encrypted-keys/encrypted.c:660:19: error: ‘mkey’ is used uninitialized [-Werror=uninitialized]
# 00:01:23 security/keys/encrypted-keys/encrypted.c:905:19: error: ‘epayload’ is used uninitialized [-Werror=uninitialized]
from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21404
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-aarch64-next-allmodconfig
First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al…
Reproduce builds:
<cut>
mkdir investigate-gcc-55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
cd investigate-gcc-55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 22d34a2a50651d01669b6fbcdb9677c18d2197c5
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8
Author: Richard Biener <rguenther(a)suse.de>
Date: Mon Oct 4 10:57:45 2021 +0200
tree-optimization/102570 - teach VN about internal functions
We're now using internal functions for a lot of stuff but there's
still missing VN support out of laziness. The following instantiates
support and adds testcases for FRE and PRE (hoisting).
2021-10-04 Richard Biener <rguenther(a)suse.de>
PR tree-optimization/102570
* tree-ssa-sccvn.h (vn_reference_op_struct): Document
we are using clique for the internal function code.
* tree-ssa-sccvn.c (vn_reference_op_eq): Compare the
internal function code.
(print_vn_reference_ops): Print the internal function code.
(vn_reference_op_compute_hash): Hash it.
(copy_reference_ops_from_call): Record it.
(visit_stmt): Remove the restriction around internal function
calls.
(fully_constant_vn_reference_p): Use fold_const_call and handle
internal functions.
(vn_reference_eq): Compare call return types.
* tree-ssa-pre.c (create_expression_by_pieces): Handle
generating calls to internal functions.
(compute_avail): Remove the restriction around internal function
calls.
* gcc.dg/tree-ssa/ssa-fre-96.c: New testcase.
* gcc.dg/tree-ssa/ssa-pre-33.c: Likewise.
---
gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c | 14 +++++
gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c | 15 +++++
gcc/tree-ssa-pre.c | 27 +++++----
gcc/tree-ssa-sccvn.c | 91 ++++++++++++++++++------------
gcc/tree-ssa-sccvn.h | 3 +-
5 files changed, 103 insertions(+), 47 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c
new file mode 100644
index 00000000000..fd1d5713b5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+_Bool f1(unsigned x, unsigned y, unsigned *res)
+{
+ _Bool t = __builtin_add_overflow(x, y, res);
+ unsigned res1;
+ _Bool t1 = __builtin_add_overflow(x, y, &res1);
+ *res -= res1;
+ return t==t1;
+}
+
+/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "fre1" } } */
+/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c
new file mode 100644
index 00000000000..3b3bd629bc2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+_Bool f1(unsigned x, unsigned y, unsigned *res, int flag, _Bool *t)
+{
+ if (flag)
+ *t = __builtin_add_overflow(x, y, res);
+ unsigned res1;
+ _Bool t1 = __builtin_add_overflow(x, y, &res1);
+ *res -= res1;
+ return *t==t1;
+}
+
+/* We should hoist the .ADD_OVERFLOW to before the check. */
+/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "pre" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 08755847f66..1cc1aae694f 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -2855,9 +2855,13 @@ create_expression_by_pieces (basic_block block, pre_expr expr,
unsigned int operand = 1;
vn_reference_op_t currop = &ref->operands[0];
tree sc = NULL_TREE;
- tree fn = find_or_generate_expression (block, currop->op0, stmts);
- if (!fn)
- return NULL_TREE;
+ tree fn = NULL_TREE;
+ if (currop->op0)
+ {
+ fn = find_or_generate_expression (block, currop->op0, stmts);
+ if (!fn)
+ return NULL_TREE;
+ }
if (currop->op1)
{
sc = find_or_generate_expression (block, currop->op1, stmts);
@@ -2873,12 +2877,19 @@ create_expression_by_pieces (basic_block block, pre_expr expr,
return NULL_TREE;
args.quick_push (arg);
}
- gcall *call = gimple_build_call_vec (fn, args);
+ gcall *call;
+ if (currop->op0)
+ {
+ call = gimple_build_call_vec (fn, args);
+ gimple_call_set_fntype (call, currop->type);
+ }
+ else
+ call = gimple_build_call_internal_vec ((internal_fn)currop->clique,
+ args);
gimple_set_location (call, expr->loc);
- gimple_call_set_fntype (call, currop->type);
if (sc)
gimple_call_set_chain (call, sc);
- tree forcedname = make_ssa_name (TREE_TYPE (currop->type));
+ tree forcedname = make_ssa_name (ref->type);
gimple_call_set_lhs (call, forcedname);
/* There's no CCP pass after PRE which would re-compute alignment
information so make sure we re-materialize this here. */
@@ -4004,10 +4015,6 @@ compute_avail (function *fun)
vn_reference_s ref1;
pre_expr result = NULL;
- /* We can value number only calls to real functions. */
- if (gimple_call_internal_p (stmt))
- continue;
-
vn_reference_lookup_call (as_a <gcall *> (stmt), &ref, &ref1);
/* There is no point to PRE a call without a value. */
if (!ref || !ref->result)
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 416a5252144..0d942218279 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -70,6 +70,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-scalar-evolution.h"
#include "tree-ssa-loop-niter.h"
#include "builtins.h"
+#include "fold-const-call.h"
#include "tree-ssa-sccvn.h"
/* This algorithm is based on the SCC algorithm presented by Keith
@@ -212,7 +213,8 @@ vn_reference_op_eq (const void *p1, const void *p2)
TYPE_MAIN_VARIANT (vro2->type))))
&& expressions_equal_p (vro1->op0, vro2->op0)
&& expressions_equal_p (vro1->op1, vro2->op1)
- && expressions_equal_p (vro1->op2, vro2->op2));
+ && expressions_equal_p (vro1->op2, vro2->op2)
+ && (vro1->opcode != CALL_EXPR || vro1->clique == vro2->clique));
}
/* Free a reference operation structure VP. */
@@ -264,15 +266,18 @@ print_vn_reference_ops (FILE *outfile, const vec<vn_reference_op_s> ops)
&& TREE_CODE_CLASS (vro->opcode) != tcc_declaration)
{
fprintf (outfile, "%s", get_tree_code_name (vro->opcode));
- if (vro->op0)
+ if (vro->op0 || vro->opcode == CALL_EXPR)
{
fprintf (outfile, "<");
closebrace = true;
}
}
- if (vro->op0)
+ if (vro->op0 || vro->opcode == CALL_EXPR)
{
- print_generic_expr (outfile, vro->op0);
+ if (!vro->op0)
+ fprintf (outfile, internal_fn_name ((internal_fn)vro->clique));
+ else
+ print_generic_expr (outfile, vro->op0);
if (vro->op1)
{
fprintf (outfile, ",");
@@ -684,6 +689,8 @@ static void
vn_reference_op_compute_hash (const vn_reference_op_t vro1, inchash::hash &hstate)
{
hstate.add_int (vro1->opcode);
+ if (vro1->opcode == CALL_EXPR && !vro1->op0)
+ hstate.add_int (vro1->clique);
if (vro1->op0)
inchash::add_expr (vro1->op0, hstate);
if (vro1->op1)
@@ -769,11 +776,16 @@ vn_reference_eq (const_vn_reference_t const vr1, const_vn_reference_t const vr2)
if (vr1->type != vr2->type)
return false;
}
+ else if (vr1->type == vr2->type)
+ ;
else if (COMPLETE_TYPE_P (vr1->type) != COMPLETE_TYPE_P (vr2->type)
|| (COMPLETE_TYPE_P (vr1->type)
&& !expressions_equal_p (TYPE_SIZE (vr1->type),
TYPE_SIZE (vr2->type))))
return false;
+ else if (vr1->operands[0].opcode == CALL_EXPR
+ && !types_compatible_p (vr1->type, vr2->type))
+ return false;
else if (INTEGRAL_TYPE_P (vr1->type)
&& INTEGRAL_TYPE_P (vr2->type))
{
@@ -1270,6 +1282,8 @@ copy_reference_ops_from_call (gcall *call,
temp.type = gimple_call_fntype (call);
temp.opcode = CALL_EXPR;
temp.op0 = gimple_call_fn (call);
+ if (gimple_call_internal_p (call))
+ temp.clique = gimple_call_internal_fn (call);
temp.op1 = gimple_call_chain (call);
if (stmt_could_throw_p (cfun, call) && (lr = lookup_stmt_eh_lp (call)) > 0)
temp.op2 = size_int (lr);
@@ -1459,9 +1473,11 @@ fully_constant_vn_reference_p (vn_reference_t ref)
a call to a builtin function with at most two arguments. */
op = &operands[0];
if (op->opcode == CALL_EXPR
- && TREE_CODE (op->op0) == ADDR_EXPR
- && TREE_CODE (TREE_OPERAND (op->op0, 0)) == FUNCTION_DECL
- && fndecl_built_in_p (TREE_OPERAND (op->op0, 0))
+ && (!op->op0
+ || (TREE_CODE (op->op0) == ADDR_EXPR
+ && TREE_CODE (TREE_OPERAND (op->op0, 0)) == FUNCTION_DECL
+ && fndecl_built_in_p (TREE_OPERAND (op->op0, 0),
+ BUILT_IN_NORMAL)))
&& operands.length () >= 2
&& operands.length () <= 3)
{
@@ -1481,13 +1497,17 @@ fully_constant_vn_reference_p (vn_reference_t ref)
anyconst = true;
if (anyconst)
{
- tree folded = build_call_expr (TREE_OPERAND (op->op0, 0),
- arg1 ? 2 : 1,
- arg0->op0,
- arg1 ? arg1->op0 : NULL);
- if (folded
- && TREE_CODE (folded) == NOP_EXPR)
- folded = TREE_OPERAND (folded, 0);
+ combined_fn fn;
+ if (op->op0)
+ fn = as_combined_fn (DECL_FUNCTION_CODE
+ (TREE_OPERAND (op->op0, 0)));
+ else
+ fn = as_combined_fn ((internal_fn) op->clique);
+ tree folded;
+ if (arg1)
+ folded = fold_const_call (fn, ref->type, arg0->op0, arg1->op0);
+ else
+ folded = fold_const_call (fn, ref->type, arg0->op0);
if (folded
&& is_gimple_min_invariant (folded))
return folded;
@@ -5648,28 +5668,27 @@ visit_stmt (gimple *stmt, bool backedges_varying_p = false)
&& TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL)
extra_fnflags = flags_from_decl_or_type (TREE_OPERAND (fn, 0));
}
- if (!gimple_call_internal_p (call_stmt)
- && (/* Calls to the same function with the same vuse
- and the same operands do not necessarily return the same
- value, unless they're pure or const. */
- ((gimple_call_flags (call_stmt) | extra_fnflags)
- & (ECF_PURE | ECF_CONST))
- /* If calls have a vdef, subsequent calls won't have
- the same incoming vuse. So, if 2 calls with vdef have the
- same vuse, we know they're not subsequent.
- We can value number 2 calls to the same function with the
- same vuse and the same operands which are not subsequent
- the same, because there is no code in the program that can
- compare the 2 values... */
- || (gimple_vdef (call_stmt)
- /* ... unless the call returns a pointer which does
- not alias with anything else. In which case the
- information that the values are distinct are encoded
- in the IL. */
- && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS)
- /* Only perform the following when being called from PRE
- which embeds tail merging. */
- && default_vn_walk_kind == VN_WALK)))
+ if (/* Calls to the same function with the same vuse
+ and the same operands do not necessarily return the same
+ value, unless they're pure or const. */
+ ((gimple_call_flags (call_stmt) | extra_fnflags)
+ & (ECF_PURE | ECF_CONST))
+ /* If calls have a vdef, subsequent calls won't have
+ the same incoming vuse. So, if 2 calls with vdef have the
+ same vuse, we know they're not subsequent.
+ We can value number 2 calls to the same function with the
+ same vuse and the same operands which are not subsequent
+ the same, because there is no code in the program that can
+ compare the 2 values... */
+ || (gimple_vdef (call_stmt)
+ /* ... unless the call returns a pointer which does
+ not alias with anything else. In which case the
+ information that the values are distinct are encoded
+ in the IL. */
+ && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS)
+ /* Only perform the following when being called from PRE
+ which embeds tail merging. */
+ && default_vn_walk_kind == VN_WALK))
changed = visit_reference_op_call (lhs, call_stmt);
else
changed = defs_to_varying (call_stmt);
diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index 96100596d2e..8a1b649c726 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -106,7 +106,8 @@ typedef const struct vn_phi_s *const_vn_phi_t;
typedef struct vn_reference_op_struct
{
ENUM_BITFIELD(tree_code) opcode : 16;
- /* Dependence info, used for [TARGET_]MEM_REF only. */
+ /* Dependence info, used for [TARGET_]MEM_REF only. For internal
+ function calls clique is also used for the internal function code. */
unsigned short clique;
unsigned short base;
unsigned reverse : 1;
</cut>
After gcc commit a459ee44c0a74b0df0485ed7a56683816c02aae9
Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com>
aarch64: Improve size heuristic for cpymem expansion
the following benchmarks grew in size by more than 1%:
- 458.sjeng grew in size by 4% from 105780 to 109944 bytes
- 459.GemsFDTD grew in size by 2% from 247504 to 251468 bytes
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9
cd investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach a459ee44c0a74b0df0485ed7a56683816c02aae9
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8f95e3c04d659d541ca4937b3df2f1175a1c5f05
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit a459ee44c0a74b0df0485ed7a56683816c02aae9
Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com>
Date: Wed Sep 29 11:21:45 2021 +0100
aarch64: Improve size heuristic for cpymem expansion
Similar to my previous patch for setmem this one does the same for the cpymem expansion.
We count the number of ops emitted and compare it against the alternative of just calling
the library function when optimising for size.
For the code:
void
cpy_127 (char *out, char *in)
{
__builtin_memcpy (out, in, 127);
}
void
cpy_128 (char *out, char *in)
{
__builtin_memcpy (out, in, 128);
}
we now emit a call to memcpy (with an extra MOV-immediate instruction for the size) instead of:
cpy_127(char*, char*):
ldp q0, q1, [x1]
stp q0, q1, [x0]
ldp q0, q1, [x1, 32]
stp q0, q1, [x0, 32]
ldp q0, q1, [x1, 64]
stp q0, q1, [x0, 64]
ldr q0, [x1, 96]
str q0, [x0, 96]
ldr q0, [x1, 111]
str q0, [x0, 111]
ret
cpy_128(char*, char*):
ldp q0, q1, [x1]
stp q0, q1, [x0]
ldp q0, q1, [x1, 32]
stp q0, q1, [x0, 32]
ldp q0, q1, [x1, 64]
stp q0, q1, [x0, 64]
ldp q0, q1, [x1, 96]
stp q0, q1, [x0, 96]
ret
which is a clear code size win. Speed optimisation heuristics remain unchanged.
2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com>
* config/aarch64/aarch64.c (aarch64_expand_cpymem): Count number of
emitted operations and adjust heuristic for code size.
2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com>
* gcc.target/aarch64/cpymem-size.c: New test.
---
gcc/config/aarch64/aarch64.c | 36 ++++++++++++++++++--------
gcc/testsuite/gcc.target/aarch64/cpymem-size.c | 29 +++++++++++++++++++++
2 files changed, 54 insertions(+), 11 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ac17c1c88fb..a9a1800af53 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -23390,7 +23390,8 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
}
/* Expand cpymem, as if from a __builtin_memcpy. Return true if
- we succeed, otherwise return false. */
+ we succeed, otherwise return false, indicating that a libcall to
+ memcpy should be emitted. */
bool
aarch64_expand_cpymem (rtx *operands)
@@ -23407,11 +23408,13 @@ aarch64_expand_cpymem (rtx *operands)
unsigned HOST_WIDE_INT size = INTVAL (operands[2]);
- /* Inline up to 256 bytes when optimizing for speed. */
+ /* Try to inline up to 256 bytes. */
unsigned HOST_WIDE_INT max_copy_size = 256;
- if (optimize_function_for_size_p (cfun))
- max_copy_size = 128;
+ bool size_p = optimize_function_for_size_p (cfun);
+
+ if (size > max_copy_size)
+ return false;
int copy_bits = 256;
@@ -23421,13 +23424,14 @@ aarch64_expand_cpymem (rtx *operands)
|| !TARGET_SIMD
|| (aarch64_tune_params.extra_tuning_flags
& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
- {
- copy_bits = 128;
- max_copy_size = max_copy_size / 2;
- }
+ copy_bits = 128;
- if (size > max_copy_size)
- return false;
+ /* Emit an inline load+store sequence and count the number of operations
+ involved. We use a simple count of just the loads and stores emitted
+ rather than rtx_insn count as all the pointer adjustments and reg copying
+ in this function will get optimized away later in the pipeline. */
+ start_sequence ();
+ unsigned nops = 0;
base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
dst = adjust_automodify_address (dst, VOIDmode, base, 0);
@@ -23456,7 +23460,8 @@ aarch64_expand_cpymem (rtx *operands)
cur_mode = V4SImode;
aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode);
-
+ /* A single block copy is 1 load + 1 store. */
+ nops += 2;
n -= mode_bits;
/* Emit trailing copies using overlapping unaligned accesses - this is
@@ -23471,7 +23476,16 @@ aarch64_expand_cpymem (rtx *operands)
n = n_bits;
}
}
+ rtx_insn *seq = get_insns ();
+ end_sequence ();
+
+ /* A memcpy libcall in the worst case takes 3 instructions to prepare the
+ arguments + 1 for the call. */
+ unsigned libcall_cost = 4;
+ if (size_p && libcall_cost < nops)
+ return false;
+ emit_insn (seq);
return true;
}
diff --git a/gcc/testsuite/gcc.target/aarch64/cpymem-size.c b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c
new file mode 100644
index 00000000000..4d488b74301
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+#include <stdlib.h>
+
+/*
+** cpy_127:
+** mov x2, 127
+** b memcpy
+*/
+void
+cpy_127 (char *out, char *in)
+{
+ __builtin_memcpy (out, in, 127);
+}
+
+/*
+** cpy_128:
+** mov x2, 128
+** b memcpy
+*/
+void
+cpy_128 (char *out, char *in)
+{
+ __builtin_memcpy (out, in, 128);
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
</cut>
Hi Arthur,
Thanks for looking into this!
The flags to compile regexec.c were:
-O3 --target=aarch64-linux-gnu -fgnu89-inline
Clang was configured with (on x86_64-linux-gnu host):
cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=AArch64
Please let me know if the above doesn’t work for you.
Regards,
--
Maxim Kuvyrkov
https://www.linaro.org
> On 29 Sep 2021, at 20:47, Arthur Eubanks <aeubanks(a)google.com> wrote:
>
> Do you know the flags passed to Clang to compile the sources? I tried compiling the preprocessed sources but ran into the below, and couldn't find the flags in any of the logs.
>
> In file included from regexec.c:93:
> In file included from ./perl.h:384:
> In file included from /home/tcwg-buildslave/workspace/tcwg_bmk_0/abe/builds/destdir/x86_64-pc-linux-gnu/aarch64-linux-gnu/libc/usr/include/sys/types.h:144:
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/llvm-install/lib/clang/14.0.0/include/stddef.h:46:27: error: typedef redefinition with different types ('unsigned long' vs 'unsigned long long')
> typedef long unsigned int size_t;
> ^
> 1 error generated.
>
>
>
> And yeah just moving the code around could cause major performance regressions, I've had other patches do the same for various benchmarks, there's not much we can do about that if that's actually the root cause. If I can compile the file I can check if the optimization actually created worse IR or not.
>
>
> On Wed, Sep 29, 2021 at 5:59 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote:
> Hi Arthur,
>
> Pre-processed source is in the save-temps tarballs linked below; S_regmatch() is in regexec.i .
>
> The save-temps also have .s assembly file for before and after your patch, and the only code-gen difference is in S_reginclass() function — see the attached screenshot #1.
>
> Looking into profile of S_regmatch(), some of the extra cycles come from hot loop starting with “cbz w19,...” getting misaligned — before your patch it was starting at "2bce10", and after it starts at "2bce6c”.
>
> Maybe the added instructions in S_reginclass() pushed the loop in S_regmatch() in an unfortunate way?
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeubanks(a)google.com> wrote:
>>
>> Could I get the source file with S_regmatch()?
>>
>> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote:
>> Hi Arthur,
>>
>> Your patch seems to be slowing down 400.perlbench by 6% — due to slow down of its hot function S_regmatch() by 14%.
>>
>> Could you take a look if this is easily fixable, please?
>>
>> Regards,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>> > On 24 Sep 2021, at 15:07, ci_notify(a)linaro.org wrote:
>> >
>> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
>> > Author: Arthur Eubanks <aeubanks(a)google.com>
>> >
>> > [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
>> >
>> > the following benchmarks slowed down by more than 2%:
>> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
>> > - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf samples
>> >
>> > Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
>> >
>> > For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
>> > - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> > - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> > - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> >
>> > Configuration:
>> > - Benchmark: SPEC CPU2006
>> > - Toolchain: Clang + Glibc + LLVM Linker
>> > - Version: all components were built from their tip of trunk
>> > - Target: aarch64-linux-gnu
>> > - Compiler flags: -O3
>> > - Hardware: NVidia TX1 4x Cortex-A57
>> >
>> > This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
>
> <2021-09-29_15-44-27.png><2021-09-29_15-53-20.png>
Progress
* UM-2 [QEMU upstream maintainership]
+ Worked through my code-review backlog
+ Noticed that we never got round to making our emulated GICv3
support having redistributors in more than one contiguous region;
this prevents using more than 123 CPUs with the virt board. Sent
out a patchset which adds the necessary handling.
+ Generally trying to tie off loose ends pre-holiday :-)
-- PMM
Identified regression caused by *linux:30f349097897c115345beabeecc5e710b479ff1e*:
commit 30f349097897c115345beabeecc5e710b479ff1e
Merge: 9c566611ac5c f76c87e8c337
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Results regressed to (for first_bad == 30f349097897c115345beabeecc5e710b479ff1e)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21782
# First few build errors in logs:
from (for last_good == 9c566611ac5cc7b45af943632f7a9b1b6a642991)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
29893
# linux build successful:
all
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-release-arm-mainline-allmodconfig
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a…
Reproduce builds:
<cut>
mkdir investigate-linux-30f349097897c115345beabeecc5e710b479ff1e
cd investigate-linux-30f349097897c115345beabeecc5e710b479ff1e
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach 30f349097897c115345beabeecc5e710b479ff1e
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 9c566611ac5cc7b45af943632f7a9b1b6a642991
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 30f349097897c115345beabeecc5e710b479ff1e
Merge: 9c566611ac5c f76c87e8c337
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Wed Sep 8 16:38:25 2021 -0700
Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These are mostly ARM cpufreq driver updates, including one new
MediaTek driver that has just passed all of the reviews, with the
addition of a revert of a recent intel_pstate commit, some core
cpufreq changes and a DT-related update of the operating performance
points (OPP) support code.
Specifics:
- Add new cpufreq driver for the MediaTek MT6779 platform called
mediatek-hw along with corresponding DT bindings (Hector.Yuan).
- Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
Gopinath).
- Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
policy flag (Taniya Das).
- Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
Andersson).
- Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
flag (Viresh Kumar).
- Add new cpufreq driver callback to allow drivers to register with
the Energy Model in a consistent way and make several drivers use
it (Viresh Kumar).
- Change the remaining users of the .ready() cpufreq driver callback
to move the code from it elsewhere and drop it from the cpufreq
core (Viresh Kumar).
- Revert recent intel_pstate change adding HWP guaranteed performance
change notification support to it that led to problems, because the
notification in question is triggered prematurely on some systems
(Rafael Wysocki).
- Convert the OPP DT bindings to DT schema and clean them up while at
it (Rob Herring)"
* tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
cpufreq: Remove ready() callback
cpufreq: sh: Remove sh_cpufreq_cpu_ready()
cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
cpufreq: scmi: Use .register_em() to register with energy model
cpufreq: vexpress: Use .register_em() to register with energy model
cpufreq: scpi: Use .register_em() to register with energy model
dt-bindings: opp: Convert to DT schema
dt-bindings: Clean-up OPP binding node names in examples
ARM: dts: omap: Drop references to opp.txt
cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
cpufreq: omap: Use .register_em() to register with energy model
cpufreq: mediatek: Use .register_em() to register with energy model
cpufreq: imx6q: Use .register_em() to register with energy model
...
Documentation/cpu-freq/cpu-drivers.rst | 3 -
.../devicetree/bindings/cpufreq/cpufreq-dt.txt | 2 +-
.../bindings/cpufreq/cpufreq-mediatek-hw.yaml | 70 +++
.../bindings/cpufreq/cpufreq-mediatek.txt | 2 +-
.../devicetree/bindings/cpufreq/cpufreq-st.txt | 6 +-
.../bindings/cpufreq/nvidia,tegra20-cpufreq.txt | 2 +-
.../devicetree/bindings/devfreq/rk3399_dmc.txt | 2 +-
.../devicetree/bindings/gpu/arm,mali-bifrost.yaml | 2 +-
.../devicetree/bindings/gpu/arm,mali-midgard.yaml | 2 +-
.../bindings/interconnect/fsl,imx8m-noc.yaml | 4 +-
.../opp/allwinner,sun50i-h6-operating-points.yaml | 4 +
Documentation/devicetree/bindings/opp/opp-v1.yaml | 51 ++
.../devicetree/bindings/opp/opp-v2-base.yaml | 214 +++++++
Documentation/devicetree/bindings/opp/opp-v2.yaml | 475 ++++++++++++++++
Documentation/devicetree/bindings/opp/opp.txt | 622 ---------------------
Documentation/devicetree/bindings/opp/qcom-opp.txt | 2 +-
.../bindings/opp/ti-omap5-opp-supply.txt | 2 +-
.../devicetree/bindings/power/power-domain.yaml | 2 +-
.../translations/zh_CN/cpu-freq/cpu-drivers.rst | 2 -
arch/arm/boot/dts/omap34xx.dtsi | 1 -
arch/arm/boot/dts/omap36xx.dtsi | 1 -
drivers/base/arch_topology.c | 2 +
drivers/cpufreq/Kconfig.arm | 12 +
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/acpi-cpufreq.c | 14 +-
drivers/cpufreq/cpufreq-dt-platdev.c | 4 +
drivers/cpufreq/cpufreq-dt.c | 3 +-
drivers/cpufreq/cpufreq.c | 17 +-
drivers/cpufreq/imx6q-cpufreq.c | 2 +-
drivers/cpufreq/intel_pstate.c | 39 --
drivers/cpufreq/mediatek-cpufreq-hw.c | 308 ++++++++++
drivers/cpufreq/mediatek-cpufreq.c | 3 +-
drivers/cpufreq/omap-cpufreq.c | 2 +-
drivers/cpufreq/qcom-cpufreq-hw.c | 151 ++++-
drivers/cpufreq/scmi-cpufreq.c | 65 ++-
drivers/cpufreq/scpi-cpufreq.c | 3 +-
drivers/cpufreq/sh-cpufreq.c | 11 -
drivers/cpufreq/vexpress-spc-cpufreq.c | 25 +-
include/linux/cpufreq.h | 75 ++-
39 files changed, 1441 insertions(+), 767 deletions(-)
</cut>
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-release-aarch64-next-allnoconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-release-aarch64-next-allnoconfig
Culprit:
<cut>
commit 8633ef82f101c040427b57d4df7b706261420b94
Author: Javier Martinez Canillas <javierm(a)redhat.com>
Date: Fri Jun 25 15:13:59 2021 +0200
drivers/firmware: consolidate EFI framebuffer setup for all arches
The register_gop_device() function registers an "efi-framebuffer" platform
device to match against the efifb driver, to have an early framebuffer for
EFI platforms.
But there is already support to do exactly the same by the Generic System
Framebuffers (sysfb) driver. This used to be only for X86 but it has been
moved to drivers/firmware and could be reused by other architectures.
Also, besides supporting registering an "efi-framebuffer", this driver can
register a "simple-framebuffer" allowing to use the siple{fb,drm} drivers
on non-X86 EFI platforms. For example, on aarch64 these drivers can only
be used with DT and doesn't have code to register a "simple-frambuffer"
platform device when booting with EFI.
For these reasons, let's remove the register_gop_device() duplicated code
and instead move the platform specific logic that's there to sysfb driver.
Signed-off-by: Javier Martinez Canillas <javierm(a)redhat.com>
Acked-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210625131359.1804394-1-javi…
</cut>
Results regressed to (for first_bad == 8633ef82f101c040427b57d4df7b706261420b94)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
600
# First few build errors in logs:
# 00:00:38 ld.lld: error: undefined symbol: screen_info
# 00:00:38 make: *** [vmlinux] Error 1
from (for last_good == d391c58271072d0b0fad93c82018d495b2633448)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
601
# linux build successful:
all
# linux boot successful:
boot
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#ff11764…"
Reproduce builds:
<cut>
mkdir investigate-linux-8633ef82f101c040427b57d4df7b706261420b94
cd investigate-linux-8633ef82f101c040427b57d4df7b706261420b94
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach 8633ef82f101c040427b57d4df7b706261420b94
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d391c58271072d0b0fad93c82018d495b2633448
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Full commit (up to 1000 lines):
<cut>
commit 8633ef82f101c040427b57d4df7b706261420b94
Author: Javier Martinez Canillas <javierm(a)redhat.com>
Date: Fri Jun 25 15:13:59 2021 +0200
drivers/firmware: consolidate EFI framebuffer setup for all arches
The register_gop_device() function registers an "efi-framebuffer" platform
device to match against the efifb driver, to have an early framebuffer for
EFI platforms.
But there is already support to do exactly the same by the Generic System
Framebuffers (sysfb) driver. This used to be only for X86 but it has been
moved to drivers/firmware and could be reused by other architectures.
Also, besides supporting registering an "efi-framebuffer", this driver can
register a "simple-framebuffer" allowing to use the siple{fb,drm} drivers
on non-X86 EFI platforms. For example, on aarch64 these drivers can only
be used with DT and doesn't have code to register a "simple-frambuffer"
platform device when booting with EFI.
For these reasons, let's remove the register_gop_device() duplicated code
and instead move the platform specific logic that's there to sysfb driver.
Signed-off-by: Javier Martinez Canillas <javierm(a)redhat.com>
Acked-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210625131359.1804394-1-javi…
---
arch/arm/include/asm/efi.h | 5 +--
arch/arm64/include/asm/efi.h | 5 +--
arch/riscv/include/asm/efi.h | 5 +--
drivers/firmware/Kconfig | 8 ++--
drivers/firmware/Makefile | 2 +-
drivers/firmware/efi/efi-init.c | 90 ---------------------------------------
drivers/firmware/efi/sysfb_efi.c | 76 ++++++++++++++++++++++++++++++++-
drivers/firmware/sysfb.c | 35 ++++++++++-----
drivers/firmware/sysfb_simplefb.c | 31 ++++++++++----
drivers/gpu/drm/tiny/Kconfig | 4 +-
include/linux/sysfb.h | 26 +++++------
11 files changed, 143 insertions(+), 144 deletions(-)
diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index 9de7ab2ce05d..a6f3b179e8a9 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -17,6 +17,7 @@
#ifdef CONFIG_EFI
void efi_init(void);
+extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt);
int efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md);
int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md);
@@ -52,10 +53,6 @@ void efi_virtmap_unload(void);
struct screen_info *alloc_screen_info(void);
void free_screen_info(struct screen_info *si);
-static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
-{
-}
-
/*
* A reasonable upper bound for the uncompressed kernel size is 32 MBytes,
* so we will reserve that amount of memory. We have no easy way to tell what
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 3578aba9c608..42d673a011c8 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -14,6 +14,7 @@
#ifdef CONFIG_EFI
extern void efi_init(void);
+extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt);
#else
#define efi_init()
#endif
@@ -85,10 +86,6 @@ static inline void free_screen_info(struct screen_info *si)
{
}
-static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
-{
-}
-
#define EFI_ALLOC_ALIGN SZ_64K
/*
diff --git a/arch/riscv/include/asm/efi.h b/arch/riscv/include/asm/efi.h
index 6d98cd999680..7a8f0d45b13a 100644
--- a/arch/riscv/include/asm/efi.h
+++ b/arch/riscv/include/asm/efi.h
@@ -13,6 +13,7 @@
#ifdef CONFIG_EFI
extern void efi_init(void);
+extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt);
#else
#define efi_init()
#endif
@@ -39,10 +40,6 @@ static inline void free_screen_info(struct screen_info *si)
{
}
-static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
-{
-}
-
void efi_virtmap_load(void);
void efi_virtmap_unload(void);
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 71f3d97f0c39..af6719cc576b 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -254,9 +254,9 @@ config QCOM_SCM_DOWNLOAD_MODE_DEFAULT
config SYSFB
bool
default y
- depends on X86 || COMPILE_TEST
+ depends on X86 || ARM || ARM64 || RISCV || COMPILE_TEST
-config X86_SYSFB
+config SYSFB_SIMPLEFB
bool "Mark VGA/VBE/EFI FB as generic system framebuffer"
depends on SYSFB
help
@@ -264,10 +264,10 @@ config X86_SYSFB
bootloader or kernel can show basic video-output during boot for
user-guidance and debugging. Historically, x86 used the VESA BIOS
Extensions and EFI-framebuffers for this, which are mostly limited
- to x86.
+ to x86 BIOS or EFI systems.
This option, if enabled, marks VGA/VBE/EFI framebuffers as generic
framebuffers so the new generic system-framebuffer drivers can be
- used on x86. If the framebuffer is not compatible with the generic
+ used instead. If the framebuffer is not compatible with the generic
modes, it is advertised as fallback platform framebuffer so legacy
drivers like efifb, vesafb and uvesafb can pick it up.
If this option is not selected, all system framebuffers are always
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index ad78f78ffa8d..6ac637e422b9 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -19,7 +19,7 @@ obj-$(CONFIG_RASPBERRYPI_FIRMWARE) += raspberrypi.o
obj-$(CONFIG_FW_CFG_SYSFS) += qemu_fw_cfg.o
obj-$(CONFIG_QCOM_SCM) += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
obj-$(CONFIG_SYSFB) += sysfb.o
-obj-$(CONFIG_X86_SYSFB) += sysfb_simplefb.o
+obj-$(CONFIG_SYSFB_SIMPLEFB) += sysfb_simplefb.o
obj-$(CONFIG_TI_SCI_PROTOCOL) += ti_sci.o
obj-$(CONFIG_TRUSTED_FOUNDATIONS) += trusted_foundations.o
obj-$(CONFIG_TURRIS_MOX_RWTM) += turris-mox-rwtm.o
diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
index a552a08a1741..b19ce1a83f91 100644
--- a/drivers/firmware/efi/efi-init.c
+++ b/drivers/firmware/efi/efi-init.c
@@ -275,93 +275,3 @@ void __init efi_init(void)
}
#endif
}
-
-static bool efifb_overlaps_pci_range(const struct of_pci_range *range)
-{
- u64 fb_base = screen_info.lfb_base;
-
- if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
- fb_base |= (u64)(unsigned long)screen_info.ext_lfb_base << 32;
-
- return fb_base >= range->cpu_addr &&
- fb_base < (range->cpu_addr + range->size);
-}
-
-static struct device_node *find_pci_overlap_node(void)
-{
- struct device_node *np;
-
- for_each_node_by_type(np, "pci") {
- struct of_pci_range_parser parser;
- struct of_pci_range range;
- int err;
-
- err = of_pci_range_parser_init(&parser, np);
- if (err) {
- pr_warn("of_pci_range_parser_init() failed: %d\n", err);
- continue;
- }
-
- for_each_of_pci_range(&parser, &range)
- if (efifb_overlaps_pci_range(&range))
- return np;
- }
- return NULL;
-}
-
-/*
- * If the efifb framebuffer is backed by a PCI graphics controller, we have
- * to ensure that this relation is expressed using a device link when
- * running in DT mode, or the probe order may be reversed, resulting in a
- * resource reservation conflict on the memory window that the efifb
- * framebuffer steals from the PCIe host bridge.
- */
-static int efifb_add_links(struct fwnode_handle *fwnode)
-{
- struct device_node *sup_np;
-
- sup_np = find_pci_overlap_node();
-
- /*
- * If there's no PCI graphics controller backing the efifb, we are
- * done here.
- */
- if (!sup_np)
- return 0;
-
- fwnode_link_add(fwnode, of_fwnode_handle(sup_np));
- of_node_put(sup_np);
-
- return 0;
-}
-
-static const struct fwnode_operations efifb_fwnode_ops = {
- .add_links = efifb_add_links,
-};
-
-static struct fwnode_handle efifb_fwnode;
-
-static int __init register_gop_device(void)
-{
- struct platform_device *pd;
- int err;
-
- if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI)
- return 0;
-
- pd = platform_device_alloc("efi-framebuffer", 0);
- if (!pd)
- return -ENOMEM;
-
- if (IS_ENABLED(CONFIG_PCI)) {
- fwnode_init(&efifb_fwnode, &efifb_fwnode_ops);
- pd->dev.fwnode = &efifb_fwnode;
- }
-
- err = platform_device_add_data(pd, &screen_info, sizeof(screen_info));
- if (err)
- return err;
-
- return platform_device_add(pd);
-}
-subsys_initcall(register_gop_device);
diff --git a/drivers/firmware/efi/sysfb_efi.c b/drivers/firmware/efi/sysfb_efi.c
index 9f035b15501c..f51865e1b876 100644
--- a/drivers/firmware/efi/sysfb_efi.c
+++ b/drivers/firmware/efi/sysfb_efi.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
- * Generic System Framebuffers on x86
+ * Generic System Framebuffers
* Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com>
*
* EFI Quirks Copyright (c) 2006 Edgar Hucek <gimli(a)dark-green.com>
@@ -19,7 +19,9 @@
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/of_address.h>
#include <linux/pci.h>
+#include <linux/platform_device.h>
#include <linux/screen_info.h>
#include <linux/sysfb.h>
#include <video/vga.h>
@@ -267,7 +269,72 @@ static const struct dmi_system_id efifb_dmi_swap_width_height[] __initconst = {
{},
};
-__init void sysfb_apply_efi_quirks(void)
+static bool efifb_overlaps_pci_range(const struct of_pci_range *range)
+{
+ u64 fb_base = screen_info.lfb_base;
+
+ if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
+ fb_base |= (u64)(unsigned long)screen_info.ext_lfb_base << 32;
+
+ return fb_base >= range->cpu_addr &&
+ fb_base < (range->cpu_addr + range->size);
+}
+
+static struct device_node *find_pci_overlap_node(void)
+{
+ struct device_node *np;
+
+ for_each_node_by_type(np, "pci") {
+ struct of_pci_range_parser parser;
+ struct of_pci_range range;
+ int err;
+
+ err = of_pci_range_parser_init(&parser, np);
+ if (err) {
+ pr_warn("of_pci_range_parser_init() failed: %d\n", err);
+ continue;
+ }
+
+ for_each_of_pci_range(&parser, &range)
+ if (efifb_overlaps_pci_range(&range))
+ return np;
+ }
+ return NULL;
+}
+
+/*
+ * If the efifb framebuffer is backed by a PCI graphics controller, we have
+ * to ensure that this relation is expressed using a device link when
+ * running in DT mode, or the probe order may be reversed, resulting in a
+ * resource reservation conflict on the memory window that the efifb
+ * framebuffer steals from the PCIe host bridge.
+ */
+static int efifb_add_links(struct fwnode_handle *fwnode)
+{
+ struct device_node *sup_np;
+
+ sup_np = find_pci_overlap_node();
+
+ /*
+ * If there's no PCI graphics controller backing the efifb, we are
+ * done here.
+ */
+ if (!sup_np)
+ return 0;
+
+ fwnode_link_add(fwnode, of_fwnode_handle(sup_np));
+ of_node_put(sup_np);
+
+ return 0;
+}
+
+static const struct fwnode_operations efifb_fwnode_ops = {
+ .add_links = efifb_add_links,
+};
+
+static struct fwnode_handle efifb_fwnode;
+
+__init void sysfb_apply_efi_quirks(struct platform_device *pd)
{
if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI ||
!(screen_info.capabilities & VIDEO_CAPABILITY_SKIP_QUIRKS))
@@ -281,4 +348,9 @@ __init void sysfb_apply_efi_quirks(void)
screen_info.lfb_height = temp;
screen_info.lfb_linelength = 4 * screen_info.lfb_width;
}
+
+ if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI && IS_ENABLED(CONFIG_PCI)) {
+ fwnode_init(&efifb_fwnode, &efifb_fwnode_ops);
+ pd->dev.fwnode = &efifb_fwnode;
+ }
}
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 1337515963d5..2bfbb05f7d89 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -1,11 +1,11 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
- * Generic System Framebuffers on x86
+ * Generic System Framebuffers
* Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com>
*/
/*
- * Simple-Framebuffer support for x86 systems
+ * Simple-Framebuffer support
* Create a platform-device for any available boot framebuffer. The
* simple-framebuffer platform device is already available on DT systems, so
* this module parses the global "screen_info" object and creates a suitable
@@ -16,12 +16,12 @@
* to pick these devices up without messing with simple-framebuffer drivers.
* The global "screen_info" is still valid at all times.
*
- * If CONFIG_X86_SYSFB is not selected, we never register "simple-framebuffer"
+ * If CONFIG_SYSFB_SIMPLEFB is not selected, never register "simple-framebuffer"
* platform devices, but only use legacy framebuffer devices for
* backwards compatibility.
*
* TODO: We set the dev_id field of all platform-devices to 0. This allows
- * other x86 OF/DT parsers to create such devices, too. However, they must
+ * other OF/DT parsers to create such devices, too. However, they must
* start at offset 1 for this to work.
*/
@@ -43,12 +43,10 @@ static __init int sysfb_init(void)
bool compatible;
int ret;
- sysfb_apply_efi_quirks();
-
/* try to create a simple-framebuffer device */
- compatible = parse_mode(si, &mode);
+ compatible = sysfb_parse_mode(si, &mode);
if (compatible) {
- ret = create_simplefb(si, &mode);
+ ret = sysfb_create_simplefb(si, &mode);
if (!ret)
return 0;
}
@@ -61,9 +59,24 @@ static __init int sysfb_init(void)
else
name = "platform-framebuffer";
- pd = platform_device_register_resndata(NULL, name, 0,
- NULL, 0, si, sizeof(*si));
- return PTR_ERR_OR_ZERO(pd);
+ pd = platform_device_alloc(name, 0);
+ if (!pd)
+ return -ENOMEM;
+
+ sysfb_apply_efi_quirks(pd);
+
+ ret = platform_device_add_data(pd, si, sizeof(*si));
+ if (ret)
+ goto err;
+
+ ret = platform_device_add(pd);
+ if (ret)
+ goto err;
+
+ return 0;
+err:
+ platform_device_put(pd);
+ return ret;
}
/* must execute after PCI subsystem for EFI quirks */
diff --git a/drivers/firmware/sysfb_simplefb.c b/drivers/firmware/sysfb_simplefb.c
index df892444ea17..b86761904949 100644
--- a/drivers/firmware/sysfb_simplefb.c
+++ b/drivers/firmware/sysfb_simplefb.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
- * Generic System Framebuffers on x86
+ * Generic System Framebuffers
* Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com>
*/
@@ -23,9 +23,9 @@
static const char simplefb_resname[] = "BOOTFB";
static const struct simplefb_format formats[] = SIMPLEFB_FORMATS;
-/* try parsing x86 screen_info into a simple-framebuffer mode struct */
-__init bool parse_mode(const struct screen_info *si,
- struct simplefb_platform_data *mode)
+/* try parsing screen_info into a simple-framebuffer mode struct */
+__init bool sysfb_parse_mode(const struct screen_info *si,
+ struct simplefb_platform_data *mode)
{
const struct simplefb_format *f;
__u8 type;
@@ -57,13 +57,14 @@ __init bool parse_mode(const struct screen_info *si,
return false;
}
-__init int create_simplefb(const struct screen_info *si,
- const struct simplefb_platform_data *mode)
+__init int sysfb_create_simplefb(const struct screen_info *si,
+ const struct simplefb_platform_data *mode)
{
struct platform_device *pd;
struct resource res;
u64 base, size;
u32 length;
+ int ret;
/*
* If the 64BIT_BASE capability is set, ext_lfb_base will contain the
@@ -105,7 +106,19 @@ __init int create_simplefb(const struct screen_info *si,
if (res.end <= res.start)
return -EINVAL;
- pd = platform_device_register_resndata(NULL, "simple-framebuffer", 0,
- &res, 1, mode, sizeof(*mode));
- return PTR_ERR_OR_ZERO(pd);
+ pd = platform_device_alloc("simple-framebuffer", 0);
+ if (!pd)
+ return -ENOMEM;
+
+ sysfb_apply_efi_quirks(pd);
+
+ ret = platform_device_add_resources(pd, &res, 1);
+ if (ret)
+ return ret;
+
+ ret = platform_device_add_data(pd, mode, sizeof(*mode));
+ if (ret)
+ return ret;
+
+ return platform_device_add(pd);
}
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
index 5593128eeff9..d31be274a2bd 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -64,8 +64,8 @@ config DRM_SIMPLEDRM
buffer, size, and display format must be provided via device tree,
UEFI, VESA, etc.
- On x86 and compatible, you should also select CONFIG_X86_SYSFB to
- use UEFI and VESA framebuffers.
+ On x86 BIOS or UEFI systems, you should also select SYSFB_SIMPLEFB
+ to use UEFI and VESA framebuffers.
config TINYDRM_HX8357D
tristate "DRM support for HX8357D display panels"
diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h
index 3e5355769dc3..b0dcfa26d07b 100644
--- a/include/linux/sysfb.h
+++ b/include/linux/sysfb.h
@@ -58,37 +58,37 @@ struct efifb_dmi_info {
#ifdef CONFIG_EFI
extern struct efifb_dmi_info efifb_dmi_list[];
-void sysfb_apply_efi_quirks(void);
+void sysfb_apply_efi_quirks(struct platform_device *pd);
#else /* CONFIG_EFI */
-static inline void sysfb_apply_efi_quirks(void)
+static inline void sysfb_apply_efi_quirks(struct platform_device *pd)
{
}
#endif /* CONFIG_EFI */
-#ifdef CONFIG_X86_SYSFB
+#ifdef CONFIG_SYSFB_SIMPLEFB
-bool parse_mode(const struct screen_info *si,
- struct simplefb_platform_data *mode);
-int create_simplefb(const struct screen_info *si,
- const struct simplefb_platform_data *mode);
+bool sysfb_parse_mode(const struct screen_info *si,
+ struct simplefb_platform_data *mode);
+int sysfb_create_simplefb(const struct screen_info *si,
+ const struct simplefb_platform_data *mode);
-#else /* CONFIG_X86_SYSFB */
+#else /* CONFIG_SYSFB_SIMPLE */
-static inline bool parse_mode(const struct screen_info *si,
- struct simplefb_platform_data *mode)
+static inline bool sysfb_parse_mode(const struct screen_info *si,
+ struct simplefb_platform_data *mode)
{
return false;
}
-static inline int create_simplefb(const struct screen_info *si,
- const struct simplefb_platform_data *mode)
+static inline int sysfb_create_simplefb(const struct screen_info *si,
+ const struct simplefb_platform_data *mode)
{
return -EINVAL;
}
-#endif /* CONFIG_X86_SYSFB */
+#endif /* CONFIG_SYSFB_SIMPLE */
#endif /* _LINUX_SYSFB_H */
</cut>
Hi Greg,
This appears to have been a fluke. Boot-testing succeeded before the merge and failed after. Boot-testing on allmodconfig doesn’t seem to be stable, so we are going to disable it.
Regards,
--
Maxim Kuvyrkov
https://www.linaro.org
> On 18 Aug 2021, at 08:38, Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> wrote:
>
> On Wed, Aug 18, 2021 at 05:22:07AM +0000, ci_notify(a)linaro.org wrote:
>> Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-lts-allmodconfig. So far, this commit has regressed CI configurations:
>> - tcwg_kernel/llvm-master-aarch64-lts-allmodconfig
>>
>> Culprit:
>> <cut>
>> commit 132a8267adabd645476b542b3b132c1b91988fe8
>> Author: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>> Date: Thu Aug 12 13:22:21 2021 +0200
>>
>> Linux 5.10.58
>
> <snip>
>
> And what am I supposed to do with this information?
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe(a)googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/YRyczv2OCq51edQh%40kroa….
[TCWG CI] Regression caused by binutils: [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp:
commit b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
Author: Tom de Vries <tdevries(a)suse.de>
[gdb/testsuite] Add gdb.testsuite/dump-system-info.exp
Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_VECT_mthumb artifacts/build-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1/results_id:
1
from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_VECT_mthumb artifacts/build-baseline/results_id:
1
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-master-arm_eabi-coremark-O3_VECT
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Reproduce builds:
<cut>
mkdir investigate-binutils-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
cd investigate-binutils-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/
cd binutils
# Reproduce first_bad build
git checkout --detach b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 3814a9e1fe77c01c7e872c25afa198537d4ac780
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
Author: Tom de Vries <tdevries(a)suse.de>
Date: Fri Sep 24 12:39:14 2021 +0200
[gdb/testsuite] Add gdb.testsuite/dump-system-info.exp
When interpreting the testsuite results, it's often relevant what kind of
machine the testsuite ran on. On a local machine one can just do
/proc/cpuinfo, but in case of running tests using a remote system
that distributes test runs to other remote systems that are not directly
accessible, that's not possible.
Fix this by dumping /proc/cpuinfo into the gdb.log, as well as lsb_release -a
and uname -a.
We could do this at the start of each test run, by putting it into unix.exp
or some such. However, this might be too verbose, so we choose to put it into
its own test-case, such that it get triggered in a full testrun, but not when
running one or a subset of tests.
We put the test-case into the gdb.testsuite directory, which is currently the
only place in the testsuite where we do not test gdb. [ Though perhaps this
could be put into a new gdb.info directory, since the test-case doesn't
actually test the testsuite. ]
Tested on x86_64-linux.
---
gdb/testsuite/gdb.testsuite/dump-system-info.exp | 48 ++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/gdb/testsuite/gdb.testsuite/dump-system-info.exp b/gdb/testsuite/gdb.testsuite/dump-system-info.exp
new file mode 100644
index 00000000000..bf181469bd5
--- /dev/null
+++ b/gdb/testsuite/gdb.testsuite/dump-system-info.exp
@@ -0,0 +1,48 @@
+# Copyright 2021 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# The purpose of this test-case is to dump /proc/cpuinfo and similar system
+# info into gdb.log.
+
+# Check if /proc/cpuinfo is available.
+set res [remote_exec target "test -r /proc/cpuinfo"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 && $output == "" } {
+ verbose -log "Cpuinfo available, dumping:"
+ remote_exec target "cat /proc/cpuinfo"
+} else {
+ verbose -log "Cpuinfo not available"
+}
+
+set res [remote_exec target "lsb_release -a"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 } {
+ verbose -log "lsb_release -a availabe, dumping:\n$output"
+} else {
+ verbose -log "lsb_release -a not available"
+}
+
+set res [remote_exec target "uname -a"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 } {
+ verbose -log "uname -a availabe, dumping:\n$output"
+} else {
+ verbose -log "uname -a not available"
+}
</cut>
After llvm commit e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
Author: Erich Keane <erich.keane(a)intel.com>
Fix test from 8dd42f, capitalization in test
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 3% from 10973 to 11249 perf samples
- 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 12% from 1446 to 1619 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
cd investigate-llvm-e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 77d200a546136c2855063613ff4bca1f682fb23a
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
Author: Erich Keane <erich.keane(a)intel.com>
Date: Fri Sep 24 10:24:17 2021 -0700
Fix test from 8dd42f, capitalization in test
---
clang/test/CXX/drs/dr17xx.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/clang/test/CXX/drs/dr17xx.cpp b/clang/test/CXX/drs/dr17xx.cpp
index 42303c83ae3c..c8648908ebda 100644
--- a/clang/test/CXX/drs/dr17xx.cpp
+++ b/clang/test/CXX/drs/dr17xx.cpp
@@ -129,7 +129,7 @@ namespace dr1778 { // dr1778: 9
namespace dr1762 { // dr1762: 14
#if __cplusplus >= 201103L
float operator ""_E(const char *);
- // expected-error@+2 {{invalid suffix on literal; c++11 requires a space between literal and identifier}}
+ // expected-error@+2 {{invalid suffix on literal; C++11 requires a space between literal and identifier}}
// expected-warning@+1 {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
float operator ""E(const char *);
#endif
</cut>
Thanks, Stanislav,
FWIW, it will be, probably, easier for you to just rebuild the compiler, it is an x86_64-linux-gnu -> arm-linux-gnueabihf cross. This link has the build log [1].
cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=ARM
Then compile the pre-processed source with plain -O2 or -O3 optimisation settings.
[1] https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Regards,
--
Maxim Kuvyrkov
https://www.linaro.org
> On 24 Sep 2021, at 20:30, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
>
> [AMD Official Use Only]
>
> I have reverted the whole change. There was yet another perf regression report.
>
> Stas
>
> From: Mekhanoshin, Stanislav
> Sent: Thursday, September 23, 2021 11:48
> To: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
>
> Thanks. I see the reload. There shall not be extra pressure since that is the whole idea, make pressure less. However, I see more spills in that specific file, fast_algorithms.s if I get it right.
> Can I get the IR for it? Something to feed llc.
>
> Stas
>
> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> Sent: Thursday, September 23, 2021 2:31
> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
>
> [CAUTION: External Email]
>
> Thanks, Stanislav.
>
> I’ve looked into profile dumps, and 456.hmmer’s hot loop get several additional reloads. E.g., "ldr r1, [sp, #84]” generates 203 additional samples, which translates into 20 seconds of time just for that one instruction.
>
> See the attached profile dumps and the the screenshot with the hot loop highlighted.
>
> Maybe your patch increases register pressure too much?
>
> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
> > On 22 Sep 2021, at 22:35, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >
> > [AMD Official Use Only]
> >
> > There are actually couple things worth to try if that is easy:
> >
> > https://reviews.llvm.org/D109077
> > https://reviews.llvm.org/differential/diff/374324/
> >
> > Both may slightly change spill weights and then spilling pattern.
> >
> > Stas
> >
> > -----Original Message-----
> > From: Mekhanoshin, Stanislav
> > Sent: Wednesday, September 22, 2021 12:09
> > To: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> > Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> > Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >
> > I assume some of the newly rematerialized instructions caused perf drops. Probably some very specific ones. I would appreciate if you could point them to me.
> > In addition I believe I would need to have a linked or optimized bitcode to feed into llc.
> >
> > Stas
> >
> > -----Original Message-----
> > From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> > Sent: Wednesday, September 22, 2021 12:06
> > To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> > Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> > Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >
> > [CAUTION: External Email]
> >
> > Hi Stanislav,
> >
> > That's fair; I or someone from Linaro will try to analyze this and follow up here.
> >
> > On a more general note, what info would you like to see in these benchmarking regression reports?
> >
> > Thanks,
> >
> > --
> > Maxim Kuvyrkov
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >
> >
> >> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >>
> >> [AMD Official Use Only]
> >>
> >> Hm... I'd really like to help, but I do not think I can do anything with megabytes of code in an asm which I do not understand and tons of differences in 48 asm files.
> >> What I can see there is overall less spilling code which was the intent in the first place: hmmer has 4 less spill opcodes overall and sphinx has 27 less of them.
> >> I doubt I could say much more without someone pointing to the actual root cause.
> >>
> >> Stas
> >>
> >> -----Original Message-----
> >> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> >> Sent: Wednesday, September 22, 2021 5:16
> >> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> >> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> >> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >>
> >> [CAUTION: External Email]
> >>
> >> Hi Stanislav,
> >>
> >> Attached is a tarball with -save-temps output (pre-processed source and generated assembly) for first-bad run (your commit) and last-good run (immediate parent of your commit).
> >>
> >> --
> >> Maxim Kuvyrkov
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >>
> >>> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >>>
> >>> [AMD Official Use Only]
> >>>
> >>> Thanks for letting me know. Some regressions are inevitable, however do you happen to have any analysis and dumps? I myself do not understand ARM ISA well...
> >>>
> >>> Stas
> >>>
> >>> -----Original Message-----
> >>> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> >>> Sent: Wednesday, September 15, 2021 5:52
> >>> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> >>> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> >>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >>>
> >>> [CAUTION: External Email]
> >>>
> >>> Hi Stanislav,
> >>>
> >>> FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit ARM at -O2 and -O3 optimization levels.
> >>>
> >>> --
> >>> Maxim Kuvyrkov
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >>>
> >>
> >>
> >
> <image001.png>
I’m trying to import these files into Ds-5. After unzipping files, it still will not show up in ds-5 search. Below is the error that I keep receiving:
Sent from my iPhone
After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Avoid invalid loop transformations in jump threading registry.
the following benchmarks grew in size by more than 1%:
- 450.soplex grew in size by 2% from 207260 to 211436 bytes
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 23 10:59:24 2021 +0200
Avoid invalid loop transformations in jump threading registry.
My upcoming improvements to the forward jump threader make it thread
more aggressively. In investigating some "regressions", I noticed
that it has always allowed threading through empty latches and across
loop boundaries. As we have discussed recently, this should be avoided
until after loop optimizations have run their course.
Note that this wasn't much of a problem before because DOM/VRP
couldn't find these opportunities, but with a smarter solver, we trip
over them more easily.
Because the forward threader doesn't have an independent localized cost
model like the new threader (profitable_path_p), it is difficult to
catch these things at discovery. However, we can catch them at
registration time, with the added benefit that all the threaders
(forward and backward) can share the handcuffs.
This patch is an adaptation of what we do in the backward threader, but
it is not meant to catch everything we do there, as some of the
restrictions there are due to limitations of the different block
copiers (for example, the generic copier does not re-use existing
threading paths).
We could ideally remove the now redundant bits in profitable_path_p, but
I would prefer not to for two reasons. First, the backward threader uses
profitable_path_p as it discovers paths to avoid discovering paths in
unprofitable directions. Second, I would like to merge all the forward
cost restrictions into the profitability class in the backward threader,
not the other way around. Alas, that reshuffling will have to wait for
the next release.
As usual, there are quite a few tests that needed adjustments. It seems
we were quite happily threading improper scenarios. With most of them,
as can be seen in pr77445-2.c, we're merely shifting the threading to
after loop optimizations.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
New.
(jt_path_registry::register_jump_thread): Call
cancel_invalid_paths.
* tree-ssa-threadupdate.h (class jt_path_registry): Add
cancel_invalid_paths.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/20030714-2.c: Adjust.
* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
* gcc.dg/vect/bb-slp-16.c: Adjust.
---
gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c | 7 ++-
gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 19 ++++---
gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 4 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 4 +-
gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 7 ---
gcc/tree-ssa-threadupdate.c | 67 ++++++++++++++++++-----
gcc/tree-ssa-threadupdate.h | 1 +
8 files changed, 78 insertions(+), 35 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
index eb663f2ff5b..9585ff11307 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
@@ -32,7 +32,8 @@ get_alias_set (t)
}
}
-/* There should be exactly three IF conditionals if we thread jumps
- properly. */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+ properly. There used to be 3, but one thread was crossing
+ loops. */
+/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index e1464e21170..922a331b217 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */
extern int status, pt;
extern int count;
@@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a)
pt--;
}
-/* There are 4 jump threading opportunities, all of which will be
- realized, which will eliminate testing of FLAG, completely. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
+/* There are 2 jump threading opportunities (which don't cross loops),
+ all of which will be realized, which will eliminate testing of
+ FLAG, completely. */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */
-/* There should be no assignments or references to FLAG, verify they're
- eliminated as early as possible. */
-/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
+/* We used to remove references to FLAG by DCE2, but this was
+ depending on early threaders threading through loop boundaries
+ (which we shouldn't do). However, the late threading passes, which
+ run after loop optimizations , can successfully eliminate the
+ references to FLAG. Verify that ther are no references by the late
+ threading passes. */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index f9fc212f49e..01a0f1f197d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
to change decisions in switch expansion which in turn can expose new
jump threading opportunities. Skip the later tests on aarch64. */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 60d4f76f076..2d78d045516 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -21,5 +21,7 @@
condition.
All the cases are picked up by VRP1 as jump threads. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
+
+/* There used to be 6 jump threads found by thread1, but they all
+ depended on threading through distinct loops in ethread. */
/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index e3d4b311c03..16abcde5053 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,8 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 664e93e9b60..e68a9b62535 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,8 +1,5 @@
/* { dg-require-effective-target vect_int } */
-/* See note below as to why we disable threading. */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
-
#include <stdarg.h>
#include "tree-vect.h"
@@ -30,10 +27,6 @@ main1 (int dummy)
*pout++ = *pin++ + a;
*pout++ = *pin++ + a;
*pout++ = *pin++ + a;
- /* In some architectures like ppc64, jump threading may thread
- the iteration where i==0 such that we no longer optimize the
- BB. Another alternative to disable jump threading would be
- to wrap the read from `i' into a function returning i. */
if (arr[i] = i)
a = i;
else
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index baac11280fa..2b9b8f81274 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
return retval;
}
+bool
+jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
+{
+ gcc_checking_assert (!path.is_empty ());
+ edge taken_edge = path[path.length () - 1]->e;
+ loop_p loop = taken_edge->src->loop_father;
+ bool seen_latch = false;
+ bool path_crosses_loops = false;
+
+ for (unsigned int i = 0; i < path.length (); i++)
+ {
+ edge e = path[i]->e;
+
+ if (e == NULL)
+ {
+ // NULL outgoing edges on a path can happen for jumping to a
+ // constant address.
+ cancel_thread (&path, "Found NULL edge in jump threading path");
+ return true;
+ }
+
+ if (loop->latch == e->src || loop->latch == e->dest)
+ seen_latch = true;
+
+ // The first entry represents the block with an outgoing edge
+ // that we will redirect to the jump threading path. Thus we
+ // don't care about that block's loop father.
+ if ((i > 0 && e->src->loop_father != loop)
+ || e->dest->loop_father != loop)
+ path_crosses_loops = true;
+
+ if (flag_checking && !m_backedge_threads)
+ gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
+ }
+
+ if (cfun->curr_properties & PROP_loop_opts_done)
+ return false;
+
+ if (seen_latch && empty_block_p (loop->latch))
+ {
+ cancel_thread (&path, "Threading through latch before loop opts "
+ "would create non-empty latch");
+ return true;
+ }
+ if (path_crosses_loops)
+ {
+ cancel_thread (&path, "Path crosses loops");
+ return true;
+ }
+ return false;
+}
+
/* Register a jump threading opportunity. We queue up all the jump
threading opportunities discovered by a pass and update the CFG
and SSA form all at once.
@@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
return false;
}
- /* First make sure there are no NULL outgoing edges on the jump threading
- path. That can happen for jumping to a constant address. */
- for (unsigned int i = 0; i < path->length (); i++)
- {
- if ((*path)[i]->e == NULL)
- {
- cancel_thread (path, "Found NULL edge in jump threading path");
- return false;
- }
-
- if (flag_checking && !m_backedge_threads)
- gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
- }
+ if (cancel_invalid_paths (*path))
+ return false;
if (dump_file && (dump_flags & TDF_DETAILS))
dump_jump_thread_path (dump_file, *path, true);
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 8b48a671212..d68795c9f27 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -75,6 +75,7 @@ protected:
unsigned long m_num_threaded_edges;
private:
virtual bool update_cfg (bool peel_loop_headers) = 0;
+ bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
jump_thread_path_allocator m_allocator;
// True if threading through back edges is allowed. This is only
// allowed in the generic copier in the backward threader.
</cut>
Progress
* UM-2 [QEMU upstream maintainership]
+ Still looking at the mess that is non-unique bus names. Worked
through exactly which devices and machine types are affected for
the i2c bus.
+ Sent a patchset which tries to make the "create a bus" function
names a bit more regular across different bus types.
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Luis figured out why GDB was crashing when fed the MVE XML by
QEMU's gdbstub; this was a combination of QEMU giving GDB some
non-standard extra registers in its "vfp" XML feature and GDB
not being robust enough against those unexpected extras. Sent
out a patchset which cleans up QEMU's XML in this area and also
implements the extra XML for MVE. (This will only go into QEMU
once the GDB patches have landed and the XML format is nailed down.)
-- PMM
After gcc commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>
tree-optimization/65206 - dependence analysis on mixed pointer/array
the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 4% from 20816 to 21661 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-f92901a508305f291fcf2acae0825379477724de
cd investigate-gcc-f92901a508305f291fcf2acae0825379477724de
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach f92901a508305f291fcf2acae0825379477724de
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach abdf63d782cba82b5ecf264248518cbb065650ed
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>
Date: Wed Sep 8 14:42:31 2021 +0200
tree-optimization/65206 - dependence analysis on mixed pointer/array
This adds the capability to analyze the dependence of mixed
pointer/array accesses. The example is from where using a masked
load/store creates the pointer-based access when an otherwise
unconditional access is array based. Other examples would include
accesses to an array mixed with accesses from inlined helpers
that work on pointers.
The idea is quite simple and old - analyze the data-ref indices
as if the reference was pointer-based. The following change does
this by changing dr_analyze_indices to work on the indices
sub-structure and storing an alternate indices substructure in
each data reference. That alternate set of indices is analyzed
lazily by initialize_data_dependence_relation when it fails to
match-up the main set of indices of two data references.
initialize_data_dependence_relation is refactored into a head
and a tail worker and changed to work on one of the indices
structures and thus away from using DR_* access macros which
continue to reference the main indices substructure.
There are quite some vectorization and loop distribution opportunities
unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
544.nab_r see amendments in what they report with -fopt-info-loop while
the rest of the specrate set sees no changes there. Measuring runtime
for the set where changes were reported reveals nothing off-noise
besides 511.povray_r which seems to regress slightly for me
(on a Zen2 machine with -Ofast -march=native).
2021-09-08 Richard Biener <rguenther(a)suse.de>
PR tree-optimization/65206
* tree-data-ref.h (struct data_reference): Add alt_indices,
order it last.
* tree-data-ref.c (free_data_ref): Release alt_indices.
(dr_analyze_indices): Work on struct indices and get DR_REF as tree.
(create_data_ref): Adjust.
(initialize_data_dependence_relation): Split into head
and tail. When the base objects fail to match up try
again with pointer-based analysis of indices.
* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
not compare the lazily computed alternate set of indices.
* gcc.dg/torture/20210916.c: New testcase.
* gcc.dg/vect/pr65206.c: Likewise.
---
gcc/testsuite/gcc.dg/torture/20210916.c | 20 ++++
gcc/testsuite/gcc.dg/vect/pr65206.c | 22 ++++
gcc/tree-data-ref.c | 174 +++++++++++++++++++++-----------
gcc/tree-data-ref.h | 9 +-
gcc/tree-vectorizer.c | 3 +-
5 files changed, 168 insertions(+), 60 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
new file mode 100644
index 00000000000..0ea6d45e463
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20210916.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+typedef union tree_node *tree;
+struct tree_base {
+ unsigned : 1;
+ unsigned lang_flag_2 : 1;
+};
+struct tree_type {
+ tree main_variant;
+};
+union tree_node {
+ struct tree_base base;
+ struct tree_type type;
+};
+tree finish_struct_t, finish_struct_x;
+void finish_struct()
+{
+ for (; finish_struct_t->type.main_variant;)
+ finish_struct_x->base.lang_flag_2 = 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
new file mode 100644
index 00000000000..3b6262622c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+#define N 1024
+
+double a[N], b[N];
+
+void foo ()
+{
+ for (int i = 0; i < N; ++i)
+ if (b[i] < 3.)
+ a[i] += b[i];
+}
+
+/* We get a .MASK_STORE because while the load of a[i] does not trap
+ the store would introduce store data races. Make sure we still
+ can handle the data dependence with zero distance. */
+
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e061baa7c20..18307a554fc 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -99,6 +99,7 @@ along with GCC; see the file COPYING3. If not see
#include "internal-fn.h"
#include "vr-values.h"
#include "range-op.h"
+#include "tree-ssa-loop-ivopts.h"
static struct datadep_stats
{
@@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
DR, analyzed in LOOP and instantiated before NEST. */
static void
-dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
+dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
{
- vec<tree> access_fns = vNULL;
- tree ref, op;
- tree base, off, access_fn;
-
/* If analyzing a basic-block there are no indices to analyze
and thus no access functions. */
if (!nest)
{
- DR_BASE_OBJECT (dr) = DR_REF (dr);
- DR_ACCESS_FNS (dr).create (0);
+ dri->base_object = ref;
+ dri->access_fns.create (0);
return;
}
- ref = DR_REF (dr);
+ vec<tree> access_fns = vNULL;
/* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
into a two element array with a constant index. The base is
@@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
{
if (TREE_CODE (ref) == ARRAY_REF)
{
- op = TREE_OPERAND (ref, 1);
- access_fn = analyze_scalar_evolution (loop, op);
+ tree op = TREE_OPERAND (ref, 1);
+ tree access_fn = analyze_scalar_evolution (loop, op);
access_fn = instantiate_scev (nest, loop, access_fn);
access_fns.safe_push (access_fn);
}
@@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
analyzed nest, add it as an additional independent access-function. */
if (TREE_CODE (ref) == MEM_REF)
{
- op = TREE_OPERAND (ref, 0);
- access_fn = analyze_scalar_evolution (loop, op);
+ tree op = TREE_OPERAND (ref, 0);
+ tree access_fn = analyze_scalar_evolution (loop, op);
access_fn = instantiate_scev (nest, loop, access_fn);
if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
{
- tree orig_type;
tree memoff = TREE_OPERAND (ref, 1);
- base = initial_condition (access_fn);
- orig_type = TREE_TYPE (base);
+ tree base = initial_condition (access_fn);
+ tree orig_type = TREE_TYPE (base);
STRIP_USELESS_TYPE_CONVERSION (base);
+ tree off;
split_constant_offset (base, &base, &off);
STRIP_USELESS_TYPE_CONVERSION (base);
/* Fold the MEM_REF offset into the evolutions initial
@@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
base, memoff);
MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
- DR_UNCONSTRAINED_BASE (dr) = true;
+ dri->unconstrained_base = true;
access_fns.safe_push (access_fn);
}
}
@@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
build_int_cst (reference_alias_ptr_type (ref), 0));
}
- DR_BASE_OBJECT (dr) = ref;
- DR_ACCESS_FNS (dr) = access_fns;
+ dri->base_object = ref;
+ dri->access_fns = access_fns;
}
/* Extracts the alias analysis information from the memory reference DR. */
@@ -1463,6 +1460,8 @@ void
free_data_ref (data_reference_p dr)
{
DR_ACCESS_FNS (dr).release ();
+ if (dr->alt_indices.base_object)
+ dr->alt_indices.access_fns.release ();
free (dr);
}
@@ -1497,7 +1496,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
dr_analyze_innermost (&DR_INNERMOST (dr), memref,
nest != NULL ? loop : NULL, stmt);
- dr_analyze_indices (dr, nest, loop);
+ dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
dr_analyze_alias (dr);
if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3066,41 +3065,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b)
TREE_TYPE (TREE_OPERAND (ref_b, 0)));
}
-/* Initialize a data dependence relation between data accesses A and
- B. NB_LOOPS is the number of loops surrounding the references: the
- size of the classic distance/direction vectors. */
+/* Initialize a data dependence relation RES in LOOP_NEST. USE_ALT_INDICES
+ is true when the main indices of A and B were not comparable so we try again
+ with alternate indices computed on an indirect reference. */
struct data_dependence_relation *
-initialize_data_dependence_relation (struct data_reference *a,
- struct data_reference *b,
- vec<loop_p> loop_nest)
+initialize_data_dependence_relation (struct data_dependence_relation *res,
+ vec<loop_p> loop_nest,
+ bool use_alt_indices)
{
- struct data_dependence_relation *res;
+ struct data_reference *a = DDR_A (res);
+ struct data_reference *b = DDR_B (res);
unsigned int i;
- res = XCNEW (struct data_dependence_relation);
- DDR_A (res) = a;
- DDR_B (res) = b;
- DDR_LOOP_NEST (res).create (0);
- DDR_SUBSCRIPTS (res).create (0);
- DDR_DIR_VECTS (res).create (0);
- DDR_DIST_VECTS (res).create (0);
-
- if (a == NULL || b == NULL)
+ struct indices *indices_a = &a->indices;
+ struct indices *indices_b = &b->indices;
+ if (use_alt_indices)
{
- DDR_ARE_DEPENDENT (res) = chrec_dont_know;
- return res;
+ if (TREE_CODE (DR_REF (a)) != MEM_REF)
+ indices_a = &a->alt_indices;
+ if (TREE_CODE (DR_REF (b)) != MEM_REF)
+ indices_b = &b->alt_indices;
}
-
- /* If the data references do not alias, then they are independent. */
- if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
- {
- DDR_ARE_DEPENDENT (res) = chrec_known;
- return res;
- }
-
- unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
- unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
+ unsigned int num_dimensions_a = indices_a->access_fns.length ();
+ unsigned int num_dimensions_b = indices_b->access_fns.length ();
if (num_dimensions_a == 0 || num_dimensions_b == 0)
{
DDR_ARE_DEPENDENT (res) = chrec_dont_know;
@@ -3125,9 +3113,9 @@ initialize_data_dependence_relation (struct data_reference *a,
the a and b accesses have a single ARRAY_REF component reference [0]
but have two subscripts. */
- if (DR_UNCONSTRAINED_BASE (a))
+ if (indices_a->unconstrained_base)
num_dimensions_a -= 1;
- if (DR_UNCONSTRAINED_BASE (b))
+ if (indices_b->unconstrained_base)
num_dimensions_b -= 1;
/* These structures describe sequences of component references in
@@ -3210,6 +3198,10 @@ initialize_data_dependence_relation (struct data_reference *a,
B: [3, 4] (i.e. s.e) */
while (index_a < num_dimensions_a && index_b < num_dimensions_b)
{
+ /* The alternate indices form always has a single dimension
+ with unconstrained base. */
+ gcc_assert (!use_alt_indices);
+
/* REF_A and REF_B must be one of the component access types
allowed by dr_analyze_indices. */
gcc_checking_assert (access_fn_component_p (ref_a));
@@ -3280,11 +3272,12 @@ initialize_data_dependence_relation (struct data_reference *a,
/* See whether FULL_SEQ ends at the base and whether the two bases
are equal. We do not care about TBAA or alignment info so we can
use OEP_ADDRESS_OF to avoid false negatives. */
- tree base_a = DR_BASE_OBJECT (a);
- tree base_b = DR_BASE_OBJECT (b);
+ tree base_a = indices_a->base_object;
+ tree base_b = indices_b->base_object;
bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
&& full_seq.start_b + full_seq.length == num_dimensions_b
- && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
+ && (indices_a->unconstrained_base
+ == indices_b->unconstrained_base)
&& operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
&& (types_compatible_p (TREE_TYPE (base_a),
TREE_TYPE (base_b))
@@ -3323,7 +3316,7 @@ initialize_data_dependence_relation (struct data_reference *a,
both lvalues are distinct from the object's declared type. */
if (same_base_p)
{
- if (DR_UNCONSTRAINED_BASE (a))
+ if (indices_a->unconstrained_base)
full_seq.length += 1;
}
else
@@ -3332,8 +3325,41 @@ initialize_data_dependence_relation (struct data_reference *a,
/* Punt if we didn't find a suitable sequence. */
if (full_seq.length == 0)
{
- DDR_ARE_DEPENDENT (res) = chrec_dont_know;
- return res;
+ if (use_alt_indices
+ || (TREE_CODE (DR_REF (a)) == MEM_REF
+ && TREE_CODE (DR_REF (b)) == MEM_REF)
+ || may_be_nonaddressable_p (DR_REF (a))
+ || may_be_nonaddressable_p (DR_REF (b)))
+ {
+ /* Fully exhausted possibilities. */
+ DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+ return res;
+ }
+
+ /* Try evaluating both DRs as dereferences of pointers. */
+ if (!a->alt_indices.base_object
+ && TREE_CODE (DR_REF (a)) != MEM_REF)
+ {
+ tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
+ build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
+ build_int_cst
+ (reference_alias_ptr_type (DR_REF (a)), 0));
+ dr_analyze_indices (&a->alt_indices, alt_ref,
+ loop_preheader_edge (loop_nest[0]),
+ loop_containing_stmt (DR_STMT (a)));
+ }
+ if (!b->alt_indices.base_object
+ && TREE_CODE (DR_REF (b)) != MEM_REF)
+ {
+ tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
+ build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
+ build_int_cst
+ (reference_alias_ptr_type (DR_REF (b)), 0));
+ dr_analyze_indices (&b->alt_indices, alt_ref,
+ loop_preheader_edge (loop_nest[0]),
+ loop_containing_stmt (DR_STMT (b)));
+ }
+ return initialize_data_dependence_relation (res, loop_nest, true);
}
if (!same_base_p)
@@ -3381,8 +3407,8 @@ initialize_data_dependence_relation (struct data_reference *a,
struct subscript *subscript;
subscript = XNEW (struct subscript);
- SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
- SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
+ SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
+ SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
@@ -3393,6 +3419,40 @@ initialize_data_dependence_relation (struct data_reference *a,
return res;
}
+/* Initialize a data dependence relation between data accesses A and
+ B. NB_LOOPS is the number of loops surrounding the references: the
+ size of the classic distance/direction vectors. */
+
+struct data_dependence_relation *
+initialize_data_dependence_relation (struct data_reference *a,
+ struct data_reference *b,
+ vec<loop_p> loop_nest)
+{
+ data_dependence_relation *res = XCNEW (struct data_dependence_relation);
+ DDR_A (res) = a;
+ DDR_B (res) = b;
+ DDR_LOOP_NEST (res).create (0);
+ DDR_SUBSCRIPTS (res).create (0);
+ DDR_DIR_VECTS (res).create (0);
+ DDR_DIST_VECTS (res).create (0);
+
+ if (a == NULL || b == NULL)
+ {
+ DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+ return res;
+ }
+
+ /* If the data references do not alias, then they are independent. */
+ if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
+ {
+ DDR_ARE_DEPENDENT (res) = chrec_known;
+ return res;
+ }
+
+ return initialize_data_dependence_relation (res, loop_nest, false);
+}
+
+
/* Frees memory used by the conflict function F. */
static void
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 685f33d85ae..74f579c9f3f 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -166,14 +166,19 @@ struct data_reference
and runs to completion. */
bool is_conditional_in_stmt;
+ /* Alias information for the data reference. */
+ struct dr_alias alias;
+
/* Behavior of the memory reference in the innermost loop. */
struct innermost_loop_behavior innermost;
/* Subscripts of this data reference. */
struct indices indices;
- /* Alias information for the data reference. */
- struct dr_alias alias;
+ /* Alternate subscripts initialized lazily and used by data-dependence
+ analysis only when the main indices of two DRs are not comparable.
+ Keep last to keep vec_info_shared::check_datarefs happy. */
+ struct indices alt_indices;
};
#define DR_STMT(DR) (DR)->stmt
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 3aa3e2a6783..20daa31187d 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -507,7 +507,8 @@ vec_info_shared::check_datarefs ()
return;
gcc_assert (datarefs.length () == datarefs_copy.length ());
for (unsigned i = 0; i < datarefs.length (); ++i)
- if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
+ if (memcmp (&datarefs_copy[i], datarefs[i],
+ offsetof (data_reference, alt_indices)) != 0)
gcc_unreachable ();
}
</cut>