Hi Arthur,
Thanks for looking into this!
The flags to compile regexec.c were:
-O3 --target=aarch64-linux-gnu -fgnu89-inline
Clang was configured with (on x86_64-linux-gnu host):
cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=AArch64
Please let me know if the above doesn’t work for you.
Regards,
--
Maxim Kuvyrkov
https://www.linaro.org
> On 29 Sep 2021, at 20:47, Arthur Eubanks <aeubanks(a)google.com> wrote:
>
> Do you know the flags passed to Clang to compile the sources? I tried compiling the preprocessed sources but ran into the below, and couldn't find the flags in any of the logs.
>
> In file included from regexec.c:93:
> In file included from ./perl.h:384:
> In file included from /home/tcwg-buildslave/workspace/tcwg_bmk_0/abe/builds/destdir/x86_64-pc-linux-gnu/aarch64-linux-gnu/libc/usr/include/sys/types.h:144:
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/llvm-install/lib/clang/14.0.0/include/stddef.h:46:27: error: typedef redefinition with different types ('unsigned long' vs 'unsigned long long')
> typedef long unsigned int size_t;
> ^
> 1 error generated.
>
>
>
> And yeah, just moving the code around can cause major performance regressions. I've had other patches do the same for various benchmarks, and there's not much we can do about that if it's actually the root cause. If I can compile the file, I can check whether the optimization actually created worse IR.
>
>
> On Wed, Sep 29, 2021 at 5:59 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote:
> Hi Arthur,
>
> Pre-processed source is in the save-temps tarballs linked below; S_regmatch() is in regexec.i .
>
> The save-temps also have the .s assembly files from before and after your patch, and the only code-gen difference is in the S_reginclass() function; see the attached screenshot #1.
>
> Looking at the profile of S_regmatch(), some of the extra cycles come from the hot loop starting with "cbz w19,..." becoming misaligned: before your patch it started at "2bce10", and after it starts at "2bce6c".
>
> Maybe the added instructions in S_reginclass() pushed the loop in S_regmatch() in an unfortunate way?
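
As a rough check of the misalignment described above, the two loop start addresses can be compared against a power-of-two boundary. This is only a sketch: the 16-byte and 64-byte boundaries are assumptions chosen for illustration (the thread does not say which boundary matters on the Cortex-A57), and the helper name alignment_offset is hypothetical.

```python
def alignment_offset(address: int, boundary: int) -> int:
    """Return how many bytes `address` lies past the previous multiple of `boundary`."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    return address & (boundary - 1)


# Loop start addresses as quoted in the message above.
LOOP_BEFORE = 0x2BCE10
LOOP_AFTER = 0x2BCE6C

for label, addr in (("before", LOOP_BEFORE), ("after", LOOP_AFTER)):
    # 16 and 64 are assumed boundaries, not values taken from the thread.
    print(
        f"{label}: {addr:#x} "
        f"offset within 16 bytes = {alignment_offset(addr, 16)}, "
        f"offset within 64 bytes = {alignment_offset(addr, 64)}"
    )

# Distance the loop start moved, in bytes.
print("shift in bytes:", LOOP_AFTER - LOOP_BEFORE)
```

With these addresses the loop start moves by 92 bytes, which is 23 four-byte A64 instructions, from an address that is a multiple of 16 to one that is 12 bytes past a 16-byte boundary.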
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeubanks(a)google.com> wrote:
>>
>> Could I get the source file with S_regmatch()?
>>
>> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote:
>> Hi Arthur,
>>
>> Your patch seems to be slowing down 400.perlbench by 6%, due to a 14% slowdown of its hot function S_regmatch().
>>
>> Could you please check whether this is easily fixable?
>>
>> Regards,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>> > On 24 Sep 2021, at 15:07, ci_notify(a)linaro.org wrote:
>> >
>> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
>> > Author: Arthur Eubanks <aeubanks(a)google.com>
>> >
>> > [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
>> >
>> > the following benchmarks slowed down by more than 2%:
>> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
>> > - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf samples
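
The percentages in the list above follow directly from the perf sample counts. A minimal sketch of that arithmetic, assuming the slowdown is computed as the relative increase in samples (the helper name slowdown_percent is hypothetical, and the CI scripts may round differently):

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Sample counts as reported in the list above.
total_before, total_after = 9730, 10312
func_before, func_after = 3660, 4188

print(f"400.perlbench: {slowdown_percent(total_before, total_after):.2f}%")
print(f"S_regmatch:    {slowdown_percent(func_before, func_after):.2f}%")

# Share of the benchmark's extra samples that land in S_regmatch.
extra_total = total_after - total_before
extra_func = func_after - func_before
print(f"S_regmatch share of extra samples: {extra_func / extra_total * 100.0:.1f}%")
```

On these numbers, S_regmatch accounts for 528 of the 582 additional samples, so most of the benchmark-level slowdown is concentrated in that one function.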
>> >
>> > The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
>> >
>> > For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
>> > - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> > - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> > - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> >
>> > Configuration:
>> > - Benchmark: SPEC CPU2006
>> > - Toolchain: Clang + Glibc + LLVM Linker
>> > - Version: all components were built from their tip of trunk
>> > - Target: aarch64-linux-gnu
>> > - Compiler flags: -O3
>> > - Hardware: NVidia TX1 4x Cortex-A57
>> >
>> > This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . We plan to add support for SPEC CPU2017 benchmarks and to provide "perf report/annotate" data behind these reports.
>
> <2021-09-29_15-44-27.png><2021-09-29_15-53-20.png>
Progress
* UM-2 [QEMU upstream maintainership]
+ Worked through my code-review backlog
+ Noticed that we never got round to making our emulated GICv3
support redistributors in more than one contiguous region;
this prevents using more than 123 CPUs with the virt board. Sent
out a patchset which adds the necessary handling.
+ Generally trying to tie off loose ends pre-holiday :-)
-- PMM
Identified regression caused by *linux:30f349097897c115345beabeecc5e710b479ff1e*:
commit 30f349097897c115345beabeecc5e710b479ff1e
Merge: 9c566611ac5c f76c87e8c337
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Results regressed to (for first_bad == 30f349097897c115345beabeecc5e710b479ff1e)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21782
# First few build errors in logs:
from (for last_good == 9c566611ac5cc7b45af943632f7a9b1b6a642991)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
29893
# linux build successful:
all
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-release-arm-mainline-allmodconfig
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a…
Reproduce builds:
<cut>
mkdir investigate-linux-30f349097897c115345beabeecc5e710b479ff1e
cd investigate-linux-30f349097897c115345beabeecc5e710b479ff1e
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-release-arm-mainline-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach 30f349097897c115345beabeecc5e710b479ff1e
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 9c566611ac5cc7b45af943632f7a9b1b6a642991
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 30f349097897c115345beabeecc5e710b479ff1e
Merge: 9c566611ac5c f76c87e8c337
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Wed Sep 8 16:38:25 2021 -0700
Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These are mostly ARM cpufreq driver updates, including one new
MediaTek driver that has just passed all of the reviews, with the
addition of a revert of a recent intel_pstate commit, some core
cpufreq changes and a DT-related update of the operating performance
points (OPP) support code.
Specifics:
- Add new cpufreq driver for the MediaTek MT6779 platform called
mediatek-hw along with corresponding DT bindings (Hector.Yuan).
- Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
Gopinath).
- Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
policy flag (Taniya Das).
- Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
Andersson).
- Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
flag (Viresh Kumar).
- Add new cpufreq driver callback to allow drivers to register with
the Energy Model in a consistent way and make several drivers use
it (Viresh Kumar).
- Change the remaining users of the .ready() cpufreq driver callback
to move the code from it elsewhere and drop it from the cpufreq
core (Viresh Kumar).
- Revert recent intel_pstate change adding HWP guaranteed performance
change notification support to it that led to problems, because the
notification in question is triggered prematurely on some systems
(Rafael Wysocki).
- Convert the OPP DT bindings to DT schema and clean them up while at
it (Rob Herring)"
* tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
cpufreq: Remove ready() callback
cpufreq: sh: Remove sh_cpufreq_cpu_ready()
cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
cpufreq: scmi: Use .register_em() to register with energy model
cpufreq: vexpress: Use .register_em() to register with energy model
cpufreq: scpi: Use .register_em() to register with energy model
dt-bindings: opp: Convert to DT schema
dt-bindings: Clean-up OPP binding node names in examples
ARM: dts: omap: Drop references to opp.txt
cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
cpufreq: omap: Use .register_em() to register with energy model
cpufreq: mediatek: Use .register_em() to register with energy model
cpufreq: imx6q: Use .register_em() to register with energy model
...
Documentation/cpu-freq/cpu-drivers.rst | 3 -
.../devicetree/bindings/cpufreq/cpufreq-dt.txt | 2 +-
.../bindings/cpufreq/cpufreq-mediatek-hw.yaml | 70 +++
.../bindings/cpufreq/cpufreq-mediatek.txt | 2 +-
.../devicetree/bindings/cpufreq/cpufreq-st.txt | 6 +-
.../bindings/cpufreq/nvidia,tegra20-cpufreq.txt | 2 +-
.../devicetree/bindings/devfreq/rk3399_dmc.txt | 2 +-
.../devicetree/bindings/gpu/arm,mali-bifrost.yaml | 2 +-
.../devicetree/bindings/gpu/arm,mali-midgard.yaml | 2 +-
.../bindings/interconnect/fsl,imx8m-noc.yaml | 4 +-
.../opp/allwinner,sun50i-h6-operating-points.yaml | 4 +
Documentation/devicetree/bindings/opp/opp-v1.yaml | 51 ++
.../devicetree/bindings/opp/opp-v2-base.yaml | 214 +++++++
Documentation/devicetree/bindings/opp/opp-v2.yaml | 475 ++++++++++++++++
Documentation/devicetree/bindings/opp/opp.txt | 622 ---------------------
Documentation/devicetree/bindings/opp/qcom-opp.txt | 2 +-
.../bindings/opp/ti-omap5-opp-supply.txt | 2 +-
.../devicetree/bindings/power/power-domain.yaml | 2 +-
.../translations/zh_CN/cpu-freq/cpu-drivers.rst | 2 -
arch/arm/boot/dts/omap34xx.dtsi | 1 -
arch/arm/boot/dts/omap36xx.dtsi | 1 -
drivers/base/arch_topology.c | 2 +
drivers/cpufreq/Kconfig.arm | 12 +
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/acpi-cpufreq.c | 14 +-
drivers/cpufreq/cpufreq-dt-platdev.c | 4 +
drivers/cpufreq/cpufreq-dt.c | 3 +-
drivers/cpufreq/cpufreq.c | 17 +-
drivers/cpufreq/imx6q-cpufreq.c | 2 +-
drivers/cpufreq/intel_pstate.c | 39 --
drivers/cpufreq/mediatek-cpufreq-hw.c | 308 ++++++++++
drivers/cpufreq/mediatek-cpufreq.c | 3 +-
drivers/cpufreq/omap-cpufreq.c | 2 +-
drivers/cpufreq/qcom-cpufreq-hw.c | 151 ++++-
drivers/cpufreq/scmi-cpufreq.c | 65 ++-
drivers/cpufreq/scpi-cpufreq.c | 3 +-
drivers/cpufreq/sh-cpufreq.c | 11 -
drivers/cpufreq/vexpress-spc-cpufreq.c | 25 +-
include/linux/cpufreq.h | 75 ++-
39 files changed, 1441 insertions(+), 767 deletions(-)
</cut>
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-release-aarch64-next-allnoconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-release-aarch64-next-allnoconfig
Culprit:
<cut>
commit 8633ef82f101c040427b57d4df7b706261420b94
Author: Javier Martinez Canillas <javierm(a)redhat.com>
Date: Fri Jun 25 15:13:59 2021 +0200
drivers/firmware: consolidate EFI framebuffer setup for all arches
The register_gop_device() function registers an "efi-framebuffer" platform
device to match against the efifb driver, to have an early framebuffer for
EFI platforms.
But there is already support to do exactly the same by the Generic System
Framebuffers (sysfb) driver. This used to be only for X86 but it has been
moved to drivers/firmware and could be reused by other architectures.
Also, besides supporting registering an "efi-framebuffer", this driver can
register a "simple-framebuffer" allowing to use the simple{fb,drm} drivers
on non-X86 EFI platforms. For example, on aarch64 these drivers can only
be used with DT and don't have code to register a "simple-framebuffer"
platform device when booting with EFI.
For these reasons, let's remove the register_gop_device() duplicated code
and instead move the platform specific logic that's there to sysfb driver.
Signed-off-by: Javier Martinez Canillas <javierm(a)redhat.com>
Acked-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210625131359.1804394-1-javi…
</cut>
Results regressed to (for first_bad == 8633ef82f101c040427b57d4df7b706261420b94)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
600
# First few build errors in logs:
# 00:00:38 ld.lld: error: undefined symbol: screen_info
# 00:00:38 make: *** [vmlinux] Error 1
from (for last_good == d391c58271072d0b0fad93c82018d495b2633448)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
601
# linux build successful:
all
# linux boot successful:
boot
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#ff11764…"
Reproduce builds:
<cut>
mkdir investigate-linux-8633ef82f101c040427b57d4df7b706261420b94
cd investigate-linux-8633ef82f101c040427b57d4df7b706261420b94
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach 8633ef82f101c040427b57d4df7b706261420b94
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d391c58271072d0b0fad93c82018d495b2633448
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-aarch64-next…
Full commit (up to 1000 lines):
<cut>
commit 8633ef82f101c040427b57d4df7b706261420b94
Author: Javier Martinez Canillas <javierm(a)redhat.com>
Date: Fri Jun 25 15:13:59 2021 +0200
drivers/firmware: consolidate EFI framebuffer setup for all arches
The register_gop_device() function registers an "efi-framebuffer" platform
device to match against the efifb driver, to have an early framebuffer for
EFI platforms.
But there is already support to do exactly the same by the Generic System
Framebuffers (sysfb) driver. This used to be only for X86 but it has been
moved to drivers/firmware and could be reused by other architectures.
Also, besides supporting registering an "efi-framebuffer", this driver can
register a "simple-framebuffer" allowing to use the simple{fb,drm} drivers
on non-X86 EFI platforms. For example, on aarch64 these drivers can only
be used with DT and don't have code to register a "simple-framebuffer"
platform device when booting with EFI.
For these reasons, let's remove the register_gop_device() duplicated code
and instead move the platform specific logic that's there to sysfb driver.
Signed-off-by: Javier Martinez Canillas <javierm(a)redhat.com>
Acked-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210625131359.1804394-1-javi…
---
arch/arm/include/asm/efi.h | 5 +--
arch/arm64/include/asm/efi.h | 5 +--
arch/riscv/include/asm/efi.h | 5 +--
drivers/firmware/Kconfig | 8 ++--
drivers/firmware/Makefile | 2 +-
drivers/firmware/efi/efi-init.c | 90 ---------------------------------------
drivers/firmware/efi/sysfb_efi.c | 76 ++++++++++++++++++++++++++++++++-
drivers/firmware/sysfb.c | 35 ++++++++++-----
drivers/firmware/sysfb_simplefb.c | 31 ++++++++++----
drivers/gpu/drm/tiny/Kconfig | 4 +-
include/linux/sysfb.h | 26 +++++------
11 files changed, 143 insertions(+), 144 deletions(-)
diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index 9de7ab2ce05d..a6f3b179e8a9 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -17,6 +17,7 @@
#ifdef CONFIG_EFI
void efi_init(void);
+extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt);
int efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md);
int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md);
@@ -52,10 +53,6 @@ void efi_virtmap_unload(void);
struct screen_info *alloc_screen_info(void);
void free_screen_info(struct screen_info *si);
-static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
-{
-}
-
/*
* A reasonable upper bound for the uncompressed kernel size is 32 MBytes,
* so we will reserve that amount of memory. We have no easy way to tell what
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 3578aba9c608..42d673a011c8 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -14,6 +14,7 @@
#ifdef CONFIG_EFI
extern void efi_init(void);
+extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt);
#else
#define efi_init()
#endif
@@ -85,10 +86,6 @@ static inline void free_screen_info(struct screen_info *si)
{
}
-static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
-{
-}
-
#define EFI_ALLOC_ALIGN SZ_64K
/*
diff --git a/arch/riscv/include/asm/efi.h b/arch/riscv/include/asm/efi.h
index 6d98cd999680..7a8f0d45b13a 100644
--- a/arch/riscv/include/asm/efi.h
+++ b/arch/riscv/include/asm/efi.h
@@ -13,6 +13,7 @@
#ifdef CONFIG_EFI
extern void efi_init(void);
+extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt);
#else
#define efi_init()
#endif
@@ -39,10 +40,6 @@ static inline void free_screen_info(struct screen_info *si)
{
}
-static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
-{
-}
-
void efi_virtmap_load(void);
void efi_virtmap_unload(void);
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 71f3d97f0c39..af6719cc576b 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -254,9 +254,9 @@ config QCOM_SCM_DOWNLOAD_MODE_DEFAULT
config SYSFB
bool
default y
- depends on X86 || COMPILE_TEST
+ depends on X86 || ARM || ARM64 || RISCV || COMPILE_TEST
-config X86_SYSFB
+config SYSFB_SIMPLEFB
bool "Mark VGA/VBE/EFI FB as generic system framebuffer"
depends on SYSFB
help
@@ -264,10 +264,10 @@ config X86_SYSFB
bootloader or kernel can show basic video-output during boot for
user-guidance and debugging. Historically, x86 used the VESA BIOS
Extensions and EFI-framebuffers for this, which are mostly limited
- to x86.
+ to x86 BIOS or EFI systems.
This option, if enabled, marks VGA/VBE/EFI framebuffers as generic
framebuffers so the new generic system-framebuffer drivers can be
- used on x86. If the framebuffer is not compatible with the generic
+ used instead. If the framebuffer is not compatible with the generic
modes, it is advertised as fallback platform framebuffer so legacy
drivers like efifb, vesafb and uvesafb can pick it up.
If this option is not selected, all system framebuffers are always
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index ad78f78ffa8d..6ac637e422b9 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -19,7 +19,7 @@ obj-$(CONFIG_RASPBERRYPI_FIRMWARE) += raspberrypi.o
obj-$(CONFIG_FW_CFG_SYSFS) += qemu_fw_cfg.o
obj-$(CONFIG_QCOM_SCM) += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
obj-$(CONFIG_SYSFB) += sysfb.o
-obj-$(CONFIG_X86_SYSFB) += sysfb_simplefb.o
+obj-$(CONFIG_SYSFB_SIMPLEFB) += sysfb_simplefb.o
obj-$(CONFIG_TI_SCI_PROTOCOL) += ti_sci.o
obj-$(CONFIG_TRUSTED_FOUNDATIONS) += trusted_foundations.o
obj-$(CONFIG_TURRIS_MOX_RWTM) += turris-mox-rwtm.o
diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
index a552a08a1741..b19ce1a83f91 100644
--- a/drivers/firmware/efi/efi-init.c
+++ b/drivers/firmware/efi/efi-init.c
@@ -275,93 +275,3 @@ void __init efi_init(void)
}
#endif
}
-
-static bool efifb_overlaps_pci_range(const struct of_pci_range *range)
-{
- u64 fb_base = screen_info.lfb_base;
-
- if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
- fb_base |= (u64)(unsigned long)screen_info.ext_lfb_base << 32;
-
- return fb_base >= range->cpu_addr &&
- fb_base < (range->cpu_addr + range->size);
-}
-
-static struct device_node *find_pci_overlap_node(void)
-{
- struct device_node *np;
-
- for_each_node_by_type(np, "pci") {
- struct of_pci_range_parser parser;
- struct of_pci_range range;
- int err;
-
- err = of_pci_range_parser_init(&parser, np);
- if (err) {
- pr_warn("of_pci_range_parser_init() failed: %d\n", err);
- continue;
- }
-
- for_each_of_pci_range(&parser, &range)
- if (efifb_overlaps_pci_range(&range))
- return np;
- }
- return NULL;
-}
-
-/*
- * If the efifb framebuffer is backed by a PCI graphics controller, we have
- * to ensure that this relation is expressed using a device link when
- * running in DT mode, or the probe order may be reversed, resulting in a
- * resource reservation conflict on the memory window that the efifb
- * framebuffer steals from the PCIe host bridge.
- */
-static int efifb_add_links(struct fwnode_handle *fwnode)
-{
- struct device_node *sup_np;
-
- sup_np = find_pci_overlap_node();
-
- /*
- * If there's no PCI graphics controller backing the efifb, we are
- * done here.
- */
- if (!sup_np)
- return 0;
-
- fwnode_link_add(fwnode, of_fwnode_handle(sup_np));
- of_node_put(sup_np);
-
- return 0;
-}
-
-static const struct fwnode_operations efifb_fwnode_ops = {
- .add_links = efifb_add_links,
-};
-
-static struct fwnode_handle efifb_fwnode;
-
-static int __init register_gop_device(void)
-{
- struct platform_device *pd;
- int err;
-
- if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI)
- return 0;
-
- pd = platform_device_alloc("efi-framebuffer", 0);
- if (!pd)
- return -ENOMEM;
-
- if (IS_ENABLED(CONFIG_PCI)) {
- fwnode_init(&efifb_fwnode, &efifb_fwnode_ops);
- pd->dev.fwnode = &efifb_fwnode;
- }
-
- err = platform_device_add_data(pd, &screen_info, sizeof(screen_info));
- if (err)
- return err;
-
- return platform_device_add(pd);
-}
-subsys_initcall(register_gop_device);
diff --git a/drivers/firmware/efi/sysfb_efi.c b/drivers/firmware/efi/sysfb_efi.c
index 9f035b15501c..f51865e1b876 100644
--- a/drivers/firmware/efi/sysfb_efi.c
+++ b/drivers/firmware/efi/sysfb_efi.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
- * Generic System Framebuffers on x86
+ * Generic System Framebuffers
* Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com>
*
* EFI Quirks Copyright (c) 2006 Edgar Hucek <gimli(a)dark-green.com>
@@ -19,7 +19,9 @@
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/of_address.h>
#include <linux/pci.h>
+#include <linux/platform_device.h>
#include <linux/screen_info.h>
#include <linux/sysfb.h>
#include <video/vga.h>
@@ -267,7 +269,72 @@ static const struct dmi_system_id efifb_dmi_swap_width_height[] __initconst = {
{},
};
-__init void sysfb_apply_efi_quirks(void)
+static bool efifb_overlaps_pci_range(const struct of_pci_range *range)
+{
+ u64 fb_base = screen_info.lfb_base;
+
+ if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
+ fb_base |= (u64)(unsigned long)screen_info.ext_lfb_base << 32;
+
+ return fb_base >= range->cpu_addr &&
+ fb_base < (range->cpu_addr + range->size);
+}
+
+static struct device_node *find_pci_overlap_node(void)
+{
+ struct device_node *np;
+
+ for_each_node_by_type(np, "pci") {
+ struct of_pci_range_parser parser;
+ struct of_pci_range range;
+ int err;
+
+ err = of_pci_range_parser_init(&parser, np);
+ if (err) {
+ pr_warn("of_pci_range_parser_init() failed: %d\n", err);
+ continue;
+ }
+
+ for_each_of_pci_range(&parser, &range)
+ if (efifb_overlaps_pci_range(&range))
+ return np;
+ }
+ return NULL;
+}
+
+/*
+ * If the efifb framebuffer is backed by a PCI graphics controller, we have
+ * to ensure that this relation is expressed using a device link when
+ * running in DT mode, or the probe order may be reversed, resulting in a
+ * resource reservation conflict on the memory window that the efifb
+ * framebuffer steals from the PCIe host bridge.
+ */
+static int efifb_add_links(struct fwnode_handle *fwnode)
+{
+ struct device_node *sup_np;
+
+ sup_np = find_pci_overlap_node();
+
+ /*
+ * If there's no PCI graphics controller backing the efifb, we are
+ * done here.
+ */
+ if (!sup_np)
+ return 0;
+
+ fwnode_link_add(fwnode, of_fwnode_handle(sup_np));
+ of_node_put(sup_np);
+
+ return 0;
+}
+
+static const struct fwnode_operations efifb_fwnode_ops = {
+ .add_links = efifb_add_links,
+};
+
+static struct fwnode_handle efifb_fwnode;
+
+__init void sysfb_apply_efi_quirks(struct platform_device *pd)
{
if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI ||
!(screen_info.capabilities & VIDEO_CAPABILITY_SKIP_QUIRKS))
@@ -281,4 +348,9 @@ __init void sysfb_apply_efi_quirks(void)
screen_info.lfb_height = temp;
screen_info.lfb_linelength = 4 * screen_info.lfb_width;
}
+
+ if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI && IS_ENABLED(CONFIG_PCI)) {
+ fwnode_init(&efifb_fwnode, &efifb_fwnode_ops);
+ pd->dev.fwnode = &efifb_fwnode;
+ }
}
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 1337515963d5..2bfbb05f7d89 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -1,11 +1,11 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
- * Generic System Framebuffers on x86
+ * Generic System Framebuffers
* Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com>
*/
/*
- * Simple-Framebuffer support for x86 systems
+ * Simple-Framebuffer support
* Create a platform-device for any available boot framebuffer. The
* simple-framebuffer platform device is already available on DT systems, so
* this module parses the global "screen_info" object and creates a suitable
@@ -16,12 +16,12 @@
* to pick these devices up without messing with simple-framebuffer drivers.
* The global "screen_info" is still valid at all times.
*
- * If CONFIG_X86_SYSFB is not selected, we never register "simple-framebuffer"
+ * If CONFIG_SYSFB_SIMPLEFB is not selected, never register "simple-framebuffer"
* platform devices, but only use legacy framebuffer devices for
* backwards compatibility.
*
* TODO: We set the dev_id field of all platform-devices to 0. This allows
- * other x86 OF/DT parsers to create such devices, too. However, they must
+ * other OF/DT parsers to create such devices, too. However, they must
* start at offset 1 for this to work.
*/
@@ -43,12 +43,10 @@ static __init int sysfb_init(void)
bool compatible;
int ret;
- sysfb_apply_efi_quirks();
-
/* try to create a simple-framebuffer device */
- compatible = parse_mode(si, &mode);
+ compatible = sysfb_parse_mode(si, &mode);
if (compatible) {
- ret = create_simplefb(si, &mode);
+ ret = sysfb_create_simplefb(si, &mode);
if (!ret)
return 0;
}
@@ -61,9 +59,24 @@ static __init int sysfb_init(void)
else
name = "platform-framebuffer";
- pd = platform_device_register_resndata(NULL, name, 0,
- NULL, 0, si, sizeof(*si));
- return PTR_ERR_OR_ZERO(pd);
+ pd = platform_device_alloc(name, 0);
+ if (!pd)
+ return -ENOMEM;
+
+ sysfb_apply_efi_quirks(pd);
+
+ ret = platform_device_add_data(pd, si, sizeof(*si));
+ if (ret)
+ goto err;
+
+ ret = platform_device_add(pd);
+ if (ret)
+ goto err;
+
+ return 0;
+err:
+ platform_device_put(pd);
+ return ret;
}
/* must execute after PCI subsystem for EFI quirks */
diff --git a/drivers/firmware/sysfb_simplefb.c b/drivers/firmware/sysfb_simplefb.c
index df892444ea17..b86761904949 100644
--- a/drivers/firmware/sysfb_simplefb.c
+++ b/drivers/firmware/sysfb_simplefb.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
- * Generic System Framebuffers on x86
+ * Generic System Framebuffers
* Copyright (c) 2012-2013 David Herrmann <dh.herrmann(a)gmail.com>
*/
@@ -23,9 +23,9 @@
static const char simplefb_resname[] = "BOOTFB";
static const struct simplefb_format formats[] = SIMPLEFB_FORMATS;
-/* try parsing x86 screen_info into a simple-framebuffer mode struct */
-__init bool parse_mode(const struct screen_info *si,
- struct simplefb_platform_data *mode)
+/* try parsing screen_info into a simple-framebuffer mode struct */
+__init bool sysfb_parse_mode(const struct screen_info *si,
+ struct simplefb_platform_data *mode)
{
const struct simplefb_format *f;
__u8 type;
@@ -57,13 +57,14 @@ __init bool parse_mode(const struct screen_info *si,
return false;
}
-__init int create_simplefb(const struct screen_info *si,
- const struct simplefb_platform_data *mode)
+__init int sysfb_create_simplefb(const struct screen_info *si,
+ const struct simplefb_platform_data *mode)
{
struct platform_device *pd;
struct resource res;
u64 base, size;
u32 length;
+ int ret;
/*
* If the 64BIT_BASE capability is set, ext_lfb_base will contain the
@@ -105,7 +106,19 @@ __init int create_simplefb(const struct screen_info *si,
if (res.end <= res.start)
return -EINVAL;
- pd = platform_device_register_resndata(NULL, "simple-framebuffer", 0,
- &res, 1, mode, sizeof(*mode));
- return PTR_ERR_OR_ZERO(pd);
+ pd = platform_device_alloc("simple-framebuffer", 0);
+ if (!pd)
+ return -ENOMEM;
+
+ sysfb_apply_efi_quirks(pd);
+
+ ret = platform_device_add_resources(pd, &res, 1);
+ if (ret)
+ return ret;
+
+ ret = platform_device_add_data(pd, mode, sizeof(*mode));
+ if (ret)
+ return ret;
+
+ return platform_device_add(pd);
}
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
index 5593128eeff9..d31be274a2bd 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -64,8 +64,8 @@ config DRM_SIMPLEDRM
buffer, size, and display format must be provided via device tree,
UEFI, VESA, etc.
- On x86 and compatible, you should also select CONFIG_X86_SYSFB to
- use UEFI and VESA framebuffers.
+ On x86 BIOS or UEFI systems, you should also select SYSFB_SIMPLEFB
+ to use UEFI and VESA framebuffers.
config TINYDRM_HX8357D
tristate "DRM support for HX8357D display panels"
diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h
index 3e5355769dc3..b0dcfa26d07b 100644
--- a/include/linux/sysfb.h
+++ b/include/linux/sysfb.h
@@ -58,37 +58,37 @@ struct efifb_dmi_info {
#ifdef CONFIG_EFI
extern struct efifb_dmi_info efifb_dmi_list[];
-void sysfb_apply_efi_quirks(void);
+void sysfb_apply_efi_quirks(struct platform_device *pd);
#else /* CONFIG_EFI */
-static inline void sysfb_apply_efi_quirks(void)
+static inline void sysfb_apply_efi_quirks(struct platform_device *pd)
{
}
#endif /* CONFIG_EFI */
-#ifdef CONFIG_X86_SYSFB
+#ifdef CONFIG_SYSFB_SIMPLEFB
-bool parse_mode(const struct screen_info *si,
- struct simplefb_platform_data *mode);
-int create_simplefb(const struct screen_info *si,
- const struct simplefb_platform_data *mode);
+bool sysfb_parse_mode(const struct screen_info *si,
+ struct simplefb_platform_data *mode);
+int sysfb_create_simplefb(const struct screen_info *si,
+ const struct simplefb_platform_data *mode);
-#else /* CONFIG_X86_SYSFB */
+#else /* CONFIG_SYSFB_SIMPLE */
-static inline bool parse_mode(const struct screen_info *si,
- struct simplefb_platform_data *mode)
+static inline bool sysfb_parse_mode(const struct screen_info *si,
+ struct simplefb_platform_data *mode)
{
return false;
}
-static inline int create_simplefb(const struct screen_info *si,
- const struct simplefb_platform_data *mode)
+static inline int sysfb_create_simplefb(const struct screen_info *si,
+ const struct simplefb_platform_data *mode)
{
return -EINVAL;
}
-#endif /* CONFIG_X86_SYSFB */
+#endif /* CONFIG_SYSFB_SIMPLE */
#endif /* _LINUX_SYSFB_H */
</cut>
Hi Greg,
This appears to have been a fluke. Boot-testing succeeded before the merge and failed after. Boot-testing on allmodconfig doesn’t seem to be stable, so we are going to disable it.
Regards,
--
Maxim Kuvyrkov
https://www.linaro.org
> On 18 Aug 2021, at 08:38, Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> wrote:
>
> On Wed, Aug 18, 2021 at 05:22:07AM +0000, ci_notify(a)linaro.org wrote:
>> Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-lts-allmodconfig. So far, this commit has regressed CI configurations:
>> - tcwg_kernel/llvm-master-aarch64-lts-allmodconfig
>>
>> Culprit:
>> <cut>
>> commit 132a8267adabd645476b542b3b132c1b91988fe8
>> Author: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>> Date: Thu Aug 12 13:22:21 2021 +0200
>>
>> Linux 5.10.58
>
> <snip>
>
> And what am I supposed to do with this information?
>
[TCWG CI] Regression caused by binutils: [gdb/testsuite] Add gdb.testsuite/dump-system-info.exp:
commit b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
Author: Tom de Vries <tdevries(a)suse.de>
[gdb/testsuite] Add gdb.testsuite/dump-system-info.exp
Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_VECT_mthumb artifacts/build-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1/results_id:
1
from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-8
# build_abe newlib:
-6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard:
-5
# true:
0
# benchmark -- -O3_VECT_mthumb artifacts/build-baseline/results_id:
1
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-master-arm_eabi-coremark-O3_VECT
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Reproduce builds:
<cut>
mkdir investigate-binutils-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
cd investigate-binutils-b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/
cd binutils
# Reproduce first_bad build
git checkout --detach b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 3814a9e1fe77c01c7e872c25afa198537d4ac780
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit b4e4386a2e58ba6ce8d02b952f1bc6ceb8fc95d1
Author: Tom de Vries <tdevries(a)suse.de>
Date: Fri Sep 24 12:39:14 2021 +0200
[gdb/testsuite] Add gdb.testsuite/dump-system-info.exp
When interpreting the testsuite results, it's often relevant what kind of
machine the testsuite ran on. On a local machine one can just do
/proc/cpuinfo, but in case of running tests using a remote system
that distributes test runs to other remote systems that are not directly
accessible, that's not possible.
Fix this by dumping /proc/cpuinfo into the gdb.log, as well as lsb_release -a
and uname -a.
We could do this at the start of each test run, by putting it into unix.exp
or some such. However, this might be too verbose, so we choose to put it into
its own test-case, such that it get triggered in a full testrun, but not when
running one or a subset of tests.
We put the test-case into the gdb.testsuite directory, which is currently the
only place in the testsuite where we do not test gdb. [ Though perhaps this
could be put into a new gdb.info directory, since the test-case doesn't
actually test the testsuite. ]
Tested on x86_64-linux.
---
gdb/testsuite/gdb.testsuite/dump-system-info.exp | 48 ++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/gdb/testsuite/gdb.testsuite/dump-system-info.exp b/gdb/testsuite/gdb.testsuite/dump-system-info.exp
new file mode 100644
index 00000000000..bf181469bd5
--- /dev/null
+++ b/gdb/testsuite/gdb.testsuite/dump-system-info.exp
@@ -0,0 +1,48 @@
+# Copyright 2021 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# The purpose of this test-case is to dump /proc/cpuinfo and similar system
+# info into gdb.log.
+
+# Check if /proc/cpuinfo is available.
+set res [remote_exec target "test -r /proc/cpuinfo"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 && $output == "" } {
+ verbose -log "Cpuinfo available, dumping:"
+ remote_exec target "cat /proc/cpuinfo"
+} else {
+ verbose -log "Cpuinfo not available"
+}
+
+set res [remote_exec target "lsb_release -a"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 } {
+ verbose -log "lsb_release -a availabe, dumping:\n$output"
+} else {
+ verbose -log "lsb_release -a not available"
+}
+
+set res [remote_exec target "uname -a"]
+set status [lindex $res 0]
+set output [lindex $res 1]
+
+if { $status == 0 } {
+ verbose -log "uname -a availabe, dumping:\n$output"
+} else {
+ verbose -log "uname -a not available"
+}
</cut>
After llvm commit e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
Author: Erich Keane <erich.keane(a)intel.com>
Fix test from 8dd42f, capitalization in test
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 3% from 10973 to 11249 perf samples
- 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 12% from 1446 to 1619 perf samples
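The percentages above can be re-derived from the raw perf sample counts. The following is a minimal sketch, assuming the report rounds the relative change to the nearest whole percent; the helper name pct_change is hypothetical and is not part of the TCWG CI scripts.

```shell
# pct_change OLD NEW
# Print the relative change from OLD to NEW, rounded to a whole percent.
# (Hypothetical helper; the rounding rule is an assumption.)
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new - old) * 100 / old }'
}

pct_change 10973 11249   # 464.h264ref, whole benchmark
pct_change 1446 1619     # FastFullPelBlockMotionSearch only
```

The same helper can be pointed at any other pair of before/after numbers in these reports, such as code-size figures in bytes.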
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . We plan to add support for SPEC CPU2017 benchmarks and to provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
cd investigate-llvm-e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 77d200a546136c2855063613ff4bca1f682fb23a
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
Author: Erich Keane <erich.keane(a)intel.com>
Date: Fri Sep 24 10:24:17 2021 -0700
Fix test from 8dd42f, capitalization in test
---
clang/test/CXX/drs/dr17xx.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/clang/test/CXX/drs/dr17xx.cpp b/clang/test/CXX/drs/dr17xx.cpp
index 42303c83ae3c..c8648908ebda 100644
--- a/clang/test/CXX/drs/dr17xx.cpp
+++ b/clang/test/CXX/drs/dr17xx.cpp
@@ -129,7 +129,7 @@ namespace dr1778 { // dr1778: 9
namespace dr1762 { // dr1762: 14
#if __cplusplus >= 201103L
float operator ""_E(const char *);
- // expected-error@+2 {{invalid suffix on literal; c++11 requires a space between literal and identifier}}
+ // expected-error@+2 {{invalid suffix on literal; C++11 requires a space between literal and identifier}}
// expected-warning@+1 {{user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator}}
float operator ""E(const char *);
#endif
</cut>
Thanks, Stanislav,
FWIW, it will probably be easier for you to just rebuild the compiler; it is an x86_64-linux-gnu -> arm-linux-gnueabihf cross. The build log is at [1].
cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=ARM
Then compile the pre-processed source with plain -O2 or -O3 optimisation settings.
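As a rough sketch of that last step, the helper below prints (rather than runs) clang invocations that would produce assembly and textual LLVM IR from a pre-processed file. The helper name, the clang path (derived from the CMAKE_INSTALL_PREFIX above) and the file name fast_algorithms.i are assumptions for illustration; the real save-temps may need additional flags.

```shell
# print_compile_cmds CLANG SRC OPT
# Print two commands: one emitting assembly, one emitting textual LLVM IR
# (the .ll file is something llc can consume). Nothing is executed here.
# Hypothetical helper; paths and file names are assumptions.
print_compile_cmds() {
    clang=$1 src=$2 opt=$3
    base=${src%.i}   # strip the .i suffix to name the outputs
    printf '%s --target=arm-linux-gnueabihf %s -S %s -o %s.s\n' \
        "$clang" "$opt" "$src" "$base"
    printf '%s --target=arm-linux-gnueabihf %s -S -emit-llvm %s -o %s.ll\n' \
        "$clang" "$opt" "$src" "$base"
}

print_compile_cmds ../llvm-install/bin/clang fast_algorithms.i -O2
```

Once the printed commands look right for your tree, piping the output to sh runs them.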
[1] https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Regards,
--
Maxim Kuvyrkov
https://www.linaro.org
> On 24 Sep 2021, at 20:30, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
>
> [AMD Official Use Only]
>
> I have reverted the whole change. There was yet another perf regression report.
>
> Stas
>
> From: Mekhanoshin, Stanislav
> Sent: Thursday, September 23, 2021 11:48
> To: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
>
> Thanks. I see the reload. There should not be extra pressure, since the whole idea is to reduce pressure. However, I see more spills in that specific file, fast_algorithms.s, if I get it right.
> Can I get the IR for it? Something to feed llc.
>
> Stas
>
> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> Sent: Thursday, September 23, 2021 2:31
> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
>
> [CAUTION: External Email]
>
> Thanks, Stanislav.
>
> I’ve looked into the profile dumps, and 456.hmmer’s hot loop gets several additional reloads. E.g., "ldr r1, [sp, #84]" generates 203 additional samples, which translates into 20 seconds of time just for that one instruction.
>
> See the attached profile dumps and the screenshot with the hot loop highlighted.
>
> Maybe your patch increases register pressure too much?
>
> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
> > On 22 Sep 2021, at 22:35, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >
> > [AMD Official Use Only]
> >
> > There are actually a couple of things worth trying if that is easy:
> >
> > https://reviews.llvm.org/D109077
> > https://reviews.llvm.org/differential/diff/374324/
> >
> > Both may slightly change spill weights and then spilling pattern.
> >
> > Stas
> >
> > -----Original Message-----
> > From: Mekhanoshin, Stanislav
> > Sent: Wednesday, September 22, 2021 12:09
> > To: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> > Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> > Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >
> > I assume some of the newly rematerialized instructions caused perf drops. Probably some very specific ones. I would appreciate it if you could point them out to me.
> > In addition I believe I would need to have a linked or optimized bitcode to feed into llc.
> >
> > Stas
> >
> > -----Original Message-----
> > From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> > Sent: Wednesday, September 22, 2021 12:06
> > To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> > Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> > Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >
> > [CAUTION: External Email]
> >
> > Hi Stanislav,
> >
> > That's fair; I or someone from Linaro will try to analyze this and follow up here.
> >
> > On a more general note, what info would you like to see in these benchmarking regression reports?
> >
> > Thanks,
> >
> > --
> > Maxim Kuvyrkov
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >
> >
> >> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >>
> >> [AMD Official Use Only]
> >>
> >> Hm... I'd really like to help, but I do not think I can do anything with megabytes of code in an asm which I do not understand and tons of differences in 48 asm files.
> >> What I can see there is less spilling code overall, which was the intent in the first place: hmmer has 4 fewer spill opcodes overall and sphinx has 27 fewer.
> >> I doubt I could say much more without someone pointing to the actual root cause.
> >>
> >> Stas
> >>
> >> -----Original Message-----
> >> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> >> Sent: Wednesday, September 22, 2021 5:16
> >> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> >> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> >> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >>
> >> [CAUTION: External Email]
> >>
> >> Hi Stanislav,
> >>
> >> Attached is a tarball with -save-temps output (pre-processed source and generated assembly) for first-bad run (your commit) and last-good run (immediate parent of your commit).
> >>
> >> --
> >> Maxim Kuvyrkov
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >>
> >>> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com> wrote:
> >>>
> >>> [AMD Official Use Only]
> >>>
> >>> Thanks for letting me know. Some regressions are inevitable; however, do you happen to have any analysis and dumps? I myself do not understand ARM ISA well...
> >>>
> >>> Stas
> >>>
> >>> -----Original Message-----
> >>> From: Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org>
> >>> Sent: Wednesday, September 15, 2021 5:52
> >>> To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin(a)amd.com>
> >>> Cc: linaro-toolchain <linaro-toolchain(a)lists.linaro.org>
> >>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses
> >>>
> >>> [CAUTION: External Email]
> >>>
> >>> Hi Stanislav,
> >>>
> >>> FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit ARM at -O2 and -O3 optimization levels.
> >>>
> >>> --
> >>> Maxim Kuvyrkov
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linar…
> >>>
> >>
> >>
> >
> <image001.png>
I’m trying to import these files into DS-5. After unzipping the files, they still do not show up in the DS-5 search. Below is the error that I keep receiving:
After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Avoid invalid loop transformations in jump threading registry.
the following benchmarks grew in size by more than 1%:
- 450.soplex grew in size by 2% from 207260 to 211436 bytes
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . We plan to add support for SPEC CPU2017 benchmarks and to provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 23 10:59:24 2021 +0200
Avoid invalid loop transformations in jump threading registry.
My upcoming improvements to the forward jump threader make it thread
more aggressively. In investigating some "regressions", I noticed
that it has always allowed threading through empty latches and across
loop boundaries. As we have discussed recently, this should be avoided
until after loop optimizations have run their course.
Note that this wasn't much of a problem before because DOM/VRP
couldn't find these opportunities, but with a smarter solver, we trip
over them more easily.
Because the forward threader doesn't have an independent localized cost
model like the new threader (profitable_path_p), it is difficult to
catch these things at discovery. However, we can catch them at
registration time, with the added benefit that all the threaders
(forward and backward) can share the handcuffs.
This patch is an adaptation of what we do in the backward threader, but
it is not meant to catch everything we do there, as some of the
restrictions there are due to limitations of the different block
copiers (for example, the generic copier does not re-use existing
threading paths).
We could ideally remove the now redundant bits in profitable_path_p, but
I would prefer not to for two reasons. First, the backward threader uses
profitable_path_p as it discovers paths to avoid discovering paths in
unprofitable directions. Second, I would like to merge all the forward
cost restrictions into the profitability class in the backward threader,
not the other way around. Alas, that reshuffling will have to wait for
the next release.
As usual, there are quite a few tests that needed adjustments. It seems
we were quite happily threading improper scenarios. With most of them,
as can be seen in pr77445-2.c, we're merely shifting the threading to
after loop optimizations.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
New.
(jt_path_registry::register_jump_thread): Call
cancel_invalid_paths.
* tree-ssa-threadupdate.h (class jt_path_registry): Add
cancel_invalid_paths.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/20030714-2.c: Adjust.
* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
* gcc.dg/vect/bb-slp-16.c: Adjust.
---
gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c | 7 ++-
gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 19 ++++---
gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 4 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 4 +-
gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 7 ---
gcc/tree-ssa-threadupdate.c | 67 ++++++++++++++++++-----
gcc/tree-ssa-threadupdate.h | 1 +
8 files changed, 78 insertions(+), 35 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
index eb663f2ff5b..9585ff11307 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
@@ -32,7 +32,8 @@ get_alias_set (t)
}
}
-/* There should be exactly three IF conditionals if we thread jumps
- properly. */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+ properly. There used to be 3, but one thread was crossing
+ loops. */
+/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index e1464e21170..922a331b217 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */
extern int status, pt;
extern int count;
@@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a)
pt--;
}
-/* There are 4 jump threading opportunities, all of which will be
- realized, which will eliminate testing of FLAG, completely. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
+/* There are 2 jump threading opportunities (which don't cross loops),
+ all of which will be realized, which will eliminate testing of
+ FLAG, completely. */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */
-/* There should be no assignments or references to FLAG, verify they're
- eliminated as early as possible. */
-/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
+/* We used to remove references to FLAG by DCE2, but this was
+ depending on early threaders threading through loop boundaries
+ (which we shouldn't do). However, the late threading passes, which
+ run after loop optimizations , can successfully eliminate the
+ references to FLAG. Verify that ther are no references by the late
+ threading passes. */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index f9fc212f49e..01a0f1f197d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
to change decisions in switch expansion which in turn can expose new
jump threading opportunities. Skip the later tests on aarch64. */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 60d4f76f076..2d78d045516 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -21,5 +21,7 @@
condition.
All the cases are picked up by VRP1 as jump threads. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
+
+/* There used to be 6 jump threads found by thread1, but they all
+ depended on threading through distinct loops in ethread. */
/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index e3d4b311c03..16abcde5053 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,8 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 664e93e9b60..e68a9b62535 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,8 +1,5 @@
/* { dg-require-effective-target vect_int } */
-/* See note below as to why we disable threading. */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
-
#include <stdarg.h>
#include "tree-vect.h"
@@ -30,10 +27,6 @@ main1 (int dummy)
*pout++ = *pin++ + a;
*pout++ = *pin++ + a;
*pout++ = *pin++ + a;
- /* In some architectures like ppc64, jump threading may thread
- the iteration where i==0 such that we no longer optimize the
- BB. Another alternative to disable jump threading would be
- to wrap the read from `i' into a function returning i. */
if (arr[i] = i)
a = i;
else
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index baac11280fa..2b9b8f81274 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
return retval;
}
+bool
+jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
+{
+ gcc_checking_assert (!path.is_empty ());
+ edge taken_edge = path[path.length () - 1]->e;
+ loop_p loop = taken_edge->src->loop_father;
+ bool seen_latch = false;
+ bool path_crosses_loops = false;
+
+ for (unsigned int i = 0; i < path.length (); i++)
+ {
+ edge e = path[i]->e;
+
+ if (e == NULL)
+ {
+ // NULL outgoing edges on a path can happen for jumping to a
+ // constant address.
+ cancel_thread (&path, "Found NULL edge in jump threading path");
+ return true;
+ }
+
+ if (loop->latch == e->src || loop->latch == e->dest)
+ seen_latch = true;
+
+ // The first entry represents the block with an outgoing edge
+ // that we will redirect to the jump threading path. Thus we
+ // don't care about that block's loop father.
+ if ((i > 0 && e->src->loop_father != loop)
+ || e->dest->loop_father != loop)
+ path_crosses_loops = true;
+
+ if (flag_checking && !m_backedge_threads)
+ gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
+ }
+
+ if (cfun->curr_properties & PROP_loop_opts_done)
+ return false;
+
+ if (seen_latch && empty_block_p (loop->latch))
+ {
+ cancel_thread (&path, "Threading through latch before loop opts "
+ "would create non-empty latch");
+ return true;
+ }
+ if (path_crosses_loops)
+ {
+ cancel_thread (&path, "Path crosses loops");
+ return true;
+ }
+ return false;
+}
+
/* Register a jump threading opportunity. We queue up all the jump
threading opportunities discovered by a pass and update the CFG
and SSA form all at once.
@@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
return false;
}
- /* First make sure there are no NULL outgoing edges on the jump threading
- path. That can happen for jumping to a constant address. */
- for (unsigned int i = 0; i < path->length (); i++)
- {
- if ((*path)[i]->e == NULL)
- {
- cancel_thread (path, "Found NULL edge in jump threading path");
- return false;
- }
-
- if (flag_checking && !m_backedge_threads)
- gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
- }
+ if (cancel_invalid_paths (*path))
+ return false;
if (dump_file && (dump_flags & TDF_DETAILS))
dump_jump_thread_path (dump_file, *path, true);
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 8b48a671212..d68795c9f27 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -75,6 +75,7 @@ protected:
unsigned long m_num_threaded_edges;
private:
virtual bool update_cfg (bool peel_loop_headers) = 0;
+ bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
jump_thread_path_allocator m_allocator;
// True if threading through back edges is allowed. This is only
// allowed in the generic copier in the backward threader.
</cut>
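For reference, the order of checks in cancel_invalid_paths () above can be modelled outside of GCC roughly as follows. This is only a simplified sketch, not GCC code: Loop, Block, Edge, Verdict and classify_path are hypothetical stand-ins for loop_p, basic_block, edge, the cancel_thread () reasons and the new member function, and the latch_is_empty field stands in for empty_block_p (loop->latch). Unlike the patch, the sketch looks for null edges in a separate pass before it reads the last edge, so it never dereferences one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for GCC's loop, basic_block and edge types.
struct Block;
struct Loop
{
  Block *latch;         // the loop's latch block, or nullptr
  bool latch_is_empty;  // stands in for empty_block_p (loop->latch)
};
struct Block
{
  Loop *loop_father;    // innermost loop containing this block
};
struct Edge
{
  Block *src;
  Block *dest;
};

// Why a path is rejected; Keep means the path may be registered.
enum class Verdict { Keep, NullEdge, ThroughLatch, CrossesLoops };

// Model of the checks in the patch's cancel_invalid_paths ().
// LOOP_OPTS_DONE stands in for
// cfun->curr_properties & PROP_loop_opts_done.
Verdict
classify_path (const std::vector<Edge *> &path, bool loop_opts_done)
{
  assert (!path.empty ());

  // Assumption of this sketch: reject null edges up front.
  for (const Edge *e : path)
    if (e == nullptr)
      return Verdict::NullEdge;

  // The loop of the block ending the path is the reference loop.
  Loop *loop = path.back ()->src->loop_father;
  bool seen_latch = false;
  bool path_crosses_loops = false;

  for (std::size_t i = 0; i < path.size (); ++i)
    {
      const Edge *e = path[i];

      if (loop->latch == e->src || loop->latch == e->dest)
        seen_latch = true;

      // The source block of the first edge is the one being
      // redirected, so its loop father is ignored.
      if ((i > 0 && e->src->loop_father != loop)
          || e->dest->loop_father != loop)
        path_crosses_loops = true;
    }

  // After loop optimizations, neither restriction applies.
  if (loop_opts_done)
    return Verdict::Keep;

  if (seen_latch && loop->latch_is_empty)
    return Verdict::ThroughLatch;
  if (path_crosses_loops)
    return Verdict::CrossesLoops;
  return Verdict::Keep;
}
```

In this model, a path whose last edge enters a block of a different loop is classified as CrossesLoops before loop optimizations and as Keep afterwards, which corresponds to the reduced "Jumps threaded" counts expected for thread1 in the updated tests.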