After llvm commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Turn on the new pass manager by default
the following benchmarks grew in size by more than 1%:
- 403.gcc grew in size by 2% from 2586180 to 2648252 bytes
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
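The following is a hedged sketch of one way to use those save-temps tarballs to pinpoint where the code growth comes from; the URLs above are truncated, and the tarball and directory names below are placeholders:
<cut>
# Download and unpack the first_bad and last_good save-temps tarballs
# (substitute the full URLs from the links above; file names are assumptions).
curl -o first_bad-save-temps.tar.xz "<First_bad save-temps URL>" --fail
curl -o last_good-save-temps.tar.xz "<Last_good save-temps URL>" --fail
mkdir -p first_bad last_good
tar -xf first_bad-save-temps.tar.xz -C first_bad
tar -xf last_good-save-temps.tar.xz -C last_good
# Diff the generated assembly to see which functions changed or grew.
diff -ru last_good first_bad > asm.diff
</cut>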
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1
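For reference, here is a minimal sketch of the kind of compiler invocation this configuration implies (assuming a cross Clang for aarch64-linux-gnu with a suitable sysroot; the exact command lines used by the CI are recorded in the build manifests linked below, and the source file and sysroot path here are placeholders):
<cut>
# -Os -flto with the LLVM linker, targeting aarch64-linux-gnu (sketch only).
clang --target=aarch64-linux-gnu --sysroot=/path/to/sysroot -Os -flto -c foo.c -o foo.o
clang --target=aarch64-linux-gnu --sysroot=/path/to/sysroot -Os -flto -fuse-ld=lld foo.o -o foo
</cut>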
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh
cd ..
</cut>
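After each ../artifacts/test.sh run above, the reported size delta can be sanity-checked locally; a hedged sketch (the benchmark binary paths are placeholders and depend on the CI build tree layout):
<cut>
# Compare section sizes of the 403.gcc binaries produced by the last_good and
# first_bad runs (paths are placeholders).
size /path/to/last_good/403.gcc /path/to/first_bad/403.gcc
</cut>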
Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
---
llvm/CMakeLists.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
endif()
option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})
-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
- "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+ "Enable the new pass manager by default.")
include(HandleLLVMOptions)
</cut>
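For local experiments, the opt-out mechanisms named in the commit message above can be combined with this report's benchmark flags; a hedged sketch (source file, sysroot and build directory are placeholders, and the exact CI invocations are in the manifests):
<cut>
# Opt back into the legacy pass manager in Clang and in LTO via ELF LLD,
# per the commit message above (sketch only).
clang --target=aarch64-linux-gnu -Os -flto -flegacy-pass-manager -c foo.c -o foo.o
clang --target=aarch64-linux-gnu -Os -flto -fuse-ld=lld \
      -Wl,--lto-legacy-pass-manager foo.o -o foo
# Or configure LLVM itself with the old default (run from an LLVM build directory):
cmake -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF ../llvm
</cut>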
After gcc commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
libstdc++: Add floating-point std::to_chars implementation
the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII:libstdc++.so.6.0.29 grew in size by 12% from 1245370 to 1391240 bytes
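A hedged sketch of how this library size delta might be double-checked from the two builds (paths are placeholders, and the exact size metric used by the CI is not restated here):
<cut>
# Compare file size and section sizes of the two copies of libstdc++.so
# (paths are placeholders).
stat -c %s /path/to/last_good/libstdc++.so.6.0.29 /path/to/first_bad/libstdc++.so.6.0.29
size /path/to/last_good/libstdc++.so.6.0.29 /path/to/first_bad/libstdc++.so.6.0.29
</cut>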
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: arm-linux-gnueabihf
- Compiler flags: -Os -mthumb
- Hardware: APM Mustang 8x X-Gene1
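As in the previous report, a minimal sketch of the implied compiler invocation for this configuration (a cross Clang for arm-linux-gnueabihf is assumed; source file and sysroot path are placeholders):
<cut>
# -Os -mthumb, no LTO, targeting arm-linux-gnueabihf (sketch only).
clang --target=arm-linux-gnueabihf --sysroot=/path/to/sysroot -Os -mthumb -c foo.c -o foo.o
clang --target=arm-linux-gnueabihf --sysroot=/path/to/sysroot -Os -mthumb -fuse-ld=lld foo.o -o foo
</cut>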
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
cd investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 3c57e692357c79ee7623dfc1586652aee2aefb8f
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5033506993ef92589373270a8e8dbbf50e3ebef1
../artifacts/test.sh
cd ..
</cut>
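Since the growth is attributed to libstdc++.so rather than to the benchmark code itself, a symbol-level comparison can show how much of the delta comes from the new exports; a hedged sketch (the library path is a placeholder):
<cut>
# List dynamic symbols with their sizes, sorted by size, and pick out the new
# floating-point to_chars exports added by the commit below.
nm -D -S --size-sort /path/to/first_bad/libstdc++.so.6.0.29 | grep to_chars
</cut>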
Full commit (up to 1000 lines):
<cut>
commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
Date: Thu Dec 17 23:11:34 2020 -0500
libstdc++: Add floating-point std::to_chars implementation
This implements the floating-point std::to_chars overloads for float,
double and long double. We use the Ryu library to compute the shortest
round-trippable fixed and scientific forms for float, double and long
double. We also use Ryu for performing explicit-precision fixed and
scientific formatting for float and double. For explicit-precision
formatting for long double we fall back to using printf. Hexadecimal
formatting for float, double and long double is implemented from
scratch.
The supported long double binary formats are binary64, binary80 (x86
80-bit extended precision), binary128 and ibm128.
Much of the complexity of the implementation is in computing the exact
output length before handing it off to Ryu (which doesn't do bounds
checking). In some cases it's hard to compute the output length
beforehand, so in these cases we instead compute an upper bound on the
output length and use a sufficiently-sized intermediate buffer only if
necessary.
Another source of complexity is in the general-with-precision formatting
mode, where we need to do zero-trimming of the string returned by Ryu,
and where we also take care to avoid having to format the number through
Ryu a second time when the general formatting mode resolves to fixed
(which we determine by doing a scientific formatting first and
inspecting the scientific exponent). We avoid going through Ryu twice
by instead transforming the scientific form to the corresponding fixed
form via in-place string manipulation.
This implementation is non-conforming in a couple of ways:
1. For the shortest hexadecimal formatting, we currently follow the
Microsoft implementation's decision to be consistent with the
output of printf's '%a' specifier at the expense of sometimes not
printing the shortest representation. For example, the shortest hex
form for the number 1.08p+0 is 2.1p-1, but we output the former
instead of the latter, as does printf.
2. The Ryu routine generic_binary_to_decimal that we use for performing
shortest formatting for large floating point types is implemented
using the __int128 type, but some targets with a large long double
type lack __int128 (e.g. i686), so we can't perform shortest
formatting of long double on such targets through Ryu. As a
temporary stopgap this patch makes the long double to_chars overloads
just dispatch to the double overloads on these targets, which means
we lose precision in the output. (We could potentially fix this by
writing a specialized version of Ryu's generic_binary_to_decimal
routine that uses uint64_t instead of __int128.) [Though I wonder if
there's a better way to work around the lack of __int128 on i686
specifically?]
3. Our shortest formatting for __ibm128 doesn't guarantee the round-trip
property if the difference between the high- and low-order exponent
is large. This is because we treat __ibm128 as if it has a
contiguous 105-bit mantissa by merging the mantissas of the high-
and low-order parts (using code extracted from glibc), so we
potentially lose precision from the low-order part. This seems to be
consistent with how glibc printf formats __ibm128.
libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver: Add new exports.
* include/std/charconv (to_chars): Declare the floating-point
overloads for float, double and long double.
* src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
* src/c++17/Makefile.in: Regenerate.
* src/c++17/floating_to_chars.cc: New file.
(to_chars): Define for float, double and long double.
* testsuite/20_util/to_chars/long_double.cc: New test.
---
libstdc++-v3/config/abi/pre/gnu.ver | 7 +
libstdc++-v3/include/std/charconv | 24 +
libstdc++-v3/src/c++17/Makefile.am | 1 +
libstdc++-v3/src/c++17/Makefile.in | 3 +-
libstdc++-v3/src/c++17/floating_to_chars.cc | 1563 ++++++++++++++++++++
.../testsuite/20_util/to_chars/long_double.cc | 199 +++
6 files changed, 1796 insertions(+), 1 deletion(-)
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 4b4bd8ab6da..05e0a512247 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2393,6 +2393,13 @@ GLIBCXX_3.4.29 {
# std::once_flag::_M_finish(bool)
_ZNSt9once_flag9_M_finishEb;
+ # std::to_chars(char*, char*, [float|double|long double])
+ _ZSt8to_charsPcS_[defg];
+ # std::to_chars(char*, char*, [float|double|long double], chars_format)
+ _ZSt8to_charsPcS_[defg]St12chars_format;
+ # std::to_chars(char*, char*, [float|double|long double], chars_format, int)
+ _ZSt8to_charsPcS_[defg]St12chars_formati;
+
} GLIBCXX_3.4.28;
# Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index dd1ebdf8322..b57b0a16db2 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -702,6 +702,30 @@ namespace __detail
chars_format __fmt = chars_format::general) noexcept;
#endif
+ // Floating-point std::to_chars
+
+ // Overloads for float.
+ to_chars_result to_chars(char* __first, char* __last, float __value) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, float __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, float __value,
+ chars_format __fmt, int __precision) noexcept;
+
+ // Overloads for double.
+ to_chars_result to_chars(char* __first, char* __last, double __value) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, double __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, double __value,
+ chars_format __fmt, int __precision) noexcept;
+
+ // Overloads for long double.
+ to_chars_result to_chars(char* __first, char* __last, long double __value)
+ noexcept;
+ to_chars_result to_chars(char* __first, char* __last, long double __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, long double __value,
+ chars_format __fmt, int __precision) noexcept;
+
_GLIBCXX_END_NAMESPACE_VERSION
} // namespace std
#endif // C++14
diff --git a/libstdc++-v3/src/c++17/Makefile.am b/libstdc++-v3/src/c++17/Makefile.am
index 37cdb53c076..2ec5ed621ca 100644
--- a/libstdc++-v3/src/c++17/Makefile.am
+++ b/libstdc++-v3/src/c++17/Makefile.am
@@ -51,6 +51,7 @@ endif
sources = \
floating_from_chars.cc \
+ floating_to_chars.cc \
fs_dir.cc \
fs_ops.cc \
fs_path.cc \
diff --git a/libstdc++-v3/src/c++17/Makefile.in b/libstdc++-v3/src/c++17/Makefile.in
index ccae721ab3f..9b36b7a916c 100644
--- a/libstdc++-v3/src/c++17/Makefile.in
+++ b/libstdc++-v3/src/c++17/Makefile.in
@@ -124,7 +124,7 @@ LTLIBRARIES = $(noinst_LTLIBRARIES)
libc__17convenience_la_LIBADD =
@ENABLE_DUAL_ABI_TRUE@am__objects_1 = cow-fs_dir.lo cow-fs_ops.lo \
@ENABLE_DUAL_ABI_TRUE@ cow-fs_path.lo
-am__objects_2 = floating_from_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
+am__objects_2 = floating_from_chars.lo floating_to_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
memory_resource.lo $(am__objects_1)
@ENABLE_DUAL_ABI_TRUE@am__objects_3 = cow-string-inst.lo
@ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_4 = ostream-inst.lo \
@@ -440,6 +440,7 @@ headers =
sources = \
floating_from_chars.cc \
+ floating_to_chars.cc \
fs_dir.cc \
fs_ops.cc \
fs_path.cc \
diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc b/libstdc++-v3/src/c++17/floating_to_chars.cc
new file mode 100644
index 00000000000..dd83f5eea93
--- /dev/null
+++ b/libstdc++-v3/src/c++17/floating_to_chars.cc
@@ -0,0 +1,1563 @@
+// std::to_chars implementation for floating-point types -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+// Activate __glibcxx_assert within this file to shake out any bugs.
+#define _GLIBCXX_ASSERTIONS 1
+
+#include <charconv>
+
+#include <bit>
+#include <cfenv>
+#include <cassert>
+#include <cmath>
+#include <cstdio>
+#include <cstring>
+#include <langinfo.h>
+#include <optional>
+#include <string_view>
+#include <type_traits>
+
+// Determine the binary format of 'long double'.
+
+// We support the binary64, float80 (i.e. x86 80-bit extended precision),
+// binary128, and ibm128 formats.
+#define LDK_UNSUPPORTED 0
+#define LDK_BINARY64 1
+#define LDK_FLOAT80 2
+#define LDK_BINARY128 3
+#define LDK_IBM128 4
+
+#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
+# define LONG_DOUBLE_KIND LDK_BINARY64
+#elif defined(__SIZEOF_INT128__)
+// The Ryu routines need a 128-bit integer type in order to do shortest
+// formatting of types larger than 64-bit double, so without __int128 we can't
+// support any large long double format. This is the case for e.g. i386.
+# if __LDBL_MANT_DIG__ == 64
+# define LONG_DOUBLE_KIND LDK_FLOAT80
+# elif __LDBL_MANT_DIG__ == 113
+# define LONG_DOUBLE_KIND LDK_BINARY128
+# elif __LDBL_MANT_DIG__ == 106
+# define LONG_DOUBLE_KIND LDK_IBM128
+# endif
+#endif
+#if !defined(LONG_DOUBLE_KIND)
+# define LONG_DOUBLE_KIND LDK_UNSUPPORTED
+#endif
+
+namespace
+{
+ namespace ryu
+ {
+#include "ryu/common.h"
+#include "ryu/digit_table.h"
+#include "ryu/d2s_intrinsics.h"
+#include "ryu/d2s_full_table.h"
+#include "ryu/d2fixed_full_table.h"
+#include "ryu/f2s_intrinsics.h"
+#include "ryu/d2s.c"
+#include "ryu/d2fixed.c"
+#include "ryu/f2s.c"
+
+#ifdef __SIZEOF_INT128__
+ namespace generic128
+ {
+ // Put the generic Ryu bits in their own namespace to avoid name conflicts.
+# include "ryu/generic_128.h"
+# include "ryu/ryu_generic_128.h"
+# include "ryu/generic_128.c"
+ } // namespace generic128
+
+ using generic128::floating_decimal_128;
+ using generic128::generic_binary_to_decimal;
+
+ int
+ to_chars(const floating_decimal_128 v, char* const result)
+ { return generic128::generic_to_chars(v, result); }
+#endif
+ } // namespace ryu
+
+ // A traits class that contains pertinent information about the binary
+ // format of each of the floating-point types we support.
+ template<typename T>
+ struct floating_type_traits
+ { };
+
+ template<>
+ struct floating_type_traits<float>
+ {
+ // We (and Ryu) assume float has the IEEE binary32 format.
+ static_assert(__FLT_MANT_DIG__ == 24);
+ static constexpr int mantissa_bits = 23;
+ static constexpr int exponent_bits = 8;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = uint32_t;
+ using shortest_scientific_t = ryu::floating_decimal_32;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000011101011100110101100101101110000000000000000000000000 };
+ };
+
+ template<>
+ struct floating_type_traits<double>
+ {
+ // We (and Ryu) assume double has the IEEE binary64 format.
+ static_assert(__DBL_MANT_DIG__ == 53);
+ static constexpr int mantissa_bits = 52;
+ static constexpr int exponent_bits = 11;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = uint64_t;
+ using shortest_scientific_t = ryu::floating_decimal_64;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000011000110101110111000001100101110000111100,
+ 0b0111100011110101011000011110000000110110010101011000001110011111,
+ 0b0101101100000000011100100100111100110110110100010001010101110000,
+ 0b0011110010111000101111110101100011101100010001010000000101100111,
+ 0b0001010000011001011100100001010000010101101000001101000000000000 };
+ };
+
+#if LONG_DOUBLE_KIND == LDK_BINARY64
+ // When long double is equivalent to double, we just forward the long double
+ // overloads to the double overloads, so we don't need to define a a
+ // floating_type_traits<long double> specialization in this case.
+#elif LONG_DOUBLE_KIND == LDK_FLOAT80
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 64;
+ static constexpr int exponent_bits = 15;
+ static constexpr bool has_implicit_leading_bit = false;
+ using mantissa_t = uint64_t;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000110101011111110100010100110000011101,
+ 0b1001100101001111010011011111101000101111110001011001011101110000,
+ 0b0000101111111011110010001000001010111101011110111111010100011001,
+ 0b0011100000011111001101101011111001111100100010000101001111101001,
+ 0b0100100100000000100111010010101110011000110001101101110011001010,
+ 0b0111100111100010100000010011000010010110101111110101000011110100,
+ 0b1010100111100010011110000011011101101100010110000110101010101010,
+ 0b0000001111001111000000101100111011011000101000110011101100110010,
+ 0b0111000011100100101101010100001101111110101111001000010011111111,
+ 0b0010111000100110100100100010101100111010110001101010010111001000,
+ 0b0000100000010110000011001001000111000001111010100101101000001111,
+ 0b0010101011101000111100001011000010011101000101010010010000101111,
+ 0b1011111011101101110010101011010001111000101000101101011001100011,
+ 0b1010111011011011110111110011001010000010011001110100101101000101,
+ 0b0011000001110110011010010000011100100011001011001100001101010110,
+ 0b0100011111011000111111101000011110000010111110101001000000001001,
+ 0b1110000001110001001101101110011000100000001010000111100010111010,
+ 0b1110001001010011101000111000001000010100110000010110100011110000,
+ 0b0000011010110000110001111000011111000011001101001101001001000110,
+ 0b1010010111001000101001100101010110100100100010010010000101000010,
+ 0b1011001110000111100010100110000011100011111001110111001100000101,
+ 0b0110101001001000010110001000010001010101110101100001111100011001,
+ 0b1111100011110101011110011010101001010010100011000010110001101001,
+ 0b0100000100001000111101011100010011011111011001000000001100011000,
+ 0b1110111111000111100101110111110000000011001110011100011011011001,
+ 0b1100001100100000010001100011011000111011110000110011010101000011,
+ 0b1111111011100111011101001111111000010000001111010111110010000100,
+ 0b1110111001111110101111000101000000001010001110011010001000111010,
+ 0b1000010001011000101111111010110011111101110101101001111000111010,
+ 0b0100000111101001000111011001101000001010111011101001101111000100,
+ 0b0000011100110001000111011100111100110001101111111010110111100000,
+ 0b0000011101011100100110010011110101010100010011110010010111010000,
+ 0b0011011001100111110101111100001001101110101101001110110011110110,
+ 0b1011000101000001110100111001100100111100110011110000000001101000,
+ 0b1011100011110100001001110101010110111001000000001011101001011110,
+ 0b1111001010010010100000010110101010101011101000101000000000001100,
+ 0b1000001111100100111001110101100001010011111111000001000011110000,
+ 0b0001011101001000010000101101111000001110101100110011001100110111,
+ 0b1110011100000010101011011111001010111101111110100000011100000011,
+ 0b1001110110011100101010011110100010110001001110110000101011100110,
+ 0b1001101000100011100111010000011011100001000000110101100100001001,
+ 0b1010111000101000101101010111000010001100001010100011111100000100,
+ 0b0111101000100011000101101011111011100010001101110111001111001011,
+ 0b1110100111010110001110110110000000010110100011110000010001111100,
+ 0b1100010100011010001011001000111001010101011110100101011001000000,
+ 0b0000110001111001100110010110111010101101001101000000000010010101,
+ 0b0001110111101000001111101010110010010000111110111100000111110100,
+ 0b0111110111001001111000110001101101001010101110110101111110000100,
+ 0b0000111110111010101111100010111010011100010110011011011001000001,
+ 0b1010010100100100101110111111111000101100000010111111101101000110,
+ 0b1000100111111101100011001101000110001000000100010101010100001101,
+ 0b1100101010101000111100101100001000110001110010100000000010110101,
+ 0b1010000100111101100100101010010110100010000000110101101110000100,
+ 0b1011111011110001110000100100000000001010111010001101100000100100,
+ 0b0111101101100011001110011100000001000101101101111000100111011111,
+ 0b0100111010010011011001010011110100001100111010010101111111100011,
+ 0b0010001001011000111000001100110111110111110010100011000110110110,
+ 0b0101010110000000010000100000110100111011111101000100000111010010,
+ 0b0110000011011101000001010100110101101110011100110101000000001001,
+ 0b1101100110100000011000001111000100100100110001100110101010101100,
+ 0b0010100101010110010010001010101000011111111111001011001010001111,
+ 0b0111001010001111001100111001010101001000110101000011110000001000,
+ 0b0110010011001001001111110001010010001011010010001101110110110011,
+ 0b0110010100111011000100111000001001101011111001110010111110111111,
+ 0b0101110111001001101100110100101001110010101110011001101110001000,
+ 0b0100110101010111011010001100010111100011010011111001010100111000,
+ 0b0111000110110111011110100100010111000110000110110110110001111110,
+ 0b1000101101010100100100111110100011110110110010011001110011110101,
+ 0b1001101110101001010100111101101011000101000010110101101111110000,
+ 0b0100100101001011011001001011000010001101001010010001010110101000,
+ 0b0010100001001011100110101000010110000111000111000011100101011011,
+ 0b0110111000011001111101101011111010001000000010101000101010011110,
+ 0b1000110110100001111011000001111100001001000000010110010100100100,
+ 0b1001110100011111100111101011010000010101011100101000010010100110,
+ 0b0001010110101110100010101010001110110110100011101010001001111100,
+ 0b1010100101101100000010110011100110100010010000100100001110000100,
+ 0b0001000000010000001010000010100110000001110100111001110111101101,
+ 0b1100000000000000000000000000000000000000000000000000000000000000 };
+ };
+#elif LONG_DOUBLE_KIND == LDK_BINARY128
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 112;
+ static constexpr int exponent_bits = 15;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = unsigned __int128;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000000000000000000000000100000010000000,
+ 0b1011001111110100000100010101101110011100100110000110010110011000,
+ 0b1010100010001101111111000000001101010010100010010000111011110111,
+ 0b1011111001110001111000011111000010110111000111110100101010100101,
+ 0b0110100110011110011011000011000010011001110001001001010011100011,
+ 0b0000011111110010101111101011101010000110011111100111001110100111,
+ 0b0100010101010110000010111011110100000010011001001010001110111101,
+ 0b1101110111000010001101100000110100000111001001101011000101011011,
+ 0b0100111011101101010000001101011000101100101110010010110000101011,
+ 0b0100000110111000000110101000010011101000110100010110000011101101,
+ 0b1011001101001000100001010001100100001111011101010101110001010110,
+ 0b1000000001000000101001110010110010001111101101010101001100000110,
+ 0b0101110110100110000110000001001010111110001110010000111111010011,
+ 0b1010001111100111000100011100100100111100100101000001011001000111,
+ 0b1010011000011100110101100111001011100101111111100001110100000100,
+ 0b1100011100100010100000110001001010000000100000001001010111011101,
+ 0b0101110000100011001111101101000000100110000010010111010001111010,
+ 0b0100111100011010110111101000100110000111001001101100000001111100,
+ 0b1100100100111110101011000100000101011010110111000111110100110101,
+ 0b0110010000010111010100110011000000111010000010111011010110000100,
+ 0b0101001001010010110111010111000101011100000111100111000001110010,
+ 0b1101111111001011101010110001000111011010111101001011010110100100,
+ 0b0001000100110000011111101011001101110010110110010000000011100100,
+ 0b0001000000000101001001001000000000011000100011001110101001001110,
+ 0b0010010010001000111010011011100001000110011011011110110100111000,
+ 0b0000100110101100000111100010100100011100110111011100001111001100,
+ 0b1011111010001110001100000011110111111111100000001011111111101100,
+ 0b0000011100001111010101110000100110111100101101110111101001000001,
+ 0b1100010001110110111100001001001101101000011100000010110101001011,
+ 0b0100101001101011111001011110101101100011011111011100101010101111,
+ 0b0001101001111001110000101101101100001011010001011110011101000010,
+ 0b1111000000101001101111011010110011101110100001011011001011100010,
+ 0b0101001010111101101100001111100010010110001101001000001101100100,
+ 0b0101100101011110001100101011111000111001111001001001101101100001,
+ 0b1111001101010010100100011011000110110010001111000111010001001101,
+ 0b0001110010011000000001000110110111011000011100001000011001110111,
+ 0b0100001011011011011011110011101100100101111111101100101000001110,
+ 0b0101011110111101010111100111101111000101111111111110100011011010,
+ 0b1110101010001001110100000010110111010111111010111110100110010110,
+ 0b1010001111100001001100101000110100001100011100110010000011010111,
+ 0b1111111101101111000100111100000101011000001110011011101010111001,
+ 0b1111101100001110100101111101011001000100000101110000110010100011,
+ 0b1001010110110101101101000101010001010000101011011111010011010000,
+ 0b0111001110110011101001100111000001000100001010110000010000001101,
+ 0b0101111100111110100111011001111001111011011110010111010011101010,
+ 0b1110111000000001100100111001100100110001011011001110101111110111,
+ 0b0001010001001101010111101010011111000011110001101101011001111111,
+ 0b0101000011100011010010001101100001011101011010100110101100100010,
+ 0b0001000101011000100101111100110110000101101101111000110001001011,
+ 0b0101100101001011011000010101000000010100011100101101000010011111,
+ 0b1000010010001011101001011010100010111011110100110011011000100111,
+ 0b1000011011100001010111010111010011101100100010010010100100101001,
+ 0b1001001001010111110101000010111010000000101111010100001010010010,
+ 0b0011011110110010010101111011000001000000000011011111000011111011,
+ 0b1011000110100011001110000001000100000001011100010111010010011110,
+ 0b0111101110110101110111110000011000000100011100011000101101101110,
+ 0b1001100101111011011100011110101011001111100111101010101010110111,
+ 0b1100110010010001100011001111010000000100011101001111011101001111,
+ 0b1000111001111010100101000010000100000001001100101010001011001101,
+ 0b0011101011110000110010100101010100110010100001000010101011111101,
+ 0b1100000000000110000010101011000000011101000110011111100010111111,
+ 0b0010100110000011011100010110111100010110101100110011101110001101,
+ 0b0010111101010011111000111001111100110111111100100011110001101110,
+ 0b1001110111001001101001001001011000010100110001000000100011010110,
+ 0b0011110101100111011011111100001000011001010100111100100101111010,
+ 0b0010001101000011000010100101110000010101101000100110000100001010,
+ 0b0010000010100110010101100101110011101111000111111111001001100001,
+ 0b0100111111011011011011100111111011000010011101101111011111110110,
+ 0b1111111111010110101011101000100101110100001110001001101011100111,
+ 0b1011111101000101110000111100100010111010100001010000010010110010,
+ 0b1111010101001011101011101010000100110110001110111100100110111111,
+ 0b1011001101000001001101000010101010010110010001100001011100011010,
+ 0b0101001011011101010001110100010000010001111100100100100001001101,
+ 0b0010100000111001100011000101100101000001111100111001101000000010,
+ 0b1011001111010101011001000100100110100100110111110100000110111000,
+ 0b0101011111010011100011010010111101110010100001111111100010001001,
+ 0b0010111011101100100000000000001111111010011101100111100001001101,
+ 0b1101000000000000000000000000000000000000000000000000000000000000 };
+ };
+#elif LONG_DOUBLE_KIND == LDK_IBM128
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 105;
+ static constexpr int exponent_bits = 11;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = unsigned __int128;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000000000000000000000001000000100000000,
+ 0b0000000000000000000100000000000000000000001000000000000000000010,
+ 0b0000100000000000000000001001000000000000000001100100000000000000,
+ 0b0011000000000000000000000000000001110000010000000000000000000000,
+ 0b0000100000000000001000000000000000000000000000100000000000000000 };
+ };
+#endif
+
+ // An IEEE-style decomposition of a floating-point value of type T.
+ template<typename T>
+ struct ieee_t
+ {
+ typename floating_type_traits<T>::mantissa_t mantissa;
+ uint32_t biased_exponent;
+ bool sign;
+ };
+
+ // Decompose the floating-point value into its IEEE components.
+ template<typename T>
+ ieee_t<T>
+ get_ieee_repr(const T value)
+ {
+ constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
+ constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
+ constexpr int total_bits = mantissa_bits + exponent_bits + 1;
+
+ constexpr auto get_uint_t = [] {
+ if constexpr (total_bits <= 32)
+ return uint32_t{};
+ else if constexpr (total_bits <= 64)
+ return uint64_t{};
+#ifdef __SIZEOF_INT128__
+ else if constexpr (total_bits <= 128)
+ return (unsigned __int128){};
+#endif
+ };
+ using uint_t = decltype(get_uint_t());
+ uint_t value_bits = 0;
+ memcpy(&value_bits, &value, sizeof(value));
+
+ ieee_t<T> ieee_repr;
+ ieee_repr.mantissa = value_bits & ((uint_t{1} << mantissa_bits) - 1u);
+ ieee_repr.biased_exponent
+ = (value_bits >> mantissa_bits) & ((uint_t{1} << exponent_bits) - 1u);
+ ieee_repr.sign = (value_bits >> (mantissa_bits + exponent_bits)) & 1;
+ return ieee_repr;
+ }
+
+#if LONG_DOUBLE_KIND == LDK_IBM128
+ template<>
+ ieee_t<long double>
+ get_ieee_repr(const long double value)
+ {
+ // The layout of __ibm128 isn't compatible with the standard IEEE format.
+ // So we transform it into an IEEE-compatible format, suitable for
+ // consumption by the generic Ryu API, with an 11-bit exponent and 105-bit
+ // mantissa (plus an implicit leading bit). We use the exponent and sign
+ // of the high part, and we merge the mantissa of the high part with the
+ // mantissa (and the implicit leading bit) of the low part.
+ using uint_t = unsigned __int128;
+ uint_t value_bits = 0;
+ memcpy(&value_bits, &value, sizeof(value_bits));
+
+ const uint64_t value_hi = value_bits;
+ const uint64_t value_lo = value_bits >> 64;
+
+ uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1);
+ unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1);
+ const int sign_hi = (value_hi >> 63) & 1;
+
+ uint64_t mantissa_lo = value_lo & ((1ull << 52) - 1);
+ const unsigned exponent_lo = (value_lo >> 52) & ((1ull << 11) - 1);
+ const int sign_lo = (value_lo >> 63) & 1;
+
+ {
+ // The following code for adjusting the low-part mantissa to combine
+ // it with the high-part mantissa is taken from the glibc source file
+ // sysdeps/ieee754/ldbl-128ibm/printf_fphex.c.
+ mantissa_lo <<= 7;
+ if (exponent_lo != 0)
+ mantissa_lo |= (1ull << (52 + 7));
+ else
+ mantissa_lo <<= 1;
+
+ const int ediff = exponent_hi - exponent_lo - 53;
+ if (ediff > 63)
+ mantissa_lo = 0;
+ else if (ediff > 0)
+ mantissa_lo >>= ediff;
+ else if (ediff < 0)
+ mantissa_lo <<= -ediff;
+
+ if (sign_lo != sign_hi && mantissa_lo != 0)
+ {
+ mantissa_lo = (1ull << 60) - mantissa_lo;
+ if (mantissa_hi == 0)
+ {
+ mantissa_hi = 0xffffffffffffeLL | (mantissa_lo >> 59);
+ mantissa_lo = 0xfffffffffffffffLL & (mantissa_lo << 1);
+ exponent_hi--;
+ }
+ else
+ mantissa_hi--;
+ }
+ }
+
+ ieee_t<long double> ieee_repr;
+ ieee_repr.mantissa = ((uint_t{mantissa_hi} << 64)
+ | (uint_t{mantissa_lo} << 4)) >> 11;
+ ieee_repr.biased_exponent = exponent_hi;
+ ieee_repr.sign = sign_hi;
+ return ieee_repr;
+ }
+#endif
+
+ // Invoke Ryu to obtain the shortest scientific form for the given
+ // floating-point number.
+ template<typename T>
+ typename floating_type_traits<T>::shortest_scientific_t
+ floating_to_shortest_scientific(const T value)
+ {
+ if constexpr (std::is_same_v<T, float>)
+ return ryu::floating_to_fd32(value);
+ else if constexpr (std::is_same_v<T, double>)
+ return ryu::floating_to_fd64(value);
+#ifdef __SIZEOF_INT128__
+ else if constexpr (std::is_same_v<T, long double>)
+ {
+ constexpr int mantissa_bits
+ = floating_type_traits<T>::mantissa_bits;
+ constexpr int exponent_bits
+ = floating_type_traits<T>::exponent_bits;
+ constexpr bool has_implicit_leading_bit
+ = floating_type_traits<T>::has_implicit_leading_bit;
+
+ const auto [mantissa, exponent, sign] = get_ieee_repr(value);
+ return ryu::generic_binary_to_decimal(mantissa, exponent, sign,
+ mantissa_bits, exponent_bits,
+ !has_implicit_leading_bit);
+ }
+#endif
+ }
+
+ // This subroutine returns true if the shortest scientific form fd is a
+ // positive power of 10, and the floating-point number that has this shortest
+ // scientific form is smaller than this power of 10.
+ //
+ // For instance, the exactly-representable 64-bit number
+ // 99999999999999991611392.0 has the shortest scientific form 1e23, so its
+ // exact value is smaller than its shortest scientific form.
+ //
+ // For these powers of 10 the length of the fixed form is one digit less
+ // than what the scientific exponent suggests.
+ //
+ // This subroutine inspects a lookup table to detect when fd is such a
+ // "rounded up" power of 10.
+ template<typename T>
+ bool
+ is_rounded_up_pow10_p(const typename
+ floating_type_traits<T>::shortest_scientific_t fd)
+ {
+ if (fd.exponent < 0 || fd.mantissa != 1) [[likely]]
+ return false;
+
+ constexpr auto& pow10_adjustment_tab
+ = floating_type_traits<T>::pow10_adjustment_tab;
+ __glibcxx_assert(fd.exponent/64 < (int)std::size(pow10_adjustment_tab));
+ return (pow10_adjustment_tab[fd.exponent/64]
+ & (1ull << (63 - fd.exponent%64)));
+ }
+
+ int
+ get_mantissa_length(const ryu::floating_decimal_32 fd)
+ { return ryu::decimalLength9(fd.mantissa); }
+
+ int
+ get_mantissa_length(const ryu::floating_decimal_64 fd)
+ { return ryu::decimalLength17(fd.mantissa); }
+
+#ifdef __SIZEOF_INT128__
+ int
+ get_mantissa_length(const ryu::floating_decimal_128 fd)
+ { return ryu::generic128::decimalLength(fd.mantissa); }
+#endif
+} // anon namespace
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+// This subroutine of __floating_to_chars_* handles writing nan, inf and 0 in
+// all formatting modes.
+template<typename T>
+ static optional<to_chars_result>
+ __handle_special_value(char* first, char* const last, const T value,
+ const chars_format fmt, const int precision)
+ {
+ __glibcxx_assert(precision >= 0);
+
+ string_view str;
+ switch (__builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
+ FP_ZERO, value))
+ {
+ case FP_INFINITE:
+ str = "-inf";
+ break;
+
+ case FP_NAN:
+ str = "-nan";
+ break;
+
+ case FP_ZERO:
+ break;
+
+ default:
+ case FP_SUBNORMAL:
+ case FP_NORMAL: [[likely]]
+ return nullopt;
+ }
+
+ if (!str.empty())
+ {
+ // We're formatting +-inf or +-nan.
+ if (!__builtin_signbit(value))
+ str.remove_prefix(strlen("-"));
+
+ if (last - first < (int)str.length())
+ return {{last, errc::value_too_large}};
+
+ memcpy(first, &str[0], str.length());
+ first += str.length();
+ return {{first, errc{}}};
+ }
+
+ // We're formatting 0.
+ __glibcxx_assert(value == 0);
+ const auto orig_first = first;
+ const bool sign = __builtin_signbit(value);
+ int expected_output_length;
+ switch (fmt)
+ {
+ case chars_format::fixed:
+ case chars_format::scientific:
+ case chars_format::hex:
+ expected_output_length = sign + 1;
+ if (precision)
+ expected_output_length += strlen(".") + precision;
+ if (fmt == chars_format::scientific)
+ expected_output_length += strlen("e+00");
+ else if (fmt == chars_format::hex)
+ expected_output_length += strlen("p+0");
+ if (last - first < expected_output_length)
+ return {{last, errc::value_too_large}};
+
+ if (sign)
+ *first++ = '-';
+ *first++ = '0';
+ if (precision)
+ {
+ *first++ = '.';
+ memset(first, '0', precision);
+ first += precision;
+ }
+ if (fmt == chars_format::scientific)
+ {
+ memcpy(first, "e+00", 4);
+ first += 4;
+ }
+ else if (fmt == chars_format::hex)
+ {
+ memcpy(first, "p+0", 3);
+ first += 3;
+ }
+ break;
+
+ case chars_format::general:
+ default: // case chars_format{}:
+ expected_output_length = sign + 1;
+ if (last - first < expected_output_length)
+ return {{last, errc::value_too_large}};
+
+ if (sign)
+ *first++ = '-';
+ *first++ = '0';
+ break;
+ }
+ __glibcxx_assert(first - orig_first == expected_output_length);
+ return {{first, errc{}}};
+ }
+
+// This subroutine of the floating-point to_chars overloads performs
+// hexadecimal formatting.
+template<typename T>
+ static to_chars_result
+ __floating_to_chars_hex(char* first, char* const last, const T value,
+ const optional<int> precision)
+ {
+ if (precision.has_value() && precision.value() < 0) [[unlikely]]
+ // A negative precision argument is treated as if it were omitted.
+ return __floating_to_chars_hex(first, last, value, nullopt);
+
+ __glibcxx_requires_valid_range(first, last);
+
+ constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
+ constexpr bool has_implicit_leading_bit
+ = floating_type_traits<T>::has_implicit_leading_bit;
+ constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
+ constexpr int exponent_bias = (1u << (exponent_bits - 1)) - 1;
+ using mantissa_t = typename floating_type_traits<T>::mantissa_t;
+ constexpr int mantissa_t_width = sizeof(mantissa_t) * __CHAR_BIT__;
+
+ if (auto result = __handle_special_value(first, last, value,
+ chars_format::hex,
+ precision.value_or(0)))
+ return *result;
+
+ // Extract the sign, mantissa and exponent from the value.
+ const auto [ieee_mantissa, biased_exponent, sign] = get_ieee_repr(value);
+ const bool is_normal_number = (biased_exponent != 0);
+
+ // Calculate the unbiased exponent.
+ const int32_t unbiased_exponent = (is_normal_number
+ ? biased_exponent - exponent_bias
+ : 1 - exponent_bias);
+
+ // Shift the mantissa so that its bitwidth is a multiple of 4.
+ constexpr unsigned rounded_mantissa_bits = (mantissa_bits + 3) / 4 * 4;
+ static_assert(mantissa_t_width >= rounded_mantissa_bits);
+ mantissa_t effective_mantissa
+ = ieee_mantissa << (rounded_mantissa_bits - mantissa_bits);
+ if (is_normal_number)
+ {
+ if constexpr (has_implicit_leading_bit)
+ // Restore the mantissa's implicit leading bit.
+ effective_mantissa |= mantissa_t{1} << rounded_mantissa_bits;
+ else
+ // The explicit mantissa bit should already be set.
+ __glibcxx_assert(effective_mantissa & (mantissa_t{1} << (mantissa_bits
+ - 1u)));
+ }
+
+ // Compute the shortest precision needed to print this value exactly,
+ // disregarding trailing zeros.
+ constexpr int full_hex_precision = (has_implicit_leading_bit
+ ? (mantissa_bits + 3) / 4
+ // With an explicit leading bit, we
+ // use the four leading nibbles as the
+ // hexit before the decimal point.
+ : (mantissa_bits - 4 + 3) / 4);
+ const int trailing_zeros = __countr_zero(effective_mantissa) / 4;
+ const int shortest_full_precision = full_hex_precision - trailing_zeros;
+ __glibcxx_assert(shortest_full_precision >= 0);
+
+ int written_exponent = unbiased_exponent;
+ const int effective_precision = precision.value_or(shortest_full_precision);
+ if (effective_precision < shortest_full_precision)
+ {
+ // When limiting the precision, we need to determine how to round the
+ // least significant printed hexit. The following branchless
+ // bit-level-parallel technique computes whether to round up the
+ // mantissa bit at index N (according to round-to-nearest rules) when
+ // dropping N bits of precision, for each index N in the bit vector.
+ // This technique is borrowed from the MSVC implementation.
+ using bitvec = mantissa_t;
+ const bitvec round_bit = effective_mantissa << 1;
+ const bitvec has_tail_bits = round_bit - 1;
+ const bitvec lsb_bit = effective_mantissa;
+ const bitvec should_round = round_bit & (has_tail_bits | lsb_bit);
+
+ const int dropped_bits = 4*(full_hex_precision - effective_precision);
+ // Mask out the dropped nibbles.
+ effective_mantissa >>= dropped_bits;
+ effective_mantissa <<= dropped_bits;
+ if (should_round & (mantissa_t{1} << dropped_bits))
+ {
+ // Round up the least significant nibble.
+ effective_mantissa += mantissa_t{1} << dropped_bits;
+ // Check and adjust for overflow of the leading nibble. When the
+ // type has an implicit leading bit, then the leading nibble
+ // before rounding is either 0 or 1, so it can't overflow.
+ if constexpr (!has_implicit_leading_bit)
+ {
+ // The only supported floating-point type with explicit
+ // leading mantissa bit is LDK_FLOAT80, i.e. x86 80-bit
+ // extended precision, and so we hardcode the below overflow
+ // check+adjustment for this type.
+ static_assert(mantissa_t_width == 64
+ && rounded_mantissa_bits == 64);
+ if (effective_mantissa == 0)
+ {
+ // We rounded up the least significant nibble and the
+ // mantissa overflowed, e.g f.fcp+10 with precision=1
+ // became 10.0p+10. Absorb this extra hexit into the
+ // exponent to obtain 1.0p+14.
+ effective_mantissa
+ = mantissa_t{1} << (rounded_mantissa_bits - 4);
+ written_exponent += 4;
+ }
+ }
+ }
+ }
+
+ // Compute the leading hexit and mask it out from the mantissa.
+ char leading_hexit;
+ if constexpr (has_implicit_leading_bit)
+ {
+ const unsigned nibble = effective_mantissa >> rounded_mantissa_bits;
+ __glibcxx_assert(nibble <= 2);
+ leading_hexit = '0' + nibble;
+ effective_mantissa &= ~(mantissa_t{0b11} << rounded_mantissa_bits);
+ }
+ else
+ {
+ const unsigned nibble = effective_mantissa >> (rounded_mantissa_bits-4);
+ __glibcxx_assert(nibble < 16);
+ leading_hexit = "0123456789abcdef"[nibble];
+ effective_mantissa &= ~(mantissa_t{0b1111} << (rounded_mantissa_bits-4));
+ written_exponent -= 3;
+ }
+
+ // Now before we start writing the string, determine the total length of
+ // the output string and perform a single bounds check.
+ int expected_output_length = sign + 1;
+ if (effective_precision != 0)
+ expected_output_length += strlen(".") + effective_precision;
+ const int abs_written_exponent = abs(written_exponent);
+ expected_output_length += (abs_written_exponent >= 10000 ? strlen("p+ddddd")
+ : abs_written_exponent >= 1000 ? strlen("p+dddd")
+ : abs_written_exponent >= 100 ? strlen("p+ddd")
+ : abs_written_exponent >= 10 ? strlen("p+dd")
+ : strlen("p+d"));
+ if (last - first < expected_output_length)
+ return {last, errc::value_too_large};
+
+ const auto saved_first = first;
+ // Write the negative sign and the leading hexit.
+ if (sign)
+ *first++ = '-';
+ *first++ = leading_hexit;
+
+ if (effective_precision > 0)
+ {
+ *first++ = '.';
+ int written_hexits = 0;
+ // Extract and mask out the leading nibble after the decimal point,
+ // write its corresponding hexit, and repeat until the mantissa is
+ // empty.
+ int nibble_offset = rounded_mantissa_bits;
+ if constexpr (!has_implicit_leading_bit)
+ // We already printed the entire leading hexit.
+ nibble_offset -= 4;
+ while (effective_mantissa != 0)
+ {
+ nibble_offset -= 4;
+ const unsigned nibble = effective_mantissa >> nibble_offset;
+ __glibcxx_assert(nibble < 16);
+ *first++ = "0123456789abcdef"[nibble];
+ ++written_hexits;
+ effective_mantissa &= ~(mantissa_t{0b1111} << nibble_offset);
+ }
+ __glibcxx_assert(nibble_offset >= 0);
+ __glibcxx_assert(written_hexits <= effective_precision);
+ // Since the mantissa is now empty, every hexit hereafter must be '0'.
+ if (int remaining_hexits = effective_precision - written_hexits)
+ {
+ memset(first, '0', remaining_hexits);
+ first += remaining_hexits;
+ }
+ }
+
+ // Finally, write the exponent.
+ *first++ = 'p';
+ if (written_exponent >= 0)
+ *first++ = '+';
+ const to_chars_result result = to_chars(first, last, written_exponent);
+ __glibcxx_assert(result.ec == errc{}
+ && result.ptr == saved_first + expected_output_length);
+ return result;
+ }
+
+template<typename T>
+ static to_chars_result
+ __floating_to_chars_shortest(char* first, char* const last, const T value,
+ chars_format fmt)
+ {
+ if (fmt == chars_format::hex)
+ return __floating_to_chars_hex(first, last, value, nullopt);
+
+ __glibcxx_assert(fmt == chars_format::fixed
+ || fmt == chars_format::scientific
</cut>
[TCWG CI] Regression caused by binutils: PR28149, debug info with wrong file association:
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>
PR28149, debug info with wrong file association
Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
6363
# First few build errors in logs:
from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
7116
# linux build successful:
all
# linux boot successful:
boot
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-aarch64-lts-defconfig
First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Reproduce builds:
<cut>
mkdir investigate-binutils-51298b330327a568358da069d9808f51c6cb1672
cd investigate-binutils-51298b330327a568358da069d9808f51c6cb1672
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/
cd binutils
# Reproduce first_bad build
git checkout --detach 51298b330327a568358da069d9808f51c6cb1672
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5cdb4f14426a99ec8fcba843fa503efdc55fa078
../artifacts/test.sh
cd ..
</cut>
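The commit below changes how gas assigns file numbers for its own generated line info when the compiler passes -gdwarf-5; a hedged sketch of how the resulting line table can be inspected on a small test case (the cross-toolchain prefix and file names are assumptions):
<cut>
# Assemble a small file with DWARF 5 line info and dump the decoded line table
# to check which file each line entry is associated with.
aarch64-linux-gnu-gcc -gdwarf-5 -c test.c -o test.o
aarch64-linux-gnu-objdump --dwarf=decodedline test.o
</cut>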
Full commit (up to 1000 lines):
<cut>
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>
Date: Fri Sep 17 09:08:15 2021 +0930
PR28149, debug info with wrong file association
gcc-11 and gcc-12 pass -gdwarf-5 to gas, in order to prime gas for
DWARF 5 level debug info. Unfortunately it seems there are cases
where the compiler does not emit a .file or .loc dwarf debug directive
before any machine instructions. (Note that the .file directive
typically emitted as the first line of assembly output doesn't count as
a dwarf debug directive. The dwarf .file has a file number before the
file name string.)
This patch delays allocation of file numbers for gas generated line
debug info until the end of assembly, thus avoiding any clashes with
compiler generated file numbers. Two fixes for test case source are
necessary; A .loc can't use a file number that hasn't already been
specified with .file.
A followup patch will remove all the gas generated line info on
seeing a .file directive.
PR 28149
* dwarf2dbg.c (num_of_auto_assigned): Delete.
(current): Update initialisation.
(set_or_check_view): Replace all accesses to view with u.view.
(dwarf2_consume_line_info): Likewise.
(dwarf2_directive_loc): Likewise. Assert that we aren't generating
line info.
(dwarf2_gen_line_info_1): Don't call set_or_check_view on
gas generated line entries.
(dwarf2_gen_line_info): Set and track filenames for gas generated
line entries. Simplify generation of labels.
(get_directory_table_entry): Use filename_cmp when comparing dirs.
(do_allocate_filenum): New function.
(dwarf2_where): Set u.filename and filenum to -1 for gas generated
line entries.
(dwarf2_directive_filename): Remove num_of_auto_assigned handling.
(process_entries): Update view field access. Call
do_allocate_filenum.
* dwarf2dbg.h (struct dwarf2_line_info): Add filename field in
union aliasing view.
* testsuite/gas/i386/dwarf2-line-3.s: Add .file directive.
* testsuite/gas/i386/dwarf2-line-4.s: Likewise.
* testsuite/gas/i386/dwarf2-line-4.d: Update expected output.
* testsuite/gas/i386/dwarf4-line-1.d: Likewise.
* testsuite/gas/i386/dwarf5-line-1.d: Likewise.
* testsuite/gas/i386/dwarf5-line-2.d: Likewise.
---
gas/dwarf2dbg.c | 152 ++++++++++++++++++---------------
gas/dwarf2dbg.h | 7 +-
gas/testsuite/gas/i386/dwarf2-line-3.s | 1 +
gas/testsuite/gas/i386/dwarf2-line-4.d | 5 +-
gas/testsuite/gas/i386/dwarf2-line-4.s | 1 +
gas/testsuite/gas/i386/dwarf4-line-1.d | 4 +-
gas/testsuite/gas/i386/dwarf5-line-1.d | 4 +-
gas/testsuite/gas/i386/dwarf5-line-2.d | 3 +-
8 files changed, 105 insertions(+), 72 deletions(-)
diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c
index 9e3437b8948..c6303ba94a6 100644
--- a/gas/dwarf2dbg.c
+++ b/gas/dwarf2dbg.c
@@ -207,7 +207,6 @@ struct file_entry
static struct file_entry *files;
static unsigned int files_in_use;
static unsigned int files_allocated;
-static unsigned int num_of_auto_assigned;
/* Table of directories used by .debug_line. */
static char ** dirs = NULL;
@@ -233,7 +232,7 @@ static struct dwarf2_line_info current =
{
1, 1, 0, 0,
DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0,
- 0, NULL
+ 0, { NULL }
};
/* This symbol is used to recognize view number forced resets in loc
@@ -342,7 +341,7 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
/* First, compute !(E->label > P->label), to tell whether or not
we're to reset the view number. If we can't resolve it to a
constant, keep it symbolic. */
- if (!p || (e->loc.view == force_reset_view && force_reset_view))
+ if (!p || (e->loc.u.view == force_reset_view && force_reset_view))
{
viewx.X_op = O_constant;
viewx.X_add_number = 0;
@@ -367,9 +366,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
}
}
- if (S_IS_DEFINED (e->loc.view) && symbol_constant_p (e->loc.view))
+ if (S_IS_DEFINED (e->loc.u.view) && symbol_constant_p (e->loc.u.view))
{
- expressionS *value = symbol_get_value_expression (e->loc.view);
+ expressionS *value = symbol_get_value_expression (e->loc.u.view);
/* We can't compare the view numbers at this point, because in
VIEWX we've only determined whether we're to reset it so
far. */
@@ -404,16 +403,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
{
expressionS incv;
- if (!p->loc.view)
+ if (!p->loc.u.view)
{
- p->loc.view = symbol_temp_make ();
- gas_assert (!S_IS_DEFINED (p->loc.view));
+ p->loc.u.view = symbol_temp_make ();
+ gas_assert (!S_IS_DEFINED (p->loc.u.view));
}
memset (&incv, 0, sizeof (incv));
incv.X_unsigned = 1;
incv.X_op = O_symbol;
- incv.X_add_symbol = p->loc.view;
+ incv.X_add_symbol = p->loc.u.view;
incv.X_add_number = 1;
if (viewx.X_op == O_constant)
@@ -430,16 +429,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
}
}
- if (!S_IS_DEFINED (e->loc.view))
+ if (!S_IS_DEFINED (e->loc.u.view))
{
- symbol_set_value_expression (e->loc.view, &viewx);
- S_SET_SEGMENT (e->loc.view, expr_section);
- symbol_set_frag (e->loc.view, &zero_address_frag);
+ symbol_set_value_expression (e->loc.u.view, &viewx);
+ S_SET_SEGMENT (e->loc.u.view, expr_section);
+ symbol_set_frag (e->loc.u.view, &zero_address_frag);
}
/* Define and attempt to simplify any earlier views needed to
compute E's. */
- if (h && p && p->loc.view && !S_IS_DEFINED (p->loc.view))
+ if (h && p && p->loc.u.view && !S_IS_DEFINED (p->loc.u.view))
{
struct line_entry *h2;
/* Reverse the list to avoid quadratic behavior going backwards
@@ -459,7 +458,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
break;
set_or_check_view (r, r->next, NULL);
}
- while (r->next && r->next->loc.view && !S_IS_DEFINED (r->next->loc.view)
+ while (r->next
+ && r->next->loc.u.view
+ && !S_IS_DEFINED (r->next->loc.u.view)
&& (r = r->next));
/* Unreverse the list, so that we can go forward again. */
@@ -475,14 +476,14 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
view of the previous subsegment. */
if (r == h)
continue;
- gas_assert (S_IS_DEFINED (r->loc.view));
- resolve_expression (symbol_get_value_expression (r->loc.view));
+ gas_assert (S_IS_DEFINED (r->loc.u.view));
+ resolve_expression (symbol_get_value_expression (r->loc.u.view));
}
while (r != p && (r = r->next));
/* Now that we've defined and computed all earlier views that might
be needed to compute E's, attempt to simplify it. */
- resolve_expression (symbol_get_value_expression (e->loc.view));
+ resolve_expression (symbol_get_value_expression (e->loc.u.view));
}
}
@@ -518,10 +519,8 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc)
/* Subseg heads are chained to previous subsegs in
dwarf2_finish. */
- if (loc->view && lss->head)
- set_or_check_view (e,
- (struct line_entry *)lss->ptail,
- lss->head);
+ if (loc->filenum != -1u && loc->u.view && lss->head)
+ set_or_check_view (e, (struct line_entry *) lss->ptail, lss->head);
*lss->ptail = e;
lss->ptail = &e->next;
@@ -532,9 +531,6 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc)
void
dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc)
{
- static unsigned int line = -1;
- static unsigned int filenum = -1;
-
symbolS *sym;
/* Early out for as-yet incomplete location information. */
@@ -552,20 +548,35 @@ dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc)
symbols apply to assembler code. It is necessary to emit
duplicate line symbols when a compiler asks for them, because GDB
uses them to determine the end of the prologue. */
- if (debug_type == DEBUG_DWARF2
- && line == loc->line && filenum == loc->filenum)
- return;
+ if (debug_type == DEBUG_DWARF2)
+ {
+ static unsigned int line = -1;
+ static const char *filename = NULL;
+
+ if (line == loc->line)
+ {
+ if (filename == loc->u.filename)
+ return;
+ if (filename_cmp (filename, loc->u.filename) == 0)
+ {
+ filename = loc->u.filename;
+ return;
+ }
+ }
- line = loc->line;
- filenum = loc->filenum;
+ line = loc->line;
+ filename = loc->u.filename;
+ }
if (linkrelax)
{
- char name[120];
+ static int label_num = 0;
+ char name[32];
/* Use a non-fake name for the line number location,
so that it can be referred to by relocations. */
- sprintf (name, ".Loc.%u.%u", line, filenum);
+ sprintf (name, ".Loc.%u", label_num);
+ label_num++;
sym = symbol_new (name, now_seg, frag_now, ofs);
}
else
@@ -624,13 +635,15 @@ get_directory_table_entry (const char *dirname,
{
const char * pwd = file0_dirname ? file0_dirname : getpwd ();
- if (dwarf_level >= 5 && strcmp (dirname, pwd) != 0)
+ if (dwarf_level >= 5 && filename_cmp (dirname, pwd) != 0)
{
- /* In DWARF-5 the 0 entry in the directory table is expected to be
- the same as the DW_AT_comp_dir (which is set to the current build
- directory). Since we are about to create a directory entry that
- is not the same, allocate the current directory first.
- FIXME: Alternatively we could generate an error message here. */
+ /* In DWARF-5 the 0 entry in the directory table is
+ expected to be the same as the DW_AT_comp_dir (which
+ is set to the current build directory). Since we are
+ about to create a directory entry that is not the
+ same, allocate the current directory first.
+ FIXME: Alternatively we could generate an error
+ message here. */
(void) get_directory_table_entry (pwd, NULL, strlen (pwd),
true);
d = 1;
@@ -745,14 +758,30 @@ allocate_filenum (const char * pathname)
if (!assign_file_to_slot (i, file, dir))
return -1;
- num_of_auto_assigned++;
-
last_used = i;
last_used_dir_len = dir_len;
return i;
}
+/* Run through the list of line entries starting at E, allocating
+ file entries for gas generated debug. */
+
+static void
+do_allocate_filenum (struct line_entry *e)
+{
+ do
+ {
+ if (e->loc.filenum == -1u)
+ {
+ e->loc.filenum = allocate_filenum (e->loc.u.filename);
+ e->loc.u.view = NULL;
+ }
+ e = e->next;
+ }
+ while (e);
+}
+
/* Allocate slot NUM in the .debug_line file table to FILENAME.
If DIRNAME is not NULL or there is a directory component to FILENAME
then this will be stored in the directory table, if not already present.
@@ -929,17 +958,12 @@ dwarf2_where (struct dwarf2_line_info *line)
{
if (debug_type == DEBUG_DWARF2)
{
- const char *filename;
-
- memset (line, 0, sizeof (*line));
- filename = as_where (&line->line);
- line->filenum = allocate_filenum (filename);
- /* FIXME: We should check the return value from allocate_filenum. */
+ line->u.filename = as_where (&line->line);
+ line->filenum = -1u;
line->column = 0;
line->flags = DWARF2_FLAG_IS_STMT;
line->isa = current.isa;
line->discriminator = current.discriminator;
- line->view = NULL;
}
else
*line = current;
@@ -1018,7 +1042,7 @@ dwarf2_consume_line_info (void)
| DWARF2_FLAG_PROLOGUE_END
| DWARF2_FLAG_EPILOGUE_BEGIN);
current.discriminator = 0;
- current.view = NULL;
+ current.u.view = NULL;
}
/* Called for each (preferably code) label. If dwarf2_loc_mark_labels
@@ -1060,7 +1084,6 @@ dwarf2_directive_filename (void)
char *filename;
const char * dirname = NULL;
int filename_len;
- unsigned int i;
/* Continue to accept a bare string and pass it off. */
SKIP_WHITESPACE ();
@@ -1132,18 +1155,6 @@ dwarf2_directive_filename (void)
return NULL;
}
- if (num_of_auto_assigned)
- {
- /* Clear slots auto-assigned before the first .file <NUMBER>
- directive was seen. */
- if (files_in_use != (num_of_auto_assigned + 1))
- abort ();
- for (i = 1; i < files_in_use; i++)
- files[i].filename = NULL;
- files_in_use = 0;
- num_of_auto_assigned = 0;
- }
-
if (! allocate_filename_to_slot (dirname, filename, (unsigned int) num,
with_md5))
return NULL;
@@ -1191,6 +1202,11 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
return;
}
+ /* debug_type will be turned off by dwarf2_directive_filename, and
+ if we don't have a dwarf style .file then files_in_use will be
+ zero and the above error will trigger. */
+ gas_assert (debug_type == DEBUG_NONE);
+
current.filenum = filenum;
current.line = line;
current.discriminator = 0;
@@ -1333,7 +1349,7 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
S_SET_VALUE (sym, 0);
symbol_set_frag (sym, &zero_address_frag);
}
- current.view = sym;
+ current.u.view = sym;
}
else
{
@@ -1347,10 +1363,9 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
demand_empty_rest_of_line ();
dwarf2_any_loc_directive_seen = dwarf2_loc_directive_seen = true;
- debug_type = DEBUG_NONE;
/* If we were given a view id, emit the row right away. */
- if (current.view)
+ if (current.u.view)
dwarf2_emit_insn (0);
}
@@ -1984,7 +1999,7 @@ process_entries (segT seg, struct line_entry *e)
frag_ofs = S_GET_VALUE (lab);
if (last_frag == NULL
- || (e->loc.view == force_reset_view && force_reset_view
+ || (e->loc.u.view == force_reset_view && force_reset_view
/* If we're going to reset the view, but we know we're
advancing the PC, we don't have to force with
set_address. We know we do when we're at the same
@@ -2850,16 +2865,19 @@ dwarf2_finish (void)
struct line_subseg *lss = s->head;
struct line_entry **ptail = lss->ptail;
+ if (lss->head && SEG_NORMAL (s->seg))
+ do_allocate_filenum (lss->head);
+
/* Reset the initial view of the first subsection of the
section. */
- if (lss->head && lss->head->loc.view)
+ if (lss->head && lss->head->loc.u.view)
set_or_check_view (lss->head, NULL, NULL);
while ((lss = lss->next) != NULL)
{
/* Link the first view of subsequent subsections to the
previous view. */
- if (lss->head && lss->head->loc.view)
+ if (lss->head && lss->head->loc.u.view)
set_or_check_view (lss->head,
!s->head ? NULL : (struct line_entry *)ptail,
s->head ? s->head->head : NULL);
diff --git a/gas/dwarf2dbg.h b/gas/dwarf2dbg.h
index 14d770c40dd..700d9dec5cb 100644
--- a/gas/dwarf2dbg.h
+++ b/gas/dwarf2dbg.h
@@ -36,7 +36,12 @@ struct dwarf2_line_info
unsigned int isa;
unsigned int flags;
unsigned int discriminator;
- symbolS *view;
+ /* filenum == -1u chooses filename, otherwise view. */
+ union
+ {
+ symbolS *view;
+ const char *filename;
+ } u;
};
/* Implements the .file FILENO "FILENAME" directive. FILENO can be 0
diff --git a/gas/testsuite/gas/i386/dwarf2-line-3.s b/gas/testsuite/gas/i386/dwarf2-line-3.s
index 2085ef93940..e933719fbc3 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-3.s
+++ b/gas/testsuite/gas/i386/dwarf2-line-3.s
@@ -7,6 +7,7 @@
main:
.cfi_startproc
nop
+ .file 1 "dwarf2-test.c"
.loc 1 1
ret
.cfi_endproc
diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.d b/gas/testsuite/gas/i386/dwarf2-line-4.d
index c0c85f4639f..a01fd0540f3 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-4.d
+++ b/gas/testsuite/gas/i386/dwarf2-line-4.d
@@ -33,11 +33,14 @@ Raw dump of debug contents of section \.z?debug_line:
The File Name Table \(offset 0x.*\):
Entry Dir Time Size Name
- 1 1 0 0 dwarf2-line-4.s
+ 1 0 0 0 dwarf2-test.c
+ 2 1 0 0 dwarf2-line-4.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Special opcode 13: advance Address by 0 to 0x0 and Line by 8 to 9
+ \[0x.*\] Set File Name to entry 1 in the File Name Table
\[0x.*\] Advance Line by -8 to 1
\[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 1
\[0x.*\] Advance PC by 1 to 0x2
diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.s b/gas/testsuite/gas/i386/dwarf2-line-4.s
index 89bb62d9db7..7348f4be62c 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-4.s
+++ b/gas/testsuite/gas/i386/dwarf2-line-4.s
@@ -7,6 +7,7 @@
main:
.cfi_startproc
nop
+ .file 1 "dwarf2-test.c"
.loc 1 1
ret
.cfi_endproc
diff --git a/gas/testsuite/gas/i386/dwarf4-line-1.d b/gas/testsuite/gas/i386/dwarf4-line-1.d
index 4f8321e9bfd..8199efbb0c2 100644
--- a/gas/testsuite/gas/i386/dwarf4-line-1.d
+++ b/gas/testsuite/gas/i386/dwarf4-line-1.d
@@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line:
Entry Dir Time Size Name
1 0 0 0 foo.c
2 0 0 0 foo.h
+ 3 1 0 0 dwarf4-line-1.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Advance Line by 81 to 82
\[0x.*\] Copy
- \[0x.*\] Set File Name to entry 2 in the File Name Table
+ \[0x.*\] Set File Name to entry 3 in the File Name Table
\[0x.*\] Advance Line by -73 to 9
\[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 9
\[0x.*\] Advance PC by 3 to 0x4
diff --git a/gas/testsuite/gas/i386/dwarf5-line-1.d b/gas/testsuite/gas/i386/dwarf5-line-1.d
index f57fc47d269..2c2cf5696c4 100644
--- a/gas/testsuite/gas/i386/dwarf5-line-1.d
+++ b/gas/testsuite/gas/i386/dwarf5-line-1.d
@@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line:
0 \(indirect line string, offset: 0x.*\): .*/gas/testsuite
1 \(indirect line string, offset: 0x.*\): .*/gas/testsuite/gas/i386
- The File Name Table \(offset 0x.*, lines 2, columns 3\):
+ The File Name Table \(offset 0x.*, lines 3, columns 3\):
Entry Dir MD5 Name
0 0 0xbbd69fc03ce253b2dbaab2522dd519ae \(indirect line string, offset: 0x.*\): core.c
1 0 0x0 \(indirect line string, offset: 0x.*\): types.h
+ 2 1 0x0 \(indirect line string, offset: 0x.*\): dwarf5-line-1.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Special opcode 8: advance Address by 0 to 0x0 and Line by 3 to 4
\[0x.*\] Advance PC by 1 to 0x1
diff --git a/gas/testsuite/gas/i386/dwarf5-line-2.d b/gas/testsuite/gas/i386/dwarf5-line-2.d
index 2f96df510d0..85f98c8ab9c 100644
--- a/gas/testsuite/gas/i386/dwarf5-line-2.d
+++ b/gas/testsuite/gas/i386/dwarf5-line-2.d
@@ -36,9 +36,10 @@ Raw dump of debug contents of section \.z?debug_line:
0 \(indirect line string, offset: 0x.*\): .*/gas/testsuite
1 \(indirect line string, offset: 0x.*\): .*/gas/testsuite/gas/i386
- The File Name Table \(offset 0x.*, lines 1, columns 3\):
+ The File Name Table \(offset 0x.*, lines 2, columns 3\):
Entry Dir MD5 Name
0 0 0xbbd69fc03ce253b2dbaab2522dd519ae \(indirect line string, offset: 0x.*\): core.c
+ 1 1 0x0 \(indirect line string, offset: .*\): dwarf5-line-2.s
Line Number Statements:
\[0x.*\] Extended opcode 2: set Address to 0x0
</cut>
Progress (short week, 3 days)
* UM-2 [QEMU upstream maintainership]
+ more code review, notably the Apple Silicon hvf support, which is
nearly ready to go in
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Sent out v2 of the "optimized code gen for MVE" patchset;
this now covers all the insns that have an easy optimized version.
+ Fixed a bug where we weren't correctly setting up FPSCR.LTPSIZE
when using QEMU's user-mode-only emulator
+ Wrote some code to add support for the (not yet finalized) gdbstub
XML that tells GDB that the guest CPU has MVE. This causes a GDB
with the MVE handling to crash, so one or the other of us has
got something wrong :-)
KVM Forum was this week, as a 2-day virtual conference. I felt the
programme was comparatively a bit small this year, but there were some
interesting talks. There was also a BoF session on whether/how we should
consider adding Rust code to QEMU. I am pushing for (a) a clearer
medium-to-long-term vision of where we would be going and why we'd be
doing this, and (b) more design-sketch type work of "what would XYZ in
Rust look like", which would hopefully both make the benefit (or lack
thereof) a bit clearer and demonstrate that there are enough people
enthusiastic about the prospect to make it a success...
-- PMM
After llvm commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
[libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.
the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII,[.] contract<3> grew in size by 164%
Benchmark:
Toolchain: Clang + Glibc + LLVM Linker
Version: all components were built from their latest release branch
Target: aarch64-linux-gnu
Compiler flags: -Oz
Hardware: APM Mustang 8x X-Gene1
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Oz
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
cd investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c8905f1bb304f1cfe297312ae0dda9946cb27594
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date: Fri Sep 3 14:53:57 2021 -0400
[libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.
It appears when testing LLVM 13 on Power, we run into failures with the
`libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp` test case optimizing
values out.
Despite some of the functions in the test already being marked with optnone,
adding the `MarkAsLive()` calls inside of the pretty printer comparison functions
resolves the issues of the values being optimized out.
This patch aims to address https://llvm.org/PR51675.
Differential Revision: https://reviews.llvm.org/D109204
(cherry picked from commit 217c6d643124be312f4a99b203118744edb9d54c)
---
libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
index 2d8e9620089a..7c8d307d19fb 100644
--- a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
+++ b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
@@ -92,24 +92,28 @@ void MarkAsLive(Type &&) {}
template <typename TypeToPrint> void ComparePrettyPrintToChars(
TypeToPrint value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
template <typename TypeToPrint> void ComparePrettyPrintToRegex(
TypeToPrint value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
void CompareExpressionPrettyPrintToChars(
std::string value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
void CompareExpressionPrettyPrintToRegex(
std::string value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
</cut>
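The patch quoted above keeps the values alive by passing them to a helper call
before the debugger stops. As a rough illustration of why an extra use helps,
here is a minimal standalone C++ sketch (not the libc++ test itself); it uses
the common asm-based "do not optimize away" idiom in place of the test's
MarkAsLive() helper, and the function and variable names are illustrative only.

#include <string>

// Hypothetical helper, equivalent in spirit to MarkAsLive(): the empty asm
// pretends to read the object through its address, so the compiler must keep
// the value materialized at this point and a debugger breakpoint can print it.
template <typename T>
void keep_alive_for_debugger(T&& value) {
  asm volatile("" : : "r"(&value) : "memory");
}

static void compare_pretty_print(std::string value, const char *expectation) {
  keep_alive_for_debugger(value);       // without an opaque use, 'value' may be
  keep_alive_for_debugger(expectation); // optimized out before we stop here
  // place a debugger breakpoint here and print 'value'
}

int main() {
  compare_pretty_print("hello world", "std::string containing \"hello world\"");
}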
After gcc commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
c++ ICE with nested requirement as default tpl parm[PR94827]
the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 4%
Benchmark:
Toolchain: GCC + Glibc + GNU Linker
Version: all components were built from their latest release branch
Target: aarch64-linux-gnu
Compiler flags: -O3 -flto
Hardware: NVidia TX1 4x Cortex-A57
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Reproduce builds:
<cut>
mkdir investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
cd investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach c416c52bcdb120db5e8c53a51bd78c4360daf79b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b1983f4582bbe060b7da83578acb9ed653681fc8
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
Date: Thu Apr 30 08:23:16 2020 -0700
c++ ICE with nested requirement as default tpl parm[PR94827]
Template headers are not incrementally updated as we parse its parameters.
We maintain a dummy level until the closing > when we replace the dummy with
a real parameter set. requires processing was expecting a properly populated
arg_vec in current_template_parms, and then creates a self-mapping of parameters
from that. But we don't need to do that, just teach map_arguments to look at
TREE_VALUE when args is NULL.
* constraint.cc (map_arguments): If ARGS is null, it's a
self-mapping of parms.
(finish_nested_requirement): Do not pass argified
current_template_parms to normalization.
(tsubst_nested_requirement): Don't assert no template parms.
---
gcc/cp/ChangeLog | 10 ++++++++++
gcc/cp/constraint.cc | 27 ++++++++++++++++-----------
gcc/testsuite/g++.dg/concepts/pr94827.C | 15 +++++++++++++++
3 files changed, 41 insertions(+), 11 deletions(-)
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 1fa0e123cb1..3c57945cecf 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,13 @@
+2020-04-30 Jason Merrill <jason(a)redhat.com>
+ Nathan Sidwell <nathan(a)acm.org>
+
+ PR c++/94827
+ * constraint.cc (map_arguments): If ARGS is null, it's a
+ self-mapping of parms.
+ (finish_nested_requirement): Do not pass argified
+ current_template_parms to normalization.
+ (tsubst_nested_requirement): Don't assert no template parms.
+
2020-04-30 Iain Sandoe <iain(a)sandoe.co.uk>
PR c++/94886
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 866b0f51b05..85513fecf43 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -546,12 +546,16 @@ static tree
map_arguments (tree parms, tree args)
{
for (tree p = parms; p; p = TREE_CHAIN (p))
- {
- int level;
- int index;
- template_parm_level_and_index (TREE_VALUE (p), &level, &index);
- TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
- }
+ if (args)
+ {
+ int level;
+ int index;
+ template_parm_level_and_index (TREE_VALUE (p), &level, &index);
+ TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
+ }
+ else
+ TREE_PURPOSE (p) = TREE_VALUE (p);
+
return parms;
}
@@ -2005,8 +2009,6 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
static tree
tsubst_nested_requirement (tree t, tree args, subst_info info)
{
- gcc_assert (!uses_template_parms (args));
-
/* Ensure that we're in an evaluation context prior to satisfaction. */
tree norm = TREE_VALUE (TREE_TYPE (t));
tree result = satisfy_constraint (norm, args, info);
@@ -2953,12 +2955,15 @@ finish_compound_requirement (location_t loc, tree expr, tree type, bool noexcept
tree
finish_nested_requirement (location_t loc, tree expr)
{
+ /* Currently open template headers have dummy arg vectors, so don't
+ pass into normalization. */
+ tree norm = normalize_constraint_expression (expr, NULL_TREE, false);
+ tree args = current_template_parms
+ ? template_parms_to_args (current_template_parms) : NULL_TREE;
+
/* Save the normalized constraint and complete set of normalization
arguments with the requirement. We keep the complete set of arguments
around for re-normalization during diagnostics. */
- tree args = current_template_parms
- ? template_parms_to_args (current_template_parms) : NULL_TREE;
- tree norm = normalize_constraint_expression (expr, args, false);
tree info = build_tree_list (args, norm);
/* Build the constraint, saving its normalization as its type. */
diff --git a/gcc/testsuite/g++.dg/concepts/pr94827.C b/gcc/testsuite/g++.dg/concepts/pr94827.C
new file mode 100644
index 00000000000..f14ec2551a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/pr94827.C
@@ -0,0 +1,15 @@
+// PR 94287 ICE looking inside open template-parm level
+// { dg-do run { target c++17 } }
+// { dg-options -fconcepts }
+
+template <typename T,
+ bool X = requires { requires (sizeof(T)==1); } >
+ int foo(T) { return X; }
+
+int main() {
+ if (!foo('4'))
+ return 1;
+ if (foo (4))
+ return 2;
+ return 0;
+}
</cut>
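The construct that used to ICE is compact enough to restate outside the
testsuite harness: a requires-expression containing a nested requirement, used
as the default value of a non-type template parameter. This is essentially what
the new pr94827.C test above exercises, minus the DejaGnu directives; the names
below are illustrative only.

// Needs -std=c++20, or -std=c++17 -fconcepts as in the testcase; unfixed GCC
// revisions crashed while normalizing the nested requirement in the default
// template argument.
template <typename T,
          bool SingleByte = requires { requires (sizeof(T) == 1); }>
int classify(T) { return SingleByte; }

int main() {
  return (classify('x') == 1 && classify(42) == 0) ? 0 : 1;
}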
After llvm commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Inform pass manager when child loops are deleted
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
cd investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach f17d60d620283b5d53286056ceeaeb8c27b6530a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach f56129fe78d5c849971017976c71333b6b1a27c6
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Date: Fri Sep 3 20:50:33 2021 +0200
Inform pass manager when child loops are deleted
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).
Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be resetted between different runs of for
example the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109257
(fixes PR51754)
(cherry-picked from commit 0f0344dd1e3b53387bb396070916e67f4c426da6)
---
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | 43 +++++++++----
.../nontrivial-unswitch-markloopasdeleted.ll | 71 ++++++++++++++++++++++
2 files changed, 102 insertions(+), 12 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index b9cccc2af309..b1c105258027 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -1587,10 +1587,12 @@ deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
BB->eraseFromParent();
}
-static void deleteDeadBlocksFromLoop(Loop &L,
- SmallVectorImpl<BasicBlock *> &ExitBlocks,
- DominatorTree &DT, LoopInfo &LI,
- MemorySSAUpdater *MSSAU) {
+static void
+deleteDeadBlocksFromLoop(Loop &L,
+ SmallVectorImpl<BasicBlock *> &ExitBlocks,
+ DominatorTree &DT, LoopInfo &LI,
+ MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
// Find all the dead blocks tied to this loop, and remove them from their
// successors.
SmallSetVector<BasicBlock *, 8> DeadBlockSet;
@@ -1640,6 +1642,7 @@ static void deleteDeadBlocksFromLoop(Loop &L,
}) &&
"If the child loop header is dead all blocks in the child loop must "
"be dead as well!");
+ DestroyLoopCB(*ChildL, ChildL->getName());
LI.destroy(ChildL);
return true;
});
@@ -1980,6 +1983,8 @@ static bool rebuildLoopAfterUnswitch(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
ParentL->removeChildLoop(llvm::find(*ParentL, &L));
else
LI.removeLoop(llvm::find(LI, &L));
+ // markLoopAsDeleted for L should be triggered by the caller (it is typically
+ // done by using the UnswitchCB callback).
LI.destroy(&L);
return false;
}
@@ -2019,7 +2024,8 @@ static void unswitchNontrivialInvariants(
SmallVectorImpl<BasicBlock *> &ExitBlocks, IVConditionInfo &PartialIVInfo,
DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
auto *ParentBB = TI.getParent();
BranchInst *BI = dyn_cast<BranchInst>(&TI);
SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
@@ -2319,7 +2325,7 @@ static void unswitchNontrivialInvariants(
// Now that our cloned loops have been built, we can update the original loop.
// First we delete the dead blocks from it and then we rebuild the loop
// structure taking these deletions into account.
- deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU);
+ deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU, DestroyLoopCB);
if (MSSAU && VerifyMemorySSA)
MSSAU->getMemorySSA()->verifyMemorySSA();
@@ -2670,7 +2676,8 @@ static bool unswitchBestCondition(
Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
// Collect all invariant conditions within this loop (as opposed to an inner
// loop which would be handled when visiting that inner loop).
SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>
@@ -2958,7 +2965,7 @@ static bool unswitchBestCondition(
<< "\n");
unswitchNontrivialInvariants(L, *BestUnswitchTI, BestUnswitchInvariants,
ExitBlocks, PartialIVInfo, DT, LI, AC,
- UnswitchCB, SE, MSSAU);
+ UnswitchCB, SE, MSSAU, DestroyLoopCB);
return true;
}
@@ -2988,7 +2995,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI, bool Trivial,
bool NonTrivial,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
assert(L.isRecursivelyLCSSAForm(DT, LI) &&
"Loops must be in LCSSA form before unswitching.");
@@ -3036,7 +3044,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
// Try to unswitch the best invariant condition. We prefer this full unswitch to
// a partial unswitch when possible below the threshold.
- if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU))
+ if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU,
+ DestroyLoopCB))
return true;
// No other opportunities to unswitch.
@@ -3083,6 +3092,10 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
U.markLoopAsDeleted(L, LoopName);
};
+ auto DestroyLoopCB = [&U](Loop &L, StringRef Name) {
+ U.markLoopAsDeleted(L, Name);
+ };
+
Optional<MemorySSAUpdater> MSSAU;
if (AR.MSSA) {
MSSAU = MemorySSAUpdater(AR.MSSA);
@@ -3091,7 +3104,8 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
}
if (!unswitchLoop(L, AR.DT, AR.LI, AR.AC, AR.AA, AR.TTI, Trivial, NonTrivial,
UnswitchCB, &AR.SE,
- MSSAU.hasValue() ? MSSAU.getPointer() : nullptr))
+ MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+ DestroyLoopCB))
return PreservedAnalyses::all();
if (AR.MSSA && VerifyMemorySSA)
@@ -3179,12 +3193,17 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
LPM.markLoopAsDeleted(*L);
};
+ auto DestroyLoopCB = [&LPM](Loop &L, StringRef /* Name */) {
+ LPM.markLoopAsDeleted(L);
+ };
+
if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();
bool Changed =
unswitchLoop(*L, DT, LI, AC, AA, TTI, true, NonTrivial, UnswitchCB, SE,
- MSSAU.hasValue() ? MSSAU.getPointer() : nullptr);
+ MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+ DestroyLoopCB);
if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
new file mode 100644
index 000000000000..455a38535576
--- /dev/null
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
@@ -0,0 +1,71 @@
+; RUN: opt < %s -enable-loop-distribute -passes='loop-distribute,loop-mssa(simple-loop-unswitch<nontrivial>),loop-distribute' -o /dev/null -S -debug-pass-manager=verbose 2>&1 | FileCheck %s
+
+
+; Running loop-distribute will result in LoopAccessAnalysis being required and
+; cached in the LoopAnalysisManagerFunctionProxy.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner<header><latch><exiting>
+
+
+; Then simple-loop-unswitch is removing/replacing some loops (resulting in
+; Loop objects used as key in the analyses cache is destroyed). So here we
+; want to see that any analysis results cached on the destroyed loop is
+; cleared. A special case here is that loop_a_inner is destroyed when
+; unswitching the parent loop.
+;
+; The bug solved and verified by this test case was related to the
+; SimpleLoopUnswitch not marking the Loop as removed, so we missed clearing
+; the analysis caches.
+;
+; CHECK: Running pass: SimpleLoopUnswitchPass on Loop at depth 1 containing: %loop_begin<header>,%loop_b,%loop_b_inner,%loop_b_inner_exit,%loop_a,%loop_a_inner,%loop_a_inner_exit,%latch<latch><exiting>
+; CHECK-NEXT: Clearing all analysis results for: loop_a_inner
+
+
+; When running loop-distribute the second time we can see that loop_a_inner
+; isn't analysed because the loop no longer exists (instead we find a new loop,
+; loop_a_inner.us). This kind of verifies that it was correct to remove the
+; loop_a_inner related analysis above.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner.us<header><latch><exiting>
+
+
+define i32 @test6(i1* %ptr, i1 %cond1, i32* %a.ptr, i32* %b.ptr) {
+entry:
+ br label %loop_begin
+
+loop_begin:
+ %v = load i1, i1* %ptr
+ br i1 %cond1, label %loop_a, label %loop_b
+
+loop_a:
+ br label %loop_a_inner
+
+loop_a_inner:
+ %va = load i1, i1* %ptr
+ %a = load i32, i32* %a.ptr
+ br i1 %va, label %loop_a_inner, label %loop_a_inner_exit
+
+loop_a_inner_exit:
+ %a.lcssa = phi i32 [ %a, %loop_a_inner ]
+ br label %latch
+
+loop_b:
+ br label %loop_b_inner
+
+loop_b_inner:
+ %vb = load i1, i1* %ptr
+ %b = load i32, i32* %b.ptr
+ br i1 %vb, label %loop_b_inner, label %loop_b_inner_exit
+
+loop_b_inner_exit:
+ %b.lcssa = phi i32 [ %b, %loop_b_inner ]
+ br label %latch
+
+latch:
+ %ab.phi = phi i32 [ %a.lcssa, %loop_a_inner_exit ], [ %b.lcssa, %loop_b_inner_exit ]
+ br i1 %v, label %loop_begin, label %loop_exit
+
+loop_exit:
+ %ab.lcssa = phi i32 [ %ab.phi, %latch ]
+ ret i32 %ab.lcssa
+}
</cut>
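To make the failure mode described in that commit message concrete, here is a
small self-contained C++ sketch (plain standard C++, not LLVM's actual
LoopAnalysisManager or BumpPtrAllocator; names are illustrative) of how results
cached under an object's address go stale once the allocator hands the same
address to a new object, and why an explicit deleted-object notification is
needed:

#include <cassert>
#include <map>
#include <new>
#include <string>

struct Loop { std::string name; };                // stand-in for llvm::Loop

struct AnalysisCache {
  std::map<const Loop *, std::string> results;    // keyed by pointer, like Loop*
  void mark_deleted(const Loop *l) { results.erase(l); }  // the needed callback
};

int main() {
  AnalysisCache cache;
  alignas(Loop) unsigned char slot[sizeof(Loop)]; // one bump-allocator slot

  Loop *a = new (slot) Loop{"loop_a_inner"};
  cache.results[a] = "LoopAccessAnalysis for loop_a_inner";

  a->~Loop();                // the loop is destroyed during unswitching...
  // cache.mark_deleted(a);  // ...but without this call the entry stays behind

  Loop *b = new (slot) Loop{"loop_b"};            // the allocator reuses the address
  auto it = cache.results.find(b);                // "hit": stale result of a dead loop
  assert(it != cache.results.end()
         && it->second == "LoopAccessAnalysis for loop_a_inner");

  b->~Loop();
  return 0;
}

The patch wires the DestroyLoopCB callback through deleteDeadBlocksFromLoop so
that exactly this kind of notification (markLoopAsDeleted) is issued when a
child loop is destroyed, which clears the cached analysis results.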
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7
This commit has regressed these CI configurations:
- tcwg_gnu_cross_build/master-aarch64
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Even more details: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400
c++: implement C++17 hardware interference size
The last missing piece of the C++17 standard library is the hardware
intereference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.
In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.
To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable. The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.
JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74
I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.
64 bytes still seems correct for all x86.
I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.
He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.
Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.
With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.
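[As an aside for readers of this report, and not part of the quoted patch: a
minimal C++17 usage sketch of these constants, in the spirit of the example the
patch adds to invoke.texi further below, guarded by the library feature-test
macro since not every standard library ships the values.]

#include <atomic>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
struct counters {
  // Keep the two hot atomics on separate cache lines to avoid false sharing.
  alignas(std::hardware_destructive_interference_size) std::atomic<int> produced{0};
  alignas(std::hardware_destructive_interference_size) std::atomic<int> consumed{0};
};

// A single atomic should fit comfortably within one "constructive" unit.
static_assert(sizeof(std::atomic<int>)
              <= std::hardware_constructive_interference_size);
#endif

int main() { return 0; }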
gcc/ChangeLog:
* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.
gcc/c-family/ChangeLog:
* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.
gcc/cp/ChangeLog:
* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.
libstdc++-v3/ChangeLog:
* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
---
gcc/c-family/c-cppbuiltin.c | 14 ++++++
gcc/c-family/c.opt | 5 ++
gcc/config/aarch64/aarch64.c | 22 +++++++++
gcc/config/arm/arm.c | 22 +++++++++
gcc/config/i386/i386-options.c | 6 +++
gcc/cp/constexpr.c | 33 +++++++++++++
gcc/cp/decl.c | 32 ++++++++++++
gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++
gcc/params.opt | 16 ++++++
gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++
gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++
gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++
gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++
gcc/testsuite/g++.target/arm/interference.C | 9 ++++
gcc/testsuite/g++.target/i386/interference.C | 8 +++
libstdc++-v3/include/std/version | 3 ++
libstdc++-v3/libsupc++/new | 10 +++-
17 files changed, 279 insertions(+), 2 deletions(-)
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 48cbefd8bf8..ce88e707127 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile)
builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
targetm.atomic_test_and_set_trueval);
+ /* Macros for C++17 hardware interference size constants. Either both or
+ neither should be set. */
+ gcc_assert (!param_destruct_interfere_size
+ == !param_construct_interfere_size);
+ if (param_destruct_interfere_size)
+ {
+ /* FIXME The way of communicating these values to the library should be
+ part of the C++ ABI, whether macro or builtin. */
+ builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
+ param_destruct_interfere_size);
+ builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
+ param_construct_interfere_size);
+ }
+
/* ptr_type_node can't be used here since ptr_mode is only set when
toplev calls backend_init which is not done with -E or pch. */
psize = POINTER_SIZE_UNITS;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c5fe90003f2..9c151d19870 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -722,6 +722,11 @@ Winit-list-lifetime
C++ ObjC++ Var(warn_init_list) Warning Init(1)
Warn about uses of std::initializer_list that can result in dangling pointers.
+Winterference-size
+C++ ObjC++ Var(warn_interference_size) Warning Init(1)
+Warn about nonsensical values of --param destructive-interference-size or
+constructive-interference-size.
+
Wimplicit
C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
Warn about implicit declarations.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 30d9a0b7a3d..36519ccc5a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l1_cache_line_size,
aarch64_tune_params.prefetch->l1_cache_line_size);
+
+ if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic AArch64 target, cover the current range of cache line
+ sizes. */
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ 256);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ 64);
+ }
+
if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l2_cache_size,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1e628253d0..6c6e77fab66 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3669,6 +3669,28 @@ arm_option_override (void)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_line_size,
current_tune->prefetch.l1_cache_line_size);
+ if (current_tune->prefetch.l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic ARM target, JF Bastien proposed using 64 for both. */
+ /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for
+ constructive? */
+ /* More recent Cortex chips have a 64-byte cache line, but are marked
+ ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+ }
+
if (current_tune->prefetch.l1_cache_size >= 0)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_size,
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 2cb87cedec0..c0006b3674b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p,
SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
ix86_tune_cost->l2_cache_size);
+ /* 64B is the accepted value for these for all x86. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+
/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */
if (opts->x_flag_prefetch_loop_arrays < 0
&& HAVE_prefetch
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7772fe62d95..0c2498aee22 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc)
"%<constexpr%> function in C++20");
}
+/* We're getting the constant value of DECL in a manifestly constant-evaluated
+ context; maybe complain about that. */
+
+static void
+maybe_warn_about_constant_value (location_t loc, tree decl)
+{
+ static bool explained = false;
+ if (cxx_dialect >= cxx17
+ && warn_interference_size
+ && !global_options_set.x_param_destruct_interfere_size
+ && DECL_CONTEXT (decl) == std_node
+ && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size")
+ && (LOCATION_FILE (input_location) != main_input_filename
+ || module_exporting_p ())
+ && warning_at (loc, OPT_Winterference_size, "use of %qD", decl)
+ && !explained)
+ {
+ explained = true;
+ inform (loc, "its value can vary between compiler versions or "
+ "with different %<-mtune%> or %<-mcpu%> flags");
+ inform (loc, "if this use is part of a public ABI, change it to "
+ "instead use a constant variable you define");
+ inform (loc, "the default value for the current CPU tuning "
+ "is %d bytes", param_destruct_interfere_size);
+ inform (loc, "you can stabilize this value with %<--param "
+ "hardware_destructive_interference_size=%d%>, or disable "
+ "this warning with %<-Wno-interference-size%>",
+ param_destruct_interfere_size);
+ }
+}
+
/* Attempt to reduce the expression T to a constant value.
On failure, issue diagnostic and return error_mark_node. */
/* FIXME unify with c_fully_fold */
@@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
r = *p;
break;
}
+ if (ctx->manifestly_const_eval)
+ maybe_warn_about_constant_value (loc, t);
if (COMPLETE_TYPE_P (TREE_TYPE (t))
&& is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false))
{
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bce62ad202a..c2065027369 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void)
/* Show we use EH for cleanups. */
if (flag_exceptions)
using_eh_for_cleanups ();
+
+ /* Check that the hardware interference sizes are at least
+ alignof(max_align_t), as required by the standard. */
+ const int max_align = max_align_t_align () / BITS_PER_UNIT;
+ if (param_destruct_interfere_size)
+ {
+ if (param_destruct_interfere_size < max_align)
+ error ("%<--param destructive-interference-size=%d%> is less than "
+ "%d", param_destruct_interfere_size, max_align);
+ else if (param_destruct_interfere_size < param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param destructive-interference-size=%d%> "
+ "is less than %<--param l1-cache-line-size=%d%>",
+ param_destruct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_destruct_interfere_size = param_l1_cache_line_size;
+ /* else leave it unset. */
+
+ if (param_construct_interfere_size)
+ {
+ if (param_construct_interfere_size < max_align)
+ error ("%<--param constructive-interference-size=%d%> is less than "
+ "%d", param_construct_interfere_size, max_align);
+ else if (param_construct_interfere_size > param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param constructive-interference-size=%d%> "
+ "is greater than %<--param l1-cache-line-size=%d%>",
+ param_construct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_construct_interfere_size = param_l1_cache_line_size;
}
/* Enter an abi node in global-module context. returns a cookie to
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23cc68f92b5..78cfc100ac2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore,
seemingly insignificant changes in the source program can cause the
warnings produced by @option{-Winline} to appear or disappear.
+@item -Winterference-size
+@opindex Winterference-size
+Warn about use of C++17 @code{std::hardware_destructive_interference_size}
+without specifying its value with @option{--param destructive-interference-size}.
+Also warn about questionable values for that option.
+
+This variable is intended to be used for controlling class layout, to
+avoid false sharing in concurrent code:
+
+@smallexample
+struct independent_fields @{
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> one;
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> two;
+@};
+@end smallexample
+
+Here @samp{one} and @samp{two} are intended to be far enough apart
+that stores to one won't require accesses to the other to reload the
+cache line.
+
+By default, @option{--param destructive-interference-size} and
+@option{--param constructive-interference-size} are set based on the
+current @option{-mtune} option, typically to the L1 cache line size
+for the particular target CPU, sometimes to a range if tuning for a
+generic target. So all translation units that depend on ABI
+compatibility for the use of these variables must be compiled with
+the same @option{-mtune} (or @option{-mcpu}).
+
+If ABI stability is important, such as if the use is in a header for a
+library, you should probably not use the hardware interference size
+variables at all. Alternatively, you can force a particular value
+with @option{--param}.
+
+If you are confident that your use of the variable does not affect ABI
+outside a single build of your project, you can turn off the warning
+with @option{-Wno-interference-size}.
+
@item -Wint-in-bool-context
@opindex Wint-in-bool-context
@opindex Wno-int-in-bool-context
@@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride.
This setting is only useful for strides that are known and constant.
+@item destructive-interference-size
+@item constructive-interference-size
+The values for the C++17 variables
+@code{std::hardware_destructive_interference_size} and
+@code{std::hardware_constructive_interference_size}. The destructive
+interference size is the minimum recommended offset between two
+independent concurrently-accessed objects; the constructive
+interference size is the maximum recommended size of contiguous memory
+accessed together. Typically both will be the size of an L1 cache
+line for the target, in bytes. For a generic target covering a range of L1
+cache line sizes, typically the constructive interference size will be
+the small end of the range and the destructive size will be the large
+end.
+
+The destructive interference size is intended to be used for layout,
+and thus has ABI impact. The default value is not expected to be
+stable, and on some targets varies with @option{-mtune}, so use of
+this variable in a context where ABI stability is important, such as
+the public interface of a library, is strongly discouraged; if it is
+used in that context, users can stabilize the value using this
+option.
+
+The constructive interference size is less sensitive, as it is
+typically only used in a @samp{static_assert} to make sure that a type
+fits within a cache line.
+
+See also @option{-Winterference-size}.
+
@item loop-interchange-max-num-stmts
The maximum number of stmts in a loop to be interchanged.
diff --git a/gcc/params.opt b/gcc/params.opt
index 3a701e22c46..658ca028851 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
The size of L1 cache line.
+-param=destructive-interference-size=
+Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
+The minimum recommended offset between two concurrently-accessed objects to
+avoid additional performance degradation due to contention introduced by the
+implementation. Typically the L1 cache line size, but can be larger to
+accommodate a variety of target processors with different cache line sizes.
+C++17 code might use this value in structure layout, but is strongly
+discouraged from doing so in public ABIs.
+
+-param=constructive-interference-size=
+Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
+The maximum recommended size of contiguous memory occupied by two objects
+accessed with temporal locality by concurrent threads. Typically the L1 cache
+line size, but can be smaller to accommodate a variety of target processors with
+different cache line sizes.
+
-param=l1-cache-size=
Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
The size of L1 cache.
diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C
new file mode 100644
index 00000000000..2af75c63f83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+// { dg-additional-options -fmodules-ts }
+
+module ;
+
+#include <new>
+
+export module foo;
+
+export {
+ struct A {
+ alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size }
+ };
+}
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C
new file mode 100644
index 00000000000..57c001bc032
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.C
@@ -0,0 +1,6 @@
+// Test that we warn about use of std::hardware_destructive_interference_size
+// in a header.
+// { dg-do compile { target c++17 } }
+
+// { dg-warning Winterference-size "" { target *-*-* } 0 }
+#include "Winterference.H"
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H
new file mode 100644
index 00000000000..36f0ad5f6d1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.H
@@ -0,0 +1,7 @@
+#include <new>
+
+struct A
+{
+ alignas(std::hardware_destructive_interference_size) int i;
+ alignas(std::hardware_destructive_interference_size) int j;
+};
diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
new file mode 100644
index 00000000000..0fc01655223
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
+// 128 or even 256.
+static_assert(std::hardware_destructive_interference_size == 256);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
new file mode 100644
index 00000000000..34fe8a52bff
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Recent ARM CPUs have a cache line size of 64. Older ones have
+// a size of 32, but I guess they're old enough that we don't care?
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
new file mode 100644
index 00000000000..c7b910e3ada
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/interference.C
@@ -0,0 +1,8 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// It is generally agreed that these are the right values for all x86.
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index f950bf0f0db..f41004b5911 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -140,6 +140,9 @@
#define __cpp_lib_filesystem 201703
#define __cpp_lib_gcd 201606
#define __cpp_lib_gcd_lcm 201606
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+#endif
#define __cpp_lib_hypot 201603
#define __cpp_lib_invoke 201411L
#define __cpp_lib_lcm 201606
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 3349b13fd1b..7bc67a6cb02 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
} // extern "C++"
#if __cplusplus >= 201703L
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
namespace std
{
+#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
#define __cpp_lib_launder 201606
/// Pointer optimization barrier [ptr.launder]
template<typename _Tp>
@@ -205,8 +205,14 @@ namespace std
void launder(const void*) = delete;
void launder(volatile void*) = delete;
void launder(const volatile void*) = delete;
-}
#endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
+
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+ inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
+ inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
+#endif // __GCC_DESTRUCTIVE_SIZE
+}
#endif // C++17
#if __cplusplus > 201703L
</cut>
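As a usage note (not part of the patch above; the struct and field names are illustrative), here is a minimal self-contained C++17 sketch of how the constants this commit defines in <new> are intended to be used, following the doc/invoke.texi text quoted above:
<cut>
// Minimal sketch of the intended use of the C++17 interference-size constants.
// Assumes a libstdc++ built with this commit, so that <new> defines
// __cpp_lib_hardware_interference_size and the two std:: variables.
#include <atomic>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
// Destructive size: keep independently-updated hot fields on separate cache
// lines to avoid false sharing between threads.
struct counters {
  alignas(std::hardware_destructive_interference_size) std::atomic<int> reads{0};
  alignas(std::hardware_destructive_interference_size) std::atomic<int> writes{0};
};

// Constructive size: check that data meant to be accessed together can share
// one cache line.
struct hot_pair {
  int first;
  int second;
};
static_assert(sizeof(hot_pair) <= std::hardware_constructive_interference_size,
              "hot_pair is expected to fit within one cache line");
#endif

int main() { return 0; }
</cut>
This compiles with, for example, g++ -std=c++17. Because the alignas use is in the main source file, it does not trigger the new -Winterference-size warning; moving it into an included header would, unless the value is pinned with --param destructive-interference-size= or the warning is disabled with -Wno-interference-size (see the constexpr.c hunk above).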
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap:
2
This commit has regressed these CI configurations:
- tcwg_gcc_bootstrap/master-aarch64-bootstrap
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6
This commit has regressed these CI configurations:
- tcwg_gnu_native_build/master-arm
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Even more details: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig
Culprit:
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date: Tue Aug 31 10:40:25 2021 +0800
net: Add depends on OF_NET for LiteX's LiteETH
Current settings may produce a build error when CONFIG_OF_NET is disabled.
CONFIG_OF_NET controls the header file <linux/of.h> and some functions in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
</cut>
Results regressed to (for first_bad == c3496da580b0fc10fdeba8f6a5e6aef4c78b5598)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# First few build errors in logs:
from (for last_good == a9e7c3cedc2914f63cd135b75832b9bf850af782)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# linux boot successful:
boot
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
cd investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach a9e7c3cedc2914f63cd135b75832b9bf850af782
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Full commit (up to 1000 lines):
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date: Tue Aug 31 10:40:25 2021 +0800
net: Add depends on OF_NET for LiteX's LiteETH
Current settings may produce a build error when
CONFIG_OF_NET is disabled. CONFIG_OF_NET controls
a header file, <linux/of.h>, and some functions
in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
drivers/net/ethernet/litex/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/litex/Kconfig b/drivers/net/ethernet/litex/Kconfig
index 265dba414b41..63bf01d28f0c 100644
--- a/drivers/net/ethernet/litex/Kconfig
+++ b/drivers/net/ethernet/litex/Kconfig
@@ -17,6 +17,7 @@ if NET_VENDOR_LITEX
config LITEX_LITEETH
tristate "LiteX Ethernet support"
+ depends on OF_NET
help
If you wish to compile a kernel for hardware with a LiteX LiteEth
device then you should answer Y to this.
</cut>
Identified regression caused by *gcc:01b5038718056b024b370b74a874fbd92c5bbab3*:
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Disable threading through latches until after loop optimizations.
Results regressed to (for first_bad == 01b5038718056b024b370b74a874fbd92c5bbab3)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-01b5038718056b024b370b74a874fbd92c5bbab3/results_id:
1
# 459.GemsFDTD,GemsFDTD_base.default regressed by 102
# 464.h264ref,h264ref_base.default regressed by 102
from (for last_good == fb88bf9931f17d137eb50c001e1c924aa1e34e83)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-fb88bf9931f17d137eb50c001e1c924aa1e34e83/results_id:
1
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
cd investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 01b5038718056b024b370b74a874fbd92c5bbab3
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach fb88bf9931f17d137eb50c001e1c924aa1e34e83
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 9 20:30:28 2021 +0200
Disable threading through latches until after loop optimizations.
The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.
As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]). I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite. They're mostly noise
now.
Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point. I have added a reminder to the function to keep this
in mind.
Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader. At some point I'd like to combine the cost models.
Tested on x86-64 Linux.
p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:
https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html
gcc/ChangeLog:
* tree-pass.h (PROP_loop_opts_done): New.
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Intersect with global range.
* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Disable
threading through latches until after loop optimizations have run.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
threading through latches.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
Co-authored-by: Michael Matz <matz(a)suse.de>
---
gcc/gimple-range-path.cc | 3 ++
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c | 4 +--
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 37 ++---------------------
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 17 +----------
gcc/tree-pass.h | 2 ++
gcc/tree-ssa-loop.c | 2 +-
gcc/tree-ssa-threadbackward.c | 28 +++++++++++++++--
7 files changed, 37 insertions(+), 56 deletions(-)
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
if (stmt && range_defined_in_block (r, name, bb))
{
+ if (TREE_CODE (name) == SSA_NAME)
+ r.intersect (gimple_range_global (name));
+
set_cache (r, name);
return true;
}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index e1c33e86cd7..823ada982ff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
void foo();
void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
case. And we want to thread through the header as well. These
are both caught by threading in DOM. */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index c7bf867b084..ee46759bacc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,41 +1,8 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
-/* All the threads in the thread1 dump start on a X->BB12 edge, as can
- be seen in the dump:
-
- Registering FSM jump thread: (x, 12) incoming edge; ...
- etc
- etc
-
- Before the new evrp, we were threading paths that started at the
- following edges:
-
- Registering FSM jump thread: (10, 12) incoming edge
- Registering FSM jump thread: (6, 12) incoming edge
- Registering FSM jump thread: (9, 12) incoming edge
-
- This was because the PHI at BB12 had constant values coming in from
- BB10, BB6, and BB9:
-
- # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)>
-
- Now with the new evrp, we get:
-
- # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)>
-
- Thus, we have 3 more paths that are known to be constant and can be
- threaded. Which means that by the second threading pass, we can
- only find one profitable path.
-
- For the record, all these extra constants are better paths coming
- out of switches. For example:
-
- SWITCH_BB -> BBx -> BBy -> BBz -> PHI
-
- We now know the value of the switch index at PHI. */
/* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */
+/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */
int sum0, sum1, sum2, sum3;
int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 5fc2145a432..ba07942f9dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,23 +1,8 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* Here we have the same issue as was commented in ssa-dom-thread-6.c.
- The PHI coming into the threader has a lot more constants, so the
- threader can thread more paths.
-
-$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2
-252c252
-< # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)>
----
-> # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)>
-272a273
-
- I spot checked a few and they all have the same pattern. We are
- basically tracking the switch index better through multiple
- paths. */
-
/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 83941bc0cee..eb75eb17951 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -225,6 +225,8 @@ protected:
been optimized. */
#define PROP_gimple_lomp_dev (1 << 16) /* done omp_device_lower */
#define PROP_rtl_split_insns (1 << 17) /* RTL has insns split. */
+#define PROP_loop_opts_done (1 << 18) /* SSA loop optimizations
+ have completed. */
#define PROP_gimple \
(PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 0cc4b3bbccf..1bbf2f1fb2c 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -540,7 +540,7 @@ const pass_data pass_data_tree_loop_done =
OPTGROUP_LOOP, /* optinfo_flags */
TV_NONE, /* tv_id */
PROP_cfg, /* properties_required */
- 0, /* properties_provided */
+ PROP_loop_opts_done, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
TODO_cleanup_cfg, /* todo_flags_finish */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..e72992328de 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
#include "ssa.h"
#include "tree-cfgcleanup.h"
#include "tree-pretty-print.h"
+#include "cfghooks.h"
// Path registry for the backwards threader. After all paths have been
// registered with register_path(), thread_through_all_blocks() is called
@@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
TAKEN_EDGE, otherwise it is NULL.
CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
- would create an irreducible loop. */
+ would create an irreducible loop.
+
+ ?? It seems we should be able to loosen some of the restrictions in
+ this function after loop optimizations have run. */
bool
back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
@@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
the last entry in the array when determining if we thread
through the loop latch. */
if (loop->latch == bb)
- threaded_through_latch = true;
+ {
+ threaded_through_latch = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " (latch)");
+ }
}
gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
"a multiway branch.\n");
return false;
}
+
+ /* Threading through an empty latch would cause code to be added to
+ the latch. This could alter the loop form sufficiently to cause
+ loop optimizations to fail. Disable these threads until after
+ loop optimizations have run. */
+ if ((threaded_through_latch
+ || (taken_edge && taken_edge->dest == loop->latch))
+ && !(cfun->curr_properties & PROP_loop_opts_done)
+ && empty_block_p (loop->latch))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ " FAIL: FSM Thread through latch before loop opts would create non-empty latch\n");
+ return false;
+
+ }
return true;
}
</cut>
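To make the restriction above more concrete, here is an illustrative FSM-style loop of the kind the ssa-dom-thread tests exercise (the function and variable names are invented for this sketch, not taken from the commit). After one iteration, the loop header's PHI for `state` often carries a known constant, so the backwards threader can resolve the switch along a path that runs through the loop latch; realizing such a thread copies a jump into the latch, which is exactly what the patch defers until PROP_loop_opts_done is set (i.e. to the thread3/thread4 passes).
<cut>
int sum0, sum1;

void
fsm (const char *s)
{
  int state = 0;
  while (*s)                 /* loop header: `state` is a PHI here */
    {
      switch (state)         /* multiway branch the threader wants to resolve */
        {
        case 0:
          state = (*s == 'a') ? 1 : 0;
          break;
        case 1:
          sum0++;
          state = 2;
          break;
        default:
          sum1++;
          state = 0;
          break;
        }
      s++;                   /* control then flows through the loop latch
                                back to the header */
    }
}
</cut>
Before the patch, such header-to-latch paths could be threaded early (thread1/thread2); after it, they are rejected while the latch is still empty and picked up again once loop optimizations have completed, which is what the testsuite adjustments above (thread1/thread2 scans moving to thread3) reflect.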