Progress:
* UM-2 [QEMU upstream maintainership]
- I'm now back on merging duty for the 8.0 release cycle, so some
time spent on pull request processing
- Sent pull requests with accumulated arm and reset-refactoring
patches from the freeze period
- Trying to cut down my code review backlog before the holidays
-- PMM
# [GNU-767] Support changing SVE vector length in remote debugging
- Patches to gdbserver to support changing the SVE vector length: About
halfway through implementing the new approach of sending new XML
target descriptions through the wire.
# Misc
- Experimented with using a GDB wrapper to run the testsuite. Came up
with a small patch that fixes the tests that fail when using the
wrapper.
--
Thiago
Failure after basepoints/gcc-13-4618-g17ae956c0fa: AArch64: Support new tbranch optab.:
Results changed to
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
7572
# First few build errors in logs:
# 00:10:43 drivers/gpu/drm/v3d/v3d_perfmon.c:57:1: internal compiler error: in decompose, at rtl.h:2288
# 00:10:44 make[5]: *** [scripts/Makefile.build:250: drivers/gpu/drm/v3d/v3d_perfmon.o] Error 1
# 00:10:53 make[4]: *** [scripts/Makefile.build:502: drivers/gpu/drm/v3d] Error 2
# 00:13:52 drivers/media/mc/mc-device.c:198:1: internal compiler error: in decompose, at rtl.h:2288
# 00:13:53 make[4]: *** [scripts/Makefile.build:250: drivers/media/mc/mc-device.o] Error 1
# 00:13:57 make[3]: *** [scripts/Makefile.build:502: drivers/media/mc] Error 2
# 00:15:27 make[2]: *** [scripts/Makefile.build:502: drivers/media] Error 2
# 00:17:50 make[3]: *** [scripts/Makefile.build:502: drivers/gpu/drm] Error 2
# 00:17:50 make[2]: *** [scripts/Makefile.build:502: drivers/gpu] Error 2
# 00:17:50 make[1]: *** [scripts/Makefile.build:502: drivers] Error 2
from
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
8625
# linux build successful:
all
# linux boot successful:
boot
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
For latest status see comments in https://linaro.atlassian.net/browse/GNU-680 .
Status of basepoints/gcc-13-4618-g17ae956c0fa commit for tcwg_kernel:
commit 17ae956c0fa6baac3d22764019d5dd5ebf5c2b11
Author: Tamar Christina <tamar.christina(a)arm.com>
Date: Mon Dec 12 15:18:56 2022 +0000
AArch64: Support new tbranch optab.
This implements the new tbranch optab for AArch64.
we cannot emit one big RTL for the final instruction immediately.
The reason that all comparisons in the AArch64 backend expand to separate CC
compares, and separate testing of the operands is for ifcvt.
The separate CC compare is needed so ifcvt can produce csel, cset etc from the
compares. Unlike say combine, ifcvt can not do recog on a parallel with a
clobber. Should we emit the instruction directly then ifcvt will not be able
to say, make a csel, because we have no patterns which handle zero_extract and
compare. (unlike combine ifcvt cannot transform the extract into an AND).
While you could provide various patterns for this (and I did try) you end up
with broken patterns because you can't add the clobber to the CC register. If
you do, ifcvt recog fails.
i.e.
int
f1 (int x)
{
if (x & 1)
return 1;
return x;
}
We lose csel here.
Secondly the reason the compare with an explicit CC mode is needed is so that
ifcvt can transform the operation into a version that doesn't require the flags
to be set. But it only does so if it know the explicit usage of the CC reg.
For instance
int
foo (int a, int b)
{
return ((a & (1 << 25)) ? 5 : 4);
}
Doesn't require a comparison, the optimal form is:
foo(int, int):
ubfx x0, x0, 25, 1
add w0, w0, 4
ret
and no compare is actually needed. If you represent the instruction using an
ANDS instead of a zero_extract then you get close, but you end up with an ands
followed by an add, which is a slower operation.
gcc/ChangeLog:
* config/aarch64/aarch64.md (*tb<optab><mode>1): Rename to...
(*tb<optab><ALLI:mode><GPI:mode>1): ... this.
(tbranch_<code><mode>4): New.
* config/aarch64/iterators.md(ZEROM, zerom): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/tbz_1.c: New test.
* gnu-master-aarch64-mainline-defconfig
** Failure after basepoints/gcc-13-4618-g17ae956c0fa: AArch64: Support new tbranch optab.:
** https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-master-aarch64-mainline…
Bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-master-aarch64-mainline…
Good build: https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-master-aarch64-mainline…
Reproduce current build:
<cut>
mkdir -p investigate-gcc-17ae956c0fa6baac3d22764019d5dd5ebf5c2b11
cd investigate-gcc-17ae956c0fa6baac3d22764019d5dd5ebf5c2b11
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests for bad and good builds
mkdir -p bad/artifacts good/artifacts
curl -o bad/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-master-aarch64-mainline… --fail
curl -o good/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-master-aarch64-mainline… --fail
# Reproduce bad build
(cd bad; ../jenkins-scripts/tcwg_kernel-build.sh ^^ true %%rr[top_artifacts] artifacts)
# Reproduce good build
(cd good; ../jenkins-scripts/tcwg_kernel-build.sh ^^ true %%rr[top_artifacts] artifacts)
</cut>
Full commit (up to 1000 lines):
<cut>
commit 17ae956c0fa6baac3d22764019d5dd5ebf5c2b11
Author: Tamar Christina <tamar.christina(a)arm.com>
Date: Mon Dec 12 15:18:56 2022 +0000
AArch64: Support new tbranch optab.
This implements the new tbranch optab for AArch64.
we cannot emit one big RTL for the final instruction immediately.
The reason that all comparisons in the AArch64 backend expand to separate CC
compares, and separate testing of the operands is for ifcvt.
The separate CC compare is needed so ifcvt can produce csel, cset etc from the
compares. Unlike say combine, ifcvt can not do recog on a parallel with a
clobber. Should we emit the instruction directly then ifcvt will not be able
to say, make a csel, because we have no patterns which handle zero_extract and
compare. (unlike combine ifcvt cannot transform the extract into an AND).
While you could provide various patterns for this (and I did try) you end up
with broken patterns because you can't add the clobber to the CC register. If
you do, ifcvt recog fails.
i.e.
int
f1 (int x)
{
if (x & 1)
return 1;
return x;
}
We lose csel here.
Secondly the reason the compare with an explicit CC mode is needed is so that
ifcvt can transform the operation into a version that doesn't require the flags
to be set. But it only does so if it know the explicit usage of the CC reg.
For instance
int
foo (int a, int b)
{
return ((a & (1 << 25)) ? 5 : 4);
}
Doesn't require a comparison, the optimal form is:
foo(int, int):
ubfx x0, x0, 25, 1
add w0, w0, 4
ret
and no compare is actually needed. If you represent the instruction using an
ANDS instead of a zero_extract then you get close, but you end up with an ands
followed by an add, which is a slower operation.
gcc/ChangeLog:
* config/aarch64/aarch64.md (*tb<optab><mode>1): Rename to...
(*tb<optab><ALLI:mode><GPI:mode>1): ... this.
(tbranch_<code><mode>4): New.
* config/aarch64/iterators.md(ZEROM, zerom): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/tbz_1.c: New test.
---
gcc/config/aarch64/aarch64.md | 33 ++++++++---
gcc/config/aarch64/iterators.md | 2 +
gcc/testsuite/gcc.target/aarch64/tbz_1.c | 95 ++++++++++++++++++++++++++++++++
3 files changed, 122 insertions(+), 8 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 896b6a8ac79..d749c98eef6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -947,12 +947,29 @@
(const_int 1)))]
)
-(define_insn "*tb<optab><mode>1"
+(define_expand "tbranch_<code><mode>3"
[(set (pc) (if_then_else
- (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
- (const_int 1)
- (match_operand 1
- "aarch64_simd_shift_imm_<mode>" "n"))
+ (EQL (match_operand:ALLI 0 "register_operand")
+ (match_operand 1 "aarch64_simd_shift_imm_<mode>"))
+ (label_ref (match_operand 2 ""))
+ (pc)))]
+ ""
+{
+ rtx bitvalue = gen_reg_rtx (<ZEROM>mode);
+ rtx reg = gen_lowpart (<ZEROM>mode, operands[0]);
+ rtx val = GEN_INT (1UL << UINTVAL (operands[1]));
+ emit_insn (gen_and<zerom>3 (bitvalue, reg, val));
+ operands[1] = const0_rtx;
+ operands[0] = aarch64_gen_compare_reg (<CODE>, bitvalue,
+ operands[1]);
+})
+
+(define_insn "*tb<optab><ALLI:mode><GPI:mode>1"
+ [(set (pc) (if_then_else
+ (EQL (zero_extract:GPI (match_operand:ALLI 0 "register_operand" "r")
+ (const_int 1)
+ (match_operand 1
+ "aarch64_simd_shift_imm_<ALLI:mode>" "n"))
(const_int 0))
(label_ref (match_operand 2 "" ""))
(pc)))
@@ -963,15 +980,15 @@
{
if (get_attr_far_branch (insn) == 1)
return aarch64_gen_far_branch (operands, 2, "Ltb",
- "<inv_tb>\\t%<w>0, %1, ");
+ "<inv_tb>\\t%<ALLI:w>0, %1, ");
else
{
operands[1] = GEN_INT (HOST_WIDE_INT_1U << UINTVAL (operands[1]));
- return "tst\t%<w>0, %1\;<bcond>\t%l2";
+ return "tst\t%<ALLI:w>0, %1\;<bcond>\t%l2";
}
}
else
- return "<tbz>\t%<w>0, %1, %l2";
+ return "<tbz>\t%<ALLI:w>0, %1, %l2";
}
[(set_attr "type" "branch")
(set (attr "length")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d10cf93572e..a521dbde1ec 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1107,6 +1107,8 @@
;; Give the number of bits in the mode
(define_mode_attr sizen [(QI "8") (HI "16") (SI "32") (DI "64")])
+(define_mode_attr ZEROM [(QI "SI") (HI "SI") (SI "SI") (DI "DI")])
+(define_mode_attr zerom [(QI "si") (HI "si") (SI "si") (DI "di")])
;; Give the ordinal of the MSB in the mode
(define_mode_attr sizem1 [(QI "#7") (HI "#15") (SI "#31") (DI "#63")
diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_1.c b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
new file mode 100644
index 00000000000..39deb58e278
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
@@ -0,0 +1,95 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99 -fno-unwind-tables -fno-asynchronous-unwind-tables" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdbool.h>
+
+void h(void);
+
+/*
+** g1:
+** tbnz w[0-9]+, #?0, .L([0-9]+)
+** ret
+** ...
+*/
+void g1(bool x)
+{
+ if (__builtin_expect (x, 0))
+ h ();
+}
+
+/*
+** g2:
+** tbz w[0-9]+, #?0, .L([0-9]+)
+** b h
+** ...
+*/
+void g2(bool x)
+{
+ if (__builtin_expect (x, 1))
+ h ();
+}
+
+/*
+** g3_ge:
+** tbnz w[0-9]+, #?31, .L[0-9]+
+** b h
+** ...
+*/
+void g3_ge(int x)
+{
+ if (__builtin_expect (x >= 0, 1))
+ h ();
+}
+
+/*
+** g3_gt:
+** cmp w[0-9]+, 0
+** ble .L[0-9]+
+** b h
+** ...
+*/
+void g3_gt(int x)
+{
+ if (__builtin_expect (x > 0, 1))
+ h ();
+}
+
+/*
+** g3_lt:
+** tbz w[0-9]+, #?31, .L[0-9]+
+** b h
+** ...
+*/
+void g3_lt(int x)
+{
+ if (__builtin_expect (x < 0, 1))
+ h ();
+}
+
+/*
+** g3_le:
+** cmp w[0-9]+, 0
+** bgt .L[0-9]+
+** b h
+** ...
+*/
+void g3_le(int x)
+{
+ if (__builtin_expect (x <= 0, 1))
+ h ();
+}
+
+/*
+** g5:
+** mov w[0-9]+, 65279
+** tst w[0-9]+, w[0-9]+
+** beq .L[0-9]+
+** b h
+** ...
+*/
+void g5(int x)
+{
+ if (__builtin_expect (x & 0xfeff, 1))
+ h ();
+}
</cut>
# [GNU-767] Support changing SVE vector length in remote debugging
- Patches to gdbserver to support changing the SVE vector length: There
was an upstream discussion about whether changing the implementation
from relying on expedited registers to relying on sending target
descriptions over the wire was a better approach. Simon Marchi
detailed his idea on how to do that and it does seem better.
- Started implementing Simon's approach of sending target descriptions
over the wire for each thread.
# Misc
- Sent and later committed a couple of patches¹ fixing whitespace issues
in a Python script that generates a GDB source file.
--
Thiago
¹ https://inbox.sourceware.org/gdb-patches/20221202192200.405379-1-thiago.bau…
Hello,
# [GNU-767] Support changing SVE vector length in remote debugging
- v2 of the gdbserver patches to support changing the SVE vector length
was quickly reviewed by both Luis and Simon Marchi. I applied their
review suggestions and I'm now working on fixing a bug with
multi-threaded programs that they spotted.
- Submitted a couple of small patches¹ fixing tab vs spaces issues in
the gdbarch.py script that generates some source code in GDB.
# Misc
- Fixed problem in the tcwg-dev/start.sh script where asking docker to
expose /dev/kvm to a dev container on a host which doesn't have KVM
support causes docker to error out (reported by David Spickett). Sent
Gerrit change request “42669: tcwg-dev: Add heuristic to check for KVM
support on the host” to fix it. David reviewed and merged it. Thanks!
--
Thiago
¹ https://inbox.sourceware.org/gdb-patches/20221202192200.405379-1-thiago.bau…
Hello,
# [GNU-767] Support changing SVE vector length in remote debugging
- Worked on v2 of the gdbserver patches improving SVE support. Found a
couple of simplifications that could be made to the code. Rebased on
current master branch and finished regression testing and patch
preparation. Wrote the cover letter.
- Finally posted the patch series upstream¹.
--
Thiago
¹ https://inbox.sourceware.org/gdb-patches/20221126020452.1686509-1-thiago.ba…
Progress:
* UM-2 [QEMU upstream maintainership]
- More conversions of devices to 3-phase reset in pursuit of the
interim goal of getting rid of device_class_set_parent_reset()
- Investigated a regression in a TF-A workload: this was due to
recent pagetable walk refactoring breaking debug accesses.
- code review, etc; mostly this has been of for-8.0 material, as
(other than the above mentioned ptw regression) there haven't
been many release-worthy issues this freeze cycle.
-- PMM
Hi David,
Our CI flagged your commit; it seems it miscompiles 403.gcc from SPEC CPU2006 at -O3 -flto for aarch64-linux-gnu. Would you please investigate?
Let me know if you need any assistance in reproducing this.
Thanks!
===
After working-3971-g5f7f484ee54e commit 5f7f484ee54ebbf702ee4c5fe9852502dc237121
Author: David Green <david.green(a)arm.com>
[AArch64] Add GPR rr instructions to isAssociativeAndCommutative
the following benchmarks slowed down by more than 3%:
- 403.gcc failed to run
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3 -flto
- Hardware: NVidia TX1 4x Cortex-A57
--
Maxim Kuvyrkov
https://www.linaro.org
Hello,
# [GNU-767] Support changing SVE vector length in remote debugging
- Working on v2 of the gdbserver patches improving SVE support. Found
and fixed a memory leak in my code while preparing patches for
upstream submission. Continuing preparing the patches.
# [GNU-796] Stabilise GDB testsuite results in the CI
- Analysed failures of “fast_check_gdb” CI jobs over the weekend. They
were caused by unrelated build issues in GCC and libstdc++.
--
Thiago
Project Stratos
===============
- finished [blog post for virtio-camera]
- now live [on web site]
- did a review of [virtio-loopback]
- started reviewing [PATCH v9 0/8] KVM: mm: fd-based approach for
supporting KVM Message-Id:
<20221025151344.3784230-1-chao.p.peng(a)linux.intel.com>
- trying to assess if user-space facing solution for memory sharing
[blog post for virtio-camera]
<https://linaro.atlassian.net/jira/core/projects/LBO/board?selectedIssue=LBO…>
[on web site]
<https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/>
[virtio-loopback] <https://git.virtualopensystems.com/virtio-loopback>
vhost-device maintainer effort ([UM-196])
- debugged regression in virtio-vsock and QEMU
- should have some error message patches to post
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
Single Binary ([QEMU-487])
==========================
- posted [PATCH for 8.0 v5 00/20] use MemTxAttrs to avoid current_cpu
in hw/ Message-Id: <20221111182535.64844-1-alex.bennee(a)linaro.org>
[QEMU-487] <https://linaro.atlassian.net/browse/QEMU-487>
QEMU Upstream Work ([UM-2])
===========================
- posted [PATCH for 7.2-rc1 v2 00/12] testing, docs, plugins, arm
pre-PR Message-Id: <20221111145529.4020801-1-alex.bennee(a)linaro.org>
- posted [PULL for 7.2 00/10] testing and doc updates Message-Id:
<20221115133439.2348929-1-alex.bennee(a)linaro.org>
- did a bunch of follow-up testing to shake out other console bugs
- posted [PATCH for 7.2-rc1 v2 00/12] testing, docs, plugins, arm
pre-PR Message-Id:
<20221111145529.4020801-1-alex.bennee(a)linaro.org>
- posted [PATCH for 7.2? v1 0/2] Arm GICv2 patches Message-Id:
<20221115134048.2352715-1-alex.bennee(a)linaro.org>
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Other
=====
- hackbox2 upgrade follow-up
- health check content
Completed Reviews [1/1]
=======================
[PATCH] ci: replace x86_64 macos-11 with aarch64 macos-12
Message-Id: <20221116175023.80627-1-berrange(a)redhat.com>
Absences
========
Current Review Queue
====================
TODO [PATCH v6 00/21] Drivers for gunyah hypervisor
Message-Id: <20221026185846.3983888-1-quic_eberman(a)quicinc.com>
==================================================================================================================
TODO [PATCH v2 0/1] tcg: add perfmap and jitdump
Message-Id: <20221114161321.3364875-1-iii(a)linux.ibm.com>
========================================================================================================
TODO [PATCH for-8.0 v3 00/45] tcg: Support for Int128 with helpers
Message-Id: <20221111074101.2069454-1-richard.henderson(a)linaro.org>
=====================================================================================================================================
--
Alex Bennée