Holiday [2/10]
Juno crash analysis [2/10]
* Spent some time fiddling with kexec on AArch64
* Worked in only one very specific case
* Another patch series is (apparently) coming; will look out for it
and try again
SPEC-on-Android [2/10]
* Supporting Qian on getting this working
* Wrote a readme for the repository, fixed a Makefile bug that Qian's
cross-compiler happened to tickle
Jenkins benchmarking job - TCWG-348 [1/10]
* Tested and tidied up the pbl hacks that generate JSON
* Tested my pbl with Jenkins prototype jobs
* A few minor bug fixes/enhancements for pbl
LAVA jobs for uinstance - TCWG-432 [1/10]
* Reworked jobs to support uinstance, maintaining backward
compatibility as far as possible
* Started adding support for submitting results to a bundle stream
Misc [2/10]
* Debian FS ready to submit
* Usual meetings/mail/etc background
=Plan=
Look at doing pbl hacks properly in Fathi's in-development refactored p-b-l
Pull together Jenkins/LAVA/pbl, ready to test when uinstance is available
Write up noise control report
(If time, if patches land) have another go at crashdump
== Progress ==
o Linaro GCC (4/10)
* Delivered GCC 4.9 2015.10 snapshot
* More backports for GCC 5 2015.11
* Many instabilities on Hetzner this week
o Upstream work (2/10)
* Sanitizing gfortran testsuite
o Release tools (2/10)
* Added support for RCs and binaries to our snapshots.linaro.org
publishing job
o Misc (2/10)
* Various meetings
* Some support
== Plan ==
o Track missing backport dependencies
o Continue ongoing tasks.
== This week ==
* TCWG-369 - Exploit wide add operations when appropriate for AArch64 (4/10)
- Determined that the vectorizer fails for all targets that support
widening adds from V8HI to V4SI (aarch64, ia64, powerpc).
- Modified the test cases to mark them as expected failures on targets
with V8HI-to-V4SI widening-add support
- Patch sent upstream for approval
* Bugzilla 68223 - arm_[su]min_cmp pattern fails (3/10)
- Resolved by reverting the patch for TCWG-146, as the pattern fails in
some corner cases.
- Revert checked in upstream
* Misc (1/10)
- Conference calls
* Illness, November 2nd (2/10)
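As an illustration of the kind of loop TCWG-369 is about, the sketch below sums 16-bit elements into a 32-bit accumulator, so every add widens a short to an int. It is not the actual test case from the patch; whether a given GCC version vectorizes it with widening adds (for example SADDW/SADDW2 on AArch64) depends on the target and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of 16-bit elements into a 32-bit accumulator. Each a[i] is
   widened from short to int before the add, which a vectorizer can
   map to widening adds from V8HI (8 x 16-bit) to V4SI (4 x 32-bit). */
int widen_sum(const short *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i]; /* implicit short -> int widening */
    return sum;
}
```

Compiling this with something like `-O3 -S` and inspecting the inner loop is one way to check whether widening adds are used.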
== Next week ==
- TCWG-317 - Resolve LTO big-endian failures
== Progress ==
- Leave (2/10)
- Widening pass (TCWG-547) - 5/10
* Made the latest changes requested in the review
* Fixed the bootstrap failure and bootstrap comparison failure on ppc64-linux-gnu
* Making uninitialized variables anonymous SSA names (as requested in the
review) results in a few ICEs.
* Posted updated patch for feedback
- Misc (3/10)
* Started looking into LTO status
* Looked at LuaJIT for ARM
* gcc/bug list
== Plan ==
* Continue with the widening pass based on feedback
* Look at implementing LuaJIT for AArch64
* LTO
== This week ==
* TCWG-72 (6/10)
- 5 iterations since the original patch. Changes include:
a) Integration into widening_mul patch
b) Rewrote the divmod transform so that DIVMOD() is placed before the topmost
div/mod stmt
c) Removed the widening-mode check and the optab handler check from expand_DIVMOD
d) Fixed an ICE when a constant is one of the operands of the div/mod stmt.
e) Fixed a mis-compilation in a test case where the operands matched but in
the opposite order.
f) Formatting nits and fixed test-cases.
- Richard suggested there is no need to check for post-domination conditions.
- Not sure which condition should gate the transform.
Checking for the availability of divmod/div/mod is not sufficient, because ARM
defines an optab handler for mod that only matches r0 % n where n is a
constant power of 2; other cases are expanded via the divmod libcall
through expand_divmod. We would instead need to check whether the
mod/div template actually gets matched, rather than just whether an
optab handler exists.
AFAIK this cannot be done during the tree-ssa passes.
I can think of two approaches:
a) Do the transform to the DIVMOD representation unconditionally in the
widening_mul pass.
Then, in expand_DIVMOD, check whether the template for mod can be matched.
If it does match, undo the transform from DIVMOD back to the original
representation and expand that instead.
I am not sure how feasible it is to undo the transform at expansion
time and start expanding the modified CFG.
b) Define a new target hook, combine_divmod.
The default implementation could check for an optab handler for div/mod/divmod,
and I could override it in the ARM backend to additionally check whether the
second operand is a constant power of 2, and fail in that case
(since we want this to be expanded from the modsi3 pattern).
I am not sure this is a good idea, since it replicates information from
the modsi3 pattern: if the pattern changes, the hook would also need to be changed.
* Convert ASM_FORMAT_PRIVATE_NAME to hook (2/10)
* TCWG-319 (1/10)
- Benchmarking for the patch in progress
* Misc (1/10)
- Meetings
- Sync with Kugan
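The TCWG-72 transform and its gating problem can be modelled in plain C. In the sketch below, `quot_rem` is the source pattern (quotient and remainder of the same operands, which libc's `div()` also returns together), `mod_pow2` shows why a modulus by a constant power of 2 needs neither a division nor a libcall, and `arm_combine_divmod` is a hypothetical model of option b). None of these names are GCC internals: the struct fields merely stand in for optab queries, and treating "a handler for any of div/mod/divmod" as the default condition is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Source pattern targeted by the DIVMOD transform: quotient and
   remainder computed from the same two operands. */
static div_t quot_rem(int a, int b)
{
    div_t d;
    d.quot = a / b;
    d.rem = a % b;
    return d;
}

/* True if v is a positive power of two. */
static bool is_pow2(long v)
{
    return v > 0 && (v & (v - 1)) == 0;
}

/* Remainder by a power-of-2 constant without a division, matching C's
   truncating %. Assumes a two's complement int. */
static int mod_pow2(int x, int n)
{
    assert(is_pow2(n));
    int r = x & (n - 1);   /* correct as-is for x >= 0 */
    if (x < 0 && r != 0)
        r -= n;            /* negative dividend: remainder is <= 0 */
    return r;
}

/* Hypothetical model of the proposed combine_divmod hook (not GCC code). */
struct divmod_query {
    bool has_div, has_mod, has_divmod; /* "optab handler exists" */
    bool op1_is_const;                 /* second operand is a constant */
    long op1_value;
};

/* Default: combine if a div, mod or divmod handler exists (assumption). */
static bool default_combine_divmod(const struct divmod_query *q)
{
    return q->has_div || q->has_mod || q->has_divmod;
}

/* ARM-style override: leave constant power-of-2 divisors alone so the
   mod can still be expanded from the modsi3 pattern. */
static bool arm_combine_divmod(const struct divmod_query *q)
{
    if (q->op1_is_const && is_pow2(q->op1_value))
        return false;
    return default_combine_divmod(q);
}
```

The override duplicates the "constant power of 2" condition that lives in the modsi3 pattern, which is exactly the maintenance concern raised under option b).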
== Next Week ==
- Continue with TCWG-72
- Complete the patch with build, test and config-builds for
ASM_FORMAT_PRIVATE_NAME and submit upstream
- Continue benchmarking TCWG-319, TCWG-310
== Progress ==
* Buildbots (5/10)
- Some broken bots, bisecting, etc
- Helping a MIPS patch pass on the ARM bot
* Maintenance (2/10)
- SciMark2 no longer seems to be unstable or slow on ARM64
- Some more investigations on Loop Load Elimination
- Profiling bigfib on APM and HiKey
* Background (3/10)
- Code review, meetings, discussions, general support, etc.
- Some FOSDEM fiddling
- Some power issues
== Progress ==
* Validation
- moved list of unstable tests to a separate repo, to make
maintenance easier (TCWG-425)
- Jenkins jobs maintenance & cleanup
- a few ABE reporting patches
- comparison of results between old & new lab
* GCC
- trunk monitoring, reported a few new failures.
- Sent a patch to fix the vqtb[lx][34] intrinsics on aarch64_be
* Binutils
- Added a Jenkins job to build+check binutils on
a variety of configurations:
https://ci.linaro.org/view/tcwg-ci/job/tcwg-binutils/
- sent a small patch to fix a bug in the recent STM32L4XX erratum patch
== Next ==
* Validation:
- work on the switch to the new lab, once dev-01 is back online
- more tuning to avoid deadlocks
- re-measure build time on dev-01, to better tune other build jobs
* Two half days off. [2/10]
# Progress #
* TCWG-332, fails in gdb.threads/multiple-step-overs.exp. [1/10]
Testing the simpler approach suggested during the review.
* TCWG-387, done. [1/10] GDB patches pushed.
* TCWG-422, GNU vector extension support in ARM GDB. [2/10]
Patches are done and being tested.
* TCWG-423, GNU vector extension support in AArch64 GDB. [2/10]
Writing patches. Found more issues on AArch64 where GDB doesn't
fully understand the AArch64 calling convention. More work is needed here.
* Review ARM GDBserver software single step patch. [1/10]
* Misc, meetings, email. [1/10]
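A small test program of the sort that exercises GNU vector extension support in GDB (TCWG-422/423) might look like the sketch below; the function names are hypothetical. Passing and returning vectors by value is where GDB has to follow the ARM/AArch64 calling conventions (AAPCS/AAPCS64), for example when printing the return value of `add_v4si` after `finish`, or when calling it from the GDB prompt.

```c
#include <assert.h>

/* GNU C vector extension types: 4 x int32 and 2 x double in 128 bits. */
typedef int v4si __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

/* noinline keeps real call frames around for GDB to inspect. */
__attribute__((noinline)) v4si add_v4si(v4si a, v4si b)
{
    return a + b;                 /* element-wise add */
}

__attribute__((noinline)) v2df scale_v2df(v2df v, double s)
{
    return v * (v2df){s, s};      /* element-wise multiply */
}

/* Bounds-checked lane access using GNU vector subscripting. */
int v4si_lane(v4si v, int i)
{
    assert(i >= 0 && i < 4);
    return v[i];
}
```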
# Plan #
* Off on Wed afternoon.
* TCWG-422, post patches
* TCWG-423, continue.
--
Yao
The Linaro Toolchain Working Group is pleased to announce the availability
of the Linaro Stable Binary Toolchain GCC 5.2-2015.11-rc1
Release-Candidate Archives.
http://snapshots.linaro.org/components/toolchain/binaries/5.2-2015.11-rc1/
http://snapshots.linaro.org/components/toolchain/gcc-linaro/5.2-2015.11-rc1/
These archives provide cross-toolchain executables (compiler, debugger,
linker, etc.) and shared libraries (libstdc++, libc, etc.) that target ARM
or AArch64 GNU/Linux and bare-metal environments. The cross-toolchain
binaries execute on a Linux or MS Windows (under mingw32) host
operating system.
Please evaluate this release-candidate for correctness. Linaro will
shortly spin the Linaro GCC 5.2-2015.11 release if this release-candidate
passes stakeholder validation.
For bugs related to this release-candidate please email
linaro-toolchain@lists.linaro.org or file a bug at
https://bugs.linaro.org/enter_bug.cgi?product=Linux%20Binary%20toolchain
NEWS
* GCC 5.2 2015.11-rc1
The Linaro GCC 5.2 2015.11-rc1 binary toolchain release-candidate is
built from the Linaro GCC-5.2-2015.11 release-candidate source archive.
The Linaro GCC-5.2-2015.11 release source archive is derived from the same
sources as the Linaro GCC-5.2-2015.10 snapshot source archive.
--
Ryan S. Arnold
Linaro Toolchain Working Group - Engineering Manager
www.linaro.org
Dear List,
I'm new to this list and have some questions.
Looking at the code GCC generates for ARMv8, we noticed some areas with room for performance improvement.
I assume you may have noticed some of these items already.
For example:
1) We noticed that, for typical DGEMM-like code, GCC emits unnecessary DUP instructions
2) GCC seems unwilling to use LDP loads
3) For optimal FPU performance on some A57 cores, instructions working on odd and even registers need to be interleaved.
GCC does not seem to support this properly. In some cases, a different instruction interleaving gives a 100% performance increase.
4) Some work loops benefit greatly from interleaving FPU instructions and loads.
GCC seems to like rearranging the code so that most or all loads are placed at the top of the loop.
This can significantly reduce the performance of a well-written work loop.
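For reference, a minimal DGEMM-like kernel of the kind described in item 1 could look like the sketch below (illustrative only, not the code these observations were made on). In the inner loop `A[i * k + p]` is invariant, so a vectorizer typically broadcasts it across the lanes of a vector register, which on AArch64 may show up as a DUP; whether that DUP is redundant depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Naive DGEMM-style kernel: C[m x n] += A[m x k] * B[k x n], row-major. */
void dgemm_naive(size_t m, size_t n, size_t k,
                 const double *restrict A, const double *restrict B,
                 double *restrict C)
{
    assert(A && B && C);
    for (size_t i = 0; i < m; i++)
        for (size_t p = 0; p < k; p++) {
            const double a = A[i * k + p];   /* scalar broadcast across lanes */
            for (size_t j = 0; j < n; j++)   /* vectorizable inner loop */
                C[i * n + j] += a * B[p * n + j];
        }
}
```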
I have no patches to fix these issues,
but I can produce C code and ASM output that demonstrate them.
Please tell me what you would recommend as the next step.
Are all these items known already, or shall I provide code examples to further explain them?
Kind regards
Gunnar von Boehn