== Progress ==
* Validation
- extended validation: submitted ABE patch for discussion
- noticed random results on some tests in the Cambridge lab.
maybe caused by excessive load on the tester, or stdout/stderr problems
- investigating how to actually separate stdout/stderr streams in dejagnu
* GCC:
- branch merge review for 2016.03 snapshot
- AdvSIMD/Neon intrinsics tests: more cleanup, wondering about
poly128_t prototypes
* Misc (conf calls, meetings, emails, ....)
== Next ==
* Validation:
- more on extended validation
- random tests investigation
- more on stdout/stderr
* GCC:
- trunk monitoring, report regressions if needed
- intrinsics tests cleanup
== This Week ==
* LTO (3/10)
a) section anchors:
- prototype patch to bind functions to global vars
- looked at balanced partitioning
b) chromium LTO build fails with ICE on trunk for arm-linux-gnueabihf:
http://pastebin.com/sX6yKLBP
c) ipa-comdat
- Looked at the pass.
- trying to address TODO: put symbol in its own comdat section
* Validation (1/10)
- prototype job in bash.
* Holidays (6/10)
== Next Week ==
Continue ongoing tasks
Port to microinstance - TCWG-432 [5/10]
* Non-lab side of minimal trust benchmarking
* More investigation of runtime anomalies
* Reordered builder phases to do useful work while waiting for targets
* Updated everything to work with benchmarking LAVA user (rather than
running as me)
Automated backport benchmarking - TCWG-352 [2/10]
* Cycles of review/development/testing
Controlled image builds - TCWG-360 [1/10]
* More failures to get image to boot on Juno
Log critical data - TCWG-349 [1/10]
* Everything now logged, except where it depends on TCWG-360
Misc - [1/10]
=Plan=
Finish non-lab side of minimal trust benchmarking
Commit backport benchmarking, review permitting
Tweak microinstance in reaction to lab work
More Juno image work
More runtime anomaly work
=Availability=
Off from this Friday, back for three days from Monday 4th April
Return to ARM on Thursday 7th April
== This week ==
* Bugzilla 69663 - [ARM] Implement overflow arithmetic standard names (5/10)
- Resolved 50% of thumb2 failures
- Resolved issues with overlapping registers and not setting
condition codes
- Wrote compile only test cases that pass validation
* TCWG-247 - Create Validation Job to run on GCC Trunk Commits (2/10)
- Investigation into Python API for discovering when Jenkins builders
are idle
* Linaro connect recovery day (2/10)
* Misc meeting (1/10)
== Next week ==
* Bugzilla 69663 - [ARM] Implement overflow arithmetic standard names
- Resolve remaining thumb2 issues
* TCWG-247
- Create prototype implementation
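
For context on Bugzilla 69663: the overflow arithmetic standard names
(addv<m>4, subv<m>4 and friends in the machine description) are what the
compiler can use when expanding the __builtin_*_overflow builtins. The
sketch below is only an illustration of the user-visible behaviour those
patterns have to get right, including the negation case; the helper
names are made up for this example and are not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Signed 32-bit add: returns true on overflow; *sum holds the wrapped result. */
static bool add_overflows(int a, int b, int *sum)
{
    assert(sum != NULL);
    return __builtin_add_overflow(a, b, sum);
}

/* Unsigned 64-bit subtract: returns true when a < b (a borrow is needed). */
static bool usub_overflows(unsigned long long a, unsigned long long b,
                           unsigned long long *diff)
{
    assert(diff != NULL);
    return __builtin_sub_overflow(a, b, diff);
}

/* 64-bit negation written as 0 - a: overflows only for LLONG_MIN. */
static bool neg_overflows(long long a, long long *neg)
{
    assert(neg != NULL);
    return __builtin_sub_overflow(0LL, a, neg);
}
```

Compiling helpers like these with -O2 for arm and thumb2 and checking
that the flag-setting instruction is present is one way to write the
compile-only tests mentioned above.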
== Progress ==
* Day off (2/10)
- After Connect, recuperating, jet lagging
* EuroLLVM (6/10)
- Flying Wed to Barcelona, attending conference
- Back on Saturday
* Background (2/10)
- Code review, meetings, discussions, general support, etc.
- Planning for a bigger team (git, Jenkins, infrastructure)
* Sick on Monday [2/10]
# Progress #
* AArch64/ARM linux syscall for process record. [2/10] TCWG-532
ARM patch (fixing the register for syscall arg pass) is committed.
Canonicalize ARM syscall patch is posted for review.
* Support range stepping on arm-linux. [4/10] TCWG-545
Preparatory patches fixing bugs when single-stepping an instruction
that branches to itself are being reviewed. Pedro thinks my patches may
not work in some rare cases, so I spent some time writing up the case
to prove it won't happen.
* Misc [2/10]
** file expense
** upstream patch review
# Plan #
* TCWG-532
* TCWG-545
--
Yao
The Linaro Toolchain Working Group (TCWG) is pleased to announce the
2016.03 snapshot of the Linaro GCC 5 source package.
This monthly snapshot[2] is based on FSF GCC 5.3+svn234210 and
includes performance improvements and bug fixes backported from
mainline GCC. This snapshot's contents will be part of the 2016.05
stable[1] quarterly release.
This snapshot tarball is available on:
http://snapshots.linaro.org/components/toolchain/gcc-linaro/5.3-2016.03/
Interesting changes in this GCC source package snapshot include:
* Updates to GCC 5.3+svn234210
* Backport of [Bugfix] [AArch64] [Linaro #1994] Disable
pcrelative_literal_loads with fix-cortex-a53-843419
* Backport of [Bugfix] [AArch64] [Linaro #2123] Fix dependency of gcc-plugin.h
* Backport of [Bugfix] [AArch32] PR target/62554 target/69610 Fix for ARMv3
* Backport of [Bugfix] [AArch32] PR target/69161: Don't ignore mode
when matching comparison operator in cstore-like patterns
* Backport of [AArch32] Enable instruction fusion of AES instructions
on ARM for Cortex-A53 and Cortex-A57
* Backport of [AArch64] Add missing return in aarch64_internal_mov_immediate
* Backport of [AArch64] Enable instruction fusion of dependent AESE;
AESMC and AESD; AESIMC pairs
* Backport of [AArch64] Fix installed plugin headers for aarch64, m68k and c6x
* Backport of [AArch64] GCC 6 regression in vector performance. - Fix
vector initialization to happen with lane load instructions
* Backport of [AArch64] Restrict 16-bit sqrdml{sa}h instructions to FP_LO_REGS
* Backport of [Testsuite] [AArch64] add check for aarch64 in
check_effective_target_section_anchors
* Backport of [Testsuite] Print markers to stderr to avoid races with
sanitizer output
* Backport of [Misc] Fix ChangeLog for 233518
Subscribe to the important Linaro mailing lists and join our IRC
channels to stay on top of Linaro development.
** Linaro Toolchain Development "mailing list":
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
** Linaro Toolchain IRC channel on irc.freenode.net at @#linaro-tcwg@
* Bug reports should be filed in bugzilla against GCC product:
http://bugs.linaro.org/enter_bug.cgi?product=GCC
* Interested in commercial support? inquire at "Linaro support":
mailto:support@linaro.org
[1]. Stable source package releases are defined as releases where the
full Linaro Toolchain validation plan is executed.
[2]. Source package snapshots are defined when the compiler is only
put through unit-testing and full validation is not performed.
== Progress ==
o BKK16 remote (5/10)
* Followed TCWG sessions
* Extended validation:
- worked with Kugan
- implemented job for native validation
o GCC dev. (4/10)
* Remote validation sanitizing:
- iterate on the output pattern fix
- testing a fix for stdout/stderr ordering issue
* Gave some support on __sync builtins, preparing a fix for armv8.1
o Misc (1/10)
* Various meetings
== Plan ==
o GCC 5 branch merge, and 2016.03 snapshot
o Continue on-going tasks
Port to microinstance - TCWG-432 [7/10]
* Investigating difference between LAVA and 'desktop Juno' runtimes
** Some of this was down to piles of /dev/console output - redirecting
to file improved SPEC build time by 75%!
** Some cases make sense, others remain unexplained
** Might just go away if we update the Juno image
* Wrote up how to do benchmarking for minimal-trust cases
** Needs both lab and development work
* Merged another large tranche of changes back to benchmarking branch
** Microinstance more or less functional, main instance benchmarking
seems unbroken
** But some more tweaks to make as Lab work happens
* Prepared backport benchmarking for merge
Misc [3/10]
=Plan=
* Submit backport benchmarking for review
* Tweak uinstance in reaction to lab work
* Implement the non-lab side of minimal-trust benchmarking
* Return to looking at Juno image generation
* Look some more at LAVA/desktop runtime differences
* Implement small improvements, if time
== This week ==
* Bugzilla 69663 - [ARM] Implement overflow arithmetic standard names (3/10)
- Resolved thumb2 failures
- Negdi2 was not generating instruction to set condition codes
* Bugzilla 70008 - [ARM] Reverse subtract with carry can be generated in
thumb2 mode (1/10)
- Created new patch using new predicate that matches arm and thumb2
constraints
- Received approval for GCC 7 stage 1
* Bugzilla 70014 - [ARM] Predicate does not match constraint
(*subsi3_carryin_const) (1/10)
- Fix checked into trunk
* Linaro Connect meetings (5/10)
== Next week ==
* Bugzilla 69663 - [ARM] Implement overflow arithmetic standard names
- Create compile only test cases and re-run validation testing
- Post new patch upstream
Hey,
Regarding the GCC ABI 5 issue, I was wondering what's the policy
behind updating packages on stable updates for both Debian and Ubuntu.
Our time frame is a bit constrained, and we definitely will have to
take some hard decisions in the next six months, so I'd like to
understand everything that is at stake before I have my own opinion.
LLVM has a 6 month major cycle, releasing around February / August.
Major releases are allowed to break the ABI. Major breakages need one
release warning period.
Ubuntu has a 6 month release cycle, around April / October. IIUC,
major releases are allowed to have new versions of packages, but
updates for the next few years have to keep within the same major
release.
Debian has a -1 years release cycle (heh), and has the same major /
minor policy, which makes it a lot harder to update major versions.
However, I believe unstable is still not closed, nor will be in August
this year, so updating to LLVM 3.9 will not be a problem, but it will
mean users will have to wait a bit more to get a working LLVM.
The time frame is then:
3.8.0 released March (without the fix)
Ubuntu X released April
3.9.0 releases August (hopefully with a fix)
Ubuntu X+1 released October
Debian freezes ??
LLVM 3.8.1 ??
If we don't back-port GCC ABI 5 into 3.8.1, Ubuntu users will not have
the fix ever, unless you *can* update to 3.9.0 in August.
Ubuntu X+1 will be fine using 3.9, as will Debian after August, unless
you guys freeze before that.
I believe both Debian and Ubuntu have a trunk-based LLVM package for
experimental use only, and it would be bad, but not completely broken,
to recommend users to use that meanwhile.
If Debian freezes *before* 3.9.0 is out, or if Ubuntu can't update to
3.9.0 on April's release, then we'll have a strong reason to back-port
the change to 3.8.x. If not, even though it will be uncomfortable for
users until August, the argument is not that strong and will be hard
to get it through.
Any comments? Ideas? Does any of that make sense?
cheers,
--renato
Hi,
I have been comparing the stock gcc 5.2 and the Linaro 5.2 (Linaro GCC
5.2-2015.11-1) and have noticed a difference with the __sync
intrinsics.
Here is the simple test case
--- cut here ---
int add_int(int add_value, int *dest)
{
return __sync_add_and_fetch(dest, add_value);
}
--- cut here ---
Compiling with the stock gcc 5.2 (-S -O3) I get
---------
add_int:
.L2:
ldaxr w2, [x1]
add w2, w2, w0
stlxr w3, w2, [x1]
cbnz w3, .L2
mov w0, w2
ret
---------
Whereas with Linaro gcc 5.2 I get
---------
add_int:
.L2:
ldxr w2, [x1]
add w2, w2, w0
stlxr w3, w2, [x1]
cbnz w3, .L2
dmb ish
mov w0, w2
ret
---------
Why the extra (unnecessary?) memory barrier?
Also, is it worthwhile putting a prfm before the ldaxr? E.g.
add_int:
prfm pstl1strm, [x1]
.L2:
ldaxr w2, [x1]
See the following thread
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html
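
For comparison, here is a minimal sketch of the same operation written
three ways: the legacy __sync builtin from the test case (which the GCC
manual documents as a full barrier in most cases), the __atomic builtin
with the memory order spelled out, and a variant with a write-prefetch
hint ahead of the update. The function names are made up for this
example. Whether the hint becomes a prfm, and which barriers are
emitted, depends on the compiler version and target, so this is a way to
run the experiment, not a statement of what any given toolchain
produces.

```c
#include <assert.h>
#include <stddef.h>

/* Legacy builtin, as in the test case above. */
static int add_int_sync(int add_value, int *dest)
{
    assert(dest != NULL);
    return __sync_add_and_fetch(dest, add_value);
}

/* Same operation with an explicit sequentially consistent memory order. */
static int add_int_seq_cst(int add_value, int *dest)
{
    assert(dest != NULL);
    return __atomic_add_fetch(dest, add_value, __ATOMIC_SEQ_CST);
}

/* Variant with a prefetch hint before the atomic update:
   rw = 1 asks for a prefetch for write, locality = 0 means low temporal
   locality. The hint has no effect on the result. */
static int add_int_prefetch(int add_value, int *dest)
{
    assert(dest != NULL);
    __builtin_prefetch(dest, 1, 0);
    return __atomic_add_fetch(dest, add_value, __ATOMIC_SEQ_CST);
}
```

Building all three with -S -O3 on both toolchains and diffing the output
shows directly where the dmb appears.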
All the best,
Ed
== Progress ==
o GCC dev. (7/10)
* Remote validation sanitizing:
- fixed last issues in dejagnu patch and submitted it upstream
- 2 more cleanup/fix dejagnu patches submitted and merged upstream
- proposed a fix/workaround for the output pattern issues (>400
failures removed with this patch)
o Misc (3/10)
* Various meetings
* internal discussions
== Plan ==
o Try to follow connect remotely
o Extended validation work
== Progress ==
* GCC bugs:
- #2073 tried to reproduce it with a manually-built toolchain. No luck
* GCC validation:
- added support to choose simulated cpu (different from --with-cpu)
* GCC:
- completing Neon intrinsics tests, to prepare cleanup
* Validation:
- small improvements
* Misc (conf calls, meetings, emails, ...)
== Next ==
Remote Connect
== Progress ==
* Support (5/10)
- Working on PR17193
- Continue review on D17141
* Background (5/10)
- Code review, meetings, discussions, general support, etc.
- Connect preparations
- GCC ABI 5 discussions
- Assessing Swift calling convention impact on ARM back-end
- Interviews
# Progress #
* TCWG-545, Handle "branch-to-self" instruction in single stepping.
[5/10] Patches are posted upstream for review.
* TCWG-532, one patch is committed and one patch is posted for review.
[2/10]
* Tweak ARM process record. [2/10]
Two patches are pushed in. Many test failures are fixed.
* FSF patches review. [1/10].
# Plan #
* Linaro Connect.
--
Yao
Hi,
I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to have improved significantly. For example, it now seems much better at using ldp/stp and it seems to have stopped gratuitous use of the SIMD registers.
However, I still have a few whinges:-)
See attached copy.c / copy.s (This is a performance critical function from OpenJDK)
pd_disjoint_words:
cmp x2, 8 <<< (1)
sub sp, sp, #64 <<< (2)
bhi .L2
cmp w2, 8 <<< (1)
bls .L15
.L2:
add sp, sp, 64 <<< (2)
(1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 32 bit unsigned.
(2) Nowhere in the function does it store anything on the stack, so why
drop and restore the stack every time? Also, minor quibble in the
disass: why does sub use #64 whereas add uses just '64'? (Appreciate this
is probably binutils, not gcc.)
.L15:
adrp x3, .L4
add x3, x3, :lo12:.L4
ldrb w2, [x3,w2,uxtw] <<< (3)
adr x3, .Lrtx4
add x2, x3, w2, sxtb #2
br x2
(3) Why use a byte table? This is not some sort of embedded system. Use
a word table and this becomes:
.L15:
adrp x3, .L4
add x3, x3, :lo12:.L4
ldr x2, [x3, x2, lsl #3]
br x2
An aligned word load takes exactly the same time as a byte load and we
save the faffing about calculating the address.
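
The attached copy.c is not reproduced in this message. As a purely
hypothetical sketch (the name copy_disjoint_words, the use of long as
the word type and the body are all assumptions, not the OpenJDK source),
a count-dispatched copy of the following shape is the kind of code for
which a compiler typically builds a jump table like the one at .L15:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the attached copy.c; not the OpenJDK source.
   Small counts are dispatched through a dense switch, larger counts go
   to memcpy. The two regions must not overlap. */
static void copy_disjoint_words(const long *from, long *to, size_t count)
{
    assert(from != NULL && to != NULL);
    switch (count) {
    case 8: to[7] = from[7]; /* fall through */
    case 7: to[6] = from[6]; /* fall through */
    case 6: to[5] = from[5]; /* fall through */
    case 5: to[4] = from[4]; /* fall through */
    case 4: to[3] = from[3]; /* fall through */
    case 3: to[2] = from[2]; /* fall through */
    case 2: to[1] = from[1]; /* fall through */
    case 1: to[0] = from[0]; /* fall through */
    case 0: break;
    default:
        memcpy(to, from, count * sizeof *to);
        break;
    }
}
```

With nine consecutive case values the switch is dense, so a table
dispatch is the usual choice; whether the table holds byte offsets or
full addresses is up to the compiler.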
.L10:
ldp x6, x7, [x0]
ldp x4, x5, [x0, 16]
ldp x2, x3, [x0, 32] <<< (4)
stp x2, x3, [x1, 32] <<< (4)
stp x6, x7, [x1]
stp x4, x5, [x1, 16]
(4) Seems to be something wrong with the load scheduler here? Why not
move the stp x2, x3 to the end? It does this repeatedly.
Unfortunately, as this function is performance critical, it means I will
probably end up doing it in inline assembler, which is time-consuming,
error-prone and non-portable.
* Whinge mode off
Ed