= Progress ==
* Day off (2/10)
- After Connect, recuperating, jet lagging
* EuroLLVM (6/10)
- Flying Wed to Barcelona, attending conference
- Back on Saturday
* Background (2/10)
- Code review, meetings, discussions, general support, etc.
- Planning for a bigger team (git, Jenkins, infrastructure)
* Sick on Monday [2/10]
# Progress #
* AArch64/ARM linux syscall for process record. [2/10] TCWG-532
ARM patch (fixing the register for syscall arg pass) is committed.
Canonicalize ARM syscall patch is posted for review.
* Support range stepping on arm-linux. [4/10] TCWG-545
Preparatory patches fixing bugs when "single step the instruction
branch to itself" are being reviewed. Pedro thinks my patches may
not work in some rare cases, and I spend some time writing the case
and prove it won't happen.
* Misc [2/10]
** file expense,
** upstream patch review,
# Plan #
* TCWG-532
* TCWG-545
--
Yao
The Linaro Toolchain Working Group (TCWG) is pleased to announce the
2016.03 snapshot of the Linaro GCC 5 source package.
This monthly snapshot[1] is based on FSF GCC 5.3+svn234210 and
includes performance improvements and bug fixes backported from
mainline GCC. This snapshot contents will be part of the 2016.05
stable [1] quarterly release.
This snapshot tarball is available on:
http://snapshots.linaro.org/components/toolchain/gcc-linaro/5.3-2016.03/
Interesting changes in this GCC source package snapshot include:
* Updates to GCC 5.3+svn234210
* Backport of [Bugfix] [AArch64] [Linaro #1994] Disable
pcrelative_literal_loads with fix-cortex-a53-843419
* Backport of [Bugfix] [AArch64] [Linaro #2123] Fix dependency of gcc-plugin.h
* Backport of [Bugfix] [AArch32] PR target/62554 target/69610 Fix for ARMv3
* Backport of [Bugfix] [AArch32] PR target/69161: Don't ignore mode
when matching comparison operator in cstore-like patterns
* Backport of [AArch32] Enable instruction fusion of AES instructions
on ARM for Cortex-A53 and Cortex-A57
* Backport of [AArch64] Add missing return in aarch64_internal_mov_immediate
* Backport of [AArch64] Enable instruction fusion of dependent AESE;
AESMC and AESD; AESIMC pairs
* Backport of [AArch64] Fix installed plugin headers for aarch64, m68k and c6x
* Backport of [AArch64] GCC 6 regression in vector performance. - Fix
vector initialization to happen with lane load instructions
* Backport of [AArch64] Restrict 16-bit sqrdml{sa}h instructions to FP_LO_REGS
* Backport of [Testsuite] [AArch64] add check for aarch64 in
check_effective_target_section_anchors
* Backport of [Testsuite] Print markers to stderr to avoid races with
sanitizer output
* Backport of [Misc] Fix ChangeLog for 233518
Subscribe to the important Linaro mailing lists and join our IRC
channels to stay on top of Linaro development.
** Linaro Toolchain Development "mailing list":
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
** Linaro Toolchain IRC channel on irc.freenode.net at @#linaro-tcwg@
* Bug reports should be filed in bugzilla against GCC product:
http://bugs.linaro.org/enter_bug.cgi?product=GCC
* Interested in commercial support? inquire at "Linaro support":
mailto:support@linaro.org
[1]. Stable source package releases are defined as releases where the
full Linaro Toolchain validation plan is executed.
[2]. Source package snapshots are defined when the compiler is only
put through unit-testing and full validation is not performed.
== Progress ==
o BKK16 remote (5/10)
* Followed TCWG sessions
* Extended validation:
- worked with Kugan
- implemented job for native validation
o GCC dev. (4/10)
* Remote validation sanitizing:
- iterate on the output pattern fix
- testing a fix for stderr/stdin ordering issue
* Gave some support on __sync builtins, preparing a fix for armv8.1
o Misc (1/10)
* Various meetings
== Plan ==
o GCC 5 branch merge, and 2016.03 snapshot
o Continue on-going tasks
Port to microinstance - TCWG-432 [17/10]
* Investigating difference between LAVA and 'desktop Juno' runtimes
** Some of this was down to piles of /dev/console output - redirecting
to file improved SPEC build time by 75%!
** Some cases make sense, others remain unexplained
** Might just go away if we update the Juno image
* Wrote up how to do benchmarking for minimal-trust cases
** Needs both lab and development work
* Merged another large tranche of changes back to benchmarking branch
** Microinstance more or less functional, main instance benchmarking
seems unbroken
** But some more tweaks to make as Lab work happens
* Prepared backport benchmarking for merge
Misc [3/10]
=Plan=
* Submit backport benchmarking for review
* Tweak uinstance in reaction to lab work
* Implement the non-lab side of minimal-trust benchmarking
* Return to looking at Juno image generation
* Look some more at LAVA/desktop runtime differences
* Implement small improvements, if time
== This week ==
* Bugzilla 69663 - [ARM] Implement overflow arithmetic standard names (3/10)
- Resolved thumb2 failures
- Negdi2 was not generating instruction to set condition codes
* Bugzilla 70008 - [ARM] Reverse subtract with carry can be generated in
thumb2 mode (1/10)
- Created new patch using new predicate that matches arm and thumb2
constraints
- Received approval to GCC 7 stage 1
* Bugzilla 70014 - [ARM] Predicate does not match constraint
(*subsi3_carryin_const) (1/10)
- Fix checked into trunk
* Linaro Connect meetings (5/10)
== Next week ==
* Bugzilla 69663 - [ARM] Implement overflow arithmetic standard names
- Create compile only test cases and re-run validation testing
- Post new patch upstream
Hey,
Regarding the GCC ABI 5 issue, I was wondering what's the policy
behind updating packages on stable updates for both Debian and Ubuntu.
Our time frame is a bit constrained, and we definitely will have to
take some hard decisions in the next six months, so I'd like to
understand everything that is at stake before I have my own opinion.
LLVM has a 6 month major cycle, releasing around February / August.
Major releases are allowed to break the ABI. Major breakages need one
release warning period.
Ubuntu has a 6 month release cycle, around April / October. IIUC,
major releases are allowed to have new versions of packages, but
updates for the next few years have to keep within the same major
release.
Debian has a -1 years release cycle (heh), and has the same major /
minor policy, which makes it a lot harder to update major versions.
However, I believe unstable is still not closed, nor will be in August
this year, so updating to LLVM 3.9 will not be a problem, but it will
mean users will have to wait a bit more to get a working LLVM.
The time frame is then:
3.8.0 released March (without the fix)
Ubuntu X released April
3.9.0 releases August (hopefully with a fix)
Ubuntu X+1 released October
Debian freezes ??
LLVM 3.8.1 ??
If we don't back-port GCC ABI 5 into 3.8.1, Ubuntu users will not have
the fix ever, unless you *can* update to 3.9.0 in August.
Ubuntu X+1 will be fine using 3.9, as will Debian after August, unless
you guys freeze before that.
I believe both Debian and Ubuntu have a trunk-based LLVM package for
experimental use only, and it would be bad, but not completely broken,
to recommend users to use that meanwhile.
If Debian freezes *before* 3.9.0 is out, or if Ubuntu can't update to
3.9.0 on April's release, then we'll have a strong reason to back-port
the change to 3.8.x. If not, even though it will be uncomfortable for
users until August, the argument is not that strong and will be hard
to get it through.
Any comments? Ideas? Does any of that make sense?
cheers,
--renato
Hi,
I have been comparing the stock gcc 5.2 and the Linaro 5.2 (Linaro GCC
5.2-2015.11-1) and have noticed a difference with the __sync
intrinsics.
Here is the simple test case
--- cut here ---
int add_int(int add_value, int *dest)
{
return __sync_add_and_fetch(dest, add_value);
}
--- cut here ---
Compiling with the stock gcc 5.2 (-S -O3) I get
---------
add_int:
.L2:
ldaxr w2, [x1]
add w2, w2, w0
stlxr w3, w2, [x1]
cbnz w3, .L2
mov w0, w2
ret
---------
Wheras with Linaro gcc 5.2 I get
---------
add_int:
.L2:
ldxr w2, [x1]
add w2, w2, w0
stlxr w3, w2, [x1]
cbnz w3, .L2
dmb ish
mov w0, w2
ret
---------
Why the extra (unnecessary?) memory barrier?
Also, is it worthwhile putting a prfm before the ldaxr. EG
add_int:
prfm pst1strm, [x1]
.L2:
ldaxr w2, [x1]
See the following thread
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html
All the best,
Ed
== Progress ==
o GCC dev. (7/10)
* Remote validation sanitizing:
- fixed last issues in dejagnu patch and submitted it uptsream
- 2 more cleanup/fix dejagnu patches submitted and merged upstream
- proposed a fix/workaround for the output pattern issues (>400
failures removed with this patch)
o Misc (3/10)
* Various meetings
* internal discussions
== Plan ==
o Try to follow connect remotely
o Extended validation work
== Progress ==
* GCC bugs:
- #2073 tried to reproduce it with a manually-built toolchain. No luck
* GCC validation:
- added support to choose simulated cpu (different from --with-cpu)
* GCC:
- completing Neon intrinsics tests, to prepare cleanup
* Validation:
- small improvements
* Misc (conf calls, meetings, emails, ...)
== Next ==
Remote Connect
== Progress ==
* Support (5/10)
- Working on PR17193
- Continue review on D17141
* Background (5/10)
- Code review, meetings, discussions, general support, etc.
- Connect preparations
- GCC ABI 5 discussions
- Assessing Swift calling convention impact ARM back-end
- Interviews
# Progress #
* TCWG-545, Handle "branch-to-self" instruction in single stepping.
[5/10] Patches are posted upstream for review.
* TCWG-532, one patch is committed and one patch is posted for review.
[2/10]
* Tweak ARM process record. [2/10]
Two patches are pushed in. Many test fails are fixed.
* FSF patches review. [1/10].
# Plan #
* Linaro Connect.
--
Yao
Hi,
I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to have improved significantly. For example, it now seems much better at using ldp/stp and it seems to has stopped gratuitous use of the SIMD registers.
However, I still have a few whinges:-)
See attached copy.c / copy.s (This is a performance critical function from OpenJDK)
pd_disjoint_words:
cmp x2, 8 <<< (1)
sub sp, sp, #64 <<< (2)
bhi .L2
cmp w2, 8 <<< (1)
bls .L15
.L2:
add sp, sp, 64 <<< (2)
(1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 32 bit unsigned.
(2) Nowhere in the function does it store anything on the stack, so why
drop and restore the stack every time. Also, minor quibble in the
disass, why does sub use #64 whereas add uses just '64' (appreciate this
is probably binutils, not gcc).
.L15:
adrp x3, .L4
add x3, x3, :lo12:.L4
ldrb w2, [x3,w2,uxtw] <<< (3)
adr x3, .Lrtx4
add x2, x3, w2, sxtb #2
br x2
(3) Why use a byte table, this is not some sort of embedded system. Use
a word table and this becomes.
.L15:
adrp x3, .L4
add x3, x3, :lo12:.L4
ldr x2, [x3, x2, lsl #3]
br x2
An aligned word load takes exactly the same time as a byte load and we
save the faffing about calculating the address.
.L10:
ldp x6, x7, [x0]
ldp x4, x5, [x0, 16]
ldp x2, x3, [x0, 32] <<< (4)
stp x2, x3, [x1, 32] <<< (4)
stp x6, x7, [x1]
stp x4, x5, [x1, 16]
(4) Seems to be something wrong with the load scheduler here? Why not
move the stp x2, x3 to the end. It does this repeatedly.
Unfortunately as this function is performance critical it means I will
probably end up doing it in inline assembler which is time consuming,
error prone and non portable.
* Whinge mode off
Ed
== Progress ==
o GCC dev. (7/10)
* Remote validation sanitizing:
- Implemented and tested a pure dejagnu fix (the actual
implementation works fine for GCC but might be an issue in a different
context, a cleaner fix almost done)
- Found a latent issue in GCC profiling test harness
* ARM and AArch64 backends LRA cleanup:
- Looked at the remaining artifacts, will prepare a patch for GCC 7
o Misc (3/10)
* Various meetings
* internal discussions
== Plan ==
o Finalize and submit dejagnu fix
Port to microinstance - TCWG-432 [7/10]
* Merged last few months of development back to benchmarking branch
* Restored support for multiple targets per builder
* Updated builder landed, altered jobs to work with it
** Removed assumption that host filesystem is non-persistent
** Stacked up test runs for the weekend
Transfer secret management to LAVA [1/10]
* LAVA jobs now use a within-LAVA key to access sources
Misc [2/10]
* Unsuccessful fiddling with heat-monitoring tools on Juno
* Usual background of mail and meetings
=Plan=
* Fallout from weekend test runs
** Some failure is going on, need to investigate
* Update docs and Jenkins configs w.r.t. last week's activity
* Further investigation on a couple of LAVA issues that are causing me pain
** Un-deserializable bundles
** Inaccessible image reports
* Continue assessing target stability/looking at inconsistent results
== This week ==
* Bugzilla 69663 - [ARM] Implement overflow arithmetic standard names (6/10)
- Tested and posted SImode and DImode patch upstream
- Feedback recommended supporting thumb2 in addition to arm architectures
- Patch to support thumb2 fails on all thumb architectures;
investigating failures
* Bugzilla 70008 - [ARM] Reverse subtract with carry can be generated in
thumb2 mode (2/10)
- Created new bug, developed and successfully tested patch
- Fix posted upstream
* Bugzilla 70014 - [ARM] Predicate does not match constraint
(*subsi3_carryin_const) (1/10)
- Created new bug and patch
* Misc (1/10)
== Next week ==
* Bugzilla 69663 - Cleanup by merging patterns using mode iterators,
submit upstream
* Bugzilla 70008 - Respond to upstream comments as appropriate
* Bugzilla 70014 - Post patch and respond to upstream comments
* Travel to Linaro Connect beginning March 3rd
== This Week ==
* LTO (6/10)
- TCWG 528:
a) reduced test-case for the case when decl node gets visited multiple times
b) updated patch not to walk artificial record decls (typeinfo
objects) as per Richard's suggestion.
submitted upstream, waiting for review.
- benchmarking: Aarch64 SPEC2006-int benchmarks complete
- looked at pr57703
- Slides
* setting up perf on chromebook (2/10)
- perf doc
- got perf running on chromebook by manually building it and set of
(clumsy) workarounds.
- perf annotate shows no output and perf stat shows "not supported" for almost
all entires except "page faults"
- will give a try to dual boot chrubuntu on chromebook
* half-day sick leave (1/10)
- doctor's appointment for eye inflammation
* Misc (1/10)
- Meetings
== Next Week ==
- LTO
- tcwg-310
- look at jenkins tutorial in collaborate wiki
== Progress ==
* Support (4/10)
- Updating patch D17141 for Darwin, resubmitting, discussions.
- Understanding PR21778, may need changes to SLP
- Benchmarking some scheduler choices for A17
* Release (1/10)
- 3.8.0 RC3 validation
* Background (5/10)
- Code review, meetings, discussions, general support, etc.
- Sifting through CVs, interviews, etc.
# Progress #
* Support range stepping on arm-linux. TCWG-518. [5/10]
Post patch series about "the thread is stepping over breakpoint but
it spawns child thread". The fix is OK but the test case changes are
being reviewed.
The more I test my range stepping patches, the more existing bugs I
find. Looking at the bug "software single step the instruction
branch to self."
* AArch64 linux syscall for record/replay. TCWG-532. [1/10]
Patch is out for review.
* Fix some ARM reverse debugging bugs. TCWG-183. [1/10]
Patch is pushed in. The original implementation wasn't carefully
reviewed, so I am sure there are bugs somewhere else.
* Patch review on arm tracepoint support. [1/10]
One patch is approved but I insist that another patch should be done
in generic part instead of ARM specific part, but the author wants do
it in ARM specific part because he things it is simpler.
* Misc [2/10]
** Go through the Linux kernel awareness GDB patches quickly, the first
reaction is "split your patch, please".
** Go to London to collect my passport.
# Plan #
* Support range stepping on arm-linux. TCWG-518.
* TCWG-167, TCWG-532.
* Prepare for the Linaro Connect travel.
--
Yao
Hi All,
Does linaro distributes arm-gcc as a pre-built static tool chain
distribution? If yes, where can i download them from. Please point me some
location from where i can download.
--
Thanks & Regards,
M.Srikanth Kumar.