== This week ==
* TCWG-619
- v8 LTO build with different options for x86 and aarch64.
- Reported upstream v8 LTO build failure on ARM.
- Tried to build chromium with FSF gcc, linaro binary release and
linaro-4.9-branch
* PR49551
- Not able to reproduce ICE with latest trunk (r221871).
* Misc
- College assignments submission and term end.
== Next Week ==
* TCWG-619
- Build chromium with linaro-4.9-branch and trunk.
- Prepare stats for LTO build with different options for v8 on x86 and aarch64
- Try building chromium with LTO with FSF trunk for arm
* TCWG-639:
- Add enhancement to header file flattening script.
== Progress ==
Friday holiday
* Automation Framework (CARD-1378 2/10)
- Power cut in the office
- Fixing gateway, rebooting machines
- Mob management
* LLVM ARM Maintenance (CARD-1833 2/10)
- ARMTargetParser review
* Background (4/10)
- Code review, meetings, discussions, etc.
- All LLVM buildbots broken (one still)
- Trying to merge Android round/exception
- https://android-review.googlesource.com/#/c/125910/1
- Not that easy, will need bigger changes and tests to go in
== Plan ==
* Long holidays
* EuroLLVM
* Back on the 15th
Hi,
I did some tests on the following function
--- CUT HERE ---
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}
--- CUT HERE ---
and I discovered that it is faster at -O2 than at -O3. This is with gcc 4.9.2.
Looking at the disassembly I see it is using FP registers to hold
integer values. The following is a small extract.
.L3:
        fmov    w0, s8
        sub     w25, w25, #1
        cmn     w25, #1
        add     w0, w0, w27
        fmov    s8, w0
        bne     .L19
        add     w0, w0, 1
        b       .L2
Recompiling with -mgeneral-regs-only generates a huge improvement.
The following are the times I get on various partner HW. I have
normalised the -O2 times to 1 second so that I do not disclose actual
partner performance data:
Partner 1: -O2 = 1sec, -O3 = 1.13sec, -O3 -mgeneral-regs-only = 0.72sec
Partner 2: -O2 = 1sec, -O3 = 0.68sec, -O3 -mgeneral-regs-only = 0.60sec
Partner 3: -O2 = 1sec, -O3 = 0.73sec, -O3 -mgeneral-regs-only = 0.68sec
Partner 4: -O2 = 1sec, -O3 = 0.83sec, -O3 -mgeneral-regs-only = 0.84sec
So, in general, -O3 does actually do better than -O2, but in all cases
performance is better if I stop it using FP registers for int values.
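A minimal sketch of how such a comparison could be timed and normalised,
assuming a plain C harness that is rebuilt once per flag set (-O2, -O3,
-O3 -mgeneral-regs-only). Only fibo comes from the test above; the helpers
time_fibo and normalise are illustrative names and are not taken from the
tarball.

```c
#include <assert.h>
#include <time.h>

/* Function under test, as quoted above. */
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}

/* Hypothetical helper: CPU seconds taken by one call to fibo(n).
   The result is stored through a volatile so that it is not discarded. */
double time_fibo(int n)
{
    volatile int sink;
    clock_t start = clock();
    sink = fibo(n);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: express a time relative to the -O2 baseline,
   so that the baseline itself reads as 1.0. */
double normalise(double seconds, double baseline_seconds)
{
    assert(baseline_seconds > 0.0);
    return seconds / baseline_seconds;
}
```

Each binary would report its own time_fibo result, and the figures would be
divided by the -O2 time afterwards, which is how a column such as
"-O2 = 1sec" can be produced without disclosing the raw numbers.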
I have put a tarball of the test program along with 3 binaries and 3
disassemblies here:-
http://people.linaro.org/~edward.nevill/fibo.tar
All the best,
Ed.
Hi,
I'm seeing the following build error trying to build from the current master
branch (1ac806b) of http://git.linaro.org/toolchain/binutils-gdb.
make[3]: *** No rule to make target `-L../zlib', needed by `run'. Stop.
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory `gdb/sim/arm'
The following commit predating the zlib changes appears to build without error.
b19a8f8545100a08ee2a64c05631aff6f651faa1
Thanks,
Chris
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
catomics - TCWG-436 [5/10]
* Got pointed at a suitable set of benchmarks, results still underwhelming
* However, patches were using relaxed atomics rather than no atomics at all
* Fiddled abe into building sysroots for me (I get libstdc++ that way)
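To illustrate the distinction in the second bullet, here is a minimal C11
sketch contrasting a relaxed atomic increment with one that skips the atomic
read-modify-write while the process is single-threaded. This is not the
TCWG-436 patch; the flag process_is_threaded and both function names are
assumptions made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: would be set to true before a second thread starts. */
static atomic_bool process_is_threaded = false;

/* Always-atomic increment with relaxed ordering: still an atomic
   read-modify-write, just without ordering guarantees for other memory. */
int counter_inc_relaxed(atomic_int *counter)
{
    assert(counter);
    return atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Conditionally atomic increment: while the process is single-threaded,
   use a separate load and store instead of an atomic read-modify-write. */
int counter_inc_conditional(atomic_int *counter)
{
    assert(counter);
    if (atomic_load_explicit(&process_is_threaded, memory_order_relaxed))
        return atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);

    int old = atomic_load_explicit(counter, memory_order_relaxed);
    atomic_store_explicit(counter, old + 1, memory_order_relaxed);
    return old;
}
```

Both functions return the value before the increment. The conditional
variant is only safe if the flag is set before any other thread can touch
the counter.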
Misc - [5/10]
* Tidied up some 'perf shotgun' scripting from the juno cache
investigation, so I've got the tools for next time
* Started sorting out my backups - but didn't finish before build-01's
death destroyed a bunch of work
* Raised priority of sorting out my backups, now just a matter of
waiting on some large rsyncs
* Pieced my world back together on dev-01
=Plan=
Holiday Wednesday, public holidays next Friday and Monday
See how catomics do when we're conditionally-not-atomic-at-all
Investigate a bit to see whether there's a reason we were
using relaxed atomics
Resurrect Jira benchmarking on dev-01
* Will include some porting, scripts don't work out of box on dev-01
One day off on Friday. [2/10]
# Progress #
* aarch64 gdb, the number of FAILs is reduced to 26 on aarch64-linux!
About 10 more FAILs can still be fixed. [4/10]
** TCWG-726, fails in gdb.base/break-interp.exp. Fixed.
Remove prelink package from juno board as aarch64 isn't supported.
** TCWG-681, fails in savedregs.exp. Patch is committed.
** PR 18139. Patches are committed.
* arm gdb, 938 fails for -mfloat-abi=soft and 1014 fails for
-mfloat-abi=hard. Analysing fails. [2/10]
* GDB kernel-awareness meeting with ST. [1/10]
Understood the definition of "kernel-awareness", and will
discuss the design upstream later.
* TCWG-716, investigate LLDB perf testing. [1/10] Done.
** LLDB already had something about performance testing, in
lldb/test/benchmarks/ and lldb/tools/lldb-perf.
** TestCompileRunToBreakpointTurnaround.py compares the speed of LLDB
and GDB, but in an incorrect way.
# Plan #
* Take more care on arm gdb test fails.
* Fix the rest of aarch64 gdb fails.
--
Yao
== Issue ==
* none
== Progress ==
* Infrastructure and Validation (1/10)
* GCC Upstream (6/10)
- PR63587 and PR64871 committed in FSF 4.9 branch.
- PR64208 patch review is OK, but needs to be validated on an iWMMXT platform
(pinged some Marvell people).
- Submitted a fix for arm_subsi3_insn (alternatives issue). This is
a stage1 patch.
- Identified another insn which has alternatives issues in Thumb2.
* Release and Backports (1/10)
- Backflip maintenance
- 12 Backports for 2015.04 (CARD TCWG-699)
* Misc (2/10)
- Various meetings
- ST internal year review
== Plan ==
- Continue upstream work.
* ASAN/TSAN run on 42 bit VA Aarch64 (TCWG-634) (6/10)
Sent a patch that enables ASAN tests with 64 bit allocator on
amd-01 (AMD Seattle). All ASAN tests pass in LLVM.
But on the juno platform, 39 bit VA does not have enough memory to map,
hence we need to stay on the 32 bit allocator.
Discussed with the ASAN community and it has been decided to use the 32 bit
allocator as default. They are not ok with having a mechanism to
detect VA and switch allocators based on that.
Started looking at failures on amd-01 (AMD Seattle) with 32 bit allocator.
None of the ASAN tests ran when I switched to 32 bit allocator on amd-01.
Reason: there is a spin mutex lock which is waiting for the memory
allocation to complete, but an assertion failure makes it wait
forever.
After fixing map range the assertion failure is gone but I keep
getting some failures with 32 bit allocator "on".
Bug869: Continued to look at ABS_EXPR cases (2/10).
* Emails, meetings. (2/10)
* Linaro 1-1 with Christophe, Ryan, status meet
* AMD meetings/event, 1-1 with AMD manager, status meeting.
* GCC mailing list.
== Plan ==
* Continue to fix TSAN/ASAN 32bit allocator failures on amd-01.
* Bug869
== Progress ==
* Type promotion pass (zero/sign extension elimination) - TCWG-547 (2/10)
- Ran more benchmarks and gathered more data (will post the results)
- Need to run perf to analyse regressions
* Bug 1373 (1/10)
- Set-up back-porting infrastructure
- Ran into some issues
* TCWG-486 (6/10)
- Discussed with Jim and identified the issues and possible fixes
- Getting closer to an acceptable fix
- Need to run benchmarking
* Misc (1/10)
- gcc-patches and gcc-bugs list
== Plan ==
* TCWG-620 and TCWG-547
== Progress ==
LLDB development
-- Patch submission and build testing LLDB Arm SysV ABI classes
[1/10] [TCWG-643]
-- Patch submission and build testing LLDB AArch64 SysV ABI classes
[1/10] [TCWG-715]
-- Implemented native Linux register context for Arm [3/10] [TCWG-650]
-- Implemented POSIX register context for Arm [3/10] [TCWG-755]
-- Migrated LLDB wiki to collaborate.linaro.org and updated howtos
[1/10] [TCWG-640] [TCWG-641] [TCWG-583]
-- Another try on doing a native LLDB build on arm [1/10] [TCWG-647]
Miscellaneous [1/10]
-- Meetings, emails, discussions etc.
== Plan ==
LLDB development
-- Complete implementation and submit native Linux register context
for Arm upstream
-- Complete implementation and submit POSIX register context for Arm upstream
-- Patch reviews and upstream commits.
-- Start work on LLDB arm integration, testing and bug fixing.
Miscellaneous
-- Try LLDB armhf builds and figure out a way to do gcc 4.8 softfloat build.
== This Week ==
* TCWG-619:
- LTO and non-LTO builds of v8 and chromium on x86, arm, and aarch64 native and
x86->arm, x86->aarch64 cross.
- LTO build for v8 on arm native and with x86->arm cross works with linaro-4.8,
but not with linaro-4.9. Also appears to fail for trunk.
- Issues in building chromium cross x86->arm - undefined reference to
clock_gettime.
* PR 49551
- Patch approved by Charles.
== Next Week ==
- v8 LTO build with different lto options.
- Investigate LTO build failure for v8 on arm.
- LTO and non-LTO builds for chromium on x86, arm and aarch64.
- Submit patch to PR49551 for upstream review after testing on x86, arm.
== Progress ==
* Validation
- worked on stabilization of abe and jenkins jobs
* Backports
- a few reviews
* Misc
- meetings, conf-calls, emails, ...
== Next ==
* Validation: hopefully make the staging, then stable branches
== Progress ==
* Automation Framework (CARD-1378 5/10)
- Moving LLVM lab into llvm.tcwglab subnet
- Passing down my knowledge to the lab team
- Helping them set up the new builders
* Background (5/10)
- Code review, meetings, discussions, etc.
- Upgrading APM's compiler/binutils
- Writing LLVM Getting started wiki page
- Helping Adhemerval setup
== Plan ==
* Go back working on LLVM
== Progress ==
qemu-system experiment [4/10]
Tried to set up qemu-system for reliable simulated validation of tests
which don't work under qemu-user. Mostly works, but there is arcane
interaction between DejaGNU, gcc testsuite and board files which makes
it a bit flaky. Interesting experiment, but I've dropped it for now
as there are still niggles to iron out.
Misc [4/10]
Patch review for Prathamesh
Backporting stuff
ABE bugzilla stuff
Benchmarking results
Emails/doc review about Lab infrastructure
Holiday Friday [2/10]
== Plans ==
Holiday Monday
Investigate autovectorization
Next backport
== Progress ==
LLDB development
-- Implemented LLDB Arm SysV ABI classes [3/10] [TCWG-643]
-- Implemented LLDB AArch64 SysV ABI classes [3/10] [TCWG-715]
-- Started implementation of Arm native register context [1/10] [TCWG-650]
-- Figure out steps to run lldb-remote testsuite on Arm and AArch64
[1/10] [TCWG-640] [TCWG-641]
-- Try to build lldb-server natively on chromebook [1/10] [TCWG-647]
Miscellaneous [1/10]
-- Meetings, emails, discussions.
-- Updates to wiki pages for LLDB howtos
== Plan ==
LLDB development
-- Complete implementation and submit LLDB Arm SysV ABI classes upstream
-- Complete implementation and submit LLDB AArch64 SysV ABI classes upstream
-- Further progress on implementation of native register context
-- Begin implementation of POSIX monitor register context for arm.
Miscellaneous
-- Migrate LLDB pages to collaborate.linaro
ASAN/TSAN run on 42 bit VA Aarch64 with 64 bit allocators (TCWG-634) (6/10)
* Juno does not have space for kernel allocator map demanded by
ASAN, So we need to remain on 32 bit allocators only.
* amd-01 went offline. So moved to internal machine in AMD.
Debugging LLVM test failures in GDB showed that ASLR should be
turned off and also the shadow offset is set at 1<<36 and is not
changing when I fix it in asan_mappings.h file.
Manually changing shadow offset to 1<<39 fixes some segfaults.
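For reference, ASan's documented mapping is Shadow = (Mem >> 3) + Offset. A
minimal sketch of that arithmetic for the 1<<36 and 1<<39 offsets and the
39 bit and 42 bit VA sizes mentioned above follows. The function names are
illustrative and not taken from the sanitizer sources, and the sketch does
not model the allocator or the kernel's address space layout.

```c
#include <assert.h>
#include <stdint.h>

/* ASan maps every 8 bytes of application memory to one shadow byte. */
#define SHADOW_SCALE 3

/* Shadow address for an application address, given a shadow offset. */
uint64_t mem_to_shadow(uint64_t mem, uint64_t shadow_offset)
{
    return (mem >> SHADOW_SCALE) + shadow_offset;
}

/* Bytes of shadow needed to cover an address space of va_bits bits. */
uint64_t shadow_size_for_va(unsigned va_bits)
{
    assert(va_bits > SHADOW_SCALE && va_bits < 64);
    return UINT64_C(1) << (va_bits - SHADOW_SCALE);
}

/* Shadow address of the highest byte in a va_bits address space. */
uint64_t shadow_end(unsigned va_bits, uint64_t shadow_offset)
{
    uint64_t top = (UINT64_C(1) << va_bits) - 1;
    return mem_to_shadow(top, shadow_offset);
}
```

With these definitions a 39 bit VA needs 1<<36 bytes of shadow and a 42 bit
VA needs 1<<39 bytes, so the two offsets discussed above match the shadow
sizes of the two VA configurations.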
Bug869: Continued to look at ABS_EXPR cases (2/10).
* Emails, meetings. (2/10)
* Linaro 1-1 with Christophe, Ryan
* AMD meetings/event, 1-1 with AMD manager, status meeting.
* GCC mailing list.
== Plan ==
* Continue to fix TSAN/ASAN 64 bit allocator failures on amd-01.
* Bug869
== Issue ==
* none
== Progress ==
* Infrastructure and Validation (1/10)
- Validate staging builders, still some issues with guality tests
* GCC Upstream (5/10)
- PR64208 submitted a patch that fixes the LRA ICE for iwmmxt target.
- PR63587 and PR64871 submitted patches that backport the fixes into
FSF 4.9 branch. Patches approved, to be committed.
* Release and Backports (3/10)
- Finished Backflip improvements, dev branch merged into master.
- Presented these new features, the stacked backports process and conflict
handling during our GCC team weekly meeting.
* Linaro Bugzilla (-/-)
- #1322 - Identified it as already resolved on our 4.8 branch.
* Misc (1/10)
- Various meetings
== Plan ==
- Continue on upstream bugzillas, backports and validation.
catomics - TCWG-436 [6/10]
* Started a series of runs on a local board I'd borrowed
** Then had to give it back before they'd really got anywhere
* Got some, possibly dubious, results back from A15 from previous week
** If the results are worth anything, they suggest that catomics don't
achieve anything
* Started again with a subset of SPEC on juno-01, as it was on my desk
for the weekend anyway
** Results again underwhelming
** Maybe I picked the wrong subset, maybe A57 is too smart
Misc [4/10]
* Including a little 'juno cache effects' followup, a little juno-01
work, and a lot of mail catchup
=Plan=
* Get back to benchmark automation
** Apply a bunch of small improvements I've got on a branch
** Get a working Jenkins backport benchmarking prototype
** Sort out sources/results storage
* Think about why catomics may not be showing any effect
** Starting to believe that this is a red herring
** But might be interesting to try 'little' class cores
** But that does involve finding a reliable target I can hold for a long time
Juno cache effects - LDTS-1238 [6/10]
* Seems to be mainly due to (expected) instruction scheduling
limitations, and prefetcher effects
* Reported back, hopefully this will wrap up now
catomics - TCWG-436 [1/10]
* Shepherding benchmark runs in LAVA, usual problems with ssh-agent,
juno contention and random target failure
* Almost no actual data produced
benchmark automation - TCWG-360 [1/10]
* User support, some discussion about extent of our juno usage
* Something weird happened in Jenkins, _might_ have been a one-off due
to slaves moving around
Misc - [2/10]
* Featuring juno-01 fixing
TCWG-619:
- Cross compiled v8 on ARM using linaro toolchain (binary release).
- Built chromium LTO native
- Building v8 on ARM with LTO results in ICE at lto_tag_to_tree_code.
- Building v8 (without LTO) with linaro-4.9-branch results in ar error.
- Cross compiling chromium on ARM with LTO using linaro toolchain
segfaults ld
- Using gcc-nm, gcc-ar works as a work-around for "plugin needed to
handle lto object" error
TCWG-621:
- Finished refactoring sel-sched-ir.h
* Bugs
- PR49551: Modified patch to fix a few test-cases.
* Misc:
- Internal college event on Saturday.
== Next Week ==
- TCWG-619
- Test patch for PR49551 and submit upstream.
- Refactor lra-int.h
== Progress ==
* type promotion pass (zero/sign extension elimination) - TCWG-547 (6/10)
- Fixed LTO testcase failure
- Native testing on arm chromebook found three more failures
- Fixed all of them
- Setup spec2006 on chromebook
- spec2006 with -O3 -mfpu=neon -march=armv7-a -fno-common shows some
(12 of them) regressions even though there are some gains (17 of them).
- GEOMEAN is the same.
- 437.leslie3d regresses 18% for -O3 but improves 16% if I use -O2 in
both the original and with the patch
- some optimizations like vectorization could be impacted (?)
- restarted the full benchmarking at -O2
* TCWG-620 (1/10)
- read more documents and looked at code samples
* TCWG-486 (2/10)
- Latest trunk didn’t work with the patch I had
- Zhenqiang's original patch also behaves similarly. Looking into it.
* Misc (1/10)
- gcc-patches and gcc-bugs list
== Plan ==
* TCWG-620 and TCWG-547
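As background for the TCWG-547 item in this report, a small sketch of the
kind of C code where the integer promotion rules lead to zero/sign
extensions. It only illustrates where such extensions come from; it is not
the pass itself, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* The addition is done in int (integer promotion); converting the result
   back to unsigned char wraps it modulo 256, which typically costs a
   zero-extension instruction on ARM. */
unsigned char add_u8(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}

/* A narrow accumulator: each iteration logically truncates sum to 16 bits
   and sign-extends it again before the next addition. */
short sum_s16(const short *values, size_t count)
{
    assert(values != NULL || count == 0);
    short sum = 0;
    for (size_t i = 0; i < count; i++)
        sum = (short)(sum + values[i]);
    return sum;
}
```

For example, add_u8(200, 100) yields 44 because 300 wraps modulo 256. An
extension that can be proven redundant, for instance when the values are
known to stay in range, is the sort of thing such a pass would aim to drop.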
== Progress ==
* Thursday off (2/10)
* Buildbots (CARD-1823 1/10)
- Fixing llvm-apm-01 (disk problem)
- Fixing llvm-d01-04 (my bad)
* Releases (CARD-1431 1/10)
- Spinning release 3.5.2 RC1, all green
* Automation Framework (CARD-1378 3/10)
- A lot of time wasted in infra shenanigans
* Background (3/10)
- Code review, meetings, discussions, etc.
- Reviewing, testing and committing ARM11 patch by Tinti
- Getting ircproxy to work
- Broken bots a-plenty
== Plan ==
* Fork LLVM lab out of TCWG
* Zillions of patches to review
* Continue target description changes
* Welcome Adhemerval, setup LLD track
* Catch up with Omair on LLDB