== Progress ==
* Add REG_EQUAL note for arm_emit_movpair (1/10)
- committed patch1 after testing again
* Factor conversion out of COND_EXPR - TCWG-849 (5/10)
- Gone through couple of iterations and committed the patch
- Some improvements are still needed as follow-up patches
* TACT -TCWG-851 (2/10)
- Started looking into spec2k
* Misc (2/10)
- Looked at interaction between gcc optimization passes
- gcc-patches, gcc-bugs list
- Meetings
== Plan ==
- GCC Bugs
- TACT driven optimization exploration for gcc
- Linaro bug 1318
== Progress ==
* Releases (CARD-1431 1/10)
- Released 3.6.2-final
* Maintenance (CARD-1833 5/10)
- Reducing runtime of some benchmarks in LLVM's
test-suite by getting rid of millions of useless
fprintf calls.
- Working on https://llvm.org/PR20700 some more
* Background (4/10)
- Code review, meetings, discussions, etc.
- Long TargetTuple review (D10969) / discussions
- Replacing broken buildbot USB disks (need to buy more)
- Bisecting self-hosting bot breakage
- Testing patches for ARM
- Jira farming
== Plan ==
* continue PR20700
* continue review/discussion of TargetTuple
* look again at PR20757
* maybe look at PR21000
* Off Monday (2/10)
== Progress ==
* published linaro-4.8 and 4.9 2015.06 releases
* linaro-5.1-2015.07 (1/10)
- backport reviews
- updated my helper script for reviews to cope with the git-only branches
* upstream (1/10)
- started looking at vget_lane Neon intrinsic failure on armeb
* infra/release/backports (2/10)
- reviews
* Misc (4/10)
- meetings, conf-calls, emails
== Next ==
* Off Tuesday
* backports, release, validation: update doc
* backports, reviews
* upstream work
== Later ==
* Off July 18th-Aug 3rd
Hi Linaro Toolchain Group,
I am comparing execution time (run time) of sin() trigonometric function
between following glibc (including libm) libraries for aarch64 (juno cortex
a57) :
Linaro glibc 2.19, Linaro eglibc 2.19, eglibc 2.19 (from
http://www.eglibc.org/) and Linaro glibc 2.21.
My observation for execution time of sin():
with Linaro glibc 2.19 and eglibc 2.19 = 1m24.703s (approx)
whereas,
with Linaro eglibc 2.19 & Linaro glibc 2.21 = 0m25.243s (approx)
Has Linaro optimized the libm functions for aarch64 in Linaro eglibc 2.19 ?
If yes, please point me to relevant reference from where I can find more
information on them.
Since the eglibc development from version 2.19 has stopped, will Linaro
maintain its own development version of glibc ?
I am using the code snippet below and the Linux 'time' command to measure the
time.
#include <stdio.h>
#include <math.h>

void sin_func(void)
{
    double incr = 0.732;
    double result, count = 0.0;
    printf("%s\n", __func__);
    /* Call sin() repeatedly over an increasing argument. */
    while (count < 105414350.0) {
        result = sin(count);
        count += incr;
    }
}
Thanks.
--
with regards,
Virendra Kumar Pathak
== Progress ==
LLDB development [TCWG-855] [8/10]
-- Figure out build steps for building cross lldb-server with
arm-linux-gnueabihf-g++
-- Debugging of lldb-server communication packets for fixing lldb-server
armhf crash problem.
-- Comparison with androidabi version to figure out missing pieces
Miscellaneous [2/10]
-- Ubuntu reinstall on laptop
-- Follow up on Czech Republic visa
-- Meetings, emails, discussions etc.
== Plan ==
LLDB development
-- Work on lldb-server for armhf and try to figure out crash problems
[TCWG-855]
End of sick leave (will work 100% from home until my cast is removed).
== Progress ==
o Upstream GCC (2/6)
* Back to ongoing patches
o Linaro GCC release (7/6)
* Reviewed FSF branch merge into 4.8/4.9 branches
* Reviewed patches for tcwg-release script
* Sent a first batch of backports for 5.1
Still pending due to validation infra. issues
o Misc (1/6)
* Various meetings
== Plan ==
- Continue ongoing tasks
* One day off on Fri. [2/10]
# Progress #
* TCWG-805, aarch64 native debugging multi-arch support. [5/10]
Patches (part1) are posted upstream for review, need to rewrite some
of them. The rest of them are OK and can be pushed in after 7.10
branch is created.
Watchpoint support in multi-arch debugging. Both kernel and GDB need
some fixes. Ongoing.
* Complete the document of aarch64 tracepoint work. [1/10]
* FSF GDB. [2/10]
Review intel mpx patch again, and read something on intel
mpx stuff.
# Plan #
* TCWG-805, update some patches in part 1 patch series, and continue
the multi-arch watchpoint work.
--
Yao
== Progress ==
* Add REG_EQUAL note for arm_emit_movpair (1/10)
- Updated and reposted
- https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00295.html
- https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02066.html
* Factor conversion out of COND_EXPR - TCWG-849 (3/10)
- https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00246.html
* TACT -TCWG-851 (2/10)
- Small examples now seem to work.
- Have to do cross testing
* Git work flow for upstream patches -TCWG-848 (1/10)
- Updated based on review
* Misc (3/10)
- Looked up LLVM documents
- Looked at the TODO list Renato provided
- gcc-patches, gcc-bugs list
- Meetings
== Plan ==
- GCC Bugs
- TACT driven optimization exploration for gcc
- Linaro bug 1318
== This Week ==
* TCWG-856 (2/10)
- submitted patch to flatten cfgloop.h:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00277.html
* TCWG-777 (4/10)
- Modified pass to not generate redundant stores
- Investigating ICE caused by the pass during gcc build
- Discussions for possible approaches with Christophe and Kugan
- Reading thru documentation on optabs and ccmp patches
* Misc (4/10)
- Patch sent upstream which fixes segfault in gcc for -dx option.
- Filed upstream binutils bug for "branch out of range" error
- Conference calls
- Travel to Mumbai for US Visa OFC appointment
== Next Week ==
- Work towards committing cfgloop.h flattening patch
- Continue working on TCWG-830, TCWG-777, TCWG-847
== Progress ==
* Maintenance (CARD-1833 4/10)
- ADD/SUB with negative immediates solved by a year old
patch from ARM, sigh. On to the next bug... :(
- Working on https://llvm.org/PR20700
* Buildbots (CARD-1823 2/10)
- Moving benchmark bot to CMake, fixing deepcopy bug in
environment that broke new builds
- Restarting a few bots that crashed
* Background (4/10)
- Code review, meetings, discussions, etc.
- A lot of code review this week...
- Blocking disrespectful web spiders in llvm.org
- Emacs now almost works as I expect
== Plan ==
* Continue PR20700
* Have a look at Polybench
* Look for some more bugs to fix
Benchmarking presentation [7/10]
* More reading
* Ran through a couple more drafts
Misc [3/10]
* Featuring a bug in my backup scripts that took ~1/10 to fix
=Plan=
Back to benchmark automation as main activity
Presentation in the background
== Progress ==
LLDB development
-- Support for running lldb on arm hard float abi targets [TCWG-855] [7/10]
-- Built lldb-server for armhf trusty chromebook
-- Figured out problem with lldb-server showing up i386-linux-gnu as
target triple.
-- Verified load of arm-elf executable and breakpoint setting.
-- LLDB GDBserver dies while trying to run the target.
Miscellaneous [3/10]
-- Playing with HiKey board and setting up chromebook with armhf and armel
chroots on ssd.
-- Preparing document for Czech Republic visa
-- Meetings, emails, discussions etc.
== Plan ==
LLDB development
-- Further progress and try to fix run control on armhf targets [TCWG-855]
== This week ==
* TCWG-146 - Detect smin/umin idiom (1/10)
- Patch sent upstream for approval
* TCWG-140 - Transform end of loop conditions to min_expr (4/10)
- Wrote patch; investigating validation regressions
* TCWG-833 - Exploit Wide Add operations when appropriate (4/10)
- Investigation into why vectorizer does not exploit wide adds
* Misc (1/10)
- Conference calls
- Conference call with Kugan and Prathamesh to discuss GCC Git workflow
- Conference call with Charles and Prathamesh to discuss
autovectorization
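As an illustration only, the kind of source patterns the smin/umin and min_expr items above refer to might look like the following C sketch. The function names are hypothetical, and the actual patterns handled by the patches are not shown in this report.

```c
/* Conditional forms that a compiler may recognize as a signed or
   unsigned minimum (smin/umin). */
int smin_idiom(int a, int b)
{
    return a < b ? a : b;
}

unsigned umin_idiom(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Loop with two end-of-loop conditions. */
int count_two_bounds(int n, int m)
{
    int count = 0;
    for (int i = 0; i < n && i < m; i++)
        count++;
    return count;
}

/* Same trip count, with the two conditions folded into one bound
   computed as a minimum. */
int count_min_bound(int n, int m)
{
    int limit = smin_idiom(n, m);
    int count = 0;
    for (int i = 0; i < limit; i++)
        count++;
    return count;
}
```

The two loop functions return the same value for any n and m, which is the equivalence a min_expr-based transformation would rely on.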
== Next week ==
- Vacation
== Progress ==
(TCWG-831) post-indexed addressing [3/10]
. vectorization project kick-off call
. code browsing/reading to understand mailing list feedback about previous patch
(TCWG-775) NEON error messages [6/10]
. completed conversion of some ARM intrinsics to give same error
messages as AArch64 work
. reworked tests so they can be shared between AArch64, ARM.
. re-submitted previous patch with updated tests
Misc [1/10]
email, irc, gerrit reviews, connect travel booking, AArch64 qemu
big-endian experiment
== Plans ==
submit patch for work done so far on ARM NEON error messages
cortex-a53 workarounds
Benchmark automation - TCWG-360 [3/10]
* Created a partial Jenkins prototype
* Considered some security issues
Benchmarking presentation [5/10]
* Drafted some slides, did some reading
Misc [2/10]
=Plan=
More of the above
== Progress ==
* TCWG-849 (1/10)
- Committed improvement for VRP
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=225108
* Add REG_EQUAL for arm_emit_movpair (4/10)
- Posted patches for review
* TACT -TCWG-851 (3/10)
- Started with the small examples.
- Ran into an error while tuning; looking into it
* Git work flow for upstream patches -TCWG-848 (1/10)
- Had a chat with Michael and Prathamesh
- Tried the work-flow and now started documenting them
* Misc (1/10)
- gcc-patches, gcc-bugs list
- Meetings
== Plan ==
- GCC Bugs
- TACT driven optimization exploration for gcc
* TCWG-830 (4/10)
- Observing tree dumps
- Peeling for alignment happens at -O3 but not at -O2 -ftree-vectorize
Reason: in vect_enhance_data_refs_alignment() for:
a) -O2 -ftree-vectorize: max_allowed_peel == 0
b) -O3: max_allowed_peel == (unsigned) -1;
which equals UINT_MAX and therefore peeling gets allowed.
- Workaround: Pass --param vect-max-peeling-for-alignment=0
- Peeling for alignment with O2 can be enabled by passing
-fvect-cost-model (we don't want this!)
Reason:
opts.c:
/* Tune vectorization related parametees according to cost model. */
if (opts->x_flag_vect_cost_model == VECT_COST_MODEL_CHEAP)
{
maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS,
6, opts->x_param_values, opts_set->x_param_values);
maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIGNMENT_CHECKS,
0, opts->x_param_values, opts_set->x_param_values);
maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
0, opts->x_param_values, opts_set->x_param_values);
}
The above if condition becomes false when -fvect-cost-model is passed.
- Proposed patch (untested): http://pastebin.com/ftp0mrwH
Patch follows the workaround and passes --param vect-max-peeling-for-alignment=0
if unaligned access is supported.
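The effect of the two max_allowed_peel values noted above can be sketched in plain C. The function peeling_allowed is a hypothetical stand-in, not the actual check in vect_enhance_data_refs_alignment(), which is more involved. The sketch only shows that (unsigned) -1 converts to UINT_MAX, so any peel count passes a "not greater than the limit" test, while a limit of 0 rejects every nonzero peel count.

```c
#include <limits.h>

/* The value described for -O3: (unsigned) -1 wraps to UINT_MAX. */
unsigned unlimited_peel(void)
{
    return (unsigned) -1;
}

/* Hypothetical stand-in for the gate: peeling npeel iterations is
   permitted only when npeel does not exceed the limit. */
int peeling_allowed(unsigned npeel, unsigned max_allowed_peel)
{
    return npeel <= max_allowed_peel;
}
```

With the limit at 0 (the -O2 -ftree-vectorize case described above, or an explicit --param vect-max-peeling-for-alignment=0), only a peel count of 0, meaning no peeling, passes.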
* TCWG-777 (4/10)
- Observing tree and rtl dumps
- Workaround: for -O1 pass -fno-tree-fre -fno-tree-dominator-opts
Test-case: http://pastebin.com/cjBcSpiT
Generated assembly at -O1 without workaround: http://pastebin.com/jmQGZhN9
Generated assembly at -O1 with workaround: http://pastebin.com/JGj05z66
Is that the expected output for no unnecessary temps in assembly with
workaround ?
Is it profitable over the assembly generated without workaround ?
- Approach currently taken:
a) New pass "remove-temps" (for lack of better name), after nrv (added
as last gimple pass).
b) Transforms:
if (ssa_var != 0)
to
new_ssa_var = SSA_NAME_DEF_STMT (ssa_var)
if (new_ssa_var != 0)
This "unfolds" cse on expressions within if, which was done by fre
(and if fre was disabled then by dom pass).
c) However this approach results in dead stores.
eg:
_8 = flags_7(D) & 1;
if (_8 != 0)
...
is transformed to:
_8 = flags_7(D) & 1;
_32 = flags_7(D) & 1;
if (_32 != 0)
...
so store to _8 is dead store.
I tried to run dse after remove-temps but that didn't work.
RTL 194r.jump eliminates the above dead store as "trivially dead insn".
However I don't think it's a good idea to have dead stores like these
in gimple and rely on RTL to eliminate them. I could try to make the pass
a bit smarter so that it does not generate redundant stores like the one
to _32 in the above case.
d) Patch (no intent to commit as-is): http://pastebin.com/AGXnSkrZ
Generated assembly at -O1 with the patch: http://pastebin.com/VmHCVpGC
Patch eliminates temporaries at -O1 but not at -O2.
I have not yet figured out the reason for that.
For if (flags & 1),
In dfinish pass for -O1, the generated RTL is from
zeroextractsi_compare0_scratch
while for -O2, the generated RTL is from andsi3_compare0
e) Is this a problem also on x86 ?
x86 generated assembly with -O1: http://pastebin.com/XMeTXXwK
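For orientation, a C-level sketch of the two shapes discussed under TCWG-777. The pastebin test-case is not reproduced here, so both functions are hypothetical. The first keeps the masked value in a named temporary, similar to _8 = flags_7(D) & 1 followed by if (_8 != 0). The second tests the expression directly, which is the form the remove-temps pass aims for.

```c
/* Masked value held in a temporary before the test. */
int with_temp(unsigned flags, int x)
{
    unsigned t = flags & 1;
    if (t != 0)
        return x + 1;
    return x;
}

/* Expression tested directly inside the condition. */
int without_temp(unsigned flags, int x)
{
    if ((flags & 1) != 0)
        return x + 1;
    return x;
}
```

Both functions compute the same result. Any difference would only show up in the generated code, and at the source level a compiler is free to produce identical output for the two.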
* Misc (2/10)
- Getting familiar with vectorizer and NEON gcc intrinsics
- Reviewed git tutorials and starting preparation of git doc
- Conference calls
== Next Week ==
- Continue working on TCWG-830 and TCWG-777
- Header file flattening
- Travel to Mumbai on 2nd July (Thursday) for US Visa OFC appointment.
== Progress ==
* Maintenance (CARD-1833 4/10)
- Found the trail on the ADD/SUB with negative immediate
- Submitting RFC for discussion (http://llvm.org/PR20978)
- Bugzilla farming
- More LNT investigations (http://llvm.org/perf/ unstable)
* Releases (CARD-1431 1/10)
- Building, testing and uploading 3.6.2 RC1
* Background (5/10)
- Code review, meetings, discussions, etc.
- More stride vectorizer code review (ldN/stN implementation)
- More lab discussions (routers, lab split, new link)
- Changing my dev env to emacs (huge mind set flip)
== Plan ==
* Continue with ADD/SUB change
* Continue with Emacs setup
* Move benchmark bot to CMake
* Some other bugs
* One day off on Thu [2/10]
# Progress #
* Linaro GDB [4/10]
** TCWG-805, aarch64 native debugging multi-arch support.
Prepare for the patches submission.
It is a big patch series; thinking about how to upstream it.
Write commit log including the rationale of the changes.
* FSF GDB [2/10]
** FSF GDB 7.10 release. Audit some GDB regressions caused by intel
mpx stuff.
** PR 18605. Wrote a patch; it is in testing.
** Other patches review.
* Misc [2/10]
** File expense report for Grenoble travel.
** Some discussions on aarch64 tracepoint.
# Plan #
* TCWG-805, upstream some patches on multi-arch debugging.
--
Yao
* One day off (Wed) (2/10)
== Progress ==
* linaro-5.1-2015.06 snapshot (1/10)
- dealt with tags, release notes
- shared it with B&B
* 4.8-2015.06 branch merge (1/10)
- investigated regression: incorrect automatic merge
- fixed, validation on-going
* 4.9 branch (2/10)
- updated our git linaro-4.9-branch to match the svn one
- ready for branch merge, will be done right after fsf release
* Misc (4/10)
- meetings, conf-calls, emails, reviews (GCC backports, ABE, backflip)
== Next ==
* more reviews for new backports
* backports, release, validation: update doc
* hopefully upstream work
Recently I came across two excellent posts about accelerating the clang/llvm
build with different compilers/optimizations [1] [2].
I tried some of the author's advice and got very good results. Basically I
moved to an optimized clang build, changed to the gold linker and used a
memory allocator other than the system glibc one. The resulting build times
for the whole clang/llvm toolchain are summarized below (my machine is an
i7-4510U, 2C/4T, 8GB, 256GB SSD):
GCC 4.8.4 + gold (Ubuntu 14.04)
real 85m17.640s
user 257m1.976s
sys 11m35.284s
LLVM 3.6 + gold (Ubuntu 14.04)
real 34m4.909s
user 128m43.382s
sys 3m51.643s
LLVM 3.7 + gold + tcmalloc
real 32m56.707s
user 121m40.562s
sys 3m52.358s
The gold linker also shows *much* lower RSS usage; I am able to fully use make -j4
while linking in 8GB without any swapping.
Two things I would add/check for the posts:
1. Changing from the libc allocator to tcmalloc showed me a 3-4% improvement. I
tried jemalloc, but tcmalloc is faster. I am currently using the system
version 2.2, but I have pushed a patch to enable aggressive decommit by
default in 2.4 that might show lower RSS and latency (I will check it later).
2. At first I tried to accelerate my build by offloading compilation using distcc.
Results were good, although the other machine (i7, 4C/8T, 8GB) showed mixed
CPU utilization. The problem was memory utilization while linking with
ld.bfd, which generates a lot of swapping at higher job counts. I
will try using distcc with clang.
[1] http://blogs.s-osg.org/an-introduction-to-accelerating-your-build-with-clan…
[2] http://blogs.s-osg.org/a-conclusion-to-accelerating-your-build-with-clang/
Benchmark automation - TCWG-360 [7/10]
* Arndales stopped booting
** Package servers for elderly filesystem had gone
** Investigated some approaches to creating more stable filesystems
** Realized I could just update the image to point at old-releases, so
did that for now
* _More_ time thinking about interactions with Jenkins & LAVA. Fathi
gave me some Jenkins jobs to prototype in.
* Brain-dumped some of the present state of things into Collaborate
Misc - [3/10]
=Plan=
Jenkins prototyping