== This Week ==
* TCWG-835 (4/10)
- Validated and submitted patch upstream for review.
- Made changes to patches according to upstream reviews.
- Microbenchmarks: http://pastebin.com/tDnHZuG5
* TCWG-777 (5/10)
- Investigating different ICEs caused by my gimple remove-temps pass
- Looked at expansion of GIMPLE_COND
- Trying to write rtl version of remove-temps pass
* Misc (1/10)
- Conference calls
== Next Week ==
- Continue with TCWG-777
- GNU Tools Cauldron 2015
== Progress ==
* Performance (CARD-1832 1/10)
- Checking differences of PostRAListSched on OOO ARM cores
- Not many changes, ignoring for now
* Maintenance (CARD-1833 4/10)
- Building libc++/abi/unwind in LLVM/Clang tree
- Getting -Wa,-mfpu patches in, last important Clang driver ARM bug
- Some patches to get libunwind and libc++ to compile in-tree on ARM
- Fixed native sub-features detection (http://llvm.org/PR12794)
* Background (5/10)
- Code review, meetings, discussions, etc.
- Long discussions about TargetTuple/TargetParser/Triple
- Lots of patch reviews this week (I mean, *A LOT*)
- Moving some machines around, checking for Chromebook batteries
- Setting up cross-builder using multiarch / QEMU
- Some future planning
== Plan ==
* Look for some more performance issues in 3.7
* Try to hook up the cross-builder
* Investigate libc++ check-all failures
Benchmark infrastructure - TCWG-360 [8/10]
* Testing found many problems in multinode
* Iterating to solutions
Misc [2/10]
=Plan=
Holiday next week.
Then back to fixing multinode, incorporating into jenkins, noise
control experiments
Hi Linaro Toolchain Group,
I am building a native toolchain for aarch64 with the configuration below:
--build=x86_64-unknown-linux-gnu --host=aarch64-linux-gnu
--target=aarch64-linux-gnu.
In copy_gcc_libs_to_sysroot(), which copies libgcc.a to the sysroot, the current
implementation tries to find the absolute path of libgcc.a as below:
libgcc="`${local_builds}/destdir/${host}/bin/${target}-gcc
-print-file-name=${libgcc}`"
But the above line (i.e. gcc -print-file-name) cannot execute on x86_64, as
the toolchain is a native toolchain for aarch64-linux-gnu. Thus an infinite
loop is created in the copy command, i.e. copying directory x into x.
However, when I hard-coded the libgcc.a path on my machine (as below),
everything went fine.
libgcc="/home/vpathak/arm/toolchain/build_abe_new/builds/destdir/aarch64-linux-gnu/lib/gcc/aarch64-linux-gnu/5.1.1/libgcc.a"
I think this is a bug in ABE build infrastructure.
Thanks.
--
with regards,
Virendra Kumar Pathak
> * TCWG-806, aarch64 remote debugging multi-arch support. [4/10]
> Patches are done. Need to test them and polish them.
> Fix various multi-arch issues when --wrapper is used in GDBserver.
> Patches are pushed in to mainline.
Could you describe this activity in more detail?
Is the goal here to support mixed aarch32/aarch64 in the same GDB binary
and detect the change at runtime?
Thanks.
-Duane
== Progress ==
* Factor conversion out of COND_EXPR - TCWG-849 (5/10)
- Iterated through the review and more testing
* Looked at widening pass and the test-case from Wilco (1/10)
* Misc (2/10)
- Connect slides.
- gcc-patches, gcc-bugs list
- Meetings
* Sick (2/10)
== Plan ==
- GCC Bugs
- Widening pass
- Linaro bug 1318
== This week ==
* TCWG-146 - Detect smin/umin idiom (1/10)
- Made change recommended upstream and resubmitted
* TCWG-140 - Transform end of loop conditions to min_expr (1/10)
- Validated and submitted upstream
* TCWG-833 - Exploit Wide Add operations when appropriate (5/10)
- Added early clobber and forced operand 0 and operand 2 to match
- Finished AArch32 by using mode iterators
- Developed patch for AArch64
- Wide add instructions are now emitted for both AArch32 and AArch64
* TCWG-834 - Use non-unit stride loads by preference when applicable (2/10)
- Further AArch32 investigation
* Misc (1/10)
- Conference calls
== Next week ==
- Validate patches for TCWG-833 and submit upstream
- Further TCWG-834 investigation
- Linaro connect presentation preparation
* TCWG-835 (6/10)
- Looked at Newton-Raphson method
- Need to write new md pattern that matches sdiv_optab for modes == v2sf, v4sf
- First attempt for patch: http://pastebin.com/NKy8WdWC
* TCWG-830 (2/10)
- Ran Charles's benchmarks on ARM and AArch64.
- Investigating testsuite fallout for ARM patch.
- Still blocked by permissions to do benchmarking
* Misc (2/10)
- Conference Calls
- US visa collection
== Next Week ==
- Continue with TCWG-830, TCWG-835, TCWG-777
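The Newton-Raphson approach mentioned under TCWG-835 can be sketched in scalar C. This is only an illustration, not the patch itself: nr_recip_step, nr_divide, x0 and steps are hypothetical names, and the initial estimate of 1/d is supplied by the caller. On NEON the estimate would presumably come from a reciprocal-estimate instruction such as vrecpe, with vrecps providing the (2 - d*x) term; that mapping is an assumption about the intended md pattern.

```c
#include <assert.h>

/* One Newton-Raphson refinement of x ~= 1/d:  x' = x * (2 - d * x). */
static float nr_recip_step(float d, float x)
{
    return x * (2.0f - d * x);
}

/* Approximates a / d by refining a caller-supplied estimate x0 of 1/d.
   nr_divide, x0 and steps are hypothetical names for this sketch. */
static float nr_divide(float a, float d, float x0, int steps)
{
    assert(steps >= 0);
    float x = x0;
    for (int i = 0; i < steps; i++)
        x = nr_recip_step(d, x);
    /* Division becomes a multiplication by the refined reciprocal. */
    return a * x;
}
```

For example, nr_divide(1.0f, 4.0f, 0.2f, 3) refines the estimate 0.2 towards 0.25. Each step roughly doubles the number of correct bits, so the accuracy depends on the quality of x0 and on the number of steps.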
Hi Linaro Toolchain Group,
I am trying to learn the 'decoding decision tree' for aarch64 in binutils
by trying to add a new assembly instruction 'addvp'.
For example: addvp x0, x0, 9
For this, I added an entry in struct aarch64_opcode aarch64_opcode_table[]
(file opcodes/aarch64-tbl.h) as below:
{"addvp", 0x01000000, 0x7f000000, addsub_imm, 0, CORE, OP3 (Rd_SP, Rn_SP,
AIMM), QL_R2NIL, F_SF},
The ARM manual says bits 27 and 28 are unallocated. Thus for addvp, I am
giving opcode 01000000 (with bits 27 and 28 as 0).
With this, generating an object file from the assembly file is successful (test.s
--> test.o); but when disassembling with objdump, it says undefined
instruction.
From objdump log:
81002400 .inst 0x81002400 ; undefined
(but the instruction was generated correctly, i.e. 81002400 !!!).
I know that since addvp is a hack instruction, it won't execute on a CPU. But
disassembly should still succeed.
1. Please help me understand what I am doing wrong here. What else should I
do to add a new instruction to binutils?
2. I also saw some printf calls in opcodes/aarch64-gen.c, which I guess creates
the decoding tree (initialize_decoder_tree()). How can I get them printed? I set
debug = 1 but nothing is printed.
3. There are some auto-generated files
like aarch64-asm-2.c, aarch64-dis-2.c. How can I regenerate them?
Thanks.
--
with regards,
Virendra Kumar Pathak
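As a quick sanity check on the table entry in the mail above, the opcode/mask pair can be tested against the emitted word in isolation. The sketch below assumes an entry matches when (insn & mask) == opcode; struct simple_entry and entry_matches are hypothetical stand-ins for illustration, not the binutils types or API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for one opcode table row:
   only the fixed-bit value and the mask are kept. */
struct simple_entry {
    const char *name;
    uint32_t opcode;   /* value of the fixed bits */
    uint32_t mask;     /* which bits are fixed */
};

/* Returns 1 when the fixed bits of insn agree with the entry. */
static int entry_matches(const struct simple_entry *e, uint32_t insn)
{
    /* The fixed-bit value must not set bits outside the mask. */
    assert((e->opcode & ~e->mask) == 0);
    return (insn & e->mask) == e->opcode;
}
```

With the values from the mail, 0x81002400 & 0x7f000000 gives 0x01000000, so under this assumption the entry accepts the emitted word, while 0x91002400 (the ordinary add x0, x0, #9 encoding) does not match it. If that is how entries are matched, the "undefined" output may come from somewhere other than the table entry, for instance from the auto-generated decoder files asked about in question 3.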
== Progress ==
* Maintenance (CARD-1833 5/10)
- Building libc++/abi/unwind in LLVM/Clang tree
- Fixing some build errors (D11486)
- Addressing comments to submissions from last week
- Committing approved ones
- Re-working the others
* Releases (CARD-1431 1/10)
- Building 3.7.0-RC1 on ARM and AArch64, uploading
* Benchmarks (CARD-716 2/10)
- Running LNT, SPEC and EEMBC on ARM and AArch64 for 3.7.0
* Background (2/10)
- Code review, meetings, discussions, etc.
- Upgraded APM to Debian, kernel 3.16
- Perf still segfaults. :(
== Plan ==
* Finish open reviews
* Continue getting libc++ to build and pass the tests in tree
* Look at some of the performance regressions in 3.7
# Progress #
* TCWG-806, aarch64 remote debugging multi-arch support. [4/10]
Patches are done. Need to test them and polish them.
Fix various multi-arch issues when --wrapper is used in GDBserver.
Patches are pushed in to mainline.
* TCWG-876 [1/10]
Re-run GDB testsuite with incoming Linaro toolchain release.
Everything looks OK.
* TCWG-860, aarch64 fast tracepoint. [1/10]
Polish the patches, and ready for submission.
* TCWG-757, upstream patch review. [2/10].
* Misc, meeting. [2/10]
# Plan #
* TCWG-806, test patches on different targets, polish patches
and post them for review.
# Absence #
06th Aug - 10th Aug, GNU Tools Cauldron.
11th Aug - 14th Aug, Holiday.
--
Yao
Benchmark infrastructure - TCWG-360 [6/10]
* Some user support/bugfixing/bugraising
* Multinode job more or less working (not fully tested)
* Additional restructuring got rid of some more complexity
** Though if my simplifying assumption doesn't hold, I'll have to put it back
Benchmarking 101 presentation [2/10]
* Ran through slides with Ryan & Maxim
* Removed many slides
* Collected up and categorized the removed slides
** Probably will go into future presentation(s)
Misc [2/10]
=Plan=
* Tweak multinode a little more
* Integrate multinode into Jenkins
** To the extent that I'm comfortable with the security
* Read a bit about some benchmarks that aren't SPEC
* Start noise control experiments (may inform presentation)
=Week After Next=
Holiday
* One day off - Bastille day (2/10)
== Progress ==
o Upstream GCC (3/10)
* Finalized and committed fix in trunk for Linaro bug #416
o Linaro GCC release (4/10)
* Reviewed and did more patches for tcwg-release script
* Still investigating validation issues.
* Prepared FSF branch merge into Linaro GCC 5 branch
o Misc (1/10)
* Various meetings
== Plan ==
- Summer Holidays (2 weeks)
== Progress ==
* Add REG_EQUAL note for arm_emit_movpair (1/10)
- Patch2 ok to commit.
- Ran complete validation.
- Found an issue and posted a patch to fix
* Factor conversion out of COND_EXPR - TCWG-849 (6/10)
- Found a performance regression in tree-ssa-reassoc
- Looked at the tree-ssa-reassoc code to see possible fixes
- Posted an RFC patch
* PR66865
- Wine segfaults from gcc in trunk (r225757)
- Reproduced it but turned out not from my commit
- Fixed by other PR
* Misc (2/10)
- Looked at interaction between gcc optimization passes
- gcc-patches, gcc-bugs list
- Meetings
== Plan ==
- GCC Bugs
- TACT driven optimization exploration for gcc
- Linaro bug 1318
Benchmark infrastructure - TCWG-360 [5/10]
* Worked through my Jenkins issues with Fathi, raised some tickets at him
* Converting LAVA end into multinode job
** Having some trouble with multinode API
Benchmarking 101 presentation [3/10]
* 1/2 day of discussions/reading, full day of redrafting
* Looked for Michael Hope's similar 2012 presentation
** Found slides, not video
=Plan=
* Complete multinode job
* Integrate into Jenkins to the extent that I'm comfortable
* Complete 'shareable' draft of benchmarking-101
** And see if I have enough left over for -102, maybe -103
== This week ==
* TCWG-140 - Transform end of loop conditions to min_expr (1/10)
- Blocked waiting on validation
* TCWG-833 - Exploit Wide Add operations when appropriate (7/10)
- Developed patch to handle signed and unsigned cases for AArch32
- Investigation and debugging into support for AArch64
* TCWG-834 - Use non-unit stride loads by preference when applicable (1/10)
- Initial AArch32 investigation
* Misc (1/10)
- Conference calls
== Next week ==
- Validate AArch32 patch for TCWG-833
- Develop AArch64 patch for TCWG-833
- Validate TCWG-140
- Make recommended fixes to TCWG-146 and resubmit upstream
* TCWG-777 (3/10)
- O2 workaround: -fno-tree-pre -fno-tree-fre -fno-tree-dominator-opts
-fno-gcse -fno-peephole2
- Observing rtl dumps for gcse, combine, peephole2 with different
options and optimization levels.
- Continued investigating ICE during gcc build with my pass applied.
- Sent mail to tcwg, for further suggestions
* TCWG-830 (2/10)
- Verified the behavior for aarch64, and extended patch for aarch64
along same lines.
- Running Charles's microbenchmarks on r1-a7
- Benchmarking setup with Bernie. Blocked by permissions, sent a mail
to lava-lab for granting requisite permissions
* TCWG-835 (2/10)
- observing vector and asm dumps
* Misc (3/10)
- Travel to Mumbai for US Visa Interview
- Conference Calls
== Next Week ==
- Continue with TCWG-777, TCWG-835, TCWG-830
I'm trying to build the toolchain as win32 executable on Ubuntu with ABE.
I'm pretty new with ABE. I followed the FAQ
https://wiki.linaro.org/WorkingGroups/ToolChain/FAQ and Rob's post. Also
checked the MakeRelease.job and slave.sh. I have all packages listed in the
slave.sh installed. So I assume I have all dependencies ready for the build.
Here is what I have done:
Create _build subfolder beside abe
CD to _build and run: ../abe/configure --with-fileserver=148.251.136.42
--with-remote-snapshots=/snapshots-ref
First build this: ../abe/abe.sh --target aarch64-none-elf -build all
It installed the toolchain to
_build/builds/destdir/x86_64-unknown-gnu-linux. I added the bin under it to
my PATH
Then do 2nd round build: ../abe/abe.sh -host i686-w64-mingw32 --target
aarch64-none-elf -build all
However, I'm getting a configure error while it is building libiberty:
configure:5946: checking for library containing strerror
configure:5978: error: Link tests are not allowed after GCC_NO_EXECUTABLES.
My understanding is that the linker cannot find glibc or eglibc.
What have I missed?
Is there anywhere I can find detailed step-by-step instructions for building the
Linaro toolchain to run on a Windows host?
Sincerely,
Qyq
== Progress ==
* Maintenance (CARD-1833 4/10)
- Clang driver:
- Passing -Wa,-mfpu and friends to assembler (D11147, D11148)
- Passing -I to assembler (D11185)
- Don't include libgcc/asm if using libunwind/libc++abi (D11153)
- Asm warnings:
- Trying again to look for a way to disable asm warnings from clang (D11216)
* Benchmarks (CARD-716 3/10)
- Benchmarking shrink-wrapping in AArch64
- Setting up LNT Benchmarks on A32/A64
- Scripts to collate / compare LNT results on the fly
- Benchmarking LNT on 3.5.2 and 3.6.2 on ARM and AArch64
- Multisampling, perf, and all goodness
- Getting ready to compare with 3.7.0 to come
* Releases (CARD-1431 1/10)
- Spinning release 3.7.0
- Many changes, CMake builds, etc.
- Fixing the test-release.sh script (D11326)
* Background (2/10)
- Code review, meetings, discussions, etc.
- Upgrading APM to Debian, kernel 3.16
== Plan ==
* Upstreaming pending reviews
* Continue release 3.7.0, benchmark it
* Start looking at the effects of the stride vectorizer on ARM/AArch64
# Progress #
* TCWG-806, aarch64 remote debugging multi-arch support. [6/10]
Some code refactor and fix various multi-arch issues when --wrapper
is used in GDBserver. Patches are being tested.
* TCWG-757, Patches review. [2/10]
* Misc, meeting, [2/10]
# Plan #
* TCWG-806, aarch64 remote debugging multi-arch support.
# Absence #
* 06th Aug - 10th Aug, GNU Tools Cauldron.
* 11th Aug - 14th Aug, Holiday.
--
Yao
1 day off (2/10)
== Progress ==
* backports/release/infra (1/10)
- reviews
* GCC (3/10)
- posted patch to fix vget_lane on armeb
- investigating AdvSIMD failures on aarch64_be.
Having a way to debug target code would help (qemu does not seem
to support aarch64_be yet, and I use the foundation model in bare
metal mode)
* Misc (4/10)
- meetings, conf-calls, emails
== Next ==
Holidays until Aug 3rd.
Benchmark infrastructure - TCWG-360 [5/10]
* More thinking/prototyping sufficiently-secure Jenkins benchmarking
* Converting LAVA end into multinode job
Benchmarking presentation [2/10]
* A couple of helpful discussions
* Read a couple of helpful docs
Misc [3/10]
=Plan=
* Complete multinode job
* Settle on a plan for Jenkins
* Redraft presentation
== This Week ==
* TCWG-777 (4/10)
- Resolved ICE caused by pass during gcc build but hit another ICE:
http://pastebin.com/RUAY6scB
- Current pass state: http://pastebin.com/AGXnSkrZ
- For test-case:
void f(int flags)
{
  void foo(void);
  if (flags & 1)
    foo();
}
- temporaries don't exist for -O1
- for -O2 temps introduced by peephole2 due to define_peephole2
pattern in thumb2.md:1540
http://pastebin.com/3rEF8Te4
So this intentionally transforms rtx from
zeroextractsi_compare0_scratch to rtx from shiftsi3_compare0_scratch.
Why is it beneficial to do this transform ?
- Looking into combine pass
- The above test-case works with -marm for -O2.
* TCWG-830 (3/10)
- trying to understand vect dump
- untested patch: http://pastebin.com/K4UX5iYz
* Misc (2/10)
- Started looking at TCWG-835, loop vectorized on x86 but not arm
- Committed fix to segfault on -dx
- Conference calls
== Next Week ==
- Continue with TCWG-777, TCWG-830, TCWG-835
- Travel to Mumbai on 14th July (Tuesday) for US Visa Interview with
US Consulate.
Hospital and physio (2/10)
== Progress ==
o Upstream GCC (2/10)
* More work on ongoing patches
o Linaro GCC release (3/10)
* Reviewed patches for tcwg-release script
* Looked at validation issues and redo backports for 5.1
o Misc (3/10)
* Various meetings
* Upstream libunwind support
== Plan ==
- Continue ongoing tasks
== Progress ==
LLDB development
-- Debugging problems with process launch and debugserver crash on remote
connection. [TCWG-855] [6/10]
-- Caught the notorious issue mentioned above; the fix can be found here
http://reviews.llvm.org/D11129.
-- Figured out arm lldb and lldbserver host builds on Chromebook; will put
the steps on the collaborate LLDB page soon. [1/10]
Miscellaneous [3/10]
-- Travel to Islamabad for Czech Republic visa
-- Meetings, emails, discussions etc.
== Plan ==
LLDB development
-- Follow up review process for process launch bug fix.
-- Run testsuite for armhf and figure out issues to fix.
-- Submit patches to fix build on older version of gcc.
Eid Holidays 17th to 21st July 2015
== This week ==
* TCWG-146 - Detect smin/umin idiom (0/10)
- Waiting for upstream approval/review
* TCWG-140 - Transform end of loop conditions to min_expr (1/10)
- Blocked waiting on validation
* TCWG-833 - Exploit Wide Add operations when appropriate (8/10)
- More detailed investigation
- Working theory is to develop wide add rtl patterns that
incorporate vec_unpack to widen 16-bit to 32-bit
* Misc (1/10)
- Conference calls
- Conference call with Charles and Prathamesh to discuss
autovectorization progress
== Next week ==
- Validate patch for TCWG-140
- Develop patch for TCWG-833
- Validate TCWG-833 if successful patch is developed
- Investigate AArch64 implementation
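For reference, the kind of source loop that a widening add would serve can be sketched as below. This is an assumed shape, not the actual TCWG-833 test case: wide_add and its parameters are hypothetical names, and whether the vectorizer turns this particular loop into wide add instructions depends on the patterns under development.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds each 16-bit element of in to the 32-bit accumulator at the
   same index.  The int16_t -> int32_t conversion is the widening
   step that a vec_unpack plus add (or a single wide add) would cover. */
static void wide_add(int32_t *restrict acc, const int16_t *restrict in,
                     size_t n)
{
    assert(acc != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        acc[i] += in[i];   /* in[i] is sign-extended before the add */
}
```

Because the narrow operand is sign-extended, a value such as -32768 is added as -32768 rather than wrapping; an unsigned variant would use uint16_t and uint32_t in the same way.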
== Progress ==
* Add REG_EQUAL note for arm_emit_movpair (1/10)
- committed patch1 after testing again
* Factor conversion out of COND_EXPR - TCWG-849 (5/10)
- Gone through couple of iterations and committed the patch
- There are still some improvements needed as follow-up patches
* TACT -TCWG-851 (2/10)
- Started looking into spec2k
* Misc (2/10)
- Looked at interaction between gcc optimization passes
- gcc-patches, gcc-bugs list
- Meetings
== Plan ==
- GCC Bugs
- TACT driven optimization exploration for gcc
- Linaro bug 1318
== Progress ==
* Releases (CARD-1431 1/10)
- Released 3.6.2-final
* Maintenance (CARD-1833 5/10)
- Reducing runtime of some benchmarks in LLVM's
test-suite by getting rid of millions of useless
fprintf calls.
- Working on https://llvm.org/PR20700 some more
* Background (4/10)
- Code review, meetings, discussions, etc.
- Long TargetTuple review (D10969) / discussions
- Replacing broken buildbot USB disks (need to buy more)
- Bisecting self-hosting bot breakage
- Testing patches for ARM
- Jira farming
== Plan ==
* continue PR20700
* continue review/discussion of TargetTuple
* look again at PR20757
* maybe look at PR21000
* Off Monday (2/10)
== Progress ==
* published linaro-4.8 and 4.9 2015.06 releases
* linaro-5.1-2015.07 (1/10)
- backport reviews
- updated my helper script for reviews to cope with the git-only branches
* upstream (1/10)
- started looking at vget_lane Neon intrinsic failure on armeb
* infra/release/backports (2/10)
- reviews
* Misc (4/10)
- meetings, conf-calls, emails
== Next ==
* Off Tuesday
* backports, release, validation: update doc
* backports, reviews
* upstream work
== Later ==
* Off July 18th-Aug 3rd
Hi Linaro Toolchain Group,
I am comparing the execution time (run time) of the sin() trigonometric function
between the following glibc (including libm) libraries for aarch64 (Juno
Cortex-A57):
Linaro glibc 2.19, Linaro eglibc 2.19, eglibc 2.19 (from
http://www.eglibc.org/) and Linaro glibc 2.21.
My observation for execution time of sin():
with Linaro glibc 2.19 and eglibc 2.19 = 1m24.703s (approx)
whereas,
with Linaro eglibc 2.19 & Linaro glibc 2.21 = 0m25.243s (approx)
Has Linaro optimized the libm functions for aarch64 in Linaro eglibc 2.19 ?
If yes, please point me to relevant reference from where I can find more
information on them.
Since the eglibc development from version 2.19 has stopped, will Linaro
maintain its own development version of glibc ?
I am using the code snippet below and the Linux 'time' command to measure the
time.
#include <math.h>
#include <stdio.h>

void sin_func(void)
{
    double incr = 0.732;
    double result, count = 0.0;
    printf("%s\n", __func__);
    /* Step through the input range, calling sin() on each value. */
    while (count < 105414350.0) {
        result = sin(count);
        count += incr;
    }
}
Thanks.
--
with regards,
Virendra Kumar Pathak
== Progress ==
LLDB development [TCWG-855] [8/10]
-- Figure out build steps for building cross lldb-server with
arm-linux-gnueabihf-g++
-- Debugging of lldb-server communication packets for fixing lldb-server
armhf crash problem.
-- Comparison with androidabi version to figure out missing pieces
Miscellaneous [2/10]
-- Ubuntu reinstall on laptop
-- Follow up on Czech Republic visa
-- Meetings, emails, discussions etc.
== Plan ==
LLDB development
-- Work on lldb-server for armhf and try to figure out crash problems
[TCWG-855]
End of sick leave (will work 100% from home until my cast is removed).
== Progress ==
o Upstream GCC (2/6)
* Back to ongoing patches
o Linaro GCC release (7/6)
* Reviewed FSF branch merge into 4.8/4.9 branches
* Reviewed patches for tcwg-release script
* Sent a first batch of backports for 5.1
Still pending due to validation infra. issues
o Misc (1/6)
* Various meetings
== Plan ==
- Continue ongoing tasks
* One day off on Fri. [2/10]
# Progress #
* TCWG-805, aarch64 native debugging multi-arch support. [5/10]
Patches (part1) are posted upstream for review, need to rewrite some
of them. The rest of them are OK and can be pushed in after 7.10
branch is created.
Watchpoint support in multi-arch debugging. Both kernel and GDB need
some fixes. Ongoing.
* Complete the document of aarch64 tracepoint work. [1/10]
* FSF GDB. [2/10]
Review Intel MPX patch again, and read something on Intel
MPX stuff.
# Plan #
* TCWG-805, update some patches in part 1 patch series, and continue
the multi-arch watchpoint work.
--
Yao
== Progress ==
* Add REG_EQUAL note for arm_emit_movpair (1/10)
- Updated and reposted
- https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00295.html
- https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02066.html
* Factor conversion out of COND_EXPR - TCWG-849 (3/10)
- https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00246.html
* TACT -TCWG-851 (2/10)
- Small examples now seem to work.
- Have to do cross testing
* Git work flow for upstream patches -TCWG-848 (1/10)
- Updated based on review
* Misc (3/10)
- Looked up LLVM documents
- Looked at the TODO list Renato provided
- gcc-patches, gcc-bugs list
- Meetings
== Plan ==
- GCC Bugs
- TACT driven optimization exploration for gcc
- Linaro bug 1318
== This Week ==
* TCWG-856 (2/10)
- submitted patch to flatten cfgloop.h:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00277.html
* TCWG-777 (4/10)
- Modified pass to not generate redundant stores
- Investigating ICE caused by the pass during gcc build
- Discussions for possible approaches with Christophe and Kugan
- Reading thru documentation on optabs and ccmp patches
* Misc (4/10)
- Patch sent upstream which fixes segfault in gcc for -dx option.
- Filed upstream binutils bug for "branch range out of error"
- Conference calls
- Travel to Mumbai for US Visa OFC appointment
== Next Week ==
- Work towards committing cfgloop.h flattening patch
- Continue working on TCWG-830, TCWG-777, TCWG-847
== Progress ==
* Maintenance (CARD-1833 4/10)
- ADD/SUB with negative immediates solved by a year old
patch from ARM, sigh. On to the next bug... :(
- Working on https://llvm.org/PR20700
* Buildbots (CARD-1823 2/10)
- Moving benchmark bot to CMake, fixing deepcopy bug in
environment that broke new builds
- Restarting a few bots that crashed
* Background (4/10)
- Code review, meetings, discussions, etc.
- A lot of code review this week...
- Blocking disrespectful web spiders in llvm.org
- Emacs now almost works as I expect
== Plan ==
* Continue PR20700
* Have a look at Polybench
* Look for some more bugs to fix
Benchmarking presentation [7/10]
* More reading
* Ran through a couple more drafts
Misc [3/10]
* Featuring a bug in my backup scripts that took ~1/10 to fix
=Plan=
Back to benchmark automation as main activity
Presentation in the background
== Progress ==
LLDB development
-- Support for running lldb on arm hard float abi targets [TCWG-855] [7/10]
-- Built lldb-server for armhf trusty chromebook
-- Figured out problem with lldb-server showing up i386-linux-gnu as
target triple.
-- Verified load of arm-elf executable and breakpoint setting.
-- LLDB GDBserver dies while trying to run the target.
Miscellaneous [3/10]
-- Playing with HiKey board and setting up Chromebook with armhf and armel
chroots on ssd.
-- Preparing document for Czech Republic visa
-- Meetings, emails, discussions etc.
== Plan ==
LLDB development
-- Further progress and try to fix run control on armhf targets [TCWG-855]
== This week ==
* TCWG-146 - Detect smin/umin idiom (1/10)
- Patch sent upstream for approval
* TCWG-140 - Transform end of loop conditions to min_expr (4/10)
- Patch and investigating validation regressions
* TCWG-833 - Exploit Wide Add operations when appropriate (4/10)
- Investigation into why vectorizer does not exploit wide adds
* Misc (1/10)
- Conference calls
- Conference call with Kugan and Prathamesh to discuss GCC Git workflow
- Conference call with Charles and Prathamesh to discuss
autovectorization
== Next week ==
- Vacation
== Progress ==
(TCWG-831) post-indexed addressing [3/10]
. vectorization project kick-off call
. code browsing/reading to understand mailing list feedback about previous patch
(TCWG-775) NEON error messages [6/10]
. completed conversion of some ARM intrinsics to give same error
messages as AArch64 work
. reworked tests so they can be shared between AArch64, ARM.
. re-submitted previous patch with updated tests
Misc [1/10]
email, irc, gerrit reviews, connect travel booking, AArch64 qemu
big-endian experiment
== Plans ==
submit patch for work done so far on ARM NEON error messages
cortex-a53 workarounds
Benchmark automation - TCWG-360 [3/10]
* Created a partial Jenkins prototype
* Considered some security issues
Benchmarking presentation [5/10]
* Drafted some slides, did some reading
Misc [2/10]
=Plan=
More of the above
== Progress ==
* TCWG-849 (1/10)
- Committed improvement for VRP
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=225108
* Add REG_EQUAL for arm_emit_movpair (4/10)
- Posted patches for review
* TACT -TCWG-851 (3/10)
- Started with the small examples.
- Ran into an error while tuning; looking into it
* Git work flow for upstream patches -TCWG-848 (1/10)
- Had a chat with Michael and Prathamesh
- Tried the work-flow and now started documenting them
* Misc (1/10)
- gcc-patches, gcc-bugs list
- Meetings
== Plan ==
- GCC Bugs
- TACT driven optimization exploration for gcc
* TCWG-830 (4/10)
- Observing tree dumps
- Peeling for alignment happens at -O3 but not at -O2 -ftree-vectorize
Reason: in vect_enhance_data_refs_alignment() for:
a) -O2 -ftree-vectorize: max_allowed_peel == 0
b) -O3: max_allowed_peel == (unsigned) -1;
which equals UINT_MAX and therefore peeling gets allowed.
- Workaround: Pass --param vect-max-peeling-for-alignment=0
- Peeling for alignment with O2 can be enabled by passing
-fvect-cost-model (we don't want this!)
Reason:
opts.c:
  /* Tune vectorization related parametees according to cost model. */
  if (opts->x_flag_vect_cost_model == VECT_COST_MODEL_CHEAP)
    {
      maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS,
                             6, opts->x_param_values, opts_set->x_param_values);
      maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIGNMENT_CHECKS,
                             0, opts->x_param_values, opts_set->x_param_values);
      maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
                             0, opts->x_param_values, opts_set->x_param_values);
    }
The above if condition becomes false when -fvect-cost-model is passed.
- Proposed patch (untested): http://pastebin.com/ftp0mrwH
Patch follows the workaround and passes --param vect-max-peeling-for-alignment=0
if unaligned access is supported.
* TCWG-777 (4/10)
- Observing tree and rtl dumps
- Workaround: for -O1 pass -fno-tree-fre -fno-tree-dominator-opts
Test-case: http://pastebin.com/cjBcSpiT
Generated assembly at -O1 without workaround: http://pastebin.com/jmQGZhN9
Generated assembly at -O1 with workaround: http://pastebin.com/JGj05z66
Is that the expected output for no unnecessary temps in assembly with
workaround ?
Is it profitable over the assembly generated without workaround ?
- Approach currently taken:
a) New pass "remove-temps" (for lack of better name), after nrv (added
as last gimple pass).
b) Transforms:
if (ssa_var != 0)
to
new_ssa_var = SSA_NAME_DEF_STMT (ssa_var)
if (new_ssa_var != 0)
This "unfolds" cse on expressions within if, which was done by fre
(and if fre was disabled then by dom pass).
c) However this approach results in dead stores.
eg:
_8 = flags_7(D) & 1;
if (_8 != 0)
...
is transformed to:
_8 = flags_7(D) & 1;
_32 = flags_7(D) & 1;
if (_32 != 0)
...
so store to _8 is dead store.
I tried to run dse after remove-temps but that didn't work.
RTL 194r.jump eliminates the above dead store as "trivially dead insn".
However I don't think it's a good idea to have dead stores like these
in gimple and rely on RTL to eliminate them. I could try to make the pass
a bit smarter to not generate redundant stores like _32 != 0 in above case.
d) Patch (no intent to commit as-is): http://pastebin.com/AGXnSkrZ
Generated assembly at -O1 with the patch: http://pastebin.com/VmHCVpGC
Patch eliminates temporaries at -O1 but not at -O2.
I have not yet figured out the reason for that.
For if (flags & 1),
In dfinish pass for -O1, the generated RTL is from
zeroextractsi_compare0_scratch
while for -O2, the generated RTL is from andsi3_compare0
e) Is this a problem also on x86 ?
x86 generated assembly with -O1: http://pastebin.com/XMeTXXwK
* Misc (2/10)
- Getting familiar with vectorizer and NEON gcc intrinsics
- Reviewed git tutorials and starting preparation of git doc
- Conference calls
== Next Week ==
- Continue working on TCWG-830 and TCWG-777
- Header file flattening
- Travel to Mumbai on 2nd July (Thursday) for US Visa OFC appointment.
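The TCWG-830 observation about max_allowed_peel can be illustrated with a small C sketch. It is a simplified model, not the code in vect_enhance_data_refs_alignment(): peeling_allowed and the way npeel is compared against the limit are assumptions made for illustration. The only fact it relies on is the one noted above, that (unsigned) -1 equals UINT_MAX.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, simplified model of a peeling limit check.
   max_allowed_peel == (unsigned) -1 is treated as "no limit". */
static int peeling_allowed(unsigned npeel, unsigned max_allowed_peel)
{
    /* Converting -1 to unsigned wraps around to the largest value. */
    assert((unsigned) -1 == UINT_MAX);
    if (max_allowed_peel == UINT_MAX)
        return 1;
    return npeel <= max_allowed_peel;
}
```

In this model, a limit of 0 (the -O2 -ftree-vectorize case) rejects any non-zero peel count, while a limit of (unsigned) -1 (the -O3 case) accepts every count, which matches the behaviour described in the notes.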