https://blueprints.launchpad.net/ubuntu/+spec/other-linaro-n-cross-compilers:
- wrote patches for creating backports PPA
- each component [1] generates a versioned -source binary package
(eglibc-2.12.1-source etc.)
- a-c-t-b [2] got a "PPA" boolean variable in rules to have one source package
for the archive and for backports
- I built a-c-t-b with all components backported from natty in a lucid pbuilder
Bugs:
- 684625 - libc6 is compiled for armv5 instead of armv7a
- confirmed, wrote fix, will send for review and merge
- 683832 - gcc fails to cross compile Qt
- confirmed in maverick for cross gcc 4.4/4.5
- need to check with fixed (bug 684625) toolchain
- FTBFS of armel-cross-toolchain-base 1.53/natty
- issue is lack of LTO plugin built in gcc/stage2
- have first patches for it, need to test
1. component = eglibc, gcc-4.4/4.5, binutils, linux
2. a-c-t-b = armel-cross-toolchain-base
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
== GCC related ==
* PR44557, Thumb-1 ICE: originally thought fixing a constraint would
work; however, after simplifying the testcase, I got another ICE in
postreload, due to a load of IP, which is not permitted in Thumb-1.
Looking at some reload internals as part of fixing this.
* PR45416, ARM code generation regression. The first fix from last week hit
an assertion failure in the alias oracle because ARRAY_REFs are not handled
there. I also found some code quality regressions in expand due to
this change. Switched to a more conservative fix by adding the related TER
substitution to expr.c:do_store_flag(), which produced more focused
results. However, 32-bit x86 regressed slightly in the same flag-storing
code (it did not use the 'testl' insn after the change). Still WIP.
* PR46667, submitted a section type conflict bug fix upstream, see
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00137.html , which
supposedly fixes the upstream ARM-Linux C++ build. Jan Hubicka later
gave another fix, so still in discussion.
* PR45886, this PR is a call for backporting the __ARM_PCS* preprocessor
symbols to gcc-4_5-branch. Submitted a mail to ask for approval, no
response.
== libffi VFP hard-float ==
* PR46508, libffi VFP assembly error. I missed this earlier due to using
a compiler configured with --with-fpu=vfp. Submitted assembly fix to add
the needed FPU directives. Committed to upstream trunk.
== This week ==
* Hope to wrap up the above in-progress PRs, as well as continue to look
at other PRs of interest.
* LP #685534 popped up on Sunday, and manifests on upstream trunk too.
Add this to queue.
* Think about GCC performance opportunities (Linaro)
== Linaro GCC ==
* Reproduce regression of my ldm/stm backport on 4.5, which is caused by
the other two merged patches in ifcvt.c. Fix them. Propose merge
request again. Learn how to sync/merge changes from one branch to the
other branch.
* Fix VFP_D0_D7 handling in predicate vfp_register_operand. Approved
and committed upstream.
* Test new regrename improvement patch on x86_64, and measure
its effectiveness on ARM. Code size of bash-3.2 is reduced by 0.2% with
options "-march=armv7-a -mthumb -O2 -frename-registers". Eric B. is
almost OK with this patch except for some wording in comments.
== Linaro GDB ==
As discussed at UDS, I'll move to GDB work for gdb correctness.
http://ex.seabright.co.nz/helpers/planner#tr-toolchain-gdb-correctness
This month, I'll focus on fixing GDB testsuite failures.
* Analyze LP:615978, failures in gdb.base/annota3.exp.
Signal is not delivered to child while software single-stepping. The
same as LP:649121.
* Fix failure in gdb.xml/tdesc-regs.exp. LP:685494
It is caused by a target triplet matching error, when target is set to
"armv7l-linux-gnueabi". Target triplet matching in test cases should be
changed. Patch is being reviewed upstream.
* LP:616000 failures caused by -fstack-protector.
Did some homework to understand the frame-related code in GDB. Got a big
picture of how some key frame data structures are used inside GDB. Compared
prologues with and without stack-protector and found some differences.
Still no clue how to teach GDB to identify whether stack-protector
is turned on or off.
* Fix one failure in printcmds.exp. LP:685702
This test case on the 7.2 branch is a little out of date compared with
GDB trunk. Backporting one patch from trunk to the 7.2 branch fixes this
problem. The backport patch is being reviewed upstream.
* Neon registers in kernel dump file. LP:615972
Asked the Linaro kernel WG how to move forward on this. Discussion is
still ongoing.
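To illustrate the kind of triplet matching problem behind LP:685494, here is a minimal Python sketch. It assumes glob-style matching (as DejaGnu-style target checks use) and the two patterns are hypothetical, chosen only to show how a pattern that expects a bare "arm" CPU field misses "armv7l-linux-gnueabi". They are not taken from the actual test case or patch.

```python
from fnmatch import fnmatchcase


def matches_target(triplet, patterns):
    """Return True if the target triplet matches any glob pattern."""
    return any(fnmatchcase(triplet, pattern) for pattern in patterns)


# Hypothetical patterns, for illustration only.
TOO_STRICT = ["arm-*-*"]   # requires the CPU field to be exactly "arm"
RELAXED = ["arm*-*-*"]     # accepts armv5tel, armv7l, ... as the CPU field

if __name__ == "__main__":
    for triplet in ("arm-linux-gnueabi", "armv7l-linux-gnueabi"):
        print(triplet,
              "strict:", matches_target(triplet, TOO_STRICT),
              "relaxed:", matches_target(triplet, RELAXED))
```

With the strict pattern, a test silently takes the "not an ARM target" path on an armv7l host, which is consistent with the symptom described above.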
== This Week ==
* Report the rest of GDB testsuite failures.
* Pick up some of them, and fix.
* Pass gcc patches in my queue one by one to gcc-patches for review.
--
Yao (齐尧)
RAG:
Red:
Amber:
Green:
Milestones:
                             | Planned    | Estimate   | Actual     |
finish virtio-system         | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |
Progress:
* merge-correctness-fixes:
** Nathan Froyd (CodeSourcery) has reviewed a lot of
my ARM patches. Most were OK, one or two needed tweaking.
We seem to have come to agreement on how best to treat
the API between qemu and the softfloat library, and I
have a V2 patchset ready to mail as soon as Nathan has
commented on the final patch.
** posted a patch to rename a very misleading _is_nan()
function
** identified list of correctness patches in meego and
samsung trees and issues noted within ARM
** qemu: posted patch to remove an unused function
** started looking at the first patch in the meego tree,
which fixes VQSHL. I have already discovered a bug in
this insn not covered by the meego patch...
Meetings: toolchain, PD update, ARM 20th birthday party
Plans:
- qemu consolidation
Issues:
* Locking in qemu is definitely insufficient, especially
(but not exclusively) when running multi-threaded
programs in linux-user mode.
https://bugs.launchpad.net/qemu/+bug/668799
has an example problem and some discussion; I'm hoping
some other qemu developers have an opinion, but the
nicest approach IMHO would involve fairly invasive
changes to how qemu implements interrupting a cpu
which is executing TCG code.
Not sure where this should sit in the priority list.
Things of note:
- there has been some discussion of broadening the "KVM Forum"
conference to include other virtualisation related topics including
Xen and also the TCG aspects of Qemu. Still all up in the air but
possibly colocated with LinuxCon in Vancouver in August. See:
http://www.linux-kvm.org/page/KVM_Forum_2011
Absences: (complete to end of 2010)
Fri 17 Dec - Tue 4 Jan inclusive.
2011: Dallas Linaro sprint 9-15 Jan. Holiday 22 Apr - 2 May.
== This week ==
* Looked at a generic bug in GAS's handling of ifuncs. Sent a patch upstream:
http://sourceware.org/ml/binutils/2010-11/msg00495.html
Alan quite reasonably wanted me to test on a variety of targets. For want
of anything better, I wrote a script to test Alan's list of 118 targets.
Tests went OK, patch committed upstream.
* Wrote more IFUNC tests. Found another problem (as yet unresolved).
* Looked at vector stuff, but nothing tangible yet.
(I also had to spend some time on other IBM things, sorry.)
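A multi-target test run like the one described above could be scripted roughly as follows. This is only a sketch, not the script that was actually used: the function names, the directory layout, and the choice of the `all-gas` and `check-gas` make targets are assumptions, and the command runner is injectable so the loop can be exercised without a real binutils tree.

```python
import os
import subprocess


def default_runner(cmd, cwd):
    """Run one command in cwd (created if needed); True on exit status 0."""
    os.makedirs(cwd, exist_ok=True)
    return subprocess.run(cmd, cwd=cwd).returncode == 0


def check_targets(targets, src_dir, build_root, runner=default_runner):
    """Configure, build and test gas once per target; return {target: ok}."""
    configure = os.path.abspath(os.path.join(src_dir, "configure"))
    results = {}
    for target in targets:
        build_dir = os.path.join(build_root, target)
        steps = [
            [configure, f"--target={target}"],
            ["make", "all-gas"],      # assumed make target
            ["make", "check-gas"],    # assumed make target
        ]
        # all() stops at the first failing step for this target.
        results[target] = all(runner(step, build_dir) for step in steps)
    return results
```

A call such as `check_targets(["arm-linux-gnueabi"], "binutils-src", "build")` (hypothetical paths) would then return a per-target pass/fail map that is easy to diff between a patched and an unpatched tree.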
== Next week ==
* Away Monday and Tuesday.
* More STT_GNU_IFUNC and vectors.
Richard
* Benchmarking of simple package builds with various string routine
versions; not finding enough difference above the noise to draw any firm
conclusions
* Looking at the string routine behaviour with perf to see where the time
is going
- getting hit by the Linaro kernels on silverbell missing Perf
enablement in the config
- Useful amount of time does seem to be spent outside the main 'fast
aligned' chunks of code
- pushing/popping registers does seem to be pretty expensive
* Started looking at libffi and hard float
- Started writing a spec
https://wiki.linaro.org/WorkingGroups/ToolChain/Specs/LibFFI-variadic
- It's going to need an API change to libffi, although the change
shouldn't break any existing code on existing platforms where they work.
* Helping with the image testing
Dave
Hi,
* got llvm+clang working on ARM:
https://wiki.linaro.org/KenWerner/Sandbox/HowToBuildToolchainComponents#llv…
* checked whether llvm inlines the __sync_* builtins on ARM or not:
https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations#LLVM
* developed a patch for #681138 (tested with current gcc-linaro)
* spent some time bootstrapping GCC trunk in order to test and post
that patch on the mailing list, but wasn't successful
(finally ran into the issues discussed at #659713)
* did some verification work on #674090
* preparing to work on the "investigate current developer tools" item
Regards
Ken
Hi there. Currently you can't use NEON instructions in inline
assembly if the compiler is set to a VFP-only -mfpu value, such as Ubuntu's
-mfpu=vfpv3-d16. Trying code like this:
    int main()
    {
        asm("veor d1, d2, d3");
        return 0;
    }
gives an error message like:
test.s: Assembler messages:
test.s:29: Error: selected processor does not support Thumb mode `veor d1,d2,d3'
The problem is that -mfpu=vfpv3-d16 has two jobs: it tells the
compiler what instructions to use, and also tells the assembler what
instructions are valid. We might want the compiler to use the VFP for
compatibility or power reasons, but still be able to use NEON
instructions in inline assembler without passing extra flags.
Inserting ".fpu neon" at the start of the inline assembly fixes the
problem. Is this valid? Are assembly files with multiple .fpu
statements allowed? Passing '-Wa,-mfpu=neon' to GCC doesn't work as
gas seems to ignore the second -mfpu.
What's the best way to handle this? Some options are:
* Add '.fpu neon' directives to the start of any inline assembly
* Separate out the features, so you can specify the capabilities with
one option and restrict the compiler to a subset with another.
Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
* Relax the assembler so that any instructions are accepted. We'd
lose some checking of GCC's output though.
-- Michael
- Continued looking into NEON special loads and stores.
- Benchmarks: concentrated on EEMBC Telecom:
- autcor gets vectorized
- viterbi, besides strided data accesses, needs to sink conditional
stores to allow if-conversion and make the main loop vectorizable.
Since the potential here is 4x, I think it's worthwhile to work on
this.
- conven, fbital also have control-flow issues, but much more
complicated than viterbi
- fft has a problem with loop count, I would like to investigate
this a bit more
- diffmeasure doesn't seem to have vectorization potential
- Fixed GCC PR 46663 on trunk, testing the fix for 4.3, 4.4, 4.5.
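As a rough picture of what "sinking conditional stores" means for if-conversion, here is a scalar Python model. The if/else shape below is an assumption made for illustration and is not taken from the viterbi source: in the first form each branch performs its own store, while in the second the branches only select a value and a single unconditional store follows, which is the form that maps onto a vector select plus one vector store.

```python
def branchy_store(cond, x, y, out):
    """Original shape: each branch performs its own store to out[i]."""
    for i in range(len(out)):
        if cond[i]:
            out[i] = x[i]
        else:
            out[i] = y[i]
    return out


def sunk_store(cond, x, y, out):
    """After sinking: the branches only pick a value; one store remains."""
    for i in range(len(out)):
        value = x[i] if cond[i] else y[i]  # becomes a select
        out[i] = value                     # single unconditional store
    return out


if __name__ == "__main__":
    cond = [True, False, True, False]
    x, y = [1, 2, 3, 4], [10, 20, 30, 40]
    print(branchy_store(cond, x, y, [0] * 4))
    print(sunk_store(cond, x, y, [0] * 4))
```

Both functions compute the same result; the point is only that the second loop body has no store under control flow.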
Hi,
Here's a work-in-progress patch which fixes many execution failures
seen in big-endian mode when -mvectorize-with-neon-quad is in effect
(which is soon to be the default, if we stick to the current plan).
But, it's pretty hairy, and I'm not at all convinced it's not working
"for the wrong reason" in a couple of places.
I'm mainly posting to gauge opinions on what we should do in big-endian
mode. This patch works with the assumption that quad-word vectors in
big-endian mode are in "vldm" order (i.e. with constituent double-words
in little-endian order: see previous discussions). But, that's pretty
confusing, leads to less than optimal code, and is bound to cause more
problems in the future. So I'm not sure how much effort to expend on
making it work right, given that we might be throwing that vector
ordering away in the future (at least in some cases: see below).
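One way to picture the "vldm" ordering is the toy Python model below. It is a sketch under assumptions: lanes are listed least significant first, a quad-word vector holds four equal-sized elements, and elements inside each double-word are taken to be in big-endian order while the two double-words stay in little-endian (memory) order. The function names are made up for this illustration.

```python
def split_doublewords(items):
    """Split a quad-word list into its low and high double-word halves."""
    half = len(items) // 2
    return items[:half], items[half:]


def vldm_order(mem_elems):
    """Lanes under the assumed 'vldm' layout: double-words stay in memory
    order, elements inside each double-word are reversed."""
    lo, hi = split_doublewords(mem_elems)
    return lo[::-1] + hi[::-1]


def full_big_endian_order(mem_elems):
    """Lanes if the whole quad-word were treated as one big-endian value."""
    return mem_elems[::-1]


def swap_doublewords(lanes):
    """Exchange the two double-word halves of a quad-word."""
    lo, hi = split_doublewords(lanes)
    return hi + lo


if __name__ == "__main__":
    mem = ["e0", "e1", "e2", "e3"]  # elements in increasing address order
    print(vldm_order(mem))
    print(full_big_endian_order(mem))
    print(swap_doublewords(vldm_order(mem)) == full_big_endian_order(mem))
```

In this model the two layouts differ exactly by a swap of the double-words, which is consistent with the double-word swaps described for the widening patterns in this mail.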
The "problem" patterns are as follows.
* Full-vector shifts: these don't work with big-endian vldm-order quad
vectors. For now, I've disabled them, although they could
potentially be implemented using vtbl (at some cost).
* Widening moves (unpacks) & widening multiplies: when widening from
D-reg to Q-reg size, we must swap double-words in the result (I've
done this with vext). This seems to work fine, but what "hi" and "lo"
refer to is rather muddled (in my head!). Also they should be
expanders instead of emitting multiple assembler insns.
* Narrowing moves: implemented by "open-coded" permute & vmovn (for 2x
D-reg -> D-reg), or 2x vmovn and vrev64.32 for Q-regs (as
suggested by Paul). These seem to work fine.
* Reduction operations: when reducing Q-reg values, GCC currently
tries to extract the result from the "wrong half" of the reduced
vector. The fix in the attached patch is rather dubious, but seems
to work (I'd like to understand why better).
We can sort those bits out, but the question is, do we want to go that
route? Vectors are used in three quite distinct ways by GCC:
1. By the vectorizer.
2. By the NEON intrinsics.
3. By the "generic vector" support.
For the first of these, I think we can get away with changing the
vectorizer to use explicit "array" loads and stores (i.e. vldN/vstN), so
that vector registers will hold elements in memory order -- so, all the
contortions in the attached patch will be unnecessary. ABI issues are
irrelevant, since vectors are "invisible" at the source code layer
generally, including at ABI boundaries.
For the second, intrinsics, we should do exactly what the user
requests: so, vectors are essentially treated as opaque objects. This
isn't a problem as such, but might mean that instruction patterns
written using "canonical" RTL for the vectorizer can't be shared with
intrinsics when the order of elements matters. (I'm not sure how many
patterns this would refer to at present; possibly none.)
The third case would continue to use "vldm" ordering, so if users
inadvertently write code such as:
res = vaddq_u32 (*foo, bar);
instead of writing an explicit vld* intrinsic (for the load of *foo),
the result might be different from what they expect. It'd be nice to
diagnose such code as erroneous, but that's another issue.
The important observation is that vectors from case 1 and from cases 2/3
never interact: it's quite safe for them to use different element
orderings, without extensive changes to GCC infrastructure (i.e.,
multiple internal representations). I don't think I quite realised this
previously.
So, anyway, back to the patch in question. The choices are, I think:
1. Apply as-is (after I've ironed out the wrinkles), and then remove
the "ugly" bits at a later point when vectorizer "array load/store"
support is implemented.
2. Apply a version which simply disables all the troublesome
patterns until the same support appears.
Apologies if I'm retreading old ground ;-).
(The CANNOT_CHANGE_MODE_CLASS fragment is necessary to generate good
code for the quad-word vec_pack_trunc_<mode> pattern. It would
eventually be applied as a separate patch.)
Thoughts?
Julian
ChangeLog
gcc/
* config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Allow changing mode
of vector registers.
* config/arm/neon.md (vec_shr_<mode>, vec_shl_<mode>): Disable in
big-endian mode.
(reduc_splus_<mode>, reduc_smin_<mode>, reduc_smax_<mode>)
(reduc_umin_<mode>, reduc_umax_<mode>)
(neon_vec_unpack<US>_lo_<mode>, neon_vec_unpack<US>_hi_<mode>)
(neon_vec_<US>mult_lo_<mode>, neon_vec_<US>mult_hi_<mode>)
(vec_pack_trunc_<mode>, neon_vec_pack_trunc_<mode>): Handle
big-endian mode for quad-word vectors.