Committed Kazu's VFP testcases patch upstream.
Merged the latest from upstream GCC 4.6.
Merged all the outstanding launchpad merge requests against both GCC 4.5
and 4.6.
Spun the 4.5-2011.03-0 and 4.6-2011.03-0 releases. Passed the tarballs
to Michael H for final testing.
Brought the patch tracker up to date w.r.t. the new merges.
Posted one of Dan's patches upstream for review.
Decided to drop Julian's A8 alignment patch completely. I had previously
discovered it provided no measurable benefit on A8, and now I've found
the same for A9 (Pandaboard). There's no real improvement for any
combination of -falign-* options in EEMBC.
Bernd's "Discourage NEON on A8" patch also doesn't show any value in the
benchmark results, but I think I've forward ported it wrong, because it
should at least change the binary size, and it doesn't. I need to look
into this further.
I also decided I don't know enough about ARMv7, so I spent some time
reading a few chapters from the ARM A.R.M.
----
Upstream patches requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* RVCT Interoperability patch
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg00059.html
Last week:
* Launchpad #711819 / PR47719: ARM minipool ICE. Followed up on
discussion with Bernd and Ramana. Later posted discussion results on
gcc-patches, where Richard Earnshaw took it over with a final fix.
* Coremark ARMv7/v6 regressions: mostly pinpointed the exact cases where
RTL simplification fails to optimize away ZERO_EXTEND expressions. Still
working on how to enhance it.
* TW Public Holiday on Feb.28 (Mon), was off for one day.
This week:
* Try to turn Coremark regression investigation into code form.
* Other GCC issues.
I've been spending this week playing around with various representations
of the v{ld,st}{1,2,3,4}{,_lane} operations. I agree with Ira that the
best representation would be to use built-in functions.
One concern in the original discussion was that the optimisers might
move the original MEM_REFs away from the call. I don't think that's
a problem though. For loads, we can simply treat the whole of the
accessed memory as an array, and pass the array by value. If we do that,
then the call would just look like:
__builtin_load_lanes (MEM_REF[(elem[N] *)ADDR])
(where, despite the C notation, the MEM_REF accesses the whole of elem[N]).
It is of course possible in principle for the tree optimisers to replace
this MEM_REF with another, equivalent, one, but that's OK semantically.
It isn't possible for the optimisers to replace it with something like
an SSA name, because arrays can't be stored in gimple registers.
__builtin_load_lanes would then be used like this:
combined_vectors = __builtin_load_lanes (...);
vector1 = ...extract first vector from combined_vectors...
vector2 = ...extract second vector from combined_vectors...
....
So combined_vectors only exists for load and extract operations.
The question then is: what type should it have? (At this point I'm
just talking about types, not modes.) The main possibilities seemed to be:
1. an integer type
Pros
* Gimple registers can store integers.
Cons
* As Julian points out, GCC doesn't really support integer types
that are wider than 2 HOST_WIDE_INTs. It would be good to
remove that restriction, but it might be a lot of work, and it
isn't something we'd want to take on as part of this project.
* We're not really using the type as an integer.
* The combination of the integer type and the __builtin_load_lanes
array argument wouldn't be enough to determine the correct
load operation. __builtin_load_lanes would need something
like a vector count (N => vldN) argument as well.
2. a combined vector type
Pros
* Gimple registers can store vectors.
Cons
* For vld3, this would mean creating vector types with a non-power-of-two
number of elements. GCC doesn't support those yet, and you get
ICEs as soon as you try to use them. (Remember that this is
all about types, not modes.)
It _might_ be interesting to implement this support, but as
above, it would be a lot of work. It also raises some semantic
questions, such as: what is the alignment of the new vectors?
Which leads to...
* The alignment of the type would be strange. E.g. suppose
we're loading N*2 uint32_ts into N vectors of 2 elements each.
The types and alignments would be:
N=2 uint32x4_t, alignment 16
N=3 uint32x6_t, alignment 8 (if we follow the convention for modes)
N=4 uint32x8_t, alignment 32
We don't need alignments greater than 8 in our intended use;
16 and 32 are overkill.
* We're not really using the type as a single vector,
but as a collection of vectors.
* The combination of the vector type and the __builtin_load_lanes
array argument wouldn't be enough to determine the correct
load operation. __builtin_load_lanes would need something
like a vector count (N => vldN) argument as well.
3. an array of vectors type
Pros
* No support for new GCC features (large integers or non-power-of-two
vectors) is needed.
* The alignment of the type would be taken from the alignment of the
individual vectors, which is correct.
* It accurately reflects how the loaded value is going to be used.
* The type uniquely identifies the correct load operation,
without need for additional arguments. (This is minor.)
Cons
* Gimple registers can't store array values.
So I think the only disadvantage of using an array of vectors is that the
result can never be a gimple register. But that isn't much of a disadvantage
really; the things we care about are the individual vectors, which can
of course be treated as gimple registers. I think our tracking of memory
values is good enough for combined_vectors to be treated as such
(even though, with the back-end changes we talked about earlier,
they will actually be stored in RTL registers).
So how about the following functions? (Forgive the pascally syntax.)
  __builtin_load_lanes (REF : array N*M of X)
    returns array N of vector M of X
    maps to vldN
    in practice, the result would be used in assignments of the form:
      vectorX = ARRAY_REF <result, X>

  __builtin_store_lanes (VECTORS : array N of vector M of X)
    returns array N*M of X
    maps to vstN
    in practice, the argument would be populated by assignments of the form:
      ARRAY_REF <VECTORS, X> = vectorX

  __builtin_load_lane (REF : array N of X,
                       VECTORS : array N of vector M of X,
                       LANE : integer)
    returns array N of vector M of X
    maps to vldN_lane

  __builtin_store_lane (VECTORS : array N of vector M of X,
                        LANE : integer)
    returns array N of X
    maps to vstN_lane
Note that each operation can be expanded independently. The expansion
doesn't rely on preceding or following statements.
I've hacked up the prototype below as a proof of concept. It includes
changes to the C parser to allow these functions to be created in the
original source code. This is throw-away code though; it would never
be submitted.
I've also included a simple test case and the output I get from it.
The output looks pretty good; there's not even the stray VMOV that
I saw with the intrinsics earlier in the week.
(Note that if you'd like to try this yourself, you'll need the patch
I posted on Monday as well.)
What do you think? Obviously this discussion needs to move to gcc@ at
some point, but I wanted to make sure this was vaguely sane first.
Richard
Hello,
I am looking for a way to disable the '-gtoggle' flag in stage 2 of the
bootstrap when configuring for ARM with (*).
The flag seems to be applied in stage 2 but not in stage 3, which seems to
cause a bootstrap failure when testing SMS: in stage 2 SMS fails because
the debug_insns caused by -gtoggle disturb the do-loop, while in stage 3
SMS succeeds. This results in different .o files and a bootstrap failure.
(*) This is the configure I used:
../gcc/configure --prefix=/home/eres/mainline/build --enable-checking
--enable-languages=c --enable-bootstrap
Thanks,
Revital
== GDB ==
* Committed fix for the GDB part of #620611 (Unable to
backtrace out of vector page 0xffff0000) to mainline and
Linaro GDB 7.2.
* Ran into GDB crashes due to memory corruption in tests
involving multiple inferiors. Tracked down root cause
(using valgrind) to long-standing double free bug in GDB
terminal state handling code. Committed fix to mainline
and Linaro GDB 7.2.
* While using valgrind (see above), ran into problems:
* ptrace system call is unsupported on ARM
* certain variants of the "SUB from SP" Thumb-2 instruction
are not handled by the VEX compiler
Fixed both problems locally, and was then able to successfully
valgrind GDB on ARM.
* Created Linaro GDB 7.2-2011.03-0 release.
* Worked on glibc patch to add ARM unwind tables to system
call stubs; this will help unwinding in the absence of
debug info for libc, and in particular fix #684218 (Failures
in gdb.base/call-signal-resume.exp)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hi,
== PandaBoard ==
* upgraded my ARM dev environment from Ubuntu to Linaro snapshot (20110303)
* found another kernel bug on the panda (#728565)
== libunwind ==
* resolved build issues on ARM (when using the linaro snapshot)
* allows the testsuite to work with linkers that do not pull in indirect
shared libs
* fix build of the test-static-link test case on ARM
* link libunwind-setjmp.so against libunwind-elf
* posted some first patches on the libunwind ml
* learned about the Exception Handling ABI for the ARM Architecture
Regards
Ken
Starting back in Linaro land after a gap of 3-4 weeks where I've been
away on ARM internal tasks.
== GCC ==
- Setting up a new machine that I received for Linaro work.
- Spent some time reviewing upstream patches. Spent some time on the
upstream P1 PR47719 to get it fixed.
- Starting to read up on the benchmarking report and recreating the
environments.
- Looked through some of the speed tickets and spent some time on
them.
- Put in a hardware request for a Panda board.
== Next Week ==
- Set up environment properly for some amount of benchmarking.
- Look at some of the performance regressions and work on some things
that need to be done.
- Continue looking at PR47719.
* Investigated and fixed sqlite3 testsuite failure on ARM (bug 725052)
* Discussing libffi API changes with maintainer; hopefully he's
going to send out his comments today.
* Looking at how to upstream the string routine changes
* Need to look at big endian testing
* Testing QEmu pre-release for Peter; looking very nice.
Dave
RAG:
Red:
Amber:
Green:
Current Milestones:
                             | Planned    | Estimate   | Actual     |
qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 |            |
Historical Milestones:
finish virtio-system         | 2010-08-27 | postponed  |            |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
== maintain-beagle-models ==
* preparation and test for next week's qemu-linaro 2011-03 release
* put in a temporary fix for bug 723630 (apt/glibc now try prlimit64
syscall, so silence qemu warnings about not implementing it)
* investigated qemu warnings about bad 16 bit writes: this is a
kernel bug: https://bugs.launchpad.net/linux-linaro/+bug/727781
== vexpress model ==
* sent vexpress patches upstream, put into qemu-linaro
== merge-correctness-fixes ==
* more work on performance counter registers: proper cycle counter
implementation; now just needs a bit of tidying before upstreaming
* ran valgrind's test cases on qemu; added revealed issues to
https://blueprints.launchpad.net/qemu-linaro/+spec/merge-correctness-fixes
* sent patch fixing broken VMOV s0,s1,r0,r1 implementation
* sent patch fixing inverted carry bit on ORNS
== other ==
* meetings: toolchain, standup, architecture q&a, pdsw-tools, team brief
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
== This week ==
* Submitted the fix for the Qt miscompilation upstream. Applied after
approval.
* Submitted a patch for the Thumb LDR problem that Dave Martin hit.
This was rejected.
* Ended up spending a few days on the "unreasonable amount of memory
while compiling qemu" bug due to unfamiliarity with the DWARF 2 code.
I realise the original idea was that I'd just file this upstream,
but it was one of those cases where I kept finding out more info
for the bug report until the problem became obvious.
I've now submitted two patches for this upstream. The first was trivial
and is now in. I was asked to add a bit of extra code to the second,
which I hope to do next week.
* Looked at the MIPS bug that was reported against the Linaro toolchain.
This turned out to be a problem in our extension elimination pass.
Submitted a merge request for that.
* Got confirmation from ARM that we should use relocation number 160
for R_ARM_IRELATIVE, and that it was OK to make the changes public
(thanks!). I've now submitted the binutils patches upstream.
I'll do the eglibc ones when I get back.
== Next week ==
Holiday!
Richard