Hi,
== Investigate developer tools ==
* Finished latrace investigation.
== PandaBoard ==
* The defective PandaBoard that was sent back in December is now repaired and
on my desk again. It doesn't show the behaviour of #708883 and works
flawlessly so far. :)
== libunwind ==
* Did some debugging of the test-async-sig testcase to get started with
libunwind. It will dead-lock if you add "--enable-debug" since libunwind does
printfs in this case which are not signal safe.
* Sorted out which of Zachs patches are upstream and which are not.
* Started to learn about the different unwind methods that libunwind provides
on ARM.
Regards
Ken
== ffi ==
* Sent variadic patch for libffi to libffi-discuss
* Worked through some suggestions from Chung-Lin, need to do some rework
== string routines ==
* memchr & strchr patch sent for inclusion in ubuntu packages
* tried sqlite's benchmarks - they don't spend too much time in the
C library; although
a few % in memcpy, and ~1% in memset (also seem to have found an
sqlite test case failure on
ARM and filed as bug 725052)
== porting jam ==
* There wasn't much traffic on #linaro during this related to the jam
* I closed bug 635850 (fastdep FTBFS) which was already fixed with
an explicit fix for ARM in the changelog
and bug 492336 (eglibc's tst-eintr1 failing) which seems to work now
but it's not clear when it was fixed.
* Looking at eglibc's test log there seem to be a bunch of others
that are failing and may well be worth investigating.
* bug 372121 (qemu/xargs stack/number of arguments limit) seems to
work ok, however the reporter did say it was quite a fragile test;
that needs more investigation to see
whether the original reason has actually been fixed.
== misc ==
* swapping notes with Peter on the PBX SD card investigation
Dave
RAG:
Red:
Amber:
Green:
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
== maintain-beagle-models ==
* rebased qemu-linaro on upstream
* checked omap_uart model for any issues with enabling the extended
(non-16550A) features which the new Linux drivers need. Sent meego
merge request for patchset which turns on the features, and does
a little cleanup. Now in meego, qemu-linaro.
== merge-correctness-fixes ==
* reviewed versions 5 and 6 of Christophe's vrecpe/vsqrte patchset;
v6 was good and has now been committed
* sent a version of "dummy cp14 debug registers" patch upstream;
however I've realised it triggers a false positive in the
temp-leak debugging code in target-arm/translate.c
* wrote/sent a patch which moves this temp-leak debugging code
into TCG proper (which I think makes it much simpler and cleaner
and avoids the false positives mentioned above)
* some work on the cp15 performance counter registers. I now
have some code which I think is a fully architecturally valid
implementation of an "implements no events" core, except that
we don't implement the cycle count register.
* started testing/review of Adam's VA-to-PA translation regs patch.
In the course of this discovered that qemu unconditionally
implements an ARM940 cp15 WFI register which clashes with these;
submitted patch to add correct not-for-v6/v7 feature gating.
* sent out patch fixing usermode seeks by 32 bit guest on 64 bit
host (based on a diagnosis and suggested fix by Eoghan Sherry)
* sent patch fixing compile error in vnc code
== vexpress model ==
* sent a patchset for fixing the MMC card detect wiring on
PBX upstream; this is needed for vexpress too
* finished vexpress cleanup and cross-checking against the docs; I
now have a patchset I'm happy to upstream and will post next week
== other ==
* took part in pgp keysigning event with emdebian folks
* meetings: toolchain, PDSW-tools
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
== GDB ==
* Worked with Will Deacon and the Linaro kernel team to
make sure HW watchpoint and Versatile Express errata
fixes are included in the upcoming Linaro kernel release.
* Committed GDB HW watchpoint patches to mainline, and
backport to Linaro GDB. This completes work on the
HW watchpoint blueprint.
* Worked on fixing the GDB part of #620611 (Unable to
backtrace out of vector page 0xffff0000). Posted
(two versions of) mainline patch for discussion.
* Worked on kernel patch for #615974 (Interrupted system
call handling).
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== This week ==
* Looked at the poor code generated for Neon load/store intrinsics.
Looked into the history behind the treatment of VFP registers by
CANNOT_CHANGE_MODE_CLASS. Peter confirmed that the restrictions
apply only to VFPv1. Wrote a patch to improve the code, which
partly overlapped with Julian's.
* Looked at how the operations should be represented at the tree level.
Experimented with various combinations of tree codes and types
to see which felt right. Wrote this up in the message I sent today.
== Next week ==
* More vectorisation.
* Submit some queued patches.
* Maybe some bug fixing. (I see there's a reload bug just waiting
to be claimed by a lucky developer.)
On holiday the following week.
Richard
Services at ex.seabright.co.nz are back up.
On Tue, Feb 22, 2011 at 10:06 PM, Michael Hope <michael.hope(a)linaro.org> wrote:
> Hi there. We've had an earthquake. Family and friends are fine but i'll be
> unavailable for a few days. Services on ex.seabright.co.nz are down. I'll
> cancel Wednesdays standup call.
>
> See you soon,
>
> -- Michael
Hello,
Implemented a patch for SMS to support targets that their doloop part is
not decoupled from the rest of the loop's instructions (which is the
current assumption of SMS). ARM is an example of such target, where the
loop's instructions might use CC reg which is used in the doloop part.
Now testing the patch on ARM and other targets that have do-loop.
Thanks,
Revital
Hi,
* vectorizer cost model
- implemented builtin_vectorization_cost for NEON
- added register spilling considerations to the cost model
- started testing/tuning on EEMBC Telecom and DenBench (for now I
have only two examples for spilling: fdct_int32 mp4encode that
shouldn't get vectorized and viterbi that should)
* measured vectorization impact on Telecom autcor - it's about 5x
(initially I got run time segfault, but the bug is already fixed on
GCC trunk, I'll have to check gcc-linaro-4.5 as well)
* NEON-vs.non-NEON degradation
- started to look at aes. There are 6 loops that get vectorized with
4.6 (due to this patch
http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01927.html that allows
cond_expr in number of loop iterations expressions) and vzip/vuzp
patch, but not with gcc-linaro-4.5. But it doesn't explain the
degradation of course.
- I don't understand mp4decodepsnr improvement, since I don't see
any loops or basic blocks vectorized.
Ira
One of the vectorisation discussions from last year was about the poor
code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*(). It forces the
result of the loads onto the stack, then loads the individual pieces from
there. It does the same thing in reverse for stores.
I think there are two major problems here:
1. The result of the vld*() is a record type such as:
typedef struct int16x4x3_t
{
int16x4_t val[3];
} int16x4x3_t;
Ideally, we'd like one of these structures to be stored in a pseudo
register. However, the ARM port currently limits in-register
record types to 64 bits, so something this big is always given
BLKmode and stored on the stack.
A simple "fix" for this is to increase MAX_FIXED_MODE_SIZE.
That would do the right thing for the structures in arm_neon.h,
but wouldn't be safe in general.
2. The vld*() returns values as a single integer (such as EI mode),
while uses of the value will typically be in a vector mode such
as V4SI. CANNOT_CHANGE_MODE_CLASS doesn't allow direct
"mode-punning" between the two in VFP_REGS, so this again
forces the punning to be done on the stack.
The code in question is:
/* FPA registers can't do subreg as all values are reformatted to internal
precision. VFP registers may only be accessed in the mode they
were set. */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
(GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
? reg_classes_intersect_p (FPA_REGS, (CLASS)) \
|| reg_classes_intersect_p (VFP_REGS, (CLASS)) \
However, the VFP restriction appears to be specific to VFPv1 --
thanks to Peter for the archaeology -- and isn't a problem for v6+.
In that case, removing this restriction is an important optimisation.
I tried the patch below on the following simple testcase:
#include "arm_neon.h"
void
foo (uint16_t *a)
{
uint16x4x3_t x, y;
x = vld3_u16 (a);
y = vld3_u16 (a + 12);
x.val[0] = vadd_u16 (x.val[0], y.val[0]);
x.val[1] = vadd_u16 (x.val[1], y.val[1]);
x.val[2] = vadd_u16 (x.val[2], y.val[2]);
vst3_u16 (a, x);
}
(not necessarily sensible!). Before the patch, -O2 produced:
sub sp, sp, #48
add r3, r0, #24
vld3.16 {d16-d18}, [r3]
vld3.16 {d20-d22}, [r0]
add r3, sp, #24
vstmia sp, {d20-d22}
vstmia r3, {d16-d18}
fldd d19, [sp, #8]
fldd d16, [sp, #0]
fldd d17, [sp, #24]
fldd d20, [sp, #32]
vadd.i16 d18, d16, d17
vadd.i16 d17, d19, d20
fldd d19, [sp, #16]
fldd d20, [sp, #40]
vadd.i16 d16, d19, d20
fstd d18, [sp, #0]
fstd d17, [sp, #8]
fstd d16, [sp, #16]
vldmia sp, {d16-d18}
vst3.16 {d16-d18}, [r0]
add sp, sp, #48
bx lr
After the patch we get:
vld3.16 {d24-d26}, [r0]
add r3, r0, #24
vld3.16 {d20-d22}, [r3]
vmov q8, q12 @ ti
vadd.i16 d17, d17, d21
vadd.i16 d16, d24, d20
vadd.i16 d18, d26, d22
vst3.16 {d16-d18}, [r0]
bx lr
The VMOV is a bit disappointing, and needs further investigation.
The first hunk fixes (2), and I think is correct. The second hunk
hacks (1), and isn't suitable in itself. I'll next try to make
arm_neon.h use built-in record types that are explicitly EImode,
which should remove the need to change MAX_FIXED_MODE_SIZE.
Richard
Index: gcc/gcc/config/arm/arm.h
===================================================================
--- gcc.orig/gcc/config/arm/arm.h
+++ gcc/gcc/config/arm/arm.h
@@ -1171,10 +1171,12 @@ enum reg_class
/* FPA registers can't do subreg as all values are reformatted to internal
precision. VFP registers may only be accessed in the mode they
were set. */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
- (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
- ? reg_classes_intersect_p (FPA_REGS, (CLASS)) \
- || reg_classes_intersect_p (VFP_REGS, (CLASS)) \
2+#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+ (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
+ ? (reg_classes_intersect_p (FPA_REGS, (CLASS)) \
+ || (TARGET_VFP \
+ && reg_classes_intersect_p (VFP_REGS, (CLASS)) \
+ && arm_fpu_desc->rev == 1)) \
: 0)
/* The class value for index registers, and the one for base regs. */
@@ -2458,4 +2460,6 @@ enum arm_builtins
instruction. */
#define MAX_LDM_STM_OPS 4
+#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (XImode)
+
#endif /* ! GCC_ARM_H */