Hi,
* I've been debugging various errors and warnings that I encountered
with the binary CSL 2011.03 toolchain
* Fleshed out my recipe for the external toolchain; now get a working
core-image-minimal that boots fine within qemu
* Debugged why cmake based recipes (like libproxy) are having trouble
when compiling with an external toolchain
* Currently the libc is provided by the sysroot of the external
toolchain. This might not be ideal and as time permits I'd like to find
a way to get eglibc build instead.
Regards
Ken
Task Planned Estimated Actual
Historical
~~~~~~~
Connect 2011.q4
preparation 28/10/2011 28/10/2011
28/10/2011
Linaro Tasks
~~~~~~~~~~~~
Fully Investigate the O3
performance
regressions 31/01/2012
Neon backend experiments 09/12/2011 14/12/2011
with alignment hints
and addressing mode work.
Investigate partial-partial
PRE and regression with
bitmnp01 18/12/2011
Writeup on the optimizations 31/12/2011
enabled with PGO
RAG :
RED : None
AMBER:
==Progress===
* The Android guys found a bug with the vcvt.f64.s32 instruction
coming out after my patch and I found a few assembler issues as well
during this process which are now fixed upstream.
* Backported the A15 patches into Linaro 4.6
* Assisted as needed with the release which really wasn't too much
work for me other than the revert .
* Backported one part of the partial-partial PRE patch . Still looking into it.
* Did some analysis of the failure with di-layout.c test failure and
RichardS has now fixed it in the middle-end.
* Wrote a patch to replace all vector mode aligned vldm / vstm with
equivalent vld1.64 and vst1.64 to allow more alignment hints to come
out of the compiler. Still not fully happy with it but it's looking
much better than the original hack.
=== Plans ===
* Continue looking at partial-partial PRE and try and understand it further.
* Flush out these neon patches that I'm accruing with the addressing
modes and see where we get to with alignment hints and vld1.64's .
* Look at movw's / movt's vs constant pools.
* Submit my PGO patch .
Absences.
* Dec 19 - 31st Dec - Tentatively booked
* Feb 6-10 : Linaro Connect Q1.12.
* Feb 11- 15 : Holiday.
== QEMU ==
* Wrote a fix for bug 883133 (code buffer/libc conflict); spent some
time testing it because
I wasn't sure whether the crash I was seeing after that was my fix not
being complete or actually
bug 893208.
* Got it to boot with -cpu 486; without that it's triple faulting in
a divide just after a load of time stamp
reads which makes me suspicious that 893208 is a timer problem.
* (It also fails when used with vnc graphics, but works in SDL and
curses, but I'll leave that bug for
another time).
== String routines ==
* With one more tweak to my memchr, it finally made it into eglibc.
Dave
Hi,
I received this question from an ARM FAE:
Does the 4.5.2 version support A15 optimization? Or would
you recommend using the latest 4.6 versions?
Thanks for any response I could forward back to him.
Best regards,
Matt
== This week ==
* Got the -fsched-pressure code into a state where it's almost
presentable. Found a few more things to tweak on the way.
Fixed some FIXMEs, notably to honour MAX_SCHED_READY_INSNS.
* More testing on ARM. Tried to get some SPEC2000 results
as well as the usual EEMBC & DENbench, but I'm not sure
how noisy the SPEC ones are.
* More testing on powerpc. Decided that this really isn't a good target
to test on for 4.7 because of the poor choice of pressure classes.
SPEC CPU2006 INT results are reasonable-to-good, but the FP ones
suffer from the fact that we think there are twice as many registers
available for normal FP than there actually are. I'd like to fix this,
but all pressure-estimation bits of GCC suffer from the same problem,
and it's hard to justify as part of Linaro, because it doesn't
apply to ARM.
* Fixed upstream PR 50873 (ICE for NEON misaligned moves). Thanks to
Ramana for the heads-up and analysis.
* Retested and posted the patch for PR 48941 upstream (poor code generated
by the vzip*() and vunzp*() arm_neon.h functions).
Richard
== GDB ==
* Created and published Linaro GDB 7.3-2011.12 release.
* Updated Linaro GDB 7.3 to GDB 7.3.1 code base.
* Implemented support for single-stepping atomic operation
code sequences for ARM (and Thumb) (LP #892008). Checked
in to mainline and Linaro GDB.
* Ongoing work on remote support for "info proc" and core file
generation. Currently yet another solution for the remote
interface has been brought up in mailing list discussions
(support accessing arbitrary files on the remote side, not
just /proc). I'm working on prototyping this suggestion.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi Michael,
I have finally managed to complete the release process. It wasn't quite
as smooth as I would have liked, but we seem the have got there!
Notes:
- Ramana's VCVT patch caused an Android problem. This was reverted right
before the release.
- The initial release spin and test went without a hitch.
- There was an additional test failure in the GCC testsuite, but this
turns out to be because the snapshot date "20121201" happens to contain
the string "120". Interestingly, this will also be true for most of 2012.
- The ubutest runs seem to have a some problems: all of the glibc and
python builds have failed with a message about libgcc. Since this has
hit both 4.5 and 4.6 simultaneously I'm assuming it's environmental and
not caused by a new toolchain bug. The rest of the compilation appears fine.
- The benchmarking seems fine on A9, but I couldn't find results for the
others, although the scheduler lists the jobs.
- The upload to Launchpad was somewhat problematic. Uploading 4.5 took
two attempts. Uploading 4.6 failed about 6 times (at 20 minutes or so
each) before I tried from another machine with a faster uplink - that
went first time.
Andrew
The Linaro Toolchain Working Group is pleased to announce the 2011.12
release of both Linaro GCC 4.6 and Linaro GCC 4.5.
Linaro GCC 4.6 2011.12 is the tenth release in the 4.6 series. Based
off the latest GCC 4.6.2+svn181866, it contains a range of vectoriser
performance improvements and general bug fixes.
Interesting changes include:
* Updates to 4.6.2+svn181866
* Generic tuing support for Big-endian platforms.
* SLP support for operations with arbirary numbers of operands.
* SLP support for conditions.
* Pattern recognition support in basic-block SLP.
* Enhancements to mixed-size condition pattern recognition.
* Support for 64bit __sync* primitives on ARM.
* Unaligned block-move support for ARMv7.
* Added Cortex-A15 integer pipeline tuning.
Linaro GCC 4.5 2011.12 is the sixteenth release in the 4.5
series. Based off the latest GCC 4.5.3+svn181877, this is a
maintenance focused release.
Interesting changes in 4.5 include:
* Updates to 4.5.3+svn181877
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.6-2011.12https://launchpad.net/gcc-linaro/+milestone/4.5-2011.12
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
More information on the features and issues are available from the
release page:
https://launchpad.net/gcc-linaro/4.6/4.6-2011.12https://launchpad.net/gcc-linaro/4.5/4.5-2011.12
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? inquire at support(a)linaro.org
Hi,
- fixed PR 51285
- continued looking at the alignment issue, ran Michael's script with
different options, tested Ramana's preliminary patch for vld1/vst1,
and my "don't peel for low loop bounds" patch
Ira
The Linaro Toolchain Working Group is pleased to announce the
release of Linaro QEMU 2011.12.
Linaro QEMU 2011.12 is the latest monthly release of
qemu-linaro. Based off upstream (trunk) QEMU, it includes a
number of ARM-focused bug fixes and enhancements.
New in this month's release:
- There are no Linaro-specific changes of note in this release
- This release is based on the upstream QEMU 1.0 release.
(Note that future qemu-linaro releases will continue to track
upstream trunk; the release dates for upstream and our
release just happened to be conveniently aligned in this case.)
Known issues:
- Graphics do not work for OMAP3 based models (beagle, overo)
with 11.10 Linaro images.
- This release of qemu-linaro is known not to work on ARM hosts.
(See bugs #883133, #883136)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.12
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
The Linaro Toolchain Working Group is pleased to announce the release of
Linaro GDB 7.3.
Linaro GDB 7.3 2011.12 is the fourth release in the 7.3 series. Based off
the latest GDB 7.3.1, it includes a number of ARM-focused bug fixes and
enhancements.
This release contains:
* Update to GDB 7.3.1 code base
* Support single-stepping atomic operations (LDREX/STREX sequences)
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.3-2011.12
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
I had a play with the vecotiser to see how peeling, unrolling, and
alignment affected the performance of simple memory bound loops.
The short story is:
* For fixed length loops, don't peel
* Performance is the same for 8 byte aligned arrays and up
* Performance is very similar for unaliged arrays
* vld1 is as fast as vldmia
* vld1 with specified alignment is much faster than vld1
The loop is the rather ugly and artifical::
void op(struct ains * __restrict out, const struct aints * __restrict in)
{
for (int i = 0; i < COUNT; i++)
{
out->v[i] = (in->v[i] * 173) | in->v[i];
}
}
where `struct aints` is a aligned structure. I couldn't figure out how
to use an aligned typedef of ints without still introducing a runtime
check. I assume I was running into some type of runtime alias
checking.
This compiled into::
vmov.i32 q10, #173
add r3, r0, #5
0:
vldmia r1!, {d16-d17}
vmul.i32 q9, q8, q10
vorr q8, q9, q8
vstmia r0!, {d16-d17}
cmp r0, r3
bne 0b
I then lied to the compiler by changing the actual alignment at
runtime. See:
http://people.linaro.org/~michaelh/incoming/runtime-offset.png
The performance didn't change for actual alignments of 8,
16, or 32 bytes.
I then converted the loop into one using vld1 and fed it smaller
alignments. See:
http://people.linaro.org/~michaelh/incoming/small-offsets.png
The throughput falls into two camps: one of alignments
1, 2, or 4 and one of 8, 16, 32. The throughput is very similar for
both camps but has some stange dropoffs at 24 words, around 48 words,
and around 96 words. The terminal throughput at 300 words and above
is within 0.5 %
I then converted the vld1 and vst1 to specifiy an alignment of 64
bits. See:
http://people.linaro.org/~michaelh/incoming/set-alignment.png
This improved the throughput in all cases and in cases for more than 50
words by 14 %. This graph also shows the overhead of the runtime
peeling check. The blue line is the vectoriser version which is
slower to pick up due the greater per call overhead.
I then went back to the vectoriser and changed the alignment of the
struct to cause peeling to turn on and off. See:
http://people.linaro.org/~michaelh/incoming/unroll.png
At 200 words, the version without peeling is 2.9 % faster. This is
partly due to a fixed count loop turning into a runtime count due to
unknown alignment.
This run also showed the affect of loop unrolling. The loop seems to
be unrolled for loops of <= 64 words and drops off in performance past
around 8 words. When the unrolling finally drops out, performance
increases by 101 %.
Raw results and the test cases are available in
lp:~linaro-toolchain-dev/linaro-toolchain-benchmarks/private-runs
A graph of all results is at:
http://people.linaro.org/~michaelh/incoming/everything.png
The usual caveats apply: this test was all in L1, only on the A9, and
very artificial.
-- Michael
> On Mon, Dec 5, 2011 at 1:40 AM, Tom Gall <tom.gall(a)linaro.org> wrote:
> > I probably know the answer to this already but ...
> >
> > For shared libs one can define and use something like:
> >
> > void __attribute__ ((constructor)) my_init(void);
> > void __attribute__ ((destructor)) my_fini(void);
> >
> > Which of course allows your lib to run code just after the library is
> > loaded and just before the library is going to be unloaded. This helps
> > keep out cruft such as the following out of your design:
> >
> > PleaseCallThisLibraryFunctionFirstOrThereWillBeAnErrorWhichYouWillHitCausingYouToPostToTheMailingListAskingTheSameQuestionThatHasBeenAsked1000sOfTimes();
> >
> > Yeah .. you know the function. I don't like it either.
> >
> > Unfortunately this doesn't work when people link in the .a from your
> > lib. Libs like libjpeg-turbo in theory should never ever need to be
> > linked in that fashion but consider the browsers who link to the
> > universe instead of using system shared libs.
On Mon, Dec 05, 2011 at 04:19:11PM +0800, Kito Cheng wrote:
> Here is some triky way for this problem, you can put the constructor
> and destructor to the source file which contain necessary function
> call in your libraries to enforce the linker to archive your
> constructor and destructor.
>
> However if this solution is not work for your situation, you can apply
> the patch in attach for build script to enable the
> LOCAL_WHOLE_STATIC_LIBRARIES for executable,
>
> After patch you can just add a line in your Android.mk :
>
> LOCAL_WHOLE_STATIC_LIBRARIES += libfoo
>
> The most disadvantage of this way is you should always link libfoo by
> LOCAL_WHOLE_STATIC_LIBRARIES...and this patch don't send to linaro and
> aosp yet.
[...]
Part of the problem here is that .a libraries lack the dependency and
linkage metadata that shared libraries have.
-2)
Put up with the need to call an explicit initialisation function
for the library. A lot of commonly-used libraries require an
initialisation call, and I'm not sure it causes that much of a
problem in practice...
-1)
Put a C++ wrapper around just enough of your library such that your
constructor/destructor code is recognised as a needed static
constructor/descructor by the toolchain.
I can't think of a very nice way of doing this, so I won't elaborate
on it...
It's also not really a solution, since you still need to pull in a
dummy static object from somewhere in order to cause the construcor
and descructor to get called.
0)
libtool or similar may help solve this problem, but I don't know much
about this -- also, for solving the problem, that approach only works
if uses of your library link via libtool.
1)
One hacky approach is to rename your library to libmylib-real.a, and
then make replace libmylib.a with a linker script which pulls in the
needed constructor as well as the real library:
libmylib.a:
EXTERN(__mylib_constructor)
INPUT(/path/to/libmylib-real.a)
This works, providing that __mylib_constructor is external (normally,
you would be able have the constructor function static, but it needs
to be externally visible in order to be pulled in in this way.
2)
Another way of doing a similar thing is to mark __mylib_constructor
as undefined in all the objects that make up the library.
Unfortunately, there seems to be no obvious way of doing that: the
assembler generates undefined symbol references automatically for
unresolved references at assembly time. There's no way for force
the existence of an undefined symbol without an actual reference to
it. objcopy/elfedit don't seem to support adding such a symbol
either. It would be simple to write a tool to add the undefined
symbol reference (such tools may exist already), but binutils doesn't
seem to provide this for you. The plausible-looking -u option to
gcc doesn't do anything unless doing a link.
One other way of doing it without a special tool is to insert a bogus
relocation into the text section of each object with an assembler
.reloc directive specifying relocation type R_<arch>_NONE.
There isn't really a portable way to do that, though. The name of
the relocation changes per-arch, and some arches have other quirks
(on ARM for example, .reloc cannot refer to the current location,
but seems instead to need to refer to a defined symbol which is non-zero
distance away from the location counter).
One advantage to this approach is that your .a file looks just
like any other .a file. Also, you can include that dependency
in only those objects which really require the library to be
initialised (normally, this is not a huge benefit though, since
probably most of your objects _do_ require the library to be
initialised).
A disadvantage (other than portability problems) is that, like (1),
the constructor symbol must be external (not static)... so it
pollutes the symbol table and isn't protected against people calling
it directly.
You can create a dummy symbol instead of referring to the constructor
symbol directly though -- this solves the second problem.
3)
Finally, you can split your contructor/destructor code out into a
separate .o file (say mylib-ctors.o), and use the linker script
trick for (1) to forcibly include this object when linking:
libmylib.a:
INPUT(/path/to/mylib-ctors.o /path/to/mylib-real.a)
This avoids some of the disadvantages of the other approaches,
but you still end up with a strange-looking library which is really
a linker script.
This is closer to how the C library traditionally solves the problem
(i.e., the crt*.o stuff). libc.so also tends to be a linker script,
which deals with the fact that some parts of libc must be statically
linked from a separate library when linking to -lc.
Obviously, approaches (1)..(3) all suffer from arch or toolchain
portability problems (or both). (The GNU/GCC __constructor__ thing
is obviously a portability problem in itself, it you're minded to
care about it.)
Cheers
---Dave
* Linaro GCC
Continued work on 64-bit shift / extend / etc. in NEON. I have posted an
RFC to the gcc-patches list in the hope of getting some feedback on how
best to fix this. No response yet. Hopefully some of the Linaro guys are
at least looking at it ...
Merged FSF GCC 4.5 and 4.6 into the Linaro GCC release branches prior to
the release next week.
Set more benchmarking work running in my ongoing investigation into
generic tuning.
Did a dry run of the extra release testing Michael normally does. It
failed. Michael says he's fixed it now, but I know how to do my bit, so
fingers crossed.
* Other
Experienced some IT/connectivity outages within Mentor. Resolved now.
==Progress===
* Off sick on Monday
* Systematic testing duty - few Aarch64 issues.
* Linaro patch review duty.
* Tested my vcvt fixed point patch and close to committing.
* Worked on sometime on movw / movt for symbol references rather than
constant pools . While this gives nice benefits it's a code size hog
and needs further investigation.
* PGO patch being tested finally and should go back up for review.
=== Plans ===
* Release week next week.
* Start looking at partial_partial PRE.
* Finish committing by backlog of patches.
Absences.
* Dec 19 - 31st Dec - Tentatively booked
* Feb 6-10 : Linaro Connect Q1.12/
Summary:
* Patch linaro crosstool-ng.
* Windows install package
Details:
* Patch linaro crosstool-ng:
* Back port upstream patches.
* Check-in the zlib/libiconv/expat/ncurses related patches to linaro branch.
* Create reference windows install package for linaro toolchain from
installjammer. The install process works well on Win7.
Plans:
* Investigate test on Windows.
Best regards!
-Zhenqiang
Hi,
OpenEmbedded:
* started on creating a receipts to compile the "core-image-minimal"
using an external prebuilt toolchain (csl arm-2011.03)
* there are still a lot of warnings at the do_package/do_package_qa task
* the good news is that the build process finishes and kernel plus root
file system image gets created
* the bad news is that the rootfs lacks some important libs like libc
and therefore won't run under qemu-system-arm
(since init, busybox, etc. are dynamically linked)
* currently a 3-lines hack on oe-core is required to be able to
overwrite a task of the generic glibc receipt; all other files could go
into a separate layer
Linaro Android:
* had a quick look into the EABI attribute tag issue
Regards
Ken
== String routines ==
* Sent updated memchr to the eglibc list
== 64 bit atomics ==
* Ran a set of timing consistency tests that a colleague had sent me
while I was off; Panda passed those, so time
doesn't appear to be going backwards or anything, so that's not the
problem with membase.
* Pushed the code into linaro-gcc.
== QEmu ==
* Tested Peter's prerelease - all good.
* Started looking at the issues for running in TCG mode on ARM
== Other ==
* Read through the ARMv8 instructions docs that landed on arm.com;
quite interesting. Note that multiple instruction
IT blocks are listed as being deprecated for 32bit mode on v8
(although this will work but it can be put in a mode to fault
you to make it easy to find the uses).
* Some debugging of Panda odd timing issue with Paul Mckenney.
Dave
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || ||
||cp15-rework || 2012-01-06 || 2012-01-06 || ||
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| ||
(for blueprint definitions: https://wiki.linaro.org/PeterMaydell/QemuKVM)
Historical Milestones:
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
== qemu-kvm-getting-started ==
* now reasonably set up to run KVM under Fast Model; howto is here:
https://wiki.linaro.org/PeterMaydell/A15OnFastModels
* rebased kvm patches into qemu-linaro
* fixed bug where we weren't passing cpu number to kvm properly
when delivering an interrupt
* sent some minor patches to upstream qemu that will be needed for
kvm (eg configure script tweaks)
== initial-a15-system-model ==
* started on cleaning up a9/11mpcore private peripheral implementation;
now mostly done and looking much better as a base for a15
== other ==
* preparation for qemu-linaro release (rolled tarball, tested)
* submitted patch to fix buffer overrun in GIC model
* discussion: linux-user mode race conditions, and in particular
how we should handle signals that arrive during syscall emulation
* upstream patch review: imx31 round 3
-- PMM
== GDB ==
* Completed new set of patches to support both "info proc" and
core file generation across the remote protocol, and posted
them to the mailing list for review.
* Tested GDB trunk in preparation for 7.4 release branch point
on multiple platforms; analyzed and fixed a couple of problems,
some also present on ARM in remote testing. Patches checked
in to mainline.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== This week ==
* More on -fsched-pressure. Testing on POWER7 showed a degenerate case
that I'd failed to handle well. Fixed that. Saw that part of the
problem on POWER7 was that IRA was using a combination of GENERAL_REGS
and CR_REGS as a single pressure class, so there appeared to be 39
registers available for storing integers. Fixed (or worked around) that.
Tweaked a few other things too. The only denbench result that I
wasn't happy with was RSA, where both forms of -fsched-pressure are
significantly worse than -fno-sched-pressure. Tracked down the cause
of that. We had a block BB1:
A: (set (reg:DI X) Y)
B: (clobber (reg:DI Z))
C: (set (subreg:SI (reg:DI Z) 0) (... X ...))
D: (set (subreg:SI (reg:DI Z) 4) (...))
where B makes sure that Z is treated as dead before C. Interblock
motion causes B to be scheduled in an earlier block, but none of
the other instructions can be. This means that, when we schedule BB1,
it still contains A, C and D, and Z now appears to be live on entry to
the block. C therefore appears to reduce register pressure, because
it contains the last use of X, and appears to leave Z's liveness
unaffected. In reality it should be treated as increasing register
pressure by 1 (-1 for the death of X, +2 for the birth of Z).
I "fixed" this by moving C's dependencies to B, a bit like we do
for scheduling groups (although none of the other handling of
scheduling groups should apply). This made a big difference,
so that the new code is a win on RSA.
There's still one SPEC2006 degradation on POWER7 that I want
to look at.
* Caught up on a lot of mail. gcc-patches backlog has gone down
from ~4900 when I got back to ~500.
* Briefly looked at x86's drap support, to see what would be needed
for ARM. Didn't look for long though: the overhead seems excessive
for optional alignment, and the agreement seemed to be that 128-bit
alignment wouldn't really make much of a difference anyway.
Richard
Hi!
* Continued with running eembc, coremark, denbench and spec2k on the ursas
with the latest of the Linaro and FSF series. The variants used were
o3-neon and o3-neon-novect. Something went wrong with the variants the
first time, so I had to rerun the tests once.
Discussed draft report with Michael, next week I will share with the rest
of the team.
* Did a rerun SPEC2K runs with "train" and "ref" data sets. I did -o2 and
-o3 runs on a panda with the two data sets. Asked for a sanity check of the
numbers.
* Prepared and held a presentation about the tcwg internally.
* Will be tied up with internal work for the most of w49.
Best regards
Åsa
Hi,
- Ran eon with gcc 4.7: there are much more loops similar to the one
in lp#831094 that get vectorized (due to some data ref analysis
improvement), so the impact of disabling peeling for such loops (i.e.
loops with low loop bound) is even bigger than for 4.6, and
vectorization improves the performance by 2.5%.
I prefer to understand the peeling/alignment situation better and not
just commit this patch (and I spent some time trying to do that).
- Fixed PR 51301 - a bug in over-promotion pattern. Proposed for merge
to gcc-linaro-4.6.
- Merged the last SLP patch to gcc-linaro-4.6.
Ira
This email is just a quick summary of what we (Linaro) are
planning in the way of QEMU work to support KVM on ARM Cortex-A15.
The idea is to let people know what's coming up, find out if we've
forgotten anything, and avoid people duplicating work unnecessarily.
Most of this is based on a useful session at the recent 'ARM server
mini-summit' in Orlando (UDS/Linaro Connect) at the beginning of
this month.
The work we're currently proposing to do falls into three parts:
* refactor QEMU's cp15 register handling
At the moment QEMU handles cp15 accesses by calling out to a single
helper function which is an enormous set of nested switch statements
to handle the different coprocessor registers. Access permissions are
checked separately at translate time. This design makes specifying
board-dependent or cpu-dependent registers somewhat painful; it's also
easy for the access permission checks to be out of sync. There is no
support for banked cp15 registers either (needed for trustzone and
virtualisation). We need a better design which lets a board or core
register handler routines for cp15 registers. This will make the code
cleaner and more maintainable as a base for new features.
This isn't strictly a requirement for KVM, but we're going to want
KVM to be able to hand off cp15 accesses to QEMU, and I don't think
that's going to be maintainable or reliable without this refactoring.
(https://blueprints.launchpad.net/qemu-linaro/+spec/cp15-rework)
* A15 system model
Basically a QEMU model of a Versatile-Express with a Cortex-A15
minus the virtualization and LPAE extensions. This needs the
A15 private peripherals (just the GIC in the right place in
the memory map, really; generic timer not required) and the
new memory map version of the vexpress board model, plus some
new cp15 registers. (Bill Carson has already done some patches
in this area but they need a little rework and may have minor
missing pieces.)
https://blueprints.launchpad.net/qemu-linaro/+spec/initial-a15-system-model
* miscellaneous integration work
We're aiming for a reasonable working prototype of A15 guest on
an A15 Fast Model host here; we need to fix at least some of
the bugs which currently mean upstream QEMU doesn't work on ARM hosts,
sort out which kernel and qemu trees we are developing from, and
get things running in our validation lab's continuous integration
setup.
https://blueprints.launchpad.net/qemu-linaro/+spec/qemu-kvm-getting-started
Also on the radar is a fourth piece of work:
* QEMU virtio-mmio support
This is adding support for the 'mmio' virtio transport, which will
allow virtio support in a versatile-express model. We're going to
need this at some point but the current thought is that we want
to do the above listed more important bits of work first...
(The exception would probably be if it turned out that this was
sufficiently useful for making early KVM development easier)
https://blueprints.launchpad.net/qemu-linaro/+spec/add-amba-virtio-support
So, questions:
(1) did we forget something important?
(2) is anybody else already planning to do any of this (or would
like to start)? if so we should coordinate...
(3) is there anything that the kernel folk need/want earlier
rather than later?
thanks
-- PMM
Hi,
Now that upstream trunk is in stage3 and we have a few patches that
won't really make it upstream until stage1 is reopened is it
worthwhile having a new status in the merge requests that moves it
into a to_upstream status . The other option is to have a common
spreadsheet that we keep updating with links to merge requests that
need to be upstreamed .
Thoughts ?
Ramana
PS - Any clue on what's happening with the branch diff bug that's been
open in launchpad forever now ?
Hi,
* Worked on peeling problem in eon (#831094). Wrote a patch that
checks if the number of vector iterations is going to be more than 2,
and disables peeling otherwise. With this patch I see about 1.5%
regression with vectorization (and about 7% without it).
* I am thinking to extend the patch for unknown number of iterations
by creating a run-time check. The threshold could be set by param.
Another option, could be doing it through the cost model, but it's
hard to evaluate costs when misalignments are unknown (and, I think,
the cost model handles known misalignment properly).
* Disabling peeling for low loop bounds also helps with one of EEMBC
benchmarks, for which vectorization with double-words is more
beneficial than with quad-words. It turns out that we are able to
force the alignment for double-words (and, therefore, avoid peeling),
because we check that the required alignment (64 in this case) is less
or equal to BIGGEST_ALIGNMENT, where
arm.h:#define BIGGEST_ALIGNMENT (ARM_DOUBLEWORD_ALIGN ?
DOUBLEWORD_ALIGNMENT : 32)
and
arm.h:#define DOUBLEWORD_ALIGNMENT 64
So, we can never force alignment for 128 bits on ARM. I wonder if
that's a real limitation.
* Proposed three SLP patches to gcc-linaro, and merged two of them.
Ira
Addressing the comments received from Richard and Ayal regarding the
patch to estimate register pressure.
Testing the patch on eembc and libav micro benchmarks.
Looking at the regressions seen with SMS.
== GDB ==
* Ongoing work on support for cross-platform core file generation.
Posted a new design proposal to the mailing list to include not
only "info proc mappings", but *all* "info proc" commands. This
would involve a remote protocol command to read arbitrary proc
files, instead of a specific command to retrieve the memory map.
* Investigated Launchpad bug:
#891970 msp430-gdb segmentation fault with target remote
== GCC ==
* Patch review week.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Worked on adding support for 64-bit NEON integer shifts. I have this
working now, although I'm still not very happy about how the register
allocator chooses which mode to use - it prefers core-registers if the
values start or end in core-regs, even though moving to values to NEON
registers might be more efficient (general 64-bit shifts in core
registers require several instructions). I've also had to mark the CC
register clobbered in all cases, even though it only gets clobbered in
some of them, which might be necessary, but isn't very satisfactory.
The NEON shifts work showed that 32->64 bit extends could be done better
also. This hasn't been a great problem up to now, but the shift amount
(in particular) is typically a 32-bit value and yet needs to be
zero-extended to 64-bit for NEON's purposes. Right now, GCC prefers to
extend the value in core-registers, and then copy it to NEON. This
works, but burns another core-register - a scarce commodity - so I think
it would be better to copy it first, and then extend it after. NEON has
instructions for this, so I'm investigating how to get the compiler to
do it (this is all strictly post-combine, so the usual options are out,
and the register allocator has to be allowed to do it the old way in the
case where core-regs really are the best option, so it's tricky).
Summary:
* Upstream crosstool-ng patches.
* Create windows install package from installjammer.
* Investigate link issues.
Details:
* crosstool-ng patches.
* Patches for newlib extra config, gdb extra config, pch, nls option
are committed to crosstool-NG upstream.
* The dependant library patches are in discussion.
* Learn installjammer and integrate it to scripts to create windows
install package.
* Investigate warning message from link when linking the prebuilt zlib
for migw32 host.
It might be OK with static link, but migh fail with dynamic link on windows.
For i586-mingw32[msvc] host, lots of messages like
libtool: link: Could not determine host path corresponding to ...
For i386-mingw32 host: In addition to the message in i586-mingw32
build, output the following message
*** Warning: linker path does not have real file for library -lz. ...
Plans:
* Build and test.
Absences:
* Nov 29, 30: Trainings.
Thanks!
-Zhenqiang
Hi,
Good news -- I just built a version of ICS with the current version of
linaro-gcc.
Panda build here:
http://people.linaro.org/~bernhardrosenkranzer/boot.tar.bz2http://people.linaro.org/~bernhardrosenkranzer/system.tar.bz2http://people.linaro.org/~bernhardrosenkranzer/userdata.tar.bz2
Use linaro-android-media-create as usual to install.
This is not yet a build that we can reproduce inside android-build
because I've had to cheat by swapping out linkers in a couple of
places (just using current binutils the way we normally do produces a
build that doesn't boot, using binutils built from the AOSP source
release works, but the prehistoric linker doesn't know about "dmb st",
can't link u-boot, can't link the kernel, and strangely enough can't
link some components of ICS - apparently the binaries they ship have
some extra patches in).
But the good news is that every part is built with our compiler -
there's nothing in the way of using that (aside from the code
insanities I've already fixed).
I'll work on sorting out binutils now...
ttyl
bero
Hi,
I've spent most of my time to dig into OE. First I started with OE
(classic); then realized that OE-core is where the future happens and
switched to it. I've set up a build system and got a ARM minimal image
to build that boots in QEMU *yay*. In parallel I've been reading the
manual and looked into the receipts to find out what toolchain they are
using (gcc-4_6-branch plus patches). Next step is to get the OE-core
built using the Linaro-GCC.
Regards
Ken
== This week ==
* Looked at the MIPS _unpack_d bug. libgcc.a did have a definition,
and Michael couldn't reproduce with his build, so the bug report
is now marked as Incomplete.
* Backported patch for PR 48190 to upstream 4.6 and 4.5.
* Reviewed Revital's SMS register-pressure patch.
* More on -fsched-pressure. I now have a version that I'm happy with
as far as ARM goes, in that it usually seems to produce code that is
no worse than the better of currect -fsched-pressure and current
-fno-sched-pressure. (I'm sure there's a better way of saying that.)
In some cases it is better than both.
* Continued trying to catch up on mail.
== Next week ==
* Clean up the -fsched-pressure code (it's still in its "experimental mess"
state). Try it on Power.
* Resurrect vzip and vunzp patch after Richard E said he wouldn't object.
Richard
Hi!
* Ran eembc, coremark, denbench and spec2k on the ursas with the latest of
the Linaro and FSF series. The variants used were o3-neon and
o3-neon-novect.
I first got a c++ related build error when using 4.4.x compilers, the was
error caused by symbol versioning. Michael's explanation: "We want to use
the gcc-4.4.5 libstdc++ when building and running. However, when running
c++ itself, it links in /usr/lib/libppl_c.so, which was built with the host
4.5 compiler, which needs the 4.5 libstdc++!"
The work around is to remove the LD_LIBRARY_PATH from build.mk (the
gcc-%/benchmarks.stamp target) and run the C only tests.
* Continued documentation of running benchmarks:
https://wiki.linaro.org/AsaSandahl/Sandbox/RunningBenchmark. Tips of more
efficient ways of doing things are always welcome.
* Collected the results for SPEC2K runs with "train" and "ref" data sets. I
did -o2 and -o3 runs on a panda with the two data sets. The results for -o2
and -o3 looks almost the same though. I will double check the "*build.txt"
files from the benchmark runs, and if needed do a complementary run.
Best regards
Åsa
Dear All,
I am using arch/arm/configs/vexpress_defconfig to configure and build Linux
Kernel 3.1.1
http://launchpad.net/linux-linaro/3.1/3.1-2011.11/+download/linux-linaro-3.…
and then if I booth the zImage crated on Linaro QEMU
http://launchpad.net/qemu-linaro/trunk/2011.10/+download/qemu-linaro-0.15.5…
,it works properly.
But if i enable the LPAE support in the config file, the kernel builds and
when I boot the kernel image on QEMU, it just prints the output as :
Uncompressing Linux... done, booting the kernel.
And, then it hangs ... Can anyone please tell how to fix this issue?
Looking forward to your reply.
Thanks and Regards,
Jubi
I discovered some excessive memory usage in gas recently when
defining macros. It turns out that this is a weird implementation
feature rather than a bug.
This patch has a possible fix for the issue, but I'd be interested
in people's views before I go so far as cleaning it up and
discussing it upstream.
Cheers
---Dave
Dave Martin (1):
gas: Allow for a more sensible number of macro arguments
gas/as.c | 17 +++++++++++++++++
gas/doc/as.texinfo | 9 ++++++++-
gas/hash.c | 5 +++--
gas/hash.h | 1 +
gas/macro.c | 22 +++++++++++++++++++++-
gas/macro.h | 1 +
6 files changed, 51 insertions(+), 4 deletions(-)
--
1.7.4.1
[Jubi, I'm afraid this is the second copy of this you'll see, because
you accidentally sent your reply to linaro-toolchain-request rather
than to the actual mailing list, and so my first reply was misdirected.
This reply is to the correct list address...]
On 22 November 2011 13:28, Jubi Taneja <jubitaneja(a)gmail.com> wrote:
> Thanks for your reply. Please find the response inline ..
> On Tue, Nov 22, 2011 at 6:44 PM, Peter Maydell <peter.maydell(a)linaro.org>
> wrote:
>> On 22 November 2011 13:06, Jubi Taneja <jubitaneja(a)gmail.com> wrote:
>> > But if i enable the LPAE support in the config file, the kernel builds
>> > and
>> > when I boot the kernel image on QEMU, it just prints the output as :
>> >
>> > Uncompressing Linux... done, booting the kernel.
>>
>> Does your kernel boot OK on real hardware?
>>
>> (ie, is a kernel with LPAE support expected to boot on a CPU like the
>> A9 which doesn't have LPAE?)
>
> Yes, it is expected to boot ARM Cortex A15 CPU.
The A9 and the A15 are different CPUs. QEMU currently supports
only the A9. This is why I asked if this kernel boots OK on real
Versatile Express A9 hardware.
>> Also if your config/kernel command line don't turn on earlyprintk it's
>> worth enabling this as it usually gets you better diagnostic messages
>> for early kernel boot failures.
>
> Ok, I will try to check this. But, unfortunately now I again tried enabling
> LPAE in config file and the current status is that when I boot the kernel
> image on Qemu. it simply hangs. It now don't show that message of
> Uncompressing kernel.. I am trying to debug it using gdb, but could not find
> much. Please guide me how shall I proceed ahead.
If you've turned on kernel support for the Versatile Express A15
rather than the Versatile Express A9 then this is expected behaviour:
the VE-A15 has a different memory layout and in particular the serial
ports are in a different place. So if you try to boot the kernel on
a VE-A9 system (which is what QEMU is modelling) then it will display
nothing because the kernel is trying to write to UARTs which aren't
there.
What are you actually trying to achieve here?
-- PMM
Continued looking at constant reuse optimizations, as a background task.
I've fiddled with the costs a bit more to remove false positives.
Continued benchmarking different generic tuning ideas. With each test
run taking most of a day this is slow going.
Took Michael's rootfs that is used for all the toolchain testing and
benchmarking, unpacked it, and repacked it so that it is compatible with
"linaro-media-create", then tested that I could use it to run tests on
LAVA successfully. I was hoping to use this for extra benchmarking
bandwidth, but there's a permissions problem in the LAVA website
software that means it's not yet possible to post private results to the
system, so no proprietary benchmarks yet. I can still continue
pipe-cleaning my process, and maybe run some benchmarks without actually
reporting the results (or perhaps posting them somewhere write-only).
Begun work on adding GCC support for 64-bit shifts with NEON. This is
not quite as simple as it ought to be because a) it's inefficient to
move a value to NEON registers just to do a shift, so it needs to detect
where the value is, and b) right shifts are encoded as left shift by a
negative amount, and negative shift amounts are normally considered
undefined behaviour.
RAG :
RED : None
AMBER: Worried about trunk failures with test runs. Number of
testsuite failures after the atomics merge has increased - more below.
.
Task Planned Estimated Actual
Historical
~~~~~~~
Connect 2011.q4
preparation 28/10/2011 28/10/2011
28/10/2011
Linaro Tasks
~~~~~~~~~~~~
Fully Investigate the O3
performance
regressions 31/01/2012
Writeup on the optimizations 31/12/2011
enabled with PGO
==Progress===
* Debugged the LTO failures for some time this week - not much progress.
* The bootstrap failure with trunk turned out to be the same problem
as with the CFG not being updated properly with some of the
shrink-wrap patches from Alan M . This was fixed on trunk later on
Monday.
* Tested atomics fixes but the test results are overall looking ugly .
Need to do some more debugging. Discovered GDB was broken for single
stepping in ARM atomic sequences - Filed bug report
https://bugs.launchpad.net/gdb-linaro/+bug/892008 here.
* Looked into the vmul / vmla issue for a bit - did some experiments.
Need to write these up and follow up .
=== Plans ===
* Commit the vcvt fixed point patch.
* Finish experiments with the vmla stuff and find out more about this.
* Finish debugging the LTO failures with PGO bootstrap.
* Some research into the O3 perf issues.
Absences.
* Dec 19 - 31st Dec - Tentatively booked
== GDB ==
* Ongoing work on support for cross-platform core file generation.
== GCC ==
* Investigated Launchpad bugs:
#889984 binaries: should step across helper functions
#889985 binaries: can't step out of helper functions
#890764 4.6-11.11 seems to misdetect some files as system header and
implicit extern "C"
== Misc ==
* Gave talk on Linaro at the IBM Germany Technical Expert Council
fall meeting.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Sent the patch which implements register pressure estimation in SMS to
the gcc mailing list as RFC.
I looked at some of the regressions in libav and intend to continue
with that this week.
== Last week ==
* Caught up on lots of email.
* Looked into the SMS regression that Revital found. Turned out to be
caused by the ARM backend not modelling the VMLA fast accumulator path.
Need to know how the path actually works before modelling it
(the docs aren't clear).
* Continued to look at -fsched-pressure.
Richard
Summary:
* Create crosstool-ng patches and build embedded toolchain on ubuntu-8.04.4.
Details:
* Crosstool-ng patches
* Update old patches according to Michael's comments and revise them
according the guideline.
* Create patches for nls option, newlib and gdb.
* Try to build embedded toolchain on ubuntu-8.04.4. It works after
installing all the dependence libraries.
Plans:
* Upstream the patches.
* Build and test.
Thanks!
-Zhenqiang
Hi there. The 4.6-2011.10 release causes an output miscompare failure
in 175.vpr from SPEC 2000. The fault didn't exist in 2011.09 and has
cleared in 2011.11.
Does this ring a bell with anyone? Andrew, could it be related to
your widening multiplies fix?
-- Michael
Hi,
* rewrote the Android.mk of libunwind to make use of autoreconf and libtool
* finished my work on libunwind
* upgraded my Linaro Android build environment
* debugged Linaro Android build failures (#891753)
* tested backtracing on the Linaro Android 2.3.5 and 2.3.7 branches
* documented debuggerd usage on Android:
https://wiki.linaro.org/Platform/Android/DebugAndroidSystemComponents
* Debugged a linking failure of the android perflab benchmark that Andy
is seeing. Turns out that the GCC is optimizing two consecutive calls to
sinf and cosf (same angle) are optimized by the GCC to one sincosf call.
The libm provided by the benchmark is lacking sincosf. Workaround is to
use -fno-builtin-sinf -fno-builtin-cosf.
Regards
Ken
== 64 bit atomics ==
* Still fighting membase
* Cleaned up a bunch of other issues, but I'm back at an 'expiry'
issue, where the test
stores some data with a fixed expiry time and then waits until after
it should have expired,
and checks it has. Except on ARM it sometimes doesn't expire quickly
enough. I've got
enough debug now to see that the server processes view of time (which
it updates via
an event about every second) is sometimes very behind gettimeofday()'s
view of time - and
have a small test for it. This doesn't seem to happen on x86. The
good part is that it's now
a much smaller test, the bad part is that it fails rarely - somewhere
between 1/1000 and 1/100
depending on its mood.
* Looked at a few other things to see if they might use 64 bit atomics:
- spice's (as in the VNC like protocol) FAQ said it needed 64bit
atomics and didn't work
on 32bit machines due to that; but the source appears to have been
fixed for 32bit.
- Looked at boost lock-free; it does have an implementation using
gcc's __sync primitives,
however for ARM it uses a hand coded set of primitives, those are
missing the 64 bit implementation,
but the contributor of the ARM code said that the boost lock-free
author preferred
not to use the gcc primtives.
== Other ==
* Testing latest libffi rc
- Had most of my varargs for hf fix in (had missed one part of a test)
* 1 day of non-linaro work
I'm on holiday next week.
Dave
(short week, 4 days)
RAG:
Red:
Amber: we still haven't settled on engineering blueprints and schedule
for the KVM work. Proceeding with some obviously necessary bits anyway
Amber: not clear whether we can do virtio-mmio this quarter
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
(still no milestones, see above)
Historical Milestones:
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
== upstream-omap3-cleanup ==
* spent a half day on this to try to get this blueprint tidied away
even if we aren't going to have much time for the later upstreaming
== other ==
* A15/KVM planning
* experimenting with getting an A15 kernel booting on Fast Model
(see https://wiki.linaro.org/LoicMinier/Sandbox/FastModels
and https://wiki.linaro.org/PeterMaydell/A15OnFastModels)
* estimated required work for doing a qemu model of a board for
one of the Landing Teams (ans: 6 man months +)
Hi,
- spent most of the week trying to reproduce regressions with vectorization
- started bringing the latest SLP feature, condition with different
types, to gcc-linaro. There are 5 patches. Merged one, started to
prepare another one.
- fixed PR 51112
Ira
Hi,
I've now put this at :
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2011-11-15
Are there any other topics that folks want to bring up ?
The one thing worth thinking about ahead of time is if we want to
bring ahead the call by an hour to allow Michael to join at a not so
crazy hour for him. What do folks think of 9 a.m. Tuesdays /
Wednesdays UTC ?
cheers
Ramana
Hey
When building u-boot for an ARMv5T platform (versatileqemu_config), the
Ubuntu-packaged Linaro cross-toolchain isn't suitable because it
only offers an ARMv7T2 libgcc. But I'd like the build to fail when
that happens rather than silently generating an u-boot.bin which will
trigger a SIGILL when it hits the first non-ARMv5T instruction.
I heard that gcc/ld are supposed to check this, but I'm not sure how
it's supposed to work; perhaps the way u-boot does its final link
prevents this from working properly?
I tried building u-boot as follows:
make O=obj-broken \
CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm \
OPTFLAGS="-marm -march=armv5te" \
versatileqemu_config
make O=obj-broken \
CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm \
OPTFLAGS="-marm -march=armv5te" \
-j2
The final link looks like this:
UNDEF_SYM=`arm-linux-gnueabi-objdump -x /home/lool/git/denx/u-boot/obj-v-broken/board/armltd/versatile/libversatile.o /home/lool/git/denx/u-boot/obj-v-broken/api/libapi.o /home/lool/git/denx/u-boot/obj-v-broken/arch/arm/cpu/arm926ejs/libarm926ejs.o /home/lool/git/denx/u-boot/obj-v-broken/arch/arm/cpu/arm926ejs/versatile/libversatile.o /home/lool/git/denx/u-boot/obj-v-broken/arch/arm/lib/libarm.o /home/lool/git/denx/u-boot/obj-v-broken/common/libcommon.o /home/lool/git/denx/u-boot/obj-v-broken/disk/libdisk.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/bios_emulator/libatibiosemu.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/block/libblock.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/dma/libdma.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/fpga/libfpga.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/gpio/libgpio.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/hwmon/libhwmon.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/i2c/libi2c.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/input/libinput.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/misc/libmisc.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/mmc/libmmc.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/mtd/libmtd.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/mtd/nand/libnand.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/mtd/onenand/libonenand.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/mtd/spi/libspi_flash.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/mtd/ubi/libubi.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/net/libnet.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/net/phy/libphy.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/pci/libpci.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/pcmcia/libpcmcia.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/power/libpower.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/rtc/librtc.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/serial/libserial.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/spi/libspi.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/twserial/libtws.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/usb/eth/libusb_eth.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/usb/gadget/libusb_gadget.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/usb/host/libusb_host.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/usb/musb/libusb_musb.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/usb/phy/libusb_phy.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/video/libvideo.o /home/lool/git/denx/u-boot/obj-v-broken/drivers/watchdog/libwatchdog.o /home/lool/git/denx/u-boot/obj-v-broken/fs/cramfs/libcramfs.o /home/lool/git/denx/u-boot/obj-v-broken/fs/ext2/libext2fs.o /home/lool/git/denx/u-boot/obj-v-broken/fs/fat/libfat.o /home/lool/git/denx/u-boot/obj-v-broken/fs/fdos/libfdos.o /home/lool/git/denx/u-boot/obj-v-broken/fs/jffs2/libjffs2.o /home/lool/git/denx/u-boot/obj-v-broken/fs/reiserfs/libreiserfs.o /home/lool/git/denx/u-boot/obj-v-broken/fs/ubifs/libubifs.o /home/lool/git/denx/u-boot/obj-v-broken/fs/yaffs2/libyaffs2.o /home/lool/git/denx/u-boot/obj-v-broken/lib/libfdt/libfdt.o /home/lool/git/denx/u-boot/obj-v-broken/lib/libgeneric.o /home/lool/git/denx/u-boot/obj-v-broken/lib/lzma/liblzma.o /home/lool/git/denx/u-boot/obj-v-broken/lib/lzo/liblzo.o /home/lool/git/denx/u-boot/obj-v-broken/lib/zlib/libz.o /home/lool/git/denx/u-boot/obj-v-broken/net/libnet.o /home/lool/git/denx/u-boot/obj-v-broken/post/libpost.o | sed -n -e 's/.*\(__u_boot_cmd_.*\)/-u\1/p'|sort|uniq`; cd /home/lool/git/denx/u-boot/obj-v-broken && arm-linux-gnueabi-ld -pie -T /home/lool/git/denx/u-boot/obj-v-broken/u-boot.lds -Bstatic -Ttext 0x10000 $UNDEF_SYM arch/arm/cpu/arm926ejs/start.o --start-group api/libapi.o arch/arm/cpu/arm926ejs/libarm926ejs.o arch/arm/cpu/arm926ejs/versatile/libversatile.o arch/arm/lib/libarm.o common/libcommon.o disk/libdisk.o drivers/bios_emulator/libatibiosemu.o drivers/block/libblock.o drivers/dma/libdma.o drivers/fpga/libfpga.o drivers/gpio/libgpio.o drivers/hwmon/libhwmon.o drivers/i2c/libi2c.o drivers/input/libinput.o drivers/misc/libmisc.o drivers/mmc/libmmc.o drivers/mtd/libmtd.o drivers/mtd/nand/libnand.o drivers/mtd/onenand/libonenand.o drivers/mtd/spi/libspi_flash.o drivers/mtd/ubi/libubi.o drivers/net/libnet.o drivers/net/phy/libphy.o drivers/pci/libpci.o drivers/pcmcia/libpcmcia.o drivers/power/libpower.o drivers/rtc/librtc.o drivers/serial/libserial.o drivers/spi/libspi.o drivers/twserial/libtws.o drivers/usb/eth/libusb_eth.o drivers/usb/gadget/libusb_gadget.o drivers/usb/host/libusb_host.o drivers/usb/musb/libusb_musb.o drivers/usb/phy/libusb_phy.o drivers/video/libvideo.o drivers/watchdog/libwatchdog.o fs/cramfs/libcramfs.o fs/ext2/libext2fs.o fs/fat/libfat.o fs/fdos/libfdos.o fs/jffs2/libjffs2.o fs/reiserfs/libreiserfs.o fs/ubifs/libubifs.o fs/yaffs2/libyaffs2.o lib/libfdt/libfdt.o lib/libgeneric.o lib/lzma/liblzma.o lib/lzo/liblzo.o lib/zlib/libz.o net/libnet.o post/libpost.o board/armltd/versatile/libversatile.o --end-group /home/lool/git/denx/u-boot/obj-v-broken/arch/arm/lib/eabi_compat.o -L /usr/lib/gcc/arm-linux-gnueabi/4.6.1 -lgcc -Map u-boot.map -o u-boot
I verified that all the .o files passed above have Tag_CPU_name:
"5TE" in their arm-linux-gnueabi-readelf -A output; the only
problematic file is -lgcc.
Note that the final link is done with arm-linux-gnueabi-ld and doesn't
set any architecture; I changed it manually to use gcc and pass the
-marm -march=armv5te, and had to set -nostdlib too when using gcc:
arm-linux-gnueabi-gcc -marm -march=armv5te -nostdlib -pie -T /home/lool/git/denx/u-boot/obj-v-broken/u-boot.lds -Bstatic -Ttext 0x10000 $UNDEF_SYM arch/arm/cpu/arm926ejs/start.o -Wl,--start-group api/libapi.o arch/arm/cpu/arm926ejs/libarm926ejs.o arch/arm/cpu/arm926ejs/versatile/libversatile.o arch/arm/lib/libarm.o common/libcommon.o disk/libdisk.o drivers/bios_emulator/libatibiosemu.o drivers/block/libblock.o drivers/dma/libdma.o drivers/fpga/libfpga.o drivers/gpio/libgpio.o drivers/hwmon/libhwmon.o drivers/i2c/libi2c.o drivers/input/libinput.o drivers/misc/libmisc.o drivers/mmc/libmmc.o drivers/mtd/libmtd.o drivers/mtd/nand/libnand.o drivers/mtd/onenand/libonenand.o drivers/mtd/spi/libspi_flash.o drivers/mtd/ubi/libubi.o drivers/net/libnet.o drivers/net/phy/libphy.o drivers/pci/libpci.o drivers/pcmcia/libpcmcia.o drivers/power/libpower.o drivers/rtc/librtc.o drivers/serial/libserial.o drivers/spi/libspi.o drivers/twserial/libtws.o drivers/usb/eth/libusb_eth.o drivers/usb/gadget/libusb_gadget.o drivers/usb/host/libusb_host.o drivers/usb/musb/libusb_musb.o drivers/usb/phy/libusb_phy.o drivers/video/libvideo.o drivers/watchdog/libwatchdog.o fs/cramfs/libcramfs.o fs/ext2/libext2fs.o fs/fat/libfat.o fs/fdos/libfdos.o fs/jffs2/libjffs2.o fs/reiserfs/libreiserfs.o fs/ubifs/libubifs.o fs/yaffs2/libyaffs2.o lib/libfdt/libfdt.o lib/libgeneric.o lib/lzma/liblzma.o lib/lzo/liblzo.o lib/zlib/libz.o net/libnet.o post/libpost.o board/armltd/versatile/libversatile.o -Wl,--end-group /home/lool/git/denx/u-boot/obj-v-broken/arch/arm/lib/eabi_compat.o -L /usr/lib/gcc/arm-linux-gnueabi/4.6.1 -lgcc -Wl,-Map u-boot.map -o u-boot
But this command works and produces an u-boot ELF which has
Tag_CPU_name: "7-A".
How would I break the build when libgcc isn't ARMv5T?
Thanks,
--
Loïc Minier
Spun Linaro GCC 4.6 release tarball, uploaded it to Michael's server,
and launched the testing.
Continued work on constant reuse optimization. I've now eliminated some
more false positives caused by inconsistent rtx_cost results. It turns
out the pass also fixes up inefficient constants generated by
arm_split_constants, which is nice.
Set yet more spec benchmark runs going as part of the generic tuning
investigation.
Other:
Half day Monday to recover from the weekend's travel.
Half day on internal Mentor activities.
== GDB ==
* Worked on support for cross-platform core file generation.
After some discussion on the mailing list it seems we've
come to an agreement that the remote protocol ought to have
two separate packets related to memory layout, one that
describes the permanent, system-wide layout (for embedded
systems) and one that describes the dynamic, per-process
layout (for processes with memory-mapped files). The latter
also ought to be integrated with the "info proc mappings"
command, which should work with gdbserver too.
I've been working on updating the patches accordingly.
== GCC ==
* Patch review week.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Testing the SMS register pressure estimation on libav micro benchmarks
and eembc.
Discussed with Ayal the implementation. He had some ideas to consider
regarding the it.
Looking into the regressions of SMSed kernels in libav which are not
related to register pressure:
Consulting with Ayal regarding the case in dsputil-ssd_int8_vs_int16_c
where we have severe regression with SMS; it seemed that the
regression was due to dependence between accumulations that can be
avoided, more specifically we had the following case in vector code:
vec1 = vec1 + ...
...
vec1 = vec1+ ...
...
vec1 = vec1+ ...
...
vec1 = vec1+...
to resolve this, I implemented a hack similar to MVE optimiation in
the loop-unroller as follows:
vec1 = vec1 + ...
...
vec2 = vec2+ ...
...
vec3 = vec3+ ...
...
vec4 = vec4+...
This gives ~4.5% improvements to the non-SMSed version. The SMS
version now shows no regression as the problematic loop which caused
the regression now failed to be SMSed and I'm looking into the reason.
Another regression showed in idct-internal-8 is apparently related to
the do-loop optimziation (SMS actually failed to be applied in this
loop). when applying the patch to expand SMS to recognise doloop then
the regression is resolved.
(http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02051.html; patch is not
in mainline yet)
Summary:
* Add expat and ncurses support for gdb-cross.
* Rebase and create patches for crosstool-ng upstream.
* Compare configurations difference between crosstool-ng and embedded toolchain.
Details:
* Add expat and ncurses support for gdb cross. At this time, all the
packages can be built for both Linux and Mingw32 host with baremetal
target.
* Rebase and create patches for crosstool-ng upstream. Patches are
sent to Michael for review.
* Compare configurations difference between crosstool-ng and embedded
toolchain. To align the configuration, crosstool-ng
* need (not necessary) --disable-nls for companion libs.
* need make document.
* need add --enable-newlib-register-fini config and enhance
CFLAGS_FOR_TARGET for newlib.
* need --disable-sim for gdb.
* need multilib support (Can workaround for current implementation.
Need improvement when it is fully supported).
Thanks!
-Zhenqiang
== 64 bit atomics ==
* Nailed one more of the membase tests; again this was a test
harness race condition (which I've reported here:
http://code.google.com/p/moxi/issues/detail?id=2&thanks=2&ts=1321037460 )
In this case there were two calls to write) performed on the
server, yet the test client performed a single read and
compared the result to what it was expecting; and got lucky on x86 and
about half the time on ARM in that the
server data managed to all get read by the 1st read.
I think this leaves one more case - that I've seen rarely.
== Qemu ==
* Tested Peter's 11.11 pre release; ran into a couple of issues
(vexpress without sound causing hangs, and
the Linaro 11.10 Beagle and Overo images not running X). Also
filed a couple of bugs in l-i-f-ui that
I tripped over while testing it.
== String routines ==
* The new newlib A15 optimised memcpy is slower on an A9 than my
routines; posted to newlib list
asking what the normal way of dealing with a bunch of different
routines is. Would it make sense to get
gcc to define a GCC_ARM_TUNE_CORTEX_A-whatever ?
== Other ==
* Watched the Youtube video of the Kernel/Toolchain discussion -
for those who didn't attend,
I'd encourage a check of the Youtube videos, they're pretty nicely done.
* Got pulled away on non-Linaro work for about half the week.
Dave
Hi,
Android:
* managed to remotely debug a system process (like debuggerd) using
gdbserver
libunwind:
* found an error when unwinding via DWARF debug frames when
configured for REMOTE_ONLY
* discussions on the me revealed that libunwind-ptrace should not be
compiled for REMOTE_ONLY case at all (it was intended for host!=target)
* this means that our current build approach on Android needs to be
changed in the future
Misc:
* internal meetings
Regards
Ken
RAG:
Red:
Amber: upstream-omap3-cleanup stalled, not clear whether we're going
to have any time for it this quarter
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
(Future milestones to be added once post-Connect planning is completed.)
Historical Milestones:
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
== linaro-qemu-11.11 ==
* 2011.11 release: tarball built, tested and released
== other ==
* nailing down A15/KVM work we're going to do this quarter
* usual upstream code review/etc
* sent patches upstream to fix some easy bugs somebody found
running Coverity on QEMU's source code
Hi!
* Ran EEMBC and SPEC on the ursa4. Sorted out a bunch of basic questions
related to permissions ans such with Michael. Familiarized myself with the
scripts for parsing benchmark results.
* Created wiki for running benchmarks in cbuild. It is in my sandbox right
now: https://wiki.linaro.org/AsaSandahl/Sandbox/RunningBenchmark
* Started off SPEC runs for comparing the "train" and "ref" data sets. We
want to know if the changes between variants are the same for the two sets.
Best regards
Åsa
The Linaro Toolchain Working Group is pleased to announce the
release of Linaro QEMU 2011.11.
Linaro QEMU 2011.11 is the latest monthly release of
qemu-linaro. Based off upstream (trunk) QEMU, it includes a
number of ARM-focused bug fixes and enhancements.
New in this month's release:
- The ARM vexpress-a9, versatilepb, versatileab and realview-*
boards now have audio support (thanks to Mathieu Sonet who
contributed a PL041 implementation upstream)
- Support for multiple instances of the "-sd" option on the
command line has been dropped; this was never present in
upstream QEMU and has been removed for consistency. Use
"-drive,if=sd,index=N,file=file.img" for N=0,1,2... instead
- Fixes #886980: 8 and 16 bit reads from the OMAP GPIO module
would crash due to an infinite recursion
- Fixes #823902: problems running multithreaded programs in
linux-user mode
Known issues:
- Graphics do not work for OMAP3 based models (beagle, overo)
with 11.10 Linaro images.
NB: if you run QEMU on a host system without properly configured
audio you might find that QEMU now hangs at some point; you can
fix this by fixing your host system, or work around it by setting
the environment variable QEMU_AUDIO_DRV=none.
If you build from source you may now want to pass configure
a suitable --audio-drv-list=LIST option.
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.11
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
Hi,
- SLP improvements for weight-h264-pixels16x16-8 (libav):
- conditions in SLP - committed upstream
- support pattern detection in SLP - implemented
- enhance mixed condition pattern to handle non-constant then/else
clauses - implemented
weight-h264-pixels16x16-8 now gets vectorized with 2.6x speedup.
- Vectorizer maintenance (bug fixes, patch reviews).
I'll be on vacation on Sunday.
Ira
Hi there,
As discussed with Loïc, please find attached my slides to the ELCE presentation related to our implementation of linux-awareness for JTAG debugging. I still need a formal clearance of my organization on my contribution patch, but I (and my managers) will be happy to see some or all of this work benefit to-and-from the community. If you are interested, I will try to upload a self-contained qemu-based demo to a public ftp.
Cheers,
Marc Titinger.
> -----Original Message-----
> From: Loïc Minier [mailto:lool@dooz.org]
> Sent: Wednesday, November 02, 2011 3:19 PM
> To: Marc TITINGER
> Cc: Michael Hope; Ulrich Weigand
> Subject: Re: contribution Linux Kernel Debugger
>
> On Wed, Nov 02, 2011, Marc TITINGER wrote:
> > J'ai été content de vous rencontrer toi et Nicolas à la suite de ma
> > présentation pour discuter de l'opportunité de la contribution de
> > notre debugger linux. J'ai une question: dans la mesure ou STMicro ne
> > fait pas partie des membres de linaro, aurais-je un accès restreint
> > aux outils et discussions si je souhaite contribuer? Quels serons les
> > blueprints correspondant à ce projet ?
>
> Most things Linaro does are public; concerning the toolchain,
> benchmarks are kept private due to licensing constraints. Even if
> STMicro is not a member, you're welcome to present your ideas and code
> to us (we don't mind if STMicro joins as a member though ;-).
>
> We covered topics similar to your ELC-Europe talk this week:
> https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/linaro-
> toolchain-kernel-debugging
>
> Ulrich Weigand and Michael Hope will continue discussions around where
> we will go in terms of helping kernel debugging next cycle, it might
> be
> that we end up working on similar areas than the ones you and I
> discussed (special handling for linux in GDB -- tasks, backtracing
> across kernelspace/userspace; OpenOCD fixes...).
>
> Your slides don't seem to be at
> https://events.linuxfoundation.org/events/embedded-linux-conference-
> europe/titinger
> yet, so perhaps you could share a link with Michael and Ulrich? or
> post on the linaro-toolchain@ mailing-list
>
> We'll be a bit busy this week, but if you want to discuss your
> patches,
> upstreaming, further developments, I would think Michael can arrange
> for you to join a Toolchain WG call in the next weeks -- Michael, I'll
> let you comment once you get to see the slides :-)
>
> --
> Loïc Minier
Summary:
* Add zlib and libiconv support in crosstool-ng and repack embedded
toolchain source package.
Details:
* Read crosstool-ng scripts, configs and document to learn on how it works.
* Try mkedwards's extensions for crosstool-ng at
https://github.com/mkedwards/crosstool-ng. It does have lots of
extensions, the GDB-cross can build. But zlib and libiconv do not meet
our requirement.
* Add config, patch and build scripts for zlib and update the binutils
build scripts to use the prebuilt zlib.
* Add config and build scripts for libiconv and update the build
scripts of gcc and gdb.
* Write scripts to patch and repack embedded toolchain source packages
to the standard format.
Plans:
* Linaro connect: Oct. 31 - Nov. 4.
* Integrate the repack scripts with crosstool-ng.
Thanks!
-Zhenqiang
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
== other ==
* Linaro Connect week. Included an extremely useful double-length
session about KVM on A15, which should turn into blueprints/plans
in due course
* Found out a bit more about UEFI -- I'm leaning towards having QEMU
for vexpress run UEFI by default as a way of letting you just pass
it a disk image rather than having to feed it a separatekernel/initrd.
(Will look into this more when the ARM landing team have it all
building and working on hardware.)
* I have a working prototype of the QEMU virtio-mmio transport (written
to Pawel's spec). However to get this upstream we will first need to
properly refactor the qemu virtio code so the link between the
transport and the blk/net/etc backends is a qdev bus.
-- PMM
Continue working on the regsiter pressure estimation implementation -
testing the implementation on libav micro benchmarks.
With the patch some SMSed kernels in put-h264-qpel8-hv-lowpass-8,
swscale-rgb24ToY_c mjpegenc benchmarks are identified as having
register pressure.
I'm looking at the kernels which still have regressions with SMS and
it seems the reason is not related to register pressure.
Hi All,
This is a brain dump of what I learned about running LAVA today.
Dave will probably find a place for this in the Validation wiki, but
I'll pass it round in the meantime.
Hope it helps
Andrew
Hi,
* libunwind
* posted small bug fixes
* noticed the unwinding on Android is broken somehow
(need to track down the commit that broke it)
* linaro android
* repo sync fails due invalid bionic commit id (#885792)
* tried to remotely attend the Connect
* +1 for having live streams of the plenaries
(http://video.ubuntu.com/live/)
* -1 for pointing us to the wrong grand sierra irc channels
(http://uds.ubuntu.com/participate/remote)
* icecast streams worked most of the time
(* public holiday on tuesday)
Regards
Ken
=== 64 bit atomics
* I got the race in membase down to a futex issue, and asking dmart
pointed me at a kernel bug that
affects recent kernels where a fix had gone in about a month ago.
That was a nasty one!
* I've still got a few bugs left; most are turning out to be timing
races in the test code (e.g. one that
times out after 2seconds but the code takes around 1.7 seconds ish -
but if something else gets
in trips over the line, and another one where it did a recv_from on a
socket but only got
the start of a message, presumably because the sender had used
multiple sends). It's tricky going
because the tests are a combination of most scripting languages (perl,
python, ruby with a splash of Erlang).
I've so far found no bugs in the atomic code.
* I looked at apr and SDL-1.3; both of which use atomics; but end up
not using 64bit atomics;
the tendency is for them to ensure they can do atomics on long and on
a void*; both of which
for us are 32bit.
=== String routines
* I've got the Newlib A15 optimised memcpy running in a test harness
at the moment for
comparison.
=== Listening to connect
* I listened in to a few connect sessions each day; the 1st day or
so was 3/4 lost on
audio systems that didn't work (I'm especially annoyed at not being
able to hear the QEMU for A15/KVM session
and toolchain support for kernel). The Rypple session was rather lost
through the lack of any screen share
or slides.
Hello all,
I've been playing around with linaro and have it working on my
Pandaboard locally. I have a couple of questions about the linaro
environment; if this is the wrong forum, I'm happy to take it elsewhere.
I see that Linaro makes monthly releases of the hwpacks and images.
How are the packages/binaries in those images created? Are they
cross-compiled, or compiled natively on the target platform? If they
are cross-compiled, how is the environment created?
The reason I ask is that we've been looking at cross-compiling some
packages ourselves, and have been running into issues. So we were
wondering what toolchain the linaro community uses.
Thanks in advance,
--
Chris Lalancette
Hello all,
I've been playing around with linaro and have it working on my
Pandaboard locally. I have a couple of questions about the linaro
environment; if this is the wrong forum, I'm happy to take it elsewhere.
I see that Linaro makes monthly releases of the hwpacks and images.
How are the packages/binaries in those images created? Are they
cross-compiled, or compiled natively on the target platform? If they
are cross-compiled, how is the environment created?
The reason I ask is that we've been looking at cross-compiling some
packages ourselves, and have been running into issues. So we were
wondering what toolchain the linaro community uses.
Thanks in advance,
--
Chris Lalancette
Hi,
- Finished rewriting SLP analysis to support not only unary and binary
operations. Committed upstream.
- Implemented cond_expr support in SLP (for libav weight_h264_pixels).
Testing it now.
- Vectorizer maintenance (test/bug fixes, patch reviews).
Ira
Testing an initial version of the implementation which estimates
register pressure in SMS on libav micro benchmarks.
I see 20% improvements in mjpegenc microbench and 11% on aacsbr-2 with
SMS. However swscale-rgb24ToY_c
still have spills in the final code although it requires maximum 64
VFP_REGS registers out of the available 64 registers so I'm trying to
understand the reason for the spill.
==Progress===
* Off for one day during the week for Diwali.
* Connect preparation - Wrote down areas to look at during connect and
tried to plan what we
want to look at during connect.
* Looked at some of the cases with vcond<float> with Ira and helped
frame blueprint.
* Investigated one of the big performance regressions in the popular
embedded benchmark
and looked at why it wasn't being vectorized only to realize that it
couldn't be. Thanks
Ira. Still don't know why ARM state is 22% faster than Thumb2 state.
* Looked at the issue with fPIC where GCSE appears to remove a label
for sometime
but not much progress.
=== Plans ===
* Connect ! next week and then vacation.
Absences.
* 26 Oct - Diwali
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked -
* 08 Nov - 11 Nov - Vacation booked
* Dec 19 - 31st Dec - Vacation booked
== 64 bit atomics ==
* I've been building and testing membase
* Version 1.7.1.1 source builds OK (after turning off -Werror due to
some of their curious type naming)
* The git version fails to build - it doesn't seem consistent
* 1.7.1.1 passes simple tests, but there are 3 tests in its test
suite that intermittently fail on ARM and
seem to be solid on x86. (There are also some that just require
timeouts increased due to the
relatively slow machine).
* t/issue_163.t turned out to be a timing race in the test itself,
made worse by being on a relatively slow
machine and probably made worse by the Pandas odd idea of timing.
That was reported to them with
a break down of it, and upstream has fixed their test. (
http://code.google.com/p/memcached/issues/detail?id=230 )
* t/issue_67.t is proving tougher; once in a while memcached will
lock up during init in thread_init;
there is one particular point where adding a printf will make it work
apparently reliably. I've got one
or two ideas but I need to check my understanding of pthread_cond_wait first.
* There is an assert I've seen triggered once - not looked at that yet.
== String routines ==
* While I was off last week, my memchr and strlen were accepted into newlib
* Joseph has responded to my eglibc mail, with a couple of small queries.
== Other ==
* Wrote a more detailed test case for bug 873453 (odd timing
behaviour on panda); it's
quite odd - I can get > ~80ms timing discrepency so it's not a clock
granularity issue.
* Replicated a QEMU crash for Peter.
Dave
Hi,
* finished changing libunwind to be more portable
* tested patchset on ARM and X86_64
* now builds on Android without modifications
(Android.mk, config.h and libunwind-common.h are still required)
* verified that the modified debuggerd still works
* discussed backtracing using libunwind on ARM with Harald from the BSC
* they use libunwind in a sampling tool that generates Paraver
tracefiles
* started to upgrade my Linaro Android environment and ran into issues
* need to check:
* why building the toolchain using linaro-build.sh fails
* why repo sync fails due to invalid platform/bionic SHA1
* what happened to LEB-panda.xml
Regards
Ken
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
== a15-usermode-support ==
* A15 instruction support patches committed upstream in time for
upstream's 1.0 release
== upstream-omap3-cleanup ==
* some work on restructuring the omap3 patchset -- it's now basically
in the right order and the last 'touches several different
bits of code' jumbo patch has been split
== other ==
* sent some patches upstream which address the main things I
want to get into qemu 1.0 (PL041 audio support and fixing a
regression in handling multithreaded programs in linux-user mode)
* A15 KVM planning work and other preparation for Linaro Connect
* finally tracked down the qemu-on-ARM memory corruption: we
mmap the code generation buffer at 0x1000000 with MAP_FIXED;
unfortunately this is now in the middle of glibc's heap...
(filed as LP:883133)
* qemu now has a coroutine implementation which defaults to using
makecontext() if it is present. Unfortunately ARM eglibc provides
an implementation which always returns ENOSYS, which is a bit
tricky to detect with a compile time configure check (without
breaking cross-compilation support).
* these two things (and some other known bugs) mean that QEMU on
ARM hosts is basically broken, and will probably continue to be
since we don't have the spare resource to test and fix bugs
(beyond those which we need to fix for KVM-on-ARM)
* Looked at how to configure Firefox and how to build different parts of the
program. Usage of .mozconfig, myrules.mk and myconfig.mk.
* Tested the Talos framework. https://wiki.mozilla.org/Buildbot/Talos. I
think it would be good to use Talos for the browsing benchmarks. We can
discuss it further at connect.
* Preparing for connect.
Best Regards
Åsa
Summary:
* Exercise crosstool-ng and summarize the gaps.
Details:
* Exercise crosstool-ng
(1) Sync with lp:~linaro-toolchain-dev/crosstool-ng/linaro.
(2) Try to config linux-host-baremental-target an
mingw32-host-baremental-target.
(3) Try to build the toolchain for both embedded toolchain and
linaro-gcc-4.6-2011.10 with the config.
. C compiler for linux and mingw32 hosts and c++ compiler for
linux host can be built without any change.
. C++ compiler for mingw32 host can be built after PCH is disabled.
. GDB-cross build fail due to dependence packages.
* Gaps in crosstool-ng
(1) Improve GDB-cross scripts to download and build the dependence
packages: expat and ncurses. Or put expat and ncurses as
companion_libraries.
(2) To remove dependence, embedded toolchain requires more
prerequisites like zlib.
New config and scripts are required to support the packages.
(3) Currently, the embedded toolchain source packages are released
as a tarball, which includes gcc, gmp, etc. New scripts are required
to support it.
(4) To make sure the toolchain can run with lower version glibc like
redhat4/5, the embedded toolchain requires lower version native
gcc4.3.6 to build it.
To support it,
. Users can build the native gcc manually, or
. Enhance the scripts to add one step to build native gcc.
(5) All the default package configurations are different from
embedded toolchain internal build scripts.
Since the configurations in embedded toolchain had been tuned
and tested, we will change the configurations in crosstool-ng if they
do not match and not configurable.
The same rule will apply for linaro toolchain.
Plans:
* Write scripts to re-pack the embedded toolchain source packages.
* Add the supports for all prerequisites in crosstool-ng menuconfig.
Thanks!
-Zhenqiang
Posted a patch upstream to fix big-endian for generic tuning. This was a
simple omission from my previous patches.
Merged GCC 4.6.2 to Linaro GCC. It's still in testing now, so I'll have
to commit it sometime over the weekend or next week.
Looked at the benchmark results from Spec2000 running on both A8 and A9
systems, with and with NEON, and with various compiler options. Posted
the results in a spreadsheet (visible within Linaro only).
Begun making adjustments to generic tuning and started new spec2k runs
to see if they are beneficial. First, I'm trying A9 prefetch settings on
A8 to see how much damage it does. Next I'll try enabling the A8 NEON
tuning settings on A9 to see what happens there.
Prepared for travel next week.
Vacation Friday
Hi,
- Merged to gcc-linaro:
- widening shifts
- SLP features: support loads with different offsets and swap
operands if necessary
- Started rewriting SLP analysis to support operations with more than
two operands (towards SLP of conditions)
- Updated NEON presentation following Ramana's suggestions (thanks!)
- Suggested to Ramana to implement vcond with mixed types, created a
blueprint: https://blueprints.launchpad.net/gcc-linaro/+spec/vcond-with-mixed-types
- Vectorizer:
- updated vectorizer's webpage
- updated vectorizer's wiki page
- the usual maintenance
- Committed upstream two SLP data-ref analysis improvements: PR 50730
and PR 50819
Ira
Hi there. Connect is just around the corner. Have a look at:
https://wiki.linaro.org/MichaelHope/Sandbox/Q4.11Plans
for a summary of the toolchain sessions and hacking topics.
It would be great to have kernel and OCTO input in the ARM STM driver,
Kernel debugging, and KVM sessions.
-- Michael
Hi Folks,
Draft agenda for the performance meeting next week at Connect -
https://blueprints.launchpad.net/gcc-linaro/+spec/linaro-toolchain-performa…
Are there any topics that people would like to bring up during this
meeting other than the ones listed here ? I suspect that we'll
probably just have about 10-15 minutes for a topic in this case. I am
not considering discussing PGO related stuff in this session given
that we've got another session in which we can discuss this.
Thoughts ?
cheers
Ramana
Hi Folks,
I've been trying to capture what we want to do in terms of hacking
time and some of the performance related backlog that we have in the
system. I have done so here.
https://wiki.linaro.org/RamanaRadhakrishnan/Sandbox/Q411ConnectGCCPerfPlan
I'm on vacation tomorrow but should be picking email for sometime
during the day.
Thoughts about what else we could be doing in this area or if there's
a better way we could use our hacking time.
cheers
Ramana
---
At the moment ARM eglibc doesn't support the functions declared
in ucontext.h: getcontext(), setcontext(), swapcontext() and
makecontext(). Instead you get implementations which always
fail and set errno to ENOSYS.
QEMU uses these functions to implement coroutines. Although there
is a fallback implementation in terms of threads, there are reasons
why using the fallback is suboptimal:
* its performance is worse
* it will be less tested, because x86_64 and i386 both implement
the ucontext functions and so QEMU on those hosts will be using
different code paths
* I'm not aware of a good way at configure time to detect whether
getcontext() et al will always fail without actually running a
test binary, which won't work in a cross-compile setup. (If eglibc
just didn't provide the functions at all this would be much
simpler...)
We're going to care about performance and reliability of QEMU on
ARM hosts as we start to support KVM on Cortex-A15, so it would
be good if we could add ucontext function support to eglibc as
part of that effort.
Opinions? Have I missed some good reason why there isn't an
ARM implementation of these functions?
(I'm aware that the ucontext functions have been removed from
the latest version of the POSIX spec; however AFAIK there's no
equivalent functionality that replaces them so I think they're
still worth having implementations of for parity with other
architectures.)
-- PMM
==Progress===
* Some upstream patch review.
* Spent time looking at LP 836588 which is a case where CSE removes a
particular label access in one case but doesn't remove it from the
list of things in the constant pool which is quite bizarre. Will
probably need some help with looking into this one.
* Sent out vcvt.f32 and vcvt.f64 patches .
* Connect preparation - laptop cleanup and getting it finally onto an
x86_64 distribution.
* Looked at some of the vec_perm / vec_rev cases in Neon with Ira.
* Spent some time looking at some of Andrew's issues with generic-v7a
tuning especially the cases where it was doing better and gave some
suggestions.
=== Plans ===
* Prepare for Connect.
* Prepare by looking at some of the large differences between
various comparative benchmarks.
* Some research into PGO related stuff.
* Try to upstream some more of my patches in the backlog before the
end of the week.
* Finish off some internal paperwork.
* I'm off on 26th - Wednesday.
Absences.
* 26th Oct - Day off.
* 31st Oct - 4th Nov - Linaro Connect Q4.11
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
Continued looking at my constant reuse optimization. I've identified a
couple of hundred optimization opportunities in the whole of gcc itself,
which is fewer than I had hoped. There are almost no opportunities when
compiling for size as constants are always loaded from a constant pool
in that case (I'm not sure why that's the case, given that this isn't
any more space efficient than movw+movt, unless it can share the
constant in more than one place).
Backported my -mtune=native patch to Linaro GCC.
Backported my generic tuning patch to Linaro GCC.
Backported my pr50717 patch to Linaro, and pushed to Launchpad for testing.
Analysed my benchmark results I made to aid generic tuning.
Disappointingly the A8/A9 tuning is not as beneficial as one would like.
In fact, the existing generic tuning patch (which was supposed to be a
framework only) is actually quite competitive and gives better
performance in some cases.
Set more benchmarks running, this time with NEON enabled. That's about
36 hour's worth on A9, and more like 90 hours on my A8 (obviously,
there's some difference in the clock speeds there).
Discovered that my native tuning code won't compile with a C++ compiler
(GCC Bugzilla PR50809). Tested and committed a fix upstream.
== GDB ==
* Worked on support for cross-platform core file generation.
Posted initial set of patches for comments.
* Created "Toolchain support for kernel debugging" blueprint.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
I made some progress on transforming the hacks I did to get libunwind
working on Android into proper patches that can go upstream. Things learned:
* bionic employs OpenBSD header files that therefore lack some GNU and
ARM specific defines (only small fix needed - plan is to change
libunwind to work with non-patched bionic too)
* Android basically provides all the functionality that is required
for libunwind-ptrace - but...
* no one seems to build libunwind with remote unwind functionality
(including libunwind-ptrace) only
* most of the issues can be avoided by changing libunwind to be more
portable
Regards
Ken
* Still having trouble with using multistrap/pdebuild-cross for
cross-compiling Firefox - it looks like only x86 packages get downloaded,
not armel. I have asked Wookey for advice, and he will try to reproduce the
build.
* Falling back to native compiling until the cross-compiling set up has been
sorted out. I will now take a look at how to pass different compiler options
to the Mozilla build system and how to build different parts of the program.
Best Regards
Åsa
(short week: 4 days)
RAG:
Red:
Amber:
Green: blog started :-) http://translatedcode.wordpress.com/
Current Milestones:
|| || Planned || Estimate || Actual ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
== other ==
* upstream patch review, putting together pull requests
* more time spent on qemu on ARM host apparent memory corruption
bug (no luck yet :-(); found a Valgrind bug in the process,
though (KDE:284472). This ate up way too much of this week.
* A15 KVM planning work
* meetings etc
* moved over to patches.linaro for QEMU patch tracking
-- PMM
Hi,
* widening shifts - finally committed upstream
* SLP loads with different offsets and operand swaps - committed upstream
* SLP with multiple types - merged to gcc-linaro-4.6
* vectorizer stuff: patch review, test fixes, discussions, bug fix
* Ramana and I discussed what can be done with VEC_PERM_EXPR for NEON,
and created https://blueprints.launchpad.net/gcc-linaro/+spec/support-vec-perm
for this issue.
Ira
Following on from last night's performance call, I had a look at how
64 bit integer operations are mapped to NEON instructions. The
summary is:
* add - fine
* subtract - fine
* bitwise and - fine
* bitwise or - fine
* bitwise xor - fine
* multiply - can't as the instruction tops out at 32 bits. Might be
able to compose using VMLAL
* div, mod - no instruction
* negate - instruction tops out at 32 bits, but could be turned into
vmov #0, vsub
* left shift constant - missing
* right shift constant - missing
* right arithmetic shift constant - missing
* left shift register - missing
* right shift register - tricky, as you do this as a left shift -register
* not - no instruction, but could be done through a vceq, #0?
* bitwise not - missing
I also noticed that the replicated constants aren't being used. A
pre-increment is load constant pool; vadd but could be done as a vmov,
#-1; vsub. The same with pre-decrement - it could be done as a vmov,
#-1; vadd.
This seems worth blueprinting.
-- Michael