== GDB ==
* Prepared Linaro GDB 7.2-2011-02.0 release
* Committed two patches to implement LP #661253 (Improve
backtrace by using ARM exception tables) to mainline and
Linaro GDB 7.2
* Provided two follow-on fixes to the patch for LP #616000
(Handle -fstack-protector prologue code); both applied to
mainline and Linaro GDB 7.2
* Backported mainline fix for LP #685494
(gdb.xml/tdesc-regs.exp failure) to Linaro GDB 7.2
* Identified root cause of LP #711375 (gdb internal error
in inline_frame_this_id trying to debug a qemu target);
committed fix to mainline and Linaro GDB 7.2
* Worked on re-implementation of fix for LP #615978 (Failure
to software single-step into signal handler)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
RAG:
Red:
Amber: DATE/QEMU conference still hasn't confirmed I have a place...
Green: qemu-linaro RC2 prerelease uploaded
Current Milestones:
| Planned | Estimate | Actual |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | handed off |
* maintain-beagle-models:
+ RC1 packaging flushed out some bugs:
++ we include some binary blobs which we're better not distributing:
https://bugs.launchpad.net/qemu-linaro/+bug/709965
++ a couple of array overruns don't compile with our picky armel gcc:
https://bugs.launchpad.net/qemu-linaro/+bug/709711https://bugs.launchpad.net/qemu-linaro/+bug/711272
(both fixes sent and applied upstream)
++ we ought to include a fix for the "swp" in qemu-lock.h:
http://patchwork.ozlabs.org/patch/81205/
(patch sent upstream, no comment on it yet)
+ ...so I have rolled an RC2 tarball with these fixed
+ investigated why USB keyboard model doesn't work: it turns out
that the OMAP host USB model is basically just a stub
+ some warnings from qemu about bad width accesses are a symptom of
a nasty disagreement between gcc and the kernel about getting atomic
32 bit accesses:
http://www.spinics.net/lists/arm-kernel/msg113002.html
+ warnings from qemu about access to a nonexistent i2c register
appear to be a kernel bug in the i2c driver for OMAP36xx:
https://bugs.edge.launchpad.net/linux-linaro/+bug/645324
+ investigated a hang running images built with linaro-media-create's
--swap_file option. Still digging but I suspect the Linux driver
doesn't cope with an MMC card which can erase in zero time...
* merge-correctness-fixes
+ the usual upstream mailing list monitoring and code review
+ sent patch for PLI and hint space decoding fixes:
http://patchwork.ozlabs.org/patch/81711/
* misc
+ some non-Linaro time this week; may be more next week
+ some advance planning of what we might want to do with QEMU
in the future
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hi,
* I continued to setup the pandaboad but ran into #708883
- tested vaious various hwpacks and headless images
- used different power supplies and several SD cards
- built various test kernels from the linux-linaro-natty.git
* looked into the crash utility
* https://wiki.linaro.org/KenWerner/Sandbox/crash-utility#future%20areas
Regards
Ken
Trying to build qemu on a beagle board:
CC sparc64-linux-user/translate.o
cc1: out of memory allocating 26975704 bytes after a total of 70742016 bytes
# ls -lh target-sparc/translate.c
-rw-r--r-- 1 root root 191K Jan 31 12:05 target-sparc/translate.c
ie gcc wants (at least) 100M of RAM trying to compile a 190K sourcefile.
(and probably more overall since the board has 500MB RAM total and
it hit the out-of-memory condition).
This seems a bit excessive to me, but do we consider it enough of
a bug to be worth looking into?
(I believe this source file has caused compile failures on the buildds
too, which have rather more RAM/swap than my beagle.)
-- PMM
Hello,
Profiling Denbech:
* The profiling information on x86 indicate that some benchmarks might need
to run longer as helper functions such as t_run_test are reported to be
hot.
So I've increased the time each benchmark is executed and will continue to
experiment with that for the problematic benchmarks until I start to see
more reasonable results.
* Opened PR711819 after having an ICE running DENbench with trunk (natively
built on ARM machine).
- I've started to look at an old patch for SMS that was written in 2005 by
colleague; Mostafa Hagog; to place the register moves in free slots.
Currently, they are placed greedily before each def thats needs them.
Thanks,
Revital
Hi,
I continued to work on vect_interleave and vect_extract implementation on NEON:
* debugged the compiler to find out what's the problem with
neon_vzip/vuzp<mode>_internal
* fixed it following Uli's advice
* checked how neon_vzip/vuzp<mode>_internal work for intrinsics by
writing tests
* fixed the patch according to Uli's comments
* now fully testing the patch
Thanks,
Ira
Hi,
I am trying to implement interleave_high/low and extract_even/odd
using vzip and vuzp instructions. I am attaching a patch that attempts
to do that. It uses already existing neon_vzip<mode>_internal. The
problem with it is that it doesn't express the fact that the two
outputs of vzip depend on both inputs, which causes wrong code
generation in CSE:
for
(a,b)<- vzip (c,d)
and
(e,f) <- vzip (g,d)
CSE decides that b==f, since on RTL level b and f depend only on d.
Here is neon_vzip<mode>_internal:
(define_insn "neon_vzip<mode>_internal"
[(set (match_operand:VDQW 0 "s_register_operand" "=w")
(unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")]
UNSPEC_VZIP1))
(set (match_operand:VDQW 2 "s_register_operand" "=w")
(unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")]
UNSPEC_VZIP2))]
"TARGET_NEON"
"vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
[(set (attr "neon_type")
(if_then_else (ne (symbol_ref "<Is_d_reg>") (const_int 0))
(const_string "neon_bp_simple")
(const_string "neon_bp_3cycle")))]
)
Is there a way to properly mark the dependence?
Thanks,
Ira
Hi; this is a note to say that we have now produced a prerelease
tarball of qemu-linaro. (The first formal qemu-linaro release will
happen in sync with other toolchain group releases on 8th Feb.)
This prerelease is primarily to pipeclean the release process and
to allow work to start on producing Ubuntu and Linaro packages;
however it does include a number of useful bugfixes which are
required if you want to be able to boot a recent Linaro snapshot
on the beagle model. So the enthusiastic might like to build it
from source and give it a spin.
Like the Linaro kernel trees, the qemu-linaro tree aims to only
include patches we are confident will go upstream; at the moment
this means the OMAP3 support and ARM correctness fixes from
the qemu-meego tree, based on the qemu upstream trunk.
You can download the source tarball from:
https://launchpad.net/qemu-linaro/+milestone/2011.02
-- Peter Maydell
Hi,
What do people understand to be the expected semantics of IT blocks
in the cases below, of which there has been some confusion
in relation to a recent Qt issue.
The code in question had a sequence something like:
comparison
IT... EQ
blahEQ
TEQ
BEQ
The important bits here are that we have an IT EQ block and two special cases:
1) There is a TEQ in the IT block - are all comparisons in the block
allowed and do their effects immediately take
effect? As far as I can tell this is allowed and any flag changes are
used straight away;
2) There is a BEQ at the end of the IT block, as far as I can tell,
as long as the destination of the BEQ is close it shouldn't
make any difference if the BEQ is included in the IT block or not.
Does that match everyone elses understanding?
Dave
Hi all,
With gas, does anyone know of a way to create a section whose name is
based on that of the current section?
The specific requirement is to be able to define a generic macro like
the example "fixup" below, whose purpose is to record ancilliary data
related to some other section. To illustrate:
.macro fixup
100\@ :
.pushsection fixup<current section name>, "a"
.long 100\@b
.popsection
.endm
.text
...
fixup
.long sym1
...
.section .other, "ax"
...
fixup
.long sym2
The linux kernel uses a technique just like this for patching SMP
kernels at bootup to work on uniprocessor platforms (when
CONFIG_SMP_ON_UP is enabled), resulting in code looking something like
this:
void exit __attribute__ (( __section__ (".text.exit") ))
{
...
asm(
...
FIXUP("something")
...
);
}
Note that the inline asm may actually come out of a generic header
file rather than being explitly written for this invocation. So it
may have to be truly generic.
Is far as I have been able to determine, it's not possible to generate
sections named based on the current section. In practice, the kernel
puts all the fixups into a single section.
The downside of this is that when sections are selectively discarded
at link time (which in general may happen -- for example, Linux
discards the "module exit" code for drivers which are built into the
kernel and therefore never exit) there is no way to selectively
discard the related fixup entries. Currently the only solution is to
include all the module exit code in the image and discard it at
run-time when the kernel boots. This is obviously wasteful.
Attempting to discard that code at like time results in a link error,
since fixups refer to the removed sections.
Of course, the "fixup" macro could be given an extra parameter to name
the containing section, but the macro can then no longer be called in
a generic way: all the calls to that macro must be manually (and
buggily) maintained to ensure that the referenced section name is
correct, some object post-processing must be done before linking,
and/or a tool must be created to implement the missing assembler
functionality. Unfortunately, such solutions are likely to be too
fragile or complex to make it upstream.
It's interesting to note that the same problem will apply for any
section containing ancilliary data for another section. In
particular, it looks like either the ABI or the assembler has had to
grow a special-case workaround for this in order to support exception
unwind information sections generated by .fnstart ... .fnend in a sane
way: the unwind information sections get called .ARM.ex{idx,tab} for
.text, and .ARM.ex{idx,tab}<section> for any other section. As a
consequence, link-time discarding can handle this information
properly, but IMHO this is a bit of a cheat and admits the general
need to create sections with names based transparently on those of
other sections, without satisfying that need. .popsection is also an
example of such a cheat: most other aspects of assmbler state still
cannot be saved and restored.
In general, it would be useful if gas supported some general
reflective abilities: i.e., the ability to query the current assembler
state (section, subsection, active instruction set, active macro mode,
etc.) and/or the ability to wrap or hook existing pseudo-ops. For
example, the above problem would almost certainly solvable using
assembler macros (albeit painfully) if wrapper macros could be defined
for the section manipulation directives (section, .text, .data, .bss,
.pushsection, .popsection, .previous). However, supporting some magic
macro parameters reflecting the assembler state would be a lot
simpler.
As an example of the kind of behaviour I think would be useful, the
macro argument qualifier could be extended to allow macros to query
the assembler state in a backwards-compatible way; something like:
.macro fixup base_section:gas_current_section_name,
old_altmacro:gas_macro_mode
.altmacro
LOCAL fixup_location
fixup_location:
.pushsection \base_section\().fixup
.long 100\@b
.popsection
\old_altmacro
.endm
Existing assembler code will continue to work just fine with this approach.
Note how this also enables a local label to be generated hygenically,
by making it possible to save and restore the macro mode. Otherwise,
.altmacro (and hence LOCAL) is hard to use safely, since the initial
macro mode is unknown and can't be restored.
Any thoughts / comments?
Cheers.
---Dave
== Last week ==
* PR47246, VFP index range on Thumb-2. Submitted and committed patch
upstream.
* Pinged two upstream submissions on gcc-patches, one for PR44557 and
the other a patch for LP:689887; still awaiting approval.
== This week ==
* Chinese New Year Holiday, I'll be off until Feb.8th.
Created a Google docs spreadsheet to help visualise the benchmark
results. The graphs are not very informative yet - too many lines and
too much noise. I'm going to have to revisit them.
Continued trying to build Android. The toolchains build fine, but
Android itself complains about -Werror, and there are a few other real
errors too. Considering I was told it built fine with GCC 4.6 and all I
needed to do was tweak 4.5 to match, I'm not terribly impressed. I'm
sinking too much time into fixing up Android, and I haven't even got to
looking at the compiler trouble. Alexander Sack has said he will try to
get me to a more appropriate starting place (I think), so I'll see what
happens there.
Discussed my maddhidi4 patch with Richard E (wearing his GCC ARM
maintainer hat). He's not convinced that my change won't make something
else produce worse code. I can't prove that it won't either, so I'm
going to have to revisit it.
Wrote up the GCC 4.6 branch policy and upgrade plan.
Just a reminder that the dial-in numbers for today's and all future
calls has changed. See:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings
for the new list. I'll hang out on IRC just before the meeting to
help the lost...
-- Michael
RAG:
Red:
Amber:
Green: qemu-linaro RC0 prerelease uploaded
Current Milestones:
| Planned | Estimate | Actual |
first qemu-linaro release | 2011-01-11 | 2011-01-11 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | handed off |
* maintain-beagle-models:
+ went through diffs between qemu-linaro and qemu-meego to
confirm we hadn't dropped any patches by mistake
+ tested qemu-linaro tree on ubuntu netbook image: boots OK
+ investigated and fixed qemu bugs caused by new x-loader
https://bugs.launchpad.net/qemu-linaro/+bug/704484
and new u-boot:
https://bugs.launchpad.net/qemu-linaro/+bug/703094
+ made merge requests to meego for a CRIS compile failure
and the x-loader bugfix
http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/4http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/5
+ Went through the process of doing a qemu-linaro release
with a "RC0" prerelease as a pipecleaning exercise and
to provide a tarball to slangasek for doing packaging. Download:
https://launchpad.net/qemu-linaro/+milestone/2011.02
Release process writeup:
https://wiki.linaro.org/WorkingGroups/ToolChain/QemuReleaseProcess
* merge-correctness-fixes
+ the usual upstream mailing list monitoring and code review
+ tested and sent meego VQ(R)DMULH.s16 fix upstream:
http://patchwork.ozlabs.org/patch/80725/
+ working on a patch to fix decoding of the preload and hint space
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
SPEC
Tried to track down what was going on with lbm; it doesn't seem to
be repeatable on canis1; I'd previously seen it fail at O1 and work at
O0 and tried to chop down the flags between the two; but after adding
all the flags back in on top of -O0 it still worked and then I tried
-O1 again and it worked. Going to try on another machine, but it
might be uninitialised data somewhere.
Panda
Our panda arrived; it's now happily nestling near our Beagles and
running the 0126 headless snapshot (with 0127 hwpack). It seems fine
except
for rather slow USB and SD IO. Tip: Panda's do absolutely nothing (no
LEDs, no serial console activity) unless you put an SD card
in with the firmware on.
Libffi
Wrote the changes for armhf. Tested on arm, armhf, i386, ppc and
s390x - all happy. (Not too unsuspectingly variadic calls just work
on everything other than armhf without the api change)
Mailed Python CType list asking how much of a pain the API change
will be and any hints on what might be affected.
Awaiting sign off for submission of code.
Optimised library routines
Looked at benchmarking 'git'; I'd seen previous discussions where it
had been pointed out that it spends a lot of time in library routines;
and indeed it does spend useful
amounts in memchr, memcpy and friends on a simple git diff v2.6.36
v.2.6.37 > /dev/null of the current kernel tree produces a useful
~25second run.
One interesting observation is that the variation in the times
reported by 'time' - i.e. user, system and real, the variation in
user+system is much less than either user or
system individually and is quite stable (within ~0.7% over 10 runs).
I've just tried preloading my memchr routine in and it does get a
consistent 1-1.2% improvement which does look above the nice.
Also asked on libc-help list for suggestions as to other benchmarks
people actually trust to reflect useful performance increases in core
routines as opposed to totally
artificial ones.
Dave
== GCC ==
* Determined root cause of #Bug 598462 (corrupted profile info
with -O[23] -fprofile-use), identified work-around (using
-fprofile-correction), and verified on GCC 4.5 and 4.6
== GDB ==
* Worked on re-implementation of fix for #615978 (Failure to
software single-step into signal handler)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== GCC ==
* Internal ARM tasks that kept me busy for some time this week.
* Reworked the divmodsi4 patch based on comments. Still some test
failures that need to be investigated.
* Started working on the performance improvement plan for GCC 4.5.
* Number of patch reviews upstream.
* Looking at A9 memset issue. .
* Got an internal bug report about the 4.4 tools from someone using it
which had to do with a missing backport. This is now lp:709329.
Backported and testing.
* Backported a missing upstream patch into GCC 4.5 for bswap patterns.
* Wrote up on how to cross-test with qemu and updated that on the
Linaro wiki. Comments are welcome.
Plans:
* Find out the default number of iterations needed for steady state
with EEMBC.
* Finish testing fix for lp:709329 and get that committed.
* Finish off divmodsi4 patch and get it's testing completed and
finished.
* Submit backport for bswapsi at Os fix (PR44392) for gcc-linaro
* Finish the plan for GCC performance improvements based on what we
discussed at the
sprint.
* Away 2-1/2 days next week for an internal engineering conference.
This week
=========
- On other IBM duties until today. Now finished!
- Looked at the neon failures that Peter reported. I'm testing patches
for the ICE and the "must be a constant" error now.
- In the background, I've been trying to pin down the chromium build
failure. I can only reproduce it when running under dpkg-buildpackage:
if I run the link line manually, it works. This is reminiscent of a
problem that Dave saw elsewhere: the linker segfaulted only when run
through the normal build system.
Next week
=========
- Bernd Schmidt has committed our combined patch for #695302.
Will backport to our sources.
- Submit the neon fixes above.
- More on STT_GNU_IFUNC. There are some more tests I want to write.
Richard
Some news from the qemu mailing list that I think might be
of interest to gcc folks here:
Christophe Lyon from ST has kindly released a large
set of test cases of Neon intrinsics:
http://gitorious.org/arm-neon-tests/arm-neon-tests
(the tests themselves are more aimed at testing qemu,
so they just produce output to be compared against a
reference generated from running on hardware).
However they don't currently compile with gcc (but
are ok with armcc). From the README:
# The tests currently fail to build with GCC/ARM:
# - no support for Neon_Overflow/fpsrc register
# - ICE when compiling ref_vldX.c, ref_vldX_lane.c, ref_vstX_lane.c
# - fails to compile vst1_lane.c
# - missing include files: dspfns.h, armdsp.h
Maybe it's worth somebody having a look at this,
at least enough to find out whether the ICEs are
things we already know about or have perhaps
already fixed in linaro gcc?
thanks
-- PMM
Hi,
I am working on implementation of interleave_high/low and
extract_even/odd for NEON. The pairs of high/low (even/odd) are
"magically" united into single vzip (vuzp) instruction in the back
end, so there is no need in special support from the tree level. There
are still some test failures that I need to solve.
Ira
Hi,
I've written up a wiki page finally on the cross-testing with qemu
here. It's taken me slightly longer than expected as I was playing
around with moin-moin syntax. These are based on qemu-0.13.0 which is
what I tried when I wrote this up for some testing of a patch that I
was playing with.
https://wiki.linaro.org/WorkingGroups/ToolChain/CrossTestingQemu
Comments are welcome .
Ramana
Hi,
* I looked into the perf utility with regard to ARMv7 and raw event support
* https://wiki.linaro.org/KenWerner/Sandbox/perf
* testsuite fixes for the OpenCL GDB
* started to setup the pandaboard (currently the headless snapshot hangs
shortly after I got the bash prompt - I'm not sure what's going on here)
* On Friday I'll attend a class.
Regards
Ken
Hello,
* Submitted to mainline the patch to model Doloop for ARM
(http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) .
The ARM back-end part was reviews by Richard Earnshaw.
* Looking into EEMBC/DENbench for opportunities applying Modulo-Scheduling.
Based on partial profiling information there are SMSed hot loops. My next
step is to generate complete profiling information (including gcov info)
and execute the benchmarks on ARM machine.
Thanks,
Revital
Hi Vijay,
On Sat, Jan 22, 2011 at 9:59 AM, Vijay Kilari <vijay.kilari(a)gmail.com> wrote:
> Hello Dave,
>
> Thanks for this info.
>
> I have few more queries after looking at the results of memset on A9 & A8.
> I agree that externel bus speed matters in comparision across platforms.
>
> 1) Why memset is performance is good on A8 than A9?. any justification?
I've CC'd the linaro-toolchain list who have been working on this
topic and may be able to provide you with more information.
Cheers
---Dave
Hi there. I've had a think and done a write-up on the causes behind
the 2011.01 release failure and what should be changed. See:
https://wiki.linaro.org/WorkingGroups/ToolChain/Incidents/2011.01-X86_64
The changes involve being more explicit in the release process
document and changing the continuous build to give earlier warning.
Comments are appreciated.
-- Michael
Hello,
I have a patch for ARM that I want to test and I'm not sure what's the
procedure of testing in Linaro nor to where
it should be committed. (GCC trunk? it's currently under bug fixes mode
only).
I appreciate help with that.
Thanks,
Revital
Hello,
could you please provide some comments about the state of "-Os"
(optimising for size) in the gcc 4.5.x versions of Linaro's tool
chain?
It appears there are a number of issues with recent versions of GCC
that get triggered when optimising for size, for example
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45052http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44392
Some other projects like the Linux Foundation driven Poky (resp.
Yocto project) capitulated and stopped using -Os, see for example
here:
http://thread.gmane.org/gmane.linux.embedded.poky/2311/focus=2565
On the other hand, I can see that Linaro even adds improvements for
"-Os", see for example here:
http://thread.gmane.org/gmane.linux.linaro.toolchain/367
So I wonder what the state of these problems with "-Os" is in the
Linaro tool chain? Have these issues been solved, and is "-Os"
reliably working with the Linaro tool chain?
Thanks in advance.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd(a)denx.de
The optimum committee has no members.
- Norman Augustine
I assume we're moving the toolchain calls over to the
new confcall numbers -- but just to check, are we doing
so for this Wednesday's status call? The calendar
entry still has the old numbers...
-- PMM
==GCC==
Progress:
* Got invited to the Toolchain WG and was at the sprint in Dallas.
* Worked on fixing inconsistencies in the A9 scheduler. Now fixed up
upstream and submitted merge request for that.
* Introduced to other members of the Toolchain WG and had
conversations with everyone about what was going on within the
Toolchain WG.
* Spent some time trying to set up a Pandaboard that I borrowed
(Thanks Wookey) but the Panda didn't work. Turns out both the SD cards
were broken and or the versions of the boot loader / firmware were
broken.
* Worked through the patch list with Andrew trying to understand the
patches that exist in the backend for the delta between gcc-linaro and
upstream.
* Did some patch review upstream to help ease the backlog.
* Flushed some patches out from my patch queue upstream.
* Fixed the scheduler issue upstream and submitted a merge request for
it in linaro gcc-4.5 since it really fixes the A9 scheduler.
* Some ARM internal tasks with respect to transition into Linaro.
* Worked on the divmodsi4 issue and testing my patch with trunk.
Pushed my branch into launchpad if someone wants to see the code. Does
the right thing for the case we want but won't help with the SPEC2k
case really because that's a case with Os. But a good size improvement
anyhow and something that will help with divmod cases with 2 variables
rather than where a constant is used.
* Set up a natty chroot on one of our ve2 boxes which is now running a
bootstrap for all languages we are interested in and collecting test
results from trunk upstream.
* Worked through the new starter guide and still have an issue with
accessing wiki.linaro.org
Absences:
February 1-3:ARM Internal engineering conference.
== GCC ==
* Analyzed root cause of #685352 (libplymouth2_0.8.2-2ubuntu6 and
later give ragged splash and text rendering), opened mainline
bugzilla PR rtl-optimization/47299
* Submitted backport merge request for the #685352 fix
* Successfully completed profiled-bootstrap run on ARM
* Investigated profiled pyhton package build problems
== GDB ==
* Forward-ported GDB patch to support ARM hardware watchpoints
* Determined root cause of #615978 (Failure to software single-step
into signal handler), started discussion of proposed fix
* Miscellaneous Launchpad bug maintenance
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
=A9 Qemu=
I've spent most of the week looking at QEmu emulation of SMP A9. The
model (of a realview-pbx-a9) doesn't have any working block IO;
I spent some time looking at trying to get SD working and got part
way, but I fell back to using NFS root.
It seems to work OK for basic CPU emulation, and SMP 'works' in the
sense that the guest sees multiple CPUs;
however QEmu is restricted to only using one host CPU core for
multiple guest CPUs, so it's of limited
help in debugging SMP code.
Video doesn't seem to work either.
To get to that point it does need a bunch of patches to QEmu, most of
which Peter Maydell already knows of;
I've put some notes here :
https://wiki.linaro.org/Internal/People/DaveGilbert/QEMUA9SMP
Note that the realview-pbx doesn't currently have a Linaro hardware
pack; I used a kernel from ARMs website
and a 2.6.37 I built myself.
=SPEC=
SPEC ref got quite a far way through on Canis (with half-duplex
ether), however 'lbm' failed when running
in ref mode (while having worked in test and train) giving a different
output; it takes quite a long time to fail.
= Perf =
I sent an updated version of my patch for perf's thumb annotation upstream.
and apparently we have a Panda on the way.
Dave
RAG:
Red:
Amber:
Green:
Current Milestones:
| Planned | Estimate | Actual |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | handed off |
first qemu-linaro release | 2011-01-11 | 2011-01-11 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
* submitted meego patch upstream to print instruction
boundaries in TCG op-level debug dumps:
http://patchwork.ozlabs.org/patch/79298/
* reviewed a couple of patches by Christophe Lyon from ST
which fix some problems with VMULL and friends
(together with a qemu-meego patch in the same area)
* lots of rebasing work to get a qemu-linaro tree into
shape. Mostly putting together my exploded set of
ARM patches from the meego tree with a rebased set
of omap patches. I have something which I think is about
right but I need to test it.
* patches submitted to qemu upstream:
+ add -version option to qemu-user:
http://patchwork.ozlabs.org/patch/79635/
+ fix writing default vector address in PL190:
http://patchwork.ozlabs.org/patch/79712/
+ print instruction boundaries in TCG op-level debug dumps:
http://patchwork.ozlabs.org/patch/79298/
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
I had a few spare cycles this afternoon on my A9 board so I got it to
build and run 72 variants of CoreMark with the 2011.01 compiler. The
results are here:
https://wiki.linaro.org/MichaelHope/Sandbox/CoreMark1
I haven't looked deeply into it and the usual disclaimer about
benchmarks applies, but there's some interesting stuff there. The ARM
to Thumb-2 drop is more than expected. Tuning for Cortex-A9 generally
makes things worse.
-- Michael
Hi,
One of the things that came up in yesterday's chat was about ARM vs
Thumb2. If some folks are interested in a high level overview that
doesn't go into too many details with respect to the ISA that I know a
compiler writer will be interested in you could read this presentation
unless you've seen it. At the minute the canonical way of figuring out
more about Thumb2 is really to read the ARM-ARM.
https://wiki.ubuntu.com/Specs/M/ARMGeneralArchitectureOverview?action=Attac…
This was a presentation that Dave Rusling gave at the UDS in Belgium
last year, it roughly covers the architecture at a high level and
talks about Thumb2 vs ARM , why Thumb2 , the evolution of the ARM
architecture etc. It's not a lot of detail but quite a high level overview.
HTH
cheers
Ramana
Hi,
* finished SLP for reduction patch. The loop in DenBench that needs
this feature also requires support of load permutation. I am
considering to implement that too. I looked for other occasions that
need this feature, but only found loops that are not vectorizable. So,
I am not sure I'll proceed in this direction.
* looked into extract_even/odd and interleave_high/low implementation
on ARM as a backup plan for the case we don't have special load/store
support on time for the next release. Even though NEON VZIP and VUZP
instructions can perform both even and odd (and high and low)
computations simultaneously, I don't see how we can express that at
the tree level.
* looking into ffmpeg
* non-Linaro issues
Ira
Hi ,
So rather than manually backporting a patch from upstream "trunk" I
decided to try using bzr for this yesterday. Even though this was a
slow process and thanks to some help from Michael on IRC last night I
think I got it right finally. However I can't get bzr visualize to
show me the revision history correctly with the merge from the
upstream repository.
So what I did was to do the following
bzr branch 4.5 new-branch
cd new-branch
bzr merge -c XXX lp:gcc where XXX is the bzr revision number of the
svn commit that I'm tracking upstream.
gcc/Changelog merge fails - there is a conflict. Expected because you
are merging trunk's gcc/Changelog into a version of the Changelog file
in our tree
Thus I reverted changes that were brought in by copying in a backup of
gcc/Changelog that I had made before the merge
bzr resolved gcc/Changelog
edit Changelog.linaro
Add changelog entry
bzr commit
bzr push lp:gcc private repository
Create merge request upstream.
Does this look sensible to people or do folks follow other recipes?
cheers
Ramana
Hi there. I've cancelled Monday's meeting as the CodeSourcery people
are at their annual meeting and others are travelling back from the
sprint. Please come to the Wednesday stand up call if you are
available.
-- Michael
Got a complete run of SPEC Train on Orion board - all working.
Kicked off a SPEC ref run - canis1 died.
Gathered a full set of 'perf record's for all of SPEC on silverbell;
and had a quick look through them;
there aren't too many surprises; a few things that might be worth a
look at though.
(Not as much using libc functions as I hoped).
There are some odd bits - chunks of samples landing apparently outside libraries
that aren't obvious what's going on.
Sent tentative patch for Thumb perf annotate issue (bug 677547) to
lkml for comments.
Started on libffi variadic fixing.
Caught the qemu pbx-a9 testing from PM; got qemu built and getting a
handful of lines of output
both from a kernel from arm.com's site and a linaro-2.6.37 that I built for it.
Dave
RAG:
Red:
Amber:
Green: issues I wanted to nail at this sprint all handled;
a couple of blueprints handed off to other people
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | |
At the Linaro/Ubuntu sprint/rally in Dallas this week
* qemu-continuous-integration
** spoke to Paul Larson (in Linaro Validation team) and
handed this blueprint off to him (with a clarification
of the requirements from qemu's point of view)
* maintain-beagle-models
** we've agreed to start doing official "Linaro Qemu"
releases which are essentially going to be the meego
tree plus point fixes for things. These will be every
month with the rest of the toolchain group releases.
Ubuntu will also take these releases to replace the
current use of the qemu-kvm tree to provide ARM models
* verify-a9-pbx-support
** this blueprint has been handed off to David Gilbert
* merge-correctness-fixes
** wrote and posted patches:
*** to restore IT bits after unexpected exceptions
(including fix of Linux usermode bug where it wasn't
clearing the IT bits when entering a signal handler)
*** to include opcode hex in disassembly of ARM insns
*** fixing a compile failure for a previous change when
the host linux system didn't have linux/fiemap.h
** I have managed to split the huge "lots of ARM TCG fixes"
commit in the meego tree up into 65 more self-contained
commits. These now need reordering and possibly some
may be recombined.
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
2011: Holiday 21 Jan, 22 Apr - 2 May.
Hi there. Unfortunately the Linaro GCC 4.5 2011.01-0 release has a
failure in the x86_64 compiler causing it to fail during the initial
build. We're working on triaging the problem at the moment.
ARM and i386 targets are not affected. I'll send a new announcement
when the problem has been fixed and the replacement 2011.01-1 release
is available.
-- Michael
The Linaro Toolchain Working Group is pleased to announce the release
of both Linaro GCC 4.4 and Linaro GCC 4.5.
The Linaro Toolchain Working Group is pleased to announce the release of
both Linaro GCC 4.4 and Linaro GCC 4.5.
Linaro GCC 4.4 is the sixth release in the 4.4 series. Based off the
latest GCC 4.4.5, it is a maintenance release that fixes one problem
found through use.
Linaro GCC 4.5 is the sixth release in the 4.5 series. Based off the
official GCC 4.5.2 release, it includes many ARM-focused performance
improvements and bug fixes.
Interesting changes include:
* Improved optimization of multiple load instructions, and
multiply-and-accumulate.
* -fshrink-wrap optimization for better use of function prologues and
epilogues.
* plus, various other bug fixes, and minor improvements.
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.01-0
and
https://launchpad.net/gcc-linaro/+milestone/4.4-2011.01-0
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
Hi,
* Continued with testing and implementation of reduction support in SLP
* Found a major problem in vectorization of if-converted data
accesses. Looked into other ways to solve the problem.
* Spent some time on non-Linaro vectorization plans
* Unsuccessfully tried to make the board work
Ira
== Last Week ==
* Displaced stepping support for 32-bit Thumb insns are done, unless
new bugs are found.
Decode writeback for str/ldr in Thumb 32-bit. Fix bugs in supporting
LDR/STR/LDRH/STRH in both ARM and Thumb.
Test GDB to handle various 32-bit Thumb insns in displaced stepping.
Code cleanup and write some new test cases.
* One day New Year holiday on Jan. 3rd.
== This Week ==
* Linaro Sprint.
* Get patches for Linaro GDB bugs approved on upstreams as many as
possible.
--
Yao Qi
Hello,
While testing SMS on Crotex-A9 I see that the latency of load instruction
is 1
cycle when compiling with -mcpu=cortex-a9 -mthumb -mtune=cortex-a9 -O3.
Below is a snippet from the SMS dump file showing the DDG, created for the
loop in foo function, which depicts the edge between the load of input[i]
(insn 181) and the mult instruction (insn 184).
[181 -(T,1,0)-> 184] is the true dependence edge created between the
two insns; with latency of 1.
On Crotex-A8 the latency of the load is 3 as expected.
I've read in crotex-a9.md file that loads should have a latency of 4 cycles
so I just wanted to check if I should have used other combination of flags
for Crotex-A9 or the load latency should indeed be of 1 cycle here.
Thanks,
Revital
int foo (int max, signed short *input, int y)
{
int i, accum;
for (i = 0; i < max; i++) {
accum += (signed int) input[i] * (signed int) input[i+y];
}
return accum;
}
The snippet from the DDG:
Node num: 2
(insn 181 178 184 13 (set (reg:SI 216 [ D.2019 ])
(zero_extend:SI (mem:HI (plus:SI (reg:SI 319 [ ivtmp.34 ])
(reg:SI 345)) [2 MEM[base: D.2076_257, index:
D.2079_226, offset: 0B]+0 S2 A16]))) tmp.c:7 714
{*thumb2_zero_extendhisi2_v6}
(nil))
OUT ARCS: [181 -(A,0,1)-> 176] [181 -(T,1,0)-> 184]
IN ARCS: [184 -(A,0,1)-> 181] [176 -(T,1,0)-> 181]
Node num: 3
(insn 184 181 234 13 (set (reg/v:SI 209 [ accum ])
(plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 212
[ D.2013 ]) 0))
(sign_extend:SI (subreg/s/u:HI (reg:SI 216 [ D.2019 ]) 0)))
(reg/v:SI 209 [ accum ]))) tmp.c:7 64 {maddhisi4}
(expr_list:REG_DEAD (reg:SI 216 [ D.2019 ])
(expr_list:REG_DEAD (reg:SI 212 [ D.2013 ])
(nil))))
== Last Week ==
* Worked to improve libunwind test suite results using new ARM-specific
unwinding. Managed to reduce the number of failures, but some tests
still fail.
* Tested latest ltrace upstream tree and determined that It Is Good.
Hopefully, this marks the cusp of a new release that includes my work.
== This Week ==
* Finish up work with libunwind. Update wiki brain dump.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743