Hi,
* regtested vzip/vuzp patch
* looked into big-endian build
* applied all the required patches and checked that Viterbi gets
vectorized, giving a ~2x performance improvement (compiled with a
cross-compiler)
* looked into vld/vst implementation - mostly discussions with Richard
* DenBench analysis:
- there are loops that should get vectorized with the vzip/vuzp patch;
I'll check them next week
- sad8_c (a hot function from mp4encode) needs reduction SLP (which I
implemented several weeks ago) and the ability to handle an unknown
stride in loop SLP; I am looking into this
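For readers unfamiliar with the kernel, here is a hypothetical sad8-style loop. It is a sketch, not the DenBench source, which is not shown here. It illustrates the two requirements mentioned above: the inner loop sums into a single accumulator (the reduction that reduction SLP has to handle), and between rows the pointers advance by a `stride` known only at run time.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sad8-style kernel (not the DenBench source): sum of absolute
   differences over an 8x8 block of bytes. */
static unsigned sad8(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int row = 0; row < 8; row++) {
        /* 8 contiguous bytes per row, all added into one accumulator:
           this is the reduction. */
        for (int col = 0; col < 8; col++)
            sum += (unsigned)abs(a[col] - b[col]);
        /* Rows are `stride` bytes apart; the stride is unknown at compile time. */
        a += stride;
        b += stride;
    }
    return sum;
}
```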
Ira
On Wednesday 09 February 2011 20:25:32 Will Deacon wrote:
> > - Why is architecture 0x4 not supported? This seems to be the variant of
> > the v7 debug architecture with memory-mapped registers. Apparently the
> > IGEP only supports this version ... Do you know what the
> > Beagle-/Pandaboard and other clones do? What would it take to support this
> > architecture variant? Given the widespread use of those boards, it would
> > be really nice if we could support hardware debugging on them ...
>
> The memory-mapped interface is hugely unreliable in real hardware because
> you have to calculate the address of the memory-mapped debug registers by
> using a base and offset, which are hardcoded in some information registers.
> Unfortunately, I've never found a board where these registers have been
> programmed correctly so (a) I had nothing to test my code with (b) few people
> would be able to use it and (c) there's not really a safe way to go around
> poking random areas of memory.
So the only problem is that it's board specific? That's something we
know how to deal with -- all I/O components have some random board
specific address, and we put them in a platform device that is
listed in the board file. This should be easy enough to do for another
register area, though it means we have to do it separately for each board.
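As a rough illustration of the base-plus-offset calculation Will describes, here is a sketch that assumes the ARMv7 encodings of DBGDRAR (the debug ROM address) and DBGDSAR (the signed offset of the debug registers from that ROM address): bits [31:12] hold a 4KB-aligned value, and bits [1:0] read as 0b11 when the register is valid. The function name is hypothetical. The validity bits cannot catch a value that firmware has set to something plausible but wrong. That limitation is why an address supplied by the board file, as suggested above, would be the safer source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, assuming the ARMv7 DBGDRAR/DBGDSAR encodings:
   bits [31:12] = 4KB-aligned address/offset, bits [1:0] = 0b11 if valid. */
static bool debug_regs_base(uint32_t dbgdrar, uint32_t dbgdsar, uint32_t *base)
{
    /* Refuse to compute an address from registers marked as not valid. */
    if ((dbgdrar & 3u) != 3u || (dbgdsar & 3u) != 3u)
        return false;
    /* The offset is signed; 32-bit unsigned wraparound yields base + offset. */
    *base = (dbgdrar & 0xFFFFF000u) + (dbgdsar & 0xFFFFF000u);
    return true;
}
```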
> > - Which hardware is supported? Can you recommend a board I should be
> > using to verify GDB support is working?
>
> The simple rule is Cortex-A8 is unsupported and Cortex-A9 is supported.
> The A5 should work (untested) and the A15 will need a bit of hacking to
> get it supported.
Is that because A8 is memory mapped and A9 uses CP14, or is there another
problem with A8?
Arnd
Hello Will,
I've been trying to get GDB support for hardware watchpoints/breakpoints
going. I've ported Matthew's GDB patch to current mainline, and am running
this under a 2.6.37-1002-linaro-omap kernel on an IGEPv2 board.
However, something seems to be not quite working: I'm seeing this kernel
message on boot:
hw-breakpoint: debug architecture 0x4 unsupported.
and then at runtime, the result of a PTRACE_GETHBPREGS call for register 0
is 0x04000106:
debug architecture: 4
watchpoint size: 0
nr. watchpoints: 1
nr. breakpoints: 6
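The four fields above are the four bytes of the 32-bit value, most significant first. Here is a minimal decoding sketch that assumes exactly that layout. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of the PTRACE_GETHBPREGS register-0 resource word, assuming the
   byte layout implied by the decoding above (most significant byte first). */
struct hbp_info {
    uint8_t debug_arch; /* bits 31..24 */
    uint8_t wp_len;     /* bits 23..16: maximum watchpoint size */
    uint8_t num_wrps;   /* bits 15..8: number of watchpoints */
    uint8_t num_brps;   /* bits 7..0: number of breakpoints */
};

static struct hbp_info decode_hbp_info(uint32_t reg)
{
    struct hbp_info info;
    info.debug_arch = (uint8_t)(reg >> 24);
    info.wp_len     = (uint8_t)(reg >> 16);
    info.num_wrps   = (uint8_t)(reg >> 8);
    info.num_brps   = (uint8_t)reg;
    return info;
}
```

A debugger reading this word can treat `wp_len == 0` as "no usable watchpoints", which is how GDB ends up ignoring the single reported watchpoint.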
This leads me to a couple of questions:
- It seems odd that the kernel says it doesn't support the debug
architecture, but then reports to user space that 1 watchpoint and 6
breakpoints are supported ... GDB will never use the watchpoint, because
the maximum watchpoint size is reported as zero, but GDB will attempt to
use the breakpoints. Setting a breakpoint will appear to succeed, but then
the breakpoint just never triggers. The kernel should IMO be more
consistent in how unsupported configurations are handled ...
- Why is architecture 0x4 not supported? This seems to be the variant of
the v7 debug architecture with memory-mapped registers. Apparently the
IGEP only supports this version ... Do you know what the
Beagle-/Pandaboard and other clones do? What would it take to support this
architecture variant? Given the widespread use of those boards, it would
be really nice if we could support hardware debugging on them ...
- Which hardware *is* supported? Can you recommend a board I should be
using to verify GDB support is working?
Thanks for your help in getting this working!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
I'm working in the Linaro toolchain team on adding ARM support for GNU
indirect functions (STT_GNU_IFUNCs). The indirect function feature
requires a new relocation type, which is typically called R_FOO_IRELATIVE.
I'd therefore like to propose a new R_ARM_IRELATIVE relocation type for
the ARM EABI.
This relocation is only used in ET_EXEC and ET_DYN objects. If the
object has a PT_DYNAMIC program header, then the relocation may only
appear in the DT_REL(A) table; it cannot appear in the DT_JMPREL table.
(Note that this is a deliberate divergence from the x86 and x86_64
behaviour, which does allow the IRELATIVE relocation to be used in the
DT_JMPREL table, but requires it to be applied at load time,
regardless of bind-now vs. lazy semantics. However, the proposed
ARM behaviour matches that of other targets such as PowerPC.)
Static ET_EXEC objects may have R_ARM_IRELATIVE relocations. In this
case, the relocations are stored in a relocation table that contains no
other type of relocation (not even R_ARM_NONE). The static linker
defines two symbols:
__rel_iplt_start, which the linker points to the start of this table
__rel_iplt_end, which the linker points to the last byte of this table
plus one.
The two symbols are equal if the executable has no R_ARM_IRELATIVE
relocations. It is the executable's responsibility to apply these
relocations as appropriate. If the static linker emits a symbol table,
then it is not defined whether the linker includes __rel_iplt_start and
__rel_iplt_end in that symbol table.
The static linker may (or may not) define __rel_iplt_start and
__rel_iplt_end in dynamic objects. However, if it does define them,
the symbols must refer to part of the DT_REL(A) table, and it is still
the dynamic linker's responsibility to apply the relocations.
An R_ARM_IRELATIVE relocation applies to all bits of a 4-byte field.
There are no alignment restrictions on the field. The relocation
value is:
call(B(S) + A)
where call(X) represents the value of r0 after performing an indirect
branch-with-link-and-exchange (BLX) to address X.
The dynamic linker must have applied all earlier DT_REL(A) relocations
before calling X. It is undefined whether later DT_REL(A) relocations
have been applied or not, and X must not make any assumptions about the
status of those relocations.
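To make the static-executable case concrete, here is a simplified, hypothetical model of the loop an executable might run over the table from __rel_iplt_start to __rel_iplt_end. It makes three assumptions. Each entry stores only the field's address (r_offset). The addend A is the field's initial contents, as for REL relocations. The field is pointer-sized rather than 4 bytes, so the sketch also runs on 64-bit hosts. Applying entries in table order satisfies the rule that all earlier relocations are applied before X is called.

```c
#include <stdint.h>

/* Hypothetical, simplified model of applying R_ARM_IRELATIVE relocations.
   In a real static executable, start/end would be __rel_iplt_start and
   __rel_iplt_end; here they are ordinary arrays. */
typedef uintptr_t (*irel_resolver)(void);

struct irel_entry {
    uintptr_t *where; /* address of the relocated field (r_offset) */
};

static void apply_irelative(const struct irel_entry *start,
                            const struct irel_entry *end,
                            uintptr_t base /* B(S) */)
{
    for (const struct irel_entry *r = start; r != end; r++) {
        uintptr_t target = base + *r->where;           /* B(S) + A */
        irel_resolver resolver = (irel_resolver)target;
        *r->where = resolver();                         /* call(B(S) + A) */
    }
}

/* Example ifunc: the resolver returns the address of the chosen implementation. */
static int add_generic(int a, int b) { return a + b; }
static uintptr_t add_resolver(void) { return (uintptr_t)&add_generic; }
```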
If there is an R_ARM_IRELATIVE relocation with symbol S and addend A,
then the relocation value:
call(B(S) + A)
is considered to be a load-time constant. It is possible for an object
to have more than one R_ARM_IRELATIVE relocation with the same value
of B(S) + A, and in such a case, it is not defined whether the dynamic
linker invokes the target function each time, or whether it caches the
results of earlier calls.
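Because the value is a load-time constant, a dynamic linker that chose to cache results might do something like this sketch. All names and the fixed-size cache are illustrative assumptions. Code that relies on its resolver being invoked once per relocation would break under such a linker, which is why the behaviour is left undefined.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t (*irel_resolver_fn)(void);

/* Hypothetical memoising lookup keyed by B(S) + A. */
#define IREL_CACHE_SIZE 16

struct irel_cache {
    size_t count;
    uintptr_t target[IREL_CACHE_SIZE]; /* B(S) + A */
    uintptr_t value[IREL_CACHE_SIZE];  /* call(B(S) + A) */
};

static uintptr_t irel_call_cached(struct irel_cache *c, uintptr_t target)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->target[i] == target)
            return c->value[i];                 /* reuse the earlier result */
    uintptr_t v = ((irel_resolver_fn)target)(); /* first time: call resolver */
    if (c->count < IREL_CACHE_SIZE) {           /* remember it if there is room */
        c->target[c->count] = target;
        c->value[c->count] = v;
        c->count++;
    }
    return v;
}

/* Example resolver that counts how often it is invoked. */
static int resolver_calls;
static uintptr_t counting_resolver(void) { resolver_calls++; return 42; }
```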
I realise this isn't the cleanest extension in the world. As Alan Modra
noted on the binutils list, the choice of __rel_iplt_start and __rel_iplt_end
is particularly unfortunate, since the relocations are not specific to
"PLTs". However, the GNU extension has been defined this way,
so unfortunately there isn't much room for target-specific variation.
Thanks,
Richard
Hi,
I'd like to check vzip/vuzp patch in big endian mode. But when I try
to compile with -mbig-endian flag, I get
> ~/mainline/bin/bin/gcc -O3 -mfloat-abi=softfp -mfpu=neon neon-vtrnu8.c -mbig-endian
/home/irar/mainline/bin/lib/gcc/armv7l-unknown-linux-gnueabi/4.6.0/../../../libgcc_s.so.1:
could not read symbols: File in wrong format
collect2: ld returned 1 exit status
What am I missing?
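One way to confirm what the linker is complaining about is to check the byte order recorded in the library's ELF header. `readelf -h` reports it on its "Data:" line. "File in wrong format" is consistent with a little-endian libgcc_s.so.1 (from the default armv7l installation) being picked up for a big-endian link, though that is a guess. Here is a minimal C sketch of the same check, using glibc's <elf.h>. The function name is made up.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns ELFDATA2LSB (little-endian) or ELFDATA2MSB (big-endian) as recorded
   in e_ident[EI_DATA], or ELFDATANONE if buf is not an ELF identification. */
static int elf_data_encoding(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return ELFDATANONE;
    return buf[EI_DATA];
}
```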
Thanks,
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.2.
Linaro GDB 7.2 2011.02-0 is the third release in the 7.2 series. Based
on the latest GDB 7.2, it includes a number of ARM-focused bug fixes
and enhancements.
Interesting changes include:
* Backtraces are more reliable, as the ARM-specific exception tables
are now used for unwinding
* Better support for debugging functions compiled with GCC's -fstack-protector
* Multiple testsuite related fixes
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.2-2011.02-0
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
-- Michael
The Linaro Toolchain Working Group is pleased to announce the release
of both Linaro GCC 4.4 and Linaro GCC 4.5.
Linaro GCC 4.5 is the seventh release in the 4.5 series. Based on the
latest GCC 4.5.2, it includes many ARM-focused performance
improvements and bug fixes.
Interesting changes include:
* Improved code generation in the __sync primitives
* Better modelling of the Cortex-A9 NEON pipeline
* Added a performance improvement that converts a tree of ifs into a switch
* Many bug fixes
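The note does not say which patterns the if-to-switch conversion recognises. As a rough illustration, it presumably targets code like the first function below, where one variable is compared against several constants. The conversion rewrites it into the equivalent of the second function, which GCC can then lower to a jump table or a balanced decision tree. Both functions are made-up examples.

```c
/* Illustrative only: a chain of equality tests on a single variable... */
static int classify_if(int op)
{
    if (op == 1)
        return 10;
    else if (op == 2)
        return 20;
    else if (op == 5)
        return 50;
    else if (op == 7)
        return 70;
    return -1;
}

/* ...and the equivalent switch, which the compiler can lower more cheaply. */
static int classify_switch(int op)
{
    switch (op) {
    case 1: return 10;
    case 2: return 20;
    case 5: return 50;
    case 7: return 70;
    default: return -1;
    }
}
```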
Linaro GCC 4.4 is the seventh release in the 4.4 series. Based on the
latest GCC 4.4.5, it is a maintenance release that fixes one fault
found with offsets on NEON loads.
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.02-0
https://launchpad.net/gcc-linaro/+milestone/4.4-2011.02-0
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
-- Michael
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro QEMU 2011.02-0.
Linaro QEMU 2011.02-0 is the first official release of qemu-linaro. Based
on upstream qemu, it includes a number of ARM-focused bug fixes and
enhancements.
- This initial qemu-linaro release includes all the ARM code generation
fixes from the qemu-meego tree; these are mainly Neon related
- The OMAP3 support from qemu-meego is also included
- Various bugs which prevented newer Linaro snapshots from booting
on the beagle model have been fixed
- Bugs causing linaro-media-create to print warnings about unimplemented
syscalls and ioctls have been fixed
Known issues:
- There is no support for USB keyboard or mouse, so only a serial console
is usable (#708703)
- Images built with linaro-media-create's --swap_file option will not
boot (#713101)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.02
Binary builds of this qemu-linaro release are available for users of
Ubuntu. Natty users can find qemu-linaro 2011.02-0 in the Ubuntu archive.
Users of Ubuntu 10.04 LTS and Ubuntu 10.10 can find packages in the
linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
== Linaro GCC 4.5 ==
Reviewed, tested and merged all the outstanding patches waiting to go
into Linaro GCC 4.5. Michael reported that there was a build failure on
i686 and amd64. I attempted to reproduce this but my builds completed
successfully - very strange. Eventually I found that I had a corrupted
checkout and managed to reproduce the problem - thanks bzr! The problem
is in Tom's recent changes to stmt.c, so I informed him and backed out
the patches, temporarily.
Spun the Linaro GCC 4.4 and 4.5 release tarballs and passed them to
Michael Hope for final testing.
== GCC 4.6 ==
Tested a more recent version of GCC 4.6 and pushed it to the bazaar
repository. Already out of date by the time testing finished of course,
but never mind. The number of test failures is greatly reduced. Started
another build/test with an even more up-to-date check-out.
Began work merging the 4.5 patches into 4.6. Pushed one patch upstream.
Got another ready to go, once I've tested it.
== Android ==
Tried to unpick a large patch I was sent that supposedly added Android
support to Linaro GCC 4.5. The patch was suspicious from the start
because it had large changes to gcc/ChangeLog that clearly backed out
the 4.5.2 release. After comparing it against various sources I
concluded that it was a 4.6 snapshot from last May with (at least some
of) the Linaro patches forward ported, and the release numbers fudged to
look like it was 4.5.2 based. This was not terribly helpful - I can't
very well backport that into our 4.5 branch!
== Upstream GCC ==
Upstream patches requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* Kazu's VFP testcases:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00128.html
== Last week ==
* Backported the fixes for lp693502, lp710623 and lp710652 to linaro 4.6
and linaro 4.5. Tested and sent merge requests.
* Wrote several more ifunc tests, and fixed the bugs they showed up.
Found that ARM generates unnecessary dynamic relocs against GOT entries,
so fixed that as a prerequisite. Improved the tracking of STB_LOCAL
ifuncs, so that they're treated more like STB_GLOBAL.
* Submitted a request for R_ARM_IRELATIVE to be added to the ARM EABI.
== This week ==
* More ifunc.
I'm away next week (14th-18th)
Richard
Hello,
Matthias noticed the following ICE when attempting to build the SPU
compiler from the Linaro GCC 4.5 sources:
../../../../src-spu/libgcc/../gcc/libgcc2.c: In function '__fixunssfdi':
../../../../src-spu/libgcc/../gcc/libgcc2.c:1344:1: internal compiler error: in spu_expand_mov, at config/spu/spu.c:4575
It turns out that this is due to the new "extension elimination" pass that
was recently added to Linaro GCC as a port from the CodeSourcery compiler.
This patch has also been proposed upstream, but not yet included there.
The problem is that this patch seems to frequently introduce instructions
that *set* a sub-word lowpart subreg of a register. Now such
instructions, according to the docs, are probably valid RTL, but since the
effect of the instruction onto the highpart of the register is deliberately
left unspecified, they tend to be very infrequently used. Probably
because of this, there seem to be parts of the compiler that simply don't
handle such instructions correctly. This has been already noticed in the
case of the RTL loop optimizers (see discussion here
http://gcc.gnu.org/ml/gcc/2010-11/msg00552.html).
The failure in the SPU back-end is another instance of the same problem.
SPU needs special code to handle subregs (since a "lowpart" SImode subreg
of a DImode register is not actually valid on the SPU, because SImode
values live in bytes 0..3 while DImode values live in bytes 0..7 of the
otherwise big-endian 16-byte SPU registers), and this code simply aborts
when given an assignment to a sub-word lowpart subreg.
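A small C model of that layout may help. It treats the register as a plain byte array and is not SPU back-end code; the names are made up. It stores a DImode value in bytes 0..7 and shows that the SImode slot (bytes 0..3) holds the high half, while the low 32 bits that a lowpart subreg denotes sit in bytes 4..7. Reading the lowpart therefore needs data movement, and setting it would mean writing bytes 4..7 in place, which is the case the move expander does not handle.

```c
#include <stdint.h>
#include <string.h>

/* Model of a 16-byte big-endian SPU register as a byte array. */
static void put_dimode(uint8_t reg[16], uint64_t v)
{
    memset(reg, 0, 16);
    for (int i = 0; i < 8; i++)
        reg[i] = (uint8_t)(v >> (56 - 8 * i)); /* DImode: bytes 0..7 */
}

/* An SImode value occupies bytes 0..3 of the register. */
static uint32_t get_simode(const uint8_t reg[16])
{
    return (uint32_t)reg[0] << 24 | (uint32_t)reg[1] << 16 |
           (uint32_t)reg[2] << 8 | (uint32_t)reg[3];
}

/* The low 32 bits of the DImode value live in bytes 4..7 instead. */
static uint32_t dimode_lowpart(const uint8_t reg[16])
{
    return (uint32_t)reg[4] << 24 | (uint32_t)reg[5] << 16 |
           (uint32_t)reg[6] << 8 | (uint32_t)reg[7];
}
```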
Now, I guess there are two ways forward: either the outcome of the ongoing
discussions on gcc-patches is that it is in fact not a good idea to
generate such sets, and the EE pass is subsequently rewritten to avoid
them; or, if those instructions are considered valid, I'll have to
extend the SPU move expander to handle them. Thoughts?
Matthias, if you need a quick workaround for now, I guess you could disable
the new pass for SPU by adding a line "flag_ee = 0;" to
spu_override_options.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
I've had a go with running the QEMU release candidate. Short story is
that it boots to a prompt against the 11.05 alpha2 release so I'm
happy.
It was a messy road, so I've written up my train of thought here:
https://wiki.linaro.org/MichaelHope/Sandbox/QEMU
Note that if you follow the instructions on:
https://wiki.linaro.org/Releases/GettingInstallingTesting
and turn on a swap file then it halts during boot.
-- Michael
== String routines ==
* After some discussions about IT semantics, managed to shave a
couple of instructions out of a couple of routines
* Got around to trying a suggestion made some months ago that LDM
is faster than LDRD on A9s; it does indeed seem to be in some cases,
although those cases are pretty hard to pin down. Since LDM is no
slower than LDRD, it seems best to avoid LDRD.
* Digging around eglibc's build/configure system to see how to add
assembler routines that only get used under certain build
conditions (i.e. v7 and up)
== SPEC ==
* Compiled lbm at -O2 and ran it on our local Panda and on Michael's
ursa1; it seems happy (with a drop of swap), so I'd say that
confirms the issues I previously had were local to something on canis.
That's a bit of a pain, since it's the only machine with enough
RAM to run the rest of the suite.
== Other ==
* Tested a headless Alpha-2 install on our Beagle C4 - mostly worked
* Tested qemu-linaro release on the realview-pbx kernel/nfs setup I had
* A simple smoke test for pldw on qemu
* Tripped over ltrace not working while trying to profile git's use
of memcpy and memcmp; git does some _very_ odd things: its
predominant memcpy size seems to be 1 byte.
== GDB ==
* Prepared Linaro GDB 7.2-2011.02-0 release
* Committed two patches to implement LP #661253 (Improve
backtrace by using ARM exception tables) to mainline and
Linaro GDB 7.2
* Provided two follow-on fixes to the patch for LP #616000
(Handle -fstack-protector prologue code); both applied to
mainline and Linaro GDB 7.2
* Backported mainline fix for LP #685494
(gdb.xml/tdesc-regs.exp failure) to Linaro GDB 7.2
* Identified root cause of LP #711375 (gdb internal error
in inline_frame_this_id trying to debug a qemu target);
committed fix to mainline and Linaro GDB 7.2
* Worked on re-implementation of fix for LP #615978 (Failure
to software single-step into signal handler)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
RAG:
Red:
Amber: DATE/QEMU conference still hasn't confirmed I have a place...
Green: qemu-linaro RC2 prerelease uploaded
Current Milestones:
                             | Planned    | Estimate   | Actual     |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 |            |
Historical Milestones:
finish virtio-system         | 2010-08-27 | postponed  |            |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | handed off |
* maintain-beagle-models:
+ RC1 packaging flushed out some bugs:
++ we include some binary blobs which we'd be better off not distributing:
https://bugs.launchpad.net/qemu-linaro/+bug/709965
++ a couple of array overruns don't compile with our picky armel gcc:
https://bugs.launchpad.net/qemu-linaro/+bug/709711
https://bugs.launchpad.net/qemu-linaro/+bug/711272
(both fixes sent and applied upstream)
++ we ought to include a fix for the "swp" in qemu-lock.h:
http://patchwork.ozlabs.org/patch/81205/
(patch sent upstream, no comment on it yet)
+ ...so I have rolled an RC2 tarball with these fixed
+ investigated why USB keyboard model doesn't work: it turns out
that the OMAP host USB model is basically just a stub
+ some warnings from qemu about bad width accesses are a symptom of
a nasty disagreement between gcc and the kernel about getting atomic
32 bit accesses:
http://www.spinics.net/lists/arm-kernel/msg113002.html
+ warnings from qemu about access to a nonexistent i2c register
appear to be a kernel bug in the i2c driver for OMAP36xx:
https://bugs.edge.launchpad.net/linux-linaro/+bug/645324
+ investigated a hang running images built with linaro-media-create's
--swap_file option. Still digging but I suspect the Linux driver
doesn't cope with an MMC card which can erase in zero time...
* merge-correctness-fixes
+ the usual upstream mailing list monitoring and code review
+ sent patch for PLI and hint space decoding fixes:
http://patchwork.ozlabs.org/patch/81711/
* misc
+ some non-Linaro time this week; may be more next week
+ some advance planning of what we might want to do with QEMU
in the future
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hi,
* I continued to set up the PandaBoard but ran into #708883
- tested various hwpacks and headless images
- used different power supplies and several SD cards
- built various test kernels from linux-linaro-natty.git
* looked into the crash utility
* https://wiki.linaro.org/KenWerner/Sandbox/crash-utility#future%20areas
Regards
Ken
Trying to build qemu on a beagle board:
CC sparc64-linux-user/translate.o
cc1: out of memory allocating 26975704 bytes after a total of 70742016 bytes
# ls -lh target-sparc/translate.c
-rw-r--r-- 1 root root 191K Jan 31 12:05 target-sparc/translate.c
i.e. gcc wants at least ~98MB of RAM (70742016 + 26975704 bytes) to
compile a 191K source file (and probably more overall, since the board
has 500MB of RAM in total and it hit the out-of-memory condition).
This seems a bit excessive to me, but do we consider it enough of
a bug to be worth looking into?
(I believe this source file has caused compile failures on the buildds
too, which have rather more RAM/swap than my beagle.)
-- PMM
Hello,
Profiling DENbench:
* The profiling information on x86 indicates that some benchmarks might need
to run longer, as helper functions such as t_run_test are reported to be
hot.
So I've increased the time each benchmark runs for, and will continue to
experiment with that for the problematic benchmarks until I start to see
more reasonable results.
* Opened PR711819 after hitting an ICE running DENbench with trunk (natively
built on an ARM machine).
- I've started to look at an old SMS patch, written in 2005 by a
colleague, Mostafa Hagog, that places the register moves in free slots.
Currently, they are placed greedily before each def that needs them.
Thanks,
Revital
Hi,
I continued to work on vect_interleave and vect_extract implementation on NEON:
* debugged the compiler to find out what's the problem with
neon_vzip/vuzp<mode>_internal
* fixed it following Uli's advice
* checked how neon_vzip/vuzp<mode>_internal work for intrinsics by
writing tests
* fixed the patch according to Uli's comments
* now fully testing the patch
Thanks,
Ira
Hi,
I am trying to implement interleave_high/low and extract_even/odd
using the vzip and vuzp instructions. I am attaching a patch that attempts
to do that. It uses the existing neon_vzip<mode>_internal pattern. The
problem is that this pattern doesn't express the fact that both
outputs of vzip depend on both inputs, which causes wrong code
generation in CSE:
for
(a,b)<- vzip (c,d)
and
(e,f) <- vzip (g,d)
CSE decides that b==f, since on RTL level b and f depend only on d.
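For reference, here is a scalar C model of what VZIP.8 and VUZP.8 do to two D registers. It is a sketch based on the architectural description, not GCC code. It makes explicit that each output is built from both inputs, which is the dependence the two single-operand unspecs in the pattern do not express.

```c
#include <stdint.h>

#define NLANES 8 /* vzip.8 / vuzp.8 on D registers: 8 x 8-bit lanes */

/* VZIP: d receives the interleaved low halves of (d, m), m the high halves. */
static void vzip8(uint8_t d[NLANES], uint8_t m[NLANES])
{
    uint8_t out[2 * NLANES];
    for (int i = 0; i < NLANES; i++) {
        out[2 * i] = d[i];     /* every output lane mixes both inputs */
        out[2 * i + 1] = m[i];
    }
    for (int i = 0; i < NLANES; i++) {
        d[i] = out[i];
        m[i] = out[NLANES + i];
    }
}

/* VUZP: d receives the even lanes and m the odd lanes of d:m. */
static void vuzp8(uint8_t d[NLANES], uint8_t m[NLANES])
{
    uint8_t in[2 * NLANES];
    for (int i = 0; i < NLANES; i++) {
        in[i] = d[i];
        in[NLANES + i] = m[i];
    }
    for (int i = 0; i < NLANES; i++) {
        d[i] = in[2 * i];
        m[i] = in[2 * i + 1];
    }
}
```

Because VUZP undoes VZIP, zipping and then unzipping two vectors returns the originals, which makes a convenient sanity check for any interleave/extract implementation.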
Here is neon_vzip<mode>_internal:
(define_insn "neon_vzip<mode>_internal"
  [(set (match_operand:VDQW 0 "s_register_operand" "=w")
        (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")]
                     UNSPEC_VZIP1))
   (set (match_operand:VDQW 2 "s_register_operand" "=w")
        (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")]
                     UNSPEC_VZIP2))]
  "TARGET_NEON"
  "vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
  [(set (attr "neon_type")
        (if_then_else (ne (symbol_ref "<Is_d_reg>") (const_int 0))
                      (const_string "neon_bp_simple")
                      (const_string "neon_bp_3cycle")))]
)
Is there a way to properly mark the dependence?
Thanks,
Ira
Hi; this is a note to say that we have now produced a prerelease
tarball of qemu-linaro. (The first formal qemu-linaro release will
happen in sync with other toolchain group releases on 8th Feb.)
This prerelease is primarily to pipeclean the release process and
to allow work to start on producing Ubuntu and Linaro packages;
however it does include a number of useful bugfixes which are
required if you want to be able to boot a recent Linaro snapshot
on the beagle model. So the enthusiastic might like to build it
from source and give it a spin.
Like the Linaro kernel trees, the qemu-linaro tree aims to only
include patches we are confident will go upstream; at the moment
this means the OMAP3 support and ARM correctness fixes from
the qemu-meego tree, based on the qemu upstream trunk.
You can download the source tarball from:
https://launchpad.net/qemu-linaro/+milestone/2011.02
-- Peter Maydell