Hi,
* continued to look into latrace and found an issue that occurs when a
dynamic library gets unloaded. Otherwise latrace looks quite good on ARM.
https://wiki.linaro.org/KenWerner/Sandbox/latrace
* chasing bugs:
- After a lot of testing Andy Green has made a big step forward in
finding the root cause for the shut-down issue of my PandaBoard.
The PMIC is seeing an overcurrent and issues an interrupt that gets
ignored by current kernels. Then the PMIC shuts the board down for
safety reasons. As a workaround Andy has made a kernel patch for the
twl6030 driver that enables all interrupt sources. The kernel will
acknowledge the overcurrent reported by the PMIC and the board survives.
A patched kernel binary can be found at:
https://wiki.linaro.org/KenWerner/Sandbox/708883
- While testing Andy's patches on the Linaro natty kernels I ran into
https://bugs.launchpad.net/bugs/720055
- The flash-kernel utility doesn't work on the PandaBoard because the
subarch check expects omap4 instead of omap:
https://bugs.launchpad.net/bugs/721147
- Looked into the apr failure (process-shared mutexes fail on armel v7).
Its mutex functionality can be mapped to various methods, but only
pthread is of interest here. The code relies on pthread_mutex_lock and
pthread_mutex_trylock, which are implemented by (e)glibc. The C library
uses GCC's __sync primitives if eglibc >= 2.12.1-0ubuntu11 and GCC >= 4.5.
The testprocmutex testcase passes now.
https://bugs.launchpad.net/bugs/604753
Regards
Ken
"Will Deacon" <will.deacon(a)arm.com> wrote on 02/16/2011 01:07:09 PM:
> > I've now built a kernel with CONFIG_ARM_ERRATA_720789 enabled, and the
> > symptoms indeed seem to have disappeared completely ...
>
> Yup - that's because without it, invalidating a TLB entry for a particular
> process isn't broadcast correctly, so you can end up using the old (pre-COW)
> mappings if you're running on a different core.
OK. So I guess the only remaining question is: if this hardware needs the
errata fix to work properly, shouldn't it be automatically selected by the
kernel configure logic? Note that this appears to happen for certain OMAP
boards, see arch/arm/mach-omap2/Kconfig:
config ARCH_OMAP4
	bool "TI OMAP4"
	default y
	depends on ARCH_OMAP2PLUS
	select CPU_V7
	select ARM_GIC
	select PL310_ERRATA_588369
	select ARM_ERRATA_720789 <<=====
	select USB_ARCH_HAS_EHCI
But this does not happen for the vexpress; arch/arm/mach-vexpress/Kconfig
has only:
config ARCH_VEXPRESS_CA9X4
	bool "Versatile Express Cortex-A9x4 tile"
	select CPU_V7
	select ARM_GIC
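To make the comparison concrete, here is a minimal Python sketch (not part of the kernel tree; the helper names selects_by_config and missing_select are made up for this example). It scans the two Kconfig fragments quoted in this mail and lists the entries that do not select ARM_ERRATA_720789. It only understands plain "config" and "select" lines, which is enough for these fragments but is not a general Kconfig parser.

```python
import re

# The two fragments quoted above, minus the "<<=====" marker.
KCONFIG_TEXT = """\
config ARCH_OMAP4
    bool "TI OMAP4"
    default y
    depends on ARCH_OMAP2PLUS
    select CPU_V7
    select ARM_GIC
    select PL310_ERRATA_588369
    select ARM_ERRATA_720789
    select USB_ARCH_HAS_EHCI

config ARCH_VEXPRESS_CA9X4
    bool "Versatile Express Cortex-A9x4 tile"
    select CPU_V7
    select ARM_GIC
"""


def selects_by_config(text):
    """Map each 'config NAME' entry to the set of symbols it selects."""
    result = {}
    current = None
    for line in text.splitlines():
        entry = re.match(r"\s*config\s+(\w+)", line)
        if entry:
            current = entry.group(1)
            result[current] = set()
            continue
        selected = re.match(r"\s*select\s+(\w+)", line)
        if selected and current is not None:
            result[current].add(selected.group(1))
    return result


def missing_select(text, symbol):
    """Return the config entries (sorted) that do not select `symbol`."""
    table = selects_by_config(text)
    return sorted(name for name, selected in table.items() if symbol not in selected)


print(missing_select(KCONFIG_TEXT, "ARM_ERRATA_720789"))
```

With the fragments as quoted, only the vexpress entry is reported as missing the erratum select.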
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
Registered office: Böblingen | Registration court: Amtsgericht Stuttgart, HRB 243294
Hello,
* Continued looking into the DENBench benchmarks.
* While testing SMS I realized that my current implementation of the doloop
pattern for ARM does not follow SMS's requirement that the doloop
instructions be decoupled from the loop's other instructions. This happens
because doloop uses the CC register, which might be used elsewhere in the loop.
I am looking into a solution for that.
Thanks,
Revital
Hi,
This week I looked into DENBench:
* sad8_c (hot function from mp4encode) needs SLP reduction, but it
also contains cond_expr which cannot be vectorized as reduction, so I
don't think there is anything I can do here
* fdct_int32 (another hot function from mp4encode) now gets vectorized
with vzip/vuzp patch, but the vectorization causes performance
degradation here because of multiple register spills. I also noticed
that vectorizer costs are not set for NEON, i.e., it uses default
costs. So, I am now working on costs for NEON and adding register
considerations to the vectorizer's cost model.
I also did some general vectorization research, checking opportunities
for collaboration with the GRAPHITE pass and auto-parallelization.
Ira
I mentioned in the toolchain standup call that I'd done a quick
estimate of the work required to support vexpress, so I thought I
might as well clean it up a little and post it.
This is a quick summary and time estimate for adding Versatile
Express support to qemu. The general idea is that most of the
components on this board already have QEMU implementations
(since they're standard ARM primecells used in versatile/realview),
and we can live without the few major components that aren't
implemented (maybe we'd need dummy implementations if the
kernel prods them on startup.)
Components already supported by QEMU:
-------------------------------------
A9MPx4
PL050 keyboard, mouse
SMSC LAN9118 ethernet
PL011 UARTs
SP804 timers
Components with a near match in QEMU:
-------------------------------------
PL111 CLCD -- qemu has a PL110
PL180 MMC card -- qemu has a PL181
-- both cases should either just work or be fairly trivial tweaks
Components not supported by QEMU:
---------------------------------
PL041 audio
compact flash
two-wire serial bus (for PCI-express switch config and DVI-I displays)
ISP1761 Philips USB controller
User switches and LEDs -- vexpress specific, but trivial to do
Components where a dummy implementation should be sufficient:
-------------------------------------------------------------
PL310 L2 cache controller
PL341 dynamic memory controller
PL354 static memory bus controller
trustzone controllers
Other required work:
--------------------
The usual knitting for interrupts, clocks, reset etc etc.
Summary
-------
Assuming we're happy not to worry about support for
audio, USB, two-wire serial bus or compact flash, this
is about two weeks' work to put together, test and turn
into a more-or-less upstreamable patchset. This would
produce a platform hopefully at least as usable as
versatile, but with an A9 and 1GB RAM.
-- PMM
"Will Deacon" <will.deacon(a)arm.com> wrote on 02/14/2011 11:30:45 AM:
> > - In testing on Versatile Express, I noticed what appears to be SMP
> > related bugs in handling regular software breakpoints: occasionally,
> > software breakpoints simply are not hit and execution continues as if
> > the underlying code had not been changed at all. This symptom
> > completely goes away if GDB and the debugged process are forced to
> > the same CPU using the affinity feature (e.g. with schedtool).
>
> I've seen this issue in the past but I thought I'd fixed it. What kernel are
> you using and do you have CONFIG_ARM_ERRATA_720789 enabled?
I'm using the 2.6.37-1002-linaro-vexpress kernel from the Linaro package
of the same name. This does *not* have CONFIG_ARM_ERRATA_720789 enabled
(presumably because the mach-vexpress/Kconfig file does not add it?) ...
> > My guess, just from seeing those symptoms, would be that when inserting
> > a software breakpoint via ptrace, not all i-caches on all CPUs are
> > reliably flushed ... Any thoughts on this?
>
> There was an I-cache aliasing problem in the kernel coupled with a TLB
> invalidation hardware bug on the versatile express. I fixed these though
> and haven't seen any problems since.
Hmm, a TLB flush problem could also explain the symptom (because the write
of the breakpoint to the text section causes a copy-on-write operation which
installs a new page ...)
I'll try rebuilding the kernel with the above config option enabled.
> Hmmm, I'll need to have a think about this. What does GDB do if it receives
> a SIGTRAP with si_addr set to (potentially) complete nonsense? As an aside,
> Cortex-A15 reports the faulting address for a watchpoint correctly, so we
> will be able to use multiple watchpoints there.
The GDB common core can handle either of the following two indications:
A) The (read/write/access) watchpoint at address XXX triggered.
B) A write watchpoint may have triggered at some address.
In the case of B, GDB will scan all the write watchpoints it is currently
tracking and compare the current value at that address with the last value
it remembers being present there. Any changes GDB sees will cause it to
report the corresponding watchpoint as triggered.
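A rough Python sketch of the case B scan, just to illustrate the idea. This is not GDB code: changed_watchpoints, the dict of remembered values and the toy memory are all invented for the example.

```python
def changed_watchpoints(watchpoints, read_memory):
    """Case B: no address was reported, so compare every tracked location.

    watchpoints: dict mapping address -> last value remembered at that address.
    read_memory: callable returning the current value at an address.
    Returns the addresses whose value changed and refreshes the remembered
    values, so the next scan starts from the new state.
    """
    triggered = []
    for addr in sorted(watchpoints):
        current = read_memory(addr)
        if current != watchpoints[addr]:
            triggered.append(addr)
            watchpoints[addr] = current
    return triggered


# Toy "inferior memory" with two watched locations.
memory = {0x1000: 7, 0x2000: 42}
tracked = dict(memory)   # values remembered when the watchpoints were set
memory[0x2000] = 43      # the inferior writes to one watched location
print([hex(a) for a in changed_watchpoints(tracked, memory.__getitem__)])
```

Note that a write which stores the same value again is invisible to this kind of scan, which is one reason case A is preferable when the kernel can report the address.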
As far as the kernel interface is concerned, the important issue is that the
ARM native target in GDB is able to understand what the kernel reports, so
it can in turn report either case A or B to the common core.
This means as long as there is some way for GDB to understand the kernel
is reporting a write watchpoint hit at an unknown address, everything is
fine. This could be done e.g. by reporting a "slot" zero in si_errno to
indicate the slot (and then also the address) triggering the watchpoint
is unknown ...
> > - Finally, I noticed when reading kernel code that under some
> > circumstances, the kernel will automatically do a single step to
> > get off a watchpoint that was just hit. However, this does not
> > happen for user-space watchpoints installed via ptrace, right?
> > (Just wanting to confirm; since GDB currently does that single
> > step itself -- we don't want *both* kernel and GDB to issue a
> > single step each ...)
>
> If the {break,watch}point has been inserted via ptrace, the kernel will
> send a SIGTRAP instead of stepping the instruction.
OK, thanks for the confirmation!
> > I haven't gotten to looking further into other hardware (IGEP,
> > Panda) -- that's next on the list.
>
> Good stuff, keep me posted if you see any further problems!
Sure, will do!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hello, my fellow ARM aficionados!
The Linaro Developer Platform Team is pleased to announce a new initiative
to help improve the state of software on ARM: the ARM porting jam. Starting
today, February 16th, we will be running a weekly IRC jam on Wednesdays from
1400-1800 UTC to bring developers together to work on all manner of
userspace porting bugs, with the aim of fixing portability issues and
getting the fixes delivered to our upstreams.
An initial porting queue of known issues can be found here:
https://bugs.launchpad.net/ubuntu/+bugs?field.tag=arm-porting-queue
Interested in making the software in Ubuntu run better on ARM? Stop on by
the #linaro channel on irc.linaro.org today!
--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
slangasek(a)ubuntu.com vorlon(a)debian.org
"Will Deacon" <will.deacon(a)arm.com> wrote on 02/11/2011 10:13:01 AM:
> I don't have a pandaboard, so I'd be interested to see if the code
> works there. I developed it using ARM boards, so the versatile express
> is a known good target.
I've now got it working reliably on Versatile Express, after fixing
a couple of bugs on the GDB side (both in the HW-watchpoint patch, and
in common GDB code). The testsuite now passes with no regressions when
enabling HW watchpoints, except for two tests that require more than one
single watchpoint to be supported.
This raises another couple of issues/questions, however:
- In testing on Versatile Express, I noticed what appears to be SMP
related bugs in handling regular software breakpoints: occasionally,
software breakpoints simply are not hit and execution continues as if
the underlying code had not been changed at all. This symptom
completely goes away if GDB and the debugged process are forced to
the same CPU using the affinity feature (e.g. with schedtool).
My guess, just from seeing those symptoms, would be that when inserting
a software breakpoint via ptrace, not all i-caches on all CPUs are
reliably flushed ... Any thoughts on this?
- As mentioned above, the kernel currently only supports one single
watchpoint to be active at a time, even though hardware might support
multiple ones. The reason seems to be that when a watchpoint triggers,
the kernel cannot figure out which one it was (if there's more than one
choice).
This is a bit unfortunate, given that GDB will attempt to insert two
or more watchpoints in many interesting cases (e.g. a "watch *p"
command will insert *two* low-level watchpoints, one at the address
of p, and one at the address where p (currently) points to).
In addition, for regular (write) watchpoints, GDB does not actually
*require* the underlying hardware/kernel to specify which watchpoint
was hit; GDB is able to find out by itself by checking whether the
values at any of the currently active locations actually changed.
(For read/access type watchpoints, GDB does require that underlying
support -- but those are much more rarely used anyway.)
Do you see any chance of improving upon the current behaviour?
- Finally, I noticed when reading kernel code that under some
circumstances, the kernel will automatically do a single step to
get off a watchpoint that was just hit. However, this does not
happen for user-space watchpoints installed via ptrace, right?
(Just wanting to confirm; since GDB currently does that single
step itself -- we don't want *both* kernel and GDB to issue a
single step each ...)
I haven't gotten to looking further into other hardware (IGEP,
Panda) -- that's next on the list.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
== Linaro GCC 4.5 ==
Re-merged all the patches I've had to back out of Linaro GCC due to
various test failures. I've now found all the extra fixes/patches
necessary to make them go ... I think. Tested the build and test on ARM
and x86_64.
== Linaro GCC 4.6 ==
Continued getting the 4.5 patches forward ported to 4.6. I now have
about 4 patches waiting for review upstream, or ready to be posted.
Upstream review isn't happening though. This is partly due to GCC being in
stage 4, but mostly due to Richard Earnshaw being on sabbatical, and the
other maintainers being inactive. I can see that I'm going to have to
abandon my hopes of only merging to Linaro GCC once it's been approved
upstream, and be content with merging to Linaro once it's posted upstream.
Started another test to rebase the Linaro 4.6 branch with the latest
from upstream. Once that's done, I think I'll start merging my changes
in, and call that our baseline. (There'll still be merges from upstream,
but the history will diverge.)
----
Upstream patches requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* Kazu's VFP testcases:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00128.html
* Jie's thumb2 testcase fix:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00670.html
== Week of Jan.31st--Feb.6th ==
* Vacation, Chinese New Year Holiday.
== Last week ==
* Monday (Feb.7th), last day of vacation.
* LP #711819, ICE in push_minipool_fix: this turned out to be a simple
case where a memory load alternative was not tagged with the minipool
range attributes. Patch sent upstream, awaiting approval.
* LP #709453, wrong code generated for NEON. Tracked this down and
mostly know how to fix this, but discussion with Ramana brought the
issue up that the entire idea of using NEON vmov.i32 for loading VFP
constants may not be good for A9, and unclear for A8. We probably should
just revert the patch from the Linaro tree for now.
* PR46002, IRA internal compiler error with -fira-algorithm=priority.
Been looking at this as a part of my background IRA studies. Have a
possible patch for this, plus found another assert fail ICE under ARM.
Will see if I can post it upstream this week.
== This week ==
* Continue to look at above unfinished issues, as well as other new ones.
== GDB ==
* Installed 2.6.37 Linaro kernel on IGEP and Versatile Express
in order to verify support for HW breakpoints/watchpoints
* Tested GDB HW watchpoints patch, fixed several bugs in the
patch and core GDB, and got it working reliably on vexpress
* Started discussion with Will Deacon (ARM) regarding possible
further enhancements to related kernel support
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
== String routines ==
* Copied an improvement I'd previously made to memchr (removing a
branch using a big IT block) to strlen
* Modified benchmark setup to build everything as a library to
fairly give everything a PLT overhead.
* Pushed optimised memchr and strlen and simple strchr into
cortex-strings bzr repo
* Patched eglibc to use memchr and strchr code - although currently
fighting to get appropriate .changes file
== ffi ==
* Kicked off TSC request for license permissions
== bugs ==
* Built and recreated the qt4-x11 bug, produced all the dumps and
boiled it down to a few lines of suspicious RTL for Richard.
** Away next week.
== GCC ==
* Finished testing fix for lp:709329 and got that merged.
* Wrote up a plan for GCC performance improvements based on what we
discussed at the sprint.
* Internal ARM tasks that kept me busy for most of last week and this week.
Plans:
* still stuck on some ARM internal tasks for next week.
== This week ==
* Got the STT_GNU_IFUNC work ready to submit. Split out some preparatory
patches, including fixes for some general ARM inefficiencies that I
noticed this week. Ran the EGLIBC testsuite (including ifunc tests)
and they passed.
* Discussed ideas for representing permuted vector loads with Ira.
I'm still um-ing and ah-ing about the various possible approaches,
but I think I understand the constraints a bit more now.
* Fixed Qt miscompilation (lp #705689).
* Fixed PC-relative load bug in the assembler (lp #716967).
== Next week ==
Holiday!
Richard
RAG:
Red:
Amber: DATE/QEMU conference still hasn't confirmed I have a place...
Green: qemu-linaro first release made!
Current Milestones:
                             | Planned    | Estimate   | Actual     |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
Historical Milestones:
finish virtio-system         | 2010-08-27 | postponed  |            |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
* maintain-beagle-models:
+ first qemu-linaro release (2011.02-0) made on time
+ fixed OMAP3 MMC controller model bug that was causing the kernel
to hang when enabling a swapfile; pushed fix to qemu and meego trees
+ rebased qemu-linaro on new upstream
* merge-correctness-fixes
+ reviewed some softfloat patches from Christophe; testing of
the half-precision floating point conversion instructions
showed up a number of other bugs which I submitted patches for:
http://patchwork.ozlabs.org/patch/82594/ (n/6)
+ reviewed and tested Christophe's patches for VQMOVUN and
VSLI.64/VSRI.64; these have been committed upstream
+ fix compile failure if !CONFIG_USE_GUEST_BASE
http://patchwork.ozlabs.org/patch/82630/
+ remove stray #include halfway through source file
http://patchwork.ozlabs.org/patch/82661/
+ improved vmull.p8 implementation over the meego version, sent
upstream: http://patchwork.ozlabs.org/patch/82657/
+ upstreamed patch to fix VQDMLSL:
http://patchwork.ozlabs.org/patch/82752/
+ upstreamed patch fixing thumb-to-arm neon dp insn conversion:
http://patchwork.ozlabs.org/patch/82757/
+ upstreamed patches fixing Neon VZIP and VUZP
* other
+ did a quick estimate of required effort to do vexpress model
(answer: 2 weeks if we don't want audio/USB/compact flash)
+ usual crop of standing meetings
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hi,
* moved from Ubuntu Maverick to Natty on the PandaBoard
* investigation on the LTTng User Space Tracer:
https://wiki.linaro.org/KenWerner/Sandbox/LTTng
* started to look into latrace:
https://wiki.linaro.org/KenWerner/Sandbox/latrace
The idea is neat but there are issues when the user's code calls dlclose
on a shared object. I'll investigate further when time permits.
* spent some time on IBM internal process work
Regards
Ken
Hi Will,
> > - It seems odd that the kernel says it doesn't support the debug
> > architecture, but then reports to user space that 1 watchpoint and 6
> > breakpoints are supported ... GDB will never use the watchpoint, because
> > the maximum watchpoint size is reported as zero, but GDB will attempt to
> > use the breakpoints. Setting a breakpoint will appear to succeed, but then
> > the breakpoint just never triggers. The kernel should IMO be more
> > consistent in how unsupported configurations are handled ...
>
> Agreed. This is an artifact of how the ptrace info register is populated.
> I'll work on a fix tomorrow so that we don't report any resources when
> the architecture is unsupported.
Great, thanks!
> > - Why is architecture 0x4 not supported? This seems to be the variant of
> > the v7 debug architecture with memory-mapped registers. Apparently the
> > IGEP only supports this version ... Do you know what the
> > Beagle-/Pandaboard and other clones do? What would it take to support this
> > architecture variant? Given the widespread use of those boards, it would
> > be really nice if we could support hardware debugging on them ...
>
> The memory-mapped interface is hugely unreliable in real hardware because
> you have to calculate the address of the memory-mapped debug registers by
> using a base and offset, which are hardcoded in some information registers.
> Unfortunately, I've never found a board where these registers have been
> programmed correctly so (a) I had nothing to test my code with (b) few people
> would be able to use it and (c) there's not really a safe way to go around
> poking random areas of memory.
Huh, I see. I have no idea whether those information registers contain
correct values on IGEP ..
> > - Which hardware *is* supported? Can you recommend a board I should be
> > using to verify GDB support is working?
>
> The simple rule is Cortex-A8 is unsupported and Cortex-A9 is supported.
> The A5 should work (untested) and the A15 will need a bit of hacking to
> get it supported.
OK. I guess I can try on our Versatile Express.
> > Thanks for your help in getting this working!
>
> No problem. If you find anybody with working memory-mapped debug and some
> spare time, I'd be happy to review patches :)
Thanks! I'll try and see if I can figure out where the MM area is
on the IGEP ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hello,
* Analyzing DENBench benchmarks.
* Running mp3 player on Cortex-A9 with gcc-linaro -r99463 using SMS flags
(*) gives 21% improvement in execution time compared to using only base
flags(**).
(*) -fmodulo-sched -fmodulo-sched-allow-regmoves
(**) -mcpu=cortex-a9 -mtune=cortex-a9 -mthumb -static --fast-math
Thanks,
Revital
Hi,
* regtested vzip/vuzp patch
* looked into big-endian build
* applied all the required patches and checked that Viterbi gets
vectorized giving ~2x performance improvement (compiled with
cross-compiler)
* looked into vld/vst implementation - mostly discussions with Richard
* DenBench analysis:
- there are loops that should get vectorized with vzip/vuzp patch,
I'll check them next week
- sad8_c (hot function from mp4encode) needs reduction SLP (which I
implemented several weeks ago), and an ability to jump unknown stride
in loop SLP - I am looking into this
Ira
On Wednesday 09 February 2011 20:25:32 Will Deacon wrote:
> > - Why is architecture 0x4 not supported? This seems to be the variant of
> > the v7 debug architecture with memory-mapped registers. Apparently the
> > IGEP only supports this version ... Do you know what the
> > Beagle-/Pandaboard and other clones do? What would it take to support this
> > architecture variant? Given the widespread use of those boards, it would
> > be really nice if we could support hardware debugging on them ...
>
> The memory-mapped interface is hugely unreliable in real hardware because
> you have to calculate the address of the memory-mapped debug registers by
> using a base and offset, which are hardcoded in some information registers.
> Unfortunately, I've never found a board where these registers have been
> programmed correctly so (a) I had nothing to test my code with (b) few people
> would be able to use it and (c) there's not really a safe way to go around
> poking random areas of memory.
So the only problem is that it's board specific? That's something we
know how to deal with -- all I/O components have some random board
specific address, and we put them in a platform device that is
listed in the board file. This should be easy enough to do for another
register area, though it means we have to do it separately for each board.
> > - Which hardware is supported? Can you recommend a board I should be
> > using to verify GDB support is working?
>
> The simple rule is Cortex-A8 is unsupported and Cortex-A9 is supported.
> The A5 should work (untested) and the A15 will need a bit of hacking to
> get it supported.
Is that because A8 is memory mapped and A9 uses CP14, or is there another
problem with A8?
Arnd
Hello Will,
I've been trying to get GDB support for hardware watchpoints/breakpoints
going. I've ported Matthew's GDB patch to current mainline, and am running
this under a 2.6.37-1002-linaro-omap kernel on an IGEPv2 board.
However, something seems to be not quite working: I'm seeing this kernel
message on boot:
hw-breakpoint: debug architecture 0x4 unsupported.
and then at runtime, the result of a PTRACE_GETHBPREGS call for register 0
is 0x04000106:
debug architecture: 4
watchpoint size: 0
nr. watchpoints: 1
nr. breakpoints: 6
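For reference, the breakdown above is just the 32-bit value split into four bytes. A small Python sketch, assuming the field layout implied by the numbers in this mail (the function name and dictionary keys are made up for illustration):

```python
def decode_hbp_info(value):
    """Split the 32-bit resource-info word into four byte-wide fields.

    Layout assumed from the breakdown above, most significant byte first:
    debug architecture, max watchpoint size, nr. watchpoints, nr. breakpoints.
    """
    return {
        "debug_arch": (value >> 24) & 0xFF,
        "watchpoint_size": (value >> 16) & 0xFF,
        "num_watchpoints": (value >> 8) & 0xFF,
        "num_breakpoints": value & 0xFF,
    }


info = decode_hbp_info(0x04000106)
for name, field in info.items():
    print(f"{name}: {field}")
```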
This leads me to a couple of questions:
- It seems odd that the kernel says it doesn't support the debug
architecture, but then reports to user space that 1 watchpoint and 6
breakpoints are supported ... GDB will never use the watchpoint, because
the maximum watchpoint size is reported as zero, but GDB will attempt to
use the breakpoints. Setting a breakpoint will appear to succeed, but then
the breakpoint just never triggers. The kernel should IMO be more
consistent in how unsupported configurations are handled ...
- Why is architecture 0x4 not supported? This seems to be the variant of
the v7 debug architecture with memory-mapped registers. Apparently the
IGEP only supports this version ... Do you know what the
Beagle-/Pandaboard and other clones do? What would it take to support this
architecture variant? Given the widespread use of those boards, it would
be really nice if we could support hardware debugging on them ...
- Which hardware *is* supported? Can you recommend a board I should be
using to verify GDB support is working?
Thanks for your help in getting this working!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hi,
I'm working in the Linaro toolchain team on adding ARM support for GNU
indirect functions (STT_GNU_IFUNCs). The indirect function feature
requires a new relocation type, which is typically called R_FOO_IRELATIVE.
I'd therefore like to propose a new R_ARM_IRELATIVE relocation type for
the ARM EABI.
This relocation is only used in ET_EXEC and ET_DYN objects. If the
object has a PT_DYNAMIC tag, then the relocation may only appear in
the DT_REL(A) table; it cannot appear in the DT_JMPREL table.
(Note that this is a deliberate divergence from the x86 and x86_64
behaviour, which does allow the IRELATIVE relocation to be used in
DT_JMPREL table, but which requires it to be applied at load time,
regardless of bind-now vs. lazy semantics. However, the proposed
ARM behaviour matches that of other targets like PowerPC.)
Static ET_EXEC objects may have R_ARM_IRELATIVE relocations. In this
case, the relocations are stored in a relocation table that contains no
other type of relocation (not even R_ARM_NONE). The static linker
defines two symbols:
__rel_iplt_start, which the linker points to the start of this table
__rel_iplt_end, which the linker points to the last byte of this table
plus one.
The two symbols are equal if the executable has no R_ARM_IRELATIVE
relocations. It is the executable's responsibility to apply these
relocations as appropriate. If the static linker emits a symbol table,
then it is not defined whether the linker includes __rel_iplt_start and
__rel_iplt_end in that symbol table.
The static linker may (or may not) define __rel_iplt_start and
__rel_iplt_end in dynamic objects. However, if it does define them,
the symbols must refer to part of the DT_REL(A) table, and it is still
the dynamic linker's responsibility to apply the relocations.
An R_ARM_IRELATIVE relocation applies to all bits of a 4-byte field.
There are no alignment restrictions on the field. The relocation
value is:
call(B(S) + A)
where call(X) represents the value of r0 after performing an indirect
branch-with-link-and-exchange (BLX) to address X.
The dynamic linker must have applied all earlier DT_REL(A) relocations
before calling X. It is undefined whether later DT_REL(A) relocations
have been applied or not, and X must not make any assumptions about the
status of those relocations.
If there is an R_ARM_IRELATIVE relocation with symbol S and addend A,
then the relocation value:
call(B(S) + A)
is considered to be a load-time constant. It is possible for an object
to have more than one R_ARM_IRELATIVE relocation with the same value
of B(S) + A, and in such a case, it is not defined whether the dynamic
linker invokes the target function each time, or whether it caches the
results of earlier calls.
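To illustrate the intended semantics, here is a toy Python model of applying the relocations. It is only a sketch under several assumptions that are not part of the proposal: B(S) is modelled as a single load base, the 4-byte field is written little-endian, resolver functions are plain Python callables keyed by address, and the names apply_irelative, rel_table and call are invented for the example.

```python
def apply_irelative(image, load_base, rel_table, call):
    """Store call(B(S) + A) into the 4-byte field of each relocation.

    image:     bytearray standing in for the loaded object.
    load_base: stands in for B(S) (assumed identical for every entry).
    rel_table: list of (field_offset, addend) pairs, playing the role of the
               entries between __rel_iplt_start and __rel_iplt_end.
    call:      callable modelling "BLX to this address, return r0".
    """
    for field_offset, addend in rel_table:
        value = call(load_base + addend) & 0xFFFFFFFF
        # All 32 bits of the field are replaced; no alignment is required.
        image[field_offset:field_offset + 4] = value.to_bytes(4, "little")


# Two hypothetical resolver functions, keyed by their address.
resolvers = {0x8100: lambda: 0x9000, 0x8200: lambda: 0xA000}
image = bytearray(16)
table = [(0, 0x100), (8, 0x200)]
apply_irelative(image, 0x8000, table, lambda addr: resolvers[addr]())
print(image.hex())
```

This model invokes the resolver once per relocation. A loader that caches results per distinct B(S) + A would be equally valid under the wording above, which is why resolvers should not rely on being called a particular number of times. An empty table (the two symbols being equal) simply leaves the image untouched.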
I realise this isn't the cleanest extension in the world. As Alan Modra
noted on the binutils list, the choice of __rel_iplt_start and __rel_iplt_end
is particularly unfortunate, since the relocations are not specific to
"PLTs". However, the GNU extension has been defined this way,
so unfortunately there isn't much room for target-specific variation.
Thanks,
Richard
Hi,
I'd like to check the vzip/vuzp patch in big-endian mode. But when I try
to compile with the -mbig-endian flag, I get
> ~/mainline/bin/bin/gcc -O3 -mfloat-abi=softfp -mfpu=neon neon-vtrnu8.c -mbig-endian
/home/irar/mainline/bin/lib/gcc/armv7l-unknown-linux-gnueabi/4.6.0/../../../libgcc_s.so.1:
could not read symbols: File in wrong format
collect2: ld returned 1 exit status
What am I missing?
Thanks,
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.2.
Linaro GDB 7.2 2011.02-0 is the third release in the 7.2 series. Based
off the latest GDB 7.2, it includes a number of ARM-focused bug fixes
and enhancements.
Interesting changes include:
* Backtracing is more reliable through using the ARM specific
exception tables for unwinding
* Better supports debugging functions compiled with GCC's -fstack-protector
* Multiple testsuite related fixes
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.2-2011.02-0
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
-- Michael