Hi All,
This is a brain dump of what I learned about running LAVA today.
Dave will probably find a place for this in the Validation wiki, but
I'll pass it round in the meantime.
Hope it helps
Andrew
Hi,
* libunwind
* posted small bug fixes
* noticed the unwinding on Android is broken somehow
(need to track down the commit that broke it)
* linaro android
* repo sync fails due to an invalid bionic commit id (#885792)
* tried to remotely attend the Connect
* +1 for having live streams of the plenaries
(http://video.ubuntu.com/live/)
* -1 for pointing us to the wrong grand sierra irc channels
(http://uds.ubuntu.com/participate/remote)
* icecast streams worked most of the time
(* public holiday on tuesday)
Regards
Ken
=== 64 bit atomics
* I got the race in membase down to a futex issue; asking dmart
pointed me at a kernel bug affecting recent kernels, for which a fix
went in about a month ago. That was a nasty one!
* I've still got a few bugs left; most are turning out to be timing
races in the test code (e.g. one that times out after 2 seconds while
the code takes around 1.7 seconds, so anything else getting in the way
trips it over the line; and another that did a recv_from on a socket
but only got the start of a message, presumably because the sender had
used multiple sends). It's tricky going because the tests are a
combination of most scripting languages (perl, python, ruby, with a
splash of Erlang).
I've so far found no bugs in the atomic code.
* I looked at apr and SDL-1.3; both of which use atomics; but end up
not using 64bit atomics;
the tendency is for them to ensure they can do atomics on long and on
a void*; both of which
for us are 32bit.
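The short-read race mentioned above (a receiver seeing only the start of a message because the sender used several sends) can be sketched in C. This is only an illustration, assuming a stream socket; `read_full` and `demo_split_message` are hypothetical names and are not taken from the membase or memcached test suite.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly `len` bytes from `fd`, looping over short reads.
 * Returns the number of bytes read (less than `len` only if the peer
 * closed the connection), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, just retry */
            return -1;
        }
        if (n == 0)
            break;                  /* peer closed the connection */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Hypothetical demo: the sender splits one logical message across two
 * send() calls; the receiver still gets all of it via read_full().
 * Returns 0 on success, -1 on failure. */
int demo_split_message(char *out, size_t outlen)
{
    static const char msg[] = "hello, world";
    const size_t total = sizeof msg;    /* includes the trailing NUL */
    int sv[2];
    ssize_t n;

    if (outlen < total || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    memset(out, 0, outlen);

    if (send(sv[0], msg, 5, 0) != 5 ||
        send(sv[0], msg + 5, total - 5, 0) != (ssize_t)(total - 5)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    n = read_full(sv[1], out, total);
    close(sv[0]);
    close(sv[1]);
    return n == (ssize_t)total ? 0 : -1;
}
```

A single recv() in place of the loop may return after the first 5 bytes, which is the kind of failure described; the loop keeps reading until the expected length has arrived.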
=== String routines
* I've got the Newlib A15 optimised memcpy running in a test harness
at the moment for
comparison.
=== Listening to connect
* I listened in to a few connect sessions each day; the 1st day or
so was 3/4 lost on
audio systems that didn't work (I'm especially annoyed at not being
able to hear the QEMU for A15/KVM session
and toolchain support for kernel). The Rypple session was rather lost
through the lack of any screen share
or slides.
Hello all,
I've been playing around with linaro and have it working on my
Pandaboard locally. I have a couple of questions about the linaro
environment; if this is the wrong forum, I'm happy to take it elsewhere.
I see that Linaro makes monthly releases of the hwpacks and images.
How are the packages/binaries in those images created? Are they
cross-compiled, or compiled natively on the target platform? If they
are cross-compiled, how is the environment created?
The reason I ask is that we've been looking at cross-compiling some
packages ourselves, and have been running into issues. So we were
wondering what toolchain the linaro community uses.
Thanks in advance,
--
Chris Lalancette
Hi,
- Finished rewriting SLP analysis to support not only unary and binary
operations. Committed upstream.
- Implemented cond_expr support in SLP (for libav weight_h264_pixels).
Testing it now.
- Vectorizer maintenance (test/bug fixes, patch reviews).
Ira
Testing an initial version of the implementation which estimates
register pressure in SMS on libav micro benchmarks.
I see 20% improvements in mjpegenc microbench and 11% on aacsbr-2 with
SMS. However swscale-rgb24ToY_c still has spills in the final code,
although it requires at most 64 VFP_REGS registers out of the 64
available, so I'm trying to understand the reason for the spill.
=== Progress ===
* Off for one day during the week for Diwali.
* Connect preparation - Wrote down areas to look at during connect and
tried to plan what we
want to look at during connect.
* Looked at some of the cases with vcond<float> with Ira and helped
frame blueprint.
* Investigated one of the big performance regressions in the popular
embedded benchmark
and looked at why it wasn't being vectorized only to realize that it
couldn't be. Thanks
Ira. Still don't know why ARM state is 22% faster than Thumb2 state.
* Spent some time looking at the fPIC issue where GCSE appears to
remove a label, but without much progress.
=== Plans ===
* Connect ! next week and then vacation.
Absences.
* 26 Oct - Diwali
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked -
* 08 Nov - 11 Nov - Vacation booked
* Dec 19 - 31st Dec - Vacation booked
== 64 bit atomics ==
* I've been building and testing membase
* Version 1.7.1.1 source builds OK (after turning off -Werror due to
some of their curious type naming)
* The git version fails to build - it doesn't seem consistent
* 1.7.1.1 passes simple tests, but there are 3 tests in its test
suite that intermittently fail on ARM and
seem to be solid on x86. (There are also some that just require
timeouts increased due to the
relatively slow machine).
* t/issue_163.t turned out to be a timing race in the test itself,
made worse by being on a relatively slow machine and probably made
worse by the Panda's odd idea of timing. That was reported to them
with a breakdown of it, and upstream has fixed their test. (
http://code.google.com/p/memcached/issues/detail?id=230 )
* t/issue_67.t is proving tougher; once in a while memcached will
lock up during init in thread_init;
there is one particular point where adding a printf will make it work
apparently reliably. I've got one
or two ideas but I need to check my understanding of pthread_cond_wait first.
* There is an assert I've seen triggered once - not looked at that yet.
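Since the t/issue_67.t item above turns on how pthread_cond_wait behaves, here is a minimal sketch of the usual pattern: the waiter re-checks a predicate in a loop while holding the mutex, and the signaller changes the predicate under the same mutex. The `struct gate` type and its functions are hypothetical and are not memcached's thread_init code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A one-shot gate: waiters block until someone opens it. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool ready;             /* the predicate, protected by `lock` */
};

void gate_init(struct gate *g)
{
    assert(g != NULL);
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->ready = false;
}

void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = true;                    /* change the predicate under the lock */
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

void gate_wait(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    /* Loop: pthread_cond_wait may wake spuriously, and the signal may
     * have been sent before this thread started waiting. */
    while (!g->ready)
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

static void *opener(void *arg)
{
    gate_open(arg);
    return NULL;
}

/* Hypothetical demo: returns 0 if the main thread got through the gate. */
int demo_gate(void)
{
    struct gate g;
    pthread_t t;

    gate_init(&g);
    if (pthread_create(&t, NULL, opener, &g) != 0)
        return -1;
    gate_wait(&g);
    pthread_join(t, NULL);
    return g.ready ? 0 : -1;
}
```

If the waiter tested the flag without holding the mutex, or used `if` instead of `while`, a wake-up arriving at the wrong moment could be lost and the waiter would hang. That is one common way for an init-time lock-up to appear and then vanish when a printf changes the timing, but nothing here shows that it is the cause in memcached.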
== String routines ==
* While I was off last week, my memchr and strlen were accepted into newlib
* Joseph has responded to my eglibc mail, with a couple of small queries.
== Other ==
* Wrote a more detailed test case for bug 873453 (odd timing
behaviour on panda); it's quite odd - I can get a timing discrepancy
of more than ~80ms, so it's not a clock granularity issue.
* Replicated a QEMU crash for Peter.
Dave
Hi,
* finished changing libunwind to be more portable
* tested patchset on ARM and X86_64
* now builds on Android without modifications
(Android.mk, config.h and libunwind-common.h are still required)
* verified that the modified debuggerd still works
* discussed backtracing using libunwind on ARM with Harald from the BSC
* they use libunwind in a sampling tool that generates Paraver
tracefiles
* started to upgrade my Linaro Android environment and ran into issues
* need to check:
* why building the toolchain using linaro-build.sh fails
* why repo sync fails due to invalid platform/bionic SHA1
* what happened to LEB-panda.xml
Regards
Ken
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
== a15-usermode-support ==
* A15 instruction support patches committed upstream in time for
upstream's 1.0 release
== upstream-omap3-cleanup ==
* some work on restructuring the omap3 patchset -- it's now basically
in the right order and the last 'touches several different
bits of code' jumbo patch has been split
== other ==
* sent some patches upstream which address the main things I
want to get into qemu 1.0 (PL041 audio support and fixing a
regression in handling multithreaded programs in linux-user mode)
* A15 KVM planning work and other preparation for Linaro Connect
* finally tracked down the qemu-on-ARM memory corruption: we
mmap the code generation buffer at 0x1000000 with MAP_FIXED;
unfortunately this is now in the middle of glibc's heap...
(filed as LP:883133)
* qemu now has a coroutine implementation which defaults to using
makecontext() if it is present. Unfortunately ARM eglibc provides
an implementation which always returns ENOSYS, which is a bit
tricky to detect with a compile time configure check (without
breaking cross-compilation support).
* these two things (and some other known bugs) mean that QEMU on
ARM hosts is basically broken, and will probably continue to be
since we don't have the spare resource to test and fix bugs
(beyond those which we need to fix for KVM-on-ARM)
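To illustrate the MAP_FIXED problem described above: without MAP_FIXED the address passed to mmap() is only a hint, so the kernel picks another range if the requested one is already in use, whereas MAP_FIXED silently replaces whatever is mapped there. The sketch below is an assumption-laden illustration, not QEMU's code or its eventual fix; `alloc_code_buffer` and `release_code_buffer` are hypothetical names, and a real code generation buffer would also need execute permission.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `size` bytes, preferring the address `hint` but letting the
 * kernel choose another range if `hint` is already occupied.
 * Returns NULL on failure. */
void *alloc_code_buffer(void *hint, size_t size)
{
    void *p;

    assert(size > 0);
    p = mmap(hint, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);   /* note: no MAP_FIXED */
    return p == MAP_FAILED ? NULL : p;
}

/* Unmap a buffer obtained from alloc_code_buffer(). */
int release_code_buffer(void *p, size_t size)
{
    return munmap(p, size);
}
```

Asking for a second buffer at the address of an existing one returns a different address and leaves the first mapping intact, which is the behaviour a heap sharing the address space relies on.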
* Looked at how to configure Firefox and how to build different parts of the
program. Usage of .mozconfig, myrules.mk and myconfig.mk.
* Tested the Talos framework. https://wiki.mozilla.org/Buildbot/Talos. I
think it would be good to use Talos for the browsing benchmarks. We can
discuss it further at connect.
* Preparing for connect.
Best Regards
Åsa
Summary:
* Exercise crosstool-ng and summarize the gaps.
Details:
* Exercise crosstool-ng
(1) Sync with lp:~linaro-toolchain-dev/crosstool-ng/linaro.
(2) Try to config linux-host-baremetal-target and
mingw32-host-baremetal-target.
(3) Try to build the toolchain for both embedded toolchain and
linaro-gcc-4.6-2011.10 with the config.
. C compiler for linux and mingw32 hosts and c++ compiler for
linux host can be built without any change.
. C++ compiler for mingw32 host can be built after PCH is disabled.
. GDB-cross build fails due to missing dependency packages.
* Gaps in crosstool-ng
(1) Improve GDB-cross scripts to download and build the dependency
packages: expat and ncurses. Or add expat and ncurses as
companion_libraries.
(2) To remove dependencies, the embedded toolchain requires more
prerequisites like zlib.
New config and scripts are required to support the packages.
(3) Currently, the embedded toolchain source packages are released
as a tarball, which includes gcc, gmp, etc. New scripts are required
to support it.
(4) To make sure the toolchain can run with lower version glibc like
redhat4/5, the embedded toolchain requires lower version native
gcc4.3.6 to build it.
To support it,
. Users can build the native gcc manually, or
. Enhance the scripts to add one step to build native gcc.
(5) All the default package configurations are different from
embedded toolchain internal build scripts.
Since the configurations in embedded toolchain had been tuned
and tested, we will change the configurations in crosstool-ng if they
do not match and are not configurable.
The same rule will apply for linaro toolchain.
Plans:
* Write scripts to re-pack the embedded toolchain source packages.
* Add support for all prerequisites in crosstool-ng menuconfig.
Thanks!
-Zhenqiang
Posted a patch upstream to fix big-endian for generic tuning. This was a
simple omission from my previous patches.
Merged GCC 4.6.2 to Linaro GCC. It's still in testing now, so I'll have
to commit it sometime over the weekend or next week.
Looked at the benchmark results from Spec2000 running on both A8 and A9
systems, with and without NEON, and with various compiler options. Posted
the results in a spreadsheet (visible within Linaro only).
Began making adjustments to generic tuning and started new spec2k runs
to see if they are beneficial. First, I'm trying A9 prefetch settings on
A8 to see how much damage it does. Next I'll try enabling the A8 NEON
tuning settings on A9 to see what happens there.
Prepared for travel next week.
Vacation Friday
Hi,
- Merged to gcc-linaro:
- widening shifts
- SLP features: support loads with different offsets and swap
operands if necessary
- Started rewriting SLP analysis to support operations with more than
two operands (towards SLP of conditions)
- Updated NEON presentation following Ramana's suggestions (thanks!)
- Suggested to Ramana to implement vcond with mixed types, created a
blueprint: https://blueprints.launchpad.net/gcc-linaro/+spec/vcond-with-mixed-types
- Vectorizer:
- updated vectorizer's webpage
- updated vectorizer's wiki page
- the usual maintenance
- Committed upstream two SLP data-ref analysis improvements: PR 50730
and PR 50819
Ira
Hi there. Connect is just around the corner. Have a look at:
https://wiki.linaro.org/MichaelHope/Sandbox/Q4.11Plans
for a summary of the toolchain sessions and hacking topics.
It would be great to have kernel and OCTO input in the ARM STM driver,
Kernel debugging, and KVM sessions.
-- Michael
Hi Folks,
Draft agenda for the performance meeting next week at Connect -
https://blueprints.launchpad.net/gcc-linaro/+spec/linaro-toolchain-performa…
Are there any topics that people would like to bring up during this
meeting other than the ones listed here ? I suspect that we'll
probably just have about 10-15 minutes for a topic in this case. I am
not considering discussing PGO related stuff in this session given
that we've got another session in which we can discuss this.
Thoughts ?
cheers
Ramana
Hi Folks,
I've been trying to capture what we want to do in terms of hacking
time and some of the performance related backlog that we have in the
system. I have done so here.
https://wiki.linaro.org/RamanaRadhakrishnan/Sandbox/Q411ConnectGCCPerfPlan
I'm on vacation tomorrow but should be picking up email for some time
during the day.
Thoughts are welcome on what else we could be doing in this area, or
on whether there's a better way we could use our hacking time.
cheers
Ramana
---
At the moment ARM eglibc doesn't support the functions declared
in ucontext.h: getcontext(), setcontext(), swapcontext() and
makecontext(). Instead you get implementations which always
fail and set errno to ENOSYS.
QEMU uses these functions to implement coroutines. Although there
is a fallback implementation in terms of threads, there are reasons
why using the fallback is suboptimal:
* its performance is worse
* it will be less tested, because x86_64 and i386 both implement
the ucontext functions and so QEMU on those hosts will be using
different code paths
* I'm not aware of a good way at configure time to detect whether
getcontext() et al will always fail without actually running a
test binary, which won't work in a cross-compile setup. (If eglibc
just didn't provide the functions at all this would be much
simpler...)
We're going to care about performance and reliability of QEMU on
ARM hosts as we start to support KVM on Cortex-A15, so it would
be good if we could add ucontext function support to eglibc as
part of that effort.
Opinions? Have I missed some good reason why there isn't an
ARM implementation of these functions?
(I'm aware that the ucontext functions have been removed from
the latest version of the POSIX spec; however AFAIK there's no
equivalent functionality that replaces them so I think they're
still worth having implementations of for parity with other
architectures.)
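For illustration, the failure mode described above can at least be detected at run time. The sketch below is a hypothetical helper (`ucontext_usable` is not a QEMU or eglibc function) that calls getcontext() once and checks for ENOSYS. It does not solve the configure-time problem, since it has to run on the target.

```c
#include <errno.h>
#include <ucontext.h>

/* Probe whether the C library's getcontext() really works.
 * Returns 1 if it succeeds, 0 if it is a stub failing with ENOSYS,
 * and -1 on any other failure. */
int ucontext_usable(void)
{
    ucontext_t ctx;

    errno = 0;
    if (getcontext(&ctx) == 0)
        return 1;
    return errno == ENOSYS ? 0 : -1;
}
```

A program could call such a probe once at start-up and pick the thread-based fallback when it returns 0, at the cost of carrying both implementations in the binary.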
-- PMM
=== Progress ===
* Some upstream patch review.
* Spent time looking at LP 836588 which is a case where CSE removes a
particular label access in one case but doesn't remove it from the
list of things in the constant pool which is quite bizarre. Will
probably need some help with looking into this one.
* Sent out vcvt.f32 and vcvt.f64 patches.
* Connect preparation - laptop cleanup and getting it finally onto an
x86_64 distribution.
* Looked at some of the vec_perm / vec_rev cases in Neon with Ira.
* Spent some time looking at some of Andrew's issues with generic-v7a
tuning especially the cases where it was doing better and gave some
suggestions.
=== Plans ===
* Prepare for Connect.
* Prepare by looking at some of the large differences between
various comparative benchmarks.
* Some research into PGO related stuff.
* Try to upstream some more of my patches in the backlog before the
end of the week.
* Finish off some internal paperwork.
* I'm off on 26th - Wednesday.
Absences.
* 26th Oct - Day off.
* 31st Oct - 4th Nov - Linaro Connect Q4.11
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
Continued looking at my constant reuse optimization. I've identified a
couple of hundred optimization opportunities in the whole of gcc itself,
which is fewer than I had hoped. There are almost no opportunities when
compiling for size as constants are always loaded from a constant pool
in that case (I'm not sure why that's the case, given that this isn't
any more space efficient than movw+movt, unless it can share the
constant in more than one place).
Backported my -mtune=native patch to Linaro GCC.
Backported my generic tuning patch to Linaro GCC.
Backported my pr50717 patch to Linaro, and pushed to Launchpad for testing.
Analysed the benchmark results I gathered to aid generic tuning.
Disappointingly the A8/A9 tuning is not as beneficial as one would like.
In fact, the existing generic tuning patch (which was supposed to be a
framework only) is actually quite competitive and gives better
performance in some cases.
Set more benchmarks running, this time with NEON enabled. That's about
36 hours' worth on A9, and more like 90 hours on my A8 (obviously,
there's some difference in the clock speeds there).
Discovered that my native tuning code won't compile with a C++ compiler
(GCC Bugzilla PR50809). Tested and committed a fix upstream.
== GDB ==
* Worked on support for cross-platform core file generation.
Posted initial set of patches for comments.
* Created "Toolchain support for kernel debugging" blueprint.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
I made some progress on transforming the hacks I did to get libunwind
working on Android into proper patches that can go upstream. Things learned:
* bionic employs OpenBSD header files that therefore lack some GNU and
ARM specific defines (only small fix needed - plan is to change
libunwind to work with non-patched bionic too)
* Android basically provides all the functionality that is required
for libunwind-ptrace - but...
* no one seems to build libunwind with remote unwind functionality
(including libunwind-ptrace) only
* most of the issues can be avoided by changing libunwind to be more
portable
Regards
Ken
* Still having trouble with using multistrap/pdebuild-cross for
cross-compiling Firefox - it looks like only x86 packages get downloaded,
not armel. I have asked Wookey for advice, and he will try to reproduce the
build.
* Falling back to native compiling until the cross-compiling set up has been
sorted out. I will now take a look at how to pass different compiler options
to the Mozilla build system and how to build different parts of the program.
Best Regards
Åsa
(short week: 4 days)
RAG:
Red:
Amber:
Green: blog started :-) http://translatedcode.wordpress.com/
Current Milestones:
|| || Planned || Estimate || Actual ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
== other ==
* upstream patch review, putting together pull requests
* more time spent on qemu on ARM host apparent memory corruption
bug (no luck yet :-(); found a Valgrind bug in the process,
though (KDE:284472). This ate up way too much of this week.
* A15 KVM planning work
* meetings etc
* moved over to patches.linaro for QEMU patch tracking
-- PMM
Hi,
* widening shifts - finally committed upstream
* SLP loads with different offsets and operand swaps - committed upstream
* SLP with multiple types - merged to gcc-linaro-4.6
* vectorizer stuff: patch review, test fixes, discussions, bug fix
* Ramana and I discussed what can be done with VEC_PERM_EXPR for NEON,
and created https://blueprints.launchpad.net/gcc-linaro/+spec/support-vec-perm
for this issue.
Ira
Following on from last night's performance call, I had a look at how
64 bit integer operations are mapped to NEON instructions. The
summary is:
* add - fine
* subtract - fine
* bitwise and - fine
* bitwise or - fine
* bitwise xor - fine
* multiply - can't as the instruction tops out at 32 bits. Might be
able to compose using VMLAL
* div, mod - no instruction
* negate - instruction tops out at 32 bits, but could be turned into
vmov #0, vsub
* left shift constant - missing
* right shift constant - missing
* right arithmetic shift constant - missing
* left shift register - missing
* right shift register - tricky, as you do this as a left shift by the negated register
* not - no instruction, but could be done through a vceq, #0?
* bitwise not - missing
I also noticed that the replicated constants aren't being used. A
pre-increment is load constant pool; vadd but could be done as a vmov,
#-1; vsub. The same with pre-decrement - it could be done as a vmov,
#-1; vadd.
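The compositions suggested above can be modelled in plain scalar C to check the arithmetic. This is only a sketch of the identities (a 64-bit multiply built from 32x32->64 products, negate as 0 - x, and increment/decrement using an all-ones constant); the function names are made up for illustration, and nothing here claims which NEON instructions a compiler would actually emit.

```c
#include <assert.h>
#include <stdint.h>

/* 64x64->64 multiply composed from 32x32->64 partial products, the
 * shape a widening multiply/multiply-accumulate would provide. */
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);
    uint64_t low = (uint64_t)al * bl;
    /* Only the low 32 bits of the cross terms survive the shift, so
     * wrap-around in this sum is harmless. ah*bh falls entirely
     * outside the low 64 bits and is dropped. */
    uint64_t cross = (uint64_t)al * bh + (uint64_t)ah * bl;

    return low + (cross << 32);
}

/* Negate as "load zero, subtract". */
uint64_t neg64_via_sub(uint64_t x)
{
    return (uint64_t)0 - x;
}

/* Increment as "load -1, subtract": x - (-1) == x + 1. */
uint64_t inc64_via_sub(uint64_t x)
{
    return x - UINT64_MAX;      /* UINT64_MAX is the all-ones pattern, i.e. -1 */
}

/* Decrement as "load -1, add": x + (-1) == x - 1. */
uint64_t dec64_via_add(uint64_t x)
{
    return x + UINT64_MAX;
}

/* Compare the composed versions against the native operators. */
void check_identities(uint64_t a, uint64_t b)
{
    assert(mul64_from_32(a, b) == a * b);
    assert(neg64_via_sub(a) == (uint64_t)-a);
    assert(inc64_via_sub(a) == a + 1);
    assert(dec64_via_add(a) == a - 1);
}
```

All arithmetic is unsigned, so wrap-around is well defined and the identities hold for every input, including 0 and the all-ones value.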
This seems worth blueprinting.
-- Michael
limits-fndefn.c takes an impressively long time to run. On an idle
machine, -O3 -g -c takes 17:31 and -O2 -g -c takes The test already
has a dg-timeout-factor of 4 giving a total timeout of 20 minutes.
Removing the -g brings this down to 30 s. Keeping the -g and adding
-fno-var-tracking brings this down to 45 s.
We could bump the multiplier up to 8 but it's getting a bit
ridiculous. Any thoughts?
-- Michael
== Last week and today ==
* Backported fix for returning std::pair<bool, bool>. Unfortunately
this showed up a regression on 4.5. I couldn't reproduce it cross,
and the testcase itself looks innocuous, so I'm wondering whether
the patch might trigger a miscompilation of cc1plus.
* Committed SMS register-scheduling patches upstream and backported
to Linaro 4.6.
* Most of the week spent on -fsched-pressure. Still trying a few
variations in order to get the right balance. (My local haifa-sched.c
now has about 20 new toggles.) Still feel like I'm making progress,
rather than hitting the point of diminishing returns.
Hope Connect goes well. See everyone in a few weeks' time.
Richard
Completed the 4.5 and 4.6 FSF to Linaro merges.
Spun the Linaro GCC release tarballs, uploaded them to the test farm,
and set off the test builds.
Continued looking at the constant reuse optimization. This time I've
built GCC itself with the new pass to see how many optimization
opportunities there are. This shook out a lot more small bugs, which was
useful.
Backported my negative-shifts patch to Linaro 4.6, pushed it to
Launchpad for testing, and then committed it to 4.6 once it was approved.
Experimented with running SPEC2K on A8 and A9 boards in order to
establish a baseline for the generic tuning tweaks. A short test doesn't
give much clue as to what can be achieved, and a long test takes way too
long. The problem is also complicated by the benchmarks where the A8
tuning works better on A9 than A9 tuning does. :S
Received a bug report (GCC bugzilla 50717) for my widening multiplies
patches. Analysed the problem, developed a patch, and posted it to
gcc-patches.
<Short week with 2 days gone on an internal training course>
=== Progress ===
* Some patch review.
* Spent time looking at LP 836588.
* Tried some different approaches for the vcvt.f64.s32 case and it
looks like the simple solution is the best one unfortunately :(
* 2 days off at internal training course.
=== Plans ===
* continue looking into LP 836588
* Patch review week.
* Work on getting vcvt.f* case done and finish some of the backlog.
Absences.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked -
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
I've just tried rerunning some benchmarks on my panda, which I
reinstalled recently and am getting
some odd behaviour:
The kernel is 3.0.0-1404-linaro-lt-omap
For example:
simple_strlen: ,102400, loops of ,62, bytes=6.054688 MB, transferred
in ,20324707.000000 ns, giving, 297.897898 MB/s
simple_strlen: ,102400, loops of ,32, bytes=3.125000 MB, transferred
in ,7904053.000000 ns, giving, 395.366782 MB/s
simple_strlen: ,102400, loops of ,16, bytes=1.562500 MB, transferred
in ,7354736.000000 ns, giving, 212.448142 MB/s
simple_strlen: ,102400, loops of ,8, bytes=0.781250 MB, transferred in
,91553.000000 ns, giving, 8533.308575 MB/s
simple_strlen: ,102400, loops of ,4, bytes=0.390625 MB, transferred in
,1495361.000000 ns, giving, 261.224547 MB/s
simple_strlen: ,102400, loops of ,2, bytes=0.195312 MB, transferred in
,1983643.000000 ns, giving, 98.461518 MB/s
Note the 8-byte one is apparently 40 times faster; and for true oddness:
smarter_strlen_ldrd: ,102400, loops of ,62, bytes=6.054688 MB,
transferred in ,3936768.000000 ns, giving, 1537.984331 MB/s
smarter_strlen_ldrd: ,102400, loops of ,32, bytes=3.125000 MB,
transferred in ,0.000000 ns, giving, inf MB/s
smarter_strlen_ldrd: ,102400, loops of ,16, bytes=1.562500 MB,
transferred in ,4180909.000000 ns, giving, 373.722557 MB/s
Now, while I like infinite transfer rates, I suspect they're wrong.
Anyone else seeing this?
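For reference, the MB/s figures in the log above are consistent with bytes / 2^20 divided by elapsed seconds (102400 loops of 62 bytes is 6348800 bytes, i.e. 6.054688 MB). A sketch of that calculation with a guard for the 0 ns case follows; `now_ns` and `mb_per_s` are hypothetical names, and the use of CLOCK_MONOTONIC is an assumption, since the harness's actual timer isn't shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds (assumed clock source). */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Throughput in MB/s, with 1 MB = 2^20 bytes as in the log above.
 * Returns a negative value when the elapsed time is zero, so a bogus
 * timer reading is flagged instead of being printed as "inf". */
double mb_per_s(uint64_t bytes, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0)
        return -1.0;
    return ((double)bytes / (1024.0 * 1024.0)) / ((double)elapsed_ns / 1e9);
}
```

Flagging the zero-elapsed case only makes the symptom visible; it says nothing about why the timer reported no elapsed time.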
Dave
Implementing register pressure estimation in SMS.
Experimenting with the implementation on libav microbench.
Discussed with Richard some issues raised during the implementation.
== GDB ==
* Created and published Linaro GDB 7.3-2011.10 release.
* Fixed LP #871901 (Linaro GDB crashes on 3.x kernels) in
mainline and Linaro GDB 7.3.
* Backported mainline fix for LP #829595 (Separate debuginfo
misidentified with "remote:" access) to Linaro GDB 7.3.
* Completed blueprint "GDB as a cross-debugger".
* Worked on support for cross-platform core file generation.
== GCC ==
* Patch review week.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
== 64 bit atomics ==
* Thanks to Ramana for OKing my gcc patches; and Richard for
committing them - I've backported these to the gcc-linaro branch
and pushed it - hopefully those will pass OK!
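As a small illustration of what a 64-bit atomic buys you, the sketch below bumps a shared 64-bit counter from two threads using GCC's `__sync_fetch_and_add` builtin. It assumes, without the report saying so, that builtins of this kind are what the patches are about; `run_counter_demo` and the constants are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define BUMPS_PER_THREAD 100000

static uint64_t counter;

/* Each thread atomically adds 1 to the shared 64-bit counter. */
static void *bump(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < BUMPS_PER_THREAD; i++)
        __sync_fetch_and_add(&counter, 1);
    return NULL;
}

/* Start just below 2^32 so the additions carry into the upper word,
 * which is where a non-atomic 64-bit update on a 32-bit machine
 * could be torn. Returns the final counter value. */
uint64_t run_counter_demo(void)
{
    pthread_t t1, t2;
    int rc;

    counter = UINT64_C(0xFFFFFFF0);
    rc = pthread_create(&t1, NULL, bump, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, bump, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;     /* safe to read plainly: both threads have been joined */
}
```

With a plain `counter++` instead of the builtin, increments could be lost, and on a 32-bit target the two halves of the value could be updated separately.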
== String routines ==
* Sent my memchr patch to upstream newlib, received comments,
tweaked, and resent
* Sent strlen patch to upstream newlib
* Spent some time getting confused by timing issues on our Panda; it
got reinstalled with 11.09 a few
weeks ago and is now showing some odd behaviours. In particular I'm
seeing some tests show completion in 0ns
(and my code isn't -that- fast!), and others where the times vary
wildly - it's almost as if a timer interrupt is delayed
or missing; my same test binary works fine on one of Michael's Ursas
running an older install.
== QEMU ==
* Tested Peter's QEMU image for release
== Other ==
* Spent an afternoon reading through the System trace docs
On holiday next week; I'll poll email occasionally.
Dave
Hi,
* working through my inbox after being away
* a former patch of mine accidentally broke remote unwinding on IA64
* maintainer made a quick fix that made things worse for ARM
* posted a patch that aims to fix things up for all archs
* identified and submitted libunwind-android patches that could go upstream
* sent patch that lets another testcase pass on ARM Linux
* tried to help people on the mailing list with various issues
Regards
Ken
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
== add-omap3-networking ==
* patches pushed into qemu-linaro and final testing done
== a15-usermode-support ==
* this is now in qemu-linaro 2011.10; only remaining thing is
for the patches to be taken upstream (no major review issues)
== linaro-qemu-11.11 ==
* qemu-linaro 2011.10 released
== other ==
* completed a desk move
* some sw/hw archaeology to answer a question about the FIFO size of
the PL041 on ARM devboards as part of reviewing audio support patch
* trying to track down a really weird problem where qemu on ARM host
dies with apparent memory corruption
* Working on using pdebuild-cross/multistrap when cross compiling Firefox.
With multistrap all dependencies should be sorted out automatically.
The current status is that pdebuild-cross for armel is set up and the
compilation of the Firefox package starts. It does not get all the way
through, though, because some dependencies (for X11) are not in place
after all. Will continue investigation.
Best Regards
Åsa
The Linaro Toolchain Working Group is pleased to announce the
release of Linaro QEMU 2011.10.
Linaro QEMU 2011.10 is the latest monthly release of
qemu-linaro. Based off upstream (trunk) QEMU, it includes a
number of ARM-focused bug fixes and enhancements.
New in this month's release:
- Instructions introduced with the Cortex-A15 (ARM mode
SDIV and UDIV, and the VFPv4 fused multiply-accumulate
instructions VFMA, VFMS, VFNMA, VFNMS) are now supported
in linux-user mode
- Beagle models now support USB networking (run the model with
"-usb -device usb-net,netdev=mynet -netdev user,id=mynet")
Known issues:
- There may be some problems with running multithreaded programs in
linux-user mode (LP:823902)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.10
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
The Linaro Toolchain Working Group is pleased to announce the 2011.10
release of both Linaro GCC 4.6 and Linaro GCC 4.5.
Linaro GCC 4.6 2011.10 is the eighth release in the 4.6 series. Based
off the latest GCC 4.6.1+svn179483, it contains a range of vectoriser
performance improvements and general bug fixes.
Interesting changes include:
* Updates to 4.6.1+svn179483
* Vectorises more straight-line code with data dependencies
* Now picks the best vector width when vectorising straight line code
* Better handling of auto increment addresses in SMS
* Changes the default vector width from double word to quad word
* Better handling of extracting the top or bottom half of a quad word vector
* Now supports the NEON absolute difference instruction
Fixes:
* LP: #689887 ICE in get_arm_condition_code
* LP: #809761 oss4 version 4.2-build2004-1ubuntu1 failed to build on armel
Linaro GCC 4.5 2011.10 is the fifteenth release in the 4.5
series. Based off the latest GCC 4.5.3+svn179438, this is a
maintenance focused release.
Interesting changes in 4.5 include:
* Updates to 4.5.3+svn179438
Fixes:
* LP: #689887 ICE in get_arm_condition_code
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.6-2011.10
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.10
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
More information on the features and issues are available from the
release page:
https://launchpad.net/gcc-linaro/4.6/4.6-2011.10
https://launchpad.net/gcc-linaro/4.5/4.5-2011.10
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? inquire at support(a)linaro.org
-- Michael
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.3.
Linaro GDB 7.3 2011.10 is the third release in the 7.3 series. Based
off the latest GDB 7.3, it includes a number of ARM-focused bug fixes
and enhancements.
This release contains:
* Support for disabling address space randomization in gdbserver
* Fix for GDB crashes on 3.x kernels
* Fix spurious "CRC mismatch" warnings when using "remote:" sysroot
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.3-2011.10
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
Hi,
* Finished a presentation for NEON forum. Revital and Richard kindly
agreed to take a look and gave me some valuable comments. Thanks!
* widen-shifts:
- While preparing the presentation I found some room for improvement
in the pattern detection, so I implemented it. It gave additional 13%
to rgb24tobgr16.
- Ramana suggested a solution on how to check the constant operand of vshll.
Testing these two things on ARM.
* SLP improvements:
- Implemented a patch that swaps operands if necessary to make the
operations isomorphic, and supports loads with different offsets.
Testing it now.
- The three relevant libav loops now get vectorized giving 42%-57% speedup.
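For readers unfamiliar with the widen-shifts pattern mentioned above, here is a minimal sketch of the shape of code involved: 8-bit values are widened to 16 bits and shifted left, which is what a widening shift instruction (vshll on NEON) does in one step. This is not the libav rgb24tobgr16 source; the function name, channel order and 5:6:5 packing are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of packing 24-bit pixels into 16-bit 5:6:5.
   Each 8-bit channel is widened to 16 bits and then shifted left,
   the "widening shift" shape a vectoriser can pattern-match. */
void pack_rgb24_to_565(const uint8_t *src, uint16_t *dst, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        uint16_t c0 = src[3 * i + 0];
        uint16_t c1 = src[3 * i + 1];
        uint16_t c2 = src[3 * i + 2];

        dst[i] = (uint16_t)((c2 >> 3) |
                            ((c1 & 0xFC) << 3) |
                            ((c0 & 0xF8) << 8));
    }
}
```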
Next week's holidays: half days Sunday-Wednesday, and Thursday off.
Ira
Here's my summary from Monday's meeting on the harder parts of binary
toolchains.
Using a 4.6 compiler against a 4.5 based sysroot such as Natty:
* libgcc and libstdc++ are part of the compiler
* The compiler expects features that are in the corresponding runtime
* You can't reliably run or validate against an earlier runtime
The solution is to upgrade the runtime on the sysroot to 4.6. 4.6 is
backwards compatible. Ubuntu did this with Maverick and it caused no
problems, although problems such as Debian #622783 have been seen.
Multiarch:
* The Ubuntu multiarch patch should work with a sysroot
* Multiarch and multilib should work together
Multilib:
* The current ARM multilib rules are old and not very relevant
* Multilib means you need multiple sysroots as well
* Skip multilib for the first release
Other:
* Anything we support in cross we should support native first
* Check that we don't have to directly supply the source that goes
with the binary sysroot
-- Michael
===Progress===
* Out of office for a day.
* Wrote a quick patch to do vcvt.f32.s32 with fractional bits where we
can. Tested no regressions, need to commit this after review upstream.
* Desk move and packing for that.
* Looked at bug report LP 836588; it appears to go away with -fno-gcse.
Needs more digging.
=== Plans ===
* Look at LP 836588.
* Finish auto-inc-dec pipeline scheduling work.
* Clear out some of the old patches (POST_MODIFY_DISP for vfp, BRANCH_COST )
* Settle into new desk.
* Again a short week as I'm away for 2 days for an internal training event.
Meetings:
* 1-1s
* TCWG calls
Absences.
* 5th October - out of office.
* 13th - 14th October - Out of Office - Internal training event.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked -
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
== Last week ==
* Patch review.
* Backported second attempt to fix get_arm_condition_code ICE.
* Worked on -fsched-pressure. Experimented with various combinations
of ideas. This is giving some good results (e.g. a 2x improvement
in libav's put_h264_qpel8_hv_lowpass_8) but needs a bit more work
to fix some outliers.
I'll be away from 18th Oct to 14th Nov.
Richard
Random data for the day: Dave Pigott has installed some new PandaBoard
build machines in the validation lab. They're identical to mine
except that root is on USB Flash instead of NFS, and they have a much
faster flash drive for the build area.
The time taken to bootstrap and test gcc-linaro-2011.09 with C, C++,
Fortran, and LTO is:
* ursa3, ursa4 (Toshiba USB stick): 301 minutes build / 369 test
* ursa2 (no-name USB stick): 324 minutes build / 422 test
* tcpanda (fast USB stick): 274 minutes build / 265 test
So the new combo gives a 1.38 x faster build. I'm surprised as I
thought the build was CPU bound. I'd hate to see what building on an
SD card is like.
Note that /tmp is in RAM, /scratch is ext4, the new boards use
noatime, and the kernel doesn't have the new USB performance fix.
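The mail does not show how the 1.38 x figure was derived. One reading, which is an assumption, is that it compares the combined build plus test time on ursa2 with tcpanda: (324 + 422) / (274 + 265) is roughly 1.38. A small sketch of that arithmetic, with hypothetical helper names:

```c
/* Ratio of an old time to a new time; values above 1.0 mean the
   new setup is faster. */
double speedup(double old_minutes, double new_minutes)
{
    return old_minutes / new_minutes;
}

/* Build + test times in minutes, copied from the list above. */
double ursa2_total(void)   { return 324.0 + 422.0; }
double ursa3_total(void)   { return 301.0 + 369.0; }
double tcpanda_total(void) { return 274.0 + 265.0; }
```

On the same numbers the build-only ratio for ursa2 is 324 / 274, about 1.18, while the test-only ratio is 422 / 265, about 1.59, so most of the gain shows up in the test phase.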
-- Michael
Continue working on estimating register pressure with SMS:
- Discussed current approach with Richard which gave useful leads.
- Started to implement this approach.
- Doing experiments on libav microbench.
The unaligned accesses in libpng are, for the large copies, a bug. Our attempt to align the row buffer to a 16 byte boundary was off-by-one so we end up always mis-aligning it. I've posted a patch on the png-mng-implement list:
http://sourceforge.net/mailarchive/message.php?msg_id=28194444
The time spent in memcpy() is probably an illusion. The data out of zlib gets copied to one row buffer where it is unfiltered (if necessary) then a copy is made in a separate buffer that is only used for the filter handling. If you test using images with large rows (I don't know what pngbench does) the copy buffer may well get flushed out of the second level cache between each row, then the memcpy will stall bringing it back in.
If you have machine level profiling you may see this as a massive time spike on some probably unrelated instruction which just happens to be in the PC when the stall stops everything.
Anyway, I have several ideas of how to avoid the copy when it isn't required.
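As a reminder of what correct 16-byte alignment looks like, here is a minimal sketch of rounding an address up to a 16-byte boundary. It is not the libpng code or the posted patch; the function name is hypothetical and the example only shows the usual add-and-mask idiom where an off-by-one is easy to make.

```c
#include <stdint.h>

/* Round addr up to the next multiple of 16.  An address that is
   already aligned is returned unchanged. */
uintptr_t align_up_16(uintptr_t addr)
{
    return (addr + 15u) & ~(uintptr_t)15u;
}
```

Adding 16 instead of 15 before masking, or applying the mask to the wrong base pointer, still produces a plausible-looking address but not the one intended.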
John Bowler <jbowler(a)acm.org>
-----Original Message-----
From: Glenn Randers-Pehrson [mailto:glennrp@gmail.com]
Sent: Monday, October 03, 2011 1:15 PM
To: PNG/MNG implementation discussion list
Subject: [png-mng-implement] Use of memcpy() in libpng [Fwd from linaro-toolchain list]
Re: Use of memcpy() in libpng
David Gilbert
Tue, 27 Sep 2011 06:20:14 -0700
On 27 September 2011 14:16, Christian Robottom Reis <k...(a)linaro.org> wrote:
> On Tue, Sep 27, 2011 at 09:47:33AM +0100, Ramana Radhakrishnan wrote:
>> On 26 September 2011 21:51, Michael Hope <michael.h...(a)linaro.org> wrote:
>> > Saw this on the linaro-multimedia list:
>> >
>> > http://lists.linaro.org/pipermail/linaro-multimedia/2011-September/
>> > 000074.html
>> >
>> > libpng spends a significant amount of time in memcpy(). This might
>> > tie in with Ramana's investigation or the unaligned access work by
>> > allowing more memcpy()s to be inlined.
>>
>> It's the unaligned access and the change / improvements to the memcpy
>> that *might* help in this case. But that ofcourse depends on the
>> compiler knowing when it can do such a thing. Ofcourse what might be
>> more interesting is the kind of workload analysis that Dave's done in
>> the past with memcpy to know what the alignment and size of the
>> buffer being copied is.
>
> If you guys could take a look at this there is a potential requirement
> for the MMWG around libpng optimization; we could fit this in along
> with other work (possible vectorizing, etc) on that component.
It wouldn't take long to analyse the memcpy calls - life would be easier if we had the test program and some details on things like what size of images were used in these benchmarks.
Dave
Continued work on my constant reuse optimizations. Not too much this
week though. I've now fixed some issues with the ARM size-costs code
that were causing it to wildly over-estimate the cost of a MOVT
instruction. I'll have to post this upstream sometime soon.
Took another look at the shift-amount bug. Discussed the issue with Paul
Brook. I've now fixed the original bug, and fixed the new bug introduced
by Paul's original fix, and committed that upstream. I still need to
backport it to Linaro GCC though, and the latent bug that Richard S
spotted is still being analysed.
Did a merge from FSF 4.5 & 4.6 to Linaro, and pushed them to the Launchpad
branches for testing.
Began work on benchmarking different setups for the generic tuning patches.
I had a lot of trouble trying to set up SPEC2000 though. Hopefully these
issues are now resolved, with some help from Michael, and I have
established some baseline figures on both A8 and A9 to work from.
No progress on native tuning. I'm still waiting for upstream review.
In other news: Mentor's contract with Linaro has now been extended for
another 6 months. :)
== String Routines ==
* Built and tested a newlib with my memchr in - ready to go with a
bit of tidy up.
* Followed up on my eglibc patch submission by a comment suggesting
the use of --with-cpu pointing back at the previous discussion.
== 64 Bit atomics ==
* Updated gcc patch based on Ramana's comments, retested and posted
new version
- Lost half a day to a failing SD card in our panda.
== QEMU ==
* Posted a patch that made one variable thread local using __thread
that fixes multi threaded user mode ARM programs (e.g. firefox); this
seems to have mutated on the list into a patch for more general thread
local support.
Dave
== GDB ==
* Reimplemented patch to disable address space randomization
in gdbserver to respect the "set disable-randomization"
command, and checked it in to mainline and Linaro GDB 7.3
* Worked on support for cross-platform core file generation.
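For context on the address space randomization item above: on Linux, a common way for a debugger to switch randomization off for the program it is about to launch is the personality(2) call with ADDR_NO_RANDOMIZE, made in the child before exec. The report does not describe how the gdbserver patch is implemented, so treat the following as a sketch of the general technique with a hypothetical function name, not as the patch itself.

```c
#include <sys/personality.h>

/* Linux-only sketch: ask the kernel not to randomise the address
   space of programs this process subsequently exec()s.
   Returns 0 on success, -1 if the kernel (or a sandbox) refuses. */
int disable_address_randomization(void)
{
    /* 0xffffffff queries the current persona without changing it. */
    int current = personality(0xffffffff);

    if (current == -1)
        return -1;
    if (personality((unsigned long)current | ADDR_NO_RANDOMIZE) == -1)
        return -1;
    return 0;
}
```

A launcher would typically call this between fork() and exec() so that only the debugged program is affected.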
== GCC ==
* Checked in mainline fix for PR 50305.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
- worked on the RTL part of the widen-shift patch
- backported to linaro 2/3 of the SLP patches, and proposed the third one
- worked on additional SLP improvements:
- swap operands to make statements isomorphic
- support load with offset 1 (after load from 0)
- started working on presentation for NEON forum
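To illustrate the "swap operands to make statements isomorphic" item above, here is a minimal sketch. The two statements compute the same kind of value, but the operands of the second multiply are written in the opposite order, so they only line up lane by lane once one of them is swapped. The function name is hypothetical and the example is an assumption about the shape of code the patch targets, not a testcase from it.

```c
#include <stdint.h>

/* Two statements that differ only in operand order.  Because
   multiplication is commutative, swapping the operands of the second
   statement makes both "load * scalar", which a basic-block
   vectoriser can then treat as one two-lane operation. */
void scale_pair(int32_t *restrict out, const int32_t *restrict in,
                int32_t k)
{
    out[0] = in[0] * k;   /* operands: load, scalar */
    out[1] = k * in[1];   /* operands: scalar, load (swapped) */
}
```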
Upcoming holidays:
Oct 12, Wed - half day
Oct 13, Thu
Oct 16-19, Sun-Wed - half day
Oct 20, Thu
Ira
Hi,
So one of the things Michael pointed out in today's call was that the
ARM backend doesn't generate vcvt.f32.s<type> where you have an idiom
conversion from fixed to floating point as in the example below. I've
chosen to implement this in the following manner in the backend using
these interfaces from real.c . The reason I've chosen to not allow
this transformation in case flag_rounding_math is true is because this
instruction always ends up rounding using round-to-nearest rather than
obeying what's in the FPSCR and thus is not safe for programs that want
to dynamically set their rounding modes.
The benefits are quite obvious in that we eliminate a load from the
constant pool and a floating point multiply, essentially
shaving a floating point multiply plus load latency off these
sequences. This instruction can only write the output into the same
register as the input register which is why I've modelled it as below
by tying op1 into op0.
If there's a simpler way of using the interfaces into real.c then I'm all ears.
Thoughts? I believe such idioms are used in libav, from where the
original report appears to have come, and thus it's a worthwhile gain
where we can have it. Has anyone noticed this in any other places?
I will post upstream as well once I finish testing this patch. I'm
posting this here to get some feedback as well to let anyone who is
really really keen about trying this out have a go given I'm out
tomorrow.
( I took a quick look at the short -> f32 case as well but the fact
remains that loads either zero or sign extend anyway so there's
probably not much gain in modelling that right away and the win really
is in getting rid of that fp mul and the constant pool load. There's
probably some gain in going from i64-> f64 as well so those patterns
need to be written up at some point for completeness )
cheers
Ramana
2011-10-04 Ramana Radhakrishnan <ramana.radhakrishnan(a)linaro.org>
* config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
* config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
* config/arm/constraints.md ("Dt"): New constraint.
* config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
New.
* config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
(*arm_combine_vcvt_f32_u32): New.
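The ChangeLog names a helper that decides whether a floating point constant can be folded into the fraction-bits operand of vcvt. The patch itself is not reproduced here, so the following is only a sketch of the underlying check, written against plain IEEE-754 single precision bits rather than GCC's real.c interfaces. The function name and the accepted range of 1 to 32 fraction bits are assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in, not the GCC function: return n if x is
   exactly 1 / 2^n for some n in 1..32, otherwise 0.
   Assumes float is IEEE-754 binary32. */
int fract_bits_for_reciprocal(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* view the encoding */

    uint32_t sign     = bits >> 31;
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x7FFFFFu;

    /* A power of two is positive with an all-zero mantissa. */
    if (sign != 0 || mantissa != 0)
        return 0;
    /* Biased exponent 127 - n encodes 2^-n; accept n in 1..32. */
    if (exponent < 127u - 32u || exponent > 127u - 1u)
        return 0;
    return (int)(127u - exponent);
}
```

For the two testcases in this mail the divisors are (1 << 11) and (1 << 5), so the multiplier constants are 1/2048 and 1/32 and the check yields 11 and 5, matching the #11 and #5 operands in the patched output.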
For the following testcases I see the code as follows with
-mfloat-abi=hard -mfpu=vfpv3 and -mcpu=cortex-a9
float foo (int i)
{
float v = (float)i / (1 << 11);
return v;
}
float foa_unsigned (unsigned int i)
{
float v = (float)i / (1 << 5);
return v;
}
After patch:
foo:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fmsr s0, r0 @ int
vcvt.f32.s32 s0, s0, #11
bx lr
.size foo, .-foo
.align 2
.global foa_unsigned
.type foa_unsigned, %function
foa_unsigned:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fmsr s0, r0 @ int
vcvt.f32.u32 s0, s0, #5
bx lr
.size foa_unsigned, .-foa_unsigned
.align 2
.global foo1
.type foo1, %function
rather than
.type foo, %function
foo:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fmsr s15, r0 @ int
fsitos s0, s15
flds s15, .L2
fmuls s0, s0, s15
bx lr
.L3:
.align 2
.L2:
.word 973078528
.size foo, .-foo
.align 2
.global foa_unsigned
.type foa_unsigned, %function
foa_unsigned:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fmsr s15, r0 @ int
fuitos s0, s15
flds s15, .L5
fmuls s0, s0, s15
bx lr
.L6:
.align 2
.L5:
.word 1023410176
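The two .word values in the unpatched output are the constant pool entries that the patch removes. Read as IEEE-754 single precision, 973078528 is 0x3A000000, which is 1/2048, and 1023410176 is 0x3D000000, which is 1/32: the reciprocals of (1 << 11) and (1 << 5) that fmuls multiplies by. A minimal sketch for checking this on a host machine (the helper name is hypothetical, and it assumes the host float is IEEE-754 binary32):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit constant-pool word as a single-precision
   float, the way flds would load it. */
float word_as_float(uint32_t word)
{
    float f;
    memcpy(&f, &word, sizeof f);
    return f;
}
```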
* Vacation
Monday, Tuesday, and Wednesday.
* GCC
Continued work on my constant reuse optimizations. Disappointingly, I've
found that there are very few optimization opportunities in EEMBC
(ARM/Thumb V7-A), although it's not difficult to write testcases that
the optimization could improve. I also discovered that the data-flow
chains don't work exactly how I thought (with respect to if-then-else
cases) so I need to do a little more work.
Pinged the native tuning patches; they're still waiting for upstream review.
Still can't get the generic tuning work done as the CodeSourcery panda
boards appear to be still offline.
Committed to mainline the patch to support instructions with auto-inc
operations in SMS after addressing Ayal's comments. The patch contains
two parts; one of them fixes a bug revealed during bootstrapping with
the patch and SMS flags.
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01988.html
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01987.html
Looking at estimating register pressure with SMS: based on previous
discussion with Richard the current approach is to try and use the
register pressure estimation in loop invariant pass.
I'm compiling an application built with TI's DVSDK 3 *[0].
/home/user/ti/dvsdk/dvsdk_3_01_00_10/linuxutils_2_25_02_08/packages/ti/sdo/linuxutils/cmem/lib/cmem.a470MV(cmem.o470MV):(.ARM.exidx+0x0):
undefined reference to `__aeabi_unwind_cpp_pr0'
arm-linux-gnueabi-gcc --version
arm-linux-gnueabi-gcc (Ubuntu/Linaro 4.5.2-5ubuntu2~ppa1) 4.5.2
arm-linux-gnueabi-ld --version
GNU ld (GNU Binutils for Ubuntu) 2.21.0.20110302
Fuller output is here (but it isn't particularly helpful due to TI's RTSC
make system's black-magic)
https://gist.github.com/925674
FYI: the MV in cmem.a470MV stands for MontaVista.
This name is hard-coded somewhere even though it's not being linked against
a MontaVista system.
I believe the 470 means that it should work with ARMv4 through ARMv7, but
I'm not positive.
My googling suggests that this is a toolchain bug and that the best way
around the issue is to create a file which defines the function as a void
dummy and include it.
http://www.codesourcery.com/archives/arm-gnu/msg03604.html
http://comments.gmane.org/gmane.comp.boot-loaders.u-boot/78649
http://www.cs.fsu.edu/~baker/devices/lxr/http/ident?i=__aeabi_unwind_cpp_pr0
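The workaround described above amounts to a one-function source file that is compiled and linked into the application. A minimal sketch follows. It assumes nothing in the program actually needs to unwind through the prebuilt objects, and it hides the missing symbol rather than explaining why the toolchain's own runtime is not supplying it.

```c
/* Workaround stub: the symbol is normally provided by the
   toolchain's runtime support library.  Defining an empty version
   here only satisfies the linker; it performs no unwinding. */
void __aeabi_unwind_cpp_pr0(void)
{
}
```

If the real definition later becomes available at link time, the stub should be removed to avoid a duplicate symbol.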
I have a script that I'll post shortly with instructions as to how to setup
TI's DVSDK with Linaro
AJ ONeal
[0] I'm not using the latest DVSDK version 4 because the paths and such are
so hard-coded for the 2009q3 version of codesourcery on ubuntu 10.04 LTS
that I don't know where to start fixing it.
===Progress===
* Patch review week.
* Looked at bootstrap issue for a while but Richard Sandiford picked
it up and sorted it out (Thanks Richard).
* Fun and games with some paperwork.
* Some backporting and testing patches. (50099 and 50186) underway.
=== Plans ===
* Clear out some of the old patches (POST_MODIFY_DISP for vfp,
BRANCH_COST ) and finish on auto-inc-dec patch from last week.
* Away for 1 day next week.
Meetings:
* 1-1s
* TCWG calls
Absences.
* 5th October -Out of Office.
* 13th - 14th October - Internal training.
* 31st Oct - 4th Nov - Linaro Summit Orlando.
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
== This week ==
* Applied patch for doing NEON high/low extraction using subregs.
Ramana pointed out that we do the same thing for insertion,
so I wrote a patch to handle that too. Both now merged into
Linaro sources.
* Looked at ARM bootstrap problem on trunk. Turned out to be
an aliasing problem. Submitted and applied patch.
* Reworked part of my SMS register-scheduling patch after feedback
from Ayal. Submitted new version upstream.
* Got SPEC2006 running on the powerpc boxes and tested one part
of my -fsched-pressure patch. Bit of a mixed bag. h264ref was
one of the worst sufferers, which was a bit worrying. I think
I'll need to make a third change too.
To recap, there are two pieces now:
1) Make -fsched-pressure honour the DFA
2) Make -fsched-pressure allow values that are live across a
loop to be spilled.
I naively hoped that (1) would be OK on its own, but h264 shows
that the current -fsched-pressure code is very conservative
when it comes to large blocks. It only considers register
deaths once there is a single remaining use; if there are two
unscheduled uses, it assumes that the register remains live
for the rest of the block.
So the problem that (1) was fixing was that -fsched-pressure was too
optimistic in terms of what it could schedule in a cycle. But with
that fixed, we seem to have too many sources of pessimism...
Richard
== GDB ==
* Worked on support for cross-platform core file generation.
* Followed up on patch to support disabling address space
randomization in gdbserver.
== GCC ==
* Followed up on patch for PR 50305.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
* Working on cross-compiling Firefox. Getting dependencies in place and
setting up the configuration file (.mozconfig). My strategy so far has been to
fix one dependency at a time, picking prebuilt packages or building them myself.
Michael told me at yesterday's meeting about multistrap, that could possibly
be used for fixing all dependencies at once. I will look into that next
week.
* During this process I have also spent some time reading up on cross
compilation in general and also on autoconf and the GNU build system.
Best Regards
Åsa
== String routines ==
* Got eglibc testing setup happy at last
- Note that -O3 builds generally seem to give a few more errors
that are probably worth looking at
- -march=armv6 -mthumb hit some non-thumb1 instructions (normally
non-lo registers), again worth looking at
- Cross testing to Qemu user mode often stalls, mostly on nptl
tests that abort/fail when run in system/natively
* Sent new version of eglibc/memchr patch upstream
* Now have working newlib test setup and reference set
- next step is to try adding my memchr there
== Other ==
* Testing a QEmu patch with Peter
* Looking at bug 861296 (difference in mmap layouts)
* Adding a few suggestions to the set of cpu hotplug tests.
* Dealing with the Manchester lab cold.
Short week; back on Monday
Dave
== GDB ==
* Committed hardware watchpoint support for gdbserver to mainline,
including two minor changes resulting from review comments;
backported those fixes to Linaro GDB as well.
* Implemented and tested support for disabling address space
randomization in gdbserver; patch posted for review.
* Investigated support for cross-platform core file generation.
== GCC ==
* Patch review week.
* Posted updated patch for PR 50305.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Arnd Bergmann <arnd(a)arndb.de> wrote on 08/26/2011 04:44:26 PM:
> On Thursday 25 August 2011, Russell King - ARM Linux wrote:
> >
> > Arnd, can you test this to make sure your gdb test case still works,
and
> > Mark, can you test this to make sure it fixes your problem please?
>
> Hi Russell,
>
> The patch in question was not actually from me but from Ulrich Weigand,
> so he's probably the right person to test your patch.
> I'm forwarding it in full to Uli for reference.
Hi Arnd, hi Russell,
sorry for the late reply, I've just returned from vacation today ...
I've not yet run the test, but just from reading through the patch
it seems that this will at least partially re-introduce the problem
my original patch was trying to fix.
The situation here is about GDB performing an "inferior function
call", e.g. via the GDB "call" command. To do so, GDB will:
0. [ Have gotten control of the target process via some ptrace
intercept previously, and then ... ]
1. Save the register state
2. Create a dummy frame on the stack and set up registers (PC, SP,
argument registers, ...) as appropriate for a function call
3. Restart via PTRACE_CONTINUE
[ ... at this point, the target process runs the function until
it returns to a breakpoint instruction and GDB gets control
again via another ptrace intercept ... ]
4. Restore the register state saved in [1.]
5. At some later point, continue the target process [at its
original location] with PTRACE_CONTINUE
The problem now occurs if at point [0.] the target process just
happened to be blocked in a restartable system call. For this
sequence to then work as expected, two things have to happen:
- at point [3.], the kernel must *not* attempt to restart a
system call, even though it thinks we're stopped in a
restartable system call
- at point [5.], the kernel now *must* restart the originally
interrupted system call, even though it thinks we're stopped
at some breakpoint, and not within a system call
My patch achieved both these goals, while it would seem your
patch only solves the first issue, not the second one. In
fact, since any interaction with ptrace will always cause the
TIF_SYS_RESTART flag to be *reset*, and there is no way at all
to *set* it, there doesn't appear to be any way for GDB to
achieve that second goal.
[ With my patch, that second goal was implicitly achieved by
the fact that at [1.] GDB would save a register state that
already corresponds to the way things should be for restarting
the system call. When that register set is then restored in [4.],
restart just happens automatically without any further kernel
intervention. ]
One way to fix this might be to make the TIF_SYS_RESTART flag
itself visible to ptrace, so the GDB could save/restore it
along with the rest of the register set; this would be similar
to how that problem is handled on other platforms. However,
there doesn't appear to be an obvious place for the flag in
the ptrace register set ...
Bye,
Ulrich
== String routines ==
* Having got agreement on ignoring the triplet for picking the
routine, I'm just testing a patch,
but fighting a qemu setup.
* Found the binfmt binding for armeb was wrong (runs the le
version); filed bug with fix in
Dave
==GCC==
Combined report for last 2 weeks -
===Progress===
* Committed conditional compares patch to Linaro GCC 4.6
* Looking at modelling auto-inc-decs better.
* Tried patch for PR19599 and that broke bootstrap with a segfault.
Needs some re-engineering.
* Looked at the latest bootstrap failure on trunk. Still narrowing down.
* Some work on some administrative stuff
* Bit of patch review.
* Went for LLVM dev meeting.
* Release week had a few issues; helped dry-run cbuild spawns of
jobs and I think I now know how to do that.
=== Plans ===
* finish looking at bootstrap failure.
* Finish auto-inc-dec patch.
* some more patch review.
* Send out LLVM dev meeting report.
Absences.
* 5th October - Out of office.
* 13th -14th October - Internal ARM training.
* 31st Oct - 4th Nov - Linaro Summit Orlando
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
(short week: 4 days)
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
== a15-system-mode-planning ==
* now complete: we have generated blueprints/roadmap cards for the TSC
for the various options
== a15-usermode-support ==
* tested udiv/sdiv implementation
* fused mac: rough idea of what needs to be done, need to get all
the fiddly details right
== omap3 upstreaming ==
* rebased and sent pullreq for various outstanding patches
== other ==
* code/design walkthrough for upstream's new memoryregion API
* working on lightning talk for pdsw doughnut session next week
* investigated compile failure building QEMU in thumb mode with debug
enabled (we're trying to use the Thumb framepointer register as a
temporary...)
* meetings: toolchain, standup, 1-2-1
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences (to end of year):
Sep 29-Oct 07, Oct 17, Nov 21, Dec 15-Jan 03: leave
Oct 30-Nov 04: Linaro Connect Q4.11
== This week ==
* Submitted a fix for the performance regression caused by my
arm_comparison_operator patch. Applied upstream after approval
from Ramana (thanks). Will backport to Linaro towards the end
of next week if there are no reported problems.
* Went back to looking at -fsched-pressure. To recap, a colleague
ran SPEC for s390 comparing:
(a) normal -O3 based flags
(b) (a) + -fsched-pressure without my patch
(c) (a) + -fsched-pressure with my patch
(c) got the best geomean result, but there were some individual
tests for which (b) was significantly worse than (a), and for
which (c) only partly closed the gap.
Found one problem. It looks like -fsched-pressure only really
operates on the issue rate and instruction latencies; it doesn't
seem to use the DFA. This seems to be unintentional, and fixing
it showed some nice results.
Also, the -fsched-pressure patch that I wrote at Connect set the
starting pressure based on the set of registers that are both live
on entry to the block _and_ used within the enclosing loop.
This still seems to be a bit too conservative, in that it makes
the scheduler go out of its way to preserve loop invariants,
even if there are too many of them. Experimented with changing
"used" to "defined". This too seemed to be a win.
* Got access to some PowerPC GNU/Linux machines that are suitable
for running SPEC. Set up my account there and got SPEC building.
The idea is to use this to get more cross-target evidence for the
-fsched-pressure submission(s).
* Discussion about the SMS register-scheduling patches after great
feedback from Ayal. While drafting a still-unsent reply justifying
the main part of the patch, I found I was also explaining why another
part of the patch (specifically the prologue/epilogue part) was wrong.
Thought about that a bit today.
* Submitted fix for LP 641126.
== Next week ==
* More SMS register scheduling.
* More -fsched-pressure.
* Hopefully remerge the arm_comparison_operator patch with this week's fix.
Richard
* Working on getting everything in place for cross-compiling Firefox for
ARM. Trying to understand how the configuration script and make file work.
* Working on a test that will run Sunspider and extract the results. The
challenging part is that results are only presented on the page, not e.g.
written to stdout or to file. My approach is to create an HTML file, embed the
page with the test in an iframe, and read out the results when the test is
done.
* Running SPEC2K on the Snowball board. An updated kernel solved the issue
with great variations in the test results. Some test results look a bit
strange, so I will look at what those tests do to see what part of the
system is stressed.
Best Regards
Åsa
Hi,
* widening shifts patch - submitted upstream
* change default vector size patch - submitted to linaro-gcc
* automatic choice of vector size for basic block vectorization - testing
* vectorizer bug fixes
Next week we have New Year holiday on Wednesday (half day) and Thursday.
Ira
The Linaro Toolchain Working Group is pleased to announce the 2011.09
release of both Linaro GCC 4.6 and Linaro GCC 4.5.
Linaro GCC 4.6 2011.09-1 is the seventh release in the 4.6 series. Based
off the latest GCC 4.6.1+svn178681, it contains a range of vectoriser
and core performance improvements as well as fixing a number of
bugs.
Interesting changes include:
* Updates to 4.6.1+svn178681
* Improves performance by making better use of conditional compares
* Improves performance by properly scheduling widening multiplies
* Improves size and speed by improving constant generation in Thumb-2
* Implements support for widening multiplies in the core
* Improves vectorised code by reducing the over-promotion of intermediates
* Improves performance by reducing redundant moves between VFP and ARM
* Finishes off supporting the Android team in integrating Linaro GCC
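For readers unfamiliar with the widening multiply items in the list above, here is a minimal sketch of the C idioms involved: the operands are 32 bits wide but the product is kept at 64 bits. On ARM such code can map to instructions like SMULL and SMLAL, though whether a given compiler build selects them is an assumption here and depends on the target and options. The function names are hypothetical.

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit multiply: both operands are widened before the
   multiply, so the full product is kept. */
int64_t widening_mul(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Widening multiply-and-accumulate into a 64-bit accumulator. */
int64_t widening_mla(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}
```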
Fixes:
* LP: #823548 Can't use -flto with skia
* LP: #823711 libvirt version 0.9.2-4ubuntu8 failed to build on armel
* LP: #827990 internal compiler error: in decode_addr_const, at varasm.c:2632
* LP: #836401 ICE on a | (b << negative-constant)
* LP: #838994 ICE building perl w/ -marm
* LP: #843775 ICE optimizing widening multiply-and-accumulate
Linaro GCC 4.5 2011.09 is the fourteenth release in the 4.5
series. Based off the latest GCC 4.5.3+svn178560, this is a
maintenance focused release.
Interesting changes in 4.5 include:
* Updates to 4.5.3+svn178560
Fixes:
* LP: #823711 libvirt version 0.9.2-4ubuntu8 failed to build on armel
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.6-2011.09
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.09
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
More information on the features and issues are available from the
release page:
https://launchpad.net/gcc-linaro/4.6/4.6-2011.09
https://launchpad.net/gcc-linaro/4.5/4.5-2011.09
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? inquire at support(a)linaro.org
-- Michael
I tried to bootstrap current GCC trunk and our latest gcc-linaro-4.6
in profile guided, link time optimisation, and SMS modes. The results
are here:
https://wiki.linaro.org/MichaelHope/Sandbox/PGOLTOSMSStatus1
Short story: you can't bootstrap in LTO or PGO on ARM as they run out
of memory. i686 LTO is broken on trunk and gcc-linaro-4.6. SMS is
fine in general.
I'll run these once a week and keep an eye on them. Using -fwhopr instead
of -flto may help on ARM. I don't know why the PGO build runs out of
memory.
-- Michael
Måns pointed me at the IDCT throughput test that's included with
libav. I've written up a page on how to build and run it at:
https://wiki.linaro.org/MichaelHope/Sandbox/LibAvDCT
Included are results with and without the vectoriser. In all cases
the vectoriser improves things, including increasing the SIMPLE-C
version by 11 % and the peak by 17 %.
The coefficient of variation is low so the results are consistent. I
haven't investigated the benchmark itself to see if it's valid - we
could be vectorising the loop overhead instead of the IDCT itself.
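
As a rough illustration of the consistency check mentioned above, here is a
minimal sketch of computing a coefficient of variation (sample standard
deviation divided by the mean) and a relative gain. The throughput numbers
are made up for the example and are not the results from the wiki page; the
function names are hypothetical.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean of the samples."""
    return statistics.stdev(samples) / statistics.mean(samples)


def percent_change(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Hypothetical throughput figures from repeated runs (NOT measured data).
without_vectoriser = [100.0, 101.0, 99.0, 100.5, 99.5]
with_vectoriser = [111.0, 112.0, 110.0, 111.5, 110.5]

cv = coefficient_of_variation(without_vectoriser)
gain = percent_change(statistics.mean(without_vectoriser),
                      statistics.mean(with_vectoriser))
print(f"CV = {cv:.2%}, gain = {gain:.1f} %")
```

A gain is only worth reporting when it is clearly larger than the
run-to-run variation, which is what a low coefficient of variation suggests.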
-- Michael
Please coordinate with Jon Masters at RedHat/Fedora and Adam Conrad at
Ubuntu/Debian on this. (Cc'ing the cross-distro list, through which the
recent ARM summit at Linux Plumbers was organized.)
Cheers,
- Michael
On Sep 16, 2011 8:41 AM, "David Gilbert" <david.gilbert(a)linaro.org> wrote:
> OK, so we seem to have agreement here that what we want is autodetect
> for eglibc and
> forget about the triplet; well technically that probably makes my life
> easier, and I don't
> think it's too hard a sell.
>
> Dave
* Linaro GCC
Spun 4.5 and 4.6 2011.09 GCC release tarballs. Uploaded them to
Michael's server, and kicked off the tests.
Continued work on my new constant optimization experiments. I now have
it tracking all the constants and am looking at how to detect the
optimization opportunities. So far it only calculates how expensive it
would be to generate a value by adding to an existing constant, which is
a start at least. I'm having difficulties detecting whether changing an
insn will make its parent (dependency-wise) obsolete or not (and
therefore whether to count its costs). There's no problem for
instructions that overwrite an entire register, but ones that write to
portions of registers (such as MOVT) make more complex dependency
chains, and the def-use chains don't seem to be sorted into the order of
use.
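
To make the "adding to an existing constant" idea concrete, here is a small
sketch in Python. It is not the GCC code being described. It assumes
ARM-mode data-processing immediates (an 8-bit value rotated right by an even
amount) and a toy cost model of one instruction for MOV/MVN/MOVW and two for
a MOVW + MOVT pair; all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def is_arm_immediate(value):
    """True if value is an 8-bit quantity rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 0x100:
            return True
    return False


def cost_from_scratch(value):
    """Toy model (an assumption): instructions needed to build value alone."""
    value &= MASK32
    if is_arm_immediate(value) or is_arm_immediate(~value):
        return 1  # single MOV or MVN
    if value <= 0xFFFF:
        return 1  # single MOVW
    return 2      # MOVW + MOVT


def cost_from_existing(value, known):
    """1 if one ADD or SUB immediate turns known into value, else None."""
    diff = (value - known) & MASK32
    if is_arm_immediate(diff) or is_arm_immediate(-diff):
        return 1
    return None


target, already_loaded = 0x12345678, 0x12345600
print("from scratch:", cost_from_scratch(target))
print("from existing:", cost_from_existing(target, already_loaded))
```

The sketch ignores the harder part described above: whether reusing a
constant actually makes the original defining instruction dead.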
* Other
Half day vacation on Thursday.
* Added testcases to Richard's micro benchmarks taken from libav.
* Discussed with Ayal the new version of the patch to support
instructions with REG_INC_NOTE in SMS which causes bootstrap failure.
I intend to debug the bootstrap failure in order to find the cause for it.
(http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01216.html)
== String routines ==
* Tidying up bits of cortex strings for the release process
* Nailing down the behaviour of config.sub and the config systems in
gcc, binutils and eglibc
== Other ==
* A discussion on synchronisation primitives on various CPUs that
started on the gcc list
- looking at http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
- pointing out the 64bit instructions
- asking why they used isb's when neither the kernel nor gcc use
them (answer: the DMBs should be fine as well, but there is some
debate over which is quicker; also, DMBs are converted to slower
DSBs on most A9s due to an erratum).
* Looking for docs on the non-core bits of current SoCs
* Extracting some denbench stats from a few months back for Ramana
About a day of non-Linaro IBM stuff.
Dave
RAG:
Red:
Amber:
Green:
NB: since qemu-linaro releases demonstrably go out on schedule
every month I'm dropping them from the milestone tables in these
reports, in favour of blueprint completion dates (usually they'll
be planned for dates coinciding with a qemu-linaro release).
Current Milestones:
|| || Planned || Estimate || Actual ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
== linaro-qemu-11.11 ==
* completed this month's release
== add-omap3-networking ==
* investigated why qemu's usb-net model didn't work on the beagle;
this was due to a bug in the usb-ohci controller model; patches
sent upstream. Fix will go into qemu-linaro 2011-10.
== a15-system-mode-planning ==
* wrote up options and my suggestion on the wiki:
https://wiki.linaro.org/PeterMaydell/QemuA15
* just need to discuss with Michael and turn this into a roadmap entry
== a15-usermode-support ==
* complete but untested implementation of UDIV and SDIV
* fused multiply-accumulate: implemented decode, and the special-cases
parts of the softfloat implementation (NaN, inf, etc); the difficult
bit of actually implementing the operation remains
== other ==
* meetings: toolchain, pdsw doughnuts, AFDS
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences (to end of year):
Sep 19, Sep 29-Oct 07, Oct 17, Nov 21, Dec 15-Jan 03: leave
Oct 30-Nov 04: Linaro Connect Q4.11
As mentioned on the standup call this morning, I've been trying to get my head
around the way different parts of the toolchain use the config scripts and the
triplets. I'd appreciate some thoughts on what the right thing to do is,
especially since there was some unease at some of the ideas.
My aim here is to add an armv7 specific set of routines to the eglibc
ports and get this picked up
only when eglibc is built for armv7; but it's getting a bit tricky.
eglibc shares with gcc and binutils a script called config.sub (that
lives in a separate repository)
which munges the triplet into a $basic_machine and validates it for a
set of known
triplets.
So for example it has the (shell) pattern:
arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb]
to recognise triplets of the form arm- armbe- armle-
armel- armeb- armv5- armv5l- or armv5b-
It also knows more obscure things such as if you're configuring for
a netwinder it's an armv4l- system running linux - but frankly most
of that type of thing are a decade or two out of date. Note it doesn't
yet know about armv6 or armv7.
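
As a quick way to see what that pattern accepts, here is a small sketch
that checks machine names against the alternatives quoted above using
Python's fnmatch. It only models that one quoted fragment, not the rest of
config.sub, and the function name is made up for the example.

```python
from fnmatch import fnmatchcase

# The alternatives from the quoted shell pattern, one glob per entry.
ARM_PATTERNS = ["arm", "arm[bl]e", "arme[lb]", "armv[2345]", "armv[345][lb]"]


def recognised(machine):
    """True if machine matches any alternative in the quoted pattern."""
    return any(fnmatchcase(machine, pattern) for pattern in ARM_PATTERNS)


for name in ["arm", "armbe", "armle", "armel", "armeb",
             "armv5", "armv5l", "armv5b", "armv6", "armv7"]:
    status = "recognised" if recognised(name) else "rejected"
    print(f"{name:8} {status}")
```

Names such as armv6 and armv7 fall through every alternative, which is the
gap described above.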
eglibc builds a search path that at the moment includes a path under
the 'ports' directory of the form
arm/eabi/$machine
where $machine is typically the first part of your triplet; however
at the moment eglibc doesn't have any ARM version specific subdirectories.
If I just added a ports/sysdeps/arm/eabi/armv7 directory it wouldn't
use it, because it searches in arm/eabi/arm if configured with the
triplet arm-linux-gnueabi.
--with-cpu sets $submachine (NOT $machine), so if you pass --with-cpu=armv7
it ends up searching
arm/eabi/arm/armv7
if you used the triplet arm-linux-gnueabi. If you had a triplet like
armel then I think
it would be searching
arm/eabi/armel/armv7
So my original patch (
http://old.nabble.com/-ARM--architecture-specific-subdirectories,-optimised…
)
did the following:
* Modified the paths searched to be arm/eabi (rather than arm/eabi/$machine)
* If $submachine hadn't been set by --with-cpu then autodetect it
from gcc's #defines
which meant that it ignored the start of the triplet and let you
specify --with-cpu=armv7
After some discussion with Joseph Myers, he's convinced me that isn't
what eglibc is expecting (see later in the thread linked above); what it
should be doing is that $machine should be armv7, and $submachine should be
used if we wanted, say, a cortex-a8 or cortex-a9 specific version.
My current patch:
* adds armv6 and armv7 to config.sub
* adds arm/eabi/armv7 and arm/eabi/armv6t2 and one assembler
routine in there.
* If $machine is just 'arm' then it autodetects from gcc's #defines
* else if $machine is armv.... then that's still $machine
So if you use:
a triplet like arm-linux-gnueabi it looks at gcc and if that's configured
for armv7-a it searches arm/eabi/armv7
a triplet like armv7-linux-gnueabi then it searches arm/eabi/armv7
irrespective
of what gcc was configured for
a triplet like armv7-linux-gnueabi and --with-cpu=cortex-a9 then it searches
arm/eabi/armv7/cortex-a9 then arm/eabi/armv7
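
The three cases above can be sketched as a small Python function. This only
models the behaviour described in this mail, not eglibc's actual configure
logic; the function and parameter names are hypothetical, and gcc_arch_dir
stands in for whatever directory name the autodetection would derive from
gcc's #defines.

```python
def sysdeps_search_dirs(triplet, with_cpu=None, gcc_arch_dir="arm"):
    """Return the ARM-specific directories searched, most specific first.

    gcc_arch_dir is the directory the gcc autodetection would pick; the
    default of "arm" when nothing is detected is an assumption.
    """
    machine = triplet.split("-", 1)[0]
    if machine == "arm":
        # A plain 'arm' triplet: defer to what gcc is configured for.
        machine_dir = gcc_arch_dir
    else:
        # armv... in the triplet wins, whatever gcc was configured for.
        machine_dir = machine
    root = f"arm/eabi/{machine_dir}"
    dirs = []
    if with_cpu:
        # --with-cpu sets $submachine, searched before the $machine dir.
        dirs.append(f"{root}/{with_cpu}")
    dirs.append(root)
    return dirs


print(sysdeps_search_dirs("arm-linux-gnueabi", gcc_arch_dir="armv7"))
print(sysdeps_search_dirs("armv7-linux-gnueabi", gcc_arch_dir="armv6t2"))
print(sysdeps_search_dirs("armv7-linux-gnueabi", with_cpu="cortex-a9"))
```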
As far as I can tell gcc ignores the first part of the triplet, other than
noting it's arm and spotting if it ends with b for big endian; (i.e.
configuring gcc with armv4-linux-gnueabi and armv7-linux-gnueabi
ends up with the same compiler).
binutils also mostly ignores the first part of the triplet, although it is
a bit of a mess, with different parts parsing it differently (it seems to
spot arm9e for some odd reason); as far as I can tell gold will accept
armbe* for big endian whereas ld takes arm*b!
If you're still reading, then the questions are:
1) Does the approach I've suggested make sense? In particular, the
machine directory chosen is based either on the triplet or, where the triplet
doesn't specify it, on the configuration of gcc; that's my interpretation of
what Joseph is suggesting.
2) Doing (1) would seem to suggest I should give config.sub armv6t2 and
some of the other complex names.
Dave
== This week ==
* Reviewed patches for the release.
* ...then broke the release. Tried to spin a new one.
* Worked on a "real" fix for bug 850099. Now in testing.
* Looked more at auto-inc-dec stuff. Saw a case that didn't behave
as I expected on the A9. The A9 TRM doesn't describe what happens
for post-indexed addressing, so I asked Ramana. Apparently the
behaviour is expected. Once I have more info, I'll try to update
the patches.
* Worked on neon-highlow-extract and neon-strided-load-extract.
Posted the three patches upstream. Nicely, the one I thought
was going to be the most controversial actually got positive
feedback from Paolo (who wrote the affected code).
Richard
== GDB ==
* Completed hardware watchpoint support for gdbserver.
* Tracked down watchpoint resource accounting regression
on GDB mainline (not present in 7.3).
* Created and published Linaro GDB 7.3-2011.09 release.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
* Running SPEC2K on the Snowball board. A fresh kernel with HIGHMEM enabled
made it possible to run the tests. Large variations in the results indicate
that something strange is going on. Turning off one of the CPUs gives
stable (but slow) results, so my current guess is that the variations are
caused by a known bug that makes one CPU run slower.
http://igloocommunity.org/bugzilla3/show_bug.cgi?id=1
The patch for this bug was not included in my kernel. Will have another go
with a kernel where the patch is included, as a background activity.
* Planned and started working on the "Adding browsing benchmarks to our
current set of tests"-activity. I will try to keep documentation up to date
here:
https://wiki.linaro.org/AsaSandahl/Sandbox/BrowsingBenchmarks
Experimenting with building Firefox in different ways, so far for x86.
Best Regards
Åsa
(bouncing to linaro-dev as it's generally interesting)
On Fri, Sep 16, 2011 at 8:17 AM, Ramana Radhakrishnan
<ramana.radhakrishnan(a)linaro.org> wrote:
> Hi,
>
> I've been looking at some of the perf regressions we've been seeing
> these days in an attempt to understand what's going on in these cases.
> While I can use perf and get more statistics and do other things to
> figure out why there are perf regressions between 2 binaries along
> with perf record and report, I wonder if it is possible to use u-boot
> to accurately measure what's going on. I would like to try and get the
> values of the performance counters between 2 program points .
>
> I am aware that there are patches floating around that allow
> users to set and reset the PMU counters by allowing user level access
> to them in the kernel: while that may be useful to some, I'm not sure
> I want to take a chance on some other process getting scheduled
> during the measurement, even if there are parts of the kernel that
> save and restore per-process PMU counters across
> context switches. I'm looking for as accurate measurements as
> possible in this case and I wonder if u-boot is the best bet for this
> ( in the absence of any dedicated hardware debug / trace unit) given
> not all of us have one.
>
>
> At the minimum to do this I believe we require u-boot or some start-up code to:
>
> * Turn on i-cache and d-cache. ( The current u-boot for panda that I
> get from the linaro-uboot git repo
> git://git.linaro.org/boot/u-boot-linaro-stable.git says "Warning
> Caches turned off" when starting up ). Googling around I find a few
> patches floating around that turn on the d-cache in August from Aneesh
> at TI . We should consider getting these in at some point.
>
> * Looking in $(UBOOT_TOP)/examples/api I see that there are simple
> printf routines and simple stand-alone applications that exist which
> could be used for this purpose. The one problem with this is the fact
> that u-boot appears to require use of -ffixed-r8 for its purposes
> which *might* mean we need these if we were to use API calls into
> standard u-boot functions .
I wonder if R8 is used in the current ARM version? There's no reason
we can't cherry pick parts such as the serial I/O out into a library
and make the app completely self contained. Skip all of the
initialisation stuff and assume the boot loader has done it for you.
> * Turn on / off speculative prefetching - I believe the kernel does
> this already for a few boards, but could this be done in u-boot just
> before it launches a test application ?
>
> * Turn on the VFP and Neon units.
>
> * Turn on unaligned access so that unaligned accesses are allowed in
> the test applications. GCC will now move towards generating unaligned
> accesses on versions of the architecture that support it, the patches
> upstream have now been approved.
>
> * Memory map / linker scripts to make sure we are putting things in
> the right places (sigh, has to be per-board).
But everything goes in RAM so you have one generic linker script and a
per board MEMORY definition. Similar to:
http://bazaar.launchpad.net/~stm32f-dev/stm32f-dev/stm32f-startup/view/head…
...but even lighter.
> We then write a set of library functions that could then look at what
> performance counters are of interest to us and track them by resetting
> them to 0 and making sure they haven't overflowed.
>
> Has anyone else in the group played with u-boot before or has any
> thoughts in this direction ? I am not suggesting that we do this work
> right now but it sounds like an interesting thought of where we can
> get to with this.
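
The reset-and-check-overflow bookkeeping described in the quoted mail can
be sketched as follows. This is only a simulation in Python: FakeCounter is
a made-up stand-in for a hardware counter, and the 32-bit width is an
assumption rather than something stated in the thread.

```python
COUNTER_BITS = 32  # assumption: the counters are 32 bits wide
COUNTER_MASK = (1 << COUNTER_BITS) - 1


class FakeCounter:
    """Illustrative stand-in for one hardware performance counter."""

    def __init__(self):
        self.value = 0
        self.overflowed = False

    def reset(self):
        """Zero the counter and clear its overflow flag."""
        self.value = 0
        self.overflowed = False

    def advance(self, events):
        """Simulate events being counted, wrapping like real hardware."""
        total = self.value + events
        if total > COUNTER_MASK:
            self.overflowed = True
        self.value = total & COUNTER_MASK


def measure(counter, workload):
    """Reset, run the workload, and reject the result if the counter wrapped."""
    counter.reset()
    workload()
    if counter.overflowed:
        raise OverflowError("counter wrapped; measurement is not usable")
    return counter.value


counter = FakeCounter()
print(measure(counter, lambda: counter.advance(1000)))
```

On real hardware the reset, read, and overflow check would be register
accesses in the start-up code or library; only the control flow is shown
here.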
My worry is that we miss turning on a feature and get results that
aren't representative. That should be easy enough to check by
baselining against a Linux hosted run.
We can use NFS or kermit to load the programs. u-boot has a network
console which is nice when you don't have serial. This combined with
an expect script (or LAVA? Paul?) should automate the whole process.
-- Michael
Hi,
* put the sources of the libunwind android port, the patches for
debuggerd and the Android test app online
* documented things at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Outputs/LibunwindDebuggerd
* noticed differences between the old (debuggerd) and the new
(debuggerd+libunwind) backtraces
* I'm still not sure what's going on (maybe they are adding offsets
or something)
* however, the backtrace that libunwind does looks sane to me
Note: I'll be on vacation till October 7th.
Regards
Ken
Hi,
* testing widen-shifts patch on ARM
* SLP improvements:
- submitted a patch to allow non-simple IVs in SLP
- committed a patch to allow read-after-read dependencies in SLP
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro QEMU 2011.09.
Linaro QEMU 2011.09 is the latest monthly release of
qemu-linaro. Based off upstream (trunk) QEMU, it includes a
number of ARM-focused bug fixes and enhancements.
New in this month's release:
- linux-user mode now supports the 64 bit cmpxchg kernel helpers
(only needed for applications compiled for ARMv6 or lower)
- PL111 display controller now supported; this fixes a problem
where BGR was interpreted as RGB on recent versatilepb kernels
Plus a few other minor bug fixes and the usual round of upstream
fixes and improvements.
Known issues:
- The beagle and beaglexm models still do not support USB networking;
we intend to fix this for the 2011.10 release
- There may be some problems with running multithreaded programs in
linux-user mode (LP:823902)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.09
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.3.
Linaro GDB 7.3 2011.09 is the second release in the 7.3 series. Based
off the latest GDB 7.3, it includes a number of ARM-focused bug fixes
and enhancements.
This release contains:
* Support for hardware breakpoints and watchpoints in gdbserver
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.3-2011.09
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
Hi there. The 2011.09 release has been spun and is testing up well.
The 4.5 and 4.6 branches are now open so feel free to commit any
approved patches.
-- Michael