All,
In the Toolchain Working Group Mans has been doing some examination of SPEC
2000 and SPEC 2006 to see what C Library (glibc) routines impact performance
the most, and are worth tuning.
This has come up with two areas we consider worthy of further investigation:
1) malloc performance
2) Floating-point rounding functions.
This email is interested with the first of these.
Analysis of malloc shows large amounts of time is spent in executing
synchronization primitives even when the program under test is single-threaded.
The obvious 'fix' is to remove the synchronization primitives which will
give a performance boost. This is, of course, not safe and will require
reworking malloc's algorithms to be (substantially) synchronization free.
A quick Google suggests that there are better performing algorithms
available (TCMalloc, Lockless, Hoard, &c), and so changing glibc's algorithm
is something well worth investigating.
Currently we see around 4.37% of time being spent in libc for the whole of
SPEC CPU 2006. Around 75% of that is in malloc related functions (so about
3.1% of the total). One benchmark however spends around 20% of its time in
malloc. So overall we are looking at maybe 1% improvement in the SPEC 2006
score, which is not large given the amount of effort I estimate this is
going to require (as we have to convince the community we have made
everyone's life better).
So before we go any further I would like to see what the view of LEG is
about a better malloc. My questions boil down to:
* Is malloc important - or do server applications just implement their own?
* Do you have any benchmarks that stress malloc and would provide us with
some more data points?
But any and all comments on the subject are welcome.
Thanks,
Matt
--
Matthew Gretton-Dann
Toolchain Working Group, Linaro
Greetings,
I'm using the Linaro tool chain with Eclipse (Juno) (under Windows) and
openOCD to write firmware for an STM32F20x based design (using an ST-Link2
debugger).
In general, that all works fairly well.
The part I'm having problems with is debugging (step-in, etc) from Eclipse.
The execution flow seems chaotic when single stepping through C code: it
skips statements, it jumps into the middle of a function, then returns to
the start of a function, it loops over certain statements (while there's no
loop in the code), etc. (It's close to useless).
I have seen this behavior with other IDE's and tool chains when code was
built with optimization turned on.
However, I specify 'no optimization' (-O0) when I build my code.
My questions:
a) Is there some implicit optimization being done in the compiler, even
though I tell it not to do so, which may affect proper debugging?
b) Are other people using Eclipse (Juno) and are they seeing the same
issue? Are there any known ways to fix this chaotic debugger behavior?
Kind regards,
~ Paul Claessen
== Progress ==
* Performed investigation on gdb7.6 test suite failures and untested test
cases.
* Updated JIRA enteries with test suite failures on arm to track progress.
* Wrote an automation script for selection of individual test cases from a
text file.
* Got the gdb.dwarf2 test suite patch reviewed from Matt and Will.
* Day off on Friday.
== Plan ==
* Finish up initial investigation on gdb7.6 test suite results.
* Complete updates of JIRA enteries after investigation on test suite
results in complete.
* Start work on integration of different testing scripts written in past
couple of months.
* Send gdb.dwarf2 test suite patch upstream.
Hi Richard,
After adding some new ops, I can keep the conditional compare to the
end of tree-level optimization. As tests, I expand conditional compare
to BIT_AND_EXPR/BIT_IOR_EXPR, which still depend on later "combine"
pass to combine them.
Is it possible to expand it to *cmp_and/*cmp_ior like patterns?
What's the expected RTL representation for conditional compare after
expand and before combine?
Thanks!
-Zhenqiang
== Progress ==
4 day week was ill on Friday.
* AARCH64 gprof –c option support.
Completed and submitted patch in binutils and got it upstreamed.
http://sourceware.org/ml/binutils/2013-05/msg00265.htmlhttp://sourceware.org/ml/binutils/2013-05/msg00264.html
The committer has changed the logic and seems it is not working for
backward addresses. Sent him an offline mail to correct it.
http://sourceware.org/ml/binutils/2013-05/msg00279.html
* AARCH64 gprof –glibc support.
Submitted patch and got is approved
http://sourceware.org/ml/libc-ports/2013-05/msg00098.html
* AARCH64 gprof – gcc support
Tested with removing test clause as suggested by Marcus (Aarch64 maintainer).
Marcus wants me to send two separate patches. Will be posting it.
* AARCH64 testing
Got boot strap failure with GCC 4.9 trunk on open embedded image with
glibc changes.
To confirm it is not a regression I ran with the openembedded image
available at Linaro download site.
Bootstrap failure can be reproduced. I am documenting the steps to
increase the size of image and also
Steps to reproduce boot strap failure in the model.
== Plan ==
* Continue bootstrap testing and push patches to GCC
* libssp support for Aarch64
* Linaro connect travel prep.
Misc
------
Planned leave 29-5-2013.
== Issues ==
* None.
== Progress ==
* LRA on ARM and AArch64:
- Reduced the ARM test case.
- The issue occurs during the constraints alternatives choice.
- Debug still ongoing.
== Plan ==
* LRA, LRA and LRA
== Progress ==
VRP based zero/sign extension
- Got some review comments for the patch and started addressing them
- split the patch into two; 1. propagating value range and 2. do rtl
expansion
- testing in progress
specfp regression
- Benchmarked spec2k for A15 with trunk and couldn't reproduce it.
- benchmarked spec2k for A9 with trunk and couldn't reproduce between
24th and 28th
== Leave ==
- Monday off Sick
== Plan ==
VRP based zero/sign extension
- Send patch for review
specfp regression
- benchmark for trunk version on 23rd
- Resolve regression
Sorry, this covers the last two weeks, not just one.
== Progress ==
* Created database schema for DejaGnu test results
* Created data schema for benchmarks
* Wrote scripts to convert benchmark and test data into a form that
can be imported into a database, added them to DejaGnu branch
* Imported all historical benchmark data
* Imported most historical test results (GCC still importing)
* Did some experimental graphs of test results
* Read lots of web pages to come up to speed on Linaro, registered
for lots of web pages and accounts
* Learned about Cbuild and LAVA
* Started on Cbuild v2
* Installed Ubuntu on Chromebook
== Plan ==
* Write Cbuild 2 design doc
* Continue work on Cbuild v2 to be able to use it for the June release
* Get remote testing working with Chromebook & foundation model
* More support tasks resulting from move off launchpad
- rob -
Greetings,
I'm using the Linaro tool chain with Eclipse (Juno) (under Windows) and
openOCD to write firmware for an STM32F20x based design (using an ST-Link2
debugger).
In general, that all works fairly well.
The part I'm having problems with is debugging (step-in, etc) from Eclipse.
The execution flow seems chaotic when single stepping through C code: it
skips statements, it jumps into the middle of a function, then returns to
the start of a function, it loops over certain statements (while there's no
loop in the code), etc. (It's close to useless).
I have seen this behavior with other IDE's and tool chains when code was
built with optimization turned on.
However, I specify 'no optimization' (-O0) when I build my code.
My questions:
a) Is there some implicit optimization being done in the compiler, even
though I tell it not to do so, which may affect proper debugging?
b) Are other people using Eclipse (Juno) and are they seeing the same
issue? Are there any known ways to fix this chaotic debugger behavior?
Kind regards,
~ Paul Claessen