Hi,
We are looking for possible improvements and optimizations to
Thumb-2 code size. Currently, I am running some benchmarks with the
compilation flags "-Os -march=armv7-a -mthumb", hoping to find
something interesting that we can improve. Besides that, do you have any
ideas on this topic, or any observations about Thumb-2 code whose size
we could probably improve?
Any thoughts on this are appreciated.
Yao
I think that it is easier to describe the situation in email than on IRC.
Currently there are 4 packages related to cross compilation support:
- armel-cross-toolchain-base (a-c-t-base in short)
- gcc-4.4-armel-cross
- gcc-4.5-armel-cross
- gcc-defaults-armel-cross
Each of them got into the archive, but they need to be updated to produce
installable packages.
Status of each package:
1. a-c-t-base is at 1.47 in the archive and was built from gcc-4.5-source
version 4.5.1-6ubuntu1. This package is used to bootstrap the armel cross
toolchain and generates:
- binutils-arm-linux-gnueabi (from binutils-source)
- libc6(-dev,-dbg)-armel-cross (from eglibc-source)
- linux-libc-dev-armel-cross (from linux-source-2.6.35)
- gcc-4.5-arm-linux-gnueabi-base, libgcc1(-dbg)-armel-cross (from
gcc-4.5-source)
libgcc1* packages have /usr/share/doc/ directories as symlinks to
/usr/share/doc/gcc-4.5-arm-linux-gnueabi-base/
I have a version which does not provide the gcc-4.5-arm-linux-gnueabi-base
package; in it, libgcc1(-dbg)-armel-cross depends on gcc-4.5-base and has its
/usr/share/doc/ directories pointing into the gcc-4.5-base one. I need to fix
this symlink by providing those files in the libgcc1 package instead.
2. gcc-4.4-armel-cross is at 1.36 in the archive and was built with gcc-4.4-source
version 4.4.4-14ubuntu4. This package provides compilers,
libstdc++6-4.4-(dev,dbg,pic)-armel-cross, libmudflap0-4.4-dev-armel-cross
and gcc-4.4-arm-linux-gnueabi-base packages.
I have version 1.38 ready to upload, which fixes bugs #637454 and #640298.
3. gcc-4.5-armel-cross is at 1.35 in the archive and was built with gcc-4.5-source
version 4.5.1-7ubuntu1. This package provides compilers and runtime
libraries, but it does not provide libgcc1(-dbg)-armel-cross and
gcc-4.5-arm-linux-gnueabi-base because they are in the a-c-t-base source
package. All resulting packages have /usr/share/doc/ directories pointing
into the gcc-4.5-arm-linux-gnueabi-base one, which is a policy violation.
I have version 1.37 ready to upload, which fixes bugs #637454 and #640298 and
provides the gcc-4.5-arm-linux-gnueabi-base package, so the policy violation is
removed.
4. gcc-defaults-armel-cross is at 1.3 in the archive and does not require any
changes.
The main problem is that packages generated from gcc-4.5-source are split across two
source packages: armel-cross-toolchain-base (libgcc1(-dbg)-armel-cross) and
gcc-4.5-armel-cross (all the rest). This was required to allow bootstrapping the
cross compiler, but it causes problems when one is built with a different version of
gcc-4.5-source than the other: the resulting packages are not installable (we have this
now in the archive). It is also something which Matthias does not like, and I
understand that. For now my only solution is to build both with the same version of
gcc-4.5-source.
What are your opinions?
http://marcin.juszkiewicz.com.pl/download/ubuntu/ is the download link for
the mentioned versions.
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
cf. http://lists.linaro.org/pipermail/linaro-toolchain/2010-August/000069.html
> It is not upstreamable due to copyright issues, but we have a policy
> that we can keep such patches, if we wish.
I wrote this patch. If I am the copyright issue, then there is no issue.
I have a copyright assignment for all my GCC work to the FSF. That
assignment also covers the patch in the e-mail stored at
http://gcc.gnu.org/ml/gcc-patches/2008-12/msg00199.html. I consider
copyright to all my patches assigned to the FSF if I have submitted
the patches to gcc-patches(a)gcc.gnu.org, or attached them to a Problem
Report in GCC bugzilla, or both.
The only reason why this patch for GIMPLE PRE is not in the FSF GCC
already is that I just never cared enough to pursue it. GCC is just a
hobby for me, and experimenting with ideas is fun. Doing all the
required testing for inclusion in the FSF GCC is not fun and it costs
time that I usually can't find. I am just too busy with other things
to clear off this and other pending patches/ideas from my TODO list
:-)
If you wish to submit this patch for the FSF GCC, please feel free to
do so. In fact, I'd encourage you to do so. Likewise for my patch for
e.g. http://gcc.gnu.org/PR20070, and for the GIMPLE hoisting pass.
Ciao!
Steven
Hi,
about the status of binutils testsuite Thumb coverage (CS204 in the
workplan), I have filed two Launchpad bugs:
#640263: Testsuite coverage: Thumb-2 VFP/NEON encodings
https://bugs.launchpad.net/binutils-linaro/+bug/640263
#640272: Testsuite coverage: Thumb relocations
https://bugs.launchpad.net/binutils-linaro/+bug/640272
To summarize: I currently do not see any testing of Thumb-2 VFP/NEON
encodings; Thumb mode relocations are also only barely tested in the ld
testsuite.
Also, please let me know if there are any other areas of binutils Thumb
testing that may be of concern to Linaro.
Thanks,
Chung-Lin
* Goal
The goal of this work is to look for Thumb-2 code size improvements on FSF
GCC trunk.
* Methodology
** Build FSF GCC trunk with and without hardfp, run benchmarks including
eembc, spec2000, and dhrystone, and check the asm code to see if there are
any possible improvements in size.
** Get input and suggestions from ARM experts.
** Search open PRs in GCC bugzilla.
* Results
Each item is tracked on Launchpad and is listed with the following fields:
** Cause: cause of this problem is known or unknown
** Difficulty: estimation of implementation difficulty
** Recommendation: Yao's recommendation on that bug for next step
1. LP:633233 Push/pop low register rather than high register when
keeping stack alignment
As Richard E. pointed out, this was implemented in gcc-4.5 in 2009, but
Yao can still see r8 being used on FSF GCC trunk.
Cause: Might be a regression if the problem does not appear on gcc-4.5.
Difficulty: Easy; a regression should not be hard to fix.
Recommendation: Fix this if it is a regression.
2. LP:633243 Improve regrename to make use of low registers.
Got input from Bernd S. and Julian B. An initial implementation has been
suggested by Bernd S.
Cause: The current regrename pass in GCC treats high and low registers equally.
Difficulty: Medium.
Recommendation: Implement it as Bernd suggested, and do benchmarking
to see how much size is improved.
3. LP:634682 Redundant uxth/sxth insn are generated
Cause: Unknown
Difficulty: Unknown
Recommendation: No recommendation so far.
4. LP:634696 Function is not inlined properly with -Os
In consumer/cjpeg/jmemmgr.c, GCC inlined out_of_memory() with -Os, which
increased code size.
Cause: Unknown.
Difficulty: Unknown.
Recommendation: Teach GCC to inline more carefully when -Os is turned on.
5. GCC PR40730 LP:634731 Redundant memory load
6. LP:634738 Inefficient code to extract the least significant bits from an integer value
GCC PR40697 is for Thumb-1. The same problem exists in Thumb-2.
Cause: Unknown.
Difficulty: Medium.
Recommendation: Fix it in a similar way to the fix for GCC PR40697.
7. LP:634891 Replace load/store by memcpy more aggressively
Difficulty: Should be easy.
Recommendation: A fix for this problem might be to reduce the threshold value
when -Os is turned on.
8. LP:637220 allocate local variables with fewer instructions
GCC PR40657 is about this kind of problem, and was fixed. A similar
problem exists in GCC with hardfp.
Cause: Unknown.
Difficulty: Unknown.
Recommendation: No recommendation so far.
9. GCC PR 43721 Failure to optimize (a/b) and (a%b) into single
__aeabi_idivmod call
Difficulty: Medium or easy.
Recommendation: None.
10. LP:637814 Combine add/move to add
LP:637882 Combine ldr/mov to ldr
Possible improvements have been found. No idea how to fix them yet.
Cause: Unknown.
Difficulty: Unknown.
Recommendation: None.
11. LP:638014 Replace memset by memclr when 2nd parameter is zero
Difficulty: Easy.
Recommendation: No recommendation so far.
12. LP:625233 Merge constant pools for small functions
Cause: Unknown.
Difficulty: Medium.
Recommendation: None.
13. LP:638935 Replace multiple vldr by vldm
Some vldr insns accessing consecutive addresses can be replaced by a
single vldm. This is not specific to Thumb-2, but it is related to code size optimization.
Cause: Unknown.
Difficulty: Medium.
Recommendation: None.
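
To make items 9 and 11 above more concrete, here is a minimal sketch of the kind of C source those entries are about. The function and type names are hypothetical and are not taken from the bug reports; the comments only restate what the items describe. Compiling this with an ARM cross compiler and flags such as "-Os -march=armv7-a -mthumb -S" (an assumption about the setup) lets one inspect which library calls are actually emitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: holds both results of one division. */
struct div_result {
    int quot;
    int rem;
};

/* Item 9 (GCC PR 43721): a/b and a%b on the same operands.
   On cores without a hardware divide instruction, the hope is that
   both values come from a single __aeabi_idivmod call rather than
   from two separate library calls. */
struct div_result div_and_mod(int a, int b)
{
    struct div_result r;
    r.quot = a / b;
    r.rem = a % b;
    return r;
}

/* Item 11 (LP:638014): memset whose 2nd parameter is zero.
   The item suggests such a call could be turned into a memclr-style
   call, which does not need the fill value as an argument. */
void clear_buffer(void *buf, size_t len)
{
    memset(buf, 0, len);
}
```

Both functions are ordinary portable C, so any size difference shows up only in the generated assembly, not in behaviour.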
--
Yao Qi
CodeSourcery
yao(a)codesourcery.com
(650) 331-3385 x739
Hi there. I've always wanted to mix this:
http://www.futurlec.com/ET-STM32_Stamp.shtml
with some of this:
http://bit.ly/cD0JPS
to control my one of these:
http://www.traxxas.com/products/electric/rustler2006/gallery/3705-3qrtr-Bla…
and it sounds like a good opportunity to dogfood the Linaro toolchain
at the same time. What's the best way to set up a Cortex-M3 toolchain
with an appropriate newlib and libgcc?
A wrapper script works fine but I need a way of recompiling libgcc for
the Cortex-M series. I'd love to get an arm-none-eabi toolchain
package out of this that others could use. Could I re-work the cross
packaging to use newlib and change the configure flags instead? Are
there existing Debianised cross packages that I could reuse?
Ta,
-- Michael
Hi Andrew. Well, the builds are done and they're OK. I've added the
ability to compare against an explicit release to make checking
regressions easier.
4.4 results are here:
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs…
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs…
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs…
i686 and x86_64 have not regressed since 2010.08.
On arm, and ignoring the limits test, 2010.09 adds a failure on
gcc.c-torture/compile/991026-2.c. According to the log the run timed
out but I can't reproduce it.
4.5 results are here:
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs…
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs…
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs…
i686 has not regressed since 2010.08. x86_64 fails on
gcc.target/i386/wmul-1.c, but this is a new test for a new feature and
is not a regression against 4.5.1.
arm is messier. The following new failures exist:
Vectoriser related:
* g++.dg/vect/pr36648.cc scan-tree-dump-times vect "vectorized 1 loops" 1
* g++.dg/vect/pr36648.cc scan-tree-dump-times vect "vectorizing
stmts using SLP" 1
* gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect
"vectorized 1 loops" 1
* gcc.dg/vect/vect-multitypes-12.c scan-tree-dump-times vect
"vectorized 1 loops" 1
* gcc.dg/vect/vect-reduc-dot-s16b.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-1a.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-1b.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-1c.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-2a.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-2b.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/wrapv-vect-reduc-pattern-2c.c scan-tree-dump-times
vect "vectorized 1 loops" 0
Others:
* gcc.target/arm/neon-load-df0.c scan-assembler vmov.i32[ \t]+[dD][0-9]+, #0\n
* gcc.target/arm/synchronize.c scan-assembler __sync_synchronize
neon-load-df0 is a new test. synchronize.c is an incorrect test as
the compiler now correctly uses the dmb instruction.
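
For context, here is a minimal sketch of the kind of code a test like synchronize.c exercises, assuming it boils down to a call to the GCC builtin __sync_synchronize(). This is not the actual test source; the function and variable names are hypothetical. When the compiler expands the builtin inline (for example to dmb), a scan of the assembly for the __sync_synchronize symbol no longer matches, even though the barrier is present.

```c
/* Hypothetical example, not the real gcc.target/arm/synchronize.c. */
static int shared_data;
static volatile int ready_flag;

/* Writer: store the data, then issue a full memory barrier before
   setting the flag, so the data is visible before the flag is. */
void publish(int value)
{
    shared_data = value;
    __sync_synchronize();   /* GCC builtin: full memory barrier */
    ready_flag = 1;
}

/* Reader: returns -1 if nothing has been published yet. */
int consume(void)
{
    if (!ready_flag)
        return -1;
    __sync_synchronize();   /* order the flag read before the data read */
    return shared_data;
}
```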
Your thoughts?
-- Michael
I would like to announce that my work on the armel cross toolchain has reached a very
nice point: all packages are available from a PPA.
What does it mean to you?
1. no "are you sure to install those unverified packages" messages from APT
2. ability to easily rebuild the toolchain on your own machines
So if you used my repository from people.canonical.com, please switch to
the PPA one:
add-apt-repository ppa:hrw/armel-cross-compilers
The old repository will be available for some time but will not get any updates.
Next step: merging those packages into the Maverick release.
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
I've been checking over the benchmarks in the lead-up to the 2010.09
release. We're in good shape compared to both 4.4.4 and 4.5.1 for
most non-trivial tests.
* pybench is 10.9 % faster than 4.4.4 and 7.7 % faster than 4.5.1.
* linpack is 46.4 % faster than 4.4.4 and the same as 4.5.1.
* ffmpeg h.264 video decode (with hand written assembler versions
turned off) is 15.4 % faster than 4.4.4 and 1.2 % faster than 4.5.1.
All results are statistically invalid and against poor workloads, but
I'll work on that.
See http://ex.seabright.co.nz/helpers/benchcompare for more.
-- Michael