Thumb2 size optimization report
Yao Qi
yao at codesourcery.com
Wed Sep 15 14:49:34 BST 2010
* Goal
The goal of this work is to look for Thumb-2 code size improvements in FSF
GCC trunk.
* Methodology
** Build FSF GCC trunk with and without hardfp, run benchmarks including
EEMBC, SPEC2000, and Dhrystone, and check the generated assembly for
possible size improvements.
** Get input and suggestions from ARM experts.
** Search open PRs in GCC bugzilla.
* Results
Each item is tracked on Launchpad and is listed with the following elements:
** Cause: whether the cause of the problem is known or unknown
** Difficulty: estimated implementation difficulty
** Recommendation: Yao's recommended next step for the bug
1. LP:633233 Push/pop a low register rather than a high register to
keep stack alignment
As Richard E. pointed out, this was implemented in gcc-4.5 in 2009, but
Yao still sees r8 being used on FSF GCC trunk.
Cause: Possibly a regression, if the problem does not occur with gcc-4.5.
Difficulty: Easy. A regression is probably not hard to fix.
Recommendation: Fix the regression, if it turns out to be one.
2. LP:633243 Improve regrename to make use of low registers.
Got input from Bernd S. and Julian B.; Bernd S. has suggested an initial
implementation.
Cause: The current regrename pass in GCC treats high and low registers equally.
Difficulty: Medium.
Recommendation: Implement it as Bernd suggested, and benchmark it to
measure the size improvement.
3. LP:634682 Redundant uxth/sxth insns are generated
Cause: Unknown.
Difficulty: Unknown.
Recommendation: No recommendation so far.
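The report does not record which source pattern produces the extra
extensions. As a hypothetical illustration only, the functions below
widen 16-bit values loaded from memory. The Thumb-2 ldrh and ldrsh
instructions already zero- and sign-extend into a 32-bit register, so a
uxth or sxth emitted on such a loaded value would cost size without
changing the result. Whether GCC trunk emits one for these exact
functions is an assumption, and the function names are made up.

```c
#include <stdint.h>

/* Hypothetical example: accumulate unsigned 16-bit samples.
   Each element is loaded with a halfword load (ldrh), which already
   zero-extends, so a uxth on the loaded value would be redundant. */
uint32_t sum_u16(const uint16_t *samples, unsigned count)
{
    uint32_t total = 0;
    for (unsigned i = 0; i < count; i++)
        total += samples[i];   /* implicit 16 -> 32 bit zero-extension */
    return total;
}

/* Signed counterpart: ldrsh already sign-extends, so an sxth on the
   loaded value would likewise be redundant. */
int32_t sum_s16(const int16_t *samples, unsigned count)
{
    int32_t total = 0;
    for (unsigned i = 0; i < count; i++)
        total += samples[i];   /* implicit 16 -> 32 bit sign-extension */
    return total;
}
```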
4. LP:634696 Function is not inlined properly with -Os
In consumer/cjpeg/jmemmgr.c, GCC inlines out_of_memory() with -Os, which
increases code size.
Cause: Unknown.
Difficulty: Unknown.
Recommendation: Teach GCC to be more careful about inlining when -Os is on.
5. GCC PR40730 LP:634731 Redundant memory load
6. LP:634738 Inefficient code to extract least bits from an integer value
GCC PR40697 covers Thumb-1; the same problem exists in Thumb-2.
Cause: Unknown.
Difficulty: Medium.
Recommendation: Fix it in a similar way to GCC PR40697.
7. LP:634891 Replace load/store sequences with memcpy more aggressively
Difficulty: Should be easy.
Recommendation: A possible fix is to reduce the threshold value when -Os
is on.
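The report gives no test case. As a hypothetical illustration, the
structure assignment below is the kind of block copy the item concerns:
depending on the copy size and a threshold, GCC either expands it
inline as a sequence of loads and stores or emits a call to memcpy.
Lowering that threshold under -Os would make the call, which is smaller
at each copy site, more common for copies of this size. The struct
record type and its 32-byte size are assumptions.

```c
/* Hypothetical 32-byte record. */
struct record {
    int fields[8];
};

/* A plain structure assignment. Whether this becomes inline
   loads/stores or a memcpy call is decided by the compiler's
   size threshold, not by the source code. */
void copy_record(struct record *dst, const struct record *src)
{
    *dst = *src;
}
```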
8. LP:637220 Allocate local variables with fewer instructions
GCC PR40657 covers this kind of problem and has been fixed. A similar
problem exists in GCC with hardfp.
Cause: Unknown.
Difficulty: Unknown.
Recommendation: No recommendation so far.
9. GCC PR43721 Failure to optimize (a/b) and (a%b) into a single
__aeabi_idivmod call
Difficulty: Medium or easy.
Recommendation: No recommendation so far.
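A minimal example of the pattern in the PR title, with hypothetical
names. On ARM EABI targets without a hardware divider, a / b is lowered
to a call to __aeabi_idiv and a % b to a call to __aeabi_idivmod, which
returns the quotient in r0 and the remainder in r1. Since one
__aeabi_idivmod call already yields both results, the two operations
below could share a single call.

```c
/* Hypothetical result type holding both halves of a division. */
struct qr {
    int quot;
    int rem;
};

/* Quotient and remainder of the same operands: the case where one
   __aeabi_idivmod call could replace two helper calls. */
struct qr divide(int a, int b)
{
    struct qr r;
    r.quot = a / b;   /* truncates toward zero (C99) */
    r.rem  = a % b;   /* sign follows the dividend (C99) */
    return r;
}
```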
10. LP:637814 Combine add/mov into add
LP:637882 Combine ldr/mov into ldr
Possible improvements have been found, but it is not yet clear how to
fix them.
Cause: Unknown.
Difficulty: Unknown.
Recommendation: No recommendation so far.
11. LP:638014 Replace memset with memclr when the second argument is zero
Difficulty: Easy.
Recommendation: No recommendation so far.
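A hypothetical example of the call sites this item targets. Because
the second memset argument is the constant 0, each call spends an
instruction loading 0 into r1. The ARM EABI run-time library also
provides __aeabi_memclr(dest, n), which takes only the destination and
the length, so calling it instead would drop that instruction at every
such site. The function below is illustrative and keeps the portable
memset form.

```c
#include <string.h>

/* Hypothetical helper: zero a buffer. The constant 0 passed as the
   second argument is what makes a memclr-style call possible. */
void clear_buffer(unsigned char *buf, size_t len)
{
    memset(buf, 0, len);
}
```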
12. LP:625233 Merge constant pools for small functions
Cause: Unknown.
Difficulty: Medium.
Recommendation: No recommendation so far.
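A hypothetical illustration of the duplication involved. Both small
functions below use the same 32-bit constant, which cannot be encoded
as an immediate. Assuming the compiler loads it from a literal pool
(rather than building it with movw/movt) and places a pool after each
function, the constant is stored twice; a merged pool would store it
once. The constant value and function names are arbitrary.

```c
#include <stdint.h>

/* Arbitrary non-encodable constant chosen for illustration. */
#define MAGIC 0x12345678u

/* Each function references MAGIC; with per-function literal pools,
   each gets its own copy of the constant. */
uint32_t scramble(uint32_t x)   { return x ^ MAGIC; }
uint32_t unscramble(uint32_t x) { return x ^ MAGIC; }
```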
13. LP:638935 Replace multiple vldr insns with a single vldm
Several vldr insns accessing consecutive addresses can be replaced by a
single vldm. This is not specific to Thumb-2, but it is related to code
size optimization.
Cause: Unknown.
Difficulty: Medium.
Recommendation: No recommendation so far.
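A hypothetical source pattern for this item. The four doubles below
sit at consecutive addresses, so instead of four 32-bit vldr
instructions a single vldm could load them, provided the destination D
registers are also consecutive. The function name is made up.

```c
/* Sum four consecutive doubles: four loads from adjacent addresses,
   a candidate for one vldm on a VFP target. */
double sum4(const double *v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```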
--
Yao Qi
CodeSourcery
yao at codesourcery.com
(650) 331-3385 x739
More information about the linaro-toolchain mailing list