Report of thumb2 tuning investigation

Yao Qi yao at
Fri Oct 22 10:30:05 UTC 2010

The goal and plan of the investigation have been described in [1].

In the plan, this task is divided into three parts: 1) patch backport,
2) regression fix, and 3) exploration and study of other ARM compilers.
This report follows the same structure.

1.  Patch backport.
Eight patches are listed in [1].  Backporting them to the Linaro 4.5
tree is expected to improve speed.
Action/Recommendation: Backport them if speed improves.  These are
patches that I think *should* improve speed, but a "performance
surprise" is not impossible.

2.  Regression fix.
So far (up to r99399), Linaro GCC 4.5 is slower than FSF GCC 4.5.0 on
some EEMBC benchmarks.  The performance regression was introduced by
four commits, r99324, r99330, r99369, and r99380; see details in [2].
Action/Recommendation: Figure out why the speed regression was
introduced, and try to fix it.
My two cents here are about how to avoid speed regressions.  I do
believe that regressions are sometimes unavoidable, but it is better if
we can track them and keep them manageable.
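
As a rough illustration of what tracking regressions could look like,
the sketch below compares per-benchmark run times from two compiler
builds and flags any benchmark that slows down by more than a threshold.
The function name, benchmark names, timings, and the 2% threshold are all
hypothetical placeholders, not measurements from this investigation.

```python
def find_regressions(baseline, candidate, threshold=0.02):
    """Return {benchmark: relative_slowdown} for benchmarks that got slower.

    baseline and candidate map a benchmark name to a run time
    (lower is better). threshold is the relative slowdown to tolerate.
    """
    regressions = {}
    for name, base_time in baseline.items():
        new_time = candidate.get(name)
        if new_time is None or base_time <= 0:
            continue  # benchmark missing from candidate, or unusable baseline
        slowdown = (new_time - base_time) / base_time
        if slowdown > threshold:
            regressions[name] = slowdown
    return regressions


# Hypothetical timings in seconds; these are NOT real EEMBC results.
fsf_times = {"bench_a": 10.0, "bench_b": 5.0, "bench_c": 8.0}
linaro_times = {"bench_a": 10.1, "bench_b": 5.5, "bench_c": 7.6}

for name, slowdown in sorted(find_regressions(fsf_times, linaro_times).items()):
    print(f"{name}: {slowdown:.1%} slower")
```

Running such a comparison on every commit (or every few commits) would
narrow a slowdown to a small revision range before it accumulates.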

3.  Exploration and study of other ARM compilers.
In this part, I did not find any possible Thumb-2 specific improvements.
However, loop optimization and instruction scheduling should be improved
on ARM. (This statement may be true of all ports, or even all compilers.)

Some tickets have been opened for this part:
LP:660644 Missed optimization opportunities
LP:662692 Inner loop in autcor00 can be optimized better
LP:656957 LP:645267 Improve code generation on switch statement
LP:663793 Tune Swing Modulo Scheduling or Selective Scheduling for ARM
LP:656373 Try -fsched-pressure for ARM
I have to admit that instruction scheduling is quite hard, but if we can
do something here, that would be great.  I've put it in the
"performance-inside-gcc" session at UDS.  Let us talk about it a little
there next week.

During this investigation, I also found that LTO or "whole-program
optimization" looks useful for some EEMBC benchmarks. (I didn't run
LTO/WPO at all; I concluded this from reading the benchmark sources.)

[1] Plan of CS304: Thumb2 tuning investigation.

Yao Qi
yao at
(650) 331-3385 x739
