Effect of SMS register move scheduling

Richard Sandiford richard.sandiford at linaro.org
Thu Aug 25 08:17:59 UTC 2011


Revital Eres <revital.eres at linaro.org> writes:
> btw, do you also have numbers of how much SMS (hopefully) improves
> performance on top of the vectorized code?

OK, here's a comparison of:

    -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
    -fno-auto-inc-dec

vs:

    -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
    -fmodulo-sched -fmodulo-sched-allow-regmoves -fno-auto-inc-dec

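(including the register-scheduling patch).  As you can see, it's a bit
of a mixed bag: aes (x1.45) and resample (x1.22) improve noticeably,
while aacsbr-1 (x0.933) and especially mjpegenc (x0.449) regress.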

mjpegenc is another case where SMS generates lots of spilling while the
normal scheduler doesn't.

Richard


a3dec
  before:  500000 runs take 4.61447s
  after:   500000 runs take 4.61377s
  speedup: x1
aacsbr-1
  before:  5000000 runs take 4.08304s
  after:   5000000 runs take 4.37424s
  speedup: x0.933
aacsbr-2
  before:  5000000 runs take 3.01974s
  after:   5000000 runs take 3.08987s
  speedup: x0.977
aacsbr-3
  before:  4000000 runs take 5.77838s
  after:   4000000 runs take 5.63406s
  speedup: x1.03
aes
  before:  500000 runs take 24.6801s
  after:   500000 runs take 16.9731s
  speedup: x1.45
avs
  before:  1000000 runs take 2.26315s
  after:   1000000 runs take 2.23679s
  speedup: x1.01
cdgraphics
  before:  1000000 runs take 2.40573s
  after:   1000000 runs take 2.40582s
  speedup: x1
dwt
  before:  2000000 runs take 9.02847s
  after:   2000000 runs take 9.1022s
  speedup: x0.992
dxa
  before:  2000000 runs take 4.55194s
  after:   2000000 runs take 4.40613s
  speedup: x1.03
mjpegenc
  before:  500000 runs take 3.28186s
  after:   500000 runs take 7.31247s
  speedup: x0.449
qtrle
  before:  1000000 runs take 4.52829s
  after:   1000000 runs take 4.54483s
  speedup: x0.996
resample
  before:  1000000 runs take 2.32559s
  after:   1000000 runs take 1.91016s
  speedup: x1.22
rgb2rgb-rgb24tobgr16
  before:  1000000 runs take 1.15713s
  after:   1000000 runs take 1.1557s
  speedup: x1
rgb2rgb-rgb24tobgr32
  before:  2000000 runs take 4.55701s
  after:   2000000 runs take 4.55148s
  speedup: x1
rgb2rgb-rgb32tobgr24
  before:  2000000 runs take 3.59705s
  after:   2000000 runs take 3.59683s
  speedup: x1
rgb2rgb-shuffle-bytes
  before:  500000 runs take 2.23944s
  after:   500000 runs take 2.24091s
  speedup: x0.999
rgb2rgb-yuy2toyv12
  before:  500000 runs take 4.51581s
  after:   500000 runs take 4.51593s
  speedup: x1
rgb2rgb-yv12touyvy
  before:  1500000 runs take 3.52603s
  after:   1500000 runs take 3.49863s
  speedup: x1.01
twinvq
  before:  500000 runs take 0.446442s
  after:   500000 runs take 0.452545s
  speedup: x0.987
wmavoice
  before:  500000 runs take 0.864716s
  after:   500000 runs take 0.864685s
  speedup: x1
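
For anyone who wants to recheck the "speedup" lines: they appear to be the
"before" time divided by the "after" time, printed to three significant
digits, so values above 1 mean the SMS build was faster.  The sketch below
is only an illustration under that assumption and is not the script that
produced the numbers; the names RESULTS, speedup and format_speedup are
made up for the example, and the timings are copied from three entries
above.

```python
# Timings in seconds, copied from three entries in the listing above.
# Each tuple is (before, after) for the same number of runs.
RESULTS = {
    "aes": (24.6801, 16.9731),
    "mjpegenc": (3.28186, 7.31247),
    "a3dec": (4.61447, 4.61377),
}


def speedup(before, after):
    """Return before/after; a value above 1 means the 'after' build is faster."""
    return before / after


def format_speedup(before, after):
    # Assumption: three significant digits, to match the "x1.45" style
    # used in the listing.
    return "x{:.3g}".format(speedup(before, after))


for name, (before, after) in RESULTS.items():
    print(name, format_speedup(before, after))
```

Because the run count is the same on both sides of each pair, dividing the
total times gives the same ratio as dividing the per-run times.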


