Effect of SMS register move scheduling
Richard Sandiford
richard.sandiford at linaro.org
Thu Aug 25 08:17:59 UTC 2011
Revital Eres <revital.eres at linaro.org> writes:
> btw, do you also have numbers of how much SMS (hopefully) improves
> performance on top of the vectorized code?
OK, here's a comparison of:
    -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
    -fno-auto-inc-dec
vs:
    -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
    -fmodulo-sched -fmodulo-sched-allow-regmoves -fno-auto-inc-dec
(including the register-scheduling patch). As you can see, it's a bit
of a mixed bag.
mjpegenc is another case where SMS generates lots of spilling while the
normal scheduler doesn't.
Richard
a3dec
before: 500000 runs take 4.61447s
after: 500000 runs take 4.61377s
speedup: x1
aacsbr-1
before: 5000000 runs take 4.08304s
after: 5000000 runs take 4.37424s
speedup: x0.933
aacsbr-2
before: 5000000 runs take 3.01974s
after: 5000000 runs take 3.08987s
speedup: x0.977
aacsbr-3
before: 4000000 runs take 5.77838s
after: 4000000 runs take 5.63406s
speedup: x1.03
aes
before: 500000 runs take 24.6801s
after: 500000 runs take 16.9731s
speedup: x1.45
avs
before: 1000000 runs take 2.26315s
after: 1000000 runs take 2.23679s
speedup: x1.01
cdgraphics
before: 1000000 runs take 2.40573s
after: 1000000 runs take 2.40582s
speedup: x1
dwt
before: 2000000 runs take 9.02847s
after: 2000000 runs take 9.1022s
speedup: x0.992
dxa
before: 2000000 runs take 4.55194s
after: 2000000 runs take 4.40613s
speedup: x1.03
mjpegenc
before: 500000 runs take 3.28186s
after: 500000 runs take 7.31247s
speedup: x0.449
qtrle
before: 1000000 runs take 4.52829s
after: 1000000 runs take 4.54483s
speedup: x0.996
resample
before: 1000000 runs take 2.32559s
after: 1000000 runs take 1.91016s
speedup: x1.22
rgb2rgb-rgb24tobgr16
before: 1000000 runs take 1.15713s
after: 1000000 runs take 1.1557s
speedup: x1
rgb2rgb-rgb24tobgr32
before: 2000000 runs take 4.55701s
after: 2000000 runs take 4.55148s
speedup: x1
rgb2rgb-rgb32tobgr24
before: 2000000 runs take 3.59705s
after: 2000000 runs take 3.59683s
speedup: x1
rgb2rgb-shuffle-bytes
before: 500000 runs take 2.23944s
after: 500000 runs take 2.24091s
speedup: x0.999
rgb2rgb-yuy2toyv12
before: 500000 runs take 4.51581s
after: 500000 runs take 4.51593s
speedup: x1
rgb2rgb-yv12touyvy
before: 1500000 runs take 3.52603s
after: 1500000 runs take 3.49863s
speedup: x1.01
twinvq
before: 500000 runs take 0.446442s
after: 500000 runs take 0.452545s
speedup: x0.987
wmavoice
before: 500000 runs take 0.864716s
after: 500000 runs take 0.864685s
speedup: x1