Thumb2 code size improvements
Yao Qi
yao.qi at linaro.org
Thu Sep 9 16:22:41 BST 2010
Yao Qi wrote:
> Hi,
> We are looking for possible improvements and optimizations to
> Thumb2 code size. Currently, I am running some benchmarks with the
> compilation flags "-Os -march=armv7-a -mthumb", hoping to find
> something interesting that we can improve. Besides that, do you have
> any ideas on this topic, or any observations about Thumb2 code whose
> size we could improve?
>
> Any thoughts on this are appreciated.
I found some new possible improvements. Your comments on them are
welcome. See more details in
https://wiki.linaro.org/YaoQi/Sandbox/Thumb2SizeOptimize
10. Replace multiple vldr by vldm
Observed in bezier01float/bez.o,
   8:   f100 0438   add.w   r4, r0, #56     ; 0x38
   c:   b085        sub     sp, #20
   e:   2600        movs    r6, #0
  10:   e03d        b.n     8e <interpolatePoints+0x8e>
  12:   e954 2302   ldrd    r2, r3, [r4, #-8]
  16:   2500        movs    r5, #0
  18:   ed14 ab0e   vldr    d10, [r4, #-56] ; 0xffffffc8  // <--
  1c:   ed14 bb0c   vldr    d11, [r4, #-48] ; 0xffffffd0  // <--
  20:   ed14 cb0a   vldr    d12, [r4, #-40] ; 0xffffffd8  // <--
  24:   ed14 db08   vldr    d13, [r4, #-32] ; 0xffffffe0  // <--
  28:   e9cd 2300   strd    r2, r3, [sp]
  2c:   ed14 eb06   vldr    d14, [r4, #-24] ; 0xffffffe8  // <--
These five vldr instructions load consecutive doublewords into
consecutive registers (d10-d14), so they can be replaced by a single
vldm.
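The condition for this replacement can be sketched in C. This is only an
illustration of the check, not how GCC implements it: the struct vldr_op
type and the can_merge_into_vldm() function are hypothetical names. The
sketch assumes the loads share one base register and are listed in
ascending register order. Note that vldm takes no immediate offset, so a
real transformation also needs a register that holds the address of the
first element. The listing suggests r0 (since r4 = r0 + 56), but whether
r0 is still live at that point is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One vldr of a D register (hypothetical type for illustration):
 * destination register number and byte offset from the shared base. */
struct vldr_op {
    int dreg;    /* 10 means d10 */
    int offset;  /* signed byte offset from the base register */
};

/* Returns true when the n loads target consecutive D registers at
 * consecutive 8-byte offsets, which is the shape one vldm can cover.
 * A single vldm transfers at most 16 doubleword registers. */
bool can_merge_into_vldm(const struct vldr_op *ops, size_t n)
{
    assert(ops != NULL);
    if (n < 2 || n > 16)
        return false;   /* nothing to merge, or too many registers */
    for (size_t i = 1; i < n; i++) {
        if (ops[i].dreg != ops[i - 1].dreg + 1)
            return false;   /* register list must be contiguous */
        if (ops[i].offset != ops[i - 1].offset + 8)
            return false;   /* memory must be contiguous too */
    }
    return true;
}

/* The five loads from the listing: d10..d14 at [r4, #-56]..[r4, #-24]. */
const struct vldr_op bez_loads[5] = {
    {10, -56}, {11, -48}, {12, -40}, {13, -32}, {14, -24},
};
```

Calling can_merge_into_vldm(bez_loads, 5) accepts the sequence from the
listing; a sequence that skips a register or an offset is rejected.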
11. Replace str/ldr by memcpy
Observed in bezier01fixed/pointio.o:outputPoints()
00000000 <outputPoints>:
   0:   e92d 4ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   4:   4604        mov     r4, r0
   6:   b089        sub     sp, #36 ; 0x24
   8:   2600        movs    r6, #0
   a:   460f        mov     r7, r1
   c:   e025        b.n     5a <outputPoints+0x5a>
   e:   68e3        ldr     r3, [r4, #12]
  10:   2500        movs    r5, #0
  12:   e894 0e00   ldmia.w r4, {r9, sl, fp}
  16:   9303        str     r3, [sp, #12]
  18:   6923        ldr     r3, [r4, #16]
  1a:   9304        str     r3, [sp, #16]
  1c:   6963        ldr     r3, [r4, #20]
  1e:   9305        str     r3, [sp, #20]
  20:   69a3        ldr     r3, [r4, #24]
  22:   9306        str     r3, [sp, #24]
  24:   69e3        ldr     r3, [r4, #28]
  26:   9307        str     r3, [sp, #28]
Code size would be smaller if these ldr/str pairs were replaced by a
call to memcpy().
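At the source level, the two forms look roughly like the sketch below.
The struct record type and both function names are hypothetical; the
only thing taken from the listing is that five consecutive words at
offsets 12..28 are copied to the same offsets of another object. Whether
the compiler actually emits a call for the memcpy() form, or expands it
inline again, is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte record of eight 32-bit words. */
struct record {
    int32_t w[8];
};

/* Word-by-word copy of words 3..7 (byte offsets 12..28), mirroring
 * the five ldr/str pairs in the listing. */
void copy_tail_by_words(struct record *dst, const struct record *src)
{
    assert(dst != NULL && src != NULL);
    dst->w[3] = src->w[3];
    dst->w[4] = src->w[4];
    dst->w[5] = src->w[5];
    dst->w[6] = src->w[6];
    dst->w[7] = src->w[7];
}

/* The same 20 bytes copied with a single memcpy() call. */
void copy_tail_by_memcpy(struct record *dst, const struct record *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(&dst->w[3], &src->w[3], 5 * sizeof(int32_t));
}
```

Both functions leave words 0..2 of the destination untouched and produce
identical results for words 3..7.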
12. uxth/sxth
Observed in automotive/idctrn01/bmark.c
short
unPack( unsigned char c )
{
    /* Only want lower four bit nibble */
    c = c & (unsigned char)0x0F ;
    if( c > 7 ) {
        /* Negative nibble */
        return( ( short )( c - 16 ) ) ;
    }
    else
    {
        /* positive nibble */
        return( ( short )c ) ;
    }
}
GCC produces code like this,
00000024 <unPack>:
  24:   f000 000f   and.w   r0, r0, #15
  28:   2807        cmp     r0, #7
  2a:   d901        bls.n   30 <unPack+0xc>
  2c:   3810        subs    r0, #16
  2e:   b280        uxth    r0, r0      <--[1]
  30:   b200        sxth    r0, r0      <--[2]
  32:   4770        bx      lr
Are instructions [1] and [2] redundant? If so, we can safely remove
them.
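One way to probe the question is to model the instruction sequence in C
and compare it against the same sequence without the two extends. This
is a sketch under stated assumptions: uxth() and sxth() below are
hand-written models of the instructions acting on a 32-bit register, and
the function names are hypothetical. Because the first instruction masks
r0 with 15, only the low four bits of the input matter, so checking all
256 byte values covers every case.

```c
#include <assert.h>
#include <stdint.h>

/* Model of uxth: zero-extend the low 16 bits. */
static uint32_t uxth(uint32_t x)
{
    return x & 0xFFFFu;
}

/* Model of sxth: sign-extend the low 16 bits. */
static uint32_t sxth(uint32_t x)
{
    uint32_t lo = x & 0xFFFFu;
    return (lo & 0x8000u) ? (lo | 0xFFFF0000u) : lo;
}

/* The sequence from the listing, one statement per instruction. */
uint32_t unpack_as_compiled(uint32_t r0)
{
    r0 &= 15u;              /* and.w r0, r0, #15 */
    if (r0 > 7u) {          /* cmp r0, #7 ; bls.n skips the next two */
        r0 -= 16u;          /* subs r0, #16 */
        r0 = uxth(r0);      /* [1] */
    }
    return sxth(r0);        /* [2] */
}

/* The same sequence with [1] and [2] removed. */
uint32_t unpack_without_extends(uint32_t r0)
{
    r0 &= 15u;
    if (r0 > 7u)
        r0 -= 16u;
    return r0;
}

/* Counts inputs for which the two sequences leave different r0 values. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (uint32_t c = 0; c < 256u; c++) {
        if (unpack_as_compiled(c) != unpack_without_extends(c))
            mismatches++;
    }
    return mismatches;
}

/* Self-check: aborts if any input distinguishes the two sequences. */
void check_extends_are_redundant(void)
{
    assert(count_mismatches() == 0);
}
```

Under this model the two sequences agree for every input: after the
mask and the optional subtraction, r0 already holds a value in the range
-8..7 as a full 32-bit quantity, so neither extend changes it. That
suggests both [1] and [2] are redundant here, subject to the model
matching the real instruction semantics.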