Thumb2 code size improvements

Yao Qi yao.qi at linaro.org
Thu Sep 9 16:22:41 BST 2010


Yao Qi wrote:
> Hi,
> We are looking for possible improvements and optimizations of
> thumb2 code size.  Currently, I am running some benchmarks with the
> compilation flags "-Os -march=armv7-a -mthumb", hoping to find something
> interesting that we can improve.  Besides that, do you have any ideas
> on this topic, or have you observed anything in thumb2 code whose
> size we could probably improve?
> 
> Any thoughts on this are appreciated.

I found some new possible improvements.  Your comments on them are
welcome.  See more details in
https://wiki.linaro.org/YaoQi/Sandbox/Thumb2SizeOptimize

10. Replace multiple vldr with a single vldm
Observed in bezier01float/bez.o:
   8:   f100 0438       add.w   r4, r0, #56     ; 0x38
   c:   b085            sub     sp, #20
   e:   2600            movs    r6, #0
  10:   e03d            b.n     8e <interpolatePoints+0x8e>
  12:   e954 2302       ldrd    r2, r3, [r4, #-8]
  16:   2500            movs    r5, #0
  18:   ed14 ab0e       vldr    d10, [r4, #-56] ; 0xffffffc8 // <--
  1c:   ed14 bb0c       vldr    d11, [r4, #-48] ; 0xffffffd0 // <--
  20:   ed14 cb0a       vldr    d12, [r4, #-40] ; 0xffffffd8 // <--
  24:   ed14 db08       vldr    d13, [r4, #-32] ; 0xffffffe0 // <--
  28:   e9cd 2300       strd    r2, r3, [sp]
  2c:   ed14 eb06       vldr    d14, [r4, #-24] ; 0xffffffe8 // <--

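These five vldr instructions load d10-d14 from consecutive doublewords,
[r4, #-56] through [r4, #-24], so they can be replaced by one vldm.
Each vldr here is a 32-bit instruction, so 20 bytes of loads would
shrink to a single 4-byte vldm plus any base-address setup it needs.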

11. Replace ldr/str pairs with memcpy
Observed in bezier01fixed/pointio.o:outputPoints():
00000000 <outputPoints>:
   0:   e92d 4ff0       stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   4:   4604            mov     r4, r0
   6:   b089            sub     sp, #36 ; 0x24
   8:   2600            movs    r6, #0
   a:   460f            mov     r7, r1
   c:   e025            b.n     5a <outputPoints+0x5a>
   e:   68e3            ldr     r3, [r4, #12]
  10:   2500            movs    r5, #0
  12:   e894 0e00       ldmia.w r4, {r9, sl, fp}
  16:   9303            str     r3, [sp, #12]
  18:   6923            ldr     r3, [r4, #16]
  1a:   9304            str     r3, [sp, #16]
  1c:   6963            ldr     r3, [r4, #20]
  1e:   9305            str     r3, [sp, #20]
  20:   69a3            ldr     r3, [r4, #24]
  22:   9306            str     r3, [sp, #24]
  24:   69e3            ldr     r3, [r4, #28]
  26:   9307            str     r3, [sp, #28]
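Each of these ldr/str instructions has a 16-bit encoding, so the five
pairs copying [r4, #12]..[r4, #28] to [sp, #12]..[sp, #28] take 20
bytes.  Code size would be smaller if this copy were done with a single
call to memcpy() instead.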

12. Redundant uxth/sxth
Observed in automotive/idctrn01/bmark.c:
short
unPack( unsigned char c )
{
    /* Only want lower four bit nibble */
    c = c & (unsigned char)0x0F ;

    if( c > 7 ) {
        /* Negative nibble */
        return( ( short )( c - 16 ) ) ;
    }
    else
    {
        /* positive nibble */
        return( ( short )c ) ;
    }
}

GCC produces this code:
00000024 <unPack>:
  24:   f000 000f       and.w   r0, r0, #15
  28:   2807            cmp     r0, #7
  2a:   d901            bls.n   30 <unPack+0xc>
  2c:   3810            subs    r0, #16
  2e:   b280            uxth    r0, r0 <--[1]
  30:   b200            sxth    r0, r0 <--[2]
  32:   4770            bx      lr

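Are instructions [1] and [2] redundant?  After the and.w, r0 is in
0..15.  On the path through subs it becomes -8..-1, and otherwise the
bls.n branches straight to [2] with r0 in 0..7, so r0 already seems to
hold the sign-extended short result in every case.  If that is right,
both instructions can be removed safely.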





More information about the linaro-toolchain mailing list