On 20 October 2011 18:27, Christian Robottom Reis <kiko@linaro.org> wrote:
- Do we know how much better Thumb-2 actually is, in practice? It's easy for us to confirm this on Android; what do the numbers and feel of the system tell us?
I did some tests comparing Libav built for ARM and Thumb-2. The Thumb-2 build has 18% smaller code size than the ARM build. Data size is of course unchanged. The overall size reduction of text+data+bss is 10%. Benchmarking on a Cortex-A9 (Panda), the Thumb-2 build is 1-3% slower in most of my test cases, only one test being faster by 1%. In these tests, the hand-written assembly code was enabled (it can be built as ARM or Thumb-2).
This is of course highly specialised code so the results are not generally applicable. Nevertheless, I would expect similar results from other compute-intensive applications.
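As a rough cross-check of the size figures above: if only the text section shrinks and data+bss stay the same, the two percentages together imply how much of the ARM build is code. The snippet below is just that arithmetic; the function name is made up for illustration and the only inputs are the 18% and 10% figures quoted above.

```python
def implied_text_fraction(text_reduction, total_reduction):
    """Fraction of the original image that is text, assuming only text shrinks.

    total_reduction = text_fraction * text_reduction, solved for text_fraction.
    """
    return total_reduction / text_reduction


# Libav figures from above: text is 18% smaller, text+data+bss is 10% smaller.
fraction = implied_text_fraction(0.18, 0.10)
print(f"text share of the ARM build: {fraction:.0%}")
```

In other words, a little over half of the ARM build is code, which is why an 18% code saving shows up as only 10% overall.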
- What are the downsides to using Thumb-2 in general? Do we have anecdotes or threads that talk about bad experiences or blockers in the transition?
The r1pX versions of Cortex-A8 had a few Thumb-2 related errata that caused a bit of grief until they were properly understood and worked around. Some of the workarounds required on these core revisions have a negative impact on performance (extra invalidations of some branch prediction buffers etc).
- If it's so great, how could we lead a wide-ranging transition to Thumb2 becoming the standard ISA for modern v7 applications, including Android, Yocto and anything else relevant that runs on a Cortex A?
The space savings provided by Thumb-2 only matter if the available memory (either RAM or non-volatile storage) is almost fully utilised, which is not typically the case on the type of systems we are focusing on (Android and desktop distributions), where I strongly doubt code size is the major contributor to memory usage.
One real benefit not mentioned in the quoted blurb is possibly reduced startup times for applications when they are loaded from disk/flash. This could make a case for building things started on system boot as Thumb-2 in order to speed the boot process.
The promised speed gains from Thumb-2 are only possible as a result of better I-cache utilisation due to reduced code size. As seen in the Libav case, a reduction of roughly 20% in code size is realistic. For this to give a significant speed boost, the execution pattern would have to be such that reducing the instruction working set by 20% allows it to fit within the 16-32k (typical) I-cache, thus reducing thrashing. For instruction working sets outside this fairly narrow range, switching to Thumb-2 would have little impact on performance. I have no numbers to go by here, but my feeling is that the number of realistic workloads that would benefit here is fairly small.
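To put a number on "fairly narrow", here is a minimal sketch of the window in question. It assumes a simple capacity model (the working set either fits in the I-cache or it does not, ignoring associativity and replacement policy) and a uniform 20% shrink; the function name and the model are illustrative assumptions, not measurements.

```python
def benefit_window(icache_bytes, size_reduction=0.20):
    """Return (low, high) ARM working-set sizes in bytes that gain from Thumb-2.

    Below `low` the ARM code already fits in the I-cache; above `high`
    even the shrunken Thumb-2 code still does not fit.
    """
    low = icache_bytes
    high = icache_bytes / (1.0 - size_reduction)
    return low, high


# Typical I-cache sizes mentioned above.
for kib in (16, 32):
    low, high = benefit_window(kib * 1024)
    print(f"{kib}k I-cache: ARM working set between "
          f"{low / 1024:.0f}k and {high / 1024:.0f}k")
```

Under these assumptions the window is 16k to 20k for a 16k cache and 32k to 40k for a 32k cache, i.e. only working sets up to 25% larger than the cache stand to gain.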
In light of these observations, I do not think pushing for either instruction set to be applied system-wide is appropriate. Instead, each application/library should be built using whichever gives it the best performance. There is no problem mixing the instruction sets.