NEON power consumption
vishwanath.sripathy at linaro.org
Tue Nov 30 09:34:29 UTC 2010
On Tue, Nov 30, 2010 at 2:15 AM, Michael Hope <michael.hope at linaro.org> wrote:
> On Tue, Nov 30, 2010 at 12:37 AM, Dave Martin <dave.martin at linaro.org> wrote:
>> On Sun, Nov 28, 2010 at 10:28 PM, Michael Hope <michael.hope at linaro.org> wrote:
>>> I sat down and measured the power consumption of the NEON unit on an
>>> OMAP3. Method and results are here:
>>> The board takes 2.37 W and the NEON unit adds an extra 120 mW.
>>> Assuming the core takes 1 W, then the code needs to run 12 % faster
>>> with NEON on to be a net power win.
>>> Note that the results are inaccurate but valid enough.
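The 12 % figure quoted above can be reproduced with a short calculation. This is only a sketch of the arithmetic: it assumes the core draws a constant 1 W (the assumption stated in the quoted text), that NEON adds a constant 120 mW while in use, and that the rest of the board's power is unaffected. The function name is made up for illustration.

```python
def break_even_speedup(core_power_w, neon_power_w):
    """Speedup at which NEON code uses the same energy as scalar code.

    Scalar energy:  core_power_w * t
    NEON energy:    (core_power_w + neon_power_w) * t / speedup
    Setting the two equal and solving for speedup gives the ratio below.
    """
    return (core_power_w + neon_power_w) / core_power_w


# Figures from the measurement: assumed 1 W core, 120 mW for the NEON unit.
speedup = break_even_speedup(1.0, 0.120)
print(f"NEON code must run {(speedup - 1) * 100:.0f} % faster to break even")
```

If the core actually draws less than 1 W, the required speedup rises accordingly, so the result is sensitive to that assumption.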
>> Just to play devil's advocate... the results will differ, perhaps
>> significantly, between SoCs of course.
>> In terms of the amount of energy required to perform a particular
>> operation (i.e., at the microbenchmark level) I agree with your
>> conclusion. However, in practice I suspect this isn't enough. I'm
>> not familiar with exactly when NEON is likely to get turned on and
>> off, but you need to factor in the behaviour of the OS: if you
>> accelerate a DSP operation which is used a few dozen times per
>> timeslice, NEON will be used for only a tiny proportion of the time it
>> is powered, because once NEON is on, it probably stays on at least until
>> the next interrupt, and probably until the next task switch. With the
>> kernel configured for dynamic timer tick, this can get even more
>> exaggerated, since the rescheduling frequency may drop.
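The argument above can be illustrated with a rough per-timeslice energy model. It is a sketch under strong assumptions, none of which are measured: NEON stays powered for the whole timeslice once it has been used, the core costs nothing during the time the speedup frees, and the 10 ms timeslice and 2x speedup are invented numbers. Only the 1 W and 120 mW figures come from the thread.

```python
def net_energy_cost_j(core_power_w, neon_power_w, slice_s, scalar_busy_s, speedup):
    """Extra energy per timeslice from using NEON (negative means a saving)."""
    # Core energy saved because the accelerated work finishes sooner.
    saved = core_power_w * (scalar_busy_s - scalar_busy_s / speedup)
    # Assumption: NEON stays powered for the entire timeslice once touched.
    cost = neon_power_w * slice_s
    return cost - saved


# Hypothetical 10 ms timeslice and 2x speedup; power figures from the thread.
sporadic = net_energy_cost_j(1.0, 0.120, 0.010, 0.0005, 2.0)  # 0.5 ms of DSP work
dominant = net_energy_cost_j(1.0, 0.120, 0.010, 0.008, 2.0)   # 8 ms of DSP work
print(f"sporadic use: {sporadic * 1e3:+.2f} mJ per slice")
print(f"dominant use: {dominant * 1e3:+.2f} mJ per slice")
```

With these numbers the sporadic case costs roughly 0.95 mJ extra per slice, while the dominant case saves roughly 2.8 mJ. That matches the point that NEON pays off only when the accelerated work dominates the run time.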
>> The real benefits, in performance and power, therefore come in
>> operations which dominate the run-time of a particular process, such
>> as intensive image handling or codec operations. NEON in
>> widely-dispersed but sporadically used features (such as
>> general-purpose library code) could be expected to come at a net power
>> cost. If you use NEON for memcpy for example, you will basically
>> never be able to turn the NEON unit off. That's unlikely to be a win
>> overall, since even if you now optimise all the code in the system for
>> NEON, you're unlikely to see a significant performance boost: NEON
>> simply isn't designed for accelerating general-purpose code.
>> The correct decision for how to optimise a given piece of code seems
>> to depend on the SoC and the runtime load profile. And while you can
>> usefully predict that at build-time for a media player or dedicated
>> media stack components, it's pretty much impossible to do so with
>> general-purpose libraries... unless there's a cunning strategy I
>> haven't thought of.
>> Ideally, processes whose load varies significantly over time and
>> between different use cases (such as Xorg) would be able to select
>> between NEON-ised and non-NEON-ised implementations dynamically, based
>> on the current load. But I guess we're some distance away from being
>> able to achieve that... ?
> I agree. I've been wondering if this is more of a power management
> topic as what you've described there is basically the same as what the
> CPU frequency governor does in deciding the best way to achieve a
> workload. Perhaps this can also turn into hints to executing code re:
> what instruction set to use.
> There might be an argument for explicit control as well. Say you're
> decoding an AAC stream and using 20 % CPU - it might be more efficient
> to acquire and release the NEON unit from within the decoder to start
> it up faster and release it as soon as the job is done.
> Could a kernel developer describe how the NEON unit is controlled? My
> understanding is:
> * NEON is generally off
> * Executing a NEON instruction causes an instruction trap, which kicks
> the kernel, which starts the unit up
> * The kernel only saves the NEON registers if the code uses them
> I'm not sure about:
> * Does NEON remain on as long as that process is executing? Does it
> get turned off on task switch, or perhaps after a timeout?
On OMAP3, NEON is a separate power domain and can transition to a
low-power state on its own based on its activity (managed by the PRCM
hardware). However, the NEON power domain has a wake-up dependency on
the MPU, which means NEON is woken up whenever the MPU comes out of
standby.
> * VFP uses the same register set. Does a floating point instruction
> also turn the NEON coprocessor on?
Yes, I suppose so, since the VFP engine is part of the NEON unit.
> -- Michael