Re: Optimized kernel memcpy/memset

5 May 2011


On Thu, 5 May 2011, David Gilbert wrote:
...
Yes, while I've not actually looked at coding CRC32 or the crypto things,
I agree that they feel like they have much more room to work with;
that's outside the scope of what I was asked to look at, however.
Well, you said that the current memcpy code in the kernel is quite good, 
which is nice, and not only because I wrote it :-), but it also suggests 
that Neon optimization efforts might have a bigger return on 
investment elsewhere.
...
...
The memcpy case is not interesting.  Not at all.  Most kernel memcpy
calls are for small size copies.  The large copy instances are just bad
and misdesigned in the first place if they rely on memcpy (maybe they
should simply have a custom copy function, maybe implemented with Neon).
Even outside the kernel, vast memcpys are fairly rare as far as I can 
tell: everyone knows they're going to hurt, so people try to avoid 
them.  The other thing is that people have been optimising ARM memcpy 
for decades, and it appears to me to be hitting cache/bus bandwidth 
limits somewhere (although I don't have any figures for what those 
bandwidths are).  There may be some scope for optimising the smaller 
memcpy cases (e.g. taking advantage of things like the newer cbz to 
cut a few instructions out).  From my graphs, the slope up to the point 
at which the non-Neon code plateaus is quite gradual, which suggests 
it might be possible to optimise it a bit.
Indeed.
Nicolas
