Re: Optimized kernel memcpy/memset

5 May 2011


      David Gilbert david.gilbert@linaro.org writes:
...
Hi Kiko,
On 5 May 2011 15:21, Christian Robottom Reis kiko@linaro.org wrote:
...
Hey there,
I was asked today in the board meeting about the use of NEON
routines in the kernel; I said we had looked into this but hadn't done
it because a) it wasn't conclusively better and b) if better, it would
need to be done conditionally per-platform. But I wanted to double-check
that's actually true (and I'm copying Vijay to keep me honest). I have
some references:
Not quite:
  a) Neon memcpy/memset is worse on A9 than non-neon versions (better
on A8 typically)
That is not my experience at all.  On the contrary, I've seen memcpy
throughput on A9 roughly double with use of NEON for large copies.
For small copies, plain ARM might be faster since the overhead of
preparing for a properly aligned NEON loop is avoided.
What do you base your claims on?
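
To make the small-versus-large trade-off concrete, here is a minimal sketch in portable C of a copy routine that switches strategy by size. It is not the kernel's memcpy and not a NEON implementation: the 16-byte block loop is only a stand-in for a wide (for example NEON) copy loop, and the names sketch_memcpy, copy_bytes, copy_wide and the 64-byte WIDE_COPY_THRESHOLD are hypothetical, chosen for illustration. A real cut-over point would have to come from measurements on the target core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-over point; a real value would come from benchmarking. */
#define WIDE_COPY_THRESHOLD 64
/* Block size of the wide loop; 16 bytes stands in for one NEON q register. */
#define WIDE_BLOCK 16

/* Simple path: no setup cost, one byte per iteration. */
static void copy_bytes(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

/* Wide path: pays an alignment preamble, then copies 16 bytes at a time. */
static void copy_wide(unsigned char *d, const unsigned char *s, size_t n)
{
    /* Bytes needed to bring the destination up to a 16-byte boundary. */
    size_t head = (size_t)(-(uintptr_t)d & (WIDE_BLOCK - 1));

    if (head > n)
        head = n;
    copy_bytes(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Main loop; a fixed-size memcpy stands in for a wide load/store pair. */
    while (n >= WIDE_BLOCK) {
        memcpy(d, s, WIDE_BLOCK);
        d += WIDE_BLOCK;
        s += WIDE_BLOCK;
        n -= WIDE_BLOCK;
    }

    /* Tail: whatever is left after the last full block. */
    copy_bytes(d, s, n);
}

/* Dispatch by size: small copies skip the alignment preamble entirely. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    if (n < WIDE_COPY_THRESHOLD)
        copy_bytes(dst, src, n);
    else
        copy_wide(dst, src, n);
    return dst;
}
```

Calling sketch_memcpy(dst, src, 10) takes the byte loop, while sketch_memcpy(dst, src, 4096) goes through the preamble and the block loop. Timing both paths over a range of sizes is one way to locate the point where the setup cost is paid back.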
...
b) In general I don't believe fpu or Neon code can be used
internally to the kernel.
That is true.  There is currently no support for the context save and
restore it would require.
...
...
http://lists.linaro.org/pipermail/linaro-toolchain/2011-January/000722.html
http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc...
http://www.spinics.net/lists/arm-kernel/msg106503.html
http://dev.gentoo.org/~armin76/arm/memcpy-neon_result.txt
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemcpy?hig...
   https://wiki.linaro.org/WorkingGroups/ToolChain/StringRoutines?highlight=%28...
There may still be potential for a non-NEON optimised memcpy/memset
on Cortex-A9; however, the kernel routines are already pretty good.
...
Incidentally, this ties into the question sent earlier this week which
had to do with Nico's work item in:
https://blueprints.launchpad.net/linux-linaro/+spec/other-kernel-thumb2
Which IIRC Nico says probably isn't worth it, right?
I thought dmart had done a lot of that?
I don't see the connection between Thumb2 and memcpy performance.
Thumb2 can do anything 32-bit ARM can.
-- 
Måns Rullgård
mans@mansr.com
