On 5 May 2011 17:21, Christian Robottom Reis <kiko@linaro.org> wrote:
Hey there,
I was asked today in the board meeting about the use of NEON routines in the kernel; I said we had looked into this but hadn't done it because a) it wasn't conclusively better and b) if better, it would need to be done conditionally per-platform. But I wanted to double-check that's actually true (and I'm copying Vijay to keep me honest). I have some references:
http://lists.linaro.org/pipermail/linaro-toolchain/2011-January/000722.html
http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc...
http://www.spinics.net/lists/arm-kernel/msg106503.html
http://dev.gentoo.org/~armin76/arm/memcpy-neon_result.txt
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemcpy?hig...
https://wiki.linaro.org/WorkingGroups/ToolChain/StringRoutines?highlight=%28...
Back in 2003-2004 iirc, Freescale played with the idea of using Altivec (the PowerPC SIMD engine) inside the kernel, and published a paper on this:
http://cache.freescale.com/files/32bit/doc/app_note/AN2581.pdf
All of it is a good read, but for those in a hurry, I'd suggest skipping to paragraph 3.3: in essence it says that due to potential problems in context switching (i.e. if a usermode application contends with the kernel for the SIMD unit), performance might drop for both due to excessive context switching. OTOH, a new SIMD memcpy used in specific cases, or even combined with some other functionality (for example the TCP checksum in this case), might prove quite rewarding and is probably the proper way to use SIMD inside the kernel. I'm sure that this is independent of the actual SIMD unit in question; whatever applies to NEON might apply as well to Altivec or SSE*.
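To make the "copy combined with checksum" idea concrete, here is a rough sketch of my own, not code from the app note or from any kernel tree. It is plain scalar C with no NEON or Altivec in it; the function name copy_and_checksum is hypothetical, and I'm assuming the usual 16-bit ones'-complement Internet checksum. The only point is that the data is touched once for both jobs, which is where a SIMD version would hope to gain:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): copy len bytes from src to dst
 * and return the 16-bit ones'-complement checksum of the data, computed
 * in the same pass over the buffer. Bytes are summed as big-endian
 * 16-bit words; an odd trailing byte is padded with a zero low byte.
 */
uint16_t copy_and_checksum(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint64_t sum = 0;   /* wide accumulator, folded to 16 bits at the end */
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        d[i] = s[i];
        d[i + 1] = s[i + 1];
        sum += ((uint32_t)s[i] << 8) | s[i + 1];
    }

    if (i < len) {      /* odd length: one byte left over */
        d[i] = s[i];
        sum += (uint32_t)s[i] << 8;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A real SIMD variant would replace the loop body with vector loads, stores and adds, and inside the kernel it would still have to pay for saving and restoring the SIMD state around the call, which is exactly the cost discussed above.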
Regards
Konstantinos