On Thu, 5 May 2011, David Gilbert wrote:
> If people believe it's worth breaking the context-switching taboo and putting a neon version into the kernel then yes I agree it's something you'd want to do as a build and/or runtime selection - but that's quite a big taboo to break.
There is no taboo. Only numbers.
The cost of using Neon in the kernel is non-negligible. It is also hard to measure, as it depends on the actual Neon usage happening simultaneously in user space or in other concurrent kernel contexts. This is not something that a dedicated benchmark can evaluate.
There _are_ cases for Neon to be used in the kernel, i.e. those where the initial cost is offset by the gain. The first that comes to mind is crypto, of course. But there are also simple things like CRC32, which is used all over the place by BTRFS for example. And that is the actual test case I think we should focus our efforts on, given that BTRFS is going to be the next major filesystem on Linux. Last time I tried BTRFS on ARM, the CRC32 computation was dominating CPU usage big time. CRC32 is easy to understand, easy to validate, and will provide the right reason for creating the needed infrastructure to manipulate the Neon context in kernel space. Once that's in place we could move to other targets such as crypto, which is already complex enough without having to bother with the Neon context handling.
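To illustrate the "easy to validate" point, here is a minimal sketch of a bit-at-a-time reference checksum that any accelerated version could be compared against. It assumes the Castagnoli polynomial (CRC32C), which is the variant BTRFS checksums use, and the common standalone convention of inverting the value on entry and exit. The name crc32c_ref() is hypothetical and is not a kernel API; the kernel's own helpers may handle the seed and final inversion differently, so this is not a drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the Castagnoli polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/*
 * Bit-at-a-time CRC32C reference (hypothetical helper, not a kernel API).
 * Pass crc = 0 to start a new checksum, or a previous return value to
 * continue one over a following buffer.
 */
uint32_t crc32c_ref(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    crc = ~crc;                 /* undo the previous final inversion (or seed with all ones) */
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out one bit; fold in the polynomial if that bit was set. */
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
        }
    }
    return ~crc;                /* final inversion */
}
```

With this convention, the nine ASCII bytes "123456789" should yield the standard CRC32C check value 0xE3069283, and splitting the input across two calls should give the same result. A Neon version would only need to match this function over arbitrary buffers and alignments to be considered correct.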
The memcpy case is not interesting. Not at all. Most kernel memcpy calls are for small copies. The large copy instances are just bad and misdesigned in the first place if they rely on memcpy (maybe they should simply have a custom copy function, maybe implemented with Neon). And I doubt the small memcpy's are going to gain anything from Neon. Even on x86 they don't do it, while they do have a CRC32 function using SSE4.2. Maybe we could use Neon for copy_page(), which is one of those custom bulk copy functions, but I've never seen memcpy() in kernel space show up on any profile.