Hi Ard!
As we discussed on IRC today, I'm sending you my asm-based implementation of the RAID syndrome functions. I would be glad if you could compare it with the intrinsics-based one you are currently working on.
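As a common baseline for that comparison, both versions are expected to produce the standard RAID-6 P/Q syndromes. Below is a minimal byte-at-a-time C sketch of that computation. It is neither my asm code nor the kernel's own implementation. The names `gf_mul2` and `ref_gen_syndrome` are hypothetical, and the pointer layout (data blocks first, then P, then Q) is assumed to match the kernel's gen_syndrome convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply a GF(2^8) element by the generator g = {02},
 * reducing modulo the RAID-6 polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/*
 * Hypothetical byte-at-a-time reference for P/Q syndrome generation.
 * Assumed layout: ptrs[0 .. disks-3] are data blocks,
 * ptrs[disks-2] is P and ptrs[disks-1] is Q.
 *   P = D0 ^ D1 ^ ... ^ D(n-1)
 *   Q = g^0*D0 ^ g^1*D1 ^ ... ^ g^(n-1)*D(n-1), evaluated with Horner's rule
 */
static void ref_gen_syndrome(int disks, size_t bytes, void **ptrs)
{
	int z0 = disks - 3;                  /* index of the highest data block */
	uint8_t *p = (uint8_t *)ptrs[z0 + 1];
	uint8_t *q = (uint8_t *)ptrs[z0 + 2];

	for (size_t d = 0; d < bytes; d++) {
		uint8_t wp = ((uint8_t *)ptrs[z0])[d];
		uint8_t wq = wp;

		/* Walk down to block 0: Q = g*Q ^ Dz, P = P ^ Dz */
		for (int z = z0 - 1; z >= 0; z--) {
			uint8_t wd = ((uint8_t *)ptrs[z])[d];
			wp ^= wd;
			wq = gf_mul2(wq) ^ wd;
		}
		p[d] = wp;
		q[d] = wq;
	}
}
```

SIMD versions, whether asm or intrinsics, would typically apply the same multiply-by-{02} step to a whole vector of bytes at once rather than one byte at a time, so the scalar loop above is only meant as a correctness reference.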
I'm not posting my code for VFP/NEON context save/restore here. Anyone interested can find the patches developed by Ard at [1]. Note, however, that my patches use the "fpu" naming, so some renaming to vfp/neon may be needed to make things work.
[1] https://patchwork.kernel.org/patch/2605041/
Thanks! Vladimir Murzin