On Wed, 25 May 2011, Dave Martin wrote:
On Tue, May 24, 2011 at 11:45:04PM -0400, Nicolas Pitre wrote:
- typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
const int64_t *newval,
volatile int64_t *ptr);
- #define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0xffff0f60)
- Atomically store newval in *ptr if *ptr is equal to oldval for user space.
- Return zero if *ptr was changed or non-zero if no exchange happened.
- The C flag is also set if *ptr was changed to allow for assembly
- optimization in the calling code.
- Do not attempt to call this function unless __kernel_helper_version >= 5.
Yep. I will queue your other patch prior to this one and make this blurb consistent with the rest.
+kuser_cmpxchg64_fixup:
- @ Called from kuser_cmpxchg_fixup.
- @ r2 = address of interrupted insn (must be preserved).
- @ sp = saved regs. r7 and r8 are clobbered.
- @ 1b = first critical insn, 2b = last critical insn.
- @ If r2 >= 1b and r2 <= 2b then saved pc_usr is set to 1b.
- mov r7, #0xffff0fff
- sub r7, r7, #(0xffff0fff - (0xffff0f60 + (1b - __kuser_cmpxchg64)))
- subs r8, r2, r7
- rsbcss r8, r8, #(2b - 1b)
- strcs r7, [sp, #S_PC]
+#if __LINUX_ARM_ARCH__ < 6
- b kuser_cmpxchg32_fixup
Can we just have movcs pc,lr here, and put kuser_cmpxchg32_fixup immediately after?
This would allow us to skip the branch, and the initial "mov r7" in the kuser_cmpxchg32_fixup code.
The 'mov r7' is still needed unless the second instruction uses another target register.
I thought about the possibility of "movcs pc, lr", especially since the .text segments are simply concatenated and therefore the branch is effectively branching to the very next instruction. So there could be like a common preamble, then a list of concatenated fixup segments (only two of them now) and finally a postamble which would simply be "mov pc, lr". This would all be put contigous at link time. However I'm not sure yet if this is worth optimizing that far since this code is far from being hot, and clarity would also be affected.
There's a fair amount of duplicated logic from the 32-bit case. Is it worth trying to merge the two?
The core logic spans 5 instructions. Only 3 of them are actually the same in both cases i.e. those with no references to 1b or 2b, and they're perfectly interlaced in between the others. Trying to make this into common runtime code would result in even bigger code I'm afraid. This could be made into common source code via a macro though.
Thanks for the review.
Nicolas