On Wed, May 25, 2011 at 10:21:43AM -0400, Nicolas Pitre wrote:
On Wed, 25 May 2011, Dave Martin wrote:
On Tue, May 24, 2011 at 11:45:04PM -0400, Nicolas Pitre wrote:
    typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
                                       const int64_t *newval,
                                       volatile int64_t *ptr);
    #define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0xffff0f60)

    Atomically store newval in *ptr if *ptr is equal to oldval for user space.
    Return zero if *ptr was changed or non-zero if no exchange happened.
    The C flag is also set if *ptr was changed to allow for assembly
    optimization in the calling code.

    Do not attempt to call this function unless __kernel_helper_version >= 5.
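For reference, the semantics described above can be sketched in portable C. This is only an illustrative model with an invented name: the real helper lives in the kernel vectors page at 0xffff0f60, is atomic, and additionally sets the C flag on success, none of which this sketch reproduces.

```c
#include <stdint.h>

/* Illustrative model (not the real helper): store *newval into *ptr
 * iff *ptr == *oldval.  Return 0 if *ptr was changed, non-zero if
 * no exchange happened.  The real __kernel_cmpxchg64 does this
 * atomically and also sets the C flag on success. */
static int model_cmpxchg64(const int64_t *oldval,
                           const int64_t *newval,
                           volatile int64_t *ptr)
{
    if (*ptr != *oldval)
        return 1;          /* no exchange happened */
    *ptr = *newval;
    return 0;              /* *ptr was changed */
}
```

A real caller on ARM would instead go through the function-pointer cast shown above, after checking that __kernel_helper_version >= 5.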
Yep. I will queue your other patch prior to this one and make this blurb consistent with the rest.
Oh, OK. I hadn't had any feedback on that yet, so I wasn't sure whether anyone was picking it up. I will repost to lakml, but I was waiting for the merge window to close, since this was just a tidy-up rather than something urgent.
This warning is more important for the new helper than for the existing ones (where really we could omit it without much risk).
+kuser_cmpxchg64_fixup:
+	@ Called from kuser_cmpxchg_fixup.
+	@ r2 = address of interrupted insn (must be preserved).
+	@ sp = saved regs. r7 and r8 are clobbered.
+	@ 1b = first critical insn, 2b = last critical insn.
+	@ If r2 >= 1b and r2 <= 2b then saved pc_usr is set to 1b.
+	mov	r7, #0xffff0fff
+	sub	r7, r7, #(0xffff0fff - (0xffff0f60 + (1b - __kuser_cmpxchg64)))
+	subs	r8, r2, r7
+	rsbcss	r8, r8, #(2b - 1b)
+	strcs	r7, [sp, #S_PC]
+#if __LINUX_ARM_ARCH__ < 6
+	b	kuser_cmpxchg32_fixup
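The range check done by the subs/rsbcss/strcs sequence can be modelled in C, which may make the carry-flag trick easier to follow. The names here are invented for illustration, and plain integers stand in for the 1b/2b label arithmetic:

```c
#include <stdint.h>

/* Illustrative model of the fixup's range check (invented names):
 * if the interrupted PC (r2) lies within [start, end], i.e. between
 * the first (1b) and last (2b) critical instructions, rewind the
 * saved user PC to start so the critical section restarts cleanly.
 *
 *   subs   r8, r2, r7        ; r8 = pc - start, C set if pc >= start
 *   rsbcss r8, r8, #(2b-1b)  ; C stays set only if pc - start <= end - start
 *   strcs  r7, [sp, #S_PC]   ; store start into saved PC if both held
 */
static uint32_t fixup_pc(uint32_t pc, uint32_t start, uint32_t end)
{
    /* One unsigned compare covers both bounds: if pc < start the
     * subtraction wraps to a huge value, failing the test. */
    if (pc - start <= end - start)
        return start;   /* saved pc_usr rewound to 1b */
    return pc;          /* outside the critical region: left untouched */
}
```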
Can we just have movcs pc,lr here, and put kuser_cmpxchg32_fixup immediately after?
This would allow us to skip the branch, and the initial "mov r7" in the kuser_cmpxchg32_fixup code.
The 'mov r7' is still needed unless the second instruction uses another target register.
What I meant is that at the end of the above sequence r7 = 1b, so we can just fall through into the 32-bit fixup code and add the appropriate small offset to r7 so that it points at the corresponding 1b label for __kuser_cmpxchg32, without needing to re-derive the address using mov+sub.
It's a pretty minor tweak though.
I thought about the possibility of "movcs pc, lr", especially since the .text segments are simply concatenated, so the branch is effectively branching to the very next instruction. There could be a common preamble, then a list of concatenated fixup segments (only two of them for now), and finally a postamble which would simply be "mov pc, lr", all made contiguous at link time. However, I'm not sure yet whether this is worth optimizing that far, since this code is far from being hot, and clarity would also be affected.
Also, probably the number of fixups is never going to grow very large.
There's a fair amount of duplicated logic from the 32-bit case. Is it worth trying to merge the two?
The core logic spans 5 instructions. Only 3 of them are actually the same in both cases, i.e. those with no references to 1b or 2b, and they're perfectly interleaved between the others. Trying to turn this into common runtime code would result in even bigger code, I'm afraid. It could be made into common source code via a macro, though.
Fair enough -- a macro might be worth a try _if_ it simplifies things in the source, but I think you're right that merging the actual code probably isn't worth it just to save a few words in the vectors page (which eats up 4K regardless of what we put in it) and a few cycles per fault (which already costs many, many cycles).
Cheers
---Dave