On Wed, May 25, 2011 at 12:58:30PM +0100, David Gilbert wrote:
On 25 May 2011 04:45, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
FWIW, here's what the kernel part might look like, i.e. for compatibility with pre-ARMv6k systems (beware, only compile-tested):
OK, so that makes an eglibc part for that pretty easy. For things like fetch_and_add (which I can see membase needs), would you expect an implementation using this cmpxchg, so that it has a fallback, or just to use ldrexd directly, which I assume would be somewhat more efficient?
(Question holds for both eglibc and gcc's __sync_*)
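To make the first option concrete, here is a minimal sketch of a 64-bit fetch_and_add built on a compare-and-exchange primitive. The name cmpxchg64 and its signature (pointers to old and new values, zero on success) are assumptions for illustration, not the interface from the patch; it is backed by GCC's __sync_bool_compare_and_swap here only so that the sketch compiles on any target.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit cmpxchg helper.
 * Returns 0 if *ptr equalled *oldval and was replaced by *newval,
 * non-zero otherwise. Backed by a GCC builtin for portability only. */
static int cmpxchg64(const uint64_t *oldval, const uint64_t *newval,
                     volatile uint64_t *ptr)
{
    return !__sync_bool_compare_and_swap(ptr, *oldval, *newval);
}

/* fetch_and_add on top of cmpxchg: retry until no other thread has
 * changed *ptr between taking the snapshot and doing the swap. */
uint64_t fetch_and_add64(volatile uint64_t *ptr, uint64_t inc)
{
    uint64_t old, new;

    do {
        old = *ptr;      /* snapshot; if stale or torn, cmpxchg64 fails and we retry */
        new = old + inc;
    } while (cmpxchg64(&old, &new, ptr) != 0);

    return old;          /* value seen before the addition */
}
```

The cost relative to an inline ldrexd/strexd loop is the extra call and the possibility of additional retries under contention; the benefit is that the same binary still works where the instructions are unavailable.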
It depends on the baseline architecture for the build.
An eglibc built for ARMv6 and above would need to call the helper by default, though it could also use ldrexd/strexd if it determines at run-time that this is supported by the CPU.
Similarly, if GCC is building for -march=armv7-a it can inline the atomics directly using ldrex/strex and friends, but for -march=armv6 it will need to call helpers via libgcc.
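As a rough sketch of how a library could combine the two cases: default to the helper-based path and switch to the inline path once it knows the CPU supports it. All names below (cpu_has_ldrexd, atomics_init, fetch_add64 and the two implementations) are hypothetical, and the probe is a placeholder: it only reflects the build baseline through GCC's predefined __ARM_ARCH_* macros and does no real run-time detection. The GCC builtins stand in for the real helper call and the real ldrexd/strexd sequence.

```c
#include <stdint.h>

typedef uint64_t (*fetch_add64_fn)(volatile uint64_t *, uint64_t);

/* Fallback path: stands in for a retry loop around the kernel helper. */
static uint64_t fetch_add64_helper(volatile uint64_t *p, uint64_t inc)
{
    uint64_t old;

    do {
        old = *p;
    } while (!__sync_bool_compare_and_swap(p, old, old + inc));
    return old;
}

/* Fast path: stands in for an inline ldrexd/strexd sequence. */
static uint64_t fetch_add64_inline(volatile uint64_t *p, uint64_t inc)
{
    return __sync_fetch_and_add(p, inc);
}

/* Hypothetical probe. When the build baseline already guarantees the
 * instructions, answer yes; otherwise a real library would query the
 * CPU or kernel here. This placeholder conservatively answers no. */
static int cpu_has_ldrexd(void)
{
#if defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_7A__)
    return 1;
#else
    return 0;
#endif
}

/* Helper path by default, as for a plain ARMv6 build. */
static fetch_add64_fn fetch_add64_impl = fetch_add64_helper;

/* Call once at start-up, before other threads exist. */
void atomics_init(void)
{
    if (cpu_has_ldrexd())
        fetch_add64_impl = fetch_add64_inline;
}

uint64_t fetch_add64(volatile uint64_t *p, uint64_t inc)
{
    return fetch_add64_impl(p, inc);
}
```

A compiler targeting armv7-a needs none of this indirection, since it can emit the inline sequence unconditionally.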
Notes:
 - This routine already includes memory barriers as needed.
Hmm I wonder whether something like the atomic add primitives from user space need that or not; it depends whether people are using them to build larger data structures or just trying to keep a consistent count somewhere.
The GCC __sync_* primitives must mostly be full barriers. This is what the Itanium ABI specifies (this is the spec GCC follows for these).
The kernel user helper itself could omit the barriers, but this would deviate from the existing 32-bit implementation, and also slow down the (common) case where at least one of the barriers really is needed.
_just_ updating a counter doesn't need the barriers. But if it's important that a counter can be atomically updated by multiple threads, the synchronisation of that update against other data structures usually turns out to be important too...
GCC's atomic primitives don't address the full/partial barrier distinction. Some other atomics APIs, Qt's for example, do express different flavours of barrier behaviour, and so in principle can be more efficient in cases where it makes a difference.
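To illustrate the distinction, here is a small sketch using C11's <stdatomic.h>. That API is not mentioned anywhere in this thread; it is used here only as an example of an interface where the caller picks the ordering. All the names (hits, payload, ready and the three functions) are made up. The pure counter uses a relaxed update with no barrier, while the flag that publishes other data needs release/acquire ordering.

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint_fast64_t hits;  /* statistics only: nothing else depends on it */
static int payload;                /* ordinary data, published via 'ready' */
static atomic_int ready;           /* flag telling readers that payload is valid */

/* Just keeping a count: atomicity is required, ordering is not. */
void count_hit(void)
{
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* Publishing data: the write to payload must be visible before the flag. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consuming data: the flag must be observed before payload is read. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```

With the __sync_* builtins, both count_hit() and publish() would pay for a full barrier, which is the cost being discussed for the counter-only case.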
Cheers ---Dave