Hi Daniel,
Thanks for investigating this further.
On Wed, Nov 11, 2015 at 04:52:00PM +0100, Daniel Borkmann wrote:
> I played around a bit with eBPF code to assign the __sync_fetch_and_add() return value to a variable and dump it to the trace pipe, or use it as the return code. llvm compiles it (with the result assignment) and it looks like:
>
> [...]
> 206: (b7) r3 = 3
> 207: (db) lock *(u64 *)(r0 +0) += r3
> 208: (bf) r1 = r10
> 209: (07) r1 += -16
> 210: (b7) r2 = 10
> 211: (85) call 6        // r3 dumped here
> [...]
>
> [...]
> 206: (b7) r5 = 3
> 207: (db) lock *(u64 *)(r0 +0) += r5
> 208: (bf) r1 = r10
> 209: (07) r1 += -16
> 210: (b7) r2 = 10
> 211: (b7) r3 = 43
> 212: (b7) r4 = 42
> 213: (85) call 6        // r5 dumped here
> [...]
>
> [...]
> 11: (b7) r0 = 3
> 12: (db) lock *(u64 *)(r1 +0) += r0
> 13: (95) exit           // r0 returned here
> [...]
>
> What it seems is that we 'get back' the value that we're adding (== 3 here, in r3, r5 and r0), at least that's what appears to be generated wrt the register assignments. Hmm, the semantic differences of the bpf target should be documented somewhere so that people writing eBPF programs are aware of them.
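For anyone wanting to reproduce this: source roughly along the following lines (my own reconstruction, not your actual test program, and assuming the helper/map macros from samples/bpf/bpf_helpers.h) generates dumps like the first one above. As you observed, 'old' ends up holding the addend (3) rather than the previous value of the counter.

#include <linux/ptrace.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"        /* samples/bpf helper and map definitions */

struct bpf_map_def SEC("maps") counter_map = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(long),
        .max_entries = 1,
};

SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *ctx)
{
        char fmt[] = "old=%lu\n";
        int key = 0;
        long *val, old;

        val = bpf_map_lookup_elem(&counter_map, &key);
        if (!val)
                return 0;

        /* llvm puts the addend (3), not the value previously stored in
         * *val, into 'old', which is what the register dumps show. */
        old = __sync_fetch_and_add(val, 3);
        bpf_trace_printk(fmt, sizeof(fmt), old);

        return 0;
}

char _license[] SEC("license") = "GPL";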
If we're going to document it, a bug tracker might be a good place to start. The behaviour, as it stands, is broken wrt the definition of the __sync primitives. That is, there is no way to build __sync_fetch_and_add out of BPF_XADD without changing its semantics.
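To spell out the contract (a user-space sketch, nothing BPF-specific here): __sync_fetch_and_add() must hand back the value the object held before the addition, and act as a full barrier, neither of which a plain locked add can provide.

#include <assert.h>

int main(void)
{
        long counter = 3;

        /* Defined to return the value *before* the addition and to act
         * as a full memory barrier. */
        long old = __sync_fetch_and_add(&counter, 4);

        assert(old == 3);
        assert(counter == 7);

        /* BPF_XADD only performs the locked add; it never fetches the
         * previous value, so 'old' cannot be recovered from it. */
        return 0;
}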
We could fix this in one of the following ways:
(1) Defining BPF_XADD to match __sync_fetch_and_add (including memory barriers).
(2) Introducing some new BPF_ atomics that map to something like the C11 __atomic builtins (sketched below), and deprecating BPF_XADD in favour of these.
(3) Introducing new source-language intrinsics to match what BPF can do (unlikely to be popular).
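For (2), the sort of source-level primitives I have in mind look roughly like the __atomic builtins below (a sketch only, not a proposed instruction encoding): the memory order is explicit, the fetching forms return the old value, and a relaxed add whose result is unused can legitimately degrade to what BPF_XADD does today.

#include <stdio.h>

int main(void)
{
        long counter = 3;

        /* Explicit ordering; returns the value held before the add. */
        long old = __atomic_fetch_add(&counter, 4, __ATOMIC_SEQ_CST);

        /* Result unused, relaxed ordering: a compiler may lower this to
         * a plain atomic add, which is roughly what BPF_XADD provides. */
        __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);

        printf("old=%ld counter=%ld\n", old, counter);
        return 0;
}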
As it stands, I'm not especially keen on adding BPF_XADD to the arm64 JIT backend until we have at least (1) and preferably (2) as well.
Will