On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+	return cpu;
+}
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		       int cpu)
+{
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
So the actual C part of the RSEQ is subject to an ABA, right? We can get migrated to another CPU and back again without then failing here.
It used to be that this was caught by the sequence count, but that is now gone.
The thing that makes it work is the compare against @v:
"cmpq %[v], %[expect]\n\t"
"jnz 5f\n\t"
That then ensures things are still as we observed them before (although this itself is also subject to ABA).
This means all RSEQ primitives that have a C part must have a cmp-and-<op> form, but I suppose that was already pretty much the case anyway. I just don't remember seeing that spelled out anywhere. Then again, I've not yet read that manpage.
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		: "memory", "cc", "rax"
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	return -1;
+cmpfail:
+	return 1;
+}
On Thu, Nov 23, 2017 at 09:55:11AM +0100, Peter Zijlstra wrote:
On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		       int cpu)
+{
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
Also, I'm confused between the abort and cmpfail cases.
I would expect the cpu_id compare to also result in cmpfail; that is, I would only expect the kernel to result in abort.
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		: "memory", "cc", "rax"
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	return -1;
Which then would suggest this be -EINTR or something like that.
+cmpfail:
+	return 1;
+}
----- On Nov 23, 2017, at 3:57 AM, Peter Zijlstra peterz@infradead.org wrote:
On Thu, Nov 23, 2017 at 09:55:11AM +0100, Peter Zijlstra wrote:
On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		       int cpu)
+{
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
Also, I'm confused between the abort and cmpfail cases.
I would expect the cpu_id compare to also result in cmpfail; that is, I would only expect the kernel to result in abort.
Let's take the per-cpu spinlock as an example to explain why we need the "compare fail" and "cpu_id compare fail" to return different values.
Getting this lock involves doing:
	cpu = rseq_cpu_start();
	ret = rseq_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
Now based on the "ret" value:
if ret == 0, it means that @v was indeed 0, and that rseq executed the commit (stored 1).
if ret > 0, it means the comparison of @v against 0 failed, which means the lock was already held. We therefore need to postpone and retry later. A "try_lock" operation would return that the lock is currently busy.
if ret < 0, then we have either been aborted by the kernel, or the comparison of @cpu against cpu_id failed. If we think about it, having @cpu != cpu_id happens when we are migrated before we enter the rseq critical section, which is pretty similar to being aborted by the kernel within the critical section. So I don't see any reason for making the branch target of the cpu_id comparison anything other than the abort_ip. In that situation, the caller needs to either retry with an updated @cpu value (except for multi-part algorithms, e.g. reserve+commit, which don't allow changing the @cpu number on commit), or use cpu_opv to perform the operation.
Note that another reason the @cpu == cpu_id test may fail is that rseq is not registered for the current thread. Again, just branching to the abort_ip and letting the caller fall back to cpu_opv solves this.
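Putting the three cases together, a try-lock built on this helper would look roughly like the following sketch (struct layout as in the selftests; percpu_lock_trylock is just an illustrative name, and the retry bound and cpu_opv fallback are elided):

/*
 * Minimal sketch. Returns the cpu number on success, -1 if the lock
 * is busy.
 */
static int percpu_lock_trylock(struct percpu_lock *lock)
{
	for (;;) {
		int cpu = rseq_cpu_start();
		int ret = rseq_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);

		if (ret == 0)
			return cpu;	/* Commit succeeded: lock taken. */
		if (ret > 0)
			return -1;	/* @v was not 0: lock busy. */
		/*
		 * ret < 0: kernel abort, or @cpu != cpu_id (migration,
		 * or rseq not registered). Retry with an updated @cpu;
		 * a real implementation would bound the retries and
		 * fall back to cpu_opv.
		 */
	}
}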
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		: "memory", "cc", "rax"
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	return -1;
Which then would suggest this be -EINTR or something like that.
I'm not so sure returning kernel error codes is the expected practice for user-space libraries.
Thoughts ?
Thanks!
Mathieu
+cmpfail:
+	return 1;
+}
----- On Nov 23, 2017, at 3:55 AM, Peter Zijlstra peterz@infradead.org wrote:
On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+	return cpu;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		       int cpu)
+{
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
So the actual C part of the RSEQ is subject to an ABA, right? We can get migrated to another CPU and back again without then failing here.
Yes, that's correct. All algorithms preparing something in C and then using a compare-and-other-stuff sequence need to ensure they do not have ABA situations. For instance, a list push does not care if the list head is reclaimed and re-inserted concurrently, because none of the preparation steps in C involve the head next pointer.
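For contrast, here is where ABA would bite if a pop were written naively (illustrative sketch only, not the selftest code; the actual pop avoids this by loading the next pointer inside the asm sequence, see rseq_cmpnev_storeoffp_load):

/*
 * Hypothetical, ABA-unsafe pop: the head->next load happens in C,
 * outside the rseq critical section. If another thread pops @head,
 * frees it and pushes it back before our commit, the final compare
 * on the list head still matches, but @next was read from reclaimed
 * memory.
 */
struct percpu_list_node *naive_percpu_list_pop(struct percpu_list *list)
{
	struct percpu_list_node *head, *next;
	int cpu, ret;

	cpu = rseq_cpu_start();
	head = READ_ONCE(list->c[cpu].head);
	if (!head)
		return NULL;
	next = READ_ONCE(head->next);	/* <-- the ABA hazard: C-side load. */
	ret = rseq_cmpeqv_storev((intptr_t *)&list->c[cpu].head,
				 (intptr_t)head, (intptr_t)next, cpu);
	if (ret)
		return NULL;	/* Caller would retry or fall back. */
	return head;
}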
It used to be that this was caught by the sequence count, but that is now gone.
The sequence count introduced other weirdness: although it would catch those migration cases, it is a sequence read-lock, which means the C code "protected" by this sequence read-lock needed to be extremely careful about not accessing reclaimed memory. The sequence lock ensures consistency of the data when the comparison matches, but it does not protect against other side-effects.
So removing this sequence lock is actually a good thing: it removes any expectation that users may have about that sequence counter being anything stronger than a read seqlock.
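To illustrate the distinction (hypothetical sketch, not the removed implementation; the seq counter and function name are made up for the example):

#include <stdatomic.h>

extern atomic_uint seq;		/* assumed per-thread sequence counter */

/*
 * A read-seqlock-style critical section: the final check discards a
 * torn result, but the dereference of @head happened anyway. If @head
 * was freed concurrently, this is a use-after-free regardless of the
 * sequence check.
 */
int hazardous_read(struct percpu_list *list, int cpu)
{
	unsigned int s = atomic_load(&seq);
	struct percpu_list_node *head = list->c[cpu].head;
	intptr_t x = head ? (intptr_t)head->next : 0;	/* may be UAF */

	if (atomic_load(&seq) != s)
		return -1;	/* Race detected, but the load already ran. */
	return !!x;
}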
The thing that makes it work is the compare against @v:
"cmpq %[v], %[expect]\n\t"
"jnz 5f\n\t"
That then ensures things are still as we observed them before (although this itself is also subject to ABA).
Yes.
This means all RSEQ primitives that have a C part must have a cmp-and-<op> form, but I suppose that was already pretty much the case anyway. I just don't remember seeing that spelled out anywhere. Then again, I've not yet read that manpage.
Yes, pretty much. The only primitives that don't have the compare are things like "rseq_addv()", which does not have much in the C part (it's just incrementing a counter).
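For example (sketch using the selftest-style rseq_addv(intptr_t *v, intptr_t count, int cpu) helper; the counter layout is illustrative):

struct percpu_counter {
	intptr_t count;
} __attribute__((aligned(128)));	/* One counter per cpu, padded. */

static void percpu_counter_inc(struct percpu_counter *c)
{
	int ret;

	do {
		int cpu = rseq_cpu_start();

		/* The whole critical section is the add itself: no compare. */
		ret = rseq_addv(&c[cpu].count, 1, cpu);
	} while (ret);	/* Non-zero: abort/migration, just retry. */
}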
I did not state anything like "typical rseq c.s. do a compare and other stuff" in rseq(2), given that the role of this man page, AFAIU, is to explain how to interact with the kernel system call, not to provide user-space implementation guidelines.
But let me know if I should expand it with user-space sequence implementation guidelines, which would include notes about being careful about ABA. I'm not sure it belongs there though.
Thanks!
Mathieu
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		: "memory", "cc", "rax"
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	return -1;
+cmpfail:
+	return 1;
+}