membarrier()'s MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is documented as syncing the core on all sibling threads but not necessarily the calling thread. This behavior is fundamentally buggy and cannot be used safely. Suppose a user program has two threads. Thread A is on CPU 0 and thread B is on CPU 1. Thread A modifies some text and calls membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE). Then thread B executes the modified code. At any point after membarrier() decides which CPUs to target, thread A could be preempted and replaced by thread B on CPU 0. This could even happen on exit from the membarrier() syscall. If this happens, thread B will end up running on CPU 0 without having synced.
In principle, this could be fixed by arranging for the scheduler to sync_core_before_usermode() whenever switching between two threads in the same mm if there is any possibility of a concurrent membarrier() call, but this would have considerable overhead. Instead, make membarrier() sync the calling CPU as well.
As an optimization, this avoids an extra smp_mb() in the default barrier-only mode.
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski luto@kernel.org
---
 kernel/sched/membarrier.c | 51 +++++++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 01538b31f27e..57266ab32ef9 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -333,7 +333,8 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 			return -EPERM;
 	}

-	if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
+	if (flags != MEMBARRIER_FLAG_SYNC_CORE &&
+	    (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1))
 		return 0;

 	/*
@@ -352,8 +353,6 @@ static int membarrier_private_expedited(int flags, int cpu_id)

 		if (cpu_id >= nr_cpu_ids || !cpu_online(cpu_id))
 			goto out;
-		if (cpu_id == raw_smp_processor_id())
-			goto out;
 		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu_id)->curr);
 		if (!p || p->mm != mm) {
@@ -368,16 +367,6 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		for_each_online_cpu(cpu) {
 			struct task_struct *p;

-			/*
-			 * Skipping the current CPU is OK even through we can be
-			 * migrated at any point. The current CPU, at the point
-			 * where we read raw_smp_processor_id(), is ensured to
-			 * be in program order with respect to the caller
-			 * thread. Therefore, we can skip this CPU from the
-			 * iteration.
-			 */
-			if (cpu == raw_smp_processor_id())
-				continue;
 			p = rcu_dereference(cpu_rq(cpu)->curr);
 			if (p && p->mm == mm)
 				__cpumask_set_cpu(cpu, tmpmask);
@@ -385,12 +374,38 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		rcu_read_unlock();
 	}

-	preempt_disable();
-	if (cpu_id >= 0)
+	if (cpu_id >= 0) {
+		/*
+		 * smp_call_function_single() will call ipi_func() if cpu_id
+		 * is the calling CPU.
+		 */
 		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
-	else
-		smp_call_function_many(tmpmask, ipi_func, NULL, 1);
-	preempt_enable();
+	} else {
+		/*
+		 * For regular membarrier, we can save a few cycles by
+		 * skipping the current cpu -- we're about to do smp_mb()
+		 * below, and if we migrate to a different cpu, this cpu
+		 * and the new cpu will execute a full barrier in the
+		 * scheduler.
+		 *
+		 * For CORE_SYNC, we do need a barrier on the current cpu --
+		 * otherwise, if we are migrated and replaced by a different
+		 * task in the same mm just before, during, or after
+		 * membarrier, we will end up with some thread in the mm
+		 * running without a core sync.
+		 *
+		 * For RSEQ, don't rseq_preempt() the caller. User code
+		 * is not supposed to issue syscalls at all from inside an
+		 * rseq critical section.
+		 */
+		if (flags != MEMBARRIER_FLAG_SYNC_CORE) {
+			preempt_disable();
+			smp_call_function_many(tmpmask, ipi_func, NULL, true);
+			preempt_enable();
+		} else {
+			on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
+		}
+	}

 out:
 	if (cpu_id < 0)
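
For readers who have not used this command from userspace, the scenario in the commit message could look roughly like the sketch below. It is an illustration only and not part of the patch: the helper names sys_membarrier(), jit_sync_init() and jit_publish() are made up for this note, and the code assumes a Linux system whose <linux/membarrier.h> provides the SYNC_CORE commands. Thread A would call jit_publish() after rewriting text and before letting thread B jump to it.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke membarrier through syscall(2) directly,
 * without relying on a libc wrapper. */
static int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/* Hypothetical helper: register the process for SYNC_CORE once at startup.
 * Returns 0 on success, -1 if the kernel or architecture does not offer it. */
int jit_sync_init(void)
{
	int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0, 0);

	if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE))
		return -1;
	return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE,
			      0, 0);
}

/* Hypothetical helper: call after modifying code and before any sibling
 * thread may execute it. Returns 0 on success, -1 on error. */
int jit_publish(void)
{
	return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);
}
```

A caller would run jit_sync_init() once and fall back to some other synchronization scheme if it returns -1. Whether jit_publish() is sufficient for the two-thread case is exactly what the patch above is about.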
----- On Dec 4, 2020, at 12:07 AM, Andy Lutomirski luto@kernel.org wrote:
As an optimization, this avoids an extra smp_mb() in the default barrier-only mode.
^ we could also add to the commit message that it avoids doing rseq preempt on the caller as well.
Other than that:
Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com
Thanks!
Mathieu
On Thu, Dec 3, 2020 at 9:07 PM Andy Lutomirski luto@kernel.org wrote:
As an optimization, this avoids an extra smp_mb() in the default barrier-only mode.
Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
also:
/*
* For regular membarrier, we can save a few cycles by
* skipping the current cpu -- we're about to do smp_mb()
* below, and if we migrate to a different cpu, this cpu
* and the new cpu will execute a full barrier in the
* scheduler.
*
* For CORE_SYNC, we do need a barrier on the current cpu --
s/CORE_SYNC/SYNC_CORE/
--Andy
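
As a reading aid for the comment quoted above, here is a toy userspace model of which CPUs end up running ipi_func() in the broadcast (cpu_id < 0) case after the patch. It is a sketch under stated assumptions, not kernel code: FLAG_NONE, FLAG_SYNC_CORE, running_mm and ipi_target_mask() are hypothetical stand-ins, and a plain 32-bit mask replaces the kernel's cpumask.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's membarrier flags. */
enum { FLAG_NONE = 0, FLAG_SYNC_CORE = 1 };

/*
 * running_mm[cpu] is an id for the mm of the task currently on that CPU.
 * Returns a bitmask of the CPUs that would run the IPI function.
 * Assumes ncpus <= 32 so the result fits in a uint32_t.
 */
uint32_t ipi_target_mask(const int *running_mm, int ncpus, int mm,
			 int self_cpu, int flags)
{
	uint32_t mask = 0;
	int cpu;

	/* Like the patched loop: every CPU running this mm is a candidate,
	 * including the caller's own CPU. */
	for (cpu = 0; cpu < ncpus; cpu++)
		if (running_mm[cpu] == mm)
			mask |= UINT32_C(1) << cpu;

	/* Barrier-only and RSEQ modes skip the caller, modelling
	 * smp_call_function_many(); SYNC_CORE keeps it, modelling
	 * on_each_cpu_mask(). */
	if (flags != FLAG_SYNC_CORE)
		mask &= ~(UINT32_C(1) << self_cpu);

	return mask;
}
```

With CPUs 0, 1 and 3 running the mm and the caller on CPU 0, the barrier-only case targets CPUs 1 and 3, while the SYNC_CORE case also includes CPU 0.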