From: "Steven Rostedt (VMware)" rostedt@goodmis.org
While debugging another bug, I was looking at all the synchronize_*() functions being used in kernel/trace, and noticed that trace_uprobes was using synchronize_sched(), with a comment to synchronize with u{,ret}probe_trace_func(). When looking at those functions, the data is protected with rcu_read_lock() and not with rcu_read_lock_sched(). This is using the wrong synchronize_*() function.
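The pairing rule at issue, shown side by side. This is a minimal sketch simplified from trace_uprobe.c, not the verbatim source; trace_one_file() is a hypothetical stand-in for the real per-file handler:

#include <linux/rculist.h>
#include <linux/slab.h>

struct trace_event_file;
static void trace_one_file(struct trace_event_file *file);

struct event_file_link {
        struct trace_event_file *file;
        struct list_head        list;
};

/* Reader side, as in u{,ret}probe_trace_func(): plain RCU. */
static void walk_files(struct list_head *files)
{
        struct event_file_link *link;

        rcu_read_lock();
        list_for_each_entry_rcu(link, files, list)
                trace_one_file(link->file);     /* hypothetical consumer */
        rcu_read_unlock();
}

/* Writer side, as in probe_event_disable(): must wait for the same
 * flavor of readers before freeing. */
static void remove_file(struct event_file_link *link)
{
        list_del_rcu(&link->list);
        synchronize_rcu();      /* waits for rcu_read_lock() readers;
                                 * synchronize_sched() only waits for
                                 * preempt-disabled readers */
        kfree(link);
}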
Cc: stable@vger.kernel.org
Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace_uprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf89a51e740d..ac02fafc9f1b 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -952,7 +952,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
 
 		list_del_rcu(&link->list);
 		/* synchronize with u{,ret}probe_trace_func */
-		synchronize_sched();
+		synchronize_rcu();
 		kfree(link);
 
 		if (!list_empty(&tu->tp.files))
On 08/09, Steven Rostedt wrote:
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -952,7 +952,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
> 
>  		list_del_rcu(&link->list);
>  		/* synchronize with u{,ret}probe_trace_func */
> -		synchronize_sched();
> +		synchronize_rcu();
Can't we change uprobe_trace_func() and uretprobe_trace_func() to use rcu_read_lock_sched() instead? It is cheaper.
Hmm. probe_event_enable() does list_del + kfree on failure; this doesn't look right... Not only because kfree() can race with list_for_each_entry_rcu(), but also because we should not put the 1st link on the list until uprobe_buffer_enable() succeeds.
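Schematically, the failure path in question looks like this (a simplified fragment, not the kernel source as-is; the real code takes this path via the err_flags label):

        /* probe_event_enable(), before the change below: 'link' was
         * already published with list_add_tail_rcu(), so a task that
         * hit an early-installed breakpoint may be walking
         * tu->tp.files under rcu_read_lock() right now. */
        ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
        if (ret) {
                list_del(&link->list);  /* plain list_del, no grace period... */
                kfree(link);            /* ...so a concurrent
                                         * list_for_each_entry_rcu() walker
                                         * can still dereference 'link':
                                         * a use-after-free */
        }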
Does the patch below make sense, or am I confused?
Oleg.
--- x/kernel/trace/trace_uprobe.c
+++ x/kernel/trace/trace_uprobe.c
@@ -896,8 +896,6 @@ probe_event_enable(struct trace_uprobe *
 			return -ENOMEM;
 
 		link->file = file;
-		list_add_tail_rcu(&link->list, &tu->tp.files);
-
 		tu->tp.flags |= TP_FLAG_TRACE;
 	} else {
 		if (tu->tp.flags & TP_FLAG_TRACE)
@@ -909,7 +907,7 @@ probe_event_enable(struct trace_uprobe *
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	if (enabled)
-		return 0;
+		goto add;
 
 	ret = uprobe_buffer_enable();
 	if (ret)
@@ -920,7 +918,8 @@ probe_event_enable(struct trace_uprobe *
 	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
 	if (ret)
 		goto err_buffer;
-
+ add:
+	list_add_tail_rcu(&link->list, &tu->tp.files);
 	return 0;
 
 err_buffer:
@@ -928,7 +927,6 @@ probe_event_enable(struct trace_uprobe *
 
 err_flags:
 	if (file) {
-		list_del(&link->list);
 		kfree(link);
 		tu->tp.flags &= ~TP_FLAG_TRACE;
 	} else {
On 08/10, Oleg Nesterov wrote:
> @@ -920,7 +918,8 @@ probe_event_enable(struct trace_uprobe *
>  	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
>  	if (ret)
>  		goto err_buffer;
> -
> + add:
> +	list_add_tail_rcu(&link->list, &tu->tp.files);

Or, without the "add:" label:

	if (link)
		list_add_tail_rcu(&link->list, &tu->tp.files);
Oleg.
On Fri, 10 Aug 2018 13:35:49 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
> On 08/09, Steven Rostedt wrote:
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -952,7 +952,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
> > 
> >  		list_del_rcu(&link->list);
> >  		/* synchronize with u{,ret}probe_trace_func */
> > -		synchronize_sched();
> > +		synchronize_rcu();
> 
> Can't we change uprobe_trace_func() and uretprobe_trace_func() to use
> rcu_read_lock_sched() instead? It is cheaper.
Is it? rcu_read_lock_sched() is a preempt_disable(), whereas rcu_read_lock() may be just a per-task counter increment.
> Hmm. probe_event_enable() does list_del + kfree on failure; this doesn't
> look right... Not only because kfree() can race with
> list_for_each_entry_rcu(), but also because we should not put the 1st link
> on the list until uprobe_buffer_enable() succeeds.
> 
> Does the patch below make sense, or am I confused?
I guess the question is: if it isn't enabled, are there any users, or even past users, still running? If not, then I think the current code is OK, as there shouldn't be anything happening to race with it.
-- Steve
On 08/10, Steven Rostedt wrote:
> On Fri, 10 Aug 2018 13:35:49 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > On 08/09, Steven Rostedt wrote:
> > > --- a/kernel/trace/trace_uprobe.c
> > > +++ b/kernel/trace/trace_uprobe.c
> > > @@ -952,7 +952,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
> > > 
> > >  		list_del_rcu(&link->list);
> > >  		/* synchronize with u{,ret}probe_trace_func */
> > > -		synchronize_sched();
> > > +		synchronize_rcu();
> > 
> > Can't we change uprobe_trace_func() and uretprobe_trace_func() to use
> > rcu_read_lock_sched() instead? It is cheaper.
> 
> Is it? rcu_read_lock_sched() is a preempt_disable(),

which is just raw_cpu_inc()

> whereas rcu_read_lock() may be just a per-task counter increment.

and __rcu_read_unlock() is heavier.

OK, I agree, this doesn't really matter.
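For reference, on a CONFIG_PREEMPT kernel of that era the two read-side flavors reduce to roughly the following. This is a loose paraphrase of rcupdate.h / tree_plugin.h with debug and lockdep hooks dropped, not the exact source (hence the my_ prefixes):

/* sched flavor: bump the per-CPU preempt count; trivial unlock */
static inline void my_rcu_read_lock_sched(void)
{
        preempt_disable();              /* ~ one raw_cpu_inc() */
}

/* preemptible flavor: bump a per-task nesting counter... */
static inline void my_rcu_read_lock(void)
{
        current->rcu_read_lock_nesting++;
}

/* ...but the unlock may take a slow path to report a quiescent state
 * if the reader was preempted: the "heavier" part mentioned above */
static inline void my_rcu_read_unlock(void)
{
        if (--current->rcu_read_lock_nesting == 0 &&
            unlikely(READ_ONCE(current->rcu_read_unlock_special.s)))
                rcu_read_unlock_special(current);
}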
> > Hmm. probe_event_enable() does list_del + kfree on failure; this doesn't
> > look right... Not only because kfree() can race with
> > list_for_each_entry_rcu(), but also because we should not put the 1st link
> > on the list until uprobe_buffer_enable() succeeds.
> > 
> > Does the patch below make sense, or am I confused?
> 
> I guess the question is: if it isn't enabled, are there any users, or even
> past users, still running?
Note that uprobe_register() is not "atomic".
To simplify, suppose we have 2 tasks T1 and T2 running the probed binary. So we are going to do install_breakpoint(T1->mm) + install_breakpoint(T2->mm). If the 2nd install_breakpoint() fails for any reason, _register() will do remove_breakpoint(T1->mm) and return the error.
However, T1 can hit this bp right after install_breakpoint(T1->mm), so it can call uprobe_trace_func() before list_del(&link->list).
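To make the window concrete, the registration loop behaves roughly like this. This is schematic: the real install_breakpoint()/remove_breakpoint() in kernel/events/uprobes.c take more arguments and run per-vma, and t1/t2 stand for the two tasks above:

        /* register_for_each_vma(), schematically: breakpoints go in one
         * mm at a time, so the probe is live for t1 before t2 is done. */
        err = install_breakpoint(uprobe, t1->mm);       /* t1 can hit the bp
                                                         * and run
                                                         * uprobe_trace_func()
                                                         * from here on */
        if (!err)
                err = install_breakpoint(uprobe, t2->mm);
        if (err) {
                remove_breakpoint(uprobe, t1->mm);      /* roll back... */
                return err;     /* ...and the caller then does list_del() +
                                 * kfree() with no grace period, racing with
                                 * t1's possible reader */
        }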
OK, even if I am right this is mostly theoretical.
Oleg.
[ Removing Jovi as his email is bouncing ]
On Fri, 10 Aug 2018 15:36:08 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
> > > Can't we change uprobe_trace_func() and uretprobe_trace_func() to use
> > > rcu_read_lock_sched() instead? It is cheaper.
> > 
> > Is it? rcu_read_lock_sched() is a preempt_disable(),
> 
> which is just raw_cpu_inc()
> 
> > whereas rcu_read_lock() may be just a per-task counter increment.
> 
> and __rcu_read_unlock() is heavier.
> 
> OK, I agree, this doesn't really matter.
Are you OK with this patch? I have it tagged for stable and plan on pushing it with the patches for the upcoming merge window. If you are OK, mind giving me an Acked-by or Reviewed-by?
> > > Hmm. probe_event_enable() does list_del + kfree on failure; this doesn't
> > > look right... Not only because kfree() can race with
> > > list_for_each_entry_rcu(), but also because we should not put the 1st
> > > link on the list until uprobe_buffer_enable() succeeds.
> > > 
> > > Does the patch below make sense, or am I confused?
> > 
> > I guess the question is: if it isn't enabled, are there any users, or even
> > past users, still running?
> 
> Note that uprobe_register() is not "atomic".
> 
> To simplify, suppose we have 2 tasks T1 and T2 running the probed binary.
> So we are going to do install_breakpoint(T1->mm) +
> install_breakpoint(T2->mm). If the 2nd install_breakpoint() fails for any
> reason, _register() will do remove_breakpoint(T1->mm) and return the error.
> 
> However, T1 can hit this bp right after install_breakpoint(T1->mm), so it
> can call uprobe_trace_func() before list_del(&link->list).
> 
> OK, even if I am right this is mostly theoretical.
Even if it is theoretical, we should make sure it can't happen. But this is unrelated to the current patch, and if we should fix this, then it can be a separate patch. I don't think your change hurts, and even if it can't technically happen, it may let us sleep better at night. Want to send a formal patch to make this change?
-- Steve
On 08/10, Steven Rostedt wrote:
> Are you OK with this patch? I have it tagged for stable and plan on pushing
> it with the patches for the upcoming merge window. If you are OK, mind
> giving me an Acked-by or Reviewed-by?
Yes, thanks, feel free to add
Acked-by: Oleg Nesterov <oleg@redhat.com>
> Even if it is theoretical, we should make sure it can't happen. But this is
> unrelated to the current patch,
agreed.
> and if we should fix this, then it can be a separate patch. I don't think
> your change hurts, and even if it can't technically happen, it may let us
> sleep better at night. Want to send a formal patch to make this change?
OK, will do.
Oleg.