Fix a double free bug of event_trigger_data caused by calling unregister_trigger() from register_snapshot_trigger(). This kicks a kernel BUG if double free checker is enabled as below;
kernel BUG at /home/mhiramat/ksrc/linux/mm/slub.c:296! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 4312 Comm: ftracetest Not tainted 4.18.0-rc1+ #44 Hardware name: ASUS All Series/B85M-G, BIOS 2108 08/11/2014 RIP: 0010:set_freepointer.part.37+0x0/0x10 Code: 41 b8 01 00 00 00 29 c8 4d 8d 0c 0c b9 10 00 00 00 50 e8 e3 28 23 00 8b 53 08 5e 5f 89 d1 81 e1 00 04 00 00 e9 e9 fe ff ff 90 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c6 90 7f 0d RSP: 0018:ffffa799caa3bd90 EFLAGS: 00010246 RAX: ffff9b825f8c8e80 RBX: ffff9b825f8c8e80 RCX: ffff9b825f8c8e80 RDX: 0000000000021562 RSI: ffff9b830e9e70e0 RDI: 0000000000000202 RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b830e0072c0 R13: ffffeb8e0d7e3200 R14: ffffffff961db7af R15: 00000000fffffffe FS: 00007f135ba9f700(0000) GS:ffff9b830e800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563736b5f3a2 CR3: 0000000295916005 CR4: 00000000001606e0 Call Trace: kfree+0x35d/0x380 event_trigger_callback+0x13f/0x1c0 event_trigger_write+0xf2/0x1a0 ? lock_acquire+0x9f/0x200 __vfs_write+0x26/0x170 ? rcu_read_lock_sched_held+0x6b/0x80 ? rcu_sync_lockdep_assert+0x2e/0x60 ? __sb_start_write+0x13e/0x1a0 ? vfs_write+0x18a/0x1b0 vfs_write+0xc1/0x1b0 ksys_write+0x45/0xa0 do_syscall_64+0x60/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe
unregister_trigger() will free given event_trigger_data at last. But that event_trigger_data will be freed again in event_trigger_callback() if register_snapshot_trigger() is failed, and causes a double free bug.
Registering the data should be the final operation in the register function on normal path, because the trigger must be ready for taking action right after it is registered.
Fixes: commit 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Cc: Steven Rostedt rostedt@goodmis.org Cc: Ingo Molnar mingo@redhat.com Cc: Tom Zanussi tom.zanussi@linux.intel.com Cc: stable@vger.kernel.org --- kernel/trace/trace.c | 5 +++++ kernel/trace/trace.h | 2 ++ kernel/trace/trace_events_trigger.c | 10 ++++++---- 3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index f054bd6a1c66..2556d8c097d2 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -980,6 +980,11 @@ static void free_snapshot(struct trace_array *tr) tr->allocated_snapshot = false; }
+void tracing_free_snapshot_instance(struct trace_array *tr) +{ + free_snapshot(tr); +} + /** * tracing_alloc_snapshot - allocate snapshot buffer. * diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index f8f86231ad90..03468bb8a79a 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1823,12 +1823,14 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len) #ifdef CONFIG_TRACER_SNAPSHOT void tracing_snapshot_instance(struct trace_array *tr); int tracing_alloc_snapshot_instance(struct trace_array *tr); +void tracing_free_snapshot_instance(struct trace_array *tr); #else static inline void tracing_snapshot_instance(struct trace_array *tr) { } static inline int tracing_alloc_snapshot_instance(struct trace_array *tr) { return 0; } +static inline void tracing_free_snapshot_instance(struct trace_array *tr) { } #endif
extern struct trace_iterator *tracepoint_print_iter; diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index d18249683682..40e2f4406b2c 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -1079,11 +1079,13 @@ register_snapshot_trigger(char *glob, struct event_trigger_ops *ops, struct event_trigger_data *data, struct trace_event_file *file) { - int ret = register_trigger(glob, ops, data, file); + int free_if_fail = !file->tr->allocated_snapshot; + int ret = 0;
- if (ret > 0 && tracing_alloc_snapshot_instance(file->tr) != 0) { - unregister_trigger(glob, ops, data, file); - ret = 0; + if (!tracing_alloc_snapshot_instance(file->tr)) { + ret = register_trigger(glob, ops, data, file); + if (ret == 0 && free_if_fail) + tracing_free_snapshot_instance(file->tr); }
return ret;
On Sat, 14 Jul 2018 01:27:47 +0900 Masami Hiramatsu mhiramat@kernel.org wrote:
Fix a double free bug of event_trigger_data caused by calling unregister_trigger() from register_snapshot_trigger(). This kicks a kernel BUG if double free checker is enabled as below;
kernel BUG at /home/mhiramat/ksrc/linux/mm/slub.c:296! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 4312 Comm: ftracetest Not tainted 4.18.0-rc1+ #44 Hardware name: ASUS All Series/B85M-G, BIOS 2108 08/11/2014 RIP: 0010:set_freepointer.part.37+0x0/0x10 Code: 41 b8 01 00 00 00 29 c8 4d 8d 0c 0c b9 10 00 00 00 50 e8 e3 28 23 00 8b 53 08 5e 5f 89 d1 81 e1 00 04 00 00 e9 e9 fe ff ff 90 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c6 90 7f 0d RSP: 0018:ffffa799caa3bd90 EFLAGS: 00010246 RAX: ffff9b825f8c8e80 RBX: ffff9b825f8c8e80 RCX: ffff9b825f8c8e80 RDX: 0000000000021562 RSI: ffff9b830e9e70e0 RDI: 0000000000000202 RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b830e0072c0 R13: ffffeb8e0d7e3200 R14: ffffffff961db7af R15: 00000000fffffffe FS: 00007f135ba9f700(0000) GS:ffff9b830e800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563736b5f3a2 CR3: 0000000295916005 CR4: 00000000001606e0 Call Trace: kfree+0x35d/0x380 event_trigger_callback+0x13f/0x1c0 event_trigger_write+0xf2/0x1a0 ? lock_acquire+0x9f/0x200 __vfs_write+0x26/0x170 ? rcu_read_lock_sched_held+0x6b/0x80 ? rcu_sync_lockdep_assert+0x2e/0x60 ? __sb_start_write+0x13e/0x1a0 ? vfs_write+0x18a/0x1b0 vfs_write+0xc1/0x1b0 ksys_write+0x45/0xa0 do_syscall_64+0x60/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe
unregister_trigger() will free given event_trigger_data at last. But that event_trigger_data will be freed again in event_trigger_callback() if register_snapshot_trigger() is failed, and causes a double free bug.
Registering the data should be the final operation in the register function on normal path, because the trigger must be ready for taking action right after it is registered.
Nice catch. I can reproduce this. But this patch is fixing the symptom and not the disease.
Fixes: commit 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Cc: Steven Rostedt rostedt@goodmis.org Cc: Ingo Molnar mingo@redhat.com Cc: Tom Zanussi tom.zanussi@linux.intel.com Cc: stable@vger.kernel.org
kernel/trace/trace.c | 5 +++++ kernel/trace/trace.h | 2 ++ kernel/trace/trace_events_trigger.c | 10 ++++++---- 3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index f054bd6a1c66..2556d8c097d2 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -980,6 +980,11 @@ static void free_snapshot(struct trace_array *tr) tr->allocated_snapshot = false; } +void tracing_free_snapshot_instance(struct trace_array *tr) +{
- free_snapshot(tr);
+}
/**
- tracing_alloc_snapshot - allocate snapshot buffer.
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index f8f86231ad90..03468bb8a79a 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1823,12 +1823,14 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len) #ifdef CONFIG_TRACER_SNAPSHOT void tracing_snapshot_instance(struct trace_array *tr); int tracing_alloc_snapshot_instance(struct trace_array *tr); +void tracing_free_snapshot_instance(struct trace_array *tr); #else static inline void tracing_snapshot_instance(struct trace_array *tr) { } static inline int tracing_alloc_snapshot_instance(struct trace_array *tr) { return 0; } +static inline void tracing_free_snapshot_instance(struct trace_array *tr) { } #endif extern struct trace_iterator *tracepoint_print_iter; diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index d18249683682..40e2f4406b2c 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -1079,11 +1079,13 @@ register_snapshot_trigger(char *glob, struct event_trigger_ops *ops, struct event_trigger_data *data, struct trace_event_file *file) {
- int ret = register_trigger(glob, ops, data, file);
- int free_if_fail = !file->tr->allocated_snapshot;
- int ret = 0;
- if (ret > 0 && tracing_alloc_snapshot_instance(file->tr) != 0) {
unregister_trigger(glob, ops, data, file);
ret = 0;
- if (!tracing_alloc_snapshot_instance(file->tr)) {
ret = register_trigger(glob, ops, data, file);
if (ret == 0 && free_if_fail)
}tracing_free_snapshot_instance(file->tr);
return ret;
Really, when we register_trigger() we should be able to freely call unresgister_trigger() with no side effects like the above happens. Instead of doing the above, I believe this is a better approach:
Thoughts?
P.S. This brings up another minor bug. The failure should return ENOMEM not ENOENT.
-- Steve
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index d18249683682..d15a746bcdfb 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -679,6 +679,7 @@ event_trigger_callback(struct event_command *cmd_ops, goto out_free;
out_reg: + event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, @@ -687,9 +688,9 @@ event_trigger_callback(struct event_command *cmd_ops, */ if (!ret) { ret = -ENOENT; - goto out_free; + goto out_free_trigger; } else if (ret < 0) - goto out_free; + goto out_free_trigger; ret = 0; out: return ret; @@ -699,6 +700,10 @@ event_trigger_callback(struct event_command *cmd_ops, cmd_ops->set_filter(NULL, trigger_data, NULL); kfree(trigger_data); goto out; + + out_free_trigger: + event_trigger_free(trigger_ops, trigger_data); + goto out; }
/**
On Mon, 23 Jul 2018 22:10:06 -0400 Steven Rostedt rostedt@goodmis.org wrote:
On Sat, 14 Jul 2018 01:27:47 +0900 Masami Hiramatsu mhiramat@kernel.org wrote:
Fix a double free bug of event_trigger_data caused by calling unregister_trigger() from register_snapshot_trigger(). This kicks a kernel BUG if double free checker is enabled as below;
kernel BUG at /home/mhiramat/ksrc/linux/mm/slub.c:296! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 4312 Comm: ftracetest Not tainted 4.18.0-rc1+ #44 Hardware name: ASUS All Series/B85M-G, BIOS 2108 08/11/2014 RIP: 0010:set_freepointer.part.37+0x0/0x10 Code: 41 b8 01 00 00 00 29 c8 4d 8d 0c 0c b9 10 00 00 00 50 e8 e3 28 23 00 8b 53 08 5e 5f 89 d1 81 e1 00 04 00 00 e9 e9 fe ff ff 90 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c6 90 7f 0d RSP: 0018:ffffa799caa3bd90 EFLAGS: 00010246 RAX: ffff9b825f8c8e80 RBX: ffff9b825f8c8e80 RCX: ffff9b825f8c8e80 RDX: 0000000000021562 RSI: ffff9b830e9e70e0 RDI: 0000000000000202 RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b830e0072c0 R13: ffffeb8e0d7e3200 R14: ffffffff961db7af R15: 00000000fffffffe FS: 00007f135ba9f700(0000) GS:ffff9b830e800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563736b5f3a2 CR3: 0000000295916005 CR4: 00000000001606e0 Call Trace: kfree+0x35d/0x380 event_trigger_callback+0x13f/0x1c0 event_trigger_write+0xf2/0x1a0 ? lock_acquire+0x9f/0x200 __vfs_write+0x26/0x170 ? rcu_read_lock_sched_held+0x6b/0x80 ? rcu_sync_lockdep_assert+0x2e/0x60 ? __sb_start_write+0x13e/0x1a0 ? vfs_write+0x18a/0x1b0 vfs_write+0xc1/0x1b0 ksys_write+0x45/0xa0 do_syscall_64+0x60/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe
unregister_trigger() will free given event_trigger_data at last. But that event_trigger_data will be freed again in event_trigger_callback() if register_snapshot_trigger() is failed, and causes a double free bug.
Registering the data should be the final operation in the register function on normal path, because the trigger must be ready for taking action right after it is registered.
Nice catch. I can reproduce this. But this patch is fixing the symptom and not the disease.
Fixes: commit 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Cc: Steven Rostedt rostedt@goodmis.org Cc: Ingo Molnar mingo@redhat.com Cc: Tom Zanussi tom.zanussi@linux.intel.com Cc: stable@vger.kernel.org
kernel/trace/trace.c | 5 +++++ kernel/trace/trace.h | 2 ++ kernel/trace/trace_events_trigger.c | 10 ++++++---- 3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index f054bd6a1c66..2556d8c097d2 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -980,6 +980,11 @@ static void free_snapshot(struct trace_array *tr) tr->allocated_snapshot = false; } +void tracing_free_snapshot_instance(struct trace_array *tr) +{
- free_snapshot(tr);
+}
/**
- tracing_alloc_snapshot - allocate snapshot buffer.
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index f8f86231ad90..03468bb8a79a 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1823,12 +1823,14 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len) #ifdef CONFIG_TRACER_SNAPSHOT void tracing_snapshot_instance(struct trace_array *tr); int tracing_alloc_snapshot_instance(struct trace_array *tr); +void tracing_free_snapshot_instance(struct trace_array *tr); #else static inline void tracing_snapshot_instance(struct trace_array *tr) { } static inline int tracing_alloc_snapshot_instance(struct trace_array *tr) { return 0; } +static inline void tracing_free_snapshot_instance(struct trace_array *tr) { } #endif extern struct trace_iterator *tracepoint_print_iter; diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index d18249683682..40e2f4406b2c 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -1079,11 +1079,13 @@ register_snapshot_trigger(char *glob, struct event_trigger_ops *ops, struct event_trigger_data *data, struct trace_event_file *file) {
- int ret = register_trigger(glob, ops, data, file);
- int free_if_fail = !file->tr->allocated_snapshot;
- int ret = 0;
- if (ret > 0 && tracing_alloc_snapshot_instance(file->tr) != 0) {
unregister_trigger(glob, ops, data, file);
ret = 0;
- if (!tracing_alloc_snapshot_instance(file->tr)) {
ret = register_trigger(glob, ops, data, file);
if (ret == 0 && free_if_fail)
}tracing_free_snapshot_instance(file->tr);
return ret;
Really, when we register_trigger() we should be able to freely call unresgister_trigger() with no side effects like the above happens. Instead of doing the above, I believe this is a better approach:
Thoughts?
Hmm, your patch seems to leak a memory since event_trigger_init() will be called twice on same trigger_data (Note that event_trigger_init() does not init ref counter, but increment it.) So we should decrement it when we find it is succeeded. Moreover, if register_trigger() fails before calling data->ops->init() (see -EEXIST case), the ref counter will be 0 (-1 +1). But if it fails after data->ops->init(), the ref counter will be 1 (-1 +1 +1). It still be unstable. (Ah, that means we may have another trouble...)
P.S. This brings up another minor bug. The failure should return ENOMEM not ENOENT.
Hmm it seems we should review the register_trigger() implementation. It should return the return value of trace_event_trigger_enable_disable(), shouldn't it?
Thank you,
-- Steve
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index d18249683682..d15a746bcdfb 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -679,6 +679,7 @@ event_trigger_callback(struct event_command *cmd_ops, goto out_free; out_reg:
- event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /*
- The above returns on success the # of functions enabled,
@@ -687,9 +688,9 @@ event_trigger_callback(struct event_command *cmd_ops, */ if (!ret) { ret = -ENOENT;
goto out_free;
} else if (ret < 0)goto out_free_trigger;
goto out_free;
ret = 0; out: return ret;goto out_free_trigger;
@@ -699,6 +700,10 @@ event_trigger_callback(struct event_command *cmd_ops, cmd_ops->set_filter(NULL, trigger_data, NULL); kfree(trigger_data); goto out;
- out_free_trigger:
- event_trigger_free(trigger_ops, trigger_data);
- goto out;
} /**
On Wed, 25 Jul 2018 00:09:09 +0900 Masami Hiramatsu mhiramat@kernel.org wrote:
Hmm, your patch seems to leak a memory since event_trigger_init() will be called twice on same trigger_data (Note that event_trigger_init() does not init ref counter, but increment it.) So we should decrement it when we find it is succeeded. Moreover, if register_trigger()
Good catch, and easily fixed.
fails before calling data->ops->init() (see -EEXIST case), the ref counter will be 0 (-1 +1). But if it fails after data->ops->init(), the ref counter will be 1 (-1 +1 +1). It still be unstable. (Ah, that means we may have another trouble...)
I'm not sure there's a problem here. I now have:
out_reg: /* Up the trigger_data count to make sure reg doesn't free it on failuer */ event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, * but if it didn't find any functions it returns zero. * Consider no functions a failure too. */ if (!ret) { ret = -ENOENT; } else if (ret > 0) ret = 0;
/* Down the counter of trigger_data or free it if not used anymore */ event_trigger_free(trigger_ops, trigger_data); out: return ret;
Thus we increment trigger_data before calling reg, and free it afterward. But if reg() did an init too, then the event_trigger_free() just decs the ref counter.
As for register_trigger()
P.S. This brings up another minor bug. The failure should return ENOMEM not ENOENT.
Hmm it seems we should review the register_trigger() implementation. It should return the return value of trace_event_trigger_enable_disable(), shouldn't it?
Yeah, that's not done well. I'll fix it up.
Thanks for pointing it out.
-- Steve
On Tue, 24 Jul 2018 16:49:59 -0400 Steven Rostedt rostedt@goodmis.org wrote:
Hmm it seems we should review the register_trigger() implementation. It should return the return value of trace_event_trigger_enable_disable(), shouldn't it?
Yeah, that's not done well. I'll fix it up.
Thanks for pointing it out.
Tom,
register_trigger() is messed up. I should have caught this when it was first submitted, but I'm totally confused. The comments don't match the code.
First we have this:
ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, * but if it didn't find any functions it returns zero. * Consider no functions a failure too. */
Which looks to be total BS.
As we have this:
/** * register_trigger - Generic event_command @reg implementation * @glob: The raw string used to register the trigger * @ops: The trigger ops associated with the trigger * @data: Trigger-specific data to associate with the trigger * @file: The trace_event_file associated with the event * * Common implementation for event trigger registration. * * Usually used directly as the @reg method in event command * implementations. * * Return: 0 on success, errno otherwise */ static int register_trigger(char *glob, struct event_trigger_ops *ops, struct event_trigger_data *data, struct trace_event_file *file) { struct event_trigger_data *test; int ret = 0;
list_for_each_entry_rcu(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) { ret = -EEXIST; goto out; } }
if (data->ops->init) { ret = data->ops->init(data->ops, data); if (ret < 0) goto out; }
list_add_rcu(&data->list, &file->triggers); ret++;
update_cond_flag(file); if (trace_event_trigger_enable_disable(file, 1) < 0) { list_del_rcu(&data->list); update_cond_flag(file); ret--; } out: return ret; }
Where the comment is total wrong. It doesn't return 0 on success, it returns 1. And if trace_event_trigger_enable_disable() fails it returns zero.
And that can fail with the call->class->reg() return value, which could fail for various strange reasons. I don't know why we would want to return 0 when it fails?
I don't see where ->reg() would return anything but 1 on success. Maybe I'm missing something. I'll look some more, but I'm thinking of changing ->reg() to return zero on all success, and negative on all errors and just check those results.
-- Steve
On Tue, 24 Jul 2018 17:30:08 -0400 Steven Rostedt rostedt@goodmis.org wrote:
I don't see where ->reg() would return anything but 1 on success. Maybe I'm missing something. I'll look some more, but I'm thinking of changing ->reg() to return zero on all success, and negative on all errors and just check those results.
Yeah, I'm going to make these changes. That is, all struct event_command .reg functions will return negative on error, and zero or positive on everything else. All the checks are going to be only for the negative value.
But for the bug fix itself, I believe this should do it. Masami, what do you think?
-- Steve
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index d18249683682..8771a27ca60f 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -679,6 +679,8 @@ event_trigger_callback(struct event_command *cmd_ops, goto out_free;
out_reg: + /* Up the trigger_data count to make sure reg doesn't free it on failure */ + event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, @@ -687,10 +689,11 @@ event_trigger_callback(struct event_command *cmd_ops, */ if (!ret) { ret = -ENOENT; - goto out_free; - } else if (ret < 0) - goto out_free; - ret = 0; + } else if (ret > 0) + ret = 0; + + /* Down the counter of trigger_data or free it if not used anymore */ + event_trigger_free(trigger_ops, trigger_data); out: return ret;
On Tue, 24 Jul 2018 19:13:31 -0400 Steven Rostedt rostedt@goodmis.org wrote:
On Tue, 24 Jul 2018 17:30:08 -0400 Steven Rostedt rostedt@goodmis.org wrote:
I don't see where ->reg() would return anything but 1 on success. Maybe I'm missing something. I'll look some more, but I'm thinking of changing ->reg() to return zero on all success, and negative on all errors and just check those results.
Yeah, I'm going to make these changes. That is, all struct event_command .reg functions will return negative on error, and zero or positive on everything else. All the checks are going to be only for the negative value.
But for the bug fix itself, I believe this should do it. Masami, what do you think?
Hm, as far as I can see, when register_trigger() returns >= 0, it already calls ->init the trigger_data. This means its refcount++, and in that case, below patch will miss to free the trigger_data. How about below for tentative fix?
if (!ret) { ret = -ENOENT; /* We have to forcibly free the trigger_data */ goto out_free; } else if (ret > 0) ret = 0;
Thank you,
-- Steve
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index d18249683682..8771a27ca60f 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -679,6 +679,8 @@ event_trigger_callback(struct event_command *cmd_ops, goto out_free; out_reg:
- /* Up the trigger_data count to make sure reg doesn't free it on failure */
- event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /*
- The above returns on success the # of functions enabled,
@@ -687,10 +689,11 @@ event_trigger_callback(struct event_command *cmd_ops, */ if (!ret) { ret = -ENOENT;
goto out_free;
- } else if (ret < 0)
goto out_free;
- ret = 0;
- } else if (ret > 0)
ret = 0;
- /* Down the counter of trigger_data or free it if not used anymore */
- event_trigger_free(trigger_ops, trigger_data); out: return ret;
On Wed, 25 Jul 2018 10:16:53 +0900 Masami Hiramatsu mhiramat@kernel.org wrote:
Hm, as far as I can see, when register_trigger() returns >= 0, it already calls ->init the trigger_data. This means its refcount++, and in that case, below patch will miss to free the trigger_data. How about below for tentative fix?
if (!ret) { ret = -ENOENT; /* We have to forcibly free the trigger_data */ goto out_free; } else if (ret > 0) ret = 0;
Or better yet, match it properly:
out_reg: /* Up the trigger_data count to make sure reg doesn't free it on failure */ event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, * but if it didn't find any functions it returns zero. * Consider no functions a failure too. */ if (!ret) { cmd_ops->unreg(glob, trigger_ops, trigger_data, file); ret = -ENOENT; } else if (ret > 0) ret = 0;
/* Down the counter of trigger_data or free it if not used anymore */ event_trigger_free(trigger_ops, trigger_data); out: return ret;
-- Steve
On Tue, 24 Jul 2018 22:41:49 -0400 Steven Rostedt rostedt@goodmis.org wrote:
On Wed, 25 Jul 2018 10:16:53 +0900 Masami Hiramatsu mhiramat@kernel.org wrote:
Hm, as far as I can see, when register_trigger() returns >= 0, it already calls ->init the trigger_data. This means its refcount++, and in that case, below patch will miss to free the trigger_data. How about below for tentative fix?
if (!ret) { ret = -ENOENT; /* We have to forcibly free the trigger_data */ goto out_free; } else if (ret > 0) ret = 0;
Or better yet, match it properly:
OK, this looks good to me :)
Reviewed-by: Masami Hiramatsu mhiramat@kernel.org
Thanks!
out_reg: /* Up the trigger_data count to make sure reg doesn't free it on failure */ event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, * but if it didn't find any functions it returns zero. * Consider no functions a failure too. */ if (!ret) { cmd_ops->unreg(glob, trigger_ops, trigger_data, file); ret = -ENOENT; } else if (ret > 0) ret = 0;
/* Down the counter of trigger_data or free it if not used anymore */ event_trigger_free(trigger_ops, trigger_data); out: return ret;
-- Steve
Hi Steve,
On 7/24/2018 4:30 PM, Steven Rostedt wrote:
On Tue, 24 Jul 2018 16:49:59 -0400 Steven Rostedt rostedt@goodmis.org wrote:
Hmm it seems we should review the register_trigger() implementation. It should return the return value of trace_event_trigger_enable_disable(), shouldn't it?
Yeah, that's not done well. I'll fix it up.
Thanks for pointing it out.
Tom,
register_trigger() is messed up. I should have caught this when it was first submitted, but I'm totally confused. The comments don't match the code.
First we have this:
ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, * but if it didn't find any functions it returns zero. * Consider no functions a failure too. */
Which looks to be total BS.
Yes, it is BS in the case of event triggers. This was taken from the ftrace function trigger code, where it does make sense. I think I left it in thinking the code would at some point later converge.
As we have this:
/**
- register_trigger - Generic event_command @reg implementation
- @glob: The raw string used to register the trigger
- @ops: The trigger ops associated with the trigger
- @data: Trigger-specific data to associate with the trigger
- @file: The trace_event_file associated with the event
- Common implementation for event trigger registration.
- Usually used directly as the @reg method in event command
- implementations.
- Return: 0 on success, errno otherwise
And this is how it should work.
*/ static int register_trigger(char *glob, struct event_trigger_ops *ops, struct event_trigger_data *data, struct trace_event_file *file) { struct event_trigger_data *test; int ret = 0;
list_for_each_entry_rcu(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) { ret = -EEXIST; goto out; } }
if (data->ops->init) { ret = data->ops->init(data->ops, data); if (ret < 0) goto out; }
list_add_rcu(&data->list, &file->triggers); ret++;
update_cond_flag(file); if (trace_event_trigger_enable_disable(file, 1) < 0) { list_del_rcu(&data->list); update_cond_flag(file); ret--; } out: return ret; }
Where the comment is total wrong. It doesn't return 0 on success, it returns 1. And if trace_event_trigger_enable_disable() fails it returns zero.
And that can fail with the call->class->reg() return value, which could fail for various strange reasons. I don't know why we would want to return 0 when it fails?
I don't see where ->reg() would return anything but 1 on success. Maybe I'm missing something. I'll look some more, but I'm thinking of changing ->reg() to return zero on all success, and negative on all errors and just check those results.
Right, in the case of event triggers, we only register one at a time, whereas with the trace function triggers, with globbing multiple functions can register triggers at the same time, so it makes sense there to have reg() return a count and the more convoluted error handling.
So I agree, simplifying things here by using the standard error handling would be an improvement.
Tom
-- Steve
On Wed, 25 Jul 2018 11:01:22 -0500 Tom Zanussi tom.zanussi@linux.intel.com wrote:
First we have this:
ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, * but if it didn't find any functions it returns zero. * Consider no functions a failure too. */
Which looks to be total BS.
Yes, it is BS in the case of event triggers. This was taken from the ftrace function trigger code, where it does make sense. I think I left it in thinking the code would at some point later converge.
OK, that makes a little more sense.
As we have this:
/**
- register_trigger - Generic event_command @reg implementation
- @glob: The raw string used to register the trigger
- @ops: The trigger ops associated with the trigger
- @data: Trigger-specific data to associate with the trigger
- @file: The trace_event_file associated with the event
- Common implementation for event trigger registration.
- Usually used directly as the @reg method in event command
- implementations.
- Return: 0 on success, errno otherwise
And this is how it should work.
*/ static int register_trigger(char *glob, struct event_trigger_ops *ops, struct event_trigger_data *data, struct trace_event_file *file) { struct event_trigger_data *test; int ret = 0;
list_for_each_entry_rcu(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) { ret = -EEXIST; goto out; } }
if (data->ops->init) { ret = data->ops->init(data->ops, data); if (ret < 0) goto out; }
list_add_rcu(&data->list, &file->triggers); ret++;
update_cond_flag(file); if (trace_event_trigger_enable_disable(file, 1) < 0) { list_del_rcu(&data->list); update_cond_flag(file); ret--; } out: return ret; }
Where the comment is total wrong. It doesn't return 0 on success, it returns 1. And if trace_event_trigger_enable_disable() fails it returns zero.
And that can fail with the call->class->reg() return value, which could fail for various strange reasons. I don't know why we would want to return 0 when it fails?
I don't see where ->reg() would return anything but 1 on success. Maybe I'm missing something. I'll look some more, but I'm thinking of changing ->reg() to return zero on all success, and negative on all errors and just check those results.
Right, in the case of event triggers, we only register one at a time, whereas with the trace function triggers, with globbing multiple functions can register triggers at the same time, so it makes sense there to have reg() return a count and the more convoluted error handling.
OK, reg in function probes will be handled differently.
So I agree, simplifying things here by using the standard error handling would be an improvement.
I'll start working on something for 4.19 to simplify it.
Thanks for confirming!
-- Steve
On Tue, 24 Jul 2018 16:49:59 -0400 Steven Rostedt rostedt@goodmis.org wrote:
On Wed, 25 Jul 2018 00:09:09 +0900 Masami Hiramatsu mhiramat@kernel.org wrote:
Hmm, your patch seems to leak a memory since event_trigger_init() will be called twice on same trigger_data (Note that event_trigger_init() does not init ref counter, but increment it.) So we should decrement it when we find it is succeeded. Moreover, if register_trigger()
Good catch, and easily fixed.
fails before calling data->ops->init() (see -EEXIST case), the ref counter will be 0 (-1 +1). But if it fails after data->ops->init(), the ref counter will be 1 (-1 +1 +1). It still be unstable. (Ah, that means we may have another trouble...)
I'm not sure there's a problem here. I now have:
out_reg: /* Up the trigger_data count to make sure reg doesn't free it on failuer */ event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, * but if it didn't find any functions it returns zero. * Consider no functions a failure too. */ if (!ret) { ret = -ENOENT; } else if (ret > 0) ret = 0;
Can we mixed up ret == 0 and ret > 0? It seems cmd_ops->reg() == 0 is a failure case.
/* Down the counter of trigger_data or free it if not used anymore */ event_trigger_free(trigger_ops, trigger_data); out: return ret;
Thus we increment trigger_data before calling reg, and free it afterward. But if reg() did an init too, then the event_trigger_free() just decs the ref counter.
To avoid confusion, I would like to suggest to rename those pair to event_trigger_data_get/put(). :)
As for register_trigger()
P.S. This brings up another minor bug. The failure should return ENOMEM not ENOENT.
Hmm it seems we should review the register_trigger() implementation. It should return the return value of trace_event_trigger_enable_disable(), shouldn't it?
Yeah, that's not done well. I'll fix it up.
Thanks!
Thanks for pointing it out.
-- Steve
linux-stable-mirror@lists.linaro.org