Hi all,
In v4: - A few cosmetic changes in func_set_flag(), as suggested by Steven Rostedt. Firstly, in the patch that adds pstore option, I use 'else if' for the new bit (it keeps changes minimal), but then there is a new separate patch (the last in the series) that converts the whole thing into a switch construction.
In v3: - Make traces versioned, as suggested by Steven, Tony and Colin. (The version tag is stored in the PRZ signature, see the last patch for the implementation details). - Add Steven's Ack on the first patch.
In v2: - Do not introduce a separate 'persistent' tracer, but introduce an option to the existing 'function' tracer.
Rationale for this patch set:
With this support kernel can save functions call chain log into a persistent ram buffer that can be decoded and dumped after reboot through pstore filesystem. It can be used to determine what function was last called before a hang or an unexpected reset (caused by, for example, a buggy driver that abuses HW).
Here's a "nano howto", to get the idea:
# mount -t debugfs debugfs /sys/kernel/debug/ # cd /sys/kernel/debug/tracing # echo function > current_tracer # echo 1 > options/func_pstore # reboot -f [...] # mount -t pstore pstore /mnt/ # tail /mnt/ftrace-ramoops 0 ffffffff8101ea64 ffffffff8101bcda native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0 0 ffffffff8101ea44 ffffffff8101bcf6 native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0 0 ffffffff81020084 ffffffff8101a4b5 hpet_disable <- native_machine_shutdown+0x75/0x90 0 ffffffff81005f94 ffffffff8101a4bb iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90 0 ffffffff8101a6a1 ffffffff8101a437 native_machine_emergency_restart <- native_machine_restart+0x37/0x40 0 ffffffff811f9876 ffffffff8101a73a acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0 0 ffffffff8101a514 ffffffff8101a772 mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0 0 ffffffff811d9c54 ffffffff8101a7a0 __const_udelay <- native_machine_emergency_restart+0x110/0x1e0 0 ffffffff811d9c34 ffffffff811d9c80 __delay <- __const_udelay+0x30/0x40 0 ffffffff811d9d14 ffffffff811d9c3f delay_tsc <- __delay+0xf/0x20
Mostly the code comes from trace_persistent.c driver found in the Android git tree, written by Colin Cross ccross@android.com (according to sign-off history). I reworked the driver a little bit, and ported it to pstore subsystem.
--- Documentation/ramoops.txt | 25 +++++++++ fs/pstore/Kconfig | 13 +++++ fs/pstore/Makefile | 1 + fs/pstore/ftrace.c | 35 +++++++++++++ fs/pstore/inode.c | 111 ++++++++++++++++++++++++++++++++++++++-- fs/pstore/internal.h | 43 ++++++++++++++++ fs/pstore/platform.c | 12 ++++- fs/pstore/ram.c | 65 +++++++++++++++++------ fs/pstore/ram_core.c | 12 +++-- include/linux/pstore.h | 13 +++++ include/linux/pstore_ram.h | 3 +- kernel/trace/trace.c | 7 +-- kernel/trace/trace_functions.c | 36 +++++++++---- 13 files changed, 337 insertions(+), 39 deletions(-)
If tracer->init() fails, current code will leave current_tracer pointing to an unusable tracer, which at best makes 'current_tracer' report inaccurate value.
Fix the issue by pointing current_tracer to nop tracer, and only update current_tracer with the new one after all the initialization succeeds.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org Acked-by: Steven Rostedt rostedt@goodmis.org --- kernel/trace/trace.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 814ff30..954f7d7 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -3183,10 +3183,10 @@ static int tracing_set_tracer(const char *buf) } destroy_trace_option_files(topts);
- current_trace = t; + current_trace = &nop_trace;
- topts = create_trace_option_files(current_trace); - if (current_trace->use_max_tr) { + topts = create_trace_option_files(t); + if (t->use_max_tr) { int cpu; /* we need to make per cpu buffer sizes equivalent */ for_each_tracing_cpu(cpu) { @@ -3206,6 +3206,7 @@ static int tracing_set_tracer(const char *buf) goto out; }
+ current_trace = t; trace_branch_enable(tr); out: mutex_unlock(&trace_types_lock);
For function tracing we need to stop using pstore.buf directly, since in a tracing callback we can't use spinlocks, and thus we can't safely use the global buffer.
With write_buf callback, backends no longer need to access pstore.buf directly, and thus we can pass any buffers (e.g. allocated on stack).
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org --- fs/pstore/platform.c | 10 ++++++++++ include/linux/pstore.h | 4 ++++ 2 files changed, 14 insertions(+)
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 6b3ff04..ef5ca8a 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -188,6 +188,14 @@ static void pstore_register_console(void) static void pstore_register_console(void) {} #endif
+static int pstore_write_compat(enum pstore_type_id type, + enum kmsg_dump_reason reason, + u64 *id, unsigned int part, + size_t size, struct pstore_info *psi) +{ + return psi->write_buf(type, reason, id, part, psinfo->buf, size, psi); +} + /* * platform specific persistent storage driver registers with * us here. If pstore is already mounted, call the platform @@ -212,6 +220,8 @@ int pstore_register(struct pstore_info *psi) return -EINVAL; }
+ if (!psi->write) + psi->write = pstore_write_compat; psinfo = psi; mutex_init(&psinfo->read_mutex); spin_unlock(&pstore_lock); diff --git a/include/linux/pstore.h b/include/linux/pstore.h index 1bd014b..b107484 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -48,6 +48,10 @@ struct pstore_info { int (*write)(enum pstore_type_id type, enum kmsg_dump_reason reason, u64 *id, unsigned int part, size_t size, struct pstore_info *psi); + int (*write_buf)(enum pstore_type_id type, + enum kmsg_dump_reason reason, u64 *id, + unsigned int part, const char *buf, size_t size, + struct pstore_info *psi); int (*erase)(enum pstore_type_id type, u64 id, struct pstore_info *psi); void *data;
With this support kernel can save function call chain log into a persistent ram buffer that can be decoded and dumped after reboot through pstore filesystem. It can be used to determine what function was last called before a reset or panic.
We store the log in a binary format and then decode it at read time.
p.s. Mostly the code comes from trace_persistent.c driver found in the Android git tree, written by Colin Cross ccross@android.com (according to sign-off history). I reworked the driver a little bit, and ported it to pstore.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org --- fs/pstore/Kconfig | 12 ++++++ fs/pstore/Makefile | 1 + fs/pstore/ftrace.c | 35 +++++++++++++++ fs/pstore/inode.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++-- fs/pstore/internal.h | 43 +++++++++++++++++++ fs/pstore/platform.c | 2 +- include/linux/pstore.h | 9 ++++ 7 files changed, 208 insertions(+), 5 deletions(-) create mode 100644 fs/pstore/ftrace.c
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index d044de6..d39bb5c 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -19,6 +19,18 @@ config PSTORE_CONSOLE When the option is enabled, pstore will log all kernel messages, even if no oops or panic happened.
+config PSTORE_FTRACE + bool "Persistent function tracer" + depends on PSTORE + depends on FUNCTION_TRACER + help + With this option kernel traces function calls into a persistent + ram buffer that can be decoded and dumped after reboot through + pstore filesystem. It can be used to determine what function + was last called before a reset or panic. + + If unsure, say N. + config PSTORE_RAM tristate "Log panic/oops to a RAM buffer" depends on PSTORE diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile index 278a44e..4c9095c 100644 --- a/fs/pstore/Makefile +++ b/fs/pstore/Makefile @@ -5,6 +5,7 @@ obj-y += pstore.o
pstore-objs += inode.o platform.o +obj-$(CONFIG_PSTORE_FTRACE) += ftrace.o
ramoops-objs += ram.o ram_core.o obj-$(CONFIG_PSTORE_RAM) += ramoops.o diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c new file mode 100644 index 0000000..a130d48 --- /dev/null +++ b/fs/pstore/ftrace.c @@ -0,0 +1,35 @@ +/* + * Copyright 2012 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/compiler.h> +#include <linux/irqflags.h> +#include <linux/percpu.h> +#include <linux/smp.h> +#include <linux/atomic.h> +#include <asm/barrier.h> +#include "internal.h" + +void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +{ + struct pstore_ftrace_record rec = {}; + + if (unlikely(oops_in_progress)) + return; + + rec.ip = ip; + rec.parent_ip = parent_ip; + pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id()); + psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec, + sizeof(rec), psinfo); +} diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 45bff54..4ab572e 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -27,6 +27,7 @@ #include <linux/list.h> #include <linux/string.h> #include <linux/mount.h> +#include <linux/seq_file.h> #include <linux/ramfs.h> #include <linux/parser.h> #include <linux/sched.h> @@ -52,18 +53,117 @@ struct pstore_private { char data[]; };
+struct pstore_ftrace_seq_data { + const void *ptr; + size_t off; + size_t size; +}; + +#define REC_SIZE sizeof(struct pstore_ftrace_record) + +static void *pstore_ftrace_seq_start(struct seq_file *s, loff_t *pos) +{ + struct pstore_private *ps = s->private; + struct pstore_ftrace_seq_data *data; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return NULL; + + data->off = ps->size % REC_SIZE; + data->off += *pos * REC_SIZE; + if (data->off + REC_SIZE > ps->size) { + kfree(data); + return NULL; + } + + return data; + +} + +static void pstore_ftrace_seq_stop(struct seq_file *s, void *v) +{ + kfree(v); +} + +static void *pstore_ftrace_seq_next(struct seq_file *s, void *v, loff_t *pos) +{ + struct pstore_private *ps = s->private; + struct pstore_ftrace_seq_data *data = v; + + data->off += REC_SIZE; + if (data->off + REC_SIZE > ps->size) + return NULL; + + (*pos)++; + return data; +} + +static int pstore_ftrace_seq_show(struct seq_file *s, void *v) +{ + struct pstore_private *ps = s->private; + struct pstore_ftrace_seq_data *data = v; + struct pstore_ftrace_record *rec = (void *)(ps->data + data->off); + + seq_printf(s, "%d %08lx %08lx %pf <- %pF\n", + pstore_ftrace_decode_cpu(rec), rec->ip, rec->parent_ip, + (void *)rec->ip, (void *)rec->parent_ip); + + return 0; +} + +static const struct seq_operations pstore_ftrace_seq_ops = { + .start = pstore_ftrace_seq_start, + .next = pstore_ftrace_seq_next, + .stop = pstore_ftrace_seq_stop, + .show = pstore_ftrace_seq_show, +}; + static ssize_t pstore_file_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { - struct pstore_private *ps = file->private_data; + struct seq_file *sf = file->private_data; + struct pstore_private *ps = sf->private;
+ if (ps->type == PSTORE_TYPE_FTRACE) + return seq_read(file, userbuf, count, ppos); return simple_read_from_buffer(userbuf, count, ppos, ps->data, ps->size); }
+static int pstore_file_open(struct inode *inode, struct file *file) +{ + struct pstore_private *ps = inode->i_private; + struct seq_file *sf; + int err; + const struct seq_operations *sops = NULL; + + if (ps->type == PSTORE_TYPE_FTRACE) + sops = &pstore_ftrace_seq_ops; + + err = seq_open(file, sops); + if (err < 0) + return err; + + sf = file->private_data; + sf->private = ps; + + return 0; +} + +static loff_t pstore_file_llseek(struct file *file, loff_t off, int origin) +{ + struct seq_file *sf = file->private_data; + + if (sf->op) + return seq_lseek(file, off, origin); + return default_llseek(file, off, origin); +} + static const struct file_operations pstore_file_operations = { - .open = simple_open, - .read = pstore_file_read, - .llseek = default_llseek, + .open = pstore_file_open, + .read = pstore_file_read, + .llseek = pstore_file_llseek, + .release = seq_release, };
/* @@ -215,6 +315,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, case PSTORE_TYPE_CONSOLE: sprintf(name, "console-%s", psname); break; + case PSTORE_TYPE_FTRACE: + sprintf(name, "ftrace-%s", psname); + break; case PSTORE_TYPE_MCE: sprintf(name, "mce-%s-%lld", psname, id); break; diff --git a/fs/pstore/internal.h b/fs/pstore/internal.h index 3bde461..958c48d 100644 --- a/fs/pstore/internal.h +++ b/fs/pstore/internal.h @@ -1,6 +1,49 @@ +#ifndef __PSTORE_INTERNAL_H__ +#define __PSTORE_INTERNAL_H__ + +#include <linux/pstore.h> + +#if NR_CPUS <= 2 && defined(CONFIG_ARM_THUMB) +#define PSTORE_CPU_IN_IP 0x1 +#elif NR_CPUS <= 4 && defined(CONFIG_ARM) +#define PSTORE_CPU_IN_IP 0x3 +#endif + +struct pstore_ftrace_record { + unsigned long ip; + unsigned long parent_ip; +#ifndef PSTORE_CPU_IN_IP + unsigned int cpu; +#endif +}; + +static inline void +pstore_ftrace_encode_cpu(struct pstore_ftrace_record *rec, unsigned int cpu) +{ +#ifndef PSTORE_CPU_IN_IP + rec->cpu = cpu; +#else + rec->ip |= cpu; +#endif +} + +static inline unsigned int +pstore_ftrace_decode_cpu(struct pstore_ftrace_record *rec) +{ +#ifndef PSTORE_CPU_IN_IP + return rec->cpu; +#else + return rec->ip & PSTORE_CPU_IN_IP; +#endif +} + +extern struct pstore_info *psinfo; + extern void pstore_set_kmsg_bytes(int); extern void pstore_get_records(int); extern int pstore_mkfile(enum pstore_type_id, char *psname, u64 id, char *data, size_t size, struct timespec time, struct pstore_info *psi); extern int pstore_is_mounted(void); + +#endif diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index ef5ca8a..29996e8 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -61,7 +61,7 @@ static DECLARE_WORK(pstore_work, pstore_dowork); * calls to pstore_register() */ static DEFINE_SPINLOCK(pstore_lock); -static struct pstore_info *psinfo; +struct pstore_info *psinfo;
static char *backend;
diff --git a/include/linux/pstore.h b/include/linux/pstore.h index b107484..120443b 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -30,6 +30,7 @@ enum pstore_type_id { PSTORE_TYPE_DMESG = 0, PSTORE_TYPE_MCE = 1, PSTORE_TYPE_CONSOLE = 2, + PSTORE_TYPE_FTRACE = 3, PSTORE_TYPE_UNKNOWN = 255 };
@@ -57,6 +58,14 @@ struct pstore_info { void *data; };
+ +#ifdef CONFIG_PSTORE_FTRACE +extern void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip); +#else +static inline void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +{ } +#endif + #ifdef CONFIG_PSTORE extern int pstore_register(struct pstore_info *); #else
On Mon, 2012-07-09 at 17:10 -0700, Anton Vorontsov wrote:
--- /dev/null +++ b/fs/pstore/ftrace.c @@ -0,0 +1,35 @@ +/*
- Copyright 2012 Google, Inc.
- This software is licensed under the terms of the GNU General Public
- License version 2, as published by the Free Software Foundation, and
- may be copied, distributed, and modified under those terms.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
- */
+#include <linux/kernel.h> +#include <linux/compiler.h> +#include <linux/irqflags.h> +#include <linux/percpu.h> +#include <linux/smp.h> +#include <linux/atomic.h> +#include <asm/barrier.h> +#include "internal.h"
+void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip)
BTW, you can make the entire file 'notrace' without adding annotations by including in the Makefile:
CFLAGS_REMOVE_ftrace.o = -pg
+{
- struct pstore_ftrace_record rec = {};
- if (unlikely(oops_in_progress))
return;
- rec.ip = ip;
- rec.parent_ip = parent_ip;
- pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id());
- psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec,
sizeof(rec), psinfo);
+}
BTW, can any of the called functions go into module code that can be removed? If so, then this is not safe at all. Normal function tracing can not be synced in a preemptible kernel.
Also, I'm starting to wonder if this should be in its own utility (separate debugfs?) than hooking directly into ftrace. Then you don't need to modify ftrace at all and you can do the following:
static struct ftrace_ops trace_ops { .func = pstore_ftrace_call; };
then in your write to debugfs file:
register_ftrace_function(&trace_ops);
To turn off tracing:
unregister_ftrace_function(&trace_ops);
Note, it's safe to use if the trace_ops (or anything the callback calls) is a module. That's because it detects the trace_ops is not kernel core code and will place a wrapper around it that allows the function tracing to by synced with module unload. You still need to unregister the trace_ops before unloading the module, or you can have a crash that way.
-- Steve
On Tue, Jul 17, 2012 at 03:38:18PM -0400, Steven Rostedt wrote: [...]
+void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip)
BTW, you can make the entire file 'notrace' without adding annotations by including in the Makefile:
CFLAGS_REMOVE_ftrace.o = -pg
Actually it was in the first version in the patch, but then I changed it 'notrace' for just this func. This is for the case if the file would contain some more code which we actually may trace. Doing things fine- grained seemed to be better than making the whole file as notrace. Plus it is one line less. :-)
But I have no preference, so I can change it.
+{
- struct pstore_ftrace_record rec = {};
- if (unlikely(oops_in_progress))
return;
- rec.ip = ip;
- rec.parent_ip = parent_ip;
- pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id());
- psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec,
sizeof(rec), psinfo);
+}
BTW, can any of the called functions go into module code that can be removed? If so, then this is not safe at all. Normal function tracing can not be synced in a preemptible kernel.
Um. Yes, psinfo->write_buf() might be in the module. Nice catch.
Also, I'm starting to wonder if this should be in its own utility (separate debugfs?) than hooking directly into ftrace. Then you don't need to modify ftrace at all and you can do the following:
static struct ftrace_ops trace_ops { .func = pstore_ftrace_call; };
then in your write to debugfs file:
register_ftrace_function(&trace_ops);
To turn off tracing:
unregister_ftrace_function(&trace_ops);
Note, it's safe to use if the trace_ops (or anything the callback calls) is a module. That's because it detects the trace_ops is not kernel core code and will place a wrapper around it that allows the function tracing to by synced with module unload. You still need to unregister the trace_ops before unloading the module, or you can have a crash that way.
Hehe. Like this? http://lkml.org/lkml/2012/5/26/80 :-D
So, do you want something like this, but combinded: we don't register another tracer, but register our own ftrace_ops? This sounds doable.
Thanks!
On Tue, 2012-07-17 at 13:01 -0700, Anton Vorontsov wrote:
On Tue, Jul 17, 2012 at 03:38:18PM -0400, Steven Rostedt wrote: [...]
+void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip)
BTW, you can make the entire file 'notrace' without adding annotations by including in the Makefile:
CFLAGS_REMOVE_ftrace.o = -pg
Actually it was in the first version in the patch, but then I changed it 'notrace' for just this func. This is for the case if the file would contain some more code which we actually may trace. Doing things fine- grained seemed to be better than making the whole file as notrace. Plus it is one line less. :-)
But I have no preference, so I can change it.
I have no preference, it was just an FYI. For just a single function in a file, it really makes no difference which you use. The Makefile trick is better for needing to make sure an entire file is not traced.
+{
- struct pstore_ftrace_record rec = {};
- if (unlikely(oops_in_progress))
return;
- rec.ip = ip;
- rec.parent_ip = parent_ip;
- pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id());
- psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec,
sizeof(rec), psinfo);
+}
BTW, can any of the called functions go into module code that can be removed? If so, then this is not safe at all. Normal function tracing can not be synced in a preemptible kernel.
Um. Yes, psinfo->write_buf() might be in the module. Nice catch.
Also, I'm starting to wonder if this should be in its own utility (separate debugfs?) than hooking directly into ftrace. Then you don't need to modify ftrace at all and you can do the following:
static struct ftrace_ops trace_ops { .func = pstore_ftrace_call; };
then in your write to debugfs file:
register_ftrace_function(&trace_ops);
To turn off tracing:
unregister_ftrace_function(&trace_ops);
Note, it's safe to use if the trace_ops (or anything the callback calls) is a module. That's because it detects the trace_ops is not kernel core code and will place a wrapper around it that allows the function tracing to by synced with module unload. You still need to unregister the trace_ops before unloading the module, or you can have a crash that way.
Hehe. Like this? http://lkml.org/lkml/2012/5/26/80 :-D
So, do you want something like this, but combinded: we don't register another tracer, but register our own ftrace_ops? This sounds doable.
Yeah, the combined. That is, don't make a tracer out of it. Use another mechanism to enable the tracing and not as a tracer in /debug/tracing/available_tracers. We can probably set up events for you too, but at a later time.
-- Steve
On Tue, Jul 17, 2012 at 05:38:22PM -0400, Steven Rostedt wrote: [...]
BTW, can any of the called functions go into module code that can be removed? If so, then this is not safe at all. Normal function tracing can not be synced in a preemptible kernel.
Um. Yes, psinfo->write_buf() might be in the module. Nice catch.
Oh, actually, while write_buf() can technically be in a removable module, the RAM backend can't be removed once loaded. So it is not a real issue. But I'll implement your idea anyway, it seems a bit cleaner and safer, plus if we ever want to make persistent_ram backend removable, we won't need to bother w/ this particular issue.
Thanks!
With this patch we no longer reuse function tracer infrastructure, now we register our own tracer back-end via a debugfs knob.
It's a bit more code, but that is the only downside. On the bright side we have:
- Ability to make persistent_ram module removable (when needed, we can move ftrace_ops struct into a module). Note that persistent_ram is still not removable for other reasons, but with this patch it's just one thing less to worry about;
- Pstore part is more isolated from the generic function tracer. We tried it already by registering our own tracer in available_tracers, but that way we're loosing ability to see the traces while we record them to pstore. This solution is somewhere in the middle: we only register "internal ftracer" back-end, but not the "front-end";
- When only pstore tracing enabled, the kernel will only write to the pstore buffer, omitting function tracer buffer (which, of course, still can be enabled via 'echo function > current_tracer').
Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org ---
On Tue, Jul 17, 2012 at 05:38:22PM -0400, Steven Rostedt wrote: [...]
Note, it's safe to use if the trace_ops (or anything the callback calls) is a module. That's because it detects the trace_ops is not kernel core code and will place a wrapper around it that allows the function tracing to by synced with module unload. You still need to unregister the trace_ops before unloading the module, or you can have a crash that way.
Hehe. Like this? http://lkml.org/lkml/2012/5/26/80 :-D
So, do you want something like this, but combinded: we don't register another tracer, but register our own ftrace_ops? This sounds doable.
Yeah, the combined. That is, don't make a tracer out of it. Use another mechanism to enable the tracing and not as a tracer in /debug/tracing/available_tracers. We can probably set up events for you too, but at a later time.
OK, here it is. How does it look?
Documentation/ramoops.txt | 4 +- fs/pstore/Kconfig | 1 + fs/pstore/ftrace.c | 94 +++++++++++++++++++++++++++++++++++++++- fs/pstore/internal.h | 6 +++ fs/pstore/platform.c | 1 + include/linux/pstore.h | 8 ---- kernel/trace/trace_functions.c | 15 +------ 7 files changed, 103 insertions(+), 26 deletions(-)
diff --git a/Documentation/ramoops.txt b/Documentation/ramoops.txt index 197ad59..69b3cac 100644 --- a/Documentation/ramoops.txt +++ b/Documentation/ramoops.txt @@ -102,9 +102,7 @@ related hangs. The functions call chain log is stored in a "ftrace-ramoops" file. Here is an example of usage:
# mount -t debugfs debugfs /sys/kernel/debug/ - # cd /sys/kernel/debug/tracing - # echo function > current_tracer - # echo 1 > options/func_pstore + # echo 1 > /sys/kernel/debug/pstore/record_ftrace # reboot -f [...] # mount -t pstore pstore /mnt/ diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index d39bb5c..ca71db6 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -23,6 +23,7 @@ config PSTORE_FTRACE bool "Persistent function tracer" depends on PSTORE depends on FUNCTION_TRACER + depends on DEBUG_FS help With this option kernel traces function calls into a persistent ram buffer that can be decoded and dumped after reboot through diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c index a130d48..824be81 100644 --- a/fs/pstore/ftrace.c +++ b/fs/pstore/ftrace.c @@ -17,19 +17,111 @@ #include <linux/percpu.h> #include <linux/smp.h> #include <linux/atomic.h> +#include <linux/types.h> +#include <linux/mutex.h> +#include <linux/ftrace.h> +#include <linux/fs.h> +#include <linux/debugfs.h> +#include <linux/err.h> #include <asm/barrier.h> #include "internal.h"
-void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +static void notrace pstore_ftrace_call(unsigned long ip, + unsigned long parent_ip) { + unsigned long flags; struct pstore_ftrace_record rec = {};
+ if (unlikely(!ftrace_enabled)) + return; if (unlikely(oops_in_progress)) return;
+ local_irq_save(flags); + rec.ip = ip; rec.parent_ip = parent_ip; pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id()); psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec, sizeof(rec), psinfo); + + local_irq_restore(flags); +} + +static struct ftrace_ops pstore_ftrace_ops __read_mostly = { + .func = pstore_ftrace_call, +}; + +static DEFINE_MUTEX(pstore_ftrace_lock); +static bool pstore_ftrace_enabled; + +static ssize_t pstore_ftrace_knob_write(struct file *f, const char __user *buf, + size_t count, loff_t *ppos) +{ + u8 on; + int err; + + err = kstrtou8_from_user(buf, count, 2, &on); + if (err) + return err; + + mutex_lock(&pstore_ftrace_lock); + + if (!on ^ pstore_ftrace_enabled) + goto out; + pstore_ftrace_enabled = on; + + if (on) + register_ftrace_function(&pstore_ftrace_ops); + else + unregister_ftrace_function(&pstore_ftrace_ops); +out: + mutex_unlock(&pstore_ftrace_lock); + + return count; +} + +static ssize_t pstore_ftrace_knob_read(struct file *f, char __user *buf, + size_t count, loff_t *ppos) +{ + char val[] = { '0' + pstore_ftrace_enabled, '\n' }; + + return simple_read_from_buffer(buf, count, ppos, val, sizeof(val)); +} + +static const struct file_operations pstore_knob_fops = { + .open = simple_open, + .read = pstore_ftrace_knob_read, + .write = pstore_ftrace_knob_write, +}; + +void pstore_register_ftrace(void) +{ + struct dentry *dir; + struct dentry *file; + int err; + + if (!psinfo->write_buf) + return; + + dir = debugfs_create_dir("pstore", NULL); + if (IS_ERR_OR_NULL(dir)) { + err = PTR_ERR(dir); + pr_err("%s: unable to create pstore debugfs: %d\n", + __func__, err); + return; + } + + file = debugfs_create_file("record_ftrace", 0600, dir, NULL, + &pstore_knob_fops); + if (IS_ERR_OR_NULL(file)) { + err = PTR_ERR(file); + pr_err("%s: unable to create pstore/ftrace file: %d\n", + __func__, err); + goto err_file; + } + + return; +err_file: + debugfs_remove(dir); } diff --git a/fs/pstore/internal.h b/fs/pstore/internal.h index 0d0d3b7..4847f58 100644 --- a/fs/pstore/internal.h +++ b/fs/pstore/internal.h @@ -39,6 +39,12 @@ pstore_ftrace_decode_cpu(struct pstore_ftrace_record *rec) #endif }
+#ifdef CONFIG_PSTORE_FTRACE +extern void pstore_register_ftrace(void); +#else +static inline void pstore_register_ftrace(void) {} +#endif + extern struct pstore_info *psinfo;
extern void pstore_set_kmsg_bytes(int); diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 29996e8..6c23eab 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -236,6 +236,7 @@ int pstore_register(struct pstore_info *psi)
kmsg_dump_register(&pstore_dumper); pstore_register_console(); + pstore_register_ftrace();
if (pstore_update_ms >= 0) { pstore_timer.expires = jiffies + diff --git a/include/linux/pstore.h b/include/linux/pstore.h index c892587..ee3034a 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -64,14 +64,6 @@ struct pstore_info { void *data; };
- -#ifdef CONFIG_PSTORE_FTRACE -extern void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip); -#else -static inline void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) -{ } -#endif - #ifdef CONFIG_PSTORE extern int pstore_register(struct pstore_info *); #else diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c index a426f41..0ad83e3 100644 --- a/kernel/trace/trace_functions.c +++ b/kernel/trace/trace_functions.c @@ -13,7 +13,6 @@ #include <linux/debugfs.h> #include <linux/uaccess.h> #include <linux/ftrace.h> -#include <linux/pstore.h> #include <linux/fs.h>
#include "trace.h" @@ -75,10 +74,9 @@ function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip) preempt_enable_notrace(); }
-/* Our two options */ +/* Our option */ enum { TRACE_FUNC_OPT_STACK = 0x1, - TRACE_FUNC_OPT_PSTORE = 0x2, };
static struct tracer_flags func_flags; @@ -106,12 +104,6 @@ function_trace_call(unsigned long ip, unsigned long parent_ip) disabled = atomic_inc_return(&data->disabled);
if (likely(disabled == 1)) { - /* - * So far tracing doesn't support multiple buffers, so - * we make an explicit call for now. - */ - if (unlikely(func_flags.val & TRACE_FUNC_OPT_PSTORE)) - pstore_ftrace_call(ip, parent_ip); pc = preempt_count(); trace_function(tr, ip, parent_ip, flags, pc); } @@ -177,9 +169,6 @@ static struct tracer_opt func_opts[] = { #ifdef CONFIG_STACKTRACE { TRACER_OPT(func_stack_trace, TRACE_FUNC_OPT_STACK) }, #endif -#ifdef CONFIG_PSTORE_FTRACE - { TRACER_OPT(func_pstore, TRACE_FUNC_OPT_PSTORE) }, -#endif { } /* Always set a last empty entry */ };
@@ -232,8 +221,6 @@ static int func_set_flag(u32 old_flags, u32 bit, int set) }
break; - case TRACE_FUNC_OPT_PSTORE: - break; default: return -EINVAL; }
On Tue, Jul 17, 2012 at 08:47:22PM -0700, Anton Vorontsov wrote: [...]
-void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +static void notrace pstore_ftrace_call(unsigned long ip,
unsigned long parent_ip)
{
- unsigned long flags; struct pstore_ftrace_record rec = {};
- if (unlikely(!ftrace_enabled))
if (unlikely(oops_in_progress)) return;return;
- local_irq_save(flags);
- rec.ip = ip; rec.parent_ip = parent_ip; pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id()); psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec, sizeof(rec), psinfo);
Btw, here we might be running w/o recurse protection, and that helped to find a bug in the persistent ram module.
The bug was quite subtle: it only happened if pstore tracing was enabled before any other tracers. And it magically disappeared otherwise.
This is because ftrace_ops_list_func() does its own recurse protection, but ftrace_ops_list_func() is only used when there are more than one 'struct ops' registered, otherwise ->func is called directly.
Of course, if I specify FL_GLOBAL/FL_CONTROL flag for the tracer, then it will not try to call the func directly. But then there is a question: do we really want to set these flags if we yet don't want removable modules?
Or, setting at least FL_CONTROL would be a good idea anyway, since it will then react to ftrace_function_local_{enable,disable}()?
Thanks,
On Wed, 2012-07-18 at 00:26 -0700, Anton Vorontsov wrote:
On Tue, Jul 17, 2012 at 08:47:22PM -0700, Anton Vorontsov wrote: [...]
-void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +static void notrace pstore_ftrace_call(unsigned long ip,
unsigned long parent_ip)
{
- unsigned long flags; struct pstore_ftrace_record rec = {};
- if (unlikely(!ftrace_enabled))
if (unlikely(oops_in_progress)) return;return;
- local_irq_save(flags);
- rec.ip = ip; rec.parent_ip = parent_ip; pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id()); psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec, sizeof(rec), psinfo);
Btw, here we might be running w/o recurse protection, and that helped to find a bug in the persistent ram module.
The bug was quite subtle: it only happened if pstore tracing was enabled before any other tracers. And it magically disappeared otherwise.
This is because ftrace_ops_list_func() does its own recurse protection, but ftrace_ops_list_func() is only used when there are more than one 'struct ops' registered, otherwise ->func is called directly.
Of course, if I specify FL_GLOBAL/FL_CONTROL flag for the tracer, then it will not try to call the func directly. But then there is a question: do we really want to set these flags if we yet don't want removable modules?
Or, setting at least FL_CONTROL would be a good idea anyway, since it will then react to ftrace_function_local_{enable,disable}()?
I have a patch to fix this already. It's part of my kprobe/ftrace work.
The patch has been published here:
https://lkml.org/lkml/2012/7/11/476
I'm hoping to get this ready for 3.6. Thus, don't worry about adding recursion protection. ftrace should do that for you.
Thanks!
-- Steve
On Tue, 2012-07-17 at 20:47 -0700, Anton Vorontsov wrote:
diff --git a/Documentation/ramoops.txt b/Documentation/ramoops.txt index 197ad59..69b3cac 100644 --- a/Documentation/ramoops.txt +++ b/Documentation/ramoops.txt @@ -102,9 +102,7 @@ related hangs. The functions call chain log is stored in a "ftrace-ramoops" file. Here is an example of usage: # mount -t debugfs debugfs /sys/kernel/debug/
- # cd /sys/kernel/debug/tracing
- # echo function > current_tracer
- # echo 1 > options/func_pstore
- # echo 1 > /sys/kernel/debug/pstore/record_ftrace # reboot -f [...] # mount -t pstore pstore /mnt/
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index d39bb5c..ca71db6 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -23,6 +23,7 @@ config PSTORE_FTRACE bool "Persistent function tracer" depends on PSTORE depends on FUNCTION_TRACER
- depends on DEBUG_FS help With this option kernel traces function calls into a persistent ram buffer that can be decoded and dumped after reboot through
diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c index a130d48..824be81 100644 --- a/fs/pstore/ftrace.c +++ b/fs/pstore/ftrace.c @@ -17,19 +17,111 @@ #include <linux/percpu.h> #include <linux/smp.h> #include <linux/atomic.h> +#include <linux/types.h> +#include <linux/mutex.h> +#include <linux/ftrace.h> +#include <linux/fs.h> +#include <linux/debugfs.h> +#include <linux/err.h> #include <asm/barrier.h> #include "internal.h" -void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +static void notrace pstore_ftrace_call(unsigned long ip,
unsigned long parent_ip)
{
- unsigned long flags; struct pstore_ftrace_record rec = {};
- if (unlikely(!ftrace_enabled))
return;
You don't need this check. The ftrace_enabled is internal to ftrace and the users shouldn't care. If we disable ftrace, the callback shouldn't be executed. If it is, that's our fault ;-)
if (unlikely(oops_in_progress)) return;
- local_irq_save(flags);
- rec.ip = ip; rec.parent_ip = parent_ip; pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id()); psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec, sizeof(rec), psinfo);
- local_irq_restore(flags);
+}
+static struct ftrace_ops pstore_ftrace_ops __read_mostly = {
- .func = pstore_ftrace_call,
+};
+static DEFINE_MUTEX(pstore_ftrace_lock); +static bool pstore_ftrace_enabled;
+static ssize_t pstore_ftrace_knob_write(struct file *f, const char __user *buf,
size_t count, loff_t *ppos)
+{
- u8 on;
- int err;
- err = kstrtou8_from_user(buf, count, 2, &on);
- if (err)
return err;
- mutex_lock(&pstore_ftrace_lock);
- if (!on ^ pstore_ftrace_enabled)
goto out;
- pstore_ftrace_enabled = on;
- if (on)
register_ftrace_function(&pstore_ftrace_ops);
You might want to check the return value of this.
-- Steve
- else
unregister_ftrace_function(&pstore_ftrace_ops);
+out:
- mutex_unlock(&pstore_ftrace_lock);
- return count;
+}
+static ssize_t pstore_ftrace_knob_read(struct file *f, char __user *buf,
size_t count, loff_t *ppos)
+{
- char val[] = { '0' + pstore_ftrace_enabled, '\n' };
- return simple_read_from_buffer(buf, count, ppos, val, sizeof(val));
+}
+static const struct file_operations pstore_knob_fops = {
- .open = simple_open,
- .read = pstore_ftrace_knob_read,
- .write = pstore_ftrace_knob_write,
+};
+void pstore_register_ftrace(void) +{
- struct dentry *dir;
- struct dentry *file;
- int err;
- if (!psinfo->write_buf)
return;
- dir = debugfs_create_dir("pstore", NULL);
- if (IS_ERR_OR_NULL(dir)) {
err = PTR_ERR(dir);
pr_err("%s: unable to create pstore debugfs: %d\n",
__func__, err);
return;
- }
- file = debugfs_create_file("record_ftrace", 0600, dir, NULL,
&pstore_knob_fops);
- if (IS_ERR_OR_NULL(file)) {
err = PTR_ERR(file);
pr_err("%s: unable to create pstore/ftrace file: %d\n",
__func__, err);
goto err_file;
- }
- return;
+err_file:
- debugfs_remove(dir);
}
With this patch we no longer reuse function tracer infrastructure, now we register our own tracer back-end via a debugfs knob.
It's a bit more code, but that is the only downside. On the bright side we have:
- Ability to make persistent_ram module removable (when needed, we can move ftrace_ops struct into a module). Note that persistent_ram is still not removable for other reasons, but with this patch it's just one thing less to worry about;
- Pstore part is more isolated from the generic function tracer. We tried it already by registering our own tracer in available_tracers, but that way we're loosing ability to see the traces while we record them to pstore. This solution is somewhere in the middle: we only register "internal ftracer" back-end, but not the "front-end";
- When there is only pstore tracing enabled, the kernel will only write to the pstore buffer, omitting function tracer buffer (which, of course, still can be enabled via 'echo function > current_tracer').
Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org ---
On Wed, Jul 18, 2012 at 09:10:21AM -0400, Steven Rostedt wrote: [...]
- if (unlikely(!ftrace_enabled))
return;
You don't need this check. The ftrace_enabled is internal to ftrace and the users shouldn't care. If we disable ftrace, the callback shouldn't be executed. If it is, that's our fault ;-)
I believe I've seen this in some other tracer and thought that there was a good reason for it. OK, removed.
(Yup, just checked, that was kernel/trace/trace_stack.c)
[...]
- if (on)
register_ftrace_function(&pstore_ftrace_ops);
You might want to check the return value of this.
Right you are, I'd better check.
Thanks for the review! Here comes v2.
Documentation/ramoops.txt | 4 +- fs/pstore/Kconfig | 1 + fs/pstore/ftrace.c | 101 +++++++++++++++++++++++++++++++++++++++- fs/pstore/internal.h | 6 +++ fs/pstore/platform.c | 1 + include/linux/pstore.h | 8 ---- kernel/trace/trace_functions.c | 15 +----- 7 files changed, 110 insertions(+), 26 deletions(-)
diff --git a/Documentation/ramoops.txt b/Documentation/ramoops.txt index 197ad59..69b3cac 100644 --- a/Documentation/ramoops.txt +++ b/Documentation/ramoops.txt @@ -102,9 +102,7 @@ related hangs. The functions call chain log is stored in a "ftrace-ramoops" file. Here is an example of usage:
# mount -t debugfs debugfs /sys/kernel/debug/ - # cd /sys/kernel/debug/tracing - # echo function > current_tracer - # echo 1 > options/func_pstore + # echo 1 > /sys/kernel/debug/pstore/record_ftrace # reboot -f [...] # mount -t pstore pstore /mnt/ diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index d39bb5c..ca71db6 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -23,6 +23,7 @@ config PSTORE_FTRACE bool "Persistent function tracer" depends on PSTORE depends on FUNCTION_TRACER + depends on DEBUG_FS help With this option kernel traces function calls into a persistent ram buffer that can be decoded and dumped after reboot through diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c index a130d48..1185b92 100644 --- a/fs/pstore/ftrace.c +++ b/fs/pstore/ftrace.c @@ -17,19 +17,118 @@ #include <linux/percpu.h> #include <linux/smp.h> #include <linux/atomic.h> +#include <linux/types.h> +#include <linux/mutex.h> +#include <linux/ftrace.h> +#include <linux/fs.h> +#include <linux/debugfs.h> +#include <linux/err.h> +#include <linux/cache.h> #include <asm/barrier.h> #include "internal.h"
-void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +static void notrace pstore_ftrace_call(unsigned long ip, + unsigned long parent_ip) { + unsigned long flags; struct pstore_ftrace_record rec = {};
if (unlikely(oops_in_progress)) return;
+ local_irq_save(flags); + rec.ip = ip; rec.parent_ip = parent_ip; pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id()); psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec, sizeof(rec), psinfo); + + local_irq_restore(flags); +} + +static struct ftrace_ops pstore_ftrace_ops __read_mostly = { + .func = pstore_ftrace_call, +}; + +static DEFINE_MUTEX(pstore_ftrace_lock); +static bool pstore_ftrace_enabled; + +static ssize_t pstore_ftrace_knob_write(struct file *f, const char __user *buf, + size_t count, loff_t *ppos) +{ + u8 on; + ssize_t ret; + + ret = kstrtou8_from_user(buf, count, 2, &on); + if (ret) + return ret; + + mutex_lock(&pstore_ftrace_lock); + + if (!on ^ pstore_ftrace_enabled) + goto out; + + if (on) + ret = register_ftrace_function(&pstore_ftrace_ops); + else + ret = unregister_ftrace_function(&pstore_ftrace_ops); + if (ret) { + pr_err("%s: unable to %sregister ftrace ops: %zd\n", + __func__, on ? "" : "un", ret); + goto err; + } + + pstore_ftrace_enabled = on; +out: + ret = count; +err: + mutex_unlock(&pstore_ftrace_lock); + + return ret; +} + +static ssize_t pstore_ftrace_knob_read(struct file *f, char __user *buf, + size_t count, loff_t *ppos) +{ + char val[] = { '0' + pstore_ftrace_enabled, '\n' }; + + return simple_read_from_buffer(buf, count, ppos, val, sizeof(val)); +} + +static const struct file_operations pstore_knob_fops = { + .open = simple_open, + .read = pstore_ftrace_knob_read, + .write = pstore_ftrace_knob_write, +}; + +void pstore_register_ftrace(void) +{ + struct dentry *dir; + struct dentry *file; + int err; + + if (!psinfo->write_buf) + return; + + dir = debugfs_create_dir("pstore", NULL); + if (IS_ERR_OR_NULL(dir)) { + err = PTR_ERR(dir); + pr_err("%s: unable to create pstore debugfs: %d\n", + __func__, err); + return; + } + + file = debugfs_create_file("record_ftrace", 0600, dir, NULL, + &pstore_knob_fops); + if (IS_ERR_OR_NULL(file)) { + err = PTR_ERR(file); + pr_err("%s: unable to create pstore/ftrace file: %d\n", + __func__, err); + goto err_file; + } + + return; +err_file: + debugfs_remove(dir); } diff --git a/fs/pstore/internal.h b/fs/pstore/internal.h index 0d0d3b7..4847f58 100644 --- a/fs/pstore/internal.h +++ b/fs/pstore/internal.h @@ -39,6 +39,12 @@ pstore_ftrace_decode_cpu(struct pstore_ftrace_record *rec) #endif }
+#ifdef CONFIG_PSTORE_FTRACE +extern void pstore_register_ftrace(void); +#else +static inline void pstore_register_ftrace(void) {} +#endif + extern struct pstore_info *psinfo;
extern void pstore_set_kmsg_bytes(int); diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 29996e8..6c23eab 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -236,6 +236,7 @@ int pstore_register(struct pstore_info *psi)
kmsg_dump_register(&pstore_dumper); pstore_register_console(); + pstore_register_ftrace();
if (pstore_update_ms >= 0) { pstore_timer.expires = jiffies + diff --git a/include/linux/pstore.h b/include/linux/pstore.h index c892587..ee3034a 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -64,14 +64,6 @@ struct pstore_info { void *data; };
- -#ifdef CONFIG_PSTORE_FTRACE -extern void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip); -#else -static inline void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) -{ } -#endif - #ifdef CONFIG_PSTORE extern int pstore_register(struct pstore_info *); #else diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c index a426f41..0ad83e3 100644 --- a/kernel/trace/trace_functions.c +++ b/kernel/trace/trace_functions.c @@ -13,7 +13,6 @@ #include <linux/debugfs.h> #include <linux/uaccess.h> #include <linux/ftrace.h> -#include <linux/pstore.h> #include <linux/fs.h>
#include "trace.h" @@ -75,10 +74,9 @@ function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip) preempt_enable_notrace(); }
-/* Our two options */ +/* Our option */ enum { TRACE_FUNC_OPT_STACK = 0x1, - TRACE_FUNC_OPT_PSTORE = 0x2, };
static struct tracer_flags func_flags; @@ -106,12 +104,6 @@ function_trace_call(unsigned long ip, unsigned long parent_ip) disabled = atomic_inc_return(&data->disabled);
if (likely(disabled == 1)) { - /* - * So far tracing doesn't support multiple buffers, so - * we make an explicit call for now. - */ - if (unlikely(func_flags.val & TRACE_FUNC_OPT_PSTORE)) - pstore_ftrace_call(ip, parent_ip); pc = preempt_count(); trace_function(tr, ip, parent_ip, flags, pc); } @@ -177,9 +169,6 @@ static struct tracer_opt func_opts[] = { #ifdef CONFIG_STACKTRACE { TRACER_OPT(func_stack_trace, TRACE_FUNC_OPT_STACK) }, #endif -#ifdef CONFIG_PSTORE_FTRACE - { TRACER_OPT(func_pstore, TRACE_FUNC_OPT_PSTORE) }, -#endif { } /* Always set a last empty entry */ };
@@ -232,8 +221,6 @@ static int func_set_flag(u32 old_flags, u32 bit, int set) }
break; - case TRACE_FUNC_OPT_PSTORE: - break; default: return -EINVAL; }
On 07/17/12 20:47, Anton Vorontsov wrote:
- file = debugfs_create_file("record_ftrace", 0600, dir, NULL,
&pstore_knob_fops);
- if (IS_ERR_OR_NULL(file)) {
err = PTR_ERR(file);
pr_err("%s: unable to create pstore/ftrace file: %d\n",
__func__, err);
goto err_file;
- }
debugfs only returns NULL on failure.
On Wed, Jul 18, 2012 at 10:12:44AM -0700, Stephen Boyd wrote:
On 07/17/12 20:47, Anton Vorontsov wrote:
- file = debugfs_create_file("record_ftrace", 0600, dir, NULL,
&pstore_knob_fops);
- if (IS_ERR_OR_NULL(file)) {
err = PTR_ERR(file);
pr_err("%s: unable to create pstore/ftrace file: %d\n",
__func__, err);
goto err_file;
- }
debugfs only returns NULL on failure.
Well, techincally, with DEBUG_FS=y, yes. (And we have dependency on it.)
But see include/linux/debugfs.h for DEBUG_FS=n case:
static inline struct dentry *debugfs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops) { return ERR_PTR(-ENODEV); }
So, I think it is fine to check for IS_ERR_OR_NULL(), although today it's always NULL for our case, true.
Thanks,
On 07/18/12 11:50, Anton Vorontsov wrote:
On Wed, Jul 18, 2012 at 10:12:44AM -0700, Stephen Boyd wrote:
On 07/17/12 20:47, Anton Vorontsov wrote:
- file = debugfs_create_file("record_ftrace", 0600, dir, NULL,
&pstore_knob_fops);
- if (IS_ERR_OR_NULL(file)) {
err = PTR_ERR(file);
pr_err("%s: unable to create pstore/ftrace file: %d\n",
__func__, err);
goto err_file;
- }
debugfs only returns NULL on failure.
Well, techincally, with DEBUG_FS=y, yes. (And we have dependency on it.)
But see include/linux/debugfs.h for DEBUG_FS=n case:
static inline struct dentry *debugfs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops) { return ERR_PTR(-ENODEV); }
So, I think it is fine to check for IS_ERR_OR_NULL(), although today it's always NULL for our case, true.
What does PTR_ERR(NULL) mean then? It will always print "unable to create pstore/ftrace file: 0"?
With this patch we no longer reuse function tracer infrastructure, now we register our own tracer back-end via a debugfs knob.
It's a bit more code, but that is the only downside. On the bright side we have:
- Ability to make persistent_ram module removable (when needed, we can move ftrace_ops struct into a module). Note that persistent_ram is still not removable for other reasons, but with this patch it's just one thing less to worry about;
- Pstore part is more isolated from the generic function tracer. We tried it already by registering our own tracer in available_tracers, but that way we're loosing ability to see the traces while we record them to pstore. This solution is somewhere in the middle: we only register "internal ftracer" back-end, but not the "front-end";
- When there is only pstore tracing enabled, the kernel will only write to the pstore buffer, omitting function tracer buffer (which, of course, still can be enabled via 'echo function > current_tracer').
Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org ---
On Wed, Jul 18, 2012 at 11:59:21AM -0700, Stephen Boyd wrote: [...]
So, I think it is fine to check for IS_ERR_OR_NULL(), although today it's always NULL for our case, true.
What does PTR_ERR(NULL) mean then? It will always print "unable to create pstore/ftrace file: 0"?
Well, that might be not pretty, yeah. OK, since debugfs is mandatory anyway, let's check for !NULL then, it's also less lines of code. :-)
Thanks a lot for the review, Stephen!
Documentation/ramoops.txt | 4 +- fs/pstore/Kconfig | 1 + fs/pstore/ftrace.c | 96 +++++++++++++++++++++++++++++++++++++++- fs/pstore/internal.h | 6 +++ fs/pstore/platform.c | 1 + include/linux/pstore.h | 8 ---- kernel/trace/trace_functions.c | 15 +------ 7 files changed, 105 insertions(+), 26 deletions(-)
diff --git a/Documentation/ramoops.txt b/Documentation/ramoops.txt index 197ad59..69b3cac 100644 --- a/Documentation/ramoops.txt +++ b/Documentation/ramoops.txt @@ -102,9 +102,7 @@ related hangs. The functions call chain log is stored in a "ftrace-ramoops" file. Here is an example of usage:
# mount -t debugfs debugfs /sys/kernel/debug/ - # cd /sys/kernel/debug/tracing - # echo function > current_tracer - # echo 1 > options/func_pstore + # echo 1 > /sys/kernel/debug/pstore/record_ftrace # reboot -f [...] # mount -t pstore pstore /mnt/ diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index d39bb5c..ca71db6 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -23,6 +23,7 @@ config PSTORE_FTRACE bool "Persistent function tracer" depends on PSTORE depends on FUNCTION_TRACER + depends on DEBUG_FS help With this option kernel traces function calls into a persistent ram buffer that can be decoded and dumped after reboot through diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c index a130d48..2d57e1a 100644 --- a/fs/pstore/ftrace.c +++ b/fs/pstore/ftrace.c @@ -17,19 +17,113 @@ #include <linux/percpu.h> #include <linux/smp.h> #include <linux/atomic.h> +#include <linux/types.h> +#include <linux/mutex.h> +#include <linux/ftrace.h> +#include <linux/fs.h> +#include <linux/debugfs.h> +#include <linux/err.h> +#include <linux/cache.h> #include <asm/barrier.h> #include "internal.h"
-void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +static void notrace pstore_ftrace_call(unsigned long ip, + unsigned long parent_ip) { + unsigned long flags; struct pstore_ftrace_record rec = {};
if (unlikely(oops_in_progress)) return;
+ local_irq_save(flags); + rec.ip = ip; rec.parent_ip = parent_ip; pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id()); psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec, sizeof(rec), psinfo); + + local_irq_restore(flags); +} + +static struct ftrace_ops pstore_ftrace_ops __read_mostly = { + .func = pstore_ftrace_call, +}; + +static DEFINE_MUTEX(pstore_ftrace_lock); +static bool pstore_ftrace_enabled; + +static ssize_t pstore_ftrace_knob_write(struct file *f, const char __user *buf, + size_t count, loff_t *ppos) +{ + u8 on; + ssize_t ret; + + ret = kstrtou8_from_user(buf, count, 2, &on); + if (ret) + return ret; + + mutex_lock(&pstore_ftrace_lock); + + if (!on ^ pstore_ftrace_enabled) + goto out; + + if (on) + ret = register_ftrace_function(&pstore_ftrace_ops); + else + ret = unregister_ftrace_function(&pstore_ftrace_ops); + if (ret) { + pr_err("%s: unable to %sregister ftrace ops: %zd\n", + __func__, on ? "" : "un", ret); + goto err; + } + + pstore_ftrace_enabled = on; +out: + ret = count; +err: + mutex_unlock(&pstore_ftrace_lock); + + return ret; +} + +static ssize_t pstore_ftrace_knob_read(struct file *f, char __user *buf, + size_t count, loff_t *ppos) +{ + char val[] = { '0' + pstore_ftrace_enabled, '\n' }; + + return simple_read_from_buffer(buf, count, ppos, val, sizeof(val)); +} + +static const struct file_operations pstore_knob_fops = { + .open = simple_open, + .read = pstore_ftrace_knob_read, + .write = pstore_ftrace_knob_write, +}; + +void pstore_register_ftrace(void) +{ + struct dentry *dir; + struct dentry *file; + + if (!psinfo->write_buf) + return; + + dir = debugfs_create_dir("pstore", NULL); + if (!dir) { + pr_err("%s: unable to create pstore directory\n", __func__); + return; + } + + file = debugfs_create_file("record_ftrace", 0600, dir, NULL, + &pstore_knob_fops); + if (!file) { + pr_err("%s: unable to create record_ftrace file\n", __func__); + goto err_file; + } + + return; +err_file: + debugfs_remove(dir); } diff --git a/fs/pstore/internal.h b/fs/pstore/internal.h index 0d0d3b7..4847f58 100644 --- a/fs/pstore/internal.h +++ b/fs/pstore/internal.h @@ -39,6 +39,12 @@ pstore_ftrace_decode_cpu(struct pstore_ftrace_record *rec) #endif }
+#ifdef CONFIG_PSTORE_FTRACE +extern void pstore_register_ftrace(void); +#else +static inline void pstore_register_ftrace(void) {} +#endif + extern struct pstore_info *psinfo;
extern void pstore_set_kmsg_bytes(int); diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 29996e8..6c23eab 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -236,6 +236,7 @@ int pstore_register(struct pstore_info *psi)
kmsg_dump_register(&pstore_dumper); pstore_register_console(); + pstore_register_ftrace();
if (pstore_update_ms >= 0) { pstore_timer.expires = jiffies + diff --git a/include/linux/pstore.h b/include/linux/pstore.h index c892587..ee3034a 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -64,14 +64,6 @@ struct pstore_info { void *data; };
- -#ifdef CONFIG_PSTORE_FTRACE -extern void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip); -#else -static inline void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) -{ } -#endif - #ifdef CONFIG_PSTORE extern int pstore_register(struct pstore_info *); #else diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c index a426f41..0ad83e3 100644 --- a/kernel/trace/trace_functions.c +++ b/kernel/trace/trace_functions.c @@ -13,7 +13,6 @@ #include <linux/debugfs.h> #include <linux/uaccess.h> #include <linux/ftrace.h> -#include <linux/pstore.h> #include <linux/fs.h>
#include "trace.h" @@ -75,10 +74,9 @@ function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip) preempt_enable_notrace(); }
-/* Our two options */ +/* Our option */ enum { TRACE_FUNC_OPT_STACK = 0x1, - TRACE_FUNC_OPT_PSTORE = 0x2, };
static struct tracer_flags func_flags; @@ -106,12 +104,6 @@ function_trace_call(unsigned long ip, unsigned long parent_ip) disabled = atomic_inc_return(&data->disabled);
if (likely(disabled == 1)) { - /* - * So far tracing doesn't support multiple buffers, so - * we make an explicit call for now. - */ - if (unlikely(func_flags.val & TRACE_FUNC_OPT_PSTORE)) - pstore_ftrace_call(ip, parent_ip); pc = preempt_count(); trace_function(tr, ip, parent_ip, flags, pc); } @@ -177,9 +169,6 @@ static struct tracer_opt func_opts[] = { #ifdef CONFIG_STACKTRACE { TRACER_OPT(func_stack_trace, TRACE_FUNC_OPT_STACK) }, #endif -#ifdef CONFIG_PSTORE_FTRACE - { TRACER_OPT(func_pstore, TRACE_FUNC_OPT_PSTORE) }, -#endif { } /* Always set a last empty entry */ };
@@ -232,8 +221,6 @@ static int func_set_flag(u32 old_flags, u32 bit, int set) }
break; - case TRACE_FUNC_OPT_PSTORE: - break; default: return -EINVAL; }
On Wed, Jul 18, 2012 at 12:30:52PM -0700, Anton Vorontsov wrote:
With this patch we no longer reuse function tracer infrastructure, now we register our own tracer back-end via a debugfs knob.
It's a bit more code, but that is the only downside. On the bright side we have:
Ability to make persistent_ram module removable (when needed, we can move ftrace_ops struct into a module). Note that persistent_ram is still not removable for other reasons, but with this patch it's just one thing less to worry about;
Pstore part is more isolated from the generic function tracer. We tried it already by registering our own tracer in available_tracers, but that way we're loosing ability to see the traces while we record them to pstore. This solution is somewhere in the middle: we only register "internal ftracer" back-end, but not the "front-end";
When there is only pstore tracing enabled, the kernel will only write to the pstore buffer, omitting function tracer buffer (which, of course, still can be enabled via 'echo function > current_tracer').
Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
Hi Steven,
Unless there are any issues, may I take the patch via linux-pstore.git tree?
Thanks!
Documentation/ramoops.txt | 4 +- fs/pstore/Kconfig | 1 + fs/pstore/ftrace.c | 96 +++++++++++++++++++++++++++++++++++++++- fs/pstore/internal.h | 6 +++ fs/pstore/platform.c | 1 + include/linux/pstore.h | 8 ---- kernel/trace/trace_functions.c | 15 +------ 7 files changed, 105 insertions(+), 26 deletions(-)
diff --git a/Documentation/ramoops.txt b/Documentation/ramoops.txt index 197ad59..69b3cac 100644 --- a/Documentation/ramoops.txt +++ b/Documentation/ramoops.txt @@ -102,9 +102,7 @@ related hangs. The functions call chain log is stored in a "ftrace-ramoops" file. Here is an example of usage: # mount -t debugfs debugfs /sys/kernel/debug/
- # cd /sys/kernel/debug/tracing
- # echo function > current_tracer
- # echo 1 > options/func_pstore
- # echo 1 > /sys/kernel/debug/pstore/record_ftrace # reboot -f [...] # mount -t pstore pstore /mnt/
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index d39bb5c..ca71db6 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -23,6 +23,7 @@ config PSTORE_FTRACE bool "Persistent function tracer" depends on PSTORE depends on FUNCTION_TRACER
- depends on DEBUG_FS help With this option kernel traces function calls into a persistent ram buffer that can be decoded and dumped after reboot through
diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c index a130d48..2d57e1a 100644 --- a/fs/pstore/ftrace.c +++ b/fs/pstore/ftrace.c @@ -17,19 +17,113 @@ #include <linux/percpu.h> #include <linux/smp.h> #include <linux/atomic.h> +#include <linux/types.h> +#include <linux/mutex.h> +#include <linux/ftrace.h> +#include <linux/fs.h> +#include <linux/debugfs.h> +#include <linux/err.h> +#include <linux/cache.h> #include <asm/barrier.h> #include "internal.h" -void notrace pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) +static void notrace pstore_ftrace_call(unsigned long ip,
unsigned long parent_ip)
{
- unsigned long flags; struct pstore_ftrace_record rec = {};
if (unlikely(oops_in_progress)) return;
- local_irq_save(flags);
- rec.ip = ip; rec.parent_ip = parent_ip; pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id()); psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec, sizeof(rec), psinfo);
- local_irq_restore(flags);
+}
+static struct ftrace_ops pstore_ftrace_ops __read_mostly = {
- .func = pstore_ftrace_call,
+};
+static DEFINE_MUTEX(pstore_ftrace_lock); +static bool pstore_ftrace_enabled;
+static ssize_t pstore_ftrace_knob_write(struct file *f, const char __user *buf,
size_t count, loff_t *ppos)
+{
- u8 on;
- ssize_t ret;
- ret = kstrtou8_from_user(buf, count, 2, &on);
- if (ret)
return ret;
- mutex_lock(&pstore_ftrace_lock);
- if (!on ^ pstore_ftrace_enabled)
goto out;
- if (on)
ret = register_ftrace_function(&pstore_ftrace_ops);
- else
ret = unregister_ftrace_function(&pstore_ftrace_ops);
- if (ret) {
pr_err("%s: unable to %sregister ftrace ops: %zd\n",
__func__, on ? "" : "un", ret);
goto err;
- }
- pstore_ftrace_enabled = on;
+out:
- ret = count;
+err:
- mutex_unlock(&pstore_ftrace_lock);
- return ret;
+}
+static ssize_t pstore_ftrace_knob_read(struct file *f, char __user *buf,
size_t count, loff_t *ppos)
+{
- char val[] = { '0' + pstore_ftrace_enabled, '\n' };
- return simple_read_from_buffer(buf, count, ppos, val, sizeof(val));
+}
+static const struct file_operations pstore_knob_fops = {
- .open = simple_open,
- .read = pstore_ftrace_knob_read,
- .write = pstore_ftrace_knob_write,
+};
+void pstore_register_ftrace(void) +{
- struct dentry *dir;
- struct dentry *file;
- if (!psinfo->write_buf)
return;
- dir = debugfs_create_dir("pstore", NULL);
- if (!dir) {
pr_err("%s: unable to create pstore directory\n", __func__);
return;
- }
- file = debugfs_create_file("record_ftrace", 0600, dir, NULL,
&pstore_knob_fops);
- if (!file) {
pr_err("%s: unable to create record_ftrace file\n", __func__);
goto err_file;
- }
- return;
+err_file:
- debugfs_remove(dir);
} diff --git a/fs/pstore/internal.h b/fs/pstore/internal.h index 0d0d3b7..4847f58 100644 --- a/fs/pstore/internal.h +++ b/fs/pstore/internal.h @@ -39,6 +39,12 @@ pstore_ftrace_decode_cpu(struct pstore_ftrace_record *rec) #endif } +#ifdef CONFIG_PSTORE_FTRACE +extern void pstore_register_ftrace(void); +#else +static inline void pstore_register_ftrace(void) {} +#endif
extern struct pstore_info *psinfo; extern void pstore_set_kmsg_bytes(int); diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 29996e8..6c23eab 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -236,6 +236,7 @@ int pstore_register(struct pstore_info *psi) kmsg_dump_register(&pstore_dumper); pstore_register_console();
- pstore_register_ftrace();
if (pstore_update_ms >= 0) { pstore_timer.expires = jiffies + diff --git a/include/linux/pstore.h b/include/linux/pstore.h index c892587..ee3034a 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -64,14 +64,6 @@ struct pstore_info { void *data; };
-#ifdef CONFIG_PSTORE_FTRACE -extern void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip); -#else -static inline void pstore_ftrace_call(unsigned long ip, unsigned long parent_ip) -{ } -#endif
#ifdef CONFIG_PSTORE extern int pstore_register(struct pstore_info *); #else diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c index a426f41..0ad83e3 100644 --- a/kernel/trace/trace_functions.c +++ b/kernel/trace/trace_functions.c @@ -13,7 +13,6 @@ #include <linux/debugfs.h> #include <linux/uaccess.h> #include <linux/ftrace.h> -#include <linux/pstore.h> #include <linux/fs.h> #include "trace.h" @@ -75,10 +74,9 @@ function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip) preempt_enable_notrace(); } -/* Our two options */ +/* Our option */ enum { TRACE_FUNC_OPT_STACK = 0x1,
- TRACE_FUNC_OPT_PSTORE = 0x2,
}; static struct tracer_flags func_flags; @@ -106,12 +104,6 @@ function_trace_call(unsigned long ip, unsigned long parent_ip) disabled = atomic_inc_return(&data->disabled); if (likely(disabled == 1)) {
/*
* So far tracing doesn't support multiple buffers, so
* we make an explicit call for now.
*/
if (unlikely(func_flags.val & TRACE_FUNC_OPT_PSTORE))
pc = preempt_count(); trace_function(tr, ip, parent_ip, flags, pc); }pstore_ftrace_call(ip, parent_ip);
@@ -177,9 +169,6 @@ static struct tracer_opt func_opts[] = { #ifdef CONFIG_STACKTRACE { TRACER_OPT(func_stack_trace, TRACE_FUNC_OPT_STACK) }, #endif -#ifdef CONFIG_PSTORE_FTRACE
- { TRACER_OPT(func_pstore, TRACE_FUNC_OPT_PSTORE) },
-#endif { } /* Always set a last empty entry */ }; @@ -232,8 +221,6 @@ static int func_set_flag(u32 old_flags, u32 bit, int set) } break;
- case TRACE_FUNC_OPT_PSTORE:
default: return -EINVAL; }break;
-- 1.7.10.4
On Mon, 2012-08-20 at 18:46 -0700, Anton Vorontsov wrote:
On Wed, Jul 18, 2012 at 12:30:52PM -0700, Anton Vorontsov wrote:
With this patch we no longer reuse function tracer infrastructure, now we register our own tracer back-end via a debugfs knob.
It's a bit more code, but that is the only downside. On the bright side we have:
Ability to make persistent_ram module removable (when needed, we can move ftrace_ops struct into a module). Note that persistent_ram is still not removable for other reasons, but with this patch it's just one thing less to worry about;
Pstore part is more isolated from the generic function tracer. We tried it already by registering our own tracer in available_tracers, but that way we're loosing ability to see the traces while we record them to pstore. This solution is somewhere in the middle: we only register "internal ftracer" back-end, but not the "front-end";
When there is only pstore tracing enabled, the kernel will only write to the pstore buffer, omitting function tracer buffer (which, of course, still can be enabled via 'echo function > current_tracer').
Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
Hi Steven,
Unless there are any issues, may I take the patch via linux-pstore.git tree?
I'm fine with it. I know there was some issues about recursion protection and I said that the function tracer now has its own protection where you don't need to worry about it. I was hoping that code would make it into 3.6, but Linus opened the merge window the day after I posted the final version. Which I figured was too close to the merge window to push for 3.6 (lots of changes occurred, and I wanted it vetted in linux-next for a bit).
Now those changes are queued for 3.7 and are currently in the tip tree. You can supply your own temporary recursion protection to the function tracer callback, or wait till my changes make it into Linus's tree.
Specifically, see commit 4740974a6844156c14d741b0080b59d275679a23 (in tip.git).
-- Steve
On Tue, Aug 21, 2012 at 09:07:19PM -0400, Steven Rostedt wrote: [...]
I'm fine with it. I know there was some issues about recursion protection and I said that the function tracer now has its own protection where you don't need to worry about it. I was hoping that code would make it into 3.6, but Linus opened the merge window the day after I posted the final version. Which I figured was too close to the merge window to push for 3.6 (lots of changes occurred, and I wanted it vetted in linux-next for a bit).
Now those changes are queued for 3.7 and are currently in the tip tree. You can supply your own temporary recursion protection to the function tracer callback, or wait till my changes make it into Linus's tree.
Great! Btw, the particular recursion issue that I faced back then was triggered by a missing 'notrace' specifier for the ->write() callback in pstore code, i.e. a bug in pstore.
Running without any recursion protection is prone to weird lockups/reboots, and probably a good idea to have it on a production system. But recursion during tracing is still an evidence of some other bugs, right? At least the fact that I didn't have it helped me to find a bug. So, does it make sense to make the recursion protection optionally disabled? Maybe as some CONFIG_DEBUG_* option (briefly looking into kernel/trace/Kconfig I didn't find any)?
Thanks,
Anton.
On Tue, 2012-08-21 at 19:10 -0700, Anton Vorontsov wrote:
Running without any recursion protection is prone to weird lockups/reboots, and probably a good idea to have it on a production system. But recursion during tracing is still an evidence of some other bugs, right? At least the fact that I didn't have it helped me to find a bug. So, does it make sense to make the recursion protection optionally disabled? Maybe as some CONFIG_DEBUG_* option (briefly looking into kernel/trace/Kconfig I didn't find any)?
The problem is that recursion bugs are usually due to calling something that might be traced. Really, I've hit so many recursion bugs, that having the protection is probably the best thing. I wouldn't turn it off.
The worse that recursion can do (with proper protection) is that it causes a little wasted effort during the trace (it wastes cycles tracing something again and then recognizing that it is doing it due to recursion).
Therefore, the 'notrace' annotations are mostly for speed up (don't trace this while tracing, because the tracer uses it too).
-- Steve
This patch introduces 'func_ptrace' option, now available in /sys/kernel/debug/tracing/options when function tracer is selected.
The patch also adds some tiny code that calls back to pstore to record the trace. The callback is no-op when PSTORE=n.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org --- kernel/trace/trace_functions.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c index c7b0c6a..13770ab 100644 --- a/kernel/trace/trace_functions.c +++ b/kernel/trace/trace_functions.c @@ -13,6 +13,7 @@ #include <linux/debugfs.h> #include <linux/uaccess.h> #include <linux/ftrace.h> +#include <linux/pstore.h> #include <linux/fs.h>
#include "trace.h" @@ -74,6 +75,14 @@ function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip) preempt_enable_notrace(); }
+/* Our two options */ +enum { + TRACE_FUNC_OPT_STACK = 0x1, + TRACE_FUNC_OPT_PSTORE = 0x2, +}; + +static struct tracer_flags func_flags; + static void function_trace_call(unsigned long ip, unsigned long parent_ip) { @@ -97,6 +106,12 @@ function_trace_call(unsigned long ip, unsigned long parent_ip) disabled = atomic_inc_return(&data->disabled);
if (likely(disabled == 1)) { + /* + * So far tracing doesn't support multiple buffers, so + * we make an explicit call for now. + */ + if (unlikely(func_flags.val & TRACE_FUNC_OPT_PSTORE)) + pstore_ftrace_call(ip, parent_ip); pc = preempt_count(); trace_function(tr, ip, parent_ip, flags, pc); } @@ -158,15 +173,13 @@ static struct ftrace_ops trace_stack_ops __read_mostly = .flags = FTRACE_OPS_FL_GLOBAL, };
-/* Our two options */ -enum { - TRACE_FUNC_OPT_STACK = 0x1, -}; - static struct tracer_opt func_opts[] = { #ifdef CONFIG_STACKTRACE { TRACER_OPT(func_stack_trace, TRACE_FUNC_OPT_STACK) }, #endif +#ifdef CONFIG_PSTORE_FTRACE + { TRACER_OPT(func_pstore, TRACE_FUNC_OPT_PSTORE) }, +#endif { } /* Always set a last empty entry */ };
@@ -218,6 +231,8 @@ static int func_set_flag(u32 old_flags, u32 bit, int set) }
return 0; + } else if (bit == TRACE_FUNC_OPT_PSTORE) { + return 0; }
return -EINVAL;
On Mon, 2012-07-09 at 17:10 -0700, Anton Vorontsov wrote:
+static struct tracer_flags func_flags;
static void function_trace_call(unsigned long ip, unsigned long parent_ip) { @@ -97,6 +106,12 @@ function_trace_call(unsigned long ip, unsigned long parent_ip) disabled = atomic_inc_return(&data->disabled); if (likely(disabled == 1)) {
/*
* So far tracing doesn't support multiple buffers, so
* we make an explicit call for now.
I'm working on it ;-)
-- Steve
*/
if (unlikely(func_flags.val & TRACE_FUNC_OPT_PSTORE))
pc = preempt_count(); trace_function(tr, ip, parent_ip, flags, pc); }pstore_ftrace_call(ip, parent_ip);
Don't use pstore.buf directly, instead convert the code to write_buf callback which passes a pointer to a buffer as an argument.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org --- fs/pstore/ram.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index b39aebb..74f4111 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -170,11 +170,12 @@ static size_t ramoops_write_kmsg_hdr(struct persistent_ram_zone *prz) return len; }
-static int ramoops_pstore_write(enum pstore_type_id type, - enum kmsg_dump_reason reason, - u64 *id, - unsigned int part, - size_t size, struct pstore_info *psi) + +static int ramoops_pstore_write_buf(enum pstore_type_id type, + enum kmsg_dump_reason reason, + u64 *id, unsigned int part, + const char *buf, size_t size, + struct pstore_info *psi) { struct ramoops_context *cxt = psi->data; struct persistent_ram_zone *prz = cxt->przs[cxt->dump_write_cnt]; @@ -183,7 +184,7 @@ static int ramoops_pstore_write(enum pstore_type_id type, if (type == PSTORE_TYPE_CONSOLE) { if (!cxt->cprz) return -ENOMEM; - persistent_ram_write(cxt->cprz, cxt->pstore.buf, size); + persistent_ram_write(cxt->cprz, buf, size); return 0; }
@@ -212,7 +213,7 @@ static int ramoops_pstore_write(enum pstore_type_id type, hlen = ramoops_write_kmsg_hdr(prz); if (size + hlen > prz->buffer_size) size = prz->buffer_size - hlen; - persistent_ram_write(prz, cxt->pstore.buf, size); + persistent_ram_write(prz, buf, size);
cxt->dump_write_cnt = (cxt->dump_write_cnt + 1) % cxt->max_dump_cnt;
@@ -250,7 +251,7 @@ static struct ramoops_context oops_cxt = { .name = "ramoops", .open = ramoops_pstore_open, .read = ramoops_pstore_read, - .write = ramoops_pstore_write, + .write_buf = ramoops_pstore_write_buf, .erase = ramoops_pstore_erase, }, };
The ftrace log size is configurable via ramoops.ftrace_size module option, and the log itself is available via <pstore-mount>/ftrace-ramoops file.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org --- Documentation/ramoops.txt | 25 +++++++++++++++++++++++++ fs/pstore/ram.c | 37 +++++++++++++++++++++++++++++++++---- include/linux/pstore_ram.h | 1 + 3 files changed, 59 insertions(+), 4 deletions(-)
diff --git a/Documentation/ramoops.txt b/Documentation/ramoops.txt index 59a74a8..197ad59 100644 --- a/Documentation/ramoops.txt +++ b/Documentation/ramoops.txt @@ -94,3 +94,28 @@ timestamp and a new line. The dump then continues with the actual data. The dump data can be read from the pstore filesystem. The format for these files is "dmesg-ramoops-N", where N is the record number in memory. To delete a stored record from RAM, simply unlink the respective pstore file. + +5. Persistent function tracing + +Persistent function tracing might be useful for debugging software or hardware +related hangs. The functions call chain log is stored in a "ftrace-ramoops" +file. Here is an example of usage: + + # mount -t debugfs debugfs /sys/kernel/debug/ + # cd /sys/kernel/debug/tracing + # echo function > current_tracer + # echo 1 > options/func_pstore + # reboot -f + [...] + # mount -t pstore pstore /mnt/ + # tail /mnt/ftrace-ramoops + 0 ffffffff8101ea64 ffffffff8101bcda native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0 + 0 ffffffff8101ea44 ffffffff8101bcf6 native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0 + 0 ffffffff81020084 ffffffff8101a4b5 hpet_disable <- native_machine_shutdown+0x75/0x90 + 0 ffffffff81005f94 ffffffff8101a4bb iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90 + 0 ffffffff8101a6a1 ffffffff8101a437 native_machine_emergency_restart <- native_machine_restart+0x37/0x40 + 0 ffffffff811f9876 ffffffff8101a73a acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0 + 0 ffffffff8101a514 ffffffff8101a772 mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0 + 0 ffffffff811d9c54 ffffffff8101a7a0 __const_udelay <- native_machine_emergency_restart+0x110/0x1e0 + 0 ffffffff811d9c34 ffffffff811d9c80 __delay <- __const_udelay+0x30/0x40 + 0 ffffffff811d9d14 ffffffff811d9c3f delay_tsc <- __delay+0xf/0x20 diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index 74f4111..1dd108e 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -45,6 +45,10 @@ static ulong ramoops_console_size = MIN_MEM_SIZE; module_param_named(console_size, ramoops_console_size, ulong, 0400); MODULE_PARM_DESC(console_size, "size of kernel console log");
+static ulong ramoops_ftrace_size = MIN_MEM_SIZE; +module_param_named(ftrace_size, ramoops_ftrace_size, ulong, 0400); +MODULE_PARM_DESC(ftrace_size, "size of ftrace log"); + static ulong mem_address; module_param(mem_address, ulong, 0400); MODULE_PARM_DESC(mem_address, @@ -70,16 +74,19 @@ MODULE_PARM_DESC(ramoops_ecc, struct ramoops_context { struct persistent_ram_zone **przs; struct persistent_ram_zone *cprz; + struct persistent_ram_zone *fprz; phys_addr_t phys_addr; unsigned long size; size_t record_size; size_t console_size; + size_t ftrace_size; int dump_oops; int ecc_size; unsigned int max_dump_cnt; unsigned int dump_write_cnt; unsigned int dump_read_cnt; unsigned int console_read_cnt; + unsigned int ftrace_read_cnt; struct pstore_info pstore; };
@@ -138,6 +145,9 @@ static ssize_t ramoops_pstore_read(u64 *id, enum pstore_type_id *type, prz = ramoops_get_next_prz(&cxt->cprz, &cxt->console_read_cnt, 1, id, type, PSTORE_TYPE_CONSOLE, 0); if (!prz) + prz = ramoops_get_next_prz(&cxt->fprz, &cxt->ftrace_read_cnt, + 1, id, type, PSTORE_TYPE_FTRACE, 0); + if (!prz) return 0;
/* TODO(kees): Bogus time for the moment. */ @@ -186,6 +196,11 @@ static int ramoops_pstore_write_buf(enum pstore_type_id type, return -ENOMEM; persistent_ram_write(cxt->cprz, buf, size); return 0; + } else if (type == PSTORE_TYPE_FTRACE) { + if (!cxt->fprz) + return -ENOMEM; + persistent_ram_write(cxt->fprz, buf, size); + return 0; }
if (type != PSTORE_TYPE_DMESG) @@ -235,6 +250,9 @@ static int ramoops_pstore_erase(enum pstore_type_id type, u64 id, case PSTORE_TYPE_CONSOLE: prz = cxt->cprz; break; + case PSTORE_TYPE_FTRACE: + prz = cxt->fprz; + break; default: return -EINVAL; } @@ -348,7 +366,8 @@ static int __devinit ramoops_probe(struct platform_device *pdev) if (cxt->max_dump_cnt) goto fail_out;
- if (!pdata->mem_size || (!pdata->record_size && !pdata->console_size)) { + if (!pdata->mem_size || (!pdata->record_size && !pdata->console_size && + !pdata->ftrace_size)) { pr_err("The memory size and the record/console size must be " "non-zero\n"); goto fail_out; @@ -357,18 +376,20 @@ static int __devinit ramoops_probe(struct platform_device *pdev) pdata->mem_size = rounddown_pow_of_two(pdata->mem_size); pdata->record_size = rounddown_pow_of_two(pdata->record_size); pdata->console_size = rounddown_pow_of_two(pdata->console_size); + pdata->ftrace_size = rounddown_pow_of_two(pdata->ftrace_size);
cxt->dump_read_cnt = 0; cxt->size = pdata->mem_size; cxt->phys_addr = pdata->mem_address; cxt->record_size = pdata->record_size; cxt->console_size = pdata->console_size; + cxt->ftrace_size = pdata->ftrace_size; cxt->dump_oops = pdata->dump_oops; cxt->ecc_size = pdata->ecc_size;
paddr = cxt->phys_addr;
- dump_mem_sz = cxt->size - cxt->console_size; + dump_mem_sz = cxt->size - cxt->console_size - cxt->ftrace_size; err = ramoops_init_przs(dev, cxt, &paddr, dump_mem_sz); if (err) goto fail_out; @@ -377,9 +398,14 @@ static int __devinit ramoops_probe(struct platform_device *pdev) if (err) goto fail_init_cprz;
- if (!cxt->przs && !cxt->cprz) { + err = ramoops_init_prz(dev, cxt, &cxt->fprz, &paddr, cxt->ftrace_size); + if (err) + goto fail_init_fprz; + + if (!cxt->przs && !cxt->cprz && !cxt->fprz) { pr_err("memory size too small, minimum is %lu\n", - cxt->console_size + cxt->record_size); + cxt->console_size + cxt->record_size + + cxt->ftrace_size); goto fail_cnt; }
@@ -426,6 +452,8 @@ fail_clear: cxt->pstore.bufsize = 0; cxt->max_dump_cnt = 0; fail_cnt: + kfree(cxt->fprz); +fail_init_fprz: kfree(cxt->cprz); fail_init_cprz: ramoops_free_przs(cxt); @@ -480,6 +508,7 @@ static void ramoops_register_dummy(void) dummy_data->mem_address = mem_address; dummy_data->record_size = record_size; dummy_data->console_size = ramoops_console_size; + dummy_data->ftrace_size = ramoops_ftrace_size; dummy_data->dump_oops = dump_oops; /* * For backwards compatibility ramoops.ecc=1 means 16 bytes ECC diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h index dcf805f..af848e1 100644 --- a/include/linux/pstore_ram.h +++ b/include/linux/pstore_ram.h @@ -72,6 +72,7 @@ struct ramoops_platform_data { unsigned long mem_address; unsigned long record_size; unsigned long console_size; + unsigned long ftrace_size; int dump_oops; int ecc_size; };
Decoding the binary trace w/ a different kernel might be troublesome since we convert addresses to symbols. For kernels with minimal changes, the mappings would probably match, but it's not guaranteed at all. (But still we could convert the addresses by hand, since we do print raw addresses.)
If we use modules, the symbols could be loaded at different addresses from the previously booted kernel, and so this would also fail, but there's nothing we can do about it.
Also, the binary data format that pstore/ram is using in its ringbuffer may change between the kernels, so here we too must ensure that we're running the same kernel.
So, there are two questions really:
1. How to compute the unique kernel tag; 2. Where to store it.
In this patch we're just passing linux_banner through CRC32. This way we are protecting from the kernel version mismatch, making sure that we're running the same image. We could also add CRC of a symbol table (as suggested by Tony Luck), but for now let's not be that strict.
And as for storing, we are using a small trick here. Instead of allocating a dedicated buffer for the tag (i.e. another prz), or hacking ram_core routines to "reserve" some control data in the buffer, we are just encoding the tag into the buffer signature (and XOR'ing it with the actual signature value, so that buffers not needing a tag can just pass zero, which will result into the plain old PRZ signature).
Suggested-by: Steven Rostedt rostedt@goodmis.org Suggested-by: Tony Luck tony.luck@intel.com Suggested-by: Colin Cross ccross@android.com Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org --- fs/pstore/Kconfig | 1 + fs/pstore/ram.c | 13 ++++++++----- fs/pstore/ram_core.c | 12 +++++++----- include/linux/pstore_ram.h | 2 +- 4 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index d39bb5c..a123754 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -36,6 +36,7 @@ config PSTORE_RAM depends on PSTORE depends on HAS_IOMEM depends on HAVE_MEMBLOCK + select CRC32 select REED_SOLOMON select REED_SOLOMON_ENC8 select REED_SOLOMON_DEC8 diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index 1dd108e..a7610a1 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -31,6 +31,7 @@ #include <linux/ioport.h> #include <linux/platform_device.h> #include <linux/slab.h> +#include <linux/crc32.h> #include <linux/pstore_ram.h>
#define RAMOOPS_KERNMSG_HDR "====" @@ -309,7 +310,7 @@ static int ramoops_init_przs(struct device *dev, struct ramoops_context *cxt, for (i = 0; i < cxt->max_dump_cnt; i++) { size_t sz = cxt->record_size;
- cxt->przs[i] = persistent_ram_new(*paddr, sz, cxt->ecc_size); + cxt->przs[i] = persistent_ram_new(*paddr, sz, 0, cxt->ecc_size); if (IS_ERR(cxt->przs[i])) { err = PTR_ERR(cxt->przs[i]); dev_err(dev, "failed to request mem region (0x%zx@0x%llx): %d\n", @@ -327,7 +328,7 @@ fail_prz:
static int ramoops_init_prz(struct device *dev, struct ramoops_context *cxt, struct persistent_ram_zone **prz, - phys_addr_t *paddr, size_t sz) + phys_addr_t *paddr, size_t sz, u32 sig) { if (!sz) return 0; @@ -335,7 +336,7 @@ static int ramoops_init_prz(struct device *dev, struct ramoops_context *cxt, if (*paddr + sz > *paddr + cxt->size) return -ENOMEM;
- *prz = persistent_ram_new(*paddr, sz, cxt->ecc_size); + *prz = persistent_ram_new(*paddr, sz, sig, cxt->ecc_size); if (IS_ERR(*prz)) { int err = PTR_ERR(*prz);
@@ -394,11 +395,13 @@ static int __devinit ramoops_probe(struct platform_device *pdev) if (err) goto fail_out;
- err = ramoops_init_prz(dev, cxt, &cxt->cprz, &paddr, cxt->console_size); + err = ramoops_init_prz(dev, cxt, &cxt->cprz, &paddr, + cxt->console_size, 0); if (err) goto fail_init_cprz;
- err = ramoops_init_prz(dev, cxt, &cxt->fprz, &paddr, cxt->ftrace_size); + err = ramoops_init_prz(dev, cxt, &cxt->fprz, &paddr, cxt->ftrace_size, + crc32_le(0, linux_banner, strlen(linux_banner))); if (err) goto fail_init_fprz;
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index 4dabbb8..eecd2a8 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -391,7 +391,7 @@ static int persistent_ram_buffer_map(phys_addr_t start, phys_addr_t size, }
static int __devinit persistent_ram_post_init(struct persistent_ram_zone *prz, - int ecc_size) + u32 sig, int ecc_size) { int ret;
@@ -399,7 +399,9 @@ static int __devinit persistent_ram_post_init(struct persistent_ram_zone *prz, if (ret) return ret;
- if (prz->buffer->sig == PERSISTENT_RAM_SIG) { + sig ^= PERSISTENT_RAM_SIG; + + if (prz->buffer->sig == sig) { if (buffer_size(prz) > prz->buffer_size || buffer_start(prz) > buffer_size(prz)) pr_info("persistent_ram: found existing invalid buffer," @@ -417,7 +419,7 @@ static int __devinit persistent_ram_post_init(struct persistent_ram_zone *prz, " (sig = 0x%08x)\n", prz->buffer->sig); }
- prz->buffer->sig = PERSISTENT_RAM_SIG; + prz->buffer->sig = sig; persistent_ram_zap(prz);
return 0; @@ -442,7 +444,7 @@ void persistent_ram_free(struct persistent_ram_zone *prz) }
struct persistent_ram_zone * __devinit persistent_ram_new(phys_addr_t start, - size_t size, + size_t size, u32 sig, int ecc_size) { struct persistent_ram_zone *prz; @@ -458,7 +460,7 @@ struct persistent_ram_zone * __devinit persistent_ram_new(phys_addr_t start, if (ret) goto err;
- ret = persistent_ram_post_init(prz, ecc_size); + ret = persistent_ram_post_init(prz, sig, ecc_size); if (ret) goto err;
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h index af848e1..1d779c4 100644 --- a/include/linux/pstore_ram.h +++ b/include/linux/pstore_ram.h @@ -46,7 +46,7 @@ struct persistent_ram_zone { };
struct persistent_ram_zone * __devinit persistent_ram_new(phys_addr_t start, - size_t size, + size_t size, u32 sig, int ecc_size); void persistent_ram_free(struct persistent_ram_zone *prz); void persistent_ram_zap(struct persistent_ram_zone *prz);
On Mon, Jul 09, 2012 at 05:10:45PM -0700, Anton Vorontsov wrote:
Decoding the binary trace w/ a different kernel might be troublesome since we convert addresses to symbols. For kernels with minimal changes, the mappings would probably match, but it's not guaranteed at all. (But still we could convert the addresses by hand, since we do print raw addresses.)
If we use modules, the symbols could be loaded at different addresses from the previously booted kernel, and so this would also fail, but there's nothing we can do about it.
Also, the binary data format that pstore/ram is using in its ringbuffer may change between the kernels, so here we too must ensure that we're running the same kernel.
So, there are two questions really:
- How to compute the unique kernel tag;
- Where to store it.
In this patch we're just passing linux_banner through CRC32.
That's nice, but it breaks the build on my system as linux_banner somehow isn't enabled as part of the build?
Is there something else you can use? Or fix the build to work properly?
I'll take the previous patches in the series, but I obviously can't take this one because of the build error, sorry.
greg k-h
On Tue, Jul 17, 2012 at 09:56:01AM -0700, Greg Kroah-Hartman wrote: [...]
In this patch we're just passing linux_banner through CRC32.
That's nice, but it breaks the build on my system as linux_banner somehow isn't enabled as part of the build?
Hm... I'll take a look, thanks a lot!
On Tue, Jul 17, 2012 at 09:56:01AM -0700, Greg Kroah-Hartman wrote: [...]
That's nice, but it breaks the build on my system as linux_banner somehow isn't enabled as part of the build?
I really don't see how you managed to disable it. :-)
init/version.c is built unconditionally, and it has 'const char linux_banner[]' unconditionally. Then ram.c includes kernel.h, which unconditionally (well, under #ifdef __KERNEL__) includes printk.h, which has 'extern const char linux_banner[];' prototype...
I'm still looking further, but can you please send your config file? And may be gcc version?
p.s. While investigating this, I found that quite a lot of includes are missing in pstore headers, and I was able to trigger at least one issue. The patch will follow.
Headers should really include all the needed prototypes, types, defines etc. to be self-contained. This is a long-standing issue, but apparently the new tracing code unearthed it (SMP=n is also a prerequisite):
In file included from fs/pstore/internal.h:4:0, from fs/pstore/ftrace.c:21: include/linux/pstore.h:43:15: error: field ‘read_mutex’ has incomplete type
While at it, I also added the following:
linux/types.h -> size_t, phys_addr_t, uXX and friends linux/spinlock.h -> spinlock_t linux/errno.h -> Exxxx linux/time.h -> struct timespec (struct passed by value) struct module and rs_control forward declaration (passed via pointers).
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org --- fs/pstore/internal.h | 2 ++ include/linux/pstore.h | 6 ++++++ include/linux/pstore_ram.h | 1 + 3 files changed, 9 insertions(+)
diff --git a/fs/pstore/internal.h b/fs/pstore/internal.h index 958c48d..0d0d3b7 100644 --- a/fs/pstore/internal.h +++ b/fs/pstore/internal.h @@ -1,6 +1,8 @@ #ifndef __PSTORE_INTERNAL_H__ #define __PSTORE_INTERNAL_H__
+#include <linux/types.h> +#include <linux/time.h> #include <linux/pstore.h>
#if NR_CPUS <= 2 && defined(CONFIG_ARM_THUMB) diff --git a/include/linux/pstore.h b/include/linux/pstore.h index 120443b..c892587 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -24,6 +24,10 @@
#include <linux/time.h> #include <linux/kmsg_dump.h> +#include <linux/mutex.h> +#include <linux/types.h> +#include <linux/spinlock.h> +#include <linux/errno.h>
/* types */ enum pstore_type_id { @@ -34,6 +38,8 @@ enum pstore_type_id { PSTORE_TYPE_UNKNOWN = 255 };
+struct module; + struct pstore_info { struct module *owner; char *name; diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h index af848e1..ba2b211 100644 --- a/include/linux/pstore_ram.h +++ b/include/linux/pstore_ram.h @@ -24,6 +24,7 @@ #include <linux/init.h>
struct persistent_ram_buffer; +struct rs_control;
struct persistent_ram_zone { phys_addr_t paddr;
On Tue, Jul 17, 2012 at 11:13:59AM -0700, Anton Vorontsov wrote:
Headers should really include all the needed prototypes, types, defines
<snip>
Content-Transfer-Encoding: base64
That's not a nice way to send patches out, care to fix this and resend?
thanks,
greg k-h
Headers should really include all the needed prototypes, types, defines etc. to be self-contained. This is a long-standing issue, but apparently the new tracing code unearthed it (SMP=n is also a prerequisite):
In file included from fs/pstore/internal.h:4:0, from fs/pstore/ftrace.c:21: include/linux/pstore.h:43:15: error: field ‘read_mutex’ has incomplete type
While at it, I also added the following:
linux/types.h -> size_t, phys_addr_t, uXX and friends linux/spinlock.h -> spinlock_t linux/errno.h -> Exxxx linux/time.h -> struct timespec (struct passed by value) struct module and rs_control forward declaration (passed via pointers).
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org ---
On Tue, Jul 17, 2012 at 11:19:44AM -0700, Greg Kroah-Hartman wrote:
On Tue, Jul 17, 2012 at 11:13:59AM -0700, Anton Vorontsov wrote:
Headers should really include all the needed prototypes, types, defines
<snip>
Content-Transfer-Encoding: base64
That's not a nice way to send patches out, care to fix this and resend?
I used 'git send-email' as normal, and it said:
Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
OK, I checked what vger.kernel.org received, and it still says:
Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
And I see the email readable in the text editor.
OK, with linux_banner I got what the problem is, but where base64 came from?.. :-/
Anyways, resending w/ mutt.
fs/pstore/internal.h | 2 ++ include/linux/pstore.h | 6 ++++++ include/linux/pstore_ram.h | 1 + 3 files changed, 9 insertions(+)
diff --git a/fs/pstore/internal.h b/fs/pstore/internal.h index 958c48d..0d0d3b7 100644 --- a/fs/pstore/internal.h +++ b/fs/pstore/internal.h @@ -1,6 +1,8 @@ #ifndef __PSTORE_INTERNAL_H__ #define __PSTORE_INTERNAL_H__
+#include <linux/types.h> +#include <linux/time.h> #include <linux/pstore.h>
#if NR_CPUS <= 2 && defined(CONFIG_ARM_THUMB) diff --git a/include/linux/pstore.h b/include/linux/pstore.h index 120443b..c892587 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -24,6 +24,10 @@
#include <linux/time.h> #include <linux/kmsg_dump.h> +#include <linux/mutex.h> +#include <linux/types.h> +#include <linux/spinlock.h> +#include <linux/errno.h>
/* types */ enum pstore_type_id { @@ -34,6 +38,8 @@ enum pstore_type_id { PSTORE_TYPE_UNKNOWN = 255 };
+struct module; + struct pstore_info { struct module *owner; char *name; diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h index af848e1..ba2b211 100644 --- a/include/linux/pstore_ram.h +++ b/include/linux/pstore_ram.h @@ -24,6 +24,7 @@ #include <linux/init.h>
struct persistent_ram_buffer; +struct rs_control;
struct persistent_ram_zone { phys_addr_t paddr;
On Tue, Jul 17, 2012 at 11:37:07AM -0700, Anton Vorontsov wrote:
Headers should really include all the needed prototypes, types, defines etc. to be self-contained. This is a long-standing issue, but apparently the new tracing code unearthed it (SMP=n is also a prerequisite):
In file included from fs/pstore/internal.h:4:0, from fs/pstore/ftrace.c:21: include/linux/pstore.h:43:15: error: field ‘read_mutex’ has incomplete type
While at it, I also added the following:
linux/types.h -> size_t, phys_addr_t, uXX and friends linux/spinlock.h -> spinlock_t linux/errno.h -> Exxxx linux/time.h -> struct timespec (struct passed by value) struct module and rs_control forward declaration (passed via pointers).
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
On Tue, Jul 17, 2012 at 11:19:44AM -0700, Greg Kroah-Hartman wrote:
On Tue, Jul 17, 2012 at 11:13:59AM -0700, Anton Vorontsov wrote:
Headers should really include all the needed prototypes, types, defines
<snip>
Content-Transfer-Encoding: base64
That's not a nice way to send patches out, care to fix this and resend?
I used 'git send-email' as normal, and it said:
Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
OK, I checked what vger.kernel.org received, and it still says:
Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
And I see the email readable in the text editor.
OK, with linux_banner I got what the problem is, but where base64 came from?.. :-/
It looks like the devel mailing list does this conversion, ugh, mailman sucks at times.
Sorry about that,
greg k-h
On Tue, Jul 17, 2012 at 11:09:18AM -0700, Anton Vorontsov wrote:
On Tue, Jul 17, 2012 at 09:56:01AM -0700, Greg Kroah-Hartman wrote: [...]
That's nice, but it breaks the build on my system as linux_banner somehow isn't enabled as part of the build?
I really don't see how you managed to disable it. :-)
init/version.c is built unconditionally, and it has 'const char linux_banner[]' unconditionally. Then ram.c includes kernel.h, which unconditionally (well, under #ifdef __KERNEL__) includes printk.h, which has 'extern const char linux_banner[];' prototype...
I'm still looking further, but can you please send your config file? And may be gcc version?
It failed in the linking stage, so perhaps it is because linux_banner is not exported and you had module code looking for it?
.config is attached.
thanks,
greg k-h
Decoding the binary trace w/ a different kernel might be troublesome since we convert addresses to symbols. For kernels with minimal changes, the mappings would probably match, but it's not guaranteed at all. (But still we could convert the addresses by hand, since we do print raw addresses.)
If we use modules, the symbols could be loaded at different addresses from the previously booted kernel, and so this would also fail, but there's nothing we can do about it.
Also, the binary data format that pstore/ram is using in its ringbuffer may change between the kernels, so here we too must ensure that we're running the same kernel.
So, there are two questions really:
1. How to compute the unique kernel tag; 2. Where to store it.
In this patch we're using LINUX_VERSION_CODE, just as hibernation (suspend-to-disk) does. This way we are protecting from the kernel version mismatch, making sure that we're running the same kernel version and patch level. We could use CRC of a symbol table (as suggested by Tony Luck), but for now let's not be that strict.
And as for storing, we are using a small trick here. Instead of allocating a dedicated buffer for the tag (i.e. another prz), or hacking ram_core routines to "reserve" some control data in the buffer, we are just encoding the tag into the buffer signature (and XOR'ing it with the actual signature value, so that buffers not needing a tag can just pass zero, which will result into the plain old PRZ signature).
Suggested-by: Steven Rostedt rostedt@goodmis.org Suggested-by: Tony Luck tony.luck@intel.com Suggested-by: Colin Cross ccross@android.com Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org ---
On Tue, Jul 17, 2012 at 09:56:01AM -0700, Greg Kroah-Hartman wrote: [...]
That's nice, but it breaks the build on my system as linux_banner somehow isn't enabled as part of the build?
Is there something else you can use? Or fix the build to work properly?
Yeah, as you say, linux_banner is not available for modules.
Looking into the hibernation/suspend-to-disk code, I see it uses LINUX_VERSION_CODE. It is less strict as it does not include build date, but I guess if it is not an issue for hibernation code, then it should not be an issue for pstore/ftrace as well. And if we see any issue with verison code, then we should "fix" it for all. Or switch to CRC'ing kernel symbols.
So, let do things even simpler, just as hibernation code does: don't involve CRC and other stuff, just use linux version code. Here is the updated patch.
Thanks!
p.s. Checkpatch complains 'WARNING: LINUX_VERSION_CODE should be avoided, code should be for the version to which it is merged', but surely it's not for our case. :-)
fs/pstore/ram.c | 13 ++++++++----- fs/pstore/ram_core.c | 12 +++++++----- include/linux/pstore_ram.h | 2 +- 3 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index 1dd108e..0b311bc 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -25,6 +25,7 @@ #include <linux/kernel.h> #include <linux/err.h> #include <linux/module.h> +#include <linux/version.h> #include <linux/pstore.h> #include <linux/time.h> #include <linux/io.h> @@ -309,7 +310,7 @@ static int ramoops_init_przs(struct device *dev, struct ramoops_context *cxt, for (i = 0; i < cxt->max_dump_cnt; i++) { size_t sz = cxt->record_size;
- cxt->przs[i] = persistent_ram_new(*paddr, sz, cxt->ecc_size); + cxt->przs[i] = persistent_ram_new(*paddr, sz, 0, cxt->ecc_size); if (IS_ERR(cxt->przs[i])) { err = PTR_ERR(cxt->przs[i]); dev_err(dev, "failed to request mem region (0x%zx@0x%llx): %d\n", @@ -327,7 +328,7 @@ fail_prz:
static int ramoops_init_prz(struct device *dev, struct ramoops_context *cxt, struct persistent_ram_zone **prz, - phys_addr_t *paddr, size_t sz) + phys_addr_t *paddr, size_t sz, u32 sig) { if (!sz) return 0; @@ -335,7 +336,7 @@ static int ramoops_init_prz(struct device *dev, struct ramoops_context *cxt, if (*paddr + sz > *paddr + cxt->size) return -ENOMEM;
- *prz = persistent_ram_new(*paddr, sz, cxt->ecc_size); + *prz = persistent_ram_new(*paddr, sz, sig, cxt->ecc_size); if (IS_ERR(*prz)) { int err = PTR_ERR(*prz);
@@ -394,11 +395,13 @@ static int __devinit ramoops_probe(struct platform_device *pdev) if (err) goto fail_out;
- err = ramoops_init_prz(dev, cxt, &cxt->cprz, &paddr, cxt->console_size); + err = ramoops_init_prz(dev, cxt, &cxt->cprz, &paddr, + cxt->console_size, 0); if (err) goto fail_init_cprz;
- err = ramoops_init_prz(dev, cxt, &cxt->fprz, &paddr, cxt->ftrace_size); + err = ramoops_init_prz(dev, cxt, &cxt->fprz, &paddr, cxt->ftrace_size, + LINUX_VERSION_CODE); if (err) goto fail_init_fprz;
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index 4dabbb8..eecd2a8 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -391,7 +391,7 @@ static int persistent_ram_buffer_map(phys_addr_t start, phys_addr_t size, }
static int __devinit persistent_ram_post_init(struct persistent_ram_zone *prz, - int ecc_size) + u32 sig, int ecc_size) { int ret;
@@ -399,7 +399,9 @@ static int __devinit persistent_ram_post_init(struct persistent_ram_zone *prz, if (ret) return ret;
- if (prz->buffer->sig == PERSISTENT_RAM_SIG) { + sig ^= PERSISTENT_RAM_SIG; + + if (prz->buffer->sig == sig) { if (buffer_size(prz) > prz->buffer_size || buffer_start(prz) > buffer_size(prz)) pr_info("persistent_ram: found existing invalid buffer," @@ -417,7 +419,7 @@ static int __devinit persistent_ram_post_init(struct persistent_ram_zone *prz, " (sig = 0x%08x)\n", prz->buffer->sig); }
- prz->buffer->sig = PERSISTENT_RAM_SIG; + prz->buffer->sig = sig; persistent_ram_zap(prz);
return 0; @@ -442,7 +444,7 @@ void persistent_ram_free(struct persistent_ram_zone *prz) }
struct persistent_ram_zone * __devinit persistent_ram_new(phys_addr_t start, - size_t size, + size_t size, u32 sig, int ecc_size) { struct persistent_ram_zone *prz; @@ -458,7 +460,7 @@ struct persistent_ram_zone * __devinit persistent_ram_new(phys_addr_t start, if (ret) goto err;
- ret = persistent_ram_post_init(prz, ecc_size); + ret = persistent_ram_post_init(prz, sig, ecc_size); if (ret) goto err;
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h index ba2b211..098d2a8 100644 --- a/include/linux/pstore_ram.h +++ b/include/linux/pstore_ram.h @@ -47,7 +47,7 @@ struct persistent_ram_zone { };
struct persistent_ram_zone * __devinit persistent_ram_new(phys_addr_t start, - size_t size, + size_t size, u32 sig, int ecc_size); void persistent_ram_free(struct persistent_ram_zone *prz); void persistent_ram_zap(struct persistent_ram_zone *prz);
Since the function accepts just one bit, we can use the switch construction instead of if/else if/...
Just a cosmetic change, there should be no functional changes.
Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org --- kernel/trace/trace_functions.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c index 13770ab..a426f41 100644 --- a/kernel/trace/trace_functions.c +++ b/kernel/trace/trace_functions.c @@ -217,10 +217,11 @@ static void tracing_stop_function_trace(void)
static int func_set_flag(u32 old_flags, u32 bit, int set) { - if (bit == TRACE_FUNC_OPT_STACK) { + switch (bit) { + case TRACE_FUNC_OPT_STACK: /* do nothing if already set */ if (!!set == !!(func_flags.val & TRACE_FUNC_OPT_STACK)) - return 0; + break;
if (set) { unregister_ftrace_function(&trace_ops); @@ -230,12 +231,14 @@ static int func_set_flag(u32 old_flags, u32 bit, int set) register_ftrace_function(&trace_ops); }
- return 0; - } else if (bit == TRACE_FUNC_OPT_PSTORE) { - return 0; + break; + case TRACE_FUNC_OPT_PSTORE: + break; + default: + return -EINVAL; }
- return -EINVAL; + return 0; }
static struct tracer function_trace __read_mostly =
On Tuesday 10 July 2012, Anton Vorontsov wrote:
With this support kernel can save functions call chain log into a persistent ram buffer that can be decoded and dumped after reboot through pstore filesystem. It can be used to determine what function was last called before a hang or an unexpected reset (caused by, for example, a buggy driver that abuses HW).
I just looked at the patches, and they are all well implemented. I could find nothing wrong with the implementation. As to the question of whether we want this functionality I have to leave that to the tracing people.
Arnd
On Tue, 2012-07-10 at 12:01 +0000, Arnd Bergmann wrote:
On Tuesday 10 July 2012, Anton Vorontsov wrote:
With this support kernel can save functions call chain log into a persistent ram buffer that can be decoded and dumped after reboot through pstore filesystem. It can be used to determine what function was last called before a hang or an unexpected reset (caused by, for example, a buggy driver that abuses HW).
I just looked at the patches, and they are all well implemented. I could find nothing wrong with the implementation. As to the question of whether we want this functionality I have to leave that to the tracing people.
I'm fine with it. You can add my:
Acked-by: Steven Rostedt rostedt@goodmis.org
to the tracing specific patches (I haven't taken a deep look at the pstore specific patches).
-- Steve
Hi Greg,
On Fri, Jun 15, 2012 at 02:28:24PM -0700, Greg Kroah-Hartman wrote: [...]
This looks fine to me, but as it touches pstore and tracing code, I can't apply it without acks from the relevant maintainers/owners.
Now that the patchset has an ack from Steven Rostedt, can you please apply it? I know, it might be a bit late, but we still have a week (or more :-), plus the patches are really not that intrusive as the initial pstore rework; now it's just small feature additions.
The kernel/trace/ part is also small and unlikely to cause any issues. In a sensitive place, it just adds pstore hook, and the rest is just some code for the sysfs option.
Please note that these patches go on top of another series, i.e.
[PATCH resend 0/3] pstore/ram: Configurable ECC size
^ This is a new feature plus two cleanups, the ECC series has an Ack from Kees Cook.
Thanks!
On Mon, Jul 16, 2012 at 12:14:21AM -0700, Anton Vorontsov wrote:
Hi Greg,
On Fri, Jun 15, 2012 at 02:28:24PM -0700, Greg Kroah-Hartman wrote: [...]
This looks fine to me, but as it touches pstore and tracing code, I can't apply it without acks from the relevant maintainers/owners.
Now that the patchset has an ack from Steven Rostedt, can you please apply it? I know, it might be a bit late, but we still have a week (or more :-), plus the patches are really not that intrusive as the initial pstore rework; now it's just small feature additions.
Ok, I'll queue it up later today or tomorrow.
greg k-h
linaro-kernel@lists.linaro.org