Events Tracing infrastructure contains lot of files, directories (internally in terms of inodes, dentries). And ends up by consuming memory in MBs. We can have multiple events of Events Tracing, which further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and skip the creation of inodes/dentries. As and when require, eventfs will create the inodes/dentries only for required files/directories. Also eventfs would delete the inodes/dentries once no more requires but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB for ~10K files/dir.
v3: Patch 3,4,5,7,9: removed all the eventfs_rwsem code and replaced it with an srcu lock for the readers, and a mutex to synchronize the writers of the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really dangerous to have unchecked recursion in the kernel (we do have a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods are done, it adds the eventfs_file onto a llist (lockless link list) and wakes up a work queue. Then the work queue does the freeing (this needs to be done in task/workqueue context, as srcu callbacks are done in softirq context).
Patch 6: renamed: eventfs_create_file() -> create_file() eventfs_create_dir() -> create_dir()
v2: Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM' Patch 02: moved from v1 1/9 Patch 03: moved from v1 2/9 As suggested by Zheng Yejian, introduced eventfs_prepare_ef() helper function to add files or directories to eventfs fix WARNING reported by kernel test robot in v1 8/9 Patch 04: moved from v1 3/9 used eventfs_prepare_ef() to add files fix WARNING reported by kernel test robot in v1 8/9 Patch 05: moved from v1 4/9 fix compiling warning reported by kernel test robot in v1 4/9 Patch 06: moved from v1 5/9 Patch 07: moved from v1 6/9 Patch 08: moved from v1 7/9 Patch 09: moved from v1 8/9 rebased because of v3 01/10 Patch 10: moved from v1 9/9
v1: Patch 1: add header file Patch 2: resolved kernel test robot issues protecting eventfs lists using nested eventfs_rwsem Patch 3: protecting eventfs lists using nested eventfs_rwsem Patch 4: improve events cleanup code to fix crashes Patch 5: resolved kernel test robot issues removed d_instantiate_anon() calls Patch 6: resolved kernel test robot issues fix kprobe test in eventfs_root_lookup() protecting eventfs lists using nested eventfs_rwsem Patch 7: remove header file Patch 8: pass eventfs_rwsem as argument to eventfs functions called eventfs_remove_events_dir() instead of tracefs_remove() from event_trace_del_tracer() Patch 9: new patch to fix kprobe test case
Ajay Kaher (9): tracefs: Rename some tracefs function eventfs: Implement eventfs dir creation functions eventfs: Implement eventfs file add functions eventfs: Implement eventfs file, directory remove function eventfs: Implement functions to create eventfs files and directories eventfs: Implement eventfs lookup, read, open functions eventfs: Implement tracefs_inode_cache eventfs: Move tracing/events to eventfs test: ftrace: Fix kprobe test for eventfs
Steven Rostedt (Google) (1): tracing: Require all trace events to have a TRACE_SYSTEM
fs/tracefs/Makefile | 1 + fs/tracefs/event_inode.c | 711 ++++++++++++++++++ fs/tracefs/inode.c | 124 ++- fs/tracefs/internal.h | 25 + include/linux/trace_events.h | 1 + include/linux/tracefs.h | 32 + kernel/trace/trace.h | 2 +- kernel/trace/trace_events.c | 78 +- .../ftrace/test.d/kprobe/kprobe_args_char.tc | 4 +- .../test.d/kprobe/kprobe_args_string.tc | 4 +- 10 files changed, 930 insertions(+), 52 deletions(-) create mode 100644 fs/tracefs/event_inode.c create mode 100644 fs/tracefs/internal.h
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The creation of the trace event directory requires that a TRACE_SYSTEM is defined that the trace event directory is added within the system it was defined in.
The code handled the case where a TRACE_SYSTEM was not added, and would then add the event at the events directory. But nothing should be doing this. This code also prevents the implementation of creating dynamic dentrys for the eventfs system.
As this path has never been hit on correct code, remove it. If it does get hit, issues a WARN_ON_ONCE() and return ENODEV.
Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Ajay Kaher akaher@vmware.com --- kernel/trace/trace_events.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 57e539d47989..12ed71428939 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -2426,14 +2426,15 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
/* * If the trace point header did not define TRACE_SYSTEM - * then the system would be called "TRACE_SYSTEM". + * then the system would be called "TRACE_SYSTEM". This should + * never happen. */ - if (strcmp(call->class->system, TRACE_SYSTEM) != 0) { - d_events = event_subsystem_dir(tr, call->class->system, file, parent); - if (!d_events) - return -ENOMEM; - } else - d_events = parent; + if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0)) + return -ENODEV; + + d_events = event_subsystem_dir(tr, call->class->system, file, parent); + if (!d_events) + return -ENOMEM;
name = trace_event_name(call); file->dir = tracefs_create_dir(name, d_events);
Renaming following functions as these would require by eventfs as well:
start_creating -> tracefs_start_creating failed_creating -> tracefs_failed_creating end_creating -> tracefs_end_creating
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- fs/tracefs/inode.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 57ac8aa4a724..b0348efc0238 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = { }; MODULE_ALIAS_FS("tracefs");
-static struct dentry *start_creating(const char *name, struct dentry *parent) +static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent) { struct dentry *dentry; int error; @@ -437,7 +437,7 @@ static struct dentry *start_creating(const char *name, struct dentry *parent) return dentry; }
-static struct dentry *failed_creating(struct dentry *dentry) +static struct dentry *tracefs_failed_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); dput(dentry); @@ -445,7 +445,7 @@ static struct dentry *failed_creating(struct dentry *dentry) return NULL; }
-static struct dentry *end_creating(struct dentry *dentry) +static struct dentry *tracefs_end_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); return dentry; @@ -490,14 +490,14 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, if (!(mode & S_IFMT)) mode |= S_IFREG; BUG_ON(!S_ISREG(mode)); - dentry = start_creating(name, parent); + dentry = tracefs_start_creating(name, parent);
if (IS_ERR(dentry)) return NULL;
inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) - return failed_creating(dentry); + return tracefs_failed_creating(dentry);
inode->i_mode = mode; inode->i_fop = fops ? fops : &tracefs_file_operations; @@ -506,13 +506,13 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, inode->i_gid = d_inode(dentry->d_parent)->i_gid; d_instantiate(dentry, inode); fsnotify_create(d_inode(dentry->d_parent), dentry); - return end_creating(dentry); + return tracefs_end_creating(dentry); }
static struct dentry *__create_dir(const char *name, struct dentry *parent, const struct inode_operations *ops) { - struct dentry *dentry = start_creating(name, parent); + struct dentry *dentry = tracefs_start_creating(name, parent); struct inode *inode;
if (IS_ERR(dentry)) @@ -520,7 +520,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) - return failed_creating(dentry); + return tracefs_failed_creating(dentry);
/* Do not set bits for OTH */ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP; @@ -534,7 +534,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent, d_instantiate(dentry, inode); inc_nlink(d_inode(dentry->d_parent)); fsnotify_mkdir(d_inode(dentry->d_parent), dentry); - return end_creating(dentry); + return tracefs_end_creating(dentry); }
/**
Some nits.
First, the subject should be:
tracefs: Rename and export some tracefs functions
Yes, we should export them here too, even though they are not used yet.
On Thu, 13 Jul 2023 17:03:16 +0530 Ajay Kaher akaher@vmware.com wrote:
Renaming following functions as these would require by eventfs as well:
I would have this state:
Export a few tracefs functions that will be needed by the eventfs dynamic file system. Rename them to start with "tracefs_" to keep with the name space.
start_creating -> tracefs_start_creating failed_creating -> tracefs_failed_creating end_creating -> tracefs_end_creating
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com
fs/tracefs/inode.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 57ac8aa4a724..b0348efc0238 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = { }; MODULE_ALIAS_FS("tracefs"); -static struct dentry *start_creating(const char *name, struct dentry *parent) +static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
Remove the static, and create the fs/tracefs/internal.h file in this patch.
-- Steve
{ struct dentry *dentry; int error; @@ -437,7 +437,7 @@ static struct dentry *start_creating(const char *name, struct dentry *parent) return dentry; } -static struct dentry *failed_creating(struct dentry *dentry) +static struct dentry *tracefs_failed_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); dput(dentry); @@ -445,7 +445,7 @@ static struct dentry *failed_creating(struct dentry *dentry) return NULL; } -static struct dentry *end_creating(struct dentry *dentry) +static struct dentry *tracefs_end_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); return dentry; @@ -490,14 +490,14 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, if (!(mode & S_IFMT)) mode |= S_IFREG; BUG_ON(!S_ISREG(mode));
- dentry = start_creating(name, parent);
- dentry = tracefs_start_creating(name, parent);
if (IS_ERR(dentry)) return NULL; inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode))
return failed_creating(dentry);
return tracefs_failed_creating(dentry);
inode->i_mode = mode; inode->i_fop = fops ? fops : &tracefs_file_operations; @@ -506,13 +506,13 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, inode->i_gid = d_inode(dentry->d_parent)->i_gid; d_instantiate(dentry, inode); fsnotify_create(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
- return tracefs_end_creating(dentry);
} static struct dentry *__create_dir(const char *name, struct dentry *parent, const struct inode_operations *ops) {
- struct dentry *dentry = start_creating(name, parent);
- struct dentry *dentry = tracefs_start_creating(name, parent); struct inode *inode;
if (IS_ERR(dentry)) @@ -520,7 +520,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent, inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode))
return failed_creating(dentry);
return tracefs_failed_creating(dentry);
/* Do not set bits for OTH */ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP; @@ -534,7 +534,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent, d_instantiate(dentry, inode); inc_nlink(d_inode(dentry->d_parent)); fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
- return tracefs_end_creating(dentry);
} /**
Adding eventfs_file structure which will hold properties of file or dir.
Adding following functions to add dir in eventfs:
eventfs_create_events_dir() will directly create events dir within tracing folder.
eventfs_add_subsystem_dir() adds the information of subsystem_dir to eventfs, and dynamically creates subsystem_dir directories when they are accessed.
eventfs_add_dir() adds the information of the dir, within a subsystem_dir, to eventfs and dynamically creates these directories when they are accessed.
Adding tracefs_inode structure, this will help eventfs to keep track of inode, flags and pointer to private date.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com --- fs/tracefs/Makefile | 1 + fs/tracefs/event_inode.c | 217 +++++++++++++++++++++++++++++++++++++++ fs/tracefs/inode.c | 8 +- fs/tracefs/internal.h | 25 +++++ include/linux/tracefs.h | 11 ++ 5 files changed, 258 insertions(+), 4 deletions(-) create mode 100644 fs/tracefs/event_inode.c create mode 100644 fs/tracefs/internal.h
diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile index 7c35a282b484..73c56da8e284 100644 --- a/fs/tracefs/Makefile +++ b/fs/tracefs/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only tracefs-objs := inode.o +tracefs-objs += event_inode.o
obj-$(CONFIG_TRACING) += tracefs.o
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c new file mode 100644 index 000000000000..4e7a8eccaa0b --- /dev/null +++ b/fs/tracefs/event_inode.c @@ -0,0 +1,217 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * event_inode.c - part of tracefs, a pseudo file system for activating tracing + * + * Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) rostedt@goodmis.org + * Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher akaher@vmware.com + * + * eventfs is used to show trace events with one set of dentries + * + * eventfs stores meta-data of files/dirs and skip to create object of + * inodes/dentries. As and when requires, eventfs will create the + * inodes/dentries for only required files/directories. Also eventfs + * would delete the inodes/dentries once no more requires but preserve + * the meta data. + */ +#include <linux/fsnotify.h> +#include <linux/fs.h> +#include <linux/namei.h> +#include <linux/workqueue.h> +#include <linux/security.h> +#include <linux/tracefs.h> +#include <linux/kref.h> +#include <linux/delay.h> +#include "internal.h" + +struct eventfs_inode { + struct list_head e_top_files; +}; + +struct eventfs_file { + const char *name; + struct dentry *d_parent; + struct dentry *dentry; + struct list_head list; + struct eventfs_inode *ei; + const struct file_operations *fop; + const struct inode_operations *iop; + union { + struct rcu_head rcu; + struct llist_node llist; /* For freeing after RCU */ + }; + void *data; + umode_t mode; + bool created; +}; + +static DEFINE_MUTEX(eventfs_mutex); + +static const struct file_operations eventfs_file_operations = { +}; + +static const struct inode_operations eventfs_root_dir_inode_operations = { +}; + +/** + * eventfs_prepare_ef - helper function to prepare eventfs_file + * @name: the name of the file/directory to create. + * @mode: the permission that the file should have. + * @fop: struct file_operations that should be used for this file/directory. + * @iop: struct inode_operations that should be used for this file/directory. + * @data: something that the caller will want to get to later on. The + * inode.i_private pointer will point to this value on the open() call. + * + * This function allocates and fills the eventfs_file structure. + */ +static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode, + const struct file_operations *fop, + const struct inode_operations *iop, + void *data) +{ + struct eventfs_file *ef; + + ef = kzalloc(sizeof(*ef), GFP_KERNEL); + if (!ef) + return ERR_PTR(-ENOMEM); + + ef->name = kstrdup(name, GFP_KERNEL); + if (!ef->name) { + kfree(ef); + return ERR_PTR(-ENOMEM); + } + + if (S_ISDIR(mode)) { + ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL); + if (!ef->ei) { + kfree(ef->name); + kfree(ef); + return ERR_PTR(-ENOMEM); + } + INIT_LIST_HEAD(&ef->ei->e_top_files); + } else { + ef->ei = NULL; + } + + ef->iop = iop; + ef->fop = fop; + ef->mode = mode; + ef->data = data; + return ef; +} + +/** + * eventfs_create_events_dir - create the trace event structure + * @name: the name of the directory to create. + * @parent: parent dentry for this file. This should be a directory dentry + * if set. If this parameter is NULL, then the directory will be + * created in the root of the tracefs filesystem. + * + * This function creates the top of the trace event directory. + */ +struct dentry *eventfs_create_events_dir(const char *name, + struct dentry *parent) +{ + struct dentry *dentry = tracefs_start_creating(name, parent); + struct eventfs_inode *ei; + struct tracefs_inode *ti; + struct inode *inode; + + if (IS_ERR(dentry)) + return dentry; + + ei = kzalloc(sizeof(*ei), GFP_KERNEL); + if (!ei) + return ERR_PTR(-ENOMEM); + inode = tracefs_get_inode(dentry->d_sb); + if (unlikely(!inode)) { + kfree(ei); + tracefs_failed_creating(dentry); + return ERR_PTR(-ENOMEM); + } + + INIT_LIST_HEAD(&ei->e_top_files); + + ti = get_tracefs(inode); + ti->flags |= TRACEFS_EVENT_INODE; + ti->private = ei; + + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; + inode->i_op = &eventfs_root_dir_inode_operations; + inode->i_fop = &eventfs_file_operations; + + /* directory inodes start off with i_nlink == 2 (for "." entry) */ + inc_nlink(inode); + d_instantiate(dentry, inode); + inc_nlink(dentry->d_parent->d_inode); + fsnotify_mkdir(dentry->d_parent->d_inode, dentry); + return tracefs_end_creating(dentry); +} + +/** + * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later + * @name: the name of the file to create. + * @parent: parent dentry for this dir. + * + * This function adds eventfs subsystem dir to list. + * And all these dirs are created on the fly when they are looked up, + * and the dentry and inodes will be removed when they are done. + */ +struct eventfs_file *eventfs_add_subsystem_dir(const char *name, + struct dentry *parent) +{ + struct tracefs_inode *ti_parent; + struct eventfs_inode *ei_parent; + struct eventfs_file *ef; + + if (!parent) + return ERR_PTR(-EINVAL); + + ti_parent = get_tracefs(parent->d_inode); + ei_parent = ti_parent->private; + + ef = eventfs_prepare_ef(name, + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, + &eventfs_file_operations, + &eventfs_root_dir_inode_operations, NULL); + + if (IS_ERR(ef)) + return ef; + + mutex_lock(&eventfs_mutex); + list_add_tail(&ef->list, &ei_parent->e_top_files); + ef->d_parent = parent; + mutex_unlock(&eventfs_mutex); + return ef; +} + +/** + * eventfs_add_dir - add eventfs dir to list to create later + * @name: the name of the file to create. + * @ef_parent: parent eventfs_file for this dir. + * + * This function adds eventfs dir to list. + * And all these dirs are created on the fly when they are looked up, + * and the dentry and inodes will be removed when they are done. + */ +struct eventfs_file *eventfs_add_dir(const char *name, + struct eventfs_file *ef_parent) +{ + struct eventfs_file *ef; + + if (!ef_parent) + return ERR_PTR(-EINVAL); + + ef = eventfs_prepare_ef(name, + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, + &eventfs_file_operations, + &eventfs_root_dir_inode_operations, NULL); + + if (IS_ERR(ef)) + return ef; + + mutex_lock(&eventfs_mutex); + list_add_tail(&ef->list, &ef_parent->ei->e_top_files); + ef->d_parent = ef_parent->dentry; + mutex_unlock(&eventfs_mutex); + return ef; +} diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index b0348efc0238..7ef3a02766f5 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -127,7 +127,7 @@ static const struct inode_operations tracefs_dir_inode_operations = { .rmdir = tracefs_syscall_rmdir, };
-static struct inode *tracefs_get_inode(struct super_block *sb) +struct inode *tracefs_get_inode(struct super_block *sb) { struct inode *inode = new_inode(sb); if (inode) { @@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = { }; MODULE_ALIAS_FS("tracefs");
-static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent) +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent) { struct dentry *dentry; int error; @@ -437,7 +437,7 @@ static struct dentry *tracefs_start_creating(const char *name, struct dentry *pa return dentry; }
-static struct dentry *tracefs_failed_creating(struct dentry *dentry) +struct dentry *tracefs_failed_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); dput(dentry); @@ -445,7 +445,7 @@ static struct dentry *tracefs_failed_creating(struct dentry *dentry) return NULL; }
-static struct dentry *tracefs_end_creating(struct dentry *dentry) +struct dentry *tracefs_end_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); return dentry; diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h new file mode 100644 index 000000000000..c443a0c32a8c --- /dev/null +++ b/fs/tracefs/internal.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TRACEFS_INTERNAL_H +#define _TRACEFS_INTERNAL_H + +enum { + TRACEFS_EVENT_INODE = BIT(1), +}; + +struct tracefs_inode { + unsigned long flags; + void *private; + struct inode vfs_inode; +}; + +static inline struct tracefs_inode *get_tracefs(const struct inode *inode) +{ + return container_of(inode, struct tracefs_inode, vfs_inode); +} + +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent); +struct dentry *tracefs_end_creating(struct dentry *dentry); +struct dentry *tracefs_failed_creating(struct dentry *dentry); +struct inode *tracefs_get_inode(struct super_block *sb); + +#endif /* _TRACEFS_INTERNAL_H */ diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 99912445974c..432e5e6f7901 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -21,6 +21,17 @@ struct file_operations;
#ifdef CONFIG_TRACING
+struct eventfs_file; + +struct dentry *eventfs_create_events_dir(const char *name, + struct dentry *parent); + +struct eventfs_file *eventfs_add_subsystem_dir(const char *name, + struct dentry *parent); + +struct eventfs_file *eventfs_add_dir(const char *name, + struct eventfs_file *ef_parent); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
Some more nits.
On Thu, 13 Jul 2023 17:03:17 +0530 Ajay Kaher akaher@vmware.com wrote:
Adding eventfs_file structure which will hold properties of file or dir.
Add eventfs_file structure which will hold the properties of the eventfs files and directories.
Adding following functions to add dir in eventfs:
Add following functions to create the directories in eventfs:
eventfs_create_events_dir() will directly create events dir within tracing folder.
eventfs_create_events_dir() will create the top level "events" directory within the tracefs file system.
eventfs_add_subsystem_dir() adds the information of subsystem_dir to eventfs, and dynamically creates subsystem_dir directories when they are accessed.
eventfs_add_subsystem_dir() creates an eventfs_file descriptor with the given name of the subsystem.
eventfs_add_dir() adds the information of the dir, within a subsystem_dir, to eventfs and dynamically creates these directories when they are accessed.
eventfs_add_dir() creates an eventfs_file descriptor with the given name of the directory and attached to a eventfs_file of a subsystem.
Adding tracefs_inode structure, this will help eventfs to keep track of inode, flags and pointer to private date.
Add tracefs_inode structure to hold the inodes, flags and pointers to private data used by eventfs.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
fs/tracefs/Makefile | 1 + fs/tracefs/event_inode.c | 217 +++++++++++++++++++++++++++++++++++++++ fs/tracefs/inode.c | 8 +- fs/tracefs/internal.h | 25 +++++ include/linux/tracefs.h | 11 ++ 5 files changed, 258 insertions(+), 4 deletions(-) create mode 100644 fs/tracefs/event_inode.c create mode 100644 fs/tracefs/internal.h
diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile index 7c35a282b484..73c56da8e284 100644 --- a/fs/tracefs/Makefile +++ b/fs/tracefs/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only tracefs-objs := inode.o +tracefs-objs += event_inode.o obj-$(CONFIG_TRACING) += tracefs.o diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c new file mode 100644 index 000000000000..4e7a8eccaa0b --- /dev/null +++ b/fs/tracefs/event_inode.c @@ -0,0 +1,217 @@ +// SPDX-License-Identifier: GPL-2.0-only +/*
- event_inode.c - part of tracefs, a pseudo file system for activating tracing
- Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) rostedt@goodmis.org
- Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher akaher@vmware.com
- eventfs is used to show trace events with one set of dentries
eventfs is used to dynamically create inodes and dentries based on the meta data provided by the tracing system.
- eventfs stores meta-data of files/dirs and skip to create object of
- inodes/dentries. As and when requires, eventfs will create the
- inodes/dentries for only required files/directories. Also eventfs
- would delete the inodes/dentries once no more requires but preserve
- the meta data.
eventfs stores the meta-data of files/dirs and holds off on creating inodes/dentries of the files. When accessed, the eventfs will create the inodes/dentries in a just-in-time (JIT) manner. The eventfs will clean up and delete the inodes/dentries when they are no longer referenced.
- */
+#include <linux/fsnotify.h> +#include <linux/fs.h> +#include <linux/namei.h> +#include <linux/workqueue.h> +#include <linux/security.h> +#include <linux/tracefs.h> +#include <linux/kref.h> +#include <linux/delay.h> +#include "internal.h"
+struct eventfs_inode {
- struct list_head e_top_files;
+};
We probably want to document the below structure, and only add fields as they are added by the patches.
+struct eventfs_file {
- const char *name;
- struct dentry *d_parent;
- struct dentry *dentry;
dentry is never assigned, although at the end in eventfs_add_dir() we have:
ef->d_parent = ef_parent->dentry;
Both the above assignment and entry in the structure should be removed from this patch. It makes it easier to review. Because that assignment is meaningless. When the above line is added in later patches, it should have meaning.
When breaking up patches, you don't just want to cut and paste in blocks of functions. You want to build the functionality, so the added code makes sense.
- struct list_head list;
- struct eventfs_inode *ei;
- const struct file_operations *fop;
- const struct inode_operations *iop;
- union {
struct rcu_head rcu;
struct llist_node llist; /* For freeing after RCU */
- };
The above union should be added when the clean up code is added.
- void *data;
- umode_t mode;
- bool created;
The "created" field should be added when its use case is added.
+};
+static DEFINE_MUTEX(eventfs_mutex);
+static const struct file_operations eventfs_file_operations = { +};
+static const struct inode_operations eventfs_root_dir_inode_operations = { +};
+/**
- eventfs_prepare_ef - helper function to prepare eventfs_file
- @name: the name of the file/directory to create.
- @mode: the permission that the file should have.
- @fop: struct file_operations that should be used for this file/directory.
- @iop: struct inode_operations that should be used for this file/directory.
- @data: something that the caller will want to get to later on. The
inode.i_private pointer will point to this value on the open() call.
- This function allocates and fills the eventfs_file structure.
- */
+static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
const struct file_operations *fop,
const struct inode_operations *iop,
void *data)
+{
- struct eventfs_file *ef;
- ef = kzalloc(sizeof(*ef), GFP_KERNEL);
- if (!ef)
return ERR_PTR(-ENOMEM);
- ef->name = kstrdup(name, GFP_KERNEL);
- if (!ef->name) {
kfree(ef);
return ERR_PTR(-ENOMEM);
- }
- if (S_ISDIR(mode)) {
ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
if (!ef->ei) {
kfree(ef->name);
kfree(ef);
return ERR_PTR(-ENOMEM);
}
INIT_LIST_HEAD(&ef->ei->e_top_files);
- } else {
ef->ei = NULL;
- }
- ef->iop = iop;
- ef->fop = fop;
- ef->mode = mode;
- ef->data = data;
- return ef;
+}
+/**
- eventfs_create_events_dir - create the trace event structure
- @name: the name of the directory to create.
- @parent: parent dentry for this file. This should be a directory dentry
if set. If this parameter is NULL, then the directory will be
created in the root of the tracefs filesystem.
- This function creates the top of the trace event directory.
- */
+struct dentry *eventfs_create_events_dir(const char *name,
struct dentry *parent)
+{
- struct dentry *dentry = tracefs_start_creating(name, parent);
- struct eventfs_inode *ei;
- struct tracefs_inode *ti;
- struct inode *inode;
- if (IS_ERR(dentry))
return dentry;
- ei = kzalloc(sizeof(*ei), GFP_KERNEL);
- if (!ei)
return ERR_PTR(-ENOMEM);
- inode = tracefs_get_inode(dentry->d_sb);
- if (unlikely(!inode)) {
kfree(ei);
tracefs_failed_creating(dentry);
return ERR_PTR(-ENOMEM);
- }
- INIT_LIST_HEAD(&ei->e_top_files);
- ti = get_tracefs(inode);
- ti->flags |= TRACEFS_EVENT_INODE;
- ti->private = ei;
- inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
- inode->i_op = &eventfs_root_dir_inode_operations;
- inode->i_fop = &eventfs_file_operations;
- /* directory inodes start off with i_nlink == 2 (for "." entry) */
- inc_nlink(inode);
- d_instantiate(dentry, inode);
- inc_nlink(dentry->d_parent->d_inode);
- fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
- return tracefs_end_creating(dentry);
+}
+/**
- eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later
- @name: the name of the file to create.
- @parent: parent dentry for this dir.
- This function adds eventfs subsystem dir to list.
- And all these dirs are created on the fly when they are looked up,
- and the dentry and inodes will be removed when they are done.
- */
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
struct dentry *parent)
+{
- struct tracefs_inode *ti_parent;
- struct eventfs_inode *ei_parent;
- struct eventfs_file *ef;
- if (!parent)
return ERR_PTR(-EINVAL);
- ti_parent = get_tracefs(parent->d_inode);
- ei_parent = ti_parent->private;
- ef = eventfs_prepare_ef(name,
S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
&eventfs_file_operations,
&eventfs_root_dir_inode_operations, NULL);
- if (IS_ERR(ef))
return ef;
- mutex_lock(&eventfs_mutex);
- list_add_tail(&ef->list, &ei_parent->e_top_files);
- ef->d_parent = parent;
- mutex_unlock(&eventfs_mutex);
- return ef;
+}
+/**
- eventfs_add_dir - add eventfs dir to list to create later
- @name: the name of the file to create.
- @ef_parent: parent eventfs_file for this dir.
- This function adds eventfs dir to list.
- And all these dirs are created on the fly when they are looked up,
- and the dentry and inodes will be removed when they are done.
- */
+struct eventfs_file *eventfs_add_dir(const char *name,
struct eventfs_file *ef_parent)
+{
- struct eventfs_file *ef;
- if (!ef_parent)
return ERR_PTR(-EINVAL);
- ef = eventfs_prepare_ef(name,
S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
&eventfs_file_operations,
&eventfs_root_dir_inode_operations, NULL);
- if (IS_ERR(ef))
return ef;
- mutex_lock(&eventfs_mutex);
- list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
- ef->d_parent = ef_parent->dentry;
As mentioned above. Let's not add the above assignment yet, as the ef_parent->dentry will not be anything but NULL here.
-- Steve
- mutex_unlock(&eventfs_mutex);
- return ef;
+} diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index b0348efc0238..7ef3a02766f5 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -127,7 +127,7 @@ static const struct inode_operations tracefs_dir_inode_operations = { .rmdir = tracefs_syscall_rmdir, }; -static struct inode *tracefs_get_inode(struct super_block *sb) +struct inode *tracefs_get_inode(struct super_block *sb) { struct inode *inode = new_inode(sb); if (inode) { @@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = { }; MODULE_ALIAS_FS("tracefs"); -static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent) +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent) { struct dentry *dentry; int error; @@ -437,7 +437,7 @@ static struct dentry *tracefs_start_creating(const char *name, struct dentry *pa return dentry; } -static struct dentry *tracefs_failed_creating(struct dentry *dentry) +struct dentry *tracefs_failed_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); dput(dentry); @@ -445,7 +445,7 @@ static struct dentry *tracefs_failed_creating(struct dentry *dentry) return NULL; } -static struct dentry *tracefs_end_creating(struct dentry *dentry) +struct dentry *tracefs_end_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); return dentry; diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h new file mode 100644 index 000000000000..c443a0c32a8c --- /dev/null +++ b/fs/tracefs/internal.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TRACEFS_INTERNAL_H +#define _TRACEFS_INTERNAL_H
+enum {
- TRACEFS_EVENT_INODE = BIT(1),
+};
+struct tracefs_inode {
- unsigned long flags;
- void *private;
- struct inode vfs_inode;
+};
+static inline struct tracefs_inode *get_tracefs(const struct inode *inode) +{
- return container_of(inode, struct tracefs_inode, vfs_inode);
+}
+struct dentry *tracefs_start_creating(const char *name, struct dentry *parent); +struct dentry *tracefs_end_creating(struct dentry *dentry); +struct dentry *tracefs_failed_creating(struct dentry *dentry); +struct inode *tracefs_get_inode(struct super_block *sb);
+#endif /* _TRACEFS_INTERNAL_H */ diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 99912445974c..432e5e6f7901 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -21,6 +21,17 @@ struct file_operations; #ifdef CONFIG_TRACING +struct eventfs_file;
+struct dentry *eventfs_create_events_dir(const char *name,
struct dentry *parent);
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
struct dentry *parent);
+struct eventfs_file *eventfs_add_dir(const char *name,
struct eventfs_file *ef_parent);
struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
Adding following function to eventfs to add files:
eventfs_add_top_file() adds the info of top file to eventfs and dynamically create these files when they are accessed.
eventfs_add_file() adds the info of nested files to eventfs and dynamically create these files when they are accessed.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com --- fs/tracefs/event_inode.c | 84 ++++++++++++++++++++++++++++++++++++++++ include/linux/tracefs.h | 8 ++++ 2 files changed, 92 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 4e7a8eccaa0b..75dc8953d813 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -215,3 +215,87 @@ struct eventfs_file *eventfs_add_dir(const char *name, mutex_unlock(&eventfs_mutex); return ef; } + +/** + * eventfs_add_top_file - add event top file to list to create later + * @name: the name of the file to create. + * @mode: the permission that the file should have. + * @parent: parent dentry for this file. + * @data: something that the caller will want to get to later on. The + * inode.i_private pointer will point to this value on the open() call. + * @fop: struct file_operations that should be used for this file. + * + * This function adds top files of event dir to list. + * And all these files are created on the fly when they are looked up, + * and the dentry and inodes will be removed when they are done. + */ +int eventfs_add_top_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fop) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + + if (!parent) + return -EINVAL; + + if (!(mode & S_IFMT)) + mode |= S_IFREG; + + if (!parent->d_inode) + return -EINVAL; + + ti = get_tracefs(parent->d_inode); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return -EINVAL; + + ei = ti->private; + ef = eventfs_prepare_ef(name, mode, fop, NULL, data); + + if (IS_ERR(ef)) + return -ENOMEM; + + mutex_lock(&eventfs_mutex); + list_add_tail(&ef->list, &ei->e_top_files); + ef->d_parent = parent; + mutex_unlock(&eventfs_mutex); + return 0; +} + +/** + * eventfs_add_file - add eventfs file to list to create later + * @name: the name of the file to create. + * @mode: the permission that the file should have. + * @ef_parent: parent eventfs_file for this file. + * @data: something that the caller will want to get to later on. The + * inode.i_private pointer will point to this value on the open() call. + * @fop: struct file_operations that should be used for this file. + * + * This function adds top files of event dir to list. + * And all these files are created on the fly when they are looked up, + * and the dentry and inodes will be removed when they are done. + */ +int eventfs_add_file(const char *name, umode_t mode, + struct eventfs_file *ef_parent, + void *data, + const struct file_operations *fop) +{ + struct eventfs_file *ef; + + if (!ef_parent) + return -EINVAL; + + if (!(mode & S_IFMT)) + mode |= S_IFREG; + + ef = eventfs_prepare_ef(name, mode, fop, NULL, data); + if (IS_ERR(ef)) + return -ENOMEM; + + mutex_lock(&eventfs_mutex); + list_add_tail(&ef->list, &ef_parent->ei->e_top_files); + ef->d_parent = ef_parent->dentry; + mutex_unlock(&eventfs_mutex); + return 0; +} diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 432e5e6f7901..a51312ff803c 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -32,6 +32,14 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name, struct eventfs_file *eventfs_add_dir(const char *name, struct eventfs_file *ef_parent);
+int eventfs_add_file(const char *name, umode_t mode, + struct eventfs_file *ef_parent, void *data, + const struct file_operations *fops); + +int eventfs_add_top_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fops); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
More nits.
On Thu, 13 Jul 2023 17:03:18 +0530 Ajay Kaher akaher@vmware.com wrote:
Adding following function to eventfs to add files:
Add the following functions to add files to evenfs:
eventfs_add_top_file() adds the info of top file to eventfs and dynamically create these files when they are accessed.
I wonder if we should rename this, because I didn't really know what "top" meant. Perhaps:
eventfs_add_events_file() to add the data needed to create a specific file located at the top level events directory. The dentry/inode will be created when the events directory is scanned.
eventfs_add_file() adds the info of nested files to eventfs and dynamically create these files when they are accessed.
eventfs_add_file() to add the data needed for files within the directories below the top level events directory. The dentry/inode of the file will be created when the directory that the file is in is scanned.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
fs/tracefs/event_inode.c | 84 ++++++++++++++++++++++++++++++++++++++++ include/linux/tracefs.h | 8 ++++ 2 files changed, 92 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 4e7a8eccaa0b..75dc8953d813 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -215,3 +215,87 @@ struct eventfs_file *eventfs_add_dir(const char *name, mutex_unlock(&eventfs_mutex); return ef; }
+/**
- eventfs_add_top_file - add event top file to list to create later
I'm still thinking we should rename this to eventfs_add_events_file() to match the "eventfs_create_events_dir()".
eventfs_add_events_file - add the data needed to create file in events dir
- @name: the name of the file to create.
- @mode: the permission that the file should have.
- @parent: parent dentry for this file.
- @data: something that the caller will want to get to later on. The
inode.i_private pointer will point to this value on the open() call.
Note, for kerneldoc, it's best to have the above be one line, and then add anything else below it. * @data: something that the caller will want to get to later on.
- @fop: struct file_operations that should be used for this file.
- This function adds top files of event dir to list.
- And all these files are created on the fly when they are looked up,
- and the dentry and inodes will be removed when they are done.
* This function is used to add the information needed to create a * dentry/inode within the top level events directory. The file created * will have the @mode permissions. The @data will be used to fill the * inode.i_private when the open() call is done. The dentry and inodes are * all created when they are referenced, and removed when they are no * longer referenced.
- */
+int eventfs_add_top_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fop)
+{
- struct tracefs_inode *ti;
- struct eventfs_inode *ei;
- struct eventfs_file *ef;
- if (!parent)
return -EINVAL;
- if (!(mode & S_IFMT))
mode |= S_IFREG;
- if (!parent->d_inode)
return -EINVAL;
- ti = get_tracefs(parent->d_inode);
- if (!(ti->flags & TRACEFS_EVENT_INODE))
return -EINVAL;
- ei = ti->private;
- ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
- if (IS_ERR(ef))
return -ENOMEM;
- mutex_lock(&eventfs_mutex);
- list_add_tail(&ef->list, &ei->e_top_files);
- ef->d_parent = parent;
- mutex_unlock(&eventfs_mutex);
- return 0;
+}
+/**
- eventfs_add_file - add eventfs file to list to create later
Same idea here.
eventfs_add_file - add the data needed to create a file for later reference
- @name: the name of the file to create.
- @mode: the permission that the file should have.
- @ef_parent: parent eventfs_file for this file.
- @data: something that the caller will want to get to later on. The
inode.i_private pointer will point to this value on the open() call.
And same note about the line.
- @fop: struct file_operations that should be used for this file.
- This function adds top files of event dir to list.
Seems you have a cut and paste error here.
- And all these files are created on the fly when they are looked up,
- and the dentry and inodes will be removed when they are done.
* This function is used to add the information needed to create a * file within a subdirectory of the events directory. The file created * will have the @mode permissions. The @data will be used to fill the * inode.i_private when the open() call is done. The dentry and inodes are * all created when they are referenced, and removed when they are no * longer referenced.
-- Steve
- */
+int eventfs_add_file(const char *name, umode_t mode,
struct eventfs_file *ef_parent,
void *data,
const struct file_operations *fop)
+{
- struct eventfs_file *ef;
- if (!ef_parent)
return -EINVAL;
- if (!(mode & S_IFMT))
mode |= S_IFREG;
- ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
- if (IS_ERR(ef))
return -ENOMEM;
- mutex_lock(&eventfs_mutex);
- list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
- ef->d_parent = ef_parent->dentry;
- mutex_unlock(&eventfs_mutex);
- return 0;
+} diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 432e5e6f7901..a51312ff803c 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -32,6 +32,14 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name, struct eventfs_file *eventfs_add_dir(const char *name, struct eventfs_file *ef_parent); +int eventfs_add_file(const char *name, umode_t mode,
struct eventfs_file *ef_parent, void *data,
const struct file_operations *fops);
+int eventfs_add_top_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fops);
struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
Adding eventfs_remove(), this function will recursively remove dir or file info from eventfs.
added a recursion check to eventfs_remove_rec() as it is really dangerous to have unchecked recursion in the kernel (we do have a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods are done, it adds the eventfs_file onto a llist (lockless link list) and wakes up a work queue. Then the work queue does the freeing (this needs to be done in task/workqueue context, as srcu callbacks are done in softirq context).
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/ --- fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++ include/linux/tracefs.h | 4 ++ 2 files changed, 114 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 75dc8953d813..322a77be5a56 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -45,6 +45,7 @@ struct eventfs_file { };
static DEFINE_MUTEX(eventfs_mutex); +DEFINE_STATIC_SRCU(eventfs_srcu);
static const struct file_operations eventfs_file_operations = { }; @@ -299,3 +300,112 @@ int eventfs_add_file(const char *name, umode_t mode, mutex_unlock(&eventfs_mutex); return 0; } + +static LLIST_HEAD(free_list); + +static void eventfs_workfn(struct work_struct *work) +{ + struct eventfs_file *ef, *tmp; + struct llist_node *llnode; + + llnode = llist_del_all(&free_list); + llist_for_each_entry_safe(ef, tmp, llnode, llist) { + if (ef->created && ef->dentry) + dput(ef->dentry); + kfree(ef->name); + kfree(ef->ei); + kfree(ef); + } +} + +DECLARE_WORK(eventfs_work, eventfs_workfn); + +static void free_ef(struct rcu_head *head) +{ + struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu); + + if (!llist_add(&ef->llist, &free_list)) + return; + + queue_work(system_unbound_wq, &eventfs_work); +} + + + +/** + * eventfs_remove_rec - remove eventfs dir or file from list + * @ef: eventfs_file to be removed. + * + * This function recursively remove eventfs_file which + * contains info of file or dir. + */ +static void eventfs_remove_rec(struct eventfs_file *ef, int level) +{ + struct eventfs_file *ef_child; + + if (!ef) + return; + /* + * Check recursion depth. It should never be greater than 3: + * 0 - events/ + * 1 - events/group/ + * 2 - events/group/event/ + * 3 - events/group/event/file + */ + if (WARN_ON_ONCE(level > 3)) + return; + + if (ef->ei) { + /* search for nested folders or files */ + list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, + lockdep_is_held(&eventfs_mutex)) { + eventfs_remove_rec(ef_child, level + 1); + } + } + + if (ef->created && ef->dentry) + d_invalidate(ef->dentry); + + list_del_rcu(&ef->list); + call_srcu(&eventfs_srcu, &ef->rcu, free_ef); +} + +/** + * eventfs_remove - remove eventfs dir or file from list + * @ef: eventfs_file to be removed. + * + * This function acquire the eventfs_mutex lock and call eventfs_remove_rec() + */ +void eventfs_remove(struct eventfs_file *ef) +{ + if (!ef) + return; + + mutex_lock(&eventfs_mutex); + eventfs_remove_rec(ef, 0); + mutex_unlock(&eventfs_mutex); +} + +/** + * eventfs_remove_events_dir - remove eventfs dir or file from list + * @dentry: events's dentry to be removed. + * + * This function remove events main directory + */ +void eventfs_remove_events_dir(struct dentry *dentry) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + + if (!dentry || !dentry->d_inode) + return; + + ti = get_tracefs(dentry->d_inode); + if (!ti || !(ti->flags & TRACEFS_EVENT_INODE)) + return; + + ei = ti->private; + d_invalidate(dentry); + dput(dentry); + kfree(ei); +} diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index a51312ff803c..2c08edd4a739 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -40,6 +40,10 @@ int eventfs_add_top_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
+void eventfs_remove(struct eventfs_file *ef); + +void eventfs_remove_events_dir(struct dentry *dentry); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
Some more nits:
Subject:
eventfs: Implement removal of meta data from eventfs
On Thu, 13 Jul 2023 17:03:19 +0530 Ajay Kaher akaher@vmware.com wrote:
Adding eventfs_remove(), this function will recursively remove dir or file info from eventfs.
When events are removed from tracefs, the eventfs must be aware of this. The eventfs_remove() removes the meta data from eventfs so that it will no longer create the files associated with that event.
When an instance is removed from tracefs, eventfs_remove_events_dir() will remove and clean up the entire "events" directory.
added a recursion check to eventfs_remove_rec() as it is really dangerous to have unchecked recursion in the kernel (we do have a fixed size stack).
The above doesn't need to be in the change log. It's an update from the previous version. The eventfs_remove_rec() is added here, so the recursion check was not.
have the free use srcu callbacks. After the srcu grace periods are done, it adds the eventfs_file onto a llist (lockless link list) and wakes up a work queue. Then the work queue does the freeing (this needs to be done in task/workqueue context, as srcu callbacks are done in softirq context).
If you want to document the above, we can say:
The helper function eventfs_remove_rec() is used to clean up and free the associated data from eventfs for both of the added functions. SRCU is used to protect the lists of meta data stored in the eventfs. The eventfs_mutex is used to protect the content of the items in the list.
As lookups may be happening as deletions of events are made, the freeing of of dentry/inodes and relative information is done after the SRCU grace period has passed. As the callback of SRCU is in a softirq context, a work queue is added to perform the cleanups in a task context.
The struct evenfs_file is given a union of an rcu_head and a llist_node. The SRCU callback uses the rcu_head from this structure to insert it into the SRCU queue. When the SRCU grace periods are complete, the callback will then insert the eventfs_file struct onto a lockless llist using the llist_node of the structure. A union is used as this process is just a hand off from SRCU to workqueue, and only one field is necessary for this to work.
You can also add the above as a comment in the code (and keep it in the change log as well).
-- Steve
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/
fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++ include/linux/tracefs.h | 4 ++ 2 files changed, 114 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 75dc8953d813..322a77be5a56 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -45,6 +45,7 @@ struct eventfs_file { }; static DEFINE_MUTEX(eventfs_mutex); +DEFINE_STATIC_SRCU(eventfs_srcu); static const struct file_operations eventfs_file_operations = { }; @@ -299,3 +300,112 @@ int eventfs_add_file(const char *name, umode_t mode, mutex_unlock(&eventfs_mutex); return 0; }
+static LLIST_HEAD(free_list);
+static void eventfs_workfn(struct work_struct *work) +{
- struct eventfs_file *ef, *tmp;
- struct llist_node *llnode;
- llnode = llist_del_all(&free_list);
- llist_for_each_entry_safe(ef, tmp, llnode, llist) {
if (ef->created && ef->dentry)
dput(ef->dentry);
kfree(ef->name);
kfree(ef->ei);
kfree(ef);
- }
+}
+DECLARE_WORK(eventfs_work, eventfs_workfn);
+static void free_ef(struct rcu_head *head) +{
- struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
- if (!llist_add(&ef->llist, &free_list))
return;
- queue_work(system_unbound_wq, &eventfs_work);
+}
+/**
- eventfs_remove_rec - remove eventfs dir or file from list
- @ef: eventfs_file to be removed.
- This function recursively remove eventfs_file which
- contains info of file or dir.
- */
+static void eventfs_remove_rec(struct eventfs_file *ef, int level) +{
- struct eventfs_file *ef_child;
- if (!ef)
return;
- /*
* Check recursion depth. It should never be greater than 3:
* 0 - events/
* 1 - events/group/
* 2 - events/group/event/
* 3 - events/group/event/file
*/
- if (WARN_ON_ONCE(level > 3))
return;
- if (ef->ei) {
/* search for nested folders or files */
list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
lockdep_is_held(&eventfs_mutex)) {
eventfs_remove_rec(ef_child, level + 1);
}
- }
- if (ef->created && ef->dentry)
d_invalidate(ef->dentry);
- list_del_rcu(&ef->list);
- call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+}
+/**
- eventfs_remove - remove eventfs dir or file from list
- @ef: eventfs_file to be removed.
- This function acquire the eventfs_mutex lock and call eventfs_remove_rec()
- */
+void eventfs_remove(struct eventfs_file *ef) +{
- if (!ef)
return;
- mutex_lock(&eventfs_mutex);
- eventfs_remove_rec(ef, 0);
- mutex_unlock(&eventfs_mutex);
+}
+/**
- eventfs_remove_events_dir - remove eventfs dir or file from list
- @dentry: events's dentry to be removed.
- This function remove events main directory
- */
+void eventfs_remove_events_dir(struct dentry *dentry) +{
- struct tracefs_inode *ti;
- struct eventfs_inode *ei;
- if (!dentry || !dentry->d_inode)
return;
- ti = get_tracefs(dentry->d_inode);
- if (!ti || !(ti->flags & TRACEFS_EVENT_INODE))
return;
- ei = ti->private;
- d_invalidate(dentry);
- dput(dentry);
- kfree(ei);
+} diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index a51312ff803c..2c08edd4a739 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -40,6 +40,10 @@ int eventfs_add_top_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); +void eventfs_remove(struct eventfs_file *ef);
+void eventfs_remove_events_dir(struct dentry *dentry);
struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
Adding create_file(), create_dir() to create file, dir at runtime when they are accessed.
These function will be called either from lookup of inode_operations or open of file_operations.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++ fs/tracefs/inode.c | 47 +++++++++++++++++ include/linux/tracefs.h | 7 +++ 3 files changed, 164 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 322a77be5a56..34b5d3d8005b 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -46,6 +46,116 @@ struct eventfs_file {
static DEFINE_MUTEX(eventfs_mutex); DEFINE_STATIC_SRCU(eventfs_srcu); +/** + * create_file - create a file in the tracefs filesystem + * @name: the name of the file to create. + * @mode: the permission that the file should have. + * @parent: parent dentry for this file. + * @data: something that the caller will want to get to later on. + * The inode.i_private pointer will point to this value on + * the open() call. + * @fop: struct file_operations that should be used for this file. + * + * This is the basic "create a file" function for tracefs. It allows for a + * wide range of flexibility in creating a file. + * + * This function will return a pointer to a dentry if it succeeds. This + * pointer must be passed to the tracefs_remove() function when the file is + * to be removed (no automatic cleanup happens if your module is unloaded, + * you are responsible here.) If an error occurs, %NULL will be returned. + * + * If tracefs is not enabled in the kernel, the value -%ENODEV will be + * returned. + */ +struct dentry *create_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fop) +{ + struct tracefs_inode *ti; + struct dentry *dentry; + struct inode *inode; + + if (!(mode & S_IFMT)) + mode |= S_IFREG; + + if (WARN_ON_ONCE(!S_ISREG(mode))) + return NULL; + + dentry = eventfs_start_creating(name, parent); + + if (IS_ERR(dentry)) + return dentry; + + inode = tracefs_get_inode(dentry->d_sb); + if (unlikely(!inode)) + return eventfs_failed_creating(dentry); + + inode->i_mode = mode; + inode->i_fop = fop; + inode->i_private = data; + + ti = get_tracefs(inode); + ti->flags |= TRACEFS_EVENT_INODE; + d_instantiate(dentry, inode); + fsnotify_create(dentry->d_parent->d_inode, dentry); + return eventfs_end_creating(dentry); +} + +/** + * create_dir - create a dir in the tracefs filesystem + * @name: the name of the file to create. + * @mode: the permission that the file should have. + * @parent: parent dentry for this file. + * @data: something that the caller will want to get to later on. + * The inode.i_private pointer will point to this value on + * the open() call. + * @fop: struct file_operations that should be used for this dir. + * @iop: struct inode_operations that should be used for this dir. + * + * This is the basic "create a dir" function for eventfs. It allows for a + * wide range of flexibility in creating a dir. + * + * This function will return a pointer to a dentry if it succeeds. This + * pointer must be passed to the tracefs_remove() function when the file is + * to be removed (no automatic cleanup happens if your module is unloaded, + * you are responsible here.) If an error occurs, %NULL will be returned. + * + * If tracefs is not enabled in the kernel, the value -%ENODEV will be + * returned. + */ +struct dentry *create_dir(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fop, + const struct inode_operations *iop) +{ + struct tracefs_inode *ti; + struct dentry *dentry; + struct inode *inode; + + WARN_ON(!S_ISDIR(mode)); + + dentry = eventfs_start_creating(name, parent); + if (IS_ERR(dentry)) + return dentry; + + inode = tracefs_get_inode(dentry->d_sb); + if (unlikely(!inode)) + return eventfs_failed_creating(dentry); + + inode->i_mode = mode; + inode->i_op = iop; + inode->i_fop = fop; + inode->i_private = data; + + ti = get_tracefs(inode); + ti->flags |= TRACEFS_EVENT_INODE; + + inc_nlink(inode); + d_instantiate(dentry, inode); + inc_nlink(dentry->d_parent->d_inode); + fsnotify_mkdir(dentry->d_parent->d_inode, dentry); + return eventfs_end_creating(dentry); +}
static const struct file_operations eventfs_file_operations = { }; diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 7ef3a02766f5..7dc692a5fee1 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -451,6 +451,53 @@ struct dentry *tracefs_end_creating(struct dentry *dentry) return dentry; }
+struct dentry *eventfs_start_creating(const char *name, struct dentry *parent) +{ + struct dentry *dentry; + int error; + + error = simple_pin_fs(&trace_fs_type, &tracefs_mount, + &tracefs_mount_count); + if (error) + return ERR_PTR(error); + + /* + * If the parent is not specified, we create it in the root. + * We need the root dentry to do this, which is in the super + * block. A pointer to that is in the struct vfsmount that we + * have around. + */ + if (!parent) + parent = tracefs_mount->mnt_root; + + if (unlikely(IS_DEADDIR(parent->d_inode))) + dentry = ERR_PTR(-ENOENT); + else + dentry = lookup_one_len(name, parent, strlen(name)); + + if (!IS_ERR(dentry) && dentry->d_inode) { + dput(dentry); + dentry = ERR_PTR(-EEXIST); + } + + if (IS_ERR(dentry)) + simple_release_fs(&tracefs_mount, &tracefs_mount_count); + + return dentry; +} + +struct dentry *eventfs_failed_creating(struct dentry *dentry) +{ + dput(dentry); + simple_release_fs(&tracefs_mount, &tracefs_mount_count); + return NULL; +} + +struct dentry *eventfs_end_creating(struct dentry *dentry) +{ + return dentry; +} + /** * tracefs_create_file - create a file in the tracefs filesystem * @name: a pointer to a string containing the name of the file to create. diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 2c08edd4a739..47c1b4d21735 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -23,6 +23,13 @@ struct file_operations;
struct eventfs_file;
+struct dentry *eventfs_start_creating(const char *name, + struct dentry *parent); + +struct dentry *eventfs_failed_creating(struct dentry *dentry); + +struct dentry *eventfs_end_creating(struct dentry *dentry); + struct dentry *eventfs_create_events_dir(const char *name, struct dentry *parent);
Some more nits.
Subject: eventfs: Implement functions to create files and dirs when accessed
On Thu, 13 Jul 2023 17:03:20 +0530 Ajay Kaher akaher@vmware.com wrote:
Adding create_file(), create_dir() to create file, dir at runtime when they are accessed.
These function will be called either from lookup of inode_operations or open of file_operations.
Add create_file() and create_dir() functions to create the files and directories respectively when they are accessed. The functions will be called from the lookup operation of the inode_operations or from the open function of file_operations.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com
fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++ fs/tracefs/inode.c | 47 +++++++++++++++++ include/linux/tracefs.h | 7 +++ 3 files changed, 164 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 322a77be5a56..34b5d3d8005b 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -46,6 +46,116 @@ struct eventfs_file { static DEFINE_MUTEX(eventfs_mutex); DEFINE_STATIC_SRCU(eventfs_srcu); +/**
- create_file - create a file in the tracefs filesystem
- @name: the name of the file to create.
- @mode: the permission that the file should have.
- @parent: parent dentry for this file.
- @data: something that the caller will want to get to later on.
The inode.i_private pointer will point to this value on
the open() call.
Again, should move the extra lines into the description.
- @fop: struct file_operations that should be used for this file.
- This is the basic "create a file" function for tracefs. It allows for a
- wide range of flexibility in creating a file.
- This function will return a pointer to a dentry if it succeeds. This
- pointer must be passed to the tracefs_remove() function when the file is
- to be removed (no automatic cleanup happens if your module is unloaded,
- you are responsible here.) If an error occurs, %NULL will be returned.
- If tracefs is not enabled in the kernel, the value -%ENODEV will be
- returned.
- */
+struct dentry *create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fop)
+{
- struct tracefs_inode *ti;
- struct dentry *dentry;
- struct inode *inode;
- if (!(mode & S_IFMT))
mode |= S_IFREG;
- if (WARN_ON_ONCE(!S_ISREG(mode)))
return NULL;
- dentry = eventfs_start_creating(name, parent);
- if (IS_ERR(dentry))
return dentry;
- inode = tracefs_get_inode(dentry->d_sb);
- if (unlikely(!inode))
return eventfs_failed_creating(dentry);
- inode->i_mode = mode;
- inode->i_fop = fop;
- inode->i_private = data;
- ti = get_tracefs(inode);
- ti->flags |= TRACEFS_EVENT_INODE;
- d_instantiate(dentry, inode);
- fsnotify_create(dentry->d_parent->d_inode, dentry);
- return eventfs_end_creating(dentry);
+}
+/**
- create_dir - create a dir in the tracefs filesystem
- @name: the name of the file to create.
- @mode: the permission that the file should have.
- @parent: parent dentry for this file.
- @data: something that the caller will want to get to later on.
The inode.i_private pointer will point to this value on
the open() call.
Move the extra lines down into the description here too.
- @fop: struct file_operations that should be used for this dir.
- @iop: struct inode_operations that should be used for this dir.
- This is the basic "create a dir" function for eventfs. It allows for a
- wide range of flexibility in creating a dir.
- This function will return a pointer to a dentry if it succeeds. This
- pointer must be passed to the tracefs_remove() function when the file is
- to be removed (no automatic cleanup happens if your module is unloaded,
- you are responsible here.) If an error occurs, %NULL will be returned.
- If tracefs is not enabled in the kernel, the value -%ENODEV will be
- returned.
- */
+struct dentry *create_dir(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fop,
const struct inode_operations *iop)
+{
- struct tracefs_inode *ti;
- struct dentry *dentry;
- struct inode *inode;
- WARN_ON(!S_ISDIR(mode));
- dentry = eventfs_start_creating(name, parent);
- if (IS_ERR(dentry))
return dentry;
- inode = tracefs_get_inode(dentry->d_sb);
- if (unlikely(!inode))
return eventfs_failed_creating(dentry);
- inode->i_mode = mode;
- inode->i_op = iop;
- inode->i_fop = fop;
- inode->i_private = data;
- ti = get_tracefs(inode);
- ti->flags |= TRACEFS_EVENT_INODE;
- inc_nlink(inode);
- d_instantiate(dentry, inode);
- inc_nlink(dentry->d_parent->d_inode);
- fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
- return eventfs_end_creating(dentry);
+} static const struct file_operations eventfs_file_operations = { }; diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 7ef3a02766f5..7dc692a5fee1 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -451,6 +451,53 @@ struct dentry *tracefs_end_creating(struct dentry *dentry) return dentry; }
This needs to be documented. What about:
/** * eventfs_start_creating - start the process of creating a dentry * @name: Name of the file created for the dentry * @parent: The parent dentry where this dentry will be created * * This is a simple helper function for the dynamically created eventfs * files. When the directory of the eventfs files are accessed, their * dentries are created on the fly. This function is used to start that * process. */
+struct dentry *eventfs_start_creating(const char *name, struct dentry *parent) +{
- struct dentry *dentry;
- int error;
- error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
&tracefs_mount_count);
- if (error)
return ERR_PTR(error);
- /*
* If the parent is not specified, we create it in the root.
* We need the root dentry to do this, which is in the super
* block. A pointer to that is in the struct vfsmount that we
* have around.
*/
- if (!parent)
parent = tracefs_mount->mnt_root;
- if (unlikely(IS_DEADDIR(parent->d_inode)))
dentry = ERR_PTR(-ENOENT);
- else
dentry = lookup_one_len(name, parent, strlen(name));
- if (!IS_ERR(dentry) && dentry->d_inode) {
dput(dentry);
dentry = ERR_PTR(-EEXIST);
- }
- if (IS_ERR(dentry))
simple_release_fs(&tracefs_mount, &tracefs_mount_count);
- return dentry;
+}
/** * eventfs_failed_creating - clean up a failed eventfs dentry creation * @dentry: The dentry to clean up * * If after calling eventfs_start_creating(), a failure is detected, the * resources created by eventfs_start_creating() needs to be cleaned up. In * that case, this function should be called to perform that clean up. */
+struct dentry *eventfs_failed_creating(struct dentry *dentry) +{
- dput(dentry);
- simple_release_fs(&tracefs_mount, &tracefs_mount_count);
- return NULL;
+}
/** * eventfs_end_creating - Finish the process of creating a eventfs dentry * @dentry: The dentry that has successfully been created. * * This function is currently just a place holder to match * eventfs_start_creating(). In case any synchronization needs to be added, * this function will be used to implement that without having to modify * the callers of eventfs_start_creating(). */
+struct dentry *eventfs_end_creating(struct dentry *dentry) +{
- return dentry;
+}
-- Steve
/**
- tracefs_create_file - create a file in the tracefs filesystem
- @name: a pointer to a string containing the name of the file to
create. diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 2c08edd4a739..47c1b4d21735 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -23,6 +23,13 @@ struct file_operations; struct eventfs_file; +struct dentry *eventfs_start_creating(const char *name,
struct dentry *parent);
+struct dentry *eventfs_failed_creating(struct dentry *dentry);
+struct dentry *eventfs_end_creating(struct dentry *dentry);
struct dentry *eventfs_create_events_dir(const char *name, struct dentry *parent);
Adding following inode_operations, file_operations and helper functions to eventfs: dcache_dir_open_wrapper() eventfs_root_lookup() eventfs_release() eventfs_set_ef_status_free() eventfs_post_create_dir()
inode_operations, file_operations will be called from vfs.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- fs/tracefs/event_inode.c | 194 ++++++++++++++++++++++++++++++++++++++- include/linux/tracefs.h | 2 + 2 files changed, 194 insertions(+), 2 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 34b5d3d8005b..7167340ac29e 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -67,7 +67,7 @@ DEFINE_STATIC_SRCU(eventfs_srcu); * If tracefs is not enabled in the kernel, the value -%ENODEV will be * returned. */ -struct dentry *create_file(const char *name, umode_t mode, +static struct dentry *create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fop) { @@ -123,7 +123,7 @@ struct dentry *create_file(const char *name, umode_t mode, * If tracefs is not enabled in the kernel, the value -%ENODEV will be * returned. */ -struct dentry *create_dir(const char *name, umode_t mode, +static struct dentry *create_dir(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fop, const struct inode_operations *iop) @@ -157,10 +157,200 @@ struct dentry *create_dir(const char *name, umode_t mode, return eventfs_end_creating(dentry); }
+/** + * eventfs_set_ef_status_free - set the ef->status to free + * @dentry: dentry who's status to be freed + * + * eventfs_set_ef_status_free will be called if no more + * reference remains + */ +void eventfs_set_ef_status_free(struct dentry *dentry) +{ + struct tracefs_inode *ti_parent; + struct eventfs_file *ef; + + ti_parent = get_tracefs(dentry->d_parent->d_inode); + if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) + return; + + ef = dentry->d_fsdata; + if (!ef) + return; + ef->created = false; + ef->dentry = NULL; +} + +/** + * eventfs_post_create_dir - post create dir routine + * @ef: eventfs_file of recently created dir + * + * Files with-in eventfs dir should know dentry of parent dir + */ +static void eventfs_post_create_dir(struct eventfs_file *ef) +{ + struct eventfs_file *ef_child; + struct tracefs_inode *ti; + + /* srcu lock already held */ + /* fill parent-child relation */ + list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, + srcu_read_lock_held(&eventfs_srcu)) { + ef_child->d_parent = ef->dentry; + } + + ti = get_tracefs(ef->dentry->d_inode); + ti->private = ef->ei; +} + +/** + * eventfs_root_lookup - lookup routine to create file/dir + * @dir: directory in which lookup to be done + * @dentry: file/dir dentry + * @flags: + * + * Used to create dynamic file/dir with-in @dir, search with-in ei + * list, if @dentry found go ahead and create the file/dir + */ + +static struct dentry *eventfs_root_lookup(struct inode *dir, + struct dentry *dentry, + unsigned int flags) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + struct dentry *ret = NULL; + int idx; + + ti = get_tracefs(dir); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return NULL; + + ei = ti->private; + idx = srcu_read_lock(&eventfs_srcu); + list_for_each_entry_srcu(ef, &ei->e_top_files, list, + srcu_read_lock_held(&eventfs_srcu)) { + if (strcmp(ef->name, dentry->d_name.name)) + continue; + ret = simple_lookup(dir, dentry, flags); + if (ef->created) + continue; + mutex_lock(&eventfs_mutex); + ef->created = true; + if (ef->ei) + ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent, + ef->data, ef->fop, ef->iop); + else + ef->dentry = create_file(ef->name, ef->mode, ef->d_parent, + ef->data, ef->fop); + + if (IS_ERR_OR_NULL(ef->dentry)) { + ef->created = false; + mutex_unlock(&eventfs_mutex); + } else { + if (ef->ei) + eventfs_post_create_dir(ef); + ef->dentry->d_fsdata = ef; + mutex_unlock(&eventfs_mutex); + dput(ef->dentry); + } + break; + } + srcu_read_unlock(&eventfs_srcu, idx); + return ret; +} + +/** + * eventfs_release - called to release eventfs file/dir + * @inode: inode to be released + * @file: file to be released (not used) + */ +static int eventfs_release(struct inode *inode, struct file *file) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + int idx; + + ti = get_tracefs(inode); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return -EINVAL; + + ei = ti->private; + idx = srcu_read_lock(&eventfs_srcu); + list_for_each_entry_srcu(ef, &ei->e_top_files, list, + srcu_read_lock_held(&eventfs_srcu)) { + if (ef->created) + dput(ef->dentry); + } + srcu_read_unlock(&eventfs_srcu, idx); + return dcache_dir_close(inode, file); +} + +/** + * dcache_dir_open_wrapper - eventfs open wrapper + * @inode: not used + * @file: dir to be opened (to create it's child) + * + * Used to dynamic create file/dir with-in @file, all the + * file/dir will be created. If already created then reference + * will be increased + */ +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + struct dentry *dentry = file_dentry(file); + struct inode *f_inode = file_inode(file); + int idx; + + ti = get_tracefs(f_inode); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return -EINVAL; + + ei = ti->private; + idx = srcu_read_lock(&eventfs_srcu); + list_for_each_entry_rcu(ef, &ei->e_top_files, list) { + if (ef->created) { + dget(ef->dentry); + continue; + } + mutex_lock(&eventfs_mutex); + ef->created = true; + + inode_lock(dentry->d_inode); + if (ef->ei) + ef->dentry = create_dir(ef->name, ef->mode, dentry, + ef->data, ef->fop, ef->iop); + else + ef->dentry = create_file(ef->name, ef->mode, dentry, + ef->data, ef->fop); + inode_unlock(dentry->d_inode); + + if (IS_ERR_OR_NULL(ef->dentry)) { + ef->created = false; + } else { + if (ef->ei) + eventfs_post_create_dir(ef); + ef->dentry->d_fsdata = ef; + } + mutex_unlock(&eventfs_mutex); + } + srcu_read_unlock(&eventfs_srcu, idx); + return dcache_dir_open(inode, file); +} + static const struct file_operations eventfs_file_operations = { + .open = dcache_dir_open_wrapper, + .read = generic_read_dir, + .iterate_shared = dcache_readdir, + .llseek = generic_file_llseek, + .release = eventfs_release, };
static const struct inode_operations eventfs_root_dir_inode_operations = { + .lookup = eventfs_root_lookup, };
/** diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 47c1b4d21735..4d30b0cafc5f 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -51,6 +51,8 @@ void eventfs_remove(struct eventfs_file *ef);
void eventfs_remove_events_dir(struct dentry *dentry);
+void eventfs_set_ef_status_free(struct dentry *dentry); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
Some more nits.
On Thu, 13 Jul 2023 17:03:21 +0530 Ajay Kaher akaher@vmware.com wrote:
Adding following inode_operations, file_operations and helper functions to eventfs:
Add the inode_operations, file_operations, and helper functions to eventfs:
dcache_dir_open_wrapper() eventfs_root_lookup() eventfs_release() eventfs_set_ef_status_free() eventfs_post_create_dir()
inode_operations, file_operations will be called from vfs.
The inode_operations and file_operations functions will be called from the VFS layer.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com
fs/tracefs/event_inode.c | 194 ++++++++++++++++++++++++++++++++++++++- include/linux/tracefs.h | 2 + 2 files changed, 194 insertions(+), 2 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 34b5d3d8005b..7167340ac29e 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -67,7 +67,7 @@ DEFINE_STATIC_SRCU(eventfs_srcu);
- If tracefs is not enabled in the kernel, the value -%ENODEV will be
- returned.
*/
-struct dentry *create_file(const char *name, umode_t mode, +static struct dentry *create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fop) { @@ -123,7 +123,7 @@ struct dentry *create_file(const char *name, umode_t mode,
- If tracefs is not enabled in the kernel, the value -%ENODEV will be
- returned.
*/ -struct dentry *create_dir(const char *name, umode_t mode, +static struct dentry *create_dir(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fop, const struct inode_operations *iop)
Please fold these changes into the patches that added these functions. They should have been created as static functions.
If you are worried about the warnings that they caused by being static and not used, perhaps these patches should be reordered, and we add this patch first, with those functions as stubs.
That is, build the VFS functions first with just place holders for the internals, and then fill in the internals. That would actually make more sense in reviewing the code.
@@ -157,10 +157,200 @@ struct dentry *create_dir(const char *name, umode_t mode, return eventfs_end_creating(dentry); } +/**
- eventfs_set_ef_status_free - set the ef->status to free
- @dentry: dentry who's status to be freed
- eventfs_set_ef_status_free will be called if no more
- reference remains
"references remain"
- */
+void eventfs_set_ef_status_free(struct dentry *dentry) +{
- struct tracefs_inode *ti_parent;
- struct eventfs_file *ef;
- ti_parent = get_tracefs(dentry->d_parent->d_inode);
- if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
return;
- ef = dentry->d_fsdata;
- if (!ef)
return;
- ef->created = false;
- ef->dentry = NULL;
+}
+/**
- eventfs_post_create_dir - post create dir routine
- @ef: eventfs_file of recently created dir
- Files with-in eventfs dir should know dentry of parent dir
* Map the meta-data of files within an eventfs dir to their parent dentry.
- */
+static void eventfs_post_create_dir(struct eventfs_file *ef) +{
- struct eventfs_file *ef_child;
- struct tracefs_inode *ti;
- /* srcu lock already held */
- /* fill parent-child relation */
- list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
srcu_read_lock_held(&eventfs_srcu)) {
ef_child->d_parent = ef->dentry;
- }
- ti = get_tracefs(ef->dentry->d_inode);
- ti->private = ef->ei;
+}
+/**
- eventfs_root_lookup - lookup routine to create file/dir
- @dir: directory in which lookup to be done
"in which a lookup is being done"
- @dentry: file/dir dentry
- @flags:
?
"unused" or "to pass as flags parameter to simple lookup"?
- Used to create dynamic file/dir with-in @dir, search with-in ei
- list, if @dentry found go ahead and create the file/dir
"within" is a real word ;-)
* Used to create a dynamic file/dir within @dir. Use the eventfs_inode * list of meta data to find the information needed to create the file/dir.
- */
+static struct dentry *eventfs_root_lookup(struct inode *dir,
struct dentry *dentry,
unsigned int flags)
+{
- struct tracefs_inode *ti;
- struct eventfs_inode *ei;
- struct eventfs_file *ef;
- struct dentry *ret = NULL;
- int idx;
- ti = get_tracefs(dir);
- if (!(ti->flags & TRACEFS_EVENT_INODE))
return NULL;
- ei = ti->private;
- idx = srcu_read_lock(&eventfs_srcu);
- list_for_each_entry_srcu(ef, &ei->e_top_files, list,
srcu_read_lock_held(&eventfs_srcu)) {
if (strcmp(ef->name, dentry->d_name.name))
continue;
ret = simple_lookup(dir, dentry, flags);
if (ef->created)
continue;
mutex_lock(&eventfs_mutex);
ef->created = true;
if (ef->ei)
ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
ef->data, ef->fop, ef->iop);
else
ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
ef->data, ef->fop);
if (IS_ERR_OR_NULL(ef->dentry)) {
ef->created = false;
mutex_unlock(&eventfs_mutex);
} else {
if (ef->ei)
eventfs_post_create_dir(ef);
ef->dentry->d_fsdata = ef;
mutex_unlock(&eventfs_mutex);
dput(ef->dentry);
}
break;
- }
- srcu_read_unlock(&eventfs_srcu, idx);
- return ret;
+}
+/**
- eventfs_release - called to release eventfs file/dir
- @inode: inode to be released
- @file: file to be released (not used)
- */
+static int eventfs_release(struct inode *inode, struct file *file) +{
- struct tracefs_inode *ti;
- struct eventfs_inode *ei;
- struct eventfs_file *ef;
- int idx;
- ti = get_tracefs(inode);
- if (!(ti->flags & TRACEFS_EVENT_INODE))
return -EINVAL;
- ei = ti->private;
- idx = srcu_read_lock(&eventfs_srcu);
- list_for_each_entry_srcu(ef, &ei->e_top_files, list,
srcu_read_lock_held(&eventfs_srcu)) {
if (ef->created)
dput(ef->dentry);
- }
- srcu_read_unlock(&eventfs_srcu, idx);
- return dcache_dir_close(inode, file);
+}
+/**
- dcache_dir_open_wrapper - eventfs open wrapper
- @inode: not used
- @file: dir to be opened (to create it's child)
"its child"
- Used to dynamic create file/dir with-in @file, all the
- file/dir will be created. If already created then reference
- will be increased
* Used to dynamically create the file/dir within @file. @file is really a * directory and all the files/dirs of the children within @file will be * created. If any of the files/dirs have already been created, their * reference count will be incremented.
- */
+static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) +{
- struct tracefs_inode *ti;
- struct eventfs_inode *ei;
- struct eventfs_file *ef;
- struct dentry *dentry = file_dentry(file);
- struct inode *f_inode = file_inode(file);
- int idx;
- ti = get_tracefs(f_inode);
- if (!(ti->flags & TRACEFS_EVENT_INODE))
return -EINVAL;
- ei = ti->private;
- idx = srcu_read_lock(&eventfs_srcu);
- list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
if (ef->created) {
dget(ef->dentry);
continue;
}
mutex_lock(&eventfs_mutex);
ef->created = true;
inode_lock(dentry->d_inode);
if (ef->ei)
ef->dentry = create_dir(ef->name, ef->mode, dentry,
ef->data, ef->fop, ef->iop);
else
ef->dentry = create_file(ef->name, ef->mode, dentry,
ef->data, ef->fop);
inode_unlock(dentry->d_inode);
if (IS_ERR_OR_NULL(ef->dentry)) {
ef->created = false;
} else {
if (ef->ei)
eventfs_post_create_dir(ef);
ef->dentry->d_fsdata = ef;
}
mutex_unlock(&eventfs_mutex);
- }
- srcu_read_unlock(&eventfs_srcu, idx);
- return dcache_dir_open(inode, file);
+}
static const struct file_operations eventfs_file_operations = {
- .open = dcache_dir_open_wrapper,
- .read = generic_read_dir,
- .iterate_shared = dcache_readdir,
- .llseek = generic_file_llseek,
- .release = eventfs_release,
}; static const struct inode_operations eventfs_root_dir_inode_operations = {
- .lookup = eventfs_root_lookup,
}; /** diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 47c1b4d21735..4d30b0cafc5f 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -51,6 +51,8 @@ void eventfs_remove(struct eventfs_file *ef); void eventfs_remove_events_dir(struct dentry *dentry); +void eventfs_set_ef_status_free(struct dentry *dentry);
Shouldn't this be internal to the tracefs code, and not something exposed to users of tracefs?
struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
-- Steve
Creating tracefs_inode_cache which is a cache of tracefs_inode. Adding helping functions: tracefs_alloc_inode() tracefs_free_inode()
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- fs/tracefs/inode.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 7dc692a5fee1..76820d3e97eb 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -21,13 +21,33 @@ #include <linux/parser.h> #include <linux/magic.h> #include <linux/slab.h> +#include "internal.h"
#define TRACEFS_DEFAULT_MODE 0700 +static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
static struct vfsmount *tracefs_mount; static int tracefs_mount_count; static bool tracefs_registered;
+static struct inode *tracefs_alloc_inode(struct super_block *sb) +{ + struct tracefs_inode *ti; + + ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL); + if (!ti) + return NULL; + + ti->flags = 0; + + return &ti->vfs_inode; +} + +static void tracefs_free_inode(struct inode *inode) +{ + kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); +} + static ssize_t default_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos) { @@ -346,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root) }
static const struct super_operations tracefs_super_operations = { + .alloc_inode = tracefs_alloc_inode, + .free_inode = tracefs_free_inode, + .drop_inode = generic_delete_inode, .statfs = simple_statfs, .remount_fs = tracefs_remount, .show_options = tracefs_show_options, @@ -675,10 +698,26 @@ bool tracefs_initialized(void) return tracefs_registered; }
+static void init_once(void *foo) +{ + struct tracefs_inode *ti = (struct tracefs_inode *) foo; + + inode_init_once(&ti->vfs_inode); +} + static int __init tracefs_init(void) { int retval;
+ tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache", + sizeof(struct tracefs_inode), + 0, (SLAB_RECLAIM_ACCOUNT| + SLAB_MEM_SPREAD| + SLAB_ACCOUNT), + init_once); + if (!tracefs_inode_cachep) + return -ENOMEM; + retval = sysfs_create_mount_point(kernel_kobj, "tracing"); if (retval) return -EINVAL;
I wonder if this should be patch 2?
On Thu, 13 Jul 2023 17:03:22 +0530 Ajay Kaher akaher@vmware.com wrote:
Creating tracefs_inode_cache which is a cache of tracefs_inode.
Create a kmem cache of tracefs_inodes. To be more efficient, as there are lots of tracefs inodes, create its own cache. This also allows to see how many tracefs inodes have been created.
Adding helping functions:
Add helper functions:
tracefs_alloc_inode() tracefs_free_inode()
-- Steve
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com
fs/tracefs/inode.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 7dc692a5fee1..76820d3e97eb 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -21,13 +21,33 @@ #include <linux/parser.h> #include <linux/magic.h> #include <linux/slab.h> +#include "internal.h" #define TRACEFS_DEFAULT_MODE 0700 +static struct kmem_cache *tracefs_inode_cachep __ro_after_init; static struct vfsmount *tracefs_mount; static int tracefs_mount_count; static bool tracefs_registered; +static struct inode *tracefs_alloc_inode(struct super_block *sb) +{
- struct tracefs_inode *ti;
- ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL);
- if (!ti)
return NULL;
- ti->flags = 0;
- return &ti->vfs_inode;
+}
+static void tracefs_free_inode(struct inode *inode) +{
- kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+}
static ssize_t default_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos) { @@ -346,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root) } static const struct super_operations tracefs_super_operations = {
- .alloc_inode = tracefs_alloc_inode,
- .free_inode = tracefs_free_inode,
- .drop_inode = generic_delete_inode, .statfs = simple_statfs, .remount_fs = tracefs_remount, .show_options = tracefs_show_options,
@@ -675,10 +698,26 @@ bool tracefs_initialized(void) return tracefs_registered; } +static void init_once(void *foo) +{
- struct tracefs_inode *ti = (struct tracefs_inode *) foo;
- inode_init_once(&ti->vfs_inode);
+}
static int __init tracefs_init(void) { int retval;
- tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
sizeof(struct tracefs_inode),
0, (SLAB_RECLAIM_ACCOUNT|
SLAB_MEM_SPREAD|
SLAB_ACCOUNT),
init_once);
- if (!tracefs_inode_cachep)
return -ENOMEM;
- retval = sysfs_create_mount_point(kernel_kobj, "tracing"); if (retval) return -EINVAL;
Till now /sys/kernel/debug/tracing/events is a part of tracefs, with-in this patch creating 'events' and it's sub-dir as eventfs. Basically replacing tracefs calls with eventfs calls for 'events'.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- fs/tracefs/inode.c | 18 ++++++++++ include/linux/trace_events.h | 1 + kernel/trace/trace.h | 2 +- kernel/trace/trace_events.c | 67 +++++++++++++++++++----------------- 4 files changed, 55 insertions(+), 33 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 76820d3e97eb..a098d7153498 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -374,6 +374,23 @@ static const struct super_operations tracefs_super_operations = { .show_options = tracefs_show_options, };
+static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode) +{ + struct tracefs_inode *ti; + + if (!dentry || !inode) + return; + + ti = get_tracefs(inode); + if (ti && ti->flags & TRACEFS_EVENT_INODE) + eventfs_set_ef_status_free(dentry); + iput(inode); +} + +static const struct dentry_operations tracefs_dentry_operations = { + .d_iput = tracefs_dentry_iput, +}; + static int trace_fill_super(struct super_block *sb, void *data, int silent) { static const struct tree_descr trace_files[] = {{""}}; @@ -396,6 +413,7 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent) goto fail;
sb->s_op = &tracefs_super_operations; + sb->s_d_op = &tracefs_dentry_operations;
tracefs_apply_options(sb, false);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 7c4a0b72334e..c6bb74cccea3 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -635,6 +635,7 @@ struct trace_event_file { struct list_head list; struct trace_event_call *event_call; struct event_filter __rcu *filter; + struct eventfs_file *ef; struct dentry *dir; struct trace_array *tr; struct trace_subsystem_dir *system; diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 79bdefe9261b..d9a0ec2b918e 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1309,7 +1309,7 @@ struct trace_subsystem_dir { struct list_head list; struct event_subsystem *subsystem; struct trace_array *tr; - struct dentry *entry; + struct eventfs_file *ef; int ref_count; int nr_events; }; diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 12ed71428939..4058178de682 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -990,7 +990,8 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) return;
if (!--dir->nr_events) { - tracefs_remove(dir->entry); + if (dir->ef) + eventfs_remove(dir->ef); list_del(&dir->list); __put_system_dir(dir); } @@ -1011,7 +1012,8 @@ static void remove_event_file_dir(struct trace_event_file *file)
tracefs_remove(dir); } - + if (file->ef) + eventfs_remove(file->ef); list_del(&file->list); remove_subsystem(file->system); free_event_filter(file->filter); @@ -2297,13 +2299,13 @@ create_new_subsystem(const char *name) return NULL; }
-static struct dentry * +static struct eventfs_file * event_subsystem_dir(struct trace_array *tr, const char *name, struct trace_event_file *file, struct dentry *parent) { struct event_subsystem *system, *iter; struct trace_subsystem_dir *dir; - struct dentry *entry; + int res;
/* First see if we did not already create this dir */ list_for_each_entry(dir, &tr->systems, list) { @@ -2311,7 +2313,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name, if (strcmp(system->name, name) == 0) { dir->nr_events++; file->system = dir; - return dir->entry; + return dir->ef; } }
@@ -2335,8 +2337,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name, } else __get_system(system);
- dir->entry = tracefs_create_dir(name, parent); - if (!dir->entry) { + dir->ef = eventfs_add_subsystem_dir(name, parent); + if (IS_ERR(dir->ef)) { pr_warn("Failed to create system directory %s\n", name); __put_system(system); goto out_free; @@ -2351,22 +2353,22 @@ event_subsystem_dir(struct trace_array *tr, const char *name, /* the ftrace system is special, do not create enable or filter files */ if (strcmp(name, "ftrace") != 0) {
- entry = tracefs_create_file("filter", TRACE_MODE_WRITE, - dir->entry, dir, + res = eventfs_add_file("filter", TRACE_MODE_WRITE, + dir->ef, dir, &ftrace_subsystem_filter_fops); - if (!entry) { + if (res) { kfree(system->filter); system->filter = NULL; pr_warn("Could not create tracefs '%s/filter' entry\n", name); }
- trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir, + eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir, &ftrace_system_enable_fops); }
list_add(&dir->list, &tr->systems);
- return dir->entry; + return dir->ef;
out_free: kfree(dir); @@ -2420,7 +2422,7 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) { struct trace_event_call *call = file->event_call; struct trace_array *tr = file->tr; - struct dentry *d_events; + struct eventfs_file *ef_subsystem = NULL; const char *name; int ret;
@@ -2432,24 +2434,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0)) return -ENODEV;
- d_events = event_subsystem_dir(tr, call->class->system, file, parent); - if (!d_events) + ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent); + if (!ef_subsystem) return -ENOMEM;
name = trace_event_name(call); - file->dir = tracefs_create_dir(name, d_events); - if (!file->dir) { + file->ef = eventfs_add_dir(name, ef_subsystem); + if (IS_ERR(file->ef)) { pr_warn("Could not create tracefs '%s' directory\n", name); return -1; }
if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) - trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file, + eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file, &ftrace_enable_fops);
#ifdef CONFIG_PERF_EVENTS if (call->event.type && call->class->reg) - trace_create_file("id", TRACE_MODE_READ, file->dir, + eventfs_add_file("id", TRACE_MODE_READ, file->ef, (void *)(long)call->event.type, &ftrace_event_id_fops); #endif @@ -2465,27 +2467,27 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) * triggers or filters. */ if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { - trace_create_file("filter", TRACE_MODE_WRITE, file->dir, + eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef, file, &ftrace_event_filter_fops);
- trace_create_file("trigger", TRACE_MODE_WRITE, file->dir, + eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef, file, &event_trigger_fops); }
#ifdef CONFIG_HIST_TRIGGERS - trace_create_file("hist", TRACE_MODE_READ, file->dir, file, + eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file, &event_hist_fops); #endif #ifdef CONFIG_HIST_TRIGGERS_DEBUG - trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file, + eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file, &event_hist_debug_fops); #endif - trace_create_file("format", TRACE_MODE_READ, file->dir, call, + eventfs_add_file("format", TRACE_MODE_READ, file->ef, call, &ftrace_event_format_fops);
#ifdef CONFIG_TRACE_EVENT_INJECT if (call->event.type && call->class->reg) - trace_create_file("inject", 0200, file->dir, file, + eventfs_add_file("inject", 0200, file->ef, file, &event_inject_fops); #endif
@@ -3638,21 +3640,22 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) { struct dentry *d_events; struct dentry *entry; + int error = 0;
entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent, tr, &ftrace_set_event_fops); if (!entry) return -ENOMEM;
- d_events = tracefs_create_dir("events", parent); - if (!d_events) { + d_events = eventfs_create_events_dir("events", parent); + if (IS_ERR(d_events)) { pr_warn("Could not create tracefs 'events' directory\n"); return -ENOMEM; }
- entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events, + error = eventfs_add_top_file("enable", TRACE_MODE_WRITE, d_events, tr, &ftrace_tr_enable_fops); - if (!entry) + if (error) return -ENOMEM;
/* There are not as crucial, just warn if they are not created */ @@ -3665,11 +3668,11 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) &ftrace_set_event_notrace_pid_fops);
/* ring buffer internal formats */ - trace_create_file("header_page", TRACE_MODE_READ, d_events, + eventfs_add_top_file("header_page", TRACE_MODE_READ, d_events, ring_buffer_print_page_header, &ftrace_show_header_fops);
- trace_create_file("header_event", TRACE_MODE_READ, d_events, + eventfs_add_top_file("header_event", TRACE_MODE_READ, d_events, ring_buffer_print_entry_header, &ftrace_show_header_fops);
@@ -3757,7 +3760,7 @@ int event_trace_del_tracer(struct trace_array *tr)
down_write(&trace_event_sem); __trace_remove_event_dirs(tr); - tracefs_remove(tr->event_dir); + eventfs_remove_events_dir(tr->event_dir); up_write(&trace_event_sem);
tr->event_dir = NULL;
On Thu, 13 Jul 2023 17:03:23 +0530 Ajay Kaher akaher@vmware.com wrote:
Till now /sys/kernel/debug/tracing/events is a part of tracefs, with-in this patch creating 'events' and it's sub-dir as eventfs. Basically replacing tracefs calls with eventfs calls for 'events'.
[ note: /sys/kernel/debug/tracing is deprecated. Please avoid referencing it. ]
Up until now, /sys/kernel/tracing/events was no different than any other part of tracefs. The files and directories within the events directory was created when the tracefs was mounted, and also created for the instances in /sys/kernel/tracing/instances/<instance>/events. Most of these files and directories will never be referenced. Since there are thousands of these files and directories they spend their time wasting precious memory resources.
Move the "events" directory to the new eventfs. The eventfs will take the meta data of the events that they represent and store that. When the files in the events directory are referenced, the dentry and inodes to represent them are then created. When the files are no longer referenced, they are freed. This saves the precious memory resources that were wasted on these seldom referenced dentries and inodes.
We should also include memory metrics of both improvements from /proc/slabinfo and /proc/meminfo.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com
fs/tracefs/inode.c | 18 ++++++++++ include/linux/trace_events.h | 1 + kernel/trace/trace.h | 2 +- kernel/trace/trace_events.c | 67 +++++++++++++++++++----------------- 4 files changed, 55 insertions(+), 33 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 76820d3e97eb..a098d7153498 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -374,6 +374,23 @@ static const struct super_operations tracefs_super_operations = { .show_options = tracefs_show_options, }; +static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode) +{
- struct tracefs_inode *ti;
- if (!dentry || !inode)
return;
- ti = get_tracefs(inode);
- if (ti && ti->flags & TRACEFS_EVENT_INODE)
eventfs_set_ef_status_free(dentry);
- iput(inode);
+}
+static const struct dentry_operations tracefs_dentry_operations = {
- .d_iput = tracefs_dentry_iput,
+};
static int trace_fill_super(struct super_block *sb, void *data, int silent) { static const struct tree_descr trace_files[] = {{""}}; @@ -396,6 +413,7 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent) goto fail; sb->s_op = &tracefs_super_operations;
- sb->s_d_op = &tracefs_dentry_operations;
tracefs_apply_options(sb, false); diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 7c4a0b72334e..c6bb74cccea3 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -635,6 +635,7 @@ struct trace_event_file { struct list_head list; struct trace_event_call *event_call; struct event_filter __rcu *filter;
- struct eventfs_file *ef; struct dentry *dir; struct trace_array *tr; struct trace_subsystem_dir *system;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 79bdefe9261b..d9a0ec2b918e 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1309,7 +1309,7 @@ struct trace_subsystem_dir { struct list_head list; struct event_subsystem *subsystem; struct trace_array *tr;
- struct dentry *entry;
- struct eventfs_file *ef; int ref_count; int nr_events;
}; diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 12ed71428939..4058178de682 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -990,7 +990,8 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) return; if (!--dir->nr_events) {
tracefs_remove(dir->entry);
if (dir->ef)
eventfs_remove(dir->ef);
Why the if statement? eventfs_remove() has:
void eventfs_remove(struct eventfs_file *ef) { if (!ef) return; [..] }
Let's get rid of that.
list_del(&dir->list); __put_system_dir(dir);
} @@ -1011,7 +1012,8 @@ static void remove_event_file_dir(struct trace_event_file *file) tracefs_remove(dir); }
- if (file->ef)
eventfs_remove(file->ef);
Same here.
list_del(&file->list); remove_subsystem(file->system); free_event_filter(file->filter); @@ -2297,13 +2299,13 @@ create_new_subsystem(const char *name) return NULL; } -static struct dentry * +static struct eventfs_file * event_subsystem_dir(struct trace_array *tr, const char *name, struct trace_event_file *file, struct dentry *parent) { struct event_subsystem *system, *iter; struct trace_subsystem_dir *dir;
- struct dentry *entry;
- int res;
/* First see if we did not already create this dir */ list_for_each_entry(dir, &tr->systems, list) { @@ -2311,7 +2313,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name, if (strcmp(system->name, name) == 0) { dir->nr_events++; file->system = dir;
return dir->entry;
} }return dir->ef;
@@ -2335,8 +2337,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name, } else __get_system(system);
- dir->entry = tracefs_create_dir(name, parent);
- if (!dir->entry) {
- dir->ef = eventfs_add_subsystem_dir(name, parent);
- if (IS_ERR(dir->ef)) { pr_warn("Failed to create system directory %s\n", name); __put_system(system); goto out_free;
@@ -2351,22 +2353,22 @@ event_subsystem_dir(struct trace_array *tr, const char *name, /* the ftrace system is special, do not create enable or filter files */ if (strcmp(name, "ftrace") != 0) {
entry = tracefs_create_file("filter", TRACE_MODE_WRITE,
dir->entry, dir,
res = eventfs_add_file("filter", TRACE_MODE_WRITE,
dir->ef, dir, &ftrace_subsystem_filter_fops);
if (!entry) {
}if (res) { kfree(system->filter); system->filter = NULL; pr_warn("Could not create tracefs '%s/filter' entry\n", name);
trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir,
}eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir, &ftrace_system_enable_fops);
list_add(&dir->list, &tr->systems);
- return dir->entry;
- return dir->ef;
out_free: kfree(dir); @@ -2420,7 +2422,7 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) { struct trace_event_call *call = file->event_call; struct trace_array *tr = file->tr;
- struct dentry *d_events;
- struct eventfs_file *ef_subsystem = NULL;
Move this around to keep the upside-down-xmas tree order.
-- Steve
const char *name; int ret; @@ -2432,24 +2434,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0)) return -ENODEV;
- d_events = event_subsystem_dir(tr, call->class->system, file, parent);
- if (!d_events)
- ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent);
- if (!ef_subsystem) return -ENOMEM;
name = trace_event_name(call);
- file->dir = tracefs_create_dir(name, d_events);
- if (!file->dir) {
- file->ef = eventfs_add_dir(name, ef_subsystem);
- if (IS_ERR(file->ef)) { pr_warn("Could not create tracefs '%s' directory\n", name); return -1; }
if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE))
trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file,
eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file, &ftrace_enable_fops);
#ifdef CONFIG_PERF_EVENTS if (call->event.type && call->class->reg)
trace_create_file("id", TRACE_MODE_READ, file->dir,
eventfs_add_file("id", TRACE_MODE_READ, file->ef, (void *)(long)call->event.type, &ftrace_event_id_fops);
#endif @@ -2465,27 +2467,27 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) * triggers or filters. */ if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) {
trace_create_file("filter", TRACE_MODE_WRITE, file->dir,
eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef, file, &ftrace_event_filter_fops);
trace_create_file("trigger", TRACE_MODE_WRITE, file->dir,
}eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef, file, &event_trigger_fops);
#ifdef CONFIG_HIST_TRIGGERS
- trace_create_file("hist", TRACE_MODE_READ, file->dir, file,
- eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file, &event_hist_fops);
#endif #ifdef CONFIG_HIST_TRIGGERS_DEBUG
- trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file,
- eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file, &event_hist_debug_fops);
#endif
- trace_create_file("format", TRACE_MODE_READ, file->dir, call,
- eventfs_add_file("format", TRACE_MODE_READ, file->ef, call, &ftrace_event_format_fops);
#ifdef CONFIG_TRACE_EVENT_INJECT if (call->event.type && call->class->reg)
trace_create_file("inject", 0200, file->dir, file,
eventfs_add_file("inject", 0200, file->ef, file, &event_inject_fops);
#endif @@ -3638,21 +3640,22 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) { struct dentry *d_events; struct dentry *entry;
- int error = 0;
entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent, tr, &ftrace_set_event_fops); if (!entry) return -ENOMEM;
- d_events = tracefs_create_dir("events", parent);
- if (!d_events) {
- d_events = eventfs_create_events_dir("events", parent);
- if (IS_ERR(d_events)) { pr_warn("Could not create tracefs 'events' directory\n"); return -ENOMEM; }
- entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events,
- error = eventfs_add_top_file("enable", TRACE_MODE_WRITE, d_events, tr, &ftrace_tr_enable_fops);
- if (!entry)
- if (error) return -ENOMEM;
/* There are not as crucial, just warn if they are not created */ @@ -3665,11 +3668,11 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) &ftrace_set_event_notrace_pid_fops); /* ring buffer internal formats */
- trace_create_file("header_page", TRACE_MODE_READ, d_events,
- eventfs_add_top_file("header_page", TRACE_MODE_READ, d_events, ring_buffer_print_page_header, &ftrace_show_header_fops);
- trace_create_file("header_event", TRACE_MODE_READ, d_events,
- eventfs_add_top_file("header_event", TRACE_MODE_READ, d_events, ring_buffer_print_entry_header, &ftrace_show_header_fops);
@@ -3757,7 +3760,7 @@ int event_trace_del_tracer(struct trace_array *tr) down_write(&trace_event_sem); __trace_remove_event_dirs(tr);
- tracefs_remove(tr->event_dir);
- eventfs_remove_events_dir(tr->event_dir); up_write(&trace_event_sem);
tr->event_dir = NULL;
On 15-Jul-2023, at 2:36 AM, Steven Rostedt rostedt@goodmis.org wrote:
!! External Email
On Thu, 13 Jul 2023 17:03:23 +0530 Ajay Kaher akaher@vmware.com wrote:
Till now /sys/kernel/debug/tracing/events is a part of tracefs, with-in this patch creating 'events' and it's sub-dir as eventfs. Basically replacing tracefs calls with eventfs calls for 'events'.
[ note: /sys/kernel/debug/tracing is deprecated. Please avoid referencing it. ]
Up until now, /sys/kernel/tracing/events was no different than any other part of tracefs. The files and directories within the events directory was created when the tracefs was mounted, and also created for the instances in /sys/kernel/tracing/instances/<instance>/events. Most of these files and directories will never be referenced. Since there are thousands of these files and directories they spend their time wasting precious memory resources.
Move the "events" directory to the new eventfs. The eventfs will take the meta data of the events that they represent and store that. When the files in the events directory are referenced, the dentry and inodes to represent them are then created. When the files are no longer referenced, they are freed. This saves the precious memory resources that were wasted on these seldom referenced dentries and inodes.
Some correction here:
The dentry and inodes to represent eventfs files or directories will be freed only during drop cache or eventfs_remove(). This is same as with other dynamic fs e.g. sysfs or procfs.
We can achieve ‘free the dentry and inodes if no longer requires’ using create_file()->d_instantiate_anon() instead create_file()->d_instantiate(), but I faced some issues. This optimisation we may consider in future along with sysfs, procfs.
-Ajay
On 15-Jul-2023, at 2:36 AM, Steven Rostedt rostedt@goodmis.org wrote:
!! External Email
On Thu, 13 Jul 2023 17:03:23 +0530 Ajay Kaher akaher@vmware.com wrote:
Till now /sys/kernel/debug/tracing/events is a part of tracefs, with-in this patch creating 'events' and it's sub-dir as eventfs. Basically replacing tracefs calls with eventfs calls for 'events'.
[ note: /sys/kernel/debug/tracing is deprecated. Please avoid referencing it. ]
Up until now, /sys/kernel/tracing/events was no different than any other part of tracefs. The files and directories within the events directory was created when the tracefs was mounted, and also created for the instances in /sys/kernel/tracing/instances/<instance>/events. Most of these files and directories will never be referenced. Since there are thousands of these files and directories they spend their time wasting precious memory resources.
Move the "events" directory to the new eventfs. The eventfs will take the meta data of the events that they represent and store that. When the files in the events directory are referenced, the dentry and inodes to represent them are then created. When the files are no longer referenced, they are freed. This saves the precious memory resources that were wasted on these seldom referenced dentries and inodes.
Some correction here:
The dentry and inodes to represent eventfs files or directories will be freed only during drop cache or eventfs_remove(). This is same as with other dynamic fs e.g. sysfs or procfs.
We can achieve ‘free the dentry and inodes if no longer requires’ using create_file()->d_instantiate_anon() instead create_file()->d_instantiate(), but I faced some issues. This optimisation we may consider in future along with sysfs, procfs.
-Ajay
Ajay Kaher akaher@vmware.com writes:
Till now /sys/kernel/debug/tracing/events is a part of tracefs, with-in this patch creating 'events' and it's sub-dir as eventfs. Basically replacing tracefs calls with eventfs calls for 'events'.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com
fs/tracefs/inode.c | 18 ++++++++++ include/linux/trace_events.h | 1 + kernel/trace/trace.h | 2 +- kernel/trace/trace_events.c | 67 +++++++++++++++++++----------------- 4 files changed, 55 insertions(+), 33 deletions(-)
With this patchset in next-20230908 the following crash is observed on s390 while running ftracetest test.d/instances/instance-event.tc:
(It also crashes without KASAN, just not on every try)
[ 26.728436] ================================================================== [ 26.728450] BUG: KASAN: slab-use-after-free in __ftrace_event_enable_disable+0x56c/0x648 [ 26.728460] Read of size 8 at addr 00000000107d3838 by task ftracetest/514 [ 26.728465] [ 26.728468] CPU: 1 PID: 514 Comm: ftracetest Not tainted 6.5.0-rc4-00014-g27152bceea1d #958 [ 26.728474] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0) [ 26.728478] Call Trace: [ 26.728482] [<000000000262349e>] dump_stack_lvl+0x106/0x1c8 [ 26.728490] [<00000000009e3b8c>] print_address_description.constprop.0+0x34/0x378 [ 26.728498] [<00000000009e3f7c>] print_report+0xac/0x240 [ 26.728504] [<00000000009e42fa>] kasan_report+0xf2/0x130 [ 26.728509] [<00000000005c61c4>] __ftrace_event_enable_disable+0x56c/0x648 [ 26.728516] [<00000000005c6d02>] event_enable_write+0x132/0x218 [ 26.728522] [<0000000000a89598>] vfs_write+0x208/0x930 [ 26.728528] [<0000000000a8a148>] ksys_write+0x118/0x200 [ 26.728535] [<000000000010c5fc>] do_syscall+0x22c/0x328 [ 26.728547] [<000000000268e8b2>] __do_syscall+0x9a/0xf8 [ 26.728553] [<00000000026c1d28>] system_call+0x70/0x98 [ 26.728559] 3 locks held by ftracetest/514: [ 26.728562] #0: 00000000226addd8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x84/0xa0 [ 26.728577] #1: 0000000008f19440 (sb_writers#14){.+.+}-{0:0}, at: ksys_write+0x118/0x200 [ 26.728594] #2: 00000000033205d0 (event_mutex){+.+.}-{3:3}, at: event_enable_write+0xdc/0x218 [ 26.728609] [ 26.728613] Allocated by task 515: [ 26.728617] kasan_save_stack+0x40/0x68 [ 26.728627] kasan_set_track+0x36/0x48 [ 26.728632] __kasan_slab_alloc+0x8e/0xb8 [ 26.728638] kmem_cache_alloc+0x1ec/0x568 [ 26.728643] trace_create_new_event+0x88/0x3d8 [ 26.728648] event_trace_add_tracer+0x13a/0x2a0 [ 26.728654] trace_array_create_dir+0x80/0x1a0 [ 26.728660] trace_array_create+0x344/0x700 [ 26.728665] instance_mkdir+0xbe/0x118 [ 26.728670] tracefs_syscall_mkdir+0xa8/0xf8 [ 26.728678] vfs_mkdir+0x454/0x6c0 [ 26.728685] do_mkdirat+0x24e/0x2a8 [ 26.728692] __s390x_sys_mkdir+0xf4/0x168 [ 26.728696] do_syscall+0x22c/0x328 [ 26.728700] __do_syscall+0x9a/0xf8 [ 26.728704] system_call+0x70/0x98 [ 26.728710] [ 26.728713] Freed by task 553: [ 26.728716] kasan_save_stack+0x40/0x68 [ 26.728721] kasan_set_track+0x36/0x48 [ 26.728726] kasan_save_free_info+0x42/0x60 [ 26.728733] ____kasan_slab_free+0x17c/0x1e0 [ 26.728741] __kasan_slab_free+0x24/0x30 [ 26.728747] slab_free_freelist_hook+0x272/0x3a0 [ 26.728751] kmem_cache_free+0xec/0x540 [ 26.728757] event_trace_del_tracer+0x108/0x1f8 [ 26.728763] __remove_instance+0x242/0x668 [ 26.728781] instance_rmdir+0xc4/0x108 [ 26.728788] tracefs_syscall_rmdir+0xd4/0x160 [ 26.728796] vfs_rmdir+0x18a/0x538 [ 26.728802] do_rmdir+0x2c0/0x358 [ 26.728807] __s390x_sys_rmdir+0xd2/0x138 [ 26.728811] do_syscall+0x22c/0x328 [ 26.728817] __do_syscall+0x9a/0xf8 [ 26.728823] system_call+0x70/0x98 [ 26.728829] [ 26.728833] The buggy address belongs to the object at 00000000107d3828 [ 26.728833] which belongs to the cache trace_event_file of size 96 [ 26.728841] The buggy address is located 16 bytes inside of [ 26.728841] freed 96-byte region [00000000107d3828, 00000000107d3888) [ 26.728846] [ 26.728848] The buggy address belongs to the physical page: [ 26.728851] page:000040000041f4c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x107d3 [ 26.728871] flags: 0x1ffff00000000200(slab|node=0|zone=0|lastcpupid=0x1ffff) [ 26.728880] page_type: 0xffffffff() [ 26.728889] raw: 1ffff00000000200 0000000005a2c100 0000400000891490 00004000006a89d0 [ 26.728894] raw: 0000000000000000 0013002600000000 ffffffff00000001 0000000000000000 [ 26.728898] page dumped because: kasan: bad access detected [ 26.728902] [ 26.728906] Memory state around the buggy address: [ 26.728912] 00000000107d3700: fc fc fc fc fc fc fc fc fc fc fc fa fb fb fb fb [ 26.728919] 00000000107d3780: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc [ 26.728928] >00000000107d3800: fc fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb [ 26.728933] ^ [ 26.728940] 00000000107d3880: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fa [ 26.728943] 00000000107d3900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc [ 26.728946] ================================================================== [ 26.728955] Disabling lock debugging due to kernel taint [ 27.089717] Unable to handle kernel pointer dereference in virtual kernel address space [ 27.089725] Failing address: 0d896d6d6d6d6000 TEID: 0d896d6d6d6d6803 [ 27.089729] Fault in home space mode while using kernel ASCE. [ 27.089737] AS:00000000046b800b R2:0000000000000028 [ 27.089770] Oops: 0038 ilc:2 [#1] PREEMPT SMP [ 27.089779] Modules linked in: [ 27.089784] CPU: 1 PID: 514 Comm: ftracetest Tainted: G B 6.5.0-rc4-00014-g27152bceea1d #958 [ 27.089790] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0) [ 27.089794] Krnl PSW : 0404e00180000000 00000000005c4488 (trace_event_buffer_reserve+0xf0/0x3e8) [ 27.089804] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 [ 27.089810] Krnl GPRS: 0000000000000001 0000000000000001 6b6b6b6b6b6b6c5b 001c000000000000 [ 27.089816] 0d896d6d6d6d6d8b 00000000005c4702 001bff80013cf170 6b6b6b6b6b6b6b6b [ 27.089822] 6b6b6b6b6b6b6b6b 00037ff000279e26 00000000107d3828 001bff80013cf208 [ 27.089827] 00000000031669f8 00000000026c1c1c 00000000005c4702 001bff80013cf078 [ 27.089843] Krnl Code: 00000000005c447a: 412080f0 la %r2,240(%r8) [ 27.089843] 00000000005c447e: eb420003000c srlg %r4,%r2,3 [ 27.089843] #00000000005c4484: b9080043 agr %r4,%r3 [ 27.089843] >00000000005c4488: 95004000 cli 0(%r4),0 [ 27.089843] 00000000005c448c: a7740142 brc 7,00000000005c4710 [ 27.089843] 00000000005c4490: e3e080f00004 lg %r14,240(%r8) [ 27.089843] 00000000005c4496: e3e0f0a80024 stg %r14,168(%r15) [ 27.089843] 00000000005c449c: 412080f8 la %r2,248(%r8) [ 27.089885] Call Trace: [ 27.089890] [<00000000005c4488>] trace_event_buffer_reserve+0xf0/0x3e8 [ 27.089897] ([<00000000005c4702>] trace_event_buffer_reserve+0x36a/0x3e8) [ 27.089903] [<000000000028979e>] trace_event_raw_event_sched_switch+0xce/0x430 [ 27.089910] [<00000000026a1cd4>] __schedule+0xba4/0x1f58 [ 27.089916] [<00000000026a4544>] preempt_schedule_irq+0xdc/0x190 [ 27.089925] [<000000000268f99c>] irqentry_exit+0xac/0x128 [ 27.089931] [<00000000026c1fac>] ext_int_handler+0xc4/0xf0 [ 27.089936] [<0000000000135a98>] __unwind_start+0x190/0x568 [ 27.089944] ([<0000000000135a7c>] __unwind_start+0x174/0x568) [ 27.089950] [<000000000013bb90>] arch_stack_walk+0x130/0x1d0 [ 27.089957] [<0000000000424198>] stack_trace_save+0xc8/0xf8 [ 27.089968] [<00000000009d3468>] set_track_prepare+0x38/0x58 [ 27.089974] [<00000000009df35a>] free_to_partial_list+0x1ea/0x2f0 [ 27.089979] [<00000000009df656>] __slab_free+0x1f6/0x3c0 [ 27.089984] [<00000000009e19e8>] ___cache_free+0x158/0x188 [ 27.089989] [<00000000009e6df6>] qlist_free_all+0x7e/0x150 [ 27.089994] [<00000000009e752a>] kasan_quarantine_reduce+0x17a/0x1d8 [ 27.089999] [<00000000009e3488>] __kasan_slab_alloc+0x98/0xb8 [ 27.090005] [<00000000009de3fc>] kmem_cache_alloc+0x1ec/0x568 [ 27.090011] [<0000000000ab1d8e>] getname_flags.part.0+0x56/0x420 [ 27.090018] [<0000000000a81244>] do_sys_openat2+0xc4/0x178 00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00. 00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01. [ 27.090027] [<0000000000a815ae>] do_sys_open+0xce/0x118 [ 27.090033] [<000000000010c5fc>] do_syscall+0x22c/0x328 [ 27.090040] [<000000000268e8b2>] __do_syscall+0x9a/0xf8 [ 27.090048] [<00000000026c1d28>] system_call+0x70/0x98 [ 27.090056] INFO: lockdep is turned off. [ 27.090060] Last Breaking-Event-Address: [ 27.090066] [<00000000005c4702>] trace_event_buffer_reserve+0x36a/0x3e8 [ 27.090077] Kernel panic - not syncing: Fatal exception: panic_on_oops
On Fri, 08 Sep 2023 14:14:20 +0200 Sven Schnelle svens@linux.ibm.com wrote:
Ajay Kaher akaher@vmware.com writes:
Till now /sys/kernel/debug/tracing/events is a part of tracefs, with-in this patch creating 'events' and it's sub-dir as eventfs. Basically replacing tracefs calls with eventfs calls for 'events'.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com
fs/tracefs/inode.c | 18 ++++++++++ include/linux/trace_events.h | 1 + kernel/trace/trace.h | 2 +- kernel/trace/trace_events.c | 67 +++++++++++++++++++----------------- 4 files changed, 55 insertions(+), 33 deletions(-)
With this patchset in next-20230908 the following crash is observed on s390 while running ftracetest test.d/instances/instance-event.tc:
(It also crashes without KASAN, just not on every try)
Yep, and I've spent the last few days fixing this :-)
https://lore.kernel.org/linux-trace-kernel/20230907024710.866917011@goodmis....
-- Steve
kprobe_args_char.tc, kprobe_args_string.tc has validation check for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org --- .../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc | 4 ++-- .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc index 285b4770efad..523cfb64539f 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc @@ -34,14 +34,14 @@ mips*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t'" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index a4f8e7c53c1f..b9f8c3f8bae8 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -37,14 +37,14 @@ loongarch*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
On Thu, 13 Jul 2023 17:03:24 +0530 Ajay Kaher akaher@vmware.com wrote:
kprobe_args_char.tc, kprobe_args_string.tc has validation check for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org
As this patch as is will break when running on older kernels, I was wondering if we should do this instead?
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc index 285b4770efad..ff7499eb98d6 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc @@ -34,14 +34,19 @@ mips*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events +if grep -q eventfs_add_dir available_filter_functions; then + DIR_NAME="eventfs_add_dir" +else + DIR_NAME="tracefs_create_dir" +fi +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t'" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index a4f8e7c53c1f..a202b2ea4baf 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -37,14 +37,19 @@ loongarch*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events +if grep -q eventfs_add_dir available_filter_functions; then + DIR_NAME="eventfs_add_dir" +else + DIR_NAME="tracefs_create_dir" +fi +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
-- Steve
.../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc | 4 ++-- .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc index 285b4770efad..523cfb64539f 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc @@ -34,14 +34,14 @@ mips*) esac : "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t'" trace echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index a4f8e7c53c1f..b9f8c3f8bae8 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -37,14 +37,14 @@ loongarch*) esac : "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
On Thu, 13 Jul 2023 22:37:58 -0400 Steven Rostedt rostedt@goodmis.org wrote:
On Thu, 13 Jul 2023 17:03:24 +0530 Ajay Kaher akaher@vmware.com wrote:
kprobe_args_char.tc, kprobe_args_string.tc has validation check for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org
As this patch as is will break when running on older kernels, I was wondering if we should do this instead?
+1 since the latest kselftest is used also for checking the older stable kernels, the test case has to check the environment and change the parameter (or make it unsupported for new feature) So below looks good to me.
Thanks,
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc index 285b4770efad..ff7499eb98d6 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc @@ -34,14 +34,19 @@ mips*) esac : "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events +if grep -q eventfs_add_dir available_filter_functions; then
- DIR_NAME="eventfs_add_dir"
+else
- DIR_NAME="tracefs_create_dir"
+fi +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t'" trace echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index a4f8e7c53c1f..a202b2ea4baf 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -37,14 +37,19 @@ loongarch*) esac : "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events +if grep -q eventfs_add_dir available_filter_functions; then
- DIR_NAME="eventfs_add_dir"
+else
- DIR_NAME="tracefs_create_dir"
+fi +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
-- Steve
.../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc | 4 ++-- .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc index 285b4770efad..523cfb64539f 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc @@ -34,14 +34,14 @@ mips*) esac : "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t'" trace echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index a4f8e7c53c1f..b9f8c3f8bae8 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -37,14 +37,14 @@ loongarch*) esac : "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
On 14-Jul-2023, at 6:57 PM, Masami Hiramatsu mhiramat@kernel.org wrote:
!! External Email
On Thu, 13 Jul 2023 22:37:58 -0400 Steven Rostedt rostedt@goodmis.org wrote:
On Thu, 13 Jul 2023 17:03:24 +0530 Ajay Kaher akaher@vmware.com wrote:
kprobe_args_char.tc, kprobe_args_string.tc has validation check for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org
As this patch as is will break when running on older kernels, I was wondering if we should do this instead?
+1 since the latest kselftest is used also for checking the older stable kernels, the test case has to check the environment and change the parameter (or make it unsupported for new feature) So below looks good to me.
+1, many ftrace tests are unsupported in my setup and may few require changes, not sure. Does any auto job takes care of this?
- Ajay
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc index 285b4770efad..ff7499eb98d6 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc @@ -34,14 +34,19 @@ mips*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events +if grep -q eventfs_add_dir available_filter_functions; then
- DIR_NAME="eventfs_add_dir"
+else
- DIR_NAME="tracefs_create_dir"
+fi +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t'" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index a4f8e7c53c1f..a202b2ea4baf 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -37,14 +37,19 @@ loongarch*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events +if grep -q eventfs_add_dir available_filter_functions; then
- DIR_NAME="eventfs_add_dir"
+else
- DIR_NAME="tracefs_create_dir"
+fi +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
-- Steve
.../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc | 4 ++-- .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc index 285b4770efad..523cfb64539f 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc @@ -34,14 +34,14 @@ mips*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t'" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index a4f8e7c53c1f..b9f8c3f8bae8 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -37,14 +37,14 @@ loongarch*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
On Mon, 17 Jul 2023 05:24:43 +0000 Ajay Kaher akaher@vmware.com wrote:
As this patch as is will break when running on older kernels, I was wondering if we should do this instead?
+1 since the latest kselftest is used also for checking the older stable kernels, the test case has to check the environment and change the parameter (or make it unsupported for new feature) So below looks good to me.
+1, many ftrace tests are unsupported in my setup and may few require changes, not sure. Does any auto job takes care of this?
You mean like some kernel CI? Not that I know of.
Shuah, do you run these selftests on older kernels to make sure they don't fail just because the test is unsupported?
-- Steve
On Thu, 13 Jul 2023 17:03:14 +0530 Ajay Kaher akaher@vmware.com wrote:
Events Tracing infrastructure contains lot of files, directories (internally in terms of inodes, dentries). And ends up by consuming memory in MBs. We can have multiple events of Events Tracing, which further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and skip the creation of inodes/dentries. As and when require, eventfs will create the inodes/dentries only for required files/directories. Also eventfs would delete the inodes/dentries once no more requires but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB for ~10K files/dir.
I think we are very close to getting this in for the next merge window. I ran several tests and so far it's holding up!
I made a bunch of nits for this series, but nothing major. Mostly fixing up change logs and comments, as well as some naming conventions and reorganizing the series a little bit.
Anyway, I'm hoping that v5 will be ready to go into linux-next.
Thanks a lot Ajay for working on this!
-- Steve
v3: Patch 3,4,5,7,9: removed all the eventfs_rwsem code and replaced it with an srcu lock for the readers, and a mutex to synchronize the writers of the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really dangerous to have unchecked recursion in the kernel (we do have a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods are done, it adds the eventfs_file onto a llist (lockless link list) and wakes up a work queue. Then the work queue does the freeing (this needs to be done in task/workqueue context, as srcu callbacks are done in softirq context).
Patch 6: renamed: eventfs_create_file() -> create_file() eventfs_create_dir() -> create_dir()
v2: Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM' Patch 02: moved from v1 1/9 Patch 03: moved from v1 2/9 As suggested by Zheng Yejian, introduced eventfs_prepare_ef() helper function to add files or directories to eventfs fix WARNING reported by kernel test robot in v1 8/9 Patch 04: moved from v1 3/9 used eventfs_prepare_ef() to add files fix WARNING reported by kernel test robot in v1 8/9 Patch 05: moved from v1 4/9 fix compiling warning reported by kernel test robot in v1 4/9 Patch 06: moved from v1 5/9 Patch 07: moved from v1 6/9 Patch 08: moved from v1 7/9 Patch 09: moved from v1 8/9 rebased because of v3 01/10 Patch 10: moved from v1 9/9
v1: Patch 1: add header file Patch 2: resolved kernel test robot issues protecting eventfs lists using nested eventfs_rwsem Patch 3: protecting eventfs lists using nested eventfs_rwsem Patch 4: improve events cleanup code to fix crashes Patch 5: resolved kernel test robot issues removed d_instantiate_anon() calls Patch 6: resolved kernel test robot issues fix kprobe test in eventfs_root_lookup() protecting eventfs lists using nested eventfs_rwsem Patch 7: remove header file Patch 8: pass eventfs_rwsem as argument to eventfs functions called eventfs_remove_events_dir() instead of tracefs_remove() from event_trace_del_tracer() Patch 9: new patch to fix kprobe test case
Ajay Kaher (9): tracefs: Rename some tracefs function eventfs: Implement eventfs dir creation functions eventfs: Implement eventfs file add functions eventfs: Implement eventfs file, directory remove function eventfs: Implement functions to create eventfs files and directories eventfs: Implement eventfs lookup, read, open functions eventfs: Implement tracefs_inode_cache eventfs: Move tracing/events to eventfs test: ftrace: Fix kprobe test for eventfs
Steven Rostedt (Google) (1): tracing: Require all trace events to have a TRACE_SYSTEM
fs/tracefs/Makefile | 1 + fs/tracefs/event_inode.c | 711 ++++++++++++++++++ fs/tracefs/inode.c | 124 ++- fs/tracefs/internal.h | 25 + include/linux/trace_events.h | 1 + include/linux/tracefs.h | 32 + kernel/trace/trace.h | 2 +- kernel/trace/trace_events.c | 78 +- .../ftrace/test.d/kprobe/kprobe_args_char.tc | 4 +- .../test.d/kprobe/kprobe_args_string.tc | 4 +- 10 files changed, 930 insertions(+), 52 deletions(-) create mode 100644 fs/tracefs/event_inode.c create mode 100644 fs/tracefs/internal.h
On 15-Jul-2023, at 4:28 AM, Steven Rostedt rostedt@goodmis.org wrote:
!! External Email
On Thu, 13 Jul 2023 17:03:14 +0530 Ajay Kaher akaher@vmware.com wrote:
Events Tracing infrastructure contains lot of files, directories (internally in terms of inodes, dentries). And ends up by consuming memory in MBs. We can have multiple events of Events Tracing, which further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and skip the creation of inodes/dentries. As and when require, eventfs will create the inodes/dentries only for required files/directories. Also eventfs would delete the inodes/dentries once no more requires but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB for ~10K files/dir.
I think we are very close to getting this in for the next merge window. I ran several tests and so far it's holding up!
I made a bunch of nits for this series, but nothing major. Mostly fixing up change logs and comments, as well as some naming conventions and reorganizing the series a little bit.
Anyway, I'm hoping that v5 will be ready to go into linux-next.
Thanks a lot Ajay for working on this!
Thanks Steve, hopefully I will fix all the pending nits in v5. Here is the checkpatch.pl report:
./scripts/checkpatch.pl v4/* -------------------------- v4/0000-cover-letter.patch -------------------------- total: 0 errors, 0 warnings, 0 lines checked
v4/0000-cover-letter.patch has no obvious style problems and is ready for submission. ------------------------------------------------------------------ v4/0001-tracing-Require-all-trace-events-to-have-a-TRACE_SYS.patch ------------------------------------------------------------------ total: 0 errors, 0 warnings, 22 lines checked
v4/0001-tracing-Require-all-trace-events-to-have-a-TRACE_SYS.patch has no obvious style problems and is ready for submission. -------------------------------------------------- v4/0002-tracefs-Rename-some-tracefs-function.patch -------------------------------------------------- total: 0 errors, 0 warnings, 71 lines checked
v4/0002-tracefs-Rename-some-tracefs-function.patch has no obvious style problems and is ready for submission. -------------------------------------------------------------- v4/0003-eventfs-Implement-eventfs-dir-creation-functions.patch -------------------------------------------------------------- WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? #52: new file mode 100644
WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'. #194: FILE: fs/tracefs/event_inode.c:138: + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'. #229: FILE: fs/tracefs/event_inode.c:173: + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'. #261: FILE: fs/tracefs/event_inode.c:205: + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
total: 0 errors, 4 warnings, 297 lines checked
NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace.
v4/0003-eventfs-Implement-eventfs-dir-creation-functions.patch has style problems, please review. ---------------------------------------------------------- v4/0004-eventfs-Implement-eventfs-file-add-functions.patch ---------------------------------------------------------- total: 0 errors, 0 warnings, 101 lines checked
v4/0004-eventfs-Implement-eventfs-file-add-functions.patch has no obvious style problems and is ready for submission. ------------------------------------------------------------------ v4/0005-eventfs-Implement-eventfs-file-directory-remove-func.patch ------------------------------------------------------------------ total: 0 errors, 0 warnings, 129 lines checked
v4/0005-eventfs-Implement-eventfs-file-directory-remove-func.patch has no obvious style problems and is ready for submission. ------------------------------------------------------------------ v4/0006-eventfs-Implement-functions-to-create-eventfs-files-.patch ------------------------------------------------------------------ total: 0 errors, 0 warnings, 182 lines checked
v4/0006-eventfs-Implement-functions-to-create-eventfs-files-.patch has no obvious style problems and is ready for submission. ------------------------------------------------------------------ v4/0007-eventfs-Implement-eventfs-lookup-read-open-functions.patch ------------------------------------------------------------------ total: 0 errors, 0 warnings, 224 lines checked
v4/0007-eventfs-Implement-eventfs-lookup-read-open-functions.patch has no obvious style problems and is ready for submission. --------------------------------------------------- v4/0008-eventfs-Implement-tracefs_inode_cache.patch --------------------------------------------------- total: 0 errors, 0 warnings, 68 lines checked
v4/0008-eventfs-Implement-tracefs_inode_cache.patch has no obvious style problems and is ready for submission. ---------------------------------------------------- v4/0009-eventfs-Move-tracing-events-to-eventfs.patch ---------------------------------------------------- total: 0 errors, 0 warnings, 241 lines checked
v4/0009-eventfs-Move-tracing-events-to-eventfs.patch has no obvious style problems and is ready for submission. ----------------------------------------------------- v4/0010-test-ftrace-Fix-kprobe-test-for-eventfs.patch ----------------------------------------------------- total: 0 errors, 0 warnings, 32 lines checked
v4/0010-test-ftrace-Fix-kprobe-test-for-eventfs.patch has no obvious style problems and is ready for submission.
-Ajay
On Sun, 16 Jul 2023 17:32:35 +0000 Ajay Kaher akaher@vmware.com wrote:
Thanks Steve, hopefully I will fix all the pending nits in v5. Here is the checkpatch.pl report:
Hold off on v5. I hit the following on v4:
[ 220.170527] BUG: unable to handle page fault for address: fffffffffffffff0 [ 220.172792] #PF: supervisor read access in kernel mode [ 220.174618] #PF: error_code(0x0000) - not-present page [ 220.176516] PGD 13104d067 P4D 13104d067 PUD 13104f067 PMD 0 [ 220.178559] Oops: 0000 [#1] PREEMPT SMP PTI [ 220.180087] CPU: 3 PID: 35 Comm: kworker/u8:1 Not tainted 6.5.0-rc1-test-00021-gdd6e7af33766-dirty #15 [ 220.183441] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 220.186629] Workqueue: events_unbound eventfs_workfn [ 220.188286] RIP: 0010:eventfs_set_ef_status_free+0x17/0x40 [ 220.190091] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 47 18 48 8b 40 30 48 83 f8 10 74 1b <f6> 40 f0 02 74 15 48 8b 47 78 48 85 c0 74 0c c6 40 5a 00 48 c7 40 [ 220.195360] RSP: 0018:ffffa731c0147e20 EFLAGS: 00010287 [ 220.196802] RAX: 0000000000000000 RBX: ffff97ca512ca000 RCX: 0000000000000000 [ 220.198703] RDX: 0000000000000001 RSI: ffff97ca52d18010 RDI: ffff97ca512ca000 [ 220.200540] RBP: ffff97ca52cb3780 R08: 0000000000000064 R09: 00000000802a0022 [ 220.202324] R10: 0000000000039e80 R11: ffff97cabffd5000 R12: ffff97ca512ca058 [ 220.204012] R13: ffff97ca52cb3780 R14: ffff97ca40153705 R15: ffffffffad5c1848 [ 220.205685] FS: 0000000000000000(0000) GS:ffff97cabbd80000(0000) knlGS:0000000000000000 [ 220.207476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 220.208764] CR2: fffffffffffffff0 CR3: 000000010a01a001 CR4: 0000000000170ee0 [ 220.210342] Call Trace: [ 220.210879] <TASK> [ 220.211359] ? __die+0x23/0x70 [ 220.212036] ? page_fault_oops+0xa4/0x180 [ 220.212904] ? exc_page_fault+0xf6/0x190 [ 220.213738] ? asm_exc_page_fault+0x26/0x30 [ 220.214586] ? eventfs_set_ef_status_free+0x17/0x40 [ 220.216081] tracefs_dentry_iput+0x39/0x50 [ 220.217370] __dentry_kill+0xdc/0x170 [ 220.218581] dput+0x142/0x310 [ 220.219647] eventfs_workfn+0x42/0x70 [ 220.220805] process_one_work+0x1e2/0x3e0 [ 220.222031] worker_thread+0x1da/0x390 [ 220.223204] ? __pfx_worker_thread+0x10/0x10 [ 220.224476] kthread+0xf7/0x130 [ 220.225543] ? __pfx_kthread+0x10/0x10 [ 220.226735] ret_from_fork+0x2c/0x50 [ 220.227898] </TASK> [ 220.228792] Modules linked in: [ 220.229860] CR2: fffffffffffffff0 [ 220.230960] ---[ end trace 0000000000000000 ]---
I think I know the issue, and looking to see if I can fix it.
-- Steve
On 18-Jul-2023, at 7:10 PM, Steven Rostedt rostedt@goodmis.org wrote:
!! External Email
On Sun, 16 Jul 2023 17:32:35 +0000 Ajay Kaher akaher@vmware.com wrote:
Thanks Steve, hopefully I will fix all the pending nits in v5. Here is the checkpatch.pl report:
Hold off on v5. I hit the following on v4:
OK.
[ 220.170527] BUG: unable to handle page fault for address: fffffffffffffff0 [ 220.172792] #PF: supervisor read access in kernel mode [ 220.174618] #PF: error_code(0x0000) - not-present page [ 220.176516] PGD 13104d067 P4D 13104d067 PUD 13104f067 PMD 0 [ 220.178559] Oops: 0000 [#1] PREEMPT SMP PTI [ 220.180087] CPU: 3 PID: 35 Comm: kworker/u8:1 Not tainted 6.5.0-rc1-test-00021-gdd6e7af33766-dirty #15 [ 220.183441] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 220.186629] Workqueue: events_unbound eventfs_workfn [ 220.188286] RIP: 0010:eventfs_set_ef_status_free+0x17/0x40 [ 220.190091] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 47 18 48 8b 40 30 48 83 f8 10 74 1b <f6> 40 f0 02 74 15 48 8b 47 78 48 85 c0 74 0c c6 40 5a 00 48 c7 40 [ 220.195360] RSP: 0018:ffffa731c0147e20 EFLAGS: 00010287 [ 220.196802] RAX: 0000000000000000 RBX: ffff97ca512ca000 RCX: 0000000000000000 [ 220.198703] RDX: 0000000000000001 RSI: ffff97ca52d18010 RDI: ffff97ca512ca000 [ 220.200540] RBP: ffff97ca52cb3780 R08: 0000000000000064 R09: 00000000802a0022 [ 220.202324] R10: 0000000000039e80 R11: ffff97cabffd5000 R12: ffff97ca512ca058 [ 220.204012] R13: ffff97ca52cb3780 R14: ffff97ca40153705 R15: ffffffffad5c1848 [ 220.205685] FS: 0000000000000000(0000) GS:ffff97cabbd80000(0000) knlGS:0000000000000000 [ 220.207476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 220.208764] CR2: fffffffffffffff0 CR3: 000000010a01a001 CR4: 0000000000170ee0 [ 220.210342] Call Trace: [ 220.210879] <TASK> [ 220.211359] ? __die+0x23/0x70 [ 220.212036] ? page_fault_oops+0xa4/0x180 [ 220.212904] ? exc_page_fault+0xf6/0x190 [ 220.213738] ? asm_exc_page_fault+0x26/0x30 [ 220.214586] ? eventfs_set_ef_status_free+0x17/0x40 [ 220.216081] tracefs_dentry_iput+0x39/0x50 [ 220.217370] __dentry_kill+0xdc/0x170 [ 220.218581] dput+0x142/0x310 [ 220.219647] eventfs_workfn+0x42/0x70 [ 220.220805] process_one_work+0x1e2/0x3e0 [ 220.222031] worker_thread+0x1da/0x390 [ 220.223204] ? __pfx_worker_thread+0x10/0x10 [ 220.224476] kthread+0xf7/0x130 [ 220.225543] ? __pfx_kthread+0x10/0x10 [ 220.226735] ret_from_fork+0x2c/0x50 [ 220.227898] </TASK> [ 220.228792] Modules linked in: [ 220.229860] CR2: fffffffffffffff0 [ 220.230960] ---[ end trace 0000000000000000 ]---
I think I know the issue, and looking to see if I can fix it.
- Is it also reproducible on v3? - Is it manually reproducible or reproducible using any specific script?
Let me know if I can help.
-Ajay
On Wed, 19 Jul 2023 10:25:28 +0000 Ajay Kaher akaher@vmware.com wrote:
- Is it also reproducible on v3?
- Is it manually reproducible or reproducible using any specific script?
Let me know if I can help.
Just tried it against v3, and it gave me the splat that I originally had and starting to fix, which now gives me another splat. I'll spend a couple more days on it and start sharing code and seeing if we can work together on this.
Here's the reproducer (of both v3 splat and the bug I'm hitting now).
~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
v3 gives me (and my updates too)
====================================================== WARNING: possible circular locking dependency detected 6.5.0-rc1-test+ #576 Not tainted ------------------------------------------------------ trace-cmd/840 is trying to acquire lock: ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
but task is already holding lock: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (eventfs_rwsem/1){.+.+}-{3:3}: down_read_nested+0x41/0x180 eventfs_root_lookup+0x42/0x120 __lookup_slow+0xff/0x1b0 walk_component+0xdb/0x150 path_lookupat+0x67/0x1a0 filename_lookup+0xe4/0x1f0 vfs_statx+0x9e/0x180 vfs_fstatat+0x51/0x70 __do_sys_newfstatat+0x3f/0x80 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
-> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}: __lock_acquire+0x165d/0x2390 lock_acquire+0xd4/0x2d0 down_write+0x3b/0xd0 dcache_dir_open_wrapper+0xc1/0x1b0 do_dentry_open+0x20c/0x510 path_openat+0x7ad/0xc60 do_filp_open+0xaf/0x160 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x6a/0xa0 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- rlock(eventfs_rwsem/1); lock(&sb->s_type->i_mutex_key#5); lock(eventfs_rwsem/1); lock(&sb->s_type->i_mutex_key#5);
*** DEADLOCK ***
1 lock held by trace-cmd/840: #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
stack backtrace: CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x90 check_noncircular+0x14b/0x160 __lock_acquire+0x165d/0x2390 lock_acquire+0xd4/0x2d0 ? dcache_dir_open_wrapper+0xc1/0x1b0 down_write+0x3b/0xd0 ? dcache_dir_open_wrapper+0xc1/0x1b0 dcache_dir_open_wrapper+0xc1/0x1b0 ? __pfx_dcache_dir_open_wrapper+0x10/0x10 do_dentry_open+0x20c/0x510 path_openat+0x7ad/0xc60 do_filp_open+0xaf/0x160 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x6a/0xa0 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f1743267e41 Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00 RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41 RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040 R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0 R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b </TASK>
I moved the code around a bit, and it appears that kprobes is getting dput() more than once.
I moved the d_invalidate() and dput() into the workqueue function, and on kprobes going away, d_invalidate() frees it, and dput() is now corrupted.
Still investigating. The VFS layer is a magic box that needs the right wizard hat to deal with, but I unfortunately am waiting on back order to retrieve that specific hat :-p
-- Steve
On 19-Jul-2023, at 7:53 PM, Steven Rostedt rostedt@goodmis.org wrote:
!! External Email
On Wed, 19 Jul 2023 10:25:28 +0000 Ajay Kaher akaher@vmware.com wrote:
- Is it also reproducible on v3?
- Is it manually reproducible or reproducible using any specific script?
Let me know if I can help.
Just tried it against v3, and it gave me the splat that I originally had and starting to fix, which now gives me another splat. I'll spend a couple more days on it and start sharing code and seeing if we can work together on this.
Here's the reproducer (of both v3 splat and the bug I'm hitting now).
~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
I tried above steps on v4 but couldn’t reproduce:
root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ enable filter format id trigger root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events -bash: echo: write error: No such file or directory
I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list will have unordered list and could cause problem.
v3 gives me (and my updates too)
====================================================== WARNING: possible circular locking dependency detected 6.5.0-rc1-test+ #576 Not tainted
trace-cmd/840 is trying to acquire lock: ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
but task is already holding lock: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (eventfs_rwsem/1){.+.+}-{3:3}: down_read_nested+0x41/0x180 eventfs_root_lookup+0x42/0x120 __lookup_slow+0xff/0x1b0 walk_component+0xdb/0x150 path_lookupat+0x67/0x1a0 filename_lookup+0xe4/0x1f0 vfs_statx+0x9e/0x180 vfs_fstatat+0x51/0x70 __do_sys_newfstatat+0x3f/0x80 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
-> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}: __lock_acquire+0x165d/0x2390 lock_acquire+0xd4/0x2d0 down_write+0x3b/0xd0 dcache_dir_open_wrapper+0xc1/0x1b0 do_dentry_open+0x20c/0x510 path_openat+0x7ad/0xc60 do_filp_open+0xaf/0x160 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x6a/0xa0 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ----
rlock(eventfs_rwsem/1); lock(&sb->s_type->i_mutex_key#5); lock(eventfs_rwsem/1); lock(&sb->s_type->i_mutex_key#5);
*** DEADLOCK ***
1 lock held by trace-cmd/840: #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
stack backtrace: CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0x57/0x90 check_noncircular+0x14b/0x160 __lock_acquire+0x165d/0x2390 lock_acquire+0xd4/0x2d0 ? dcache_dir_open_wrapper+0xc1/0x1b0 down_write+0x3b/0xd0 ? dcache_dir_open_wrapper+0xc1/0x1b0 dcache_dir_open_wrapper+0xc1/0x1b0 ? __pfx_dcache_dir_open_wrapper+0x10/0x10 do_dentry_open+0x20c/0x510 path_openat+0x7ad/0xc60 do_filp_open+0xaf/0x160 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x6a/0xa0 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f1743267e41 Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00 RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41 RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040 R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0 R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b </TASK>
This is expected from v3 (just ignore as of now), if eventfs_set_ef_status_free crash not reproduced on v3 then it’s v4 issue.
-Ajay
I moved the code around a bit, and it appears that kprobes is getting dput() more than once.
I moved the d_invalidate() and dput() into the workqueue function, and on kprobes going away, d_invalidate() frees it, and dput() is now corrupted.
Still investigating. The VFS layer is a magic box that needs the right wizard hat to deal with, but I unfortunately am waiting on back order to retrieve that specific hat :-p
-- Steve
!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.
On Wed, 19 Jul 2023 18:37:12 +0000 Ajay Kaher akaher@vmware.com wrote:
Here's the reproducer (of both v3 splat and the bug I'm hitting now).
~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
I tried above steps on v4 but couldn’t reproduce:
root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ enable filter format id trigger root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events -bash: echo: write error: No such file or directory
I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list will have unordered list and could cause problem.
I modified the srcu portion a bit. Will post soon. I think I got something working.
I'm having doubt that the dput()s were needed in the eventfs_remove_rec(), as the d_invalidate() appears to be enough. I'm still testing.
v3 gives me (and my updates too)
====================================================== WARNING: possible circular locking dependency detected 6.5.0-rc1-test+ #576 Not tainted
trace-cmd/840 is trying to acquire lock: ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
but task is already holding lock: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is: -> #1 (eventfs_rwsem/1){.+.+}-{3:3}: down_read_nested+0x41/0x180 eventfs_root_lookup+0x42/0x120 __lookup_slow+0xff/0x1b0 walk_component+0xdb/0x150 path_lookupat+0x67/0x1a0 filename_lookup+0xe4/0x1f0 vfs_statx+0x9e/0x180 vfs_fstatat+0x51/0x70 __do_sys_newfstatat+0x3f/0x80 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}: __lock_acquire+0x165d/0x2390 lock_acquire+0xd4/0x2d0 down_write+0x3b/0xd0 dcache_dir_open_wrapper+0xc1/0x1b0 do_dentry_open+0x20c/0x510 path_openat+0x7ad/0xc60 do_filp_open+0xaf/0x160 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x6a/0xa0 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ----
rlock(eventfs_rwsem/1); lock(&sb->s_type->i_mutex_key#5); lock(eventfs_rwsem/1); lock(&sb->s_type->i_mutex_key#5);
*** DEADLOCK ***
1 lock held by trace-cmd/840: #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
stack backtrace: CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0x57/0x90 check_noncircular+0x14b/0x160 __lock_acquire+0x165d/0x2390 lock_acquire+0xd4/0x2d0 ? dcache_dir_open_wrapper+0xc1/0x1b0 down_write+0x3b/0xd0 ? dcache_dir_open_wrapper+0xc1/0x1b0 dcache_dir_open_wrapper+0xc1/0x1b0 ? __pfx_dcache_dir_open_wrapper+0x10/0x10 do_dentry_open+0x20c/0x510 path_openat+0x7ad/0xc60 do_filp_open+0xaf/0x160 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x6a/0xa0 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f1743267e41 Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00 RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41 RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040 R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0 R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b </TASK>
This is expected from v3 (just ignore as of now), if eventfs_set_ef_status_free crash not reproduced on v3 then it’s v4 issue.
The issue comes from fixing the above ;-)
-- Steve
[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]
On Wed, 19 Jul 2023 14:40:46 -0400 Steven Rostedt rostedt@goodmis.org wrote:
I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list will have unordered list and could cause problem.
I modified the srcu portion a bit. Will post soon. I think I got something working.
I'm having doubt that the dput()s were needed in the eventfs_remove_rec(), as the d_invalidate() appears to be enough. I'm still testing.
OK, I got this working and it appears to pass all my tests. I actually got it working Wednesday night, but I tried a different approach on Thursday that got rid of the evenfs_file and only used eventfs_inodes and makes the files more dynamic. There's still a couple of corner cases that are not working with this approach (the dentry counters are getting out of sync). This should not stop this from going in. I'll continue working on that approach for the next merge cycle. But for now, here's the patch to this series that works.
I'm going to post it here, and then reply to it with comments about my changes.
-- Steve
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 4db048250cdb..2718de1533e6 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -36,16 +36,36 @@ struct eventfs_file { const struct file_operations *fop; const struct inode_operations *iop; union { + struct list_head del_list; struct rcu_head rcu; - struct llist_node llist; /* For freeing after RCU */ + unsigned long is_freed; /* Freed if one of the above is set */ }; void *data; umode_t mode; - bool created; + unsigned int flags; };
static DEFINE_MUTEX(eventfs_mutex); DEFINE_STATIC_SRCU(eventfs_srcu); + +static struct dentry *eventfs_root_lookup(struct inode *dir, + struct dentry *dentry, + unsigned int flags); +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); +static int eventfs_release(struct inode *inode, struct file *file); + +static const struct inode_operations eventfs_root_dir_inode_operations = { + .lookup = eventfs_root_lookup, +}; + +static const struct file_operations eventfs_file_operations = { + .open = dcache_dir_open_wrapper, + .read = generic_read_dir, + .iterate_shared = dcache_readdir, + .llseek = generic_file_llseek, + .release = eventfs_release, +}; + /** * create_file - create a file in the tracefs filesystem * @name: the name of the file to create. @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode, * If tracefs is not enabled in the kernel, the value -%ENODEV will be * returned. */ -static struct dentry *create_dir(const char *name, umode_t mode, - struct dentry *parent, void *data, - const struct file_operations *fop, - const struct inode_operations *iop) +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data) { struct tracefs_inode *ti; struct dentry *dentry; struct inode *inode;
- WARN_ON(!S_ISDIR(mode)); - dentry = eventfs_start_creating(name, parent); if (IS_ERR(dentry)) return dentry; @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode, if (unlikely(!inode)) return eventfs_failed_creating(dentry);
- inode->i_mode = mode; - inode->i_op = iop; - inode->i_fop = fop; + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; + inode->i_op = &eventfs_root_dir_inode_operations; + inode->i_fop = &eventfs_file_operations; inode->i_private = data;
ti = get_tracefs(inode); @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry) struct tracefs_inode *ti_parent; struct eventfs_file *ef;
+ mutex_lock(&eventfs_mutex); ti_parent = get_tracefs(dentry->d_parent->d_inode); if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) - return; + goto out;
ef = dentry->d_fsdata; if (!ef) - return; - ef->created = false; + goto out; + /* + * If ef was freed, then the LSB bit is set for d_fsdata. + * But this should not happen, as it should still have a + * ref count that prevents it. Warn in case it does. + */ + if (WARN_ON_ONCE((unsigned long)ef & 1)) + goto out; + + dentry->d_fsdata = NULL; + ef->dentry = NULL; + out: + mutex_unlock(&eventfs_mutex); }
/** @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef) ti->private = ef->ei; }
+static struct dentry * +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) +{ + bool invalidate = false; + struct dentry *dentry; + + mutex_lock(&eventfs_mutex); + if (ef->is_freed) { + mutex_unlock(&eventfs_mutex); + return NULL; + } + if (ef->dentry) { + dentry = ef->dentry; + /* On dir open, up the ref count */ + if (!lookup) + dget(dentry); + mutex_unlock(&eventfs_mutex); + return dentry; + } + mutex_unlock(&eventfs_mutex); + + if (!lookup) + inode_lock(parent->d_inode); + + if (ef->ei) + dentry = create_dir(ef->name, parent, ef->data); + else + dentry = create_file(ef->name, ef->mode, parent, + ef->data, ef->fop); + + if (!lookup) + inode_unlock(parent->d_inode); + + mutex_lock(&eventfs_mutex); + if (IS_ERR_OR_NULL(dentry)) { + /* If the ef was already updated get it */ + dentry = ef->dentry; + if (dentry && !lookup) + dget(dentry); + mutex_unlock(&eventfs_mutex); + return dentry; + } + + if (!ef->dentry && !ef->is_freed) { + ef->dentry = dentry; + if (ef->ei) + eventfs_post_create_dir(ef); + dentry->d_fsdata = ef; + } else { + /* A race here, should try again (unless freed) */ + invalidate = true; + } + mutex_unlock(&eventfs_mutex); + if (invalidate) + d_invalidate(dentry); + + if (lookup || invalidate) + dput(dentry); + + return invalidate ? NULL : dentry; +} + +static bool match_event_file(struct eventfs_file *ef, const char *name) +{ + bool ret; + + mutex_lock(&eventfs_mutex); + ret = !ef->is_freed && strcmp(ef->name, name) == 0; + mutex_unlock(&eventfs_mutex); + + return ret; +} + /** * eventfs_root_lookup - lookup routine to create file/dir * @dir: directory in which lookup to be done @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef) * Used to create dynamic file/dir with-in @dir, search with-in ei * list, if @dentry found go ahead and create the file/dir */ - static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, idx = srcu_read_lock(&eventfs_srcu); list_for_each_entry_srcu(ef, &ei->e_top_files, list, srcu_read_lock_held(&eventfs_srcu)) { - if (strcmp(ef->name, dentry->d_name.name)) + if (!match_event_file(ef, dentry->d_name.name)) continue; ret = simple_lookup(dir, dentry, flags); - if (ef->created) - continue; - mutex_lock(&eventfs_mutex); - ef->created = true; - if (ef->ei) - ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent, - ef->data, ef->fop, ef->iop); - else - ef->dentry = create_file(ef->name, ef->mode, ef->d_parent, - ef->data, ef->fop); - - if (IS_ERR_OR_NULL(ef->dentry)) { - ef->created = false; - mutex_unlock(&eventfs_mutex); - } else { - if (ef->ei) - eventfs_post_create_dir(ef); - ef->dentry->d_fsdata = ef; - mutex_unlock(&eventfs_mutex); - dput(ef->dentry); - } + create_dentry(ef, ef->d_parent, true); break; } srcu_read_unlock(&eventfs_srcu, idx); @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file) struct tracefs_inode *ti; struct eventfs_inode *ei; struct eventfs_file *ef; + struct dentry *dentry; int idx;
ti = get_tracefs(inode); @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file) idx = srcu_read_lock(&eventfs_srcu); list_for_each_entry_srcu(ef, &ei->e_top_files, list, srcu_read_lock_held(&eventfs_srcu)) { - if (ef->created) - dput(ef->dentry); + mutex_lock(&eventfs_mutex); + dentry = ef->dentry; + mutex_unlock(&eventfs_mutex); + if (dentry) + dput(dentry); } srcu_read_unlock(&eventfs_srcu, idx); return dcache_dir_close(inode, file); @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) ei = ti->private; idx = srcu_read_lock(&eventfs_srcu); list_for_each_entry_rcu(ef, &ei->e_top_files, list) { - if (ef->created) { - dget(ef->dentry); - continue; - } - mutex_lock(&eventfs_mutex); - ef->created = true; - - inode_lock(dentry->d_inode); - if (ef->ei) - ef->dentry = create_dir(ef->name, ef->mode, dentry, - ef->data, ef->fop, ef->iop); - else - ef->dentry = create_file(ef->name, ef->mode, dentry, - ef->data, ef->fop); - inode_unlock(dentry->d_inode); - - if (IS_ERR_OR_NULL(ef->dentry)) { - ef->created = false; - } else { - if (ef->ei) - eventfs_post_create_dir(ef); - ef->dentry->d_fsdata = ef; - } - mutex_unlock(&eventfs_mutex); + create_dentry(ef, dentry, false); } srcu_read_unlock(&eventfs_srcu, idx); return dcache_dir_open(inode, file); }
-static const struct file_operations eventfs_file_operations = { - .open = dcache_dir_open_wrapper, - .read = generic_read_dir, - .iterate_shared = dcache_readdir, - .llseek = generic_file_llseek, - .release = eventfs_release, -}; - -static const struct inode_operations eventfs_root_dir_inode_operations = { - .lookup = eventfs_root_lookup, -}; - /** * eventfs_prepare_ef - helper function to prepare eventfs_file * @name: the name of the file/directory to create. @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name, ti_parent = get_tracefs(parent->d_inode); ei_parent = ti_parent->private;
- ef = eventfs_prepare_ef(name, - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, - &eventfs_file_operations, - &eventfs_root_dir_inode_operations, NULL); - + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); if (IS_ERR(ef)) return ef;
@@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name, if (!ef_parent) return ERR_PTR(-EINVAL);
- ef = eventfs_prepare_ef(name, - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, - &eventfs_file_operations, - &eventfs_root_dir_inode_operations, NULL); - + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); if (IS_ERR(ef)) return ef;
@@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode, return 0; }
-static LLIST_HEAD(free_list); - -static void eventfs_workfn(struct work_struct *work) -{ - struct eventfs_file *ef, *tmp; - struct llist_node *llnode; - - llnode = llist_del_all(&free_list); - llist_for_each_entry_safe(ef, tmp, llnode, llist) { - if (ef->created && ef->dentry) - dput(ef->dentry); - kfree(ef->name); - kfree(ef->ei); - kfree(ef); - } -} - -DECLARE_WORK(eventfs_work, eventfs_workfn); - static void free_ef(struct rcu_head *head) { struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
- if (!llist_add(&ef->llist, &free_list)) - return; - - queue_work(system_unbound_wq, &eventfs_work); + kfree(ef->name); + kfree(ef->ei); + kfree(ef); }
- - /** * eventfs_remove_rec - remove eventfs dir or file from list * @ef: eventfs_file to be removed. @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head) * This function recursively remove eventfs_file which * contains info of file or dir. */ -static void eventfs_remove_rec(struct eventfs_file *ef, int level) +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level) { struct eventfs_file *ef_child;
@@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level) /* search for nested folders or files */ list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, lockdep_is_held(&eventfs_mutex)) { - eventfs_remove_rec(ef_child, level + 1); + eventfs_remove_rec(ef_child, head, level + 1); } }
- if (ef->created && ef->dentry) - d_invalidate(ef->dentry); - list_del_rcu(&ef->list); - call_srcu(&eventfs_srcu, &ef->rcu, free_ef); + list_add_tail(&ef->del_list, head); }
/** @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level) */ void eventfs_remove(struct eventfs_file *ef) { + struct eventfs_file *tmp; + LIST_HEAD(ef_del_list); + struct dentry *dentry_list = NULL; + struct dentry *dentry; + if (!ef) return;
mutex_lock(&eventfs_mutex); - eventfs_remove_rec(ef, 0); + eventfs_remove_rec(ef, &ef_del_list, 0); + + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { + if (ef->dentry) { + unsigned long ptr = (unsigned long)dentry_list; + + /* Keep the dentry from being freed yet */ + dget(ef->dentry); + + /* + * Paranoid: The dget() above should prevent the dentry + * from being freed and calling eventfs_set_ef_status_free(). + * But just in case, set the link list LSB pointer to 1 + * and have eventfs_set_ef_status_free() check that to + * make sure that if it does happen, it will not think + * the d_fsdata is an event_file. + * + * For this to work, no event_file should be allocated + * on a odd space, as the ef should always be allocated + * to be at least word aligned. Check for that too. + */ + WARN_ON_ONCE(ptr & 1); + + ef->dentry->d_fsdata = (void *)(ptr | 1); + dentry_list = ef->dentry; + ef->dentry = NULL; + } + call_srcu(&eventfs_srcu, &ef->rcu, free_ef); + } mutex_unlock(&eventfs_mutex); + + while (dentry_list) { + unsigned long ptr; + + dentry = dentry_list; + ptr = (unsigned long)dentry->d_fsdata & ~1UL; + dentry_list = (struct dentry *)ptr; + dentry->d_fsdata = NULL; + d_invalidate(dentry); + mutex_lock(&eventfs_mutex); + /* dentry should now have at least a single reference */ + WARN_ONCE((int)d_count(dentry) < 1, + "dentry %px less than one reference (%d) after invalidate\n", + dentry, d_count(dentry)); + mutex_unlock(&eventfs_mutex); + dput(dentry); + } }
/** diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index c443a0c32a8c..1b880b5cd29d 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry); struct dentry *tracefs_failed_creating(struct dentry *dentry); struct inode *tracefs_get_inode(struct super_block *sb);
+void eventfs_set_ef_status_free(struct dentry *dentry); + #endif /* _TRACEFS_INTERNAL_H */ diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 4d30b0cafc5f..47c1b4d21735 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
void eventfs_remove_events_dir(struct dentry *dentry);
-void eventfs_set_ef_status_free(struct dentry *dentry); - struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
On Fri, 21 Jul 2023 09:18:43 -0400 Steven Rostedt rostedt@goodmis.org wrote:
OK, I got this working and it appears to pass all my tests. I actually got it working Wednesday night, but I tried a different approach on Thursday that got rid of the evenfs_file and only used eventfs_inodes and makes the files more dynamic. There's still a couple of corner cases that are not working with this approach (the dentry counters are getting out of sync). This should not stop this from going in. I'll continue working on that approach for the next merge cycle. But for now, here's the patch to this series that works.
Just figured out my bug with my new design. It was in the eventfs_release() code, where I have a loop that does the dput on the children, but I wasn't dput(child), I was dput(parent) in that loop!!
Anyway, for this merge window I much rather have this code in, and for the next merge window I'll add this update.
I can post the new design too, but first let's focus on this.
-- Steve
linux-kselftest-mirror@lists.linaro.org