The Events Tracing infrastructure contains a lot of files and directories (internally, inodes and dentries), which end up consuming memory in the MB range. There can be multiple instances of Events Tracing, each of which requires additional memory.
Instead of creating inodes/dentries up front, eventfs keeps only the meta-data and skips their creation. eventfs creates the inodes/dentries for a file or directory only when it is required, and deletes them once they are no longer required, while preserving the meta-data.
Tracing events took ~9MB; with this approach they took ~4.5MB for ~10K files/dirs.
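The idea described above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the eventfs implementation: `struct heavy_node` stands in for an inode/dentry pair, `struct meta_entry` for the always-present meta-data, and `lazy_lookup()` / `lazy_release()` are hypothetical names chosen only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an inode/dentry pair: the "heavy" object. */
struct heavy_node {
	char payload[512];
};

/* Lightweight meta-data that always exists (loosely modelled on eventfs_file). */
struct meta_entry {
	const char *name;
	struct heavy_node *node;	/* NULL until the entry is looked up */
	struct meta_entry *next;
};

/* Find an entry by name and create its heavy object only on first use. */
static struct heavy_node *lazy_lookup(struct meta_entry *head, const char *name)
{
	for (struct meta_entry *e = head; e; e = e->next) {
		if (strcmp(e->name, name) != 0)
			continue;
		if (!e->node)
			e->node = calloc(1, sizeof(*e->node));
		return e->node;	/* NULL if the allocation failed */
	}
	return NULL;	/* no such entry */
}

/* Drop the heavy object again; the meta-data entry stays in the list. */
static void lazy_release(struct meta_entry *e)
{
	assert(e != NULL);
	free(e->node);
	e->node = NULL;
}
```

In this sketch the memory saving comes from entries that are never looked up: they cost one small `meta_entry` each instead of a full `heavy_node`.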
v2:
Patch 01: new patch: 'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
          As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
          helper function to add files or directories to eventfs
          fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
          used eventfs_prepare_ef() to add files
          fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
          fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
          rebased because of v3 01/10
Patch 10: moved from v1 9/9

v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
         protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
         removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
         fix kprobe test in eventfs_root_lookup()
         protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
         called eventfs_remove_events_dir() instead of tracefs_remove()
         from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
 fs/tracefs/Makefile                           |   1 +
 fs/tracefs/event_inode.c                      | 757 ++++++++++++++++++
 fs/tracefs/inode.c                            | 124 ++-
 fs/tracefs/internal.h                         |  25 +
 include/linux/trace_events.h                  |   1 +
 include/linux/tracefs.h                       |  49 ++
 kernel/trace/trace.h                          |   3 +-
 kernel/trace/trace_events.c                   |  78 +-
 .../ftrace/test.d/kprobe/kprobe_args_char.tc  |   4 +-
 .../test.d/kprobe/kprobe_args_string.tc       |   4 +-
 10 files changed, 994 insertions(+), 52 deletions(-)
 create mode 100644 fs/tracefs/event_inode.c
 create mode 100644 fs/tracefs/internal.h
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The creation of the trace event directory requires that a TRACE_SYSTEM is defined, so that the trace event directory is added within the system it was defined in.
The code handled the case where a TRACE_SYSTEM was not added, and would then add the event at the events directory. But nothing should be doing this. This code also prevents the implementation of creating dynamic dentries for the eventfs system.
As this path has never been hit on correct code, remove it. If it does get hit, issue a WARN_ON_ONCE() and return -ENODEV.
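The warn-once-and-bail pattern described above can be sketched in userspace. This is only an illustration of the control flow, not the kernel macro: `warn_once()` and `check_system()` are hypothetical helpers, and the literal "TRACE_SYSTEM" string follows the comment in the patch, which says that is what the system would be called when the header did not define one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Report a condition at most once per flag and hand the condition back. */
static bool warn_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical stand-in for the check added to event_create_dir(). */
static int check_system(const char *system)
{
	static bool warned;

	assert(system != NULL);
	if (warn_once(strcmp(system, "TRACE_SYSTEM") == 0, &warned,
		      "trace event without a TRACE_SYSTEM"))
		return -ENODEV;
	return 0;
}
```

The point of returning the condition from the warning helper is that the caller still fails every time, while the log message appears only on the first hit.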
Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Ajay Kaher akaher@vmware.com
---
 kernel/trace/trace_events.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 654ffa404..16bc5ba45 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2424,14 +2424,15 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 
 	/*
 	 * If the trace point header did not define TRACE_SYSTEM
-	 * then the system would be called "TRACE_SYSTEM".
+	 * then the system would be called "TRACE_SYSTEM". This should
+	 * never happen.
 	 */
-	if (strcmp(call->class->system, TRACE_SYSTEM) != 0) {
-		d_events = event_subsystem_dir(tr, call->class->system, file, parent);
-		if (!d_events)
-			return -ENOMEM;
-	} else
-		d_events = parent;
+	if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0))
+		return -ENODEV;
+
+	d_events = event_subsystem_dir(tr, call->class->system, file, parent);
+	if (!d_events)
+		return -ENOMEM;
 
 	name = trace_event_name(call);
 	file->dir = tracefs_create_dir(name, d_events);
Introduce the tracefs_inode structure; this will help eventfs keep track of the inode, flags and a pointer to private data.
Rename function names and remove the static qualifier for functions that should be exposed.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
---
 fs/tracefs/inode.c    | 21 +++++++++++----------
 fs/tracefs/internal.h | 25 +++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 10 deletions(-)
 create mode 100644 fs/tracefs/internal.h

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 57ac8aa4a..7df1752e8 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -21,6 +21,7 @@
 #include <linux/parser.h>
 #include <linux/magic.h>
 #include <linux/slab.h>
+#include "internal.h"
 
 #define TRACEFS_DEFAULT_MODE 0700
 
@@ -127,7 +128,7 @@ static const struct inode_operations tracefs_dir_inode_operations = {
 	.rmdir = tracefs_syscall_rmdir,
 };
 
-static struct inode *tracefs_get_inode(struct super_block *sb)
+struct inode *tracefs_get_inode(struct super_block *sb)
 {
 	struct inode *inode = new_inode(sb);
 	if (inode) {
@@ -399,7 +400,7 @@ static struct file_system_type trace_fs_type = {
 };
 MODULE_ALIAS_FS("tracefs");
 
-static struct dentry *start_creating(const char *name, struct dentry *parent)
+struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
 {
 	struct dentry *dentry;
 	int error;
@@ -437,7 +438,7 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
 	return dentry;
 }
 
-static struct dentry *failed_creating(struct dentry *dentry)
+struct dentry *tracefs_failed_creating(struct dentry *dentry)
 {
 	inode_unlock(d_inode(dentry->d_parent));
 	dput(dentry);
@@ -445,7 +446,7 @@ static struct dentry *failed_creating(struct dentry *dentry)
 	return NULL;
 }
 
-static struct dentry *end_creating(struct dentry *dentry)
+struct dentry *tracefs_end_creating(struct dentry *dentry)
 {
 	inode_unlock(d_inode(dentry->d_parent));
 	return dentry;
@@ -490,14 +491,14 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 	if (!(mode & S_IFMT))
 		mode |= S_IFREG;
 	BUG_ON(!S_ISREG(mode));
-	dentry = start_creating(name, parent);
+	dentry = tracefs_start_creating(name, parent);
 
 	if (IS_ERR(dentry))
 		return NULL;
 
 	inode = tracefs_get_inode(dentry->d_sb);
 	if (unlikely(!inode))
-		return failed_creating(dentry);
+		return tracefs_failed_creating(dentry);
 
 	inode->i_mode = mode;
 	inode->i_fop = fops ? fops : &tracefs_file_operations;
@@ -506,13 +507,13 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 	inode->i_gid = d_inode(dentry->d_parent)->i_gid;
 	d_instantiate(dentry, inode);
 	fsnotify_create(d_inode(dentry->d_parent), dentry);
-	return end_creating(dentry);
+	return tracefs_end_creating(dentry);
 }
 
 static struct dentry *__create_dir(const char *name, struct dentry *parent,
 				   const struct inode_operations *ops)
 {
-	struct dentry *dentry = start_creating(name, parent);
+	struct dentry *dentry = tracefs_start_creating(name, parent);
 	struct inode *inode;
 
 	if (IS_ERR(dentry))
@@ -520,7 +521,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
 
 	inode = tracefs_get_inode(dentry->d_sb);
 	if (unlikely(!inode))
-		return failed_creating(dentry);
+		return tracefs_failed_creating(dentry);
 
 	/* Do not set bits for OTH */
 	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP;
@@ -534,7 +535,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
 	d_instantiate(dentry, inode);
 	inc_nlink(d_inode(dentry->d_parent));
 	fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
-	return end_creating(dentry);
+	return tracefs_end_creating(dentry);
 }
 
 /**
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
new file mode 100644
index 000000000..6776b4693
--- /dev/null
+++ b/fs/tracefs/internal.h
@@ -0,0 +1,25 @@
+#ifndef _TRACEFS_INTERNAL_H
+#define _TRACEFS_INTERNAL_H
+
+enum {
+	TRACEFS_EVENT_INODE = BIT(1),
+};
+
+struct tracefs_inode {
+	unsigned long flags;
+	void *private;
+	struct inode vfs_inode;
+};
+
+static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
+{
+	return container_of(inode, struct tracefs_inode, vfs_inode);
+}
+
+struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
+struct dentry *tracefs_end_creating(struct dentry *dentry);
+struct dentry *tracefs_failed_creating(struct dentry *dentry);
+struct inode *tracefs_get_inode(struct super_block *sb);
+
+#endif /* _TRACEFS_INTERNAL_H */
+
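For readers unfamiliar with the embedding trick used by get_tracefs() in the patch above, here is a small userspace sketch. The `container_of()` macro below is a simplified version without the kernel's type checking, and `struct inode` is a hypothetical one-field stand-in. The conversion is only valid when the inode really was allocated as part of a `struct tracefs_inode`; nothing in this sketch enforces that.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the VFS struct inode. */
struct inode {
	unsigned long i_ino;
};

/* Same layout idea as the patch: private fields plus an embedded inode. */
struct tracefs_inode {
	unsigned long flags;
	void *private;
	struct inode vfs_inode;
};

/* Step back from the embedded member to the structure that contains it. */
static struct tracefs_inode *get_tracefs(const struct inode *inode)
{
	assert(inode != NULL);
	return container_of(inode, struct tracefs_inode, vfs_inode);
}
```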
On Thu, 1 Jun 2023 14:30:05 +0530 Ajay Kaher akaher@vmware.com wrote:
Introduce the tracefs_inode structure; this will help eventfs keep track of the inode, flags and a pointer to private data.
Rename function names and remove the static qualifier for functions that should be exposed.
We should probably break this patch up into two. Or at least remove the static functions and make them non-static when they are needed.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
---
 fs/tracefs/inode.c    | 21 +++++++++++----------
 fs/tracefs/internal.h | 25 +++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 10 deletions(-)
 create mode 100644 fs/tracefs/internal.h
[..]
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
new file mode 100644
index 000000000..6776b4693
--- /dev/null
+++ b/fs/tracefs/internal.h
@@ -0,0 +1,25 @@
+#ifndef _TRACEFS_INTERNAL_H
+#define _TRACEFS_INTERNAL_H
+
+enum {
+	TRACEFS_EVENT_INODE = BIT(1),
+};
+
+struct tracefs_inode {
+	unsigned long flags;
+	void *private;
+	struct inode vfs_inode;
+};
+
+static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
+{
+	return container_of(inode, struct tracefs_inode, vfs_inode);
+}
+
+struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
+struct dentry *tracefs_end_creating(struct dentry *dentry);
+struct dentry *tracefs_failed_creating(struct dentry *dentry);
+struct inode *tracefs_get_inode(struct super_block *sb);
+
+#endif /* _TRACEFS_INTERNAL_H */
+
git complains about the above extra line.
-- Steve
On Thu, 1 Jun 2023 14:30:05 +0530 Ajay Kaher akaher@vmware.com wrote:
Introduce the tracefs_inode structure; this will help eventfs keep track of the inode, flags and a pointer to private data.
Rename function names and remove the static qualifier for functions that should be exposed.
I think removing static and renaming is OK, but please do not introduce the new 'tracefs_inode' and 'get_tracefs()', which are not used. I think those should be merged with [3/10].
Thank you,
Adding eventfs_file structure which will hold properties of file or dir.
Adding following functions to add dir in eventfs:
eventfs_create_events_dir() directly creates events dir with-in tracing folder.
eventfs_add_subsystem_dir() adds the information of subsystem_dir to eventfs and dynamically creates subsystem_dir as and when requires.
eventfs_add_dir() adds the information of dir (which is with-in subsystem_dir) to eventfs and dynamically creates these dir as and when requires.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
Reported-by: kernel test robot lkp@intel.com
Link: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
---
 fs/tracefs/Makefile      |   1 +
 fs/tracefs/event_inode.c | 272 +++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |  29 +++++
 kernel/trace/trace.h     |   1 +
 4 files changed, 303 insertions(+)
 create mode 100644 fs/tracefs/event_inode.c

diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
index 7c35a282b..73c56da8e 100644
--- a/fs/tracefs/Makefile
+++ b/fs/tracefs/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 tracefs-objs := inode.o
+tracefs-objs += event_inode.o
 
 obj-$(CONFIG_TRACING) += tracefs.o
 
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
new file mode 100644
index 000000000..a48ce23c0
--- /dev/null
+++ b/fs/tracefs/event_inode.c
@@ -0,0 +1,272 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * event_inode.c - part of tracefs, a pseudo file system for activating tracing
+ *
+ * Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) rostedt@goodmis.org
+ * Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher akaher@vmware.com
+ *
+ * eventfs is used to show trace events with one set of dentries
+ *
+ * eventfs stores meta-data of files/dirs and skip to create object of
+ * inodes/dentries. As and when requires, eventfs will create the
+ * inodes/dentries for only required files/directories. Also eventfs
+ * would delete the inodes/dentries once no more requires but preserve
+ * the meta data.
+ */
+#include <linux/fsnotify.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/security.h>
+#include <linux/tracefs.h>
+#include <linux/kref.h>
+#include <linux/delay.h>
+#include "internal.h"
+
+/**
+ * eventfs_dentry_to_rwsem - Return corresponding eventfs_rwsem
+ * @dentry: a pointer to dentry
+ *
+ * helper function to return crossponding eventfs_rwsem for given dentry
+ */
+static struct rw_semaphore *eventfs_dentry_to_rwsem(struct dentry *dentry)
+{
+	if (S_ISDIR(dentry->d_inode->i_mode))
+		return (struct rw_semaphore *)dentry->d_inode->i_private;
+	else
+		return (struct rw_semaphore *)dentry->d_parent->d_inode->i_private;
+}
+
+/**
+ * eventfs_down_read - acquire read lock function
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * helper function to perform read lock. Nested locking requires because
+ * lookup(), release() requires read lock, these could be called directly
+ * or from open(), remove() which already hold the read/write lock.
+ */
+static void eventfs_down_read(struct rw_semaphore *eventfs_rwsem)
+{
+	down_read_nested(eventfs_rwsem, SINGLE_DEPTH_NESTING);
+}
+
+/**
+ * eventfs_up_read - release read lock function
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * helper function to release eventfs_rwsem lock if locked
+ */
+static void eventfs_up_read(struct rw_semaphore *eventfs_rwsem)
+{
+	up_read(eventfs_rwsem);
+}
+
+/**
+ * eventfs_down_write - acquire write lock function
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * helper function to perform write lock on eventfs_rwsem
+ */
+static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem)
+{
+	while (!down_write_trylock(eventfs_rwsem))
+		msleep(10);
+}
+
+/**
+ * eventfs_up_write - release write lock function
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * helper function to perform write lock on eventfs_rwsem
+ */
+static void eventfs_up_write(struct rw_semaphore *eventfs_rwsem)
+{
+	up_write(eventfs_rwsem);
+}
+
+static const struct file_operations eventfs_file_operations = {
+};
+
+static const struct inode_operations eventfs_root_dir_inode_operations = {
+};
+
+/**
+ * eventfs_prepare_ef - helper function to prepare eventfs_file
+ * @name: a pointer to a string containing the name of the file/directory
+ *        to create.
+ * @mode: the permission that the file should have.
+ * @fop: a pointer to a struct file_operations that should be used for
+ *        this file/directory.
+ * @iop: a pointer to a struct inode_operations that should be used for
+ *        this file/directory.
+ * @data: a pointer to something that the caller will want to get to later
+ *        on. The inode.i_private pointer will point to this value on
+ *        the open() call.
+ *
+ * This function allocate the fill eventfs_file structure.
+ */
+static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
+					const struct file_operations *fop,
+					const struct inode_operations *iop,
+					void *data)
+{
+	struct eventfs_file *ef;
+
+	ef = kzalloc(sizeof(*ef), GFP_KERNEL);
+	if (!ef)
+		return ERR_PTR(-ENOMEM);
+
+	ef->name = kstrdup(name, GFP_KERNEL);
+	if (!ef->name) {
+		kfree(ef);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	if (S_ISDIR(mode)) {
+		ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
+		if (!ef->ei) {
+			kfree(ef->name);
+			kfree(ef);
+			return ERR_PTR(-ENOMEM);
+		}
+		INIT_LIST_HEAD(&ef->ei->e_top_files);
+	} else {
+		ef->ei = NULL;
+	}
+
+	ef->iop = iop;
+	ef->fop = fop;
+	ef->mode = mode;
+	ef->data = data;
+	ef->dentry = NULL;
+	ef->d_parent = NULL;
+	ef->created = false;
+	return ef;
+}
+
+/**
+ * eventfs_create_events_dir - create the trace event structure
+ * @name: a pointer to a string containing the name of the directory to
+ *        create.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ *          directory dentry if set. If this parameter is NULL, then the
+ *          directory will be created in the root of the tracefs filesystem.
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * This function creates the top of the trace event directory.
+ */
+struct dentry *eventfs_create_events_dir(const char *name,
+					 struct dentry *parent,
+					 struct rw_semaphore *eventfs_rwsem)
+{
+	struct dentry *dentry = tracefs_start_creating(name, parent);
+	struct eventfs_inode *ei;
+	struct tracefs_inode *ti;
+	struct inode *inode;
+
+	if (IS_ERR(dentry))
+		return dentry;
+
+	ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+	if (!ei)
+		return ERR_PTR(-ENOMEM);
+	inode = tracefs_get_inode(dentry->d_sb);
+	if (unlikely(!inode)) {
+		kfree(ei);
+		tracefs_failed_creating(dentry);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	init_rwsem(eventfs_rwsem);
+	INIT_LIST_HEAD(&ei->e_top_files);
+
+	ti = get_tracefs(inode);
+	ti->flags |= TRACEFS_EVENT_INODE;
+	ti->private = ei;
+
+	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+	inode->i_op = &eventfs_root_dir_inode_operations;
+	inode->i_fop = &eventfs_file_operations;
+	inode->i_private = eventfs_rwsem;
+
+	/* directory inodes start off with i_nlink == 2 (for "." entry) */
+	inc_nlink(inode);
+	d_instantiate(dentry, inode);
+	inc_nlink(dentry->d_parent->d_inode);
+	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
+	return tracefs_end_creating(dentry);
+}
+
+/**
+ * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later
+ * @name: a pointer to a string containing the name of the file to create.
+ * @parent: a pointer to the parent dentry for this dir.
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * This function adds eventfs subsystem dir to list.
+ * And all these dirs are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
+					       struct dentry *parent,
+					       struct rw_semaphore *eventfs_rwsem)
+{
+	struct tracefs_inode *ti_parent;
+	struct eventfs_inode *ei_parent;
+	struct eventfs_file *ef;
+
+	if (!parent)
+		return ERR_PTR(-EINVAL);
+
+	ti_parent = get_tracefs(parent->d_inode);
+	ei_parent = ti_parent->private;
+
+	ef = eventfs_prepare_ef(name,
+		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
+		&eventfs_file_operations,
+		&eventfs_root_dir_inode_operations,
+		(void *) eventfs_rwsem);
+
+	if (IS_ERR(ef))
+		return ef;
+
+	eventfs_down_write(eventfs_rwsem);
+	list_add_tail(&ef->list, &ei_parent->e_top_files);
+	ef->d_parent = parent;
+	eventfs_up_write(eventfs_rwsem);
+	return ef;
+}
+
+/**
+ * eventfs_add_dir - add eventfs dir to list to create later
+ * @name: a pointer to a string containing the name of the file to create.
+ * @ef_parent: a pointer to the parent eventfs_file for this dir.
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * This function adds eventfs dir to list.
+ * And all these dirs are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+struct eventfs_file *eventfs_add_dir(const char *name,
+				     struct eventfs_file *ef_parent,
+				     struct rw_semaphore *eventfs_rwsem)
+{
+	struct eventfs_file *ef;
+
+	if (!ef_parent)
+		return ERR_PTR(-EINVAL);
+
+	ef = eventfs_prepare_ef(name,
+		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
+		&eventfs_file_operations,
+		&eventfs_root_dir_inode_operations,
+		(void *) eventfs_rwsem);
+
+	if (IS_ERR(ef))
+		return ef;
+
+	eventfs_down_write(eventfs_rwsem);
+	list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
+	ef->d_parent = ef_parent->dentry;
+	eventfs_up_write(eventfs_rwsem);
+	return ef;
+}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 999124459..aeca6761f 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -21,6 +21,35 @@ struct file_operations;
 
 #ifdef CONFIG_TRACING
 
+struct eventfs_inode {
+	struct list_head e_top_files;
+};
+
+struct eventfs_file {
+	const char *name;
+	struct dentry *d_parent;
+	struct dentry *dentry;
+	struct list_head list;
+	struct eventfs_inode *ei;
+	const struct file_operations *fop;
+	const struct inode_operations *iop;
+	void *data;
+	umode_t mode;
+	bool created;
+};
+
+struct dentry *eventfs_create_events_dir(const char *name,
+					 struct dentry *parent,
+					 struct rw_semaphore *eventfs_rwsem);
+
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
+					       struct dentry *parent,
+					       struct rw_semaphore *eventfs_rwsem);
+
+struct eventfs_file *eventfs_add_dir(const char *name,
+				     struct eventfs_file *ef_parent,
+				     struct rw_semaphore *eventfs_rwsem);
+
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 79bdefe92..b895c3346 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -359,6 +359,7 @@ struct trace_array {
 	struct dentry *options;
 	struct dentry *percpu_dir;
 	struct dentry *event_dir;
+	struct rw_semaphore eventfs_rwsem;
 	struct trace_options *topts;
 	struct list_head systems;
 	struct list_head events;
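The shape of eventfs_prepare_ef() and the list insertion in eventfs_add_dir() can be tried out in userspace. The sketch below is an illustration only, with hypothetical names (`struct ef_entry`, `struct ef_dir`, `ef_prepare()`, `ef_add()`); it uses a hand-rolled singly linked list instead of the kernel's `list_head`, no locking, and NULL instead of ERR_PTR() for errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct ef_dir;

/* Hypothetical, simplified counterpart of struct eventfs_file. */
struct ef_entry {
	char *name;
	bool is_dir;
	struct ef_dir *dir;	/* child list, directories only */
	struct ef_entry *next;	/* sibling link */
};

/* Hypothetical counterpart of struct eventfs_inode: a list of children. */
struct ef_dir {
	struct ef_entry *head;
	struct ef_entry **tail;	/* where the next child gets linked in */
};

/* Allocate and fill an entry, undoing partial work on failure. */
static struct ef_entry *ef_prepare(const char *name, bool is_dir)
{
	size_t len = strlen(name) + 1;
	struct ef_entry *ef = calloc(1, sizeof(*ef));

	if (!ef)
		return NULL;
	ef->name = malloc(len);
	if (!ef->name)
		goto free_ef;
	memcpy(ef->name, name, len);
	if (is_dir) {
		ef->dir = calloc(1, sizeof(*ef->dir));
		if (!ef->dir)
			goto free_name;
		ef->dir->tail = &ef->dir->head;
	}
	ef->is_dir = is_dir;
	return ef;

free_name:
	free(ef->name);
free_ef:
	free(ef);
	return NULL;
}

/* Append a child to the end of the parent's list (like list_add_tail()). */
static void ef_add(struct ef_entry *parent, struct ef_entry *child)
{
	assert(parent->is_dir && parent->dir);
	*parent->dir->tail = child;
	parent->dir->tail = &child->next;
}
```

Only these small records exist until something looks an entry up, which is where the memory saving claimed in the cover letter would come from.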
FYI, all subjects should start with a capital letter:
"eventfs: Implement eventfs dir creation functions"
On Thu, 1 Jun 2023 14:30:06 +0530 Ajay Kaher akaher@vmware.com wrote:
Adding eventfs_file structure which will hold properties of file or dir.
Adding following functions to add dir in eventfs:
eventfs_create_events_dir() directly creates events dir with-in
"within" is a proper word.
tracing folder.
eventfs_add_subsystem_dir() adds the information of subsystem_dir to eventfs and dynamically creates subsystem_dir as and when requires.
"as and when requires" does not make sense.
eventfs_add_dir() adds the information of dir (which is with-in
"within"
subsystem_dir) to eventfs and dynamically creates these dir as and when requires.
I'm guessing you want to say:
eventfs_add_dir() adds the information of the dir, within a subsystem_dir, to eventfs and dynamically creates these directories when they are accessed.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
Reported-by: kernel test robot lkp@intel.com
Link: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
---
 fs/tracefs/Makefile      |   1 +
 fs/tracefs/event_inode.c | 272 +++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |  29 +++++
 kernel/trace/trace.h     |   1 +
 4 files changed, 303 insertions(+)
 create mode 100644 fs/tracefs/event_inode.c

diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
index 7c35a282b..73c56da8e 100644
--- a/fs/tracefs/Makefile
+++ b/fs/tracefs/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 tracefs-objs := inode.o
+tracefs-objs += event_inode.o
 
 obj-$(CONFIG_TRACING) += tracefs.o
 
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
new file mode 100644
index 000000000..a48ce23c0
--- /dev/null
+++ b/fs/tracefs/event_inode.c
@@ -0,0 +1,272 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * event_inode.c - part of tracefs, a pseudo file system for activating tracing
+ *
+ * Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) rostedt@goodmis.org
+ * Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher akaher@vmware.com
+ *
+ * eventfs is used to show trace events with one set of dentries
+ *
+ * eventfs stores meta-data of files/dirs and skip to create object of
+ * inodes/dentries. As and when requires, eventfs will create the
+ * inodes/dentries for only required files/directories. Also eventfs
+ * would delete the inodes/dentries once no more requires but preserve
+ * the meta data.
+ */
+#include <linux/fsnotify.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/security.h>
+#include <linux/tracefs.h>
+#include <linux/kref.h>
+#include <linux/delay.h>
+#include "internal.h"
+
+/**
+ * eventfs_dentry_to_rwsem - Return corresponding eventfs_rwsem
+ * @dentry: a pointer to dentry
+ *
+ * helper function to return crossponding eventfs_rwsem for given dentry
+ */
+static struct rw_semaphore *eventfs_dentry_to_rwsem(struct dentry *dentry)
+{
+	if (S_ISDIR(dentry->d_inode->i_mode))
+		return (struct rw_semaphore *)dentry->d_inode->i_private;
+	else
+		return (struct rw_semaphore *)dentry->d_parent->d_inode->i_private;
+}
+
+/**
+ * eventfs_down_read - acquire read lock function
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * helper function to perform read lock. Nested locking requires because
+ * lookup(), release() requires read lock, these could be called directly
+ * or from open(), remove() which already hold the read/write lock.
+ */
+static void eventfs_down_read(struct rw_semaphore *eventfs_rwsem)
+{
+	down_read_nested(eventfs_rwsem, SINGLE_DEPTH_NESTING);
+}
+
+/**
+ * eventfs_up_read - release read lock function
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * helper function to release eventfs_rwsem lock if locked
+ */
+static void eventfs_up_read(struct rw_semaphore *eventfs_rwsem)
+{
+	up_read(eventfs_rwsem);
+}
+
+/**
+ * eventfs_down_write - acquire write lock function
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * helper function to perform write lock on eventfs_rwsem
+ */
+static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem)
+{
+	while (!down_write_trylock(eventfs_rwsem))
+		msleep(10);
What's this loop for? Something like that needs a very good explanation in a comment. Loops like these are usually a sign of a workaround for a bug in the design, or worse, simply hides an existing bug.
+}
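To make the concern about the polling loop concrete, here is a userspace sketch using POSIX rwlocks. It only contrasts the two shapes of acquiring a write lock; it does not explain why the patch polls, and `write_lock_polling()` / `write_lock_blocking()` are hypothetical names used for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Polling variant, mirroring the shape of the loop in the patch. */
static int write_lock_polling(pthread_rwlock_t *lock)
{
	const struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int retries = 0;

	while (pthread_rwlock_trywrlock(lock) != 0) {
		nanosleep(&delay, NULL);	/* sleep 10 ms, then try again */
		retries++;
	}
	return retries;
}

/* Blocking variant: the lock implementation queues and wakes the waiter. */
static void write_lock_blocking(pthread_rwlock_t *lock)
{
	int err = pthread_rwlock_wrlock(lock);

	assert(err == 0);
	(void)err;
}
```

The polling variant can wait up to a full sleep interval after the lock becomes free and gives the lock implementation no waiter to track, which is one reason such loops tend to draw questions in review.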
+
+/**
+ * eventfs_up_write - release write lock function
+ * @eventfs_rwsem: a pointer to rw_semaphore
+ *
+ * helper function to perform write lock on eventfs_rwsem
+ */
+static void eventfs_up_write(struct rw_semaphore *eventfs_rwsem)
+{
+	up_write(eventfs_rwsem);
+}
+
+static const struct file_operations eventfs_file_operations = {
+};
+
+static const struct inode_operations eventfs_root_dir_inode_operations = {
+};
+
+/**
+ * eventfs_prepare_ef - helper function to prepare eventfs_file
+ * @name: a pointer to a string containing the name of the file/directory
+ *        to create.
+ * @mode: the permission that the file should have.
+ * @fop: a pointer to a struct file_operations that should be used for
+ *        this file/directory.
+ * @iop: a pointer to a struct inode_operations that should be used for
+ *        this file/directory.
+ * @data: a pointer to something that the caller will want to get to later
+ *        on. The inode.i_private pointer will point to this value on
+ *        the open() call.
+ *
+ * This function allocate the fill eventfs_file structure.
"allocates and fills the" ?
+ */
+static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
+					const struct file_operations *fop,
+					const struct inode_operations *iop,
+					void *data)
+{
+	struct eventfs_file *ef;
+
+	ef = kzalloc(sizeof(*ef), GFP_KERNEL);
+	if (!ef)
+		return ERR_PTR(-ENOMEM);
+
+	ef->name = kstrdup(name, GFP_KERNEL);
+	if (!ef->name) {
+		kfree(ef);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	if (S_ISDIR(mode)) {
+		ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
+		if (!ef->ei) {
+			kfree(ef->name);
+			kfree(ef);
+			return ERR_PTR(-ENOMEM);
+		}
+		INIT_LIST_HEAD(&ef->ei->e_top_files);
+	} else {
+		ef->ei = NULL;
+	}
+
+	ef->iop = iop;
+	ef->fop = fop;
+	ef->mode = mode;
+	ef->data = data;
+	ef->dentry = NULL;
+	ef->d_parent = NULL;
+	ef->created = false;
No need for the initialization to NULL or even the false, as the kzalloc() already did that.
- return ef;
+}
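The point about kzalloc() can be illustrated in user space, where calloc() gives the same zero-filled allocation. This is only a sketch: struct ef_sketch is a hypothetical, trimmed-down stand-in for struct eventfs_file, and it assumes a platform (such as Linux) where all-bits-zero is a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct eventfs_file. */
struct ef_sketch {
	void *dentry;
	void *d_parent;
	bool created;
	int mode;
};

/* calloc() zero-fills the allocation, much as kzalloc() does in the kernel. */
static struct ef_sketch *ef_sketch_alloc(int mode)
{
	struct ef_sketch *ef = calloc(1, sizeof(*ef));

	if (!ef)
		return NULL;
	/* Only the non-zero fields need an explicit assignment. */
	ef->mode = mode;
	return ef;
}
```

With this shape, the explicit "= NULL" and "= false" assignments in eventfs_prepare_ef() become redundant and can be dropped.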
+/**
- eventfs_create_events_dir - create the trace event structure
- @name: a pointer to a string containing the name of the directory to
create.
You don't need to add "a pointer" we can see it's a pointer. Just say:
* @name: The name of the directory to create
Adding more makes it confusing to read.
- @parent: a pointer to the parent dentry for this file. This should be a
directory dentry if set. If this parameter is NULL, then the
directory will be created in the root of the tracefs filesystem.
- @eventfs_rwsem: a pointer to rw_semaphore
Same with all the descriptions.
- This function creates the top of the trace event directory.
- */
+struct dentry *eventfs_create_events_dir(const char *name,
struct dentry *parent,
struct rw_semaphore *eventfs_rwsem)
OK, I'm going to have to really look at this. Passing in a lock to the API is just broken. We need to find a way to solve this another way.
I'm about to board a plane to JFK shortly, I'm hoping to play with this while flying back.
-- Steve
+{
- struct dentry *dentry = tracefs_start_creating(name, parent);
- struct eventfs_inode *ei;
- struct tracefs_inode *ti;
- struct inode *inode;
- if (IS_ERR(dentry))
return dentry;
- ei = kzalloc(sizeof(*ei), GFP_KERNEL);
- if (!ei)
return ERR_PTR(-ENOMEM);
- inode = tracefs_get_inode(dentry->d_sb);
- if (unlikely(!inode)) {
kfree(ei);
tracefs_failed_creating(dentry);
return ERR_PTR(-ENOMEM);
- }
- init_rwsem(eventfs_rwsem);
- INIT_LIST_HEAD(&ei->e_top_files);
- ti = get_tracefs(inode);
- ti->flags |= TRACEFS_EVENT_INODE;
- ti->private = ei;
- inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
- inode->i_op = &eventfs_root_dir_inode_operations;
- inode->i_fop = &eventfs_file_operations;
- inode->i_private = eventfs_rwsem;
- /* directory inodes start off with i_nlink == 2 (for "." entry) */
- inc_nlink(inode);
- d_instantiate(dentry, inode);
- inc_nlink(dentry->d_parent->d_inode);
- fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
- return tracefs_end_creating(dentry);
+}
On 01-Jul-2023, at 7:24 PM, Steven Rostedt rostedt@goodmis.org wrote:
FYI, all subjects should start with a capital letter:
"eventfs: Implement eventfs dir creation functions"
On Thu, 1 Jun 2023 14:30:06 +0530 Ajay Kaher akaher@vmware.com wrote:
Adding eventfs_file structure which will hold properties of file or dir.
Adding following functions to add dir in eventfs:
eventfs_create_events_dir() directly creates events dir with-in
"within" is a proper word.
tracing folder.
eventfs_add_subsystem_dir() adds the information of subsystem_dir to eventfs and dynamically creates subsystem_dir as and when requires.
"as and when requires" does not make sense.
eventfs_add_dir() adds the information of dir (which is with-in
"within"
subsystem_dir) to eventfs and dynamically creates these dir as and when requires.
I'm guessing you want to say:
eventfs_add_dir() adds the information of the dir, within a subsystem_dir, to eventfs and dynamically creates these directories when they are accessed.
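The "create when accessed" idea can be sketched in user space as follows. Everything here is hypothetical and for illustration only: struct meta plays the role of the eventfs_file metadata, struct node the role of the dentry/inode pair, and lazy_lookup()/lazy_release() are made-up names, not eventfs functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the heavy dentry/inode pair. */
struct node {
	const char *name;
};

/* Hypothetical stand-in for the always-present metadata. */
struct meta {
	const char *name;
	struct node *node;	/* NULL until the entry is first accessed */
};

static int nodes_created;

/* Create the heavy object only when the entry is looked up. */
static struct node *lazy_lookup(struct meta *tbl, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) != 0)
			continue;
		if (!tbl[i].node) {
			tbl[i].node = malloc(sizeof(*tbl[i].node));
			if (!tbl[i].node)
				return NULL;
			tbl[i].node->name = tbl[i].name;
			nodes_created++;
		}
		return tbl[i].node;
	}
	return NULL;
}

/* Drop the heavy object but keep the metadata for a later lookup. */
static void lazy_release(struct meta *m)
{
	free(m->node);
	m->node = NULL;
}
```

Entries that are never looked up cost only their metadata, which is where the memory saving described in the cover letter comes from.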
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
Reported-by: kernel test robot lkp@intel.com
Link: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com

 fs/tracefs/Makefile      |   1 +
 fs/tracefs/event_inode.c | 272 +++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |  29 +++++
 kernel/trace/trace.h     |   1 +
 4 files changed, 303 insertions(+)
 create mode 100644 fs/tracefs/event_inode.c

diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
index 7c35a282b..73c56da8e 100644
--- a/fs/tracefs/Makefile
+++ b/fs/tracefs/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 tracefs-objs := inode.o
+tracefs-objs += event_inode.o

 obj-$(CONFIG_TRACING) += tracefs.o

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
new file mode 100644
index 000000000..a48ce23c0
--- /dev/null
+++ b/fs/tracefs/event_inode.c
@@ -0,0 +1,272 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
- event_inode.c - part of tracefs, a pseudo file system for activating tracing
- Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) rostedt@goodmis.org
- Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher akaher@vmware.com
- eventfs is used to show trace events with one set of dentries
- eventfs stores meta-data of files/dirs and skip to create object of
- inodes/dentries. As and when requires, eventfs will create the
- inodes/dentries for only required files/directories. Also eventfs
- would delete the inodes/dentries once no more requires but preserve
- the meta data.
- */
+#include <linux/fsnotify.h> +#include <linux/fs.h> +#include <linux/namei.h> +#include <linux/security.h> +#include <linux/tracefs.h> +#include <linux/kref.h> +#include <linux/delay.h> +#include "internal.h"
+/**
- eventfs_dentry_to_rwsem - Return corresponding eventfs_rwsem
- @dentry: a pointer to dentry
- helper function to return crossponding eventfs_rwsem for given dentry
- */
+static struct rw_semaphore *eventfs_dentry_to_rwsem(struct dentry *dentry) +{
if (S_ISDIR(dentry->d_inode->i_mode))
return (struct rw_semaphore *)dentry->d_inode->i_private;
else
return (struct rw_semaphore *)dentry->d_parent->d_inode->i_private;
+}
+/**
- eventfs_down_read - acquire read lock function
- @eventfs_rwsem: a pointer to rw_semaphore
- helper function to perform read lock. Nested locking requires because
- lookup(), release() requires read lock, these could be called directly
- or from open(), remove() which already hold the read/write lock.
- */
+static void eventfs_down_read(struct rw_semaphore *eventfs_rwsem) +{
down_read_nested(eventfs_rwsem, SINGLE_DEPTH_NESTING);
+}
+/**
- eventfs_up_read - release read lock function
- @eventfs_rwsem: a pointer to rw_semaphore
- helper function to release eventfs_rwsem lock if locked
- */
+static void eventfs_up_read(struct rw_semaphore *eventfs_rwsem) +{
up_read(eventfs_rwsem);
+}
+/**
- eventfs_down_write - acquire write lock function
- @eventfs_rwsem: a pointer to rw_semaphore
- helper function to perform write lock on eventfs_rwsem
- */
+static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem) +{
while (!down_write_trylock(eventfs_rwsem))
msleep(10);
What's this loop for? Something like that needs a very good explanation in a comment. Loops like these are usually a sign of a workaround for a bug in the design, or worse, simply hides an existing bug.
Yes, correct. This logic is there to avoid a deadlock:

Thread 1                                  Thread 2
down_read_nested() - read lock acquired
                                          down_write() - waiting for write lock to acquire
down_read_nested() - deadlock

The deadlock occurs because the rwsem does not allow a new read lock to be acquired while a writer is waiting. down_write_trylock() does not add the writer to the wait queue, and so avoids the deadlock scenario.
I was stuck on this deadlock, tried a few methods, and finally borrowed the approach from cifs, where it is upstream, tested, and working. Please refer to: https://elixir.bootlin.com/linux/v6.3.1/source/fs/cifs/file.c#L438
Looking forward to your input. I will add an explanation in v4.
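The scenario described above can be walked through with a toy model. This is a minimal sketch, not the kernel's rwsem implementation: struct toy_rwsem and the toy_*() helpers are hypothetical, and the model simply assumes that new readers are refused once a writer has queued.

```c
#include <stdbool.h>

/* Toy model of a writer-preferring rwsem; NOT the kernel implementation. */
struct toy_rwsem {
	int readers;		/* read locks currently held */
	int waiting_writers;	/* writers queued behind the readers */
	bool writer;		/* true while a writer holds the lock */
};

/* New readers are refused while a writer holds the lock or is queued. */
static bool toy_read_trylock(struct toy_rwsem *s)
{
	if (s->writer || s->waiting_writers > 0)
		return false;
	s->readers++;
	return true;
}

/*
 * Models a blocking down_write(): take the lock if it is free,
 * otherwise join the wait queue. Returns true if acquired at once.
 */
static bool toy_down_write_or_queue(struct toy_rwsem *s)
{
	if (s->readers == 0 && !s->writer) {
		s->writer = true;
		return true;
	}
	s->waiting_writers++;
	return false;
}

/* Models down_write_trylock(): fails without joining the wait queue. */
static bool toy_write_trylock(struct toy_rwsem *s)
{
	if (s->readers > 0 || s->writer)
		return false;
	s->writer = true;
	return true;
}
```

In this model, a reader that already holds the lock cannot take it again once toy_down_write_or_queue() has queued a writer, while after a failed toy_write_trylock() the second read still succeeds. The model shows why the polling loop hides the problem; it does not make recursive read locking valid.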
+}
+/**
- eventfs_up_write - release write lock function
- @eventfs_rwsem: a pointer to rw_semaphore
- helper function to perform write lock on eventfs_rwsem
- */
+static void eventfs_up_write(struct rw_semaphore *eventfs_rwsem) +{
up_write(eventfs_rwsem);
+}
+static const struct file_operations eventfs_file_operations = { +};
+static const struct inode_operations eventfs_root_dir_inode_operations = { +};
+/**
- eventfs_prepare_ef - helper function to prepare eventfs_file
- @name: a pointer to a string containing the name of the file/directory
to create.
- @mode: the permission that the file should have.
- @fop: a pointer to a struct file_operations that should be used for
this file/directory.
- @iop: a pointer to a struct inode_operations that should be used for
this file/directory.
- @data: a pointer to something that the caller will want to get to later
on. The inode.i_private pointer will point to this value on
the open() call.
- This function allocate the fill eventfs_file structure.
"allocates and fills the" ?
- */
+static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
const struct file_operations *fop,
const struct inode_operations *iop,
void *data)
+{
struct eventfs_file *ef;
ef = kzalloc(sizeof(*ef), GFP_KERNEL);
if (!ef)
return ERR_PTR(-ENOMEM);
ef->name = kstrdup(name, GFP_KERNEL);
if (!ef->name) {
kfree(ef);
return ERR_PTR(-ENOMEM);
}
if (S_ISDIR(mode)) {
ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
if (!ef->ei) {
kfree(ef->name);
kfree(ef);
return ERR_PTR(-ENOMEM);
}
INIT_LIST_HEAD(&ef->ei->e_top_files);
} else {
ef->ei = NULL;
}
ef->iop = iop;
ef->fop = fop;
ef->mode = mode;
ef->data = data;
ef->dentry = NULL;
ef->d_parent = NULL;
ef->created = false;
No need for the initialization to NULL or even the false, as the kzalloc() already did that.
return ef;
+}
+/**
- eventfs_create_events_dir - create the trace event structure
- @name: a pointer to a string containing the name of the directory to
create.
You don't need to add "a pointer" we can see it's a pointer. Just say:
- @name: The name of the directory to create
Adding more makes it confusing to read.
- @parent: a pointer to the parent dentry for this file. This should be a
directory dentry if set. If this parameter is NULL, then the
directory will be created in the root of the tracefs filesystem.
- @eventfs_rwsem: a pointer to rw_semaphore
Same with all the descriptions.
- This function creates the top of the trace event directory.
- */
+struct dentry *eventfs_create_events_dir(const char *name,
struct dentry *parent,
struct rw_semaphore *eventfs_rwsem)
OK, I'm going to have to really look at this. Passing in a lock to the API is just broken. We need to find a way to solve this another way.
eventfs_rwsem is a member of struct trace_array, I guess we should pass pointer to trace_array.
I'm about to board a plane to JFK shortly, I'm hoping to play with this while flying back.
I have replied for major concerns. All other minor I will take care in v4.
Thanks a lot for giving time to eventfs patches.
- Ajay
-- Steve
+{
struct dentry *dentry = tracefs_start_creating(name, parent);
struct eventfs_inode *ei;
struct tracefs_inode *ti;
struct inode *inode;
if (IS_ERR(dentry))
return dentry;
ei = kzalloc(sizeof(*ei), GFP_KERNEL);
if (!ei)
return ERR_PTR(-ENOMEM);
inode = tracefs_get_inode(dentry->d_sb);
if (unlikely(!inode)) {
kfree(ei);
tracefs_failed_creating(dentry);
return ERR_PTR(-ENOMEM);
}
init_rwsem(eventfs_rwsem);
INIT_LIST_HEAD(&ei->e_top_files);
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
ti->private = ei;
inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
inode->i_op = &eventfs_root_dir_inode_operations;
inode->i_fop = &eventfs_file_operations;
inode->i_private = eventfs_rwsem;
/* directory inodes start off with i_nlink == 2 (for "." entry) */
inc_nlink(inode);
d_instantiate(dentry, inode);
inc_nlink(dentry->d_parent->d_inode);
fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
return tracefs_end_creating(dentry);
+}
On Mon, 3 Jul 2023 10:13:22 +0000 Ajay Kaher akaher@vmware.com wrote:
+/**
- eventfs_down_write - acquire write lock function
- @eventfs_rwsem: a pointer to rw_semaphore
- helper function to perform write lock on eventfs_rwsem
- */
+static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem) +{
while (!down_write_trylock(eventfs_rwsem))
msleep(10);
What's this loop for? Something like that needs a very good explanation in a comment. Loops like these are usually a sign of a workaround for a bug in the design, or worse, simply hides an existing bug.
Yes, correct. This logic is there to avoid a deadlock:

Thread 1                                  Thread 2
down_read_nested() - read lock acquired
                                          down_write() - waiting for write lock to acquire
down_read_nested() - deadlock

The deadlock occurs because the rwsem does not allow a new read lock to be acquired while a writer is waiting. down_write_trylock() does not add the writer to the wait queue, and so avoids the deadlock scenario.
I was stuck on this deadlock, tried a few methods, and finally borrowed the approach from cifs, where it is upstream, tested, and working. Please refer to: https://elixir.bootlin.com/linux/v6.3.1/source/fs/cifs/file.c#L438
I just looked at that code and the commit, and I honestly believe that is a horrible hack, and very fragile. It's in the smb code, so it is unlikely to have been reviewed by anyone outside that subsystem. I really do not want to proliferate that solution around the kernel. We need to come up with something else.
I also think it's buggy (yes the cifs code is buggy!) because in the comment above the down_read_nested() it says:
/*
 * nested locking. NOTE: rwsems are not allowed to recurse
 * (which occurs if the same task tries to acquire the same
 * lock instance multiple times), but multiple locks of the
 * same lock class might be taken, if the order of the locks
 * is always the same. This ordering rule can be expressed
 * to lockdep via the _nested() APIs, but enumerating the
 * subclasses that are used. (If the nesting relationship is
 * static then another method for expressing nested locking is
 * the explicit definition of lock class keys and the use of
 * lockdep_set_class() at lock initialization time.
 * See Documentation/locking/lockdep-design.rst for more details.)
 */
So this is NOT a solution (and the cifs code should be fixed too!)
Can you show me the exact backtrace where the reader lock gets taken again? We will have to come up with a way to not take the same lock twice.
We can also look to see if we can implement this with RCU. What exactly is this rwsem protecting?
Looking forward to your input. I will add an explanation in v4.
+}
[..]
- This function creates the top of the trace event directory.
- */
+struct dentry *eventfs_create_events_dir(const char *name,
struct dentry *parent,
struct rw_semaphore *eventfs_rwsem)
OK, I'm going to have to really look at this. Passing in a lock to the API is just broken. We need to find a way to solve this another way.
eventfs_rwsem is a member of struct trace_array, I guess we should pass pointer to trace_array.
No, it should not be part of the trace_array. If we can't do this with RCU, then we need to add a descriptor that contains the dentry that is returned above, and have the lock held there. The caller of the eventfs_create_events_dir() should not care about locking. That's an implementation detail that should *not* be part of the API.
That is, if you need a lock:
struct eventfs_dentry {
	struct dentry *dentry;
	struct rwsem *rwsem;
};
And then get to that lock by using the container_of() macro. All created eventfs dentry's could have this structure, where the rwsem points to the top one. Again, that's only if we can't do this with RCU.
-- Steve
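The container_of() lookup suggested above could look roughly like the user-space sketch below. It rests on stated assumptions: dentry_stub and rwsem_stub are hypothetical stand-ins for the kernel types, and the wrapped object is embedded by value (the struct in the message stores a pointer) because container_of() works from the address of an embedded member.

```c
#include <stddef.h>

/* User-space rendition of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct dentry and struct rw_semaphore. */
struct dentry_stub {
	const char *name;
};

struct rwsem_stub {
	int unused;
};

/* Wrapper: the stub is embedded so container_of() can find the wrapper. */
struct eventfs_dentry_sketch {
	struct dentry_stub dentry;
	struct rwsem_stub *rwsem;	/* points at the top-level lock */
};

/* Recover the lock from a pointer to the embedded dentry stub. */
static struct rwsem_stub *dentry_to_rwsem(struct dentry_stub *d)
{
	return container_of(d, struct eventfs_dentry_sketch, dentry)->rwsem;
}
```

With this shape the caller of eventfs_create_events_dir() would never see the lock; every wrapper created below the top directory would simply copy the top-level rwsem pointer.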
I'm about to board a plane to JFK shortly, I'm hoping to play with this while flying back.
I have replied for major concerns. All other minor I will take care in v4.
Thanks a lot for giving time to eventfs patches.
- Ajay
On 03-Jul-2023, at 8:38 PM, Steven Rostedt rostedt@goodmis.org wrote:
I just looked at that code and the commit, and I honestly believe that is a horrible hack, and very fragile. It's in the smb code, so it is unlikely to have been reviewed by anyone outside that subsystem. I really do not want to proliferate that solution around the kernel. We need to come up with something else.
I also think it's buggy (yes the cifs code is buggy!) because in the comment above the down_read_nested() it says:
/*
 * nested locking. NOTE: rwsems are not allowed to recurse
 * (which occurs if the same task tries to acquire the same
 * lock instance multiple times), but multiple locks of the
 * same lock class might be taken, if the order of the locks
 * is always the same. This ordering rule can be expressed
 * to lockdep via the _nested() APIs, but enumerating the
 * subclasses that are used. (If the nesting relationship is
 * static then another method for expressing nested locking is
 * the explicit definition of lock class keys and the use of
 * lockdep_set_class() at lock initialization time.
 * See Documentation/locking/lockdep-design.rst for more details.)
 */
So this is NOT a solution (and the cifs code should be fixed too!)
Can you show me the exact backtrace where the reader lock gets taken again? We will have to come up with a way to not take the same lock twice.
[ 244.185505] eventfs_root_lookup+0x37/0x1f0          <--- require read lock
[ 244.185509] __lookup_slow+0x72/0x100
[ 244.185511] lookup_one_len+0x6a/0x70
[ 244.185513] eventfs_start_creating+0x58/0xd0
[ 244.185515] ? security_locked_down+0x2e/0x50
[ 244.185518] eventfs_create_file+0x57/0x150
[ 244.185521] dcache_dir_open_wrapper+0x1c6/0x260     <--- require read lock
[ 244.185524] ? __pfx_dcache_dir_open_wrapper+0x10/0x10
[ 244.185526] do_dentry_open+0x1ed/0x420
[ 244.185529] vfs_open+0x2d/0x40
We can also look to see if we can implement this with RCU. What exactly is this rwsem protecting?
- struct eventfs_file holds the meta-data for a file or dir.
  https://github.com/intel-lab-lkp/linux/blob/dfe0dc15a73261ed83cdc728e43f4b3d...
- eventfs_rwsem is supposed to protect the linked list made of struct eventfs_file entries, and the members of struct eventfs_file.

I tried one more solution, i.e. checking the owner of the lock:

static inline struct task_struct *rwsem_owner(struct rw_semaphore *sem)
{
	return (struct task_struct *)
		(atomic_long_read(&sem->owner) & ~RWSEM_OWNER_FLAGS_MASK);
}

But rwsem_owner() is static, so eventfs cannot call it.
Looking forward to your input. I will add an explanation in v4.
+}
[..]
- This function creates the top of the trace event directory.
- */
+struct dentry *eventfs_create_events_dir(const char *name,
struct dentry *parent,
struct rw_semaphore *eventfs_rwsem)
OK, I'm going to have to really look at this. Passing in a lock to the API is just broken. We need to find a way to solve this another way.
eventfs_rwsem is a member of struct trace_array, I guess we should pass pointer to trace_array.
No, it should not be part of the trace_array. If we can't do this with RCU, then we need to add a descriptor that contains the dentry that is returned above, and have the lock held there. The caller of the eventfs_create_events_dir() should not care about locking. That's an implementation detail that should *not* be part of the API.
That is, if you need a lock:
struct eventfs_dentry {
	struct dentry *dentry;
	struct rwsem *rwsem;
};
And then get to that lock by using the container_of() macro. All created eventfs dentry's could have this structure, where the rwsem points to the top one. Again, that's only if we can't do this with RCU.
Ok. Let’s first fix locking issue.
-Ajay
On Mon, 3 Jul 2023 18:51:22 +0000 Ajay Kaher akaher@vmware.com wrote:
We can also look to see if we can implement this with RCU. What exactly is this rwsem protecting?
- struct eventfs_file holds the meta-data for a file or dir.
  https://github.com/intel-lab-lkp/linux/blob/dfe0dc15a73261ed83cdc728e43f4b3d...
- eventfs_rwsem is supposed to protect the linked list made of struct eventfs_file entries, and the members of struct eventfs_file.
RCU is usually the perfect solution for protecting linked lists though. I'll take a look at this when I get back to work.
-- Steve
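As a rough user-space analogue of why RCU suits a linked list, the sketch below publishes nodes with a release store and lets readers walk the list without any lock. It is not kernel RCU: ef_node, ef_publish() and ef_count() are hypothetical names, C11 atomics stand in for rcu_assign_pointer()/rcu_dereference(), a single writer (or externally serialized writers) is assumed, and node removal with grace periods is not modelled at all.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node; "next" is read by lock-free readers. */
struct ef_node {
	const char *name;
	struct ef_node *_Atomic next;
};

static struct ef_node *_Atomic ef_head;

/*
 * Writer side: link the fully initialized node first, then publish it
 * with release ordering so readers never see a half-built node.
 */
static void ef_publish(struct ef_node *n)
{
	struct ef_node *old = atomic_load_explicit(&ef_head,
						   memory_order_relaxed);

	atomic_store_explicit(&n->next, old, memory_order_relaxed);
	atomic_store_explicit(&ef_head, n, memory_order_release);
}

/* Reader side: walk the list without taking any lock. */
static size_t ef_count(void)
{
	size_t count = 0;
	struct ef_node *p = atomic_load_explicit(&ef_head,
						 memory_order_acquire);

	while (p) {
		count++;
		p = atomic_load_explicit(&p->next, memory_order_acquire);
	}
	return count;
}
```

Because readers take no lock, the nested read acquisition seen in the lookup-from-open path would not arise; the hard part that real RCU adds, and this sketch omits, is deferring frees until all readers are done.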
Add the following functions to eventfs to add files:
eventfs_add_top_file() adds the information of a top file to eventfs and dynamically creates these files as and when required.
eventfs_add_file() adds the information of nested files to eventfs and dynamically creates these files as and when required.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
Reported-by: kernel test robot lkp@intel.com
Link: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
---
 fs/tracefs/event_inode.c | 94 ++++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |  8 ++++
 2 files changed, 102 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index a48ce23c0..17afb7476 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -270,3 +270,97 @@ struct eventfs_file *eventfs_add_dir(const char *name,
 	eventfs_up_write(eventfs_rwsem);
 	return ef;
 }
+
+/**
+ * eventfs_add_top_file - add event top file to list to create later
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ *          directory dentry if set. If this parameter is NULL, then the
+ *          file will be created in the root of the tracefs filesystem.
+ * @data: a pointer to something that the caller will want to get to later
+ *        on. The inode.i_private pointer will point to this value on
+ *        the open() call.
+ * @fop: a pointer to a struct file_operations that should be used for
+ *       this file.
+ *
+ * This function adds top files of event dir to list.
+ * And all these files are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+int eventfs_add_top_file(const char *name, umode_t mode,
+			 struct dentry *parent, void *data,
+			 const struct file_operations *fop)
+{
+	struct tracefs_inode *ti;
+	struct eventfs_inode *ei;
+	struct eventfs_file *ef;
+	struct rw_semaphore *eventfs_rwsem;
+
+	if (!parent)
+		return -EINVAL;
+
+	if (!(mode & S_IFMT))
+		mode |= S_IFREG;
+
+	if (!parent->d_inode)
+		return -EINVAL;
+
+	ti = get_tracefs(parent->d_inode);
+	if (!(ti->flags & TRACEFS_EVENT_INODE))
+		return -EINVAL;
+
+	ei = ti->private;
+	ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
+
+	if (IS_ERR(ef))
+		return -ENOMEM;
+
+	eventfs_rwsem = (struct rw_semaphore *) parent->d_inode->i_private;
+	eventfs_down_write(eventfs_rwsem);
+	list_add_tail(&ef->list, &ei->e_top_files);
+	ef->d_parent = parent;
+	eventfs_up_write(eventfs_rwsem);
+	return 0;
+}
+
+/**
+ * eventfs_add_file - add eventfs file to list to create later
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @ef_parent: a pointer to the parent eventfs_file for this file.
+ * @data: a pointer to something that the caller will want to get to later
+ *        on. The inode.i_private pointer will point to this value on
+ *        the open() call.
+ * @fop: a pointer to a struct file_operations that should be used for
+ *       this file.
+ *
+ * This function adds top files of event dir to list.
+ * And all these files are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+int eventfs_add_file(const char *name, umode_t mode,
+		     struct eventfs_file *ef_parent,
+		     void *data,
+		     const struct file_operations *fop)
+{
+	struct eventfs_file *ef;
+	struct rw_semaphore *eventfs_rwsem;
+
+	if (!ef_parent)
+		return -EINVAL;
+
+	if (!(mode & S_IFMT))
+		mode |= S_IFREG;
+
+	ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
+	if (IS_ERR(ef))
+		return -ENOMEM;
+
+	eventfs_rwsem = (struct rw_semaphore *) ef_parent->data;
+	eventfs_down_write(eventfs_rwsem);
+	list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
+	ef->d_parent = ef_parent->dentry;
+	eventfs_up_write(eventfs_rwsem);
+	return 0;
+}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index aeca6761f..1e1780a61 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -50,6 +50,14 @@ struct eventfs_file *eventfs_add_dir(const char *name,
 				      struct eventfs_file *ef_parent,
 				      struct rw_semaphore *eventfs_rwsem);

+int eventfs_add_file(const char *name, umode_t mode,
+		     struct eventfs_file *ef_parent, void *data,
+		     const struct file_operations *fops);
+
+int eventfs_add_top_file(const char *name, umode_t mode,
+			 struct dentry *parent, void *data,
+			 const struct file_operations *fops);
+
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);
Adding eventfs_remove(), this function will recursively remove dir or file info from eventfs.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
Reported-by: kernel test robot lkp@intel.com
Link: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/
---
 fs/tracefs/event_inode.c | 78 ++++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |  4 +++
 2 files changed, 82 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 17afb7476..874ef88bd 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -364,3 +364,81 @@ int eventfs_add_file(const char *name, umode_t mode,
 	eventfs_up_write(eventfs_rwsem);
 	return 0;
 }
+
+/**
+ * eventfs_remove_rec - remove eventfs dir or file from list
+ * @ef: a pointer to eventfs_file to be removed.
+ *
+ * This function recursively remove eventfs_file which
+ * contains info of file or dir.
+ */
+static void eventfs_remove_rec(struct eventfs_file *ef)
+{
+	struct eventfs_file *ef_child, *n;
+
+	if (!ef)
+		return;
+
+	if (ef->ei) {
+		/* search for nested folders or files */
+		list_for_each_entry_safe(ef_child, n, &ef->ei->e_top_files, list) {
+			eventfs_remove_rec(ef_child);
+		}
+		kfree(ef->ei);
+	}
+
+	if (ef->created && ef->dentry) {
+		d_invalidate(ef->dentry);
+		dput(ef->dentry);
+	}
+	list_del(&ef->list);
+	kfree(ef->name);
+	kfree(ef);
+}
+
+/**
+ * eventfs_remove - remove eventfs dir or file from list
+ * @ef: a pointer to eventfs_file to be removed.
+ *
+ * This function acquire the eventfs_rwsem lock and call eventfs_remove_rec()
+ */
+void eventfs_remove(struct eventfs_file *ef)
+{
+	struct rw_semaphore *eventfs_rwsem;
+
+	if (!ef)
+		return;
+
+	if (ef->ei)
+		eventfs_rwsem = (struct rw_semaphore *) ef->data;
+	else
+		eventfs_rwsem = (struct rw_semaphore *) ef->d_parent->d_inode->i_private;
+
+	eventfs_down_write(eventfs_rwsem);
+	eventfs_remove_rec(ef);
+	eventfs_up_write(eventfs_rwsem);
+}
+
+/**
+ * eventfs_remove_events_dir - remove eventfs dir or file from list
+ * @dentry: a pointer to events's dentry to be removed.
+ *
+ * This function remove events main directory
+ */
+void eventfs_remove_events_dir(struct dentry *dentry)
+{
+	struct tracefs_inode *ti;
+	struct eventfs_inode *ei;
+
+	if (!dentry || !dentry->d_inode)
+		return;
+
+	ti = get_tracefs(dentry->d_inode);
+	if (!ti || !(ti->flags & TRACEFS_EVENT_INODE))
+		return;
+
+	ei = ti->private;
+	d_invalidate(dentry);
+	dput(dentry);
+	kfree(ei);
+}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 1e1780a61..ea10ccc87 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -58,6 +58,10 @@ int eventfs_add_top_file(const char *name, umode_t mode,
 			 struct dentry *parent, void *data,
 			 const struct file_operations *fops);

+void eventfs_remove(struct eventfs_file *ef);
+
+void eventfs_remove_events_dir(struct dentry *dentry);
+
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);
Add eventfs_create_file() and eventfs_create_dir() to create files and directories at runtime, as and when required.
These functions will be called either from the lookup of inode_operations or from the open of file_operations.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
---
 fs/tracefs/event_inode.c | 125 +++++++++++++++++++++++++++++++++++++++
 fs/tracefs/inode.c       |  47 +++++++++++++++
 include/linux/tracefs.h  |   6 ++
 3 files changed, 178 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 874ef88bd..0ac1913cf 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -83,6 +83,131 @@ static void eventfs_up_write(struct rw_semaphore *eventfs_rwsem)
 	up_write(eventfs_rwsem);
 }

+/**
+ * eventfs_create_file - create a file in the tracefs filesystem
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ *          directory dentry if set. If this parameter is NULL, then the
+ *          file will be created in the root of the tracefs filesystem.
+ * @data: a pointer to something that the caller will want to get to later
+ *        on. The inode.i_private pointer will point to this value on
+ *        the open() call.
+ * @fop: a pointer to a struct file_operations that should be used for
+ *       this file.
+ *
+ * This is the basic "create a file" function for tracefs. It allows for a
+ * wide range of flexibility in creating a file.
+ *
+ * This function will return a pointer to a dentry if it succeeds. This
+ * pointer must be passed to the tracefs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.) If an error occurs, %NULL will be returned.
+ *
+ * If tracefs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+static struct dentry *eventfs_create_file(const char *name, umode_t mode,
+					   struct dentry *parent, void *data,
+					   const struct file_operations *fop)
+{
+	struct tracefs_inode *ti;
+	struct dentry *dentry;
+	struct inode *inode;
+
+	if (security_locked_down(LOCKDOWN_TRACEFS))
+		return NULL;
+
+	if (!(mode & S_IFMT))
+		mode |= S_IFREG;
+
+	if (WARN_ON_ONCE(!S_ISREG(mode)))
+		return NULL;
+
+	dentry = eventfs_start_creating(name, parent);
+
+	if (IS_ERR(dentry))
+		return dentry;
+
+	inode = tracefs_get_inode(dentry->d_sb);
+	if (unlikely(!inode))
+		return eventfs_failed_creating(dentry);
+
+	inode->i_mode = mode;
+	inode->i_fop = fop;
+	inode->i_private = data;
+
+	ti = get_tracefs(inode);
+	ti->flags |= TRACEFS_EVENT_INODE;
+	d_instantiate(dentry, inode);
+	fsnotify_create(dentry->d_parent->d_inode, dentry);
+	return eventfs_end_creating(dentry);
+}
+
+/**
+ * eventfs_create_dir - create a dir in the tracefs filesystem
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ *          directory dentry if set. If this parameter is NULL, then the
+ *          file will be created in the root of the tracefs filesystem.
+ * @data: a pointer to something that the caller will want to get to later
+ *        on. The inode.i_private pointer will point to this value on
+ *        the open() call.
+ * @fop: a pointer to a struct file_operations that should be used for
+ *       this dir.
+ * @iop: a pointer to a struct inode_operations that should be used for
+ *       this dir.
+ *
+ * This is the basic "create a dir" function for eventfs. It allows for a
+ * wide range of flexibility in creating a dir.
+ *
+ * This function will return a pointer to a dentry if it succeeds. This
+ * pointer must be passed to the tracefs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.) If an error occurs, %NULL will be returned.
+ *
+ * If tracefs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+static struct dentry *eventfs_create_dir(const char *name, umode_t mode,
+					  struct dentry *parent, void *data,
+					  const struct file_operations *fop,
+					  const struct inode_operations *iop)
+{
+	struct tracefs_inode *ti;
+	struct dentry *dentry;
+	struct inode *inode;
+
+	if (security_locked_down(LOCKDOWN_TRACEFS))
+		return NULL;
+
+	WARN_ON(!S_ISDIR(mode));
+
+	dentry = eventfs_start_creating(name, parent);
+
+	if (IS_ERR(dentry))
+		return dentry;
+
+	inode = tracefs_get_inode(dentry->d_sb);
+	if (unlikely(!inode))
+		return eventfs_failed_creating(dentry);
+
+	inode->i_mode = mode;
+	inode->i_op = iop;
+	inode->i_fop = fop;
+	inode->i_private = data;
+
+	ti = get_tracefs(inode);
+	ti->flags |= TRACEFS_EVENT_INODE;
+
+	inc_nlink(inode);
+	d_instantiate(dentry, inode);
+	inc_nlink(dentry->d_parent->d_inode);
+	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
+	return eventfs_end_creating(dentry);
+}
+
 static const struct file_operations eventfs_file_operations = {
 };

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 7df1752e8..66c4df734 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -452,6 +452,53 @@ struct dentry *tracefs_end_creating(struct dentry *dentry)
 	return dentry;
 }

+struct dentry *eventfs_start_creating(const char *name, struct dentry *parent)
+{
+	struct dentry *dentry;
+	int error;
+
+	error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
+			      &tracefs_mount_count);
+	if (error)
+		return ERR_PTR(error);
+
+	/*
+	 * If the parent is not specified, we create it in the root.
+	 * We need the root dentry to do this, which is in the super
+	 * block. A pointer to that is in the struct vfsmount that we
+	 * have around.
+	 */
+	if (!parent)
+		parent = tracefs_mount->mnt_root;
+
+	if (unlikely(IS_DEADDIR(parent->d_inode)))
+		dentry = ERR_PTR(-ENOENT);
+	else
+		dentry = lookup_one_len(name, parent, strlen(name));
+
+	if (!IS_ERR(dentry) && dentry->d_inode) {
+		dput(dentry);
+		dentry = ERR_PTR(-EEXIST);
+	}
+
+	if (IS_ERR(dentry))
+		simple_release_fs(&tracefs_mount, &tracefs_mount_count);
+
+	return dentry;
+}
+
+struct dentry *eventfs_failed_creating(struct dentry *dentry)
+{
+	dput(dentry);
+	simple_release_fs(&tracefs_mount, &tracefs_mount_count);
+	return NULL;
+}
+
+struct dentry *eventfs_end_creating(struct dentry *dentry)
+{
+	return dentry;
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index ea10ccc87..57bfd1322 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -38,6 +38,12 @@ struct eventfs_file {
 	bool created;
 };
+struct dentry *eventfs_start_creating(const char *name, struct dentry *parent); + +struct dentry *eventfs_failed_creating(struct dentry *dentry); + +struct dentry *eventfs_end_creating(struct dentry *dentry); + struct dentry *eventfs_create_events_dir(const char *name, struct dentry *parent, struct rw_semaphore *eventfs_rwsem);
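The three helpers above tie a filesystem pin to each entry being created: eventfs_start_creating() takes the pin and drops it again if the lookup fails, eventfs_failed_creating() drops it on a later error, and eventfs_end_creating() leaves it held for the lifetime of the entry. The following is only a userspace sketch of that bookkeeping. It assumes a plain counter in place of simple_pin_fs()/simple_release_fs() and returns NULL where the kernel code returns ERR_PTR(); every name in it (pin_count, table, start_creating, create_file, ...) is hypothetical and not a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for tracefs_mount_count. */
static int pin_count;

/* Hypothetical stand-in for a dentry inside one directory. */
struct entry {
	const char *name;
	int in_use;		/* nonzero once an "inode" is attached */
};

static struct entry table[4];

/* Take a pin, then reserve a slot; drop the pin again on failure. */
static struct entry *start_creating(const char *name)
{
	struct entry *free_slot = NULL;
	size_t i;

	pin_count++;			/* models simple_pin_fs() */
	for (i = 0; i < 4; i++) {
		if (table[i].in_use && strcmp(table[i].name, name) == 0) {
			pin_count--;	/* models the -EEXIST path */
			return NULL;
		}
		if (!table[i].in_use && !free_slot)
			free_slot = &table[i];
	}
	if (!free_slot) {
		pin_count--;		/* no room: release the pin */
		return NULL;
	}
	free_slot->name = name;
	return free_slot;
}

/* Error path after a successful start: undo the reservation and the pin. */
static struct entry *failed_creating(struct entry *e)
{
	assert(pin_count > 0);
	e->name = NULL;
	pin_count--;			/* models simple_release_fs() */
	return NULL;
}

/* Success path: the pin stays held for as long as the entry exists. */
static struct entry *end_creating(struct entry *e)
{
	return e;
}

/* Mirrors the shape of eventfs_create_file(); the failure is injected. */
static struct entry *create_file(const char *name, int simulate_alloc_failure)
{
	struct entry *e = start_creating(name);

	if (!e)
		return NULL;
	if (simulate_alloc_failure)
		return failed_creating(e);
	e->in_use = 1;			/* models d_instantiate() */
	return end_creating(e);
}
```

In this sketch every path that does not end in a live entry leaves pin_count unchanged, which is the invariant the start/failed/end split is meant to keep.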
Add the following inode_operations, file_operations and helper functions to eventfs:

  dcache_dir_open_wrapper()
  eventfs_root_lookup()
  eventfs_release()
  eventfs_set_ef_status_free()
  eventfs_post_create_dir()

The inode_operations and file_operations callbacks are called from the VFS.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- fs/tracefs/event_inode.c | 188 +++++++++++++++++++++++++++++++++++++++ include/linux/tracefs.h | 2 + 2 files changed, 190 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 0ac1913cf..d98ded15d 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -208,10 +208,198 @@ static struct dentry *eventfs_create_dir(const char *name, umode_t mode, return eventfs_end_creating(dentry); }
+/** + * eventfs_set_ef_status_free - set the ef->status to free + * @dentry: dentry who's status to be freed + * + * eventfs_set_ef_status_free will be called if no more + * reference remains + */ +void eventfs_set_ef_status_free(struct dentry *dentry) +{ + struct tracefs_inode *ti_parent; + struct eventfs_file *ef; + + ti_parent = get_tracefs(dentry->d_parent->d_inode); + if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) + return; + + ef = dentry->d_fsdata; + if (!ef) + return; + ef->created = false; + ef->dentry = NULL; +} + +/** + * eventfs_post_create_dir - post create dir routine + * @ef: eventfs_file of recently created dir + * + * Files with-in eventfs dir should know dentry of parent dir + */ +static void eventfs_post_create_dir(struct eventfs_file *ef) +{ + struct eventfs_file *ef_child; + struct tracefs_inode *ti; + + eventfs_down_read((struct rw_semaphore *) ef->data); + /* fill parent-child relation */ + list_for_each_entry(ef_child, &ef->ei->e_top_files, list) { + ef_child->d_parent = ef->dentry; + } + eventfs_up_read((struct rw_semaphore *) ef->data); + + ti = get_tracefs(ef->dentry->d_inode); + ti->private = ef->ei; +} + +/** + * eventfs_root_lookup - lookup routine to create file/dir + * @dir: directory in which lookup to be done + * @dentry: file/dir dentry + * @flags: + * + * Used to create dynamic file/dir with-in @dir, search with-in ei + * list, if @dentry found go ahead and create the file/dir + */ + +static struct dentry *eventfs_root_lookup(struct inode *dir, + struct dentry *dentry, + unsigned int flags) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + struct dentry *ret = NULL; + struct rw_semaphore *eventfs_rwsem; + + ti = get_tracefs(dir); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return NULL; + + ei = ti->private; + eventfs_rwsem = (struct rw_semaphore *) dir->i_private; + eventfs_down_read(eventfs_rwsem); + list_for_each_entry(ef, &ei->e_top_files, list) { + if (strcmp(ef->name, 
dentry->d_name.name)) + continue; + ret = simple_lookup(dir, dentry, flags); + if (ef->created) + continue; + ef->created = true; + if (ef->ei) + ef->dentry = eventfs_create_dir(ef->name, ef->mode, ef->d_parent, + ef->data, ef->fop, ef->iop); + else + ef->dentry = eventfs_create_file(ef->name, ef->mode, ef->d_parent, + ef->data, ef->fop); + + if (IS_ERR_OR_NULL(ef->dentry)) { + ef->created = false; + } else { + if (ef->ei) + eventfs_post_create_dir(ef); + ef->dentry->d_fsdata = ef; + dput(ef->dentry); + } + break; + } + eventfs_up_read(eventfs_rwsem); + return ret; +} + +/** + * eventfs_release - called to release eventfs file/dir + * @inode: inode to be released + * @file: file to be released (not used) + */ +static int eventfs_release(struct inode *inode, struct file *file) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + struct dentry *dentry = file_dentry(file); + struct rw_semaphore *eventfs_rwsem; + + ti = get_tracefs(inode); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return -EINVAL; + + ei = ti->private; + eventfs_rwsem = eventfs_dentry_to_rwsem(dentry); + eventfs_down_read(eventfs_rwsem); + list_for_each_entry(ef, &ei->e_top_files, list) { + if (ef->created) + dput(ef->dentry); + } + eventfs_up_read(eventfs_rwsem); + return dcache_dir_close(inode, file); +} + +/** + * dcache_dir_open_wrapper - eventfs open wrapper + * @inode: not used + * @file: dir to be opened (to create it's child) + * + * Used to dynamic create file/dir with-in @file, all the + * file/dir will be created. 
If already created then reference + * will be increased + */ +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + struct inode *f_inode = file_inode(file); + struct dentry *dentry = file_dentry(file); + struct rw_semaphore *eventfs_rwsem; + + ti = get_tracefs(f_inode); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return -EINVAL; + + ei = ti->private; + eventfs_rwsem = eventfs_dentry_to_rwsem(dentry); + eventfs_down_read(eventfs_rwsem); + list_for_each_entry(ef, &ei->e_top_files, list) { + if (ef->created) { + dget(ef->dentry); + continue; + } + + ef->created = true; + + inode_lock(dentry->d_inode); + if (ef->ei) + ef->dentry = eventfs_create_dir(ef->name, ef->mode, dentry, + ef->data, ef->fop, ef->iop); + else + ef->dentry = eventfs_create_file(ef->name, ef->mode, dentry, + ef->data, ef->fop); + inode_unlock(dentry->d_inode); + + if (IS_ERR_OR_NULL(ef->dentry)) { + ef->created = false; + } else { + if (ef->ei) + eventfs_post_create_dir(ef); + ef->dentry->d_fsdata = ef; + } + } + eventfs_up_read(eventfs_rwsem); + return dcache_dir_open(inode, file); +} + static const struct file_operations eventfs_file_operations = { + .open = dcache_dir_open_wrapper, + .read = generic_read_dir, + .iterate_shared = dcache_readdir, + .llseek = generic_file_llseek, + .release = eventfs_release, };
static const struct inode_operations eventfs_root_dir_inode_operations = { + .lookup = eventfs_root_lookup, };
/** diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 57bfd1322..268450d60 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -68,6 +68,8 @@ void eventfs_remove(struct eventfs_file *ef);
void eventfs_remove_events_dir(struct dentry *dentry);
+void eventfs_set_ef_status_free(struct dentry *dentry); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
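The callbacks in this patch follow a create-on-demand pattern: eventfs_root_lookup() walks the metadata list and instantiates a dentry only for the name being looked up, and eventfs_set_ef_status_free() resets the metadata once the dentry goes away. The following is a minimal userspace sketch of that idea, not the kernel implementation. It assumes a fixed array in place of the e_top_files list and malloc()/free() in place of inode/dentry creation; struct meta, struct node, lazy_lookup() and lazy_release() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the heavyweight inode/dentry pair. */
struct node {
	const char *name;
};

/* Hypothetical stand-in for struct eventfs_file: cheap and always present. */
struct meta {
	const char *name;
	int created;		/* mirrors ef->created */
	struct node *node;	/* mirrors ef->dentry */
};

static struct meta files[] = {
	{ "enable", 0, NULL },
	{ "filter", 0, NULL },
	{ "format", 0, NULL },
};
#define NFILES (sizeof(files) / sizeof(files[0]))

/* Number of entries that currently have a node allocated. */
static size_t live_nodes(void)
{
	size_t i, n = 0;

	for (i = 0; i < NFILES; i++)
		n += files[i].created ? 1 : 0;
	return n;
}

/* Allocate the node for @name on first use; later lookups reuse it. */
static struct node *lazy_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < NFILES; i++) {
		if (strcmp(files[i].name, name))
			continue;
		if (!files[i].created) {
			struct node *n = malloc(sizeof(*n));

			if (!n)
				return NULL;
			n->name = files[i].name;
			files[i].node = n;
			files[i].created = 1;
		}
		return files[i].node;
	}
	return NULL;		/* no metadata for this name */
}

/* Drop the node but keep the metadata, so a later lookup can recreate it. */
static void lazy_release(const char *name)
{
	size_t i;

	for (i = 0; i < NFILES; i++) {
		if (strcmp(files[i].name, name) || !files[i].created)
			continue;
		assert(files[i].node != NULL);
		free(files[i].node);
		files[i].node = NULL;
		files[i].created = 0;
	}
}
```

The sketch leaves out locking and reference counting, which the patch handles with eventfs_rwsem and dget()/dput().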
Create tracefs_inode_cache, a kmem cache of tracefs_inode objects, and add the helper functions:

  tracefs_alloc_inode()
  tracefs_free_inode()
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- fs/tracefs/inode.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 66c4df734..76820d3e9 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -24,11 +24,30 @@
 #include "internal.h"
 
 #define TRACEFS_DEFAULT_MODE	0700
+static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
 
 static struct vfsmount *tracefs_mount;
 static int tracefs_mount_count;
 static bool tracefs_registered;
 
+static struct inode *tracefs_alloc_inode(struct super_block *sb)
+{
+	struct tracefs_inode *ti;
+
+	ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL);
+	if (!ti)
+		return NULL;
+
+	ti->flags = 0;
+
+	return &ti->vfs_inode;
+}
+
+static void tracefs_free_inode(struct inode *inode)
+{
+	kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+}
+
 static ssize_t default_read_file(struct file *file, char __user *buf,
 				 size_t count, loff_t *ppos)
 {
@@ -347,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root)
 }
 
 static const struct super_operations tracefs_super_operations = {
+	.alloc_inode    = tracefs_alloc_inode,
+	.free_inode     = tracefs_free_inode,
+	.drop_inode     = generic_delete_inode,
 	.statfs		= simple_statfs,
 	.remount_fs	= tracefs_remount,
 	.show_options	= tracefs_show_options,
@@ -676,10 +698,26 @@ bool tracefs_initialized(void)
 	return tracefs_registered;
 }
 
+static void init_once(void *foo)
+{
+	struct tracefs_inode *ti = (struct tracefs_inode *) foo;
+
+	inode_init_once(&ti->vfs_inode);
+}
+
 static int __init tracefs_init(void)
 {
 	int retval;
 
+	tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
+						 sizeof(struct tracefs_inode),
+						 0, (SLAB_RECLAIM_ACCOUNT|
+						     SLAB_MEM_SPREAD|
+						     SLAB_ACCOUNT),
+						 init_once);
+	if (!tracefs_inode_cachep)
+		return -ENOMEM;
+
 	retval = sysfs_create_mount_point(kernel_kobj, "tracing");
 	if (retval)
 		return -EINVAL;
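tracefs_alloc_inode() hands the VFS a pointer to the vfs_inode member embedded in a larger tracefs_inode, and tracefs_free_inode() uses get_tracefs() to get back to the enclosing object. The following is a userspace sketch of that embedding. It assumes get_tracefs() is a container_of() wrapper and guesses the structure layout, since fs/tracefs/internal.h is not shown in this patch; all names carrying a _sketch suffix are hypothetical, and malloc()/free() stand in for the kmem cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the VFS struct inode. */
struct inode {
	unsigned long i_ino;
};

/* Assumed layout: private fields plus the embedded VFS inode. */
struct tracefs_inode_sketch {
	unsigned long flags;
	void *private;
	struct inode vfs_inode;
};

/* Userspace equivalent of the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing object from a pointer to its embedded inode. */
static struct tracefs_inode_sketch *get_tracefs_sketch(struct inode *inode)
{
	return container_of_sketch(inode, struct tracefs_inode_sketch,
				   vfs_inode);
}

/* Allocate the outer object, but return only the embedded inode. */
static struct inode *alloc_inode_sketch(void)
{
	struct tracefs_inode_sketch *ti = malloc(sizeof(*ti));

	if (!ti)
		return NULL;
	ti->flags = 0;
	ti->private = NULL;
	ti->vfs_inode.i_ino = 0;
	return &ti->vfs_inode;
}

/* Free through the outer object, never through the embedded pointer. */
static void free_inode_sketch(struct inode *inode)
{
	assert(inode != NULL);
	free(get_tracefs_sketch(inode));
}
```

Because the outer object is what was allocated, freeing the embedded inode pointer directly would be wrong whenever vfs_inode is not the first member, which is why the free path goes through the container lookup.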
Until now, /sys/kernel/debug/tracing/events has been part of tracefs. This patch creates 'events' and its sub-directories through eventfs instead, replacing the tracefs calls for 'events' with eventfs calls.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- fs/tracefs/inode.c | 18 ++++++++++ include/linux/trace_events.h | 1 + kernel/trace/trace.h | 2 +- kernel/trace/trace_events.c | 67 +++++++++++++++++++----------------- 4 files changed, 55 insertions(+), 33 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 76820d3e9..a098d7153 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -374,6 +374,23 @@ static const struct super_operations tracefs_super_operations = {
 	.show_options	= tracefs_show_options,
 };
 
+static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+{
+	struct tracefs_inode *ti;
+
+	if (!dentry || !inode)
+		return;
+
+	ti = get_tracefs(inode);
+	if (ti && ti->flags & TRACEFS_EVENT_INODE)
+		eventfs_set_ef_status_free(dentry);
+	iput(inode);
+}
+
+static const struct dentry_operations tracefs_dentry_operations = {
+	.d_iput = tracefs_dentry_iput,
+};
+
 static int trace_fill_super(struct super_block *sb, void *data, int silent)
 {
 	static const struct tree_descr trace_files[] = {{""}};
@@ -396,6 +413,7 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent)
 		goto fail;
 
 	sb->s_op = &tracefs_super_operations;
+	sb->s_d_op = &tracefs_dentry_operations;
 
 	tracefs_apply_options(sb, false);
 
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 0e373222a..696843d46 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -635,6 +635,7 @@ struct trace_event_file { struct list_head list; struct trace_event_call *event_call; struct event_filter __rcu *filter; + struct eventfs_file *ef; struct dentry *dir; struct trace_array *tr; struct trace_subsystem_dir *system; diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index b895c3346..b265ae2df 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1310,7 +1310,7 @@ struct trace_subsystem_dir { struct list_head list; struct event_subsystem *subsystem; struct trace_array *tr; - struct dentry *entry; + struct eventfs_file *ef; int ref_count; int nr_events; }; diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 16bc5ba45..94aa6f9c9 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -988,7 +988,8 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) return;
if (!--dir->nr_events) { - tracefs_remove(dir->entry); + if (dir->ef) + eventfs_remove(dir->ef); list_del(&dir->list); __put_system_dir(dir); } @@ -1009,7 +1010,8 @@ static void remove_event_file_dir(struct trace_event_file *file)
tracefs_remove(dir); } - + if (file->ef) + eventfs_remove(file->ef); list_del(&file->list); remove_subsystem(file->system); free_event_filter(file->filter); @@ -2295,13 +2297,13 @@ create_new_subsystem(const char *name) return NULL; }
-static struct dentry * +static struct eventfs_file * event_subsystem_dir(struct trace_array *tr, const char *name, struct trace_event_file *file, struct dentry *parent) { struct event_subsystem *system, *iter; struct trace_subsystem_dir *dir; - struct dentry *entry; + int res;
/* First see if we did not already create this dir */ list_for_each_entry(dir, &tr->systems, list) { @@ -2309,7 +2311,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name, if (strcmp(system->name, name) == 0) { dir->nr_events++; file->system = dir; - return dir->entry; + return dir->ef; } }
@@ -2333,8 +2335,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name, } else __get_system(system);
- dir->entry = tracefs_create_dir(name, parent); - if (!dir->entry) { + dir->ef = eventfs_add_subsystem_dir(name, parent, &tr->eventfs_rwsem); + if (IS_ERR(dir->ef)) { pr_warn("Failed to create system directory %s\n", name); __put_system(system); goto out_free; @@ -2349,22 +2351,22 @@ event_subsystem_dir(struct trace_array *tr, const char *name, /* the ftrace system is special, do not create enable or filter files */ if (strcmp(name, "ftrace") != 0) {
- entry = tracefs_create_file("filter", TRACE_MODE_WRITE, - dir->entry, dir, + res = eventfs_add_file("filter", TRACE_MODE_WRITE, + dir->ef, dir, &ftrace_subsystem_filter_fops); - if (!entry) { + if (res) { kfree(system->filter); system->filter = NULL; pr_warn("Could not create tracefs '%s/filter' entry\n", name); }
- trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir, + eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir, &ftrace_system_enable_fops); }
list_add(&dir->list, &tr->systems);
- return dir->entry; + return dir->ef;
out_free: kfree(dir); @@ -2418,7 +2420,7 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) { struct trace_event_call *call = file->event_call; struct trace_array *tr = file->tr; - struct dentry *d_events; + struct eventfs_file *ef_subsystem = NULL; const char *name; int ret;
@@ -2430,24 +2432,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0)) return -ENODEV;
- d_events = event_subsystem_dir(tr, call->class->system, file, parent); - if (!d_events) + ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent); + if (!ef_subsystem) return -ENOMEM;
name = trace_event_name(call); - file->dir = tracefs_create_dir(name, d_events); - if (!file->dir) { + file->ef = eventfs_add_dir(name, ef_subsystem, &tr->eventfs_rwsem); + if (IS_ERR(file->ef)) { pr_warn("Could not create tracefs '%s' directory\n", name); return -1; }
if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) - trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file, + eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file, &ftrace_enable_fops);
#ifdef CONFIG_PERF_EVENTS if (call->event.type && call->class->reg) - trace_create_file("id", TRACE_MODE_READ, file->dir, + eventfs_add_file("id", TRACE_MODE_READ, file->ef, (void *)(long)call->event.type, &ftrace_event_id_fops); #endif @@ -2463,27 +2465,27 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) * triggers or filters. */ if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { - trace_create_file("filter", TRACE_MODE_WRITE, file->dir, + eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef, file, &ftrace_event_filter_fops);
- trace_create_file("trigger", TRACE_MODE_WRITE, file->dir, + eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef, file, &event_trigger_fops); }
#ifdef CONFIG_HIST_TRIGGERS - trace_create_file("hist", TRACE_MODE_READ, file->dir, file, + eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file, &event_hist_fops); #endif #ifdef CONFIG_HIST_TRIGGERS_DEBUG - trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file, + eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file, &event_hist_debug_fops); #endif - trace_create_file("format", TRACE_MODE_READ, file->dir, call, + eventfs_add_file("format", TRACE_MODE_READ, file->ef, call, &ftrace_event_format_fops);
#ifdef CONFIG_TRACE_EVENT_INJECT if (call->event.type && call->class->reg) - trace_create_file("inject", 0200, file->dir, file, + eventfs_add_file("inject", 0200, file->ef, file, &event_inject_fops); #endif
@@ -3636,21 +3638,22 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) { struct dentry *d_events; struct dentry *entry; + int error = 0;
entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent, tr, &ftrace_set_event_fops); if (!entry) return -ENOMEM;
- d_events = tracefs_create_dir("events", parent); - if (!d_events) { + d_events = eventfs_create_events_dir("events", parent, &tr->eventfs_rwsem); + if (IS_ERR(d_events)) { pr_warn("Could not create tracefs 'events' directory\n"); return -ENOMEM; }
- entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events, + error = eventfs_add_top_file("enable", TRACE_MODE_WRITE, d_events, tr, &ftrace_tr_enable_fops); - if (!entry) + if (error) return -ENOMEM;
/* There are not as crucial, just warn if they are not created */ @@ -3663,11 +3666,11 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) &ftrace_set_event_notrace_pid_fops);
/* ring buffer internal formats */ - trace_create_file("header_page", TRACE_MODE_READ, d_events, + eventfs_add_top_file("header_page", TRACE_MODE_READ, d_events, ring_buffer_print_page_header, &ftrace_show_header_fops);
- trace_create_file("header_event", TRACE_MODE_READ, d_events, + eventfs_add_top_file("header_event", TRACE_MODE_READ, d_events, ring_buffer_print_entry_header, &ftrace_show_header_fops);
@@ -3755,7 +3758,7 @@ int event_trace_del_tracer(struct trace_array *tr)
down_write(&trace_event_sem); __trace_remove_event_dirs(tr); - tracefs_remove(tr->event_dir); + eventfs_remove_events_dir(tr->event_dir); up_write(&trace_event_sem);
tr->event_dir = NULL;
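Several checks in this diff change from "if (!entry)" to "if (IS_ERR(...))". The following is a userspace sketch of why that matters, assuming the eventfs_add_*() helpers report failure through ERR_PTR() as the new checks suggest. The helpers below imitate the kernel's ERR_PTR()/PTR_ERR()/IS_ERR()/IS_ERR_OR_NULL() under hypothetical lower-case names, and add_dir_sketch() is an invented function used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Largest errno value that is encoded in a pointer (assumed, as in the kernel). */
#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno in the top page of the address space. */
static void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO_SKETCH);
	return (void *)(intptr_t)error;
}

/* Decode the errno again. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for encoded errors; NULL is not an error pointer. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Needed when a callee may return either NULL or an error pointer. */
static int is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

static int slot;	/* hypothetical object handed back on success */

/* Hypothetical creator that fails with -EINVAL when given no name. */
static void *add_dir_sketch(const char *name)
{
	if (!name)
		return err_ptr(-EINVAL);
	return &slot;
}
```

An error pointer is non-NULL, so a leftover "if (!ptr)" check would treat a failure as success. Conversely, IS_ERR() does not catch NULL, which is why the lookup and open paths in event_inode.c use IS_ERR_OR_NULL() on the result of eventfs_create_file()/eventfs_create_dir(), both of which can return either form.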
kprobe_args_char.tc and kprobe_args_string.tc place their test probe on tracefs_create_dir. With eventfs, the event directories are no longer created through that function, so probe eventfs_add_dir instead.
Signed-off-by: Ajay Kaher akaher@vmware.com Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Tested-by: Ching-lin Yu chinglinyu@google.com --- .../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc | 4 ++-- .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc index 285b4770e..523cfb645 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc @@ -34,14 +34,14 @@ mips*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t'" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index a4f8e7c53..b9f8c3f8b 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -37,14 +37,14 @@ loongarch*) esac
: "Test get argument (1)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable echo "p:test $FUNCTION_FORK" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
On Thu, 1 Jun 2023 14:30:13 +0530 Ajay Kaher akaher@vmware.com wrote:
kprobe_args_char.tc, kprobe_args_string.tc has validation check for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
This looks good to me.
Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org
Thanks,
On 01-Jun-2023, at 2:30 PM, Ajay Kaher akaher@vmware.com wrote:
Events Tracing infrastructure contains lot of files, directories (internally in terms of inodes, dentries). And ends up by consuming memory in MBs. We can have multiple events of Events Tracing, which further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and skip the creation of inodes/dentries. As and when require, eventfs will create the inodes/dentries only for required files/directories. Also eventfs would delete the inodes/dentries once no more requires but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB for ~10K files/dir.
Steve, I have used a nested rw-semaphore for eventfs locking (the same approach as in cifs). According to Amit Nadav, this needs to be revisited and reviewed. Please have a look and share your thoughts.
-Ajay
On 01-Jun-2023, at 2:30 PM, Ajay Kaher akaher@vmware.com wrote:
Events Tracing infrastructure contains lot of files, directories (internally in terms of inodes, dentries). And ends up by consuming memory in MBs. We can have multiple events of Events Tracing, which further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and skip the creation of inodes/dentries. As and when require, eventfs will create the inodes/dentries only for required files/directories. Also eventfs would delete the inodes/dentries once no more requires but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB for ~10K files/dir.
Hi Steve, below are the ftracetest results with v3 of eventfs:
root@photon-6 [ ~/linux-6.3-rc5/tools/testing/selftests/ftrace ]# ./ftracetest
=== Ftrace unit tests ===
[1] Basic trace file check	[PASS]
[2] Basic test for tracers	[PASS]
[3] Basic trace clock test	[PASS]
[4] Basic event tracing check	[PASS]
[5] Change the ringbuffer size	[PASS]
[6] Snapshot and tracing setting	[UNSUPPORTED]
[7] trace_pipe and trace_marker	[PASS]
[8] Test ftrace direct functions against tracers	[UNRESOLVED]
[9] Test ftrace direct functions against kprobes	[UNRESOLVED]
[10] Generic dynamic event - add/remove eprobe events	[PASS]
[11] Generic dynamic event - add/remove kprobe events	[PASS]
[12] Generic dynamic event - add/remove synthetic events	[UNSUPPORTED]
[13] Generic dynamic event - selective clear (compatibility)	[UNSUPPORTED]
[14] Event probe event parser error log check	[PASS]
[15] Generic dynamic event - generic clear event	[UNSUPPORTED]
[16] Generic dynamic event - check if duplicate events are caught	[PASS]
[17] event tracing - enable/disable with event level files	[PASS]
[18] event tracing - restricts events based on pid notrace filtering	[PASS]
[19] event tracing - restricts events based on pid	[PASS]
[20] event tracing - enable/disable with subsystem level files	[PASS]
[21] event tracing - enable/disable with top level files	[PASS]
[22] Test trace_printk from module	[UNRESOLVED]
[23] event filter function - test event filtering on functions	[FAIL]
[24] ftrace - function graph filters with stack tracer	[UNSUPPORTED]
[25] ftrace - function graph filters	[PASS]
[26] ftrace - function trace with cpumask	[PASS]
[27] ftrace - test for function event triggers	[PASS]
[28] ftrace - function glob filters	[PASS]
[29] ftrace - function pid notrace filters	[PASS]
[30] ftrace - function pid filters	[PASS]
[31] ftrace - stacktrace filter command	[PASS]
[32] ftrace - function trace on module	[UNRESOLVED]
[33] ftrace - function profiler with function tracing	[UNSUPPORTED]
[34] ftrace - function profiling	[UNSUPPORTED]
[35] ftrace - test reading of set_ftrace_filter	[PASS]
[36] ftrace - Max stack tracer	[UNSUPPORTED]
[37] ftrace - test for function traceon/off triggers	[PASS]
[38] ftrace - test tracing error log support	[PASS]
[39] Test creation and deletion of trace instances while setting an event	[PASS]
[40] Test creation and deletion of trace instances	[PASS]
[41] Kprobe dynamic event - adding and removing	[PASS]
[42] Kprobe dynamic event - busy event check	[PASS]
[43] Kprobe event char type argument	[PASS]
[44] Kprobe event with comm arguments	[PASS]
[45] Kprobe event string type argument	[PASS]
[46] Kprobe event symbol argument	[PASS]
[47] Kprobe event argument syntax	[PASS]
[48] Kprobe dynamic event with arguments	[PASS]
[49] Kprobes event arguments with types	[PASS]
[50] Kprobe event user-memory access	[PASS]
[51] Kprobe event auto/manual naming	[PASS]
[52] Kprobe dynamic event with function tracer	[UNSUPPORTED]
[53] Kprobe dynamic event - probing module	[UNRESOLVED]
[54] Create/delete multiprobe on kprobe event	[PASS]
[55] Kprobe event parser error log check	[PASS]
[56] Kretprobe dynamic event with arguments	[PASS]
[57] Kretprobe dynamic event with maxactive	[PASS]
[58] Kretprobe %return suffix test	[PASS]
[59] Register/unregister many kprobe events	[PASS]
[60] Kprobe events - probe points	[FAIL]
[61] Kprobe profile	[PASS]
[62] Uprobe event parser error log check	[PASS]
[63] test for the preemptirqsoff tracer	[UNSUPPORTED]
[64] Meta-selftest: Checkbashisms	[UNRESOLVED]
[65] Test wakeup RT tracer	[UNSUPPORTED]
[66] Test wakeup tracer	[UNSUPPORTED]
[67] event trigger - test inter-event histogram trigger expected fail actions	[UNSUPPORTED]
[68] event trigger - test field variable support	[UNSUPPORTED]
[69] event trigger - test inter-event combined histogram trigger	[UNSUPPORTED]
[70] event trigger - test multiple actions on hist trigger	[UNSUPPORTED]
[71] event trigger - test inter-event histogram trigger onchange action	[UNSUPPORTED]
[72] event trigger - test inter-event histogram trigger onmatch action	[UNSUPPORTED]
[73] event trigger - test inter-event histogram trigger onmatch-onmax action	[UNSUPPORTED]
[74] event trigger - test inter-event histogram trigger onmax action	[UNSUPPORTED]
[75] event trigger - test inter-event histogram trigger snapshot action	[UNSUPPORTED]
[76] event trigger - test inter-event histogram trigger eprobe on synthetic event	[UNSUPPORTED]
[77] event trigger - test synthetic event create remove	[UNSUPPORTED]
[78] event trigger - test inter-event histogram trigger trace action with dynamic string param	[UNSUPPORTED]
[79] event trigger - test inter-event histogram trigger trace action with dynamic string param	[UNSUPPORTED]
[80] event trigger - test synthetic_events syntax parser errors	[UNSUPPORTED]
[81] event trigger - test synthetic_events syntax parser	[UNSUPPORTED]
[82] event trigger - test inter-event histogram trigger trace action	[UNSUPPORTED]
[83] event trigger - test event enable/disable trigger	[PASS]
[84] event trigger - test trigger filter	[PASS]
[85] event trigger - test histogram expression parsing	[UNSUPPORTED]
[86] event trigger - test histogram modifiers	[UNSUPPORTED]
[87] event trigger - test histogram parser errors	[UNSUPPORTED]
[88] event trigger - test histogram trigger	[UNSUPPORTED]
[89] event trigger - test multiple histogram triggers	[UNSUPPORTED]
[90] event trigger - test snapshot-trigger	[UNSUPPORTED]
[91] event trigger - test stacktrace-trigger	[PASS]
[92] trace_marker trigger - test histogram trigger	[UNSUPPORTED]
[93] trace_marker trigger - test snapshot trigger	[UNSUPPORTED]
[94] trace_marker trigger - test histogram with synthetic event against kernel event	[UNSUPPORTED]
[95] trace_marker trigger - test histogram with synthetic event	[UNSUPPORTED]
[96] event trigger - test traceon/off trigger	[PASS]
[97] (instance) Basic test for tracers	[PASS]
[98] (instance) Basic trace clock test	[PASS]
[99] (instance) Change the ringbuffer size	[PASS]
[100] (instance) Snapshot and tracing setting	[UNSUPPORTED]
[101] (instance) trace_pipe and trace_marker	[PASS]
[102] (instance) event tracing - enable/disable with event level files	[PASS]
[103] (instance) event tracing - restricts events based on pid notrace filtering	[PASS]
[104] (instance) event tracing - restricts events based on pid	[PASS]
[105] (instance) event tracing - enable/disable with subsystem level files	[PASS]
[106] (instance) event filter function - test event filtering on functions	[FAIL]
[107] (instance) ftrace - test for function event triggers	[PASS]
[108] (instance) ftrace - function pid notrace filters	[PASS]
[109] (instance) ftrace - function pid filters	[PASS]
[110] (instance) ftrace - stacktrace filter command	[PASS]
[111] (instance) ftrace - test for function traceon/off triggers	[PASS]
[112] (instance) event trigger - test event enable/disable trigger	[PASS]
[113] (instance) event trigger - test trigger filter	[PASS]
[114] (instance) event trigger - test histogram modifiers	[UNSUPPORTED]
[115] (instance) event trigger - test histogram trigger	[UNSUPPORTED]
[116] (instance) event trigger - test multiple histogram triggers	[UNSUPPORTED]
[117] (instance) trace_marker trigger - test histogram trigger	[UNSUPPORTED]
[118] (instance) trace_marker trigger - test snapshot trigger	[UNSUPPORTED]
# of passed: 65
# of failed: 3
# of unresolved: 6
# of untested: 0
# of unsupported: 44
# of xfailed: 0
# of undefined(test bug): 0
These results are same with/without eventfs.
-Ajay
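The "same with/without eventfs" comparison above can be checked mechanically by comparing the summary lines of two saved runs. A minimal sketch, assuming the output of each run was saved to a file; the function names and log file names are hypothetical and are not part of ftracetest:

```shell
#!/bin/sh
# Print only the "# of ..." summary lines from a saved ftracetest output.
summary() {
    grep '^# of ' "$1"
}

# Succeed when two saved runs report identical summary counts.
same_summary() {
    [ "$(summary "$1")" = "$(summary "$2")" ]
}

# Example usage (hypothetical file names):
#   ./ftracetest > without-eventfs.log   # on the unpatched kernel
#   ./ftracetest > with-eventfs.log      # on the patched kernel
#   same_summary without-eventfs.log with-eventfs.log && echo "summaries match"
```

This compares only the totals. Two runs can have matching totals while individual tests differ, so a full diff of the per-test lines is the stricter check.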
On Mon, 19 Jun 2023 05:38:25 +0000 Ajay Kaher akaher@vmware.com wrote:
> # of passed: 65
> # of failed: 3
Unrelated to your patches, but have you checked why these fail? Do you have the latest tests running on the latest kernel?
> # of unresolved: 6
> # of untested: 0
> # of unsupported: 44
> # of xfailed: 0
> # of undefined(test bug): 0
> These results are same with/without eventfs.
I'm hoping to look at these patches this week.
Thanks!
-- Steve
On 20-Jun-2023, at 8:32 PM, Steven Rostedt rostedt@goodmis.org wrote:
> On Mon, 19 Jun 2023 05:38:25 +0000 Ajay Kaher akaher@vmware.com wrote:
>> # of passed: 65
>> # of failed: 3
> Unrelated to your patches, but have you checked why these fail? Do you have the latest tests running on the latest kernel?
The failed tests passed after enabling /proc/kallsyms, using:
echo 0 > /proc/sys/kernel/kptr_restrict
Following is the report of ftracetest on Linux v6.4.0-rc7 (with/without eventfs):
# of passed: 68
# of failed: 0
# of unresolved: 6
# of untested: 0
# of unsupported: 45
# of xfailed: 0
# of undefined(test bug): 0
With lockdep enabled, I get the same warning that 'kernel test robot' reported for v3 09/10: https://lore.kernel.org/all/1686640004-47546-1-git-send-email-akaher@vmware....
>> # of unresolved: 6
>> # of untested: 0
>> # of unsupported: 44
>> # of xfailed: 0
>> # of undefined(test bug): 0
>> These results are same with/without eventfs.
> I'm hoping to look at these patches this week.
Yes, please. Thanks.
-Ajay
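The kptr_restrict workaround above suggests a quick pre-check before running the suite. A minimal sketch, assuming that a restricted /proc/kallsyms shows every symbol address as zeros to the reader; the helper name and its optional file argument are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given kallsyms-style file exposes
# at least one non-zero symbol address. Defaults to /proc/kallsyms.
kallsyms_readable() {
    file="${1:-/proc/kallsyms}"
    [ -r "$file" ] || return 1
    # The first field of each line is the address; any hex digit other
    # than 0 means real addresses are visible.
    awk '$1 ~ /[1-9a-fA-F]/ { found = 1; exit } END { exit !found }' "$file"
}

if kallsyms_readable; then
    echo "kallsyms addresses are visible"
else
    echo "kallsyms addresses are hidden; kptr_restrict=$(cat /proc/sys/kernel/kptr_restrict 2>/dev/null)"
fi
```

Writing 0 to kptr_restrict needs root and relaxes a hardening setting, so restoring the previous value after the test run is a reasonable precaution.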
On Wed, 21 Jun 2023 11:42:24 +0000 Ajay Kaher akaher@vmware.com wrote:
> On 20-Jun-2023, at 8:32 PM, Steven Rostedt rostedt@goodmis.org wrote:
>> On Mon, 19 Jun 2023 05:38:25 +0000 Ajay Kaher akaher@vmware.com wrote:
>>> # of passed: 65
>>> # of failed: 3
>> Unrelated to your patches, but have you checked why these fail? Do you have the latest tests running on the latest kernel?
> The failed tests passed after enabling /proc/kallsyms, using:
> echo 0 > /proc/sys/kernel/kptr_restrict
Oh, interesting. It should be 'unresolved' (skipped) if that happens.
[23] event filter function - test event filtering on functions [FAIL]
[60] Kprobe events - probe points [FAIL]
[106] (instance) event filter function - test event filtering on functions [FAIL]
OK, let me see.
Thanks for reporting!
> Following is the report of ftracetest on Linux v6.4.0-rc7 (with/without eventfs):
> # of passed: 68
> # of failed: 0
> # of unresolved: 6
> # of untested: 0
> # of unsupported: 45
> # of xfailed: 0
> # of undefined(test bug): 0
> With lockdep enabled, I get the same warning that 'kernel test robot' reported for v3 09/10: https://lore.kernel.org/all/1686640004-47546-1-git-send-email-akaher@vmware....
>>> # of unresolved: 6
>>> # of untested: 0
>>> # of unsupported: 44
>>> # of xfailed: 0
>>> # of undefined(test bug): 0
>>> These results are same with/without eventfs.
>> I'm hoping to look at these patches this week.
> Yes, please. Thanks.
> -Ajay