The patch titled
Subject: nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-hang-in-nilfs_lookup_dirty_data_buffers.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
Date: Wed, 31 Jan 2024 23:56:57 +0900
Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.
While migrate_pages_batch() locks a folio and waits for the writeback to
complete, the log writer thread that should bring the writeback to
completion picks up the folio being written back in
nilfs_lookup_dirty_data_buffers() that it calls for subsequent log
creation and was trying to lock the folio. Thus causing a deadlock.
In the first place, it is unexpected that folios/pages in the middle of
writeback will be updated and become dirty. Nilfs2 adds a checksum to
verify the validity of the log being written and uses it for recovery at
mount, so data changes during writeback are suppressed. Since this is
broken, an unclean shutdown could potentially cause recovery to fail.
Investigation revealed that the root cause is that the wait for writeback
completion in nilfs_page_mkwrite() is conditional, and if the backing
device does not require stable writes, data may be modified without
waiting.
Fix these issues by making nilfs_page_mkwrite() wait for writeback to
finish regardless of the stable write requirement of the backing device.
Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com
Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+ee2ae68da3b22d04cd8d(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/file.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/file.c~nilfs2-fix-hang-in-nilfs_lookup_dirty_data_buffers
+++ a/fs/nilfs2/file.c
@@ -107,7 +107,13 @@ static vm_fault_t nilfs_page_mkwrite(str
nilfs_transaction_commit(inode->i_sb);
mapped:
- folio_wait_stable(folio);
+ /*
+ * Since checksumming including data blocks is performed to determine
+ * the validity of the log to be written and used for recovery, it is
+ * necessary to wait for writeback to finish here, regardless of the
+ * stable write requirement of the backing device.
+ */
+ folio_wait_writeback(folio);
out:
sb_end_pagefault(inode->i_sb);
return vmf_fs_error(ret);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-data-corruption-in-dsync-block-recovery-for-small-block-sizes.patch
nilfs2-fix-hang-in-nilfs_lookup_dirty_data_buffers.patch
nilfs2-convert-segment-buffer-to-use-kmap_local.patch
nilfs2-convert-nilfs_copy_buffer-to-use-kmap_local.patch
nilfs2-convert-metadata-file-common-code-to-use-kmap_local.patch
nilfs2-convert-sufile-to-use-kmap_local.patch
nilfs2-convert-persistent-object-allocator-to-use-kmap_local.patch
nilfs2-convert-dat-to-use-kmap_local.patch
nilfs2-move-nilfs_bmap_write-call-out-of-nilfs_write_inode_common.patch
nilfs2-do-not-acquire-rwsem-in-nilfs_bmap_write.patch
nilfs2-convert-ifile-to-use-kmap_local.patch
nilfs2-localize-highmem-mapping-for-checkpoint-creation-within-cpfile.patch
nilfs2-localize-highmem-mapping-for-checkpoint-finalization-within-cpfile.patch
nilfs2-localize-highmem-mapping-for-checkpoint-reading-within-cpfile.patch
nilfs2-remove-nilfs_cpfile_getput_checkpoint.patch
nilfs2-convert-cpfile-to-use-kmap_local.patch
Hi Dmitry,
There have been multiple reports that the keyboard on
Dell XPS 13 9350 / 9360 / 9370 models has stopped working after
a suspend/resume after the merging of commit 936e4d49ecbc ("Input:
atkbd - skip ATKBD_CMD_GETID in translated mode").
See the 4 closes tags in the first patch for 4 reports of this.
I have been working with the first reporter on resolving this
and testing on his Dell XPS 13 9360 confirms that these patches
fix things.
Unfortunately the commit causing the issue has also been picked
up by multiple stable kernel series now. Can you please send
these fixes to Linus ASAP, so that they can also be backported
to the stable series ASAP ?
Alternatively we could revert the commit causing this, but that
commit is know to fix issues on a whole bunch of other laptops
so I would rather not revert it.
Regards,
Hans
Hans de Goede (2):
Input: atkbd - Skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
Input: atkbd - Do not skip atkbd_deactivate() when skipping
ATKBD_CMD_GETID
drivers/input/keyboard/atkbd.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
--
2.43.0
From: Dan Carpenter <dan.carpenter(a)linaro.org>
commit c301f0981fdd3fd1ffac6836b423c4d7a8e0eb63 upstream.
The problem is in nft_byteorder_eval() where we are iterating through a
loop and writing to dst[0], dst[1], dst[2] and so on... On each
iteration we are writing 8 bytes. But dst[] is an array of u32 so each
element only has space for 4 bytes. That means that every iteration
overwrites part of the previous element.
I spotted this bug while reviewing commit caf3ef7468f7 ("netfilter:
nf_tables: prevent OOB access in nft_byteorder_eval") which is a related
issue. I think that the reason we have not detected this bug in testing
is that most of time we only write one element.
Fixes: ce1e7989d989 ("netfilter: nft_byteorder: provide 64bit le/be conversion")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Ajay: Modified to apply on v5.10.y]
Signed-off-by: Ajay Kaher <ajay.kaher(a)broadcom.com>
---
include/net/netfilter/nf_tables.h | 4 ++--
net/netfilter/nft_byteorder.c | 5 +++--
net/netfilter/nft_meta.c | 2 +-
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 2237657..2da11d8 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -142,9 +142,9 @@ static inline u16 nft_reg_load16(const u32 *sreg)
return *(u16 *)sreg;
}
-static inline void nft_reg_store64(u32 *dreg, u64 val)
+static inline void nft_reg_store64(u64 *dreg, u64 val)
{
- put_unaligned(val, (u64 *)dreg);
+ put_unaligned(val, dreg);
}
static inline u64 nft_reg_load64(const u32 *sreg)
diff --git a/net/netfilter/nft_byteorder.c b/net/netfilter/nft_byteorder.c
index 7b0b8fe..9d250bd 100644
--- a/net/netfilter/nft_byteorder.c
+++ b/net/netfilter/nft_byteorder.c
@@ -38,20 +38,21 @@ void nft_byteorder_eval(const struct nft_expr *expr,
switch (priv->size) {
case 8: {
+ u64 *dst64 = (void *)dst;
u64 src64;
switch (priv->op) {
case NFT_BYTEORDER_NTOH:
for (i = 0; i < priv->len / 8; i++) {
src64 = nft_reg_load64(&src[i]);
- nft_reg_store64(&dst[i], be64_to_cpu(src64));
+ nft_reg_store64(&dst64[i], be64_to_cpu(src64));
}
break;
case NFT_BYTEORDER_HTON:
for (i = 0; i < priv->len / 8; i++) {
src64 = (__force __u64)
cpu_to_be64(nft_reg_load64(&src[i]));
- nft_reg_store64(&dst[i], src64);
+ nft_reg_store64(&dst64[i], src64);
}
break;
}
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 44d9b38..cb5bb0e 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -63,7 +63,7 @@ nft_meta_get_eval_time(enum nft_meta_keys key,
{
switch (key) {
case NFT_META_TIME_NS:
- nft_reg_store64(dest, ktime_get_real_ns());
+ nft_reg_store64((u64 *)dest, ktime_get_real_ns());
break;
case NFT_META_TIME_DAY:
nft_reg_store8(dest, nft_meta_weekday());
--
2.7.4
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The directory link count in eventfs was somewhat bogus. It was only being
updated when a directory child was being looked up and not on creation.
One solution would be to update in get_attr() the link count by iterating
the ei->children list and then adding 2. But that could slow down simple
stat() calls, especially if it's done on all directories in eventfs.
Another solution would be to add a parent pointer to the eventfs_inode
and keep track of the number of sub directories it has on creation. But
this adds overhead for something not really worthwhile.
The solution decided upon is to keep all directory links in eventfs as 1.
This tells user space not to rely on the hard links of directories. Which
in this case it shouldn't.
Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.339968298@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher(a)broadcom.com>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 9e031e5a2713..110e8a272189 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -404,9 +404,7 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry,
dentry->d_fsdata = get_ei(ei);
- inc_nlink(inode);
d_add(dentry, inode);
- inc_nlink(dentry->d_parent->d_inode);
return NULL;
}
@@ -769,9 +767,17 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
dentry->d_fsdata = get_ei(ei);
- /* directory inodes start off with i_nlink == 2 (for "." entry) */
- inc_nlink(inode);
+ /*
+ * Keep all eventfs directories with i_nlink == 1.
+ * Due to the dynamic nature of the dentry creations and not
+ * wanting to add a pointer to the parent eventfs_inode in the
+ * eventfs_inode structure, keeping the i_nlink in sync with the
+ * number of directories would cause too much complexity for
+ * something not worth much. Keeping directory links at 1
+ * tells userspace not to trust the link number.
+ */
d_instantiate(dentry, inode);
+ /* The dentry of the "events" parent does keep track though */
inc_nlink(dentry->d_parent->d_inode);
fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
tracefs_end_creating(dentry);
--
2.43.0
From: Linus Torvalds <torvalds(a)linux-foundation.org>
The eventfs inode had pointers to dentries (and child dentries) without
actually holding a refcount on said pointer. That is fundamentally
broken, and while eventfs tried to then maintain coherence with dentries
going away by hooking into the '.d_iput' callback, that doesn't actually
work since it's not ordered wrt lookups.
There were two reasonms why eventfs tried to keep a pointer to a dentry:
- the creation of a 'events' directory would actually have a stable
dentry pointer that it created with tracefs_start_creating().
And it needed that dentry when tearing it all down again in
eventfs_remove_events_dir().
This use is actually ok, because the special top-level events
directory dentries are actually stable, not just a temporary cache of
the eventfs data structures.
- the 'eventfs_inode' (aka ei) needs to stay around as long as there
are dentries that refer to it.
It then used these dentry pointers as a replacement for doing
reference counting: it would try to make sure that there was only
ever one dentry associated with an event_inode, and keep a child
dentry array around to see which dentries might still refer to the
parent ei.
This gets rid of the invalid dentry pointer use, and renames the one
valid case to a different name to make it clear that it's not just any
random dentry.
The magic child dentry array that is kind of a "reverse reference list"
is simply replaced by having child dentries take a ref to the ei. As
does the directory dentries. That makes the broken use case go away.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.san…
Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher(a)broadcom.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 248 ++++++++++++---------------------------
fs/tracefs/internal.h | 7 +-
2 files changed, 78 insertions(+), 177 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index b2285d5f3fed..515fdace1eea 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -62,6 +62,35 @@ enum {
#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1)
+/*
+ * eventfs_inode reference count management.
+ *
+ * NOTE! We count only references from dentries, in the
+ * form 'dentry->d_fsdata'. There are also references from
+ * directory inodes ('ti->private'), but the dentry reference
+ * count is always a superset of the inode reference count.
+ */
+static void release_ei(struct kref *ref)
+{
+ struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
+ kfree(ei->entry_attrs);
+ kfree_const(ei->name);
+ kfree_rcu(ei, rcu);
+}
+
+static inline void put_ei(struct eventfs_inode *ei)
+{
+ if (ei)
+ kref_put(&ei->kref, release_ei);
+}
+
+static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei)
+{
+ if (ei)
+ kref_get(&ei->kref);
+ return ei;
+}
+
static struct dentry *eventfs_root_lookup(struct inode *dir,
struct dentry *dentry,
unsigned int flags);
@@ -289,7 +318,8 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode,
* directory. The inode.i_private pointer will point to @data in the open()
* call.
*/
-static struct dentry *lookup_file(struct dentry *dentry,
+static struct dentry *lookup_file(struct eventfs_inode *parent_ei,
+ struct dentry *dentry,
umode_t mode,
struct eventfs_attr *attr,
void *data,
@@ -302,7 +332,7 @@ static struct dentry *lookup_file(struct dentry *dentry,
mode |= S_IFREG;
if (WARN_ON_ONCE(!S_ISREG(mode)))
- return NULL;
+ return ERR_PTR(-EIO);
inode = tracefs_get_inode(dentry->d_sb);
if (unlikely(!inode))
@@ -321,9 +351,12 @@ static struct dentry *lookup_file(struct dentry *dentry,
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
+ // Files have their parent's ei as their fsdata
+ dentry->d_fsdata = get_ei(parent_ei);
+
d_add(dentry, inode);
fsnotify_create(dentry->d_parent->d_inode, dentry);
- return dentry;
+ return NULL;
};
/**
@@ -359,22 +392,29 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry,
/* Only directories have ti->private set to an ei, not files */
ti->private = ei;
- dentry->d_fsdata = ei;
- ei->dentry = dentry; // Remove me!
+ dentry->d_fsdata = get_ei(ei);
inc_nlink(inode);
d_add(dentry, inode);
inc_nlink(dentry->d_parent->d_inode);
fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
- return dentry;
+ return NULL;
}
-static void free_ei(struct eventfs_inode *ei)
+static inline struct eventfs_inode *alloc_ei(const char *name)
{
- kfree_const(ei->name);
- kfree(ei->d_children);
- kfree(ei->entry_attrs);
- kfree(ei);
+ struct eventfs_inode *ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+
+ if (!ei)
+ return NULL;
+
+ ei->name = kstrdup_const(name, GFP_KERNEL);
+ if (!ei->name) {
+ kfree(ei);
+ return NULL;
+ }
+ kref_init(&ei->kref);
+ return ei;
}
/**
@@ -385,39 +425,13 @@ static void free_ei(struct eventfs_inode *ei)
*/
void eventfs_d_release(struct dentry *dentry)
{
- struct eventfs_inode *ei;
- int i;
-
- mutex_lock(&eventfs_mutex);
-
- ei = dentry->d_fsdata;
- if (!ei)
- goto out;
-
- /* This could belong to one of the files of the ei */
- if (ei->dentry != dentry) {
- for (i = 0; i < ei->nr_entries; i++) {
- if (ei->d_children[i] == dentry)
- break;
- }
- if (WARN_ON_ONCE(i == ei->nr_entries))
- goto out;
- ei->d_children[i] = NULL;
- } else if (ei->is_freed) {
- free_ei(ei);
- } else {
- ei->dentry = NULL;
- }
-
- dentry->d_fsdata = NULL;
- out:
- mutex_unlock(&eventfs_mutex);
+ put_ei(dentry->d_fsdata);
}
/**
* lookup_file_dentry - create a dentry for a file of an eventfs_inode
* @ei: the eventfs_inode that the file will be created under
- * @idx: the index into the d_children[] of the @ei
+ * @idx: the index into the entry_attrs[] of the @ei
* @parent: The parent dentry of the created file.
* @name: The name of the file to create
* @mode: The mode of the file.
@@ -434,17 +448,11 @@ lookup_file_dentry(struct dentry *dentry,
const struct file_operations *fops)
{
struct eventfs_attr *attr = NULL;
- struct dentry **e_dentry = &ei->d_children[idx];
if (ei->entry_attrs)
attr = &ei->entry_attrs[idx];
- dentry->d_fsdata = ei; // NOTE: ei of _parent_
- lookup_file(dentry, mode, attr, data, fops);
-
- *e_dentry = dentry; // Remove me
-
- return dentry;
+ return lookup_file(ei, dentry, mode, attr, data, fops);
}
/**
@@ -465,6 +473,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
struct tracefs_inode *ti;
struct eventfs_inode *ei;
const char *name = dentry->d_name.name;
+ struct dentry *result = NULL;
ti = get_tracefs(dir);
if (!(ti->flags & TRACEFS_EVENT_INODE))
@@ -481,7 +490,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
continue;
if (ei_child->is_freed)
goto out;
- lookup_dir_entry(dentry, ei, ei_child);
+ result = lookup_dir_entry(dentry, ei, ei_child);
goto out;
}
@@ -498,12 +507,12 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
if (entry->callback(name, &mode, &data, &fops) <= 0)
goto out;
- lookup_file_dentry(dentry, ei, i, mode, data, fops);
+ result = lookup_file_dentry(dentry, ei, i, mode, data, fops);
goto out;
}
out:
mutex_unlock(&eventfs_mutex);
- return NULL;
+ return result;
}
/*
@@ -653,25 +662,10 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
if (!parent)
return ERR_PTR(-EINVAL);
- ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+ ei = alloc_ei(name);
if (!ei)
return ERR_PTR(-ENOMEM);
- ei->name = kstrdup_const(name, GFP_KERNEL);
- if (!ei->name) {
- kfree(ei);
- return ERR_PTR(-ENOMEM);
- }
-
- if (size) {
- ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL);
- if (!ei->d_children) {
- kfree_const(ei->name);
- kfree(ei);
- return ERR_PTR(-ENOMEM);
- }
- }
-
ei->entries = entries;
ei->nr_entries = size;
ei->data = data;
@@ -685,7 +679,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
/* Was the parent freed? */
if (list_empty(&ei->list)) {
- free_ei(ei);
+ put_ei(ei);
ei = NULL;
}
return ei;
@@ -720,28 +714,20 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
if (IS_ERR(dentry))
return ERR_CAST(dentry);
- ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+ ei = alloc_ei(name);
if (!ei)
- goto fail_ei;
+ goto fail;
inode = tracefs_get_inode(dentry->d_sb);
if (unlikely(!inode))
goto fail;
- if (size) {
- ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL);
- if (!ei->d_children)
- goto fail;
- }
-
- ei->dentry = dentry;
+ // Note: we have a ref to the dentry from tracefs_start_creating()
+ ei->events_dir = dentry;
ei->entries = entries;
ei->nr_entries = size;
ei->is_events = 1;
ei->data = data;
- ei->name = kstrdup_const(name, GFP_KERNEL);
- if (!ei->name)
- goto fail;
/* Save the ownership of this directory */
uid = d_inode(dentry->d_parent)->i_uid;
@@ -772,7 +758,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
inode->i_op = &eventfs_root_dir_inode_operations;
inode->i_fop = &eventfs_file_operations;
- dentry->d_fsdata = ei;
+ dentry->d_fsdata = get_ei(ei);
/* directory inodes start off with i_nlink == 2 (for "." entry) */
inc_nlink(inode);
@@ -784,72 +770,11 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
return ei;
fail:
- kfree(ei->d_children);
- kfree(ei);
- fail_ei:
+ put_ei(ei);
tracefs_failed_creating(dentry);
return ERR_PTR(-ENOMEM);
}
-static LLIST_HEAD(free_list);
-
-static void eventfs_workfn(struct work_struct *work)
-{
- struct eventfs_inode *ei, *tmp;
- struct llist_node *llnode;
-
- llnode = llist_del_all(&free_list);
- llist_for_each_entry_safe(ei, tmp, llnode, llist) {
- /* This dput() matches the dget() from unhook_dentry() */
- for (int i = 0; i < ei->nr_entries; i++) {
- if (ei->d_children[i])
- dput(ei->d_children[i]);
- }
- /* This should only get here if it had a dentry */
- if (!WARN_ON_ONCE(!ei->dentry))
- dput(ei->dentry);
- }
-}
-
-static DECLARE_WORK(eventfs_work, eventfs_workfn);
-
-static void free_rcu_ei(struct rcu_head *head)
-{
- struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu);
-
- if (ei->dentry) {
- /* Do not free the ei until all references of dentry are gone */
- if (llist_add(&ei->llist, &free_list))
- queue_work(system_unbound_wq, &eventfs_work);
- return;
- }
-
- /* If the ei doesn't have a dentry, neither should its children */
- for (int i = 0; i < ei->nr_entries; i++) {
- WARN_ON_ONCE(ei->d_children[i]);
- }
-
- free_ei(ei);
-}
-
-static void unhook_dentry(struct dentry *dentry)
-{
- if (!dentry)
- return;
- /*
- * Need to add a reference to the dentry that is expected by
- * simple_recursive_removal(), which will include a dput().
- */
- dget(dentry);
-
- /*
- * Also add a reference for the dput() in eventfs_workfn().
- * That is required as that dput() will free the ei after
- * the SRCU grace period is over.
- */
- dget(dentry);
-}
-
/**
* eventfs_remove_rec - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
@@ -862,8 +787,6 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
{
struct eventfs_inode *ei_child;
- if (!ei)
- return;
/*
* Check recursion depth. It should never be greater than 3:
* 0 - events/
@@ -875,28 +798,12 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
return;
/* search for nested folders or files */
- list_for_each_entry_srcu(ei_child, &ei->children, list,
- lockdep_is_held(&eventfs_mutex)) {
- /* Children only have dentry if parent does */
- WARN_ON_ONCE(ei_child->dentry && !ei->dentry);
+ list_for_each_entry(ei_child, &ei->children, list)
eventfs_remove_rec(ei_child, level + 1);
- }
-
ei->is_freed = 1;
-
- for (int i = 0; i < ei->nr_entries; i++) {
- if (ei->d_children[i]) {
- /* Children only have dentry if parent does */
- WARN_ON_ONCE(!ei->dentry);
- unhook_dentry(ei->d_children[i]);
- }
- }
-
- unhook_dentry(ei->dentry);
-
- list_del_rcu(&ei->list);
- call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
+ list_del(&ei->list);
+ put_ei(ei);
}
/**
@@ -907,22 +814,12 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
*/
void eventfs_remove_dir(struct eventfs_inode *ei)
{
- struct dentry *dentry;
-
if (!ei)
return;
mutex_lock(&eventfs_mutex);
- dentry = ei->dentry;
eventfs_remove_rec(ei, 0);
mutex_unlock(&eventfs_mutex);
-
- /*
- * If any of the ei children has a dentry, then the ei itself
- * must have a dentry.
- */
- if (dentry)
- simple_recursive_removal(dentry, NULL);
}
/**
@@ -935,7 +832,11 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
{
struct dentry *dentry;
- dentry = ei->dentry;
+ dentry = ei->events_dir;
+ if (!dentry)
+ return;
+
+ ei->events_dir = NULL;
eventfs_remove_dir(ei);
/*
@@ -945,5 +846,6 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
* sticks around while the other ei->dentry are created
* and destroyed dynamically.
*/
+ d_invalidate(dentry);
dput(dentry);
}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 4b50a0668055..1886f1826cd8 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -35,8 +35,7 @@ struct eventfs_attr {
* @entries: the array of entries representing the files in the directory
* @name: the name of the directory to create
* @children: link list into the child eventfs_inode
- * @dentry: the dentry of the directory
- * @d_children: The array of dentries to represent the files when created
+ * @events_dir: the dentry of the events directory
* @entry_attrs: Saved mode and ownership of the @d_children
* @attr: Saved mode and ownership of eventfs_inode itself
* @data: The private data to pass to the callbacks
@@ -45,12 +44,12 @@ struct eventfs_attr {
* @nr_entries: The number of items in @entries
*/
struct eventfs_inode {
+ struct kref kref;
struct list_head list;
const struct eventfs_entry *entries;
const char *name;
struct list_head children;
- struct dentry *dentry; /* Check is_freed to access */
- struct dentry **d_children;
+ struct dentry *events_dir;
struct eventfs_attr *entry_attrs;
struct eventfs_attr attr;
void *data;
--
2.43.0
From: Linus Torvalds <torvalds(a)linux-foundation.org>
In order for the dentries to stay up-to-date with the eventfs changes,
just add a 'd_revalidate' function that checks the 'is_freed' bit.
Also, clean up the dentry release to actually use d_release() rather
than the slightly odd d_iput() function. We don't care about the inode,
all we want to do is to get rid of the refcount to the eventfs data
added by dentry->d_fsdata.
It would probably be cleaner to make eventfs its own filesystem, or at
least set its own dentry ops when looking up eventfs files. But as it
is, only eventfs dentries use d_fsdata, so we don't really need to split
these things up by use.
Another thing that might be worth doing is to make all eventfs lookups
mark their dentries as not worth caching. We could do that with
d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.
As it is, the dentries are all freeable, but they only tend to get freed
at memory pressure rather than more proactively. But that's a separate
issue.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.san…
Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.124644253@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher(a)broadcom.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 5 ++---
fs/tracefs/inode.c | 27 ++++++++++++++++++---------
fs/tracefs/internal.h | 3 ++-
3 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 16ca8d9759b1..b2285d5f3fed 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -378,13 +378,12 @@ static void free_ei(struct eventfs_inode *ei)
}
/**
- * eventfs_set_ei_status_free - remove the dentry reference from an eventfs_inode
- * @ti: the tracefs_inode of the dentry
+ * eventfs_d_release - dentry is going away
* @dentry: dentry which has the reference to remove.
*
* Remove the association between a dentry from an eventfs_inode.
*/
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
+void eventfs_d_release(struct dentry *dentry)
{
struct eventfs_inode *ei;
int i;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 5c84460feeeb..d65ffad4c327 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -377,21 +377,30 @@ static const struct super_operations tracefs_super_operations = {
.show_options = tracefs_show_options,
};
-static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+/*
+ * It would be cleaner if eventfs had its own dentry ops.
+ *
+ * Note that d_revalidate is called potentially under RCU,
+ * so it can't take the eventfs mutex etc. It's fine - if
+ * we open a file just as it's marked dead, things will
+ * still work just fine, and just see the old stale case.
+ */
+static void tracefs_d_release(struct dentry *dentry)
{
- struct tracefs_inode *ti;
+ if (dentry->d_fsdata)
+ eventfs_d_release(dentry);
+}
- if (!dentry || !inode)
- return;
+static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+{
+ struct eventfs_inode *ei = dentry->d_fsdata;
- ti = get_tracefs(inode);
- if (ti && ti->flags & TRACEFS_EVENT_INODE)
- eventfs_set_ei_status_free(ti, dentry);
- iput(inode);
+ return !(ei && ei->is_freed);
}
static const struct dentry_operations tracefs_dentry_operations = {
- .d_iput = tracefs_dentry_iput,
+ .d_revalidate = tracefs_d_revalidate,
+ .d_release = tracefs_d_release,
};
static int trace_fill_super(struct super_block *sb, void *data, int silent)
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 932733a2696a..4b50a0668055 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -78,6 +78,7 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
struct dentry *tracefs_end_creating(struct dentry *dentry);
struct dentry *tracefs_failed_creating(struct dentry *dentry);
struct inode *tracefs_get_inode(struct super_block *sb);
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
+
+void eventfs_d_release(struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
--
2.43.0