The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 46ba0e49b64232adac35a2bc892f1710c5b0fb7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061330-custody-resolved-131a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
46ba0e49b642 ("bpf: fix multi-uprobe PID filtering logic")
f17d1a18a3dd ("selftests/bpf: Add more uprobe multi fail tests")
0d83786f5661 ("selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46ba0e49b64232adac35a2bc892f1710c5b0fb7f Mon Sep 17 00:00:00 2001
From: Andrii Nakryiko <andrii(a)kernel.org>
Date: Tue, 21 May 2024 09:33:57 -0700
Subject: [PATCH] bpf: fix multi-uprobe PID filtering logic
Current implementation of PID filtering logic for multi-uprobes in
uprobe_prog_run() is filtering down to exact *thread*, while the intent
for PID filtering it to filter by *process* instead. The check in
uprobe_prog_run() also differs from the analogous one in
uprobe_multi_link_filter() for some reason. The latter is correct,
checking task->mm, not the task itself.
Fix the check in uprobe_prog_run() to perform the same task->mm check.
While doing this, we also update get_pid_task() use to use PIDTYPE_TGID
type of lookup, given the intent is to get a representative task of an
entire process. This doesn't change behavior, but seems more logical. It
would hold task group leader task now, not any random thread task.
Last but not least, given multi-uprobe support is half-broken due to
this PID filtering logic (depending on whether PID filtering is
important or not), we need to make it easy for user space consumers
(including libbpf) to easily detect whether PID filtering logic was
already fixed.
We do it here by adding an early check on passed pid parameter. If it's
negative (and so has no chance of being a valid PID), we return -EINVAL.
Previous behavior would eventually return -ESRCH ("No process found"),
given there can't be any process with negative PID. This subtle change
won't make any practical change in behavior, but will allow applications
to detect PID filtering fixes easily. Libbpf fixes take advantage of
this in the next patch.
Cc: stable(a)vger.kernel.org
Acked-by: Jiri Olsa <jolsa(a)kernel.org>
Fixes: b733eeade420 ("bpf: Add pid filter support for uprobe_multi link")
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f5154c051d2c..1baaeb9ca205 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3295,7 +3295,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe,
struct bpf_run_ctx *old_run_ctx;
int err = 0;
- if (link->task && current != link->task)
+ if (link->task && current->mm != link->task->mm)
return 0;
if (sleepable)
@@ -3396,8 +3396,9 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
upath = u64_to_user_ptr(attr->link_create.uprobe_multi.path);
uoffsets = u64_to_user_ptr(attr->link_create.uprobe_multi.offsets);
cnt = attr->link_create.uprobe_multi.cnt;
+ pid = attr->link_create.uprobe_multi.pid;
- if (!upath || !uoffsets || !cnt)
+ if (!upath || !uoffsets || !cnt || pid < 0)
return -EINVAL;
if (cnt > MAX_UPROBE_MULTI_CNT)
return -E2BIG;
@@ -3421,10 +3422,9 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
goto error_path_put;
}
- pid = attr->link_create.uprobe_multi.pid;
if (pid) {
rcu_read_lock();
- task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
+ task = get_pid_task(find_vpid(pid), PIDTYPE_TGID);
rcu_read_unlock();
if (!task) {
err = -ESRCH;
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
index 8269cdee33ae..38fda42fd70f 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
@@ -397,7 +397,7 @@ static void test_attach_api_fails(void)
link_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_UPROBE_MULTI, &opts);
if (!ASSERT_ERR(link_fd, "link_fd"))
goto cleanup;
- ASSERT_EQ(link_fd, -ESRCH, "pid_is_wrong");
+ ASSERT_EQ(link_fd, -EINVAL, "pid_is_wrong");
cleanup:
if (link_fd >= 0)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x fb33eb2ef0d88e75564983ef057b44c5b7e4fded
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061357-december-gaming-f1a1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
fb33eb2ef0d8 ("btrfs: fix leak of qgroup extent records after transaction abort")
99f09ce309b8 ("btrfs: make btrfs_destroy_delayed_refs() return void")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb33eb2ef0d88e75564983ef057b44c5b7e4fded Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 3 Jun 2024 12:49:08 +0100
Subject: [PATCH] btrfs: fix leak of qgroup extent records after transaction
abort
Qgroup extent records are created when delayed ref heads are created and
then released after accounting extents at btrfs_qgroup_account_extents(),
called during the transaction commit path.
If a transaction is aborted we free the qgroup records by calling
btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(),
unless we don't have delayed references. We are incorrectly assuming
that no delayed references means we don't have qgroup extents records.
We can currently have no delayed references because we ran them all
during a transaction commit and the transaction was aborted after that
due to some error in the commit path.
So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
btrfs_destroy_delayed_refs() even if we don't have any delayed references.
Reported-by: syzbot+0fecc032fa134afd49df(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
Fixes: 81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort")
CC: stable(a)vger.kernel.org # 6.1+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a91a8056758a..242ada7e47b4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4538,18 +4538,10 @@ static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
struct btrfs_fs_info *fs_info)
{
struct rb_node *node;
- struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs;
struct btrfs_delayed_ref_node *ref;
- delayed_refs = &trans->delayed_refs;
-
spin_lock(&delayed_refs->lock);
- if (atomic_read(&delayed_refs->num_entries) == 0) {
- spin_unlock(&delayed_refs->lock);
- btrfs_debug(fs_info, "delayed_refs has NO entry");
- return;
- }
-
while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
struct btrfs_delayed_ref_head *head;
struct rb_node *n;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1fa7603d569b9e738e9581937ba8725cd7d39b48
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061307-gab-underfoot-f7db@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
1fa7603d569b ("btrfs: qgroup: update rescan message levels and error codes")
182940f4f4db ("btrfs: qgroup: add new quota mode for simple quotas")
6b0cd63bc75c ("btrfs: qgroup: introduce quota mode")
515020900d44 ("btrfs: read raid stripe tree from disk")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1fa7603d569b9e738e9581937ba8725cd7d39b48 Mon Sep 17 00:00:00 2001
From: David Sterba <dsterba(a)suse.com>
Date: Thu, 2 May 2024 22:45:58 +0200
Subject: [PATCH] btrfs: qgroup: update rescan message levels and error codes
On filesystems without enabled quotas there's still a warning message in
the logs when rescan is called. In that case it's not a problem that
should be reported, rescan can be called unconditionally. Change the
error code to ENOTCONN which is used for 'quotas not enabled' elsewhere.
Remove message (also a warning) when rescan is called during an ongoing
rescan, this brings no useful information and the error code is
sufficient.
Change message levels to debug for now, they can be removed eventually.
CC: stable(a)vger.kernel.org # 6.6+
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index eb28141d5c37..f93354a96909 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3820,14 +3820,14 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
/* we're resuming qgroup rescan at mount time */
if (!(fs_info->qgroup_flags &
BTRFS_QGROUP_STATUS_FLAG_RESCAN)) {
- btrfs_warn(fs_info,
+ btrfs_debug(fs_info,
"qgroup rescan init failed, qgroup rescan is not queued");
ret = -EINVAL;
} else if (!(fs_info->qgroup_flags &
BTRFS_QGROUP_STATUS_FLAG_ON)) {
- btrfs_warn(fs_info,
+ btrfs_debug(fs_info,
"qgroup rescan init failed, qgroup is not enabled");
- ret = -EINVAL;
+ ret = -ENOTCONN;
}
if (ret)
@@ -3838,14 +3838,12 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
if (init_flags) {
if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
- btrfs_warn(fs_info,
- "qgroup rescan is already in progress");
ret = -EINPROGRESS;
} else if (!(fs_info->qgroup_flags &
BTRFS_QGROUP_STATUS_FLAG_ON)) {
- btrfs_warn(fs_info,
+ btrfs_debug(fs_info,
"qgroup rescan init failed, qgroup is not enabled");
- ret = -EINVAL;
+ ret = -ENOTCONN;
} else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) {
/* Quota disable is in progress */
ret = -EBUSY;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 27c046484382d78b4abb0a6e9905a20121af9b35
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061310-exemplify-snore-c1b0@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
27c046484382 ("tracefs: Update inode permissions on remount")
baa23a8d4360 ("tracefs: Reset permissions on remount if permissions are options")
8dce06e98c70 ("eventfs: Clean up dentry ops and add revalidate function")
49304c2b93e4 ("tracefs: dentry lookup crapectomy")
4fa4b010b83f ("eventfs: Initialize the tracefs inode properly")
d81786f53aec ("tracefs: Zero out the tracefs_inode when allocating it")
29142dc92c37 ("tracefs: remove stale 'update_gid' code")
8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
b0f7e2d739b4 ("eventfs: Remove "lookup" parameter from create_dir/file_dentry()")
ad579864637a ("tracefs: Check for dentry->d_inode exists in set_gid()")
7e8358edf503 ("eventfs: Fix file and directory uid and gid ownership")
0dfc852b6fe3 ("eventfs: Have event files and directories default to parent uid and gid")
5eaf7f0589c0 ("eventfs: Fix events beyond NAME_MAX blocking tasks")
f49f950c217b ("eventfs: Make sure that parent->d_inode is locked in creating files/dirs")
fc4561226fea ("eventfs: Do not allow NULL parent to eventfs_start_creating()")
bcae32c5632f ("eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()")
71cade82f2b5 ("eventfs: Do not invalidate dentry in create_file/dir_dentry()")
88903daecacf ("eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL")
44365329f821 ("eventfs: Hold eventfs_mutex when calling callback functions")
28e12c09f5aa ("eventfs: Save ownership and mode")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 27c046484382d78b4abb0a6e9905a20121af9b35 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 23 May 2024 01:14:27 -0400
Subject: [PATCH] tracefs: Update inode permissions on remount
When a remount happens, if a gid or uid is specified update the inodes to
have the same gid and uid. This will allow the simplification of the
permissions logic for the dynamically created files and directories.
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.592429986@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 55a40a730b10..5dfb1ccd56ea 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -317,20 +317,29 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid)
if (!ei)
return;
- if (update_uid)
+ if (update_uid) {
ei->attr.mode &= ~EVENTFS_SAVE_UID;
+ ei->attr.uid = ti->vfs_inode.i_uid;
+ }
- if (update_gid)
+
+ if (update_gid) {
ei->attr.mode &= ~EVENTFS_SAVE_GID;
+ ei->attr.gid = ti->vfs_inode.i_gid;
+ }
if (!ei->entry_attrs)
return;
for (int i = 0; i < ei->nr_entries; i++) {
- if (update_uid)
+ if (update_uid) {
ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
- if (update_gid)
+ ei->entry_attrs[i].uid = ti->vfs_inode.i_uid;
+ }
+ if (update_gid) {
ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
+ ei->entry_attrs[i].gid = ti->vfs_inode.i_gid;
+ }
}
}
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index a827f6a716c4..9252e0d78ea2 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -373,12 +373,21 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
rcu_read_lock();
list_for_each_entry_rcu(ti, &tracefs_inodes, list) {
- if (update_uid)
+ if (update_uid) {
ti->flags &= ~TRACEFS_UID_PERM_SET;
+ ti->vfs_inode.i_uid = fsi->uid;
+ }
- if (update_gid)
+ if (update_gid) {
ti->flags &= ~TRACEFS_GID_PERM_SET;
+ ti->vfs_inode.i_gid = fsi->gid;
+ }
+ /*
+ * Note, the above ti->vfs_inode updates are
+ * used in eventfs_remount() so they must come
+ * before calling it.
+ */
if (ti->flags & TRACEFS_EVENT_INODE)
eventfs_remount(ti, update_uid, update_gid);
}
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x 27c046484382d78b4abb0a6e9905a20121af9b35
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061309-pliable-outcast-adb7@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
27c046484382 ("tracefs: Update inode permissions on remount")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 27c046484382d78b4abb0a6e9905a20121af9b35 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 23 May 2024 01:14:27 -0400
Subject: [PATCH] tracefs: Update inode permissions on remount
When a remount happens, if a gid or uid is specified update the inodes to
have the same gid and uid. This will allow the simplification of the
permissions logic for the dynamically created files and directories.
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.592429986@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 55a40a730b10..5dfb1ccd56ea 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -317,20 +317,29 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid)
if (!ei)
return;
- if (update_uid)
+ if (update_uid) {
ei->attr.mode &= ~EVENTFS_SAVE_UID;
+ ei->attr.uid = ti->vfs_inode.i_uid;
+ }
- if (update_gid)
+
+ if (update_gid) {
ei->attr.mode &= ~EVENTFS_SAVE_GID;
+ ei->attr.gid = ti->vfs_inode.i_gid;
+ }
if (!ei->entry_attrs)
return;
for (int i = 0; i < ei->nr_entries; i++) {
- if (update_uid)
+ if (update_uid) {
ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
- if (update_gid)
+ ei->entry_attrs[i].uid = ti->vfs_inode.i_uid;
+ }
+ if (update_gid) {
ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
+ ei->entry_attrs[i].gid = ti->vfs_inode.i_gid;
+ }
}
}
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index a827f6a716c4..9252e0d78ea2 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -373,12 +373,21 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
rcu_read_lock();
list_for_each_entry_rcu(ti, &tracefs_inodes, list) {
- if (update_uid)
+ if (update_uid) {
ti->flags &= ~TRACEFS_UID_PERM_SET;
+ ti->vfs_inode.i_uid = fsi->uid;
+ }
- if (update_gid)
+ if (update_gid) {
ti->flags &= ~TRACEFS_GID_PERM_SET;
+ ti->vfs_inode.i_gid = fsi->gid;
+ }
+ /*
+ * Note, the above ti->vfs_inode updates are
+ * used in eventfs_remount() so they must come
+ * before calling it.
+ */
if (ti->flags & TRACEFS_EVENT_INODE)
eventfs_remount(ti, update_uid, update_gid);
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 340f0c7067a95281ad13734f8225f49c6cf52067
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061306-handed-oyster-2002@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the events descriptor")
27c046484382 ("tracefs: Update inode permissions on remount")
baa23a8d4360 ("tracefs: Reset permissions on remount if permissions are options")
8dce06e98c70 ("eventfs: Clean up dentry ops and add revalidate function")
49304c2b93e4 ("tracefs: dentry lookup crapectomy")
4fa4b010b83f ("eventfs: Initialize the tracefs inode properly")
d81786f53aec ("tracefs: Zero out the tracefs_inode when allocating it")
29142dc92c37 ("tracefs: remove stale 'update_gid' code")
8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
b0f7e2d739b4 ("eventfs: Remove "lookup" parameter from create_dir/file_dentry()")
ad579864637a ("tracefs: Check for dentry->d_inode exists in set_gid()")
7e8358edf503 ("eventfs: Fix file and directory uid and gid ownership")
0dfc852b6fe3 ("eventfs: Have event files and directories default to parent uid and gid")
5eaf7f0589c0 ("eventfs: Fix events beyond NAME_MAX blocking tasks")
f49f950c217b ("eventfs: Make sure that parent->d_inode is locked in creating files/dirs")
fc4561226fea ("eventfs: Do not allow NULL parent to eventfs_start_creating()")
bcae32c5632f ("eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()")
71cade82f2b5 ("eventfs: Do not invalidate dentry in create_file/dir_dentry()")
88903daecacf ("eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL")
44365329f821 ("eventfs: Hold eventfs_mutex when calling callback functions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 340f0c7067a95281ad13734f8225f49c6cf52067 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 23 May 2024 01:14:28 -0400
Subject: [PATCH] eventfs: Update all the eventfs_inodes from the events
descriptor
The change to update the permissions of the eventfs_inode had the
misconception that using the tracefs_inode would find all the
eventfs_inodes that have been updated and reset them on remount.
The problem with this approach is that the eventfs_inodes are freed when
they are no longer used (basically the reason the eventfs system exists).
When they are freed, the updated eventfs_inodes are not reset on a remount
because their tracefs_inodes have been freed.
Instead, since the events directory eventfs_inode always has a
tracefs_inode pointing to it (it is not freed when finished), and the
events directory has a link to all its children, have the
eventfs_remount() function only operate on the events eventfs_inode and
have it descend into its children updating their uid and gids.
Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVw…
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 5dfb1ccd56ea..129d0f54ba62 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -305,6 +305,45 @@ static const struct file_operations eventfs_file_operations = {
.llseek = generic_file_llseek,
};
+static void eventfs_set_attrs(struct eventfs_inode *ei, bool update_uid, kuid_t uid,
+ bool update_gid, kgid_t gid, int level)
+{
+ struct eventfs_inode *ei_child;
+
+ /* Update events/<system>/<event> */
+ if (WARN_ON_ONCE(level > 3))
+ return;
+
+ if (update_uid) {
+ ei->attr.mode &= ~EVENTFS_SAVE_UID;
+ ei->attr.uid = uid;
+ }
+
+ if (update_gid) {
+ ei->attr.mode &= ~EVENTFS_SAVE_GID;
+ ei->attr.gid = gid;
+ }
+
+ list_for_each_entry(ei_child, &ei->children, list) {
+ eventfs_set_attrs(ei_child, update_uid, uid, update_gid, gid, level + 1);
+ }
+
+ if (!ei->entry_attrs)
+ return;
+
+ for (int i = 0; i < ei->nr_entries; i++) {
+ if (update_uid) {
+ ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
+ ei->entry_attrs[i].uid = uid;
+ }
+ if (update_gid) {
+ ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
+ ei->entry_attrs[i].gid = gid;
+ }
+ }
+
+}
+
/*
* On a remount of tracefs, if UID or GID options are set, then
* the mount point inode permissions should be used.
@@ -314,33 +353,12 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid)
{
struct eventfs_inode *ei = ti->private;
- if (!ei)
+ /* Only the events directory does the updates */
+ if (!ei || !ei->is_events || ei->is_freed)
return;
- if (update_uid) {
- ei->attr.mode &= ~EVENTFS_SAVE_UID;
- ei->attr.uid = ti->vfs_inode.i_uid;
- }
-
-
- if (update_gid) {
- ei->attr.mode &= ~EVENTFS_SAVE_GID;
- ei->attr.gid = ti->vfs_inode.i_gid;
- }
-
- if (!ei->entry_attrs)
- return;
-
- for (int i = 0; i < ei->nr_entries; i++) {
- if (update_uid) {
- ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
- ei->entry_attrs[i].uid = ti->vfs_inode.i_uid;
- }
- if (update_gid) {
- ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
- ei->entry_attrs[i].gid = ti->vfs_inode.i_gid;
- }
- }
+ eventfs_set_attrs(ei, update_uid, ti->vfs_inode.i_uid,
+ update_gid, ti->vfs_inode.i_gid, 0);
}
/* Return the evenfs_inode of the "events" directory */
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x 340f0c7067a95281ad13734f8225f49c6cf52067
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061305-kilogram-handheld-7bd9@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the events descriptor")
27c046484382 ("tracefs: Update inode permissions on remount")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 340f0c7067a95281ad13734f8225f49c6cf52067 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 23 May 2024 01:14:28 -0400
Subject: [PATCH] eventfs: Update all the eventfs_inodes from the events
descriptor
The change to update the permissions of the eventfs_inode had the
misconception that using the tracefs_inode would find all the
eventfs_inodes that have been updated and reset them on remount.
The problem with this approach is that the eventfs_inodes are freed when
they are no longer used (basically the reason the eventfs system exists).
When they are freed, the updated eventfs_inodes are not reset on a remount
because their tracefs_inodes have been freed.
Instead, since the events directory eventfs_inode always has a
tracefs_inode pointing to it (it is not freed when finished), and the
events directory has a link to all its children, have the
eventfs_remount() function only operate on the events eventfs_inode and
have it descend into its children updating their uid and gids.
Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVw…
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 5dfb1ccd56ea..129d0f54ba62 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -305,6 +305,45 @@ static const struct file_operations eventfs_file_operations = {
.llseek = generic_file_llseek,
};
+static void eventfs_set_attrs(struct eventfs_inode *ei, bool update_uid, kuid_t uid,
+ bool update_gid, kgid_t gid, int level)
+{
+ struct eventfs_inode *ei_child;
+
+ /* Update events/<system>/<event> */
+ if (WARN_ON_ONCE(level > 3))
+ return;
+
+ if (update_uid) {
+ ei->attr.mode &= ~EVENTFS_SAVE_UID;
+ ei->attr.uid = uid;
+ }
+
+ if (update_gid) {
+ ei->attr.mode &= ~EVENTFS_SAVE_GID;
+ ei->attr.gid = gid;
+ }
+
+ list_for_each_entry(ei_child, &ei->children, list) {
+ eventfs_set_attrs(ei_child, update_uid, uid, update_gid, gid, level + 1);
+ }
+
+ if (!ei->entry_attrs)
+ return;
+
+ for (int i = 0; i < ei->nr_entries; i++) {
+ if (update_uid) {
+ ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
+ ei->entry_attrs[i].uid = uid;
+ }
+ if (update_gid) {
+ ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
+ ei->entry_attrs[i].gid = gid;
+ }
+ }
+
+}
+
/*
* On a remount of tracefs, if UID or GID options are set, then
* the mount point inode permissions should be used.
@@ -314,33 +353,12 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid)
{
struct eventfs_inode *ei = ti->private;
- if (!ei)
+ /* Only the events directory does the updates */
+ if (!ei || !ei->is_events || ei->is_freed)
return;
- if (update_uid) {
- ei->attr.mode &= ~EVENTFS_SAVE_UID;
- ei->attr.uid = ti->vfs_inode.i_uid;
- }
-
-
- if (update_gid) {
- ei->attr.mode &= ~EVENTFS_SAVE_GID;
- ei->attr.gid = ti->vfs_inode.i_gid;
- }
-
- if (!ei->entry_attrs)
- return;
-
- for (int i = 0; i < ei->nr_entries; i++) {
- if (update_uid) {
- ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
- ei->entry_attrs[i].uid = ti->vfs_inode.i_uid;
- }
- if (update_gid) {
- ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
- ei->entry_attrs[i].gid = ti->vfs_inode.i_gid;
- }
- }
+ eventfs_set_attrs(ei, update_uid, ti->vfs_inode.i_uid,
+ update_gid, ti->vfs_inode.i_gid, 0);
}
/* Return the evenfs_inode of the "events" directory */
Sometimes errors are seen, when doing DR swap, like:
[ 24.672481] ucsi-stm32g0-i2c 0-0035: UCSI_GET_PDOS failed (-5)
[ 24.720188] ucsi-stm32g0-i2c 0-0035: ucsi_handle_connector_change:
GET_CONNECTOR_STATUS failed (-5)
There may be some race, which lead to read CCI, before the command complete
flag is set, hence returning -EIO. Similar fix has been done also in
ucsi_acpi [1].
In case of a spurious or otherwise delayed notification it is
possible that CCI still reports the previous completion. The
UCSI spec is aware of this and provides two completion bits in
CCI, one for normal commands and one for acks. As acks and commands
alternate the notification handler can determine if the completion
bit is from the current command.
To fix this add the ACK_PENDING bit for ucsi_stm32g0 and only complete
commands if the completion bit matches.
[1] https://lore.kernel.org/lkml/20240121204123.275441-3-lk@c--e.de/
Fixes: 72849d4fcee7 ("usb: typec: ucsi: stm32g0: add support for stm32g0 controller")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier(a)foss.st.com>
---
Changes in v2: rebase and define ACK_PENDING as commented by Dmitry.
---
drivers/usb/typec/ucsi/ucsi_stm32g0.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/typec/ucsi/ucsi_stm32g0.c b/drivers/usb/typec/ucsi/ucsi_stm32g0.c
index ac48b7763114..ac69288e8bb0 100644
--- a/drivers/usb/typec/ucsi/ucsi_stm32g0.c
+++ b/drivers/usb/typec/ucsi/ucsi_stm32g0.c
@@ -65,6 +65,7 @@ struct ucsi_stm32g0 {
struct device *dev;
unsigned long flags;
#define COMMAND_PENDING 1
+#define ACK_PENDING 2
const char *fw_name;
struct ucsi *ucsi;
bool suspended;
@@ -396,9 +397,13 @@ static int ucsi_stm32g0_sync_write(struct ucsi *ucsi, unsigned int offset, const
size_t len)
{
struct ucsi_stm32g0 *g0 = ucsi_get_drvdata(ucsi);
+ bool ack = UCSI_COMMAND(*(u64 *)val) == UCSI_ACK_CC_CI;
int ret;
- set_bit(COMMAND_PENDING, &g0->flags);
+ if (ack)
+ set_bit(ACK_PENDING, &g0->flags);
+ else
+ set_bit(COMMAND_PENDING, &g0->flags);
ret = ucsi_stm32g0_async_write(ucsi, offset, val, len);
if (ret)
@@ -406,9 +411,14 @@ static int ucsi_stm32g0_sync_write(struct ucsi *ucsi, unsigned int offset, const
if (!wait_for_completion_timeout(&g0->complete, msecs_to_jiffies(5000)))
ret = -ETIMEDOUT;
+ else
+ return 0;
out_clear_bit:
- clear_bit(COMMAND_PENDING, &g0->flags);
+ if (ack)
+ clear_bit(ACK_PENDING, &g0->flags);
+ else
+ clear_bit(COMMAND_PENDING, &g0->flags);
return ret;
}
@@ -429,8 +439,9 @@ static irqreturn_t ucsi_stm32g0_irq_handler(int irq, void *data)
if (UCSI_CCI_CONNECTOR(cci))
ucsi_connector_change(g0->ucsi, UCSI_CCI_CONNECTOR(cci));
- if (test_bit(COMMAND_PENDING, &g0->flags) &&
- cci & (UCSI_CCI_ACK_COMPLETE | UCSI_CCI_COMMAND_COMPLETE))
+ if (cci & UCSI_CCI_ACK_COMPLETE && test_and_clear_bit(ACK_PENDING, &g0->flags))
+ complete(&g0->complete);
+ if (cci & UCSI_CCI_COMMAND_COMPLETE && test_and_clear_bit(COMMAND_PENDING, &g0->flags))
complete(&g0->complete);
return IRQ_HANDLED;
--
2.25.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 13df4d44a3aaabe61cd01d277b6ee23ead2a5206
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061335-payee-pamphlet-09d5@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
13df4d44a3aa ("ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()")
57341fe3179c ("ext4: refactor out ext4_generic_attr_show()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13df4d44a3aaabe61cd01d277b6ee23ead2a5206 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Tue, 19 Mar 2024 19:33:20 +0800
Subject: [PATCH] ext4: fix slab-out-of-bounds in
ext4_mb_find_good_group_avg_frag_lists()
We can trigger a slab-out-of-bounds with the following commands:
mkfs.ext4 -F /dev/$disk 10G
mount /dev/$disk /tmp/test
echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
echo test > /tmp/test/file && sync
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
Call Trace:
dump_stack_lvl+0x2c/0x50
kasan_report+0xb6/0xf0
ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
ext4_mb_new_blocks+0x88a/0x1370 [ext4]
ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
ext4_map_blocks+0x569/0xea0 [ext4]
ext4_do_writepages+0x10f6/0x1bc0 [ext4]
[...]
==================================================================
The flow of issue triggering is as follows:
// Set s_mb_group_prealloc to 2147483647 via sysfs
ext4_mb_new_blocks
ext4_mb_normalize_request
ext4_mb_normalize_group_request
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
ext4_mb_regular_allocator
ext4_mb_choose_next_group
ext4_mb_choose_next_group_best_avail
mb_avg_fragment_size_order
order = fls(len) - 2 = 29
ext4_mb_find_good_group_avg_frag_lists
frag_list = &sbi->s_mb_avg_fragment_size[order]
if (list_empty(frag_list)) // Trigger SOOB!
At 4k block size, the length of the s_mb_avg_fragment_size list is 14,
but an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds
to be triggered by an attempt to access an element at index 29.
Add a new attr_id attr_clusters_in_group with values in the range
[0, sbi->s_clusters_per_group] and declare mb_group_prealloc as
that type to fix the issue. In addition avoid returning an order
from mb_avg_fragment_size_order() greater than MB_NUM_ORDERS(sb)
and reduce some useless loops.
Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
CC: stable(a)vger.kernel.org
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20240319113325.3110393-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 12b3f196010b..dbf04f91516c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -831,6 +831,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
return 0;
if (order == MB_NUM_ORDERS(sb))
order--;
+ if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
+ order = MB_NUM_ORDERS(sb) - 1;
return order;
}
@@ -1008,6 +1010,8 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* goal length.
*/
order = fls(ac->ac_g_ex.fe_len) - 1;
+ if (WARN_ON_ONCE(order - 1 > MB_NUM_ORDERS(ac->ac_sb)))
+ order = MB_NUM_ORDERS(ac->ac_sb);
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 7f455b5f22c0..ddd71673176c 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -29,6 +29,7 @@ typedef enum {
attr_trigger_test_error,
attr_first_error_time,
attr_last_error_time,
+ attr_clusters_in_group,
attr_feature,
attr_pointer_ui,
attr_pointer_ul,
@@ -207,13 +208,14 @@ EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
ext4_sb_info, s_inode_readahead_blks);
+EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, clusters_in_group,
+ ext4_sb_info, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
-EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
@@ -376,6 +378,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
switch (a->attr_id) {
case attr_inode_readahead:
+ case attr_clusters_in_group:
case attr_pointer_ui:
if (a->attr_ptr == ptr_ext4_super_block_offset)
return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
@@ -455,6 +458,14 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
else
*((unsigned int *) ptr) = t;
return len;
+ case attr_clusters_in_group:
+ ret = kstrtouint(skip_spaces(buf), 0, &t);
+ if (ret)
+ return ret;
+ if (t > sbi->s_clusters_per_group)
+ return -EINVAL;
+ *((unsigned int *) ptr) = t;
+ return len;
case attr_pointer_ul:
ret = kstrtoul(skip_spaces(buf), 0, <);
if (ret)
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x 13df4d44a3aaabe61cd01d277b6ee23ead2a5206
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061334-impish-backdrop-1c34@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
13df4d44a3aa ("ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()")
57341fe3179c ("ext4: refactor out ext4_generic_attr_show()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13df4d44a3aaabe61cd01d277b6ee23ead2a5206 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Tue, 19 Mar 2024 19:33:20 +0800
Subject: [PATCH] ext4: fix slab-out-of-bounds in
ext4_mb_find_good_group_avg_frag_lists()
We can trigger a slab-out-of-bounds with the following commands:
mkfs.ext4 -F /dev/$disk 10G
mount /dev/$disk /tmp/test
echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
echo test > /tmp/test/file && sync
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
Call Trace:
dump_stack_lvl+0x2c/0x50
kasan_report+0xb6/0xf0
ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
ext4_mb_new_blocks+0x88a/0x1370 [ext4]
ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
ext4_map_blocks+0x569/0xea0 [ext4]
ext4_do_writepages+0x10f6/0x1bc0 [ext4]
[...]
==================================================================
The flow of issue triggering is as follows:
// Set s_mb_group_prealloc to 2147483647 via sysfs
ext4_mb_new_blocks
ext4_mb_normalize_request
ext4_mb_normalize_group_request
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
ext4_mb_regular_allocator
ext4_mb_choose_next_group
ext4_mb_choose_next_group_best_avail
mb_avg_fragment_size_order
order = fls(len) - 2 = 29
ext4_mb_find_good_group_avg_frag_lists
frag_list = &sbi->s_mb_avg_fragment_size[order]
if (list_empty(frag_list)) // Trigger SOOB!
At 4k block size, the length of the s_mb_avg_fragment_size list is 14,
but an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds
to be triggered by an attempt to access an element at index 29.
Add a new attr_id attr_clusters_in_group with values in the range
[0, sbi->s_clusters_per_group] and declare mb_group_prealloc as
that type to fix the issue. In addition avoid returning an order
from mb_avg_fragment_size_order() greater than MB_NUM_ORDERS(sb)
and reduce some useless loops.
Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
CC: stable(a)vger.kernel.org
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20240319113325.3110393-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 12b3f196010b..dbf04f91516c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -831,6 +831,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
return 0;
if (order == MB_NUM_ORDERS(sb))
order--;
+ if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
+ order = MB_NUM_ORDERS(sb) - 1;
return order;
}
@@ -1008,6 +1010,8 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* goal length.
*/
order = fls(ac->ac_g_ex.fe_len) - 1;
+ if (WARN_ON_ONCE(order - 1 > MB_NUM_ORDERS(ac->ac_sb)))
+ order = MB_NUM_ORDERS(ac->ac_sb);
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 7f455b5f22c0..ddd71673176c 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -29,6 +29,7 @@ typedef enum {
attr_trigger_test_error,
attr_first_error_time,
attr_last_error_time,
+ attr_clusters_in_group,
attr_feature,
attr_pointer_ui,
attr_pointer_ul,
@@ -207,13 +208,14 @@ EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
ext4_sb_info, s_inode_readahead_blks);
+EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, clusters_in_group,
+ ext4_sb_info, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
-EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
@@ -376,6 +378,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
switch (a->attr_id) {
case attr_inode_readahead:
+ case attr_clusters_in_group:
case attr_pointer_ui:
if (a->attr_ptr == ptr_ext4_super_block_offset)
return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
@@ -455,6 +458,14 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
else
*((unsigned int *) ptr) = t;
return len;
+ case attr_clusters_in_group:
+ ret = kstrtouint(skip_spaces(buf), 0, &t);
+ if (ret)
+ return ret;
+ if (t > sbi->s_clusters_per_group)
+ return -EINVAL;
+ *((unsigned int *) ptr) = t;
+ return len;
case attr_pointer_ul:
ret = kstrtoul(skip_spaces(buf), 0, <);
if (ret)