The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-overstay-bullseye-eee1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately, there is a bug: the check only looked at the on-disk
references. I made the incorrect assumption that blocks in an already
deleted snapshot, whose deletion is resumed on mount, wouldn't be
modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example:
Subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so we set parent to the bytenr of block 1. However,
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to look up the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
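The lookup order described above can be modeled as a minimal userspace C sketch. The structures and the check_ref_exists_model() name are hypothetical simplifications, not kernel APIs. A reference is matched by parent bytenr when it is a shared backref and by root objectid otherwise. A delayed ref only counts as existing if its pending action is an add. The kernel's retry when the ref head mutex is contended is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for on-disk and delayed backrefs. */
enum ref_action { REF_ADD, REF_DROP };

struct ref {
	uint64_t root;          /* root objectid, used when parent == 0 */
	uint64_t parent;        /* parent bytenr for shared backrefs, else 0 */
	enum ref_action action; /* only meaningful for delayed refs */
};

struct ref_list {
	const struct ref *refs;
	size_t nr;
};

/* Shared backrefs are keyed by parent, normal ones by root. */
static bool ref_matches(const struct ref *r, uint64_t root, uint64_t parent)
{
	if (parent)
		return r->parent == parent;
	return r->parent == 0 && r->root == root;
}

/*
 * Returns 1 if the reference exists, 0 otherwise. @delayed is NULL when
 * there is no delayed ref head for the bytenr.
 */
static int check_ref_exists_model(const struct ref_list *on_disk,
				  const struct ref_list *delayed,
				  uint64_t root, uint64_t parent)
{
	assert(on_disk != NULL);

	/* 1) On-disk backrefs: this is all the old code looked at. */
	for (size_t i = 0; i < on_disk->nr; i++)
		if (ref_matches(&on_disk->refs[i], root, parent))
			return 1;

	/* 2) Not found on disk: consult the pending delayed refs. */
	if (!delayed)
		return 0;
	for (size_t i = 0; i < delayed->nr; i++)
		if (ref_matches(&delayed->refs[i], root, parent))
			/* A pending drop means the ref is going away. */
			return delayed->refs[i].action == REF_ADD;
	return 0;
}
```

In the example scenario, the shared ref for block 1's child exists only as a pending add. A disk-only check returns 0 for it, which is how the child ended up orphaned.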
This bug has existed since I introduced this fix; however, it requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following:
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while true; do
    btrfs subvol snap /btrfs/s1 /btrfs/s2
    simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
    btrfs subvol snap /btrfs/s2 /btrfs/s3
    btrfs balance start -dusage=80 /btrfs
    btrfs subvol del /btrfs/s2 /btrfs/s3
    umount /btrfs
    btrfsck /dev/nvme4n1 || exit 1
    mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently; with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-knoll-dropkick-f11a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately, there is a bug: the check only looked at the on-disk
references. I made the incorrect assumption that blocks in an already
deleted snapshot, whose deletion is resumed on mount, wouldn't be
modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example:
Subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so we set parent to the bytenr of block 1. However,
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to look up the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
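The sketch below illustrates the lookup of an existing ref inside a delayed ref head. It models the ordering used by find_comp() in the patch: key type first, then root objectid for normal refs or parent bytenr for shared refs. A binary search over a sorted array stands in for the head's rb-tree. The struct and function names are illustrative, not kernel APIs. Only the two key type values come from the btrfs on-disk format (BTRFS_TREE_BLOCK_REF_KEY = 176, BTRFS_SHARED_BLOCK_REF_KEY = 182).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TREE_BLOCK_REF_KEY   176 /* BTRFS_TREE_BLOCK_REF_KEY */
#define SHARED_BLOCK_REF_KEY 182 /* BTRFS_SHARED_BLOCK_REF_KEY */

/* Hypothetical stand-ins for the delayed ref actions. */
enum action { ADD_REF, DROP_REF };

struct node {
	int type;
	uint64_t ref_root;
	uint64_t parent;
	enum action action;
};

/* Same ordering as find_comp(): key type first, then root or parent. */
static int ref_key_cmp(const struct node *entry, uint64_t root, uint64_t parent)
{
	int type = parent ? SHARED_BLOCK_REF_KEY : TREE_BLOCK_REF_KEY;

	if (type != entry->type)
		return type < entry->type ? -1 : 1;
	if (type == TREE_BLOCK_REF_KEY) {
		if (root != entry->ref_root)
			return root < entry->ref_root ? -1 : 1;
	} else if (parent != entry->parent) {
		return parent < entry->parent ? -1 : 1;
	}
	return 0;
}

/* Binary search over nodes sorted in ref_key_cmp() order. */
static bool find_tree_ref(const struct node *nodes, size_t nr,
			  uint64_t root, uint64_t parent)
{
	size_t lo = 0, hi = nr;

	assert(nodes != NULL || nr == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int ret = ref_key_cmp(&nodes[mid], root, parent);

		if (ret < 0)
			hi = mid;		/* go left */
		else if (ret > 0)
			lo = mid + 1;		/* go right */
		else
			return nodes[mid].action == ADD_REF; /* drops don't count */
	}
	return false;
}
```

Because the matching key is unique within the head, the first match decides the answer. A pending drop for the same key means the reference does not exist.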
This bug has existed since I introduced this fix; however, it requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following:
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while true; do
    btrfs subvol snap /btrfs/s1 /btrfs/s2
    simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
    btrfs subvol snap /btrfs/s2 /btrfs/s3
    btrfs balance start -dusage=80 /btrfs
    btrfs subvol del /btrfs/s2 /btrfs/s3
    umount /btrfs
    btrfsck /dev/nvme4n1 || exit 1
    mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently; with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081915-scrubber-excretion-0f83@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
3ea4dc5bf00c ("btrfs: send: send compressed extents with encoded writes")
6fe81a3a3ac8 ("btrfs: balance btree dirty pages and delayed items after clone and dedupe")
152555b39ceb ("btrfs: send: avoid trashing the page cache")
521b6803f22e ("btrfs: send: keep the current inode open while processing it")
1c6cbbbeeeca ("btrfs: remove inode_dio_wait() calls when starting reflink operations")
6b1f86f8e9c7 ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
cloning unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect, because send only guarantees that a file
has the same data in the source and destination snapshots, it's not
optimal and causes confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from file foo into file
bar, and fiemap confirms there's no shared flag (0x2000).
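The write commands in the receive log above can be reproduced arithmetically. The 1048581-byte file is sent as 21 writes of 49152 bytes (48 KiB) plus a final write of 16389 bytes. The sketch below assumes that fixed 48 KiB chunk size, which is taken from the log itself rather than from the send code. emit_writes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Chunk size as observed in the receive log (48 KiB). */
#define SEND_CHUNK 49152ULL

/*
 * Hypothetical helper: print one "write" line per chunk needed to send
 * file_size bytes and return the number of chunks.
 */
static unsigned int emit_writes(uint64_t file_size)
{
	unsigned int count = 0;
	uint64_t covered = 0;

	for (uint64_t off = 0; off < file_size; off += SEND_CHUNK) {
		uint64_t len = file_size - off;

		if (len > SEND_CHUNK)
			len = SEND_CHUNK;	/* full chunk; only the last one is short */
		printf("write bar - offset=%llu length=%llu\n",
		       (unsigned long long)off, (unsigned long long)len);
		covered += len;
		count++;
	}
	assert(covered == file_size);	/* chunks tile the file exactly */
	return count;
}
```

Calling emit_writes(1048581) prints the same 22 offset/length pairs as the log, ending with offset=1032192 length=16389.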
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
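Here is a minimal sketch of the resulting clone-or-write decision, written as a standalone C function so the conditions can be tested in isolation. choose_op() and its parameters are hypothetical. It mirrors the order of checks in the send.c diff: no clone source, then an aligned end, then the two i_size comparisons. It is not the kernel code, and it uses a modulo where the kernel uses IS_ALIGNED().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum send_op { SEND_WRITE, SEND_CLONE };

/*
 * offset, end:     range of the extent in the inode being sent
 * dst_isize:       i_size of the inode being sent
 * have_clone_src:  whether a clone source was found
 * src_offset:      start of the range in the clone source
 * src_isize:       i_size of the clone source inode
 * bs:              sector size (a power of two)
 */
static enum send_op choose_op(uint64_t offset, uint64_t end, uint64_t dst_isize,
			      bool have_clone_src, uint64_t src_offset,
			      uint64_t src_isize, uint64_t bs)
{
	uint64_t num_bytes;

	assert(bs && (bs & (bs - 1)) == 0);
	assert(offset < end);		/* empty ranges are skipped earlier */
	num_bytes = end - offset;

	if (!have_clone_src)
		return SEND_WRITE;
	if (end % bs == 0)
		return SEND_CLONE;	/* aligned end: cloned before the change too */
	/* Unaligned end: clone only if both ranges end at their file's i_size. */
	if (end == dst_isize && src_offset + num_bytes == src_isize)
		return SEND_CLONE;
	return SEND_WRITE;
}
```

With the test's values (both files 1048581 bytes, 4096-byte sectors), the unaligned extent now takes the clone path. It still falls back to writes if the source file is larger.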
After this change, the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081913-cupped-curing-f315@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
3ea4dc5bf00c ("btrfs: send: send compressed extents with encoded writes")
6fe81a3a3ac8 ("btrfs: balance btree dirty pages and delayed items after clone and dedupe")
152555b39ceb ("btrfs: send: avoid trashing the page cache")
521b6803f22e ("btrfs: send: keep the current inode open while processing it")
1c6cbbbeeeca ("btrfs: remove inode_dio_wait() calls when starting reflink operations")
6b1f86f8e9c7 ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
cloning unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect, because send only guarantees that a file
has the same data in the source and destination snapshots, it's not
optimal and causes confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
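The decision flow that send_write_or_clone() follows after this change can be modelled in plain C. This is only a sketch under stated assumptions. The enum and choose_action() are hypothetical, and plain integers stand in for the send context, path and clone_root structures. The source i_size, which the patch obtains with get_inode_info(), is passed in directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes; the kernel calls send_extent_data() or clone_range(). */
enum send_action { SEND_SKIP, SEND_WRITE, SEND_CLONE };

/*
 * Model of the post-patch logic in send_write_or_clone().
 * extent_end:      end offset of the file extent item
 * cur_inode_size:  i_size of the inode being sent
 * have_clone_root: whether a clone source was found
 * clone_offset:    offset of the range in the clone source
 * clone_isize:     i_size of the clone source inode
 */
static enum send_action choose_action(uint64_t offset, uint64_t extent_end,
				      uint64_t cur_inode_size,
				      bool have_clone_root,
				      uint64_t clone_offset,
				      uint64_t clone_isize,
				      uint64_t sectorsize)
{
	uint64_t end = extent_end < cur_inode_size ? extent_end : cur_inode_size;
	uint64_t num_bytes;

	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

	if (offset >= end)
		return SEND_SKIP;
	num_bytes = end - offset;

	if (!have_clone_root)
		return SEND_WRITE;

	/* Aligned end: clone, as before the patch. */
	if ((end & (sectorsize - 1)) == 0)
		return SEND_CLONE;

	/* Unaligned end: clone only if both ranges end at their i_size. */
	if (end != cur_inode_size)
		return SEND_WRITE;
	if (clone_offset + num_bytes == clone_isize)
		return SEND_CLONE;

	return SEND_WRITE;
}
```

For the test case in this message, the file extent item ends at a sector-aligned offset past the 1048581-byte i_size. The sketch assumes 1052672 for 4 KiB sectors, so end is clamped to 1048581. The pre-patch IS_ALIGNED(end, bs) test therefore chose a write, whereas choose_action(0, 1052672, 1048581, true, 0, 1048581, 4096) returns SEND_CLONE.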
After this change, the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
 	u64 offset = key->offset;
 	u64 end;
 	u64 bs = sctx->send_root->fs_info->sectorsize;
+	struct btrfs_file_extent_item *ei;
+	u64 disk_byte;
+	u64 data_offset;
+	u64 num_bytes;
+	struct btrfs_inode_info info = { 0 };
 
 	end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
 	if (offset >= end)
 		return 0;
 
-	if (clone_root && IS_ALIGNED(end, bs)) {
-		struct btrfs_file_extent_item *ei;
-		u64 disk_byte;
-		u64 data_offset;
+	num_bytes = end - offset;
 
-		ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
-				    struct btrfs_file_extent_item);
-		disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
-		data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
-		ret = clone_range(sctx, path, clone_root, disk_byte,
-				  data_offset, offset, end - offset);
-	} else {
-		ret = send_extent_data(sctx, path, offset, end - offset);
-	}
+	if (!clone_root)
+		goto write_data;
+
+	if (IS_ALIGNED(end, bs))
+		goto clone_data;
+
+	/*
+	 * If the extent end is not aligned, we can clone if the extent ends at
+	 * the i_size of the inode and the clone range ends at the i_size of the
+	 * source inode, otherwise the clone operation fails with -EINVAL.
+	 */
+	if (end != sctx->cur_inode_size)
+		goto write_data;
+
+	ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+	if (ret < 0)
+		return ret;
+
+	if (clone_root->offset + num_bytes == info.size)
+		goto clone_data;
+
+write_data:
+	ret = send_extent_data(sctx, path, offset, num_bytes);
+	sctx->cur_inode_next_write_offset = end;
+	return ret;
+
+clone_data:
+	ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			    struct btrfs_file_extent_item);
+	disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+	data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+	ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+			  num_bytes);
 	sctx->cur_inode_next_write_offset = end;
 	return ret;
 }
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-shrivel-overstay-1394@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
cloning unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect, because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
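The write commands in the first output add up to the file size. The 21 writes of 49152 bytes cover 1032192 bytes, and the final 16389-byte write brings the total to 1048581 bytes. The sketch below reproduces that split. It assumes a fixed 48 KiB (49152-byte) maximum per write command, as the output suggests. split_writes() is a hypothetical helper, not part of btrfs-progs or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: split a len-byte range into write commands of at
 * most max_chunk bytes each. Up to cap chunk lengths are stored in out[];
 * the return value is the total number of chunks needed.
 */
static size_t split_writes(uint64_t len, uint64_t max_chunk,
			   uint64_t *out, size_t cap)
{
	size_t n = 0;
	uint64_t off = 0;

	assert(max_chunk > 0);	/* a zero chunk size would never advance */

	while (off < len) {
		uint64_t chunk = len - off < max_chunk ? len - off : max_chunk;

		if (n < cap)
			out[n] = chunk;
		n++;
		off += chunk;
	}
	return n;
}
```

Calling split_writes(1048581, 49152, chunks, 32) yields 22 chunks: 21 of 49152 bytes and a last one of 16389 bytes, matching the listing above.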
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this change, the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
 	u64 offset = key->offset;
 	u64 end;
 	u64 bs = sctx->send_root->fs_info->sectorsize;
+	struct btrfs_file_extent_item *ei;
+	u64 disk_byte;
+	u64 data_offset;
+	u64 num_bytes;
+	struct btrfs_inode_info info = { 0 };
 
 	end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
 	if (offset >= end)
 		return 0;
 
-	if (clone_root && IS_ALIGNED(end, bs)) {
-		struct btrfs_file_extent_item *ei;
-		u64 disk_byte;
-		u64 data_offset;
+	num_bytes = end - offset;
 
-		ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
-				    struct btrfs_file_extent_item);
-		disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
-		data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
-		ret = clone_range(sctx, path, clone_root, disk_byte,
-				  data_offset, offset, end - offset);
-	} else {
-		ret = send_extent_data(sctx, path, offset, end - offset);
-	}
+	if (!clone_root)
+		goto write_data;
+
+	if (IS_ALIGNED(end, bs))
+		goto clone_data;
+
+	/*
+	 * If the extent end is not aligned, we can clone if the extent ends at
+	 * the i_size of the inode and the clone range ends at the i_size of the
+	 * source inode, otherwise the clone operation fails with -EINVAL.
+	 */
+	if (end != sctx->cur_inode_size)
+		goto write_data;
+
+	ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+	if (ret < 0)
+		return ret;
+
+	if (clone_root->offset + num_bytes == info.size)
+		goto clone_data;
+
+write_data:
+	ret = send_extent_data(sctx, path, offset, num_bytes);
+	sctx->cur_inode_next_write_offset = end;
+	return ret;
+
+clone_data:
+	ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			    struct btrfs_file_extent_item);
+	disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+	data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+	ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+			  num_bytes);
 	sctx->cur_inode_next_write_offset = end;
 	return ret;
 }
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081909-blunt-glass-4ea5@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
cloning unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect, because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
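The FLAGS column printed by xfs_io's "fiemap -v" is the fe_flags word returned by the FS_IOC_FIEMAP ioctl. In that word, 0x1 is FIEMAP_EXTENT_LAST and 0x2000 is FIEMAP_EXTENT_SHARED. So 0x1 means "last extent, not shared", while 0x2001 means the extent is both last and shared. Below is a minimal sketch of the same check from C using the uapi headers <linux/fs.h> and <linux/fiemap.h>. The helper names are hypothetical, and error handling is kept short.

```c
#include <assert.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* True if a fiemap extent's fe_flags has FIEMAP_EXTENT_SHARED (0x2000) set. */
static bool fiemap_flags_shared(uint32_t flags)
{
	return (flags & FIEMAP_EXTENT_SHARED) != 0;
}

/*
 * Hypothetical helper: returns 1 if every extent mapped for fd (up to
 * max_extents of them) is shared, 0 if not, -1 on error.
 */
static int fd_all_extents_shared(int fd, unsigned int max_extents)
{
	struct fiemap *fm;
	unsigned int i;
	int ret;

	/* With fm_extent_count == 0 the kernel only counts extents. */
	assert(max_extents > 0);

	fm = calloc(1, sizeof(*fm) + max_extents * sizeof(struct fiemap_extent));
	if (!fm)
		return -1;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* sync the file before mapping */
	fm->fm_extent_count = max_extents;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		free(fm);
		return -1;
	}

	ret = fm->fm_mapped_extents > 0;
	for (i = 0; i < fm->fm_mapped_extents; i++)
		if (!fiemap_flags_shared(fm->fm_extents[i].fe_flags))
			ret = 0;
	free(fm);
	return ret;
}
```

On bar after the patched receive, fd_all_extents_shared(fd, 32) would be expected to return 1, given the 0x2001 flags shown in the second output.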
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this change, the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
 	u64 offset = key->offset;
 	u64 end;
 	u64 bs = sctx->send_root->fs_info->sectorsize;
+	struct btrfs_file_extent_item *ei;
+	u64 disk_byte;
+	u64 data_offset;
+	u64 num_bytes;
+	struct btrfs_inode_info info = { 0 };
 
 	end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
 	if (offset >= end)
 		return 0;
 
-	if (clone_root && IS_ALIGNED(end, bs)) {
-		struct btrfs_file_extent_item *ei;
-		u64 disk_byte;
-		u64 data_offset;
+	num_bytes = end - offset;
 
-		ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
-				    struct btrfs_file_extent_item);
-		disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
-		data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
-		ret = clone_range(sctx, path, clone_root, disk_byte,
-				  data_offset, offset, end - offset);
-	} else {
-		ret = send_extent_data(sctx, path, offset, end - offset);
-	}
+	if (!clone_root)
+		goto write_data;
+
+	if (IS_ALIGNED(end, bs))
+		goto clone_data;
+
+	/*
+	 * If the extent end is not aligned, we can clone if the extent ends at
+	 * the i_size of the inode and the clone range ends at the i_size of the
+	 * source inode, otherwise the clone operation fails with -EINVAL.
+	 */
+	if (end != sctx->cur_inode_size)
+		goto write_data;
+
+	ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+	if (ret < 0)
+		return ret;
+
+	if (clone_root->offset + num_bytes == info.size)
+		goto clone_data;
+
+write_data:
+	ret = send_extent_data(sctx, path, offset, num_bytes);
+	sctx->cur_inode_next_write_offset = end;
+	return ret;
+
+clone_data:
+	ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			    struct btrfs_file_extent_item);
+	disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+	data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+	ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+			  num_bytes);
 	sctx->cur_inode_next_write_offset = end;
 	return ret;
 }
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081954-purplish-scarecrow-7568@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that the kernel is rejecting a mismatching inode
mode and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX) to
(0, BTRFS_FT_MAX).
Although the existing corruption cannot be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip in the dir type
from reaching disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
 
 		/* dir type check */
 		dir_type = btrfs_dir_ftype(leaf, di);
-		if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+		if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+			     dir_type >= BTRFS_FT_MAX)) {
 			dir_item_err(leaf, slot,
-			"invalid dir item type, have %u expect [0, %u)",
+			"invalid dir item type, have %u expect (0, %u)",
 				dir_type, BTRFS_FT_MAX);
 			return -EUCLEAN;
 		}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081953-calm-granola-8c1b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that the kernel is rejecting a mismatching inode
mode and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX) to
(0, BTRFS_FT_MAX).
Although the existing corruption cannot be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip in the dir type
from reaching disk in the future.
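The 0x2->0x0 corruption is a single-bit flip of BTRFS_FT_DIR. The sketch below enumerates all eight single-bit flips of a dir type byte and counts how many each range check rejects. The helper names are hypothetical, and the BTRFS_FT_UNKNOWN and BTRFS_FT_MAX values are copied in as plain constants. For BTRFS_FT_DIR, the old [0, BTRFS_FT_MAX) check catches 5 of the flips and the new (0, BTRFS_FT_MAX) check catches 6. The flips to 3 and 6 still land on valid types, so a range check alone cannot catch them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Numeric values of BTRFS_FT_UNKNOWN and BTRFS_FT_MAX. */
#define FT_UNKNOWN	0
#define FT_MAX		9

/* Range check before the patch: [0, FT_MAX). */
static bool accepted_old(uint8_t type)
{
	return type < FT_MAX;
}

/* Range check after the patch: (0, FT_MAX). */
static bool accepted_new(uint8_t type)
{
	return type > FT_UNKNOWN && type < FT_MAX;
}

/* Count how many of the 8 single-bit flips of type a check rejects. */
static unsigned int flips_caught(uint8_t type, bool (*accepted)(uint8_t))
{
	unsigned int caught = 0;

	for (unsigned int bit = 0; bit < 8; bit++) {
		uint8_t flipped = (uint8_t)(type ^ (1u << bit));

		if (!accepted(flipped))
			caught++;
	}
	return caught;
}
```

The extra flip caught by the new check is exactly the 0x2->0x0 case from the report.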
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
 
 		/* dir type check */
 		dir_type = btrfs_dir_ftype(leaf, di);
-		if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+		if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+			     dir_type >= BTRFS_FT_MAX)) {
 			dir_item_err(leaf, slot,
-			"invalid dir item type, have %u expect [0, %u)",
+			"invalid dir item type, have %u expect (0, %u)",
 				dir_type, BTRFS_FT_MAX);
 			return -EUCLEAN;
 		}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081952-clatter-tribute-f81c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that the kernel is rejecting a mismatching inode
mode and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX) to
(0, BTRFS_FT_MAX).
Although the existing corruption cannot be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip in the dir type
from reaching disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
 
 		/* dir type check */
 		dir_type = btrfs_dir_ftype(leaf, di);
-		if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+		if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+			     dir_type >= BTRFS_FT_MAX)) {
 			dir_item_err(leaf, slot,
-			"invalid dir item type, have %u expect [0, %u)",
+			"invalid dir item type, have %u expect (0, %u)",
 				dir_type, BTRFS_FT_MAX);
 			return -EUCLEAN;
 		}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081951-anyhow-fool-5756@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that the kernel is rejecting a mismatching inode
mode and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX) to
(0, BTRFS_FT_MAX).
Although the existing corruption cannot be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip in the dir type
from reaching disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
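A minimal userspace sketch of the tightened range check, for illustration only. It assumes the file-type constants mirror the on-disk values in the uapi header btrfs_tree.h (BTRFS_FT_UNKNOWN = 0, BTRFS_FT_DIR = 2, BTRFS_FT_XATTR = 8, BTRFS_FT_MAX = 9); the helper dir_type_is_valid() is hypothetical and is not the tree-checker code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the values in include/uapi/linux/btrfs_tree.h. */
#define BTRFS_FT_UNKNOWN 0
#define BTRFS_FT_DIR     2
#define BTRFS_FT_XATTR   8
#define BTRFS_FT_MAX     9

/*
 * Hypothetical helper: accept only the open interval
 * (BTRFS_FT_UNKNOWN, BTRFS_FT_MAX), i.e. 1..8.
 * The old check was just "dir_type < BTRFS_FT_MAX", which let 0 through.
 */
static bool dir_type_is_valid(unsigned int dir_type)
{
	return dir_type > BTRFS_FT_UNKNOWN && dir_type < BTRFS_FT_MAX;
}
```

With this check, a BTRFS_FT_DIR (2) entry whose bit 1 flips to 0 is rejected as BTRFS_FT_UNKNOWN instead of passing as a valid type.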
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081942-rippling-relieving-c8d1@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
2bc481362245 ("selftests/mm: add -a to run_vmtests.sh")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function); did you mean 'memfd_secret'?
   42 |         return syscall(__NR_memfd_secret, flags);
      |                        ^~~~~~~~~~~~~~~~
      |                        memfd_secret
Hence I'm adding a condition so that memfd_secret is only compiled on
supported architectures.
Also check in the run_vmtests script whether the memfd_secret binary is
present before executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
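The patch gates the test in the Makefile on $(ARCH). A complementary pattern, sketched here as an assumption rather than what the patch does, is to guard the call itself on whether the system headers define __NR_memfd_secret, so the source compiles everywhere and fails with ENOSYS at run time where the syscall number is unknown. The wrapper name memfd_secret_compat() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: only reference __NR_memfd_secret when the
 * headers provide it; otherwise fail the way an unimplemented
 * syscall would, instead of breaking the build.
 */
static int memfd_secret_compat(unsigned int flags)
{
#ifdef __NR_memfd_secret
	return (int)syscall(__NR_memfd_secret, flags);
#else
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}
```

Even where the syscall number exists, the call can still fail at run time (for example if secretmem is disabled), so callers should check the return value either way.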
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081941-agility-fable-8749@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
2bc481362245 ("selftests/mm: add -a to run_vmtests.sh")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function); did you mean 'memfd_secret'?
   42 |         return syscall(__NR_memfd_secret, flags);
      |                        ^~~~~~~~~~~~~~~~
      |                        memfd_secret
Hence I'm adding a condition so that memfd_secret is only compiled on
supported architectures.
Also check in the run_vmtests script whether the memfd_secret binary is
present before executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081940-stopped-knickers-3a22@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function); did you mean 'memfd_secret'?
   42 |         return syscall(__NR_memfd_secret, flags);
      |                        ^~~~~~~~~~~~~~~~
      |                        memfd_secret
Hence I'm adding a condition so that memfd_secret is only compiled on
supported architectures.
Also check in the run_vmtests script whether the memfd_secret binary is
present before executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d75abd0d0bc29e6ebfebbf76d11b4067b35844af
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-stardust-create-e577@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d75abd0d0bc2 ("mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu")
96f96763de26 ("mm: memory-failure: convert to pr_fmt()")
98931dd95fd4 ("Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d75abd0d0bc29e6ebfebbf76d11b4067b35844af Mon Sep 17 00:00:00 2001
From: Waiman Long <longman(a)redhat.com>
Date: Tue, 6 Aug 2024 12:41:07 -0400
Subject: [PATCH] mm/memory-failure: use raw_spinlock_t in struct
memory_failure_cpu
The memory_failure_cpu structure is a per-cpu structure. Access to its
content requires the use of get_cpu_var() to lock in the current CPU and
disable preemption. The use of a regular spinlock_t for locking purposes
is fine for a non-RT kernel.
Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in an RT kernel becomes a sleeping lock, and taking a sleeping
lock in a preemption-disabled context is illegal, resulting in the
following kind of warning.
[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
[12135.732252] preempt_count: 1, expected: 0
[12135.732255] RCU nest depth: 2, expected: 2
:
[12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
[12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
[12135.732433] Call Trace:
[12135.732436] <TASK>
[12135.732450] dump_stack_lvl+0x57/0x81
[12135.732461] __might_resched.cold+0xf4/0x12f
[12135.732479] rt_spin_lock+0x4c/0x100
[12135.732491] memory_failure_queue+0x40/0xe0
[12135.732503] ghes_do_memory_failure+0x53/0x390
[12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0
[12135.732575] ghes_proc+0xf9/0x1a0
[12135.732591] ghes_notify_hed+0x6a/0x150
[12135.732602] notifier_call_chain+0x43/0xb0
[12135.732626] blocking_notifier_call_chain+0x43/0x60
[12135.732637] acpi_ev_notify_dispatch+0x47/0x70
[12135.732648] acpi_os_execute_deferred+0x13/0x20
[12135.732654] process_one_work+0x41f/0x500
[12135.732695] worker_thread+0x192/0x360
[12135.732715] kthread+0x111/0x140
[12135.732733] ret_from_fork+0x29/0x50
[12135.732779] </TASK>
Fix it by using a raw_spinlock_t for locking instead.
Also move the pr_err() out of the lock critical section and after
put_cpu_var() to avoid indeterminate latency and the possibility of
sleeping in this call.
[longman(a)redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Len Brown <len.brown(a)intel.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 581d3e5c9117..7066fc84f351 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2417,7 +2417,7 @@ struct memory_failure_entry {
struct memory_failure_cpu {
DECLARE_KFIFO(fifo, struct memory_failure_entry,
MEMORY_FAILURE_FIFO_SIZE);
- spinlock_t lock;
+ raw_spinlock_t lock;
struct work_struct work;
};
@@ -2443,20 +2443,22 @@ void memory_failure_queue(unsigned long pfn, int flags)
{
struct memory_failure_cpu *mf_cpu;
unsigned long proc_flags;
+ bool buffer_overflow;
struct memory_failure_entry entry = {
.pfn = pfn,
.flags = flags,
};
mf_cpu = &get_cpu_var(memory_failure_cpu);
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
- if (kfifo_put(&mf_cpu->fifo, entry))
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry);
+ if (!buffer_overflow)
schedule_work_on(smp_processor_id(), &mf_cpu->work);
- else
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ put_cpu_var(memory_failure_cpu);
+ if (buffer_overflow)
pr_err("buffer overflow when queuing memory failure at %#lx\n",
pfn);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
- put_cpu_var(memory_failure_cpu);
}
EXPORT_SYMBOL_GPL(memory_failure_queue);
@@ -2469,9 +2471,9 @@ static void memory_failure_work_func(struct work_struct *work)
mf_cpu = container_of(work, struct memory_failure_cpu, work);
for (;;) {
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
gotten = kfifo_get(&mf_cpu->fifo, &entry);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
if (!gotten)
break;
if (entry.flags & MF_SOFT_OFFLINE)
@@ -2501,7 +2503,7 @@ static int __init memory_failure_init(void)
for_each_possible_cpu(cpu) {
mf_cpu = &per_cpu(memory_failure_cpu, cpu);
- spin_lock_init(&mf_cpu->lock);
+ raw_spin_lock_init(&mf_cpu->lock);
INIT_KFIFO(mf_cpu->fifo);
INIT_WORK(&mf_cpu->work, memory_failure_work_func);
}
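A userspace sketch of the resulting ordering, under stated assumptions: a C11 atomic_flag busy-wait lock stands in for raw_spinlock_t (it never sleeps), a tiny ring buffer stands in for the kfifo, and there is no per-CPU data or preemption control. All names are hypothetical. What it mirrors from the patch is that the overflow decision is made inside the critical section, while the potentially slow fprintf() (standing in for pr_err()) runs only after the lock is released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define MF_FIFO_SIZE 4	/* hypothetical, deliberately tiny */

struct mf_entry {
	unsigned long pfn;
	int flags;
};

struct mf_queue {
	atomic_flag lock;	/* busy-wait lock: models raw_spinlock_t */
	struct mf_entry buf[MF_FIFO_SIZE];
	unsigned int in, out;	/* free-running indices, kfifo style */
};

static void mf_queue_init(struct mf_queue *q)
{
	atomic_flag_clear(&q->lock);
	q->in = q->out = 0;
}

static void mf_lock(struct mf_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin; never sleeps */
}

static void mf_unlock(struct mf_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Producer: same shape as the patched memory_failure_queue(). */
static bool mf_queue_failure(struct mf_queue *q, unsigned long pfn, int flags)
{
	bool buffer_overflow;

	mf_lock(q);
	buffer_overflow = (q->in - q->out == MF_FIFO_SIZE);
	if (!buffer_overflow) {
		q->buf[q->in % MF_FIFO_SIZE] = (struct mf_entry){ pfn, flags };
		q->in++;
	}
	mf_unlock(q);

	/* Report only after dropping the lock, as the patch does with pr_err(). */
	if (buffer_overflow)
		fprintf(stderr,
			"buffer overflow when queuing memory failure at %#lx\n",
			pfn);
	return !buffer_overflow;
}

/* Consumer: takes the lock once per entry, like memory_failure_work_func(). */
static bool mf_dequeue(struct mf_queue *q, struct mf_entry *e)
{
	bool gotten;

	mf_lock(q);
	gotten = (q->in != q->out);
	if (gotten)
		*e = q->buf[q->out++ % MF_FIFO_SIZE];
	mf_unlock(q);
	return gotten;
}
```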
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 58a63729c957621f1990c3494c702711188ca347
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081950-tweed-tattle-7f2c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
18010ff776fa ("net: mana: Fix race on per-CQ variable napi work_done")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58a63729c957621f1990c3494c702711188ca347 Mon Sep 17 00:00:00 2001
From: Long Li <longli(a)microsoft.com>
Date: Fri, 9 Aug 2024 08:58:58 -0700
Subject: [PATCH] net: mana: Fix doorbell out of order violation and avoid
unnecessary doorbell rings
After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. Combined with unnecessary rings when there is no need to arm the
CQ, this triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after ringing the
doorbell. It also limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, 4 CQ wraparounds prove to be large enough that this limit is
rarely exceeded before all the NAPI weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the doorbell is rung as soon as it passes 4 wraparounds of
the CQ.
Cc: stable(a)vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: Long Li <longli(a)microsoft.com>
Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhy…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ae717d06e66f..39f56973746d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1792,7 +1792,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
{
struct mana_cq *cq = context;
- u8 arm_bit;
int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
@@ -1803,16 +1802,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
mana_poll_tx_cq(cq);
w = cq->work_done;
+ cq->work_done_since_doorbell += w;
- if (w < cq->budget &&
- napi_complete_done(&cq->napi, w)) {
- arm_bit = SET_ARM_BIT;
- } else {
- arm_bit = 0;
+ if (w < cq->budget) {
+ mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
+ cq->work_done_since_doorbell = 0;
+ napi_complete_done(&cq->napi, w);
+ } else if (cq->work_done_since_doorbell >
+ cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
+ /* MANA hardware requires at least one doorbell ring every 8
+ * wraparounds of CQ even if there is no need to arm the CQ.
+ * This driver rings the doorbell as soon as we have exceeded
+ * 4 wraparounds.
+ */
+ mana_gd_ring_cq(gdma_queue, 0);
+ cq->work_done_since_doorbell = 0;
}
- mana_gd_ring_cq(gdma_queue, arm_bit);
-
return w;
}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..7caa334f4888 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -275,6 +275,7 @@ struct mana_cq {
/* NAPI data */
struct napi_struct napi;
int work_done;
+ int work_done_since_doorbell;
int budget;
};
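A minimal model of the new doorbell policy in mana_cq_handler(), written as a pure function so the decisions can be checked in isolation. Assumptions: COMP_ENTRY_SIZE is taken as 64 bytes and queue_size is in bytes, so queue_size / COMP_ENTRY_SIZE is the number of CQEs per wraparound; struct cq_model and enum ring_action are hypothetical, and NAPI itself is not modeled.

```c
#include <assert.h>

#define COMP_ENTRY_SIZE 64	/* assumed CQE size in bytes */

enum ring_action {
	RING_NONE,	/* no doorbell this round */
	RING_NO_ARM,	/* doorbell without SET_ARM_BIT */
	RING_ARM,	/* doorbell with SET_ARM_BIT, then napi_complete_done() */
};

struct cq_model {
	int budget;			/* NAPI budget for this poll */
	unsigned int queue_size;	/* CQ size in bytes */
	int work_done_since_doorbell;
};

/* w is the number of CQEs processed in this poll (cq->work_done). */
static enum ring_action cq_handler_model(struct cq_model *cq, int w)
{
	/* Number of CQEs that fit in four wraparounds of the queue. */
	int limit = (int)(cq->queue_size / COMP_ENTRY_SIZE * 4);

	cq->work_done_since_doorbell += w;

	if (w < cq->budget) {
		/* Ring and arm first; NAPI is completed only afterwards. */
		cq->work_done_since_doorbell = 0;
		return RING_ARM;
	}
	if (cq->work_done_since_doorbell > limit) {
		/* Hardware needs a ring at least every 8 wraparounds; ring past 4. */
		cq->work_done_since_doorbell = 0;
		return RING_NO_ARM;
	}
	return RING_NONE;
}
```

For example, with a 4096-byte CQ (64 CQEs per wraparound under the assumption above) and a budget of 64, four full-budget polls ring nothing, the fifth crosses 256 CQEs and rings without arming, and any poll below budget rings with the arm bit.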
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 58a63729c957621f1990c3494c702711188ca347
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081949-strategy-gatherer-758e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58a63729c957621f1990c3494c702711188ca347 Mon Sep 17 00:00:00 2001
From: Long Li <longli(a)microsoft.com>
Date: Fri, 9 Aug 2024 08:58:58 -0700
Subject: [PATCH] net: mana: Fix doorbell out of order violation and avoid
unnecessary doorbell rings
After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. Combined with unnecessary rings when there is no need to arm the
CQ, this triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after ringing the
doorbell. It also limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, 4 CQ wraparounds prove to be large enough that this limit is
rarely exceeded before all the NAPI weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the doorbell is rung as soon as it passes 4 wraparounds of
the CQ.
Cc: stable(a)vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: Long Li <longli(a)microsoft.com>
Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhy…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ae717d06e66f..39f56973746d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1792,7 +1792,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
{
struct mana_cq *cq = context;
- u8 arm_bit;
int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
@@ -1803,16 +1802,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
mana_poll_tx_cq(cq);
w = cq->work_done;
+ cq->work_done_since_doorbell += w;
- if (w < cq->budget &&
- napi_complete_done(&cq->napi, w)) {
- arm_bit = SET_ARM_BIT;
- } else {
- arm_bit = 0;
+ if (w < cq->budget) {
+ mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
+ cq->work_done_since_doorbell = 0;
+ napi_complete_done(&cq->napi, w);
+ } else if (cq->work_done_since_doorbell >
+ cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
+ /* MANA hardware requires at least one doorbell ring every 8
+ * wraparounds of CQ even if there is no need to arm the CQ.
+ * This driver rings the doorbell as soon as we have exceeded
+ * 4 wraparounds.
+ */
+ mana_gd_ring_cq(gdma_queue, 0);
+ cq->work_done_since_doorbell = 0;
}
- mana_gd_ring_cq(gdma_queue, arm_bit);
-
return w;
}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..7caa334f4888 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -275,6 +275,7 @@ struct mana_cq {
/* NAPI data */
struct napi_struct napi;
int work_done;
+ int work_done_since_doorbell;
int budget;
};
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 38055789d15155109b41602ad719d770af507030
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081920-gating-yummy-71e0@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
38055789d151 ("wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850")
26dd8ccdba4d ("wifi: ath12k: dynamic VLAN support")
97b7cbb7a3cb ("wifi: ath12k: support SMPS configuration for 6 GHz")
f0e61dc7ecf9 ("wifi: ath12k: refactor SMPS configuration")
112dbc6af807 ("wifi: ath12k: add 6 GHz params in peer assoc command")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 38055789d15155109b41602ad719d770af507030 Mon Sep 17 00:00:00 2001
From: Baochen Qiang <quic_bqiang(a)quicinc.com>
Date: Thu, 1 Aug 2024 18:04:07 +0300
Subject: [PATCH] wifi: ath12k: use 128 bytes aligned iova in transmit path for
WCN7850
In the transmit path, the iova is likely not aligned to the PCIe TLP max
payload size, which is 128 bytes for WCN7850. Normally in such cases the
hardware is expected to split the packet into several parts such that all
parts except the first have an aligned iova. However, due to hardware
limitations, WCN7850 does not do this properly for some specific
unaligned iovas in the transmit path. This easily results in a target hang
in a KPI transmit test: packet send/receive failures, WMI command send
timeouts, etc. A fatal error is also seen at the PCIe level:
...
Capabilities: ...
...
DevSta: ... FatalErr+ ...
...
...
Work around this by manually moving/reallocating the payload buffer so
that we can map it to a 128-byte aligned iova. Moving requires sufficient
headroom or tailroom in the skb: for the former we can help ourselves by
requesting some extra bytes when registering with mac80211, while for the
latter we can do nothing.
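The headroom/tailroom choice made by ath12k_dp_tx_align_payload() can be sketched in userspace on a plain buffer. This is an illustrative model, not driver code: struct fake_skb and its helpers are hypothetical stand-ins for sk_buff, skb_headroom() and skb_tailroom(), and the realloc fallback is only signalled by a return value. With a 128-byte TLP payload the mask is 127; delta1 moves the payload down to the previous 128-byte boundary and delta2 moves it up to the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the parts of struct sk_buff used here. */
struct fake_skb {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of the payload */
	size_t len;		/* payload length */
	unsigned char *end;	/* end of the allocation */
};

static size_t fake_headroom(const struct fake_skb *s)
{
	return (size_t)(s->data - s->head);
}

static size_t fake_tailroom(const struct fake_skb *s)
{
	return (size_t)(s->end - (s->data + s->len));
}

/* Slide the payload by delta bytes toward the head or toward the tail. */
static void fake_move_payload(struct fake_skb *s, size_t delta, bool head)
{
	unsigned char *dst = head ? s->data - delta : s->data + delta;

	memmove(dst, s->data, s->len);	/* source and destination may overlap */
	s->data = dst;
}

/*
 * Returns 0 once s->data is aligned to (mask + 1) bytes, or -1 when
 * neither headroom nor tailroom suffices (the driver reallocates then).
 */
static int fake_align_payload(struct fake_skb *s, uintptr_t mask)
{
	uintptr_t offset = (uintptr_t)s->data & mask;
	size_t delta1 = offset;			/* down to previous boundary */
	size_t delta2 = mask - offset + 1;	/* up to next boundary */

	if (offset == 0)
		return 0;
	if (fake_headroom(s) >= delta1)
		fake_move_payload(s, delta1, true);
	else if (fake_tailroom(s) >= delta2)
		fake_move_payload(s, delta2, false);
	else
		return -1;
	return 0;
}
```

Preferring headroom first matches the driver's order, since headroom is the side the driver can enlarge by requesting extra bytes from mac80211.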
Moving/reallocating the buffer consumes additional CPU cycles, but the good
news is that an aligned iova increases PCIe efficiency. In my tests on some
x86 platforms, the KPI results remain almost the same.
Since this is seen only with WCN7850, add a new hardware parameter to
differentiate it from other chips.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Signed-off-by: Baochen Qiang <quic_bqiang(a)quicinc.com>
Cc: <stable(a)vger.kernel.org>
Tested-by: Mark Pearson <mpearson-lenovo(a)squebb.ca>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://patch.msgid.link/20240715023814.20242-1-quic_bqiang@quicinc.com
diff --git a/drivers/net/wireless/ath/ath12k/dp_tx.c b/drivers/net/wireless/ath/ath12k/dp_tx.c
index d08c04343e90..44406e0b4a34 100644
--- a/drivers/net/wireless/ath/ath12k/dp_tx.c
+++ b/drivers/net/wireless/ath/ath12k/dp_tx.c
@@ -162,6 +162,60 @@ static int ath12k_dp_prepare_htt_metadata(struct sk_buff *skb)
return 0;
}
+static void ath12k_dp_tx_move_payload(struct sk_buff *skb,
+ unsigned long delta,
+ bool head)
+{
+ unsigned long len = skb->len;
+
+ if (head) {
+ skb_push(skb, delta);
+ memmove(skb->data, skb->data + delta, len);
+ skb_trim(skb, len);
+ } else {
+ skb_put(skb, delta);
+ memmove(skb->data + delta, skb->data, len);
+ skb_pull(skb, delta);
+ }
+}
+
+static int ath12k_dp_tx_align_payload(struct ath12k_base *ab,
+ struct sk_buff **pskb)
+{
+ u32 iova_mask = ab->hw_params->iova_mask;
+ unsigned long offset, delta1, delta2;
+ struct sk_buff *skb2, *skb = *pskb;
+ unsigned int headroom = skb_headroom(skb);
+ int tailroom = skb_tailroom(skb);
+ int ret = 0;
+
+ offset = (unsigned long)skb->data & iova_mask;
+ delta1 = offset;
+ delta2 = iova_mask - offset + 1;
+
+ if (headroom >= delta1) {
+ ath12k_dp_tx_move_payload(skb, delta1, true);
+ } else if (tailroom >= delta2) {
+ ath12k_dp_tx_move_payload(skb, delta2, false);
+ } else {
+ skb2 = skb_realloc_headroom(skb, iova_mask);
+ if (!skb2) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ dev_kfree_skb_any(skb);
+
+ offset = (unsigned long)skb2->data & iova_mask;
+ if (offset)
+ ath12k_dp_tx_move_payload(skb2, offset, true);
+ *pskb = skb2;
+ }
+
+out:
+ return ret;
+}
+
int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
struct sk_buff *skb)
{
@@ -184,6 +238,7 @@ int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
bool tcl_ring_retry;
bool msdu_ext_desc = false;
bool add_htt_metadata = false;
+ u32 iova_mask = ab->hw_params->iova_mask;
if (test_bit(ATH12K_FLAG_CRASH_FLUSH, &ar->ab->dev_flags))
return -ESHUTDOWN;
@@ -279,6 +334,23 @@ int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
goto fail_remove_tx_buf;
}
+ if (iova_mask &&
+ (unsigned long)skb->data & iova_mask) {
+ ret = ath12k_dp_tx_align_payload(ab, &skb);
+ if (ret) {
+ ath12k_warn(ab, "failed to align TX buffer %d\n", ret);
+ /* don't bail out, give original buffer
+ * a chance even unaligned.
+ */
+ goto map;
+ }
+
+ /* hdr is pointing to a wrong place after alignment,
+ * so refresh it for later use.
+ */
+ hdr = (void *)skb->data;
+ }
+map:
ti.paddr = dma_map_single(ab->dev, skb->data, skb->len, DMA_TO_DEVICE);
if (dma_mapping_error(ab->dev, ti.paddr)) {
atomic_inc(&ab->soc_stats.tx_err.misc_fail);
diff --git a/drivers/net/wireless/ath/ath12k/hw.c b/drivers/net/wireless/ath/ath12k/hw.c
index 2e11ea763574..7b0b6a7f4701 100644
--- a/drivers/net/wireless/ath/ath12k/hw.c
+++ b/drivers/net/wireless/ath/ath12k/hw.c
@@ -924,6 +924,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = NULL,
.supports_dynamic_smps_6ghz = true,
+
+ .iova_mask = 0,
},
{
.name = "wcn7850 hw2.0",
@@ -1000,6 +1002,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = &wcn7850_uuid,
.supports_dynamic_smps_6ghz = false,
+
+ .iova_mask = ATH12K_PCIE_MAX_PAYLOAD_SIZE - 1,
},
{
.name = "qcn9274 hw2.0",
@@ -1072,6 +1076,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = NULL,
.supports_dynamic_smps_6ghz = true,
+
+ .iova_mask = 0,
},
};
diff --git a/drivers/net/wireless/ath/ath12k/hw.h b/drivers/net/wireless/ath/ath12k/hw.h
index e792eb6b249b..b1d302c48326 100644
--- a/drivers/net/wireless/ath/ath12k/hw.h
+++ b/drivers/net/wireless/ath/ath12k/hw.h
@@ -96,6 +96,8 @@
#define ATH12K_M3_FILE "m3.bin"
#define ATH12K_REGDB_FILE_NAME "regdb.bin"
+#define ATH12K_PCIE_MAX_PAYLOAD_SIZE 128
+
enum ath12k_hw_rate_cck {
ATH12K_HW_RATE_CCK_LP_11M = 0,
ATH12K_HW_RATE_CCK_LP_5_5M,
@@ -215,6 +217,8 @@ struct ath12k_hw_params {
const guid_t *acpi_guid;
bool supports_dynamic_smps_6ghz;
+
+ u32 iova_mask;
};
struct ath12k_hw_ops {
diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c
index 8106297f0bc1..ce41c8153080 100644
--- a/drivers/net/wireless/ath/ath12k/mac.c
+++ b/drivers/net/wireless/ath/ath12k/mac.c
@@ -9193,6 +9193,7 @@ static int ath12k_mac_hw_register(struct ath12k_hw *ah)
hw->vif_data_size = sizeof(struct ath12k_vif);
hw->sta_data_size = sizeof(struct ath12k_sta);
+ hw->extra_tx_headroom = ab->hw_params->iova_mask;
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_CQM_RSSI_LIST);
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_STA_TX_PWR);
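The alignment decision in ath12k_dp_tx_align_payload() can be tried out in user space. The following is only a sketch under stated assumptions. struct buf, align_payload() and the ALIGN_* codes are hypothetical stand-ins for the sk_buff helpers (skb_push(), skb_put(), skb_pull(), skb_trim()) and are not part of the driver. The reallocation branch is reported rather than performed. With the WCN7850 value iova_mask = ATH12K_PCIE_MAX_PAYLOAD_SIZE - 1 = 127, a payload that starts 5 bytes past a 128-byte boundary can be moved either 5 bytes toward the head (delta1) or 123 bytes toward the tail (delta2).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: [head, end) is the allocation and
 * [data, data + len) is the payload. */
struct buf {
	unsigned char *head;
	unsigned char *end;
	unsigned char *data;
	size_t len;
};

enum align_result { ALIGN_NONE, ALIGN_HEAD, ALIGN_TAIL, ALIGN_NEED_REALLOC };

/* Mirrors the decision order of ath12k_dp_tx_align_payload(): try moving
 * the payload toward the head, then toward the tail, else report that a
 * reallocation would be needed. */
static enum align_result align_payload(struct buf *b, uintptr_t iova_mask)
{
	uintptr_t offset = (uintptr_t)b->data & iova_mask;
	size_t delta1 = offset;                 /* down to the previous boundary */
	size_t delta2 = iova_mask - offset + 1; /* up to the next boundary */
	size_t headroom = (size_t)(b->data - b->head);
	size_t tailroom = (size_t)(b->end - (b->data + b->len));

	if (offset == 0)
		return ALIGN_NONE;

	if (headroom >= delta1) {
		/* memmove() copes with overlapping source and destination */
		memmove(b->data - delta1, b->data, b->len);
		b->data -= delta1;
		return ALIGN_HEAD;
	}
	if (tailroom >= delta2) {
		memmove(b->data + delta2, b->data, b->len);
		b->data += delta2;
		return ALIGN_TAIL;
	}
	return ALIGN_NEED_REALLOC; /* the driver calls skb_realloc_headroom() here */
}
```

For example, a 10-byte payload at offset 5 inside a 128-byte-aligned, 512-byte buffer with full headroom ends up at offset 0 with its contents intact.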
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081926-thrower-salaried-4605@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
4e91fa1ef3ce ("i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume")
14d02fbadb5d ("i2c: qcom-geni: add desc struct to prepare support for I2C Master Hub variant")
d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
357ee8841d0b ("i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct")
9cb4c67d7717 ("Revert "i2c: i2c-qcom-geni: Fix DMA transfer race"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 Mon Sep 17 00:00:00 2001
From: Andi Shyti <andi.shyti(a)kernel.org>
Date: Mon, 12 Aug 2024 21:40:28 +0200
Subject: [PATCH] i2c: qcom-geni: Add missing geni_icc_disable in
geni_i2c_runtime_resume
Add the missing geni_icc_disable() call before returning in the
geni_i2c_runtime_resume() function.
Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing
geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
disabling the interconnect in one case: the clk_prepare_enable()
error path.
Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support")
Cc: Gaosheng Cui <cuigaosheng1(a)huawei.com>
Cc: stable(a)vger.kernel.org # v5.9+
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 365e37bba0f3..06e836e3e877 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -986,8 +986,10 @@ static int __maybe_unused geni_i2c_runtime_resume(struct device *dev)
return ret;
ret = clk_prepare_enable(gi2c->core_clk);
- if (ret)
+ if (ret) {
+ geni_icc_disable(&gi2c->se);
return ret;
+ }
ret = geni_se_resources_on(&gi2c->se);
if (ret) {
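The fix follows the usual rule for resume paths: each error exit must undo every step that already succeeded, in reverse order. The sketch below models that rule in user space with a goto-based unwind ladder, a common kernel idiom. struct fake_dev, step() and fake_runtime_resume() are hypothetical and only stand in for geni_icc_enable(), clk_prepare_enable() and geni_se_resources_on(). The upstream patch itself uses an explicit call inside an if block rather than labels.

```c
#include <errno.h>

/* Hypothetical resource model: each counter must be back at zero
 * whenever resume fails. */
struct fake_dev {
	int icc_on;  /* models the interconnect vote (geni_icc_enable) */
	int clk_on;  /* models the core clock (clk_prepare_enable) */
	int se_on;   /* models geni_se_resources_on() */
	int fail_at; /* 1, 2 or 3: which step fails; 0: none */
};

static int step(struct fake_dev *d, int which, int *counter)
{
	if (d->fail_at == which)
		return -EIO;
	(*counter)++;
	return 0;
}

static int fake_runtime_resume(struct fake_dev *d)
{
	int ret;

	ret = step(d, 1, &d->icc_on);
	if (ret)
		return ret;      /* nothing acquired yet */

	ret = step(d, 2, &d->clk_on);
	if (ret)
		goto err_icc;    /* the case 4e91fa1ef3ce fixes */

	ret = step(d, 3, &d->se_on);
	if (ret)
		goto err_clk;

	return 0;

err_clk:
	d->clk_on--;             /* undo the clock step */
err_icc:
	d->icc_on--;             /* undo the interconnect step */
	return ret;
}
```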
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-editor-flinch-ca97@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4e91fa1ef3ce ("i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume")
14d02fbadb5d ("i2c: qcom-geni: add desc struct to prepare support for I2C Master Hub variant")
d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 Mon Sep 17 00:00:00 2001
From: Andi Shyti <andi.shyti(a)kernel.org>
Date: Mon, 12 Aug 2024 21:40:28 +0200
Subject: [PATCH] i2c: qcom-geni: Add missing geni_icc_disable in
geni_i2c_runtime_resume
Add the missing geni_icc_disable() call before returning in the
geni_i2c_runtime_resume() function.
Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing
geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
disabling the interconnect in one case: the clk_prepare_enable()
error path.
Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support")
Cc: Gaosheng Cui <cuigaosheng1(a)huawei.com>
Cc: stable(a)vger.kernel.org # v5.9+
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 365e37bba0f3..06e836e3e877 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -986,8 +986,10 @@ static int __maybe_unused geni_i2c_runtime_resume(struct device *dev)
return ret;
ret = clk_prepare_enable(gi2c->core_clk);
- if (ret)
+ if (ret) {
+ geni_icc_disable(&gi2c->se);
return ret;
+ }
ret = geni_se_resources_on(&gi2c->se);
if (ret) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-lapping-sedation-f06f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
85067747cf98 ("dm: do not use waitqueue for request-based DM")
087615bf3acd ("dm mpath: pass IO start time to path selector")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
Change device mapper so that it returns -ERESTARTSYS instead of -EINTR
when it is interrupted by a signal, so that the ioctl can be restarted.
The signal(7) man page says that the ioctl function is restarted if the
signal handler was installed with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
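The difference the patch makes is visible from user space. Suppose a blocking system call is interrupted and the kernel returns -ERESTARTSYS. A handler installed with SA_RESTART then lets the call restart transparently, while a handler installed without it makes the call fail with EINTR. Returning -EINTR directly, as dm did before the patch, yields EINTR in both cases. The sketch below assumes Linux and uses read() on an empty pipe in place of a device-mapper ioctl, which would need root and a dm device. read_interrupted_by_alarm() and on_alarm() are hypothetical helpers, and the 100 ms timer is an arbitrary choice.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int pipe_fds[2];

/* SIGALRM handler; write() is async-signal-safe. The byte it writes is
 * only seen by read() if read() is restarted after the handler returns. */
static void on_alarm(int sig)
{
	char c = 'x';
	ssize_t r;

	(void)sig;
	r = write(pipe_fds[1], &c, 1);
	(void)r;
}

/* Blocks in read() on an empty pipe until a one-shot timer fires.
 * Returns read()'s result and stores errno (or 0) in *err. */
static ssize_t read_interrupted_by_alarm(int sa_flags, int *err)
{
	struct sigaction sa;
	struct itimerval it;
	char c;
	ssize_t n;

	if (pipe(pipe_fds) < 0) {
		*err = errno;
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sa.sa_flags = sa_flags;           /* 0 or SA_RESTART */
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, NULL);

	memset(&it, 0, sizeof(it));
	it.it_value.tv_usec = 100000;     /* one-shot, 100 ms */
	setitimer(ITIMER_REAL, &it, NULL);

	n = read(pipe_fds[0], &c, 1);     /* pipe is empty: blocks */
	*err = (n < 0) ? errno : 0;

	close(pipe_fds[0]);
	close(pipe_fds[1]);
	return n;
}
```

Called with 0, the read() is expected to fail with EINTR. Called with SA_RESTART, it is restarted after the handler has written one byte into the pipe, so it returns 1.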
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-voting-handwrite-258d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
Change device mapper so that it returns -ERESTARTSYS instead of -EINTR
when it is interrupted by a signal, so that the ioctl can be restarted.
The signal(7) man page says that the ioctl function is restarted if the
signal handler was installed with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-comic-charcoal-8b91@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
Change device mapper so that it returns -ERESTARTSYS instead of -EINTR
when it is interrupted by a signal, so that the ioctl can be restarted.
The signal(7) man page says that the ioctl function is restarted if the
signal handler was installed with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081923-move-improving-38ad@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
Change device mapper so that it returns -ERESTARTSYS instead of -EINTR
when it is interrupted by a signal, so that the ioctl can be restarted.
The signal(7) man page says that the ioctl function is restarted if the
signal handler was installed with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081924-fidelity-leverage-7546@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
Change device mapper so that it returns -ERESTARTSYS instead of -EINTR
when it is interrupted by a signal, so that the ioctl can be restarted.
The signal(7) man page says that the ioctl function is restarted if the
signal handler was installed with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 74c2ab6d653b4c2354df65a7f7f2df1925a40a51
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-sustained-humbly-8aaf@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
74c2ab6d653b ("smb/client: avoid possible NULL dereference in cifs_free_subrequest()")
519be989717c ("cifs: Add a tracepoint to track credits involved in R/W requests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74c2ab6d653b4c2354df65a7f7f2df1925a40a51 Mon Sep 17 00:00:00 2001
From: Su Hui <suhui(a)nfschina.com>
Date: Thu, 8 Aug 2024 20:23:32 +0800
Subject: [PATCH] smb/client: avoid possible NULL dereference in
cifs_free_subrequest()
Clang static checker (scan-build) warning:
cifsglob.h:line 890, column 3
Access to field 'ops' results in a dereference of a null pointer.
Commit 519be989717c ("cifs: Add a tracepoint to track credits involved in
R/W requests") added a check for 'rdata->server', which led clang to emit
this NULL dereference warning.
When 'rdata->credits.value != 0 && rdata->server == NULL' happens,
add_credits_and_wake_if() will call rdata->server->ops->add_credits(),
causing a NULL pointer dereference. Add a check for 'rdata->server'
to avoid it.
Cc: stable(a)vger.kernel.org
Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b2405dd4d4d4..45459af5044d 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -315,7 +315,7 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq)
#endif
}
- if (rdata->credits.value != 0)
+ if (rdata->credits.value != 0) {
trace_smb3_rw_credits(rdata->rreq->debug_id,
rdata->subreq.debug_index,
rdata->credits.value,
@@ -323,8 +323,12 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq)
rdata->server ? rdata->server->in_flight : 0,
-rdata->credits.value,
cifs_trace_rw_credits_free_subreq);
+ if (rdata->server)
+ add_credits_and_wake_if(rdata->server, &rdata->credits, 0);
+ else
+ rdata->credits.value = 0;
+ }
- add_credits_and_wake_if(rdata->server, &rdata->credits, 0);
if (rdata->have_xid)
free_xid(rdata->xid);
}
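The shape of the fix can be modelled with simplified types. In this sketch, struct server, struct credits and free_subrequest_credits() are hypothetical. The addition to server->credits stands in for add_credits_and_wake_if(), and the model resets the subrequest's credit count to zero in both branches, matching the else branch added by the patch. The key point is that the server pointer is checked before it is dereferenced, and the credits are simply dropped when there is no server to return them to.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the SMB client types. */
struct server {
	int credits;  /* credits available to new requests */
};

struct credits {
	int value;    /* credits still held by this subrequest */
};

/* Models the fixed logic in cifs_free_subrequest(): hand credits back only
 * when there is a server to return them to; otherwise just drop them. */
static void free_subrequest_credits(struct server *server, struct credits *cr)
{
	if (cr->value == 0)
		return;
	if (server)
		server->credits += cr->value;  /* stands in for add_credits_and_wake_if() */
	cr->value = 0;
}
```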
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081915-antitoxic-kennel-6f2e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default POSIX semantics, and it is also enforced inconsistently.
This apparently breaks recent versions of LibreOffice, but it can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change a write to a locked
range fails (breaking POSIX semantics) only when the user
intentionally specifies "forcemandatorylock" on mount.
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
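The new condition in cifs_writev() relies on && short-circuiting, so the lock-conflict lookup only runs when the mount flag is set. The sketch below models that decision. EXAMPLE_MOUNT_NOPOSIXBRL, range_has_conflicting_lock() and cached_write_check() are hypothetical names, and the flag value is made up rather than copied from the SMB client headers. Per the commit message, the flag corresponds to mounting with "forcemandatorylock".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for CIFS_MOUNT_NOPOSIXBRL. */
#define EXAMPLE_MOUNT_NOPOSIXBRL 0x1u

/* Hypothetical conflict checker that counts how often it is called. */
static int conflict_checks;

static bool range_has_conflicting_lock(bool conflict)
{
	conflict_checks++;
	return conflict;
}

/* Models the fixed cifs_writev() check: a conflicting byte-range lock only
 * fails the cached write when mandatory-lock semantics were requested. */
static int cached_write_check(unsigned int mount_flags, bool conflict)
{
	if ((mount_flags & EXAMPLE_MOUNT_NOPOSIXBRL) &&
	    range_has_conflicting_lock(conflict))
		return -EACCES;
	return 0;  /* proceed with the buffered write */
}
```

Without the flag, a conflicting lock no longer fails the write and the checker is never called. With the flag, the write fails with -EACCES as it did before the patch.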
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081914-divided-overhaul-0e25@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default POSIX semantics, and it is also enforced inconsistently.
This apparently breaks recent versions of LibreOffice, but it can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change a write to a locked
range fails (breaking POSIX semantics) only when the user
intentionally specifies "forcemandatorylock" on mount.
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081913-dominoes-octagon-edcc@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default POSIX semantics, and it is also enforced inconsistently.
This apparently breaks recent versions of LibreOffice, but it can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change a write to a locked
range fails (breaking POSIX semantics) only when the user
intentionally specifies "forcemandatorylock" on mount.
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
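The reproducer described in the commit message (two opens of the same file from one client, a lock through the first handle, a write through the second) can be sketched in userspace C roughly as follows. This is an illustration, not a test shipped with the patch: the function name `lock_then_write_other_handle()` is hypothetical, and `path` is assumed to name a file on the cifs mount under test.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical reproducer: take a whole-file write lock through one
 * descriptor, then write through a second descriptor for the same file.
 * Returns 0 if the write succeeds, otherwise the errno that was observed.
 */
int lock_then_write_other_handle(const char *path)
{
	int rc = 0;
	int fd1 = open(path, O_RDWR | O_CREAT, 0644);
	if (fd1 < 0)
		return errno;

	int fd2 = open(path, O_RDWR);	/* second handle, same client */
	if (fd2 < 0) {
		rc = errno;
		close(fd1);
		return rc;
	}

	struct flock fl;
	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;		/* exclusive (write) lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* 0 means "to end of file" */

	if (fcntl(fd1, F_SETLK, &fl) < 0) {
		rc = errno;		/* could not take the lock at all */
	} else {
		ssize_t n = write(fd2, "x", 1);
		if (n < 0)
			rc = errno;	/* EACCES on an affected mount, per the commit */
		else if (n != 1)
			rc = EIO;	/* short write: treat as failure */
	}

	close(fd2);			/* also drops this process's POSIX locks */
	close(fd1);
	return rc;
}
```

For example, calling it on a file under a hypothetical cifs mount point such as `/mnt/share` is expected to return 0 on a fixed kernel mounted without "forcemandatorylock" and, per the commit message, EACCES on an affected one. On a local filesystem it returns 0, since POSIX record locks are advisory and owned by the process rather than by the descriptor.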
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081912-hesitate-manor-84c0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default POSIX semantics, and it is also enforced inconsistently.
This apparently breaks recent versions of LibreOffice, but it can
also be demonstrated by opening a file twice from the same
client, locking it through handle one and writing to it through
handle two (the write fails with EACCES).
Since there is already a mount option "forcemandatorylock"
(which defaults to off), with this change POSIX semantics for
writes to a locked range are broken only when the user
intentionally mounts with "forcemandatorylock" (i.e. a write
to a locked range fails only in that case).
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-flame-used-1927@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default POSIX semantics, and it is also enforced inconsistently.
This apparently breaks recent versions of LibreOffice, but it can
also be demonstrated by opening a file twice from the same
client, locking it through handle one and writing to it through
handle two (the write fails with EACCES).
Since there is already a mount option "forcemandatorylock"
(which defaults to off), with this change POSIX semantics for
writes to a locked range are broken only when the user
intentionally mounts with "forcemandatorylock" (i.e. a write
to a locked range fails only in that case).
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081910-dawn-handoff-e37b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default POSIX semantics, and it is also enforced inconsistently.
This apparently breaks recent versions of LibreOffice, but it can
also be demonstrated by opening a file twice from the same
client, locking it through handle one and writing to it through
handle two (the write fails with EACCES).
Since there is already a mount option "forcemandatorylock"
(which defaults to off), with this change POSIX semantics for
writes to a locked range are broken only when the user
intentionally mounts with "forcemandatorylock" (i.e. a write
to a locked range fails only in that case).
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The return value of drm_atomic_get_crtc_state() needs to be
checked to avoid using the error pointer 'crtc_state' in case
of failure.
Cc: stable(a)vger.kernel.org
Fixes: dec92020671c ("drm: Use the state pointer directly in planes atomic_check")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_cursor.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_cursor.c b/drivers/gpu/drm/sti/sti_cursor.c
index db0a1eb53532..e460f5ba2d87 100644
--- a/drivers/gpu/drm/sti/sti_cursor.c
+++ b/drivers/gpu/drm/sti/sti_cursor.c
@@ -200,6 +200,8 @@ static int sti_cursor_atomic_check(struct drm_plane *drm_plane,
return 0;
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
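The fix relies on the kernel's error-pointer convention: drm_atomic_get_crtc_state() can return an errno encoded in the pointer value (for example when allocation or lock acquisition fails) instead of a valid state, so the result must be tested with IS_ERR() before it is dereferenced. The userspace sketch below re-creates simplified versions of ERR_PTR(), IS_ERR() and PTR_ERR(), modelled on include/linux/err.h where MAX_ERRNO is 4095; `struct crtc_state_model`, `get_crtc_state_model()` and `cursor_check_model()` are hypothetical stand-ins, not DRM API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace copies of the helpers in include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct drm_crtc_state. */
struct crtc_state_model {
	int mode_hdisplay;
};

static struct crtc_state_model the_state = { .mode_hdisplay = 1920 };

/* Hypothetical getter that fails with -ENOMEM when asked to. */
static void *get_crtc_state_model(bool fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &the_state;
}

/* Mirrors the shape of the fixed atomic_check: return the encoded
 * errno instead of dereferencing an error pointer. */
int cursor_check_model(bool fail, int *hdisplay_out)
{
	struct crtc_state_model *crtc_state = get_crtc_state_model(fail);

	if (IS_ERR(crtc_state))
		return (int)PTR_ERR(crtc_state);

	*hdisplay_out = crtc_state->mode_hdisplay;
	return 0;
}
```

Without the IS_ERR() test, the failing path would read `mode_hdisplay` through an address just below the top of the address space, which is the class of bug the patch closes in sti_cursor_atomic_check().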
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6e6f58a170ea98e44075b761f2da42a5aec47dfb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081916-anguished-snooze-7b66@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
6e6f58a170ea ("thermal: gov_bang_bang: Use governor_data to reduce overhead")
5f64b4a1ab1b ("thermal: gov_bang_bang: Add .manage() callback")
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6e6f58a170ea98e44075b761f2da42a5aec47dfb Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:29:11 +0200
Subject: [PATCH] thermal: gov_bang_bang: Use governor_data to reduce overhead
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.
For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.
However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.
To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index bc55e0698bfa..daed67d19efb 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -86,6 +86,10 @@ static void bang_bang_manage(struct thermal_zone_device *tz)
const struct thermal_trip_desc *td;
struct thermal_instance *instance;
+ /* If the code below has run already, nothing needs to be done. */
+ if (tz->governor_data)
+ return;
+
for_each_trip_desc(tz, td) {
const struct thermal_trip *trip = &td->trip;
@@ -107,11 +111,25 @@ static void bang_bang_manage(struct thermal_zone_device *tz)
bang_bang_set_instance_target(instance, 0);
}
}
+
+ tz->governor_data = (void *)true;
+}
+
+static void bang_bang_update_tz(struct thermal_zone_device *tz,
+ enum thermal_notify_event reason)
+{
+ /*
+ * Let bang_bang_manage() know that it needs to walk trips after binding
+ * a new cdev and after system resume.
+ */
+ if (reason == THERMAL_TZ_BIND_CDEV || reason == THERMAL_TZ_RESUME)
+ tz->governor_data = NULL;
}
static struct thermal_governor thermal_gov_bang_bang = {
.name = "bang_bang",
.trip_crossed = bang_bang_control,
.manage = bang_bang_manage,
+ .update_tz = bang_bang_update_tz,
};
THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang);
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 95c399f94744..e6669aeda1ff 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1728,7 +1728,8 @@ static void thermal_zone_device_resume(struct work_struct *work)
thermal_debug_tz_resume(tz);
thermal_zone_device_init(tz);
- __thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+ thermal_governor_update_tz(tz, THERMAL_TZ_RESUME);
+ __thermal_zone_device_update(tz, THERMAL_TZ_RESUME);
complete(&tz->resume);
tz->resuming = false;
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 25fbf960b474..b86ddca46b9e 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -55,6 +55,7 @@ enum thermal_notify_event {
THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */
THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */
THERMAL_INSTANCE_WEIGHT_CHANGED, /* Thermal instance weight changed */
+ THERMAL_TZ_RESUME, /* Thermal zone is resuming after system sleep */
};
/**
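The caching added by this patch treats tz->governor_data as a one-shot "walk already done" flag that bang_bang_update_tz() clears when a cooling device is bound and on system resume. A minimal userspace model of that control flow is sketched below; the structure, the enum values and the `walks` counter are hypothetical simplifications, not the thermal core's types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of enum thermal_notify_event. */
enum notify_model {
	MODEL_TZ_BIND_CDEV,
	MODEL_TZ_UNBIND_CDEV,
	MODEL_TZ_RESUME,
	MODEL_TEMP_SAMPLE,
};

/* Hypothetical, heavily reduced thermal zone. */
struct tz_model {
	void *governor_data;	/* non-NULL once the trip walk has run */
	int walks;		/* counts how often the expensive walk ran */
};

/* Models bang_bang_manage(): do the walk once, then short-circuit. */
void bb_manage_once_model(struct tz_model *tz)
{
	if (tz->governor_data)
		return;

	tz->walks++;		/* stands in for the for_each_trip_desc() loop */

	tz->governor_data = (void *)true;
}

/* Models bang_bang_update_tz(): force a new walk after a cooling
 * device is bound or after system resume. */
void bb_update_tz_model(struct tz_model *tz, enum notify_model reason)
{
	if (reason == MODEL_TZ_BIND_CDEV || reason == MODEL_TZ_RESUME)
		tz->governor_data = NULL;
}
```

As in the patch, only the bind and resume events clear the flag; any other notification leaves the cached state in place, so repeated .manage() calls cost a single pointer test.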
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5f64b4a1ab1b0412446d42e1fc2964c2cdb60b27
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-parlor-reformer-542a@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
5f64b4a1ab1b ("thermal: gov_bang_bang: Add .manage() callback")
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f64b4a1ab1b0412446d42e1fc2964c2cdb60b27 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:27:33 +0200
Subject: [PATCH] thermal: gov_bang_bang: Add .manage() callback
After recent changes, the Bang-bang governor may not adjust the
initial configuration of cooling devices to the actual situation.
Namely, if a cooling device bound to a certain trip point starts in
the "on" state and the thermal zone temperature is below the threshold
of that trip point, the trip point may never be crossed on the way up
in which case the state of the cooling device will never be adjusted
because the thermal core will never invoke the governor's
.trip_crossed() callback. [Note that there is no issue if the zone
temperature is at the trip threshold or above it to start with because
.trip_crossed() will be invoked then to indicate the start of thermal
mitigation for the given trip.]
To address this, add a .manage() callback to the Bang-bang governor
and use it to ensure that all of the thermal instances managed by the
governor have been initialized properly and the states of all of the
cooling devices involved have been adjusted to the current zone
temperature as appropriate.
Fixes: 530c932bdf75 ("thermal: gov_bang_bang: Use .trip_crossed() instead of .throttle()")
Link: https://lore.kernel.org/linux-pm/1bfbbae5-42b0-4c7d-9544-e98855715294@piie.…
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Link: https://patch.msgid.link/8419356.T7Z3S40VBb@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index 87cff3ea77a9..bc55e0698bfa 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -26,6 +26,7 @@ static void bang_bang_set_instance_target(struct thermal_instance *instance,
* when the trip is crossed on the way down.
*/
instance->target = target;
+ instance->initialized = true;
dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
@@ -80,8 +81,37 @@ static void bang_bang_control(struct thermal_zone_device *tz,
}
}
+static void bang_bang_manage(struct thermal_zone_device *tz)
+{
+ const struct thermal_trip_desc *td;
+ struct thermal_instance *instance;
+
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
+
+ if (tz->temperature >= td->threshold ||
+ trip->temperature == THERMAL_TEMP_INVALID ||
+ trip->type == THERMAL_TRIP_CRITICAL ||
+ trip->type == THERMAL_TRIP_HOT)
+ continue;
+
+ /*
+ * If the initial cooling device state is "on", but the zone
+ * temperature is not above the trip point, the core will not
+ * call bang_bang_control() until the zone temperature reaches
+ * the trip point temperature which may be never. In those
+ * cases, set the initial state of the cooling device to 0.
+ */
+ list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
+ if (!instance->initialized && instance->trip == trip)
+ bang_bang_set_instance_target(instance, 0);
+ }
+ }
+}
+
static struct thermal_governor thermal_gov_bang_bang = {
.name = "bang_bang",
.trip_crossed = bang_bang_control,
+ .manage = bang_bang_manage,
};
THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang);
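The rule that bang_bang_manage() applies can be modelled in isolation: every instance the governor has not touched yet, whose trip threshold is above the current zone temperature, is switched off, because .trip_crossed() may never be invoked for it. The sketch below is a simplified userspace model under stated assumptions: the structures and millidegree values are hypothetical, the `ignored` flag stands in for the invalid, critical and hot trip checks, it iterates over instances directly rather than over trips and then instances, and it does not call into any cooling device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced trip: threshold in millidegrees Celsius. */
struct trip_model {
	int threshold;
	bool ignored;		/* stands in for invalid, critical and hot trips */
};

/* Hypothetical reduced thermal instance. */
struct instance_model {
	const struct trip_model *trip;
	long target;		/* 0 = off, 1 = on */
	bool initialized;	/* set whenever the governor picks a target */
};

/* Models bang_bang_set_instance_target() without the cdev update. */
static void set_target(struct instance_model *inst, long target)
{
	inst->target = target;
	inst->initialized = true;
}

/* Models bang_bang_manage(): switch off untouched devices whose trip
 * has not been reached yet. */
void bb_manage_model(int zone_temp, struct instance_model *insts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct instance_model *inst = &insts[i];

		if (inst->trip->ignored || zone_temp >= inst->trip->threshold)
			continue;
		if (!inst->initialized)
			set_target(inst, 0);
	}
}
```

Instances whose trip has already been reached are left to .trip_crossed(), which the core invokes in that case, so the model (like the patch) only fills the gap for devices that start "on" below their trip.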
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 84248e35d9b60e03df7276627e4e91fbaf80f73d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081904-sprite-urologist-d8e9@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 84248e35d9b60e03df7276627e4e91fbaf80f73d Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:26:42 +0200
Subject: [PATCH] thermal: gov_bang_bang: Split bang_bang_control()
Move the setting of the thermal instance target state from
bang_bang_control() into a separate function that will be also called
in a different place going forward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/3313587.aeNJFYEL58@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index b9474c6af72b..87cff3ea77a9 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -13,6 +13,27 @@
#include "thermal_core.h"
+static void bang_bang_set_instance_target(struct thermal_instance *instance,
+ unsigned int target)
+{
+ if (instance->target != 0 && instance->target != 1 &&
+ instance->target != THERMAL_NO_TARGET)
+ pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n",
+ instance->target, instance->name);
+
+ /*
+ * Enable the fan when the trip is crossed on the way up and disable it
+ * when the trip is crossed on the way down.
+ */
+ instance->target = target;
+
+ dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
+
+ mutex_lock(&instance->cdev->lock);
+ __thermal_cdev_update(instance->cdev);
+ mutex_unlock(&instance->cdev->lock);
+}
+
/**
* bang_bang_control - controls devices associated with the given zone
* @tz: thermal_zone_device
@@ -54,25 +75,8 @@ static void bang_bang_control(struct thermal_zone_device *tz,
tz->temperature, trip->hysteresis);
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
- if (instance->trip != trip)
- continue;
-
- if (instance->target != 0 && instance->target != 1 &&
- instance->target != THERMAL_NO_TARGET)
- pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n",
- instance->target, instance->name);
-
- /*
- * Enable the fan when the trip is crossed on the way up and
- * disable it when the trip is crossed on the way down.
- */
- instance->target = crossed_up;
-
- dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
-
- mutex_lock(&instance->cdev->lock);
- __thermal_cdev_update(instance->cdev);
- mutex_unlock(&instance->cdev->lock);
+ if (instance->trip == trip)
+ bang_bang_set_instance_target(instance, crossed_up);
}
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081930-untoasted-corporal-214e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer elements
The recent addition of a sanity check for a too-low start tick time
seems to break some applications that use aloop with a certain slave
timer setup. Such timers may have an initial resolution of 0, so the
start time is treated as if it were too low.
Relax the check by skipping it for slave timer instances to address
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
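The relaxed check can be read as a small predicate. In the sketch below, the resolution is in nanoseconds per tick, so 100000 corresponds to the 100 us lower bound named in the code comment; the function name `start_tick_check_model()` and the flag value `MODEL_TIMER_HW_SLAVE` are hypothetical (the real SNDRV_TIMER_HW_SLAVE constant is defined in the ALSA headers). The product is computed in 64 bits, matching the patch's (u64) cast, so that resolution * ticks cannot wrap on a 32-bit unsigned long.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for SNDRV_TIMER_HW_SLAVE. */
#define MODEL_TIMER_HW_SLAVE 0x4u

/*
 * Models the start-tick sanity check in snd_timer_start1() after the fix:
 * reject a start whose first expiry is under 100 us, unless the timer is a
 * slave timer (whose resolution may legitimately still be 0).
 */
int start_tick_check_model(bool start, unsigned int hw_flags,
			   unsigned long resolution_ns, unsigned long ticks)
{
	if (start && !(hw_flags & MODEL_TIMER_HW_SLAVE)) {
		/* Widen before multiplying so the product cannot wrap. */
		if ((uint64_t)resolution_ns * ticks < 100000)
			return -EINVAL;
	}
	return 0;
}
```

With this shape, a slave timer reporting resolution 0 passes, while an ordinary timer with the same values is still rejected, which is exactly the behavioural difference the patch introduces.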
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081931-improvise-purgatory-bbe3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer elements
The recent addition of a sanity check for a too-low start tick time
seems to break some applications that use aloop with a certain slave
timer setup. Such timers may have an initial resolution of 0, so the
start time is treated as if it were too low.
Relax the check by skipping it for slave timer instances to address
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081930-commuting-makeover-de1c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081929-kangaroo-abstract-f5d8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081928-deduct-humongous-41ae@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-ungodly-gumminess-adf9@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
Several cs track offsets (such as 'track->db_s_read_offset')
either are initialized with or can take values large enough that,
once shifted left by 8 bits, they overflow if the result exceeds the
u32 limit.
Some debug prints take this into account (see the corresponding
dev_warn() in evergreen_cs_track_validate_stencil()), even though the
value actually assigned to the local 'offset' variable lacks the same
widening.
Mitigate the problem by casting the right operands to the wider type
of the corresponding left operands in all such cases.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 285484e2d55e ("drm/radeon: add support for evergreen/ni tiling informations v11")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
---
P.S. While I am not certain that track->cb_color_bo_offset[id]
actually ends up taking values high enough to cause an overflow,
nonetheless I thought it prudent to cast it to ulong as well.
drivers/gpu/drm/radeon/evergreen_cs.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index 1fe6e0d883c7..d734d221e2da 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -433,7 +433,7 @@ static int evergreen_cs_track_validate_cb(struct radeon_cs_parser *p, unsigned i
return r;
}
- offset = track->cb_color_bo_offset[id] << 8;
+ offset = (unsigned long)track->cb_color_bo_offset[id] << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d cb[%d] bo base %ld not aligned with %ld\n",
__func__, __LINE__, id, offset, surf.base_align);
@@ -455,7 +455,7 @@ static int evergreen_cs_track_validate_cb(struct radeon_cs_parser *p, unsigned i
min = surf.nby - 8;
}
bsize = radeon_bo_size(track->cb_color_bo[id]);
- tmp = track->cb_color_bo_offset[id] << 8;
+ tmp = (unsigned long)track->cb_color_bo_offset[id] << 8;
for (nby = surf.nby; nby > min; nby--) {
size = nby * surf.nbx * surf.bpe * surf.nsamples;
if ((tmp + size * mslice) <= bsize) {
@@ -476,10 +476,10 @@ static int evergreen_cs_track_validate_cb(struct radeon_cs_parser *p, unsigned i
}
}
dev_warn(p->dev, "%s:%d cb[%d] bo too small (layer size %d, "
- "offset %d, max layer %d, bo size %ld, slice %d)\n",
+ "offset %ld, max layer %d, bo size %ld, slice %d)\n",
__func__, __LINE__, id, surf.layer_size,
- track->cb_color_bo_offset[id] << 8, mslice,
- radeon_bo_size(track->cb_color_bo[id]), slice);
+ (unsigned long)track->cb_color_bo_offset[id] << 8,
+ mslice, radeon_bo_size(track->cb_color_bo[id]), slice);
dev_warn(p->dev, "%s:%d problematic surf: (%d %d) (%d %d %d %d %d %d %d)\n",
__func__, __LINE__, surf.nbx, surf.nby,
surf.mode, surf.bpe, surf.nsamples,
@@ -608,7 +608,7 @@ static int evergreen_cs_track_validate_stencil(struct radeon_cs_parser *p)
return r;
}
- offset = track->db_s_read_offset << 8;
+ offset = (unsigned long)track->db_s_read_offset << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d stencil read bo base %ld not aligned with %ld\n",
__func__, __LINE__, offset, surf.base_align);
@@ -627,7 +627,7 @@ static int evergreen_cs_track_validate_stencil(struct radeon_cs_parser *p)
return -EINVAL;
}
- offset = track->db_s_write_offset << 8;
+ offset = (unsigned long)track->db_s_write_offset << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d stencil write bo base %ld not aligned with %ld\n",
__func__, __LINE__, offset, surf.base_align);
@@ -706,7 +706,7 @@ static int evergreen_cs_track_validate_depth(struct radeon_cs_parser *p)
return r;
}
- offset = track->db_z_read_offset << 8;
+ offset = (unsigned long)track->db_z_read_offset << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d stencil read bo base %ld not aligned with %ld\n",
__func__, __LINE__, offset, surf.base_align);
@@ -722,7 +722,7 @@ static int evergreen_cs_track_validate_depth(struct radeon_cs_parser *p)
return -EINVAL;
}
- offset = track->db_z_write_offset << 8;
+ offset = (unsigned long)track->db_z_write_offset << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d stencil write bo base %ld not aligned with %ld\n",
__func__, __LINE__, offset, surf.base_align);
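The overflow can be reproduced outside the driver. This sketch assumes the offsets are 32-bit unsigned values, as the u32 limit in the commit message suggests. `shift_narrow()` and `shift_wide()` are hypothetical helpers. The widened result uses `uint64_t` rather than `unsigned long` so the sketch behaves the same on 32-bit hosts, where `unsigned long` is also 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Shift in 32-bit arithmetic: any bits above bit 31 are lost. */
static uint64_t shift_narrow(uint32_t offset)
{
	uint32_t r = offset << 8;   /* computed and truncated as uint32_t */
	return r;
}

/* Widen the operand first, as the patch does with (unsigned long). */
static uint64_t shift_wide(uint32_t offset)
{
	return (uint64_t)offset << 8;
}
```

For example, an offset of 0x01000000 becomes 0 in 32-bit arithmetic but 0x100000000 once widened. Small offsets give the same result either way.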
When formatting a suspended disk (for example, formatting it with a
different DIF type), the disk is resumed first, and then the format
command is submitted to the disk through the SG_IO ioctl.
While the disk is processing the format command, the system submits no
other commands to it, so it attempts to suspend the disk again and sends
a SYNC CACHE command. However, SYNC CACHE fails because the disk is in
the middle of formatting. This sets the disk's runtime_status to error,
and it is difficult for the user to recover from that state. Error info
looks like:
[ 669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
[ 670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[ 670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
[ 670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4
To solve the issue, retry the command until the format command has finished.
Cc: stable(a)vger.kernel.org
Signed-off-by: Yihang Li <liyihang9(a)huawei.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
---
Changes since v3:
- Add Cc tag for kernel stable.
Changes since v2:
- Add Reviewed-by for Bart.
Changes since v1:
- Updated and added error information to the patch description.
---
drivers/scsi/sd.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index adeaa8ab9951..5cd88a8eea73 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1823,6 +1823,11 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
(sshdr.asc == 0x74 && sshdr.ascq == 0x71)) /* drive is password locked */
/* this is no error here */
return 0;
+
+ /* retry if format in progress */
+ if (sshdr.asc == 0x4 && sshdr.ascq == 0x4)
+ return -EBUSY;
+
/*
* This drive doesn't support sync and there's not much
* we can do because this is called during shutdown
--
2.33.0
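The new branch keys off the sense data: ASC 0x04 / ASCQ 0x04 is "logical unit not ready, format in progress". The sketch below classifies sense data the same way. `struct fake_sense_hdr` and `classify_sync_cache()` are hypothetical and only mirror the `asc`/`ascq` fields of `struct scsi_sense_hdr`. Returning -EBUSY rather than a hard error appears intended to let the runtime PM core treat the failed suspend as "try again later" instead of recording a runtime error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the asc/ascq fields of struct scsi_sense_hdr. */
struct fake_sense_hdr {
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

/*
 * Map a failed SYNCHRONIZE CACHE to a return code, mirroring only the
 * branch added by the patch: "format in progress" (ASC 0x04, ASCQ 0x04)
 * becomes -EBUSY so the caller can retry later. Everything else is
 * reported as a plain I/O error in this sketch.
 */
static int classify_sync_cache(const struct fake_sense_hdr *s)
{
	if (s->asc == 0x04 && s->ascq == 0x04)
		return -EBUSY;
	return -EIO;
}
```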
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update the link state immediately after receiving the wakeup interrupt.
Since the wakeup event handler calls the resume callbacks, function
drivers may perform an ep queue, which in turn tries to perform a remote
wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[21:18] has not been updated to U0 yet; the latency of DSTS has been
observed to be on the order of milliseconds. Hence, avoid calling
gadget_wakeup during Start Transfer to prevent unnecessarily issuing a
remote wakeup to the host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
v2: Refactored the patch as suggested in v1 discussion.
drivers/usb/dwc3/gadget.c | 24 ------------------------
1 file changed, 24 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..3f634209c5b8 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -327,30 +327,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
--
2.25.1
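The commit message refers to the link state held in DSTS bits 21:18 (USBLNKST). As a hedged illustration, the helper below extracts a 4-bit field at that position from a raw register value. `dsts_link_state()` and the `LINK_U*` constants are hypothetical. They assume U0..U3 encode as 0..3 in this field, which should be checked against the dwc3 headers (`DWC3_DSTS_USBLNKST()`, `DWC3_LINK_STATE_*`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: USBLNKST occupies bits 21:18 of DSTS. */
#define DSTS_USBLNKST_SHIFT 18
#define DSTS_USBLNKST_MASK  (0xfu << DSTS_USBLNKST_SHIFT)

/* Hypothetical link-state encodings, assumed to be 0..3 for U0..U3. */
enum { LINK_U0 = 0, LINK_U1 = 1, LINK_U2 = 2, LINK_U3 = 3 };

/* Extract the 4-bit link-state field from a raw DSTS value. */
static unsigned int dsts_link_state(uint32_t dsts)
{
	return (dsts & DSTS_USBLNKST_MASK) >> DSTS_USBLNKST_SHIFT;
}
```

A value read shortly after the wakeup interrupt can still show the old state in this field, which is why the patch stops basing a wakeup decision on it during Start Transfer.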
I'm announcing the release of the 6.10.6 kernel.
All users of the 6.10 kernel series must upgrade.
The updated 6.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/loongarch/include/uapi/asm/unistd.h | 1
drivers/ata/libata-scsi.c | 15
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 232 ++++--------
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6
drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 90 +++-
drivers/gpu/drm/amd/display/dc/dc_stream.h | 8
drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 2
drivers/media/usb/dvb-usb/dvb-usb-init.c | 35 -
drivers/nvme/host/pci.c | 7
drivers/platform/x86/Kconfig | 1
drivers/platform/x86/amd/pmf/spc.c | 32 -
drivers/platform/x86/ideapad-laptop.c | 148 ++++++-
drivers/platform/x86/ideapad-laptop.h | 9
drivers/platform/x86/lenovo-ymc.c | 60 ---
fs/binfmt_flat.c | 4
fs/exec.c | 8
fs/f2fs/extent_cache.c | 48 --
fs/f2fs/f2fs.h | 2
fs/f2fs/gc.c | 10
fs/f2fs/inode.c | 10
fs/jfs/jfs_dmap.c | 2
fs/jfs/jfs_dtree.c | 2
fs/ntfs3/frecord.c | 75 +++
net/core/filter.c | 8
net/ipv4/fou_core.c | 2
sound/soc/codecs/cs35l56-shared.c | 1
sound/usb/mixer.c | 7
29 files changed, 486 insertions(+), 355 deletions(-)
Chao Yu (2):
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
f2fs: fix to cover read extent cache access with lock
Edward Adam Davis (1):
jfs: fix null ptr deref in dtInsertEntry
Fangzhi Zuo (1):
drm/amd/display: Prevent IPX From Link Detect and Set Mode
Gergo Koteles (3):
platform/x86: ideapad-laptop: introduce a generic notification chain
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
Greg Kroah-Hartman (2):
Revert "drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()"
Linux 6.10.6
Harry Wentland (1):
drm/amd/display: Separate setting and programming of cursor
Huacai Chen (1):
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Kees Cook (2):
exec: Fix ToCToU between perm check and set-uid/gid usage
binfmt_flat: Fix corruption when not offsetting data start
Konstantin Komarov (1):
fs/ntfs3: Do copy_to_user out of run_lock
Niklas Cassel (1):
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Pei Li (1):
jfs: Fix shift-out-of-bounds in dbDiscardAG
Sean Young (1):
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
Shyam Sundar S K (1):
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
Simon Trimmer (1):
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
Srinivasan Shanmugam (1):
drm/amdgpu/display: Fix null pointer dereference in dc_stream_program_cursor_position
Takashi Iwai (1):
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
WangYuli (1):
nvme/pci: Add APST quirk for Lenovo N60z laptop
Wayne Lin (2):
drm/amd/display: Defer handling mst up request in resume
drm/amd/display: Solve mst monitors blank out problem after resume
Willem de Bruijn (1):
fou: remove warn in gue_gro_receive on unsupported protocol
yunshui (1):
bpf, net: Use DEV_STAT_INC()
I'm announcing the release of the 6.1.106 kernel.
All users of the 6.1 kernel series must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm64/kvm/hyp/pgtable.c | 10 -
arch/loongarch/include/uapi/asm/unistd.h | 1
drivers/ata/libata-scsi.c | 15 +
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 192 ++++++++++++++++------
drivers/gpu/drm/i915/gem/i915_gem_mman.h | 2
drivers/media/usb/dvb-usb/dvb-usb-init.c | 35 ----
drivers/nvme/host/pci.c | 7
fs/binfmt_flat.c | 4
fs/exec.c | 8
fs/lockd/svc.c | 3
fs/nfs/callback.c | 3
fs/nfsd/export.c | 32 ++-
fs/nfsd/export.h | 4
fs/nfsd/netns.h | 25 ++
fs/nfsd/nfs4proc.c | 6
fs/nfsd/nfscache.c | 201 ++++++++++++++----------
fs/nfsd/nfsctl.c | 24 +-
fs/nfsd/nfsd.h | 1
fs/nfsd/nfsfh.c | 3
fs/nfsd/nfssvc.c | 24 +-
fs/nfsd/stats.c | 52 ++----
fs/nfsd/stats.h | 85 +++-------
fs/nfsd/trace.h | 22 ++
fs/nfsd/vfs.c | 6
include/linux/cgroup-defs.h | 7
include/linux/sunrpc/svc.h | 5
kernel/cgroup/cgroup-internal.h | 3
kernel/cgroup/cgroup.c | 23 +-
net/mptcp/options.c | 3
net/mptcp/pm_netlink.c | 49 +++--
net/mptcp/pm_userspace.c | 2
net/mptcp/protocol.h | 2
net/sunrpc/stats.c | 2
net/sunrpc/svc.c | 36 ++--
net/wireless/nl80211.c | 6
sound/soc/soc-topology.c | 32 ---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 14 +
38 files changed, 572 insertions(+), 379 deletions(-)
Amadeusz Sławiński (2):
ASoC: topology: Clean up route loading
ASoC: topology: Fix route memory corruption
Andi Shyti (2):
drm/i915/gem: Fix Virtual Memory mapping boundaries calculation
drm/i915/gem: Adjust vma offset for framebuffer mmap offset
Chuck Lever (6):
NFSD: Refactor nfsd_reply_cache_free_locked()
NFSD: Rename nfsd_reply_cache_alloc()
NFSD: Replace nfsd_prune_bucket()
NFSD: Refactor the duplicate reply cache shrinker
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
NFSD: Fix frame size warning in svc_export_parse()
Dan Carpenter (1):
drm/i915: Fix a NULL vs IS_ERR() bug
Eric Dumazet (1):
wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values
Geliang Tang (1):
mptcp: pass addr to mptcp_pm_alloc_anno_list
Greg Kroah-Hartman (1):
Linux 6.1.106
Huacai Chen (1):
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Jeff Layton (2):
nfsd: move reply cache initialization into nfsd startup
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net
Josef Bacik (10):
sunrpc: don't change ->sv_stats if it doesn't exist
nfsd: stop setting ->pg_stats for unused stats
sunrpc: pass in the sv_stats struct through svc_create_pooled
sunrpc: remove ->pg_stats from svc_program
sunrpc: use the struct net as the svc proc private
nfsd: rename NFSD_NET_* to NFSD_STATS_*
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
nfsd: make all of the nfsd stats per-network namespace
nfsd: remove nfsd_stats, make th_cnt a global counter
nfsd: make svc_stat per-network namespace instead of global
Kees Cook (2):
exec: Fix ToCToU between perm check and set-uid/gid usage
binfmt_flat: Fix corruption when not offsetting data start
Matthieu Baerts (NGI0) (5):
mptcp: pm: reduce indentation blocks
mptcp: pm: don't try to create sf if alloc failed
mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set
selftests: mptcp: join: test both signal & subflow
mptcp: fully established after ADD_ADDR echo on MPJ
Niklas Cassel (1):
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Nirmoy Das (1):
drm/i915: Add a function to mmap framebuffer obj
Sean Young (1):
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
Waiman Long (1):
cgroup: Move rcu_head up near the top of cgroup_root
WangYuli (1):
nvme/pci: Add APST quirk for Lenovo N60z laptop
Will Deacon (1):
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
Yafang Shao (1):
cgroup: Make operations on the cgroup root_list RCU safe
On Sat, 2024-08-17 at 10:41 +0100, Martin Whitaker wrote:
> When performing the port_hwtstamp_set operation, ptp_schedule_worker()
> will be called if hardware timestamping is enabled on any of the ports.
> When using multiple ports for PTP, port_hwtstamp_set is executed for
> each port. When called for the first time, ptp_schedule_worker()
> returns 0. On subsequent calls it returns 1, indicating the worker is
> already scheduled. Currently the ksz driver treats 1 as an error and
> fails to complete the port_hwtstamp_set operation, thus leaving the
> timestamping configuration for those ports unchanged.
>
> This patch fixes this by ignoring the ptp_schedule_worker() return
> value.
>
> Link: https://lore.kernel.org/netdev/7aae307a-35ca-4209-a850-7b2749d40f90@martin-…
> Fixes: bb01ad30570b0 ("net: dsa: microchip: ptp: manipulating absolute time using ptp hw clock")
> Signed-off-by: Martin Whitaker <foss(a)martin-whitaker.me.uk>
Acked-by: Arun Ramadoss <arun.ramadoss(a)microchip.com>
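The fix relies on ptp_schedule_worker() returning 1 to mean "already queued" rather than failure. A minimal sketch of the pattern follows. `fake_schedule_worker()` is a hypothetical stand-in for the PTP core call, and `hwtstamp_set()` is a hypothetical stand-in for the driver's per-port operation.

```c
#include <assert.h>
#include <stdbool.h>

static bool worker_queued;

/* Hypothetical stand-in: returns 0 when it queues the worker and
 * 1 when the worker was already queued (which is not an error). */
static int fake_schedule_worker(void)
{
	if (worker_queued)
		return 1;
	worker_queued = true;
	return 0;
}

/* Hypothetical per-port operation: the scheduling result is ignored,
 * so the configuration is applied for every port, not just the first. */
static int hwtstamp_set(int *port_cfg, int new_cfg)
{
	*port_cfg = new_cfg;
	(void)fake_schedule_worker();
	return 0;
}
```

Calling `hwtstamp_set()` for two ports succeeds both times, even though the second scheduling attempt returns 1.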
From: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com>
Due to the lack of bounds checks on the clc array, firmware that provides
more clc configurations than the array can hold causes illegal memory access.
Cc: stable(a)vger.kernel.org
Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips")
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com>
---
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7925/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
index 9dc22fbe25d3..c6c380571fd8 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
@@ -638,6 +638,9 @@ static int mt7925_load_clc(struct mt792x_dev *dev, const char *fw_name)
for (offset = 0; offset < len; offset += le32_to_cpu(clc->len)) {
clc = (const struct mt7925_clc *)(clc_base + offset);
+ if (clc->idx >= ARRAY_SIZE(phy->clc))
+ break;
+
/* do not init buf again if chip reset triggered */
if (phy->clc[clc->idx])
continue;
--
2.18.0
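Bounds checks of this kind are easy to get off by one. For an array of N entries the valid indices are 0..N-1, so an index read from firmware must be rejected when it is greater than or equal to N. The sketch below uses a hypothetical `CLC_MAX` and `clc_idx_valid()`. It does not reproduce the driver's structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define CLC_MAX 4 /* hypothetical table size */

static const unsigned char *clc_table[CLC_MAX];

/* Accept only indices that address an existing slot: 0..CLC_MAX-1. */
static bool clc_idx_valid(size_t idx)
{
	return idx < ARRAY_SIZE(clc_table);
}
```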
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
[ Upstream commit 37ae5a0f5287a52cf51242e76ccf198d02ffe495]
Since lo_simple_ioctl(LOOP_SET_BLOCK_SIZE) and ioctl(NBD_SET_BLKSIZE) pass
user-controlled "unsigned long arg" to blk_validate_block_size(),
"unsigned long" should be used for validation.
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/9ecbf057-4375-c2db-ab53-e4cc0dff953d@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
(cherry picked from commit 37ae5a0f5287a52cf51242e76ccf198d02ffe495)
Signed-off-by: David Hunter <david.hunter.linux(a)gmail.com>
---
V1 --> V2
- put upstream commit after subject
- put the original Author
- Added a few people I needed to CC
---
include/linux/blkdev.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 905844172cfd..c6d57814988d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -235,7 +235,7 @@ struct request {
void *end_io_data;
};
-static inline int blk_validate_block_size(unsigned int bsize)
+static inline int blk_validate_block_size(unsigned long bsize)
{
if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
return -EINVAL;
--
2.43.0
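The truncation this patch guards against can be shown in userspace. The sketch below uses hypothetical helpers `validate_u32()` and `validate_u64()` and assumes a 4 KiB page size. `uint64_t` models `unsigned long` on a 64-bit kernel. A user-controlled value of 2^32 + 512 is accepted once it is truncated to 32 bits, because it wraps to 512, but it is rejected when the full width is kept.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Old-style signature: a 64-bit argument is silently truncated. */
static int validate_u32(uint32_t bsize)
{
	if (bsize < 512 || bsize > FAKE_PAGE_SIZE || !is_pow2(bsize))
		return -EINVAL;
	return 0;
}

/* New-style signature: the full user-supplied value is checked. */
static int validate_u64(uint64_t bsize)
{
	if (bsize < 512 || bsize > FAKE_PAGE_SIZE || !is_pow2(bsize))
		return -EINVAL;
	return 0;
}
```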
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flags
Author: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Date: Wed Aug 7 09:22:10 2024 +0200
The cec_msg_set_reply_to() helper function never zeroed the
struct cec_msg flags field; this can cause unexpected behavior
if flags was uninitialized to begin with.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Fixes: 0dbacebede1e ("[media] cec: move the CEC framework out of staging and to media")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
include/uapi/linux/cec.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
---
diff --git a/include/uapi/linux/cec.h b/include/uapi/linux/cec.h
index 894fffc66f2c..b2af1dddd4d7 100644
--- a/include/uapi/linux/cec.h
+++ b/include/uapi/linux/cec.h
@@ -132,6 +132,8 @@ static inline void cec_msg_init(struct cec_msg *msg,
* Set the msg destination to the orig initiator and the msg initiator to the
* orig destination. Note that msg and orig may be the same pointer, in which
* case the change is done in place.
+ *
+ * It also zeroes the reply, timeout and flags fields.
*/
static inline void cec_msg_set_reply_to(struct cec_msg *msg,
struct cec_msg *orig)
@@ -139,7 +141,9 @@ static inline void cec_msg_set_reply_to(struct cec_msg *msg,
/* The destination becomes the initiator and vice versa */
msg->msg[0] = (cec_msg_destination(orig) << 4) |
cec_msg_initiator(orig);
- msg->reply = msg->timeout = 0;
+ msg->reply = 0;
+ msg->timeout = 0;
+ msg->flags = 0;
}
/**
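The patched helper can be modelled on a trimmed-down message. This sketch uses a hypothetical `struct fake_cec_msg`, which keeps only the fields the helper touches, and hypothetical `initiator()`/`destination()` helpers. Those helpers assume, as the diff's comment implies, that the initiator is the high nibble of `msg[0]` and the destination is the low nibble.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct cec_msg. */
struct fake_cec_msg {
	uint8_t msg[16];
	uint32_t reply;
	uint32_t timeout;
	uint32_t flags;
};

static uint8_t initiator(const struct fake_cec_msg *m) { return m->msg[0] >> 4; }
static uint8_t destination(const struct fake_cec_msg *m) { return m->msg[0] & 0xf; }

/*
 * Swap initiator and destination, then clear reply, timeout and flags.
 * msg and orig may be the same object; orig->msg[0] is fully read
 * within the single expression before msg->msg[0] is written.
 */
static void set_reply_to(struct fake_cec_msg *msg, const struct fake_cec_msg *orig)
{
	msg->msg[0] = (uint8_t)((destination(orig) << 4) | initiator(orig));
	msg->reply = 0;
	msg->timeout = 0;
	msg->flags = 0;
}
```

Without the `flags = 0` line, stale bits left in an uninitialized message would be carried into the reply, which is the behavior the patch removes.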
Zero and negative numbers are not valid IRQs for in-kernel code, and the
irq_of_parse_and_map() function returns zero on error, so this check for
valid IRQs should only accept values > 0.
Cc: stable(a)vger.kernel.org
Fixes: 2d9e31b9412c ("dmaengine: moxart: remove NO_IRQ")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- added missed changelog v2.
Changes in v2:
- added Cc stable line;
- added Fixes line.
---
drivers/dma/moxart-dma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/moxart-dma.c b/drivers/dma/moxart-dma.c
index 66dc6d31b603..16dd3c5aba4d 100644
--- a/drivers/dma/moxart-dma.c
+++ b/drivers/dma/moxart-dma.c
@@ -568,7 +568,7 @@ static int moxart_probe(struct platform_device *pdev)
return -ENOMEM;
irq = irq_of_parse_and_map(node, 0);
- if (!irq) {
+ if (irq <= 0) {
dev_err(dev, "no IRQ resource\n");
return -EINVAL;
}
--
2.25.1
This is the start of the stable review cycle for the 6.10.6 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 18 Aug 2024 08:52:13 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.6-rc2…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.10.6-rc2
Aurabindo Pillai <aurabindo.pillai(a)amd.com>
drm/amd/display: Add misc DC changes for DCN401
Niklas Cassel <cassel(a)kernel.org>
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Sean Young <sean(a)mess.org>
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
drm/amdgpu/display: Fix null pointer dereference in dc_stream_program_cursor_position
Wayne Lin <Wayne.Lin(a)amd.com>
drm/amd/display: Solve mst monitors blank out problem after resume
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: introduce a generic notification chain
Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
fs/ntfs3: Do copy_to_user out of run_lock
Pei Li <peili.dev(a)gmail.com>
jfs: Fix shift-out-of-bounds in dbDiscardAG
Edward Adam Davis <eadavis(a)qq.com>
jfs: fix null ptr deref in dtInsertEntry
Willem de Bruijn <willemb(a)google.com>
fou: remove warn in gue_gro_receive on unsupported protocol
Chao Yu <chao(a)kernel.org>
f2fs: fix to cover read extent cache access with lock
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
yunshui <jiangyunshui(a)kylinos.cn>
bpf, net: Use DEV_STAT_INC()
Simon Trimmer <simont(a)opensource.cirrus.com>
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Fangzhi Zuo <jerry.zuo(a)amd.com>
drm/amd/display: Prevent IPX From Link Detect and Set Mode
Harry Wentland <harry.wentland(a)amd.com>
drm/amd/display: Separate setting and programming of cursor
Wayne Lin <wayne.lin(a)amd.com>
drm/amd/display: Defer handling mst up request in resume
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
-------------
Diffstat:
Makefile | 4 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
drivers/ata/libata-scsi.c | 15 ++-
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +-
.../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 6 +
.../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 +-
drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 94 ++++++++-----
drivers/gpu/drm/amd/display/dc/dc_stream.h | 8 ++
.../drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 2 +-
drivers/media/usb/dvb-usb/dvb-usb-init.c | 35 +----
drivers/nvme/host/pci.c | 7 +
drivers/platform/x86/Kconfig | 1 +
drivers/platform/x86/amd/pmf/spc.c | 32 ++---
drivers/platform/x86/ideapad-laptop.c | 148 ++++++++++++++++++---
drivers/platform/x86/ideapad-laptop.h | 9 ++
drivers/platform/x86/lenovo-ymc.c | 60 +--------
fs/binfmt_flat.c | 4 +-
fs/exec.c | 8 +-
fs/f2fs/extent_cache.c | 50 +++----
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/gc.c | 10 ++
fs/f2fs/inode.c | 10 +-
fs/jfs/jfs_dmap.c | 2 +
fs/jfs/jfs_dtree.c | 2 +
fs/ntfs3/frecord.c | 75 ++++++++++-
net/core/filter.c | 8 +-
net/ipv4/fou_core.c | 2 +-
sound/soc/codecs/cs35l56-shared.c | 1 +
sound/usb/mixer.c | 7 +
29 files changed, 411 insertions(+), 212 deletions(-)
From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Interrupt status read seems to be broken on some old MPU-6050-like
chips. Fix by reverting to the previous driver behavior of bypassing the
interrupt status read. This works because these chips do not support
WoM, so data ready is the only interrupt source.
Fixes: 5537f653d9be ("iio: imu: inv_mpu6050: add new interrupt handler for WoM events")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
---
drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
index 84273660ca2e..3bfeabab0ec4 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
@@ -248,12 +248,20 @@ static irqreturn_t inv_mpu6050_interrupt_handle(int irq, void *p)
int result;
switch (st->chip_type) {
+ case INV_MPU6000:
case INV_MPU6050:
+ case INV_MPU9150:
+ /*
+ * WoM is not supported and interrupt status read seems to be broken for
+ * some chips. Since data ready is the only interrupt, bypass interrupt
+ * status read and always assert data ready bit.
+ */
+ wom_bits = 0;
+ int_status = INV_MPU6050_BIT_RAW_DATA_RDY_INT;
+ goto data_ready_interrupt;
case INV_MPU6500:
case INV_MPU6515:
case INV_MPU6880:
- case INV_MPU6000:
- case INV_MPU9150:
case INV_MPU9250:
case INV_MPU9255:
wom_bits = INV_MPU6500_BIT_WOM_INT;
@@ -279,6 +287,7 @@ static irqreturn_t inv_mpu6050_interrupt_handle(int irq, void *p)
}
}
+data_ready_interrupt:
/* handle raw data interrupt */
if (int_status & INV_MPU6050_BIT_RAW_DATA_RDY_INT) {
indio_dev->pollfunc->timestamp = st->it_timestamp;
--
2.34.1
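The bypass in this patch can be modelled in plain C. This is a hypothetical sketch, not the driver code: the enum, the bit values and effective_int_status() are invented for illustration. Only the control flow mirrors the patch: MPU-6000/6050/9150 ignore the status read and always report data ready, while the other chips use the value that was read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's chip types and status bits.
 * The bit values are placeholders, not the real register layout. */
enum chip_type {
	CHIP_MPU6000,
	CHIP_MPU6050,
	CHIP_MPU9150,
	CHIP_MPU6500,
	CHIP_MPU9250,
};

#define RAW_DATA_RDY_BIT 0x01u
#define WOM_BITS         0xe0u

/*
 * Return the interrupt status the handler should act on. raw_status is
 * what the INT_STATUS read returned; *wom_bits receives the wake-on-motion
 * mask that applies to this chip (0 when WoM is unsupported).
 */
static uint8_t effective_int_status(enum chip_type type, uint8_t raw_status,
				    uint8_t *wom_bits)
{
	assert(wom_bits != NULL);

	switch (type) {
	case CHIP_MPU6000:
	case CHIP_MPU6050:
	case CHIP_MPU9150:
		/* No WoM and an unreliable status read: ignore raw_status
		 * and treat every interrupt as data ready. */
		*wom_bits = 0;
		return RAW_DATA_RDY_BIT;
	default:
		*wom_bits = WOM_BITS;
		return raw_status;
	}
}
```

For example, an MPU-6050 whose status read returns 0 is still handled as a data-ready interrupt.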
On Fri, Aug 16, 2024 at 08:12:36PM +0000, Jon Hunter wrote:
>
> From: Jon Hunter <jonathanh(a)nvidia.com>
> Sent: Friday, August 16, 2024 2:43 PM
> To: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>; patches(a)lists.linux.dev <patches(a)lists.linux.dev>; linux-kernel(a)vger.kernel.org <linux-kernel(a)vger.kernel.org>; torvalds(a)linux-foundation.org <torvalds(a)linux-foundation.org>; akpm(a)linux-foundation.org <akpm(a)linux-foundation.org>; linux(a)roeck-us.net <linux(a)roeck-us.net>; shuah(a)kernel.org <shuah(a)kernel.org>; patches(a)kernelci.org <patches(a)kernelci.org>; lkft-triage(a)lists.linaro.org <lkft-triage(a)lists.linaro.org>; pavel(a)denx.de <pavel(a)denx.de>; Jon Hunter <jonathanh(a)nvidia.com>; f.fainelli(a)gmail.com <f.fainelli(a)gmail.com>; sudipm.mukherjee(a)gmail.com <sudipm.mukherjee(a)gmail.com>; srw(a)sladewatkins.net <srw(a)sladewatkins.net>; rwarsow(a)gmx.de <rwarsow(a)gmx.de>; conor(a)kernel.org <conor(a)kernel.org>; allen.lkml(a)gmail.com <allen.lkml(a)gmail.com>; broonie(a)kernel.org <broonie(a)kernel.org>; linux-tegra(a)vger.kernel.org <linux-tegra(a)vger.kernel.org>; stable(a)vger.kernel.org <stable(a)vger.kernel.org>
> Subject: Re: [PATCH 5.10 000/350] 5.10.224-rc2 review
>
> On Fri, 16 Aug 2024 12:22:05 +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.10.224 release.
> > There are 350 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun, 18 Aug 2024 10:14:04 +0000.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.224-r…
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
> Failures detected for Tegra ...
>
> Test results for stable-v5.10:
> 10 builds: 10 pass, 0 fail
> 31 boots: 26 pass, 5 fail
> 45 tests: 44 pass, 1 fail
>
> Linux version: 5.10.224-rc2-g470450f8c61c
> Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
> tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000,
> tegra20-ventana, tegra210-p2371-2180,
> tegra210-p3450-0000, tegra30-cardhu-a04
>
> Boot failures: tegra186-p2771-0000, tegra210-p2371-2180,
> tegra210-p3450-0000
>
> Test failures: tegra194-p2972-0000: boot.py
>
> ---
>
> Apologies for the mail formatting. I am travelling and only have outlook for mobile :-(
>
> Bisect points to the following commit ...
>
> # first bad commit: [4bade5a6b1cfe81c9777aa3c8823009ff28a6e7f] memory: fsl_ifc: Make FSL_IFC config visible and selectable
>
> Reverting this does fix the issue. Seems odd but this appears to disable CONFIG_MEMORY for v5.10 with ARM64 defconfig. So something we need to fix.
Ah, that's a mess. I'll go drop this one for now, glad it's not showing
up on 5.15.y where this commit also is. It's not really important for
5.10.y so there's no harm in removing it.
thanks,
greg k-h
This series begins with some work on the mac_scsi driver to improve
compatibility with SCSI2SD v5 devices. Better error handling is needed
there because the PDMA hardware does not tolerate the write latency spikes
which SD cards can produce.
A bug is fixed in the 5380 core driver so that scatter/gather can be
enabled in mac_scsi.
Several patches at the end of this series improve robustness and correctness
in the core driver.
This series has been tested on a variety of mac_scsi hosts. A variety of
SCSI targets was also tested, including Quantum HDD, Fujitsu HDD, Iomega FDD,
Ricoh CD-RW, Matsushita CD-ROM, SCSI2SD and BlueSCSI.
Finn Thain (11):
scsi: mac_scsi: Revise printk(KERN_DEBUG ...) messages
scsi: mac_scsi: Refactor polling loop
scsi: mac_scsi: Disallow bus errors during PDMA send
scsi: NCR5380: Check for phase match during PDMA fixup
scsi: mac_scsi: Enable scatter/gather by default
scsi: NCR5380: Initialize buffer for MSG IN and STATUS transfers
scsi: NCR5380: Handle BSY signal loss during information transfer phases
scsi: NCR5380: Drop redundant member from struct NCR5380_cmd
scsi: NCR5380: Remove redundant result calculation from NCR5380_transfer_pio()
scsi: NCR5380: Remove obsolete comment
scsi: NCR5380: Clean up indentation
drivers/scsi/NCR5380.c | 233 +++++++++++++++++++--------------------
drivers/scsi/NCR5380.h | 20 ++--
drivers/scsi/mac_scsi.c | 170 ++++++++++++++--------------
drivers/scsi/sun3_scsi.c | 2 +-
4 files changed, 215 insertions(+), 210 deletions(-)
--
2.39.5
The patch titled
Subject: kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kexec_file-fix-elfcorehdr-digest-exclusion-when-config_crash_hotplug=y.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Petr Tesarik <ptesarik(a)suse.com>
Subject: kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
Date: Mon, 5 Aug 2024 17:07:50 +0200
Fix the condition to exclude the elfcorehdr segment from the SHA digest
calculation.
The j iterator is an index into the output sha_regions[] array, not into
the input image->segment[] array. Once it reaches
image->elfcorehdr_index, all subsequent segments are excluded. Besides,
if the purgatory segment precedes the elfcorehdr segment, the elfcorehdr
may be wrongly included in the calculation.
Link: https://lkml.kernel.org/r/20240805150750.170739-1-petr.tesarik@suse.com
Fixes: f7cc804a9fd4 ("kexec: exclude elfcorehdr from the segment digest")
Signed-off-by: Petr Tesarik <ptesarik(a)suse.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Hari Bathini <hbathini(a)linux.ibm.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Eric DeVolder <eric_devolder(a)yahoo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kexec_file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/kexec_file.c~kexec_file-fix-elfcorehdr-digest-exclusion-when-config_crash_hotplug=y
+++ a/kernel/kexec_file.c
@@ -752,7 +752,7 @@ static int kexec_calculate_store_digests
#ifdef CONFIG_CRASH_HOTPLUG
/* Exclude elfcorehdr segment to allow future changes via hotplug */
- if (j == image->elfcorehdr_index)
+ if (i == image->elfcorehdr_index)
continue;
#endif
_
Patches currently in -mm which might be from ptesarik(a)suse.com are
kexec_file-fix-elfcorehdr-digest-exclusion-when-config_crash_hotplug=y.patch
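The indexing bug described in this patch can be reproduced with a small user-space model. This is a hypothetical sketch, not the kernel code: struct segment and collect_digest_segments() are invented, segments are reduced to an is_purgatory flag, and out[] plays the role of sha_regions[]. Here i walks the input segments and j counts output regions, as in kexec_calculate_store_digests().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified segment: only records whether it is the
 * purgatory segment. */
struct segment {
	bool is_purgatory;
};

/*
 * Collect the indices of segments to hash into out[] (like sha_regions[]).
 * When use_i is true, the elfcorehdr test compares the input index i (the
 * fixed form); otherwise it compares the output index j (the buggy form).
 * Returns the number of regions written.
 */
static size_t collect_digest_segments(const struct segment *seg, size_t nr,
				      size_t elfcorehdr_index, bool use_i,
				      size_t *out)
{
	size_t i, j;

	assert(seg != NULL && out != NULL);

	for (i = 0, j = 0; i < nr; i++) {
		size_t k = use_i ? i : j;

		if (k == elfcorehdr_index)
			continue;	/* exclude elfcorehdr from the digest */
		if (seg[i].is_purgatory)
			continue;	/* purgatory is never hashed */
		out[j++] = i;
	}
	return j;
}
```

With segments {normal, purgatory, elfcorehdr, normal} and elfcorehdr_index = 2, the j-based check hashes segments 0 and 2 (the elfcorehdr slips in because the purgatory precedes it), while the i-based check hashes segments 0 and 3.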
This is the start of the stable review cycle for the 6.1.106 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.106-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.106-rc1
Will Deacon <will(a)kernel.org>
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
Eric Dumazet <edumazet(a)google.com>
wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values
Waiman Long <longman(a)redhat.com>
cgroup: Move rcu_head up near the top of cgroup_root
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Andi Shyti <andi.shyti(a)linux.intel.com>
drm/i915/gem: Adjust vma offset for framebuffer mmap offset
Dan Carpenter <dan.carpenter(a)linaro.org>
drm/i915: Fix a NULL vs IS_ERR() bug
Nirmoy Das <nirmoy.das(a)intel.com>
drm/i915: Add a function to mmap framebuffer obj
Yafang Shao <laoar.shao(a)gmail.com>
cgroup: Make operations on the cgroup root_list RCU safe
Andi Shyti <andi.shyti(a)linux.intel.com>
drm/i915/gem: Fix Virtual Memory mapping boundaries calculation
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: fully established after ADD_ADDR echo on MPJ
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Josef Bacik <josef(a)toxicpanda.com>
nfsd: make svc_stat per-network namespace instead of global
Josef Bacik <josef(a)toxicpanda.com>
nfsd: remove nfsd_stats, make th_cnt a global counter
Josef Bacik <josef(a)toxicpanda.com>
nfsd: make all of the nfsd stats per-network namespace
Josef Bacik <josef(a)toxicpanda.com>
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
Josef Bacik <josef(a)toxicpanda.com>
nfsd: rename NFSD_NET_* to NFSD_STATS_*
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: use the struct net as the svc proc private
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: remove ->pg_stats from svc_program
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: pass in the sv_stats struct through svc_create_pooled
Josef Bacik <josef(a)toxicpanda.com>
nfsd: stop setting ->pg_stats for unused stats
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: don't change ->sv_stats if it doesn't exist
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix frame size warning in svc_export_parse()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor the duplicate reply cache shrinker
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Replace nfsd_prune_bucket()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rename nfsd_reply_cache_alloc()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_reply_cache_free_locked()
Jeff Layton <jlayton(a)kernel.org>
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net
Jeff Layton <jlayton(a)kernel.org>
nfsd: move reply cache initialization into nfsd startup
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Fix route memory corruption
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Clean up route loading
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
selftests: mptcp: join: test both signal & subflow
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: pm: don't try to create sf if alloc failed
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: pm: reduce indentation blocks
Geliang Tang <geliang.tang(a)suse.com>
mptcp: pass addr to mptcp_pm_alloc_anno_list
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 10 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 192 ++++++++++++++++------
drivers/gpu/drm/i915/gem/i915_gem_mman.h | 2 +-
drivers/nvme/host/pci.c | 7 +
fs/binfmt_flat.c | 4 +-
fs/exec.c | 8 +-
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 3 -
fs/nfsd/export.c | 32 ++--
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 ++-
fs/nfsd/nfs4proc.c | 6 +-
fs/nfsd/nfscache.c | 201 ++++++++++++++----------
fs/nfsd/nfsctl.c | 24 ++-
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 24 +--
fs/nfsd/stats.c | 52 +++---
fs/nfsd/stats.h | 83 ++++------
fs/nfsd/trace.h | 22 +++
fs/nfsd/vfs.c | 6 +-
include/linux/cgroup-defs.h | 7 +-
include/linux/sunrpc/svc.h | 5 +-
kernel/cgroup/cgroup-internal.h | 3 +-
kernel/cgroup/cgroup.c | 23 ++-
net/mptcp/options.c | 3 +-
net/mptcp/pm_netlink.c | 49 +++---
net/mptcp/pm_userspace.c | 2 +-
net/mptcp/protocol.h | 2 +-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 36 +++--
net/wireless/nl80211.c | 6 +-
sound/soc/soc-topology.c | 32 +---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 14 ++
36 files changed, 555 insertions(+), 346 deletions(-)
This is the start of the stable review cycle for the 6.6.47 release.
There are 67 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.47-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.47-rc1
Will Deacon <will(a)kernel.org>
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
Will Deacon <will(a)kernel.org>
KVM: arm64: Don't defer TLB invalidation when zapping table entries
Waiman Long <longman(a)redhat.com>
cgroup: Move rcu_head up near the top of cgroup_root
Peter Xu <peterx(a)redhat.com>
mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Revert "Input: bcm5974 - check endpoint type before starting traffic"
Dave Kleikamp <dave.kleikamp(a)oracle.com>
Revert "jfs: fix shift-out-of-bounds in dbJoin"
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
fs/ntfs3: Do copy_to_user out of run_lock
Pei Li <peili.dev(a)gmail.com>
jfs: Fix shift-out-of-bounds in dbDiscardAG
Edward Adam Davis <eadavis(a)qq.com>
jfs: fix null ptr deref in dtInsertEntry
Willem de Bruijn <willemb(a)google.com>
fou: remove warn in gue_gro_receive on unsupported protocol
Chao Yu <chao(a)kernel.org>
f2fs: fix to cover read extent cache access with lock
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
yunshui <jiangyunshui(a)kylinos.cn>
bpf, net: Use DEV_STAT_INC()
Wojciech Gładysz <wojciech.gladysz(a)infogain.com>
ext4: sanity check for NULL pointer after ext4_force_shutdown
Matthew Wilcox (Oracle) <willy(a)infradead.org>
ext4: convert ext4_da_do_write_end() to take a folio
Eric Dumazet <edumazet(a)google.com>
wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values
Peter Xu <peterx(a)redhat.com>
mm/page_table_check: support userfault wr-protect entries
Jan Kara <jack(a)suse.cz>
ext4: do not create EA inode under buffer lock
Jan Kara <jack(a)suse.cz>
ext4: fold quota accounting into ext4_xattr_inode_lookup_create()
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: RFCOMM: Fix not validating setsockopt user input
Eric Dumazet <edumazet(a)google.com>
nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
Eric Dumazet <edumazet(a)google.com>
net: add copy_safe_from_sockptr() helper
Eric Dumazet <edumazet(a)google.com>
mISDN: fix MISDN_TIME_STAMP handling
Gustavo A. R. Silva <gustavoars(a)kernel.org>
fs: Annotate struct file_handle with __counted_by() and use struct_size()
Alexei Starovoitov <ast(a)kernel.org>
bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie.
Kees Cook <keescook(a)chromium.org>
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Gavrilov Ilia <Ilia.Gavrilov(a)infotecs.ru>
pppoe: Fix memory leak in pppoe_sendmsg()
Dmitry Antipov <dmantipov(a)yandex.ru>
net: sctp: fix skb leak in sctp_inq_free()
Allison Henderson <allison.henderson(a)oracle.com>
net:rds: Fix possible deadlock in rds_message_put
Jan Kara <jack(a)suse.cz>
quota: Detect loops in quota tree
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Input: bcm5974 - check endpoint type before starting traffic
John Fastabend <john.fastabend(a)gmail.com>
net: tls, add test to capture error on large splice
Gao Xiang <xiang(a)kernel.org>
erofs: avoid debugging output for (de)compressed data
Edward Adam Davis <eadavis(a)qq.com>
reiserfs: fix uninit-value in comp_keys
Phillip Lougher <phillip(a)squashfs.org.uk>
Squashfs: fix variable overflow triggered by sysbot
Lizhi Xu <lizhi.xu(a)windriver.com>
squashfs: squashfs_read_data need to check if the length is 0
Manas Ghandat <ghandatmanas(a)gmail.com>
jfs: fix shift-out-of-bounds in dbJoin
Jakub Kicinski <kuba(a)kernel.org>
net: don't dump stack on queue timeout
Lizhi Xu <lizhi.xu(a)windriver.com>
jfs: fix log->bdev_handle null ptr deref in lbmStartIO
Jan Kara <jack(a)suse.cz>
jfs: Convert to bdev_open_by_dev()
Jan Kara <jack(a)suse.cz>
fs: Convert to bdev_open_by_dev()
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: fix change_address deadlock during unregister
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: take wiphy lock for MAC addr change
Eric Dumazet <edumazet(a)google.com>
tcp_metrics: optimize tcp_metrics_flush_all()
Yafang Shao <laoar.shao(a)gmail.com>
cgroup: Make operations on the cgroup root_list RCU safe
Dongli Zhang <dongli.zhang(a)oracle.com>
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
David Stevens <stevensd(a)chromium.org>
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Yang Shi <yang(a)os.amperecomputing.com>
mm: gup: stop abusing try_grab_folio
Josef Bacik <josef(a)toxicpanda.com>
nfsd: make svc_stat per-network namespace instead of global
Josef Bacik <josef(a)toxicpanda.com>
nfsd: remove nfsd_stats, make th_cnt a global counter
Josef Bacik <josef(a)toxicpanda.com>
nfsd: make all of the nfsd stats per-network namespace
Josef Bacik <josef(a)toxicpanda.com>
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
Josef Bacik <josef(a)toxicpanda.com>
nfsd: rename NFSD_NET_* to NFSD_STATS_*
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: use the struct net as the svc proc private
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: remove ->pg_stats from svc_program
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: pass in the sv_stats struct through svc_create_pooled
Josef Bacik <josef(a)toxicpanda.com>
nfsd: stop setting ->pg_stats for unused stats
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: don't change ->sv_stats if it doesn't exist
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix frame size warning in svc_export_parse()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Fix route memory corruption
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Clean up route loading
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
-------------
Diffstat:
Documentation/bpf/map_lpm_trie.rst | 2 +-
Documentation/mm/page_table_check.rst | 9 +-
Makefile | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 12 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
arch/x86/include/asm/pgtable.h | 18 +-
drivers/isdn/mISDN/socket.c | 10 +-
drivers/net/ppp/pppoe.c | 23 +--
drivers/nvme/host/pci.c | 7 +
fs/binfmt_flat.c | 4 +-
fs/buffer.c | 2 +
fs/cramfs/inode.c | 2 +-
fs/erofs/decompressor.c | 8 +-
fs/exec.c | 8 +-
fs/ext4/inode.c | 24 ++-
fs/ext4/xattr.c | 155 +++++++-------
fs/f2fs/extent_cache.c | 50 ++---
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/gc.c | 10 +
fs/f2fs/inode.c | 10 +-
fs/fhandle.c | 6 +-
fs/jfs/jfs_dmap.c | 2 +
fs/jfs/jfs_dtree.c | 2 +
fs/jfs/jfs_logmgr.c | 33 +--
fs/jfs/jfs_logmgr.h | 2 +-
fs/jfs/jfs_mount.c | 3 +-
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 3 -
fs/nfsd/cache.h | 2 -
fs/nfsd/export.c | 32 ++-
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 ++-
fs/nfsd/nfs4proc.c | 6 +-
fs/nfsd/nfs4state.c | 3 +-
fs/nfsd/nfscache.c | 40 +---
fs/nfsd/nfsctl.c | 16 +-
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 14 +-
fs/nfsd/stats.c | 54 ++---
fs/nfsd/stats.h | 88 +++-----
fs/nfsd/vfs.c | 6 +-
fs/ntfs3/frecord.c | 75 ++++++-
fs/quota/quota_tree.c | 128 +++++++++---
fs/quota/quota_v2.c | 15 +-
fs/reiserfs/stree.c | 2 +-
fs/romfs/super.c | 2 +-
fs/squashfs/block.c | 2 +-
fs/squashfs/file.c | 3 +-
fs/squashfs/file_direct.c | 6 +-
fs/super.c | 15 +-
include/linux/cgroup-defs.h | 7 +-
include/linux/fs.h | 3 +-
include/linux/sockptr.h | 25 +++
include/linux/sunrpc/svc.h | 5 +-
include/uapi/linux/bpf.h | 19 +-
kernel/bpf/lpm_trie.c | 33 +--
kernel/cgroup/cgroup-internal.h | 3 +-
kernel/cgroup/cgroup.c | 23 ++-
kernel/irq/cpuhotplug.c | 27 ++-
kernel/irq/manage.c | 12 +-
mm/debug_vm_pgtable.c | 31 +--
mm/gup.c | 251 ++++++++++++-----------
mm/huge_memory.c | 6 +-
mm/hugetlb.c | 2 +-
mm/internal.h | 4 +-
mm/page_table_check.c | 30 +++
net/bluetooth/rfcomm/sock.c | 14 +-
net/core/filter.c | 8 +-
net/ipv4/fou_core.c | 2 +-
net/ipv4/tcp_metrics.c | 7 +-
net/mac80211/iface.c | 27 ++-
net/nfc/llcp_sock.c | 12 +-
net/rds/recv.c | 13 +-
net/sched/sch_generic.c | 5 +-
net/sctp/inqueue.c | 14 +-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 39 ++--
net/wireless/nl80211.c | 6 +-
samples/bpf/map_perf_test_user.c | 2 +-
samples/bpf/xdp_router_ipv4_user.c | 2 +-
sound/soc/soc-topology.c | 32 +--
sound/usb/mixer.c | 7 +
tools/include/uapi/linux/bpf.h | 19 +-
tools/testing/selftests/bpf/progs/map_ptr_kern.c | 2 +-
tools/testing/selftests/bpf/test_lpm_map.c | 18 +-
tools/testing/selftests/net/tls.c | 14 ++
87 files changed, 987 insertions(+), 696 deletions(-)
From: Arnd Bergmann <arnd(a)arndb.de>
Both csky and hexagon require u64 function arguments to be
passed in even/odd pairs of registers or stack slots, which in the case of
sync_file_range would result in a seven-argument system call that is
not currently possible. The system call is therefore incompatible with
all existing binaries.
While it would be possible to implement support for seven arguments
like on mips, it seems better to use a six-argument version, either
with the normal argument order but misaligned as on most architectures
or with the reordered sync_file_range2() calling conventions as on
arm and powerpc.
Cc: stable(a)vger.kernel.org
Acked-by: Guo Ren <guoren(a)kernel.org>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
arch/csky/include/uapi/asm/unistd.h | 1 +
arch/hexagon/include/uapi/asm/unistd.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/csky/include/uapi/asm/unistd.h b/arch/csky/include/uapi/asm/unistd.h
index 7ff6a2466af1..e0594b6370a6 100644
--- a/arch/csky/include/uapi/asm/unistd.h
+++ b/arch/csky/include/uapi/asm/unistd.h
@@ -6,6 +6,7 @@
#define __ARCH_WANT_SYS_CLONE3
#define __ARCH_WANT_SET_GET_RLIMIT
#define __ARCH_WANT_TIME32_SYSCALLS
+#define __ARCH_WANT_SYNC_FILE_RANGE2
#include <asm-generic/unistd.h>
#define __NR_set_thread_area (__NR_arch_specific_syscall + 0)
diff --git a/arch/hexagon/include/uapi/asm/unistd.h b/arch/hexagon/include/uapi/asm/unistd.h
index 432c4db1b623..21ae22306b5d 100644
--- a/arch/hexagon/include/uapi/asm/unistd.h
+++ b/arch/hexagon/include/uapi/asm/unistd.h
@@ -36,5 +36,6 @@
#define __ARCH_WANT_SYS_VFORK
#define __ARCH_WANT_SYS_FORK
#define __ARCH_WANT_TIME32_SYSCALLS
+#define __ARCH_WANT_SYNC_FILE_RANGE2
#include <asm-generic/unistd.h>
--
2.39.2
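The seven-argument problem can be checked with a little arithmetic. The sketch below assumes a 32-bit ABI where every argument occupies 32-bit slots and every 64-bit argument must start at an even slot, with a padding slot inserted otherwise. count_arg_slots() is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the 32-bit argument slots needed on an ABI (assumed here) that
 * places each 64-bit argument in an even/odd slot pair. sizes[] holds
 * each argument's size in bytes (4 or 8).
 */
static int count_arg_slots(const int *sizes, size_t nargs)
{
	int slot = 0;

	for (size_t i = 0; i < nargs; i++) {
		assert(sizes[i] == 4 || sizes[i] == 8);
		if (sizes[i] == 8) {
			if (slot % 2)
				slot++;		/* padding slot for alignment */
			slot += 2;
		} else {
			slot += 1;
		}
	}
	return slot;
}

/* sync_file_range(int fd, loff_t offset, loff_t nbytes, unsigned int flags) */
static const int sfr_args[] = { 4, 8, 8, 4 };
/* sync_file_range2(int fd, unsigned int flags, loff_t offset, loff_t nbytes) */
static const int sfr2_args[] = { 4, 4, 8, 8 };
```

Under that assumption sync_file_range needs 7 slots (fd, padding, two for offset, two for nbytes, flags), whereas the reordered sync_file_range2 packs into 6, which is why selecting __ARCH_WANT_SYNC_FILE_RANGE2 avoids the problem.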
No upstream commit exists for this commit.
Fuzzing of 5.10 stable branch reports a slab-out-of-bounds error in
ata_scsi_pass_thru.
The error is fixed in 5.18 by commit ce70fd9a551a ("scsi: core: Remove the
cmd field from struct scsi_request") upstream.
Backporting this commit would require significant changes to the code, so
it is better to use a simple fix for this particular error.
The problem is that the length of the received SCSI command is not
validated if scsi_op == VARIABLE_LENGTH_CMD. This can lead to an
out-of-bounds read if the user sends a request with a SCSI command
shorter than 32 bytes.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Acked-by: Damien Le Moal <dlemoal(a)kernel.org>
Co-developed-by: Mikhail Ivanov <iwanov-23(a)bk.ru>
Signed-off-by: Mikhail Ivanov <iwanov-23(a)bk.ru>
Co-developed-by: Mikhail Ukhin <mish.uxin2012(a)yandex.ru>
Signed-off-by: Mikhail Ukhin <mish.uxin2012(a)yandex.ru>
Signed-off-by: Artem Sadovnikov <ancowi69(a)gmail.com>
---
Link: https://lore.kernel.org/lkml/20240711151546.341491-1-ancowi69@gmail.com/T/#u
Unfortunately, stable(a)vger.kernel.org wasn't Cc'd initially.
drivers/ata/libata-scsi.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 36f32fa052df..4397986db053 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3949,6 +3949,9 @@ static unsigned int ata_scsi_var_len_cdb_xlat(struct ata_queued_cmd *qc)
const u8 *cdb = scmd->cmnd;
const u16 sa = get_unaligned_be16(&cdb[8]);
+ if (scmd->cmd_len != 32)
+ return 1;
+
/*
* if service action represents a ata pass-thru(32) command,
* then pass it to ata_scsi_pass_thru handler.
--
2.34.1
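A minimal sketch of the length check, assuming a hypothetical var_len_cdb_xlat() helper that accepts only 32-byte variable-length CDBs as the patch does; load_be16() stands in for get_unaligned_be16(). This sketch validates the length before it reads the service action from bytes 8 and 9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VAR_LEN_CDB_SIZE 32	/* length required by the check in the patch */

/* Big-endian 16-bit load, similar in spirit to get_unaligned_be16(). */
static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical translation step: returns 1 (reject) when the CDB is not
 * exactly 32 bytes long, otherwise stores the service action in *sa and
 * returns 0.
 */
static unsigned int var_len_cdb_xlat(const uint8_t *cdb, size_t cmd_len,
				     uint16_t *sa)
{
	assert(cdb != NULL && sa != NULL);

	if (cmd_len != VAR_LEN_CDB_SIZE)
		return 1;
	*sa = load_be16(&cdb[8]);
	return 0;
}
```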
Although there are several patches improving the extent map shrinker,
there are still reports of the shrinker running too frequently and taking
too much CPU time in the kswapd process.
So let's only enable the extent map shrinker for DEBUG builds for now,
until we get a more comprehensive understanding and a better solution.
Link: https://lore.kernel.org/linux-btrfs/3df4acd616a07ef4d2dc6bad668701504b412ff…
Link: https://lore.kernel.org/linux-btrfs/c30fd6b3-ca7a-4759-8a53-d42878bf84f7@gm…
Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps")
CC: stable(a)vger.kernel.org # 6.10+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
I also checked how XFS (the only other fs that implements the
free_cached_objects callback) implements the callback.
It does two things:
- Makes sure there is only one queued reclaim
Currently we only do the reclaim for kswapd, but on multi-node
systems we can still have multiple kswapd processes.
But I do not think that's the root cause.
- Adds an extra delay of 60% of xfs_syncd_centisecs
The default value for xfs_syncd_centisecs is 3000 centiseconds (30s),
with a minimum of 100 centiseconds (1s).
This results in the reclaim work being executed at most once every 18
seconds by default (or every 0.6s at the minimal interval).
I believe this is the root cause: we have no extra delay, which makes
btrfs shrink extent maps too frequently.
---
fs/btrfs/super.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 11044e9e2cb1..98fa0f382480 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2402,7 +2402,13 @@ static long btrfs_nr_cached_objects(struct super_block *sb, struct shrink_contro
trace_btrfs_extent_map_shrinker_count(fs_info, nr);
- return nr;
+ /*
+ * Only report the real number for DEBUG builds, as there are reports of
+ * serious performance degradation caused by too frequent shrinks.
+ */
+ if (IS_ENABLED(CONFIG_BTRFS_DEBUG))
+ return nr;
+ return 0;
}
static long btrfs_free_cached_objects(struct super_block *sb, struct shrink_control *sc)
--
2.46.0
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in a simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents are bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
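The counting scheme can be illustrated in userspace C11. This is a simplified model, not the vmwgfx code: sketch_bo, sketch_bo_map() and sketch_bo_unmap() are hypothetical, malloc()/free() stand in for creating and tearing down the kernel mapping (ttm_bo_kunmap() in the patch), and the sketch does not attempt to serialize concurrent creation of the mapping itself. Replaying steps a) to f) shows the mapping surviving step d).

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer object with a cached mapping. */
struct sketch_bo {
	atomic_int map_count;   /* number of currently active map users */
	void *mapping;          /* stands in for the kernel kmap */
};

static void *sketch_bo_map(struct sketch_bo *bo)
{
	atomic_fetch_add(&bo->map_count, 1);
	if (!bo->mapping) {
		bo->mapping = malloc(64);   /* pretend to create the mapping */
		if (!bo->mapping) {
			/* Undo the count so a failed map does not leak a user. */
			atomic_fetch_sub(&bo->map_count, 1);
			return NULL;
		}
	}
	return bo->mapping;
}

static void sketch_bo_unmap(struct sketch_bo *bo)
{
	if (!bo->mapping)
		return;
	/* atomic_fetch_sub returns the old value: 1 means last user. */
	if (atomic_fetch_sub(&bo->map_count, 1) == 1) {
		free(bo->mapping);
		bo->mapping = NULL;
	}
}
```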
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++++++++++--
drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 +++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, &not_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.43.0
Casting to u64 before the shift is done everywhere else in the cxgb4
code, e.g. in is_filter_exact_match(). There is no reason it should not
be done here.
Found by Linux Verification Center (linuxtesting.org) with SVACE
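Why the cast matters can be shown with plain integer arithmetic. In this sketch the field value and shift amount are made up for illustration and do not correspond to a real cxgb4 tuple layout; it assumes a platform where unsigned int is 32 bits, so the uncast shift is evaluated in 32-bit arithmetic and the high bits are lost before the result is widened to 64 bits.

```c
#include <stdint.h>

/* field stands in for (FT_VLAN_VLD_F | fs->val.ivlan); values are hypothetical. */
static uint64_t pack_no_cast(uint32_t field, unsigned int shift)
{
	/* Shift happens in 32 bits; bits above bit 31 are discarded. */
	return field << shift;
}

static uint64_t pack_with_cast(uint32_t field, unsigned int shift)
{
	/* Widen first, so the shift happens in 64 bits. */
	return (uint64_t)field << shift;
}
```

For example, pack_no_cast(0x1ABCD, 24) yields 0xCD000000, while pack_with_cast(0x1ABCD, 24) keeps the full 0x1ABCD000000.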
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
Cc: stable(a)vger.kernel.org
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index 786ceae34488..e417ff0ea06c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -1244,7 +1244,7 @@ static u64 hash_filter_ntuple(struct ch_filter_specification *fs,
* in the Compressed Filter Tuple.
*/
if (tp->vlan_shift >= 0 && fs->mask.ivlan)
- ntuple |= (FT_VLAN_VLD_F | fs->val.ivlan) << tp->vlan_shift;
+ ntuple |= (u64)(FT_VLAN_VLD_F | fs->val.ivlan) << tp->vlan_shift;
if (tp->port_shift >= 0 && fs->mask.iport)
ntuple |= (u64)fs->val.iport << tp->port_shift;
--
2.34.1
From: Steven Rostedt <rostedt(a)goodmis.org>
When running the following:
# cd /sys/kernel/tracing/
# echo 1 > events/sched/sched_waking/enable
# echo 1 > events/sched/sched_switch/enable
# echo 0 > tracing_on
# dd if=per_cpu/cpu0/trace_pipe_raw of=/tmp/raw0.dat
The dd task would get stuck in an infinite loop in the kernel. What would
happen is the following:
When ring_buffer_read_page() returns -1 (no data), a check is made to see
if the buffer is empty (as happens when the page is not full). If it is,
wait_on_pipe() is called to wait until the ring buffer has data. When
data arrives, the read is tried again (unless O_NONBLOCK is set).
The issue happens when there's a reader and the file descriptor is closed.
In that case wait_on_pipe() returns immediately, but the loop tries again,
wait_on_pipe() again returns immediately, and the loop never stops.
Simply check if the file was closed before looping and exit out if it is.
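The loop and the fix can be modelled in userspace C. This is a simplified sketch, not the tracing code: sketch_iter, sketch_wait_on_pipe() and sketch_buffers_read() are hypothetical, and the wait is modelled as returning immediately once the file is closed and otherwise pretending a writer produced one event. Without the !it->closed test, a closed, empty iterator would spin forever.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the iterator state used by the read loop. */
struct sketch_iter {
	bool closed;    /* set when the reader's file is being released */
	int pending;    /* number of events available to read */
	int waits;      /* how many times the loop waited */
};

/* Returns at once if closed; otherwise pretends one event arrived. */
static void sketch_wait_on_pipe(struct sketch_iter *it)
{
	it->waits++;
	if (!it->closed)
		it->pending++;
}

/* Returns 1 if an event was read, 0 if done, -EAGAIN for non-blocking readers. */
static int sketch_buffers_read(struct sketch_iter *it, bool nonblock)
{
	for (;;) {
		if (it->pending > 0) {
			it->pending--;
			return 1;
		}
		/* The fix: only wait and retry if the file is still open. */
		if (!it->closed) {
			if (nonblock)
				return -EAGAIN;
			sketch_wait_on_pipe(it);
			continue;
		}
		return 0;
	}
}
```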
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Link: https://lore.kernel.org/20240808235730.78bf63e5@rorschach.local.home
Fixes: 2aa043a55b9a7 ("tracing/ring-buffer: Fix wait_on_pipe() race")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 10cd38bce2f1..ebe7ce2f5f4a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7956,7 +7956,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
trace_access_unlock(iter->cpu_file);
if (ret < 0) {
- if (trace_empty(iter)) {
+ if (trace_empty(iter) && !iter->closed) {
if ((filp->f_flags & O_NONBLOCK))
return -EAGAIN;
--
2.43.0
This is the start of the stable review cycle for the 6.10.6 release.
There are 22 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.6-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.10.6-rc1
Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
drm/amdgpu/display: Fix null pointer dereference in dc_stream_program_cursor_position
Wayne Lin <Wayne.Lin(a)amd.com>
drm/amd/display: Solve mst monitors blank out problem after resume
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: introduce a generic notification chain
Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
fs/ntfs3: Do copy_to_user out of run_lock
Pei Li <peili.dev(a)gmail.com>
jfs: Fix shift-out-of-bounds in dbDiscardAG
Edward Adam Davis <eadavis(a)qq.com>
jfs: fix null ptr deref in dtInsertEntry
Willem de Bruijn <willemb(a)google.com>
fou: remove warn in gue_gro_receive on unsupported protocol
Chao Yu <chao(a)kernel.org>
f2fs: fix to cover read extent cache access with lock
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
yunshui <jiangyunshui(a)kylinos.cn>
bpf, net: Use DEV_STAT_INC()
Simon Trimmer <simont(a)opensource.cirrus.com>
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Fangzhi Zuo <jerry.zuo(a)amd.com>
drm/amd/display: Prevent IPX From Link Detect and Set Mode
Harry Wentland <harry.wentland(a)amd.com>
drm/amd/display: Separate setting and programming of cursor
Wayne Lin <wayne.lin(a)amd.com>
drm/amd/display: Defer handling mst up request in resume
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
-------------
Diffstat:
Makefile | 4 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +-
.../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 +-
drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 94 ++++++++-----
drivers/gpu/drm/amd/display/dc/dc_stream.h | 8 ++
.../drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 2 +-
drivers/nvme/host/pci.c | 7 +
drivers/platform/x86/Kconfig | 1 +
drivers/platform/x86/amd/pmf/spc.c | 32 ++---
drivers/platform/x86/ideapad-laptop.c | 148 ++++++++++++++++++---
drivers/platform/x86/ideapad-laptop.h | 9 ++
drivers/platform/x86/lenovo-ymc.c | 60 +--------
fs/binfmt_flat.c | 4 +-
fs/exec.c | 8 +-
fs/f2fs/extent_cache.c | 50 +++----
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/gc.c | 10 ++
fs/f2fs/inode.c | 10 +-
fs/jfs/jfs_dmap.c | 2 +
fs/jfs/jfs_dtree.c | 2 +
fs/ntfs3/frecord.c | 75 ++++++++++-
net/core/filter.c | 8 +-
net/ipv4/fou_core.c | 2 +-
sound/soc/codecs/cs35l56-shared.c | 1 +
sound/usb/mixer.c | 7 +
26 files changed, 388 insertions(+), 179 deletions(-)
mwifiex_band_2ghz and mwifiex_band_5ghz are statically allocated but are
used and modified by driver instances. Duplicate them before using them
in driver instances so that different driver instances do not influence
each other.
This was observed on a board which has one PCIe and one SDIO mwifiex
adapter. It blew up in mwifiex_setup_ht_caps(), which was called with the
statically allocated struct and modifies it.
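The "duplicate a shared template per instance" idea can be sketched in userspace C. The struct and names below are hypothetical simplifications, not struct ieee80211_supported_band; malloc() plus memcpy() stand in for devm_kmemdup(), so the copies here must be freed manually rather than being tied to the device lifetime.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified band description. */
struct sketch_band {
	int n_channels;
	unsigned int ht_cap;   /* per-instance capability bits */
};

/* Shared template; const so accidental writes fail to compile. */
static const struct sketch_band sketch_band_2ghz = { .n_channels = 14, .ht_cap = 0 };

/* Give each driver instance its own writable copy of the template. */
static struct sketch_band *sketch_band_dup(const struct sketch_band *tmpl)
{
	struct sketch_band *copy = malloc(sizeof(*copy));

	if (copy)
		memcpy(copy, tmpl, sizeof(*copy));
	return copy;
}
```

Note that this is a shallow copy: fields stored in the struct itself become per-instance, but any arrays the struct merely points to would still be shared between instances.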
Cc: stable(a)vger.kernel.org
Fixes: d6bffe8bb520 ("mwifiex: support for creation of AP interface")
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 32 ++++++++++++++++++++-----
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index b909a7665e9cc..d2e4153192032 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -4361,11 +4361,27 @@ int mwifiex_register_cfg80211(struct mwifiex_adapter *adapter)
if (ISSUPP_ADHOC_ENABLED(adapter->fw_cap_info))
wiphy->interface_modes |= BIT(NL80211_IFTYPE_ADHOC);
- wiphy->bands[NL80211_BAND_2GHZ] = &mwifiex_band_2ghz;
- if (adapter->config_bands & BAND_A)
- wiphy->bands[NL80211_BAND_5GHZ] = &mwifiex_band_5ghz;
- else
+ wiphy->bands[NL80211_BAND_2GHZ] = devm_kmemdup(adapter->dev,
+ &mwifiex_band_2ghz,
+ sizeof(mwifiex_band_2ghz),
+ GFP_KERNEL);
+ if (!wiphy->bands[NL80211_BAND_2GHZ]) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ if (adapter->config_bands & BAND_A) {
+ wiphy->bands[NL80211_BAND_5GHZ] = devm_kmemdup(adapter->dev,
+ &mwifiex_band_5ghz,
+ sizeof(mwifiex_band_5ghz),
+ GFP_KERNEL);
+ if (!wiphy->bands[NL80211_BAND_5GHZ]) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ } else {
wiphy->bands[NL80211_BAND_5GHZ] = NULL;
+ }
if (adapter->drcs_enabled && ISSUPP_DRCS_ENABLED(adapter->fw_cap_info))
wiphy->iface_combinations = &mwifiex_iface_comb_ap_sta_drcs;
@@ -4459,8 +4475,7 @@ int mwifiex_register_cfg80211(struct mwifiex_adapter *adapter)
if (ret < 0) {
mwifiex_dbg(adapter, ERROR,
"%s: wiphy_register failed: %d\n", __func__, ret);
- wiphy_free(wiphy);
- return ret;
+ goto err;
}
if (!adapter->regd) {
@@ -4502,4 +4517,9 @@ int mwifiex_register_cfg80211(struct mwifiex_adapter *adapter)
adapter->wiphy = wiphy;
return ret;
+
+err:
+ wiphy_free(wiphy);
+
+ return ret;
}
---
base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
change-id: 20240809-mwifiex-duplicate-static-structs-f6355e8da797
Best regards,
--
Sascha Hauer <s.hauer(a)pengutronix.de>
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: dfb3911c3692e45b027f13c7dca3230921533953
Gitweb: https://git.kernel.org/tip/dfb3911c3692e45b027f13c7dca3230921533953
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Wed, 14 Aug 2024 00:29:36 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 16 Aug 2024 11:33:33 +02:00
x86/kaslr: Expose and use the end of the physical memory address space
iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this
happens due to KASLR.
KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions. It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory. This limitation is done to gain more randomization space
because otherwise only the holes between the direct map, vmalloc,
vmemmap and vaddr_end would be usable for randomizing.
The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.
request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of the
direct map and if unlucky this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.
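The first candidate of a top-down search can be worked through numerically. This sketch models the GFR_DESCENDING branch of gfr_start() as it appears in the diff (clamp the base end to a limit, then step back by the size); the 64 GiB trimmed limit is a hypothetical KASLR result, 46 is the x86-64 4-level MAX_PHYSMEM_BITS, and the base resource is assumed to span the whole address space.

```c
#include <stdint.h>

/* First candidate start for a descending search, modelled on gfr_start(). */
static uint64_t sketch_first_candidate(uint64_t base_end, uint64_t size,
				       uint64_t limit)
{
	uint64_t end = base_end < limit ? base_end : limit;

	return end - size + 1;
}
```

With the old limit (1 << 46) - 1, a 1 GiB request starts at 2^46 - 2^30, far above a direct map trimmed to 64 GiB; with the trimmed limit it starts at 2^36 - 2^30, inside the direct-map range.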
MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually need to know the maximum number of address bits. All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.
Cure this by exposing the end of the direct map via PHYSMEM_END and using
that for the memory hot-plug and resource management related places
instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END
maps to a variable which is initialized by the KASLR initialization and
otherwise it is based on MAX_PHYSMEM_BITS as before.
To prevent future hiccups, add a check to add_pages() to catch callers
trying to add memory above PHYSMEM_END.
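The arithmetic behind the new checks can be verified in isolation. This userspace sketch mirrors the expressions from the diff (the default PHYSMEM_END, the add_pages() range check and the mm/sparse.c limit); the function names are hypothetical, PAGE_SHIFT is assumed to be 12, and the 64 GiB trimmed limit is an illustrative value, not one KASLR is guaranteed to pick.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumed 4 KiB pages */

/* Default PHYSMEM_END when the architecture does not override it. */
static uint64_t sketch_default_physmem_end(unsigned int max_physmem_bits)
{
	return (1ULL << max_physmem_bits) - 1;
}

/* Mirrors the add_pages() check: the last byte must not exceed the limit. */
static bool sketch_add_pages_ok(uint64_t start_pfn, uint64_t nr_pages,
				uint64_t physmem_end)
{
	uint64_t end = ((start_pfn + nr_pages) << SKETCH_PAGE_SHIFT) - 1;

	return end <= physmem_end;
}

/* Mirrors the new max_sparsemem_pfn computation in mm/sparse.c. */
static uint64_t sketch_max_sparsemem_pfn(uint64_t physmem_end)
{
	return (physmem_end + 1) >> SKETCH_PAGE_SHIFT;
}
```

With the default limit, (PHYSMEM_END + 1) >> PAGE_SHIFT equals the old 1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT), so the sparse limit only changes when KASLR trims the direct map.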
Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions")
Reported-by: Max Ramanouski <max8rr8(a)gmail.com>
Reported-by: Alistair Popple <apopple(a)nvidia.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-By: Max Ramanouski <max8rr8(a)gmail.com>
Tested-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Kees Cook <kees(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
---
arch/x86/include/asm/page_64.h | 1 +-
arch/x86/include/asm/pgtable_64_types.h | 4 ++++-
arch/x86/mm/init_64.c | 4 ++++-
arch/x86/mm/kaslr.c | 26 +++++++++++++++++++++---
include/linux/mm.h | 4 ++++-
kernel/resource.c | 6 ++----
mm/memory_hotplug.c | 2 +-
mm/sparse.c | 2 +-
8 files changed, 40 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index af4302d..f3d257c 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -17,6 +17,7 @@ extern unsigned long phys_base;
extern unsigned long page_offset_base;
extern unsigned long vmalloc_base;
extern unsigned long vmemmap_base;
+extern unsigned long physmem_end;
static __always_inline unsigned long __phys_addr_nodebug(unsigned long x)
{
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 9053dfe..a98e534 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -140,6 +140,10 @@ extern unsigned int ptrs_per_p4d;
# define VMEMMAP_START __VMEMMAP_BASE_L4
#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
+#ifdef CONFIG_RANDOMIZE_MEMORY
+# define PHYSMEM_END physmem_end
+#endif
+
/*
* End of the region for which vmalloc page tables are pre-allocated.
* For non-KMSAN builds, this is the same as VMALLOC_END.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index d8dbeac..ff25364 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -958,8 +958,12 @@ static void update_end_of_memory_vars(u64 start, u64 size)
int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
struct mhp_params *params)
{
+ unsigned long end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;
int ret;
+ if (WARN_ON_ONCE(end > PHYSMEM_END))
+ return -ERANGE;
+
ret = __add_pages(nid, start_pfn, nr_pages, params);
WARN_ON_ONCE(ret);
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 37db264..0f2a3a4 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -47,13 +47,24 @@ static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
*/
static __initdata struct kaslr_memory_region {
unsigned long *base;
+ unsigned long *end;
unsigned long size_tb;
} kaslr_regions[] = {
- { &page_offset_base, 0 },
- { &vmalloc_base, 0 },
- { &vmemmap_base, 0 },
+ {
+ .base = &page_offset_base,
+ .end = &physmem_end,
+ },
+ {
+ .base = &vmalloc_base,
+ },
+ {
+ .base = &vmemmap_base,
+ },
};
+/* The end of the possible address space for physical memory */
+unsigned long physmem_end __ro_after_init;
+
/* Get size in bytes used by the memory region */
static inline unsigned long get_padding(struct kaslr_memory_region *region)
{
@@ -82,6 +93,8 @@ void __init kernel_randomize_memory(void)
BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE);
BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
+ /* Preset the end of the possible address space for physical memory */
+ physmem_end = ((1ULL << MAX_PHYSMEM_BITS) - 1);
if (!kaslr_memory_enabled())
return;
@@ -134,6 +147,13 @@ void __init kernel_randomize_memory(void)
*/
vaddr += get_padding(&kaslr_regions[i]);
vaddr = round_up(vaddr + 1, PUD_SIZE);
+
+ /*
+ * KASLR trims the maximum possible size of the
+ * direct-map. Update the physmem_end boundary.
+ */
+ if (kaslr_regions[i].end)
+ *kaslr_regions[i].end = __pa(vaddr) - 1;
remain_entropy -= entropy;
}
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c4b238a..b386415 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -97,6 +97,10 @@ extern const int mmap_rnd_compat_bits_max;
extern int mmap_rnd_compat_bits __read_mostly;
#endif
+#ifndef PHYSMEM_END
+# define PHYSMEM_END ((1ULL << MAX_PHYSMEM_BITS) - 1)
+#endif
+
#include <asm/page.h>
#include <asm/processor.h>
diff --git a/kernel/resource.c b/kernel/resource.c
index 14777af..a83040f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1826,8 +1826,7 @@ static resource_size_t gfr_start(struct resource *base, resource_size_t size,
if (flags & GFR_DESCENDING) {
resource_size_t end;
- end = min_t(resource_size_t, base->end,
- (1ULL << MAX_PHYSMEM_BITS) - 1);
+ end = min_t(resource_size_t, base->end, PHYSMEM_END);
return end - size + 1;
}
@@ -1844,8 +1843,7 @@ static bool gfr_continue(struct resource *base, resource_size_t addr,
* @size did not wrap 0.
*/
return addr > addr - size &&
- addr <= min_t(resource_size_t, base->end,
- (1ULL << MAX_PHYSMEM_BITS) - 1);
+ addr <= min_t(resource_size_t, base->end, PHYSMEM_END);
}
static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 66267c2..951878a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1681,7 +1681,7 @@ struct range __weak arch_get_mappable_range(void)
struct range mhp_get_pluggable_range(bool need_mapping)
{
- const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1;
+ const u64 max_phys = PHYSMEM_END;
struct range mhp_range;
if (need_mapping) {
diff --git a/mm/sparse.c b/mm/sparse.c
index e4b8300..0c3bff8 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -129,7 +129,7 @@ static inline int sparse_early_nid(struct mem_section *section)
static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
unsigned long *end_pfn)
{
- unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+ unsigned long max_sparsemem_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;
/*
* Sanity checks - do not allow an architecture to pass
The quilt patch titled
Subject: alloc_tag: mark pages reserved during CMA activation as not tagged
has been removed from the -mm tree. Its filename was
alloc_tag-mark-pages-reserved-during-cma-activation-as-not-tagged.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: alloc_tag: mark pages reserved during CMA activation as not tagged
Date: Tue, 13 Aug 2024 08:07:57 -0700
During CMA activation, pages in the CMA area are prepared and then freed
without being allocated. This triggers warnings when the memory allocation
debug config (CONFIG_MEM_ALLOC_PROFILING_DEBUG) is enabled. Fix this by
marking these pages as not tagged before freeing them.
Link: https://lkml.kernel.org/r/20240813150758.855881-2-surenb@google.com
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Sourav Panda <souravpanda(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mm_init.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/mm_init.c~alloc_tag-mark-pages-reserved-during-cma-activation-as-not-tagged
+++ a/mm/mm_init.c
@@ -2244,6 +2244,8 @@ void __init init_cma_reserved_pageblock(
set_pageblock_migratetype(page, MIGRATE_CMA);
set_page_refcounted(page);
+ /* pages were reserved and not allocated */
+ clear_page_tag_ref(page);
__free_pages(page, pageblock_order);
adjust_managed_page_count(page, pageblock_nr_pages);
_
Patches currently in -mm which might be from surenb(a)google.com are
The quilt patch titled
Subject: alloc_tag: introduce clear_page_tag_ref() helper function
has been removed from the -mm tree. Its filename was
alloc_tag-introduce-clear_page_tag_ref-helper-function.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: alloc_tag: introduce clear_page_tag_ref() helper function
Date: Tue, 13 Aug 2024 08:07:56 -0700
In several cases we are freeing pages which were not allocated using
common page allocators. For such cases, in order to keep allocation
accounting correct, we should clear the page tag to indicate that the page
being freed is expected to not have a valid allocation tag. Introduce
clear_page_tag_ref() helper function to be used for this.
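The shape of the helper (check that profiling is enabled, take the tag reference, mark it empty, drop the reference) can be modelled outside the kernel. Everything below is a hypothetical userspace stand-in: sketch_page, the get/put counters and the SKETCH_TAG_EMPTY sentinel are not kernel definitions, and the NULL check on the reference only mirrors the structure of the real helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "freed without a valid allocation tag". */
#define SKETCH_TAG_EMPTY ((const void *)1)

/* Hypothetical page model: a tag slot plus get/put bookkeeping. */
struct sketch_page {
	const void *tag;
	int gets;
	int puts;
};

static bool sketch_profiling_enabled = true;

static const void **sketch_get_tag_ref(struct sketch_page *page)
{
	page->gets++;
	return &page->tag;
}

static void sketch_put_tag_ref(struct sketch_page *page)
{
	page->puts++;
}

/* One helper replaces the open-coded block at each call site. */
static void sketch_clear_page_tag_ref(struct sketch_page *page)
{
	if (sketch_profiling_enabled) {
		const void **ref = sketch_get_tag_ref(page);

		if (ref) {
			*ref = SKETCH_TAG_EMPTY;
			sketch_put_tag_ref(page);
		}
	}
}
```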
Link: https://lkml.kernel.org/r/20240813150758.855881-1-surenb@google.com
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Sourav Panda <souravpanda(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pgalloc_tag.h | 13 +++++++++++++
mm/mm_init.c | 10 +---------
mm/page_alloc.c | 9 +--------
3 files changed, 15 insertions(+), 17 deletions(-)
--- a/include/linux/pgalloc_tag.h~alloc_tag-introduce-clear_page_tag_ref-helper-function
+++ a/include/linux/pgalloc_tag.h
@@ -43,6 +43,18 @@ static inline void put_page_tag_ref(unio
page_ext_put(page_ext_from_codetag_ref(ref));
}
+static inline void clear_page_tag_ref(struct page *page)
+{
+ if (mem_alloc_profiling_enabled()) {
+ union codetag_ref *ref = get_page_tag_ref(page);
+
+ if (ref) {
+ set_codetag_empty(ref);
+ put_page_tag_ref(ref);
+ }
+ }
+}
+
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr)
{
@@ -126,6 +138,7 @@ static inline void pgalloc_tag_sub_pages
static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
static inline void put_page_tag_ref(union codetag_ref *ref) {}
+static inline void clear_page_tag_ref(struct page *page) {}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr) {}
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
--- a/mm/mm_init.c~alloc_tag-introduce-clear_page_tag_ref-helper-function
+++ a/mm/mm_init.c
@@ -2459,15 +2459,7 @@ void __init memblock_free_pages(struct p
}
/* pages were reserved and not allocated */
- if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- if (ref) {
- set_codetag_empty(ref);
- put_page_tag_ref(ref);
- }
- }
-
+ clear_page_tag_ref(page);
__free_pages_core(page, order, MEMINIT_EARLY);
}
--- a/mm/page_alloc.c~alloc_tag-introduce-clear_page_tag_ref-helper-function
+++ a/mm/page_alloc.c
@@ -5815,14 +5815,7 @@ unsigned long free_reserved_area(void *s
void free_reserved_page(struct page *page)
{
- if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- if (ref) {
- set_codetag_empty(ref);
- put_page_tag_ref(ref);
- }
- }
+ clear_page_tag_ref(page);
ClearPageReserved(page);
init_page_count(page);
__free_page(page);
_
Patches currently in -mm which might be from surenb(a)google.com are
The quilt patch titled
Subject: selftests: memfd_secret: don't build memfd_secret test on unsupported arches
has been removed from the -mm tree. Its filename was
selftests-memfd_secret-dont-build-memfd_secret-test-on-unsupported-arches.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: selftests: memfd_secret: don't build memfd_secret test on unsupported arches
Date: Fri, 9 Aug 2024 12:56:42 +0500
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding a condition so that memfd_secret is only compiled on
supported architectures.
Also check in the run_vmtests script whether the memfd_secret binary is
present before executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/Makefile | 2 ++
tools/testing/selftests/mm/run_vmtests.sh | 3 +++
2 files changed, 5 insertions(+)
--- a/tools/testing/selftests/mm/Makefile~selftests-memfd_secret-dont-build-memfd_secret-test-on-unsupported-arches
+++ a/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
--- a/tools/testing/selftests/mm/run_vmtests.sh~selftests-memfd_secret-dont-build-memfd_secret-test-on-unsupported-arches
+++ a/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
selftests-mm-fix-build-errors-on-armhf.patch
The quilt patch titled
Subject: mm: fix endless reclaim on machines with unaccepted memory
has been removed from the -mm tree. Its filename was
mm-fix-endless-reclaim-on-machines-with-unaccepted-memory.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm: fix endless reclaim on machines with unaccepted memory
Date: Fri, 9 Aug 2024 14:48:47 +0300
Unaccepted memory is considered unusable free memory, which is not counted
as free on the zone watermark check. This causes get_page_from_freelist()
to accept more memory to hit the high watermark, but it creates problems
in the reclaim path.
The reclaim path encounters a failed zone watermark check and attempts to
reclaim memory. This is usually successful, but if there is little or no
reclaimable memory, it can result in endless reclaim with little to no
progress. This can occur early in the boot process, just after the start of
the init process when the only reclaimable memory is the page cache of the
init executable and its libraries.
Make unaccepted memory count as free from the watermark check's point of
view. This way unaccepted memory will never trigger memory reclaim. Accept
more memory in get_page_from_freelist() if needed.
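A worked example shows why the old accounting could fail the check even with plenty of free memory. This is a deliberately simplified model, not __zone_watermark_ok(): it assumes, as the description says, that unaccepted pages are part of the free count, and ignores lowmem reserves, CMA and order checks. The page counts are made up.

```c
#include <stdbool.h>

/*
 * Simplified watermark test: free_pages includes unaccepted pages;
 * count_unaccepted_as_unusable selects the old behaviour.
 */
static bool sketch_watermark_ok(long free_pages, long unaccepted, long mark,
				bool count_unaccepted_as_unusable)
{
	long usable = free_pages;

	if (count_unaccepted_as_unusable)
		usable -= unaccepted;
	return usable > mark;
}
```

With 1000 free pages of which 900 are unaccepted and a mark of 200, the old rule sees only 100 usable pages and fails, which drives reclaim even though the 900 pages could simply be accepted; the new rule passes, and in the patch accepting memory is handled separately by cond_accept_memory().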
Link: https://lkml.kernel.org/r/20240809114854.3745464-2-kirill.shutemov@linux.in…
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reported-by: Jianxiong Gao <jxgao(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Jianxiong Gao <jxgao(a)google.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 42 ++++++++++++++++++++----------------------
1 file changed, 20 insertions(+), 22 deletions(-)
--- a/mm/page_alloc.c~mm-fix-endless-reclaim-on-machines-with-unaccepted-memory
+++ a/mm/page_alloc.c
@@ -287,7 +287,7 @@ EXPORT_SYMBOL(nr_online_nodes);
static bool page_contains_unaccepted(struct page *page, unsigned int order);
static void accept_page(struct page *page, unsigned int order);
-static bool try_to_accept_memory(struct zone *zone, unsigned int order);
+static bool cond_accept_memory(struct zone *zone, unsigned int order);
static inline bool has_unaccepted_memory(void);
static bool __free_unaccepted(struct page *page);
@@ -3072,9 +3072,6 @@ static inline long __zone_watermark_unus
if (!(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
-#ifdef CONFIG_UNACCEPTED_MEMORY
- unusable_free += zone_page_state(z, NR_UNACCEPTED);
-#endif
return unusable_free;
}
@@ -3368,6 +3365,8 @@ retry:
}
}
+ cond_accept_memory(zone, order);
+
/*
* Detect whether the number of free pages is below high
* watermark. If so, we will decrease pcp->high and free
@@ -3393,10 +3392,8 @@ check_alloc_wmark:
gfp_mask)) {
int ret;
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/*
@@ -3450,10 +3447,8 @@ try_this_zone:
return page;
} else {
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/* Try again if zone has deferred pages */
@@ -6950,9 +6945,6 @@ static bool try_to_accept_memory_one(str
struct page *page;
bool last;
- if (list_empty(&zone->unaccepted_pages))
- return false;
-
spin_lock_irqsave(&zone->lock, flags);
page = list_first_entry_or_null(&zone->unaccepted_pages,
struct page, lru);
@@ -6978,23 +6970,29 @@ static bool try_to_accept_memory_one(str
return true;
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
long to_accept;
- int ret = false;
+ bool ret = false;
+
+ if (!has_unaccepted_memory())
+ return false;
+
+ if (list_empty(&zone->unaccepted_pages))
+ return false;
/* How much to accept to get to high watermark? */
to_accept = high_wmark_pages(zone) -
(zone_page_state(zone, NR_FREE_PAGES) -
- __zone_watermark_unusable_free(zone, order, 0));
+ __zone_watermark_unusable_free(zone, order, 0) -
+ zone_page_state(zone, NR_UNACCEPTED));
- /* Accept at least one page */
- do {
+ while (to_accept > 0) {
if (!try_to_accept_memory_one(zone))
break;
ret = true;
to_accept -= MAX_ORDER_NR_PAGES;
- } while (to_accept > 0);
+ }
return ret;
}
@@ -7037,7 +7035,7 @@ static void accept_page(struct page *pag
{
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
return false;
}
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-reduce-deferred-struct-page-init-ifdeffery.patch
mm-accept-memory-in-__alloc_pages_bulk.patch
mm-introduce-pageunaccepted-page-type.patch
mm-rework-accept-memory-helpers.patch
mm-add-a-helper-to-accept-page.patch
mm-page_isolation-handle-unaccepted-memory-isolation.patch
mm-accept-to-promo-watermark.patch
The quilt patch titled
Subject: mm/numa: no task_numa_fault() call if PMD is changed
has been removed from the -mm tree. Its filename was
mm-numa-no-task_numa_fault-call-if-pmd-is-changed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/numa: no task_numa_fault() call if PMD is changed
Date: Fri, 9 Aug 2024 10:59:05 -0400
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
fault handling") restructured do_huge_pmd_numa_page() and did not avoid the
task_numa_fault() call in the second page table check after a NUMA
migration failure. Fix it by making all !pmd_same() checks return
immediately.
This issue can cause task_numa_fault() to be called more often than
necessary and lead to unexpected NUMA balancing results (it is hard to tell
whether the duplicated NUMA fault counting has a positive or negative
performance impact).
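The accounting rule can be illustrated with a small, hypothetical C model that only counts task_numa_fault() calls per fault. struct demo_pmd_fault and demo_pmd_numa_calls() are illustrative names; the real function also handles locking, the target-node check and folio references, and this sketch assumes a folio was found.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how one NUMA hinting fault on a huge PMD played out. */
struct demo_pmd_fault {
	bool changed_before_migrate;	/* first pmd_same() check fails */
	bool migrate_ok;		/* migrate_misplaced_folio() succeeded */
	bool changed_after_fail;	/* recheck after a failed migration fails */
};

/*
 * Number of task_numa_fault() calls this task makes, before (fixed == false)
 * and after (fixed == true) the patch.
 */
static int demo_pmd_numa_calls(const struct demo_pmd_fault *f, bool fixed)
{
	/* The recheck only happens after a failed migration. */
	assert(!(f->migrate_ok && f->changed_after_fail));

	if (f->changed_before_migrate)
		return 0;		/* nid is still NUMA_NO_NODE in both versions */
	if (f->migrate_ok)
		return 1;		/* this task moved the folio */
	if (f->changed_after_fail)
		return fixed ? 0 : 1;	/* old code: "goto out" still accounted */
	return 1;			/* out_map: this task restored the PMD */
}
```

Only the "migration failed, then the PMD changed under us" case differs: the old flow accounts a fault that another thread already restored, the fixed flow does not.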
Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com
Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)
--- a/mm/huge_memory.c~mm-numa-no-task_numa_fault-call-if-pmd-is-changed
+++ a/mm/huge_memory.c
@@ -1685,7 +1685,7 @@ vm_fault_t do_huge_pmd_numa_page(struct
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
spin_unlock(vmf->ptl);
- goto out;
+ return 0;
}
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1728,22 +1728,16 @@ vm_fault_t do_huge_pmd_numa_page(struct
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
flags |= TNF_MIGRATED;
nid = target_nid;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
- if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
- spin_unlock(vmf->ptl);
- goto out;
- }
- goto out_map;
- }
-
-out:
- if (nid != NUMA_NO_NODE)
task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
+ }
- return 0;
-
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+ if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+ spin_unlock(vmf->ptl);
+ return 0;
+ }
out_map:
/* Restore the PMD */
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1753,7 +1747,10 @@ out_map:
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
spin_unlock(vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
/*
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
memory-tiering-read-last_cpupid-correctly-in-do_huge_pmd_numa_page.patch
memory-tiering-introduce-folio_use_access_time-check.patch
memory-tiering-count-pgpromote_success-when-mem-tiering-is-enabled.patch
mm-migrate-move-common-code-to-numa_migrate_check-was-numa_migrate_prep.patch
The quilt patch titled
Subject: mm/numa: no task_numa_fault() call if PTE is changed
has been removed from the -mm tree. Its filename was
mm-numa-no-task_numa_fault-call-if-pte-is-changed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/numa: no task_numa_fault() call if PTE is changed
Date: Fri, 9 Aug 2024 10:59:04 -0400
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce
TLB flush via delaying mapping on hint page fault") restructured
do_numa_page() and did not avoid the task_numa_fault() call in the second
page table check after a NUMA migration failure. Fix it by making all
!pte_same() checks return immediately.
This issue can cause task_numa_fault() to be called more often than
necessary and lead to unexpected NUMA balancing results (it is hard to tell
whether the duplicated NUMA fault counting has a positive or negative
performance impact).
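For the PTE path, a similar sketch (names hypothetical, assuming a folio and a valid target node were found) enumerates what can happen when the PTE lock is re-taken after a failed migration. pte_offset_map_lock() can also return NULL there, and the fixed code treats that like a changed PTE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of re-taking the PTE lock after a failed migration. */
enum demo_pte_recheck {
	DEMO_PTE_SAME,		/* pte_same(): restore the mapping (out_map) */
	DEMO_PTE_CHANGED,	/* !pte_same(): another thread already fixed it */
	DEMO_PTE_GONE,		/* pte_offset_map_lock() returned NULL */
};

/*
 * task_numa_fault() calls made by the patched do_numa_page(). The pre-fix
 * code returned 1 for DEMO_PTE_CHANGED and DEMO_PTE_GONE as well, because
 * both paths jumped to "out:".
 */
static int demo_pte_numa_calls(bool migrate_ok, enum demo_pte_recheck recheck)
{
	if (migrate_ok)
		return 1;		/* accounted right after migrating */

	switch (recheck) {
	case DEMO_PTE_SAME:
		return 1;		/* accounted after restoring the PTE */
	case DEMO_PTE_CHANGED:
	case DEMO_PTE_GONE:
		return 0;		/* not our restore: return without accounting */
	}
	assert(!"unknown recheck outcome");
	return 0;
}
```

The invariant the fix restores is that each fault is accounted at most once, and only by the task that either migrated the folio or restored its mapping.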
Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com
Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)
--- a/mm/memory.c~mm-numa-no-task_numa_fault-call-if-pte-is-changed
+++ a/mm/memory.c
@@ -5295,7 +5295,7 @@ static vm_fault_t do_numa_page(struct vm
if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+ return 0;
}
pte = pte_modify(old_pte, vma->vm_page_prot);
@@ -5358,23 +5358,19 @@ static vm_fault_t do_numa_page(struct vm
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
nid = target_nid;
flags |= TNF_MIGRATED;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
- vmf->address, &vmf->ptl);
- if (unlikely(!vmf->pte))
- goto out;
- if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, nr_pages, flags);
- return 0;
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+ vmf->address, &vmf->ptl);
+ if (unlikely(!vmf->pte))
+ return 0;
+ if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
+ pte_unmap_unlock(vmf->pte, vmf->ptl);
+ return 0;
+ }
out_map:
/*
* Make it present again, depending on how arch implements
@@ -5387,7 +5383,10 @@ out_map:
numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
writable);
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
memory-tiering-read-last_cpupid-correctly-in-do_huge_pmd_numa_page.patch
memory-tiering-introduce-folio_use_access_time-check.patch
memory-tiering-count-pgpromote_success-when-mem-tiering-is-enabled.patch
mm-migrate-move-common-code-to-numa_migrate_check-was-numa_migrate_prep.patch
The quilt patch titled
Subject: mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
has been removed from the -mm tree. Its filename was
mm-vmalloc-fix-page-mapping-if-vm_area_alloc_pages-with-high-order-fallback-to-order-0.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hailong Liu <hailong.liu(a)oppo.com>
Subject: mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
Date: Thu, 8 Aug 2024 20:19:56 +0800
__vmap_pages_range_noflush() assumes that its pages** argument contains
pages with the same page shift. However, since commit e9c3cda4d86e ("mm,
vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes
__GFP_NOFAIL and the high-order page allocation in vm_area_alloc_pages()
fails, pages** may contain pages with two different page shifts (high order
and order-0). This could lead __vmap_pages_range_noflush() to perform
incorrect mappings, potentially resulting in memory corruption.
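A hypothetical userspace model of the assumption that breaks: treating pages[] as an array of PFNs (the kernel stores struct page pointers), a mapper that uses a single page shift expects every run of 1 << order entries to be one contiguous, naturally aligned block. demo_uniform_blocks() is illustrative only and uses a small order instead of order 9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if pfns[] satisfies what a single-shift mapper assumes:
 * every group of 1 << order entries is one physically contiguous block
 * whose first PFN is aligned to the block size.
 */
static bool demo_uniform_blocks(const unsigned long *pfns, size_t nr,
				unsigned int order)
{
	size_t stride = (size_t)1 << order;

	assert(pfns != NULL || nr == 0);
	if (nr % stride)
		return false;
	for (size_t i = 0; i < nr; i += stride) {
		if (pfns[i] % stride)		/* a huge mapping needs an aligned head */
			return false;
		for (size_t j = 1; j < stride; j++)
			if (pfns[i + j] != pfns[i] + j)
				return false;	/* not one contiguous block */
	}
	return true;
}
```

An array filled entirely by order-2 blocks passes at order 2, while one whose second half came from order-0 fallback allocations fails. __vmap_pages_range_noflush() performs no such check, so it would map the scattered order-0 pages as if they were one huge block.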
Users might encounter this as follows (vmap_allow_huge = true, PMD_SIZE =
2M):
kvmalloc(2M, __GFP_NOFAIL|GFP_X)
__vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
vmap_pages_range()
vmap_pages_range_noflush()
__vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
We can remove the fallback code because, if a high-order allocation fails,
__vmalloc_node_range_noprof() will retry with order-0, so it is
unnecessary to fall back to order-0 here. Therefore, fix this by removing
the fallback code.
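The retry strategy the fix relies on can be sketched as follows, assuming a toy allocator whose high-order blocks run out after a fixed budget. demo_fill() plays the role of the patched vm_area_alloc_pages() loop and demo_vmalloc() the order-0 retry in __vmalloc_node_range_noprof(); all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator: high-order blocks run out after a fixed budget. */
struct demo_alloc {
	int high_order_budget;
};

static bool demo_alloc_block(struct demo_alloc *a, unsigned int order)
{
	if (order == 0)
		return true;		/* order-0 never fails in this model */
	if (a->high_order_budget == 0)
		return false;
	a->high_order_budget--;
	return true;
}

/* Like the patched loop: one order per pass, stop at the first failure. */
static size_t demo_fill(struct demo_alloc *a, size_t nr, unsigned int order)
{
	size_t done = 0, stride = (size_t)1 << order;

	assert(nr % stride == 0);	/* size is rounded to the mapping granularity */
	while (done < nr) {
		if (!demo_alloc_block(a, order))
			break;		/* no silent switch to order 0 any more */
		done += stride;
	}
	return done;
}

/* Caller-side retry; returns the single order used for the whole area. */
static unsigned int demo_vmalloc(struct demo_alloc *a, size_t nr, unsigned int order)
{
	if (order > 0 && demo_fill(a, nr, order) == nr)
		return order;
	/* The kernel frees the partial high-order area before retrying. */
	(void)demo_fill(a, nr, 0);
	return 0;
}
```

With a budget of one order-2 block, an 8-page request is redone at order 0 for the whole area instead of mixing orders; with a budget of two, the whole area uses order 2.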
Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com
Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu <hailong.liu(a)oppo.com>
Reported-by: Tangquan Zheng <zhengtangquan(a)oppo.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Acked-by: Barry Song <baohua(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-fix-page-mapping-if-vm_area_alloc_pages-with-high-order-fallback-to-order-0
+++ a/mm/vmalloc.c
@@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
page = alloc_pages_noprof(alloc_gfp, order);
else
page = alloc_pages_node_noprof(nid, alloc_gfp, order);
- if (unlikely(!page)) {
- if (!nofail)
- break;
-
- /* fall back to the zero order allocations */
- alloc_gfp |= __GFP_NOFAIL;
- order = 0;
- continue;
- }
+ if (unlikely(!page))
+ break;
/*
* Higher order allocations must be able to be treated as
_
Patches currently in -mm which might be from hailong.liu(a)oppo.com are
The quilt patch titled
Subject: mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
has been removed from the -mm tree. Its filename was
mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
Date: Tue, 6 Aug 2024 12:41:07 -0400
The memory_failure_cpu structure is a per-cpu structure. Access to its
content requires the use of get_cpu_var() to lock in the current CPU and
disable preemption. The use of a regular spinlock_t for locking purposes
is fine for a non-RT kernel.
Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in an RT kernel is a sleeping lock, and taking a sleeping lock
in a preemption-disabled context is illegal, resulting in the following
kind of warning.
[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
[12135.732252] preempt_count: 1, expected: 0
[12135.732255] RCU nest depth: 2, expected: 2
:
[12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
[12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
[12135.732433] Call Trace:
[12135.732436] <TASK>
[12135.732450] dump_stack_lvl+0x57/0x81
[12135.732461] __might_resched.cold+0xf4/0x12f
[12135.732479] rt_spin_lock+0x4c/0x100
[12135.732491] memory_failure_queue+0x40/0xe0
[12135.732503] ghes_do_memory_failure+0x53/0x390
[12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0
[12135.732575] ghes_proc+0xf9/0x1a0
[12135.732591] ghes_notify_hed+0x6a/0x150
[12135.732602] notifier_call_chain+0x43/0xb0
[12135.732626] blocking_notifier_call_chain+0x43/0x60
[12135.732637] acpi_ev_notify_dispatch+0x47/0x70
[12135.732648] acpi_os_execute_deferred+0x13/0x20
[12135.732654] process_one_work+0x41f/0x500
[12135.732695] worker_thread+0x192/0x360
[12135.732715] kthread+0x111/0x140
[12135.732733] ret_from_fork+0x29/0x50
[12135.732779] </TASK>
Fix it by using a raw_spinlock_t for locking instead.
Also move the pr_err() out of the lock critical section and after
put_cpu_var() to avoid indeterminate latency and the possibility of
sleeping in this call.
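The reordering in memory_failure_queue() (decide under the lock, report after releasing it) can be sketched in userspace as below. This is only an assumption-laden model: a pthread mutex stands in for the raw_spinlock_t and says nothing about PREEMPT_RT semantics, fprintf() stands in for pr_err(), a plain array stands in for the kfifo, and struct demo_mf_cpu and demo_queue() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_FIFO_SIZE 4	/* stands in for MEMORY_FAILURE_FIFO_SIZE */

/* Hypothetical userspace stand-in for struct memory_failure_cpu. */
struct demo_mf_cpu {
	pthread_mutex_t lock;		/* plays the role of the raw_spinlock_t */
	unsigned long fifo[DEMO_FIFO_SIZE];
	unsigned int count;
	unsigned int scheduled;		/* times schedule_work_on() would run */
};

/* Like kfifo_put(): refuses the entry instead of overwriting when full. */
static bool demo_fifo_put(struct demo_mf_cpu *c, unsigned long pfn)
{
	if (c->count == DEMO_FIFO_SIZE)
		return false;
	c->fifo[c->count++] = pfn;
	return true;
}

/* Same shape as the fixed memory_failure_queue(); true if the entry was queued. */
static bool demo_queue(struct demo_mf_cpu *c, unsigned long pfn)
{
	bool buffer_overflow;

	pthread_mutex_lock(&c->lock);
	buffer_overflow = !demo_fifo_put(c, pfn);
	if (!buffer_overflow)
		c->scheduled++;
	pthread_mutex_unlock(&c->lock);

	/* Potentially slow output happens only after the lock is dropped. */
	if (buffer_overflow)
		fprintf(stderr, "buffer overflow when queuing memory failure at %#lx\n", pfn);
	assert(c->count <= DEMO_FIFO_SIZE);
	return !buffer_overflow;
}
```

The design point is that the only state carried out of the critical section is a single bool, so the lock hold time stays bounded regardless of how long the error report takes.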
[longman(a)redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Len Brown <len.brown(a)intel.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu
+++ a/mm/memory-failure.c
@@ -2417,7 +2417,7 @@ struct memory_failure_entry {
struct memory_failure_cpu {
DECLARE_KFIFO(fifo, struct memory_failure_entry,
MEMORY_FAILURE_FIFO_SIZE);
- spinlock_t lock;
+ raw_spinlock_t lock;
struct work_struct work;
};
@@ -2443,20 +2443,22 @@ void memory_failure_queue(unsigned long
{
struct memory_failure_cpu *mf_cpu;
unsigned long proc_flags;
+ bool buffer_overflow;
struct memory_failure_entry entry = {
.pfn = pfn,
.flags = flags,
};
mf_cpu = &get_cpu_var(memory_failure_cpu);
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
- if (kfifo_put(&mf_cpu->fifo, entry))
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry);
+ if (!buffer_overflow)
schedule_work_on(smp_processor_id(), &mf_cpu->work);
- else
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ put_cpu_var(memory_failure_cpu);
+ if (buffer_overflow)
pr_err("buffer overflow when queuing memory failure at %#lx\n",
pfn);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
- put_cpu_var(memory_failure_cpu);
}
EXPORT_SYMBOL_GPL(memory_failure_queue);
@@ -2469,9 +2471,9 @@ static void memory_failure_work_func(str
mf_cpu = container_of(work, struct memory_failure_cpu, work);
for (;;) {
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
gotten = kfifo_get(&mf_cpu->fifo, &entry);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
if (!gotten)
break;
if (entry.flags & MF_SOFT_OFFLINE)
@@ -2501,7 +2503,7 @@ static int __init memory_failure_init(vo
for_each_possible_cpu(cpu) {
mf_cpu = &per_cpu(memory_failure_cpu, cpu);
- spin_lock_init(&mf_cpu->lock);
+ raw_spin_lock_init(&mf_cpu->lock);
INIT_KFIFO(mf_cpu->fifo);
INIT_WORK(&mf_cpu->work, memory_failure_work_func);
}
_
Patches currently in -mm which might be from longman(a)redhat.com are
watchdog-handle-the-enodev-failure-case-of-lockup_detector_delay_init-separately.patch
The quilt patch titled
Subject: mm/hugetlb: fix hugetlb vs. core-mm PT locking
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/hugetlb: fix hugetlb vs. core-mm PT locking
Date: Thu, 1 Aug 2024 22:47:48 +0200
We recently made GUP's common page table walking code also walk hugetlb
VMAs without most hugetlb special-casing, preparing for a future with less
hugetlb-specific page table walking code in the codebase.
It turns out that we missed one page table locking detail: page table
locking for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb code that concurrently modifies these page tables would
actually grab mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the locks
would differ. Something similar can happen right now with hugetlb folios
that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS is enabled.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------
[ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
[ 3105.944634] Modules linked in: [...]
[ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
[ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
[ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3105.991108] pc : try_grab_folio+0x11c/0x188
[ 3105.994013] lr : follow_page_pte+0xd8/0x430
[ 3105.996986] sp : ffff80008eafb8f0
[ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
[ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
[ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
[ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
[ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
[ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
[ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
[ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
[ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
[ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
[ 3106.047957] Call trace:
[ 3106.049522] try_grab_folio+0x11c/0x188
[ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
[ 3106.055527] follow_page_mask+0x1a0/0x2b8
[ 3106.058118] __get_user_pages+0xf0/0x348
[ 3106.060647] faultin_page_range+0xb0/0x360
[ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would. Add ptep_lockptr() to obtain the PTE
page table lock using a pte pointer -- unfortunately we cannot convert
pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables
we can have with CONFIG_HIGHPTE.
Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, so
that configurations such as CONFIG_PGTABLE_LEVELS==2 with
PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE work as expected. Document why
that works.
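The reverse-order size checks can be modelled with a small, hypothetical function that returns which level's lock huge_pte_lockptr() would pick. Page, PMD and PUD sizes are parameters so that a folded configuration can be tried; the sketch ignores the actual lock pointers and whether split locks are enabled.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_lock_level { DEMO_PTE_LOCK, DEMO_PMD_LOCK, DEMO_PUD_LOCK };

/* Hypothetical model of the size checks in the patched huge_pte_lockptr(). */
static enum demo_lock_level demo_huge_lock(unsigned long size,
					   unsigned long page_size,
					   unsigned long pmd_size,
					   unsigned long pud_size,
					   bool highpte)
{
	assert(size != page_size);	/* mirrors VM_WARN_ON(size == PAGE_SIZE) */

	if (size >= pud_size)		/* largest level first */
		return DEMO_PUD_LOCK;
	if (size >= pmd_size || highpte)
		return DEMO_PMD_LOCK;
	return DEMO_PTE_LOCK;		/* ptep_lockptr() */
}
```

With 4 KiB pages, 2 MiB PMDs and 1 GiB PUDs, a 64 KiB folio maps to the PTE lock (or the PMD lock with CONFIG_HIGHPTE), a 2 MiB folio to the PMD lock and a 1 GiB folio to the PUD lock. When PMD_SIZE == PUD_SIZE, the PUD check wins, which is harmless because split PMD locks are disabled in such configurations.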
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to take
the PMD table lock, core-mm would grab the PTE table lock of one of the two
PTE page tables. In such corner cases, we have to make sure that both
locks match, which is (fortunately!) currently guaranteed for 8xx as it
does not support SMP and consequently doesn't use split PT locks.
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 33 ++++++++++++++++++++++++++++++---
include/linux/mm.h | 11 +++++++++++
2 files changed, 41 insertions(+), 3 deletions(-)
--- a/include/linux/hugetlb.h~mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking
+++ a/include/linux/hugetlb.h
@@ -944,10 +944,37 @@ static inline bool htlb_allow_alloc_fall
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
- if (huge_page_size(h) == PMD_SIZE)
+ const unsigned long size = huge_page_size(h);
+
+ VM_WARN_ON(size == PAGE_SIZE);
+
+ /*
+ * hugetlb must use the exact same PT locks as core-mm page table
+ * walkers would. When modifying a PTE table, hugetlb must take the
+ * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+ * PT lock etc.
+ *
+ * The expectation is that any hugetlb folio smaller than a PMD is
+ * always mapped into a single PTE table and that any hugetlb folio
+ * smaller than a PUD (but at least as big as a PMD) is always mapped
+ * into a single PMD table.
+ *
+ * If that does not hold for an architecture, then that architecture
+ * must disable split PT locks such that all *_lockptr() functions
+ * will give us the same result: the per-MM PT lock.
+ *
+ * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where
+ * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use pud_lockptr()
+ * and core-mm would use pmd_lockptr(). However, in such configurations
+ * split PMD locks are disabled -- they don't make sense on a single
+ * PGDIR page table -- and the end result is the same.
+ */
+ if (size >= PUD_SIZE)
+ return pud_lockptr(mm, (pud_t *) pte);
+ else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE))
return pmd_lockptr(mm, (pmd_t *) pte);
- VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
- return &mm->page_table_lock;
+ /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
+ return ptep_lockptr(mm, pte);
}
#ifndef hugepages_supported
--- a/include/linux/mm.h~mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking
+++ a/include/linux/mm.h
@@ -2920,6 +2920,13 @@ static inline spinlock_t *pte_lockptr(st
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
+ BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE);
+ return ptlock_ptr(virt_to_ptdesc(pte));
+}
+
static inline bool ptlock_init(struct ptdesc *ptdesc)
{
/*
@@ -2944,6 +2951,10 @@ static inline spinlock_t *pte_lockptr(st
{
return &mm->page_table_lock;
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return &mm->page_table_lock;
+}
static inline void ptlock_cache_init(void) {}
static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
static inline void ptlock_free(struct ptdesc *ptdesc) {}
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-turn-use_split_pte_ptlocks-use_split_pte_ptlocks-into-kconfig-options.patch
mm-hugetlb-enforce-that-pmd-pt-sharing-has-split-pmd-pt-locks.patch
powerpc-8xx-document-and-enforce-that-split-pt-locks-are-not-used.patch
mm-simplify-arch_make_folio_accessible.patch
mm-gup-convert-to-arch_make_folio_accessible.patch
s390-uv-drop-arch_make_page_accessible.patch
mm-hugetlb-remove-hugetlb_follow_page_mask-leftover.patch
mm-rmap-cleanup-partially-mapped-handling-in-__folio_remove_rmap.patch
mm-clarify-folio_likely_mapped_shared-documentation-for-ksm-folios.patch
mm-provide-vm_normal_pagefolio_pmd-with-config_pgtable_has_huge_leaves.patch
mm-pagewalk-introduce-folio_walk_start-folio_walk_end.patch
mm-migrate-convert-do_pages_stat_array-from-follow_page-to-folio_walk.patch
mm-migrate-convert-add_page_for_migration-from-follow_page-to-folio_walk.patch
mm-ksm-convert-get_mergeable_page-from-follow_page-to-folio_walk.patch
mm-ksm-convert-scan_get_next_rmap_item-from-follow_page-to-folio_walk.patch
mm-huge_memory-convert-split_huge_pages_pid-from-follow_page-to-folio_walk.patch
mm-huge_memory-convert-split_huge_pages_pid-from-follow_page-to-folio_walk-fix.patch
s390-uv-convert-gmap_destroy_page-from-follow_page-to-folio_walk.patch
s390-mm-fault-convert-do_secure_storage_access-from-follow_page-to-folio_walk.patch
mm-remove-follow_page.patch
mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch
mm-rmap-minimize-folio-_nr_pages_mapped-updates-when-batching-pte-unmapping.patch
The quilt patch titled
Subject: mseal: fix is_madv_discard()
has been removed from the -mm tree. Its filename was
mseal-fix-is_madv_discard.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pedro Falcato <pedro.falcato(a)gmail.com>
Subject: mseal: fix is_madv_discard()
Date: Wed, 7 Aug 2024 18:33:35 +0100
is_madv_discard() did its check wrong. MADV_ values are not bit flags;
they're ordinary sequential numbers. So, for instance:
behavior & (/* ... */ | MADV_REMOVE)
tagged both MADV_REMOVE and MADV_RANDOM (bit 0 set) as discard
operations.
As a result, the kernel could erroneously block certain madvise() calls
(e.g., MADV_RANDOM or MADV_HUGEPAGE) on sealed VMAs because they share
bits with blocked MADV operations (e.g., REMOVE or WIPEONFORK).
This is obviously incorrect, so use a switch statement instead.
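The bug is easy to reproduce in isolation. The sketch below copies the generic MADV_* values into hypothetical DEMO_MADV_* constants (architecture-specific headers may number some of them differently) and compares the old bitwise test with the switch-based one.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Generic MADV_* values from include/uapi/asm-generic/mman-common.h.
 * Architecture-specific uapi headers may number some of them differently.
 */
enum {
	DEMO_MADV_RANDOM = 1,
	DEMO_MADV_DONTNEED = 4,
	DEMO_MADV_FREE = 8,
	DEMO_MADV_REMOVE = 9,
	DEMO_MADV_DONTFORK = 10,
	DEMO_MADV_HUGEPAGE = 14,
	DEMO_MADV_WIPEONFORK = 18,
	DEMO_MADV_DONTNEED_LOCKED = 24,
};

/* The OR of the six discard values sets every one of bits 0-4. */
static_assert((DEMO_MADV_FREE | DEMO_MADV_DONTNEED | DEMO_MADV_DONTNEED_LOCKED |
	       DEMO_MADV_REMOVE | DEMO_MADV_DONTFORK | DEMO_MADV_WIPEONFORK) == 31,
	      "discard mask covers bits 0-4");

/* Pre-fix check: treats sequential advice values as if they were bit flags. */
static bool demo_is_discard_old(int behavior)
{
	return behavior &
	       (DEMO_MADV_FREE | DEMO_MADV_DONTNEED | DEMO_MADV_DONTNEED_LOCKED |
		DEMO_MADV_REMOVE | DEMO_MADV_DONTFORK | DEMO_MADV_WIPEONFORK);
}

/* Fixed check: compare against each discarding advice value exactly. */
static bool demo_is_discard_new(int behavior)
{
	switch (behavior) {
	case DEMO_MADV_FREE:
	case DEMO_MADV_DONTNEED:
	case DEMO_MADV_DONTNEED_LOCKED:
	case DEMO_MADV_REMOVE:
	case DEMO_MADV_DONTFORK:
	case DEMO_MADV_WIPEONFORK:
		return true;
	}
	return false;
}
```

Because the mask is 31, the old test flagged any advice value with one of its five low bits set, including MADV_RANDOM (1) and MADV_HUGEPAGE (14); the switch only matches the six listed values.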
Link: https://lkml.kernel.org/r/20240807173336.2523757-1-pedro.falcato@gmail.com
Link: https://lkml.kernel.org/r/20240807173336.2523757-2-pedro.falcato@gmail.com
Fixes: 8be7258aad44 ("mseal: add mseal syscall")
Signed-off-by: Pedro Falcato <pedro.falcato(a)gmail.com>
Tested-by: Jeff Xu <jeffxu(a)chromium.org>
Reviewed-by: Jeff Xu <jeffxu(a)chromium.org>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mseal.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--- a/mm/mseal.c~mseal-fix-is_madv_discard
+++ a/mm/mseal.c
@@ -40,9 +40,17 @@ static bool can_modify_vma(struct vm_are
static bool is_madv_discard(int behavior)
{
- return behavior &
- (MADV_FREE | MADV_DONTNEED | MADV_DONTNEED_LOCKED |
- MADV_REMOVE | MADV_DONTFORK | MADV_WIPEONFORK);
+ switch (behavior) {
+ case MADV_FREE:
+ case MADV_DONTNEED:
+ case MADV_DONTNEED_LOCKED:
+ case MADV_REMOVE:
+ case MADV_DONTFORK:
+ case MADV_WIPEONFORK:
+ return true;
+ }
+
+ return false;
}
static bool is_ro_anon(struct vm_area_struct *vma)
_
Patches currently in -mm which might be from pedro.falcato(a)gmail.com are
selftests-mm-add-mseal-test-for-no-discard-madvise.patch
selftests-mm-add-mseal-test-for-no-discard-madvise-fix.patch
The locks_remove_posix() call in fcntl_setlk/fcntl_setlk64 is designed to
reliably remove locks when an fcntl/close race is detected. However, it
was passing in the wrong file lock owner (&current->files instead of
current->files), which looks like a mistake and results in a failure to
remove the locks. More critically, if the lock removal fails, it could
lead to a use-after-free (UAF) while traversing the locks.
This problem occurs only in the 4.19/5.4 stable versions.
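Why the compiler did not catch this can be shown with a hypothetical model, assuming (as in include/linux/fs.h of that era) that fl_owner_t is a plain void *: the address of current->files converts to void * silently, so the comparison with each lock's owner never matches. All demo_* names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef void *demo_owner_t;		/* like fl_owner_t: an untyped pointer */

struct demo_files { int dummy; };	/* stands in for struct files_struct */
struct demo_task { struct demo_files *files; };

struct demo_posix_lock {
	demo_owner_t owner;		/* current->files when the lock was taken */
	int active;
};

/* Drop every active lock owned by `owner`; returns how many were removed. */
static int demo_remove_posix(struct demo_posix_lock *locks, size_t n,
			     demo_owner_t owner)
{
	int removed = 0;

	assert(locks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (locks[i].active && locks[i].owner == owner) {
			locks[i].active = 0;
			removed++;
		}
	}
	return removed;
}
```

Passing &cur.files (a struct demo_files **) compiles without a warning but removes nothing, while passing cur.files removes both locks, mirroring the one-character fix in the diff.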
Fixes: a561145f3ae9 ("filelock: Fix fcntl/close race recovery compat path")
Fixes: d30ff3304083 ("filelock: Remove locks reliably when fcntl/close race is detected")
Cc: stable(a)vger.kernel.org
Signed-off-by: Long Li <leo.lilong(a)huawei.com>
---
fs/locks.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 234ebfa8c070..b1201b01867a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2313,7 +2313,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(&current->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, &current->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
@@ -2443,7 +2443,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(&current->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, &current->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
--
2.39.2
The quilt patch titled
Subject: lib/stackdepot: double DEPOT_POOLS_CAP if KASAN is enabled
has been removed from the -mm tree. Its filename was
lib-stackdepot-double-depot_pools_cap-if-kasan-is-enabled.patch
This patch was dropped because it is obsolete
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: lib/stackdepot: double DEPOT_POOLS_CAP if KASAN is enabled
Date: Wed, 7 Aug 2024 12:52:28 -0400
When a wide variety of workloads are run on a debug kernel with KASAN
enabled, the following warning may sometimes be printed.
[ 6818.650674] Stack depot reached limit capacity
[ 6818.650730] WARNING: CPU: 1 PID: 272741 at lib/stackdepot.c:252 depot_alloc_stack+0x39e/0x3d0
:
[ 6818.650907] Call Trace:
[ 6818.650909] [<00047dd453d84b92>] depot_alloc_stack+0x3a2/0x3d0
[ 6818.650916] [<00047dd453d85254>] stack_depot_save_flags+0x4f4/0x5c0
[ 6818.650920] [<00047dd4535872c6>] kasan_save_stack+0x56/0x70
[ 6818.650924] [<00047dd453587328>] kasan_save_track+0x28/0x40
[ 6818.650927] [<00047dd45358a27a>] kasan_save_free_info+0x4a/0x70
[ 6818.650930] [<00047dd45358766a>] __kasan_slab_free+0x12a/0x1d0
[ 6818.650933] [<00047dd45350deb4>] kmem_cache_free+0x1b4/0x580
[ 6818.650938] [<00047dd452c520da>] __put_task_struct+0x24a/0x320
[ 6818.650945] [<00047dd452c6aee4>] delayed_put_task_struct+0x294/0x350
[ 6818.650949] [<00047dd452e9066a>] rcu_do_batch+0x6ea/0x2090
[ 6818.650953] [<00047dd452ea60f4>] rcu_core+0x474/0xa90
[ 6818.650956] [<00047dd452c780c0>] handle_softirqs+0x3c0/0xf90
[ 6818.650960] [<00047dd452c76fbe>] __irq_exit_rcu+0x35e/0x460
[ 6818.650963] [<00047dd452c79992>] irq_exit_rcu+0x22/0xb0
[ 6818.650966] [<00047dd454bd8128>] do_ext_irq+0xd8/0x120
[ 6818.650972] [<00047dd454c0ddd0>] ext_int_handler+0xb8/0xe8
[ 6818.650979] [<00047dd453589cf6>] kasan_check_range+0x236/0x2f0
[ 6818.650982] [<00047dd453378cf0>] filemap_get_pages+0x190/0xaa0
[ 6818.650986] [<00047dd453379940>] filemap_read+0x340/0xa70
[ 6818.650989] [<00047dd3d325d226>] xfs_file_buffered_read+0x2c6/0x400 [xfs]
[ 6818.651431] [<00047dd3d325dfe2>] xfs_file_read_iter+0x2c2/0x550 [xfs]
[ 6818.651663] [<00047dd45364710c>] vfs_read+0x64c/0x8c0
[ 6818.651669] [<00047dd453648ed8>] ksys_read+0x118/0x200
[ 6818.651672] [<00047dd452b6cf5a>] do_syscall+0x27a/0x380
[ 6818.651676] [<00047dd454bd7e74>] __do_syscall+0xf4/0x1a0
[ 6818.651680] [<00047dd454c0db58>] system_call+0x70/0x98
As KASAN is a big user of stackdepot, the current DEPOT_POOLS_CAP of
8192 may not be enough. Double DEPOT_POOLS_CAP if KASAN is enabled to
avoid hitting this problem.
Also use the MIN() macro for defining DEPOT_MAX_POOLS to clarify the
intention.
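The macro arithmetic can be checked with a small sketch. The pool geometry here is an illustrative assumption, not taken from the patch: a 17-bit pool index and 16 KiB pools (4 pages of 4 KiB). Under that assumption 8192 pools give 128 MiB, which is consistent with the "16MB to 128MB" title of the Fixes commit. Check lib/stackdepot.c for the real values. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative assumptions: 4 pages of 4 KiB per pool, 17 pool-index bits. */
#define SKETCH_POOL_SIZE        (4LL * 4096)
#define SKETCH_POOL_INDEX_BITS  17

static long long min_ll(long long a, long long b)
{
	return a < b ? a : b;
}

/* DEPOT_POOLS_CAP from the patch; 'kasan' stands in for IS_ENABLED(CONFIG_KASAN). */
static long long pools_cap(bool kasan)
{
	return 8192LL * (kasan ? 2 : 1);
}

/*
 * DEPOT_MAX_POOLS: the index field can address (1 << bits) - 1 pools,
 * because the index is offset by 1 so that no valid handle is 0.
 */
static long long max_pools(int index_bits, bool kasan)
{
	return min_ll((1LL << index_bits) - 1, pools_cap(kasan));
}
```

Whenever the index field is wide enough, MIN() simply returns the cap. With a narrower index (13 bits in the tests), the index limit of 8191 wins instead.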
Link: https://lkml.kernel.org/r/20240807165228.1116831-1-longman@redhat.com
Fixes: 02754e0a484a ("lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/stackdepot.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/lib/stackdepot.c~lib-stackdepot-double-depot_pools_cap-if-kasan-is-enabled
+++ a/lib/stackdepot.c
@@ -36,11 +36,12 @@
#include <linux/memblock.h>
#include <linux/kasan-enabled.h>
-#define DEPOT_POOLS_CAP 8192
+/* KASAN is a big user of stackdepot, double the cap if KASAN is enabled */
+#define DEPOT_POOLS_CAP (8192 * (IS_ENABLED(CONFIG_KASAN) ? 2 : 1))
+
/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
#define DEPOT_MAX_POOLS \
- (((1LL << (DEPOT_POOL_INDEX_BITS)) - 1 < DEPOT_POOLS_CAP) ? \
- (1LL << (DEPOT_POOL_INDEX_BITS)) - 1 : DEPOT_POOLS_CAP)
+ MIN((1LL << (DEPOT_POOL_INDEX_BITS)) - 1, DEPOT_POOLS_CAP)
static bool stack_depot_disabled;
static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
_
Patches currently in -mm which might be from longman(a)redhat.com are
mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu.patch
mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu-v3.patch
watchdog-handle-the-enodev-failure-case-of-lockup_detector_delay_init-separately.patch
The locks_remove_posix() call in fcntl_setlk()/fcntl_setlk64() is meant
to reliably remove locks when an fcntl/close race is detected. However, it
was passed the wrong file lock owner (&current->files instead of
current->files), which looks like a typo, and this results in a failure to
remove the locks. More critically, if lock removal fails, a use-after-free
can occur while traversing the locks.
This problem occurs only in the 4.19 stable version.
Fixes: a561145f3ae9 ("filelock: Fix fcntl/close race recovery compat path")
Fixes: d30ff3304083 ("filelock: Remove locks reliably when fcntl/close race is detected")
Cc: stable(a)vger.kernel.org
Signed-off-by: Long Li <leo.lilong(a)huawei.com>
---
fs/locks.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 234ebfa8c070..b1201b01867a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2313,7 +2313,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(&current->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, &current->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
@@ -2443,7 +2443,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(&current->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, &current->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
--
2.39.2
The patch titled
Subject: mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Hao Ge <gehao(a)kylinos.cn>
Subject: mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
Date: Fri, 16 Aug 2024 09:33:36 +0800
When CONFIG_MEMCG, CONFIG_KFENCE and CONFIG_KMEMLEAK are all enabled, the
following warning always occurs. This is caused by the following call
stack:
mem_pool_alloc
kmem_cache_alloc_noprof
slab_alloc_node
kfence_alloc
Once the KFENCE allocation succeeds, slab->obj_exts will not be NULL,
because it has already been assigned a value in kfence_init_pool().
Because prepare_slab_obj_exts_hook() checks
s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE), alloc_tag_add() is not
called, and therefore ref->ct remains NULL.
However, when mem_pool_free() is called, obj_exts is not NULL, so
alloc_tag_sub() is eventually invoked. This is where the warning occurs.
So we should add the corresponding check in alloc_tagging_slab_free_hook().
For the __GFP_NO_OBJ_EXT case, I did not find a specific case where it is
used together with KFENCE, so I won't add the corresponding check in
alloc_tagging_slab_free_hook() for now.
[ 3.734349] ------------[ cut here ]------------
[ 3.734807] alloc_tag was not set
[ 3.735129] WARNING: CPU: 4 PID: 40 at ./include/linux/alloc_tag.h:130 kmem_cache_free+0x444/0x574
[ 3.735866] Modules linked in: autofs4
[ 3.736211] CPU: 4 UID: 0 PID: 40 Comm: ksoftirqd/4 Tainted: G W 6.11.0-rc3-dirty #1
[ 3.736969] Tainted: [W]=WARN
[ 3.737258] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[ 3.737875] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3.738501] pc : kmem_cache_free+0x444/0x574
[ 3.738951] lr : kmem_cache_free+0x444/0x574
[ 3.739361] sp : ffff80008357bb60
[ 3.739693] x29: ffff80008357bb70 x28: 0000000000000000 x27: 0000000000000000
[ 3.740338] x26: ffff80008207f000 x25: ffff000b2eb2fd60 x24: ffff0000c0005700
[ 3.740982] x23: ffff8000804229e4 x22: ffff800082080000 x21: ffff800081756000
[ 3.741630] x20: fffffd7ff8253360 x19: 00000000000000a8 x18: ffffffffffffffff
[ 3.742274] x17: ffff800ab327f000 x16: ffff800083398000 x15: ffff800081756df0
[ 3.742919] x14: 0000000000000000 x13: 205d344320202020 x12: 5b5d373038343337
[ 3.743560] x11: ffff80008357b650 x10: 000000000000005d x9 : 00000000ffffffd0
[ 3.744231] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008237bad0 x6 : c0000000ffff7fff
[ 3.744907] x5 : ffff80008237ba78 x4 : ffff8000820bbad0 x3 : 0000000000000001
[ 3.745580] x2 : 68d66547c09f7800 x1 : 68d66547c09f7800 x0 : 0000000000000000
[ 3.746255] Call trace:
[ 3.746530] kmem_cache_free+0x444/0x574
[ 3.746931] mem_pool_free+0x44/0xf4
[ 3.747306] free_object_rcu+0xc8/0xdc
[ 3.747693] rcu_do_batch+0x234/0x8a4
[ 3.748075] rcu_core+0x230/0x3e4
[ 3.748424] rcu_core_si+0x14/0x1c
[ 3.748780] handle_softirqs+0x134/0x378
[ 3.749189] run_ksoftirqd+0x70/0x9c
[ 3.749560] smpboot_thread_fn+0x148/0x22c
[ 3.749978] kthread+0x10c/0x118
[ 3.750323] ret_from_fork+0x10/0x20
[ 3.750696] ---[ end trace 0000000000000000 ]---
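The asymmetry described above (the allocation hook skips flagged caches, but the free hook did not) can be reduced to a toy model. This is a simplified sketch, not the slab code. The flag values, `struct toy_cache`, and both hooks are hypothetical, and obj_exts is collapsed to a single tag reference per cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real SLAB_* flags live in include/linux/slab.h. */
#define TOY_SLAB_NO_OBJ_EXT   (1u << 0)
#define TOY_SLAB_NOLEAKTRACE  (1u << 1)

struct toy_tag_ref { const char *ct; };   /* NULL models "alloc_tag was not set" */

struct toy_cache {
	unsigned int flags;
	struct toy_tag_ref *obj_exts;     /* may be non-NULL, e.g. set up by KFENCE */
};

static int warnings;

/* Allocation side: tags are recorded only for caches without these flags. */
static void toy_alloc_hook(struct toy_cache *s, const char *site)
{
	if (s->flags & (TOY_SLAB_NO_OBJ_EXT | TOY_SLAB_NOLEAKTRACE))
		return;
	if (s->obj_exts)
		s->obj_exts->ct = site;
}

/* Free side; 'fixed' selects whether the flag check from the patch is applied. */
static void toy_free_hook(struct toy_cache *s, bool fixed)
{
	if (fixed && (s->flags & (TOY_SLAB_NO_OBJ_EXT | TOY_SLAB_NOLEAKTRACE)))
		return;
	if (!s->obj_exts)
		return;
	if (!s->obj_exts->ct)
		warnings++;               /* models the "alloc_tag was not set" WARN */
	s->obj_exts->ct = NULL;
}
```

The tests exercise both paths. Without the check, a flagged cache whose obj_exts is non-NULL hits the warning on free. With the check, the free hook mirrors the allocation hook and returns early.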
Link: https://lkml.kernel.org/r/20240816013336.17505-1-hao.ge@linux.dev
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Hao Ge <gehao(a)kylinos.cn>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slub.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/slub.c~mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook
+++ a/mm/slub.c
@@ -2116,6 +2116,10 @@ alloc_tagging_slab_free_hook(struct kmem
if (!mem_alloc_profiling_enabled())
return;
+ /* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
+ if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
+ return;
+
obj_exts = slab_obj_exts(slab);
if (!obj_exts)
return;
_
Patches currently in -mm which might be from gehao(a)kylinos.cn are
mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook.patch
mm-cma-change-the-addition-of-totalcma_pages-in-the-cma_init_reserved_mem.patch
AddressSanitizer found a use-after-free bug in the symbol code which
manifested as perf top segfaulting.
==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
READ of size 1 at 0x60b00c48844b thread T193
#0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
#1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
#2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
#3 0x5650d804568f in __hists__add_entry util/hist.c:754
#4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
#5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
#6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
#7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
#8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
#9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
#10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
#11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
#12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
#13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().
While this bug was probably introduced with 5c24b67aae72 ("perf tools:
Replace map->referenced & maps->removed_maps with map->refcnt"), the
symbol objects were leaked until c087e9480cf3 ("perf machine: Fix
refcount usage when processing PERF_RECORD_KSYMBOL") was merged so the
bug was masked.
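The fix follows a general rule: when an object borrows a pointer into memory owned by a refcounted parent, that pointer must be re-resolved before the parent reference is dropped. This sketch models that with hypothetical `toy_*` types, where each map embeds (owns) a single symbol. Like the patch, it reads the old symbol's start address before putting the old map, then looks the symbol up again in the new map.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_sym { uint64_t start, end; };

struct toy_map {
	int refcnt;
	struct toy_sym sym;       /* symbol memory is owned by the map */
};

static struct toy_map *toy_map_new(uint64_t start, uint64_t end)
{
	struct toy_map *m = malloc(sizeof(*m));

	if (!m)
		abort();
	m->refcnt = 1;
	m->sym.start = start;
	m->sym.end = end;
	return m;
}

static struct toy_map *toy_map_get(struct toy_map *m)
{
	m->refcnt++;
	return m;
}

/* Dropping the last reference frees the map and the symbol inside it. */
static void toy_map_put(struct toy_map *m)
{
	if (m && --m->refcnt == 0)
		free(m);
}

static struct toy_sym *toy_map_find_symbol(struct toy_map *m, uint64_t addr)
{
	return (addr >= m->sym.start && addr < m->sym.end) ? &m->sym : NULL;
}

struct toy_entry { struct toy_map *map; struct toy_sym *sym; };

/* Switch the entry to new_map, re-resolving the symbol first. */
static void toy_entry_update_map(struct toy_entry *he, struct toy_map *new_map)
{
	if (he->map == new_map)
		return;
	if (he->sym) {
		uint64_t addr = he->sym->start;   /* old map is still alive here */
		he->sym = toy_map_find_symbol(new_map, addr);
	}
	toy_map_put(he->map);
	he->map = toy_map_get(new_map);
}
```

Without the `he->sym` re-lookup, the entry would keep pointing into the freed old map. That is the heap-use-after-free AddressSanitizer reports in _sort__sym_cmp.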
Fixes: c087e9480cf3 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Signed-off-by: Matt Fleming (Cloudflare) <matt(a)readmodwrite.com>
Reported-by: Yunzhao Li <yunzhao(a)cloudflare.com>
Cc: stable(a)vger.kernel.org # v5.13+
---
tools/perf/util/hist.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 0f554febf9a1..0f9ce2ee2c31 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -639,6 +639,11 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
* the history counter to increment.
*/
if (he->ms.map != entry->ms.map) {
+ if (he->ms.sym) {
+ u64 addr = he->ms.sym->start;
+ he->ms.sym = map__find_symbol(entry->ms.map, addr);
+ }
+
map__put(he->ms.map);
he->ms.map = map__get(entry->ms.map);
}
--
2.34.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081113-creamed-unleaded-c696@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
82dbb57ac8d0 ("scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES")
0c25422d34b4 ("scsi: mpt3sas: Remove scsi_dma_map() error messages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Fri, 19 Jul 2024 16:39:12 +0900
Subject: [PATCH] scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES
Some firmware versions of the 9600 series SAS HBA byte-swap the REPORT
ZONES command reply buffer from ATA-ZAC devices by directly accessing the
buffer in the host memory. This does not respect the default command DMA
direction and causes IOMMU page faults on architectures with an IOMMU
enforcing write-only mappings for the DMA_FROM_DEVICE DMA direction (e.g. AMD
hosts).
scsi 18:0:0:0: Direct-Access-ZBC ATA WDC WSH722020AL W870 PQ: 0 ANSI: 6
scsi 18:0:0:0: SATA: handle(0x0027), sas_addr(0x300062b2083e7c40), phy(0), device_name(0x5000cca29dc35e11)
scsi 18:0:0:0: enclosure logical id (0x300062b208097c40), slot(0)
scsi 18:0:0:0: enclosure level(0x0000), connector name( C0.0)
scsi 18:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
scsi 18:0:0:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
sd 18:0:0:0: Attached scsi generic sg2 type 20
sd 18:0:0:0: [sdc] Host-managed zoned block device
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b200 flags=0x0050]
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b300 flags=0x0050]
mpt3sas_cm0: mpt3sas_ctl_pre_reset_handler: Releasing the trace buffer due to adapter reset.
mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
mpt3sas_cm0: fault_state(0x2666)!
mpt3sas_cm0: sending diag reset !!
mpt3sas_cm0: diag reset: SUCCESS
sd 18:0:0:0: [sdc] REPORT ZONES start lba 0 failed
sd 18:0:0:0: [sdc] REPORT ZONES: Result: hostbyte=DID_RESET driverbyte=DRIVER_OK
sd 18:0:0:0: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
Avoid this issue by always mapping the buffer of REPORT ZONES commands
using DMA_BIDIRECTIONAL (read+write IOMMU mapping). This is done by
introducing the helper function _base_scsi_dma_map() and using it in
_base_build_sg_scmd() and _base_build_sg_scmd_ieee() instead of calling
scsi_dma_map() directly.
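The helper only widens the mapping direction for one specific command before mapping. This userspace sketch mirrors that decision. The opcode values ZBC IN (0x95) and REPORT ZONES service action (0x00) follow the ZBC standard. The `sk_*` types and the direction enum are hypothetical stand-ins for `struct scsi_cmnd` and `enum dma_data_direction`.

```c
#include <assert.h>
#include <stdint.h>

/* ZBC opcode values; names mirror include/scsi/scsi_proto.h. */
#define SK_ZBC_IN           0x95
#define SK_ZI_REPORT_ZONES  0x00

/* Hypothetical stand-in for enum dma_data_direction. */
enum sk_dma_dir {
	SK_DMA_BIDIRECTIONAL,
	SK_DMA_TO_DEVICE,
	SK_DMA_FROM_DEVICE,
	SK_DMA_NONE,
};

struct sk_cmd {
	uint8_t cdb[16];
	enum sk_dma_dir dir;
};

/* Same idea as _base_scsi_dma_map(): REPORT ZONES gets a read+write mapping. */
static enum sk_dma_dir sk_effective_dir(struct sk_cmd *cmd)
{
	if (cmd->cdb[0] == SK_ZBC_IN && cmd->cdb[1] == SK_ZI_REPORT_ZONES)
		cmd->dir = SK_DMA_BIDIRECTIONAL;
	return cmd->dir;
}
```

All other commands, for example READ(10) with opcode 0x28, keep their original direction, so the wider mapping is limited to the buffer the firmware rewrites in place.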
Fixes: 471ef9d4e498 ("mpt3sas: Build MPI SGL LIST on GEN2 HBAs and IEEE SGL LIST on GEN3 HBAs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Link: https://lore.kernel.org/r/20240719073913.179559-3-dlemoal@kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1092497563b2..c8fb965a6bf0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2671,6 +2671,22 @@ _base_build_zero_len_sge_ieee(struct MPT3SAS_ADAPTER *ioc, void *paddr)
_base_add_sg_single_ieee(paddr, sgl_flags, 0, 0, -1);
}
+static inline int _base_scsi_dma_map(struct scsi_cmnd *cmd)
+{
+ /*
+ * Some firmware versions byte-swap the REPORT ZONES command reply from
+ * ATA-ZAC devices by directly accessing in the host buffer. This does
+ * not respect the default command DMA direction and causes IOMMU page
+ * faults on some architectures with an IOMMU enforcing write mappings
+ * (e.g. AMD hosts). Avoid such issue by making the report zones buffer
+ * mapping bi-directional.
+ */
+ if (cmd->cmnd[0] == ZBC_IN && cmd->cmnd[1] == ZI_REPORT_ZONES)
+ cmd->sc_data_direction = DMA_BIDIRECTIONAL;
+
+ return scsi_dma_map(cmd);
+}
+
/**
* _base_build_sg_scmd - main sg creation routine
* pcie_device is unused here!
@@ -2717,7 +2733,7 @@ _base_build_sg_scmd(struct MPT3SAS_ADAPTER *ioc,
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
@@ -2861,7 +2877,7 @@ _base_build_sg_scmd_ieee(struct MPT3SAS_ADAPTER *ioc,
}
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081112-idly-qualify-80f3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
82dbb57ac8d0 ("scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES")
0c25422d34b4 ("scsi: mpt3sas: Remove scsi_dma_map() error messages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Fri, 19 Jul 2024 16:39:12 +0900
Subject: [PATCH] scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES
Some firmware versions of the 9600 series SAS HBA byte-swap the REPORT
ZONES command reply buffer from ATA-ZAC devices by directly accessing the
buffer in the host memory. This does not respect the default command DMA
direction and causes IOMMU page faults on architectures with an IOMMU
enforcing write-only mappings for the DMA_FROM_DEVICE DMA direction (e.g. AMD
hosts).
scsi 18:0:0:0: Direct-Access-ZBC ATA WDC WSH722020AL W870 PQ: 0 ANSI: 6
scsi 18:0:0:0: SATA: handle(0x0027), sas_addr(0x300062b2083e7c40), phy(0), device_name(0x5000cca29dc35e11)
scsi 18:0:0:0: enclosure logical id (0x300062b208097c40), slot(0)
scsi 18:0:0:0: enclosure level(0x0000), connector name( C0.0)
scsi 18:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
scsi 18:0:0:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
sd 18:0:0:0: Attached scsi generic sg2 type 20
sd 18:0:0:0: [sdc] Host-managed zoned block device
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b200 flags=0x0050]
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b300 flags=0x0050]
mpt3sas_cm0: mpt3sas_ctl_pre_reset_handler: Releasing the trace buffer due to adapter reset.
mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
mpt3sas_cm0: fault_state(0x2666)!
mpt3sas_cm0: sending diag reset !!
mpt3sas_cm0: diag reset: SUCCESS
sd 18:0:0:0: [sdc] REPORT ZONES start lba 0 failed
sd 18:0:0:0: [sdc] REPORT ZONES: Result: hostbyte=DID_RESET driverbyte=DRIVER_OK
sd 18:0:0:0: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
Avoid this issue by always mapping the buffer of REPORT ZONES commands
using DMA_BIDIRECTIONAL (read+write IOMMU mapping). This is done by
introducing the helper function _base_scsi_dma_map() and using it in
_base_build_sg_scmd() and _base_build_sg_scmd_ieee() instead of calling
scsi_dma_map() directly.
Fixes: 471ef9d4e498 ("mpt3sas: Build MPI SGL LIST on GEN2 HBAs and IEEE SGL LIST on GEN3 HBAs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Link: https://lore.kernel.org/r/20240719073913.179559-3-dlemoal@kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1092497563b2..c8fb965a6bf0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2671,6 +2671,22 @@ _base_build_zero_len_sge_ieee(struct MPT3SAS_ADAPTER *ioc, void *paddr)
_base_add_sg_single_ieee(paddr, sgl_flags, 0, 0, -1);
}
+static inline int _base_scsi_dma_map(struct scsi_cmnd *cmd)
+{
+ /*
+ * Some firmware versions byte-swap the REPORT ZONES command reply from
+ * ATA-ZAC devices by directly accessing in the host buffer. This does
+ * not respect the default command DMA direction and causes IOMMU page
+ * faults on some architectures with an IOMMU enforcing write mappings
+ * (e.g. AMD hosts). Avoid such issue by making the report zones buffer
+ * mapping bi-directional.
+ */
+ if (cmd->cmnd[0] == ZBC_IN && cmd->cmnd[1] == ZI_REPORT_ZONES)
+ cmd->sc_data_direction = DMA_BIDIRECTIONAL;
+
+ return scsi_dma_map(cmd);
+}
+
/**
* _base_build_sg_scmd - main sg creation routine
* pcie_device is unused here!
@@ -2717,7 +2733,7 @@ _base_build_sg_scmd(struct MPT3SAS_ADAPTER *ioc,
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
@@ -2861,7 +2877,7 @@ _base_build_sg_scmd_ieee(struct MPT3SAS_ADAPTER *ioc,
}
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
In _emif_get_id(), of_get_address() may return NULL, which is later
dereferenced. Fix this bug by adding a NULL check. Likewise, check the
result of of_translate_address() against OF_BAD_ADDR.
Found by code review.
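The diff is worth checking on one point. If `my_addr` and `addr` are declared `u32`, as the `(u32)` casts suggest, then comparing them with OF_BAD_ADDR (`((u64)-1)`) after the cast can never be true, because the 32-bit value is zero-extended before the comparison. This minimal sketch checks the 64-bit result before narrowing. The `sk_*` helpers and `SK_OF_BAD_ADDR` are hypothetical mirrors, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the kernel's OF_BAD_ADDR, defined as ((u64)-1). */
#define SK_OF_BAD_ADDR ((uint64_t)-1)

/* Check the full 64-bit translation result, then narrow to 32 bits. */
static int sk_checked_u32(uint64_t translated, uint32_t *out)
{
	if (translated == SK_OF_BAD_ADDR)
		return -EINVAL;
	*out = (uint32_t)translated;
	return 0;
}

/* Pitfall: after narrowing, the value is zero-extended and never matches. */
static int sk_narrow_then_check(uint64_t translated)
{
	uint32_t addr = (uint32_t)translated;

	return addr == SK_OF_BAD_ADDR;   /* always 0 */
}
```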
Cc: stable(a)vger.kernel.org
Fixes: 86a18ee21e5e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v4:
- added the check of of_translate_address() as suggestions.
Changes in v3:
- added the patch operations omitted in PATCH v2 RESEND compared to PATCH
v2. Sorry for my oversight.
Changes in v2:
- added Cc stable line.
---
drivers/edac/ti_edac.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/edac/ti_edac.c b/drivers/edac/ti_edac.c
index 29723c9592f7..f466f12630d3 100644
--- a/drivers/edac/ti_edac.c
+++ b/drivers/edac/ti_edac.c
@@ -207,14 +207,24 @@ static int _emif_get_id(struct device_node *node)
int my_id = 0;
addrp = of_get_address(node, 0, NULL, NULL);
+ if (!addrp)
+ return -EINVAL;
+
my_addr = (u32)of_translate_address(node, addrp);
+ if (my_addr == OF_BAD_ADDR)
+ return -EINVAL;
for_each_matching_node(np, ti_edac_of_match) {
if (np == node)
continue;
addrp = of_get_address(np, 0, NULL, NULL);
+ if (!addrp)
+ return -EINVAL;
+
addr = (u32)of_translate_address(np, addrp);
+ if (addr == OF_BAD_ADDR)
+ return -EINVAL;
edac_printk(KERN_INFO, EDAC_MOD_NAME,
"addr=%x, my_addr=%x\n",
--
2.25.1
The patch titled
Subject: kunit/overflow: fix UB in overflow_allocation_test
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kunit-overflow-fix-ub-in-overflow_allocation_test.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Subject: kunit/overflow: fix UB in overflow_allocation_test
Date: Thu, 15 Aug 2024 01:04:31 +0100
The 'device_name' array does not exist outside the scope of the
'overflow_allocation_test' function. However, it is used as the driver
name when 'kunit_device_register' calls 'kunit_driver_create'. This
produces a kernel panic when KASAN is enabled.
Since this variable is used in one place only, remove it and pass the
device name to kunit_device_register directly as an ASCII string literal.
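This is the classic mistake of keeping a pointer to an automatic array beyond the array's lifetime. In this hypothetical userspace sketch, `register_device()` stores the pointer it is given, as the driver-name path is described above. In KUnit, such resources are typically released only after the test function has returned. A string literal has static storage duration, so the pointer stays valid. The buggy pattern appears only in a comment because executing it would be undefined behaviour.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical registry that keeps the caller's pointer instead of copying it. */
static const char *registered_name;

static void register_device(const char *name)
{
	registered_name = name;          /* stores the pointer, not a copy */
}

/*
 * Buggy pattern (not called): device_name lives in this function's stack
 * frame, so registered_name dangles as soon as the function returns.
 *
 * static void setup_with_local(void)
 * {
 *	const char device_name[] = "overflow-test";
 *	register_device(device_name);
 * }
 */

/* Fixed pattern: a string literal has static storage duration. */
static void setup_with_literal(void)
{
	register_device("overflow-test");
}
```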
Link: https://lkml.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com
Fixes: ca90800a91ba ("test_overflow: Add memory allocation overflow tests")
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Tested-by: Erhard Furtner <erhard_f(a)mailbox.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/overflow_kunit.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/lib/overflow_kunit.c~kunit-overflow-fix-ub-in-overflow_allocation_test
+++ a/lib/overflow_kunit.c
@@ -668,7 +668,6 @@ DEFINE_TEST_ALLOC(devm_kzalloc, devm_kf
static void overflow_allocation_test(struct kunit *test)
{
- const char device_name[] = "overflow-test";
struct device *dev;
int count = 0;
@@ -678,7 +677,7 @@ static void overflow_allocation_test(str
} while (0)
/* Create dummy device for devm_kmalloc()-family tests. */
- dev = kunit_device_register(test, device_name);
+ dev = kunit_device_register(test, "overflow-test");
KUNIT_ASSERT_FALSE_MSG(test, IS_ERR(dev),
"Cannot register test device\n");
_
Patches currently in -mm which might be from ivan.orlov0322(a)gmail.com are
kunit-overflow-fix-ub-in-overflow_allocation_test.patch
From: Murali Nalajala <quic_mnalajal(a)quicinc.com>
Currently get_wq_ctx() is wrongly configured as a standard call. When two
SMC calls are sleeping and one of them wakes up, it calls get_wq_ctx() to
resume the corresponding sleeping thread. But if get_wq_ctx() is itself
interrupted and goes to sleep while another SMC call is waiting to be
allocated a waitq context, this leads to a deadlock.
To avoid this, get_wq_ctx() must be an atomic call and cannot be a
standard SMC call. Hence, mark get_wq_ctx() as a fast call.
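The one-line change only flips the call-type bit of the SMCCC function identifier. The sketch below rebuilds the identifier from its fields. The field layout (bit 31 call type, bit 30 SMC64, bits 29:24 owner, bits 15:0 function number) and the constant values are assumptions that mirror `ARM_SMCCC_CALL_VAL()` in include/linux/arm-smccc.h. The Qualcomm WAITQ service (0x24) and GET_WQ_CTX command (0x03) values are also assumed, and all `SK_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout, assumed to mirror include/linux/arm-smccc.h. */
#define SK_TYPE_SHIFT       31   /* 1 = fast (atomic) call, 0 = standard (yielding) */
#define SK_CALL_CONV_SHIFT  30   /* 1 = SMC64 calling convention */
#define SK_OWNER_SHIFT      24
#define SK_OWNER_MASK       0x3Fu
#define SK_FUNC_MASK        0xFFFFu

#define SK_STD_CALL   0u
#define SK_FAST_CALL  1u
#define SK_SMC_64     1u
#define SK_OWNER_SIP  2u

/* Assumed SCM function-number encoding: service in bits 15:8, command in 7:0. */
#define SK_SCM_FNID(svc, cmd) ((((svc) & 0xFFu) << 8) | ((cmd) & 0xFFu))

static uint32_t sk_call_val(uint32_t type, uint32_t conv, uint32_t owner,
			    uint32_t func)
{
	return (type << SK_TYPE_SHIFT) | (conv << SK_CALL_CONV_SHIFT) |
	       ((owner & SK_OWNER_MASK) << SK_OWNER_SHIFT) |
	       (func & SK_FUNC_MASK);
}
```

Under these assumptions the standard-call ID is 0x42002403 and the fast-call ID is 0xC2002403, which differ only in bit 31.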
Fixes: 6bf325992236 ("firmware: qcom: scm: Add wait-queue handling logic")
Cc: stable(a)vger.kernel.org
Signed-off-by: Murali Nalajala <quic_mnalajal(a)quicinc.com>
Signed-off-by: Unnathi Chalicheemala <quic_uchalich(a)quicinc.com>
Reviewed-by: Elliot Berman <quic_eberman(a)quicinc.com>
---
Changes in v2:
- Made commit message more clear.
- R-b tag from Elliot.
- Link to v1: https://lore.kernel.org/all/20240611-get_wq_ctx_atomic-v1-1-9189a0a7d1ba@qu…
drivers/firmware/qcom/qcom_scm-smc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/qcom/qcom_scm-smc.c b/drivers/firmware/qcom/qcom_scm-smc.c
index dca5f3f1883b..2b4c2826f572 100644
--- a/drivers/firmware/qcom/qcom_scm-smc.c
+++ b/drivers/firmware/qcom/qcom_scm-smc.c
@@ -73,7 +73,7 @@ int scm_get_wq_ctx(u32 *wq_ctx, u32 *flags, u32 *more_pending)
struct arm_smccc_res get_wq_res;
struct arm_smccc_args get_wq_ctx = {0};
- get_wq_ctx.args[0] = ARM_SMCCC_CALL_VAL(ARM_SMCCC_STD_CALL,
+ get_wq_ctx.args[0] = ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,
ARM_SMCCC_SMC_64, ARM_SMCCC_OWNER_SIP,
SCM_SMC_FNID(QCOM_SCM_SVC_WAITQ, QCOM_SCM_WAITQ_GET_WQ_CTX));
--
2.34.1
This series adds DFS support for the GCC QUPv3 RCGs, adds the missing
GPLL9 support, and fixes the sdcc clocks' frequency tables.
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
---
Changes in V2:
- Add stable kernel tags and update the commit text for [1/4] patch.
- Added one more fix in V2, to remove the unused cpuss_ahb_clk and its RCG.
---
Satya Priya Kakitapalli (5):
clk: qcom: gcc-sc8180x: Register QUPv3 RCGs for DFS on sc8180x
dt-bindings: clock: qcom: Add GPLL9 support on gcc-sc8180x
clk: qcom: gcc-sc8180x: Add GPLL9 support
clk: qcom: gcc-sc8180x: Fix the sdcc2 and sdcc4 clocks freq table
clk: qcom: gcc-sm8150: De-register gcc_cpuss_ahb_clk_src
drivers/clk/qcom/gcc-sc8180x.c | 438 ++++++++++++++-------------
include/dt-bindings/clock/qcom,gcc-sc8180x.h | 1 +
2 files changed, 232 insertions(+), 207 deletions(-)
---
base-commit: 864b1099d16fc7e332c3ad7823058c65f890486c
change-id: 20240725-gcc-sc8180x-fixes-cf58908142b5
Best regards,
--
Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
From: Mike Tipton <quic_mdtipton(a)quicinc.com>
Valid frequencies may result in BCM votes that exceed the max HW value.
Set the vote ceiling to BCM_TCS_CMD_VOTE_MASK so that votes are not
truncated, which could otherwise result in lower frequencies than desired.
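Masking and clamping behave very differently once a vote exceeds the field width. Masking keeps only the low bits and can produce a much smaller vote, while `min()` saturates at the largest encodable value. This sketch assumes a 14-bit vote field (mask 0x3FFF), modelled on `BCM_TCS_CMD_VOTE_MASK` in include/soc/qcom/tcs.h. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed vote-field mask, modelled on BCM_TCS_CMD_VOTE_MASK. */
#define SK_VOTE_MASK 0x3FFFu

/* Without the clamp: bits above the field are silently dropped by the mask. */
static uint32_t sk_vote_truncated(uint32_t cmd_state)
{
	return cmd_state & SK_VOTE_MASK;
}

/* With the clamp from the patch: saturate at the largest encodable vote. */
static uint32_t sk_vote_clamped(uint32_t cmd_state)
{
	return cmd_state < SK_VOTE_MASK ? cmd_state : SK_VOTE_MASK;
}
```

For example, a vote of 0x4001 would be truncated to 0x0001, a near-minimal request, while the clamped vote is 0x3FFF. In-range votes are unchanged by either path.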
Fixes: 04053f4d23a4 ("clk: qcom: clk-rpmh: Add IPA clock support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mike Tipton <quic_mdtipton(a)quicinc.com>
Reviewed-by: Taniya Das <quic_tdas(a)quicinc.com>
Signed-off-by: Imran Shaik <quic_imrashai(a)quicinc.com>
---
Changes in v2:
- Updated the overflow check as per the comment from Stephen.
- Link to v1: https://lore.kernel.org/r/20240808-clk-rpmh-bcm-vote-fix-v1-1-109bd1d76189@…
---
drivers/clk/qcom/clk-rpmh.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/clk/qcom/clk-rpmh.c b/drivers/clk/qcom/clk-rpmh.c
index bb82abeed88f..4acde937114a 100644
--- a/drivers/clk/qcom/clk-rpmh.c
+++ b/drivers/clk/qcom/clk-rpmh.c
@@ -263,6 +263,8 @@ static int clk_rpmh_bcm_send_cmd(struct clk_rpmh *c, bool enable)
cmd_state = 0;
}
+ cmd_state = min(cmd_state, BCM_TCS_CMD_VOTE_MASK);
+
if (c->last_sent_aggr_state != cmd_state) {
cmd.addr = c->res_addr;
cmd.data = BCM_TCS_CMD(1, enable, 0, cmd_state);
---
base-commit: 222a3380f92b8791d4eeedf7cd750513ff428adf
change-id: 20240808-clk-rpmh-bcm-vote-fix-c344e213c9bb
Best regards,
--
Imran Shaik <quic_imrashai(a)quicinc.com>
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 24e82654e98e96cece5d8b919c522054456eeec6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081204-stylist-bobsled-3424@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
24e82654e98e ("drm/amdkfd: don't allow mapping the MMIO HDP page with large pages")
b38c074b2b07 ("drm/amdkfd: CRIU Refactor restore BO function")
67a359d85ec2 ("drm/amdkfd: CRIU remove sync and TLB flush on restore")
22804e03f7a5 ("drm/amdkfd: Fix criu_restore_bo error handling")
d8a25e485857 ("drm/amdkfd: fix loop error handling")
e5af61ffaaef ("drm/amdkfd: CRIU fix a NULL vs IS_ERR() check")
be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping")
40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
42c6c48214b7 ("drm/amdkfd: CRIU checkpoint and restore queue mqds")
2485c12c980a ("drm/amdkfd: CRIU restore sdma id for queues")
8668dfc30d3e ("drm/amdkfd: CRIU restore queue ids")
626f7b3190b4 ("drm/amdkfd: CRIU add queues support")
cd9f79103003 ("drm/amdkfd: CRIU Implement KFD unpause operation")
011bbb03024f ("drm/amdkfd: CRIU Implement KFD resume ioctl")
73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
5ccbb057c0a1 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
f185381b6481 ("drm/amdkfd: CRIU Implement KFD process_info ioctl")
3698807094ec ("drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs")
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24e82654e98e96cece5d8b919c522054456eeec6 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Sun, 14 Apr 2024 13:06:39 -0400
Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page with large
pages
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6b713fb0b818..fdf171ad4a3c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1144,7 +1144,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_unlock;
}
offset = dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
err = -ENOMEM;
goto err_unlock;
}
@@ -2312,7 +2312,7 @@ static int criu_restore_memory_of_gpu(struct kfd_process_device *pdd,
return -EINVAL;
}
offset = pdd->dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
return -ENOMEM;
}
@@ -3354,6 +3354,9 @@ static int kfd_mmio_mmap(struct kfd_node *dev, struct kfd_process *process,
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
+ if (PAGE_SIZE > 4096)
+ return -EINVAL;
+
address = dev->adev->rmmio_remap.bus_addr;
vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 24e82654e98e96cece5d8b919c522054456eeec6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081203-dreadlock-trodden-9b5f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
24e82654e98e ("drm/amdkfd: don't allow mapping the MMIO HDP page with large pages")
b38c074b2b07 ("drm/amdkfd: CRIU Refactor restore BO function")
67a359d85ec2 ("drm/amdkfd: CRIU remove sync and TLB flush on restore")
22804e03f7a5 ("drm/amdkfd: Fix criu_restore_bo error handling")
d8a25e485857 ("drm/amdkfd: fix loop error handling")
e5af61ffaaef ("drm/amdkfd: CRIU fix a NULL vs IS_ERR() check")
be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping")
40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
42c6c48214b7 ("drm/amdkfd: CRIU checkpoint and restore queue mqds")
2485c12c980a ("drm/amdkfd: CRIU restore sdma id for queues")
8668dfc30d3e ("drm/amdkfd: CRIU restore queue ids")
626f7b3190b4 ("drm/amdkfd: CRIU add queues support")
cd9f79103003 ("drm/amdkfd: CRIU Implement KFD unpause operation")
011bbb03024f ("drm/amdkfd: CRIU Implement KFD resume ioctl")
73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
5ccbb057c0a1 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
f185381b6481 ("drm/amdkfd: CRIU Implement KFD process_info ioctl")
3698807094ec ("drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs")
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24e82654e98e96cece5d8b919c522054456eeec6 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Sun, 14 Apr 2024 13:06:39 -0400
Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page with large
pages
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6b713fb0b818..fdf171ad4a3c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1144,7 +1144,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_unlock;
}
offset = dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
err = -ENOMEM;
goto err_unlock;
}
@@ -2312,7 +2312,7 @@ static int criu_restore_memory_of_gpu(struct kfd_process_device *pdd,
return -EINVAL;
}
offset = pdd->dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
return -ENOMEM;
}
@@ -3354,6 +3354,9 @@ static int kfd_mmio_mmap(struct kfd_node *dev, struct kfd_process *process,
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
+ if (PAGE_SIZE > 4096)
+ return -EINVAL;
+
address = dev->adev->rmmio_remap.bus_addr;
vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:
/*
* CPU1: User space performs
* ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
* on enclave page X
*/
sgx_encl_remove_pages() {
mutex_lock(&encl->lock);
entry = sgx_encl_load_page(encl);
/*
* verify that page is
* trimmed and accepted
*/
mutex_unlock(&encl->lock);
/*
* remove PTE entry; cannot
* be performed under lock
*/
sgx_zap_enclave_ptes(encl);
/*
* Fault on CPU2 on same page X
*/
sgx_vma_fault() {
/*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
xa_load(&encl->page_array) != NULL ->
/*
* SGX driver thinks that this page
* was swapped out and loads it
*/
mutex_lock(&encl->lock);
/*
* this is effectively a no-op
*/
entry = sgx_encl_load_page_in_vma();
/*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
/*
* continue with page removal
*/
mutex_lock(&encl->lock);
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
xa_erase(&encl->page_array);
mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However, CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
setting the SGX_ENCL_PAGE_BUSY flag right before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is busy and, if so, backs off and waits until the page is
completely removed. After that, any memory access to this page results
in a normal "allocate and EAUG a page on #PF" flow.
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5d390df21440..02441883401d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1141,7 +1141,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
/*
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
+ *
+ * Releasing encl->lock leads to a data race: while CPU1
+ * performs sgx_zap_enclave_ptes() and removes the PTE entry
+ * for the enclave page, CPU2 may attempt to load this page
+ * (because the page is still in enclave's xarray). To prevent
+ * CPU2 from loading the page, mark the page as busy.
*/
+ entry->desc |= SGX_ENCL_PAGE_BUSY;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.43.0
On Thu, Aug 15, 2024 at 08:21:29AM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> Input: bcm5974 - check endpoint type before starting traffic
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> input-bcm5974-check-endpoint-type-before-starting-tr.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I am confused: why pick this up only to have to pick up the revert as well? It
was not tagged for stable explicitly.
I'd expect stable scripts to check for reverts that happen almost
immediately...
>
>
>
> commit 16ae8b10b473f96b7f63add2e928d8d437c83b07
> Author: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
> Date: Sat Oct 14 12:20:15 2023 +0200
>
> Input: bcm5974 - check endpoint type before starting traffic
>
> [ Upstream commit 2b9c3eb32a699acdd4784d6b93743271b4970899 ]
>
> syzbot has found a type mismatch between a USB pipe and the transfer
> endpoint, which is triggered by the bcm5974 driver[1].
>
> This driver expects the device to provide input interrupt endpoints and
> if that is not the case, the driver registration should terminate.
>
> Repros are available to reproduce this issue with a certain setup for
> the dummy_hcd, leading to an interrupt/bulk mismatch which is caught in
> the USB core after calling usb_submit_urb() with the following message:
> "BOGUS urb xfer, pipe 1 != type 3"
>
> Some other device drivers (like the appletouch driver bcm5974 is mainly
> based on) provide some checking mechanism to make sure that an IN
> interrupt endpoint is available. In this particular case the endpoint
> addresses are provided by a config table, so the checking can be
> targeted to the provided endpoints.
>
> Add some basic checking to guarantee that the endpoints available match
> the expected type for both the trackpad and button endpoints.
>
> This issue was only found for the trackpad endpoint, but the checking
> has been added to the button endpoint as well for the same reasons.
>
> Given that there was never a check for the endpoint type, this bug has
> been there since the first implementation of the driver (f89bd95c5c94).
>
> [1] https://syzkaller.appspot.com/bug?extid=348331f63b034f89b622
>
> Fixes: f89bd95c5c94 ("Input: bcm5974 - add driver for Macbook Air and Pro Penryn touchpads")
> Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
> Reported-and-tested-by: syzbot+348331f63b034f89b622(a)syzkaller.appspotmail.com
> Link: https://lore.kernel.org/r/20231007-topic-bcm5974_bulk-v3-1-d0f38b9d2935@gma…
> Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/input/mouse/bcm5974.c b/drivers/input/mouse/bcm5974.c
> index ca150618d32f1..953992b458e9f 100644
> --- a/drivers/input/mouse/bcm5974.c
> +++ b/drivers/input/mouse/bcm5974.c
> @@ -19,6 +19,7 @@
> * Copyright (C) 2006 Nicolas Boichat (nicolas(a)boichat.ch)
> */
>
> +#include "linux/usb.h"
> #include <linux/kernel.h>
> #include <linux/errno.h>
> #include <linux/slab.h>
> @@ -193,6 +194,8 @@ enum tp_type {
>
> /* list of device capability bits */
> #define HAS_INTEGRATED_BUTTON 1
> +/* maximum number of supported endpoints (currently trackpad and button) */
> +#define MAX_ENDPOINTS 2
>
> /* trackpad finger data block size */
> #define FSIZE_TYPE1 (14 * sizeof(__le16))
> @@ -891,6 +894,18 @@ static int bcm5974_resume(struct usb_interface *iface)
> return error;
> }
>
> +static bool bcm5974_check_endpoints(struct usb_interface *iface,
> + const struct bcm5974_config *cfg)
> +{
> + u8 ep_addr[MAX_ENDPOINTS + 1] = {0};
> +
> + ep_addr[0] = cfg->tp_ep;
> + if (cfg->tp_type == TYPE1)
> + ep_addr[1] = cfg->bt_ep;
> +
> + return usb_check_int_endpoints(iface, ep_addr);
> +}
> +
> static int bcm5974_probe(struct usb_interface *iface,
> const struct usb_device_id *id)
> {
> @@ -903,6 +918,11 @@ static int bcm5974_probe(struct usb_interface *iface,
> /* find the product index */
> cfg = bcm5974_get_config(udev);
>
> + if (!bcm5974_check_endpoints(iface, cfg)) {
> + dev_err(&iface->dev, "Unexpected non-int endpoint\n");
> + return -ENODEV;
> + }
> +
> /* allocate memory for our device state and initialize it */
> dev = kzalloc(sizeof(struct bcm5974), GFP_KERNEL);
> input_dev = input_allocate_device();
Thanks.
--
Dmitry
Imagine an mmap()'d file. Two threads touch the same address at the same
time and fault. Both allocate a physical page and race to install a PTE
for that page. Only one will win the race. The loser frees its page, but
still continues handling the fault as a success and returns
VM_FAULT_NOPAGE from the fault handler.
The same race can happen with SGX. But there's a bug: the SGX loser
steers into a failure path. The loser EREMOVEs the winner's EPC
page, then returns SIGBUS, likely killing the app.
Fix the SGX loser's behavior. Change the return code to VM_FAULT_NOPAGE
to avoid SIGBUS, and call sgx_free_epc_page(), which avoids EREMOVE'ing
the winner's page and only frees the page that the loser allocated.
The race can be illustrated as follows:
  /*                                    /*
   * Fault on CPU1                       * Fault on CPU2
   * on enclave page X                   * on enclave page X
   */                                    */
  sgx_vma_fault() {                     sgx_vma_fault() {
    xa_load(&encl->page_array)            xa_load(&encl->page_array)
        == NULL -->                           == NULL -->
    sgx_encl_eaug_page() {                sgx_encl_eaug_page() {
      ...                                   ...
      /*                                    /*
       * alloc encl_page                     * alloc encl_page
       */                                    */
                                            mutex_lock(&encl->lock);
                                            /*
                                             * alloc EPC page
                                             */
                                            epc_page = sgx_alloc_epc_page(...);
                                            /*
                                             * add page to enclave's xarray
                                             */
                                            xa_insert(&encl->page_array, ...);
                                            /*
                                             * add page to enclave via EAUG
                                             * (page is in pending state)
                                             */
                                            /*
                                             * add PTE entry
                                             */
                                            vmf_insert_pfn(...);
                                            mutex_unlock(&encl->lock);
                                            return VM_FAULT_NOPAGE;
                                          }
                                        }
                                        /*
                                         * All good up to here: enclave page
                                         * successfully added to enclave,
                                         * ready for EACCEPT from user space
                                         */
      mutex_lock(&encl->lock);
      /*
       * alloc EPC page
       */
      epc_page = sgx_alloc_epc_page(...);
      /*
       * add page to enclave's xarray,
       * this fails with -EBUSY as this
       * page was already added by CPU2
       */
      xa_insert(&encl->page_array, ...);
    err_out_shrink:
      sgx_encl_free_epc_page(epc_page) {
        /*
         * remove page via EREMOVE
         *
         * *BUG*: page added by CPU2 is
         * yanked from enclave while it
         * remains accessible from OS
         * perspective (PTE installed)
         */
        /*
         * free EPC page
         */
        sgx_free_epc_page(epc_page);
      }
      mutex_unlock(&encl->lock);
      /*
       * *BUG*: SIGBUS is returned
       * for a valid enclave page
       */
      return VM_FAULT_SIGBUS;
    }
  }
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
Reviewed-by: Haitao Huang <haitao.huang(a)linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index c0a3c00284c8..9f7f9e57cdeb 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -380,8 +380,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
* If ret == -EBUSY then page was created in another flow while
* running without encl->lock
*/
- if (ret)
+ if (ret) {
+ if (ret == -EBUSY)
+ vmret = VM_FAULT_NOPAGE;
goto err_out_shrink;
+ }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -417,7 +420,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
- sgx_encl_free_epc_page(epc_page);
+ sgx_free_epc_page(epc_page);
err_out_unlock:
mutex_unlock(&encl->lock);
kfree(encl_page);
--
2.43.0
The SGX_ENCL_PAGE_BEING_RECLAIMED flag is set when the enclave page is being
reclaimed (moved to the backing store). However, this flag has two
logical meanings:
1. Don't attempt to load the enclave page (the page is busy).
2. Don't attempt to remove the PCMD page corresponding to this enclave
page (the PCMD page is busy).
To reflect these two meanings, split SGX_ENCL_PAGE_BEING_RECLAIMED into
two flags: SGX_ENCL_PAGE_BUSY and SGX_ENCL_PAGE_PCMD_BUSY. Currently,
both flags are set only when the enclave page is being reclaimed. A
future commit will introduce a new case when the enclave page is being
removed; this new case will set only the SGX_ENCL_PAGE_BUSY flag.
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 16 +++++++---------
arch/x86/kernel/cpu/sgx/encl.h | 10 ++++++++--
arch/x86/kernel/cpu/sgx/main.c | 4 ++--
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..c0a3c00284c8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -46,10 +46,10 @@ static int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_ind
* a check if an enclave page sharing the PCMD page is in the process of being
* reclaimed.
*
- * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
- * intends to reclaim that enclave page - it means that the PCMD page
- * associated with that enclave page is about to get some data and thus
- * even if the PCMD page is empty, it should not be truncated.
+ * The reclaimer sets the SGX_ENCL_PAGE_PCMD_BUSY flag when it intends to
+ * reclaim that enclave page - it means that the PCMD page associated with that
+ * enclave page is about to get some data and thus even if the PCMD page is
+ * empty, it should not be truncated.
*
* Context: Enclave mutex (&sgx_encl->lock) must be held.
* Return: 1 if the reclaimer is about to write to the PCMD page
@@ -77,8 +77,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* Stop when reaching the SECS page - it does not
* have a page_array entry and its reclaim is
* started and completed with enclave mutex held so
- * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
- * flag.
+ * it does not use the SGX_ENCL_PAGE_PCMD_BUSY flag.
*/
if (addr == encl->base + encl->size)
break;
@@ -91,8 +90,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* VA page slot ID uses same bit as the flag so it is important
* to ensure that the page is not already in backing store.
*/
- if (entry->epc_page &&
- (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+ if (entry->epc_page && (entry->desc & SGX_ENCL_PAGE_PCMD_BUSY)) {
reclaimed = 1;
break;
}
@@ -257,7 +255,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & SGX_ENCL_PAGE_BUSY)
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..11b09899cd92 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -22,8 +22,14 @@
/* 'desc' bits holding the offset in the VA (version array) page. */
#define SGX_ENCL_PAGE_VA_OFFSET_MASK GENMASK_ULL(11, 3)
-/* 'desc' bit marking that the page is being reclaimed. */
-#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit indicating that the page is busy (e.g. being reclaimed). */
+#define SGX_ENCL_PAGE_BUSY BIT(2)
+
+/*
+ * 'desc' bit indicating that PCMD page associated with the enclave page is
+ * busy (e.g. because the enclave page is being reclaimed).
+ */
+#define SGX_ENCL_PAGE_PCMD_BUSY BIT(3)
struct sgx_encl_page {
unsigned long desc;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..e94b09c43673 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -204,7 +204,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
void *va_slot;
int ret;
- encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc &= ~(SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY);
va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
list);
@@ -340,7 +340,7 @@ static void sgx_reclaim_pages(void)
goto skip;
}
- encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc |= SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY;
mutex_unlock(&encl_page->encl->lock);
continue;
--
2.43.0
Hi,
This series is a v6.6-only backport (based on v6.6.44) of the upstream
workaround for SSBS errata on Arm Ltd CPUs, as affected parts are likely
to be used with stable kernels. This does not apply to earlier stable
trees, which will receive a separate backport.
The errata mean that an MSR to the SSBS special-purpose register does not
affect subsequent speculative instructions, permitting speculative store
bypassing for a window of time.
The upstream support was originally posted as:
* https://lore.kernel.org/linux-arm-kernel/20240508081400.235362-1-mark.rutla…
"arm64: errata: Add workaround for Arm errata 3194386 and 3312417"
Present in v6.10
* https://lore.kernel.org/linux-arm-kernel/20240603111812.1514101-1-mark.rutl…
"arm64: errata: Expand speculative SSBS workaround"
Present in v6.11-rc1
* https://lore.kernel.org/linux-arm-kernel/20240801101803.1982459-1-mark.rutl…
"arm64: errata: Expand speculative SSBS workaround (again)"
Present in v6.11-rc2
This backport applies the patches which are not present in v6.6.y and,
as prerequisites, backports the addition of the Neoverse-V2 MIDR values
and the restoration of the spec_bar() macro.
I have tested the backport (when applied to v6.6.44), ensuring that the
detection logic works and that the HWCAP and string in /proc/cpuinfo are
both hidden when the relevant errata are detected.
Mark.
Besar Wicaksono (1):
arm64: Add Neoverse-V2 part
Mark Rutland (12):
arm64: barrier: Restore spec_bar() macro
arm64: cputype: Add Cortex-X4 definitions
arm64: cputype: Add Neoverse-V3 definitions
arm64: errata: Add workaround for Arm errata 3194386 and 3312417
arm64: cputype: Add Cortex-X3 definitions
arm64: cputype: Add Cortex-A720 definitions
arm64: cputype: Add Cortex-X925 definitions
arm64: errata: Unify speculative SSBS errata logic
arm64: errata: Expand speculative SSBS workaround
arm64: cputype: Add Cortex-X1C definitions
arm64: cputype: Add Cortex-A725 definitions
arm64: errata: Expand speculative SSBS workaround (again)
Documentation/arch/arm64/silicon-errata.rst | 36 +++++++++++++++++++
arch/arm64/Kconfig | 38 +++++++++++++++++++++
arch/arm64/include/asm/barrier.h | 4 +++
arch/arm64/include/asm/cputype.h | 16 +++++++++
arch/arm64/kernel/cpu_errata.c | 31 +++++++++++++++++
arch/arm64/kernel/cpufeature.c | 12 +++++++
arch/arm64/kernel/proton-pack.c | 12 +++++++
arch/arm64/tools/cpucaps | 1 +
8 files changed, 150 insertions(+)
--
2.30.2
From: Friedrich Vock <friedrich.vock(a)gmx.de>
The special case for VM passthrough doesn't check adev->nbio.funcs
before dereferencing it. If GPUs that don't have an NBIO block are
passed through, this leads to a NULL pointer dereference on startup.
Signed-off-by: Friedrich Vock <friedrich.vock(a)gmx.de>
Fixes: 1bece222eabe ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid")
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Christian König <christian.koenig(a)amd.com>
Acked-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3558
Cc: stable(a)vger.kernel.org # 5.15.x
(cherry picked from commit 0cdb3f9740844b9d95ca413e3fcff11f81223ecf)
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5f6c32ec674d..300d3b236bb3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5531,7 +5531,7 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
adev->nbio.funcs->enable_doorbell_interrupt)
adev->nbio.funcs->enable_doorbell_interrupt(adev, true);
- if (amdgpu_passthrough(adev) &&
+ if (amdgpu_passthrough(adev) && adev->nbio.funcs &&
adev->nbio.funcs->clear_doorbell_interrupt)
adev->nbio.funcs->clear_doorbell_interrupt(adev);
--
2.46.0
Re-enumerating full-speed devices after a failed address device command
can trigger a NULL pointer dereference.
Full-speed devices may need to reconfigure the endpoint 0 Max Packet Size
value during enumeration. USB core calls usb_ep0_reinit() in this case,
which ends up calling xhci_configure_endpoint().
On Panther Point xHC, the xhci_configure_endpoint() function will
additionally check and reserve bandwidth in software. Other hosts do
this in hardware.
If xHC address device command fails then a new xhci_virt_device structure
is allocated as part of re-enabling the slot, but the bandwidth table
pointers are not set up properly here.
This triggers the NULL pointer dereference the next time usb_ep0_reinit()
is called and xhci_configure_endpoint() tries to check and reserve
bandwidth.
[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
[46710.713699] usb 3-1: Device not responding to setup address.
[46710.917684] usb 3-1: Device not responding to setup address.
[46711.125536] usb 3-1: device not accepting address 5, error -71
[46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
[46711.125600] #PF: supervisor read access in kernel mode
[46711.125603] #PF: error_code(0x0000) - not-present page
[46711.125606] PGD 0 P4D 0
[46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1
[46711.125620] Hardware name: Gigabyte Technology Co., Ltd.
[46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
[46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c
Fix this by making sure bandwidth table pointers are set up correctly
after a failed address device command, and additionally by avoiding
checking for bandwidth in cases like this where no actual endpoints are
added or removed, i.e. only context for default control endpoint 0 is
evaluated.
Reported-by: Karel Balej <balejk(a)matfyz.cz>
Closes: https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@matfyz.cz/
Cc: stable(a)vger.kernel.org
Fixes: 651aaf36a7d7 ("usb: xhci: Handle USB transaction error on address command")
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 0a8cf6c17f82..efdf4c228b8c 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2837,7 +2837,7 @@ static int xhci_configure_endpoint(struct xhci_hcd *xhci,
xhci->num_active_eps);
return -ENOMEM;
}
- if ((xhci->quirks & XHCI_SW_BW_CHECKING) &&
+ if ((xhci->quirks & XHCI_SW_BW_CHECKING) && !ctx_change &&
xhci_reserve_bandwidth(xhci, virt_dev, command->in_ctx)) {
if ((xhci->quirks & XHCI_EP_LIMIT_QUIRK))
xhci_free_host_resources(xhci, ctrl_ctx);
@@ -4200,8 +4200,10 @@ static int xhci_setup_device(struct usb_hcd *hcd, struct usb_device *udev,
mutex_unlock(&xhci->mutex);
ret = xhci_disable_slot(xhci, udev->slot_id);
xhci_free_virt_device(xhci, udev->slot_id);
- if (!ret)
- xhci_alloc_dev(hcd, udev);
+ if (!ret) {
+ if (xhci_alloc_dev(hcd, udev) == 1)
+ xhci_setup_addressable_virt_dev(xhci, udev);
+ }
kfree(command->completion);
kfree(command);
return -EPROTO;
--
2.25.1
This reverts commit 28ab9769117ca944cb6eb537af5599aa436287a4.
Sense data can be in either fixed format or descriptor format.
SAT-6 revision 1, "10.4.6 Control mode page", defines the D_SENSE bit:
"The SATL shall support this bit as defined in SPC-5 with the following
exception: if the D_SENSE bit is set to zero (i.e., fixed format sense
data), then the SATL should return fixed format sense data for ATA
PASS-THROUGH commands."
The libata SATL has always kept D_SENSE set to zero by default. (It is
however possible to change the value using a MODE SELECT SG_IO command.)
Failed ATA PASS-THROUGH commands correctly respected the D_SENSE bit;
however, successful ATA PASS-THROUGH commands incorrectly returned the
sense data in descriptor format (regardless of the D_SENSE bit).
Commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for
CK_COND=1 and no error") fixed this bug for successful ATA PASS-THROUGH
commands.
However, after commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE
bit for CK_COND=1 and no error"), there were bug reports that hdparm,
hddtemp, and udisks were no longer working as expected.
These applications incorrectly assume the returned sense data is in
descriptor format, without even looking at the RESPONSE CODE field in the
returned sense data (to see which format the returned sense data is in).
Considering that there will be broken versions of these applications around
roughly forever, we are stuck with being bug compatible with older kernels.
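A robust consumer of ATA PASS-THROUGH sense data would check the RESPONSE CODE before decoding anything else. The following is a minimal sketch, assuming the SPC fixed (70h/71h) and descriptor (72h/73h) sense layouts. `parse_sense()` and `struct sense_info` are hypothetical names, not a libata, SG_IO, or hdparm API. The sketch decodes only SENSE KEY, ASC and ASCQ; the returned ATA registers are stored differently in the two formats and are not handled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded subset of SCSI sense data (hypothetical helper type). */
struct sense_info {
	int descriptor; /* 1 = descriptor format, 0 = fixed format */
	uint8_t key;    /* SENSE KEY */
	uint8_t asc;    /* ADDITIONAL SENSE CODE */
	uint8_t ascq;   /* ADDITIONAL SENSE CODE QUALIFIER */
};

/*
 * Returns 0 on success, or -1 if the buffer is too short or the
 * RESPONSE CODE is not one of 70h-73h.
 */
static int parse_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
	if (len < 1)
		return -1;

	/* Mask off bit 7 (VALID in fixed format) to get the RESPONSE CODE. */
	switch (buf[0] & 0x7f) {
	case 0x70: /* fixed format, current error */
	case 0x71: /* fixed format, deferred error */
		if (len < 14)
			return -1;
		out->descriptor = 0;
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return 0;
	case 0x72: /* descriptor format, current error */
	case 0x73: /* descriptor format, deferred error */
		if (len < 4)
			return -1;
		out->descriptor = 1;
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return 0;
	default:
		return -1;
	}
}
```

Code that branches on the RESPONSE CODE like this keeps working whichever format the kernel returns, which is exactly what the applications named above fail to do.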
Cc: stable(a)vger.kernel.org # 4.19+
Reported-by: Stephan Eisvogel <eisvogel(a)seitics.de>
Reported-by: Christian Heusel <christian(a)heusel.eu>
Closes: https://lore.kernel.org/linux-ide/0bf3f2f0-0fc6-4ba5-a420-c0874ef82d64@heus…
Fixes: 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/ata/libata-scsi.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index d6f5e25e1ed8..473e00a58a8b 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -951,8 +951,19 @@ static void ata_gen_passthru_sense(struct ata_queued_cmd *qc)
&sense_key, &asc, &ascq);
ata_scsi_set_sense(qc->dev, cmd, sense_key, asc, ascq);
} else {
- /* ATA PASS-THROUGH INFORMATION AVAILABLE */
- ata_scsi_set_sense(qc->dev, cmd, RECOVERED_ERROR, 0, 0x1D);
+ /*
+ * ATA PASS-THROUGH INFORMATION AVAILABLE
+ *
+ * Note: we are supposed to call ata_scsi_set_sense(), which
+ * respects the D_SENSE bit, instead of unconditionally
+ * generating the sense data in descriptor format. However,
+ * because hdparm, hddtemp, and udisks incorrectly assume sense
+ * data in descriptor format, without even looking at the
+ * RESPONSE CODE field in the returned sense data (to see which
+ * format the returned sense data is in), we are stuck with
+ * being bug compatible with older kernels.
+ */
+ scsi_build_sense(cmd, 1, RECOVERED_ERROR, 0, 0x1D);
}
}
--
2.46.0
On Thu, Aug 15, 2024 at 08:22:16AM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> ext4: convert ext4_da_do_write_end() to take a folio
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> ext4-convert-ext4_da_do_write_end-to-take-a-folio.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I'd think you'd want to backport 83f4414b8f84 to kernels from before folios
existed, so you may as well use the same patch for 6.6 as you'd use for
those older kernels, i.e.:
if (unlikely(!page_buffers(page))) {
unlock_page(page);
put_page(page);
return -EIO;
}
but maybe this has already been discussed and I wasn't cc'd on that
discussion.
The main fix is a possible memory leak on an early exit in the
for_each_child_of_node() loop. That fix has been divided into a patch
that can be backported (a simple of_node_put()), and another one that
uses the scoped variant of the macro, removing the need for any
of_node_put(). That prevents mistakes if new break/return instructions
are added, but the macro might not be available in older kernels.
While at it, an unused header has been dropped.
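The leak and both fixes can be reproduced in a small user-space model. This is a sketch under stated assumptions: `struct toy_node`, `toy_next_child()` and the `toy_for_each_child*()` macros are hypothetical stand-ins for struct device_node, of_get_next_child() and the for_each_child_of_node() family, and the scoped variant relies on the GCC/Clang `cleanup` variable attribute, which the kernel's scoped helpers also build on via <linux/cleanup.h>.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device_node: only a refcount. */
struct toy_node {
	int refcount;
};

#define NCHILD 3
static struct toy_node children[NCHILD];

static struct toy_node *toy_get(struct toy_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void toy_put(struct toy_node *n)
{
	if (n)
		n->refcount--;
}

/* Like of_get_next_child(): returns the next child with a reference
 * held, and drops the reference on @prev. */
static struct toy_node *toy_next_child(struct toy_node *prev)
{
	size_t idx = prev ? (size_t)(prev - children) + 1 : 0;
	struct toy_node *next = idx < NCHILD ? toy_get(&children[idx]) : NULL;

	toy_put(prev);
	return next;
}

#define toy_for_each_child(c) \
	for ((c) = toy_next_child(NULL); (c); (c) = toy_next_child(c))

/* Cleanup handler run when a scoped iterator goes out of scope. */
static void toy_put_cleanup(struct toy_node **np)
{
	toy_put(*np);
}

/* Scoped variant: the iterator lives inside the for statement and the
 * cleanup attribute drops its reference on every exit path. */
#define toy_for_each_child_scoped(c) \
	for (struct toy_node *c __attribute__((cleanup(toy_put_cleanup))) = \
		     toy_next_child(NULL); \
	     c; c = toy_next_child(c))

/* Buggy: an early return keeps the reference taken by the iterator. */
static int find_leaky(size_t want)
{
	struct toy_node *c;

	toy_for_each_child(c) {
		if ((size_t)(c - children) == want)
			return 1; /* reference on c is leaked */
	}
	return 0;
}

/* Backportable fix: drop the reference by hand before leaving early. */
static int find_fixed(size_t want)
{
	struct toy_node *c;

	toy_for_each_child(c) {
		if ((size_t)(c - children) == want) {
			toy_put(c);
			return 1;
		}
	}
	return 0;
}

/* Scoped fix: no explicit put is needed on the early return. */
static int find_scoped(size_t want)
{
	toy_for_each_child_scoped(c) {
		if ((size_t)(c - children) == want)
			return 1;
	}
	return 0;
}
```

The manual put is what makes the first fix easy to backport; the scoped form removes the need to remember it when new break/return statements are added later.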
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (3):
drm/mediatek: ovl_adaptor: drop unused mtk_crtc.h header
drm/mediatek: ovl_adaptor: add missing of_node_put()
drm/mediatek: ovl_adaptor: use scoped variant of for_each_child_of_node()
drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
---
base-commit: f76698bd9a8ca01d3581236082d786e9a6b72bb7
change-id: 20240624-mtk_disp_ovl_adaptor_scoped-0702a6b23443
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: d33d26036a0274b472299d7dcdaa5fb34329f91b
Gitweb: https://git.kernel.org/tip/d33d26036a0274b472299d7dcdaa5fb34329f91b
Author: Roland Xu <mu001999(a)outlook.com>
AuthorDate: Thu, 15 Aug 2024 10:58:13 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 15 Aug 2024 15:38:53 +02:00
rtmutex: Drop rt_mutex::wait_lock before scheduling
rt_mutex_handle_deadlock() is called with rt_mutex::wait_lock held. In the
good case it returns with the lock held and in the deadlock case it emits a
warning and goes into an endless scheduling loop with the lock held, which
triggers the 'scheduling in atomic' warning.
Unlock rt_mutex::wait_lock in the deadlock case before issuing the warning
and dropping into the schedule-forever loop.
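The failure mode can be illustrated with a user-space toy model. This is a sketch, not kernel code: a counter stands in for the preemption-disabled state of a held raw spinlock, and a fake schedule() counts how often it is entered while that counter is non-zero. All names are hypothetical, and a bounded loop replaces the real `while (1)` so that the model terminates.

```c
#include <stdbool.h>

/* Toy "raw spinlock": holding it puts us in atomic context. */
struct toy_lock {
	bool held;
};

static int atomic_depth;        /* > 0 means atomic context */
static int sched_atomic_splats; /* times we "scheduled while atomic" */

static void toy_lock_acquire(struct toy_lock *l)
{
	l->held = true;
	atomic_depth++;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
	atomic_depth--;
}

/* Stand-in for schedule(): record a splat if called in atomic context. */
static void toy_schedule(void)
{
	if (atomic_depth > 0)
		sched_atomic_splats++;
}

/* Before the fix: park with wait_lock still held. */
static void handle_deadlock_before(struct toy_lock *wait_lock, int spins)
{
	(void)wait_lock;
	for (int i = 0; i < spins; i++)
		toy_schedule();
}

/* After the fix: drop wait_lock first, then warn and park. */
static void handle_deadlock_after(struct toy_lock *wait_lock, int spins)
{
	toy_lock_release(wait_lock);
	/* WARN(1, "rtmutex deadlock detected\n") would be emitted here. */
	for (int i = 0; i < spins; i++)
		toy_schedule();
}
```

In the model every iteration of the old loop produces a "scheduling while atomic" event, while the new ordering produces none, matching the intent of moving the unlock before the WARN().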
[ tglx: Moved unlock before the WARN(), removed the pointless comment,
massaged changelog, added Fixes tag ]
Fixes: 3d5c9340d194 ("rtmutex: Handle deadlock detection smarter")
Signed-off-by: Roland Xu <mu001999(a)outlook.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/ME0P300MB063599BEF0743B8FA339C2CECC802@ME0P300M…
---
kernel/locking/rtmutex.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 88d08ee..fba1229 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1644,6 +1644,7 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
}
static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
+ struct rt_mutex_base *lock,
struct rt_mutex_waiter *w)
{
/*
@@ -1656,10 +1657,10 @@ static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
if (build_ww_mutex() && w->ww_ctx)
return;
- /*
- * Yell loudly and stop the task right here.
- */
+ raw_spin_unlock_irq(&lock->wait_lock);
+
WARN(1, "rtmutex deadlock detected\n");
+
while (1) {
set_current_state(TASK_INTERRUPTIBLE);
rt_mutex_schedule();
@@ -1713,7 +1714,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
} else {
__set_current_state(TASK_RUNNING);
remove_waiter(lock, waiter);
- rt_mutex_handle_deadlock(ret, chwalk, waiter);
+ rt_mutex_handle_deadlock(ret, chwalk, lock, waiter);
}
/*
Hi Greg,
Please consider these two patches for the 6.6 kernel. These patches are
unmodified versions of the corresponding upstream commits.
Thank you,
Bart.
David Stevens (1):
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
Dongli Zhang (1):
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
kernel/irq/cpuhotplug.c | 27 ++++++++++++++++++++++++---
kernel/irq/manage.c | 12 ++++++++----
2 files changed, 32 insertions(+), 7 deletions(-)
Hi stable folks,
I noticed that these two KVM/arm64 pgtable fixes are missing from 6.6.y
so I've done the backports. The second one is also needed in 6.1.y but
it needs some tweaks so I'll post a separate backport for that.
Cheers,
Will
Cc: Marc Zyngier <maz(a)kernel.org>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
cc: kvmarm(a)lists.linux.dev
--->8
Will Deacon (2):
KVM: arm64: Don't defer TLB invalidation when zapping table entries
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
arch/arm64/kvm/hyp/pgtable.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--
2.46.0.184.g6999bdac58-goog
A USB hub is not an HCD but a USB device. Fix the referenced schema
accordingly.
Fixes: bfbf2e4b77e2 ("dt-bindings: usb: Document the Microchip USB2514 hub")
Cc: stable(a)vger.kernel.org
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Signed-off-by: Alexander Stein <alexander.stein(a)ew.tq-group.com>
---
As this USB hub can also contain a USB (Ethernet) subdevice, I copied
the subdevice part from usb-hcd.yaml.
I also had to add 'additionalProperties: true', because otherwise
dt_binding_check emits this warning:
> Documentation/devicetree/bindings/usb/microchip,usb2514.yaml:
> ^.*@[0-9a-f]{1,2}$: Missing additionalProperties/unevaluatedProperties constraint
I added a Fixes tag to keep this schema aligned in the v6.10 stable tree.
Changes in v2:
* Do not update the example
* Adjust commit message accordingly
* Add Cc for stable
* Collected Krzysztof's R-b
* Shorten the SHA1 of the Fixes tag
.../devicetree/bindings/usb/microchip,usb2514.yaml | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/usb/microchip,usb2514.yaml b/Documentation/devicetree/bindings/usb/microchip,usb2514.yaml
index 245e8c3ce6699..b14e6f37b2987 100644
--- a/Documentation/devicetree/bindings/usb/microchip,usb2514.yaml
+++ b/Documentation/devicetree/bindings/usb/microchip,usb2514.yaml
@@ -10,7 +10,7 @@ maintainers:
- Fabio Estevam <festevam(a)gmail.com>
allOf:
- - $ref: usb-hcd.yaml#
+ - $ref: usb-device.yaml#
properties:
compatible:
@@ -36,6 +36,13 @@ required:
- compatible
- reg
+patternProperties:
+ "^.*@[0-9a-f]{1,2}$":
+ description: The hard wired USB devices
+ type: object
+ $ref: /schemas/usb/usb-device.yaml
+ additionalProperties: true
+
unevaluatedProperties: false
examples:
--
2.34.1
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
Changes in v2:
- v1 was sent by mistake; b4 mixed it up with QEMU again
- Link to v1: https://lore.kernel.org/r/20240621-loongson3-ipi-follow-v1-0-c6e73f2b2844@f…
---
Jiaxun Yang (3):
hw/mips/loongson3_virt: Store core_iocsr into LoongsonMachineState
hw/mips/loongson3_virt: Fix condition of IPI IOCSR connection
linux-user/mips64: Use MIPS64R2-generic as default CPU type
hw/mips/loongson3_virt.c | 5 ++++-
linux-user/mips64/target_elf.h | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)
---
base-commit: 02d9c38236cf8c9826e5c5be61780c4444cb4ae0
change-id: 20240621-loongson3-ipi-follow-1f4919621882
Best regards,
--
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8bdd9ef7e9b1b2a73e394712b72b22055e0e26c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081218-quotation-thud-f8b0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8bdd9ef7e9b1 ("drm/i915/gem: Fix Virtual Memory mapping boundaries calculation")
8e4ee5e87ce6 ("drm/i915: Wrap all access to i915_vma.node.start|size")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8bdd9ef7e9b1b2a73e394712b72b22055e0e26c3 Mon Sep 17 00:00:00 2001
From: Andi Shyti <andi.shyti(a)linux.intel.com>
Date: Fri, 2 Aug 2024 10:38:50 +0200
Subject: [PATCH] drm/i915/gem: Fix Virtual Memory mapping boundaries
calculation
Calculating the size of the mapped area as the lesser of the
requested size and the actual size does not take the partial
mapping offset into account, so the computed range can run past
the end of the VMA and cause invalid page accesses.
Fix the calculation of the starting and ending addresses; the
total size is now derived from the difference between the end and
start addresses.
Additionally, the calculations have been rewritten in a clearer
form.
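The boundary arithmetic can be checked with concrete numbers in a small user-space model. `new_limits()` follows the computation in set_address_limits() from the patch, and `old_limits()` follows the removed remap_io_mapping() arguments. Both function names, the 4 KiB page size, and the example values are assumptions for illustration. With a 16-page VMA, an 8-page partial view, and a partial offset of 12 pages, the old range ends 4 pages past vm_end, while the new one is clamped to it.

```c
#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

struct vaddr_range {
	unsigned long start;
	unsigned long end;
};

/* Mirrors set_address_limits(): obj_offset and partial_offset are in
 * pages, everything else is in bytes. */
static struct vaddr_range new_limits(unsigned long vm_start, unsigned long vm_end,
				     unsigned long vma_size, unsigned long obj_offset,
				     unsigned long partial_offset)
{
	long lo = (long)(vm_start >> MODEL_PAGE_SHIFT);
	long hi = (long)(vm_end >> MODEL_PAGE_SHIFT);
	long start = lo - (long)obj_offset + (long)partial_offset;
	long end = start + (long)(vma_size >> MODEL_PAGE_SHIFT);

	/* Clamp to the user's VMA. */
	if (start < lo)
		start = lo;
	if (end > hi)
		end = hi;

	return (struct vaddr_range){
		(unsigned long)start << MODEL_PAGE_SHIFT,
		(unsigned long)end << MODEL_PAGE_SHIFT,
	};
}

/* The removed computation: start at vm_start plus the partial offset and
 * map min(vma size, area size) bytes from there, with no clamp at vm_end. */
static struct vaddr_range old_limits(unsigned long vm_start, unsigned long vm_end,
				     unsigned long vma_size, unsigned long partial_offset)
{
	unsigned long area = vm_end - vm_start;
	unsigned long start = vm_start + (partial_offset << MODEL_PAGE_SHIFT);
	unsigned long size = vma_size < area ? vma_size : area;

	return (struct vaddr_range){ start, start + size };
}
```

Deriving the size as `end - start` after clamping is what guarantees the remapped range never extends past the VMA.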
Fixes: c58305af1835 ("drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass")
Reported-by: Jann Horn <jannh(a)google.com>
Co-developed-by: Chris Wilson <chris.p.wilson(a)linux.intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson(a)linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v4.9+
Reviewed-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Jonathan Cavitt <Jonathan.cavitt(a)intel.com>
[Joonas: Add Requires: tag]
Requires: 60a2066c5005 ("drm/i915/gem: Adjust vma offset for framebuffer mmap offset")
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240802083850.103694-3-andi.…
(cherry picked from commit 97b6784753da06d9d40232328efc5c5367e53417)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index ce10dd259812..cac6d4184506 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -290,6 +290,41 @@ static vm_fault_t vm_fault_cpu(struct vm_fault *vmf)
return i915_error_to_vmf_fault(err);
}
+static void set_address_limits(struct vm_area_struct *area,
+ struct i915_vma *vma,
+ unsigned long obj_offset,
+ unsigned long *start_vaddr,
+ unsigned long *end_vaddr)
+{
+ unsigned long vm_start, vm_end, vma_size; /* user's memory parameters */
+ long start, end; /* memory boundaries */
+
+ /*
+ * Let's move into the ">> PAGE_SHIFT"
+ * domain to be sure not to lose bits
+ */
+ vm_start = area->vm_start >> PAGE_SHIFT;
+ vm_end = area->vm_end >> PAGE_SHIFT;
+ vma_size = vma->size >> PAGE_SHIFT;
+
+ /*
+ * Calculate the memory boundaries by considering the offset
+ * provided by the user during memory mapping and the offset
+ * provided for the partial mapping.
+ */
+ start = vm_start;
+ start -= obj_offset;
+ start += vma->gtt_view.partial.offset;
+ end = start + vma_size;
+
+ start = max_t(long, start, vm_start);
+ end = min_t(long, end, vm_end);
+
+ /* Let's move back into the "<< PAGE_SHIFT" domain */
+ *start_vaddr = (unsigned long)start << PAGE_SHIFT;
+ *end_vaddr = (unsigned long)end << PAGE_SHIFT;
+}
+
static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
{
#define MIN_CHUNK_PAGES (SZ_1M >> PAGE_SHIFT)
@@ -302,14 +337,18 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
struct i915_ggtt *ggtt = to_gt(i915)->ggtt;
bool write = area->vm_flags & VM_WRITE;
struct i915_gem_ww_ctx ww;
+ unsigned long obj_offset;
+ unsigned long start, end; /* memory boundaries */
intel_wakeref_t wakeref;
struct i915_vma *vma;
pgoff_t page_offset;
+ unsigned long pfn;
int srcu;
int ret;
- /* We don't use vmf->pgoff since that has the fake offset */
+ obj_offset = area->vm_pgoff - drm_vma_node_start(&mmo->vma_node);
page_offset = (vmf->address - area->vm_start) >> PAGE_SHIFT;
+ page_offset += obj_offset;
trace_i915_gem_object_fault(obj, page_offset, true, write);
@@ -402,12 +441,14 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
if (ret)
goto err_unpin;
+ set_address_limits(area, vma, obj_offset, &start, &end);
+
+ pfn = (ggtt->gmadr.start + i915_ggtt_offset(vma)) >> PAGE_SHIFT;
+ pfn += (start - area->vm_start) >> PAGE_SHIFT;
+ pfn += obj_offset - vma->gtt_view.partial.offset;
+
/* Finally, remap it using the new GTT offset */
- ret = remap_io_mapping(area,
- area->vm_start + (vma->gtt_view.partial.offset << PAGE_SHIFT),
- (ggtt->gmadr.start + i915_ggtt_offset(vma)) >> PAGE_SHIFT,
- min_t(u64, vma->size, area->vm_end - area->vm_start),
- &ggtt->iomap);
+ ret = remap_io_mapping(area, start, pfn, end - start, &ggtt->iomap);
if (ret)
goto err_fence;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d67c5649c1541dc93f202eeffc6f49220a4ed71d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081208-motion-jubilant-6af5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d67c5649c154 ("mptcp: fully established after ADD_ADDR echo on MPJ")
b3ea6b272d79 ("mptcp: consolidate initial ack seq generation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d67c5649c1541dc93f202eeffc6f49220a4ed71d Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 31 Jul 2024 13:05:53 +0200
Subject: [PATCH] mptcp: fully established after ADD_ADDR echo on MPJ
Before this patch, receiving an ADD_ADDR echo on the just-connected
MP_JOIN subflow (initiator side, after the MP_JOIN 3WHS) resulted in
an MP_RESET. That's because only ACKs with a DSS, or ADD_ADDRs without
the echo bit, were allowed.
Not allowing the ADD_ADDR echo after an MP_CAPABLE 3WHS makes sense, as
we are not supposed to send an ADD_ADDR before that point: doing so
requires being in fully established mode first. For the MP_JOIN 3WHS,
it is different: the ADD_ADDR can be sent on a previous subflow, and
the ADD_ADDR echo can be received on the newly created one. The other
peer will already be fully established, so it is allowed to send it.
We can therefore relax the conditions here to accept the ADD_ADDR echo
for MPJ subflows.
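The relaxed condition in check_fully_established() can be expressed as a pure boolean predicate. The following sketch uses a hypothetical flattened `struct rx_opts` in place of the real received-options structure; only the predicate itself mirrors the patch.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the options tested by the patch. */
struct rx_opts {
	bool dss;      /* OPTION_MPTCP_DSS present */
	bool use_ack;  /* the DSS carries a data ACK */
	bool add_addr; /* OPTION_MPTCP_ADD_ADDR present */
	bool echo;     /* the ADD_ADDR echo bit is set */
};

/* Before: an ADD_ADDR echo never counts, even on an MP_JOIN subflow. */
static bool fully_established_before(bool remote_key_valid, struct rx_opts o)
{
	return remote_key_valid &&
	       ((o.dss && o.use_ack) || (o.add_addr && !o.echo));
}

/* After: an ADD_ADDR echo counts on MP_JOIN subflows only. */
static bool fully_established_after(bool remote_key_valid, bool mp_join,
				    struct rx_opts o)
{
	return remote_key_valid &&
	       ((o.dss && o.use_ack) || (o.add_addr && (!o.echo || mp_join)));
}
```

An echo received after an MP_CAPABLE handshake (`mp_join == false`) is still rejected, so only the MP_JOIN case changes.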
Fixes: 67b12f792d5e ("mptcp: full fully established support after ADD_ADDR")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a68382a4fe9..ac2f1a54cc43 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -958,7 +958,8 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
if (subflow->remote_key_valid &&
(((mp_opt->suboptions & OPTION_MPTCP_DSS) && mp_opt->use_ack) ||
- ((mp_opt->suboptions & OPTION_MPTCP_ADD_ADDR) && !mp_opt->echo))) {
+ ((mp_opt->suboptions & OPTION_MPTCP_ADD_ADDR) &&
+ (!mp_opt->echo || subflow->mp_join)))) {
/* subflows are fully established as soon as we get any
* additional ack, including ADD_ADDR.
*/
From: Simon Trimmer <simont(a)opensource.cirrus.com>
[ Upstream commit 72776774b55bb59b7b1b09117e915a5030110304 ]
Please apply to 6.10.
The upstream patch should have had a Fixes: tag but it was missing.
Device tuning files made with early revision tooling may contain
configuration that can unmask IRQ signals that are owned by the host.
Adding a safe default to the regmap patch ensures that the hardware
matches the driver expectations.
Signed-off-by: Simon Trimmer <simont(a)opensource.cirrus.com>
Link: https://patch.msgid.link/20240807142648.46932-1-simont@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
---
sound/soc/codecs/cs35l56-shared.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/soc/codecs/cs35l56-shared.c b/sound/soc/codecs/cs35l56-shared.c
index f609cade805d..58b213722e4e 100644
--- a/sound/soc/codecs/cs35l56-shared.c
+++ b/sound/soc/codecs/cs35l56-shared.c
@@ -24,6 +24,7 @@ static const struct reg_sequence cs35l56_patch[] = {
{ CS35L56_SWIRE_DP3_CH2_INPUT, 0x00000019 },
{ CS35L56_SWIRE_DP3_CH3_INPUT, 0x00000029 },
{ CS35L56_SWIRE_DP3_CH4_INPUT, 0x00000028 },
+ { CS35L56_IRQ1_MASK_18, 0x1f7df0ff },
/* These are not reset by a soft-reset, so patch to defaults. */
{ CS35L56_MAIN_RENDER_USER_MUTE, 0x00000000 },
--
2.39.2
commit ab091ec536cb7b271983c0c063b17f62f3591583 upstream
There is a hardware power-saving problem with the Lenovo N60z
board. When the system is turned on and left idle for 10 hours,
there is a 20% chance that the NVMe disk will not wake up until
reboot.
Link: https://lore.kernel.org/all/2B5581C46AC6E335+9c7a81f1-05fb-4fd0-9fbb-108757…
Signed-off-by: hmy <huanglin(a)uniontech.com>
Signed-off-by: Wentao Guan <guanwentao(a)uniontech.com>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
---
drivers/nvme/host/pci.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5a3ba7e39054..d73b8eb76b8f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2968,6 +2968,13 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
return NVME_QUIRK_FORCE_NO_SIMPLE_SUSPEND;
}
+ /*
+ * NVMe SSD drops off the PCIe bus after system idle
+ * for 10 hours on a Lenovo N60z board.
+ */
+ if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
+ return NVME_QUIRK_NO_APST;
+
return 0;
}
--
2.43.4
From: Shyjumon N <shyjumon.n(a)intel.com>
commit 1fae37accfc5872af3905d4ba71dc6ab15829be7 upstream
The Samsung SSD SM981/PM981 and Toshiba SSD KBG40ZNT256G on the Lenovo
C640 platform experience runtime resume issues when the SSDs are kept in
sleep/suspend mode for a long time.
This patch applies the 'Simple Suspend' quirk to these configurations.
With this patch, the issue has not been observed in a test lasting more
than one day.
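The quirk is keyed on both the PCI vendor/device ID of the SSD and the DMI board identity. Below is a sketch of that matching logic. The names `struct pci_id`, `model_quirks()` and `MODEL_QUIRK_SIMPLE_SUSPEND` are hypothetical, and plain strcmp() stands in for dmi_match(). The IDs and board strings are copied from the patch.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_QUIRK_SIMPLE_SUSPEND (1UL << 0) /* hypothetical bit value */

struct pci_id {
	unsigned short vendor;
	unsigned short device;
};

/* SSD IDs as listed in the patch. */
static bool is_affected_ssd(struct pci_id id)
{
	return (id.vendor == 0x144d &&
		(id.device == 0xa801 || id.device == 0xa808 || id.device == 0xa809)) ||
	       (id.vendor == 0x1e0f && id.device == 0x0001);
}

static unsigned long model_quirks(struct pci_id id, const char *board_vendor,
				  const char *board_name)
{
	/* Both the SSD and the board must match before the quirk applies. */
	if (is_affected_ssd(id) &&
	    strcmp(board_vendor, "LENOVO") == 0 &&
	    strcmp(board_name, "LNVNB161216") == 0)
		return MODEL_QUIRK_SIMPLE_SUSPEND;
	return 0;
}
```

Requiring both matches keeps the quirk from affecting the same SSDs on other boards, where the resume problem was not reported.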
Reviewed-by: Jon Derrick <jonathan.derrick(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Shyjumon N <shyjumon.n(a)intel.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Erpeng Xu <xuerpeng(a)uniontech.com>
---
drivers/nvme/host/pci.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9c80f9f08149..b0434b687b17 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2747,6 +2747,18 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
(dmi_match(DMI_BOARD_NAME, "PRIME B350M-A") ||
dmi_match(DMI_BOARD_NAME, "PRIME Z370-A")))
return NVME_QUIRK_NO_APST;
+ } else if ((pdev->vendor == 0x144d && (pdev->device == 0xa801 ||
+ pdev->device == 0xa808 || pdev->device == 0xa809)) ||
+ (pdev->vendor == 0x1e0f && pdev->device == 0x0001)) {
+ /*
+ * Forcing to use host managed nvme power settings for
+ * lowest idle power with quick resume latency on
+ * Samsung and Toshiba SSDs based on suspend behavior
+ * on Coffee Lake board for LENOVO C640
+ */
+ if ((dmi_match(DMI_BOARD_VENDOR, "LENOVO")) &&
+ dmi_match(DMI_BOARD_NAME, "LNVNB161216"))
+ return NVME_QUIRK_SIMPLE_SUSPEND;
}
return 0;
--
2.45.2
commit f442fa6141379a20b48ae3efabee827a3d260787 upstream
A kernel warning was reported when pinning a folio in CMA memory while
launching an SEV virtual machine. The splat looks like:
[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[ 464.325515] Call Trace:
[ 464.325520] <TASK>
[ 464.325523] ? __get_user_pages+0x423/0x520
[ 464.325528] ? __warn+0x81/0x130
[ 464.325536] ? __get_user_pages+0x423/0x520
[ 464.325541] ? report_bug+0x171/0x1a0
[ 464.325549] ? handle_bug+0x3c/0x70
[ 464.325554] ? exc_invalid_op+0x17/0x70
[ 464.325558] ? asm_exc_invalid_op+0x1a/0x20
[ 464.325567] ? __get_user_pages+0x423/0x520
[ 464.325575] __gup_longterm_locked+0x212/0x7a0
[ 464.325583] internal_get_user_pages_fast+0xfb/0x190
[ 464.325590] pin_user_pages_fast+0x47/0x60
[ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd]
[ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
Per the analysis done by yangge, starting the SEV virtual machine
calls pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory.
Because the page is in the CMA area, fast GUP fails the longterm
pinnable check in try_grab_folio() and falls back to the slow path.
The slow path tries to pin the pages and then migrate them out of the
CMA area. But the slow path also uses try_grab_folio() to pin the page,
so it fails the same check and the above warning is triggered.
In addition, try_grab_folio() is meant for the fast path: it elevates
the folio refcount with an add-ref-unless-zero operation. In the slow
path we are guaranteed to hold at least one stable reference, so a
simple atomic add suffices. The performance difference should be
trivial, but the misuse is confusing and misleading.
Rename try_grab_folio() to try_grab_folio_fast() and try_grab_page()
to try_grab_folio(), and use each in the proper path. This fixes both
the misuse and the kernel warning.
The clearer naming makes each function's use case explicit and should
prevent misuse in the future.
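The difference between the two helpers can be sketched with C11 atomics. In the fast path no stable reference is held, so the refcount may concurrently drop to zero and an add-unless-zero compare-and-swap loop is needed. In the slow path the caller already holds a reference, so a plain atomic add is enough. `struct model_folio` and the function names are hypothetical. `PIN_BIAS` mirrors the kernel's GUP_PIN_COUNTING_BIAS (1U << 10), and the pin accounting follows the new try_grab_folio() in the diff below.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PIN_BIAS 1024 /* mirrors GUP_PIN_COUNTING_BIAS (1U << 10) */

struct model_folio {
	atomic_int refcount;
	atomic_int pincount; /* only used for large folios */
	bool large;
};

/* Fast-path style: add @refs only if the refcount is still non-zero. */
static bool ref_add_unless_zero(struct model_folio *f, int refs)
{
	int old = atomic_load(&f->refcount);

	do {
		if (old == 0)
			return false; /* folio is being freed; caller must back off */
	} while (!atomic_compare_exchange_weak(&f->refcount, &old, old + refs));
	return true;
}

/* Slow-path style pin: the caller holds a stable reference, so the count
 * cannot reach zero underneath us and plain atomic adds are sufficient. */
static int pin_with_stable_ref(struct model_folio *f, int refs)
{
	if (atomic_load(&f->refcount) <= 0)
		return -1; /* the real code warns and returns -ENOMEM here */

	if (f->large) {
		atomic_fetch_add(&f->refcount, refs);
		atomic_fetch_add(&f->pincount, refs);
	} else {
		atomic_fetch_add(&f->refcount, refs * PIN_BIAS);
	}
	return 0;
}
```

The compare-and-swap loop is only needed when the refcount can drop to zero concurrently. Using it where a stable reference already exists is harmless for correctness, but it obscures which path the caller is on; the renaming in this patch makes that distinction visible.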
peterx said:
: The user will see the pin fails, for gup-slow it further triggers the WARN
: right below that failure (as in the original report):
:
: folio = try_grab_folio(page, page_increm - 1,
: foll_flags);
: if (WARN_ON_ONCE(!folio)) { <------------------------ here
: /*
: * Release the 1st page ref if the
: * folio is problematic, fail hard.
: */
: gup_put_folio(page_folio(page), 1,
: foll_flags);
: ret = -EFAULT;
: goto out;
: }
[1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge11…
[shy828301(a)gmail.com: fix implicit declaration of function try_grab_folio_fast]
Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xj…
Link: https://lkml.kernel.org/r/20240628191458.2605553-1-yang@os.amperecomputing.…
Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Reported-by: yangge <yangge1116(a)126.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 251 ++++++++++++++++++++++++-----------------------
mm/huge_memory.c | 6 +-
mm/hugetlb.c | 2 +-
mm/internal.h | 4 +-
4 files changed, 135 insertions(+), 128 deletions(-)
v2: Fixed a build failure
diff --git a/mm/gup.c b/mm/gup.c
index f50fe2219a13..fdd75384160d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -97,95 +97,6 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
return folio;
}
-/**
- * try_grab_folio() - Attempt to get or pin a folio.
- * @page: pointer to page to be grabbed
- * @refs: the value to (effectively) add to the folio's refcount
- * @flags: gup flags: these are the FOLL_* flag values.
- *
- * "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
- *
- * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
- * same time. (That's true throughout the get_user_pages*() and
- * pin_user_pages*() APIs.) Cases:
- *
- * FOLL_GET: folio's refcount will be incremented by @refs.
- *
- * FOLL_PIN on large folios: folio's refcount will be incremented by
- * @refs, and its pincount will be incremented by @refs.
- *
- * FOLL_PIN on single-page folios: folio's refcount will be incremented by
- * @refs * GUP_PIN_COUNTING_BIAS.
- *
- * Return: The folio containing @page (with refcount appropriately
- * incremented) for success, or NULL upon failure. If neither FOLL_GET
- * nor FOLL_PIN was set, that's considered failure, and furthermore,
- * a likely bug in the caller, so a warning is also emitted.
- */
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
-{
- struct folio *folio;
-
- if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
- return NULL;
-
- if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
- return NULL;
-
- if (flags & FOLL_GET)
- return try_get_folio(page, refs);
-
- /* FOLL_PIN is set */
-
- /*
- * Don't take a pin on the zero page - it's not going anywhere
- * and it is used in a *lot* of places.
- */
- if (is_zero_page(page))
- return page_folio(page);
-
- folio = try_get_folio(page, refs);
- if (!folio)
- return NULL;
-
- /*
- * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
- * right zone, so fail and let the caller fall back to the slow
- * path.
- */
- if (unlikely((flags & FOLL_LONGTERM) &&
- !folio_is_longterm_pinnable(folio))) {
- if (!put_devmap_managed_page_refs(&folio->page, refs))
- folio_put_refs(folio, refs);
- return NULL;
- }
-
- /*
- * When pinning a large folio, use an exact count to track it.
- *
- * However, be sure to *also* increment the normal folio
- * refcount field at least once, so that the folio really
- * is pinned. That's why the refcount from the earlier
- * try_get_folio() is left intact.
- */
- if (folio_test_large(folio))
- atomic_add(refs, &folio->_pincount);
- else
- folio_ref_add(folio,
- refs * (GUP_PIN_COUNTING_BIAS - 1));
- /*
- * Adjust the pincount before re-checking the PTE for changes.
- * This is essentially a smp_mb() and is paired with a memory
- * barrier in page_try_share_anon_rmap().
- */
- smp_mb__after_atomic();
-
- node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
-
- return folio;
-}
-
static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
{
if (flags & FOLL_PIN) {
@@ -203,58 +114,59 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
}
/**
- * try_grab_page() - elevate a page's refcount by a flag-dependent amount
- * @page: pointer to page to be grabbed
- * @flags: gup flags: these are the FOLL_* flag values.
+ * try_grab_folio() - add a folio's refcount by a flag-dependent amount
+ * @folio: pointer to folio to be grabbed
+ * @refs: the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values
*
* This might not do anything at all, depending on the flags argument.
*
* "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount.
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
*
* Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
- * time. Cases: please see the try_grab_folio() documentation, with
- * "refs=1".
+ * time.
*
* Return: 0 for success, or if no action was required (if neither FOLL_PIN
* nor FOLL_GET was set, nothing is done). A negative error code for failure:
*
- * -ENOMEM FOLL_GET or FOLL_PIN was set, but the page could not
+ * -ENOMEM FOLL_GET or FOLL_PIN was set, but the folio could not
* be grabbed.
+ *
+ * It is called when we have a stable reference for the folio, typically in
+ * GUP slow path.
*/
-int __must_check try_grab_page(struct page *page, unsigned int flags)
+int __must_check try_grab_folio(struct folio *folio, int refs,
+ unsigned int flags)
{
- struct folio *folio = page_folio(page);
-
if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
return -ENOMEM;
- if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+ if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(&folio->page)))
return -EREMOTEIO;
if (flags & FOLL_GET)
- folio_ref_inc(folio);
+ folio_ref_add(folio, refs);
else if (flags & FOLL_PIN) {
/*
* Don't take a pin on the zero page - it's not going anywhere
* and it is used in a *lot* of places.
*/
- if (is_zero_page(page))
+ if (is_zero_folio(folio))
return 0;
/*
- * Similar to try_grab_folio(): be sure to *also*
- * increment the normal page refcount field at least once,
+ * Increment the normal page refcount field at least once,
* so that the page really is pinned.
*/
if (folio_test_large(folio)) {
- folio_ref_add(folio, 1);
- atomic_add(1, &folio->_pincount);
+ folio_ref_add(folio, refs);
+ atomic_add(refs, &folio->_pincount);
} else {
- folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+ folio_ref_add(folio, refs * GUP_PIN_COUNTING_BIAS);
}
- node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
}
return 0;
@@ -647,8 +559,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
!PageAnonExclusive(page), page);
- /* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
- ret = try_grab_page(page, flags);
+ /* try_grab_folio() does nothing unless FOLL_GET or FOLL_PIN is set. */
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (unlikely(ret)) {
page = ERR_PTR(ret);
goto out;
@@ -899,7 +811,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
goto unmap;
*page = pte_page(entry);
}
- ret = try_grab_page(*page, gup_flags);
+ ret = try_grab_folio(page_folio(*page), 1, gup_flags);
if (unlikely(ret))
goto unmap;
out:
@@ -1302,20 +1214,19 @@ static long __get_user_pages(struct mm_struct *mm,
* pages.
*/
if (page_increm > 1) {
- struct folio *folio;
+ struct folio *folio = page_folio(page);
/*
* Since we already hold refcount on the
* large folio, this should never fail.
*/
- folio = try_grab_folio(page, page_increm - 1,
- foll_flags);
- if (WARN_ON_ONCE(!folio)) {
+ if (try_grab_folio(folio, page_increm - 1,
+ foll_flags)) {
/*
* Release the 1st page ref if the
* folio is problematic, fail hard.
*/
- gup_put_folio(page_folio(page), 1,
+ gup_put_folio(folio, 1,
foll_flags);
ret = -EFAULT;
goto out;
@@ -2541,6 +2452,102 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
}
}
+/**
+ * try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
+ * @page: pointer to page to be grabbed
+ * @refs: the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values.
+ *
+ * "grab" names in this file mean, "look at flags to decide whether to use
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
+ *
+ * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
+ * same time. (That's true throughout the get_user_pages*() and
+ * pin_user_pages*() APIs.) Cases:
+ *
+ * FOLL_GET: folio's refcount will be incremented by @refs.
+ *
+ * FOLL_PIN on large folios: folio's refcount will be incremented by
+ * @refs, and its pincount will be incremented by @refs.
+ *
+ * FOLL_PIN on single-page folios: folio's refcount will be incremented by
+ * @refs * GUP_PIN_COUNTING_BIAS.
+ *
+ * Return: The folio containing @page (with refcount appropriately
+ * incremented) for success, or NULL upon failure. If neither FOLL_GET
+ * nor FOLL_PIN was set, that's considered failure, and furthermore,
+ * a likely bug in the caller, so a warning is also emitted.
+ *
+ * It uses add ref unless zero to elevate the folio refcount and must be called
+ * in fast path only.
+ */
+static struct folio *try_grab_folio_fast(struct page *page, int refs,
+ unsigned int flags)
+{
+ struct folio *folio;
+
+ /* Raise warn if it is not called in fast GUP */
+ VM_WARN_ON_ONCE(!irqs_disabled());
+
+ if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
+ return NULL;
+
+ if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+ return NULL;
+
+ if (flags & FOLL_GET)
+ return try_get_folio(page, refs);
+
+ /* FOLL_PIN is set */
+
+ /*
+ * Don't take a pin on the zero page - it's not going anywhere
+ * and it is used in a *lot* of places.
+ */
+ if (is_zero_page(page))
+ return page_folio(page);
+
+ folio = try_get_folio(page, refs);
+ if (!folio)
+ return NULL;
+
+ /*
+ * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
+ * right zone, so fail and let the caller fall back to the slow
+ * path.
+ */
+ if (unlikely((flags & FOLL_LONGTERM) &&
+ !folio_is_longterm_pinnable(folio))) {
+ if (!put_devmap_managed_page_refs(&folio->page, refs))
+ folio_put_refs(folio, refs);
+ return NULL;
+ }
+
+ /*
+ * When pinning a large folio, use an exact count to track it.
+ *
+ * However, be sure to *also* increment the normal folio
+ * refcount field at least once, so that the folio really
+ * is pinned. That's why the refcount from the earlier
+ * try_get_folio() is left intact.
+ */
+ if (folio_test_large(folio))
+ atomic_add(refs, &folio->_pincount);
+ else
+ folio_ref_add(folio,
+ refs * (GUP_PIN_COUNTING_BIAS - 1));
+ /*
+ * Adjust the pincount before re-checking the PTE for changes.
+ * This is essentially a smp_mb() and is paired with a memory
+ * barrier in folio_try_share_anon_rmap_*().
+ */
+ smp_mb__after_atomic();
+
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
+
+ return folio;
+}
+
#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
/*
* Fast-gup relies on pte change detection to avoid concurrent pgtable
@@ -2605,7 +2612,7 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
- folio = try_grab_folio(page, 1, flags);
+ folio = try_grab_folio_fast(page, 1, flags);
if (!folio)
goto pte_unmap;
@@ -2699,7 +2706,7 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
SetPageReferenced(page);
pages[*nr] = page;
- if (unlikely(try_grab_page(page, flags))) {
+ if (unlikely(try_grab_folio(page_folio(page), 1, flags))) {
undo_dev_pagemap(nr, nr_start, flags, pages);
break;
}
@@ -2808,7 +2815,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -2879,7 +2886,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -2923,7 +2930,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -2963,7 +2970,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 79fbd6ddec49..2e64897168bc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1052,7 +1052,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
if (!*pgmap)
return ERR_PTR(-EFAULT);
page = pfn_to_page(pfn);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
page = ERR_PTR(ret);
@@ -1210,7 +1210,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
return ERR_PTR(-EFAULT);
page = pfn_to_page(pfn);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
page = ERR_PTR(ret);
@@ -1471,7 +1471,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
!PageAnonExclusive(page), page);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
return ERR_PTR(ret);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a480affd475b..ab040f8d1987 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6532,7 +6532,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
* try_grab_page() should always be able to get the page here,
* because we hold the ptl lock and have verified pte_present().
*/
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (WARN_ON_ONCE(ret)) {
page = ERR_PTR(ret);
diff --git a/mm/internal.h b/mm/internal.h
index abed947f784b..ef8d787a510c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -938,8 +938,8 @@ int migrate_device_coherent_page(struct page *page);
/*
* mm/gup.c
*/
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
-int __must_check try_grab_page(struct page *page, unsigned int flags);
+int __must_check try_grab_folio(struct folio *folio, int refs,
+ unsigned int flags);
/*
* mm/huge_memory.c
--
2.41.0
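For reference, the refcount accounting done by the new try_grab_folio() (slow path) and try_grab_folio_fast() (fast path) can be modelled in plain C. This is a userspace sketch, not kernel code. struct model_folio and the model_grab_* helpers are hypothetical, the MODEL_FOLL_* flag values are placeholders, and the bias is assumed to equal GUP_PIN_COUNTING_BIAS (1 << 10). The zero-page, P2PDMA and FOLL_LONGTERM checks are left out. It shows that for FOLL_PIN on a single-page folio both paths end up adding refs * GUP_PIN_COUNTING_BIAS, even though the fast path gets there in two steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this model only (not the kernel's). */
#define MODEL_FOLL_GET 0x1u
#define MODEL_FOLL_PIN 0x2u
/* Assumed to match GUP_PIN_COUNTING_BIAS in include/linux/mm.h. */
#define MODEL_PIN_BIAS (1 << 10)

/* Hypothetical stand-in for the counters a folio carries. */
struct model_folio {
	int refcount; /* folio_ref_count() */
	int pincount; /* folio->_pincount, only used for large folios */
	bool large;   /* folio_test_large() */
};

/* Slow path: same accounting as the new try_grab_folio(). */
static int model_grab_slow(struct model_folio *f, int refs, unsigned int flags)
{
	if (f->refcount <= 0)
		return -1; /* the kernel warns and returns -ENOMEM */
	if (flags & MODEL_FOLL_GET) {
		f->refcount += refs;
	} else if (flags & MODEL_FOLL_PIN) {
		if (f->large) {
			f->refcount += refs;
			f->pincount += refs;
		} else {
			f->refcount += refs * MODEL_PIN_BIAS;
		}
	}
	return 0;
}

/* Fast path: try_get_folio() adds @refs first, then FOLL_PIN tops up. */
static bool model_grab_fast(struct model_folio *f, int refs, unsigned int flags)
{
	if (!(flags & (MODEL_FOLL_GET | MODEL_FOLL_PIN)))
		return false; /* the kernel also emits a warning here */
	if (f->refcount == 0)
		return false; /* "add unless zero" refused */
	f->refcount += refs;
	if (flags & MODEL_FOLL_GET)
		return true;
	if (f->large)
		f->pincount += refs;
	else
		f->refcount += refs * (MODEL_PIN_BIAS - 1);
	return true;
}
```

The fast-path model refuses a folio whose count is already zero, which is what the "add ref unless zero" approach provides when no stable reference is held; the slow path instead assumes the caller already holds one.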
From: Chuck Lever <chuck.lever(a)oracle.com>
Following up on
https://lore.kernel.org/linux-nfs/d4b235df-4ee5-4824-9d48-e3b3c1f1f4d1@orac…
Here is a backport series targeting origin/linux-6.1.y that closes
the information leak described in the above thread.
I started with v6.1.y because that is the most recent LTS kernel
and thus the closest to upstream. I plan to look at 5.15 and 5.10
LTS too if this series is applied to v6.1.y.
Review comments welcome.
Chuck Lever (6):
NFSD: Refactor nfsd_reply_cache_free_locked()
NFSD: Rename nfsd_reply_cache_alloc()
NFSD: Replace nfsd_prune_bucket()
NFSD: Refactor the duplicate reply cache shrinker
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
NFSD: Fix frame size warning in svc_export_parse()
Jeff Layton (2):
nfsd: move reply cache initialization into nfsd startup
nfsd: move init of percpu reply_cache_stats counters back to
nfsd_init_net
Josef Bacik (10):
sunrpc: don't change ->sv_stats if it doesn't exist
nfsd: stop setting ->pg_stats for unused stats
sunrpc: pass in the sv_stats struct through svc_create_pooled
sunrpc: remove ->pg_stats from svc_program
sunrpc: use the struct net as the svc proc private
nfsd: rename NFSD_NET_* to NFSD_STATS_*
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
nfsd: make all of the nfsd stats per-network namespace
nfsd: remove nfsd_stats, make th_cnt a global counter
nfsd: make svc_stat per-network namespace instead of global
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 3 -
fs/nfsd/export.c | 32 ++++--
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 ++++-
fs/nfsd/nfs4proc.c | 6 +-
fs/nfsd/nfscache.c | 201 ++++++++++++++++++++++---------------
fs/nfsd/nfsctl.c | 24 ++---
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 24 +++--
fs/nfsd/stats.c | 52 ++++------
fs/nfsd/stats.h | 83 ++++++---------
fs/nfsd/trace.h | 22 ++++
fs/nfsd/vfs.c | 6 +-
include/linux/sunrpc/svc.h | 5 +-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 36 ++++---
18 files changed, 301 insertions(+), 231 deletions(-)
--
2.45.1
From: Chuck Lever <chuck.lever(a)oracle.com>
Following up on:
https://lore.kernel.org/linux-nfs/d4b235df-4ee5-4824-9d48-e3b3c1f1f4d1@orac…
Here is a backport series targeting origin/linux-6.6.y that closes
the information leak described in the above thread. It passes basic
NFSD regression testing.
Review comments welcome.
Chuck Lever (2):
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
NFSD: Fix frame size warning in svc_export_parse()
Josef Bacik (10):
sunrpc: don't change ->sv_stats if it doesn't exist
nfsd: stop setting ->pg_stats for unused stats
sunrpc: pass in the sv_stats struct through svc_create_pooled
sunrpc: remove ->pg_stats from svc_program
sunrpc: use the struct net as the svc proc private
nfsd: rename NFSD_NET_* to NFSD_STATS_*
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
nfsd: make all of the nfsd stats per-network namespace
nfsd: remove nfsd_stats, make th_cnt a global counter
nfsd: make svc_stat per-network namespace instead of global
fs/lockd/svc.c | 3 --
fs/nfs/callback.c | 3 --
fs/nfsd/cache.h | 2 -
fs/nfsd/export.c | 32 ++++++++++----
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 +++++++++--
fs/nfsd/nfs4proc.c | 6 +--
fs/nfsd/nfs4state.c | 3 +-
fs/nfsd/nfscache.c | 40 ++++-------------
fs/nfsd/nfsctl.c | 16 +++----
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 14 +++---
fs/nfsd/stats.c | 54 ++++++++++-------------
fs/nfsd/stats.h | 88 ++++++++++++++------------------------
fs/nfsd/vfs.c | 6 ++-
include/linux/sunrpc/svc.h | 5 ++-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 39 +++++++++++------
19 files changed, 163 insertions(+), 183 deletions(-)
--
2.45.1
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 7697a0fe0154468f5df35c23ebd7aa48994c2cdc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072921-props-yam-bb2b@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
7697a0fe0154 ("LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h")
26a3b85bac08 ("loongarch: convert to generic syscall table")
505d66d1abfb ("clone3: drop __ARCH_WANT_SYS_CLONE3 macro")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7697a0fe0154468f5df35c23ebd7aa48994c2cdc Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhuacai(a)kernel.org>
Date: Sat, 20 Jul 2024 22:40:58 +0800
Subject: [PATCH] LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Chromium sandbox apparently wants to deny statx [1] so it could properly
inspect arguments after the sandboxed process later falls back to fstat.
Because there is currently no "fd-only" version of statx, the sandbox
has no way to ensure that the path argument is empty without being
able to peek into the sandboxed process's memory. For architectures able
to do newfstatat though, glibc falls back to newfstatat after getting
-ENOSYS for statx, then the respective SIGSYS handler [2] takes care of
inspecting the path argument, transforming allowed newfstatat's into
fstat instead which is allowed and has the same type of return value.
But, as LoongArch is the first architecture to have neither fstat nor
newfstatat, the LoongArch glibc does not attempt to fall back at all
when it gets -ENOSYS for statx -- and you see the problem there!
Actually, back when the LoongArch port was under review, people were
aware of the same problem with sandboxing clone3 [3], so clone was
eventually kept. Unfortunately it seemed at that time no one had noticed
statx, so besides restoring fstat/newfstatat to LoongArch uapi (and
postponing the problem further), it seems inevitable that we would need
to tackle seccomp deep argument inspection.
However, this is obviously a decision that shouldn't be taken lightly,
so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT
in unistd.h. This is the simplest solution for now, and so we hope the
community will tackle the long-standing problem of seccomp deep argument
inspection in the future [4][5].
Also add "newstat" to syscall_abis_64 in Makefile.syscalls due to
upstream asm-generic changes.
For more information, please read this thread [6].
[1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux…
[3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal…
[4] https://lwn.net/Articles/799557/
[5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-i…
[6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@b…
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
diff --git a/arch/loongarch/include/asm/unistd.h b/arch/loongarch/include/asm/unistd.h
index fc0a481a7416..e2c0f3d86c7b 100644
--- a/arch/loongarch/include/asm/unistd.h
+++ b/arch/loongarch/include/asm/unistd.h
@@ -8,6 +8,7 @@
#include <uapi/asm/unistd.h>
+#define __ARCH_WANT_NEW_STAT
#define __ARCH_WANT_SYS_CLONE
#define NR_syscalls (__NR_syscalls)
diff --git a/arch/loongarch/kernel/Makefile.syscalls b/arch/loongarch/kernel/Makefile.syscalls
index ab7d9baa2915..523bb411a3bc 100644
--- a/arch/loongarch/kernel/Makefile.syscalls
+++ b/arch/loongarch/kernel/Makefile.syscalls
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
-# No special ABIs on loongarch so far
-syscall_abis_64 +=
+syscall_abis_64 += newstat
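The SIGSYS rewrite described in the commit message relies on newfstatat() with an empty path and AT_EMPTY_PATH referring to the descriptor itself, which is what makes it interchangeable with fstat(). The sketch below is a userspace illustration under that assumption, not Chromium's handler: newfstatat_is_fstat() is a hypothetical predicate, and empty_path_matches_fstat() checks the equivalence through the libc fstatat() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000 /* Linux value, normally from <fcntl.h> */
#endif

/* Hypothetical predicate: newfstatat(fd, path, buf, flags) may be
 * replaced by fstat(fd, buf) only for an empty path with AT_EMPTY_PATH. */
static bool newfstatat_is_fstat(const char *path, int flags)
{
	return path[0] == '\0' && (flags & AT_EMPTY_PATH) != 0;
}

/* Check that the empty-path form and fstat() describe the same file. */
static bool empty_path_matches_fstat(int fd)
{
	struct stat a, b;

	if (fstatat(fd, "", &a, AT_EMPTY_PATH) != 0)
		return false;
	if (fstat(fd, &b) != 0)
		return false;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino &&
	       a.st_mode == b.st_mode;
}
```

The predicate can only be evaluated by code that is able to read the path string, such as a SIGSYS handler running inside the sandboxed process; a seccomp filter cannot, which is the deep argument inspection problem the commit refers to.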
To clarify…
On 02/07/2024 5:54 pm, Calum Mackay wrote:
> hi Petr,
>
> I noticed your LTP patch [1][2] which adjusts the nfsstat01 test on v6.9
> kernels, to account for Josef's changes [3], which restrict the NFS/RPC
> stats per-namespace.
>
> I see that Josef's changes were backported, as far back as longterm
> v5.4,
Sorry, that's not quite accurate.
Josef's NFS client changes were all backported from v6.9, as far as
longterm v5.4.y:
2057a48d0dd0 sunrpc: add a struct rpc_stats arg to rpc_create_args
d47151b79e32 nfs: expose /proc/net/sunrpc/nfs in net namespaces
1548036ef120 nfs: make the rpc_stat per net namespace
Of Josef's NFS server changes, four were backported from v6.9 to v6.8:
418b9687dece sunrpc: use the struct net as the svc proc private
d98416cc2154 nfsd: rename NFSD_NET_* to NFSD_STATS_*
93483ac5fec6 nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
4b14885411f7 nfsd: make all of the nfsd stats per-network namespace
and the others remained only in v6.9:
ab42f4d9a26f sunrpc: don't change ->sv_stats if it doesn't exist
a2214ed588fb nfsd: stop setting ->pg_stats for unused stats
f09432386766 sunrpc: pass in the sv_stats struct through svc_create_pooled
3f6ef182f144 sunrpc: remove ->pg_stats from svc_program
e41ee44cc6a4 nfsd: remove nfsd_stats, make th_cnt a global counter
16fb9808ab2c nfsd: make svc_stat per-network namespace instead of global
I'm wondering if this difference between NFS client, and NFS server,
stat behaviour, across kernel versions, may perhaps cause some user
confusion?
cheers,
calum.
> so your check for kernel version "6.9" in the test may need to be
> adjusted, if LTP is intended to be run on stable kernels?
>
> best wishes,
> calum.
>
>
> [1] https://lore.kernel.org/ltp/20240620111129.594449-1-pvorel@suse.cz/
> [2] https://patchwork.ozlabs.org/project/ltp/
> patch/20240620111129.594449-1-pvorel(a)suse.cz/
> [3] https://lore.kernel.org/linux-nfs/
> cover.1708026931.git.josef(a)toxicpanda.com/
The DWC3_EP_RESOURCE_ALLOCATED flag ensures that the resource of an
endpoint is only assigned once. Unless the endpoint is reset, don't
clear this flag. Otherwise we may set the endpoint resource again, which
prevents the driver from initiating a transfer after handling a STALL or
endpoint halt on the control endpoint.
Commit f2e0eee47038 ("usb: dwc3: ep0: Don't reset resource alloc flag")
fixed the initial issue, but only for physical ep1. Since
dwc3_ep0_stall_and_restart() resets the flags for both physical
endpoints, the same has to be done for ep0.
Cc: stable(a)vger.kernel.org
Fixes: b311048c174d ("usb: dwc3: gadget: Rewrite endpoint allocation flow")
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Michael Grzeschik <m.grzeschik(a)pengutronix.de>
---
v2: Added missing double quotes in the referenced patch name
- Link to v1: https://lore.kernel.org/r/20240814-dwc3hwep0reset-v1-1-087b0d26f3d0@pengutr…
---
drivers/usb/dwc3/ep0.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index d96ffbe520397..c9533a99e47c8 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -232,7 +232,8 @@ void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
/* stall is always issued on EP0 */
dep = dwc->eps[0];
__dwc3_gadget_ep_set_halt(dep, 1, false);
- dep->flags = DWC3_EP_ENABLED;
+ dep->flags &= DWC3_EP_RESOURCE_ALLOCATED;
+ dep->flags |= DWC3_EP_ENABLED;
dwc->delayed_status = false;
if (!list_empty(&dep->pending_list)) {
---
base-commit: 38343be0bf9a7d7ef0d160da5f2db887a0e29b62
change-id: 20240814-dwc3hwep0reset-b4d371873494
Best regards,
--
Michael Grzeschik <m.grzeschik(a)pengutronix.de>
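The effect of the two-line change in dwc3_ep0_stall_and_restart() is easiest to see on a plain bitmask. In this sketch the MODEL_EP_* bit positions are illustrative placeholders, not the values from drivers/usb/dwc3/core.h, and both helpers are hypothetical. The old code overwrote every flag with DWC3_EP_ENABLED; the patched code keeps only DWC3_EP_RESOURCE_ALLOCATED and then sets DWC3_EP_ENABLED.

```c
#include <assert.h>

/* Illustrative bit positions only; not the driver's definitions. */
#define MODEL_EP_ENABLED            (1u << 0)
#define MODEL_EP_STALL              (1u << 1)
#define MODEL_EP_TRANSFER_STARTED   (1u << 3)
#define MODEL_EP_RESOURCE_ALLOCATED (1u << 14)

/* Old behaviour: all flags, including RESOURCE_ALLOCATED, are dropped. */
static unsigned int reset_flags_old(unsigned int flags)
{
	(void)flags;
	return MODEL_EP_ENABLED;
}

/* Patched behaviour: keep only RESOURCE_ALLOCATED, then set ENABLED. */
static unsigned int reset_flags_new(unsigned int flags)
{
	flags &= MODEL_EP_RESOURCE_ALLOCATED;
	flags |= MODEL_EP_ENABLED;
	return flags;
}
```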
pinmux_generic_get_function() can return NULL, and the pointer 'function'
was dereferenced without a NULL check. Add a NULL check of the pointer
'function' in pcs_get_function().
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 571aec4df5b7 ("pinctrl: single: Use generic pinmux helpers for managing functions")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pinctrl/pinctrl-single.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index 4c6bfabb6bd7..4da3c3f422b6 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -345,6 +345,8 @@ static int pcs_get_function(struct pinctrl_dev *pctldev, unsigned pin,
return -ENOTSUPP;
fselector = setting->func;
function = pinmux_generic_get_function(pctldev, fselector);
+ if (!function)
+ return -EINVAL;
*func = function->data;
if (!(*func)) {
dev_err(pcs->dev, "%s could not find function%i\n",
--
2.25.1
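The pattern the patch applies is a NULL check on a lookup that may fail, placed before the result is dereferenced. A minimal userspace sketch, assuming a hypothetical function table in place of the pinctrl core's storage: model_get_function() returns NULL for an unknown selector, as pinmux_generic_get_function() can, and the caller returns -EINVAL instead of dereferencing it. The -ENODEV for missing private data is a placeholder, not the driver's error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a function descriptor and its ->data. */
struct model_function {
	const char *name;
	void *data;
};

static int uart0_cfg = 42;

static struct model_function model_functions[] = {
	{ "uart0", &uart0_cfg },
	{ "spi0", NULL }, /* registered, but without private data */
};

/* Like pinmux_generic_get_function(), returns NULL for an unknown selector. */
static struct model_function *model_get_function(unsigned int selector)
{
	if (selector >= sizeof(model_functions) / sizeof(model_functions[0]))
		return NULL;
	return &model_functions[selector];
}

/* Mirrors the patched pcs_get_function(): check before dereferencing. */
static int model_get_function_data(unsigned int selector, void **data)
{
	struct model_function *function = model_get_function(selector);

	if (!function)
		return -EINVAL; /* the added check */
	*data = function->data;
	if (!*data)
		return -ENODEV; /* placeholder error code */
	return 0;
}
```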
The DWC3_EP_RESOURCE_ALLOCATED flag ensures that the resource of an
endpoint is only assigned once. Unless the endpoint is reset, don't
clear this flag. Otherwise we may set the endpoint resource again, which
prevents the driver from initiating a transfer after handling a STALL or
endpoint halt on the control endpoint.
Commit f2e0eee47038 (usb: dwc3: ep0: Don't reset resource alloc flag)
fixed the initial issue, but only for physical ep1. Since
dwc3_ep0_stall_and_restart() resets the flags for both physical
endpoints, the same has to be done for ep0.
Cc: stable(a)vger.kernel.org
Fixes: b311048c174d ("usb: dwc3: gadget: Rewrite endpoint allocation flow")
Signed-off-by: Michael Grzeschik <m.grzeschik(a)pengutronix.de>
---
drivers/usb/dwc3/ep0.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index d96ffbe520397..c9533a99e47c8 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -232,7 +232,8 @@ void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
/* stall is always issued on EP0 */
dep = dwc->eps[0];
__dwc3_gadget_ep_set_halt(dep, 1, false);
- dep->flags = DWC3_EP_ENABLED;
+ dep->flags &= DWC3_EP_RESOURCE_ALLOCATED;
+ dep->flags |= DWC3_EP_ENABLED;
dwc->delayed_status = false;
if (!list_empty(&dep->pending_list)) {
---
base-commit: 38343be0bf9a7d7ef0d160da5f2db887a0e29b62
change-id: 20240814-dwc3hwep0reset-b4d371873494
Best regards,
--
Michael Grzeschik <m.grzeschik(a)pengutronix.de>
The quilt patch titled
Subject: Squashfs: sanity check symbolic link size
has been removed from the -mm tree. Its filename was
squashfs-sanity-check-symbolic-link-size.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: Squashfs: sanity check symbolic link size
Date: Sun, 11 Aug 2024 21:13:01 +0100
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.
This is caused by an uninitialised page, which is ultimately caused
by a corrupted symbolic link size read from disk.
The corrupted symlink size causes an uninitialised page through
the following sequence of events:
1. squashfs_read_inode() is called to read the symbolic
link from disk. This assigns the corrupted value
3875536935 to inode->i_size.
2. Later squashfs_symlink_read_folio() is called, which assigns
this corrupted value to the length variable, which, being a
signed int, overflows, producing a negative number.
3. The following loop that fills in the page contents checks that
the number of copied bytes is less than length, which, being negative,
means the loop is skipped, producing an uninitialised page.
This patch adds a sanity check which verifies that the symbolic
link size is not larger than PAGE_SIZE.
Link: https://lkml.kernel.org/r/20240811201301.13076-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: Lizhi Xu <lizhi.xu(a)windriver.com>
Reported-by: syzbot+24ac24ff58dc5b0d26b9(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000a90e8c061e86a76b@google.com/
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Phillip Lougher <phillip(a)squashfs.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/inode.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/fs/squashfs/inode.c~squashfs-sanity-check-symbolic-link-size
+++ a/fs/squashfs/inode.c
@@ -279,8 +279,13 @@ int squashfs_read_inode(struct inode *in
if (err < 0)
goto failed_read;
- set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_size = le32_to_cpu(sqsh_ino->symlink_size);
+ if (inode->i_size > PAGE_SIZE) {
+ ERROR("Corrupted symlink\n");
+ return -EINVAL;
+ }
+
+ set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_op = &squashfs_symlink_inode_ops;
inode_nohighmem(inode);
inode->i_data.a_ops = &squashfs_symlink_aops;
_
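A small userspace sketch of the failure and of the fix, assuming 4 KiB pages. The on-disk size 3875536935 from the report does not fit in a 32-bit signed int, so a fill loop bounded by `copied < length` never runs, whereas the new i_size > PAGE_SIZE check rejects the inode before it gets that far. The helpers are hypothetical stand-ins, and the out-of-range conversion to int is implementation-defined in C (it wraps on GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* The on-disk size is a 32-bit field; storing it in a signed int wraps
 * for values above INT_MAX (implementation-defined, wraps on GCC/Clang). */
static int as_signed_length(uint32_t symlink_size)
{
	return (int)symlink_size;
}

/* Bytes a loop of the form "while (copied < length)" would fill. */
static int bytes_filled(int length)
{
	int copied = 0;

	while (copied < length)
		copied++;
	return copied;
}

/* The new check in squashfs_read_inode(): reject sizes above one page. */
static int symlink_size_ok(int64_t i_size)
{
	return i_size <= MODEL_PAGE_SIZE;
}
```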
commit 4b827b3f305d ("xfs: remove WARN when dquot cache insertion fails")
Disk quota cache insertion failure doesn't require this warning as
the system can still manage and track disk quotas without caching the
dquot object into memory. The failure doesn't imply any data loss or
corruption.
Therefore, the WARN_ON in xfs_qm_dqget_cache_insert() is overly aggressive
and causes bot noise. I have confirmed that there are no conflicts and also
tested it using the C repro from syzkaller:
https://syzkaller.appspot.com/text?tag=ReproC&x=15406772280000
Please let me know if I missed anything, as this is my first
backport patch.
Reported-by: syzbot+55fb1b7d909494fd520d(a)syzkaller.appspotmail.com
Signed-off-by: Abhinav Jain <jain.abhinav177(a)gmail.com>
---
fs/xfs/xfs_dquot.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 8fb90da89787..7f071757f278 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -798,7 +798,6 @@ xfs_qm_dqget_cache_insert(
error = radix_tree_insert(tree, id, dqp);
if (unlikely(error)) {
/* Duplicate found! Caller must try again. */
- WARN_ON(error != -EEXIST);
mutex_unlock(&qi->qi_tree_lock);
trace_xfs_dqget_dup(dqp);
return error;
--
2.34.1
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in a simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has already been unmapped, so the contents read are bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths currently have active maps, and unmapping only when the
count reaches 0.
v2: Update doc strings
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++++++++++--
drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 +++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, &not_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.43.0
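The counting scheme can be replayed in userspace against the a) to f) sequence above. This is a single-threaded sketch with hypothetical names (struct model_bo, model_map(), model_unmap()) that uses C11 <stdatomic.h> in place of the kernel's atomic_t. Like the patched vmw_bo_unmap(), it tears the mapping down only when the count drops to zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a buffer object with a shared kernel mapping. */
struct model_bo {
	atomic_int map_count;
	bool mapped; /* stands in for vbo->map.bo != NULL */
};

/* Every map request counts, even when the cached mapping is reused. */
static void model_map(struct model_bo *bo)
{
	atomic_fetch_add(&bo->map_count, 1);
	bo->mapped = true;
}

static void model_unmap(struct model_bo *bo)
{
	if (!bo->mapped)
		return;
	/* atomic_fetch_sub() returns the old value: 1 means it is now 0. */
	if (atomic_fetch_sub(&bo->map_count, 1) == 1)
		bo->mapped = false;
}
```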
The patch titled
Subject: nilfs2: fix state management in error path of log writing function
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-state-management-in-error-path-of-log-writing-function.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix state management in error path of log writing function
Date: Wed, 14 Aug 2024 19:11:19 +0900
After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests continuously
even if user data blocks were split into multiple logs across segments,
but two potential flaws were introduced in its error handling.
First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set on
pages/folios will remain uncleared. This causes page cache operations to
hang waiting for the writeback flag. For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
an inode is evicted from memory, will hang.
Second, the NILFS_I_COLLECTED flag set on normal inodes remains uncleared.
As a result, if the next log write involves checkpoint creation, that's
fine, but if a partial log write is performed that does not, inodes with
NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files"
list, and their data and b-tree blocks may not be written to the device,
corrupting the block mapping.
Fix these issues by uniformly calling nilfs_segctor_abort_construction()
on failure of each step in the loop in nilfs_segctor_do_construct(),
having it clean up logs and segment usages according to progress, and
correcting the conditions for calling nilfs_redirty_inodes() to ensure
that the NILFS_I_COLLECTED flag is cleared.
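To illustrate why every failing step has to reach the abort path, here is a minimal userspace model in C. It is a sketch under heavy assumptions: each log is assumed to mark a fixed number of pages as under writeback, and all names (struct model_sc, model_begin(), model_abort(), model_do_construct()) are hypothetical stand-ins, not the real nilfs2 structures or functions. Only the writeback-flag cleanup is modelled; the NILFS_I_COLLECTED handling is left out.

```c
/* Hypothetical, heavily simplified segment-constructor state. */
struct model_sc {
	int nsegbufs;        /* segment buffers prepared so far */
	int writeback_pages; /* pages whose writeback flag is still set */
};

/* Stands in for nilfs_segctor_begin_construction(); fails on log fail_at. */
static int model_begin(struct model_sc *sc, int log, int fail_at)
{
	if (log == fail_at)
		return -1;
	sc->nsegbufs++;
	return 0;
}

/* Stands in for nilfs_segctor_abort_construction(). */
static void model_abort(struct model_sc *sc)
{
	if (sc->nsegbufs == 0)
		return; /* first segment buffer preparation failed: nothing to undo */
	sc->writeback_pages = 0; /* clear writeback on pages of aborted logs */
	sc->nsegbufs = 0;
}

/* One construction pass writing nlogs logs; each log marks 8 pages. */
static int model_do_construct(struct model_sc *sc, int nlogs, int fail_at)
{
	for (int log = 0; log < nlogs; log++) {
		if (model_begin(sc, log, fail_at))
			goto failed; /* the fix: previously "goto out", skipping abort */
		sc->writeback_pages += 8;
	}
	return 0; /* in reality, I/O completion would clear writeback later */
failed:
	model_abort(sc);
	return -1;
}
```

With the old "goto out", a failure on the second log would leave the first log's pages with writeback still set in this model; with "goto failed", model_abort() clears them, and the empty-list check keeps a failure on the very first log harmless.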
Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-state-management-in-error-path-of-log-writing-function
+++ a/fs/nilfs2/segment.c
@@ -1812,6 +1812,9 @@ static void nilfs_segctor_abort_construc
nilfs_abort_logs(&logs, ret ? : err);
list_splice_tail_init(&sci->sc_segbufs, &logs);
+ if (list_empty(&logs))
+ return; /* if the first segment buffer preparation failed */
+
nilfs_cancel_segusage(&logs, nilfs->ns_sufile);
nilfs_free_incomplete_logs(&logs, nilfs);
@@ -2056,7 +2059,7 @@ static int nilfs_segctor_do_construct(st
err = nilfs_segctor_begin_construction(sci, nilfs);
if (unlikely(err))
- goto out;
+ goto failed;
/* Update time stamp */
sci->sc_seg_ctime = ktime_get_real_seconds();
@@ -2120,10 +2123,9 @@ static int nilfs_segctor_do_construct(st
return err;
failed_to_write:
- if (sci->sc_stage.flags & NILFS_CF_IFILE_STARTED)
- nilfs_redirty_inodes(&sci->sc_dirty_files);
-
failed:
+ if (mode == SC_LSEG_SR && nilfs_sc_cstage_get(sci) >= NILFS_ST_IFILE)
+ nilfs_redirty_inodes(&sci->sc_dirty_files);
if (nilfs_doing_gc())
nilfs_redirty_inodes(&sci->sc_gc_inodes);
nilfs_segctor_abort_construction(sci, nilfs, err);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-protect-references-to-superblock-parameters-exposed-in-sysfs.patch
nilfs2-fix-missing-cleanup-on-rollforward-recovery-error.patch
nilfs2-fix-state-management-in-error-path-of-log-writing-function.patch
Dell All In One (AIO) models released after 2017 may use a backlight
controller board connected to a UART.
In DSDT this UART port will be defined as:
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501"))
The Dell OptiPlex 7760 AIO has an ACPI device for one of its UARTs with
the above _HID + _CID. Loading the dell-uart-backlight driver shows that
there actually is a backlight controller board attached to the UART,
which reports a firmware version of "G&MX01-V15".
However, the backlight controller board does not actually control the
backlight brightness, while the GPU's native backlight control method does work.
Add a quirk to use the GPU's native backlight control method on this model.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2303936
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/video_detect.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
index e509dcbf3090..674b9db7a1ef 100644
--- a/drivers/acpi/video_detect.c
+++ b/drivers/acpi/video_detect.c
@@ -823,6 +823,21 @@ static const struct dmi_system_id video_detect_dmi_table[] = {
},
},
+ /*
+ * Dell AIO (All in Ones) which advertise an UART attached backlight
+ * controller board in their ACPI tables (and may even have one), but
+ * which need native backlight control nevertheless.
+ */
+ {
+ /* https://bugzilla.redhat.com/show_bug.cgi?id=2303936 */
+ .callback = video_detect_force_native,
+ /* Dell OptiPlex 7760 AIO */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 7760 AIO"),
+ },
+ },
+
/*
* Models which have nvidia-ec-wmi support, but should not use it.
* Note this indicates a likely firmware bug on these models and should
--
2.46.0
The dell-uart-backlight driver supports backlight control on Dell All In
One (AIO) models using a backlight controller board connected to a UART.
In DSDT this UART port will be defined as:
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501"))
Now the first AIO has turned up which has not only the DSDT bits for this,
but also an actual controller attached to the UART, yet it is not using
this controller for backlight control.
Use the acpi_video_get_backlight_type() function from the ACPI video-detect
code to check if the dell-uart-backlight driver should actually be used.
This allows reusing the existing ACPI video-detect infrastructure to override
the backlight control method on the kernel command line or with DMI quirks.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/platform/x86/dell/Kconfig | 1 +
drivers/platform/x86/dell/dell-uart-backlight.c | 8 ++++++++
2 files changed, 9 insertions(+)
diff --git a/drivers/platform/x86/dell/Kconfig b/drivers/platform/x86/dell/Kconfig
index f711c59fcf1b..11c2cb7d05b0 100644
--- a/drivers/platform/x86/dell/Kconfig
+++ b/drivers/platform/x86/dell/Kconfig
@@ -162,6 +162,7 @@ config DELL_SMO8800
config DELL_UART_BACKLIGHT
tristate "Dell AIO UART Backlight driver"
depends on ACPI
+ depends on ACPI_VIDEO
depends on BACKLIGHT_CLASS_DEVICE
depends on SERIAL_DEV_BUS
help
diff --git a/drivers/platform/x86/dell/dell-uart-backlight.c b/drivers/platform/x86/dell/dell-uart-backlight.c
index 87d2a20b4cb3..3995f90add45 100644
--- a/drivers/platform/x86/dell/dell-uart-backlight.c
+++ b/drivers/platform/x86/dell/dell-uart-backlight.c
@@ -20,6 +20,7 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/wait.h>
+#include <acpi/video.h>
#include "../serdev_helpers.h"
/* The backlight controller must respond within 1 second */
@@ -332,10 +333,17 @@ struct serdev_device_driver dell_uart_bl_serdev_driver = {
static int dell_uart_bl_pdev_probe(struct platform_device *pdev)
{
+ enum acpi_backlight_type bl_type;
struct serdev_device *serdev;
struct device *ctrl_dev;
int ret;
+ bl_type = acpi_video_get_backlight_type();
+ if (bl_type != acpi_backlight_dell_uart) {
+ dev_dbg(&pdev->dev, "Not loading (ACPI backlight type = %d)\n", bl_type);
+ return -ENODEV;
+ }
+
ctrl_dev = get_serdev_controller("DELL0501", NULL, 0, "serial0");
if (IS_ERR(ctrl_dev))
return PTR_ERR(ctrl_dev);
--
2.46.0
Dell All In One (AIO) models released after 2017 use a backlight
controller board connected to a UART.
In DSDT this UART port will be defined as:
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501"))
Commit 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
added support for this, but I neglected to tie it into
acpi_video_get_backlight_type().
Now the first AIO has turned up which has not only the DSDT bits for this,
but also an actual controller attached to the UART, yet it is not using
this controller for backlight control.
Add support for a new dell_uart backlight type to
acpi_video_get_backlight_type(), so that the existing infrastructure for
overriding the backlight control method on the kernel command line or
with DMI quirks can be used.
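The precedence that makes the DMI quirk and the acpi_backlight= override effective can be sketched as a simplified C model. It assumes, as in mainline video_detect.c, that the command-line override is checked first and DMI quirks second, followed by the special cases shown in the diff. The enum values, struct bl_probe, pick_backlight() and model_dell_uart_probe() are hypothetical, and the final vendor fallback is simplified.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of enum acpi_backlight_type for this example. */
enum bl_type {
	BL_UNDEF, /* zero so that an unset override means "no override" */
	BL_NONE,
	BL_VIDEO,
	BL_VENDOR,
	BL_NATIVE,
	BL_NVIDIA_WMI_EC,
	BL_APPLE_GMUX,
	BL_DELL_UART,
};

/* Hypothetical snapshot of what the detection code has probed. */
struct bl_probe {
	enum bl_type cmdline;   /* from acpi_backlight=..., or BL_UNDEF */
	enum bl_type dmi;       /* from a DMI quirk, or BL_UNDEF */
	bool nvidia_wmi_ec_present;
	bool apple_gmux_present;
	bool dell_uart_present; /* e.g. acpi_dev_present("DELL0501", NULL, -1) */
	bool acpi_video;        /* ACPI video backlight capability found */
	bool native_available;
	bool prefer_native;
};

static enum bl_type pick_backlight(const struct bl_probe *p)
{
	/* Explicit overrides win over any autodetection. */
	if (p->cmdline != BL_UNDEF)
		return p->cmdline;
	if (p->dmi != BL_UNDEF)
		return p->dmi;

	/* Special cases, checked before ACPI video. */
	if (p->nvidia_wmi_ec_present)
		return BL_NVIDIA_WMI_EC;
	if (p->apple_gmux_present)
		return BL_APPLE_GMUX;
	if (p->dell_uart_present)
		return BL_DELL_UART;

	if (p->acpi_video && !(p->native_available && p->prefer_native))
		return BL_VIDEO;
	if (p->native_available)
		return BL_NATIVE;
	return BL_VENDOR; /* simplified fallback */
}

/* Models the check added to dell_uart_bl_pdev_probe(). */
static int model_dell_uart_probe(const struct bl_probe *p)
{
	return pick_backlight(p) == BL_DELL_UART ? 0 : -ENODEV;
}
```

In this model, a DMI quirk forcing native backlight control (as added for the OptiPlex 7760 AIO) makes the Dell UART probe return -ENODEV even though a DELL0501 device is present.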
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/video_detect.c | 7 +++++++
include/acpi/video.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
index c11cbe5b6eaa..e509dcbf3090 100644
--- a/drivers/acpi/video_detect.c
+++ b/drivers/acpi/video_detect.c
@@ -54,6 +54,8 @@ static void acpi_video_parse_cmdline(void)
acpi_backlight_cmdline = acpi_backlight_nvidia_wmi_ec;
if (!strcmp("apple_gmux", acpi_video_backlight_string))
acpi_backlight_cmdline = acpi_backlight_apple_gmux;
+ if (!strcmp("dell_uart", acpi_video_backlight_string))
+ acpi_backlight_cmdline = acpi_backlight_dell_uart;
if (!strcmp("none", acpi_video_backlight_string))
acpi_backlight_cmdline = acpi_backlight_none;
}
@@ -918,6 +920,7 @@ enum acpi_backlight_type __acpi_video_get_backlight_type(bool native, bool *auto
static DEFINE_MUTEX(init_mutex);
static bool nvidia_wmi_ec_present;
static bool apple_gmux_present;
+ static bool dell_uart_present;
static bool native_available;
static bool init_done;
static long video_caps;
@@ -932,6 +935,7 @@ enum acpi_backlight_type __acpi_video_get_backlight_type(bool native, bool *auto
&video_caps, NULL);
nvidia_wmi_ec_present = nvidia_wmi_ec_supported();
apple_gmux_present = apple_gmux_detect(NULL, NULL);
+ dell_uart_present = acpi_dev_present("DELL0501", NULL, -1);
init_done = true;
}
if (native)
@@ -962,6 +966,9 @@ enum acpi_backlight_type __acpi_video_get_backlight_type(bool native, bool *auto
if (apple_gmux_present)
return acpi_backlight_apple_gmux;
+ if (dell_uart_present)
+ return acpi_backlight_dell_uart;
+
/* Use ACPI video if available, except when native should be preferred. */
if ((video_caps & ACPI_VIDEO_BACKLIGHT) &&
!(native_available && prefer_native_over_acpi_video()))
diff --git a/include/acpi/video.h b/include/acpi/video.h
index 3d538d4178ab..044c463138df 100644
--- a/include/acpi/video.h
+++ b/include/acpi/video.h
@@ -50,6 +50,7 @@ enum acpi_backlight_type {
acpi_backlight_native,
acpi_backlight_nvidia_wmi_ec,
acpi_backlight_apple_gmux,
+ acpi_backlight_dell_uart,
};
#if IS_ENABLED(CONFIG_ACPI_VIDEO)
--
2.46.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081450-exploring-lego-5070@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage")
e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap")
9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap")
f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap")
4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap")
13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap")
77435322777d ("fs: port ->get_acl() to pass mnt_idmap")
011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap")
5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap")
c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap")
7a77db95511c ("fs: port ->symlink() to pass mnt_idmap")
6c960e68aaed ("fs: port ->create() to pass mnt_idmap")
b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap")
c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap")
abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap")
6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f50733b45d865f91db90919f8311e2127ce5a0cb Mon Sep 17 00:00:00 2001
From: Kees Cook <kees(a)kernel.org>
Date: Thu, 8 Aug 2024 11:39:08 -0700
Subject: [PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage
When opening a file for exec via do_filp_open(), permission checking is
done against the file's metadata at that moment, and on success, a file
pointer is passed back. Much later in the execve() code path, the file
metadata (specifically mode, uid, and gid) is used to determine if/how
to set the uid and gid. However, those values may have changed since the
permissions check, meaning the execution may gain unintended privileges.
For example, if a file could change permissions from executable and not
set-id:
---------x 1 root root 16048 Aug 7 13:16 target
to set-id and non-executable:
---S------ 1 root root 16048 Aug 7 13:16 target
it is possible to gain root privileges when execution should have been
disallowed.
While this race condition is rare in real-world scenarios, it has been
observed (and proven exploitable) when package managers are updating
the setuid bits of installed programs. Such files start out
world-executable but are then adjusted to be group-exec with a set-uid
bit. For example, "chmod o-x,u+s target" makes "target" executable only
by uid "root" and gid "cdrom", while also becoming setuid-root:
-rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target
becomes:
-rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group "cdrom" membership can
get the permission to execute "target" just before the chmod, and when
the chmod finishes, the exec reaches bprm_fill_uid(), and performs the
setuid to root, violating the expressed authorization of "only cdrom
group members can setuid to root".
Re-check that we still have execute permissions in case the metadata
has changed. It would be better to keep a copy from the perm-check time,
but until we can do that refactoring, the least-bad option is to do a
full inode_permission() call (under inode lock). It is understood that
this is safe against deadlocks, but hardly optimal.
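The race can be reduced to a small C model of the re-check. It is a sketch only: it assumes an unprivileged caller with a single group and ignores ACLs, capabilities, idmapped mounts and LSM hooks, all of which the real inode_permission() takes into account. The types and functions (struct model_inode, struct model_cred, may_exec(), should_setuid()) are hypothetical. It uses the modes from the description, 0755 (-rwxr-xr-x) at open time and 04754 (-rwsr-xr--) after the chmod, with gid 24 as a placeholder for "cdrom".

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, simplified inode metadata (no ACLs, no idmaps). */
struct model_inode {
	mode_t mode;
	uid_t uid;
	gid_t gid;
};

/* Hypothetical caller credentials: one uid and a single gid. */
struct model_cred {
	uid_t uid;
	gid_t gid;
};

/* Classic owner/group/other exec check for an unprivileged caller. */
static bool may_exec(const struct model_inode *ino, const struct model_cred *c)
{
	if (c->uid == ino->uid)
		return ino->mode & S_IXUSR;
	if (c->gid == ino->gid)
		return ino->mode & S_IXGRP;
	return ino->mode & S_IXOTH;
}

/*
 * Decide whether exec switches to the file owner's uid, based on the
 * metadata read later, at bprm_fill_uid() time.
 */
static bool should_setuid(const struct model_inode *ino,
			  const struct model_cred *c, bool recheck)
{
	if (!(ino->mode & S_ISUID))
		return false;
	/* The fix: re-check exec permission against the same metadata. */
	if (recheck && !may_exec(ino, c))
		return false;
	return true;
}
```

Without the re-check, a caller who passed the open-time check against 0755 would still get the set-uid transition on the 04754 file; with it, only callers who may execute the file as it is now are promoted.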
Reported-by: Marco Vanotti <mvanotti(a)google.com>
Tested-by: Marco Vanotti <mvanotti(a)google.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Kees Cook <kees(a)kernel.org>
diff --git a/fs/exec.c b/fs/exec.c
index a126e3d1cacb..50e76cc633c4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1692,6 +1692,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
unsigned int mode;
vfsuid_t vfsuid;
vfsgid_t vfsgid;
+ int err;
if (!mnt_may_suid(file->f_path.mnt))
return;
@@ -1708,12 +1709,17 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
/* Be careful if suid/sgid is set */
inode_lock(inode);
- /* reload atomically mode/uid/gid now that lock held */
+ /* Atomically reload and check mode/uid/gid now that lock held. */
mode = inode->i_mode;
vfsuid = i_uid_into_vfsuid(idmap, inode);
vfsgid = i_gid_into_vfsgid(idmap, inode);
+ err = inode_permission(idmap, inode, MAY_EXEC);
inode_unlock(inode);
+ /* Did the exec bit vanish out from under us? Give up. */
+ if (err)
+ return;
+
/* We ignore suid/sgid if there are no mappings for them in the ns */
if (!vfsuid_has_mapping(bprm->cred->user_ns, vfsuid) ||
!vfsgid_has_mapping(bprm->cred->user_ns, vfsgid))
Commit 97ab304ecd95 ("ASoC: topology: Fix references to freed memory")
is a problematic fix for an issue in the topology loading code, and it
was cherry-picked to stable. It was later corrected by
0298f51652be ("ASoC: topology: Fix route memory corruption"); however, for
that fix to apply cleanly, e0e7bc2cbee9 ("ASoC: topology: Clean up route
loading") also needs to be applied.
Link: https://lore.kernel.org/linux-sound/ZrwUCnrtKQ61LWFS@sashalap/T/#mbfd273adf…
Should be applied to stable 6.1, 6.6, 6.9.
v2:
- Mention base commit
- Sign off patches again, as they are cherry-picks
Amadeusz Sławiński (2):
ASoC: topology: Clean up route loading
ASoC: topology: Fix route memory corruption
sound/soc/soc-topology.c | 32 ++++++++------------------------
1 file changed, 8 insertions(+), 24 deletions(-)
--
2.34.1
From: Mitchell Levy <levymitchell0(a)gmail.com>
When computing which xfeatures are available, make sure that LBR is only
present if LBR is supported both in general and by XSAVES.
There are two distinct CPU features related to the use of XSAVES as it
applies to LBR: whether LBR is itself supported (strictly speaking, I'm
not sure that this is necessary to check though it's certainly a good
sanity check), and whether XSAVES supports LBR (see sections 13.2 and
13.5.12 of the Intel 64 and IA-32 Architectures Software Developer's
Manual, Volume 1). Currently, the LBR subsystem correctly checks both
(see intel_pmu_arch_lbr_init()); however, the xstate initialization
subsystem does not.
When calculating what value to place in the IA32_XSS MSR,
xfeatures_mask_independent only checks whether LBR support is present,
not whether XSAVES supports LBR. If XSAVES does not support LBR, this
write causes a #GP, leaving the state of IA32_XSS unchanged (i.e., set to
zero, as it is not written with other values, and its default value is
zero out of RESET per section 13.3 of the arch manual).
Then, the next time XRSTORS is used to restore supervisor state, it will
fail with #GP (because the RFBM has zero for all supervisor features,
which does not match the XCOMP_BV field). In particular,
XFEATURE_MASK_FPSTATE includes supervisor features, so setting up the FPU
will cause a #GP. This results in a call to fpu_reset_from_exception_fixup,
which by the same process results in another #GP. Eventually this causes
the kernel to run out of stack space and #DF.
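The difference between the old and new xfeatures_mask_independent() can be modelled in a few lines of C. This is a sketch that assumes XFEATURE_LBR is bit 15 and that XFEATURE_MASK_INDEPENDENT contains only the LBR bit, as in mainline headers; the xsaves_features parameter stands in for fpu_kernel_cfg.max_features, and all MODEL_ names and functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors XFEATURE_LBR == 15 and XFEATURE_MASK_INDEPENDENT == LBR. */
#define MODEL_XFEATURE_MASK_LBR         (UINT64_C(1) << 15)
#define MODEL_XFEATURE_MASK_INDEPENDENT MODEL_XFEATURE_MASK_LBR

/* Before the fix: only the ARCH_LBR CPU feature is consulted. */
static uint64_t independent_mask_old(bool arch_lbr)
{
	if (!arch_lbr)
		return MODEL_XFEATURE_MASK_INDEPENDENT & ~MODEL_XFEATURE_MASK_LBR;
	return MODEL_XFEATURE_MASK_INDEPENDENT;
}

/*
 * After the fix: the independent features are first intersected with
 * what XSAVES reports (modelled by xsaves_features).
 */
static uint64_t independent_mask_new(bool arch_lbr, uint64_t xsaves_features)
{
	uint64_t independent = xsaves_features & MODEL_XFEATURE_MASK_INDEPENDENT;

	if (!arch_lbr)
		return independent & ~MODEL_XFEATURE_MASK_LBR;
	return independent;
}
```

On a CPU with Arch LBR but without XSAVES support for LBR, the old version yields the LBR bit (the value that faults when written to IA32_XSS), while the new one yields zero.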
Fixes: f0dccc9da4c0 ("x86/fpu/xstate: Support dynamic supervisor feature for LBR")
Cc: stable(a)vger.kernel.org
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Mitchell Levy <levymitchell0(a)gmail.com>
---
Changes in v3:
- Use a proper Suggested-by: (thanks tglx)
- Link to v2: https://lore.kernel.org/r/20240809-xsave-lbr-fix-v2-1-04296b387380@gmail.com
Changes in v2:
- Corrected Fixes tag (thanks tglx)
- Properly check for XSAVES support of LBR (thanks tglx)
- Link to v1: https://lore.kernel.org/r/20240808-xsave-lbr-fix-v1-1-a223806c83e7@gmail.com
---
arch/x86/include/asm/fpu/types.h | 7 +++++++
arch/x86/kernel/fpu/xstate.c | 3 +++
arch/x86/kernel/fpu/xstate.h | 4 ++--
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index eb17f31b06d2..de16862bf230 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -591,6 +591,13 @@ struct fpu_state_config {
* even without XSAVE support, i.e. legacy features FP + SSE
*/
u64 legacy_features;
+ /*
+ * @independent_features:
+ *
+ * Features that are supported by XSAVES, but not managed as part of
+ * the FPU core, such as LBR
+ */
+ u64 independent_features;
};
/* FPU state configuration information */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5a026fee5e0..1339f8328db5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -788,6 +788,9 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
+ XFEATURE_MASK_INDEPENDENT;
+
/*
* Clear XSAVE features that are disabled in the normal CPUID.
*/
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 2ee0b9c53dcc..afb404cd2059 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -62,9 +62,9 @@ static inline u64 xfeatures_mask_supervisor(void)
static inline u64 xfeatures_mask_independent(void)
{
if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR))
- return XFEATURE_MASK_INDEPENDENT & ~XFEATURE_MASK_LBR;
+ return fpu_kernel_cfg.independent_features & ~XFEATURE_MASK_LBR;
- return XFEATURE_MASK_INDEPENDENT;
+ return fpu_kernel_cfg.independent_features;
}
/* XSAVE/XRSTOR wrapper functions */
---
base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed
change-id: 20240807-xsave-lbr-fix-02d52f641653
Best regards,
--
Mitchell Levy <levymitchell0(a)gmail.com>
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x cd7c957f936f8cb80d03e5152f4013aae65bd986
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081245-deem-refinance-8605@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
cd7c957f936f ("mptcp: pm: don't try to create sf if alloc failed")
c95eb32ced82 ("mptcp: pm: reduce indentation blocks")
528cb5f2a1e8 ("mptcp: pass addr to mptcp_pm_alloc_anno_list")
77e4b94a3de6 ("mptcp: update userspace pm infos")
24430f8bf516 ("mptcp: add address into userspace pm list")
fb00ee4f3343 ("mptcp: netlink: respect v4/v6-only sockets")
80638684e840 ("mptcp: get sk from msk directly")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd7c957f936f8cb80d03e5152f4013aae65bd986 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 31 Jul 2024 13:05:56 +0200
Subject: [PATCH] mptcp: pm: don't try to create sf if alloc failed
It seems better to avoid wasting cycles and/or putting extreme memory
pressure on the system by trying to create new subflows if it was not
possible to add a new item to the announce list.
While at it, a warning is now printed if the entry was already in the
list, as this should not happen with the in-kernel path-manager. With
this PM, mptcp_pm_alloc_anno_list() should only fail in case of memory
pressure.
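The reordered control flow can be summarised as a small C decision function. This is a simplified sketch: the outer add_addr_signaled < add_addr_signal_max check is omitted, and enum pm_action and model_pm_step() are hypothetical names, not part of the MPTCP code.

```c
#include <stdbool.h>

/* Hypothetical outcome of one pass of the path-manager worker. */
enum pm_action {
	PM_NOTHING,              /* returned early */
	PM_SIGNAL_THEN_SUBFLOWS, /* address announced, then subflows tried */
	PM_SUBFLOWS_ONLY,        /* "goto subflow" */
};

/* Simplified flow of mptcp_pm_create_subflow_or_signal_addr() after the fix. */
static enum pm_action model_pm_step(bool signal_pending, bool have_local,
				    bool alloc_ok)
{
	if (signal_pending)
		return PM_NOTHING; /* previous ADD_ADDR still in flight */
	/* select_signal_address() now runs only past this point. */
	if (!have_local)
		return PM_SUBFLOWS_ONLY;
	if (!alloc_ok)
		return PM_NOTHING; /* was "goto subflow" before the fix */
	return PM_SIGNAL_THEN_SUBFLOWS;
}
```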
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
Cc: stable(a)vger.kernel.org
Suggested-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 780f4cca165c..2be7af377cda 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -348,7 +348,7 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
add_entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
if (add_entry) {
- if (mptcp_pm_is_kernel(msk))
+ if (WARN_ON_ONCE(mptcp_pm_is_kernel(msk)))
return false;
sk_reset_timer(sk, &add_entry->add_timer,
@@ -555,8 +555,6 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
/* check first for announce */
if (msk->pm.add_addr_signaled < add_addr_signal_max) {
- local = select_signal_address(pernet, msk);
-
/* due to racing events on both ends we can reach here while
* previous add address is still running: if we invoke now
* mptcp_pm_announce_addr(), that will fail and the
@@ -567,11 +565,15 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
return;
+ local = select_signal_address(pernet, msk);
if (!local)
goto subflow;
+ /* If the alloc fails, we are on memory pressure, not worth
+ * continuing, and trying to create subflows.
+ */
if (!mptcp_pm_alloc_anno_list(msk, &local->addr))
- goto subflow;
+ return;
__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
msk->pm.add_addr_signaled++;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081414-distance-hurler-0efd@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage")
e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap")
9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap")
f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap")
4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap")
13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap")
77435322777d ("fs: port ->get_acl() to pass mnt_idmap")
011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap")
5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap")
c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap")
7a77db95511c ("fs: port ->symlink() to pass mnt_idmap")
6c960e68aaed ("fs: port ->create() to pass mnt_idmap")
b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap")
c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap")
abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap")
6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f50733b45d865f91db90919f8311e2127ce5a0cb Mon Sep 17 00:00:00 2001
From: Kees Cook <kees(a)kernel.org>
Date: Thu, 8 Aug 2024 11:39:08 -0700
Subject: [PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage
When opening a file for exec via do_filp_open(), permission checking is
done against the file's metadata at that moment, and on success, a file
pointer is passed back. Much later in the execve() code path, the file
metadata (specifically mode, uid, and gid) is used to determine if/how
to set the uid and gid. However, those values may have changed since the
permissions check, meaning the execution may gain unintended privileges.
For example, if a file could change permissions from executable and not
set-id:
---------x 1 root root 16048 Aug 7 13:16 target
to set-id and non-executable:
---S------ 1 root root 16048 Aug 7 13:16 target
it is possible to gain root privileges when execution should have been
disallowed.
While this race condition is rare in real-world scenarios, it has been
observed (and proven exploitable) when package managers are updating
the setuid bits of installed programs. Such files start out
world-executable but are then adjusted to be group-exec with a set-uid
bit. For example, "chmod o-x,u+s target" makes "target" executable only
by uid "root" and gid "cdrom", while also becoming setuid-root:
-rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target
becomes:
-rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group "cdrom" membership can
get the permission to execute "target" just before the chmod, and when
the chmod finishes, the exec reaches bprm_fill_uid(), and performs the
setuid to root, violating the expressed authorization of "only cdrom
group members can setuid to root".
Re-check that we still have execute permissions in case the metadata
has changed. It would be better to keep a copy from the perm-check time,
but until we can do that refactoring, the least-bad option is to do a
full inode_permission() call (under inode lock). It is understood that
this is safe against deadlocks, but hardly optimal.
Reported-by: Marco Vanotti <mvanotti(a)google.com>
Tested-by: Marco Vanotti <mvanotti(a)google.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Kees Cook <kees(a)kernel.org>
diff --git a/fs/exec.c b/fs/exec.c
index a126e3d1cacb..50e76cc633c4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1692,6 +1692,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
unsigned int mode;
vfsuid_t vfsuid;
vfsgid_t vfsgid;
+ int err;
if (!mnt_may_suid(file->f_path.mnt))
return;
@@ -1708,12 +1709,17 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
/* Be careful if suid/sgid is set */
inode_lock(inode);
- /* reload atomically mode/uid/gid now that lock held */
+ /* Atomically reload and check mode/uid/gid now that lock held. */
mode = inode->i_mode;
vfsuid = i_uid_into_vfsuid(idmap, inode);
vfsgid = i_gid_into_vfsgid(idmap, inode);
+ err = inode_permission(idmap, inode, MAY_EXEC);
inode_unlock(inode);
+ /* Did the exec bit vanish out from under us? Give up. */
+ if (err)
+ return;
+
/* We ignore suid/sgid if there are no mappings for them in the ns */
if (!vfsuid_has_mapping(bprm->cred->user_ns, vfsuid) ||
!vfsgid_has_mapping(bprm->cred->user_ns, vfsgid))
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081412-caddie-manhandle-397e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage")
e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap")
9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap")
f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap")
4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap")
13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap")
77435322777d ("fs: port ->get_acl() to pass mnt_idmap")
011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap")
5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap")
c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap")
7a77db95511c ("fs: port ->symlink() to pass mnt_idmap")
6c960e68aaed ("fs: port ->create() to pass mnt_idmap")
b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap")
c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap")
abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap")
6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f50733b45d865f91db90919f8311e2127ce5a0cb Mon Sep 17 00:00:00 2001
From: Kees Cook <kees(a)kernel.org>
Date: Thu, 8 Aug 2024 11:39:08 -0700
Subject: [PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage
When opening a file for exec via do_filp_open(), permission checking is
done against the file's metadata at that moment, and on success, a file
pointer is passed back. Much later in the execve() code path, the file
metadata (specifically mode, uid, and gid) is used to determine if/how
to set the uid and gid. However, those values may have changed since the
permissions check, meaning the execution may gain unintended privileges.
For example, if a file could change permissions from executable and not
set-id:
---------x 1 root root 16048 Aug 7 13:16 target
to set-id and non-executable:
---S------ 1 root root 16048 Aug 7 13:16 target
it is possible to gain root privileges when execution should have been
disallowed.
While this race condition is rare in real-world scenarios, it has been
observed (and proven exploitable) when package managers are updating
the setuid bits of installed programs. Such files start out
world-executable but are then adjusted to be group-exec with a set-uid
bit. For example, "chmod o-x,u+s target" makes "target" executable only
by uid "root" and gid "cdrom", while also becoming setuid-root:
-rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target
becomes:
-rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group "cdrom" membership can
get the permission to execute "target" just before the chmod, and when
the chmod finishes, the exec reaches bprm_fill_uid(), and performs the
setuid to root, violating the expressed authorization of "only cdrom
group members can setuid to root".
Re-check that we still have execute permissions in case the metadata
has changed. It would be better to keep a copy from the perm-check time,
but until we can do that refactoring, the least-bad option is to do a
full inode_permission() call (under inode lock). It is understood that
this is safe against deadlocks, but hardly optimal.
Reported-by: Marco Vanotti <mvanotti(a)google.com>
Tested-by: Marco Vanotti <mvanotti(a)google.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Kees Cook <kees(a)kernel.org>
diff --git a/fs/exec.c b/fs/exec.c
index a126e3d1cacb..50e76cc633c4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1692,6 +1692,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
unsigned int mode;
vfsuid_t vfsuid;
vfsgid_t vfsgid;
+ int err;
if (!mnt_may_suid(file->f_path.mnt))
return;
@@ -1708,12 +1709,17 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
/* Be careful if suid/sgid is set */
inode_lock(inode);
- /* reload atomically mode/uid/gid now that lock held */
+ /* Atomically reload and check mode/uid/gid now that lock held. */
mode = inode->i_mode;
vfsuid = i_uid_into_vfsuid(idmap, inode);
vfsgid = i_gid_into_vfsgid(idmap, inode);
+ err = inode_permission(idmap, inode, MAY_EXEC);
inode_unlock(inode);
+ /* Did the exec bit vanish out from under us? Give up. */
+ if (err)
+ return;
+
/* We ignore suid/sgid if there are no mappings for them in the ns */
if (!vfsuid_has_mapping(bprm->cred->user_ns, vfsuid) ||
!vfsgid_has_mapping(bprm->cred->user_ns, vfsgid))
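A minimal userspace sketch of the idea behind the fix, assuming a much simplified permission model: take one snapshot of mode/uid/gid, re-run the execute check against that same snapshot, and honor set-uid only if the check still passes. All names (fake_inode, fake_may_exec(), fake_should_setuid()) are hypothetical; ACLs, capabilities, idmapped mounts and inode_lock() are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for inode metadata and credentials. */
struct fake_inode {
	unsigned int mode;	/* permission bits, including set-uid */
	unsigned int uid;
	unsigned int gid;
};

struct fake_cred {
	unsigned int uid;
	unsigned int gid;	/* a single group, for simplicity */
};

#define FAKE_ISUID 04000u
#define FAKE_IXUSR 00100u
#define FAKE_IXGRP 00010u
#define FAKE_IXOTH 00001u

/* Owner/group/other execute check only: no ACLs, capabilities or idmaps. */
static bool fake_may_exec(const struct fake_inode *inode,
			  const struct fake_cred *cred)
{
	if (cred->uid == inode->uid)
		return inode->mode & FAKE_IXUSR;
	if (cred->gid == inode->gid)
		return inode->mode & FAKE_IXGRP;
	return inode->mode & FAKE_IXOTH;
}

/*
 * Decide whether exec should switch to the file owner's uid. `inode` is the
 * metadata as it looks now, possibly after a racing chmod. The kernel takes
 * its snapshot and calls inode_permission() under inode_lock(); no locking
 * is modeled here.
 */
static bool fake_should_setuid(const struct fake_inode *inode,
			       const struct fake_cred *cred)
{
	const struct fake_inode snap = *inode;	/* one consistent snapshot */

	/* Re-check exec permission against the same snapshot. */
	if (!fake_may_exec(&snap, cred))
		return false;	/* exec bit vanished: ignore set-uid */

	return snap.mode & FAKE_ISUID;
}
```

With the "chmod o-x,u+s" example from the commit message, a caller outside group cdrom fails the re-check against the new mode 04754, so set-uid is ignored, while a cdrom member still passes it.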
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081405-fender-shortcut-a18f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage")
e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap")
9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap")
f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap")
4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap")
13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap")
77435322777d ("fs: port ->get_acl() to pass mnt_idmap")
011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap")
5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap")
c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap")
7a77db95511c ("fs: port ->symlink() to pass mnt_idmap")
6c960e68aaed ("fs: port ->create() to pass mnt_idmap")
b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap")
c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap")
abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap")
6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3")
thanks,
greg k-h
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081458-grumbly-glance-dff9@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage")
e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap")
9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap")
f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap")
4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap")
13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap")
77435322777d ("fs: port ->get_acl() to pass mnt_idmap")
011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap")
5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap")
c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap")
7a77db95511c ("fs: port ->symlink() to pass mnt_idmap")
6c960e68aaed ("fs: port ->create() to pass mnt_idmap")
b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap")
c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap")
abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap")
6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3")
thanks,
greg k-h
We recently made GUP's common page table walking code also walk hugetlb
VMAs without most of the hugetlb special-casing, preparing for a future
with less hugetlb-specific page table walking code in the codebase.
It turns out that we missed one page table locking detail: page table
locking for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). As GUP walks the
page tables, it will perform a pte_offset_map_lock() to grab the PTE
table lock.
However, hugetlb code that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the two
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs with USE_SPLIT_PMD_PTLOCKS.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------
[ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
[ 3105.944634] Modules linked in: [...]
[ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
[ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
[ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3105.991108] pc : try_grab_folio+0x11c/0x188
[ 3105.994013] lr : follow_page_pte+0xd8/0x430
[ 3105.996986] sp : ffff80008eafb8f0
[ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
[ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
[ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
[ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
[ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
[ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
[ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
[ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
[ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
[ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
[ 3106.047957] Call trace:
[ 3106.049522] try_grab_folio+0x11c/0x188
[ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
[ 3106.055527] follow_page_mask+0x1a0/0x2b8
[ 3106.058118] __get_user_pages+0xf0/0x348
[ 3106.060647] faultin_page_range+0xb0/0x360
[ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would. Add ptep_lockptr() to obtain the PTE
page table lock using a pte pointer -- unfortunately we cannot convert
pte_lockptr() because virt_to_page() doesn't work with kmap'ed page
tables we can have with CONFIG_HIGHPTE.
Take care of PTE tables possibly spanning multiple pages, and take care of
CONFIG_PGTABLE_LEVELS complexity when, e.g., PMD_SIZE == PUD_SIZE. For
example, with CONFIG_PGTABLE_LEVELS == 2 and hugepagesize == PMD_SIZE,
core-mm would detect pmd_leaf() and use pmd_lockptr(), which would end
up mapping to the per-MM PT lock.
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to take
the PMD table lock, core-mm would grab the PTE table lock of one of the
two PTE page tables. In such corner cases, we have to make sure that both
locks match, which is (fortunately!) currently guaranteed for 8xx as it
does not support SMP and consequently doesn't use split PT locks.
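The size-based lock selection in the patched huge_pte_lockptr() can be sketched in plain C as below. This is a hypothetical model: the enum stands in for the *_lockptr() helpers, the geometry fields stand in for compile-time constants, and the case where split PT locks are disabled (every helper then returns the per-MM lock) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Which page-table lock to take; a hypothetical enum for illustration. */
enum pt_lock_kind {
	LOCK_PTE_TABLE,	/* like ptep_lockptr() */
	LOCK_PMD_TABLE,	/* like pmd_lockptr() */
	LOCK_PUD_TABLE,	/* like pud_lockptr() */
	LOCK_MM,	/* like &mm->page_table_lock */
};

/* Hypothetical runtime copy of the kernel's compile-time geometry. */
struct pt_geometry {
	unsigned long long pmd_size;
	unsigned long long pud_size;
	unsigned long long p4d_size;
	int pgtable_levels;	/* CONFIG_PGTABLE_LEVELS */
	bool highpte;		/* CONFIG_HIGHPTE */
};

/* Same branch order as the patched huge_pte_lockptr(). */
static enum pt_lock_kind pick_hugetlb_lock(const struct pt_geometry *g,
					   unsigned long long size)
{
	if (size < g->pmd_size && !g->highpte)
		return LOCK_PTE_TABLE;
	else if (size < g->pud_size || g->pgtable_levels == 2)
		return LOCK_PMD_TABLE;
	else if (size < g->p4d_size || g->pgtable_levels == 3)
		return LOCK_PUD_TABLE;
	return LOCK_MM;
}
```

Assuming a 4 KiB-page, 4-level arm64 layout (PMD 2 MiB, PUD 1 GiB, P4D 512 GiB), a 64 KiB folio maps to the PTE table lock and a 32 MiB folio to the PMD table lock. With 2 levels and PMD_SIZE == PUD_SIZE, a PMD-sized folio still takes the PMD path, matching the CONFIG_PGTABLE_LEVELS == 2 case described in the message.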
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Reviewed-by: James Houghton <jthoughton(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
Third time is the charm?
Retested on arm64 and x86-64. Cross-compiled on a bunch of others.
v2 -> v3:
* Handle CONFIG_PGTABLE_LEVELS oddities as well as possible. It's a mess.
Remove the size >= P4D_SIZE check and simply default to the
&mm->page_table_lock.
* Align the PTE pointer to the start of the page table to handle PTE page
tables bigger than a single page (unclear if this could currently trigger).
* Extend patch description
v1 -> v2:
* Extend patch description
* Drop "mm: let pte_lockptr() consume a pte_t pointer"
* Introduce ptep_lockptr() in this patch
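The v3 note about aligning the PTE pointer corresponds to the mask in ptep_pgtable_page(), ~(PTRS_PER_PTE * sizeof(pte_t) - 1). The arithmetic alone can be sketched as follows, assuming the table is naturally aligned to its own power-of-two size; table_base() is a hypothetical helper and the addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a pointer to one entry of a page table down to the start of that
 * table, assuming the table is aligned to its own size
 * (ptrs_per_table * entry_size, a power of two).
 */
static uintptr_t table_base(uintptr_t entry_addr, uintptr_t ptrs_per_table,
			    uintptr_t entry_size)
{
	uintptr_t table_bytes = ptrs_per_table * entry_size;

	assert((table_bytes & (table_bytes - 1)) == 0);	/* power of two */
	return entry_addr & ~(table_bytes - 1);
}
```

With 2048 eight-byte entries (a 16 KiB table spanning four 4 KiB pages), the entry at 0x23ff8 rounds down to 0x20000, whereas masking with the page size alone would give 0x23000, which is not the table's first page.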
---
include/linux/hugetlb.h | 27 +++++++++++++++++++++++++--
include/linux/mm.h | 22 ++++++++++++++++++++++
2 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c9bf68c239a01..e6437a06e2346 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -944,9 +944,32 @@ static inline bool htlb_allow_alloc_fallback(int reason)
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
- if (huge_page_size(h) == PMD_SIZE)
+ unsigned long size = huge_page_size(h);
+
+ VM_WARN_ON(size == PAGE_SIZE);
+
+ /*
+ * hugetlb must use the exact same PT locks as core-mm page table
+ * walkers would. When modifying a PTE table, hugetlb must take the
+ * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+ * PT lock etc.
+ *
+ * The expectation is that any hugetlb folio smaller than a PMD is
+ * always mapped into a single PTE table and that any hugetlb folio
+ * smaller than a PUD (but at least as big as a PMD) is always mapped
+ * into a single PMD table.
+ *
+ * If that does not hold for an architecture, then that architecture
+ * must disable split PT locks such that all *_lockptr() functions
+ * will give us the same result: the per-MM PT lock.
+ */
+ if (size < PMD_SIZE && !IS_ENABLED(CONFIG_HIGHPTE))
+ /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
+ return ptep_lockptr(mm, pte);
+ else if (size < PUD_SIZE || CONFIG_PGTABLE_LEVELS == 2)
return pmd_lockptr(mm, (pmd_t *) pte);
- VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
+ else if (size < P4D_SIZE || CONFIG_PGTABLE_LEVELS == 3)
+ return pud_lockptr(mm, (pud_t *) pte);
return &mm->page_table_lock;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b100df8cb5857..f6c7fe8f5746f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2926,6 +2926,24 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
}
+static inline struct page *ptep_pgtable_page(pte_t *pte)
+{
+ unsigned long mask = ~(PTRS_PER_PTE * sizeof(pte_t) - 1);
+
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
+ return virt_to_page((void *)((unsigned long)pte & mask));
+}
+
+static inline struct ptdesc *ptep_ptdesc(pte_t *pte)
+{
+ return page_ptdesc(ptep_pgtable_page(pte));
+}
+
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return ptlock_ptr(ptep_ptdesc(pte));
+}
+
static inline bool ptlock_init(struct ptdesc *ptdesc)
{
/*
@@ -2950,6 +2968,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
return &mm->page_table_lock;
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return &mm->page_table_lock;
+}
static inline void ptlock_cache_init(void) {}
static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
static inline void ptlock_free(struct ptdesc *ptdesc) {}
--
2.45.2
The fence lock is part of the queue; therefore, in the current design,
anything locking the fence must also hold a ref to the queue to prevent
the queue from being freed.
However, it currently looks like we signal the fence and then drop the
queue ref. If something is waiting on the fence, the waiter is woken at
some later point, whereupon it first grabs the lock before checking the
fence state. But if we have already dropped the queue ref, the lock
might already have been freed along with the queue, leading to a
use-after-free (UAF).
To prevent this, move the fence lock into the fence itself so we don't
run into lifetime issues. An alternative might be to have a device-level
lock, or to release the queue only in the fence release callback;
however, that might require pushing to another worker to avoid locking
issues.
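The lifetime argument can be illustrated with a single-threaded userspace sketch, assuming hypothetical stand-ins for spinlock_t, the exec queue and the preempt fence (this is not the dma_fence API). Because the lock lives inside the fence, a waiter that checks the fence after the last queue reference has been dropped only touches fence-owned memory; with the old layout, the equivalent of &q->lr.lock would point into the freed queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal test-and-set spinlock; a stand-in for spinlock_t. */
struct sketch_lock {
	atomic_bool locked;
};

static void sketch_lock_init(struct sketch_lock *l)
{
	atomic_init(&l->locked, false);
}

static void sketch_spin_lock(struct sketch_lock *l)
{
	while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
		;	/* spin until the previous holder releases it */
}

static void sketch_spin_unlock(struct sketch_lock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

struct sketch_queue {
	int refcount;	/* not atomic: this sketch is single-threaded */
};

/* After the fix: the lock is embedded in the fence, not in the queue. */
struct sketch_fence {
	struct sketch_lock lock;
	bool signaled;
	struct sketch_queue *q;
};

static void queue_put(struct sketch_queue *q)
{
	if (--q->refcount == 0)
		free(q);
}

/* Like xe_preempt_fence_arm(): take a queue ref, init the fence's own lock. */
static void fence_arm(struct sketch_fence *f, struct sketch_queue *q)
{
	q->refcount++;
	f->q = q;
	f->signaled = false;
	sketch_lock_init(&f->lock);
}

/* Signal, then drop the queue ref; this may free the queue. */
static void fence_signal_and_put_queue(struct sketch_fence *f)
{
	sketch_spin_lock(&f->lock);
	f->signaled = true;
	sketch_spin_unlock(&f->lock);
	queue_put(f->q);
	f->q = NULL;
}

/* A late waiter only touches memory owned by the fence, not the queue. */
static bool fence_is_signaled(struct sketch_fence *f)
{
	bool s;

	sketch_spin_lock(&f->lock);
	s = f->signaled;
	sketch_spin_unlock(&f->lock);
	return s;
}
```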
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2454
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2342
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2020
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_exec_queue.c | 1 -
drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 --
drivers/gpu/drm/xe/xe_preempt_fence.c | 3 ++-
drivers/gpu/drm/xe/xe_preempt_fence_types.h | 2 ++
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 971e1234b8ea..0f610d273fb6 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -614,7 +614,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
if (xe_vm_in_preempt_fence_mode(vm)) {
q->lr.context = dma_fence_context_alloc(1);
- spin_lock_init(&q->lr.lock);
err = xe_vm_add_compute_exec_queue(vm, q);
if (XE_IOCTL_DBG(xe, err))
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 1408b02eea53..fc2a1a20b7e4 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -126,8 +126,6 @@ struct xe_exec_queue {
u32 seqno;
/** @lr.link: link into VM's list of exec queues */
struct list_head link;
- /** @lr.lock: preemption fences lock */
- spinlock_t lock;
} lr;
/** @ops: submission backend exec queue operations */
diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c
index 56e709d2fb30..83fbeea5aa20 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence.c
+++ b/drivers/gpu/drm/xe/xe_preempt_fence.c
@@ -134,8 +134,9 @@ xe_preempt_fence_arm(struct xe_preempt_fence *pfence, struct xe_exec_queue *q,
{
list_del_init(&pfence->link);
pfence->q = xe_exec_queue_get(q);
+ spin_lock_init(&pfence->lock);
dma_fence_init(&pfence->base, &preempt_fence_ops,
- &q->lr.lock, context, seqno);
+ &pfence->lock, context, seqno);
return &pfence->base;
}
diff --git a/drivers/gpu/drm/xe/xe_preempt_fence_types.h b/drivers/gpu/drm/xe/xe_preempt_fence_types.h
index b54b5c29b533..312c3372a49f 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence_types.h
+++ b/drivers/gpu/drm/xe/xe_preempt_fence_types.h
@@ -25,6 +25,8 @@ struct xe_preempt_fence {
struct xe_exec_queue *q;
/** @preempt_work: work struct which issues preemption */
struct work_struct preempt_work;
+ /** @lock: dma-fence fence lock */
+ spinlock_t lock;
/** @error: preempt fence is in error state */
int error;
};
--
2.46.0
[ I'm sorry for the noise if you get this patch 2x ]
Kernel 6.9 moved the client RPC call statistics into the network namespace
with the "Make nfs stats visible in network NS" patchset.
https://lore.kernel.org/linux-nfs/cover.1708026931.git.josef@toxicpanda.com/
Signed-off-by: Petr Vorel <pvorel(a)suse.cz>
---
Changes v1->v2:
* Point out whole patchset, not just single commit
* Add a comment about the patchset
Hi all,
could you please ack this so that the test is fixed for mainline?
FYI, some parts have been backported, e.g.:
d47151b79e322 ("nfs: expose /proc/net/sunrpc/nfs in net namespaces")
to all stable/LTS: 5.4.276, 5.10.217, 5.15.159, 6.1.91, 6.6.31.
But most of it has not been backported yet (though that is planned), e.g.:
93483ac5fec62 ("nfsd: expose /proc/net/sunrpc/nfsd in net namespaces")
see Chuck's patchset for 6.6
https://lore.kernel.org/linux-nfs/20240812223604.32592-1-cel@kernel.org/
Once all kernels back to 5.4 are fixed, we should update the version.
Kind regards,
Petr
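The new host-selection logic in get_calls() reduces to a small truth table. A C sketch of it, assuming that type defaults to "lhost" before the check (the default is outside the hunk) and that nfs_f is either "nfs" or "nfsd"; stats_host() is a hypothetical name, and the tst_kvcmp -ge "6.9" result becomes a boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns the host whose /proc/net/rpc/<nfs_f> get_calls() would read after
 * this patch. Assumes `type` starts out as "lhost".
 */
static const char *stats_host(bool use_netns, bool kernel_ge_6_9,
			      const char *nfs_f)
{
	bool is_client = strcmp(nfs_f, "nfs") == 0;

	if (use_netns)	/* client stats are per-netns only since 6.9 */
		return (kernel_ge_6_9 && is_client) ? "rhost" : "lhost";
	return is_client ? "lhost" : "rhost";	/* unchanged pre-patch logic */
}
```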
testcases/network/nfs/nfsstat01/nfsstat01.sh | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/testcases/network/nfs/nfsstat01/nfsstat01.sh b/testcases/network/nfs/nfsstat01/nfsstat01.sh
index c2856eff1f..1beecbec43 100755
--- a/testcases/network/nfs/nfsstat01/nfsstat01.sh
+++ b/testcases/network/nfs/nfsstat01/nfsstat01.sh
@@ -15,7 +15,14 @@ get_calls()
local calls opt
[ "$name" = "rpc" ] && opt="r" || opt="n"
- ! tst_net_use_netns && [ "$nfs_f" != "nfs" ] && type="rhost"
+
+ if tst_net_use_netns; then
+ # "Make nfs stats visible in network NS" patchset
+ # https://lore.kernel.org/linux-nfs/cover.1708026931.git.josef@toxicpanda.com/
+ tst_kvcmp -ge "6.9" && [ "$nfs_f" = "nfs" ] && type="rhost"
+ else
+ [ "$nfs_f" != "nfs" ] && type="rhost"
+ fi
if [ "$type" = "lhost" ]; then
calls="$(grep $name /proc/net/rpc/$nfs_f | cut -d' ' -f$field)"
--
2.45.2
The conversion of a system address to a physical memory address (as
viewed by the memory controller) by igen6_edac is incorrect when the
system address is above the TOM (Total amount Of populated physical
Memory) for Elkhart Lake and Ice Lake (Neural Network Processor). Fix
this conversion.
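A sketch of the branches visible in the hunk, with igen6_tolud and igen6_tom passed as parameters instead of read from globals. err_addr_to_imc_addr() is a hypothetical name, and any handling of addresses below TOLUD earlier in the real function is not shown in the hunk and is omitted. Under the old condition, eaddr < _4GB, an address above TOM (and therefore above 4 GiB) fell through and was returned unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_4GB (1ULL << 32)

/* Only the branches visible in the patched ehl_err_addr_to_imc_addr(). */
static uint64_t err_addr_to_imc_addr(uint64_t eaddr, uint64_t tolud,
				     uint64_t tom)
{
	if (tom <= SKETCH_4GB)
		return eaddr + tolud - SKETCH_4GB;

	/* Fixed condition: only addresses at or above TOM are remapped. */
	if (eaddr >= tom)
		return eaddr + tolud - tom;

	return eaddr;
}
```

With made-up values TOLUD = 2 GiB and TOM = 6 GiB, a 7 GiB system address maps to 3 GiB, while a 5 GiB address is returned unchanged.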
Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo(a)intel.com>
---
drivers/edac/igen6_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 0fe75eed8973..189a2fc29e74 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -316,7 +316,7 @@ static u64 ehl_err_addr_to_imc_addr(u64 eaddr, int mc)
if (igen6_tom <= _4GB)
return eaddr + igen6_tolud - _4GB;
- if (eaddr < _4GB)
+ if (eaddr >= igen6_tom)
return eaddr + igen6_tolud - igen6_tom;
return eaddr;
--
2.17.1
From: Mahesh Salgaonkar <mahesh(a)linux.ibm.com>
nmi_enter()/nmi_exit() touch per-CPU variables, which can lead to a kernel
crash when invoked during real mode interrupt handling (e.g. early HMI/MCE
interrupt handler) if the percpu allocation comes from the vmalloc area.
Early HMI/MCE handlers are called through the DEFINE_INTERRUPT_HANDLER_NMI()
wrapper, which invokes nmi_enter()/nmi_exit(). We don't see any issue when
the percpu allocation is from the embedded first chunk. However, with
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, percpu allocations can come
from the vmalloc area.
With the kernel command line "percpu_alloc=page" we can force percpu
allocation to come from the vmalloc area and see a kernel crash in
machine_check_early:
[ 1.215714] NIP [c000000000e49eb4] rcu_nmi_enter+0x24/0x110
[ 1.215717] LR [c0000000000461a0] machine_check_early+0xf0/0x2c0
[ 1.215719] --- interrupt: 200
[ 1.215720] [c000000fffd73180] [0000000000000000] 0x0 (unreliable)
[ 1.215722] [c000000fffd731b0] [0000000000000000] 0x0
[ 1.215724] [c000000fffd73210] [c000000000008364] machine_check_early_common+0x134/0x1f8
Fix this by avoiding the use of nmi_enter()/nmi_exit() in real mode if the
percpu first chunk is not embedded.
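The backported condition can be read as a pure predicate. The sketch below takes each term as a boolean (the parameter names are hypothetical) and mirrors the patched expression: the pre-existing pseries-hash test AND NOT percpu_first_chunk_is_paged. Note that, as written in this backport, nmi_enter() is skipped whenever the first chunk is paged, regardless of MSR_DR.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Inputs mirror IS_ENABLED(CONFIG_PPC_BOOK3S_64),
 * firmware_has_feature(FW_FEATURE_LPAR), radix_enabled(),
 * (mfmsr() & MSR_DR) and percpu_first_chunk_is_paged.
 */
static bool should_call_nmi_enter(bool book3s_64, bool lpar, bool radix,
				  bool msr_dr, bool first_chunk_paged)
{
	/* Pre-existing check: not a pseries hash guest in real mode. */
	bool legacy_ok = !book3s_64 || !lpar || radix || msr_dr;

	/* New term: skip nmi_enter() whenever the first chunk is paged. */
	return legacy_ok && !first_chunk_paged;
}
```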
CVE-2024-42126
Cc: stable(a)vger.kernel.org#5.15.x
Cc: gregkh(a)linuxfoundation.org
Reviewed-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Tested-by: Shirisha Ganta <shirisha(a)linux.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20240410043006.81577-1-mahesh@linux.ibm.com
[ Conflicts in arch/powerpc/include/asm/interrupt.h
because interrupt_nmi_enter_prepare() and interrupt_nmi_exit_prepare()
have been refactored. ]
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
---
arch/powerpc/include/asm/interrupt.h | 14 ++++++++++----
arch/powerpc/include/asm/percpu.h | 10 ++++++++++
arch/powerpc/kernel/setup_64.c | 2 ++
3 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index e592e65e7665..49285b147afe 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -285,18 +285,24 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
/*
* Do not use nmi_enter() for pseries hash guest taking a real-mode
* NMI because not everything it touches is within the RMA limit.
+ *
+ * Likewise, do not use it in real mode if percpu first chunk is not
+ * embedded. With CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there
+ * are chances where percpu allocation can come from vmalloc area.
*/
- if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
+ if ((!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
!firmware_has_feature(FW_FEATURE_LPAR) ||
- radix_enabled() || (mfmsr() & MSR_DR))
+ radix_enabled() || (mfmsr() & MSR_DR)) &&
+ !percpu_first_chunk_is_paged)
nmi_enter();
}
static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
{
- if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
+ if ((!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
!firmware_has_feature(FW_FEATURE_LPAR) ||
- radix_enabled() || (mfmsr() & MSR_DR))
+ radix_enabled() || (mfmsr() & MSR_DR)) &&
+ !percpu_first_chunk_is_paged)
nmi_exit();
/*
diff --git a/arch/powerpc/include/asm/percpu.h b/arch/powerpc/include/asm/percpu.h
index 8e5b7d0b851c..634970ce13c6 100644
--- a/arch/powerpc/include/asm/percpu.h
+++ b/arch/powerpc/include/asm/percpu.h
@@ -15,6 +15,16 @@
#endif /* CONFIG_SMP */
#endif /* __powerpc64__ */
+#if defined(CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK) && defined(CONFIG_SMP)
+#include <linux/jump_label.h>
+DECLARE_STATIC_KEY_FALSE(__percpu_first_chunk_is_paged);
+
+#define percpu_first_chunk_is_paged \
+ (static_key_enabled(&__percpu_first_chunk_is_paged.key))
+#else
+#define percpu_first_chunk_is_paged false
+#endif /* CONFIG_PPC64 && CONFIG_SMP */
+
#include <asm-generic/percpu.h>
#include <asm/paca.h>
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index eaa79a0996d1..37d5683ab298 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -825,6 +825,7 @@ static int pcpu_cpu_distance(unsigned int from, unsigned int to)
unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
EXPORT_SYMBOL(__per_cpu_offset);
+DEFINE_STATIC_KEY_FALSE(__percpu_first_chunk_is_paged);
static void __init pcpu_populate_pte(unsigned long addr)
{
@@ -904,6 +905,7 @@ void __init setup_per_cpu_areas(void)
if (rc < 0)
panic("cannot initialize percpu area (err=%d)", rc);
+ static_key_enable(&__percpu_first_chunk_is_paged.key);
delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
for_each_possible_cpu(cpu) {
__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
--
2.34.1
When formatting a suspended disk (for example, to change the DIF type),
the disk is resumed first, and the format command is then submitted to
the disk through the SG_IO ioctl.
While the disk is processing the format command, the system submits no
other commands to it, so it attempts to suspend the disk again and sends
a SYNC CACHE command. However, the SYNC CACHE command fails because the
disk is still being formatted, which leaves the disk's runtime_status in
an error state that is difficult for the user to recover from. Error
info like:
[ 669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
[ 670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[ 670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
[ 670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4
To solve the issue, retry the command until the format command has finished.
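ASC/ASCQ 04h/04h is the SPC code for "LOGICAL UNIT NOT READY, FORMAT IN PROGRESS", which matches the sense data in the log above. The patch itself only maps it to -EBUSY; where the retry actually happens is outside this hunk. The sketch below is illustrative only: it pairs a hypothetical classifier with a bounded retry loop against a simulated disk, and treating every other failure as -EIO is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* ASC/ASCQ 04h/04h: LOGICAL UNIT NOT READY, FORMAT IN PROGRESS. */
static int classify_sync_cache_sense(unsigned char asc, unsigned char ascq)
{
	if (asc == 0x04 && ascq == 0x04)
		return -EBUSY;	/* transient: worth retrying later */
	return -EIO;		/* assumption: everything else is a hard error */
}

/* Outcome of one hypothetical SYNCHRONIZE CACHE attempt. */
struct sync_result {
	bool ok;
	unsigned char asc, ascq;
};

/* Simulated disk that reports "format in progress" for its first N commands. */
struct fake_disk {
	int busy_left;
};

static struct sync_result fake_sync_cache(struct fake_disk *d)
{
	if (d->busy_left > 0) {
		d->busy_left--;
		return (struct sync_result){ .ok = false, .asc = 0x04, .ascq = 0x04 };
	}
	return (struct sync_result){ .ok = true };
}

/* Hypothetical bounded retry loop: retry only while the error is -EBUSY. */
static int sync_cache_with_retry(struct fake_disk *d, int max_tries)
{
	int ret = -EIO;

	for (int i = 0; i < max_tries; i++) {
		struct sync_result r = fake_sync_cache(d);

		if (r.ok)
			return 0;
		ret = classify_sync_cache_sense(r.asc, r.ascq);
		if (ret != -EBUSY)
			return ret;
	}
	return ret;	/* still busy after max_tries attempts */
}
```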
Signed-off-by: Yihang Li <liyihang9(a)huawei.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
---
drivers/scsi/sd.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index adeaa8ab9951..5cd88a8eea73 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1823,6 +1823,11 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
(sshdr.asc == 0x74 && sshdr.ascq == 0x71)) /* drive is password locked */
/* this is no error here */
return 0;
+
+ /* retry if format in progress */
+ if (sshdr.asc == 0x4 && sshdr.ascq == 0x4)
+ return -EBUSY;
+
/*
* This drive doesn't support sync and there's not much
* we can do because this is called during shutdown
--
2.33.0
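A user-space sketch of the decision the patched sd_sync_cache() makes on the sense data. It assumes only the two ASC/ASCQ pairs visible in the hunk above; the other conditions the real function tests are not shown there and are left out. In SPC, ASC 0x04/ASCQ 0x04 under sense key 0x2 (NOT READY) means "logical unit not ready, format in progress". The names `struct sense_info`, `classify_sync_cache_sense()`, `sync_cache_with_retry()` and `fake_formatting_disk()` are hypothetical. Mapping every other failure to -EIO is also an assumption. The bounded retry loop only illustrates how a caller could react to -EBUSY; the kernel's actual retry path is not part of this hunk.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parsed sense header (struct scsi_sense_hdr in the kernel). */
struct sense_info {
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

/*
 * Mirrors only the checks visible in the hunk: a password-locked drive is
 * "no error", a format in progress asks for a retry, and any other failure
 * is reported as -EIO (an assumption made for this sketch).
 */
int classify_sync_cache_sense(const struct sense_info *s)
{
	if (s->asc == 0x74 && s->ascq == 0x71)	/* drive is password locked */
		return 0;
	if (s->asc == 0x04 && s->ascq == 0x04)	/* format in progress: retry */
		return -EBUSY;
	return -EIO;
}

/* Hypothetical command issuer: returns true on success, else fills *s. */
typedef bool (*issue_fn)(struct sense_info *s, void *ctx);

/*
 * Retries while the sense data says a format is still running, for at most
 * max_tries attempts. Returns 0 on success, -EBUSY if the format outlasted
 * every attempt, or the classified error otherwise.
 */
int sync_cache_with_retry(issue_fn issue, void *ctx, int max_tries)
{
	int ret = -EBUSY;

	for (int i = 0; i < max_tries; i++) {
		struct sense_info s = {0};

		if (issue(&s, ctx))
			return 0;
		ret = classify_sync_cache_sense(&s);
		if (ret != -EBUSY)
			return ret;
	}
	return ret;
}

/* Fake disk: reports "format in progress" until *ctx counts down to zero. */
bool fake_formatting_disk(struct sense_info *s, void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		s->sense_key = 0x2;	/* NOT READY */
		s->asc = 0x04;
		s->ascq = 0x04;
		return false;
	}
	return true;
}
```

For example, `sync_cache_with_retry(fake_formatting_disk, &busy, 5)` with `busy = 3` succeeds on the fourth attempt, while `busy = 10` exhausts all five attempts and yields -EBUSY.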
From: Peng Fan <peng.fan(a)nxp.com>
With "quiet" set in bootargs, a power domain failure occurs:
"imx93_power_domain 44462400.power-domain: pd_off timeout: name:
44462400.power-domain, stat: 4"
The current power-on operation treats the ISO state as the flag that
power-on has finished, but this is wrong. If a power-off request arrives
before the power-on operation has really finished, the power-off never
completes: because the previous power-on is still in progress, the
power-off does not actually trigger the hardware state machine. SSAR is
the last step when powering on a domain, so wait for SSAR to complete
during power-on.
Since an EdgeLock Enclave (ELE) handshake is involved in the flow,
increase the timeout to 10 ms for both power-on and power-off to avoid
timeouts.
Cc: <Stable(a)vger.kernel.org>
Fixes: 0a0f7cc25d4a ("soc: imx: add i.MX93 SRC power domain driver")
Reviewed-by: Jacky Bai <ping.bai(a)nxp.com>
Signed-off-by: Peng Fan <peng.fan(a)nxp.com>
---
V2:
Added Fixes tag and Cc stable (per Ulf's comment)
drivers/pmdomain/imx/imx93-pd.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pmdomain/imx/imx93-pd.c b/drivers/pmdomain/imx/imx93-pd.c
index 1e94b499c19b..d750a7dc58d2 100644
--- a/drivers/pmdomain/imx/imx93-pd.c
+++ b/drivers/pmdomain/imx/imx93-pd.c
@@ -20,6 +20,7 @@
#define FUNC_STAT_PSW_STAT_MASK BIT(0)
#define FUNC_STAT_RST_STAT_MASK BIT(2)
#define FUNC_STAT_ISO_STAT_MASK BIT(4)
+#define FUNC_STAT_SSAR_STAT_MASK BIT(8)
struct imx93_power_domain {
struct generic_pm_domain genpd;
@@ -50,7 +51,7 @@ static int imx93_pd_on(struct generic_pm_domain *genpd)
writel(val, addr + MIX_SLICE_SW_CTRL_OFF);
ret = readl_poll_timeout(addr + MIX_FUNC_STAT_OFF, val,
- !(val & FUNC_STAT_ISO_STAT_MASK), 1, 10000);
+ !(val & FUNC_STAT_SSAR_STAT_MASK), 1, 10000);
if (ret) {
dev_err(domain->dev, "pd_on timeout: name: %s, stat: %x\n", genpd->name, val);
return ret;
@@ -72,7 +73,7 @@ static int imx93_pd_off(struct generic_pm_domain *genpd)
writel(val, addr + MIX_SLICE_SW_CTRL_OFF);
ret = readl_poll_timeout(addr + MIX_FUNC_STAT_OFF, val,
- val & FUNC_STAT_PSW_STAT_MASK, 1, 1000);
+ val & FUNC_STAT_PSW_STAT_MASK, 1, 10000);
if (ret) {
dev_err(domain->dev, "pd_off timeout: name: %s, stat: %x\n", genpd->name, val);
return ret;
--
2.37.1
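The fix depends on which status bit is polled. Below is a user-space sketch of the readl_poll_timeout() pattern against a simulated MIX_FUNC_STAT register. It assumes that, during power-on, the ISO bit clears before the SSAR bit does, as the commit message's statement that SSAR is the last step implies. The bit positions come from the patch. `struct fake_mix`, `fake_read_stat()` and `poll_bit_clear()` are hypothetical. Unlike the real macro, which sleeps between reads and gives up after a timeout in microseconds, this loop simply counts reads.

```c
#include <errno.h>
#include <stdint.h>

#define BIT(n)                   (1u << (n))
#define FUNC_STAT_ISO_STAT_MASK  BIT(4)
#define FUNC_STAT_SSAR_STAT_MASK BIT(8)

/*
 * Hypothetical model of MIX_FUNC_STAT during power-on: the ISO bit reads as
 * clear from read number iso_done onward, the SSAR bit from ssar_done onward.
 */
struct fake_mix {
	unsigned int reads;
	unsigned int iso_done;
	unsigned int ssar_done;
};

static uint32_t fake_read_stat(struct fake_mix *m)
{
	uint32_t val = 0;

	m->reads++;
	if (m->reads < m->iso_done)
		val |= FUNC_STAT_ISO_STAT_MASK;
	if (m->reads < m->ssar_done)
		val |= FUNC_STAT_SSAR_STAT_MASK;
	return val;
}

/*
 * Polls until (val & mask) == 0 or max_polls reads have been made, returning
 * 0 or -ETIMEDOUT like readl_poll_timeout(). The last value read is left in
 * *val so the caller can print it, as the driver's dev_err() does.
 */
int poll_bit_clear(struct fake_mix *m, uint32_t mask, unsigned int max_polls,
		   uint32_t *val)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		*val = fake_read_stat(m);
		if (!(*val & mask))
			return 0;
	}
	return -ETIMEDOUT;
}
```

With `{ 0, 3, 8 }`, polling the ISO mask returns after the third read while SSAR is still set. This is the window in which an early power-off request could arrive. Polling the SSAR mask waits until the eighth read.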
Hi Calum,
> Hi Petr,
> There are two sets of changes here, for NFS client, and NFS server.
> The NFS client changes have already been backported from v6.9 all the way to v5.4.
> Here, Chuck is discussing the NFS server changes (and others), which were not backported from v6.9 (actually, a few were, but only to v6.8).
Thanks for the info! I now see the patchset "Make nfsd stats visible in network
ns" [1]. The kernelnewbies entry [2] starts with d98416cc2154 ("nfsd: rename
NFSD_NET_* to NFSD_STATS_*"); the other commits are probably preparatory ones.
Anyway, I'll update the patch to cover the NFS server patchset as well.
Kind regards,
Petr
[1] https://lore.kernel.org/linux-nfs/cover.1706283433.git.josef@toxicpanda.com/
[2] https://kernelnewbies.org/Linux_6.9#File_systems
> Thanks,
> Calum.
> ________________________________
> From: Petr Vorel <pvorel(a)suse.cz>
> Sent: Wednesday, August 14, 2024 8:45:59 AM
> To: cel(a)kernel.org <cel(a)kernel.org>
> Cc: stable(a)vger.kernel.org <stable(a)vger.kernel.org>; linux-nfs(a)vger.kernel.org <linux-nfs(a)vger.kernel.org>; Sherry Yang <sherry.yang(a)oracle.com>; Calum Mackay <calum.mackay(a)oracle.com>; kernel-team(a)fb.com <kernel-team(a)fb.com>; Chuck Lever III <chuck.lever(a)oracle.com>; Cyril Hrubis <chrubis(a)suse.cz>; ltp(a)lists.linux.it <ltp(a)lists.linux.it>
> Subject: Re: [PATCH 6.6.y 00/12] Backport "make svc_stat per-net instead of global"
> Hi Chuck,
> > Following up on:
> > https://lore.kernel.org/linux-nfs/d4b235df-4ee5-4824-9d48-e3b3c1f1f4d1@orac…
> > Here is a backport series targeting origin/linux-6.6.y that closes
> > the information leak described in the above thread. It passes basic
> > NFSD regression testing.
> Thank you for handling this! The link above mentions that it was already
> backported to 5.4 and indeed I see at least d47151b79e322 ("nfs: expose
> /proc/net/sunrpc/nfs in net namespaces") is backported in 5.4, 5.10, 5.15, 6.1.
> And you're now preparing 6.6. Thus we can expect the behavior to have changed
> starting with 5.4 kernels.
> I wonder whether we consider this a fix, and thus expect every kernel newer
> than 5.4 to backport all 12 of these patches.
> Or whether we should relax and just check that the version is at least the one
> which got it in stable/LTS (e.g. >= 5.4.276 || >= 5.10.217 ...). Another question
> is whether enterprise distros will take this patchset.
> BTW, LTP has functionality that points to kernel fixes as a hint, but
> it's usually a single commit. I might need to list all of them.
> Kind regards,
> Petr
> > Review comments welcome.
> > Chuck Lever (2):
> > NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
> > NFSD: Fix frame size warning in svc_export_parse()
> > Josef Bacik (10):
> > sunrpc: don't change ->sv_stats if it doesn't exist
> > nfsd: stop setting ->pg_stats for unused stats
> > sunrpc: pass in the sv_stats struct through svc_create_pooled
> > sunrpc: remove ->pg_stats from svc_program
> > sunrpc: use the struct net as the svc proc private
> > nfsd: rename NFSD_NET_* to NFSD_STATS_*
> > nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
> > nfsd: make all of the nfsd stats per-network namespace
> > nfsd: remove nfsd_stats, make th_cnt a global counter
> > nfsd: make svc_stat per-network namespace instead of global
> > fs/lockd/svc.c | 3 --
> > fs/nfs/callback.c | 3 --
> > fs/nfsd/cache.h | 2 -
> > fs/nfsd/export.c | 32 ++++++++++----
> > fs/nfsd/export.h | 4 +-
> > fs/nfsd/netns.h | 25 +++++++++--
> > fs/nfsd/nfs4proc.c | 6 +--
> > fs/nfsd/nfs4state.c | 3 +-
> > fs/nfsd/nfscache.c | 40 ++++-------------
> > fs/nfsd/nfsctl.c | 16 +++----
> > fs/nfsd/nfsd.h | 1 +
> > fs/nfsd/nfsfh.c | 3 +-
> > fs/nfsd/nfssvc.c | 14 +++---
> > fs/nfsd/stats.c | 54 ++++++++++-------------
> > fs/nfsd/stats.h | 88 ++++++++++++++------------------------
> > fs/nfsd/vfs.c | 6 ++-
> > include/linux/sunrpc/svc.h | 5 ++-
> > net/sunrpc/stats.c | 2 +-
> > net/sunrpc/svc.c | 39 +++++++++++------
> > 19 files changed, 163 insertions(+), 183 deletions(-)
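Petr's second option, treating the fix as present once a stable/LTS branch reaches the release that received the backport, comes down to a per-branch version comparison. A condition such as ">= 5.4.276 || >= 5.10.217" only makes sense when each threshold is applied within its own major.minor series; otherwise 5.10.0 would satisfy ">= 5.4.276". A minimal sketch, assuming only the two thresholds named in the thread. The remaining branches are elided there ("...") and are deliberately not filled in here. `struct kver`, `kver_parse()` and `netns_stats_fix_expected()` are hypothetical names, not LTP API.

```c
#include <stdio.h>

struct kver {
	int major, minor, patch;
};

/* Per-branch thresholds quoted in the thread; other branches were elided. */
static const struct kver stable_fixed[] = {
	{ 5, 4, 276 },
	{ 5, 10, 217 },
};

/* Parses "X.Y[.Z]", ignoring any suffix such as "-default". Returns 0 on success. */
int kver_parse(const char *s, struct kver *v)
{
	v->patch = 0;
	if (sscanf(s, "%d.%d.%d", &v->major, &v->minor, &v->patch) < 2)
		return -1;
	return 0;
}

/*
 * Returns 1 if v is on a listed branch at or above its threshold, 0 if it is
 * on a listed branch but older, and -1 if the branch is not in the table
 * (unknown, so the caller should not assume either way).
 */
int netns_stats_fix_expected(const struct kver *v)
{
	for (size_t i = 0; i < sizeof(stable_fixed) / sizeof(stable_fixed[0]); i++) {
		const struct kver *t = &stable_fixed[i];

		/* Compare patch levels only within the same major.minor series. */
		if (v->major == t->major && v->minor == t->minor)
			return v->patch >= t->patch;
	}
	return -1;
}
```

Returning "unknown" for unlisted branches keeps the check from silently treating, say, a 6.6.y kernel as fixed or unfixed before its threshold is known.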
On 6/28/24 20:06, Rafael J. Wysocki wrote:
> On Fri, Jun 28, 2024 at 12:02 PM Christian Loehle
> <christian.loehle(a)arm.com> wrote:
>>
>> Hi all,
>> so my investigation into teo led to the following fixes.
>>
>> 1/3:
>> As discussed, the utilization threshold is too high: while
>> there are benefits in certain workloads, there are quite a few
>> regressions, too. Revert the Util-awareness patch.
>> This in itself leads to regressions, but part of it can be offset
>> by the later patches.
>> See
>> https://lore.kernel.org/lkml/CAKfTPtA6ZzRR-zMN7sodOW+N_P+GqwNv4tGR+aMB5VXRT…
>> 2/3:
>> Remove the 'recent' intercept logic, see my findings in:
>> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
>> I haven't found a way to salvage this properly, so I removed it.
>> The regular intercept seems to decay fast enough to not need this, but
>> we could change it if it turns out that we need this to ramp up and
>> decay faster.
>> 3/3:
>> The rest of the intercept logic had issues, too.
>> See the commit.
>>
>> Happy for anyone to take a look and test as well.
>>
>> Some numbers for context, comparing:
>> - IO workload (intercept heavy).
>> - Timer workload very low utilization (check for deepest state)
>> - hackbench (high utilization)
>> - Geekbench 5 on Pixel6 (high utilization)
>> Tests 1 to 3 are on RK3399 with CONFIG_HZ=100.
>> target_residencies: 1, 900, 2000
>>
>> 1. IO workload, 5 runs, results sorted, in read IOPS.
>> fio --minimal --time_based --name=fiotest --filename=/dev/nvme0n1 --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1 | cut -d \; -f 8;
>>
>> teo fixed v2:
>> /dev/nvme0n1
>> [4599, 4658, 4692, 4694, 4720]
>> /dev/mmcblk2
>> [5700, 5730, 5735, 5747, 5977]
>> /dev/mmcblk1
>> [2052, 2054, 2066, 2067, 2073]
>>
>> teo mainline:
>> /dev/nvme0n1
>> [3793, 3825, 3846, 3865, 3964]
>> /dev/mmcblk2
>> [3831, 4110, 4154, 4203, 4228]
>> /dev/mmcblk1
>> [1559, 1564, 1596, 1611, 1618]
>>
>> menu:
>> /dev/nvme0n1
>> [2571, 2630, 2804, 2813, 2917]
>> /dev/mmcblk2
>> [4181, 4260, 5062, 5260, 5329]
>> /dev/mmcblk1
>> [1567, 1581, 1585, 1603, 1769]
>>
>>
>> 2. Timer workload (through IO for my convenience 😉 )
>> Results in read IOPS, fio same as above.
>> echo "0 2097152 zero" | dmsetup create dm-zeros
>> echo "0 2097152 delay /dev/mapper/dm-zeros 0 50" | dmsetup create dm-slow
>> (Each IO is delayed by a 50ms timer, so the CPU should be mostly in state2, for 5s total)
>>
>> teo fixed v2:
>> idle_state time
>> 2.0 4.807025
>> -1.0 0.219766
>> 0.0 0.072007
>> 1.0 0.169570
>>
>> 3199 cpu_idle total
>> 38 cpu_idle_miss
>> 31 cpu_idle_miss above
>> 7 cpu_idle_miss below
>>
>> teo mainline:
>> idle_state time
>> 1.0 4.897942
>> -1.0 0.095375
>> 0.0 0.253581
>>
>> 3221 cpu_idle total
>> 1269 cpu_idle_miss
>> 22 cpu_idle_miss above
>> 1247 cpu_idle_miss below
>>
>> menu:
>> idle_state time
>> 2.0 4.295546
>> -1.0 0.234164
>> 1.0 0.356344
>> 0.0 0.401507
>>
>> 3421 cpu_idle total
>> 129 cpu_idle_miss
>> 52 cpu_idle_miss above
>> 77 cpu_idle_miss below
>>
>> Residencies:
>> teo mainline isn't in state2 at all, teo fixed is more in state2 than menu, but
>> both are in state2 the vast majority of the time as expected.
>>
>> tldr: overall teo fixed spends more time in state2 while having
>> fewer idle_miss than menu.
>> teo mainline was just way too aggressive at selecting shallow states.
>>
>> 3. Hackbench, 5 runs
>> for i in $(seq 0 4); do hackbench -l 100 -g 100 ; sleep 1; done
>>
>> teo fixed v2:
>> Time: 4.937
>> Time: 4.898
>> Time: 4.871
>> Time: 4.833
>> Time: 4.898
>>
>> teo mainline:
>> Time: 4.945
>> Time: 5.021
>> Time: 4.927
>> Time: 4.923
>> Time: 5.137
>>
>> menu:
>> Time: 4.964
>> Time: 4.847
>> Time: 4.914
>> Time: 4.841
>> Time: 4.800
>>
>> tldr: all comparable, teo mainline slightly worse
>>
>> 4. Geekbench 5 (multi-core) on Pixel6
>> (Device is cooled for each iteration separately)
>> teo mainline:
>> 3113, 3068, 3079
>> mean 3086.66
>>
>> teo revert util-awareness:
>> 2947, 2941, 2952
>> mean 2946.66 (-4.54%)
>>
>> teo fixed v2:
>> 3032, 3066, 3019
>> mean 3039 (-1.54%)
>>
>>
>> Changes since v2:
>> - Reworded commits according to Dietmar's comments
>> - Dropped the KTIME_MAX as hit part from 3/3 according to Dietmar's
>> remark.
>>
>> Changes since v1:
>> - Removed all non-fixes.
>> - Do a full revert of Util-awareness instead of increasing thresholds.
>> - Address Dietmar's comments.
>> https://lore.kernel.org/linux-kernel/20240606090050.327614-2-christian.loeh…
>>
>> Kind Regards,
>> Christian
>>
>> Christian Loehle (3):
>> Revert: "cpuidle: teo: Introduce util-awareness"
>> cpuidle: teo: Remove recent intercepts metric
>> cpuidle: teo: Don't count non-existent intercepts
>>
>> drivers/cpuidle/governors/teo.c | 189 +++++---------------------------
>> 1 file changed, 27 insertions(+), 162 deletions(-)
>>
>> --
>
> Patches [1-2/3] have been applied as 6.11 material.
>
> Patch [3/3] looks like it may be improved slightly, see my reply to that patch.
>
> Thanks!
Hi Rafael,
are you fine with this being backported to stable?
@stable
4b20b07ce72f cpuidle: teo: Don't count non-existent intercepts
449914398083 cpuidle: teo: Remove recent intercepts metric
0a2998fa48f0 Revert: "cpuidle: teo: Introduce util-awareness"
apply as-is to
linux-6.10.y
linux-6.6.y
For linux-6.1.y, only 449914398083 ("cpuidle: teo: Remove recent intercepts metric")
is relevant; I'll reply with a backport.
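The Geekbench 5 means and percentages in the cover letter are plain arithmetic over the three runs: the mean of each set, then the relative change of each mean against teo mainline. A small sketch that recomputes them from the quoted scores; the function and array names are illustrative only.

```c
#include <stddef.h>

/* Arithmetic mean of n scores. */
double mean(const double *x, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/* Relative change of value against baseline, in percent. */
double relative_change_pct(double value, double baseline)
{
	return (value - baseline) / baseline * 100.0;
}

/* Geekbench 5 multi-core scores on Pixel6, as quoted in the cover letter. */
const double teo_mainline[] = { 3113, 3068, 3079 };
const double teo_revert_util[] = { 2947, 2941, 2952 };
const double teo_fixed_v2[] = { 3032, 3066, 3019 };
```

`relative_change_pct(mean(teo_revert_util, 3), mean(teo_mainline, 3))` gives about -4.54, and the same call with `teo_fixed_v2` gives about -1.54, matching the figures in the cover letter.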
Check the return value of dev_pm_domain_attach_by_name(), stored in
bc->bus_power_dev, using IS_ERR_OR_NULL() instead of plain IS_ERR(), and
fail if bc->bus_power_dev is either an error pointer or NULL.
If the power domain requested through dev_pm_domain_attach_by_name() is
not described in DT, the function returns NULL. That NULL pointer is then
used, which leads to a NULL pointer dereference.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 1a1da28544fd ("soc: imx: imx8m-blk-ctrl: Defer probe if 'bus' genpd is not yet ready")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pmdomain/imx/imx8m-blk-ctrl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pmdomain/imx/imx8m-blk-ctrl.c b/drivers/pmdomain/imx/imx8m-blk-ctrl.c
index ca942d7929c2..d46fb5387148 100644
--- a/drivers/pmdomain/imx/imx8m-blk-ctrl.c
+++ b/drivers/pmdomain/imx/imx8m-blk-ctrl.c
@@ -212,7 +212,7 @@ static int imx8m_blk_ctrl_probe(struct platform_device *pdev)
return -ENOMEM;
bc->bus_power_dev = dev_pm_domain_attach_by_name(dev, "bus");
- if (IS_ERR(bc->bus_power_dev)) {
+ if (IS_ERR_OR_NULL(bc->bus_power_dev)) {
if (PTR_ERR(bc->bus_power_dev) == -ENODEV)
return dev_err_probe(dev, -EPROBE_DEFER,
"failed to attach power domain \"bus\"\n");
--
2.25.1