The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092028-purveyor-limpness-f224@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
763748b238ef ("btrfs: reduce amount of reserved metadata for delayed item insertion")
c9d02ab4b436 ("btrfs: set delayed item type when initializing it")
3bae13e9d42e ("btrfs: do not BUG_ON() on failure to reserve metadata for delayed item")
a176affe547c ("btrfs: assert that delayed item is a dir index item when adding it")
b7ef5f3a6f37 ("btrfs: loop only once over data sizes array when inserting an item batch")
086dcbfa50d3 ("btrfs: insert items in batches when logging a directory when possible")
eb10d85ee77f ("btrfs: factor out the copying loop of dir items from log_dir_items()")
289cffcb0399 ("btrfs: remove no longer needed checks for NULL log context")
5a656c3628b2 ("btrfs: stop doing GFP_KERNEL memory allocations in the ref verify tool")
506650dcb3a7 ("btrfs: improve the batch insertion of delayed items")
bfaa324e9a80 ("btrfs: remove total_data_size variable in btrfs_batch_insert_items()")
bb385bedded3 ("btrfs: fix error handling in __btrfs_update_delayed_inode")
64d6b281ba4d ("btrfs: remove unnecessary check_parent_dirs_for_sync()")
3e6a86a193b0 ("btrfs: skip logging directories already logged when logging all parents")
ab12313a9f56 ("btrfs: avoid logging new ancestor inodes when logging new inode")
47d3db41e190 ("btrfs: fix race that makes inode logging fallback to transaction commit")
4d6221d7d831 ("btrfs: fix race that causes unnecessary logging of ancestor inodes")
5893dfb98f25 ("btrfs: refactor btrfs_drop_extents() to make it easier to extend")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 28 Aug 2023 09:06:43 +0100
Subject: [PATCH] btrfs: remove BUG() after failure to insert delayed dir index
item
Instead of calling BUG() when we fail to insert a delayed dir index item
into the delayed node's tree, we can just release all the resources we
have allocated/acquired before and return the error to the caller. This is
fine because all existing call chains undo anything they have done before
calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending
snapshots in the transaction commit path).
So remove the BUG() call and do proper error handling.
This relates to a syzbot report linked below, but does not fix it because
it only prevents hitting a BUG(), it does not fix the issue where somehow
we attempt to use twice the same index number for different index items.
Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index bb9908cbabfe..6e779bc16340 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1426,7 +1426,29 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH);
}
-/* Will return 0 or -ENOMEM */
+static void btrfs_release_dir_index_item_space(struct btrfs_trans_handle *trans)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+
+ if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
+ return;
+
+ /*
+ * Adding the new dir index item does not require touching another
+ * leaf, so we can release 1 unit of metadata that was previously
+ * reserved when starting the transaction. This applies only to
+ * the case where we had a transaction start and excludes the
+ * transaction join case (when replaying log trees).
+ */
+ trace_btrfs_space_reservation(fs_info, "transaction",
+ trans->transid, bytes, 0);
+ btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
+ ASSERT(trans->bytes_reserved >= bytes);
+ trans->bytes_reserved -= bytes;
+}
+
+/* Will return 0, -ENOMEM or -EEXIST (index number collision, unexpected). */
int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
const char *name, int name_len,
struct btrfs_inode *dir,
@@ -1468,6 +1490,27 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
mutex_lock(&delayed_node->mutex);
+ /*
+ * First attempt to insert the delayed item. This is to make the error
+ * handling path simpler in case we fail (-EEXIST). There's no risk of
+ * any other task coming in and running the delayed item before we do
+ * the metadata space reservation below, because we are holding the
+ * delayed node's mutex and that mutex must also be locked before the
+ * node's delayed items can be run.
+ */
+ ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
+ if (unlikely(ret)) {
+ btrfs_err(trans->fs_info,
+"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
+ name_len, name, index, btrfs_root_id(delayed_node->root),
+ delayed_node->inode_id, dir->index_cnt,
+ delayed_node->index_cnt, ret);
+ btrfs_release_delayed_item(delayed_item);
+ btrfs_release_dir_index_item_space(trans);
+ mutex_unlock(&delayed_node->mutex);
+ goto release_node;
+ }
+
if (delayed_node->index_item_leaves == 0 ||
delayed_node->curr_index_batch_size + data_len > leaf_data_size) {
delayed_node->curr_index_batch_size = data_len;
@@ -1485,37 +1528,14 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
* impossible.
*/
if (WARN_ON(ret)) {
- mutex_unlock(&delayed_node->mutex);
btrfs_release_delayed_item(delayed_item);
+ mutex_unlock(&delayed_node->mutex);
goto release_node;
}
delayed_node->index_item_leaves++;
- } else if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
- const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
-
- /*
- * Adding the new dir index item does not require touching another
- * leaf, so we can release 1 unit of metadata that was previously
- * reserved when starting the transaction. This applies only to
- * the case where we had a transaction start and excludes the
- * transaction join case (when replaying log trees).
- */
- trace_btrfs_space_reservation(fs_info, "transaction",
- trans->transid, bytes, 0);
- btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
- ASSERT(trans->bytes_reserved >= bytes);
- trans->bytes_reserved -= bytes;
- }
-
- ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
- if (unlikely(ret)) {
- btrfs_err(trans->fs_info,
-"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
- name_len, name, index, btrfs_root_id(delayed_node->root),
- delayed_node->inode_id, dir->index_cnt,
- delayed_node->index_cnt, ret);
- BUG();
+ } else {
+ btrfs_release_dir_index_item_space(trans);
}
mutex_unlock(&delayed_node->mutex);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092027-census-monorail-5614@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
763748b238ef ("btrfs: reduce amount of reserved metadata for delayed item insertion")
c9d02ab4b436 ("btrfs: set delayed item type when initializing it")
3bae13e9d42e ("btrfs: do not BUG_ON() on failure to reserve metadata for delayed item")
a176affe547c ("btrfs: assert that delayed item is a dir index item when adding it")
b7ef5f3a6f37 ("btrfs: loop only once over data sizes array when inserting an item batch")
086dcbfa50d3 ("btrfs: insert items in batches when logging a directory when possible")
eb10d85ee77f ("btrfs: factor out the copying loop of dir items from log_dir_items()")
289cffcb0399 ("btrfs: remove no longer needed checks for NULL log context")
5a656c3628b2 ("btrfs: stop doing GFP_KERNEL memory allocations in the ref verify tool")
506650dcb3a7 ("btrfs: improve the batch insertion of delayed items")
bfaa324e9a80 ("btrfs: remove total_data_size variable in btrfs_batch_insert_items()")
bb385bedded3 ("btrfs: fix error handling in __btrfs_update_delayed_inode")
64d6b281ba4d ("btrfs: remove unnecessary check_parent_dirs_for_sync()")
3e6a86a193b0 ("btrfs: skip logging directories already logged when logging all parents")
ab12313a9f56 ("btrfs: avoid logging new ancestor inodes when logging new inode")
47d3db41e190 ("btrfs: fix race that makes inode logging fallback to transaction commit")
4d6221d7d831 ("btrfs: fix race that causes unnecessary logging of ancestor inodes")
5893dfb98f25 ("btrfs: refactor btrfs_drop_extents() to make it easier to extend")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 28 Aug 2023 09:06:43 +0100
Subject: [PATCH] btrfs: remove BUG() after failure to insert delayed dir index
item
Instead of calling BUG() when we fail to insert a delayed dir index item
into the delayed node's tree, we can just release all the resources we
have allocated/acquired before and return the error to the caller. This is
fine because all existing call chains undo anything they have done before
calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending
snapshots in the transaction commit path).
So remove the BUG() call and do proper error handling.
This relates to a syzbot report linked below, but does not fix it because
it only prevents hitting a BUG(), it does not fix the issue where somehow
we attempt to use twice the same index number for different index items.
Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index bb9908cbabfe..6e779bc16340 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1426,7 +1426,29 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH);
}
-/* Will return 0 or -ENOMEM */
+static void btrfs_release_dir_index_item_space(struct btrfs_trans_handle *trans)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+
+ if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
+ return;
+
+ /*
+ * Adding the new dir index item does not require touching another
+ * leaf, so we can release 1 unit of metadata that was previously
+ * reserved when starting the transaction. This applies only to
+ * the case where we had a transaction start and excludes the
+ * transaction join case (when replaying log trees).
+ */
+ trace_btrfs_space_reservation(fs_info, "transaction",
+ trans->transid, bytes, 0);
+ btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
+ ASSERT(trans->bytes_reserved >= bytes);
+ trans->bytes_reserved -= bytes;
+}
+
+/* Will return 0, -ENOMEM or -EEXIST (index number collision, unexpected). */
int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
const char *name, int name_len,
struct btrfs_inode *dir,
@@ -1468,6 +1490,27 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
mutex_lock(&delayed_node->mutex);
+ /*
+ * First attempt to insert the delayed item. This is to make the error
+ * handling path simpler in case we fail (-EEXIST). There's no risk of
+ * any other task coming in and running the delayed item before we do
+ * the metadata space reservation below, because we are holding the
+ * delayed node's mutex and that mutex must also be locked before the
+ * node's delayed items can be run.
+ */
+ ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
+ if (unlikely(ret)) {
+ btrfs_err(trans->fs_info,
+"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
+ name_len, name, index, btrfs_root_id(delayed_node->root),
+ delayed_node->inode_id, dir->index_cnt,
+ delayed_node->index_cnt, ret);
+ btrfs_release_delayed_item(delayed_item);
+ btrfs_release_dir_index_item_space(trans);
+ mutex_unlock(&delayed_node->mutex);
+ goto release_node;
+ }
+
if (delayed_node->index_item_leaves == 0 ||
delayed_node->curr_index_batch_size + data_len > leaf_data_size) {
delayed_node->curr_index_batch_size = data_len;
@@ -1485,37 +1528,14 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
* impossible.
*/
if (WARN_ON(ret)) {
- mutex_unlock(&delayed_node->mutex);
btrfs_release_delayed_item(delayed_item);
+ mutex_unlock(&delayed_node->mutex);
goto release_node;
}
delayed_node->index_item_leaves++;
- } else if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
- const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
-
- /*
- * Adding the new dir index item does not require touching another
- * leaf, so we can release 1 unit of metadata that was previously
- * reserved when starting the transaction. This applies only to
- * the case where we had a transaction start and excludes the
- * transaction join case (when replaying log trees).
- */
- trace_btrfs_space_reservation(fs_info, "transaction",
- trans->transid, bytes, 0);
- btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
- ASSERT(trans->bytes_reserved >= bytes);
- trans->bytes_reserved -= bytes;
- }
-
- ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
- if (unlikely(ret)) {
- btrfs_err(trans->fs_info,
-"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
- name_len, name, index, btrfs_root_id(delayed_node->root),
- delayed_node->inode_id, dir->index_cnt,
- delayed_node->index_cnt, ret);
- BUG();
+ } else {
+ btrfs_release_dir_index_item_space(trans);
}
mutex_unlock(&delayed_node->mutex);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092026-anemia-unwanted-1c2a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
763748b238ef ("btrfs: reduce amount of reserved metadata for delayed item insertion")
c9d02ab4b436 ("btrfs: set delayed item type when initializing it")
3bae13e9d42e ("btrfs: do not BUG_ON() on failure to reserve metadata for delayed item")
a176affe547c ("btrfs: assert that delayed item is a dir index item when adding it")
b7ef5f3a6f37 ("btrfs: loop only once over data sizes array when inserting an item batch")
086dcbfa50d3 ("btrfs: insert items in batches when logging a directory when possible")
eb10d85ee77f ("btrfs: factor out the copying loop of dir items from log_dir_items()")
289cffcb0399 ("btrfs: remove no longer needed checks for NULL log context")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 28 Aug 2023 09:06:43 +0100
Subject: [PATCH] btrfs: remove BUG() after failure to insert delayed dir index
item
Instead of calling BUG() when we fail to insert a delayed dir index item
into the delayed node's tree, we can just release all the resources we
have allocated/acquired before and return the error to the caller. This is
fine because all existing call chains undo anything they have done before
calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending
snapshots in the transaction commit path).
So remove the BUG() call and do proper error handling.
This relates to a syzbot report linked below, but does not fix it because
it only prevents hitting a BUG(), it does not fix the issue where somehow
we attempt to use twice the same index number for different index items.
Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index bb9908cbabfe..6e779bc16340 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1426,7 +1426,29 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH);
}
-/* Will return 0 or -ENOMEM */
+static void btrfs_release_dir_index_item_space(struct btrfs_trans_handle *trans)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+
+ if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
+ return;
+
+ /*
+ * Adding the new dir index item does not require touching another
+ * leaf, so we can release 1 unit of metadata that was previously
+ * reserved when starting the transaction. This applies only to
+ * the case where we had a transaction start and excludes the
+ * transaction join case (when replaying log trees).
+ */
+ trace_btrfs_space_reservation(fs_info, "transaction",
+ trans->transid, bytes, 0);
+ btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
+ ASSERT(trans->bytes_reserved >= bytes);
+ trans->bytes_reserved -= bytes;
+}
+
+/* Will return 0, -ENOMEM or -EEXIST (index number collision, unexpected). */
int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
const char *name, int name_len,
struct btrfs_inode *dir,
@@ -1468,6 +1490,27 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
mutex_lock(&delayed_node->mutex);
+ /*
+ * First attempt to insert the delayed item. This is to make the error
+ * handling path simpler in case we fail (-EEXIST). There's no risk of
+ * any other task coming in and running the delayed item before we do
+ * the metadata space reservation below, because we are holding the
+ * delayed node's mutex and that mutex must also be locked before the
+ * node's delayed items can be run.
+ */
+ ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
+ if (unlikely(ret)) {
+ btrfs_err(trans->fs_info,
+"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
+ name_len, name, index, btrfs_root_id(delayed_node->root),
+ delayed_node->inode_id, dir->index_cnt,
+ delayed_node->index_cnt, ret);
+ btrfs_release_delayed_item(delayed_item);
+ btrfs_release_dir_index_item_space(trans);
+ mutex_unlock(&delayed_node->mutex);
+ goto release_node;
+ }
+
if (delayed_node->index_item_leaves == 0 ||
delayed_node->curr_index_batch_size + data_len > leaf_data_size) {
delayed_node->curr_index_batch_size = data_len;
@@ -1485,37 +1528,14 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
* impossible.
*/
if (WARN_ON(ret)) {
- mutex_unlock(&delayed_node->mutex);
btrfs_release_delayed_item(delayed_item);
+ mutex_unlock(&delayed_node->mutex);
goto release_node;
}
delayed_node->index_item_leaves++;
- } else if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
- const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
-
- /*
- * Adding the new dir index item does not require touching another
- * leaf, so we can release 1 unit of metadata that was previously
- * reserved when starting the transaction. This applies only to
- * the case where we had a transaction start and excludes the
- * transaction join case (when replaying log trees).
- */
- trace_btrfs_space_reservation(fs_info, "transaction",
- trans->transid, bytes, 0);
- btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
- ASSERT(trans->bytes_reserved >= bytes);
- trans->bytes_reserved -= bytes;
- }
-
- ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
- if (unlikely(ret)) {
- btrfs_err(trans->fs_info,
-"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
- name_len, name, index, btrfs_root_id(delayed_node->root),
- delayed_node->inode_id, dir->index_cnt,
- delayed_node->index_cnt, ret);
- BUG();
+ } else {
+ btrfs_release_dir_index_item_space(trans);
}
mutex_unlock(&delayed_node->mutex);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092025-unwell-virtual-9c97@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 28 Aug 2023 09:06:43 +0100
Subject: [PATCH] btrfs: remove BUG() after failure to insert delayed dir index
item
Instead of calling BUG() when we fail to insert a delayed dir index item
into the delayed node's tree, we can just release all the resources we
have allocated/acquired before and return the error to the caller. This is
fine because all existing call chains undo anything they have done before
calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending
snapshots in the transaction commit path).
So remove the BUG() call and do proper error handling.
This relates to a syzbot report linked below, but does not fix it because
it only prevents hitting a BUG(), it does not fix the issue where somehow
we attempt to use twice the same index number for different index items.
Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index bb9908cbabfe..6e779bc16340 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1426,7 +1426,29 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH);
}
-/* Will return 0 or -ENOMEM */
+static void btrfs_release_dir_index_item_space(struct btrfs_trans_handle *trans)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+
+ if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
+ return;
+
+ /*
+ * Adding the new dir index item does not require touching another
+ * leaf, so we can release 1 unit of metadata that was previously
+ * reserved when starting the transaction. This applies only to
+ * the case where we had a transaction start and excludes the
+ * transaction join case (when replaying log trees).
+ */
+ trace_btrfs_space_reservation(fs_info, "transaction",
+ trans->transid, bytes, 0);
+ btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
+ ASSERT(trans->bytes_reserved >= bytes);
+ trans->bytes_reserved -= bytes;
+}
+
+/* Will return 0, -ENOMEM or -EEXIST (index number collision, unexpected). */
int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
const char *name, int name_len,
struct btrfs_inode *dir,
@@ -1468,6 +1490,27 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
mutex_lock(&delayed_node->mutex);
+ /*
+ * First attempt to insert the delayed item. This is to make the error
+ * handling path simpler in case we fail (-EEXIST). There's no risk of
+ * any other task coming in and running the delayed item before we do
+ * the metadata space reservation below, because we are holding the
+ * delayed node's mutex and that mutex must also be locked before the
+ * node's delayed items can be run.
+ */
+ ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
+ if (unlikely(ret)) {
+ btrfs_err(trans->fs_info,
+"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
+ name_len, name, index, btrfs_root_id(delayed_node->root),
+ delayed_node->inode_id, dir->index_cnt,
+ delayed_node->index_cnt, ret);
+ btrfs_release_delayed_item(delayed_item);
+ btrfs_release_dir_index_item_space(trans);
+ mutex_unlock(&delayed_node->mutex);
+ goto release_node;
+ }
+
if (delayed_node->index_item_leaves == 0 ||
delayed_node->curr_index_batch_size + data_len > leaf_data_size) {
delayed_node->curr_index_batch_size = data_len;
@@ -1485,37 +1528,14 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
* impossible.
*/
if (WARN_ON(ret)) {
- mutex_unlock(&delayed_node->mutex);
btrfs_release_delayed_item(delayed_item);
+ mutex_unlock(&delayed_node->mutex);
goto release_node;
}
delayed_node->index_item_leaves++;
- } else if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
- const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
-
- /*
- * Adding the new dir index item does not require touching another
- * leaf, so we can release 1 unit of metadata that was previously
- * reserved when starting the transaction. This applies only to
- * the case where we had a transaction start and excludes the
- * transaction join case (when replaying log trees).
- */
- trace_btrfs_space_reservation(fs_info, "transaction",
- trans->transid, bytes, 0);
- btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
- ASSERT(trans->bytes_reserved >= bytes);
- trans->bytes_reserved -= bytes;
- }
-
- ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
- if (unlikely(ret)) {
- btrfs_err(trans->fs_info,
-"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
- name_len, name, index, btrfs_root_id(delayed_node->root),
- delayed_node->inode_id, dir->index_cnt,
- delayed_node->index_cnt, ret);
- BUG();
+ } else {
+ btrfs_release_dir_index_item_space(trans);
}
mutex_unlock(&delayed_node->mutex);
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092024-maritime-smelting-37a5@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 28 Aug 2023 09:06:43 +0100
Subject: [PATCH] btrfs: remove BUG() after failure to insert delayed dir index
item
Instead of calling BUG() when we fail to insert a delayed dir index item
into the delayed node's tree, we can just release all the resources we
have allocated/acquired before and return the error to the caller. This is
fine because all existing call chains undo anything they have done before
calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending
snapshots in the transaction commit path).
So remove the BUG() call and do proper error handling.
This relates to a syzbot report linked below, but does not fix it because
it only prevents hitting a BUG(), it does not fix the issue where somehow
we attempt to use twice the same index number for different index items.
Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index bb9908cbabfe..6e779bc16340 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1426,7 +1426,29 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH);
}
-/* Will return 0 or -ENOMEM */
+static void btrfs_release_dir_index_item_space(struct btrfs_trans_handle *trans)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+
+ if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
+ return;
+
+ /*
+ * Adding the new dir index item does not require touching another
+ * leaf, so we can release 1 unit of metadata that was previously
+ * reserved when starting the transaction. This applies only to
+ * the case where we had a transaction start and excludes the
+ * transaction join case (when replaying log trees).
+ */
+ trace_btrfs_space_reservation(fs_info, "transaction",
+ trans->transid, bytes, 0);
+ btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
+ ASSERT(trans->bytes_reserved >= bytes);
+ trans->bytes_reserved -= bytes;
+}
+
+/* Will return 0, -ENOMEM or -EEXIST (index number collision, unexpected). */
int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
const char *name, int name_len,
struct btrfs_inode *dir,
@@ -1468,6 +1490,27 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
mutex_lock(&delayed_node->mutex);
+ /*
+ * First attempt to insert the delayed item. This is to make the error
+ * handling path simpler in case we fail (-EEXIST). There's no risk of
+ * any other task coming in and running the delayed item before we do
+ * the metadata space reservation below, because we are holding the
+ * delayed node's mutex and that mutex must also be locked before the
+ * node's delayed items can be run.
+ */
+ ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
+ if (unlikely(ret)) {
+ btrfs_err(trans->fs_info,
+"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
+ name_len, name, index, btrfs_root_id(delayed_node->root),
+ delayed_node->inode_id, dir->index_cnt,
+ delayed_node->index_cnt, ret);
+ btrfs_release_delayed_item(delayed_item);
+ btrfs_release_dir_index_item_space(trans);
+ mutex_unlock(&delayed_node->mutex);
+ goto release_node;
+ }
+
if (delayed_node->index_item_leaves == 0 ||
delayed_node->curr_index_batch_size + data_len > leaf_data_size) {
delayed_node->curr_index_batch_size = data_len;
@@ -1485,37 +1528,14 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
* impossible.
*/
if (WARN_ON(ret)) {
- mutex_unlock(&delayed_node->mutex);
btrfs_release_delayed_item(delayed_item);
+ mutex_unlock(&delayed_node->mutex);
goto release_node;
}
delayed_node->index_item_leaves++;
- } else if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
- const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
-
- /*
- * Adding the new dir index item does not require touching another
- * leaf, so we can release 1 unit of metadata that was previously
- * reserved when starting the transaction. This applies only to
- * the case where we had a transaction start and excludes the
- * transaction join case (when replaying log trees).
- */
- trace_btrfs_space_reservation(fs_info, "transaction",
- trans->transid, bytes, 0);
- btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
- ASSERT(trans->bytes_reserved >= bytes);
- trans->bytes_reserved -= bytes;
- }
-
- ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
- if (unlikely(ret)) {
- btrfs_err(trans->fs_info,
-"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
- name_len, name, index, btrfs_root_id(delayed_node->root),
- delayed_node->inode_id, dir->index_cnt,
- delayed_node->index_cnt, ret);
- BUG();
+ } else {
+ btrfs_release_dir_index_item_space(trans);
}
mutex_unlock(&delayed_node->mutex);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092013-slightly-hubcap-822b@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
487781796d30 ("btrfs: make fast fsyncs wait only for writeback")
75b463d2b47a ("btrfs: do not commit logs and transactions during link and rename operations")
260db43cd2f5 ("btrfs: delete duplicated words + other fixes in comments")
3ef64143a796 ("btrfs: remove no longer used trans_list member of struct btrfs_ordered_extent")
cd8d39f4aeb3 ("btrfs: remove no longer used log_list member of struct btrfs_ordered_extent")
7af597433d43 ("btrfs: make full fsyncs always operate on the entire file again")
0a8068a3dd42 ("btrfs: make ranged full fsyncs more efficient")
da447009a256 ("btrfs: factor out inode items copy loop from btrfs_log_inode()")
a5eeb3d17b97 ("btrfs: add helper to get the end offset of a file extent item")
95418ed1d107 ("btrfs: fix missing file extent item for hole after ranged fsync")
3f1c64ce0438 ("btrfs: delete the ordered isize update code")
41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
236ebc20d9af ("btrfs: fix log context list corruption after rename whiteout error")
b5e4ff9d465d ("Btrfs: fix infinite loop during fsync after rename operations")
0e56315ca147 ("Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES")
9c7d3a548331 ("btrfs: move extent_io_tree defs to their own header")
6f0d04f8e72e ("btrfs: separate out the extent io init function")
33ca832fefa5 ("btrfs: separate out the extent leak code")
afd7a71872f1 ("Merge tag 'for-5.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 24 Aug 2023 16:59:04 -0400
Subject: [PATCH] btrfs: check for BTRFS_FS_ERROR in pending ordered assert
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This is obviously incorrect, and the code
properly deals with the case that the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
CC: stable(a)vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b46ab348e8e5..345c449d588c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -639,7 +639,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
- ASSERT(trans);
+ ASSERT(trans || BTRFS_FS_ERROR(fs_info));
if (trans) {
if (atomic_dec_and_test(&trans->pending_ordered))
wake_up(&trans->pending_wait);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092012-congress-stubborn-56b5@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
487781796d30 ("btrfs: make fast fsyncs wait only for writeback")
75b463d2b47a ("btrfs: do not commit logs and transactions during link and rename operations")
260db43cd2f5 ("btrfs: delete duplicated words + other fixes in comments")
3ef64143a796 ("btrfs: remove no longer used trans_list member of struct btrfs_ordered_extent")
cd8d39f4aeb3 ("btrfs: remove no longer used log_list member of struct btrfs_ordered_extent")
7af597433d43 ("btrfs: make full fsyncs always operate on the entire file again")
0a8068a3dd42 ("btrfs: make ranged full fsyncs more efficient")
da447009a256 ("btrfs: factor out inode items copy loop from btrfs_log_inode()")
a5eeb3d17b97 ("btrfs: add helper to get the end offset of a file extent item")
95418ed1d107 ("btrfs: fix missing file extent item for hole after ranged fsync")
3f1c64ce0438 ("btrfs: delete the ordered isize update code")
41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
236ebc20d9af ("btrfs: fix log context list corruption after rename whiteout error")
b5e4ff9d465d ("Btrfs: fix infinite loop during fsync after rename operations")
0e56315ca147 ("Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES")
9c7d3a548331 ("btrfs: move extent_io_tree defs to their own header")
6f0d04f8e72e ("btrfs: separate out the extent io init function")
33ca832fefa5 ("btrfs: separate out the extent leak code")
afd7a71872f1 ("Merge tag 'for-5.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 24 Aug 2023 16:59:04 -0400
Subject: [PATCH] btrfs: check for BTRFS_FS_ERROR in pending ordered assert
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This is obviously incorrect, and the code
properly deals with the case that the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
CC: stable(a)vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b46ab348e8e5..345c449d588c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -639,7 +639,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
- ASSERT(trans);
+ ASSERT(trans || BTRFS_FS_ERROR(fs_info));
if (trans) {
if (atomic_dec_and_test(&trans->pending_ordered))
wake_up(&trans->pending_wait);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092011-resume-sneezing-595f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
487781796d30 ("btrfs: make fast fsyncs wait only for writeback")
75b463d2b47a ("btrfs: do not commit logs and transactions during link and rename operations")
260db43cd2f5 ("btrfs: delete duplicated words + other fixes in comments")
3ef64143a796 ("btrfs: remove no longer used trans_list member of struct btrfs_ordered_extent")
cd8d39f4aeb3 ("btrfs: remove no longer used log_list member of struct btrfs_ordered_extent")
7af597433d43 ("btrfs: make full fsyncs always operate on the entire file again")
0a8068a3dd42 ("btrfs: make ranged full fsyncs more efficient")
da447009a256 ("btrfs: factor out inode items copy loop from btrfs_log_inode()")
a5eeb3d17b97 ("btrfs: add helper to get the end offset of a file extent item")
95418ed1d107 ("btrfs: fix missing file extent item for hole after ranged fsync")
3f1c64ce0438 ("btrfs: delete the ordered isize update code")
41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
236ebc20d9af ("btrfs: fix log context list corruption after rename whiteout error")
b5e4ff9d465d ("Btrfs: fix infinite loop during fsync after rename operations")
0e56315ca147 ("Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES")
9c7d3a548331 ("btrfs: move extent_io_tree defs to their own header")
6f0d04f8e72e ("btrfs: separate out the extent io init function")
33ca832fefa5 ("btrfs: separate out the extent leak code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 24 Aug 2023 16:59:04 -0400
Subject: [PATCH] btrfs: check for BTRFS_FS_ERROR in pending ordered assert
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This is obviously incorrect, and the code
properly deals with the case that the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
CC: stable(a)vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b46ab348e8e5..345c449d588c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -639,7 +639,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
- ASSERT(trans);
+ ASSERT(trans || BTRFS_FS_ERROR(fs_info));
if (trans) {
if (atomic_dec_and_test(&trans->pending_ordered))
wake_up(&trans->pending_wait);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ee34a82e890a7babb5585daf1a6dd7d4d1cf142a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092056-manifesto-shame-79c2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ee34a82e890a ("btrfs: release path before inode lookup during the ino lookup ioctl")
0202e83fdab0 ("btrfs: simplify iget helpers")
c75e839414d3 ("btrfs: kill the subvol_srcu")
5c8fd99fec9d ("btrfs: make inodes hold a ref on their roots")
7ac8b88ee668 ("btrfs: backref, only collect file extent items matching backref offset")
0024652895e3 ("btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root")
bd647ce385ec ("btrfs: add a leak check for roots")
8260edba67a2 ("btrfs: make the init of static elements in fs_info separate")
ae18c37ad5a1 ("btrfs: move fs_info init work into it's own helper function")
141386e1a5d6 ("btrfs: free more things in btrfs_free_fs_info")
bc44d7c4b2b1 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root")
81f096edf047 ("btrfs: use btrfs_put_fs_root to free roots always")
0d4b0463011d ("btrfs: export and rename free_fs_info")
fbb0ce40d606 ("btrfs: hold a ref on the root in btrfs_check_uuid_tree_entry")
ca2037fba6af ("btrfs: hold a ref on the root in btrfs_recover_log_trees")
5119cfc36f6d ("btrfs: hold a ref on the root in create_pending_snapshot")
5168489a079a ("btrfs: hold a ref on the root in get_subvol_name_from_objectid")
6f9a3da5da9e ("btrfs: hold a ref on the root in btrfs_ioctl_send")
fd79d43b347e ("btrfs: hold a ref on the root in scrub_print_warning_inode")
0b2dee5cff74 ("btrfs: hold a ref for the root in btrfs_find_orphan_roots")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee34a82e890a7babb5585daf1a6dd7d4d1cf142a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Sat, 26 Aug 2023 11:28:20 +0100
Subject: [PATCH] btrfs: release path before inode lookup during the ino lookup
ioctl
During the ino lookup ioctl we can end up calling btrfs_iget() to get an
inode reference while we are holding on a root's btree. If btrfs_iget()
needs to lookup the inode from the root's btree, because it's not
currently loaded in memory, then it will need to lock another or the
same path in the same root btree. This may result in a deadlock and
trigger the following lockdep splat:
WARNING: possible circular locking dependency detected
6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Not tainted
------------------------------------------------------
syz-executor277/5012 is trying to acquire lock:
ffff88802df41710 (btrfs-tree-01){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
but task is already holding lock:
ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-tree-00){++++}-{3:3}:
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_search_slot+0x13a4/0x2f80 fs/btrfs/ctree.c:2302
btrfs_init_root_free_objectid+0x148/0x320 fs/btrfs/disk-io.c:4955
btrfs_init_fs_root fs/btrfs/disk-io.c:1128 [inline]
btrfs_get_root_ref+0x5ae/0xae0 fs/btrfs/disk-io.c:1338
btrfs_get_fs_root fs/btrfs/disk-io.c:1390 [inline]
open_ctree+0x29c8/0x3030 fs/btrfs/disk-io.c:3494
btrfs_fill_super+0x1c7/0x2f0 fs/btrfs/super.c:1154
btrfs_mount_root+0x7e0/0x910 fs/btrfs/super.c:1519
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
fc_mount fs/namespace.c:1112 [inline]
vfs_kern_mount+0xbc/0x150 fs/namespace.c:1142
btrfs_mount+0x39f/0xb50 fs/btrfs/super.c:1579
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (btrfs-tree-01){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(btrfs-tree-00);
lock(btrfs-tree-01);
lock(btrfs-tree-00);
rlock(btrfs-tree-01);
*** DEADLOCK ***
1 lock held by syz-executor277/5012:
#0: ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
stack backtrace:
CPU: 1 PID: 5012 Comm: syz-executor277 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0bec94ea39
Fix this simply by releasing the path before calling btrfs_iget() as at
point we don't need the path anymore.
Reported-by: syzbot+bf66ad948981797d2f1d(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000045fa140603c4a969@google.com/
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a895d105464b..d27b0d86b8e2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1958,6 +1958,13 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
goto out_put;
}
+ /*
+ * We don't need the path anymore, so release it and
+ * avoid deadlocks and lockdep warnings in case
+ * btrfs_iget() needs to lookup the inode from its root
+ * btree and lock the same leaf.
+ */
+ btrfs_release_path(path);
temp_inode = btrfs_iget(sb, key2.objectid, root);
if (IS_ERR(temp_inode)) {
ret = PTR_ERR(temp_inode);
@@ -1978,7 +1985,6 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
goto out_put;
}
- btrfs_release_path(path);
key.objectid = key.offset;
key.offset = (u64)-1;
dirid = key.objectid;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ee34a82e890a7babb5585daf1a6dd7d4d1cf142a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092054-antibody-prenatal-dc3c@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
ee34a82e890a ("btrfs: release path before inode lookup during the ino lookup ioctl")
0202e83fdab0 ("btrfs: simplify iget helpers")
c75e839414d3 ("btrfs: kill the subvol_srcu")
5c8fd99fec9d ("btrfs: make inodes hold a ref on their roots")
7ac8b88ee668 ("btrfs: backref, only collect file extent items matching backref offset")
0024652895e3 ("btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root")
bd647ce385ec ("btrfs: add a leak check for roots")
8260edba67a2 ("btrfs: make the init of static elements in fs_info separate")
ae18c37ad5a1 ("btrfs: move fs_info init work into it's own helper function")
141386e1a5d6 ("btrfs: free more things in btrfs_free_fs_info")
bc44d7c4b2b1 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root")
81f096edf047 ("btrfs: use btrfs_put_fs_root to free roots always")
0d4b0463011d ("btrfs: export and rename free_fs_info")
fbb0ce40d606 ("btrfs: hold a ref on the root in btrfs_check_uuid_tree_entry")
ca2037fba6af ("btrfs: hold a ref on the root in btrfs_recover_log_trees")
5119cfc36f6d ("btrfs: hold a ref on the root in create_pending_snapshot")
5168489a079a ("btrfs: hold a ref on the root in get_subvol_name_from_objectid")
6f9a3da5da9e ("btrfs: hold a ref on the root in btrfs_ioctl_send")
fd79d43b347e ("btrfs: hold a ref on the root in scrub_print_warning_inode")
0b2dee5cff74 ("btrfs: hold a ref for the root in btrfs_find_orphan_roots")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee34a82e890a7babb5585daf1a6dd7d4d1cf142a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Sat, 26 Aug 2023 11:28:20 +0100
Subject: [PATCH] btrfs: release path before inode lookup during the ino lookup
ioctl
During the ino lookup ioctl we can end up calling btrfs_iget() to get an
inode reference while we are holding on a root's btree. If btrfs_iget()
needs to lookup the inode from the root's btree, because it's not
currently loaded in memory, then it will need to lock another or the
same path in the same root btree. This may result in a deadlock and
trigger the following lockdep splat:
WARNING: possible circular locking dependency detected
6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Not tainted
------------------------------------------------------
syz-executor277/5012 is trying to acquire lock:
ffff88802df41710 (btrfs-tree-01){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
but task is already holding lock:
ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-tree-00){++++}-{3:3}:
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_search_slot+0x13a4/0x2f80 fs/btrfs/ctree.c:2302
btrfs_init_root_free_objectid+0x148/0x320 fs/btrfs/disk-io.c:4955
btrfs_init_fs_root fs/btrfs/disk-io.c:1128 [inline]
btrfs_get_root_ref+0x5ae/0xae0 fs/btrfs/disk-io.c:1338
btrfs_get_fs_root fs/btrfs/disk-io.c:1390 [inline]
open_ctree+0x29c8/0x3030 fs/btrfs/disk-io.c:3494
btrfs_fill_super+0x1c7/0x2f0 fs/btrfs/super.c:1154
btrfs_mount_root+0x7e0/0x910 fs/btrfs/super.c:1519
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
fc_mount fs/namespace.c:1112 [inline]
vfs_kern_mount+0xbc/0x150 fs/namespace.c:1142
btrfs_mount+0x39f/0xb50 fs/btrfs/super.c:1579
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (btrfs-tree-01){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(btrfs-tree-00);
lock(btrfs-tree-01);
lock(btrfs-tree-00);
rlock(btrfs-tree-01);
*** DEADLOCK ***
1 lock held by syz-executor277/5012:
#0: ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
stack backtrace:
CPU: 1 PID: 5012 Comm: syz-executor277 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0bec94ea39
Fix this simply by releasing the path before calling btrfs_iget() as at
point we don't need the path anymore.
Reported-by: syzbot+bf66ad948981797d2f1d(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000045fa140603c4a969@google.com/
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a895d105464b..d27b0d86b8e2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1958,6 +1958,13 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
goto out_put;
}
+ /*
+ * We don't need the path anymore, so release it and
+ * avoid deadlocks and lockdep warnings in case
+ * btrfs_iget() needs to lookup the inode from its root
+ * btree and lock the same leaf.
+ */
+ btrfs_release_path(path);
temp_inode = btrfs_iget(sb, key2.objectid, root);
if (IS_ERR(temp_inode)) {
ret = PTR_ERR(temp_inode);
@@ -1978,7 +1985,6 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
goto out_put;
}
- btrfs_release_path(path);
key.objectid = key.offset;
key.offset = (u64)-1;
dirid = key.objectid;
Dropped in August:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/com…https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/com…
Only re-applied and build tested against v6.1.53.
If they do apply and test well against linux-5.{4,10,15}, all the better.
Florian Westphal (1):
netfilter: nf_tables: don't skip expired elements during walk
Pablo Neira Ayuso (4):
netfilter: nf_tables: GC transaction API to avoid race with control
plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from
packet path
netfilter: nf_tables: remove busy mark and gc batch API
include/net/netfilter/nf_tables.h | 120 +++++-------
net/netfilter/nf_tables_api.c | 307 ++++++++++++++++++++++++------
net/netfilter/nft_set_hash.c | 85 +++++----
net/netfilter/nft_set_pipapo.c | 66 +++++--
net/netfilter/nft_set_rbtree.c | 146 ++++++++------
5 files changed, 476 insertions(+), 248 deletions(-)
--
2.42.0.459.ge4e396fd5e-goog
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: cff9b2332ab762b7e0586c793c431a8f2ea4db04
Gitweb: https://git.kernel.org/tip/cff9b2332ab762b7e0586c793c431a8f2ea4db04
Author: Liam R. Howlett <Liam.Howlett(a)oracle.com>
AuthorDate: Fri, 15 Sep 2023 13:44:44 -04:00
Committer: Peter Zijlstra <peterz(a)infradead.org>
CommitterDate: Tue, 19 Sep 2023 10:48:04 +02:00
kernel/sched: Modify initial boot task idle setup
Initial booting is setting the task flag to idle (PF_IDLE) by the call
path sched_init() -> init_idle(). Having the task idle and calling
call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
set. Subsequent calls to any cond_resched() will enable IRQs,
potentially earlier than the IRQ setup has completed. Recent changes
have caused just this scenario and IRQs have been enabled early.
This causes a warning later in start_kernel() as interrupts are enabled
before they are fully set up.
Fix this issue by setting the PF_IDLE flag later in the boot sequence.
Although the boot task was marked as idle since (at least) d80e4fda576d,
I am not sure that it is wrong to do so. The forced context-switch on
idle task was introduced in the tiny_rcu update, so I'm going to claim
this fixes 5f6130fa52ee.
Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+Zo…
---
kernel/sched/core.c | 2 +-
kernel/sched/idle.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2299a5c..802551e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
* PF_KTHREAD should already be set at this point; regardless, make it
* look like a proper per-CPU kthread.
*/
- idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
+ idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
kthread_set_per_cpu(idle, cpu);
#ifdef CONFIG_SMP
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 342f58a..5007b25 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -373,6 +373,7 @@ EXPORT_SYMBOL_GPL(play_idle_precise);
void cpu_startup_entry(enum cpuhp_state state)
{
+ current->flags |= PF_IDLE;
arch_cpu_idle_prepare();
cpuhp_online_idle(state);
while (1)
From: Marek Vasut <marex(a)denx.de>
[ Upstream commit 362fa8f6e6a05089872809f4465bab9d011d05b3 ]
This bridge seems to need the HSE packet, otherwise the image is
shifted up and corrupted at the bottom. This makes the bridge
work with Samsung DSIM on i.MX8MM and i.MX8MP.
Signed-off-by: Marek Vasut <marex(a)denx.de>
Reviewed-by: Sam Ravnborg <sam(a)ravnborg.org>
Signed-off-by: Robert Foss <rfoss(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230615201902.566182-3-marex…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/bridge/tc358762.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/tc358762.c b/drivers/gpu/drm/bridge/tc358762.c
index 5641395fd310e..11445c50956e1 100644
--- a/drivers/gpu/drm/bridge/tc358762.c
+++ b/drivers/gpu/drm/bridge/tc358762.c
@@ -231,7 +231,7 @@ static int tc358762_probe(struct mipi_dsi_device *dsi)
dsi->lanes = 1;
dsi->format = MIPI_DSI_FMT_RGB888;
dsi->mode_flags = MIPI_DSI_MODE_VIDEO | MIPI_DSI_MODE_VIDEO_SYNC_PULSE |
- MIPI_DSI_MODE_LPM;
+ MIPI_DSI_MODE_LPM | MIPI_DSI_MODE_VIDEO_HSE;
ret = tc358762_parse_dt(ctx);
if (ret < 0)
--
2.40.1
The folio conversion changed the behaviour of shmem_sg_alloc_table() to
put the entire length of the last folio into the sg list, even if the sg
list should have been shorter. gen8_ggtt_insert_entries() relied on the
list being the right langth and would overrun the end of the page tables.
Other functions may also have been affected.
Clamp the length of the last entry in the sg list to be the expected
length.
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Fixes: 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Cc: stable(a)vger.kernel.org # 6.5.x
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256
Link: https://lore.kernel.org/lkml/6287208.lOV4Wx5bFT@natalenko.name/
Reported-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Tested-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
---
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..73a4a4eb29e0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
st->nents = 0;
for (i = 0; i < page_count; i++) {
struct folio *folio;
+ unsigned long nr_pages;
const unsigned int shrink[] = {
I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
0,
@@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
}
} while (1);
+ nr_pages = min_t(unsigned long,
+ folio_nr_pages(folio), page_count - i);
if (!i ||
sg->length >= max_segment ||
folio_pfn(folio) != next_pfn) {
@@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
sg = sg_next(sg);
st->nents++;
- sg_set_folio(sg, folio, folio_size(folio), 0);
+ sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
} else {
/* XXX: could overflow? */
- sg->length += folio_size(folio);
+ sg->length += nr_pages * PAGE_SIZE;
}
- next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
- i += folio_nr_pages(folio) - 1;
+ next_pfn = folio_pfn(folio) + nr_pages;
+ i += nr_pages - 1;
/* Check that the i965g/gm workaround works. */
GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);
--
2.40.1
From: Kent Overstreet <kent.overstreet(a)gmail.com>
commit 3644e2d2dda78e21edd8f5415b6d7ab03f5f54f3 upstream.
If iter->count is 0 and iocb->ki_pos is page aligned, this causes
nr_pages to be 0.
Then in generic_file_buffered_read_get_pages() find_get_pages_contig()
returns 0 - because we asked for 0 pages, so we call
generic_file_buffered_read_no_cached_page() which attempts to add a page
to the page cache, which fails with -EEXIST, and then we loop. Oops...
Signed-off-by: Kent Overstreet <kent.overstreet(a)gmail.com>
Reported-by: Jens Axboe <axboe(a)kernel.dk>
Reviewed-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Suraj Jitindar Singh <surajjs(a)amazon.com>
---
mm/filemap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 3a983bc1a71c..3b0d8c6dd587 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2203,6 +2203,9 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
return 0;
+ if (unlikely(!iov_iter_count(iter)))
+ return 0;
+
iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
index = *ppos >> PAGE_SHIFT;
--
2.34.1
The quilt patch titled
Subject: proc: nommu: /proc/<pid>/maps: release mmap read lock
has been removed from the -mm tree. Its filename was
proc-nommu-proc-pid-maps-release-mmap-read-lock.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ben Wolsieffer <Ben.Wolsieffer(a)hefring.com>
Subject: proc: nommu: /proc/<pid>/maps: release mmap read lock
Date: Thu, 14 Sep 2023 12:30:20 -0400
The no-MMU implementation of /proc/<pid>/map doesn't normally release
the mmap read lock, because it uses !IS_ERR_OR_NULL(_vml) to determine
whether to release the lock. Since _vml is NULL when the end of the
mappings is reached, the lock is not released.
Reading /proc/1/maps twice doesn't cause a hang because it only
takes the read lock, which can be taken multiple times and therefore
doesn't show any problem if the lock isn't released. Instead, you need
to perform some operation that attempts to take the write lock after
reading /proc/<pid>/maps. To actually reproduce the bug, compile the
following code as 'proc_maps_bug':
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
int main(int argc, char *argv[]) {
void *buf;
sleep(1);
buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
puts("mmap returned");
return 0;
}
Then, run:
./proc_maps_bug &; cat /proc/$!/maps; fg
Without this patch, mmap() will hang and the command will never
complete.
This code was incorrectly adapted from the MMU implementation, which at
the time released the lock in m_next() before returning the last entry.
The MMU implementation has diverged further from the no-MMU version since
then, so this patch brings their locking and error handling into sync,
fixing the bug and hopefully avoiding similar issues in the future.
Link: https://lkml.kernel.org/r/20230914163019.4050530-2-ben.wolsieffer@hefring.c…
Fixes: 47fecca15c09 ("fs/proc/task_nommu.c: don't use priv->task->mm")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer(a)hefring.com>
Acked-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Giulio Benetti <giulio.benetti(a)benettiengineering.com>
Cc: Greg Ungerer <gerg(a)uclinux.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_nommu.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
--- a/fs/proc/task_nommu.c~proc-nommu-proc-pid-maps-release-mmap-read-lock
+++ a/fs/proc/task_nommu.c
@@ -192,11 +192,16 @@ static void *m_start(struct seq_file *m,
return ERR_PTR(-ESRCH);
mm = priv->mm;
- if (!mm || !mmget_not_zero(mm))
+ if (!mm || !mmget_not_zero(mm)) {
+ put_task_struct(priv->task);
+ priv->task = NULL;
return NULL;
+ }
if (mmap_read_lock_killable(mm)) {
mmput(mm);
+ put_task_struct(priv->task);
+ priv->task = NULL;
return ERR_PTR(-EINTR);
}
@@ -205,23 +210,21 @@ static void *m_start(struct seq_file *m,
if (vma)
return vma;
- mmap_read_unlock(mm);
- mmput(mm);
return NULL;
}
-static void m_stop(struct seq_file *m, void *_vml)
+static void m_stop(struct seq_file *m, void *v)
{
struct proc_maps_private *priv = m->private;
+ struct mm_struct *mm = priv->mm;
- if (!IS_ERR_OR_NULL(_vml)) {
- mmap_read_unlock(priv->mm);
- mmput(priv->mm);
- }
- if (priv->task) {
- put_task_struct(priv->task);
- priv->task = NULL;
- }
+ if (!priv->task)
+ return;
+
+ mmap_read_unlock(mm);
+ mmput(mm);
+ put_task_struct(priv->task);
+ priv->task = NULL;
}
static void *m_next(struct seq_file *m, void *_p, loff_t *pos)
_
Patches currently in -mm which might be from Ben.Wolsieffer(a)hefring.com are
The quilt patch titled
Subject: mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-cma-and-highatomic-landing-on-the-wrong-buddy-list.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
Date: Mon, 11 Sep 2023 14:11:08 -0400
Commit 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
bypasses the pcplist on lock contention and returns the page directly to
the buddy list of the page's migratetype.
For pages that don't have their own pcplist, such as CMA and HIGHATOMIC,
the migratetype is temporarily updated such that the page can hitch a ride
on the MOVABLE pcplist. Their true type is later reassessed when flushing
in free_pcppages_bulk(). However, when lock contention is detected after
the type was already overridden, the bypass will then put the page on the
wrong buddy list.
Once on the MOVABLE buddy list, the page becomes eligible for fallbacks
and even stealing. In the case of HIGHATOMIC, otherwise ineligible
allocations can dip into the highatomic reserves. In the case of CMA, the
page can be lost from the CMA region permanently.
Use a separate pcpmigratetype variable for the pcplist override. Use the
original migratetype when going directly to the buddy. This fixes the bug
and should make the intentions more obvious in the code.
Originally sent here to address the HIGHATOMIC case:
https://lore.kernel.org/lkml/20230821183733.106619-4-hannes@cmpxchg.org/
Changelog updated in response to the CMA-specific bug report.
[mgorman(a)techsingularity.net: updated changelog]
Link: https://lkml.kernel.org/r/20230911181108.GA104295@cmpxchg.org
Fixes: 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Joe Liu <joe.liu(a)mediatek.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-cma-and-highatomic-landing-on-the-wrong-buddy-list
+++ a/mm/page_alloc.c
@@ -2400,7 +2400,7 @@ void free_unref_page(struct page *page,
struct per_cpu_pages *pcp;
struct zone *zone;
unsigned long pfn = page_to_pfn(page);
- int migratetype;
+ int migratetype, pcpmigratetype;
if (!free_unref_page_prepare(page, pfn, order))
return;
@@ -2408,24 +2408,24 @@ void free_unref_page(struct page *page,
/*
* We only track unmovable, reclaimable and movable on pcp lists.
* Place ISOLATE pages on the isolated list because they are being
- * offlined but treat HIGHATOMIC as movable pages so we can get those
- * areas back if necessary. Otherwise, we may have to free
+ * offlined but treat HIGHATOMIC and CMA as movable pages so we can
+ * get those areas back if necessary. Otherwise, we may have to free
* excessively into the page allocator
*/
- migratetype = get_pcppage_migratetype(page);
+ migratetype = pcpmigratetype = get_pcppage_migratetype(page);
if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
if (unlikely(is_migrate_isolate(migratetype))) {
free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
return;
}
- migratetype = MIGRATE_MOVABLE;
+ pcpmigratetype = MIGRATE_MOVABLE;
}
zone = page_zone(page);
pcp_trylock_prepare(UP_flags);
pcp = pcp_spin_trylock(zone->per_cpu_pageset);
if (pcp) {
- free_unref_page_commit(zone, pcp, page, migratetype, order);
+ free_unref_page_commit(zone, pcp, page, pcpmigratetype, order);
pcp_spin_unlock(pcp);
} else {
free_one_page(zone, page, pfn, order, migratetype, FPI_NONE);
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
The patch titled
Subject: i915: limit the length of an sg list to the requested length
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
i915-limit-the-length-of-an-sg-list-to-the-requested-length.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: i915: limit the length of an sg list to the requested length
Date: Tue, 19 Sep 2023 20:48:55 +0100
The folio conversion changed the behaviour of shmem_sg_alloc_table() to
put the entire length of the last folio into the sg list, even if the sg
list should have been shorter. gen8_ggtt_insert_entries() relied on the
list being the right langth and would overrun the end of the page tables.
Other functions may also have been affected.
Clamp the length of the last entry in the sg list to be the expected
length.
Link: https://lkml.kernel.org/r/20230919194855.347582-1-willy@infradead.org
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256
Fixes: 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reported-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Closes: https://lore.kernel.org/lkml/6287208.lOV4Wx5bFT@natalenko.name/
Tested-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [6.5.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c~i915-limit-the-length-of-an-sg-list-to-the-requested-length
+++ a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915
st->nents = 0;
for (i = 0; i < page_count; i++) {
struct folio *folio;
+ unsigned long nr_pages;
const unsigned int shrink[] = {
I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
0,
@@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915
}
} while (1);
+ nr_pages = min_t(unsigned long,
+ folio_nr_pages(folio), page_count - i);
if (!i ||
sg->length >= max_segment ||
folio_pfn(folio) != next_pfn) {
@@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915
sg = sg_next(sg);
st->nents++;
- sg_set_folio(sg, folio, folio_size(folio), 0);
+ sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
} else {
/* XXX: could overflow? */
- sg->length += folio_size(folio);
+ sg->length += nr_pages * PAGE_SIZE;
}
- next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
- i += folio_nr_pages(folio) - 1;
+ next_pfn = folio_pfn(folio) + nr_pages;
+ i += nr_pages - 1;
/* Check that the i965g/gm workaround works. */
GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);
_
Patches currently in -mm which might be from willy(a)infradead.org are
i915-limit-the-length-of-an-sg-list-to-the-requested-length.patch
mm-convert-dax-lock-unlock-page-to-lock-unlock-folio.patch
buffer-pass-gfp-flags-to-folio_alloc_buffers.patch
buffer-hoist-gfp-flags-from-grow_dev_page-to-__getblk_gfp.patch
ext4-use-bdev_getblk-to-avoid-memory-reclaim-in-readahead-path.patch
buffer-use-bdev_getblk-to-avoid-memory-reclaim-in-readahead-path.patch
buffer-convert-getblk_unmovable-and-__getblk-to-use-bdev_getblk.patch
buffer-convert-sb_getblk-to-call-__getblk.patch
ext4-call-bdev_getblk-from-sb_getblk_gfp.patch
buffer-remove-__getblk_gfp.patch
hugetlb-use-a-folio-in-free_hpage_workfn.patch
hugetlb-remove-a-few-calls-to-page_folio.patch
hugetlb-convert-remove_pool_huge_page-to-remove_pool_hugetlb_folio.patch
(Re-sending email since the previous email was undeliverable due to HTML content)
Hi Team,
This is a request to backport the following fix to 6.5/scsi-fixes. This was merged into Linus' tree.
This fix fixes a crash due to a null pointer exception when a lun reset is issued from sgreset for a lun.
With this fix, there is no longer a crash.
I have another fix, which I have tested, dependent on this fix. It is currently in the pipeline.
I'll send out a patch for that fix when the internal review is complete.
Please let me know if you need any more information to backport this fix.
commit 15924b0503630016dee4dbb945a8df4df659070b
Author: Karan Tilak Kumar kartilak(a)cisco.com
Date: Thu Aug 17 11:21:46 2023 -0700
scsi: fnic: Replace sgreset tag with max_tag_id
sgreset is issued with a SCSI command pointer. The device reset code
assumes that it was issued on a hardware queue, and calls block multiqueue
layer. However, the assumption is broken, and there is no hardware queue
associated with the sgreset, and this leads to a crash due to a null
pointer exception.
Fix the code to use the max_tag_id as a tag which does not overlap with the
other tags issued by mid layer.
Tested by running FC traffic for a few minutes, and by issuing sgreset on
the device in parallel. Without the fix, the crash is observed right away.
With this fix, no crash is observed.
Reviewed-by: Sesidhar Baddela sebaddel(a)cisco.com
Tested-by: Karan Tilak Kumar kartilak(a)cisco.com
Signed-off-by: Karan Tilak Kumar kartilak(a)cisco.com
Link: https://lore.kernel.org/r/20230817182146.229059-1-kartilak@cisco.com
Signed-off-by: Martin K. Petersen martin.petersen(a)oracle.com
Thanks,
Karan
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 9cc0a598b944816f2968baf2631757f22721b996
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091616-equipment-bucktooth-6ae5@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
9cc0a598b944 ("mtd: rawnand: brcmnand: Fix potential false time out warning")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9cc0a598b944816f2968baf2631757f22721b996 Mon Sep 17 00:00:00 2001
From: William Zhang <william.zhang(a)broadcom.com>
Date: Thu, 6 Jul 2023 11:29:06 -0700
Subject: [PATCH] mtd: rawnand: brcmnand: Fix potential false time out warning
If system is busy during the command status polling function, the driver
may not get the chance to poll the status register till the end of time
out and return the premature status. Do a final check after time out
happens to ensure reading the correct status.
Fixes: 9d2ee0a60b8b ("mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program")
Signed-off-by: William Zhang <william.zhang(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-3-william.zhang@broa…
diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index 9ea96911d16b..9a373a10304d 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1080,6 +1080,14 @@ static int bcmnand_ctrl_poll_status(struct brcmnand_controller *ctrl,
cpu_relax();
} while (time_after(limit, jiffies));
+ /*
+ * do a final check after time out in case the CPU was busy and the driver
+ * did not get enough time to perform the polling to avoid false alarms
+ */
+ val = brcmnand_read_reg(ctrl, BRCMNAND_INTFC_STATUS);
+ if ((val & mask) == expected_val)
+ return 0;
+
dev_warn(ctrl->dev, "timeout on status poll (expected %x got %x)\n",
expected_val, val & mask);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 5d53244186c9ac58cb88d76a0958ca55b83a15cd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091632-marauding-viable-2fad@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
5d53244186c9 ("mtd: rawnand: brcmnand: Fix potential out-of-bounds access in oob write")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d53244186c9ac58cb88d76a0958ca55b83a15cd Mon Sep 17 00:00:00 2001
From: William Zhang <william.zhang(a)broadcom.com>
Date: Thu, 6 Jul 2023 11:29:08 -0700
Subject: [PATCH] mtd: rawnand: brcmnand: Fix potential out-of-bounds access in
oob write
When the oob buffer length is not in multiple of words, the oob write
function does out-of-bounds read on the oob source buffer at the last
iteration. Fix that by always checking length limit on the oob buffer
read and fill with 0xff when reaching the end of the buffer to the oob
registers.
Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller")
Signed-off-by: William Zhang <william.zhang(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-5-william.zhang@broa…
diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index b2c6396060db..71d0ba652bee 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1477,19 +1477,33 @@ static int write_oob_to_regs(struct brcmnand_controller *ctrl, int i,
const u8 *oob, int sas, int sector_1k)
{
int tbytes = sas << sector_1k;
- int j;
+ int j, k = 0;
+ u32 last = 0xffffffff;
+ u8 *plast = (u8 *)&last;
/* Adjust OOB values for 1K sector size */
if (sector_1k && (i & 0x01))
tbytes = max(0, tbytes - (int)ctrl->max_oob);
tbytes = min_t(int, tbytes, ctrl->max_oob);
- for (j = 0; j < tbytes; j += 4)
+ /*
+ * tbytes may not be multiple of words. Make sure we don't read out of
+ * the boundary and stop at last word.
+ */
+ for (j = 0; (j + 3) < tbytes; j += 4)
oob_reg_write(ctrl, j,
(oob[j + 0] << 24) |
(oob[j + 1] << 16) |
(oob[j + 2] << 8) |
(oob[j + 3] << 0));
+
+ /* handle the remaing bytes */
+ while (j < tbytes)
+ plast[k++] = oob[j++];
+
+ if (tbytes & 0x3)
+ oob_reg_write(ctrl, (tbytes & ~0x3), (__force u32)cpu_to_be32(last));
+
return tbytes;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 57a943ebfcdb4a97fbb409640234bdb44bfa1953
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091617-deserve-animal-a57e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
57a943ebfcdb ("drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma")
5d945cbcd4b1 ("drm/amd/display: Create a file dedicated to planes")
60693e3a3890 ("Merge tag 'amd-drm-next-5.20-2022-07-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57a943ebfcdb4a97fbb409640234bdb44bfa1953 Mon Sep 17 00:00:00 2001
From: Melissa Wen <mwen(a)igalia.com>
Date: Thu, 31 Aug 2023 15:12:28 -0100
Subject: [PATCH] drm/amd/display: enable cursor degamma for DCN3+ DRM legacy
gamma
For DRM legacy gamma, AMD display manager applies implicit sRGB degamma
using a pre-defined sRGB transfer function. It works fine for DCN2
family where degamma ROM and custom curves go to the same color block.
But, on DCN3+, degamma is split into two blocks: degamma ROM for
pre-defined TFs and `gamma correction` for user/custom curves and
degamma ROM settings doesn't apply to cursor plane. To get DRM legacy
gamma working as expected, enable cursor degamma ROM for implict sRGB
degamma on HW with this configuration.
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803
Fixes: 96b020e2163f ("drm/amd/display: check attr flag before set cursor degamma on DCN3+")
Signed-off-by: Melissa Wen <mwen(a)igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 2198df96ed6f..cc74dd69acf2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -1269,6 +1269,13 @@ void amdgpu_dm_plane_handle_cursor_update(struct drm_plane *plane,
attributes.rotation_angle = 0;
attributes.attribute_flags.value = 0;
+ /* Enable cursor degamma ROM on DCN3+ for implicit sRGB degamma in DRM
+ * legacy gamma setup.
+ */
+ if (crtc_state->cm_is_degamma_srgb &&
+ adev->dm.dc->caps.color.dpp.gamma_corr)
+ attributes.attribute_flags.bits.ENABLE_CURSOR_DEGAMMA = 1;
+
attributes.pitch = afb->base.pitches[0] / afb->base.format->cpp[0];
if (crtc_state->stream) {
Make IS_POSIXACL() return false if POSIX ACL support is disabled and
ignore SB_POSIXACL/MS_POSIXACL.
Never skip applying the umask in namei.c and never bother to do any
ACL specific checks if the filesystem falsely indicates it has ACLs
enabled when the feature is completely disabled in the kernel.
This fixes a problem where the umask is always ignored in the NFS
client when compiled without CONFIG_FS_POSIX_ACL. This is a 4 year
old regression caused by commit 013cdf1088d723 which itself was not
completely wrong, but failed to consider all the side effects by
misdesigned VFS code.
Prior to that commit, there were two places where the umask could be
applied, for example when creating a directory:
1. in the VFS layer in SYSCALL_DEFINE3(mkdirat), but only if
!IS_POSIXACL()
2. again (unconditionally) in nfs3_proc_mkdir()
The first one does not apply, because even without
CONFIG_FS_POSIX_ACL, the NFS client sets MS_POSIXACL in
nfs_fill_super().
After that commit, (2.) was replaced by:
2b. in posix_acl_create(), called by nfs3_proc_mkdir()
There's one branch in posix_acl_create() which applies the umask;
however, without CONFIG_FS_POSIX_ACL, posix_acl_create() is an empty
dummy function which does not apply the umask.
The approach chosen by this patch is to make IS_POSIXACL() always
return false when POSIX ACL support is disabled, so the umask always
gets applied by the VFS layer. This is consistent with the (regular)
behavior of posix_acl_create(): that function returns early if
IS_POSIXACL() is false, before applying the umask.
Therefore, posix_acl_create() is responsible for applying the umask if
there is ACL support enabled in the file system (SB_POSIXACL), and the
VFS layer is responsible for all other cases (no SB_POSIXACL or no
CONFIG_FS_POSIX_ACL).
Reviewed-by: J. Bruce Fields <bfields(a)redhat.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: stable(a)vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
---
include/linux/fs.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4aeb3fa11927..c1a4bc5c2e95 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2110,7 +2110,12 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
#define IS_NOQUOTA(inode) ((inode)->i_flags & S_NOQUOTA)
#define IS_APPEND(inode) ((inode)->i_flags & S_APPEND)
#define IS_IMMUTABLE(inode) ((inode)->i_flags & S_IMMUTABLE)
+
+#ifdef CONFIG_FS_POSIX_ACL
#define IS_POSIXACL(inode) __IS_FLG(inode, SB_POSIXACL)
+#else
+#define IS_POSIXACL(inode) 0
+#endif
#define IS_DEADDIR(inode) ((inode)->i_flags & S_DEAD)
#define IS_NOCMTIME(inode) ((inode)->i_flags & S_NOCMTIME)
--
2.39.2
The function posix_acl_create() applies the umask only if the inode
has no ACL (= NULL) or if ACLs are not supported by the filesystem
driver (= -EOPNOTSUPP).
However, this happens only after after the IS_POSIXACL() check
succeeded. If the superblock doesn't enable ACL support, umask will
never be applied. A filesystem which has no ACL support will of
course not enable SB_POSIXACL, rendering the umask-applying code path
unreachable.
This fixes a bug which causes the umask to be ignored with O_TMPFILE
on tmpfs:
https://github.com/MusicPlayerDaemon/MPD/issues/558https://bugs.gentoo.org/show_bug.cgi?id=686142#c3https://bugzilla.kernel.org/show_bug.cgi?id=203625
Reviewed-by: J. Bruce Fields <bfields(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
---
fs/posix_acl.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index a05fe94970ce..79831269dd2f 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -642,9 +642,14 @@ posix_acl_create(struct inode *dir, umode_t *mode,
*acl = NULL;
*default_acl = NULL;
- if (S_ISLNK(*mode) || !IS_POSIXACL(dir))
+ if (S_ISLNK(*mode))
return 0;
+ if (!IS_POSIXACL(dir)) {
+ *mode &= ~current_umask();
+ return 0;
+ }
+
p = get_inode_acl(dir, ACL_TYPE_DEFAULT);
if (!p || p == ERR_PTR(-EOPNOTSUPP)) {
*mode &= ~current_umask();
--
2.39.2
Initial booting is setting the task flag to idle (PF_IDLE) by the call
path sched_init() -> init_idle(). Having the task idle and calling
call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
set. Subsequent calls to any cond_resched() will enable IRQs,
potentially earlier than the IRQ setup has completed. Recent changes
have caused just this scenario and IRQs have been enabled early.
This causes a warning later in start_kernel() as interrupts are enabled
before they are fully set up.
Fix this issue by setting the PF_IDLE flag later in the boot sequence.
Although the boot task was marked as idle since (at least) d80e4fda576d,
I am not sure that it is wrong to do so. The forced context-switch on
idle task was introduced in the tiny_rcu update, so I'm going to claim
this fixes 5f6130fa52ee.
Link: https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/
Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+Zo…
Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
Cc: stable(a)vger.kernel.org
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Andreas Schwab <schwab(a)linux-m68k.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "Mike Rapoport (IBM)" <rppt(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
v1: https://lore.kernel.org/linux-mm/20230913005647.1534747-1-Liam.Howlett@orac…
kernel/sched/core.c | 2 +-
kernel/sched/idle.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..e8f73ff12126 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9271,7 +9271,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
* PF_KTHREAD should already be set at this point; regardless, make it
* look like a proper per-CPU kthread.
*/
- idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
+ idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
kthread_set_per_cpu(idle, cpu);
#ifdef CONFIG_SMP
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 342f58a329f5..5007b25c5bc6 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -373,6 +373,7 @@ EXPORT_SYMBOL_GPL(play_idle_precise);
void cpu_startup_entry(enum cpuhp_state state)
{
+ current->flags |= PF_IDLE;
arch_cpu_idle_prepare();
cpuhp_online_idle(state);
while (1)
--
2.39.2
From: Jeff Vanhoof <qjv001(a)motorola.com>
arm-smmu related crashes seen after a Missed ISOC interrupt when
no_interrupt=1 is used. This can happen if the hardware is still using
the data associated with a TRB after the usb_request's ->complete call
has been made. Instead of immediately releasing a request when a Missed
ISOC interrupt has occurred, this change will add logic to cancel the
request instead where it will eventually be released when the
END_TRANSFER command has completed. This logic is similar to some of the
cleanup done in dwc3_gadget_ep_dequeue.
Fixes: 6d8a019614f3 ("usb: dwc3: gadget: check for Missed Isoc from event status")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jeff Vanhoof <qjv001(a)motorola.com>
Co-developed-by: Dan Vacura <w36195(a)motorola.com>
Signed-off-by: Dan Vacura <w36195(a)motorola.com>
---
V1 -> V3:
- no change, new patch in series
V3 -> V4:
- no change
drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/gadget.c | 38 ++++++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8f9959ba9fd4..9b005d912241 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -943,6 +943,7 @@ struct dwc3_request {
#define DWC3_REQUEST_STATUS_DEQUEUED 3
#define DWC3_REQUEST_STATUS_STALLED 4
#define DWC3_REQUEST_STATUS_COMPLETED 5
+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
#define DWC3_REQUEST_STATUS_UNKNOWN -1
u8 epnum;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 079cd333632e..411532c5c378 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2021,6 +2021,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
case DWC3_REQUEST_STATUS_STALLED:
dwc3_gadget_giveback(dep, req, -EPIPE);
break;
+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
+ dwc3_gadget_giveback(dep, req, -EXDEV);
+ break;
default:
dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
dwc3_gadget_giveback(dep, req, -ECONNRESET);
@@ -3402,21 +3405,32 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;
- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+ if (status == -EXDEV) {
+ struct dwc3_request *tmp;
+ struct dwc3_request *req;
- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
- goto out;
+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+ dwc3_stop_active_transfer(dep, true, true);
- if (!dep->endpoint.desc)
- return no_started_trb;
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list)
+ dwc3_gadget_move_cancelled_request(req,
+ DWC3_REQUEST_STATUS_MISSED_ISOC);
+ } else {
+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
- list_empty(&dep->started_list) &&
- (list_empty(&dep->pending_list) || status == -EXDEV))
- dwc3_stop_active_transfer(dep, true, true);
- else if (dwc3_gadget_ep_should_continue(dep))
- if (__dwc3_gadget_kick_transfer(dep) == 0)
- no_started_trb = false;
+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
+ goto out;
+
+ if (!dep->endpoint.desc)
+ return no_started_trb;
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
+ dwc3_stop_active_transfer(dep, true, true);
+ else if (dwc3_gadget_ep_should_continue(dep))
+ if (__dwc3_gadget_kick_transfer(dep) == 0)
+ no_started_trb = false;
+ }
out:
/*
--
2.34.1
Make sure that the device is not runtime suspended before explicitly
disabling the clocks on probe failure and on driver unbind to avoid a
clock enable-count imbalance.
Fixes: 9e3a000362ae ("spi: zynqmp: Add pm runtime support")
Cc: stable(a)vger.kernel.org # 4.19
Cc: Naga Sureshkumar Relli <naga.sureshkumar.relli(a)xilinx.com>
Cc: Shubhrajyoti Datta <shubhrajyoti.datta(a)xilinx.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
Hi Mark,
This patch ended up sitting in your for-next and for-6.4 branches for a
few releases but was never sent on to Linus as I reported here:
https://lore.kernel.org/lkml/ZOy0l6sXyYib59ej@finisterre.sirena.org.uk/
Now it appears to have been dropped from linux-next so resending.
Johan
drivers/spi/spi-zynqmp-gqspi.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/spi/spi-zynqmp-gqspi.c b/drivers/spi/spi-zynqmp-gqspi.c
index 94d9a33d9af5..9a46b2478f4e 100644
--- a/drivers/spi/spi-zynqmp-gqspi.c
+++ b/drivers/spi/spi-zynqmp-gqspi.c
@@ -1340,9 +1340,9 @@ static int zynqmp_qspi_probe(struct platform_device *pdev)
return 0;
clk_dis_all:
- pm_runtime_put_sync(&pdev->dev);
- pm_runtime_set_suspended(&pdev->dev);
pm_runtime_disable(&pdev->dev);
+ pm_runtime_put_noidle(&pdev->dev);
+ pm_runtime_set_suspended(&pdev->dev);
clk_disable_unprepare(xqspi->refclk);
clk_dis_pclk:
clk_disable_unprepare(xqspi->pclk);
@@ -1366,11 +1366,15 @@ static void zynqmp_qspi_remove(struct platform_device *pdev)
{
struct zynqmp_qspi *xqspi = platform_get_drvdata(pdev);
+ pm_runtime_get_sync(&pdev->dev);
+
zynqmp_gqspi_write(xqspi, GQSPI_EN_OFST, 0x0);
+
+ pm_runtime_disable(&pdev->dev);
+ pm_runtime_put_noidle(&pdev->dev);
+ pm_runtime_set_suspended(&pdev->dev);
clk_disable_unprepare(xqspi->refclk);
clk_disable_unprepare(xqspi->pclk);
- pm_runtime_set_suspended(&pdev->dev);
- pm_runtime_disable(&pdev->dev);
}
MODULE_DEVICE_TABLE(of, zynqmp_qspi_of_match);
--
2.41.0
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x e66dd317194daae0475fe9e5577c80aa97f16cb9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091616-collapse-uncrown-ad3a@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
e66dd317194d ("mtd: rawnand: brcmnand: Fix crash during the panic_write")
3c7c1e4594ef ("mtd: rawnand: brcmnand: Refactored code to introduce helper functions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e66dd317194daae0475fe9e5577c80aa97f16cb9 Mon Sep 17 00:00:00 2001
From: William Zhang <william.zhang(a)broadcom.com>
Date: Thu, 6 Jul 2023 11:29:07 -0700
Subject: [PATCH] mtd: rawnand: brcmnand: Fix crash during the panic_write
When executing a NAND command within the panic write path, wait for any
pending command instead of calling BUG_ON to avoid crashing while
already crashing.
Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller")
Signed-off-by: William Zhang <william.zhang(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Reviewed-by: Kursad Oney <kursad.oney(a)broadcom.com>
Reviewed-by: Kamal Dasu <kamal.dasu(a)broadcom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-4-william.zhang@broa…
diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index 9a373a10304d..b2c6396060db 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1608,7 +1608,17 @@ static void brcmnand_send_cmd(struct brcmnand_host *host, int cmd)
dev_dbg(ctrl->dev, "send native cmd %d addr 0x%llx\n", cmd, cmd_addr);
- BUG_ON(ctrl->cmd_pending != 0);
+ /*
+ * If we came here through _panic_write and there is a pending
+ * command, try to wait for it. If it times out, rather than
+ * hitting BUG_ON, just return so we don't crash while crashing.
+ */
+ if (oops_in_progress) {
+ if (ctrl->cmd_pending &&
+ bcmnand_ctrl_poll_status(ctrl, NAND_CTRL_RDY, NAND_CTRL_RDY, 0))
+ return;
+ } else
+ BUG_ON(ctrl->cmd_pending != 0);
ctrl->cmd_pending = cmd;
ret = bcmnand_ctrl_poll_status(ctrl, NAND_CTRL_RDY, NAND_CTRL_RDY, 0);
The patch titled
Subject: mm: lock VMAs skipped by a failed queue_pages_range()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-lock-vmas-skipped-by-a-failed-queue_pages_range.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm: lock VMAs skipped by a failed queue_pages_range()
Date: Mon, 18 Sep 2023 14:16:08 -0700
When queue_pages_range() encounters an unmovable page, it terminates its
page walk. This walk, among other things, locks the VMAs in the range.
This termination might result in some VMAs being left unlocked after
queue_pages_range() completes. Since do_mbind() continues to operate on
these VMAs despite the failure from queue_pages_range(), it will encounter
an unlocked VMA, leading to a BUG().
This mbind() behavior has been modified several times before and might
need some changes to either finish the page walk even in the presence of
unmovable pages or to error out immediately after the failure to
queue_pages_range(). However that requires more discussions, so to fix
the immediate issue, explicitly lock the VMAs in the range if
queue_pages_range() failed. The added condition does not save much but is
added for documentation purposes to understand when this extra locking is
needed.
Link: https://lkml.kernel.org/r/20230918211608.3580629-1-surenb@google.com
Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: syzbot+b591856e0f0139f83023(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f392a60604a65085@google.com/
Acked-by: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/mempolicy.c~mm-lock-vmas-skipped-by-a-failed-queue_pages_range
+++ a/mm/mempolicy.c
@@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ /* If queue_pages_range failed then not all VMAs might be locked */
+ if (ret)
+ vma_start_write(vma);
err = mbind_range(&vmi, vma, &prev, start, end, new);
if (err)
break;
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-lock-vmas-skipped-by-a-failed-queue_pages_range.patch
From: Fedor Pchelkin <pchelkin(a)ispras.ru>
[ Upstream commit ccbe77f7e45dfb4420f7f531b650c00c6e9c7507 ]
Syzkaller reports a memory leak:
BUG: memory leak
unreferenced object 0xffff88810b279e00 (size 96):
comm "syz-executor399", pid 3631, jiffies 4294964921 (age 23.870s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 08 9e 27 0b 81 88 ff ff ..........'.....
08 9e 27 0b 81 88 ff ff 00 00 00 00 00 00 00 00 ..'.............
backtrace:
[<ffffffff814cfc90>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
[<ffffffff81bb75ca>] kmalloc include/linux/slab.h:576 [inline]
[<ffffffff81bb75ca>] autofs_wait+0x3fa/0x9a0 fs/autofs/waitq.c:378
[<ffffffff81bb88a7>] autofs_do_expire_multi+0xa7/0x3e0 fs/autofs/expire.c:593
[<ffffffff81bb8c33>] autofs_expire_multi+0x53/0x80 fs/autofs/expire.c:619
[<ffffffff81bb6972>] autofs_root_ioctl_unlocked+0x322/0x3b0 fs/autofs/root.c:897
[<ffffffff81bb6a95>] autofs_root_ioctl+0x25/0x30 fs/autofs/root.c:910
[<ffffffff81602a9c>] vfs_ioctl fs/ioctl.c:51 [inline]
[<ffffffff81602a9c>] __do_sys_ioctl fs/ioctl.c:870 [inline]
[<ffffffff81602a9c>] __se_sys_ioctl fs/ioctl.c:856 [inline]
[<ffffffff81602a9c>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:856
[<ffffffff84608225>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff84608225>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
autofs_wait_queue structs should be freed if their wait_ctr becomes zero.
Otherwise they will be lost.
In this case an AUTOFS_IOC_EXPIRE_MULTI ioctl is done, then a new
waitqueue struct is allocated in autofs_wait(), its initial wait_ctr
equals 2. After that wait_event_killable() is interrupted (it returns
-ERESTARTSYS), so that 'wq->name.name == NULL' condition may be not
satisfied. Actually, this condition can be satisfied when
autofs_wait_release() or autofs_catatonic_mode() is called and, what is
also important, wait_ctr is decremented in those places. Upon the exit of
autofs_wait(), wait_ctr is decremented to 1. Then the unmounting process
begins: kill_sb calls autofs_catatonic_mode(), which should have freed the
waitqueues, but it only decrements its usage counter to zero which is not
a correct behaviour.
edit:imk
This description is of course not correct. The umount performed as a result
of an expire is a umount of a mount that has been automounted, it's not the
autofs mount itself. They happen independently, usually after everything
mounted within the autofs file system has been expired away. If everything
hasn't been expired away the automount daemon can still exit leaving mounts
in place. But expires done in both cases will result in a notification that
calls autofs_wait_release() with a result status. The problem case is the
summary execution of of the automount daemon. In this case any waiting
processes won't be woken up until either they are terminated or the mount
is umounted.
end edit: imk
So in catatonic mode we should free waitqueues which counter becomes zero.
edit: imk
Initially I was concerned that the calling of autofs_wait_release() and
autofs_catatonic_mode() was not mutually exclusive but that can't be the
case (obviously) because the queue entry (or entries) is removed from the
list when either of these two functions are called. Consequently the wait
entry will be freed by only one of these functions or by the woken process
in autofs_wait() depending on the order of the calls.
end edit: imk
Reported-by: syzbot+5e53f70e69ff0c0a1c0c(a)syzkaller.appspotmail.com
Suggested-by: Takeshi Misawa <jeliantsurux(a)gmail.com>
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
Signed-off-by: Alexey Khoroshilov <khoroshilov(a)ispras.ru>
Signed-off-by: Ian Kent <raven(a)themaw.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: autofs(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Message-Id: <169112719161.7590.6700123246297365841.stgit(a)donald.themaw.net>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/autofs/waitq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/autofs/waitq.c b/fs/autofs/waitq.c
index 54c1f8b8b0757..efdc76732faed 100644
--- a/fs/autofs/waitq.c
+++ b/fs/autofs/waitq.c
@@ -32,8 +32,9 @@ void autofs_catatonic_mode(struct autofs_sb_info *sbi)
wq->status = -ENOENT; /* Magic is gone - report failure */
kfree(wq->name.name - wq->offset);
wq->name.name = NULL;
- wq->wait_ctr--;
wake_up_interruptible(&wq->queue);
+ if (!--wq->wait_ctr)
+ kfree(wq);
wq = nwq;
}
fput(sbi->pipe); /* Close the pipe */
--
2.40.1
This is the start of the stable review cycle for the 5.10.194 release.
There are 11 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 02 Sep 2023 11:08:22 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.194-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.194-rc1
Paul E. McKenney <paulmck(a)kernel.org>
rcu-tasks: Add trc_inspect_reader() checks for exiting critical section
Paul E. McKenney <paulmck(a)kernel.org>
rcu-tasks: Wait for trc_read_check_handler() IPIs
Neeraj Upadhyay <neeraju(a)codeaurora.org>
rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader
Paul E. McKenney <paulmck(a)kernel.org>
rcu: Prevent expedited GP from enabling tick on offline CPU
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "MIPS: Alchemy: fix dbdma2"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "drm/amdgpu: install stub fence into potential unused fence pointers"
Loic Poulain <loic.poulain(a)linaro.org>
mhi: pci_generic: Fix implicit conversion warning
James Morse <james.morse(a)arm.com>
ARM: module: Use module_init_layout_section() to spot init sections
James Morse <james.morse(a)arm.com>
arm64: module: Use module_init_layout_section() to spot init sections
Arnd Bergmann <arnd(a)arndb.de>
arm64: module-plts: inline linux/moduleloader.h
James Morse <james.morse(a)arm.com>
module: Expose module_init_layout_section()
-------------
Diffstat:
Makefile | 4 ++--
arch/arm/kernel/module-plts.c | 2 +-
arch/arm64/kernel/module-plts.c | 3 ++-
arch/mips/alchemy/common/dbdma.c | 27 ++++++++++++-------------
drivers/bus/mhi/host/pci_generic.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++----
include/linux/moduleloader.h | 5 +++++
kernel/module.c | 2 +-
kernel/rcu/tasks.h | 36 ++++++++++++++++++++++++----------
kernel/rcu/tree_exp.h | 5 ++++-
10 files changed, 56 insertions(+), 36 deletions(-)
From: Marek Vasut <marex(a)denx.de>
[ Upstream commit 362fa8f6e6a05089872809f4465bab9d011d05b3 ]
This bridge seems to need the HSE packet, otherwise the image is
shifted up and corrupted at the bottom. This makes the bridge
work with Samsung DSIM on i.MX8MM and i.MX8MP.
Signed-off-by: Marek Vasut <marex(a)denx.de>
Reviewed-by: Sam Ravnborg <sam(a)ravnborg.org>
Signed-off-by: Robert Foss <rfoss(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230615201902.566182-3-marex…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/bridge/tc358762.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/tc358762.c b/drivers/gpu/drm/bridge/tc358762.c
index 77f7f7f547570..5de3f5b8dd336 100644
--- a/drivers/gpu/drm/bridge/tc358762.c
+++ b/drivers/gpu/drm/bridge/tc358762.c
@@ -216,7 +216,7 @@ static int tc358762_probe(struct mipi_dsi_device *dsi)
dsi->lanes = 1;
dsi->format = MIPI_DSI_FMT_RGB888;
dsi->mode_flags = MIPI_DSI_MODE_VIDEO | MIPI_DSI_MODE_VIDEO_SYNC_PULSE |
- MIPI_DSI_MODE_LPM;
+ MIPI_DSI_MODE_LPM | MIPI_DSI_MODE_VIDEO_HSE;
ret = tc358762_parse_dt(ctx);
if (ret < 0)
--
2.40.1
From: Marek Vasut <marex(a)denx.de>
[ Upstream commit 362fa8f6e6a05089872809f4465bab9d011d05b3 ]
This bridge seems to need the HSE packet, otherwise the image is
shifted up and corrupted at the bottom. This makes the bridge
work with Samsung DSIM on i.MX8MM and i.MX8MP.
Signed-off-by: Marek Vasut <marex(a)denx.de>
Reviewed-by: Sam Ravnborg <sam(a)ravnborg.org>
Signed-off-by: Robert Foss <rfoss(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230615201902.566182-3-marex…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/bridge/tc358762.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/tc358762.c b/drivers/gpu/drm/bridge/tc358762.c
index 7f4fce1aa9988..8db981e7759b9 100644
--- a/drivers/gpu/drm/bridge/tc358762.c
+++ b/drivers/gpu/drm/bridge/tc358762.c
@@ -216,7 +216,7 @@ static int tc358762_probe(struct mipi_dsi_device *dsi)
dsi->lanes = 1;
dsi->format = MIPI_DSI_FMT_RGB888;
dsi->mode_flags = MIPI_DSI_MODE_VIDEO | MIPI_DSI_MODE_VIDEO_SYNC_PULSE |
- MIPI_DSI_MODE_LPM;
+ MIPI_DSI_MODE_LPM | MIPI_DSI_MODE_VIDEO_HSE;
ret = tc358762_parse_dt(ctx);
if (ret < 0)
--
2.40.1
Hi,
Would you be interested in acquiring the Healthcare Information and
Management Systems Society Attendees List 2023?
Number of Contacts: 45,789
Cost: $ 2,175
Interested? Email me back; I would love to provide more information on the list.
Kind Regards,
Kris jenner
Marketing Coordinator
Like various other ASUS ExpertBook-s, the ASUS ExpertBook B1402CBA
has an ACPI DSDT table that describes IRQ 1 as ActiveLow while
the kernel overrides it to EdgeHigh.
This prevents the keyboard from working. To fix this issue, add this laptop
to the skip_override_table so that the kernel does not override IRQ 1.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217901
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/resource.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 32cfa3f4efd3..8116b55b6c98 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -439,6 +439,13 @@ static const struct dmi_system_id asus_laptop[] = {
DMI_MATCH(DMI_BOARD_NAME, "S5602ZA"),
},
},
+ {
+ .ident = "Asus ExpertBook B1402CBA",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
+ DMI_MATCH(DMI_BOARD_NAME, "B1402CBA"),
+ },
+ },
{
.ident = "Asus ExpertBook B1502CBA",
.matches = {
--
2.41.0
From: John Ogness <john.ogness(a)linutronix.de>
[ Upstream commit 7b23a66db55ed0a55b020e913f0d6f6d52a1ad2c ]
A semaphore is not NMI-safe, even when using down_trylock(). Both
down_trylock() and up() are using internal spinlocks and up()
might even call wake_up_process().
In the panic() code path it gets even worse because the internal
spinlocks of the semaphore may have been taken by a CPU that has
been stopped.
To reduce the risk of deadlocks caused by the console semaphore in
the panic path, make the following changes:
- First check if any consoles have implemented the unblank()
callback. If not, then there is no reason to take the console
semaphore anyway. (This check is also useful for the non-panic
path since the locking/unlocking of the console lock can be
quite expensive due to console printing.)
- If the panic path is in NMI context, bail out without attempting
to take the console semaphore or calling any unblank() callbacks.
Bailing out is acceptable because console_unblank() would already
bail out if the console semaphore is contended. The alternative of
ignoring the console semaphore and calling the unblank() callbacks
anyway is a bad idea because these callbacks are also not NMI-safe.
If consoles with unblank() callbacks exist and console_unblank() is
called from a non-NMI panic context, it will still attempt a
down_trylock(). This could still result in a deadlock if one of the
stopped CPUs is holding the semaphore internal spinlock. But this
is a risk that the kernel has been (and continues to be) willing
to take.
Signed-off-by: John Ogness <john.ogness(a)linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Signed-off-by: Petr Mladek <pmladek(a)suse.com>
Link: https://lore.kernel.org/r/20230717194607.145135-3-john.ogness@linutronix.de
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/printk/printk.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 357a4d18f6387..7d3f30eb35862 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3045,9 +3045,27 @@ EXPORT_SYMBOL(console_conditional_schedule);
void console_unblank(void)
{
+ bool found_unblank = false;
struct console *c;
int cookie;
+ /*
+ * First check if there are any consoles implementing the unblank()
+ * callback. If not, there is no reason to continue and take the
+ * console lock, which in particular can be dangerous if
+ * @oops_in_progress is set.
+ */
+ cookie = console_srcu_read_lock();
+ for_each_console_srcu(c) {
+ if ((console_srcu_read_flags(c) & CON_ENABLED) && c->unblank) {
+ found_unblank = true;
+ break;
+ }
+ }
+ console_srcu_read_unlock(cookie);
+ if (!found_unblank)
+ return;
+
/*
* Stop console printing because the unblank() callback may
* assume the console is not within its write() callback.
@@ -3056,6 +3074,16 @@ void console_unblank(void)
* In that case, attempt a trylock as best-effort.
*/
if (oops_in_progress) {
+ /* Semaphores are not NMI-safe. */
+ if (in_nmi())
+ return;
+
+ /*
+ * Attempting to trylock the console lock can deadlock
+ * if another CPU was stopped while modifying the
+ * semaphore. "Hope and pray" that this is not the
+ * current situation.
+ */
if (down_trylock_console_sem() != 0)
return;
} else
--
2.40.1
This regression affects all copies of the Dell XPS 15 9560 and Dell Precision
5520 with any SSD including aftermarket ones.
Per the bugzilla discussion here:
https://bugzilla.kernel.org/show_bug.cgi?id=217802 this regression has
been confirmed to also affect 6.5.2 and 6.6-rc1, and affects several
distros.
Known affected branches: 6.1, 6.4, 6.5, 6.6.
It has already been reverted by OpenSUSE and is soon to be reverted in
NixOS. A patch follows, hopefully with the right metadata tags.
I have compiled a kernel 6.1 with the revert and confirmed it now boots
again.
p.s. this is my first time pointing git-send-email at this particular
list, so I'm sorry if I got anything wrong.
Jade
Currently if ucsi_send_command() fails, then we bail out without
clearing EVENT_PENDING flag. So when the next connector change
event comes, ucsi_connector_change() won't queue the con->work,
because of which none of the new events will be processed.
Fix this by clearing EVENT_PENDING flag if ucsi_send_command()
fails.
Cc: <stable(a)vger.kernel.org> # 5.16
Fixes: 512df95b9432 ("usb: typec: ucsi: Better fix for missing unplug events issue")
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/typec/ucsi/ucsi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index c6dfe3d..509c67c 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -884,6 +884,7 @@ static void ucsi_handle_connector_change(struct work_struct *work)
if (ret < 0) {
dev_err(ucsi->dev, "%s: GET_CONNECTOR_STATUS failed (%d)\n",
__func__, ret);
+ clear_bit(EVENT_PENDING, &con->ucsi->flags);
goto out_unlock;
}
--
2.7.4
Hi Pavel,
We have running systems that use COLOR_ID_MULTI. The GPIO toggles
between two colors and we have used the identifier. RGB is not a good
fit since it is not a RGB LED. Please provide guidance.
This patch causes the system to not start: f741121a2251 leds: Fix
BUG_ON check for LED_COLOR_ID_MULTI that is always false
It was also backported to stable causing previously booting systems to
no longer boot.
Best,
Da Xue
Since commit 2d47c6956ab3 ("ubsan: Tighten UBSAN_BOUNDS on GCC"),
UBSAN_BOUNDS no longer pretends 1-element arrays are unbounded. Walking
'element' and 'channel_list' will trigger warnings, so make them proper
flexible arrays.
False positive warnings were:
UBSAN: array-index-out-of-bounds in drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:6984:20
index 1 is out of range for type '__le32 [1]'
UBSAN: array-index-out-of-bounds in drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:1126:27
index 1 is out of range for type '__le16 [1]'
for these lines of code:
6884 ch.chspec = (u16)le32_to_cpu(list->element[i]);
1126 params_le->channel_list[i] = cpu_to_le16(chanspec);
Cc: stable(a)vger.kernel.org # 6.5+
Signed-off-by: Juerg Haefliger <juerg.haefliger(a)canonical.com>
---
v2:
- Use element[] instead of DFA() in brcmf_chanspec_list.
- Add Cc: stable tag
---
.../wireless/broadcom/brcm80211/brcmfmac/fwil_types.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h
index bece26741d3a..611d1a6aabb9 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h
@@ -442,7 +442,12 @@ struct brcmf_scan_params_v2_le {
* fixed parameter portion is assumed, otherwise
* ssid in the fixed portion is ignored
*/
- __le16 channel_list[1]; /* list of chanspecs */
+ union {
+ __le16 padding; /* Reserve space for at least 1 entry for abort
+ * which uses an on stack brcmf_scan_params_v2_le
+ */
+ DECLARE_FLEX_ARRAY(__le16, channel_list); /* chanspecs */
+ };
};
struct brcmf_scan_results {
@@ -702,7 +707,7 @@ struct brcmf_sta_info_le {
struct brcmf_chanspec_list {
__le32 count; /* # of entries */
- __le32 element[1]; /* variable length uint32 list */
+ __le32 element[]; /* variable length uint32 list */
};
/*
--
2.39.2
Hi Greg, Sasha,
This batch contains a fixes for 4.19:
1) Missing fix in 4.19, you can cherry-pick it from
8ca79606cdfd ("netfilter: nft_flow_offload: fix underflow in flowtable reference counter")
2) Oneliner that includes missing chunk in 4.19 backport.
Fixes: 1df28fde1270 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain") in 4.19
This patch you have to manually apply it.
Thanks.
Pablo Neira Ayuso (1):
netfilter: nf_tables: missing NFT_TRANS_PREPARE_ERROR in flowtable deactivatation
wenxu (1):
netfilter: nft_flow_offload: fix underflow in flowtable reference counter
net/netfilter/nf_tables_api.c | 1 +
net/netfilter/nft_flow_offload.c | 3 ---
2 files changed, 1 insertion(+), 3 deletions(-)
--
2.30.2
Daire reported a few issues with MPTCP where some connections were
stalled in different states. Paolo did a great job fixing them.
Patch 1 fixes bogus receive window shrinkage with multiple subflows. Due
to a race condition and unlucky circumstances, that may lead to
TCP-level window shrinkage, and the connection being stalled on the
sender end.
Patch 2 is a preparation for patch 3 which processes pending subflow
errors on close. Without that and under specific circumstances, the
MPTCP-level socket might not switch to the CLOSE state and stall.
Patch 4 is also a preparation patch for the next one. Patch 5 fixes
MPTCP connections not switching to the CLOSE state when all subflows
have been closed but no DATA_FIN have been exchanged to explicitly close
the MPTCP connection. Now connections in such state will switch to the
CLOSE state after a timeout, still allowing the "make-after-break"
feature but making sure connections don't stall forever. It will be
possible to modify this timeout -- currently matching TCP TIMEWAIT value
(60 seconds) -- in a future version.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Paolo Abeni (5):
mptcp: fix bogus receive window shrinkage with multiple subflows
mptcp: move __mptcp_error_report in protocol.c
mptcp: process pending subflow error on close
mptcp: rename timer related helper to less confusing names
mptcp: fix dangling connection hang-up
net/mptcp/options.c | 5 +-
net/mptcp/protocol.c | 165 +++++++++++++++++++++++++++++++--------------------
net/mptcp/protocol.h | 24 +++++++-
net/mptcp/subflow.c | 39 +-----------
4 files changed, 130 insertions(+), 103 deletions(-)
---
base-commit: 615efed8b63f60ddd69c0b8f32f7783859034fc2
change-id: 20230915-upstream-net-20230915-mptcp-hanging-conn-0609338b1728
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
The patch titled
Subject: fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fs-binfmt_elf_efpic-fix-personality-for-elf-fdpic.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Greg Ungerer <gerg(a)kernel.org>
Subject: fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
Date: Thu, 7 Sep 2023 11:18:08 +1000
The elf-fdpic loader hard sets the process personality to either
PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
binaries (in this case they would be constant displacement compiled with
-pie for example). The problem with that is that it will lose any other
bits that may be in the ELF header personality (such as the "bug
emulation" bits).
On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
normal 32bit binary - as opposed to a legacy 26bit address binary. This
matters since start_thread() will set the ARM CPSR register as required
based on this flag. If the elf-fdpic loader loses this bit the process
will be mis-configured and crash out pretty quickly.
Modify elf-fdpic loader personality setting so that it preserves the upper
three bytes by using the SET_PERSONALITY macro to set it. This macro in
the generic case sets PER_LINUX and preserves the upper bytes.
Architectures can override this for their specific use case, and ARM does
exactly this.
The problem shows up quite easily running under qemu using the ARM
architecture, but not necessarily on all types of real ARM hardware. If
the underlying ARM processor does not support the legacy 26-bit addressing
mode then everything will work as expected.
Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
Fixes: 1bde925d23547 ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
Signed-off-by: Greg Ungerer <gerg(a)kernel.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: Greg Ungerer <gerg(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/binfmt_elf_fdpic.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/binfmt_elf_fdpic.c~fs-binfmt_elf_efpic-fix-personality-for-elf-fdpic
+++ a/fs/binfmt_elf_fdpic.c
@@ -345,10 +345,9 @@ static int load_elf_fdpic_binary(struct
/* there's now no turning back... the old userspace image is dead,
* defunct, deceased, etc.
*/
+ SET_PERSONALITY(exec_params.hdr);
if (elf_check_fdpic(&exec_params.hdr))
- set_personality(PER_LINUX_FDPIC);
- else
- set_personality(PER_LINUX);
+ current->personality |= PER_LINUX_FDPIC;
if (elf_read_implies_exec(&exec_params.hdr, executable_stack))
current->personality |= READ_IMPLIES_EXEC;
_
Patches currently in -mm which might be from gerg(a)kernel.org are
fs-binfmt_elf_efpic-fix-personality-for-elf-fdpic.patch
Building parisc:allnoconfig ... failed
--------------
Error log:
arch/parisc/kernel/processor.c: In function 'show_cpuinfo':
arch/parisc/kernel/processor.c:443:30: error: 'cpuinfo' undeclared (first use in this function)
443 | cpuinfo->loops_per_jiffy / (500000 / HZ),
Caused by 'parisc: Fix /proc/cpuinfo output for lscpu' which
moves the declaration of cpuinfo inside an #ifdef but still uses
it outside of it in v5.10.y and older.
That either needs to be dropped, adjusted, or commit 93346da8ff47 ("parisc:
Drop loops_per_jiffy from per_cpu struct") needs to be applied as well
(tested with v5.10.y.queue).
Guenter
make tools/perf fails with
gcc: fatal error: no input files
Bisect points to 'perf build: Update build rule for generated files'
which adds
+
+# pmu-events.c file is generated in the OUTPUT directory so it needs a
+# separate rule to depend on it properly
+$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
+ $(call rule_mkdir)
+ $(call if_changed_dep,cc_o_c)
but there is no PMU_EVENTS_C in v6.1.y.
Guenter
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7822a8913f4c51c7d1aff793b525d60c3384fb5b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091718-unequal-drinkable-e89d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
7822a8913f4c ("perf build: Update build rule for generated files")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7822a8913f4c51c7d1aff793b525d60c3384fb5b Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Thu, 27 Jul 2023 19:24:46 -0700
Subject: [PATCH] perf build: Update build rule for generated files
The bison and flex generate C files from the source (.y and .l)
files. When O= option is used, they are saved in a separate directory
but the default build rule assumes the .C files are in the source
directory. So it might read invalid file if there are generated files
from an old version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3
$ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2
$ mkdir build # create a build directory
$ make -C tools/perf O=build # build in a different directory but it
# refers files in the source directory
Let's update the build rule to specify those cases explicitly to depend
on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the
include path too.
Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Anup Sharma <anupnewsmail(a)gmail.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230728022447.1323563-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index 89430338a3d9..fac42486a8cf 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -117,6 +117,16 @@ $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_s_c)
+# bison and flex files are generated in the OUTPUT directory
+# so it needs a separate rule to depend on them properly
+$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
+$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
# Gather build data:
# obj-y - list of build objects
# subdir-y - list of directories to nest
diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
index 150765f2baee..1d18bb89402e 100644
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@@ -35,3 +35,9 @@ $(PMU_EVENTS_C): $(JSON) $(JSON_TEST) $(JEVENTS_PY) $(METRIC_PY) $(METRIC_TEST_L
$(call rule_mkdir)
$(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) $(JEVENTS_MODEL) pmu-events/arch $@
endif
+
+# pmu-events.c file is generated in the OUTPUT directory so it needs a
+# separate rule to depend on it properly
+$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
+ $(call rule_mkdir)
+ $(call if_changed_dep,cc_o_c)
17. September 2023.
Hallo,
Ich möchte Ihnen einen Geschäftsvorschlag mitteilen. Für weitere
Details antworten Sie auf Englisch.
Grüße
Frau Victoria Cleland
_________________________
Sekretär: Wajdi Chaabani
stable-rc/linux-5.4.y build: 17 builds: 0 failed, 17 passed, 34 warnings (v5.4.256-310-g2c8b78bf17ff)
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-5.4.y/kernel/v5.4.256-310…
Tree: stable-rc
Branch: linux-5.4.y
Git Describe: v5.4.256-310-g2c8b78bf17ff
Git Commit: 2c8b78bf17fff04480dd96ee53f39d222c304903
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Built: 7 unique architectures
Warnings Detected:
arc:
arm64:
defconfig (gcc-10): 3 warnings
defconfig+arm64-chromebook (gcc-10): 4 warnings
arm:
imx_v6_v7_defconfig (gcc-10): 1 warning
omap2plus_defconfig (gcc-10): 1 warning
i386:
allnoconfig (gcc-10): 2 warnings
i386_defconfig (gcc-10): 3 warnings
tinyconfig (gcc-10): 2 warnings
mips:
riscv:
x86_64:
allnoconfig (gcc-10): 4 warnings
tinyconfig (gcc-10): 4 warnings
x86_64_defconfig (gcc-10): 5 warnings
x86_64_defconfig+x86-chromebook (gcc-10): 5 warnings
Warnings summary:
7 ld: warning: creating DT_TEXTREL in a PIE
7 fs/quota/dquot.c:2611:1: warning: label ‘out’ defined but not used [-Wunused-label]
4 ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
4 arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
3 ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
2 arch/x86/entry/entry_64.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
2 arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x1c1: unsupported intra-function call
2 arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x151: unsupported intra-function call
2 arch/x86/entry/entry_64.S:1756: Warning: no instruction mnemonic suffix given and no register operands; using default for `sysret'
1 drivers/gpu/drm/mediatek/mtk_drm_gem.c:273:10: warning: returning ‘int’ from a function with return type ‘void *’ makes pointer from integer without a cast [-Wint-conversion]
Section mismatches summary:
1 WARNING: vmlinux.o(___ksymtab_gpl+vic_init_cascaded+0x0): Section mismatch in reference from the variable __ksymtab_vic_init_cascaded to the function .init.text:vic_init_cascaded()
================================================================================
Detailed per-defconfig build reports:
--------------------------------------------------------------------------------
32r2el_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches
--------------------------------------------------------------------------------
allnoconfig (x86_64, gcc-10) — PASS, 0 errors, 4 warnings, 0 section mismatches
Warnings:
arch/x86/entry/entry_64.S:1756: Warning: no instruction mnemonic suffix given and no register operands; using default for `sysret'
arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x151: unsupported intra-function call
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
allnoconfig (i386, gcc-10) — PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
defconfig (riscv, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches
--------------------------------------------------------------------------------
defconfig (arm64, gcc-10) — PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/quota/dquot.c:2611:1: warning: label ‘out’ defined but not used [-Wunused-label]
--------------------------------------------------------------------------------
defconfig+arm64-chromebook (arm64, gcc-10) — PASS, 0 errors, 4 warnings, 0 section mismatches
Warnings:
arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/quota/dquot.c:2611:1: warning: label ‘out’ defined but not used [-Wunused-label]
drivers/gpu/drm/mediatek/mtk_drm_gem.c:273:10: warning: returning ‘int’ from a function with return type ‘void *’ makes pointer from integer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
haps_hs_smp_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches
--------------------------------------------------------------------------------
i386_defconfig (i386, gcc-10) — PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
fs/quota/dquot.c:2611:1: warning: label ‘out’ defined but not used [-Wunused-label]
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
imx_v6_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/quota/dquot.c:2611:1: warning: label ‘out’ defined but not used [-Wunused-label]
--------------------------------------------------------------------------------
multi_v5_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches
Section mismatches:
WARNING: vmlinux.o(___ksymtab_gpl+vic_init_cascaded+0x0): Section mismatch in reference from the variable __ksymtab_vic_init_cascaded to the function .init.text:vic_init_cascaded()
--------------------------------------------------------------------------------
multi_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches
--------------------------------------------------------------------------------
omap2plus_defconfig (arm, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/quota/dquot.c:2611:1: warning: label ‘out’ defined but not used [-Wunused-label]
--------------------------------------------------------------------------------
tinyconfig (i386, gcc-10) — PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
tinyconfig (x86_64, gcc-10) — PASS, 0 errors, 4 warnings, 0 section mismatches
Warnings:
arch/x86/entry/entry_64.S:1756: Warning: no instruction mnemonic suffix given and no register operands; using default for `sysret'
arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x151: unsupported intra-function call
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
vexpress_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches
--------------------------------------------------------------------------------
x86_64_defconfig (x86_64, gcc-10) — PASS, 0 errors, 5 warnings, 0 section mismatches
Warnings:
arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x1c1: unsupported intra-function call
arch/x86/entry/entry_64.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
fs/quota/dquot.c:2611:1: warning: label ‘out’ defined but not used [-Wunused-label]
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
x86_64_defconfig+x86-chromebook (x86_64, gcc-10) — PASS, 0 errors, 5 warnings, 0 section mismatches
Warnings:
arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x1c1: unsupported intra-function call
arch/x86/entry/entry_64.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
fs/quota/dquot.c:2611:1: warning: label ‘out’ defined but not used [-Wunused-label]
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
---
For more info write to <info(a)kernelci.org>
Hi, linux-stable-mirror
I'm thrilled to share that Xiamen Oready Industry & Trade Co.,Ltd is one of the leading bags manufacturer from China.
I would like to recommend our drawstring bag as below, it is a good choice for giveaways. Different colors are available, we can print your logo and text on it, the MOQ is only 1,000 pcs.
If you have any inquiry, please don't hesitate to contact me.
Best regards,
Steven Xiu
Xiamen Oready Industry & Trade Co.,Ltd.
Our main products: backpacks, sports bags, cooler bags, shopping bags, tool bags, tactical bags, medical bags, waterproof bags, bike bags, helmet bags and more...
If you no longer wish to receive our emails, you can click unsubscribe.
Where's it going? And then ask yourself: Do I really value all this stuff? And get curious about what you're feeling when you're spending
" No, you have an expectation of what you're going to see
We talked about running
2023/9/17 17:18:51
commit 4fe4a6374c4db9ae2b849b61e84b58685dca565a upstream.
We have originally guarded fiddling with CHECKFLAGS in our arch Makefile
by checking for the CONFIG_MIPS variable, not set for targets such as
`distclean', etc. that neither include `.config' nor use the compiler.
Starting from commit 805b2e1d427a ("kbuild: include Makefile.compiler
only when compiler is needed") we have had a generic `need-compiler'
variable explicitly telling us if the compiler will be used and thus its
capabilities need to be checked and expressed in the form of compilation
flags. If this variable is not set, then `make' functions such as
`cc-option' are undefined, causing all kinds of weirdness to happen if
we expect specific results to be returned, most recently:
cc1: error: '-mloongson-mmi' must be used with '-mhard-float'
messages with configurations such as `fuloong2e_defconfig' and the
`modules_install' target, which does include `.config' and yet does not
use the compiler.
Replace the check for CONFIG_MIPS with one for `need-compiler' instead,
so as to prevent the compiler from being ever called for CHECKFLAGS when
not needed.
Reported-by: Guillaume Tucker <guillaume.tucker(a)collabora.com>
Closes: https://lore.kernel.org/r/85031c0c-d981-031e-8a50-bc4fad2ddcd8@collabora.co…
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed")
Cc: stable(a)vger.kernel.org # v5.13+
Reported-by: "kernelci.org bot" <bot(a)kernelci.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
---
Hi,
This is a version of commit 4fe4a6374c4d for 6.1-stable and before,
resolving a conflict due to a change in how $(CHECKFLAGS) is set.
No functional change, just a mechanical update. Please apply.
Maciej
---
arch/mips/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index fe64ad43ba88..745d8ddd1fb3 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -350,7 +350,7 @@ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
KBUILD_LDFLAGS += -m $(ld-emul)
-ifdef CONFIG_MIPS
+ifdef need-compiler
CHECKFLAGS += $(shell $(CC) $(KBUILD_CFLAGS) -dM -E -x c /dev/null | \
egrep -vw '__GNUC_(MINOR_|PATCHLEVEL_)?_' | \
sed -e "s/^\#define /-D'/" -e "s/ /'='/" -e "s/$$/'/" -e 's/\$$/&&/g')
--
2.20.1
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 1482650bc7ef01ebb24ec2c3a2e4d50e45da4d8c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091726-surrender-surviving-7052@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
1482650bc7ef ("drm/amd/display: update blank state on ODM changes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1482650bc7ef01ebb24ec2c3a2e4d50e45da4d8c Mon Sep 17 00:00:00 2001
From: Wenjing Liu <wenjing.liu(a)amd.com>
Date: Mon, 14 Aug 2023 17:11:16 -0400
Subject: [PATCH] drm/amd/display: update blank state on ODM changes
When we are dynamically adding new ODM slices, we didn't update
blank state, if the pipe used by new ODM slice is previously blanked,
we will continue outputting blank pixel data on that slice causing
right half of the screen showing blank image.
The previous fix was a temporary hack to directly update current state
when committing new state. This could potentially cause hw and sw
state synchronization issues and it is not permitted by dc commit
design.
Cc: stable(a)vger.kernel.org
Fixes: 7fbf451e7639 ("drm/amd/display: Reinit DPG when exiting dynamic ODM")
Reviewed-by: Dillon Varone <dillon.varone(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index 65fa9e21ad9c..87c1ac40240b 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1106,29 +1106,6 @@ void dcn20_blank_pixel_data(
v_active,
offset);
- if (!blank && dc->debug.enable_single_display_2to1_odm_policy) {
- /* when exiting dynamic ODM need to reinit DPG state for unused pipes */
- struct pipe_ctx *old_odm_pipe = dc->current_state->res_ctx.pipe_ctx[pipe_ctx->pipe_idx].next_odm_pipe;
-
- odm_pipe = pipe_ctx->next_odm_pipe;
-
- while (old_odm_pipe) {
- if (!odm_pipe || old_odm_pipe->pipe_idx != odm_pipe->pipe_idx)
- dc->hwss.set_disp_pattern_generator(dc,
- old_odm_pipe,
- CONTROLLER_DP_TEST_PATTERN_VIDEOMODE,
- CONTROLLER_DP_COLOR_SPACE_UDEFINED,
- COLOR_DEPTH_888,
- NULL,
- 0,
- 0,
- 0);
- old_odm_pipe = old_odm_pipe->next_odm_pipe;
- if (odm_pipe)
- odm_pipe = odm_pipe->next_odm_pipe;
- }
- }
-
if (!blank)
if (stream_res->abm) {
dc->hwss.set_pipe(pipe_ctx);
@@ -1722,11 +1699,16 @@ static void dcn20_program_pipe(
struct dc_state *context)
{
struct dce_hwseq *hws = dc->hwseq;
- /* Only need to unblank on top pipe */
- if ((pipe_ctx->update_flags.bits.enable || pipe_ctx->stream->update_flags.bits.abm_level)
- && !pipe_ctx->top_pipe && !pipe_ctx->prev_odm_pipe)
- hws->funcs.blank_pixel_data(dc, pipe_ctx, !pipe_ctx->plane_state->visible);
+ /* Only need to unblank on top pipe */
+ if (resource_is_pipe_type(pipe_ctx, OTG_MASTER)) {
+ if (pipe_ctx->update_flags.bits.enable ||
+ pipe_ctx->update_flags.bits.odm ||
+ pipe_ctx->stream->update_flags.bits.abm_level)
+ hws->funcs.blank_pixel_data(dc, pipe_ctx,
+ !pipe_ctx->plane_state ||
+ !pipe_ctx->plane_state->visible);
+ }
/* Only update TG on top pipe */
if (pipe_ctx->update_flags.bits.global_sync && !pipe_ctx->top_pipe
Greg & Sasha,
can you please queue-up this upstream patch for kernel v6.5-stable ?
commit 3cec50490969afd4a76ccee441f747d869ccff77
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Sat Sep 16 12:31:42 2023 -0700
vm: fix move_vma() memory accounting being off
Thanks,
Helge
On 9/16/23 23:17, Linus Torvalds wrote:
> On Sat, 16 Sept 2023 at 12:31, Linus Torvalds
> <torvalds(a)linux-foundation.org> wrote:
>>
>> Does the attached patch fix the problem?
>
> So while I didn't confirm the fix myself, I'm pretty sure that was it.
> Getting the return value wrong would cause an incorrect extra
> vm_acct_memory() call in the non-error case when VM_ACCOUNT is set
> (and mean the loss of one in the error case, but the error case never
> happens in practice).
>
> Which then causes 'vm_committed_as' to grow when it shouldn't, and
> causes exactly that "Committed_AS" in /proc/meminfo to be off.
>
> So here's the same patch, but now with a proper commit message etc.
>
> I haven't pushed it out (because it would be lovely to get a
> "Tested-by" for it, and that will make the commit ID change), but I'll
> probably do so later today, with or without confirmation, because it
> does seem to be the problem.
>
> Linus
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6bfe3959b0e7a526f5c64747801a8613f002f05a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091605-gargle-tweet-b0a7@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
6bfe3959b0e7 ("btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super")
aefd7f706556 ("btrfs: promote debugging asserts to full-fledged checks in validate_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6bfe3959b0e7a526f5c64747801a8613f002f05a Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Mon, 31 Jul 2023 19:16:35 +0800
Subject: [PATCH] btrfs: compare the correct fsid/metadata_uuid in
btrfs_validate_super
The function btrfs_validate_super() should verify the metadata_uuid in
the provided superblock argument. Because, all its callers expect it to
do that.
Such as in the following stacks:
write_all_supers()
sb = fs_info->super_for_commit;
btrfs_validate_write_super(.., sb)
btrfs_validate_super(.., sb, ..)
scrub_one_super()
btrfs_validate_super(.., sb, ..)
And
check_dev_super()
btrfs_validate_super(.., sb, ..)
However, it currently verifies the fs_info::super_copy::metadata_uuid
instead. Fix this using the correct metadata_uuid in the superblock
argument.
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 49f405495e34..32ec651c570f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2391,13 +2391,11 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (btrfs_fs_incompat(fs_info, METADATA_UUID) &&
- memcmp(fs_info->fs_devices->metadata_uuid,
- fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_info->fs_devices->metadata_uuid, btrfs_sb_fsid_ptr(sb),
+ BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
"superblock metadata_uuid doesn't match metadata uuid of fs_devices: %pU != %pU",
- fs_info->super_copy->metadata_uuid,
- fs_info->fs_devices->metadata_uuid);
+ btrfs_sb_fsid_ptr(sb), fs_info->fs_devices->metadata_uuid);
ret = -EINVAL;
}
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091639-rippling-ought-e714@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
the first case of adjusting the best extent, that is, new_bex_end > 0, the
following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and use extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 4490e803e1fe9fab8db5025e44e23b55df54078b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091642-bonding-pessimism-6fe2@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
4490e803e1fe ("btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4490e803e1fe9fab8db5025e44e23b55df54078b Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 26 Jul 2023 16:56:57 +0100
Subject: [PATCH] btrfs: don't start transaction when joining with
TRANS_JOIN_NOSTART
When joining a transaction with TRANS_JOIN_NOSTART, if we don't find a
running transaction we end up creating one. This goes against the purpose
of TRANS_JOIN_NOSTART which is to join a running transaction if its state
is at or below the state TRANS_STATE_COMMIT_START, otherwise return an
-ENOENT error and don't start a new transaction. So fix this to not create
a new transaction if there's no running transaction at or below that
state.
CC: stable(a)vger.kernel.org # 4.14+
Fixes: a6d155d2e363 ("Btrfs: fix deadlock between fiemap and transaction commits")
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 815f61d6b506..6a2a12593183 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -292,10 +292,11 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
spin_unlock(&fs_info->trans_lock);
/*
- * If we are ATTACH, we just want to catch the current transaction,
- * and commit it. If there is no transaction, just return ENOENT.
+ * If we are ATTACH or TRANS_JOIN_NOSTART, we just want to catch the
+ * current transaction, and commit it. If there is no transaction, just
+ * return ENOENT.
*/
- if (type == TRANS_ATTACH)
+ if (type == TRANS_ATTACH || type == TRANS_JOIN_NOSTART)
return -ENOENT;
/*
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 6bfe3959b0e7a526f5c64747801a8613f002f05a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091629-presuming-concrete-30ca@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
6bfe3959b0e7 ("btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6bfe3959b0e7a526f5c64747801a8613f002f05a Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Mon, 31 Jul 2023 19:16:35 +0800
Subject: [PATCH] btrfs: compare the correct fsid/metadata_uuid in
btrfs_validate_super
The function btrfs_validate_super() should verify the metadata_uuid in
the provided superblock argument. Because, all its callers expect it to
do that.
Such as in the following stacks:
write_all_supers()
sb = fs_info->super_for_commit;
btrfs_validate_write_super(.., sb)
btrfs_validate_super(.., sb, ..)
scrub_one_super()
btrfs_validate_super(.., sb, ..)
And
check_dev_super()
btrfs_validate_super(.., sb, ..)
However, it currently verifies the fs_info::super_copy::metadata_uuid
instead. Fix this using the correct metadata_uuid in the superblock
argument.
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 49f405495e34..32ec651c570f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2391,13 +2391,11 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (btrfs_fs_incompat(fs_info, METADATA_UUID) &&
- memcmp(fs_info->fs_devices->metadata_uuid,
- fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_info->fs_devices->metadata_uuid, btrfs_sb_fsid_ptr(sb),
+ BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
"superblock metadata_uuid doesn't match metadata uuid of fs_devices: %pU != %pU",
- fs_info->super_copy->metadata_uuid,
- fs_info->fs_devices->metadata_uuid);
+ btrfs_sb_fsid_ptr(sb), fs_info->fs_devices->metadata_uuid);
ret = -EINVAL;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6bfe3959b0e7a526f5c64747801a8613f002f05a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091616-alumni-depose-cd8b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
6bfe3959b0e7 ("btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6bfe3959b0e7a526f5c64747801a8613f002f05a Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Mon, 31 Jul 2023 19:16:35 +0800
Subject: [PATCH] btrfs: compare the correct fsid/metadata_uuid in
btrfs_validate_super
The function btrfs_validate_super() should verify the metadata_uuid in
the provided superblock argument. Because, all its callers expect it to
do that.
Such as in the following stacks:
write_all_supers()
sb = fs_info->super_for_commit;
btrfs_validate_write_super(.., sb)
btrfs_validate_super(.., sb, ..)
scrub_one_super()
btrfs_validate_super(.., sb, ..)
And
check_dev_super()
btrfs_validate_super(.., sb, ..)
However, it currently verifies the fs_info::super_copy::metadata_uuid
instead. Fix this using the correct metadata_uuid in the superblock
argument.
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 49f405495e34..32ec651c570f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2391,13 +2391,11 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (btrfs_fs_incompat(fs_info, METADATA_UUID) &&
- memcmp(fs_info->fs_devices->metadata_uuid,
- fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_info->fs_devices->metadata_uuid, btrfs_sb_fsid_ptr(sb),
+ BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
"superblock metadata_uuid doesn't match metadata uuid of fs_devices: %pU != %pU",
- fs_info->super_copy->metadata_uuid,
- fs_info->fs_devices->metadata_uuid);
+ btrfs_sb_fsid_ptr(sb), fs_info->fs_devices->metadata_uuid);
ret = -EINVAL;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6bfe3959b0e7a526f5c64747801a8613f002f05a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091602-electable-stench-1002@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6bfe3959b0e7 ("btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6bfe3959b0e7a526f5c64747801a8613f002f05a Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Mon, 31 Jul 2023 19:16:35 +0800
Subject: [PATCH] btrfs: compare the correct fsid/metadata_uuid in
btrfs_validate_super
The function btrfs_validate_super() should verify the metadata_uuid in
the provided superblock argument. Because, all its callers expect it to
do that.
Such as in the following stacks:
write_all_supers()
sb = fs_info->super_for_commit;
btrfs_validate_write_super(.., sb)
btrfs_validate_super(.., sb, ..)
scrub_one_super()
btrfs_validate_super(.., sb, ..)
And
check_dev_super()
btrfs_validate_super(.., sb, ..)
However, it currently verifies the fs_info::super_copy::metadata_uuid
instead. Fix this using the correct metadata_uuid in the superblock
argument.
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 49f405495e34..32ec651c570f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2391,13 +2391,11 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (btrfs_fs_incompat(fs_info, METADATA_UUID) &&
- memcmp(fs_info->fs_devices->metadata_uuid,
- fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_info->fs_devices->metadata_uuid, btrfs_sb_fsid_ptr(sb),
+ BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
"superblock metadata_uuid doesn't match metadata uuid of fs_devices: %pU != %pU",
- fs_info->super_copy->metadata_uuid,
- fs_info->fs_devices->metadata_uuid);
+ btrfs_sb_fsid_ptr(sb), fs_info->fs_devices->metadata_uuid);
ret = -EINVAL;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a6496849671a5bc9218ecec25a983253b34351b1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091623-platform-greedily-f4d0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a6496849671a ("btrfs: fix start transaction qgroup rsv double free")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6496849671a5bc9218ecec25a983253b34351b1 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Fri, 21 Jul 2023 09:02:07 -0700
Subject: [PATCH] btrfs: fix start transaction qgroup rsv double free
btrfs_start_transaction reserves metadata space of the PERTRANS type
before it identifies a transaction to start/join. This allows flushing
when reserving that space without a deadlock. However, it results in a
race which temporarily breaks qgroup rsv accounting.
T1 T2
start_transaction
do_stuff
start_transaction
qgroup_reserve_meta_pertrans
commit_transaction
qgroup_free_meta_all_pertrans
hit an error starting txn
goto reserve_fail
qgroup_free_meta_pertrans (already freed!)
The basic issue is that there is nothing preventing another commit from
committing before start_transaction finishes (in fact sometimes we
intentionally wait for it) so any error path that frees the reserve is
at risk of this race.
While this exact space was getting freed anyway, and it's not a huge
deal to double free it (just a warning, the free code catches this), it
can result in incorrectly freeing some other pertrans reservation in
this same reservation, which could then lead to spuriously granting
reservations we might not have the space for. Therefore, I do believe it
is worth fixing.
To fix it, use the existing prealloc->pertrans conversion mechanism.
When we first reserve the space, we reserve prealloc space and only when
we are sure we have a transaction do we convert it to pertrans. This way
any racing commits do not blow away our reservation, but we still get a
pertrans reservation that is freed when _this_ transaction gets committed.
This issue can be reproduced by running generic/269 with either qgroups
or squotas enabled via mkfs on the scratch device.
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
CC: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4bb9716ad24a..815f61d6b506 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -591,8 +591,13 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
u64 delayed_refs_bytes = 0;
qgroup_reserved = num_items * fs_info->nodesize;
- ret = btrfs_qgroup_reserve_meta_pertrans(root, qgroup_reserved,
- enforce_qgroups);
+ /*
+ * Use prealloc for now, as there might be a currently running
+ * transaction that could free this reserved space prematurely
+ * by committing.
+ */
+ ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserved,
+ enforce_qgroups, false);
if (ret)
return ERR_PTR(ret);
@@ -705,6 +710,14 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
h->reloc_reserved = reloc_reserved;
}
+ /*
+ * Now that we have found a transaction to be a part of, convert the
+ * qgroup reservation from prealloc to pertrans. A different transaction
+ * can't race in and free our pertrans out from under us.
+ */
+ if (qgroup_reserved)
+ btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
+
got_it:
if (!current->journal_info)
current->journal_info = h;
@@ -752,7 +765,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
btrfs_block_rsv_release(fs_info, &fs_info->trans_block_rsv,
num_bytes, NULL);
reserve_fail:
- btrfs_qgroup_free_meta_pertrans(root, qgroup_reserved);
+ btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved);
return ERR_PTR(ret);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6bfe3959b0e7a526f5c64747801a8613f002f05a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091611-earring-chowder-4da2@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
6bfe3959b0e7 ("btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super")
aefd7f706556 ("btrfs: promote debugging asserts to full-fledged checks in validate_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6bfe3959b0e7a526f5c64747801a8613f002f05a Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Mon, 31 Jul 2023 19:16:35 +0800
Subject: [PATCH] btrfs: compare the correct fsid/metadata_uuid in
btrfs_validate_super
The function btrfs_validate_super() should verify the metadata_uuid in
the provided superblock argument. Because, all its callers expect it to
do that.
Such as in the following stacks:
write_all_supers()
sb = fs_info->super_for_commit;
btrfs_validate_write_super(.., sb)
btrfs_validate_super(.., sb, ..)
scrub_one_super()
btrfs_validate_super(.., sb, ..)
And
check_dev_super()
btrfs_validate_super(.., sb, ..)
However, it currently verifies the fs_info::super_copy::metadata_uuid
instead. Fix this using the correct metadata_uuid in the superblock
argument.
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 49f405495e34..32ec651c570f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2391,13 +2391,11 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (btrfs_fs_incompat(fs_info, METADATA_UUID) &&
- memcmp(fs_info->fs_devices->metadata_uuid,
- fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_info->fs_devices->metadata_uuid, btrfs_sb_fsid_ptr(sb),
+ BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
"superblock metadata_uuid doesn't match metadata uuid of fs_devices: %pU != %pU",
- fs_info->super_copy->metadata_uuid,
- fs_info->fs_devices->metadata_uuid);
+ btrfs_sb_fsid_ptr(sb), fs_info->fs_devices->metadata_uuid);
ret = -EINVAL;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e28b02118b94e42be3355458a2406c6861e2dd32
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091632-graceful-actress-f418@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
e28b02118b94 ("btrfs: free qgroup rsv on io failure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e28b02118b94e42be3355458a2406c6861e2dd32 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Fri, 21 Jul 2023 09:02:06 -0700
Subject: [PATCH] btrfs: free qgroup rsv on io failure
If we do a write whose bio suffers an error, we will never reclaim the
qgroup reserved space for it. We allocate the space in the write_iter
codepath, then release the reservation as we allocate the ordered
extent, but we only create a delayed ref if the ordered extent finishes.
If it has an error, we simply leak the rsv. This is apparent in running
any error injecting (dmerror) fstests like btrfs/146 or btrfs/160. Such
tests fail due to dmesg on umount complaining about the leaked qgroup
data space.
When we clean up other aspects of space on failed ordered_extents, also
free the qgroup rsv.
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
CC: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5508597be614..8e53df3516c2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3358,6 +3358,13 @@ int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent)
btrfs_free_reserved_extent(fs_info,
ordered_extent->disk_bytenr,
ordered_extent->disk_num_bytes, 1);
+ /*
+ * Actually free the qgroup rsv which was released when
+ * the ordered extent was created.
+ */
+ btrfs_qgroup_free_refroot(fs_info, inode->root->root_key.objectid,
+ ordered_extent->qgroup_rsv,
+ BTRFS_QGROUP_RSV_DATA);
}
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a6496849671a5bc9218ecec25a983253b34351b1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091614-retry-numbing-c815@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a6496849671a ("btrfs: fix start transaction qgroup rsv double free")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6496849671a5bc9218ecec25a983253b34351b1 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Fri, 21 Jul 2023 09:02:07 -0700
Subject: [PATCH] btrfs: fix start transaction qgroup rsv double free
btrfs_start_transaction reserves metadata space of the PERTRANS type
before it identifies a transaction to start/join. This allows flushing
when reserving that space without a deadlock. However, it results in a
race which temporarily breaks qgroup rsv accounting.
T1 T2
start_transaction
do_stuff
start_transaction
qgroup_reserve_meta_pertrans
commit_transaction
qgroup_free_meta_all_pertrans
hit an error starting txn
goto reserve_fail
qgroup_free_meta_pertrans (already freed!)
The basic issue is that there is nothing preventing another commit from
committing before start_transaction finishes (in fact sometimes we
intentionally wait for it) so any error path that frees the reserve is
at risk of this race.
While this exact space was getting freed anyway, and it's not a huge
deal to double free it (just a warning, the free code catches this), it
can result in incorrectly freeing some other pertrans reservation in
this same reservation, which could then lead to spuriously granting
reservations we might not have the space for. Therefore, I do believe it
is worth fixing.
To fix it, use the existing prealloc->pertrans conversion mechanism.
When we first reserve the space, we reserve prealloc space and only when
we are sure we have a transaction do we convert it to pertrans. This way
any racing commits do not blow away our reservation, but we still get a
pertrans reservation that is freed when _this_ transaction gets committed.
This issue can be reproduced by running generic/269 with either qgroups
or squotas enabled via mkfs on the scratch device.
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
CC: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4bb9716ad24a..815f61d6b506 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -591,8 +591,13 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
u64 delayed_refs_bytes = 0;
qgroup_reserved = num_items * fs_info->nodesize;
- ret = btrfs_qgroup_reserve_meta_pertrans(root, qgroup_reserved,
- enforce_qgroups);
+ /*
+ * Use prealloc for now, as there might be a currently running
+ * transaction that could free this reserved space prematurely
+ * by committing.
+ */
+ ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserved,
+ enforce_qgroups, false);
if (ret)
return ERR_PTR(ret);
@@ -705,6 +710,14 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
h->reloc_reserved = reloc_reserved;
}
+ /*
+ * Now that we have found a transaction to be a part of, convert the
+ * qgroup reservation from prealloc to pertrans. A different transaction
+ * can't race in and free our pertrans out from under us.
+ */
+ if (qgroup_reserved)
+ btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
+
got_it:
if (!current->journal_info)
current->journal_info = h;
@@ -752,7 +765,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
btrfs_block_rsv_release(fs_info, &fs_info->trans_block_rsv,
num_bytes, NULL);
reserve_fail:
- btrfs_qgroup_free_meta_pertrans(root, qgroup_reserved);
+ btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved);
return ERR_PTR(ret);
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 35588314e963938dfdcdb792c9170108399377d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091615-moodiness-twelve-b779@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
35588314e963 ("drm/amdgpu: fix amdgpu_cs_p1_user_fence")
ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3")
8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3")
f2cd6b26922e ("drm/amdkfd: fix stack size in svm_range_validate_and_map")
3af470cbcc9f ("drm/amdkfd: Fix an issue at userptr buffer validation process.")
c103a23f2f29 ("drm/amd: Convert amdgpu to use suballocation helper.")
aebd8f0c6f82 ("Merge v6.2-rc6 into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35588314e963938dfdcdb792c9170108399377d6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig(a)amd.com>
Date: Fri, 25 Aug 2023 15:28:00 +0200
Subject: [PATCH] drm/amdgpu: fix amdgpu_cs_p1_user_fence
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The offset is just 32bits here so this can potentially overflow if
somebody specifies a large value. Instead reduce the size to calculate
the last possible offset.
The error handling path incorrectly drops the reference to the user
fence BO resulting in potential reference count underflow.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 49dd9aa8da70..efdb1c48f431 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -127,7 +127,6 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
{
struct drm_gem_object *gobj;
unsigned long size;
- int r;
gobj = drm_gem_object_lookup(p->filp, data->handle);
if (gobj == NULL)
@@ -137,23 +136,14 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
drm_gem_object_put(gobj);
size = amdgpu_bo_size(p->uf_bo);
- if (size != PAGE_SIZE || (data->offset + 8) > size) {
- r = -EINVAL;
- goto error_unref;
- }
+ if (size != PAGE_SIZE || data->offset > (size - 8))
+ return -EINVAL;
- if (amdgpu_ttm_tt_get_usermm(p->uf_bo->tbo.ttm)) {
- r = -EINVAL;
- goto error_unref;
- }
+ if (amdgpu_ttm_tt_get_usermm(p->uf_bo->tbo.ttm))
+ return -EINVAL;
*offset = data->offset;
-
return 0;
-
-error_unref:
- amdgpu_bo_unref(&p->uf_bo);
- return r;
}
static int amdgpu_cs_p1_bo_handles(struct amdgpu_cs_parser *p,
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 35588314e963938dfdcdb792c9170108399377d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091615-wick-punch-6c68@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
35588314e963 ("drm/amdgpu: fix amdgpu_cs_p1_user_fence")
ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3")
8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35588314e963938dfdcdb792c9170108399377d6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig(a)amd.com>
Date: Fri, 25 Aug 2023 15:28:00 +0200
Subject: [PATCH] drm/amdgpu: fix amdgpu_cs_p1_user_fence
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The offset is just 32bits here so this can potentially overflow if
somebody specifies a large value. Instead reduce the size to calculate
the last possible offset.
The error handling path incorrectly drops the reference to the user
fence BO resulting in potential reference count underflow.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 49dd9aa8da70..efdb1c48f431 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -127,7 +127,6 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
{
struct drm_gem_object *gobj;
unsigned long size;
- int r;
gobj = drm_gem_object_lookup(p->filp, data->handle);
if (gobj == NULL)
@@ -137,23 +136,14 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
drm_gem_object_put(gobj);
size = amdgpu_bo_size(p->uf_bo);
- if (size != PAGE_SIZE || (data->offset + 8) > size) {
- r = -EINVAL;
- goto error_unref;
- }
+ if (size != PAGE_SIZE || data->offset > (size - 8))
+ return -EINVAL;
- if (amdgpu_ttm_tt_get_usermm(p->uf_bo->tbo.ttm)) {
- r = -EINVAL;
- goto error_unref;
- }
+ if (amdgpu_ttm_tt_get_usermm(p->uf_bo->tbo.ttm))
+ return -EINVAL;
*offset = data->offset;
-
return 0;
-
-error_unref:
- amdgpu_bo_unref(&p->uf_bo);
- return r;
}
static int amdgpu_cs_p1_bo_handles(struct amdgpu_cs_parser *p,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x f6b8436bede3e80226e8b2100279c4450c73806a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091606-lisp-unwrapped-1868@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
f6b8436bede3 ("perf hists browser: Fix the number of entries for 'e' key")
e6d6abfc447a ("perf report/top: Make 'e' visible in the help and make it toggle showing callchains")
d10ec006dcd7 ("perf hists browser: Allow passing an initial hotkey")
bdc633fec50b ("perf report/top: Improve toggle callchain menu option")
d5a599d9890f ("perf report/top: Add menu entry for toggling callchain expansion")
9218a9132f83 ("perf report/top: Make ENTER consistently bring up menu")
5cb456af99f5 ("perf util: Move block TUI function to ui browsers")
7fa46cbf20d3 ("perf report: Sort by sampled cycles percent per block for tui")
6f7164fa231a ("perf report: Sort by sampled cycles percent per block for stdio")
b65a7d372b1a ("perf hist: Support block formats with compare/sort/display")
7841f40aed93 ("perf hist: Count the total cycles of all samples")
6041441870ab ("perf block: Cleanup and refactor block info functions")
cebf7d51a6c3 ("perf diff: Report noisy for cycles diff")
6ef81c55a2b6 ("perf session: Return error code for perf_session__new() function on failure")
fb71c86cc804 ("perf tools: Remove util.h from where it is not needed")
4a903c2e1514 ("perf tools: Remove debug.h from places where it is not needed")
b22bb139dcb3 ("perf debug: No need to include ui/util.h")
d3300a3c4e76 ("perf symbols: Move mem_info and branch_info out of symbol.h")
f2a39fe84901 ("perf auxtrace: Uninline functions that touch perf_session")
fa0d98462fae ("perf tools: Remove needless evlist.h include directives")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f6b8436bede3e80226e8b2100279c4450c73806a Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Mon, 31 Jul 2023 02:49:33 -0700
Subject: [PATCH] perf hists browser: Fix the number of entries for 'e' key
The 'e' key is to toggle expand/collapse the selected entry only. But
the current code has a bug that it only increases the number of entries
by 1 in the hierarchy mode so users cannot move under the current entry
after the key stroke. This is due to a wrong assumption in the
hist_entry__set_folding().
The commit b33f922651011eff ("perf hists browser: Put hist_entry folding
logic into single function") factored out the code, but actually it
should be handled separately. The hist_browser__set_folding() is to
update fold state for each entry so it needs to traverse all (child)
entries regardless of the current fold state. So it increases the
number of entries by 1.
But the hist_entry__set_folding() only cares the currently selected
entry and its all children. So it should count all unfolded child
entries. This code is implemented in hist_browser__toggle_fold()
already so we can just call it.
Fixes: b33f922651011eff ("perf hists browser: Put hist_entry folding logic into single function")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230731094934.1616495-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index d8b88f10a48d..70db5a717905 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -407,11 +407,6 @@ static bool hist_browser__selection_has_children(struct hist_browser *browser)
return container_of(ms, struct callchain_list, ms)->has_children;
}
-static bool hist_browser__he_selection_unfolded(struct hist_browser *browser)
-{
- return browser->he_selection ? browser->he_selection->unfolded : false;
-}
-
static bool hist_browser__selection_unfolded(struct hist_browser *browser)
{
struct hist_entry *he = browser->he_selection;
@@ -584,8 +579,8 @@ static int hierarchy_set_folding(struct hist_browser *hb, struct hist_entry *he,
return n;
}
-static void __hist_entry__set_folding(struct hist_entry *he,
- struct hist_browser *hb, bool unfold)
+static void hist_entry__set_folding(struct hist_entry *he,
+ struct hist_browser *hb, bool unfold)
{
hist_entry__init_have_children(he);
he->unfolded = unfold ? he->has_children : false;
@@ -603,34 +598,12 @@ static void __hist_entry__set_folding(struct hist_entry *he,
he->nr_rows = 0;
}
-static void hist_entry__set_folding(struct hist_entry *he,
- struct hist_browser *browser, bool unfold)
-{
- double percent;
-
- percent = hist_entry__get_percent_limit(he);
- if (he->filtered || percent < browser->min_pcnt)
- return;
-
- __hist_entry__set_folding(he, browser, unfold);
-
- if (!he->depth || unfold)
- browser->nr_hierarchy_entries++;
- if (he->leaf)
- browser->nr_callchain_rows += he->nr_rows;
- else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) {
- browser->nr_hierarchy_entries++;
- he->has_no_entry = true;
- he->nr_rows = 1;
- } else
- he->has_no_entry = false;
-}
-
static void
__hist_browser__set_folding(struct hist_browser *browser, bool unfold)
{
struct rb_node *nd;
struct hist_entry *he;
+ double percent;
nd = rb_first_cached(&browser->hists->entries);
while (nd) {
@@ -640,6 +613,21 @@ __hist_browser__set_folding(struct hist_browser *browser, bool unfold)
nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD);
hist_entry__set_folding(he, browser, unfold);
+
+ percent = hist_entry__get_percent_limit(he);
+ if (he->filtered || percent < browser->min_pcnt)
+ continue;
+
+ if (!he->depth || unfold)
+ browser->nr_hierarchy_entries++;
+ if (he->leaf)
+ browser->nr_callchain_rows += he->nr_rows;
+ else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) {
+ browser->nr_hierarchy_entries++;
+ he->has_no_entry = true;
+ he->nr_rows = 1;
+ } else
+ he->has_no_entry = false;
}
}
@@ -659,8 +647,10 @@ static void hist_browser__set_folding_selected(struct hist_browser *browser, boo
if (!browser->he_selection)
return;
- hist_entry__set_folding(browser->he_selection, browser, unfold);
- browser->b.nr_entries = hist_browser__nr_entries(browser);
+ if (unfold == browser->he_selection->unfolded)
+ return;
+
+ hist_browser__toggle_fold(browser);
}
static void ui_browser__warn_lost_events(struct ui_browser *browser)
@@ -732,8 +722,8 @@ static int hist_browser__handle_hotkey(struct hist_browser *browser, bool warn_l
hist_browser__set_folding(browser, true);
break;
case 'e':
- /* Expand the selected entry. */
- hist_browser__set_folding_selected(browser, !hist_browser__he_selection_unfolded(browser));
+ /* Toggle expand/collapse the selected entry. */
+ hist_browser__toggle_fold(browser);
break;
case 'H':
browser->show_headers = !browser->show_headers;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x f6b8436bede3e80226e8b2100279c4450c73806a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091604-finicky-penknife-7537@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
f6b8436bede3 ("perf hists browser: Fix the number of entries for 'e' key")
e6d6abfc447a ("perf report/top: Make 'e' visible in the help and make it toggle showing callchains")
d10ec006dcd7 ("perf hists browser: Allow passing an initial hotkey")
bdc633fec50b ("perf report/top: Improve toggle callchain menu option")
d5a599d9890f ("perf report/top: Add menu entry for toggling callchain expansion")
9218a9132f83 ("perf report/top: Make ENTER consistently bring up menu")
5cb456af99f5 ("perf util: Move block TUI function to ui browsers")
7fa46cbf20d3 ("perf report: Sort by sampled cycles percent per block for tui")
6f7164fa231a ("perf report: Sort by sampled cycles percent per block for stdio")
b65a7d372b1a ("perf hist: Support block formats with compare/sort/display")
7841f40aed93 ("perf hist: Count the total cycles of all samples")
6041441870ab ("perf block: Cleanup and refactor block info functions")
cebf7d51a6c3 ("perf diff: Report noisy for cycles diff")
6ef81c55a2b6 ("perf session: Return error code for perf_session__new() function on failure")
fb71c86cc804 ("perf tools: Remove util.h from where it is not needed")
4a903c2e1514 ("perf tools: Remove debug.h from places where it is not needed")
b22bb139dcb3 ("perf debug: No need to include ui/util.h")
d3300a3c4e76 ("perf symbols: Move mem_info and branch_info out of symbol.h")
f2a39fe84901 ("perf auxtrace: Uninline functions that touch perf_session")
fa0d98462fae ("perf tools: Remove needless evlist.h include directives")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f6b8436bede3e80226e8b2100279c4450c73806a Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Mon, 31 Jul 2023 02:49:33 -0700
Subject: [PATCH] perf hists browser: Fix the number of entries for 'e' key
The 'e' key is to toggle expand/collapse the selected entry only. But
the current code has a bug that it only increases the number of entries
by 1 in the hierarchy mode so users cannot move under the current entry
after the key stroke. This is due to a wrong assumption in the
hist_entry__set_folding().
The commit b33f922651011eff ("perf hists browser: Put hist_entry folding
logic into single function") factored out the code, but actually it
should be handled separately. The hist_browser__set_folding() is to
update fold state for each entry so it needs to traverse all (child)
entries regardless of the current fold state. So it increases the
number of entries by 1.
But the hist_entry__set_folding() only cares the currently selected
entry and its all children. So it should count all unfolded child
entries. This code is implemented in hist_browser__toggle_fold()
already so we can just call it.
Fixes: b33f922651011eff ("perf hists browser: Put hist_entry folding logic into single function")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230731094934.1616495-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index d8b88f10a48d..70db5a717905 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -407,11 +407,6 @@ static bool hist_browser__selection_has_children(struct hist_browser *browser)
return container_of(ms, struct callchain_list, ms)->has_children;
}
-static bool hist_browser__he_selection_unfolded(struct hist_browser *browser)
-{
- return browser->he_selection ? browser->he_selection->unfolded : false;
-}
-
static bool hist_browser__selection_unfolded(struct hist_browser *browser)
{
struct hist_entry *he = browser->he_selection;
@@ -584,8 +579,8 @@ static int hierarchy_set_folding(struct hist_browser *hb, struct hist_entry *he,
return n;
}
-static void __hist_entry__set_folding(struct hist_entry *he,
- struct hist_browser *hb, bool unfold)
+static void hist_entry__set_folding(struct hist_entry *he,
+ struct hist_browser *hb, bool unfold)
{
hist_entry__init_have_children(he);
he->unfolded = unfold ? he->has_children : false;
@@ -603,34 +598,12 @@ static void __hist_entry__set_folding(struct hist_entry *he,
he->nr_rows = 0;
}
-static void hist_entry__set_folding(struct hist_entry *he,
- struct hist_browser *browser, bool unfold)
-{
- double percent;
-
- percent = hist_entry__get_percent_limit(he);
- if (he->filtered || percent < browser->min_pcnt)
- return;
-
- __hist_entry__set_folding(he, browser, unfold);
-
- if (!he->depth || unfold)
- browser->nr_hierarchy_entries++;
- if (he->leaf)
- browser->nr_callchain_rows += he->nr_rows;
- else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) {
- browser->nr_hierarchy_entries++;
- he->has_no_entry = true;
- he->nr_rows = 1;
- } else
- he->has_no_entry = false;
-}
-
static void
__hist_browser__set_folding(struct hist_browser *browser, bool unfold)
{
struct rb_node *nd;
struct hist_entry *he;
+ double percent;
nd = rb_first_cached(&browser->hists->entries);
while (nd) {
@@ -640,6 +613,21 @@ __hist_browser__set_folding(struct hist_browser *browser, bool unfold)
nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD);
hist_entry__set_folding(he, browser, unfold);
+
+ percent = hist_entry__get_percent_limit(he);
+ if (he->filtered || percent < browser->min_pcnt)
+ continue;
+
+ if (!he->depth || unfold)
+ browser->nr_hierarchy_entries++;
+ if (he->leaf)
+ browser->nr_callchain_rows += he->nr_rows;
+ else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) {
+ browser->nr_hierarchy_entries++;
+ he->has_no_entry = true;
+ he->nr_rows = 1;
+ } else
+ he->has_no_entry = false;
}
}
@@ -659,8 +647,10 @@ static void hist_browser__set_folding_selected(struct hist_browser *browser, boo
if (!browser->he_selection)
return;
- hist_entry__set_folding(browser->he_selection, browser, unfold);
- browser->b.nr_entries = hist_browser__nr_entries(browser);
+ if (unfold == browser->he_selection->unfolded)
+ return;
+
+ hist_browser__toggle_fold(browser);
}
static void ui_browser__warn_lost_events(struct ui_browser *browser)
@@ -732,8 +722,8 @@ static int hist_browser__handle_hotkey(struct hist_browser *browser, bool warn_l
hist_browser__set_folding(browser, true);
break;
case 'e':
- /* Expand the selected entry. */
- hist_browser__set_folding_selected(browser, !hist_browser__he_selection_unfolded(browser));
+ /* Toggle expand/collapse the selected entry. */
+ hist_browser__toggle_fold(browser);
break;
case 'H':
browser->show_headers = !browser->show_headers;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x f6b8436bede3e80226e8b2100279c4450c73806a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091603-cabbage-tipoff-5d59@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
f6b8436bede3 ("perf hists browser: Fix the number of entries for 'e' key")
e6d6abfc447a ("perf report/top: Make 'e' visible in the help and make it toggle showing callchains")
d10ec006dcd7 ("perf hists browser: Allow passing an initial hotkey")
bdc633fec50b ("perf report/top: Improve toggle callchain menu option")
d5a599d9890f ("perf report/top: Add menu entry for toggling callchain expansion")
9218a9132f83 ("perf report/top: Make ENTER consistently bring up menu")
5cb456af99f5 ("perf util: Move block TUI function to ui browsers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f6b8436bede3e80226e8b2100279c4450c73806a Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Mon, 31 Jul 2023 02:49:33 -0700
Subject: [PATCH] perf hists browser: Fix the number of entries for 'e' key
The 'e' key is to toggle expand/collapse the selected entry only. But
the current code has a bug that it only increases the number of entries
by 1 in the hierarchy mode so users cannot move under the current entry
after the key stroke. This is due to a wrong assumption in the
hist_entry__set_folding().
The commit b33f922651011eff ("perf hists browser: Put hist_entry folding
logic into single function") factored out the code, but actually it
should be handled separately. The hist_browser__set_folding() is to
update fold state for each entry so it needs to traverse all (child)
entries regardless of the current fold state. So it increases the
number of entries by 1.
But the hist_entry__set_folding() only cares the currently selected
entry and its all children. So it should count all unfolded child
entries. This code is implemented in hist_browser__toggle_fold()
already so we can just call it.
Fixes: b33f922651011eff ("perf hists browser: Put hist_entry folding logic into single function")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230731094934.1616495-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index d8b88f10a48d..70db5a717905 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -407,11 +407,6 @@ static bool hist_browser__selection_has_children(struct hist_browser *browser)
return container_of(ms, struct callchain_list, ms)->has_children;
}
-static bool hist_browser__he_selection_unfolded(struct hist_browser *browser)
-{
- return browser->he_selection ? browser->he_selection->unfolded : false;
-}
-
static bool hist_browser__selection_unfolded(struct hist_browser *browser)
{
struct hist_entry *he = browser->he_selection;
@@ -584,8 +579,8 @@ static int hierarchy_set_folding(struct hist_browser *hb, struct hist_entry *he,
return n;
}
-static void __hist_entry__set_folding(struct hist_entry *he,
- struct hist_browser *hb, bool unfold)
+static void hist_entry__set_folding(struct hist_entry *he,
+ struct hist_browser *hb, bool unfold)
{
hist_entry__init_have_children(he);
he->unfolded = unfold ? he->has_children : false;
@@ -603,34 +598,12 @@ static void __hist_entry__set_folding(struct hist_entry *he,
he->nr_rows = 0;
}
-static void hist_entry__set_folding(struct hist_entry *he,
- struct hist_browser *browser, bool unfold)
-{
- double percent;
-
- percent = hist_entry__get_percent_limit(he);
- if (he->filtered || percent < browser->min_pcnt)
- return;
-
- __hist_entry__set_folding(he, browser, unfold);
-
- if (!he->depth || unfold)
- browser->nr_hierarchy_entries++;
- if (he->leaf)
- browser->nr_callchain_rows += he->nr_rows;
- else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) {
- browser->nr_hierarchy_entries++;
- he->has_no_entry = true;
- he->nr_rows = 1;
- } else
- he->has_no_entry = false;
-}
-
static void
__hist_browser__set_folding(struct hist_browser *browser, bool unfold)
{
struct rb_node *nd;
struct hist_entry *he;
+ double percent;
nd = rb_first_cached(&browser->hists->entries);
while (nd) {
@@ -640,6 +613,21 @@ __hist_browser__set_folding(struct hist_browser *browser, bool unfold)
nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD);
hist_entry__set_folding(he, browser, unfold);
+
+ percent = hist_entry__get_percent_limit(he);
+ if (he->filtered || percent < browser->min_pcnt)
+ continue;
+
+ if (!he->depth || unfold)
+ browser->nr_hierarchy_entries++;
+ if (he->leaf)
+ browser->nr_callchain_rows += he->nr_rows;
+ else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) {
+ browser->nr_hierarchy_entries++;
+ he->has_no_entry = true;
+ he->nr_rows = 1;
+ } else
+ he->has_no_entry = false;
}
}
@@ -659,8 +647,10 @@ static void hist_browser__set_folding_selected(struct hist_browser *browser, boo
if (!browser->he_selection)
return;
- hist_entry__set_folding(browser->he_selection, browser, unfold);
- browser->b.nr_entries = hist_browser__nr_entries(browser);
+ if (unfold == browser->he_selection->unfolded)
+ return;
+
+ hist_browser__toggle_fold(browser);
}
static void ui_browser__warn_lost_events(struct ui_browser *browser)
@@ -732,8 +722,8 @@ static int hist_browser__handle_hotkey(struct hist_browser *browser, bool warn_l
hist_browser__set_folding(browser, true);
break;
case 'e':
- /* Expand the selected entry. */
- hist_browser__set_folding_selected(browser, !hist_browser__he_selection_unfolded(browser));
+ /* Toggle expand/collapse the selected entry. */
+ hist_browser__toggle_fold(browser);
break;
case 'H':
browser->show_headers = !browser->show_headers;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x f6b8436bede3e80226e8b2100279c4450c73806a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091601-nearest-overspend-e08c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
f6b8436bede3 ("perf hists browser: Fix the number of entries for 'e' key")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f6b8436bede3e80226e8b2100279c4450c73806a Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Mon, 31 Jul 2023 02:49:33 -0700
Subject: [PATCH] perf hists browser: Fix the number of entries for 'e' key
The 'e' key is to toggle expand/collapse the selected entry only. But
the current code has a bug that it only increases the number of entries
by 1 in the hierarchy mode so users cannot move under the current entry
after the key stroke. This is due to a wrong assumption in the
hist_entry__set_folding().
The commit b33f922651011eff ("perf hists browser: Put hist_entry folding
logic into single function") factored out the code, but actually it
should be handled separately. The hist_browser__set_folding() is to
update fold state for each entry so it needs to traverse all (child)
entries regardless of the current fold state. So it increases the
number of entries by 1.
But the hist_entry__set_folding() only cares the currently selected
entry and its all children. So it should count all unfolded child
entries. This code is implemented in hist_browser__toggle_fold()
already so we can just call it.
Fixes: b33f922651011eff ("perf hists browser: Put hist_entry folding logic into single function")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230731094934.1616495-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index d8b88f10a48d..70db5a717905 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -407,11 +407,6 @@ static bool hist_browser__selection_has_children(struct hist_browser *browser)
return container_of(ms, struct callchain_list, ms)->has_children;
}
-static bool hist_browser__he_selection_unfolded(struct hist_browser *browser)
-{
- return browser->he_selection ? browser->he_selection->unfolded : false;
-}
-
static bool hist_browser__selection_unfolded(struct hist_browser *browser)
{
struct hist_entry *he = browser->he_selection;
@@ -584,8 +579,8 @@ static int hierarchy_set_folding(struct hist_browser *hb, struct hist_entry *he,
return n;
}
-static void __hist_entry__set_folding(struct hist_entry *he,
- struct hist_browser *hb, bool unfold)
+static void hist_entry__set_folding(struct hist_entry *he,
+ struct hist_browser *hb, bool unfold)
{
hist_entry__init_have_children(he);
he->unfolded = unfold ? he->has_children : false;
@@ -603,34 +598,12 @@ static void __hist_entry__set_folding(struct hist_entry *he,
he->nr_rows = 0;
}
-static void hist_entry__set_folding(struct hist_entry *he,
- struct hist_browser *browser, bool unfold)
-{
- double percent;
-
- percent = hist_entry__get_percent_limit(he);
- if (he->filtered || percent < browser->min_pcnt)
- return;
-
- __hist_entry__set_folding(he, browser, unfold);
-
- if (!he->depth || unfold)
- browser->nr_hierarchy_entries++;
- if (he->leaf)
- browser->nr_callchain_rows += he->nr_rows;
- else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) {
- browser->nr_hierarchy_entries++;
- he->has_no_entry = true;
- he->nr_rows = 1;
- } else
- he->has_no_entry = false;
-}
-
static void
__hist_browser__set_folding(struct hist_browser *browser, bool unfold)
{
struct rb_node *nd;
struct hist_entry *he;
+ double percent;
nd = rb_first_cached(&browser->hists->entries);
while (nd) {
@@ -640,6 +613,21 @@ __hist_browser__set_folding(struct hist_browser *browser, bool unfold)
nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD);
hist_entry__set_folding(he, browser, unfold);
+
+ percent = hist_entry__get_percent_limit(he);
+ if (he->filtered || percent < browser->min_pcnt)
+ continue;
+
+ if (!he->depth || unfold)
+ browser->nr_hierarchy_entries++;
+ if (he->leaf)
+ browser->nr_callchain_rows += he->nr_rows;
+ else if (unfold && !hist_entry__has_hierarchy_children(he, browser->min_pcnt)) {
+ browser->nr_hierarchy_entries++;
+ he->has_no_entry = true;
+ he->nr_rows = 1;
+ } else
+ he->has_no_entry = false;
}
}
@@ -659,8 +647,10 @@ static void hist_browser__set_folding_selected(struct hist_browser *browser, boo
if (!browser->he_selection)
return;
- hist_entry__set_folding(browser->he_selection, browser, unfold);
- browser->b.nr_entries = hist_browser__nr_entries(browser);
+ if (unfold == browser->he_selection->unfolded)
+ return;
+
+ hist_browser__toggle_fold(browser);
}
static void ui_browser__warn_lost_events(struct ui_browser *browser)
@@ -732,8 +722,8 @@ static int hist_browser__handle_hotkey(struct hist_browser *browser, bool warn_l
hist_browser__set_folding(browser, true);
break;
case 'e':
- /* Expand the selected entry. */
- hist_browser__set_folding_selected(browser, !hist_browser__he_selection_unfolded(browser));
+ /* Toggle expand/collapse the selected entry. */
+ hist_browser__toggle_fold(browser);
break;
case 'H':
browser->show_headers = !browser->show_headers;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 9bf63282ea77a531ea58acb42fb3f40d2d1e4497
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091630-citable-both-38a7@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
9bf63282ea77 ("perf tools: Handle old data in PERF_RECORD_ATTR")
b0031c22819a ("libperf: Add perf_evlist__id_add() function")
ff47d86a0d9b ("libperf: Add perf_evlist__read_format() function")
515dbe48f620 ("libperf: Add perf_evlist__first()/last() functions")
70c20369ee95 ("libperf: Add perf_evsel__alloc_id/perf_evsel__free_id functions")
1d5af02d7a92 ("libperf: Move 'heads' from 'struct evlist' to 'struct perf_evlist'")
e7eb9002d451 ("libperf: Move 'ids' from 'struct evsel' to 'struct perf_evsel'")
deaf321913a7 ("libperf: Move 'id' from 'struct evsel' to 'struct perf_evsel'")
8cd36f3ef492 ("libperf: Move 'sample_id' from 'struct evsel' to 'struct perf_evsel'")
40cb2d5141bd ("libperf: Move 'pollfd' from 'struct evlist' to 'struct perf_evlist'")
f6fa43757793 ("libperf: Move 'mmap_len' from 'struct evlist' to 'struct perf_evlist'")
c976ee11a0e1 ("libperf: Move 'nr_mmaps' from 'struct evlist' to 'struct perf_evlist'")
648b5af3f3ae ("libperf: Move 'system_wide' from 'struct evsel' to 'struct perf_evsel'")
2cf07b294a60 ("libperf: Add 'fd' to struct perf_mmap")
4fd0cef2c7b6 ("libperf: Add 'mask' to struct perf_mmap")
547740f7b357 ("libperf: Add perf_mmap struct")
e0fcfb086fbb ("perf evlist: Adopt backwards ring buffer state enum")
9521b5f2d9d3 ("perf tools: Rename perf_evlist__mmap() to evlist__mmap()")
a583053299c1 ("perf tools: Rename 'struct perf_mmap' to 'struct mmap'")
ce095c9ac293 ("perf test: Fix spelling mistake "allos" -> "allocate"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9bf63282ea77a531ea58acb42fb3f40d2d1e4497 Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Fri, 25 Aug 2023 08:25:49 -0700
Subject: [PATCH] perf tools: Handle old data in PERF_RECORD_ATTR
The PERF_RECORD_ATTR is used for a pipe mode to describe an event with
attribute and IDs. The ID table comes after the attr and it calculate
size of the table using the total record size and the attr size.
n_ids = (total_record_size - end_of_the_attr_field) / sizeof(u64)
This is fine for most use cases, but sometimes it saves the pipe output
in a file and then process it later. And it becomes a problem if there
is a change in attr size between the record and report.
$ perf record -o- > perf-pipe.data # old version
$ perf report -i- < perf-pipe.data # new version
For example, if the attr size is 128 and it has 4 IDs, then it would
save them in 168 byte like below:
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
128 byte: perf event attr { .size = 128, ... },
32 byte: event IDs [] = { 1234, 1235, 1236, 1237 },
But when report later, it thinks the attr size is 136 then it only read
the last 3 entries as ID.
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
136 byte: perf event attr { .size = 136, ... },
24 byte: event IDs [] = { 1235, 1236, 1237 }, // 1234 is missing
So it should use the recorded version of the attr. The attr has the
size field already then it should honor the size when reading data.
Fixes: 2c46dbb517a10b18 ("perf: Convert perf header attrs into attr events")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230825152552.112913-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 99c61cdffa54..e13f4bab7c49 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -4378,7 +4378,8 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
union perf_event *event,
struct evlist **pevlist)
{
- u32 i, ids, n_ids;
+ u32 i, n_ids;
+ u64 *ids;
struct evsel *evsel;
struct evlist *evlist = *pevlist;
@@ -4394,9 +4395,8 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
evlist__add(evlist, evsel);
- ids = event->header.size;
- ids -= (void *)&event->attr.id - (void *)event;
- n_ids = ids / sizeof(u64);
+ n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size;
+ n_ids = n_ids / sizeof(u64);
/*
* We don't have the cpu and thread maps on the header, so
* for allocating the perf_sample_id table we fake 1 cpu and
@@ -4405,8 +4405,9 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
if (perf_evsel__alloc_id(&evsel->core, 1, n_ids))
return -ENOMEM;
+ ids = (void *)&event->attr.attr + event->attr.attr.size;
for (i = 0; i < n_ids; i++) {
- perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, event->attr.id[i]);
+ perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, ids[i]);
}
return 0;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 9bf63282ea77a531ea58acb42fb3f40d2d1e4497
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091628-endurance-silencer-f1f0@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
9bf63282ea77 ("perf tools: Handle old data in PERF_RECORD_ATTR")
b0031c22819a ("libperf: Add perf_evlist__id_add() function")
ff47d86a0d9b ("libperf: Add perf_evlist__read_format() function")
515dbe48f620 ("libperf: Add perf_evlist__first()/last() functions")
70c20369ee95 ("libperf: Add perf_evsel__alloc_id/perf_evsel__free_id functions")
1d5af02d7a92 ("libperf: Move 'heads' from 'struct evlist' to 'struct perf_evlist'")
e7eb9002d451 ("libperf: Move 'ids' from 'struct evsel' to 'struct perf_evsel'")
deaf321913a7 ("libperf: Move 'id' from 'struct evsel' to 'struct perf_evsel'")
8cd36f3ef492 ("libperf: Move 'sample_id' from 'struct evsel' to 'struct perf_evsel'")
40cb2d5141bd ("libperf: Move 'pollfd' from 'struct evlist' to 'struct perf_evlist'")
f6fa43757793 ("libperf: Move 'mmap_len' from 'struct evlist' to 'struct perf_evlist'")
c976ee11a0e1 ("libperf: Move 'nr_mmaps' from 'struct evlist' to 'struct perf_evlist'")
648b5af3f3ae ("libperf: Move 'system_wide' from 'struct evsel' to 'struct perf_evsel'")
2cf07b294a60 ("libperf: Add 'fd' to struct perf_mmap")
4fd0cef2c7b6 ("libperf: Add 'mask' to struct perf_mmap")
547740f7b357 ("libperf: Add perf_mmap struct")
e0fcfb086fbb ("perf evlist: Adopt backwards ring buffer state enum")
9521b5f2d9d3 ("perf tools: Rename perf_evlist__mmap() to evlist__mmap()")
a583053299c1 ("perf tools: Rename 'struct perf_mmap' to 'struct mmap'")
ce095c9ac293 ("perf test: Fix spelling mistake "allos" -> "allocate"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9bf63282ea77a531ea58acb42fb3f40d2d1e4497 Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Fri, 25 Aug 2023 08:25:49 -0700
Subject: [PATCH] perf tools: Handle old data in PERF_RECORD_ATTR
The PERF_RECORD_ATTR is used for a pipe mode to describe an event with
attribute and IDs. The ID table comes after the attr and it calculate
size of the table using the total record size and the attr size.
n_ids = (total_record_size - end_of_the_attr_field) / sizeof(u64)
This is fine for most use cases, but sometimes it saves the pipe output
in a file and then process it later. And it becomes a problem if there
is a change in attr size between the record and report.
$ perf record -o- > perf-pipe.data # old version
$ perf report -i- < perf-pipe.data # new version
For example, if the attr size is 128 and it has 4 IDs, then it would
save them in 168 byte like below:
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
128 byte: perf event attr { .size = 128, ... },
32 byte: event IDs [] = { 1234, 1235, 1236, 1237 },
But when report later, it thinks the attr size is 136 then it only read
the last 3 entries as ID.
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
136 byte: perf event attr { .size = 136, ... },
24 byte: event IDs [] = { 1235, 1236, 1237 }, // 1234 is missing
So it should use the recorded version of the attr. The attr has the
size field already then it should honor the size when reading data.
Fixes: 2c46dbb517a10b18 ("perf: Convert perf header attrs into attr events")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230825152552.112913-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 99c61cdffa54..e13f4bab7c49 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -4378,7 +4378,8 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
union perf_event *event,
struct evlist **pevlist)
{
- u32 i, ids, n_ids;
+ u32 i, n_ids;
+ u64 *ids;
struct evsel *evsel;
struct evlist *evlist = *pevlist;
@@ -4394,9 +4395,8 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
evlist__add(evlist, evsel);
- ids = event->header.size;
- ids -= (void *)&event->attr.id - (void *)event;
- n_ids = ids / sizeof(u64);
+ n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size;
+ n_ids = n_ids / sizeof(u64);
/*
* We don't have the cpu and thread maps on the header, so
* for allocating the perf_sample_id table we fake 1 cpu and
@@ -4405,8 +4405,9 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
if (perf_evsel__alloc_id(&evsel->core, 1, n_ids))
return -ENOMEM;
+ ids = (void *)&event->attr.attr + event->attr.attr.size;
for (i = 0; i < n_ids; i++) {
- perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, event->attr.id[i]);
+ perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, ids[i]);
}
return 0;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 68ca249c964f520af7f8763e22f12bd26b57b870
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091607-discuss-pointing-8b33@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
68ca249c964f ("perf test shell stat_bpf_counters: Fix test on Intel")
c8b947642d23 ("perf test: Remove bash construct from stat_bpf_counters.sh test")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 68ca249c964f520af7f8763e22f12bd26b57b870 Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Fri, 25 Aug 2023 09:41:51 -0700
Subject: [PATCH] perf test shell stat_bpf_counters: Fix test on Intel
As of now, bpf counters (bperf) don't support event groups. But the
default perf stat includes topdown metrics if supported (on recent Intel
machines) which require groups. That makes perf stat exiting.
$ sudo perf stat --bpf-counter true
bpf managed perf events do not yet support groups.
Actually the test explicitly uses cycles event only, but it missed to
pass the option when it checks the availability of the command.
Fixes: 2c0cb9f56020d2ea ("perf test: Add a shell test for 'perf stat --bpf-counters' new option")
Reviewed-by: Song Liu <song(a)kernel.org>
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: bpf(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230825164152.165610-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh
index 513cd1e58e0e..a87bb2814b4c 100755
--- a/tools/perf/tests/shell/stat_bpf_counters.sh
+++ b/tools/perf/tests/shell/stat_bpf_counters.sh
@@ -22,10 +22,10 @@ compare_number()
}
# skip if --bpf-counters is not supported
-if ! perf stat --bpf-counters true > /dev/null 2>&1; then
+if ! perf stat -e cycles --bpf-counters true > /dev/null 2>&1; then
if [ "$1" = "-v" ]; then
echo "Skipping: --bpf-counters not supported"
- perf --no-pager stat --bpf-counters true || true
+ perf --no-pager stat -e cycles --bpf-counters true || true
fi
exit 2
fi
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 7822a8913f4c51c7d1aff793b525d60c3384fb5b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091601-concerned-refurbish-854c@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
7822a8913f4c ("perf build: Update build rule for generated files")
00facc760903 ("perf jevents: Switch build to use jevents.py")
0d1c50ac488e ("perf tools: Add an option to build without libbfd")
517db3b59537 ("perf jevents: Make build dependency on test JSONs")
e71e19a9ea70 ("tools features: Add feature test to check if libbfd has buildid support")
74d5f3d06f70 ("tools build: Add capability-related feature detection")
3b1c5d965971 ("tools build: Implement libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines")
8a1b1718214c ("perf build: Check what binutils's 'disassembler()' signature to use")
31be9478ed7f ("perf feature detection: Add -lopcodes to feature-libbfd")
a96c03e8cdcf ("tools build: Add test-reallocarray.c to test-all.c to fix the build")
1c3b28fd7ae8 ("perf coresight: Do not test for libopencsd by default")
aa8f9c517ebc ("tools build: Add -lrt to FEATURE_CHECK_LDFLAGS-libaio")
14541b1e7e72 ("perf build: Don't unconditionally link the libbfd feature test to -liberty and -lz")
2a07d814747b ("tools build feature: Check if libaio is available")
531b014e7a2f ("tools: bpf: make use of reallocarray")
174e719439b8 ("Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7822a8913f4c51c7d1aff793b525d60c3384fb5b Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Thu, 27 Jul 2023 19:24:46 -0700
Subject: [PATCH] perf build: Update build rule for generated files
The bison and flex generate C files from the source (.y and .l)
files. When O= option is used, they are saved in a separate directory
but the default build rule assumes the .C files are in the source
directory. So it might read invalid file if there are generated files
from an old version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3
$ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2
$ mkdir build # create a build directory
$ make -C tools/perf O=build # build in a different directory but it
# refers files in the source directory
Let's update the build rule to specify those cases explicitly to depend
on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the
include path too.
Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Anup Sharma <anupnewsmail(a)gmail.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230728022447.1323563-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index 89430338a3d9..fac42486a8cf 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -117,6 +117,16 @@ $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_s_c)
+# bison and flex files are generated in the OUTPUT directory
+# so it needs a separate rule to depend on them properly
+$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
+$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
# Gather build data:
# obj-y - list of build objects
# subdir-y - list of directories to nest
diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
index 150765f2baee..1d18bb89402e 100644
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@@ -35,3 +35,9 @@ $(PMU_EVENTS_C): $(JSON) $(JSON_TEST) $(JEVENTS_PY) $(METRIC_PY) $(METRIC_TEST_L
$(call rule_mkdir)
$(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) $(JEVENTS_MODEL) pmu-events/arch $@
endif
+
+# pmu-events.c file is generated in the OUTPUT directory so it needs a
+# separate rule to depend on it properly
+$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
+ $(call rule_mkdir)
+ $(call if_changed_dep,cc_o_c)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 7822a8913f4c51c7d1aff793b525d60c3384fb5b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091600-retention-manlike-1192@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
7822a8913f4c ("perf build: Update build rule for generated files")
00facc760903 ("perf jevents: Switch build to use jevents.py")
0d1c50ac488e ("perf tools: Add an option to build without libbfd")
517db3b59537 ("perf jevents: Make build dependency on test JSONs")
e71e19a9ea70 ("tools features: Add feature test to check if libbfd has buildid support")
74d5f3d06f70 ("tools build: Add capability-related feature detection")
3b1c5d965971 ("tools build: Implement libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines")
8a1b1718214c ("perf build: Check what binutils's 'disassembler()' signature to use")
31be9478ed7f ("perf feature detection: Add -lopcodes to feature-libbfd")
a96c03e8cdcf ("tools build: Add test-reallocarray.c to test-all.c to fix the build")
1c3b28fd7ae8 ("perf coresight: Do not test for libopencsd by default")
aa8f9c517ebc ("tools build: Add -lrt to FEATURE_CHECK_LDFLAGS-libaio")
14541b1e7e72 ("perf build: Don't unconditionally link the libbfd feature test to -liberty and -lz")
2a07d814747b ("tools build feature: Check if libaio is available")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7822a8913f4c51c7d1aff793b525d60c3384fb5b Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Thu, 27 Jul 2023 19:24:46 -0700
Subject: [PATCH] perf build: Update build rule for generated files
The bison and flex generate C files from the source (.y and .l)
files. When O= option is used, they are saved in a separate directory
but the default build rule assumes the .C files are in the source
directory. So it might read invalid file if there are generated files
from an old version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3
$ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2
$ mkdir build # create a build directory
$ make -C tools/perf O=build # build in a different directory but it
# refers files in the source directory
Let's update the build rule to specify those cases explicitly to depend
on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the
include path too.
Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Anup Sharma <anupnewsmail(a)gmail.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230728022447.1323563-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index 89430338a3d9..fac42486a8cf 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -117,6 +117,16 @@ $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_s_c)
+# bison and flex files are generated in the OUTPUT directory
+# so it needs a separate rule to depend on them properly
+$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
+$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
# Gather build data:
# obj-y - list of build objects
# subdir-y - list of directories to nest
diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
index 150765f2baee..1d18bb89402e 100644
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@@ -35,3 +35,9 @@ $(PMU_EVENTS_C): $(JSON) $(JSON_TEST) $(JEVENTS_PY) $(METRIC_PY) $(METRIC_TEST_L
$(call rule_mkdir)
$(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) $(JEVENTS_MODEL) pmu-events/arch $@
endif
+
+# pmu-events.c file is generated in the OUTPUT directory so it needs a
+# separate rule to depend on it properly
+$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
+ $(call rule_mkdir)
+ $(call if_changed_dep,cc_o_c)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 7822a8913f4c51c7d1aff793b525d60c3384fb5b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091659-apricot-cartel-9e93@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
7822a8913f4c ("perf build: Update build rule for generated files")
00facc760903 ("perf jevents: Switch build to use jevents.py")
0d1c50ac488e ("perf tools: Add an option to build without libbfd")
517db3b59537 ("perf jevents: Make build dependency on test JSONs")
e71e19a9ea70 ("tools features: Add feature test to check if libbfd has buildid support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7822a8913f4c51c7d1aff793b525d60c3384fb5b Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Thu, 27 Jul 2023 19:24:46 -0700
Subject: [PATCH] perf build: Update build rule for generated files
The bison and flex generate C files from the source (.y and .l)
files. When O= option is used, they are saved in a separate directory
but the default build rule assumes the .C files are in the source
directory. So it might read invalid file if there are generated files
from an old version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3
$ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2
$ mkdir build # create a build directory
$ make -C tools/perf O=build # build in a different directory but it
# refers files in the source directory
Let's update the build rule to specify those cases explicitly to depend
on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the
include path too.
Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Anup Sharma <anupnewsmail(a)gmail.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230728022447.1323563-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index 89430338a3d9..fac42486a8cf 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -117,6 +117,16 @@ $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_s_c)
+# bison and flex files are generated in the OUTPUT directory
+# so it needs a separate rule to depend on them properly
+$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
+$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
# Gather build data:
# obj-y - list of build objects
# subdir-y - list of directories to nest
diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
index 150765f2baee..1d18bb89402e 100644
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@@ -35,3 +35,9 @@ $(PMU_EVENTS_C): $(JSON) $(JSON_TEST) $(JEVENTS_PY) $(METRIC_PY) $(METRIC_TEST_L
$(call rule_mkdir)
$(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) $(JEVENTS_MODEL) pmu-events/arch $@
endif
+
+# pmu-events.c file is generated in the OUTPUT directory so it needs a
+# separate rule to depend on it properly
+$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
+ $(call rule_mkdir)
+ $(call if_changed_dep,cc_o_c)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 7822a8913f4c51c7d1aff793b525d60c3384fb5b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091657-festival-verbalize-6981@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
7822a8913f4c ("perf build: Update build rule for generated files")
00facc760903 ("perf jevents: Switch build to use jevents.py")
0d1c50ac488e ("perf tools: Add an option to build without libbfd")
517db3b59537 ("perf jevents: Make build dependency on test JSONs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7822a8913f4c51c7d1aff793b525d60c3384fb5b Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Thu, 27 Jul 2023 19:24:46 -0700
Subject: [PATCH] perf build: Update build rule for generated files
The bison and flex generate C files from the source (.y and .l)
files. When O= option is used, they are saved in a separate directory
but the default build rule assumes the .C files are in the source
directory. So it might read invalid file if there are generated files
from an old version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3
$ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2
$ mkdir build # create a build directory
$ make -C tools/perf O=build # build in a different directory but it
# refers files in the source directory
Let's update the build rule to specify those cases explicitly to depend
on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the
include path too.
Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Anup Sharma <anupnewsmail(a)gmail.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230728022447.1323563-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index 89430338a3d9..fac42486a8cf 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -117,6 +117,16 @@ $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_s_c)
+# bison and flex files are generated in the OUTPUT directory
+# so it needs a separate rule to depend on them properly
+$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
+$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
# Gather build data:
# obj-y - list of build objects
# subdir-y - list of directories to nest
diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
index 150765f2baee..1d18bb89402e 100644
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@@ -35,3 +35,9 @@ $(PMU_EVENTS_C): $(JSON) $(JSON_TEST) $(JEVENTS_PY) $(METRIC_PY) $(METRIC_TEST_L
$(call rule_mkdir)
$(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) $(JEVENTS_MODEL) pmu-events/arch $@
endif
+
+# pmu-events.c file is generated in the OUTPUT directory so it needs a
+# separate rule to depend on it properly
+$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
+ $(call rule_mkdir)
+ $(call if_changed_dep,cc_o_c)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7822a8913f4c51c7d1aff793b525d60c3384fb5b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091656-composure-legible-415e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
7822a8913f4c ("perf build: Update build rule for generated files")
00facc760903 ("perf jevents: Switch build to use jevents.py")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7822a8913f4c51c7d1aff793b525d60c3384fb5b Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Thu, 27 Jul 2023 19:24:46 -0700
Subject: [PATCH] perf build: Update build rule for generated files
The bison and flex generate C files from the source (.y and .l)
files. When O= option is used, they are saved in a separate directory
but the default build rule assumes the .C files are in the source
directory. So it might read invalid file if there are generated files
from an old version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3
$ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2
$ mkdir build # create a build directory
$ make -C tools/perf O=build # build in a different directory but it
# refers files in the source directory
Let's update the build rule to specify those cases explicitly to depend
on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the
include path too.
Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Anup Sharma <anupnewsmail(a)gmail.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230728022447.1323563-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index 89430338a3d9..fac42486a8cf 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -117,6 +117,16 @@ $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_s_c)
+# bison and flex files are generated in the OUTPUT directory
+# so it needs a separate rule to depend on them properly
+$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
+$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
# Gather build data:
# obj-y - list of build objects
# subdir-y - list of directories to nest
diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
index 150765f2baee..1d18bb89402e 100644
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@@ -35,3 +35,9 @@ $(PMU_EVENTS_C): $(JSON) $(JSON_TEST) $(JEVENTS_PY) $(METRIC_PY) $(METRIC_TEST_L
$(call rule_mkdir)
$(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) $(JEVENTS_MODEL) pmu-events/arch $@
endif
+
+# pmu-events.c file is generated in the OUTPUT directory so it needs a
+# separate rule to depend on it properly
+$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
+ $(call rule_mkdir)
+ $(call if_changed_dep,cc_o_c)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4fe4a6374c4db9ae2b849b61e84b58685dca565a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091615-punctured-obnoxious-ab62@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4fe4a6374c4d ("MIPS: Only fiddle with CHECKFLAGS if `need-compiler'")
08f6554ff90e ("mips: Include KBUILD_CPPFLAGS in CHECKFLAGS invocation")
d42f0c6ad502 ("MIPS: Use "grep -E" instead of "egrep"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4fe4a6374c4db9ae2b849b61e84b58685dca565a Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Tue, 18 Jul 2023 15:37:23 +0100
Subject: [PATCH] MIPS: Only fiddle with CHECKFLAGS if `need-compiler'
We have originally guarded fiddling with CHECKFLAGS in our arch Makefile
by checking for the CONFIG_MIPS variable, not set for targets such as
`distclean', etc. that neither include `.config' nor use the compiler.
Starting from commit 805b2e1d427a ("kbuild: include Makefile.compiler
only when compiler is needed") we have had a generic `need-compiler'
variable explicitly telling us if the compiler will be used and thus its
capabilities need to be checked and expressed in the form of compilation
flags. If this variable is not set, then `make' functions such as
`cc-option' are undefined, causing all kinds of weirdness to happen if
we expect specific results to be returned, most recently:
cc1: error: '-mloongson-mmi' must be used with '-mhard-float'
messages with configurations such as `fuloong2e_defconfig' and the
`modules_install' target, which does include `.config' and yet does not
use the compiler.
Replace the check for CONFIG_MIPS with one for `need-compiler' instead,
so as to prevent the compiler from being ever called for CHECKFLAGS when
not needed.
Reported-by: Guillaume Tucker <guillaume.tucker(a)collabora.com>
Closes: https://lore.kernel.org/r/85031c0c-d981-031e-8a50-bc4fad2ddcd8@collabora.co…
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed")
Cc: stable(a)vger.kernel.org # v5.13+
Reported-by: "kernelci.org bot" <bot(a)kernelci.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index 69c18074d817..d624b87c150d 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -341,7 +341,7 @@ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
KBUILD_LDFLAGS += -m $(ld-emul)
-ifdef CONFIG_MIPS
+ifdef need-compiler
CHECKFLAGS += $(shell $(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -dM -E -x c /dev/null | \
grep -E -vw '__GNUC_(MINOR_|PATCHLEVEL_)?_' | \
sed -e "s/^\#define /-D'/" -e "s/ /'='/" -e "s/$$/'/" -e 's/\$$/&&/g')
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4fe4a6374c4db9ae2b849b61e84b58685dca565a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091613-tracing-voucher-3d42@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
4fe4a6374c4d ("MIPS: Only fiddle with CHECKFLAGS if `need-compiler'")
08f6554ff90e ("mips: Include KBUILD_CPPFLAGS in CHECKFLAGS invocation")
d42f0c6ad502 ("MIPS: Use "grep -E" instead of "egrep"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4fe4a6374c4db9ae2b849b61e84b58685dca565a Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Tue, 18 Jul 2023 15:37:23 +0100
Subject: [PATCH] MIPS: Only fiddle with CHECKFLAGS if `need-compiler'
We have originally guarded fiddling with CHECKFLAGS in our arch Makefile
by checking for the CONFIG_MIPS variable, not set for targets such as
`distclean', etc. that neither include `.config' nor use the compiler.
Starting from commit 805b2e1d427a ("kbuild: include Makefile.compiler
only when compiler is needed") we have had a generic `need-compiler'
variable explicitly telling us if the compiler will be used and thus its
capabilities need to be checked and expressed in the form of compilation
flags. If this variable is not set, then `make' functions such as
`cc-option' are undefined, causing all kinds of weirdness to happen if
we expect specific results to be returned, most recently:
cc1: error: '-mloongson-mmi' must be used with '-mhard-float'
messages with configurations such as `fuloong2e_defconfig' and the
`modules_install' target, which does include `.config' and yet does not
use the compiler.
Replace the check for CONFIG_MIPS with one for `need-compiler' instead,
so as to prevent the compiler from being ever called for CHECKFLAGS when
not needed.
Reported-by: Guillaume Tucker <guillaume.tucker(a)collabora.com>
Closes: https://lore.kernel.org/r/85031c0c-d981-031e-8a50-bc4fad2ddcd8@collabora.co…
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed")
Cc: stable(a)vger.kernel.org # v5.13+
Reported-by: "kernelci.org bot" <bot(a)kernelci.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index 69c18074d817..d624b87c150d 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -341,7 +341,7 @@ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
KBUILD_LDFLAGS += -m $(ld-emul)
-ifdef CONFIG_MIPS
+ifdef need-compiler
CHECKFLAGS += $(shell $(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -dM -E -x c /dev/null | \
grep -E -vw '__GNUC_(MINOR_|PATCHLEVEL_)?_' | \
sed -e "s/^\#define /-D'/" -e "s/ /'='/" -e "s/$$/'/" -e 's/\$$/&&/g')
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x f3cebc75e7425d6949d726bb8e937095b0aef025
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091600-appetite-ought-b4f4@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
f3cebc75e742 ("KVM: SVM: Set target pCPU during IRTE update if target vCPU is running")
4c08e737f056 ("KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry")
935a7333958e ("KVM: SVM: Drop AVIC's intermediate avic_set_running() helper")
782f64558de7 ("KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU")
af52f5aa5c1b ("KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption")
e422b8896948 ("KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path")
7cfc5c653b07 ("KVM: fix avic_set_running for preemptable kernels")
bf5f6b9d7ad6 ("KVM: SVM: move check for kvm_vcpu_apicv_active outside of avic_vcpu_{put|load}")
02ffbe6351f5 ("KVM: SVM: fix doc warnings")
a7fc06dd2f14 ("KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup")
553cc15f6e8d ("KVM: SVM: remove uneeded fields from host_save_users_msrs")
e79b91bb3c91 ("KVM: SVM: use vmsave/vmload for saving/restoring additional host state")
35a7831912f4 ("KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructions")
647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
f65cf84ee769 ("KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load")
8640ca588b03 ("KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting")
16809ecdc1e8 ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests")
861377730aa9 ("KVM: SVM: Provide support for SEV-ES vCPU loading")
376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
85ca8be938c0 ("KVM: SVM: Set the encryption mask for the SVM host save area")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3cebc75e7425d6949d726bb8e937095b0aef025 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 8 Aug 2023 16:31:32 -0700
Subject: [PATCH] KVM: SVM: Set target pCPU during IRTE update if target vCPU
is running
Update the target pCPU for IOMMU doorbells when updating IRTE routing if
KVM is actively running the associated vCPU. KVM currently only updates
the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell
events will be delayed until the vCPU goes through a put+load cycle (which
might very well "never" happen for the lifetime of the VM).
To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE
routing and vCPU load/put, get the pCPU information from the vCPU's
Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and
update the IRTE while holding ir_list_lock. Add comments with --verbose
enabled to explain exactly what is and isn't protected by ir_list_lock.
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Reported-by: dengqiao.joey <dengqiao.joey(a)bytedance.com>
Cc: stable(a)vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Reviewed-by: Joao Martins <joao.m.martins(a)oracle.com>
Link: https://lore.kernel.org/r/20230808233132.2499764-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8e041b215ddb..2092db892d7d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -791,6 +791,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
int ret = 0;
unsigned long flags;
struct amd_svm_iommu_ir *ir;
+ u64 entry;
/**
* In some cases, the existing irte is updated and re-set,
@@ -824,6 +825,18 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
ir->data = pi->ir_data;
spin_lock_irqsave(&svm->ir_list_lock, flags);
+
+ /*
+ * Update the target pCPU for IOMMU doorbells if the vCPU is running.
+ * If the vCPU is NOT running, i.e. is blocking or scheduled out, KVM
+ * will update the pCPU info when the vCPU awkened and/or scheduled in.
+ * See also avic_vcpu_load().
+ */
+ entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
+ amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
+ true, pi->ir_data);
+
list_add(&ir->node, &svm->ir_list);
spin_unlock_irqrestore(&svm->ir_list_lock, flags);
out:
@@ -1031,6 +1044,13 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (kvm_vcpu_is_blocking(vcpu))
return;
+ /*
+ * Grab the per-vCPU interrupt remapping lock even if the VM doesn't
+ * _currently_ have assigned devices, as that can change. Holding
+ * ir_list_lock ensures that either svm_ir_list_add() will consume
+ * up-to-date entry information, or that this task will wait until
+ * svm_ir_list_add() completes to set the new target pCPU.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
entry = READ_ONCE(*(svm->avic_physical_id_cache));
@@ -1067,6 +1087,14 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
return;
+ /*
+ * Take and hold the per-vCPU interrupt remapping lock while updating
+ * the Physical ID entry even though the lock doesn't protect against
+ * multiple writers (see above). Holding ir_list_lock ensures that
+ * either svm_ir_list_add() will consume up-to-date entry information,
+ * or that this task will wait until svm_ir_list_add() completes to
+ * mark the vCPU as not running.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x f3cebc75e7425d6949d726bb8e937095b0aef025
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091659-tinderbox-ligament-ed56@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
f3cebc75e742 ("KVM: SVM: Set target pCPU during IRTE update if target vCPU is running")
4c08e737f056 ("KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry")
935a7333958e ("KVM: SVM: Drop AVIC's intermediate avic_set_running() helper")
782f64558de7 ("KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU")
af52f5aa5c1b ("KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption")
e422b8896948 ("KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path")
7cfc5c653b07 ("KVM: fix avic_set_running for preemptable kernels")
bf5f6b9d7ad6 ("KVM: SVM: move check for kvm_vcpu_apicv_active outside of avic_vcpu_{put|load}")
02ffbe6351f5 ("KVM: SVM: fix doc warnings")
a7fc06dd2f14 ("KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup")
553cc15f6e8d ("KVM: SVM: remove uneeded fields from host_save_users_msrs")
e79b91bb3c91 ("KVM: SVM: use vmsave/vmload for saving/restoring additional host state")
35a7831912f4 ("KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructions")
647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
f65cf84ee769 ("KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load")
8640ca588b03 ("KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting")
16809ecdc1e8 ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests")
861377730aa9 ("KVM: SVM: Provide support for SEV-ES vCPU loading")
376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
85ca8be938c0 ("KVM: SVM: Set the encryption mask for the SVM host save area")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3cebc75e7425d6949d726bb8e937095b0aef025 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 8 Aug 2023 16:31:32 -0700
Subject: [PATCH] KVM: SVM: Set target pCPU during IRTE update if target vCPU
is running
Update the target pCPU for IOMMU doorbells when updating IRTE routing if
KVM is actively running the associated vCPU. KVM currently only updates
the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell
events will be delayed until the vCPU goes through a put+load cycle (which
might very well "never" happen for the lifetime of the VM).
To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE
routing and vCPU load/put, get the pCPU information from the vCPU's
Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and
update the IRTE while holding ir_list_lock. Add comments with --verbose
enabled to explain exactly what is and isn't protected by ir_list_lock.
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Reported-by: dengqiao.joey <dengqiao.joey(a)bytedance.com>
Cc: stable(a)vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Reviewed-by: Joao Martins <joao.m.martins(a)oracle.com>
Link: https://lore.kernel.org/r/20230808233132.2499764-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8e041b215ddb..2092db892d7d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -791,6 +791,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
int ret = 0;
unsigned long flags;
struct amd_svm_iommu_ir *ir;
+ u64 entry;
/**
* In some cases, the existing irte is updated and re-set,
@@ -824,6 +825,18 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
ir->data = pi->ir_data;
spin_lock_irqsave(&svm->ir_list_lock, flags);
+
+ /*
+ * Update the target pCPU for IOMMU doorbells if the vCPU is running.
+ * If the vCPU is NOT running, i.e. is blocking or scheduled out, KVM
+ * will update the pCPU info when the vCPU awkened and/or scheduled in.
+ * See also avic_vcpu_load().
+ */
+ entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
+ amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
+ true, pi->ir_data);
+
list_add(&ir->node, &svm->ir_list);
spin_unlock_irqrestore(&svm->ir_list_lock, flags);
out:
@@ -1031,6 +1044,13 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (kvm_vcpu_is_blocking(vcpu))
return;
+ /*
+ * Grab the per-vCPU interrupt remapping lock even if the VM doesn't
+ * _currently_ have assigned devices, as that can change. Holding
+ * ir_list_lock ensures that either svm_ir_list_add() will consume
+ * up-to-date entry information, or that this task will wait until
+ * svm_ir_list_add() completes to set the new target pCPU.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
entry = READ_ONCE(*(svm->avic_physical_id_cache));
@@ -1067,6 +1087,14 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
return;
+ /*
+ * Take and hold the per-vCPU interrupt remapping lock while updating
+ * the Physical ID entry even though the lock doesn't protect against
+ * multiple writers (see above). Holding ir_list_lock ensures that
+ * either svm_ir_list_add() will consume up-to-date entry information,
+ * or that this task will wait until svm_ir_list_add() completes to
+ * mark the vCPU as not running.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x f3cebc75e7425d6949d726bb8e937095b0aef025
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091658-ardently-stony-3b87@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
f3cebc75e742 ("KVM: SVM: Set target pCPU during IRTE update if target vCPU is running")
4c08e737f056 ("KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry")
935a7333958e ("KVM: SVM: Drop AVIC's intermediate avic_set_running() helper")
782f64558de7 ("KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU")
af52f5aa5c1b ("KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption")
e422b8896948 ("KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path")
7cfc5c653b07 ("KVM: fix avic_set_running for preemptable kernels")
bf5f6b9d7ad6 ("KVM: SVM: move check for kvm_vcpu_apicv_active outside of avic_vcpu_{put|load}")
02ffbe6351f5 ("KVM: SVM: fix doc warnings")
a7fc06dd2f14 ("KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup")
553cc15f6e8d ("KVM: SVM: remove uneeded fields from host_save_users_msrs")
e79b91bb3c91 ("KVM: SVM: use vmsave/vmload for saving/restoring additional host state")
35a7831912f4 ("KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructions")
647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
f65cf84ee769 ("KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load")
8640ca588b03 ("KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting")
16809ecdc1e8 ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests")
861377730aa9 ("KVM: SVM: Provide support for SEV-ES vCPU loading")
376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
85ca8be938c0 ("KVM: SVM: Set the encryption mask for the SVM host save area")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3cebc75e7425d6949d726bb8e937095b0aef025 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 8 Aug 2023 16:31:32 -0700
Subject: [PATCH] KVM: SVM: Set target pCPU during IRTE update if target vCPU
is running
Update the target pCPU for IOMMU doorbells when updating IRTE routing if
KVM is actively running the associated vCPU. KVM currently only updates
the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell
events will be delayed until the vCPU goes through a put+load cycle (which
might very well "never" happen for the lifetime of the VM).
To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE
routing and vCPU load/put, get the pCPU information from the vCPU's
Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and
update the IRTE while holding ir_list_lock. Add comments with --verbose
enabled to explain exactly what is and isn't protected by ir_list_lock.
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Reported-by: dengqiao.joey <dengqiao.joey(a)bytedance.com>
Cc: stable(a)vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Reviewed-by: Joao Martins <joao.m.martins(a)oracle.com>
Link: https://lore.kernel.org/r/20230808233132.2499764-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8e041b215ddb..2092db892d7d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -791,6 +791,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
int ret = 0;
unsigned long flags;
struct amd_svm_iommu_ir *ir;
+ u64 entry;
/**
* In some cases, the existing irte is updated and re-set,
@@ -824,6 +825,18 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
ir->data = pi->ir_data;
spin_lock_irqsave(&svm->ir_list_lock, flags);
+
+ /*
+ * Update the target pCPU for IOMMU doorbells if the vCPU is running.
+ * If the vCPU is NOT running, i.e. is blocking or scheduled out, KVM
+ * will update the pCPU info when the vCPU awkened and/or scheduled in.
+ * See also avic_vcpu_load().
+ */
+ entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
+ amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
+ true, pi->ir_data);
+
list_add(&ir->node, &svm->ir_list);
spin_unlock_irqrestore(&svm->ir_list_lock, flags);
out:
@@ -1031,6 +1044,13 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (kvm_vcpu_is_blocking(vcpu))
return;
+ /*
+ * Grab the per-vCPU interrupt remapping lock even if the VM doesn't
+ * _currently_ have assigned devices, as that can change. Holding
+ * ir_list_lock ensures that either svm_ir_list_add() will consume
+ * up-to-date entry information, or that this task will wait until
+ * svm_ir_list_add() completes to set the new target pCPU.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
entry = READ_ONCE(*(svm->avic_physical_id_cache));
@@ -1067,6 +1087,14 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
return;
+ /*
+ * Take and hold the per-vCPU interrupt remapping lock while updating
+ * the Physical ID entry even though the lock doesn't protect against
+ * multiple writers (see above). Holding ir_list_lock ensures that
+ * either svm_ir_list_add() will consume up-to-date entry information,
+ * or that this task will wait until svm_ir_list_add() completes to
+ * mark the vCPU as not running.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x f3cebc75e7425d6949d726bb8e937095b0aef025
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091657-decidable-garbage-b5fa@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
f3cebc75e742 ("KVM: SVM: Set target pCPU during IRTE update if target vCPU is running")
4c08e737f056 ("KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry")
935a7333958e ("KVM: SVM: Drop AVIC's intermediate avic_set_running() helper")
782f64558de7 ("KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU")
af52f5aa5c1b ("KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption")
e422b8896948 ("KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path")
7cfc5c653b07 ("KVM: fix avic_set_running for preemptable kernels")
bf5f6b9d7ad6 ("KVM: SVM: move check for kvm_vcpu_apicv_active outside of avic_vcpu_{put|load}")
02ffbe6351f5 ("KVM: SVM: fix doc warnings")
a7fc06dd2f14 ("KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup")
553cc15f6e8d ("KVM: SVM: remove uneeded fields from host_save_users_msrs")
e79b91bb3c91 ("KVM: SVM: use vmsave/vmload for saving/restoring additional host state")
35a7831912f4 ("KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructions")
647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
f65cf84ee769 ("KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load")
8640ca588b03 ("KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting")
16809ecdc1e8 ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests")
861377730aa9 ("KVM: SVM: Provide support for SEV-ES vCPU loading")
376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
85ca8be938c0 ("KVM: SVM: Set the encryption mask for the SVM host save area")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3cebc75e7425d6949d726bb8e937095b0aef025 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 8 Aug 2023 16:31:32 -0700
Subject: [PATCH] KVM: SVM: Set target pCPU during IRTE update if target vCPU
is running
Update the target pCPU for IOMMU doorbells when updating IRTE routing if
KVM is actively running the associated vCPU. KVM currently only updates
the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell
events will be delayed until the vCPU goes through a put+load cycle (which
might very well "never" happen for the lifetime of the VM).
To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE
routing and vCPU load/put, get the pCPU information from the vCPU's
Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and
update the IRTE while holding ir_list_lock. Add comments with --verbose
enabled to explain exactly what is and isn't protected by ir_list_lock.
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Reported-by: dengqiao.joey <dengqiao.joey(a)bytedance.com>
Cc: stable(a)vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Reviewed-by: Joao Martins <joao.m.martins(a)oracle.com>
Link: https://lore.kernel.org/r/20230808233132.2499764-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8e041b215ddb..2092db892d7d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -791,6 +791,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
int ret = 0;
unsigned long flags;
struct amd_svm_iommu_ir *ir;
+ u64 entry;
/**
* In some cases, the existing irte is updated and re-set,
@@ -824,6 +825,18 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
ir->data = pi->ir_data;
spin_lock_irqsave(&svm->ir_list_lock, flags);
+
+ /*
+ * Update the target pCPU for IOMMU doorbells if the vCPU is running.
+ * If the vCPU is NOT running, i.e. is blocking or scheduled out, KVM
+ * will update the pCPU info when the vCPU awkened and/or scheduled in.
+ * See also avic_vcpu_load().
+ */
+ entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
+ amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
+ true, pi->ir_data);
+
list_add(&ir->node, &svm->ir_list);
spin_unlock_irqrestore(&svm->ir_list_lock, flags);
out:
@@ -1031,6 +1044,13 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (kvm_vcpu_is_blocking(vcpu))
return;
+ /*
+ * Grab the per-vCPU interrupt remapping lock even if the VM doesn't
+ * _currently_ have assigned devices, as that can change. Holding
+ * ir_list_lock ensures that either svm_ir_list_add() will consume
+ * up-to-date entry information, or that this task will wait until
+ * svm_ir_list_add() completes to set the new target pCPU.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
entry = READ_ONCE(*(svm->avic_physical_id_cache));
@@ -1067,6 +1087,14 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
return;
+ /*
+ * Take and hold the per-vCPU interrupt remapping lock while updating
+ * the Physical ID entry even though the lock doesn't protect against
+ * multiple writers (see above). Holding ir_list_lock ensures that
+ * either svm_ir_list_add() will consume up-to-date entry information,
+ * or that this task will wait until svm_ir_list_add() completes to
+ * mark the vCPU as not running.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f3cebc75e7425d6949d726bb8e937095b0aef025
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091655-resize-playmate-ac62@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
f3cebc75e742 ("KVM: SVM: Set target pCPU during IRTE update if target vCPU is running")
4c08e737f056 ("KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry")
935a7333958e ("KVM: SVM: Drop AVIC's intermediate avic_set_running() helper")
782f64558de7 ("KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU")
af52f5aa5c1b ("KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption")
e422b8896948 ("KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path")
7cfc5c653b07 ("KVM: fix avic_set_running for preemptable kernels")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3cebc75e7425d6949d726bb8e937095b0aef025 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 8 Aug 2023 16:31:32 -0700
Subject: [PATCH] KVM: SVM: Set target pCPU during IRTE update if target vCPU
is running
Update the target pCPU for IOMMU doorbells when updating IRTE routing if
KVM is actively running the associated vCPU. KVM currently only updates
the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell
events will be delayed until the vCPU goes through a put+load cycle (which
might very well "never" happen for the lifetime of the VM).
To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE
routing and vCPU load/put, get the pCPU information from the vCPU's
Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and
update the IRTE while holding ir_list_lock. Add comments with --verbose
enabled to explain exactly what is and isn't protected by ir_list_lock.
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Reported-by: dengqiao.joey <dengqiao.joey(a)bytedance.com>
Cc: stable(a)vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
Reviewed-by: Joao Martins <joao.m.martins(a)oracle.com>
Link: https://lore.kernel.org/r/20230808233132.2499764-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8e041b215ddb..2092db892d7d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -791,6 +791,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
int ret = 0;
unsigned long flags;
struct amd_svm_iommu_ir *ir;
+ u64 entry;
/**
* In some cases, the existing irte is updated and re-set,
@@ -824,6 +825,18 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
ir->data = pi->ir_data;
spin_lock_irqsave(&svm->ir_list_lock, flags);
+
+ /*
+ * Update the target pCPU for IOMMU doorbells if the vCPU is running.
+ * If the vCPU is NOT running, i.e. is blocking or scheduled out, KVM
+ * will update the pCPU info when the vCPU awkened and/or scheduled in.
+ * See also avic_vcpu_load().
+ */
+ entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
+ amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
+ true, pi->ir_data);
+
list_add(&ir->node, &svm->ir_list);
spin_unlock_irqrestore(&svm->ir_list_lock, flags);
out:
@@ -1031,6 +1044,13 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (kvm_vcpu_is_blocking(vcpu))
return;
+ /*
+ * Grab the per-vCPU interrupt remapping lock even if the VM doesn't
+ * _currently_ have assigned devices, as that can change. Holding
+ * ir_list_lock ensures that either svm_ir_list_add() will consume
+ * up-to-date entry information, or that this task will wait until
+ * svm_ir_list_add() completes to set the new target pCPU.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
entry = READ_ONCE(*(svm->avic_physical_id_cache));
@@ -1067,6 +1087,14 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
return;
+ /*
+ * Take and hold the per-vCPU interrupt remapping lock while updating
+ * the Physical ID entry even though the lock doesn't protect against
+ * multiple writers (see above). Holding ir_list_lock ensures that
+ * either svm_ir_list_add() will consume up-to-date entry information,
+ * or that this task will wait until svm_ir_list_add() completes to
+ * mark the vCPU as not running.
+ */
spin_lock_irqsave(&svm->ir_list_lock, flags);
avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1e611e104b9acb6310b8c684d5acee0e11ca7bd1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091600-powdered-lanky-892a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1e611e104b9a ("mtd: spi-nor: spansion: preserve CFR2V[7] when writing MEMLAT")
d534fd9787d5 ("mtd: spi-nor: spansion: use CLPEF as an alternative to CLSR")
4095f4d9225a ("mtd: spi-nor: Fix divide by zero for spi-nor-generic flashes")
df6def86b9dc ("mtd: spi-nor: spansion: Add support for s25hl02gt and s25hs02gt")
91f3c430f622 ("mtd: spi-nor: spansion: Add a new ->ready() hook for multi-chip device")
6c01ae11130c ("mtd: spi-nor: spansion: Rework cypress_nor_get_page_size() for multi-chip device support")
e570f7872a34 ("mtd: spi-nor: Allow post_sfdp hook to return errors")
120c94a67b26 ("mtd: spi-nor: spansion: Rename method to cypress_nor_get_page_size")
a9180c298d35 ("mtd: spi-nor: spansion: Enable JFFS2 write buffer for S25FS256T")
4199c1719e24 ("mtd: spi-nor: spansion: Enable JFFS2 write buffer for Infineon s25hx SEMPER flash")
9fd0945fe6fa ("mtd: spi-nor: spansion: Enable JFFS2 write buffer for Infineon s28hx SEMPER flash")
c87c9b11c53c ("mtd: spi-nor: spansion: Determine current address mode")
4e53ab0c292d ("mtd: spi-nor: Set the 4-Byte Address Mode method based on SFDP data")
d75c22f376f6 ("mtd: spi-nor: core: Update name and description of spi_nor_set_4byte_addr_mode")
f1f1976224f3 ("mtd: spi-nor: core: Update name and description of spansion_set_4byte_addr_mode")
288df4378319 ("mtd: spi-nor: core: Update name and description of micron_st_nor_set_4byte_addr_mode")
076aa4eac8b3 ("mtd: spi-nor: core: Move generic method to core - micron_st_nor_set_4byte_addr_mode")
79a4db50192c ("mtd: spi-nor: Delay the initialization of bank_size")
4eddee70140b ("mtd: spi-nor: Add a RWW flag")
6afcc84080c4 ("mtd: spi-nor: spansion: Add support for Infineon S25FS256T")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e611e104b9acb6310b8c684d5acee0e11ca7bd1 Mon Sep 17 00:00:00 2001
From: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Date: Wed, 26 Jul 2023 10:52:48 +0300
Subject: [PATCH] mtd: spi-nor: spansion: preserve CFR2V[7] when writing MEMLAT
CFR2V[7] is assigned to Flash's address mode (3- or 4-ybte) and must not
be changed when writing MEMLAT (CFR2V[3:0]). CFR2V shall be used in a read,
update, write back fashion.
Fixes: c3266af101f2 ("mtd: spi-nor: spansion: add support for Cypress Semper flash")
Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230726075257.12985-3-tudor.ambarus@linaro.org
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
diff --git a/drivers/mtd/spi-nor/spansion.c b/drivers/mtd/spi-nor/spansion.c
index 6b2532ed053c..6460d2247bdf 100644
--- a/drivers/mtd/spi-nor/spansion.c
+++ b/drivers/mtd/spi-nor/spansion.c
@@ -4,6 +4,7 @@
* Copyright (C) 2014, Freescale Semiconductor, Inc.
*/
+#include <linux/bitfield.h>
#include <linux/device.h>
#include <linux/mtd/spi-nor.h>
@@ -28,6 +29,7 @@
#define SPINOR_REG_CYPRESS_CFR2 0x3
#define SPINOR_REG_CYPRESS_CFR2V \
(SPINOR_REG_CYPRESS_VREG + SPINOR_REG_CYPRESS_CFR2)
+#define SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK GENMASK(3, 0)
#define SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24 0xb
#define SPINOR_REG_CYPRESS_CFR2_ADRBYT BIT(7)
#define SPINOR_REG_CYPRESS_CFR3 0x4
@@ -161,8 +163,18 @@ static int cypress_nor_octal_dtr_en(struct spi_nor *nor)
int ret;
u8 addr_mode_nbytes = nor->params->addr_mode_nbytes;
+ op = (struct spi_mem_op)
+ CYPRESS_NOR_RD_ANY_REG_OP(addr_mode_nbytes,
+ SPINOR_REG_CYPRESS_CFR2V, 0, buf);
+
+ ret = spi_nor_read_any_reg(nor, &op, nor->reg_proto);
+ if (ret)
+ return ret;
+
/* Use 24 dummy cycles for memory array reads. */
- *buf = SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24;
+ *buf &= ~SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK;
+ *buf |= FIELD_PREP(SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK,
+ SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24);
op = (struct spi_mem_op)
CYPRESS_NOR_WR_ANY_REG_OP(addr_mode_nbytes,
SPINOR_REG_CYPRESS_CFR2V, 1, buf);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1e611e104b9acb6310b8c684d5acee0e11ca7bd1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091659-unguarded-greeting-7dbb@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1e611e104b9a ("mtd: spi-nor: spansion: preserve CFR2V[7] when writing MEMLAT")
d534fd9787d5 ("mtd: spi-nor: spansion: use CLPEF as an alternative to CLSR")
4095f4d9225a ("mtd: spi-nor: Fix divide by zero for spi-nor-generic flashes")
df6def86b9dc ("mtd: spi-nor: spansion: Add support for s25hl02gt and s25hs02gt")
91f3c430f622 ("mtd: spi-nor: spansion: Add a new ->ready() hook for multi-chip device")
6c01ae11130c ("mtd: spi-nor: spansion: Rework cypress_nor_get_page_size() for multi-chip device support")
e570f7872a34 ("mtd: spi-nor: Allow post_sfdp hook to return errors")
120c94a67b26 ("mtd: spi-nor: spansion: Rename method to cypress_nor_get_page_size")
a9180c298d35 ("mtd: spi-nor: spansion: Enable JFFS2 write buffer for S25FS256T")
4199c1719e24 ("mtd: spi-nor: spansion: Enable JFFS2 write buffer for Infineon s25hx SEMPER flash")
9fd0945fe6fa ("mtd: spi-nor: spansion: Enable JFFS2 write buffer for Infineon s28hx SEMPER flash")
c87c9b11c53c ("mtd: spi-nor: spansion: Determine current address mode")
4e53ab0c292d ("mtd: spi-nor: Set the 4-Byte Address Mode method based on SFDP data")
d75c22f376f6 ("mtd: spi-nor: core: Update name and description of spi_nor_set_4byte_addr_mode")
f1f1976224f3 ("mtd: spi-nor: core: Update name and description of spansion_set_4byte_addr_mode")
288df4378319 ("mtd: spi-nor: core: Update name and description of micron_st_nor_set_4byte_addr_mode")
076aa4eac8b3 ("mtd: spi-nor: core: Move generic method to core - micron_st_nor_set_4byte_addr_mode")
79a4db50192c ("mtd: spi-nor: Delay the initialization of bank_size")
4eddee70140b ("mtd: spi-nor: Add a RWW flag")
6afcc84080c4 ("mtd: spi-nor: spansion: Add support for Infineon S25FS256T")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e611e104b9acb6310b8c684d5acee0e11ca7bd1 Mon Sep 17 00:00:00 2001
From: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Date: Wed, 26 Jul 2023 10:52:48 +0300
Subject: [PATCH] mtd: spi-nor: spansion: preserve CFR2V[7] when writing MEMLAT
CFR2V[7] is assigned to Flash's address mode (3- or 4-ybte) and must not
be changed when writing MEMLAT (CFR2V[3:0]). CFR2V shall be used in a read,
update, write back fashion.
Fixes: c3266af101f2 ("mtd: spi-nor: spansion: add support for Cypress Semper flash")
Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230726075257.12985-3-tudor.ambarus@linaro.org
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
diff --git a/drivers/mtd/spi-nor/spansion.c b/drivers/mtd/spi-nor/spansion.c
index 6b2532ed053c..6460d2247bdf 100644
--- a/drivers/mtd/spi-nor/spansion.c
+++ b/drivers/mtd/spi-nor/spansion.c
@@ -4,6 +4,7 @@
* Copyright (C) 2014, Freescale Semiconductor, Inc.
*/
+#include <linux/bitfield.h>
#include <linux/device.h>
#include <linux/mtd/spi-nor.h>
@@ -28,6 +29,7 @@
#define SPINOR_REG_CYPRESS_CFR2 0x3
#define SPINOR_REG_CYPRESS_CFR2V \
(SPINOR_REG_CYPRESS_VREG + SPINOR_REG_CYPRESS_CFR2)
+#define SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK GENMASK(3, 0)
#define SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24 0xb
#define SPINOR_REG_CYPRESS_CFR2_ADRBYT BIT(7)
#define SPINOR_REG_CYPRESS_CFR3 0x4
@@ -161,8 +163,18 @@ static int cypress_nor_octal_dtr_en(struct spi_nor *nor)
int ret;
u8 addr_mode_nbytes = nor->params->addr_mode_nbytes;
+ op = (struct spi_mem_op)
+ CYPRESS_NOR_RD_ANY_REG_OP(addr_mode_nbytes,
+ SPINOR_REG_CYPRESS_CFR2V, 0, buf);
+
+ ret = spi_nor_read_any_reg(nor, &op, nor->reg_proto);
+ if (ret)
+ return ret;
+
/* Use 24 dummy cycles for memory array reads. */
- *buf = SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24;
+ *buf &= ~SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK;
+ *buf |= FIELD_PREP(SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK,
+ SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24);
op = (struct spi_mem_op)
CYPRESS_NOR_WR_ANY_REG_OP(addr_mode_nbytes,
SPINOR_REG_CYPRESS_CFR2V, 1, buf);
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 1e611e104b9acb6310b8c684d5acee0e11ca7bd1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091658-glare-spinster-90d6@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
1e611e104b9a ("mtd: spi-nor: spansion: preserve CFR2V[7] when writing MEMLAT")
d534fd9787d5 ("mtd: spi-nor: spansion: use CLPEF as an alternative to CLSR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e611e104b9acb6310b8c684d5acee0e11ca7bd1 Mon Sep 17 00:00:00 2001
From: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Date: Wed, 26 Jul 2023 10:52:48 +0300
Subject: [PATCH] mtd: spi-nor: spansion: preserve CFR2V[7] when writing MEMLAT
CFR2V[7] is assigned to Flash's address mode (3- or 4-ybte) and must not
be changed when writing MEMLAT (CFR2V[3:0]). CFR2V shall be used in a read,
update, write back fashion.
Fixes: c3266af101f2 ("mtd: spi-nor: spansion: add support for Cypress Semper flash")
Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230726075257.12985-3-tudor.ambarus@linaro.org
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
diff --git a/drivers/mtd/spi-nor/spansion.c b/drivers/mtd/spi-nor/spansion.c
index 6b2532ed053c..6460d2247bdf 100644
--- a/drivers/mtd/spi-nor/spansion.c
+++ b/drivers/mtd/spi-nor/spansion.c
@@ -4,6 +4,7 @@
* Copyright (C) 2014, Freescale Semiconductor, Inc.
*/
+#include <linux/bitfield.h>
#include <linux/device.h>
#include <linux/mtd/spi-nor.h>
@@ -28,6 +29,7 @@
#define SPINOR_REG_CYPRESS_CFR2 0x3
#define SPINOR_REG_CYPRESS_CFR2V \
(SPINOR_REG_CYPRESS_VREG + SPINOR_REG_CYPRESS_CFR2)
+#define SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK GENMASK(3, 0)
#define SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24 0xb
#define SPINOR_REG_CYPRESS_CFR2_ADRBYT BIT(7)
#define SPINOR_REG_CYPRESS_CFR3 0x4
@@ -161,8 +163,18 @@ static int cypress_nor_octal_dtr_en(struct spi_nor *nor)
int ret;
u8 addr_mode_nbytes = nor->params->addr_mode_nbytes;
+ op = (struct spi_mem_op)
+ CYPRESS_NOR_RD_ANY_REG_OP(addr_mode_nbytes,
+ SPINOR_REG_CYPRESS_CFR2V, 0, buf);
+
+ ret = spi_nor_read_any_reg(nor, &op, nor->reg_proto);
+ if (ret)
+ return ret;
+
/* Use 24 dummy cycles for memory array reads. */
- *buf = SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24;
+ *buf &= ~SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK;
+ *buf |= FIELD_PREP(SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK,
+ SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24);
op = (struct spi_mem_op)
CYPRESS_NOR_WR_ANY_REG_OP(addr_mode_nbytes,
SPINOR_REG_CYPRESS_CFR2V, 1, buf);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 13bb483d32abb6f8ebd40141d87eb68f11cc2dd2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091640-upstroke-gopher-c666@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
13bb483d32ab ("btrfs: zoned: activate metadata block group on write time")
0356ad41e0dd ("btrfs: zoned: defer advancing meta write pointer")
2ad8c0510a96 ("btrfs: zoned: return int from btrfs_check_meta_write_pointer")
7db94301a980 ("btrfs: zoned: introduce block group context to btrfs_eb_write_context")
861093eff4f0 ("btrfs: introduce struct to consolidate extent buffer write context")
71df088c1cc0 ("btrfs: defer splitting of ordered extents until I/O completion")
a6f3e205e491 ("btrfs: move split_extent_map to extent_map.c")
cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes")
5cfe76f846d5 ("btrfs: rename the bytenr field in struct btrfs_ordered_sum to logical")
6e4b2479ab38 ("btrfs: mark the len field in struct btrfs_ordered_sum as unsigned")
50b21d7a066f ("btrfs: submit a writeback bio per extent_buffer")
9fdd160160f0 ("btrfs: return bool from lock_extent_buffer_for_io")
adbe7e388e42 ("btrfs: use SECTOR_SHIFT to convert LBA to physical offset")
7edd339c8a41 ("btrfs: pass an ordered_extent to btrfs_extract_ordered_extent")
2e38a84bc6ab ("btrfs: simplify extent map splitting and rename split_zoned_em")
11d33ab6c1f3 ("btrfs: simplify splitting logic in btrfs_extract_ordered_extent")
e44ca71cfe07 ("btrfs: move ordered_extent internal sanity checks into btrfs_split_ordered_extent")
2cef0c79bb81 ("btrfs: make btrfs_split_bio work on struct btrfs_bio")
ae42a154ca89 ("btrfs: pass a btrfs_bio to btrfs_submit_bio")
34f888ce3a35 ("btrfs: cleanup main loop in btrfs_encoded_read_regular_fill_pages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13bb483d32abb6f8ebd40141d87eb68f11cc2dd2 Mon Sep 17 00:00:00 2001
From: Naohiro Aota <naohiro.aota(a)wdc.com>
Date: Tue, 8 Aug 2023 01:12:37 +0900
Subject: [PATCH] btrfs: zoned: activate metadata block group on write time
In the current implementation, block groups are activated at reservation
time to ensure that all reserved bytes can be written to an active metadata
block group. However, this approach has proven to be less efficient, as it
activates block groups more frequently than necessary, putting pressure on
the active zone resource and leading to potential issues such as early
ENOSPC or hung_task.
Another drawback of the current method is that it hampers metadata
over-commit, and necessitates additional flush operations and block group
allocations, resulting in decreased overall performance.
To address these issues, this commit introduces a write-time activation of
metadata and system block group. This involves reserving at least one
active block group specifically for a metadata and system block group.
Since metadata write-out is always allocated sequentially, when we need to
write to a non-active block group, we can wait for the ongoing IOs to
complete, activate a new block group, and then proceed with writing to the
new block group.
Fixes: b09315139136 ("btrfs: zoned: activate metadata block group on flush_space")
CC: stable(a)vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a127865f49f9..b0e432c30e1d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4287,6 +4287,17 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
struct btrfs_caching_control *caching_ctl;
struct rb_node *n;
+ if (btrfs_is_zoned(info)) {
+ if (info->active_meta_bg) {
+ btrfs_put_block_group(info->active_meta_bg);
+ info->active_meta_bg = NULL;
+ }
+ if (info->active_system_bg) {
+ btrfs_put_block_group(info->active_system_bg);
+ info->active_system_bg = NULL;
+ }
+ }
+
write_lock(&info->block_group_cache_lock);
while (!list_empty(&info->caching_block_groups)) {
caching_ctl = list_entry(info->caching_block_groups.next,
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index ef07c6c252d8..a523d64d5491 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -770,6 +770,9 @@ struct btrfs_fs_info {
u64 data_reloc_bg;
struct mutex zoned_data_reloc_io_lock;
+ struct btrfs_block_group *active_meta_bg;
+ struct btrfs_block_group *active_system_bg;
+
u64 nr_global_roots;
spinlock_t zone_active_bgs_lock;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index fc69041bb6b4..099cb6a6d3b3 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -65,6 +65,9 @@
#define SUPER_INFO_SECTORS ((u64)BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT)
+static void wait_eb_writebacks(struct btrfs_block_group *block_group);
+static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_written);
+
static inline bool sb_zone_is_full(const struct blk_zone *zone)
{
return (zone->cond == BLK_ZONE_COND_FULL) ||
@@ -1747,6 +1750,62 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
}
}
+static bool check_bg_is_active(struct btrfs_eb_write_context *ctx,
+ struct btrfs_block_group **active_bg)
+{
+ const struct writeback_control *wbc = ctx->wbc;
+ struct btrfs_block_group *block_group = ctx->zoned_bg;
+ struct btrfs_fs_info *fs_info = block_group->fs_info;
+
+ if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
+ return true;
+
+ if (fs_info->treelog_bg == block_group->start) {
+ if (!btrfs_zone_activate(block_group)) {
+ int ret_fin = btrfs_zone_finish_one_bg(fs_info);
+
+ if (ret_fin != 1 || !btrfs_zone_activate(block_group))
+ return false;
+ }
+ } else if (*active_bg != block_group) {
+ struct btrfs_block_group *tgt = *active_bg;
+
+ /* zoned_meta_io_lock protects fs_info->active_{meta,system}_bg. */
+ lockdep_assert_held(&fs_info->zoned_meta_io_lock);
+
+ if (tgt) {
+ /*
+ * If there is an unsent IO left in the allocated area,
+ * we cannot wait for them as it may cause a deadlock.
+ */
+ if (tgt->meta_write_pointer < tgt->start + tgt->alloc_offset) {
+ if (wbc->sync_mode == WB_SYNC_NONE ||
+ (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync))
+ return false;
+ }
+
+ /* Pivot active metadata/system block group. */
+ btrfs_zoned_meta_io_unlock(fs_info);
+ wait_eb_writebacks(tgt);
+ do_zone_finish(tgt, true);
+ btrfs_zoned_meta_io_lock(fs_info);
+ if (*active_bg == tgt) {
+ btrfs_put_block_group(tgt);
+ *active_bg = NULL;
+ }
+ }
+ if (!btrfs_zone_activate(block_group))
+ return false;
+ if (*active_bg != block_group) {
+ ASSERT(*active_bg == NULL);
+ *active_bg = block_group;
+ btrfs_get_block_group(block_group);
+ }
+ }
+
+ return true;
+}
+
/*
* Check if @ctx->eb is aligned to the write pointer.
*
@@ -1781,8 +1840,26 @@ int btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
ctx->zoned_bg = block_group;
}
- if (block_group->meta_write_pointer == eb->start)
- return 0;
+ if (block_group->meta_write_pointer == eb->start) {
+ struct btrfs_block_group **tgt;
+
+ if (!test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags))
+ return 0;
+
+ if (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ tgt = &fs_info->active_system_bg;
+ else
+ tgt = &fs_info->active_meta_bg;
+ if (check_bg_is_active(ctx, tgt))
+ return 0;
+ }
+
+ /*
+ * Since we may release fs_info->zoned_meta_io_lock, someone can already
+ * start writing this eb. In that case, we can just bail out.
+ */
+ if (block_group->meta_write_pointer > eb->start)
+ return -EBUSY;
/* If for_sync, this hole will be filled with trasnsaction commit. */
if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 13bb483d32abb6f8ebd40141d87eb68f11cc2dd2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091639-mundane-justice-4caa@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
13bb483d32ab ("btrfs: zoned: activate metadata block group on write time")
0356ad41e0dd ("btrfs: zoned: defer advancing meta write pointer")
2ad8c0510a96 ("btrfs: zoned: return int from btrfs_check_meta_write_pointer")
7db94301a980 ("btrfs: zoned: introduce block group context to btrfs_eb_write_context")
861093eff4f0 ("btrfs: introduce struct to consolidate extent buffer write context")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13bb483d32abb6f8ebd40141d87eb68f11cc2dd2 Mon Sep 17 00:00:00 2001
From: Naohiro Aota <naohiro.aota(a)wdc.com>
Date: Tue, 8 Aug 2023 01:12:37 +0900
Subject: [PATCH] btrfs: zoned: activate metadata block group on write time
In the current implementation, block groups are activated at reservation
time to ensure that all reserved bytes can be written to an active metadata
block group. However, this approach has proven to be less efficient, as it
activates block groups more frequently than necessary, putting pressure on
the active zone resource and leading to potential issues such as early
ENOSPC or hung_task.
Another drawback of the current method is that it hampers metadata
over-commit, and necessitates additional flush operations and block group
allocations, resulting in decreased overall performance.
To address these issues, this commit introduces a write-time activation of
metadata and system block group. This involves reserving at least one
active block group specifically for a metadata and system block group.
Since metadata write-out is always allocated sequentially, when we need to
write to a non-active block group, we can wait for the ongoing IOs to
complete, activate a new block group, and then proceed with writing to the
new block group.
Fixes: b09315139136 ("btrfs: zoned: activate metadata block group on flush_space")
CC: stable(a)vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a127865f49f9..b0e432c30e1d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4287,6 +4287,17 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
struct btrfs_caching_control *caching_ctl;
struct rb_node *n;
+ if (btrfs_is_zoned(info)) {
+ if (info->active_meta_bg) {
+ btrfs_put_block_group(info->active_meta_bg);
+ info->active_meta_bg = NULL;
+ }
+ if (info->active_system_bg) {
+ btrfs_put_block_group(info->active_system_bg);
+ info->active_system_bg = NULL;
+ }
+ }
+
write_lock(&info->block_group_cache_lock);
while (!list_empty(&info->caching_block_groups)) {
caching_ctl = list_entry(info->caching_block_groups.next,
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index ef07c6c252d8..a523d64d5491 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -770,6 +770,9 @@ struct btrfs_fs_info {
u64 data_reloc_bg;
struct mutex zoned_data_reloc_io_lock;
+ struct btrfs_block_group *active_meta_bg;
+ struct btrfs_block_group *active_system_bg;
+
u64 nr_global_roots;
spinlock_t zone_active_bgs_lock;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index fc69041bb6b4..099cb6a6d3b3 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -65,6 +65,9 @@
#define SUPER_INFO_SECTORS ((u64)BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT)
+static void wait_eb_writebacks(struct btrfs_block_group *block_group);
+static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_written);
+
static inline bool sb_zone_is_full(const struct blk_zone *zone)
{
return (zone->cond == BLK_ZONE_COND_FULL) ||
@@ -1747,6 +1750,62 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
}
}
+static bool check_bg_is_active(struct btrfs_eb_write_context *ctx,
+ struct btrfs_block_group **active_bg)
+{
+ const struct writeback_control *wbc = ctx->wbc;
+ struct btrfs_block_group *block_group = ctx->zoned_bg;
+ struct btrfs_fs_info *fs_info = block_group->fs_info;
+
+ if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
+ return true;
+
+ if (fs_info->treelog_bg == block_group->start) {
+ if (!btrfs_zone_activate(block_group)) {
+ int ret_fin = btrfs_zone_finish_one_bg(fs_info);
+
+ if (ret_fin != 1 || !btrfs_zone_activate(block_group))
+ return false;
+ }
+ } else if (*active_bg != block_group) {
+ struct btrfs_block_group *tgt = *active_bg;
+
+ /* zoned_meta_io_lock protects fs_info->active_{meta,system}_bg. */
+ lockdep_assert_held(&fs_info->zoned_meta_io_lock);
+
+ if (tgt) {
+ /*
+ * If there is an unsent IO left in the allocated area,
+ * we cannot wait for them as it may cause a deadlock.
+ */
+ if (tgt->meta_write_pointer < tgt->start + tgt->alloc_offset) {
+ if (wbc->sync_mode == WB_SYNC_NONE ||
+ (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync))
+ return false;
+ }
+
+ /* Pivot active metadata/system block group. */
+ btrfs_zoned_meta_io_unlock(fs_info);
+ wait_eb_writebacks(tgt);
+ do_zone_finish(tgt, true);
+ btrfs_zoned_meta_io_lock(fs_info);
+ if (*active_bg == tgt) {
+ btrfs_put_block_group(tgt);
+ *active_bg = NULL;
+ }
+ }
+ if (!btrfs_zone_activate(block_group))
+ return false;
+ if (*active_bg != block_group) {
+ ASSERT(*active_bg == NULL);
+ *active_bg = block_group;
+ btrfs_get_block_group(block_group);
+ }
+ }
+
+ return true;
+}
+
/*
* Check if @ctx->eb is aligned to the write pointer.
*
@@ -1781,8 +1840,26 @@ int btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
ctx->zoned_bg = block_group;
}
- if (block_group->meta_write_pointer == eb->start)
- return 0;
+ if (block_group->meta_write_pointer == eb->start) {
+ struct btrfs_block_group **tgt;
+
+ if (!test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags))
+ return 0;
+
+ if (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ tgt = &fs_info->active_system_bg;
+ else
+ tgt = &fs_info->active_meta_bg;
+ if (check_bg_is_active(ctx, tgt))
+ return 0;
+ }
+
+ /*
+ * Since we may release fs_info->zoned_meta_io_lock, someone can already
+ * start writing this eb. In that case, we can just bail out.
+ */
+ if (block_group->meta_write_pointer > eb->start)
+ return -EBUSY;
/* If for_sync, this hole will be filled with trasnsaction commit. */
if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x b8bd342d50cbf606666488488f9fea374aceb2d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091602-untapped-scapegoat-dd85@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
b8bd342d50cb ("fuse: nlookup missing decrement in fuse_direntplus_link")
d123d8e1833c ("fuse: split out readdir.c")
63576c13bd17 ("fuse: fix initial parallel dirops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b8bd342d50cbf606666488488f9fea374aceb2d5 Mon Sep 17 00:00:00 2001
From: ruanmeisi <ruan.meisi(a)zte.com.cn>
Date: Tue, 25 Apr 2023 19:13:54 +0800
Subject: [PATCH] fuse: nlookup missing decrement in fuse_direntplus_link
During our debugging of glusterfs, we found an Assertion failed error:
inode_lookup >= nlookup, which was caused by the nlookup value in the
kernel being greater than that in the FUSE file system.
The issue was introduced by fuse_direntplus_link, where in the function,
fuse_iget increments nlookup, and if d_splice_alias returns failure,
fuse_direntplus_link returns failure without decrementing nlookup
https://github.com/gluster/glusterfs/pull/4081
Signed-off-by: ruanmeisi <ruan.meisi(a)zte.com.cn>
Fixes: 0b05b18381ee ("fuse: implement NFS-like readdirplus support")
Cc: <stable(a)vger.kernel.org> # v3.9
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index dc603479b30e..b3d498163f97 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -243,8 +243,16 @@ static int fuse_direntplus_link(struct file *file,
dput(dentry);
dentry = alias;
}
- if (IS_ERR(dentry))
+ if (IS_ERR(dentry)) {
+ if (!IS_ERR(inode)) {
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ spin_lock(&fi->lock);
+ fi->nlookup--;
+ spin_unlock(&fi->lock);
+ }
return PTR_ERR(dentry);
+ }
}
if (fc->readdirplus_auto)
set_bit(FUSE_I_INIT_RDPLUS, &get_fuse_inode(inode)->state);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091649-spotlight-ounce-dd0d@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
the first case of adjusting the best extent, that is, new_bex_end > 0, the
following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and use extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 772c9f691dcf3a487f29ddb90a5a15c78d7328e1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091649-ember-remindful-28e0@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
772c9f691dcf ("ext4: don't use CR_BEST_AVAIL_LEN for non-regular files")
b50675a4a6a6 ("ext4: return found group directly in ext4_mb_choose_next_group_goal_fast")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 772c9f691dcf3a487f29ddb90a5a15c78d7328e1 Mon Sep 17 00:00:00 2001
From: Ritesh Harjani <ritesh.list(a)gmail.com>
Date: Sun, 16 Jul 2023 19:33:34 +0530
Subject: [PATCH] ext4: don't use CR_BEST_AVAIL_LEN for non-regular files
Using CR_BEST_AVAIL_LEN only make sense for regular files, as for
non-regular files we never normalize the allocation request length i.e.
goal len is same as original length (ac_g_ex.fe_len == ac_o_ex.fe_len).
Hence there is no scope of trimming the goal length to make it
satisfy original request len. Thus this patch avoids using
CR_BEST_AVAIL_LEN criteria for non-regular files request.
Cc: stable(a)kernel.org
Fixes: 33122aa930f1 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Reported-by: Eric Whitney <enwlinux(a)gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Tested-by: Eric Whitney <enwlinux(a)gmail.com>
Link: https://lore.kernel.org/r/2a694c748ff8b8c4b416995a24f06f07b55047a8.16895160…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index b89b5f0816e7..3d5b0b71d7f5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -966,7 +966,18 @@ static void ext4_mb_choose_next_group_goal_fast(struct ext4_allocation_context *
}
}
- *new_cr = CR_BEST_AVAIL_LEN;
+ /*
+ * CR_BEST_AVAIL_LEN works based on the concept that we have
+ * a larger normalized goal len request which can be trimmed to
+ * a smaller goal len such that it can still satisfy original
+ * request len. However, allocation request for non-regular
+ * files never gets normalized.
+ * See function ext4_mb_normalize_request() (EXT4_MB_HINT_DATA).
+ */
+ if (ac->ac_flags & EXT4_MB_HINT_DATA)
+ *new_cr = CR_BEST_AVAIL_LEN;
+ else
+ *new_cr = CR_GOAL_LEN_SLOW;
}
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091648-dinginess-legacy-08ce@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
the first case of adjusting the best extent, that is, new_bex_end > 0, the
following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and use extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091646-safely-preflight-9684@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
the first case of adjusting the best extent, that is, new_bex_end > 0, the
following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and use extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091644-chamomile-presoak-599f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
the first case of adjusting the best extent, that is, new_bex_end > 0, the
following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and use extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091643-excluding-lining-7075@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
the first case of adjusting the best extent, that is, new_bex_end > 0, the
following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and use extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091642-shucking-scant-a1b0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
the first case of adjusting the best extent, that is, new_bex_end > 0, the
following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and use extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7ca4b085f430f3774c3838b3da569ceccd6a0177
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091607-removal-popular-03e9@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
7ca4b085f430 ("ext4: fix memory leaks in ext4_fname_{setup_filename,prepare_lookup}")
3030b59c8533 ("ext4: cleanup function defs from ext4.h into crypto.c")
b1241c8eb977 ("ext4: move ext4 crypto code to its own file crypto.c")
5298d4bfe80f ("unicode: clean up the Kconfig symbol confusion")
6661224e66f0 ("Merge tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7ca4b085f430f3774c3838b3da569ceccd6a0177 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lu=C3=ADs=20Henriques?= <lhenriques(a)suse.de>
Date: Thu, 3 Aug 2023 10:17:13 +0100
Subject: [PATCH] ext4: fix memory leaks in
ext4_fname_{setup_filename,prepare_lookup}
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
If the filename casefolding fails, we'll be leaking memory from the
fscrypt_name struct, namely from the 'crypto_buf.name' member.
Make sure we free it in the error path on both ext4_fname_setup_filename()
and ext4_fname_prepare_lookup() functions.
Cc: stable(a)kernel.org
Fixes: 1ae98e295fa2 ("ext4: optimize match for casefolded encrypted dirs")
Signed-off-by: Luís Henriques <lhenriques(a)suse.de>
Reviewed-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20230803091713.13239-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/crypto.c b/fs/ext4/crypto.c
index e20ac0654b3f..453d4da5de52 100644
--- a/fs/ext4/crypto.c
+++ b/fs/ext4/crypto.c
@@ -33,6 +33,8 @@ int ext4_fname_setup_filename(struct inode *dir, const struct qstr *iname,
#if IS_ENABLED(CONFIG_UNICODE)
err = ext4_fname_setup_ci_filename(dir, iname, fname);
+ if (err)
+ ext4_fname_free_filename(fname);
#endif
return err;
}
@@ -51,6 +53,8 @@ int ext4_fname_prepare_lookup(struct inode *dir, struct dentry *dentry,
#if IS_ENABLED(CONFIG_UNICODE)
err = ext4_fname_setup_ci_filename(dir, &dentry->d_name, fname);
+ if (err)
+ ext4_fname_free_filename(fname);
#endif
return err;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2dfba3bb40ad8536b9fa802364f2d40da31aa88e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091621-unviable-character-f8d2@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2dfba3bb40ad ("jbd2: correct the end of the journal recovery scan range")
cb3b3bf22cf3 ("jbd2: rename jbd_debug() to jbd2_debug()")
f7f497cb7024 ("jbd2: kill t_handle_lock transaction spinlock")
cc16eecae687 ("jbd2: fix use-after-free of transaction_t race")
4f9818684870 ("jbd2: refactor wait logic for transaction updates into a common function")
fcdf3c34b7ab ("ext4: fix debug format string warning")
d556435156b7 ("jbd2: avoid -Wempty-body warnings")
3042b1b45c41 ("Updated locking documentation for transaction_t")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2dfba3bb40ad8536b9fa802364f2d40da31aa88e Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Mon, 26 Jun 2023 15:33:22 +0800
Subject: [PATCH] jbd2: correct the end of the journal recovery scan range
We got a filesystem inconsistency issue below while running generic/475
I/O failure pressure test with fast_commit feature enabled.
Symlink /p3/d3/d1c/d6c/dd6/dce/l101 (inode #132605) is invalid.
If fast_commit feature is enabled, a special fast_commit journal area is
appended to the end of the normal journal area. The journal->j_last
point to the first unused block behind the normal journal area instead
of the whole log area, and the journal->j_fc_last point to the first
unused block behind the fast_commit journal area. While doing journal
recovery, do_one_pass(PASS_SCAN) should first scan the normal journal
area and turn around to the first block once it meet journal->j_last,
but the wrap() macro misuse the journal->j_fc_last, so the recovering
could not read the next magic block (commit block perhaps) and would end
early mistakenly and missing tN and every transaction after it in the
following example. Finally, it could lead to filesystem inconsistency.
| normal journal area | fast commit area |
+-------------------------------------------------+------------------+
| tN(rere) | tN+1 |~| tN-x |...| tN-1 | tN(front) | .... |
+-------------------------------------------------+------------------+
/ / /
start journal->j_last journal->j_fc_last
This patch fix it by use the correct ending journal->j_last.
Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path")
Cc: stable(a)kernel.org
Reported-by: Theodore Ts'o <tytso(a)mit.edu>
Link: https://lore.kernel.org/linux-ext4/20230613043120.GB1584772@mit.edu/
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230626073322.3956567-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 0184931d47f7..c269a7d29a46 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -230,12 +230,8 @@ static int count_tags(journal_t *journal, struct buffer_head *bh)
/* Make sure we wrap around the log correctly! */
#define wrap(journal, var) \
do { \
- unsigned long _wrap_last = \
- jbd2_has_feature_fast_commit(journal) ? \
- (journal)->j_fc_last : (journal)->j_last; \
- \
- if (var >= _wrap_last) \
- var -= (_wrap_last - (journal)->j_first); \
+ if (var >= (journal)->j_last) \
+ var -= ((journal)->j_last - (journal)->j_first); \
} while (0)
static int fc_do_one_pass(journal_t *journal,
@@ -524,9 +520,7 @@ static int do_one_pass(journal_t *journal,
break;
jbd2_debug(2, "Scanning for sequence ID %u at %lu/%lu\n",
- next_commit_ID, next_log_block,
- jbd2_has_feature_fast_commit(journal) ?
- journal->j_fc_last : journal->j_last);
+ next_commit_ID, next_log_block, journal->j_last);
/* Skip over each chunk of the transaction looking
* either the next descriptor block or the final commit
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2dfba3bb40ad8536b9fa802364f2d40da31aa88e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091620-stage-racoon-f926@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2dfba3bb40ad ("jbd2: correct the end of the journal recovery scan range")
cb3b3bf22cf3 ("jbd2: rename jbd_debug() to jbd2_debug()")
f7f497cb7024 ("jbd2: kill t_handle_lock transaction spinlock")
cc16eecae687 ("jbd2: fix use-after-free of transaction_t race")
4f9818684870 ("jbd2: refactor wait logic for transaction updates into a common function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2dfba3bb40ad8536b9fa802364f2d40da31aa88e Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Mon, 26 Jun 2023 15:33:22 +0800
Subject: [PATCH] jbd2: correct the end of the journal recovery scan range
We got a filesystem inconsistency issue below while running generic/475
I/O failure pressure test with fast_commit feature enabled.
Symlink /p3/d3/d1c/d6c/dd6/dce/l101 (inode #132605) is invalid.
If fast_commit feature is enabled, a special fast_commit journal area is
appended to the end of the normal journal area. The journal->j_last
point to the first unused block behind the normal journal area instead
of the whole log area, and the journal->j_fc_last point to the first
unused block behind the fast_commit journal area. While doing journal
recovery, do_one_pass(PASS_SCAN) should first scan the normal journal
area and turn around to the first block once it meet journal->j_last,
but the wrap() macro misuse the journal->j_fc_last, so the recovering
could not read the next magic block (commit block perhaps) and would end
early mistakenly and missing tN and every transaction after it in the
following example. Finally, it could lead to filesystem inconsistency.
| normal journal area | fast commit area |
+-------------------------------------------------+------------------+
| tN(rere) | tN+1 |~| tN-x |...| tN-1 | tN(front) | .... |
+-------------------------------------------------+------------------+
/ / /
start journal->j_last journal->j_fc_last
This patch fix it by use the correct ending journal->j_last.
Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path")
Cc: stable(a)kernel.org
Reported-by: Theodore Ts'o <tytso(a)mit.edu>
Link: https://lore.kernel.org/linux-ext4/20230613043120.GB1584772@mit.edu/
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230626073322.3956567-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 0184931d47f7..c269a7d29a46 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -230,12 +230,8 @@ static int count_tags(journal_t *journal, struct buffer_head *bh)
/* Make sure we wrap around the log correctly! */
#define wrap(journal, var) \
do { \
- unsigned long _wrap_last = \
- jbd2_has_feature_fast_commit(journal) ? \
- (journal)->j_fc_last : (journal)->j_last; \
- \
- if (var >= _wrap_last) \
- var -= (_wrap_last - (journal)->j_first); \
+ if (var >= (journal)->j_last) \
+ var -= ((journal)->j_last - (journal)->j_first); \
} while (0)
static int fc_do_one_pass(journal_t *journal,
@@ -524,9 +520,7 @@ static int do_one_pass(journal_t *journal,
break;
jbd2_debug(2, "Scanning for sequence ID %u at %lu/%lu\n",
- next_commit_ID, next_log_block,
- jbd2_has_feature_fast_commit(journal) ?
- journal->j_fc_last : journal->j_last);
+ next_commit_ID, next_log_block, journal->j_last);
/* Skip over each chunk of the transaction looking
* either the next descriptor block or the final commit
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a2cb9cd6a3949a3804ad9fd7da234892ce6719ec
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091639-gallantly-pellet-e738@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a2cb9cd6a394 ("misc: fastrpc: Fix incorrect DMA mapping unmap request")
791da5c7fedb ("misc: fastrpc: Prepare to dynamic dma-buf locking specification")
e90d91190619 ("misc: fastrpc: Add support to secure memory map")
7f1f481263c3 ("misc: fastrpc: check before loading process to the DSP")
3abe3ab3cdab ("misc: fastrpc: add secure domain support")
6c16fd8bdd40 ("misc: fastrpc: Add support to get DSP capabilities")
5c1b97c7d7b7 ("misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP")
965602eabb57 ("misc: fastrpc: separate fastrpc device from channel context")
304b0ba0a21b ("misc: fastrpc: Update number of max fastrpc sessions")
6010d9befc8d ("misc: fastrpc: add ioctl for attaching to sensors pd")
84195d206e1f ("misc: fastrpc: define names for protection domain ids")
7c920da30e04 ("misc: fastrpc: fix indentation error in uapi header")
0978de9fc733 ("misc: fastrpc: Fix an incomplete memory release in fastrpc_rpmsg_probe()")
2d10d2d17072 ("misc: fastrpc: fix memory leak from miscdev->name")
2419e55e532d ("misc: fastrpc: add mmap/unmap support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2cb9cd6a3949a3804ad9fd7da234892ce6719ec Mon Sep 17 00:00:00 2001
From: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Date: Fri, 11 Aug 2023 12:56:42 +0100
Subject: [PATCH] misc: fastrpc: Fix incorrect DMA mapping unmap request
Scatterlist table is obtained during map create request and the same
table is used for DMA mapping unmap. In case there is any failure
while getting the sg_table, ERR_PTR is returned instead of sg_table.
When the map is getting freed, there is only a non-NULL check of
sg_table which will also be true in case failure was returned instead
of sg_table. This would result in improper unmap request. Add proper
check before setting map table to avoid bad unmap request.
Fixes: c68cfb718c8f ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20230811115643.38578-3-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 7d8818a4089f..0b376d9a2744 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -757,6 +757,7 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
{
struct fastrpc_session_ctx *sess = fl->sctx;
struct fastrpc_map *map = NULL;
+ struct sg_table *table;
int err = 0;
if (!fastrpc_map_lookup(fl, fd, ppmap, true))
@@ -784,11 +785,12 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
goto attach_err;
}
- map->table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL);
- if (IS_ERR(map->table)) {
- err = PTR_ERR(map->table);
+ table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL);
+ if (IS_ERR(table)) {
+ err = PTR_ERR(table);
goto map_err;
}
+ map->table = table;
if (attr & FASTRPC_ATTR_SECUREMAP) {
map->phys = sg_phys(map->table->sgl);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a2cb9cd6a3949a3804ad9fd7da234892ce6719ec
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091638-hamstring-rocky-cf79@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a2cb9cd6a394 ("misc: fastrpc: Fix incorrect DMA mapping unmap request")
791da5c7fedb ("misc: fastrpc: Prepare to dynamic dma-buf locking specification")
e90d91190619 ("misc: fastrpc: Add support to secure memory map")
7f1f481263c3 ("misc: fastrpc: check before loading process to the DSP")
3abe3ab3cdab ("misc: fastrpc: add secure domain support")
6c16fd8bdd40 ("misc: fastrpc: Add support to get DSP capabilities")
5c1b97c7d7b7 ("misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP")
965602eabb57 ("misc: fastrpc: separate fastrpc device from channel context")
304b0ba0a21b ("misc: fastrpc: Update number of max fastrpc sessions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2cb9cd6a3949a3804ad9fd7da234892ce6719ec Mon Sep 17 00:00:00 2001
From: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Date: Fri, 11 Aug 2023 12:56:42 +0100
Subject: [PATCH] misc: fastrpc: Fix incorrect DMA mapping unmap request
Scatterlist table is obtained during map create request and the same
table is used for DMA mapping unmap. In case there is any failure
while getting the sg_table, ERR_PTR is returned instead of sg_table.
When the map is getting freed, there is only a non-NULL check of
sg_table which will also be true in case failure was returned instead
of sg_table. This would result in improper unmap request. Add proper
check before setting map table to avoid bad unmap request.
Fixes: c68cfb718c8f ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20230811115643.38578-3-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 7d8818a4089f..0b376d9a2744 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -757,6 +757,7 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
{
struct fastrpc_session_ctx *sess = fl->sctx;
struct fastrpc_map *map = NULL;
+ struct sg_table *table;
int err = 0;
if (!fastrpc_map_lookup(fl, fd, ppmap, true))
@@ -784,11 +785,12 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
goto attach_err;
}
- map->table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL);
- if (IS_ERR(map->table)) {
- err = PTR_ERR(map->table);
+ table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL);
+ if (IS_ERR(table)) {
+ err = PTR_ERR(table);
goto map_err;
}
+ map->table = table;
if (attr & FASTRPC_ATTR_SECUREMAP) {
map->phys = sg_phys(map->table->sgl);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a2cb9cd6a3949a3804ad9fd7da234892ce6719ec
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091637-clinic-wanted-7595@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a2cb9cd6a394 ("misc: fastrpc: Fix incorrect DMA mapping unmap request")
791da5c7fedb ("misc: fastrpc: Prepare to dynamic dma-buf locking specification")
e90d91190619 ("misc: fastrpc: Add support to secure memory map")
7f1f481263c3 ("misc: fastrpc: check before loading process to the DSP")
3abe3ab3cdab ("misc: fastrpc: add secure domain support")
6c16fd8bdd40 ("misc: fastrpc: Add support to get DSP capabilities")
5c1b97c7d7b7 ("misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP")
965602eabb57 ("misc: fastrpc: separate fastrpc device from channel context")
304b0ba0a21b ("misc: fastrpc: Update number of max fastrpc sessions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2cb9cd6a3949a3804ad9fd7da234892ce6719ec Mon Sep 17 00:00:00 2001
From: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Date: Fri, 11 Aug 2023 12:56:42 +0100
Subject: [PATCH] misc: fastrpc: Fix incorrect DMA mapping unmap request
Scatterlist table is obtained during map create request and the same
table is used for DMA mapping unmap. In case there is any failure
while getting the sg_table, ERR_PTR is returned instead of sg_table.
When the map is getting freed, there is only a non-NULL check of
sg_table which will also be true in case failure was returned instead
of sg_table. This would result in improper unmap request. Add proper
check before setting map table to avoid bad unmap request.
Fixes: c68cfb718c8f ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20230811115643.38578-3-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 7d8818a4089f..0b376d9a2744 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -757,6 +757,7 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
{
struct fastrpc_session_ctx *sess = fl->sctx;
struct fastrpc_map *map = NULL;
+ struct sg_table *table;
int err = 0;
if (!fastrpc_map_lookup(fl, fd, ppmap, true))
@@ -784,11 +785,12 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
goto attach_err;
}
- map->table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL);
- if (IS_ERR(map->table)) {
- err = PTR_ERR(map->table);
+ table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL);
+ if (IS_ERR(table)) {
+ err = PTR_ERR(table);
goto map_err;
}
+ map->table = table;
if (attr & FASTRPC_ATTR_SECUREMAP) {
map->phys = sg_phys(map->table->sgl);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a2cb9cd6a3949a3804ad9fd7da234892ce6719ec
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091636-perceive-unabashed-178b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a2cb9cd6a394 ("misc: fastrpc: Fix incorrect DMA mapping unmap request")
791da5c7fedb ("misc: fastrpc: Prepare to dynamic dma-buf locking specification")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2cb9cd6a3949a3804ad9fd7da234892ce6719ec Mon Sep 17 00:00:00 2001
From: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Date: Fri, 11 Aug 2023 12:56:42 +0100
Subject: [PATCH] misc: fastrpc: Fix incorrect DMA mapping unmap request
Scatterlist table is obtained during map create request and the same
table is used for DMA mapping unmap. In case there is any failure
while getting the sg_table, ERR_PTR is returned instead of sg_table.
When the map is getting freed, there is only a non-NULL check of
sg_table which will also be true in case failure was returned instead
of sg_table. This would result in improper unmap request. Add proper
check before setting map table to avoid bad unmap request.
Fixes: c68cfb718c8f ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20230811115643.38578-3-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 7d8818a4089f..0b376d9a2744 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -757,6 +757,7 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
{
struct fastrpc_session_ctx *sess = fl->sctx;
struct fastrpc_map *map = NULL;
+ struct sg_table *table;
int err = 0;
if (!fastrpc_map_lookup(fl, fd, ppmap, true))
@@ -784,11 +785,12 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
goto attach_err;
}
- map->table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL);
- if (IS_ERR(map->table)) {
- err = PTR_ERR(map->table);
+ table = dma_buf_map_attachment_unlocked(map->attach, DMA_BIDIRECTIONAL);
+ if (IS_ERR(table)) {
+ err = PTR_ERR(table);
goto map_err;
}
+ map->table = table;
if (attr & FASTRPC_ATTR_SECUREMAP) {
map->phys = sg_phys(map->table->sgl);
Hi,
There are some warning traces being reported in linux-6.5.y related to
the relatively recent colorspace property.
A workaround is landed in 6.6-rc1 to avoid the traces.
Can you please backport this back to linux-6.5.y?
69a959610229 ("drm/amd/display: Temporary Disable MST DP Colorspace
Property")
Thanks!
Hi,
Can you please apply "watchdog: advantech_ec_wdt: fix Kconfig dependencies" to the 6.5.y branch?
commit 6eb28a38f6478a650c7e76b2d6910669615d8a62 upstream.
This patch fixes a configuration bug in the advantech_ec_wdt (Advantech Embedded Controller Watchdog) driver where it can be compiled into a noop driver.
I come at the Debian kernel maintainer suggestion following my attempt at adding this driver to their kernel [0].
Thanks!
Regards,
[0]: https://salsa.debian.org/kernel-team/linux/-/merge_requests/841#note_427523
--
Yoann Congal
Smile ECS - Tech Expert
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x bb5e7f234eacf34b65be67ebb3613e3b8cf11b87
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091354-atom-tinderbox-b9be@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
bb5e7f234eac ("Multi-gen LRU: avoid race in inc_min_seq()")
391655fe08d1 ("mm: multi-gen LRU: rename lru_gen_struct to lru_gen_folio")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bb5e7f234eacf34b65be67ebb3613e3b8cf11b87 Mon Sep 17 00:00:00 2001
From: Kalesh Singh <kaleshsingh(a)google.com>
Date: Tue, 1 Aug 2023 19:56:03 -0700
Subject: [PATCH] Multi-gen LRU: avoid race in inc_min_seq()
inc_max_seq() will try to inc_min_seq() if nr_gens == MAX_NR_GENS. This
is because the generations are reused (the last oldest now empty
generation will become the next youngest generation).
inc_min_seq() is retried until successful, dropping the lru_lock
and yielding the CPU on each failure, and retaking the lock before
trying again:
while (!inc_min_seq(lruvec, type, can_swap)) {
spin_unlock_irq(&lruvec->lru_lock);
cond_resched();
spin_lock_irq(&lruvec->lru_lock);
}
However, the initial condition that required incrementing the min_seq
(nr_gens == MAX_NR_GENS) is not retested. This can change by another
call to inc_max_seq() from run_aging() with force_scan=true from the
debugfs interface.
Since the eviction stalls when the nr_gens == MIN_NR_GENS, avoid
unnecessarily incrementing the min_seq by rechecking the number of
generations before each attempt.
This issue was uncovered in previous discussion on the list by Yu Zhao
and Aneesh Kumar [1].
[1] https://lore.kernel.org/linux-mm/CAOUHufbO7CaVm=xjEb1avDhHVvnC8pJmGyKcFf2iY…
Link: https://lkml.kernel.org/r/20230802025606.346758-2-kaleshsingh@google.com
Fixes: d6c3af7d8a2b ("mm: multi-gen LRU: debugfs interface")
Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com> [mediatek]
Tested-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Aneesh Kumar K V <aneesh.kumar(a)linux.ibm.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Jan Alexander Steffens (heftig) <heftig(a)archlinux.org>
Cc: Lecopzer Chen <lecopzer.chen(a)mediatek.com>
Cc: Matthias Brugger <matthias.bgg(a)gmail.com>
Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Steven Barrett <steven(a)liquorix.net>
Cc: Suleiman Souhlal <suleiman(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 489a4fc7d9b1..6eecd291756c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4439,7 +4439,7 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
int prev, next;
int type, zone;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
-
+restart:
spin_lock_irq(&lruvec->lru_lock);
VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4450,11 +4450,12 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
VM_WARN_ON_ONCE(!force_scan && (type == LRU_GEN_FILE || can_swap));
- while (!inc_min_seq(lruvec, type, can_swap)) {
- spin_unlock_irq(&lruvec->lru_lock);
- cond_resched();
- spin_lock_irq(&lruvec->lru_lock);
- }
+ if (inc_min_seq(lruvec, type, can_swap))
+ continue;
+
+ spin_unlock_irq(&lruvec->lru_lock);
+ cond_resched();
+ goto restart;
}
/*
Hi Greg,
could you please cherry-pick this upstream commit:
08700ec70504 ("linux/export: fix reference to exported functions for parisc64")
to the v6.5 stable kernel series?
Thanks!
Helge
---
From 08700ec705043eb0cee01b35cf5b9d63f0230d12 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <masahiroy(a)kernel.org>
Date: Wed, 6 Sep 2023 03:46:57 +0900
Subject: [PATCH] linux/export: fix reference to exported functions for
parisc64
John David Anglin reported parisc has been broken since commit
ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost").
Like ia64, parisc64 uses a function descriptor. The function
references must be prefixed with P%.
Also, symbols prefixed $$ from the library have the symbol type
STT_LOPROC instead of STT_FUNC. They should be handled as functions
too.
Fixes: ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")
Reported-by: John David Anglin <dave.anglin(a)bell.net>
Tested-by: John David Anglin <dave.anglin(a)bell.net>
Tested-by: Helge Deller <deller(a)gmx.de>
Closes: https://lore.kernel.org/linux-parisc/1901598a-e11d-f7dd-a5d9-9a69d06e6b6e@b…
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/include/linux/export-internal.h b/include/linux/export-internal.h
index 1c849db953a5..45fca09b2319 100644
--- a/include/linux/export-internal.h
+++ b/include/linux/export-internal.h
@@ -52,6 +52,8 @@
#ifdef CONFIG_IA64
#define KSYM_FUNC(name) @fptr(name)
+#elif defined(CONFIG_PARISC) && defined(CONFIG_64BIT)
+#define KSYM_FUNC(name) P%name
#else
#define KSYM_FUNC(name) name
#endif
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index b29b29707f10..ba981f22908a 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1226,6 +1226,15 @@ static void check_export_symbol(struct module *mod, struct elf_info *elf,
*/
s->is_func = (ELF_ST_TYPE(sym->st_info) == STT_FUNC);
+ /*
+ * For parisc64, symbols prefixed $$ from the library have the symbol type
+ * STT_LOPROC. They should be handled as functions too.
+ */
+ if (elf->hdr->e_ident[EI_CLASS] == ELFCLASS64 &&
+ elf->hdr->e_machine == EM_PARISC &&
+ ELF_ST_TYPE(sym->st_info) == STT_LOPROC)
+ s->is_func = true;
+
if (match(secname, PATTERNS(INIT_SECTIONS)))
warn("%s: %s: EXPORT_SYMBOL used for init symbol. Remove __init or EXPORT_SYMBOL.\n",
mod->name, name);
From: Keith Busch <kbusch(a)kernel.org>
Some devices are reporting controller ready mode support, but return 0
for CRTO. These devices require a much higher time to ready than that,
so they are failing to initialize after the driver starter preferring
that value over CAP.TO.
The spec requires that CAP.TO match the appropritate CRTO value, or be
set to 0xff if CRTO is larger than that. This means that CAP.TO can be
used to validate if CRTO is reliable, and provides an appropriate
fallback for setting the timeout value if not. Use whichever is larger.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863
Reported-by: Cláudio Sampaio <patola(a)gmail.com>
Reported-by: Felix Yan <felixonmars(a)archlinux.org>
Based-on-a-patch-by: Felix Yan <felixonmars(a)archlinux.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
---
drivers/nvme/host/core.c | 48 ++++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37b6fa7466620..4adc0b2f12f1e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2245,25 +2245,8 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
else
ctrl->ctrl_config = NVME_CC_CSS_NVM;
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
- u32 crto;
-
- ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
- if (ret) {
- dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
- ret);
- return ret;
- }
-
- if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
- ctrl->ctrl_config |= NVME_CC_CRIME;
- timeout = NVME_CRTO_CRIMT(crto);
- } else {
- timeout = NVME_CRTO_CRWMT(crto);
- }
- } else {
- timeout = NVME_CAP_TIMEOUT(ctrl->cap);
- }
+ if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS)
+ ctrl->ctrl_config |= NVME_CC_CRIME;
ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT;
ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;
@@ -2277,6 +2260,33 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
if (ret)
return ret;
+ /* CAP value may change after initial CC write */
+ ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
+ if (ret)
+ return ret;
+
+ timeout = NVME_CAP_TIMEOUT(ctrl->cap);
+ if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
+ u32 crto;
+
+ ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
+ if (ret) {
+ dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
+ ret);
+ return ret;
+ }
+
+ /*
+ * CRTO should always be greater or equal to CAP.TO, but some
+ * devices are known to get this wrong. Use the larger of the
+ * two values.
+ */
+ if (ctrl->ctrl_config & NVME_CC_CRIME)
+ timeout = max(timeout, NVME_CRTO_CRIMT(crto));
+ else
+ timeout = max(timeout, NVME_CRTO_CRWMT(crto));
+ }
+
ctrl->ctrl_config |= NVME_CC_ENABLE;
ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
if (ret)
--
2.34.1
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
Changes in v3:
- combine v1 and v2 into a series; I didn't recognize that this macro
appeared twice in the kernel sources.
- Use __PASTE twice.
- Link to v2: https://lore.kernel.org/r/20230915-bpf_collision-v2-1-027670d38bdf@google.c…
---
Jiri Olsa (1):
bpf: Fix BTF_ID symbol generation collision
Nick Desaulniers (1):
bpf: Fix BTF_ID symbol generation collision in tools/
include/linux/btf_ids.h | 2 +-
tools/include/linux/btf_ids.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
---
base-commit: 9fdfb15a3dbf818e06be514f4abbfc071004cbe7
change-id: 20230915-bpf_collision-36889a391d44
Best regards,
--
Nick Desaulniers <ndesaulniers(a)google.com>
On eDP we can receive invalid modes from dm_update_crtc_state() for
entirely new streams for which drm_mode_set_crtcinfo() shouldn't be
called on. So, instead of calling drm_mode_set_crtcinfo() from within
create_stream_for_sink() we can instead call it from
amdgpu_dm_connector_mode_valid(). Since, we are guaranteed to only call
drm_mode_set_crtcinfo() for valid modes from that function (invalid
modes are rejected by that callback) and that is the only user
of create_validate_stream_for_sink() that we need to call
drm_mode_set_crtcinfo() for (as before commit cb841d27b876
("drm/amd/display: Always pass connector_state to stream validation"),
that is the only place where create_validate_stream_for_sink()'s
dm_state was NULL).
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2693
Fixes: cb841d27b876 ("drm/amd/display: Always pass connector_state to stream validation")
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 933c9b5d5252..beef4fef7338 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6128,8 +6128,6 @@ create_stream_for_sink(struct amdgpu_dm_connector *aconnector,
if (recalculate_timing)
drm_mode_set_crtcinfo(&saved_mode, 0);
- else if (!old_stream)
- drm_mode_set_crtcinfo(&mode, 0);
/*
* If scaling is enabled and refresh rate didn't change
@@ -6691,6 +6689,8 @@ enum drm_mode_status amdgpu_dm_connector_mode_valid(struct drm_connector *connec
goto fail;
}
+ drm_mode_set_crtcinfo(mode, 0);
+
stream = create_validate_stream_for_sink(aconnector, mode,
to_dm_connector_state(connector->state),
NULL);
--
2.42.0
Marcus and Satya reported an issue where BTF_ID macro generates same
symbol in separate objects and that breaks final vmlinux link.
ld.lld: error: ld-temp.o <inline asm>:14577:1: symbol
'__BTF_ID__struct__cgroup__624' is already defined
This can be triggered under specific configs when __COUNTER__ happens to
be the same for the same symbol in two different translation units,
which is already quite unlikely to happen.
Add __LINE__ number suffix to make BTF_ID symbol more unique, which is
not a complete fix, but it would help for now and meanwhile we can work
on better solution as suggested by Andrii.
Cc: stable(a)vger.kernel.org
Reported-by: Satya Durga Srinivasu Prabhala <quic_satyap(a)quicinc.com>
Reported-by: Marcus Seyfarth <m.seyfarth(a)gmail.com>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1913
Tested-by: Marcus Seyfarth <m.seyfarth(a)gmail.com>
Debugged-by: Nathan Chancellor <nathan(a)kernel.org>
Co-developed-by: Jiri Olsa <jolsa(a)kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4Bzb5KQ2_LmhN769ifMeSJaWfebccUasQOfQKaOd0nQ…
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
tools/include/linux/btf_ids.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/linux/btf_ids.h b/tools/include/linux/btf_ids.h
index 71e54b1e3796..30e920b96a18 100644
--- a/tools/include/linux/btf_ids.h
+++ b/tools/include/linux/btf_ids.h
@@ -38,7 +38,7 @@ asm( \
____BTF_ID(symbol)
#define __ID(prefix) \
- __PASTE(prefix, __COUNTER__)
+ __PASTE(prefix, __COUNTER__ __LINE__)
/*
* The BTF_ID defines unique symbol for each ID pointing
---
base-commit: 9fdfb15a3dbf818e06be514f4abbfc071004cbe7
change-id: 20230915-bpf_collision-36889a391d44
Best regards,
--
Nick Desaulniers <ndesaulniers(a)google.com>
The synth traces incorrectly print pointer to the synthetic event values
instead of the actual value when using u64 type. Fix by addressing the
contents of the union properly.
Fixes: ddeea494a16f ("tracing/synthetic: Use union instead of casts")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tero Kristo <tero.kristo(a)linux.intel.com>
---
kernel/trace/trace_events_synth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 7fff8235075f..070365959c0a 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -337,7 +337,7 @@ static void print_synth_event_num_val(struct trace_seq *s,
break;
default:
- trace_seq_printf(s, print_fmt, name, val, space);
+ trace_seq_printf(s, print_fmt, name, val->as_u64, space);
break;
}
}
--
2.40.1
From: Lukas Wunner <lukas(a)wunner.de>
The Event Handler Busy bit shall be cleared by software when the Event
Ring is empty. The xHC is thereby informed that it may raise another
interrupt once it has enqueued new events (sec 4.17.2).
However since commit dc0ffbea5729 ("usb: host: xhci: update event ring
dequeue pointer on purpose"), the EHB bit is already cleared after half
a segment has been processed.
As a result, spurious interrupts may occur:
- xhci_irq() processes half a segment, clears EHB, continues processing
remaining events.
- xHC enqueues new events. Because EHB has been cleared, xHC sets
Interrupt Pending bit. Interrupt moderation countdown begins.
- Meanwhile xhci_irq() continues processing events. Interrupt
moderation countdown reaches zero, so an MSI interrupt is signaled.
- xhci_irq() empties the Event Ring, clears EHB again and is done.
- Because an MSI interrupt has been signaled, xhci_irq() is run again.
It discovers there's nothing to do and returns IRQ_NONE.
Avoid by clearing the EHB bit only at the end of xhci_irq().
Fixes: dc0ffbea5729 ("usb: host: xhci: update event ring dequeue pointer on purpose")
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Cc: stable(a)vger.kernel.org # v5.5+
Cc: Peter Chen <peter.chen(a)kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 98389b568633..3e5dc0723a8f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2996,7 +2996,8 @@ static int xhci_handle_event(struct xhci_hcd *xhci, struct xhci_interrupter *ir)
*/
static void xhci_update_erst_dequeue(struct xhci_hcd *xhci,
struct xhci_interrupter *ir,
- union xhci_trb *event_ring_deq)
+ union xhci_trb *event_ring_deq,
+ bool clear_ehb)
{
u64 temp_64;
dma_addr_t deq;
@@ -3017,12 +3018,13 @@ static void xhci_update_erst_dequeue(struct xhci_hcd *xhci,
return;
/* Update HC event ring dequeue pointer */
- temp_64 &= ERST_PTR_MASK;
+ temp_64 &= ERST_DESI_MASK;
temp_64 |= ((u64) deq & (u64) ~ERST_PTR_MASK);
}
/* Clear the event handler busy flag (RW1C) */
- temp_64 |= ERST_EHB;
+ if (clear_ehb)
+ temp_64 |= ERST_EHB;
xhci_write_64(xhci, temp_64, &ir->ir_set->erst_dequeue);
}
@@ -3103,7 +3105,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
while (xhci_handle_event(xhci, ir) > 0) {
if (event_loop++ < TRBS_PER_SEGMENT / 2)
continue;
- xhci_update_erst_dequeue(xhci, ir, event_ring_deq);
+ xhci_update_erst_dequeue(xhci, ir, event_ring_deq, false);
event_ring_deq = ir->event_ring->dequeue;
/* ring is half-full, force isoc trbs to interrupt more often */
@@ -3113,7 +3115,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
event_loop = 0;
}
- xhci_update_erst_dequeue(xhci, ir, event_ring_deq);
+ xhci_update_erst_dequeue(xhci, ir, event_ring_deq, true);
ret = IRQ_HANDLED;
out:
--
2.25.1
xhci-hub.c tracks suspended ports in a suspended_port bitfield.
This is checked when responding to a Get_Status(PORT) request to see if a
port in running U0 state was recently resumed, and adds the required
USB_PORT_STAT_C_SUSPEND change bit in those cases.
The suspended_port bit was left uncleared if a device is disconnected
during suspend. The bit remained set even when a new device was connected
and enumerated. The set bit resulted in a incorrect Get_Status(PORT)
response with a bogus USB_PORT_STAT_C_SUSPEND change
bit set once the new device reached U0 link state.
USB_PORT_STAT_C_SUSPEND change bit is only used for USB2 ports, but
xhci-hub keeps track of both USB2 and USB3 suspended ports.
Cc: stable(a)vger.kernel.org
Reported-by: Wesley Cheng <quic_wcheng(a)quicinc.com>
Closes: https://lore.kernel.org/linux-usb/d68aa806-b26a-0e43-42fb-b8067325e967@quic…
Fixes: 1d5810b6923c ("xhci: Rework port suspend structures for limited ports.")
Tested-by: Wesley Cheng <quic_wcheng(a)quicinc.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-hub.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 0054d02239e2..0df5d807a77e 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1062,19 +1062,19 @@ static void xhci_get_usb3_port_status(struct xhci_port *port, u32 *status,
*status |= USB_PORT_STAT_C_CONFIG_ERROR << 16;
/* USB3 specific wPortStatus bits */
- if (portsc & PORT_POWER) {
+ if (portsc & PORT_POWER)
*status |= USB_SS_PORT_STAT_POWER;
- /* link state handling */
- if (link_state == XDEV_U0)
- bus_state->suspended_ports &= ~(1 << portnum);
- }
- /* remote wake resume signaling complete */
- if (bus_state->port_remote_wakeup & (1 << portnum) &&
+ /* no longer suspended or resuming */
+ if (link_state != XDEV_U3 &&
link_state != XDEV_RESUME &&
link_state != XDEV_RECOVERY) {
- bus_state->port_remote_wakeup &= ~(1 << portnum);
- usb_hcd_end_port_resume(&hcd->self, portnum);
+ /* remote wake resume signaling complete */
+ if (bus_state->port_remote_wakeup & (1 << portnum)) {
+ bus_state->port_remote_wakeup &= ~(1 << portnum);
+ usb_hcd_end_port_resume(&hcd->self, portnum);
+ }
+ bus_state->suspended_ports &= ~(1 << portnum);
}
xhci_hub_report_usb3_link_state(xhci, status, portsc);
@@ -1131,6 +1131,7 @@ static void xhci_get_usb2_port_status(struct xhci_port *port, u32 *status,
usb_hcd_end_port_resume(&port->rhub->hcd->self, portnum);
}
port->rexit_active = 0;
+ bus_state->suspended_ports &= ~(1 << portnum);
}
}
--
2.25.1
From: Wesley Cheng <quic_wcheng(a)quicinc.com>
As mentioned in:
commit 474ed23a6257 ("xhci: align the last trb before link if it is
easily splittable.")
A bounce buffer is utilized for ensuring that transfers that span across
ring segments are aligned to the EP's max packet size. However, the device
that is used to map the DMA buffer to is currently using the XHCI HCD,
which does not carry any DMA operations in certain configrations.
Migration to using the sysdev entry was introduced for DWC3 based
implementations where the IOMMU operations are present.
Replace the reference to the controller device to sysdev instead. This
allows the bounce buffer to be properly mapped to any implementations that
have an IOMMU involved.
cc: <stable(a)vger.kernel.org>
Fixes: 4c39d4b949d3 ("usb: xhci: use bus->sysdev for DMA configuration")
Signed-off-by: Wesley Cheng <quic_wcheng(a)quicinc.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1dde53f6eb31..98389b568633 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -798,7 +798,7 @@ static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci,
static void xhci_unmap_td_bounce_buffer(struct xhci_hcd *xhci,
struct xhci_ring *ring, struct xhci_td *td)
{
- struct device *dev = xhci_to_hcd(xhci)->self.controller;
+ struct device *dev = xhci_to_hcd(xhci)->self.sysdev;
struct xhci_segment *seg = td->bounce_seg;
struct urb *urb = td->urb;
size_t len;
@@ -3469,7 +3469,7 @@ static u32 xhci_td_remainder(struct xhci_hcd *xhci, int transferred,
static int xhci_align_td(struct xhci_hcd *xhci, struct urb *urb, u32 enqd_len,
u32 *trb_buff_len, struct xhci_segment *seg)
{
- struct device *dev = xhci_to_hcd(xhci)->self.controller;
+ struct device *dev = xhci_to_hcd(xhci)->self.sysdev;
unsigned int unalign;
unsigned int max_pkt;
u32 new_buff_len;
--
2.25.1
From: Anthony Krowiak <akrowiak(a)linux.ibm.com>
In the vfio_ap_irq_enable function, after the page containing the
notification indicator byte (NIB) is pinned, the function attempts
to register the guest ISC. If registration fails, the function sets the
status response code and returns without unpinning the page containing
the NIB. In order to avoid a memory leak, the NIB should be unpinned before
returning from the vfio_ap_irq_enable function.
Fixes: 783f0a3ccd79 ("s390/vfio-ap: add s390dbf logging to the vfio_ap_irq_enable function")
Signed-off-by: Janosch Frank <frankja(a)linux.ibm.com>
Signed-off-by: Anthony Krowiak <akrowiak(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 4db538a55192..9cb28978c186 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -457,6 +457,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
VFIO_AP_DBF_WARN("%s: gisc registration failed: nisc=%d, isc=%d, apqn=%#04x\n",
__func__, nisc, isc, q->apqn);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, nib, 1);
status.response_code = AP_RESPONSE_INVALID_GISA;
return status;
}
--
2.41.0
On 09/14/ , Felix Kuehling wrote:
> On 2023-09-14 10:02, Christian König wrote:
Do we still need to use legacy flush to emulate heavyweight flush
if we don't use SVM? And can I push this now?
Regards,
Lang
> >
> > Am 14.09.23 um 15:59 schrieb Felix Kuehling:
> > >
> > > On 2023-09-14 9:39, Christian König wrote:
> > > > Is a single legacy flush sufficient to emulate an heavyweight
> > > > flush as well?
> > > >
> > > > On previous generations we needed to issue at least two legacy
> > > > flushes for this.
> > > I assume you are referring to the Vega20 XGMI workaround. That is a
> > > very different issue. Because PTEs would be cached in L2, we had to
> > > always use a heavy-weight flush that would also flush the L2 cache
> > > as well, and follow that with another legacy flush to deal with race
> > > conditions where stale PTEs could be re-fetched from L2 before the
> > > L2 flush was complete.
> >
> > No, we also have another (badly documented) workaround which issues a
> > legacy flush before each heavy weight on some hw generations. See the my
> > TLB flush cleanup patches.
> >
> > >
> > > A heavy-weight flush guarantees that there are no more possible
> > > memory accesses using the old PTEs. With physically addressed caches
> > > on GFXv9 that includes a cache flush because the address translation
> > > happened before putting data into the cache. I think the address
> > > translation and cache architecture works differently on GFXv10. So
> > > maybe the cache-flush is not required here.
> > >
> > > But even then a legacy flush probably allows for in-flight memory
> > > accesses with old physical addresses to complete after the TLB
> > > flush. So there is a small risk of memory corruption that was
> > > assumed to not be accessed by the GPU any more. Or when using IOMMU
> > > device isolation it would result in IOMMU faults if the DMA mappings
> > > are invalidated slightly too early.
> >
> > Mhm, that's quite bad. Any idea how to avoid that?
>
> A few ideas
>
> * Add an arbitrary delay and hope that it is longer than the FIFOs in
> the HW
> * Execute an atomic operation to memory on some GPU engine that could
> act as a fence, maybe just a RELEASE_MEM on the CP to some writeback
> location would do the job
> * If needed, RELEASE_MEM could also perform a cache flush
>
> Regards,
> Felix
>
>
> >
> > Regards,
> > Christian.
> >
> > >
> > > Regards,
> > > Felix
> > >
> > >
> > > >
> > > > And please don't push before getting an rb from Felix as well.
> > > >
> > > > Regards,
> > > > Christian.
> > > >
> > > >
> > > > Am 14.09.23 um 11:23 schrieb Lang Yu:
> > > > > cyan_skilfish has problems with other flush types.
> > > > >
> > > > > v2: fix incorrect ternary conditional operator usage.(Yifan)
> > > > >
> > > > > Signed-off-by: Lang Yu <Lang.Yu(a)amd.com>
> > > > > Cc: <stable(a)vger.kernel.org> # v5.15+
> > > > > ---
> > > > > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
> > > > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > > > > index d3da13f4c80e..c6d11047169a 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > > > > @@ -236,7 +236,8 @@ static void
> > > > > gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t
> > > > > vmid,
> > > > > {
> > > > > bool use_semaphore =
> > > > > gmc_v10_0_use_invalidate_semaphore(adev, vmhub);
> > > > > struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
> > > > > - u32 inv_req =
> > > > > hub->vmhub_funcs->get_invalidate_req(vmid, flush_type);
> > > > > + u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
> > > > > + (adev->asic_type != CHIP_CYAN_SKILLFISH) ?
> > > > > flush_type : 0);
> > > > > u32 tmp;
> > > > > /* Use register 17 for GART */
> > > > > const unsigned int eng = 17;
> > > > > @@ -331,6 +332,8 @@ static void
> > > > > gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t
> > > > > vmid,
> > > > > int r;
> > > > > + flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH)
> > > > > ? flush_type : 0;
> > > > > +
> > > > > /* flush hdp cache */
> > > > > adev->hdp.funcs->flush_hdp(adev, NULL);
> > > > > @@ -426,6 +429,8 @@ static int
> > > > > gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> > > > > struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
> > > > > struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
> > > > > + flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH)
> > > > > ? flush_type : 0;
> > > > > +
> > > > > if (amdgpu_emu_mode == 0 && ring->sched.ready) {
> > > > > spin_lock(&adev->gfx.kiq[0].ring_lock);
> > > > > /* 2 dwords flush + 8 dwords fence */
> > > >
> >
Hi friend I am a banker in ADB BANK. I want to transfer an abandoned
$18.5Million to your Bank account. 50/percent will be your share.
No risk involved but keep it as secret. Contact me for more details.
I am requesting for your urgent assistance to collaborate with you to
move the balance sum of $18.5 Million US Dollars and 950kg of gold
bars, this is a very genuine business that need full concentration,
corporation and transparency between the both of us and its %100 free
from any risk. I agreed to share the funds and gold bars with you in
ratio of %50 %50.
Yours
Mr Pascal Yembiline
The quilt patch titled
Subject: kasan: fix access invalid shadow address when input is illegal
has been removed from the -mm tree. Its filename was
kasan-fix-access-invalid-shadow-address-when-input-is-illegal.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Haibo Li <haibo.li(a)mediatek.com>
Subject: kasan: fix access invalid shadow address when input is illegal
Date: Thu, 14 Sep 2023 16:08:33 +0800
when the input address is illegal,the corresponding shadow address from
kasan_mem_to_shadow may have no mapping in mmu table. Access such shadow
address causes kernel oops. Here is a sample about oops on arm64(VA
39bit) with KASAN_SW_TAGS on:
[ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003,
pud=000000005d3ce003, pmd=0000000000000000
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty #43
Hardware name: linux,dummy-virt (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __hwasan_load8_noabort+0x5c/0x90
lr : do_ib_ob+0xf4/0x110
ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa.
The problem is reading invalid shadow in kasan_check_range.
The generic kasan also has similar oops.
To fix it,check shadow address by reading it with no fault.
After this patch,KASAN is able to report invalid memory access
for this case.
Link: https://lkml.kernel.org/r/20230914080833.50026-1-haibo.li@mediatek.com
Signed-off-by: Haibo Li <haibo.li(a)mediatek.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Matthias Brugger <matthias.bgg(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan.h | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
--- a/mm/kasan/kasan.h~kasan-fix-access-invalid-shadow-address-when-input-is-illegal
+++ a/mm/kasan/kasan.h
@@ -304,8 +304,17 @@ static __always_inline bool addr_has_met
#ifdef __HAVE_ARCH_SHADOW_MAP
return (kasan_mem_to_shadow((void *)addr) != NULL);
#else
- return (kasan_reset_tag(addr) >=
- kasan_shadow_to_mem((void *)KASAN_SHADOW_START));
+ u8 *shadow, shadow_val;
+
+ if (kasan_reset_tag(addr) <
+ kasan_shadow_to_mem((void *)KASAN_SHADOW_START))
+ return false;
+ /* use read with nofault to check whether the shadow is accessible */
+ shadow = kasan_mem_to_shadow((void *)addr);
+ __get_kernel_nofault(&shadow_val, shadow, u8, fault);
+ return true;
+fault:
+ return false;
#endif
}
_
Patches currently in -mm which might be from haibo.li(a)mediatek.com are