Hello,
I was profiling the 5.10 kernel and comparing it to 4.14. On a system with 64 virtual CPUs and 256 GiB of RAM, I am observing a significant drop in IO performance. Using the following FIO with the script "sudo ftest_write.sh <dev_name>" in attachment, I saw FIO iops result drop from 22K to less than 1K.
The script simply does: mount a the EXT4 16GiB volume with max IOPS 64000K, mounting option is " -o noatime,nodiratime,data=ordered", then run fio with 2048 fio wring thread with 28800000 file size with { --name=16kb_rand_write_only_2048_jobs --directory=/rdsdbdata1 --rw=randwrite --ioengine=sync --buffered=1 --bs=16k --max-jobs=2048 --numjobs=2048 --runtime=60 --time_based --thread --filesize=28800000 --fsync=1 --group_reporting }.
My analyzing is that the degradation is introduce by commit {244adf6426ee31a83f397b700d964cff12a247d3} and the issue is the contention on rsv_conversion_wq. The simplest option is to increase the journal size, but that introduces more operational complexity. Another option is to add the following change in attachment "allow more ext4-rsv-conversion workqueue.patch"
From 27e1b0e14275a281b3529f6a60c7b23a81356751 Mon Sep 17 00:00:00 2001
From: davinalu <davinalu(a)amazon.com>
Date: Fri, 23 Sep 2022 00:43:53 +0000
Subject: [PATCH] allow more ext4-rsv-conversion workqueue to speedup fio writing
---
fs/ext4/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a0af833f7da7..6b34298cdc3b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4963,7 +4963,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
* concurrency isn't really necessary. Limit it to 1.
*/
EXT4_SB(sb)->rsv_conversion_wq =
- alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+ alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM |
+ WQ_UNBOUND | __WQ_ORDERED, 0);
if (!EXT4_SB(sb)->rsv_conversion_wq) {
printk(KERN_ERR "EXT4-fs: failed to create workqueue\n");
ret = -ENOMEM;
My thought is: If the max_active is 1, it means the "__WQ_ORDERED" combined with WQ_UNBOUND setting, based on alloc_workqueue(). So I added it .
I am not sure should we need "__WQ_ORDERED" or not? without "__WQ_ORDERED" it looks also work at my testbed, but I added since not much fio TP difference on my testbed result with/out "__WQ_ORDERED".
From My understanding and observation: with dioread_unlock and delay_alloc both enabled, the bio_endio() and ext4_writepages() will trigger this work queue to ext4_do_flush_completed_IO(). Looks like the work queue is an one-by-one updating: at EXT4 extend.c io_end->list_vec list only have one io_end_vec each time. So if the BIO has high performance, and we have only one thread to do EXT4 flush will be an bottleneck here. The "ext4-rsv-conversion" this workqueue is mainly for update the EXT4_IO_END_UNWRITTEN extend block(only exist on dioread_unlock and delay_alloc options are set) and extend status if I understand correctly here. Am I correct?
This works on my test system and passes xfstests, but will this cause any corruption on ext4 extends blocks updates, not even sure about the journal transaction updates either?
Can you tell me what I will break if this change is made?
Thanks
Davina
The quilt patch titled
Subject: mm/damon/dbgfs: check if rm_contexts input is for a real context
has been removed from the -mm tree. Its filename was
mm-damon-dbgfs-check-if-rm_contexts-input-is-for-a-real-context.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/dbgfs: check if rm_contexts input is for a real context
Date: Mon, 7 Nov 2022 16:50:00 +0000
A user could write a name of a file under 'damon/' debugfs directory,
which is not a user-created context, to 'rm_contexts' file. In the case,
'dbgfs_rm_context()' just assumes it's the valid DAMON context directory
only if a file of the name exist. As a result, invalid memory access
could happen as below. Fix the bug by checking if the given input is for
a directory. This check can filter out non-context inputs because
directories under 'damon/' debugfs directory can be created via only
'mk_contexts' file.
This bug has found by syzbot[1].
[1] https://lore.kernel.org/damon/000000000000ede3ac05ec4abf8e@google.com/
Link: https://lkml.kernel.org/r/20221107165001.5717-2-sj@kernel.org
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: syzbot+6087eafb76a94c4ac9eb(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org> [5.15.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/dbgfs.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/mm/damon/dbgfs.c~mm-damon-dbgfs-check-if-rm_contexts-input-is-for-a-real-context
+++ a/mm/damon/dbgfs.c
@@ -890,6 +890,7 @@ out:
static int dbgfs_rm_context(char *name)
{
struct dentry *root, *dir, **new_dirs;
+ struct inode *inode;
struct damon_ctx **new_ctxs;
int i, j;
int ret = 0;
@@ -905,6 +906,12 @@ static int dbgfs_rm_context(char *name)
if (!dir)
return -ENOENT;
+ inode = d_inode(dir);
+ if (!S_ISDIR(inode->i_mode)) {
+ ret = -EINVAL;
+ goto out_dput;
+ }
+
new_dirs = kmalloc_array(dbgfs_nr_ctxs - 1, sizeof(*dbgfs_dirs),
GFP_KERNEL);
if (!new_dirs) {
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-split-out-damos-charged-region-skip-logic-into-a-new-function.patch
mm-damon-core-split-damos-application-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-stat-update-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-quota-adjustment-logic-into-a-new-function.patch
mm-damon-sysfs-use-damon_addr_range-for-regions-start-and-end-values.patch
mm-damon-sysfs-remove-parameters-of-damon_sysfs_region_alloc.patch
mm-damon-sysfs-move-sysfs_lock-to-common-module.patch
mm-damon-sysfs-move-unsigned-long-range-directory-to-common-module.patch
mm-damon-sysfs-split-out-kdamond-independent-schemes-stats-update-logic-into-a-new-function.patch
mm-damon-sysfs-split-out-schemes-directory-implementation-to-separate-file.patch
mm-damon-modules-deduplicate-init-steps-for-damon-context-setup.patch
mm-damon-reclaimlru_sort-remove-unnecessarily-included-headers.patch
mm-damon-reclaim-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_reclaims-enabled-parameter.patch
mm-damon-lru_sort-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_lru_sorts-enabled-parameter.patch
docs-admin-guide-mm-damon-usage-describe-the-rules-of-sysfs-region-directories.patch
docs-admin-guide-mm-damon-usage-fix-wrong-usage-example-of-init_regions-file.patch
mm-damon-core-add-a-callback-for-scheme-target-regions-check.patch
mm-damon-sysfs-schemes-implement-schemes-tried_regions-directory.patch
mm-damon-sysfs-schemes-implement-scheme-region-directory.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command.patch
tools-selftets-damon-sysfs-test-tried_regions-directory-existence.patch
docs-admin-guide-mm-damon-usage-document-schemes-s-tried_regions-sysfs-directory.patch
docs-abi-damon-document-schemes-s-tried_regions-sysfs-directory.patch
selftests-damon-test-non-context-inputs-to-rm_contexts-file.patch
The quilt patch titled
Subject: nilfs2: fix use-after-free bug of ns_writer on remount
has been removed from the -mm tree. Its filename was
nilfs2-fix-use-after-free-bug-of-ns_writer-on-remount.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix use-after-free bug of ns_writer on remount
Date: Fri, 4 Nov 2022 23:29:59 +0900
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1 Task2
-------------------------------- ------------------------------
nilfs_construct_segment
nilfs_segctor_sync
init_wait
init_waitqueue_entry
add_wait_queue
schedule
nilfs_remount (R/W remount case)
nilfs_attach_log_writer
nilfs_detach_log_writer
nilfs_segctor_destroy
kfree
finish_wait
_raw_spin_lock_irqsave
__raw_spin_lock_irqsave
do_raw_spin_lock
debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed. This
scenario diagram is based on the Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 15 ++++++++-------
fs/nilfs2/super.c | 2 --
2 files changed, 8 insertions(+), 9 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-use-after-free-bug-of-ns_writer-on-remount
+++ a/fs/nilfs2/segment.c
@@ -317,7 +317,7 @@ void nilfs_relax_pressure_in_lock(struct
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb_rdonly(sb) || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2242,7 +2242,7 @@ int nilfs_construct_segment(struct super
struct nilfs_sc_info *sci = nilfs->ns_writer;
struct nilfs_transaction_info *ti;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2280,7 +2280,7 @@ int nilfs_construct_dsync_segment(struct
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2776,11 +2776,12 @@ int nilfs_attach_log_writer(struct super
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
--- a/fs/nilfs2/super.c~nilfs2-fix-use-after-free-bug-of-ns_writer-on-remount
+++ a/fs/nilfs2/super.c
@@ -1133,8 +1133,6 @@ static int nilfs_remount(struct super_bl
if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb))
goto out;
if (*flags & SB_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= SB_RDONLY;
/*
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-shift-out-of-bounds-overflow-in-nilfs_sb2_bad_offset.patch
nilfs2-fix-shift-out-of-bounds-due-to-too-large-exponent-of-block-size.patch
The quilt patch titled
Subject: mm: hugetlb_vmemmap: include missing linux/moduleparam.h
has been removed from the -mm tree. Its filename was
mm-hugetlb_vmemmap-include-missing-linux-moduleparamh.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Vasily Gorbik <gor(a)linux.ibm.com>
Subject: mm: hugetlb_vmemmap: include missing linux/moduleparam.h
Date: Wed, 2 Nov 2022 19:09:17 +0100
The kernel test robot reported build failures with a 'randconfig' on s390:
>> mm/hugetlb_vmemmap.c:421:11: error: a function declaration without a
prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0);
^
Link: https://lore.kernel.org/linux-mm/202210300751.rG3UDsuc-lkp@intel.com/
Link: https://lkml.kernel.org/r/patch.git-296b83ca939b.your-ad-here.call-01667411…
Fixes: 30152245c63b ("mm: hugetlb_vmemmap: replace early_param() with core_param()")
Signed-off-by: Vasily Gorbik <gor(a)linux.ibm.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb_vmemmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/hugetlb_vmemmap.c~mm-hugetlb_vmemmap-include-missing-linux-moduleparamh
+++ a/mm/hugetlb_vmemmap.c
@@ -11,6 +11,7 @@
#define pr_fmt(fmt) "HugeTLB: " fmt
#include <linux/pgtable.h>
+#include <linux/moduleparam.h>
#include <linux/bootmem_info.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
_
Patches currently in -mm which might be from gor(a)linux.ibm.com are
The quilt patch titled
Subject: mm/shmem: use page_mapping() to detect page cache for uffd continue
has been removed from the -mm tree. Its filename was
mm-shmem-use-page_mapping-to-detect-page-cache-for-uffd-continue.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/shmem: use page_mapping() to detect page cache for uffd continue
Date: Wed, 2 Nov 2022 14:41:52 -0400
mfill_atomic_install_pte() checks page->mapping to detect whether one page
is used in the page cache. However as pointed out by Matthew, the page
can logically be a tail page rather than always the head in the case of
uffd minor mode with UFFDIO_CONTINUE. It means we could wrongly install
one pte with shmem thp tail page assuming it's an anonymous page.
It's not that clear even for anonymous page, since normally anonymous
pages also have page->mapping being setup with the anon vma. It's safe
here only because the only such caller to mfill_atomic_install_pte() is
always passing in a newly allocated page (mcopy_atomic_pte()), whose
page->mapping is not yet setup. However that's not extremely obvious
either.
For either of above, use page_mapping() instead.
Link: https://lkml.kernel.org/r/Y2K+y7wnhC4vbnP2@x1n
Fixes: 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Matthew Wilcox <willy(a)infradead.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/userfaultfd.c~mm-shmem-use-page_mapping-to-detect-page-cache-for-uffd-continue
+++ a/mm/userfaultfd.c
@@ -64,7 +64,7 @@ int mfill_atomic_install_pte(struct mm_s
pte_t _dst_pte, *dst_pte;
bool writable = dst_vma->vm_flags & VM_WRITE;
bool vm_shared = dst_vma->vm_flags & VM_SHARED;
- bool page_in_cache = page->mapping;
+ bool page_in_cache = page_mapping(page);
spinlock_t *ptl;
struct inode *inode;
pgoff_t offset, max_off;
_
Patches currently in -mm which might be from peterx(a)redhat.com are
selftests-vm-use-memfd-for-uffd-hugetlb-tests.patch
selftests-vm-use-memfd-for-hugetlb-madvise-test.patch
selftests-vm-use-memfd-for-hugepage-mremap-test.patch
selftests-vm-drop-mnt-point-for-hugetlb-in-run_vmtestssh.patch
mm-hugetlb-unify-clearing-of-restorereserve-for-private-pages.patch
revert-mm-uffd-fix-warning-without-pte_marker_uffd_wp-compiled-in.patch
mm-always-compile-in-pte-markers.patch
mm-use-pte-markers-for-swap-errors.patch
The quilt patch titled
Subject: mm/memremap.c: map FS_DAX device memory as decrypted
has been removed from the -mm tree. Its filename was
mm-memremapc-map-fs_dax-device-memory-as-decrypted.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pankaj Gupta <pankaj.gupta(a)amd.com>
Subject: mm/memremap.c: map FS_DAX device memory as decrypted
Date: Wed, 2 Nov 2022 11:07:28 -0500
virtio_pmem use devm_memremap_pages() to map the device memory. By
default this memory is mapped as encrypted with SEV. Guest reboot changes
the current encryption key and guest no longer properly decrypts the FSDAX
device meta data.
Mark the corresponding device memory region for FSDAX devices (mapped with
memremap_pages) as decrypted to retain the persistent memory property.
Link: https://lkml.kernel.org/r/20221102160728.3184016-1-pankaj.gupta@amd.com
Fixes: b7b3c01b19159 ("mm/memremap_pages: support multiple ranges per invocation")
Signed-off-by: Pankaj Gupta <pankaj.gupta(a)amd.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memremap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/memremap.c~mm-memremapc-map-fs_dax-device-memory-as-decrypted
+++ a/mm/memremap.c
@@ -335,6 +335,7 @@ void *memremap_pages(struct dev_pagemap
WARN(1, "File system DAX not supported\n");
return ERR_PTR(-EINVAL);
}
+ params.pgprot = pgprot_decrypted(params.pgprot);
break;
case MEMORY_DEVICE_GENERIC:
break;
_
Patches currently in -mm which might be from pankaj.gupta(a)amd.com are
The quilt patch titled
Subject: nilfs2: fix deadlock in nilfs_count_free_blocks()
has been removed from the -mm tree. Its filename was
nilfs2-fix-deadlock-in-nilfs_count_free_blocks.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix deadlock in nilfs_count_free_blocks()
Date: Sat, 29 Oct 2022 13:49:12 +0900
A semaphore deadlock can occur if nilfs_get_block() detects metadata
corruption while locating data blocks and a superblock writeback occurs at
the same time:
task 1 task 2
------ ------
* A file operation *
nilfs_truncate()
nilfs_get_block()
down_read(rwsem A) <--
nilfs_bmap_lookup_contig()
... generic_shutdown_super()
nilfs_put_super()
* Prepare to write superblock *
down_write(rwsem B) <--
nilfs_cleanup_super()
* Detect b-tree corruption * nilfs_set_log_cursor()
nilfs_bmap_convert_error() nilfs_count_free_blocks()
__nilfs_error() down_read(rwsem A) <--
nilfs_set_error()
down_write(rwsem B) <--
*** DEADLOCK ***
Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem)
and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata
corruption, __nilfs_error() is called from nilfs_bmap_convert_error()
inside the lock section.
Since __nilfs_error() calls nilfs_set_error() unless the filesystem is
read-only and nilfs_set_error() attempts to writelock rwsem B (=
nilfs->ns_sem) to write back superblock exclusively, hierarchical lock
acquisition occurs in the order rwsem A -> rwsem B.
Now, if another task starts updating the superblock, it may writelock
rwsem B during the lock sequence above, and can deadlock trying to
readlock rwsem A in nilfs_count_free_blocks().
However, there is actually no need to take rwsem A in
nilfs_count_free_blocks() because it, within the lock section, only reads
a single integer data on a shared struct with
nilfs_sufile_get_ncleansegs(). This has been the case after commit
aa474a220180 ("nilfs2: add local variable to cache the number of clean
segments"), that is, even before this bug was introduced.
So, this resolves the deadlock problem by just not taking the semaphore in
nilfs_count_free_blocks().
Link: https://lkml.kernel.org/r/20221029044912.9139-1-konishi.ryusuke@gmail.com
Fixes: e828949e5b42 ("nilfs2: call nilfs_error inside bmap routines")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+45d6ce7b7ad7ef455d03(a)syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [2.6.38+
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/the_nilfs.c | 2 --
1 file changed, 2 deletions(-)
--- a/fs/nilfs2/the_nilfs.c~nilfs2-fix-deadlock-in-nilfs_count_free_blocks
+++ a/fs/nilfs2/the_nilfs.c
@@ -690,9 +690,7 @@ int nilfs_count_free_blocks(struct the_n
{
unsigned long ncleansegs;
- down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
ncleansegs = nilfs_sufile_get_ncleansegs(nilfs->ns_sufile);
- up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
*nblocks = (sector_t)ncleansegs * nilfs->ns_blocks_per_segment;
return 0;
}
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-shift-out-of-bounds-overflow-in-nilfs_sb2_bad_offset.patch
nilfs2-fix-shift-out-of-bounds-due-to-too-large-exponent-of-block-size.patch
The quilt patch titled
Subject: hugetlbfs: don't delete error page from pagecache
has been removed from the -mm tree. Its filename was
hugetlbfs-dont-delete-error-page-from-pagecache.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: James Houghton <jthoughton(a)google.com>
Subject: hugetlbfs: don't delete error page from pagecache
Date: Tue, 18 Oct 2022 20:01:25 +0000
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 13 ++++++-------
mm/hugetlb.c | 4 ++++
mm/memory-failure.c | 5 ++++-
3 files changed, 14 insertions(+), 8 deletions(-)
--- a/fs/hugetlbfs/inode.c~hugetlbfs-dont-delete-error-page-from-pagecache
+++ a/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struc
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struc
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
--- a/mm/hugetlb.c~hugetlbfs-dont-delete-error-page-from-pagecache
+++ a/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
--- a/mm/memory-failure.c~hugetlbfs-dont-delete-error-page-from-pagecache
+++ a/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_stat
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_stat
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_stat
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
_
Patches currently in -mm which might be from jthoughton(a)google.com are
On Tue, Nov 8, 2022 at 1:38 PM Jason A. Donenfeld <Jason(a)zx2c4.com> wrote:
> Sparse reports that calling add_device_randomness() on `uid` is a
> violation of address spaces. And indeed the next usage uses readl()
> properly, but that was left out when passing it toadd_device_
> randomness(). So instead copy the whole thing to the stack first.
>
> Fixes: 4040d10a3d44 ("ARM: ux500: add DB serial number to entropy pool")
> Cc: Linus Walleij <linus.walleij(a)linaro.org>
> Cc: stable(a)vger.kernel.org
> Link: https://lore.kernel.org/all/202210230819.loF90KDh-lkp@intel.com/
> Reported-by: kernel test robot <lkp(a)intel.com>
> Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Patch applied!
Sorry for taking so long!
Yours,
Linus Walleij
Commit d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP")
fixed an orphan section warning by adding the '.data..decrypted' section
to the linker script under the PERCPU_DECRYPTED_SECTION define but that
placement introduced a panic with !SMP, as the percpu sections are not
instantiated with that configuration so attempting to access variables
defined with DEFINE_PER_CPU_DECRYPTED() will result in a page fault.
Move the '.data..decrypted' section to the DATA_MAIN define so that the
variables in it are properly instantiated at boot time with
CONFIG_SMP=n.
Cc: stable(a)vger.kernel.org
Fixes: d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP")
Link: https://lore.kernel.org/cbbd3548-880c-d2ca-1b67-5bb93b291d5f@huawei.com/
Debugged-by: Ard Biesheuvel <ardb(a)kernel.org>
Reported-by: Zhao Wenhui <zhaowenhui8(a)huawei.com>
Tested-by: xiafukun <xiafukun(a)huawei.com>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
include/asm-generic/vmlinux.lds.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index d06ada2341cb..3dc5824141cd 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -347,6 +347,7 @@
#define DATA_DATA \
*(.xiptext) \
*(DATA_MAIN) \
+ *(.data..decrypted) \
*(.ref.data) \
*(.data..shared_aligned) /* percpu related */ \
MEM_KEEP(init.data*) \
@@ -995,7 +996,6 @@
#ifdef CONFIG_AMD_MEM_ENCRYPT
#define PERCPU_DECRYPTED_SECTION \
. = ALIGN(PAGE_SIZE); \
- *(.data..decrypted) \
*(.data..percpu..decrypted) \
. = ALIGN(PAGE_SIZE);
#else
base-commit: f0c4d9fc9cc9462659728d168387191387e903cc
--
2.38.1
From: Vasily Averin <vvs(a)virtuozzo.com>
Commit 18319498fdd4cdf8c1c2c48cd432863b1f915d6f upstream.
[ This backport fixes CVE-2021-3759 for 5.10 and 5.4. Please, note that
it caused conflicts in all files being changed because upstream
changed ipc object allocation to and from kvmalloc() & friends (eg.
commits bc8136a543aa and fc37a3b8b4388e). However, I decided to keep
this backport about the memcg accounting fix only. ]
When user creates IPC objects it forces kernel to allocate memory for
these long-living objects.
It makes sense to account them to restrict the host's memory consumption
from inside the memcg-limited container.
This patch enables accounting for IPC shared memory segments, messages
semaphores and semaphore's undo lists.
Link: https://lkml.kernel.org/r/d6507b06-4df6-78f8-6c54-3ae86e3b5339@virtuozzo.com
Signed-off-by: Vasily Averin <vvs(a)virtuozzo.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "J. Bruce Fields" <bfields(a)fieldses.org>
Cc: Jeff Layton <jlayton(a)kernel.org>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Jiri Slaby <jirislaby(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Serge Hallyn <serge(a)hallyn.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Yutian Yang <nglaive(a)gmail.com>
Cc: Zefan Li <lizefan.x(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Luiz Capitulino <luizcap(a)amazon.com>
---
ipc/msg.c | 2 +-
ipc/sem.c | 9 +++++----
ipc/shm.c | 2 +-
3 files changed, 7 insertions(+), 6 deletions(-)
Reviewers,
Some important details:
o While doing this backport I realized that Vasily worked on a large accounting
overhaul which may include more instances of this problem (and possibly more
unfixed CVEs). This brings the question whether we should only fix concrete/reproducible
accounting issues or bring Vasily's entire overhaul. I'm choosing to fix
only concrete cases
o 4.19 and 4.9 should also have this issue, but I haven't tried the backport
there yet
o For testing, I did two things:
1. Reproduced the issue as described in the link below, with and
without this patch. Without the patch I can pretty clearly see
the kernel allocating several gigas of memory that are not
accounted for by memcg. With the patch the memory is accounted
correctly
Reproducer: https://lore.kernel.org/linux-mm/1626333284-1404-1-git-send-email-nglaive@g…
2. I ran LTP's ipc test-cases (which simple, but hopefully good enough)
diff --git a/ipc/msg.c b/ipc/msg.c
index 6e6c8e0c9380..8ded6b8f10a2 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -147,7 +147,7 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
key_t key = params->key;
int msgflg = params->flg;
- msq = kvmalloc(sizeof(*msq), GFP_KERNEL);
+ msq = kvmalloc(sizeof(*msq), GFP_KERNEL_ACCOUNT);
if (unlikely(!msq))
return -ENOMEM;
diff --git a/ipc/sem.c b/ipc/sem.c
index 7d9c06b0ad6e..d3b9b73cd9ca 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -511,7 +511,7 @@ static struct sem_array *sem_alloc(size_t nsems)
if (nsems > (INT_MAX - sizeof(*sma)) / sizeof(sma->sems[0]))
return NULL;
- sma = kvzalloc(struct_size(sma, sems, nsems), GFP_KERNEL);
+ sma = kvzalloc(struct_size(sma, sems, nsems), GFP_KERNEL_ACCOUNT);
if (unlikely(!sma))
return NULL;
@@ -1852,7 +1852,7 @@ static inline int get_undo_list(struct sem_undo_list **undo_listp)
undo_list = current->sysvsem.undo_list;
if (!undo_list) {
- undo_list = kzalloc(sizeof(*undo_list), GFP_KERNEL);
+ undo_list = kzalloc(sizeof(*undo_list), GFP_KERNEL_ACCOUNT);
if (undo_list == NULL)
return -ENOMEM;
spin_lock_init(&undo_list->lock);
@@ -1937,7 +1937,7 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
rcu_read_unlock();
/* step 2: allocate new undo structure */
- new = kzalloc(sizeof(struct sem_undo) + sizeof(short)*nsems, GFP_KERNEL);
+ new = kzalloc(sizeof(struct sem_undo) + sizeof(short)*nsems, GFP_KERNEL_ACCOUNT);
if (!new) {
ipc_rcu_putref(&sma->sem_perm, sem_rcu_free);
return ERR_PTR(-ENOMEM);
@@ -2001,7 +2001,8 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
if (nsops > ns->sc_semopm)
return -E2BIG;
if (nsops > SEMOPM_FAST) {
- sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
+ sops = kvmalloc_array(nsops, sizeof(*sops),
+ GFP_KERNEL_ACCOUNT);
if (sops == NULL)
return -ENOMEM;
}
diff --git a/ipc/shm.c b/ipc/shm.c
index 471ac3e7498d..b418731d66e8 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -711,7 +711,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
ns->shm_tot + numpages > ns->shm_ctlall)
return -ENOSPC;
- shp = kvmalloc(sizeof(*shp), GFP_KERNEL);
+ shp = kvmalloc(sizeof(*shp), GFP_KERNEL_ACCOUNT);
if (unlikely(!shp))
return -ENOMEM;
--
2.24.4.AMZN
FILL_RETURN_BUFFER can access percpu data, therefore vmload of the
host save area must be executed first. First of all, move the
VMCB vmsave/vmload to assembly.
The idea on how to number the exception tables is stolen from
a prototype patch by Peter Zijlstra.
Cc: stable(a)vger.kernel.org
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Link: <https://lore.kernel.org/all/f571e404-e625-bae1-10e9-449b2eb4cbd8@citrix.com/>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/kvm-asm-offsets.c | 1 +
arch/x86/kvm/svm/svm.c | 9 -------
arch/x86/kvm/svm/vmenter.S | 49 ++++++++++++++++++++++++++--------
3 files changed, 39 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
index f1b694e431ae..f83e88b85bf2 100644
--- a/arch/x86/kvm/kvm-asm-offsets.c
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -16,6 +16,7 @@ static void __used common(void)
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
OFFSET(SVM_current_vmcb, vcpu_svm, current_vmcb);
+ OFFSET(SVM_vmcb01, vcpu_svm, vmcb01);
OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0c86c435c51f..0ba4feb19cb6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3922,16 +3922,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
} else {
struct svm_cpu_data *sd = per_cpu(svm_data, vcpu->cpu);
- /*
- * Use a single vmcb (vmcb01 because it's always valid) for
- * context switching guest state via VMLOAD/VMSAVE, that way
- * the state doesn't need to be copied between vmcb01 and
- * vmcb02 when switching vmcbs for nested virtualization.
- */
- vmload(svm->vmcb01.pa);
__svm_vcpu_run(svm);
- vmsave(svm->vmcb01.pa);
-
vmload(__sme_page_pa(sd->save_area));
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index d07bac1952c5..5bc2ed7d79c0 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -28,6 +28,8 @@
#define VCPU_R15 (SVM_vcpu_arch_regs + __VCPU_REGS_R15 * WORD_SIZE)
#endif
+#define SVM_vmcb01_pa (SVM_vmcb01 + KVM_VMCB_pa)
+
.section .noinstr.text, "ax"
/**
@@ -55,6 +57,16 @@ SYM_FUNC_START(__svm_vcpu_run)
mov %_ASM_ARG1, %_ASM_DI
.endif
+ /*
+ * Use a single vmcb (vmcb01 because it's always valid) for
+ * context switching guest state via VMLOAD/VMSAVE, that way
+ * the state doesn't need to be copied between vmcb01 and
+ * vmcb02 when switching vmcbs for nested virtualization.
+ */
+ mov SVM_vmcb01_pa(%_ASM_DI), %_ASM_AX
+1: vmload %_ASM_AX
+2:
+
/* Get svm->current_vmcb->pa into RAX. */
mov SVM_current_vmcb(%_ASM_DI), %_ASM_AX
mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
@@ -80,16 +92,11 @@ SYM_FUNC_START(__svm_vcpu_run)
/* Enter guest mode */
sti
-1: vmrun %_ASM_AX
-
-2: cli
-
-#ifdef CONFIG_RETPOLINE
- /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
- FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
-#endif
+3: vmrun %_ASM_AX
+4:
+ cli
- /* "POP" @svm to RAX. */
+ /* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
/* Save all guest registers. */
@@ -110,6 +117,18 @@ SYM_FUNC_START(__svm_vcpu_run)
mov %r15, VCPU_R15(%_ASM_AX)
#endif
+ /* @svm can stay in RDI from now on. */
+ mov %_ASM_AX, %_ASM_DI
+
+ mov SVM_vmcb01_pa(%_ASM_DI), %_ASM_AX
+5: vmsave %_ASM_AX
+6:
+
+#ifdef CONFIG_RETPOLINE
+ /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
+ FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
+#endif
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -159,11 +178,19 @@ SYM_FUNC_START(__svm_vcpu_run)
pop %_ASM_BP
RET
-3: cmpb $0, kvm_rebooting
+10: cmpb $0, kvm_rebooting
jne 2b
ud2
+30: cmpb $0, kvm_rebooting
+ jne 4b
+ ud2
+50: cmpb $0, kvm_rebooting
+ jne 6b
+ ud2
- _ASM_EXTABLE(1b, 3b)
+ _ASM_EXTABLE(1b, 10b)
+ _ASM_EXTABLE(3b, 30b)
+ _ASM_EXTABLE(5b, 50b)
SYM_FUNC_END(__svm_vcpu_run)
--
2.31.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
12caf46cf4fc ("drm/i915/sdvo: Grab mode_config.mutex during LVDS init to avoid WARNs")
d372ec94a018 ("drm/i915: Simplify intel_panel_add_edid_alt_fixed_modes()")
a5810f551d0a ("drm/i915: Allow more varied alternate fixed modes for panels")
6e939738da20 ("drm/i915: Accept more fixed modes with VRR panels")
3cf050762534 ("drm/i915/bios: Split VBT data into per-panel vs. global parts")
c2fdb424d322 ("drm/i915/bios: Split VBT parsing to global vs. panel specific parts")
c3fbcf60bc74 ("drm/i915/bios: Split parse_driver_features() into two parts")
75bd0d5e4ead ("drm/i915/pps: Split pps_init_delays() into distinct parts")
822e5ae701af ("drm/i915: Extract intel_edp_fixup_vbt_bpp()")
949665a6e237 ("drm/i915: Respect VBT seamless DRRS min refresh rate")
790b45f1bc67 ("drm/i915/bios: Parse the seamless DRRS min refresh rate")
901a0cad2ab8 ("drm/i915/bios: Get access to the tail end of the LFP data block")
13367132a7ad ("drm/i915/bios: Reorder panel DTD parsing")
5ab58d6996d7 ("drm/i915/bios: Validate the panel_name table")
58b2e3829ec6 ("drm/i915/bios: Trust the LFP data pointers")
514003e1421e ("drm/i915/bios: Validate LFP data table pointers")
918f3025960f ("drm/i915/bios: Use the copy of the LFP data table always")
e163cfb4c96d ("drm/i915/bios: Make copies of VBT data blocks")
d58a3d699797 ("drm/i915/bios: Use the cached BDB version")
ca2a3c9204ec ("drm/i915/bios: Extract struct lvds_lfp_data_ptr_table")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 12caf46cf4fc92b1c3884cb363ace2e12732fd2f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Wed, 26 Oct 2022 13:11:29 +0300
Subject: [PATCH] drm/i915/sdvo: Grab mode_config.mutex during LVDS init to
avoid WARNs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
drm_mode_probed_add() is unhappy about being called w/o
mode_config.mutex. Grab it during LVDS fixed mode setup
to silence the WARNs.
Cc: stable(a)vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7301
Fixes: aa2b88074a56 ("drm/i915/sdvo: Fix multi function encoder stuff")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026101134.20865-4-ville.…
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit a3cd4f447281c56377de2ee109327400eb00668d)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_sdvo.c b/drivers/gpu/drm/i915/display/intel_sdvo.c
index 8ee7b05ab733..774c1dc31a52 100644
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/display/intel_sdvo.c
@@ -2900,8 +2900,12 @@ intel_sdvo_lvds_init(struct intel_sdvo *intel_sdvo, int device)
intel_panel_add_vbt_sdvo_fixed_mode(intel_connector);
if (!intel_panel_preferred_fixed_mode(intel_connector)) {
+ mutex_lock(&i915->drm.mode_config.mutex);
+
intel_ddc_get_modes(connector, &intel_sdvo->ddc);
intel_panel_add_edid_fixed_modes(intel_connector, false);
+
+ mutex_unlock(&i915->drm.mode_config.mutex);
}
intel_panel_init(intel_connector);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
12caf46cf4fc ("drm/i915/sdvo: Grab mode_config.mutex during LVDS init to avoid WARNs")
d372ec94a018 ("drm/i915: Simplify intel_panel_add_edid_alt_fixed_modes()")
a5810f551d0a ("drm/i915: Allow more varied alternate fixed modes for panels")
6e939738da20 ("drm/i915: Accept more fixed modes with VRR panels")
3cf050762534 ("drm/i915/bios: Split VBT data into per-panel vs. global parts")
c2fdb424d322 ("drm/i915/bios: Split VBT parsing to global vs. panel specific parts")
c3fbcf60bc74 ("drm/i915/bios: Split parse_driver_features() into two parts")
75bd0d5e4ead ("drm/i915/pps: Split pps_init_delays() into distinct parts")
822e5ae701af ("drm/i915: Extract intel_edp_fixup_vbt_bpp()")
949665a6e237 ("drm/i915: Respect VBT seamless DRRS min refresh rate")
790b45f1bc67 ("drm/i915/bios: Parse the seamless DRRS min refresh rate")
901a0cad2ab8 ("drm/i915/bios: Get access to the tail end of the LFP data block")
13367132a7ad ("drm/i915/bios: Reorder panel DTD parsing")
5ab58d6996d7 ("drm/i915/bios: Validate the panel_name table")
58b2e3829ec6 ("drm/i915/bios: Trust the LFP data pointers")
514003e1421e ("drm/i915/bios: Validate LFP data table pointers")
918f3025960f ("drm/i915/bios: Use the copy of the LFP data table always")
e163cfb4c96d ("drm/i915/bios: Make copies of VBT data blocks")
d58a3d699797 ("drm/i915/bios: Use the cached BDB version")
ca2a3c9204ec ("drm/i915/bios: Extract struct lvds_lfp_data_ptr_table")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 12caf46cf4fc92b1c3884cb363ace2e12732fd2f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Wed, 26 Oct 2022 13:11:29 +0300
Subject: [PATCH] drm/i915/sdvo: Grab mode_config.mutex during LVDS init to
avoid WARNs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
drm_mode_probed_add() is unhappy about being called w/o
mode_config.mutex. Grab it during LVDS fixed mode setup
to silence the WARNs.
Cc: stable(a)vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7301
Fixes: aa2b88074a56 ("drm/i915/sdvo: Fix multi function encoder stuff")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026101134.20865-4-ville.…
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit a3cd4f447281c56377de2ee109327400eb00668d)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_sdvo.c b/drivers/gpu/drm/i915/display/intel_sdvo.c
index 8ee7b05ab733..774c1dc31a52 100644
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/display/intel_sdvo.c
@@ -2900,8 +2900,12 @@ intel_sdvo_lvds_init(struct intel_sdvo *intel_sdvo, int device)
intel_panel_add_vbt_sdvo_fixed_mode(intel_connector);
if (!intel_panel_preferred_fixed_mode(intel_connector)) {
+ mutex_lock(&i915->drm.mode_config.mutex);
+
intel_ddc_get_modes(connector, &intel_sdvo->ddc);
intel_panel_add_edid_fixed_modes(intel_connector, false);
+
+ mutex_unlock(&i915->drm.mode_config.mutex);
}
intel_panel_init(intel_connector);
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
12caf46cf4fc ("drm/i915/sdvo: Grab mode_config.mutex during LVDS init to avoid WARNs")
d372ec94a018 ("drm/i915: Simplify intel_panel_add_edid_alt_fixed_modes()")
a5810f551d0a ("drm/i915: Allow more varied alternate fixed modes for panels")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 12caf46cf4fc92b1c3884cb363ace2e12732fd2f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Wed, 26 Oct 2022 13:11:29 +0300
Subject: [PATCH] drm/i915/sdvo: Grab mode_config.mutex during LVDS init to
avoid WARNs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
drm_mode_probed_add() is unhappy about being called w/o
mode_config.mutex. Grab it during LVDS fixed mode setup
to silence the WARNs.
Cc: stable(a)vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7301
Fixes: aa2b88074a56 ("drm/i915/sdvo: Fix multi function encoder stuff")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026101134.20865-4-ville.…
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit a3cd4f447281c56377de2ee109327400eb00668d)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_sdvo.c b/drivers/gpu/drm/i915/display/intel_sdvo.c
index 8ee7b05ab733..774c1dc31a52 100644
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/display/intel_sdvo.c
@@ -2900,8 +2900,12 @@ intel_sdvo_lvds_init(struct intel_sdvo *intel_sdvo, int device)
intel_panel_add_vbt_sdvo_fixed_mode(intel_connector);
if (!intel_panel_preferred_fixed_mode(intel_connector)) {
+ mutex_lock(&i915->drm.mode_config.mutex);
+
intel_ddc_get_modes(connector, &intel_sdvo->ddc);
intel_panel_add_edid_fixed_modes(intel_connector, false);
+
+ mutex_unlock(&i915->drm.mode_config.mutex);
}
intel_panel_init(intel_connector);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
12caf46cf4fc ("drm/i915/sdvo: Grab mode_config.mutex during LVDS init to avoid WARNs")
d372ec94a018 ("drm/i915: Simplify intel_panel_add_edid_alt_fixed_modes()")
a5810f551d0a ("drm/i915: Allow more varied alternate fixed modes for panels")
6e939738da20 ("drm/i915: Accept more fixed modes with VRR panels")
3cf050762534 ("drm/i915/bios: Split VBT data into per-panel vs. global parts")
c2fdb424d322 ("drm/i915/bios: Split VBT parsing to global vs. panel specific parts")
c3fbcf60bc74 ("drm/i915/bios: Split parse_driver_features() into two parts")
75bd0d5e4ead ("drm/i915/pps: Split pps_init_delays() into distinct parts")
822e5ae701af ("drm/i915: Extract intel_edp_fixup_vbt_bpp()")
949665a6e237 ("drm/i915: Respect VBT seamless DRRS min refresh rate")
790b45f1bc67 ("drm/i915/bios: Parse the seamless DRRS min refresh rate")
901a0cad2ab8 ("drm/i915/bios: Get access to the tail end of the LFP data block")
13367132a7ad ("drm/i915/bios: Reorder panel DTD parsing")
5ab58d6996d7 ("drm/i915/bios: Validate the panel_name table")
58b2e3829ec6 ("drm/i915/bios: Trust the LFP data pointers")
514003e1421e ("drm/i915/bios: Validate LFP data table pointers")
918f3025960f ("drm/i915/bios: Use the copy of the LFP data table always")
e163cfb4c96d ("drm/i915/bios: Make copies of VBT data blocks")
d58a3d699797 ("drm/i915/bios: Use the cached BDB version")
ca2a3c9204ec ("drm/i915/bios: Extract struct lvds_lfp_data_ptr_table")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 12caf46cf4fc92b1c3884cb363ace2e12732fd2f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Wed, 26 Oct 2022 13:11:29 +0300
Subject: [PATCH] drm/i915/sdvo: Grab mode_config.mutex during LVDS init to
avoid WARNs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
drm_mode_probed_add() is unhappy about being called w/o
mode_config.mutex. Grab it during LVDS fixed mode setup
to silence the WARNs.
Cc: stable(a)vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7301
Fixes: aa2b88074a56 ("drm/i915/sdvo: Fix multi function encoder stuff")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026101134.20865-4-ville.…
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit a3cd4f447281c56377de2ee109327400eb00668d)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_sdvo.c b/drivers/gpu/drm/i915/display/intel_sdvo.c
index 8ee7b05ab733..774c1dc31a52 100644
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/display/intel_sdvo.c
@@ -2900,8 +2900,12 @@ intel_sdvo_lvds_init(struct intel_sdvo *intel_sdvo, int device)
intel_panel_add_vbt_sdvo_fixed_mode(intel_connector);
if (!intel_panel_preferred_fixed_mode(intel_connector)) {
+ mutex_lock(&i915->drm.mode_config.mutex);
+
intel_ddc_get_modes(connector, &intel_sdvo->ddc);
intel_panel_add_edid_fixed_modes(intel_connector, false);
+
+ mutex_unlock(&i915->drm.mode_config.mutex);
}
intel_panel_init(intel_connector);
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6cb5cec16c38 ("drm/amd/display: Set memclk levels to be at least 1 for dcn32")
d6170e418d1d ("drm/amd/display: Acquire FCLK DPM levels on DCN32")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6cb5cec16c380be4cf9776a8c23b72e9fe742fd1 Mon Sep 17 00:00:00 2001
From: Dillon Varone <Dillon.Varone(a)amd.com>
Date: Thu, 20 Oct 2022 11:46:48 -0400
Subject: [PATCH] drm/amd/display: Set memclk levels to be at least 1 for dcn32
[Why]
Cannot report 0 memclk levels even when SMU does not provide any.
[How]
When memclk levels reported by SMU is 0, set levels to 1.
Tested-by: Mark Broadworth <mark.broadworth(a)amd.com>
Reviewed-by: Martin Leung <Martin.Leung(a)amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.0.x
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
index fd0313468fdb..6f77d8e538ab 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
@@ -669,6 +669,9 @@ static void dcn32_get_memclk_states_from_smu(struct clk_mgr *clk_mgr_base)
&clk_mgr_base->bw_params->clk_table.entries[0].memclk_mhz,
&num_entries_per_clk->num_memclk_levels);
+ /* memclk must have at least one level */
+ num_entries_per_clk->num_memclk_levels = num_entries_per_clk->num_memclk_levels ? num_entries_per_clk->num_memclk_levels : 1;
+
dcn32_init_single_clock(clk_mgr, PPCLK_FCLK,
&clk_mgr_base->bw_params->clk_table.entries[0].fclk_mhz,
&num_entries_per_clk->num_fclk_levels);
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
e59843c4cdd6 ("drm/amd/display: Limit dcn32 to 1950Mhz display clock")
d6170e418d1d ("drm/amd/display: Acquire FCLK DPM levels on DCN32")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e59843c4cdd68a369591630088171eeacce9859f Mon Sep 17 00:00:00 2001
From: Jun Lei <jun.lei(a)amd.com>
Date: Thu, 20 Oct 2022 11:46:44 -0400
Subject: [PATCH] drm/amd/display: Limit dcn32 to 1950Mhz display clock
[why]
Hardware team recommends we limit dispclock to 1950Mhz for all DCN3.2.x
[how]
Limit to 1950 when initializing clocks.
Tested-by: Mark Broadworth <mark.broadworth(a)amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2(a)amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Jun Lei <jun.lei(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.0.x
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
index 1c612ccf1944..fd0313468fdb 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
@@ -157,6 +157,7 @@ void dcn32_init_clocks(struct clk_mgr *clk_mgr_base)
struct clk_mgr_internal *clk_mgr = TO_CLK_MGR_INTERNAL(clk_mgr_base);
unsigned int num_levels;
struct clk_limit_num_entries *num_entries_per_clk = &clk_mgr_base->bw_params->clk_table.num_entries_per_clk;
+ unsigned int i;
memset(&(clk_mgr_base->clks), 0, sizeof(struct dc_clocks));
clk_mgr_base->clks.p_state_change_support = true;
@@ -205,18 +206,17 @@ void dcn32_init_clocks(struct clk_mgr *clk_mgr_base)
clk_mgr->dpm_present = true;
if (clk_mgr_base->ctx->dc->debug.min_disp_clk_khz) {
- unsigned int i;
-
for (i = 0; i < num_levels; i++)
if (clk_mgr_base->bw_params->clk_table.entries[i].dispclk_mhz
< khz_to_mhz_ceil(clk_mgr_base->ctx->dc->debug.min_disp_clk_khz))
clk_mgr_base->bw_params->clk_table.entries[i].dispclk_mhz
= khz_to_mhz_ceil(clk_mgr_base->ctx->dc->debug.min_disp_clk_khz);
}
+ for (i = 0; i < num_levels; i++)
+ if (clk_mgr_base->bw_params->clk_table.entries[i].dispclk_mhz > 1950)
+ clk_mgr_base->bw_params->clk_table.entries[i].dispclk_mhz = 1950;
if (clk_mgr_base->ctx->dc->debug.min_dpp_clk_khz) {
- unsigned int i;
-
for (i = 0; i < num_levels; i++)
if (clk_mgr_base->bw_params->clk_table.entries[i].dppclk_mhz
< khz_to_mhz_ceil(clk_mgr_base->ctx->dc->debug.min_dpp_clk_khz))
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6fdaed8c7988 ("drm/format-helper: Only advertise supported formats for conversion")
4a85b0b51e21 ("drm/format-helper: Add drm_fb_build_fourcc_list() helper")
71bf55872cbe ("drm/format-helper: Provide drm_fb_blit()")
de40c281fe0b ("drm/simpledrm: Convert to atomic helpers")
c25b69604fc4 ("drm/simpledrm: Inline device-init helpers")
03d38605cee7 ("drm/simpledrm: Remove mem field from device structure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6fdaed8c79887680bc46cb0a51775bd7c8645528 Mon Sep 17 00:00:00 2001
From: Hector Martin <marcan(a)marcan.st>
Date: Thu, 27 Oct 2022 22:57:11 +0900
Subject: [PATCH] drm/format-helper: Only advertise supported formats for
conversion
drm_fb_build_fourcc_list() currently returns all emulated formats
unconditionally as long as the native format is among them, even though
not all combinations have conversion helpers. Although the list is
arguably provided to userspace in precedence order, userspace can pick
something out-of-order (and thus break when it shouldn't), or simply
only support a format that is unsupported (and thus think it can work,
which results in the appearance of a hang as FB blits fail later on,
instead of the initialization error you'd expect in this case).
Add checks to filter the list of emulated formats to only those
supported for conversion to the native format. This presumes that there
is a single native format (only the first is checked, if there are
multiple). Refactoring this API to drop the native list or support it
properly (by returning the appropriate emulated->native mapping table)
is left for a future patch.
The simpledrm driver is left as-is with a full table of emulated
formats. This keeps all currently working conversions available and
drops all the broken ones (i.e. this a strict bugfix patch, adding no
new supported formats nor removing any actually working ones). In order
to avoid proliferation of emulated formats, future drivers should
advertise only XRGB8888 as the sole emulated format (since some
userspace assumes its presence).
This fixes a real user regression where the ?RGB2101010 support commit
started advertising it unconditionally where not supported, and KWin
decided to start to use it over the native format and broke, but also
the fixes the spurious RGB565/RGB888 formats which have been wrongly
unconditionally advertised since the dawn of simpledrm.
Fixes: 6ea966fca084 ("drm/simpledrm: Add [AX]RGB2101010 formats")
Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hector Martin <marcan(a)marcan.st>
Acked-by: Pekka Paalanen <pekka.paalanen(a)collabora.com>
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027135711.24425-1-marcan…
diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index e2f76621453c..3ee59bae9d2f 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -807,6 +807,38 @@ static bool is_listed_fourcc(const uint32_t *fourccs, size_t nfourccs, uint32_t
return false;
}
+static const uint32_t conv_from_xrgb8888[] = {
+ DRM_FORMAT_XRGB8888,
+ DRM_FORMAT_ARGB8888,
+ DRM_FORMAT_XRGB2101010,
+ DRM_FORMAT_ARGB2101010,
+ DRM_FORMAT_RGB565,
+ DRM_FORMAT_RGB888,
+};
+
+static const uint32_t conv_from_rgb565_888[] = {
+ DRM_FORMAT_XRGB8888,
+ DRM_FORMAT_ARGB8888,
+};
+
+static bool is_conversion_supported(uint32_t from, uint32_t to)
+{
+ switch (from) {
+ case DRM_FORMAT_XRGB8888:
+ case DRM_FORMAT_ARGB8888:
+ return is_listed_fourcc(conv_from_xrgb8888, ARRAY_SIZE(conv_from_xrgb8888), to);
+ case DRM_FORMAT_RGB565:
+ case DRM_FORMAT_RGB888:
+ return is_listed_fourcc(conv_from_rgb565_888, ARRAY_SIZE(conv_from_rgb565_888), to);
+ case DRM_FORMAT_XRGB2101010:
+ return to == DRM_FORMAT_ARGB2101010;
+ case DRM_FORMAT_ARGB2101010:
+ return to == DRM_FORMAT_XRGB2101010;
+ default:
+ return false;
+ }
+}
+
/**
* drm_fb_build_fourcc_list - Filters a list of supported color formats against
* the device's native formats
@@ -827,7 +859,9 @@ static bool is_listed_fourcc(const uint32_t *fourccs, size_t nfourccs, uint32_t
* be handed over to drm_universal_plane_init() et al. Native formats
* will go before emulated formats. Other heuristics might be applied
* to optimize the order. Formats near the beginning of the list are
- * usually preferred over formats near the end of the list.
+ * usually preferred over formats near the end of the list. Formats
+ * without conversion helpers will be skipped. New drivers should only
+ * pass in XRGB8888 and avoid exposing additional emulated formats.
*
* Returns:
* The number of color-formats 4CC codes returned in @fourccs_out.
@@ -839,7 +873,7 @@ size_t drm_fb_build_fourcc_list(struct drm_device *dev,
{
u32 *fourccs = fourccs_out;
const u32 *fourccs_end = fourccs_out + nfourccs_out;
- bool found_native = false;
+ uint32_t native_format = 0;
size_t i;
/*
@@ -858,26 +892,18 @@ size_t drm_fb_build_fourcc_list(struct drm_device *dev,
drm_dbg_kms(dev, "adding native format %p4cc\n", &fourcc);
- if (!found_native)
- found_native = is_listed_fourcc(driver_fourccs, driver_nfourccs, fourcc);
+ /*
+ * There should only be one native format with the current API.
+ * This API needs to be refactored to correctly support arbitrary
+ * sets of native formats, since it needs to report which native
+ * format to use for each emulated format.
+ */
+ if (!native_format)
+ native_format = fourcc;
*fourccs = fourcc;
++fourccs;
}
- /*
- * The plane's atomic_update helper converts the framebuffer's color format
- * to a native format when copying to device memory.
- *
- * If there is not a single format supported by both, device and
- * driver, the native formats are likely not supported by the conversion
- * helpers. Therefore *only* support the native formats and add a
- * conversion helper ASAP.
- */
- if (!found_native) {
- drm_warn(dev, "Format conversion helpers required to add extra formats.\n");
- goto out;
- }
-
/*
* The extra formats, emulated by the driver, go second.
*/
@@ -890,6 +916,9 @@ size_t drm_fb_build_fourcc_list(struct drm_device *dev,
} else if (fourccs == fourccs_end) {
drm_warn(dev, "Ignoring emulated format %p4cc\n", &fourcc);
continue; /* end of available output buffer */
+ } else if (!is_conversion_supported(fourcc, native_format)) {
+ drm_dbg_kms(dev, "Unsupported emulated format %p4cc\n", &fourcc);
+ continue; /* format is not supported for conversion */
}
drm_dbg_kms(dev, "adding emulated format %p4cc\n", &fourcc);
@@ -898,7 +927,6 @@ size_t drm_fb_build_fourcc_list(struct drm_device *dev,
++fourccs;
}
-out:
return fourccs - fourccs_out;
}
EXPORT_SYMBOL(drm_fb_build_fourcc_list);
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()")
e1f84eef313f ("drm/amdkfd: handle CPU fault on COW mapping")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5b994354af3cab770bf13386469c5725713679af Mon Sep 17 00:00:00 2001
From: Yang Li <yang.lee(a)linux.alibaba.com>
Date: Wed, 26 Oct 2022 10:00:54 +0800
Subject: [PATCH] drm/amdkfd: Fix NULL pointer dereference in
svm_migrate_to_ram()
./drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:985:58-62: ERROR: p is NULL but dereferenced.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2549
Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee(a)linux.alibaba.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling(a)amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 2797029bd500..22b077ac9a19 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -973,12 +973,10 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
out_unlock_svms:
mutex_unlock(&p->svms.lock);
out_unref_process:
+ pr_debug("CPU fault svms 0x%p address 0x%lx done\n", &p->svms, addr);
kfd_unref_process(p);
out_mmput:
mmput(mm);
-
- pr_debug("CPU fault svms 0x%p address 0x%lx done\n", &p->svms, addr);
-
return r ? VM_FAULT_SIGBUS : 0;
}
---------- Forwarded message ---------
From: Steve French <smfrench(a)gmail.com>
Date: Mon, Nov 7, 2022 at 5:54 PM
Subject: patch for stable "cifs: fix regression in very old smb1 mounts"
To: Stable <stable(a)vger.kernel.org>
Cc: CIFS <linux-cifs(a)vger.kernel.org>, Thorsten Leemhuis
<regressions(a)leemhuis.info>, ronnie sahlberg
<ronniesahlberg(a)gmail.com>
commit 2f6f19c7aaad "cifs: fix regression in very old smb1 mounts" upstream
Also attached upstream commit.
For 5.15 and 6.0 stable kernels
--
Thanks,
Steve
--
Thanks,
Steve
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters
jumping them ahead of those characters that are pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/tty/serial/8250/8250_port.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
--
2.30.2
Configure DMA to use 16B burst size with Elkhart Lake. This makes the
bus use more efficient and works around an issue which occurs with the
previously used 1B.
The fix was initially developed by Srikanth Thokala and Aman Kumar.
This together with the previous config change is the cleaned up version
of the original fix.
Fixes: 0a9410b981e9 ("serial: 8250_lpss: Enable DMA on Intel Elkhart Lake")
Cc: <stable(a)vger.kernel.org> # serial: 8250_lpss: Configure DMA also w/o DMA filter
Reported-by: Wentong Wu <wentong.wu(a)intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/tty/serial/8250/8250_lpss.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_lpss.c b/drivers/tty/serial/8250/8250_lpss.c
index 7d9cddbfef40..0e43bdfb7459 100644
--- a/drivers/tty/serial/8250/8250_lpss.c
+++ b/drivers/tty/serial/8250/8250_lpss.c
@@ -174,6 +174,8 @@ static int ehl_serial_setup(struct lpss8250 *lpss, struct uart_port *port)
*/
up->dma = dma;
+ lpss->dma_maxburst = 16;
+
port->set_termios = dw8250_do_set_termios;
return 0;
--
2.30.2
DW UART sometimes triggers IIR_RDI during DMA Rx when IIR_RX_TIMEOUT
should have been triggered instead. Since IIR_RDI has higher priority
than IIR_RX_TIMEOUT, this causes the Rx to hang into interrupt loop.
The problem seems to occur at least with some combinations of
small-sized transfers (I've reproduced the problem on Elkhart Lake PSE
UARTs).
If there's already an on-going Rx DMA and IIR_RDI triggers, fall
graciously back to non-DMA Rx. That is, behave as if IIR_RX_TIMEOUT had
occurred.
8250_omap already considers IIR_RDI similar to this change so its
nothing unheard of.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Co-developed-by: Srikanth Thokala <srikanth.thokala(a)intel.com>
Signed-off-by: Srikanth Thokala <srikanth.thokala(a)intel.com>
Co-developed-by: Aman Kumar <aman.kumar(a)intel.com>
Signed-off-by: Aman Kumar <aman.kumar(a)intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/tty/serial/8250/8250_port.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fe8662cd9402..92dd18716169 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1897,6 +1897,10 @@ EXPORT_SYMBOL_GPL(serial8250_modem_status);
static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
{
switch (iir & 0x3f) {
+ case UART_IIR_RDI:
+ if (!up->dma->rx_running)
+ break;
+ fallthrough;
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
fallthrough;
--
2.30.2
From: Sascha Hauer <s.hauer(a)pengutronix.de>
commit 0fddf9ad06fd9f439f137139861556671673e31c upstream.
06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
though: It is calculated based on the maximum page read time, but the
timeout is also used for page write and block erase operations which
require orders of magnitude bigger timeouts.
Fix this by calculating busy_timeout_cycles from the maximum of
tBERS_max and tPROG_max.
This is for now the easiest and most obvious way to fix the driver.
There's room for improvements though: The NAND_OP_WAITRDY_INSTR tells us
the desired timeout for the current operation, so we could program the
timeout dynamically for each operation instead of setting a fixed
timeout. Also we could wire up the interrupt handler to actually detect
and forward timeouts occurred when waiting for the chip being ready.
As a sidenote I verified that the change in 06781a5026350 is really
correct. I wired up the interrupt handler in my tree and measured the
time between starting the operation and the timeout interrupt handler
coming in. The time increases 41us with each step in the timeout
register which corresponds to 4096 clock cycles with the 99MHz clock
that I have.
Fixes: 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
Fixes: b1206122069aa ("mtd: rawniand: gpmi: use core timings instead of an empirical derivation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Acked-by: Han Xu <han.xu(a)nxp.com>
Tested-by: Tomasz Moń <tomasz.mon(a)camlingroup.com>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
Signed-off-by: Tim Harvey <tharvey(a)gateworks.com>
---
drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
index 02218c3b548f..b806a762d079 100644
--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
@@ -652,8 +652,9 @@ static void gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
unsigned int tRP_ps;
bool use_half_period;
int sample_delay_ps, sample_delay_factor;
- u16 busy_timeout_cycles;
+ unsigned int busy_timeout_cycles;
u8 wrn_dly_sel;
+ u64 busy_timeout_ps;
if (sdr->tRC_min >= 30000) {
/* ONFI non-EDO modes [0-3] */
@@ -677,7 +678,8 @@ static void gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
addr_setup_cycles = TO_CYCLES(sdr->tALS_min, period_ps);
data_setup_cycles = TO_CYCLES(sdr->tDS_min, period_ps);
data_hold_cycles = TO_CYCLES(sdr->tDH_min, period_ps);
- busy_timeout_cycles = TO_CYCLES(sdr->tWB_max + sdr->tR_max, period_ps);
+ busy_timeout_ps = max(sdr->tBERS_max, sdr->tPROG_max);
+ busy_timeout_cycles = TO_CYCLES(busy_timeout_ps, period_ps);
hw->timing0 = BF_GPMI_TIMING0_ADDRESS_SETUP(addr_setup_cycles) |
BF_GPMI_TIMING0_DATA_HOLD(data_hold_cycles) |
--
2.25.1
Stable maintainers,
Please apply commit 4fa0e3ff217f775cb58d2d6d51820ec519243fb9
("ext4,f2fs: fix readahead of verity data") to stable, 5.10 and later. It
cherry-picks cleanly to 6.0 and 5.15. I'll send it out manually for 5.10.
- Eric
Commit 056d3fed3d1f ("tee: add tee_shm_register_{user,kernel}_buf()")
refactored tee_shm_register() into corresponding user and kernel space
functions named tee_shm_register_{user,kernel}_buf(). The upstream fix
commit 573ae4f13f63 ("tee: add overflow check in register_shm_helper()")
only applied to tee_shm_register_user_buf().
But the stable kernel 4.19, 5.4, 5.10 and 5.15 don't have the above
mentioned tee_shm_register() refactoring commit. Hence a direct backport
wasn't possible and the fix has to be rather applied to
tee_ioctl_shm_register().
Somehow the fix was correctly backported to 4.19 and 5.4 stable kernels
but the backports for 5.10 and 5.15 stable kernels were broken as fix
was applied to common tee_shm_register() function which broke its kernel
space users such as trusted keys driver.
Fortunately the backport for 5.10 stable kernel was incidently fixed by:
commit 606fe84a4185 ("tee: fix memory leak in tee_shm_register()"). So
fix the backport for 5.15 stable kernel as well.
Fixes: 578c349570d2 ("tee: add overflow check in register_shm_helper()")
Cc: stable(a)vger.kernel.org # 5.15
Reported-by: Sahil Malhotra <sahil.malhotra(a)nxp.com>
Signed-off-by: Sumit Garg <sumit.garg(a)linaro.org>
---
drivers/tee/tee_core.c | 3 +++
drivers/tee/tee_shm.c | 3 ---
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
index 3fc426dad2df..a44e5b53e7a9 100644
--- a/drivers/tee/tee_core.c
+++ b/drivers/tee/tee_core.c
@@ -334,6 +334,9 @@ tee_ioctl_shm_register(struct tee_context *ctx,
if (data.flags)
return -EINVAL;
+ if (!access_ok((void __user *)(unsigned long)data.addr, data.length))
+ return -EFAULT;
+
shm = tee_shm_register(ctx, data.addr, data.length,
TEE_SHM_DMA_BUF | TEE_SHM_USER_MAPPED);
if (IS_ERR(shm))
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index bd96ebb82c8e..6fb4400333fb 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -223,9 +223,6 @@ struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr,
goto err;
}
- if (!access_ok((void __user *)addr, length))
- return ERR_PTR(-EFAULT);
-
mutex_lock(&teedev->mutex);
shm->id = idr_alloc(&teedev->idr, shm, 1, 0, GFP_KERNEL);
mutex_unlock(&teedev->mutex);
--
2.34.1
From: Xiubo Li <xiubli(a)redhat.com>
When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.
Cc: stable(a)vger.kernel.org
URL: https://tracker.ceph.com/issues/57686
Reviewed-by: Luís Henriques <lhenriques(a)suse.de>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/snap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 9bceed2ebda3..77b948846d4d 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -854,6 +854,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
else
ceph_put_snap_realm(mdsc, realm);
+ realm = NULL;
+
if (p < e)
goto more;
--
2.31.1
Dzień dobry,
dostarczamy kompletne, nowoczesne laptopy wraz z oprogramowaniem, gotowe do pracy.
Jako autoryzowany partner Dell oraz Lenovo z platynowym statusem zapewniamy nowy sprzęt z możliwością konfiguracji według potrzeb użytkowników.
Stale wyposażamy wiele dużych organizacji, m.in, Polkomtel, Netia, Beko, Canal+, BOŚ Bank w nowoczesnych sprzęt, który zapewnia bezpieczeństwo i ciągłość działania biznesu.
Wysoka dostępność sprzętu, doświadczenie we współpracy z wieloma Partnerami Państwa formatu pozwala nam zapewnić Państwu wydajność i ergonomię laptopów, a także inne, najlepsze parametry.
Mógłbym zaprezentować odpowiadające Państwu rozwiązania?
Pozdrawiam
Marek Laskowski
I'm announcing the release of the 4.14.297 kernel.
All users of the 4.14 kernel series must upgrade.
The updated 4.14.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.14.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/admin-guide/hw-vuln/spectre.rst | 8
Documentation/admin-guide/kernel-parameters.txt | 13
Makefile | 2
arch/x86/entry/calling.h | 68 +++
arch/x86/entry/entry_32.S | 2
arch/x86/entry/entry_64.S | 38 +-
arch/x86/entry/entry_64_compat.S | 12
arch/x86/include/asm/cpu_device_id.h | 168 +++++++++
arch/x86/include/asm/cpufeatures.h | 16
arch/x86/include/asm/intel-family.h | 6
arch/x86/include/asm/msr-index.h | 14
arch/x86/include/asm/nospec-branch.h | 48 +-
arch/x86/kernel/cpu/amd.c | 21 -
arch/x86/kernel/cpu/bugs.c | 415 +++++++++++++++++++-----
arch/x86/kernel/cpu/common.c | 68 ++-
arch/x86/kernel/cpu/match.c | 44 ++
arch/x86/kernel/cpu/scattered.c | 1
arch/x86/kernel/process.c | 2
arch/x86/kvm/svm.c | 1
arch/x86/kvm/vmx.c | 51 ++
drivers/base/cpu.c | 8
drivers/cpufreq/acpi-cpufreq.c | 1
drivers/cpufreq/amd_freq_sensitivity.c | 1
drivers/idle/intel_idle.c | 45 ++
include/linux/cpu.h | 2
include/linux/mod_devicetable.h | 4
tools/arch/x86/include/asm/cpufeatures.h | 1
27 files changed, 898 insertions(+), 162 deletions(-)
Alexandre Chartre (2):
x86/bugs: Report AMD retbleed vulnerability
x86/bugs: Add AMD retbleed= boot parameter
Andrew Cooper (1):
x86/cpu/amd: Enumerate BTC_NO
Daniel Sneddon (1):
x86/speculation: Add RSB VM Exit protections
Greg Kroah-Hartman (1):
Linux 4.14.297
Ingo Molnar (1):
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
Josh Poimboeuf (8):
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
x86/speculation: Fix firmware entry SPEC_CTRL handling
x86/speculation: Fix SPEC_CTRL write on SMT state change
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
x86/speculation: Remove x86_spec_ctrl_mask
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
KVM: VMX: Fix IBRS handling after vmexit
x86/speculation: Fill RSB on vmexit for IBRS
Kan Liang (1):
x86/cpufeature: Add facility to check for min microcode revisions
Mark Gross (1):
x86/cpu: Add a steppings field to struct x86_cpu_id
Nathan Chancellor (1):
x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Pawan Gupta (5):
x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS
x86/speculation: Add LFENCE to RSB fill sequence
x86/bugs: Add Cannon lake to RETBleed affected CPU list
x86/speculation: Disable RRSBA behavior
x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
Peter Zijlstra (9):
x86/entry: Remove skip_r11rcx
x86/cpufeatures: Move RETPOLINE flags to word 11
x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value
x86/bugs: Optimize SPEC_CTRL MSR writes
x86/bugs: Split spectre_v2_select_mitigation() and spectre_v2_user_select_mitigation()
x86/bugs: Report Intel retbleed vulnerability
entel_idle: Disable IBRS during long idle
x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
x86/common: Stamp out the stepping madness
Suraj Jitindar Singh (1):
Revert "x86/cpu: Add a steppings field to struct x86_cpu_id"
Thadeu Lima de Souza Cascardo (1):
x86/entry: Add kernel IBRS implementation
Thomas Gleixner (2):
x86/devicetable: Move x86 specific macro out of generic code
x86/cpu: Add consistent CPU match macros
Added a check on the return value of preallocate_image_highmem(). If
memory preallocate is insufficient, S4 cannot be done;
I am playing 4K video on a machine with AMD or other graphics card and
only 8GiB memory, and the kernel is not configured with CONFIG_HIGHMEM.
When doing the S4 test, the analysis found that when the pages get from
minimum_image_size() is large enough, The preallocate_image_memory() and
preallocate_image_highmem() calls failed to obtain enough memory. Add
the judgment that memory preallocate is insufficient;
The detailed debugging data is as follows:
image_size: 3225923584, totalram_pages: 1968948 in
hibernate_reserved_size_init();
in hibernate_preallocate_memory():
code pages = minimum_image_size(saveable) = 717992, at this time(line):
count: 2030858
avail_normal: 2053753
highmem: 0
totalreserve_pages: 22895
max_size: 1013336
size: 787579
saveable: 1819905
When the code executes to:
pages = preallocate_image_memory(alloc, avail_normal), at that
time(line):
pages_highmem: 0
avail_normal: 1335761
alloc: 1017522
pages: 1017522
So enter the else branch judged by (pages < alloc), When executed to
size = preallocate_image_memory(alloc, avail_normal):
alloc = max_size - size = 225757;
size = preallocate_image_memory(alloc, avail_normal) = 168671, That is,
preallocate_image_memory() does not apply for all alloc memory pages,
because highmem is not enabled, and size_highmem will return 0 here, so
there is a memory page that has not been preallocated, so I think a
judgment needs to be added here.
But what I can't understand is that although pages are not preallocated
enough, "pages -= free_unnecessary_pages()" in the code below can also
discard some pages that have been preallocated, so I am not sure whether
it is appropriate to add a judgment here.
Cc: stable(a)vger.kernel.org
Signed-off-by: xiongxin <xiongxin(a)kylinos.cn>
Signed-off-by: huanglei <huanglei(a)kylinos.cn>
---
kernel/power/snapshot.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index c20ca5fb9adc..546d544cf7de 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1854,6 +1854,8 @@ int hibernate_preallocate_memory(void)
alloc = (count - pages) - size;
pages += preallocate_image_highmem(alloc);
} else {
+ unsigned long size_highmem = 0;
+
/*
* There are approximately max_size saveable pages at this point
* and we want to reduce this number down to size.
@@ -1863,8 +1865,13 @@ int hibernate_preallocate_memory(void)
pages_highmem += size;
alloc -= size;
size = preallocate_image_memory(alloc, avail_normal);
- pages_highmem += preallocate_image_highmem(alloc - size);
- pages += pages_highmem + size;
+ size_highmem = preallocate_image_highmem(alloc - size);
+ if (size_highmem < (alloc - size)) {
+ pr_err("Image allocation is %lu pages short, exit\n",
+ alloc - size - pages_highmem);
+ goto err_out;
+ }
+ pages += pages_highmem + size_highmem + size;
}
/*
--
2.25.1
The actual calculation formula in the code below is:
max_size = (count - (size + PAGES_FOR_IO)) / 2
- 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE);
But function comments are written differently, the comment is wrong?
By the way, what exactly do the "/ 2" and "2 *" mean?
Cc: stable(a)vger.kernel.org
Signed-off-by: xiongxin <xiongxin(a)kylinos.cn>
---
kernel/power/snapshot.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 2a406753af90..c20ca5fb9adc 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1723,8 +1723,8 @@ static unsigned long minimum_image_size(unsigned long saveable)
* /sys/power/reserved_size, respectively). To make this happen, we compute the
* total number of available page frames and allocate at least
*
- * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2
- * + 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE)
+ * ([page frames total] - PAGES_FOR_IO - [metadata pages]) / 2
+ * - 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE)
*
* of them, which corresponds to the maximum size of a hibernation image.
*
--
2.25.1
Can i trust you on this transaction?
I have important business to discuss with you as soon as possible.
This proposition will be in the best interest of you and me both.
commit 573ae4f13f630d6660008f1974c0a8a29c30e18a upstream.
With special lengths supplied by user space, tee_shm_register() has
an integer overflow when calculating the number of pages covered by a
supplied user space memory region.
This may cause pin_user_pages_fast() to do a NULL pointer dereference.
Fix this by adding an an explicit call to access_ok() in
tee_ioctl_shm_register() to catch an invalid user space address early.
Fixes: 033ddf12bcf5 ("tee: add register user memory")
Cc: stable(a)vger.kernel.org # 5.4
Cc: stable(a)vger.kernel.org # 5.10
Reported-by: Nimish Mishra <neelam.nimish(a)gmail.com>
Reported-by: Anirban Chakraborty <ch.anirban00727(a)gmail.com>
Reported-by: Debdeep Mukhopadhyay <debdeep.mukhopadhyay(a)gmail.com>
Suggested-by: Jerome Forissier <jerome.forissier(a)linaro.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[JW: backport to stable 5.4 and 5.10 + update commit message]
Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
---
drivers/tee/tee_core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
index a7ccd4d2bd10..2db144d2d26f 100644
--- a/drivers/tee/tee_core.c
+++ b/drivers/tee/tee_core.c
@@ -182,6 +182,9 @@ tee_ioctl_shm_register(struct tee_context *ctx,
if (data.flags)
return -EINVAL;
+ if (!access_ok((void __user *)(unsigned long)data.addr, data.length))
+ return -EFAULT;
+
shm = tee_shm_register(ctx, data.addr, data.length,
TEE_SHM_DMA_BUF | TEE_SHM_USER_MAPPED);
if (IS_ERR(shm))
--
2.31.1
From: Xiubo Li <xiubli(a)redhat.com>
When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.
Cc: stable(a)vger.kernel.org
URL: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/snap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 9bceed2ebda3..baf17df05107 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -849,10 +849,12 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
if (realm_to_rebuild && p >= e)
rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
- if (!first_realm)
+ if (!first_realm) {
first_realm = realm;
- else
+ realm = NULL;
+ } else {
ceph_put_snap_realm(mdsc, realm);
+ }
if (p < e)
goto more;
--
2.31.1
Hi Ted,
Sorry to bother you with a quick question: I am doing a patch for configurable lazytime for EXT4, Before submit the patch I would like to double check the names first. I added a "lazytime_flush=xxx" in mount show options, not sure if this is ok to continue.
As far as I know, lazytime is an option for EXT4 and other file systems. My patch make ext4 can be supported as before: without any parameters E.g. mount -o lazytime,...
Or can be supported with parameters (in ms) E.g. mount -o lazytime=30, ...
So when we show the options by mount -t ext4, if no parameter configured, the result shows as before:
/dev/nvme1n1 on /rdsdbdata1 type ext4 (rw,noatime,nodiratime,lazytime,data=ordered)
If has parameters configured, the result shows as below, added one "lazytime_flush=xxx". Not sure if this sound ok for you?
/dev/nvme1n1 on /rdsdbdata1 type ext4 (rw,noatime,nodiratime,lazytime,lazytime_flush=100ms,data=ordered)
I tested "ext4/053"of latest xfstest-dev with my patch and it can pass the result(I changed a bit 053 since I am running on 5.10, removed one mount option" no_prefetch_block_bitmaps" that 5.10 doesn't support).
Thanks
Davina
The patch titled
Subject: mm/damon/dbgfs: check if rm_contexts input is for a real context
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-dbgfs-check-if-rm_contexts-input-is-for-a-real-context.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/dbgfs: check if rm_contexts input is for a real context
Date: Mon, 7 Nov 2022 16:50:00 +0000
A user could write a name of a file under 'damon/' debugfs directory,
which is not a user-created context, to 'rm_contexts' file. In the case,
'dbgfs_rm_context()' just assumes it's the valid DAMON context directory
only if a file of the name exist. As a result, invalid memory access
could happen as below. Fix the bug by checking if the given input is for
a directory. This check can filter out non-context inputs because
directories under 'damon/' debugfs directory can be created via only
'mk_contexts' file.
This bug has found by syzbot[1].
[1] https://lore.kernel.org/damon/000000000000ede3ac05ec4abf8e@google.com/
Link: https://lkml.kernel.org/r/20221107165001.5717-2-sj@kernel.org
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: syzbot+6087eafb76a94c4ac9eb(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org> [5.15.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/dbgfs.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/mm/damon/dbgfs.c~mm-damon-dbgfs-check-if-rm_contexts-input-is-for-a-real-context
+++ a/mm/damon/dbgfs.c
@@ -890,6 +890,7 @@ out:
static int dbgfs_rm_context(char *name)
{
struct dentry *root, *dir, **new_dirs;
+ struct inode *inode;
struct damon_ctx **new_ctxs;
int i, j;
int ret = 0;
@@ -905,6 +906,12 @@ static int dbgfs_rm_context(char *name)
if (!dir)
return -ENOENT;
+ inode = d_inode(dir);
+ if (!S_ISDIR(inode->i_mode)) {
+ ret = -EINVAL;
+ goto out_dput;
+ }
+
new_dirs = kmalloc_array(dbgfs_nr_ctxs - 1, sizeof(*dbgfs_dirs),
GFP_KERNEL);
if (!new_dirs) {
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-dbgfs-check-if-rm_contexts-input-is-for-a-real-context.patch
mm-damon-core-split-out-damos-charged-region-skip-logic-into-a-new-function.patch
mm-damon-core-split-damos-application-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-stat-update-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-quota-adjustment-logic-into-a-new-function.patch
mm-damon-sysfs-use-damon_addr_range-for-regions-start-and-end-values.patch
mm-damon-sysfs-remove-parameters-of-damon_sysfs_region_alloc.patch
mm-damon-sysfs-move-sysfs_lock-to-common-module.patch
mm-damon-sysfs-move-unsigned-long-range-directory-to-common-module.patch
mm-damon-sysfs-split-out-kdamond-independent-schemes-stats-update-logic-into-a-new-function.patch
mm-damon-sysfs-split-out-schemes-directory-implementation-to-separate-file.patch
mm-damon-modules-deduplicate-init-steps-for-damon-context-setup.patch
mm-damon-reclaimlru_sort-remove-unnecessarily-included-headers.patch
mm-damon-reclaim-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_reclaims-enabled-parameter.patch
mm-damon-lru_sort-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_lru_sorts-enabled-parameter.patch
docs-admin-guide-mm-damon-usage-describe-the-rules-of-sysfs-region-directories.patch
docs-admin-guide-mm-damon-usage-fix-wrong-usage-example-of-init_regions-file.patch
mm-damon-core-add-a-callback-for-scheme-target-regions-check.patch
mm-damon-sysfs-schemes-implement-schemes-tried_regions-directory.patch
mm-damon-sysfs-schemes-implement-scheme-region-directory.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command.patch
tools-selftets-damon-sysfs-test-tried_regions-directory-existence.patch
docs-admin-guide-mm-damon-usage-document-schemes-s-tried_regions-sysfs-directory.patch
docs-abi-damon-document-schemes-s-tried_regions-sysfs-directory.patch
selftests-damon-test-non-context-inputs-to-rm_contexts-file.patch
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear the
page tables associated with the address range. For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final. However,
__unmap_hugepage_range_final assumes the passed vma is about to be
removed and deletes the vma_lock to prevent pmd sharing as the vma is
on the way out. In the case of madvise(MADV_DONTNEED) the vma remains,
but the missing vma_lock prevents pmd sharing and could potentially
lead to issues with truncation/fault races.
This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap. Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing. Subsequent faults on this vma were confused as
VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping
was not set in new pages added to the page table. This resulted in
pages that appeared anonymous in a VM_SHARED vma and triggered the BUG.
Create a new routine clear_hugetlb_page_range() that can be called from
madvise(MADV_DONTNEED) for hugetlb vmas. It has the same setup as
zap_page_range, but does not delete the vma_lock.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
---
v2 - Fix build issues associated with !CONFIG_ADVISE_SYSCALLS
include/linux/hugetlb.h | 7 +++++
mm/hugetlb.c | 62 +++++++++++++++++++++++++++++++++--------
mm/madvise.c | 5 +++-
3 files changed, 61 insertions(+), 13 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a899bc76d677..0246e77be3a3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -158,6 +158,8 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *,
zap_flags_t);
+void clear_hugetlb_page_range(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end);
void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
@@ -426,6 +428,11 @@ static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
BUG();
}
+static void __maybe_unused clear_hugetlb_page_range(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end)
+{
+}
+
static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address,
unsigned int flags)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 931789a8f734..807cfd2884fa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5202,29 +5202,67 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
tlb_flush_mmu_tlbonly(tlb);
}
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+static void __unmap_hugepage_range_locking(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page,
- zap_flags_t zap_flags)
+ zap_flags_t zap_flags, bool final)
{
hugetlb_vma_lock_write(vma);
i_mmap_lock_write(vma->vm_file->f_mapping);
__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
- /*
- * Unlock and free the vma lock before releasing i_mmap_rwsem. When
- * the vma_lock is freed, this makes the vma ineligible for pmd
- * sharing. And, i_mmap_rwsem is required to set up pmd sharing.
- * This is important as page tables for this unmapped range will
- * be asynchrously deleted. If the page tables are shared, there
- * will be issues when accessed by someone else.
- */
- __hugetlb_vma_unlock_write_free(vma);
+ if (final) {
+ /*
+ * Unlock and free the vma lock before releasing i_mmap_rwsem.
+ * When the vma_lock is freed, this makes the vma ineligible
+ * for pmd sharing. And, i_mmap_rwsem is required to set up
+ * pmd sharing. This is important as page tables for this
+ * unmapped range will be asynchrously deleted. If the page
+ * tables are shared, there will be issues when accessed by
+ * someone else.
+ */
+ __hugetlb_vma_unlock_write_free(vma);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ } else {
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ hugetlb_vma_unlock_write(vma);
+ }
+}
- i_mmap_unlock_write(vma->vm_file->f_mapping);
+void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+ struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, struct page *ref_page,
+ zap_flags_t zap_flags)
+{
+ __unmap_hugepage_range_locking(tlb, vma, start, end, ref_page,
+ zap_flags, true);
}
+#ifdef CONFIG_ADVISE_SYSCALLS
+/*
+ * Similar setup as in zap_page_range(). madvise(MADV_DONTNEED) can not call
+ * zap_page_range for hugetlb vmas as __unmap_hugepage_range_final will delete
+ * the associated vma_lock.
+ */
+void clear_hugetlb_page_range(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end)
+{
+ struct mmu_notifier_range range;
+ struct mmu_gather tlb;
+
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+ start, end);
+ tlb_gather_mmu(&tlb, vma->vm_mm);
+ update_hiwater_rss(vma->vm_mm);
+
+ __unmap_hugepage_range_locking(&tlb, vma, start, end, NULL, 0, false);
+
+ mmu_notifier_invalidate_range_end(&range);
+ tlb_finish_mmu(&tlb);
+}
+#endif
+
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page,
zap_flags_t zap_flags)
diff --git a/mm/madvise.c b/mm/madvise.c
index 2baa93ca2310..90577a669635 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -790,7 +790,10 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- zap_page_range(vma, start, end - start);
+ if (!is_vm_hugetlb_page(vma))
+ zap_page_range(vma, start, end - start);
+ else
+ clear_hugetlb_page_range(vma, start, end);
return 0;
}
--
2.37.3
The patch titled
Subject: arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
arch-x86-mm-hugetlbpagec-pud_huge-returns-0-when-using-2-level-paging.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Subject: arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging
Date: Mon, 7 Nov 2022 11:10:10 +0900
The following bug is reported to be triggered when starting X on x86-32
system with i915:
[ 225.777375] kernel BUG at mm/memory.c:2664!
[ 225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
[ 225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
[ 225.777415] Hardware name: /8I865G775-G, BIOS F1 08/29/2006
[ 225.777421] EIP: __apply_to_page_range+0x24d/0x31c
[ 225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
[ 225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
[ 225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
[ 225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
[ 225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
[ 225.777479] Call Trace:
[ 225.777486] ? i915_memcpy_init_early+0x63/0x63 [i915]
[ 225.777684] apply_to_page_range+0x21/0x27
[ 225.777694] ? i915_memcpy_init_early+0x63/0x63 [i915]
[ 225.777870] remap_io_mapping+0x49/0x75 [i915]
[ 225.778046] ? i915_memcpy_init_early+0x63/0x63 [i915]
[ 225.778220] ? mutex_unlock+0xb/0xd
[ 225.778231] ? i915_vma_pin_fence+0x6d/0xf7 [i915]
[ 225.778420] vm_fault_gtt+0x2a9/0x8f1 [i915]
[ 225.778644] ? lock_is_held_type+0x56/0xe7
[ 225.778655] ? lock_is_held_type+0x7a/0xe7
[ 225.778663] ? 0xc1000000
[ 225.778670] __do_fault+0x21/0x6a
[ 225.778679] handle_mm_fault+0x708/0xb21
[ 225.778686] ? mt_find+0x21e/0x5ae
[ 225.778696] exc_page_fault+0x185/0x705
[ 225.778704] ? doublefault_shim+0x127/0x127
[ 225.778715] handle_exception+0x130/0x130
[ 225.778723] EIP: 0xb700468a
Recently pud_huge() got aware of non-present entry by commit 3a194f3f8ad0
("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present
pud entry") to handle some special states of gigantic page. However, it's
overlooked that pud_none() always returns false when running with 2-level
paging, and as a result pmd_huge() can return true pointlessly.
Introduce "#if CONFIG_PGTABLE_LEVELS > 2" to pud_huge() to deal with this.
Link: https://lkml.kernel.org/r/20221107021010.2449306-1-naoya.horiguchi@linux.dev
Fixes: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Ville Syrj��l�� <ville.syrjala(a)linux.intel.com>
Tested-by: Ville Syrj��l�� <ville.syrjala(a)linux.intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/mm/hugetlbpage.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/x86/mm/hugetlbpage.c~arch-x86-mm-hugetlbpagec-pud_huge-returns-0-when-using-2-level-paging
+++ a/arch/x86/mm/hugetlbpage.c
@@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
*/
int pud_huge(pud_t pud)
{
+#if CONFIG_PGTABLE_LEVELS > 2
return !pud_none(pud) &&
(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
+#else
+ return 0;
+#endif
}
#ifdef CONFIG_HUGETLB_PAGE
_
Patches currently in -mm which might be from naoya.horiguchi(a)nec.com are
arch-x86-mm-hugetlbpagec-pud_huge-returns-0-when-using-2-level-paging.patch
mmhwpoisonhugetlbmemory_hotplug-hotremove-memory-section-with-hwpoisoned-hugepage.patch
mm-hwpoison-move-definitions-of-num_poisoned_pages_-to-memory-failurec.patch
mm-hwpoison-pass-pfn-to-num_poisoned_pages_.patch
mm-hwpoison-introduce-per-memory_block-hwpoison-counter.patch
This change is very similar to the change that was made for shmem [1],
and it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
---
fs/hugetlbfs/inode.c | 13 ++++++-------
mm/hugetlb.c | 4 ++++
mm/memory-failure.c | 5 ++++-
3 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
--
2.38.1.431.g37b22c650d-goog
Explicitly print the VMSA dump at KERN_DEBUG log level, KERN_CONT uses
KERNEL_DEFAULT if the previous log line has a newline, i.e. if there's
nothing to continuing, and as a result the VMSA gets dumped when it
shouldn't.
The KERN_CONT documentation says it defaults back to KERNL_DEFAULT if the
previous log line has a newline. So switch from KERN_CONT to
print_hex_dump_debug().
Jarkko pointed this out in reference to the original patch. See:
https://lore.kernel.org/all/YuPMeWX4uuR1Tz3M@kernel.org/
print_hex_dump(KERN_DEBUG, ...) was pointed out there, but
print_hex_dump_debug() should similar.
Fixes: 6fac42f127b8 ("KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog")
Signed-off-by: Peter Gonda <pgonda(a)google.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Harald Hoyer <harald(a)profian.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: x86(a)kernel.org
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: kvm(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
arch/x86/kvm/svm/sev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c0c9ed5e279cb..9b8db157cf773 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -605,7 +605,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->dr6 = svm->vcpu.arch.dr6;
pr_debug("Virtual Machine Save Area (VMSA):\n");
- print_hex_dump(KERN_CONT, "", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
+ print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
return 0;
}
--
2.38.1.431.g37b22c650d-goog
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
055f37f84e30 ("KVM: x86: emulator: update the emulation mode after rsm")
0128116550ac ("KVM: x86: Drop .post_leave_smm(), i.e. the manual post-RSM MMU reset")
fa75e08bbe4f ("KVM: x86: Invoke kvm_smm_changed() immediately after clearing SMM flag")
edce46548b70 ("KVM: x86: Replace .set_hflags() with dedicated .exiting_smm() helper")
25b17226cd9a ("KVM: x86: Emulate triple fault shutdown if RSM emulation fails")
78fcb2c91adf ("KVM: x86: Immediately reset the MMU context when the SMM flag is cleared")
02d4160fbd76 ("x86: KVM: add xsetbv to the emulator")
9ec19493fb86 ("KVM: x86: clear SMM flags before loading state while leaving SMM")
c5833c7a43a6 ("KVM: x86: Open code kvm_set_hflags")
ed19321fb657 ("KVM: x86: Load SMRAM in a single shot when leaving SMM")
a821bab2d1ee ("KVM: VMX: Move VMX specific files to a "vmx" subdirectory")
a633e41e7362 ("KVM: nVMX: assimilate nested_vmx_entry_failure() into nested_vmx_enter_non_root_mode()")
7671ce21b13b ("KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()")
d63907dc7dd1 ("KVM: nVMX: rename enter_vmx_non_root_mode to nested_vmx_enter_non_root_mode")
7e7126846c95 ("kvm: nVMX: fix entry with pending interrupt if APICv is enabled")
e6c67d8cf117 ("KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv")
b5861e5cf2fc ("KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2")
61ada7488ffd ("KVM: nVMX: Cache shadow vmcs12 on VMEntry and flush to memory on VMExit")
8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
7f7f1ba33cf2 ("KVM: x86: do not load vmcs12 pages while still in SMM")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 055f37f84e304e59c046d1accfd8f08462f52c4c Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 25 Oct 2022 15:47:30 +0300
Subject: [PATCH] KVM: x86: emulator: update the emulation mode after rsm
Update the emulation mode after RSM so that RIP will be correctly
written back, because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.
This fixes a guest crash in case the #SMI is received while the guest
runs a code from an address > 32 bit.
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221025124741.228045-13-mlevitsk(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5522a23d985..33385ebae100 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2662,7 +2662,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
* those side effects need to be explicitly handled for both success
* and shutdown.
*/
- return X86EMUL_CONTINUE;
+ return emulator_recalc_and_set_mode(ctxt);
emulate_shutdown:
ctxt->ops->triple_fault(ctxt);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
055f37f84e30 ("KVM: x86: emulator: update the emulation mode after rsm")
0128116550ac ("KVM: x86: Drop .post_leave_smm(), i.e. the manual post-RSM MMU reset")
fa75e08bbe4f ("KVM: x86: Invoke kvm_smm_changed() immediately after clearing SMM flag")
edce46548b70 ("KVM: x86: Replace .set_hflags() with dedicated .exiting_smm() helper")
25b17226cd9a ("KVM: x86: Emulate triple fault shutdown if RSM emulation fails")
78fcb2c91adf ("KVM: x86: Immediately reset the MMU context when the SMM flag is cleared")
02d4160fbd76 ("x86: KVM: add xsetbv to the emulator")
9ec19493fb86 ("KVM: x86: clear SMM flags before loading state while leaving SMM")
c5833c7a43a6 ("KVM: x86: Open code kvm_set_hflags")
ed19321fb657 ("KVM: x86: Load SMRAM in a single shot when leaving SMM")
a821bab2d1ee ("KVM: VMX: Move VMX specific files to a "vmx" subdirectory")
a633e41e7362 ("KVM: nVMX: assimilate nested_vmx_entry_failure() into nested_vmx_enter_non_root_mode()")
7671ce21b13b ("KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()")
d63907dc7dd1 ("KVM: nVMX: rename enter_vmx_non_root_mode to nested_vmx_enter_non_root_mode")
7e7126846c95 ("kvm: nVMX: fix entry with pending interrupt if APICv is enabled")
e6c67d8cf117 ("KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv")
b5861e5cf2fc ("KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2")
61ada7488ffd ("KVM: nVMX: Cache shadow vmcs12 on VMEntry and flush to memory on VMExit")
8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
7f7f1ba33cf2 ("KVM: x86: do not load vmcs12 pages while still in SMM")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 055f37f84e304e59c046d1accfd8f08462f52c4c Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 25 Oct 2022 15:47:30 +0300
Subject: [PATCH] KVM: x86: emulator: update the emulation mode after rsm
Update the emulation mode after RSM so that RIP will be correctly
written back, because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.
This fixes a guest crash in case the #SMI is received while the guest
runs a code from an address > 32 bit.
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221025124741.228045-13-mlevitsk(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5522a23d985..33385ebae100 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2662,7 +2662,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
* those side effects need to be explicitly handled for both success
* and shutdown.
*/
- return X86EMUL_CONTINUE;
+ return emulator_recalc_and_set_mode(ctxt);
emulate_shutdown:
ctxt->ops->triple_fault(ctxt);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
055f37f84e30 ("KVM: x86: emulator: update the emulation mode after rsm")
0128116550ac ("KVM: x86: Drop .post_leave_smm(), i.e. the manual post-RSM MMU reset")
fa75e08bbe4f ("KVM: x86: Invoke kvm_smm_changed() immediately after clearing SMM flag")
edce46548b70 ("KVM: x86: Replace .set_hflags() with dedicated .exiting_smm() helper")
25b17226cd9a ("KVM: x86: Emulate triple fault shutdown if RSM emulation fails")
78fcb2c91adf ("KVM: x86: Immediately reset the MMU context when the SMM flag is cleared")
02d4160fbd76 ("x86: KVM: add xsetbv to the emulator")
9ec19493fb86 ("KVM: x86: clear SMM flags before loading state while leaving SMM")
c5833c7a43a6 ("KVM: x86: Open code kvm_set_hflags")
ed19321fb657 ("KVM: x86: Load SMRAM in a single shot when leaving SMM")
a821bab2d1ee ("KVM: VMX: Move VMX specific files to a "vmx" subdirectory")
a633e41e7362 ("KVM: nVMX: assimilate nested_vmx_entry_failure() into nested_vmx_enter_non_root_mode()")
7671ce21b13b ("KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()")
d63907dc7dd1 ("KVM: nVMX: rename enter_vmx_non_root_mode to nested_vmx_enter_non_root_mode")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 055f37f84e304e59c046d1accfd8f08462f52c4c Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 25 Oct 2022 15:47:30 +0300
Subject: [PATCH] KVM: x86: emulator: update the emulation mode after rsm
Update the emulation mode after RSM so that RIP will be correctly
written back, because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.
This fixes a guest crash in case the #SMI is received while the guest
runs a code from an address > 32 bit.
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221025124741.228045-13-mlevitsk(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5522a23d985..33385ebae100 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2662,7 +2662,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
* those side effects need to be explicitly handled for both success
* and shutdown.
*/
- return X86EMUL_CONTINUE;
+ return emulator_recalc_and_set_mode(ctxt);
emulate_shutdown:
ctxt->ops->triple_fault(ctxt);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
055f37f84e30 ("KVM: x86: emulator: update the emulation mode after rsm")
0128116550ac ("KVM: x86: Drop .post_leave_smm(), i.e. the manual post-RSM MMU reset")
fa75e08bbe4f ("KVM: x86: Invoke kvm_smm_changed() immediately after clearing SMM flag")
edce46548b70 ("KVM: x86: Replace .set_hflags() with dedicated .exiting_smm() helper")
25b17226cd9a ("KVM: x86: Emulate triple fault shutdown if RSM emulation fails")
78fcb2c91adf ("KVM: x86: Immediately reset the MMU context when the SMM flag is cleared")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 055f37f84e304e59c046d1accfd8f08462f52c4c Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 25 Oct 2022 15:47:30 +0300
Subject: [PATCH] KVM: x86: emulator: update the emulation mode after rsm
Update the emulation mode after RSM so that RIP will be correctly
written back, because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.
This fixes a guest crash in case the #SMI is received while the guest
runs a code from an address > 32 bit.
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221025124741.228045-13-mlevitsk(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5522a23d985..33385ebae100 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2662,7 +2662,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
* those side effects need to be explicitly handled for both success
* and shutdown.
*/
- return X86EMUL_CONTINUE;
+ return emulator_recalc_and_set_mode(ctxt);
emulate_shutdown:
ctxt->ops->triple_fault(ctxt);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
055f37f84e30 ("KVM: x86: emulator: update the emulation mode after rsm")
0128116550ac ("KVM: x86: Drop .post_leave_smm(), i.e. the manual post-RSM MMU reset")
fa75e08bbe4f ("KVM: x86: Invoke kvm_smm_changed() immediately after clearing SMM flag")
edce46548b70 ("KVM: x86: Replace .set_hflags() with dedicated .exiting_smm() helper")
25b17226cd9a ("KVM: x86: Emulate triple fault shutdown if RSM emulation fails")
78fcb2c91adf ("KVM: x86: Immediately reset the MMU context when the SMM flag is cleared")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 055f37f84e304e59c046d1accfd8f08462f52c4c Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 25 Oct 2022 15:47:30 +0300
Subject: [PATCH] KVM: x86: emulator: update the emulation mode after rsm
Update the emulation mode after RSM so that RIP will be correctly
written back, because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.
This fixes a guest crash in case the #SMI is received while the guest
runs a code from an address > 32 bit.
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221025124741.228045-13-mlevitsk(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5522a23d985..33385ebae100 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2662,7 +2662,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
* those side effects need to be explicitly handled for both success
* and shutdown.
*/
- return X86EMUL_CONTINUE;
+ return emulator_recalc_and_set_mode(ctxt);
emulate_shutdown:
ctxt->ops->triple_fault(ctxt);
A user could write a name of a file under 'damon/' debugfs directory,
which is not a user-created context, to 'rm_contexts' file. In the
case, 'dbgfs_rm_context()' just assumes it's the valid DAMON context
directory only if a file of the name exist. As a result, invalid memory
access could happen as below. Fix the bug by checking if the given
input is for a directory. This check can filter out non-context inputs
because directories under 'damon/' debugfs directory can be created via
only 'mk_contexts' file.
This bug has found by syzbot[1].
[1] https://lore.kernel.org/damon/000000000000ede3ac05ec4abf8e@google.com/
Reported-by: syzbot+6087eafb76a94c4ac9eb(a)syzkaller.appspotmail.com
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Cc: <stable(a)vger.kernel.org> # 5.15.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/dbgfs.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 6f0ae7d3ae39..b3f454a5c682 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -890,6 +890,7 @@ static ssize_t dbgfs_mk_context_write(struct file *file,
static int dbgfs_rm_context(char *name)
{
struct dentry *root, *dir, **new_dirs;
+ struct inode *inode;
struct damon_ctx **new_ctxs;
int i, j;
int ret = 0;
@@ -905,6 +906,12 @@ static int dbgfs_rm_context(char *name)
if (!dir)
return -ENOENT;
+ inode = d_inode(dir);
+ if (!S_ISDIR(inode->i_mode)) {
+ ret = -EINVAL;
+ goto out_dput;
+ }
+
new_dirs = kmalloc_array(dbgfs_nr_ctxs - 1, sizeof(*dbgfs_dirs),
GFP_KERNEL);
if (!new_dirs) {
--
2.25.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
b333b8ebb85d ("KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL")
18e897d213cb ("KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()")
2f4073e08f4c ("KVM: VMX: Enable Notify VM exit")
938c8745bcf2 ("KVM: x86: Introduce "struct kvm_caps" to track misc caps/settings")
ed2351174e38 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")
35875316384b ("KVM: x86: Allow userspace to set maximum VCPU id for VM")
e9bf3acb23f0 ("KVM: s390: Add KVM_CAP_S390_PROTECTED_DUMP")
8aba09588d2a ("KVM: s390: Add CPU dump functionality")
0460eb35b443 ("KVM: s390: Add configuration dump functionality")
35d02493dba1 ("KVM: s390: pv: Add query interface")
47e8eec83262 ("Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b333b8ebb85d62469f32b52fa03fd7d1522afc03 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 6 Oct 2022 00:03:10 +0000
Subject: [PATCH] KVM: VMX: Ignore guest CPUID for host userspace writes to
DEBUGCTL
Ignore guest CPUID for host userspace writes to the DEBUGCTL MSR, KVM's
ABI is that setting CPUID vs. state can be done in any order, i.e. KVM
allows userspace to stuff MSRs prior to setting the guest's CPUID that
makes the new MSR "legal".
Keep the vmx_get_perf_capabilities() check for guest writes, even though
it's technically unnecessary since the vCPU's PERF_CAPABILITIES is
consulted when refreshing LBR support. A future patch will clean up
vmx_get_perf_capabilities() to avoid the RDMSR on every call, at which
point the paranoia will incur no meaningful overhead.
Note, prior to vmx_get_perf_capabilities() checking that the host fully
supports LBRs via x86_perf_get_lbr(), KVM effectively relied on
intel_pmu_lbr_is_enabled() to guard against host userspace enabling LBRs
on platforms without full support.
Fixes: c646236344e9 ("KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20221006000314.73240-5-seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 981b38355066..63247c57c72c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2021,16 +2021,16 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
+static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
- guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ (host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) &&
- intel_pmu_lbr_is_enabled(vcpu))
+ (host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
return debugctl;
@@ -2105,7 +2105,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmcs_writel(GUEST_SYSENTER_ESP, data);
break;
case MSR_IA32_DEBUGCTLMSR: {
- u64 invalid = data & ~vcpu_supported_debugctl(vcpu);
+ u64 invalid;
+
+ invalid = data & ~vmx_get_supported_debugctl(vcpu, msr_info->host_initiated);
if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
if (report_ignored_msrs)
vcpu_unimpl(vcpu, "%s: BTF|LBR in IA32_DEBUGCTLMSR 0x%llx, nop\n",
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
18e897d213cb ("KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()")
2f4073e08f4c ("KVM: VMX: Enable Notify VM exit")
938c8745bcf2 ("KVM: x86: Introduce "struct kvm_caps" to track misc caps/settings")
ed2351174e38 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")
35875316384b ("KVM: x86: Allow userspace to set maximum VCPU id for VM")
e9bf3acb23f0 ("KVM: s390: Add KVM_CAP_S390_PROTECTED_DUMP")
8aba09588d2a ("KVM: s390: Add CPU dump functionality")
0460eb35b443 ("KVM: s390: Add configuration dump functionality")
35d02493dba1 ("KVM: s390: pv: Add query interface")
47e8eec83262 ("Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 18e897d213cb152c786abab14919196bd9dc3a9f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 6 Oct 2022 00:03:09 +0000
Subject: [PATCH] KVM: VMX: Fold vmx_supported_debugctl() into
vcpu_supported_debugctl()
Fold vmx_supported_debugctl() into vcpu_supported_debugctl(), its only
caller. Setting bits only to clear them a few instructions later is
rather silly, and splitting the logic makes things seem more complicated
than they actually are.
Opportunistically drop DEBUGCTLMSR_LBR_MASK now that there's a single
reference to the pair of bits. The extra layer of indirection provides
no meaningful value and makes it unnecessarily tedious to understand
what KVM is doing.
No functional change.
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20221006000314.73240-4-seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 3bd7a8970618..07254314f3dd 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -24,8 +24,6 @@ extern int __read_mostly pt_mode;
#define PMU_CAP_FW_WRITES (1ULL << 13)
#define PMU_CAP_LBR_FMT 0x3f
-#define DEBUGCTLMSR_LBR_MASK (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
-
struct nested_vmx_msrs {
/*
* We only store the "true" versions of the VMX capability MSRs. We
@@ -421,19 +419,6 @@ static inline u64 vmx_get_perf_capabilities(void)
return perf_cap;
}
-static inline u64 vmx_supported_debugctl(void)
-{
- u64 debugctl = 0;
-
- if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
- debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
-
- if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
- debugctl |= DEBUGCTLMSR_LBR_MASK;
-
- return debugctl;
-}
-
static inline bool cpu_has_notify_vmexit(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 65f092e4a81b..981b38355066 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2023,13 +2023,15 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
{
- u64 debugctl = vmx_supported_debugctl();
+ u64 debugctl = 0;
- if (!intel_pmu_lbr_is_enabled(vcpu))
- debugctl &= ~DEBUGCTLMSR_LBR_MASK;
+ if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
- if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
- debugctl &= ~DEBUGCTLMSR_BUS_LOCK_DETECT;
+ if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) &&
+ intel_pmu_lbr_is_enabled(vcpu))
+ debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
return debugctl;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
18e897d213cb ("KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()")
2f4073e08f4c ("KVM: VMX: Enable Notify VM exit")
938c8745bcf2 ("KVM: x86: Introduce "struct kvm_caps" to track misc caps/settings")
ed2351174e38 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")
35875316384b ("KVM: x86: Allow userspace to set maximum VCPU id for VM")
e9bf3acb23f0 ("KVM: s390: Add KVM_CAP_S390_PROTECTED_DUMP")
8aba09588d2a ("KVM: s390: Add CPU dump functionality")
0460eb35b443 ("KVM: s390: Add configuration dump functionality")
35d02493dba1 ("KVM: s390: pv: Add query interface")
47e8eec83262 ("Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 18e897d213cb152c786abab14919196bd9dc3a9f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 6 Oct 2022 00:03:09 +0000
Subject: [PATCH] KVM: VMX: Fold vmx_supported_debugctl() into
vcpu_supported_debugctl()
Fold vmx_supported_debugctl() into vcpu_supported_debugctl(), its only
caller. Setting bits only to clear them a few instructions later is
rather silly, and splitting the logic makes things seem more complicated
than they actually are.
Opportunistically drop DEBUGCTLMSR_LBR_MASK now that there's a single
reference to the pair of bits. The extra layer of indirection provides
no meaningful value and makes it unnecessarily tedious to understand
what KVM is doing.
No functional change.
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20221006000314.73240-4-seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 3bd7a8970618..07254314f3dd 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -24,8 +24,6 @@ extern int __read_mostly pt_mode;
#define PMU_CAP_FW_WRITES (1ULL << 13)
#define PMU_CAP_LBR_FMT 0x3f
-#define DEBUGCTLMSR_LBR_MASK (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
-
struct nested_vmx_msrs {
/*
* We only store the "true" versions of the VMX capability MSRs. We
@@ -421,19 +419,6 @@ static inline u64 vmx_get_perf_capabilities(void)
return perf_cap;
}
-static inline u64 vmx_supported_debugctl(void)
-{
- u64 debugctl = 0;
-
- if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
- debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
-
- if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
- debugctl |= DEBUGCTLMSR_LBR_MASK;
-
- return debugctl;
-}
-
static inline bool cpu_has_notify_vmexit(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 65f092e4a81b..981b38355066 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2023,13 +2023,15 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
{
- u64 debugctl = vmx_supported_debugctl();
+ u64 debugctl = 0;
- if (!intel_pmu_lbr_is_enabled(vcpu))
- debugctl &= ~DEBUGCTLMSR_LBR_MASK;
+ if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
- if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
- debugctl &= ~DEBUGCTLMSR_BUS_LOCK_DETECT;
+ if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) &&
+ intel_pmu_lbr_is_enabled(vcpu))
+ debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
return debugctl;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
145dfad998ea ("KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs")
cf8e55fe50df ("KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64")
4732f2444acd ("KVM: x86: Making the module parameter of vPMU more common")
b1d66dad65dc ("KVM: x86/svm: Add module param to control PMU virtualization")
f800650a4ed2 ("KVM: x86: SVM: add module param to control TSC scaling")
4c84926e229e ("KVM: x86: SVM: add module param to control LBR virtualization")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 145dfad998eac74abc59219d936e905766ba2d98 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 6 Oct 2022 00:03:08 +0000
Subject: [PATCH] KVM: VMX: Advertise PMU LBRs if and only if perf supports
LBRs
Advertise LBR support to userspace via MSR_IA32_PERF_CAPABILITIES if and
only if perf fully supports LBRs. Perf may disable LBRs (by zeroing the
number of LBRs) even on platforms the allegedly support LBRs, e.g. if
probing any LBR MSRs during setup fails.
Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES")
Reported-by: Like Xu <like.xu.linux(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20221006000314.73240-3-seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 87c4e46daf37..3bd7a8970618 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -400,6 +400,7 @@ static inline bool vmx_pebs_supported(void)
static inline u64 vmx_get_perf_capabilities(void)
{
u64 perf_cap = PMU_CAP_FW_WRITES;
+ struct x86_pmu_lbr lbr;
u64 host_perf_cap = 0;
if (!enable_pmu)
@@ -408,7 +409,8 @@ static inline u64 vmx_get_perf_capabilities(void)
if (boot_cpu_has(X86_FEATURE_PDCM))
rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
- perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+ if (x86_perf_get_lbr(&lbr) >= 0 && lbr.nr)
+ perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
if (vmx_pebs_supported()) {
perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
86c4f0d547f6 ("KVM: x86: Mask off reserved bits in CPUID.8000001FH")
e39f00f60ebd ("KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR")
4bf48e3c0aaf ("KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled")
d9db0fd6c5c9 ("KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features")
013380782d4d ("KVM: x86: Move reverse CPUID helpers to separate header file")
01de8682b32d ("KVM: x86: Add reverse-CPUID lookup support for scattered SGX features")
4e66c0cb79b7 ("KVM: x86: Add support for reverse CPUID lookup of scattered features")
e42033342293 ("KVM: x86: Advertise INVPCID by default")
916391a2d1dc ("KVM: SVM: Add support for SEV-ES capability in KVM")
9d4747d02376 ("KVM: SVM: Remove the call to sev_platform_status() during setup")
dc46515cf838 ("KVM: x86: Move illegal GPA helper out of the MMU code")
1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept")
ec7771ab471b ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
cd313569f581 ("KVM: x86: mmu: Move translate_gpa() to mmu.c")
985ab2780164 ("KVM: x86/mmu: Make kvm_mmu_page definition and accessor internal-only")
6ca9a6f3adef ("KVM: x86/mmu: Add MMU-internal header")
06e7852c0ffb ("KVM: SVM: Add vmcb_ prefix to mark_*() functions")
f25a9dec2da3 ("KVM: x86/mmu: Drop kvm_arch_write_log_dirty() wrapper")
2dbebf7ae1ed ("KVM: nVMX: Plumb L2 GPA through to PML emulation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 86c4f0d547f6460d0426ebb3ba0614f1134b8cda Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Thu, 29 Sep 2022 15:52:03 -0700
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.8000001FH
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. CPUID.8000001FH:EBX[31:16] are reserved bits and
should be masked off.
Fixes: 8765d75329a3 ("KVM: X86: Extend CPUID range to include new leaf")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220929225203.2234702-6-jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
[Clear NumVMPL too. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a0292ba650df..0810e93cbedc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1199,7 +1199,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
} else {
cpuid_entry_override(entry, CPUID_8000_001F_EAX);
-
+ /* Clear NumVMPL since KVM does not support VMPL. */
+ entry->ebx &= ~GENMASK(31, 12);
/*
* Enumerate '0' for "PA bits reduction", the adjusted
* MAXPHYADDR is enumerated directly (see 0x80000008).
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
86c4f0d547f6 ("KVM: x86: Mask off reserved bits in CPUID.8000001FH")
e39f00f60ebd ("KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR")
4bf48e3c0aaf ("KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled")
d9db0fd6c5c9 ("KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features")
013380782d4d ("KVM: x86: Move reverse CPUID helpers to separate header file")
01de8682b32d ("KVM: x86: Add reverse-CPUID lookup support for scattered SGX features")
4e66c0cb79b7 ("KVM: x86: Add support for reverse CPUID lookup of scattered features")
e42033342293 ("KVM: x86: Advertise INVPCID by default")
916391a2d1dc ("KVM: SVM: Add support for SEV-ES capability in KVM")
9d4747d02376 ("KVM: SVM: Remove the call to sev_platform_status() during setup")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 86c4f0d547f6460d0426ebb3ba0614f1134b8cda Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Thu, 29 Sep 2022 15:52:03 -0700
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.8000001FH
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. CPUID.8000001FH:EBX[31:16] are reserved bits and
should be masked off.
Fixes: 8765d75329a3 ("KVM: X86: Extend CPUID range to include new leaf")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220929225203.2234702-6-jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
[Clear NumVMPL too. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a0292ba650df..0810e93cbedc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1199,7 +1199,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
} else {
cpuid_entry_override(entry, CPUID_8000_001F_EAX);
-
+ /* Clear NumVMPL since KVM does not support VMPL. */
+ entry->ebx &= ~GENMASK(31, 12);
/*
* Enumerate '0' for "PA bits reduction", the adjusted
* MAXPHYADDR is enumerated directly (see 0x80000008).
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
86c4f0d547f6 ("KVM: x86: Mask off reserved bits in CPUID.8000001FH")
e39f00f60ebd ("KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR")
4bf48e3c0aaf ("KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled")
d9db0fd6c5c9 ("KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features")
013380782d4d ("KVM: x86: Move reverse CPUID helpers to separate header file")
01de8682b32d ("KVM: x86: Add reverse-CPUID lookup support for scattered SGX features")
4e66c0cb79b7 ("KVM: x86: Add support for reverse CPUID lookup of scattered features")
e42033342293 ("KVM: x86: Advertise INVPCID by default")
916391a2d1dc ("KVM: SVM: Add support for SEV-ES capability in KVM")
9d4747d02376 ("KVM: SVM: Remove the call to sev_platform_status() during setup")
dc46515cf838 ("KVM: x86: Move illegal GPA helper out of the MMU code")
1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept")
ec7771ab471b ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
cd313569f581 ("KVM: x86: mmu: Move translate_gpa() to mmu.c")
985ab2780164 ("KVM: x86/mmu: Make kvm_mmu_page definition and accessor internal-only")
6ca9a6f3adef ("KVM: x86/mmu: Add MMU-internal header")
06e7852c0ffb ("KVM: SVM: Add vmcb_ prefix to mark_*() functions")
f25a9dec2da3 ("KVM: x86/mmu: Drop kvm_arch_write_log_dirty() wrapper")
2dbebf7ae1ed ("KVM: nVMX: Plumb L2 GPA through to PML emulation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 86c4f0d547f6460d0426ebb3ba0614f1134b8cda Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Thu, 29 Sep 2022 15:52:03 -0700
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.8000001FH
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. CPUID.8000001FH:EBX[31:16] are reserved bits and
should be masked off.
Fixes: 8765d75329a3 ("KVM: X86: Extend CPUID range to include new leaf")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220929225203.2234702-6-jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
[Clear NumVMPL too. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a0292ba650df..0810e93cbedc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1199,7 +1199,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
} else {
cpuid_entry_override(entry, CPUID_8000_001F_EAX);
-
+ /* Clear NumVMPL since KVM does not support VMPL. */
+ entry->ebx &= ~GENMASK(31, 12);
/*
* Enumerate '0' for "PA bits reduction", the adjusted
* MAXPHYADDR is enumerated directly (see 0x80000008).
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0469e56a14bf ("KVM: x86: Mask off reserved bits in CPUID.80000001H")
bd7919999047 ("KVM: x86: Override host CPUID results with kvm_cpu_caps")
09f628a0b49c ("KVM: x86: Fold CPUID 0x7 masking back into __do_cpuid_func()")
90d2f60f41f7 ("KVM: x86: Use KVM cpu caps to track UMIP emulation")
b3d895d5c415 ("KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap update")
3ec6fd8cf0ba ("KVM: VMX: Convert feature updates from CPUID to KVM cpu caps")
9b58b9857f22 ("KVM: SVM: Convert feature updates from CPUID to KVM cpu caps")
66a6950f9995 ("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
9e6d01c2d908 ("KVM: x86: Refactor handling of XSAVES CPUID adjustment")
fb7d4377d513 ("KVM: x86: handle GBPAGE CPUID adjustment for EPT with generic code")
dbd068040c64 ("KVM: x86: Handle Intel PT CPUID adjustment in VMX code")
733deafc00df ("KVM: x86: Handle RDTSCP CPUID adjustment in VMX code")
d64d83d1e026 ("KVM: x86: Handle PKU CPUID adjustment in VMX code")
e574768f841b ("KVM: x86: Handle UMIP emulation CPUID adjustment in VMX code")
5ffec6f910dc ("KVM: x86: Handle INVPCID CPUID adjustment in VMX code")
6c7ea4b56bfe ("KVM: x86: Handle MPX CPUID adjustment in VMX code")
e745e37d4977 ("KVM: x86: Refactor cpuid_mask() to auto-retrieve the register")
b32666b13a72 ("KVM: x86: Introduce cpuid_entry_{change,set,clear}() mutators")
4c61534aaae2 ("KVM: x86: Introduce cpuid_entry_{get,has}() accessors")
5e12b2bb34e9 ("KVM: x86: Replace bare "unsigned" with "unsigned int" in cpuid helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0469e56a14bf8cfb80507e51b7aeec0332cdbc13 Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Fri, 30 Sep 2022 00:51:58 +0200
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.80000001H
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. CPUID.80000001:EBX[27:16] are reserved bits and
should be masked off.
Fixes: 0771671749b5 ("KVM: Enhance guest cpuid management")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7065462378e2..834feeb0a828 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1133,6 +1133,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = max(entry->eax, 0x80000021);
break;
case 0x80000001:
+ entry->ebx &= ~GENMASK(27, 16);
cpuid_entry_override(entry, CPUID_8000_0001_EDX);
cpuid_entry_override(entry, CPUID_8000_0001_ECX);
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0469e56a14bf ("KVM: x86: Mask off reserved bits in CPUID.80000001H")
bd7919999047 ("KVM: x86: Override host CPUID results with kvm_cpu_caps")
09f628a0b49c ("KVM: x86: Fold CPUID 0x7 masking back into __do_cpuid_func()")
90d2f60f41f7 ("KVM: x86: Use KVM cpu caps to track UMIP emulation")
b3d895d5c415 ("KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap update")
3ec6fd8cf0ba ("KVM: VMX: Convert feature updates from CPUID to KVM cpu caps")
9b58b9857f22 ("KVM: SVM: Convert feature updates from CPUID to KVM cpu caps")
66a6950f9995 ("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
9e6d01c2d908 ("KVM: x86: Refactor handling of XSAVES CPUID adjustment")
fb7d4377d513 ("KVM: x86: handle GBPAGE CPUID adjustment for EPT with generic code")
dbd068040c64 ("KVM: x86: Handle Intel PT CPUID adjustment in VMX code")
733deafc00df ("KVM: x86: Handle RDTSCP CPUID adjustment in VMX code")
d64d83d1e026 ("KVM: x86: Handle PKU CPUID adjustment in VMX code")
e574768f841b ("KVM: x86: Handle UMIP emulation CPUID adjustment in VMX code")
5ffec6f910dc ("KVM: x86: Handle INVPCID CPUID adjustment in VMX code")
6c7ea4b56bfe ("KVM: x86: Handle MPX CPUID adjustment in VMX code")
e745e37d4977 ("KVM: x86: Refactor cpuid_mask() to auto-retrieve the register")
b32666b13a72 ("KVM: x86: Introduce cpuid_entry_{change,set,clear}() mutators")
4c61534aaae2 ("KVM: x86: Introduce cpuid_entry_{get,has}() accessors")
5e12b2bb34e9 ("KVM: x86: Replace bare "unsigned" with "unsigned int" in cpuid helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0469e56a14bf8cfb80507e51b7aeec0332cdbc13 Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Fri, 30 Sep 2022 00:51:58 +0200
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.80000001H
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. CPUID.80000001:EBX[27:16] are reserved bits and
should be masked off.
Fixes: 0771671749b5 ("KVM: Enhance guest cpuid management")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7065462378e2..834feeb0a828 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1133,6 +1133,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = max(entry->eax, 0x80000021);
break;
case 0x80000001:
+ entry->ebx &= ~GENMASK(27, 16);
cpuid_entry_override(entry, CPUID_8000_0001_EDX);
cpuid_entry_override(entry, CPUID_8000_0001_ECX);
break;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0469e56a14bf ("KVM: x86: Mask off reserved bits in CPUID.80000001H")
bd7919999047 ("KVM: x86: Override host CPUID results with kvm_cpu_caps")
09f628a0b49c ("KVM: x86: Fold CPUID 0x7 masking back into __do_cpuid_func()")
90d2f60f41f7 ("KVM: x86: Use KVM cpu caps to track UMIP emulation")
b3d895d5c415 ("KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap update")
3ec6fd8cf0ba ("KVM: VMX: Convert feature updates from CPUID to KVM cpu caps")
9b58b9857f22 ("KVM: SVM: Convert feature updates from CPUID to KVM cpu caps")
66a6950f9995 ("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
9e6d01c2d908 ("KVM: x86: Refactor handling of XSAVES CPUID adjustment")
fb7d4377d513 ("KVM: x86: handle GBPAGE CPUID adjustment for EPT with generic code")
dbd068040c64 ("KVM: x86: Handle Intel PT CPUID adjustment in VMX code")
733deafc00df ("KVM: x86: Handle RDTSCP CPUID adjustment in VMX code")
d64d83d1e026 ("KVM: x86: Handle PKU CPUID adjustment in VMX code")
e574768f841b ("KVM: x86: Handle UMIP emulation CPUID adjustment in VMX code")
5ffec6f910dc ("KVM: x86: Handle INVPCID CPUID adjustment in VMX code")
6c7ea4b56bfe ("KVM: x86: Handle MPX CPUID adjustment in VMX code")
e745e37d4977 ("KVM: x86: Refactor cpuid_mask() to auto-retrieve the register")
b32666b13a72 ("KVM: x86: Introduce cpuid_entry_{change,set,clear}() mutators")
4c61534aaae2 ("KVM: x86: Introduce cpuid_entry_{get,has}() accessors")
5e12b2bb34e9 ("KVM: x86: Replace bare "unsigned" with "unsigned int" in cpuid helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0469e56a14bf8cfb80507e51b7aeec0332cdbc13 Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Fri, 30 Sep 2022 00:51:58 +0200
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.80000001H
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. CPUID.80000001:EBX[27:16] are reserved bits and
should be masked off.
Fixes: 0771671749b5 ("KVM: Enhance guest cpuid management")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7065462378e2..834feeb0a828 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1133,6 +1133,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = max(entry->eax, 0x80000021);
break;
case 0x80000001:
+ entry->ebx &= ~GENMASK(27, 16);
cpuid_entry_override(entry, CPUID_8000_0001_EDX);
cpuid_entry_override(entry, CPUID_8000_0001_ECX);
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0469e56a14bf ("KVM: x86: Mask off reserved bits in CPUID.80000001H")
bd7919999047 ("KVM: x86: Override host CPUID results with kvm_cpu_caps")
09f628a0b49c ("KVM: x86: Fold CPUID 0x7 masking back into __do_cpuid_func()")
90d2f60f41f7 ("KVM: x86: Use KVM cpu caps to track UMIP emulation")
b3d895d5c415 ("KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap update")
3ec6fd8cf0ba ("KVM: VMX: Convert feature updates from CPUID to KVM cpu caps")
9b58b9857f22 ("KVM: SVM: Convert feature updates from CPUID to KVM cpu caps")
66a6950f9995 ("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
9e6d01c2d908 ("KVM: x86: Refactor handling of XSAVES CPUID adjustment")
fb7d4377d513 ("KVM: x86: handle GBPAGE CPUID adjustment for EPT with generic code")
dbd068040c64 ("KVM: x86: Handle Intel PT CPUID adjustment in VMX code")
733deafc00df ("KVM: x86: Handle RDTSCP CPUID adjustment in VMX code")
d64d83d1e026 ("KVM: x86: Handle PKU CPUID adjustment in VMX code")
e574768f841b ("KVM: x86: Handle UMIP emulation CPUID adjustment in VMX code")
5ffec6f910dc ("KVM: x86: Handle INVPCID CPUID adjustment in VMX code")
6c7ea4b56bfe ("KVM: x86: Handle MPX CPUID adjustment in VMX code")
e745e37d4977 ("KVM: x86: Refactor cpuid_mask() to auto-retrieve the register")
b32666b13a72 ("KVM: x86: Introduce cpuid_entry_{change,set,clear}() mutators")
4c61534aaae2 ("KVM: x86: Introduce cpuid_entry_{get,has}() accessors")
5e12b2bb34e9 ("KVM: x86: Replace bare "unsigned" with "unsigned int" in cpuid helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0469e56a14bf8cfb80507e51b7aeec0332cdbc13 Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Fri, 30 Sep 2022 00:51:58 +0200
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.80000001H
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. CPUID.80000001:EBX[27:16] are reserved bits and
should be masked off.
Fixes: 0771671749b5 ("KVM: Enhance guest cpuid management")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7065462378e2..834feeb0a828 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1133,6 +1133,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = max(entry->eax, 0x80000021);
break;
case 0x80000001:
+ entry->ebx &= ~GENMASK(27, 16);
cpuid_entry_override(entry, CPUID_8000_0001_EDX);
cpuid_entry_override(entry, CPUID_8000_0001_ECX);
break;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
079f6889818d ("KVM: x86: Mask off reserved bits in CPUID.8000001AH")
382409b4c43e ("kvm: x86: Include CPUID leaf 0x8000001e in kvm's supported CPUID")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 079f6889818dd07903fb36c252532ab47ebb6d48 Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Thu, 29 Sep 2022 15:52:01 -0700
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.8000001AH
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. In the case of CPUID.8000001AH, only three bits are
currently defined. The 125 reserved bits should be masked off.
Fixes: 24c82e576b78 ("KVM: Sanitize cpuid")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220929225203.2234702-4-jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 489c028859e1..a0292ba650df 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1189,6 +1189,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->ecx = entry->edx = 0;
break;
case 0x8000001a:
+ entry->eax &= GENMASK(2, 0);
+ entry->ebx = entry->ecx = entry->edx = 0;
+ break;
case 0x8000001e:
break;
case 0x8000001F:
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
079f6889818d ("KVM: x86: Mask off reserved bits in CPUID.8000001AH")
382409b4c43e ("kvm: x86: Include CPUID leaf 0x8000001e in kvm's supported CPUID")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 079f6889818dd07903fb36c252532ab47ebb6d48 Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Thu, 29 Sep 2022 15:52:01 -0700
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.8000001AH
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. In the case of CPUID.8000001AH, only three bits are
currently defined. The 125 reserved bits should be masked off.
Fixes: 24c82e576b78 ("KVM: Sanitize cpuid")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220929225203.2234702-4-jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 489c028859e1..a0292ba650df 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1189,6 +1189,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->ecx = entry->edx = 0;
break;
case 0x8000001a:
+ entry->eax &= GENMASK(2, 0);
+ entry->ebx = entry->ecx = entry->edx = 0;
+ break;
case 0x8000001e:
break;
case 0x8000001F:
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
079f6889818d ("KVM: x86: Mask off reserved bits in CPUID.8000001AH")
382409b4c43e ("kvm: x86: Include CPUID leaf 0x8000001e in kvm's supported CPUID")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 079f6889818dd07903fb36c252532ab47ebb6d48 Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Thu, 29 Sep 2022 15:52:01 -0700
Subject: [PATCH] KVM: x86: Mask off reserved bits in CPUID.8000001AH
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. In the case of CPUID.8000001AH, only three bits are
currently defined. The 125 reserved bits should be masked off.
Fixes: 24c82e576b78 ("KVM: Sanitize cpuid")
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220929225203.2234702-4-jmattson(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 489c028859e1..a0292ba650df 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1189,6 +1189,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->ecx = entry->edx = 0;
break;
case 0x8000001a:
+ entry->eax &= GENMASK(2, 0);
+ entry->ebx = entry->ecx = entry->edx = 0;
+ break;
case 0x8000001e:
break;
case 0x8000001F:
The patch below was submitted to be applied to the 6.0-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 180418e2eb33be5c8d0b703c843e0ebc045aef80 Mon Sep 17 00:00:00 2001
From: Hou Wenlong <houwenlong.hwl(a)antgroup.com>
Date: Mon, 17 Oct 2022 11:06:10 +0800
Subject: [PATCH] KVM: debugfs: Return retval of simple_attr_open() if it fails
Although simple_attr_open() fails only with -ENOMEM with current code
base, it would be nicer to return retval of simple_attr_open() directly
in kvm_debugfs_open().
No functional change intended.
Signed-off-by: Hou Wenlong <houwenlong.hwl(a)antgroup.com>
Message-Id: <69d64d93accd1f33691b8a383ae555baee80f943.1665975828.git.houwenlong.hwl(a)antgroup.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1376a47fedee..f1df24c2bc84 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5409,6 +5409,7 @@ static int kvm_debugfs_open(struct inode *inode, struct file *file,
int (*get)(void *, u64 *), int (*set)(void *, u64),
const char *fmt)
{
+ int ret;
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
inode->i_private;
@@ -5420,15 +5421,13 @@ static int kvm_debugfs_open(struct inode *inode, struct file *file,
if (!kvm_get_kvm_safe(stat_data->kvm))
return -ENOENT;
- if (simple_attr_open(inode, file, get,
- kvm_stats_debugfs_mode(stat_data->desc) & 0222
- ? set : NULL,
- fmt)) {
+ ret = simple_attr_open(inode, file, get,
+ kvm_stats_debugfs_mode(stat_data->desc) & 0222
+ ? set : NULL, fmt);
+ if (ret)
kvm_put_kvm(stat_data->kvm);
- return -ENOMEM;
- }
- return 0;
+ return ret;
}
static int kvm_debugfs_release(struct inode *inode, struct file *file)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2b6ae0962b42 ("parisc: Avoid printing the hardware path twice")
4f80b70e1953 ("parisc: Use proper printk format for resource_size_t")
5e791d2e4785 ("parisc: Convert printk(KERN_LEVEL) to pr_lvl()")
0ae60d0c4f19 ("parisc: Show unhashed hardware inventory")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b6ae0962b421103feb41a80406732944b0665b3 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Fri, 28 Oct 2022 18:12:49 +0200
Subject: [PATCH] parisc: Avoid printing the hardware path twice
Avoid that the hardware path is shown twice in the kernel log, and clean
up the output of the version numbers to show up in the same order as
they are listed in the hardware database in the hardware.c file.
Additionally, optimize the memory footprint of the hardware database
and mark some code as init code.
Fixes: cab56b51ec0e ("parisc: Fix device names in /proc/iomem")
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: <stable(a)vger.kernel.org> # v4.9+
diff --git a/arch/parisc/include/asm/hardware.h b/arch/parisc/include/asm/hardware.h
index 9d3d7737c58b..a005ebc54779 100644
--- a/arch/parisc/include/asm/hardware.h
+++ b/arch/parisc/include/asm/hardware.h
@@ -10,12 +10,12 @@
#define SVERSION_ANY_ID PA_SVERSION_ANY_ID
struct hp_hardware {
- unsigned short hw_type:5; /* HPHW_xxx */
- unsigned short hversion;
- unsigned long sversion:28;
- unsigned short opt;
- const char name[80]; /* The hardware description */
-};
+ unsigned int hw_type:8; /* HPHW_xxx */
+ unsigned int hversion:12;
+ unsigned int sversion:12;
+ unsigned char opt;
+ unsigned char name[59]; /* The hardware description */
+} __packed;
struct parisc_device;
diff --git a/arch/parisc/kernel/drivers.c b/arch/parisc/kernel/drivers.c
index d126e78e101a..e7ee0c0c91d3 100644
--- a/arch/parisc/kernel/drivers.c
+++ b/arch/parisc/kernel/drivers.c
@@ -882,15 +882,13 @@ void __init walk_central_bus(void)
&root);
}
-static void print_parisc_device(struct parisc_device *dev)
+static __init void print_parisc_device(struct parisc_device *dev)
{
- char hw_path[64];
- static int count;
+ static int count __initdata;
- print_pa_hwpath(dev, hw_path);
- pr_info("%d. %s at %pap [%s] { %d, 0x%x, 0x%.3x, 0x%.5x }",
- ++count, dev->name, &(dev->hpa.start), hw_path, dev->id.hw_type,
- dev->id.hversion_rev, dev->id.hversion, dev->id.sversion);
+ pr_info("%d. %s at %pap { type:%d, hv:%#x, sv:%#x, rev:%#x }",
+ ++count, dev->name, &(dev->hpa.start), dev->id.hw_type,
+ dev->id.hversion, dev->id.sversion, dev->id.hversion_rev);
if (dev->num_addrs) {
int k;
@@ -1079,7 +1077,7 @@ static __init int qemu_print_iodc_data(struct device *lin_dev, void *data)
-static int print_one_device(struct device * dev, void * data)
+static __init int print_one_device(struct device * dev, void * data)
{
struct parisc_device * pdev = to_parisc_device(dev);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2b6ae0962b42 ("parisc: Avoid printing the hardware path twice")
4f80b70e1953 ("parisc: Use proper printk format for resource_size_t")
5e791d2e4785 ("parisc: Convert printk(KERN_LEVEL) to pr_lvl()")
0ae60d0c4f19 ("parisc: Show unhashed hardware inventory")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b6ae0962b421103feb41a80406732944b0665b3 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Fri, 28 Oct 2022 18:12:49 +0200
Subject: [PATCH] parisc: Avoid printing the hardware path twice
Avoid that the hardware path is shown twice in the kernel log, and clean
up the output of the version numbers to show up in the same order as
they are listed in the hardware database in the hardware.c file.
Additionally, optimize the memory footprint of the hardware database
and mark some code as init code.
Fixes: cab56b51ec0e ("parisc: Fix device names in /proc/iomem")
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: <stable(a)vger.kernel.org> # v4.9+
diff --git a/arch/parisc/include/asm/hardware.h b/arch/parisc/include/asm/hardware.h
index 9d3d7737c58b..a005ebc54779 100644
--- a/arch/parisc/include/asm/hardware.h
+++ b/arch/parisc/include/asm/hardware.h
@@ -10,12 +10,12 @@
#define SVERSION_ANY_ID PA_SVERSION_ANY_ID
struct hp_hardware {
- unsigned short hw_type:5; /* HPHW_xxx */
- unsigned short hversion;
- unsigned long sversion:28;
- unsigned short opt;
- const char name[80]; /* The hardware description */
-};
+ unsigned int hw_type:8; /* HPHW_xxx */
+ unsigned int hversion:12;
+ unsigned int sversion:12;
+ unsigned char opt;
+ unsigned char name[59]; /* The hardware description */
+} __packed;
struct parisc_device;
diff --git a/arch/parisc/kernel/drivers.c b/arch/parisc/kernel/drivers.c
index d126e78e101a..e7ee0c0c91d3 100644
--- a/arch/parisc/kernel/drivers.c
+++ b/arch/parisc/kernel/drivers.c
@@ -882,15 +882,13 @@ void __init walk_central_bus(void)
&root);
}
-static void print_parisc_device(struct parisc_device *dev)
+static __init void print_parisc_device(struct parisc_device *dev)
{
- char hw_path[64];
- static int count;
+ static int count __initdata;
- print_pa_hwpath(dev, hw_path);
- pr_info("%d. %s at %pap [%s] { %d, 0x%x, 0x%.3x, 0x%.5x }",
- ++count, dev->name, &(dev->hpa.start), hw_path, dev->id.hw_type,
- dev->id.hversion_rev, dev->id.hversion, dev->id.sversion);
+ pr_info("%d. %s at %pap { type:%d, hv:%#x, sv:%#x, rev:%#x }",
+ ++count, dev->name, &(dev->hpa.start), dev->id.hw_type,
+ dev->id.hversion, dev->id.sversion, dev->id.hversion_rev);
if (dev->num_addrs) {
int k;
@@ -1079,7 +1077,7 @@ static __init int qemu_print_iodc_data(struct device *lin_dev, void *data)
-static int print_one_device(struct device * dev, void * data)
+static __init int print_one_device(struct device * dev, void * data)
{
struct parisc_device * pdev = to_parisc_device(dev);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
161a438d730d ("efi: random: reduce seed size to 32 bytes")
6120681bdf1a ("Merge branch 'efi/urgent' into efi/core, to pick up fixes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 161a438d730dade2ba2b1bf8785f0759aba4ca5f Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 20 Oct 2022 10:39:08 +0200
Subject: [PATCH] efi: random: reduce seed size to 32 bytes
We no longer need at least 64 bytes of random seed to permit the early
crng init to complete. The RNG is now based on Blake2s, so reduce the
EFI seed size to the Blake2s hash size, which is sufficient for our
purposes.
While at it, drop the READ_ONCE(), which was supposed to prevent size
from being evaluated after seed was unmapped. However, this cannot
actually happen, so READ_ONCE() is unnecessary here.
Cc: <stable(a)vger.kernel.org> # v4.14+
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 3ecdc43a3f2b..a46df5d1d094 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -611,7 +611,7 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
seed = early_memremap(efi_rng_seed, sizeof(*seed));
if (seed != NULL) {
- size = READ_ONCE(seed->size);
+ size = min(seed->size, EFI_RANDOM_SEED_SIZE);
early_memunmap(seed, sizeof(*seed));
} else {
pr_err("Could not map UEFI random seed!\n");
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 80f3c1c7827d..929d559ad41d 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1222,7 +1222,7 @@ efi_status_t efi_random_get_seed(void);
arch_efi_call_virt_teardown(); \
})
-#define EFI_RANDOM_SEED_SIZE 64U
+#define EFI_RANDOM_SEED_SIZE 32U // BLAKE2S_HASH_SIZE
struct linux_efi_random_seed {
u32 size;
With use_codeword_fixup enabled, any return from
mtd_device_parse_register gets overwritten. Aside from the clear bug, this
is also problematic as a parser can EPROBE_DEFER and because this is not
correctly handled, the nand is never rescanned later in the bootup
process.
An example of this problem is when smem requires additional time to be
probed and nandc use qcomsmempart as parser. Parser will return
EPROBE_DEFER but in the current code this ret gets overwritten by
qcom_nand_host_parse_boot_partitions and qcom_nand_host_init_and_register
return 0.
Correctly handle the return code from mtd_device_parse_register so that
any error from this function is not ignored.
Fixes: 862bdedd7f4b ("mtd: nand: raw: qcom_nandc: add support for unprotected spare data pages")
Cc: stable(a)vger.kernel.org # v6.0+
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
---
drivers/mtd/nand/raw/qcom_nandc.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
index 8f80019a9f01..198a44794d2d 100644
--- a/drivers/mtd/nand/raw/qcom_nandc.c
+++ b/drivers/mtd/nand/raw/qcom_nandc.c
@@ -3167,16 +3167,18 @@ static int qcom_nand_host_init_and_register(struct qcom_nand_controller *nandc,
ret = mtd_device_parse_register(mtd, probes, NULL, NULL, 0);
if (ret)
- nand_cleanup(chip);
+ goto err;
if (nandc->props->use_codeword_fixup) {
ret = qcom_nand_host_parse_boot_partitions(nandc, host, dn);
- if (ret) {
- nand_cleanup(chip);
- return ret;
- }
+ if (ret)
+ goto err;
}
+ return 0;
+
+err:
+ nand_cleanup(chip);
return ret;
}
--
2.37.2
Humanitarian Assistance/Support
Your fund transfer is successfully completed!
Have you verified the money transferred to your bank account as a support from the UNHCR (Hong Kong Humanitarian Agency)?
Your name and email address was among the humanitarian assistance email extraction program. If you have not yet confirmed your money in your bank account, please confirm your payment through the collaborating bank (Iccrea Banca S.p.A. Istituto Centrale del Credito Cooperativo).
Email1: info(a)iccreabancn.org
Email2: ibspaicdc(a)gmail.com
Email3: ibsicdc(a)gmail.com
Guido Josef
Tel. +39 03 8829911
E-Mail: info(a)iccreabancn.org
E-Mail: customercare(a)iccreabancn.org
E-Mail: ibspaicdc(a)gmail.com
Forma giuridica: Societa Per Azioni Con Socio
Tipo di azienda: Sede Centrale P. Iva: 04774801007
____________________________________________________________
人道主义援助/支持
您的资金转让已成功完成!
您是否验证了转移到您的银行帐户的资金作为难民署(香港人道主义机构)的支持?
您的姓名和电子邮件地址是人道主义援助电子邮件提取计划之一。 如果您尚未在银行帐户中确认您的钱,请通过合作银行(Iccrea Banca S.P.A. Istituto Centrale del Credito Cooperativo)确认您的付款。
电子邮件1:info(a)iccreabancn.org
电子邮件2:ibspaicdc(a)gmail.com
电子邮件3:ibsicdc(a)gmail.com
Guido Josef
电话。 +39 03 8829911
电子邮件:info(a)iccreabancn.org
电子邮件:customercare(a)iccreabancn.org
电子邮件:ibspaicdc(a)gmail.com
朱里迪卡福音:azioni con社会社会
Tipo di Azienda:Sede Centrale P. IVA:04774801007
FILL_RETURN_BUFFER can access percpu data, therefore vmload of the
host save area must be executed first. First of all, move the
VMCB vmsave/vmload to assembly.
The idea on how to number the exception tables is stolen from
a prototype patch by Peter Zijlstra.
Cc: stable(a)vger.kernel.org
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Link: <https://lore.kernel.org/all/f571e404-e625-bae1-10e9-449b2eb4cbd8@citrix.com/>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kernel/asm-offsets.c | 2 ++
arch/x86/kvm/svm/svm.c | 9 -------
arch/x86/kvm/svm/vmenter.S | 50 +++++++++++++++++++++++++++--------
3 files changed, 41 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 85de7e4fe59a..f01293a1e594 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -113,6 +113,8 @@ static void __used common(void)
if (IS_ENABLED(CONFIG_KVM_AMD)) {
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
+ OFFSET(SVM_vmcb01, vcpu_svm, vmcb01);
+ OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
}
if (IS_ENABLED(CONFIG_KVM_INTEL)) {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4cfa62e66a0e..ae65cdcab660 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3924,16 +3924,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
} else {
struct svm_cpu_data *sd = per_cpu(svm_data, vcpu->cpu);
- /*
- * Use a single vmcb (vmcb01 because it's always valid) for
- * context switching guest state via VMLOAD/VMSAVE, that way
- * the state doesn't need to be copied between vmcb01 and
- * vmcb02 when switching vmcbs for nested virtualization.
- */
- vmload(svm->vmcb01.pa);
__svm_vcpu_run(vmcb_pa, svm);
- vmsave(svm->vmcb01.pa);
-
vmload(__sme_page_pa(sd->save_area));
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index dc558d0a589e..4709bc8868d7 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/linkage.h>
#include <asm/asm.h>
+#include <asm/asm-offsets.h>
#include <asm/bitsperlong.h>
#include <asm/kvm_vcpu_regs.h>
#include <asm/nospec-branch.h>
@@ -27,6 +28,8 @@
#define VCPU_R15 (SVM_vcpu_arch_regs + __VCPU_REGS_R15 * WORD_SIZE)
#endif
+#define SVM_vmcb01_pa (SVM_vmcb01 + KVM_VMCB_pa)
+
.section .noinstr.text, "ax"
/**
@@ -56,6 +59,16 @@ SYM_FUNC_START(__svm_vcpu_run)
/* Move @svm to RDI. */
mov %_ASM_ARG2, %_ASM_DI
+ /*
+ * Use a single vmcb (vmcb01 because it's always valid) for
+ * context switching guest state via VMLOAD/VMSAVE, that way
+ * the state doesn't need to be copied between vmcb01 and
+ * vmcb02 when switching vmcbs for nested virtualization.
+ */
+ mov SVM_vmcb01_pa(%_ASM_DI), %_ASM_AX
+1: vmload %_ASM_AX
+2:
+
/* "POP" @vmcb to RAX. */
pop %_ASM_AX
@@ -80,16 +93,11 @@ SYM_FUNC_START(__svm_vcpu_run)
/* Enter guest mode */
sti
-1: vmrun %_ASM_AX
-
-2: cli
-
-#ifdef CONFIG_RETPOLINE
- /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
- FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
-#endif
+3: vmrun %_ASM_AX
+4:
+ cli
- /* "POP" @svm to RAX. */
+ /* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
/* Save all guest registers. */
@@ -110,6 +118,18 @@ SYM_FUNC_START(__svm_vcpu_run)
mov %r15, VCPU_R15(%_ASM_AX)
#endif
+ /* @svm can stay in RDI from now on. */
+ mov %_ASM_AX, %_ASM_DI
+
+ mov SVM_vmcb01_pa(%_ASM_DI), %_ASM_AX
+5: vmsave %_ASM_AX
+6:
+
+#ifdef CONFIG_RETPOLINE
+ /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
+ FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
+#endif
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -159,11 +179,19 @@ SYM_FUNC_START(__svm_vcpu_run)
pop %_ASM_BP
RET
-3: cmpb $0, kvm_rebooting
+10: cmpb $0, kvm_rebooting
jne 2b
ud2
+30: cmpb $0, kvm_rebooting
+ jne 4b
+ ud2
+50: cmpb $0, kvm_rebooting
+ jne 6b
+ ud2
- _ASM_EXTABLE(1b, 3b)
+ _ASM_EXTABLE(1b, 10b)
+ _ASM_EXTABLE(3b, 30b)
+ _ASM_EXTABLE(5b, 50b)
SYM_FUNC_END(__svm_vcpu_run)
--
2.31.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4a6f278d4827 ("fuse: add file_modified() to fallocate")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a6f278d4827b59ba26ceae0ff4529ee826aa258 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Fri, 28 Oct 2022 14:25:20 +0200
Subject: [PATCH] fuse: add file_modified() to fallocate
Add missing file_modified() call to fuse_file_fallocate(). Without this
fallocate on fuse failed to clear privileges.
Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1a3afd469e3a..71bfb663aac5 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3001,6 +3001,10 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
goto out;
}
+ err = file_modified(file);
+ if (err)
+ goto out;
+
if (!(mode & FALLOC_FL_KEEP_SIZE))
set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4a6f278d4827 ("fuse: add file_modified() to fallocate")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a6f278d4827b59ba26ceae0ff4529ee826aa258 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Fri, 28 Oct 2022 14:25:20 +0200
Subject: [PATCH] fuse: add file_modified() to fallocate
Add missing file_modified() call to fuse_file_fallocate(). Without this
fallocate on fuse failed to clear privileges.
Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1a3afd469e3a..71bfb663aac5 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3001,6 +3001,10 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
goto out;
}
+ err = file_modified(file);
+ if (err)
+ goto out;
+
if (!(mode & FALLOC_FL_KEEP_SIZE))
set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4a6f278d4827 ("fuse: add file_modified() to fallocate")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a6f278d4827b59ba26ceae0ff4529ee826aa258 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Fri, 28 Oct 2022 14:25:20 +0200
Subject: [PATCH] fuse: add file_modified() to fallocate
Add missing file_modified() call to fuse_file_fallocate(). Without this
fallocate on fuse failed to clear privileges.
Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1a3afd469e3a..71bfb663aac5 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3001,6 +3001,10 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
goto out;
}
+ err = file_modified(file);
+ if (err)
+ goto out;
+
if (!(mode & FALLOC_FL_KEEP_SIZE))
set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
161a438d730d ("efi: random: reduce seed size to 32 bytes")
6120681bdf1a ("Merge branch 'efi/urgent' into efi/core, to pick up fixes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 161a438d730dade2ba2b1bf8785f0759aba4ca5f Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 20 Oct 2022 10:39:08 +0200
Subject: [PATCH] efi: random: reduce seed size to 32 bytes
We no longer need at least 64 bytes of random seed to permit the early
crng init to complete. The RNG is now based on Blake2s, so reduce the
EFI seed size to the Blake2s hash size, which is sufficient for our
purposes.
While at it, drop the READ_ONCE(), which was supposed to prevent size
from being evaluated after seed was unmapped. However, this cannot
actually happen, so READ_ONCE() is unnecessary here.
Cc: <stable(a)vger.kernel.org> # v4.14+
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 3ecdc43a3f2b..a46df5d1d094 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -611,7 +611,7 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
seed = early_memremap(efi_rng_seed, sizeof(*seed));
if (seed != NULL) {
- size = READ_ONCE(seed->size);
+ size = min(seed->size, EFI_RANDOM_SEED_SIZE);
early_memunmap(seed, sizeof(*seed));
} else {
pr_err("Could not map UEFI random seed!\n");
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 80f3c1c7827d..929d559ad41d 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1222,7 +1222,7 @@ efi_status_t efi_random_get_seed(void);
arch_efi_call_virt_teardown(); \
})
-#define EFI_RANDOM_SEED_SIZE 64U
+#define EFI_RANDOM_SEED_SIZE 32U // BLAKE2S_HASH_SIZE
struct linux_efi_random_seed {
u32 size;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7d866e38c7e9 ("efi: random: Use 'ACPI reclaim' memory for random seed")
966291f6344d ("efi/libstub: Rename efi_call_early/_runtime macros to be more intuitive")
99ea8b1db2d2 ("efi/libstub: Drop 'table' argument from efi_table_attr() macro")
47c0fd39b7b8 ("efi/libstub: Drop protocol argument from efi_call_proto() macro")
cd33a5c1d53e ("efi/libstub: Remove 'sys_table_arg' from all function prototypes")
8173ec7905b5 ("efi/libstub: Drop sys_table_arg from printk routines")
c3710de5065d ("efi/libstub/x86: Drop __efi_early() export and efi_config struct")
dc29da14ed94 ("efi/libstub: Unify the efi_char16_printk implementations")
2fcdad2a80a6 ("efi/libstub: Get rid of 'sys_table_arg' macro parameter")
afc4cc71cf78 ("efi/libstub/x86: Avoid thunking for native firmware calls")
960a8d01834e ("efi/libstub: Use stricter typing for firmware function pointers")
e8bd5ddf60ee ("efi/libstub: Drop explicit 32/64-bit protocol definitions")
f958efe97596 ("efi/libstub: Distinguish between native/mixed not 32/64 bit")
1786e8301164 ("efi/libstub: Extend native protocol definitions with mixed_mode aliases")
2732ea0d5c0a ("efi/libstub: Use a helper to iterate over a EFI handle array")
58ec655a7573 ("efi/libstub: Remove unused __efi_call_early() macro")
8de8788d2182 ("efi/gop: Unify 32/64-bit functions")
44c84b4ada73 ("efi/gop: Convert GOP structures to typedef and clean up some types")
8d62af177812 ("efi/gop: Remove bogus packed attribute from GOP structures")
4911ee401b7c ("x86/efistub: Disable paging at mixed mode entry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7d866e38c7e9ece8a096d0d098fa9d92b9d4f97e Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 20 Oct 2022 10:39:09 +0200
Subject: [PATCH] efi: random: Use 'ACPI reclaim' memory for random seed
EFI runtime services data is guaranteed to be preserved by the OS,
making it a suitable candidate for the EFI random seed table, which may
be passed to kexec kernels as well (after refreshing the seed), and so
we need to ensure that the memory is preserved without support from the
OS itself.
However, runtime services data is intended for allocations that are
relevant to the implementations of the runtime services themselves, and
so they are unmapped from the kernel linear map, and mapped into the EFI
page tables that are active while runtime service invocations are in
progress. None of this is needed for the RNG seed.
So let's switch to EFI 'ACPI reclaim' memory: in spite of the name,
there is nothing exclusively ACPI about it, it is simply a type of
allocation that carries firmware provided data which may or may not be
relevant to the OS, and it is left up to the OS to decide whether to
reclaim it after having consumed its contents.
Given that in Linux, we never reclaim these allocations, it is a good
choice for the EFI RNG seed, as the allocation is guaranteed to survive
kexec reboots.
One additional reason for changing this now is to align it with the
upcoming recommendation for EFI bootloader provided RNG seeds, which
must not use EFI runtime services code/data allocations.
Cc: <stable(a)vger.kernel.org> # v4.14+
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c
index 24aa37535372..33ab56769595 100644
--- a/drivers/firmware/efi/libstub/random.c
+++ b/drivers/firmware/efi/libstub/random.c
@@ -75,7 +75,12 @@ efi_status_t efi_random_get_seed(void)
if (status != EFI_SUCCESS)
return status;
- status = efi_bs_call(allocate_pool, EFI_RUNTIME_SERVICES_DATA,
+ /*
+ * Use EFI_ACPI_RECLAIM_MEMORY here so that it is guaranteed that the
+ * allocation will survive a kexec reboot (although we refresh the seed
+ * beforehand)
+ */
+ status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY,
sizeof(*seed) + EFI_RANDOM_SEED_SIZE,
(void **)&seed);
if (status != EFI_SUCCESS)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7d866e38c7e9 ("efi: random: Use 'ACPI reclaim' memory for random seed")
966291f6344d ("efi/libstub: Rename efi_call_early/_runtime macros to be more intuitive")
99ea8b1db2d2 ("efi/libstub: Drop 'table' argument from efi_table_attr() macro")
47c0fd39b7b8 ("efi/libstub: Drop protocol argument from efi_call_proto() macro")
cd33a5c1d53e ("efi/libstub: Remove 'sys_table_arg' from all function prototypes")
8173ec7905b5 ("efi/libstub: Drop sys_table_arg from printk routines")
c3710de5065d ("efi/libstub/x86: Drop __efi_early() export and efi_config struct")
dc29da14ed94 ("efi/libstub: Unify the efi_char16_printk implementations")
2fcdad2a80a6 ("efi/libstub: Get rid of 'sys_table_arg' macro parameter")
afc4cc71cf78 ("efi/libstub/x86: Avoid thunking for native firmware calls")
960a8d01834e ("efi/libstub: Use stricter typing for firmware function pointers")
e8bd5ddf60ee ("efi/libstub: Drop explicit 32/64-bit protocol definitions")
f958efe97596 ("efi/libstub: Distinguish between native/mixed not 32/64 bit")
1786e8301164 ("efi/libstub: Extend native protocol definitions with mixed_mode aliases")
2732ea0d5c0a ("efi/libstub: Use a helper to iterate over a EFI handle array")
58ec655a7573 ("efi/libstub: Remove unused __efi_call_early() macro")
8de8788d2182 ("efi/gop: Unify 32/64-bit functions")
44c84b4ada73 ("efi/gop: Convert GOP structures to typedef and clean up some types")
8d62af177812 ("efi/gop: Remove bogus packed attribute from GOP structures")
4911ee401b7c ("x86/efistub: Disable paging at mixed mode entry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7d866e38c7e9ece8a096d0d098fa9d92b9d4f97e Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 20 Oct 2022 10:39:09 +0200
Subject: [PATCH] efi: random: Use 'ACPI reclaim' memory for random seed
EFI runtime services data is guaranteed to be preserved by the OS,
making it a suitable candidate for the EFI random seed table, which may
be passed to kexec kernels as well (after refreshing the seed), and so
we need to ensure that the memory is preserved without support from the
OS itself.
However, runtime services data is intended for allocations that are
relevant to the implementations of the runtime services themselves, and
so they are unmapped from the kernel linear map, and mapped into the EFI
page tables that are active while runtime service invocations are in
progress. None of this is needed for the RNG seed.
So let's switch to EFI 'ACPI reclaim' memory: in spite of the name,
there is nothing exclusively ACPI about it, it is simply a type of
allocation that carries firmware provided data which may or may not be
relevant to the OS, and it is left up to the OS to decide whether to
reclaim it after having consumed its contents.
Given that in Linux, we never reclaim these allocations, it is a good
choice for the EFI RNG seed, as the allocation is guaranteed to survive
kexec reboots.
One additional reason for changing this now is to align it with the
upcoming recommendation for EFI bootloader provided RNG seeds, which
must not use EFI runtime services code/data allocations.
Cc: <stable(a)vger.kernel.org> # v4.14+
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c
index 24aa37535372..33ab56769595 100644
--- a/drivers/firmware/efi/libstub/random.c
+++ b/drivers/firmware/efi/libstub/random.c
@@ -75,7 +75,12 @@ efi_status_t efi_random_get_seed(void)
if (status != EFI_SUCCESS)
return status;
- status = efi_bs_call(allocate_pool, EFI_RUNTIME_SERVICES_DATA,
+ /*
+ * Use EFI_ACPI_RECLAIM_MEMORY here so that it is guaranteed that the
+ * allocation will survive a kexec reboot (although we refresh the seed
+ * beforehand)
+ */
+ status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY,
sizeof(*seed) + EFI_RANDOM_SEED_SIZE,
(void **)&seed);
if (status != EFI_SUCCESS)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7d866e38c7e9 ("efi: random: Use 'ACPI reclaim' memory for random seed")
966291f6344d ("efi/libstub: Rename efi_call_early/_runtime macros to be more intuitive")
99ea8b1db2d2 ("efi/libstub: Drop 'table' argument from efi_table_attr() macro")
47c0fd39b7b8 ("efi/libstub: Drop protocol argument from efi_call_proto() macro")
cd33a5c1d53e ("efi/libstub: Remove 'sys_table_arg' from all function prototypes")
8173ec7905b5 ("efi/libstub: Drop sys_table_arg from printk routines")
c3710de5065d ("efi/libstub/x86: Drop __efi_early() export and efi_config struct")
dc29da14ed94 ("efi/libstub: Unify the efi_char16_printk implementations")
2fcdad2a80a6 ("efi/libstub: Get rid of 'sys_table_arg' macro parameter")
afc4cc71cf78 ("efi/libstub/x86: Avoid thunking for native firmware calls")
960a8d01834e ("efi/libstub: Use stricter typing for firmware function pointers")
e8bd5ddf60ee ("efi/libstub: Drop explicit 32/64-bit protocol definitions")
f958efe97596 ("efi/libstub: Distinguish between native/mixed not 32/64 bit")
1786e8301164 ("efi/libstub: Extend native protocol definitions with mixed_mode aliases")
2732ea0d5c0a ("efi/libstub: Use a helper to iterate over a EFI handle array")
58ec655a7573 ("efi/libstub: Remove unused __efi_call_early() macro")
8de8788d2182 ("efi/gop: Unify 32/64-bit functions")
44c84b4ada73 ("efi/gop: Convert GOP structures to typedef and clean up some types")
8d62af177812 ("efi/gop: Remove bogus packed attribute from GOP structures")
4911ee401b7c ("x86/efistub: Disable paging at mixed mode entry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7d866e38c7e9ece8a096d0d098fa9d92b9d4f97e Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 20 Oct 2022 10:39:09 +0200
Subject: [PATCH] efi: random: Use 'ACPI reclaim' memory for random seed
EFI runtime services data is guaranteed to be preserved by the OS,
making it a suitable candidate for the EFI random seed table, which may
be passed to kexec kernels as well (after refreshing the seed), and so
we need to ensure that the memory is preserved without support from the
OS itself.
However, runtime services data is intended for allocations that are
relevant to the implementations of the runtime services themselves, and
so they are unmapped from the kernel linear map, and mapped into the EFI
page tables that are active while runtime service invocations are in
progress. None of this is needed for the RNG seed.
So let's switch to EFI 'ACPI reclaim' memory: in spite of the name,
there is nothing exclusively ACPI about it, it is simply a type of
allocation that carries firmware provided data which may or may not be
relevant to the OS, and it is left up to the OS to decide whether to
reclaim it after having consumed its contents.
Given that in Linux, we never reclaim these allocations, it is a good
choice for the EFI RNG seed, as the allocation is guaranteed to survive
kexec reboots.
One additional reason for changing this now is to align it with the
upcoming recommendation for EFI bootloader provided RNG seeds, which
must not use EFI runtime services code/data allocations.
Cc: <stable(a)vger.kernel.org> # v4.14+
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c
index 24aa37535372..33ab56769595 100644
--- a/drivers/firmware/efi/libstub/random.c
+++ b/drivers/firmware/efi/libstub/random.c
@@ -75,7 +75,12 @@ efi_status_t efi_random_get_seed(void)
if (status != EFI_SUCCESS)
return status;
- status = efi_bs_call(allocate_pool, EFI_RUNTIME_SERVICES_DATA,
+ /*
+ * Use EFI_ACPI_RECLAIM_MEMORY here so that it is guaranteed that the
+ * allocation will survive a kexec reboot (although we refresh the seed
+ * beforehand)
+ */
+ status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY,
sizeof(*seed) + EFI_RANDOM_SEED_SIZE,
(void **)&seed);
if (status != EFI_SUCCESS)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
9fa248c65bdb ("fuse: fix readdir cache race")
5fe0fc9f1de6 ("fuse: use kmap_local_page()")
9ac29fd3f87f ("fuse: move ioctl to separate source file")
5d069dbe8aaf ("fuse: fix bad inode")
fcee216beb9c ("fuse: split fuse_mount off of fuse_conn")
8f622e9497bb ("fuse: drop fuse_conn parameter where possible")
24754db2728a ("fuse: store fuse_conn in fuse_req")
9a752d18c85a ("virtiofs: add logic to free up a memory range")
d0cfb9dcbca6 ("virtiofs: maintain a list of busy elements")
6ae330cad6ef ("virtiofs: serialize truncate/punch_hole and dax fault path")
2a9a609a0c4a ("virtiofs: add DAX mmap support")
c2d0ad00d948 ("virtiofs: implement dax read/write operations")
ceec02d4354a ("virtiofs: introduce setupmapping/removemapping commands")
fd1a1dc6f5aa ("virtiofs: implement FUSE_INIT map_alignment field")
45f2348eceb6 ("virtiofs: keep a list of free dax memory ranges")
1dd539577c42 ("virtiofs: add a mount option to enable dax")
f4fd4ae354ba ("virtiofs: get rid of no_mount_options")
31070f6ccec0 ("fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS")
69a6487ac0ea ("fuse: move rb_erase() before tree_insert()")
5b14671be58d ("Merge tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9fa248c65bdbf5af0a2f74dd38575acfc8dfd2bf Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Thu, 20 Oct 2022 17:18:58 +0200
Subject: [PATCH] fuse: fix readdir cache race
There's a race in fuse's readdir cache that can result in an uninitilized
page being read. The page lock is supposed to prevent this from happening
but in the following case it doesn't:
Two fuse_add_dirent_to_cache() start out and get the same parameters
(size=0,offset=0). One of them wins the race to create and lock the page,
after which it fills in data, sets rdc.size and unlocks the page.
In the meantime the page gets evicted from the cache before the other
instance gets to run. That one also creates the page, but finds the
size to be mismatched, bails out and leaves the uninitialized page in the
cache.
Fix by marking a filled page uptodate and ignoring non-uptodate pages.
Reported-by: Frank Sorenson <fsorenso(a)redhat.com>
Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache")
Cc: <stable(a)vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index b4e565711045..e8deaacf1832 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -77,8 +77,10 @@ static void fuse_add_dirent_to_cache(struct file *file,
goto unlock;
addr = kmap_local_page(page);
- if (!offset)
+ if (!offset) {
clear_page(addr);
+ SetPageUptodate(page);
+ }
memcpy(addr + offset, dirent, reclen);
kunmap_local(addr);
fi->rdc.size = (index << PAGE_SHIFT) + offset + reclen;
@@ -516,6 +518,12 @@ static int fuse_readdir_cached(struct file *file, struct dir_context *ctx)
page = find_get_page_flags(file->f_mapping, index,
FGP_ACCESSED | FGP_LOCK);
+ /* Page gone missing, then re-added to cache, but not initialized? */
+ if (page && !PageUptodate(page)) {
+ unlock_page(page);
+ put_page(page);
+ page = NULL;
+ }
spin_lock(&fi->rdc.lock);
if (!page) {
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
9fa248c65bdb ("fuse: fix readdir cache race")
5fe0fc9f1de6 ("fuse: use kmap_local_page()")
9ac29fd3f87f ("fuse: move ioctl to separate source file")
5d069dbe8aaf ("fuse: fix bad inode")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9fa248c65bdbf5af0a2f74dd38575acfc8dfd2bf Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Thu, 20 Oct 2022 17:18:58 +0200
Subject: [PATCH] fuse: fix readdir cache race
There's a race in fuse's readdir cache that can result in an uninitilized
page being read. The page lock is supposed to prevent this from happening
but in the following case it doesn't:
Two fuse_add_dirent_to_cache() start out and get the same parameters
(size=0,offset=0). One of them wins the race to create and lock the page,
after which it fills in data, sets rdc.size and unlocks the page.
In the meantime the page gets evicted from the cache before the other
instance gets to run. That one also creates the page, but finds the
size to be mismatched, bails out and leaves the uninitialized page in the
cache.
Fix by marking a filled page uptodate and ignoring non-uptodate pages.
Reported-by: Frank Sorenson <fsorenso(a)redhat.com>
Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache")
Cc: <stable(a)vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index b4e565711045..e8deaacf1832 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -77,8 +77,10 @@ static void fuse_add_dirent_to_cache(struct file *file,
goto unlock;
addr = kmap_local_page(page);
- if (!offset)
+ if (!offset) {
clear_page(addr);
+ SetPageUptodate(page);
+ }
memcpy(addr + offset, dirent, reclen);
kunmap_local(addr);
fi->rdc.size = (index << PAGE_SHIFT) + offset + reclen;
@@ -516,6 +518,12 @@ static int fuse_readdir_cached(struct file *file, struct dir_context *ctx)
page = find_get_page_flags(file->f_mapping, index,
FGP_ACCESSED | FGP_LOCK);
+ /* Page gone missing, then re-added to cache, but not initialized? */
+ if (page && !PageUptodate(page)) {
+ unlock_page(page);
+ put_page(page);
+ page = NULL;
+ }
spin_lock(&fi->rdc.lock);
if (!page) {
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
9fa248c65bdb ("fuse: fix readdir cache race")
5fe0fc9f1de6 ("fuse: use kmap_local_page()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9fa248c65bdbf5af0a2f74dd38575acfc8dfd2bf Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Thu, 20 Oct 2022 17:18:58 +0200
Subject: [PATCH] fuse: fix readdir cache race
There's a race in fuse's readdir cache that can result in an uninitilized
page being read. The page lock is supposed to prevent this from happening
but in the following case it doesn't:
Two fuse_add_dirent_to_cache() start out and get the same parameters
(size=0,offset=0). One of them wins the race to create and lock the page,
after which it fills in data, sets rdc.size and unlocks the page.
In the meantime the page gets evicted from the cache before the other
instance gets to run. That one also creates the page, but finds the
size to be mismatched, bails out and leaves the uninitialized page in the
cache.
Fix by marking a filled page uptodate and ignoring non-uptodate pages.
Reported-by: Frank Sorenson <fsorenso(a)redhat.com>
Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache")
Cc: <stable(a)vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index b4e565711045..e8deaacf1832 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -77,8 +77,10 @@ static void fuse_add_dirent_to_cache(struct file *file,
goto unlock;
addr = kmap_local_page(page);
- if (!offset)
+ if (!offset) {
clear_page(addr);
+ SetPageUptodate(page);
+ }
memcpy(addr + offset, dirent, reclen);
kunmap_local(addr);
fi->rdc.size = (index << PAGE_SHIFT) + offset + reclen;
@@ -516,6 +518,12 @@ static int fuse_readdir_cached(struct file *file, struct dir_context *ctx)
page = find_get_page_flags(file->f_mapping, index,
FGP_ACCESSED | FGP_LOCK);
+ /* Page gone missing, then re-added to cache, but not initialized? */
+ if (page && !PageUptodate(page)) {
+ unlock_page(page);
+ put_page(page);
+ page = NULL;
+ }
spin_lock(&fi->rdc.lock);
if (!page) {
/*
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8f279fb00bb2 ("udp: advertise ipv6 udp support for msghdr::ubuf_info")
d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().")
21985f43376c ("udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8f279fb00bb29def9ac79e28c5d6d8e07d21f3fb Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Thu, 27 Oct 2022 00:25:56 +0100
Subject: [PATCH] udp: advertise ipv6 udp support for msghdr::ubuf_info
Mark udp ipv6 as supporting msghdr::ubuf_info. In the original commit
SOCK_SUPPORT_ZC was supposed to be set by a udp_init_sock() call from
udp6_init_sock(), but
d38afeec26ed4 ("tcp/udp: Call inet6_destroy_sock() in IPv6 ...")
removed it and so ipv6 udp misses the flag.
Cc: <stable(a)vger.kernel.org> # 6.0
Fixes: e993ffe3da4bc ("net: flag sockets supporting msghdr originated zerocopy")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 129ec5a9b0eb..bc65e5b7195b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -66,6 +66,7 @@ int udpv6_init_sock(struct sock *sk)
{
skb_queue_head_init(&udp_sk(sk)->reader_queue);
sk->sk_destruct = udpv6_destruct_sock;
+ set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
return 0;
}
If the intention is to overwrite the first NULL with a -1, s[strlen(s)]
is the first NULL, not s[strlen(s)+1].
Cc: Gabriel Krisman Bertazi <krisman(a)collabora.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
fs/unicode/mkutf8data.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/unicode/mkutf8data.c b/fs/unicode/mkutf8data.c
index bc1a7c8b5c8d..61800e0d3226 100644
--- a/fs/unicode/mkutf8data.c
+++ b/fs/unicode/mkutf8data.c
@@ -3194,7 +3194,7 @@ static int normalize_line(struct tree *tree)
/* Second test: length-limited string. */
s = buf2;
/* Replace NUL with a value that will cause an error if seen. */
- s[strlen(s) + 1] = -1;
+ s[strlen(s)] = -1;
t = buf3;
if (utf8cursor(&u8c, tree, s))
return -1;
--
2.38.1
This bug is marked as fixed by commit:
ext4: Avoid crash when inline data creation follows DIO write
But I can't find it in any tested tree for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and
new crashes with the same signature are ignored.
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
The read access to struct canxl_frame::len inside of a j1939 created
skbuff revealed a missing initialization of reserved and later filled
elements in struct can_frame.
This patch initializes the 8 byte CAN header with zero.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Cc: Oleksij Rempel <o.rempel(a)pengutronix.de>
Link: https://lore.kernel.org/linux-can/20221104052235.GA6474@pengutronix.de
Reported-by: syzbot+d168ec0caca4697e03b1(a)syzkaller.appspotmail.com
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Link: https://lore.kernel.org/all/20221104075000.105414-1-socketcan@hartkopp.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/j1939/main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
index 144c86b0e3ff..821d4ff303b3 100644
--- a/net/can/j1939/main.c
+++ b/net/can/j1939/main.c
@@ -336,6 +336,9 @@ int j1939_send_one(struct j1939_priv *priv, struct sk_buff *skb)
/* re-claim the CAN_HDR from the SKB */
cf = skb_push(skb, J1939_CAN_HDR);
+ /* initialize header structure */
+ memset(cf, 0, J1939_CAN_HDR);
+
/* make it a full can frame again */
skb_put(skb, J1939_CAN_FTR + (8 - dlc));
--
2.35.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0e792b89e680 ("ftrace: Fix use-after-free for dynamic ftrace_ops")
fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU")
6171a0310a06 ("ftrace/kallsyms: Have /proc/kallsyms show saved mod init functions")
aba4b5c22cba ("ftrace: Save module init functions kallsyms symbols for tracing")
3e234289f86b ("ftrace: Allow module init functions to be traced")
6cafbe159416 ("ftrace: Add a ftrace_free_mem() function for modules to use")
edb096e00724 ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled")
8c08f0d5c6fb ("ftrace: Have cached module filters be an active filter")
d7fbf8df7ca0 ("ftrace: Implement cached modules tracing on module load")
673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array")
d0ba52f1d764 ("ftrace: Add missing comment for FTRACE_OPS_FL_RCU")
04ec7bb642b7 ("tracing: Have the trace_array hold the list of registered func probes")
eee8ded131f1 ("ftrace: Have the function probes call their own function")
1ec3a81a0cf4 ("ftrace: Have each function probe use its own ftrace_ops")
d3d532d798c5 ("ftrace: Have unregister_ftrace_function_probe_func() return a value")
1a48df0041c2 ("ftrace: Remove data field from ftrace_func_probe structure")
02b77e2afb49 ("ftrace: Remove printing of data in showing of a function probe")
78f78e07d51e ("ftrace: Remove unused unregister_ftrace_function_probe_all() function")
0fe7e7e3f839 ("ftrace: Remove unused unregister_ftrace_function_probe() function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294 Mon Sep 17 00:00:00 2001
From: Li Huafei <lihuafei1(a)huawei.com>
Date: Thu, 3 Nov 2022 11:10:10 +0800
Subject: [PATCH] ftrace: Fix use-after-free for dynamic ftrace_ops
KASAN reported a use-after-free with ftrace ops [1]. It was found from
vmcore that perf had registered two ops with the same content
successively, both dynamic. After unregistering the second ops, a
use-after-free occurred.
In ftrace_shutdown(), when the second ops is unregistered, the
FTRACE_UPDATE_CALLS command is not set because there is another enabled
ops with the same content. Also, both ops are dynamic and the ftrace
callback function is ftrace_ops_list_func, so the
FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value
of 'command' will be 0 and ftrace_shutdown() will skip the rcu
synchronization.
However, ftrace may be activated. When the ops is released, another CPU
may be accessing the ops. Add the missing synchronization to fix this
problem.
[1]
BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468
CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b4/0x248 lib/dump_stack.c:118
print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387
__kasan_report mm/kasan/report.c:547 [inline]
kasan_report+0x118/0x210 mm/kasan/report.c:564
check_memory_region_inline mm/kasan/generic.c:187 [inline]
__asan_load8+0x98/0xc0 mm/kasan/generic.c:253
__ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
ftrace_graph_call+0x0/0x4
__might_sleep+0x8/0x100 include/linux/perf_event.h:1170
__might_fault mm/memory.c:5183 [inline]
__might_fault+0x58/0x70 mm/memory.c:5171
do_strncpy_from_user lib/strncpy_from_user.c:41 [inline]
strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139
getname_flags+0xb0/0x31c fs/namei.c:149
getname+0x2c/0x40 fs/namei.c:209
[...]
Allocated by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:479 [inline]
__kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449
kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493
kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950
kmalloc include/linux/slab.h:563 [inline]
kzalloc include/linux/slab.h:675 [inline]
perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
__arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723
[...]
Freed by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track+0x24/0x34 mm/kasan/common.c:56
kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358
__kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437
__kasan_slab_free mm/kasan/common.c:445 [inline]
kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446
slab_free_hook mm/slub.c:1569 [inline]
slab_free_freelist_hook mm/slub.c:1608 [inline]
slab_free mm/slub.c:3179 [inline]
kfree+0x12c/0xc10 mm/slub.c:4176
perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
[...]
Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei…
Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled")
Cc: stable(a)vger.kernel.org
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Li Huafei <lihuafei1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fbf2543111c0..7dc023641bf1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3028,18 +3028,8 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
command |= FTRACE_UPDATE_TRACE_FUNC;
}
- if (!command || !ftrace_enabled) {
- /*
- * If these are dynamic or per_cpu ops, they still
- * need their data freed. Since, function tracing is
- * not currently active, we can just free them
- * without synchronizing all CPUs.
- */
- if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
- goto free_ops;
-
- return 0;
- }
+ if (!command || !ftrace_enabled)
+ goto out;
/*
* If the ops uses a trampoline, then it needs to be
@@ -3076,6 +3066,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
removed_ops = NULL;
ops->flags &= ~FTRACE_OPS_FL_REMOVING;
+out:
/*
* Dynamic ops may be freed, we must make sure that all
* callers are done before leaving this function.
@@ -3103,7 +3094,6 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
if (IS_ENABLED(CONFIG_PREEMPTION))
synchronize_rcu_tasks();
- free_ops:
ftrace_trampoline_free(ops);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0e792b89e680 ("ftrace: Fix use-after-free for dynamic ftrace_ops")
fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU")
6171a0310a06 ("ftrace/kallsyms: Have /proc/kallsyms show saved mod init functions")
aba4b5c22cba ("ftrace: Save module init functions kallsyms symbols for tracing")
3e234289f86b ("ftrace: Allow module init functions to be traced")
6cafbe159416 ("ftrace: Add a ftrace_free_mem() function for modules to use")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294 Mon Sep 17 00:00:00 2001
From: Li Huafei <lihuafei1(a)huawei.com>
Date: Thu, 3 Nov 2022 11:10:10 +0800
Subject: [PATCH] ftrace: Fix use-after-free for dynamic ftrace_ops
KASAN reported a use-after-free with ftrace ops [1]. It was found from
vmcore that perf had registered two ops with the same content
successively, both dynamic. After unregistering the second ops, a
use-after-free occurred.
In ftrace_shutdown(), when the second ops is unregistered, the
FTRACE_UPDATE_CALLS command is not set because there is another enabled
ops with the same content. Also, both ops are dynamic and the ftrace
callback function is ftrace_ops_list_func, so the
FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value
of 'command' will be 0 and ftrace_shutdown() will skip the rcu
synchronization.
However, ftrace may be activated. When the ops is released, another CPU
may be accessing the ops. Add the missing synchronization to fix this
problem.
[1]
BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468
CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b4/0x248 lib/dump_stack.c:118
print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387
__kasan_report mm/kasan/report.c:547 [inline]
kasan_report+0x118/0x210 mm/kasan/report.c:564
check_memory_region_inline mm/kasan/generic.c:187 [inline]
__asan_load8+0x98/0xc0 mm/kasan/generic.c:253
__ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
ftrace_graph_call+0x0/0x4
__might_sleep+0x8/0x100 include/linux/perf_event.h:1170
__might_fault mm/memory.c:5183 [inline]
__might_fault+0x58/0x70 mm/memory.c:5171
do_strncpy_from_user lib/strncpy_from_user.c:41 [inline]
strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139
getname_flags+0xb0/0x31c fs/namei.c:149
getname+0x2c/0x40 fs/namei.c:209
[...]
Allocated by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:479 [inline]
__kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449
kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493
kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950
kmalloc include/linux/slab.h:563 [inline]
kzalloc include/linux/slab.h:675 [inline]
perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
__arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723
[...]
Freed by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track+0x24/0x34 mm/kasan/common.c:56
kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358
__kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437
__kasan_slab_free mm/kasan/common.c:445 [inline]
kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446
slab_free_hook mm/slub.c:1569 [inline]
slab_free_freelist_hook mm/slub.c:1608 [inline]
slab_free mm/slub.c:3179 [inline]
kfree+0x12c/0xc10 mm/slub.c:4176
perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
[...]
Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei…
Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled")
Cc: stable(a)vger.kernel.org
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Li Huafei <lihuafei1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fbf2543111c0..7dc023641bf1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3028,18 +3028,8 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
command |= FTRACE_UPDATE_TRACE_FUNC;
}
- if (!command || !ftrace_enabled) {
- /*
- * If these are dynamic or per_cpu ops, they still
- * need their data freed. Since, function tracing is
- * not currently active, we can just free them
- * without synchronizing all CPUs.
- */
- if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
- goto free_ops;
-
- return 0;
- }
+ if (!command || !ftrace_enabled)
+ goto out;
/*
* If the ops uses a trampoline, then it needs to be
@@ -3076,6 +3066,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
removed_ops = NULL;
ops->flags &= ~FTRACE_OPS_FL_REMOVING;
+out:
/*
* Dynamic ops may be freed, we must make sure that all
* callers are done before leaving this function.
@@ -3103,7 +3094,6 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
if (IS_ENABLED(CONFIG_PREEMPTION))
synchronize_rcu_tasks();
- free_ops:
ftrace_trampoline_free(ops);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0e792b89e680 ("ftrace: Fix use-after-free for dynamic ftrace_ops")
fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294 Mon Sep 17 00:00:00 2001
From: Li Huafei <lihuafei1(a)huawei.com>
Date: Thu, 3 Nov 2022 11:10:10 +0800
Subject: [PATCH] ftrace: Fix use-after-free for dynamic ftrace_ops
KASAN reported a use-after-free with ftrace ops [1]. It was found from
vmcore that perf had registered two ops with the same content
successively, both dynamic. After unregistering the second ops, a
use-after-free occurred.
In ftrace_shutdown(), when the second ops is unregistered, the
FTRACE_UPDATE_CALLS command is not set because there is another enabled
ops with the same content. Also, both ops are dynamic and the ftrace
callback function is ftrace_ops_list_func, so the
FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value
of 'command' will be 0 and ftrace_shutdown() will skip the rcu
synchronization.
However, ftrace may be activated. When the ops is released, another CPU
may be accessing the ops. Add the missing synchronization to fix this
problem.
[1]
BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468
CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b4/0x248 lib/dump_stack.c:118
print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387
__kasan_report mm/kasan/report.c:547 [inline]
kasan_report+0x118/0x210 mm/kasan/report.c:564
check_memory_region_inline mm/kasan/generic.c:187 [inline]
__asan_load8+0x98/0xc0 mm/kasan/generic.c:253
__ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
ftrace_graph_call+0x0/0x4
__might_sleep+0x8/0x100 include/linux/perf_event.h:1170
__might_fault mm/memory.c:5183 [inline]
__might_fault+0x58/0x70 mm/memory.c:5171
do_strncpy_from_user lib/strncpy_from_user.c:41 [inline]
strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139
getname_flags+0xb0/0x31c fs/namei.c:149
getname+0x2c/0x40 fs/namei.c:209
[...]
Allocated by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:479 [inline]
__kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449
kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493
kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950
kmalloc include/linux/slab.h:563 [inline]
kzalloc include/linux/slab.h:675 [inline]
perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
__arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723
[...]
Freed by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track+0x24/0x34 mm/kasan/common.c:56
kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358
__kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437
__kasan_slab_free mm/kasan/common.c:445 [inline]
kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446
slab_free_hook mm/slub.c:1569 [inline]
slab_free_freelist_hook mm/slub.c:1608 [inline]
slab_free mm/slub.c:3179 [inline]
kfree+0x12c/0xc10 mm/slub.c:4176
perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
[...]
Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei…
Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled")
Cc: stable(a)vger.kernel.org
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Li Huafei <lihuafei1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fbf2543111c0..7dc023641bf1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3028,18 +3028,8 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
command |= FTRACE_UPDATE_TRACE_FUNC;
}
- if (!command || !ftrace_enabled) {
- /*
- * If these are dynamic or per_cpu ops, they still
- * need their data freed. Since, function tracing is
- * not currently active, we can just free them
- * without synchronizing all CPUs.
- */
- if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
- goto free_ops;
-
- return 0;
- }
+ if (!command || !ftrace_enabled)
+ goto out;
/*
* If the ops uses a trampoline, then it needs to be
@@ -3076,6 +3066,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
removed_ops = NULL;
ops->flags &= ~FTRACE_OPS_FL_REMOVING;
+out:
/*
* Dynamic ops may be freed, we must make sure that all
* callers are done before leaving this function.
@@ -3103,7 +3094,6 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
if (IS_ENABLED(CONFIG_PREEMPTION))
synchronize_rcu_tasks();
- free_ops:
ftrace_trampoline_free(ops);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0e792b89e680 ("ftrace: Fix use-after-free for dynamic ftrace_ops")
fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294 Mon Sep 17 00:00:00 2001
From: Li Huafei <lihuafei1(a)huawei.com>
Date: Thu, 3 Nov 2022 11:10:10 +0800
Subject: [PATCH] ftrace: Fix use-after-free for dynamic ftrace_ops
KASAN reported a use-after-free with ftrace ops [1]. It was found from
vmcore that perf had registered two ops with the same content
successively, both dynamic. After unregistering the second ops, a
use-after-free occurred.
In ftrace_shutdown(), when the second ops is unregistered, the
FTRACE_UPDATE_CALLS command is not set because there is another enabled
ops with the same content. Also, both ops are dynamic and the ftrace
callback function is ftrace_ops_list_func, so the
FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value
of 'command' will be 0 and ftrace_shutdown() will skip the rcu
synchronization.
However, ftrace may be activated. When the ops is released, another CPU
may be accessing the ops. Add the missing synchronization to fix this
problem.
[1]
BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468
CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b4/0x248 lib/dump_stack.c:118
print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387
__kasan_report mm/kasan/report.c:547 [inline]
kasan_report+0x118/0x210 mm/kasan/report.c:564
check_memory_region_inline mm/kasan/generic.c:187 [inline]
__asan_load8+0x98/0xc0 mm/kasan/generic.c:253
__ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
ftrace_graph_call+0x0/0x4
__might_sleep+0x8/0x100 include/linux/perf_event.h:1170
__might_fault mm/memory.c:5183 [inline]
__might_fault+0x58/0x70 mm/memory.c:5171
do_strncpy_from_user lib/strncpy_from_user.c:41 [inline]
strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139
getname_flags+0xb0/0x31c fs/namei.c:149
getname+0x2c/0x40 fs/namei.c:209
[...]
Allocated by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:479 [inline]
__kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449
kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493
kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950
kmalloc include/linux/slab.h:563 [inline]
kzalloc include/linux/slab.h:675 [inline]
perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
__arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723
[...]
Freed by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track+0x24/0x34 mm/kasan/common.c:56
kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358
__kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437
__kasan_slab_free mm/kasan/common.c:445 [inline]
kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446
slab_free_hook mm/slub.c:1569 [inline]
slab_free_freelist_hook mm/slub.c:1608 [inline]
slab_free mm/slub.c:3179 [inline]
kfree+0x12c/0xc10 mm/slub.c:4176
perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
[...]
Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei…
Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled")
Cc: stable(a)vger.kernel.org
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Li Huafei <lihuafei1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fbf2543111c0..7dc023641bf1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3028,18 +3028,8 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
command |= FTRACE_UPDATE_TRACE_FUNC;
}
- if (!command || !ftrace_enabled) {
- /*
- * If these are dynamic or per_cpu ops, they still
- * need their data freed. Since, function tracing is
- * not currently active, we can just free them
- * without synchronizing all CPUs.
- */
- if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
- goto free_ops;
-
- return 0;
- }
+ if (!command || !ftrace_enabled)
+ goto out;
/*
* If the ops uses a trampoline, then it needs to be
@@ -3076,6 +3066,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
removed_ops = NULL;
ops->flags &= ~FTRACE_OPS_FL_REMOVING;
+out:
/*
* Dynamic ops may be freed, we must make sure that all
* callers are done before leaving this function.
@@ -3103,7 +3094,6 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
if (IS_ENABLED(CONFIG_PREEMPTION))
synchronize_rcu_tasks();
- free_ops:
ftrace_trampoline_free(ops);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
968b71583130 ("btrfs: fix tree mod log mishandling of reallocated nodes")
888dd183390d ("btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block()")
485df7555425 ("btrfs: always pin deleted leaves when there are active tree mod log users")
d3575156f662 ("btrfs: zoned: redirty released extent buffers")
169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
a94794d50d78 ("btrfs: zoned: calculate allocation offset for conventional zones")
08e11a3db098 ("btrfs: zoned: load zone's allocation offset")
1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator")
760f991f1428 ("btrfs: make attach_extent_buffer_page() handle subpage case")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
c0f0a9e71653 ("btrfs: introduce helper to grab an existing extent buffer from a page")
997e3e2e71b3 ("btrfs: only mark bg->needs_free_space if free space tree is on")
12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode")
862931c76327 ("btrfs: introduce max_zone_append_size")
b70f509774ad ("btrfs: check and enable ZONED mode")
5b316468983d ("btrfs: get zone information of zoned block devices")
fb22e9c4cd57 ("btrfs: use detach_page_private() in alloc_extent_buffer()")
bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
0d01e247a06b ("btrfs: assert page mapping lock in attach_extent_buffer_page")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 968b71583130b6104c9f33ba60446d598e327a8b Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 14 Oct 2022 08:52:46 -0400
Subject: [PATCH] btrfs: fix tree mod log mishandling of reallocated nodes
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677!
invalid opcode: 0000 [#1] SMP
RIP: 0010:tree_mod_log_rewind+0x1b4/0x200
RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293
RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000
RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00
RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001
R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00
FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
btrfs_get_old_root+0x12b/0x420
btrfs_search_old_slot+0x64/0x2f0
? tree_mod_log_oldest_root+0x3d/0xf0
resolve_indirect_ref+0xfd/0x660
? ulist_alloc+0x31/0x60
? kmem_cache_alloc_trace+0x114/0x2c0
find_parent_nodes+0x97a/0x17e0
? ulist_alloc+0x30/0x60
btrfs_find_all_roots_safe+0x97/0x150
iterate_extent_inodes+0x154/0x370
? btrfs_search_path_in_tree+0x240/0x240
iterate_inodes_from_logical+0x98/0xd0
? btrfs_search_path_in_tree+0x240/0x240
btrfs_ioctl_logical_to_ino+0xd9/0x180
btrfs_ioctl+0xe2/0x2ec0
? __mod_memcg_lruvec_state+0x3d/0x280
? do_sys_openat2+0x6d/0x140
? kretprobe_dispatcher+0x47/0x70
? kretprobe_rethook_handler+0x38/0x50
? rethook_trampoline_handler+0x82/0x140
? arch_rethook_trampoline_callback+0x3b/0x50
? kmem_cache_free+0xfb/0x270
? do_sys_openat2+0xd5/0x140
__x64_sys_ioctl+0x71/0xb0
do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) {
case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and
when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot,
starting at 0. 'n' here is the number of items in this block, which in
this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for
a block that shouldn't have been replayed. Consider the following
sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq().
2. We begin removing items from this root, triggering KEY_REPLACE for
it's child slots.
3. We remove one of the 2 children this root node points to, thus triggering
the root node promotion of the remaining child, and freeing this node.
4. We modify a new root, and re-allocate the above node to the root node of
this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2
logical 0 op KEY_REMOVE (slot 1) seq 3
logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4
logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7
logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
>From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root.
2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is
currently logical 0.
3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which
gives us the KEY_REPLACE seq 2, and since that's not a
LOG_ROOT_REPLACE we incorrectly believe that we don't have an old
root, because we expect that the most recent change should be a
LOG_ROOT_REPLACE.
4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE,
so we don't set old_root, we simply use our existing extent buffer.
5. Since we're using our existing extent buffer (logical 0) we call
tree_mod_log_search(0) in order to get the newest change to start the
rewind from, which ends up being the LOG_ROOT_REPLACE at seq 8.
6. Again since we didn't find an old_root we simply clone logical 0 at
it's current state.
7. We call tree_mod_log_rewind() with the cloned extent buffer.
8. Set n = btrfs_header_nritems(logical 0), which would be whatever the
original nritems was when we COWed the original root, say for this
example it's 2.
9. We start from the newest operation and work our way forward, so we
see LOG_ROOT_REPLACE which we ignore.
10. Next we see KEY_REMOVE_WHILE_FREEING for slot 0, which triggers the
BUG_ON(tm->slot < n), because it expects if we've done this we have a
completely empty extent buffer to replay completely.
The correct thing would be to find the first LOG_ROOT_REPLACE, and then
get the old_root set to logical 8192. In fact making that change fixes
this particular problem.
However consider the much more complicated case. We have a child node
in this tree and the above situation. In the above case we freed one
of the child blocks at the seq 3 operation. If this block was also
re-allocated and got new tree mod log operations we would have a
different problem. btrfs_search_old_slot(orig root) would get down to
the logical 0 root that still pointed at that node. However in
btrfs_search_old_slot() we call tree_mod_log_rewind(buf) directly. This
is not context aware enough to know which operations we should be
replaying. If the block was re-allocated multiple times we may only
want to replay a range of operations, and determining what that range is
isn't possible to determine.
We could maybe solve this by keeping track of which root the node
belonged to at every tree mod log operation, and then passing this
around to make sure we're only replaying operations that relate to the
root we're trying to rewind.
However there's a simpler way to solve this problem, simply disallow
reallocations if we have currently running tree mod log users. We
already do this for leaf's, so we're simply expanding this to nodes as
well. This is a relatively uncommon occurrence, and the problem is
complicated enough I'm worried that we will still have corner cases in
the reallocation case. So fix this in the most straightforward way
possible.
Fixes: bd989ba359f2 ("Btrfs: add tree modification log functions")
CC: stable(a)vger.kernel.org # 3.3+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cd2d36580f1a..2801c991814f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3295,21 +3295,22 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
}
/*
- * If this is a leaf and there are tree mod log users, we may
- * have recorded mod log operations that point to this leaf.
- * So we must make sure no one reuses this leaf's extent before
- * mod log operations are applied to a node, otherwise after
- * rewinding a node using the mod log operations we get an
- * inconsistent btree, as the leaf's extent may now be used as
- * a node or leaf for another different btree.
+ * If there are tree mod log users we may have recorded mod log
+ * operations for this node. If we re-allocate this node we
+ * could replay operations on this node that happened when it
+ * existed in a completely different root. For example if it
+ * was part of root A, then was reallocated to root B, and we
+ * are doing a btrfs_old_search_slot(root b), we could replay
+ * operations that happened when the block was part of root A,
+ * giving us an inconsistent view of the btree.
+ *
* We are safe from races here because at this point no other
* node or root points to this extent buffer, so if after this
- * check a new tree mod log user joins, it will not be able to
- * find a node pointing to this leaf and record operations that
- * point to this leaf.
+ * check a new tree mod log user joins we will not have an
+ * existing log of operations on this node that we have to
+ * contend with.
*/
- if (btrfs_header_level(buf) == 0 &&
- test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
+ if (test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
must_pin = true;
if (must_pin || btrfs_is_zoned(fs_info)) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
968b71583130 ("btrfs: fix tree mod log mishandling of reallocated nodes")
888dd183390d ("btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block()")
485df7555425 ("btrfs: always pin deleted leaves when there are active tree mod log users")
d3575156f662 ("btrfs: zoned: redirty released extent buffers")
169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
a94794d50d78 ("btrfs: zoned: calculate allocation offset for conventional zones")
08e11a3db098 ("btrfs: zoned: load zone's allocation offset")
1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator")
760f991f1428 ("btrfs: make attach_extent_buffer_page() handle subpage case")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
c0f0a9e71653 ("btrfs: introduce helper to grab an existing extent buffer from a page")
997e3e2e71b3 ("btrfs: only mark bg->needs_free_space if free space tree is on")
12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode")
862931c76327 ("btrfs: introduce max_zone_append_size")
b70f509774ad ("btrfs: check and enable ZONED mode")
5b316468983d ("btrfs: get zone information of zoned block devices")
fb22e9c4cd57 ("btrfs: use detach_page_private() in alloc_extent_buffer()")
bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
0d01e247a06b ("btrfs: assert page mapping lock in attach_extent_buffer_page")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 968b71583130b6104c9f33ba60446d598e327a8b Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 14 Oct 2022 08:52:46 -0400
Subject: [PATCH] btrfs: fix tree mod log mishandling of reallocated nodes
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677!
invalid opcode: 0000 [#1] SMP
RIP: 0010:tree_mod_log_rewind+0x1b4/0x200
RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293
RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000
RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00
RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001
R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00
FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
btrfs_get_old_root+0x12b/0x420
btrfs_search_old_slot+0x64/0x2f0
? tree_mod_log_oldest_root+0x3d/0xf0
resolve_indirect_ref+0xfd/0x660
? ulist_alloc+0x31/0x60
? kmem_cache_alloc_trace+0x114/0x2c0
find_parent_nodes+0x97a/0x17e0
? ulist_alloc+0x30/0x60
btrfs_find_all_roots_safe+0x97/0x150
iterate_extent_inodes+0x154/0x370
? btrfs_search_path_in_tree+0x240/0x240
iterate_inodes_from_logical+0x98/0xd0
? btrfs_search_path_in_tree+0x240/0x240
btrfs_ioctl_logical_to_ino+0xd9/0x180
btrfs_ioctl+0xe2/0x2ec0
? __mod_memcg_lruvec_state+0x3d/0x280
? do_sys_openat2+0x6d/0x140
? kretprobe_dispatcher+0x47/0x70
? kretprobe_rethook_handler+0x38/0x50
? rethook_trampoline_handler+0x82/0x140
? arch_rethook_trampoline_callback+0x3b/0x50
? kmem_cache_free+0xfb/0x270
? do_sys_openat2+0xd5/0x140
__x64_sys_ioctl+0x71/0xb0
do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) {
case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and
when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot,
starting at 0. 'n' here is the number of items in this block, which in
this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for
a block that shouldn't have been replayed. Consider the following
sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq().
2. We begin removing items from this root, triggering KEY_REPLACE for
it's child slots.
3. We remove one of the 2 children this root node points to, thus triggering
the root node promotion of the remaining child, and freeing this node.
4. We modify a new root, and re-allocate the above node to the root node of
this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2
logical 0 op KEY_REMOVE (slot 1) seq 3
logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4
logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7
logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
>From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root.
2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is
currently logical 0.
3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which
gives us the KEY_REPLACE seq 2, and since that's not a
LOG_ROOT_REPLACE we incorrectly believe that we don't have an old
root, because we expect that the most recent change should be a
LOG_ROOT_REPLACE.
4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE,
so we don't set old_root, we simply use our existing extent buffer.
5. Since we're using our existing extent buffer (logical 0) we call
tree_mod_log_search(0) in order to get the newest change to start the
rewind from, which ends up being the LOG_ROOT_REPLACE at seq 8.
6. Again since we didn't find an old_root we simply clone logical 0 at
it's current state.
7. We call tree_mod_log_rewind() with the cloned extent buffer.
8. Set n = btrfs_header_nritems(logical 0), which would be whatever the
original nritems was when we COWed the original root, say for this
example it's 2.
9. We start from the newest operation and work our way forward, so we
see LOG_ROOT_REPLACE which we ignore.
10. Next we see KEY_REMOVE_WHILE_FREEING for slot 0, which triggers the
BUG_ON(tm->slot < n), because it expects if we've done this we have a
completely empty extent buffer to replay completely.
The correct thing would be to find the first LOG_ROOT_REPLACE, and then
get the old_root set to logical 8192. In fact making that change fixes
this particular problem.
However consider the much more complicated case. We have a child node
in this tree and the above situation. In the above case we freed one
of the child blocks at the seq 3 operation. If this block was also
re-allocated and got new tree mod log operations we would have a
different problem. btrfs_search_old_slot(orig root) would get down to
the logical 0 root that still pointed at that node. However in
btrfs_search_old_slot() we call tree_mod_log_rewind(buf) directly. This
is not context aware enough to know which operations we should be
replaying. If the block was re-allocated multiple times we may only
want to replay a range of operations, and determining what that range is
isn't possible to determine.
We could maybe solve this by keeping track of which root the node
belonged to at every tree mod log operation, and then passing this
around to make sure we're only replaying operations that relate to the
root we're trying to rewind.
However there's a simpler way to solve this problem, simply disallow
reallocations if we have currently running tree mod log users. We
already do this for leaf's, so we're simply expanding this to nodes as
well. This is a relatively uncommon occurrence, and the problem is
complicated enough I'm worried that we will still have corner cases in
the reallocation case. So fix this in the most straightforward way
possible.
Fixes: bd989ba359f2 ("Btrfs: add tree modification log functions")
CC: stable(a)vger.kernel.org # 3.3+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cd2d36580f1a..2801c991814f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3295,21 +3295,22 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
}
/*
- * If this is a leaf and there are tree mod log users, we may
- * have recorded mod log operations that point to this leaf.
- * So we must make sure no one reuses this leaf's extent before
- * mod log operations are applied to a node, otherwise after
- * rewinding a node using the mod log operations we get an
- * inconsistent btree, as the leaf's extent may now be used as
- * a node or leaf for another different btree.
+ * If there are tree mod log users we may have recorded mod log
+ * operations for this node. If we re-allocate this node we
+ * could replay operations on this node that happened when it
+ * existed in a completely different root. For example if it
+ * was part of root A, then was reallocated to root B, and we
+ * are doing a btrfs_old_search_slot(root b), we could replay
+ * operations that happened when the block was part of root A,
+ * giving us an inconsistent view of the btree.
+ *
* We are safe from races here because at this point no other
* node or root points to this extent buffer, so if after this
- * check a new tree mod log user joins, it will not be able to
- * find a node pointing to this leaf and record operations that
- * point to this leaf.
+ * check a new tree mod log user joins we will not have an
+ * existing log of operations on this node that we have to
+ * contend with.
*/
- if (btrfs_header_level(buf) == 0 &&
- test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
+ if (test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
must_pin = true;
if (must_pin || btrfs_is_zoned(fs_info)) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
968b71583130 ("btrfs: fix tree mod log mishandling of reallocated nodes")
888dd183390d ("btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block()")
485df7555425 ("btrfs: always pin deleted leaves when there are active tree mod log users")
d3575156f662 ("btrfs: zoned: redirty released extent buffers")
169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
a94794d50d78 ("btrfs: zoned: calculate allocation offset for conventional zones")
08e11a3db098 ("btrfs: zoned: load zone's allocation offset")
1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator")
760f991f1428 ("btrfs: make attach_extent_buffer_page() handle subpage case")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
c0f0a9e71653 ("btrfs: introduce helper to grab an existing extent buffer from a page")
997e3e2e71b3 ("btrfs: only mark bg->needs_free_space if free space tree is on")
12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode")
862931c76327 ("btrfs: introduce max_zone_append_size")
b70f509774ad ("btrfs: check and enable ZONED mode")
5b316468983d ("btrfs: get zone information of zoned block devices")
fb22e9c4cd57 ("btrfs: use detach_page_private() in alloc_extent_buffer()")
bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
0d01e247a06b ("btrfs: assert page mapping lock in attach_extent_buffer_page")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 968b71583130b6104c9f33ba60446d598e327a8b Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 14 Oct 2022 08:52:46 -0400
Subject: [PATCH] btrfs: fix tree mod log mishandling of reallocated nodes
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677!
invalid opcode: 0000 [#1] SMP
RIP: 0010:tree_mod_log_rewind+0x1b4/0x200
RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293
RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000
RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00
RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001
R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00
FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
btrfs_get_old_root+0x12b/0x420
btrfs_search_old_slot+0x64/0x2f0
? tree_mod_log_oldest_root+0x3d/0xf0
resolve_indirect_ref+0xfd/0x660
? ulist_alloc+0x31/0x60
? kmem_cache_alloc_trace+0x114/0x2c0
find_parent_nodes+0x97a/0x17e0
? ulist_alloc+0x30/0x60
btrfs_find_all_roots_safe+0x97/0x150
iterate_extent_inodes+0x154/0x370
? btrfs_search_path_in_tree+0x240/0x240
iterate_inodes_from_logical+0x98/0xd0
? btrfs_search_path_in_tree+0x240/0x240
btrfs_ioctl_logical_to_ino+0xd9/0x180
btrfs_ioctl+0xe2/0x2ec0
? __mod_memcg_lruvec_state+0x3d/0x280
? do_sys_openat2+0x6d/0x140
? kretprobe_dispatcher+0x47/0x70
? kretprobe_rethook_handler+0x38/0x50
? rethook_trampoline_handler+0x82/0x140
? arch_rethook_trampoline_callback+0x3b/0x50
? kmem_cache_free+0xfb/0x270
? do_sys_openat2+0xd5/0x140
__x64_sys_ioctl+0x71/0xb0
do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) {
case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and
when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot,
starting at 0. 'n' here is the number of items in this block, which in
this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for
a block that shouldn't have been replayed. Consider the following
sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq().
2. We begin removing items from this root, triggering KEY_REPLACE for
it's child slots.
3. We remove one of the 2 children this root node points to, thus triggering
the root node promotion of the remaining child, and freeing this node.
4. We modify a new root, and re-allocate the above node to the root node of
this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2
logical 0 op KEY_REMOVE (slot 1) seq 3
logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4
logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7
logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
>From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root.
2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is
currently logical 0.
3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which
gives us the KEY_REPLACE seq 2, and since that's not a
LOG_ROOT_REPLACE we incorrectly believe that we don't have an old
root, because we expect that the most recent change should be a
LOG_ROOT_REPLACE.
4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE,
so we don't set old_root, we simply use our existing extent buffer.
5. Since we're using our existing extent buffer (logical 0) we call
tree_mod_log_search(0) in order to get the newest change to start the
rewind from, which ends up being the LOG_ROOT_REPLACE at seq 8.
6. Again since we didn't find an old_root we simply clone logical 0 at
it's current state.
7. We call tree_mod_log_rewind() with the cloned extent buffer.
8. Set n = btrfs_header_nritems(logical 0), which would be whatever the
original nritems was when we COWed the original root, say for this
example it's 2.
9. We start from the newest operation and work our way forward, so we
see LOG_ROOT_REPLACE which we ignore.
10. Next we see KEY_REMOVE_WHILE_FREEING for slot 0, which triggers the
BUG_ON(tm->slot < n), because it expects if we've done this we have a
completely empty extent buffer to replay completely.
The correct thing would be to find the first LOG_ROOT_REPLACE, and then
get the old_root set to logical 8192. In fact making that change fixes
this particular problem.
However consider the much more complicated case. We have a child node
in this tree and the above situation. In the above case we freed one
of the child blocks at the seq 3 operation. If this block was also
re-allocated and got new tree mod log operations we would have a
different problem. btrfs_search_old_slot(orig root) would get down to
the logical 0 root that still pointed at that node. However in
btrfs_search_old_slot() we call tree_mod_log_rewind(buf) directly. This
is not context aware enough to know which operations we should be
replaying. If the block was re-allocated multiple times we may only
want to replay a range of operations, and determining what that range is
isn't possible to determine.
We could maybe solve this by keeping track of which root the node
belonged to at every tree mod log operation, and then passing this
around to make sure we're only replaying operations that relate to the
root we're trying to rewind.
However there's a simpler way to solve this problem, simply disallow
reallocations if we have currently running tree mod log users. We
already do this for leaf's, so we're simply expanding this to nodes as
well. This is a relatively uncommon occurrence, and the problem is
complicated enough I'm worried that we will still have corner cases in
the reallocation case. So fix this in the most straightforward way
possible.
Fixes: bd989ba359f2 ("Btrfs: add tree modification log functions")
CC: stable(a)vger.kernel.org # 3.3+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cd2d36580f1a..2801c991814f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3295,21 +3295,22 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
}
/*
- * If this is a leaf and there are tree mod log users, we may
- * have recorded mod log operations that point to this leaf.
- * So we must make sure no one reuses this leaf's extent before
- * mod log operations are applied to a node, otherwise after
- * rewinding a node using the mod log operations we get an
- * inconsistent btree, as the leaf's extent may now be used as
- * a node or leaf for another different btree.
+ * If there are tree mod log users we may have recorded mod log
+ * operations for this node. If we re-allocate this node we
+ * could replay operations on this node that happened when it
+ * existed in a completely different root. For example if it
+ * was part of root A, then was reallocated to root B, and we
+ * are doing a btrfs_old_search_slot(root b), we could replay
+ * operations that happened when the block was part of root A,
+ * giving us an inconsistent view of the btree.
+ *
* We are safe from races here because at this point no other
* node or root points to this extent buffer, so if after this
- * check a new tree mod log user joins, it will not be able to
- * find a node pointing to this leaf and record operations that
- * point to this leaf.
+ * check a new tree mod log user joins we will not have an
+ * existing log of operations on this node that we have to
+ * contend with.
*/
- if (btrfs_header_level(buf) == 0 &&
- test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
+ if (test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
must_pin = true;
if (must_pin || btrfs_is_zoned(fs_info)) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
968b71583130 ("btrfs: fix tree mod log mishandling of reallocated nodes")
888dd183390d ("btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block()")
485df7555425 ("btrfs: always pin deleted leaves when there are active tree mod log users")
d3575156f662 ("btrfs: zoned: redirty released extent buffers")
169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
a94794d50d78 ("btrfs: zoned: calculate allocation offset for conventional zones")
08e11a3db098 ("btrfs: zoned: load zone's allocation offset")
1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator")
760f991f1428 ("btrfs: make attach_extent_buffer_page() handle subpage case")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
c0f0a9e71653 ("btrfs: introduce helper to grab an existing extent buffer from a page")
997e3e2e71b3 ("btrfs: only mark bg->needs_free_space if free space tree is on")
12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode")
862931c76327 ("btrfs: introduce max_zone_append_size")
b70f509774ad ("btrfs: check and enable ZONED mode")
5b316468983d ("btrfs: get zone information of zoned block devices")
fb22e9c4cd57 ("btrfs: use detach_page_private() in alloc_extent_buffer()")
bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
0d01e247a06b ("btrfs: assert page mapping lock in attach_extent_buffer_page")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 968b71583130b6104c9f33ba60446d598e327a8b Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 14 Oct 2022 08:52:46 -0400
Subject: [PATCH] btrfs: fix tree mod log mishandling of reallocated nodes
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677!
invalid opcode: 0000 [#1] SMP
RIP: 0010:tree_mod_log_rewind+0x1b4/0x200
RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293
RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000
RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00
RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001
R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00
FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
btrfs_get_old_root+0x12b/0x420
btrfs_search_old_slot+0x64/0x2f0
? tree_mod_log_oldest_root+0x3d/0xf0
resolve_indirect_ref+0xfd/0x660
? ulist_alloc+0x31/0x60
? kmem_cache_alloc_trace+0x114/0x2c0
find_parent_nodes+0x97a/0x17e0
? ulist_alloc+0x30/0x60
btrfs_find_all_roots_safe+0x97/0x150
iterate_extent_inodes+0x154/0x370
? btrfs_search_path_in_tree+0x240/0x240
iterate_inodes_from_logical+0x98/0xd0
? btrfs_search_path_in_tree+0x240/0x240
btrfs_ioctl_logical_to_ino+0xd9/0x180
btrfs_ioctl+0xe2/0x2ec0
? __mod_memcg_lruvec_state+0x3d/0x280
? do_sys_openat2+0x6d/0x140
? kretprobe_dispatcher+0x47/0x70
? kretprobe_rethook_handler+0x38/0x50
? rethook_trampoline_handler+0x82/0x140
? arch_rethook_trampoline_callback+0x3b/0x50
? kmem_cache_free+0xfb/0x270
? do_sys_openat2+0xd5/0x140
__x64_sys_ioctl+0x71/0xb0
do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) {
case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and
when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot,
starting at 0. 'n' here is the number of items in this block, which in
this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for
a block that shouldn't have been replayed. Consider the following
sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq().
2. We begin removing items from this root, triggering KEY_REPLACE for
it's child slots.
3. We remove one of the 2 children this root node points to, thus triggering
the root node promotion of the remaining child, and freeing this node.
4. We modify a new root, and re-allocate the above node to the root node of
this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2
logical 0 op KEY_REMOVE (slot 1) seq 3
logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4
logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7
logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
>From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root.
2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is
currently logical 0.
3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which
gives us the KEY_REPLACE seq 2, and since that's not a
LOG_ROOT_REPLACE we incorrectly believe that we don't have an old
root, because we expect that the most recent change should be a
LOG_ROOT_REPLACE.
4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE,
so we don't set old_root, we simply use our existing extent buffer.
5. Since we're using our existing extent buffer (logical 0) we call
tree_mod_log_search(0) in order to get the newest change to start the
rewind from, which ends up being the LOG_ROOT_REPLACE at seq 8.
6. Again since we didn't find an old_root we simply clone logical 0 at
it's current state.
7. We call tree_mod_log_rewind() with the cloned extent buffer.
8. Set n = btrfs_header_nritems(logical 0), which would be whatever the
original nritems was when we COWed the original root, say for this
example it's 2.
9. We start from the newest operation and work our way forward, so we
see LOG_ROOT_REPLACE which we ignore.
10. Next we see KEY_REMOVE_WHILE_FREEING for slot 0, which triggers the
BUG_ON(tm->slot < n), because it expects if we've done this we have a
completely empty extent buffer to replay completely.
The correct thing would be to find the first LOG_ROOT_REPLACE, and then
get the old_root set to logical 8192. In fact making that change fixes
this particular problem.
However consider the much more complicated case. We have a child node
in this tree and the above situation. In the above case we freed one
of the child blocks at the seq 3 operation. If this block was also
re-allocated and got new tree mod log operations we would have a
different problem. btrfs_search_old_slot(orig root) would get down to
the logical 0 root that still pointed at that node. However in
btrfs_search_old_slot() we call tree_mod_log_rewind(buf) directly. This
is not context aware enough to know which operations we should be
replaying. If the block was re-allocated multiple times we may only
want to replay a range of operations, and determining what that range is
isn't possible to determine.
We could maybe solve this by keeping track of which root the node
belonged to at every tree mod log operation, and then passing this
around to make sure we're only replaying operations that relate to the
root we're trying to rewind.
However there's a simpler way to solve this problem, simply disallow
reallocations if we have currently running tree mod log users. We
already do this for leaf's, so we're simply expanding this to nodes as
well. This is a relatively uncommon occurrence, and the problem is
complicated enough I'm worried that we will still have corner cases in
the reallocation case. So fix this in the most straightforward way
possible.
Fixes: bd989ba359f2 ("Btrfs: add tree modification log functions")
CC: stable(a)vger.kernel.org # 3.3+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cd2d36580f1a..2801c991814f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3295,21 +3295,22 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
}
/*
- * If this is a leaf and there are tree mod log users, we may
- * have recorded mod log operations that point to this leaf.
- * So we must make sure no one reuses this leaf's extent before
- * mod log operations are applied to a node, otherwise after
- * rewinding a node using the mod log operations we get an
- * inconsistent btree, as the leaf's extent may now be used as
- * a node or leaf for another different btree.
+ * If there are tree mod log users we may have recorded mod log
+ * operations for this node. If we re-allocate this node we
+ * could replay operations on this node that happened when it
+ * existed in a completely different root. For example if it
+ * was part of root A, then was reallocated to root B, and we
+ * are doing a btrfs_old_search_slot(root b), we could replay
+ * operations that happened when the block was part of root A,
+ * giving us an inconsistent view of the btree.
+ *
* We are safe from races here because at this point no other
* node or root points to this extent buffer, so if after this
- * check a new tree mod log user joins, it will not be able to
- * find a node pointing to this leaf and record operations that
- * point to this leaf.
+ * check a new tree mod log user joins we will not have an
+ * existing log of operations on this node that we have to
+ * contend with.
*/
- if (btrfs_header_level(buf) == 0 &&
- test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
+ if (test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
must_pin = true;
if (must_pin || btrfs_is_zoned(fs_info)) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
968b71583130 ("btrfs: fix tree mod log mishandling of reallocated nodes")
888dd183390d ("btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block()")
485df7555425 ("btrfs: always pin deleted leaves when there are active tree mod log users")
d3575156f662 ("btrfs: zoned: redirty released extent buffers")
169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
a94794d50d78 ("btrfs: zoned: calculate allocation offset for conventional zones")
08e11a3db098 ("btrfs: zoned: load zone's allocation offset")
1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator")
760f991f1428 ("btrfs: make attach_extent_buffer_page() handle subpage case")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
c0f0a9e71653 ("btrfs: introduce helper to grab an existing extent buffer from a page")
997e3e2e71b3 ("btrfs: only mark bg->needs_free_space if free space tree is on")
12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode")
862931c76327 ("btrfs: introduce max_zone_append_size")
b70f509774ad ("btrfs: check and enable ZONED mode")
5b316468983d ("btrfs: get zone information of zoned block devices")
fb22e9c4cd57 ("btrfs: use detach_page_private() in alloc_extent_buffer()")
bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
0d01e247a06b ("btrfs: assert page mapping lock in attach_extent_buffer_page")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 968b71583130b6104c9f33ba60446d598e327a8b Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 14 Oct 2022 08:52:46 -0400
Subject: [PATCH] btrfs: fix tree mod log mishandling of reallocated nodes
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677!
invalid opcode: 0000 [#1] SMP
RIP: 0010:tree_mod_log_rewind+0x1b4/0x200
RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293
RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000
RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00
RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001
R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00
FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
btrfs_get_old_root+0x12b/0x420
btrfs_search_old_slot+0x64/0x2f0
? tree_mod_log_oldest_root+0x3d/0xf0
resolve_indirect_ref+0xfd/0x660
? ulist_alloc+0x31/0x60
? kmem_cache_alloc_trace+0x114/0x2c0
find_parent_nodes+0x97a/0x17e0
? ulist_alloc+0x30/0x60
btrfs_find_all_roots_safe+0x97/0x150
iterate_extent_inodes+0x154/0x370
? btrfs_search_path_in_tree+0x240/0x240
iterate_inodes_from_logical+0x98/0xd0
? btrfs_search_path_in_tree+0x240/0x240
btrfs_ioctl_logical_to_ino+0xd9/0x180
btrfs_ioctl+0xe2/0x2ec0
? __mod_memcg_lruvec_state+0x3d/0x280
? do_sys_openat2+0x6d/0x140
? kretprobe_dispatcher+0x47/0x70
? kretprobe_rethook_handler+0x38/0x50
? rethook_trampoline_handler+0x82/0x140
? arch_rethook_trampoline_callback+0x3b/0x50
? kmem_cache_free+0xfb/0x270
? do_sys_openat2+0xd5/0x140
__x64_sys_ioctl+0x71/0xb0
do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) {
case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and
when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot,
starting at 0. 'n' here is the number of items in this block, which in
this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for
a block that shouldn't have been replayed. Consider the following
sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq().
2. We begin removing items from this root, triggering KEY_REPLACE for
it's child slots.
3. We remove one of the 2 children this root node points to, thus triggering
the root node promotion of the remaining child, and freeing this node.
4. We modify a new root, and re-allocate the above node to the root node of
this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2
logical 0 op KEY_REMOVE (slot 1) seq 3
logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4
logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7
logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
>From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root.
2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is
currently logical 0.
3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which
gives us the KEY_REPLACE seq 2, and since that's not a
LOG_ROOT_REPLACE we incorrectly believe that we don't have an old
root, because we expect that the most recent change should be a
LOG_ROOT_REPLACE.
4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE,
so we don't set old_root, we simply use our existing extent buffer.
5. Since we're using our existing extent buffer (logical 0) we call
tree_mod_log_search(0) in order to get the newest change to start the
rewind from, which ends up being the LOG_ROOT_REPLACE at seq 8.
6. Again since we didn't find an old_root we simply clone logical 0 at
it's current state.
7. We call tree_mod_log_rewind() with the cloned extent buffer.
8. Set n = btrfs_header_nritems(logical 0), which would be whatever the
original nritems was when we COWed the original root, say for this
example it's 2.
9. We start from the newest operation and work our way forward, so we
see LOG_ROOT_REPLACE which we ignore.
10. Next we see KEY_REMOVE_WHILE_FREEING for slot 0, which triggers the
BUG_ON(tm->slot < n), because it expects if we've done this we have a
completely empty extent buffer to replay completely.
The correct thing would be to find the first LOG_ROOT_REPLACE, and then
get the old_root set to logical 8192. In fact making that change fixes
this particular problem.
However consider the much more complicated case. We have a child node
in this tree and the above situation. In the above case we freed one
of the child blocks at the seq 3 operation. If this block was also
re-allocated and got new tree mod log operations we would have a
different problem. btrfs_search_old_slot(orig root) would get down to
the logical 0 root that still pointed at that node. However in
btrfs_search_old_slot() we call tree_mod_log_rewind(buf) directly. This
is not context aware enough to know which operations we should be
replaying. If the block was re-allocated multiple times we may only
want to replay a range of operations, and determining what that range is
isn't possible to determine.
We could maybe solve this by keeping track of which root the node
belonged to at every tree mod log operation, and then passing this
around to make sure we're only replaying operations that relate to the
root we're trying to rewind.
However there's a simpler way to solve this problem, simply disallow
reallocations if we have currently running tree mod log users. We
already do this for leaf's, so we're simply expanding this to nodes as
well. This is a relatively uncommon occurrence, and the problem is
complicated enough I'm worried that we will still have corner cases in
the reallocation case. So fix this in the most straightforward way
possible.
Fixes: bd989ba359f2 ("Btrfs: add tree modification log functions")
CC: stable(a)vger.kernel.org # 3.3+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cd2d36580f1a..2801c991814f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3295,21 +3295,22 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
}
/*
- * If this is a leaf and there are tree mod log users, we may
- * have recorded mod log operations that point to this leaf.
- * So we must make sure no one reuses this leaf's extent before
- * mod log operations are applied to a node, otherwise after
- * rewinding a node using the mod log operations we get an
- * inconsistent btree, as the leaf's extent may now be used as
- * a node or leaf for another different btree.
+ * If there are tree mod log users we may have recorded mod log
+ * operations for this node. If we re-allocate this node we
+ * could replay operations on this node that happened when it
+ * existed in a completely different root. For example if it
+ * was part of root A, then was reallocated to root B, and we
+ * are doing a btrfs_old_search_slot(root b), we could replay
+ * operations that happened when the block was part of root A,
+ * giving us an inconsistent view of the btree.
+ *
* We are safe from races here because at this point no other
* node or root points to this extent buffer, so if after this
- * check a new tree mod log user joins, it will not be able to
- * find a node pointing to this leaf and record operations that
- * point to this leaf.
+ * check a new tree mod log user joins we will not have an
+ * existing log of operations on this node that we have to
+ * contend with.
*/
- if (btrfs_header_level(buf) == 0 &&
- test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
+ if (test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags))
must_pin = true;
if (must_pin || btrfs_is_zoned(fs_info)) {
From: Filipe Manana <fdmanana(a)suse.com>
commit 8184620ae21213d51eaf2e0bd4186baacb928172 upstream.
When doing a direct IO write using a iocb with nowait and dsync set, we
end up not syncing the file once the write completes.
This is because we tell iomap to not call generic_write_sync(), which
would result in calling btrfs_sync_file(), in order to avoid a deadlock
since iomap can call it while we are holding the inode's lock and
btrfs_sync_file() needs to acquire the inode's lock. The deadlock happens
only if the write happens synchronously, when iomap_dio_rw() calls
iomap_dio_complete() before it returns. Instead we do the sync ourselves
at btrfs_do_write_iter().
For a nowait write however we can end up not doing the sync ourselves at
at btrfs_do_write_iter() because the write could have been queued, and
therefore we get -EIOCBQUEUED returned from iomap in such case. That makes
us skip the sync call at btrfs_do_write_iter(), as we don't do it for
any error returned from btrfs_direct_write(). We can't simply do the call
even if -EIOCBQUEUED is returned, since that would block the task waiting
for IO, both for the data since there are bios still in progress as well
as potentially blocking when joining a log transaction and when syncing
the log (writing log trees, super blocks, etc).
So let iomap do the sync call itself and in order to avoid deadlocks for
the case of synchronous writes (without nowait), use __iomap_dio_rw() and
have ourselves call iomap_dio_complete() after unlocking the inode.
A test case will later be sent for fstests, after this is fixed in Linus'
tree.
Fixes: 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes")
Reported-by: Марк Коренберг <socketpair(a)gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAEmTpZGRKbzc16fWPvxbr6AfFsQoLmz-Lcg-7O…
CC: stable(a)vger.kernel.org # 6.0+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
The commit in the Fixes tag was backported to 5.15 stable releases, so
this patch is meant for 5.15.x and was tested on top of 5.15.77.
fs/btrfs/file.c | 39 ++++++++++++++++-----------------------
1 file changed, 16 insertions(+), 23 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 1c597cd6c024..90934711dcf0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1906,7 +1906,6 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
{
- const bool is_sync_write = (iocb->ki_flags & IOCB_DSYNC);
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -1917,6 +1916,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
loff_t endbyte;
ssize_t err;
unsigned int ilock_flags = 0;
+ struct iomap_dio *dio;
if (iocb->ki_flags & IOCB_NOWAIT)
ilock_flags |= BTRFS_ILOCK_TRY;
@@ -1959,15 +1959,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
goto buffered;
}
- /*
- * We remove IOCB_DSYNC so that we don't deadlock when iomap_dio_rw()
- * calls generic_write_sync() (through iomap_dio_complete()), because
- * that results in calling fsync (btrfs_sync_file()) which will try to
- * lock the inode in exclusive/write mode.
- */
- if (is_sync_write)
- iocb->ki_flags &= ~IOCB_DSYNC;
-
/*
* The iov_iter can be mapped to the same file range we are writing to.
* If that's the case, then we will deadlock in the iomap code, because
@@ -1986,12 +1977,23 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* So here we disable page faults in the iov_iter and then retry if we
* got -EFAULT, faulting in the pages before the retry.
*/
-again:
from->nofault = true;
- err = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
- IOMAP_DIO_PARTIAL, written);
+ dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
+ IOMAP_DIO_PARTIAL, written);
from->nofault = false;
+ /*
+ * iomap_dio_complete() will call btrfs_sync_file() if we have a dsync
+ * iocb, and that needs to lock the inode. So unlock it before calling
+ * iomap_dio_complete() to avoid a deadlock.
+ */
+ btrfs_inode_unlock(inode, ilock_flags);
+
+ if (IS_ERR_OR_NULL(dio))
+ err = PTR_ERR_OR_ZERO(dio);
+ else
+ err = iomap_dio_complete(dio);
+
/* No increment (+=) because iomap returns a cumulative value. */
if (err > 0)
written = err;
@@ -2017,19 +2019,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
} else {
fault_in_iov_iter_readable(from, left);
prev_left = left;
- goto again;
+ goto relock;
}
}
- btrfs_inode_unlock(inode, ilock_flags);
-
- /*
- * Add back IOCB_DSYNC. Our caller, btrfs_file_write_iter(), will do
- * the fsync (call generic_write_sync()).
- */
- if (is_sync_write)
- iocb->ki_flags |= IOCB_DSYNC;
-
/* If 'err' is -ENOTBLK then it means we must fallback to buffered IO. */
if ((err < 0 && err != -ENOTBLK) || !iov_iter_count(from))
goto out;
--
2.34.1
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters
jumping them ahead of those characters that are pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
Cc: Gilles BULOZ <gilles.buloz(a)kontron.com>
drivers/tty/serial/8250/8250_port.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
--
2.30.2
Configure DMA to use 16B burst size with Elkhart Lake. This makes the
bus use more efficient and works around an issue which occurs with the
previously used 1B.
Fixes: 0a9410b981e9 ("serial: 8250_lpss: Enable DMA on Intel Elkhart Lake")
Cc: <stable(a)vger.kernel.org> # serial: 8250_lpss: Configure DMA also w/o DMA filter
Reported-by: Wentong Wu <wentong.wu(a)intel.com>
Co-developed-by: Srikanth Thokala <srikanth.thokala(a)intel.com>
Signed-off-by: Srikanth Thokala <srikanth.thokala(a)intel.com>
Co-developed-by: Aman Kumar <aman.kumar(a)intel.com>
Signed-off-by: Aman Kumar <aman.kumar(a)intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
I know the list of Co-dev-bys & Sob seems a bit odd for a oneliner.
The reason is that I cleaned up this from a more complex patch using
the earlier change that I authored myself so only this oneliner
remained in this patch.
---
drivers/tty/serial/8250/8250_lpss.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_lpss.c b/drivers/tty/serial/8250/8250_lpss.c
index ed281445a97d..e0b4e1446eac 100644
--- a/drivers/tty/serial/8250/8250_lpss.c
+++ b/drivers/tty/serial/8250/8250_lpss.c
@@ -174,6 +174,8 @@ static int ehl_serial_setup(struct lpss8250 *lpss, struct uart_port *port)
*/
up->dma = dma;
+ lpss->dma_maxburst = 16;
+
port->set_termios = dw8250_do_set_termios;
return 0;
--
2.30.2
DW UART sometimes triggers IIR_RDI during DMA Rx when IIR_RX_TIMEOUT
should have been triggered instead. Since IIR_RDI has higher priority
than IIR_RX_TIMEOUT, this causes the Rx to hang into interrupt loop.
The problem seems to occur at least with some combinations of
small-sized transfers (I've reproduced the problem on Elkhart Lake PSE
UARTs).
If there's already an on-going Rx DMA and IIR_RDI triggers, fall
graciously back to non-DMA Rx. That is, behave as if IIR_RX_TIMEOUT had
occurred.
8250_omap already considers IIR_RDI similar to this change so its
nothing unheard of.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Co-developed-by: Srikanth Thokala <srikanth.thokala(a)intel.com>
Signed-off-by: Srikanth Thokala <srikanth.thokala(a)intel.com>
Co-developed-by: Aman Kumar <aman.kumar(a)intel.com>
Signed-off-by: Aman Kumar <aman.kumar(a)intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/tty/serial/8250/8250_port.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fe8662cd9402..92dd18716169 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1897,6 +1897,10 @@ EXPORT_SYMBOL_GPL(serial8250_modem_status);
static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
{
switch (iir & 0x3f) {
+ case UART_IIR_RDI:
+ if (!up->dma->rx_running)
+ break;
+ fallthrough;
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
fallthrough;
--
2.30.2
On Tue, 01 Nov 2022, Rob Herring wrote:
> It's been a while since the last sync and Lee needs commit 73590342fc85
> ("libfdt: prevent integer overflow in fdt_next_tag").
>
> This adds the following commits from upstream:
>
> 55778a03df61 libfdt: tests: add get_next_tag_invalid_prop_len
> 73590342fc85 libfdt: prevent integer overflow in fdt_next_tag
> 035fb90d5375 libfdt: add fdt_get_property_by_offset_w helper
> 98a07006c48d Makefile: fix infinite recursion by dropping non-existent `%.output`
> a036cc7b0c10 Makefile: limit make re-execution to avoid infinite spin
> c6e92108bcd9 libdtc: remove duplicate judgments
> e37c25677dc9 Don't generate erroneous fixups from reference to path
> 50454658f2b5 libfdt: Don't mask fdt_get_name() returned error
> e64a204196c9 manual.txt: Follow README.md and remove Jon
> f508c83fe6f0 Update README in MANIFEST.in and setup.py to README.md
> c2ccf8a77dd2 Add description of Signed-off-by lines
> 90b9d9de42ca Split out information for contributors to CONTRIBUTING.md
> 0ee1d479b23a Remove Jon Loeliger from maintainers list
> b33a73c62c1c Convert README to README.md
> 7ad60734b1c1 Allow static building with meson
> fd9b8c96c780 Allow static building with make
> fda71da26e7f libfdt: Handle failed get_name() on BEGIN_NODE
> c7c7f17a83d5 Fix test script to run also on dash shell
> 01f23ffe1679 Add missing relref_merge test to meson test list
> ed310803ea89 pylibfdt: add FdtRo.get_path()
> c001fc01a43e pylibfdt: fix swig build in install
> 26c54f840d23 tests: add test cases for label-relative path references
> ec7986e682cf dtc: introduce label relative path references
> 651410e54cb9 util: introduce xstrndup helper
> 4048aed12b81 setup.py: fix out of tree build
> ff5afb96d0c0 Handle integer overflow in check_property_phandle_args()
> ca7294434309 README: Explain how to add a new API function
> c0c2e115f82e Fix a UB when fdt_get_string return null
> cd5f69cbc0d4 tests: setprop_inplace: use xstrdup instead of unchecked strdup
> a04f69025003 pylibfdt: add Property.as_*int*_array()
> 83102717d7c4 pylibfdt: add Property.as_stringlist()
> d152126bb029 Fix Python crash on getprop deallocation
> 17739b7ef510 Support 'r' format for printing raw bytes with fdtget
> 45f3d1a095dd libfdt: overlay: make overlay_get_target() public
> c19a4bafa514 libfdt: fix an incorrect integer promotion
> 1cc41b1c969f pylibfdt: Add packaging metadata
> db72398cd437 README: Update pylibfdt install instructions
> 383e148b70a4 pylibfdt: fix with Python 3.10
> 23b56cb7e189 pylibfdt: Move setup.py to the top level
> 69a760747d8d pylibfdt: Split setup.py author name and email
> 0b106a77dbdc pylibfdt: Use setuptools_scm for the version
> c691776ddb26 pylibfdt: Use setuptools instead of distutils
> 5216f3f1bbb7 libfdt: Add static lib to meson build
> 4eda2590f481 CI: Cirrus: bump used FreeBSD from 12.1 to 13.0
At least one of these patches fixes security concerns.
Could we also have this in Stable please?
> Signed-off-by: Rob Herring <robh(a)kernel.org>
> ---
> scripts/dtc/checks.c | 15 +++++++-----
> scripts/dtc/dtc-lexer.l | 2 +-
> scripts/dtc/dtc-parser.y | 13 ++++++++++
> scripts/dtc/libfdt/fdt.c | 20 +++++++++------
> scripts/dtc/libfdt/fdt.h | 4 +--
> scripts/dtc/libfdt/fdt_addresses.c | 2 +-
> scripts/dtc/libfdt/fdt_overlay.c | 29 ++++++----------------
> scripts/dtc/libfdt/fdt_ro.c | 2 +-
> scripts/dtc/libfdt/libfdt.h | 25 +++++++++++++++++++
> scripts/dtc/livetree.c | 39 +++++++++++++++++++++++++++---
> scripts/dtc/util.c | 15 ++++++++++--
> scripts/dtc/util.h | 4 ++-
> scripts/dtc/version_gen.h | 2 +-
> 13 files changed, 124 insertions(+), 48 deletions(-)
--
Lee Jones [李琼斯]
Hei ja miten voit?
Nimeni on rouva Evereen, lähetän tämän viestin suurella toivolla
välitön vastaus, koska minun on tehtävä uusi sydänleikkaus
tällä hetkellä huonokuntoinen ja vähäiset mahdollisuudet selviytyä.
Mutta ennen kuin minä
Tee toinen vaarallinen operaatio, annan sen sinulle
Minulla on 6 550 000 dollaria yhdysvaltalaisella pankkitilillä
sijoittamista, hallinnointia ja käyttöä varten
voittoa hyväntekeväisyysprojektin toteuttamiseen. Tarkoitan sairaiden auttamista
ja köyhät ovat viimeinen haluni maan päällä, sillä minulla ei ole niitä
keneltä perii rahaa.
Vastaa minulle nopeasti
terveisiä
Rouva Monika Evereen
Florida, Amerikan Yhdysvallat
=======================================================
Hi and how are you?
My name is Mrs. Evereen, I am sending this message with great hope for
an immediate response, as I have to undergo heart reoperation in my
current poor health with little chance of survival. But before I
undertake the second dangerous operation, I will give you the
$6,550,000 I have in my US bank account to invest well, manage and use
the profits to run a charity project for me. I count helping the sick
and the poor as my last wish on earth, because I have no one to
inherit money from.
Please give me a quick reply
regards
Mrs. Monika Evereen
Florida, United States of America
This bug is marked as fixed by commit:
net: core: netlink: add helper refcount dec and lock function
net: sched: add helper function to take reference to Qdisc
net: sched: extend Qdisc with rcu
net: sched: rename qdisc_destroy() to qdisc_put()
net: sched: use Qdisc rcu API instead of relying on rtnl lock
But I can't find it in any tested tree for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and
new crashes with the same signature are ignored.
Hi again,
after sending this out, I noticed this is only a problem in the stable
versions (starting from v6.0.3), as c09eb2e578eb1668bbc has been applied (as
03f148c159a250dd454) but not 0e253f7e558a3e250902 ("bpf: Return value in kprobe
get_func_ip only for entry address") which makes always use of get_entry_ip.
I therefore think, 0e253f7e558a3e250902 needs to be added to the stable v6.0
series as well as otherwise it can't be compiled with -Werror if
CONFIG_X6_KERNEL_IBT is set but CONFIG_FPROBE isn't.
- Jonas
On Thu, Nov 03, 2022 at 04:03:03PM +0100, Jonas Rabenstein wrote:
> Commit c09eb2e578eb1668bbc ("bpf: Adjust kprobe_multi entry_ip
> for CONFIG_X86_KERNEL_IBT") introduced the get_entry_ip() function.
> Depending on CONFIG_X86_KERNEL_IBT it is a static function or only
> a macro definition. The only user of this symbol so far is in
> kprobe_multi_link_handler() if CONFIG_FPROBE is enabled.
> If CONFIG_FROBE is not set, the symbol is not used and - depending
> on CONFIG_X86_KERNEL_IBT - a warning for get_entry_ip() is emitted.
> To solve this, the function should be marked as __maybe_unused.
>
> Signed-off-by: Jonas Rabenstein <rabenstein(a)cs.fau.de>
> ---
> kernel/trace/bpf_trace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f2d8d070d024..19131aae0bc3 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1032,7 +1032,7 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_tracing = {
> };
>
> #ifdef CONFIG_X86_KERNEL_IBT
> -static unsigned long get_entry_ip(unsigned long fentry_ip)
> +static unsigned long __maybe_unused get_entry_ip(unsigned long fentry_ip)
> {
> u32 instr;
>
> --
> 2.37.4
>
In commit 720c24192404 ("ANDROID: binder: change down_write to
down_read") binder assumed the mmap read lock is sufficient to protect
alloc->vma inside binder_update_page_range(). This used to be accurate
until commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
munmap"), which now downgrades the mmap_lock after detaching the vma
from the rbtree in munmap(). Then it proceeds to teardown and free the
vma with only the read lock held.
This means that accesses to alloc->vma in binder_update_page_range() now
will race with vm_area_free() in munmap() and can cause a UAF as shown
in the following KASAN trace:
==================================================================
BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
Read of size 8 at addr ffff16204ad00600 by task server/558
CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2a0
show_stack+0x18/0x2c
dump_stack+0xf8/0x164
print_address_description.constprop.0+0x9c/0x538
kasan_report+0x120/0x200
__asan_load8+0xa0/0xc4
vm_insert_page+0x7c/0x1f0
binder_update_page_range+0x278/0x50c
binder_alloc_new_buf+0x3f0/0xba0
binder_transaction+0x64c/0x3040
binder_thread_write+0x924/0x2020
binder_ioctl+0x1610/0x2e5c
__arm64_sys_ioctl+0xd4/0x120
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Allocated by task 559:
kasan_save_stack+0x38/0x6c
__kasan_kmalloc.constprop.0+0xe4/0xf0
kasan_slab_alloc+0x18/0x2c
kmem_cache_alloc+0x1b0/0x2d0
vm_area_alloc+0x28/0x94
mmap_region+0x378/0x920
do_mmap+0x3f0/0x600
vm_mmap_pgoff+0x150/0x17c
ksys_mmap_pgoff+0x284/0x2dc
__arm64_sys_mmap+0x84/0xa4
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Freed by task 560:
kasan_save_stack+0x38/0x6c
kasan_set_track+0x28/0x40
kasan_set_free_info+0x24/0x4c
__kasan_slab_free+0x100/0x164
kasan_slab_free+0x14/0x20
kmem_cache_free+0xc4/0x34c
vm_area_free+0x1c/0x2c
remove_vma+0x7c/0x94
__do_munmap+0x358/0x710
__vm_munmap+0xbc/0x130
__arm64_sys_munmap+0x4c/0x64
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
[...]
==================================================================
To prevent the race above, revert back to taking the mmap write lock
inside binder_update_page_range(). One might expect an increase of mmap
lock contention. However, binder already serializes these calls via top
level alloc->mutex. Also, there was no performance impact shown when
running the binder benchmark tests.
Note this patch is specific to stable branches 5.4 and 5.10. Since in
newer kernel releases binder no longer caches a pointer to the vma.
Instead, it has been refactored to use vma_lookup() which avoids the
issue described here. This switch was introduced in commit a43cfc87caaf
("android: binder: stop saving a pointer to the VMA").
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org> # 5.10.x
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yang Shi <yang.shi(a)linux.alibaba.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
drivers/android/binder_alloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 95ca4f934d28..a77ed66425f2 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -212,7 +212,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
mm = alloc->vma_vm_mm;
if (mm) {
- mmap_read_lock(mm);
+ mmap_write_lock(mm);
vma = alloc->vma;
}
@@ -270,7 +270,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
trace_binder_alloc_page_end(alloc, index);
}
if (mm) {
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
mmput(mm);
}
return 0;
@@ -303,7 +303,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
}
err_no_vma:
if (mm) {
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
mmput(mm);
}
return vma ? -ENOMEM : -ESRCH;
--
2.38.1.431.g37b22c650d-goog
From: Sascha Hauer <s.hauer(a)pengutronix.de>
commit 0fddf9ad06fd9f439f137139861556671673e31c upstream.
06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
though: It is calculated based on the maximum page read time, but the
timeout is also used for page write and block erase operations which
require orders of magnitude bigger timeouts.
Fix this by calculating busy_timeout_cycles from the maximum of
tBERS_max and tPROG_max.
This is for now the easiest and most obvious way to fix the driver.
There's room for improvements though: The NAND_OP_WAITRDY_INSTR tells us
the desired timeout for the current operation, so we could program the
timeout dynamically for each operation instead of setting a fixed
timeout. Also we could wire up the interrupt handler to actually detect
and forward timeouts occurred when waiting for the chip being ready.
As a sidenote I verified that the change in 06781a5026350 is really
correct. I wired up the interrupt handler in my tree and measured the
time between starting the operation and the timeout interrupt handler
coming in. The time increases 41us with each step in the timeout
register which corresponds to 4096 clock cycles with the 99MHz clock
that I have.
Fixes: 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
Fixes: b1206122069aa ("mtd: rawniand: gpmi: use core timings instead of an empirical derivation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Acked-by: Han Xu <han.xu(a)nxp.com>
Tested-by: Tomasz Moń <tomasz.mon(a)camlingroup.com>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
Signed-off-by: Tim Harvey <tharvey(a)gateworks.com>
---
v2: add sb tag and add upstream commit id
---
drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
index 92e8ca56f566..200d3ab343b0 100644
--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
@@ -653,8 +653,9 @@ static void gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
unsigned int tRP_ps;
bool use_half_period;
int sample_delay_ps, sample_delay_factor;
- u16 busy_timeout_cycles;
+ unsigned int busy_timeout_cycles;
u8 wrn_dly_sel;
+ u64 busy_timeout_ps;
if (sdr->tRC_min >= 30000) {
/* ONFI non-EDO modes [0-3] */
@@ -678,7 +679,8 @@ static void gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
addr_setup_cycles = TO_CYCLES(sdr->tALS_min, period_ps);
data_setup_cycles = TO_CYCLES(sdr->tDS_min, period_ps);
data_hold_cycles = TO_CYCLES(sdr->tDH_min, period_ps);
- busy_timeout_cycles = TO_CYCLES(sdr->tWB_max + sdr->tR_max, period_ps);
+ busy_timeout_ps = max(sdr->tBERS_max, sdr->tPROG_max);
+ busy_timeout_cycles = TO_CYCLES(busy_timeout_ps, period_ps);
hw->timing0 = BF_GPMI_TIMING0_ADDRESS_SETUP(addr_setup_cycles) |
BF_GPMI_TIMING0_DATA_HOLD(data_hold_cycles) |
--
2.25.1
The commit 3c52c6bb831f (tcp/udp: Fix memory leak in
ipv6_renew_options()) fixes a memory leak reported by syzbot. This seems
to be a good candidate for the stable trees. This patch didn't apply cleanly
in 5.10 kernel, since release_sock() calls are changed to
sockopt_release_sock() in the latest kernel versions.
Kuniyuki Iwashima (1):
tcp/udp: Fix memory leak in ipv6_renew_options().
net/ipv6/ipv6_sockglue.c | 7 +++++++
1 file changed, 7 insertions(+)
--
2.38.1.273.g43a17bfeac-goog
Hi,
Can you please backport the following commit to 5.15 stable?
Title: "af_unix: Fix memory leaks of the whole sk due to OOB skb."
Commit SHA: 7a62ed61367b8fd01bae1e18e30602c25060d824
This commit fixes "314001f0bf927015e459c9d387d62a231fe93af3" which
landed on "tags/v5.15-rc1~157^2~305".
Best,
Anil
Please consider backporting 181490d5321806e537dc5386db5ea640b826bf78
("block, bfq: protect 'bfqd->queued' by 'bfqd->lock'"), which was
merged in 5.19
applies cleanly to the lts trees I have (4.14 - 5.15), glancing at the
git history it looks like this bug has been present since bfq was
introduced in 4.14ish, so the above stables should be enough.
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
711f8c3fb3db ("Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM")
15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
571f739083e2 ("Bluetooth: Use separate L2CAP LE credit based connection result values")
96cd8eaa131f ("Bluetooth: L2CAP: Derive rx credits from MTU and MPS")
fe1493101ac1 ("Bluetooth: L2CAP: Derive MPS from connection MTU")
d8edd9ed156a ("Bluetooth: L2CAP: Fix L2CAP_CR_SCID_IN_USE value")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 711f8c3fb3db61897080468586b970c87c61d9e4 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Date: Mon, 31 Oct 2022 16:10:32 -0700
Subject: [PATCH] Bluetooth: L2CAP: Fix accepting connection request for
invalid SPSM
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Bluetooth spec states that the valid range for SPSM is from
0x0001-0x00ff so it is invalid to accept values outside of this range:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
page 1059:
Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
CVE: CVE-2022-42896
CC: stable(a)vger.kernel.org
Reported-by: Tamás Koczka <poprdi(a)google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Reviewed-by: Tedd Ho-Jeong An <tedd.an(a)intel.com>
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 1fbe087d6ae4..3eee915fb245 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -5813,6 +5813,19 @@ static int l2cap_le_connect_req(struct l2cap_conn *conn,
BT_DBG("psm 0x%2.2x scid 0x%4.4x mtu %u mps %u", __le16_to_cpu(psm),
scid, mtu, mps);
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
+ * page 1059:
+ *
+ * Valid range: 0x0001-0x00ff
+ *
+ * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
+ */
+ if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) {
+ result = L2CAP_CR_LE_BAD_PSM;
+ chan = NULL;
+ goto response;
+ }
+
/* Check if we have socket listening on psm */
pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src,
&conn->hcon->dst, LE_LINK);
@@ -6001,6 +6014,18 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn,
psm = req->psm;
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
+ * page 1059:
+ *
+ * Valid range: 0x0001-0x00ff
+ *
+ * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
+ */
+ if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) {
+ result = L2CAP_CR_LE_BAD_PSM;
+ goto response;
+ }
+
BT_DBG("psm 0x%2.2x mtu %u mps %u", __le16_to_cpu(psm), mtu, mps);
memset(&pdu, 0, sizeof(pdu));
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
711f8c3fb3db ("Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM")
15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
571f739083e2 ("Bluetooth: Use separate L2CAP LE credit based connection result values")
96cd8eaa131f ("Bluetooth: L2CAP: Derive rx credits from MTU and MPS")
fe1493101ac1 ("Bluetooth: L2CAP: Derive MPS from connection MTU")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 711f8c3fb3db61897080468586b970c87c61d9e4 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Date: Mon, 31 Oct 2022 16:10:32 -0700
Subject: [PATCH] Bluetooth: L2CAP: Fix accepting connection request for
invalid SPSM
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Bluetooth spec states that the valid range for SPSM is from
0x0001-0x00ff so it is invalid to accept values outside of this range:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
page 1059:
Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
CVE: CVE-2022-42896
CC: stable(a)vger.kernel.org
Reported-by: Tamás Koczka <poprdi(a)google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Reviewed-by: Tedd Ho-Jeong An <tedd.an(a)intel.com>
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 1fbe087d6ae4..3eee915fb245 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -5813,6 +5813,19 @@ static int l2cap_le_connect_req(struct l2cap_conn *conn,
BT_DBG("psm 0x%2.2x scid 0x%4.4x mtu %u mps %u", __le16_to_cpu(psm),
scid, mtu, mps);
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
+ * page 1059:
+ *
+ * Valid range: 0x0001-0x00ff
+ *
+ * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
+ */
+ if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) {
+ result = L2CAP_CR_LE_BAD_PSM;
+ chan = NULL;
+ goto response;
+ }
+
/* Check if we have socket listening on psm */
pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src,
&conn->hcon->dst, LE_LINK);
@@ -6001,6 +6014,18 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn,
psm = req->psm;
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
+ * page 1059:
+ *
+ * Valid range: 0x0001-0x00ff
+ *
+ * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
+ */
+ if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) {
+ result = L2CAP_CR_LE_BAD_PSM;
+ goto response;
+ }
+
BT_DBG("psm 0x%2.2x mtu %u mps %u", __le16_to_cpu(psm), mtu, mps);
memset(&pdu, 0, sizeof(pdu));
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
711f8c3fb3db ("Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM")
15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
571f739083e2 ("Bluetooth: Use separate L2CAP LE credit based connection result values")
96cd8eaa131f ("Bluetooth: L2CAP: Derive rx credits from MTU and MPS")
fe1493101ac1 ("Bluetooth: L2CAP: Derive MPS from connection MTU")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 711f8c3fb3db61897080468586b970c87c61d9e4 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Date: Mon, 31 Oct 2022 16:10:32 -0700
Subject: [PATCH] Bluetooth: L2CAP: Fix accepting connection request for
invalid SPSM
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Bluetooth spec states that the valid range for SPSM is from
0x0001-0x00ff so it is invalid to accept values outside of this range:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
page 1059:
Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
CVE: CVE-2022-42896
CC: stable(a)vger.kernel.org
Reported-by: Tamás Koczka <poprdi(a)google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Reviewed-by: Tedd Ho-Jeong An <tedd.an(a)intel.com>
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 1fbe087d6ae4..3eee915fb245 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -5813,6 +5813,19 @@ static int l2cap_le_connect_req(struct l2cap_conn *conn,
BT_DBG("psm 0x%2.2x scid 0x%4.4x mtu %u mps %u", __le16_to_cpu(psm),
scid, mtu, mps);
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
+ * page 1059:
+ *
+ * Valid range: 0x0001-0x00ff
+ *
+ * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
+ */
+ if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) {
+ result = L2CAP_CR_LE_BAD_PSM;
+ chan = NULL;
+ goto response;
+ }
+
/* Check if we have socket listening on psm */
pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src,
&conn->hcon->dst, LE_LINK);
@@ -6001,6 +6014,18 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn,
psm = req->psm;
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
+ * page 1059:
+ *
+ * Valid range: 0x0001-0x00ff
+ *
+ * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
+ */
+ if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) {
+ result = L2CAP_CR_LE_BAD_PSM;
+ goto response;
+ }
+
BT_DBG("psm 0x%2.2x mtu %u mps %u", __le16_to_cpu(psm), mtu, mps);
memset(&pdu, 0, sizeof(pdu));
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
711f8c3fb3db ("Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM")
15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 711f8c3fb3db61897080468586b970c87c61d9e4 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Date: Mon, 31 Oct 2022 16:10:32 -0700
Subject: [PATCH] Bluetooth: L2CAP: Fix accepting connection request for
invalid SPSM
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Bluetooth spec states that the valid range for SPSM is from
0x0001-0x00ff so it is invalid to accept values outside of this range:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
page 1059:
Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
CVE: CVE-2022-42896
CC: stable(a)vger.kernel.org
Reported-by: Tamás Koczka <poprdi(a)google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Reviewed-by: Tedd Ho-Jeong An <tedd.an(a)intel.com>
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 1fbe087d6ae4..3eee915fb245 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -5813,6 +5813,19 @@ static int l2cap_le_connect_req(struct l2cap_conn *conn,
BT_DBG("psm 0x%2.2x scid 0x%4.4x mtu %u mps %u", __le16_to_cpu(psm),
scid, mtu, mps);
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
+ * page 1059:
+ *
+ * Valid range: 0x0001-0x00ff
+ *
+ * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
+ */
+ if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) {
+ result = L2CAP_CR_LE_BAD_PSM;
+ chan = NULL;
+ goto response;
+ }
+
/* Check if we have socket listening on psm */
pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src,
&conn->hcon->dst, LE_LINK);
@@ -6001,6 +6014,18 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn,
psm = req->psm;
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
+ * page 1059:
+ *
+ * Valid range: 0x0001-0x00ff
+ *
+ * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
+ */
+ if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) {
+ result = L2CAP_CR_LE_BAD_PSM;
+ goto response;
+ }
+
BT_DBG("psm 0x%2.2x mtu %u mps %u", __le16_to_cpu(psm), mtu, mps);
memset(&pdu, 0, sizeof(pdu));
Hi Greg,
This 5.4.y backport series contains XFS fixes from v5.8. The patchset
has been acked by Darrick.
Brian Foster (1):
xfs: don't fail verifier on empty attr3 leaf block
Chuhong Yuan (1):
xfs: Add the missed xfs_perag_put() for xfs_ifree_cluster()
Darrick J. Wong (2):
xfs: use ordered buffers to initialize dquot buffers during quotacheck
xfs: don't fail unwritten extent conversion on writeback due to edquot
Dave Chinner (1):
xfs: gut error handling in xfs_trans_unreserve_and_mod_sb()
Eric Sandeen (1):
xfs: group quota should return EDQUOT when prj quota enabled
fs/xfs/libxfs/xfs_attr_leaf.c | 8 --
fs/xfs/libxfs/xfs_defer.c | 10 ++-
fs/xfs/xfs_dquot.c | 56 +++++++++---
fs/xfs/xfs_inode.c | 4 +-
fs/xfs/xfs_iomap.c | 2 +-
fs/xfs/xfs_trans.c | 163 +++++-----------------------------
fs/xfs/xfs_trans_dquot.c | 3 +-
7 files changed, 78 insertions(+), 168 deletions(-)
--
2.35.1
Please what time the acpi-bios error reported by the dmesg command will
be fixed, it's really annoying. I have never been able to eliminate such
reports, it's only to insert loglevel=3 in grub settings to block. But
it doesn't solve the problem, and I use google search , it's tells me
that fix this problem need a bios upgrade, but actually my hardware has
stopped upgrade, so I don't know how to do fix this, thanks
[ 0.160900] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.160900] ACPI: Added _OSI(Linux-Dell-Video)
[ 0.160900] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 0.160900] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 0.219876] ACPI BIOS Error (bug): Failure creating named object
[\_SB.PCI0.XHC.RHUB.TPLD], AE_ALREADY_EXISTS (20220331/dswload2-326)
[ 0.219885] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog
(20220331/psobject-220)
[ 0.219889] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0014)
[ 0.219944] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS01], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.219950] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
[ 0.219953] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0010)
[ 0.219983] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS02], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.219987] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
[ 0.219990] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0010)
[ 0.220019] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS03], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.220023] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
[ 0.220026] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0010)
[ 0.220055] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS04], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.220059] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
[ 0.220062] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0010)
[ 0.220090] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS05], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.220095] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
[ 0.220098] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0010)
[ 0.220126] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS06], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.220130] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
[ 0.220133] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0010)
[ 0.220162] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS07], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.220166] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
[ 0.220169] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0010)
[ 0.220197] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS08], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.220202] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
[ 0.220205] ACPI: Skipping parse of AML opcode: OpcodeName
unavailable (0x0010)
[ 0.220233] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.XHC.RHUB.HS09], AE_NOT_FOUND (20220331/dswload2-162)
[ 0.220237] ACPI Error: AE_NOT_FOUND, During name lookup/catalog
(20220331/psobject-220)
I urgently seek your service to represent me in investing in
your region / country and you will be rewarded for your service without
affecting your present job with very little time invested in it, which you will
be communicated in details upon response.
My dearest regards
Seyba Daniel