The quilt patch titled
Subject: nilfs2: fix missing cleanup on rollforward recovery error
has been removed from the -mm tree. Its filename was
nilfs2-fix-missing-cleanup-on-rollforward-recovery-error.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix missing cleanup on rollforward recovery error
Date: Sat, 10 Aug 2024 15:52:42 +0900
In an error injection test of a routine for mount-time recovery, KASAN
found a use-after-free bug.
It turned out that if data recovery was performed using partial logs
created by dsync writes, but an error occurred before starting the log
writer to create a recovered checkpoint, the inodes whose data had been
recovered were left in the ns_dirty_files list of the nilfs object and
were not freed.
Fix this issue by cleaning up inodes that have read the recovery data if
the recovery routine fails midway before the log writer starts.
Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com
Fixes: 0f3e1c7f23f8 ("nilfs2: recovery functions")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/recovery.c | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)
--- a/fs/nilfs2/recovery.c~nilfs2-fix-missing-cleanup-on-rollforward-recovery-error
+++ a/fs/nilfs2/recovery.c
@@ -716,6 +716,33 @@ static void nilfs_finish_roll_forward(st
}
/**
+ * nilfs_abort_roll_forward - cleaning up after a failed rollforward recovery
+ * @nilfs: nilfs object
+ */
+static void nilfs_abort_roll_forward(struct the_nilfs *nilfs)
+{
+ struct nilfs_inode_info *ii, *n;
+ LIST_HEAD(head);
+
+ /* Abandon inodes that have read recovery data */
+ spin_lock(&nilfs->ns_inode_lock);
+ list_splice_init(&nilfs->ns_dirty_files, &head);
+ spin_unlock(&nilfs->ns_inode_lock);
+ if (list_empty(&head))
+ return;
+
+ set_nilfs_purging(nilfs);
+ list_for_each_entry_safe(ii, n, &head, i_dirty) {
+ spin_lock(&nilfs->ns_inode_lock);
+ list_del_init(&ii->i_dirty);
+ spin_unlock(&nilfs->ns_inode_lock);
+
+ iput(&ii->vfs_inode);
+ }
+ clear_nilfs_purging(nilfs);
+}
+
+/**
* nilfs_salvage_orphan_logs - salvage logs written after the latest checkpoint
* @nilfs: nilfs object
* @sb: super block instance
@@ -773,15 +800,19 @@ int nilfs_salvage_orphan_logs(struct the
if (unlikely(err)) {
nilfs_err(sb, "error %d writing segment for recovery",
err);
- goto failed;
+ goto put_root;
}
nilfs_finish_roll_forward(nilfs, ri);
}
- failed:
+put_root:
nilfs_put_root(root);
return err;
+
+failed:
+ nilfs_abort_roll_forward(nilfs);
+ goto put_root;
}
/**
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-add-support-for-fs_ioc_getuuid.patch
nilfs2-add-support-for-fs_ioc_getfssysfspath.patch
nilfs2-add-support-for-fs_ioc_getfslabel.patch
nilfs2-add-support-for-fs_ioc_setfslabel.patch
nilfs2-do-not-output-warnings-when-clearing-dirty-buffers.patch
nilfs2-add-missing-argument-description-for-__nilfs_error.patch
nilfs2-add-missing-argument-descriptions-for-ioctl-related-helpers.patch
nilfs2-improve-kernel-doc-comments-for-b-tree-node-helpers.patch
nilfs2-fix-incorrect-kernel-doc-declaration-of-nilfs_palloc_req-structure.patch
nilfs2-add-missing-description-of-nilfs_btree_path-structure.patch
nilfs2-describe-the-members-of-nilfs_bmap_operations-structure.patch
nilfs2-fix-inconsistencies-in-kernel-doc-comments-in-segmenth.patch
nilfs2-fix-missing-initial-short-descriptions-of-kernel-doc-comments.patch
nilfs2-treat-missing-sufile-header-block-as-metadata-corruption.patch
nilfs2-treat-missing-cpfile-header-block-as-metadata-corruption.patch
nilfs2-do-not-propagate-enoent-error-from-sufile-during-recovery.patch
nilfs2-do-not-propagate-enoent-error-from-sufile-during-gc.patch
nilfs2-do-not-propagate-enoent-error-from-nilfs_sufile_mark_dirty.patch
nilfs2-use-the-bits_per_long-macro.patch
nilfs2-separate-inode-type-information-from-i_state-field.patch
nilfs2-eliminate-the-shared-counter-and-spinlock-for-i_generation.patch
nilfs2-do-not-repair-reserved-inode-bitmap-in-nilfs_new_inode.patch
nilfs2-remove-sc_timer_task.patch
nilfs2-use-kthread_create-and-kthread_stop-for-the-log-writer-thread.patch
nilfs2-refactor-nilfs_segctor_thread.patch
The quilt patch titled
Subject: userfaultfd: don't BUG_ON() if khugepaged yanks our page table
has been removed from the -mm tree. Its filename was
userfaultfd-dont-bug_on-if-khugepaged-yanks-our-page-table.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: userfaultfd: don't BUG_ON() if khugepaged yanks our page table
Date: Tue, 13 Aug 2024 22:25:22 +0200
Since khugepaged was changed to allow retracting page tables in file
mappings without holding the mmap lock, these BUG_ON()s are wrong - get
rid of them.
We could also remove the preceding "if (unlikely(...))" block, but then we
could reach pte_offset_map_lock() with transhuge pages not just for file
mappings but also for anonymous mappings - which would probably be fine
but I think is not necessarily expected.
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-2-5efa61078a41@goog…
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-dont-bug_on-if-khugepaged-yanks-our-page-table
+++ a/mm/userfaultfd.c
@@ -807,9 +807,10 @@ retry:
err = -EFAULT;
break;
}
-
- BUG_ON(pmd_none(*dst_pmd));
- BUG_ON(pmd_trans_huge(*dst_pmd));
+ /*
+ * For shmem mappings, khugepaged is allowed to remove page
+ * tables under us; pte_offset_map_lock() will deal with that.
+ */
err = mfill_atomic_pte(dst_pmd, dst_vma, dst_addr,
src_addr, flags, &folio);
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-fix-harmless-type-confusion-in-lock_vma_under_rcu.patch
The quilt patch titled
Subject: userfaultfd: fix checks for huge PMDs
has been removed from the -mm tree. Its filename was
userfaultfd-fix-checks-for-huge-pmds.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: userfaultfd: fix checks for huge PMDs
Date: Tue, 13 Aug 2024 22:25:21 +0200
Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:
1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
the right two race windows) - I've tested this in a kernel build with
some extra mdelay() calls. See the commit message for a description
of the race scenario.
On older kernels (before 6.5), I think the same bug can even
theoretically lead to accessing transhuge page contents as a page table
if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
detecting PMDs that don't point to page tables.
On older kernels (before 6.5), you'd just have to win a single fairly
wide race to hit this.
I've tested this on 6.1 stable by racing migration (with a mdelay()
patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
to yank page tables out from under us (though I haven't tested that),
so I think the BUG_ON() checks in mfill_atomic() are just wrong.
I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.
This patch (of 2):
This fixes two issues.
I discovered that the following race can occur:
mfill_atomic other thread
============ ============
<zap PMD>
pmdp_get_lockless() [reads none pmd]
<bail if trans_huge>
<if none:>
<pagefault creates transhuge zeropage>
__pte_alloc [no-op]
<zap PMD>
<bail if pmd_trans_huge(*dst_pmd)>
BUG_ON(pmd_none(*dst_pmd))
I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.
On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.
The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).
On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.
Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).
If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.
As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.
Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@goog…
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@goog…
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <jannh(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-fix-checks-for-huge-pmds
+++ a/mm/userfaultfd.c
@@ -787,21 +787,23 @@ retry:
}
dst_pmdval = pmdp_get_lockless(dst_pmd);
- /*
- * If the dst_pmd is mapped as THP don't
- * override it and just be strict.
- */
- if (unlikely(pmd_trans_huge(dst_pmdval))) {
- err = -EEXIST;
- break;
- }
if (unlikely(pmd_none(dst_pmdval)) &&
unlikely(__pte_alloc(dst_mm, dst_pmd))) {
err = -ENOMEM;
break;
}
- /* If an huge pmd materialized from under us fail */
- if (unlikely(pmd_trans_huge(*dst_pmd))) {
+ dst_pmdval = pmdp_get_lockless(dst_pmd);
+ /*
+ * If the dst_pmd is THP don't override it and just be strict.
+ * (This includes the case where the PMD used to be THP and
+ * changed back to none after __pte_alloc().)
+ */
+ if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
+ pmd_devmap(dst_pmdval))) {
+ err = -EEXIST;
+ break;
+ }
+ if (unlikely(pmd_bad(dst_pmdval))) {
err = -EFAULT;
break;
}
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-fix-harmless-type-confusion-in-lock_vma_under_rcu.patch
The quilt patch titled
Subject: mm: vmalloc: ensure vmap_block is initialised before adding to queue
has been removed from the -mm tree. Its filename was
mm-vmalloc-ensure-vmap_block-is-initialised-before-adding-to-queue.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Will Deacon <will(a)kernel.org>
Subject: mm: vmalloc: ensure vmap_block is initialised before adding to queue
Date: Mon, 12 Aug 2024 18:16:06 +0100
Commit 8c61291fd850 ("mm: fix incorrect vbq reference in
purge_fragmented_block") extended the 'vmap_block' structure to contain a
'cpu' field which is set at allocation time to the id of the initialising
CPU.
When a new 'vmap_block' is being instantiated by new_vmap_block(), the
partially initialised structure is added to the local 'vmap_block_queue'
xarray before the 'cpu' field has been initialised. If another CPU is
concurrently walking the xarray (e.g. via vm_unmap_aliases()), then it
may perform an out-of-bounds access to the remote queue thanks to an
uninitialised index.
This has been observed as UBSAN errors in Android:
| Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
|
| Call trace:
| purge_fragmented_block+0x204/0x21c
| _vm_unmap_aliases+0x170/0x378
| vm_unmap_aliases+0x1c/0x28
| change_memory_common+0x1dc/0x26c
| set_memory_ro+0x18/0x24
| module_enable_ro+0x98/0x238
| do_init_module+0x1b0/0x310
Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the
addition to the xarray.
Link: https://lkml.kernel.org/r/20240812171606.17486-1-will@kernel.org
Fixes: 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block")
Signed-off-by: Will Deacon <will(a)kernel.org>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: Hailong.Liu <hailong.liu(a)oppo.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmalloc.c~mm-vmalloc-ensure-vmap_block-is-initialised-before-adding-to-queue
+++ a/mm/vmalloc.c
@@ -2626,6 +2626,7 @@ static void *new_vmap_block(unsigned int
vb->dirty_max = 0;
bitmap_set(vb->used_map, 0, (1UL << order));
INIT_LIST_HEAD(&vb->free_list);
+ vb->cpu = raw_smp_processor_id();
xa = addr_to_vb_xa(va->va_start);
vb_idx = addr_to_vb_idx(va->va_start);
@@ -2642,7 +2643,6 @@ static void *new_vmap_block(unsigned int
* integrity together with list_for_each_rcu from read
* side.
*/
- vb->cpu = raw_smp_processor_id();
vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
_
Patches currently in -mm which might be from will(a)kernel.org are
The quilt patch titled
Subject: revert-mm-skip-cma-pages-when-they-are-not-available-update
has been removed from the -mm tree. Its filename was
revert-mm-skip-cma-pages-when-they-are-not-available-update.patch
This patch was dropped because it was folded into revert-mm-skip-cma-pages-when-they-are-not-available.patch
------------------------------------------------------
From: Usama Arif <usamaarif642(a)gmail.com>
Subject: revert-mm-skip-cma-pages-when-they-are-not-available-update
Date: Wed, 21 Aug 2024 20:26:07 +0100
also revert b7108d66318a ("Multi-gen LRU: skip CMA pages when they are not
eligible"), per Johannes
Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
Link: https://lkml.kernel.org/r/357ac325-4c61-497a-92a3-bdbd230d5ec9@gmail.com
Fixes: 5da226dbfce3 ("mm: skip CMA pages when they are not available")
Signed-off-by: Usama Arif <usamaarif642(a)gmail.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Bharata B Rao <bharata(a)amd.com>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Zhaoyang Huang <huangzhaoyang(a)gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 21 +--------------------
1 file changed, 1 insertion(+), 20 deletions(-)
--- a/mm/vmscan.c~revert-mm-skip-cma-pages-when-they-are-not-available-update
+++ a/mm/vmscan.c
@@ -4253,25 +4253,6 @@ void lru_gen_soft_reclaim(struct mem_cgr
#endif /* CONFIG_MEMCG */
-#ifdef CONFIG_CMA
-/*
- * It is waste of effort to scan and reclaim CMA pages if it is not available
- * for current allocation context. Kswapd can not be enrolled as it can not
- * distinguish this scenario by using sc->gfp_mask = GFP_KERNEL
- */
-static bool skip_cma(struct folio *folio, struct scan_control *sc)
-{
- return !current_is_kswapd() &&
- gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE &&
- folio_migratetype(folio) == MIGRATE_CMA;
-}
-#else
-static bool skip_cma(struct folio *folio, struct scan_control *sc)
-{
- return false;
-}
-#endif
-
/******************************************************************************
* the eviction
******************************************************************************/
@@ -4319,7 +4300,7 @@ static bool sort_folio(struct lruvec *lr
}
/* ineligible */
- if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
+ if (zone > sc->reclaim_idx) {
gen = folio_inc_gen(lruvec, folio, false);
list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
return true;
_
Patches currently in -mm which might be from usamaarif642(a)gmail.com are
revert-mm-skip-cma-pages-when-they-are-not-available.patch
mm-store-zero-pages-to-be-swapped-out-in-a-bitmap.patch
mm-remove-code-to-handle-same-filled-pages.patch
mm-introduce-a-pageflag-for-partially-mapped-folios.patch
mm-split-underused-thps.patch
mm-add-sysfs-entry-to-disable-splitting-underused-thps.patch
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 11bb2ffb679399f99041540cf662409905179e3a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090121-distress-disengage-545d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
11bb2ffb6793 ("usb: typec: ucsi: Move unregister out of atomic section")
584e8df58942 ("usb: typec: ucsi: extract common code for command handling")
e1870c17e550 ("usb: typec: ucsi: inline ucsi_read_message_in")
5e9c1662a89b ("usb: typec: ucsi: rework command execution functions")
467399d989d7 ("usb: typec: ucsi: split read operation")
13f2ec3115c8 ("usb: typec: ucsi: simplify command sending API")
a7d2fa776976 ("usb: typec: ucsi: move ucsi_acknowledge() from ucsi_read_error()")
f7697db8b1b3 ("Merge 6.10-rc6 into usb-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 11bb2ffb679399f99041540cf662409905179e3a Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <quic_bjorande(a)quicinc.com>
Date: Tue, 20 Aug 2024 13:29:31 -0700
Subject: [PATCH] usb: typec: ucsi: Move unregister out of atomic section
Commit '9329933699b3 ("soc: qcom: pmic_glink: Make client-lock
non-sleeping")' moved the pmic_glink client list under a spinlock, as it
is accessed by the rpmsg/glink callback, which in turn is invoked from
IRQ context.
This means that ucsi_unregister() is now called from atomic context,
which isn't feasible as it's expecting a sleepable context. An effort is
under way to get GLINK to invoke its callbacks in a sleepable context,
but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen
after the remote processor, and thereby the communication link with it, is
gone. pmic_glink_send() is amended with a check to avoid the resulting NULL
pointer dereference.
This does however result in the user being informed about this error by
the following entry in the kernel log:
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: 9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping")
Cc: stable(a)vger.kernel.org
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Reviewed-by: Neil Armstrong <neil.armstrong(a)linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Tested-by: Amit Pundir <amit.pundir(a)linaro.org>
Reviewed-by: Johan Hovold <johan+linaro(a)kernel.org>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Bjorn Andersson <quic_bjorande(a)quicinc.com>
Link: https://lore.kernel.org/r/20240820-pmic-glink-v6-11-races-v3-2-eec53c750a04…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index 53b176d04fbd..b218460219b7 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_client_register);
int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len)
{
struct pmic_glink *pg = client->pg;
+ int ret;
- return rpmsg_send(pg->ept, data, len);
+ mutex_lock(&pg->state_lock);
+ if (!pg->ept)
+ ret = -ECONNRESET;
+ else
+ ret = rpmsg_send(pg->ept, data, len);
+ mutex_unlock(&pg->state_lock);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(pmic_glink_send);
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c
index f6f4fae40399..6aace19d595b 100644
--- a/drivers/usb/typec/ucsi/ucsi_glink.c
+++ b/drivers/usb/typec/ucsi/ucsi_glink.c
@@ -68,6 +68,9 @@ struct pmic_glink_ucsi {
struct work_struct notify_work;
struct work_struct register_work;
+ spinlock_t state_lock;
+ bool ucsi_registered;
+ bool pd_running;
u8 read_buf[UCSI_BUF_SIZE];
};
@@ -244,8 +247,20 @@ static void pmic_glink_ucsi_notify(struct work_struct *work)
static void pmic_glink_ucsi_register(struct work_struct *work)
{
struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
+ unsigned long flags;
+ bool pd_running;
- ucsi_register(ucsi->ucsi);
+ spin_lock_irqsave(&ucsi->state_lock, flags);
+ pd_running = ucsi->pd_running;
+ spin_unlock_irqrestore(&ucsi->state_lock, flags);
+
+ if (!ucsi->ucsi_registered && pd_running) {
+ ucsi_register(ucsi->ucsi);
+ ucsi->ucsi_registered = true;
+ } else if (ucsi->ucsi_registered && !pd_running) {
+ ucsi_unregister(ucsi->ucsi);
+ ucsi->ucsi_registered = false;
+ }
}
static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv)
@@ -269,11 +284,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv)
static void pmic_glink_ucsi_pdr_notify(void *priv, int state)
{
struct pmic_glink_ucsi *ucsi = priv;
+ unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP)
- schedule_work(&ucsi->register_work);
- else if (state == SERVREG_SERVICE_STATE_DOWN)
- ucsi_unregister(ucsi->ucsi);
+ spin_lock_irqsave(&ucsi->state_lock, flags);
+ ucsi->pd_running = (state == SERVREG_SERVICE_STATE_UP);
+ spin_unlock_irqrestore(&ucsi->state_lock, flags);
+ schedule_work(&ucsi->register_work);
}
static void pmic_glink_ucsi_destroy(void *data)
@@ -320,6 +336,7 @@ static int pmic_glink_ucsi_probe(struct auxiliary_device *adev,
INIT_WORK(&ucsi->register_work, pmic_glink_ucsi_register);
init_completion(&ucsi->read_ack);
init_completion(&ucsi->write_ack);
+ spin_lock_init(&ucsi->state_lock);
mutex_init(&ucsi->lock);
ucsi->ucsi = ucsi_create(dev, &pmic_glink_ucsi_ops);
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: imx335: Fix reset-gpio handling
Author: Umang Jain <umang.jain(a)ideasonboard.com>
Date: Fri Aug 30 11:41:52 2024 +0530
Rectify the logical value of reset-gpio so that it is set to
0 (disabled) during power-on and to 1 (enabled) during power-off.
Set the reset-gpio to GPIO_OUT_HIGH at initialization time to make
sure it starts off in reset. Also drop the "Set XCLR" comment which
is not-so-informative.
The existing usage of imx335 had reset-gpios polarity inverted
(GPIO_ACTIVE_HIGH) in their device-tree sources. With this patch
included, those DTS will not be able to stream imx335 anymore. The
reset-gpio polarity will need to be rectified in the device-tree
sources as shown in [1] example, in order to get imx335 functional
again (as it remains in reset prior to this fix).
Cc: stable(a)vger.kernel.org
Fixes: 45d19b5fb9ae ("media: i2c: Add imx335 camera sensor driver")
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Link: https://lore.kernel.org/linux-media/20240729110437.199428-1-umang.jain@idea…
Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/i2c/imx335.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
---
diff --git a/drivers/media/i2c/imx335.c b/drivers/media/i2c/imx335.c
index 990d74214cc2..54a1de53d497 100644
--- a/drivers/media/i2c/imx335.c
+++ b/drivers/media/i2c/imx335.c
@@ -997,7 +997,7 @@ static int imx335_parse_hw_config(struct imx335 *imx335)
/* Request optional reset pin */
imx335->reset_gpio = devm_gpiod_get_optional(imx335->dev, "reset",
- GPIOD_OUT_LOW);
+ GPIOD_OUT_HIGH);
if (IS_ERR(imx335->reset_gpio)) {
dev_err(imx335->dev, "failed to get reset gpio %ld\n",
PTR_ERR(imx335->reset_gpio));
@@ -1110,8 +1110,7 @@ static int imx335_power_on(struct device *dev)
usleep_range(500, 550); /* Tlow */
- /* Set XCLR */
- gpiod_set_value_cansleep(imx335->reset_gpio, 1);
+ gpiod_set_value_cansleep(imx335->reset_gpio, 0);
ret = clk_prepare_enable(imx335->inclk);
if (ret) {
@@ -1124,7 +1123,7 @@ static int imx335_power_on(struct device *dev)
return 0;
error_reset:
- gpiod_set_value_cansleep(imx335->reset_gpio, 0);
+ gpiod_set_value_cansleep(imx335->reset_gpio, 1);
regulator_bulk_disable(ARRAY_SIZE(imx335_supply_name), imx335->supplies);
return ret;
@@ -1141,7 +1140,7 @@ static int imx335_power_off(struct device *dev)
struct v4l2_subdev *sd = dev_get_drvdata(dev);
struct imx335 *imx335 = to_imx335(sd);
- gpiod_set_value_cansleep(imx335->reset_gpio, 0);
+ gpiod_set_value_cansleep(imx335->reset_gpio, 1);
clk_disable_unprepare(imx335->inclk);
regulator_bulk_disable(ARRAY_SIZE(imx335_supply_name), imx335->supplies);
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: ov5675: Fix power on/off delay timings
Author: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Date: Sat Jul 13 23:33:29 2024 +0100
The ov5675 specification says that the gap between XSHUTDN deassert and the
first I2C transaction should be a minimum of 8192 XVCLK cycles.
Right now we use a usleep_rage() that gives a sleep time of between about
430 and 860 microseconds.
On the Lenovo X13s we have observed that in about 1/20 cases the current
timing is too tight and we start transacting before the ov5675's reset
cycle completes, leading to I2C bus transaction failures.
The reset racing is sometimes triggered at initial chip probe but, more
usually on a subsequent power-off/power-on cycle e.g.
[ 71.451662] ov5675 24-0010: failed to write reg 0x0103. error = -5
[ 71.451686] ov5675 24-0010: failed to set plls
The current quiescence period we have is too tight. Instead of expressing
the post reset delay in terms of the current XVCLK this patch converts the
power-on and power-off delays to the maximum theoretical delay @ 6 MHz with
an additional buffer.
1.365 milliseconds on the power-on path is 1.5 milliseconds with grace.
85.3 microseconds on the power-off path is 90 microseconds with grace.
Fixes: 49d9ad719e89 ("media: ov5675: add device-tree support and support runtime PM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
Reviewed-by: Quentin Schulz <quentin.schulz(a)cherry.de>
Tested-by: Quentin Schulz <quentin.schulz(a)cherry.de> # RK3399 Puma with
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/i2c/ov5675.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
---
diff --git a/drivers/media/i2c/ov5675.c b/drivers/media/i2c/ov5675.c
index 3641911bc73f..5b5127f8953f 100644
--- a/drivers/media/i2c/ov5675.c
+++ b/drivers/media/i2c/ov5675.c
@@ -972,12 +972,10 @@ static int ov5675_set_stream(struct v4l2_subdev *sd, int enable)
static int ov5675_power_off(struct device *dev)
{
- /* 512 xvclk cycles after the last SCCB transation or MIPI frame end */
- u32 delay_us = DIV_ROUND_UP(512, OV5675_XVCLK_19_2 / 1000 / 1000);
struct v4l2_subdev *sd = dev_get_drvdata(dev);
struct ov5675 *ov5675 = to_ov5675(sd);
- usleep_range(delay_us, delay_us * 2);
+ usleep_range(90, 100);
clk_disable_unprepare(ov5675->xvclk);
gpiod_set_value_cansleep(ov5675->reset_gpio, 1);
@@ -988,7 +986,6 @@ static int ov5675_power_off(struct device *dev)
static int ov5675_power_on(struct device *dev)
{
- u32 delay_us = DIV_ROUND_UP(8192, OV5675_XVCLK_19_2 / 1000 / 1000);
struct v4l2_subdev *sd = dev_get_drvdata(dev);
struct ov5675 *ov5675 = to_ov5675(sd);
int ret;
@@ -1014,8 +1011,11 @@ static int ov5675_power_on(struct device *dev)
gpiod_set_value_cansleep(ov5675->reset_gpio, 0);
- /* 8192 xvclk cycles prior to the first SCCB transation */
- usleep_range(delay_us, delay_us * 2);
+ /* Worst case quiesence gap is 1.365 milliseconds @ 6MHz XVCLK
+ * Add an additional threshold grace period to ensure reset
+ * completion before initiating our first I2C transaction.
+ */
+ usleep_range(1500, 1600);
return 0;
}
On Fri, Aug 30, 2024 at 02:43:05PM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> cifs: Fix FALLOC_FL_PUNCH_HOLE support
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> cifs-fix-falloc_fl_punch_hole-support.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
This breaks the build on 6.1.y so I'll go drop it from there, sorry.
greg k-h
In linux-6.6.y commit 983e6b2636f0099dbac1874c9e885bbe1cf2df05,
alloc_pages was renamed to alloc_pages_op, but this was not changed for
the s390 PCI implementation, most likely due to upstream changes in the
s390 PCI implementation which moved it to using the generic IOMMU
implementation after Linux 6.6 was released.
Signed-off-by: Ariadne Conill <ariadne(a)ariadne.space>
---
arch/s390/pci/pci_dma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 99209085c75b..ce0f2990cb04 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -721,7 +721,7 @@ const struct dma_map_ops s390_pci_dma_ops = {
.unmap_page = s390_dma_unmap_pages,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
- .alloc_pages = dma_common_alloc_pages,
+ .alloc_pages_op = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
/* dma_supported is unconditionally true without a callback */
};
--
2.46.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7860543a94784d744c7ce34b78a2e11beefa5c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072924-overact-drainable-8abb@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8e7860543a94 ("btrfs: fix extent map use-after-free when adding pages to compressed bio")
b7d463a1d125 ("btrfs: store a pointer to the original btrfs_bio in struct compressed_bio")
690834e47cf7 ("btrfs: pass a btrfs_bio to btrfs_submit_compressed_read")
7edb9a3e7200 ("btrfs: move zero filling of compressed read bios into common code")
32586c5bca72 ("btrfs: factor out a btrfs_free_compressed_pages helper")
10e924bc320a ("btrfs: factor out a btrfs_add_compressed_bio_pages helper")
d7294e4deeb9 ("btrfs: use the bbio file offset in add_ra_bio_pages")
e7aff33e3161 ("btrfs: use the bbio file offset in btrfs_submit_compressed_read")
798c9fc74d03 ("btrfs: remove redundant free_extent_map in btrfs_submit_compressed_read")
544fe4a903ce ("btrfs: embed a btrfs_bio into struct compressed_bio")
d5e4377d5051 ("btrfs: split zone append bios in btrfs_submit_bio")
35a8d7da3ca8 ("btrfs: remove now spurious bio submission helpers")
285599b6fe15 ("btrfs: remove the fs_info argument to btrfs_submit_bio")
48253076c3a9 ("btrfs: open code submit_encoded_read_bio")
30493ff49f81 ("btrfs: remove stripe boundary calculation for compressed I/O")
2380220e1e13 ("btrfs: remove stripe boundary calculation for buffered I/O")
67d669825090 ("btrfs: pass the iomap bio to btrfs_submit_bio")
852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios")
69ccf3f4244a ("btrfs: handle recording of zoned writes in the storage layer")
f8a53bb58ec7 ("btrfs: handle checksum generation in the storage layer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7860543a94784d744c7ce34b78a2e11beefa5c Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 4 Jul 2024 16:11:20 +0100
Subject: [PATCH] btrfs: fix extent map use-after-free when adding pages to
compressed bio
At add_ra_bio_pages() we are accessing the extent map to calculate
'add_size' after we dropped our reference on the extent map, resulting
in a use-after-free. Fix this by computing 'add_size' before dropping our
extent map reference.
Reported-by: syzbot+853d80cba98ce1157ae6(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000038144061c6d18f2@google.com/
Fixes: 6a4049102055 ("btrfs: subpage: make add_ra_bio_pages() compatible")
CC: stable(a)vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index a149f3659b15..a8e2c461aff7 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -515,6 +515,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
put_page(page);
break;
}
+ add_size = min(em->start + em->len, page_end + 1) - cur;
free_extent_map(em);
if (page->index == end_index) {
@@ -527,7 +528,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
}
}
- add_size = min(em->start + em->len, page_end + 1) - cur;
ret = bio_add_page(orig_bio, page, add_size, offset_in_page(cur));
if (ret != add_size) {
unlock_extent(tree, cur, page_end, NULL);
The patch titled
Subject: alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
alloc_tag-fix-allocation-tag-reporting-when-config_modules=n.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n
Date: Wed, 28 Aug 2024 16:15:36 -0700
codetag_module_init() is used to initialize sections containing allocation
tags. This function is used to initialize module sections as well as core
kernel sections, in which case the module parameter is set to NULL. This
function has to be called even when CONFIG_MODULES=n to initialize core
kernel allocation tag sections. When CONFIG_MODULES=n, this function is a
NOP, which is wrong. This leads to /proc/allocinfo reported as empty.
Fix this by making it independent of CONFIG_MODULES.
Link: https://lkml.kernel.org/r/20240828231536.1770519-1-surenb@google.com
Fixes: 916cc5167cc6 ("lib: code tagging framework")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Sourav Panda <souravpanda(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/codetag.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
--- a/lib/codetag.c~alloc_tag-fix-allocation-tag-reporting-when-config_modules=n
+++ a/lib/codetag.c
@@ -125,7 +125,6 @@ static inline size_t range_size(const st
cttype->desc.tag_size;
}
-#ifdef CONFIG_MODULES
static void *get_symbol(struct module *mod, const char *prefix, const char *name)
{
DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN);
@@ -155,6 +154,15 @@ static struct codetag_range get_section_
};
}
+static const char *get_mod_name(__maybe_unused struct module *mod)
+{
+#ifdef CONFIG_MODULES
+ if (mod)
+ return mod->name;
+#endif
+ return "(built-in)";
+}
+
static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
{
struct codetag_range range;
@@ -164,8 +172,7 @@ static int codetag_module_init(struct co
range = get_section_range(mod, cttype->desc.section);
if (!range.start || !range.stop) {
pr_warn("Failed to load code tags of type %s from the module %s\n",
- cttype->desc.section,
- mod ? mod->name : "(built-in)");
+ cttype->desc.section, get_mod_name(mod));
return -EINVAL;
}
@@ -199,6 +206,7 @@ static int codetag_module_init(struct co
return 0;
}
+#ifdef CONFIG_MODULES
void codetag_load_module(struct module *mod)
{
struct codetag_type *cttype;
@@ -248,9 +256,6 @@ bool codetag_unload_module(struct module
return unload_ok;
}
-
-#else /* CONFIG_MODULES */
-static int codetag_module_init(struct codetag_type *cttype, struct module *mod) { return 0; }
#endif /* CONFIG_MODULES */
struct codetag_type *
_
Patches currently in -mm which might be from surenb(a)google.com are
alloc_tag-fix-allocation-tag-reporting-when-config_modules=n.patch
The patch titled
Subject: mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmalloc-optimize-vmap_lazy_nr-arithmetic-when-purging-each-vmap_area.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Adrian Huang <ahuang12(a)lenovo.com>
Subject: mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area
Date: Thu, 29 Aug 2024 21:06:33 +0800
When running the vmalloc stress on a 448-core system, observe the average
latency of purge_vmap_node() is about 2 seconds by using the eBPF/bcc
'funclatency.py' tool [1].
# /your-git-repo/bcc/tools/funclatency.py -u purge_vmap_node & pid1=$! && sleep 8 && modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x7; kill -SIGINT $pid1
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 29 | |
4 -> 7 : 19 | |
8 -> 15 : 56 | |
16 -> 31 : 483 |**** |
32 -> 63 : 1548 |************ |
64 -> 127 : 2634 |********************* |
128 -> 255 : 2535 |********************* |
256 -> 511 : 1776 |************** |
512 -> 1023 : 1015 |******** |
1024 -> 2047 : 573 |**** |
2048 -> 4095 : 488 |**** |
4096 -> 8191 : 1091 |********* |
8192 -> 16383 : 3078 |************************* |
16384 -> 32767 : 4821 |****************************************|
32768 -> 65535 : 3318 |*************************** |
65536 -> 131071 : 1718 |************** |
131072 -> 262143 : 2220 |****************** |
262144 -> 524287 : 1147 |********* |
524288 -> 1048575 : 1179 |********* |
1048576 -> 2097151 : 822 |****** |
2097152 -> 4194303 : 906 |******* |
4194304 -> 8388607 : 2148 |***************** |
8388608 -> 16777215 : 4497 |************************************* |
16777216 -> 33554431 : 289 |** |
avg = 2041714 usecs, total: 78381401772 usecs, count: 38390
The worst case is over 16-33 seconds, so soft lockup is triggered [2].
[Root Cause]
1) Each purge_list has the long list. The following shows the number of
vmap_area is purged.
crash> p vmap_nodes
vmap_nodes = $27 = (struct vmap_node *) 0xff2de5a900100000
crash> vmap_node 0xff2de5a900100000 128 | grep nr_purged
nr_purged = 663070
...
nr_purged = 821670
nr_purged = 692214
nr_purged = 726808
...
2) atomic_long_sub() employs the 'lock' prefix to ensure the atomic
operation when purging each vmap_area. However, the iteration is over
600000 vmap_area (See 'nr_purged' above).
Here is objdump output:
$ objdump -D vmlinux
ffffffff813e8c80 <purge_vmap_node>:
...
ffffffff813e8d70: f0 48 29 2d 68 0c bb lock sub %rbp,0x2bb0c68(%rip)
...
Quote from "Instruction tables" pdf file [3]:
Instructions with a LOCK prefix have a long latency that depends on
cache organization and possibly RAM speed. If there are multiple
processors or cores or direct memory access (DMA) devices, then all
locked instructions will lock a cache line for exclusive access,
which may involve RAM access. A LOCK prefix typically costs more
than a hundred clock cycles, even on single-processor systems.
That's why the latency of purge_vmap_node() dramatically increases
on a many-core system: One core is busy on purging each vmap_area of
the *long* purge_list and executing atomic_long_sub() for each
vmap_area, while other cores free vmalloc allocations and execute
atomic_long_add_return() in free_vmap_area_noflush().
[Solution]
Employ a local variable to record the total purged pages, and execute
atomic_long_sub() after the traversal of the purge_list is done. The
experiment result shows the latency improvement is 99%.
[Experiment Result]
1) System Configuration: Three servers (with HT-enabled) are tested.
* 72-core server: 3rd Gen Intel Xeon Scalable Processor*1
* 192-core server: 5th Gen Intel Xeon Scalable Processor*2
* 448-core server: AMD Zen 4 Processor*2
2) Kernel Config
* CONFIG_KASAN is disabled
3) The data in column "w/o patch" and "w/ patch"
* Unit: micro seconds (us)
* Each data is the average of 3-time measurements
System w/o patch (us) w/ patch (us) Improvement (%)
--------------- -------------- ------------- -------------
72-core server 2194 14 99.36%
192-core server 143799 1139 99.21%
448-core server 1992122 6883 99.65%
[1] https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
[2] https://gist.github.com/AdrianHuang/37c15f67b45407b83c2d32f918656c12
[3] https://www.agner.org/optimize/instruction_tables.pdf
Link: https://lkml.kernel.org/r/20240829130633.2184-1-ahuang12@lenovo.com
Signed-off-by: Adrian Huang <ahuang12(a)lenovo.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/vmalloc.c~mm-vmalloc-optimize-vmap_lazy_nr-arithmetic-when-purging-each-vmap_area
+++ a/mm/vmalloc.c
@@ -2191,6 +2191,7 @@ static void purge_vmap_node(struct work_
{
struct vmap_node *vn = container_of(work,
struct vmap_node, purge_work);
+ unsigned long nr_purged_pages = 0;
struct vmap_area *va, *n_va;
LIST_HEAD(local_list);
@@ -2208,7 +2209,7 @@ static void purge_vmap_node(struct work_
kasan_release_vmalloc(orig_start, orig_end,
va->va_start, va->va_end);
- atomic_long_sub(nr, &vmap_lazy_nr);
+ nr_purged_pages += nr;
vn->nr_purged++;
if (is_vn_id_valid(vn_id) && !vn->skip_populate)
@@ -2219,6 +2220,8 @@ static void purge_vmap_node(struct work_
list_add(&va->list, &local_list);
}
+ atomic_long_sub(nr_purged_pages, &vmap_lazy_nr);
+
reclaim_list_global(&local_list);
}
_
Patches currently in -mm which might be from ahuang12(a)lenovo.com are
mm-vmalloc-optimize-vmap_lazy_nr-arithmetic-when-purging-each-vmap_area.patch
mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation.patch
Until VM_DONTEXPAND was added in commit 1c1914d6e8c6 ("dma-buf: heaps:
Don't track CMA dma-buf pages under RssFile") it was possible to obtain
a mapping larger than the buffer size via mremap and bypass the overflow
check in dma_buf_mmap_internal. When using such a mapping to attempt to
fault past the end of the buffer, the CMA heap fault handler also checks
the fault offset against the buffer size, but gets the boundary wrong by
1. Fix the boundary check so that we don't read off the end of the pages
array and insert an arbitrary page in the mapping.
Reported-by: Xingyu Jin <xingyuj(a)google.com>
Fixes: a5d2d29e24be ("dma-buf: heaps: Move heap-helper logic into the cma_heap implementation")
Cc: stable(a)vger.kernel.org # Applicable >= 5.10. Needs adjustments only for 5.10.
Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
---
drivers/dma-buf/heaps/cma_heap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index c384004b918e..93be88b805fe 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -165,7 +165,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct cma_heap_buffer *buffer = vma->vm_private_data;
- if (vmf->pgoff > buffer->pagecount)
+ if (vmf->pgoff >= buffer->pagecount)
return VM_FAULT_SIGBUS;
return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
--
2.46.0.469.g59c65b2a67-goog
From: Simon Arlott <simon(a)octiron.net>
The mcp251x_hw_wake() function is called with the mpc_lock mutex held and
disables the interrupt handler so that no interrupts can be processed while
waking the device. If an interrupt has already occurred then waiting for
the interrupt handler to complete will deadlock because it will be trying
to acquire the same mutex.
CPU0 CPU1
---- ----
mcp251x_open()
mutex_lock(&priv->mcp_lock)
request_threaded_irq()
<interrupt>
mcp251x_can_ist()
mutex_lock(&priv->mcp_lock)
mcp251x_hw_wake()
disable_irq() <-- deadlock
Use disable_irq_nosync() instead because the interrupt handler does
everything while holding the mutex so it doesn't matter if it's still
running.
Fixes: 8ce8c0abcba3 ("can: mcp251x: only reset hardware as required")
Signed-off-by: Simon Arlott <simon(a)octiron.net>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/4fc08687-1d80-43fe-9f0d-8ef8475e75f6@0882a8b5-c…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/spi/mcp251x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 3b8736ff0345..ec5c64006a16 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -752,7 +752,7 @@ static int mcp251x_hw_wake(struct spi_device *spi)
int ret;
/* Force wakeup interrupt to wake device, but don't execute IST */
- disable_irq(spi->irq);
+ disable_irq_nosync(spi->irq);
mcp251x_write_2regs(spi, CANINTE, CANINTE_WAKIE, CANINTF_WAKIF);
/* Wait for oscillator startup timer after wake up */
--
2.45.2
Zero and negative number is not a valid IRQ for in-kernel code and the
irq_of_parse_and_map() function returns zero on error. So this check for
valid IRQs should only accept values > 0.
Cc: stable(a)vger.kernel.org
Fixes: 42bbb70980f3 ("powerpc/5200: Add mpc5200-spi (non-PSC) device driver")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/spi/spi-mpc52xx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-mpc52xx.c b/drivers/spi/spi-mpc52xx.c
index d5ac60c135c2..b49155a25694 100644
--- a/drivers/spi/spi-mpc52xx.c
+++ b/drivers/spi/spi-mpc52xx.c
@@ -472,7 +472,7 @@ static int mpc52xx_spi_probe(struct platform_device *op)
INIT_WORK(&ms->work, mpc52xx_spi_wq);
/* Decide if interrupts can be used */
- if (ms->irq0 && ms->irq1) {
+ if (ms->irq0 > 0 && ms->irq1 > 0) {
rc = request_irq(ms->irq0, mpc52xx_spi_irq, 0,
"mpc5200-spi-modf", ms);
rc |= request_irq(ms->irq1, mpc52xx_spi_irq, 0,
--
2.25.1
When auxiliary_device_add() returns error and then calls
auxiliary_device_uninit(), callback function adev_release
calls kfree(iadev). We shouldn't call kfree(iadev) again
in the error handling path. Set 'iadev' to NULL.
Cc: stable(a)vger.kernel.org
Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/net/ethernet/intel/ice/ice_idc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 145b27f2a4ce..5db05f54a80c 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -330,6 +330,7 @@ int ice_plug_aux_dev(struct ice_pf *pf)
return ret;
}
+ iadev = NULL;
ret = auxiliary_device_add(adev);
if (ret) {
auxiliary_device_uninit(adev);
--
2.25.1
From: Chen Ridong <chenridong(a)huawei.com>
[ Upstream commit 1be59c97c83ccd67a519d8a49486b3a8a73ca28a ]
An UAF can happen when /proc/cpuset is read as reported in [1].
This can be reproduced by the following methods:
1.add an mdelay(1000) before acquiring the cgroup_lock In the
cgroup_path_ns function.
2.$cat /proc/<pid>/cpuset repeatly.
3.$mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/
$umount /sys/fs/cgroup/cpuset/ repeatly.
The race that cause this bug can be shown as below:
(umount) | (cat /proc/<pid>/cpuset)
css_release | proc_cpuset_show
css_release_work_fn | css = task_get_css(tsk, cpuset_cgrp_id);
css_free_rwork_fn | cgroup_path_ns(css->cgroup, ...);
cgroup_destroy_root | mutex_lock(&cgroup_mutex);
rebind_subsystems |
cgroup_free_root |
| // cgrp was freed, UAF
| cgroup_path_ns_locked(cgrp,..);
When the cpuset is initialized, the root node top_cpuset.css.cgrp
will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will
allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated
&cgroup_root.cgrp. When the umount operation is executed,
top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp.
The problem is that when rebinding to cgrp_dfl_root, there are cases
where the cgroup_root allocated by setting up the root for cgroup v1
is cached. This could lead to a Use-After-Free (UAF) if it is
subsequently freed. The descendant cgroups of cgroup v1 can only be
freed after the css is released. However, the css of the root will never
be released, yet the cgroup_root should be freed when it is unmounted.
This means that obtaining a reference to the css of the root does
not guarantee that css.cgrp->root will not be freed.
Fix this problem by using rcu_read_lock in proc_cpuset_show().
As cgroup_root is kfree_rcu after commit d23b5c577715
("cgroup: Make operations on the cgroup root_list RCU safe"),
css->cgroup won't be freed during the critical section.
To call cgroup_path_ns_locked, css_set_lock is needed, so it is safe to
replace task_get_css with task_css.
[1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Signed-off-by: Chen Ridong <chenridong(a)huawei.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
kernel/cgroup/cpuset.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 9f2a93c82..731547a0d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -22,6 +22,7 @@
* distribution for more details.
*/
+#include "cgroup-internal.h"
#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/cpuset.h>
@@ -3725,10 +3726,14 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
if (!buf)
goto out;
- css = task_get_css(tsk, cpuset_cgrp_id);
- retval = cgroup_path_ns(css->cgroup, buf, PATH_MAX,
- current->nsproxy->cgroup_ns);
- css_put(css);
+ rcu_read_lock();
+ spin_lock_irq(&css_set_lock);
+ css = task_css(tsk, cpuset_cgrp_id);
+ retval = cgroup_path_ns_locked(css->cgroup, buf, PATH_MAX,
+ current->nsproxy->cgroup_ns);
+ spin_unlock_irq(&css_set_lock);
+ rcu_read_unlock();
+
if (retval >= PATH_MAX)
retval = -ENAMETOOLONG;
if (retval < 0)
--
2.39.4
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_gdp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_gdp.c b/drivers/gpu/drm/sti/sti_gdp.c
index 43c72c2604a0..f046f5f7ad25 100644
--- a/drivers/gpu/drm/sti/sti_gdp.c
+++ b/drivers/gpu/drm/sti/sti_gdp.c
@@ -638,6 +638,9 @@ static int sti_gdp_atomic_check(struct drm_plane *drm_plane,
mixer = to_sti_mixer(crtc);
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x b1560408692cd0ab0370cfbe9deb03ce97ab3f6d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081258-amusement-designing-88e9@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
b1560408692c ("tracing: Have format file honor EVENT_FILE_FL_FREED")
bb32500fb9b7 ("tracing: Have trace_event_file have ref counters")
5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode")
a1f157c7a3bb ("tracing: Expand all ring buffers individually")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1560408692cd0ab0370cfbe9deb03ce97ab3f6d Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Tue, 30 Jul 2024 11:06:57 -0400
Subject: [PATCH] tracing: Have format file honor EVENT_FILE_FL_FREED
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When eventfs was introduced, special care had to be done to coordinate the
freeing of the file meta data with the files that are exposed to user
space. The file meta data would have a ref count that is set when the file
is created and would be decremented and freed after the last user that
opened the file closed it. When the file meta data was to be freed, it
would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed,
and any new references made (like new opens or reads) would fail as it is
marked freed. This allowed other meta data to be freed after this flag was
set (under the event_mutex).
All the files that were dynamically created in the events directory had a
pointer to the file meta data and would call event_release() when the last
reference to the user space file was closed. This would be the time that it
is safe to free the file meta data.
A shortcut was made for the "format" file. It's i_private would point to
the "call" entry directly and not point to the file's meta data. This is
because all format files are the same for the same "call", so it was
thought there was no reason to differentiate them. The other files
maintain state (like the "enable", "trigger", etc). But this meant if the
file were to disappear, the "format" file would be unaware of it.
This caused a race that could be trigger via the user_events test (that
would create dynamic events and free them), and running a loop that would
read the user_events format files:
In one console run:
# cd tools/testing/selftests/user_events
# while true; do ./ftrace_test; done
And in another console run:
# cd /sys/kernel/tracing/
# while true; do cat events/user_events/__test_event/format; done 2>/dev/null
With KASAN memory checking, it would trigger a use-after-free bug report
(which was a real bug). This was because the format file was not checking
the file's meta data flag "EVENT_FILE_FL_FREED", so it would access the
event that the file meta data pointed to after the event was freed.
After inspection, there are other locations that were found to not check
the EVENT_FILE_FL_FREED flag when accessing the trace_event_file. Add a
new helper function: event_file_file() that will make sure that the
event_mutex is held, and will return NULL if the trace_event_file has the
EVENT_FILE_FL_FREED flag set. Have the first reference of the struct file
pointer use event_file_file() and check for NULL. Later uses can still use
the event_file_data() helper function if the event_mutex is still held and
was not released since the event_file_file() call.
Link: https://lore.kernel.org/all/20240719204701.1605950-1-minipli@grsecurity.net/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Ajay Kaher <ajay.kaher(a)broadcom.com>
Cc: Ilkka Naulapää <digirigawa(a)gmail.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Dan Carpenter <dan.carpenter(a)linaro.org>
Cc: Beau Belgrave <beaub(a)linux.microsoft.com>
Cc: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: Alexey Makhalov <alexey.makhalov(a)broadcom.com>
Cc: Vasavi Sirnapalli <vasavi.sirnapalli(a)broadcom.com>
Link: https://lore.kernel.org/20240730110657.3b69d3c1@gandalf.local.home
Fixes: b63db58e2fa5d ("eventfs/tracing: Add callback for release of an eventfs_inode")
Reported-by: Mathias Krause <minipli(a)grsecurity.net>
Tested-by: Mathias Krause <minipli(a)grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8783bebd0562..bd3e3069300e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1634,6 +1634,29 @@ static inline void *event_file_data(struct file *filp)
extern struct mutex event_mutex;
extern struct list_head ftrace_events;
+/*
+ * When the trace_event_file is the filp->i_private pointer,
+ * it must be taken under the event_mutex lock, and then checked
+ * if the EVENT_FILE_FL_FREED flag is set. If it is, then the
+ * data pointed to by the trace_event_file can not be trusted.
+ *
+ * Use the event_file_file() to access the trace_event_file from
+ * the filp the first time under the event_mutex and check for
+ * NULL. If it is needed to be retrieved again and the event_mutex
+ * is still held, then the event_file_data() can be used and it
+ * is guaranteed to be valid.
+ */
+static inline struct trace_event_file *event_file_file(struct file *filp)
+{
+ struct trace_event_file *file;
+
+ lockdep_assert_held(&event_mutex);
+ file = READ_ONCE(file_inode(filp)->i_private);
+ if (!file || file->flags & EVENT_FILE_FL_FREED)
+ return NULL;
+ return file;
+}
+
extern const struct file_operations event_trigger_fops;
extern const struct file_operations event_hist_fops;
extern const struct file_operations event_hist_debug_fops;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 6ef29eba90ce..f08fbaf8cad6 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1386,12 +1386,12 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
char buf[4] = "0";
mutex_lock(&event_mutex);
- file = event_file_data(filp);
+ file = event_file_file(filp);
if (likely(file))
flags = file->flags;
mutex_unlock(&event_mutex);
- if (!file || flags & EVENT_FILE_FL_FREED)
+ if (!file)
return -ENODEV;
if (flags & EVENT_FILE_FL_ENABLED &&
@@ -1424,8 +1424,8 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
case 1:
ret = -ENODEV;
mutex_lock(&event_mutex);
- file = event_file_data(filp);
- if (likely(file && !(file->flags & EVENT_FILE_FL_FREED))) {
+ file = event_file_file(filp);
+ if (likely(file)) {
ret = tracing_update_buffers(file->tr);
if (ret < 0) {
mutex_unlock(&event_mutex);
@@ -1540,7 +1540,8 @@ enum {
static void *f_next(struct seq_file *m, void *v, loff_t *pos)
{
- struct trace_event_call *call = event_file_data(m->private);
+ struct trace_event_file *file = event_file_data(m->private);
+ struct trace_event_call *call = file->event_call;
struct list_head *common_head = &ftrace_common_fields;
struct list_head *head = trace_get_fields(call);
struct list_head *node = v;
@@ -1572,7 +1573,8 @@ static void *f_next(struct seq_file *m, void *v, loff_t *pos)
static int f_show(struct seq_file *m, void *v)
{
- struct trace_event_call *call = event_file_data(m->private);
+ struct trace_event_file *file = event_file_data(m->private);
+ struct trace_event_call *call = file->event_call;
struct ftrace_event_field *field;
const char *array_descriptor;
@@ -1627,12 +1629,14 @@ static int f_show(struct seq_file *m, void *v)
static void *f_start(struct seq_file *m, loff_t *pos)
{
+ struct trace_event_file *file;
void *p = (void *)FORMAT_HEADER;
loff_t l = 0;
/* ->stop() is called even if ->start() fails */
mutex_lock(&event_mutex);
- if (!event_file_data(m->private))
+ file = event_file_file(m->private);
+ if (!file)
return ERR_PTR(-ENODEV);
while (l < *pos && p)
@@ -1706,8 +1710,8 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
trace_seq_init(s);
mutex_lock(&event_mutex);
- file = event_file_data(filp);
- if (file && !(file->flags & EVENT_FILE_FL_FREED))
+ file = event_file_file(filp);
+ if (file)
print_event_filter(file, s);
mutex_unlock(&event_mutex);
@@ -1736,9 +1740,13 @@ event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
return PTR_ERR(buf);
mutex_lock(&event_mutex);
- file = event_file_data(filp);
- if (file)
- err = apply_event_filter(file, buf);
+ file = event_file_file(filp);
+ if (file) {
+ if (file->flags & EVENT_FILE_FL_FREED)
+ err = -ENODEV;
+ else
+ err = apply_event_filter(file, buf);
+ }
mutex_unlock(&event_mutex);
kfree(buf);
@@ -2485,7 +2493,6 @@ static int event_callback(const char *name, umode_t *mode, void **data,
if (strcmp(name, "format") == 0) {
*mode = TRACE_MODE_READ;
*fops = &ftrace_event_format_fops;
- *data = call;
return 1;
}
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 6ece1308d36a..5f9119eb7c67 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -5601,7 +5601,7 @@ static int hist_show(struct seq_file *m, void *v)
mutex_lock(&event_mutex);
- event_file = event_file_data(m->private);
+ event_file = event_file_file(m->private);
if (unlikely(!event_file)) {
ret = -ENODEV;
goto out_unlock;
@@ -5880,7 +5880,7 @@ static int hist_debug_show(struct seq_file *m, void *v)
mutex_lock(&event_mutex);
- event_file = event_file_data(m->private);
+ event_file = event_file_file(m->private);
if (unlikely(!event_file)) {
ret = -ENODEV;
goto out_unlock;
diff --git a/kernel/trace/trace_events_inject.c b/kernel/trace/trace_events_inject.c
index 8650562bdaa9..a8f076809db4 100644
--- a/kernel/trace/trace_events_inject.c
+++ b/kernel/trace/trace_events_inject.c
@@ -299,7 +299,7 @@ event_inject_write(struct file *filp, const char __user *ubuf, size_t cnt,
strim(buf);
mutex_lock(&event_mutex);
- file = event_file_data(filp);
+ file = event_file_file(filp);
if (file) {
call = file->event_call;
size = parse_entry(buf, call, &entry);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 4bec043c8690..a5e3d6acf1e1 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -159,7 +159,7 @@ static void *trigger_start(struct seq_file *m, loff_t *pos)
/* ->stop() is called even if ->start() fails */
mutex_lock(&event_mutex);
- event_file = event_file_data(m->private);
+ event_file = event_file_file(m->private);
if (unlikely(!event_file))
return ERR_PTR(-ENODEV);
@@ -213,7 +213,7 @@ static int event_trigger_regex_open(struct inode *inode, struct file *file)
mutex_lock(&event_mutex);
- if (unlikely(!event_file_data(file))) {
+ if (unlikely(!event_file_file(file))) {
mutex_unlock(&event_mutex);
return -ENODEV;
}
@@ -293,7 +293,7 @@ static ssize_t event_trigger_regex_write(struct file *file,
strim(buf);
mutex_lock(&event_mutex);
- event_file = event_file_data(file);
+ event_file = event_file_file(file);
if (unlikely(!event_file)) {
mutex_unlock(&event_mutex);
kfree(buf);
The locks_remove_posix() function in fcntl_setlk/fcntl_setlk64 is designed
to reliably remove locks when an fcntl/close race is detected. However, it
was passing in the wrong filelock owner, it looks like a mistake and
resulting in a failure to remove locks. More critically, if the lock
removal fails, it could lead to a uaf issue while traversing the locks.
This problem occurs only in the 4.19/5.4 stable version.
Fixes: 4c43ad4ab416 ("filelock: Fix fcntl/close race recovery compat path")
Fixes: dc2ce1dfceaa ("filelock: Remove locks reliably when fcntl/close race is detected")
Cc: stable(a)vger.kernel.org
Signed-off-by: Long Li <leo.lilong(a)huawei.com>
---
fs/locks.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 85c8af53d4eb..cf6ed857664b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2542,7 +2542,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(¤t->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, ¤t->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
@@ -2672,7 +2672,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(¤t->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, ¤t->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
--
2.39.2
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081115-giving-hacked-fffb@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
82dbb57ac8d0 ("scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES")
0c25422d34b4 ("scsi: mpt3sas: Remove scsi_dma_map() error messages")
1c2048bdc3f4 ("scsi: mpt3sas: switch to generic DMA API")
919d8a3f3fef ("scsi: mpt3sas: Convert uses of pr_<level> with MPT3SAS_FMT to ioc_<level>")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Fri, 19 Jul 2024 16:39:12 +0900
Subject: [PATCH] scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES
Some firmware versions of the 9600 series SAS HBA byte-swap the REPORT
ZONES command reply buffer from ATA-ZAC devices by directly accessing the
buffer in the host memory. This does not respect the default command DMA
direction and causes IOMMU page faults on architectures with an IOMMU
enforcing write-only mappings for DMA_FROM_DEVICE DMA driection (e.g. AMD
hosts).
scsi 18:0:0:0: Direct-Access-ZBC ATA WDC WSH722020AL W870 PQ: 0 ANSI: 6
scsi 18:0:0:0: SATA: handle(0x0027), sas_addr(0x300062b2083e7c40), phy(0), device_name(0x5000cca29dc35e11)
scsi 18:0:0:0: enclosure logical id (0x300062b208097c40), slot(0)
scsi 18:0:0:0: enclosure level(0x0000), connector name( C0.0)
scsi 18:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
scsi 18:0:0:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
sd 18:0:0:0: Attached scsi generic sg2 type 20
sd 18:0:0:0: [sdc] Host-managed zoned block device
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b200 flags=0x0050]
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b300 flags=0x0050]
mpt3sas_cm0: mpt3sas_ctl_pre_reset_handler: Releasing the trace buffer due to adapter reset.
mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
mpt3sas_cm0: fault_state(0x2666)!
mpt3sas_cm0: sending diag reset !!
mpt3sas_cm0: diag reset: SUCCESS
sd 18:0:0:0: [sdc] REPORT ZONES start lba 0 failed
sd 18:0:0:0: [sdc] REPORT ZONES: Result: hostbyte=DID_RESET driverbyte=DRIVER_OK
sd 18:0:0:0: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
Avoid such issue by always mapping the buffer of REPORT ZONES commands
using DMA_BIDIRECTIONAL (read+write IOMMU mapping). This is done by
introducing the helper function _base_scsi_dma_map() and using this helper
in _base_build_sg_scmd() and _base_build_sg_scmd_ieee() instead of calling
directly scsi_dma_map().
Fixes: 471ef9d4e498 ("mpt3sas: Build MPI SGL LIST on GEN2 HBAs and IEEE SGL LIST on GEN3 HBAs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Link: https://lore.kernel.org/r/20240719073913.179559-3-dlemoal@kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1092497563b2..c8fb965a6bf0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2671,6 +2671,22 @@ _base_build_zero_len_sge_ieee(struct MPT3SAS_ADAPTER *ioc, void *paddr)
_base_add_sg_single_ieee(paddr, sgl_flags, 0, 0, -1);
}
+static inline int _base_scsi_dma_map(struct scsi_cmnd *cmd)
+{
+ /*
+ * Some firmware versions byte-swap the REPORT ZONES command reply from
+ * ATA-ZAC devices by directly accessing in the host buffer. This does
+ * not respect the default command DMA direction and causes IOMMU page
+ * faults on some architectures with an IOMMU enforcing write mappings
+ * (e.g. AMD hosts). Avoid such issue by making the report zones buffer
+ * mapping bi-directional.
+ */
+ if (cmd->cmnd[0] == ZBC_IN && cmd->cmnd[1] == ZI_REPORT_ZONES)
+ cmd->sc_data_direction = DMA_BIDIRECTIONAL;
+
+ return scsi_dma_map(cmd);
+}
+
/**
* _base_build_sg_scmd - main sg creation routine
* pcie_device is unused here!
@@ -2717,7 +2733,7 @@ _base_build_sg_scmd(struct MPT3SAS_ADAPTER *ioc,
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
@@ -2861,7 +2877,7 @@ _base_build_sg_scmd_ieee(struct MPT3SAS_ADAPTER *ioc,
}
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 24e82654e98e96cece5d8b919c522054456eeec6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081202-hastily-panic-ee65@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
24e82654e98e ("drm/amdkfd: don't allow mapping the MMIO HDP page with large pages")
b38c074b2b07 ("drm/amdkfd: CRIU Refactor restore BO function")
67a359d85ec2 ("drm/amdkfd: CRIU remove sync and TLB flush on restore")
22804e03f7a5 ("drm/amdkfd: Fix criu_restore_bo error handling")
d8a25e485857 ("drm/amdkfd: fix loop error handling")
e5af61ffaaef ("drm/amdkfd: CRIU fix a NULL vs IS_ERR() check")
be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping")
40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
42c6c48214b7 ("drm/amdkfd: CRIU checkpoint and restore queue mqds")
2485c12c980a ("drm/amdkfd: CRIU restore sdma id for queues")
8668dfc30d3e ("drm/amdkfd: CRIU restore queue ids")
626f7b3190b4 ("drm/amdkfd: CRIU add queues support")
cd9f79103003 ("drm/amdkfd: CRIU Implement KFD unpause operation")
011bbb03024f ("drm/amdkfd: CRIU Implement KFD resume ioctl")
73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
5ccbb057c0a1 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
f185381b6481 ("drm/amdkfd: CRIU Implement KFD process_info ioctl")
3698807094ec ("drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs")
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24e82654e98e96cece5d8b919c522054456eeec6 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Sun, 14 Apr 2024 13:06:39 -0400
Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page with large
pages
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6b713fb0b818..fdf171ad4a3c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1144,7 +1144,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_unlock;
}
offset = dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
err = -ENOMEM;
goto err_unlock;
}
@@ -2312,7 +2312,7 @@ static int criu_restore_memory_of_gpu(struct kfd_process_device *pdd,
return -EINVAL;
}
offset = pdd->dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
return -ENOMEM;
}
@@ -3354,6 +3354,9 @@ static int kfd_mmio_mmap(struct kfd_node *dev, struct kfd_process *process,
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
+ if (PAGE_SIZE > 4096)
+ return -EINVAL;
+
address = dev->adev->rmmio_remap.bus_addr;
vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-smoky-refrain-8af1@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
85067747cf98 ("dm: do not use waitqueue for request-based DM")
087615bf3acd ("dm mpath: pass IO start time to path selector")
5de719e3d01b ("dm mpath: fix missing call of path selector type->end_io")
645efa84f6c7 ("dm: add memory barrier before waitqueue_active")
3c94d83cb352 ("blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()")
c4576aed8d85 ("dm: fix request-based dm's use of dm_wait_for_completion")
b7934ba4147a ("dm: fix inflight IO check")
6f75723190d8 ("dm: remove the pending IO accounting")
dbd3bbd291a0 ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO")
80a787ba3809 ("dm: dont rewrite dm_disk(md)->part0.in_flight")
ae8799125d56 ("blk-mq: provide a helper to check if a queue is busy")
6a23e05c2fe3 ("dm: remove legacy request-based IO path")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b49420d6a1aeb399e5b107fc6eb8584d0860fbd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083019-grumble-fading-91fb@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
b49420d6a1ae ("video/aperture: optionally match the device in sysfb_disable()")
4e754597d603 ("firmware/sysfb: Create firmware device only for enabled PCI devices")
9eac534db001 ("firmware/sysfb: Set firmware-framebuffer parent device")
2bebc3cd4870 ("Revert "firmware/sysfb: Clear screen_info state after consuming it"")
38814330fedd ("Merge tag 'devicetree-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b49420d6a1aeb399e5b107fc6eb8584d0860fbd7 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Wed, 21 Aug 2024 15:11:35 -0400
Subject: [PATCH] video/aperture: optionally match the device in
sysfb_disable()
In aperture_remove_conflicting_pci_devices(), we currently only
call sysfb_disable() on vga class devices. This leads to the
following problem when the pimary device is not VGA compatible:
1. A PCI device with a non-VGA class is the boot display
2. That device is probed first and it is not a VGA device so
sysfb_disable() is not called, but the device resources
are freed by aperture_detach_platform_device()
3. Non-primary GPU has a VGA class and it ends up calling sysfb_disable()
4. NULL pointer dereference via sysfb_disable() since the resources
have already been freed by aperture_detach_platform_device() when
it was called by the other device.
Fix this by passing a device pointer to sysfb_disable() and checking
the device to determine if we should execute it or not.
v2: Fix build when CONFIG_SCREEN_INFO is not set
v3: Move device check into the mutex
Drop primary variable in aperture_remove_conflicting_pci_devices()
Drop __init on pci sysfb_pci_dev_is_enabled()
Fixes: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device")
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240821191135.829765-1-alexa…
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 921f61507ae8..02a07d3d0d40 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -39,6 +39,8 @@ static struct platform_device *pd;
static DEFINE_MUTEX(disable_lock);
static bool disabled;
+static struct device *sysfb_parent_dev(const struct screen_info *si);
+
static bool sysfb_unregister(void)
{
if (IS_ERR_OR_NULL(pd))
@@ -52,6 +54,7 @@ static bool sysfb_unregister(void)
/**
* sysfb_disable() - disable the Generic System Framebuffers support
+ * @dev: the device to check if non-NULL
*
* This disables the registration of system framebuffer devices that match the
* generic drivers that make use of the system framebuffer set up by firmware.
@@ -61,17 +64,21 @@ static bool sysfb_unregister(void)
* Context: The function can sleep. A @disable_lock mutex is acquired to serialize
* against sysfb_init(), that registers a system framebuffer device.
*/
-void sysfb_disable(void)
+void sysfb_disable(struct device *dev)
{
+ struct screen_info *si = &screen_info;
+
mutex_lock(&disable_lock);
- sysfb_unregister();
- disabled = true;
+ if (!dev || dev == sysfb_parent_dev(si)) {
+ sysfb_unregister();
+ disabled = true;
+ }
mutex_unlock(&disable_lock);
}
EXPORT_SYMBOL_GPL(sysfb_disable);
#if defined(CONFIG_PCI)
-static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
+static bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
{
/*
* TODO: Try to integrate this code into the PCI subsystem
@@ -87,13 +94,13 @@ static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
return true;
}
#else
-static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
+static bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
{
return false;
}
#endif
-static __init struct device *sysfb_parent_dev(const struct screen_info *si)
+static struct device *sysfb_parent_dev(const struct screen_info *si)
{
struct pci_dev *pdev;
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 389d4ea6bfc1..ef622d41eb5b 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -592,7 +592,7 @@ static int __init of_platform_default_populate_init(void)
* This can happen for example on DT systems that do EFI
* booting and may provide a GOP handle to the EFI stub.
*/
- sysfb_disable();
+ sysfb_disable(NULL);
of_platform_device_create(node, NULL, NULL);
of_node_put(node);
}
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 561be8feca96..2b5a1e666e9b 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -293,7 +293,7 @@ int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t si
* ask for this, so let's assume that a real driver for the display
* was already probed and prevent sysfb to register devices later.
*/
- sysfb_disable();
+ sysfb_disable(NULL);
aperture_detach_devices(base, size);
@@ -346,15 +346,10 @@ EXPORT_SYMBOL(__aperture_remove_legacy_vga_devices);
*/
int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name)
{
- bool primary = false;
resource_size_t base, size;
int bar, ret = 0;
- if (pdev == vga_default_device())
- primary = true;
-
- if (primary)
- sysfb_disable();
+ sysfb_disable(&pdev->dev);
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
@@ -370,7 +365,7 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
* that consumes the VGA framebuffer I/O range. Remove this
* device as well.
*/
- if (primary)
+ if (pdev == vga_default_device())
ret = __aperture_remove_legacy_vga_devices(pdev);
return ret;
diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h
index c9cb657dad08..bef5f06a91de 100644
--- a/include/linux/sysfb.h
+++ b/include/linux/sysfb.h
@@ -58,11 +58,11 @@ struct efifb_dmi_info {
#ifdef CONFIG_SYSFB
-void sysfb_disable(void);
+void sysfb_disable(struct device *dev);
#else /* CONFIG_SYSFB */
-static inline void sysfb_disable(void)
+static inline void sysfb_disable(struct device *dev)
{
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x b49420d6a1aeb399e5b107fc6eb8584d0860fbd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083018-silent-capped-83df@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
b49420d6a1ae ("video/aperture: optionally match the device in sysfb_disable()")
4e754597d603 ("firmware/sysfb: Create firmware device only for enabled PCI devices")
9eac534db001 ("firmware/sysfb: Set firmware-framebuffer parent device")
2bebc3cd4870 ("Revert "firmware/sysfb: Clear screen_info state after consuming it"")
38814330fedd ("Merge tag 'devicetree-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b49420d6a1aeb399e5b107fc6eb8584d0860fbd7 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Wed, 21 Aug 2024 15:11:35 -0400
Subject: [PATCH] video/aperture: optionally match the device in
sysfb_disable()
In aperture_remove_conflicting_pci_devices(), we currently only
call sysfb_disable() on vga class devices. This leads to the
following problem when the pimary device is not VGA compatible:
1. A PCI device with a non-VGA class is the boot display
2. That device is probed first and it is not a VGA device so
sysfb_disable() is not called, but the device resources
are freed by aperture_detach_platform_device()
3. Non-primary GPU has a VGA class and it ends up calling sysfb_disable()
4. NULL pointer dereference via sysfb_disable() since the resources
have already been freed by aperture_detach_platform_device() when
it was called by the other device.
Fix this by passing a device pointer to sysfb_disable() and checking
the device to determine if we should execute it or not.
v2: Fix build when CONFIG_SCREEN_INFO is not set
v3: Move device check into the mutex
Drop primary variable in aperture_remove_conflicting_pci_devices()
Drop __init on pci sysfb_pci_dev_is_enabled()
Fixes: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device")
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240821191135.829765-1-alexa…
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 921f61507ae8..02a07d3d0d40 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -39,6 +39,8 @@ static struct platform_device *pd;
static DEFINE_MUTEX(disable_lock);
static bool disabled;
+static struct device *sysfb_parent_dev(const struct screen_info *si);
+
static bool sysfb_unregister(void)
{
if (IS_ERR_OR_NULL(pd))
@@ -52,6 +54,7 @@ static bool sysfb_unregister(void)
/**
* sysfb_disable() - disable the Generic System Framebuffers support
+ * @dev: the device to check if non-NULL
*
* This disables the registration of system framebuffer devices that match the
* generic drivers that make use of the system framebuffer set up by firmware.
@@ -61,17 +64,21 @@ static bool sysfb_unregister(void)
* Context: The function can sleep. A @disable_lock mutex is acquired to serialize
* against sysfb_init(), that registers a system framebuffer device.
*/
-void sysfb_disable(void)
+void sysfb_disable(struct device *dev)
{
+ struct screen_info *si = &screen_info;
+
mutex_lock(&disable_lock);
- sysfb_unregister();
- disabled = true;
+ if (!dev || dev == sysfb_parent_dev(si)) {
+ sysfb_unregister();
+ disabled = true;
+ }
mutex_unlock(&disable_lock);
}
EXPORT_SYMBOL_GPL(sysfb_disable);
#if defined(CONFIG_PCI)
-static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
+static bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
{
/*
* TODO: Try to integrate this code into the PCI subsystem
@@ -87,13 +94,13 @@ static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
return true;
}
#else
-static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
+static bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
{
return false;
}
#endif
-static __init struct device *sysfb_parent_dev(const struct screen_info *si)
+static struct device *sysfb_parent_dev(const struct screen_info *si)
{
struct pci_dev *pdev;
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 389d4ea6bfc1..ef622d41eb5b 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -592,7 +592,7 @@ static int __init of_platform_default_populate_init(void)
* This can happen for example on DT systems that do EFI
* booting and may provide a GOP handle to the EFI stub.
*/
- sysfb_disable();
+ sysfb_disable(NULL);
of_platform_device_create(node, NULL, NULL);
of_node_put(node);
}
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 561be8feca96..2b5a1e666e9b 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -293,7 +293,7 @@ int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t si
* ask for this, so let's assume that a real driver for the display
* was already probed and prevent sysfb to register devices later.
*/
- sysfb_disable();
+ sysfb_disable(NULL);
aperture_detach_devices(base, size);
@@ -346,15 +346,10 @@ EXPORT_SYMBOL(__aperture_remove_legacy_vga_devices);
*/
int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name)
{
- bool primary = false;
resource_size_t base, size;
int bar, ret = 0;
- if (pdev == vga_default_device())
- primary = true;
-
- if (primary)
- sysfb_disable();
+ sysfb_disable(&pdev->dev);
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
@@ -370,7 +365,7 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
* that consumes the VGA framebuffer I/O range. Remove this
* device as well.
*/
- if (primary)
+ if (pdev == vga_default_device())
ret = __aperture_remove_legacy_vga_devices(pdev);
return ret;
diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h
index c9cb657dad08..bef5f06a91de 100644
--- a/include/linux/sysfb.h
+++ b/include/linux/sysfb.h
@@ -58,11 +58,11 @@ struct efifb_dmi_info {
#ifdef CONFIG_SYSFB
-void sysfb_disable(void);
+void sysfb_disable(struct device *dev);
#else /* CONFIG_SYSFB */
-static inline void sysfb_disable(void)
+static inline void sysfb_disable(struct device *dev)
{
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x aba07b9a0587f50e5d3346eaa19019cf3f86c0ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083034-repave-fit-eff7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
aba07b9a0587 ("drm/vmwgfx: Prevent unmapping active read buffers")
d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
0208ca55aa9c ("Backmerge tag 'v6.9-rc5' into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aba07b9a0587f50e5d3346eaa19019cf3f86c0ea Mon Sep 17 00:00:00 2001
From: Zack Rusin <zack.rusin(a)broadcom.com>
Date: Fri, 16 Aug 2024 14:32:05 -0400
Subject: [PATCH] drm/vmwgfx: Prevent unmapping active read buffers
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-2-zack.r…
Reviewed-by: Martin Krastev <martin.krastev(a)broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x aba07b9a0587f50e5d3346eaa19019cf3f86c0ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083033-refurnish-cartel-4b09@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
aba07b9a0587 ("drm/vmwgfx: Prevent unmapping active read buffers")
d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
0208ca55aa9c ("Backmerge tag 'v6.9-rc5' into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aba07b9a0587f50e5d3346eaa19019cf3f86c0ea Mon Sep 17 00:00:00 2001
From: Zack Rusin <zack.rusin(a)broadcom.com>
Date: Fri, 16 Aug 2024 14:32:05 -0400
Subject: [PATCH] drm/vmwgfx: Prevent unmapping active read buffers
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-2-zack.r…
Reviewed-by: Martin Krastev <martin.krastev(a)broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a2ccc33b88e2953a6bf0b309e7e8849cc5320018
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083052-definite-customs-8a5e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a2ccc33b88e2 ("drm/i915/dp_mst: Fix MST state after a sink reset")
e37137380931 ("drm/i915/dp_mst: Force modeset CRTC if DSC toggling requires it")
326b1e792ff0 ("drm/i915/dp_mst: Add the MST topology state for modesetted CRTCs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2ccc33b88e2953a6bf0b309e7e8849cc5320018 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak(a)intel.com>
Date: Fri, 23 Aug 2024 19:29:18 +0300
Subject: [PATCH] drm/i915/dp_mst: Fix MST state after a sink reset
In some cases the sink can reset itself after it was configured into MST
mode, without the driver noticing the disconnected state. For instance
the reset may happen in the middle of a modeset, or the (long) HPD pulse
generated may be not long enough for the encoder detect handler to
observe the HPD's deasserted state. In this case the sink's DPCD
register programmed to enable MST will be reset, while the driver still
assumes MST is still enabled. Detect this condition, which will tear
down and recreate/re-enable the MST topology.
v2:
- Add a code comment about adjusting the expected DP_MSTM_CTRL register
value for SST + SideBand. (Suraj, Jani)
- Print a debug message about detecting the link reset. (Jani)
- Verify the DPCD MST state only if it wasn't already determined that
the sink is disconnected.
Cc: stable(a)vger.kernel.org
Cc: Jani Nikula <jani.nikula(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11195
Reviewed-by: Suraj Kandpal <suraj.kandpal(a)intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240823162918.1211875-1-imre…
(cherry picked from commit 594cf78dc36f31c0c7e0de4567e644f406d46bae)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 59f11af3b0a1..dc75a929d3ed 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -5935,6 +5935,18 @@ intel_dp_detect(struct drm_connector *connector,
else
status = connector_status_disconnected;
+ if (status != connector_status_disconnected &&
+ !intel_dp_mst_verify_dpcd_state(intel_dp))
+ /*
+ * This requires retrying detection for instance to re-enable
+ * the MST mode that got reset via a long HPD pulse. The retry
+ * will happen either via the hotplug handler's retry logic,
+ * ensured by setting the connector here to SST/disconnected,
+ * or via a userspace connector probing in response to the
+ * hotplug uevent sent when removing the MST connectors.
+ */
+ status = connector_status_disconnected;
+
if (status == connector_status_disconnected) {
memset(&intel_dp->compliance, 0, sizeof(intel_dp->compliance));
memset(intel_connector->dp.dsc_dpcd, 0, sizeof(intel_connector->dp.dsc_dpcd));
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 27ce5c3f5951..17978a1f9ab0 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -1998,3 +1998,43 @@ bool intel_dp_mst_crtc_needs_modeset(struct intel_atomic_state *state,
return false;
}
+
+/*
+ * intel_dp_mst_verify_dpcd_state - verify the MST SW enabled state wrt. the DPCD
+ * @intel_dp: DP port object
+ *
+ * Verify if @intel_dp's MST enabled SW state matches the corresponding DPCD
+ * state. A long HPD pulse - not long enough to be detected as a disconnected
+ * state - could've reset the DPCD state, which requires tearing
+ * down/recreating the MST topology.
+ *
+ * Returns %true if the SW MST enabled and DPCD states match, %false
+ * otherwise.
+ */
+bool intel_dp_mst_verify_dpcd_state(struct intel_dp *intel_dp)
+{
+ struct intel_display *display = to_intel_display(intel_dp);
+ struct intel_connector *connector = intel_dp->attached_connector;
+ struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
+ struct intel_encoder *encoder = &dig_port->base;
+ int ret;
+ u8 val;
+
+ if (!intel_dp->is_mst)
+ return true;
+
+ ret = drm_dp_dpcd_readb(intel_dp->mst_mgr.aux, DP_MSTM_CTRL, &val);
+
+ /* Adjust the expected register value for SST + SideBand. */
+ if (ret < 0 || val != (DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC)) {
+ drm_dbg_kms(display->drm,
+ "[CONNECTOR:%d:%s][ENCODER:%d:%s] MST mode got reset, removing topology (ret=%d, ctrl=0x%02x)\n",
+ connector->base.base.id, connector->base.name,
+ encoder->base.base.id, encoder->base.name,
+ ret, val);
+
+ return false;
+ }
+
+ return true;
+}
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.h b/drivers/gpu/drm/i915/display/intel_dp_mst.h
index 8ca1d599091c..9e4c7679f1c3 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.h
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.h
@@ -27,5 +27,6 @@ int intel_dp_mst_atomic_check_link(struct intel_atomic_state *state,
struct intel_link_bw_limits *limits);
bool intel_dp_mst_crtc_needs_modeset(struct intel_atomic_state *state,
struct intel_crtc *crtc);
+bool intel_dp_mst_verify_dpcd_state(struct intel_dp *intel_dp);
#endif /* __INTEL_DP_MST_H__ */
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x a2ccc33b88e2953a6bf0b309e7e8849cc5320018
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083051-camisole-self-47bb@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
a2ccc33b88e2 ("drm/i915/dp_mst: Fix MST state after a sink reset")
e37137380931 ("drm/i915/dp_mst: Force modeset CRTC if DSC toggling requires it")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2ccc33b88e2953a6bf0b309e7e8849cc5320018 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak(a)intel.com>
Date: Fri, 23 Aug 2024 19:29:18 +0300
Subject: [PATCH] drm/i915/dp_mst: Fix MST state after a sink reset
In some cases the sink can reset itself after it was configured into MST
mode, without the driver noticing the disconnected state. For instance
the reset may happen in the middle of a modeset, or the (long) HPD pulse
generated may be not long enough for the encoder detect handler to
observe the HPD's deasserted state. In this case the sink's DPCD
register programmed to enable MST will be reset, while the driver still
assumes MST is still enabled. Detect this condition, which will tear
down and recreate/re-enable the MST topology.
v2:
- Add a code comment about adjusting the expected DP_MSTM_CTRL register
value for SST + SideBand. (Suraj, Jani)
- Print a debug message about detecting the link reset. (Jani)
- Verify the DPCD MST state only if it wasn't already determined that
the sink is disconnected.
Cc: stable(a)vger.kernel.org
Cc: Jani Nikula <jani.nikula(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11195
Reviewed-by: Suraj Kandpal <suraj.kandpal(a)intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240823162918.1211875-1-imre…
(cherry picked from commit 594cf78dc36f31c0c7e0de4567e644f406d46bae)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 59f11af3b0a1..dc75a929d3ed 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -5935,6 +5935,18 @@ intel_dp_detect(struct drm_connector *connector,
else
status = connector_status_disconnected;
+ if (status != connector_status_disconnected &&
+ !intel_dp_mst_verify_dpcd_state(intel_dp))
+ /*
+ * This requires retrying detection for instance to re-enable
+ * the MST mode that got reset via a long HPD pulse. The retry
+ * will happen either via the hotplug handler's retry logic,
+ * ensured by setting the connector here to SST/disconnected,
+ * or via a userspace connector probing in response to the
+ * hotplug uevent sent when removing the MST connectors.
+ */
+ status = connector_status_disconnected;
+
if (status == connector_status_disconnected) {
memset(&intel_dp->compliance, 0, sizeof(intel_dp->compliance));
memset(intel_connector->dp.dsc_dpcd, 0, sizeof(intel_connector->dp.dsc_dpcd));
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 27ce5c3f5951..17978a1f9ab0 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -1998,3 +1998,43 @@ bool intel_dp_mst_crtc_needs_modeset(struct intel_atomic_state *state,
return false;
}
+
+/*
+ * intel_dp_mst_verify_dpcd_state - verify the MST SW enabled state wrt. the DPCD
+ * @intel_dp: DP port object
+ *
+ * Verify if @intel_dp's MST enabled SW state matches the corresponding DPCD
+ * state. A long HPD pulse - not long enough to be detected as a disconnected
+ * state - could've reset the DPCD state, which requires tearing
+ * down/recreating the MST topology.
+ *
+ * Returns %true if the SW MST enabled and DPCD states match, %false
+ * otherwise.
+ */
+bool intel_dp_mst_verify_dpcd_state(struct intel_dp *intel_dp)
+{
+ struct intel_display *display = to_intel_display(intel_dp);
+ struct intel_connector *connector = intel_dp->attached_connector;
+ struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
+ struct intel_encoder *encoder = &dig_port->base;
+ int ret;
+ u8 val;
+
+ if (!intel_dp->is_mst)
+ return true;
+
+ ret = drm_dp_dpcd_readb(intel_dp->mst_mgr.aux, DP_MSTM_CTRL, &val);
+
+ /* Adjust the expected register value for SST + SideBand. */
+ if (ret < 0 || val != (DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC)) {
+ drm_dbg_kms(display->drm,
+ "[CONNECTOR:%d:%s][ENCODER:%d:%s] MST mode got reset, removing topology (ret=%d, ctrl=0x%02x)\n",
+ connector->base.base.id, connector->base.name,
+ encoder->base.base.id, encoder->base.name,
+ ret, val);
+
+ return false;
+ }
+
+ return true;
+}
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.h b/drivers/gpu/drm/i915/display/intel_dp_mst.h
index 8ca1d599091c..9e4c7679f1c3 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.h
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.h
@@ -27,5 +27,6 @@ int intel_dp_mst_atomic_check_link(struct intel_atomic_state *state,
struct intel_link_bw_limits *limits);
bool intel_dp_mst_crtc_needs_modeset(struct intel_atomic_state *state,
struct intel_crtc *crtc);
+bool intel_dp_mst_verify_dpcd_state(struct intel_dp *intel_dp);
#endif /* __INTEL_DP_MST_H__ */
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f18fa2abf81099d822d842a107f8c9889c86043c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083032-bodacious-swifter-6f19@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
f18fa2abf810 ("selftests: mptcp: join: check re-re-adding ID 0 signal")
20ccc7c5f7a3 ("selftests: mptcp: join: validate event numbers")
d397d7246c11 ("selftests: mptcp: join: check re-re-adding ID 0 endp")
1c2326fcae4f ("selftests: mptcp: join: check re-adding init endp with != id")
5f94b08c0012 ("selftests: mptcp: join: check removing ID 0 endpoint")
65fb58afa341 ("selftests: mptcp: join: check re-using ID of closed subflow")
a13d5aad4dd9 ("selftests: mptcp: join: check re-using ID of unused ADD_ADDR")
b5e2fb832f48 ("selftests: mptcp: add explicit test case for remove/readd")
40061817d95b ("selftests: mptcp: join: fix dev in check_endpoint")
23a0485d1c04 ("selftests: mptcp: declare event macros in mptcp_lib")
35bc143a8514 ("selftests: mptcp: add mptcp_lib_events helper")
3a0f9bed3c28 ("selftests: mptcp: add mptcp_lib_ns_init/exit helpers")
3fb8c33ef4b9 ("selftests: mptcp: add mptcp_lib_check_tools helper")
7c2eac649054 ("selftests: mptcp: stop forcing iptables-legacy")
4cc5cc7ca052 ("selftests: mptcp: userspace pm get addr tests")
38f027fca1b7 ("selftests: mptcp: dump userspace addrs list")
2d0c1d27ea4e ("selftests: mptcp: add mptcp_lib_check_output helper")
9480f388a2ef ("selftests: mptcp: join: add ss mptcp support check")
7092dbee2328 ("selftests: mptcp: rm subflow with v4/v4mapped addr")
04b57c9e096a ("selftests: mptcp: join: stop transfer when check is done (part 2)")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f18fa2abf81099d822d842a107f8c9889c86043c Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 28 Aug 2024 08:14:38 +0200
Subject: [PATCH] selftests: mptcp: join: check re-re-adding ID 0 signal
This test extends "delete re-add signal" to validate the previous
commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
client should still be able to re-create this subflow, even if the
add_addr_accepted limit has been reached as this special address is not
considered as a new address.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index a8ea0fe200fb..a4762c49a878 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3688,7 +3688,7 @@ endpoint_tests()
# broadcast IP: no packet for this address will be received on ns1
pm_nl_add_endpoint $ns1 224.0.0.1 id 2 flags signal
pm_nl_add_endpoint $ns1 10.0.1.1 id 42 flags signal
- test_linkfail=4 speed=20 \
+ test_linkfail=4 speed=5 \
run_tests $ns1 $ns2 10.0.1.1 &
local tests_pid=$!
@@ -3717,7 +3717,17 @@ endpoint_tests()
pm_nl_add_endpoint $ns1 10.0.1.1 id 99 flags signal
wait_mpj $ns2
- chk_subflow_nr "after re-add" 3
+ chk_subflow_nr "after re-add ID 0" 3
+ chk_mptcp_info subflows 3 subflows 3
+
+ pm_nl_del_endpoint $ns1 99 10.0.1.1
+ sleep 0.5
+ chk_subflow_nr "after re-delete ID 0" 2
+ chk_mptcp_info subflows 2 subflows 2
+
+ pm_nl_add_endpoint $ns1 10.0.1.1 id 88 flags signal
+ wait_mpj $ns2
+ chk_subflow_nr "after re-re-add ID 0" 3
chk_mptcp_info subflows 3 subflows 3
mptcp_lib_kill_wait $tests_pid
@@ -3727,19 +3737,19 @@ endpoint_tests()
chk_evt_nr ns1 MPTCP_LIB_EVENT_ESTABLISHED 1
chk_evt_nr ns1 MPTCP_LIB_EVENT_ANNOUNCED 0
chk_evt_nr ns1 MPTCP_LIB_EVENT_REMOVED 0
- chk_evt_nr ns1 MPTCP_LIB_EVENT_SUB_ESTABLISHED 4
- chk_evt_nr ns1 MPTCP_LIB_EVENT_SUB_CLOSED 2
+ chk_evt_nr ns1 MPTCP_LIB_EVENT_SUB_ESTABLISHED 5
+ chk_evt_nr ns1 MPTCP_LIB_EVENT_SUB_CLOSED 3
chk_evt_nr ns2 MPTCP_LIB_EVENT_CREATED 1
chk_evt_nr ns2 MPTCP_LIB_EVENT_ESTABLISHED 1
- chk_evt_nr ns2 MPTCP_LIB_EVENT_ANNOUNCED 5
- chk_evt_nr ns2 MPTCP_LIB_EVENT_REMOVED 3
- chk_evt_nr ns2 MPTCP_LIB_EVENT_SUB_ESTABLISHED 4
- chk_evt_nr ns2 MPTCP_LIB_EVENT_SUB_CLOSED 2
+ chk_evt_nr ns2 MPTCP_LIB_EVENT_ANNOUNCED 6
+ chk_evt_nr ns2 MPTCP_LIB_EVENT_REMOVED 4
+ chk_evt_nr ns2 MPTCP_LIB_EVENT_SUB_ESTABLISHED 5
+ chk_evt_nr ns2 MPTCP_LIB_EVENT_SUB_CLOSED 3
- chk_join_nr 4 4 4
- chk_add_nr 5 5
- chk_rm_nr 3 2 invert
+ chk_join_nr 5 5 5
+ chk_add_nr 6 6
+ chk_rm_nr 4 3 invert
fi
# flush and re-add
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x f18fa2abf81099d822d842a107f8c9889c86043c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083031-stung-aversion-7783@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
f18fa2abf810 ("selftests: mptcp: join: check re-re-adding ID 0 signal")
20ccc7c5f7a3 ("selftests: mptcp: join: validate event numbers")
d397d7246c11 ("selftests: mptcp: join: check re-re-adding ID 0 endp")
1c2326fcae4f ("selftests: mptcp: join: check re-adding init endp with != id")
5f94b08c0012 ("selftests: mptcp: join: check removing ID 0 endpoint")
65fb58afa341 ("selftests: mptcp: join: check re-using ID of closed subflow")
a13d5aad4dd9 ("selftests: mptcp: join: check re-using ID of unused ADD_ADDR")
b5e2fb832f48 ("selftests: mptcp: add explicit test case for remove/readd")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f18fa2abf81099d822d842a107f8c9889c86043c Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 28 Aug 2024 08:14:38 +0200
Subject: [PATCH] selftests: mptcp: join: check re-re-adding ID 0 signal
This test extends "delete re-add signal" to validate the previous
commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
client should still be able to re-create this subflow, even if the
add_addr_accepted limit has been reached as this special address is not
considered as a new address.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index a8ea0fe200fb..a4762c49a878 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3688,7 +3688,7 @@ endpoint_tests()
# broadcast IP: no packet for this address will be received on ns1
pm_nl_add_endpoint $ns1 224.0.0.1 id 2 flags signal
pm_nl_add_endpoint $ns1 10.0.1.1 id 42 flags signal
- test_linkfail=4 speed=20 \
+ test_linkfail=4 speed=5 \
run_tests $ns1 $ns2 10.0.1.1 &
local tests_pid=$!
@@ -3717,7 +3717,17 @@ endpoint_tests()
pm_nl_add_endpoint $ns1 10.0.1.1 id 99 flags signal
wait_mpj $ns2
- chk_subflow_nr "after re-add" 3
+ chk_subflow_nr "after re-add ID 0" 3
+ chk_mptcp_info subflows 3 subflows 3
+
+ pm_nl_del_endpoint $ns1 99 10.0.1.1
+ sleep 0.5
+ chk_subflow_nr "after re-delete ID 0" 2
+ chk_mptcp_info subflows 2 subflows 2
+
+ pm_nl_add_endpoint $ns1 10.0.1.1 id 88 flags signal
+ wait_mpj $ns2
+ chk_subflow_nr "after re-re-add ID 0" 3
chk_mptcp_info subflows 3 subflows 3
mptcp_lib_kill_wait $tests_pid
@@ -3727,19 +3737,19 @@ endpoint_tests()
chk_evt_nr ns1 MPTCP_LIB_EVENT_ESTABLISHED 1
chk_evt_nr ns1 MPTCP_LIB_EVENT_ANNOUNCED 0
chk_evt_nr ns1 MPTCP_LIB_EVENT_REMOVED 0
- chk_evt_nr ns1 MPTCP_LIB_EVENT_SUB_ESTABLISHED 4
- chk_evt_nr ns1 MPTCP_LIB_EVENT_SUB_CLOSED 2
+ chk_evt_nr ns1 MPTCP_LIB_EVENT_SUB_ESTABLISHED 5
+ chk_evt_nr ns1 MPTCP_LIB_EVENT_SUB_CLOSED 3
chk_evt_nr ns2 MPTCP_LIB_EVENT_CREATED 1
chk_evt_nr ns2 MPTCP_LIB_EVENT_ESTABLISHED 1
- chk_evt_nr ns2 MPTCP_LIB_EVENT_ANNOUNCED 5
- chk_evt_nr ns2 MPTCP_LIB_EVENT_REMOVED 3
- chk_evt_nr ns2 MPTCP_LIB_EVENT_SUB_ESTABLISHED 4
- chk_evt_nr ns2 MPTCP_LIB_EVENT_SUB_CLOSED 2
+ chk_evt_nr ns2 MPTCP_LIB_EVENT_ANNOUNCED 6
+ chk_evt_nr ns2 MPTCP_LIB_EVENT_REMOVED 4
+ chk_evt_nr ns2 MPTCP_LIB_EVENT_SUB_ESTABLISHED 5
+ chk_evt_nr ns2 MPTCP_LIB_EVENT_SUB_CLOSED 3
- chk_join_nr 4 4 4
- chk_add_nr 5 5
- chk_rm_nr 3 2 invert
+ chk_join_nr 5 5 5
+ chk_add_nr 6 6
+ chk_rm_nr 4 3 invert
fi
# flush and re-add
In case of sev PLATFORM_STATUS failure, sev_get_api_version() fails
resulting in sev_data field of psp_master nulled out. This later becomes
a problem when unloading the ccp module because the device has not been
unregistered (via misc_deregister()) before clearing the sev_data field
of psp_master. As a result, on reloading the ccp module, a duplicate
device issue is encountered as can be seen from the dmesg log below.
on reloading ccp module via modprobe ccp
Call Trace:
<TASK>
dump_stack_lvl+0xd7/0xf0
dump_stack+0x10/0x20
sysfs_warn_dup+0x5c/0x70
sysfs_create_dir_ns+0xbc/0xd
kobject_add_internal+0xb1/0x2f0
kobject_add+0x7a/0xe0
? srso_alias_return_thunk+0x5/0xfbef5
? get_device_parent+0xd4/0x1e0
? __pfx_klist_children_get+0x10/0x10
device_add+0x121/0x870
? srso_alias_return_thunk+0x5/0xfbef5
device_create_groups_vargs+0xdc/0x100
device_create_with_groups+0x3f/0x60
misc_register+0x13b/0x1c0
sev_dev_init+0x1d4/0x290 [ccp]
psp_dev_init+0x136/0x300 [ccp]
sp_init+0x6f/0x80 [ccp]
sp_pci_probe+0x2a6/0x310 [ccp]
? srso_alias_return_thunk+0x5/0xfbef5
local_pci_probe+0x4b/0xb0
work_for_cpu_fn+0x1a/0x30
process_one_work+0x203/0x600
worker_thread+0x19e/0x350
? __pfx_worker_thread+0x10/0x10
kthread+0xeb/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x3c/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
kobject: kobject_add_internal failed for sev with -EEXIST, don't try to register things with the same name in the same directory.
ccp 0000:22:00.1: sev initialization failed
ccp 0000:22:00.1: psp initialization failed
ccp 0000:a2:00.1: no command queues available
ccp 0000:a2:00.1: psp enabled
Address this issue by unregistering the /dev/sev before clearing out
sev_data in case of PLATFORM_STATUS failure.
Fixes: 200664d5237f ("crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavan Kumar Paluri <papaluri(a)amd.com>
---
drivers/crypto/ccp/sev-dev.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 9810edbb272d..5f63d2018649 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2410,6 +2410,8 @@ void sev_pci_init(void)
return;
err:
+ sev_dev_destroy(psp_master);
+
psp_master->sev_data = NULL;
}
base-commit: b8c7cbc324dc17b9e42379b42603613580bec2d8
--
2.34.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1c2326fcae4f0c5de8ad0d734ced43a8e5f17dac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083018-rimmed-calamity-65c5@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
1c2326fcae4f ("selftests: mptcp: join: check re-adding init endp with != id")
a13d5aad4dd9 ("selftests: mptcp: join: check re-using ID of unused ADD_ADDR")
b5e2fb832f48 ("selftests: mptcp: add explicit test case for remove/readd")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1c2326fcae4f0c5de8ad0d734ced43a8e5f17dac Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 28 Aug 2024 08:14:30 +0200
Subject: [PATCH] selftests: mptcp: join: check re-adding init endp with != id
The initial subflow has a special local ID: 0. It is specific per
connection.
When a global endpoint is deleted and re-added later, it can have a
different ID, but the kernel should still use the ID 0 if it corresponds
to the initial address.
This test validates this behaviour: the endpoint linked to the initial
subflow is removed, and re-added with a different ID.
Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 8b4529ff15e5..75458ade32c7 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3627,11 +3627,12 @@ endpoint_tests()
# remove and re-add
if reset "delete re-add signal" &&
mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then
- pm_nl_set_limits $ns1 0 2
- pm_nl_set_limits $ns2 2 2
+ pm_nl_set_limits $ns1 0 3
+ pm_nl_set_limits $ns2 3 3
pm_nl_add_endpoint $ns1 10.0.2.1 id 1 flags signal
# broadcast IP: no packet for this address will be received on ns1
pm_nl_add_endpoint $ns1 224.0.0.1 id 2 flags signal
+ pm_nl_add_endpoint $ns1 10.0.1.1 id 42 flags signal
test_linkfail=4 speed=20 \
run_tests $ns1 $ns2 10.0.1.1 &
local tests_pid=$!
@@ -3653,11 +3654,21 @@ endpoint_tests()
wait_mpj $ns2
chk_subflow_nr "after re-add" 3
chk_mptcp_info subflows 2 subflows 2
+
+ pm_nl_del_endpoint $ns1 42 10.0.1.1
+ sleep 0.5
+ chk_subflow_nr "after delete ID 0" 2
+ chk_mptcp_info subflows 2 subflows 2
+
+ pm_nl_add_endpoint $ns1 10.0.1.1 id 99 flags signal
+ wait_mpj $ns2
+ chk_subflow_nr "after re-add" 3
+ chk_mptcp_info subflows 3 subflows 3
mptcp_lib_kill_wait $tests_pid
- chk_join_nr 3 3 3
- chk_add_nr 4 4
- chk_rm_nr 2 1 invert
+ chk_join_nr 4 4 4
+ chk_add_nr 5 5
+ chk_rm_nr 3 2 invert
fi
# flush and re-add
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1c2326fcae4f0c5de8ad0d734ced43a8e5f17dac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083017-acclaim-eatery-5ccf@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
1c2326fcae4f ("selftests: mptcp: join: check re-adding init endp with != id")
a13d5aad4dd9 ("selftests: mptcp: join: check re-using ID of unused ADD_ADDR")
b5e2fb832f48 ("selftests: mptcp: add explicit test case for remove/readd")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1c2326fcae4f0c5de8ad0d734ced43a8e5f17dac Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 28 Aug 2024 08:14:30 +0200
Subject: [PATCH] selftests: mptcp: join: check re-adding init endp with != id
The initial subflow has a special local ID: 0. It is specific per
connection.
When a global endpoint is deleted and re-added later, it can have a
different ID, but the kernel should still use the ID 0 if it corresponds
to the initial address.
This test validates this behaviour: the endpoint linked to the initial
subflow is removed, and re-added with a different ID.
Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 8b4529ff15e5..75458ade32c7 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3627,11 +3627,12 @@ endpoint_tests()
# remove and re-add
if reset "delete re-add signal" &&
mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then
- pm_nl_set_limits $ns1 0 2
- pm_nl_set_limits $ns2 2 2
+ pm_nl_set_limits $ns1 0 3
+ pm_nl_set_limits $ns2 3 3
pm_nl_add_endpoint $ns1 10.0.2.1 id 1 flags signal
# broadcast IP: no packet for this address will be received on ns1
pm_nl_add_endpoint $ns1 224.0.0.1 id 2 flags signal
+ pm_nl_add_endpoint $ns1 10.0.1.1 id 42 flags signal
test_linkfail=4 speed=20 \
run_tests $ns1 $ns2 10.0.1.1 &
local tests_pid=$!
@@ -3653,11 +3654,21 @@ endpoint_tests()
wait_mpj $ns2
chk_subflow_nr "after re-add" 3
chk_mptcp_info subflows 2 subflows 2
+
+ pm_nl_del_endpoint $ns1 42 10.0.1.1
+ sleep 0.5
+ chk_subflow_nr "after delete ID 0" 2
+ chk_mptcp_info subflows 2 subflows 2
+
+ pm_nl_add_endpoint $ns1 10.0.1.1 id 99 flags signal
+ wait_mpj $ns2
+ chk_subflow_nr "after re-add" 3
+ chk_mptcp_info subflows 3 subflows 3
mptcp_lib_kill_wait $tests_pid
- chk_join_nr 3 3 3
- chk_add_nr 4 4
- chk_rm_nr 2 1 invert
+ chk_join_nr 4 4 4
+ chk_add_nr 5 5
+ chk_rm_nr 3 2 invert
fi
# flush and re-add
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 9366922adc6a71378ca01f898c41be295309f044
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083011-antics-rented-acb7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
9366922adc6a ("mptcp: pm: fix ID 0 endp usage after multiple re-creations")
8b8ed1b429f8 ("mptcp: pm: reuse ID 0 after delete and re-add")
48e50dcbcbaa ("mptcp: pm: avoid possible UaF when selecting endp")
85df533a787b ("mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set")
cd7c957f936f ("mptcp: pm: don't try to create sf if alloc failed")
c95eb32ced82 ("mptcp: pm: reduce indentation blocks")
40eec1795cc2 ("mptcp: pm: update add_addr counters after connect")
6a09788c1a66 ("mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID")
e571fb09c893 ("selftests: mptcp: add speed env var")
4aadde088a58 ("selftests: mptcp: add fullmesh env var")
080b7f5733fd ("selftests: mptcp: add fastclose env var")
662aa22d7dcd ("selftests: mptcp: set all env vars as local ones")
9e9d176df8e9 ("selftests: mptcp: add pm_nl_set_endpoint helper")
1534f87ee0dc ("selftests: mptcp: drop sflags parameter")
595ef566a2ef ("selftests: mptcp: drop addr_nr_ns1/2 parameters")
0c93af1f8907 ("selftests: mptcp: drop test_linkfail parameter")
be7e9786c915 ("selftests: mptcp: set FAILING_LINKS in run_tests")
4369c198e599 ("selftests: mptcp: test userspace pm out of transfer")
528cb5f2a1e8 ("mptcp: pass addr to mptcp_pm_alloc_anno_list")
ae947bb2c253 ("selftests: mptcp: join: skip Fastclose tests if not supported")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9366922adc6a71378ca01f898c41be295309f044 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 28 Aug 2024 08:14:33 +0200
Subject: [PATCH] mptcp: pm: fix ID 0 endp usage after multiple re-creations
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
'local_addr_used' and 'add_addr_accepted' are decremented for addresses
not related to the initial subflow (ID0), because the source and
destination addresses of the initial subflows are known from the
beginning: they don't count as "additional local address being used" or
"ADD_ADDR being accepted".
It is then required not to increment them when the entrypoint used by
the initial subflow is removed and re-added during a connection. Without
this modification, this entrypoint cannot be removed and re-added more
than once.
Reported-by: Arınç ÜNAL <arinc.unal(a)arinc9.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/512
Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Reported-by: syzbot+455d38ecd5f655fc45cf(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/00000000000049861306209237f4@google.com
Cc: stable(a)vger.kernel.org
Tested-by: Arınç ÜNAL <arinc.unal(a)arinc9.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3ff273e219f2..a93450ded50a 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -615,12 +615,13 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
fullmesh = !!(local.flags & MPTCP_PM_ADDR_FLAG_FULLMESH);
- msk->pm.local_addr_used++;
__clear_bit(local.addr.id, msk->pm.id_avail_bitmap);
/* Special case for ID0: set the correct ID */
if (local.addr.id == msk->mpc_endpoint_id)
local.addr.id = 0;
+ else /* local_addr_used is not decr for ID 0 */
+ msk->pm.local_addr_used++;
nr = fill_remote_addresses_vec(msk, &local.addr, fullmesh, addrs);
if (nr == 0)
@@ -750,7 +751,9 @@ static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
spin_lock_bh(&msk->pm.lock);
if (sf_created) {
- msk->pm.add_addr_accepted++;
+ /* add_addr_accepted is not decr for ID 0 */
+ if (remote.id)
+ msk->pm.add_addr_accepted++;
if (msk->pm.add_addr_accepted >= add_addr_accept_max ||
msk->pm.subflows >= subflows_max)
WRITE_ONCE(msk->pm.accept_addr, false);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8b8ed1b429f8fa7ebd5632555e7b047bc0620075
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024083043-surreal-prepay-46e5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8b8ed1b429f8 ("mptcp: pm: reuse ID 0 after delete and re-add")
48e50dcbcbaa ("mptcp: pm: avoid possible UaF when selecting endp")
85df533a787b ("mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set")
cd7c957f936f ("mptcp: pm: don't try to create sf if alloc failed")
c95eb32ced82 ("mptcp: pm: reduce indentation blocks")
528cb5f2a1e8 ("mptcp: pass addr to mptcp_pm_alloc_anno_list")
77e4b94a3de6 ("mptcp: update userspace pm infos")
24430f8bf516 ("mptcp: add address into userspace pm list")
b9d69db87fb7 ("mptcp: let the in-kernel PM use mixed IPv4 and IPv6 addresses")
fb00ee4f3343 ("mptcp: netlink: respect v4/v6-only sockets")
80638684e840 ("mptcp: get sk from msk directly")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8b8ed1b429f8fa7ebd5632555e7b047bc0620075 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 28 Aug 2024 08:14:24 +0200
Subject: [PATCH] mptcp: pm: reuse ID 0 after delete and re-add
When the endpoint used by the initial subflow is removed and re-added
later, the PM has to force the ID 0, it is a special case imposed by the
MPTCP specs.
Note that the endpoint should then need to be re-added reusing the same
ID.
Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 8d2f97854c64..ec45ab4c66ab 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -585,6 +585,11 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
__clear_bit(local.addr.id, msk->pm.id_avail_bitmap);
msk->pm.add_addr_signaled++;
+
+ /* Special case for ID0: set the correct ID */
+ if (local.addr.id == msk->mpc_endpoint_id)
+ local.addr.id = 0;
+
mptcp_pm_announce_addr(msk, &local.addr, false);
mptcp_pm_nl_addr_send_ack(msk);
@@ -609,6 +614,11 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
msk->pm.local_addr_used++;
__clear_bit(local.addr.id, msk->pm.id_avail_bitmap);
+
+ /* Special case for ID0: set the correct ID */
+ if (local.addr.id == msk->mpc_endpoint_id)
+ local.addr.id = 0;
+
nr = fill_remote_addresses_vec(msk, &local.addr, fullmesh, addrs);
if (nr == 0)
continue;
These couple of patches intends to fix the reset-gpio handling
for imx335 driver.
Patch 1/2 mentions reset-gpio polarity in DT binding example.
Patch 2/2 fixes the logical value of reset-gpio during
power-on/power-off sequence.
--
Changes in v4:
- rework 2/2 commit message
- Explain conclusions for 2/2 patch, in the '---' section.
Changes in v3:
- Rework 1/2 commit message
- Fix gpio include in DT example in 1/2
- Remove not-so-informative XCLR comment in 2/2
Changes in v2:
- Also include reset-gpio polarity, mention in DT binding
- Add Fixes tag in 2/2
- Set the reset line to high during init time in 2/2
Link to v2:
https://lore.kernel.org/linux-media/20240729110437.199428-1-umang.jain@idea…
Link to v1:
https://lore.kernel.org/linux-media/tyo5etjwsfznuk6vzwqmcphbu4pz4lskrg3fjie…
Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com>
---
Umang Jain (2):
dt-bindings: media: imx335: Add reset-gpios to the DT example
media: imx335: Fix reset-gpio handling
Documentation/devicetree/bindings/media/i2c/sony,imx335.yaml | 4 ++++
drivers/media/i2c/imx335.c | 9 ++++-----
2 files changed, 8 insertions(+), 5 deletions(-)
---
base-commit: 393556c9f56ced8d9776c32ce99f34913cfd904e
change-id: 20240830-imx335-vflip-7f5d4b4d00fe
Best regards,
--
Umang Jain <umang.jain(a)ideasonboard.com>
In psnet_open_pf_bar() and snet_open_vf_bar() a string later passed to
pcim_iomap_regions() is placed on the stack. Neither
pcim_iomap_regions() nor the functions it calls copy that string.
Should the string later ever be used, this, consequently, causes
undefined behavior since the stack frame will by then have disappeared.
Fix the bug by allocating the strings on the heap through
devm_kasprintf().
Cc: stable(a)vger.kernel.org # v6.3
Fixes: 51a8f9d7f587 ("virtio: vdpa: new SolidNET DPU driver.")
Reported-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Closes: https://lore.kernel.org/all/74e9109a-ac59-49e2-9b1d-d825c9c9f891@wanadoo.fr/
Suggested-by: Andy Shevchenko <andy(a)kernel.org>
Signed-off-by: Philipp Stanner <pstanner(a)redhat.com>
---
drivers/vdpa/solidrun/snet_main.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/vdpa/solidrun/snet_main.c b/drivers/vdpa/solidrun/snet_main.c
index 99428a04068d..c8b74980dbd1 100644
--- a/drivers/vdpa/solidrun/snet_main.c
+++ b/drivers/vdpa/solidrun/snet_main.c
@@ -555,7 +555,7 @@ static const struct vdpa_config_ops snet_config_ops = {
static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
{
- char name[50];
+ char *name;
int ret, i, mask = 0;
/* We don't know which BAR will be used to communicate..
* We will map every bar with len > 0.
@@ -573,7 +573,10 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
return -ENODEV;
}
- snprintf(name, sizeof(name), "psnet[%s]-bars", pci_name(pdev));
+ name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "psnet[%s]-bars", pci_name(pdev));
+ if (!name)
+ return -ENOMEM;
+
ret = pcim_iomap_regions(pdev, mask, name);
if (ret) {
SNET_ERR(pdev, "Failed to request and map PCI BARs\n");
@@ -590,10 +593,13 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
static int snet_open_vf_bar(struct pci_dev *pdev, struct snet *snet)
{
- char name[50];
+ char *name;
int ret;
- snprintf(name, sizeof(name), "snet[%s]-bar", pci_name(pdev));
+ name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "snet[%s]-bars", pci_name(pdev));
+ if (!name)
+ return -ENOMEM;
+
/* Request and map BAR */
ret = pcim_iomap_regions(pdev, BIT(snet->psnet->cfg.vf_bar), name);
if (ret) {
--
2.46.0
The update_hw_blink() function prints an error message when hardware is
not able to handle a blink configuration on its own. IMHO, this isn't a
'real' error since the software fallback is used afterwards.
Remove the error messages to avoid flooding the logs with unnecessary
messages.
Cc: stable(a)vger.kernel.org
Fixes: 48ca7f302cfc ("leds: pca9532: Use PWM1 for hardware blinking")
Signed-off-by: Bastien Curutchet <bastien.curutchet(a)bootlin.com>
---
drivers/leds/leds-pca9532.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/leds/leds-pca9532.c b/drivers/leds/leds-pca9532.c
index 9f3fac66a11c..bcc3063bd441 100644
--- a/drivers/leds/leds-pca9532.c
+++ b/drivers/leds/leds-pca9532.c
@@ -215,8 +215,7 @@ static int pca9532_update_hw_blink(struct pca9532_led *led,
if (other->state == PCA9532_PWM1) {
if (other->ldev.blink_delay_on != delay_on ||
other->ldev.blink_delay_off != delay_off) {
- dev_err(&led->client->dev,
- "HW can handle only one blink configuration at a time\n");
+ /* HW can handle only one blink configuration at a time */
return -EINVAL;
}
}
@@ -224,7 +223,7 @@ static int pca9532_update_hw_blink(struct pca9532_led *led,
psc = ((delay_on + delay_off) * PCA9532_PWM_PERIOD_DIV - 1) / 1000;
if (psc > U8_MAX) {
- dev_err(&led->client->dev, "Blink period too long to be handled by hardware\n");
+ /* Blink period too long to be handled by hardware */
return -EINVAL;
}
--
2.45.0
From userspace, spawning a new process with, for example,
posix_spawn(), only allows the user to work with
the scheduling priority value defined by POSIX
in the sched_param struct.
However, sched_setparam() and similar syscalls lead to
__sched_setscheduler() which rejects any new value
for the priority other than 0 for non-RT schedule classes,
a behavior kept since Linux 2.6 or earlier.
Linux translates the usage of the sched_param struct
into it's own internal sched_attr struct during the syscall,
but the user has no way to manage the other values
within the sched_attr struct using only POSIX functions.
The only other way to adjust niceness while using posix_spawn()
would be to set the value after the process has started,
but this introduces the risk of the process being dead
before the next syscall can set the priority after the fact.
To resolve this, allow the use of the priority value
originally from the POSIX sched_param struct in order to
set the niceness value instead of rejecting the priority value.
Edit the sched_get_priority_*() POSIX syscalls
in order to reflect the range of values accepted.
Cc: stable(a)vger.kernel.org # Apply to kernel/sched/core.c
Signed-off-by: Michael Pratt <mcpratt(a)pm.me>
---
kernel/sched/syscalls.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index ae1b42775ef9..52c02b80f037 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -853,6 +853,19 @@ static int _sched_setscheduler(struct task_struct *p, int policy,
attr.sched_policy = policy;
}
+ if (attr.sched_priority > MAX_PRIO-1)
+ return -EINVAL;
+
+ /*
+ * If priority is set for SCHED_NORMAL or SCHED_BATCH,
+ * set the niceness instead, but only for user calls.
+ */
+ if (check && attr.sched_priority > MAX_RT_PRIO-1 &&
+ ((policy != SETPARAM_POLICY && fair_policy(policy)) || fair_policy(p->policy))) {
+ attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
+ attr.sched_priority = 0;
+ }
+
return __sched_setscheduler(p, &attr, check, true);
}
/**
@@ -1598,9 +1611,11 @@ SYSCALL_DEFINE1(sched_get_priority_max, int, policy)
case SCHED_RR:
ret = MAX_RT_PRIO-1;
break;
- case SCHED_DEADLINE:
case SCHED_NORMAL:
case SCHED_BATCH:
+ ret = MAX_PRIO-1;
+ break;
+ case SCHED_DEADLINE:
case SCHED_IDLE:
ret = 0;
break;
@@ -1625,9 +1640,11 @@ SYSCALL_DEFINE1(sched_get_priority_min, int, policy)
case SCHED_RR:
ret = 1;
break;
- case SCHED_DEADLINE:
case SCHED_NORMAL:
case SCHED_BATCH:
+ ret = MAX_RT_PRIO;
+ break;
+ case SCHED_DEADLINE:
case SCHED_IDLE:
ret = 0;
}
base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
--
2.30.2
Robert Gill reported below #GP when dosemu software was executing vm86()
system call:
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
EIP: restore_all_switch_stack+0xbe/0xcf
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
Call Trace:
show_regs+0x70/0x78
die_addr+0x29/0x70
exc_general_protection+0x13c/0x348
exc_bounds+0x98/0x98
handle_exception+0x14d/0x14d
exc_bounds+0x98/0x98
restore_all_switch_stack+0xbe/0xcf
exc_bounds+0x98/0x98
restore_all_switch_stack+0xbe/0xcf
This only happens when VERW based mitigations like MDS/RFDS are enabled.
This is because segment registers with an arbitrary user value can result
in #GP when executing VERW. Intel SDM vol. 2C documents the following
behavior for VERW instruction:
#GP(0) - If a memory operand effective address is outside the CS, DS, ES,
FS, or GS segment limit.
CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
space. Replace CLEAR_CPU_BUFFERS with a safer version that uses %ss to
refer VERW operand mds_verw_sel. This ensures VERW will not #GP for an
arbitrary user %ds. Also, in NMI return path, move VERW to after
RESTORE_ALL_NMI that touches GPRs.
For clarity, below are the locations where the new CLEAR_CPU_BUFFERS_SAFE
version is being used:
* entry_INT80_32(), entry_SYSENTER_32() and interrupts (via
handle_exception_return) do:
restore_all_switch_stack:
[...]
mov %esi,%esi
verw %ss:0xc0fc92c0 <-------------
iret
* Opportunistic SYSEXIT:
[...]
verw %ss:0xc0fc92c0 <-------------
btrl $0x9,(%esp)
popf
pop %eax
sti
sysexit
* nmi_return and nmi_from_espfix:
mov %esi,%esi
verw %ss:0xc0fc92c0 <-------------
jmp .Lirq_return
Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
Cc: stable(a)vger.kernel.org # 5.10+
Reported-by: Robert Gill <rtgill82(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.i…
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Suggested-by: Brian Gerst <brgerst(a)gmail.com> # Use %ss
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
v5:
- Simplify the use of ALTERNATIVE construct (Uros/Jiri/Peter).
v4: https://lore.kernel.org/r/20240710-fix-dosemu-vm86-v4-1-aa6464e1de6f@linux.…
- Further simplify the patch by using %ss for all VERW calls in 32-bit mode (Brian).
- In NMI exit path move VERW after RESTORE_ALL_NMI that touches GPRs (Dave).
v3: https://lore.kernel.org/r/20240701-fix-dosemu-vm86-v3-1-b1969532c75a@linux.…
- Simplify CLEAR_CPU_BUFFERS_SAFE by using %ss instead of %ds (Brian).
- Do verw before popf in SYSEXIT path (Jari).
v2: https://lore.kernel.org/r/20240627-fix-dosemu-vm86-v2-1-d5579f698e77@linux.…
- Safe guard against any other system calls like vm86() that might change %ds (Dave).
v1: https://lore.kernel.org/r/20240426-fix-dosemu-vm86-v1-1-88c826a3f378@linux.…
---
---
arch/x86/entry/entry_32.S | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d3a814efbff6..25c942149fb5 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -253,6 +253,14 @@
.Lend_\@:
.endm
+/*
+ * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
+ * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
+ */
+.macro CLEAR_CPU_BUFFERS_SAFE
+ ALTERNATIVE "", __stringify(verw %ss:_ASM_RIP(mds_verw_sel)), X86_FEATURE_CLEAR_CPU_BUF
+.endm
+
.macro RESTORE_INT_REGS
popl %ebx
popl %ecx
@@ -871,6 +879,8 @@ SYM_FUNC_START(entry_SYSENTER_32)
/* Now ready to switch the cr3 */
SWITCH_TO_USER_CR3 scratch_reg=%eax
+ /* Clobbers ZF */
+ CLEAR_CPU_BUFFERS_SAFE
/*
* Restore all flags except IF. (We restore IF separately because
@@ -881,7 +891,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
BUG_IF_WRONG_CR3 no_user_check=1
popfl
popl %eax
- CLEAR_CPU_BUFFERS
/*
* Return back to the vDSO, which will pop ecx and edx.
@@ -951,7 +960,7 @@ restore_all_switch_stack:
/* Restore user state */
RESTORE_REGS pop=4 # skip orig_eax/error_code
- CLEAR_CPU_BUFFERS
+ CLEAR_CPU_BUFFERS_SAFE
.Lirq_return:
/*
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@@ -1144,7 +1153,6 @@ SYM_CODE_START(asm_exc_nmi)
/* Not on SYSENTER stack. */
call exc_nmi
- CLEAR_CPU_BUFFERS
jmp .Lnmi_return
.Lnmi_from_sysenter_stack:
@@ -1165,6 +1173,7 @@ SYM_CODE_START(asm_exc_nmi)
CHECK_AND_APPLY_ESPFIX
RESTORE_ALL_NMI cr3_reg=%edi pop=4
+ CLEAR_CPU_BUFFERS_SAFE
jmp .Lirq_return
#ifdef CONFIG_X86_ESPFIX32
@@ -1206,6 +1215,7 @@ SYM_CODE_START(asm_exc_nmi)
* 1 - orig_ax
*/
lss (1+5+6)*4(%esp), %esp # back to espfix stack
+ CLEAR_CPU_BUFFERS_SAFE
jmp .Lirq_return
#endif
SYM_CODE_END(asm_exc_nmi)
---
base-commit: f2661062f16b2de5d7b6a5c42a9a5c96326b8454
change-id: 20240426-fix-dosemu-vm86-dd111a01737e
From: Simon Arlott <simon(a)octiron.net>
The mcp251x_hw_wake() function is called with the mpc_lock mutex held and
disables the interrupt handler so that no interrupts can be processed while
waking the device. If an interrupt has already occurred then waiting for
the interrupt handler to complete will deadlock because it will be trying
to acquire the same mutex.
CPU0 CPU1
---- ----
mcp251x_open()
mutex_lock(&priv->mcp_lock)
request_threaded_irq()
<interrupt>
mcp251x_can_ist()
mutex_lock(&priv->mcp_lock)
mcp251x_hw_wake()
disable_irq() <-- deadlock
Use disable_irq_nosync() instead because the interrupt handler does
everything while holding the mutex so it doesn't matter if it's still
running.
Fixes: 8ce8c0abcba3 ("can: mcp251x: only reset hardware as required")
Signed-off-by: Simon Arlott <simon(a)octiron.net>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/4fc08687-1d80-43fe-9f0d-8ef8475e75f6@0882a8b5-c…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/spi/mcp251x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 3b8736ff0345..ec5c64006a16 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -752,7 +752,7 @@ static int mcp251x_hw_wake(struct spi_device *spi)
int ret;
/* Force wakeup interrupt to wake device, but don't execute IST */
- disable_irq(spi->irq);
+ disable_irq_nosync(spi->irq);
mcp251x_write_2regs(spi, CANINTE, CANINTE_WAKIE, CANINTF_WAKIF);
/* Wait for oscillator startup timer after wake up */
--
2.45.2
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 44ceabdec12f4e5938f5668c5a691aa3aac703d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082945-closable-sighing-5d70@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
44ceabdec12f ("ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book3 Ultra")
dcfed708742c ("ALSA: hda/realtek: Implement sound init sequence for Samsung Galaxy Book3 Pro 360")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 44ceabdec12f4e5938f5668c5a691aa3aac703d7 Mon Sep 17 00:00:00 2001
From: YOUNGJIN JOO <neoelec(a)gmail.com>
Date: Sun, 25 Aug 2024 18:25:15 +0900
Subject: [PATCH] ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy
Book3 Ultra
144d:c1cc requires the same workaround to enable the speaker amp
as other Samsung models with the ALC298 codec.
Signed-off-by: YOUNGJIN JOO <neoelec(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Link: https://patch.msgid.link/20240825092515.28728-1-neoelec@gmail.com
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index b5cc3417138c..c04eac6a5064 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -10540,6 +10540,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x144d, 0xca03, "Samsung Galaxy Book2 Pro 360 (NP930QED)", ALC298_FIXUP_SAMSUNG_AMP),
SND_PCI_QUIRK(0x144d, 0xc868, "Samsung Galaxy Book2 Pro (NP930XED)", ALC298_FIXUP_SAMSUNG_AMP),
SND_PCI_QUIRK(0x144d, 0xc1ca, "Samsung Galaxy Book3 Pro 360 (NP960QFG-KB1US)", ALC298_FIXUP_SAMSUNG_AMP2),
+ SND_PCI_QUIRK(0x144d, 0xc1cc, "Samsung Galaxy Book3 Ultra (NT960XFH-XD92G))", ALC298_FIXUP_SAMSUNG_AMP2),
SND_PCI_QUIRK(0x1458, 0xfa53, "Gigabyte BXBT-2807", ALC283_FIXUP_HEADSET_MIC),
SND_PCI_QUIRK(0x1462, 0xb120, "MSI Cubi MS-B120", ALC283_FIXUP_HEADSET_MIC),
SND_PCI_QUIRK(0x1462, 0xb171, "Cubi N 8GL (MS-B171)", ALC283_FIXUP_HEADSET_MIC),
Hi Greg and Sasha,
Please apply commit dbaee836d60a ("KVM: arm64: Don't use cbz/adr with
external symbols") to linux-5.10.y, as a recent LLVM optimization around
__cold functions [1] can cause hyp_panic() to be too far away from
__guest_enter(), resulting in the same error that occurred with LTO (for
the same reason):
ld.lld: error: arch/arm64/built-in.a(kvm/hyp/entry.o):(function __guest_enter: .text+0x120): relocation R_AARCH64_CONDBR19 out of range: 14339212 is not in [-1048576, 1048575]; references 'hyp_panic'
>>> referenced by entry.S:88 (arch/arm64/kvm/hyp/vhe/../entry.S:88)
>>> defined in arch/arm64/built-in.a(kvm/hyp/vhe/switch.o)
ld.lld: error: arch/arm64/built-in.a(kvm/hyp/entry.o):(function __guest_enter: .text+0x134): relocation R_AARCH64_ADR_PREL_LO21 out of range: 14339192 is not in [-1048576, 1048575]; references 'hyp_panic'
>>> referenced by entry.S:97 (arch/arm64/kvm/hyp/vhe/../entry.S:97)
>>> defined in arch/arm64/built-in.a(kvm/hyp/vhe/switch.o)
It applies cleanly with 'patch -p1' and I have verified that it fixes
the issue.
[1]: https://github.com/llvm/llvm-project/commit/6b11573b8c5e3d36beee099dbe7347c…
Cheers,
Nathan
From: Jesse Zhang <jesse.zhang(a)amd.com>
[ Upstream commit 88a9a467c548d0b3c7761b4fd54a68e70f9c0944 ]
Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001.
V2: To really improve the handling we would actually
need to have a separate value of 0xffffffff.(Christian)
Signed-off-by: Jesse Zhang <jesse.zhang(a)amd.com>
Suggested-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Vamsi Krishna Brahmajosyula <vamsi-krishna.brahmajosyula(a)broadcom.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index ecaa2d748..0a28daa14 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -725,7 +725,8 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, uint32_t ib_idx)
uint32_t created = 0;
uint32_t allocated = 0;
uint32_t tmp, handle = 0;
- uint32_t *size = &tmp;
+ uint32_t dummy = 0xffffffff;
+ uint32_t *size = &dummy;
unsigned idx;
int i, r = 0;
--
2.39.4
From: Stefan Berger <stefanb(a)linux.ibm.com>
Commit d2add27cf2b8 ("tpm: Add NULL primary creation") introduced
CONFIG_TCG_TPM2_HMAC. When this option is enabled on ppc64 then the
following message appears in the kernel log due to a missing call to
tpm2_sessions_init().
[ 2.654549] tpm tpm0: auth session is not active
Add the missing call to tpm2_session_init() to the ibmvtpm driver to
resolve this issue.
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: d2add27cf2b8 ("tpm: Add NULL primary creation")
Signed-off-by: Stefan Berger <stefanb(a)linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
drivers/char/tpm/tpm_ibmvtpm.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index d3989b257f42..1e5b107d1f3b 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -698,6 +698,10 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
rc = tpm2_get_cc_attrs_tbl(chip);
if (rc)
goto init_irq_cleanup;
+
+ rc = tpm2_sessions_init(chip);
+ if (rc)
+ goto init_irq_cleanup;
}
return tpm_chip_register(chip);
--
2.39.2
From: "Alexey Gladkov (Intel)" <legion(a)kernel.org>
For security reasons, access from kernel space to MMIO addresses in
userspace should be restricted. All MMIO operations from kernel space
are considered trusted and are not validated.
For instance, if in response to a syscall, kernel does put_user() and
the target address is MMIO mapping in userspace, current #VE handler
threat this access as kernel MMIO which is wrong and have security
implications.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
---
arch/x86/coco/tdx/tdx.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 5d2d07aa08ce..65f65015238a 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -411,6 +411,11 @@ static inline bool is_private_gpa(u64 gpa)
return gpa == cc_mkenc(gpa);
}
+static inline bool is_kernel_addr(unsigned long addr)
+{
+ return (long)addr < 0;
+}
+
static int get_phys_addr(unsigned long addr, phys_addr_t *phys_addr, bool *writable)
{
unsigned int level;
@@ -606,6 +611,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
if (WARN_ON_ONCE(mmio == INSN_MMIO_DECODE_FAILED))
return -EINVAL;
+ if (!user_mode(regs) && !is_kernel_addr(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
vaddr = (unsigned long)insn_get_addr_ref(&insn, regs);
if (user_mode(regs)) {
--
2.46.0
Commit e882575efc77 ("spi: rockchip: Suspend and resume the bus during
NOIRQ_SYSTEM_SLEEP_PM ops") stopped respecting runtime PM status and
simply disabled clocks unconditionally when suspending the system. This
causes problems when the device is already runtime suspended when we go
to sleep -- in which case we double-disable clocks and produce a
WARNing.
Switch back to pm_runtime_force_{suspend,resume}(), because that still
seems like the right thing to do, and the aforementioned commit makes no
explanation why it stopped using it.
Also, refactor some of the resume() error handling, because it's not
actually a good idea to re-disable clocks on failure.
Fixes: e882575efc77 ("spi: rockchip: Suspend and resume the bus during NOIRQ_SYSTEM_SLEEP_PM ops")
Cc: <stable(a)vger.kernel.org>
Reported-by: "Ondřej Jirman" <megi(a)xff.cz>
Closes: https://lore.kernel.org/lkml/20220621154218.sau54jeij4bunf56@core/
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
Changes in v3:
- actually CC the appropriate lists (sorry, I accidentally dropped them
on v2)
Changes in v2:
- fix unused 'rs' warning
drivers/spi/spi-rockchip.c | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)
diff --git a/drivers/spi/spi-rockchip.c b/drivers/spi/spi-rockchip.c
index e1ecd96c7858..0bb33c43b1b4 100644
--- a/drivers/spi/spi-rockchip.c
+++ b/drivers/spi/spi-rockchip.c
@@ -945,14 +945,16 @@ static int rockchip_spi_suspend(struct device *dev)
{
int ret;
struct spi_controller *ctlr = dev_get_drvdata(dev);
- struct rockchip_spi *rs = spi_controller_get_devdata(ctlr);
ret = spi_controller_suspend(ctlr);
if (ret < 0)
return ret;
- clk_disable_unprepare(rs->spiclk);
- clk_disable_unprepare(rs->apb_pclk);
+ ret = pm_runtime_force_suspend(dev);
+ if (ret < 0) {
+ spi_controller_resume(ctlr);
+ return ret;
+ }
pinctrl_pm_select_sleep_state(dev);
@@ -963,25 +965,14 @@ static int rockchip_spi_resume(struct device *dev)
{
int ret;
struct spi_controller *ctlr = dev_get_drvdata(dev);
- struct rockchip_spi *rs = spi_controller_get_devdata(ctlr);
pinctrl_pm_select_default_state(dev);
- ret = clk_prepare_enable(rs->apb_pclk);
+ ret = pm_runtime_force_resume(dev);
if (ret < 0)
return ret;
- ret = clk_prepare_enable(rs->spiclk);
- if (ret < 0)
- clk_disable_unprepare(rs->apb_pclk);
-
- ret = spi_controller_resume(ctlr);
- if (ret < 0) {
- clk_disable_unprepare(rs->spiclk);
- clk_disable_unprepare(rs->apb_pclk);
- }
-
- return 0;
+ return spi_controller_resume(ctlr);
}
#endif /* CONFIG_PM_SLEEP */
--
2.46.0.295.g3b9ea8a38a-goog
This fix addresses STAR 9001285599, which only affects DWC_usb3 version
3.20a. The timer value for PM_LC_TIMER in DWC_usb3 3.20a for the Link
ECN changes is incorrect. If the PM TIMER ECN is enabled via GUCTL2[19],
the link compliance test (TD7.21) may fail. If the ECN is not enabled
(GUCTL2[19] = 0), the controller will use the old timer value (5us),
which is still acceptable for the link compliance test. Therefore, clear
GUCTL2[19] to pass the USB link compliance test: TD 7.21.
Cc: stable(a)vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
Changes in v2: Updated comment and commit message to include the
controller IP.
v1 link:
https://lore.kernel.org/all/20240822084504.1355-1-quic_faisalh@quicinc.com
drivers/usb/dwc3/core.c | 15 +++++++++++++++
drivers/usb/dwc3/core.h | 2 ++
2 files changed, 17 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 734de2a8bd21..92d7ec830322 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1378,6 +1378,21 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
}
+ /*
+ * STAR 9001285599: This issue affects DWC_usb3 version 3.20a
+ * only. If the PM TIMER ECM is enabled through GUCTL2[19], the
+ * link compliance test (TD7.21) may fail. If the ECN is not
+ * enabled (GUCTL2[19] = 0), the controller will use the old timer
+ * value (5us), which is still acceptable for the link compliance
+ * test. Therefore, do not enable PM TIMER ECM in 3.20a by
+ * setting GUCTL2[19] by default; instead, use GUCTL2[19] = 0.
+ */
+ if (DWC3_VER_IS(DWC3, 320A)) {
+ reg = dwc3_readl(dwc->regs, DWC3_GUCTL2);
+ reg &= ~DWC3_GUCTL2_LC_TIMER;
+ dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
+ }
+
/*
* When configured in HOST mode, after issuing U3/L2 exit controller
* fails to send proper CRC checksum in CRC5 feild. Because of this
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..c71240e8f7c7 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -421,6 +421,7 @@
/* Global User Control Register 2 */
#define DWC3_GUCTL2_RST_ACTBITLATER BIT(14)
+#define DWC3_GUCTL2_LC_TIMER BIT(19)
/* Global User Control Register 3 */
#define DWC3_GUCTL3_SPLITDISABLE BIT(14)
@@ -1269,6 +1270,7 @@ struct dwc3 {
#define DWC3_REVISION_290A 0x5533290a
#define DWC3_REVISION_300A 0x5533300a
#define DWC3_REVISION_310A 0x5533310a
+#define DWC3_REVISION_320A 0x5533320a
#define DWC3_REVISION_330A 0x5533330a
#define DWC31_REVISION_ANY 0x0
--
2.17.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
On ilk/snb the pipe may be configured to place the LUT before or
after the CSC depending on various factors, but as there is only
one LUT (no split mode like on IVB+) we only advertise a gamma_lut
and no degamma_lut in the uapi to avoid confusing userspace.
This can cause a problem during readout if the VBIOS/GOP enabled
the LUT in the pre CSC configuration. The current code blindly
assigns the results of the readout to the degamma_lut, which will
cause a failure during the next atomic_check() as we aren't expecting
anything to be in degamma_lut since it's not visible to userspace.
Fix the problem by assigning whatever LUT we read out from the
hardware into gamma_lut.
Cc: stable(a)vger.kernel.org
Fixes: d2559299d339 ("drm/i915: Make ilk_read_luts() capable of degamma readout")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11608
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
.../drm/i915/display/intel_modeset_setup.c | 31 ++++++++++++++++---
1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_modeset_setup.c b/drivers/gpu/drm/i915/display/intel_modeset_setup.c
index 7602cb30ebf1..e1213f3d93cc 100644
--- a/drivers/gpu/drm/i915/display/intel_modeset_setup.c
+++ b/drivers/gpu/drm/i915/display/intel_modeset_setup.c
@@ -326,6 +326,8 @@ static void intel_modeset_update_connector_atomic_state(struct drm_i915_private
static void intel_crtc_copy_hw_to_uapi_state(struct intel_crtc_state *crtc_state)
{
+ struct drm_i915_private *i915 = to_i915(crtc_state->uapi.crtc->dev);
+
if (intel_crtc_is_joiner_secondary(crtc_state))
return;
@@ -337,11 +339,30 @@ static void intel_crtc_copy_hw_to_uapi_state(struct intel_crtc_state *crtc_state
crtc_state->uapi.adjusted_mode = crtc_state->hw.adjusted_mode;
crtc_state->uapi.scaling_filter = crtc_state->hw.scaling_filter;
- /* assume 1:1 mapping */
- drm_property_replace_blob(&crtc_state->hw.degamma_lut,
- crtc_state->pre_csc_lut);
- drm_property_replace_blob(&crtc_state->hw.gamma_lut,
- crtc_state->post_csc_lut);
+ if (DISPLAY_INFO(i915)->color.degamma_lut_size) {
+ /* assume 1:1 mapping */
+ drm_property_replace_blob(&crtc_state->hw.degamma_lut,
+ crtc_state->pre_csc_lut);
+ drm_property_replace_blob(&crtc_state->hw.gamma_lut,
+ crtc_state->post_csc_lut);
+ } else {
+ /*
+ * ilk/snb hw may be configured for either pre_csc_lut
+ * or post_csc_lut, but we don't advertise degamma_lut as
+ * being available in the uapi since there is only one
+ * hardware LUT. Always assign the result of the readout
+ * to gamma_lut as that is the only valid source of LUTs
+ * in the uapi.
+ */
+ drm_WARN_ON(&i915->drm, crtc_state->post_csc_lut &&
+ crtc_state->pre_csc_lut);
+
+ drm_property_replace_blob(&crtc_state->hw.degamma_lut,
+ NULL);
+ drm_property_replace_blob(&crtc_state->hw.gamma_lut,
+ crtc_state->post_csc_lut ?:
+ crtc_state->pre_csc_lut);
+ }
drm_property_replace_blob(&crtc_state->uapi.degamma_lut,
crtc_state->hw.degamma_lut);
--
2.44.2
Here is a new batch of fixes for the MPTCP in-kernel path-manager:
Patch 1 ensures the address ID is set to 0 when the path-manager sends
an ADD_ADDR for the address of the initial subflow. The same fix is
applied when a new subflow is created re-using this special address. A
fix for v6.0.
Patch 2 is similar, but for the case where an endpoint is removed: if
this endpoint was used for the initial address, it is important to send
a RM_ADDR with this ID set to 0, and look for existing subflows with the
ID set to 0. A fix for v6.0 as well.
Patch 3 validates the two previous patches.
Patch 4 makes the PM selecting an "active" path to send an address
notification in an ACK, instead of taking the first path in the list. A
fix for v5.11.
Patch 5 fixes skipping the establishment of a new subflow if a previous
subflow using the same pair of addresses is being closed. A fix for
v5.13.
Patch 6 resets the ID linked to the initial subflow when the linked
endpoint is re-added, possibly with a different ID. A fix for v6.0.
Patch 7 validates the three previous patches.
Patch 8 is a small fix for the MPTCP Join selftest, when being used with
older subflows not supporting all MIB counters. A fix for a commit
introduced in v6.4, but backported up to v5.10.
Patch 9 avoids the PM to try to close the initial subflow multiple
times, and increment counters while nothing happened. A fix for v5.10.
Patch 10 stops incrementing local_addr_used and add_addr_accepted
counters when dealing with the address ID 0, because these counters are
not taking into account the initial subflow, and are then not
decremented when the linked addresses are removed. A fix for v6.0.
Patch 11 validates the previous patch.
Patch 12 avoids the PM to send multiple SUB_CLOSED events for the
initial subflow. A fix for v5.12.
Patch 13 validates the previous patch.
Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
in order to re-create the initial subflow if it has been closed, even if
the limit for *new* addresses -- not taking into account the address of
the initial subflow -- has been reached. A fix for v5.10.
Patch 15 validates the previous patch.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- Patches 11,15/15: allow the connection to run for longer, should fix
the issue seen on the Netdev CI, with a debug kconfig.
- Link to v1: https://lore.kernel.org/r/20240826-net-mptcp-more-pm-fix-v1-0-8cd6c87d1d6d@…
---
Matthieu Baerts (NGI0) (15):
mptcp: pm: reuse ID 0 after delete and re-add
mptcp: pm: fix RM_ADDR ID for the initial subflow
selftests: mptcp: join: check removing ID 0 endpoint
mptcp: pm: send ACK on an active subflow
mptcp: pm: skip connecting to already established sf
mptcp: pm: reset MPC endp ID when re-added
selftests: mptcp: join: check re-adding init endp with != id
selftests: mptcp: join: no extra msg if no counter
mptcp: pm: do not remove already closed subflows
mptcp: pm: fix ID 0 endp usage after multiple re-creations
selftests: mptcp: join: check re-re-adding ID 0 endp
mptcp: avoid duplicated SUB_CLOSED events
selftests: mptcp: join: validate event numbers
mptcp: pm: ADD_ADDR 0 is not a new address
selftests: mptcp: join: check re-re-adding ID 0 signal
net/mptcp/pm.c | 4 +-
net/mptcp/pm_netlink.c | 87 ++++++++++----
net/mptcp/protocol.c | 6 +
net/mptcp/protocol.h | 5 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 153 ++++++++++++++++++++----
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 4 +
6 files changed, 209 insertions(+), 50 deletions(-)
---
base-commit: 3a0504d54b3b57f0d7bf3d9184a00c9f8887f6d7
change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
0x7d and 0x7e bytes are meant to be escaped in the data portion of
frames, but this didn't occur since next_chunk_len() had an off-by-one
error. That also resulted in the final byte of a payload being written
as a separate tty write op.
The chunk prior to an escaped byte would be one byte short, and the
next call would never test the txpos+1 case, which is where the escaped
byte was located. That meant it never hit the escaping case in
mctp_serial_tx_work().
Example Input: 01 00 08 c8 7e 80 02
Previous incorrect chunks from next_chunk_len():
01 00 08
c8 7e 80
02
With this fix:
01 00 08 c8
7e
80 02
Cc: stable(a)vger.kernel.org
Fixes: a0c2ccd9b5ad ("mctp: Add MCTP-over-serial transport binding")
Signed-off-by: Matt Johnston <matt(a)codeconstruct.com.au>
---
drivers/net/mctp/mctp-serial.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mctp/mctp-serial.c b/drivers/net/mctp/mctp-serial.c
index 7a40d07ff77b..f39bbe255497 100644
--- a/drivers/net/mctp/mctp-serial.c
+++ b/drivers/net/mctp/mctp-serial.c
@@ -91,8 +91,8 @@ static int next_chunk_len(struct mctp_serial *dev)
* will be those non-escaped bytes, and does not include the escaped
* byte.
*/
- for (i = 1; i + dev->txpos + 1 < dev->txlen; i++) {
- if (needs_escape(dev->txbuf[dev->txpos + i + 1]))
+ for (i = 1; i + dev->txpos < dev->txlen; i++) {
+ if (needs_escape(dev->txbuf[dev->txpos + i]))
break;
}
This fix addresses STAR 9001285599, which only affects dwc3 core version
320a. The timer value for PM_LC_TIMER in 320a for the Link ECN changes
is incorrect. If the PM TIMER ECN is enabled via GUCTL2[19], the link
compliance test (TD7.21) may fail. If the ECN is not enabled
(GUCTL2[19] = 0), the controller will use the old timer value (5us),
which is still acceptable for the link compliance test. Therefore, clear
GUCTL2[19] to pass the USB link compliance test: TD 7.21.
Cc: stable(a)vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh(a)quicinc.com>
---
drivers/usb/dwc3/core.c | 15 +++++++++++++++
drivers/usb/dwc3/core.h | 2 ++
2 files changed, 17 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 734de2a8bd21..d0bd3a0e1f9c 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1378,6 +1378,21 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
}
+ /*
+ * STAR 9001285599: This issue affects dwc3 core version 3.20a
+ * only. If the PM TIMER ECM is enabled through GUCTL2[19], the
+ * link compliance test (TD7.21) may fail. If the ECN is not
+ * enabled (GUCTL2[19] = 0), the controller will use the old timer
+ * value (5us), which is still acceptable for the link compliance
+ * test. Therefore, do not enable PM TIMER ECM in 3.20a by
+ * setting GUCTL2[19] by default; instead, use GUCTL2[19] = 0.
+ */
+ if (DWC3_VER_IS(DWC3, 320A)) {
+ reg = dwc3_readl(dwc->regs, DWC3_GUCTL2);
+ reg &= ~DWC3_GUCTL2_LC_TIMER;
+ dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
+ }
+
/*
* When configured in HOST mode, after issuing U3/L2 exit controller
* fails to send proper CRC checksum in CRC5 feild. Because of this
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..c71240e8f7c7 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -421,6 +421,7 @@
/* Global User Control Register 2 */
#define DWC3_GUCTL2_RST_ACTBITLATER BIT(14)
+#define DWC3_GUCTL2_LC_TIMER BIT(19)
/* Global User Control Register 3 */
#define DWC3_GUCTL3_SPLITDISABLE BIT(14)
@@ -1269,6 +1270,7 @@ struct dwc3 {
#define DWC3_REVISION_290A 0x5533290a
#define DWC3_REVISION_300A 0x5533300a
#define DWC3_REVISION_310A 0x5533310a
+#define DWC3_REVISION_320A 0x5533320a
#define DWC3_REVISION_330A 0x5533330a
#define DWC31_REVISION_ANY 0x0
--
2.17.1
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[[21:18] wasn't updated to U0 yet, it's observed that the latency of
DSTS can be in order of milli-seconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
v5: Further rewording of the comment in function.
v4: Rewording the comment in function definition.
v3: Added notes on top the function definition.
v2: Refactored the patch as suggested in v1 discussion.
drivers/usb/dwc3/gadget.c | 41 ++++++++++++++++-----------------------
1 file changed, 17 insertions(+), 24 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..291bc549935b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -287,6 +287,23 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc, bool async);
*
* Caller should handle locking. This function will issue @cmd with given
* @params to @dep and wait for its completion.
+ *
+ * According to the programming guide, if the link state is in L1/L2/U3,
+ * then sending the Start Transfer command may not complete. The
+ * programming guide suggested to bring the link state back to ON/U0 by
+ * performing remote wakeup prior to sending the command. However, don't
+ * initiate remote wakeup when the user/function does not send wakeup
+ * request via wakeup ops. Send the command when it's allowed.
+ *
+ * Notes:
+ * For L1 link state, issuing a command requires the clearing of
+ * GUSB2PHYCFG.SUSPENDUSB2, which turns on the signal required to complete
+ * the given command (usually within 50us). This should happen within the
+ * command timeout set by driver. No additional step is needed.
+ *
+ * For L2 or U3 link state, the gadget is in USB suspend. Care should be
+ * taken when sending Start Transfer command to ensure that it's done after
+ * USB resume.
*/
int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
struct dwc3_gadget_ep_cmd_params *params)
@@ -327,30 +344,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
--
2.25.1
__split_huge_pmd_locked() can be called for a present THP, devmap or
(non-present) migration entry. It calls pmdp_invalidate()
unconditionally on the pmdp and only determines if it is present or not
based on the returned old pmd. This is a problem for the migration entry
case because pmd_mkinvalid(), called by pmdp_invalidate() must only be
called for a present pmd.
On arm64 at least, pmd_mkinvalid() will mark the pmd such that any
future call to pmd_present() will return true. And therefore any
lockless pgtable walker could see the migration entry pmd in this state
and start interpretting the fields as if it were present, leading to
BadThings (TM). GUP-fast appears to be one such lockless pgtable walker.
x86 does not suffer the above problem, but instead pmd_mkinvalid() will
corrupt the offset field of the swap entry within the swap pte. See link
below for discussion of that problem.
Fix all of this by only calling pmdp_invalidate() for a present pmd. And
for good measure let's add a warning to all implementations of
pmdp_invalidate[_ad](). I've manually reviewed all other
pmdp_invalidate[_ad]() call sites and believe all others to be
conformant.
This is a theoretical bug found during code review. I don't have any
test case to trigger it in practice.
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
---
Right v3; this goes back to the original approach in v1 to fix core-mm rather
than push the fix into arm64, since we discovered that x86 can't handle
pmd_mkinvalid() being called for non-present pmds either.
I'm pulling in more arch maintainers because this version adds some warnings in
arch code to help spot incorrect usage.
Although Catalin had already accepted v2 (fixing arm64) [2] into for-next/fixes,
he's agreed to either remove or revert it.
Changes since v1 [1]
====================
- Improve pmdp_mkinvalid() docs to make it clear it can only be called for
present pmd (per JohnH, Zi Yan)
- Added warnings to arch overrides of pmdp_invalidate[_ad]() (per Zi Yan)
- Moved comment next to new location of pmpd_invalidate() (per Zi Yan)
[1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.…
[2] https://lore.kernel.org/all/20240430133138.732088-1-ryan.roberts@arm.com/
Thanks,
Ryan
Documentation/mm/arch_pgtable_helpers.rst | 6 ++-
arch/powerpc/mm/book3s64/pgtable.c | 1 +
arch/s390/include/asm/pgtable.h | 4 +-
arch/sparc/mm/tlb.c | 1 +
arch/x86/mm/pgtable.c | 2 +
mm/huge_memory.c | 49 ++++++++++++-----------
mm/pgtable-generic.c | 2 +
7 files changed, 39 insertions(+), 26 deletions(-)
diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst
index 2466d3363af7..ad50ca6f495e 100644
--- a/Documentation/mm/arch_pgtable_helpers.rst
+++ b/Documentation/mm/arch_pgtable_helpers.rst
@@ -140,7 +140,8 @@ PMD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pmd_swp_clear_soft_dirty | Clears a soft dirty swapped PMD |
+---------------------------+--------------------------------------------------+
-| pmd_mkinvalid | Invalidates a mapped PMD [1] |
+| pmd_mkinvalid | Invalidates a present PMD; do not call for |
+| | non-present PMD [1] |
+---------------------------+--------------------------------------------------+
| pmd_set_huge | Creates a PMD huge mapping |
+---------------------------+--------------------------------------------------+
@@ -196,7 +197,8 @@ PUD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pud_mkdevmap | Creates a ZONE_DEVICE mapped PUD |
+---------------------------+--------------------------------------------------+
-| pud_mkinvalid | Invalidates a mapped PUD [1] |
+| pud_mkinvalid | Invalidates a present PUD; do not call for |
+| | non-present PUD [1] |
+---------------------------+--------------------------------------------------+
| pud_set_huge | Creates a PUD huge mapping |
+---------------------------+--------------------------------------------------+
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 83823db3488b..2975ea0841ba 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -170,6 +170,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
{
unsigned long old_pmd;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return __pmd(old_pmd);
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 60950e7a25f5..480bea44559d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1768,8 +1768,10 @@ static inline pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
static inline pmd_t pmdp_invalidate(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
- pmd_t pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
+ pmd_t pmd;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+ pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
return pmdp_xchg_direct(vma->vm_mm, addr, pmdp, pmd);
}
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index b44d79d778c7..ef69127d7e5e 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -249,6 +249,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
{
pmd_t old, entry;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
entry = __pmd(pmd_val(*pmdp) & ~_PAGE_VALID);
old = pmdp_establish(vma, address, pmdp, entry);
flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index d007591b8059..103cbccf1d7d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -631,6 +631,8 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+
/*
* No flush is necessary. Once an invalid PTE is established, the PTE's
* access and dirty bits cannot be updated.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 89f58c7603b2..dd1fc105f70b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2493,32 +2493,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
return __split_huge_zero_page_pmd(vma, haddr, pmd);
}
- /*
- * Up to this point the pmd is present and huge and userland has the
- * whole access to the hugepage during the split (which happens in
- * place). If we overwrite the pmd with the not-huge version pointing
- * to the pte here (which of course we could if all CPUs were bug
- * free), userland could trigger a small page size TLB miss on the
- * small sized TLB while the hugepage TLB entry is still established in
- * the huge TLB. Some CPU doesn't like that.
- * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
- * 383 on page 105. Intel should be safe but is also warns that it's
- * only safe if the permission and cache attributes of the two entries
- * loaded in the two TLB is identical (which should be the case here).
- * But it is generally safer to never allow small and huge TLB entries
- * for the same virtual address to be loaded simultaneously. So instead
- * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
- * current pmd notpresent (atomically because here the pmd_trans_huge
- * must remain set at all times on the pmd until the split is complete
- * for this pmd), then we flush the SMP TLB and finally we write the
- * non-huge version of the pmd entry with pmd_populate.
- */
- old_pmd = pmdp_invalidate(vma, haddr, pmd);
-
- pmd_migration = is_pmd_migration_entry(old_pmd);
+ pmd_migration = is_pmd_migration_entry(*pmd);
if (unlikely(pmd_migration)) {
swp_entry_t entry;
+ old_pmd = *pmd;
entry = pmd_to_swp_entry(old_pmd);
page = pfn_swap_entry_to_page(entry);
write = is_writable_migration_entry(entry);
@@ -2529,6 +2508,30 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
soft_dirty = pmd_swp_soft_dirty(old_pmd);
uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
+ /*
+ * Up to this point the pmd is present and huge and userland has
+ * the whole access to the hugepage during the split (which
+ * happens in place). If we overwrite the pmd with the not-huge
+ * version pointing to the pte here (which of course we could if
+ * all CPUs were bug free), userland could trigger a small page
+ * size TLB miss on the small sized TLB while the hugepage TLB
+ * entry is still established in the huge TLB. Some CPU doesn't
+ * like that. See
+ * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
+ * 383 on page 105. Intel should be safe but is also warns that
+ * it's only safe if the permission and cache attributes of the
+ * two entries loaded in the two TLB is identical (which should
+ * be the case here). But it is generally safer to never allow
+ * small and huge TLB entries for the same virtual address to be
+ * loaded simultaneously. So instead of doing "pmd_populate();
+ * flush_pmd_tlb_range();" we first mark the current pmd
+ * notpresent (atomically because here the pmd_trans_huge must
+ * remain set at all times on the pmd until the split is
+ * complete for this pmd), then we flush the SMP TLB and finally
+ * we write the non-huge version of the pmd entry with
+ * pmd_populate.
+ */
+ old_pmd = pmdp_invalidate(vma, haddr, pmd);
page = pmd_page(old_pmd);
folio = page_folio(page);
if (pmd_dirty(old_pmd)) {
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 4fcd959dcc4d..a78a4adf711a 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -198,6 +198,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return old;
@@ -208,6 +209,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
return pmdp_invalidate(vma, address, pmdp);
}
#endif
--
2.25.1
Boards based on the same SoC family can use different boot loaders.
These may pass numeric arguments which we erroneously interpret as
command line or environment pointers. Such errors will cause boot
to halt at an early stage since commit 056a68cea01e ("mips: allow
firmware to pass RNG seed to kernel").
One known example of this issue is a HPE switch using a BootWare
boot loader. It was found to pass these arguments to the kernel:
0x00020000 0x00060000 0xfffdffff 0x0000416c
We can avoid hanging by validating that both passed pointers are in
KSEG1 as expected.
Cc: stable(a)vger.kernel.org
Fixes: 14aecdd41921 ("MIPS: FW: Add environment variable processing.")
Signed-off-by: Bjørn Mork <bjorn(a)mork.no>
---
arch/mips/fw/lib/cmdline.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/mips/fw/lib/cmdline.c b/arch/mips/fw/lib/cmdline.c
index 892765b742bb..51238c4f9455 100644
--- a/arch/mips/fw/lib/cmdline.c
+++ b/arch/mips/fw/lib/cmdline.c
@@ -22,7 +22,7 @@ void __init fw_init_cmdline(void)
int i;
/* Validate command line parameters. */
- if ((fw_arg0 >= CKSEG0) || (fw_arg1 < CKSEG0)) {
+ if (fw_arg0 >= CKSEG0 || fw_arg1 < CKSEG0 || fw_arg1 >= CKSEG2) {
fw_argc = 0;
_fw_argv = NULL;
} else {
@@ -31,7 +31,7 @@ void __init fw_init_cmdline(void)
}
/* Validate environment pointer. */
- if (fw_arg2 < CKSEG0)
+ if (fw_arg2 < CKSEG0 || fw_arg2 >= CKSEG2)
_fw_envp = NULL;
else
_fw_envp = (int *)fw_arg2;
--
2.39.2
The extent changeset may have some additional memory dynamically allocated
for ulist in result of clear_record_extent_bits() execution.
Release it after the local changeset is no longer needed in
BTRFS_QGROUP_MODE_DISABLED case.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Reported-by: syzbot+81670362c283f3dd889c(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000aa8c0c060ade165e@google.com
Fixes: af0e2aab3b70 ("btrfs: qgroup: flush reservations during quota disable")
Cc: stable(a)vger.kernel.org # 6.10+
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
fs/btrfs/qgroup.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 5d57a285d59b..4f1fa5d427e1 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -4345,9 +4345,10 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode,
if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED) {
extent_changeset_init(&changeset);
- return clear_record_extent_bits(&inode->io_tree, start,
- start + len - 1,
- EXTENT_QGROUP_RESERVED, &changeset);
+ ret = clear_record_extent_bits(&inode->io_tree, start,
+ start + len - 1,
+ EXTENT_QGROUP_RESERVED, &changeset);
+ goto out;
}
/* In release case, we shouldn't have @reserved */
--
2.39.2
Most firmware names are hardcoded strings, or are constructed from fairly
constrained format strings where the dynamic parts are just some hex
numbers or such.
However, there are a couple codepaths in the kernel where firmware file
names contain string components that are passed through from a device or
semi-privileged userspace; the ones I could find (not counting interfaces
that require root privileges) are:
- lpfc_sli4_request_firmware_update() seems to construct the firmware
filename from "ModelName", a string that was previously parsed out of
some descriptor ("Vital Product Data") in lpfc_fill_vpd()
- nfp_net_fw_find() seems to construct a firmware filename from a model
name coming from nfp_hwinfo_lookup(pf->hwinfo, "nffw.partno"), which I
think parses some descriptor that was read from the device.
(But this case likely isn't exploitable because the format string looks
like "netronome/nic_%s", and there shouldn't be any *folders* starting
with "netronome/nic_". The previous case was different because there,
the "%s" is *at the start* of the format string.)
- module_flash_fw_schedule() is reachable from the
ETHTOOL_MSG_MODULE_FW_FLASH_ACT netlink command, which is marked as
GENL_UNS_ADMIN_PERM (meaning CAP_NET_ADMIN inside a user namespace is
enough to pass the privilege check), and takes a userspace-provided
firmware name.
(But I think to reach this case, you need to have CAP_NET_ADMIN over a
network namespace that a special kind of ethernet device is mapped into,
so I think this is not a viable attack path in practice.)
Fix it by rejecting any firmware names containing ".." path components.
For what it's worth, I went looking and haven't found any USB device
drivers that use the firmware loader dangerously.
Cc: stable(a)vger.kernel.org
Reviewed-by: Danilo Krummrich <dakr(a)kernel.org>
Fixes: abb139e75c2c ("firmware: teach the kernel to load firmware files directly from the filesystem")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Changes in v3:
- replace name_contains_dotdot implementation (Danilo)
- add missing \n in log format string (Danilo)
- Link to v2: https://lore.kernel.org/r/20240823-firmware-traversal-v2-1-880082882709@goo…
Changes in v2:
- describe fix in commit message (dakr)
- write check more clearly and with comment in separate helper (dakr)
- document new restriction in comment above request_firmware() (dakr)
- warn when new restriction is triggered
- Link to v1: https://lore.kernel.org/r/20240820-firmware-traversal-v1-1-8699ffaa9276@goo…
---
drivers/base/firmware_loader/main.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index a03ee4b11134..324a9a3c087a 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -849,6 +849,26 @@ static void fw_log_firmware_info(const struct firmware *fw, const char *name,
{}
#endif
+/*
+ * Reject firmware file names with ".." path components.
+ * There are drivers that construct firmware file names from device-supplied
+ * strings, and we don't want some device to be able to tell us "I would like to
+ * be sent my firmware from ../../../etc/shadow, please".
+ *
+ * Search for ".." surrounded by either '/' or start/end of string.
+ *
+ * This intentionally only looks at the firmware name, not at the firmware base
+ * directory or at symlink contents.
+ */
+static bool name_contains_dotdot(const char *name)
+{
+ size_t name_len = strlen(name);
+
+ return strcmp(name, "..") == 0 || strncmp(name, "../", 3) == 0 ||
+ strstr(name, "/../") != NULL ||
+ (name_len >= 3 && strcmp(name+name_len-3, "/..") == 0);
+}
+
/* called from request_firmware() and request_firmware_work_func() */
static int
_request_firmware(const struct firmware **firmware_p, const char *name,
@@ -869,6 +889,14 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
goto out;
}
+ if (name_contains_dotdot(name)) {
+ dev_warn(device,
+ "Firmware load for '%s' refused, path contains '..' component\n",
+ name);
+ ret = -EINVAL;
+ goto out;
+ }
+
ret = _request_firmware_prepare(&fw, name, device, buf, size,
offset, opt_flags);
if (ret <= 0) /* error or already assigned */
@@ -946,6 +974,8 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
* @name will be used as $FIRMWARE in the uevent environment and
* should be distinctive enough not to be confused with any other
* firmware image for this or any other device.
+ * It must not contain any ".." path components - "foo/bar..bin" is
+ * allowed, but "foo/../bar.bin" is not.
*
* Caller must hold the reference count of @device.
*
---
base-commit: b0da640826ba3b6506b4996a6b23a429235e6923
change-id: 20240820-firmware-traversal-6df8501b0fe4
--
Jann Horn <jannh(a)google.com>
The __vmap_pages_range_noflush() assumes its argument pages** contains
pages with the same page shift. However, since commit e9c3cda4d86e
("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags
includes __GFP_NOFAIL with high order in vm_area_alloc_pages()
and page allocation failed for high order, the pages** may contain
two different page shifts (high order and order-0). This could
lead __vmap_pages_range_noflush() to perform incorrect mappings,
potentially resulting in memory corruption.
Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE):
kvmalloc(2M, __GFP_NOFAIL|GFP_X)
__vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
vmap_pages_range()
vmap_pages_range_noflush()
__vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
We can remove the fallback code because if a high-order
allocation fails, __vmalloc_node_range_noprof() will retry with
order-0. Therefore, it is unnecessary to fallback to order-0
here. Therefore, fix this by removing the fallback code.
Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu <hailong.liu(a)oppo.com>
Reported-by: Tangquan Zheng <zhengtangquan(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
CC: Barry Song <21cnbao(a)gmail.com>
CC: Baoquan He <bhe(a)redhat.com>
CC: Matthew Wilcox <willy(a)infradead.org>
---
mm/vmalloc.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6b783baf12a1..af2de36549d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
page = alloc_pages_noprof(alloc_gfp, order);
else
page = alloc_pages_node_noprof(nid, alloc_gfp, order);
- if (unlikely(!page)) {
- if (!nofail)
- break;
-
- /* fall back to the zero order allocations */
- alloc_gfp |= __GFP_NOFAIL;
- order = 0;
- continue;
- }
+ if (unlikely(!page))
+ break;
/*
* Higher order allocations must be able to be treated as
---
Sorry for fat fingers. with .rej file. resend this.
Baoquan suggests set page_shift to 0 if fallback in (2 and concern about
performance of retry with order-0. But IMO with retry,
- Save memory usage if high order allocation failed.
- Keep consistancy with align and page-shift.
- make use of bulk allocator with order-0
[2] https://lore.kernel.org/lkml/20240725035318.471-1-hailong.liu@oppo.com/
--
2.30.0
Hi, all
Recently I noticed a bug[1] in btrfs, after digged it into
and I believe it'a race in vfs.
Let's assume there's a inode (ie ino 261) with i_count 1 is
called by iput(), and there's a concurrent thread calling
generic_shutdown_super().
cpu0: cpu1:
iput() // i_count is 1
->spin_lock(inode)
->dec i_count to 0
->iput_final() generic_shutdown_super()
->__inode_add_lru() ->evict_inodes()
// cause some reason[2] ->if (atomic_read(inode->i_count)) continue;
// return before // inode 261 passed the above check
// list_lru_add_obj() // and then schedule out
->spin_unlock()
// note here: the inode 261
// was still at sb list and hash list,
// and I_FREEING|I_WILL_FREE was not been set
btrfs_iget()
// after some function calls
->find_inode()
// found the above inode 261
->spin_lock(inode)
// check I_FREEING|I_WILL_FREE
// and passed
->__iget()
->spin_unlock(inode) // schedule back
->spin_lock(inode)
// check (I_NEW|I_FREEING|I_WILL_FREE) flags,
// passed and set I_FREEING
iput() ->spin_unlock(inode)
->spin_lock(inode) ->evict()
// dec i_count to 0
->iput_final()
->spin_unlock()
->evict()
Now, we have two threads simultaneously evicting
the same inode, which may trigger the BUG(inode->i_state & I_CLEAR)
statement both within clear_inode() and iput().
To fix the bug, recheck the inode->i_count after holding i_lock.
Because in the most scenarios, the first check is valid, and
the overhead of spin_lock() can be reduced.
If there is any misunderstanding, please let me know, thanks.
[1]: https://lore.kernel.org/linux-btrfs/000000000000eabe1d0619c48986@google.com/
[2]: The reason might be 1. SB_ACTIVE was removed or 2. mapping_shrinkable()
return false when I reproduced the bug.
Reported-by: syzbot+67ba3c42bcbb4665d3ad(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad
CC: stable(a)vger.kernel.org
Fixes: 63997e98a3be ("split invalidate_inodes()")
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/inode.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/inode.c b/fs/inode.c
index 3a41f83a4ba5..011f630777d0 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -723,6 +723,10 @@ void evict_inodes(struct super_block *sb)
continue;
spin_lock(&inode->i_lock);
+ if (atomic_read(&inode->i_count)) {
+ spin_unlock(&inode->i_lock);
+ continue;
+ }
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
spin_unlock(&inode->i_lock);
continue;
--
2.39.2
Hi, all.
Recently syzbot reported a bug as following:
kernel BUG at fs/f2fs/inode.c:896!
CPU: 1 UID: 0 PID: 5217 Comm: syz-executor605 Not tainted 6.11.0-rc4-syzkaller-00033-g872cf28b8df9 #0
RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896
Call Trace:
<TASK>
evict+0x532/0x950 fs/inode.c:704
dispose_list fs/inode.c:747 [inline]
evict_inodes+0x5f9/0x690 fs/inode.c:797
generic_shutdown_super+0x9d/0x2d0 fs/super.c:627
kill_block_super+0x44/0x90 fs/super.c:1696
kill_f2fs_super+0x344/0x690 fs/f2fs/super.c:4898
deactivate_locked_super+0xc4/0x130 fs/super.c:473
cleanup_mnt+0x41f/0x4b0 fs/namespace.c:1373
task_work_run+0x24f/0x310 kernel/task_work.c:228
ptrace_notify+0x2d2/0x380 kernel/signal.c:2402
ptrace_report_syscall include/linux/ptrace.h:415 [inline]
ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline]
syscall_exit_work+0xc6/0x190 kernel/entry/common.c:173
syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
syscall_exit_to_user_mode+0x279/0x370 kernel/entry/common.c:218
do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The syzbot constructed the following scenario: concurrently
creating directories and setting the file system to read-only.
In this case, while f2fs was making dir, the filesystem switched to
readonly, and when it tried to clear the dirty flag, it triggered this
code path: f2fs_mkdir()-> f2fs_sync_fs()->f2fs_write_checkpoint()
->f2fs_readonly(). This resulted FI_DIRTY_INODE flag not being cleared,
which eventually led to a bug being triggered during the FI_DIRTY_INODE
check in f2fs_evict_inode().
In this case, we cannot do anything further, so if filesystem is readonly,
do not trigger the BUG. Instead, clean up resources to the best of our
ability to prevent triggering subsequent resource leak checks.
If there is anything important I'm missing, please let me know, thanks.
Reported-by: syzbot+ebea2790904673d7c618(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ebea2790904673d7c618
Fixes: ca7d802a7d8e ("f2fs: detect dirty inode in evict_inode")
CC: stable(a)vger.kernel.org
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/f2fs/inode.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index aef57172014f..52d273383ec2 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -892,8 +892,12 @@ void f2fs_evict_inode(struct inode *inode)
atomic_read(&fi->i_compr_blocks));
if (likely(!f2fs_cp_error(sbi) &&
- !is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
- f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
+ !is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
+ if (!f2fs_readonly(sbi->sb))
+ f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
+ else
+ f2fs_inode_synced(inode);
+ }
else
f2fs_inode_synced(inode);
--
2.39.2
From: Jan Kiszka <jan.kiszka(a)siemens.com>
By simply bailing out, the driver was violating its rule and internal
assumptions that either both or no rproc should be initialized. E.g.,
this could cause the first core to be available but not the second one,
leading to crashes on its shutdown later on while trying to dereference
that second instance.
Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up before powering up core1")
Signed-off-by: Jan Kiszka <jan.kiszka(a)siemens.com>
---
drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 39a47540c590..eb09d2e9b32a 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
dev_err(dev,
"Timed out waiting for %s core to power up!\n",
rproc->name);
- return ret;
+ goto err_powerup;
}
}
@@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
}
}
+err_powerup:
rproc_del(rproc);
err_add:
k3_r5_reserved_mem_exit(kproc);
--
2.43.0
The TDG_VM_WR TDCALL is used to ask the TDX module to change some
TD-specific VM configuration. There is currently only one user in the
kernel of this TDCALL leaf. More will be added shortly.
Refactor to make way for more users of TDG_VM_WR who will need to modify
other TD configuration values.
Add a wrapper for the TDG_VM_RD TDCALL that requests TD-specific
metadata from the TDX module. There are currently no users for
TDG_VM_RD. Mark it as __maybe_unused until the first user appears.
This is preparation for enumeration and enabling optional TD features.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 32 ++++++++++++++++++++++++++-----
arch/x86/include/asm/shared/tdx.h | 1 +
2 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 078e2bac2553..64717a96a936 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -77,6 +77,32 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
panic("TDCALL %lld failed (Buggy TDX module!)\n", fn);
}
+/* Read TD-scoped metadata */
+static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value)
+{
+ struct tdx_module_args args = {
+ .rdx = field,
+ };
+ u64 ret;
+
+ ret = __tdcall_ret(TDG_VM_RD, &args);
+ *value = args.r8;
+
+ return ret;
+}
+
+/* Write TD-scoped metadata */
+static inline u64 tdg_vm_wr(u64 field, u64 value, u64 mask)
+{
+ struct tdx_module_args args = {
+ .rdx = field,
+ .r8 = value,
+ .r9 = mask,
+ };
+
+ return __tdcall(TDG_VM_WR, &args);
+}
+
/**
* tdx_mcall_get_report0() - Wrapper to get TDREPORT0 (a.k.a. TDREPORT
* subtype 0) using TDG.MR.REPORT TDCALL.
@@ -924,10 +950,6 @@ static void tdx_kexec_finish(void)
void __init tdx_early_init(void)
{
- struct tdx_module_args args = {
- .rdx = TDCS_NOTIFY_ENABLES,
- .r9 = -1ULL,
- };
u64 cc_mask;
u32 eax, sig[3];
@@ -946,7 +968,7 @@ void __init tdx_early_init(void)
cc_set_mask(cc_mask);
/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdcall(TDG_VM_WR, &args);
+ tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
/*
* All bits above GPA width are reserved and kernel treats shared bit
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index fdfd41511b02..7e12cfa28bec 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -16,6 +16,7 @@
#define TDG_VP_VEINFO_GET 3
#define TDG_MR_REPORT 4
#define TDG_MEM_PAGE_ACCEPT 6
+#define TDG_VM_RD 7
#define TDG_VM_WR 8
/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
--
2.45.2
Anirudh Rayabharam <anirudh(a)anirudhrb.com> writes:
> On Mon, Aug 26, 2024 at 02:36:44PM +0200, Vitaly Kuznetsov wrote:
>> Anirudh Rayabharam <anirudh(a)anirudhrb.com> writes:
>>
>> > From: Anirudh Rayabharam (Microsoft) <anirudh(a)anirudhrb.com>
>> >
>> > 9636be85cc5b ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go
>> > online/offline") introduces a new cpuhp state for hyperv initialization.
>> >
>> > cpuhp_setup_state() returns the state number if state is CPUHP_AP_ONLINE_DYN
>> > or CPUHP_BP_PREPARE_DYN and 0 for all other states. For the hyperv case,
>> > since a new cpuhp state was introduced it would return 0. However,
>> > in hv_machine_shutdown(), the cpuhp_remove_state() call is conditioned upon
>> > "hyperv_init_cpuhp > 0". This will never be true and so hv_cpu_die() won't be
>> > called on all CPUs. This means the VP assist page won't be reset. When the
>> > kexec kernel tries to setup the VP assist page again, the hypervisor corrupts
>> > the memory region of the old VP assist page causing a panic in case the kexec
>> > kernel is using that memory elsewhere. This was originally fixed in dfe94d4086e4
>> > ("x86/hyperv: Fix kexec panic/hang issues").
>> >
>> > Set hyperv_init_cpuhp to CPUHP_AP_HYPERV_ONLINE upon successful setup so that
>> > the hyperv cpuhp state is removed correctly on kexec and the necessary cleanup
>> > takes place.
>> >
>> > Cc: stable(a)vger.kernel.org
>> > Fixes: 9636be85cc5b ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline")
>> > Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh(a)anirudhrb.com>
>> > ---
>> > arch/x86/hyperv/hv_init.c | 4 ++--
>> > 1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> > index 17a71e92a343..81d1981a75d1 100644
>> > --- a/arch/x86/hyperv/hv_init.c
>> > +++ b/arch/x86/hyperv/hv_init.c
>> > @@ -607,7 +607,7 @@ void __init hyperv_init(void)
>> >
>> > register_syscore_ops(&hv_syscore_ops);
>> >
>> > - hyperv_init_cpuhp = cpuhp;
>> > + hyperv_init_cpuhp = CPUHP_AP_HYPERV_ONLINE;
>>
>> Do we really need 'hyperv_init_cpuhp' at all? I.e. post-change (which
>> LGTM btw), I can only see one usage in hv_machine_shutdown():
>>
>> if (kexec_in_progress && hyperv_init_cpuhp > 0)
>> cpuhp_remove_state(hyperv_init_cpuhp);
>>
>> and I'm wondering if the 'hyperv_init_cpuhp' check is really
>> needed. This only case where this check would fail is if we're crashing
>> in between ms_hyperv_init_platform() and hyperv_init() afaiu. Does it
>
> Or if we fail to setup the cpuhp state for some reason but don't
> actually crash and then later do a kexec?
I see this can happen for CPUHP_AP_ONLINE_DYN/CPUHP_BP_PREPARE_DYN
because we run out of free slots (40/20), but here we have our own
dedicated CPUHP_AP_HYPERV_ONLINE and other failure paths seem to be
exotic...
>
> I guess I was just trying to be extra safe and make sure we have
> actually setup the cpuhp state before calling cpuhp_remove_state()
> for it. However, looking elsewhere in the kernel code I don't
> see anybody doing this for custom states...
>
>> hurt if we try cpuhp_remove_state() anyway?
>
> cpuhp_invoke_callback() would trigger a WARNING if we try to remove a
> cpuhp state that was never setup.
>
> 184 if (cpuhp_step_empty(bringup, step)) {
> 185 WARN_ON_ONCE(1);
> 186 return 0;
> 187 }
>
Personally, I'd say that getting an extra WARN for such a corner case
(failing to setup cpuhp state or crashing in between
ms_hyperv_init_platform() and hyperv_init()) is OK.
Alternatively, we can convert hyperv_init_cpuhp to a boolean to make it
a bit more staitforward but as it's uncomon to do it for other states,
it's likely an overkill.
--
Vitaly
6.9 moved client RPC calls to namespace in "Make nfs stats visible in
network NS" patchet.
https://lore.kernel.org/linux-nfs/cover.1708026931.git.josef@toxicpanda.com/
Signed-off-by: Petr Vorel <pvorel(a)suse.cz>
---
Changes v1->v2:
* Point out whole patchset, not just single commit
* Add a comment about the patchset
Hi all,
could you please ack this so that we have fixed mainline?
FYI Some parts has been backported, e.g.:
d47151b79e322 ("nfs: expose /proc/net/sunrpc/nfs in net namespaces")
to all stable/LTS: 5.4.276, 5.10.217, 5.15.159, 6.1.91, 6.6.31.
But most of that is not yet (but planned to be backported), e.g.
93483ac5fec62 ("nfsd: expose /proc/net/sunrpc/nfsd in net namespaces")
see Chuck's patchset for 6.6
https://lore.kernel.org/linux-nfs/20240812223604.32592-1-cel@kernel.org/
Once all kernels up to 5.4 fixed we should update the version.
Kind regards,
Petr
testcases/network/nfs/nfsstat01/nfsstat01.sh | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/testcases/network/nfs/nfsstat01/nfsstat01.sh b/testcases/network/nfs/nfsstat01/nfsstat01.sh
index c2856eff1f..1beecbec43 100755
--- a/testcases/network/nfs/nfsstat01/nfsstat01.sh
+++ b/testcases/network/nfs/nfsstat01/nfsstat01.sh
@@ -15,7 +15,14 @@ get_calls()
local calls opt
[ "$name" = "rpc" ] && opt="r" || opt="n"
- ! tst_net_use_netns && [ "$nfs_f" != "nfs" ] && type="rhost"
+
+ if tst_net_use_netns; then
+ # "Make nfs stats visible in network NS" patchet
+ # https://lore.kernel.org/linux-nfs/cover.1708026931.git.josef@toxicpanda.com/
+ tst_kvcmp -ge "6.9" && [ "$nfs_f" = "nfs" ] && type="rhost"
+ else
+ [ "$nfs_f" != "nfs" ] && type="rhost"
+ fi
if [ "$type" = "lhost" ]; then
calls="$(grep $name /proc/net/rpc/$nfs_f | cut -d' ' -f$field)"
--
2.45.2
To check if mmc cqe is in halt state, need to check set/clear of CQHCI_HALT
bit. At this time, we need to check with &, not &&.
Fixes: 0653300224a6 ("mmc: cqhci: rename cqhci.c to cqhci-core.c")
Cc: stable(a)vger.kernel.org
Signed-off-by: Seunghwan Baek <sh8267.baek(a)samsung.com>
---
drivers/mmc/host/cqhci-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index c14d7251d0bb..a02da26a1efd 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -617,7 +617,7 @@ static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
cqhci_writel(cq_host, 0, CQHCI_CTL);
mmc->cqe_on = true;
pr_debug("%s: cqhci: CQE on\n", mmc_hostname(mmc));
- if (cqhci_readl(cq_host, CQHCI_CTL) && CQHCI_HALT) {
+ if (cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT) {
pr_err("%s: cqhci: CQE failed to exit halt state\n",
mmc_hostname(mmc));
}
--
2.17.1
If an interrupt occurs in queued_spin_lock_slowpath() after we increment
qnodesp->count and before node->lock is initialized, another CPU might
see stale lock values in get_tail_qnode(). If the stale lock value happens
to match the lock on that CPU, then we write to the "next" pointer of
the wrong qnode. This causes a deadlock as the former CPU, once it becomes
the head of the MCS queue, will spin indefinitely until it's "next" pointer
is set by its successor in the queue. This results in lockups similar to
the following.
watchdog: CPU 15 Hard LOCKUP
......
NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490
LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
Call Trace:
0xc000002cfffa3bf0 (unreliable)
_raw_spin_lock+0x6c/0x90
raw_spin_rq_lock_nested.part.135+0x4c/0xd0
sched_ttwu_pending+0x60/0x1f0
__flush_smp_call_function_queue+0x1dc/0x670
smp_ipi_demux_relaxed+0xa4/0x100
xive_muxed_ipi_action+0x20/0x40
__handle_irq_event_percpu+0x80/0x240
handle_irq_event_percpu+0x2c/0x80
handle_percpu_irq+0x84/0xd0
generic_handle_irq+0x54/0x80
__do_irq+0xac/0x210
__do_IRQ+0x74/0xd0
0x0
do_IRQ+0x8c/0x170
hardware_interrupt_common_virt+0x29c/0x2a0
--- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490
......
NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490
LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
--- interrupt: 500
0xc0000029c1a41d00 (unreliable)
_raw_spin_lock+0x6c/0x90
futex_wake+0x100/0x260
do_futex+0x21c/0x2a0
sys_futex+0x98/0x270
system_call_exception+0x14c/0x2f0
system_call_vectored_common+0x15c/0x2ec
The following code flow illustrates how the deadlock occurs:
CPU0 CPU1
---- ----
spin_lock_irqsave(A) |
spin_unlock_irqrestore(A) |
spin_lock(B) |
| |
▼ |
id = qnodesp->count++; |
(Note that nodes[0].lock == A) |
| |
▼ |
Interrupt |
(happens before "nodes[0].lock = B") |
| |
▼ |
spin_lock_irqsave(A) |
| |
▼ |
id = qnodesp->count++ |
nodes[1].lock = A |
| |
▼ |
Tail of MCS queue |
| spin_lock_irqsave(A)
▼ |
Head of MCS queue ▼
| CPU0 is previous tail
▼ |
Spin indefinitely ▼
(until "nodes[1].next != NULL") prev = get_tail_qnode(A, CPU0)
|
▼
prev == &qnodes[CPU0].nodes[0]
(as qnodes[CPU0].nodes[0].lock == A)
|
▼
WRITE_ONCE(prev->next, node)
|
▼
Spin indefinitely
(until nodes[0].locked == 1)
Thanks to Saket Kumar Bhaskar for help with recreating the issue
Fixes: 84990b169557 ("powerpc/qspinlock: add mcs queueing for contended waiters")
Cc: stable(a)vger.kernel.org # v6.2+
Reported-by: Geetika Moolchandani <geetika(a)linux.ibm.com>
Reported-by: Vaishnavi Bhat <vaish123(a)in.ibm.com>
Reported-by: Jijo Varghese <vargjijo(a)in.ibm.com>
Signed-off-by: Nysal Jan K.A. <nysal(a)linux.ibm.com>
---
arch/powerpc/lib/qspinlock.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 5de4dd549f6e..59861c665cef 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -697,6 +697,12 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
}
release:
+ /*
+ * Clear the lock, as another CPU might see stale values if an
+ * interrupt occurs after we increment qnodesp->count but before
+ * node->lock is initialized
+ */
+ node->lock = NULL;
qnodesp->count--; /* release the node */
}
--
2.46.0
This commit addresses an issue where the USB core could access an
invalid event buffer address during runtime suspend, potentially causing
SMMU faults and other memory issues in Exynos platforms. The problem
arises from the following sequence.
1. In dwc3_gadget_suspend, there is a chance of a timeout when
moving the USB core to the halt state after clearing the
run/stop bit by software.
2. In dwc3_core_exit, the event buffer is cleared regardless of
the USB core's status, which may lead to an SMMU faults and
other memory issues. if the USB core tries to access the event
buffer address.
To prevent this hardware quirk on Exynos platforms, this commit ensures
that the event buffer address is not cleared by software when the USB
core is active during runtime suspend by checking its status before
clearing the buffer address.
Cc: stable(a)vger.kernel.org # v6.1+
Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com>
---
Changes in v3:
- Added comment on why we need this fix.
- Included platform name in commit message.
- Removed Fixes tag as no issue on the previous commits, and updated Cc tag.
- Link to v2: https://lore.kernel.org/lkml/20240808120507.1464-1-selvarasu.g@samsung.com/
Changes in v2:
- Added separate check for USB controller status before cleaning the
event buffer.
- Link to v1: https://lore.kernel.org/lkml/20240722145617.537-1-selvarasu.g@samsung.com/
---
drivers/usb/dwc3/core.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 734de2a8bd21..ccc3895dbd7f 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -564,9 +564,17 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
void dwc3_event_buffers_cleanup(struct dwc3 *dwc)
{
struct dwc3_event_buffer *evt;
+ u32 reg;
if (!dwc->ev_buf)
return;
+ /*
+ * Exynos platforms may not be able to access event buffer if the
+ * controller failed to halt on dwc3_core_exit().
+ */
+ reg = dwc3_readl(dwc->regs, DWC3_DSTS);
+ if (!(reg & DWC3_DSTS_DEVCTRLHLT))
+ return;
evt = dwc->ev_buf;
--
2.17.1
The patch titled
Subject: codetag: debug: mark codetags for poisoned page as empty
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
codetag-debug-mark-codetags-for-poisoned-page-as-empty.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Hao Ge <gehao(a)kylinos.cn>
Subject: codetag: debug: mark codetags for poisoned page as empty
Date: Mon, 26 Aug 2024 00:36:49 +0800
When PG_hwpoison pages are freed they are treated differently in
free_pages_prepare() and instead of being released they are isolated.
Page allocation tag counters are decremented at this point since the page
is considered not in use. Later on when such pages are released by
unpoison_memory(), the allocation tag counters will be decremented again
and the following warning gets reported:
[ 113.930443][ T3282] ------------[ cut here ]------------
[ 113.931105][ T3282] alloc_tag was not set
[ 113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164
[ 113.932866][ T3282] Modules linked in: hwpoison_inject fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4
[ 113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 Kdump: loaded Tainted: G W 6.11.0-rc4-dirty #18
[ 113.943003][ T3282] Tainted: [W]=WARN
[ 113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[ 113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164
[ 113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164
[ 113.946706][ T3282] sp : ffff800087093a10
[ 113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 x27: ffff80008249f0a0
[ 113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 x24: 0000000000000000
[ 113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000
[ 113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 x18: ffffffffffffffff
[ 113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081746210
[ 113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 x12: 5b5d353031313339
[ 113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d x9 : 00000000ffffffd0
[ 113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 x6 : c0000000ffff7fff
[ 113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 x3 : 0000000000000001
[ 113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 x0 : 0000000000000000
[ 113.957962][ T3282] Call trace:
[ 113.958350][ T3282] pgalloc_tag_sub.part.66+0x154/0x164
[ 113.959000][ T3282] pgalloc_tag_sub+0x14/0x1c
[ 113.959539][ T3282] free_unref_page+0xf4/0x4b8
[ 113.960096][ T3282] __folio_put+0xd4/0x120
[ 113.960614][ T3282] folio_put+0x24/0x50
[ 113.961103][ T3282] unpoison_memory+0x4f0/0x5b0
[ 113.961678][ T3282] hwpoison_unpoison+0x30/0x48 [hwpoison_inject]
[ 113.962436][ T3282] simple_attr_write_xsigned.isra.34+0xec/0x1cc
[ 113.963183][ T3282] simple_attr_write+0x38/0x48
[ 113.963750][ T3282] debugfs_attr_write+0x54/0x80
[ 113.964330][ T3282] full_proxy_write+0x68/0x98
[ 113.964880][ T3282] vfs_write+0xdc/0x4d0
[ 113.965372][ T3282] ksys_write+0x78/0x100
[ 113.965875][ T3282] __arm64_sys_write+0x24/0x30
[ 113.966440][ T3282] invoke_syscall+0x7c/0x104
[ 113.966984][ T3282] el0_svc_common.constprop.1+0x88/0x104
[ 113.967652][ T3282] do_el0_svc+0x2c/0x38
[ 113.968893][ T3282] el0_svc+0x3c/0x1b8
[ 113.969379][ T3282] el0t_64_sync_handler+0x98/0xbc
[ 113.969980][ T3282] el0t_64_sync+0x19c/0x1a0
[ 113.970511][ T3282] ---[ end trace 0000000000000000 ]---
To fix this, clear the page tag reference after the page got isolated
and accounted for.
Link: https://lkml.kernel.org/r/20240825163649.33294-1-hao.ge@linux.dev
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Hao Ge <gehao(a)kylinos.cn>
Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hao Ge <gehao(a)kylinos.cn>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: <stable(a)vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/mm/page_alloc.c~codetag-debug-mark-codetags-for-poisoned-page-as-empty
+++ a/mm/page_alloc.c
@@ -1054,6 +1054,13 @@ __always_inline bool free_pages_prepare(
reset_page_owner(page, order);
page_table_check_free(page, order);
pgalloc_tag_sub(page, 1 << order);
+
+ /*
+ * The page is isolated and accounted for.
+ * Mark the codetag as empty to avoid accounting error
+ * when the page is freed by unpoison_memory().
+ */
+ clear_page_tag_ref(page);
return false;
}
_
Patches currently in -mm which might be from gehao(a)kylinos.cn are
mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook.patch
codetag-debug-mark-codetags-for-poisoned-page-as-empty.patch
mm-cma-change-the-addition-of-totalcma_pages-in-the-cma_init_reserved_mem.patch
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[[21:18] wasn't updated to U0 yet, it's observed that the latency of
DSTS can be in order of milli-seconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
v4: Rewording the comment in function definition.
v3: Added notes on top the function definition.
v2: Refactored the patch as suggested in v1 discussion.
drivers/usb/dwc3/gadget.c | 38 ++++++++++++++------------------------
1 file changed, 14 insertions(+), 24 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..ea583d24aa37 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -287,6 +287,20 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc, bool async);
*
* Caller should handle locking. This function will issue @cmd with given
* @params to @dep and wait for its completion.
+ *
+ * According to databook, while issuing StartXfer command if the link is in L1/L2/U3,
+ * then the command may not complete and timeout, hence software must bring the link
+ * back to ON state by performing remote wakeup. However, since issuing a command in
+ * USB2 speeds requires the clearing of GUSB2PHYCFG.SUSPENDUSB2, which turns on the
+ * signal required to complete the given command (usually within 50us). This should
+ * happen within the command timeout set by driver. Hence we don't expect to trigger
+ * a remote wakeup from here; instead it should be done by wakeup ops.
+ *
+ * Special note: If wakeup ops is triggered for remote wakeup, care should be taken
+ * if StartXfer command needs to be sent soon after. The wakeup ops is asynchronous
+ * and the link state may not transition to ON state yet. And after receiving wakeup
+ * event, device would no longer be in U3, and any link transition afterwards needs
+ * to be adressed with wakeup ops.
*/
int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
struct dwc3_gadget_ep_cmd_params *params)
@@ -327,30 +341,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
--
2.25.1
Memory access #VE's are hard for Linux to handle in contexts like the
entry code or NMIs. But other OSes need them for functionality.
There's a static (pre-guest-boot) way for a VMM to choose one or the
other. But VMMs don't always know which OS they are booting, so they
choose to deliver those #VE's so the "other" OSes will work. That,
unfortunately has left us in the lurch and exposed to these
hard-to-handle #VEs.
The TDX module has introduced a new feature. Even if the static
configuration is "send nasty #VE's", the kernel can dynamically request
that they be disabled.
Check if the feature is available and disable SEPT #VE if possible.
If the TD allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
attribute is no longer reliable. It reflects the initial state of the
control for the TD, but it will not be updated if someone (e.g. bootloader)
changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
determine if SEPT #VEs are enabled or disabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access")
Cc: stable(a)vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++-------
arch/x86/include/asm/shared/tdx.h | 10 +++-
2 files changed, 69 insertions(+), 17 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 08ce488b54d0..ba3103877b21 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -78,7 +78,7 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
}
/* Read TD-scoped metadata */
-static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value)
+static inline u64 tdg_vm_rd(u64 field, u64 *value)
{
struct tdx_module_args args = {
.rdx = field,
@@ -193,6 +193,62 @@ static void __noreturn tdx_panic(const char *msg)
__tdx_hypercall(&args);
}
+/*
+ * The kernel cannot handle #VEs when accessing normal kernel memory. Ensure
+ * that no #VE will be delivered for accesses to TD-private memory.
+ *
+ * TDX 1.0 does not allow the guest to disable SEPT #VE on its own. The VMM
+ * controls if the guest will receive such #VE with TD attribute
+ * ATTR_SEPT_VE_DISABLE.
+ *
+ * Newer TDX module allows the guest to control if it wants to receive SEPT
+ * violation #VEs.
+ *
+ * Check if the feature is available and disable SEPT #VE if possible.
+ *
+ * If the TD allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
+ * attribute is no longer reliable. It reflects the initial state of the
+ * control for the TD, but it will not be updated if someone (e.g. bootloader)
+ * changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
+ * determine if SEPT #VEs are enabled or disabled.
+ */
+static void disable_sept_ve(u64 td_attr)
+{
+ const char *msg = "TD misconfiguration: SEPT #VE has to be disabled";
+ bool debug = td_attr & ATTR_DEBUG;
+ u64 config, controls;
+
+ /* Is this TD allowed to disable SEPT #VE */
+ tdg_vm_rd(TDCS_CONFIG_FLAGS, &config);
+ if (!(config & TDCS_CONFIG_FLEXIBLE_PENDING_VE)) {
+ /* No SEPT #VE controls for the guest: check the attribute */
+ if (td_attr & ATTR_SEPT_VE_DISABLE)
+ return;
+
+ /* Relax SEPT_VE_DISABLE check for debug TD for backtraces */
+ if (debug)
+ pr_warn("%s\n", msg);
+ else
+ tdx_panic(msg);
+ return;
+ }
+
+ /* Check if SEPT #VE has been disabled before us */
+ tdg_vm_rd(TDCS_TD_CTLS, &controls);
+ if (controls & TD_CTLS_PENDING_VE_DISABLE)
+ return;
+
+ /* Keep #VEs enabled for splats in debugging environments */
+ if (debug)
+ return;
+
+ /* Disable SEPT #VEs */
+ tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_PENDING_VE_DISABLE,
+ TD_CTLS_PENDING_VE_DISABLE);
+
+ return;
+}
+
static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
@@ -218,24 +274,12 @@ static void tdx_setup(u64 *cc_mask)
gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);
+ td_attr = args.rdx;
+
/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
- /*
- * The kernel can not handle #VE's when accessing normal kernel
- * memory. Ensure that no #VE will be delivered for accesses to
- * TD-private memory. Only VMM-shared memory (MMIO) will #VE.
- */
- td_attr = args.rdx;
- if (!(td_attr & ATTR_SEPT_VE_DISABLE)) {
- const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set.";
-
- /* Relax SEPT_VE_DISABLE check for debug TD. */
- if (td_attr & ATTR_DEBUG)
- pr_warn("%s\n", msg);
- else
- tdx_panic(msg);
- }
+ disable_sept_ve(td_attr);
}
/*
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 7e12cfa28bec..fecb2a6e864b 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -19,9 +19,17 @@
#define TDG_VM_RD 7
#define TDG_VM_WR 8
-/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
+/* TDX TD-Scope Metadata. To be used by TDG.VM.WR and TDG.VM.RD */
+#define TDCS_CONFIG_FLAGS 0x1110000300000016
+#define TDCS_TD_CTLS 0x1110000300000017
#define TDCS_NOTIFY_ENABLES 0x9100000000000010
+/* TDCS_CONFIG_FLAGS bits */
+#define TDCS_CONFIG_FLEXIBLE_PENDING_VE BIT_ULL(1)
+
+/* TDCS_TD_CTLS bits */
+#define TD_CTLS_PENDING_VE_DISABLE BIT_ULL(0)
+
/* TDX hypercall Leaf IDs */
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_GET_QUOTE 0x10002
--
2.43.0
Here are different fixes:
Patch 1 closes the subflow after having received a FIN, instead of
leaving it half-closed until the end of the MPTCP connection. A fix for
v5.12.
Patch 2 validates the previous patch.
Patch 3 is a fix for a recent fix to check both directions for the
backup flag. It can follow the 'Fixes' commit and be backported up to
v5.7.
Patch 4 adds a missing \n at the end of pr_debug(), causing debug
messages to be displayed with a delay, which confuses the debugger. A
fix for v5.6.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Note: Peter's email address has been removed from the Cc list, because
it is bouncing.
---
Matthieu Baerts (NGI0) (4):
mptcp: close subflow when receiving TCP+FIN
selftests: mptcp: join: cannot rm sf if closed
mptcp: sched: check both backup in retrans
mptcp: pr_debug: add missing \n at the end
net/mptcp/fastopen.c | 4 +-
net/mptcp/options.c | 50 ++++++++++-----------
net/mptcp/pm.c | 28 ++++++------
net/mptcp/pm_netlink.c | 20 ++++-----
net/mptcp/protocol.c | 59 +++++++++++++------------
net/mptcp/protocol.h | 4 +-
net/mptcp/sched.c | 4 +-
net/mptcp/sockopt.c | 4 +-
net/mptcp/subflow.c | 56 ++++++++++++-----------
tools/testing/selftests/net/mptcp/mptcp_join.sh | 11 ++---
10 files changed, 122 insertions(+), 118 deletions(-)
---
base-commit: 31a972959ae57691a1e4f539399b2674ae576086
change-id: 20240826-net-mptcp-close-extra-sf-fin-19d4e5aa4c9c
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
In psnet_open_pf_bar() and snet_open_vf_bar() a string later passed to
pcim_iomap_regions() is placed on the stack. Neither
pcim_iomap_regions() nor the functions it calls copy that string.
Should the string later ever be used, this, consequently, causes
undefined behavior since the stack frame will by then have disappeared.
Fix the bug by allocating the strings on the heap through
devm_kasprintf().
Cc: stable(a)vger.kernel.org # v6.3
Fixes: 51a8f9d7f587 ("virtio: vdpa: new SolidNET DPU driver.")
Reported-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Closes: https://lore.kernel.org/all/74e9109a-ac59-49e2-9b1d-d825c9c9f891@wanadoo.fr/
Suggested-by: Andy Shevchenko <andy(a)kernel.org>
Signed-off-by: Philipp Stanner <pstanner(a)redhat.com>
---
drivers/vdpa/solidrun/snet_main.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/vdpa/solidrun/snet_main.c b/drivers/vdpa/solidrun/snet_main.c
index 99428a04068d..c8b74980dbd1 100644
--- a/drivers/vdpa/solidrun/snet_main.c
+++ b/drivers/vdpa/solidrun/snet_main.c
@@ -555,7 +555,7 @@ static const struct vdpa_config_ops snet_config_ops = {
static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
{
- char name[50];
+ char *name;
int ret, i, mask = 0;
/* We don't know which BAR will be used to communicate..
* We will map every bar with len > 0.
@@ -573,7 +573,10 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
return -ENODEV;
}
- snprintf(name, sizeof(name), "psnet[%s]-bars", pci_name(pdev));
+ name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "psnet[%s]-bars", pci_name(pdev));
+ if (!name)
+ return -ENOMEM;
+
ret = pcim_iomap_regions(pdev, mask, name);
if (ret) {
SNET_ERR(pdev, "Failed to request and map PCI BARs\n");
@@ -590,10 +593,13 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
static int snet_open_vf_bar(struct pci_dev *pdev, struct snet *snet)
{
- char name[50];
+ char *name;
int ret;
- snprintf(name, sizeof(name), "snet[%s]-bar", pci_name(pdev));
+ name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "snet[%s]-bars", pci_name(pdev));
+ if (!name)
+ return -ENOMEM;
+
/* Request and map BAR */
ret = pcim_iomap_regions(pdev, BIT(snet->psnet->cfg.vf_bar), name);
if (ret) {
--
2.46.0
From: Li RongQing <lirongqing(a)baidu.com>
[ Upstream Commit 8e6ed96cdd5001c55fccc80a17f651741c1ca7d2]
when the vCPU was migrated, if its timer is expired, KVM _should_ fire
the timer ASAP, zeroing the deadline here will cause the timer to
immediately fire on the destination
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Peter Shier <pshier(a)google.com>
Cc: Jim Mattson <jmattson(a)google.com>
Cc: Wanpeng Li <wanpengli(a)tencent.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Li RongQing <lirongqing(a)baidu.com>
Link: https://lore.kernel.org/r/20230106040625.8404-1-lirongqing@baidu.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
(cherry picked from commit 8e6ed96cdd5001c55fccc80a17f651741c1ca7d2)
The code was able to compile without errors or warnings.
Signed-off-by: David Hunter <david.hunter.linux(a)gmail.com>
---
arch/x86/kvm/lapic.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c90fef0258c5..3cd590ace95a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1843,8 +1843,12 @@ static bool set_target_expiration(struct kvm_lapic *apic, u32 count_reg)
if (unlikely(count_reg != APIC_TMICT)) {
deadline = tmict_to_ns(apic,
kvm_lapic_get_reg(apic, count_reg));
- if (unlikely(deadline <= 0))
- deadline = apic->lapic_timer.period;
+ if (unlikely(deadline <= 0)) {
+ if (apic_lvtt_period(apic))
+ deadline = apic->lapic_timer.period;
+ else
+ deadline = 0;
+ }
else if (unlikely(deadline > apic->lapic_timer.period)) {
pr_info_ratelimited(
"kvm: vcpu %i: requested lapic timer restore with "
--
2.43.0
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:
/*
* CPU1: User space performs
* ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
* on enclave page X
*/
sgx_encl_remove_pages() {
mutex_lock(&encl->lock);
entry = sgx_encl_load_page(encl);
/*
* verify that page is
* trimmed and accepted
*/
mutex_unlock(&encl->lock);
/*
* remove PTE entry; cannot
* be performed under lock
*/
sgx_zap_enclave_ptes(encl);
/*
* Fault on CPU2 on same page X
*/
sgx_vma_fault() {
/*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
xa_load(&encl->page_array) != NULL ->
/*
* SGX driver thinks that this page
* was swapped out and loads it
*/
mutex_lock(&encl->lock);
/*
* this is effectively a no-op
*/
entry = sgx_encl_load_page_in_vma();
/*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
/*
* continue with page removal
*/
mutex_lock(&encl->lock);
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
xa_erase(&encl->page_array);
mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is busy, and if yes then CPU2 backs off and waits until the page is
completely removed. After that, any memory access to this page results
in a normal "allocate and EAUG a page on #PF" flow.
Additionally fix a similar race: user space converts a normal enclave
page to a TCS page (via SGX_IOC_ENCLAVE_MODIFY_TYPES) on CPU1, and at
the same time, user space performs a memory access on the same page on
CPU2. This fix is not strictly necessary (this particular race would
indicate a bug in a user space application), but it gives a consistent
rule: if an enclave page is under certain operation by the kernel with
the mapping removed, then other threads trying to access that page are
temporarily blocked and should retry.
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.h | 3 ++-
arch/x86/kernel/cpu/sgx/ioctl.c | 17 +++++++++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index b566b8ad5f33..96b11e8fb770 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -22,7 +22,8 @@
/* 'desc' bits holding the offset in the VA (version array) page. */
#define SGX_ENCL_PAGE_VA_OFFSET_MASK GENMASK_ULL(11, 3)
-/* 'desc' bit indicating that the page is busy (being reclaimed). */
+/* 'desc' bit indicating that the page is busy (being reclaimed, removed or
+ * converted to a TCS page). */
#define SGX_ENCL_PAGE_BUSY BIT(2)
/*
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5d390df21440..ee619f2b3414 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -969,12 +969,22 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
/*
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
+ *
+ * Releasing encl->lock leads to a data race: while CPU1
+ * performs sgx_zap_enclave_ptes() and removes the PTE
+ * entry for the enclave page, CPU2 may attempt to load
+ * this page (because the page is still in enclave's
+ * xarray). To prevent CPU2 from loading the page, mark
+ * the page as busy before unlock and unmark after lock
+ * again.
*/
+ entry->desc |= SGX_ENCL_PAGE_BUSY;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
mutex_lock(&encl->lock);
+ entry->desc &= ~SGX_ENCL_PAGE_BUSY;
sgx_mark_page_reclaimable(entry->epc_page);
}
@@ -1141,7 +1151,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
/*
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
+ *
+ * Releasing encl->lock leads to a data race: while CPU1
+ * performs sgx_zap_enclave_ptes() and removes the PTE entry
+ * for the enclave page, CPU2 may attempt to load this page
+ * (because the page is still in enclave's xarray). To prevent
+ * CPU2 from loading the page, mark the page as busy.
*/
+ entry->desc |= SGX_ENCL_PAGE_BUSY;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.43.0
Imagine an mmap()'d file. Two threads touch the same address at the same
time and fault. Both allocate a physical page and race to install a PTE
for that page. Only one will win the race. The loser frees its page, but
still continues handling the fault as a success and returns
VM_FAULT_NOPAGE from the fault handler.
The same race can happen with SGX. But there's a bug: the loser in the
SGX steers into a failure path. The loser EREMOVE's the winner's EPC
page, then returns SIGBUS, likely killing the app.
Fix the SGX loser's behavior. Check whether another thread already
allocated the page and if yes, return with VM_FAULT_NOPAGE.
The race can be illustrated as follows:
/* /*
* Fault on CPU1 * Fault on CPU2
* on enclave page X * on enclave page X
*/ */
sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array)
== NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /*
* alloc encl_page * alloc encl_page
*/ */
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray
*/
xa_insert(&encl->page_array, ...);
/*
* add page to enclave via EAUG
* (page is in pending state)
*/
/*
* add PTE entry
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
}
/*
* All good up to here: enclave page
* successfully added to enclave,
* ready for EACCEPT from user space
*/
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray,
* this fails with -EBUSY as this
* page was already added by CPU2
*/
xa_insert(&encl->page_array, ...);
err_out_shrink:
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*
* *BUG*: page added by CPU2 is
* yanked from enclave while it
* remains accessible from OS
* perspective (PTE installed)
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
mutex_unlock(&encl->lock);
/*
* *BUG*: SIGBUS is returned
* for a valid enclave page
*/
return VM_FAULT_SIGBUS;
}
}
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Kai Huang <kai.huang(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 36 ++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index c0a3c00284c8..2aa7ced0e4a0 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -337,6 +337,16 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
return VM_FAULT_SIGBUS;
+ mutex_lock(&encl->lock);
+
+ /*
+ * Multiple threads may try to fault on the same page concurrently.
+ * Re-check if another thread has already done that.
+ */
+ encl_page = xa_load(&encl->page_array, PFN_DOWN(addr));
+ if (encl_page)
+ goto done;
+
/*
* Ignore internal permission checking for dynamically added pages.
* They matter only for data added during the pre-initialization
@@ -345,23 +355,23 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
*/
secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
encl_page = sgx_encl_page_alloc(encl, addr - encl->base, secinfo_flags);
- if (IS_ERR(encl_page))
- return VM_FAULT_OOM;
-
- mutex_lock(&encl->lock);
+ if (IS_ERR(encl_page)) {
+ vmret = VM_FAULT_OOM;
+ goto err_out_unlock;
+ }
epc_page = sgx_encl_load_secs(encl);
if (IS_ERR(epc_page)) {
if (PTR_ERR(epc_page) == -EBUSY)
vmret = VM_FAULT_NOPAGE;
- goto err_out_unlock;
+ goto err_out_encl;
}
epc_page = sgx_alloc_epc_page(encl_page, false);
if (IS_ERR(epc_page)) {
if (PTR_ERR(epc_page) == -EBUSY)
vmret = VM_FAULT_NOPAGE;
- goto err_out_unlock;
+ goto err_out_encl;
}
va_page = sgx_encl_grow(encl, false);
@@ -376,10 +386,6 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
encl_page, GFP_KERNEL);
- /*
- * If ret == -EBUSY then page was created in another flow while
- * running without encl->lock
- */
if (ret)
goto err_out_shrink;
@@ -389,7 +395,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
if (ret)
- goto err_out;
+ goto err_out_eaug;
encl_page->encl = encl;
encl_page->epc_page = epc_page;
@@ -408,20 +414,20 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
mutex_unlock(&encl->lock);
return VM_FAULT_SIGBUS;
}
+done:
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
-err_out:
+err_out_eaug:
xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
-
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
sgx_encl_free_epc_page(epc_page);
+err_out_encl:
+ kfree(encl_page);
err_out_unlock:
mutex_unlock(&encl->lock);
- kfree(encl_page);
-
return vmret;
}
--
2.43.0
The page reclaimer thread sets SGX_ENC_PAGE_BEING_RECLAIMED flag when
the enclave page is being reclaimed (moved to the backing store). This
flag however has two logical meanings:
1. Don't attempt to load the enclave page (the page is busy), see
__sgx_encl_load_page().
2. Don't attempt to remove the PCMD page corresponding to this enclave
page (the PCMD page is busy), see reclaimer_writing_to_pcmd().
To reflect these two meanings, split SGX_ENCL_PAGE_BEING_RECLAIMED into
two flags: SGX_ENCL_PAGE_BUSY and SGX_ENCL_PAGE_PCMD_BUSY. Currently,
both flags are set only when the enclave page is being reclaimed (by the
page reclaimer thread). A future commit will introduce new cases when
the enclave page is being operated on; these new cases will set only the
SGX_ENCL_PAGE_BUSY flag.
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
Reviewed-by: Haitao Huang <haitao.huang(a)linux.intel.com>
Acked-by: Kai Huang <kai.huang(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 16 +++++++---------
arch/x86/kernel/cpu/sgx/encl.h | 10 ++++++++--
arch/x86/kernel/cpu/sgx/main.c | 4 ++--
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..c0a3c00284c8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -46,10 +46,10 @@ static int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_ind
* a check if an enclave page sharing the PCMD page is in the process of being
* reclaimed.
*
- * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
- * intends to reclaim that enclave page - it means that the PCMD page
- * associated with that enclave page is about to get some data and thus
- * even if the PCMD page is empty, it should not be truncated.
+ * The reclaimer sets the SGX_ENCL_PAGE_PCMD_BUSY flag when it intends to
+ * reclaim that enclave page - it means that the PCMD page associated with that
+ * enclave page is about to get some data and thus even if the PCMD page is
+ * empty, it should not be truncated.
*
* Context: Enclave mutex (&sgx_encl->lock) must be held.
* Return: 1 if the reclaimer is about to write to the PCMD page
@@ -77,8 +77,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* Stop when reaching the SECS page - it does not
* have a page_array entry and its reclaim is
* started and completed with enclave mutex held so
- * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
- * flag.
+ * it does not use the SGX_ENCL_PAGE_PCMD_BUSY flag.
*/
if (addr == encl->base + encl->size)
break;
@@ -91,8 +90,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* VA page slot ID uses same bit as the flag so it is important
* to ensure that the page is not already in backing store.
*/
- if (entry->epc_page &&
- (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+ if (entry->epc_page && (entry->desc & SGX_ENCL_PAGE_PCMD_BUSY)) {
reclaimed = 1;
break;
}
@@ -257,7 +255,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & SGX_ENCL_PAGE_BUSY)
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..b566b8ad5f33 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -22,8 +22,14 @@
/* 'desc' bits holding the offset in the VA (version array) page. */
#define SGX_ENCL_PAGE_VA_OFFSET_MASK GENMASK_ULL(11, 3)
-/* 'desc' bit marking that the page is being reclaimed. */
-#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit indicating that the page is busy (being reclaimed). */
+#define SGX_ENCL_PAGE_BUSY BIT(2)
+
+/*
+ * 'desc' bit indicating that PCMD page associated with the enclave page is
+ * busy (because the enclave page is being reclaimed).
+ */
+#define SGX_ENCL_PAGE_PCMD_BUSY BIT(3)
struct sgx_encl_page {
unsigned long desc;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..e94b09c43673 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -204,7 +204,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
void *va_slot;
int ret;
- encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc &= ~(SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY);
va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
list);
@@ -340,7 +340,7 @@ static void sgx_reclaim_pages(void)
goto skip;
}
- encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc |= SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY;
mutex_unlock(&encl_page->encl->lock);
continue;
--
2.43.0
From: Willem de Bruijn <willemb(a)google.com>
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
for GSO packets.
The function already checks that a checksum requested with
VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
this might not hold for segs after segmentation.
Syzkaller demonstrated to reach this warning in skb_checksum_help
offset = skb_checksum_start_offset(skb);
ret = -EINVAL;
if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
By injecting a TSO packet:
WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
__ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4850 [inline]
netdev_start_xmit include/linux/netdevice.h:4864 [inline]
xmit_one net/core/dev.c:3595 [inline]
dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
__dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
packet_snd net/packet/af_packet.c:3073 [inline]
The geometry of the bad input packet at tcp_gso_segment:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536
ip_summed=3 complete_sw=0 valid=0 level=0)
Migitage with stricter input validation.
csum_offset: for GSO packets, deduce the correct value from gso_type.
This is already done for USO. Extend it to TSO. Let UFO be:
udp[46]_ufo_fragment ignores these fields and always computes the
checksum in software.
csum_start: finding the real offset requires parsing to the transport
header. Do not add a parser, use existing segmentation parsing. Thanks
to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
Again test both TSO and USO. Do not test UFO for the above reason, and
do not test UDP tunnel offload.
GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be
CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit
from devices with no checksum offload"), but then still these fields
are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no
need to test for ip_summed == CHECKSUM_PARTIAL first.
This revises an existing fix mentioned in the Fixes tag, which broke
small packets with GSO offload, as detected by kselftests.
Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
Fixes: e269d79c7d35 ("net: missing check virtio")
Cc: stable(a)vger.kernel.org
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
include/linux/virtio_net.h | 16 +++++-----------
net/ipv4/tcp_offload.c | 3 +++
net/ipv4/udp_offload.c | 3 +++
3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index d1d7825318c32..6c395a2600e8d 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -56,7 +56,6 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
unsigned int thlen = 0;
unsigned int p_off = 0;
unsigned int ip_proto;
- u64 ret, remainder, gso_size;
if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -99,16 +98,6 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
- if (hdr->gso_size) {
- gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
- ret = div64_u64_rem(skb->len, gso_size, &remainder);
- if (!(ret && (hdr->gso_size > needed) &&
- ((remainder > needed) || (remainder == 0)))) {
- return -EINVAL;
- }
- skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG;
- }
-
if (!pskb_may_pull(skb, needed))
return -EINVAL;
@@ -182,6 +171,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
if (gso_type != SKB_GSO_UDP_L4)
return -EINVAL;
break;
+ case SKB_GSO_TCPV4:
+ case SKB_GSO_TCPV6:
+ if (skb->csum_offset != offsetof(struct tcphdr, check))
+ return -EINVAL;
+ break;
}
/* Kernel has a special handling for GSO_BY_FRAGS. */
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 4b791e74529e1..9e49ffcc77071 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -140,6 +140,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
if (thlen < sizeof(*th))
goto out;
+ if (unlikely(skb->csum_start != skb->transport_header))
+ goto out;
+
if (!pskb_may_pull(skb, thlen))
goto out;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index aa2e0a28ca613..f521152c40871 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -278,6 +278,9 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
if (gso_skb->len <= sizeof(*uh) + mss)
return ERR_PTR(-EINVAL);
+ if (unlikely(gso_skb->csum_start != gso_skb->transport_header))
+ return ERR_PTR(-EINVAL);
+
if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
/* Packet is from an untrusted source, reset gso_segs. */
skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
--
2.46.0.rc1.232.g9752f9e123-goog
Commit e882575efc77 ("spi: rockchip: Suspend and resume the bus during
NOIRQ_SYSTEM_SLEEP_PM ops") stopped respecting runtime PM status and
simply disabled clocks unconditionally when suspending the system. This
causes problems when the device is already runtime suspended when we go
to sleep -- in which case we double-disable clocks and produce a
WARNing.
Switch back to pm_runtime_force_{suspend,resume}(), because that still
seems like the right thing to do, and the aforementioned commit makes no
explanation why it stopped using it.
Also, refactor some of the resume() error handling, because it's not
actually a good idea to re-disable clocks on failure.
Fixes: e882575efc77 ("spi: rockchip: Suspend and resume the bus during NOIRQ_SYSTEM_SLEEP_PM ops")
Cc: <stable(a)vger.kernel.org>
Reported-by: "Ondřej Jirman" <megi(a)xff.cz>
Closes: https://lore.kernel.org/lkml/20220621154218.sau54jeij4bunf56@core/
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
Changes in v2:
- fix unused 'rs' warning
drivers/spi/spi-rockchip.c | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)
diff --git a/drivers/spi/spi-rockchip.c b/drivers/spi/spi-rockchip.c
index e1ecd96c7858..0bb33c43b1b4 100644
--- a/drivers/spi/spi-rockchip.c
+++ b/drivers/spi/spi-rockchip.c
@@ -945,14 +945,16 @@ static int rockchip_spi_suspend(struct device *dev)
{
int ret;
struct spi_controller *ctlr = dev_get_drvdata(dev);
- struct rockchip_spi *rs = spi_controller_get_devdata(ctlr);
ret = spi_controller_suspend(ctlr);
if (ret < 0)
return ret;
- clk_disable_unprepare(rs->spiclk);
- clk_disable_unprepare(rs->apb_pclk);
+ ret = pm_runtime_force_suspend(dev);
+ if (ret < 0) {
+ spi_controller_resume(ctlr);
+ return ret;
+ }
pinctrl_pm_select_sleep_state(dev);
@@ -963,25 +965,14 @@ static int rockchip_spi_resume(struct device *dev)
{
int ret;
struct spi_controller *ctlr = dev_get_drvdata(dev);
- struct rockchip_spi *rs = spi_controller_get_devdata(ctlr);
pinctrl_pm_select_default_state(dev);
- ret = clk_prepare_enable(rs->apb_pclk);
+ ret = pm_runtime_force_resume(dev);
if (ret < 0)
return ret;
- ret = clk_prepare_enable(rs->spiclk);
- if (ret < 0)
- clk_disable_unprepare(rs->apb_pclk);
-
- ret = spi_controller_resume(ctlr);
- if (ret < 0) {
- clk_disable_unprepare(rs->spiclk);
- clk_disable_unprepare(rs->apb_pclk);
- }
-
- return 0;
+ return spi_controller_resume(ctlr);
}
#endif /* CONFIG_PM_SLEEP */
--
2.46.0.295.g3b9ea8a38a-goog
From: NeilBrown <neilb(a)suse.de>
[ Upstream commit bf32075256e9dd9c6b736859e2c5813981339908 ]
The error paths in nfsd_svc() are needlessly complex and can result in a
final call to svc_put() without nfsd_last_thread() being called. This
results in the listening sockets not being closed properly.
The per-netns setup provided by nfsd_startup_new() and removed by
nfsd_shutdown_net() is needed precisely when there are running threads.
So we don't need nfsd_up_before. We don't need to know if it *was* up.
We only need to know if any threads are left. If none are, then we must
call nfsd_shutdown_net(). But we don't need to do that explicitly as
nfsd_last_thread() does that for us.
So simply call nfsd_last_thread() before the last svc_put() if there are
no running threads. That will always do the right thing.
Also discard:
pr_info("nfsd: last server has exited, flushing export cache\n");
It may not be true if an attempt to start the first server failed, and
it isn't particularly helpful and it simply reports normal behaviour.
Signed-off-by: NeilBrown <neilb(a)suse.de>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
fs/nfsd/nfssvc.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
Reported-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Suggested-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Tested-by: Li Lingfeng <lilingfeng3(a)huawei.com>
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 7911c4b3b5d3..710a54c7dffc 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -567,7 +567,6 @@ void nfsd_last_thread(struct net *net)
return;
nfsd_shutdown_net(net);
- pr_info("nfsd: last server has exited, flushing export cache\n");
nfsd_export_flush(net);
}
@@ -783,7 +782,6 @@ int
nfsd_svc(int nrservs, struct net *net, const struct cred *cred)
{
int error;
- bool nfsd_up_before;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct svc_serv *serv;
@@ -803,8 +801,6 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred)
error = nfsd_create_serv(net);
if (error)
goto out;
-
- nfsd_up_before = nn->nfsd_net_up;
serv = nn->nfsd_serv;
error = nfsd_startup_net(net, cred);
@@ -812,17 +808,15 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred)
goto out_put;
error = svc_set_num_threads(serv, NULL, nrservs);
if (error)
- goto out_shutdown;
+ goto out_put;
error = serv->sv_nrthreads;
- if (error == 0)
- nfsd_last_thread(net);
-out_shutdown:
- if (error < 0 && !nfsd_up_before)
- nfsd_shutdown_net(net);
out_put:
/* Threads now hold service active */
if (xchg(&nn->keep_active, 0))
svc_put(serv);
+
+ if (serv->sv_nrthreads == 0)
+ nfsd_last_thread(net);
svc_put(serv);
out:
mutex_unlock(&nfsd_mutex);
--
2.45.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082754-tricking-facsimile-011e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
6c3fc0b1c3d0 ("igc: Fix qbv tx latency by setting gtxoffset")
790835fcc0cb ("igc: Correct the launchtime offset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464 Mon Sep 17 00:00:00 2001
From: Faizal Rahim <faizal.abdul.rahim(a)linux.intel.com>
Date: Sun, 7 Jul 2024 08:53:18 -0400
Subject: [PATCH] igc: Fix qbv tx latency by setting gtxoffset
A large tx latency issue was discovered during testing when only QBV was
enabled. The issue occurs because gtxoffset was not set when QBV is
active, it was only set when launch time is active.
The patch "igc: Correct the launchtime offset" only sets gtxoffset when
the launchtime_enable field is set by the user. Enabling launchtime_enable
ultimately sets the register IGC_TXQCTL_QUEUE_MODE_LAUNCHT (referred to as
LaunchT in the SW user manual).
Section 7.5.2.6 of the IGC i225/6 SW User Manual Rev 1.2.4 states:
"The latency between transmission scheduling (launch time) and the
time the packet is transmitted to the network is listed in Table 7-61."
However, the patch misinterprets the phrase "launch time" in that section
by assuming it specifically refers to the LaunchT register, whereas it
actually denotes the generic term for when a packet is released from the
internal buffer to the MAC transmit logic.
This launch time, as per that section, also implicitly refers to the QBV
gate open time, where a packet waits in the buffer for the QBV gate to
open. Therefore, latency applies whenever QBV is in use. TSN features such
as QBU and QAV reuse QBV, making the latency universal to TSN features.
Discussed with i226 HW owner (Shalev, Avi) and we were in agreement that
the term "launch time" used in Section 7.5.2.6 is not clear and can be
easily misinterpreted. Avi will update this section to:
"When TQAVCTRL.TRANSMIT_MODE = TSN, the latency between transmission
scheduling and the time the packet is transmitted to the network is listed
in Table 7-61."
Fix this issue by using igc_tsn_is_tx_mode_in_tsn() as a condition to
write to gtxoffset, aligning with the newly updated SW User Manual.
Tested:
1. Enrol taprio on talker board
base-time 0
cycle-time 1000000
flags 0x2
index 0 cmd S gatemask 0x1 interval1
index 0 cmd S gatemask 0x1 interval2
Note:
interval1 = interval for a 64 bytes packet to go through
interval2 = cycle-time - interval1
2. Take tcpdump on listener board
3. Use udp tai app on talker to send packets to listener
4. Check the timestamp on listener via wireshark
Test Result:
100 Mbps: 113 ~193 ns
1000 Mbps: 52 ~ 84 ns
2500 Mbps: 95 ~ 223 ns
Note that the test result is similar to the patch "igc: Correct the
launchtime offset".
Fixes: 790835fcc0cb ("igc: Correct the launchtime offset")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim(a)linux.intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Acked-by: Vinicius Costa Gomes <vinicius.gomes(a)intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index ada751430517..d68fa7f3d5f0 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -61,7 +61,7 @@ void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter)
struct igc_hw *hw = &adapter->hw;
u16 txoffset;
- if (!is_any_launchtime(adapter))
+ if (!igc_tsn_is_tx_mode_in_tsn(adapter))
return;
switch (adapter->link_speed) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082732-calculus-unviable-28eb@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6c3fc0b1c3d0 ("igc: Fix qbv tx latency by setting gtxoffset")
790835fcc0cb ("igc: Correct the launchtime offset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464 Mon Sep 17 00:00:00 2001
From: Faizal Rahim <faizal.abdul.rahim(a)linux.intel.com>
Date: Sun, 7 Jul 2024 08:53:18 -0400
Subject: [PATCH] igc: Fix qbv tx latency by setting gtxoffset
A large tx latency issue was discovered during testing when only QBV was
enabled. The issue occurs because gtxoffset was not set when QBV is
active, it was only set when launch time is active.
The patch "igc: Correct the launchtime offset" only sets gtxoffset when
the launchtime_enable field is set by the user. Enabling launchtime_enable
ultimately sets the register IGC_TXQCTL_QUEUE_MODE_LAUNCHT (referred to as
LaunchT in the SW user manual).
Section 7.5.2.6 of the IGC i225/6 SW User Manual Rev 1.2.4 states:
"The latency between transmission scheduling (launch time) and the
time the packet is transmitted to the network is listed in Table 7-61."
However, the patch misinterprets the phrase "launch time" in that section
by assuming it specifically refers to the LaunchT register, whereas it
actually denotes the generic term for when a packet is released from the
internal buffer to the MAC transmit logic.
This launch time, as per that section, also implicitly refers to the QBV
gate open time, where a packet waits in the buffer for the QBV gate to
open. Therefore, latency applies whenever QBV is in use. TSN features such
as QBU and QAV reuse QBV, making the latency universal to TSN features.
Discussed with i226 HW owner (Shalev, Avi) and we were in agreement that
the term "launch time" used in Section 7.5.2.6 is not clear and can be
easily misinterpreted. Avi will update this section to:
"When TQAVCTRL.TRANSMIT_MODE = TSN, the latency between transmission
scheduling and the time the packet is transmitted to the network is listed
in Table 7-61."
Fix this issue by using igc_tsn_is_tx_mode_in_tsn() as a condition to
write to gtxoffset, aligning with the newly updated SW User Manual.
Tested:
1. Enrol taprio on talker board
base-time 0
cycle-time 1000000
flags 0x2
index 0 cmd S gatemask 0x1 interval1
index 0 cmd S gatemask 0x1 interval2
Note:
interval1 = interval for a 64 bytes packet to go through
interval2 = cycle-time - interval1
2. Take tcpdump on listener board
3. Use udp tai app on talker to send packets to listener
4. Check the timestamp on listener via wireshark
Test Result:
100 Mbps: 113 ~193 ns
1000 Mbps: 52 ~ 84 ns
2500 Mbps: 95 ~ 223 ns
Note that the test result is similar to the patch "igc: Correct the
launchtime offset".
Fixes: 790835fcc0cb ("igc: Correct the launchtime offset")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim(a)linux.intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Acked-by: Vinicius Costa Gomes <vinicius.gomes(a)intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index ada751430517..d68fa7f3d5f0 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -61,7 +61,7 @@ void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter)
struct igc_hw *hw = &adapter->hw;
u16 txoffset;
- if (!is_any_launchtime(adapter))
+ if (!igc_tsn_is_tx_mode_in_tsn(adapter))
return;
switch (adapter->link_speed) {
Commit 97ab304ecd95 ("ASoC: topology: Fix references to freed memory")
is a problematic fix for issue in topology loading code, which was
cherry-picked to stable. It was later corrected in
0298f51652be ("ASoC: topology: Fix route memory corruption"), however to
apply cleanly e0e7bc2cbee9 ("ASoC: topology: Clean up route loading")
also needs to be applied.
Link: https://lore.kernel.org/linux-sound/ZrwUCnrtKQ61LWFS@sashalap/T/#mbfd273adf…
Amadeusz Sławiński (2):
ASoC: topology: Clean up route loading
ASoC: topology: Fix route memory corruption
sound/soc/soc-topology.c | 32 ++++++++------------------------
1 file changed, 8 insertions(+), 24 deletions(-)
base-commit: 878fbff41def4649a2884e9d33bb423f5a7726b0
--
2.34.1
Hi,
the stable tag was missing for the following commit:
commit 2a07bb64d801 ("s390/dasd: Remove DMA alignment")
The change needs to be applied for kernel 6.0+ essentially reverting
bc792884b76f ("s390/dasd: Establish DMA alignment").
The patch fixes filesystem errors especially for XFS when DASD devices are formatted
with a blocksize smaller than 4096 bytes.
The commit 2a07bb64d801 ("s390/dasd: Remove DMA alignment") should apply cleanly for
kernel 6.9+. There was a refactoring happening at the time with the following two commits
(just for context, not required as prereqs!):
commit 0127a47f58c6 ("dasd: move queue setup to common code")
commit fde07a4d74e3 ("dasd: use the atomic queue limits API")
For everything before 6.9 a simple git revert for commit bc792884b76f
("s390/dasd: Establish DMA alignment") should work just fine.
If you run into any conflicts, need separate patches, or have any questions,
please let me know.
Thanks a lot and apologies for the inconvenience!
regards,
Jan
From: Mario Limonciello <mario.limonciello(a)amd.com>
This is a backport of patches from 6.11-rc1 that improve power savings
for VCN when hardware accelerated video playback is active.
Boyuan Zhang (2):
drm/amdgpu/vcn: identify unified queue in sw init
drm/amdgpu/vcn: not pause dpg for unified queue
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 53 ++++++++++++-------------
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 +
2 files changed, 27 insertions(+), 27 deletions(-)
--
2.43.0
From: "Lee, Chun-Yi" <joeyli.kernel(a)gmail.com>
commit 9c33663af9ad115f90c076a1828129a3fbadea98 upstream.
This patch adds code to check HCI_UART_PROTO_READY flag before
accessing hci_uart->proto. It fixes the race condition in
hci_uart_tty_ioctl() between HCIUARTSETPROTO and HCIUARTGETPROTO.
This issue bug found by Yu Hao and Weiteng Chen:
BUG: general protection fault in hci_uart_tty_ioctl [1]
The information of C reproducer can also reference the link [2]
Reported-by: Yu Hao <yhao016(a)ucr.edu>
Closes: https://lore.kernel.org/all/CA+UBctC3p49aTgzbVgkSZ2+TQcqq4fPDO7yZitFT5uBPDe… [1]
Reported-by: Weiteng Chen <wchen130(a)ucr.edu>
Closes: https://lore.kernel.org/lkml/CA+UBctDPEvHdkHMwD340=n02rh+jNRJNNQ5LBZNA+Wm4K… [2]
Signed-off-by: "Lee, Chun-Yi" <jlee(a)suse.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
[Harshit: bp to stable kernels]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
---
This is backport of a fix for CVE-2023-31083, it applies cleanly to all
stable trees and I have build tested this.
drivers/bluetooth/hci_ldisc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 865112e96ff9..c1feebd9e3a0 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -770,7 +770,8 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, unsigned int cmd,
break;
case HCIUARTGETPROTO:
- if (test_bit(HCI_UART_PROTO_SET, &hu->flags))
+ if (test_bit(HCI_UART_PROTO_SET, &hu->flags) &&
+ test_bit(HCI_UART_PROTO_READY, &hu->flags))
err = hu->proto->id;
else
err = -EUNATCH;
--
2.45.2
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f2916c83d746eb99f50f42c15cf4c47c2ea5f3b3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082635-dislike-tipping-1bee@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
f2916c83d746 ("net: ngbe: Fix phy mode set to external phy")
bc2426d74aa3 ("net: ngbe: convert phylib to phylink")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f2916c83d746eb99f50f42c15cf4c47c2ea5f3b3 Mon Sep 17 00:00:00 2001
From: Mengyuan Lou <mengyuanlou(a)net-swift.com>
Date: Tue, 20 Aug 2024 11:04:25 +0800
Subject: [PATCH] net: ngbe: Fix phy mode set to external phy
The MAC only has add the TX delay and it can not be modified.
MAC and PHY are both set the TX delay cause transmission problems.
So just disable TX delay in PHY, when use rgmii to attach to
external phy, set PHY_INTERFACE_MODE_RGMII_RXID to phy drivers.
And it is does not matter to internal phy.
Fixes: bc2426d74aa3 ("net: ngbe: convert phylib to phylink")
Signed-off-by: Mengyuan Lou <mengyuanlou(a)net-swift.com>
Cc: stable(a)vger.kernel.org # 6.3+
Reviewed-by: Jacob Keller <jacob.e.keller(a)intel.com>
Link: https://patch.msgid.link/E6759CF1387CF84C+20240820030425.93003-1-mengyuanlo…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c
index ec54b18c5fe7..a5e9b779c44d 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c
@@ -124,8 +124,12 @@ static int ngbe_phylink_init(struct wx *wx)
MAC_SYM_PAUSE | MAC_ASYM_PAUSE;
config->mac_managed_pm = true;
- phy_mode = PHY_INTERFACE_MODE_RGMII_ID;
- __set_bit(PHY_INTERFACE_MODE_RGMII_ID, config->supported_interfaces);
+ /* The MAC only has add the Tx delay and it can not be modified.
+ * So just disable TX delay in PHY, and it is does not matter to
+ * internal phy.
+ */
+ phy_mode = PHY_INTERFACE_MODE_RGMII_RXID;
+ __set_bit(PHY_INTERFACE_MODE_RGMII_RXID, config->supported_interfaces);
phylink = phylink_create(config, NULL, phy_mode, &ngbe_mac_ops);
if (IS_ERR(phylink))
From: Mario Limonciello <mario.limonciello(a)amd.com>
This is a backport of patches from 6.11-rc1 that improve power savings
for VCN when hardware accelerated video playback is active.
Boyuan Zhang (2):
drm/amdgpu/vcn: identify unified queue in sw init
drm/amdgpu/vcn: not pause dpg for unified queue
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 53 ++++++++++++-------------
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 +
2 files changed, 27 insertions(+), 27 deletions(-)
--
2.43.0
When commit 390390240145 ("nfsd: don't allow nfsd threads to be
signalled.") is backported to 5.10, it was adjusted considering commit
3feac2b55293 ("sunrpc: exclude from freezer when waiting for requests:").
However, 3feac2b55293 is based on commit f6e70aab9dfe ("SUNRPC: refresh
rq_pages using a bulk page allocator"), which converted page-by-page
allocation to a batch allocation, so schedule_timeout() is placed
un-nested.
As a result, the backported commit 7229200f6866 ("nfsd: don't allow nfsd
threads to be signalled.") placed freezable_schedule_timeout() in the wrong
place.
Now, freezable_schedule_timeout() is called after every successful page
allocation, and we see 30%+ performance regression on 5.10.220 in our
test suite.
Let's move it to the correct place so that freezable_schedule_timeout()
is called only when page allocation fails.
Fixes: 7229200f6866 ("nfsd: don't allow nfsd threads to be signalled.")
Reported-by: Hughdan Liu <hughliu(a)amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
---
net/sunrpc/svc_xprt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d1eacf3358b8..60782504ad3e 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -679,8 +679,8 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
set_current_state(TASK_RUNNING);
return -EINTR;
}
+ freezable_schedule_timeout(msecs_to_jiffies(500));
}
- freezable_schedule_timeout(msecs_to_jiffies(500));
rqstp->rq_pages[i] = p;
}
rqstp->rq_page_end = &rqstp->rq_pages[i];
--
2.30.2
From: Chuck Lever <chuck.lever(a)oracle.com>
Address an NFSD crasher that was noted here:
https://lore.kernel.org/linux-nfs/65ee9c0d-e89e-b3e5-f542-103a0ee4745c@huaw…
To apply the fix cleanly, backport a few NFSD patches into v6.1.y
that have been in the other LTS kernels for a while.
Reported-by: Li LingFeng <lilingfeng3(a)huawei.com>
Suggested-by: Li LingFeng <lilingfeng3(a)huawei.com>
Tested-by: Li LingFeng <lilingfeng3(a)huawei.com>
Jeff Layton (1):
nfsd: drop the nfsd_put helper
NeilBrown (5):
nfsd: Simplify code around svc_exit_thread() call in nfsd()
nfsd: separate nfsd_last_thread() from nfsd_put()
NFSD: simplify error paths in nfsd_svc()
nfsd: call nfsd_last_thread() before final nfsd_put()
nfsd: don't call locks_release_private() twice concurrently
Trond Myklebust (1):
nfsd: Fix a regression in nfsd_setattr()
fs/nfsd/nfs4proc.c | 4 ++
fs/nfsd/nfs4state.c | 2 +-
fs/nfsd/nfsctl.c | 32 ++++++++------
fs/nfsd/nfsd.h | 3 +-
fs/nfsd/nfssvc.c | 85 ++++++++++----------------------------
fs/nfsd/vfs.c | 6 ++-
include/linux/sunrpc/svc.h | 13 ------
7 files changed, 51 insertions(+), 94 deletions(-)
--
2.45.1
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit bed2eb964c70b780fb55925892a74f26cb590b25 ]
Daniel Hodges reported a kernel verifier crash when playing with sched-ext.
Further investigation shows that the crash is due to invalid memory access
in stacksafe(). More specifically, it is the following code:
if (exact != NOT_EXACT &&
old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
cur->stack[spi].slot_type[i % BPF_REG_SIZE])
return false;
The 'i' iterates old->allocated_stack.
If cur->allocated_stack < old->allocated_stack the out-of-bound
access will happen.
To fix the issue add 'i >= cur->allocated_stack' check such that if
the condition is true, stacksafe() should fail. Otherwise,
cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal.
Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks")
Cc: Eduard Zingerman <eddyz87(a)gmail.com>
Reported-by: Daniel Hodges <hodgesd(a)meta.com>
Acked-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Link: https://lore.kernel.org/r/20240812214847.213612-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
shung-hsi.yu: "exact" variable is bool instead enum because commit 4f81c16f50ba
("bpf: Recognize that two registers are safe when their ranges match") is not
present.
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
---
kernel/bpf/verifier.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 171045b6956d..3f1a9cd7fc9e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16124,8 +16124,9 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
spi = i / BPF_REG_SIZE;
if (exact &&
- old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
- cur->stack[spi].slot_type[i % BPF_REG_SIZE])
+ (i >= cur->allocated_stack ||
+ old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
+ cur->stack[spi].slot_type[i % BPF_REG_SIZE]))
return false;
if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ) && !exact) {
--
2.46.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 40b760cfd44566bca791c80e0720d70d75382b84
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081934-embargo-primer-a23e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
40b760cfd445 ("mm/numa: no task_numa_fault() call if PTE is changed")
d2136d749d76 ("mm: support multi-size THP numa balancing")
6b0ed7b3c775 ("mm: factor out the numa mapping rebuilding into a new helper")
ec1778807a80 ("mm: mprotect: use a folio in change_pte_range()")
6695cf68b15c ("mm: memory: use a folio in do_numa_page()")
73eab3ca481e ("mm: migrate: convert migrate_misplaced_page() to migrate_misplaced_folio()")
2ac9e99f3b21 ("mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()")
df57721f9a63 ("Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40b760cfd44566bca791c80e0720d70d75382b84 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy(a)nvidia.com>
Date: Fri, 9 Aug 2024 10:59:04 -0400
Subject: [PATCH] mm/numa: no task_numa_fault() call if PTE is changed
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce
TLB flush via delaying mapping on hint page fault") restructured
do_numa_page() and did not avoid task_numa_fault() call in the second page
table check after a numa migration failure. Fix it by making all
!pte_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com
Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index 34f8402d2046..3c01d68065be 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5295,7 +5295,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+ return 0;
}
pte = pte_modify(old_pte, vma->vm_page_prot);
@@ -5358,23 +5358,19 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
nid = target_nid;
flags |= TNF_MIGRATED;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
- vmf->address, &vmf->ptl);
- if (unlikely(!vmf->pte))
- goto out;
- if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, nr_pages, flags);
- return 0;
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+ vmf->address, &vmf->ptl);
+ if (unlikely(!vmf->pte))
+ return 0;
+ if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
+ pte_unmap_unlock(vmf->pte, vmf->ptl);
+ return 0;
+ }
out_map:
/*
* Make it present again, depending on how arch implements
@@ -5387,7 +5383,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
writable);
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 61ebe5a747da649057c37be1c37eb934b4af79ca
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081918-payday-symphonic-ac65@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
61ebe5a747da ("mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0")
88ae5fb755b0 ("mm: vmalloc: enable memory allocation profiling")
e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
3ba2c3ff98ea ("Merge tag 'modules-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 61ebe5a747da649057c37be1c37eb934b4af79ca Mon Sep 17 00:00:00 2001
From: Hailong Liu <hailong.liu(a)oppo.com>
Date: Thu, 8 Aug 2024 20:19:56 +0800
Subject: [PATCH] mm/vmalloc: fix page mapping if vm_area_alloc_pages() with
high order fallback to order 0
The __vmap_pages_range_noflush() assumes its argument pages** contains
pages with the same page shift. However, since commit e9c3cda4d86e ("mm,
vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes
__GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation
failed for high order, the pages** may contain two different page shifts
(high order and order-0). This could lead __vmap_pages_range_noflush() to
perform incorrect mappings, potentially resulting in memory corruption.
Users might encounter this as follows (vmap_allow_huge = true, 2M is for
PMD_SIZE):
kvmalloc(2M, __GFP_NOFAIL|GFP_X)
__vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
vmap_pages_range()
vmap_pages_range_noflush()
__vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
We can remove the fallback code because if a high-order allocation fails,
__vmalloc_node_range_noprof() will retry with order-0. Therefore, it is
unnecessary to fallback to order-0 here. Therefore, fix this by removing
the fallback code.
Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com
Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu <hailong.liu(a)oppo.com>
Reported-by: Tangquan Zheng <zhengtangquan(a)oppo.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Acked-by: Barry Song <baohua(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6b783baf12a1..af2de36549d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
page = alloc_pages_noprof(alloc_gfp, order);
else
page = alloc_pages_node_noprof(nid, alloc_gfp, order);
- if (unlikely(!page)) {
- if (!nofail)
- break;
-
- /* fall back to the zero order allocations */
- alloc_gfp |= __GFP_NOFAIL;
- order = 0;
- continue;
- }
+ if (unlikely(!page))
+ break;
/*
* Higher order allocations must be able to be treated as
commit ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 upstream.
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
---
Greg, this is a backport for 6.6.y and older stable kernels that failed
to cherry-pick the original one.
sound/core/timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/core/timer.c b/sound/core/timer.c
index a0b515981ee9..230babace502 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -556,7 +556,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000) {
result = -EINVAL;
goto unlock;
--
2.43.0
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit bed2eb964c70b780fb55925892a74f26cb590b25 ]
Daniel Hodges reported a kernel verifier crash when playing with sched-ext.
Further investigation shows that the crash is due to invalid memory access
in stacksafe(). More specifically, it is the following code:
if (exact != NOT_EXACT &&
old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
cur->stack[spi].slot_type[i % BPF_REG_SIZE])
return false;
The 'i' iterates old->allocated_stack.
If cur->allocated_stack < old->allocated_stack the out-of-bound
access will happen.
To fix the issue add 'i >= cur->allocated_stack' check such that if
the condition is true, stacksafe() should fail. Otherwise,
cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal.
Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks")
Cc: Eduard Zingerman <eddyz87(a)gmail.com>
Reported-by: Daniel Hodges <hodgesd(a)meta.com>
Acked-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Link: https://lore.kernel.org/r/20240812214847.213612-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
---
I see this patch itself was already picked up[1] by Sasha. This thread
additional includes the associated selftest that was sent in the same
series (not entirely sure if backporting selftest goes against the
stable backporting rule though).
1: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/com…
---
kernel/bpf/verifier.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a8845cc299fe..521bd7efae03 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16881,8 +16881,9 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
spi = i / BPF_REG_SIZE;
if (exact != NOT_EXACT &&
- old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
- cur->stack[spi].slot_type[i % BPF_REG_SIZE])
+ (i >= cur->allocated_stack ||
+ old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
+ cur->stack[spi].slot_type[i % BPF_REG_SIZE]))
return false;
if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
--
2.46.0
Hi,
The following patches in 6.11-rc1 help VCN power consumption on a lot of
modern products. Can we please take then to 6.10.y so more people can
get the power savings?
commit ecfa23c8df7e ("drm/amdgpu/vcn: identify unified queue in sw init")
commit 7d75ef3736a0 ("drm/amdgpu/vcn: not pause dpg for unified queue")
I've also sent out backports to both 6.6.y and 6.1.y separately.
Thanks!
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: videobuf2: Drop minimum allocation requirement of 2 buffers
Author: Laurent Pinchart <laurent.pinchart+renesas(a)ideasonboard.com>
Date: Mon Aug 26 02:24:49 2024 +0300
When introducing the ability for drivers to indicate the minimum number
of buffers they require an application to allocate, commit 6662edcd32cc
("media: videobuf2: Add min_reqbufs_allocation field to vb2_queue
structure") also introduced a global minimum of 2 buffers. It turns out
this breaks the Renesas R-Car VSP test suite, where a test that
allocates a single buffer fails when two buffers are used.
One may consider debatable whether test suite failures without failures
in production use cases should be considered as a regression, but
operation with a single buffer is a valid use case. While full frame
rate can't be maintained, memory-to-memory devices can still be used
with a decent efficiency, and requiring applications to allocate
multiple buffers for single-shot use cases with capture devices would
just waste memory.
For those reasons, fix the regression by dropping the global minimum of
buffers. Individual drivers can still set their own minimum.
Fixes: 6662edcd32cc ("media: videobuf2: Add min_reqbufs_allocation field to vb2_queue structure")
Cc: stable(a)vger.kernel.org
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas(a)ideasonboard.com>
Reviewed-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Acked-by: Tomasz Figa <tfiga(a)chromium.org>
Link: https://lore.kernel.org/r/20240825232449.25905-1-laurent.pinchart+renesas@i…
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
drivers/media/common/videobuf2/videobuf2-core.c | 7 -------
1 file changed, 7 deletions(-)
---
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index 500a4e0c84ab..29a8d876e6c2 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -2632,13 +2632,6 @@ int vb2_core_queue_init(struct vb2_queue *q)
if (WARN_ON(q->supports_requests && q->min_queued_buffers))
return -EINVAL;
- /*
- * The minimum requirement is 2: one buffer is used
- * by the hardware while the other is being processed by userspace.
- */
- if (q->min_reqbufs_allocation < 2)
- q->min_reqbufs_allocation = 2;
-
/*
* If the driver needs 'min_queued_buffers' in the queue before
* calling start_streaming() then the minimum requirement is
When introducing the ability for drivers to indicate the minimum number
of buffers they require an application to allocate, commit 6662edcd32cc
("media: videobuf2: Add min_reqbufs_allocation field to vb2_queue
structure") also introduced a global minimum of 2 buffers. It turns out
this breaks the Renesas R-Car VSP test suite, where a test that
allocates a single buffer fails when two buffers are used.
One may consider debatable whether test suite failures without failures
in production use cases should be considered as a regression, but
operation with a single buffer is a valid use case. While full frame
rate can't be maintained, memory-to-memory devices can still be used
with a decent efficiency, and requiring applications to allocate
multiple buffers for single-shot use cases with capture devices would
just waste memory.
For those reasons, fix the regression by dropping the global minimum of
buffers. Individual drivers can still set their own minimum.
Fixes: 6662edcd32cc ("media: videobuf2: Add min_reqbufs_allocation field to vb2_queue structure")
Cc: stable(a)vger.kernel.org
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas(a)ideasonboard.com>
---
drivers/media/common/videobuf2/videobuf2-core.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index 500a4e0c84ab..29a8d876e6c2 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -2632,13 +2632,6 @@ int vb2_core_queue_init(struct vb2_queue *q)
if (WARN_ON(q->supports_requests && q->min_queued_buffers))
return -EINVAL;
- /*
- * The minimum requirement is 2: one buffer is used
- * by the hardware while the other is being processed by userspace.
- */
- if (q->min_reqbufs_allocation < 2)
- q->min_reqbufs_allocation = 2;
-
/*
* If the driver needs 'min_queued_buffers' in the queue before
* calling start_streaming() then the minimum requirement is
base-commit: a043ea54bbb975ca9239c69fd17f430488d33522
--
Regards,
Laurent Pinchart
In _emif_get_id(), of_get_address() may return NULL which is later
dereferenced. Fix this bug by adding NULL check.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 86a18ee21e5e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408160935.A6QFliqt-lkp@intel.com/
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v5:
- According to the developer's suggestion, added an inspection of function
of_translate_address(). However, kernel test robot reported a build
warning, so the inspection is removed here, reverting to the modification
solution of patch v3.
Changes in v4:
- added the check of of_translate_address() as suggestions.
Changes in v3:
- added the patch operations omitted in PATCH v2 RESEND compared to PATCH
v2. Sorry for my oversight.
Changes in v2:
- added Cc stable line.
---
drivers/edac/ti_edac.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/edac/ti_edac.c b/drivers/edac/ti_edac.c
index 29723c9592f7..6f3da8d99eab 100644
--- a/drivers/edac/ti_edac.c
+++ b/drivers/edac/ti_edac.c
@@ -207,6 +207,9 @@ static int _emif_get_id(struct device_node *node)
int my_id = 0;
addrp = of_get_address(node, 0, NULL, NULL);
+ if (!addrp)
+ return -EINVAL;
+
my_addr = (u32)of_translate_address(node, addrp);
for_each_matching_node(np, ti_edac_of_match) {
@@ -214,6 +217,9 @@ static int _emif_get_id(struct device_node *node)
continue;
addrp = of_get_address(np, 0, NULL, NULL);
+ if (!addrp)
+ return -EINVAL;
+
addr = (u32)of_translate_address(np, addrp);
edac_printk(KERN_INFO, EDAC_MOD_NAME,
--
2.25.1
With recent work to the doorbell workaround code a small hole was
introduced that could cause a tx_timeout. This happens if the rx
dbell_deadline goes beyond the netdev watchdog timeout set by the driver
(i.e. 2 seconds). Fix this by changing the netdev watchdog timeout to 5
seconds and reduce the max rx dbell_deadline to 4 seconds.
The test that can reproduce the issue being fixed is a multi-queue send
test via pktgen with the "burst" setting to 1. This causes the queue's
doorbell to be rung on every packet sent to the driver, which may result
in the device missing doorbells due to the high doorbell rate.
Cc: stable(a)vger.kernel.org
Fixes: 4ded136c78f8 ("ionic: add work item for missed-doorbell check")
Signed-off-by: Brett Creeley <brett.creeley(a)amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson(a)amd.com>
---
v2:
- Drop budget == 0 patch to expedite getting this patch merged due to
the budget == 0 patch being more complicated than we originally
thought.
v1:
- https://lore.kernel.org/netdev/20240813234122.53083-1-brett.creeley@amd.com/
drivers/net/ethernet/pensando/ionic/ionic_dev.h | 2 +-
drivers/net/ethernet/pensando/ionic/ionic_lif.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_dev.h b/drivers/net/ethernet/pensando/ionic/ionic_dev.h
index c647033f3ad2..f2f07bf88545 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_dev.h
+++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.h
@@ -32,7 +32,7 @@
#define IONIC_ADMIN_DOORBELL_DEADLINE (HZ / 2) /* 500ms */
#define IONIC_TX_DOORBELL_DEADLINE (HZ / 100) /* 10ms */
#define IONIC_RX_MIN_DOORBELL_DEADLINE (HZ / 100) /* 10ms */
-#define IONIC_RX_MAX_DOORBELL_DEADLINE (HZ * 5) /* 5s */
+#define IONIC_RX_MAX_DOORBELL_DEADLINE (HZ * 4) /* 4s */
struct ionic_dev_bar {
void __iomem *vaddr;
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
index aa0cc31dfe6e..86774d9922d8 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
@@ -3220,7 +3220,7 @@ int ionic_lif_alloc(struct ionic *ionic)
netdev->netdev_ops = &ionic_netdev_ops;
ionic_ethtool_set_ops(netdev);
- netdev->watchdog_timeo = 2 * HZ;
+ netdev->watchdog_timeo = 5 * HZ;
netif_carrier_off(netdev);
lif->identity = lid;
--
2.17.1
0x7d and 0x7e bytes are meant to be escaped in the data portion of
frames, but this didn't occur since next_chunk_len() had an off-by-one
error. That also resulted in the final byte of a payload being written
as a separate tty write op.
The chunk prior to an escaped byte would be one byte short, and the
next call would never test the txpos+1 case, which is where the escaped
byte was located. That meant it never hit the escaping case in
mctp_serial_tx_work().
Example Input: 01 00 08 c8 7e 80 02
Previous incorrect chunks from next_chunk_len():
01 00 08
c8 7e 80
02
With this fix:
01 00 08 c8
7e
80 02
Cc: stable(a)vger.kernel.org
Fixes: a0c2ccd9b5ad ("mctp: Add MCTP-over-serial transport binding")
Signed-off-by: Matt Johnston <matt(a)codeconstruct.com.au>
---
drivers/net/mctp/mctp-serial.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mctp/mctp-serial.c b/drivers/net/mctp/mctp-serial.c
index d7db11355909..82890e983847 100644
--- a/drivers/net/mctp/mctp-serial.c
+++ b/drivers/net/mctp/mctp-serial.c
@@ -91,8 +91,8 @@ static int next_chunk_len(struct mctp_serial *dev)
* will be those non-escaped bytes, and does not include the escaped
* byte.
*/
- for (i = 1; i + dev->txpos + 1 < dev->txlen; i++) {
- if (needs_escape(dev->txbuf[dev->txpos + i + 1]))
+ for (i = 1; i + dev->txpos < dev->txlen; i++) {
+ if (needs_escape(dev->txbuf[dev->txpos + i]))
break;
}
In drm_sched_job_init(), commit 56e449603f0a ("drm/sched: Convert the
GPU scheduler to variable number of run-queues") implemented a call to
drm_err(), which uses the job's scheduler pointer as a parameter.
job->sched, however, is not yet valid as it gets set by
drm_sched_job_arm(), which is always called after drm_sched_job_init().
Since the scheduler code has no control over how the API-User has
allocated or set 'job', the pointer's dereference is undefined behavior.
Fix the UB by replacing drm_err() with pr_err().
Cc: <stable(a)vger.kernel.org> # 6.7+
Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Reported-by: Danilo Krummrich <dakr(a)redhat.com>
Closes: https://lore.kernel.org/lkml/20231108022716.15250-1-dakr@redhat.com/
Signed-off-by: Philipp Stanner <pstanner(a)redhat.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7e90c9f95611..356c30fa24a8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -797,7 +797,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
* or worse--a blank screen--leave a trail in the
* logs, so this can be debugged easier.
*/
- drm_err(job->sched, "%s: entity has no rq!\n", __func__);
+ pr_err("*ERROR* %s: entity has no rq!\n", __func__);
return -ENOENT;
}
--
2.46.0
Here is a new batch of fixes for the MPTCP in-kernel path-manager:
Patch 1 ensures the address ID is set to 0 when the path-manager sends
an ADD_ADDR for the address of the initial subflow. The same fix is
applied when a new subflow is created re-using this special address. A
fix for v6.0.
Patch 2 is similar, but for the case where an endpoint is removed: if
this endpoint was used for the initial address, it is important to send
a RM_ADDR with this ID set to 0, and look for existing subflows with the
ID set to 0. A fix for v6.0 as well.
Patch 3 validates the two previous patches.
Patch 4 makes the PM selecting an "active" path to send an address
notification in an ACK, instead of taking the first path in the list. A
fix for v5.11.
Patch 5 fixes skipping the establishment of a new subflow if a previous
subflow using the same pair of addresses is being closed. A fix for
v5.13.
Patch 6 resets the ID linked to the initial subflow when the linked
endpoint is re-added, possibly with a different ID. A fix for v6.0.
Patch 7 validates the three previous patches.
Patch 8 is a small fix for the MPTCP Join selftest, when being used with
older subflows not supporting all MIB counters. A fix for a commit
introduced in v6.4, but backported up to v5.10.
Patch 9 avoids the PM to try to close the initial subflow multiple
times, and increment counters while nothing happened. A fix for v5.10.
Patch 10 stops incrementing local_addr_used and add_addr_accepted
counters when dealing with the address ID 0, because these counters are
not taking into account the initial subflow, and are then not
decremented when the linked addresses are removed. A fix for v6.0.
Patch 11 validates the previous patch.
Patch 12 avoids the PM to send multiple SUB_CLOSED events for the
initial subflow. A fix for v5.12.
Patch 13 validates the previous patch.
Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
in order to re-create the initial subflow if it has been closed, even if
the limit for *new* addresses -- not taking into account the address of
the initial subflow -- has been reached. A fix for v5.10.
Patch 15 validates the previous patch.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (15):
mptcp: pm: reuse ID 0 after delete and re-add
mptcp: pm: fix RM_ADDR ID for the initial subflow
selftests: mptcp: join: check removing ID 0 endpoint
mptcp: pm: send ACK on an active subflow
mptcp: pm: skip connecting to already established sf
mptcp: pm: reset MPC endp ID when re-added
selftests: mptcp: join: check re-adding init endp with != id
selftests: mptcp: join: no extra msg if no counter
mptcp: pm: do not remove already closed subflows
mptcp: pm: fix ID 0 endp usage after multiple re-creations
selftests: mptcp: join: check re-re-adding ID 0 endp
mptcp: avoid duplicated SUB_CLOSED events
selftests: mptcp: join: validate event numbers
mptcp: pm: ADD_ADDR 0 is not a new address
selftests: mptcp: join: check re-re-adding ID 0 signal
net/mptcp/pm.c | 4 +-
net/mptcp/pm_netlink.c | 87 ++++++++++----
net/mptcp/protocol.c | 6 +
net/mptcp/protocol.h | 5 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 149 ++++++++++++++++++++----
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 4 +
6 files changed, 207 insertions(+), 48 deletions(-)
---
base-commit: 8af174ea863c72f25ce31cee3baad8a301c0cf0f
change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
From: Anirudh Rayabharam (Microsoft) <anirudh(a)anirudhrb.com>
9636be85cc5b ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go
online/offline") introduces a new cpuhp state for hyperv initialization.
cpuhp_setup_state() returns the state number if state is CPUHP_AP_ONLINE_DYN
or CPUHP_BP_PREPARE_DYN and 0 for all other states. For the hyperv case,
since a new cpuhp state was introduced it would return 0. However,
in hv_machine_shutdown(), the cpuhp_remove_state() call is conditioned upon
"hyperv_init_cpuhp > 0". This will never be true and so hv_cpu_die() won't be
called on all CPUs. This means the VP assist page won't be reset. When the
kexec kernel tries to setup the VP assist page again, the hypervisor corrupts
the memory region of the old VP assist page causing a panic in case the kexec
kernel is using that memory elsewhere. This was originally fixed in dfe94d4086e4
("x86/hyperv: Fix kexec panic/hang issues").
Set hyperv_init_cpuhp to CPUHP_AP_HYPERV_ONLINE upon successful setup so that
the hyperv cpuhp state is removed correctly on kexec and the necessary cleanup
takes place.
Cc: stable(a)vger.kernel.org
Fixes: 9636be85cc5b ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline")
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh(a)anirudhrb.com>
---
arch/x86/hyperv/hv_init.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 17a71e92a343..81d1981a75d1 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -607,7 +607,7 @@ void __init hyperv_init(void)
register_syscore_ops(&hv_syscore_ops);
- hyperv_init_cpuhp = cpuhp;
+ hyperv_init_cpuhp = CPUHP_AP_HYPERV_ONLINE;
if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
hv_get_partition_id();
@@ -637,7 +637,7 @@ void __init hyperv_init(void)
clean_guest_os_id:
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
- cpuhp_remove_state(cpuhp);
+ cpuhp_remove_state(CPUHP_AP_HYPERV_ONLINE);
free_ghcb_page:
free_percpu(hv_ghcb_pg);
free_vp_assist_page:
--
2.45.2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082742-getting-scoured-1475@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6c3fc0b1c3d0 ("igc: Fix qbv tx latency by setting gtxoffset")
790835fcc0cb ("igc: Correct the launchtime offset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464 Mon Sep 17 00:00:00 2001
From: Faizal Rahim <faizal.abdul.rahim(a)linux.intel.com>
Date: Sun, 7 Jul 2024 08:53:18 -0400
Subject: [PATCH] igc: Fix qbv tx latency by setting gtxoffset
A large tx latency issue was discovered during testing when only QBV was
enabled. The issue occurs because gtxoffset was not set when QBV is
active, it was only set when launch time is active.
The patch "igc: Correct the launchtime offset" only sets gtxoffset when
the launchtime_enable field is set by the user. Enabling launchtime_enable
ultimately sets the register IGC_TXQCTL_QUEUE_MODE_LAUNCHT (referred to as
LaunchT in the SW user manual).
Section 7.5.2.6 of the IGC i225/6 SW User Manual Rev 1.2.4 states:
"The latency between transmission scheduling (launch time) and the
time the packet is transmitted to the network is listed in Table 7-61."
However, the patch misinterprets the phrase "launch time" in that section
by assuming it specifically refers to the LaunchT register, whereas it
actually denotes the generic term for when a packet is released from the
internal buffer to the MAC transmit logic.
This launch time, as per that section, also implicitly refers to the QBV
gate open time, where a packet waits in the buffer for the QBV gate to
open. Therefore, latency applies whenever QBV is in use. TSN features such
as QBU and QAV reuse QBV, making the latency universal to TSN features.
Discussed with i226 HW owner (Shalev, Avi) and we were in agreement that
the term "launch time" used in Section 7.5.2.6 is not clear and can be
easily misinterpreted. Avi will update this section to:
"When TQAVCTRL.TRANSMIT_MODE = TSN, the latency between transmission
scheduling and the time the packet is transmitted to the network is listed
in Table 7-61."
Fix this issue by using igc_tsn_is_tx_mode_in_tsn() as a condition to
write to gtxoffset, aligning with the newly updated SW User Manual.
Tested:
1. Enrol taprio on talker board
base-time 0
cycle-time 1000000
flags 0x2
index 0 cmd S gatemask 0x1 interval1
index 0 cmd S gatemask 0x1 interval2
Note:
interval1 = interval for a 64 bytes packet to go through
interval2 = cycle-time - interval1
2. Take tcpdump on listener board
3. Use udp tai app on talker to send packets to listener
4. Check the timestamp on listener via wireshark
Test Result:
100 Mbps: 113 ~193 ns
1000 Mbps: 52 ~ 84 ns
2500 Mbps: 95 ~ 223 ns
Note that the test result is similar to the patch "igc: Correct the
launchtime offset".
Fixes: 790835fcc0cb ("igc: Correct the launchtime offset")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim(a)linux.intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Acked-by: Vinicius Costa Gomes <vinicius.gomes(a)intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index ada751430517..d68fa7f3d5f0 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -61,7 +61,7 @@ void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter)
struct igc_hw *hw = &adapter->hw;
u16 txoffset;
- if (!is_any_launchtime(adapter))
+ if (!igc_tsn_is_tx_mode_in_tsn(adapter))
return;
switch (adapter->link_speed) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082740-citadel-facelift-bc00@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
6c3fc0b1c3d0 ("igc: Fix qbv tx latency by setting gtxoffset")
790835fcc0cb ("igc: Correct the launchtime offset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464 Mon Sep 17 00:00:00 2001
From: Faizal Rahim <faizal.abdul.rahim(a)linux.intel.com>
Date: Sun, 7 Jul 2024 08:53:18 -0400
Subject: [PATCH] igc: Fix qbv tx latency by setting gtxoffset
A large tx latency issue was discovered during testing when only QBV was
enabled. The issue occurs because gtxoffset was not set when QBV is
active, it was only set when launch time is active.
The patch "igc: Correct the launchtime offset" only sets gtxoffset when
the launchtime_enable field is set by the user. Enabling launchtime_enable
ultimately sets the register IGC_TXQCTL_QUEUE_MODE_LAUNCHT (referred to as
LaunchT in the SW user manual).
Section 7.5.2.6 of the IGC i225/6 SW User Manual Rev 1.2.4 states:
"The latency between transmission scheduling (launch time) and the
time the packet is transmitted to the network is listed in Table 7-61."
However, the patch misinterprets the phrase "launch time" in that section
by assuming it specifically refers to the LaunchT register, whereas it
actually denotes the generic term for when a packet is released from the
internal buffer to the MAC transmit logic.
This launch time, as per that section, also implicitly refers to the QBV
gate open time, where a packet waits in the buffer for the QBV gate to
open. Therefore, latency applies whenever QBV is in use. TSN features such
as QBU and QAV reuse QBV, making the latency universal to TSN features.
Discussed with i226 HW owner (Shalev, Avi) and we were in agreement that
the term "launch time" used in Section 7.5.2.6 is not clear and can be
easily misinterpreted. Avi will update this section to:
"When TQAVCTRL.TRANSMIT_MODE = TSN, the latency between transmission
scheduling and the time the packet is transmitted to the network is listed
in Table 7-61."
Fix this issue by using igc_tsn_is_tx_mode_in_tsn() as a condition to
write to gtxoffset, aligning with the newly updated SW User Manual.
Tested:
1. Enrol taprio on talker board
base-time 0
cycle-time 1000000
flags 0x2
index 0 cmd S gatemask 0x1 interval1
index 0 cmd S gatemask 0x1 interval2
Note:
interval1 = interval for a 64 bytes packet to go through
interval2 = cycle-time - interval1
2. Take tcpdump on listener board
3. Use udp tai app on talker to send packets to listener
4. Check the timestamp on listener via wireshark
Test Result:
100 Mbps: 113 ~193 ns
1000 Mbps: 52 ~ 84 ns
2500 Mbps: 95 ~ 223 ns
Note that the test result is similar to the patch "igc: Correct the
launchtime offset".
Fixes: 790835fcc0cb ("igc: Correct the launchtime offset")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim(a)linux.intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Acked-by: Vinicius Costa Gomes <vinicius.gomes(a)intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index ada751430517..d68fa7f3d5f0 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -61,7 +61,7 @@ void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter)
struct igc_hw *hw = &adapter->hw;
u16 txoffset;
- if (!is_any_launchtime(adapter))
+ if (!igc_tsn_is_tx_mode_in_tsn(adapter))
return;
switch (adapter->link_speed) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ce61b605a00502c59311d0a4b1f58d62b48272d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082604-depose-iphone-7d55@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ce61b605a005 ("ksmbd: the buffer of smb2 query dir response has at least 1 byte")
e2b76ab8b5c9 ("ksmbd: add support for read compound")
e202a1e8634b ("ksmbd: no response from compound read")
7b7d709ef7cf ("ksmbd: add missing compound request handing in some commands")
81a94b27847f ("ksmbd: use kvzalloc instead of kvmalloc")
38c8a9a52082 ("smb: move client and server files to common directory fs/smb")
30210947a343 ("ksmbd: fix racy issue under cocurrent smb2 tree disconnect")
abcc506a9a71 ("ksmbd: fix racy issue from smb2 close and logoff with multichannel")
ea174a918939 ("ksmbd: destroy expired sessions")
f5c779b7ddbd ("ksmbd: fix racy issue from session setup and logoff")
74d7970febf7 ("ksmbd: fix racy issue from using ->d_parent and ->d_name")
34e8ccf9ce24 ("ksmbd: set NegotiateContextCount once instead of every inc")
42bc6793e452 ("Merge tag 'pull-lock_rename_child' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into ksmbd-for-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ce61b605a00502c59311d0a4b1f58d62b48272d0 Mon Sep 17 00:00:00 2001
From: Namjae Jeon <linkinjeon(a)kernel.org>
Date: Tue, 20 Aug 2024 22:07:38 +0900
Subject: [PATCH] ksmbd: the buffer of smb2 query dir response has at least 1
byte
When STATUS_NO_MORE_FILES status is set to smb2 query dir response,
->StructureSize is set to 9, which mean buffer has 1 byte.
This issue occurs because ->Buffer[1] in smb2_query_directory_rsp to
flex-array.
Fixes: eb3e28c1e89b ("smb3: Replace smb2pdu 1-element arrays with flex-arrays")
Cc: stable(a)vger.kernel.org # v6.1+
Signed-off-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index 0bc9edf22ba4..e9204180919e 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -4409,7 +4409,8 @@ int smb2_query_dir(struct ksmbd_work *work)
rsp->OutputBufferLength = cpu_to_le32(0);
rsp->Buffer[0] = 0;
rc = ksmbd_iov_pin_rsp(work, (void *)rsp,
- sizeof(struct smb2_query_directory_rsp));
+ offsetof(struct smb2_query_directory_rsp, Buffer)
+ + 1);
if (rc)
goto err_out;
} else {
Commit e882575efc77 ("spi: rockchip: Suspend and resume the bus during
NOIRQ_SYSTEM_SLEEP_PM ops") stopped respecting runtime PM status and
simply disabled clocks unconditionally when suspending the system. This
causes problems when the device is already runtime suspended when we go
to sleep -- in which case we double-disable clocks and produce a
WARNing.
Switch back to pm_runtime_force_{suspend,resume}(), because that still
seems like the right thing to do, and the aforementioned commit makes no
explanation why it stopped using it.
Also, refactor some of the resume() error handling, because it's not
actually a good idea to re-disable clocks on failure.
Fixes: e882575efc77 ("spi: rockchip: Suspend and resume the bus during NOIRQ_SYSTEM_SLEEP_PM ops")
Cc: <stable(a)vger.kernel.org>
Reported-by: "Ondřej Jirman" <megi(a)xff.cz>
Closes: https://lore.kernel.org/lkml/20220621154218.sau54jeij4bunf56@core/
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/spi/spi-rockchip.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/drivers/spi/spi-rockchip.c b/drivers/spi/spi-rockchip.c
index e1ecd96c7858..f30af4316b8b 100644
--- a/drivers/spi/spi-rockchip.c
+++ b/drivers/spi/spi-rockchip.c
@@ -951,8 +951,11 @@ static int rockchip_spi_suspend(struct device *dev)
if (ret < 0)
return ret;
- clk_disable_unprepare(rs->spiclk);
- clk_disable_unprepare(rs->apb_pclk);
+ ret = pm_runtime_force_suspend(dev);
+ if (ret < 0) {
+ spi_controller_resume(ctlr);
+ return ret;
+ }
pinctrl_pm_select_sleep_state(dev);
@@ -967,21 +970,11 @@ static int rockchip_spi_resume(struct device *dev)
pinctrl_pm_select_default_state(dev);
- ret = clk_prepare_enable(rs->apb_pclk);
+ ret = pm_runtime_force_resume(dev);
if (ret < 0)
return ret;
- ret = clk_prepare_enable(rs->spiclk);
- if (ret < 0)
- clk_disable_unprepare(rs->apb_pclk);
-
- ret = spi_controller_resume(ctlr);
- if (ret < 0) {
- clk_disable_unprepare(rs->spiclk);
- clk_disable_unprepare(rs->apb_pclk);
- }
-
- return 0;
+ return spi_controller_resume(ctlr);
}
#endif /* CONFIG_PM_SLEEP */
--
2.46.0.295.g3b9ea8a38a-goog
Do not validate format equality for the non 3d cases to allow xrgb to
argb copies and make sure the dx binding flags are only used
on dx compatible surfaces.
Fixes basic 2d kms setup on configurations without 3d. There's little
practical benefit to it because kms framebuffer coherence is disabled
on configurations without 3d but with those changes the code actually
makes sense.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.9+
Cc: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Cc: Martin Krastev <martin.krastev(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 9 ---------
drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 9 ++++++---
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 288ed0bb75cb..b5fc5a9e123a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -1339,15 +1339,6 @@ static int vmw_kms_new_framebuffer_surface(struct vmw_private *dev_priv,
return -EINVAL;
}
- /*
- * For DX, surface format validation is done when surface->scanout
- * is set.
- */
- if (!has_sm4_context(dev_priv) && format != surface->metadata.format) {
- DRM_ERROR("Invalid surface format for requested mode.\n");
- return -EINVAL;
- }
-
vfbs = kzalloc(sizeof(*vfbs), GFP_KERNEL);
if (!vfbs) {
ret = -ENOMEM;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
index 1625b30d9970..5721c74da3e0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -2276,9 +2276,12 @@ int vmw_dumb_create(struct drm_file *file_priv,
const struct SVGA3dSurfaceDesc *desc = vmw_surface_get_desc(format);
SVGA3dSurfaceAllFlags flags = SVGA3D_SURFACE_HINT_TEXTURE |
SVGA3D_SURFACE_HINT_RENDERTARGET |
- SVGA3D_SURFACE_SCREENTARGET |
- SVGA3D_SURFACE_BIND_SHADER_RESOURCE |
- SVGA3D_SURFACE_BIND_RENDER_TARGET;
+ SVGA3D_SURFACE_SCREENTARGET;
+
+ if (vmw_surface_is_dx_screen_target_format(format)) {
+ flags |= SVGA3D_SURFACE_BIND_SHADER_RESOURCE |
+ SVGA3D_SURFACE_BIND_RENDER_TARGET;
+ }
/*
* Without mob support we're just going to use raw memory buffer
--
2.43.0
From: ysay <ysaydong(a)gmail.com>
When guest_halt_poll_allow_shrink=N,setting guest_halt_poll_ns
from a large value to 0 does not reset the CPU polling time,
despite guest_halt_poll_ns being intended as a mandatory maximum
time limit.
The problem was situated in the adjust_poll_limit() within
drivers/cpuidle/governors/haltpoll.c:79.
Specifically, when guest_halt_poll_allow_shrink was set to N,
resetting guest_halt_poll_ns to zero did not lead to executing any
section of code that adjusts dev->poll_limit_ns.
The issue has been resolved by relocating the check and assignment for
dev->poll_limit_ns outside of the conditional block.
This ensures that every modification to guest_halt_poll_ns
properly influences the CPU polling time.
Signed-off-by: ysay <ysaydong(a)gmail.com>
Fixes: 2cffe9f6b96f ("cpuidle: add haltpoll governor")
---
drivers/cpuidle/governors/haltpoll.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/cpuidle/governors/haltpoll.c b/drivers/cpuidle/governors/haltpoll.c
index 663b7f164..99c6260d7 100644
--- a/drivers/cpuidle/governors/haltpoll.c
+++ b/drivers/cpuidle/governors/haltpoll.c
@@ -78,26 +78,22 @@ static int haltpoll_select(struct cpuidle_driver *drv,
static void adjust_poll_limit(struct cpuidle_device *dev, u64 block_ns)
{
- unsigned int val;
+ unsigned int val = dev->poll_limit_ns;
/* Grow cpu_halt_poll_us if
* cpu_halt_poll_us < block_ns < guest_halt_poll_us
*/
if (block_ns > dev->poll_limit_ns && block_ns <= guest_halt_poll_ns) {
- val = dev->poll_limit_ns * guest_halt_poll_grow;
+ val *= guest_halt_poll_grow;
if (val < guest_halt_poll_grow_start)
val = guest_halt_poll_grow_start;
- if (val > guest_halt_poll_ns)
- val = guest_halt_poll_ns;
trace_guest_halt_poll_ns_grow(val, dev->poll_limit_ns);
- dev->poll_limit_ns = val;
} else if (block_ns > guest_halt_poll_ns &&
guest_halt_poll_allow_shrink) {
unsigned int shrink = guest_halt_poll_shrink;
- val = dev->poll_limit_ns;
if (shrink == 0) {
val = 0;
} else {
@@ -108,8 +104,12 @@ static void adjust_poll_limit(struct cpuidle_device *dev, u64 block_ns)
}
trace_guest_halt_poll_ns_shrink(val, dev->poll_limit_ns);
- dev->poll_limit_ns = val;
}
+
+ if (val > guest_halt_poll_ns)
+ val = guest_halt_poll_ns;
+
+ dev->poll_limit_ns = val;
}
/**
--
2.43.5
To check if mmc cqe is in halt state, need to check set/clear of CQHCI_HALT
bit. At this time, we need to check with &, not &&. Therefore, code to
check whether cqe is in halt state is modified to cqhci_halted, which has
already been implemented.
Fixes: 0653300224a6 ("mmc: cqhci: rename cqhci.c to cqhci-core.c")
Cc: stable(a)vger.kernel.org
Signed-off-by: Seunghwan Baek <sh8267.baek(a)samsung.com>
---
drivers/mmc/host/cqhci-core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index c14d7251d0bb..3d5bcb92c78e 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -282,7 +282,7 @@ static void __cqhci_enable(struct cqhci_host *cq_host)
cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
- if (cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT)
+ if (cqhci_halted(cq_host))
cqhci_writel(cq_host, 0, CQHCI_CTL);
mmc->cqe_on = true;
@@ -617,7 +617,7 @@ static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
cqhci_writel(cq_host, 0, CQHCI_CTL);
mmc->cqe_on = true;
pr_debug("%s: cqhci: CQE on\n", mmc_hostname(mmc));
- if (cqhci_readl(cq_host, CQHCI_CTL) && CQHCI_HALT) {
+ if (cqhci_halted(cq_host)) {
pr_err("%s: cqhci: CQE failed to exit halt state\n",
mmc_hostname(mmc));
}
--
2.17.1
create_elf_fdpic_tables() does not correctly account the space for the
AUX vector when an architecture has ELF_HWCAP2 defined. Prior to the
commit 10e29251be0e ("binfmt_elf_fdpic: fix /proc/<pid>/auxv") it
resulted in the last entry of the AUX vector being set to zero, but with
that change it results in a kernel BUG.
Fix that by adding one to the number of AUXV entries (nitems) when
ELF_HWCAP2 is defined.
Fixes: 10e29251be0e ("binfmt_elf_fdpic: fix /proc/<pid>/auxv")
Cc: stable(a)vger.kernel.org
Reported-by: Greg Ungerer <gregungerer(a)westnet.com.au>
Closes: https://lore.kernel.org/lkml/5b51975f-6d0b-413c-8b38-39a6a45e8821@westnet.c…
Signed-off-by: Max Filippov <jcmvbkbc(a)gmail.com>
---
fs/binfmt_elf_fdpic.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index c11289e1301b..a5cb45cb30c8 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -594,6 +594,9 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
if (bprm->have_execfd)
nitems++;
+#ifdef ELF_HWCAP2
+ nitems++;
+#endif
csp = sp;
sp -= nitems * 2 * sizeof(unsigned long);
--
2.39.2
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: b6fb565a2d15277896583d471b21bc14a0c99661
Gitweb: https://git.kernel.org/tip/b6fb565a2d15277896583d471b21bc14a0c99661
Author: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
AuthorDate: Mon, 26 Aug 2024 15:53:04 +03:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Mon, 26 Aug 2024 12:45:19 -07:00
x86/tdx: Fix data leak in mmio_read()
The mmio_read() function makes a TDVMCALL to retrieve MMIO data for an
address from the VMM.
Sean noticed that mmio_read() unintentionally exposes the value of an
initialized variable (val) on the stack to the VMM.
This variable is only needed as an output value. It did not need to be
passed to the VMM in the first place.
Do not send the original value of *val to the VMM.
[ dhansen: clarify what 'val' is used for. ]
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240826125304.1566719-1-kirill.shutemov%40linu…
---
arch/x86/coco/tdx/tdx.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 078e2ba..da8b66d 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -389,7 +389,6 @@ static bool mmio_read(int size, unsigned long addr, unsigned long *val)
.r12 = size,
.r13 = EPT_READ,
.r14 = addr,
- .r15 = *val,
};
if (__tdx_hypercall(&args))
The mmio_read() function makes a TDVMCALL to retrieve MMIO data for an
address from the VMM.
Sean noticed that mmio_read() unintentionally exposes the value of an
initialized variable on the stack to the VMM.
Do not send the original value of *val to the VMM.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reported-by: Sean Christopherson <seanjc(a)google.com>
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Cc: stable(a)vger.kernel.org # v5.19+
---
arch/x86/coco/tdx/tdx.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 078e2bac2553..da8b66dce0da 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -389,7 +389,6 @@ static bool mmio_read(int size, unsigned long addr, unsigned long *val)
.r12 = size,
.r13 = EPT_READ,
.r14 = addr,
- .r15 = *val,
};
if (__tdx_hypercall(&args))
--
2.43.0
From: ysay <ysaydong(a)gmail.com>
When guest_halt_poll_allow_shrink=N,setting guest_halt_poll_ns
from a large value to 0 does not reset the CPU polling time,
despite guest_halt_poll_ns being intended as a mandatory maximum
time limit.
The problem was situated in the adjust_poll_limit() within
drivers/cpuidle/governors/haltpoll.c:79.
Specifically, when guest_halt_poll_allow_shrink was set to N,
resetting guest_halt_poll_ns to zero did not lead to executing any
section of code that adjusts dev->poll_limit_ns.
The issue has been resolved by relocating the check and assignment for
dev->poll_limit_ns outside of the conditional block.
This ensures that every modification to guest_halt_poll_ns
properly influences the CPU polling time.
Signed-off-by: ysay <ysaydong(a)gmail.com>
---
drivers/cpuidle/governors/haltpoll.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/cpuidle/governors/haltpoll.c b/drivers/cpuidle/governors/haltpoll.c
index 663b7f164..99c6260d7 100644
--- a/drivers/cpuidle/governors/haltpoll.c
+++ b/drivers/cpuidle/governors/haltpoll.c
@@ -78,26 +78,22 @@ static int haltpoll_select(struct cpuidle_driver *drv,
static void adjust_poll_limit(struct cpuidle_device *dev, u64 block_ns)
{
- unsigned int val;
+ unsigned int val = dev->poll_limit_ns;
/* Grow cpu_halt_poll_us if
* cpu_halt_poll_us < block_ns < guest_halt_poll_us
*/
if (block_ns > dev->poll_limit_ns && block_ns <= guest_halt_poll_ns) {
- val = dev->poll_limit_ns * guest_halt_poll_grow;
+ val *= guest_halt_poll_grow;
if (val < guest_halt_poll_grow_start)
val = guest_halt_poll_grow_start;
- if (val > guest_halt_poll_ns)
- val = guest_halt_poll_ns;
trace_guest_halt_poll_ns_grow(val, dev->poll_limit_ns);
- dev->poll_limit_ns = val;
} else if (block_ns > guest_halt_poll_ns &&
guest_halt_poll_allow_shrink) {
unsigned int shrink = guest_halt_poll_shrink;
- val = dev->poll_limit_ns;
if (shrink == 0) {
val = 0;
} else {
@@ -108,8 +104,12 @@ static void adjust_poll_limit(struct cpuidle_device *dev, u64 block_ns)
}
trace_guest_halt_poll_ns_shrink(val, dev->poll_limit_ns);
- dev->poll_limit_ns = val;
}
+
+ if (val > guest_halt_poll_ns)
+ val = guest_halt_poll_ns;
+
+ dev->poll_limit_ns = val;
}
/**
--
2.43.5
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: eb786ee1390b8fbba633c01a971709c6906fd8bf
Gitweb: https://git.kernel.org/tip/eb786ee1390b8fbba633c01a971709c6906fd8bf
Author: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
AuthorDate: Mon, 26 Aug 2024 15:53:04 +03:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Mon, 26 Aug 2024 07:04:09 -07:00
x86/tdx: Fix data leak in mmio_read()
The mmio_read() function makes a TDVMCALL to retrieve MMIO data for an
address from the VMM.
Sean noticed that mmio_read() unintentionally exposes the value of an
initialized variable on the stack to the VMM.
Do not send the original value of *val to the VMM.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240826125304.1566719-1-kirill.shutemov%40linu…
---
arch/x86/coco/tdx/tdx.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 078e2ba..da8b66d 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -389,7 +389,6 @@ static bool mmio_read(int size, unsigned long addr, unsigned long *val)
.r12 = size,
.r13 = EPT_READ,
.r14 = addr,
- .r15 = *val,
};
if (__tdx_hypercall(&args))
commit 2347961b11d4079deace3c81dceed460c08a8fc1
I would like to propose this commit from 5.12 for stable,
or rather, ask whether it’s a candidate and leave the exact
picking/backporting to the experts (if it has other commits
as prerequisites and/or later fixups).
This is because qemu-user needs it, and it arrived just too
late for the current Debian LTS kernel (5.10), and qemu-user
in Debian until yesterday had a workaround, but now doesn’t
because it’s in the stable kernel (6.1), so qemu-user-static
cannot be just used as-is on Debian LTS any more.
I’d like it to be applied to 5.10 (obviously), but perhaps
others would appreciate more broad coverage.
Thanks in advance,
//mirabilos
--
„Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund,
mksh auf jedem System zu installieren.“
-- XTaran auf der OpenRheinRuhr, ganz begeistert
(EN: “[…]uhr.gz is a reason to install mksh on every system.”)
There is a race condition generic_shutdown_super() and
__btrfs_run_defrag_inode().
Consider the following scenario:
umount thread: btrfs-cleaner thread:
btrfs_run_delayed_iputs()
->run_delayed_iput_locked()
->iput(inode)
// Here the inode (ie ino 261) will be cleared and freed
btrfs_kill_super()
->generic_shutdown_super()
btrfs_run_defrag_inodes()
->__btrfs_run_defrag_inode()
->btrfs_iget(ino)
// The inode 261 was recreated with i_count=1
// and added to the sb list
->evict_inodes(sb) // After some work
// inode 261 was added ->iput(inode)
// to the dispose list ->iput_funal()
->evict(inode) ->evict(inode)
Now, we have two threads simultaneously evicting
the same inode, which led to a bug.
The above behavior can be confirmed by the log I added for debugging
and the log printed when BUG was triggered.
Due to space limitations, I cannot paste the full diff and here is a
brief describtion.
First, within __btrfs_run_defrag_inode(), set inode->i_state |= (1<<19)
just before calling iput().
Within the dispose_list(), check the flag, if the flag was set, then
pr_info("bug! double evict! crash will happen! state is 0x%lx\n", inode->i_state);
Here is the printed log when the BUG was triggered:
[ 190.686726][ T2336] bug! double evict! crash will happen! state is 0x80020
[ 190.687647][ T2336] ------------[ cut here ]------------
[ 190.688294][ T2336] kernel BUG at fs/inode.c:626!
[ 190.688939][ T2336] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 190.689792][ T2336] CPU: 1 PID: 2336 Comm: a.out Not tainted 6.10.0-rc2-00223-g0c529ab65ef8-dirty #109
[ 190.690894][ T2336] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 190.692111][ T2336] RIP: 0010:clear_inode+0x15b/0x190
// some logs...
[ 190.704501][ T2336] btrfs_evict_inode+0x529/0xe80
[ 190.706966][ T2336] evict+0x2ed/0x6c0
[ 190.707209][ T2336] dispose_list+0x62/0x260
[ 190.707490][ T2336] evict_inodes+0x34e/0x450
To prevent this behavior, we need to set BTRFS_FS_CLOSING_START
before kill_anon_super() to ensure that
btrfs_run_defrag_inodes() doesn't continue working after unmount.
Reported-and-tested-by: syzbot+67ba3c42bcbb4665d3ad(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad
CC: stable(a)vger.kernel.org
Fixes: c146afad2c7f ("Btrfs: mount ro and remount support")
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/btrfs/super.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f05cce7c8b8d..f7e87fe583ab 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2093,6 +2093,7 @@ static int btrfs_get_tree(struct fs_context *fc)
static void btrfs_kill_super(struct super_block *sb)
{
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+ set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
kill_anon_super(sb);
btrfs_free_fs_info(fs_info);
}
--
2.39.2
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c0a1ef9c5be72ff28a5413deb1b3e1a066593c13
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082600-trodden-majesty-7be0@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
c0a1ef9c5be7 ("thermal: of: Fix OF node leak in of_thermal_zone_find() error paths")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0a1ef9c5be72ff28a5413deb1b3e1a066593c13 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Date: Wed, 14 Aug 2024 21:58:23 +0200
Subject: [PATCH] thermal: of: Fix OF node leak in of_thermal_zone_find() error
paths
Terminating for_each_available_child_of_node() loop requires dropping OF
node reference, so bailing out on errors misses this. Solve the OF node
reference leak with scoped for_each_available_child_of_node_scoped().
Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Reviewed-by: Chen-Yu Tsai <wenst(a)chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://patch.msgid.link/20240814195823.437597-3-krzysztof.kozlowski@linaro…
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index b08a9b64718d..1f252692815a 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -184,14 +184,14 @@ static struct device_node *of_thermal_zone_find(struct device_node *sensor, int
* Search for each thermal zone, a defined sensor
* corresponding to the one passed as parameter
*/
- for_each_available_child_of_node(np, tz) {
+ for_each_available_child_of_node_scoped(np, child) {
int count, i;
- count = of_count_phandle_with_args(tz, "thermal-sensors",
+ count = of_count_phandle_with_args(child, "thermal-sensors",
"#thermal-sensor-cells");
if (count <= 0) {
- pr_err("%pOFn: missing thermal sensor\n", tz);
+ pr_err("%pOFn: missing thermal sensor\n", child);
tz = ERR_PTR(-EINVAL);
goto out;
}
@@ -200,18 +200,19 @@ static struct device_node *of_thermal_zone_find(struct device_node *sensor, int
int ret;
- ret = of_parse_phandle_with_args(tz, "thermal-sensors",
+ ret = of_parse_phandle_with_args(child, "thermal-sensors",
"#thermal-sensor-cells",
i, &sensor_specs);
if (ret < 0) {
- pr_err("%pOFn: Failed to read thermal-sensors cells: %d\n", tz, ret);
+ pr_err("%pOFn: Failed to read thermal-sensors cells: %d\n", child, ret);
tz = ERR_PTR(ret);
goto out;
}
if ((sensor == sensor_specs.np) && id == (sensor_specs.args_count ?
sensor_specs.args[0] : 0)) {
- pr_debug("sensor %pOFn id=%d belongs to %pOFn\n", sensor, id, tz);
+ pr_debug("sensor %pOFn id=%d belongs to %pOFn\n", sensor, id, child);
+ tz = no_free_ptr(child);
goto out;
}
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c0a1ef9c5be72ff28a5413deb1b3e1a066593c13
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082659-veto-ladies-d1b3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
c0a1ef9c5be7 ("thermal: of: Fix OF node leak in of_thermal_zone_find() error paths")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0a1ef9c5be72ff28a5413deb1b3e1a066593c13 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Date: Wed, 14 Aug 2024 21:58:23 +0200
Subject: [PATCH] thermal: of: Fix OF node leak in of_thermal_zone_find() error
paths
Terminating for_each_available_child_of_node() loop requires dropping OF
node reference, so bailing out on errors misses this. Solve the OF node
reference leak with scoped for_each_available_child_of_node_scoped().
Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Reviewed-by: Chen-Yu Tsai <wenst(a)chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://patch.msgid.link/20240814195823.437597-3-krzysztof.kozlowski@linaro…
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index b08a9b64718d..1f252692815a 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -184,14 +184,14 @@ static struct device_node *of_thermal_zone_find(struct device_node *sensor, int
* Search for each thermal zone, a defined sensor
* corresponding to the one passed as parameter
*/
- for_each_available_child_of_node(np, tz) {
+ for_each_available_child_of_node_scoped(np, child) {
int count, i;
- count = of_count_phandle_with_args(tz, "thermal-sensors",
+ count = of_count_phandle_with_args(child, "thermal-sensors",
"#thermal-sensor-cells");
if (count <= 0) {
- pr_err("%pOFn: missing thermal sensor\n", tz);
+ pr_err("%pOFn: missing thermal sensor\n", child);
tz = ERR_PTR(-EINVAL);
goto out;
}
@@ -200,18 +200,19 @@ static struct device_node *of_thermal_zone_find(struct device_node *sensor, int
int ret;
- ret = of_parse_phandle_with_args(tz, "thermal-sensors",
+ ret = of_parse_phandle_with_args(child, "thermal-sensors",
"#thermal-sensor-cells",
i, &sensor_specs);
if (ret < 0) {
- pr_err("%pOFn: Failed to read thermal-sensors cells: %d\n", tz, ret);
+ pr_err("%pOFn: Failed to read thermal-sensors cells: %d\n", child, ret);
tz = ERR_PTR(ret);
goto out;
}
if ((sensor == sensor_specs.np) && id == (sensor_specs.args_count ?
sensor_specs.args[0] : 0)) {
- pr_debug("sensor %pOFn id=%d belongs to %pOFn\n", sensor, id, tz);
+ pr_debug("sensor %pOFn id=%d belongs to %pOFn\n", sensor, id, child);
+ tz = no_free_ptr(child);
goto out;
}
}
Most firmware names are hardcoded strings, or are constructed from fairly
constrained format strings where the dynamic parts are just some hex
numbers or such.
However, there are a couple codepaths in the kernel where firmware file
names contain string components that are passed through from a device or
semi-privileged userspace; the ones I could find (not counting interfaces
that require root privileges) are:
- lpfc_sli4_request_firmware_update() seems to construct the firmware
filename from "ModelName", a string that was previously parsed out of
some descriptor ("Vital Product Data") in lpfc_fill_vpd()
- nfp_net_fw_find() seems to construct a firmware filename from a model
name coming from nfp_hwinfo_lookup(pf->hwinfo, "nffw.partno"), which I
think parses some descriptor that was read from the device.
(But this case likely isn't exploitable because the format string looks
like "netronome/nic_%s", and there shouldn't be any *folders* starting
with "netronome/nic_". The previous case was different because there,
the "%s" is *at the start* of the format string.)
- module_flash_fw_schedule() is reachable from the
ETHTOOL_MSG_MODULE_FW_FLASH_ACT netlink command, which is marked as
GENL_UNS_ADMIN_PERM (meaning CAP_NET_ADMIN inside a user namespace is
enough to pass the privilege check), and takes a userspace-provided
firmware name.
(But I think to reach this case, you need to have CAP_NET_ADMIN over a
network namespace that a special kind of ethernet device is mapped into,
so I think this is not a viable attack path in practice.)
Fix it by rejecting any firmware names containing ".." path components.
For what it's worth, I went looking and haven't found any USB device
drivers that use the firmware loader dangerously.
Cc: stable(a)vger.kernel.org
Fixes: abb139e75c2c ("firmware: teach the kernel to load firmware files directly from the filesystem")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Changes in v2:
- describe fix in commit message (dakr)
- write check more clearly and with comment in separate helper (dakr)
- document new restriction in comment above request_firmware() (dakr)
- warn when new restriction is triggered
- Link to v1: https://lore.kernel.org/r/20240820-firmware-traversal-v1-1-8699ffaa9276@goo…
---
drivers/base/firmware_loader/main.c | 41 +++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index a03ee4b11134..dd47ce9a761f 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -849,6 +849,37 @@ static void fw_log_firmware_info(const struct firmware *fw, const char *name,
{}
#endif
+/*
+ * Reject firmware file names with ".." path components.
+ * There are drivers that construct firmware file names from device-supplied
+ * strings, and we don't want some device to be able to tell us "I would like to
+ * be sent my firmware from ../../../etc/shadow, please".
+ *
+ * Search for ".." surrounded by either '/' or start/end of string.
+ *
+ * This intentionally only looks at the firmware name, not at the firmware base
+ * directory or at symlink contents.
+ */
+static bool name_contains_dotdot(const char *name)
+{
+ size_t name_len = strlen(name);
+ size_t i;
+
+ if (name_len < 2)
+ return false;
+ for (i = 0; i < name_len - 1; i++) {
+ /* do we see a ".." sequence? */
+ if (name[i] != '.' || name[i+1] != '.')
+ continue;
+
+ /* is it a path component? */
+ if ((i == 0 || name[i-1] == '/') &&
+ (i == name_len - 2 || name[i+2] == '/'))
+ return true;
+ }
+ return false;
+}
+
/* called from request_firmware() and request_firmware_work_func() */
static int
_request_firmware(const struct firmware **firmware_p, const char *name,
@@ -869,6 +900,14 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
goto out;
}
+ if (name_contains_dotdot(name)) {
+ dev_warn(device,
+ "Firmware load for '%s' refused, path contains '..' component",
+ name);
+ ret = -EINVAL;
+ goto out;
+ }
+
ret = _request_firmware_prepare(&fw, name, device, buf, size,
offset, opt_flags);
if (ret <= 0) /* error or already assigned */
@@ -946,6 +985,8 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
* @name will be used as $FIRMWARE in the uevent environment and
* should be distinctive enough not to be confused with any other
* firmware image for this or any other device.
+ * It must not contain any ".." path components - "foo/bar..bin" is
+ * allowed, but "foo/../bar.bin" is not.
*
* Caller must hold the reference count of @device.
*
---
base-commit: b0da640826ba3b6506b4996a6b23a429235e6923
change-id: 20240820-firmware-traversal-6df8501b0fe4
--
Jann Horn <jannh(a)google.com>