From: Jann Horn <jannh(a)google.com>
commit 71c186efc1b2cf1aeabfeff3b9bd5ac4c5ac14d8 upstream.
Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:
1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
the right two race windows) - I've tested this in a kernel build with
some extra mdelay() calls. See the commit message for a description
of the race scenario.
On older kernels (before 6.5), I think the same bug can even
theoretically lead to accessing transhuge page contents as a page table
if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
detecting PMDs that don't point to page tables.
On older kernels (before 6.5), you'd just have to win a single fairly
wide race to hit this.
I've tested this on 6.1 stable by racing migration (with a mdelay()
patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
to yank page tables out from under us (though I haven't tested that),
so I think the BUG_ON() checks in mfill_atomic() are just wrong.
I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.
This patch (of 2):
This fixes two issues.
I discovered that the following race can occur:
mfill_atomic other thread
============ ============
<zap PMD>
pmdp_get_lockless() [reads none pmd]
<bail if trans_huge>
<if none:>
<pagefault creates transhuge zeropage>
__pte_alloc [no-op]
<zap PMD>
<bail if pmd_trans_huge(*dst_pmd)>
BUG_ON(pmd_none(*dst_pmd))
I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.
On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.
The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).
On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.
Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).
If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.
As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.
Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@goog…
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@goog…
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <jannh(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ According to backport note in git comment message, using pmd_read_atomic()
instead of pmdp_get_lockless() in older kernels ]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified on build test
---
mm/userfaultfd.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 992a0a16846f..998c7075c62a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -642,21 +642,23 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
}
dst_pmdval = pmd_read_atomic(dst_pmd);
- /*
- * If the dst_pmd is mapped as THP don't
- * override it and just be strict.
- */
- if (unlikely(pmd_trans_huge(dst_pmdval))) {
- err = -EEXIST;
- break;
- }
if (unlikely(pmd_none(dst_pmdval)) &&
unlikely(__pte_alloc(dst_mm, dst_pmd))) {
err = -ENOMEM;
break;
}
- /* If an huge pmd materialized from under us fail */
- if (unlikely(pmd_trans_huge(*dst_pmd))) {
+ dst_pmdval = pmd_read_atomic(dst_pmd);
+ /*
+ * If the dst_pmd is THP don't override it and just be strict.
+ * (This includes the case where the PMD used to be THP and
+ * changed back to none after __pte_alloc().)
+ */
+ if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
+ pmd_devmap(dst_pmdval))) {
+ err = -EEXIST;
+ break;
+ }
+ if (unlikely(pmd_bad(dst_pmdval))) {
err = -EFAULT;
break;
}
--
2.34.1
If the directory is corrupted and the number of nlinks is less than 2
(valid nlinks have at least 2), then when the directory is deleted, the
minix_rmdir will try to reduce the nlinks(unsigned int) to a negative
value.
Make nlinks validity check for directories.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrey Kriulin <kitotavrik.s(a)gmail.com>
---
v3: Move nlinks validaty check to minix_rmdir and minix_rename per Jan
Kara <jack(a)suse.cz> request.
v2: Move nlinks validaty check to V[12]_minix_iget() per Jan Kara
<jack(a)suse.cz> request. Change return error code to EUCLEAN. Don't block
directory in r/o mode per Al Viro <viro(a)zeniv.linux.org.uk> request.
fs/minix/namei.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/minix/namei.c b/fs/minix/namei.c
index 8938536d8d3c..5a1e5f8ef443 100644
--- a/fs/minix/namei.c
+++ b/fs/minix/namei.c
@@ -161,8 +161,12 @@ static int minix_unlink(struct inode * dir, struct dentry *dentry)
static int minix_rmdir(struct inode * dir, struct dentry *dentry)
{
struct inode * inode = d_inode(dentry);
- int err = -ENOTEMPTY;
+ int err = -EUCLEAN;
+ if (inode->i_nlink < 2)
+ return err;
+
+ err = -ENOTEMPTY;
if (minix_empty_dir(inode)) {
err = minix_unlink(dir, dentry);
if (!err) {
@@ -235,6 +239,10 @@ static int minix_rename(struct mnt_idmap *idmap,
mark_inode_dirty(old_inode);
if (dir_de) {
+ if (old_dir->i_nlink <= 2) {
+ err = -EUCLEAN;
+ goto out_dir;
+ }
err = minix_set_link(dir_de, dir_folio, new_dir);
if (!err)
inode_dec_link_count(old_dir);
--
2.47.2
During an xhci host controller reset (via `USBCMD.HCRST`), reading DWC3
registers can return zero instead of their actual values. This applies
not only to registers within the xhci memory space but also those in
the broader DWC3 IP block.
By default, the xhci driver doesn't wait for the reset handshake to
complete during teardown. This can cause problems when the DWC3 controller
is operating as a dual role device and is switching from host to device
mode, the invalid register read caused by ongoing HCRST could lead to
gadget mode startup failures and unintended register overwrites.
To mitigate this, enable xhci-full-reset-on-remove-quirk to ensure that
xhci_reset() completes its full reset handshake during xhci removal.
Cc: stable(a)vger.kernel.org
Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Signed-off-by: Roy Luo <royluo(a)google.com>
---
drivers/usb/dwc3/host.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index b48e108fc8fe..ea865898308f 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -126,7 +126,7 @@ static int dwc3_host_get_irq(struct dwc3 *dwc)
int dwc3_host_init(struct dwc3 *dwc)
{
- struct property_entry props[6];
+ struct property_entry props[7];
struct platform_device *xhci;
int ret, irq;
int prop_idx = 0;
@@ -182,6 +182,9 @@ int dwc3_host_init(struct dwc3 *dwc)
if (DWC3_VER_IS_WITHIN(DWC3, ANY, 300A))
props[prop_idx++] = PROPERTY_ENTRY_BOOL("quirk-broken-port-ped");
+ if (dwc->dr_mode == USB_DR_MODE_OTG)
+ props[prop_idx++] = PROPERTY_ENTRY_BOOL("xhci-full-reset-on-remove-quirk");
+
if (prop_idx) {
ret = device_create_managed_software_node(&xhci->dev, props, NULL);
if (ret) {
--
2.49.0.1112.g889b7c5bd8-goog
Add the correct scale to get temperature in mili degree Celcius.
Add sign component to temperature scan element.
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
---
Changes in v4:
- Blindly Copy/Pasted Andy's comment. Use the correct MILLIDEGREE_PER_DEGREE
- Link to v3: https://lore.kernel.org/r/20250505-fxls-v3-0-8c541bf0205c@geanix.com
Changes in v3:
- Dropping define infavor of inline scale value
- Added using constants from units.h
- Tweaked commit msg to make it more assertive
- Link to v2: https://lore.kernel.org/r/20250502-fxls-v2-0-e1af65f1aa6c@geanix.com
Changes in v2:
- Correct offset is applied before scaling component
- Added sign component to temperature scan element
- Link to v1: https://lore.kernel.org/r/20250501-fxls-v1-1-f54061a07099@geanix.com
---
Sean Nyekjaer (2):
iio: accel: fxls8962af: Fix temperature calculation
iio: accel: fxls8962af: Fix temperature scan element sign
drivers/iio/accel/fxls8962af-core.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
---
base-commit: 609bc31eca06c7408e6860d8b46311ebe45c1fef
change-id: 20250501-fxls-307ef3d6d065
Best regards,
--
Sean Nyekjaer <sean(a)geanix.com>