The quilt patch titled
Subject: mm: disable CONFIG_PER_VMA_LOCK until its fixed
has been removed from the -mm tree. Its filename was
mm-disable-config_per_vma_lock-until-its-fixed.patch
This patch was dropped because it is obsolete
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm: disable CONFIG_PER_VMA_LOCK until its fixed
Date: Wed, 5 Jul 2023 18:14:00 -0700
A memory corruption was reported in [1], with the bisection pointing to the
patch [2] that enabled per-VMA locks for x86. Disable the per-VMA locks
config to prevent this issue until the fix is confirmed. This is expected
to be a temporary measure.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-3-surenb@google.com
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Jacob Young <jacobly.alt(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Holger Hoffstätte <holger(a)applied-asynchrony.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/Kconfig~mm-disable-config_per_vma_lock-until-its-fixed
+++ a/mm/Kconfig
@@ -1224,8 +1224,9 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
def_bool n
config PER_VMA_LOCK
- def_bool y
+ bool "Enable per-vma locking during page fault handling."
depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+ depends on BROKEN
help
Allow per-vma locking during page fault handling.
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-lock-a-vma-before-stack-expansion.patch
mm-lock-newly-mapped-vma-which-can-be-modified-after-it-becomes-visible.patch
swap-remove-remnants-of-polling-from-read_swap_cache_async.patch
mm-add-missing-vm_fault_result_trace-name-for-vm_fault_completed.patch
mm-drop-per-vma-lock-when-returning-vm_fault_retry-or-vm_fault_completed.patch
mm-change-folio_lock_or_retry-to-use-vm_fault-directly.patch
mm-handle-swap-page-faults-under-per-vma-lock.patch
mm-handle-userfaults-under-vma-lock.patch
The quilt patch titled
Subject: fork: lock VMAs of the parent process when forking
has been removed from the -mm tree. Its filename was
fork-lock-vmas-of-the-parent-process-when-forking.patch
This patch was dropped because it is obsolete
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: fork: lock VMAs of the parent process when forking
Date: Wed, 5 Jul 2023 18:13:59 -0700
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking while
forking a child process.
Patch 1/2 in the series implements proper VMA locking during fork. I
tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show a noticeable regression on a 56-core machine, while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
showed a ~7% regression. If such a fork-time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore the performance. Further
optimizations are possible if this regression proves to be problematic.
Patch 2/2 disables per-VMA locks until the fix is tested and verified.
This patch (of 2):
When forking a child process, the parent write-protects an anonymous page
and COW-shares it with the child being forked using copy_present_pte().
The parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because it is COW-shared with the child), this might lead to
some stale writable TLB entries targeting the wrong (old) page. A similar
issue happened in the past with userfaultfd (see the flush_tlb_page() call
inside do_wp_page()).
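A rough sketch of the suspected interleaving (illustrative only; the
thread names and exact ordering are reconstructed from the description
above, not taken from the report):

  parent thread A (dup_mmap)        parent thread B
  --------------------------        ---------------
  copy_present_pte()
    write-protects anon page PTE
                                    write-faults on that page, served
                                    under the per-VMA lock (bypassing
                                    mmap_lock held by thread A)
                                    do_wp_page() replaces the
                                    COW-shared anon page
  flush TLB, drop mmap_lock
                                    <- until that flush, B's CPU may
                                       still hold a stale writable TLB
                                       entry targeting the old page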
Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during the fork operation and avoids this issue.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show a noticeable regression on a 56-core machine, while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
showed a ~7% regression. If such a fork-time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore the performance. Further
optimizations are possible if this regression proves to be problematic.
Link: https://lkml.kernel.org/r/20230706011400.2949242-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-2-surenb@google.com
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger(a)applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-as…
Reported-by: Jacob Young <jacobly.alt(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Holger Hoffstätte <holger(a)applied-asynchrony.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/fork.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/kernel/fork.c~fork-lock-vmas-of-the-parent-process-when-forking
+++ a/kernel/fork.c
@@ -658,6 +658,12 @@ static __latent_entropy int dup_mmap(str
retval = -EINTR;
goto fail_uprobe_end;
}
+#ifdef CONFIG_PER_VMA_LOCK
+ /* Disallow any page faults before calling flush_cache_dup_mm */
+ for_each_vma(old_vmi, mpnt)
+ vma_start_write(mpnt);
+ vma_iter_set(&old_vmi, 0);
+#endif
flush_cache_dup_mm(oldmm);
uprobe_dup_mmap(oldmm, mm);
/*
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-disable-config_per_vma_lock-until-its-fixed.patch
mm-lock-a-vma-before-stack-expansion.patch
mm-lock-newly-mapped-vma-which-can-be-modified-after-it-becomes-visible.patch
swap-remove-remnants-of-polling-from-read_swap_cache_async.patch
mm-add-missing-vm_fault_result_trace-name-for-vm_fault_completed.patch
mm-drop-per-vma-lock-when-returning-vm_fault_retry-or-vm_fault_completed.patch
mm-change-folio_lock_or_retry-to-use-vm_fault-directly.patch
mm-handle-swap-page-faults-under-per-vma-lock.patch
mm-handle-userfaults-under-vma-lock.patch
Somehow PR_GET_AUXV got added into PR_MCE_KILL's switch when
the patch was applied [1].
Thus, move it out of that switch, to the place the patch originally
added it.
In the recently released v6.4 kernel some user could, in
principle, already be using this feature by mapping the right
page and passing the PR_GET_AUXV constant as a pointer:
prctl(PR_MCE_KILL, PR_GET_AUXV, ...)
So this does change behavior for such users. We could keep the bug,
since the other subcases in PR_MCE_KILL (PR_MCE_KILL_CLEAR and
PR_MCE_KILL_SET) do not overlap with it.
However, v6.4 may be recent enough (2 weeks old) that moving
the lines (rather than just adding a new case) does not break
anybody? Moreover, the documentation in man-pages was just
committed today [2].
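For completeness, a minimal user-space sketch of the intended usage after
this change (the fallback define mirrors include/uapi/linux/prctl.h; the
buffer size is arbitrary, and, going by prctl_get_auxv() in kernel/sys.c,
a successful call returns the full size of the kernel's auxv copy):

  #include <stdio.h>
  #include <sys/prctl.h>

  #ifndef PR_GET_AUXV
  #define PR_GET_AUXV 0x41555856	/* "AUXV" */
  #endif

  int main(void)
  {
  	unsigned long auxv[64];
  	/* arg4 and arg5 must be zero, or the call fails with EINVAL */
  	long ret = prctl(PR_GET_AUXV, auxv, sizeof(auxv), 0, 0);

  	if (ret < 0) {
  		perror("prctl(PR_GET_AUXV)");
  		return 1;
  	}
  	printf("kernel auxv copy is %ld bytes\n", ret);
  	return 0;
  }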
Fixes: ddc65971bb67 ("prctl: add PR_GET_AUXV to copy auxv to userspace")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/d81864a7f7f43bca6afa2a09fc2e850e4050ab42.168061… [1]
Link: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=8cf0… [2]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
kernel/sys.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/sys.c b/kernel/sys.c
index 339fee3eff6a..a36a27ebac33 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2529,11 +2529,6 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
else
return -EINVAL;
break;
- case PR_GET_AUXV:
- if (arg4 || arg5)
- return -EINVAL;
- error = prctl_get_auxv((void __user *)arg2, arg3);
- break;
default:
return -EINVAL;
}
@@ -2688,6 +2683,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_SET_VMA:
error = prctl_set_vma(arg2, arg3, arg4, arg5);
break;
+ case PR_GET_AUXV:
+ if (arg4 || arg5)
+ return -EINVAL;
+ error = prctl_get_auxv((void __user *)arg2, arg3);
+ break;
#ifdef CONFIG_KSM
case PR_SET_MEMORY_MERGE:
if (arg3 || arg4 || arg5)
base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1
--
2.41.0
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
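That is, mmap_region() should write-lock the VMA before taking
i_mmap_rwsem; a minimal sketch of the resulting order (paraphrasing the
diff below, not a verbatim excerpt):

  vma_start_write(vma);                 /* vma->vm_lock first */
  if (vma->vm_file)
          i_mmap_lock_write(vma->vm_file->f_mapping);
                                        /* i_mmap_rwsem second */
  vma_iter_store(&vmi, vma);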
Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hugh Dickins <hughd(a)google.com>
---
mm/mmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 84c71431a527..3eda23c9ebe7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2809,11 +2809,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (vma_iter_prealloc(&vmi))
goto close_and_free_vma;
+ /* Lock the VMA since it is modified after insertion into VMA tree */
+ vma_start_write(vma);
if (vma->vm_file)
i_mmap_lock_write(vma->vm_file->f_mapping);
- /* Lock the VMA since it is modified after insertion into VMA tree */
- vma_start_write(vma);
vma_iter_store(&vmi, vma);
mm->map_count++;
if (vma->vm_file) {
--
2.35.3
Dear developer, I am a security researcher at Wuhan University. I recently discovered a vulnerability in the USB core module of the Linux kernel. The vulnerability leads to an infinite loop in the probe process of a USB device, which consumes a lot of system resources. It was found in kernel version 5.6.19 and, by testing, also exists in the newer 6.3.7 kernel. I hope that after your review you will be able to apply for a CVE number to disclose this vulnerability. If you need more detailed vulnerability information, please contact me. Thank you for your help.
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
---
mm/mmap.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index 204ddcd52625..c66e4622a557 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1977,6 +1977,8 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
return -ENOMEM;
}
+ /* Lock the VMA before expanding to prevent concurrent page faults */
+ vma_start_write(vma);
/*
* vma->vm_start/vm_end cannot change under us because the caller
* is required to hold the mmap_lock in read mode. We need the
@@ -2064,6 +2066,8 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
return -ENOMEM;
}
+ /* Lock the VMA before expanding to prevent concurrent page faults */
+ vma_start_write(vma);
/*
* vma->vm_start/vm_end cannot change under us because the caller
* is required to hold the mmap_lock in read mode. We need the
--
2.41.0.255.g8b1d071c50-goog
The patch titled
Subject: mm: hugetlb_vmemmap: fix a race between vmemmap pmd split
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb_vmemmap-fix-a-race-between-vmemmap-pmd-split.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb_vmemmap: fix a race between vmemmap pmd split
Date: Fri, 7 Jul 2023 11:38:59 +0800
The local variable @page in __split_vmemmap_huge_pmd() is obtained from
the pmd without holding page_table_lock, so it may possibly be the page
table page instead of the huge pmd page.
The effect shows up in set_pte_at(), since we may pass an invalid page
struct; if set_pte_at() wants to access the page struct (e.g. when
CONFIG_PAGE_TABLE_CHECK is enabled), it may crash the kernel.
So fix it. Also inline __split_vmemmap_huge_pmd() since it only has one
user.
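A sketch of the race in the old code (an illustrative interleaving, not
taken from an actual crash report):

  CPU0                                  CPU1
  ----                                  ----
  split_vmemmap_huge_pmd()
    lock; pmd_leaf(*pmd) == true; unlock
                                        splits the same pmd and
                                        installs a pte page table
  __split_vmemmap_huge_pmd()
    page = pmd_page(*pmd)
    /* now the pte table page,
       not the huge head page */
    mk_pte(page + i, ...) on bogus pages

With this patch the pmd_leaf() check and the pmd_page() read happen under
the same page_table_lock acquisition, so @head is either the huge pmd page
or NULL.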
Link: https://lkml.kernel.org/r/20230707033859.16148-1-songmuchun@bytedance.com
Fixes: d8d55f5616cf ("mm: sparsemem: use page table lock to protect kernel pmd operations")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb_vmemmap.c | 34 ++++++++++++++--------------------
1 file changed, 14 insertions(+), 20 deletions(-)
--- a/mm/hugetlb_vmemmap.c~mm-hugetlb_vmemmap-fix-a-race-between-vmemmap-pmd-split
+++ a/mm/hugetlb_vmemmap.c
@@ -36,14 +36,22 @@ struct vmemmap_remap_walk {
struct list_head *vmemmap_pages;
};
-static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
{
pmd_t __pmd;
int i;
unsigned long addr = start;
- struct page *page = pmd_page(*pmd);
- pte_t *pgtable = pte_alloc_one_kernel(&init_mm);
+ struct page *head;
+ pte_t *pgtable;
+
+ spin_lock(&init_mm.page_table_lock);
+ head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
+ spin_unlock(&init_mm.page_table_lock);
+ if (!head)
+ return 0;
+
+ pgtable = pte_alloc_one_kernel(&init_mm);
if (!pgtable)
return -ENOMEM;
@@ -53,7 +61,7 @@ static int __split_vmemmap_huge_pmd(pmd_
pte_t entry, *pte;
pgprot_t pgprot = PAGE_KERNEL;
- entry = mk_pte(page + i, pgprot);
+ entry = mk_pte(head + i, pgprot);
pte = pte_offset_kernel(&__pmd, addr);
set_pte_at(&init_mm, addr, pte, entry);
}
@@ -65,8 +73,8 @@ static int __split_vmemmap_huge_pmd(pmd_
* be treated as indepdenent small pages (as they can be freed
* individually).
*/
- if (!PageReserved(page))
- split_page(page, get_order(PMD_SIZE));
+ if (!PageReserved(head))
+ split_page(head, get_order(PMD_SIZE));
/* Make pte visible before pmd. See comment in pmd_install(). */
smp_wmb();
@@ -80,20 +88,6 @@ static int __split_vmemmap_huge_pmd(pmd_
return 0;
}
-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
-{
- int leaf;
-
- spin_lock(&init_mm.page_table_lock);
- leaf = pmd_leaf(*pmd);
- spin_unlock(&init_mm.page_table_lock);
-
- if (!leaf)
- return 0;
-
- return __split_vmemmap_huge_pmd(pmd, start);
-}
-
static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end,
struct vmemmap_remap_walk *walk)
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-hugetlb_vmemmap-fix-a-race-between-vmemmap-pmd-split.patch
Hi,
Can you please help in backporting the below patch to the linux-5.15.y
stable tree:
commit d8c47cc7bf602ef73384a00869a70148146c1191 ("mm: page_io: fix psi
memory pressure error on cold swapins")
In the absence of this patch we are seeing some user-space tools, like
the Android low-memory killer based on PSI events, become a bit
aggressive, since PSI is accounted even for cold swapins on a device
where swap is mounted on a zram with a slower backing dev.
Thanks,
Charan
From: Rafał Miłecki <rafal(a)milecki.pl>
commit f99e6d7c4ed3be2531bd576425a5bd07fb133bd7 upstream.
While bringing hardware up we should perform a full reset including the
switch bit (BGMAC_BCMA_IOCTL_SW_RESET aka SICF_SWRST). It's what
specification says and what reference driver does.
This seems to be critical for the BCM5358. Without this, the hardware
doesn't get initialized properly and doesn't seem to transmit or receive
any packets.
Originally bgmac called bgmac_chip_reset() before setting the
"has_robosw" property, which resulted in the expected behaviour. That
changed as a side effect of adding platform device support, which
regressed BCM5358 support.
Fixes: f6a95a24957a ("net: ethernet: bgmac: Add platform device support")
Cc: Jon Mason <jdmason(a)kudzu.us>
Signed-off-by: Rafał Miłecki <rafal(a)milecki.pl>
Reviewed-by: Leon Romanovsky <leonro(a)nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli(a)gmail.com>
Link: https://lore.kernel.org/r/20230227091156.19509-1-zajec5@gmail.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
The upstream commit wasn't backported to 5.4 (and older) because it
couldn't be cherry-picked cleanly: there was a small fuzz caused by the
missing commit 8c7da63978f1 ("bgmac: configure MTU and add support for
frames beyond 8192 byte size").
I've manually cherry-picked the BCM5358 fix to linux-5.4.x.
---
drivers/net/ethernet/broadcom/bgmac.c | 8 ++++++--
drivers/net/ethernet/broadcom/bgmac.h | 2 ++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 193722334d93..89a63fdbe0e3 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -890,13 +890,13 @@ static void bgmac_chip_reset_idm_config(struct bgmac *bgmac)
if (iost & BGMAC_BCMA_IOST_ATTACHED) {
flags = BGMAC_BCMA_IOCTL_SW_CLKEN;
- if (!bgmac->has_robosw)
+ if (bgmac->in_init || !bgmac->has_robosw)
flags |= BGMAC_BCMA_IOCTL_SW_RESET;
}
bgmac_clk_enable(bgmac, flags);
}
- if (iost & BGMAC_BCMA_IOST_ATTACHED && !bgmac->has_robosw)
+ if (iost & BGMAC_BCMA_IOST_ATTACHED && (bgmac->in_init || !bgmac->has_robosw))
bgmac_idm_write(bgmac, BCMA_IOCTL,
bgmac_idm_read(bgmac, BCMA_IOCTL) &
~BGMAC_BCMA_IOCTL_SW_RESET);
@@ -1489,6 +1489,8 @@ int bgmac_enet_probe(struct bgmac *bgmac)
struct net_device *net_dev = bgmac->net_dev;
int err;
+ bgmac->in_init = true;
+
bgmac_chip_intrs_off(bgmac);
net_dev->irq = bgmac->irq;
@@ -1538,6 +1540,8 @@ int bgmac_enet_probe(struct bgmac *bgmac)
net_dev->hw_features = net_dev->features;
net_dev->vlan_features = net_dev->features;
+ bgmac->in_init = false;
+
err = register_netdev(bgmac->net_dev);
if (err) {
dev_err(bgmac->dev, "Cannot register net device\n");
diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h
index 40d02fec2747..76930b8353d6 100644
--- a/drivers/net/ethernet/broadcom/bgmac.h
+++ b/drivers/net/ethernet/broadcom/bgmac.h
@@ -511,6 +511,8 @@ struct bgmac {
int irq;
u32 int_mask;
+ bool in_init;
+
/* Current MAC state */
int mac_speed;
int mac_duplex;
--
2.35.3