The patch titled
Subject: mm,hugetlb: change mechanism to detect a COW on private mapping
has been added to the -mm mm-new branch. Its filename is
mmhugetlb-change-mechanism-to-detect-a-cow-on-private-mapping.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Oscar Salvador <osalvador(a)suse.de>
Subject: mm,hugetlb: change mechanism to detect a COW on private mapping
Date: Fri, 20 Jun 2025 14:30:10 +0200
Patch series "Misc rework on hugetlb faulting path", v2.
This patchset aims to give some love to the hugetlb faulting path, doing
so by removing obsolete comments that are no longer true, sorting out the
folio lock, and changing the mechanism we use to determine whether we are
already COWing a private mapping.
The most important patch of the series is #1, as it fixes a deadlock that
was described in [1], where two processes were holding the same lock for
the folio in the pagecache, and then deadlocked in the mutex.
Looking up and locking the folio in the pagecache was done to check
whether that folio was the same folio we had mapped in our pagetables,
meaning that if it was different we knew that we already mapped that folio
privately, so any further CoW would be made on a private mapping, which
led us to the question: __Was the reservation for that address
consumed?__ That is all we care about, because if it was indeed consumed
and we are the owner and we cannot allocate more folios, we need to unmap
the folio from the processes' pagetables and make it exclusive for us.
We figured we do not need to look up the folio at all, and it is just
enough to check whether the folio we have mapped is anonymous, which means
we mapped it privately, so the reservation was indeed consumed.
Patch #2 sorts out folio locking in the faulting path, reducing its scope
to only take the lock when we are dealing with an anonymous folio, and
documents this. More details in the patch.
Patches #3-5 are cleanups.
hugetlb_wp() checks whether the process is trying to COW on a private
mapping in order to know whether the reservation for that address was
already consumed or not. If it was consumed and we are the owner of the
mapping, the folio will have to be unmapped from the other processes.
Currently, that check is done by looking up the folio in the pagecache and
comparing it to the folio which is mapped in our pagetables. If it
differs, it means we already mapped it privately before, consuming a
reservation on the way. All we are interested in is whether the mapped
folio is anonymous, so we can simplify and check for that instead.
Also, we transition from a trylock to a folio_lock, since the former was
only needed when hugetlb_fault() had to lock both folios, in order to
avoid deadlock.
Link: https://lkml.kernel.org/r/20250620123014.29748-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20250620123014.29748-2-osalvador@suse.de
Link: https://lore.kernel.org/lkml/20250513093448.592150-1-gavinguo@igalia.com/ [1]
Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Reported-by: Gavin Guo <gavinguo(a)igalia.com>
Closes: https://lore.kernel.org/lkml/20250513093448.592150-1-gavinguo@igalia.com/
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Suggested-by: Peter Xu <peterx(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 70 +++++++++++--------------------------------------
1 file changed, 17 insertions(+), 53 deletions(-)
--- a/mm/hugetlb.c~mmhugetlb-change-mechanism-to-detect-a-cow-on-private-mapping
+++ a/mm/hugetlb.c
@@ -6130,8 +6130,7 @@ static void unmap_ref_private(struct mm_
* cannot race with other handlers or page migration.
* Keep the pte_same checks anyway to make transition from the mutex easier.
*/
-static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
- struct vm_fault *vmf)
+static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct mm_struct *mm = vma->vm_mm;
@@ -6193,16 +6192,17 @@ retry_avoidcopy:
PageAnonExclusive(&old_folio->page), &old_folio->page);
/*
- * If the process that created a MAP_PRIVATE mapping is about to
- * perform a COW due to a shared page count, attempt to satisfy
- * the allocation without using the existing reserves. The pagecache
- * page is used to determine if the reserve at this address was
- * consumed or not. If reserves were used, a partial faulted mapping
- * at the time of fork() could consume its reserves on COW instead
- * of the full address range.
+ * If the process that created a MAP_PRIVATE mapping is about to perform
+ * a COW due to a shared page count, attempt to satisfy the allocation
+ * without using the existing reserves.
+ * In order to determine whether this is a COW on a MAP_PRIVATE mapping it
+ * is enough to check whether the old_folio is anonymous. This means that
+ * the reserve for this address was consumed. If reserves were used, a
+ * partial faulted mapping at the time of fork() could consume its reserves
+ * on COW instead of the full address range.
*/
if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
- old_folio != pagecache_folio)
+ folio_test_anon(old_folio))
cow_from_owner = true;
folio_get(old_folio);
@@ -6581,7 +6581,7 @@ static vm_fault_t hugetlb_no_page(struct
hugetlb_count_add(pages_per_huge_page(h), mm);
if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
/* Optimization, do the COW without a second fault */
- ret = hugetlb_wp(folio, vmf);
+ ret = hugetlb_wp(vmf);
}
spin_unlock(vmf->ptl);
@@ -6648,11 +6648,9 @@ vm_fault_t hugetlb_fault(struct mm_struc
{
vm_fault_t ret;
u32 hash;
- struct folio *folio = NULL;
- struct folio *pagecache_folio = NULL;
+ struct folio *folio;
struct hstate *h = hstate_vma(vma);
struct address_space *mapping;
- int need_wait_lock = 0;
struct vm_fault vmf = {
.vma = vma,
.address = address & huge_page_mask(h),
@@ -6747,8 +6745,7 @@ vm_fault_t hugetlb_fault(struct mm_struc
* If we are going to COW/unshare the mapping later, we examine the
* pending reservations for this page now. This will ensure that any
* allocations necessary to record that reservation occur outside the
- * spinlock. Also lookup the pagecache page now as it is used to
- * determine if a reservation has been consumed.
+ * spinlock.
*/
if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
!(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(vmf.orig_pte)) {
@@ -6758,11 +6755,6 @@ vm_fault_t hugetlb_fault(struct mm_struc
}
/* Just decrements count, does not deallocate */
vma_end_reservation(h, vma, vmf.address);
-
- pagecache_folio = filemap_lock_hugetlb_folio(h, mapping,
- vmf.pgoff);
- if (IS_ERR(pagecache_folio))
- pagecache_folio = NULL;
}
vmf.ptl = huge_pte_lock(h, mm, vmf.pte);
@@ -6776,10 +6768,6 @@ vm_fault_t hugetlb_fault(struct mm_struc
(flags & FAULT_FLAG_WRITE) && !huge_pte_write(vmf.orig_pte)) {
if (!userfaultfd_wp_async(vma)) {
spin_unlock(vmf.ptl);
- if (pagecache_folio) {
- folio_unlock(pagecache_folio);
- folio_put(pagecache_folio);
- }
hugetlb_vma_unlock_read(vma);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
return handle_userfault(&vmf, VM_UFFD_WP);
@@ -6791,23 +6779,14 @@ vm_fault_t hugetlb_fault(struct mm_struc
/* Fallthrough to CoW */
}
- /*
- * hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) and
- * pagecache_folio, so here we need take the former one
- * when folio != pagecache_folio or !pagecache_folio.
- */
+ /* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
folio = page_folio(pte_page(vmf.orig_pte));
- if (folio != pagecache_folio)
- if (!folio_trylock(folio)) {
- need_wait_lock = 1;
- goto out_ptl;
- }
-
+ folio_lock(folio);
folio_get(folio);
if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
if (!huge_pte_write(vmf.orig_pte)) {
- ret = hugetlb_wp(pagecache_folio, &vmf);
+ ret = hugetlb_wp(&vmf);
goto out_put_page;
} else if (likely(flags & FAULT_FLAG_WRITE)) {
vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte);
@@ -6818,16 +6797,10 @@ vm_fault_t hugetlb_fault(struct mm_struc
flags & FAULT_FLAG_WRITE))
update_mmu_cache(vma, vmf.address, vmf.pte);
out_put_page:
- if (folio != pagecache_folio)
- folio_unlock(folio);
+ folio_unlock(folio);
folio_put(folio);
out_ptl:
spin_unlock(vmf.ptl);
-
- if (pagecache_folio) {
- folio_unlock(pagecache_folio);
- folio_put(pagecache_folio);
- }
out_mutex:
hugetlb_vma_unlock_read(vma);
@@ -6839,15 +6812,6 @@ out_mutex:
vma_end_read(vma);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
- /*
- * Generally it's safe to hold refcount during waiting page lock. But
- * here we just wait to defer the next page fault to avoid busy loop and
- * the page is not used after unlocked before returning from the current
- * page fault. So we are safe from accessing freed page, even if we wait
- * here without taking refcount.
- */
- if (need_wait_lock)
- folio_wait_locked(folio);
return ret;
}
_
Patches currently in -mm which might be from osalvador(a)suse.de are
mmslub-do-not-special-case-n_normal-nodes-for-slab_nodes.patch
mmmemory_hotplug-remove-status_change_nid_normal-and-update-documentation.patch
mmmemory_hotplug-implement-numa-node-notifier.patch
mmslub-use-node-notifier-instead-of-memory-notifier.patch
mmmemory-tiers-use-node-notifier-instead-of-memory-notifier.patch
driverscxl-use-node-notifier-instead-of-memory-notifier.patch
drivershmat-use-node-notifier-instead-of-memory-notifier.patch
kernelcpuset-use-node-notifier-instead-of-memory-notifier.patch
mmmempolicy-use-node-notifier-instead-of-memory-notifier.patch
mmpage_ext-derive-the-node-from-the-pfn.patch
mmmemory_hotplug-drop-status_change_nid-parameter-from-memory_notify.patch
mmhugetlb-change-mechanism-to-detect-a-cow-on-private-mapping.patch
mmhugetlb-sort-out-folio-locking-in-the-faulting-path.patch
mmhugetlb-rename-anon_rmap-to-new_anon_folio-and-make-it-boolean.patch
mmhugetlb-rename-anon_rmap-to-new_anon_folio-and-make-it-boolean-fix.patch
mmhugetlb-drop-obsolete-comment-about-non-present-pte-and-second-faults.patch
mmhugetlb-drop-unlikelys-from-hugetlb_fault.patch
From: Dominique Martinet <asmadeus(a)codewreck.org>
A buffer overflow vulnerability exists in the USB 9pfs transport layer
where inconsistent size validation between packet header parsing and
actual data copying allows a malicious USB host to overflow heap buffers.
The issue occurs because:
- usb9pfs_rx_header() validates only the declared size in packet header
- usb9pfs_rx_complete() uses req->actual (actual received bytes) for
memcpy
This allows an attacker to craft packets with small declared size
(bypassing validation) but large actual payload (triggering overflow
in memcpy).
Add validation in usb9pfs_rx_complete() to ensure req->actual does not
exceed the buffer capacity before copying data.
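As a rough sketch only (names taken from the diff below), the receive
completion path now clamps on what was actually received instead of
trusting it unconditionally:

	unsigned int req_size = req->actual;	/* bytes the host really sent */
	int status = REQ_STATUS_RCVD;

	if (req_size > p9_rx_req->rc.capacity) {
		req_size = 0;			/* copy nothing */
		status = REQ_STATUS_ERROR;	/* fail the request via p9_client_cb() */
	}
	memcpy(p9_rx_req->rc.sdata, req->buf, req_size);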
Reported-by: Yuhao Jiang <danisjiang(a)gmail.com>
Closes: https://lkml.kernel.org/r/20250616132539.63434-1-danisjiang@gmail.com
Fixes: a3be076dc174 ("net/9p/usbg: Add new usb gadget function transport")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
---
(still not actually tested, can't get dummy_hcd/gt to create a device
listed by p9_fwd.py/usable in qemu, I give up...)
Changes in v3:
- fix typo s/req_sizel/req_size/ -- sorry for that, module wasn't
built...
- Link to v2: https://lore.kernel.org/r/20250620-9p-usb_overflow-v2-1-026c6109c7a1@codewr…
Changes in v2:
- run through p9_client_cb() on error
- Link to v1: https://lore.kernel.org/r/20250616132539.63434-1-danisjiang@gmail.com
---
net/9p/trans_usbg.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/net/9p/trans_usbg.c b/net/9p/trans_usbg.c
index 6b694f117aef296a66419fed5252305e7a1d0936..468f7e8f0277b9ae5f1bb3c94c649fca97d28857 100644
--- a/net/9p/trans_usbg.c
+++ b/net/9p/trans_usbg.c
@@ -231,6 +231,8 @@ static void usb9pfs_rx_complete(struct usb_ep *ep, struct usb_request *req)
struct f_usb9pfs *usb9pfs = ep->driver_data;
struct usb_composite_dev *cdev = usb9pfs->function.config->cdev;
struct p9_req_t *p9_rx_req;
+ unsigned int req_size = req->actual;
+ int status = REQ_STATUS_RCVD;
if (req->status) {
dev_err(&cdev->gadget->dev, "%s usb9pfs complete --> %d, %d/%d\n",
@@ -242,11 +244,19 @@ static void usb9pfs_rx_complete(struct usb_ep *ep, struct usb_request *req)
if (!p9_rx_req)
return;
- memcpy(p9_rx_req->rc.sdata, req->buf, req->actual);
+ if (req_size > p9_rx_req->rc.capacity) {
+ dev_err(&cdev->gadget->dev,
+ "%s received data size %u exceeds buffer capacity %zu\n",
+ ep->name, req_size, p9_rx_req->rc.capacity);
+ req_size = 0;
+ status = REQ_STATUS_ERROR;
+ }
- p9_rx_req->rc.size = req->actual;
+ memcpy(p9_rx_req->rc.sdata, req->buf, req_size);
- p9_client_cb(usb9pfs->client, p9_rx_req, REQ_STATUS_RCVD);
+ p9_rx_req->rc.size = req_size;
+
+ p9_client_cb(usb9pfs->client, p9_rx_req, status);
p9_req_put(usb9pfs->client, p9_rx_req);
complete(&usb9pfs->received);
---
base-commit: 74b4cc9b8780bfe8a3992c9ac0033bf22ac01f19
change-id: 20250620-9p-usb_overflow-25bfc5e9bef3
Best regards,
--
Dominique Martinet <asmadeus(a)codewreck.org>
From: Dominique Martinet <asmadeus(a)codewreck.org>
A buffer overflow vulnerability exists in the USB 9pfs transport layer
where inconsistent size validation between packet header parsing and
actual data copying allows a malicious USB host to overflow heap buffers.
The issue occurs because:
- usb9pfs_rx_header() validates only the declared size in packet header
- usb9pfs_rx_complete() uses req->actual (actual received bytes) for
memcpy
This allows an attacker to craft packets with small declared size
(bypassing validation) but large actual payload (triggering overflow
in memcpy).
Add validation in usb9pfs_rx_complete() to ensure req->actual does not
exceed the buffer capacity before copying data.
Reported-by: Yuhao Jiang <danisjiang(a)gmail.com>
Closes: https://lkml.kernel.org/r/20250616132539.63434-1-danisjiang@gmail.com
Fixes: a3be076dc174 ("net/9p/usbg: Add new usb gadget function transport")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
---
Not actually tested, I'll try to find time to figure out how to run with
qemu for real this time...
Changes in v2:
- run through p9_client_cb() on error
- Link to v1: https://lore.kernel.org/r/20250616132539.63434-1-danisjiang@gmail.com
---
net/9p/trans_usbg.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/net/9p/trans_usbg.c b/net/9p/trans_usbg.c
index 6b694f117aef296a66419fed5252305e7a1d0936..43078e0d4ca3f4063660f659d28452c81bef10b4 100644
--- a/net/9p/trans_usbg.c
+++ b/net/9p/trans_usbg.c
@@ -231,6 +231,8 @@ static void usb9pfs_rx_complete(struct usb_ep *ep, struct usb_request *req)
struct f_usb9pfs *usb9pfs = ep->driver_data;
struct usb_composite_dev *cdev = usb9pfs->function.config->cdev;
struct p9_req_t *p9_rx_req;
+ unsigned int req_size = req->actual;
+ int status = REQ_STATUS_RCVD;
if (req->status) {
dev_err(&cdev->gadget->dev, "%s usb9pfs complete --> %d, %d/%d\n",
@@ -242,11 +244,19 @@ static void usb9pfs_rx_complete(struct usb_ep *ep, struct usb_request *req)
if (!p9_rx_req)
return;
- memcpy(p9_rx_req->rc.sdata, req->buf, req->actual);
+ if (req_size > p9_rx_req->rc.capacity) {
+ dev_err(&cdev->gadget->dev,
+ "%s received data size %u exceeds buffer capacity %zu\n",
+ ep->name, req_size, p9_rx_req->rc.capacity);
+ req_size = 0;
+ status = REQ_STATUS_ERROR;
+ }
- p9_rx_req->rc.size = req->actual;
+ memcpy(p9_rx_req->rc.sdata, req->buf, req_size);
- p9_client_cb(usb9pfs->client, p9_rx_req, REQ_STATUS_RCVD);
+ p9_rx_req->rc.size = req_sizel;
+
+ p9_client_cb(usb9pfs->client, p9_rx_req, status);
p9_req_put(usb9pfs->client, p9_rx_req);
complete(&usb9pfs->received);
---
base-commit: 74b4cc9b8780bfe8a3992c9ac0033bf22ac01f19
change-id: 20250620-9p-usb_overflow-25bfc5e9bef3
Best regards,
--
Dominique Martinet <asmadeus(a)codewreck.org>
The patch titled
Subject: kallsyms: fix build without execinfo
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kallsyms-fix-build-without-execinfo.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Achill Gilgenast <fossdd(a)pwned.life>
Subject: kallsyms: fix build without execinfo
Date: Sun, 22 Jun 2025 03:45:49 +0200
Some libcs, such as musl, don't provide execinfo.h since it's not part of
POSIX. In order to fix compilation on musl, only include execinfo.h if
it is available (HAVE_BACKTRACE_SUPPORT).
This was discovered with c104c16073b7 ("Kunit to check the longest symbol
length") which starts to include linux/kallsyms.h with Alpine Linux'
configs.
Link: https://lkml.kernel.org/r/20250622014608.448718-1-fossdd@pwned.life
Fixes: c104c16073b7 ("Kunit to check the longest symbol length")
Signed-off-by: Achill Gilgenast <fossdd(a)pwned.life>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/include/linux/kallsyms.h | 4 ++++
1 file changed, 4 insertions(+)
--- a/tools/include/linux/kallsyms.h~kallsyms-fix-build-without-execinfo
+++ a/tools/include/linux/kallsyms.h
@@ -18,6 +18,7 @@ static inline const char *kallsyms_looku
return NULL;
}
+#ifdef HAVE_BACKTRACE_SUPPORT
#include <execinfo.h>
#include <stdlib.h>
static inline void print_ip_sym(const char *loglvl, unsigned long ip)
@@ -30,5 +31,8 @@ static inline void print_ip_sym(const ch
free(name);
}
+#else
+static inline void print_ip_sym(const char *loglvl, unsigned long ip) {}
+#endif
#endif
_
Patches currently in -mm which might be from fossdd(a)pwned.life are
kallsyms-fix-build-without-execinfo.patch
Hello,
Please consider applying the following commits for 6.12.y:
c104c16073b7 ("Kunit to check the longest symbol length")
f710202b2a45 ("x86/tools: Drop duplicate unlikely() definition in
insn_decoder_test.c")
They should apply cleanly.
Those two commits implement a kunit test to verify that a symbol with
KSYM_NAME_LEN of 512 can be read.
The first commit implements the test. This commit also includes a fix
for the x86/insn_decoder_test selftest. If a symbol exceeds the symbol
length limit, an error like the following occurs:
arch/x86/tools/insn_decoder_test: error: malformed line 1152000:
tBb_+0xf2>
...which overflowed by 10 characters while reading this line:
ffffffff81458193: 74 3d je
ffffffff814581d2
<_RNvXse_NtNtNtCshGpAVYOtgW1_4core4iter8adapters7flattenINtB5_13FlattenCompatINtNtB7_3map3MapNtNtNtBb_3str4iter5CharsNtB1v_17CharEscapeDefaultENtNtBb_4char13EscapeDefaultENtNtBb_3fmt5Debug3fmtBb_+0xf2>
The fix was proposed in [1] and initially mentioned at [2].
The second commit fixes a warning when building with clang because
there was a definition of unlikely from compiler.h in tools/include/linux,
which conflicted with the one in the instruction decoder selftest.
[1] https://lore.kernel.org/lkml/Y9ES4UKl%2F+DtvAVS@gmail.com/
[2] https://lore.kernel.org/lkml/320c4dba-9919-404b-8a26-a8af16be1845@app.fastm…
I will send something similar to 6.6.y and 6.1.y
Thanks!
Best regards,
Sergio
commit: 10685681bafc ("net_sched: sch_sfq: don't allow 1 packet limit")
fixes CVE-2024-57996 and commit: b3bf8f63e617 ("net_sched: sch_sfq: move
the limit validation") fixes CVE-2025-37752.
Patches 3 and 5 are CVE fixes for above mentioned CVEs. Patch 1,2 and 4
are pulled in as stable-deps.
Testing was performed on a 5.15.185 kernel with the above 5 patches
applied (using the latest upstream kselftests for tc-testing):
# ./tdc.py -f tc-tests/qdiscs/sfq.json
-- ns/SubPlugin.__init__
Test 7482: Create SFQ with default setting
Test c186: Create SFQ with limit setting
Test ae23: Create SFQ with perturb setting
Test a430: Create SFQ with quantum setting
Test 4539: Create SFQ with divisor setting
Test b089: Create SFQ with flows setting
Test 99a0: Create SFQ with depth setting
Test 7389: Create SFQ with headdrop setting
Test 6472: Create SFQ with redflowlimit setting
Test 8929: Show SFQ class
Test 4d6f: Check that limit of 1 is rejected
Test 7f8f: Check that a derived limit of 1 is rejected (limit 2 depth 1 flows 1)
Test 5168: Check that a derived limit of 1 is rejected (limit 2 depth 1 divisor 1)
All test results:
1..13
ok 1 7482 - Create SFQ with default setting
ok 2 c186 - Create SFQ with limit setting
ok 3 ae23 - Create SFQ with perturb setting
ok 4 a430 - Create SFQ with quantum setting
ok 5 4539 - Create SFQ with divisor setting
ok 6 b089 - Create SFQ with flows setting
ok 7 99a0 - Create SFQ with depth setting
ok 8 7389 - Create SFQ with headdrop setting
ok 9 6472 - Create SFQ with redflowlimit setting
ok 10 8929 - Show SFQ class
ok 11 4d6f - Check that limit of 1 is rejected
ok 12 7f8f - Check that a derived limit of 1 is rejected (limit 2 depth 1 flows 1)
ok 13 5168 - Check that a derived limit of 1 is rejected (limit 2 depth 1 divisor 1)
# uname -a
Linux hamogala-vm-6 5.15.185+ #1 SMP Fri Jun 13 18:34:53 GMT 2025 x86_64 x86_64 x86_64 GNU/Linux
I will try to send similar backports to kernels older than 5.15.y as
well.
Thanks,
Harshit
Eric Dumazet (2):
net_sched: sch_sfq: annotate data-races around q->perturb_period
net_sched: sch_sfq: handle bigger packets
Octavian Purdila (3):
net_sched: sch_sfq: don't allow 1 packet limit
net_sched: sch_sfq: use a temporary work area for validating
configuration
net_sched: sch_sfq: move the limit validation
net/sched/sch_sfq.c | 112 ++++++++++++++++++++++++++++----------------
1 file changed, 71 insertions(+), 41 deletions(-)
--
2.47.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x f4c7baa0699b69edb6887a992283b389761e0e81
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062239-playful-huff-6d2b@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4c7baa0699b69edb6887a992283b389761e0e81 Mon Sep 17 00:00:00 2001
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Date: Fri, 16 May 2025 15:16:54 +0300
Subject: [PATCH] drm/i915/display: Add check for alloc_ordered_workqueue() and
alloc_workqueue()
Add check for the return value of alloc_ordered_workqueue()
and alloc_workqueue(). Furthermore, if some allocations fail,
cleanup works are added to avoid potential memory leak problem.
Fixes: 40053823baad ("drm/i915/display: move modeset probe/remove functions to intel_display_driver.c")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
Link: https://lore.kernel.org/r/20d3d096c6a4907636f8a1389b3b4dd753ca356e.17473976…
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit dcab7a228f4ea9cda3f5b0a1f0679e046d23d7f7)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c b/drivers/gpu/drm/i915/display/intel_display_driver.c
index 5c74ab5fd1aa..411fe7b918a7 100644
--- a/drivers/gpu/drm/i915/display/intel_display_driver.c
+++ b/drivers/gpu/drm/i915/display/intel_display_driver.c
@@ -244,31 +244,45 @@ int intel_display_driver_probe_noirq(struct intel_display *display)
intel_dmc_init(display);
display->wq.modeset = alloc_ordered_workqueue("i915_modeset", 0);
+ if (!display->wq.modeset) {
+ ret = -ENOMEM;
+ goto cleanup_vga_client_pw_domain_dmc;
+ }
+
display->wq.flip = alloc_workqueue("i915_flip", WQ_HIGHPRI |
WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE);
+ if (!display->wq.flip) {
+ ret = -ENOMEM;
+ goto cleanup_wq_modeset;
+ }
+
display->wq.cleanup = alloc_workqueue("i915_cleanup", WQ_HIGHPRI, 0);
+ if (!display->wq.cleanup) {
+ ret = -ENOMEM;
+ goto cleanup_wq_flip;
+ }
intel_mode_config_init(display);
ret = intel_cdclk_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_color_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_dbuf_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_bw_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_pmdemand_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
intel_init_quirks(display);
@@ -276,6 +290,12 @@ int intel_display_driver_probe_noirq(struct intel_display *display)
return 0;
+cleanup_wq_cleanup:
+ destroy_workqueue(display->wq.cleanup);
+cleanup_wq_flip:
+ destroy_workqueue(display->wq.flip);
+cleanup_wq_modeset:
+ destroy_workqueue(display->wq.modeset);
cleanup_vga_client_pw_domain_dmc:
intel_dmc_fini(display);
intel_power_domains_driver_remove(display);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f4c7baa0699b69edb6887a992283b389761e0e81
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062239-erased-bonus-68e2@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4c7baa0699b69edb6887a992283b389761e0e81 Mon Sep 17 00:00:00 2001
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Date: Fri, 16 May 2025 15:16:54 +0300
Subject: [PATCH] drm/i915/display: Add check for alloc_ordered_workqueue() and
alloc_workqueue()
Add check for the return value of alloc_ordered_workqueue()
and alloc_workqueue(). Furthermore, if some allocations fail,
cleanup works are added to avoid potential memory leak problem.
Fixes: 40053823baad ("drm/i915/display: move modeset probe/remove functions to intel_display_driver.c")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
Link: https://lore.kernel.org/r/20d3d096c6a4907636f8a1389b3b4dd753ca356e.17473976…
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit dcab7a228f4ea9cda3f5b0a1f0679e046d23d7f7)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c b/drivers/gpu/drm/i915/display/intel_display_driver.c
index 5c74ab5fd1aa..411fe7b918a7 100644
--- a/drivers/gpu/drm/i915/display/intel_display_driver.c
+++ b/drivers/gpu/drm/i915/display/intel_display_driver.c
@@ -244,31 +244,45 @@ int intel_display_driver_probe_noirq(struct intel_display *display)
intel_dmc_init(display);
display->wq.modeset = alloc_ordered_workqueue("i915_modeset", 0);
+ if (!display->wq.modeset) {
+ ret = -ENOMEM;
+ goto cleanup_vga_client_pw_domain_dmc;
+ }
+
display->wq.flip = alloc_workqueue("i915_flip", WQ_HIGHPRI |
WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE);
+ if (!display->wq.flip) {
+ ret = -ENOMEM;
+ goto cleanup_wq_modeset;
+ }
+
display->wq.cleanup = alloc_workqueue("i915_cleanup", WQ_HIGHPRI, 0);
+ if (!display->wq.cleanup) {
+ ret = -ENOMEM;
+ goto cleanup_wq_flip;
+ }
intel_mode_config_init(display);
ret = intel_cdclk_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_color_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_dbuf_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_bw_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_pmdemand_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
intel_init_quirks(display);
@@ -276,6 +290,12 @@ int intel_display_driver_probe_noirq(struct intel_display *display)
return 0;
+cleanup_wq_cleanup:
+ destroy_workqueue(display->wq.cleanup);
+cleanup_wq_flip:
+ destroy_workqueue(display->wq.flip);
+cleanup_wq_modeset:
+ destroy_workqueue(display->wq.modeset);
cleanup_vga_client_pw_domain_dmc:
intel_dmc_fini(display);
intel_power_domains_driver_remove(display);
The patch below does not apply to the 6.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y
git checkout FETCH_HEAD
git cherry-pick -x f4c7baa0699b69edb6887a992283b389761e0e81
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062238-motivate-deflate-5b63@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4c7baa0699b69edb6887a992283b389761e0e81 Mon Sep 17 00:00:00 2001
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Date: Fri, 16 May 2025 15:16:54 +0300
Subject: [PATCH] drm/i915/display: Add check for alloc_ordered_workqueue() and
alloc_workqueue()
Add check for the return value of alloc_ordered_workqueue()
and alloc_workqueue(). Furthermore, if some allocations fail,
cleanup works are added to avoid potential memory leak problem.
Fixes: 40053823baad ("drm/i915/display: move modeset probe/remove functions to intel_display_driver.c")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
Link: https://lore.kernel.org/r/20d3d096c6a4907636f8a1389b3b4dd753ca356e.17473976…
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit dcab7a228f4ea9cda3f5b0a1f0679e046d23d7f7)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c b/drivers/gpu/drm/i915/display/intel_display_driver.c
index 5c74ab5fd1aa..411fe7b918a7 100644
--- a/drivers/gpu/drm/i915/display/intel_display_driver.c
+++ b/drivers/gpu/drm/i915/display/intel_display_driver.c
@@ -244,31 +244,45 @@ int intel_display_driver_probe_noirq(struct intel_display *display)
intel_dmc_init(display);
display->wq.modeset = alloc_ordered_workqueue("i915_modeset", 0);
+ if (!display->wq.modeset) {
+ ret = -ENOMEM;
+ goto cleanup_vga_client_pw_domain_dmc;
+ }
+
display->wq.flip = alloc_workqueue("i915_flip", WQ_HIGHPRI |
WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE);
+ if (!display->wq.flip) {
+ ret = -ENOMEM;
+ goto cleanup_wq_modeset;
+ }
+
display->wq.cleanup = alloc_workqueue("i915_cleanup", WQ_HIGHPRI, 0);
+ if (!display->wq.cleanup) {
+ ret = -ENOMEM;
+ goto cleanup_wq_flip;
+ }
intel_mode_config_init(display);
ret = intel_cdclk_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_color_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_dbuf_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_bw_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
ret = intel_pmdemand_init(display);
if (ret)
- goto cleanup_vga_client_pw_domain_dmc;
+ goto cleanup_wq_cleanup;
intel_init_quirks(display);
@@ -276,6 +290,12 @@ int intel_display_driver_probe_noirq(struct intel_display *display)
return 0;
+cleanup_wq_cleanup:
+ destroy_workqueue(display->wq.cleanup);
+cleanup_wq_flip:
+ destroy_workqueue(display->wq.flip);
+cleanup_wq_modeset:
+ destroy_workqueue(display->wq.modeset);
cleanup_vga_client_pw_domain_dmc:
intel_dmc_fini(display);
intel_power_domains_driver_remove(display);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051202-nutrient-upswing-4a86@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 Mon Sep 17 00:00:00 2001
From: Gavin Guo <gavinguo(a)igalia.com>
Date: Mon, 21 Apr 2025 19:35:36 +0800
Subject: [PATCH] mm/huge_memory: fix dereferencing invalid pmd migration entry
When migrating a THP, concurrent access to the PMD migration entry during
a deferred split scan can lead to an invalid address access, as
illustrated below. To prevent this invalid access, it is necessary to
check the PMD migration entry and return early. In this context, there is
no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the
equality of the target folio. Since the PMD migration entry is locked, it
cannot serve as the target.
Mailing list discussion and explanation from Hugh Dickins: "An anon_vma
lookup points to a location which may contain the folio of interest, but
might instead contain another folio: and weeding out those other folios is
precisely what the "folio != pmd_folio((*pmd)" check (and the "risk of
replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
try_to_migrate_one+0x28c/0x3730
rmap_walk_anon+0x4f6/0x770
unmap_folio+0x196/0x1f0
split_huge_page_to_list_to_order+0x9f6/0x1560
deferred_split_scan+0xac5/0x12a0
shrinker_debugfs_scan_write+0x376/0x470
full_proxy_write+0x15c/0x220
vfs_write+0x2fc/0xcb0
ksys_write+0x146/0x250
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug is found by syzkaller on an internal kernel, then confirmed on
upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo(a)igalia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Gavin Shan <gshan(a)redhat.com>
Cc: Florent Revest <revest(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2a47682d1ab7..47d76d03ce30 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, bool freeze, struct folio *folio)
{
+ bool pmd_migration = is_pmd_migration_entry(*pmd);
+
VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio));
VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
@@ -3085,9 +3087,12 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
* require a folio to check the PMD against. Otherwise, there
* is a risk of replacing the wrong folio.
*/
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
- is_pmd_migration_entry(*pmd)) {
- if (folio && folio != pmd_folio(*pmd))
+ if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) {
+ /*
+ * Do not apply pmd_folio() to a migration entry; and folio lock
+ * guarantees that it must be of the wrong folio anyway.
+ */
+ if (folio && (pmd_migration || folio != pmd_folio(*pmd)))
return;
__split_huge_pmd_locked(vma, pmd, address, freeze);
}