The quilt patch titled
Subject: maple_tree: reload mas before the second call for mas_empty_area
has been removed from the -mm tree. Its filename was
maple_tree-reload-mas-before-the-second-call-for-mas_empty_area.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yang Erkun <yangerkun(a)huawei.com>
Subject: maple_tree: reload mas before the second call for mas_empty_area
Date: Sat, 14 Dec 2024 17:30:05 +0800
Change the LONG_MAX in simple_offset_add() to 1024, and then run the following:
[root@fedora ~]# mkdir /tmp/dir
[root@fedora ~]# for i in {1..1024}; do touch /tmp/dir/$i; done
touch: cannot touch '/tmp/dir/1024': Device or resource busy
[root@fedora ~]# rm /tmp/dir/123
[root@fedora ~]# touch /tmp/dir/1024
[root@fedora ~]# rm /tmp/dir/100
[root@fedora ~]# touch /tmp/dir/1025
touch: cannot touch '/tmp/dir/1025': Device or resource busy
After we delete file 100, there is actually an empty entry in the tree, but
the subsequent create fails unexpectedly.
mas_alloc_cyclic() has two chances to find an empty entry. It first searches
the range between range_lo and range_hi; if no empty entry exists there and
range_lo > min, it retries with the range between min and range_hi. However,
the first call to mas_empty_area() may leave mas marked as EBUSY, in which
case the second call to mas_empty_area() fails immediately. Fix this by
resetting mas before the second call to mas_empty_area().
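For context, a condensed sketch of the caller path that hits this (simplified
from simple_offset_add() in fs/libfs.c; the octx/dentry names follow that
caller and error handling is abridged):

	unsigned long offset;
	int ret;

	/*
	 * Allocate the next free directory offset, wrapping around once
	 * [next_offset, LONG_MAX] is exhausted.  The wrapped retry inside
	 * mtree_alloc_cyclic() is the path that the stale EBUSY state broke.
	 */
	ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
				 LONG_MAX, &octx->next_offset, GFP_KERNEL);
	if (ret < 0)
		return ret;

Internally, mtree_alloc_cyclic() performs the two-pass search described
above via mas_alloc_cyclic().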
[Liam.Howlett(a)Oracle.com: fix mas_alloc_cyclic() second search]
Link: https://lore.kernel.org/all/20241216060600.287B4C4CED0@smtp.kernel.org/
Link: https://lkml.kernel.org/r/20241216190113.1226145-2-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20241214093005.72284-1-yangerkun@huaweicloud.com
Fixes: 9b6713cc7522 ("maple_tree: Add mtree_alloc_cyclic()")
Signed-off-by: Yang Erkun <yangerkun(a)huawei.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Chuck Lever <chuck.lever(a)oracle.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 1 +
1 file changed, 1 insertion(+)
--- a/lib/maple_tree.c~maple_tree-reload-mas-before-the-second-call-for-mas_empty_area
+++ a/lib/maple_tree.c
@@ -4354,6 +4354,7 @@ int mas_alloc_cyclic(struct ma_state *ma
 		ret = 1;
 	}
 	if (ret < 0 && range_lo > min) {
+		mas_reset(mas);
 		ret = mas_empty_area(mas, min, range_hi, 1);
 		if (ret == 0)
 			ret = 1;
_
Patches currently in -mm which might be from yangerkun(a)huawei.com are
The quilt patch titled
Subject: mm/readahead: fix large folio support in async readahead
has been removed from the -mm tree. Its filename was
mm-readahead-fix-large-folio-support-in-async-readahead.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yafang Shao <laoar.shao(a)gmail.com>
Subject: mm/readahead: fix large folio support in async readahead
Date: Fri, 6 Dec 2024 16:30:25 +0800
When testing large folio support with XFS on our servers, we observed that
only a few large folios are mapped when reading large files via mmap.
After a thorough analysis, I identified that it was caused by the
`/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
parameter is set to 128KB. After tuning it to 2MB, large folios work as
expected. However, I believe the large folio behavior should not depend on
the value of read_ahead_kb; it would be more robust if the kernel could
adapt to it automatically.
With /sys/block/*/queue/read_ahead_kb set to 128KB and performing a
sequential read on a 1GB file using MADV_HUGEPAGE, the differences in
/proc/meminfo are as follows:
- before this patch
FileHugePages: 18432 kB
FilePmdMapped: 4096 kB
- after this patch
FileHugePages: 1067008 kB
FilePmdMapped: 1048576 kB
This shows that after applying the patch, the entire 1GB file is mapped to
huge pages. The stable list is CCed, as without this patch, large folios
don't function optimally in the readahead path.
It's worth noting that if read_ahead_kb is set to a larger value that
isn't aligned with huge page sizes (e.g., 4MB + 128KB), it may still fail
to map to hugepages.
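As a worked example (assuming a 4KB page size, so max_pages = 32 pages for
read_ahead_kb = 128KB, and a PMD-sized window of 512 pages under
MADV_HUGEPAGE; get_next_ra_size() caps its result at max_pages):

    max_pages            = 128KB / 4KB = 32
    ra->size (HUGEPAGE)  = 2MB / 4KB   = 512
    before: ra->size = get_next_ra_size(ra, 32)            -> 32
    after:  ra->size = max(512, get_next_ra_size(ra, 32))  -> 512

Before the patch the async window therefore collapsed from 512 pages to 32
and readahead fell back to small folios; after it, the PMD-sized window is
preserved.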
Link: https://lkml.kernel.org/r/20241108141710.9721-1-laoar.shao@gmail.com
Link: https://lkml.kernel.org/r/20241206083025.3478-1-laoar.shao@gmail.com
Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com>
Tested-by: kernel test robot <oliver.sang(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/readahead.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/readahead.c~mm-readahead-fix-large-folio-support-in-async-readahead
+++ a/mm/readahead.c
@@ -646,7 +646,11 @@ void page_cache_async_ra(struct readahea
 			1UL << order);
 	if (index == expected) {
 		ra->start += ra->size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		/*
+		 * In the case of MADV_HUGEPAGE, the actual size might exceed
+		 * the readahead window.
+		 */
+		ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
 		ra->async_size = ra->size;
 		goto readit;
 	}
_
Patches currently in -mm which might be from laoar.shao(a)gmail.com are
kernel-remove-get_task_comm-and-print-task-comm-directly.patch
arch-remove-get_task_comm-and-print-task-comm-directly.patch
net-remove-get_task_comm-and-print-task-comm-directly.patch
security-remove-get_task_comm-and-print-task-comm-directly.patch
drivers-remove-get_task_comm-and-print-task-comm-directly.patch
The quilt patch titled
Subject: mm: vmscan: account for free pages to prevent infinite loop in throttle_direct_reclaim()
has been removed from the -mm tree. Its filename was
mm-vmscan-account-for-free-pages-to-prevent-infinite-loop-in-throttle_direct_reclaim.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Seiji Nishikawa <snishika(a)redhat.com>
Subject: mm: vmscan: account for free pages to prevent infinite loop in throttle_direct_reclaim()
Date: Sun, 1 Dec 2024 01:12:34 +0900
The task sometimes continues looping in throttle_direct_reclaim() because
allow_direct_reclaim(pgdat) keeps returning false.
#0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
#1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
#2 [ffff80002cb6f990] schedule at ffff800008abc50c
#3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
#4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
#5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
#6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
#7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
#8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
#9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4
At this point, the pgdat contains the following two zones:
NODE: 4 ZONE: 0 ADDR: ffff00817fffe540 NAME: "DMA32"
SIZE: 20480 MIN/LOW/HIGH: 11/28/45
VM_STAT:
NR_FREE_PAGES: 359
NR_ZONE_INACTIVE_ANON: 18813
NR_ZONE_ACTIVE_ANON: 0
NR_ZONE_INACTIVE_FILE: 50
NR_ZONE_ACTIVE_FILE: 0
NR_ZONE_UNEVICTABLE: 0
NR_ZONE_WRITE_PENDING: 0
NR_MLOCK: 0
NR_BOUNCE: 0
NR_ZSPAGES: 0
NR_FREE_CMA_PAGES: 0
NODE: 4 ZONE: 1 ADDR: ffff00817fffec00 NAME: "Normal"
SIZE: 8454144 PRESENT: 98304 MIN/LOW/HIGH: 68/166/264
VM_STAT:
NR_FREE_PAGES: 146
NR_ZONE_INACTIVE_ANON: 94668
NR_ZONE_ACTIVE_ANON: 3
NR_ZONE_INACTIVE_FILE: 735
NR_ZONE_ACTIVE_FILE: 78
NR_ZONE_UNEVICTABLE: 0
NR_ZONE_WRITE_PENDING: 0
NR_MLOCK: 0
NR_BOUNCE: 0
NR_ZSPAGES: 0
NR_FREE_CMA_PAGES: 0
In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of
inactive/active file-backed pages calculated in zone_reclaimable_pages()
based on the result of zone_page_state_snapshot() is zero.
Additionally, since this system lacks swap, the calculation of inactive/
active anonymous pages is skipped.
crash> p nr_swap_pages
nr_swap_pages = $1937 = {
counter = 0
}
As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to
the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having
free pages significantly exceeding the high watermark.
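For reference, a condensed sketch of the skipping logic in
allow_direct_reclaim() (abridged from mm/vmscan.c; the throttling and kswapd
wakeup details are omitted):

static bool allow_direct_reclaim(pg_data_t *pgdat)
{
	unsigned long pfmemalloc_reserve = 0, free_pages = 0;
	struct zone *zone;
	bool wmark_ok;
	int i;

	/* the fallback: stop throttling once kswapd has failed repeatedly */
	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
		return true;

	for (i = 0; i <= ZONE_NORMAL; i++) {
		zone = &pgdat->node_zones[i];
		if (!managed_zone(zone))
			continue;

		/*
		 * A zone whose reclaimable count is 0 is skipped entirely,
		 * which is how ZONE_DMA32 drops out despite its free pages.
		 */
		if (!zone_reclaimable_pages(zone))
			continue;

		pfmemalloc_reserve += min_wmark_pages(zone);
		free_pages += zone_page_state_snapshot(zone, NR_FREE_PAGES);
	}

	/* throttle while free pages stay below half of the combined reserve */
	wmark_ok = free_pages > pfmemalloc_reserve / 2;
	return wmark_ok;
}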
The problem is that pgdat->kswapd_failures hasn't been incremented.
crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
$1935 = 0x0
This is because the node is deemed balanced. The node balancing logic in
balance_pgdat() evaluates all zones collectively. If one or more zones
(e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the
entire node is deemed balanced. This causes balance_pgdat() to exit early
before incrementing the kswapd_failures, as it considers the overall
memory state acceptable, even though some zones (like ZONE_NORMAL) remain
under significant pressure.
The patch ensures that zone_reclaimable_pages() includes free pages
(NR_FREE_PAGES) in its calculation when no other reclaimable pages are
available (e.g., file-backed or anonymous pages). This change prevents
zones like ZONE_DMA32, which have sufficient free pages, from being
mistakenly deemed unreclaimable. By doing so, the patch ensures proper
node balancing, avoids masking pressure on other zones like ZONE_NORMAL,
and prevents infinite loops in throttle_direct_reclaim() caused by
allow_direct_reclaim(pgdat) repeatedly returning false.
The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused
by a node being incorrectly deemed balanced despite pressure in certain
zones, such as ZONE_NORMAL. This issue arises from
zone_reclaimable_pages() returning 0 for zones without reclaimable file-
backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
free pages to be skipped.
The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored
during reclaim, masking pressure in other zones. Consequently,
pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
mechanisms in allow_direct_reclaim() from being triggered, leading to an
infinite loop in throttle_direct_reclaim().
This patch modifies zone_reclaimable_pages() to account for free pages
(NR_FREE_PAGES) when no other reclaimable pages exist. This ensures zones
with sufficient free pages are not skipped, enabling proper balancing and
reclaim behavior.
[akpm(a)linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20241130164346.436469-1-snishika@redhat.com
Link: https://lkml.kernel.org/r/20241130161236.433747-2-snishika@redhat.com
Fixes: 5a1c84b404a7 ("mm: remove reclaim and compaction retry approximations")
Signed-off-by: Seiji Nishikawa <snishika(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-account-for-free-pages-to-prevent-infinite-loop-in-throttle_direct_reclaim
+++ a/mm/vmscan.c
@@ -374,7 +374,14 @@ unsigned long zone_reclaimable_pages(str
 	if (can_reclaim_anon_pages(NULL, zone_to_nid(zone), NULL))
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
-
+	/*
+	 * If there are no reclaimable file-backed or anonymous pages,
+	 * ensure zones with sufficient free pages are not skipped.
+	 * This prevents zones like DMA32 from being ignored in reclaim
+	 * scenarios where they can still help alleviate memory pressure.
+	 */
+	if (nr == 0)
+		nr = zone_page_state_snapshot(zone, NR_FREE_PAGES);
 	return nr;
 }
_
Patches currently in -mm which might be from snishika(a)redhat.com are
The quilt patch titled
Subject: mm: reinstate ability to map write-sealed memfd mappings read-only
has been removed from the -mm tree. Its filename was
mm-reinstate-ability-to-map-write-sealed-memfd-mappings-read-only.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: mm: reinstate ability to map write-sealed memfd mappings read-only
Date: Thu, 28 Nov 2024 15:06:17 +0000
Patch series "mm: reinstate ability to map write-sealed memfd mappings
read-only".
In commit 158978945f31 ("mm: perform the mapping_map_writable() check
after call_mmap()") (and preceding changes in the same series) it became
possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only.
Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
behaviour") unintentionally undid this logic by moving the
mapping_map_writable() check before the shmem_mmap() hook is invoked,
thereby regressing this change.
This series reworks how we both permit write-sealed mappings being mapped
read-only and disallow mprotect() from undoing the write-seal, fixing this
regression.
We also add a regression test to ensure that we do not accidentally
regress this in future.
Thanks to Julian Orth for reporting this regression.
This patch (of 2):
In commit 158978945f31 ("mm: perform the mapping_map_writable() check
after call_mmap()") (and preceding changes in the same series) it became
possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only.
This had previously been unnecessarily disallowed, despite the man page
documentation indicating that it should be permitted, thereby limiting the
usefulness of the F_SEAL_WRITE logic.
We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE
seal (one which disallows future writes to the memfd) to also be used for
F_SEAL_WRITE.
For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a
read-only mapping to disallow mprotect() from overriding the seal - an
operation performed by seal_check_write(), invoked from shmem_mmap(), the
f_op->mmap() hook used by shmem mappings.
By extending this to F_SEAL_WRITE and critically - checking
mapping_map_writable() to determine if we may map the memfd AFTER we
invoke shmem_mmap() - the desired logic becomes possible. This is because
mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will
have cleared.
Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
behaviour") unintentionally undid this logic by moving the
mapping_map_writable() check before the shmem_mmap() hook is invoked,
thereby regressing this change.
We reinstate this functionality by moving the check out of shmem_mmap()
and instead performing it in do_mmap() at the point at which VMA flags are
being determined, which seems in any case to be a more appropriate place
in which to make this determination.
In order to achieve this we rework memfd seal logic to allow us access to
this information using existing logic and eliminate the clearing of
VM_MAYWRITE from seal_check_write() which we are performing in do_mmap()
instead.
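As a userspace illustration of the reinstated behaviour, a minimal sketch (a
hypothetical demo, not part of this series; the regression test added later
in the series is the authoritative check):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = memfd_create("demo", MFD_ALLOW_SEALING);
	void *p;

	if (fd < 0)
		return 1;
	ftruncate(fd, 4096);
	fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);

	/* With the fix, a read-only MAP_SHARED mapping of a write-sealed
	 * memfd succeeds again; it failed while the regression stood. */
	p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	printf("mmap: %s\n", p == MAP_FAILED ? strerror(errno) : "ok");

	/* VM_MAYWRITE is cleared for such mappings, so the seal cannot be
	 * undone by mprotect() afterwards. */
	if (p != MAP_FAILED && mprotect(p, 4096, PROT_READ | PROT_WRITE))
		printf("mprotect: %s (expected)\n", strerror(errno));
	return 0;
}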
Link: https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Julian Orth <ju.orth(a)gmail.com>
Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=U7s4L_Jem5x3AaPZx…
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memfd.h | 14 +++++++++
include/linux/mm.h | 58 +++++++++++++++++++++++++++-------------
mm/memfd.c | 2 -
mm/mmap.c | 4 ++
4 files changed, 59 insertions(+), 19 deletions(-)
--- a/include/linux/memfd.h~mm-reinstate-ability-to-map-write-sealed-memfd-mappings-read-only
+++ a/include/linux/memfd.h
@@ -7,6 +7,7 @@
 #ifdef CONFIG_MEMFD_CREATE
 extern long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg);
 struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx);
+unsigned int *memfd_file_seals_ptr(struct file *file);
 #else
 static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned int a)
 {
@@ -16,6 +17,19 @@ static inline struct folio *memfd_alloc_
 {
 	return ERR_PTR(-EINVAL);
 }
+
+static inline unsigned int *memfd_file_seals_ptr(struct file *file)
+{
+	return NULL;
+}
 #endif
 
+/* Retrieve memfd seals associated with the file, if any. */
+static inline unsigned int memfd_file_seals(struct file *file)
+{
+	unsigned int *sealsp = memfd_file_seals_ptr(file);
+
+	return sealsp ? *sealsp : 0;
+}
+
 #endif /* __LINUX_MEMFD_H */
--- a/include/linux/mm.h~mm-reinstate-ability-to-map-write-sealed-memfd-mappings-read-only
+++ a/include/linux/mm.h
@@ -4101,6 +4101,37 @@ void mem_dump_obj(void *object);
 static inline void mem_dump_obj(void *object) {}
 #endif
 
+static inline bool is_write_sealed(int seals)
+{
+	return seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE);
+}
+
+/**
+ * is_readonly_sealed - Checks whether write-sealed but mapped read-only,
+ *                      in which case writes should be disallowing moving
+ *                      forwards.
+ * @seals: the seals to check
+ * @vm_flags: the VMA flags to check
+ *
+ * Returns whether readonly sealed, in which case writess should be disallowed
+ * going forward.
+ */
+static inline bool is_readonly_sealed(int seals, vm_flags_t vm_flags)
+{
+	/*
+	 * Since an F_SEAL_[FUTURE_]WRITE sealed memfd can be mapped as
+	 * MAP_SHARED and read-only, take care to not allow mprotect to
+	 * revert protections on such mappings. Do this only for shared
+	 * mappings. For private mappings, don't need to mask
+	 * VM_MAYWRITE as we still want them to be COW-writable.
+	 */
+	if (is_write_sealed(seals) &&
+	    ((vm_flags & (VM_SHARED | VM_WRITE)) == VM_SHARED))
+		return true;
+
+	return false;
+}
+
 /**
  * seal_check_write - Check for F_SEAL_WRITE or F_SEAL_FUTURE_WRITE flags and
  *                    handle them.
@@ -4112,24 +4143,15 @@ static inline void mem_dump_obj(void *ob
  */
 static inline int seal_check_write(int seals, struct vm_area_struct *vma)
 {
-	if (seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
-		/*
-		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
-		 * write seals are active.
-		 */
-		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
-			return -EPERM;
-
-		/*
-		 * Since an F_SEAL_[FUTURE_]WRITE sealed memfd can be mapped as
-		 * MAP_SHARED and read-only, take care to not allow mprotect to
-		 * revert protections on such mappings. Do this only for shared
-		 * mappings. For private mappings, don't need to mask
-		 * VM_MAYWRITE as we still want them to be COW-writable.
-		 */
-		if (vma->vm_flags & VM_SHARED)
-			vm_flags_clear(vma, VM_MAYWRITE);
-	}
+	if (!is_write_sealed(seals))
+		return 0;
+
+	/*
+	 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
+	 * write seals are active.
+	 */
+	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
+		return -EPERM;
+
 	return 0;
 }
--- a/mm/memfd.c~mm-reinstate-ability-to-map-write-sealed-memfd-mappings-read-only
+++ a/mm/memfd.c
@@ -170,7 +170,7 @@ static int memfd_wait_for_pins(struct ad
 	return error;
 }
 
-static unsigned int *memfd_file_seals_ptr(struct file *file)
+unsigned int *memfd_file_seals_ptr(struct file *file)
 {
 	if (shmem_file(file))
 		return &SHMEM_I(file_inode(file))->seals;
--- a/mm/mmap.c~mm-reinstate-ability-to-map-write-sealed-memfd-mappings-read-only
+++ a/mm/mmap.c
@@ -47,6 +47,7 @@
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
 #include <linux/ksm.h>
+#include <linux/memfd.h>
 #include <linux/uaccess.h>
 
 #include <asm/cacheflush.h>
@@ -368,6 +369,7 @@ unsigned long do_mmap(struct file *file,
 	if (file) {
 		struct inode *inode = file_inode(file);
+		unsigned int seals = memfd_file_seals(file);
 		unsigned long flags_mask;
 
 		if (!file_mmap_ok(file, inode, pgoff, len))
@@ -408,6 +410,8 @@ unsigned long do_mmap(struct file *file,
 			vm_flags |= VM_SHARED | VM_MAYSHARE;
 			if (!(file->f_mode & FMODE_WRITE))
 				vm_flags &= ~(VM_MAYWRITE | VM_SHARED);
+			else if (is_readonly_sealed(seals, vm_flags))
+				vm_flags &= ~VM_MAYWRITE;
 			fallthrough;
 		case MAP_PRIVATE:
 			if (!(file->f_mode & FMODE_READ))
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-vma-move-brk-internals-to-mm-vmac.patch
mm-vma-move-brk-internals-to-mm-vmac-fix.patch
mm-vma-move-unmapped_area-internals-to-mm-vmac.patch
mm-abstract-get_arg_page-stack-expansion-and-mmap-read-lock.patch
mm-vma-move-stack-expansion-logic-to-mm-vmac.patch
mm-vma-move-__vm_munmap-to-mm-vmac.patch
selftests-mm-add-fork-cow-guard-page-test.patch
mm-enforce-__must_check-on-vma-merge-and-split.patch
mm-perform-all-memfd-seal-checks-in-a-single-place.patch
mm-perform-all-memfd-seal-checks-in-a-single-place-fix.patch
maintainers-update-memory-mapping-section.patch
mm-assert-mmap-write-lock-held-on-do_mmap-mmap_region.patch
mm-add-comments-to-do_mmap-mmap_region-and-vm_mmap.patch
tools-testing-add-simple-__mmap_region-userland-test.patch
The quilt patch titled
Subject: maple_tree: fix mas_alloc_cyclic() second search
has been removed from the -mm tree. Its filename was
maple_tree-reload-mas-before-the-second-call-for-mas_empty_area-fix.patch
This patch was dropped because it was folded into maple_tree-reload-mas-before-the-second-call-for-mas_empty_area.patch
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)Oracle.com>
Subject: maple_tree: fix mas_alloc_cyclic() second search
Date: Mon, 16 Dec 2024 14:01:12 -0500
The first search may leave the maple state in an error state. Reset the
maple state before the second search so that the search has a chance of
executing correctly after an exhausted first search.
Link: https://lore.kernel.org/all/20241216060600.287B4C4CED0@smtp.kernel.org/
Link: https://lkml.kernel.org/r/20241216190113.1226145-2-Liam.Howlett@oracle.com
Fixes: 9b6713cc7522 ("maple_tree: Add mtree_alloc_cyclic()")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Yang Erkun <yangerkun(a)huawei.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Chuck Lever <chuck.lever(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/lib/maple_tree.c~maple_tree-reload-mas-before-the-second-call-for-mas_empty_area-fix
+++ a/lib/maple_tree.c
@@ -4346,7 +4346,6 @@ int mas_alloc_cyclic(struct ma_state *ma
 {
 	unsigned long min = range_lo;
 	int ret = 0;
-	struct ma_state m = *mas;
 
 	range_lo = max(min, *next);
 	ret = mas_empty_area(mas, range_lo, range_hi, 1);
@@ -4355,7 +4354,7 @@ int mas_alloc_cyclic(struct ma_state *ma
 		ret = 1;
 	}
 	if (ret < 0 && range_lo > min) {
-		*mas = m;
+		mas_reset(mas);
 		ret = mas_empty_area(mas, min, range_hi, 1);
 		if (ret == 0)
 			ret = 1;
_
Patches currently in -mm which might be from Liam.Howlett(a)Oracle.com are
maple_tree-reload-mas-before-the-second-call-for-mas_empty_area.patch
test_maple_tree-test-exhausted-upper-limit-of-mtree_alloc_cyclic.patch
From: Joshua Washington <joshwash(a)google.com>
Commit ba0925c34e0f ("gve: process XSK TX descriptors as part of RX NAPI")
moved XSK TX processing to be part of the RX NAPI. However, that commit
did not include triggering the RX NAPI in gve_xsk_wakeup. This is
necessary because the TX NAPI only processes TX completions, meaning
that a TX wakeup would not actually trigger XSK descriptor processing.
Also, the branch on XDP_WAKEUP_TX was supposed to have been removed, as
the NAPI should be scheduled whether the wakeup is for RX or TX.
Fixes: ba0925c34e0f ("gve: process XSK TX descriptors as part of RX NAPI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joshua Washington <joshwash(a)google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi(a)google.com>
---
drivers/net/ethernet/google/gve/gve_main.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 09fb7f16f73e..8a8f6ab12a98 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1714,7 +1714,7 @@ static int gve_xsk_pool_disable(struct net_device *dev,
 static int gve_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
 {
 	struct gve_priv *priv = netdev_priv(dev);
-	int tx_queue_id = gve_xdp_tx_queue_id(priv, queue_id);
+	struct napi_struct *napi;
 
 	if (!gve_get_napi_enabled(priv))
 		return -ENETDOWN;
@@ -1722,19 +1722,12 @@ static int gve_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
 	if (queue_id >= priv->rx_cfg.num_queues || !priv->xdp_prog)
 		return -EINVAL;
 
-	if (flags & XDP_WAKEUP_TX) {
-		struct gve_tx_ring *tx = &priv->tx[tx_queue_id];
-		struct napi_struct *napi =
-			&priv->ntfy_blocks[tx->ntfy_id].napi;
-
-		if (!napi_if_scheduled_mark_missed(napi)) {
-			/* Call local_bh_enable to trigger SoftIRQ processing */
-			local_bh_disable();
-			napi_schedule(napi);
-			local_bh_enable();
-		}
-
-		tx->xdp_xsk_wakeup++;
+	napi = &priv->ntfy_blocks[gve_rx_idx_to_ntfy(priv, queue_id)].napi;
+	if (!napi_if_scheduled_mark_missed(napi)) {
+		/* Call local_bh_enable to trigger SoftIRQ processing */
+		local_bh_disable();
+		napi_schedule(napi);
+		local_bh_enable();
 	}
 
 	return 0;
--
2.47.1.613.gc27f4b7a9f-goog
Greg,
A Gentoo user found an issue with btrfs where:
"... upstream commit a5d74fa24752
("btrfs: avoid unnecessary device path update for the same device") breaks
6.6 LTS kernel behavior where previously lsblk can properly show multiple
mount points of the same btrfs"
There's a revert on the bug report [1].
The fix by the author can be found on linux-btrfs [2]
[1] https://bugs.gentoo.org/947126
[2] https://lore.kernel.org/linux-btrfs/30aefd8b4e8c1f0c5051630b106a1ff3570d28e…
Mike
--
Mike Pagano
Gentoo Developer - Kernel Project
E-Mail : mpagano(a)gentoo.org
GnuPG FP : 52CC A0B0 F631 0B17 0142 F83F 92A6 DBEC 81F2 B137
Public Key : http://pgp.mit.edu/pks/lookup?search=0x92A6DBEC81F2B137&op=index
The field "eip" (instruction pointer) and "esp" (stack pointer) of a task
can be read from /proc/PID/stat. These fields can be interesting for
coredump.
However, these fields were disabled by commit 0a1eb2d474ed ("fs/proc: Stop
reporting eip and esp in /proc/PID/stat"), because it is generally unsafe
to do so. But it is safe for a coredumping process, and therefore
exceptions were made:
- for a coredumping thread by commit fd7d56270b52 ("fs/proc: Report
eip/esp in /prod/PID/stat for coredumping").
- for all other threads in a coredumping process by commit cb8f381f1613
("fs/proc/array.c: allow reporting eip/esp for all coredumping
threads").
The above two commits check the PF_DUMPCORE flag to determine a coredump thread
and the PF_EXITING flag for the other threads.
Unfortunately, commit 92307383082d ("coredump: Don't perform any cleanups
before dumping core") moved coredump to happen earlier and before PF_EXITING is
set. Thus, checking PF_EXITING is no longer the correct way to determine
threads in a coredumping process.
Instead of PF_EXITING, use PF_POSTCOREDUMP to determine the other threads.
The PF_EXITING check was originally added for coredumping, so it can
probably be removed now, but it doesn't hurt to keep it.
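For reference, a minimal sketch of reading the two fields from userspace
(per proc(5), kstkesp is field 29 and kstkeip is field 30; both read as 0
unless one of the flag checks above permits reporting):

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[64], buf[4096], *p, *tok;
	int field;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/stat",
		 argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f || !fgets(buf, sizeof(buf), f))
		return 1;
	fclose(f);

	/* Tokenize only after the last ')' because the comm field (2) may
	 * itself contain spaces or parentheses. */
	p = strrchr(buf, ')');
	if (!p)
		return 1;
	for (field = 3, tok = strtok(p + 1, " ");
	     tok; field++, tok = strtok(NULL, " ")) {
		if (field == 29)
			printf("kstkesp: %s\n", tok);
		else if (field == 30)
			printf("kstkeip: %s\n", tok);
	}
	return 0;
}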
Fixes: 92307383082d ("coredump: Don't perform any cleanups before dumping core")
Cc: stable(a)vger.kernel.org
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Acked-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
---
fs/proc/array.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 55ed3510d2bb..d6a0369caa93 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -500,7 +500,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	 * a program is not able to use ptrace(2) in that case. It is
 	 * safe because the task has stopped executing permanently.
 	 */
-	if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE))) {
+	if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE|PF_POSTCOREDUMP))) {
 		if (try_get_task_stack(task)) {
 			eip = KSTK_EIP(task);
 			esp = KSTK_ESP(task);
--
2.39.5
Hi, I'm experiencing UBSAN array-index-out-of-bounds errors while using
my Framework 13" AMD laptop with its Mediatek MT7922 wifi adapter
(mt7921e).
It seems to happen only once on boot, and occurs with both kernel
versions 6.12.7 and 6.13-rc4, both compiled from vanilla upstream kernel
sources on Fedora 41 using the kernel.org LLVM toolchain (19.1.6).
I can try some other kernel series if necessary, and also a bisect if I
find a working version, but that may take me a while.
I wasn't sure whether I should mark this as a regression, as I don't know
whether (or at which version) there is a working kernel at this point.
Thanks.
----
[ 17.754417] UBSAN: array-index-out-of-bounds in /data/linux/net/wireless/scan.c:766:2
[ 17.754423] index 0 is out of range for type 'struct ieee80211_channel *[] __counted_by(n_channels)' (aka 'struct ieee80211_channel *[]')
[ 17.754427] CPU: 13 UID: 0 PID: 620 Comm: kworker/u64:10 Tainted: G T 6.13.0-rc4 #9
[ 17.754433] Tainted: [T]=RANDSTRUCT
[ 17.754435] Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
[ 17.754438] Workqueue: events_unbound cfg80211_wiphy_work
[ 17.754446] Call Trace:
[ 17.754449] <TASK>
[ 17.754452] dump_stack_lvl+0x82/0xc0
[ 17.754459] __ubsan_handle_out_of_bounds+0xe7/0x110
[ 17.754464] ? srso_alias_return_thunk+0x5/0xfbef5
[ 17.754470] ? __kmalloc_noprof+0x1a7/0x280
[ 17.754477] cfg80211_scan_6ghz+0x3bb/0xfd0
[ 17.754482] ? srso_alias_return_thunk+0x5/0xfbef5
[ 17.754486] ? try_to_wake_up+0x368/0x4c0
[ 17.754491] ? try_to_wake_up+0x1a9/0x4c0
[ 17.754496] ___cfg80211_scan_done+0xa9/0x1e0
[ 17.754500] cfg80211_wiphy_work+0xb7/0xe0
[ 17.754504] process_scheduled_works+0x205/0x3a0
[ 17.754509] worker_thread+0x24a/0x300
[ 17.754514] ? __cfi_worker_thread+0x10/0x10
[ 17.754519] kthread+0x158/0x180
[ 17.754524] ? __cfi_kthread+0x10/0x10
[ 17.754528] ret_from_fork+0x40/0x50
[ 17.754534] ? __cfi_kthread+0x10/0x10
[ 17.754538] ret_from_fork_asm+0x11/0x30
[ 17.754544] </TASK>
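For context on the annotation named in the report, a hedged sketch
(simplified; the real structure is cfg80211_scan_request in
include/net/cfg80211.h):

struct scan_request {
	/* ... */
	u32 n_channels;
	struct ieee80211_channel *channels[] __counted_by(n_channels);
};

/*
 * With __counted_by(), UBSAN checks every channels[i] access against the
 * current value of n_channels at run time, so storing channels[0] before
 * n_channels has been raised past 0 produces exactly this kind of
 * "index 0 is out of range" report.
 */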