February 2025 - Linux-stable-mirror

[merged mm-hotfixes-stable] mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: pgtable: fix incorrect reclaim of non-empty PTE pages has been removed from the -mm tree. Its filename was mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Qi Zheng <zhengqi.arch(a)bytedance.com> Subject: mm: pgtable: fix incorrect reclaim of non-empty PTE pages Date: Tue, 11 Feb 2025 15:26:25 +0800 In zap_pte_range(), if the pte lock was released midway, the pte entries may be refilled with physical pages by another thread, which may cause a non-empty PTE page to be reclaimed and eventually cause the system to crash. To fix it, fall back to the slow path in this case to recheck if all pte entries are still none. Link: https://lkml.kernel.org/r/20250211072625.89188-1-zhengqi.arch@bytedance.com Fixes: 6375e95f381e ("mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)") Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com> Reported-by: Christian Brauner <brauner(a)kernel.org> Closes: https://lore.kernel.org/all/20250207-anbot-bankfilialen-acce9d79a2c7@braune… Reported-by: Qu Wenruo <quwenruo.btrfs(a)gmx.com> Closes: https://lore.kernel.org/all/152296f3-5c81-4a94-97f3-004108fba7be@gmx.com/ Tested-by: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Cc: "Darrick J. Wong" <djwong(a)kernel.org> Cc: Dave Chinner <david(a)fromorbit.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Jann Horn <jannh(a)google.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: Zi Yan <ziy(a)nvidia.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) --- a/mm/memory.c~mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages +++ a/mm/memory.c @@ -1719,7 +1719,7 @@ static unsigned long zap_pte_range(struc pmd_t pmdval; unsigned long start = addr; bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details); - bool direct_reclaim = false; + bool direct_reclaim = true; int nr; retry: @@ -1734,8 +1734,10 @@ retry: do { bool any_skipped = false; - if (need_resched()) + if (need_resched()) { + direct_reclaim = false; break; + } nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss, &force_flush, &force_break, &any_skipped); @@ -1743,11 +1745,20 @@ retry: can_reclaim_pt = false; if (unlikely(force_break)) { addr += nr * PAGE_SIZE; + direct_reclaim = false; break; } } while (pte += nr, addr += PAGE_SIZE * nr, addr != end); - if (can_reclaim_pt && addr == end) + /* + * Fast path: try to hold the pmd lock and unmap the PTE page. + * + * If the pte lock was released midway (retry case), or if the attempt + * to hold the pmd lock failed, then we need to recheck all pte entries + * to ensure they are still none, thereby preventing the pte entries + * from being repopulated by another thread. + */ + if (can_reclaim_pt && direct_reclaim && addr == end) direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval); add_mm_rss_vec(mm, rss); _ Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are arm-pgtable-fix-null-pointer-dereference-issue.patch

8 months, 3 weeks

1
0
0 0

[merged mm-hotfixes-stable] mm-migrate_device-dont-add-folio-to-be-freed-to-lru-in-migrate_device_finalize.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() has been removed from the -mm tree. Its filename was mm-migrate_device-dont-add-folio-to-be-freed-to-lru-in-migrate_device_finalize.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: David Hildenbrand <david(a)redhat.com> Subject: mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() Date: Mon, 10 Feb 2025 17:13:17 +0100 If migration succeeded, we called folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from the old to the new folio. This will set memcg_data of the old folio to 0. Similarly, if migration failed, memcg_data of the dst folio is left unset. If we call folio_putback_lru() on such folios (memcg_data == 0), we will add the folio to be freed to the LRU, making memcg code unhappy. Running the hmm selftests: # ./hmm-tests ... # RUN hmm.hmm_device_private.migrate ... [ 102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00 [ 102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff) [ 102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9 [ 102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000 [ 102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled()) [ 102.087230][T14893] ------------[ cut here ]------------ [ 102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.090478][T14893] Modules linked in: [ 102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151 [ 102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 [ 102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.096104][T14893] Code: ... [ 102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293 [ 102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426 [ 102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880 [ 102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000 [ 102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8 [ 102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000 [ 102.108830][T14893] FS: 00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 [ 102.110643][T14893] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0 [ 102.113478][T14893] PKRU: 55555554 [ 102.114172][T14893] Call Trace: [ 102.114805][T14893] <TASK> [ 102.115397][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.116547][T14893] ? __warn.cold+0x110/0x210 [ 102.117461][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.118667][T14893] ? report_bug+0x1b9/0x320 [ 102.119571][T14893] ? handle_bug+0x54/0x90 [ 102.120494][T14893] ? exc_invalid_op+0x17/0x50 [ 102.121433][T14893] ? asm_exc_invalid_op+0x1a/0x20 [ 102.122435][T14893] ? __wake_up_klogd.part.0+0x76/0xd0 [ 102.123506][T14893] ? dump_page+0x4f/0x60 [ 102.124352][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.125500][T14893] folio_batch_move_lru+0xd4/0x200 [ 102.126577][T14893] ? __pfx_lru_add+0x10/0x10 [ 102.127505][T14893] __folio_batch_add_and_move+0x391/0x720 [ 102.128633][T14893] ? __pfx_lru_add+0x10/0x10 [ 102.129550][T14893] folio_putback_lru+0x16/0x80 [ 102.130564][T14893] migrate_device_finalize+0x9b/0x530 [ 102.131640][T14893] dmirror_migrate_to_device.constprop.0+0x7c5/0xad0 [ 102.133047][T14893] dmirror_fops_unlocked_ioctl+0x89b/0xc80 Likely, nothing else goes wrong: putting the last folio reference will remove the folio from the LRU again. So besides memcg complaining, adding the folio to be freed to the LRU is just an unnecessary step. The new flow resembles what we have in migrate_folio_move(): add the dst to the lru, remove migration ptes, unlock and unref dst. Link: https://lkml.kernel.org/r/20250210161317.717936-1-david@redhat.com Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory") Signed-off-by: David Hildenbrand <david(a)redhat.com> Cc: J��r��me Glisse <jglisse(a)redhat.com> Cc: John Hubbard <jhubbard(a)nvidia.com> Cc: Alistair Popple <apopple(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/migrate_device.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) --- a/mm/migrate_device.c~mm-migrate_device-dont-add-folio-to-be-freed-to-lru-in-migrate_device_finalize +++ a/mm/migrate_device.c @@ -840,20 +840,15 @@ void migrate_device_finalize(unsigned lo dst = src; } + if (!folio_is_zone_device(dst)) + folio_add_lru(dst); remove_migration_ptes(src, dst, 0); folio_unlock(src); - - if (folio_is_zone_device(src)) - folio_put(src); - else - folio_putback_lru(src); + folio_put(src); if (dst != src) { folio_unlock(dst); - if (folio_is_zone_device(dst)) - folio_put(dst); - else - folio_putback_lru(dst); + folio_put(dst); } } } _ Patches currently in -mm which might be from david(a)redhat.com are mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive.patch mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive-fix.patch mm-rmap-implement-make_device_exclusive-using-folio_walk-instead-of-rmap-walk.patch mm-memory-detect-writability-in-restore_exclusive_pte-through-can_change_pte_writable.patch mm-use-single-swp_device_exclusive-entry-type.patch mm-page_vma_mapped-device-exclusive-entries-are-not-migration-entries.patch kernel-events-uprobes-handle-device-exclusive-entries-correctly-in-__replace_page.patch mm-ksm-handle-device-exclusive-entries-correctly-in-write_protect_page.patch mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_unmap_one.patch mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_migrate_one.patch mm-rmap-handle-device-exclusive-entries-correctly-in-page_vma_mkclean_one.patch mm-page_idle-handle-device-exclusive-entries-correctly-in-page_idle_clear_pte_refs_one.patch mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_young_one.patch mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_mkold_one.patch mm-rmap-keep-mapcount-untouched-for-device-exclusive-entries.patch mm-rmap-avoid-ebusy-from-make_device_exclusive.patch

8 months, 3 weeks

1
0
0 0

[merged mm-hotfixes-stable] mmmadvisehugetlb-check-for-0-length-range-after-end-address-adjustment.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm,madvise,hugetlb: check for 0-length range after end address adjustment has been removed from the -mm tree. Its filename was mmmadvisehugetlb-check-for-0-length-range-after-end-address-adjustment.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Ricardo Ca��uelo Navarro <rcn(a)igalia.com> Subject: mm,madvise,hugetlb: check for 0-length range after end address adjustment Date: Mon, 3 Feb 2025 08:52:06 +0100 Add a sanity check to madvise_dontneed_free() to address a corner case in madvise where a race condition causes the current vma being processed to be backed by a different page size. During a madvise(MADV_DONTNEED) call on a memory region registered with a userfaultfd, there's a period of time where the process mm lock is temporarily released in order to send a UFFD_EVENT_REMOVE and let userspace handle the event. During this time, the vma covering the current address range may change due to an explicit mmap done concurrently by another thread. If, after that change, the memory region, which was originally backed by 4KB pages, is now backed by hugepages, the end address is rounded down to a hugepage boundary to avoid data loss (see "Fixes" below). This rounding may cause the end address to be truncated to the same address as the start. Make this corner case follow the same semantics as in other similar cases where the requested region has zero length (ie. return 0). This will make madvise_walk_vmas() continue to the next vma in the range (this time holding the process mm lock) which, due to the prev pointer becoming stale because of the vma change, will be the same hugepage-backed vma that was just checked before. The next time madvise_dontneed_free() runs for this vma, if the start address isn't aligned to a hugepage boundary, it'll return -EINVAL, which is also in line with the madvise api. From userspace perspective, madvise() will return EINVAL because the start address isn't aligned according to the new vma alignment requirements (hugepage), even though it was correctly page-aligned when the call was issued. Link: https://lkml.kernel.org/r/20250203075206.1452208-1-rcn@igalia.com Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs") Signed-off-by: Ricardo Ca��uelo Navarro <rcn(a)igalia.com> Reviewed-by: Oscar Salvador <osalvador(a)suse.de> Cc: Florent Revest <revest(a)google.com> Cc: Rik van Riel <riel(a)surriel.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/madvise.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) --- a/mm/madvise.c~mmmadvisehugetlb-check-for-0-length-range-after-end-address-adjustment +++ a/mm/madvise.c @@ -933,7 +933,16 @@ static long madvise_dontneed_free(struct */ end = vma->vm_end; } - VM_WARN_ON(start >= end); + /* + * If the memory region between start and end was + * originally backed by 4kB pages and then remapped to + * be backed by hugepages while mmap_lock was dropped, + * the adjustment for hugetlb vma above may have rounded + * end down to the start address. + */ + if (start == end) + return 0; + VM_WARN_ON(start > end); } if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED) _ Patches currently in -mm which might be from rcn(a)igalia.com are

8 months, 3 weeks

1
0
0 0

[merged mm-hotfixes-stable] mm-zswap-fix-inconsistency-when-zswap_store_page-fails.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/zswap: fix inconsistency when zswap_store_page() fails has been removed from the -mm tree. Its filename was mm-zswap-fix-inconsistency-when-zswap_store_page-fails.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Hyeonggon Yoo <42.hyeyoo(a)gmail.com> Subject: mm/zswap: fix inconsistency when zswap_store_page() fails Date: Wed, 29 Jan 2025 19:08:44 +0900 Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()") skips charging any zswap entries when it failed to zswap the entire folio. However, when some base pages are zswapped but it failed to zswap the entire folio, the zswap operation is rolled back. When freeing zswap entries for those pages, zswap_entry_free() uncharges the zswap entries that were not previously charged, causing zswap charging to become inconsistent. This inconsistency triggers two warnings with following steps: # On a machine with 64GiB of RAM and 36GiB of zswap $ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng $ sudo reboot The two warnings are: in mm/memcontrol.c:163, function obj_cgroup_release(): WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1)); in mm/page_counter.c:60, function page_counter_cancel(): if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n", new, nr_pages)) zswap_stored_pages also becomes inconsistent in the same way. As suggested by Kanchana, increment zswap_stored_pages and charge zswap entries within zswap_store_page() when it succeeds. This way, zswap_entry_free() will decrement the counter and uncharge the entries when it failed to zswap the entire folio. While this could potentially be optimized by batching objcg charging and incrementing the counter, let's focus on fixing the bug this time and leave the optimization for later after some evaluation. After resolving the inconsistency, the warnings disappear. [42.hyeyoo(a)gmail.com: refactor zswap_store_page()] Link: https://lkml.kernel.org/r/20250131082037.2426-1-42.hyeyoo@gmail.com Link: https://lkml.kernel.org/r/20250129100844.2935-1-42.hyeyoo@gmail.com Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()") Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com> Signed-off-by: Hyeonggon Yoo <42.hyeyoo(a)gmail.com> Acked-by: Yosry Ahmed <yosry.ahmed(a)linux.dev> Acked-by: Nhat Pham <nphamcs(a)gmail.com> Cc: Chengming Zhou <chengming.zhou(a)linux.dev> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/zswap.c | 35 ++++++++++++++++------------------- 1 file changed, 16 insertions(+), 19 deletions(-) --- a/mm/zswap.c~mm-zswap-fix-inconsistency-when-zswap_store_page-fails +++ a/mm/zswap.c @@ -1445,9 +1445,9 @@ resched: * main API **********************************/ -static ssize_t zswap_store_page(struct page *page, - struct obj_cgroup *objcg, - struct zswap_pool *pool) +static bool zswap_store_page(struct page *page, + struct obj_cgroup *objcg, + struct zswap_pool *pool) { swp_entry_t page_swpentry = page_swap_entry(page); struct zswap_entry *entry, *old; @@ -1456,7 +1456,7 @@ static ssize_t zswap_store_page(struct p entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page)); if (!entry) { zswap_reject_kmemcache_fail++; - return -EINVAL; + return false; } if (!zswap_compress(page, entry, pool)) @@ -1483,13 +1483,17 @@ static ssize_t zswap_store_page(struct p /* * The entry is successfully compressed and stored in the tree, there is - * no further possibility of failure. Grab refs to the pool and objcg. - * These refs will be dropped by zswap_entry_free() when the entry is - * removed from the tree. + * no further possibility of failure. Grab refs to the pool and objcg, + * charge zswap memory, and increment zswap_stored_pages. + * The opposite actions will be performed by zswap_entry_free() + * when the entry is removed from the tree. */ zswap_pool_get(pool); - if (objcg) + if (objcg) { obj_cgroup_get(objcg); + obj_cgroup_charge_zswap(objcg, entry->length); + } + atomic_long_inc(&zswap_stored_pages); /* * We finish initializing the entry while it's already in xarray. @@ -1510,13 +1514,13 @@ static ssize_t zswap_store_page(struct p zswap_lru_add(&zswap_list_lru, entry); } - return entry->length; + return true; store_failed: zpool_free(pool->zpool, entry->handle); compress_failed: zswap_entry_cache_free(entry); - return -EINVAL; + return false; } bool zswap_store(struct folio *folio) @@ -1526,7 +1530,6 @@ bool zswap_store(struct folio *folio) struct obj_cgroup *objcg = NULL; struct mem_cgroup *memcg = NULL; struct zswap_pool *pool; - size_t compressed_bytes = 0; bool ret = false; long index; @@ -1564,20 +1567,14 @@ bool zswap_store(struct folio *folio) for (index = 0; index < nr_pages; ++index) { struct page *page = folio_page(folio, index); - ssize_t bytes; - bytes = zswap_store_page(page, objcg, pool); - if (bytes < 0) + if (!zswap_store_page(page, objcg, pool)) goto put_pool; - compressed_bytes += bytes; } - if (objcg) { - obj_cgroup_charge_zswap(objcg, compressed_bytes); + if (objcg) count_objcg_events(objcg, ZSWPOUT, nr_pages); - } - atomic_long_add(nr_pages, &zswap_stored_pages); count_vm_events(ZSWPOUT, nr_pages); ret = true; _ Patches currently in -mm which might be from 42.hyeyoo(a)gmail.com are

8 months, 3 weeks

1
0
0 0

[merged mm-hotfixes-stable] lib-iov_iter-fix-import_iovec_ubuf-iovec-management.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: lib/iov_iter: fix import_iovec_ubuf iovec management has been removed from the -mm tree. Its filename was lib-iov_iter-fix-import_iovec_ubuf-iovec-management.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Pavel Begunkov <asml.silence(a)gmail.com> Subject: lib/iov_iter: fix import_iovec_ubuf iovec management Date: Fri, 31 Jan 2025 14:13:15 +0000 import_iovec() says that it should always be fine to kfree the iovec returned in @iovp regardless of the error code. __import_iovec_ubuf() never reallocates it and thus should clear the pointer even in cases when copy_iovec_*() fail. Link: https://lkml.kernel.org/r/378ae26923ffc20fd5e41b4360d673bf47b1775b.17383324… Fixes: 3b2deb0e46da ("iov_iter: import single vector iovecs as ITER_UBUF") Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com> Reviewed-by: Jens Axboe <axboe(a)kernel.dk> Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: Christian Brauner <brauner(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- lib/iov_iter.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/lib/iov_iter.c~lib-iov_iter-fix-import_iovec_ubuf-iovec-management +++ a/lib/iov_iter.c @@ -1428,6 +1428,8 @@ static ssize_t __import_iovec_ubuf(int t struct iovec *iov = *iovp; ssize_t ret; + *iovp = NULL; + if (compat) ret = copy_compat_iovec_from_user(iov, uvec, 1); else @@ -1438,7 +1440,6 @@ static ssize_t __import_iovec_ubuf(int t ret = import_ubuf(type, iov->iov_base, iov->iov_len, i); if (unlikely(ret)) return ret; - *iovp = NULL; return i->count; } _ Patches currently in -mm which might be from asml.silence(a)gmail.com are

8 months, 3 weeks

1
0
0 0

+ hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio has been added to the -mm mm-hotfixes-unstable branch. Its filename is hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Ma Wupeng <mawupeng1(a)huawei.com> Subject: hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio Date: Mon, 17 Feb 2025 09:43:29 +0800 Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined) add page poison checks in do_migrate_range in order to make offline hwpoisoned page possible by introducing isolate_lru_page and try_to_unmap for hwpoisoned page. However folio lock must be held before calling try_to_unmap. Add it to fix this problem. Warning will be produced if folio is not locked during unmap: ------------[ cut here ]------------ kernel BUG at ./include/linux/swapops.h:400! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41 Tainted: [W]=WARN Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : try_to_unmap_one+0xb08/0xd3c lr : try_to_unmap_one+0x3dc/0xd3c Call trace: try_to_unmap_one+0xb08/0xd3c (P) try_to_unmap_one+0x3dc/0xd3c (L) rmap_walk_anon+0xdc/0x1f8 rmap_walk+0x3c/0x58 try_to_unmap+0x88/0x90 unmap_poisoned_folio+0x30/0xa8 do_migrate_range+0x4a0/0x568 offline_pages+0x5a4/0x670 memory_block_action+0x17c/0x374 memory_subsys_offline+0x3c/0x78 device_offline+0xa4/0xd0 state_store+0x8c/0xf0 dev_attr_store+0x18/0x2c sysfs_kf_write+0x44/0x54 kernfs_fop_write_iter+0x118/0x1a8 vfs_write+0x3a8/0x4bc ksys_write+0x6c/0xf8 __arm64_sys_write+0x1c/0x28 invoke_syscall+0x44/0x100 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x30/0xd0 el0t_64_sync_handler+0xc8/0xcc el0t_64_sync+0x198/0x19c Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000) ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> Acked-by: David Hildenbrand <david(a)redhat.com> Cc: Miaohe Lin <linmiaohe(a)huawei.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com> Cc: Oscar Salvador <osalvador(a)suse.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory_hotplug.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) --- a/mm/memory_hotplug.c~hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio +++ a/mm/memory_hotplug.c @@ -1832,8 +1832,11 @@ static void do_migrate_range(unsigned lo (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { if (WARN_ON(folio_test_lru(folio))) folio_isolate_lru(folio); - if (folio_mapped(folio)) + if (folio_mapped(folio)) { + folio_lock(folio); unmap_poisoned_folio(folio, pfn, false); + folio_unlock(folio); + } goto put_folio; } _ Patches currently in -mm which might be from mawupeng1(a)huawei.com are mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch

8 months, 3 weeks

1
0
0 0

+ mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm: memory-hotplug: check folio ref count first in do_migrate_range has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Ma Wupeng <mawupeng1(a)huawei.com> Subject: mm: memory-hotplug: check folio ref count first in do_migrate_range Date: Mon, 17 Feb 2025 09:43:28 +0800 If a folio has an increased reference count, folio_try_get() will acquire it, perform necessary operations, and then release it. In the case of a poisoned folio without an elevated reference count (which is unlikely for memory-failure), folio_try_get() will simply bypass it. Therefore, relocate the folio_try_get() function, responsible for checking and acquiring this reference count at first. Link: https://lkml.kernel.org/r/20250217014329.3610326-3-mawupeng1@huawei.com Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Miaohe Lin <linmiaohe(a)huawei.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com> Cc: Oscar Salvador <osalvador(a)suse.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory_hotplug.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) --- a/mm/memory_hotplug.c~mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range +++ a/mm/memory_hotplug.c @@ -1822,12 +1822,12 @@ static void do_migrate_range(unsigned lo if (folio_test_large(folio)) pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1; - /* - * HWPoison pages have elevated reference counts so the migration would - * fail on them. It also doesn't make any sense to migrate them in the - * first place. Still try to unmap such a page in case it is still mapped - * (keep the unmap as the catch all safety net). - */ + if (!folio_try_get(folio)) + continue; + + if (unlikely(page_folio(page) != folio)) + goto put_folio; + if (folio_test_hwpoison(folio) || (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { if (WARN_ON(folio_test_lru(folio))) @@ -1835,14 +1835,8 @@ static void do_migrate_range(unsigned lo if (folio_mapped(folio)) unmap_poisoned_folio(folio, pfn, false); - continue; - } - - if (!folio_try_get(folio)) - continue; - - if (unlikely(page_folio(page) != folio)) goto put_folio; + } if (!isolate_folio_to_list(folio, &source)) { if (__ratelimit(&migrate_rs)) { _ Patches currently in -mm which might be from mawupeng1(a)huawei.com are mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch

8 months, 3 weeks

1
0
0 0

+ mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm: memory-failure: update ttu flag inside unmap_poisoned_folio has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Ma Wupeng <mawupeng1(a)huawei.com> Subject: mm: memory-failure: update ttu flag inside unmap_poisoned_folio Date: Mon, 17 Feb 2025 09:43:27 +0800 Patch series "mm: memory_failure: unmap poisoned folio during migrate properly", v3. Fix two bugs during folio migration if the folio is poisoned. This patch (of 3): Commit 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON") introduce TTU_HWPOISON to replace TTU_IGNORE_HWPOISON in order to stop send SIGBUS signal when accessing an error page after a memory error on a clean folio. However during page migration, anon folio must be set with TTU_HWPOISON during unmap_*(). For pagecache we need some policy just like the one in hwpoison_user_mappings to set this flag. So move this policy from hwpoison_user_mappings to unmap_poisoned_folio to handle this warning properly. Warning will be produced during unamp poison folio with the following log: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 365 at mm/rmap.c:1847 try_to_unmap_one+0x8fc/0xd3c Modules linked in: CPU: 1 UID: 0 PID: 365 Comm: bash Tainted: G W 6.13.0-rc1-00018-gacdb4bbda7ab #42 Tainted: [W]=WARN Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : try_to_unmap_one+0x8fc/0xd3c lr : try_to_unmap_one+0x3dc/0xd3c Call trace: try_to_unmap_one+0x8fc/0xd3c (P) try_to_unmap_one+0x3dc/0xd3c (L) rmap_walk_anon+0xdc/0x1f8 rmap_walk+0x3c/0x58 try_to_unmap+0x88/0x90 unmap_poisoned_folio+0x30/0xa8 do_migrate_range+0x4a0/0x568 offline_pages+0x5a4/0x670 memory_block_action+0x17c/0x374 memory_subsys_offline+0x3c/0x78 device_offline+0xa4/0xd0 state_store+0x8c/0xf0 dev_attr_store+0x18/0x2c sysfs_kf_write+0x44/0x54 kernfs_fop_write_iter+0x118/0x1a8 vfs_write+0x3a8/0x4bc ksys_write+0x6c/0xf8 __arm64_sys_write+0x1c/0x28 invoke_syscall+0x44/0x100 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x30/0xd0 el0t_64_sync_handler+0xc8/0xcc el0t_64_sync+0x198/0x19c ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20250217014329.3610326-1-mawupeng1@huawei.com Link: https://lkml.kernel.org/r/20250217014329.3610326-2-mawupeng1@huawei.com Fixes: 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON") Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> Suggested-by: David Hildenbrand <david(a)redhat.com> Acked-by: David Hildenbrand <david(a)redhat.com> Cc: Miaohe Lin <linmiaohe(a)huawei.com> Cc: Ma Wupeng <mawupeng1(a)huawei.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com> Cc: Oscar Salvador <osalvador(a)suse.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/internal.h | 5 ++- mm/memory-failure.c | 61 +++++++++++++++++++++--------------------- mm/memory_hotplug.c | 3 +- 3 files changed, 36 insertions(+), 33 deletions(-) --- a/mm/internal.h~mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio +++ a/mm/internal.h @@ -1115,7 +1115,7 @@ static inline int find_next_best_node(in * mm/memory-failure.c */ #ifdef CONFIG_MEMORY_FAILURE -void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu); +int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill); void shake_folio(struct folio *folio); extern int hwpoison_filter(struct page *p); @@ -1138,8 +1138,9 @@ unsigned long page_mapped_in_vma(const s struct vm_area_struct *vma); #else -static inline void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu) +static inline int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill) { + return -EBUSY; } #endif --- a/mm/memory-failure.c~mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio +++ a/mm/memory-failure.c @@ -1556,8 +1556,34 @@ static int get_hwpoison_page(struct page return ret; } -void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu) +int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill) { + enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON; + struct address_space *mapping; + + if (folio_test_swapcache(folio)) { + pr_err("%#lx: keeping poisoned page in swap cache\n", pfn); + ttu &= ~TTU_HWPOISON; + } + + /* + * Propagate the dirty bit from PTEs to struct page first, because we + * need this to decide if we should kill or just drop the page. + * XXX: the dirty test could be racy: set_page_dirty() may not always + * be called inside page lock (it's recommended but not enforced). + */ + mapping = folio_mapping(folio); + if (!must_kill && !folio_test_dirty(folio) && mapping && + mapping_can_writeback(mapping)) { + if (folio_mkclean(folio)) { + folio_set_dirty(folio); + } else { + ttu &= ~TTU_HWPOISON; + pr_info("%#lx: corrupted page was clean: dropped without side effects\n", + pfn); + } + } + if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) { struct address_space *mapping; @@ -1572,7 +1598,7 @@ void unmap_poisoned_folio(struct folio * if (!mapping) { pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n", folio_pfn(folio)); - return; + return -EBUSY; } try_to_unmap(folio, ttu|TTU_RMAP_LOCKED); @@ -1580,6 +1606,8 @@ void unmap_poisoned_folio(struct folio * } else { try_to_unmap(folio, ttu); } + + return folio_mapped(folio) ? -EBUSY : 0; } /* @@ -1589,8 +1617,6 @@ void unmap_poisoned_folio(struct folio * static bool hwpoison_user_mappings(struct folio *folio, struct page *p, unsigned long pfn, int flags) { - enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON; - struct address_space *mapping; LIST_HEAD(tokill); bool unmap_success; int forcekill; @@ -1613,29 +1639,6 @@ static bool hwpoison_user_mappings(struc if (!folio_mapped(folio)) return true; - if (folio_test_swapcache(folio)) { - pr_err("%#lx: keeping poisoned page in swap cache\n", pfn); - ttu &= ~TTU_HWPOISON; - } - - /* - * Propagate the dirty bit from PTEs to struct page first, because we - * need this to decide if we should kill or just drop the page. - * XXX: the dirty test could be racy: set_page_dirty() may not always - * be called inside page lock (it's recommended but not enforced). - */ - mapping = folio_mapping(folio); - if (!(flags & MF_MUST_KILL) && !folio_test_dirty(folio) && mapping && - mapping_can_writeback(mapping)) { - if (folio_mkclean(folio)) { - folio_set_dirty(folio); - } else { - ttu &= ~TTU_HWPOISON; - pr_info("%#lx: corrupted page was clean: dropped without side effects\n", - pfn); - } - } - /* * First collect all the processes that have the page * mapped in dirty form. This has to be done before try_to_unmap, @@ -1643,9 +1646,7 @@ static bool hwpoison_user_mappings(struc */ collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); - unmap_poisoned_folio(folio, ttu); - - unmap_success = !folio_mapped(folio); + unmap_success = !unmap_poisoned_folio(folio, pfn, flags & MF_MUST_KILL); if (!unmap_success) pr_err("%#lx: failed to unmap page (folio mapcount=%d)\n", pfn, folio_mapcount(folio)); --- a/mm/memory_hotplug.c~mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio +++ a/mm/memory_hotplug.c @@ -1833,7 +1833,8 @@ static void do_migrate_range(unsigned lo if (WARN_ON(folio_test_lru(folio))) folio_isolate_lru(folio); if (folio_mapped(folio)) - unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK); + unmap_poisoned_folio(folio, pfn, false); + continue; } _ Patches currently in -mm which might be from mawupeng1(a)huawei.com are mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch

8 months, 3 weeks

1
0
0 0

+ arm-pgtable-fix-null-pointer-dereference-issue.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: arm: pgtable: fix NULL pointer dereference issue has been added to the -mm mm-hotfixes-unstable branch. Its filename is arm-pgtable-fix-null-pointer-dereference-issue.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Qi Zheng <zhengqi.arch(a)bytedance.com> Subject: arm: pgtable: fix NULL pointer dereference issue Date: Mon, 17 Feb 2025 10:49:24 +0800 When update_mmu_cache_range() is called by update_mmu_cache(), the vmf parameter is NULL, which will cause a NULL pointer dereference issue in adjust_pte(): Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read Hardware name: Atmel AT91SAM9 PC is at update_mmu_cache_range+0x1e0/0x278 LR is at pte_offset_map_rw_nolock+0x18/0x2c Call trace: update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec remove_migration_pte from rmap_walk_file+0xcc/0x130 rmap_walk_file from remove_migration_ptes+0x90/0xa4 remove_migration_ptes from migrate_pages_batch+0x6d4/0x858 migrate_pages_batch from migrate_pages+0x188/0x488 migrate_pages from compact_zone+0x56c/0x954 compact_zone from compact_node+0x90/0xf0 compact_node from kcompactd+0x1d4/0x204 kcompactd from kthread+0x120/0x12c kthread from ret_from_fork+0x14/0x38 Exception stack(0xc0d8bfb0 to 0xc0d8bff8) To fix it, do not rely on whether 'ptl' is equal to decide whether to hold the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is enabled. In addition, if two vmas map to the same PTE page, there is no need to hold the pte lock again, otherwise a deadlock will occur. Just add the need_lock parameter to let adjust_pte() know this information. Link: https://lkml.kernel.org/r/20250217024924.57996-1-zhengqi.arch@bytedance.com Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()") Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com> Reported-by: Ezra Buehler <ezra.buehler(a)husqvarnagroup.com> Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0c… Acked-by: David Hildenbrand <david(a)redhat.com> Cc: Hugh Dickens <hughd(a)google.com> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: Qi Zheng <zhengqi.arch(a)bytedance.com> Cc: Russel King <linux(a)armlinux.org.uk> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- arch/arm/mm/fault-armv.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) --- a/arch/arm/mm/fault-armv.c~arm-pgtable-fix-null-pointer-dereference-issue +++ a/arch/arm/mm/fault-armv.c @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_ } static int adjust_pte(struct vm_area_struct *vma, unsigned long address, - unsigned long pfn, struct vm_fault *vmf) + unsigned long pfn, bool need_lock) { spinlock_t *ptl; pgd_t *pgd; @@ -99,12 +99,11 @@ again: if (!pte) return 0; - /* - * If we are using split PTE locks, then we need to take the page - * lock here. Otherwise we are using shared mm->page_table_lock - * which is already locked, thus cannot take it. - */ - if (ptl != vmf->ptl) { + if (need_lock) { + /* + * Use nested version here to indicate that we are already + * holding one similar spinlock. + */ spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) { pte_unmap_unlock(pte, ptl); @@ -114,7 +113,7 @@ again: ret = do_adjust_pte(vma, address, pfn, pte); - if (ptl != vmf->ptl) + if (need_lock) spin_unlock(ptl); pte_unmap(pte); @@ -123,9 +122,10 @@ again: static void make_coherent(struct address_space *mapping, struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep, unsigned long pfn, - struct vm_fault *vmf) + unsigned long addr, pte_t *ptep, unsigned long pfn) { + const unsigned long pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE); + const unsigned long pmd_end_addr = pmd_start_addr + PMD_SIZE; struct mm_struct *mm = vma->vm_mm; struct vm_area_struct *mpnt; unsigned long offset; @@ -142,6 +142,14 @@ make_coherent(struct address_space *mapp flush_dcache_mmap_lock(mapping); vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) { /* + * If we are using split PTE locks, then we need to take the pte + * lock. Otherwise we are using shared mm->page_table_lock which + * is already locked, thus cannot take it. + */ + bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS); + unsigned long mpnt_addr; + + /* * If this VMA is not in our MM, we can ignore it. * Note that we intentionally mask out the VMA * that we are fixing up. @@ -151,7 +159,12 @@ make_coherent(struct address_space *mapp if (!(mpnt->vm_flags & VM_MAYSHARE)) continue; offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT; - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf); + mpnt_addr = mpnt->vm_start + offset; + + /* Avoid deadlocks by not grabbing the same PTE lock again. */ + if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr) + need_lock = false; + aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock); } flush_dcache_mmap_unlock(mapping); if (aliases) @@ -194,7 +207,7 @@ void update_mmu_cache_range(struct vm_fa __flush_dcache_folio(mapping, folio); if (mapping) { if (cache_is_vivt()) - make_coherent(mapping, vma, addr, ptep, pfn, vmf); + make_coherent(mapping, vma, addr, ptep, pfn); else if (vma->vm_flags & VM_EXEC) __flush_icache_all(); } _ Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages.patch arm-pgtable-fix-null-pointer-dereference-issue.patch

8 months, 3 weeks

1
0
0 0

[net-next, v3] wifi: mt76: mt7921: fix kernel panic due to null pointer dereference

by Mingyen Hsieh

From: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com> Address a kernel panic caused by a null pointer dereference in the `mt792x_rx_get_wcid` function. The issue arises because the `deflink` structure is not properly initialized with the `sta` context. This patch ensures that the `deflink` structure is correctly linked to the `sta` context, preventing the null pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000400 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 UID: 0 PID: 470 Comm: mt76-usb-rx phy Not tainted 6.12.13-gentoo-dist #1 Hardware name: /AMD HUDSON-M1, BIOS 4.6.4 11/15/2011 RIP: 0010:mt792x_rx_get_wcid+0x48/0x140 [mt792x_lib] RSP: 0018:ffffa147c055fd98 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8e9ecb652000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e9ecb652000 RBP: 0000000000000685 R08: ffff8e9ec6570000 R09: 0000000000000000 R10: ffff8e9ecd2ca000 R11: ffff8e9f22a217c0 R12: 0000000038010119 R13: 0000000080843801 R14: ffff8e9ec6570000 R15: ffff8e9ecb652000 FS: 0000000000000000(0000) GS:ffff8e9f22a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000400 CR3: 000000000d2ea000 CR4: 00000000000006f0 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? mt792x_rx_get_wcid+0x48/0x140 [mt792x_lib] mt7921_queue_rx_skb+0x1c6/0xaa0 [mt7921_common] mt76u_alloc_queues+0x784/0x810 [mt76_usb] ? __pfx___mt76_worker_fn+0x10/0x10 [mt76] __mt76_worker_fn+0x4f/0x80 [mt76] kthread+0xd2/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- Reported-by: Nick Morrow <usbwifi2024(a)gmail.com> Closes: https://github.com/morrownr/USB-WiFi/issues/577 Cc: stable(a)vger.kernel.org Fixes: 90c10286b176 ("wifi: mt76: mt7925: Update mt792x_rx_get_wcid for per-link STA") Signed-off-by: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com> Tested-by: Salah Coronya <salah.coronya(a)gmail.com> --- v2: - Change Nick from "Tested-by" to "Reported-by". v3: - Rewrite the commit msg in the imperative mood. - Remove the boot time for code trace. --- drivers/net/wireless/mediatek/mt76/mt7921/main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/main.c b/drivers/net/wireless/mediatek/mt76/mt7921/main.c index 13e58c328aff..78b77a54d195 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c @@ -811,6 +811,7 @@ int mt7921_mac_sta_add(struct mt76_dev *mdev, struct ieee80211_vif *vif, msta->deflink.wcid.phy_idx = mvif->bss_conf.mt76.band_idx; msta->deflink.wcid.tx_info |= MT_WCID_TX_INFO_SET; msta->deflink.last_txs = jiffies; + msta->deflink.sta = msta; ret = mt76_connac_pm_wake(&dev->mphy, &dev->pm); if (ret) -- 2.45.2

8 months, 3 weeks

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror February 2025