Linux-stable-mirror - lists.linaro.org

[patch 11/11] mm, page_alloc: fix core hung in free_pcppages_bulk()

by Andrew Morton

From: Charan Teja Reddy <charante(a)codeaurora.org>
Subject: mm, page_alloc: fix core hung in free_pcppages_bulk()

The following race is observed with the repeated online, offline and a
delay between two successive online of memory blocks of movable zone.

P1                                      P2

Online the first memory block in
the movable zone. The pcp struct
values are initialized to default
values, i.e., pcp->high = 0 &
pcp->batch = 1.

                                        Allocate the pages from the
                                        movable zone.

Try to Online the second memory
block in the movable zone thus it
entered the online_pages() but yet
to call zone_pcp_update().

                                        This process is entered into
                                        the exit path thus it tries
                                        to release the order-0 pages
                                        to pcp lists through
                                        free_unref_page_commit().
                                        As pcp->high = 0, pcp->count = 1
                                        proceed to call the function
                                        free_pcppages_bulk().

Update the pcp values thus the
new pcp values are like, say,
pcp->high = 378, pcp->batch = 63.

                                        Read the pcp's batch value using
                                        READ_ONCE() and pass the same to
                                        free_pcppages_bulk(), pcp values
                                        passed here are, batch = 63,
                                        count = 1.

                                        Since num of pages in the pcp
                                        lists are less than ->batch,
                                        then it will stuck in
                                        while(list_empty(list)) loop
                                        with interrupts disabled thus
                                        a core hung.

Avoid this by ensuring free_pcppages_bulk() is called with proper count of
pcp list pages.

The mentioned race is somewhat easily reproducible without [1] because
pcp's are not updated for the first memory block online and thus there is
enough race window for P2 between alloc+free and pcp struct values update
through onlining of second memory block.

With [1], the race still exists but it is very narrow as we update the pcp
struct values for the first memory block online itself.

This is not limited to the movable zone, it could also happen in cases
with the normal zone (e.g., hotplug to a node that only has DMA memory, or
no other memory yet).

[1]: https://patchwork.kernel.org/patch/11696389/

Link: http://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaur…
Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into one-list-per-migrate-type")
Signed-off-by: Charan Teja Reddy <charante(a)codeaurora.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Vinayak Menon <vinmenon(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org> [2.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/page_alloc.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/mm/page_alloc.c~mm-page_alloc-fix-core-hung-in-free_pcppages_bulk
+++ a/mm/page_alloc.c
@@ -1302,6 +1302,11 @@ static void free_pcppages_bulk(struct zo
 	struct page *page, *tmp;
 	LIST_HEAD(head);
 
+	/*
+	 * Ensure proper count is passed which otherwise would stuck in the
+	 * below while (list_empty(list)) loop.
+	 */
+	count = min(pcp->count, count);
 	while (count) {
 		struct list_head *list;
 
_
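The effect of the one-line clamp can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct pcp_model`, `bulk_free_model()` and `NR_LISTS` are hypothetical names, pages are reduced to counters, and the real free_pcppages_bulk() does considerably more work per page. The model keeps just the round-robin search over the lists, which is the part that spins when the requested count exceeds the pages actually present.

```c
#include <assert.h>

#define NR_LISTS 3	/* hypothetical stand-in for the number of pcp lists */

/* Hypothetical model of a per-cpu page cache: page counts only. */
struct pcp_model {
	int count;		/* total pages across all lists */
	int lists[NR_LISTS];	/* pages held on each list */
};

/*
 * Free up to 'count' pages, visiting the lists round-robin.
 * Returns the number of pages actually freed.
 */
static int bulk_free_model(struct pcp_model *pcp, int count)
{
	int freed = 0;
	int idx = 0;

	/* Invariant assumed by this model: count matches the list contents. */
	assert(pcp->count == pcp->lists[0] + pcp->lists[1] + pcp->lists[2]);

	/* The fix: never ask for more pages than the lists hold. */
	if (count > pcp->count)
		count = pcp->count;

	while (count) {
		/*
		 * Skip empty lists. Without the clamp above, this search
		 * never terminates once every list is empty and count > 0.
		 */
		do {
			idx = (idx + 1) % NR_LISTS;
		} while (pcp->lists[idx] == 0);

		pcp->lists[idx]--;
		pcp->count--;
		count--;
		freed++;
	}
	return freed;
}

/* Usage example: the values from the race described above. */
static void pcp_model_demo(void)
{
	struct pcp_model pcp = { .count = 1, .lists = { 0, 1, 0 } };

	/* A stale reader passes batch = 63 while only one page is cached. */
	assert(bulk_free_model(&pcp, 63) == 1);
	assert(pcp.count == 0);
}
```

In the model, as in the patch, the clamp is applied once on entry, so the loop body itself stays unchanged.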

5 years, 3 months


[patch 10/11] mm: include CMA pages in lowmem_reserve at boot

by Andrew Morton

From: Doug Berger <opendmb(a)gmail.com>
Subject: mm: include CMA pages in lowmem_reserve at boot

The lowmem_reserve arrays provide a means of applying pressure against
allocations from lower zones that were targeted at higher zones. Its
values are a function of the number of pages managed by higher zones and
are assigned by a call to the setup_per_zone_lowmem_reserve() function.

The function is initially called at boot time by the function
init_per_zone_wmark_min() and may be called later by accesses of the
/proc/sys/vm/lowmem_reserve_ratio sysctl file.

The function init_per_zone_wmark_min() was moved up from a module_init to
a core_initcall to resolve a sequencing issue with khugepaged.
Unfortunately this created a sequencing issue with CMA page accounting.

The CMA pages are added to the managed page count of a zone when
cma_init_reserved_areas() is called at boot also as a core_initcall. This
makes it uncertain whether the CMA pages will be added to the managed page
counts of their zones before or after the call to
init_per_zone_wmark_min() as it becomes dependent on link order. With the
current link order the pages are added to the managed count after the
lowmem_reserve arrays are initialized at boot.

This means the lowmem_reserve values at boot may be lower than the values
used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
ratio values are unchanged.

In many cases the difference is not significant, but for example
an ARM platform with 1GB of memory and the following memory layout

  [    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
  [    0.000000] Zone ranges:
  [    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
  [    0.000000]   Normal   empty
  [    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]

would result in 0 lowmem_reserve for the DMA zone. This would allow
userspace to deplete the DMA zone easily.

Funnily enough

  $ cat /proc/sys/vm/lowmem_reserve_ratio

would fix up the situation because it forces setup_per_zone_lowmem_reserve
as a side effect.

This commit breaks the link order dependency by invoking
init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
have the chance to be properly accounted in their zone(s) and allowing
the lowmem_reserve arrays to receive consistent values.

Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.com
Fixes: bc22af74f271 ("mm: update min_free_kbytes from khugepaged after core initialization")
Signed-off-by: Doug Berger <opendmb(a)gmail.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Jason Baron <jbaron(a)akamai.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/page_alloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-include-cma-pages-in-lowmem_reserve-at-boot
+++ a/mm/page_alloc.c
@@ -7888,7 +7888,7 @@ int __meminit init_per_zone_wmark_min(vo
 
 	return 0;
 }
-core_initcall(init_per_zone_wmark_min)
+postcore_initcall(init_per_zone_wmark_min)
 
 /*
  * min_free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so
_

5 years, 3 months


[patch 09/11] squashfs: avoid bio_alloc() failure with 1Mbyte blocks

by Andrew Morton

From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: squashfs: avoid bio_alloc() failure with 1Mbyte blocks

This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".

Bio_alloc() is limited to 256 pages (1 Mbyte). This can cause a failure
when reading 1 Mbyte block filesystems. The problem is a datablock can be
fully (or almost) uncompressed, requiring 256 pages, but, because blocks
are not aligned to page boundaries, it may require 257 pages to read.

Bio_kmalloc() can handle 1024 pages, and so use this for the edge
condition.

Link: http://lkml.kernel.org/r/20200815035637.15319-1-phillip@squashfs.org.uk
Fixes: 93e72b3c612a ("squashfs: migrate from ll_rw_block usage to BIO")
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: Nicolas Prochazka <nicolas.prochazka(a)gmail.com>
Reported-by: Tomoatsu Shimada <shimada(a)walbrix.com>
Reviewed-by: Guenter Roeck <groeck(a)chromium.org>
Cc: Philippe Liard <pliard(a)google.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Adrien Schildknecht <adrien+dev(a)schischi.me>
Cc: Daniel Rosenberg <drosen(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/squashfs/block.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/fs/squashfs/block.c~squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks
+++ a/fs/squashfs/block.c
@@ -87,7 +87,11 @@ static int squashfs_bio_read(struct supe
 	int error, i;
 	struct bio *bio;
 
-	bio = bio_alloc(GFP_NOIO, page_count);
+	if (page_count <= BIO_MAX_PAGES)
+		bio = bio_alloc(GFP_NOIO, page_count);
+	else
+		bio = bio_kmalloc(GFP_NOIO, page_count);
+
 	if (!bio)
 		return -ENOMEM;
 
_
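The 256 versus 257 page arithmetic can be checked with a short userspace sketch. It assumes 4 KiB pages, and `pages_spanned()`, `MODEL_PAGE_SIZE` and `MODEL_BIO_MAX_PAGES` are hypothetical names for this illustration; it is not how squashfs_bio_read() computes page_count, only a model of why an unaligned 1 Mbyte read touches one extra page.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE      4096u	/* assumed page size */
#define MODEL_BIO_MAX_PAGES  256u	/* the bio_alloc() limit quoted above */

/* Number of pages touched by reading 'length' bytes starting at 'offset'. */
static unsigned int pages_spanned(uint64_t offset, uint32_t length)
{
	uint64_t first, last;

	assert(length > 0);
	first = offset / MODEL_PAGE_SIZE;
	last = (offset + length - 1) / MODEL_PAGE_SIZE;
	return (unsigned int)(last - first + 1);
}

/* Usage example: an aligned and an unaligned 1 Mbyte datablock. */
static void squashfs_pages_demo(void)
{
	const uint32_t one_mbyte = 1u << 20;

	/* Page-aligned start: exactly at the limit. */
	assert(pages_spanned(0, one_mbyte) == MODEL_BIO_MAX_PAGES);

	/* Start one byte into a page: one page over the limit. */
	assert(pages_spanned(1, one_mbyte) == MODEL_BIO_MAX_PAGES + 1);
}
```

The unaligned case is the edge condition for which the patch falls back to bio_kmalloc().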

5 years, 3 months


[patch 08/11] uprobes: __replace_page() avoid BUG in munlock_vma_page()

by Andrew Morton

From: Hugh Dickins <hughd(a)google.com>
Subject: uprobes: __replace_page() avoid BUG in munlock_vma_page()

syzbot crashed on the VM_BUG_ON_PAGE(PageTail) in munlock_vma_page(), when
called from uprobes __replace_page(). Which of many ways to fix it?
Settled on not calling when PageCompound (since Head and Tail are equals
in this context, PageCompound the usual check in uprobes.c, and the prior
use of FOLL_SPLIT_PMD will have cleared PageMlocked already).

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008161338360.20413@eggly.anvils
Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Acked-by: Song Liu <songliubraving(a)fb.com>
Acked-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Srikar Dronamraju <srikar(a)linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 kernel/events/uprobes.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/events/uprobes.c~uprobes-__replace_page-avoid-bug-in-munlock_vma_page
+++ a/kernel/events/uprobes.c
@@ -205,7 +205,7 @@ static int __replace_page(struct vm_area
 		try_to_free_swap(old_page);
 	page_vma_mapped_walk_done(&pvmw);
 
-	if (vma->vm_flags & VM_LOCKED)
+	if ((vma->vm_flags & VM_LOCKED) && !PageCompound(old_page))
 		munlock_vma_page(old_page);
 	put_page(old_page);
 
_

5 years, 3 months


[patch 07/11] kernel/relay.c: fix memleak on destroy relay channel

by Andrew Morton

From: Wei Yongjun <weiyongjun1(a)huawei.com>
Subject: kernel/relay.c: fix memleak on destroy relay channel

kmemleak reports a memory leak as follows:

  unreferenced object 0x607ee4e5f948 (size 8):
  comm "syz-executor.1", pid 2098, jiffies 4295031601 (age 288.468s)
  hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00  ........
  backtrace:
     [<00000000ca1de2fa>] relay_open kernel/relay.c:583 [inline]
     [<00000000ca1de2fa>] relay_open+0xb6/0x970 kernel/relay.c:563
     [<0000000038ae5a4b>] do_blk_trace_setup+0x4a8/0xb20 kernel/trace/blktrace.c:557
     [<00000000d5e778e9>] __blk_trace_setup+0xb6/0x150 kernel/trace/blktrace.c:597
     [<0000000038fdf803>] blk_trace_ioctl+0x146/0x280 kernel/trace/blktrace.c:738
     [<00000000ce25a0ca>] blkdev_ioctl+0xb2/0x6a0 block/ioctl.c:613
     [<00000000579e47e0>] block_ioctl+0xe5/0x120 fs/block_dev.c:1871
     [<00000000b1588c11>] vfs_ioctl fs/ioctl.c:48 [inline]
     [<00000000b1588c11>] __do_sys_ioctl fs/ioctl.c:753 [inline]
     [<00000000b1588c11>] __se_sys_ioctl fs/ioctl.c:739 [inline]
     [<00000000b1588c11>] __x64_sys_ioctl+0x170/0x1ce fs/ioctl.c:739
     [<0000000088fc9942>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
     [<000000004f6dd57a>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

'chan->buf' is allocated in relay_open() by alloc_percpu() but is not
freed when the relay channel is destroyed. Fix it by adding free_percpu()
before returning from relay_destroy_channel().

Link: http://lkml.kernel.org/r/20200817122826.48518-1-weiyongjun1@huawei.com
Fixes: 017c59c042d0 ("relay: Use per CPU constructs for the relay channel buffer pointers")
Signed-off-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Reviewed-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Michel Lespinasse <walken(a)google.com>
Cc: Daniel Axtens <dja(a)axtens.net>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Akash Goel <akash.goel(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 kernel/relay.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/relay.c~kernel-relayc-fix-memleak-on-destroy-relay-channel
+++ a/kernel/relay.c
@@ -197,6 +197,7 @@ free_buf:
 static void relay_destroy_channel(struct kref *kref)
 {
 	struct rchan *chan = container_of(kref, struct rchan, kref);
+	free_percpu(chan->buf);
 	kfree(chan);
 }
 
_

5 years, 3 months


[patch 06/11] romfs: fix uninitialized memory leak in romfs_dev_read()

by Andrew Morton

From: Jann Horn <jannh(a)google.com>
Subject: romfs: fix uninitialized memory leak in romfs_dev_read()

romfs has a superblock field that limits the size of the filesystem; data
beyond that limit is never accessed.

romfs_dev_read() fetches a caller-supplied number of bytes from the
backing device. It returns 0 on success or an error code on failure;
therefore, its API can't represent short reads, it's all-or-nothing.

However, when romfs_dev_read() detects that the requested operation would
cross the filesystem size limit, it currently silently truncates the
requested number of bytes. This e.g. means that when the content of a
file with size 0x1000 starts one byte before the filesystem size limit,
->readpage() will only fill a single byte of the supplied page while
leaving the rest uninitialized, leaking that uninitialized memory to
userspace.

Fix it by returning an error code instead of truncating the read when the
requested read operation would go beyond the end of the filesystem.

Link: http://lkml.kernel.org/r/20200818013202.2246365-1-jannh@google.com
Fixes: da4458bda237 ("NOMMU: Make it possible for RomFS to use MTD devices directly")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/romfs/storage.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/fs/romfs/storage.c~romfs-fix-uninitialized-memory-leak-in-romfs_dev_read
+++ a/fs/romfs/storage.c
@@ -217,10 +217,8 @@ int romfs_dev_read(struct super_block *s
 	size_t limit;
 
 	limit = romfs_maxsize(sb);
-	if (pos >= limit)
+	if (pos >= limit || buflen > limit - pos)
 		return -EIO;
-	if (buflen > limit - pos)
-		buflen = limit - pos;
 
 #ifdef CONFIG_ROMFS_ON_MTD
 	if (sb->s_mtd)
_
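The combined bounds check can be tried out in isolation. The sketch below is a userspace model only: `check_read_range()` is a hypothetical helper that copies the condition from the patch, and the sizes in the usage example are made up to match the scenario in the commit message. Writing the test as `buflen > limit - pos` rather than `pos + buflen > limit` avoids an unsigned overflow, because the first half of the condition has already established that `pos < limit`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * All-or-nothing range check modelled on the patched romfs_dev_read():
 * returns 0 if [pos, pos + buflen) lies within the first 'limit' bytes,
 * -EIO otherwise. The subtraction cannot wrap because pos < limit holds
 * whenever the right-hand side of the || is evaluated.
 */
static int check_read_range(size_t pos, size_t buflen, size_t limit)
{
	if (pos >= limit || buflen > limit - pos)
		return -EIO;
	return 0;
}

/* Usage example: a 0x2000-byte filesystem (hypothetical size). */
static void romfs_range_demo(void)
{
	const size_t limit = 0x2000;

	/* A 0x1000-byte read starting one byte before the limit is refused. */
	assert(check_read_range(limit - 1, 0x1000, limit) == -EIO);

	/* A read that ends exactly at the limit is allowed. */
	assert(check_read_range(0x1000, 0x1000, limit) == 0);

	/* A huge length must not wrap around and sneak past the check. */
	assert(check_read_range(1, SIZE_MAX, limit) == -EIO);
}
```

With the old truncating behaviour the first case would have succeeded after shrinking the read to a single byte, which is the short read the API cannot report.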

5 years, 3 months


[patch 04/11] mm/vunmap: add cond_resched() in vunmap_pmd_range

by Andrew Morton

From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/vunmap: add cond_resched() in vunmap_pmd_range

Like zap_pte_range add cond_resched so that we can avoid softlockups as
reported below. On non-preemptible kernel with large I/O map region (like
the one we get when using persistent memory with sector mode), an unmap of
the namespace can report below softlockups.

22724.027334] watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [ndctl:50777]
 NIP [c0000000000dc224] plpar_hcall+0x38/0x58
 LR [c0000000000d8898] pSeries_lpar_hpte_invalidate+0x68/0xb0
 Call Trace:
    [c0000004e87a7780] [c0000004fb197c00] 0xc0000004fb197c00 (unreliable)
    [c0000004e87a7810] [c00000000007f4e4] flush_hash_page+0x114/0x200
    [c0000004e87a7890] [c0000000000833cc] hpte_need_flush+0x2dc/0x540
    [c0000004e87a7950] [c0000000003f5798] vunmap_page_range+0x538/0x6f0
    [c0000004e87a7a70] [c0000000003f76d0] free_unmap_vmap_area+0x30/0x70
    [c0000004e87a7aa0] [c0000000003f7a6c] remove_vm_area+0xfc/0x140
    [c0000004e87a7ad0] [c0000000003f7dd8] __vunmap+0x68/0x270
    [c0000004e87a7b50] [c000000000079de4] __iounmap.part.0+0x34/0x60
    [c0000004e87a7bb0] [c000000000376394] memunmap+0x54/0x70
    [c0000004e87a7bd0] [c000000000881d7c] release_nodes+0x28c/0x300
    [c0000004e87a7c40] [c00000000087a65c] device_release_driver_internal+0x16c/0x280
    [c0000004e87a7c80] [c000000000876fc4] unbind_store+0x124/0x170
    [c0000004e87a7cd0] [c000000000875be4] drv_attr_store+0x44/0x60
    [c0000004e87a7cf0] [c00000000057c734] sysfs_kf_write+0x64/0x90
    [c0000004e87a7d10] [c00000000057bc10] kernfs_fop_write+0x1b0/0x290
    [c0000004e87a7d60] [c000000000488e6c] __vfs_write+0x3c/0x70
    [c0000004e87a7d80] [c00000000048c868] vfs_write+0xd8/0x260
    [c0000004e87a7dd0] [c00000000048ccac] ksys_write+0xdc/0x130
    [c0000004e87a7e20] [c00000000000b588] system_call+0x5c/0x70

Link: http://lkml.kernel.org/r/20200807075933.310240-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Harish Sriram <harish(a)linux.ibm.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/vmalloc.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/vmalloc.c~mm-vunmap-add-cond_resched-in-vunmap_pmd_range
+++ a/mm/vmalloc.c
@@ -104,6 +104,8 @@ static void vunmap_pmd_range(pud_t *pud,
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next, mask);
+
+		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 }
 
_

5 years, 3 months


[patch 03/11] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()

by Andrew Morton

From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()

syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
__khugepaged_enter(): yes, when one thread is about to dump core, has set
core_state, and is waiting for others, another might do something calling
__khugepaged_enter(), which now crashes because I lumped the core_state
test (known as "mmget_still_valid") into khugepaged_test_exit(). I still
think it's best to lump them together, so just in this exceptional case,
check mm->mm_users directly instead of khugepaged_test_exit().

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008141503370.18085@eggly.anvils
Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Acked-by: Yang Shi <shy828301(a)gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/khugepaged.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/khugepaged.c~khugepaged-adjust-vm_bug_on_mm-in-__khugepaged_enter
+++ a/mm/khugepaged.c
@@ -466,7 +466,7 @@ int __khugepaged_enter(struct mm_struct
 		return -ENOMEM;
 
 	/* __khugepaged_exit() must not run from under us */
-	VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
+	VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);
 	if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
 		free_mm_slot(mm_slot);
 		return 0;
_

5 years, 3 months


+ mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake
has been added to the -mm tree. Its filename is
     mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-swap-fix-allocating-cluste…
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-swap-fix-allocating-cluste…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Gao Xiang <hsiangkao(a)redhat.com>
Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake

SWP_FS is used to make swap_{read,write}page() go through the filesystem,
and it's only used for swap files over NFS. So, !SWP_FS means non NFS for
now, it could be either file backed or device backed. Something similar
goes with legacy SWP_FILE.

So in order to achieve the goal of the original patch, SWP_BLKDEV should
be used instead.

FS corruption can be observed with SSD device + XFS + fragmented swapfile
due to CONFIG_THP_SWAP=y.

I reproduced the issue with the following details:

Environment:
QEMU + upstream kernel + buildroot + NVMe (2 GB)

Kernel config:
CONFIG_BLK_DEV_NVME=y
CONFIG_THP_SWAP=y

Some reproducible steps:
mkfs.xfs -f /dev/nvme0n1
mkdir /tmp/mnt
mount /dev/nvme0n1 /tmp/mnt
bs="32k"
sz="1024m"    # doesn't matter too much, I also tried 16m
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw

mkswap /tmp/mnt/sw
swapon /tmp/mnt/sw

stress --vm 2 --vm-bytes 600M   # doesn't matter too much as well

Symptoms:
 - FS corruption (e.g. checksum failure)
 - memory corruption at: 0xd2808010
 - segfault

Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com
Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: Gao Xiang <hsiangkao(a)redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Acked-by: Rafael Aquini <aquini(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Carlos Maiolino <cmaiolino(a)redhat.com>
Cc: Eric Sandeen <esandeen(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/swapfile.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/swapfile.c~mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake
+++ a/mm/swapfile.c
@@ -1078,7 +1078,7 @@ start_over:
 			goto nextsi;
 		}
 		if (size == SWAPFILE_CLUSTER) {
-			if (!(si->flags & SWP_FS))
+			if (si->flags & SWP_BLKDEV)
 				n_ret = swap_alloc_cluster(si, swp_entries);
 		} else
 			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
_

Patches currently in -mm which might be from hsiangkao(a)redhat.com are

mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
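The difference between the old and new conditions can be laid out as a small truth table. This is a sketch under stated assumptions: the `MODEL_SWP_*` bit values and the helper names are invented for the illustration (the real flags are defined in include/linux/swap.h), and the three swap-area kinds follow the description in the commit message, where a swap file on a local filesystem carries neither flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; not the kernel's actual values. */
#define MODEL_SWP_BLKDEV (1u << 0)	/* swap area is a block device */
#define MODEL_SWP_FS     (1u << 1)	/* swap I/O goes through the filesystem (NFS) */

/* Condition before the patch: anything that is not SWP_FS. */
static bool old_allows_cluster(unsigned int flags)
{
	return !(flags & MODEL_SWP_FS);
}

/* Condition after the patch: block devices only. */
static bool new_allows_cluster(unsigned int flags)
{
	return (flags & MODEL_SWP_BLKDEV) != 0;
}

/* Usage example: the three kinds of swap area discussed above. */
static void swap_flag_demo(void)
{
	const unsigned int blkdev = MODEL_SWP_BLKDEV;
	const unsigned int local_file = 0;
	const unsigned int nfs_file = MODEL_SWP_FS;

	/* Both checks agree for block devices and NFS swap files. */
	assert(old_allows_cluster(blkdev) && new_allows_cluster(blkdev));
	assert(!old_allows_cluster(nfs_file) && !new_allows_cluster(nfs_file));

	/* They differ for a swap file on a local filesystem such as XFS. */
	assert(old_allows_cluster(local_file));
	assert(!new_allows_cluster(local_file));
}
```

The last pair of assertions is the case from the reproducer: a fragmented swap file on XFS passed the old test and was handed a huge cluster.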

5 years, 3 months


FAILED: patch "[PATCH] mm/hugetlb: fix calculation of" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 75802ca66354a39ab8e35822747cd08b3384a99a Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Thu, 6 Aug 2020 23:26:11 -0700
Subject: [PATCH] mm/hugetlb: fix calculation of
 adjust_range_if_pmd_sharing_possible

This is found by code observation only.

Firstly, the worst case scenario should assume the whole range was covered
by pmd sharing. The old algorithm might not work as expected for ranges
like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the
expected range should be (0, 2g).

Since at it, remove the loop since it should not be required. With that,
the new code should be faster too when the invalidating range is huge.

Mike said:

: With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
: adjust to (0, 1g+2m) which is incorrect.
:
: We should cc stable. The original reason for adjusting the range was to
: prevent data corruption (getting wrong page). Since the range is not
: always adjusted correctly, the potential for corruption still exists.
:
: However, I am fairly confident that adjust_range_if_pmd_sharing_possible
: is only going to be called in two cases:
:
: 1) for a single page
: 2) for range == entire vma
:
: In those cases, the current code should produce the correct results.
:
: To be safe, let's just cc stable.

Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 27556d4d49fe..e52c878940bb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5314,25 +5314,21 @@ static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
 {
-	unsigned long check_addr;
+	unsigned long a_start, a_end;
 
 	if (!(vma->vm_flags & VM_MAYSHARE))
 		return;
 
-	for (check_addr = *start; check_addr < *end; check_addr += PUD_SIZE) {
-		unsigned long a_start = check_addr & PUD_MASK;
-		unsigned long a_end = a_start + PUD_SIZE;
+	/* Extend the range to be PUD aligned for a worst case scenario */
+	a_start = ALIGN_DOWN(*start, PUD_SIZE);
+	a_end = ALIGN(*end, PUD_SIZE);
 
-		/*
-		 * If sharing is possible, adjust start/end if necessary.
-		 */
-		if (range_in_vma(vma, a_start, a_end)) {
-			if (a_start < *start)
-				*start = a_start;
-			if (a_end > *end)
-				*end = a_end;
-		}
-	}
+	/*
+	 * Intersect the range with the vma range, since pmd sharing won't be
+	 * across vma after all
+	 */
+	*start = max(vma->vm_start, a_start);
+	*end = min(vma->vm_end, a_end);
 }
 
 /*
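The (1g-2m, 1g+2m) example from the commit message can be replayed in userspace. The following is a sketch, not the kernel code: it assumes a 1 GiB PUD size, represents the vma by its two bounds, leaves out the VM_MAYSHARE check, and uses the hypothetical names `adjust_range_old()`, `adjust_range_new()` and `MODEL_PUD_SIZE`.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PUD_SIZE (UINT64_C(1) << 30)	/* assumed: 1 GiB */
#define MODEL_PUD_MASK (~(MODEL_PUD_SIZE - 1))
#define SZ_2M (UINT64_C(2) << 20)
#define SZ_1G (UINT64_C(1) << 30)

/* Model of the loop removed by the patch. */
static void adjust_range_old(uint64_t vm_start, uint64_t vm_end,
			     uint64_t *start, uint64_t *end)
{
	uint64_t check;

	for (check = *start; check < *end; check += MODEL_PUD_SIZE) {
		uint64_t a_start = check & MODEL_PUD_MASK;
		uint64_t a_end = a_start + MODEL_PUD_SIZE;

		/* Equivalent of range_in_vma() for this model. */
		if (vm_start <= a_start && a_end <= vm_end) {
			if (a_start < *start)
				*start = a_start;
			if (a_end > *end)
				*end = a_end;
		}
	}
}

/* Model of the replacement: align outwards, then intersect with the vma. */
static void adjust_range_new(uint64_t vm_start, uint64_t vm_end,
			     uint64_t *start, uint64_t *end)
{
	uint64_t a_start = *start & MODEL_PUD_MASK;
	uint64_t a_end = (*end + MODEL_PUD_SIZE - 1) & MODEL_PUD_MASK;

	*start = a_start > vm_start ? a_start : vm_start;
	*end = a_end < vm_end ? a_end : vm_end;
}

/* Usage example: range (1g-2m, 1g+2m) inside a vma (0, 2g). */
static void hugetlb_range_demo(void)
{
	uint64_t s = SZ_1G - SZ_2M, e = SZ_1G + SZ_2M;

	adjust_range_old(0, 2 * SZ_1G, &s, &e);
	assert(s == 0 && e == SZ_1G + SZ_2M);	/* second PUD is missed */

	s = SZ_1G - SZ_2M;
	e = SZ_1G + SZ_2M;
	adjust_range_new(0, 2 * SZ_1G, &s, &e);
	assert(s == 0 && e == 2 * SZ_1G);	/* both PUDs are covered */
}
```

In the old model the loop steps by a whole PUD from the unaligned start, so it leaves the range before ever examining the PUD that contains the end address.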

5 years, 3 months

