The patch titled
Subject: mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
has been removed from the -mm tree. Its filename was
mm-debug_vm_pgtable-fix-alignment-for-pmd-pud_advanced_tests.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Subject: mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
In pmd/pud_advanced_tests(), the vaddr is aligned up to the next pmd/pud
entry, and so it does not match the given pmdp/pudp and (aligned down) pfn
any more.
For s390, this results in memory corruption, because the IDTE instruction
used e.g. in xxx_get_and_clear() will take the vaddr for some calculations,
in combination with the given pmdp. It will then end up with a wrong table
origin, ending on ...ff8, and some of those wrongly set low-order bits will
also select a wrong pagetable level for the index addition. IDTE could
therefore invalidate (or 0x20) something outside of the page tables,
depending on the wrongly picked index, which in turn depends on the random
vaddr.
As a result, we sometimes see "BUG task_struct (Not tainted): Padding
overwritten" on s390, where one 0x5a padding value got overwritten with
0x7a.
Fix this by aligning down, similar to how the pmd/pud_aligned pfns are
calculated.
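For illustration only, here is a minimal userspace sketch (not kernel code) of the two alignment variants, assuming an illustrative 2 MiB HPAGE_PMD_SIZE: aligning up moves vaddr into the next PMD entry, so it no longer corresponds to the given pmdp and the aligned-down pfn, while aligning down keeps them consistent.

#include <stdio.h>

#define HPAGE_PMD_SIZE	(2UL << 20)		/* assumed 2 MiB for illustration */
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))

int main(void)
{
	unsigned long vaddr = 0x7f1234567890UL;	/* an arbitrary "random" vaddr */

	unsigned long up   = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;	/* old code */
	unsigned long down = vaddr & HPAGE_PMD_MASK;			/* fixed code */

	/* "up" is the start of the *next* PMD entry; "down" is the one vaddr lies in. */
	printf("vaddr=%#lx up=%#lx down=%#lx\n", vaddr, up, down);
	return 0;
}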
Link: https://lkml.kernel.org/r/20210525130043.186290-2-gerald.schaefer@linux.ibm…
Fixes: a5c3b9ffb0f40 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Vineet Gupta <vgupta(a)synopsys.com>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/debug_vm_pgtable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-fix-alignment-for-pmd-pud_advanced_tests
+++ a/mm/debug_vm_pgtable.c
@@ -192,7 +192,7 @@ static void __init pmd_advanced_tests(st
pr_debug("Validating PMD advanced\n");
/* Align the address wrt HPAGE_PMD_SIZE */
- vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
+ vaddr &= HPAGE_PMD_MASK;
pgtable_trans_huge_deposit(mm, pmdp, pgtable);
@@ -330,7 +330,7 @@ static void __init pud_advanced_tests(st
pr_debug("Validating PUD advanced\n");
/* Align the address wrt HPAGE_PUD_SIZE */
- vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
+ vaddr &= HPAGE_PUD_MASK;
set_pud_at(mm, vaddr, pudp, pud);
pudp_set_wrprotect(mm, vaddr, pudp);
_
Patches currently in -mm which might be from gerald.schaefer(a)linux.ibm.com are
The patch titled
Subject: pid: take a reference when initializing `cad_pid`
has been removed from the -mm tree. Its filename was
pid-take-a-reference-when-initializing-cad_pid.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mark Rutland <mark.rutland(a)arm.com>
Subject: pid: take a reference when initializing `cad_pid`
During boot, kernel_init_freeable() initializes `cad_pid` to the init
task's struct pid. Later on, we may change `cad_pid` via a sysctl, and
when this happens proc_do_cad_pid() will increment the refcount on the new
pid via get_pid(), and will decrement the refcount on the old pid via
put_pid(). As we never called get_pid() when we initialized `cad_pid`, we
decrement a reference we never incremented, and can therefore free the init
task's struct pid early. As there can be dangling references to the
struct pid, we can later encounter a use-after-free (e.g. when delivering
signals).
This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to have
been around since the conversion of `cad_pid` to struct pid in commit:
9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid")
... from the pre-KASAN stone age of v2.6.19.
Fix this by getting a reference to the init task's struct pid when we
assign it to `cad_pid`.
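As a toy illustration (plain userspace C, not the kernel's struct pid implementation), the imbalance looks roughly like this: the initial assignment takes no reference, so the first sysctl update's put_pid() drops the last reference the init task was relying on.

#include <stdio.h>
#include <stdlib.h>

struct pid { int count; };

static struct pid *get_pid(struct pid *p) { if (p) p->count++; return p; }

static void put_pid(struct pid *p)
{
	if (p && --p->count == 0) {
		printf("pid object %p freed\n", (void *)p);
		free(p);
	}
}

int main(void)
{
	struct pid *init_pid = malloc(sizeof(*init_pid));
	struct pid *new_pid = malloc(sizeof(*new_pid));

	init_pid->count = 1;		/* reference held by the init task */
	new_pid->count = 1;

	/* Buggy init: cad_pid aliases init_pid without taking a reference. */
	struct pid *cad_pid = init_pid;	/* fixed: cad_pid = get_pid(init_pid); */

	/* Later, proc_do_cad_pid() swaps in a new pid ... */
	struct pid *old = cad_pid;
	cad_pid = get_pid(new_pid);
	put_pid(old);	/* drops the init task's only reference: freed too early */

	/* init_pid now dangles; any later access is a use-after-free. */
	return 0;
}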
Full KASAN splat below.
==================================================================
BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273
CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x4a8 arch/arm64/kernel/stacktrace.c:105
show_stack+0x34/0x48 arch/arm64/kernel/stacktrace.c:191
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x1d4/0x2a0 lib/dump_stack.c:120
print_address_description.constprop.11+0x60/0x3a8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report+0x1e8/0x200 mm/kasan/report.c:416
__asan_report_load4_noabort+0x30/0x48 mm/kasan/report_generic.c:308
ns_of_pid include/linux/pid.h:153 [inline]
task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
do_notify_parent+0x308/0xe60 kernel/signal.c:1950
exit_notify kernel/exit.c:682 [inline]
do_exit+0x2334/0x2bd0 kernel/exit.c:845
do_group_exit+0x108/0x2c8 kernel/exit.c:922
get_signal+0x4e4/0x2a88 kernel/signal.c:2781
do_signal arch/arm64/kernel/signal.c:882 [inline]
do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
work_pending+0xc/0x2dc
Allocated by task 0:
kasan_save_stack+0x28/0x58 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:427 [inline]
__kasan_slab_alloc+0x88/0xa8 mm/kasan/common.c:460
kasan_slab_alloc include/linux/kasan.h:223 [inline]
slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
slab_alloc_node mm/slub.c:2907 [inline]
slab_alloc mm/slub.c:2915 [inline]
kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
alloc_pid+0xdc/0xc00 kernel/pid.c:180
copy_process+0x2794/0x5e18 kernel/fork.c:2129
kernel_clone+0x194/0x13c8 kernel/fork.c:2500
kernel_thread+0xd4/0x110 kernel/fork.c:2552
rest_init+0x44/0x4a0 init/main.c:687
arch_call_rest_init+0x1c/0x28
start_kernel+0x520/0x554 init/main.c:1064
0x0
Freed by task 270:
kasan_save_stack+0x28/0x58 mm/kasan/common.c:38
kasan_set_track+0x28/0x40 mm/kasan/common.c:46
kasan_set_free_info+0x28/0x50 mm/kasan/generic.c:357
____kasan_slab_free mm/kasan/common.c:360 [inline]
____kasan_slab_free mm/kasan/common.c:325 [inline]
__kasan_slab_free+0xf4/0x148 mm/kasan/common.c:367
kasan_slab_free include/linux/kasan.h:199 [inline]
slab_free_hook mm/slub.c:1562 [inline]
slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
slab_free mm/slub.c:3161 [inline]
kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
put_pid+0x30/0x48 kernel/pid.c:109
proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
call_write_iter include/linux/fs.h:1977 [inline]
new_sync_write+0x3ac/0x510 fs/read_write.c:518
vfs_write fs/read_write.c:605 [inline]
vfs_write+0x9c4/0x1018 fs/read_write.c:585
ksys_write+0x124/0x240 fs/read_write.c:658
__do_sys_write fs/read_write.c:670 [inline]
__se_sys_write fs/read_write.c:667 [inline]
__arm64_sys_write+0x78/0xb0 fs/read_write.c:667
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701
The buggy address belongs to the object at ffff23794dda0000
which belongs to the cache pid of size 224
The buggy address is located 4 bytes inside of
224-byte region [ffff23794dda0000, ffff23794dda00e0)
The buggy address belongs to the page:
page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
head:(____ptrval____) order:1 compound_mapcount:0
flags: 0x3fffc0000010200(slab|head)
raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
==================================================================
Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
Fixes: 9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid")
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Acked-by: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Cedric Le Goater <clg(a)fr.ibm.com>
Cc: Christian Brauner <christian(a)brauner.io>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Paul Mackerras <paulus(a)samba.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
init/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/init/main.c~pid-take-a-reference-when-initializing-cad_pid
+++ a/init/main.c
@@ -1537,7 +1537,7 @@ static noinline void __init kernel_init_
*/
set_mems_allowed(node_states[N_MEMORY]);
- cad_pid = task_pid(current);
+ cad_pid = get_pid(task_pid(current));
smp_prepare_cpus(setup_max_cpus);
_
Patches currently in -mm which might be from mark.rutland(a)arm.com are
The patch titled
Subject: kfence: use TASK_IDLE when awaiting allocation
has been removed from the -mm tree. Its filename was
kfence-use-task_idle-when-awaiting-allocation.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Marco Elver <elver(a)google.com>
Subject: kfence: use TASK_IDLE when awaiting allocation
Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
allocation counts towards load. However, for KFENCE, this does not make
any sense, since there is no busy work we're awaiting.
Instead, use TASK_IDLE via wait_event_idle() to not count towards load.
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565
Link: https://lkml.kernel.org/r/20210521083209.3740269-1-elver@google.com
Fixes: 407f1d8c1b5f ("kfence: await for allocation using wait_event")
Signed-off-by: Marco Elver <elver(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: David Laight <David.Laight(a)ACULAB.COM>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: <stable(a)vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kfence/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/kfence/core.c~kfence-use-task_idle-when-awaiting-allocation
+++ a/mm/kfence/core.c
@@ -627,10 +627,10 @@ static void toggle_allocation_gate(struc
* During low activity with no allocations we might wait a
* while; let's avoid the hung task warning.
*/
- wait_event_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
- sysctl_hung_task_timeout_secs * HZ / 2);
+ wait_event_idle_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
+ sysctl_hung_task_timeout_secs * HZ / 2);
} else {
- wait_event(allocation_wait, atomic_read(&kfence_allocation_gate));
+ wait_event_idle(allocation_wait, atomic_read(&kfence_allocation_gate));
}
/* Disable static key and reset timer. */
_
Patches currently in -mm which might be from elver(a)google.com are
mm-slub-change-run-time-assertion-in-kmalloc_index-to-compile-time-fix.patch
printk-introduce-dump_stack_lvl-fix.patch
kfence-unconditionally-use-unbound-work-queue.patch
kcov-add-__no_sanitize_coverage-to-fix-noinstr-for-all-architectures.patch
kcov-add-__no_sanitize_coverage-to-fix-noinstr-for-all-architectures-v2.patch
kcov-add-__no_sanitize_coverage-to-fix-noinstr-for-all-architectures-v3.patch
The routine restore_reserve_on_error is called to restore reservation
information when an error occurs after page allocation. The routine
alloc_huge_page modifies the mapping reserve map and potentially the
reserve count during allocation. If code calling alloc_huge_page
encounters an error after allocation and needs to free the page, the
reservation information needs to be adjusted.
Currently, restore_reserve_on_error only takes action on pages for which
the reserve count was adjusted (HPageRestoreReserve flag). There is
nothing wrong with these adjustments. However, alloc_huge_page ALWAYS
modifies the reserve map during allocation even if the reserve count is
not adjusted. This can cause issues as observed during development of
this patch [1].
One specific series of operations causing an issue is:
- Create a shared hugetlb mapping
Reservations for all pages created by default
- Fault in a page in the mapping
Reservation exists so reservation count is decremented
- Punch a hole in the file/mapping at index previously faulted
Reservation and any associated pages will be removed
- Allocate a page to fill the hole
No reservation entry, so reserve count unmodified
Reservation entry added to map by alloc_huge_page
- Error after allocation and before instantiating the page
Reservation entry remains in map
- Allocate a page to fill the hole
Reservation entry exists, so decrement reservation count
This will cause a reservation count underflow as the reservation count
was decremented twice for the same index.
A user would observe a very large number for HugePages_Rsvd in
/proc/meminfo. This would also likely cause subsequent allocations of
hugetlb pages to fail as it would 'appear' that all pages are reserved.
This sequence of operations is unlikely to happen; however, it was
easily reproduced and observed using hacked up code as described in [1].
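The double decrement can be modelled with a toy userspace sketch (a simplification, not the hugetlb code): a single boolean stands in for the reserve map entry of one index in a shared mapping, and the allocation helper mimics alloc_huge_page consuming a reservation when an entry exists while always (re)adding the entry.

#include <stdbool.h>
#include <stdio.h>

static bool map_entry;		/* reserve map entry for one index (shared mapping) */
static long resv_count;		/* global count of outstanding reservations */

static void toy_alloc_huge_page(void)
{
	if (map_entry)
		resv_count--;	/* an entry means a reservation backs this index */
	map_entry = true;	/* the reserve map is updated unconditionally */
}

int main(void)
{
	map_entry = true;  resv_count = 1;	/* mmap: reservation created */

	toy_alloc_huge_page();			/* fault in the page: 1 -> 0 */
	map_entry = false;			/* punch hole: page and entry removed */

	toy_alloc_huge_page();			/* refill attempt: no entry, count stays 0 */
	/* error before the page is instantiated: page freed, entry left behind */

	toy_alloc_huge_page();			/* retry: stale entry => 0 -> -1, underflow */

	printf("resv_count = %ld\n", resv_count);	/* prints -1 */
	return 0;
}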
Address the issue by having the routine restore_reserve_on_error take
action on pages where HPageRestoreReserve is not set. In this case, we
need to remove any reserve map entry created by alloc_huge_page. A new
helper routine vma_del_reservation assists with this operation.
There are three callers of alloc_huge_page which do not currently call
restore_reserve_on_error before freeing a page on error paths. Add
those missing calls.
[1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.…
Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Mina Almasry <almasrymina(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
Changes in v2
Change incorrect flag name in comment (Mina) and fix up text
describing restore_reserve_on_error function.
Add Reviewed-by: (Mina)
fs/hugetlbfs/inode.c | 1 +
include/linux/hugetlb.h | 2 +
mm/hugetlb.c | 120 ++++++++++++++++++++++++++++++++--------
3 files changed, 100 insertions(+), 23 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 21f7a5c92e8a..926eeb9bf4eb 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -735,6 +735,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
__SetPageUptodate(page);
error = huge_add_to_page_cache(page, mapping, index);
if (unlikely(error)) {
+ restore_reserve_on_error(h, &pseudo_vma, addr, page);
put_page(page);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
goto out;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 44721df20e89..b375b31f60c4 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -627,6 +627,8 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
+void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
+ unsigned long address, struct page *page);
/* arch callback */
int __init __alloc_bootmem_huge_page(struct hstate *h);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9a616b7a8563..36b691c3eabf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2259,12 +2259,18 @@ static void return_unused_surplus_pages(struct hstate *h,
* be restored when a newly allocated huge page must be freed. It is
* to be called after calling vma_needs_reservation to determine if a
* reservation exists.
+ *
+ * vma_del_reservation is used in error paths where an entry in the reserve
+ * map was created during huge page allocation and must be removed. It is to
+ * be called after calling vma_needs_reservation to determine if a reservation
+ * exists.
*/
enum vma_resv_mode {
VMA_NEEDS_RESV,
VMA_COMMIT_RESV,
VMA_END_RESV,
VMA_ADD_RESV,
+ VMA_DEL_RESV,
};
static long __vma_reservation_common(struct hstate *h,
struct vm_area_struct *vma, unsigned long addr,
@@ -2308,11 +2314,21 @@ static long __vma_reservation_common(struct hstate *h,
ret = region_del(resv, idx, idx + 1);
}
break;
+ case VMA_DEL_RESV:
+ if (vma->vm_flags & VM_MAYSHARE) {
+ region_abort(resv, idx, idx + 1, 1);
+ ret = region_del(resv, idx, idx + 1);
+ } else {
+ ret = region_add(resv, idx, idx + 1, 1, NULL, NULL);
+ /* region_add calls of range 1 should never fail. */
+ VM_BUG_ON(ret < 0);
+ }
+ break;
default:
BUG();
}
- if (vma->vm_flags & VM_MAYSHARE)
+ if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV)
return ret;
/*
* We know private mapping must have HPAGE_RESV_OWNER set.
@@ -2360,25 +2376,39 @@ static long vma_add_reservation(struct hstate *h,
return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV);
}
+static long vma_del_reservation(struct hstate *h,
+ struct vm_area_struct *vma, unsigned long addr)
+{
+ return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV);
+}
+
/*
- * This routine is called to restore a reservation on error paths. In the
- * specific error paths, a huge page was allocated (via alloc_huge_page)
- * and is about to be freed. If a reservation for the page existed,
- * alloc_huge_page would have consumed the reservation and set
- * HPageRestoreReserve in the newly allocated page. When the page is freed
- * via free_huge_page, the global reservation count will be incremented if
- * HPageRestoreReserve is set. However, free_huge_page can not adjust the
- * reserve map. Adjust the reserve map here to be consistent with global
- * reserve count adjustments to be made by free_huge_page.
+ * This routine is called to restore reservation information on error paths.
+ * It should ONLY be called for pages allocated via alloc_huge_page(), and
+ * the hugetlb mutex should remain held when calling this routine.
+ *
+ * It handles two specific cases:
+ * 1) A reservation was in place and the page consumed the reservation.
+ * HPageRestoreReserve is set in the page.
+ * 2) No reservation was in place for the page, so HPageRestoreReserve is
+ * not set. However, alloc_huge_page always updates the reserve map.
+ *
+ * In case 1, free_huge_page will increment the global reserve count. But,
+ * free_huge_page does not have enough context to adjust the reservation map.
+ * This case deals primarily with private mappings. Adjust the reserve map
+ * here to be consistent with global reserve count adjustments to be made
+ * by free_huge_page. Make sure the reserve map indicates there is a
+ * reservation present.
+ *
+ * In case 2, simply undo reserve map modifications done by alloc_huge_page.
*/
-static void restore_reserve_on_error(struct hstate *h,
- struct vm_area_struct *vma, unsigned long address,
- struct page *page)
+void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
+ unsigned long address, struct page *page)
{
- if (unlikely(HPageRestoreReserve(page))) {
- long rc = vma_needs_reservation(h, vma, address);
+ long rc = vma_needs_reservation(h, vma, address);
- if (unlikely(rc < 0)) {
+ if (HPageRestoreReserve(page)) {
+ if (unlikely(rc < 0))
/*
* Rare out of memory condition in reserve map
* manipulation. Clear HPageRestoreReserve so that
@@ -2391,16 +2421,57 @@ static void restore_reserve_on_error(struct hstate *h,
* accounting of reserve counts.
*/
ClearHPageRestoreReserve(page);
- } else if (rc) {
- rc = vma_add_reservation(h, vma, address);
- if (unlikely(rc < 0))
+ else if (rc)
+ (void)vma_add_reservation(h, vma, address);
+ else
+ vma_end_reservation(h, vma, address);
+ } else {
+ if (!rc) {
+ /*
+ * This indicates there is an entry in the reserve map
+ * added by alloc_huge_page. We know it was added
+ * before the alloc_huge_page call, otherwise
+ * HPageRestoreReserve would be set on the page.
+ * Remove the entry so that a subsequent allocation
+ * does not consume a reservation.
+ */
+ rc = vma_del_reservation(h, vma, address);
+ if (rc < 0)
/*
- * See above comment about rare out of
- * memory condition.
+ * VERY rare out of memory condition. Since
+ * we can not delete the entry, set
+ * HPageRestoreReserve so that the reserve
+ * count will be incremented when the page
+ * is freed. This reserve will be consumed
+ * on a subsequent allocation.
*/
- ClearHPageRestoreReserve(page);
+ SetHPageRestoreReserve(page);
+ } else if (rc < 0) {
+ /*
+ * Rare out of memory condition from
+ * vma_needs_reservation call. Memory allocation is
+ * only attempted if a new entry is needed. Therefore,
+ * this implies there is not an entry in the
+ * reserve map.
+ *
+ * For shared mappings, no entry in the map indicates
+ * no reservation. We are done.
+ */
+ if (!(vma->vm_flags & VM_MAYSHARE))
+ /*
+ * For private mappings, no entry indicates
+ * a reservation is present. Since we can
+ * not add an entry, set SetHPageRestoreReserve
+ * on the page so reserve count will be
+ * incremented when freed. This reserve will
+ * be consumed on a subsequent allocation.
+ */
+ SetHPageRestoreReserve(page);
} else
- vma_end_reservation(h, vma, address);
+ /*
+ * No reservation present, do nothing
+ */
+ vma_end_reservation(h, vma, address);
}
}
@@ -4176,6 +4247,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
entry = huge_ptep_get(src_pte);
if (!pte_same(src_pte_old, entry)) {
+ restore_reserve_on_error(h, vma, addr,
+ new);
put_page(new);
/* dst_entry won't change as in child */
goto again;
@@ -5174,6 +5247,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
if (vm_shared || is_continue)
unlock_page(page);
out_release_nounlock:
+ restore_reserve_on_error(h, dst_vma, dst_addr, page);
put_page(page);
goto out;
}
--
2.31.1
From: Long Li <longli(a)microsoft.com>
After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec can
have multiple pages. But bio_will_gap() still assumes a one-page bvec when
checking for merging. If the pages in the bvec cross the
seg_boundary_mask, this merge check can potentially succeed if only
the 1st page is tested, and can fail if all the pages are tested.
Later, when SCSI builds the SG list, the same merge check is done in
__blk_segment_map_sg_merge() with all the pages in the bvec tested. This
time the check may fail if the pages in the bvec cross the
seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those
BIOs were merged). If this check fails, we end up with a broken SG list
for drivers that assume the SG list has no offsets in intermediate pages.
This results in incorrect pages being written to disk.
Fix this by returning the multi-page bvec when testing gaps for merging.
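A userspace sketch of the general idea (illustrative arithmetic only, not the block layer's actual gap check; the 64K boundary value is an assumption): a boundary test that only sees the first page of a multi-page bvec can report "fits", while the same test over the whole bvec reports a crossing.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE	4096u
#define BOUNDARY_MASK	0xffffu		/* assumed 64K boundary for illustration */

/* Does [offset, offset + len) stay within one boundary-sized window? */
static int crosses_boundary(uint32_t offset, uint32_t len)
{
	return ((offset + len - 1) & ~BOUNDARY_MASK) != (offset & ~BOUNDARY_MASK);
}

int main(void)
{
	/* Data spanning 15 pages, starting 4K below a 64K boundary. */
	uint32_t seg_offset = 0xf000;	/* offset within the boundary window */
	uint32_t seg_len = 15 * PAGE_SIZE;

	uint32_t first_page_len = PAGE_SIZE - (seg_offset & (PAGE_SIZE - 1));

	printf("first page only: %s\n",
	       crosses_boundary(seg_offset, first_page_len) ? "crosses" : "fits");
	printf("whole bvec     : %s\n",
	       crosses_boundary(seg_offset, seg_len) ? "crosses" : "fits");
	return 0;
}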
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Jeffle Xu <jefflexu(a)linux.alibaba.com>
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Fixes: 07173c3ec276 ("block: enable multipage bvecs")
Signed-off-by: Long Li <longli(a)microsoft.com>
---
Change from v1: add commit details on how data corruption happens
include/linux/bio.h | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a0b4cfdf62a4..6b2f609ccfbf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -44,9 +44,6 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
#define bio_offset(bio) bio_iter_offset((bio), (bio)->bi_iter)
#define bio_iovec(bio) bio_iter_iovec((bio), (bio)->bi_iter)
-#define bio_multiple_segments(bio) \
- ((bio)->bi_iter.bi_size != bio_iovec(bio).bv_len)
-
#define bvec_iter_sectors(iter) ((iter).bi_size >> 9)
#define bvec_iter_end_sector(iter) ((iter).bi_sector + bvec_iter_sectors((iter)))
@@ -271,7 +268,7 @@ static inline void bio_clear_flag(struct bio *bio, unsigned int bit)
static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
{
- *bv = bio_iovec(bio);
+ *bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
}
static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
@@ -279,10 +276,10 @@ static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
struct bvec_iter iter = bio->bi_iter;
int idx;
- if (unlikely(!bio_multiple_segments(bio))) {
- *bv = bio_iovec(bio);
+ /* this bio has only one bvec */
+ *bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+ if (bv->bv_len == bio->bi_iter.bi_size)
return;
- }
bio_advance_iter(bio, &iter, iter.bi_size);
--
2.17.1