The patch titled
Subject: hugetlbfs: close race between MADV_DONTNEED and page fault
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlbfs-close-race-between-madv_dontneed-and-page-fault.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: hugetlbfs: close race between MADV_DONTNEED and page fault
Date: Thu, 5 Oct 2023 23:59:08 -0400
Malloc libraries, like jemalloc and tcmalloc, make decisions on when to
call madvise independently from the code in the main application.
This sometimes results in the application page faulting on an address,
right after the malloc library has shot down the backing memory with
MADV_DONTNEED.
Usually this is harmless, because we always have some 4kB pages sitting
around to satisfy a page fault. However, with hugetlbfs, systems often
allocate only the exact number of huge pages that the application wants.
Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
any lock taken on the page fault path, which can open up the following
race condition:
CPU 1                           CPU 2
MADV_DONTNEED
unmap page
shoot down TLB entry
                                page fault
                                fail to allocate a huge page
                                killed with SIGBUS
free page
Fix that race by pulling the locking from __unmap_hugepage_range_final
into helper functions called from zap_page_range_single. This ensures
page faults stay locked out of the MADV_DONTNEED VMA until the huge pages
have actually been freed.
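Condensed, the resulting ordering in zap_page_range_single() is roughly the
following (a sketch paraphrasing the mm/memory.c hunk below, not the literal
code; unrelated details are omitted):

/* Sketch of the ordering after this patch; paraphrased, not the literal code. */
void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
			   unsigned long size, struct zap_details *details)
{
	const unsigned long end = address + size;
	struct mmu_notifier_range range;
	struct mmu_gather tlb;

	lru_add_drain();
	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
				address, end);
	hugetlb_zap_begin(vma, &range.start, &range.end); /* hugetlb vma lock taken for write */
	tlb_gather_mmu(&tlb, vma->vm_mm);
	update_hiwater_rss(vma->vm_mm);
	mmu_notifier_invalidate_range_start(&range);
	unmap_single_vma(&tlb, vma, address, end, details, false);
	mmu_notifier_invalidate_range_end(&range);
	tlb_finish_mmu(&tlb);		/* batched huge pages are actually freed here */
	hugetlb_zap_end(vma, details);	/* only now can concurrent page faults make progress */
}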
Link: https://lkml.kernel.org/r/20231006040020.3677377-4-riel@surriel.com
Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 35 +++++++++++++++++++++++++++++++++--
mm/hugetlb.c | 34 ++++++++++++++++++++++------------
mm/memory.c | 13 ++++++++-----
3 files changed, 63 insertions(+), 19 deletions(-)
--- a/include/linux/hugetlb.h~hugetlbfs-close-race-between-madv_dontneed-and-page-fault
+++ a/include/linux/hugetlb.h
@@ -139,7 +139,7 @@ struct page *hugetlb_follow_page_mask(st
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *,
zap_flags_t);
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+void __unmap_hugepage_range(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
struct page *ref_page, zap_flags_t zap_flags);
@@ -246,6 +246,25 @@ int huge_pmd_unshare(struct mm_struct *m
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end);
+extern void __hugetlb_zap_begin(struct vm_area_struct *vma,
+ unsigned long *begin, unsigned long *end);
+extern void __hugetlb_zap_end(struct vm_area_struct *vma,
+ struct zap_details *details);
+
+static inline void hugetlb_zap_begin(struct vm_area_struct *vma,
+ unsigned long *start, unsigned long *end)
+{
+ if (is_vm_hugetlb_page(vma))
+ __hugetlb_zap_begin(vma, start, end);
+}
+
+static inline void hugetlb_zap_end(struct vm_area_struct *vma,
+ struct zap_details *details)
+{
+ if (is_vm_hugetlb_page(vma))
+ __hugetlb_zap_end(vma, details);
+}
+
void hugetlb_vma_lock_read(struct vm_area_struct *vma);
void hugetlb_vma_unlock_read(struct vm_area_struct *vma);
void hugetlb_vma_lock_write(struct vm_area_struct *vma);
@@ -297,6 +316,18 @@ static inline void adjust_range_if_pmd_s
{
}
+static inline void hugetlb_zap_begin(
+ struct vm_area_struct *vma,
+ unsigned long *start, unsigned long *end)
+{
+}
+
+static inline void hugetlb_zap_end(
+ struct vm_area_struct *vma,
+ struct zap_details *details)
+{
+}
+
static inline struct page *hugetlb_follow_page_mask(
struct vm_area_struct *vma, unsigned long address, unsigned int flags,
unsigned int *page_mask)
@@ -442,7 +473,7 @@ static inline long hugetlb_change_protec
return 0;
}
-static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page,
zap_flags_t zap_flags)
--- a/mm/hugetlb.c~hugetlbfs-close-race-between-madv_dontneed-and-page-fault
+++ a/mm/hugetlb.c
@@ -5306,9 +5306,9 @@ int move_hugetlb_page_tables(struct vm_a
return len + old_addr - old_end;
}
-static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- struct page *ref_page, zap_flags_t zap_flags)
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct page *ref_page, zap_flags_t zap_flags)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
@@ -5437,16 +5437,25 @@ static void __unmap_hugepage_range(struc
tlb_flush_mmu_tlbonly(tlb);
}
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
- struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page,
- zap_flags_t zap_flags)
+void __hugetlb_zap_begin(struct vm_area_struct *vma,
+ unsigned long *start, unsigned long *end)
{
+ if (!vma->vm_file) /* hugetlbfs_file_mmap error */
+ return;
+
+ adjust_range_if_pmd_sharing_possible(vma, start, end);
hugetlb_vma_lock_write(vma);
- i_mmap_lock_write(vma->vm_file->f_mapping);
+ if (vma->vm_file)
+ i_mmap_lock_write(vma->vm_file->f_mapping);
+}
+
+void __hugetlb_zap_end(struct vm_area_struct *vma,
+ struct zap_details *details)
+{
+ zap_flags_t zap_flags = details ? details->zap_flags : 0;
- /* mmu notification performed in caller */
- __unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
+ if (!vma->vm_file) /* hugetlbfs_file_mmap error */
+ return;
if (zap_flags & ZAP_FLAG_UNMAP) { /* final unmap */
/*
@@ -5459,11 +5468,12 @@ void __unmap_hugepage_range_final(struct
* someone else.
*/
__hugetlb_vma_unlock_write_free(vma);
- i_mmap_unlock_write(vma->vm_file->f_mapping);
} else {
- i_mmap_unlock_write(vma->vm_file->f_mapping);
hugetlb_vma_unlock_write(vma);
}
+
+ if (vma->vm_file)
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
}
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
--- a/mm/memory.c~hugetlbfs-close-race-between-madv_dontneed-and-page-fault
+++ a/mm/memory.c
@@ -1683,7 +1683,7 @@ static void unmap_single_vma(struct mmu_
if (vma->vm_file) {
zap_flags_t zap_flags = details ?
details->zap_flags : 0;
- __unmap_hugepage_range_final(tlb, vma, start, end,
+ __unmap_hugepage_range(tlb, vma, start, end,
NULL, zap_flags);
}
} else
@@ -1728,8 +1728,12 @@ void unmap_vmas(struct mmu_gather *tlb,
start_addr, end_addr);
mmu_notifier_invalidate_range_start(&range);
do {
- unmap_single_vma(tlb, vma, start_addr, end_addr, &details,
+ unsigned long start = start_addr;
+ unsigned long end = end_addr;
+ hugetlb_zap_begin(vma, &start, &end);
+ unmap_single_vma(tlb, vma, start, end, &details,
mm_wr_locked);
+ hugetlb_zap_end(vma, &details);
} while ((vma = mas_find(mas, tree_end - 1)) != NULL);
mmu_notifier_invalidate_range_end(&range);
}
@@ -1753,9 +1757,7 @@ void zap_page_range_single(struct vm_are
lru_add_drain();
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
address, end);
- if (is_vm_hugetlb_page(vma))
- adjust_range_if_pmd_sharing_possible(vma, &range.start,
- &range.end);
+ hugetlb_zap_begin(vma, &range.start, &range.end);
tlb_gather_mmu(&tlb, vma->vm_mm);
update_hiwater_rss(vma->vm_mm);
mmu_notifier_invalidate_range_start(&range);
@@ -1766,6 +1768,7 @@ void zap_page_range_single(struct vm_are
unmap_single_vma(&tlb, vma, address, end, details, false);
mmu_notifier_invalidate_range_end(&range);
tlb_finish_mmu(&tlb);
+ hugetlb_zap_end(vma, details);
}
/**
_
Patches currently in -mm which might be from riel(a)surriel.com are
hugetlbfs-clear-resv_map-pointer-if-mmap-fails.patch
hugetlbfs-extend-hugetlb_vma_lock-to-private-vmas.patch
hugetlbfs-close-race-between-madv_dontneed-and-page-fault.patch
The patch titled
Subject: hugetlbfs: clear resv_map pointer if mmap fails
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlbfs-clear-resv_map-pointer-if-mmap-fails.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: hugetlbfs: clear resv_map pointer if mmap fails
Date: Thu, 5 Oct 2023 23:59:06 -0400
Patch series "hugetlbfs: close race between MADV_DONTNEED and page fault", v7.
Malloc libraries, like jemalloc and tcmalloc, make decisions on when to
call madvise independently from the code in the main application.
This sometimes results in the application page faulting on an address,
right after the malloc library has shot down the backing memory with
MADV_DONTNEED.
Usually this is harmless, because we always have some 4kB pages sitting
around to satisfy a page fault. However, with hugetlbfs, systems often
allocate only the exact number of huge pages that the application wants.
Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
any lock taken on the page fault path, which can open up the following
race condition:
CPU 1                           CPU 2
MADV_DONTNEED
unmap page
shoot down TLB entry
                                page fault
                                fail to allocate a huge page
                                killed with SIGBUS
free page
Fix that race by extending the hugetlb_vma_lock locking scheme to also
cover private hugetlb mappings (with resv_map), and pulling the locking
from __unmap_hugepage_range_final into helper functions called from
zap_page_range_single. This ensures page faults stay locked out of the
MADV_DONTNEED VMA until the huge pages have actually been freed.
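For context, the page-fault side of this exclusion is the per-VMA hugetlb
lock taken for read in hugetlb_fault(); the sketch below is condensed from
memory and keeps only the locking, so treat it as illustrative rather than
as the actual source:

/* Condensed, illustrative view of the fault side; most of hugetlb_fault() is omitted. */
vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
			 unsigned long address, unsigned int flags)
{
	/* ... the hugetlb fault mutex is taken first ... */
	hugetlb_vma_lock_read(vma);	/* blocks while the zap path holds the lock for write */
	/* ... find or allocate the huge page and install the PTE ... */
	hugetlb_vma_unlock_read(vma);
	/* ... */
	return 0;
}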
This patch (of 3):
Hugetlbfs leaves a dangling pointer in the VMA if mmap fails. This has
not been a problem so far, but other code in this patch series tries to
follow that pointer.
Link: https://lkml.kernel.org/r/20231006040020.3677377-1-riel@surriel.com
Link: https://lkml.kernel.org/r/20231006040020.3677377-2-riel@surriel.com
Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c~hugetlbfs-clear-resv_map-pointer-if-mmap-fails
+++ a/mm/hugetlb.c
@@ -1138,8 +1138,7 @@ static void set_vma_resv_map(struct vm_a
VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma);
VM_BUG_ON_VMA(vma->vm_flags & VM_MAYSHARE, vma);
- set_vma_private_data(vma, (get_vma_private_data(vma) &
- HPAGE_RESV_MASK) | (unsigned long)map);
+ set_vma_private_data(vma, (unsigned long)map);
}
static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
@@ -6811,8 +6810,10 @@ out_err:
*/
if (chg >= 0 && add < 0)
region_abort(resv_map, from, to, regions_needed);
- if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
+ if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
kref_put(&resv_map->refs, resv_map_release);
+ set_vma_resv_map(vma, NULL);
+ }
return false;
}
_
Patches currently in -mm which might be from riel(a)surriel.com are
hugetlbfs-clear-resv_map-pointer-if-mmap-fails.patch
hugetlbfs-extend-hugetlb_vma_lock-to-private-vmas.patch
hugetlbfs-close-race-between-madv_dontneed-and-page-fault.patch
The patch titled
Subject: hugetlbfs: extend hugetlb_vma_lock to private VMAs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlbfs-extend-hugetlb_vma_lock-to-private-vmas.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: hugetlbfs: extend hugetlb_vma_lock to private VMAs
Date: Thu, 5 Oct 2023 23:59:07 -0400
Extend the locking scheme used to protect shared hugetlb mappings from
truncate vs page fault races, in order to protect private hugetlb mappings
(with resv_map) against MADV_DONTNEED.
Add a read-write semaphore to the resv_map data structure, and use that
from the hugetlb_vma_(un)lock_* functions, in preparation for closing the
race between MADV_DONTNEED and page faults.
Link: https://lkml.kernel.org/r/20231006040020.3677377-3-riel@surriel.com
Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 6 +++++
mm/hugetlb.c | 41 ++++++++++++++++++++++++++++++++++----
2 files changed, 43 insertions(+), 4 deletions(-)
--- a/include/linux/hugetlb.h~hugetlbfs-extend-hugetlb_vma_lock-to-private-vmas
+++ a/include/linux/hugetlb.h
@@ -60,6 +60,7 @@ struct resv_map {
long adds_in_progress;
struct list_head region_cache;
long region_cache_count;
+ struct rw_semaphore rw_sema;
#ifdef CONFIG_CGROUP_HUGETLB
/*
* On private mappings, the counter to uncharge reservations is stored
@@ -1233,6 +1234,11 @@ static inline bool __vma_shareable_lock(
return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
}
+static inline bool __vma_private_lock(struct vm_area_struct *vma)
+{
+ return (!(vma->vm_flags & VM_MAYSHARE)) && vma->vm_private_data;
+}
+
/*
* Safe version of huge_pte_offset() to check the locks. See comments
* above huge_pte_offset().
--- a/mm/hugetlb.c~hugetlbfs-extend-hugetlb_vma_lock-to-private-vmas
+++ a/mm/hugetlb.c
@@ -97,6 +97,7 @@ static void hugetlb_vma_lock_alloc(struc
static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
+static struct resv_map *vma_resv_map(struct vm_area_struct *vma);
static inline bool subpool_is_free(struct hugepage_subpool *spool)
{
@@ -267,6 +268,10 @@ void hugetlb_vma_lock_read(struct vm_are
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
down_read(&vma_lock->rw_sema);
+ } else if (__vma_private_lock(vma)) {
+ struct resv_map *resv_map = vma_resv_map(vma);
+
+ down_read(&resv_map->rw_sema);
}
}
@@ -276,6 +281,10 @@ void hugetlb_vma_unlock_read(struct vm_a
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
up_read(&vma_lock->rw_sema);
+ } else if (__vma_private_lock(vma)) {
+ struct resv_map *resv_map = vma_resv_map(vma);
+
+ up_read(&resv_map->rw_sema);
}
}
@@ -285,6 +294,10 @@ void hugetlb_vma_lock_write(struct vm_ar
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
down_write(&vma_lock->rw_sema);
+ } else if (__vma_private_lock(vma)) {
+ struct resv_map *resv_map = vma_resv_map(vma);
+
+ down_write(&resv_map->rw_sema);
}
}
@@ -294,17 +307,27 @@ void hugetlb_vma_unlock_write(struct vm_
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
up_write(&vma_lock->rw_sema);
+ } else if (__vma_private_lock(vma)) {
+ struct resv_map *resv_map = vma_resv_map(vma);
+
+ up_write(&resv_map->rw_sema);
}
}
int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
{
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
- if (!__vma_shareable_lock(vma))
- return 1;
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
- return down_write_trylock(&vma_lock->rw_sema);
+ return down_write_trylock(&vma_lock->rw_sema);
+ } else if (__vma_private_lock(vma)) {
+ struct resv_map *resv_map = vma_resv_map(vma);
+
+ return down_write_trylock(&resv_map->rw_sema);
+ }
+
+ return 1;
}
void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
@@ -313,6 +336,10 @@ void hugetlb_vma_assert_locked(struct vm
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
lockdep_assert_held(&vma_lock->rw_sema);
+ } else if (__vma_private_lock(vma)) {
+ struct resv_map *resv_map = vma_resv_map(vma);
+
+ lockdep_assert_held(&resv_map->rw_sema);
}
}
@@ -345,6 +372,11 @@ static void __hugetlb_vma_unlock_write_f
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
__hugetlb_vma_unlock_write_put(vma_lock);
+ } else if (__vma_private_lock(vma)) {
+ struct resv_map *resv_map = vma_resv_map(vma);
+
+ /* no free for anon vmas, but still need to unlock */
+ up_write(&resv_map->rw_sema);
}
}
@@ -1068,6 +1100,7 @@ struct resv_map *resv_map_alloc(void)
kref_init(&resv_map->refs);
spin_lock_init(&resv_map->lock);
INIT_LIST_HEAD(&resv_map->regions);
+ init_rwsem(&resv_map->rw_sema);
resv_map->adds_in_progress = 0;
/*
_
Patches currently in -mm which might be from riel(a)surriel.com are
hugetlbfs-clear-resv_map-pointer-if-mmap-fails.patch
hugetlbfs-extend-hugetlb_vma_lock-to-private-vmas.patch
hugetlbfs-close-race-between-madv_dontneed-and-page-fault.patch
The patch titled
Subject: mm: zswap: fix pool refcount bug around shrink_worker()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-zswap-fix-pool-refcount-bug-around-shrink_worker.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: zswap: fix pool refcount bug around shrink_worker()
Date: Fri, 6 Oct 2023 12:00:24 -0400
When a zswap store fails due to the limit, it acquires a pool reference
and queues the shrinker. When the shrinker runs, it drops the reference.
However, there can be multiple store attempts before the shrinker wakes up
and runs once. This results in reference leaks and eventual saturation
warnings for the pool refcount.
Fix this by dropping the reference again when the shrinker is already
queued. This ensures one reference per shrinker run.
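The invariant behind the fix is that queue_work() returns false when the work
item is already pending, so the reference taken for this store attempt is
surplus and can be dropped on the spot. Purely as an illustration of that
pattern (shrink_pool_async() is a made-up helper name, not zswap code):

/* Illustration only; shrink_pool_async() is hypothetical, not part of mm/zswap.c. */
static void shrink_pool_async(struct zswap_pool *pool)
{
	if (!pool)
		return;
	/* the caller already holds a reference, e.g. from zswap_pool_last_get() */
	if (!queue_work(shrink_wq, &pool->shrink_work))
		zswap_pool_put(pool);	/* work already pending: drop this attempt's reference */
}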
Link: https://lkml.kernel.org/r/20231006160024.170748-1-hannes@cmpxchg.org
Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Chris Mason <clm(a)fb.com>
Cc: Vitaly Wool <vitaly.wool(a)konsulko.com>
Cc: Domenico Cerasuolo <cerasuolodomenico(a)gmail.com>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/zswap.c~mm-zswap-fix-pool-refcount-bug-around-shrink_worker
+++ a/mm/zswap.c
@@ -1383,8 +1383,8 @@ reject:
shrink:
pool = zswap_pool_last_get();
- if (pool)
- queue_work(shrink_wq, &pool->shrink_work);
+ if (pool && !queue_work(shrink_wq, &pool->shrink_work))
+ zswap_pool_put(pool);
goto reject;
}
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-zswap-fix-pool-refcount-bug-around-shrink_worker.patch
The function ext4_init_acl() calls posix_acl_create() which is
responsible for applying the umask. But without
CONFIG_EXT4_FS_POSIX_ACL, ext4_init_acl() is an empty inline function,
and nobody applies the umask.
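For reference, when the parent directory has no default ACL it is
posix_acl_create() itself that strips the umask bits from the new inode's
mode. Roughly (a paraphrase of fs/posix_acl.c; details vary between kernel
versions):

/* Rough paraphrase of the relevant part of posix_acl_create(); not the exact source. */
int posix_acl_create(struct inode *dir, umode_t *mode,
		     struct posix_acl **default_acl, struct posix_acl **acl)
{
	struct posix_acl *p = get_acl(dir, ACL_TYPE_DEFAULT);

	*default_acl = NULL;
	*acl = NULL;

	if (!p || p == ERR_PTR(-EOPNOTSUPP)) {
		*mode &= ~current_umask();	/* the umask is applied here */
		return 0;
	}

	/*
	 * Otherwise the directory's default ACL, not the umask, determines
	 * the new inode's mode (the rest of the function is omitted here).
	 */
	return 0;
}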
This fixes a bug which causes the umask to be ignored with O_TMPFILE
on ext4:
https://github.com/MusicPlayerDaemon/MPD/issues/558
https://bugs.gentoo.org/show_bug.cgi?id=686142#c3
https://bugzilla.kernel.org/show_bug.cgi?id=203625
Reviewed-by: J. Bruce Fields <bfields(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
---
fs/ext4/acl.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/ext4/acl.h b/fs/ext4/acl.h
index 0c5a79c3b5d4..ef4c19e5f570 100644
--- a/fs/ext4/acl.h
+++ b/fs/ext4/acl.h
@@ -68,6 +68,11 @@ extern int ext4_init_acl(handle_t *, struct inode *, struct inode *);
static inline int
ext4_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
{
+ /* usually, the umask is applied by posix_acl_create(), but if
+ ext4 ACL support is disabled at compile time, we need to do
+ it here, because posix_acl_create() will never be called */
+ inode->i_mode &= ~current_umask();
+
return 0;
}
#endif /* CONFIG_EXT4_FS_POSIX_ACL */
--
2.39.2
We got a WARNING in ext4_add_complete_io:
==================================================================
WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 #85
RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
[...]
Call Trace:
<TASK>
ext4_end_bio+0xa8/0x240 [ext4]
bio_endio+0x195/0x310
blk_update_request+0x184/0x770
scsi_end_request+0x2f/0x240
scsi_io_completion+0x75/0x450
scsi_finish_command+0xef/0x160
scsi_complete+0xa3/0x180
blk_complete_reqs+0x60/0x80
blk_done_softirq+0x25/0x40
__do_softirq+0x119/0x4c8
run_ksoftirqd+0x42/0x70
smpboot_thread_fn+0x136/0x3c0
kthread+0x140/0x1a0
ret_from_fork+0x2c/0x50
==================================================================
The above issue may happen as follows:
cpu1                                    cpu2
----------------------------|----------------------------
mount -o dioread_lock
ext4_writepages
 ext4_do_writepages
  *if (ext4_should_dioread_nolock(inode))*
  // rsv_blocks is not assigned here
                                        mount -o remount,dioread_nolock
  ext4_journal_start_with_reserve
   __ext4_journal_start
    __ext4_journal_start_sb
     jbd2__journal_start
      *if (rsv_blocks)*
      // h_rsv_handle is not initialized here
  mpage_map_and_submit_extent
   mpage_map_one_extent
    dioread_nolock = ext4_should_dioread_nolock(inode)
    if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
     mpd->io_submit.io_end->handle = handle->h_rsv_handle
     ext4_set_io_unwritten_flag
      io_end->flag |= EXT4_IO_END_UNWRITTEN
     // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag
                                        scsi_finish_command
                                         scsi_io_completion
                                          scsi_io_completion_action
                                           scsi_end_request
                                            blk_update_request
                                             req_bio_endio
                                              bio_endio
                                               bio->bi_end_io > ext4_end_bio
                                                ext4_put_io_end_defer
                                                 ext4_add_complete_io
                                                 // trigger WARN_ON(!io_end->handle && sbi->s_journal);
The immediate cause of this problem is that the ext4_should_dioread_nolock()
function returns inconsistent values in ext4_do_writepages() and
mpage_map_one_extent(). There are four conditions in this function that
can be changed at mount time to cause this problem (see the sketch after
the list below). These four conditions can be divided into two categories:
(1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
(2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount
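For reference, ext4_should_dioread_nolock() looks roughly like this
(paraphrased; the exact set and order of checks vary slightly between kernel
versions):

/* Rough paraphrase of ext4_should_dioread_nolock(); not the exact source. */
static inline int ext4_should_dioread_nolock(struct inode *inode)
{
	if (!test_opt(inode->i_sb, DIOREAD_NOLOCK))		/* changeable by remount */
		return 0;
	if (!S_ISREG(inode->i_mode))
		return 0;
	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))	/* changeable by ioctl */
		return 0;
	if (ext4_should_journal_data(inode))			/* changeable by ioctl */
		return 0;
	if (!test_opt(inode->i_sb, DELALLOC))			/* changeable by remount */
		return 0;
	return 1;
}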
The two in the first category have been fixed by commit c8585c6fcaf2
("ext4: fix races between changing inode journal mode and ext4_writepages")
and commit cb85f4d23f79 ("ext4: fix race between writepages and enabling
EXT4_EXTENTS_FL") respectively.
The two cases in the other category have not yet been fixed, and the above
issue is caused by this situation. Following the fix for the first
category, we grab s_writepages_rwsem while applying options during remount,
so that we do not race with writepages operations and trigger this problem.
Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable(a)vger.kernel.org
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
V1->V2:
Grab s_writepages_rwsem unconditionally during remount.
Remove patches 1,2 that are no longer needed.
V2->V3:
Also grab s_writepages_rwsem when restoring options.
V3->V4:
Rebased on top of mainline.
Reference 00d873c17e29 ("ext4: avoid deadlock in fs reclaim with
page writeback") to use s_writepages_rwsem.
fs/ext4/ext4.h | 3 ++-
fs/ext4/super.c | 14 ++++++++++++++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 6948d673bba2..97ef99c7f296 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1613,7 +1613,8 @@ struct ext4_sb_info {
/*
* Barrier between writepages ops and changing any inode's JOURNAL_DATA
- * or EXTENTS flag.
+ * or EXTENTS flag or between writepages ops and changing DELALLOC or
+ * DIOREAD_NOLOCK mount options on remount.
*/
struct percpu_rw_semaphore s_writepages_rwsem;
struct dax_device *s_daxdev;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9680fe753e59..fff42682e4e0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6389,6 +6389,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
ext4_group_t g;
int err = 0;
int enable_rw = 0;
+ int alloc_ctx;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6429,7 +6430,16 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
}
+ /*
+ * Changing the DIOREAD_NOLOCK or DELALLOC mount options may cause
+ * two calls to ext4_should_dioread_nolock() to return inconsistent
+ * values, triggering WARN_ON in ext4_add_complete_io(). we grab
+ * here s_writepages_rwsem to avoid race between writepages ops and
+ * remount.
+ */
+ alloc_ctx = ext4_writepages_down_write(sb);
ext4_apply_options(fc, sb);
+ ext4_writepages_up_write(sb, alloc_ctx);
if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^
test_opt(sb, JOURNAL_CHECKSUM)) {
@@ -6650,6 +6660,8 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if ((sb->s_flags & SB_RDONLY) && !(old_sb_flags & SB_RDONLY) &&
sb_any_quota_suspended(sb))
dquot_resume(sb, -1);
+
+ alloc_ctx = ext4_writepages_down_write(sb);
sb->s_flags = old_sb_flags;
sbi->s_mount_opt = old_opts.s_mount_opt;
sbi->s_mount_opt2 = old_opts.s_mount_opt2;
@@ -6658,6 +6670,8 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
sbi->s_commit_interval = old_opts.s_commit_interval;
sbi->s_min_batch_time = old_opts.s_min_batch_time;
sbi->s_max_batch_time = old_opts.s_max_batch_time;
+ ext4_writepages_up_write(sb, alloc_ctx);
+
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
#ifdef CONFIG_QUOTA
--
2.31.1
Direct calls to ops->connect() can overwrite the address parameter when
used in conjunction with BPF SOCK_ADDR hooks. Recent changes to
kernel_connect() ensure that callers are insulated from such side
effects. This patch wraps the direct call to ops->connect() with
kernel_connect() to prevent unexpected changes to the address passed to
ceph_tcp_connect().
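For context, the hardened kernel_connect() copies the caller's address into a
local buffer before invoking ops->connect(), so a BPF SOCK_ADDR hook can only
rewrite the copy. Roughly (a paraphrase of net/socket.c after the recent
hardening, not an exact quote):

/* Rough paraphrase of the hardened kernel_connect() in net/socket.c. */
int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen,
		   int flags)
{
	struct sockaddr_storage address;

	memcpy(&address, addr, addrlen);	/* BPF hooks may rewrite only this copy */

	return sock->ops->connect(sock, (struct sockaddr *)&address,
				  addrlen, flags);
}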
This change was originally part of a larger patch targeting the net tree
addressing all instances of unprotected calls to ops->connect()
throughout the kernel, but this change was split up into several patches
targeting various trees.
Link: https://lore.kernel.org/netdev/20230821100007.559638-1-jrife@google.com/
Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.cam…
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jordan Rife <jrife(a)google.com>
---
net/ceph/messenger.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 10a41cd9c5235..3c8b78d9c4d1c 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -459,8 +459,8 @@ int ceph_tcp_connect(struct ceph_connection *con)
set_sock_callbacks(sock, con);
con_sock_state_connecting(con);
- ret = sock->ops->connect(sock, (struct sockaddr *)&ss, sizeof(ss),
- O_NONBLOCK);
+ ret = kernel_connect(sock, (struct sockaddr *)&ss, sizeof(ss),
+ O_NONBLOCK);
if (ret == -EINPROGRESS) {
dout("connect %s EINPROGRESS sk_state = %u\n",
ceph_pr_addr(&con->peer_addr),
--
2.42.0.582.g8ccd20d70d-goog
This is a note to let you know that I've just added the patch titled
iio: pressure: ms5611: ms5611_prom_is_valid false negative bug
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From fd39d9668f2ce9f4b05ad55e8c8d80c098073e0b Mon Sep 17 00:00:00 2001
From: Alexander Zangerl <az(a)breathe-safe.com>
Date: Wed, 20 Sep 2023 10:01:10 +1000
Subject: iio: pressure: ms5611: ms5611_prom_is_valid false negative bug
The ms5611 driver falsely rejects lots of MS5607-02BA03-50 chips
with "PROM integrity check failed" because it doesn't accept a prom crc
value of zero as legitimate.
According to the datasheet for this chip (and the manufacturer's
application note about the PROM CRC), none of the possible values for the
CRC are excluded - but the current code in ms5611_prom_is_valid() ends with
return crc_orig != 0x0000 && crc == crc_orig
Discussed with the driver author (Tomasz Duszynski) and he indicated that
at that time (2015) he was dealing with some faulty chip samples which
returned blank data under some circumstances and/or followed example code
which indicated CRC zero being bad.
As far as I can tell, this exception should no longer be applied: we have
a few hundred custom boards here with this chip, many of whose PROMs have a
legitimate CRC value of 0 and work fine, but which the current driver code
wrongly rejects.
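For reference, the CRC-4 described in the manufacturer's application note is
computed over the eight PROM words with the stored CRC nibble zeroed out
first, and every one of the sixteen possible results, including 0, is a
legitimate value. A sketch based on my reading of that application note (not
the driver source):

/* CRC-4 over the 8-word PROM, per the MS56xx application note (sketch only). */
static unsigned char ms56xx_prom_crc4(unsigned short prom[8])
{
	unsigned short rem = 0, last = prom[7];
	int i, bit;

	prom[7] &= 0xFF00;			/* zero out the stored CRC nibble first */
	for (i = 0; i < 16; i++) {		/* feed the PROM one byte at a time */
		rem ^= (i & 1) ? (prom[i >> 1] & 0x00FF) : (prom[i >> 1] >> 8);
		for (bit = 0; bit < 8; bit++)
			rem = (rem & 0x8000) ? (rem << 1) ^ 0x3000 : rem << 1;
	}
	prom[7] = last;				/* restore the original word */
	return (rem >> 12) & 0xF;		/* 0x0 through 0xF are all valid results */
}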
Signed-off-by: Alexander Zangerl <az(a)breathe-safe.com>
Fixes: c0644160a8b5 ("iio: pressure: add support for MS5611 pressure and temperature sensor")
Link: https://lore.kernel.org/r/2535-1695168070.831792@Ze3y.dhYT.s3fx
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/pressure/ms5611_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/pressure/ms5611_core.c b/drivers/iio/pressure/ms5611_core.c
index 627497e61a63..2fc706f9d8ae 100644
--- a/drivers/iio/pressure/ms5611_core.c
+++ b/drivers/iio/pressure/ms5611_core.c
@@ -76,7 +76,7 @@ static bool ms5611_prom_is_valid(u16 *prom, size_t len)
crc = (crc >> 12) & 0x000F;
- return crc_orig != 0x0000 && crc == crc_orig;
+ return crc == crc_orig;
}
static int ms5611_read_prom(struct iio_dev *indio_dev)
--
2.42.0