The quilt patch titled
     Subject: mm/thp: check and bail out if page in deferred queue already
has been removed from the -mm tree.  Its filename was
     mm-thp-check-and-bail-out-if-page-in-deferred-queue-already.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yin Fengwei fengwei.yin@intel.com
Subject: mm/thp: check and bail out if page in deferred queue already
Date: Fri, 23 Dec 2022 21:52:07 +0800
A kernel build regression with LLVM was reported here:
https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/
with commit f35b5d7d676e ("mm: align larger anonymous mappings on THP boundaries"), and commit f35b5d7d676e was reverted.
It turned out that the regression is related to ld.lld calling madvise(MADV_DONTNEED) with a len parameter that is not PMD_SIZE aligned. trace-bpfcc captured:
531607  531732  ld.lld          do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
531607  531793  ld.lld          do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4
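To make the alignment problem concrete, the sketch below works out how the traced range splits into whole huge pages plus a partial tail. It assumes 4 KiB base pages and 2 MiB PMD-sized THPs (the x86-64 defaults, not stated in the changelog), and every ex_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages and 2 MiB PMD-sized huge pages (x86-64). */
#define EX_PAGE_SIZE 0x1000ULL
#define EX_PMD_SIZE  0x200000ULL

/* Whole huge pages covered by [start, start + len). */
uint64_t ex_full_thps(uint64_t start, uint64_t len)
{
	/* This sketch only handles a PMD-aligned start, as in the trace. */
	assert(start % EX_PMD_SIZE == 0);
	return len / EX_PMD_SIZE;
}

/* Base pages of the final, partially covered huge page (0 if none). */
uint64_t ex_tail_subpages(uint64_t start, uint64_t len)
{
	assert(start % EX_PMD_SIZE == 0);
	return (len % EX_PMD_SIZE) / EX_PAGE_SIZE;
}
```

Under these assumptions, start 0x7feca9000000 with len 0x7fb000 covers 3 whole huge pages plus 507 of the 512 base pages of a fourth, so the last huge page can only be unmapped partially.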
If the underlying physical page is a THP, madvise(MADV_DONTNEED) can significantly increase split_queue_lock contention. perf showed the following data:
    14.85%     0.00%  ld.lld           [kernel.kallsyms]           [k] entry_SYSCALL_64_after_hwframe
           11.52%
                entry_SYSCALL_64_after_hwframe
                do_syscall_64
                __x64_sys_madvise
                do_madvise.part.0
                zap_page_range
                unmap_single_vma
                unmap_page_range
                page_remove_rmap
                deferred_split_huge_page
                __lock_text_start
                native_queued_spin_lock_slowpath
If a THP can't be removed from rmap as a whole THP, the partial THP is removed from rmap by removing its sub-pages from rmap. Even if the THP head page has already been added to the deferred queue, split_queue_lock is still acquired to check whether the THP head page is in the queue. Thus, contention on split_queue_lock rises.

Before acquiring split_queue_lock, check and bail out early if the THP head page is already in the queue. The check without holding split_queue_lock could race with deferred_split_scan, but it doesn't impact correctness here.
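The check-then-lock idea can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: the list, the toy spinlock built on C11 atomic_flag, and all ex_* names are assumptions made for illustration. The unlocked read is a plain load here, so the sketch is only meant to be exercised single-threaded; a real concurrent program would need an atomic load for the fast path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the kernel's list or lock API. */
struct ex_node {
	struct ex_node *prev, *next;
};

struct ex_queue {
	atomic_flag locked;              /* toy spinlock */
	struct ex_node head;             /* circular list head */
	unsigned long len;               /* number of queued nodes */
	unsigned long lock_acquisitions; /* counted only for illustration */
};

void ex_node_init(struct ex_node *n)
{
	n->prev = n->next = n;           /* self-linked means "not queued" */
}

void ex_queue_init(struct ex_queue *q)
{
	atomic_flag_clear(&q->locked);
	ex_node_init(&q->head);
	q->len = 0;
	q->lock_acquisitions = 0;
}

bool ex_node_unqueued(const struct ex_node *n)
{
	return n->next == n;
}

void ex_defer(struct ex_queue *q, struct ex_node *n)
{
	/* Unlocked fast path: already queued, so skip the lock entirely. */
	if (!ex_node_unqueued(n))
		return;

	while (atomic_flag_test_and_set_explicit(&q->locked, memory_order_acquire))
		;                        /* spin until the lock is free */
	q->lock_acquisitions++;

	/* Re-check under the lock; the unlocked check may have been stale. */
	if (ex_node_unqueued(n)) {
		n->prev = q->head.prev;
		n->next = &q->head;
		q->head.prev->next = n;
		q->head.prev = n;
		q->len++;
		assert(!ex_node_unqueued(n));
	}

	atomic_flag_clear_explicit(&q->locked, memory_order_release);
}
```

Calling ex_defer() repeatedly on the same node takes the lock only on the first call. A stale fast-path result is harmless in this sketch because the authoritative check is repeated under the lock.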
Test result of building kernel with ld.lld:
commit 7b5a0b664ebe (parent commit of f35b5d7d676e):
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        6:07.99 real,   26367.77 user,  5063.35 sys

commit f35b5d7d676e:
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        7:22.15 real,   26235.03 user,  12504.55 sys

commit f35b5d7d676e with the fixing patch:
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        6:08.49 real,   26520.15 user,  5047.91 sys
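For readers comparing the three runs, a small sketch of the arithmetic follows. The helpers are hypothetical and only convert the "M:SS.ss" real time to seconds and form ratios from the numbers quoted above.

```c
#include <assert.h>

/* Convert a time(1) "M:SS.ss" value to seconds. */
double ex_seconds(unsigned minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Ratio of a measurement to its baseline; above 1.0 means slower. */
double ex_ratio(double measured, double baseline)
{
	assert(baseline > 0.0);
	return measured / baseline;
}
```

With the figures above, real time goes from 367.99 s to 442.15 s (roughly 1.20x) and sys time from 5063.35 s to 12504.55 s (roughly 2.47x) with commit f35b5d7d676e. With the fixing patch, both are back within about 0.5% of the parent commit.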
Link: https://lkml.kernel.org/r/20221223135207.2275317-1-fengwei.yin@intel.com
Signed-off-by: Yin Fengwei fengwei.yin@intel.com
Tested-by: Nathan Chancellor nathan@kernel.org
Acked-by: David Rientjes rientjes@google.com
Reviewed-by: "Huang, Ying" ying.huang@intel.com
Cc: Feng Tang feng.tang@intel.com
Cc: Matthew Wilcox willy@infradead.org
Cc: Rik van Riel riel@surriel.com
Cc: Xing Zhengjun zhengjun.xing@linux.intel.com
Cc: Yang Shi shy828301@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
---
 mm/huge_memory.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/mm/huge_memory.c~mm-thp-check-and-bail-out-if-page-in-deferred-queue-already
+++ a/mm/huge_memory.c
@@ -2835,6 +2835,9 @@ void deferred_split_huge_page(struct pag
 	if (PageSwapCache(page))
 		return;
 
+	if (!list_empty(page_deferred_list(page)))
+		return;
+
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (list_empty(page_deferred_list(page))) {
 		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
_
Patches currently in -mm which might be from fengwei.yin@intel.com are
On Wed, Jan 18, 2023 at 5:15 PM Andrew Morton akpm@linux-foundation.org wrote:
> A kernel build regression with LLVM was reported here:
> https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/
> with commit f35b5d7d676e ("mm: align larger anonymous mappings on THP boundaries"), and commit f35b5d7d676e was reverted.
>
> It turned out that the regression is related to ld.lld calling madvise(MADV_DONTNEED) with a len parameter that is not PMD_SIZE aligned. trace-bpfcc captured:
> 531607  531732  ld.lld          do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
> 531607  531793  ld.lld          do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4
This just reminds me that we should reinstate Rik's commit?
On Wed, 18 Jan 2023 17:31:48 -0800 Yang Shi shy828301@gmail.com wrote:
> On Wed, Jan 18, 2023 at 5:15 PM Andrew Morton akpm@linux-foundation.org wrote:
> > A kernel build regression with LLVM was reported here:
> > https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/
> > with commit f35b5d7d676e ("mm: align larger anonymous mappings on THP boundaries"), and commit f35b5d7d676e was reverted.
> >
> > It turned out that the regression is related to ld.lld calling madvise(MADV_DONTNEED) with a len parameter that is not PMD_SIZE aligned. trace-bpfcc captured:
> > 531607  531732  ld.lld          do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
> > 531607  531793  ld.lld          do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4
>
> This just reminds me that we should reinstate Rik's commit?
OK, I did that.
The changelog doesn't mention any performance testing results?
From: Rik van Riel riel@surriel.com
Subject: mm: align larger anonymous mappings on THP boundaries
Date: Tue, 9 Aug 2022 14:24:57 -0400
Align larger anonymous memory mappings on THP boundaries by going through thp_get_unmapped_area if THPs are enabled for the current process.
With this patch, larger anonymous mappings are now THP aligned. When a malloc library allocates a 2MB or larger arena, that arena can now be mapped with THPs right from the start, which can result in better TLB hit rates and execution time.
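As a rough illustration of what "THP aligned" means for a mapping address, the sketch below rounds an address up to the next 2 MiB boundary. It is a userspace illustration under the assumption of 2 MiB PMD-sized huge pages, with hypothetical ex_* names; it does not reproduce what thp_get_unmapped_area does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 MiB PMD-sized huge pages (x86-64). */
#define EX_THP_SIZE 0x200000ULL

/* Round addr up to the next multiple of align (align must be a power of two). */
uint64_t ex_align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

For example, an address of 0x7fec86a01000 would round up to 0x7fec86c00000, while an already aligned address such as 0x7feca9000000 is returned unchanged.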
Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
Signed-off-by: Rik van Riel riel@surriel.com
Reviewed-by: Yang Shi shy828301@gmail.com
Cc: Matthew Wilcox willy@infradead.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
---
--- a/mm/mmap.c~mm-align-larger-anonymous-mappings-on-thp-boundaries
+++ a/mm/mmap.c
@@ -1782,6 +1782,9 @@ get_unmapped_area(struct file *file, uns
 		 */
 		pgoff = 0;
 		get_area = shmem_get_unmapped_area;
+	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		/* Ensures that larger anonymous mappings are THP aligned. */
+		get_area = thp_get_unmapped_area;
 	}
 
 	addr = get_area(file, addr, len, pgoff, flags);
_