On 10/24/24 17:47, Lorenzo Stoakes wrote:
On Thu, Oct 24, 2024 at 05:12:29PM +0200, Vlastimil Babka wrote:
Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") a mmap() of anonymous memory without a specific address hint and of at least PMD_SIZE will be aligned to PMD so that it can benefit from a THP backing page.
However this change has been shown to regress some workloads significantly. [1] reports regressions in various spec benchmarks, with up to 600% slowdown of the cactusBSSN benchmark on some platforms. The
Ugh god.
benchmark seems to create many mappings of 4632kB, which would have merged to a large THP-backed area before commit efa7df3e3bb5 and now they are fragmented into multiple areas each aligned to a PMD boundary with gaps between them. The regression then seems to be caused mainly by the benchmark's memory access pattern suffering from TLB or cache aliasing due to the aligned boundaries of the individual areas.
Any more details on precisely why?
The experiments performed in [1] didn't seem conclusive enough for me to say that with confidence :) Generally speaking, if there are multiple addresses with the same virtual or physical offset accessed rapidly, they can alias in the TLB or processor caches due to limited associativity and cause thrashing. Aligning the mappings to the same 2MB boundary can cause such aliasing.
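Roughly what I mean, as a userspace sketch (not the benchmark itself, and assuming a kernel with efa7df3e3bb5 applied so the mappings come back 2MB aligned):

#include <stdio.h>
#include <sys/mman.h>

#define MAPPING_SIZE	(4632UL * 1024)		/* like the cactusBSSN areas */

int main(void)
{
	for (int i = 0; i < 4; i++) {
		void *p = mmap(NULL, MAPPING_SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		/*
		 * With the PMD alignment this prints 0 for every area, so
		 * the same offset into each area has identical low-order
		 * address bits and competes for the same TLB/cache sets.
		 */
		printf("mapping %d: addr %% 2MB = %#lx\n", i,
		       (unsigned long)p & ((2UL << 20) - 1));
	}
	return 0;
}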
Another known regression bisected to commit efa7df3e3bb5 is darktable [2] [3] and early testing suggests this patch fixes the regression there as well.
Good!
To fix the regression but still try to benefit from THP-friendly anonymous mapping alignment, add a condition that the size of the mapping must be a multiple of PMD size instead of at least PMD size. In case of many odd-sized mappings like the ones cactusBSSN creates, those will stop being aligned (and with gaps between them) and instead naturally merge again.
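To make the arithmetic concrete (an illustrative userspace check only, with PMD_SIZE taken to be 2MB):

#include <stdio.h>

#define PMD_SIZE		(2UL << 20)
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

int main(void)
{
	/* cactusBSSN-style size: not a multiple of 2MB, so no forced
	 * alignment anymore and neighbouring areas can merge again */
	printf("4632kB -> aligned=%d\n", IS_ALIGNED(4632UL * 1024, PMD_SIZE));
	/* an exact multiple of 2MB still gets the THP-friendly alignment */
	printf("4096kB -> aligned=%d\n", IS_ALIGNED(4096UL * 1024, PMD_SIZE));
	return 0;
}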
Seems like the original logic just padded the length by PMD size and checked for overflow, assuming that [pgoff << PAGE_SHIFT, (pgoff << PAGE_SHIFT) + len) contains at least one PMD-sized block.
Which I guess potentially results in mis-sized empty spaces that now can't be PMD-merged at the bits that 'overhang' the PMD-sized/aligned part?
Which is yeah, not great and would explain this (correct me if my understanding is wrong).
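To check my own understanding, a rough model of the old placement (paraphrased userspace arithmetic, not the actual kernel code, and assuming a simple bottom-up search):

#include <stdio.h>

#define PMD_SIZE	(2UL << 20)

/*
 * Hypothetical model of the pre-fix behaviour: pad the request so that a
 * PMD-aligned block fits, then round the start up to the next PMD boundary.
 * 'base' stands in for wherever the normal search would have started.
 */
static unsigned long thp_style_place(unsigned long base, unsigned long len)
{
	unsigned long len_pad = len + PMD_SIZE;

	if (len_pad < len)			/* overflow: fall back to unaligned */
		return base;
	return (base + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
}

int main(void)
{
	unsigned long len = 4632UL * 1024;	/* the cactusBSSN mapping size */
	unsigned long next = 0x700000000000UL;	/* made-up search start */

	for (int i = 0; i < 3; i++) {
		unsigned long addr = thp_style_place(next, len);

		/* the unaligned 'overhang' past the last full PMD leaves a
		 * gap before the next area, so they can no longer merge */
		printf("area %d: %#lx - %#lx (gap before: %lu kB)\n",
		       i, addr, addr + len, (addr - next) / 1024);
		next = addr + len;
	}
	return 0;
}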
Reported-by: Michael Matz <matz@suse.de>
Debugged-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229012 [1]
Reported-by: Matthias Bodenbinder <matthias@bodenbinder.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2]
Closes: https://lore.kernel.org/all/2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.in... [3]
Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
Cc: stable@vger.kernel.org
Cc: Rik van Riel <riel@surriel.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
 mm/mmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 9c0fb43064b5..a5297cfb1dfc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -900,7 +900,8 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 	if (get_area) {
 		addr = get_area(file, addr, len, pgoff, flags);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
+		   && IS_ALIGNED(len, PMD_SIZE)) {
So doing this feels right but...
Hm this seems like it belongs in __thp_get_unmapped_area() which does a bunch of checks up front, returning 0 if they fail, which then results in it performing the normal get unmapped area logic.
That also has a bunch of (offset) alignment checks as well as overflow checks, so it would seem the natural place to also check length?
Petr suggested the same, but changing __thp_get_unmapped_area() affects FS THPs and the proposed check seemed wrong to me:
https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@suse.cz/
While it could be fixed, I'm still not sure if we want to restrict FS THPs the same as anonymous THPs. AFAIU even small mappings of a range from a file should be aligned properly to make it possible for a large range from the same file (that includes the smaller range) mapped elsewhere to be THP backed? I mean we can investigate it further, but for the regression fix to be backported to stable kernels it seemed safer to address only the case that was changed by commit efa7df3e3bb5 specifically, i.e. anonymous mappings.
 		/* Ensures that larger anonymous mappings are THP aligned. */
 		addr = thp_get_unmapped_area_vmflags(file, addr, len, pgoff, flags,
 						     vm_flags);
--
2.47.0