On 14 Feb 2024, at 5:55, David Hildenbrand wrote:
On 14.02.24 11:50, Ryan Roberts wrote:
On 13/02/2024 22:31, Zi Yan wrote:
On 13 Feb 2024, at 17:21, David Hildenbrand wrote:
On 13.02.24 22:55, Zi Yan wrote:
From: Zi Yan ziy@nvidia.com
Hi all,
File folios support any order and multi-size THP is upstream[1], so both file and anonymous folios can have an order greater than 0. Currently, split_huge_page() only splits a huge page into order-0 pages, but splitting to orders higher than 0 would make better use of large folios. In addition, Large Block Size support in XFS would benefit from it[2]. This patchset adds support for splitting a large folio into folios of any lower order and uses it during file folio truncate operations.
For Patch 6, Hugh did not like my approach to minimize the number of folios for truncate[3]. I would like to get more feedback, especially from FS people, on it to decide whether to keep it or not.
I'm curious, would it make sense to exclude the "more" controversial parts (i.e., patch #6) for now, and focus on the XFS use case only?
Sure. Patch 6 was there to make use of split_huge_page_to_list_to_order(). Now that we have the multi-size THP and XFS use cases, it can be dropped.
What are your plans for how to determine when to split THP and to what order? I don't see anything in this series that would split anon THP to non-zero order?
We have talked about using hints from user space in the past (e.g. mremap, munmap, madvise, etc). But Chrome has a use case where it temporarily mprotects a single (4K) page as part of garbage collection (IIRC). If you eagerly split on that hint, you will have lost the benefits of the large folio when it later mprotects back to the original setting.
Not only that, splitting will make some of these operations more expensive, possibly with no actual benefit.
I guess David will suggest this would be a good use case for the khugepaged-lite mechanism we have been talking about. I dunno - it seems wasteful to split then collapse again.
I agree. mprotect() and even madvise(), ... might not be good candidates for splitting. mremap() likely is, if the folio is mapped exclusively. MADV_DONTNEED/munmap()/mlock() might be good candidates (again, if mapped exclusively). This will need a lot of thought I'm afraid (as you say, deferred splitting is another example).
My initial use was for splitting 1GB THP to 2MB THP, but 1GB THP is not upstream yet. So for now, this might only be used by XFS. For anonymous large folios, we will use this when we find a justified use case. One case I can think of: when a PMD-mapped THP happens to be split and the resulting order can be a HW/SW favored order, like 64KB or 32KB (to be able to use contig PTE), we split to that order; otherwise, we still split to order-0.
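To make the order arithmetic above concrete, here is a rough userspace sketch, not kernel code and not part of this series. It assumes 4KB base pages, so 32KB is order 3, 64KB is order 4, 2MB is order 9 and 1GB is order 18. All sketch_* names and the "favored orders" list are hypothetical, and the selection rule (take the largest favored order below the current one, else fall back to order-0) is only one possible reading of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4KB base pages. Other page sizes change every order below. */
#define SKETCH_PAGE_SHIFT 12

/* Bytes covered by a folio of the given order. */
static inline unsigned long sketch_folio_bytes(unsigned int order)
{
	return 1UL << (SKETCH_PAGE_SHIFT + order);
}

/* Number of folios produced when one old_order folio is split to new_order. */
static inline unsigned long sketch_nr_after_split(unsigned int old_order,
						  unsigned int new_order)
{
	assert(new_order <= old_order);
	return 1UL << (old_order - new_order);
}

/*
 * Hypothetical policy: pick the largest favored order that is strictly
 * smaller than the folio's current order. If none qualifies, return 0,
 * i.e. split to order-0 as split_huge_page() does today.
 */
static unsigned int sketch_pick_split_order(unsigned int cur_order,
					    const unsigned int *favored,
					    size_t nr_favored)
{
	unsigned int best = 0;

	for (size_t i = 0; i < nr_favored; i++) {
		if (favored[i] < cur_order && favored[i] > best)
			best = favored[i];
	}
	return best;
}
```

With favored orders {4, 3}, a PMD-sized (order-9) folio would be split to order 4, giving 32 folios of 64KB each, while an order-3 folio has no smaller favored order and would go to order-0. In the kernel, the chosen order would then be the new_order argument of split_huge_page_to_list_to_order().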
-- Best Regards, Yan, Zi