On 4 Mar 2025, at 6:49, Hugh Dickins wrote:
On Wed, 26 Feb 2025, Zi Yan wrote:
This is a preparation patch; neither of the added functions is used yet.
The added __split_unmapped_folio() can split a folio whose mapping has been removed in two ways: 1) a uniform split (the existing way), and 2) a buddy allocator like split.
The added __split_folio_to_order() can split a folio into any lower order. For a uniform split, __split_unmapped_folio() calls it once to split the given folio directly to the new order. For a buddy allocator like split, __split_unmapped_folio() calls it (folio_order - new_order) times, each time splitting the folio that contains the given page to one lower order.
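Roughly, the intended control flow is the following. This is only an illustrative sketch, not the patch code: locking, refcounts and error handling are omitted, the helper name split_sketch() is made up, and the __split_folio_to_order(folio, new_order) signature is assumed for brevity.

	/* Sketch only: shows uniform split vs. buddy allocator like split. */
	static void split_sketch(struct folio *folio, struct page *page,
				 int old_order, int new_order, bool uniform_split)
	{
		int order;

		if (uniform_split) {
			/* uniform split: one call, straight to new_order */
			__split_folio_to_order(folio, new_order);
			return;
		}

		/*
		 * buddy allocator like split: (old_order - new_order) passes,
		 * each one splitting the folio that still contains @page to
		 * one lower order, then continuing with that folio.
		 */
		for (order = old_order; order > new_order; order--) {
			__split_folio_to_order(folio, order - 1);
			folio = page_folio(page);
		}
	}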
Signed-off-by: Zi Yan <ziy@nvidia.com>
Sorry, I'm tired and don't really want to be writing this yet, but the migrate "hotfix" has tipped my hand, and I need to get this out to you before more days pass.
I'd been unable to complete even a single iteration of my "kernel builds on huge tmpfs while swapping to SSD" testing during this 6.14-rc mm.git cycle (6.14-rc itself is fine) - until the last week, when some important fixes came in, so I'm no longer getting I/O errors from ext4-on-loop0-on-huge-tmpfs, nor "Huh VM_FAULT_OOM leaked" warnings: good.
But I still can't get beyond a few iterations, a few minutes: there's some corruption of user data, which usually manifests as a kernel build failing because fixdep couldn't find some truncated-on-the-left pathname.
While it definitely bisected to your folio_split() series, it's quite possible that you're merely exposing an existing bug to wider use.
I've spent the last few days trying to track this down, but still have not succeeded: I'm still getting much the same corruption. But I have been folding in various fixes as I found them, even though they have not solved the main problem at all. I'll return to trying to debug the corruption "tomorrow".
I think (might be wrong, I'm in a rush) my mods are all to this "add two new (not yet used) functions for folio_split()" patch: please merge them in if you agree.
- From source inspection, it looks like a folio_set_order() call was missed.
Actually no: folio_set_order(folio, new_order) is already called multiple times in the for loop above, so the call is duplicated rather than missing.
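To illustrate what I mean, the loop has roughly this shape. This is only a simplified sketch, not the patch code, and the variable names old_order and target_order are just for illustration:

	/* Sketch only: the order is set on every pass of the split loop. */
	for (order = old_order; order > target_order; order--) {
		int new_order = order - 1;

		/* ... carve the tail pages off the folio ... */
		folio_set_order(folio, new_order);
		/* ... fix up flags, refcounts, etc. ... */
	}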
- Why is swapcache only checked when folio_test_anon()? I can see that you've just copied that over from the old __split_huge_page(), but it seems wrong to me both here and there - I guess it's a relic from before shmem could swap out a huge page.
Yes, it is a relic, but it is still correct until I change another relic in __folio_split() (split_huge_page_to_list_to_order() in mainline): the "if (!mapping) { ret = -EBUSY; goto out; }" check, which excludes the shmem-in-swap-cache case. I will probably leave it as is in my next folio_split() version, to avoid adding more potential bugs, and come back to it later in a separate patch.
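For reference, the existing mainline check I mean looks roughly like this. This is a simplified view for discussion, not the new code; the point is that a shmem folio which has already been moved to the swap cache has folio->mapping == NULL, so it is rejected here and never reaches the non-anon path where a swapcache check would otherwise be needed:

	/* Simplified view of the mainline check being discussed. */
	if (!folio_test_anon(folio)) {
		mapping = folio->mapping;

		/* Truncated, or a shmem folio already in the swap cache */
		if (!mapping) {
			ret = -EBUSY;
			goto out;
		}
	}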
Best Regards,
Yan, Zi