On Mon, 3 Apr 2023, Zi Yan wrote:
From: Zi Yan ziy@nvidia.com
To split a THP to any lower order pages, we need to reform THPs on subpages at given order and add page refcount based on the new page order. Also we need to reinitialize page_deferred_list after removing the page from the split_queue, otherwise a subsequent split will see list corruption when checking the page_deferred_list again.
It has many uses, like minimizing the number of pages after truncating a huge pagecache page. For anonymous THPs, we can only split them to order-0 like before until we add support for any size anonymous THPs.
Signed-off-by: Zi Yan ziy@nvidia.com
...
@@ -2754,14 +2798,18 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) if (folio_test_swapbacked(folio)) { __lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
} else {
} else if (!new_order) {
/*
* Decrease THP stats only if split to normal
* pages
}*/ __lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr); filemap_nr_thps_dec(mapping); }
This part is wrong. The problem I've had is /proc/sys/vm/stat_refresh warning of negative nr_shmem_hugepages (which then gets shown as 0 in vmstat or meminfo, even though there actually are shmem hugepages).
At first I thought that the fix needed (which I'm running with) is:
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2797,17 +2797,16 @@ int split_huge_page_to_list_to_order(str int nr = folio_nr_pages(folio);
xas_split(&xas, folio, folio_order(folio)); - if (folio_test_swapbacked(folio)) { - __lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, - -nr); - } else if (!new_order) { - /* - * Decrease THP stats only if split to normal - * pages - */ - __lruvec_stat_mod_folio(folio, NR_FILE_THPS, - -nr); - filemap_nr_thps_dec(mapping); + if (folio_test_pmd_mappable(folio) && + new_order < HPAGE_PMD_ORDER) { + if (folio_test_swapbacked(folio)) { + __lruvec_stat_mod_folio(folio, + NR_SHMEM_THPS, -nr); + } else { + __lruvec_stat_mod_folio(folio, + NR_FILE_THPS, -nr); + filemap_nr_thps_dec(mapping); + } } }
because elsewhere the maintenance of NR_SHMEM_THPS or NR_FILE_THPS is rightly careful to be dependent on folio_test_pmd_mappable() (and, so far as I know, we shall not be seeing folios of order higher than HPAGE_PMD_ORDER yet in mm/huge_memory.c - those would need more thought).
But it may be more complicated than that, given that patch 7/7 appears (I haven't tried) to allow splitting to other orders on a file opened for reading - that might be a bug.
The complication here is that we now have four kinds of large folio in mm/huge_memory.c, and the rules are a bit different for each.
Anonymous THPs: okay, I think I've seen you exclude those with -EINVAL at a higher level (and they wouldn't be getting into this "if (mapping) {" block anyway).
Shmem (swapbacked) THPs: we are only allocating shmem in 0-order or HPAGE_PMD_ORDER at present. I can imagine that in a few months or a year-or-so's time, we shall want to follow Matthew's folio readahead, and generalize to other orders in shmem; but right now I'd really prefer not to have truncation or debugfs introducing the surprise of other orders there. Maybe there's little that needs to be fixed, only the THP_SWPOUT and THP_SWPOUT_FALLBACK statistics have come to mind so far (would need to be limited to folio_test_pmd_mappable()); though I've no idea how well intermediate orders will work with or against THP swapout.
CONFIG_READ_ONLY_THP_FOR_FS=y file THPs: those need special care, and their filemap_nr_thps_dec(mapping) above may not be good enough. So long as it's working as intended, it does exclude the possibility of truncation splitting here; but if you allow splitting via debugfs to reach them, then the accounting needs to be changed - for them, any order higher than 0 has to be counted in nr_thps - so splitting one HPAGE_PMD_ORDER THP into multiple large folios will need to add to that count, not decrement it. Otherwise, a filesystem unprepared for large folios or compound pages is in danger of meeting them by surprise. Better just disable that possibility, along with shmem.
mapping_large_folio_support() file THPs: this category is the one you're really trying to address with this series, they can already come in various orders, and it's fair for truncation to make a different choice of orders - but is what it's doing worth doing? I'll say more on 6/7.
Hugh