Commit c010d47f107f ("mm: thp: split huge page to any lower order pages") introduced an early check on the folio's order via mapping->flags before proceeding with the split work.
This check introduced a bug: for shmem folios in the swap cache, the mapping pointer can be NULL. Accessing mapping->flags in this state leads directly to a NULL pointer dereference.
This commit fixes the issue by moving the check for mapping != NULL before any attempt to access mapping->flags.
This fix necessarily changes the return value from -EBUSY to -EINVAL when mapping is NULL. After reviewing current callers, they do not differentiate between these two error codes, making this change safe.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Wei Yang richard.weiyang@gmail.com
Cc: Zi Yan ziy@nvidia.com
Cc: stable@vger.kernel.org
---
This patch is based on current mm-new, latest commit:
056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
Backport note:
The current code evolved from the original commit through the following four changes. We should make the proper adjustments for each of them when backporting.
commit c010d47f107f609b9f4d6a103b6dfc53889049e9 Author: Zi Yan ziy@nvidia.com Date: Mon Feb 26 15:55:33 2024 -0500
mm: thp: split huge page to any lower order pages
commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a Author: Ran Xiaokai ran.xiaokai@zte.com.cn Date: Fri Jun 7 17:40:48 2024 +0800
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
commit 9b2f764933eb5e3ac9ebba26e3341529219c4401 Author: Zi Yan ziy@nvidia.com Date: Wed Jan 22 11:19:27 2025 -0500
mm/huge_memory: allow split shmem large folio to any lower order
commit 58729c04cf1092b87aeef0bf0998c9e2e4771133 Author: Zi Yan ziy@nvidia.com Date: Fri Mar 7 12:39:57 2025 -0500
mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
---
 mm/huge_memory.c | 68 +++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 33 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7c69572b6c3f..8701c3eef05f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return false;
-	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
-		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-		    !mapping_large_folio_support(folio->mapping)) {
-			/*
-			 * We can always split a folio down to a single page
-			 * (new_order == 0) uniformly.
-			 *
-			 * For any other scenario
-			 *     a) uniform split targeting a large folio
-			 *        (new_order > 0)
-			 *     b) any non-uniform split
-			 * we must confirm that the file system supports large
-			 * folios.
-			 *
-			 * Note that we might still have THPs in such
-			 * mappings, which is created from khugepaged when
-			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
-			 * case, the mapping does not actually support large
-			 * folios properly.
-			 */
-			VM_WARN_ONCE(warns,
-				"Cannot split file folio to non-0 order");
+	} else {
+		const struct address_space *mapping = folio->mapping;
+
+		/* Truncated ? */
+		/*
+		 * TODO: add support for large shmem folio in swap cache.
+		 * When shmem is in swap cache, mapping is NULL and
+		 * folio_test_swapcache() is true.
+		 */
+		if (!mapping)
 			return false;
+
+		if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
+			if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+			    !mapping_large_folio_support(folio->mapping)) {
+				/*
+				 * We can always split a folio down to a
+				 * single page (new_order == 0) uniformly.
+				 *
+				 * For any other scenario
+				 *     a) uniform split targeting a large folio
+				 *        (new_order > 0)
+				 *     b) any non-uniform split
+				 * we must confirm that the file system
+				 * supports large folios.
+				 *
+				 * Note that we might still have THPs in such
+				 * mappings, which is created from khugepaged
+				 * when CONFIG_READ_ONLY_THP_FOR_FS is
+				 * enabled. But in that case, the mapping does
+				 * not actually support large folios properly.
+				 */
+				VM_WARN_ONCE(warns,
+					"Cannot split file folio to non-0 order");
+				return false;
+			}
 		}
 	}
 
@@ -3965,17 +3978,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		mapping = folio->mapping;
 
-		/* Truncated ? */
-		/*
-		 * TODO: add support for large shmem folio in swap cache.
-		 * When shmem is in swap cache, mapping is NULL and
-		 * folio_test_swapcache() is true.
-		 */
-		if (!mapping) {
-			ret = -EBUSY;
-			goto out;
-		}
-
 		min_order = mapping_min_folio_order(folio->mapping);
 		if (new_order < min_order) {
 			ret = -EINVAL;
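For readers not deep in this code, here is a simplified, self-contained sketch of the dereference chain the commit message describes. The struct layouts and helper bodies are stand-ins, not the kernel's real definitions; the point is only that the large-folio-support check reads mapping->flags, so a NULL folio->mapping (a shmem folio in the swap cache) crashes before the NULL check is moved up:

#include <stdbool.h>

/* Simplified stand-ins -- not the real kernel definitions. */
struct address_space {
	unsigned long flags;
};

struct folio {
	struct address_space *mapping;	/* NULL for a shmem folio in the swap cache */
};

/* Stand-in for mapping_large_folio_support(), which also reads mapping->flags. */
static bool mapping_large_folio_support_sketch(const struct address_space *mapping)
{
	return mapping->flags & 1UL;	/* NULL pointer dereference if mapping == NULL */
}

static bool folio_split_supported_sketch(const struct folio *folio)
{
	/*
	 * Before the fix: folio->mapping was dereferenced here without
	 * first being checked for NULL.
	 */
	return mapping_large_folio_support_sketch(folio->mapping);
}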
On 18 Nov 2025, at 20:26, Wei Yang wrote:
Commit c010d47f107f ("mm: thp: split huge page to any lower order pages") introduced an early check on the folio's order via mapping->flags before proceeding with the split work.
This check introduced a bug: for shmem folios in the swap cache, the mapping pointer can be NULL. Accessing mapping->flags in this state leads directly to a NULL pointer dereference.
This commit fixes the issue by moving the check for mapping != NULL before any attempt to access mapping->flags.
This fix necessarily changes the return value from -EBUSY to -EINVAL when mapping is NULL. After reviewing current callers, they do not differentiate between these two error codes, making this change safe.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
This patch is based on current mm-new, latest commit:
056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
Backport note:
Current code evolved from original commit with following four changes. We should do proper adjustment respectively on backporting.
commit c010d47f107f609b9f4d6a103b6dfc53889049e9 Author: Zi Yan ziy@nvidia.com Date: Mon Feb 26 15:55:33 2024 -0500
mm: thp: split huge page to any lower order pages
commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a Author: Ran Xiaokai ran.xiaokai@zte.com.cn Date: Fri Jun 7 17:40:48 2024 +0800
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
This is a hot fix to commit c010d47f107f, so the backport should end at this point.
commit 9b2f764933eb5e3ac9ebba26e3341529219c4401 Author: Zi Yan ziy@nvidia.com Date: Wed Jan 22 11:19:27 2025 -0500
mm/huge_memory: allow split shmem large folio to any lower order
commit 58729c04cf1092b87aeef0bf0998c9e2e4771133 Author: Zi Yan ziy@nvidia.com Date: Fri Mar 7 12:39:57 2025 -0500
mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
mm/huge_memory.c | 68 +++++++++++++++++++++++++----------------------- 1 file changed, 35 insertions(+), 33 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7c69572b6c3f..8701c3eef05f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
- } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&!mapping_large_folio_support(folio->mapping)) {/** We can always split a folio down to a single page* (new_order == 0) uniformly.** For any other scenario* a) uniform split targeting a large folio* (new_order > 0)* b) any non-uniform split* we must confirm that the file system supports large* folios.** Note that we might still have THPs in such* mappings, which is created from khugepaged when* CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that* case, the mapping does not actually support large* folios properly.*/VM_WARN_ONCE(warns,"Cannot split file folio to non-0 order");
- } else {
		const struct address_space *mapping = folio->mapping;

		/* Truncated ? */
		/*
		 * TODO: add support for large shmem folio in swap cache.
		 * When shmem is in swap cache, mapping is NULL and
		 * folio_test_swapcache() is true.
		 */
		if (!mapping)
			return false;

		if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
			if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
			    !mapping_large_folio_support(folio->mapping)) {
folio->mapping can just be mapping here. The above involved commits would mostly need separate backport patches, so keeping folio->mapping as the original code does not make backporting easier.
/** We can always split a folio down to a* single page (new_order == 0) uniformly.** For any other scenario* a) uniform split targeting a large folio* (new_order > 0)* b) any non-uniform split* we must confirm that the file system* supports large folios.** Note that we might still have THPs in such* mappings, which is created from khugepaged* when CONFIG_READ_ONLY_THP_FOR_FS is* enabled. But in that case, the mapping does* not actually support large folios properly.*/VM_WARN_ONCE(warns,"Cannot split file folio to non-0 order");return false; } }}@@ -3965,17 +3978,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
mapping = folio->mapping;
/* Truncated ? *//** TODO: add support for large shmem folio in swap cache.* When shmem is in swap cache, mapping is NULL and* folio_test_swapcache() is true.*/if (!mapping) {ret = -EBUSY;goto out;}- min_order = mapping_min_folio_order(folio->mapping); if (new_order < min_order) { ret = -EINVAL;
-- 2.34.1
Otherwise, LGTM. Thank you for fixing the issue.
Reviewed-by: Zi Yan ziy@nvidia.com
Best Regards, Yan, Zi
On Tue, Nov 18, 2025 at 09:32:05PM -0500, Zi Yan wrote:
On 18 Nov 2025, at 20:26, Wei Yang wrote:
Commit c010d47f107f ("mm: thp: split huge page to any lower order pages") introduced an early check on the folio's order via mapping->flags before proceeding with the split work.
This check introduced a bug: for shmem folios in the swap cache, the mapping pointer can be NULL. Accessing mapping->flags in this state leads directly to a NULL pointer dereference.
This commit fixes the issue by moving the check for mapping != NULL before any attempt to access mapping->flags.
This fix necessarily changes the return value from -EBUSY to -EINVAL when mapping is NULL. After reviewing current callers, they do not differentiate between these two error codes, making this change safe.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
This patch is based on current mm-new, latest commit:
056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
Backport note:
Current code evolved from original commit with following four changes. We should do proper adjustment respectively on backporting.
commit c010d47f107f609b9f4d6a103b6dfc53889049e9 Author: Zi Yan ziy@nvidia.com Date: Mon Feb 26 15:55:33 2024 -0500
mm: thp: split huge page to any lower order pages
commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a Author: Ran Xiaokai ran.xiaokai@zte.com.cn Date: Fri Jun 7 17:40:48 2024 +0800
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios

This is a hot fix to commit c010d47f107f, so the backport should end at this point.
commit 9b2f764933eb5e3ac9ebba26e3341529219c4401 Author: Zi Yan ziy@nvidia.com Date: Wed Jan 22 11:19:27 2025 -0500
mm/huge_memory: allow split shmem large folio to any lower order
commit 58729c04cf1092b87aeef0bf0998c9e2e4771133 Author: Zi Yan ziy@nvidia.com Date: Fri Mar 7 12:39:57 2025 -0500
mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
mm/huge_memory.c | 68 +++++++++++++++++++++++++----------------------- 1 file changed, 35 insertions(+), 33 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7c69572b6c3f..8701c3eef05f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
- } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&!mapping_large_folio_support(folio->mapping)) {/** We can always split a folio down to a single page* (new_order == 0) uniformly.** For any other scenario* a) uniform split targeting a large folio* (new_order > 0)* b) any non-uniform split* we must confirm that the file system supports large* folios.** Note that we might still have THPs in such* mappings, which is created from khugepaged when* CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that* case, the mapping does not actually support large* folios properly.*/VM_WARN_ONCE(warns,"Cannot split file folio to non-0 order");
- } else {
		const struct address_space *mapping = folio->mapping;

		/* Truncated ? */
		/*
		 * TODO: add support for large shmem folio in swap cache.
		 * When shmem is in swap cache, mapping is NULL and
		 * folio_test_swapcache() is true.
		 */
		if (!mapping)
			return false;

		if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
			if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
			    !mapping_large_folio_support(folio->mapping)) {

folio->mapping can just be mapping here. The above involved commits would mostly need separate backport patches, so keeping folio->mapping as the original code does not make backporting easier.
Thanks, I think you are right. I tried to keep the footprint small for backporting, but it doesn't seem to buy much.
@Andrew
If an updated version is necessary, please let me know.
/** We can always split a folio down to a* single page (new_order == 0) uniformly.** For any other scenario* a) uniform split targeting a large folio* (new_order > 0)* b) any non-uniform split* we must confirm that the file system* supports large folios.** Note that we might still have THPs in such* mappings, which is created from khugepaged* when CONFIG_READ_ONLY_THP_FOR_FS is* enabled. But in that case, the mapping does* not actually support large folios properly.*/VM_WARN_ONCE(warns,"Cannot split file folio to non-0 order");return false; } }}@@ -3965,17 +3978,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
mapping = folio->mapping;
/* Truncated ? *//** TODO: add support for large shmem folio in swap cache.* When shmem is in swap cache, mapping is NULL and* folio_test_swapcache() is true.*/if (!mapping) {ret = -EBUSY;goto out;}- min_order = mapping_min_folio_order(folio->mapping); if (new_order < min_order) { ret = -EINVAL;
-- 2.34.1
Otherwise, LGTM. Thank you for fixing the issue.
Reviewed-by: Zi Yan ziy@nvidia.com
Best Regards, Yan, Zi
On 19.11.25 02:26, Wei Yang wrote:
Commit c010d47f107f ("mm: thp: split huge page to any lower order pages") introduced an early check on the folio's order via mapping->flags before proceeding with the split work.
This check introduced a bug: for shmem folios in the swap cache, the mapping pointer can be NULL. Accessing mapping->flags in this state leads directly to a NULL pointer dereference.
Under which circumstances would that be the case? Only for large shmem folios in the swapcache or also for truncated folios? So I'd assume this would also affect truncated folios and we should spell that out here?
This commit fixes the issue by moving the check for mapping != NULL before any attempt to access mapping->flags.
This fix necessarily changes the return value from -EBUSY to -EINVAL when mapping is NULL. After reviewing current callers, they do not differentiate between these two error codes, making this change safe.
The doc of __split_huge_page_to_list_to_order() would now be outdated and has to be updated.
Also, take a look at s390_wiggle_split_folio(): returning -EINVAL instead of -EBUSY will make a difference on concurrent truncation. -EINVAL will be propagated and make the operation fail, while -EBUSY will be translated to -EAGAIN and the caller will simply lookup the folio again and retry.
So I think we should try to keep truncation return -EBUSY. For the shmem case, I think it's ok to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
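To make that difference concrete for callers, here is a minimal sketch of the retry pattern described above; the function name and the -EAGAIN translation are illustrative stand-ins, not the actual s390 code:

#include <errno.h>

/* Illustrative only: how a page-table-walking caller reacts to the two error codes. */
static int wiggle_split_retry_sketch(int split_ret)
{
	if (split_ret == -EBUSY)
		return -EAGAIN;	/* caller looks up the folio again and retries */
	return split_ret;	/* -EINVAL is propagated and fails the whole operation */
}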
Probably worth mentioning that this was identified by code inspection?
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
Hmm, what would this patch look like when based on current upstream? We'd likely want to get that upstream asap.
This patch is based on current mm-new, latest commit:
056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
Backport note:
Current code evolved from original commit with following four changes. We should do proper adjustment respectively on backporting.
commit c010d47f107f609b9f4d6a103b6dfc53889049e9 Author: Zi Yan ziy@nvidia.com Date: Mon Feb 26 15:55:33 2024 -0500
mm: thp: split huge page to any lower order pages
commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a Author: Ran Xiaokai ran.xiaokai@zte.com.cn Date: Fri Jun 7 17:40:48 2024 +0800
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
commit 9b2f764933eb5e3ac9ebba26e3341529219c4401 Author: Zi Yan ziy@nvidia.com Date: Wed Jan 22 11:19:27 2025 -0500
mm/huge_memory: allow split shmem large folio to any lower order
commit 58729c04cf1092b87aeef0bf0998c9e2e4771133 Author: Zi Yan ziy@nvidia.com Date: Fri Mar 7 12:39:57 2025 -0500
mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
mm/huge_memory.c | 68 +++++++++++++++++++++++++----------------------- 1 file changed, 35 insertions(+), 33 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7c69572b6c3f..8701c3eef05f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
- } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&!mapping_large_folio_support(folio->mapping)) {/** We can always split a folio down to a single page* (new_order == 0) uniformly.** For any other scenario* a) uniform split targeting a large folio* (new_order > 0)* b) any non-uniform split* we must confirm that the file system supports large* folios.** Note that we might still have THPs in such* mappings, which is created from khugepaged when* CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that* case, the mapping does not actually support large* folios properly.*/VM_WARN_ONCE(warns,"Cannot split file folio to non-0 order");
- } else {
Why not simply
	} else if (!folio->mapping) {
		/*
		 * Truncated?
		 ...
		return false;
	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
		...
		const struct address_space *mapping = folio->mapping;

		/* Truncated ? */
		/*
		 * TODO: add support for large shmem folio in swap cache.
		 * When shmem is in swap cache, mapping is NULL and
		 * folio_test_swapcache() is true.
		 */
While at it, can we merge the two comments into something, like:
/*
 * If there is no mapping then the folio was truncated and we cannot
 * split.
 *
 * TODO: large shmem folios in the swap cache also don't currently have a
 * mapping but folio_test_swapcache() is true for them.
 */
On Wed, Nov 19, 2025 at 09:57:58AM +0100, David Hildenbrand (Red Hat) wrote:
On 19.11.25 02:26, Wei Yang wrote:
Commit c010d47f107f ("mm: thp: split huge page to any lower order pages") introduced an early check on the folio's order via mapping->flags before proceeding with the split work.
This check introduced a bug: for shmem folios in the swap cache, the mapping pointer can be NULL. Accessing mapping->flags in this state leads directly to a NULL pointer dereference.
Under which circumstances would that be the case? Only for large shmem folios in the swapcache or also for truncated folios? So I'd assume this would also affect truncated folios and we should spell that out here?
Sorry, I didn't mention it because I wasn't certain about it.
Will add it.
This commit fixes the issue by moving the check for mapping != NULL before any attempt to access mapping->flags.
This fix necessarily changes the return value from -EBUSY to -EINVAL when mapping is NULL. After reviewing current callers, they do not differentiate between these two error codes, making this change safe.
The doc of __split_huge_page_to_list_to_order() would now be outdated and has to be updated.
Also, take a look at s390_wiggle_split_folio(): returning -EINVAL instead of -EBUSY will make a difference on concurrent truncation. -EINVAL will be propagated and make the operation fail, while -EBUSY will be translated to -EAGAIN and the caller will simply lookup the folio again and retry.
Oops, I missed this case.
I will rescan users.
So I think we should try to keep truncation return -EBUSY. For the shmem case, I think it's ok to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
Hmm... I don't see how to do this nicely.
Looks like we can't do it in folio_split_supported().
Or should folio_split_supported() return the error code directly?
Probably worth mentioning that this was identified by code inspection?
Agree.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
Hmm, what would this patch look like when based on current upstream? We'd likely want to get that upstream asap.
This depends on whether we want it on top of [1].
Current upstream doesn't have [1], so the fix would be needed in two places there.
Andrew mentioned he prefers a fixup version in [2].
[1]: lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com [2]: lkml.kernel.org/r/20251118140658.9078de6aab719b2308996387@linux-foundation.org
This patch is based on current mm-new, latest commit:
056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
Backport note:
Current code evolved from original commit with following four changes. We should do proper adjustment respectively on backporting.
commit c010d47f107f609b9f4d6a103b6dfc53889049e9 Author: Zi Yan ziy@nvidia.com Date: Mon Feb 26 15:55:33 2024 -0500
mm: thp: split huge page to any lower order pages
commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a Author: Ran Xiaokai ran.xiaokai@zte.com.cn Date: Fri Jun 7 17:40:48 2024 +0800
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
commit 9b2f764933eb5e3ac9ebba26e3341529219c4401 Author: Zi Yan ziy@nvidia.com Date: Wed Jan 22 11:19:27 2025 -0500
mm/huge_memory: allow split shmem large folio to any lower order
commit 58729c04cf1092b87aeef0bf0998c9e2e4771133 Author: Zi Yan ziy@nvidia.com Date: Fri Mar 7 12:39:57 2025 -0500
mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
mm/huge_memory.c | 68 +++++++++++++++++++++++++----------------------- 1 file changed, 35 insertions(+), 33 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7c69572b6c3f..8701c3eef05f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
- } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&!mapping_large_folio_support(folio->mapping)) {/** We can always split a folio down to a single page* (new_order == 0) uniformly.** For any other scenario* a) uniform split targeting a large folio* (new_order > 0)* b) any non-uniform split* we must confirm that the file system supports large* folios.** Note that we might still have THPs in such* mappings, which is created from khugepaged when* CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that* case, the mapping does not actually support large* folios properly.*/VM_WARN_ONCE(warns,"Cannot split file folio to non-0 order");
- } else {
Why not simply
	} else if (!folio->mapping) {
		/*
		 * Truncated?
		 ...
		return false;
	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
		...
Ah, I just didn't spot this :-(
		const struct address_space *mapping = folio->mapping;

		/* Truncated ? */
		/*
		 * TODO: add support for large shmem folio in swap cache.
		 * When shmem is in swap cache, mapping is NULL and
		 * folio_test_swapcache() is true.
		 */

While at it, can we merge the two comments into something, like:

/*
 * If there is no mapping that the folio was truncated and we cannot
 * split.
 *
 * TODO: large shmem folio in the swap cache also don't currently have a
 * mapping but folio_test_swapcache() is true for them.
 */
Sure.
-- Cheers
David
So I think we should try to keep truncation return -EBUSY. For the shmem case, I think it's ok to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
Hmm... Don't get how to do this nicely.
Looks we can't do it in folio_split_supported().
Or change folio_split_supported() return error code directly?
On upstream, I would do something like the following (untested):
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f2a521e5d683..33fc3590867e2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return false;
+	} else if (folio_test_swapcache(folio)) {
+		/* TODO: support shmem folios that are in the swapcache. */
+		return false;
 	} else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 		   !mapping_large_folio_support(folio->mapping)) {
 		/*
@@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return false;
+	} else if (folio_test_swapcache(folio)) {
+		/* TODO: support shmem folios that are in the swapcache. */
+		return false;
 	} else if (new_order) {
 		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 		    !mapping_large_folio_support(folio->mapping)) {
@@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (folio != page_folio(split_at) || folio != page_folio(lock_at))
 		return -EINVAL;
 
+	/*
+	 * Folios that just got truncated cannot get split. Signal to the
+	 * caller that there was a race.
+	 *
+	 * TODO: support shmem folios that are in the swapcache.
+	 */
+	if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))
+		return -EBUSY;
+
 	if (new_order >= folio_order(folio))
 		return -EINVAL;
 
@@ -3659,17 +3674,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		gfp_t gfp;
 
 		mapping = folio->mapping;
-
-		/* Truncated ? */
-		/*
-		 * TODO: add support for large shmem folio in swap cache.
-		 * When shmem is in swap cache, mapping is NULL and
-		 * folio_test_swapcache() is true.
-		 */
-		if (!mapping) {
-			ret = -EBUSY;
-			goto out;
-		}
+		VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
 
 		min_order = mapping_min_folio_order(folio->mapping);
 		if (new_order < min_order) {
So rule out the truncated case earlier, leaving only the swapcache check to be handled later.
Thoughts?
Probably worth mentioning that this was identified by code inspection?
Agree.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
Hmm, what would this patch look like when based on current upstream? We'd likely want to get that upstream asap.
This depends whether we want it on top of [1].
Current upstream doesn't have it [1] and need to fix it in two places.
Andrew mention prefer a fixup version in [2].
As we will want to backport this patch, likely we want to have it apply on current master.
But Andrew can comment on what he prefers in this case of a stable fix.
On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
So I think we should try to keep truncation return -EBUSY. For the shmem case, I think it's ok to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
Hmm... Don't get how to do this nicely.
Looks we can't do it in folio_split_supported().
Or change folio_split_supported() return error code directly?
On upstream, I would do something like the following (untested):
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..33fc3590867e2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
} else if (folio_test_swapcache(folio)) {/* TODO: support shmem folios that are in the swapcache. */return false; } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) { /*@@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
	} else if (folio_test_swapcache(folio)) {
		/* TODO: support shmem folios that are in the swapcache. */
		return false;
You are splitting the truncate case into shmem one and page cache one. This is only for shmem in the swap cache and ...
} else if (new_order) { if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) {@@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order, if (folio != page_folio(split_at) || folio != page_folio(lock_at)) return -EINVAL;
	/*
	 * Folios that just got truncated cannot get split. Signal to the
	 * caller that there was a race.
	 *
	 * TODO: support shmem folios that are in the swapcache.
this is for page cache one. So this TODO is not needed.
*/if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))return -EBUSY;if (new_order >= folio_order(folio)) return -EINVAL;@@ -3659,17 +3674,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order, gfp_t gfp; mapping = folio->mapping;
/* Truncated ? *//** TODO: add support for large shmem folio in swap cache.* When shmem is in swap cache, mapping is NULL and* folio_test_swapcache() is true.*/if (!mapping) {ret = -EBUSY;goto out;}
VM_WARN_ON_ONCE_FOLIO(!mapping, folio); min_order = mapping_min_folio_order(folio->mapping); if (new_order < min_order) {So rule out the truncated case earlier, leaving only the swapcache check to be handled later.
Thoughts?
I thought the truncated case includes both page cache and shmem in the swap cache.
Otherwise, it looks good to me.
Probably worth mentioning that this was identified by code inspection?
Agree.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
Hmm, what would this patch look like when based on current upstream? We'd likely want to get that upstream asap.
This depends whether we want it on top of [1].
Current upstream doesn't have it [1] and need to fix it in two places.
Andrew mention prefer a fixup version in [2].
As we will want to backport this patch, likely we want to have it apply on current master.
Bur Andrew can comment what he prefers in this case of a stable fix.
That could mess up the mm-new tree[1], based on Andrew's recent feedback.
[1] https://lore.kernel.org/all/20251118140658.9078de6aab719b2308996387@linux-fo...
-- Best Regards, Yan, Zi
On Wed, Nov 19, 2025 at 08:08:01AM -0500, Zi Yan wrote:
On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
So I think we should try to keep truncation return -EBUSY. For the shmem case, I think it's ok to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
Hmm... Don't get how to do this nicely.
Looks we can't do it in folio_split_supported().
Or change folio_split_supported() return error code directly?
On upstream, I would do something like the following (untested):
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..33fc3590867e2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
	} else if (folio_test_swapcache(folio)) {
		/* TODO: support shmem folios that are in the swapcache. */
		return false;
Hmm... we are filtering out all swapcache instead of just shmem swapcache?
Is it possible for (folio->mapping && folio_test_swapcache(folio)) to reach here? The logic looks a little different, but maybe I missed something.
OK, my brain is out of steam. I hope I'm not making a stupid mistake.
} else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) { /*@@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
} else if (folio_test_swapcache(folio)) {/* TODO: support shmem folios that are in the swapcache. */return false;You are splitting the truncate case into shmem one and page cache one. This is only for shmem in the swap cache and ...
} else if (new_order) { if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) {@@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order, if (folio != page_folio(split_at) || folio != page_folio(lock_at)) return -EINVAL;
/** Folios that just got truncated cannot get split. Signal to the* caller that there was a race.** TODO: support shmem folios that are in the swapcache.this is for page cache one. So this TODO is not needed.
*/if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))return -EBUSY;if (new_order >= folio_order(folio)) return -EINVAL;@@ -3659,17 +3674,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order, gfp_t gfp; mapping = folio->mapping;
/* Truncated ? *//** TODO: add support for large shmem folio in swap cache.* When shmem is in swap cache, mapping is NULL and* folio_test_swapcache() is true.*/if (!mapping) {ret = -EBUSY;goto out;}
VM_WARN_ON_ONCE_FOLIO(!mapping, folio); min_order = mapping_min_folio_order(folio->mapping); if (new_order < min_order) {So rule out the truncated case earlier, leaving only the swapcache check to be handled later.
Thoughts?
I thought the truncated case includes both page cache and shmem in the swap cache.
Otherwise, it looks good to me.
Probably worth mentioning that this was identified by code inspection?
Agree.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
Hmm, what would this patch look like when based on current upstream? We'd likely want to get that upstream asap.
This depends whether we want it on top of [1].
Current upstream doesn't have it [1] and need to fix it in two places.
Andrew mention prefer a fixup version in [2].
As we will want to backport this patch, likely we want to have it apply on current master.
Bur Andrew can comment what he prefers in this case of a stable fix.
That could mess up with mm-new tree[1] based on Andrew's recent feedback.
[1] https://lore.kernel.org/all/20251118140658.9078de6aab719b2308996387@linux-fo...
-- Best Regards, Yan, Zi
On 19.11.25 14:41, Wei Yang wrote:
On Wed, Nov 19, 2025 at 08:08:01AM -0500, Zi Yan wrote:
On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
So I think we should try to keep truncation return -EBUSY. For the shmem case, I think it's ok to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
Hmm... Don't get how to do this nicely.
Looks we can't do it in folio_split_supported().
Or change folio_split_supported() return error code directly?
On upstream, I would do something like the following (untested):
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..33fc3590867e2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
	} else if (folio_test_swapcache(folio)) {
		/* TODO: support shmem folios that are in the swapcache. */
		return false;

Hmm... we are filtering out all swapcache instead of just shmem swapcache?
Is it possible for (folio->mapping && folio_test_swapcache(folio)) reach here? Looks the logic is little different, but maybe I missed something.
OK, my brain is out of state. Hope I don't make stupid mistake.
It's tricky. folio_test_swapcache() only applies to anon and shmem.
But looking at it, we have
PG_swapcache = PG_owner_priv_1,
PG_owner_priv_1 is also used for
* XEN stuff
* vmemmap_self_hosted
Which is not important for us IIRC.
But we have
/* Some filesystems */ PG_checked = PG_owner_priv_1
So maybe we could indeed have false positives here.
So I guess we cannot rely on folio_test_swapcache() ... here. What a mess.
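For reference, a compressed view of the aliasing being discussed; the enum below is abridged and only keeps the members mentioned above, with an illustrative bit number rather than the real one:

/* Abridged: several page flags share the PG_owner_priv_1 bit. */
enum pageflags_abridged {
	PG_owner_priv_1_bit = 17,		/* bit number chosen only for illustration */
	/* anon/shmem: the folio is in the swap cache */
	PG_swapcache_bit = PG_owner_priv_1_bit,
	/* some filesystems: "checked" pagecache folio */
	PG_checked_bit = PG_owner_priv_1_bit,
};

/*
 * Hence testing that bit alone (folio_test_swapcache()) can be a false
 * positive on a pagecache folio whose filesystem set PG_checked.
 */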
On 19.11.25 14:08, Zi Yan wrote:
On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
So I think we should try to keep truncation return -EBUSY. For the shmem case, I think it's ok to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
Hmm... Don't get how to do this nicely.
Looks we can't do it in folio_split_supported().
Or change folio_split_supported() return error code directly?
On upstream, I would do something like the following (untested):
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..33fc3590867e2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
} else if (folio_test_swapcache(folio)) {/* TODO: support shmem folios that are in the swapcache. */return false; } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) { /*@@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
	} else if (folio_test_swapcache(folio)) {
		/* TODO: support shmem folios that are in the swapcache. */
		return false;

You are splitting the truncate case into shmem one and page cache one. This is only for shmem in the swap cache and ...
} else if (new_order) { if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) {@@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order, if (folio != page_folio(split_at) || folio != page_folio(lock_at)) return -EINVAL;
	/*
	 * Folios that just got truncated cannot get split. Signal to the
	 * caller that there was a race.
	 *
	 * TODO: support shmem folios that are in the swapcache.

this is for page cache one. So this TODO is not needed.
I added the TODO because we'd like to detect truncation there as well I think. Hm.
*/if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))return -EBUSY;
Given folio_test_swapcache() might have false positives, I assume we'd need a
folio_test_swapbacked() && folio_test_swapcache(folio)
to detect large shmem folios in the swapcache in all cases here.
Something like the following would hopefully do:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f2a521e5d683..57aab66bedbea 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	return ret;
 }
 
+static bool folio_test_shmem_swapcache(struct folio *folio)
+{
+	VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
+	/* These folios do not have folio->mapping set. */
+	return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
+}
+
 bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
 		bool warns)
 {
@@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return false;
+	} else if (folio_test_shmem_swapcache(folio)) {
+		/* TODO: support shmem folios that are in the swapcache. */
+		return false;
 	} else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 		   !mapping_large_folio_support(folio->mapping)) {
 		/*
@@ -3556,6 +3566,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return false;
+	} else if (folio_test_shmem_swapcache(folio)) {
+		/* TODO: support shmem folios that are in the swapcache. */
+		return false;
 	} else if (new_order) {
 		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 		    !mapping_large_folio_support(folio->mapping)) {
@@ -3619,6 +3632,13 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (folio != page_folio(split_at) || folio != page_folio(lock_at))
 		return -EINVAL;
 
+	/*
+	 * Folios that just got truncated cannot get split. Signal to the
+	 * caller that there was a race.
+	 */
+	if (!is_anon && !folio->mapping && !folio_test_shmem_swapcache(folio))
+		return -EBUSY;
+
 	if (new_order >= folio_order(folio))
 		return -EINVAL;
 
@@ -3659,17 +3679,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		gfp_t gfp;
 
 		mapping = folio->mapping;
-
-		/* Truncated ? */
-		/*
-		 * TODO: add support for large shmem folio in swap cache.
-		 * When shmem is in swap cache, mapping is NULL and
-		 * folio_test_swapcache() is true.
-		 */
-		if (!mapping) {
-			ret = -EBUSY;
-			goto out;
-		}
+		VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
 
 		min_order = mapping_min_folio_order(folio->mapping);
 		if (new_order < min_order) {
Probably worth mentioning that this was identified by code inspection?
Agree.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
Hmm, what would this patch look like when based on current upstream? We'd likely want to get that upstream asap.
This depends whether we want it on top of [1].
Current upstream doesn't have it [1] and need to fix it in two places.
Andrew mention prefer a fixup version in [2].
As we will want to backport this patch, likely we want to have it apply on current master.
Bur Andrew can comment what he prefers in this case of a stable fix.
That could mess up with mm-new tree[1] based on Andrew's recent feedback.
Right, a similar fix could be had against mm-new.
On 19 Nov 2025, at 9:09, David Hildenbrand (Red Hat) wrote:
On 19.11.25 14:08, Zi Yan wrote:
On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
So I think we should try to keep truncation return -EBUSY. For the shmem case, I think it's ok to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
Hmm... Don't get how to do this nicely.
Looks we can't do it in folio_split_supported().
Or change folio_split_supported() return error code directly?
On upstream, I would do something like the following (untested):
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..33fc3590867e2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
} else if (folio_test_swapcache(folio)) {/* TODO: support shmem folios that are in the swapcache. */return false; } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) { /*@@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
} else if (folio_test_swapcache(folio)) {/* TODO: support shmem folios that are in the swapcache. */return false;You are splitting the truncate case into shmem one and page cache one. This is only for shmem in the swap cache and ...
} else if (new_order) { if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) {@@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order, if (folio != page_folio(split_at) || folio != page_folio(lock_at)) return -EINVAL;
/** Folios that just got truncated cannot get split. Signal to the* caller that there was a race.** TODO: support shmem folios that are in the swapcache.this is for page cache one. So this TODO is not needed.
I added the TODO because we'd like to detect truncation there as well I think. Hm.
OK, got it. You mean shmem in the swapcache is not checked here, and once shmem in the swapcache is supported, folio_test_swapcache() can be removed along with the TODO. Now it makes sense.
*/if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))return -EBUSY;Given folio_test_swapcache() might have false positives, I assume we'd need a
folio_test_swapbacked() && folio_test_swapcache(folio)
To detect large large shmem folios in the swapcache in all cases here.
Something like the following would hopefully do:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..57aab66bedbea 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order, return ret; } +static bool folio_test_shmem_swapcache(struct folio *folio) +{
VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);/* These folios do not have folio->mapping set. */return folio_test_swapbacked(folio) && folio_test_swapcache(folio);+}
bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, bool warns) { @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
	} else if (folio_test_shmem_swapcache(folio)) {
		/* TODO: support shmem folios that are in the swapcache. */
		return false;
With this, a truncated shmem folio returns -EINVAL instead of -EBUSY now. Can s390_wiggle_split_folio() handle such folios?
} else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) { /*@@ -3556,6 +3566,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
} else if (folio_test_shmem_swapcache(folio)) {/* TODO: support shmem folios that are in the swapcache. */return false; } else if (new_order) { if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !mapping_large_folio_support(folio->mapping)) {@@ -3619,6 +3632,13 @@ static int __folio_split(struct folio *folio, unsigned int new_order, if (folio != page_folio(split_at) || folio != page_folio(lock_at)) return -EINVAL;
/** Folios that just got truncated cannot get split. Signal to the* caller that there was a race.*/if (!is_anon && !folio->mapping && !folio_test_shmem_swapcache(folio))return -EBUSY;if (new_order >= folio_order(folio)) return -EINVAL;@@ -3659,17 +3679,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order, gfp_t gfp; mapping = folio->mapping;
/* Truncated ? *//** TODO: add support for large shmem folio in swap cache.* When shmem is in swap cache, mapping is NULL and* folio_test_swapcache() is true.*/if (!mapping) {ret = -EBUSY;goto out;}
VM_WARN_ON_ONCE_FOLIO(!mapping, folio); min_order = mapping_min_folio_order(folio->mapping); if (new_order < min_order) {
I think it works if there is no impact on s390_wiggle_split_folio(). It also clarifies two truncated cases.
For backporting, maybe just move "if (!mapping)" up for simplicity?
Probably worth mentioning that this was identified by code inspection?
Agree.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
Hmm, what would this patch look like when based on current upstream? We'd likely want to get that upstream asap.
This depends whether we want it on top of [1].
Current upstream doesn't have it [1] and need to fix it in two places.
Andrew mention prefer a fixup version in [2].
As we will want to backport this patch, likely we want to have it apply on current master.
Bur Andrew can comment what he prefers in this case of a stable fix.
That could mess up with mm-new tree[1] based on Andrew's recent feedback.
Right, a similar fix could be had against mm-new.
-- Cheers
David
-- Best Regards, Yan, Zi
Given folio_test_swapcache() might have false positives, I assume we'd need a
folio_test_swapbacked() && folio_test_swapcache(folio)
To detect large large shmem folios in the swapcache in all cases here.
Something like the following would hopefully do:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..57aab66bedbea 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order, return ret; } +static bool folio_test_shmem_swapcache(struct folio *folio) +{
VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);/* These folios do not have folio->mapping set. */return folio_test_swapbacked(folio) && folio_test_swapcache(folio);+}
- bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, bool warns) {
@@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
	} else if (folio_test_shmem_swapcache(folio)) {
		/* TODO: support shmem folios that are in the swapcache. */
		return false;

With this, a truncated shmem folio returns -EINVAL instead of -EBUSY now. Can s390_wiggle_split_folio() handle such folios?
[noting that s390_wiggle_split_folio() was just one caller where I knew the return value differs. I suspect there might be more.]
I am still not clear on that one.
s390x obtains the folio while walking the page tables. In case it gets -EBUSY it simply retries to obtain the folio from the page tables.
So assuming there was concurrent truncation and we returned -EBUSY, it would just retry walking the page tables (trigger a fault to map a folio) and retry with that one.
I would assume that the shmem folio in the swapcache could never have worked before, and that there is no way to make progress really.
In other words: do we know how we can end up with a shmem folio that is in the swapcache and does not have folio->mapping set?
Could that thing still be mapped into the page tables? (I hope not, but right now I am confused about how that could happen.)
On 19.11.25 15:37, David Hildenbrand (Red Hat) wrote:
Given folio_test_swapcache() might have false positives, I assume we'd need a
folio_test_swapbacked() && folio_test_swapcache(folio)
To detect large large shmem folios in the swapcache in all cases here.
Something like the following would hopefully do:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..57aab66bedbea 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order, return ret; } +static bool folio_test_shmem_swapcache(struct folio *folio) +{
VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);/* These folios do not have folio->mapping set. */return folio_test_swapbacked(folio) && folio_test_swapcache(folio);+}
- bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, bool warns) {
@@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
} else if (folio_test_shmem_swapcache(folio)) {/* TODO: support shmem folios that are in the swapcache. */return false;With this, truncated shmem returns -EINVALID instead of -EBUSY now. Can s390_wiggle_split_folio() such folios?
[noting that s390_wiggle_split_folio() was just one caller where I new the return value differs. I suspect there might be more.]
I am still not clear on that one.
s390x obtains the folio while walking the page tables. In case it gets -EBUSY it simply retries to obtain the folio from the page tables.
So assuming there was concurrent truncation and we returned -EBUSY, it would just retry walking the page tables (trigger a fault to map a folio) and retry with that one.
I would assume that the shmem folio in the swapcache could never have worked before, and that there is no way to make progress really.
In other words: do we know how we can end up with a shmem folio that is in the swapcache and does not have folio->mapping set?
Could that think still be mapped into the page tables? (I hope not, but right now I am confused how that can happen )
Ah, my memory comes back.
vmscan triggers shmem_writeout() after unmapping the folio and after making sure that there are no unexpected folio references.
shmem_writeout() will do the shmem_delete_from_page_cache() where we set folio->mapping = NULL.
So anything walking the page tables (like s390x) could never find it.
Such shmem folios really cannot get split right now until we either reclaimed them (-> freed) or until shmem_swapin_folio() re-obtained them from the swapcache to re-add them to the page cache through shmem_add_to_page_cache().
So maybe we can just make our life easy and just keep returning -EBUSY for this scenario for the time being?
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f2a521e5d683..5ce86882b2727 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3619,6 +3619,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (folio != page_folio(split_at) || folio != page_folio(lock_at))
 		return -EINVAL;
 
+	/*
+	 * Folios that just got truncated cannot get split. Signal to the
+	 * caller that there was a race.
+	 *
+	 * TODO: this will also currently refuse shmem folios that are in
+	 * the swapcache.
+	 */
+	if (!is_anon && !folio->mapping)
+		return -EBUSY;
+
 	if (new_order >= folio_order(folio))
 		return -EINVAL;
 
@@ -3659,17 +3669,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		gfp_t gfp;
 
 		mapping = folio->mapping;
-
-		/* Truncated ? */
-		/*
-		 * TODO: add support for large shmem folio in swap cache.
-		 * When shmem is in swap cache, mapping is NULL and
-		 * folio_test_swapcache() is true.
-		 */
-		if (!mapping) {
-			ret = -EBUSY;
-			goto out;
-		}
+		VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
 
 		min_order = mapping_min_folio_order(folio->mapping);
 		if (new_order < min_order) {
On 19 Nov 2025, at 9:46, David Hildenbrand (Red Hat) wrote:
On 19.11.25 15:37, David Hildenbrand (Red Hat) wrote:
Given folio_test_swapcache() might have false positives, I assume we'd need a
folio_test_swapbacked() && folio_test_swapcache(folio)
To detect large large shmem folios in the swapcache in all cases here.
Something like the following would hopefully do:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2f2a521e5d683..57aab66bedbea 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order, return ret; } +static bool folio_test_shmem_swapcache(struct folio *folio) +{
VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);/* These folios do not have folio->mapping set. */return folio_test_swapbacked(folio) && folio_test_swapcache(folio);+}
- bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, bool warns) {
@@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order, "Cannot split to order-1 folio"); if (new_order == 1) return false;
} else if (folio_test_shmem_swapcache(folio)) {/* TODO: support shmem folios that are in the swapcache. */return false;With this, truncated shmem returns -EINVALID instead of -EBUSY now. Can s390_wiggle_split_folio() such folios?
[noting that s390_wiggle_split_folio() was just one caller where I new the return value differs. I suspect there might be more.]
I am still not clear on that one.
s390x obtains the folio while walking the page tables. In case it gets -EBUSY it simply retries to obtain the folio from the page tables.
So assuming there was concurrent truncation and we returned -EBUSY, it would just retry walking the page tables (trigger a fault to map a folio) and retry with that one.
I would assume that the shmem folio in the swapcache could never have worked before, and that there is no way to make progress really.
In other words: do we know how we can end up with a shmem folio that is in the swapcache and does not have folio->mapping set?
Could that think still be mapped into the page tables? (I hope not, but right now I am confused how that can happen )
Ah, my memory comes back.
vmscan triggers shmem_writeout() after unmapping the folio and after making sure that there are no unexpected folio references.
shmem_writeout() will do the shmem_delete_from_page_cache() where we set folio->mapping = NULL.
So anything walking the page tables (like s390x) could never find it.
Such shmem folios really cannot get split right now until we either reclaimed them (-> freed) or until shmem_swapin_folio() re-obtained them from the swapcache to re-add them to the swapcache through shmem_add_to_page_cache().
So maybe we can just make our life easy and just keep returning -EBUSY for this scenario for the time being?
Exactly, also an easy backport.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f2a521e5d683..5ce86882b2727 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3619,6 +3619,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (folio != page_folio(split_at) || folio != page_folio(lock_at))
 		return -EINVAL;
 
+	/*
+	 * Folios that just got truncated cannot get split. Signal to the
+	 * caller that there was a race.
+	 *
+	 * TODO: this will also currently refuse shmem folios that are in
+	 * the swapcache.
+	 */
+	if (!is_anon && !folio->mapping)
+		return -EBUSY;
+
 	if (new_order >= folio_order(folio))
 		return -EINVAL;
 
@@ -3659,17 +3669,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		gfp_t gfp;
 
 		mapping = folio->mapping;
 
-		/* Truncated ? */
-		/*
-		 * TODO: add support for large shmem folio in swap cache.
-		 * When shmem is in swap cache, mapping is NULL and
-		 * folio_test_swapcache() is true.
-		 */
-		if (!mapping) {
-			ret = -EBUSY;
-			goto out;
-		}
+		VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
 
 		min_order = mapping_min_folio_order(folio->mapping);
 		if (new_order < min_order) {

-- 
Cheers
David
Best Regards, Yan, Zi
On 19.11.25 15:48, Zi Yan wrote:
Okay, let's do that then.
@Wei can you send a v2?
On Wed, Nov 19, 2025 at 03:46:14PM +0100, David Hildenbrand (Red Hat) wrote:
Thanks, will prepare a v2 with this.
On Wed, Nov 19, 2025 at 03:46:14PM +0100, David Hildenbrand (Red Hat) wrote:
One more thing comes to my mind.
Current folio_split_supported() is used in try_folio_split_to_order().
Here are related commits:
[1] commit 7460b470a131f985a70302a322617121efdd7caa Author: Zi Yan ziy@nvidia.com Date: Fri Mar 7 12:40:00 2025 -0500
mm/truncate: use folio_split() in truncate operation
[2] commit 77008e1b2ef73249bceb078a321a3ff6bc087afb Author: Zi Yan ziy@nvidia.com Date: Thu Oct 16 21:36:30 2025 -0400
mm/huge_memory: do not change split_huge_page*() target order silently
[1] looks fine, because before calling folio_split_supported(), min_order_for_split() would return negative if !folio->mapping.
But [2] moves min_order_for_split() from try_folio_split_to_order() to its caller.

Currently it looks good, but I am not sure whether it leaves room for potential misuse.
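For reference, the ordering in [1] amounts to something like the sketch below. This is not actual kernel code: it assumes, per the statement above, that min_order_for_split() returns a negative errno when folio->mapping is NULL, and it uses split_huge_page_to_list_to_order() merely as a stand-in for whichever split helper the real caller uses.

static int guarded_split_sketch(struct folio *folio)
{
	/* Rejects a truncated folio (NULL ->mapping) before mapping->flags is touched. */
	int min_order = min_order_for_split(folio);

	if (min_order < 0)
		return min_order;

	/* Stand-in for the real split helper used by the caller. */
	return split_huge_page_to_list_to_order(&folio->page, NULL, min_order);
}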
On 19 Nov 2025, at 19:47, Wei Yang wrote:
I am sending patches to handle it. Thank you for spotting it.

Best Regards,
Yan, Zi
On 19 Nov 2025, at 9:37, David Hildenbrand (Red Hat) wrote:
In other words: do we know how we can end up with a shmem folio that is in the swapcache and does not have folio->mapping set?
Could that thing still be mapped into the page tables? (I hope not, but right now I am confused how that can happen)
IIUC, in shrink_folio_list(), pageout()[1] calls writeout(), which calls shmem_writeout(). shmem_writeout() adds the folio to the swap cache and removes it from the page cache[2]. Between pageout() and the point where the folio is freed, folio->mapping is NULL. Before pageout(), the folio should already be unmapped.
[1] https://elixir.bootlin.com/linux/v6.17.8/source/mm/vmscan.c#L1452
[2] https://elixir.bootlin.com/linux/v6.17.8/source/mm/shmem.c#L963

Best Regards,
Yan, Zi
On Wed, Nov 19, 2025 at 01:54:45PM +0100, David Hildenbrand (Red Hat) wrote:
So I think we should try to keep truncation returning -EBUSY. For the shmem case, I think it's OK to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
Hmm... I don't see how to do this nicely.

Looks like we can't do it in folio_split_supported().

Or change folio_split_supported() to return an error code directly?
On upstream, I would do something like the following (untested):
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f2a521e5d683..33fc3590867e2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return false;
+	} else if (folio_test_swapcache(folio)) {
+		/* TODO: support shmem folios that are in the swapcache. */
+		return false;
 	} else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 		   !mapping_large_folio_support(folio->mapping)) {
 		/*
@@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return false;
+	} else if (folio_test_swapcache(folio)) {
+		/* TODO: support shmem folios that are in the swapcache. */
+		return false;
 	} else if (new_order) {
 		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 		    !mapping_large_folio_support(folio->mapping)) {
@@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (folio != page_folio(split_at) || folio != page_folio(lock_at))
 		return -EINVAL;
 
+	/*
+	 * Folios that just got truncated cannot get split. Signal to the
+	 * caller that there was a race.
+	 *
+	 * TODO: support shmem folios that are in the swapcache.
+	 */
+	if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))
+		return -EBUSY;
+
 	if (new_order >= folio_order(folio))
 		return -EINVAL;
 
@@ -3659,17 +3674,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		gfp_t gfp;
 
 		mapping = folio->mapping;
 
-		/* Truncated ? */
-		/*
-		 * TODO: add support for large shmem folio in swap cache.
-		 * When shmem is in swap cache, mapping is NULL and
-		 * folio_test_swapcache() is true.
-		 */
-		if (!mapping) {
-			ret = -EBUSY;
-			goto out;
-		}
+		VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
 
 		min_order = mapping_min_folio_order(folio->mapping);
 		if (new_order < min_order) {

So rule out the truncated case earlier, leaving only the swapcache check to be handled later.
Thoughts?
Cleaner, will test this first.
Probably worth mentioning that this was identified by code inspection?
Agree.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang richard.weiyang@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org
Hmm, what would this patch look like when based on current upstream? We'd likely want to get that upstream asap.
This depends on whether we want it on top of [1].

Current upstream doesn't have [1], so we need to fix it in two places.

Andrew mentioned in [2] that he prefers a fixup version.
As we will want to backport this patch, likely we want to have it apply on current master.
But Andrew can comment on what he prefers in this case of a stable fix.
Yep, I will prepare patches both for current master and current mm-new.

And wait for Andrew's decision.
-- Cheers
David
On Wed, Nov 19, 2025 at 09:57:58AM +0100, David Hildenbrand (Red Hat) wrote:
On 19.11.25 02:26, Wei Yang wrote:
Commit c010d47f107f ("mm: thp: split huge page to any lower order pages") introduced an early check on the folio's order via mapping->flags before proceeding with the split work.
This check introduced a bug: for shmem folios in the swap cache, the mapping pointer can be NULL. Accessing mapping->flags in this state leads directly to a NULL pointer dereference.
Under which circumstances would that be the case? Only for large shmem folios in the swapcache or also for truncated folios? So I'd assume this would also affect truncated folios and we should spell that out here?
This commit fixes the issue by moving the check for mapping != NULL before any attempt to access mapping->flags.
This fix necessarily changes the return value from -EBUSY to -EINVAL when mapping is NULL. After reviewing current callers, they do not differentiate between these two error codes, making this change safe.
The doc of __split_huge_page_to_list_to_order() would now be outdated and has to be updated.
Also, take a look at s390_wiggle_split_folio(): returning -EINVAL instead of -EBUSY will make a difference on concurrent truncation. -EINVAL will be propagated and make the operation fail, while -EBUSY will be translated to -EAGAIN and the caller will simply lookup the folio again and retry.
So I think we should try to keep truncation returning -EBUSY. For the shmem case, I think it's OK to return -EINVAL. I guess we can identify such folios by checking for folio_test_swapcache().
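For reference, the difference matters because of retry loops like the sketch below. This is illustrative only, not the actual s390 code: split_huge_page() stands in for the real split call, and the -EBUSY/-EAGAIN handling mirrors the behaviour described above.

static int split_with_retry_sketch(struct page *page)
{
	int ret;

	do {
		ret = split_huge_page(page);	/* stand-in for the real split call */
		if (ret == -EBUSY || ret == -EAGAIN)
			cond_resched();		/* transient: look the folio up again and retry */
	} while (ret == -EBUSY || ret == -EAGAIN);

	return ret;	/* 0 on success, or a hard error such as -EINVAL */
}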
I came up with a draft:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7c69572b6c3f..3e140fa1ca13 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3696,6 +3696,18 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return -EINVAL;
+	} else if (!folio->mapping) {
+		/*
+		 * If there is no mapping then the folio was truncated and we
+		 * cannot split.
+		 *
+		 * TODO: large shmem folios in the swap cache also don't
+		 * currently have a mapping but folio_test_swapcache() is true
+		 * for them.
+		 */
+		if (folio_test_swapcache(folio))
+			return -EINVAL;
+		return -EBUSY;
 	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
 		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 		    !mapping_large_folio_support(folio->mapping)) {
@@ -3931,8 +3943,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (new_order >= old_order)
 		return -EINVAL;
 
-	if (!folio_split_supported(folio, new_order, split_type, /* warn = */ true))
-		return -EINVAL;
+	ret = folio_split_supported(folio, new_order, split_type, /* warn = */ true);
+	if (ret)
+		return ret;
 
 	is_hzp = is_huge_zero_folio(folio);
 	if (is_hzp) {
Not sure I get your point correctly.
On 19.11.25 13:42, Wei Yang wrote:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7c69572b6c3f..3e140fa1ca13 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3696,6 +3696,18 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return -EINVAL;
+	} else if (!folio->mapping) {
+		/*
+		 * If there is no mapping then the folio was truncated and we
+		 * cannot split.
+		 *
+		 * TODO: large shmem folios in the swap cache also don't
+		 * currently have a mapping but folio_test_swapcache() is true
+		 * for them.
+		 */
+		if (folio_test_swapcache(folio))
As per discussions, that would likely have to be
folio_test_swapbacked() && folio_test_swapcache()
+			return -EINVAL;
+		return -EBUSY;
 	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
 		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 		    !mapping_large_folio_support(folio->mapping)) {
@@ -3931,8 +3943,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (new_order >= old_order)
 		return -EINVAL;
 
-	if (!folio_split_supported(folio, new_order, split_type, /* warn = */ true))
-		return -EINVAL;
+	ret = folio_split_supported(folio, new_order, split_type, /* warn = */ true);
+	if (ret)
I'd prefer if folio_split_supported() would keep returning a boolean, such that we detect the truncation case earlier and just return -EBUSY.
But no strong opinion. Important part is that truncation handling is not changed without taking a lot of care.