Hi all,
This patch series addresses false positives in the generic mm selftests and skips tests that cannot run correctly due to missing features or system limitations.
v2: https://lore.kernel.org/all/20250703060656.54345-1-aboorvad@linux.ibm.com/
Changes in v3:
- Rebased onto the latest mm-new branch, top commit of the base is commit 0709ddf8951f ("mm: add zblock allocator").
- Minor refactor based on the review comments.
- Included the tags from the previous version.
---
v1: https://lore.kernel.org/all/20250616160632.35250-1-aboorvad@linux.ibm.com/
Changes in v2:
- Rebased onto the mm-new branch, top commit of the base is commit 3b4a8ad89f7e ("mm: add zblock allocator").
- Split some patches for clarity.
- Updated virtual_address_range test to support testing 4PB VA on PPC64.
- Added proper Fixes: tags.
- Included a patch to skip a failing userfaultfd test when unsupported, instead of reporting a failure.
---
Please let us know if you have any further comments.
Thanks, Aboorva
Aboorva Devarajan (3):
  selftests/mm: Fix child process exit codes in ksm_functional_tests
  selftests/mm: Skip thuge-gen test if system is not setup properly
  selftests/mm: Skip hugepage-mremap test if userfaultfd unavailable

Donet Tom (4):
  mm/selftests: Fix incorrect pointer being passed to mark_range()
  selftests/mm: Add support to test 4PB VA on PPC64
  selftest/mm: Fix ksm_funtional_test failures
  mm/selftests: Fix split_huge_page_test failure on systems with 64KB page size

 tools/testing/selftests/mm/hugepage-mremap.c  | 16 +++++++++--
 .../selftests/mm/ksm_functional_tests.c       | 28 +++++++++++++------
 .../selftests/mm/split_huge_page_test.c       | 23 +++++++++------
 tools/testing/selftests/mm/thuge-gen.c        | 11 +++++---
 .../selftests/mm/virtual_address_range.c      | 13 ++++++++-
 5 files changed, 67 insertions(+), 24 deletions(-)
From: Donet Tom <donettom@linux.ibm.com>

In main(), the high addresses are stored in hptr, but the address passed to mark_range() is ptr[i], not hptr[i]. Fix this by passing hptr[i] to mark_range() instead.

Fixes: b2a79f62133a ("selftests/mm: virtual_address_range: unmap chunks after validation")
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
 tools/testing/selftests/mm/virtual_address_range.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 169dbd692bf5..e24c36a39f22 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -227,7 +227,7 @@ int main(int argc, char *argv[])
 		if (hptr[i] == MAP_FAILED)
 			break;

-		mark_range(ptr[i], MAP_CHUNK_SIZE);
+		mark_range(hptr[i], MAP_CHUNK_SIZE);
 		validate_addr(hptr[i], 1);
 	}
 	hchunks = i;
From: Donet Tom <donettom@linux.ibm.com>
PowerPC64 supports a 4PB virtual address space, but this test was previously limited to 512TB. This patch extends the coverage up to the full 4PB VA range on PowerPC64.
Memory from 0 to 128TB is allocated without an address hint, while allocations from 128TB to 4PB use a hint address.
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
 tools/testing/selftests/mm/virtual_address_range.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index e24c36a39f22..81b33d8f78f4 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -44,12 +44,18 @@
  * On Arm64 the address space is 256TB and support for
  * high mappings up to 4PB virtual address space has
  * been added.
+ *
+ * On PowerPC64, the address space up to 128TB can be
+ * mapped without a hint. Addresses beyond 128TB, up to
+ * 4PB, can be mapped with a hint.
+ *
  */

 #define NR_CHUNKS_128TB   ((128 * SZ_1TB) / MAP_CHUNK_SIZE) /* Number of chunks for 128TB */
 #define NR_CHUNKS_256TB   (NR_CHUNKS_128TB * 2UL)
 #define NR_CHUNKS_384TB   (NR_CHUNKS_128TB * 3UL)
 #define NR_CHUNKS_3840TB  (NR_CHUNKS_128TB * 30UL)
+#define NR_CHUNKS_3968TB  (NR_CHUNKS_128TB * 31UL)

 #define ADDR_MARK_128TB  (1UL << 47) /* First address beyond 128TB */
 #define ADDR_MARK_256TB  (1UL << 48) /* First address beyond 256TB */
@@ -59,6 +65,11 @@
 #define HIGH_ADDR_SHIFT 49
 #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
 #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
+#elif defined(__PPC64__)
+#define HIGH_ADDR_MARK  ADDR_MARK_128TB
+#define HIGH_ADDR_SHIFT 48
+#define NR_CHUNKS_LOW   NR_CHUNKS_128TB
+#define NR_CHUNKS_HIGH  NR_CHUNKS_3968TB
 #else
 #define HIGH_ADDR_MARK  ADDR_MARK_128TB
 #define HIGH_ADDR_SHIFT 48
From: Donet Tom <donettom@linux.ibm.com>

This patch fixes two issues.
1) After fork() in test_prctl_fork, the child process uses the file descriptors from the parent process to read ksm_stat and ksm_merging_pages. This results in incorrect values being read (parent process ksm_stat and ksm_merging_pages will be read in child), causing the test to fail.
This patch calls init_global_file_handles() in the child process to ensure that the current process's file descriptors are used to read ksm_stat and ksm_merging_pages.
2) All tests currently call ksm_merge to trigger page merging. To ensure the system remains in a consistent state for subsequent tests, it is better to call ksm_unmerge during the test cleanup phase.
In the test_prctl_fork test, after a fork(), reading ksm_merging_pages in the child process returns a non-zero value because a previous test performed a merge, and the child's memory state is inherited from the parent.
Although the child process calls ksm_unmerge, the ksm_merging_pages counter in the parent is reset to zero, while the child's counter remains unchanged. This discrepancy causes the test to fail.
To avoid this issue, each test should call ksm_unmerge during cleanup to ensure the counter is reset and the system is in a clean state for subsequent tests.
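The argv argument of execv() must be an array of pointers to null-terminated strings, and the array itself must be terminated by a NULL pointer. This patch also adds the missing NULL terminator to the array passed to execv().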
Fixes: 6c47de3be3a0 ("selftest/mm: ksm_functional_tests: extend test case for ksm fork/exec")
Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
 tools/testing/selftests/mm/ksm_functional_tests.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index d8bd1911dfc0..996dc6645570 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -46,6 +46,8 @@ static int ksm_use_zero_pages_fd;
 static int pagemap_fd;
 static size_t pagesize;

+static void init_global_file_handles(void);
+
 static bool range_maps_duplicates(char *addr, unsigned long size)
 {
 	unsigned long offs_a, offs_b, pfn_a, pfn_b;
@@ -274,6 +276,7 @@ static void test_unmerge(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "Pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }

@@ -338,6 +341,7 @@ static void test_unmerge_zero_pages(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "KSM zero pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }

@@ -366,6 +370,7 @@ static void test_unmerge_discarded(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "Pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }

@@ -452,6 +457,7 @@ static void test_unmerge_uffd_wp(void)
 close_uffd:
 	close(uffd);
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }
 #endif
@@ -515,6 +521,7 @@ static int test_child_ksm(void)
 	else if (map == MAP_MERGE_SKIP)
 		return -3;

+	ksm_unmerge();
 	munmap(map, size);
 	return 0;
 }
@@ -548,6 +555,7 @@ static void test_prctl_fork(void)

 	child_pid = fork();
 	if (!child_pid) {
+		init_global_file_handles();
 		exit(test_child_ksm());
 	} else if (child_pid < 0) {
 		ksft_test_result_fail("fork() failed\n");
@@ -595,7 +603,7 @@ static void test_prctl_fork_exec(void)
 		return;
 	} else if (child_pid == 0) {
 		char *prg_name = "./ksm_functional_tests";
-		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME };
+		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME, NULL };

 		execv(prg_name, argv_for_program);
 		return;
@@ -644,6 +652,7 @@ static void test_prctl_unmerge(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "Pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }

@@ -677,6 +686,7 @@ static void test_prot_none(void)
 	ksft_test_result(!range_maps_duplicates(map, size),
 			 "Pages were unmerged\n");
 unmap:
+	ksm_unmerge();
 	munmap(map, size);
 }
On Tue, Jul 29, 2025 at 11:03:59AM +0530, Aboorva Devarajan wrote:
[...]
>  static bool range_maps_duplicates(char *addr, unsigned long size)
>  {
>  	unsigned long offs_a, offs_b, pfn_a, pfn_b;
> @@ -274,6 +276,7 @@ static void test_unmerge(void)
>  	ksft_test_result(!range_maps_duplicates(map, size),
>  			 "Pages were unmerged\n");
>  unmap:
> +	ksm_unmerge();

In __mmap_and_merge_range(), we call ksm_unmerge(). Why does this one not help?

Not very familiar with ksm stuff. Would you mind explaining more about how this fixes the failure you see?

>  	munmap(map, size);
>  }
[...]
>  	child_pid = fork();
>  	if (!child_pid) {
> +		init_global_file_handles();

Would this leave fd in parent as orphan?

>  		exit(test_child_ksm());
> >  	child_pid = fork();
> >  	if (!child_pid) {
> > +		init_global_file_handles();
>
> Would this leave fd in parent as orphan?

Probably yes, but only until the child quits, so likely we don't care.
On 8/4/25 2:41 PM, Wei Yang wrote:
On Tue, Jul 29, 2025 at 11:03:59AM +0530, Aboorva Devarajan wrote:
[...]
> > @@ -274,6 +276,7 @@ static void test_unmerge(void)
> >  	ksft_test_result(!range_maps_duplicates(map, size),
> >  			 "Pages were unmerged\n");
> >  unmap:
> > +	ksm_unmerge();
>
> In __mmap_and_merge_range(), we call ksm_unmerge(). Why does this one not help?
>
> Not very familiar with ksm stuff. Would you mind explaining more about how this fixes the failure you see?
The issue I was facing here was that test_prctl_fork was failing:

# [RUN] test_prctl_fork
# Still pages merged
#

This issue occurred because the previous test performed a merge, causing the value of /proc/self/ksm_merging_pages to reflect the number of deduplicated pages. After that, fork() was called. Post-fork, the child process inherited the parent's ksm_merging_pages value.

Then, the child process invoked __mmap_and_merge_range(), which resulted in unmerging the pages and resetting the value. However, since the parent process had performed the merge, only its ksm_merging_pages value got reset to 0. The child process had not performed any merge itself, so its inherited value remained unchanged. That’s why get_my_merging_page() in the child was returning a non-zero value.

Initially, I fixed the issue by calling ksm_unmerge() before the fork(), and that resolved the problem. Later, I decided it would be cleaner to move the ksm_unmerge() call to the test cleanup phase.
On Tue, Aug 05, 2025 at 11:39:15AM +0530, Donet Tom wrote:
On 8/4/25 2:41 PM, Wei Yang wrote:
On Tue, Jul 29, 2025 at 11:03:59AM +0530, Aboorva Devarajan wrote:
From: Donet Tom donettom@linux.ibm.com
[...]
> > > @@ -274,6 +276,7 @@ static void test_unmerge(void)
> > >  	ksft_test_result(!range_maps_duplicates(map, size),
> > >  			 "Pages were unmerged\n");
> > >  unmap:
> > > +	ksm_unmerge();
> >
> > In __mmap_and_merge_range(), we call ksm_unmerge(). Why does this one not help?
> >
> > Not very familiar with ksm stuff. Would you mind explaining more about how this fixes the failure you see?
>
> The issue I was facing here was test_prctl_fork was failing.
>
> # [RUN] test_prctl_fork
> # Still pages merged
> #
>
> This issue occurred because the previous test performed a merge, causing the value of /proc/self/ksm_merging_pages to reflect the number of deduplicated pages. After that, a fork() was called. Post-fork, the child process inherited the parent's ksm_merging_pages value.

Yes, this one is fixed by calling init_global_file_handles() in the child.

> Then, the child process invoked __mmap_and_merge_range(), which resulted in unmerging the pages and resetting the value. However, since the parent process had performed the merge, its ksm_merging_pages value also got reset to 0. Meanwhile, the child process had not performed any merge itself, so the inherited

I assume the behavior described here is after the change to call init_global_file_handles() in the child.

The child process inherits ksm_merging_pages from the parent, which is reasonable to me. But I am confused why ksm_unmerge() would reset ksm_merging_pages only for the parent and leave it unchanged in the child process.

ksm_unmerge() writes to /sys/kernel/mm/ksm/run, which is a system-wide sysfs interface. I expect it to apply to both parent and child.

> value remained unchanged. That’s why get_my_merging_page() in the child was returning a non-zero value.

I guess you mean that get_my_merging_page() in __mmap_and_merge_range() returns a non-zero value. But there is a ksm_unmerge() before it. Why couldn't this ksm_unmerge() reset the value, when a ksm_unmerge() in the parent could?

> Initially, I fixed the issue by calling ksm_unmerge() before the fork(), and that resolved the problem. Later, I decided it would be cleaner to move the ksm_unmerge() call to the test cleanup phase.

Also, all the tests before test_prctl_fork(), except test_prctl(), call

  ksft_test_result(!range_maps_duplicates());

If the previous tests succeed, there are no duplicate pages. This means ksm_merging_pages should be 0 before test_prctl_fork() if the other tests pass, and the child process would inherit a ksm_merging_pages of 0. (A quick test proves it.)

So which part of the story did I miss?
On 8/5/25 10:33 PM, Wei Yang wrote:
On Tue, Aug 05, 2025 at 11:39:15AM +0530, Donet Tom wrote:
On 8/4/25 2:41 PM, Wei Yang wrote:
On Tue, Jul 29, 2025 at 11:03:59AM +0530, Aboorva Devarajan wrote:
From: Donet Tom donettom@linux.ibm.com
[...]
> > Then, the child process invoked __mmap_and_merge_range(), which resulted in unmerging the pages and resetting the value. However, since the parent process had performed the merge, its ksm_merging_pages value also got reset to 0. Meanwhile, the child process had not performed any merge itself, so the inherited
>
> I assume the behavior described here is after the change to call init_global_file_handles() in the child.

Yes.

> The child process inherits ksm_merging_pages from the parent, which is reasonable to me. But I am confused why ksm_unmerge() would reset ksm_merging_pages only for the parent and leave it unchanged in the child process.
>
> ksm_unmerge() writes to /sys/kernel/mm/ksm/run, which is a system-wide sysfs interface. I expect it to apply to both parent and child.

I am not very familiar with the KSM code, but from what I understand:

The ksm_merging_pages counter is maintained per mm_struct. When we write to /sys/kernel/mm/ksm/run, unmerging is triggered, and the counters are updated for all mm_structs present in the ksm_mm_slot list.

An mm_struct gets added to this list when MADV_MERGEABLE is called. In the case of the child process, since MADV_MERGEABLE has not been invoked yet, its mm_struct is not part of the list. As a result, its ksm_merging_pages counter is not reset.

> > value remained unchanged. That’s why get_my_merging_page() in the child was returning a non-zero value.
>
> I guess you mean that get_my_merging_page() in __mmap_and_merge_range() returns a non-zero value. But there is a ksm_unmerge() before it. Why couldn't this ksm_unmerge() reset the value, when a ksm_unmerge() in the parent could?
>
> > Initially, I fixed the issue by calling ksm_unmerge() before the fork(), and that resolved the problem. Later, I decided it would be cleaner to move the ksm_unmerge() call to the test cleanup phase.
>
> Also, all the tests before test_prctl_fork(), except test_prctl(), call
>
>   ksft_test_result(!range_maps_duplicates());
>
> If the previous tests succeed, there are no duplicate pages. This means ksm_merging_pages should be 0 before test_prctl_fork() if the other tests pass, and the child process would inherit a ksm_merging_pages of 0. (A quick test proves it.)

If I understand correctly, all the tests call MADV_UNMERGEABLE, which internally calls break_ksm() in the kernel. This function replaces the KSM page with an exclusive anonymous page. However, the ksm_merging_pages counters are not updated at this point.

The function range_maps_duplicates(map, size) checks whether the pages have been unmerged. Since break_ksm() does perform the unmerge, this function returns false, and the test passes.

The ksm_merging_pages update happens later via ksm_scan_thread(). That’s why we observe that ksm_merging_pages values are not reset immediately after the test finishes.

If we add a sleep(1) after the MADV_UNMERGEABLE call, we can see that the ksm_merging_pages values are reset after the sleep.

Once the test completes successfully, we can call ksm_unmerge(), which will immediately reset the ksm_merging_pages value. This way, in the fork test, the child process will also see the correct value.

> So which part of the story did I miss?

So, during the cleanup phase after a successful test, we can call ksm_unmerge() to reset the counter. Do you see any issue with this approach?
On Wed, Aug 06, 2025 at 06:30:37PM +0530, Donet Tom wrote: [...]
> > The child process inherits ksm_merging_pages from the parent, which is reasonable to me. But I am confused why ksm_unmerge() would reset ksm_merging_pages only for the parent and leave it unchanged in the child process.
> >
> > ksm_unmerge() writes to /sys/kernel/mm/ksm/run, which is a system-wide sysfs interface. I expect it to apply to both parent and child.
>
> I am not very familiar with the KSM code, but from what I understand:
>
> The ksm_merging_pages counter is maintained per mm_struct. When we write to /sys/kernel/mm/ksm/run, unmerging is triggered, and the counters are updated for all mm_structs present in the ksm_mm_slot list.
>
> An mm_struct gets added to this list when MADV_MERGEABLE is called. In the case of the child process, since MADV_MERGEABLE has not been invoked yet, its mm_struct is not part of the list. As a result, its ksm_merging_pages counter is not reset.

Would this flag be inherited during fork? VM_MERGEABLE is saved in the related vma, and I don't see where it would be dropped during fork. Maybe I missed it.

[...]

> If I understand correctly, all the tests call MADV_UNMERGEABLE, which internally calls break_ksm() in the kernel. This function replaces the KSM page with an exclusive anonymous page. However, the ksm_merging_pages counters are not updated at this point.
>
> The function range_maps_duplicates(map, size) checks whether the pages have been unmerged. Since break_ksm() does perform the unmerge, this function returns false, and the test passes.
>
> The ksm_merging_pages update happens later via ksm_scan_thread(). That’s why we observe that ksm_merging_pages values are not reset immediately after the test finishes.

I'm not familiar with KSM internals, but it seems odd to me that the ksm_merging_pages counter still has a non-zero value when all merged pages have been unmerged.

> If we add a sleep(1) after the MADV_UNMERGEABLE call, we can see that the ksm_merging_pages values are reset after the sleep.
>
> Once the test completes successfully, we can call ksm_unmerge(), which will immediately reset the ksm_merging_pages value. This way, in the fork test, the child process will also see the correct value.
>
> > So which part of the story did I miss?
>
> So, during the cleanup phase after a successful test, we can call ksm_unmerge() to reset the counter. Do you see any issue with this approach?

It looks like there is no issue with an extra ksm_unmerge().

But one more question: why does an extra ksm_unmerge() help?

Here is what we have during the test:

  test_prot_none()
    !range_maps_duplicates()
    ksm_unmerge()                  1) <--- newly added
  test_prctl_fork()
    >--- in child
      __mmap_and_merge_range()
        ksm_unmerge()              2) <--- already present

As you mentioned above, ksm_unmerge() immediately resets ksm_merging_pages. So why does the ksm_unmerge() at 2) still leave ksm_merging_pages non-zero, while the one at 1) helps?

Or is there still some timing issue, like the sleep(1) you tried?
On 8/6/25 8:24 PM, Wei Yang wrote:
On Wed, Aug 06, 2025 at 06:30:37PM +0530, Donet Tom wrote: [...]
Child process inherit the ksm_merging_pages from parent, which is reasonable to me. But I am confused why ksm_unmerge() would just reset ksm_merging_pages for parent and leave ksm_merging_pages in child process unchanged.
ksm_unmerge() writes to /sys/kernel/mm/ksm/run, which is a system wide sysfs interface. I expect it applies to both parent and child.
I am not very familiar with the KSM code, but from what I understand:
The ksm_merging_pages counter is maintained per mm_struct. When we write to /sys/kernel/mm/ksm/run, unmerging is triggered, and the counters are updated for all mm_structs present in the ksm_mm_slot list.
A mm_struct gets added to this list when MADV_MERGEABLE is called. In the case of the child process, since MADV_MERGEABLE has not been invoked yet, its mm_struct is not part of the list. As a result, its ksm_merging_pages counter is not reset.
Would this flag be inherited during fork? VM_MERGEABLE is saved in the related vma, and I don't see it being dropped during fork. Maybe I missed something.
value remained unchanged. That’s why get_my_merging_page() in the child was returning a non-zero value.
I guess you mean that get_my_merging_page() in __mmap_and_merge_range() returns a non-zero value. But there is a ksm_unmerge() before it. Why couldn't that ksm_unmerge() reset the value, when a ksm_unmerge() in the parent could?
Initially, I fixed the issue by calling ksm_unmerge() before the fork(), and that resolved the problem. Later, I decided it would be cleaner to move the ksm_unmerge() call to the test cleanup phase.
Also, all the tests before test_prctl_fork(), except test_prctl(), call
ksft_test_result(!range_maps_duplicates());
If the previous tests succeed, it means there are no duplicate pages. So ksm_merging_pages should be 0 before test_prctl_fork() if the other tests pass, and the child process would inherit a ksm_merging_pages of 0. (A quick test confirms it.)
If I understand correctly, all the tests are calling MADV_UNMERGEABLE, which internally calls break_ksm() in the kernel. This function replaces the KSM page with an exclusive anonymous page. However, the ksm_merging_pages counters are not updated at this point.
From the debugging, what I understood is:
When we perform fork(), MADV_MERGEABLE, or PR_SET_MEMORY_MERGE, the mm_struct of the process gets added to the ksm_mm_slot list. As a result, both the parent and child processes’ mm_struct structures will be present in ksm_mm_slot.
When KSM merges the pages, it creates a ksm_rmap_item for each page, and the ksm_merging_pages counter is incremented accordingly.
Since the parent process did the merge, its mm_struct is present in ksm_mm_slot, and ksm_rmap_item entries are created for all the merged pages.
When a process is forked, the child’s mm_struct is also added to ksm_mm_slot, and it inherits the ksm_merging_pages count. However, no ksm_rmap_item entries are created for the child process because it did not do any merge.
When ksm_unmerge() is called, it iterates over all processes in ksm_mm_slot. In our case, both the parent and child are present. It first processes the parent, which has ksm_rmap_item entries, so it unmerges the pages and resets the ksm_merging_pages counter.
For the child, since it did not perform any actual merging, it does not have any ksm_rmap_item entries. Therefore, there are no pages to unmerge, and the counter remains unchanged.
So, only processes that performed KSM merging will have their counters updated during ksm_unmerge(). The child process, having not initiated any merging, retains the inherited counter value without any update.
So from a testing point of view, I think it is better to reset the counters as part of the cleanup code to ensure that the next tests do not get incorrect values.
The question I have is: is it correct to keep the inherited ksm_merging_pages value in the child, or should we reset it to 0 during ksm_fork()?
Or there is still some timing issue like sleep(1) you did?
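To make the explanation above easier to follow, here is a toy user-space model of it. This is not kernel code: struct toy_mm, toy_fork() and toy_unmerge() are hypothetical names, and the model only encodes the behaviour described in this thread (fork copies the counter but not the rmap items, and unmerge only decrements the counter for rmap items it actually finds).

```c
/* Toy model of the per-mm KSM state discussed above (hypothetical names). */
struct toy_mm {
	unsigned long merging_pages;	/* models mm->ksm_merging_pages */
	unsigned long nr_rmap_items;	/* models the mm's ksm_rmap_item entries */
};

/* Model of fork(): the counter is inherited, the rmap items are not. */
static struct toy_mm toy_fork(const struct toy_mm *parent)
{
	struct toy_mm child = {
		.merging_pages = parent->merging_pages,
		.nr_rmap_items = 0,
	};

	return child;
}

/* Model of the unmerge pass: one counter decrement per rmap item found. */
static void toy_unmerge(struct toy_mm *mm)
{
	while (mm->nr_rmap_items > 0) {
		mm->nr_rmap_items--;
		if (mm->merging_pages > 0)
			mm->merging_pages--;
	}
}
```

With a parent that has 4 merged pages, calling toy_fork() and then toy_unmerge() on both leaves the parent at 0 and the child at 4, which matches the stale value seen by get_my_merging_page() in the child.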
On Thu, Aug 07, 2025 at 02:56:28PM +0530, Donet Tom wrote:
Thanks for the detailed analysis.
So the key is that the child has no ksm_rmap_item entries, so ksm_unmerge() does not clear its ksm_merging_pages counter.
So, only processes that performed KSM merging will have their counters updated during ksm_unmerge(). The child process, having not initiated any merging, retains the inherited counter value without any update.
So from a testing point of view, I think it is better to reset the counters as part of the cleanup code to ensure that the next tests do not get incorrect values.
Hmm... Given the current situation, I agree from the testing point of view.
Though maybe this is also something to check in a later version.
The question I have is: is it correct to keep the inherited ksm_merging_pages value in the child, or should we reset it to 0 during ksm_fork()?
Very good question. There looks to be something wrong, but I am not sure this is the correct way.
Or there is still some timing issue like sleep(1) you did?
On 8/8/25 8:28 AM, Wei Yang wrote:
Are you okay to proceed with the current patch in this series?
The question I have is: is it correct to keep the inherited ksm_merging_pages value in the child, or should we reset it to 0 during ksm_fork()?
Very good question. There looks to be something wrong, but I am not sure this is the correct way.
ok.
I am going through it and will come up with a fix along with a test for this scenario. I will post it as a separate series.
Or there is still some timing issue like sleep(1) you did?
On Fri, Aug 08, 2025 at 07:55:37PM +0530, Donet Tom wrote: [...]
Are you okay to proceed with the current patch in this series?
Sure.
From: Donet Tom donettom@linux.ibm.com
The split_huge_page_test fails on systems with a 64KB base page size. This is because the order of a 2MB huge page is different:
On 64KB systems, the order is 5.
On 4KB systems, it's 9.
The test currently assumes a maximum huge page order of 9, which is only valid for 4KB base page systems. On systems with 64KB pages, attempting to split huge pages beyond their actual order (5) causes the test to fail.
In this patch, we calculate the huge page order based on the system's base page size. With this change, the tests now run successfully on both 64KB and 4KB page size systems.
Fixes: fa6c02315f745 ("mm: huge_memory: a new debugfs interface for splitting THP tests")
Reviewed-by: Dev Jain dev.jain@arm.com
Reviewed-by: Zi Yan ziy@nvidia.com
Co-developed-by: Aboorva Devarajan aboorvad@linux.ibm.com
Signed-off-by: Aboorva Devarajan aboorvad@linux.ibm.com
Signed-off-by: Donet Tom donettom@linux.ibm.com
---
 .../selftests/mm/split_huge_page_test.c | 23 ++++++++++++-------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 05de1fc0005b..718daceb5282 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -36,6 +36,7 @@ uint64_t pmd_pagesize;

 #define PFN_MASK ((1UL<<55)-1)
 #define KPF_THP (1UL<<22)
+#define GET_ORDER(nr_pages) (31 - __builtin_clz(nr_pages))

 int is_backed_by_thp(char *vaddr, int pagemap_file, int kpageflags_file)
 {
@@ -522,6 +523,9 @@ int main(int argc, char **argv)
 	const char *fs_loc;
 	bool created_tmp;
 	int offset;
+	unsigned int max_order;
+	unsigned int nr_pages;
+	unsigned int tests;

 	ksft_print_header();

@@ -533,35 +537,38 @@ int main(int argc, char **argv)
 	if (argc > 1)
 		optional_xfs_path = argv[1];

-	ksft_set_plan(1+8+1+9+9+8*4+2);
-
 	pagesize = getpagesize();
 	pageshift = ffs(pagesize) - 1;
 	pmd_pagesize = read_pmd_pagesize();
 	if (!pmd_pagesize)
 		ksft_exit_fail_msg("Reading PMD pagesize failed\n");

+	nr_pages = pmd_pagesize / pagesize;
+	max_order = GET_ORDER(nr_pages);
+	tests = 2 + (max_order - 1) + (2 * max_order) + (max_order - 1) * 4 + 2;
+	ksft_set_plan(tests);
+
 	fd_size = 2 * pmd_pagesize;

 	split_pmd_zero_pages();

-	for (i = 0; i < 9; i++)
+	for (i = 0; i < max_order; i++)
 		if (i != 1)
 			split_pmd_thp_to_order(i);

 	split_pte_mapped_thp();
-	for (i = 0; i < 9; i++)
+	for (i = 0; i < max_order; i++)
 		split_file_backed_thp(i);

 	created_tmp = prepare_thp_fs(optional_xfs_path, fs_loc_template, &fs_loc);
-	for (i = 8; i >= 0; i--)
+	for (i = max_order - 1; i >= 0; i--)
 		split_thp_in_pagecache_to_order_at(fd_size, fs_loc, i, -1);

-	for (i = 0; i < 9; i++)
+	for (i = 0; i < max_order; i++)
 		for (offset = 0;
-		     offset < pmd_pagesize / pagesize;
-		     offset += MAX(pmd_pagesize / pagesize / 4, 1 << i))
+		     offset < nr_pages;
+		     offset += MAX(nr_pages / 4, 1 << i))
 			split_thp_in_pagecache_to_order_at(fd_size, fs_loc, i, offset);
 	cleanup_thp_fs(fs_loc, created_tmp);
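As a quick sanity check of the arithmetic in this patch, the order calculation and the planned test count can be sketched in isolation. thp_order() and planned_tests() are hypothetical helper names used only for this illustration; the sketch assumes a 32-bit unsigned int, as the GET_ORDER() macro does, and uses a 2MB PMD size as in the commit message.

```c
#include <assert.h>

/* Hypothetical helper: floor(log2(nr_pages)), same expression as GET_ORDER(). */
static unsigned int thp_order(unsigned int nr_pages)
{
	assert(nr_pages != 0);	/* __builtin_clz(0) is undefined */
	return 31 - __builtin_clz(nr_pages);
}

/* Hypothetical helper: the planned test count, copied from the patch. */
static unsigned int planned_tests(unsigned int max_order)
{
	return 2 + (max_order - 1) + (2 * max_order) + (max_order - 1) * 4 + 2;
}
```

For a 2MB PMD page, thp_order(2MB / 4KB) is 9 and planned_tests(9) is 62, the same as the old hard-coded 1+8+1+9+9+8*4+2. With 64KB base pages, thp_order(2MB / 64KB) is 5 and the plan shrinks to 34 tests.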
On Tue, Jul 29, 2025 at 11:04:00AM +0530, Aboorva Devarajan wrote:
+ nr_pages = pmd_pagesize / pagesize;
+ max_order = GET_ORDER(nr_pages);
There is a sz2ord() in cow.c and uffd-wp-mremap.c.
Maybe we can factor it into vm_util.h and use it here.
On 04.08.25 11:04, Wei Yang wrote:
There is a sz2ord() in cow.c and uffd-wp-mremap.c.
Maybe we can factor it into vm_util.h and use it here.
That sounds reasonable to me.
On 8/4/25 2:34 PM, Wei Yang wrote:
There is a sz2ord() in cow.c and uffd-wp-mremap.c.
Maybe we can factor it into vm_util.h and use it here.
Sure, I will make the change and send a new version.
In ksm_functional_tests, test_child_ksm() returned negative values to indicate errors. However, when passed to exit(), these were interpreted as large unsigned values (e.g., -2 became 254), leading to incorrect handling in the parent process. As a result, some tests appeared to be skipped or silently failed.
This patch changes test_child_ksm() to return positive error codes (1, 2, 3) and updates test_child_ksm_err() to interpret them correctly. Additionally, test_prctl_fork_exec() now uses exit(4) after a failed execv() to clearly signal exec failures. This ensures the parent accurately detects and reports child process failures.
--------------
Before patch:
--------------
- [RUN] test_unmerge
ok 1 Pages were unmerged
...
- [RUN] test_prctl_fork
- No pages got merged
- [RUN] test_prctl_fork_exec
ok 7 PR_SET_MEMORY_MERGE value is inherited
...
Bail out! 1 out of 8 tests failed
- Planned tests != run tests (9 != 8)
- Totals: pass:7 fail:1 xfail:0 xpass:0 skip:0 error:0

--------------
After patch:
--------------
- [RUN] test_unmerge
ok 1 Pages were unmerged
...
- [RUN] test_prctl_fork
- No pages got merged
not ok 7 Merge in child failed
- [RUN] test_prctl_fork_exec
ok 8 PR_SET_MEMORY_MERGE value is inherited
...
Bail out! 2 out of 9 tests failed
- Totals: pass:7 fail:2 xfail:0 xpass:0 skip:0 error:0
Fixes: 6c47de3be3a0 ("selftest/mm: ksm_functional_tests: extend test case for ksm fork/exec")
Acked-by: David Hildenbrand david@redhat.com
Co-developed-by: Donet Tom donettom@linux.ibm.com
Signed-off-by: Donet Tom donettom@linux.ibm.com
Signed-off-by: Aboorva Devarajan aboorvad@linux.ibm.com
---
 .../testing/selftests/mm/ksm_functional_tests.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index 996dc6645570..534aa405cac7 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -512,14 +512,14 @@ static int test_child_ksm(void)

 	/* Test if KSM is enabled for the process. */
 	if (prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0) != 1)
-		return -1;
+		return 1;

 	/* Test if merge could really happen. */
 	map = __mmap_and_merge_range(0xcf, size, PROT_READ | PROT_WRITE, KSM_MERGE_NONE);
 	if (map == MAP_MERGE_FAIL)
-		return -2;
+		return 2;
 	else if (map == MAP_MERGE_SKIP)
-		return -3;
+		return 3;

 	ksm_unmerge();
 	munmap(map, size);
@@ -528,12 +528,14 @@ static int test_child_ksm(void)

 static void test_child_ksm_err(int status)
 {
-	if (status == -1)
+	if (status == 1)
 		ksft_test_result_fail("unexpected PR_GET_MEMORY_MERGE result in child\n");
-	else if (status == -2)
+	else if (status == 2)
 		ksft_test_result_fail("Merge in child failed\n");
-	else if (status == -3)
+	else if (status == 3)
 		ksft_test_result_skip("Merge in child skipped\n");
+	else if (status == 4)
+		ksft_test_result_fail("Binary not found\n");
 }

 /* Verify that prctl ksm flag is inherited. */
@@ -606,7 +608,7 @@ static void test_prctl_fork_exec(void)
 		char *argv_for_program[] = { prg_name, FORK_EXEC_CHILD_PRG_NAME, NULL };

 		execv(prg_name, argv_for_program);
-		return;
+		exit(4);
 	}

 	if (waitpid(child_pid, &status, 0) > 0) {
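The exit-code truncation described in the commit message can be reproduced outside the selftest with a small sketch. child_exit_status() is a hypothetical helper written only for this illustration; it forks a child that exits with the given code and returns what the parent sees through WEXITSTATUS(), or -1 if fork() or waitpid() fails.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: exit a child with 'code' and report what the parent observes. */
static int child_exit_status(int code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);	/* only the low 8 bits reach the parent */

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

On Linux, child_exit_status(-2) returns 254 while child_exit_status(2) returns 2, which is why a parent comparing the status against -2 never matched before this patch.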
Make thuge-gen skip instead of fail when it can't run due to system settings. If shmmax is too small or no 1G huge pages are available, the test now prints a warning and is marked as skipped.
-------------------
Before Patch:
-------------------
~ running ./thuge-gen
~ Bail out! Please do echo 262144 > /proc/sys/kernel/shmmax
~ Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
~ [FAIL]
not ok 28 thuge-gen ~ exit=1

-------------------
After Patch:
-------------------
~ running ./thuge-gen
~ ~ WARNING: shmmax is too small to run this test.
~ ~ Please run the following command to increase shmmax:
~ ~ echo 262144 > /proc/sys/kernel/shmmax
~ 1..0 ~ SKIP Test skipped due to insufficient shmmax value.
~ [SKIP]
ok 29 thuge-gen ~ SKIP
Reviewed-by: Dev Jain dev.jain@arm.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Zi Yan ziy@nvidia.com Co-developed-by: Donet Tom donettom@linux.ibm.com Signed-off-by: Donet Tom donettom@linux.ibm.com Signed-off-by: Aboorva Devarajan aboorvad@linux.ibm.com --- tools/testing/selftests/mm/thuge-gen.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/mm/thuge-gen.c b/tools/testing/selftests/mm/thuge-gen.c
index 8e2b08dc5762..4f5e290ff1a6 100644
--- a/tools/testing/selftests/mm/thuge-gen.c
+++ b/tools/testing/selftests/mm/thuge-gen.c
@@ -177,13 +177,16 @@ void find_pagesizes(void)
 	globfree(&g);

 	read_sysfs("/proc/sys/kernel/shmmax", &shmmax_val);
-	if (shmmax_val < NUM_PAGES * largest)
-		ksft_exit_fail_msg("Please do echo %lu > /proc/sys/kernel/shmmax",
-				   largest * NUM_PAGES);
+	if (shmmax_val < NUM_PAGES * largest) {
+		ksft_print_msg("WARNING: shmmax is too small to run this test.\n");
+		ksft_print_msg("Please run the following command to increase shmmax:\n");
+		ksft_print_msg("echo %lu > /proc/sys/kernel/shmmax\n", largest * NUM_PAGES);
+		ksft_exit_skip("Test skipped due to insufficient shmmax value.\n");
+	}

 #if defined(__x86_64__)
 	if (largest != 1U<<30) {
-		ksft_exit_fail_msg("No GB pages available on x86-64\n"
+		ksft_exit_skip("No GB pages available on x86-64\n"
 				   "Please boot with hugepagesz=1G hugepages=%d\n", NUM_PAGES);
 	}
 #endif
Gracefully skip the test if userfaultfd is not supported (ENOSYS) or not permitted (EPERM), instead of failing. This replaces misleading failures with clear skip messages.

--------------
Before Patch
--------------
~ running ./hugepage-mremap
...
~ Bail out! userfaultfd: Function not implemented
~ Planned tests != run tests (1 != 0)
~ Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
~ [FAIL]
not ok 4 hugepage-mremap # exit=1

--------------
After Patch
--------------
~ running ./hugepage-mremap
...
~ ok 2 # SKIP userfaultfd is not supported/not enabled.
~ 1 skipped test(s) detected.
~ Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0
~ [SKIP]
ok 4 hugepage-mremap # SKIP
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Co-developed-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 tools/testing/selftests/mm/hugepage-mremap.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/hugepage-mremap.c b/tools/testing/selftests/mm/hugepage-mremap.c
index c463d1c09c9b..2bd1dac75c3f 100644
--- a/tools/testing/selftests/mm/hugepage-mremap.c
+++ b/tools/testing/selftests/mm/hugepage-mremap.c
@@ -65,10 +65,20 @@ static void register_region_with_uffd(char *addr, size_t len)
 	struct uffdio_api uffdio_api;

 	/* Create and enable userfaultfd object. */
-
 	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
-	if (uffd == -1)
-		ksft_exit_fail_msg("userfaultfd: %s\n", strerror(errno));
+	if (uffd == -1) {
+		switch (errno) {
+		case EPERM:
+			ksft_exit_skip("Insufficient permissions, try running as root.\n");
+			break;
+		case ENOSYS:
+			ksft_exit_skip("userfaultfd is not supported/not enabled.\n");
+			break;
+		default:
+			ksft_exit_fail_msg("userfaultfd failed with %s\n", strerror(errno));
+			break;
+		}
+	}

 	uffdio_api.api = UFFD_API;
 	uffdio_api.features = 0;