On Tue, Aug 12, 2025 at 05:59:02PM +0900, Harry Yoo wrote:
On Mon, Aug 11, 2025 at 12:46:32PM +0100, Lorenzo Stoakes wrote:
On Mon, Aug 11, 2025 at 02:34:20PM +0900, Harry Yoo wrote:
Define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to ensure page tables are properly synchronized when calling p*d_populate_kernel(). It is inteneded to synchronize page tables via pgd_pouplate_kernel() when 5-level paging is in use and via p4d_pouplate_kernel() when 4-level paging is used.
I think it's worth mentioning here that pgd_populate() is a no-op in 4-level systems, so the sychronisation must occur at the P4D level, just to make this clear.
Yeah, that's indeed confusing and agree that it's worth mentioning. Will do. The new one:
Define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to ensure page tables are properly synchronized when calling p*d_populate_kernel().
For 5-level paging, synchronization is performed via pgd_populate_kernel(). In 4-level paging, pgd_populate() is a no-op, so synchronization is instead performed at the P4D level via p4d_populate_kernel().
That's great thanks!
This fixes intermittent boot failures on systems using 4-level paging and a large amount of persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP NOPTI RIP: 0010:__init_single_page+0x9/0x6d Call Trace:
<TASK> __init_zone_device_page+0x17/0x5d memmap_init_zone_device+0x154/0x1bb pagemap_range+0x2e0/0x40f memremap_pages+0x10b/0x2f0 devm_memremap_pages+0x1e/0x60 dev_dax_probe+0xce/0x2ec [device_dax] dax_bus_probe+0x6d/0xc9 [... snip ...] </TASK>
It also fixes a crash in vmemmap_set_pmd() caused by accessing vmemmap before sync_global_pgds() [1]:
BUG: unable to handle page fault for address: ffffeb3ff1200000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI Tainted: [W]=WARN RIP: 0010:vmemmap_set_pmd+0xff/0x230
<TASK> vmemmap_populate_hugepages+0x176/0x180 vmemmap_populate+0x34/0x80 __populate_section_memmap+0x41/0x90 sparse_add_section+0x121/0x3e0 __add_pages+0xba/0x150 add_pages+0x1d/0x70 memremap_pages+0x3dc/0x810 devm_memremap_pages+0x1c/0x60 xe_devm_add+0x8b/0x100 [xe] xe_tile_init_noalloc+0x6a/0x70 [xe] xe_device_probe+0x48c/0x740 [xe] [... snip ...]
Cc: stable@vger.kernel.org Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") Closes: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@int... [1] Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Harry Yoo harry.yoo@oracle.com
Other than nitty comments, this looks good to me, so:
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Thanks!
arch/x86/include/asm/pgtable_64_types.h | 3 +++ arch/x86/mm/init_64.c | 5 +++++ 2 files changed, 8 insertions(+)
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 4604f924d8b8..7eb61ef6a185 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -36,6 +36,9 @@ static inline bool pgtable_l5_enabled(void) #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57) #endif /* USE_EARLY_PGTABLE_L5 */
+#define ARCH_PAGE_TABLE_SYNC_MASK \
- (pgtable_l5_enabled() ? PGTBL_PGD_MODIFIED : PGTBL_P4D_MODIFIED)
extern unsigned int pgdir_shift; extern unsigned int ptrs_per_p4d;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 76e33bd7c556..a78b498c0dc3 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -223,6 +223,11 @@ static void sync_global_pgds(unsigned long start, unsigned long end) sync_global_pgds_l4(start, end); }
Worth a comment to say 'if 4-level, then we synchronise at P4D level by convention, however the same sync_global_pgds() applies'?
Maybe:
/*
- Make kernel mappings visible in all page tables in the system.
- This is necessary except when the init task populates kernel mappings
- during the boot process. In that case, all processes originating from
- the init task copies the kernel mappings, so there is no issue.
- Otherwise, missing synchronization could lead to kernel crashes due
- to missing page table entries for certain kernel mappings.
- Synchronization is performed at the top level, which is the PGD in
- 5-level paging systems. But in 4-level paging systems, however,
- pgd_populate() is a no-op, so synchronization is done at P4D level instead.
- sync_global_pgds() handles this difference between paging levels.
*/
That's great also, thanks!
-- Cheers, Harry / Hyeonggon
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end) +{
- sync_global_pgds(start, end);
+}
/*
- NOTE: This function is marked __ref because it calls __init function
- (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.
-- 2.43.0