The patch titled Subject: mm: don't allow deferred pages with NEED_PER_CPU_KM has been added to the -mm tree. Its filename is mm-dont-allow-deferred-pages-with-need_per_cpu_km.patch
This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-dont-allow-deferred-pages-with-n... and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-dont-allow-deferred-pages-with-n...
Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated there every 3-4 working days
------------------------------------------------------ From: Pavel Tatashin pasha.tatashin@oracle.com Subject: mm: don't allow deferred pages with NEED_PER_CPU_KM
It is unsafe to do virtual to physical translations before mm_init() is called if struct page is needed in order to determine the memory section number (see SECTION_IN_PAGE_FLAGS). This is because only in mm_init() we initialize struct pages for all the allocated memory when deferred struct pages are used.
My recent fix c9e97a1997 ("mm: initialize pages on demand during boot") exposed this problem, because it greatly reduced number of pages that are initialized before mm_init(), but the problem existed even before my fix, as Fengguang Wu found.
Below is a more detailed explanation of the problem.
We initialize struct pages in four places:
1. Early in boot a small set of struct pages is initialized to fill the first section, and lower zones.
2. During mm_init() we initialize "struct pages" for all the memory that is allocated, i.e reserved in memblock.
3. Using on-demand logic when pages are allocated after mm_init call (when memblock is finished)
4. After smp_init() when the rest free deferred pages are initialized.
The problem occurs if we try to do va to phys translation of a memory between steps 1 and 2. Because we have not yet initialized struct pages for all the reserved pages, it is inherently unsafe to do va to phys if the translation itself requires access of "struct page" as in case of this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP
The following path exposes the problem:
start_kernel() trap_init() setup_cpu_entry_areas() setup_cpu_entry_area(cpu) get_cpu_gdt_paddr(cpu) per_cpu_ptr_to_phys(addr) pcpu_addr_to_page(addr) virt_to_page(addr) pfn_to_page(__pa(addr) >> PAGE_SHIFT)
We disable this path by not allowing NEED_PER_CPU_KM with deferred struct pages feature.
The problems are discussed in these threads: http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.... http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.... http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set") Signed-off-by: Pavel Tatashin pasha.tatashin@oracle.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Steven Sistare steven.sistare@oracle.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Fengguang Wu fengguang.wu@intel.com Cc: Dennis Zhou dennisszhou@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org ---
mm/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff -puN mm/Kconfig~mm-dont-allow-deferred-pages-with-need_per_cpu_km mm/Kconfig --- a/mm/Kconfig~mm-dont-allow-deferred-pages-with-need_per_cpu_km +++ a/mm/Kconfig @@ -636,6 +636,7 @@ config DEFERRED_STRUCT_PAGE_INIT default n depends on NO_BOOTMEM depends on !FLATMEM + depends on !NEED_PER_CPU_KM help Ordinarily all struct pages are initialised during early boot in a single thread. On very large machines this can take a considerable _
Patches currently in -mm which might be from pasha.tatashin@oracle.com are
mm-dont-allow-deferred-pages-with-need_per_cpu_km.patch sparc64-ng4-memset-32-bits-overflow.patch