From: Mike Rapoport rppt@linux.ibm.com
commit bde9cfa3afe4324ec251e4af80ebf9b7afaf7afe upstream.
Patch series "mm: fix initialization of struct page for holes in memory layout", v3.
Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") exposed several issues with the memory map initialization and these patches fix those issues.
Initially there were crashes during compaction that Qian Cai reported back in April [1]. It seemed back then that the problem was fixed, but a few weeks ago Andrea Arcangeli hit the same bug [2] and there was an additional discussion at [3].
[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw [2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com [3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-found...
This patch (of 2):
The first 4Kb of memory is a BIOS owned area and to avoid its allocation for the kernel it was not listed in e820 tables as memory. As the result, pfn 0 was never recognised by the generic memory management and it is not a part of neither node 0 nor ZONE_DMA.
If set_pfnblock_flags_mask() would be ever called for the pageblock corresponding to the first 2Mbytes of memory, having pfn 0 outside of ZONE_DMA would trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
Along with reserving the first 4Kb in e820 tables, several first pages are reserved with memblock in several places during setup_arch(). These reservations are enough to ensure the kernel does not touch the BIOS area and it is not necessary to remove E820_TYPE_RAM for pfn 0.
Remove the update of e820 table that changes the type of pfn 0 and move the comment describing why it was done to trim_low_memory_range() that reserves the beginning of the memory.
Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org Signed-off-by: Mike Rapoport rppt@linux.ibm.com Cc: Baoquan He bhe@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: David Hildenbrand david@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Mel Gorman mgorman@suse.de Cc: Michal Hocko mhocko@kernel.org Cc: Qian Cai cai@lca.pw Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/setup.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-)
--- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -666,17 +666,6 @@ static void __init trim_platform_memory_ static void __init trim_bios_range(void) { /* - * A special case is the first 4Kb of memory; - * This is a BIOS owned area, not kernel ram, but generally - * not listed as such in the E820 table. - * - * This typically reserves additional memory (64KiB by default) - * since some BIOSes are known to corrupt low memory. See the - * Kconfig help text for X86_RESERVE_LOW. - */ - e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED); - - /* * special case: Some BIOSes report the PC BIOS * area (640Kb -> 1Mb) as RAM even though it is not. * take them out. @@ -733,6 +722,15 @@ early_param("reservelow", parse_reservel
static void __init trim_low_memory_range(void) { + /* + * A special case is the first 4Kb of memory; + * This is a BIOS owned area, not kernel ram, but generally + * not listed as such in the E820 table. + * + * This typically reserves additional memory (64KiB by default) + * since some BIOSes are known to corrupt low memory. See the + * Kconfig help text for X86_RESERVE_LOW. + */ memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE)); }