When Linux memory is not aligned to the pageblock size and a zone contains holes, the 5.4-lts arm kernel might crash in move_freepages(), as Kefeng Wang reported in [1]. Backporting the upstream fix commits by Mike Rapoport [2] to 5.4 fixes this issue.
free_unused_memmap() for arm and arm64 was moved to the generic mm/memblock code in the upstream commit below, so I applied the first two patches to free_unused_memmap() in arch/arm/mm/init.c.
(4f5b0c178996 arm, arm64: move free_unused_memmap() to generic mm)
[1] https://lore.kernel.org/lkml/2a1592ad-bc9d-4664-fd19-f7448a37edc0@huawei.com... [2] https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/#t
Mike Rapoport (5):
  memblock: free_unused_memmap: use pageblock units instead of MAX_ORDER
  memblock: align freed memory map on pageblock boundaries with SPARSEMEM
  memblock: ensure there is no overflow in memblock_overlaps_region()
  arm: extend pfn_valid to take into account freed memory map alignment
  arm: ioremap: don't abuse pfn_valid() to check if pfn is in RAM
 arch/arm/mm/init.c    | 37 +++++++++++++++++++++++++------------
 arch/arm/mm/ioremap.c |  4 +++-
 mm/memblock.c         |  3 ++-
 3 files changed, 30 insertions(+), 14 deletions(-)
From: Mike Rapoport <rppt@linux.ibm.com>
commit e2a86800d58639b3acde7eaeb9eb393dca066e08 upstream.
The code that frees the unused memory map rounds the start and end of the freed holes to MAX_ORDER_NR_PAGES to preserve continuity of the memory map for MAX_ORDER regions.
Lots of core memory management functionality relies on homogeneity of the memory map within each pageblock, whose size may differ from MAX_ORDER in certain configurations.
Although currently, for the architectures that use free_unused_memmap(), pageblock_order and MAX_ORDER are equivalent, it is cleaner to use common notation throughout mm code.
Replace MAX_ORDER_NR_PAGES with pageblock_nr_pages and update the comments to make it clearer why the alignment to pageblock boundaries is required.
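To illustrate the rounding, here is a minimal user-space sketch (not kernel code; it assumes 4 KiB pages and pageblock_order == 10, so pageblock_nr_pages == 1024):

#include <stdio.h>

/* Assumed values: 4 KiB pages, pageblock_order == 10. */
#define PAGEBLOCK_NR_PAGES 1024UL
#define round_down(x, y) ((x) & ~((y) - 1))	/* y must be a power of 2 */

int main(void)
{
	unsigned long hole_start_pfn = 0x12345;

	/* Rounding the start of a freed hole down to the pageblock
	 * boundary keeps the memmap entries for [0x12000, 0x12345),
	 * so the memory map has no holes inside the pageblock. */
	printf("hole is freed from PFN 0x%lx\n",
	       round_down(hole_start_pfn, PAGEBLOCK_NR_PAGES));
	return 0;
}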
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/
[backport upstream modification in mm/memblock.c to arch/arm/mm/init.c]
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
---
 arch/arm/mm/init.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 7ea4d3b43444..6905dd8bc03f 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -381,11 +381,11 @@ static void __init free_unused_memmap(void)
 			    ALIGN(prev_end, PAGES_PER_SECTION));
 #else
 		/*
-		 * Align down here since the VM subsystem insists that the
-		 * memmap entries are valid from the bank start aligned to
-		 * MAX_ORDER_NR_PAGES.
+		 * Align down here since many operations in VM subsystem
+		 * presume that there are no holes in the memory map inside
+		 * a pageblock
 		 */
-		start = round_down(start, MAX_ORDER_NR_PAGES);
+		start = round_down(start, pageblock_nr_pages);
 #endif
 		/*
 		 * If we had a previous bank, and there is a space
@@ -395,12 +395,12 @@ static void __init free_unused_memmap(void)
 			free_memmap(prev_end, start);
 
 		/*
-		 * Align up here since the VM subsystem insists that the
-		 * memmap entries are valid from the bank end aligned to
-		 * MAX_ORDER_NR_PAGES.
+		 * Align up here since many operations in VM subsystem
+		 * presume that there are no holes in the memory map inside
+		 * a pageblock
 		 */
 		prev_end = ALIGN(memblock_region_memory_end_pfn(reg),
-				 MAX_ORDER_NR_PAGES);
+				 pageblock_nr_pages);
 	}
 
 #ifdef CONFIG_SPARSEMEM
On Mon, Dec 13, 2021 at 04:57:06PM +0800, Mark-PK Tsai wrote:
> Subject: [PATCH 5.4 1/5] memblock: free_unused_memmap: use pageblock units instead of MAX_ORDER
I'd replace memblock: with arm: in the subject. I believe it's clearer this way.
The same applies for the second patch in the series and for 5.10 posting.
From: Mike Rapoport <rppt@linux.ibm.com>
commit f921f53e089a12a192808ac4319f28727b35dc0f upstream.
When CONFIG_SPARSEMEM=y, the ranges of the memory map that are freed are not aligned to the pageblock boundaries, which breaks assumptions about homogeneity of the memory map throughout core mm code.
Make sure that the freed memory map is always aligned on pageblock boundaries regardless of the memory model selection.
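A minimal user-space sketch of the SPARSEMEM tail case (not kernel code; the pageblock and section sizes are assumed values):

#include <stdio.h>

/* Assumed values: pageblock = 1024 pages (4 MiB with 4 KiB pages),
 * section = 65536 pages (256 MiB). */
#define PAGEBLOCK_NR_PAGES 1024UL
#define PAGES_PER_SECTION  65536UL
#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long prev_end = 0x4f900;	/* end PFN of memory, mid-pageblock */

	/* Before the patch the tail of the section was freed starting
	 * right at prev_end, leaving a hole that begins in the middle
	 * of a pageblock.  With the patch, prev_end is first rounded
	 * up to a pageblock boundary: */
	prev_end = ALIGN(prev_end, PAGEBLOCK_NR_PAGES);

	printf("free memmap for [0x%lx, 0x%lx)\n",
	       prev_end, ALIGN(prev_end, PAGES_PER_SECTION));
	return 0;
}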
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/
[backport upstream modification in mm/memblock.c to arch/arm/mm/init.c]
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
---
 arch/arm/mm/init.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 6905dd8bc03f..c0e70e643f92 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -379,14 +379,14 @@ static void __init free_unused_memmap(void)
 		 */
 		start = min(start,
 			    ALIGN(prev_end, PAGES_PER_SECTION));
-#else
+#endif
 		/*
 		 * Align down here since many operations in VM subsystem
 		 * presume that there are no holes in the memory map inside
 		 * a pageblock
 		 */
 		start = round_down(start, pageblock_nr_pages);
-#endif
+
 		/*
 		 * If we had a previous bank, and there is a space
 		 * between the current bank and the previous, free it.
@@ -404,9 +404,11 @@ static void __init free_unused_memmap(void)
 	}
 
 #ifdef CONFIG_SPARSEMEM
-	if (!IS_ALIGNED(prev_end, PAGES_PER_SECTION))
+	if (!IS_ALIGNED(prev_end, PAGES_PER_SECTION)) {
+		prev_end = ALIGN(prev_end, pageblock_nr_pages);
 		free_memmap(prev_end,
 			    ALIGN(prev_end, PAGES_PER_SECTION));
+	}
 #endif
 }
From: Mike Rapoport <rppt@linux.ibm.com>
commit 023accf5cdc1e504a9b04187ec23ff156fe53d90 upstream.
There may be an overflow in memblock_overlaps_region() if it is called with base and size such that
base + size > PHYS_ADDR_MAX
Make sure that memblock_overlaps_region() caps the size to prevent such overflow, and remove the now-duplicated call to memblock_cap_size() from memblock_is_region_reserved().
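A minimal user-space model of the wraparound and the cap (cap_size() here only mirrors the intent of the kernel's memblock_cap_size(); it is not the kernel function):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t phys_addr_t;
#define PHYS_ADDR_MAX UINT64_MAX

/* Clamp size so that base + size cannot overflow. */
static phys_addr_t cap_size(phys_addr_t base, phys_addr_t *size)
{
	if (*size > PHYS_ADDR_MAX - base)
		*size = PHYS_ADDR_MAX - base;
	return *size;
}

int main(void)
{
	phys_addr_t base = PHYS_ADDR_MAX - 0xfff;
	phys_addr_t size = 0x2000;

	/* Uncapped, the region end wraps around to a tiny value, so an
	 * overlap test comparing against base + size goes wrong: */
	printf("wrapped end: 0x%llx\n", (unsigned long long)(base + size));

	cap_size(base, &size);
	printf("capped size: 0x%llx\n", (unsigned long long)size);
	return 0;
}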
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
---
 mm/memblock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 4de2af293f47..e13003ed6ee7 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -164,6 +164,8 @@ bool __init_memblock memblock_overlaps_region(struct memblock_type *type,
 {
 	unsigned long i;
 
+	memblock_cap_size(base, &size);
+
 	for (i = 0; i < type->cnt; i++)
 		if (memblock_addrs_overlap(base, size, type->regions[i].base,
 					   type->regions[i].size))
@@ -1764,7 +1766,6 @@ bool __init_memblock memblock_is_region_memory(phys_addr_t base, phys_addr_t size)
  */
 bool __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t size)
 {
-	memblock_cap_size(base, &size);
 	return memblock_overlaps_region(&memblock.reserved, base, size);
 }
From: Mike Rapoport <rppt@linux.ibm.com>
commit a4d5613c4dc6d413e0733e37db9d116a2a36b9f3 upstream.
When the unused memory map is freed, the preserved part of the memory map is extended to match pageblock boundaries, because lots of core mm functionality relies on homogeneity of the memory map within pageblock boundaries.
Since pfn_valid() is used to check whether there is a valid memory map entry for a PFN, make it return true also for PFNs that have memory map entries even if there is no actual memory populated there.
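A minimal user-space sketch of the new check (not the kernel function; the memory layout and the 4 MiB pageblock size are assumed for illustration):

#include <stdio.h>

/* Hypothetical layout: one memory region [0x40000000, 0x4f900000),
 * 4 KiB pages, 4 MiB pageblocks. */
#define PAGEBLOCK_SIZE 0x400000UL
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

static const unsigned long mem_base = 0x40000000UL;
static const unsigned long mem_end  = 0x4f900000UL;	/* exclusive */

/* True if the pageblock containing addr overlaps the memory region,
 * i.e. a memory map entry was preserved for it. */
static int pfn_has_memmap(unsigned long addr)
{
	unsigned long blk = ALIGN_DOWN(addr, PAGEBLOCK_SIZE);
	return blk < mem_end && blk + PAGEBLOCK_SIZE > mem_base;
}

int main(void)
{
	/* 0x4fa00000 lies past the end of memory but inside the same
	 * pageblock [0x4f800000, 0x4fc00000) as the last present page,
	 * so its memmap entry was kept and the check returns 1. */
	printf("%d\n", pfn_has_memmap(0x4fa00000UL));
	/* 0x50000000 starts a fully absent pageblock: returns 0. */
	printf("%d\n", pfn_has_memmap(0x50000000UL));
	return 0;
}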
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
---
 arch/arm/mm/init.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index c0e70e643f92..c30b4b2f8de9 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -176,11 +176,22 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
 int pfn_valid(unsigned long pfn)
 {
 	phys_addr_t addr = __pfn_to_phys(pfn);
+	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
 
 	if (__phys_to_pfn(addr) != pfn)
 		return 0;
 
-	return memblock_is_map_memory(__pfn_to_phys(pfn));
+	/*
+	 * If address less than pageblock_size bytes away from a present
+	 * memory chunk there still will be a memory map entry for it
+	 * because we round freed memory map to the pageblock boundaries.
+	 */
+	if (memblock_overlaps_region(&memblock.memory,
+				     ALIGN_DOWN(addr, pageblock_size),
+				     pageblock_size))
+		return 1;
+
+	return 0;
 }
 EXPORT_SYMBOL(pfn_valid);
 #endif
From: Mike Rapoport <rppt@linux.ibm.com>
commit 024591f9a6e0164ec23301784d1e6d8f6cacbe59 upstream.
The semantics of pfn_valid() is to check presence of the memory map for a PFN, not whether a PFN is in RAM. The memory map may be present for a hole in the physical memory, and if such a hole corresponds to an MMIO range, __arm_ioremap_pfn_caller() will produce a WARN() and fail:
[ 2.863406] WARNING: CPU: 0 PID: 1 at arch/arm/mm/ioremap.c:287 __arm_ioremap_pfn_caller+0xf0/0x1dc
[ 2.864812] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-09882-ga180bd1d7e16 #1
[ 2.865263] Hardware name: Generic DT based system
[ 2.865711] Backtrace:
[ 2.866063] [<80b07e58>] (dump_backtrace) from [<80b080ac>] (show_stack+0x20/0x24)
[ 2.866633]  r7:00000009 r6:0000011f r5:60000153 r4:80ddd1c0
[ 2.866922] [<80b0808c>] (show_stack) from [<80b18df0>] (dump_stack_lvl+0x58/0x74)
[ 2.867117] [<80b18d98>] (dump_stack_lvl) from [<80b18e20>] (dump_stack+0x14/0x1c)
[ 2.867309]  r5:80118cac r4:80dc6774
[ 2.867404] [<80b18e0c>] (dump_stack) from [<80122fcc>] (__warn+0xe4/0x150)
[ 2.867583] [<80122ee8>] (__warn) from [<80b08850>] (warn_slowpath_fmt+0x88/0xc0)
[ 2.867774]  r7:0000011f r6:80dc6774 r5:00000000 r4:814c4000
[ 2.867917] [<80b087cc>] (warn_slowpath_fmt) from [<80118cac>] (__arm_ioremap_pfn_caller+0xf0/0x1dc)
[ 2.868158]  r9:00000001 r8:9ef00000 r7:80e8b0d4 r6:0009ef00 r5:00000000 r4:00100000
[ 2.868346] [<80118bbc>] (__arm_ioremap_pfn_caller) from [<80118df8>] (__arm_ioremap_caller+0x60/0x68)
[ 2.868581]  r9:9ef00000 r8:821b6dc0 r7:00100000 r6:00000000 r5:815d1010 r4:80118d98
[ 2.868761] [<80118d98>] (__arm_ioremap_caller) from [<80118fcc>] (ioremap+0x28/0x30)
[ 2.868958] [<80118fa4>] (ioremap) from [<8062871c>] (__devm_ioremap_resource+0x154/0x1c8)
[ 2.869169]  r5:815d1010 r4:814c5d2c
[ 2.869263] [<806285c8>] (__devm_ioremap_resource) from [<8062899c>] (devm_ioremap_resource+0x14/0x18)
[ 2.869495]  r9:9e9f57a0 r8:814c4000 r7:815d1000 r6:815d1010 r5:8177c078 r4:815cf400
[ 2.869676] [<80628988>] (devm_ioremap_resource) from [<8091c6e4>] (fsi_master_acf_probe+0x1a8/0x5d8)
[ 2.869909] [<8091c53c>] (fsi_master_acf_probe) from [<80723dbc>] (platform_probe+0x68/0xc8)
[ 2.870124]  r9:80e9dadc r8:00000000 r7:815d1010 r6:810c1000 r5:815d1010 r4:00000000
[ 2.870306] [<80723d54>] (platform_probe) from [<80721208>] (really_probe+0x1cc/0x470)
[ 2.870512]  r7:815d1010 r6:810c1000 r5:00000000 r4:815d1010
[ 2.870651] [<8072103c>] (really_probe) from [<807215cc>] (__driver_probe_device+0x120/0x1fc)
[ 2.870872]  r7:815d1010 r6:810c1000 r5:810c1000 r4:815d1010
[ 2.871013] [<807214ac>] (__driver_probe_device) from [<807216e8>] (driver_probe_device+0x40/0xd8)
[ 2.871244]  r9:80e9dadc r8:00000000 r7:815d1010 r6:810c1000 r5:812feaa0 r4:812fe994
[ 2.871428] [<807216a8>] (driver_probe_device) from [<80721a58>] (__driver_attach+0xa8/0x1d4)
[ 2.871647]  r9:80e9dadc r8:00000000 r7:00000000 r6:810c1000 r5:815d1054 r4:815d1010
[ 2.871830] [<807219b0>] (__driver_attach) from [<8071ee8c>] (bus_for_each_dev+0x88/0xc8)
[ 2.872040]  r7:00000000 r6:814c4000 r5:807219b0 r4:810c1000
[ 2.872194] [<8071ee04>] (bus_for_each_dev) from [<80722208>] (driver_attach+0x28/0x30)
[ 2.872418]  r7:810a2aa0 r6:00000000 r5:821b6000 r4:810c1000
[ 2.872570] [<807221e0>] (driver_attach) from [<8071f80c>] (bus_add_driver+0x114/0x200)
[ 2.872788] [<8071f6f8>] (bus_add_driver) from [<80722ec4>] (driver_register+0x98/0x128)
[ 2.873011]  r7:81011d0c r6:814c4000 r5:00000000 r4:810c1000
[ 2.873167] [<80722e2c>] (driver_register) from [<80725240>] (__platform_driver_register+0x2c/0x34)
[ 2.873408]  r5:814dcb80 r4:80f2a764
[ 2.873513] [<80725214>] (__platform_driver_register) from [<80f2a784>] (fsi_master_acf_init+0x20/0x28)
[ 2.873766] [<80f2a764>] (fsi_master_acf_init) from [<80f014a8>] (do_one_initcall+0x108/0x290)
[ 2.874007] [<80f013a0>] (do_one_initcall) from [<80f01840>] (kernel_init_freeable+0x1ac/0x230)
[ 2.874248]  r9:80e9dadc r8:80f3987c r7:80f3985c r6:00000007 r5:814dcb80 r4:80f627a4
[ 2.874456] [<80f01694>] (kernel_init_freeable) from [<80b19f44>] (kernel_init+0x20/0x138)
[ 2.874691]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:80b19f24
[ 2.874894]  r4:00000000
[ 2.874977] [<80b19f24>] (kernel_init) from [<80100170>] (ret_from_fork+0x14/0x24)
[ 2.875231] Exception stack(0x814c5fb0 to 0x814c5ff8)
[ 2.875535] 5fa0: 00000000 00000000 00000000 00000000
[ 2.875849] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2.876133] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 2.876363]  r5:80b19f24 r4:00000000
[ 2.876683] ---[ end trace b2f74b8536829970 ]---
[ 2.876911] fsi-master-acf gpio-fsi: ioremap failed for resource [mem 0x9ef00000-0x9effffff]
[ 2.877492] fsi-master-acf gpio-fsi: Error -12 mapping coldfire memory
[ 2.877689] fsi-master-acf: probe of gpio-fsi failed with error -12
Use memblock_is_map_memory() instead of pfn_valid() to check if a PFN is in RAM or not.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: a4d5613c4dc6 ("arm: extend pfn_valid to take into account freed memory map alignment")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
---
 arch/arm/mm/ioremap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index d42b93316183..513c26b46db3 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -27,6 +27,7 @@
 #include <linux/vmalloc.h>
 #include <linux/io.h>
 #include <linux/sizes.h>
+#include <linux/memblock.h>
 
 #include <asm/cp15.h>
 #include <asm/cputype.h>
@@ -301,7 +302,8 @@ static void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
 	 * Don't allow RAM to be mapped with mismatched attributes - this
 	 * causes problems with ARMv6+
 	 */
-	if (WARN_ON(pfn_valid(pfn) && mtype != MT_MEMORY_RW))
+	if (WARN_ON(memblock_is_map_memory(PFN_PHYS(pfn)) &&
+		    mtype != MT_MEMORY_RW))
 		return NULL;
 
 	area = get_vm_area_caller(size, VM_IOREMAP, caller);
On Mon, Dec 13, 2021 at 04:57:05PM +0800, Mark-PK Tsai wrote:
These look like they are also required in 5.10.y, right? Please also provide a backported series for that tree; we cannot have users moving to a newer kernel version and hitting regressions.
I can't take this series until then, sorry.
thanks,
greg k-h
On Mon, Dec 13, 2021 at 10:07:40AM +0100, Greg KH wrote:
Ah, now I see your 5.10 series, thanks. I'll go queue both of these series up now, thanks for the backports.
greg k-h