Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone

15 Mar 2018


      On Thu, Mar 15, 2018 at 12:50 PM, Michal Hocko mhocko@kernel.org wrote:
...
On Thu 15-03-18 02:30:41, Daniel Vacek wrote:
...
On Wed, Mar 14, 2018 at 3:17 PM, Michal Hocko mhocko@kernel.org wrote:
...
On Tue 13-03-18 23:42:40, Daniel Vacek wrote:
...
On some architectures (reported on arm64) commit 864b75f9d6b01 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
causes a boot hang. This patch fixes the hang making sure the alignment
never steps back.
I am sorry to be complaining again, but the code is so obscure that I
No worries, I'm glad for any review. Which code exactly you do find
obscure? This patch or my former fix or the original commit
introducing memblock_next_valid_pfn()? Coz I'd agree the original
commit looks pretty obscure...
As mentioned in the other email, the whole going back and forth in the
same loop is just too ugly to live.
It's not really supposed to go back, but I guess you understand.
...
...
...
would _really_ appreciate some more information about what is going
on here. memblock_next_valid_pfn will most likely return a pfn within
the same memblock and the alignment will move it before the old pfn
which is not valid - so the block has some holes. Is that correct?
I do not understand what you mean by 'pfn within the same memblock'?
Sorry, I should have said in the same pageblock
...
And by 'the block has some holes'?
memblock_next_valid_pfn clearly returns pfn which is within a pageblock
and that is why we do not initialize pages in the begining of the block
while move_freepages_block does really expect the full pageblock to be
initialized properly. That is the fundamental problem, right?
Yes, that's correct.
...
...
memblock has types 'memory' (as usable memory) and 'reserved' (for
unusable mem), if I understand correctly.
We might not have struct pages for invalid pfns. That really depends on
the memory mode. Sure sparse mem model will usually allocate struct
pages for whole memory sections but that is not universally true and
adding such a suble assumption is simply wrong.
This is gray area for me. But if I understand correctly this
assumption comes from the code. It was already there and got broken
hence I was trying to keep it. If anything needs redesigning I'm all
for it. But I was just calming the fire here. I only didn't test on
arm, which seems to be the only one different.
...
I suspect you are making strong assumptions based on a very specific
implementation which might be not true in general. That was the feeling
I've had since the patch was proposed for the first time. This is such a
cluttered area that I am not really sure myself, thoug.
I understand. And again this is likely correct. I'll be glad for any
assistance here. My limited knowledge is the primary cause for lack of
relevant details I guess. What I checked looks like pfn_valid is a
generic function used by all arches but arm, which seems to be the
only one to implement CONFIG_HAVE_ARCH_PFN_VALID if I didn't miss
anything. So if this config is enabled on arm, it uses it's own
version of pfn_valid(). If not, I'd expect all arches behave the same.
That's where my assumption comes from.
...
--
Michal Hocko
SUSE Labs

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone