On Sat, Jan 30, 2021 at 04:37:54PM -0800, Linus Torvalds wrote:
On Sat, Jan 30, 2021 at 2:10 PM Mike Rapoport <rppt@kernel.org> wrote:
In either case, e820__memblock_setup() won't add the range 0x0000 - 0x1000 to memblock.memory, and later, during memory map initialization, this range is left outside any zone.
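A toy userspace model (not kernel code) may make the failure mode concrete. It assumes a simplified e820 table in which the first page is typed as reserved, and it mimics the behavior described above by registering only usable entries as "memory". The types and helpers (toy_e820_entry, toy_region, toy_e820_to_memory, toy_covered) are hypothetical stand-ins, not the real e820 or memblock API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for e820 range types. */
enum toy_e820_type { TOY_E820_RAM, TOY_E820_RESERVED };

struct toy_e820_entry {
    uint64_t addr;
    uint64_t size;
    enum toy_e820_type type;
};

/* A [base, base + size) physical range, loosely like a memblock region. */
struct toy_region {
    uint64_t base;
    uint64_t size;
};

/*
 * Mimic the behavior described above: only usable (RAM) entries are
 * registered as memory; reserved entries are skipped entirely.
 * Returns the number of regions written to out (at most max_out).
 */
size_t toy_e820_to_memory(const struct toy_e820_entry *tbl, size_t n,
                          struct toy_region *out, size_t max_out)
{
    size_t cnt = 0;

    for (size_t i = 0; i < n && cnt < max_out; i++) {
        if (tbl[i].type != TOY_E820_RAM)
            continue;
        out[cnt].base = tbl[i].addr;
        out[cnt].size = tbl[i].size;
        cnt++;
    }
    return cnt;
}

/* Return 1 if physical address addr lies inside any of the regions. */
int toy_covered(const struct toy_region *r, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].base && addr - r[i].base < r[i].size)
            return 1;
    return 0;
}
```

With a table of { 0x0000, 0x1000, reserved } followed by { 0x1000, 0x9e000, RAM }, only the second entry is registered, so toy_covered() reports address 0x0 as outside memory while 0x1000 is inside.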
Honestly, this just sounds like memblock being stupid in the first place.
Why aren't these zones padded to sane alignments?
The implicit alignment of zones would be a guess. What alignment would be sane here? 1M? MAX_ORDER? pageblock_order?
If an architecture reports its memory as starting at X and we derive node[0]->node_start_pfn and zone[0]->zone_start_pfn from, say, round_down(X, 1M), I'm not sure that wouldn't cause boot failures on some system out there in the wild.
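The padding under discussion is simple arithmetic; the open question is only which alignment to pick. The sketch below assumes 4 KiB pages and takes the alignment as a parameter, since 1M, MAX_ORDER and pageblock_order are all mentioned as candidates. The helper names are hypothetical; toy_round_down() follows the same power-of-two masking idea as the kernel's round_down() macro.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12            /* assumes 4 KiB pages, as on x86 */
#define TOY_SZ_1M      (1ULL << 20)

/* Power-of-two round-down: clear the low bits below align. */
static inline uint64_t toy_round_down(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

/* Start pfn of a node/zone if its start address were padded down to align. */
static inline uint64_t toy_padded_start_pfn(uint64_t start_addr, uint64_t align)
{
    return toy_round_down(start_addr, align) >> TOY_PAGE_SHIFT;
}

/* Pages the padding pulls in that the architecture never reported. */
static inline uint64_t toy_padding_pages(uint64_t start_addr, uint64_t align)
{
    return (start_addr >> TOY_PAGE_SHIFT) - toy_padded_start_pfn(start_addr, align);
}
```

For memory reported at 0x1000, padding to 1M moves the start pfn from 1 to 0, pulling in exactly one unreported page; for memory reported at 0x123000 it pulls in 35 pages. Those extra pages are the ones whose treatment by real firmware is the guess.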
This patch smells like working around the memblock code being fragile rather than a real fix.
That's *particularly* true when the very line above it did a "memblock_reserve()" of the exact same range that the memblock_add() "adds".
The most correct thing to do would have been to call

	memblock_add(0, end_of_first_memory_bank);

somewhere in e820__memblock_setup().
But that would mean we also have to change the way e820__memblock_setup() reserves memory, and that seemed to me like really asking for trouble, so I've limited the registration of memory to the range that is certainly reserved.
Part of the problem is that x86 adds only usable memory to memblock.memory, omitting holes and reserved areas, while free_area_init() presumes that memblock.memory covers all populated physical memory.
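The mismatch can be sketched as a count of pages that are reserved but absent from memory: under the presumption above, such pages exist physically yet are invisible when zones are sized. This is a toy model, not kernel code; toy_region, toy_contains and toy_count_invisible_pfns are hypothetical, and the flat linear scan stands in for the kernel's real memblock iterators.

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* A [base, base + size) physical range, loosely like a memblock region. */
struct toy_region {
    uint64_t base;
    uint64_t size;
};

/* Return 1 if physical address addr lies inside any of the regions. */
static int toy_contains(const struct toy_region *r, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].base && addr - r[i].base < r[i].size)
            return 1;
    return 0;
}

/*
 * Count pfns in [start_pfn, end_pfn) that appear in the reserved list
 * but not in the memory list, i.e. pages a zone setup that trusts only
 * the memory list would not account for.
 */
uint64_t toy_count_invisible_pfns(const struct toy_region *mem, size_t nmem,
                                  const struct toy_region *rsv, size_t nrsv,
                                  uint64_t start_pfn, uint64_t end_pfn)
{
    uint64_t cnt = 0;

    for (uint64_t pfn = start_pfn; pfn < end_pfn; pfn++) {
        uint64_t addr = pfn << TOY_PAGE_SHIFT;

        if (toy_contains(rsv, nrsv, addr) && !toy_contains(mem, nmem, addr))
            cnt++;
    }
    return cnt;
}
```

With memory at [0x1000, 0x9f000) and reserved ranges at page 0 and page 2, only page 0 is counted: page 2 is reserved but also registered as memory, so it is still accounted for.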
I've tried implicitly adding ranges from memblock.reserved to memblock.memory if they were not already there, and it broke some arm machines:
https://lore.kernel.org/lkml/127999c4-7d56-0c36-7f88-8e1a5c934cae@collabora....
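The idea of implicitly adding reserved ranges to memory can be sketched as below. This is a hypothetical toy, not the patch from the linked thread, and it does not attempt to model why some arm machines broke. It also simplifies heavily: a reserved range counts as present only if a single memory region fully contains it, and missing ranges are appended without merging, whereas the kernel's memblock_add() merges overlapping and adjacent regions.

```c
#include <stdint.h>
#include <stddef.h>

/* A [base, base + size) physical range, loosely like a memblock region. */
struct toy_region {
    uint64_t base;
    uint64_t size;
};

/* Return 1 if [base, base + size) lies entirely within one region of r. */
static int toy_range_covered(const struct toy_region *r, size_t n,
                             uint64_t base, uint64_t size)
{
    for (size_t i = 0; i < n; i++)
        if (base >= r[i].base && base + size <= r[i].base + r[i].size)
            return 1;
    return 0;
}

/*
 * Append every reserved range not already covered by memory to the
 * memory array (capacity max_mem). Returns the new number of memory
 * regions. No sorting or merging is done here.
 */
size_t toy_add_missing_reserved(struct toy_region *mem, size_t nmem,
                                size_t max_mem,
                                const struct toy_region *rsv, size_t nrsv)
{
    for (size_t i = 0; i < nrsv && nmem < max_mem; i++)
        if (!toy_range_covered(mem, nmem, rsv[i].base, rsv[i].size))
            mem[nmem++] = rsv[i];
    return nmem;
}
```

Applied to memory [0x1000, 0x9f000) with reserved ranges at page 0 and page 2, only page 0 is appended, since page 2 is already covered.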
I do feel that free_area_init() is fragile, and no doubt there is room for improvement there. But I think the safer way forward is to reduce the inconsistencies between arch and generic code, so that we won't need to guess what the memory layout is at free_area_init() time.
Linus