Yes, but a zonelist cannot be correct for an offline node, where we might not even have an allocated pgdat yet. No pgdat, no zonelist. So as soon as we allocate the pgdat and set the node online (->hotadd_new_pgdat()), the zonelists have to be correct. And I can spot a build_all_zonelists() in hotadd_new_pgdat().
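For reference, the flow is roughly the following (a heavily simplified sketch of mm/memory_hotplug.c; details differ between kernel versions, so take it with a grain of salt):

static pg_data_t *hotadd_new_pgdat(int nid)
{
	pg_data_t *pgdat = NODE_DATA(nid);

	/* No pgdat yet? Allocate one before the node can go online. */
	if (!pgdat)
		pgdat = arch_alloc_nodedata(nid);
	...
	/* Zonelists have to be valid before anyone allocates from this node. */
	build_all_zonelists(pgdat);
	return pgdat;
}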
Yes, that is what I had in mind. We are talking about two things here: memoryless nodes and offline nodes. The latter sounds like a bug to me.
Agreed. Memoryless nodes should just have proper zonelists -- which seems to be the case.
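As a quick (hypothetical) sketch of what a proper zonelist buys us: for a memoryless preferred nid, the first usable zone in its zonelist simply lives on another node, so the allocator falls back transparently:

struct zonelist *zl = node_zonelist(nid, GFP_KERNEL);
struct zoneref *z = first_zones_zonelist(zl, gfp_zone(GFP_KERNEL), NULL);
/* For a memoryless node, zone_to_nid(z->zone) != nid: the zonelist
 * starts at the nearest node that actually has memory. */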
Maybe __alloc_pages_bulk() and alloc_pages_node() should bail out directly (VM_BUG_ON()) when we're given an offline node, with possibly no/stale pgdat, as the preferred nid.
Historically, those allocation interfaces have not tried to be robust against wrong inputs, because that adds CPU cycles for everybody just to handle "what if buggy" callers. This has worked (surprisingly) well. Memoryless nodes have brought in some confusion, but that is still something we can address at a higher level. Nobody gives arbitrary nodes as an input. cpu_to_node() might be tricky because it can point to a memoryless node, which, combined with __GFP_THISNODE, is very likely not something anybody wants. Hence cpu_to_mem() should be used for allocations. I hate that we have two very similar APIs...
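A hedged sketch of the difference (the exact behavior depends on CONFIG_HAVE_MEMORYLESS_NODES):

int nid = cpu_to_node(cpu);	/* may be a memoryless node */
int mem = cpu_to_mem(cpu);	/* nearest node that actually has memory */

/* With __GFP_THISNODE, only the memory-bearing node can ever succeed: */
struct page *page = alloc_pages_node(cpu_to_mem(cpu),
				     GFP_KERNEL | __GFP_THISNODE, 0);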
To be precise, I'm wondering if we should do:
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 55b2ec1f965a..8c49b88336ee 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -565,7 +565,7 @@ static inline struct page *
 __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 {
 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
-	VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid));
+	VM_WARN_ON(!node_online(nid));
 
 	return __alloc_pages(gfp_mask, order, nid, NULL);
 }
(Or maybe VM_BUG_ON().)
Because it cannot possibly work and we'll dereference NULL later.
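Concretely, the zonelist lookup dereferences the pgdat unconditionally (from include/linux/gfp.h, quoted from memory, so double-check against your tree):

static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
{
	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
	/* NODE_DATA(nid) is NULL for an offline node without a pgdat. */
}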
But something seems wrong in this case. cpu_to_node() shouldn't return offline nodes; that is just a land mine. It is not clear to me how the CPU was brought up such that the NUMA node allocation was left behind. As pointed out in another email, add_cpu()/cpu_up() is not it. Is it possible that the CPU bring-up was only halfway done?
I tried to follow the code (what sets a CPU present, what sets a CPU online, when the cpu_to_node() mapping gets updated), and IMHO it's all a big mess. Maybe it's clearer to people familiar with that code, but CPU hotplug in general seems to be a confusing piece of (arch-specific) code.
Also, I have no clue whether the cpu_to_node() mapping gets invalidated after unplugging that CPU, or whether the mapping simply stays around for all eternity ...