In add_memory_resource() we hotplug the new node if required and set it online. Memory might get onlined later, via online_pages().
You are correct. In case of memory hot add, it is true. But in case of adding CPU with memoryless node, try_node_online() will be called only during CPU onlining, see cpu_up().
Is there any reason why try_online_node() resides in cpu_up() and not in add_cpu()? I think it would be correct to online node during the CPU hot add to align with memory hot add.
I am not familiar with cpu hotplug, but this doesn't seem to be anything new so how come this became problem only now?
So IIUC, the issue is that we have a node
a) That has no memory b) That is offline
This node will get onlined when onlining the CPU as Alexey says. Yet we have some code that stumbles over the node and goes ahead trying to use the pgdat -- that code is broken.
You are correct.
If we take a look at build_zonelists() we indeed skip any !node_online(node). Any other code should do the same. If the node is not online, it shall be ignored because we might not even have a pgdat yet -- see hotadd_new_pgdat(). Without node_online(), the pgdat might be stale or non-existant.
Agree, alloc_pages_node() should also do the same. Not exactly to skip the node, but to fallback to another node if !node_online(node). alloc_pages_node() can also be hit while onlining the node, creating chicken-egg problem, see below.
The node onlining logic when onlining a CPU sounds bogus as well: Let's take a look at try_offline_node(). It checks that:
- That no memory is *present*
- That no CPU is *present*
We should online the node when adding the CPU ("present"), not when onlining the CPU.
Possible. Assuming try_online_node was moved under add_cpu(), let’s take look on this call stack: add_cpu() try_online_node() __try_online_node() hotadd_new_pgdat() At line 1190 we'll have a problem: 1183 pgdat = NODE_DATA(nid); 1184 if (!pgdat) { 1185 pgdat = arch_alloc_nodedata(nid); 1186 if (!pgdat) 1187 return NULL; 1188 1189 pgdat->per_cpu_nodestats = 1190 alloc_percpu(struct per_cpu_nodestat); 1191 arch_refresh_nodedata(nid, pgdat);
alloc_percpu() will go for all possible CPUs and will eventually end up calling alloc_pages_node() trying to use subject nid for corresponding CPU hitting the same state #2 problem as NODE_DATA(nid) is still NULL and nid is not yet online.
I like the idea of onlining the node when adding the CPU rather then when CPU get online. It will require current patch or another solution to resolve described above chicken-egg problem.
PS, earlier this year I initiated discussion about redesigning per_cpu allocator to do not allocate/waste memory chunks for not present CPUs, but it has another complications.
Thanks, —Alexey