On Wed 08-12-21 08:19:16, Alexey Makhalov wrote:
Hi Michal,
On Dec 8, 2021, at 12:04 AM, Michal Hocko <mhocko@suse.com> wrote:
On Tue 07-12-21 17:17:27, Alexey Makhalov wrote:
On Dec 7, 2021, at 9:13 AM, David Hildenbrand <david@redhat.com> wrote:
On 07.12.21 18:02, Alexey Makhalov wrote:
On Dec 7, 2021, at 8:36 AM, Michal Hocko <mhocko@suse.com> wrote:
On Tue 07-12-21 17:27:29, Michal Hocko wrote:
[...]
> So your proposal is to drop set_node_online from the patch and add it as
> a separate one which handles
> - the sysfs part (i.e. do not register a node which doesn't span a
>   physical address space)
> - the hotplug side (drop the pgd allocation, register the node lazily
>   when the first memblocks are registered)
In other words, the first stage:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..f9024ba09c53 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6382,7 +6382,11 @@ static void __build_all_zonelists(void *data)
 	if (self && !node_online(self->node_id)) {
 		build_zonelists(self);
 	} else {
-		for_each_online_node(nid) {
+		/*
+		 * All possible nodes have pgdat preallocated
+		 * free_area_init
+		 */
+		for_each_node(nid) {
 			pg_data_t *pgdat = NODE_DATA(nid);
 
 			build_zonelists(pgdat);
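To make the behavioural change in the diff concrete, here is a small userspace model, not kernel code. Node masks are plain bitmasks with arbitrary values, and build_zonelists_model() is a hypothetical stand-in that only counts how often it runs per node. It shows that switching from for_each_online_node() to for_each_node() makes zonelists get built for possible-but-offline nodes as well, which in the kernel is only safe if every possible node already has a pgdat.

```c
#include <string.h>

#define MAX_NODES 8

/* Hypothetical masks: bit i set means node i is in the set. */
static unsigned int possible_mask = 0x0Fu; /* nodes 0-3 possible */
static unsigned int online_mask = 0x03u;   /* nodes 0-1 online */

/* Per-node call counter standing in for real zonelist construction. */
static int zonelists_built[MAX_NODES];

static void build_zonelists_model(int nid)
{
	zonelists_built[nid]++;
}

static void reset_model(void)
{
	memset(zonelists_built, 0, sizeof(zonelists_built));
}

/* Old behaviour: visit online nodes only. */
static void build_all_online(void)
{
	for (int nid = 0; nid < MAX_NODES; nid++)
		if (online_mask & (1u << nid))
			build_zonelists_model(nid);
}

/*
 * Proposed behaviour: visit every possible node. In the kernel this
 * relies on a pgdat being preallocated for each possible node; this
 * model does not check that.
 */
static void build_all_possible(void)
{
	for (int nid = 0; nid < MAX_NODES; nid++)
		if (possible_mask & (1u << nid))
			build_zonelists_model(nid);
}
```

With these masks, the online-only walk touches nodes 0 and 1, while the possible-node walk also touches offline nodes 2 and 3.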
Will it blow up memory usage for the nodes which might never be onlined? I prefer the idea of init on demand.
Even now there is an existing problem. In my experiments, I observed a _huge_ increase in memory consumption as the number of possible NUMA nodes grew. I’m going to report it in a separate mail thread.
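A minimal sketch of the init-on-demand alternative, under clearly stated assumptions: struct pgdat_model, node_data[], and add_memory_block_model() are hypothetical userspace stand-ins, not kernel APIs, and MAX_NUMNODES = 128 simply mirrors the VM setup described later in this thread. The per-node structure is allocated only when a node first receives memory, so a possible node that never gets memory costs only a NULL pointer.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Mirrors the 128 possible nodes of the test VM; illustrative only. */
#define MAX_NUMNODES 128

/* Hypothetical, heavily simplified stand-in for pg_data_t. */
struct pgdat_model {
	int node_id;
	unsigned long present_pages;
};

/* NULL until the node receives its first memory block. */
static struct pgdat_model *node_data[MAX_NUMNODES];

/* Return the node's structure, allocating it on first use.
 * Returns NULL for an invalid nid or on allocation failure. */
static struct pgdat_model *get_or_alloc_pgdat(int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES)
		return NULL;
	if (!node_data[nid]) {
		struct pgdat_model *p = calloc(1, sizeof(*p));

		if (!p)
			return NULL;
		p->node_id = nid;
		node_data[nid] = p;
	}
	return node_data[nid];
}

/* Hypothetical hook called when a memory block is hot-added to nid. */
static bool add_memory_block_model(int nid, unsigned long pages)
{
	struct pgdat_model *p = get_or_alloc_pgdat(nid);

	if (!p)
		return false;
	p->present_pages += pages;
	return true;
}

/* Number of nodes that actually had a structure allocated. */
static int count_allocated(void)
{
	int n = 0;

	for (int i = 0; i < MAX_NUMNODES; i++)
		if (node_data[i])
			n++;
	return n;
}
```

The trade-off is that every consumer must cope with a node whose structure does not exist yet, which is exactly what the preallocation approach in the diff avoids.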
I already raised that PPC might be problematic in that regard. Which architecture / setup do you have in mind that can have a lot of possible nodes?
It is an x86_64 VMware VM, not a regular one but a specially configured one (1 vCPU per node, with hot-plug support, 128 possible nodes).
This is slightly tangential, but could you elaborate on this setup and the reasoning behind it? I was already curious when you mentioned it previously. Why would you want so many nodes, with a 1:1 mapping to CPUs? What is the resulting NUMA topology?
This setup with 128 nodes was used purely for development purposes. That is when the issue with hot-adding NUMA nodes was found.
OK, I see.
The original issue is present even with a realistic number of nodes.
Yes, the issue is currently independent of the number of offline nodes. The number of nodes only matters for the amount of memory wasted if we are to allocate a pgdat for each possible node.
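The cost of preallocating for every possible node reduces to simple arithmetic: the waste is the number of possible-but-never-online nodes multiplied by the size of the per-node structure. The sketch takes that size as a parameter because sizeof(pg_data_t) depends on the kernel configuration; wasted_bytes() is a hypothetical helper for illustration only.

```c
#include <stddef.h>

/*
 * Bytes spent on per-node structures for nodes that are possible but
 * not online, assuming one structure of per_node_bytes is preallocated
 * for every possible node. No particular per_node_bytes is assumed.
 */
static size_t wasted_bytes(unsigned int possible_nodes,
			   unsigned int online_nodes,
			   size_t per_node_bytes)
{
	if (online_nodes >= possible_nodes)
		return 0;
	return (size_t)(possible_nodes - online_nodes) * per_node_bytes;
}
```

For example, with 128 possible nodes of which only one is ever onlined, the waste is 127 times the per-node size, whatever that size is on a given build.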