On 07.12.21 16:29, Michal Hocko wrote:
On Tue 07-12-21 16:09:39, David Hildenbrand wrote:
On 07.12.21 14:23, Michal Hocko wrote:
On Tue 07-12-21 13:28:31, David Hildenbrand wrote: [...]
But maybe I am missing something important regarding online vs. offline nodes that your patch changes?
I am relying on alloc_node_data setting the node online. But if we are to change the call to arch_alloc_node_data then the patch needs to be more involved. Here is what I have right now. If this happens to be the right way then there is some additional work to sync up with the hotplug code.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..a296e934ad2f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8032,8 +8032,23 @@ void __init free_area_init(unsigned long *max_zone_pfn) /* Initialise every node */ mminit_verify_pageflags_layout(); setup_nr_node_ids();
- for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
- for_each_node(nid) {
pg_data_t *pgdat;
if (!node_online(nid)) {
pr_warn("Node %d uninitialized by the platform. Please report with memory map.\n", nid);
pgdat = arch_alloc_nodedata(nid);
pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat);
arch_refresh_nodedata(nid, pgdat);
node_set_online(nid);
Setting all possible nodes online might result in quite some QE noice, because all these nodes will then be visible in the sysfs and try_offline_nodes() is essentially for the trash.
I am not sure I follow. I believe sysfs will not get populate because I do not call register_one_node.
arch/x86/kernel/topology.c:topology_init()
for_each_online_node(i) register_one_node(i);
You are right that try_offline_nodes will be reduce which is good imho. More changes will be possible (hopefully to drop some ugly code) on top of this change (or any other that achieves that there are no NULL pgdat for possible nodes).
No to exposing actually offline nodes to user space via sysfs. Let's concentrate on preallocating the pgdat and fixing the issue at hand. One step at a time please.
I agree to prealloc the pgdat, I don't think we should actually set the nodes online. Node onlining/offlining should be done when we do have actual CPUs/memory populated.
If we keep the offline/online node state notion we are not solving an important aspect of the problem - confusing api.
I don't think it's that confusing. Just like we do have online and offline CPUs. Or online and offline memory blocks. Similarly, a node is either online or offline.