I think we might soon see setups (again, CXL is an example, but also when providing a dynamic amount of performance-differentiated memory via virtio-mem) where this will most probably matter. With performance-differentiated memory we'll see a lot more nodes getting used in general, and a lot more nodes eventually getting hotplugged.
There are certainly machines with many nodes. E.g. SLES kernels are built with CONFIG_NODES_SHIFT=10, which allows for up to 1024 possible nodes. And I have seen really large machines with many nodes, but those usually come with a lot of memory and they do not tend to have unpopulated nodes AFAIR.
Right, and that is about to change as nodes are getting used to represent memory with differing performance characteristics/individual devices, not the traditional "this is a socket" setup: we'll see more and more small (virtual) machines with multiple nodes and eventually many possible nodes.