On Thu 09-12-21 09:28:55, Alexey Makhalov wrote:
On Dec 9, 2021, at 12:46 AM, Michal Hocko <mhocko@suse.com> wrote:
On Thu 09-12-21 02:16:17, Alexey Makhalov wrote:
This patch calls alloc_percpu() from setup_arch() while the percpu allocator is not yet initialized (before setup_per_cpu_areas()).
Yeah, I hadn't realized the pcp is not available yet. I was not really sure about that. Could you try with the alloc_percpu() call dropped?
Thanks for testing!
Michal Hocko SUSE Labs
It boots now. dmesg has these new messages:
[ 0.081777] Node 4 uninitialized by the platform. Please report with boot dmesg.
[ 0.081790] Initmem setup node 4 [mem 0x0000000000000000-0x0000000000000000]
...
[ 0.086441] Node 127 uninitialized by the platform. Please report with boot dmesg.
[ 0.086454] Initmem setup node 127 [mem 0x0000000000000000-0x0000000000000000]
Interesting that only those two didn't get a proper arch-specific initialization. Could you check why? I assume init_cpu_to_node() doesn't see any CPU pointing at this node. I wonder why that would be the case, but it could be a bug in the affinity tables.
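To answer the question above, it helps to list exactly which nodes hit the "uninitialized by the platform" path in a saved boot log. The following is a minimal sketch, not part of the patch: the function name `uninitialized_nodes` and the `sample` string are made up for illustration, and the regular expression assumes the message wording shown in the dmesg excerpt in this thread.

```python
import re

# Matches the warning text quoted in this thread; the timestamp prefix is ignored.
UNINIT_RE = re.compile(r"Node (\d+) uninitialized by the platform")


def uninitialized_nodes(dmesg_text):
    """Return the sorted, de-duplicated node IDs reported as uninitialized."""
    return sorted({int(m.group(1)) for m in UNINIT_RE.finditer(dmesg_text)})


# Hypothetical input: three of the lines quoted in the report above.
sample = (
    "[ 0.081777] Node 4 uninitialized by the platform. Please report with boot dmesg.\n"
    "[ 0.081790] Initmem setup node 4 [mem 0x0000000000000000-0x0000000000000000]\n"
    "[ 0.086441] Node 127 uninitialized by the platform. Please report with boot dmesg.\n"
)

print(uninitialized_nodes(sample))
```

Running this over the full boot dmesg rather than the excerpt shows whether the lines elided by "..." contain further nodes with the same warning.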
vCPU/node hot add works. Onlining works as well, but with a warning. I do not think it is related to the patch:
[ 36.838838] CPU4 has been hot-added
[ 36.838987] acpi_processor_hotadd_init:205 cpu 4, node 4, online 0, ndata 00000000e9c7f79b
[ 48.480498] Built 4 zonelists, mobility grouping on. Total pages: 961440
[ 48.480508] Policy zone: Normal
[ 48.508318] smpboot: Booting Node 4 Processor 4 APIC 0x8
[ 48.509255] Disabled fast string operations
[ 48.509807] smpboot: CPU 4 Converting physical 8 to logical package 4
[ 48.509825] smpboot: CPU 4 Converting physical 0 to logical die 4
[ 48.510040] WARNING: workqueue cpumask: online intersect > possible intersect
I will double check. There are changes required on the hotplug side. I would like to see that this one doesn't blow up before diving there.
[ 48.510324] vmware: vmware-stealtime: cpu 4, pa 3e667000
[ 48.511311] Will online and init hotplugged CPU: 4
Hot remove does not quite work. It might be an issue in the ACPI/firmware code or the hypervisor. Debugging…
Do you want me to perform any specific tests?
No, not really. AFAIU your issue was reproducible during boot, and that seems to be fixed. I will work on the hotplug side of things and post something resembling a real patch soon. That would also require memory hotplug testing.
Thanks for your help!