The patch titled
     Subject: mm: fix panic in __alloc_pages
has been added to the -mm tree.  Its filename is
     mm-fix-panic-in-__alloc_pages.patch
This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/mm-fix-panic-in-__alloc_pages.patc... and later at https://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-panic-in-__alloc_pages.patc...
Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated there every 3-4 working days
------------------------------------------------------
From: Alexey Makhalov <amakhalov@vmware.com>
Subject: mm: fix panic in __alloc_pages
There is a kernel panic caused by pcpu_alloc_pages() passing an offlined and uninitialized node to alloc_pages_node(), which leads to a NULL pointer dereference of the uninitialized NODE_DATA(nid).
CPU2 has been hot-added
BUG: unable to handle page fault for address: 0000000000001608
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 1 Comm: systemd Tainted: G            E     5.15.0-rc7+ #11
Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW
RIP: 0010:__alloc_pages+0x127/0x290
Code: 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 e0 48 8b 55 b8 c1 e8 0c 83 e0 01 88 45 d0 4c 89 c8 48 85 d2 0f 85 1a 01 00 00 <45> 3b 41 08 0f 82 10 01 00 00 48 89 45 c0 48 8b 00 44 89 e2 81 e2
RSP: 0018:ffffc900006f3bc8 EFLAGS: 00010246
RAX: 0000000000001600 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000cc2
RBP: ffffc900006f3c18 R08: 0000000000000001 R09: 0000000000001600
R10: ffffc900006f3a40 R11: ffff88813c9fffe8 R12: 0000000000000cc2
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000cc2
FS:  00007f27ead70500(0000) GS:ffff88807ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001608 CR3: 000000000582c003 CR4: 00000000001706b0
Call Trace:
 pcpu_alloc_pages.constprop.0+0xe4/0x1c0
 pcpu_populate_chunk+0x33/0xb0
 pcpu_alloc+0x4d3/0x6f0
 __alloc_percpu_gfp+0xd/0x10
 alloc_mem_cgroup_per_node_info+0x54/0xb0
 mem_cgroup_alloc+0xed/0x2f0
 mem_cgroup_css_alloc+0x33/0x2f0
 css_create+0x3a/0x1f0
 cgroup_apply_control_enable+0x12b/0x150
 cgroup_mkdir+0xdd/0x110
 kernfs_iop_mkdir+0x4f/0x80
 vfs_mkdir+0x178/0x230
 do_mkdirat+0xfd/0x120
 __x64_sys_mkdir+0x47/0x70
 ? syscall_exit_to_user_mode+0x21/0x50
 do_syscall_64+0x43/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
The panic can easily be reproduced by disabling the udev rule for automatic onlining of hot added CPUs and then hot adding a CPU together with a memoryless node (a NUMA node with CPUs only).
Hot adding a CPU and a memoryless node does not bring the node to the online state. The memoryless node is onlined only when its CPU is onlined.
Node can be in one of the following states:
 1. not present (nid == NUMA_NO_NODE)
 2. present, but offline (nid > NUMA_NO_NODE, node_online(nid) == 0,
    NODE_DATA(nid) == NULL)
 3. present and online (nid > NUMA_NO_NODE, node_online(nid) > 0,
    NODE_DATA(nid) != NULL)
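[Editor's illustration, not part of the posted changelog: the three states above map directly onto checks that are available to generic code. A minimal sketch, using the real NUMA_NO_NODE / node_online() / NODE_DATA() interfaces; the enum and classify_nid() names are hypothetical.]

#include <linux/mmzone.h>
#include <linux/nodemask.h>
#include <linux/numa.h>

/*
 * Illustrative sketch only: spell out the three node states above.
 * classify_nid() is a hypothetical helper, not something the patch adds.
 */
enum node_init_state {
	NODE_NOT_PRESENT,	/* 1. nid == NUMA_NO_NODE */
	NODE_PRESENT_OFFLINE,	/* 2. present, NODE_DATA(nid) still NULL */
	NODE_PRESENT_ONLINE,	/* 3. present, NODE_DATA(nid) initialized */
};

static enum node_init_state classify_nid(int nid)
{
	if (nid == NUMA_NO_NODE)
		return NODE_NOT_PRESENT;
	if (!node_online(nid))		/* NODE_DATA(nid) cannot be used yet */
		return NODE_PRESENT_OFFLINE;
	return NODE_PRESENT_ONLINE;	/* safe to pass to alloc_pages_node() */
}

[Only state 3 is safe for alloc_pages_node(); the patch below treats states 1 and 2 the same way and falls back to numa_mem_id().]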
Percpu code does allocations for all possible CPUs. The issue happens when it serves a hot-added but not yet onlined CPU whose node is in the 2nd state. Such a node is not ready to use, so fall back to numa_mem_id().
Link: https://lkml.kernel.org/r/20211108202325.20304-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/percpu-vm.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/percpu-vm.c~mm-fix-panic-in-__alloc_pages
+++ a/mm/percpu-vm.c
@@ -84,15 +84,19 @@ static int pcpu_alloc_pages(struct pcpu_
 			    gfp_t gfp)
 {
 	unsigned int cpu, tcpu;
-	int i;
+	int i, nid;
 
 	gfp |= __GFP_HIGHMEM;
 
 	for_each_possible_cpu(cpu) {
+		nid = cpu_to_node(cpu);
+		if (nid == NUMA_NO_NODE || !node_online(nid))
+			nid = numa_mem_id();
+
 		for (i = page_start; i < page_end; i++) {
 			struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
 
-			*pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
+			*pagep = alloc_pages_node(nid, gfp, 0);
 			if (!*pagep)
 				goto err;
 		}
_
Patches currently in -mm which might be from amakhalov@vmware.com are
mm-fix-panic-in-__alloc_pages.patch
I have opposed this patch in http://lkml.kernel.org/r/YYj91Mkt4m8ySIWt@dhcp22.suse.cz and there was no response to that feedback. I will not go as far as to nack it explicitly, because the pcp allocator is not an area where I would nack patches, but seriously, this issue needs a deeper look rather than a paper-over patch. I hope we do not want to do a similar thing to all callers of cpu_to_mem.
On 09.11.21 09:37, Michal Hocko wrote:
I have opposed this patch http://lkml.kernel.org/r/YYj91Mkt4m8ySIWt@dhcp22.suse.cz There was no response to that feedback. I will not go as far as to nack it explicitly because pcp allocator is not an area I would nack patches but seriously, this issue needs a deeper look rather than a paper over patch. I hope we do not want to do a similar thing to all callers of cpu_to_mem.
While we could move it into the !HOLES version of cpu_to_mem(), calling cpu_to_mem() on an offline (and eventually not even present) CPU (with an offline node) is really a corner case.
Instead of additional runtime overhead for all cpu_to_mem(), my take would be to just do it for the random special cases. Sure, we can document that people should be careful when calling cpu_to_mem() on offline CPUs. But IMHO it's really a corner case.
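[Editor's sketch, to make the trade-off concrete. This is not David's actual proposal; cpu_to_mem_online() is a hypothetical name. It shows what folding the fallback into one shared helper, rather than open-coding it at each call site, could look like.]

#include <linux/nodemask.h>
#include <linux/numa.h>
#include <linux/topology.h>

/*
 * Sketch only -- a hypothetical shared helper.  Callers that may run
 * against a hot-added but not yet onlined CPU would use this instead of
 * open-coding the NUMA_NO_NODE / node_online() fallback.
 */
static inline int cpu_to_mem_online(int cpu)
{
	int nid = cpu_to_mem(cpu);

	/* Hot-added but not yet onlined node: NODE_DATA(nid) may be NULL. */
	if (nid == NUMA_NO_NODE || !node_online(nid))
		nid = numa_mem_id();

	return nid;
}

[Whether such a check belongs in a shared helper or only at the few affected call sites is exactly the question being debated here.]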
On Tue 09-11-21 09:42:56, David Hildenbrand wrote:
On 09.11.21 09:37, Michal Hocko wrote:
I have opposed this patch http://lkml.kernel.org/r/YYj91Mkt4m8ySIWt@dhcp22.suse.cz There was no response to that feedback. I will not go as far as to nack it explicitly because pcp allocator is not an area I would nack patches but seriously, this issue needs a deeper look rather than a paper over patch. I hope we do not want to do a similar thing to all callers of cpu_to_mem.
While we could move it into the !HOLES version of cpu_to_mem(), calling cpu_to_mem() on an offline (and eventually not even present) CPU (with an offline node) is really a corner case.
Instead of additional runtime overhead for all cpu_to_mem(), my take would be to just do it for the random special cases. Sure, we can document that people should be careful when calling cpu_to_mem() on offline CPUs. But IMHO it's really a corner case.
I suspect I haven't made myself clear enough. I do not think we should be touching cpu_to_mem/cpu_to_node to handle this corner case. We should be looking at the underlying problem instead. We cannot really rely on the cpu being onlined to have a proper node association. We should really look at the initialization code and handle this situation properly. Memoryless nodes are something we have been dealing with already. This particular instance of the problem is new and we should understand why.
Hello,
On Tue, Nov 09, 2021 at 12:00:46PM +0100, Michal Hocko wrote:
On Tue 09-11-21 09:42:56, David Hildenbrand wrote:
On 09.11.21 09:37, Michal Hocko wrote:
I have opposed this patch http://lkml.kernel.org/r/YYj91Mkt4m8ySIWt@dhcp22.suse.cz There was no response to that feedback. I will not go as far as to nack it explicitly because pcp allocator is not an area I would nack patches but seriously, this issue needs a deeper look rather than a paper over patch. I hope we do not want to do a similar thing to all callers of cpu_to_mem.
While we could move it into the !HOLES version of cpu_to_mem(), calling cpu_to_mem() on an offline (and eventually not even present) CPU (with an offline node) is really a corner case.
Instead of additional runtime overhead for all cpu_to_mem(), my take would be to just do it for the random special cases. Sure, we can document that people should be careful when calling cpu_to_mem() on offline CPUs. But IMHO it's really a corner case.
I suspect I haven't made myself clear enough. I do not think we should be touching cpu_to_mem/cpu_to_node and handle this corner case. We should be looking at the underlying problem instead. We cannot really rely on cpu to be onlined to have a proper node association. We should really look at the initialization code and handle this situation properly. Memory less nodes are something we have been dealing with already. This particular instance of the problem is new and we should understand why. -- Michal Hocko SUSE Labs
So I think we're still short a solution here. This patch solves the side effect but not the underlying problem related to cpu hotplug.
I'm fine with this going in as a stop gap because I imagine the fixes to hotplug are a lot more intrusive, but do we have someone who can own that work to fix hotplug? I think that should be a requirement for taking this because clearly it's hotplug that's broken and not percpu.
Acked-by: Dennis Zhou dennis@kernel.org
Thanks, Dennis
On Fri 12-11-21 13:20:20, Dennis Zhou wrote:
Hello,
On Tue, Nov 09, 2021 at 12:00:46PM +0100, Michal Hocko wrote:
On Tue 09-11-21 09:42:56, David Hildenbrand wrote:
On 09.11.21 09:37, Michal Hocko wrote:
I have opposed this patch http://lkml.kernel.org/r/YYj91Mkt4m8ySIWt@dhcp22.suse.cz There was no response to that feedback. I will not go as far as to nack it explicitly because pcp allocator is not an area I would nack patches but seriously, this issue needs a deeper look rather than a paper over patch. I hope we do not want to do a similar thing to all callers of cpu_to_mem.
While we could move it into the !HOLES version of cpu_to_mem(), calling cpu_to_mem() on an offline (and eventually not even present) CPU (with an offline node) is really a corner case.
Instead of additional runtime overhead for all cpu_to_mem(), my take would be to just do it for the random special cases. Sure, we can document that people should be careful when calling cpu_to_mem() on offline CPUs. But IMHO it's really a corner case.
I suspect I haven't made myself clear enough. I do not think we should be touching cpu_to_mem/cpu_to_node and handle this corner case. We should be looking at the underlying problem instead. We cannot really rely on cpu to be onlined to have a proper node association. We should really look at the initialization code and handle this situation properly. Memory less nodes are something we have been dealing with already. This particular instance of the problem is new and we should understand why. -- Michal Hocko SUSE Labs
So I think we're still short a solution here. This patch solves the side effect but not the underlying problem related to cpu hotplug.
I'm fine with this going in as a stop gap because I imagine the fixes to hotplug are a lot more intrusive, but do we have someone who can own that work to fix hotplug? I think that should be a requirement for taking this because clearly it's hotplug that's broken and not percpu.
I have asked several times for details about the specific setup that has led to the reported crash. Without much success so far. Reproduction steps would be the first step. That would allow somebody to work on this at least if Alexey doesn't have time to dive into this deeper.
I would be more inclined to a stopgap workaround if this were a more widespread problem, but the lack of other reports suggests this has been a one-off.
The final say is yours, of course.
Acked-by: Dennis Zhou dennis@kernel.org
Thanks, Dennis
Hi Michal,
I have asked several times for details about the specific setup that has led to the reported crash. Without much success so far. Reproduction steps would be the first step. That would allow somebody to work on this at least if Alexey doesn't have time to dive into this deeper.
I didn’t know that repro steps are still not clear.
To reproduce the panic you need to have a system, where you can hot add the CPU that belongs to memoryless NUMA node which is not present and onlined yet. In other words, by hot adding CPU, you will add both CPU and NUMA node at the same time. I’m using the VMware hypervisor with a Linux VM configured so that every (possible) CPU has its own NUMA node. Before doing the CPU hot add, the udev rule for CPU onlining should be disabled. After the CPU hot add event, the panic is triggered shortly afterwards, on the next percpu allocation.
Let me know if this is enough or you need some extra information.
Thanks, —Alexey
On Mon 15-11-21 11:04:16, Alexey Makhalov wrote:
Hi Michal,
I have asked several times for details about the specific setup that has led to the reported crash. Without much success so far. Reproduction steps would be the first step. That would allow somebody to work on this at least if Alexey doesn't have time to dive into this deeper.
I didn’t know that repro steps are still not clear.
To reproduce the panic you need to have a system, where you can hot add the CPU that belongs to memoryless NUMA node which is not present and onlined yet. In other words, by hot adding CPU, you will add both CPU and NUMA node at the same time.
There seems to be something different in your setup because memoryless nodes have reportedly worked on x86. Maybe it is that you are adding a cpu that is outside of the possible cpus initialized during boot time. Those should have their nodes initialized properly - at least per init_cpu_to_node. Your report doesn't really explain how the cpu is hot added. Maybe you are trying to do something that has never been supported on x86.
It would be really great if you could provide more information in the original email thread. E.g. boot time messages and then more details about the hotplug operation as well (e.g. which cpu, the node association, how it is injected to the guest etc.).
Thanks!
On Nov 15, 2021, at 4:58 AM, Michal Hocko mhocko@suse.com wrote:
On Mon 15-11-21 11:04:16, Alexey Makhalov wrote:
Hi Michal,
I have asked several times for details about the specific setup that has led to the reported crash. Without much success so far. Reproduction steps would be the first step. That would allow somebody to work on this at least if Alexey doesn't have time to dive into this deeper.
I didn’t know that repro steps are still not clear.
To reproduce the panic you need to have a system, where you can hot add the CPU that belongs to memoryless NUMA node which is not present and onlined yet. In other words, by hot adding CPU, you will add both CPU and NUMA node at the same time.
There seems to be something different in your setup because memory less nodes have reportedly worked on x86. I suspect something must be different in your setup. Maybe it is that you are adding a cpu that is outside of possible cpus intialized during boot time. Those should have their nodes initialized properly - at least per init_cpu_to_node. Your report doesn't really explain how the cpu is hotadded. Maybe you are trying to do something that has never been supported on x86.
Memoryless nodes are supported by x86, but hot add of such nodes is not quite done.
It would be really great if you can provide more information in the original email thread. E.g. boot time messges and then more details about the hotplug operation as well (e.g. which cpu, the node association, how it is injected to the guest etc.).
I’ll provide more information in the main thread.
Regards, —Alexey
On Mon, Nov 15, 2021 at 11:11:44PM +0000, Alexey Makhalov wrote:
On Nov 15, 2021, at 4:58 AM, Michal Hocko mhocko@suse.com wrote:
On Mon 15-11-21 11:04:16, Alexey Makhalov wrote:
Hi Michal,
I have asked several times for details about the specific setup that has led to the reported crash. Without much success so far. Reproduction steps would be the first step. That would allow somebody to work on this at least if Alexey doesn't have time to dive into this deeper.
I didn’t know that repro steps are still not clear.
To reproduce the panic you need to have a system, where you can hot add the CPU that belongs to memoryless NUMA node which is not present and onlined yet. In other words, by hot adding CPU, you will add both CPU and NUMA node at the same time.
There seems to be something different in your setup because memory less nodes have reportedly worked on x86. I suspect something must be different in your setup. Maybe it is that you are adding a cpu that is outside of possible cpus intialized during boot time. Those should have their nodes initialized properly - at least per init_cpu_to_node. Your report doesn't really explain how the cpu is hotadded. Maybe you are trying to do something that has never been supported on x86.
Memoryless nodes are supported by x86. But hot add of such nodes not quite done.
I need some clarification here. It sounds like memoryless nodes work on x86, but hotplug + memoryless nodes isn't a supported use case or you're introducing it as a new use case?
If this is a new use case, then I'm inclined to say this patch should NOT go in and a proper fix should be implemented on hotplug's side. I don't want to be in the business of having/seeing this conversation reoccur because we just papered over this issue in percpu.
Thanks, Dennis
On Mon, 15 Nov 2021, Dennis Zhou wrote:
I need some clarification here. It sounds like memoryless nodes work on x86, but hotplug + memoryless nodes isn't a supported use case or you're introducing it as a new use case?
Could you do that step by step?
First add the new node and ensure everything is ok and that the memory is online.
*After* that is done bring up the new processor and associate the processor with *online* memory.
On Tue 16-11-21 13:30:45, Christoph Lameter wrote:
On Mon, 15 Nov 2021, Dennis Zhou wrote:
I need some clarification here. It sounds like memoryless nodes work on x86, but hotplug + memoryless nodes isn't a supported use case or you're introducing it as a new use case?
Could you do that step by step?
First add the new node and ensure everything is ok and that the memory is online.
*After* that is done bring up the new processor and associate the processor with *online* memory.
We are discussing that in the original thread - http://lkml.kernel.org/r/YZN3ExwL7BiDS5nj@dhcp22.suse.cz
This patch is a workaround for that problem in the pcp code.
On Mon 15-11-21 22:52:27, Dennis Zhou wrote:
On Mon, Nov 15, 2021 at 11:11:44PM +0000, Alexey Makhalov wrote:
On Nov 15, 2021, at 4:58 AM, Michal Hocko mhocko@suse.com wrote:
On Mon 15-11-21 11:04:16, Alexey Makhalov wrote:
Hi Michal,
I have asked several times for details about the specific setup that has led to the reported crash. Without much success so far. Reproduction steps would be the first step. That would allow somebody to work on this at least if Alexey doesn't have time to dive into this deeper.
I didn’t know that repro steps are still not clear.
To reproduce the panic you need to have a system, where you can hot add the CPU that belongs to memoryless NUMA node which is not present and onlined yet. In other words, by hot adding CPU, you will add both CPU and NUMA node at the same time.
There seems to be something different in your setup because memory less nodes have reportedly worked on x86. I suspect something must be different in your setup. Maybe it is that you are adding a cpu that is outside of possible cpus intialized during boot time. Those should have their nodes initialized properly - at least per init_cpu_to_node. Your report doesn't really explain how the cpu is hotadded. Maybe you are trying to do something that has never been supported on x86.
Memoryless nodes are supported by x86. But hot add of such nodes not quite done.
I need some clarification here. It sounds like memoryless nodes work on x86, but hotplug + memoryless nodes isn't a supported use case or you're introducing it as a new use case?
If this is a new use case, then I'm inclined to say this patch should NOT go in and a proper fix should be implemented on hotplug's side. I don't want to be in the business of having/seeing this conversation reoccur because we just papered over this issue in percpu.
The patch still seems to be in the mmotm tree. I have sent a different fix candidate [1] which should be more robust and also cover other potential places.
[1] http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org
On Tue, 14 Dec 2021 11:11:54 +0100 Michal Hocko mhocko@suse.com wrote:
I need some clarification here. It sounds like memoryless nodes work on x86, but hotplug + memoryless nodes isn't a supported use case or you're introducing it as a new use case?
If this is a new use case, then I'm inclined to say this patch should NOT go in and a proper fix should be implemented on hotplug's side. I don't want to be in the business of having/seeing this conversation reoccur because we just papered over this issue in percpu.
The patch still seems to be in the mmotm tree. I have sent a different fix candidate [1] which should be more robust and cover also other potential places.
[1] http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org
Is cool, I'm paying attention.
We do want something short and simple for backporting to -stable (like Alexey's patch) so please bear that in mind while preparing an alternative.
On Tue 14-12-21 12:57:48, Andrew Morton wrote:
On Tue, 14 Dec 2021 11:11:54 +0100 Michal Hocko mhocko@suse.com wrote:
I need some clarification here. It sounds like memoryless nodes work on x86, but hotplug + memoryless nodes isn't a supported use case or you're introducing it as a new use case?
If this is a new use case, then I'm inclined to say this patch should NOT go in and a proper fix should be implemented on hotplug's side. I don't want to be in the business of having/seeing this conversation reoccur because we just papered over this issue in percpu.
The patch still seems to be in the mmotm tree. I have sent a different fix candidate [1] which should be more robust and cover also other potential places.
[1] http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org
Is cool, I'm paying attention.
We do want something short and simple for backporting to -stable (like Alexey's patch) so please bear that in mind while preparing an alternative.
I think we want something that fixes the underlying problem. Please keep in mind that the pcp allocation is not the only place to hit the issue. We have more. I do not think we want to handle each and every one separately.
I am definitely not going to push for my solution but if there is a consensus that this is the right approach then I do not think we really want to implement these partial workarounds.
On Wed 15-12-21 11:05:12, Michal Hocko wrote:
On Tue 14-12-21 12:57:48, Andrew Morton wrote:
On Tue, 14 Dec 2021 11:11:54 +0100 Michal Hocko mhocko@suse.com wrote:
I need some clarification here. It sounds like memoryless nodes work on x86, but hotplug + memoryless nodes isn't a supported use case or you're introducing it as a new use case?
If this is a new use case, then I'm inclined to say this patch should NOT go in and a proper fix should be implemented on hotplug's side. I don't want to be in the business of having/seeing this conversation reoccur because we just papered over this issue in percpu.
The patch still seems to be in the mmotm tree. I have sent a different fix candidate [1] which should be more robust and cover also other potential places.
[1] http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org
Is cool, I'm paying attention.
We do want something short and simple for backporting to -stable (like Alexey's patch) so please bear that in mind while preparing an alternative.
I think we want something that fixes the underlying problem. Please keep in mind that the pcp allocation is not the only place to hit the issue. We have more. I do not want we want to handle each and every one separately.
I am definitly not going to push for my solution but if there is a consensus this is the right approach then I do not think we really want to implement these partial workarounds.
Btw. I forgot to add that if we do not agree on the preallocation approach, then the alternative should be something like http://lkml.kernel.org/r/51c65635-1dae-6ba4-daf9-db9df0ec35d8@redhat.com as proposed by David.
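[Editor's sketch for orientation only. This is not the patch Michal posted; preallocate_offline_node_data() is a hypothetical name, node_data[] is the x86-64 array behind NODE_DATA(), and a real version would also have to initialize zonelists and the rest of the pgdat. It illustrates the pre-allocation idea: make NODE_DATA(nid) valid for every possible node at boot instead of patching individual callers.]

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/memblock.h>
#include <linux/mmzone.h>
#include <linux/nodemask.h>

/*
 * Rough sketch of the pre-allocation idea only -- NOT the posted patch.
 * Give every possible-but-offline node a minimal pgdat at boot so
 * NODE_DATA(nid) is never NULL, instead of teaching every allocation
 * site to fall back to another node.
 */
static void __init preallocate_offline_node_data(void)
{
	int nid;

	for_each_node(nid) {		/* iterates all possible nodes */
		pg_data_t *pgdat;

		if (node_online(nid) || NODE_DATA(nid))
			continue;

		/* Too early for the page allocator; use memblock. */
		pgdat = memblock_alloc(sizeof(*pgdat), SMP_CACHE_BYTES);
		if (!pgdat)
			panic("Cannot allocate pgdat for node %d\n", nid);

		pgdat->node_id = nid;
		/* On x86-64, node_data[] is what NODE_DATA() expands to. */
		node_data[nid] = pgdat;
	}
}

[The competing view in this thread is the patch above: keep the boot path as-is and have individual callers such as pcpu_alloc_pages() fall back to numa_mem_id().]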