When a hugetlb folio is poisoned again, try_memory_failure_hugetlb() passes the head pfn to kill_accessing_process(), which is not right. The precise pfn of the poisoned page should be used in order to determine the precise vaddr for the SIGBUS payload.
This issue has already been taken care of in the normal path, that is, hwpoison_user_mappings(); see [1][2]. Furthermore, for [3] to work correctly in the hugetlb repoisoning case, it's essential to inform the VM of the precise poisoned page, not the head page.
[1] https://lkml.kernel.org/r/20231218135837.3310403-1-willy@infradead.org
[2] https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@oracle.com
[3] https://lore.kernel.org/lkml/20251116013223.1557158-1-jiaqiyan@google.com/
Cc: stable@vger.kernel.org
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
v2 -> v3: incorporated suggestions from Miaohe and Matthew.
v1 -> v2: pickup R-B, add stable to cc list.
---
 mm/memory-failure.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b47e8a1b12d..98612ac961b0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -692,6 +692,8 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
 		unsigned long poisoned_pfn, struct to_kill *tk)
 {
 	unsigned long pfn = 0;
+	unsigned long hwpoison_vaddr;
+	unsigned long mask;
 
 	if (pte_present(pte)) {
 		pfn = pte_pfn(pte);
@@ -702,10 +704,12 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
 		pfn = softleaf_to_pfn(entry);
 	}
 
-	if (!pfn || pfn != poisoned_pfn)
+	mask = ~((1UL << (shift - PAGE_SHIFT)) - 1);
+	if (!pfn || ((pfn & mask) != (poisoned_pfn & mask)))
 		return 0;
 
-	set_to_kill(tk, addr, shift);
+	hwpoison_vaddr = addr + ((poisoned_pfn - pfn) << PAGE_SHIFT);
+	set_to_kill(tk, hwpoison_vaddr, shift);
 	return 1;
 }
 
@@ -2038,10 +2042,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 		return 0;
 	case MF_HUGETLB_ALREADY_POISONED:
 	case MF_HUGETLB_ACC_EXISTING_POISON:
-		if (flags & MF_ACTION_REQUIRED) {
-			folio = page_folio(p);
-			res = kill_accessing_process(current, folio_pfn(folio), flags);
-		}
+		if (flags & MF_ACTION_REQUIRED)
+			res = kill_accessing_process(current, pfn, flags);
 		if (res == MF_HUGETLB_ALREADY_POISONED)
 			action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
 		else
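To illustrate the math the patch adds to check_hwpoisoned_entry(), here is a minimal userspace sketch with assumed example values (a 2 MiB hugetlb page, so shift = 21, and PAGE_SHIFT taken as 12; the addresses and pfns are made up). The mask comparison matches any pfn inside the folio, and the vaddr is then advanced to the precise poisoned base page:

#include <stdio.h>

#define PAGE_SHIFT	12	/* assumed: 4 KiB base pages */

int main(void)
{
	short shift = 21;				/* 2 MiB hugetlb page */
	unsigned long addr = 0x7f0000200000UL;		/* vaddr of the mapping */
	unsigned long pfn = 0x80000UL;			/* head pfn found in the PTE */
	unsigned long poisoned_pfn = 0x80042UL;		/* precise poisoned pfn */

	/* Mask off the in-folio bits so any pfn inside the folio matches. */
	unsigned long mask = ~((1UL << (shift - PAGE_SHIFT)) - 1);

	if ((pfn & mask) != (poisoned_pfn & mask))
		return 0;	/* poisoned page is not in this mapping */

	/* Advance the vaddr to the poisoned base page within the folio. */
	unsigned long hwpoison_vaddr = addr + ((poisoned_pfn - pfn) << PAGE_SHIFT);

	printf("SIGBUS vaddr: 0x%lx\n", hwpoison_vaddr);	/* 0x7f0000242000 */
	return 0;
}

With the head pfn passed in, as before this patch, the reported vaddr would have pointed at the start of the huge page rather than at the page actually poisoned, 0x42 base pages in.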
On 12/23/2025 1:13 AM, David Hildenbrand (Red Hat) wrote:
> On 12/23/25 02:21, Jane Chu wrote:
>> When a hugetlb folio is poisoned again, try_memory_failure_hugetlb() passes the head pfn to kill_accessing_process(), which is not right. The precise pfn of the poisoned page should be used in order to determine the precise vaddr for the SIGBUS payload.
>>
>> This issue has already been taken care of in the normal path, that is, hwpoison_user_mappings(); see [1][2]. Furthermore, for [3] to work correctly in the hugetlb repoisoning case, it's essential to inform the VM of the precise poisoned page, not the head page.
>>
>> [1] https://lkml.kernel.org/r/20231218135837.3310403-1-willy@infradead.org
>> [2] https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@oracle.com
>> [3] https://lore.kernel.org/lkml/20251116013223.1557158-1-jiaqiyan@google.com/
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>>
>> v2 -> v3: incorporated suggestions from Miaohe and Matthew.
>> v1 -> v2: pickup R-B, add stable to cc list.
>
> Please don't send new versions while the discussion on your old submission is still going on. It makes the whole discussion hard to follow.
Got it, thanks.
> You asked in the old version:
>
> "What happens if a non-head PFN of hugetlb is indicated in a SIGBUS to QEMU? Because the regular path, the path via hwpoison_user_mappings(), already behaves this way.
>
> I'm not familiar with QEMU. AFAIK, the need for this patch came from our VM/QEMU team."
>
> I just took a look and I think it's ok. I remember a discussion around [1] where we concluded that the kernel would always give us the first PFN, but essentially the whole hugetlb folio will vanish.
>
> But in QEMU we work completely on the given vaddr, and are able to identify that it's a hugetlb folio through our information on memory mappings.
>
> QEMU stores a list of poisoned vaddrs, to remap them (e.g., via fallocate(PUNCH_HOLE)) when restarting the VM. If we get various vaddrs for the same hugetlb folio we will simply try to remap a hugetlb folio several times, which is not a real problem. I think we discussed that this could get optimized as part of [1] (or follow-up versions) if ever required.
>
> [1] https://lore.kernel.org/qemu-devel/20240910090747.2741475-1-william.roche@oracle.com/
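For context, a minimal sketch of the punch-hole remap described above; the helper and its parameters are hypothetical illustration, not actual QEMU code. Punching a hole in the hugetlbfs file backing guest RAM discards the poisoned huge page, so the next access faults in a fresh one:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: discard the poisoned huge page backing part of a
 * hugetlbfs-backed guest RAM file. 'offset' and 'pagesize' locate the
 * huge page within the file referred to by 'fd'.
 */
static int remap_poisoned_page(int fd, off_t offset, size_t pagesize)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, pagesize) < 0) {
		perror("fallocate(PUNCH_HOLE)");
		return -1;
	}
	return 0;	/* next guest access faults in a fresh page */
}

Punching the same range more than once is effectively a no-op, which is why several vaddrs landing in one hugetlb folio merely cause redundant remaps rather than a real problem.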
Thanks a lot!
-jane