Re: [PATCH v10 4/4] ACPI: APEI: handle synchronous exceptions in task work

18 Dec 2023


On Mon, Dec 18, 2023 at 02:45:21PM +0800, Shuai Xue wrote:
...
Hardware errors can be signaled by an asynchronous interrupt, e.g. when an
error is detected by a background scrubber, or by a synchronous
exception, e.g. when a CPU tries to access a poisoned cache line. Both
synchronous and asynchronous errors are queued as memory_failure() work
and handled by a dedicated kthread in a workqueue.

However, memory failure recovery sends SIGBUS with the wrong BUS_MCEERR_AO
si_code for synchronous errors in early kill mode, even if
MF_ACTION_REQUIRED is set. The main problem is that the memory failure work
is handled in kthread context rather than in the context of the user-space
process that is accessing the corrupt memory location, so kill_proc() sends
SIGBUS with the BUS_MCEERR_AO si_code to the user-space process instead of
BUS_MCEERR_AR.

To fix this, queue memory_failure() as a task_work so that the current
context in memory_failure() belongs to the process consuming the poisoned
data, and SIGBUS is sent with the proper si_code.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

 drivers/acpi/apei/ghes.c | 77 +++++++++++++++++++++++-----------------
 include/acpi/ghes.h      |  3 --
 mm/memory-failure.c      | 13 -------
 3 files changed, 44 insertions(+), 49 deletions(-)
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
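
For readers following the si_code discussion in the quoted commit message,
here is a minimal user-space sketch of why the difference between
BUS_MCEERR_AR and BUS_MCEERR_AO matters to an application. It is not part of
the patch. The helper names mce_kind_name(), install_sigbus_handler() and
request_early_kill() are made up for illustration; the signal constants and
the PR_MCE_KILL prctl are the existing Linux interfaces. The exit-on-AR
policy is only one possible choice, assumed here for simplicity.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical helper: map a SIGBUS si_code to a short description. */
const char *mce_kind_name(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "action required";   /* this thread consumed the poison */
    case BUS_MCEERR_AO:
        return "action optional";   /* poison found, not yet consumed */
    default:
        return "not a memory-failure SIGBUS";
    }
}

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    const char *msg = mce_kind_name(info->si_code);
    ssize_t n;

    (void)sig;
    (void)ctx;

    /* write() is async-signal-safe, unlike printf(). */
    n = write(STDERR_FILENO, msg, strlen(msg));
    n = write(STDERR_FILENO, "\n", 1);
    (void)n;

    /*
     * Only the "action optional" case may simply return: the process has
     * not touched the bad page at info->si_addr yet and can drop or
     * rebuild it later. In every other case, returning would re-execute
     * the faulting access, so this sketch exits instead.
     */
    if (info->si_code != BUS_MCEERR_AO)
        _exit(1);
}

/* Hypothetical helper: install the handler with siginfo delivery. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Hypothetical helper: opt this thread in to early kill mode. */
int request_early_kill(void)
{
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

A program would call install_sigbus_handler() and request_early_kill() once
at startup. With the behaviour described in the commit message, a handler
like this would see "action optional" for a synchronous error and might
return to the faulting access, which is the misclassification the patch
aims to avoid.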
