Re: [PATCH] x86/hyperv: fix kexec crash due to VP assist page corruption

27 Aug 2024

Anirudh Rayabharam <anirudh@anirudhrb.com> writes:
...
On Mon, Aug 26, 2024 at 02:36:44PM +0200, Vitaly Kuznetsov wrote:
...
Anirudh Rayabharam <anirudh@anirudhrb.com> writes:
...
From: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
9636be85cc5b ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go
online/offline") introduces a new cpuhp state for hyperv initialization.
cpuhp_setup_state() returns the state number if state is CPUHP_AP_ONLINE_DYN
or CPUHP_BP_PREPARE_DYN and 0 for all other states. For the hyperv case,
since a new cpuhp state was introduced it would return 0. However,
in hv_machine_shutdown(), the cpuhp_remove_state() call is conditioned upon
"hyperv_init_cpuhp > 0". This will never be true and so hv_cpu_die() won't be
called on all CPUs. This means the VP assist page won't be reset. When the
kexec kernel tries to set up the VP assist page again, the hypervisor corrupts
the memory region of the old VP assist page causing a panic in case the kexec
kernel is using that memory elsewhere. This was originally fixed in dfe94d4086e4
("x86/hyperv: Fix kexec panic/hang issues").
Set hyperv_init_cpuhp to CPUHP_AP_HYPERV_ONLINE upon successful setup so that
the hyperv cpuhp state is removed correctly on kexec and the necessary cleanup
takes place.
Cc: stable@vger.kernel.org
Fixes: 9636be85cc5b ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline")
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
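
The return-value behaviour described in the commit message above can be modelled in plain userspace C. This is only a rough sketch: every mock_* name and every enum value below is a hypothetical stand-in, not a kernel symbol, and the mock returns the requested state itself where the real kernel would return the dynamically allocated slot.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for enum cpuhp_state; the values are arbitrary. */
enum mock_cpuhp_state {
	MOCK_CPUHP_BP_PREPARE_DYN = 10,
	MOCK_CPUHP_AP_HYPERV_ONLINE = 20,
	MOCK_CPUHP_AP_ONLINE_DYN = 30,
};

/*
 * Mimics the success-path convention from the commit message: the two
 * dynamic states return a positive state number, every other state
 * returns 0.
 */
static int mock_cpuhp_setup_state(enum mock_cpuhp_state state)
{
	if (state == MOCK_CPUHP_AP_ONLINE_DYN ||
	    state == MOCK_CPUHP_BP_PREPARE_DYN)
		return state;
	return 0;
}

/* The guard used in hv_machine_shutdown(), minus kexec_in_progress. */
static bool mock_shutdown_removes_state(int hyperv_init_cpuhp)
{
	return hyperv_init_cpuhp > 0;
}

/* Before the fix: the stored value is the setup return value, i.e. 0. */
static int mock_init_before_fix(void)
{
	return mock_cpuhp_setup_state(MOCK_CPUHP_AP_HYPERV_ONLINE);
}

/* After the fix: the stored value is the fixed state constant. */
static int mock_init_after_fix(void)
{
	if (mock_cpuhp_setup_state(MOCK_CPUHP_AP_HYPERV_ONLINE) < 0)
		return 0;
	return MOCK_CPUHP_AP_HYPERV_ONLINE;
}
```

With the pre-fix value the guard is never satisfied, so the state is never removed; with the post-fix value it is.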

arch/x86/hyperv/hv_init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 17a71e92a343..81d1981a75d1 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -607,7 +607,7 @@ void __init hyperv_init(void)
 
 	register_syscore_ops(&hv_syscore_ops);
 
-	hyperv_init_cpuhp = cpuhp;
+	hyperv_init_cpuhp = CPUHP_AP_HYPERV_ONLINE;

Do we really need 'hyperv_init_cpuhp' at all? I.e. post-change (which
LGTM btw), I can only see one usage in hv_machine_shutdown():
	if (kexec_in_progress && hyperv_init_cpuhp > 0)
		cpuhp_remove_state(hyperv_init_cpuhp);
and I'm wondering if the 'hyperv_init_cpuhp' check is really
needed. The only case where this check would fail is if we're crashing
in between ms_hyperv_init_platform() and hyperv_init() afaiu. Does it
Or if we fail to set up the cpuhp state for some reason but don't
actually crash and then later do a kexec?
I see this can happen for CPUHP_AP_ONLINE_DYN/CPUHP_BP_PREPARE_DYN
because we run out of free slots (40/20), but here we have our own
dedicated CPUHP_AP_HYPERV_ONLINE and other failure paths seem to be
exotic...
...
I guess I was just trying to be extra safe and make sure we have
actually set up the cpuhp state before calling cpuhp_remove_state()
for it. However, looking elsewhere in the kernel code I don't
see anybody doing this for custom states...
...
hurt if we try cpuhp_remove_state() anyway?
cpuhp_invoke_callback() would trigger a WARNING if we try to remove a
cpuhp state that was never set up.
        if (cpuhp_step_empty(bringup, step)) {
                WARN_ON_ONCE(1);
                return 0;
        }
Personally, I'd say that getting an extra WARN for such a corner case
(failing to set up the cpuhp state or crashing in between
ms_hyperv_init_platform() and hyperv_init()) is OK.
Alternatively, we can convert hyperv_init_cpuhp to a boolean to make it
a bit more straightforward, but as it's uncommon to do it for other states,
it's likely overkill.
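
For illustration only, the boolean variant mentioned above might look roughly like the userspace model below. All mock_* names are hypothetical stand-ins rather than kernel symbols, the failure value is an arbitrary negative number, and the counter merely marks where cpuhp_remove_state() would be called.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
static bool mock_kexec_in_progress;
static bool mock_hyperv_cpuhp_online;	/* would replace the int hyperv_init_cpuhp */
static int mock_remove_calls;

/*
 * Stand-in for setting up the dedicated Hyper-V cpuhp state:
 * 0 on success, a negative value on failure.
 */
static int mock_setup_hyperv_state(bool fail)
{
	return fail ? -1 : 0;
}

/* Record success as a flag instead of storing a state number. */
static void mock_hyperv_init(bool setup_fails)
{
	if (mock_setup_hyperv_state(setup_fails) < 0)
		return;
	mock_hyperv_cpuhp_online = true;
}

/* Only tear down the state on kexec if it was actually set up. */
static void mock_hv_machine_shutdown(void)
{
	if (mock_kexec_in_progress && mock_hyperv_cpuhp_online)
		mock_remove_calls++;	/* where cpuhp_remove_state() would run */
}
```

The flag avoids depending on what the setup call returns on success, at the cost of doing something other users of fixed cpuhp states do not bother with.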
-- 
Vitaly



    
