From: Joerg Roedel jroedel@suse.de
Allow a runtime opt-out of kexec support for architecture code in case the kernel is running in an environment where kexec is not properly supported yet.
This will be used on x86 when the kernel is running as an SEV-ES guest. SEV-ES guests need special handling for kexec to hand over all CPUs to the new kernel. This requires special hypervisor support and handling code in the guest which is not yet implemented.
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel jroedel@suse.de
---
 include/linux/kexec.h |  1 +
 kernel/kexec.c        | 14 ++++++++++++++
 kernel/kexec_file.c   |  9 +++++++++
 3 files changed, 24 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0c994ae37729..85c30dcd0bdc 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -201,6 +201,7 @@ int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
 				 unsigned long buf_len);
 #endif
 int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
+bool arch_kexec_supported(void);
 
 extern int kexec_add_buffer(struct kexec_buf *kbuf);
 int kexec_locate_mem_hole(struct kexec_buf *kbuf);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index b5e40f069768..275cda429380 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -190,11 +190,25 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
  * that to happen you need to do that yourself.
  */
 
+bool __weak arch_kexec_supported(void)
+{
+	return true;
+}
+
 static inline int kexec_load_check(unsigned long nr_segments,
 				   unsigned long flags)
 {
 	int result;
 
+	/*
+	 * The architecture may support kexec in general, but the kernel could
+	 * run in an environment where it is not (yet) possible to execute a new
+	 * kernel. Allow the architecture code to opt-out of kexec support when
+	 * it is running in such an environment.
+	 */
+	if (!arch_kexec_supported())
+		return -ENOSYS;
+
 	/* We only trust the superuser with rebooting the system. */
 	if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
 		return -EPERM;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 33400ff051a8..96d08a512e9c 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -358,6 +358,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	int ret = 0, i;
 	struct kimage **dest_image, *image;
 
+	/*
+	 * The architecture may support kexec in general, but the kernel could
+	 * run in an environment where it is not (yet) possible to execute a new
+	 * kernel. Allow the architecture code to opt-out of kexec support when
+	 * it is running in such an environment.
+	 */
+	if (!arch_kexec_supported())
+		return -ENOSYS;
+
 	/* We only trust the superuser with rebooting the system. */
 	if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
 		return -EPERM;
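For illustration only, here is a minimal userspace sketch of how the hook in the patch above is meant to behave once an architecture overrides it. It is not kernel code: `guest_lacks_cpu_handover`, `set_guest_lacks_cpu_handover()` and `kexec_load_check_model()` are hypothetical names invented for this sketch, and the capability check is reduced to a boolean argument. In the kernel, the strong definition would live in architecture code and replace the `__weak` default at link time.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for platform state; not a real kernel symbol. */
static bool guest_lacks_cpu_handover;

/* Hypothetical helper so callers can flip the modelled platform state. */
void set_guest_lacks_cpu_handover(bool value)
{
	guest_lacks_cpu_handover = value;
}

/*
 * Hypothetical architecture override of the weak default from the patch.
 * It reports kexec as unsupported while the modelled platform cannot hand
 * over its CPUs to a new kernel.
 */
bool arch_kexec_supported(void)
{
	return !guest_lacks_cpu_handover;
}

/*
 * Userspace model of the check order the patch adds to kexec_load_check():
 * the architecture opt-out is tested first, then the permission check.
 */
int kexec_load_check_model(bool has_cap_sys_boot)
{
	if (!arch_kexec_supported())
		return -ENOSYS;

	if (!has_cap_sys_boot)
		return -EPERM;

	return 0;
}
```

Because the opt-out is tested before the permission check in this model, an unprivileged caller on an unsupported platform gets `-ENOSYS` rather than `-EPERM`, which matches the ordering in the diff.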
On Mon, Sep 13, 2021 at 05:55:52PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel jroedel@suse.de
> Allow a runtime opt-out of kexec support for architecture code in case the kernel is running in an environment where kexec is not properly supported yet.
> This will be used on x86 when the kernel is running as an SEV-ES guest. SEV-ES guests need special handling for kexec to hand over all CPUs to the new kernel. This requires special hypervisor support and handling code in the guest which is not yet implemented.
> Cc: stable@vger.kernel.org # v5.10+
> Signed-off-by: Joerg Roedel jroedel@suse.de
>  include/linux/kexec.h |  1 +
>  kernel/kexec.c        | 14 ++++++++++++++
>  kernel/kexec_file.c   |  9 +++++++++
>  3 files changed, 24 insertions(+)
I guess I can take this through the tip tree along with the next one.
Eric?
Borislav Petkov bp@alien8.de writes:
> On Mon, Sep 13, 2021 at 05:55:52PM +0200, Joerg Roedel wrote:
>> From: Joerg Roedel jroedel@suse.de
>> Allow a runtime opt-out of kexec support for architecture code in case the kernel is running in an environment where kexec is not properly supported yet.
>> This will be used on x86 when the kernel is running as an SEV-ES guest. SEV-ES guests need special handling for kexec to hand over all CPUs to the new kernel. This requires special hypervisor support and handling code in the guest which is not yet implemented.
>> Cc: stable@vger.kernel.org # v5.10+
>> Signed-off-by: Joerg Roedel jroedel@suse.de
>>  include/linux/kexec.h |  1 +
>>  kernel/kexec.c        | 14 ++++++++++++++
>>  kernel/kexec_file.c   |  9 +++++++++
>>  3 files changed, 24 insertions(+)
> I guess I can take this through the tip tree along with the next one.
I seem to remember the consensus when this was reviewed that it was unnecessary and there is already support for doing something like this at a more fine grained level so we don't need a new kexec hook.
Eric
On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote:
> I seem to remember the consensus when this was reviewed that it was unnecessary and there is already support for doing something like this at a more fine grained level so we don't need a new kexec hook.
It was a discussion, no consensus :)
I still think it is better to solve this in generic code for everybody to re-use than with a hack in the architecture hooks.
More and more platforms which enable confidential computing features may need this hook in the future.
Regards,
Hi again,
On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote:
> I seem to remember the consensus when this was reviewed that it was unnecessary and there is already support for doing something like this at a more fine grained level so we don't need a new kexec hook.
Forgot to state the problem again which these patches solve:
Currently a Linux kernel running as an SEV-ES guest has no way to successfully kexec into a new kernel. The normal SIPI sequence to reset the non-boot VCPUs does not work in SEV-ES guests and special code is needed in Linux to safely hand over the VCPUs from one kernel to the next. What happens currently is that the kexec'ed kernel will just hang.
The code which implements the VCPU hand-over is also included in this patch-set, but it requires a certain level of Hypervisor support which is not available everywhere.
To make it clear to the user that kexec will not work in their environment, it is best to disable the respective syscalls. This is what the hook is needed for.
Regards,
Joerg Roedel jroedel@suse.de writes:
> Hi again,
> On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote:
>> I seem to remember the consensus when this was reviewed that it was unnecessary and there is already support for doing something like this at a more fine grained level so we don't need a new kexec hook.
> Forgot to state the problem again which these patches solve:
> Currently a Linux kernel running as an SEV-ES guest has no way to successfully kexec into a new kernel. The normal SIPI sequence to reset the non-boot VCPUs does not work in SEV-ES guests and special code is needed in Linux to safely hand over the VCPUs from one kernel to the next. What happens currently is that the kexec'ed kernel will just hang.
> The code which implements the VCPU hand-over is also included in this patch-set, but it requires a certain level of Hypervisor support which is not available everywhere.
> To make it clear to the user that kexec will not work in their environment, it is best to disable the respective syscalls. This is what the hook is needed for.
Note this is environmental. This is the equivalent of a driver for a device without some feature.
The kernel already has machine_kexec_prepare, which is perfectly capable of detecting this is a problem and causing kexec_load to fail. Which is all that is required.
We don't need a new hook and a new code path to test for one architecture.
So when we can reliably cause the system call to fail with a specific error code I don't think it makes sense to clutter up generic code because of one architecture's design mistakes.
My honest preference would be to go farther and have a firmware/hypervisor/platform independent rendezvous for the cpus so we don't have to worry about what bugs the code underneath has implemented for this special case. Because frankly, when there are layers of software, if a bug can slip through it always seems to, and it causes problems.
But definitely there is no reason to add another generic hook when the existing hook is quite good enough.
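As a rough illustration of the alternative argued for above, the following userspace sketch models an architecture's prepare step refusing a load when the platform cannot hand over its CPUs. It is not kernel code and makes several assumptions: `struct kimage_model`, `platform_cannot_hand_over_cpus`, `set_platform_cannot_hand_over_cpus()`, `machine_kexec_prepare_model()` and `kexec_load_model()` are hypothetical names for this sketch, and the `-EOPNOTSUPP` return value is chosen only for illustration, since the thread does not name a specific error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct kimage. */
struct kimage_model {
	unsigned long nr_segments;
};

/* Hypothetical platform flag: true when secondary CPUs cannot be handed over. */
static bool platform_cannot_hand_over_cpus;

/* Hypothetical helper so callers can flip the modelled platform state. */
void set_platform_cannot_hand_over_cpus(bool value)
{
	platform_cannot_hand_over_cpus = value;
}

/*
 * Model of an architecture's prepare hook. A nonzero return tells the
 * generic load path to abort, so no extra generic hook is involved.
 */
int machine_kexec_prepare_model(struct kimage_model *image)
{
	(void)image; /* unused in this simplified model */

	if (platform_cannot_hand_over_cpus)
		return -EOPNOTSUPP; /* error code chosen for illustration */

	return 0;
}

/* Model of the generic load path calling into the architecture hook. */
int kexec_load_model(struct kimage_model *image)
{
	int ret = machine_kexec_prepare_model(image);

	if (ret)
		return ret;

	/* A real implementation would load segments and install the image here. */
	return 0;
}
```

In this model the failure surfaces only after the load has been attempted, whereas the proposed `arch_kexec_supported()` check rejects the syscall up front. That difference is the trade-off being debated in this thread.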
Eric
On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote:
> I seem to remember the consensus when this was reviewed that it was unnecessary and there is already support for doing something like this at a more fine grained level so we don't need a new kexec hook.
Well, the executive summary is that you have a guest whose memory *and* registers are encrypted so the hypervisor cannot have a poke inside and reset the vCPU like it would normally do. So you need to do that dance differently, i.e., the patchset.
If you try to kexec such a guest now, it'll init only the BSP, as Joerg said. So I guess a single-threaded kdump.
And yes, one of the prominent use cases is kdumping from such a guest, as distros love doing kdump for debugging.
I hope that explains it better.
linux-stable-mirror@lists.linaro.org