Mark Cc: Marc, Geoff
On 04/10/2015 12:02 AM, Mark Rutland wrote:
On Thu, Apr 09, 2015 at 05:53:33AM +0100, AKASHI Takahiro wrote:
Mark,
On 04/08/2015 10:05 PM, Mark Rutland wrote:
On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
The current kvm implementation keeps the EL2 vector table installed even when the system is shut down. This prevents kexec from putting a system with kvm back into EL2 when starting a new kernel.
This patch resolves this issue by calling a cpu tear-down function via a reboot notifier, kvm_reboot_notify(), which is invoked by kernel_restart_prepare() in kernel_kexec(). While kvm has a generic hook, kvm_reboot(), we can't use it here because, under the current implementation, a cpu tear-down function will not be invoked if no guest vm has been created by kvm_create_vm(). Please note that kvm_usage_count is zero in this case.
In the future, we should implement cpu hotplug support and put the arch-specific initialization into kvm_arch_hardware_enable/disable(). That way, we would be able to revert this patch.
Why can't we use kvm_arch_hardware_enable/disable() currently?
IIUC, kvm will call kvm_arch_hardware_enable() iff a new guest is being created *and* cpus have not been initialized yet. kvm_usage_count==0 indicates this. Similarly, kvm will call kvm_arch_hardware_disable() whenever a guest is being terminated (i.e. kvm_usage_count != 0). Therefore, if kvm_arch_hardware_enable/disable() also handled EL2 vector table initialization, we would not need any special handling for the kexec case, as my patch adds. (a long-term solution)
Since arm64 doesn't implement kvm_arch_hardware_enable() (I don't know why), I'm trying to fix the problem by adding a minimal tear-down function, kvm_cpu_reset, and invoking it via a reboot hook. (an interim fix)
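To make the difference between the interim and long-term approaches concrete, here is a minimal userspace sketch. It is not kernel code: struct kvm_model and all model_* functions are hypothetical names invented for illustration, and the model assumes, as described above, that the generic reboot hook only has an effect when the arch enable/disable hooks own the EL2 setup and a guest exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct kvm_model {
	int usage_count;      /* stands in for kvm_usage_count */
	bool el2_installed;   /* true while the modelled EL2 vectors are live */
	bool init_in_enable;  /* true: EL2 setup lives in the enable/disable hooks */
};

static void model_init(struct kvm_model *m, bool init_in_enable)
{
	m->usage_count = 0;
	m->init_in_enable = init_in_enable;
	/* Current behaviour: vectors are installed once at init time. */
	m->el2_installed = !init_in_enable;
}

/* Creating the first guest (count 0 -> 1) triggers the enable hook. */
static void model_create_vm(struct kvm_model *m)
{
	if (m->usage_count++ == 0 && m->init_in_enable)
		m->el2_installed = true;
}

/* Destroying the last guest (count 1 -> 0) triggers the disable hook. */
static void model_destroy_vm(struct kvm_model *m)
{
	assert(m->usage_count > 0);
	if (--m->usage_count == 0 && m->init_in_enable)
		m->el2_installed = false;
}

/* Generic reboot hook: a no-op unless the arch hooks own the EL2 setup. */
static void model_generic_reboot(struct kvm_model *m)
{
	if (m->usage_count > 0 && m->init_in_enable)
		m->el2_installed = false;
}
```

With model_init(&m, false) the vectors stay installed after model_generic_reboot(), which is the kexec problem. With model_init(&m, true) they are only live while at least one guest exists, so no separate reboot hook is needed in the model.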
What I don't understand is why we can't move the init and tear-down functions into kvm_arch_hardware_enable/disable(). They seem to be for precisely what you are implementing, with the only difference being the time that they are called.
I don't know, either. I just followed the discussions between Marc and Geoff, and their conclusion. I guessed that *refactoring* might be more complicated than expected.
FYI, I gave the kvm_arch_hardware_enable() approach a quick try by removing cpu_init_hyp_mode() from init_hyp_mode() and putting it into kvm_arch_hardware_enable(), and it seems to work, at least in my environment: boot => start a kvm guest => kexec reboot => start a kvm guest
Either I'm missing something, or we can simply implement the existing hooks. I assume I'm missing something.
Marc, Geoff, any comments?
+static struct notifier_block kvm_reboot_nb = {
+	.notifier_call = kvm_reboot_notify,
+	.next = NULL,
+	.priority = 0, /* FIXME */
It would be helpful for the comment to explain why this is wrong, and what needs fixing.
Thanks for reminding me of this.
*priority* enforces a calling order of registered hook functions. If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called. (Nevertheless, the reboot sequence will go ahead. See kernel_restart_prepare()/notifier_call_chain().)
So we should make sure that kvm_reboot_notify() is called
- after any hook functions which may depend on kvm, and
Which hooks depend on KVM?
I think I answered this question below:
But how can we guarantee this and choose a priority for kvm_reboot_notify()? Looking into all the occurrences of register_reboot_notifier(),
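- hooks which may depend on kvm => nothing
- hooks which kvm may depend on => virt/kvm/kvm_main.c (priority: 0)
- hooks that may return NOTIFY_STOP_MASK => drivers/cpufreq/s3c2416-cpufreq.c (priority: 0), drivers/cpufreq/s5pv210-cpufreq.c (priority: 0)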
So a priority higher than zero might be safer, but exactly what value? Some hooks use INT_MAX.
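As an illustration of why the priority value matters, here is a small userspace sketch of a priority-ordered notifier chain. It is a simplified model, not the kernel implementation: struct model_nb and the model_* and hook_* names are hypothetical, and the sketch assumes that higher-priority hooks run first, that a hook with equal priority is placed after those already registered, and that the walk stops at the first hook whose return value has the stop bit set.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NOTIFY_DONE      0x0000
#define MODEL_NOTIFY_STOP_MASK 0x8000

/* Hypothetical stand-in for a notifier block. */
struct model_nb {
	int (*call)(struct model_nb *nb);
	struct model_nb *next;
	int priority;
	int called;   /* set to 1 once the hook has been invoked */
};

/* Insert in descending priority order; ties go after existing entries. */
static void model_register(struct model_nb **head, struct model_nb *n)
{
	assert(n != NULL);
	while (*head && n->priority <= (*head)->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Walk the chain; stop as soon as a hook returns the stop bit. */
static int model_call_chain(struct model_nb *head)
{
	struct model_nb *nb;
	int ret = MODEL_NOTIFY_DONE;

	for (nb = head; nb; nb = nb->next) {
		nb->called = 1;
		ret = nb->call(nb);
		if (ret & MODEL_NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

static int hook_ok(struct model_nb *nb)   { (void)nb; return MODEL_NOTIFY_DONE; }
static int hook_stop(struct model_nb *nb) { (void)nb; return MODEL_NOTIFY_STOP_MASK; }
```

In this model, a hook registered at priority 0 after a stopping hook of priority 0 is never called, while the same hook registered at priority 1 runs before the stopping hook regardless of registration order.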
Thanks, -Takahiro AKASHI
- before any hook functions which kvm may depend on, and
Which other hooks does KVM depend on?
- before any hook functions that may return NOTIFY_STOP_MASK
I think this would be solved by using kvm_arch_hardware_enable/disable. As far as I can tell, the VMs would be destroyed earlier (and hence KVM disabled) before we got to the final teardown.
Thanks, Mark.