On Thu, Sep 14, 2023 at 08:58:42PM -0400, Tyler Stachecki wrote:
On Thu, Sep 14, 2023 at 10:05:57AM -0700, Dongli Zhang wrote:
That is:
Without the commit (src and dst), something bad may happen.
With the commit on src, issue is fixed.
With the commit only on dst, it is expected that the issue is not fixed.
Therefore, from the administrator's perspective, the bugfix should always be applied on the source server for the migration to succeed.
I fully agree. Though, I think this boils down to: The commit must be on the source or something bad may happen.
It then follows that you cannot live-migrate guests off the source to patch it without potentially corrupting the guests currently running on that source...
Well, the bug was a really bad issue, and even the solution does not solve all problems.
As we discussed, there is no way of safely removing any feature from the guest without potential issues. One potential solution would be having hosts that implement the missing guest features needed for the VMs, but this may be far from easy depending on the missing feature.
Other than that, all I can think of is removing the features from the guest:
As you commented, some features may be harmless to remove, and others may not be used by the workload at all and could be removed too. But this depends on the feature and on the workload, so it ends up being a custom solution for every case.
For this (removing guest features), from the kernel side, I would suggest using SystemTap (and eBPF, IIRC). The procedure should be something like:
- Try to migrate the VM from the host with the older kernel: fail
- Look at the qemu error: which features are missing?
- Are those features safely removable from the guest?
- If so, get a SystemTap / eBPF script masking out the undesired bits.
- Try the migration again, it should succeed.

IIRC, this could also be done on the qemu side, with a custom qemu:
- Try to migrate the VM from the host with the older kernel: fail
- Look at the qemu error: which features are missing?
- Are those features safely removable from the guest?
- If so, get a custom qemu which masks out those flags before the VM starts
- Live migrate (can be inside the source host) to the custom qemu
- Live migrate from the custom qemu to the target host.
- The custom qemu could be on an auxiliary host, and used only for this
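
Both procedures come down to the same bit arithmetic: find which feature bits the guest has that the target lacks, check that all of them are ones you judged safe to drop, and clear them. A rough sketch of just that part is below. The feature names and bit positions are made up for illustration, and this is not the SystemTap / eBPF script or the qemu change itself, only the check and the mask they would apply.

```python
# Hypothetical feature bits, for illustration only. Real names and bit
# positions depend on which features the qemu error reports as missing.
FEATURE_A = 1 << 0
FEATURE_B = 1 << 1
FEATURE_C = 1 << 2


def missing_features(guest: int, target: int) -> int:
    """Bits the guest currently has that the target host cannot provide."""
    return guest & ~target


def can_mask(missing: int, removable: int) -> bool:
    """True only if every missing bit was judged safe to remove."""
    return missing & ~removable == 0


def mask_out(guest: int, undesired: int) -> int:
    """Clear the undesired bits, leaving all other guest features alone."""
    return guest & ~undesired


guest = FEATURE_A | FEATURE_B | FEATURE_C
target = FEATURE_A | FEATURE_B      # target lacks FEATURE_C
removable = FEATURE_C               # decided per feature and per workload

missing = missing_features(guest, target)
if can_mask(missing, removable):
    guest = mask_out(guest, missing)
```

If can_mask() is false, at least one missing feature is not in the removable set, and neither procedure should be attempted for that VM.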
Yes, it's hard, takes time, and may not solve every case, but it gives the VM a higher chance of surviving in the long run.
But keep in mind this is a hack. Taking features from a live guest is not supported in any way, and has a high chance of crashing the VM.
Best regards, Leo
Regards, Tyler