On Fri, Sep 15, 2023 at 04:41:20AM -0300, Leonardo Bras wrote:
Other than that, all I can think of is removing the features from the guest:
As you commented, some features may be harmless to remove, and others may simply not be used by the workload, so they could be removed too. But this would depend on the feature and the workload, being a custom solution for every case.
Yes, the "fixup back" should be restricted to specific, verified cases.
For this (removing guest features), from the kernel side, I would suggest using SystemTap (and eBPF, IIRC). The procedure should be something like:
- Try to migrate VM from host with older kernel: fail
- Look at qemu error, which features are missing?
- Are those features safely removable from guest ?
- If so, get a SystemTap / eBPF script masking out the undesired bits.
- Try the migration again, it should succeed.
IIRC, this could also be done on the qemu side, with a custom qemu:
- Try to migrate VM from host with older kernel: fail
- Look at qemu error, which features are missing?
- Are those features safely removable from guest ?
- If so, get a custom qemu which masks out the undesired flags before the VM starts
- Live migrate (can be inside the source host) to the custom qemu
- Live migrate from custom qemu to target host.
- The custom qemu could be on an auxiliary host, and used only for this
Yes, it's hard, takes time, and may not solve every case, but it gets a higher chance of the VM surviving in the long run.
Thank you for taking the time to thoroughly consider the issue and suggest some ways out - I really appreciate it.
But keep in mind this is a hack. Taking features from a live guest is not supported in any way, and has a high chance of crashing the VM.
OK - if there's no interest in the below, I will not push for including this patch in the kernel tree any longer. I do think the specific case below is what a vast majority of KVM users will struggle with in the near future, though:
I have a test environment with Broadwell-based guests (which only have AVX-256) running under Skylake (PKRU, AVX512, ...) hypervisors.
I added some pr_debug statements to a guest kernel running under such a hypervisor, with said hypervisor containing neither your nor my patches, and printed the guest's view of `fpu_kernel_cfg.max_features` at boot. It was 0x7, or: XFEATURE_MASK_FP, XFEATURE_MASK_SSE, XFEATURE_MASK_YMM
Thus, I'm pretty sure that all that's happening here is that the guest's FP context is having PKRU/ZMM state saved and restored needlessly by the hypervisor. Stripping it on a live-migration does not seem to have any ill effects in all the testing I have done.
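For anyone wanting to check the decoding of that value, here is a minimal userspace sketch, not kernel code. It assumes the architectural XSAVE state-component bit positions (FP = bit 0 through PKRU = bit 9), which is what the kernel's XFEATURE_MASK_* macros encode; the table and the `decode_xfeatures()` helper are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * XSAVE state-component names indexed by bit position. These follow the
 * architectural layout; the array itself is local to this sketch.
 */
static const char *const xfeature_names[] = {
    "FP", "SSE", "YMM", "BNDREGS", "BNDCSR",
    "OPMASK", "ZMM_Hi256", "Hi16_ZMM", "PT", "PKRU",
};
#define N_XFEATURES (sizeof(xfeature_names) / sizeof(xfeature_names[0]))

/*
 * Hypothetical helper: write a space-separated list of the features set
 * in 'mask' into 'buf' (truncating if 'len' is too small). Returns 'buf'.
 */
static char *decode_xfeatures(uint64_t mask, char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < N_XFEATURES; i++) {
        if (!(mask & (UINT64_C(1) << i)))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, xfeature_names[i], len - strlen(buf) - 1);
    }
    return buf;
}
```

Called as `decode_xfeatures(0x7, buf, sizeof(buf))` with a 128-byte buffer, this yields "FP SSE YMM", matching the three masks listed above.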
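To make the bit arithmetic of "stripping" concrete, here is a rough sketch. It is not the patch itself and does not model how KVM or qemu handle the XSAVE buffer; the macro names, both helpers, and the example hypervisor-side mask of 0x2e7 (FP, SSE, YMM, the three AVX-512 components, and PKRU) are illustrative assumptions. Only the guest value 0x7 comes from the measurement above.

```c
#include <stdint.h>

/* Architectural XSAVE component bits; macro names are local to this sketch. */
#define XF_FP        (UINT64_C(1) << 0)
#define XF_SSE       (UINT64_C(1) << 1)
#define XF_YMM       (UINT64_C(1) << 2)
#define XF_OPMASK    (UINT64_C(1) << 5)
#define XF_ZMM_HI256 (UINT64_C(1) << 6)
#define XF_HI16_ZMM  (UINT64_C(1) << 7)
#define XF_PKRU      (UINT64_C(1) << 9)

/* Keep only the components the guest itself reports as usable. */
static uint64_t strip_unused_xfeatures(uint64_t saved, uint64_t guest_max)
{
    return saved & guest_max;
}

/* Components that would be dropped by the stripping step above. */
static uint64_t dropped_xfeatures(uint64_t saved, uint64_t guest_max)
{
    return saved & ~guest_max;
}
```

With the assumed saved mask 0x2e7 and the guest's 0x7, `strip_unused_xfeatures()` returns 0x7 and `dropped_xfeatures()` returns 0x2e0, i.e. exactly the AVX-512 and PKRU components.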
Cheers, Tyler