Passing the host topology to the guest is almost certainly wrong and will confuse the scheduler. In addition, several fields of these CPUID leaves vary on each processor; it is simply impossible to return the right values from KVM_GET_SUPPORTED_CPUID in such a way that they can be passed to KVM_SET_CPUID2.
The values that will most likely prevent confusion are all zeroes. Userspace will have to override them anyway if it wishes to present a specific topology to the guest.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst | 14 ++++++++++++++
 arch/x86/kvm/cpuid.c           | 32 ++++++++++++++++----------------
 2 files changed, 30 insertions(+), 16 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..20f4f6b302ff 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8249,6 +8249,20 @@ CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``
 It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
 has enabled in-kernel emulation of the local APIC.
 
+CPU topology
+~~~~~~~~~~~~
+
+Several CPUID values include topology information for the host CPU:
+0x0b and 0x1f for Intel systems, 0x8000001e for AMD systems.  Different
+versions of KVM return different values for this information and userspace
+should not rely on it.  Currently they return all zeroes.
+
+If userspace wishes to set up a guest topology, it should be careful that
+the values of these three leaves differ for each CPU.  In particular,
+the APIC ID is found in EDX for all subleaves of 0x0b and 0x1f, and in EAX
+for 0x8000001e; the latter also encodes the core id and node id in bits
+7:0 of EBX and ECX respectively.
+
 Obsolete ioctls and capabilities
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0810e93cbedc..164bfb7e7a16 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -759,16 +759,22 @@ struct kvm_cpuid_array {
 	int nent;
 };
 
+static struct kvm_cpuid_entry2 *get_next_cpuid(struct kvm_cpuid_array *array)
+{
+	if (array->nent >= array->maxnent)
+		return NULL;
+
+	return &array->entries[array->nent++];
+}
+
 static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
 					      u32 function, u32 index)
 {
-	struct kvm_cpuid_entry2 *entry;
+	struct kvm_cpuid_entry2 *entry = get_next_cpuid(array);
 
-	if (array->nent >= array->maxnent)
+	if (!entry)
 		return NULL;
 
-	entry = &array->entries[array->nent++];
 	memset(entry, 0, sizeof(*entry));
 	entry->function = function;
 	entry->index = index;
@@ -945,22 +951,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->edx = edx.full;
 		break;
 	}
-	/*
-	 * Per Intel's SDM, the 0x1f is a superset of 0xb,
-	 * thus they can be handled by common code.
-	 */
 	case 0x1f:
 	case 0xb:
 		/*
-		 * Populate entries until the level type (ECX[15:8]) of the
-		 * previous entry is zero.  Note, CPUID EAX.{0x1f,0xb}.0 is
-		 * the starting entry, filled by the primary do_host_cpuid().
+		 * No topology; a valid topology is indicated by the presence
+		 * of subleaf 1.
 		 */
-		for (i = 1; entry->ecx & 0xff00; ++i) {
-			entry = do_host_cpuid(array, function, i);
-			if (!entry)
-				goto out;
-		}
+		entry->eax = entry->ebx = entry->ecx = 0;
 		break;
 	case 0xd: {
 		u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm();
@@ -1193,6 +1190,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->ebx = entry->ecx = entry->edx = 0;
 		break;
 	case 0x8000001e:
+		/* Do not return host topology information.  */
+		entry->eax = entry->ebx = entry->ecx = 0;
+		entry->edx = 0; /* reserved */
 		break;
 	case 0x8000001F:
 		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
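To make the documented register layout concrete, here is a sketch of how a VMM might build these leaves itself for one vCPU of a simple flat topology (one thread per core, all cores in a single package and node). All names are hypothetical, and `struct cpuid_entry` is a self-contained stand-in for KVM's `struct kvm_cpuid_entry2`; the register layout follows the documentation hunk above. A VMM that advertises leaf 0x1f would mirror the 0x0b entries there as well.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_cpuid_entry2, so this sketch is
 * self-contained. */
struct cpuid_entry {
	uint32_t function, index;
	uint32_t eax, ebx, ecx, edx;
};

/* Fill leaf 0x0b (subleaves 0 and 1) and leaf 0x8000001e for one vCPU
 * of a flat topology.  Per the documentation above: the APIC ID goes in
 * EDX for every subleaf of 0x0b, and in EAX for 0x8000001e, whose
 * EBX[7:0] is the core id and ECX[7:0] the node id. */
static void fill_topology(struct cpuid_entry *e, uint32_t vcpu_id,
			  uint32_t nr_vcpus)
{
	uint32_t shift = 0;

	while ((1u << shift) < nr_vcpus)	/* bits needed for a core id */
		shift++;

	/* 0x0b subleaf 0: SMT level, one logical processor per core. */
	e[0] = (struct cpuid_entry){
		.function = 0x0b, .index = 0,
		.eax = 0,		/* shift to reach the next-level id */
		.ebx = 1,		/* logical processors at this level */
		.ecx = (1u << 8) | 0,	/* level type 1 (SMT), level number 0 */
		.edx = vcpu_id,		/* x2APIC id */
	};

	/* 0x0b subleaf 1: core level, all vCPUs in one package. */
	e[1] = (struct cpuid_entry){
		.function = 0x0b, .index = 1,
		.eax = shift,		/* shift to reach the package id */
		.ebx = nr_vcpus,
		.ecx = (2u << 8) | 1,	/* level type 2 (core), level number 1 */
		.edx = vcpu_id,
	};

	/* 0x8000001e: APIC id in EAX, core id in EBX[7:0], node 0. */
	e[2] = (struct cpuid_entry){
		.function = 0x8000001e, .index = 0,
		.eax = vcpu_id,
		.ebx = vcpu_id & 0xff,	/* core id */
		.ecx = 0,		/* node id 0, single node */
	};
}
```

A VMM would run something like this per vCPU and copy the results into the kvm_cpuid2 array passed to KVM_SET_CPUID2, which is exactly the "differ for each CPU" requirement the documentation hunk states.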
On Thu, Oct 27, 2022 at 2:21 AM Paolo Bonzini pbonzini@redhat.com wrote:
This is a userspace ABI change that breaks existing hypervisors. Please don't do this. Userspace ABIs are supposed to be inviolate.
On 1/25/23 00:16, Jim Mattson wrote:
This is a userspace ABI change that breaks existing hypervisors. Please don't do this. Userspace ABIs are supposed to be inviolate.
What exactly is broken?
Part of the definition of the API is that you can take KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs. Returning host topology information for a random host vCPU definitely violates the contract.
Furthermore, any existing userspace should be prepared to deal with nonexistent host topology leaves. If you mean that said userspace would now pass no topology to the guest, that's not an API change either.
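For instance, a userspace that follows the comment added by the patch ("a valid topology is indicated by the presence of subleaf 1") could detect the situation with a sketch like this (hypothetical names, operating on the entry array returned by KVM_GET_SUPPORTED_CPUID):

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function, index;
	uint32_t eax, ebx, ecx, edx;
};

/* A valid extended-topology enumeration has at least subleaf 1 of leaf
 * 0x0b (or 0x1f).  The patched KVM stops emitting subleaf 1 and zeroes
 * subleaf 0, so this returns 0 and userspace falls back to building its
 * own topology. */
static int has_valid_topology(const struct cpuid_entry *e, size_t nent,
			      uint32_t leaf)
{
	for (size_t i = 0; i < nent; i++)
		if (e[i].function == leaf && e[i].index == 1)
			return 1;
	return 0;
}
```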
(Now, there are certainly other parts of the KVM_GET_SUPPORTED_CPUID contract that should be specified better. But that should be done for each leaf one by one, which I intend to do, and would not extend to these three host topology leaves).
Paolo
On Wed, Jan 25, 2023 at 6:17 AM Paolo Bonzini pbonzini@redhat.com wrote:
On 1/25/23 00:16, Jim Mattson wrote:
This is a userspace ABI change that breaks existing hypervisors. Please don't do this. Userspace ABIs are supposed to be inviolate.
What exactly is broken?
KVM_GET_SUPPORTED_CPUID no longer returns the host topology in leaf 0xB.
Part of the definition of the API is that you can take KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs. Returning host topology information for a random host vCPU definitely violates the contract.
You are attempting to rewrite history. Leaf 0xB was added to KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest cpuid management"), and the only documentation of the KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:
- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm) supports
There is nothing in the commit message or the official documentation at the time that the ioctl was added that says anything about passing the result to KVM_SET_CPUID2 for all vCPUs. Operationally, it is quite clear from the committed code that the intention was to return the host topology information for the current logical processor.
Any future changes to either the operational behavior or the documented behavior of the ABI surely demand a version bump.
On 1/25/23 17:47, Jim Mattson wrote:
Part of the definition of the API is that you can take KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs. Returning host topology information for a random host vCPU definitely violates the contract.
You are attempting to rewrite history. Leaf 0xB was added to > KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest cpuid management"), and the only documentation of the KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:
- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm) supports
[...] the intention was to return the host topology information for the current logical processor.
The handling of unknown features is so naive in that commit that I don't think it is possible to read anything from the implementation; and it certainly should not be a paragon for a future-proof implementation of KVM_GET_SUPPORTED_CPUID.
For example, it only hid _known_ CPUID leaves or features and passed the unknown ones through, which you'll agree is completely broken. It also didn't try to handle all leaves for which ECX might turn out to be significant---which happened for EAX=7 so the commit returns a wrong output for CPUID[EAX=7,ECX=0].EAX.
In other words, the only reason it handles 0xB is because it was known to have subleaves.
We can get more information about how userspace was intended to use it from the qemu-kvm fork, which at the time was practically the only KVM userspace. As of 2009 it was only checking a handful of leaves:
https://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git/tree/target-i386/kvm.c?...
so shall we say that userspace is supposed to build each CPUID leaf one by one and use KVM_GET_SUPPORTED_CPUID for validation only? I think the first committed documentation agrees: "Userspace can use the information returned by this ioctl to construct cpuid information (for KVM_SET_CPUID2) that is consistent with hardware, kernel, and userspace capabilities, and with user requirements".
However, that's the theory. "Do not break userspace" also involves looking at how userspace *really* uses the API, and make compromises to cater to those uses; which is different from rewriting history.
And in practice, people basically stopped reading after "(for KVM_SET_CPUID2)".
For example in kvmtool:
	kvm_cpuid->nent = MAX_KVM_CPUID_ENTRIES;
	if (ioctl(vcpu->kvm->sys_fd, KVM_GET_SUPPORTED_CPUID, kvm_cpuid) < 0)
		die_perror("KVM_GET_SUPPORTED_CPUID failed");

	filter_cpuid(kvm_cpuid, vcpu->cpu_id);

	if (ioctl(vcpu->vcpu_fd, KVM_SET_CPUID2, kvm_cpuid) < 0)
		die_perror("KVM_SET_CPUID2 failed");
where filter_cpuid only does minor adjustments that do not include 0xB, 0x1F and 0x8000001E. The result is a topology that makes no sense if host #vCPUs != guest #vCPUs, and which doesn't include the correct APIC id in EDX.
https://github.com/kvmtool/kvmtool/blob/5657dd3e48b41bc6db38fa657994bc0e030f...
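A filter step that did cover these leaves would at minimum have to patch in the per-vCPU APIC id; a hypothetical sketch (names are not from kvmtool):

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function, index;
	uint32_t eax, ebx, ecx, edx;
};

/* The per-vCPU fixup that filter_cpuid omits: the APIC id lives in EDX
 * for every subleaf of 0x0b and 0x1f, and in EAX (with the core id in
 * EBX[7:0]) for 0x8000001e. */
static void fixup_topology(struct cpuid_entry *e, size_t nent,
			   uint32_t apic_id, uint32_t core_id)
{
	for (size_t i = 0; i < nent; i++) {
		switch (e[i].function) {
		case 0x0b:
		case 0x1f:
			e[i].edx = apic_id;
			break;
		case 0x8000001e:
			e[i].eax = apic_id;
			e[i].ebx = (e[i].ebx & ~0xffu) | (core_id & 0xff);
			break;
		}
	}
}
```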
crosvm does optionally attempt to pass through leaves 0xB and 0x1F, but it fails to adjust the APIC id in EDX. On the other hand it also passes through 0x8000001E if ctx.cpu_config.host_cpu_topology is false, incorrectly. So on one hand this patch breaks host_cpu_topology == true, on the other hand it fixes host_cpu_topology == false on AMD processors.
https://github.com/google/crosvm/blob/cc79897fc0813ee8412e6395648593898962ec...
The rust-vmm reference hypervisor adjusts the APIC id in EDX for 0xB but not for 0x1F. Apart from that it passes through the host topology leaves, again resulting in nonsensical topology for host #vCPUs != guest #vCPUs.
https://github.com/rust-vmm/vmm-reference/blob/5cde58bc955afca8a180585a9f01c...
Firecracker, finally, ignores KVM_GET_SUPPORTED_CPUID's output for 0xb and 0x8000001E (good!) but fails to do the same for 0x1F, so this patch is again a fix of sorts---having all zeroes in 0x1F is better than having a value that is valid but inconsistent with 0xB.
https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f17... https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f17...
So basically the only open source userspace that is penalized (but not broken, and also partly fixed) by the patch is crosvm. QEMU doesn't care, while firecracker/kvmtool/vmm-reference are a net positive.
Paolo
Any future changes to either the operational behavior or the documented behavior of the ABI surely demand a version bump.
On Wed, Jan 25, 2023 at 1:46 PM Paolo Bonzini pbonzini@redhat.com wrote:
The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a decade* have been passed through unmodified from the host. They have never made sense for KVM_SET_CPUID2, with the unlikely exception of a whole-host VM.
Our VMM populates the topology of the guest CPUID table on its own, as any VMM must. However, it uses the host topology (which KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a decade*) to see if the requested guest topology is possible.
Changing a long-established ABI in a way that breaks userspace applications is a bad practice. I didn't think we, as a community, did that. I didn't realize that we were only catering to open source implementations here.
On 1/25/23 23:09, Jim Mattson wrote:
The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a decade* have been passed through unmodified from the host. They have never made sense for KVM_SET_CPUID2, with the unlikely exception of a whole-host VM.
True, unfortunately people have not read the nonexistent documentation and they are:
1) rarely adjusting correctly all of 0xB, 0x1F and 0x8000001E;
2) never bounding CPUID[EAX=0].EAX to a known CPUID leaf, resulting for example in inconsistencies between 0xB and 0x1F.
*But* (2) should not be needed unless you care about maintaining homogeneous CPUID within a VM migration pool. For something like kvmtool, having to do (2) would be a workaround for the bug that this patch fixes.
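The bounding described in (2) amounts to clamping CPUID[EAX=0].EAX; a hypothetical sketch of what a VMM would do:

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function, index;
	uint32_t eax, ebx, ecx, edx;
};

/* Bound the maximum basic leaf to the highest leaf the VMM actually
 * fills in, so the guest cannot see e.g. a leftover host 0x1f next to
 * a hand-built 0xb. */
static void clamp_max_basic_leaf(struct cpuid_entry *e, size_t nent,
				 uint32_t max_known)
{
	for (size_t i = 0; i < nent; i++)
		if (e[i].function == 0 && e[i].index == 0 &&
		    e[i].eax > max_known)
			e[i].eax = max_known;
}
```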
Our VMM populates the topology of the guest CPUID table on its own, as any VMM must. However, it uses the host topology (which KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a decade*) to see if the requested guest topology is possible.
Ok, thanks; this is useful to know.
Changing a long-established ABI in a way that breaks userspace applications is a bad practice. I didn't think we, as a community, did that. I didn't realize that we were only catering to open source implementations here.
We aren't. But the open source implementations provide some guidance as to how the API is being used in the wild, and what the pitfalls are.
You wrote it yourself: any VMM must either populate the topology on its own, or possibly fill it with zeros. Returning a value that is extremely unlikely to be used is worse in pretty much every way (apart from not breaking your VMM, of course).
With a total of six known users (QEMU, crosvm, kvmtool, firecracker, rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and damned if it doesn't. There is a tension between fixing the one VMM that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly, and fixing 3-4 that were silently broken and are now fixed. I will probably send a patch to crosvm, though.
The VMM being _proprietary_ doesn't really matter, however it does matter to me that it is not _public_: it is only used within Google, and the breakage is neither hard to fix in the VMM nor hard to temporarily avoid by reverting the patch in the Google kernel.
Paolo
On Wed, Jan 25, 2023 at 2:44 PM Paolo Bonzini pbonzini@redhat.com wrote:
On 1/25/23 23:09, Jim Mattson wrote:
The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a decade* have been passed through unmodified from the host. They have never made sense for KVM_SET_CPUID2, with the unlikely exception of a whole-host VM.
True, unfortunately people have not read the nonexistent documentation and they are:
1) rarely adjusting correctly all of 0xB, 0x1F and 0x8000001E;

2) never bounding CPUID[EAX=0].EAX to a known CPUID leaf, resulting for example in inconsistencies between 0xB and 0x1F.
*But* (2) should not be needed unless you care about maintaining homogeneous CPUID within a VM migration pool. For something like kvmtool, having to do (2) would be a workaround for the bug that this patch fixes.
Maybe we should just populate up to leaf 3. :-)
Our VMM populates the topology of the guest CPUID table on its own, as any VMM must. However, it uses the host topology (which KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a decade*) to see if the requested guest topology is possible.
Ok, thanks; this is useful to know.
Changing a long-established ABI in a way that breaks userspace applications is a bad practice. I didn't think we, as a community, did that. I didn't realize that we were only catering to open source implementations here.
We aren't. But the open source implementations provide some guidance as to how the API is being used in the wild, and what the pitfalls are.
You wrote it yourself: any VMM must either populate the topology on its own, or possibly fill it with zeros. Returning a value that is extremely unlikely to be used is worse in pretty much every way (apart from not breaking your VMM, of course).
I've complained about this particular ioctl more than I can remember. This is just one of its many problems.
With a total of six known users (QEMU, crosvm, kvmtool, firecracker, rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and damned if it doesn't. There is a tension between fixing the one VMM that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly, and fixing 3-4 that were silently broken and are now fixed. I will probably send a patch to crosvm, though.
The VMM being _proprietary_ doesn't really matter, however it does matter to me that it is not _public_: it is only used within Google, and the breakage is neither hard to fix in the VMM nor hard to temporarily avoid by reverting the patch in the Google kernel.
Sadly, there isn't a single kernel involved. People running our VMM on their desktops are going to be impacted as soon as this patch hits that distro. (I don't know if I can say which distro that is.) So, now we have to get the VMM folks to urgently accommodate this change and get a new distribution out.
On 1/26/23 01:58, Jim Mattson wrote:
You wrote it yourself: any VMM must either populate the topology on its own, or possibly fill it with zeros. Returning a value that is extremely unlikely to be used is worse in pretty much every way (apart from not breaking your VMM, of course).
I've complained about this particular ioctl more than I can remember. This is just one of its many problems.
I agree. At the very least it should have been a VM ioctl.
With a total of six known users (QEMU, crosvm, kvmtool, firecracker, rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and damned if it doesn't. There is a tension between fixing the one VMM that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly, and fixing 3-4 that were silently broken and are now fixed. I will probably send a patch to crosvm, though.
The VMM being _proprietary_ doesn't really matter, however it does matter to me that it is not _public_: it is only used within Google, and the breakage is neither hard to fix in the VMM nor hard to temporarily avoid by reverting the patch in the Google kernel.
Sadly, there isn't a single kernel involved. People running our VMM on their desktops are going to be impacted as soon as this patch hits that distro. (I don't know if I can say which distro that is.) So, now we have to get the VMM folks to urgently accommodate this change and get a new distribution out.
Ok, this is what is needed to make a more informed choice. To be clear, this is _still_ not public (for example it's not ChromeOS), so there is at least some control on what version of the VMM they use? Would it make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
Thanks,
Paolo
On Thu, Jan 26, 2023 at 1:40 AM Paolo Bonzini pbonzini@redhat.com wrote:
On 1/26/23 01:58, Jim Mattson wrote:
You wrote it yourself: any VMM must either populate the topology on its own, or possibly fill it with zeros. Returning a value that is extremely unlikely to be used is worse in pretty much every way (apart from not breaking your VMM, of course).
I've complained about this particular ioctl more than I can remember. This is just one of its many problems.
I agree. At the very least it should have been a VM ioctl.
With a total of six known users (QEMU, crosvm, kvmtool, firecracker, rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and damned if it doesn't. There is a tension between fixing the one VMM that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly, and fixing 3-4 that were silently broken and are now fixed. I will probably send a patch to crosvm, though.
The VMM being _proprietary_ doesn't really matter, however it does matter to me that it is not _public_: it is only used within Google, and the breakage is neither hard to fix in the VMM nor hard to temporarily avoid by reverting the patch in the Google kernel.
Sadly, there isn't a single kernel involved. People running our VMM on their desktops are going to be impacted as soon as this patch hits that distro. (I don't know if I can say which distro that is.) So, now we have to get the VMM folks to urgently accommodate this change and get a new distribution out.
Ok, this is what is needed to make a more informed choice. To be clear, this is _still_ not public (for example it's not ChromeOS), so there is at least some control on what version of the VMM they use? Would it make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
Mainline isn't a problem. I'm more worried about 5.19 LTS.
Thanks!
On 1/26/23 17:06, Jim Mattson wrote:
Sadly, there isn't a single kernel involved. People running our VMM on their desktops are going to be impacted as soon as this patch hits that distro. (I don't know if I can say which distro that is.) So, now we have to get the VMM folks to urgently accommodate this change and get a new distribution out.
Ok, this is what is needed to make a more informed choice. To be clear, this is _still_ not public (for example it's not ChromeOS), so there is at least some control on what version of the VMM they use? Would it make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
Mainline isn't a problem. I'm more worried about 5.19 LTS.
5.19 is not LTS, is it? This patch is only in 6.1.7 and 6.1.8 as far as stable kernels are concerned; should I ask Greg to revert it there?
Paolo
On Thu, Jan 26, 2023 at 9:47 AM Paolo Bonzini pbonzini@redhat.com wrote:
On 1/26/23 17:06, Jim Mattson wrote:
Sadly, there isn't a single kernel involved. People running our VMM on their desktops are going to be impacted as soon as this patch hits that distro. (I don't know if I can say which distro that is.) So, now we have to get the VMM folks to urgently accommodate this change and get a new distribution out.
Ok, this is what is needed to make a more informed choice. To be clear, this is _still_ not public (for example it's not ChromeOS), so there is at least some control on what version of the VMM they use? Would it make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
Mainline isn't a problem. I'm more worried about 5.19 LTS.
5.19 is not LTS, is it? This patch is only in 6.1.7 and 6.1.8 as far as stable kernels is concerned, should I ask Greg to revert it there?
It came to my attention when commit 196c6f0c3e21 ("KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID") broke some of our tests under 5.10 LTS.
If it isn't bound for linux-5.19-y, then we have some breathing room.
On Thu, Jan 26, 2023 at 12:45:58PM -0800, Jim Mattson wrote:
On Thu, Jan 26, 2023 at 9:47 AM Paolo Bonzini pbonzini@redhat.com wrote:
On 1/26/23 17:06, Jim Mattson wrote:
Sadly, there isn't a single kernel involved. People running our VMM on their desktops are going to be impacted as soon as this patch hits that distro. (I don't know if I can say which distro that is.) So, now we have to get the VMM folks to urgently accommodate this change and get a new distribution out.
Ok, this is what is needed to make a more informed choice. To be clear, this is _still_ not public (for example it's not ChromeOS), so there is at least some control on what version of the VMM they use? Would it make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
Mainline isn't a problem. I'm more worried about 5.19 LTS.
5.19 is not LTS, is it? This patch is only in 6.1.7 and 6.1.8 as far as stable kernels is concerned, should I ask Greg to revert it there?
It came to my attention when commit 196c6f0c3e21 ("KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID") broke some of our tests under 5.10 LTS.
If it isn't bound for linux-5.19-y, then we have some breathing room.
5.19 is long end-of-life; it dropped out of being maintained back in October of last year.
You can always use the front page of kernel.org to determine what is still being maintained.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org