The KVM_GET_CPUID2 ioctl is not very well documented, but the way it is implemented in kvm_vcpu_ioctl_get_cpuid2() suggests that even on the error path it tries to return the number of entries to the caller. However, the vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl() ignores any output from this function if it sees an error return code.
It's very explicit by the code that it was designed to receive some small number of entries to return E2BIG along with the corrected number.
This lost logic in the dispatcher code is restored by relaxing the check on the function's return code, so that the copy back to userspace is no longer skipped when -E2BIG is returned. With this change, the ioctl caller sees both the number of entries and the correct error.
In the selftests, the relevant function vcpu_get_cpuid() has also been modified to use the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
---
v4:
  - Added description to documentation of KVM_GET_CPUID2.
  - Copy back nent only if E2BIG is returned.
  - Fixed error code sign.
  - Corrected version message

 Documentation/virt/kvm/api.rst                | 81 ++++++++++++-------
 arch/x86/kvm/x86.c                            | 11 ++-
 .../selftests/kvm/lib/x86_64/processor.c      | 20 +++--
 3 files changed, 73 insertions(+), 39 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
   };
 
 
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kmv_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call cal also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
 ------------------------
:Capability: basic @@ -737,7 +764,7 @@ signal mask. };
-4.22 KVM_GET_FPU +4.23 KVM_GET_FPU ----------------
:Capability: basic @@ -766,7 +793,7 @@ Reads the floating point state from the vcpu. };
-4.23 KVM_SET_FPU +4.24 KVM_SET_FPU ----------------
:Capability: basic @@ -795,7 +822,7 @@ Writes the floating point state to the vcpu. };
-4.24 KVM_CREATE_IRQCHIP +4.25 KVM_CREATE_IRQCHIP -----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390) @@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE +4.26 KVM_IRQ_LINE -----------------
:Capability: KVM_CAP_IRQCHIP @@ -886,7 +913,7 @@ be used for a userspace interrupt controller. };
-4.26 KVM_GET_IRQCHIP +4.27 KVM_GET_IRQCHIP --------------------
:Capability: KVM_CAP_IRQCHIP @@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller. };
-4.27 KVM_SET_IRQCHIP +4.28 KVM_SET_IRQCHIP --------------------
:Capability: KVM_CAP_IRQCHIP @@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller. };
-4.28 KVM_XEN_HVM_CONFIG +4.29 KVM_XEN_HVM_CONFIG -----------------------
:Capability: KVM_CAP_XEN_HVM @@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK +4.30 KVM_GET_CLOCK ------------------
:Capability: KVM_CAP_ADJUST_CLOCK @@ -1005,7 +1032,7 @@ TSC is not stable. };
-4.30 KVM_SET_CLOCK +4.31 KVM_SET_CLOCK ------------------
:Capability: KVM_CAP_ADJUST_CLOCK @@ -1027,7 +1054,7 @@ such as migration. };
-4.31 KVM_GET_VCPU_EVENTS +4.32 KVM_GET_VCPU_EVENTS ------------------------
:Capability: KVM_CAP_VCPU_EVENTS @@ -1146,7 +1173,7 @@ directly to the virtual CPU). __u32 reserved[12]; };
-4.32 KVM_SET_VCPU_EVENTS +4.33 KVM_SET_VCPU_EVENTS ------------------------
:Capability: KVM_CAP_VCPU_EVENTS @@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API. See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS +4.34 KVM_GET_DEBUGREGS ----------------------
:Capability: KVM_CAP_DEBUGREGS @@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu. };
-4.34 KVM_SET_DEBUGREGS +4.35 KVM_SET_DEBUGREGS ----------------------
:Capability: KVM_CAP_DEBUGREGS @@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION +4.36 KVM_SET_USER_MEMORY_REGION -------------------------------
:Capability: KVM_CAP_USER_MEMORY @@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR +4.37 KVM_SET_TSS_ADDR ---------------------
:Capability: KVM_CAP_SET_TSS_ADDR @@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP +4.38 KVM_ENABLE_CAP -------------------
:Capability: KVM_CAP_ENABLE_CAP @@ -1390,7 +1417,7 @@ function properly, this is the place to put them. The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE +4.39 KVM_GET_MP_STATE ---------------------
:Capability: KVM_CAP_MP_STATE @@ -1438,7 +1465,7 @@ For arm/arm64: The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE +4.40 KVM_SET_MP_STATE ---------------------
:Capability: KVM_CAP_MP_STATE @@ -1460,7 +1487,7 @@ For arm/arm64: The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR +4.41 KVM_SET_IDENTITY_MAP_ADDR ------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR @@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID +4.42 KVM_SET_BOOT_CPU_ID ------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID @@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation, otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE +4.43 KVM_GET_XSAVE ------------------
:Capability: KVM_CAP_XSAVE @@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error. This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE +4.44 KVM_SET_XSAVE ------------------
:Capability: KVM_CAP_XSAVE @@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace. This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS +4.45 KVM_GET_XCRS -----------------
:Capability: KVM_CAP_XCRS @@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel. This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS +4.46 KVM_SET_XCRS -----------------
:Capability: KVM_CAP_XCRS @@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace. This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID +4.47 KVM_GET_SUPPORTED_CPUID ----------------------------
:Capability: KVM_CAP_EXT_CPUID @@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO +4.48 KVM_PPC_GET_PVINFO -----------------------
 :Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..3f941b1f4e78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
+
 		r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
 					      cpuid_arg->entries);
-		if (r)
+
+		if (r && r != -E2BIG)
 			goto out;
-		r = -EFAULT;
-		if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+		if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+			r = -EFAULT;
 			goto out;
-		r = 0;
+		}
 		break;
 	}
 	case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
 
 	cpuid = allocate_kvm_cpuid2();
 	max_ent = cpuid->nent;
+	cpuid->nent = 0;
 
-	for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
-		rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
-		if (!rc)
-			break;
+	rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+	TEST_ASSERT(rc == -1 && errno == E2BIG,
+		    "KVM_GET_CPUID2 should return E2BIG: %d %d",
+		    rc, errno);
 
-		TEST_ASSERT(rc == -1 && errno == E2BIG,
-		  "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
-		  rc, errno);
-	}
+	TEST_ASSERT(cpuid->nent,
+		    "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+	TEST_ASSERT(cpuid->nent < max_ent,
+		    "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+		    cpuid->nent, max_ent);
 
+	rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
 	TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
 		    rc, errno);
 
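For readers following the thread, the two-call sequence described in the KVM_GET_CPUID2 documentation hunk of this patch can be sketched roughly as below. This is a self-contained model and not real KVM code: struct demo_entry, struct demo_cpuid, demo_get_cpuid2() and demo_fetch_cpuid() are hypothetical stand-ins, and demo_get_cpuid2() only imitates the semantics the patch proposes (fail with E2BIG and write back just the required entry count).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for kvm_cpuid_entry2 and kvm_cpuid2. */
struct demo_entry {
	uint32_t function, eax, ebx, ecx, edx;
};

struct demo_cpuid {
	uint32_t nent;
	struct demo_entry entries[];
};

/* Pretend vcpu state: three entries previously set by userspace. */
static const struct demo_entry demo_vcpu_entries[] = {
	{ 0x0, 0xd, 0, 0, 0 },
	{ 0x1, 0, 0, 0, 0 },
	{ 0x7, 0, 0, 0, 0 },
};
#define DEMO_VCPU_NENT \
	((uint32_t)(sizeof(demo_vcpu_entries) / sizeof(demo_vcpu_entries[0])))

/*
 * Mock of the ioctl with the semantics proposed in the patch (assumption):
 * if the caller's buffer is too small, fail with E2BIG and report only the
 * required number of entries; otherwise copy all entries out.
 */
static int demo_get_cpuid2(struct demo_cpuid *cpuid)
{
	if (cpuid->nent < DEMO_VCPU_NENT) {
		cpuid->nent = DEMO_VCPU_NENT;
		errno = E2BIG;
		return -1;
	}
	memcpy(cpuid->entries, demo_vcpu_entries, sizeof(demo_vcpu_entries));
	cpuid->nent = DEMO_VCPU_NENT;
	return 0;
}

/* Two-call sequence: probe with nent == 0, then allocate and fetch. */
static struct demo_cpuid *demo_fetch_cpuid(void)
{
	struct demo_cpuid *cpuid = calloc(1, sizeof(*cpuid));
	uint32_t nent;

	if (!cpuid)
		return NULL;

	/* First call: nent == 0, so only the required count comes back. */
	if (demo_get_cpuid2(cpuid) == 0)
		return cpuid;		/* vcpu has no entries at all */
	if (errno != E2BIG) {
		free(cpuid);
		return NULL;
	}
	nent = cpuid->nent;
	free(cpuid);

	/* Second call: buffer sized from the count reported by the first. */
	cpuid = calloc(1, sizeof(*cpuid) + nent * sizeof(struct demo_entry));
	if (!cpuid)
		return NULL;
	cpuid->nent = nent;
	if (demo_get_cpuid2(cpuid) != 0) {
		free(cpuid);
		return NULL;
	}
	assert(cpuid->nent == nent);
	return cpuid;
}
```

A real caller would issue ioctl(vcpu_fd, KVM_GET_CPUID2, ...) in place of demo_get_cpuid2(); whether the kernel should behave this way at all is exactly what the replies below dispute.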
On Wed, Apr 28, 2021, Valeriy Vdovin wrote:
> It's very explicit by the code that it was designed to receive some small number of entries to return E2BIG along with the corrected number.
LOL, saying KVM_GET_CPUID2 was "designed" is definitely giving the KVM forefathers the benefit of the doubt.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index efc7a82ab140..3f941b1f4e78 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		r = -EFAULT;
>  		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
>  			goto out;
> +
>  		r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
>  					      cpuid_arg->entries);
> -		if (r)
> +
> +		if (r && r != -E2BIG)
>  			goto out;
> -		r = -EFAULT;
> -		if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
> +
> +		if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
> +			r = -EFAULT;
>  			goto out;
As I pointed out[*], copying the number of entries but not the entries themselves is wrong. All of my feedback on v1 still stands.
[*] https://lkml.kernel.org/r/YIl4M/GgaYvwNuXv@google.com
> -		r = 0;
> +		}
>  		break;
>  	}
>  	case KVM_GET_MSRS: {
On 29/04/21 03:00, Sean Christopherson wrote:
> On Wed, Apr 28, 2021, Valeriy Vdovin wrote:
> > It's very explicit by the code that it was designed to receive some small number of entries to return E2BIG along with the corrected number.
> LOL, saying KVM_GET_CPUID2 was "designed" is definitely giving the KVM forefathers the benefit of the doubt.
I was going to make a different joke, i.e. that KVM_GET_CPUID2 was indeed designed the way Valeriy described, but that design was forgotten soon after.
Really, this ioctl has been such a trainwreck that I think the only good solution here is to drop it.
Paolo
On 5/3/21 5:54 PM, Paolo Bonzini wrote:
> On 29/04/21 03:00, Sean Christopherson wrote:
> > On Wed, Apr 28, 2021, Valeriy Vdovin wrote:
> > > It's very explicit by the code that it was designed to receive some small number of entries to return E2BIG along with the corrected number.
> > LOL, saying KVM_GET_CPUID2 was "designed" is definitely giving the KVM forefathers the benefit of the doubt.
> I was going to make a different joke, i.e. that KVM_GET_CPUID2 was indeed designed the way Valeriy described, but that design was forgotten soon after.
> Really, this ioctl has been such a trainwreck that I think the only good solution here is to drop it.
> Paolo
should we discuss KVM_GET_CPUID3 which will work "normally"?
Den
On 03/05/21 21:18, Denis V. Lunev wrote:
> On 5/3/21 5:54 PM, Paolo Bonzini wrote:
> > On 29/04/21 03:00, Sean Christopherson wrote:
> > > On Wed, Apr 28, 2021, Valeriy Vdovin wrote:
> > > > It's very explicit by the code that it was designed to receive some small number of entries to return E2BIG along with the corrected number.
> > > LOL, saying KVM_GET_CPUID2 was "designed" is definitely giving the KVM forefathers the benefit of the doubt.
> > I was going to make a different joke, i.e. that KVM_GET_CPUID2 was indeed designed the way Valeriy described, but that design was forgotten soon after.
> > Really, this ioctl has been such a trainwreck that I think the only good solution here is to drop it.
> > Paolo
> should we discuss KVM_GET_CPUID3 which will work "normally"?
Is anybody using KVM_GET_CPUID2 at all?
Paolo
On 5/4/21 10:23 AM, Paolo Bonzini wrote:
> On 03/05/21 21:18, Denis V. Lunev wrote:
> > On 5/3/21 5:54 PM, Paolo Bonzini wrote:
> > > On 29/04/21 03:00, Sean Christopherson wrote:
> > > > On Wed, Apr 28, 2021, Valeriy Vdovin wrote:
> > > > > It's very explicit by the code that it was designed to receive some small number of entries to return E2BIG along with the corrected number.
> > > > LOL, saying KVM_GET_CPUID2 was "designed" is definitely giving the KVM forefathers the benefit of the doubt.
> > > I was going to make a different joke, i.e. that KVM_GET_CPUID2 was indeed designed the way Valeriy described, but that design was forgotten soon after.
> > > Really, this ioctl has been such a trainwreck that I think the only good solution here is to drop it.
> > > Paolo
> > should we discuss KVM_GET_CPUID3 which will work "normally"?
> Is anybody using KVM_GET_CPUID2 at all?
> Paolo
As far as I understand, only some testing within the kernel uses it now. Though we have plans to expose it for QAPI, as the series in QEMU
  [PATCH 1/2] qapi: fix error handling for x-vz-query-cpu-model-cpuid
  [PATCH 2/2] qapi: blacklisted x-vz-query-cpu-model-cpuid in tests
is not coming in a good way. The idea was to avoid manual code rework in QEMU and expose the collected model at least for debug.
All in all, some control over things set to the kernel as CPU modes is needed. Without that I am feeling uncomfortable.
Den
On 04/05/21 10:15, Denis V. Lunev wrote:
> As far as I understand only some testing within kernel now. Though we have plans to expose it for QAPI as the series in QEMU [PATCH 1/2] qapi: fix error handling for x-vz-query-cpu-model-cpuid [PATCH 2/2] qapi: blacklisted x-vz-query-cpu-model-cpuid in tests is not coming in a good way. The idea was to avoid manual code rework in QEMU and expose collected model at least for debug.
KVM_GET_CPUID2 as a VM ioctl cannot expose the whole truth about CPUID either, since it doesn't handle the TSX_CTRL_CPUID_CLEAR bit. Given that, QEMU doesn't need KVM_GET_CPUID2; it only needs to save whatever it passed to KVM_SET_CPUID2.
Paolo
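The suggestion above, that userspace only needs to remember what it passed to KVM_SET_CPUID2, could look roughly like the sketch below. Everything here is hypothetical: struct demo_vcpu, demo_set_cpuid() and demo_get_cpuid() are illustrative names and not QEMU or KVM APIs, and demo_kernel_set_cpuid() merely stands in for the real ioctl call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct kvm_cpuid_entry2. */
struct demo_entry {
	uint32_t function, index, eax, ebx, ecx, edx;
};

/* Hypothetical per-vcpu userspace state holding the last CPUID table set. */
struct demo_vcpu {
	struct demo_entry *cpuid_cache;
	uint32_t cpuid_nent;
};

/* Stand-in for ioctl(vcpu_fd, KVM_SET_CPUID2, ...); always succeeds here. */
static int demo_kernel_set_cpuid(const struct demo_entry *entries,
				 uint32_t nent)
{
	(void)entries;
	(void)nent;
	return 0;
}

/* Push the table to the "kernel" and keep a private copy on success. */
static int demo_set_cpuid(struct demo_vcpu *vcpu,
			  const struct demo_entry *entries, uint32_t nent)
{
	struct demo_entry *copy = NULL;

	assert(vcpu != NULL);
	if (nent) {
		copy = malloc(nent * sizeof(*copy));
		if (!copy)
			return -1;
		memcpy(copy, entries, nent * sizeof(*copy));
	}
	if (demo_kernel_set_cpuid(entries, nent) != 0) {
		free(copy);
		return -1;
	}
	free(vcpu->cpuid_cache);
	vcpu->cpuid_cache = copy;
	vcpu->cpuid_nent = nent;
	return 0;
}

/* Return the cached table instead of asking the kernel for it. */
static const struct demo_entry *demo_get_cpuid(const struct demo_vcpu *vcpu,
					       uint32_t *nent)
{
	*nent = vcpu->cpuid_nent;
	return vcpu->cpuid_cache;
}
```

Note that such a cache reflects only what userspace asked for; as the message above points out, neither it nor KVM_GET_CPUID2 accounts for bits such as TSX_CTRL_CPUID_CLEAR.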
On 04.05.21 10:21, Paolo Bonzini wrote:
> On 04/05/21 10:15, Denis V. Lunev wrote:
> > As far as I understand only some testing within kernel now. Though we have plans to expose it for QAPI as the series in QEMU [PATCH 1/2] qapi: fix error handling for x-vz-query-cpu-model-cpuid [PATCH 2/2] qapi: blacklisted x-vz-query-cpu-model-cpuid in tests is not coming in a good way. The idea was to avoid manual code rework in QEMU and expose collected model at least for debug.
> KVM_GET_CPUID2 as a VM ioctl cannot expose the whole truth about CPUID either, since it doesn't handle the TSX_CTRL_CPUID_CLEAR bit. Given that QEMU doesn't need KVM_GET_CPUID2; it only needs to save whatever it passed to KVM_SET_CPUID2.
What if we instead deflect CPUID into user space so it can emulate it in whatever way it likes? Is the performance difference going to be relevant? Are people still using cpuid as barrier these days?
Alex
Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On 04/05/21 11:26, Alexander Graf wrote:
> On 04.05.21 10:21, Paolo Bonzini wrote:
> > On 04/05/21 10:15, Denis V. Lunev wrote:
> > > As far as I understand only some testing within kernel now. Though we have plans to expose it for QAPI as the series in QEMU [PATCH 1/2] qapi: fix error handling for x-vz-query-cpu-model-cpuid [PATCH 2/2] qapi: blacklisted x-vz-query-cpu-model-cpuid in tests is not coming in a good way. The idea was to avoid manual code rework in QEMU and expose collected model at least for debug.
> > KVM_GET_CPUID2 as a VM ioctl cannot expose the whole truth about CPUID either, since it doesn't handle the TSX_CTRL_CPUID_CLEAR bit. Given that QEMU doesn't need KVM_GET_CPUID2; it only needs to save whatever it passed to KVM_SET_CPUID2.
> What if we instead deflect CPUID into user space so it can emulate it in whatever way it likes? Is the performance difference going to be relevant? Are people still using cpuid as barrier these days?
There's enough weirdness in CPUID (e.g. the magic redirection of unknown leaves to the highest Intel leaf) to make it relatively hard to implement correctly. So I think it should remain in the kernel.
Paolo
On Tue, May 4, 2021 at 2:26 AM Alexander Graf <graf@amazon.com> wrote:
> On 04.05.21 10:21, Paolo Bonzini wrote:
> > On 04/05/21 10:15, Denis V. Lunev wrote:
> > > As far as I understand only some testing within kernel now. Though we have plans to expose it for QAPI as the series in QEMU [PATCH 1/2] qapi: fix error handling for x-vz-query-cpu-model-cpuid [PATCH 2/2] qapi: blacklisted x-vz-query-cpu-model-cpuid in tests is not coming in a good way. The idea was to avoid manual code rework in QEMU and expose collected model at least for debug.
> > KVM_GET_CPUID2 as a VM ioctl cannot expose the whole truth about CPUID either, since it doesn't handle the TSX_CTRL_CPUID_CLEAR bit. Given that QEMU doesn't need KVM_GET_CPUID2; it only needs to save whatever it passed to KVM_SET_CPUID2.
> What if we instead deflect CPUID into user space so it can emulate it in whatever way it likes? Is the performance difference going to be relevant? Are people still using cpuid as barrier these days?
What else would they use (in ring 3 code)? Sure, serialize is coming in Sapphire Rapids, but it will be 20+ years before kvm drops support for CPUs without serialize.
> Alex