It is documented in Documentation/admin-guide/hw-vuln/spectre.rst, that
disabling indirect branch speculation for a user-space process creates
more overhead and cause it to run slower. The performance hit varies by
CPU, but on the AMD A4-9120C and A6-9220C CPUs, a simple ping-pong using
pipes between two processes runs ~10x slower when disabling IB
speculation.
Patch 2, included in this RFC but not intended for commit, is a simple
program that demonstrates this issue. Running on a A4-9120C without IB
speculation disabled, each process ping-pong takes ~7us:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1936300, iter/sec: 135383, us/iter: 7
But when IB speculation is disabled, that number increases
significantly:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 16384, t: 1500518, iter/sec: 10918, us/iter: 91
Although this test is a worst-case scenario, we can also consider a real
situation: an audio server (i.e. pulse). If we imagine a low-latency
capture, with 10ms packets and a concurrent task on the same CPU (i.e.
video encoding, for a video call), the audio server will preempt the
CPU at a rate of 100HZ. At 91us overhead per preemption (switching to
and from the audio process), that's 0.9% overhead for one process doing
preemption. In real-world testing (on a A4-9120C), I've seen 9% of CPU
used by IBPB when doing a 2-person video call.
With this patch, the number of IBPBs issued can be reduced to the
minimum necessary, only when there's a potential attacker->victim
process switch.
Running on the same A4-9120C device, this patch reduces the performance
hit of IBPB by ~half, as expected:
localhost ~ # taskset 1 /usr/local/bin/test ds
...
iters: 32768, t: 1824043, iter/sec: 17964, us/iter: 55
It should be noted, CPUs from multiple vendors experience a performance
hit due to IBPB. I also tested a Intel i3-8130U which sees a noticable
(~2x) increase in process switch time due to IBPB.
IB spec enabled:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1210821us, iter/sec: 216501, us/iter: 4
IB spec disabled:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 131072, t: 1257583us, iter/sec: 104225, us/iter: 9
Open questions:
- There are a significant number of task flags, which also now reaches the
limit of the 'long' on 32-bit systems. Should the 'mode' flags be
stored somewhere else?
- Having x86-specific flags in linux/sched.h feels wrong. However, this
is the mechanism for doing atomic flag updates. Is there an alternate
approach?
Open tasks:
- Documentation
- Naming
Changes in v2:
- Make flag per-process using prctl().
Anand K Mistry (2):
x86/speculation: Allow per-process control of when to issue IBPB
selftests: Benchmark for the cost of disabling IB speculation
arch/x86/include/asm/thread_info.h | 4 +
arch/x86/kernel/cpu/bugs.c | 56 +++++++++
arch/x86/kernel/process.c | 10 ++
arch/x86/mm/tlb.c | 51 ++++++--
include/linux/sched.h | 10 ++
include/uapi/linux/prctl.h | 5 +
.../testing/selftests/ib_spec/ib_spec_bench.c | 109 ++++++++++++++++++
7 files changed, 236 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/ib_spec/ib_spec_bench.c
--
2.31.1.498.g6c1eba8ee3d-goog
Hi, a friend and I were chasing bug 205219 [1] listed in Bugzilla.
We step into something a little bit different when trying to reproduce
the buggy behavior. In our try, compilation failed with a message form
make asking us to clean the source tree. We couldn't run kunit_tool
after compiling the kernel for x86, as described by Ted in the
discussion pointed out by the bug report.
Steps to reproduce:
0) Run kunit_tool
$ ./tools/testing/kunit/kunit.py run
Works fine with a clean tree.
1) Compile the kernel for some architecture (we did it for x86_64).
2) Run kunit_tool again
$ ./tools/testing/kunit/kunit.py run
Fails with a message form make asking us to clean the source tree.
Removing the clean source tree check from the top-level Makefile gives
us a similar error to what was described in the bug report. We see that
after running `git clean -fdx` kunit_tool runs nicely again. However,
this is not a real solution since some kernel binaries are erased by git.
We also had a look into the commit messages of Masahiro Yamada but
couldn't quite grasp why the check for the tree to be clean was added.
We could invest more time in this issue but actually don't know how to
proceed. We'd be glad to receive any comment about it. We could also try
something else if it's a too hard issue for beginners.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=205219
Best Regards,
Marcelo
KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in function kvm_vcpu_ioctl_get_cpuid2 suggests that even at
error path it will try to return number of entries to the caller. But
The dispatcher kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl
ignores any output from this function if it sees the error return code.
It's very explicit by the code that it was designed to receive some
small number of entries to return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check for function return code and skip if error is found.
Without it, the ioctl caller will see both the number of entries and the
correct error.
In selftests relevant function vcpu_get_cpuid has also been modified to
utilize the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
v2:
- Added description to documentation of KVM_GET_CPUID2.
- Copy back nent only if E2BIG is returned.
- Fixed error code sign.
Documentation/virt/kvm/api.rst | 81 ++++++++++++-------
arch/x86/kvm/x86.c | 11 ++-
.../selftests/kvm/lib/x86_64/processor.c | 20 +++--
3 files changed, 73 insertions(+), 39 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
};
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kmv_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call cal also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
------------------------
:Capability: basic
@@ -737,7 +764,7 @@ signal mask.
};
-4.22 KVM_GET_FPU
+4.23 KVM_GET_FPU
----------------
:Capability: basic
@@ -766,7 +793,7 @@ Reads the floating point state from the vcpu.
};
-4.23 KVM_SET_FPU
+4.24 KVM_SET_FPU
----------------
:Capability: basic
@@ -795,7 +822,7 @@ Writes the floating point state to the vcpu.
};
-4.24 KVM_CREATE_IRQCHIP
+4.25 KVM_CREATE_IRQCHIP
-----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
@@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE
+4.26 KVM_IRQ_LINE
-----------------
:Capability: KVM_CAP_IRQCHIP
@@ -886,7 +913,7 @@ be used for a userspace interrupt controller.
};
-4.26 KVM_GET_IRQCHIP
+4.27 KVM_GET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller.
};
-4.27 KVM_SET_IRQCHIP
+4.28 KVM_SET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller.
};
-4.28 KVM_XEN_HVM_CONFIG
+4.29 KVM_XEN_HVM_CONFIG
-----------------------
:Capability: KVM_CAP_XEN_HVM
@@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK
+4.30 KVM_GET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1005,7 +1032,7 @@ TSC is not stable.
};
-4.30 KVM_SET_CLOCK
+4.31 KVM_SET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1027,7 +1054,7 @@ such as migration.
};
-4.31 KVM_GET_VCPU_EVENTS
+4.32 KVM_GET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1146,7 +1173,7 @@ directly to the virtual CPU).
__u32 reserved[12];
};
-4.32 KVM_SET_VCPU_EVENTS
+4.33 KVM_SET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS
+4.34 KVM_GET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu.
};
-4.34 KVM_SET_DEBUGREGS
+4.35 KVM_SET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION
+4.36 KVM_SET_USER_MEMORY_REGION
-------------------------------
:Capability: KVM_CAP_USER_MEMORY
@@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR
+4.37 KVM_SET_TSS_ADDR
---------------------
:Capability: KVM_CAP_SET_TSS_ADDR
@@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP
+4.38 KVM_ENABLE_CAP
-------------------
:Capability: KVM_CAP_ENABLE_CAP
@@ -1390,7 +1417,7 @@ function properly, this is the place to put them.
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE
+4.39 KVM_GET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1438,7 +1465,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE
+4.40 KVM_SET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1460,7 +1487,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR
+4.41 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
@@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID
+4.42 KVM_SET_BOOT_CPU_ID
------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID
@@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation,
otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE
+4.43 KVM_GET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error.
This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE
+4.44 KVM_SET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS
+4.45 KVM_GET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel.
This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS
+4.46 KVM_SET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID
+4.47 KVM_GET_SUPPORTED_CPUID
----------------------------
:Capability: KVM_CAP_EXT_CPUID
@@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO
+4.48 KVM_PPC_GET_PVINFO
-----------------------
:Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..3f941b1f4e78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
+
+ if (r && r != -E2BIG)
goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in function kvm_vcpu_ioctl_get_cpuid2 suggests that even at
error path it will try to return number of entries to the caller. But
The dispatcher kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl
ignores any output from this function if it sees the error return code.
It's very explicit by the code that it was designed to receive some
small number of entries to return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check for function return code and skip if error is found.
Without it, the ioctl caller will see both the number of entries and the
correct error.
In selftests relevant function vcpu_get_cpuid has also been modified to
utilize the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
v2:
- Added description to documentation of KVM_GET_CPUID2.
- Copy back nent only if E2BIG is returned.
Documentation/virt/kvm/api.rst | 81 ++++++++++++-------
arch/x86/kvm/x86.c | 11 ++-
.../selftests/kvm/lib/x86_64/processor.c | 20 +++--
3 files changed, 73 insertions(+), 39 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
};
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kmv_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call cal also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
------------------------
:Capability: basic
@@ -737,7 +764,7 @@ signal mask.
};
-4.22 KVM_GET_FPU
+4.23 KVM_GET_FPU
----------------
:Capability: basic
@@ -766,7 +793,7 @@ Reads the floating point state from the vcpu.
};
-4.23 KVM_SET_FPU
+4.24 KVM_SET_FPU
----------------
:Capability: basic
@@ -795,7 +822,7 @@ Writes the floating point state to the vcpu.
};
-4.24 KVM_CREATE_IRQCHIP
+4.25 KVM_CREATE_IRQCHIP
-----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
@@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE
+4.26 KVM_IRQ_LINE
-----------------
:Capability: KVM_CAP_IRQCHIP
@@ -886,7 +913,7 @@ be used for a userspace interrupt controller.
};
-4.26 KVM_GET_IRQCHIP
+4.27 KVM_GET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller.
};
-4.27 KVM_SET_IRQCHIP
+4.28 KVM_SET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller.
};
-4.28 KVM_XEN_HVM_CONFIG
+4.29 KVM_XEN_HVM_CONFIG
-----------------------
:Capability: KVM_CAP_XEN_HVM
@@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK
+4.30 KVM_GET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1005,7 +1032,7 @@ TSC is not stable.
};
-4.30 KVM_SET_CLOCK
+4.31 KVM_SET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1027,7 +1054,7 @@ such as migration.
};
-4.31 KVM_GET_VCPU_EVENTS
+4.32 KVM_GET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1146,7 +1173,7 @@ directly to the virtual CPU).
__u32 reserved[12];
};
-4.32 KVM_SET_VCPU_EVENTS
+4.33 KVM_SET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS
+4.34 KVM_GET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu.
};
-4.34 KVM_SET_DEBUGREGS
+4.35 KVM_SET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION
+4.36 KVM_SET_USER_MEMORY_REGION
-------------------------------
:Capability: KVM_CAP_USER_MEMORY
@@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR
+4.37 KVM_SET_TSS_ADDR
---------------------
:Capability: KVM_CAP_SET_TSS_ADDR
@@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP
+4.38 KVM_ENABLE_CAP
-------------------
:Capability: KVM_CAP_ENABLE_CAP
@@ -1390,7 +1417,7 @@ function properly, this is the place to put them.
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE
+4.39 KVM_GET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1438,7 +1465,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE
+4.40 KVM_SET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1460,7 +1487,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR
+4.41 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
@@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID
+4.42 KVM_SET_BOOT_CPU_ID
------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID
@@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation,
otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE
+4.43 KVM_GET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error.
This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE
+4.44 KVM_SET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS
+4.45 KVM_GET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel.
This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS
+4.46 KVM_SET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID
+4.47 KVM_GET_SUPPORTED_CPUID
----------------------------
:Capability: KVM_CAP_EXT_CPUID
@@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO
+4.48 KVM_PPC_GET_PVINFO
-----------------------
:Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..fa9bb6b751c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
+
+ if (r && r != E2BIG)
goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in function kvm_vcpu_ioctl_get_cpuid2 suggests that even at
error path it will try to return number of entries to the caller. But
The dispatcher kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl
ignores any output from this function if it sees the error return code.
It's very explicit by the code that it was designed to receive some
small number of entries to return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check for function return code and skip if error is found.
Without it, the ioctl caller will see both the number of entries and the
correct error.
In selftests relevant function vcpu_get_cpuid has also been modified to
utilize the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
arch/x86/kvm/x86.c | 10 +++++-----
.../selftests/kvm/lib/x86_64/processor.c | 20 +++++++++++--------
2 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..df8a3e44e722 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
- goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
Hi Linus,
Please pull the following KUnit update for Linux 5.13-rc1.
This KUnit update for Linux 5.13-rc1 consists of several fixes and
new feature to support failure from dynamic analysis tools such as
UBSAN and fake ops for testing.
- a fake ops struct for testing a "free" function to complain if it
was called with an invalid argument, or caught a double-free. Most
return void and have no normal means of signalling failure
(e.g. super_operations, iommu_ops, etc.).
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit a38fd8748464831584a19438cbb3082b5a2dab15:
Linux 5.12-rc2 (2021-03-05 17:33:41 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-kunit-5.13-rc1
for you to fetch changes up to de2fcb3e62013738f22bbb42cbd757d9a242574e:
Documentation: kunit: add tips for using current->kunit_test
(2021-04-07 16:40:37 -0600)
----------------------------------------------------------------
linux-kselftest-kunit-5.13-rc1
This KUnit update for Linux 5.13-rc1 consists of several fixes and
new feature to support failure from dynamic analysis tools such as
UBSAN and fake ops for testing.
- a fake ops struct for testing a "free" function to complain if it
was called with an invalid argument, or caught a double-free. Most
return void and have no normal means of signalling failure
(e.g. super_operations, iommu_ops, etc.).
----------------------------------------------------------------
Daniel Latypov (4):
kunit: make KUNIT_EXPECT_STREQ() quote values, don't print literals
kunit: tool: make --kunitconfig accept dirs, add lib/kunit fragment
kunit: fix -Wunused-function warning for __kunit_fail_current_test
Documentation: kunit: add tips for using current->kunit_test
Lucas Stankus (1):
kunit: Match parenthesis alignment to improve code readability
Uriel Guajardo (1):
kunit: support failure from dynamic analysis tools
Documentation/dev-tools/kunit/tips.rst | 78
+++++++++++++++++++++++++++++++++-
include/kunit/test-bug.h | 29 +++++++++++++
lib/kunit/.kunitconfig | 3 ++
lib/kunit/assert.c | 61 ++++++++++++++++++--------
lib/kunit/test.c | 39 +++++++++++++++--
tools/testing/kunit/kunit.py | 4 +-
tools/testing/kunit/kunit_kernel.py | 2 +
tools/testing/kunit/kunit_tool_test.py | 6 +++
8 files changed, 198 insertions(+), 24 deletions(-)
create mode 100644 include/kunit/test-bug.h
create mode 100644 lib/kunit/.kunitconfig
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 5.13-rc1.
This Kselftest update for Linux 5.13-rc1 consists of:
- fixes and updates to resctrl test from Fenghua Yu and Reinette Chatre
- fixes to Kselftest documentation, framework
- minor spelling correction in timers test
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit a38fd8748464831584a19438cbb3082b5a2dab15:
Linux 5.12-rc2 (2021-03-05 17:33:41 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-next-5.13-rc1
for you to fetch changes up to e75074781f1735c1976bc551e29ccf2ba9a4b17f:
selftests/resctrl: Change a few printed messages (2021-04-07 16:37:49
-0600)
----------------------------------------------------------------
linux-kselftest-next-5.13-rc1
This Kselftest update for Linux 5.13-rc1 consists of:
- fixes and updates to resctrl test from Fenghua Yu and Reinette Chatre
- fixes to Kselftest documentation, framework
- minor spelling correction in timers test
----------------------------------------------------------------
Antonio Terceiro (1):
Documentation: kselftest: fix path to test module files
Colin Ian King (1):
selftests/timers: Fix spelling mistake "clocksourc" -> "clocksource"
Fenghua Yu (20):
selftests/resctrl: Enable gcc checks to detect buffer overflows
selftests/resctrl: Fix compilation issues for global variables
selftests/resctrl: Fix compilation issues for other global variables
selftests/resctrl: Clean up resctrl features check
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Rename CQM test as CMT test
selftests/resctrl: Call kselftest APIs to log test results
selftests/resctrl: Share show_cache_info() by CAT and CMT tests
selftests/resctrl: Add config dependencies
selftests/resctrl: Check for resctrl mount point only if resctrl
FS is supported
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Fix MBA/MBM results reporting format
selftests/resctrl: Don't hard code value of "no_of_bits" variable
selftests/resctrl: Modularize resctrl test suite main() function
selftests/resctrl: Skip the test if requested resctrl feature is
not supported
selftests/resctrl: Fix unmount resctrl FS
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix checking for < 0 for unsigned values
selftests/resctrl: Create .gitignore to include resctrl_tests
selftests/resctrl: Change a few printed messages
Ilya Leoshkevich (1):
selftests: fix prepending $(OUTPUT) to $(TEST_PROGS)
Reinette Chatre (2):
selftests/resctrl: Ensure sibling CPU is not same as original CPU
selftests/resctrl: Fix a printed message
Documentation/dev-tools/kselftest.rst | 4 +-
tools/testing/selftests/lib.mk | 3 +-
tools/testing/selftests/resctrl/.gitignore | 2 +
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/README | 4 +-
tools/testing/selftests/resctrl/cache.c | 52 ++++++-
tools/testing/selftests/resctrl/cat_test.c | 57 +++----
.../selftests/resctrl/{cqm_test.c => cmt_test.c} | 75 +++-------
tools/testing/selftests/resctrl/config | 2 +
tools/testing/selftests/resctrl/fill_buf.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 43 +++---
tools/testing/selftests/resctrl/mbm_test.c | 42 +++---
tools/testing/selftests/resctrl/resctrl.h | 29 +++-
tools/testing/selftests/resctrl/resctrl_tests.c | 163
++++++++++++++-------
tools/testing/selftests/resctrl/resctrl_val.c | 95 +++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 134 ++++++++++-------
.../testing/selftests/timers/clocksource-switch.c | 2 +-
17 files changed, 413 insertions(+), 300 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/.gitignore
rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (56%)
create mode 100644 tools/testing/selftests/resctrl/config
----------------------------------------------------------------
This patchset introduces batched operations for the per-cpu variant of
the array map.
Also updates the batch ops test for arrays.
v4 -> v5:
- Revert removal of percpu macros
v3 -> v4:
- Prefer 'calloc()' over 'malloc()' on batch ops tests
- Add missing static keyword in a couple of test functions
- 'offset' to 'cpu_offset' as suggested by Martin
v2 -> v3:
- Remove percpu macros as suggested by Andrii
- Update tests that used the per cpu macros
v1 -> v2:
- Amended a more descriptive commit message
Pedro Tammela (2):
bpf: add batched ops support for percpu array
bpf: selftests: update array map tests for per-cpu batched ops
kernel/bpf/arraymap.c | 2 +
.../bpf/map_tests/array_map_batch_ops.c | 104 +++++++++++++-----
2 files changed, 77 insertions(+), 29 deletions(-)
--
2.25.1
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied first:
https://lore.kernel.org/patchwork/cover/1412450/
Changelog
=========
v3->v4:
- Fix handling of the shmem private mcopy case. Previously, I had (incorrectly)
assumed that !vma_is_anonymous() was equivalent to "the page will be in the
page cache". But, in this case we have an optimization where we allocate a new
*anonymous* page. So, use a new "bool page_in_cache" instead, which checks if
page->mapping is set. Correct several places with this new check. [Hugh]
- Fix calling mm_counter() before page_add_..._rmap(). [Hugh]
- When modifying shmem_mcopy_atomic_pte() to use the new install_pte() helper,
just use lru_cache_add_inactive_or_unevictable(), no need to branch and maybe
use lru_cache_add(). [Hugh]
- De-pluralize mcopy_atomic_install_pte(s). [Hugh]
- Make "writable" a bool, and initialize consistently. [Hugh]
v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Cleanup context management in self test (make clear implicit, remove unneeded
return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to lookup the existing page in
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
helper we've introduced. We rely on the selftest to show that this change
doesn't break anything.
- Commit 10 is a small documentation update to reflect the new changes.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, uneccessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (10):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
userfaultfd: update documentation to mention shmem minor faults
Documentation/admin-guide/mm/userfaultfd.rst | 3 +-
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 4 +-
include/linux/shmem_fs.h | 17 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 115 +++-----
mm/userfaultfd.c | 175 ++++++++----
tools/testing/selftests/vm/userfaultfd.c | 274 ++++++++++++-------
11 files changed, 364 insertions(+), 251 deletions(-)
--
2.31.1.368.gbe11c130af-goog