Rework how KVM limits guest-unsupported xfeatures to effectively hide
only when saving state for userspace (KVM_GET_XSAVE), i.e. to let userspace
load all host-supported xfeatures (via KVM_SET_XSAVE) irrespective of
what features have been exposed to the guest.
The effect on KVM_SET_XSAVE was knowingly done by commit ad856280ddea
("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0"):
As a bonus, it will also fail if userspace tries to set fpu features
(with the KVM_SET_XSAVE ioctl) that are not compatible to the guest
configuration. Such features will never be returned by KVM_GET_XSAVE
or KVM_GET_XSAVE2.
Peventing userspace from doing stupid things is usually a good idea, but in
this case restricting KVM_SET_XSAVE actually exacerbated the problem that
commit ad856280ddea was fixing. As reported by Tyler, rejecting KVM_SET_XSAVE
for guest-unsupported xfeatures breaks live migration from a kernel without
commit ad856280ddea, to a kernel with ad856280ddea. I.e. from a kernel that
saves guest-unsupported xfeatures to a kernel that doesn't allow loading
guest-unuspported xfeatures.
To make matters even worse, QEMU doesn't terminate if KVM_SET_XSAVE fails,
and so the end result is that the live migration results (possibly silent)
guest data corruption instead of a failed migration.
Patch 1 refactors the FPU code to let KVM pass in a mask of which xfeatures
to save, patch 2 fixes KVM by passing in guest_supported_xcr0 instead of
modifying user_xfeatures directly.
Patches 3-5 are regression tests.
I have no objection if anyone wants patches 1 and 2 squashed together, I
split them purely to make review easier.
Note, this doesn't fix the scenario where a guest is migrated from a "bad"
to a "good" kernel and the target host doesn't support the over-saved set
of xfeatures. I don't see a way to safely handle that in the kernel without
an opt-in, which more or less defeats the purpose of handling it in KVM.
Sean Christopherson (5):
x86/fpu: Allow caller to constrain xfeatures when copying to uabi
buffer
KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
KVM: selftests: Touch relevant XSAVE state in guest for state test
KVM: selftests: Load XSAVE state into untouched vCPU during state test
KVM: selftests: Force load all supported XSAVE state in state test
arch/x86/include/asm/fpu/api.h | 3 +-
arch/x86/kernel/fpu/core.c | 5 +-
arch/x86/kernel/fpu/xstate.c | 12 +-
arch/x86/kernel/fpu/xstate.h | 3 +-
arch/x86/kvm/cpuid.c | 8 --
arch/x86/kvm/x86.c | 37 +++---
.../selftests/kvm/include/x86_64/processor.h | 23 ++++
.../testing/selftests/kvm/x86_64/state_test.c | 110 +++++++++++++++++-
8 files changed, 168 insertions(+), 33 deletions(-)
base-commit: 5804c19b80bf625c6a9925317f845e497434d6d3
--
2.42.0.582.g8ccd20d70d-goog
This series fixes issues observed with selftests/amd-pstate while
running performance comparison tests with different governors. First
patch changes relative paths with absolute path and also change it
with correct path wherever it is broken.
The second patch adds an option to provide perf binary path to
handle the case where distro perf does not work.
Changelog v2->v3:
* Split the patch into two patches
Swapnil Sapkal (2):
selftests/amd-pstate: Fix broken paths to run workloads in amd-pstate-ut
selftests/amd-pstate: Added option to provide perf binary path
.../x86/amd_pstate_tracer/amd_pstate_trace.py | 2 +-
.../testing/selftests/amd-pstate/gitsource.sh | 14 +++++++----
tools/testing/selftests/amd-pstate/run.sh | 23 +++++++++++++------
tools/testing/selftests/amd-pstate/tbench.sh | 4 ++--
4 files changed, 28 insertions(+), 15 deletions(-)
--
2.34.1
Changelog: v2 -> v3
* Minimal code refactoring
* Rebased on v6.6-rc1
RFC v1:
https://lore.kernel.org/all/20210611124154.56427-1-psampat@linux.ibm.com/
RFC v2:
https://lore.kernel.org/all/20230828061530.126588-2-aboorvad@linux.vnet.ibm…
Other related RFC:
https://lore.kernel.org/all/20210430082804.38018-1-psampat@linux.ibm.com/
Userspace selftest:
https://lkml.org/lkml/2020/9/2/356
----
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.
The patchset measures latencies for two kinds of events. IPIs and Timers
As this is a software-only mechanism, there will be additional latencies
of the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU
and the latencies achieved must be in view relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/powerpc/latency_test/ for,
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
Sample output is as follows:
# --IPI Latency Test---
# Baseline Avg IPI latency(ns): 2720
# Observed Avg IPI latency(ns) - State snooze: 2565
# Observed Avg IPI latency(ns) - State stop0_lite: 3856
# Observed Avg IPI latency(ns) - State stop0: 3670
# Observed Avg IPI latency(ns) - State stop1: 3872
# Observed Avg IPI latency(ns) - State stop2: 17421
# Observed Avg IPI latency(ns) - State stop4: 1003922
# Observed Avg IPI latency(ns) - State stop5: 1058870
#
# --Timeout Latency Test--
# Baseline Avg timeout diff(ns): 1435
# Observed Avg timeout diff(ns) - State snooze: 1709
# Observed Avg timeout diff(ns) - State stop0_lite: 2028
# Observed Avg timeout diff(ns) - State stop0: 1954
# Observed Avg timeout diff(ns) - State stop1: 1895
# Observed Avg timeout diff(ns) - State stop2: 14556
# Observed Avg timeout diff(ns) - State stop4: 873988
# Observed Avg timeout diff(ns) - State stop5: 959137
Aboorva Devarajan (2):
powerpc/cpuidle: cpuidle wakeup latency based on IPI and timer events
powerpc/selftest: Add support for cpuidle latency measurement
arch/powerpc/Kconfig.debug | 10 +
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/kernel/test_cpuidle_latency.c | 154 ++++++
tools/testing/selftests/powerpc/Makefile | 1 +
.../powerpc/cpuidle_latency/.gitignore | 2 +
.../powerpc/cpuidle_latency/Makefile | 6 +
.../cpuidle_latency/cpuidle_latency.sh | 443 ++++++++++++++++++
.../powerpc/cpuidle_latency/settings | 1 +
8 files changed, 618 insertions(+)
create mode 100644 arch/powerpc/kernel/test_cpuidle_latency.c
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/Makefile
create mode 100755 tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/settings
--
2.25.1
When execute the following command to test clone3 under !CONFIG_TIME_NS:
# make headers && cd tools/testing/selftests/clone3 && make && ./clone3
we can see the following error info:
# [7538] Trying clone3() with flags 0x80 (size 0)
# Invalid argument - Failed to create new process
# [7538] clone3() with flags says: -22 expected 0
not ok 18 [7538] Result (-22) is different than expected (0)
...
# Totals: pass:18 fail:1 xfail:0 xpass:0 skip:0 error:0
This is because if CONFIG_TIME_NS is not set, but the flag
CLONE_NEWTIME (0x80) is used to clone a time namespace, it
will return -EINVAL in copy_time_ns().
If kernel does not support CONFIG_TIME_NS, /proc/self/ns/time
will be not exist, and then we should skip clone3() test with
CLONE_NEWTIME.
With this patch under !CONFIG_TIME_NS:
# make headers && cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
...
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:1 error:0
Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
---
v6: Rebase on 6.5-rc1 and update the commit message
tools/testing/selftests/clone3/clone3.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index e60cf4d..1c61e3c 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -196,7 +196,12 @@ int main(int argc, char *argv[])
CLONE3_ARGS_NO_TEST);
/* Do a clone3() in a new time namespace */
- test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+ if (access("/proc/self/ns/time", F_OK) == 0) {
+ test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+ } else {
+ ksft_print_msg("Time namespaces are not supported\n");
+ ksft_test_result_skip("Skipping clone3() with CLONE_NEWTIME\n");
+ }
/* Do a clone3() with exit signal (SIGCHLD) in flags */
test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST);
--
2.1.0
v2: Rebase on 6.5-rc1 and update the commit message
Tiezhu Yang (2):
selftests/vDSO: Add support for LoongArch
selftests/vDSO: Get version and name for all archs
tools/testing/selftests/vDSO/vdso_config.h | 6 ++++-
tools/testing/selftests/vDSO/vdso_test_getcpu.c | 16 +++++--------
.../selftests/vDSO/vdso_test_gettimeofday.c | 26 ++++++----------------
3 files changed, 18 insertions(+), 30 deletions(-)
--
2.1.0
And this is the last(?) revision of this series which should now compile
with or without CONFIG_HID_BPF set.
I had to do changes because [1] was failing
Nick, I kept your Tested-by, even if I made small changes in 1/3. Feel
free to shout if you don't want me to keep it.
Eduard, You helped us a lot in the review of v1 but never sent your
Reviewed-by or Acked-by. Do you want me to add one?
Cheers,
Benjamin
[1] https://gitlab.freedesktop.org/bentiss/hid/-/jobs/49754306
For reference, the v2 cover letter:
| Hi, I am sending this series on behalf of myself and Benjamin Tissoires. There
| existed an initial n=3 patch series which was later expanded to n=4 and
| is now back to n=3 with some fixes added in and rebased against
| mainline.
|
| This patch series aims to ensure that the hid/bpf selftests can be built
| without errors.
|
| Here's Benjamin's initial cover letter for context:
| | These fixes have been triggered by [0]:
| | basically, if you do not recompile the kernel first, and are
| | running on an old kernel, vmlinux.h doesn't have the required
| | symbols and the compilation fails.
| |
| | The tests will fail if you run them on that very same machine,
| | of course, but the binary should compile.
| |
| | And while I was sorting out why it was failing, I realized I
| | could do a couple of improvements on the Makefile.
| |
| | [0] https://lore.kernel.org/linux-input/56ba8125-2c6f-a9c9-d498-0ca1c153dcb2@re…
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v3:
- Also overwrite all of the enum symbols in patch 1/3
- Link to v2: https://lore.kernel.org/r/20230908-kselftest-09-08-v2-0-0def978a4c1b@google…
Changes in v2:
- roll Justin's fix into patch 1/3
- add __attribute__((preserve_access_index)) (thanks Eduard)
- rebased onto mainline (2dde18cd1d8fac735875f2e4987f11817cc0bc2c)
- Link to v1: https://lore.kernel.org/r/20230825-wip-selftests-v1-0-c862769020a8@kernel.o…
Link: https://github.com/ClangBuiltLinux/linux/issues/1698
Link: https://github.com/ClangBuiltLinux/continuous-integration2/issues/61
---
Benjamin Tissoires (3):
selftests/hid: ensure we can compile the tests on kernels pre-6.3
selftests/hid: do not manually call headers_install
selftests/hid: force using our compiled libbpf headers
tools/testing/selftests/hid/Makefile | 10 ++-
tools/testing/selftests/hid/progs/hid.c | 3 -
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 77 ++++++++++++++++++++++
3 files changed, 81 insertions(+), 9 deletions(-)
---
base-commit: 29aa98d0fe013e2ab62aae4266231b7fb05d47a2
change-id: 20230825-wip-selftests-9a7502b56542
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
A number of corner cases were caught when trying to run the selftests on
older systems. Missed skip conditions, some error cases, and outdated
python setups would all report failures but the issue would actually be
related to some other condition rather than the selftest suite.
Address these individual cases.
Aaron Conole (4):
selftests: openvswitch: Add version check for pyroute2
selftests: openvswitch: Catch cases where the tests are killed
selftests: openvswitch: Skip drop testing on older kernels
selftests: openvswitch: Fix the ct_tuple for v4
.../selftests/net/openvswitch/openvswitch.sh | 21 ++++++++-
.../selftests/net/openvswitch/ovs-dpctl.py | 46 ++++++++++++++++++-
2 files changed, 65 insertions(+), 2 deletions(-)
--
2.40.1
On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
>
> virtio-net have two usage of hashes: one is RSS and another is hash
> reporting. Conventionally the hash calculation was done by the VMM.
> However, computing the hash after the queue was chosen defeats the
> purpose of RSS.
>
> Another approach is to use eBPF steering program. This approach has
> another downside: it cannot report the calculated hash due to the
> restrictive nature of eBPF.
>
> Introduce the code to compute hashes to the kernel in order to overcome
> thse challenges. An alternative solution is to extend the eBPF steering
> program so that it will be able to report to the userspace, but it makes
> little sense to allow to implement different hashing algorithms with
> eBPF since the hash value reported by virtio-net is strictly defined by
> the specification.
>
> The hash value already stored in sk_buff is not used and computed
> independently since it may have been computed in a way not conformant
> with the specification.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
> ---
> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> + .max_indirection_table_length =
> + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> +
> + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> +};
No need to have explicit capabilities exchange like this? Tun either
supports all or none.
> case TUNSETSTEERINGEBPF:
> - ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
> + bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
> + if (IS_ERR(bpf_ret))
> + ret = PTR_ERR(bpf_ret);
> + else if (bpf_ret)
> + tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
Don't make one feature disable another.
TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
functions. If one is enabled the other call should fail, with EBUSY
for instance.
> + case TUNSETVNETHASH:
> + len = sizeof(vnet_hash);
> + if (copy_from_user(&vnet_hash, argp, len)) {
> + ret = -EFAULT;
> + break;
> + }
> +
> + if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
> + (tun->vnet_hdr_sz < sizeof(struct virtio_net_hdr_v1_hash) ||
> + !tun_is_little_endian(tun))) ||
> + vnet_hash.indirection_table_mask >=
> + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
> + ret = -EINVAL;
> + break;
> + }
> +
> + argp = (u8 __user *)argp + len;
> + len = (vnet_hash.indirection_table_mask + 1) * 2;
> + if (copy_from_user(vnet_hash_indirection_table, argp, len)) {
> + ret = -EFAULT;
> + break;
> + }
> +
> + argp = (u8 __user *)argp + len;
> + len = virtio_net_hash_key_length(vnet_hash.types);
> +
> + if (copy_from_user(vnet_hash_key, argp, len)) {
> + ret = -EFAULT;
> + break;
> + }
Probably easier and less error-prone to define a fixed size control
struct with the max indirection table size.
Btw: please trim the CC: list considerably on future patches.