This patch series implements a new char misc driver, /dev/ntsync, which is used
to implement Windows NT synchronization primitives.
NT synchronization primitives are unique in that the wait functions both are
vectored, operate on multiple types of object with different behaviour (mutex,
semaphore, event), and affect the state of the objects they wait on. This model
is not compatible with existing kernel synchronization objects or interfaces,
and therefore the ntsync driver implements its own wait queues and locking.
Hence I would like to request review from someone familiar with locking to make
sure that the usage of low-level kernel primitives is correct and that the wait
queues work as intended, and to that end I've CC'd the locking maintainers.
== Background ==
The Wine project emulates the Windows API in user space. One particular part of
that API, namely the NT synchronization primitives, have historically been
implemented via RPC to a dedicated "kernel" process. However, more recent
applications use these APIs more strenuously, and the overhead of RPC has become
a bottleneck.
The NT synchronization APIs are too complex to implement on top of existing
primitives without sacrificing correctness. Certain operations, such as
NtPulseEvent() or the "wait-for-all" mode of NtWaitForMultipleObjects(), require
direct control over the underlying wait queue, and implementing a wait queue
sufficiently robust for Wine in user space is not possible. This proposed
driver, therefore, implements the problematic interfaces directly in the Linux
kernel.
This driver was presented at Linux Plumbers Conference 2023. For those further
interested in the history of synchronization in Wine and past attempts to solve
this problem in user space, a recording of the presentation can be viewed here:
https://www.youtube.com/watch?v=NjU4nyWyhU8
== Performance ==
The gain in performance varies wildly depending on the application in question
and the user's hardware. For some games NT synchronization is not a bottleneck
and no change can be observed, but for others frame rate improvements of 50 to
150 percent are not atypical. The following table lists frame rate measurements
from a variety of games on a variety of hardware, taken by users Dmitry
Skvortsov, FuzzyQuils, OnMars, and myself:
Game Upstream ntsync improvement
===========================================================================
Anger Foot 69 99 43%
Call of Juarez 99.8 224.1 125%
Dirt 3 110.6 860.7 678%
Forza Horizon 5 108 160 48%
Lara Croft: Temple of Osiris 141 326 131%
Metro 2033 164.4 199.2 21%
Resident Evil 2 26 77 196%
The Crew 26 51 96%
Tiny Tina's Wonderlands 130 360 177%
Total War Saga: Troy 109 146 34%
===========================================================================
== Patches ==
The intended semantics of the patches are broadly intended to match those of the
corresponding Windows functions. For those not already familiar with the Windows
functions (or their undocumented behaviour), patch 27/27 provides a detailed
specification, and individual patches also include a brief description of the
API they are implementing.
The patches making use of this driver in Wine can be retrieved or browsed here:
https://repo.or.cz/wine/zf.git/shortlog/refs/heads/ntsync5
== Implementation ==
Some aspects of the implementation may deserve particular comment:
* In the interest of performance, each object is governed only by a single
spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
objects be changed as a single atomic operation. In order to achieve this, we
first take a device-wide lock ("wait_all_lock") any time we are going to lock
more than one object at a time.
The maximum number of objects that can be used in a vectored wait, and
therefore the maximum that can be locked simultaneously, is 64. This number is
NT's own limit.
The acquisition of multiple spinlocks will degrade performance. This is a
conscious choice, however. Wait-for-all is known to be a very rare operation
in practice, especially with counts that approach the maximum, and it is the
intent of the ntsync driver to optimize wait-for-any at the expense of
wait-for-all as much as possible.
* NT mutexes are tied to their threads on an OS level, and the kernel includes
builtin support for "robust" mutexes. In order to keep the ntsync driver
self-contained and avoid touching more code than necessary, it does not hook
into task exit nor use pids.
Instead, the user space emulator is expected to manage thread IDs and pass
them as an argument to any relevant functions; this is the "owner" field of
ntsync_wait_args and ntsync_mutex_args.
When the emulator detects that a thread dies, it should therefore call
NTSYNC_IOC_MUTEX_KILL on any open mutexes.
* ntsync is module-capable mostly because there was nothing preventing it, and
because it aided development. It is not a hard requirement, though.
== Previous versions ==
Changes from v3:
* Add .gitignore and use KHDR_INCLUDES in selftest build files, per Muhammad
Usama Anjum.
* Try to explain why we are rolling our own primitives a little better, per Greg
Kroah-Hartman.
* Link to v3: https://lore.kernel.org/lkml/20240329000621.148791-1-zfigura@codeweavers.co…
* Link to v2: https://lore.kernel.org/lkml/20240219223833.95710-1-zfigura@codeweavers.com/
* Link to v1: https://lore.kernel.org/lkml/20240214233645.9273-1-zfigura@codeweavers.com/
* Link to RFC v2: https://lore.kernel.org/lkml/20240131021356.10322-1-zfigura@codeweavers.com/
* Link to RFC v1: https://lore.kernel.org/lkml/20240124004028.16826-1-zfigura@codeweavers.com/
Elizabeth Figura (27):
ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.
ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.
ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.
ntsync: Introduce NTSYNC_IOC_EVENT_SET.
ntsync: Introduce NTSYNC_IOC_EVENT_RESET.
ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.
ntsync: Introduce NTSYNC_IOC_SEM_READ.
ntsync: Introduce NTSYNC_IOC_MUTEX_READ.
ntsync: Introduce NTSYNC_IOC_EVENT_READ.
ntsync: Introduce alertable waits.
selftests: ntsync: Add some tests for semaphore state.
selftests: ntsync: Add some tests for mutex state.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for manual-reset event state.
selftests: ntsync: Add some tests for auto-reset event state.
selftests: ntsync: Add some tests for wakeup signaling with events.
selftests: ntsync: Add tests for alertable waits.
selftests: ntsync: Add some tests for wakeup signaling via alerts.
selftests: ntsync: Add a stress test for contended waits.
maintainers: Add an entry for ntsync.
docs: ntsync: Add documentation for the ntsync uAPI.
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/ntsync.rst | 399 +++++
MAINTAINERS | 9 +
drivers/misc/ntsync.c | 925 ++++++++++-
include/uapi/linux/ntsync.h | 39 +
tools/testing/selftests/Makefile | 1 +
.../selftests/drivers/ntsync/.gitignore | 1 +
.../testing/selftests/drivers/ntsync/Makefile | 7 +
tools/testing/selftests/drivers/ntsync/config | 1 +
.../testing/selftests/drivers/ntsync/ntsync.c | 1407 +++++++++++++++++
10 files changed, 2786 insertions(+), 4 deletions(-)
create mode 100644 Documentation/userspace-api/ntsync.rst
create mode 100644 tools/testing/selftests/drivers/ntsync/.gitignore
create mode 100644 tools/testing/selftests/drivers/ntsync/Makefile
create mode 100644 tools/testing/selftests/drivers/ntsync/config
create mode 100644 tools/testing/selftests/drivers/ntsync/ntsync.c
base-commit: ebbc1a4789c666846b9854ef845a37a64879e5f9
--
2.43.0
Add support for (yet again) more RVA23U64 missing extensions. Add
support for Zcmop, Zca, Zcf, Zcd and Zcb extensions isa string parsing,
hwprobe and kvm support. Zce, Zcmt and Zcmp extensions have been left
out since they target microcontrollers/embedded CPUs and are not needed
by RVA23U64.
Since Zc* extensions states that C implies Zca, Zcf (if F and RV32), Zcd
(if D), this series modifies the way ISA string is parsed and now does
it in two phases. First one parses the string and the second one
validates it for the final ISA description.
This series is based on the Zimop one [1]. An additional fix [2] should
be applied to correctly test that series.
Link: https://lore.kernel.org/linux-riscv/20240404103254.1752834-1-cleger@rivosin… [1]
Link: https://lore.kernel.org/all/20240409143839.558784-1-cleger@rivosinc.com/ [2]
---
v4:
- Modify validate() callbacks to return an 0, -EPROBEDEFER or another
error.
- v3: https://lore.kernel.org/all/20240423124326.2532796-1-cleger@rivosinc.com/
v3:
- Fix typo "exists" -> "exist"
- Remove C implies Zca, Zcd, Zcf, dt-bindings rules
- Rework ISA string resolver to handle dependencies
- v2: https://lore.kernel.org/all/20240418124300.1387978-1-cleger@rivosinc.com/
v2:
- Add Zc* dependencies validation in dt-bindings
- v1: https://lore.kernel.org/lkml/20240410091106.749233-1-cleger@rivosinc.com/
Clément Léger (11):
dt-bindings: riscv: add Zca, Zcf, Zcd and Zcb ISA extension
description
riscv: add ISA extensions validation
riscv: add ISA parsing for Zca, Zcf, Zcd and Zcb
riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensions
RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VM
KVM: riscv: selftests: Add some Zc* extensions to get-reg-list test
dt-bindings: riscv: add Zcmop ISA extension description
riscv: add ISA extension parsing for Zcmop
riscv: hwprobe: export Zcmop ISA extension
RISC-V: KVM: Allow Zcmop extension for Guest/VM
KVM: riscv: selftests: Add Zcmop extension to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 24 ++
.../devicetree/bindings/riscv/extensions.yaml | 90 ++++++
arch/riscv/include/asm/cpufeature.h | 1 +
arch/riscv/include/asm/hwcap.h | 5 +
arch/riscv/include/uapi/asm/hwprobe.h | 5 +
arch/riscv/include/uapi/asm/kvm.h | 5 +
arch/riscv/kernel/cpufeature.c | 259 ++++++++++++------
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kvm/vcpu_onereg.c | 10 +
.../selftests/kvm/riscv/get-reg-list.c | 20 ++
10 files changed, 337 insertions(+), 87 deletions(-)
--
2.43.0
From: Mark Brown <broonie(a)kernel.org>
[ Upstream commit 907f33028871fa7c9a3db1efd467b78ef82cce20 ]
The standard library perror() function provides a convenient way to print
an error message based on the current errno but this doesn't play nicely
with KTAP output. Provide a helper which does an equivalent thing in a KTAP
compatible format.
nolibc doesn't have a strerror() and adding the table of strings required
doesn't seem like a good fit for what it's trying to do so when we're using
that only print the errno.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Stable-dep-of: 071af0c9e582 ("selftests: timers: Convert posix_timers test to generate KTAP output")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
---
tools/testing/selftests/kselftest.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index e8eecbc83a60..ad7b97e16f37 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -48,6 +48,7 @@
#include <stdlib.h>
#include <unistd.h>
#include <stdarg.h>
+#include <string.h>
#include <stdio.h>
#include <sys/utsname.h>
#endif
@@ -156,6 +157,19 @@ static inline void ksft_print_msg(const char *msg, ...)
va_end(args);
}
+static inline void ksft_perror(const char *msg)
+{
+#ifndef NOLIBC
+ ksft_print_msg("%s: %s (%d)\n", msg, strerror(errno), errno);
+#else
+ /*
+ * nolibc doesn't provide strerror() and it seems
+ * inappropriate to add one, just print the errno.
+ */
+ ksft_print_msg("%s: %d)\n", msg, errno);
+#endif
+}
+
static inline void ksft_test_result_pass(const char *msg, ...)
{
int saved_errno = errno;
--
2.44.0.769.g3c40516874-goog
To verify IFS (In Field Scan [1]) driver functionality, add the following 6
test cases:
1. Verify that IFS sysfs entries are created after loading the IFS module
2. Check if loading an invalid IFS test image fails and loading a valid
one succeeds
3. Perform IFS scan test on each CPU using all the available image files
4. Perform IFS scan with first test image file on a random CPU for 3
rounds
5. Perform IFS ARRAY BIST(Board Integrated System Test) test on each CPU
6. Perform IFS ARRAY BIST test on a random CPU for 3 rounds
These are not exhaustive, but some minimal test runs to check various
parts of the driver. Some negative tests are also included.
[1] https://docs.kernel.org/arch/x86/ifs.html
Pengfei Xu (4):
selftests: ifs: verify test interfaces are created by the driver
selftests: ifs: verify test image loading functionality
selftests: ifs: verify IFS scan test functionality
selftests: ifs: verify IFS ARRAY BIST functionality
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
.../drivers/platform/x86/intel/ifs/Makefile | 6 +
.../platform/x86/intel/ifs/test_ifs.sh | 496 ++++++++++++++++++
4 files changed, 504 insertions(+)
create mode 100644 tools/testing/selftests/drivers/platform/x86/intel/ifs/Makefile
create mode 100755 tools/testing/selftests/drivers/platform/x86/intel/ifs/test_ifs.sh
--
2.43.0
It seems obvious once you know, but at first I didn't realise that the
suite name is part of this format. Document it and add example.
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
Documentation/dev-tools/kunit/run_wrapper.rst | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/Documentation/dev-tools/kunit/run_wrapper.rst b/Documentation/dev-tools/kunit/run_wrapper.rst
index 19ddf5e07013..e75a5fc05814 100644
--- a/Documentation/dev-tools/kunit/run_wrapper.rst
+++ b/Documentation/dev-tools/kunit/run_wrapper.rst
@@ -156,13 +156,20 @@ Filtering tests
===============
By passing a bash style glob filter to the ``exec`` or ``run``
-commands, we can run a subset of the tests built into a kernel . For
+commands, we can run a subset of the tests built into a kernel,
+identified by a string like ``$suite_name.$test_name``. For
example: if we only want to run KUnit resource tests, use:
.. code-block::
./tools/testing/kunit/kunit.py run 'kunit-resource*'
+Or to run just one specific test from that suite:
+
+.. code-block::
+
+ ./tools/testing/kunit/kunit.py run 'kunit-resource-test.kunit_resource_test_init_resources'
+
This uses the standard glob format with wildcard characters.
.. _kunit-on-qemu:
--
2.44.0.396.g6e790dbe36-goog
It seems obvious once you know, but at first I didn't realise that the
suite name is part of this format. Document it and add some examples.
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
v1->v2: Expanded to clarify that suite_glob and test_glob are two separate
patterns. Also made some other trivial changes to formatting etc.
Documentation/dev-tools/kunit/run_wrapper.rst | 33 +++++++++++++++++--
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/Documentation/dev-tools/kunit/run_wrapper.rst b/Documentation/dev-tools/kunit/run_wrapper.rst
index 19ddf5e07013..b07252d3fa9d 100644
--- a/Documentation/dev-tools/kunit/run_wrapper.rst
+++ b/Documentation/dev-tools/kunit/run_wrapper.rst
@@ -156,12 +156,39 @@ Filtering tests
===============
By passing a bash style glob filter to the ``exec`` or ``run``
-commands, we can run a subset of the tests built into a kernel . For
-example: if we only want to run KUnit resource tests, use:
+commands, we can run a subset of the tests built into a kernel,
+identified by a string like ``<suite_glob>[.<test_glob>]``.
+
+For example, to run the ``kunit-resource-test`` suite:
+
+.. code-block::
+
+ ./tools/testing/kunit/kunit.py run kunit-resource-test
+
+To run a specific test from that suite:
+
+.. code-block::
+
+ ./tools/testing/kunit/kunit.py run kunit-resource-test.kunit_resource_test
+
+To run all tests from suites whose names start with ``kunit``:
+
+.. code-block::
+
+ ./tools/testing/kunit/kunit.py run 'kunit*'
+
+To run all tests whose name ends with ``remove_resource``:
+
+.. code-block::
+
+ ./tools/testing/kunit/kunit.py run '*.*remove_resource'
+
+To run all tests whose name ends with ``remove_resource``, from suites whose
+names start with ``kunit``:
.. code-block::
- ./tools/testing/kunit/kunit.py run 'kunit-resource*'
+ ./tools/testing/kunit/kunit.py run 'kunit*.*remove_resource'
This uses the standard glob format with wildcard characters.
--
2.44.0.478.gd926399ef9-goog
Clean up the KVM clock mess somewhat so that it is either based on the guest
TSC ("master clock" mode), or on the host CLOCK_MONOTONIC_RAW in cases where
the TSC isn't usable.
Eliminate the third variant where it was based directly on the *host* TSC,
due to bugs in e.g. __get_kvmclock().
Kill off the last vestiges of the KVM clock being based on CLOCK_MONOTONIC
instead of CLOCK_MONOTONIC_RAW and thus being subject to NTP skew.
Fix up migration support to allow the KVM clock to be saved/restored as an
arithmetic function of the guest TSC, since that's what it actually is in
the *common* case so it can be migrated precisely. Or at least to within
±1 ns which is good enough, as discussed in
https://lore.kernel.org/kvm/c8dca08bf848e663f192de6705bf04aa3966e856.camel@…
In v2 of this series, TSC synchronization is improved and simplified a bit
too, and we allow masterclock mode to be used even when the guest TSCs are
out of sync, as long as they're running at the same *rate*. The different
*offset* shouldn't matter.
And the kvm_get_time_scale() function annoyed me by being entirely opaque,
so I studied it until my brain hurt and then added some comments.
In v2 I also dropped the commits which were removing the periodic clock
syncs. Those are going to be needed still but *only* for non-masterclock
mode, which I'll do next. Along with ensuring that a masterclock update
while already in masterclock mode doesn't jump the clock, and just does
the same as KVM_SET_CLOCK_GUEST does to preserve it.
Needs a *lot* more testing. I think I'm almost done refactoring the code,
so should focus on building up the tests next.
(I do still hate that we're abusing KVM_GET_CLOCK just to get the tuple
of {host_tsc, CLOCK_REALTIME} without even *caring* about the eponymous
KVM clock. Especially as this information is (a) fundamentally what the
vDSO gettimeofday() exposes to us anyway, (b) using CLOCK_REALTIME not
TAI, (c) not available on other platforms, for example for migrating
the Arm arch counter.)
David Woodhouse (13):
KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
KVM: x86: Fix KVM clock precision in __get_kvmclock()
KVM: x86: Fix software TSC upscaling in kvm_update_guest_time()
KVM: x86: Simplify and comment kvm_get_time_scale()
KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset()
KVM: x86: Improve synchronization in kvm_synchronize_tsc()
KVM: x86: Kill cur_tsc_{nsec,offset,write} fields
KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
KVM: x86: Factor out kvm_use_master_clock()
Jack Allister (2):
KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
KVM: selftests: Add KVM/PV clock selftest to prove timer correction
Documentation/virt/kvm/api.rst | 37 ++
Documentation/virt/kvm/devices/vcpu.rst | 115 +++-
arch/x86/include/asm/kvm_host.h | 15 +-
arch/x86/include/uapi/asm/kvm.h | 6 +
arch/x86/kvm/svm/svm.c | 3 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 687 +++++++++++++++-------
arch/x86/kvm/xen.c | 4 +-
include/uapi/linux/kvm.h | 3 +
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/x86_64/pvclock_test.c | 192 ++++++
11 files changed, 822 insertions(+), 243 deletions(-)
There is a 'malloc' call in test_vmx_nested_state function, which can
be unsuccessful. This patch will add the malloc failure checking
to avoid possible null dereference and give more information
about test fail reasons.
Signed-off-by: Kunwu Chan <chentao(a)kylinos.cn>
---
tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c b/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c
index 67a62a5a8895..18afc2000a74 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c
@@ -91,6 +91,7 @@ void test_vmx_nested_state(struct kvm_vcpu *vcpu)
const int state_sz = sizeof(struct kvm_nested_state) + getpagesize();
struct kvm_nested_state *state =
(struct kvm_nested_state *)malloc(state_sz);
+ TEST_ASSERT(state, "-ENOMEM when allocating kvm state");
/* The format must be set to 0. 0 for VMX, 1 for SVM. */
set_default_vmx_state(state, state_sz);
--
2.40.1