This is something that I've been thinking about for a while. We had a
discussion at LPC 2020 about this[1] but the proposals suggested there
never materialised.
In short, it is quite difficult for userspace to detect the feature
capability of syscalls at runtime. This is something a lot of programs
want to do, but they are forced to create elaborate scenarios to try to
figure out if a feature is supported without causing damage to the
system. For the vast majority of cases, each individual feature also
needs to be tested individually (because syscall results are
all-or-nothing), so testing even a single syscall's feature set can
easily inflate the startup time of programs.
This patchset implements the fairly minimal design I proposed in this
talk[2] and in some old LKML threads (though I can't find the exact
references ATM). The general flow looks like:
1. Userspace will indicate to the kernel that a syscall should a be
no-op by setting the top bit of the extensible struct size argument.
We will almost certainly never support exabyte sized structs, so the
top bits are free for us to use as makeshift flag bits. This is
preferable to using the per-syscall flag field inside the structure
because seccomp can easily detect the bit in the flag and allow the
probe or forcefully return -EEXTSYS_NOOP.
2. The kernel will then fill the provided structure with every valid
bit pattern that the current kernel understands.
For flags or other bitflag-like fields, this is the set of valid
flags or bits. For pointer fields or fields that take an arbitrary
value, the field has every bit set (0xFF... to fill the field) to
indicate that any value is valid in the field.
3. The syscall then returns -EEXTSYS_NOOP which is an errno that will
only ever be used for this purpose (so userspace can be sure that
the request succeeded).
On older kernels, the syscall will return a different error (usually
-E2BIG or -EFAULT) and userspace can do their old-fashioned checks.
4. Userspace can then check which flags and fields are supported by
looking at the fields in the returned structure. Flags are checked
by doing an AND with the flags field, and field support can checked
by comparing to 0. In principle you could just AND the entire
structure if you wanted to do this check generically without caring
about the structure contents (this is what libraries might consider
doing).
Userspace can even find out the internal kernel structure size by
passing a PAGE_SIZE buffer and seeing how many bytes are non-zero.
As with copy_struct_from_user(), this is designed to be forward- and
backwards- compatible.
This allows programas to get a one-shot understanding of what features a
syscall supports without having to do any elaborate setups or tricks to
detect support for destructive features. Flags can simply be ANDed to
check if they are in the supported set, and fields can just be checked
to see if they are non-zero.
This patchset is IMHO the simplest way we can add the ability to
introspect the feature set of extensible struct (copy_struct_from_user)
syscalls. It doesn't preclude the chance of a more generic mechanism
being added later.
The intended way of using this interface to get feature information
looks something like the following (imagine that openat2 has gained a
new field and a new flag in the future):
static bool openat2_no_automount_supported;
static bool openat2_cwd_fd_supported;
int check_openat2_support(void)
{
int err;
struct open_how how = {};
err = openat2(AT_FDCWD, ".", &how, CHECK_FIELDS | sizeof(how));
assert(err < 0);
switch (errno) {
case EFAULT: case E2BIG:
/* Old kernel... */
check_support_the_old_way();
break;
case EEXTSYS_NOOP:
openat2_no_automount_supported = (how.flags & RESOLVE_NO_AUTOMOUNT);
openat2_cwd_fd_supported = (how.cwd_fd != 0);
break;
}
}
This series adds CHECK_FIELDS support for the following extensible
struct syscalls, as they are quite likely to grow flags in the near
future:
* openat2
* clone3
* mount_setattr
[1]: https://lwn.net/Articles/830666/
[2]: https://youtu.be/ggD-eb3yPVs
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v3:
- Fix copy_struct_to_user() return values in case of clear_user() failure.
- v2: <https://lore.kernel.org/r/20240906-extensible-structs-check_fields-v2-0-0f4…>
Changes in v2:
- Add CHECK_FIELDS support to mount_setattr(2).
- Fix build failure on architectures with custom errno values.
- Rework selftests to use the tools/ uAPI headers rather than custom
defining EEXTSYS_NOOP.
- Make sure we return -EINVAL and -E2BIG for invalid sizes even if
CHECK_FIELDS is set, and add some tests for that.
- v1: <https://lore.kernel.org/r/20240902-extensible-structs-check_fields-v1-0-545…>
---
Aleksa Sarai (10):
uaccess: add copy_struct_to_user helper
sched_getattr: port to copy_struct_to_user
openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
openat2: add CHECK_FIELDS flag to usize argument
selftests: openat2: add 0xFF poisoned data after misaligned struct
selftests: openat2: add CHECK_FIELDS selftests
clone3: add CHECK_FIELDS flag to usize argument
selftests: clone3: add CHECK_FIELDS selftests
mount_setattr: add CHECK_FIELDS flag to usize argument
selftests: mount_setattr: add CHECK_FIELDS selftest
arch/alpha/include/uapi/asm/errno.h | 3 +
arch/mips/include/uapi/asm/errno.h | 3 +
arch/parisc/include/uapi/asm/errno.h | 3 +
arch/sparc/include/uapi/asm/errno.h | 3 +
fs/namespace.c | 17 ++
fs/open.c | 18 ++
include/linux/uaccess.h | 97 ++++++++
include/uapi/asm-generic/errno.h | 3 +
include/uapi/linux/openat2.h | 2 +
kernel/fork.c | 30 ++-
kernel/sched/syscalls.c | 42 +---
tools/arch/alpha/include/uapi/asm/errno.h | 3 +
tools/arch/mips/include/uapi/asm/errno.h | 3 +
tools/arch/parisc/include/uapi/asm/errno.h | 3 +
tools/arch/sparc/include/uapi/asm/errno.h | 3 +
tools/include/uapi/asm-generic/errno.h | 3 +
tools/include/uapi/asm-generic/posix_types.h | 101 ++++++++
tools/testing/selftests/clone3/.gitignore | 1 +
tools/testing/selftests/clone3/Makefile | 4 +-
.../testing/selftests/clone3/clone3_check_fields.c | 264 +++++++++++++++++++++
tools/testing/selftests/mount_setattr/Makefile | 2 +-
.../selftests/mount_setattr/mount_setattr_test.c | 53 ++++-
tools/testing/selftests/openat2/Makefile | 2 +
tools/testing/selftests/openat2/openat2_test.c | 165 ++++++++++++-
24 files changed, 777 insertions(+), 51 deletions(-)
---
base-commit: 98f7e32f20d28ec452afb208f9cffc08448a2652
change-id: 20240803-extensible-structs-check_fields-a47e94cef691
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
The new option controls tests run on boot or module load. With the new
debugfs "run" dentry allowing to run tests on demand, an ability to disable
automatic tests run becomes a useful option in case of intrusive tests.
The option is set to true by default to preserve the existent behavior. It
can be overridden by either the corresponding module option or by the
corresponding config build option.
Signed-off-by: Stanislav Kinsburskii <skinsburskii(a)linux.microsoft.com>
---
include/kunit/test.h | 4 +++-
lib/kunit/Kconfig | 12 ++++++++++++
lib/kunit/debugfs.c | 2 +-
lib/kunit/executor.c | 21 +++++++++++++++++++--
lib/kunit/test.c | 6 ++++--
5 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 34b71e42fb10..58dbab60f853 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -312,6 +312,7 @@ static inline void kunit_set_failure(struct kunit *test)
}
bool kunit_enabled(void);
+bool kunit_autorun(void);
const char *kunit_action(void);
const char *kunit_filter_glob(void);
char *kunit_filter(void);
@@ -334,7 +335,8 @@ kunit_filter_suites(const struct kunit_suite_set *suite_set,
int *err);
void kunit_free_suite_set(struct kunit_suite_set suite_set);
-int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_suites);
+int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_suites,
+ bool run_tests);
void __kunit_test_suites_exit(struct kunit_suite **suites, int num_suites);
diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index 34d7242d526d..a97897edd964 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -81,4 +81,16 @@ config KUNIT_DEFAULT_ENABLED
In most cases this should be left as Y. Only if additional opt-in
behavior is needed should this be set to N.
+config KUNIT_AUTORUN_ENABLED
+ bool "Default value of kunit.autorun"
+ default y
+ help
+ Sets the default value of kunit.autorun. If set to N then KUnit
+ tests will not run after initialization unless kunit.autorun=1 is
+ passed to the kernel command line. The test can still be run manually
+ via debugfs interface.
+
+ In most cases this should be left as Y. Only if additional opt-in
+ behavior is needed should this be set to N.
+
endif # KUNIT
diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c
index d548750a325a..9df064f40d98 100644
--- a/lib/kunit/debugfs.c
+++ b/lib/kunit/debugfs.c
@@ -145,7 +145,7 @@ static ssize_t debugfs_run(struct file *file,
struct inode *f_inode = file->f_inode;
struct kunit_suite *suite = (struct kunit_suite *) f_inode->i_private;
- __kunit_test_suites_init(&suite, 1);
+ __kunit_test_suites_init(&suite, 1, true);
return count;
}
diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
index 34b7b6833df3..3f39955cb0f1 100644
--- a/lib/kunit/executor.c
+++ b/lib/kunit/executor.c
@@ -29,6 +29,22 @@ const char *kunit_action(void)
return action_param;
}
+/*
+ * Run KUnit tests after initialization
+ */
+#ifdef CONFIG_KUNIT_AUTORUN_ENABLED
+static bool autorun_param = true;
+#else
+static bool autorun_param;
+#endif
+module_param_named(autorun, autorun_param, bool, 0);
+MODULE_PARM_DESC(autorun, "Run KUnit tests after initialization");
+
+bool kunit_autorun(void)
+{
+ return autorun_param;
+}
+
static char *filter_glob_param;
static char *filter_param;
static char *filter_action_param;
@@ -260,13 +276,14 @@ kunit_filter_suites(const struct kunit_suite_set *suite_set,
void kunit_exec_run_tests(struct kunit_suite_set *suite_set, bool builtin)
{
size_t num_suites = suite_set->end - suite_set->start;
+ bool autorun = kunit_autorun();
- if (builtin || num_suites) {
+ if (autorun && (builtin || num_suites)) {
pr_info("KTAP version 1\n");
pr_info("1..%zu\n", num_suites);
}
- __kunit_test_suites_init(suite_set->start, num_suites);
+ __kunit_test_suites_init(suite_set->start, num_suites, autorun);
}
void kunit_exec_list_tests(struct kunit_suite_set *suite_set, bool include_attr)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 089c832e3cdb..146d1b48a096 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -708,7 +708,8 @@ bool kunit_enabled(void)
return enable_param;
}
-int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_suites)
+int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_suites,
+ bool run_tests)
{
unsigned int i;
@@ -731,7 +732,8 @@ int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_
for (i = 0; i < num_suites; i++) {
kunit_init_suite(suites[i]);
- kunit_run_tests(suites[i]);
+ if (run_tests)
+ kunit_run_tests(suites[i]);
}
static_branch_dec(&kunit_running);
The new option controls tests run on boot or module load. With the new
debugfs "run" dentry allowing to run tests on demand, an ability to disable
automatic tests run becomes a useful option in case of intrusive tests.
The option is set to true by default to preserve the existent behavior. It
can be overridden by either the corresponding module option or by the
corresponding config build option.
Signed-off-by: Stanislav Kinsburskii <skinsburskii(a)linux.microsoft.com>
---
include/kunit/test.h | 4 +++-
lib/kunit/Kconfig | 12 ++++++++++++
lib/kunit/debugfs.c | 2 +-
lib/kunit/executor.c | 18 +++++++++++++++++-
lib/kunit/test.c | 6 ++++--
5 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 34b71e42fb10..58dbab60f853 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -312,6 +312,7 @@ static inline void kunit_set_failure(struct kunit *test)
}
bool kunit_enabled(void);
+bool kunit_autorun(void);
const char *kunit_action(void);
const char *kunit_filter_glob(void);
char *kunit_filter(void);
@@ -334,7 +335,8 @@ kunit_filter_suites(const struct kunit_suite_set *suite_set,
int *err);
void kunit_free_suite_set(struct kunit_suite_set suite_set);
-int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_suites);
+int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_suites,
+ bool run_tests);
void __kunit_test_suites_exit(struct kunit_suite **suites, int num_suites);
diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index 34d7242d526d..a97897edd964 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -81,4 +81,16 @@ config KUNIT_DEFAULT_ENABLED
In most cases this should be left as Y. Only if additional opt-in
behavior is needed should this be set to N.
+config KUNIT_AUTORUN_ENABLED
+ bool "Default value of kunit.autorun"
+ default y
+ help
+ Sets the default value of kunit.autorun. If set to N then KUnit
+ tests will not run after initialization unless kunit.autorun=1 is
+ passed to the kernel command line. The test can still be run manually
+ via debugfs interface.
+
+ In most cases this should be left as Y. Only if additional opt-in
+ behavior is needed should this be set to N.
+
endif # KUNIT
diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c
index d548750a325a..9df064f40d98 100644
--- a/lib/kunit/debugfs.c
+++ b/lib/kunit/debugfs.c
@@ -145,7 +145,7 @@ static ssize_t debugfs_run(struct file *file,
struct inode *f_inode = file->f_inode;
struct kunit_suite *suite = (struct kunit_suite *) f_inode->i_private;
- __kunit_test_suites_init(&suite, 1);
+ __kunit_test_suites_init(&suite, 1, true);
return count;
}
diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
index 34b7b6833df3..340723571b0f 100644
--- a/lib/kunit/executor.c
+++ b/lib/kunit/executor.c
@@ -29,6 +29,22 @@ const char *kunit_action(void)
return action_param;
}
+/*
+ * Run KUnit tests after initialization
+ */
+#ifdef CONFIG_KUNIT_AUTORUN_ENABLED
+static bool autorun_param = true;
+#else
+static bool autorun_param;
+#endif
+module_param_named(autorun, autorun_param, bool, 0);
+MODULE_PARM_DESC(autorun, "Run KUnit tests after initialization");
+
+bool kunit_autorun(void)
+{
+ return autorun_param;
+}
+
static char *filter_glob_param;
static char *filter_param;
static char *filter_action_param;
@@ -266,7 +282,7 @@ void kunit_exec_run_tests(struct kunit_suite_set *suite_set, bool builtin)
pr_info("1..%zu\n", num_suites);
}
- __kunit_test_suites_init(suite_set->start, num_suites);
+ __kunit_test_suites_init(suite_set->start, num_suites, kunit_autorun());
}
void kunit_exec_list_tests(struct kunit_suite_set *suite_set, bool include_attr)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 089c832e3cdb..146d1b48a096 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -708,7 +708,8 @@ bool kunit_enabled(void)
return enable_param;
}
-int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_suites)
+int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_suites,
+ bool run_tests)
{
unsigned int i;
@@ -731,7 +732,8 @@ int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_
for (i = 0; i < num_suites; i++) {
kunit_init_suite(suites[i]);
- kunit_run_tests(suites[i]);
+ if (run_tests)
+ kunit_run_tests(suites[i]);
}
static_branch_dec(&kunit_running);
Extend pmu_counters_test to AMD CPUs.
As the AMD PMU is quite different from Intel with different events and
feature sets, this series introduces a new code path to test it,
specifically focusing on the core counters including the
PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
counters exist, but are not as important and can be deferred to a
later series.
The first patch is a bug fix that could be submitted separately.
The series has been tested on both Intel and AMD machines, but I have
not found an AMD machine old enough to lack PerfCtrExtCore. I have
made efforts that no part of the code has any dependency on its
presence.
I am aware of similar work in this direction done by Jinrong Liang
[1]. He told me he is not working on it currently and I am not
intruding by making my own submission.
[1] https://lore.kernel.org/kvm/20231121115457.76269-1-cloudliang@tencent.com/
v2:
* Test all combinations of VM setup rather than only the maximum
allowed by hardware
* Add fixes tag to bug fix in patch 1
* Refine some names
v1:
https://lore.kernel.org/kvm/20240813164244.751597-1-coltonlewis@google.com/
Colton Lewis (6):
KVM: x86: selftests: Fix typos in macro variable use
KVM: x86: selftests: Define AMD PMU CPUID leaves
KVM: x86: selftests: Set up AMD VM in pmu_counters_test
KVM: x86: selftests: Test read/write core counters
KVM: x86: selftests: Test core events
KVM: x86: selftests: Test PerfMonV2
.../selftests/kvm/include/x86_64/processor.h | 7 +
.../selftests/kvm/x86_64/pmu_counters_test.c | 304 ++++++++++++++++--
2 files changed, 277 insertions(+), 34 deletions(-)
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
--
2.46.0.662.g92d0881bb0-goog
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process thread.
This series introduces sentinels for this purposes which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in userland, and a userland process is a thread group (more
specifically the thread group leader from the kernel perspective). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.
We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.
v6:
* Avoid static inline in UAPI header as suggested by Pedro.
* Place PIDFD_SELF values out of range of errors and any other sentinel as
suggested by Pedro.
v5:
* Fixup self test dependencies on pidfd/pidfd.h.
https://lore.kernel.org/linux-mm/cover.1729848252.git.lorenzo.stoakes@oracl…
v4:
* Avoid returning an fd in the __pidfd_get_pid() function as pointed out by
Christian, instead simply always pin the pid and maintain fd scope in the
helper alone.
* Add wrapper header file in tools/include/linux to allow for import of
UAPI pidfd.h header without encountering the collision between system
fcntl.h and linux/fcntl.h as discussed with Shuah and John.
* Fixup tests to import the UAPI pidfd.h header working around conflicts
between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports,
as reported by Shuah.
* Use an int for pidfd_is_self_sentinel() to avoid any dependency on
stdbool.h in userland.
https://lore.kernel.org/linux-mm/cover.1729198898.git.lorenzo.stoakes@oracl…
v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoakes@oracl…
v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl…
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…
RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…
Lorenzo Stoakes (5):
pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
tools: testing: separate out wait_for_pid() into helper header
selftests: pidfd: add pidfd.h UAPI wrapper
selftests: pidfd: add tests for PIDFD_SELF_*
include/linux/pid.h | 34 ++++-
include/uapi/linux/pidfd.h | 10 ++
kernel/exit.c | 4 +-
kernel/nsproxy.c | 1 +
kernel/pid.c | 65 +++++---
kernel/signal.c | 29 +---
tools/include/linux/pidfd.h | 14 ++
tools/testing/selftests/cgroup/test_kill.c | 2 +-
.../pid_namespace/regression_enomem.c | 2 +-
tools/testing/selftests/pidfd/Makefile | 3 +-
tools/testing/selftests/pidfd/pidfd.h | 28 +---
.../selftests/pidfd/pidfd_getfd_test.c | 141 ++++++++++++++++++
tools/testing/selftests/pidfd/pidfd_helpers.h | 39 +++++
.../selftests/pidfd/pidfd_setns_test.c | 11 ++
tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++--
15 files changed, 371 insertions(+), 88 deletions(-)
create mode 100644 tools/include/linux/pidfd.h
create mode 100644 tools/testing/selftests/pidfd/pidfd_helpers.h
--
2.47.0
Recently the loongarch defconfig stopped working with the default 128 MiB
of memory. The VM just spins infinitively.
Increasing the available memory to 1 GiB, similar to s390, fixes the
issue. To avoid having to do this for each architecture on its own,
proactively apply to all architectures.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
tools/testing/selftests/nolibc/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
index 8de98ea7af8071caa0597aa7b86d91a2d1d50e68..e92e0b88586111072a0e043cb15f3b59cf42c3a6 100644
--- a/tools/testing/selftests/nolibc/Makefile
+++ b/tools/testing/selftests/nolibc/Makefile
@@ -130,9 +130,9 @@ QEMU_ARGS_ppc = -M g3beige -append "console=ttyS0 panic=-1 $(TEST:%=NOLIB
QEMU_ARGS_ppc64 = -M powernv -append "console=hvc0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_ppc64le = -M powernv -append "console=hvc0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_riscv = -M virt -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
-QEMU_ARGS_s390 = -M s390-ccw-virtio -m 1G -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
+QEMU_ARGS_s390 = -M s390-ccw-virtio -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_loongarch = -M virt -append "console=ttyS0,115200 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
-QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_BIOS) $(QEMU_ARGS_EXTRA)
+QEMU_ARGS = -m 1G $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_BIOS) $(QEMU_ARGS_EXTRA)
# OUTPUT is only set when run from the main makefile, otherwise
# it defaults to this nolibc directory.
---
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
change-id: 20241007-nolibc-qemu-mem-5ed605520472
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
The idle HLT intercept feature is used for enlightened guests who wish
to securely handle the events. When an enlightened guest does a HLT
while an interrupt is pending, hypervisor will not have a way to
figure out whether the guest needs to be re-entered or not. The Idle
HLT intercept feature allows the HLT execution only if there are no
pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-next/next (64dbb3a771a1) + [2].
Experiments done:
----------------
kvm_amd.avic is set to '0' for this experiment.
The below numbers represent the average of 10 runs.
Normal guest (L1)
The below netperf command was run on the guest with smp = 1 (pinned).
netperf -H <host ip> -t TCP_RR -l 60
----------------------------------------------------------------
|with Idle HLT(transactions/Sec)|w/o Idle HLT(transactions/Sec)|
----------------------------------------------------------------
| 25645.7136 | 25773.2796 |
----------------------------------------------------------------
Number of transactions/sec with and without idle HLT intercept feature
are almost same.
Nested guest (L2)
The below netperf command was run on L2 guest with smp = 1 (pinned).
netperf -H <host ip> -t TCP_RR -l 60
----------------------------------------------------------------
|with Idle HLT(transactions/Sec)|w/o Idle HLT(transactions/Sec)|
----------------------------------------------------------------
| 5655.4468 | 5755.2189 |
----------------------------------------------------------------
Number of transactions/sec with and without idle HLT intercept feature
are almost same.
Testing Done:
- Tested the functionality for the Idle HLT intercept feature
using selftest svm_idle_hlt_test.
- Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
- Tested the Idle HLT intercept functionality on nested guest.
v3 -> v4
- Drop the patches to add vcpu_get_stat() into a new series [2].
- Added nested Idle HLT intercept support.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in
a way that each architecture can share the generic types and also
provide its own.
v1 -> v2
- Done changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which
doesn't use string to get stat data based on the comments from Sean.
- Added self_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
https://bugzilla.kernel.org/attachment.cgi?id=306250
[2]: https://lore.kernel.org/kvm/20241021062226.108657-1-manali.shukla@amd.com/T…
Manali Shukla (4):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
KVM: nSVM: implement the nested idle halt intercept
KVM: selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/governed_features.h | 1 +
arch/x86/kvm/svm/nested.c | 7 ++
arch/x86/kvm/svm/svm.c | 15 +++-
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/x86_64/svm_idle_hlt_test.c | 89 +++++++++++++++++++
9 files changed, 115 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c
base-commit: c8d430db8eec7d4fd13a6bea27b7086a54eda6da
prerequisite-patch-id: ca912571db5c004f77b70843b8dd35517ff1267f
prerequisite-patch-id: 164ea3b4346f9e04bc69819278d20f5e1b5df5ed
prerequisite-patch-id: 90d870f426ebc2cec43c0dd89b701ee998385455
prerequisite-patch-id: 45812b799c517a4521782a1fdbcda881237e1eda
--
2.34.1
Hi,
Here is the v6 patch to support polling on event 'hist' file.
The previous version is here;
https://lore.kernel.org/all/172398710447.295714.4489282566285719918.stgit@d…
This version is rebased on the ftrace/for-next branch of the
linux-trace tree, and use global irq_work and wq instead of per-event
one.
Background
----------
There has been interest in allowing user programs to monitor kernel
events in real time. Ftrace provides `trace_pipe` interface to wait
on events in the ring buffer, but it is needed to wait until filling
up a page with events in the ring buffer. We can also peek the
`trace` file periodically, but that is inefficient way to monitor
a randomely happening event.
Overview
--------
This patch set allows user to `poll`(or `select`, `epoll`) on event
histogram interface. As you know each event has its own `hist` file
which shows histograms generated by trigger action. So user can set
a new hist trigger on any event you want to monitor, and poll on the
`hist` file until it is updated.
There are 2 poll events are supported, POLLIN and POLLPRI. POLLIN
means that there are any readable update on `hist` file and this
event will be flashed only when you call read(). So, this is
useful if you want to read the histogram periodically.
The other POLLPRI event is for monitoring trace event. Like the
POLLIN, this will be returned when the histogram is updated, but
you don't need to read() the file and use poll() again.
Note that this waits for histogram update (not event arrival), thus
you must set a histogram on the event at first.
Usage
-----
Here is an example usage:
----
TRACEFS=/sys/kernel/tracing
EVENT=$TRACEFS/events/sched/sched_process_free
# setup histogram trigger and enable event
echo "hist:key=comm" >> $EVENT/trigger
echo 1 > $EVENT/enable
# Wait for update
poll pri $EVENT/hist
# Event arrived.
echo "process free event is comming"
tail $TRACEFS/trace
----
The 'poll' command is in the selftest patch.
You can take this series also from here;
https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=t…
Thank you,
---
Masami Hiramatsu (Google) (3):
tracing/hist: Add poll(POLLIN) support on hist file
tracing/hist: Support POLLPRI event for poll on histogram
selftests/tracing: Add hist poll() support test
include/linux/trace_events.h | 14 +++
kernel/trace/trace_events.c | 14 +++
kernel/trace/trace_events_hist.c | 100 +++++++++++++++++++-
tools/testing/selftests/ftrace/Makefile | 2
tools/testing/selftests/ftrace/poll.c | 74 +++++++++++++++
.../ftrace/test.d/trigger/trigger-hist-poll.tc | 74 +++++++++++++++
6 files changed, 275 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/poll.c
create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-poll.tc
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
This series was prompted by feedback given in [1].
Patch 1 : Adds safe_hlt() and cli() helpers.
Patch 2, 3: Adds an interface to read vcpu stat in selftest. Adds
a macro to generate compiler error to detect typos at
compile time while parsing vcpu and vm stats.
Patch 4 : Fix few of the selftests based on newly defined macro.
This series was split from the Idle HLT intercept support series [2]
because the series has a few changes in the vm_get_stat() interface
as suggested in [1] and a few changes in two of the self-tests
(nx_huge_pages_test.c and dirty_log_page_splitting_test.c) which use
vm_get_stat() functionality to retrieve specified VM stats. These
changes are unrelated to the Idle HLT intercept support series [2].
[1] https://lore.kernel.org/kvm/ZruDweYzQRRcJeTO@google.com/T/#m7cd7a110f0fcff9…
[2] https://lore.kernel.org/kvm/ZruDweYzQRRcJeTO@google.com/T/#m6c67ca8ccb226e5…
Manali Shukla (4):
KVM: selftests: Add safe_halt() and cli() helpers to common code
KVM: selftests: Add an interface to read the data of named vcpu stat
KVM: selftests: convert vm_get_stat to macro
KVM: selftests: Replace previously used vm_get_stat() to macro
.../testing/selftests/kvm/include/kvm_util.h | 83 +++++++++++++++++--
.../kvm/include/x86_64/kvm_util_arch.h | 52 ++++++++++++
.../selftests/kvm/include/x86_64/processor.h | 17 ++++
tools/testing/selftests/kvm/lib/kvm_util.c | 40 +++++++++
.../x86_64/dirty_log_page_splitting_test.c | 6 +-
.../selftests/kvm/x86_64/nx_huge_pages_test.c | 4 +-
6 files changed, 191 insertions(+), 11 deletions(-)
base-commit: c8d430db8eec7d4fd13a6bea27b7086a54eda6da
--
2.34.1