From: Jiri Olsa <jolsa(a)redhat.com>
[ Upstream commit 5e22dd18626726028a93ff1350a8a71a00fd843d ]
The tc_redirect umounts /sys in the new namespace, which can be
mounted as shared and cause global umount. The lazy umount also
takes down mounted trees under /sys like debugfs, which won't be
available after sysfs mounts again and could cause fails in other
tests.
# cat /proc/self/mountinfo | grep debugfs
34 23 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:14 - debugfs debugfs rw
# cat /proc/self/mountinfo | grep sysfs
23 86 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
# mount | grep debugfs
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
# ./test_progs -t tc_redirect
#164 tc_redirect:OK
Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED
# mount | grep debugfs
# cat /proc/self/mountinfo | grep debugfs
# cat /proc/self/mountinfo | grep sysfs
25 86 0:22 / /sys rw,relatime shared:2 - sysfs sysfs rw
Making the sysfs private under the new namespace so the umount won't
trigger the global sysfs umount.
Reported-by: Hangbin Liu <haliu(a)redhat.com>
Signed-off-by: Jiri Olsa <jolsa(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: Jussi Maki <joamaki(a)gmail.com>
Link: https://lore.kernel.org/bpf/20220104121030.138216-1-jolsa@kernel.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/prog_tests/tc_redirect.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
index e7201ba29ccd6..47e3159729d21 100644
--- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
@@ -105,6 +105,13 @@ static int setns_by_fd(int nsfd)
if (!ASSERT_OK(err, "unshare"))
return err;
+ /* Make our /sys mount private, so the following umount won't
+ * trigger the global umount in case it's shared.
+ */
+ err = mount("none", "/sys", NULL, MS_PRIVATE, NULL);
+ if (!ASSERT_OK(err, "remount private /sys"))
+ return err;
+
err = umount2("/sys", MNT_DETACH);
if (!ASSERT_OK(err, "umount2 /sys"))
return err;
--
2.34.1
From: David Gow <davidgow(a)google.com>
[ Upstream commit 37dbb4c7c7442dbfc9b651e4ddd4afe30b26afc9 ]
It's possible that a parameterised test could end up with zero
parameters. At the moment, the test function will nevertheless be called
with NULL as the parameter. Instead, don't try to run the test code, and
just mark the test as SKIPped.
Reported-by: Daniel Latypov <dlatypov(a)google.com>
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kunit/test.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index f246b847024e3..9aef816e573c1 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -504,16 +504,18 @@ int kunit_run_tests(struct kunit_suite *suite)
struct kunit_result_stats param_stats = { 0 };
test_case->status = KUNIT_SKIPPED;
- if (test_case->generate_params) {
+ if (!test_case->generate_params) {
+ /* Non-parameterised test. */
+ kunit_run_case_catch_errors(suite, test_case, &test);
+ kunit_update_stats(¶m_stats, test.status);
+ } else {
/* Get initial param. */
param_desc[0] = '\0';
test.param_value = test_case->generate_params(NULL, param_desc);
- }
- do {
- kunit_run_case_catch_errors(suite, test_case, &test);
+ while (test.param_value) {
+ kunit_run_case_catch_errors(suite, test_case, &test);
- if (test_case->generate_params) {
if (param_desc[0] == '\0') {
snprintf(param_desc, sizeof(param_desc),
"param-%d", test.param_index);
@@ -530,11 +532,11 @@ int kunit_run_tests(struct kunit_suite *suite)
param_desc[0] = '\0';
test.param_value = test_case->generate_params(test.param_value, param_desc);
test.param_index++;
- }
- kunit_update_stats(¶m_stats, test.status);
+ kunit_update_stats(¶m_stats, test.status);
+ }
+ }
- } while (test.param_value);
kunit_print_test_stats(&test, param_stats);
--
2.34.1
From: Heiko Carstens <hca(a)linux.ibm.com>
[ Upstream commit e5992f373c6eed6d09e5858e9623df1259b3ce30 ]
Commit 32f6e5da83c7 ("selftests/ftrace: Add kprobe profile testcase")
added a new kprobes testcase, but has a description which does not
describe what the test case is doing and is duplicating the description
of another test case.
Therefore change the test case description, so it is unique and then
allows easily to tell which test case actually passed or failed.
Reported-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/ftrace/test.d/kprobe/profile.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc b/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc
index 98166fa3eb91c..34fb89b0c61fa 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc
@@ -1,6 +1,6 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
-# description: Kprobe dynamic event - adding and removing
+# description: Kprobe profile
# requires: kprobe_events
! grep -q 'myevent' kprobe_profile
--
2.34.1
From: Jiri Olsa <jolsa(a)redhat.com>
[ Upstream commit 5e22dd18626726028a93ff1350a8a71a00fd843d ]
The tc_redirect umounts /sys in the new namespace, which can be
mounted as shared and cause global umount. The lazy umount also
takes down mounted trees under /sys like debugfs, which won't be
available after sysfs mounts again and could cause fails in other
tests.
# cat /proc/self/mountinfo | grep debugfs
34 23 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:14 - debugfs debugfs rw
# cat /proc/self/mountinfo | grep sysfs
23 86 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
# mount | grep debugfs
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
# ./test_progs -t tc_redirect
#164 tc_redirect:OK
Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED
# mount | grep debugfs
# cat /proc/self/mountinfo | grep debugfs
# cat /proc/self/mountinfo | grep sysfs
25 86 0:22 / /sys rw,relatime shared:2 - sysfs sysfs rw
Making the sysfs private under the new namespace so the umount won't
trigger the global sysfs umount.
Reported-by: Hangbin Liu <haliu(a)redhat.com>
Signed-off-by: Jiri Olsa <jolsa(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: Jussi Maki <joamaki(a)gmail.com>
Link: https://lore.kernel.org/bpf/20220104121030.138216-1-jolsa@kernel.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/prog_tests/tc_redirect.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
index 4b18b73df10b6..c2426df58e172 100644
--- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
@@ -105,6 +105,13 @@ static int setns_by_fd(int nsfd)
if (!ASSERT_OK(err, "unshare"))
return err;
+ /* Make our /sys mount private, so the following umount won't
+ * trigger the global umount in case it's shared.
+ */
+ err = mount("none", "/sys", NULL, MS_PRIVATE, NULL);
+ if (!ASSERT_OK(err, "remount private /sys"))
+ return err;
+
err = umount2("/sys", MNT_DETACH);
if (!ASSERT_OK(err, "umount2 /sys"))
return err;
--
2.34.1
From: David Gow <davidgow(a)google.com>
[ Upstream commit 37dbb4c7c7442dbfc9b651e4ddd4afe30b26afc9 ]
It's possible that a parameterised test could end up with zero
parameters. At the moment, the test function will nevertheless be called
with NULL as the parameter. Instead, don't try to run the test code, and
just mark the test as SKIPped.
Reported-by: Daniel Latypov <dlatypov(a)google.com>
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kunit/test.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 3bd741e50a2d3..f96498ede2cc5 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -504,16 +504,18 @@ int kunit_run_tests(struct kunit_suite *suite)
struct kunit_result_stats param_stats = { 0 };
test_case->status = KUNIT_SKIPPED;
- if (test_case->generate_params) {
+ if (!test_case->generate_params) {
+ /* Non-parameterised test. */
+ kunit_run_case_catch_errors(suite, test_case, &test);
+ kunit_update_stats(¶m_stats, test.status);
+ } else {
/* Get initial param. */
param_desc[0] = '\0';
test.param_value = test_case->generate_params(NULL, param_desc);
- }
- do {
- kunit_run_case_catch_errors(suite, test_case, &test);
+ while (test.param_value) {
+ kunit_run_case_catch_errors(suite, test_case, &test);
- if (test_case->generate_params) {
if (param_desc[0] == '\0') {
snprintf(param_desc, sizeof(param_desc),
"param-%d", test.param_index);
@@ -530,11 +532,11 @@ int kunit_run_tests(struct kunit_suite *suite)
param_desc[0] = '\0';
test.param_value = test_case->generate_params(test.param_value, param_desc);
test.param_index++;
- }
- kunit_update_stats(¶m_stats, test.status);
+ kunit_update_stats(¶m_stats, test.status);
+ }
+ }
- } while (test.param_value);
kunit_print_test_stats(&test, param_stats);
--
2.34.1
From: Heiko Carstens <hca(a)linux.ibm.com>
[ Upstream commit e5992f373c6eed6d09e5858e9623df1259b3ce30 ]
Commit 32f6e5da83c7 ("selftests/ftrace: Add kprobe profile testcase")
added a new kprobes testcase, but has a description which does not
describe what the test case is doing and is duplicating the description
of another test case.
Therefore change the test case description, so it is unique and then
allows easily to tell which test case actually passed or failed.
Reported-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/ftrace/test.d/kprobe/profile.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc b/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc
index 98166fa3eb91c..34fb89b0c61fa 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc
@@ -1,6 +1,6 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
-# description: Kprobe dynamic event - adding and removing
+# description: Kprobe profile
# requires: kprobe_events
! grep -q 'myevent' kprobe_profile
--
2.34.1
[ I should really have CC'd the selftests maintainer and mailing list.
Adding them in Cc to patch 0/5 to bring this series to their attention. ]
----- On Jan 17, 2022, at 3:39 PM, Mathieu Desnoyers mathieu.desnoyers(a)efficios.com wrote:
> glibc-2.35 will be released on 2022-02-01. It introduces a user-space ABI
> based on the thread pointer to access a reserved area of the TCB.
>
> The rseq selftests originally expected the rseq thread data to sit in a
> __rseq_abi TLS variable.
>
> Considering that the rseq ABI only allows a single rseq registration per
> thread, both cannot actively coexist in a process.
>
> Adapt the selftests librseq implementation to become compatible with
> glibc-2.35. Keep a fallback implementation based on TLS available when
> an older glibc is detected.
>
> Feedback is welcome,
>
> Thanks,
>
> Mathieu
>
> Mathieu Desnoyers (5):
> selftests/rseq: Remove useless assignment to cpu variable
> selftests/rseq: Remove volatile from __rseq_abi
> selftests/rseq: Introduce rseq_get_abi() helper
> selftests/rseq: Introduce thread pointer getters
> selftests/rseq: Uplift rseq selftests for compatibility with
> glibc-2.35
>
> tools/testing/selftests/rseq/Makefile | 2 +-
> tools/testing/selftests/rseq/param_test.c | 4 +-
> tools/testing/selftests/rseq/rseq-arm.h | 32 ++--
> tools/testing/selftests/rseq/rseq-arm64.h | 32 ++--
> .../rseq/rseq-generic-thread-pointer.h | 25 +++
> tools/testing/selftests/rseq/rseq-mips.h | 32 ++--
> .../selftests/rseq/rseq-ppc-thread-pointer.h | 30 ++++
> tools/testing/selftests/rseq/rseq-ppc.h | 32 ++--
> tools/testing/selftests/rseq/rseq-s390.h | 24 +--
> .../selftests/rseq/rseq-thread-pointer.h | 19 +++
> .../selftests/rseq/rseq-x86-thread-pointer.h | 40 +++++
> tools/testing/selftests/rseq/rseq-x86.h | 30 ++--
> tools/testing/selftests/rseq/rseq.c | 161 +++++++++---------
> tools/testing/selftests/rseq/rseq.h | 24 ++-
> 14 files changed, 302 insertions(+), 185 deletions(-)
> create mode 100644 tools/testing/selftests/rseq/rseq-generic-thread-pointer.h
> create mode 100644 tools/testing/selftests/rseq/rseq-ppc-thread-pointer.h
> create mode 100644 tools/testing/selftests/rseq/rseq-thread-pointer.h
> create mode 100644 tools/testing/selftests/rseq/rseq-x86-thread-pointer.h
>
> --
> 2.17.1
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
[ Upstream commit 3c42e9542050d49610077e083c7c3f5fd5e26820 ]
A mis-match between reported and actual mitigation is not restricted to the
Vulnerable case. The guest might also report the mitigation as "Software
count cache flush" and the host will still mitigate with branch cache
disabled.
So, instead of skipping depending on the detected mitigation, simply skip
whenever the detected miss_percent is the expected one for a fully
mitigated system, that is, above 95%.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20211207130557.40566-1-cascardo@canonical.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/security/spectre_v2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/powerpc/security/spectre_v2.c b/tools/testing/selftests/powerpc/security/spectre_v2.c
index adc2b7294e5fd..83647b8277e7d 100644
--- a/tools/testing/selftests/powerpc/security/spectre_v2.c
+++ b/tools/testing/selftests/powerpc/security/spectre_v2.c
@@ -193,7 +193,7 @@ int spectre_v2_test(void)
* We are not vulnerable and reporting otherwise, so
* missing such a mismatch is safe.
*/
- if (state == VULNERABLE)
+ if (miss_percent > 95)
return 4;
return 1;
--
2.34.1
From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
[ Upstream commit 3c42e9542050d49610077e083c7c3f5fd5e26820 ]
A mis-match between reported and actual mitigation is not restricted to the
Vulnerable case. The guest might also report the mitigation as "Software
count cache flush" and the host will still mitigate with branch cache
disabled.
So, instead of skipping depending on the detected mitigation, simply skip
whenever the detected miss_percent is the expected one for a fully
mitigated system, that is, above 95%.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20211207130557.40566-1-cascardo@canonical.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/security/spectre_v2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/powerpc/security/spectre_v2.c b/tools/testing/selftests/powerpc/security/spectre_v2.c
index adc2b7294e5fd..83647b8277e7d 100644
--- a/tools/testing/selftests/powerpc/security/spectre_v2.c
+++ b/tools/testing/selftests/powerpc/security/spectre_v2.c
@@ -193,7 +193,7 @@ int spectre_v2_test(void)
* We are not vulnerable and reporting otherwise, so
* missing such a mismatch is safe.
*/
- if (state == VULNERABLE)
+ if (miss_percent > 95)
return 4;
return 1;
--
2.34.1
From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
[ Upstream commit 3c42e9542050d49610077e083c7c3f5fd5e26820 ]
A mis-match between reported and actual mitigation is not restricted to the
Vulnerable case. The guest might also report the mitigation as "Software
count cache flush" and the host will still mitigate with branch cache
disabled.
So, instead of skipping depending on the detected mitigation, simply skip
whenever the detected miss_percent is the expected one for a fully
mitigated system, that is, above 95%.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20211207130557.40566-1-cascardo@canonical.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/security/spectre_v2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/powerpc/security/spectre_v2.c b/tools/testing/selftests/powerpc/security/spectre_v2.c
index adc2b7294e5fd..83647b8277e7d 100644
--- a/tools/testing/selftests/powerpc/security/spectre_v2.c
+++ b/tools/testing/selftests/powerpc/security/spectre_v2.c
@@ -193,7 +193,7 @@ int spectre_v2_test(void)
* We are not vulnerable and reporting otherwise, so
* missing such a mismatch is safe.
*/
- if (state == VULNERABLE)
+ if (miss_percent > 95)
return 4;
return 1;
--
2.34.1
User Interrupts Introduction
============================
User Interrupts (Uintr) is a hardware technology that enables delivering
interrupts directly to user space.
Today, virtually all communication across privilege boundaries happens by going
through the kernel. These include signals, pipes, remote procedure calls and
hardware interrupt based notifications. User interrupts provide the foundation
for more efficient (low latency and low CPU utilization) versions of these
common operations by avoiding transitions through the kernel.
In the User Interrupts hardware architecture, a receiver is always expected to
be a user space task. However, a user interrupt can be sent by another user
space task, kernel or an external source (like a device).
In addition to the general infrastructure to receive user interrupts, this
series introduces a single source: interrupts from another user task. These
are referred to as User IPIs.
The first implementation of User IPIs will be in the Intel processor code-named
Sapphire Rapids. Refer Chapter 11 of the Intel Architecture instruction set
extensions for details of the hardware architecture [1].
Series-reviewed-by: Tony Luck <tony.luck(a)intel.com>
Main goals of this RFC
======================
- Introduce this upcoming technology to the community.
This cover letter includes a hardware architecture summary along with the
software architecture and kernel design choices. This post is a bit long as a
result. Hopefully, it helps answer more questions than it creates :) I am also
planning to talk about User Interrupts next week at the LPC Kernel summit.
- Discuss potential use cases.
We are starting to look at actual usages and libraries (like libevent[2] and
liburing[3]) that can take advantage of this technology. Unfortunately, we
don't have much to share on this right now. We need some help from the
community to identify usages that can benefit from this. We would like to make
sure the proposed APIs work for the eventual consumers.
- Get early feedback on the software architecture.
We are hoping to get some feedback on the direction of overall software
architecture - starting with User IPI, extending it for kernel-to-user
interrupt notifications and external interrupts in the future.
- Discuss some of the main architecture opens.
There is lot of work that still needs to happen to enable this technology. We
are looking for some input on future patches that would be of interest. Here
are some of the big opens that we are looking to resolve.
* Should Uintr interrupt all blocking system calls like sleep(), read(),
poll(), etc? If so, should we implement an SA_RESTART type of mechanism
similar to signals? - Refer Blocking for interrupts section below.
* Should the User Interrupt Target table (UITT) be shared between threads of a
multi-threaded application or maybe even across processes? - Refer Sharing
the UITT section below.
Why care about this? - Micro benchmark performance
==================================================
There is a ~9x or higher performance improvement using User IPI over other IPC
mechanisms for event signaling.
Below is the average normalized latency for a 1M ping-pong IPC notifications
with message size=1.
+------------+-------------------------+
| IPC type | Relative Latency |
| |(normalized to User IPI) |
+------------+-------------------------+
| User IPI | 1.0 |
| Signal | 14.8 |
| Eventfd | 9.7 |
| Pipe | 16.3 |
| Domain | 17.3 |
+------------+-------------------------+
Results have been estimated based on tests on internal hardware with Linux
v5.14 + User IPI patches.
Original benchmark: https://github.com/goldsborough/ipc-bench
Updated benchmark: https://github.com/intel/uintr-ipc-bench/tree/linux-rfc-v1
*Performance varies by use, configuration and other factors.
How it works underneath? - Hardware Summary
===========================================
User Interrupts is a posted interrupt delivery mechanism. The interrupts are
first posted to a memory location and then delivered to the receiver when they
are running with CPL=3.
Kernel managed architectural data structures
--------------------------------------------
UPID: User Posted Interrupt Descriptor - Holds receiver interrupt vector
information and notification state (like an ongoing notification, suppressed
notifications).
UITT: User Interrupt Target Table - Stores UPID pointer and vector information
for interrupt routing on the sender side. Referred by the senduipi instruction.
The interrupt state of each task is referenced via MSRs which are saved and
restored by the kernel during context switch.
Instructions
------------
senduipi <index> - send a user IPI to a target task based on the UITT index.
clui - Mask user interrupts by clearing UIF (User Interrupt Flag).
stui - Unmask user interrupts by setting UIF.
testui - Test current value of UIF.
uiret - return from a user interrupt handler.
User IPI
--------
When a User IPI sender executes 'senduipi <index>', the hardware refers the
UITT table entry pointed by the index and posts the interrupt vector (63-0)
into the receiver's UPID.
If the receiver is running (CPL=3), the sender cpu would send a physical IPI to
the receiver's cpu. On the receiver side this IPI is detected as a User
Interrupt. The User Interrupt handler for the receiver is invoked and the
vector number (63-0) is pushed onto the stack.
Upon execution of 'uiret' in the interrupt handler, the control is transferred
back to instruction that was interrupted.
Refer Chapter 11 of the Intel Architecture instruction set extensions [1] for
more details.
Application interface - Software Architecture
=============================================
User Interrupts (Uintr) is an opt-in feature (unlike signals). Applications
wanting to use Uintr are expected to register themselves with the kernel using
the Uintr related system calls. A Uintr receiver is always a userspace task. A
Uintr sender can be another userspace task, kernel or a device.
1) A receiver can register/unregister an interrupt handler using the Uintr
receiver related syscalls.
uintr_register_handler(handler, flags)
uintr_unregister_handler(flags)
2) A syscall also allows a receiver to register a vector and create a user
interrupt file descriptor - uintr_fd.
uintr_fd = uintr_create_fd(vector, flags)
Uintr can be useful in some of the usages where eventfd or signals are used for
frequent userspace event notifications. The semantics of uintr_fd are somewhat
similar to an eventfd() or the write end of a pipe.
3) Any sender with access to uintr_fd can use it to deliver events (in this
case - interrupts) to a receiver. A sender task can manage its connection with
the receiver using the sender related syscalls based on uintr_fd.
uipi_index = uintr_register_sender(uintr_fd, flags)
Using an FD abstraction provides a secure mechanism to connect with a receiver.
The FD sharing and isolation mechanisms put in place by the kernel would extend
to Uintr as well.
4a) After the initial setup, a sender task can use the SENDUIPI instruction
along with the uipi_index to generate user IPIs without any kernel
intervention.
SENDUIPI <uipi_index>
If the receiver is running (CPL=3), then the user interrupt is delivered
directly without a kernel transition. If the receiver isn't running the
interrupt is delivered when the receiver gets context switched back. If the
receiver is blocked in the kernel, the user interrupt is delivered to the
kernel which then unblocks the intended receiver to deliver the interrupt.
4b) If the sender is the kernel or a device, the uintr_fd can be passed onto
the related kernel entity to allow them to setup a connection and then generate
a user interrupt for event delivery. <The exact details of this API are still
being worked upon.>
For details of the user interface and associated system calls refer the Uintr
man-pages draft:
https://github.com/intel/uintr-linux-kernel/tree/rfc-v1/tools/uintr/manpages.
We have also included the same content as patch 1 of this series to make it
easier to review.
Refer the Uintr compiler programming guide [4] for details on Uintr integration
with GCC and Binutils.
Kernel design choices
=====================
Here are some of the reasons and trade-offs for the current design of the APIs.
System call interface
---------------------
Why a system call interface?: The 2 options we considered are using a char
device at /dev or use system calls (current approach). A syscall approach
avoids exposing a core cpu feature through a driver model. Also, we want to
have a user interrupt FD per vector and share a single common interrupt handler
among all vectors. This seems easier for the kernel and userspace to accomplish
using a syscall based approach.
Data sharing using user interrupts: Uintr doesn't include a mechanism to
share/transmit data. The expectation is applications use existing data sharing
mechanisms to share data and use Uintr only for signaling.
An FD for each vector: A uintr_fd is assigned to each vector to allow fine
grained priority and event management by the receiver. The alternative we
considered was to allocate an FD to the interrupt handler and having that
shared with the sender. However, that approach relies on the sender selecting
the vector and moves the vector priority management to the sender. Also, if
multiple senders want to send unique user interrupts they would need to
coordinate the vector selection amongst them.
Extending the APIs: Currently, the system calls are only extendable using the
flags argument. We can add a variable size struct to some of the syscalls if
needed.
Extending existing mechanisms
-----------------------------
Uintr can be beneficial in some of the usages where eventfd() or signals are
used. Since Uintr is hardware-dependent, thread-specific and bypasses the
kernel in the fast path, it makes extending existing mechanisms harder.
Main issues with extending signals:
Signal handlers are defined significantly differently than a User interrupt
handler. An application needs to save/restore registers in a user interrupt
handler and call uiret to return from it. Also, signals can be process directed
(or thread directed) but user interrupts are always thread directed.
Comparison of signals with User Interrupts:
+=====================+===========================+===========================+
| | Signals | User Interrupts |
+=====================+===========================+===========================+
| Stacks | Has alt stacks | Uses application stack |
| | | (alternate stack option |
| | | not yet enabled) |
+---------------------+---------------------------+---------------------------+
| Registers state | Kernel manages incl. | App responsible (Use GCC |
| | FPU/XSTATE area | 'interrupt' attribute for |
| | | general purpose registers)|
+---------------------+---------------------------+---------------------------+
| Blocking/Masking | sigprocmask(2)/sa_mask | CLUI instruction (No per |
| | | vector masking) |
+---------------------+---------------------------+---------------------------+
| Direction | Uni-directional | Uni-directional |
+---------------------+---------------------------+---------------------------+
| Post event | kill(), signal(), | SENDUIPI <index> - index |
| | sigqueue(), etc. | derived from uintr_fd |
+---------------------+---------------------------+---------------------------+
| Target | Process-directed or | Thread-directed |
| | thread-directed | |
+---------------------+---------------------------+---------------------------+
| Fork/inheritance | Empty signal set | Nothing is inherited |
+---------------------+---------------------------+---------------------------+
| Execv | Pending signals preserved | Nothing is inherited |
+---------------------+---------------------------+---------------------------+
| Order of delivery | Undetermined | High to low vector numbers|
| for multiple signals| | |
+---------------------+---------------------------+---------------------------+
| Handler re-entry | All signals except the | No interrupts can cause |
| | one being handled | handler re-entry. |
+---------------------+---------------------------+---------------------------+
| Delivery feedback | 0 or -1 based on whether | No feedback on whether the|
| | the signal was sent | interrupt was sent or |
| | | received. |
+---------------------+---------------------------+---------------------------+
Main issues with extending eventfd():
eventfd() has a counter value that is core to the API. User interrupts can't
have an associated counter since the signaling happens at the user level and
the hardware doesn't have a memory counter mechanism. Also, eventfd can be used
for bi-directional signaling where as uintr_fd is uni-directional.
Comparison of eventfd with uintr_fd:
+====================+======================+==============================+
| | Eventfd | uintr_fd (User Interrupt FD) |
+====================+======================+==============================+
| Object | Counter - uint64 | Receiver vector information |
+--------------------+----------------------+------------------------------+
| Post event | write() to eventfd | SENDUIPI <index> - index |
| | | derived from uintr_fd |
+--------------------+----------------------+------------------------------+
| Receive event | read() on eventfd | Implicit - Handler is |
| | | invoked with associated |
| | | vector. |
+--------------------+----------------------+------------------------------+
| Direction | Bi-directional | Uni-directional |
+--------------------+----------------------+------------------------------+
| Data transmitted | Counter - uint64 | None |
+--------------------+----------------------+------------------------------+
| Waiting for events | Poll() family of | No per vector wait. |
| | syscalls | uintr_wait() allows waiting |
| | | for all user interrupts |
+--------------------+----------------------+------------------------------+
Security Model
==============
User Interrupts is designed as an opt-in feature (unlike signals). The security
model for user interrupts is intended to be similar to eventfd(). The general
idea is that any sender with access to uintr_fd would be able to generate the
associated interrupt vector for the receiver task that created the fd.
Untrusted processes
-------------------
The current implementation expects only trusted and cooperating processes to
communicate using user interrupts. Coordination is expected between processes
for a connection teardown. In situations where coordination doesn't happen
(say, due to abrupt process exit), the kernel would end up keeping shared
resources (like UPID) allocated to avoid faults.
Currently, a sender can easily cause a denial of service for the receiver by
generating a storm of user interrupts. A user interrupt handler is invoked with
interrupts disabled, but upon execution of uiret, interrupts get enabled again
by the hardware. This can lead to the handler being invoked again before normal
execution can resume. There isn't a hardware mechanism to mask specific
interrupt vectors.
To enable untrusted processes to communicate, we need to add a per-vector
masking option through another syscall (or maybe IOCTL). However, this can add
some complexity to the kernel code. A vector can only be masked by modifying
the UITT entries at the source. We need to be careful about races while
removing and restoring the UPID from the UITT.
Resource limits
---------------
The maximum number of receiver-sender connections would be limited by the
maximum number of open file descriptors and the size of the UITT.
The UITT size is chosen as 4kB fixed size arbitrarily right now. We plan to
make it dynamic and configurable in size. RLIMIT_MEMLOCK or ENOMEM should be
triggered when the size limits have been hit.
Main Opens
==========
Blocking for interrupts
-----------------------
User interrupts are delivered to applications immediately if they are running
in userspace. If a receiver task has blocked in the kernel using the placeholder
uintr_wait() syscall, the task would be woken up to deliver the user interrupt.
However, if the task is blocked due to any other blocking calls like read(),
sleep(), etc; the interrupt will only get delivered when the application gets
scheduled again. We need to consider if applications need to receive User
Interrupts as soon as they are posted (similar to signals) when they are
blocked due to some other reason. Adding this capability would likely make the
kernel implementation more complex.
Interrupting system calls using User Interrupts would also mean we need to
consider an SA_RESTART type of mechanism. We also need to evaluate if some of
the signal handler related semantics in the kernel can be reused for User
Interrupts.
Sharing the User Interrupt Target Table (UITT)
----------------------------------------------
The current implementation assigns a unique UITT to each task. This assumes
that User interrupts are used for point-to-point communication between 2 tasks.
Also, this keeps the kernel implementation relatively simple.
However, there are of benefits to sharing the UITT between threads of a
multi-threaded application. One, they would see a consistent view of the UITT.
i.e. SENDUIPI <index> would mean the same on all threads of the application.
Also, each thread doesn't have to register itself using the common uintr_fd.
This would simplify the userspace setup and make efficient use of kernel
memory. The potential downside is that the kernel implementation to allocate,
modify, expand and free the UITT would be more complex.
A similar argument can be made for a set of processes that do a lot of IPC
amongst them. They would prefer to have a shared UITT that lets them target any
process from any process. With the current file descriptor based approach, the
connection setup can be time consuming and somewhat cumbersome. We need to
evaluate if this can be made simpler as well.
Kernel page table isolation (KPTI)
----------------------------------
SENDUIPI is a special ring-3 instruction that makes a supervisor mode memory
access to the UPID and UITT memory. The current patches need KPTI to be
disabled for User IPIs to work. To make User IPI work with KPTI, we need to
allocate these structures from a special memory region that has supervisor
access but it is mapped into userspace. The plan is to implement a mechanism
similar to LDT.
Processors that support user interrupts are not affected by Meltdown so the
auto mode of KPTI will default to off. Users who want to force enable KPTI will
need to wait for a later version of this patch series to use user interrupts.
Please let us know if you want the development of these patches to be
prioritized (or deprioritized).
FAQs
====
Q: What happens if a process is "surprised" by a user interrupt?
A: For tasks that haven't registered with the kernel and requested for user
interrupts aren't expected or able to receive to user interrupts.
Q: Do user interrupts affect kernel scheduling?
A: No. If a task is blocked waiting for user interrupts, when the kernel
receives a notification on behalf of that task we only put it back on the
runqueue. Delivery of a user interrupt in no way changes the scheduling
priorities of a task.
Q: Does the sender get to know if the interrupt was delivered?
A: No. User interrupts only provides a posted interrupt delivery mechanism. If
applications need to rely on whether the interrupt was delivered they should
consider a userspace mechanism for feedback (like a shared memory counter or a
user interrupt back to the sender).
Q: Why is there no feedback on interrupt delivery?
A: Being a posted interrupt delivery mechanism, the interrupt delivery
happens in 2 steps:
1) The interrupt information is stored in a memory location (UPID).
2) The physical interrupt is delivered to the interrupt receiver.
The 2nd step could happen immediately, after an extended period, or it might
never happen based on the state of the receiver after step 1. (The receiver
could have disabled interrupts, have been context switched out or it might have
crashed during that time.) This makes it very hard for the hardware to reliably
provide feedback upon execution of SENDUIPI.
Q: Can user interrupts be nested?
A: Yes. Using STUI instruction in the interrupt handler would allow new user
interrupts to be delivered. However, there no TPR(thread priority register)
like mechanism to allow only higher priority interrupts. Any user interrupt can
be taken when nesting is enabled.
Q: Can a task receive all pending user interrupts in one go?
A: No. The hardware allows only one vector to be processed at a time. If a task
is interested in knowing all the interrupts that are pending then we could add
a syscall that provides the pending interrupts information.
Q: Do the processes need to be pinned to a cpu?
A: No. User interrupts will be routed correctly to whichever cpu the receiver
is running on. The kernel updates the cpu information in the UPID during
context switch.
Q: Why are UPID and UITT allocated by the kernel?
A: If allocated by user space, applications could misuse the UPID and UITT to
write to unauthorized memory and generate interrupts on any cpu. The UPID and
UITT are allocated by the kernel and accessed by the hardware with supervisor
privilege.
Patch structure for this series
===============================
- Man-pages and Kernel documentation (patch 1,2)
- Hardware enumeration (patch 3, 4)
- User IPI kernel vector reservation (patch 5)
- Syscall interface for interrupt receiver, sender and vector
management(uintr_fd) (patch 6-12)
- Basic selftests (patch 13)
Along with the patches in this RFC, there are additional tests and samples that
are available at:
https://github.com/intel/uintr-linux-kernel/tree/rfc-v1
Links
=====
[1]: https://software.intel.com/content/www/us/en/develop/download/intel-archite…
[2]: https://libevent.org/
[3]: https://github.com/axboe/liburing
[4]: https://github.com/intel/uintr-compiler-guide/blob/uintr-gcc-11.1/UINTR-com…
Sohil Mehta (13):
x86/uintr/man-page: Include man pages draft for reference
Documentation/x86: Add documentation for User Interrupts
x86/cpu: Enumerate User Interrupts support
x86/fpu/xstate: Enumerate User Interrupts supervisor state
x86/irq: Reserve a user IPI notification vector
x86/uintr: Introduce uintr receiver syscalls
x86/process/64: Add uintr task context switch support
x86/process/64: Clean up uintr task fork and exit paths
x86/uintr: Introduce vector registration and uintr_fd syscall
x86/uintr: Introduce user IPI sender syscalls
x86/uintr: Introduce uintr_wait() syscall
x86/uintr: Wire up the user interrupt syscalls
selftests/x86: Add basic tests for User IPI
.../admin-guide/kernel-parameters.txt | 2 +
Documentation/x86/index.rst | 1 +
Documentation/x86/user-interrupts.rst | 107 +++
arch/x86/Kconfig | 12 +
arch/x86/entry/syscalls/syscall_32.tbl | 6 +
arch/x86/entry/syscalls/syscall_64.tbl | 6 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/entry-common.h | 4 +
arch/x86/include/asm/fpu/types.h | 20 +-
arch/x86/include/asm/fpu/xstate.h | 3 +-
arch/x86/include/asm/hardirq.h | 4 +
arch/x86/include/asm/idtentry.h | 5 +
arch/x86/include/asm/irq_vectors.h | 6 +-
arch/x86/include/asm/msr-index.h | 8 +
arch/x86/include/asm/processor.h | 8 +
arch/x86/include/asm/uintr.h | 76 ++
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 61 ++
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/fpu/core.c | 17 +
arch/x86/kernel/fpu/xstate.c | 20 +-
arch/x86/kernel/idt.c | 4 +
arch/x86/kernel/irq.c | 51 +
arch/x86/kernel/process.c | 10 +
arch/x86/kernel/process_64.c | 4 +
arch/x86/kernel/uintr_core.c | 880 ++++++++++++++++++
arch/x86/kernel/uintr_fd.c | 300 ++++++
include/linux/syscalls.h | 8 +
include/uapi/asm-generic/unistd.h | 15 +-
kernel/sys_ni.c | 8 +
scripts/checksyscalls.sh | 6 +
tools/testing/selftests/x86/Makefile | 10 +
tools/testing/selftests/x86/uintr.c | 147 +++
tools/uintr/manpages/0_overview.txt | 265 ++++++
tools/uintr/manpages/1_register_receiver.txt | 122 +++
.../uintr/manpages/2_unregister_receiver.txt | 62 ++
tools/uintr/manpages/3_create_fd.txt | 104 +++
tools/uintr/manpages/4_register_sender.txt | 121 +++
tools/uintr/manpages/5_unregister_sender.txt | 79 ++
tools/uintr/manpages/6_wait.txt | 59 ++
42 files changed, 2626 insertions(+), 8 deletions(-)
create mode 100644 Documentation/x86/user-interrupts.rst
create mode 100644 arch/x86/include/asm/uintr.h
create mode 100644 arch/x86/kernel/uintr_core.c
create mode 100644 arch/x86/kernel/uintr_fd.c
create mode 100644 tools/testing/selftests/x86/uintr.c
create mode 100644 tools/uintr/manpages/0_overview.txt
create mode 100644 tools/uintr/manpages/1_register_receiver.txt
create mode 100644 tools/uintr/manpages/2_unregister_receiver.txt
create mode 100644 tools/uintr/manpages/3_create_fd.txt
create mode 100644 tools/uintr/manpages/4_register_sender.txt
create mode 100644 tools/uintr/manpages/5_unregister_sender.txt
create mode 100644 tools/uintr/manpages/6_wait.txt
base-commit: 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f
--
2.33.0
From: Stefan Berger <stefanb(a)linux.ibm.com>
This series of patches fixes two issues with TPM2 selftest.
- Determines available PCR banks for use by test cases
- Resets DA lock on TPM2 to avoid subsequent test failures
Stefan
v4:
- Switch to query TPM2_GET_CAP to determine the available PCR banks
- Moved call to reset DA lock into finally branch at end of test
- Dropped patch 3
v3:
- Mention SHA-256 PCR bank as alternative in patch 1 description
v2:
- Clarified patch 1 description
- Added patch 3 with support for SHA-384 and SHA-512
Stefan Berger (2):
selftests: tpm2: Determine available PCR bank
selftests: tpm2: Reset the dictionary attack lock
tools/testing/selftests/tpm2/tpm2.py | 31 ++++++++++++++++++++++
tools/testing/selftests/tpm2/tpm2_tests.py | 31 ++++++++++++++++------
2 files changed, 54 insertions(+), 8 deletions(-)
--
2.31.1
$ ./fcnal-test.sh -t help
Test names: help
Looks it intent to list the available tests but it didn't do the right
thing. I will add another option the do that in the later patch.
Signed-off-by: Li Zhijian <lizhijian(a)cn.fujitsu.com>
---
tools/testing/selftests/net/fcnal-test.sh | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh
index 7f5b265fcb90..5cb59947eed2 100755
--- a/tools/testing/selftests/net/fcnal-test.sh
+++ b/tools/testing/selftests/net/fcnal-test.sh
@@ -4068,8 +4068,6 @@ do
# setup namespaces and config, but do not run any tests
setup) setup; exit 0;;
vrf_setup) setup "yes"; exit 0;;
-
- help) echo "Test names: $TESTS"; exit 0;;
esac
done
--
2.33.0
Hello friend.
You might find it so difficult to remember me, though it is indeed a
very long time, I am much delighted to contact you again after a long
period of time, I remember you despite circumstances that made things
not worked out as we projected then. I want to inform you that the
transaction we're doing together then finally worked out and I decided
to contact you and to let you know because of your tremendous effort
to make things work out then.
Meanwhile I must inform you that I'm presently in Caribbean Island for
numerous business negotiation with some partners. with my sincere
heart i have decided to compensate you with USD$900,000 for your
dedication then on our transaction, you tried so much that period and
I appreciated your effort. I wrote a cheque/check on your name, as
soon as you receive it, you let me know.
Contact my secretary now on his email: mchristophdaniel(a)gmail.com
Name: Mr. Christoph Daniel
You are to forward to him your Name........ Address.......,Phone
number......for shipment/dispatch of the cheque/Check to you
Regards,
Mr. Marcus Galois
timeout in settings is used by each case under the same directory, so
it should adapt to the maximum runtime.
A normally running net/fib_nexthops.sh may be killed by this unsuitable
timeout. Furthermore, since the defect[1] of kselftests framework,
net/fib_nexthops.sh which might take at least (300 * 4) seconds would
block the whole kselftests framework previously.
$ git grep -w 'sleep 300' tools/testing/selftests/net
tools/testing/selftests/net/fib_nexthops.sh: sleep 300
tools/testing/selftests/net/fib_nexthops.sh: sleep 300
tools/testing/selftests/net/fib_nexthops.sh: sleep 300
tools/testing/selftests/net/fib_nexthops.sh: sleep 300
Enlarge the timeout by plus 300 based on the obvious largest runtime
to avoid the blocking.
[1]: https://www.spinics.net/lists/kernel/msg4185370.html
Signed-off-by: Zhou Jie <zhoujie2011(a)fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
---
tools/testing/selftests/net/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/settings b/tools/testing/selftests/net/settings
index 694d70710ff0..dfc27cdc6c05 100644
--- a/tools/testing/selftests/net/settings
+++ b/tools/testing/selftests/net/settings
@@ -1 +1 @@
-timeout=300
+timeout=1500
--
2.33.0
v9:
- Add a new patch 1 to remove the child cpuset restriction on parent's
"cpuset.cpus".
- Relax initial root partition entry limitation to allow cpuset.cpus to
overlap that of parent's.
- An "isolated invalid" displayed type is added to
cpuset.cpus.partition.
- Resetting partition root to "member" will leave child partition root
as invalid.
- Update documentation and test accordingly.
v8:
- Reorganize the patch series and rationalize the features and
constraints of a partition.
- Update patch descriptions and documentation accordingly.
v7:
- Simplify the documentation patch (patch 5) as suggested by Tejun.
- Fix a typo in patch 2 and improper commit log in patch 3.
This patchset includes one bug fix and four enhancements to the cpuset v2 code.
Patch 1: Allow parent to set "cpuset.cpus" that may not be a superset
of children's "cpuset.cpus" for default hierarchy.
Patch 2: Enable partition with no task to have empty cpuset.cpus.effective.
Patch 3: Refining the features and constraints of a cpuset partition
clarifying what changes are allowed.
Patch 4: Add a new partition state "isolated" to create a partition
root without load balancing. This is for handling intermitten workloads
that have a strict low latency requirement.
Patch 5: Enable the "cpuset.cpus.partition" file to show the reason
that causes invalid partition like "root invalid (No cpu available
due to hotplug)".
Patch 6 updates the cgroup-v2.rst file accordingly. Patch 7 adds a new
cpuset test to test the new cpuset partition code.
Waiman Long (7):
cgroup/cpuset: Don't let child cpusets restrict parent in default
hierarchy
cgroup/cpuset: Allow no-task partition to have empty
cpuset.cpus.effective
cgroup/cpuset: Refining features and constraints of a partition
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Show invalid partition reason string
cgroup/cpuset: Update description of cpuset.cpus.partition in
cgroup-v2.rst
kselftest/cgroup: Add cpuset v2 partition root state test
Documentation/admin-guide/cgroup-v2.rst | 168 +++--
kernel/cgroup/cpuset.c | 440 +++++++-----
tools/testing/selftests/cgroup/Makefile | 5 +-
.../selftests/cgroup/test_cpuset_prs.sh | 667 ++++++++++++++++++
tools/testing/selftests/cgroup/wait_inotify.c | 87 +++
5 files changed, 1142 insertions(+), 225 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_prs.sh
create mode 100644 tools/testing/selftests/cgroup/wait_inotify.c
--
2.27.0
The timeout setting for the rtc kselftest is currently 90 seconds.
However, two of the tests set alarms, which take one minute to complete
each. So the timeout should be at least 120. Set it to 180, so that all
tests are able to complete and still have some slack.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
This issue was discovered as part of adding the rtc kselftest to run on KernelCI
for the rk3399-gru-kevin device, which uses rtc-cros-ec as the RTC driver.
The output log with the current timeout is shown in [1]. As can be seen, the
whole test times out before the alarm_wkalm_set_minute test has had a chance to
complete:
# # RUN rtc.alarm_wkalm_set_minute ...
# # rtctest.c:294:alarm_wkalm_set_minute:Alarm time now set to 11/01/2022 23:03:00.
#
not ok 1 selftests: rtc: rtctest # TIMEOUT 90 seconds
With the increased timeout, as shown in [2], the alarm_wkalm_set_minute test
does complete its run:
# # RUN rtc.alarm_wkalm_set_minute ...
# # rtctest.c:294:alarm_wkalm_set_minute:Alarm time now set to 12/01/2022 15:54:00.
# # OK rtc.alarm_wkalm_set_minute
# ok 7 rtc.alarm_wkalm_set_minute
# # FAILED: 6 / 7 tests passed.
The fact that the alarm_alm_set_minute test times out on its own is probably an
issue with the rtc-cros-ec driver. Still, since the tests are independent, all
of them should be able to run regardless of how long each one takes (so,
assuming the worst case scenario).
[1] https://lava.collabora.co.uk/scheduler/job/5409783
[2] https://lava.collabora.co.uk/scheduler/job/5415176
tools/testing/selftests/rtc/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rtc/settings b/tools/testing/selftests/rtc/settings
index ba4d85f74cd6..a953c96aa16e 100644
--- a/tools/testing/selftests/rtc/settings
+++ b/tools/testing/selftests/rtc/settings
@@ -1 +1 @@
-timeout=90
+timeout=180
--
2.34.1
Every KUNIT_ASSERT/EXPECT() invocation puts a `kunit_assert` object onto
the stack. The most common one is `kunit_binary_assert` which is 88
bytes on UML. So in the cases where the compiler doesn't optimize this
away, we can very quickly blow up the stack size.
This series implements Linus' suggestion in [1].
Namely, we split out the file, line number, and assert_type
(EXPECT/ASSERT) out of kunit_assert.
We can also drop the entirely unused `struct kunit *test` field, saving
a bit more space as well.
All together, sizeof(struct kunit_assert) went from 48 to 24 on UML.
Note: the other assert types are bigger, see [2].
This series also adds in an example test that uses all the base
KUNIT_EXPECT macros to both advertise their existence to new users and
serve as a smoketest for all these changes here.
[1] https://groups.google.com/g/kunit-dev/c/i3fZXgvBrfA/m/VULQg1z6BAAJ
[2] e.g. consider the most commonly used assert (also the biggest)
struct kunit_binary_assert {
struct kunit_assert assert;
const char *operation;
const char *left_text;
long long left_value;
const char *right_text;
long long right_value;
};
So sizeof(struct kunit_binary_assert) = went from 88 to 64.
I.e. only a 27% reduction instead of 50% in the most common case.
All 3 of the `const char*` could be split out into a `static` var as well,
but that's a bit trickier to do with how all the macros are written.
=== Changelog ===
v1 -> v2:
* made the new example test more focused on documenting the macros
rather than using them all as a smoketest
* s/kunit_failed_assertion()/kunit_do_failed_assertion()
* added `unlikely()` to `if(!(pass))` check in KUNIT_ASSERTION()
Daniel Latypov (6):
kunit: add example test case showing off all the expect macros
kunit: move check if assertion passed into the macros
kunit: drop unused kunit* field in kunit_assert
kunit: factor out kunit_base_assert_format() call into kunit_fail()
kunit: split out part of kunit_assert into a static const
kunit: drop unused assert_type from kunit_assert and clean up macros
include/kunit/assert.h | 88 +++++++++++-----------------------
include/kunit/test.h | 53 ++++++++++----------
lib/kunit/assert.c | 15 ++----
lib/kunit/kunit-example-test.c | 42 ++++++++++++++++
lib/kunit/test.c | 27 +++++------
5 files changed, 117 insertions(+), 108 deletions(-)
base-commit: ad659ccb5412874c6a89d3588cb18857c00e9d0f
--
2.34.1.575.g55b058a8bb-goog
Every KUNIT_ASSERT/EXPECT() invocation puts a `kunit_assert` object onto
the stack. The most common one is `kunit_binary_assert` which is 88
bytes on UML. So in the cases where the compiler doesn't optimize this
away, we can very quickly blow up the stack size.
This series implements Linus' suggestion in [1].
Namely, we split out the file, line number, and assert_type
(EXPECT/ASSERT) out of kunit_assert.
We can also drop the entirely unused `struct kunit *test` field, saving
a bit more space as well.
All together, sizeof(struct kunit_assert) went from 48 to 24 on UML.
Note: the other assert types are bigger, see [2].
This series also adds in an example test that uses all the KUNIT_EXPECT
macros to both advertise their existence to new users and serve as a
smoketest for all these changes here.
[1] https://groups.google.com/g/kunit-dev/c/i3fZXgvBrfA/m/VULQg1z6BAAJ
[2] e.g. consider the most commonly used assert (also the biggest)
struct kunit_binary_assert {
struct kunit_assert assert;
const char *operation;
const char *left_text;
long long left_value;
const char *right_text;
long long right_value;
};
So sizeof(struct kunit_binary_assert) = went from 88 to 64.
I.e. only a 27% reduction instead of 50% in the most common case.
All 3 of the `const char*` could be split out into a `static` var as well,
but that's a bit trickier to do with how all the macros are written.
Daniel Latypov (6):
kunit: add example test case showing off all the expect macros
kunit: move check if assertion passed into the macros
kunit: drop unused kunit* field in kunit_assert
kunit: factor out kunit_base_assert_format() call into kunit_fail()
kunit: split out part of kunit_assert into a static const
kunit: drop unused assert_type from kunit_assert and clean up macros
include/kunit/assert.h | 88 +++++++++++-----------------------
include/kunit/test.h | 52 ++++++++++----------
lib/kunit/assert.c | 15 ++----
lib/kunit/kunit-example-test.c | 46 ++++++++++++++++++
lib/kunit/test.c | 27 +++++------
5 files changed, 120 insertions(+), 108 deletions(-)
base-commit: ad659ccb5412874c6a89d3588cb18857c00e9d0f
--
2.34.1.575.g55b058a8bb-goog
Dzień dobry,
dostrzegam możliwość współpracy z Państwa firmą.
Świadczymy kompleksową obsługę inwestycji w fotowoltaikę, która obniża koszty energii elektrycznej nawet o 90%.
Czy są Państwo zainteresowani weryfikacją wstępnych propozycji?
Pozdrawiam,
Jakub Daroch
Hi Linus,
Please pull the following KUnit update for Linux 5.17-rc1.
This KUnit update for Linux 5.17-rc1 consists of several fixes and
enhancements. A few highlights:
- Option --kconfig_add option allows easily tweaking kunitconfigs
- make build subcommand can reconfigure if needed
- doesn't error on tests without test plans
- doesn't crash if no parameters are generated
- defaults --jobs to # of cups
- reports test parameter results as (K)TAP subtests
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf:
Linux 5.16-rc1 (2021-11-14 13:56:52 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-5.17-rc1
for you to fetch changes up to ad659ccb5412874c6a89d3588cb18857c00e9d0f:
kunit: tool: Default --jobs to number of CPUs (2021-12-15 16:44:55 -0700)
----------------------------------------------------------------
linux-kselftest-kunit-5.17-rc1
This KUnit update for Linux 5.17-rc1 consists of several fixes and
enhancements. A few highlights:
- Option --kconfig_add option allows easily tweaking kunitconfigs
- make build subcommand can reconfigure if needed
- doesn't error on tests without test plans
- doesn't crash if no parameters are generated
- defaults --jobs to # of cups
- reports test parameter results as (K)TAP subtests
----------------------------------------------------------------
Daniel Latypov (13):
kunit: tool: fix --json output for skipped tests
Documentation: kunit: remove claims that kunit is a mocking framework
kunit: add run_checks.py script to validate kunit changes
kunit: tool: print parsed test results fully incrementally
kunit: tool: move Kconfig read_from_file/parse_from_string to package-level
kunit: tool: add --kconfig_add to allow easily tweaking kunitconfigs
kunit: tool: revamp message for invalid kunitconfig
kunit: tool: reconfigure when the used kunitconfig changes
kunit: tool: suggest using decode_stacktrace.sh on kernel crash
kunit: tool: use dataclass instead of collections.namedtuple
kunit: tool: delete kunit_parser.TestResult type
kunit: tool: make `build` subcommand also reconfigure if needed
kunit: tool: fix newly introduced typechecker errors
David Gow (5):
kunit: tool: Do not error on tests without test plans
kunit: tool: Report an error if any test has no subtests
kunit: Don't crash if no parameters are generated
kunit: Report test parameter results as (K)TAP subtests
kunit: tool: Default --jobs to number of CPUs
Documentation/dev-tools/kunit/api/index.rst | 3 +-
Documentation/dev-tools/kunit/api/test.rst | 3 +-
Documentation/dev-tools/kunit/index.rst | 2 +-
Documentation/dev-tools/kunit/start.rst | 8 +-
lib/kunit/test.c | 25 +--
tools/testing/kunit/kunit.py | 182 ++++++++++++---------
tools/testing/kunit/kunit_config.py | 61 +++----
tools/testing/kunit/kunit_json.py | 8 +-
tools/testing/kunit/kunit_kernel.py | 76 ++++++---
tools/testing/kunit/kunit_parser.py | 57 ++++---
tools/testing/kunit/kunit_tool_test.py | 171 ++++++++++++++++---
tools/testing/kunit/run_checks.py | 81 +++++++++
.../test_is_test_passed-no_tests_no_plan.log | 7 +
13 files changed, 480 insertions(+), 204 deletions(-)
create mode 100755 tools/testing/kunit/run_checks.py
create mode 100644 tools/testing/kunit/test_data/test_is_test_passed-no_tests_no_plan.log
----------------------------------------------------------------
Hi Linus,
Please pull these seccomp selftest updates for v5.17-rc1. The core
seccomp code hasn't changed for this cycle, but the selftests were
improved while helping to debug the recent signal handling refactoring
work Eric did.
Thanks!
-Kees
The following changes since commit d9bbdbf324cda23aa44873f505be77ed4b61d79c:
x86: deduplicate the spectre_v2_user documentation (2021-10-04 12:12:57 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/seccomp-v5.17-rc1
for you to fetch changes up to 1e6d69c7b9cd7735bbf4c6754ccbb9cce8bd8ff4:
selftests/seccomp: Report event mismatches more clearly (2021-11-03 12:02:07 -0700)
----------------------------------------------------------------
seccomp updates for v5.17-rc1
- Improve seccomp selftests in support of signal handler refactoring (Kees Cook)
----------------------------------------------------------------
Kees Cook (2):
selftests/seccomp: Stop USER_NOTIF test if kcmp() fails
selftests/seccomp: Report event mismatches more clearly
tools/testing/selftests/seccomp/seccomp_bpf.c | 56 ++++++++++++++++++++++++---
1 file changed, 50 insertions(+), 6 deletions(-)
--
Kees Cook
Attn,Dear
I need you to know that the fear of the LORD is
the beginning of wisdom, and knowledge of the Holy One is
understanding. As power of God Most High. And This is the confidence
we have in approaching God, that if we ask anything according to his
will, he hears us. I will make you know that Slow and steady wins the race.
It is your turn to receive your overdue compensation funds total
amount $18.5Milion USD.
I actualized that you will receive your transfer today without any more delay
No More fee OK, Believe me , I am your Attorney standing here on your favor.
I just concluded conversation with the Gt Bank Director, Mrs Mary Gate
And She told me that your transfer is ready today
So the Bank Asked you to contact them immediately by re-confirming
your Bank details asap.
Because this is the Only thing holding this transfer
If you did not trust me and Mrs Mary Gate,Who Else will you Trust?
For we are the ones trying to protect your funds here
and make sure that your funds is secure.
So Promisingly, I am here to assure you, that Grate Miracle is coming on
your way, and this funds total amount of $18.500,000 is your
compensation, entitlement inheritance overdue funds on your name.
Which you cannot let anything delay you from receiving your funds now,
Finally i advised you to try your possible best and contact Gt Bank Benin
once you get this message to receive your transfer $18.5 USD today.
I know that a journey of thousand miles begins with a single step.
Always put your best foot forward
Try as hard as you can, God give you best.
take my advice and follow the due process of your payment, the
transfer will be released to
you smoothly without any hitches or hindrance.
Contact DR.MRS MARY GATE, Director Gt bank-Benin to receive your
transfer amount of $18.5m US Dollars
It was deposited and registered to your name this morning.
Contact the Bank now to know when they will transfer to your
country today
Email id: gtbank107(a)yahoo.com
Tel/mobile, +229 99069872
Contact person, Mrs Mary Gate,Director Gt bank-Benin.
Among the blind the one-eyed man is king
As you sow, so you shall reap, i want you to receive your funds
Best things in life are free
Send to her your Bank Details as i listed here.
Your account name-------------
Your Bank Name----------------
Account Number----------
your Bank address----------
Country-----------
Your private phone number---------
Routing Numbers-------------
Swift Code-----------
Note, Your funds is %100 Percent ready for
transfer.
Everything you do remember that Good things come to those who wait.
I have done this work for you with my personally effort, Honesty is
the best policy.
now your transfer is currently deposited with paying bank this morning.
It is by the grace of God that I received Christ, having known the truth.
I had no choice than to do what is lawful and justice in the
sight of God for eternal life and in the sight of man for witness of
God & His Mercies and glory upon my life.
send this needed bank details to the bank today, so that you receive
your transfer today as
it is available for your confirmation today.
Please do your best as a serious person and send the fee urgent, Note
that this transfer of $18.500.000 M USD is a Gift from God to Bless
you.
If you did not contact the bank urgent, finally the Bank will release
your transfer of $18.500.000M USD to Mr. David Bollen as your
representative.
So not allow another to claim your Money.
Thanks For your Understanding.
Barr Robert Richter, UN Attorney At Law Court-Benin
Hello Dear,
how are you today,I hope you are doing great. It is my great pleasure
to contact you,I want to make a new and special friend,I hope you
don't mind. My name is Tracy Williams
from the United States, Am a french and English nationality. I will
give you pictures and more details about my self as soon as i hear
from you in my email account bellow,
Here is my email address; drtracywilliams89(a)gmail.com
Please send your reply to my PRIVATE mail box.
Thanks,
Tracy Williams.
Fix a typo: actualy -> actual
Signed-off-by: Qinghua Jin <qhjin.dev(a)gmail.com>
---
Documentation/dev-tools/kunit/usage.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index 63f1bb89ebf5..b9940758787c 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -615,7 +615,7 @@ kunit_tool) only fully supports running tests inside of UML and QEMU; however,
this is only due to our own time limitations as humans working on KUnit. It is
entirely possible to support other emulators and even actual hardware, but for
now QEMU and UML is what is fully supported within the KUnit Wrapper. Again, to
-be clear, this is just the Wrapper. The actualy KUnit tests and the KUnit
+be clear, this is just the Wrapper. The actual KUnit tests and the KUnit
library they are written in is fully architecture agnostic and can be used in
virtually any setup, you just won't have the benefit of typing a single command
out of the box and having everything magically work perfectly.
--
2.30.2
From: Menglong Dong <imagedong(a)tencent.com>
The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
__inet_bind() is not handled properly. While the return value
is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and
exit:
exit:
err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
if (err) {
inet->inet_saddr = inet->inet_rcv_saddr = 0;
goto out_release_sock;
}
Let's take UDP for example and see what will happen. For UDP
socket, it will be added to 'udp_prot.h.udp_table->hash' and
'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port()
called success. If 'inet->inet_rcv_saddr' is specified here,
then 'sk' will be in the 'hslot2' of 'hash2' that it don't belong
to (because inet_saddr is changed to 0), and UDP packet received
will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
specified here, the sock will work fine, as it can receive packet
properly, which is wired, as the 'bind()' is already failed.
To undo the get_port() operation, introduce the 'put_port' field
for 'struct proto'. For TCP proto, it is inet_put_port(); For UDP
proto, it is udp_lib_unhash(); For icmp proto, it is
ping_unhash().
Therefore, after sys_bind() fail caused by
BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), it will be unbinded, which
means that it can try to be binded to another port.
The second patch use C99 initializers in test_sock.c
The third patch is the selftests for this modification.
Changes since v4:
- use C99 initializers in test_sock.c before adding the test case
Changes since v3:
- add the third patch which use C99 initializers in test_sock.c
Changes since v2:
- NULL check for sk->sk_prot->put_port
Changes since v1:
- introduce 'put_port' field for 'struct proto'
- add selftests for it
Menglong Dong (3):
net: bpf: handle return value of
BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND()
bpf: selftests: use C99 initializers in test_sock.c
bpf: selftests: add bind retry for post_bind{4, 6}
include/net/sock.h | 1 +
net/ipv4/af_inet.c | 2 +
net/ipv4/ping.c | 1 +
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/udp.c | 1 +
net/ipv6/af_inet6.c | 2 +
net/ipv6/ping.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
net/ipv6/udp.c | 1 +
tools/testing/selftests/bpf/test_sock.c | 370 ++++++++++++++----------
10 files changed, 233 insertions(+), 148 deletions(-)
--
2.27.0
The hugetlb cgroup reservation test charge_reserved_hugetlb.sh assume
that no cgroup filesystems are mounted before running the test. That is
not true in many cases. As a result, the test fails to run. Fix that by
querying the current cgroup mount setting and using the existing cgroup
setup instead before attempting to freshly mount a cgroup filesystem.
Similar change is also made for hugetlb_reparenting_test.sh as well,
though it still has problem if cgroup v2 isn't used.
The patched test scripts were run on a centos 8 based system to verify
that they ran properly.
Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
.../selftests/vm/charge_reserved_hugetlb.sh | 34 +++++++++++--------
.../selftests/vm/hugetlb_reparenting_test.sh | 21 +++++++-----
.../selftests/vm/write_hugetlb_memory.sh | 2 +-
3 files changed, 34 insertions(+), 23 deletions(-)
mode change 100644 => 100755 tools/testing/selftests/vm/charge_reserved_hugetlb.sh
mode change 100644 => 100755 tools/testing/selftests/vm/hugetlb_reparenting_test.sh
mode change 100644 => 100755 tools/testing/selftests/vm/write_hugetlb_memory.sh
diff --git a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
old mode 100644
new mode 100755
index fe8fcfb334e0..a5cb4b09a46c
--- a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
@@ -24,19 +24,23 @@ if [[ "$1" == "-cgroup-v2" ]]; then
reservation_usage_file=rsvd.current
fi
-cgroup_path=/dev/cgroup/memory
-if [[ ! -e $cgroup_path ]]; then
- mkdir -p $cgroup_path
- if [[ $cgroup2 ]]; then
+if [[ $cgroup2 ]]; then
+ cgroup_path=$(mount -t cgroup2 | head -1 | awk -e '{print $3}')
+ if [[ -z "$cgroup_path" ]]; then
+ cgroup_path=/dev/cgroup/memory
mount -t cgroup2 none $cgroup_path
- else
+ do_umount=1
+ fi
+ echo "+hugetlb" >$cgroup_path/cgroup.subtree_control
+else
+ cgroup_path=$(mount -t cgroup | grep ",hugetlb" | awk -e '{print $3}')
+ if [[ -z "$cgroup_path" ]]; then
+ cgroup_path=/dev/cgroup/memory
mount -t cgroup memory,hugetlb $cgroup_path
+ do_umount=1
fi
fi
-
-if [[ $cgroup2 ]]; then
- echo "+hugetlb" >/dev/cgroup/memory/cgroup.subtree_control
-fi
+export cgroup_path
function cleanup() {
if [[ $cgroup2 ]]; then
@@ -108,7 +112,7 @@ function setup_cgroup() {
function wait_for_hugetlb_memory_to_get_depleted() {
local cgroup="$1"
- local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
+ local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
# Wait for hugetlbfs memory to get depleted.
while [ $(cat $path) != 0 ]; do
echo Waiting for hugetlb memory to get depleted.
@@ -121,7 +125,7 @@ function wait_for_hugetlb_memory_to_get_reserved() {
local cgroup="$1"
local size="$2"
- local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
+ local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
# Wait for hugetlbfs memory to get written.
while [ $(cat $path) != $size ]; do
echo Waiting for hugetlb memory reservation to reach size $size.
@@ -134,7 +138,7 @@ function wait_for_hugetlb_memory_to_get_written() {
local cgroup="$1"
local size="$2"
- local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$fault_usage_file"
+ local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$fault_usage_file"
# Wait for hugetlbfs memory to get written.
while [ $(cat $path) != $size ]; do
echo Waiting for hugetlb memory to reach size $size.
@@ -574,5 +578,7 @@ for populate in "" "-o"; do
done # populate
done # method
-umount $cgroup_path
-rmdir $cgroup_path
+if [[ $do_umount ]]; then
+ umount $cgroup_path
+ rmdir $cgroup_path
+fi
diff --git a/tools/testing/selftests/vm/hugetlb_reparenting_test.sh b/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
old mode 100644
new mode 100755
index 4a9a3afe9fd4..bf2d2a684edf
--- a/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
+++ b/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
@@ -18,19 +18,24 @@ if [[ "$1" == "-cgroup-v2" ]]; then
usage_file=current
fi
-CGROUP_ROOT='/dev/cgroup/memory'
-MNT='/mnt/huge/'
-if [[ ! -e $CGROUP_ROOT ]]; then
- mkdir -p $CGROUP_ROOT
- if [[ $cgroup2 ]]; then
+if [[ $cgroup2 ]]; then
+ CGROUP_ROOT=$(mount -t cgroup2 | head -1 | awk -e '{print $3}')
+ if [[ -z "$CGROUP_ROOT" ]]; then
+ CGROUP_ROOT=/dev/cgroup/memory
mount -t cgroup2 none $CGROUP_ROOT
- sleep 1
- echo "+hugetlb +memory" >$CGROUP_ROOT/cgroup.subtree_control
- else
+ do_umount=1
+ fi
+ echo "+hugetlb +memory" >$CGROUP_ROOT/cgroup.subtree_control
+else
+ CGROUP_ROOT=$(mount -t cgroup | grep ",hugetlb" | awk -e '{print $3}')
+ if [[ -z "$CGROUP_ROOT" ]]; then
+ CGROUP_ROOT=/dev/cgroup/memory
mount -t cgroup memory,hugetlb $CGROUP_ROOT
+ do_umount=1
fi
fi
+MNT='/mnt/huge/'
function get_machine_hugepage_size() {
hpz=$(grep -i hugepagesize /proc/meminfo)
diff --git a/tools/testing/selftests/vm/write_hugetlb_memory.sh b/tools/testing/selftests/vm/write_hugetlb_memory.sh
old mode 100644
new mode 100755
index d3d0d108924d..70a02301f4c2
--- a/tools/testing/selftests/vm/write_hugetlb_memory.sh
+++ b/tools/testing/selftests/vm/write_hugetlb_memory.sh
@@ -14,7 +14,7 @@ want_sleep=$8
reserve=$9
echo "Putting task in cgroup '$cgroup'"
-echo $$ > /dev/cgroup/memory/"$cgroup"/cgroup.procs
+echo $$ > ${cgroup_path:-/dev/cgroup/memory}/"$cgroup"/cgroup.procs
echo "Method is $method"
--
2.27.0
While building selftests the following warnings were noticed for arm
architecture on Linux stable v5.15.13 kernel and also on Linus's tree.
arm-linux-gnueabihf-gcc -Wall -Wl,--no-as-needed -O2 -g
-I../../../../usr/include/ txtimestamp.c -o
/home/tuxbuild/.cache/tuxmake/builds/current/kselftest/net/txtimestamp
txtimestamp.c: In function 'validate_timestamp':
txtimestamp.c:164:29: warning: format '0' expects argument of type
'long unsigned int', but argument 3 has type 'int64_t' {aka 'long long
int'} [-Wformat=]
164 | fprintf(stderr, "ERROR: 0 us expected between 0 and 0\n",
| ~~^
| |
| long unsigned int
| 0
165 | cur64 - start64, min_delay, max_delay);
| ~~~~~~~~~~~~~~~
| |
| int64_t {aka long long int}
txtimestamp.c: In function '__print_ts_delta_formatted':
txtimestamp.c:173:22: warning: format '0' expects argument of type
'long unsigned int', but argument 3 has type 'int64_t' {aka 'long long
int'} [-Wformat=]
173 | fprintf(stderr, "0 ns", ts_delta);
| ~~^ ~~~~~~~~
| | |
| | int64_t {aka long long int}
| long unsigned int
| 0
txtimestamp.c:175:22: warning: format '0' expects argument of type
'long unsigned int', but argument 3 has type 'int64_t' {aka 'long long
int'} [-Wformat=]
175 | fprintf(stderr, "0 us", ts_delta / NSEC_PER_USEC);
| ~~^
| |
| long unsigned int
| 0
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
build link:
https://builds.tuxbuild.com/23HFntxpqyCx0RbiuadfGZ36Kym/
metadata:
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
git commit: 734eb1fd2073f503f5c6b44f1c0d453ca6986b84
git describe: v5.15.13
toolchain: gcc-11
kernel-config: https://builds.tuxbuild.com/23HFntxpqyCx0RbiuadfGZ36Kym/config
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
tuxmake --runtime podman --target-arch arm --toolchain gcc-10 \
--kconfig https://builds.tuxbuild.com/23HFntxpqyCx0RbiuadfGZ36Kym/config \
dtbs dtbs-legacy headers kernel kselftest kselftest-merge modules
--
Linaro LKFT
https://lkft.linaro.org
From: Menglong Dong <imagedong(a)tencent.com>
The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
__inet_bind() is not handled properly. While the return value
is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and
exit:
exit:
err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
if (err) {
inet->inet_saddr = inet->inet_rcv_saddr = 0;
goto out_release_sock;
}
Let's take UDP for example and see what will happen. For UDP
socket, it will be added to 'udp_prot.h.udp_table->hash' and
'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port()
called success. If 'inet->inet_rcv_saddr' is specified here,
then 'sk' will be in the 'hslot2' of 'hash2' that it don't belong
to (because inet_saddr is changed to 0), and UDP packet received
will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
specified here, the sock will work fine, as it can receive packet
properly, which is wired, as the 'bind()' is already failed.
To undo the get_port() operation, introduce the 'put_port' field
for 'struct proto'. For TCP proto, it is inet_put_port(); For UDP
proto, it is udp_lib_unhash(); For icmp proto, it is
ping_unhash().
Therefore, after sys_bind() fail caused by
BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), it will be unbinded, which
means that it can try to be binded to another port.
The second patch is the selftests for this modification.
The third patch use C99 initializers in test_sock.c.
Changes since v3:
- add the third patch which use C99 initializers in test_sock.c
Changes since v2:
- NULL check for sk->sk_prot->put_port
Changes since v1:
- introduce 'put_port' field for 'struct proto'
- add selftests for it
Menglong Dong (3):
net: bpf: handle return value of
BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND()
bpf: selftests: add bind retry for post_bind{4, 6}
bpf: selftests: use C99 initializers in test_sock.c
include/net/sock.h | 1 +
net/ipv4/af_inet.c | 2 +
net/ipv4/ping.c | 1 +
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/udp.c | 1 +
net/ipv6/af_inet6.c | 2 +
net/ipv6/ping.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
net/ipv6/udp.c | 1 +
tools/testing/selftests/bpf/test_sock.c | 370 ++++++++++++++----------
10 files changed, 233 insertions(+), 148 deletions(-)
--
2.27.0
Thanks a lot for all the review comments and guidance! Hope this
version is in a good state now. :)
(Jing is temporarily leave for family reason, Yang helped work out
this version)
----
v4->v5:
- Directly call fpu core to expand fpstate buffer in kvm_check_cpuid()
and remove duplicated permission check there (Sean)
- Accordingly remove Thomas's reviewed-by as a different wrapper is
introduced now (patch-7)
- Properly queue #NM exception in nested scenario (Sean)
- Verify non-XFD related #NM usage in nested scenario (Sean)
- Hide XFD in kvm_cpu_cap on 32bit host kernels (Sean)
- Use xstate_required_size() in KVM_CAP_XSAVE2 which may be called
before any vcpu is created (Sean/Paolo)
- Replace boot_cpu_has with kvm_cpu_cap_has when disabling RDMSR
interception for xfd_err (Sean)
v3->v4:
- Verify kvm selftest for AMX (Paolo)
- Expand fpstate buffer in kvm_check_cpuid() and improve patch
description (Sean)
- Drop 'preemption' word in #NM interception patch (Sean)
- Remove 'trap_nm' flag. Replace it by: (Sean)
* Trapping #NM according to guest_fpu::xfd when write to xfd is
intercepted.
* Always trapping #NM when xfd write interception is disabled
- Use better name for #NM related functions (Sean)
- Drop '#ifdef CONFIG_X86_64' in __kvm_set_xcr (Sean)
- Update description for KVM_CAP_XSAVE2 and prevent the guest from
using the wrong ioctl (Sean)
- Replace 'xfd_out_of_sync' with a better name (Sean)
v2->v3:
- Trap #NM until write IA32_XFD with a non-zero value (Thomas)
- Revise return value in __xstate_request_perm() (Thomas)
- Revise doc for KVM_GET_SUPPORTED_CPUID (Paolo)
- Add Thomas's reviewed-by on one patch
- Reorder disabling read interception of XFD_ERR patch (Paolo)
- Move disabling r/w interception of XFD from x86.c to vmx.c (Paolo)
- Provide the API doc together with the new KVM_GET_XSAVE2 ioctl (Paolo)
- Make KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) return minimum size of struct
kvm_xsave (4K) (Paolo)
- Request permission at the start of vm_create_with_vcpus() in selftest
- Request permission conditionally when XFD is supported (Paolo)
v1->v2:
- Live migration supported and verified with a selftest
- Rebase to Thomas's new series for guest fpstate reallocation [1]
- Expand fpstate at KVM_SET_CPUID2 instead of when emulating XCR0
and IA32_XFD (Thomas/Paolo)
- Accordingly remove all exit-to-userspace stuff
- Intercept #NM to save guest XFD_ERR and restore host/guest value
at preemption on/off boundary (Thomas)
- Accordingly remove all xfd_err logic in preemption callback and
fpu_swap_kvm_fpstate()
- Reuse KVM_SET_XSAVE to handle both legacy and expanded buffer (Paolo)
- Don't return dynamic bits w/o prctl() in KVM_GET_SUPPORTED_CPUID (Paolo)
- Check guest permissions for dynamic features in CPUID[0xD] instead
of only for AMX at KVM_SET_CPUID (Paolo)
- Remove dynamic bit check for 32-bit guest in __kvm_set_xcr() (Paolo)
- Fix CPUID emulation for 0x1d and 0x1e (Paolo)
- Move "disable interception" to the end of the series (Paolo)
This series brings AMX (Advanced Matrix eXtensions) virtualization support to
KVM. The preparatory series from Thomas [1] is also included.
A large portion of the changes in this series is to deal with eXtended Feature
Disable (XFD) which allows resizing of the fpstate buffer to support
dynamically-enabled XSTATE features with large state component (e.g. 8K for AMX).
There are a lot of simplications when comparing v5 to the original proposal [2]
and the first version [3]. Thanks to Thomas, Paolo and Sean for many good
suggestions.
The support is based on following key changes:
- Guest permissions for dynamically-enabled XSAVE features
Native tasks have to request permission via prctl() before touching
a dynamic-resized XSTATE compoenent. Introduce guest permissions
for the similar purpose. Userspace VMM is expected to request guest
permission only once when the first vCPU is created.
KVM checks guest permission in KVM_SET_CPUID2. Setting XFD in guest
cpuid w/o proper permissions fails this operation. In the meantime,
unpermitted features are also excluded in KVM_GET_SUPPORTED_CPUID.
- Extend fpstate reallocation mechanism to cover guest fpu
Unlike native tasks which have reallocation triggered from #NM
handler, guest fpstate reallocation is requested by KVM when it
identifies the intention on using dynamically-enabled XSAVE
features inside guest.
Extend fpu core to allow KVM request fpstate buffer expansion
for a guest fpu containter.
- Trigger fpstate reallocation in KVM
This could be done either before guest runs or until xfd is updated
in the emulation path. According to discussion [1] we decide to
go the former option in KVM_SET_CPUID2, with fpstate buffer sized
accordingly. This spares a lot of code and also avoid imposing an
ordered restore sequence (XCR0, XFD and XSTATE) to userspace VMM.
- RDMSR/WRMSR emulation for IA32_XFD
Because fpstate expansion is completed in KVM_SET_CPUID2, emulating
r/w access to IA32_XFD simply involves the xfd field in the guest
fpu container. If write and guest fpu is currently active, the
software state (guest_fpstate::xfd and per-cpu xfd cache) is also
updated.
- RDMSR/WRMSR emulation for XFD_ERR
When XFD causes an instruction to generate #NM, XFD_ERR contains
information about which disabled state components are being accessed.
It'd be problematic if the XFD_ERR value generated in guest is
consumed/clobbered by the host before the guest itself doing so.
Intercept #NM exception to save the guest XFD_ERR value when write
IA32_XFD with a non-zero value for 1st time. There is at most one
interception per guest task given a dynamic feature.
RDMSR/WRMSR emulation uses the saved value. The host value (always
ZERO outside of the host #NM handler) is restored before enabling
interrupts. The saved guest value is restored right before entering
the guest (with interrupts disabled).
- Get/set dynamic xfeature state for migration
Introduce new capability (KVM_CAP_XSAVE2) to deal with >4KB fpstate
buffer. Reading this capability returns the size of the current
guest fpstate (e.g. after expansion). Userspace VMM uses a new ioctl
(KVM_GET_XSAVE2) to read guest fpstate from the kernel and reuses
the existing ioctl (KVM_SET_XSAVE) to update guest fpsate to the
kernel. KVM_SET_XSAVE is extended to do properly_sized memdup_user()
based on the guest fpstate.
- Expose related cpuid bits to guest
The last step is to allow exposing XFD, AMX_TILE, AMX_INT8 and
AMX_BF16 in guest cpuid. Adding those bits into kvm_cpu_caps finally
activates all previous logics in this series
- Optimization: disable interception for IA32_XFD
IA32_XFD can be frequently updated by the guest, as it is part of
the task state and swapped in context switch when prev and next have
different XFD setting. Always intercepting WRMSR can easily cause
non-negligible overhead.
Disable r/w emulation for IA32_XFD after intercepting the first
WRMSR(IA32_XFD) with a non-zero value. However MSR passthrough
implies the software state (guest_fpstate::xfd and per-cpu xfd
cache) might be out of sync with MSR. This suggests KVM needs to
re-sync them at VM-exit before preemption is enabled.
Thanks Jun Nakajima and Kevin Tian for the design suggestions when this was
being internally worked on.
[1] https://lore.kernel.org/all/20211214022825.563892248@linutronix.de/
[2] https://www.spinics.net/lists/kvm/msg259015.html
[3] https://lore.kernel.org/lkml/20211208000359.2853257-1-yang.zhong@intel.com/
Thanks,
Yang
----
Guang Zeng (1):
kvm: x86: Add support for getting/setting expanded xstate buffer
Jing Liu (11):
kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID
x86/fpu: Make XFD initialization in __fpstate_reset() a function
argument
kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
kvm: x86: Add emulation for IA32_XFD
x86/fpu: Prepare xfd_err in struct fpu_guest
kvm: x86: Intercept #NM for saving IA32_XFD_ERR
kvm: x86: Emulate IA32_XFD_ERR for guest
kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
kvm: x86: Add XCR0 support for Intel AMX
kvm: x86: Add CPUID support for Intel AMX
Kevin Tian (2):
x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
kvm: x86: Disable interception for IA32_XFD on demand
Sean Christopherson (1):
x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
Thomas Gleixner (5):
x86/fpu: Extend fpu_xstate_prctl() with guest permissions
x86/fpu: Prepare guest FPU for dynamically enabled FPU features
x86/fpu: Add guest support to xfd_enable_feature()
x86/fpu: Add uabi_size to guest_fpu
x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
Wei Wang (1):
kvm: selftests: Add support for KVM_CAP_XSAVE2
Documentation/virt/kvm/api.rst | 46 +++++-
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/fpu/api.h | 11 ++
arch/x86/include/asm/fpu/types.h | 32 ++++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/kvm.h | 16 +-
arch/x86/include/uapi/asm/prctl.h | 26 ++--
arch/x86/kernel/fpu/core.c | 94 ++++++++++-
arch/x86/kernel/fpu/xstate.c | 147 +++++++++++-------
arch/x86/kernel/fpu/xstate.h | 15 +-
arch/x86/kernel/process.c | 2 +
arch/x86/kvm/cpuid.c | 86 +++++++---
arch/x86/kvm/cpuid.h | 2 +
arch/x86/kvm/vmx/vmcs.h | 5 +
arch/x86/kvm/vmx/vmx.c | 68 ++++++++
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 112 ++++++++++++-
include/uapi/linux/kvm.h | 4 +
tools/arch/x86/include/uapi/asm/kvm.h | 16 +-
tools/include/uapi/linux/kvm.h | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 2 +
.../selftests/kvm/include/x86_64/processor.h | 10 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 32 ++++
.../selftests/kvm/lib/x86_64/processor.c | 67 +++++++-
.../testing/selftests/kvm/x86_64/evmcs_test.c | 2 +-
tools/testing/selftests/kvm/x86_64/smm_test.c | 2 +-
.../testing/selftests/kvm/x86_64/state_test.c | 2 +-
.../kvm/x86_64/vmx_preemption_timer_test.c | 2 +-
28 files changed, 702 insertions(+), 107 deletions(-)