KUnit tests often need to provide a struct device, and thus far have
mostly been using root_device_register() or platform devices to create
a 'fake device' for use with, e.g., code which uses device-managed
resources. This has several disadvantages, including not being designed
for test use, scattering files in sysfs, and requiring manual teardown
on test exit, which may not always be possible in case of failure.
Instead, introduce a set of helper functions which allow devices
(internally a struct kunit_device) to be created and managed by KUnit --
i.e., they will be automatically unregistered on test exit. These
helpers can either use a user-provided struct device_driver, or have one
automatically created and managed by KUnit. In both cases, the device
lives on a new kunit_bus.
This is a follow-up to a previous proposal here:
https://lore.kernel.org/linux-kselftest/20230325043104.3761770-1-davidgow@g…
(The kunit_defer() function in the first patch there has since been
merged as the 'deferred actions' feature.)
My intention is to take this whole series in via the kselftest/kunit
branch, but I'm equally okay with splitting up the later patches which
use this to go via the various subsystem trees in case there are merge
conflicts.
I'd really appreciate any extra scrutiny that can be given to this;
particularly around the device refcounts and whether we can guarantee
that the device will be released at the correct point in the test
cleanup. I've seen a few crashes in kunit_cleanup, but only on some
already flaky/fragile UML/clang/alltests setups, which seem to go away
if I remove the devm_add_action() call (or if I enable any debugging
features / symbols, annoyingly).
Cheers,
-- David
Signed-off-by: David Gow <davidgow(a)google.com>
---
Changes in v2:
- Simplify device/driver/bus matching, removing the no-longer-required
kunit_bus_match function. (Thanks, Greg)
- The return values are both more consistent (kunit_device_register now
returns an explicit error pointer, rather than failing the test), and
better documented.
- Add some basic documentation to the implementations as well as the
headers. The documentation in the headers is still more complete, and
is now properly compiled into the HTML documentation (under
dev-tools/kunit/api/resources.html). (Thanks, Matti)
- Moved the internal-only kunit_bus_init() function to a private header,
lib/kunit/device-impl.h to avoid polluting the public headers, and
match other internal-only headers. (Thanks, Greg)
- Alphabetise KUnit includes in other test modules. (Thanks, Amadeusz.)
- Several code cleanups, particularly around error handling and
allocation. (Thanks Greg, Maxime)
- Several const-correctness and casting improvements. (Thanks, Greg)
- Added a new test to verify KUnit cleanup triggers device cleanup.
(Thanks, Maxime).
- Improved the user-specified device test to verify that probe/remove
hooks are called correctly. (Thanks, Maxime).
- The overflow test no-longer needlessly calls
kunit_device_unregister().
- Several other minor cleanups and documentation improvements, which
hopefully make this a bit clearer and more robust.
- Link to v1: https://lore.kernel.org/r/20231205-kunit_bus-v1-0-635036d3bc13@google.com
---
David Gow (4):
kunit: Add APIs for managing devices
fortify: test: Use kunit_device
overflow: Replace fake root_device with kunit_device
ASoC: topology: Replace fake root_device with kunit_device in tests
Documentation/dev-tools/kunit/api/resource.rst | 9 ++
Documentation/dev-tools/kunit/usage.rst | 50 +++++++
include/kunit/device.h | 80 +++++++++++
lib/fortify_kunit.c | 5 +-
lib/kunit/Makefile | 3 +-
lib/kunit/device.c | 181 +++++++++++++++++++++++++
lib/kunit/kunit-test.c | 134 +++++++++++++++++-
lib/kunit/test.c | 3 +
lib/overflow_kunit.c | 5 +-
sound/soc/soc-topology-test.c | 10 +-
10 files changed, 465 insertions(+), 15 deletions(-)
---
base-commit: c8613be119892ccceffbc550b9b9d7d68b995c9e
change-id: 20230718-kunit_bus-ab19c4ef48dc
Best regards,
--
David Gow <davidgow(a)google.com>
Alter the linker section of KUNIT_TABLE to move it out of INIT_DATA and
into DATA_DATA.
Data for KUnit tests does not need to be in the init section.
In order to run tests again after boot the KUnit data cannot be labeled as
init data as the kernel could write over it.
Add a KUNIT_INIT_TABLE in the next patch for KUnit tests that test init
data/functions.
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
include/asm-generic/vmlinux.lds.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index bae0fe4d499b..1107905d37fc 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -370,7 +370,8 @@
BRANCH_PROFILE() \
TRACE_PRINTKS() \
BPF_RAW_TP() \
- TRACEPOINT_STR()
+ TRACEPOINT_STR() \
+ KUNIT_TABLE()
/*
* Data section helpers
@@ -699,8 +700,7 @@
THERMAL_TABLE(governor) \
EARLYCON_TABLE() \
LSM_TABLE() \
- EARLY_LSM_TABLE() \
- KUNIT_TABLE()
+ EARLY_LSM_TABLE()
#define INIT_TEXT \
*(.init.text .init.text.*) \
base-commit: b285ba6f8cc1b2bfece0b4350fdb92c8780bc698
--
2.43.0.472.g3155946c3a-goog
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Add a test that writes longs strings, some over the size of the sub buffer
and make sure that the entire content is there.
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v2: https://lore.kernel.org/linux-trace-kernel/20231212151632.25c9b67d@gandalf.…
- Realized with the upcoming change of the dynamic subbuffer sizes, that
this test will fail if the subbuffer is bigger than what the trace_seq
can hold. Now the trace_marker does not always utilize the full subbuffer
but the size of the trace_seq instead. As that size isn't available to
user space, we can only just make sure all content is there.
.../ftrace/test.d/00basic/trace_marker.tc | 82 +++++++++++++++++++
1 file changed, 82 insertions(+)
create mode 100755 tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
new file mode 100755
index 000000000000..b24aff5807df
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
@@ -0,0 +1,82 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Basic tests on writing to trace_marker
+# requires: trace_marker
+# flags: instance
+
+get_buffer_data_size() {
+ sed -ne 's/^.*data.*size:\([0-9][0-9]*\).*/\1/p' events/header_page
+}
+
+get_buffer_data_offset() {
+ sed -ne 's/^.*data.*offset:\([0-9][0-9]*\).*/\1/p' events/header_page
+}
+
+get_event_header_size() {
+ type_len=`sed -ne 's/^.*type_len.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ time_len=`sed -ne 's/^.*time_delta.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ array_len=`sed -ne 's/^.*array.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ total_bits=$((type_len+time_len+array_len))
+ total_bits=$((total_bits+7))
+ echo $((total_bits/8))
+}
+
+get_print_event_buf_offset() {
+ sed -ne 's/^.*buf.*offset:\([0-9][0-9]*\).*/\1/p' events/ftrace/print/format
+}
+
+event_header_size=`get_event_header_size`
+print_header_size=`get_print_event_buf_offset`
+
+data_offset=`get_buffer_data_offset`
+
+marker_meta=$((event_header_size+print_header_size))
+
+make_str() {
+ cnt=$1
+ # subtract two for \n\0 as marker adds these
+ cnt=$((cnt-2))
+ printf -- 'X%.0s' $(seq $cnt)
+}
+
+write_buffer() {
+ size=$1
+
+ str=`make_str $size`
+
+ # clear the buffer
+ echo > trace
+
+ # write the string into the marker
+ echo -n $str > trace_marker
+
+ echo $str
+}
+
+test_buffer() {
+
+ size=`get_buffer_data_size`
+ oneline_size=$((size-marker_meta))
+ echo size = $size
+ echo meta size = $marker_meta
+
+ # Now add a little more the meta data overhead will overflow
+
+ str=`write_buffer $size`
+
+ # Make sure the line was broken
+ new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: /,"");printf "%s", $0; exit}' trace`
+
+ if [ "$new_str" = "$str" ]; then
+ exit fail;
+ fi
+
+ # Make sure the entire line can be found
+ new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: */,"");printf "%s", $0; }' trace`
+
+ if [ "$new_str" != "$str" ]; then
+ exit fail;
+ fi
+}
+
+test_buffer
--
2.42.0
When we dynamically generate a name for a configuration in get-reg-list
we use strcat() to append to a buffer allocated using malloc() but we
never initialise that buffer. Since malloc() offers no guarantees
regarding the contents of the memory it returns this can lead to us
corrupting, and likely overflowing, the buffer:
vregs: PASS
vregs+pmu: PASS
sve: PASS
sve+pmu: PASS
vregs+pauth_address+pauth_generic: PASS
X�vr+gspauth_addre+spauth_generi+pmu: PASS
Initialise the buffer to an empty string to avoid this.
Fixes: 2f9ace5d4557 ("KVM: arm64: selftests: get-reg-list: Introduce vcpu configs")
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Rebase this bugfix onto v6.7-rc1
- Link to v2: https://lore.kernel.org/r/20231017-kvm-get-reg-list-str-init-v2-1-ee30b1df3…
Changes in v2:
- Update Fixes: tag.
- Link to v1: https://lore.kernel.org/r/20231013-kvm-get-reg-list-str-init-v1-1-034f370ff…
---
tools/testing/selftests/kvm/get-reg-list.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kvm/get-reg-list.c b/tools/testing/selftests/kvm/get-reg-list.c
index be7bf5224434..dd62a6976c0d 100644
--- a/tools/testing/selftests/kvm/get-reg-list.c
+++ b/tools/testing/selftests/kvm/get-reg-list.c
@@ -67,6 +67,7 @@ static const char *config_name(struct vcpu_reg_list *c)
c->name = malloc(len);
+ c->name[0] = '\0';
len = 0;
for_each_sublist(c, s) {
if (!strcmp(s->name, "base"))
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231012-kvm-get-reg-list-str-init-76c8ed4e19d6
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Add a test to exercize cpu hotplug with the function tracer active to
ensure that sensitive functions in idle path are excluded from being
traced. This helps catch issues such as the one fixed by commit
4b3338aaa74d ("powerpc/ftrace: Fix stack teardown in ftrace_no_trace").
Signed-off-by: Naveen N Rao <naveen(a)kernel.org>
---
.../ftrace/test.d/ftrace/func_hotplug.tc | 30 +++++++++++++++++++
1 file changed, 30 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
new file mode 100644
index 000000000000..49731a2b5c23
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
@@ -0,0 +1,30 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: ftrace - function trace across cpu hotplug
+# requires: function:tracer
+
+if ! which nproc ; then
+ nproc() {
+ ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l
+ }
+fi
+
+NP=`nproc`
+
+if [ $NP -eq 1 ] ;then
+ echo "We can not test cpu hotplug in UP environment"
+ exit_unresolved
+fi
+
+echo 0 > tracing_on
+echo > trace
+: "Set CPU1 offline/online with function tracer enabled"
+echo function > current_tracer
+echo 1 > tracing_on
+(echo 0 > /sys/devices/system/cpu/cpu1/online)
+(echo "forked"; sleep 1)
+(echo 1 > /sys/devices/system/cpu/cpu1/online)
+echo 0 > tracing_on
+
+: "Check CPU1 events are recorded"
+grep -q -e "\[001\]" trace
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.43.0
Alter the linker section of KUNIT_TABLE to move it out of INIT_DATA and
into DATA_DATA.
Data for KUnit tests does not need to be in the init section.
In order to run tests again after boot the KUnit data cannot be labeled as
init data as the kernel could write over it.
Add a KUNIT_INIT_TABLE in the next patch for KUnit tests that test init
data/functions.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v3:
- No changes
include/asm-generic/vmlinux.lds.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index bae0fe4d499b..1107905d37fc 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -370,7 +370,8 @@
BRANCH_PROFILE() \
TRACE_PRINTKS() \
BPF_RAW_TP() \
- TRACEPOINT_STR()
+ TRACEPOINT_STR() \
+ KUNIT_TABLE()
/*
* Data section helpers
@@ -699,8 +700,7 @@
THERMAL_TABLE(governor) \
EARLYCON_TABLE() \
LSM_TABLE() \
- EARLY_LSM_TABLE() \
- KUNIT_TABLE()
+ EARLY_LSM_TABLE()
#define INIT_TEXT \
*(.init.text .init.text.*) \
base-commit: b285ba6f8cc1b2bfece0b4350fdb92c8780bc698
--
2.43.0.472.g3155946c3a-goog
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Now that the trace_marker can write up to the max size of the sub buffer.
Add a test to see if it actually can happen.
The README is updated to state that the trace_marker writes can be broken
up, and the test checks the README for that statement so that it does not
fail on older kernels that does not support this.
If the README does not have the specified update, the test will still test
if all the string is written, as that should work with older kernels.
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20231212135441.0337c3e9@gandalf.…
- Fix description as it was a cut and paste from the subbuffer size tests
that are not added yet.
kernel/trace/trace.c | 1 +
.../ftrace/test.d/00basic/trace_marker.tc | 112 ++++++++++++++++++
2 files changed, 113 insertions(+)
create mode 100755 tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2f8d59834c00..cbfcdd882590 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5595,6 +5595,7 @@ static const char readme_msg[] =
" delta: Delta difference against a buffer-wide timestamp\n"
" absolute: Absolute (standalone) timestamp\n"
"\n trace_marker\t\t- Writes into this file writes into the kernel buffer\n"
+ "\n May be broken into multiple events based on sub-buffer size.\n"
"\n trace_marker_raw\t\t- Writes into this file writes binary data into the kernel buffer\n"
" tracing_cpumask\t- Limit which CPUs to trace\n"
" instances\t\t- Make sub-buffers with: mkdir instances/foo\n"
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
new file mode 100755
index 000000000000..bf7f6f50c88a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
@@ -0,0 +1,112 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Basic tests on writing to trace_marker
+# requires: trace_marker
+# flags: instance
+
+get_buffer_data_size() {
+ sed -ne 's/^.*data.*size:\([0-9][0-9]*\).*/\1/p' events/header_page
+}
+
+get_buffer_data_offset() {
+ sed -ne 's/^.*data.*offset:\([0-9][0-9]*\).*/\1/p' events/header_page
+}
+
+get_event_header_size() {
+ type_len=`sed -ne 's/^.*type_len.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ time_len=`sed -ne 's/^.*time_delta.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ array_len=`sed -ne 's/^.*array.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ total_bits=$((type_len+time_len+array_len))
+ total_bits=$((total_bits+7))
+ echo $((total_bits/8))
+}
+
+get_print_event_buf_offset() {
+ sed -ne 's/^.*buf.*offset:\([0-9][0-9]*\).*/\1/p' events/ftrace/print/format
+}
+
+event_header_size=`get_event_header_size`
+print_header_size=`get_print_event_buf_offset`
+
+# Find the README
+README=""
+if [ -f README ]; then
+ README="README"
+# instance?
+elif [ -f ../../README ]; then
+ README="../../README"
+fi
+
+testone=0
+if [ ! -z "$README" ]; then
+ if grep -q "May be broken into multiple events based on sub-buffer size" $README; then
+ testone=1
+ fi
+fi
+
+data_offset=`get_buffer_data_offset`
+
+marker_meta=$((event_header_size+print_header_size))
+
+make_str() {
+ cnt=$1
+ # subtract two for \n\0 as marker adds these
+ cnt=$((cnt-2))
+ printf -- 'X%.0s' $(seq $cnt)
+}
+
+write_buffer() {
+ size=$1
+
+ str=`make_str $size`
+
+ # clear the buffer
+ echo > trace
+
+ # write the string into the marker
+ echo -n $str > trace_marker
+
+ echo $str
+}
+
+test_buffer() {
+
+ size=`get_buffer_data_size`
+ oneline_size=$((size-marker_meta))
+ echo size = $size
+ echo meta size = $marker_meta
+
+ if [ $testone -eq 1 ]; then
+ echo oneline size = $oneline_size
+
+ str=`write_buffer $oneline_size`
+
+ # Should be in one single event
+ new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: */,"");printf "%s", $0; exit}' trace`
+
+ if [ "$new_str" != "$str" ]; then
+ exit fail;
+ fi
+ fi
+
+ # Now add a little more the meta data overhead will overflow
+
+ str=`write_buffer $size`
+
+ # Make sure the line was broken
+ new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: /,"");printf "%s", $0; exit}' trace`
+
+ if [ "$new_str" = "$str" ]; then
+ exit fail;
+ fi
+
+ # Make sure the entire line can be found
+ new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: */,"");printf "%s", $0; }' trace`
+
+ if [ "$new_str" != "$str" ]; then
+ exit fail;
+ fi
+}
+
+test_buffer
+
--
2.42.0
Here is the 3rd part of converting net selftests to run in unique namespace.
This part converts all srv6 and fib tests.
Note that patch 06 is a fix for testing fib_nexthop_multiprefix.
Here is the part 1 link:
https://lore.kernel.org/netdev/20231202020110.362433-1-liuhangbin@gmail.com
And part 2 link:
https://lore.kernel.org/netdev/20231206070801.1691247-1-liuhangbin@gmail.com
Hangbin Liu (13):
selftests/net: add variable NS_LIST for lib.sh
selftests/net: convert srv6_end_dt46_l3vpn_test.sh to run it in unique
namespace
selftests/net: convert srv6_end_dt4_l3vpn_test.sh to run it in unique
namespace
selftests/net: convert srv6_end_dt6_l3vpn_test.sh to run it in unique
namespace
selftests/net: convert fcnal-test.sh to run it in unique namespace
selftests/net: fix grep checking for fib_nexthop_multiprefix
selftests/net: convert fib_nexthop_multiprefix to run it in unique
namespace
selftests/net: convert fib_nexthop_nongw.sh to run it in unique
namespace
selftests/net: convert fib_nexthops.sh to run it in unique namespace
selftests/net: convert fib-onlink-tests.sh to run it in unique
namespace
selftests/net: convert fib_rule_tests.sh to run it in unique namespace
selftests/net: convert fib_tests.sh to run it in unique namespace
selftests/net: convert fdb_flush.sh to run it in unique namespace
tools/testing/selftests/net/fcnal-test.sh | 30 ++-
tools/testing/selftests/net/fdb_flush.sh | 11 +-
.../testing/selftests/net/fib-onlink-tests.sh | 9 +-
.../selftests/net/fib_nexthop_multiprefix.sh | 98 +++++-----
.../selftests/net/fib_nexthop_nongw.sh | 34 ++--
tools/testing/selftests/net/fib_nexthops.sh | 142 +++++++-------
tools/testing/selftests/net/fib_rule_tests.sh | 36 ++--
tools/testing/selftests/net/fib_tests.sh | 184 +++++++++---------
tools/testing/selftests/net/lib.sh | 8 +
tools/testing/selftests/net/settings | 2 +-
.../selftests/net/srv6_end_dt46_l3vpn_test.sh | 51 +++--
.../selftests/net/srv6_end_dt4_l3vpn_test.sh | 48 ++---
.../selftests/net/srv6_end_dt6_l3vpn_test.sh | 46 ++---
13 files changed, 332 insertions(+), 367 deletions(-)
--
2.43.0
Changes from v1
(https://lore.kernel.org/damon/20231212191206.52917-1-sj@kernel.org/)
- Fix conflicts on latest mm-unstable tree
Changes from RFC
(https://lore.kernel.org/damon/20231202000806.46210-1-sj@kernel.org/)
- Make the working set size estimation test more reliable
- Wordsmith coverletter and commit messages
- Rename _damon.py to _damon_sysfs.py
DAMON exports most of its functionality via its sysfs interface. Hence
most DAMON functionality tests could be implemented using the interface.
However, because the interfaces require simple but multiple operations
for many controls, writing all such tests from the scratch could be
repetitive and time consuming.
Implement a minimum DAMON sysfs control module, and a couple of DAMON
functionality tests using the control module. The first test is for
ensuring minimum accuracy of data access monitoring, and the second test
is for finding if a previously found and fixed bug is introduced again.
Note that the DAMON sysfs control module is only for avoiding
duplicating code in tests. For convenient and general control of DAMON,
users should use DAMON user-space tools that developed for the purpose,
such as damo[1].
[1] https://github.com/damonitor/damo
Patches Sequence
----------------
This patchset is constructed with five patches. The first three patches
implement a Python-written test implementation-purpose DAMON sysfs
control module. The implementation is incrementally done in the
sequence of the basic data structure (first patch) first, kdamonds start
command (second patch) next, and finally DAMOS tried bytes update
command (third patch).
Then two patches for implementing selftests using the module follows.
The fourth patch implements a basic functionality test of DAMON for
working set estimation accuracy. Finally, the fifth patch implements a
corner case test for a previously found bug.
SeongJae Park (5):
selftests/damon: implement a python module for test-purpose DAMON
sysfs controls
selftests/damon/_damon_sysfs: implement kdamonds start function
selftests/damon/_damon_sysfs: implement updat_schemes_tried_bytes
command
selftests/damon: add a test for update_schemes_tried_regions sysfs
command
selftests/damon: add a test for update_schemes_tried_regions hang bug
tools/testing/selftests/damon/Makefile | 3 +
tools/testing/selftests/damon/_damon_sysfs.py | 322 ++++++++++++++++++
tools/testing/selftests/damon/access_memory.c | 41 +++
...sysfs_update_schemes_tried_regions_hang.py | 33 ++
...te_schemes_tried_regions_wss_estimation.py | 55 +++
5 files changed, 454 insertions(+)
create mode 100644 tools/testing/selftests/damon/_damon_sysfs.py
create mode 100644 tools/testing/selftests/damon/access_memory.c
create mode 100755 tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_hang.py
create mode 100755 tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
base-commit: 091b8c820de390a6235595bdb281edab63b9befe
--
2.34.1
Changes from RFC
(https://lore.kernel.org/damon/20231202000806.46210-1-sj@kernel.org/)
- Make the working set size estimation test more reliable
- Wordsmith coverletter and commit messages
- Rename _damon.py to _damon_sysfs.py
DAMON exports most of its functionality via its sysfs interface. Hence
most DAMON functionality tests could be implemented using the interface.
However, because the interfaces require simple but multiple operations
for many controls, writing all such tests from the scratch could be
repetitive and time consuming.
Implement a minimum DAMON sysfs control module, and a couple of DAMON
functionality tests using the control module. The first test is for
ensuring minimum accuracy of data access monitoring, and the second test
is for finding if a previously found and fixed bug is introduced again.
Note that the DAMON sysfs control module is only for avoiding
duplicating code in tests. For convenient and general control of DAMON,
users should use DAMON user-space tools that developed for the purpose,
such as damo[1].
[1] https://github.com/damonitor/damo
Patches Sequence
----------------
This patchset is constructed with five patches. The first three patches
implement a Python-written test implementation-purpose DAMON sysfs
control module. The implementation is incrementally done in the
sequence of the basic data structure (first patch) first, kdamonds start
command (second patch) next, and finally DAMOS tried bytes update
command (third patch).
Then two patches for implementing selftests using the module follows.
The fourth patch implements a basic functionality test of DAMON for
working set estimation accuracy. Finally, the fifth patch implements a
corner case test for a previously found bug.
SeongJae Park (5):
selftests/damon: implement a python module for test-purpose DAMON
sysfs controls
selftests/damon/_damon_sysfs: implement kdamonds start function
selftests/damon/_damon_sysfs: implement updat_schemes_tried_bytes
command
selftests/damon: add a test for update_schemes_tried_regions sysfs
command
selftests/damon: add a test for update_schemes_tried_regions hang bug
tools/testing/selftests/damon/Makefile | 3 +
tools/testing/selftests/damon/_damon_sysfs.py | 322 ++++++++++++++++++
tools/testing/selftests/damon/access_memory.c | 41 +++
...sysfs_update_schemes_tried_regions_hang.py | 33 ++
...te_schemes_tried_regions_wss_estimation.py | 55 +++
5 files changed, 454 insertions(+)
create mode 100644 tools/testing/selftests/damon/_damon_sysfs.py
create mode 100644 tools/testing/selftests/damon/access_memory.c
create mode 100755 tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_hang.py
create mode 100755 tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
base-commit: 5794dfaf6d1be564b0912d51d8a714baff329495
--
2.34.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Now that the trace_marker can write up to the max size of the sub buffer.
Add a test to see if it actually can happen.
The README is updated to state that the trace_marker writes can be broken
up, and the test checks the README for that statement so that it does not
fail on older kernels that does not support this.
If the README does not have the specified update, the test will still test
if all the string is written (although it would be broken up), as that
should work with older kernels.
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 1 +
.../ftrace/test.d/00basic/trace_marker.tc | 112 ++++++++++++++++++
2 files changed, 113 insertions(+)
create mode 100755 tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2f8d59834c00..cbfcdd882590 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5595,6 +5595,7 @@ static const char readme_msg[] =
" delta: Delta difference against a buffer-wide timestamp\n"
" absolute: Absolute (standalone) timestamp\n"
"\n trace_marker\t\t- Writes into this file writes into the kernel buffer\n"
+ "\n May be broken into multiple events based on sub-buffer size.\n"
"\n trace_marker_raw\t\t- Writes into this file writes binary data into the kernel buffer\n"
" tracing_cpumask\t- Limit which CPUs to trace\n"
" instances\t\t- Make sub-buffers with: mkdir instances/foo\n"
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
new file mode 100755
index 000000000000..bcb2dc6b8a66
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker.tc
@@ -0,0 +1,112 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Change the ringbuffer sub-buffer size
+# requires: trace_marker
+# flags: instance
+
+get_buffer_data_size() {
+ sed -ne 's/^.*data.*size:\([0-9][0-9]*\).*/\1/p' events/header_page
+}
+
+get_buffer_data_offset() {
+ sed -ne 's/^.*data.*offset:\([0-9][0-9]*\).*/\1/p' events/header_page
+}
+
+get_event_header_size() {
+ type_len=`sed -ne 's/^.*type_len.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ time_len=`sed -ne 's/^.*time_delta.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ array_len=`sed -ne 's/^.*array.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
+ total_bits=$((type_len+time_len+array_len))
+ total_bits=$((total_bits+7))
+ echo $((total_bits/8))
+}
+
+get_print_event_buf_offset() {
+ sed -ne 's/^.*buf.*offset:\([0-9][0-9]*\).*/\1/p' events/ftrace/print/format
+}
+
+event_header_size=`get_event_header_size`
+print_header_size=`get_print_event_buf_offset`
+
+# Find the README
+README=""
+if [ -f README ]; then
+ README="README"
+# instance?
+elif [ -f ../../README ]; then
+ README="../../README"
+fi
+
+testone=0
+if [ ! -z "$README" ]; then
+ if grep -q "May be broken into multiple events based on sub-buffer size" $README; then
+ testone=1
+ fi
+fi
+
+data_offset=`get_buffer_data_offset`
+
+marker_meta=$((event_header_size+print_header_size))
+
+make_str() {
+ cnt=$1
+ # subtract two for \n\0 as marker adds these
+ cnt=$((cnt-2))
+ printf -- 'X%.0s' $(seq $cnt)
+}
+
+write_buffer() {
+ size=$1
+
+ str=`make_str $size`
+
+ # clear the buffer
+ echo > trace
+
+ # write the string into the marker
+ echo -n $str > trace_marker
+
+ echo $str
+}
+
+test_buffer() {
+
+ size=`get_buffer_data_size`
+ oneline_size=$((size-marker_meta))
+ echo size = $size
+ echo meta size = $marker_meta
+
+ if [ $testone -eq 1 ]; then
+ echo oneline size = $oneline_size
+
+ str=`write_buffer $oneline_size`
+
+ # Should be in one single event
+ new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: */,"");printf "%s", $0; exit}' trace`
+
+ if [ "$new_str" != "$str" ]; then
+ exit fail;
+ fi
+ fi
+
+ # Now add a little more the meta data overhead will overflow
+
+ str=`write_buffer $size`
+
+ # Make sure the line was broken
+ new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: /,"");printf "%s", $0; exit}' trace`
+
+ if [ "$new_str" = "$str" ]; then
+ exit fail;
+ fi
+
+ # Make sure the entire line can be found
+ new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: */,"");printf "%s", $0; }' trace`
+
+ if [ "$new_str" != "$str" ]; then
+ exit fail;
+ fi
+}
+
+test_buffer
+
--
2.42.0
This reverts commit 9fc96c7c19df ("selftests: error out if kernel header
files are not yet built").
It turns out that requiring the kernel headers to be built as a
prerequisite to building selftests, does not work in many cases. For
example, Peter Zijlstra writes:
"My biggest beef with the whole thing is that I simply do not want to use
'make headers', it doesn't work for me.
I have a ton of output directories and I don't care to build tools into
the output dirs, in fact some of them flat out refuse to work that way
(bpf comes to mind)." [1]
Therefore, stop erroring out on the selftests build. Additional patches
will be required in order to change over to not requiring the kernel
headers.
[1] https://lore.kernel.org/20231208221007.GO28727@noisy.programming.kicks-ass.…
Cc: Anders Roxell <anders.roxell(a)linaro.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
tools/testing/selftests/Makefile | 21 +----------------
tools/testing/selftests/lib.mk | 40 +++-----------------------------
2 files changed, 4 insertions(+), 57 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 3b2061d1c1a5..8247a7c69c36 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -155,12 +155,10 @@ ifneq ($(KBUILD_OUTPUT),)
abs_objtree := $(realpath $(abs_objtree))
BUILD := $(abs_objtree)/kselftest
KHDR_INCLUDES := -isystem ${abs_objtree}/usr/include
- KHDR_DIR := ${abs_objtree}/usr/include
else
BUILD := $(CURDIR)
abs_srctree := $(shell cd $(top_srcdir) && pwd)
KHDR_INCLUDES := -isystem ${abs_srctree}/usr/include
- KHDR_DIR := ${abs_srctree}/usr/include
DEFAULT_INSTALL_HDR_PATH := 1
endif
@@ -174,7 +172,7 @@ export KHDR_INCLUDES
# all isn't the first target in the file.
.DEFAULT_GOAL := all
-all: kernel_header_files
+all:
@ret=1; \
for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
@@ -185,23 +183,6 @@ all: kernel_header_files
ret=$$((ret * $$?)); \
done; exit $$ret;
-kernel_header_files:
- @ls $(KHDR_DIR)/linux/*.h >/dev/null 2>/dev/null; \
- if [ $$? -ne 0 ]; then \
- RED='\033[1;31m'; \
- NOCOLOR='\033[0m'; \
- echo; \
- echo -e "$${RED}error$${NOCOLOR}: missing kernel header files."; \
- echo "Please run this and try again:"; \
- echo; \
- echo " cd $(top_srcdir)"; \
- echo " make headers"; \
- echo; \
- exit 1; \
- fi
-
-.PHONY: kernel_header_files
-
run_tests: all
@for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 118e0964bda9..aa646e0661f3 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -44,26 +44,10 @@ endif
selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST))))
top_srcdir = $(selfdir)/../../..
-ifeq ("$(origin O)", "command line")
- KBUILD_OUTPUT := $(O)
+ifeq ($(KHDR_INCLUDES),)
+KHDR_INCLUDES := -isystem $(top_srcdir)/usr/include
endif
-ifneq ($(KBUILD_OUTPUT),)
- # Make's built-in functions such as $(abspath ...), $(realpath ...) cannot
- # expand a shell special character '~'. We use a somewhat tedious way here.
- abs_objtree := $(shell cd $(top_srcdir) && mkdir -p $(KBUILD_OUTPUT) && cd $(KBUILD_OUTPUT) && pwd)
- $(if $(abs_objtree),, \
- $(error failed to create output directory "$(KBUILD_OUTPUT)"))
- # $(realpath ...) resolves symlinks
- abs_objtree := $(realpath $(abs_objtree))
- KHDR_DIR := ${abs_objtree}/usr/include
-else
- abs_srctree := $(shell cd $(top_srcdir) && pwd)
- KHDR_DIR := ${abs_srctree}/usr/include
-endif
-
-KHDR_INCLUDES := -isystem $(KHDR_DIR)
-
# The following are built by lib.mk common compile rules.
# TEST_CUSTOM_PROGS should be used by tests that require
# custom build rule and prevent common build rule use.
@@ -74,25 +58,7 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
-all: kernel_header_files $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) \
- $(TEST_GEN_FILES)
-
-kernel_header_files:
- @ls $(KHDR_DIR)/linux/*.h >/dev/null 2>/dev/null; \
- if [ $$? -ne 0 ]; then \
- RED='\033[1;31m'; \
- NOCOLOR='\033[0m'; \
- echo; \
- echo -e "$${RED}error$${NOCOLOR}: missing kernel header files."; \
- echo "Please run this and try again:"; \
- echo; \
- echo " cd $(top_srcdir)"; \
- echo " make headers"; \
- echo; \
- exit 1; \
- fi
-
-.PHONY: kernel_header_files
+all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
define RUN_TESTS
BASE_DIR="$(selfdir)"; \
--
2.43.0
Alter the linker section of KUNIT_TABLE to move it out of INIT_DATA and
into DATA_DATA.
Data for KUnit tests does not need to be in the init section.
In order to run tests again after boot the KUnit data cannot be labeled as
init data as the kernel could write over it.
Add a KUNIT_INIT_TABLE in the next patch for KUnit tests that test init
data/functions.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
include/asm-generic/vmlinux.lds.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index bae0fe4d499b..1107905d37fc 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -370,7 +370,8 @@
BRANCH_PROFILE() \
TRACE_PRINTKS() \
BPF_RAW_TP() \
- TRACEPOINT_STR()
+ TRACEPOINT_STR() \
+ KUNIT_TABLE()
/*
* Data section helpers
@@ -699,8 +700,7 @@
THERMAL_TABLE(governor) \
EARLYCON_TABLE() \
LSM_TABLE() \
- EARLY_LSM_TABLE() \
- KUNIT_TABLE()
+ EARLY_LSM_TABLE()
#define INIT_TEXT \
*(.init.text .init.text.*) \
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.43.0.rc2.451.g8631bc7472-goog
When TPIDR2 is not supported the tpidr2 ABI test prints the same message
for each skipped test:
ok 1 skipped, TPIDR2 not supported
which isn't ideal for test automation software since it tracks kselftest
results based on the string used to describe the test. This is also not
standard KTAP output, the expected format is:
ok 1 # SKIP default_value
Updated the program to generate this, using the same set of test names that
we would run if the test actually executed.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/abi/tpidr2.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/arm64/abi/tpidr2.c b/tools/testing/selftests/arm64/abi/tpidr2.c
index 351a098b503a..02ee3a91b780 100644
--- a/tools/testing/selftests/arm64/abi/tpidr2.c
+++ b/tools/testing/selftests/arm64/abi/tpidr2.c
@@ -254,6 +254,12 @@ static int write_clone_read(void)
putnum(++tests_run); \
putstr(" " #name "\n");
+#define skip_test(name) \
+ tests_skipped++; \
+ putstr("ok "); \
+ putnum(++tests_run); \
+ putstr(" # SKIP " #name "\n");
+
int main(int argc, char **argv)
{
int ret, i;
@@ -283,13 +289,11 @@ int main(int argc, char **argv)
} else {
putstr("# SME support not present\n");
- for (i = 0; i < EXPECTED_TESTS; i++) {
- putstr("ok ");
- putnum(i);
- putstr(" skipped, TPIDR2 not supported\n");
- }
-
- tests_skipped += EXPECTED_TESTS;
+ skip_test(default_value);
+ skip_test(write_read);
+ skip_test(write_sleep_read);
+ skip_test(write_fork_read);
+ skip_test(write_clone_read);
}
print_summary();
---
base-commit: 98b1cc82c4affc16f5598d4fa14b1858671b2263
change-id: 20231124-kselftest-arm64-tpidr2-skip-43764f4ff4f4
Best regards,
--
Mark Brown <broonie(a)kernel.org>
virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Extend the steering program feature by introducing a dedicated program
type: BPF_PROG_TYPE_VNET_HASH. This program type is capable to report
the hash value and the queue to use at the same time.
This is a rewrite of a RFC patch series submitted by Yuri Benditovich that
incorporates feedbacks for the series and V1 of this series:
https://lore.kernel.org/lkml/20210112194143.1494-1-yuri.benditovich@daynix.…
QEMU patched to use this new feature is available at:
https://github.com/daynix/qemu/tree/akihikodaki/bpf
The QEMU patches will soon be submitted to the upstream as RFC too.
V1 -> V2:
Changed to introduce a new BPF program type.
Akihiko Odaki (7):
bpf: Introduce BPF_PROG_TYPE_VNET_HASH
bpf: Add vnet_hash members to __sk_buff
skbuff: Introduce SKB_EXT_TUN_VNET_HASH
virtio_net: Add virtio_net_hdr_v1_hash_from_skb()
tun: Support BPF_PROG_TYPE_VNET_HASH
selftests/bpf: Test BPF_PROG_TYPE_VNET_HASH
vhost_net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/bpf/bpf_prog_run.rst | 1 +
Documentation/bpf/libbpf/program_types.rst | 2 +
drivers/net/tun.c | 158 +++++--
drivers/vhost/net.c | 16 +-
include/linux/bpf_types.h | 2 +
include/linux/filter.h | 7 +
include/linux/skbuff.h | 10 +
include/linux/virtio_net.h | 22 +
include/uapi/linux/bpf.h | 5 +
kernel/bpf/verifier.c | 6 +
net/core/filter.c | 86 +++-
net/core/skbuff.c | 3 +
tools/include/uapi/linux/bpf.h | 5 +
tools/lib/bpf/libbpf.c | 2 +
tools/testing/selftests/bpf/config | 1 +
tools/testing/selftests/bpf/config.aarch64 | 1 -
.../selftests/bpf/prog_tests/vnet_hash.c | 385 ++++++++++++++++++
tools/testing/selftests/bpf/progs/vnet_hash.c | 16 +
18 files changed, 681 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vnet_hash.c
create mode 100644 tools/testing/selftests/bpf/progs/vnet_hash.c
--
2.42.0
By default, all the test output will be printed to stdout or output.log if
-s supplied. The kselftest/runner.sh also supports per test log if the
variable per_test_logging is set. So add new option -p to set this
veriable. Note the -p option is conflict with -s option.
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
tools/testing/selftests/run_kselftest.sh | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 92743980e553..965220a314ce 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -20,7 +20,8 @@ usage()
{
cat <<EOF
Usage: $0 [OPTIONS]
- -s | --summary Print summary with detailed log in output.log
+ -s | --summary Print summary with detailed log in output.log (conflict with -p)
+ -p | --per_test_log Print test log in /tmp with each test name (conflict with -s)
-t | --test COLLECTION:TEST Run TEST from COLLECTION
-c | --collection COLLECTION Run all tests from COLLECTION
-l | --list List the available collection:test entries
@@ -41,6 +42,9 @@ while true; do
logfile="$BASE_DIR"/output.log
cat /dev/null > $logfile
shift ;;
+ -p | --per_test_log)
+ per_test_logging=1
+ shift ;;
-t | --test)
TESTS="$TESTS $2"
shift 2 ;;
--
2.41.0
Hi,
Changes since v2 [1]:
* Added a new patch (sent separately earlier) at the end, to error out
if "make headers" has not yet been run.
* Reworked and simplified the uffd movement patch. Now it only moves
some uffd*() routines, not all, and doesn't have to touch the Makefile
at all. This lighter touch also allowed me to drop the "move psize(),
pshift() into vm_utils.c" entirely. I expect Peter Xu will be a little
happier with this new approach.
* Fixed the commit description for the MADV_COLLAPSE patch.
* Added more Reviewed-by tags from David Hildenbrand and Peter Xu.
[1] https://lore.kernel.org/all/20230603021558.95299-1-jhubbard@nvidia.com/
John Hubbard (11):
selftests/mm: fix uffd-stress unused function warning
selftests/mm: fix unused variable warnings in hugetlb-madvise.c,
migration.c
selftests/mm: fix "warning: expression which evaluates to zero..." in
mlock2-tests.c
selftests/mm: fix invocation of tests that are run via shell scripts
selftests/mm: .gitignore: add mkdirty, va_high_addr_switch
selftests/mm: fix two -Wformat-security warnings in uffd builds
selftests/mm: fix a "possibly uninitialized" warning in pkey-x86.h
selftests/mm: fix build failures due to missing MADV_COLLAPSE
selftests/mm: move certain uffd*() routines from vm_util.c to
uffd-common.c
Documentation: kselftest: "make headers" is a prerequisite
selftests: error out if kernel header files are not yet built
Documentation/dev-tools/kselftest.rst | 1 +
tools/testing/selftests/lib.mk | 36 +++++++++++-
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/cow.c | 7 ---
tools/testing/selftests/mm/hugetlb-madvise.c | 8 ++-
tools/testing/selftests/mm/khugepaged.c | 10 ----
tools/testing/selftests/mm/migration.c | 5 +-
tools/testing/selftests/mm/mlock2-tests.c | 1 -
tools/testing/selftests/mm/pkey-x86.h | 2 +-
tools/testing/selftests/mm/run_vmtests.sh | 6 +-
tools/testing/selftests/mm/uffd-common.c | 59 ++++++++++++++++++++
tools/testing/selftests/mm/uffd-common.h | 5 ++
tools/testing/selftests/mm/uffd-stress.c | 10 ----
tools/testing/selftests/mm/uffd-unit-tests.c | 16 ++----
tools/testing/selftests/mm/vm_util.c | 59 --------------------
tools/testing/selftests/mm/vm_util.h | 14 +++--
16 files changed, 130 insertions(+), 111 deletions(-)
base-commit: f8dba31b0a826e691949cd4fdfa5c30defaac8c5
--
2.40.1
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). Unlike normal stacks only the shadow stack
size is specified, similar issues to those that lead to the creation of
map_shadow_stack() apply.
Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET avaible to me, I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks, this is
still under discussion.
A new architecture feature Kconfig option for shadow stacks is added as
here, this was suggested as part of the review comments for the arm64
GCS series and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (5):
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
kselftest/clone3: Test shadow stack support
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 56 ++++--
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 1 +
include/uapi/linux/sched.h | 4 +
kernel/fork.c | 53 ++++--
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 200 +++++++++++++++++-----
tools/testing/selftests/clone3/clone3_selftests.h | 7 +
12 files changed, 268 insertions(+), 77 deletions(-)
---
base-commit: 98b1cc82c4affc16f5598d4fa14b1858671b2263
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This patch set enables the Intel flexible return and event delivery
(FRED) architecture with KVM VMX to allow guests to utilize FRED.
The FRED architecture defines simple new transitions that change
privilege level (ring transitions). The FRED architecture was
designed with the following goals:
1) Improve overall performance and response time by replacing event
delivery through the interrupt descriptor table (IDT event
delivery) and event return by the IRET instruction with lower
latency transitions.
2) Improve software robustness by ensuring that event delivery
establishes the full supervisor context and that event return
establishes the full user context.
The new transitions defined by the FRED architecture are FRED event
delivery and, for returning from events, two FRED return instructions.
FRED event delivery can effect a transition from ring 3 to ring 0, but
it is used also to deliver events incident to ring 0. One FRED
instruction (ERETU) effects a return from ring 0 to ring 3, while the
other (ERETS) returns while remaining in ring 0. Collectively, FRED
event delivery and the FRED return instructions are FRED transitions.
Intel VMX architecture is extended to run FRED guests, and the changes
are majorly:
1) New VMCS fields for FRED context management, which includes two new
event data VMCS fields, eight new guest FRED context VMCS fields and
eight new host FRED context VMCS fields.
2) VMX nested-Exception support for proper virtualization of stack
levels introduced with FRED architecture.
Search for the latest FRED spec in most search engines with this search pattern:
site:intel.com FRED (flexible return and event delivery) specification
We want to send out the FRED VMX patch set for review while the FRED
native patch set v12 is being reviewed @
https://lkml.kernel.org/kvm/20231003062458.23552-1-xin3.li@intel.com/.
For easier review, I have set up a base tree with the latest FRED native
patch set on top of tip tree in the 'fred_v12' branch of repo
https://github.com/xinli-intel/linux-fred-public.git.
Patch 1-2 are cleanups to VMX basic and misc MSRs, which were sent
out earlier as a preparation for FRED changes:
https://lore.kernel.org/kvm/20231030233940.438233-1-xin@zytor.com/.
Patch 3-14 add FRED support to VMX.
Patch 15-18 add FRED support to nested VMX.
Patch 19 exposes FRED to KVM guests to complete the enabling.
Patch 20-23 adds FRED selftests.
Shan Kang (1):
KVM: selftests: Add fred exception tests
Xin Li (22):
KVM: VMX: Cleanup VMX basic information defines and usages
KVM: VMX: Cleanup VMX misc information defines and usages
KVM: VMX: Add support for the secondary VM exit controls
KVM: x86: Mark CR4.FRED as not reserved
KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config
KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID
KVM: VMX: Disable intercepting FRED MSRs
KVM: VMX: Initialize VMCS FRED fields
KVM: VMX: Switch FRED RSP0 between host and guest
KVM: VMX: Add support for FRED context save/restore
KVM: x86: Add kvm_is_fred_enabled()
KVM: VMX: Handle FRED event data
KVM: VMX: Handle VMX nested exception for FRED
KVM: VMX: Dump FRED context in dump_vmcs()
KVM: nVMX: Add support for the secondary VM exit controls
KVM: nVMX: Add FRED VMCS fields
KVM: nVMX: Add support for VMX FRED controls
KVM: nVMX: Add VMCS FRED states checking
KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests
KVM: selftests: Add FRED VMCS fields to evmcs
KVM: selftests: Run debug_regs test with FRED enabled
KVM: selftests: Add a new VM guest mode to run user level code
Documentation/virt/kvm/x86/nested-vmx.rst | 19 +
arch/x86/include/asm/hyperv-tlfs.h | 19 +
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/include/asm/msr-index.h | 15 +-
arch/x86/include/asm/vmx.h | 57 ++-
arch/x86/kvm/cpuid.c | 4 +-
arch/x86/kvm/kvm_cache_regs.h | 10 +
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/vmx/capabilities.h | 20 +-
arch/x86/kvm/vmx/hyperv.c | 61 ++-
arch/x86/kvm/vmx/nested.c | 315 ++++++++++++--
arch/x86/kvm/vmx/nested.h | 2 +-
arch/x86/kvm/vmx/vmcs.h | 1 +
arch/x86/kvm/vmx/vmcs12.c | 19 +
arch/x86/kvm/vmx/vmcs12.h | 38 ++
arch/x86/kvm/vmx/vmcs_shadow_fields.h | 6 +-
arch/x86/kvm/vmx/vmx.c | 404 ++++++++++++++++--
arch/x86/kvm/vmx/vmx.h | 14 +-
arch/x86/kvm/x86.c | 55 ++-
arch/x86/kvm/x86.h | 5 +-
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 1 +
.../selftests/kvm/include/x86_64/evmcs.h | 146 +++++++
.../selftests/kvm/include/x86_64/processor.h | 33 ++
.../selftests/kvm/include/x86_64/vmx.h | 20 +
tools/testing/selftests/kvm/lib/kvm_util.c | 5 +-
.../selftests/kvm/lib/x86_64/processor.c | 15 +-
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 4 +-
.../testing/selftests/kvm/x86_64/debug_regs.c | 50 ++-
.../testing/selftests/kvm/x86_64/fred_test.c | 262 ++++++++++++
30 files changed, 1464 insertions(+), 150 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/fred_test.c
base-commit: d49b86c24e836941c85c4906e9519fca9426a6e0
--
2.42.0
Changes in RFC v3:
------------------
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Changes in RFC v2:
------------------
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
----------------------
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this RFC is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This RFC is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Jakub Kicinski (2):
net: page_pool: factor out releasing DMA from releasing the page
net: page_pool: create hooks for custom page providers
Mina Almasry (10):
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
memory-provider: dmabuf devmem memory provider
page-pool: device memory support
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 28 ++
include/linux/netdevice.h | 93 ++++
include/linux/skbuff.h | 56 ++-
include/linux/socket.h | 1 +
include/net/netdev_rx_queue.h | 1 +
include/net/page_pool/helpers.h | 151 ++++++-
include/net/page_pool/types.h | 55 +++
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 10 +
include/uapi/linux/uio.h | 10 +
net/core/datagram.c | 6 +
net/core/dev.c | 240 +++++++++++
net/core/gro.c | 7 +-
net/core/netdev-genl-gen.c | 14 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 118 +++++
net/core/page_pool.c | 209 +++++++--
net/core/skbuff.c | 80 +++-
net/core/sock.c | 36 ++
net/ipv4/tcp.c | 205 ++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 7 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 10 +
tools/net/ynl/generated/netdev-user.c | 42 ++
tools/net/ynl/generated/netdev-user.h | 47 ++
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 546 ++++++++++++++++++++++++
32 files changed, 1950 insertions(+), 64 deletions(-)
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.42.0.869.gea05f2083d-goog
We missed one of the casts of kfree() to kunit_action_t in kunit-test,
which was only enabled when debugfs was in use. This could potentially
break CFI.
Use the existing wrapper function instead.
Signed-off-by: David Gow <davidgow(a)google.com>
---
lib/kunit/kunit-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c
index 3e9c5192d095..ee6927c60979 100644
--- a/lib/kunit/kunit-test.c
+++ b/lib/kunit/kunit-test.c
@@ -559,7 +559,7 @@ static void kunit_log_test(struct kunit *test)
KUNIT_EXPECT_TRUE(test, test->log->append_newlines);
full_log = string_stream_get_string(test->log);
- kunit_add_action(test, (kunit_action_t *)kfree, full_log);
+ kunit_add_action(test, kfree_wrapper, full_log);
KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
strstr(full_log, "put this in log."));
KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
--
2.43.0.rc2.451.g8631bc7472-goog
Add parsing of attributes as diagnostic data. Fixes issue with test plan
being parsed incorrectly as diagnostic data when located after
suite-level attributes.
Note that if there does not exist a test plan line, the diagnostic lines
between the suite header and the first result will be saved in the suite
log rather than the first test case log.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
tools/testing/kunit/kunit_parser.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 79d8832c862a..ce34be15c929 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -450,7 +450,7 @@ def parse_diagnostic(lines: LineStream) -> List[str]:
Log of diagnostic lines
"""
log = [] # type: List[str]
- non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START]
+ non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START, TEST_PLAN]
while lines and not any(re.match(lines.peek())
for re in non_diagnostic_lines):
log.append(lines.pop())
@@ -726,6 +726,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# test plan
test.name = "main"
ktap_line = parse_ktap_header(lines, test)
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
parent_test = True
else:
@@ -737,6 +738,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
if parent_test:
# If KTAP version line and/or subtest header is found, attempt
# to parse test plan and print test header
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
print_test_header(test)
expected_count = test.expected_count
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.43.0.472.g3155946c3a-goog
When walking directory trees, instead of looking for specific files and
running dirname to get the parent folder, traverse all folders and
ignore the ones not containing the desired files. This avoids the need
to call dirname inside the loop, which gives a big performance boost,
approximately halving run time: Running locally on a
mt8192-asurada-spherion, which reports 160 test cases, has gone from
5.5s to 2.9s, while running remotely with an nfsroot has gone from
13.5s to 5.5s.
This change has a side-effect, which is that the root DT node now
also shows in the output, even though it isn't expected to bind to a
driver. However there shouldn't be a matching driver for the board
compatible, so the end result will be just an extra skipped test:
ok 1 / # SKIP
Reported-by: Mark Brown <broonie(a)kernel.org>
Closes: https://lore.kernel.org/all/310391e8-fdf2-4c2f-a680-7744eb685177@sirena.org…
Fixes: 14571ab1ad21 ("kselftest: Add new test for detecting unprobed Devicetree devices")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
tools/testing/selftests/dt/test_unprobed_devices.sh | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/dt/test_unprobed_devices.sh b/tools/testing/selftests/dt/test_unprobed_devices.sh
index b07af2a4c4de..7fae90293a9d 100755
--- a/tools/testing/selftests/dt/test_unprobed_devices.sh
+++ b/tools/testing/selftests/dt/test_unprobed_devices.sh
@@ -33,8 +33,8 @@ if [[ ! -d "${PDT}" ]]; then
fi
nodes_compatible=$(
- for node_compat in $(find ${PDT} -name compatible); do
- node=$(dirname "${node_compat}")
+ for node in $(find ${PDT} -type d); do
+ [ ! -f "${node}"/compatible ] && continue
# Check if node is available
if [[ -e "${node}"/status ]]; then
status=$(tr -d '\000' < "${node}"/status)
@@ -46,10 +46,11 @@ nodes_compatible=$(
nodes_dev_bound=$(
IFS=$'\n'
- for uevent in $(find /sys/devices -name uevent); do
- if [[ -d "$(dirname "${uevent}")"/driver ]]; then
- grep '^OF_FULLNAME=' "${uevent}" | sed -e 's|OF_FULLNAME=||'
- fi
+ for dev_dir in $(find /sys/devices -type d); do
+ [ ! -f "${dev_dir}"/uevent ] && continue
+ [ ! -d "${dev_dir}"/driver ] && continue
+
+ grep '^OF_FULLNAME=' "${dev_dir}"/uevent | sed -e 's|OF_FULLNAME=||'
done
)
--
2.43.0
When running the set_memory_region_test on arm64 platform, it causes the
below assert:
==== Test Assertion Failure ====
set_memory_region_test.c:355: r && errno == EINVAL
pid=40695 tid=40695 errno=0 - Success
1 0x0000000000401baf: test_invalid_memory_region_flags at set_memory_region_test.c:355
2 (inlined by) main at set_memory_region_test.c:541
3 0x0000ffff951c879b: ?? ??:0
4 0x0000ffff951c886b: ?? ??:0
5 0x0000000000401caf: _start at ??:?
KVM_SET_USER_MEMORY_REGION should have failed on v2 only flag 0x2
This is because the arm64 platform also support the KVM_MEM_READONLY flag, but
the current implementation add it into the supportd_flags only on x86_64
platform, so this causes assert on other platform which also support the
KVM_MEM_READONLY flag.
Fix it by using the __KVM_HAVE_READONLY_MEM macro to detect if the
current platform support the KVM_MEM_READONLY, thus fix this problem on
all other platform which support KVM_MEM_READONLY.
Fixes: 5d74316466f4 ("KVM: selftests: Add a memory region subtest to validate invalid flags")
Signed-off-by: Shaoqin Huang <shahuang(a)redhat.com>
---
This patch is based on the latest kvm-next[1] branch.
[1] https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=next
---
tools/testing/selftests/kvm/set_memory_region_test.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index 6637a0845acf..1ce710fd7a5a 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -333,9 +333,11 @@ static void test_invalid_memory_region_flags(void)
struct kvm_vm *vm;
int r, i;
-#ifdef __x86_64__
+#ifdef __KVM_HAVE_READONLY_MEM
supported_flags |= KVM_MEM_READONLY;
+#endif
+#ifdef __x86_64__
if (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))
vm = vm_create_barebones_protected_vm();
else
--
2.40.1
Regressions that cause a device to no longer be probed by a driver can
have a big impact on the platform's functionality, and despite being
relatively common there isn't currently any generic test to detect them.
As an example, bootrr [1] does test for device probe, but it requires
defining the expected probed devices for each platform.
Given that the Devicetree already provides a static description of
devices on the system, it is a good basis for building such a test on
top.
This series introduces a test to catch regressions that prevent devices
from probing.
Patches 1 and 2 extend the existing dt-extract-compatibles to be able to
output only the compatibles that can be expected to match a Devicetree
node to a driver. Patch 2 adds a kselftest that walks over the
Devicetree nodes on the current platform and compares the compatibles to
the ones on the list, and on an ignore list, to point out devices that
failed to be probed.
A compatible list is needed because not all compatibles that can show up
in a Devicetree node can be used to match to a driver, for example the
code for that compatible might use "OF_DECLARE" type macros and avoid
the driver framework, or the node might be controlled by a driver that
was bound to a different node.
An ignore list is needed for the few cases where it's common for a
driver to match a device but not probe, like for the "simple-mfd"
compatible, where the driver only probes if that compatible is the
node's first compatible.
The reason for parsing the kernel source instead of relying on
information exposed by the kernel at runtime (say, looking at modaliases
or introducing some other mechanism), is to be able to catch issues
where a config was renamed or a driver moved across configs, and the
.config used by the kernel not updated accordingly. We need to parse the
source to find all compatibles present in the kernel independent of the
current config being run.
[1] https://github.com/kernelci/bootrr
Changes in v3:
- Added DT selftest path to MAINTAINERS
- Enabled device probe test for nodes with 'status = "ok"'
- Added pass/fail/skip totals to end of test output
Changes in v2:
- Extended dt-extract-compatibles script to be able to extract driver
matching compatibles, instead of adding a new one in Coccinelle
- Made kselftest output in the KTAP format
Nícolas F. R. A. Prado (3):
dt: dt-extract-compatibles: Handle cfile arguments in generator
function
dt: dt-extract-compatibles: Add flag for driver matching compatibles
kselftest: Add new test for detecting unprobed Devicetree devices
MAINTAINERS | 1 +
scripts/dtc/dt-extract-compatibles | 74 +++++++++++++----
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/dt/.gitignore | 1 +
tools/testing/selftests/dt/Makefile | 21 +++++
.../selftests/dt/compatible_ignore_list | 1 +
tools/testing/selftests/dt/ktap_helpers.sh | 70 ++++++++++++++++
.../selftests/dt/test_unprobed_devices.sh | 83 +++++++++++++++++++
8 files changed, 236 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/dt/.gitignore
create mode 100644 tools/testing/selftests/dt/Makefile
create mode 100644 tools/testing/selftests/dt/compatible_ignore_list
create mode 100644 tools/testing/selftests/dt/ktap_helpers.sh
create mode 100755 tools/testing/selftests/dt/test_unprobed_devices.sh
--
2.42.0
Here is the 2nd part of converting net selftests to run in unique namespace.
This part converts all bridge, vxlan, vrf tests.
Here is the part 1 link:
https://lore.kernel.org/netdev/20231202020110.362433-1-liuhangbin@gmail.com
Hangbin Liu (9):
selftests/net: convert test_bridge_backup_port.sh to run it in unique
namespace
selftests/net: convert test_bridge_neigh_suppress.sh to run it in
unique namespace
selftests/net: convert test_vxlan_mdb.sh to run it in unique namespace
selftests/net: convert test_vxlan_nolocalbypass.sh to run it in unique
namespace
selftests/net: convert test_vxlan_under_vrf.sh to run it in unique
namespace
selftests/net: convert test_vxlan_vnifiltering.sh to run it in unique
namespace
selftests/net: convert vrf_route_leaking.sh to run it in unique
namespace
selftests/net: convert vrf_strict_mode_test.sh to run it in unique
namespace
selftests/net: convert vrf-xfrm-tests.sh to run it in unique namespace
.../selftests/net/test_bridge_backup_port.sh | 371 +++++++++---------
.../net/test_bridge_neigh_suppress.sh | 331 ++++++++--------
tools/testing/selftests/net/test_vxlan_mdb.sh | 202 +++++-----
.../selftests/net/test_vxlan_nolocalbypass.sh | 48 ++-
.../selftests/net/test_vxlan_under_vrf.sh | 70 ++--
.../selftests/net/test_vxlan_vnifiltering.sh | 154 +++++---
tools/testing/selftests/net/vrf-xfrm-tests.sh | 77 ++--
.../selftests/net/vrf_route_leaking.sh | 201 +++++-----
.../selftests/net/vrf_strict_mode_test.sh | 47 ++-
9 files changed, 751 insertions(+), 750 deletions(-)
--
2.43.0
KUnit tests often need to provide a struct device, and thus far have
mostly been using root_device_register() or platform devices to create
a 'fake device' for use with, e.g., code which uses device-managed
resources. This has several disadvantages, including not being designed
for test use, scattering files in sysfs, and requiring manual teardown
on test exit, which may not always be possible in case of failure.
Instead, introduce a set of helper functions which allow devices
(internally a struct kunit_device) to be created and managed by KUnit --
i.e., they will be automatically unregistered on test exit. These
helpers can either use a user-provided struct device_driver, or have one
automatically created and managed by KUnit. In both cases, the device
lives on a new kunit_bus.
This is a follow-up to a previous proposal here:
https://lore.kernel.org/linux-kselftest/20230325043104.3761770-1-davidgow@g…
(The kunit_defer() function in the first patch there has since been
merged as the 'deferred actions' feature.)
My intention is to take this whole series in via the kselftest/kunit
branch, but I'm equally okay with splitting up the later patches which
use this to go via the various subsystem trees in case there are merge
conflicts.
Cheers,
-- David
Signed-off-by: David Gow <davidgow(a)google.com>
---
David Gow (4):
kunit: Add APIs for managing devices
fortify: test: Use kunit_device
overflow: Replace fake root_device with kunit_device
ASoC: topology: Replace fake root_device with kunit_device in tests
Documentation/dev-tools/kunit/usage.rst | 49 +++++++++
include/kunit/device.h | 76 ++++++++++++++
lib/fortify_kunit.c | 5 +-
lib/kunit/Makefile | 3 +-
lib/kunit/device.c | 176 ++++++++++++++++++++++++++++++++
lib/kunit/kunit-test.c | 68 +++++++++++-
lib/kunit/test.c | 3 +
lib/overflow_kunit.c | 5 +-
sound/soc/soc-topology-test.c | 11 +-
9 files changed, 382 insertions(+), 14 deletions(-)
---
base-commit: c8613be119892ccceffbc550b9b9d7d68b995c9e
change-id: 20230718-kunit_bus-ab19c4ef48dc
Best regards,
--
David Gow <davidgow(a)google.com>
Add parsing of attributes as diagnostic data. Fixes issue with test plan
being parsed incorrectly as diagnostic data when located after
suite-level attributes.
Note that if there does not exist a test plan line, the diagnostic lines
between the suite header and the first result will be saved in the suite
log rather than the first test case log.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
tools/testing/kunit/kunit_parser.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 79d8832c862a..ce34be15c929 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -450,7 +450,7 @@ def parse_diagnostic(lines: LineStream) -> List[str]:
Log of diagnostic lines
"""
log = [] # type: List[str]
- non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START]
+ non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START, TEST_PLAN]
while lines and not any(re.match(lines.peek())
for re in non_diagnostic_lines):
log.append(lines.pop())
@@ -726,6 +726,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# test plan
test.name = "main"
ktap_line = parse_ktap_header(lines, test)
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
parent_test = True
else:
@@ -737,6 +738,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
if parent_test:
# If KTAP version line and/or subtest header is found, attempt
# to parse test plan and print test header
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
print_test_header(test)
expected_count = test.expected_count
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.43.0.472.g3155946c3a-goog
Changelog:
v8:
* Fixed a couple of build errors in the case of !CONFIG_MEMCG
* Simplified the online memcg selection scheme for the zswap global
limit reclaim (suggested by Michal Hocko and Johannes Weiner)
(patch 2 and patch 3)
* Added a new kconfig to allows users to enable zswap shrinker by
default. (suggested by Johannes Weiner) (patch 6)
v7:
* Added the mem_cgroup_iter_online() function to the API for the new
behavior (suggested by Andrew Morton) (patch 2)
* Fixed a missing list_lru_del -> list_lru_del_obj (patch 1)
v6:
* Rebase on top of latest mm-unstable.
* Fix/improve the in-code documentation of the new list_lru
manipulation functions (patch 1)
v5:
* Replace reference getting with an rcu_read_lock() section for
zswap lru modifications (suggested by Yosry)
* Add a new prep patch that allows mem_cgroup_iter() to return
online cgroup.
* Add a callback that updates pool->next_shrink when the cgroup is
offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
* Rename list_lru_add to list_lru_add_obj and __list_lru_add to
list_lru_add (patch 1) (suggested by Johannes Weiner and
Yosry Ahmed)
* Some cleanups on the memcg aware LRU patch (patch 2)
(suggested by Yosry Ahmed)
* Use event interface for the new per-cgroup writeback counters.
(patch 3) (suggested by Yosry Ahmed)
* Abstract zswap's lruvec states and handling into
zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
* Add a patch to export per-cgroup zswap writeback counters
* Add a patch to update zswap's kselftest
* Separate the new list_lru functions into its own prep patch
* Do not start from the top of the hierarchy when encounter a memcg
that is not online for the global limit zswap writeback (patch 2)
(suggested by Yosry Ahmed)
* Do not remove the swap entry from list_lru in
__read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
* Removed a redundant zswap pool getting (patch 2)
(reported by Ryan Roberts)
* Use atomic for the nr_zswap_protected (instead of lruvec's lock)
(patch 5) (suggested by Yosry Ahmed)
* Remove the per-cgroup zswap shrinker knob (patch 5)
(suggested by Yosry Ahmed)
v2:
* Fix loongarch compiler errors
* Use pool stats instead of memcg stats when !CONFIG_MEMCG_KEM
There are currently several issues with zswap writeback:
1. There is only a single global LRU for zswap, making it impossible to
perform worload-specific shrinking - an memcg under memory pressure
cannot determine which pages in the pool it owns, and often ends up
writing pages from other memcgs. This issue has been previously
observed in practice and mitigated by simply disabling
memcg-initiated shrinking:
https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
But this solution leaves a lot to be desired, as we still do not
have an avenue for an memcg to free up its own memory locked up in
the zswap pool.
2. We only shrink the zswap pool when the user-defined limit is hit.
This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed
ahead of time, as this depends on the workload (specifically, on
factors such as memory access patterns and compressibility of the
memory pages).
This patch series solves these issues by separating the global zswap
LRU into per-memcg and per-NUMA LRUs, and performs workload-specific
(i.e memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the
user, and can be opted in or out on a per-memcg basis.
As a proof of concept, we ran the following synthetic benchmark:
build the linux kernel in a memory-limited cgroup, and allocate some
cold data in tmpfs to see if the shrinker could write them out and
improved the overall performance. Depending on the amount of cold data
generated, we observe from 14% to 35% reduction in kernel CPU time used
in the kernel builds.
Domenico Cerasuolo (3):
zswap: make shrinking memcg-aware
mm: memcg: add per-memcg zswap writeback stat
selftests: cgroup: update per-memcg zswap writeback selftest
Nhat Pham (3):
list_lru: allows explicit memcg and NUMA node selection
memcontrol: implement mem_cgroup_tryget_online()
zswap: shrinks zswap pool based on memory pressure
Documentation/admin-guide/mm/zswap.rst | 10 +
drivers/android/binder_alloc.c | 7 +-
fs/dcache.c | 8 +-
fs/gfs2/quota.c | 6 +-
fs/inode.c | 4 +-
fs/nfs/nfs42xattr.c | 8 +-
fs/nfsd/filecache.c | 4 +-
fs/xfs/xfs_buf.c | 6 +-
fs/xfs/xfs_dquot.c | 2 +-
fs/xfs/xfs_qm.c | 2 +-
include/linux/list_lru.h | 54 ++-
include/linux/memcontrol.h | 15 +
include/linux/mmzone.h | 2 +
include/linux/vm_event_item.h | 1 +
include/linux/zswap.h | 27 +-
mm/Kconfig | 14 +
mm/list_lru.c | 48 ++-
mm/memcontrol.c | 3 +
mm/mmzone.c | 1 +
mm/swap.h | 3 +-
mm/swap_state.c | 26 +-
mm/vmstat.c | 1 +
mm/workingset.c | 4 +-
mm/zswap.c | 456 +++++++++++++++++---
tools/testing/selftests/cgroup/test_zswap.c | 74 ++--
25 files changed, 661 insertions(+), 125 deletions(-)
base-commit: 5cdba94229e58a39ca389ad99763af29e6b0c5a5
--
2.34.1
Hi all,
Here's v2 series to improve resctrl selftests with generalized test
framework and rewritten CAT test. In contrast to v1, this version does
not include L2 CAT test because it needs further work. In general, I
feel that v2 is in much better shape than v1 because I ended up
addressing a few small things beyond what came up during v1 review.
The series contains following improvements:
- Excludes shareable bits from CAT test allocation to avoid interference
- Replaces file "sink" with a volatile variable
- Alters read pattern to defeat HW prefetcher optimizations
- Rewrites CAT test to make the CAT test reliable and truly measure
if CAT is working or not
- Introduces generalized test framework making easier to add new tests
- Lots of other cleanups & refactoring
This series have been tested across a large number of systems from
different generations.
v2:
- Postpone adding L2 CAT test as more investigations are necessary
- Add patch to remove ctrlc_handler() from wrong place
- Improvements to changelogs
- Function comments improvements & comment cleanups
- Move some parts of the changes into more logical patch
- If checks: buf == NULL -> !buf
- Variable naming:
- p -> buf
- cbm_mask_path -> cbm_path
- Function naming:
- get_cbm_mask() -> get_full_cbm()
- cache_size() -> cache_portion_size()
- Use PATH_MAX
- Improved cache_portion_size() parameter names
- int count -> unsigned int
- Pass filename to measurement taking functions instead of
resctrl_val_param
- !lines ? : reversal
- Removed bogus static from function local variable
- Open perf fd only once, reset & enable in the innermost test loop
- Add perf fd ioctl() error handling
- Add patch to change compiler optimization prevention "sink" from file
to volatile variable
- Remove cpu_no and resource (the latter was added in v1) members from
resctrl_val_param (pass uparams and test where those are needed)
- Removed ARRAY_SIZE() macro
- Add patch to rename "resource_id" to "domain_id"
Ilpo Järvinen (26):
selftests/resctrl: Don't use ctrlc_handler() outside signal handling
selftests/resctrl: Split fill_buf to allow tests finer-grained control
selftests/resctrl: Refactor fill_buf functions
selftests/resctrl: Refactor get_cbm_mask() and rename to
get_full_cbm()
selftests/resctrl: Mark get_cache_size() cache_type const
selftests/resctrl: Create cache_portion_size() helper
selftests/resctrl: Exclude shareable bits from schemata in CAT test
selftests/resctrl: Split measure_cache_vals()
selftests/resctrl: Split show_cache_info() to test specific and
generic parts
selftests/resctrl: Remove unnecessary __u64 -> unsigned long
conversion
selftests/resctrl: Remove nested calls in perf event handling
selftests/resctrl: Consolidate naming of perf event related things
selftests/resctrl: Improve perf init
selftests/resctrl: Convert perf related globals to locals
selftests/resctrl: Move cat_val() to cat_test.c and rename to
cat_test()
selftests/resctrl: Open perf fd before start & add error handling
selftests/resctrl: Replace file write with volatile variable
selftests/resctrl: Read in less obvious order to defeat prefetch
optimizations
selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
selftests/resctrl: Create struct for input parameters
selftests/resctrl: Introduce generalized test framework
selftests/resctrl: Pass write_schemata() resource instead of test name
selftests/resctrl: Add helper to convert L2/3 to integer
selftests/resctrl: Rename resource ID to domain ID
selftests/resctrl: Get domain id from cache id
selftests/resctrl: Add test groups and name L3 CAT test L3_CAT
tools/testing/selftests/resctrl/cache.c | 274 +++++----------
tools/testing/selftests/resctrl/cat_test.c | 328 +++++++++++-------
tools/testing/selftests/resctrl/cmt_test.c | 76 +++-
tools/testing/selftests/resctrl/fill_buf.c | 132 ++++---
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 28 +-
tools/testing/selftests/resctrl/resctrl.h | 117 +++++--
.../testing/selftests/resctrl/resctrl_tests.c | 205 +++++------
tools/testing/selftests/resctrl/resctrl_val.c | 56 +--
tools/testing/selftests/resctrl/resctrlfs.c | 246 ++++++++-----
10 files changed, 821 insertions(+), 667 deletions(-)
--
2.30.2
When adapting the test to the kselftest framework, a few printf() calls
indicating test progress were not updated.
Fix this by replacing these printf() calls by ksft_print_msg() calls.
Fixes: ce7d101750ff8450 ("selftests: timers: clocksource-switch: adapt to kselftest framework")
Signed-off-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
---
When just running the test, the output looks like:
# Validating clocksource arch_sys_counter
TAP version 13
1..12
ok 1 CLOCK_REALTIME
...
# Validating clocksource ffca0000.timer
TAP version 13
1..12
ok 1 CLOCK_REALTIME
...
When redirecting the test output to a file, the progress prints are not
interspersed with the test output, but collated at the end:
TAP version 13
1..12
ok 1 CLOCK_REALTIME
...
TAP version 13
1..12
ok 1 CLOCK_REALTIME
...
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:6 error:0
# Validating clocksource arch_sys_counter
# Validating clocksource ffca0000.timer
...
This makes it hard to match the test results with the timer under test.
Is there a way to fix this? The test does use fork().
Thanks!
---
tools/testing/selftests/timers/clocksource-switch.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/timers/clocksource-switch.c b/tools/testing/selftests/timers/clocksource-switch.c
index c5264594064c8516..83faa4e354e389c2 100644
--- a/tools/testing/selftests/timers/clocksource-switch.c
+++ b/tools/testing/selftests/timers/clocksource-switch.c
@@ -156,8 +156,8 @@ int main(int argc, char **argv)
/* Check everything is sane before we start switching asynchronously */
if (do_sanity_check) {
for (i = 0; i < count; i++) {
- printf("Validating clocksource %s\n",
- clocksource_list[i]);
+ ksft_print_msg("Validating clocksource %s\n",
+ clocksource_list[i]);
if (change_clocksource(clocksource_list[i])) {
status = -1;
goto out;
@@ -169,7 +169,7 @@ int main(int argc, char **argv)
}
}
- printf("Running Asynchronous Switching Tests...\n");
+ ksft_print_msg("Running Asynchronous Switching Tests...\n");
pid = fork();
if (!pid)
return run_tests(runtime);
--
2.34.1
This series enables support for the data processing extensions in the
newly released 2023 architecture, this is mainly support for 8 bit
floating point formats. Most of the extensions only introduce new
instructions and therefore only require hwcaps but there is a new EL0
visible control register FPMR used to control the 8 bit floating point
formats, we need to manage traps for this and context switch it.
The sharing of floating point save code between the host and guest
kernels slightly complicates the introduction of KVM support, we first
introduce host support with some placeholders for KVM then replace those
with the actual KVM support.
I've not added test coverage for ptrace, I've got a not quite finished
test program which exercises all the FP ptrace interfaces and their
interactions together, my plan is to cover it there rather than add
another tiny test program that duplicates the boilerplace for tracing a
target and doesn't actually run the traced program.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Rebase onto v6.7-rc3.
- Hook up traps for FPMR in emulate-nested.c.
- Link to v2: https://lore.kernel.org/r/20231114-arm64-2023-dpisa-v2-0-47251894f6a8@kerne…
Changes in v2:
- Rebase onto v6.7-rc1.
- Link to v1: https://lore.kernel.org/r/20231026-arm64-2023-dpisa-v1-0-8470dd989bb2@kerne…
---
Mark Brown (21):
arm64/sysreg: Add definition for ID_AA64PFR2_EL1
arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09
arm64/sysreg: Add definition for ID_AA64ISAR3_EL1
arm64/sysreg: Add definition for ID_AA64FPFR0_EL1
arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09
arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09
arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09
arm64/sysreg: Add definition for FPMR
arm64/cpufeature: Hook new identification registers up to cpufeature
arm64/fpsimd: Enable host kernel access to FPMR
arm64/fpsimd: Support FEAT_FPMR
arm64/signal: Add FPMR signal handling
arm64/ptrace: Expose FPMR via ptrace
KVM: arm64: Add newly allocated ID registers to register descriptions
KVM: arm64: Support FEAT_FPMR for guests
arm64/hwcap: Define hwcaps for 2023 DPISA features
kselftest/arm64: Handle FPMR context in generic signal frame parser
kselftest/arm64: Add basic FPMR test
kselftest/arm64: Add 2023 DPISA hwcap test coverage
KVM: arm64: selftests: Document feature registers added in 2023 extensions
KVM: arm64: selftests: Teach get-reg-list about FPMR
Documentation/arch/arm64/elf_hwcaps.rst | 49 +++++
arch/arm64/include/asm/cpu.h | 3 +
arch/arm64/include/asm/cpufeature.h | 5 +
arch/arm64/include/asm/fpsimd.h | 2 +
arch/arm64/include/asm/hwcap.h | 15 ++
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/include/asm/processor.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 15 ++
arch/arm64/include/uapi/asm/sigcontext.h | 8 +
arch/arm64/kernel/cpufeature.c | 72 +++++++
arch/arm64/kernel/cpuinfo.c | 18 ++
arch/arm64/kernel/fpsimd.c | 13 ++
arch/arm64/kernel/ptrace.c | 42 ++++
arch/arm64/kernel/signal.c | 59 ++++++
arch/arm64/kvm/emulate-nested.c | 9 +
arch/arm64/kvm/fpsimd.c | 19 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 7 +-
arch/arm64/kvm/sys_regs.c | 17 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 153 ++++++++++++++-
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 217 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../arm64/signal/testcases/fpmr_siginfo.c | 82 ++++++++
.../selftests/arm64/signal/testcases/testcases.c | 8 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 11 +-
28 files changed, 819 insertions(+), 18 deletions(-)
---
base-commit: 2cc14f52aeb78ce3f29677c2de1f06c0e91471ab
change-id: 20231003-arm64-2023-dpisa-2f3d25746474
Best regards,
--
Mark Brown <broonie(a)kernel.org>
As Guillaume pointed, many selftests create namespaces with very common
names (like "client" or "server") or even (partially) run directly in init_net.
This makes these tests prone to failure if another namespace with the same
name already exists. It also makes it impossible to run several instances
of these tests in parallel.
This patch set intend to conver all the net selftests to run in unique namespace,
so we can update the selftest freamwork to run all tests in it's own namespace
in parallel. After update, we only need to wait for the test which need
longest time.
As the total patch set is too large. I break it to severl parts. This is
the first part.
v2 -> v3:
- Convert all ip netns del to cleanup_ns (Justin Iurman)
v1 -> v2:
- Split the large patch set to small parts for easy review (Paolo Abeni)
- Move busywait from forwarding/lib.sh to net/lib.sh directly (Petr Machata)
- Update setup_ns/cleanup_ns struct (Petr Machata)
- Remove default trap in lib.sh (Petr Machata)
Hangbin Liu (14):
selftests/net: add lib.sh
selftests/net: convert arp_ndisc_evict_nocarrier.sh to run it in
unique namespace
selftests/net: specify the interface when do arping
selftests/net: convert arp_ndisc_untracked_subnets.sh to run it in
unique namespace
selftests/net: convert cmsg tests to make them run in unique namespace
selftests/net: convert drop_monitor_tests.sh to run it in unique
namespace
selftests/net: convert traceroute.sh to run it in unique namespace
selftests/net: convert icmp_redirect.sh to run it in unique namespace
sleftests/net: convert icmp.sh to run it in unique namespace
selftests/net: convert ioam6.sh to run it in unique namespace
selftests/net: convert l2tp.sh to run it in unique namespace
selftests/net: convert ndisc_unsolicited_na_test.sh to run it in
unique namespace
selftests/net: convert sctp_vrf.sh to run it in unique namespace
selftests/net: convert unicast_extensions.sh to run it in unique
namespace
tools/testing/selftests/net/Makefile | 2 +-
.../net/arp_ndisc_evict_nocarrier.sh | 46 ++--
.../net/arp_ndisc_untracked_subnets.sh | 20 +-
tools/testing/selftests/net/cmsg_ipv6.sh | 10 +-
tools/testing/selftests/net/cmsg_so_mark.sh | 7 +-
tools/testing/selftests/net/cmsg_time.sh | 7 +-
.../selftests/net/drop_monitor_tests.sh | 21 +-
tools/testing/selftests/net/forwarding/lib.sh | 27 +-
tools/testing/selftests/net/icmp.sh | 10 +-
tools/testing/selftests/net/icmp_redirect.sh | 182 +++++++------
tools/testing/selftests/net/ioam6.sh | 247 +++++++++---------
tools/testing/selftests/net/l2tp.sh | 130 +++++----
tools/testing/selftests/net/lib.sh | 85 ++++++
.../net/ndisc_unsolicited_na_test.sh | 19 +-
tools/testing/selftests/net/sctp_vrf.sh | 12 +-
tools/testing/selftests/net/traceroute.sh | 82 +++---
.../selftests/net/unicast_extensions.sh | 99 ++++---
17 files changed, 500 insertions(+), 506 deletions(-)
create mode 100644 tools/testing/selftests/net/lib.sh
--
2.43.0
An overlook from commit 74452d6329be ("selftests/hid: tablets: add
variants of states with buttons"), where I don't use the Enum...
Fixes: 74452d6329be ("selftests/hid: tablets: add variants of states with buttons")
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Not sure what happened, but I mustn't have run the tests before
sending the series, and they fail blatently.
I'm deeply sorry for that, I thought these failures were fixed.
Cheers,
Benjamin
---
tools/testing/selftests/hid/tests/test_tablet.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/hid/tests/test_tablet.py b/tools/testing/selftests/hid/tests/test_tablet.py
index dc8b0fe9e7f3..903f19f7cbe9 100644
--- a/tools/testing/selftests/hid/tests/test_tablet.py
+++ b/tools/testing/selftests/hid/tests/test_tablet.py
@@ -115,7 +115,7 @@ class PenState(Enum):
# we take only the highest button in account
for b in [libevdev.EV_KEY.BTN_STYLUS, libevdev.EV_KEY.BTN_STYLUS2]:
if bool(evdev.value[b]):
- button = b
+ button = BtnPressed(b)
# the kernel tends to insert an EV_SYN once removing the tool, so
# the button will be released after
@@ -155,7 +155,7 @@ class PenState(Enum):
if button_found:
raise ValueError(f"duplicated BTN_STYLUS* in {events}")
button_found = True
- button = ev.code if ev.value else None
+ button = BtnPressed(ev.code) if ev.value else None
# the kernel tends to insert an EV_SYN once removing the tool, so
# the button will be released after
---
base-commit: f556aa957df8cb3e98af0f54bf1fa65f59ae47a3
change-id: 20231207-b4-wip-selftests-c1565d4d4578
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Hi,
the main trigger of this series was the XP-Pen issue[0].
Basically, the tablets tests were good-ish but couldn't
handle that tablet both in terms of emulation or in terms
of detection of issues.
So rework the tablets test a bit to be able to include the
XP-Pen patch later, once I have a kernel fix for it (right
now I only have a HID-BPF fix, meaning that the test will
fail if I include them).
Also, vmtest.sh needed a little bit of care, because
boot2container moved, and I made it easier to reuse in a CI
environment.
Cheers,
Benjamin
Note: I got the confirmation off-list from Peter that his
rev-by applied to the whole series.
[0] https://lore.kernel.org/all/nycvar.YFH.7.76.2311012033290.29220@cbobk.fhfr.…
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v2:
- took Peter's review into account
- Added a few patches to make mypy and ruff happy given that
I had to add a couple of type hints here and there
- Link to v1: https://lore.kernel.org/r/20231129-wip-selftests-v1-0-ba15a1fe1b0d@kernel.o…
---
Benjamin Tissoires (15):
selftests/hid: vmtest.sh: update vm2c and container
selftests/hid: vmtest.sh: allow finer control on the build steps
selftests/hid: base: allow for multiple skip_if_uhdev
selftests/hid: tablets: remove unused class
selftests/hid: tablets: move the transitions to PenState
selftests/hid: tablets: move move_to function to PenDigitizer
selftests/hid: tablets: do not set invert when the eraser is used
selftests/hid: tablets: set initial data for tilt/twist
selftests/hid: tablets: define the elements of PenState
selftests/hid: tablets: add variants of states with buttons
selftests/hid: tablets: convert the primary button tests
selftests/hid: tablets: add a secondary barrel switch test
selftests/hid: tablets: be stricter for some transitions
selftests/hid: fix mypy complains
selftests/hid: fix ruff linter complains
tools/testing/selftests/hid/tests/base.py | 7 +-
tools/testing/selftests/hid/tests/test_mouse.py | 14 +-
tools/testing/selftests/hid/tests/test_tablet.py | 764 ++++++++++++++-------
.../selftests/hid/tests/test_wacom_generic.py | 6 +-
tools/testing/selftests/hid/vmtest.sh | 46 +-
5 files changed, 567 insertions(+), 270 deletions(-)
---
base-commit: 4ea4ed22b57846facd9cb4af5f67cb7bd2792cf3
change-id: 20231121-wip-selftests-001ac427e086
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
From: Zhao Mengmeng <zhaomengmeng(a)kylinos.cn>
When building whole selftests on arm64, rsync gives an erorr about sgx:
rsync: [sender] link_stat "/root/linux-next/tools/testing/selftests/sgx/test_encl.elf" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1327) [sender=3.2.5]
The root casue is sgx only used on X86_64, and shall be skipped on other
platforms.
Fix this by moving TEST_CUSTOM_PROGS and TEST_FILES inside the if check,
then the build result will be "Skipping non-existent dir: sgx".
Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Zhao Mengmeng <zhaomengmeng(a)kylinos.cn>
---
tools/testing/selftests/sgx/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 50aab6b57da3..01abe4969b0f 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -16,10 +16,10 @@ HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC -z noexecstack
ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
-fno-stack-protector -mrdrnd $(INCLUDES)
+ifeq ($(CAN_BUILD_X86_64), 1)
TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
TEST_FILES := $(OUTPUT)/test_encl.elf
-ifeq ($(CAN_BUILD_X86_64), 1)
all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
endif
--
2.38.1
Hi Reinette, Fenghua,
This series introduces a new mount option enabling an alternate mode for
MBM to work around an issue on present AMD implementations and any other
resctrl implementation where there are more RMIDs (or equivalent) than
hardware counters.
The L3 External Bandwidth Monitoring feature of the AMD PQoS
extension[1] only guarantees that RMIDs currently assigned to a
processor will be tracked by hardware. The counters of any other RMIDs
which are no longer being tracked will be reset to zero. The MBM event
counters return "Unavailable" to indicate when this has happened.
An interval for effectively measuring memory bandwidth typically needs
to be multiple seconds long. In Google's workloads, it is not feasible
to bound the number of jobs with different RMIDs which will run in a
cache domain over any period of time. Consequently, on a
fully-committed system where all RMIDs are allocated, few groups'
counters return non-zero values.
To demonstrate the underlying issue, the first patch provides a test
case in tools/testing/selftests/resctrl/test_rmids.sh.
On an AMD EPYC 7B12 64-Core Processor with the default behavior:
# ./test_rmids.sh
Created 255 monitoring groups.
g1: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g2: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g3: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
[..]
g238: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g239: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g240: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g241: mbm_total_bytes: Unavailable -> 660497472
g242: mbm_total_bytes: Unavailable -> 660793344
g243: mbm_total_bytes: Unavailable -> 660477312
g244: mbm_total_bytes: Unavailable -> 660495360
g245: mbm_total_bytes: Unavailable -> 660775360
g246: mbm_total_bytes: Unavailable -> 660645504
g247: mbm_total_bytes: Unavailable -> 660696128
g248: mbm_total_bytes: Unavailable -> 660605248
g249: mbm_total_bytes: Unavailable -> 660681280
g250: mbm_total_bytes: Unavailable -> 660834240
g251: mbm_total_bytes: Unavailable -> 660440064
g252: mbm_total_bytes: Unavailable -> 660501504
g253: mbm_total_bytes: Unavailable -> 660590720
g254: mbm_total_bytes: Unavailable -> 660548352
g255: mbm_total_bytes: Unavailable -> 660607296
255 groups, 0 returned counts in first pass, 15 in second
successfully measured bandwidth from 15/255 groups
To compare, here is the output from an Intel(R) Xeon(R) Platinum 8173M
CPU:
# ./test_rmids.sh
Created 223 monitoring groups.
g1: mbm_total_bytes: 0 -> 606126080
g2: mbm_total_bytes: 0 -> 613236736
g3: mbm_total_bytes: 0 -> 610254848
[..]
g221: mbm_total_bytes: 0 -> 584679424
g222: mbm_total_bytes: 0 -> 588808192
g223: mbm_total_bytes: 0 -> 587317248
223 groups, 223 returned counts in first pass, 223 in second
successfully measured bandwidth from 223/223 groups
To make better use of the hardware in such a use case, this patchset
introduces a "soft" RMID implementation, where each CPU is permanently
assigned a "hard" RMID. On context switches which change the current
soft RMID, the difference between each CPU's current event counts and
most recent counts is added to the totals for the current or outgoing
soft RMID.
This technique does not work for cache occupancy counters, so this patch
series disables cache occupancy events when soft RMIDs are enabled.
This series adds the "mbm_soft_rmid" mount option to allow users to
opt-in to the functionaltiy when they deem it helpful.
When the same system from the earlier AMD example enables the
mbm_soft_rmid mount option:
# ./test_rmids.sh
Created 255 monitoring groups.
g1: mbm_total_bytes: 0 -> 686560576
g2: mbm_total_bytes: 0 -> 668204416
[..]
g252: mbm_total_bytes: 0 -> 672651200
g253: mbm_total_bytes: 0 -> 666956800
g254: mbm_total_bytes: 0 -> 665917056
g255: mbm_total_bytes: 0 -> 671049600
255 groups, 255 returned counts in first pass, 255 in second
successfully measured bandwidth from 255/255 groups
(patches are based on tip/master)
[1] https://www.amd.com/system/files/TechDocs/56375_1.03_PUB.pdf
Peter Newman (8):
selftests/resctrl: Verify all RMIDs count together
x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events
x86/resctrl: Flush MBM event counts on soft RMID change
x86/resctrl: Call mon_event_count() directly for soft RMIDs
x86/resctrl: Create soft RMID version of __mon_event_count()
x86/resctrl: Assign HW RMIDs to CPUs for soft RMID
x86/resctrl: Use mbm_update() to push soft RMID counts
x86/resctrl: Add mount option to enable soft RMID
Stephane Eranian (1):
x86/resctrl: Hold a spinlock in __rmid_read() on AMD
arch/x86/include/asm/resctrl.h | 29 +++-
arch/x86/kernel/cpu/resctrl/core.c | 80 ++++++++-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 9 +-
arch/x86/kernel/cpu/resctrl/internal.h | 19 ++-
arch/x86/kernel/cpu/resctrl/monitor.c | 158 +++++++++++++++++-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++
tools/testing/selftests/resctrl/test_rmids.sh | 93 +++++++++++
7 files changed, 425 insertions(+), 15 deletions(-)
create mode 100755 tools/testing/selftests/resctrl/test_rmids.sh
base-commit: dd806e2f030e57dd5bac973372aa252b6c175b73
--
2.40.0.634.g4ca3ef3211-goog
Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in
kunit_free_suite_set()") fixed a wild-memory-access bug that could have
happened during the loading phase of test suites built and executed as
loadable modules. However, it also introduced a problematic side effect
that causes test suites modules to crash when they attempt to register
fake devices.
When a module is loaded, it traverses the MODULE_STATE_UNFORMED and
MODULE_STATE_COMING states before reaching the normal operating state
MODULE_STATE_LIVE. Finally, when the module is removed, it moves to
MODULE_STATE_GOING before being released. However, if the loading
function load_module() fails between complete_formation() and
do_init_module(), the module goes directly from MODULE_STATE_COMING to
MODULE_STATE_GOING without passing through MODULE_STATE_LIVE.
This behavior was causing kunit_module_exit() to be called without
having first executed kunit_module_init(). Since kunit_module_exit() is
responsible for freeing the memory allocated by kunit_module_init()
through kunit_filter_suites(), this behavior was resulting in a
wild-memory-access bug.
Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in
kunit_free_suite_set()") fixed this issue by running the tests when the
module is still in MODULE_STATE_COMING. However, modules in that state
are not fully initialized, lacking sysfs kobjects. Therefore, if a test
module attempts to register a fake device, it will inevitably crash.
This patch proposes a different approach to fix the original
wild-memory-access bug while restoring the normal module execution flow
by making kunit_module_exit() able to detect if kunit_module_init() has
previously initialized the tests suite set. In this way, test modules
can once again register fake devices without crashing.
This behavior is achieved by checking whether mod->kunit_suites is a
virtual or direct mapping address. If it is a virtual address, then
kunit_module_init() has allocated the suite_set in kunit_filter_suites()
using kmalloc_array(). On the contrary, if mod->kunit_suites is still
pointing to the original address that was set when looking up the
.kunit_test_suites section of the module, then the loading phase has
failed and there's no memory to be freed.
v2:
- add include <linux/mm.h>
Fixes: 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()")
Signed-off-by: Marco Pagani <marpagan(a)redhat.com>
---
lib/kunit/test.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index f2eb71f1a66c..0e829b9f8ce5 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -16,6 +16,7 @@
#include <linux/panic.h>
#include <linux/sched/debug.h>
#include <linux/sched.h>
+#include <linux/mm.h>
#include "debugfs.h"
#include "hooks-impl.h"
@@ -737,12 +738,14 @@ static void kunit_module_exit(struct module *mod)
};
const char *action = kunit_action();
+ if (!suite_set.start || !virt_addr_valid(suite_set.start))
+ return;
+
if (!action)
__kunit_test_suites_exit(mod->kunit_suites,
mod->num_kunit_suites);
- if (suite_set.start)
- kunit_free_suite_set(suite_set);
+ kunit_free_suite_set(suite_set);
}
static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
@@ -752,12 +755,12 @@ static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
switch (val) {
case MODULE_STATE_LIVE:
+ kunit_module_init(mod);
break;
case MODULE_STATE_GOING:
kunit_module_exit(mod);
break;
case MODULE_STATE_COMING:
- kunit_module_init(mod);
break;
case MODULE_STATE_UNFORMED:
break;
base-commit: 2cc14f52aeb78ce3f29677c2de1f06c0e91471ab
--
2.42.0
This patch series introduces UFFDIO_MOVE feature to userfaultfd, which
has long been implemented and maintained by Andrea in his local tree [1],
but was not upstreamed due to lack of use cases where this approach would
be better than allocating a new page and copying the contents. Previous
upstraming attempts could be found at [6] and [7].
UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
needs pages to be allocated [2]. However, with UFFDIO_MOVE, if pages are
available (in userspace) for recycling, as is usually the case in heap
compaction algorithms, then we can avoid the page allocation and memcpy
(done by UFFDIO_COPY). Also, since the pages are recycled in the
userspace, we avoid the need to release (via madvise) the pages back to
the kernel [3].
We see over 40% reduction (on a Google pixel 6 device) in the compacting
thread’s completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was
measured using a benchmark that emulates a heap compaction implementation
using userfaultfd (to allow concurrent accesses by application threads).
More details of the usecase are explained in [3].
Furthermore, UFFDIO_MOVE enables moving swapped-out pages without
touching them within the same vma. Today, it can only be done by mremap,
however it forces splitting the vma.
TODOs for follow-up improvements:
- cross-mm support. Known differences from single-mm and missing pieces:
- memcg recharging (might need to isolate pages in the process)
- mm counters
- cross-mm deposit table moves
- cross-mm test
- document the address space where src and dest reside in struct
uffdio_move
- TLB flush batching. Will require extensive changes to PTL locking in
move_pages_pte(). OTOH that might let us reuse parts of mremap code.
Changes since v4 [9]:
- added Acked-by in patch 1, per Peter Xu
- added description for ctx, mm and mode parameters of move_pages(),
per kernel test robot
- added Reviewed-by's, per Peter Xu and Axel Rasmussen
- removed unused operations in uffd_test_case_ops
- refactored uffd-unit-test changes to avoid using global variables and
handle pmd moves without page size overrides, per Peter Xu
Changes since v3 [8]:
- changed retry path in folio_lock_anon_vma_read() to unlock and then
relock RCU, per Peter Xu
- removed cross-mm support from initial patchset, per David Hildenbrand
- replaced BUG_ONs with VM_WARN_ON or WARN_ON_ONCE, per David Hildenbrand
- added missing cache flushing, per Lokesh Gidra and Peter Xu
- updated manpage text in the patch description, per Peter Xu
- renamed internal functions from "remap" to "move", per Peter Xu
- added mmap_changing check after taking mmap_lock, per Peter Xu
- changed uffd context check to ensure dst_mm is registered onto uffd we
are operating on, Peter Xu and David Hildenbrand
- changed to non-maybe variants of maybe*_mkwrite(), per David Hildenbrand
- fixed warning for CONFIG_TRANSPARENT_HUGEPAGE=n, per kernel test robot
- comments cleanup, per David Hildenbrand and Peter Xu
- checks for VM_IO,VM_PFNMAP,VM_HUGETLB,..., per David Hildenbrand
- prevent moving pinned pages, per Peter Xu
- changed uffd tests to call move uffd_test_ctx_clear() at the end of the
test run instead of in the beginning of the next run
- added support for testcase-specific ops
- added test for moving PMD-aligned blocks
Changes since v2 [5]:
- renamed UFFDIO_REMAP to UFFDIO_MOVE, per David Hildenbrand
- rebase over mm-unstable to use folio_move_anon_rmap(),
per David Hildenbrand
- added text for manpage explaining DONTFORK and KSM requirements for this
feature, per David Hildenbrand
- check for anon_vma changes in the fast path of folio_lock_anon_vma_read,
per Peter Xu
- updated the title and description of the first patch,
per David Hildenbrand
- updating comments in folio_lock_anon_vma_read() explaining the need for
anon_vma checks, per David Hildenbrand
- changed all mapcount checks to PageAnonExclusive, per Jann Horn and
David Hildenbrand
- changed counters in remap_swap_pte() from MM_ANONPAGES to MM_SWAPENTS,
per Jann Horn
- added a check for PTE change after folio is locked in remap_pages_pte(),
per Jann Horn
- added handling of PMD migration entries and bailout when pmd_devmap(),
per Jann Horn
- added checks to ensure both src and dst VMAs are writable, per Peter Xu
- added UFFD_FEATURE_MOVE, per Peter Xu
- removed obsolete comments, per Peter Xu
- renamed remap_anon_pte to remap_present_pte, per Peter Xu
- added a comment for folio_get_anon_vma() explaining the need for
anon_vma checks, per Peter Xu
- changed error handling in remap_pages() to make it more clear,
per Peter Xu
- changed EFAULT to EAGAIN to retry when a hugepage appears or disappears
from under us, per Peter Xu
- added links to previous upstreaming attempts, per David Hildenbrand
Changes since v1 [4]:
- add mmget_not_zero in userfaultfd_remap, per Jann Horn
- removed extern from function definitions, per Matthew Wilcox
- converted to folios in remap_pages_huge_pmd, per Matthew Wilcox
- use PageAnonExclusive in remap_pages_huge_pmd, per David Hildenbrand
- handle pgtable transfers between MMs, per Jann Horn
- ignore concurrent A/D pte bit changes, per Jann Horn
- split functions into smaller units, per David Hildenbrand
- test for folio_test_large in remap_anon_pte, per Matthew Wilcox
- use pte_swp_exclusive for swapcount check, per David Hildenbrand
- eliminated use of mmu_notifier_invalidate_range_start_nonblock,
per Jann Horn
- simplified THP alignment checks, per Jann Horn
- refactored the loop inside remap_pages, per Jann Horn
- additional clarifying comments, per Jann Horn
Main changes since Andrea's last version [1]:
- Trivial translations from page to folio, mmap_sem to mmap_lock
- Replace pmd_trans_unstable() with pte_offset_map_nolock() and handle its
possible failure
- Move pte mapping into remap_pages_pte to allow for retries when source
page or anon_vma is contended. Since pte_offset_map_nolock() start RCU
read section, we can't block anymore after mapping a pte, so have to unmap
the ptesm do the locking and retry.
- Add and use anon_vma_trylock_write() to avoid blocking while in RCU
read section.
- Accommodate changes in mmu_notifier_range_init() API, switch to
mmu_notifier_invalidate_range_start_nonblock() to avoid blocking while in
RCU read section.
- Open-code now removed __swp_swapcount()
- Replace pmd_read_atomic() with pmdp_get_lockless()
- Add new selftest for UFFDIO_MOVE
[1] https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc…
[2] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redha…
[3] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyj…
[4] https://lore.kernel.org/all/20230914152620.2743033-1-surenb@google.com/
[5] https://lore.kernel.org/all/20230923013148.1390521-1-surenb@google.com/
[6] https://lore.kernel.org/all/1425575884-2574-21-git-send-email-aarcange@redh…
[7] https://lore.kernel.org/all/cover.1547251023.git.blake.caldwell@colorado.ed…
[8] https://lore.kernel.org/all/20231009064230.2952396-1-surenb@google.com/
[9] https://lore.kernel.org/all/20231028003819.652322-1-surenb@google.com/
Andrea Arcangeli (2):
mm/rmap: support move to different root anon_vma in
folio_move_anon_rmap()
userfaultfd: UFFDIO_MOVE uABI
Suren Baghdasaryan (3):
selftests/mm: call uffd_test_ctx_clear at the end of the test
selftests/mm: add uffd_test_case_ops to allow test case-specific
operations
selftests/mm: add UFFDIO_MOVE ioctl test
Documentation/admin-guide/mm/userfaultfd.rst | 3 +
fs/userfaultfd.c | 72 +++
include/linux/rmap.h | 5 +
include/linux/userfaultfd_k.h | 11 +
include/uapi/linux/userfaultfd.h | 29 +-
mm/huge_memory.c | 122 ++++
mm/khugepaged.c | 3 +
mm/rmap.c | 30 +
mm/userfaultfd.c | 599 +++++++++++++++++++
tools/testing/selftests/mm/uffd-common.c | 39 +-
tools/testing/selftests/mm/uffd-common.h | 9 +
tools/testing/selftests/mm/uffd-stress.c | 5 +-
tools/testing/selftests/mm/uffd-unit-tests.c | 192 ++++++
13 files changed, 1115 insertions(+), 4 deletions(-)
--
2.43.0.rc1.413.gea7ed67945-goog
Doing a ksft_print_msg() before the ksft_print_header() seems to confuse
the ksft framework in a strange way: running the test on the cmdline
results in the expected output.
But piping the output somewhere else, results in some odd output,
whereby we repeatedly get the same info printed:
# [INFO] detected THP size: 2048 KiB
# [INFO] detected hugetlb page size: 2048 KiB
# [INFO] detected hugetlb page size: 1048576 KiB
# [INFO] huge zeropage is enabled
TAP version 13
1..190
# [INFO] Anonymous memory tests in private mappings
# [RUN] Basic COW after fork() ... with base page
# [INFO] detected THP size: 2048 KiB
# [INFO] detected hugetlb page size: 2048 KiB
# [INFO] detected hugetlb page size: 1048576 KiB
# [INFO] huge zeropage is enabled
TAP version 13
1..190
# [INFO] Anonymous memory tests in private mappings
# [RUN] Basic COW after fork() ... with base page
ok 1 No leak from parent into child
# [RUN] Basic COW after fork() ... with swapped out base page
# [INFO] detected THP size: 2048 KiB
# [INFO] detected hugetlb page size: 2048 KiB
# [INFO] detected hugetlb page size: 1048576 KiB
# [INFO] huge zeropage is enabled
Doing the ksft_print_header() first seems to resolve that and gives us
the output we expect:
TAP version 13
# [INFO] detected THP size: 2048 KiB
# [INFO] detected hugetlb page size: 2048 KiB
# [INFO] detected hugetlb page size: 1048576 KiB
# [INFO] huge zeropage is enabled
1..190
# [INFO] Anonymous memory tests in private mappings
# [RUN] Basic COW after fork() ... with base page
ok 1 No leak from parent into child
# [RUN] Basic COW after fork() ... with swapped out base page
ok 2 No leak from parent into child
# [RUN] Basic COW after fork() ... with THP
ok 3 No leak from parent into child
# [RUN] Basic COW after fork() ... with swapped-out THP
ok 4 No leak from parent into child
# [RUN] Basic COW after fork() ... with PTE-mapped THP
ok 5 No leak from parent into child
Reported-by: Nico Pache <npache(a)redhat.com>
Fixes: f4b5fd6946e2 ("selftests/vm: anon_cow: THP tests")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
tools/testing/selftests/mm/cow.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index 7324ce5363c0c..6f2f839904416 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -1680,6 +1680,8 @@ int main(int argc, char **argv)
{
int err;
+ ksft_print_header();
+
pagesize = getpagesize();
thpsize = read_pmd_pagesize();
if (thpsize)
@@ -1689,7 +1691,6 @@ int main(int argc, char **argv)
ARRAY_SIZE(hugetlbsizes));
detect_huge_zeropage();
- ksft_print_header();
ksft_set_plan(ARRAY_SIZE(anon_test_cases) * tests_per_anon_test_case() +
ARRAY_SIZE(anon_thp_test_cases) * tests_per_anon_thp_test_case() +
ARRAY_SIZE(non_anon_test_cases) * tests_per_non_anon_test_case());
--
2.41.0