Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
The current means by which these are implemented is via a PROT_NONE mmap()
mapping, which provides the required semantics however incur an overhead of
a VMA for each such region.
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
This series takes a different approach - an idea suggested by Vlasimil
Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
provenance becomes a little tricky to ascertain after this - please forgive
any omissions!) - rather than locating the guard pages at the VMA layer,
instead placing them in page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5 times
speed up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle android
system and unoptimised code.
We expect with optimisation and a loaded system with a larger number of
guard pages this could significantly increase, but in any case these
numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, instead we have a VMA spanning the entire range of
memory a user is permitted to access and including ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the mappings being torn down when the
mappings are ultimately unmapped and everything works simply as if the
memory were not faulted in, from the point of view of the containing VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
Any poisoned region of memory is also able to 'unpoisoned', that is, to
have its poison markers removed.
The mechanism is implemented via madvise() behaviour - MADV_GUARD_POISON
which simply poisons ranges - and MADV_GUARD_UNPOISON - which clears this
poisoning.
Poisoning can be performed across multiple VMAs and any existing mappings
will be cleared, that is zapped, before installing the poisoned page table
mappings.
There is no concept of 'nested' poisoning, multiple attempts to poison a
range will, after the first poisoning, have no effect.
Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
memory, so a user can safely unpoison a range of memory and clear only
poison page table mappings leaving the rest intact.
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the poisoning if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests are provided which ensure behaviour is as
expected and additionally self-documents expected behaviour of poisoned
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
v2
* The macros in kselftest_harness.h seem to be broken - __EXPECT() is
terminated by '} while (0); OPTIONAL_HANDLER(_assert)' meaning it is not
safe in single line if / else or for /which blocks, however working
around this results in checkpatch producing invalid warnings, as reported
by Shuah.
* Fixing these macros is out of scope for this series, so compromise and
instead rewrite test blocks so as to use multiple lines by separating out
a decl in most cases. This has the side effect of, for the most part,
making things more readable.
* Heavily document the use of the volatile keyword - we can't avoid
checkpatch complaining about this, so we explain it, as reported by
Shuah.
* Updated commit message to highlight that we skip tests we lack
permissions for, as reported by Shuah.
* Replaced a perror() with ksft_exit_fail_perror(), as reported by Shuah.
* Added user friendly messages to cases where tests are skipped due to lack
of permissions, as reported by Shuah.
* Update the tool header to include the new MADV_GUARD_POISON/UNPOISON
defines and directly include asm-generic/mman.h to get the
platform-neutral versions to ensure we import them.
* Finally fixed Vlastimil's email address in Suggested-by tags from suze to
suse, as reported by Vlastimil.
* Added linux-api to cc list, as reported by Vlastimil.
v1
* Un-RFC'd as appears no major objections to approach but rather debate on
implementation.
* Fixed issue with arches which need mmu_context.h and
tlbfush.h. header imports in pagewalker logic to be able to use
update_mmu_cache() as reported by the kernel test bot.
* Added comments in page walker logic to clarify who can use
ops->install_pte and why as well as adding a check_ops_valid() helper
function, as suggested by Christoph.
* Pass false in full parameter in pte_clear_not_present_full() as suggested
by Jann.
* Stopped erroneously requiring a write lock for the poison operation as
suggested by Jann and Suren.
* Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
consistent with how this is used elsewhere in the kernel as suggested by
Jann.
* Avoid returning -EAGAIN if we are raced on page faults, just keep looping
and duck out if a fatal signal is pending or a conditional reschedule is
needed, as suggested by Jann.
* Avoid needlessly splitting huge PUDs and PMDs by specifying
ACTION_CONTINUE, as suggested by Jann.
https://lore.kernel.org/all/cover.1729196871.git.lorenzo.stoakes@oracle.com/
RFC
https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (5):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
tools: testing: update tools UAPI header for mman-common.h
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 26 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 3 +
mm/internal.h | 6 +
mm/madvise.c | 168 +++
mm/memory.c | 18 +-
mm/mprotect.c | 3 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 200 ++-
tools/include/uapi/asm-generic/mman-common.h | 3 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1228 ++++++++++++++++++
19 files changed, 1627 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.47.0
Currently if we encounter an error between fork() and exec() of a child
process we log the error to stderr. This means that the errors don't get
annotated with the child information which makes diagnostics harder and
means that if we miss the exit signal from the child we can deadlock
waiting for output from the child. Improve robustness and output quality
by logging to stdout instead.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/fp-stress.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..80f22789504d661efc52a90d4b0893fbebec42f8 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -79,7 +79,7 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = dup2(pipefd[1], 1);
if (ret == -1) {
- fprintf(stderr, "dup2() %d\n", errno);
+ printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
@@ -89,7 +89,7 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = dup2(startup_pipe[0], 3);
if (ret == -1) {
- fprintf(stderr, "dup2() %d\n", errno);
+ printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
@@ -107,16 +107,15 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = read(3, &i, sizeof(i));
if (ret < 0)
- fprintf(stderr, "read(startp pipe) failed: %s (%d)\n",
- strerror(errno), errno);
+ printf("read(startp pipe) failed: %s (%d)\n",
+ strerror(errno), errno);
if (ret > 0)
- fprintf(stderr, "%d bytes of data on startup pipe\n",
- ret);
+ printf("%d bytes of data on startup pipe\n", ret);
close(3);
ret = execl(program, program, NULL);
- fprintf(stderr, "execl(%s) failed: %d (%s)\n",
- program, errno, strerror(errno));
+ printf("execl(%s) failed: %d (%s)\n",
+ program, errno, strerror(errno));
exit(EXIT_FAILURE);
} else {
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241017-arm64-fp-stress-exec-fail-d074ec82cf43
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This patch series migrates test cases out of test_sock.c to
prog_tests-style tests. It moves all BPF_CGROUP_INET4_POST_BIND and
BPF_CGROUP_INET6_POST_BIND test cases into a new prog_test,
sock_post_bind.c, while reimplementing all LOAD_REJECT test cases as
verifier tests in progs/verifier_sock.c. Finally, it moves remaining
BPF_CGROUP_INET_SOCK_CREATE test coverage into prog_tests/sock_create.c
before retiring test_sock.c completely.
Changes
=======
v1->v2:
- Remove superfluous verbose bool from the top of sock_post_bind.c.
- Use ASSERT_OK_FD instead of ASSERT_GE to test cgroup_fd validity.
- Run sock_post_bind tests in their own namespace, "sock_post_bind".
Jordan Rife (4):
selftests/bpf: Migrate *_POST_BIND test cases to prog_tests
selftests/bpf: Migrate LOAD_REJECT test cases to prog_tests
selftests/bpf: Migrate BPF_CGROUP_INET_SOCK_CREATE test cases to
prog_tests
selftests/bpf: Retire test_sock.c
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 3 +-
.../selftests/bpf/prog_tests/sock_create.c | 35 ++-
.../sock_post_bind.c} | 256 +++++-------------
.../selftests/bpf/progs/verifier_sock.c | 60 ++++
5 files changed, 150 insertions(+), 205 deletions(-)
rename tools/testing/selftests/bpf/{test_sock.c => prog_tests/sock_post_bind.c} (64%)
--
2.47.0.105.g07ac214952-goog
Hi Zheng,
Cc-ed kunit folks, as we usually do for DAMON kunit test changes.
On Tue, 22 Oct 2024 16:39:27 +0800 Zheng Yejian <zhengyejian(a)huaweicloud.com> wrote:
> As discussed in [1], damon_va_evenly_split_region() is called to
> size-evenly split a region into 'nr_pieces' small regions,
> when nr_pieces == 1, no actual split is required. Check that case
> for better code readability and add a simple kunit testcase.
>
> [1] https://lore.kernel.org/all/20241021163316.12443-1-sj@kernel.org/
>
> Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Thanks,
SJ
[...]
Hi Zheng,
We Cc kunit folks for any DAMON kunit test changes, so I Cc-ed them.
On Tue, 22 Oct 2024 16:39:26 +0800 Zheng Yejian <zhengyejian(a)huaweicloud.com> wrote:
> According to the logic of damon_va_evenly_split_region(), currently
> following split case would not meet the expectation:
>
> Suppose DAMON_MIN_REGION=0x1000,
> Case: Split [0x0, 0x3000) into 2 pieces, then the result would be
> acutually 3 regions:
> [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000)
> but NOT the expected 2 regions:
> [0x0, 0x1000), [0x1000, 0x3000) !!!
>
> The root cause is that when calculating size of each split piece in
> damon_va_evenly_split_region():
>
> `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);`
>
> both the dividing and the ALIGN_DOWN may cause loss of precision,
> then each time split one piece of size 'sz_piece' from origin 'start' to
> 'end' would cause more pieces are split out than expected!!!
>
> To fix it, count for each piece split and make sure no more than
> 'nr_pieces'. In addition, add above case into damon_test_split_evenly().
>
> After this patch, damon-operations test passed:
Just for a clarification. damon-operations test doesn't fail without this
patch. This patch introduces two changes. A new kunit test, and a bug fix.
Without the bug fix, the new kunit test fails.
I usually prefer separating test changes from fixes (introduc a fix first, and
then the test for it, to avoid unnecessary test failures). But, given the
small size and the simplicity of the kunit change for this patch, I think
introducing it together with the fix is ok.
>
> # ./tools/testing/kunit/kunit.py run damon-operations
> [...]
> ============== damon-operations (6 subtests) ===============
> [PASSED] damon_test_three_regions_in_vmas
> [PASSED] damon_test_apply_three_regions1
> [PASSED] damon_test_apply_three_regions2
> [PASSED] damon_test_apply_three_regions3
> [PASSED] damon_test_apply_three_regions4
> [PASSED] damon_test_split_evenly
> ================ [PASSED] damon-operations =================
>
> Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces")
> Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Thanks,
SJ
[...]
Thanks for all the reviews.
V5:
Replace /sys/kernel/livepatch also in other/already existing tests.
Improve commit message of 3rd patch.
V4:
Use variable for /sys/kernel/debug.
Be consistent with "" around variables.
Fix path in commit message to /sys/kernel/debug/kprobes/enabled.
V3:
Save and restore kprobe state also when test fails, by integrating it
into setup_config() and cleanup().
Rename SYSFS variables in a more logical way.
Sort test modules in alphabetical order.
Rename module description.
V2:
Save and restore kprobe state.
Michael Vetter (3):
selftests: livepatch: rename KLP_SYSFS_DIR to SYSFS_KLP_DIR
selftests: livepatch: save and restore kprobe state
selftests: livepatch: test livepatching a kprobed function
tools/testing/selftests/livepatch/Makefile | 3 +-
.../testing/selftests/livepatch/functions.sh | 29 +++++----
.../selftests/livepatch/test-callbacks.sh | 24 +++----
.../selftests/livepatch/test-ftrace.sh | 2 +-
.../selftests/livepatch/test-kprobe.sh | 62 +++++++++++++++++++
.../selftests/livepatch/test-livepatch.sh | 12 ++--
.../testing/selftests/livepatch/test-state.sh | 8 +--
.../selftests/livepatch/test-syscall.sh | 2 +-
.../testing/selftests/livepatch/test-sysfs.sh | 8 +--
.../selftests/livepatch/test_modules/Makefile | 3 +-
.../livepatch/test_modules/test_klp_kprobe.c | 38 ++++++++++++
11 files changed, 150 insertions(+), 41 deletions(-)
create mode 100755 tools/testing/selftests/livepatch/test-kprobe.sh
create mode 100644 tools/testing/selftests/livepatch/test_modules/test_klp_kprobe.c
--
2.47.0
For logging to be useful, something has to set RET and retmsg by calling
ret_set_ksft_status(). There is a suite of functions to that end in
forwarding/lib: check_err, check_fail et.al. Move them to net/lib.sh so
that every net test can use them.
Existing lib.sh users might be using these same names for their functions.
However lib.sh is always sourced near the top of the file (checked), and
whatever new definitions will simply override the ones provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
---
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 73 -------------------
tools/testing/selftests/net/lib.sh | 73 +++++++++++++++++++
2 files changed, 73 insertions(+), 73 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index d28dbf27c1f0..8625e3c99f55 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -445,79 +445,6 @@ done
##############################################################################
# Helpers
-# Whether FAILs should be interpreted as XFAILs. Internal.
-FAIL_TO_XFAIL=
-
-check_err()
-{
- local err=$1
- local msg=$2
-
- if ((err)); then
- if [[ $FAIL_TO_XFAIL = yes ]]; then
- ret_set_ksft_status $ksft_xfail "$msg"
- else
- ret_set_ksft_status $ksft_fail "$msg"
- fi
- fi
-}
-
-check_fail()
-{
- local err=$1
- local msg=$2
-
- check_err $((!err)) "$msg"
-}
-
-check_err_fail()
-{
- local should_fail=$1; shift
- local err=$1; shift
- local what=$1; shift
-
- if ((should_fail)); then
- check_fail $err "$what succeeded, but should have failed"
- else
- check_err $err "$what failed"
- fi
-}
-
-xfail()
-{
- FAIL_TO_XFAIL=yes "$@"
-}
-
-xfail_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW = yes ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
-omit_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW != yes ]]; then
- "$@"
- fi
-}
-
-xfail_on_veth()
-{
- local dev=$1; shift
- local kind
-
- kind=$(ip -j -d link show dev $dev |
- jq -r '.[].linkinfo.info_kind')
- if [[ $kind = veth ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 4f52b8e48a3a..6bcf5d13879d 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -361,3 +361,76 @@ tests_run()
$current_test
done
}
+
+# Whether FAILs should be interpreted as XFAILs. Internal.
+FAIL_TO_XFAIL=
+
+check_err()
+{
+ local err=$1
+ local msg=$2
+
+ if ((err)); then
+ if [[ $FAIL_TO_XFAIL = yes ]]; then
+ ret_set_ksft_status $ksft_xfail "$msg"
+ else
+ ret_set_ksft_status $ksft_fail "$msg"
+ fi
+ fi
+}
+
+check_fail()
+{
+ local err=$1
+ local msg=$2
+
+ check_err $((!err)) "$msg"
+}
+
+check_err_fail()
+{
+ local should_fail=$1; shift
+ local err=$1; shift
+ local what=$1; shift
+
+ if ((should_fail)); then
+ check_fail $err "$what succeeded, but should have failed"
+ else
+ check_err $err "$what failed"
+ fi
+}
+
+xfail()
+{
+ FAIL_TO_XFAIL=yes "$@"
+}
+
+xfail_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW = yes ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
+
+omit_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW != yes ]]; then
+ "$@"
+ fi
+}
+
+xfail_on_veth()
+{
+ local dev=$1; shift
+ local kind
+
+ kind=$(ip -j -d link show dev $dev |
+ jq -r '.[].linkinfo.info_kind')
+ if [[ $kind = veth ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
--
2.45.0
It would be good to use the same mechanism for scheduling and dispatching
general net tests as the many forwarding tests already use. To that end,
move the logging helpers to net/lib.sh so that every net test can use them.
Existing lib.sh users might be using the name themselves. However lib.sh is
always sourced near the top of the file (checked), and whatever new
definition will simply override the one provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
---
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 10 ----------
tools/testing/selftests/net/lib.sh | 10 ++++++++++
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 41dd14c42c48..d28dbf27c1f0 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1285,16 +1285,6 @@ matchall_sink_create()
action drop
}
-tests_run()
-{
- local current_test
-
- for current_test in ${TESTS:-$ALL_TESTS}; do
- in_defer_scope \
- $current_test
- done
-}
-
cleanup()
{
pre_cleanup
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 691318b1ec55..4f52b8e48a3a 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -351,3 +351,13 @@ log_info()
echo "INFO: $msg"
}
+
+tests_run()
+{
+ local current_test
+
+ for current_test in ${TESTS:-$ALL_TESTS}; do
+ in_defer_scope \
+ $current_test
+ done
+}
--
2.45.0
This series is a follow-up to Joey's Permission Overlay Extension (POE)
series [1] that recently landed on mainline. The goal is to improve the
way we handle the register that governs which pkeys/POIndex are
accessible (POR_EL0) during signal delivery. As things stand, we may
unexpectedly fail to write the signal frame on the stack because POR_EL0
is not reset before the uaccess operations. See patch 3 for more details
and the main changes this series brings.
A similar series landed recently for x86/MPK [2]; the present series
aims at aligning arm64 with x86. Worth noting: once the signal frame is
written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
only. This means that a program that sets up an alternate signal stack
with a non-zero pkey will need some assembly trampoline to set POR_EL0
before invoking the real signal handler, as discussed here [3].
The x86 series also added kselftests to ensure that no spurious SIGSEGV
occurs during signal delivery regardless of which pkey is accessible at
the point where the signal is delivered. This series adapts those
kselftests to allow running them on arm64 (patch 4-5).
Finally patch 2 is a clean-up following feedback on Joey's series [4].
I have tested this series on arm64 and x86_64 (booting and running the
protection_keys and pkey_sighandler_tests mm kselftests).
- Kevin
[1] https://lore.kernel.org/linux-arm-kernel/20240822151113.1479789-1-joey.goul…
[2] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@ora…
[3] https://lore.kernel.org/lkml/CABi2SkWxNkP2O7ipkP67WKz0-LV33e5brReevTTtba6oK…
[4] https://lore.kernel.org/linux-arm-kernel/20241015114116.GA19334@willie-the-…
Cc: akpm(a)linux-foundation.org
Cc: anshuman.khandual(a)arm.com
Cc: aruna.ramakrishna(a)oracle.com
Cc: broonie(a)kernel.org
Cc: catalin.marinas(a)arm.com
Cc: dave.hansen(a)linux.intel.com
Cc: dave.martin(a)arm.com
Cc: jeffxu(a)chromium.org
Cc: joey.gouly(a)arm.com
Cc: shuah(a)kernel.org
Cc: will(a)kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: x86(a)kernel.org
Kevin Brodsky (5):
arm64: signal: Remove unused macro
arm64: signal: Remove unnecessary check when saving POE state
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
selftests/mm: Use generic pkey register manipulation
selftests/mm: Enable pkey_sighandler_tests on arm64
arch/arm64/kernel/signal.c | 92 +++++++++++++---
tools/testing/selftests/mm/Makefile | 8 +-
tools/testing/selftests/mm/pkey-arm64.h | 1 +
tools/testing/selftests/mm/pkey-x86.h | 2 +
.../selftests/mm/pkey_sighandler_tests.c | 101 +++++++++++++-----
5 files changed, 159 insertions(+), 45 deletions(-)
--
2.43.0
Commit 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX
instructions") unconditionally added -march=x86-64-v2 to the CFLAGS used
to build the KVM selftests which does not work on non-x86 architectures:
cc1: error: unknown value ‘x86-64-v2’ for ‘-march’
Fix this by making the addition of this x86 specific command line flag
conditional on building for x86.
Fixes: 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX instructions")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/kvm/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e6b7e01d57080b304b21120f0d47bda260ba6c43..156fbfae940feac649f933dc6e048a2e2926542a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -244,11 +244,13 @@ CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
-I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
-I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \
- -march=x86-64-v2 \
$(KHDR_INCLUDES)
ifeq ($(ARCH),s390)
CFLAGS += -march=z10
endif
+ifeq ($(ARCH),x86)
+ CFLAGS += -march=x86-64-v2
+endif
ifeq ($(ARCH),arm64)
tools_dir := $(top_srcdir)/tools
arm64_tools_dir := $(tools_dir)/arch/arm64/tools/
---
base-commit: d129377639907fce7e0a27990e590e4661d3ee02
change-id: 20241021-kvm-build-break-495abedc51e0
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Recently, a defer helper was added to Python selftests. The idea is to keep
cleanup commands close to their dirtying counterparts, thereby making it
more transparent what is cleaning up what, making it harder to miss a
cleanup, and make the whole cleanup business exception safe. All these
benefits are applicable to bash as well, exception safety can be
interpreted in terms of safety vs. a SIGINT.
This patchset therefore introduces a framework of several helpers that
serve to schedule cleanups in bash selftests.
- Patch #1 has more details about the primitives being introduced.
Patch #2 adds a fallback cleanup() function to lib.sh, because ideally
selftests wouldn't need to introduce a dedicated cleanup function at all.
- Patch #3 adds a parameter to stop_traffic(), which makes it possible to
start other background processes after the traffic is started without
confusing the cleanup.
- Patches #4 to #10 convert a number of selftests.
The goal was to convert all tests that use start_traffic / stop_traffic
to the defer framework. Leftover traffic generators are a particularly
painful sort of a missed cleanup. Normal unfinished cleanups can usually
be cleaned up simply by rerunning the test and interrupting it early to
let the cleanups run again / in full. This does not work with
stop_traffic, because it is only issued at the end of the test case that
starts the traffic. At the same time, leftover traffic generators
influence follow-up test runs, and are hard to notice.
The tests were however converted whole-sale, not just their traffic bits.
Thus they form a proof of concept of the defer framework.
v2:
- Patch #1:
- In __defer__schedule(), use ndefers in place of
${__DEFER__NJOBS[$ndefers_key]}
- Patch #4:
- Defer stop_traffic including the sleep. The sleep is actually
necessary and v1 was wrong in that it had the sleep prior to the
stop_traffic invocation.
v1 (from the RFC):
- Patch #1:
- Added the priority defer track
- Dropped defer_scoped_fn, added in_defer_scope
- Extracted to a separate independent module
- Patch #2:
- Moved this bit to a separate patch
- Patch #3:
- New patch
- Patch #4 (RED):
- Squashed the individual RED-related patches into one
- Converted the SW datapath RED selftest as well
- Patch #5 (TBF):
- Fully converted the selftest, not just stop_traffic
- Patches #6, #7, #8, #9, #10:
- New patch
Petr Machata (10):
selftests: net: lib: Introduce deferred commands
selftests: forwarding: Add a fallback cleanup()
selftests: forwarding: lib: Allow passing PID to stop_traffic()
selftests: RED: Use defer for test cleanup
selftests: TBF: Use defer for test cleanup
selftests: ETS: Use defer for test cleanup
selftests: mlxsw: qos_mc_aware: Use defer for test cleanup
selftests: mlxsw: qos_ets_strict: Use defer for test cleanup
selftests: mlxsw: qos_max_descriptors: Use defer for test cleanup
selftests: mlxsw: devlink_trap_police: Use defer for test cleanup
.../drivers/net/mlxsw/devlink_trap_policer.sh | 85 ++++----
.../drivers/net/mlxsw/qos_ets_strict.sh | 167 ++++++++--------
.../drivers/net/mlxsw/qos_max_descriptors.sh | 118 ++++-------
.../drivers/net/mlxsw/qos_mc_aware.sh | 146 +++++++-------
.../selftests/drivers/net/mlxsw/sch_ets.sh | 26 ++-
.../drivers/net/mlxsw/sch_red_core.sh | 185 +++++++++---------
.../drivers/net/mlxsw/sch_red_ets.sh | 24 +--
.../drivers/net/mlxsw/sch_red_root.sh | 18 +-
tools/testing/selftests/net/forwarding/lib.sh | 13 +-
.../selftests/net/forwarding/sch_ets.sh | 7 +-
.../selftests/net/forwarding/sch_ets_core.sh | 81 +++-----
.../selftests/net/forwarding/sch_ets_tests.sh | 14 +-
.../selftests/net/forwarding/sch_red.sh | 103 ++++------
.../selftests/net/forwarding/sch_tbf_core.sh | 91 +++------
.../net/forwarding/sch_tbf_etsprio.sh | 7 +-
.../selftests/net/forwarding/sch_tbf_root.sh | 3 +-
tools/testing/selftests/net/lib.sh | 3 +
tools/testing/selftests/net/lib/Makefile | 2 +-
tools/testing/selftests/net/lib/sh/defer.sh | 115 +++++++++++
19 files changed, 595 insertions(+), 613 deletions(-)
create mode 100644 tools/testing/selftests/net/lib/sh/defer.sh
--
2.45.0
Currently fp-stress does not report a top level test result if it runs to
completion, it always exits with a return code 0. Use the ksft_finished()
helper to ensure that the exit code for the top level program reports a
failure if any of the individual tests has failed.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/fp-stress.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..e62c9dbad5010234d70b477cf8c52ba0b312910e 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -651,7 +651,5 @@ int main(int argc, char **argv)
drain_output(true);
- ksft_print_cnts();
-
- return 0;
+ ksft_finished();
}
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241017-arm64-fp-stress-exit-code-90fe21dc4bc3
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
https://bugzilla.kernel.org/attachment.cgi?id=306250
Testing Done:
- Added a selftest to test the Idle HLT intercept functionality.
- Compile and functionality testing for the Idle HLT intercept selftest
are only done for x86_64.
- Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in
a way that each architecture can share the generic types and also
provide its own.
v1 -> v2
- Done changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which
doesn't use string to get stat data based on the comments from Sean.
- Added self_halt() and cli() helpers based on the comments from Sean.
Manali Shukla (5):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
KVM: selftests: Add safe_halt() and cli() helpers to common code
KVM: selftests: Add an interface to read the data of named vcpu stat
KVM: selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 11 ++-
tools/testing/selftests/kvm/Makefile | 1 +
.../testing/selftests/kvm/include/kvm_util.h | 44 +++++++++
.../kvm/include/x86_64/kvm_util_arch.h | 40 +++++++++
.../selftests/kvm/include/x86_64/processor.h | 18 ++++
tools/testing/selftests/kvm/lib/kvm_util.c | 32 +++++++
.../selftests/kvm/x86_64/svm_idle_hlt_test.c | 89 +++++++++++++++++++
10 files changed, 236 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c
base-commit: d91a9cc16417b8247213a0144a1f0fd61dc855dd
--
2.34.1
This series introduces a new vIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a
parent HWPT typically hold one stage-2 IO pagetable and tag it with only
one ID in the cache entries. When sharing one large stage-2 IO pagetable
across physical IOMMU instances, that one ID may not always be available
across all the IOMMU instances. In other word, it's ideal for SW to have
a different container for the stage-2 IO pagetable so it can hold another
ID that's available.
For this "different container", add vIOMMU, an additional layer to hold
extra virtualization information:
_______________________________________________________________________
| iommufd (with vIOMMU) |
| |
| [5] |
| _____________ |
| | | |
| [1] | vIOMMU | [4] [2] |
| ________________ | | _____________ ________ |
| | | | [3] | | | | | |
| | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
| |________________| |_____________| |_____________| |________| |
| | | | | |
|_________|____________________|__________________|_______________|_____|
| | | |
| ______v_____ ______v_____ ___v__
| PFN storage | (paging) | | (nested) | |struct|
|------------>|iommu_domain|<----|iommu_domain|<----|device|
|____________| |____________| |______|
The vIOMMU object should be seen as a slice of a physical IOMMU instance
that is passed to or shared with a VM. That can be some HW/SW resources:
- Security namespace for guest owned ID, e.g. guest-controlled cache tags
- Access to a sharable nesting parent pagetable across physical IOMMUs
- Virtualization of various platforms IDs, e.g. RIDs and others
- Delivery of paravirtualized invalidation
- Direct assigned invalidation queues
- Direct assigned interrupts
- Non-affiliated event reporting
On a multi-IOMMU system, the vIOMMU object must be instanced to the number
of the physical IOMMUs that are passed to (via devices) a guest VM, while
being able to hold the shareable parent HWPT. Each vIOMMU then just needs
to allocate its own individual ID to tag its own cache:
----------------------------
---------------- | | paging_hwpt0 |
| hwpt_nested0 |--->| viommu0 ------------------
---------------- | | IDx |
----------------------------
----------------------------
---------------- | | paging_hwpt0 |
| hwpt_nested1 |--->| viommu1 ------------------
---------------- | | IDy |
----------------------------
As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation
only. Later series will add more data structures and their ioctls.
As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT
type for a core-allocated-core-managed vIOMMU object, allowing drivers to
simply hook a default viommu ops for viommu-based invalidation alone. And
add support for driver-specific type of vIOMMU allocation, and implement
that in the ARM SMMUv3 driver for a real world use case.
More vIOMMU-based structs and ioctls will be introduced in the follow-up
series to support vDEVICE, vIRQ (vEVENT) and VQUEUE objects. Although we
repurposed the vIOMMU object from an earlier RFC, just for a referece:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v3
(paring QEMU branch for testing will be provided with the part2 series)
Changelog
v3
* Rebased on top of Jason's nesting v3 series
https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvid…
* Split the series into smaller parts
* Added Jason's Reviewed-by
* Added back viommu->iommu_dev
* Added support for driver-allocated vIOMMU v.s. core-allocated
* Dropped arm_smmu_cache_invalidate_user
* Added an iommufd_test_wait_for_users() in selftest
* Reworked test code to make viommu an individual FIXTURE
* Added missing TEST_LENGTH case for the new ioctl command
v2
https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
* Limited vdev_id to one per idev
* Added a rw_sem to protect the vdev_id list
* Reworked driver-level APIs with proper lockings
* Added a new viommu_api file for IOMMUFD_DRIVER config
* Dropped useless iommu_dev point from the viommu structure
* Added missing index numnbers to new types in the uAPI header
* Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
* Reworked mock_viommu_cache_invalidate() using the new iommu helper
* Reordered details of set/unset_vdev_id handlers for proper lockings
v1
https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Nicolin Chen (11):
iommufd: Move struct iommufd_object to public iommufd header
iommufd: Rename _iommufd_object_alloc to iommufd_object_alloc_elm
iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
iommu: Pass in a viommu pointer to domain_alloc_user op
iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
iommufd/selftest: Add refcount to mock_iommu_device
iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
Documentation: userspace-api: iommufd: Update vIOMMU
iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
drivers/iommu/iommufd/Makefile | 5 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 18 ++++
drivers/iommu/iommufd/iommufd_private.h | 23 ++---
drivers/iommu/iommufd/iommufd_test.h | 2 +
include/linux/iommu.h | 15 +++
include/linux/iommufd.h | 52 +++++++++++
include/uapi/linux/iommufd.h | 54 +++++++++--
tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++
drivers/iommu/amd/iommu.c | 1 +
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 24 +++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +
drivers/iommu/intel/iommu.c | 1 +
drivers/iommu/iommufd/hw_pagetable.c | 27 +++++-
drivers/iommu/iommufd/main.c | 38 ++------
drivers/iommu/iommufd/selftest.c | 79 ++++++++++++++--
drivers/iommu/iommufd/viommu.c | 91 +++++++++++++++++++
drivers/iommu/iommufd/viommu_api.c | 57 ++++++++++++
tools/testing/selftests/iommu/iommufd.c | 84 +++++++++++++++++
Documentation/userspace-api/iommufd.rst | 66 +++++++++++++-
19 files changed, 602 insertions(+), 65 deletions(-)
create mode 100644 drivers/iommu/iommufd/viommu.c
create mode 100644 drivers/iommu/iommufd/viommu_api.c
--
2.43.0
This patch series migrates test cases out of test_sock.c to
prog_tests-style tests. It moves all BPF_CGROUP_INET4_POST_BIND and
BPF_CGROUP_INET6_POST_BIND test cases into a new prog_test,
sock_post_bind.c, while reimplementing all LOAD_REJECT test cases as
verifier tests in progs/verifier_sock.c. Finally, it moves remaining
BPF_CGROUP_INET_SOCK_CREATE test coverage into prog_tests/sock_create.c
before retiring test_sock.c completely.
Jordan Rife (4):
selftests/bpf: Migrate *_POST_BIND test cases to prog_tests
selftests/bpf: Migrate LOAD_REJECT test cases to prog_tests
selftests/bpf: Migrate BPF_CGROUP_INET_SOCK_CREATE test cases to
prog_tests
selftests/bpf: Retire test_sock.c
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 3 +-
.../selftests/bpf/prog_tests/sock_create.c | 35 ++-
.../sock_post_bind.c} | 251 ++++--------------
.../selftests/bpf/progs/verifier_sock.c | 60 +++++
5 files changed, 142 insertions(+), 208 deletions(-)
rename tools/testing/selftests/bpf/{test_sock.c => prog_tests/sock_post_bind.c} (64%)
--
2.47.0.rc1.288.g06298d1525-goog
Hello,
this series aims to bring test_tcp_check_syncookie.sh scope into
test_progs to make sure that the corresponding tests are also run
automatically in CI. This script tests for bpf_tcp_{gen,check}_syncookie
and bpf_skc_lookup_tcp, in different contexts (ipv4, v6 or dual, and
with tc and xdp programs).
Some other tests like btf_skc_cls_ingress have some overlapping tests with
test_tcp_check_syncookie.sh, so this series moves the missing bits from
test_tcp_check_syncookie.sh into btf_skc_cls_ingress, which is already
integrated into test_progs.
- the first three commits bring some minor improvements to
btf_skc_cls_ingress without changing its testing scope
- fourth and fifth commits bring test_tcp_check_syncookie.sh features
into btf_skc_cls_ingress
- last commit removes test_tcp_check_syncookie.sh
The only topic for which I am not sure for this integration is the
necessity or not to run the tests with different program types:
test_tcp_check_syncookie.sh runs tests with both tc and xdp programs, but
btf_skc_cls_ingress currently tests those helpers only with a tc
program. Would it make sense to also make sure that btf_skc_cls_ingress
is tested with all the programs types supported by those helpers ?
The series has been tested both in CI and in a local x86_64 qemu
environment:
# ./test_progs -a btf_skc_cls_ingress
#38/1 btf_skc_cls_ingress/conn_ipv4:OK
#38/2 btf_skc_cls_ingress/conn_ipv6:OK
#38/3 btf_skc_cls_ingress/conn_dual:OK
#38/4 btf_skc_cls_ingress/syncookie_ipv4:OK
#38/5 btf_skc_cls_ingress/syncookie_ipv6:OK
#38/6 btf_skc_cls_ingress/syncookie_dual:OK
#38 btf_skc_cls_ingress:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v2:
- fix initial test author mail in Cc
- Fix default cases in switches: indent, action
- remove unneeded initializer
- remove duplicate interface bring-up
- remove unnecessary check and return in bpf program
- Link to v1: https://lore.kernel.org/r/20241016-syncookie-v1-0-3b7a0de12153@bootlin.com
---
Alexis Lothoré (eBPF Foundation) (6):
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: remove test_tcp_check_syncookie
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 9 +-
.../selftests/bpf/prog_tests/btf_skc_cls_ingress.c | 264 +++++++++++++--------
.../selftests/bpf/progs/test_btf_skc_cls_ingress.c | 82 ++++---
.../bpf/progs/test_tcp_check_syncookie_kern.c | 167 -------------
.../selftests/bpf/test_tcp_check_syncookie.sh | 85 -------
.../selftests/bpf/test_tcp_check_syncookie_user.c | 213 -----------------
7 files changed, 217 insertions(+), 604 deletions(-)
---
base-commit: 030207b7fce8bad6827615cfc2c6592916e2c336
change-id: 20241015-syncookie-ea7686264586
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Hi,
The asm-generic/unistd.h file has wrong __NR_userfaultfd syscall number which
doesn't even depend on the architecture. This has caused failure of a selftest
which was fixed recently [1].
grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define __NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define __NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define __NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define __NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define __NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
The number is dependent on the architecture. The above data shows that it
is different for different arch:
x86 374
x86_64 323
ARM 347/358
It seems include/uapi/asm-generic/unistd has wrong 282 value in it. Maybe I'm
missing some context.. Please have a look at it.
The __NR_userfaultfd was added to include/uapi/asm-generic/unistd.h in
09f7298100ea ("Subject: [PATCH] userfaultfd: register uapi generic syscall (aarch64)").
[1] https://lore.kernel.org/all/20240912103151.1520254-1-usama.anjum@collabora.…
--
BR,
/Muhammad Usama Anjum
From: Jeff Xu <jeffxu(a)chromium.org>
This series increase the test coverage of mseal_test by:
Add check for vma_size, prot, and error code for existing tests.
Add more testcases for madvise, munmap, mmap and mremap to cover
sealing in different scenarios.
The increase test coverage hopefully help to prevent future regression.
It doesn't change any existing mm api's semantics, i.e. it will pass on
linux main and 6.10 branch.
Note: in order to pass this test in mm-unstable, mm-unstable must have
Liam's fix on mmap [1]
[1] https://lore.kernel.org/linux-kselftest/vyllxuh5xbqmaoyl2mselebij5ox7cseekj…
History:
V3:
- no-functional change, incooperate feedback from Pedro Falcato
V2:
- https://lore.kernel.org/linux-kselftest/20240829214352.963001-1-jeffxu@chro…
- remove the mmap fix (Liam R. Howlett will fix it separately)
- Add cover letter (Lorenzo Stoakes)
- split the testcase for ease of review (Mark Brown)
V1:
- https://lore.kernel.org/linux-kselftest/20240828225522.684774-1-jeffxu@chro…
Jeff Xu (5):
selftests/mseal_test: Check vma_size, prot, error code.
selftests/mseal: add sealed madvise type
selftests/mseal: munmap across multiple vma ranges.
selftests/mseal: add more tests for mmap
selftests/mseal: add more tests for mremap
tools/testing/selftests/mm/mseal_test.c | 830 ++++++++++++++++++++++--
1 file changed, 763 insertions(+), 67 deletions(-)
--
2.46.0.469.g59c65b2a67-goog
From: Jason Xing <kernelxing(a)tencent.com>
When I compiled the tools/testing/selftests/bpf, the following error
pops out:
uprobe_multi.c: In function ‘trigger_uprobe’:
uprobe_multi.c:109:26: error: ‘MADV_PAGEOUT’ undeclared (first use in this function); did you mean ‘MADV_RANDOM’?
madvise(addr, page_sz, MADV_PAGEOUT);
^~~~~~~~~~~~
MADV_RANDOM
Including the <linux/linux/mman.h> header file solves this compilation error.
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
---
v2
Link: https://lore.kernel.org/all/20241020031422.46894-1-kerneljasonxing@gmail.co…
1. handle it in a proper way
---
tools/testing/selftests/bpf/uprobe_multi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
index c7828b13e5ff..40231f02b95d 100644
--- a/tools/testing/selftests/bpf/uprobe_multi.c
+++ b/tools/testing/selftests/bpf/uprobe_multi.c
@@ -4,6 +4,7 @@
#include <string.h>
#include <stdbool.h>
#include <stdint.h>
+#include <linux/mman.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sdt.h>
--
2.37.3
From: Jason Xing <kernelxing(a)tencent.com>
When I compiled the tools/testing/selftests/bpf, the following error
pops out:
uprobe_multi.c: In function ‘trigger_uprobe’:
uprobe_multi.c:109:26: error: ‘MADV_PAGEOUT’ undeclared (first use in this function); did you mean ‘MADV_RANDOM’?
madvise(addr, page_sz, MADV_PAGEOUT);
^~~~~~~~~~~~
MADV_RANDOM
We can see MADV_PAGEOUT existing in mman-common.h on x86 arch, so
including this header file solves this compilation error.
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
---
tools/testing/selftests/bpf/uprobe_multi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
index c7828b13e5ff..b0e11ffe0e1c 100644
--- a/tools/testing/selftests/bpf/uprobe_multi.c
+++ b/tools/testing/selftests/bpf/uprobe_multi.c
@@ -5,6 +5,7 @@
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
+#include <mman-common.h>
#include <unistd.h>
#include <sdt.h>
--
2.37.3
The PSCI v1.3 spec (https://developer.arm.com/documentation/den0022)
adds support for a SYSTEM_OFF2 function enabling a HIBERNATE_OFF state
which is analogous to ACPI S4. This will allow hosting environments to
determine that a guest is hibernated rather than just powered off, and
ensure that they preserve the virtual environment appropriately to
allow the guest to resume safely (or bump the hardware_signature in the
FACS to trigger a clean reboot instead).
This updates KVM to support advertising PSCI v1.3, and unconditionally
enables the SYSTEM_OFF2 support when PSCI v1.3 is enabled.
For the guest side, add a new SYS_OFF_MODE_POWER_OFF handler with higher
priority than the EFI one, but which *only* triggers when there's a
hibernation in progress. There are other ways to do this (see the commit
message for more details) but this seemed like the simplest.
Version 2 of the patch series splits out the psci.h definitions into a
separate commit (a dependency for both the guest and KVM side), and adds
definitions for the other new functions added in v1.3. It also moves the
pKVM psci-relay support to a separate commit; although in arch/arm64/kvm
that's actually about the *guest* side of SYSTEM_OFF2 (i.e. using it
from the host kernel, relayed through nVHE).
Version 3 dropped the KVM_CAP which allowed userspace to explicitly opt
in to the new feature like with SYSTEM_SUSPEND, and makes it depend only
on PSCI v1.3 being exposed to the guest.
Version 4 is no longer RFC, as the PSCI v1.3 spec is finally published.
Minor fixes from the last round of review, and an added KVM self test.
Version 5 drops some of the changes which didn't make it to the final
v1.3 spec, and cleans up a couple of places which still referred to it
as 'alpha' or 'beta'. It also temporarily drops the guest-side patch to
invoke SYSTEM_OFF2 for hibernation, pending confirmation that the final
PSCI v1.3 spec just has a typo where it changed to saying that 0x1
should be passed to mean HIBERNATE_OFF, even though it's advertised as
bit 0. That can be sent under separate cover, and perhaps should have
been anyway. The change in question doesn't matter for any of the KVM
patches, because we just treat SYSTEM_OFF2 like the existing
SYSTEM_RESET2, setting a flag to indicate that it was a SYSTEM_OFF2
call, but not actually caring about the argument; that's for userspace
to worry about.
David Woodhouse (5):
firmware/psci: Add definitions for PSCI v1.3 specification
KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
KVM: arm64: Add support for PSCI v1.2 and v1.3
KVM: selftests: Add test for PSCI SYSTEM_OFF2
KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
Documentation/virt/kvm/api.rst | 11 +++++
arch/arm64/include/uapi/asm/kvm.h | 6 +++
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 2 +
arch/arm64/kvm/hypercalls.c | 2 +
arch/arm64/kvm/psci.c | 43 ++++++++++++++++-
include/kvm/arm_psci.h | 4 +-
include/uapi/linux/psci.h | 5 ++
tools/testing/selftests/kvm/aarch64/psci_test.c | 61 +++++++++++++++++++++++++
8 files changed, 132 insertions(+), 2 deletions(-)
Hello,
this series aims to bring test_tcp_check_syncookie.sh scope into
test_progs to make sure that the corresponding tests are also run
automatically in CI. This script tests for bpf_tcp_{gen,check}_syncookie
and bpf_skc_lookup_tcp, in different contexts (ipv4, v6 or dual, and
with tc and xdp programs).
Some other tests like btf_skc_cls_ingress have some overlapping tests with
test_tcp_check_syncookie.sh, so this series moves the missing bits from
test_tcp_check_syncookie.sh into btf_skc_cls_ingress, which is already
integrated into test_progs.
- the first three commits bring some minor improvements to
btf_skc_cls_ingress without changing its testing scope
- fourth and fifth commits bring test_tcp_check_syncookie.sh features
into btf_skc_cls_ingress
- last commit removes test_tcp_check_syncookie.sh
The only topic for which I am not sure for this integration is the
necessity or not to run the tests with different program types:
test_tcp_check_syncookie.sh runs tests with both tc and xdp programs, but
btf_skc_cls_ingress currently tests those helpers only with a tc
program. Would it make sense to also make sure that btf_skc_cls_ingress
is tested with all the programs types supported by those helpers ?
The series has been tested both in CI and in a local x86_64 qemu
environment:
# ./test_progs -a btf_skc_cls_ingress
#38/1 btf_skc_cls_ingress/conn_ipv4:OK
#38/2 btf_skc_cls_ingress/conn_ipv6:OK
#38/3 btf_skc_cls_ingress/conn_dual:OK
#38/4 btf_skc_cls_ingress/syncookie_ipv4:OK
#38/5 btf_skc_cls_ingress/syncookie_ipv6:OK
#38/6 btf_skc_cls_ingress/syncookie_dual:OK
#38 btf_skc_cls_ingress:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (6):
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: remove test_tcp_check_syncookie
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 9 +-
.../selftests/bpf/prog_tests/btf_skc_cls_ingress.c | 265 +++++++++++++--------
.../selftests/bpf/progs/test_btf_skc_cls_ingress.c | 83 +++++--
.../bpf/progs/test_tcp_check_syncookie_kern.c | 167 -------------
.../selftests/bpf/test_tcp_check_syncookie.sh | 85 -------
.../selftests/bpf/test_tcp_check_syncookie_user.c | 213 -----------------
7 files changed, 222 insertions(+), 601 deletions(-)
---
base-commit: 030207b7fce8bad6827615cfc2c6592916e2c336
change-id: 20241015-syncookie-ea7686264586
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Hi Linus,
Please pull the following kselftest fixes update for Linux 6.12-rc4.
-- fixes test makefile to install tests directory without which
the test fails with errors.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 4ee5ca9a29384fcf3f18232fdf8474166dea8dca:
ftrace/selftest: Test combination of function_graph tracer and function profiler
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.12-rc4
for you to fetch changes up to fe05c40ca9c18cfdb003f639a30fc78a7ab49519:
selftest: hid: add the missing tests directory (2024-10-16 15:55:14 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.12-rc4
kselftest fixes for Linux 6.12-rc4
-- fixes test makefile to install tests directory without which
the test fails with errors.
----------------------------------------------------------------
Yun Lu (1):
selftest: hid: add the missing tests directory
tools/testing/selftests/hid/Makefile | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------
Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
The current means by which these are implemented is via a PROT_NONE mmap()
mapping, which provides the required semantics however incur an overhead of
a VMA for each such region.
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
This series takes a different approach - an idea suggested by Vlasimil
Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
provenance becomes a little tricky to ascertain after this - please forgive
any omissions!) - rather than locating the guard pages at the VMA layer,
instead placing them in page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5 times
speed up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle android
system and unoptimised code.
We expect with optimisation and a loaded system with a larger number of
guard pages this could significantly increase, but in any case these
numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, instead we have a VMA spanning the entire range of
memory a user is permitted to access and including ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the mappings being torn down when the
mappings are ultimately unmapped and everything works simply as if the
memory were not faulted in, from the point of view of the containing VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
Any poisoned region of memory is also able to 'unpoisoned', that is, to
have its poison markers removed.
The mechanism is implemented via madvise() behaviour - MADV_GUARD_POISON
which simply poisons ranges - and MADV_GUARD_UNPOISON - which clears this
poisoning.
Poisoning can be performed across multiple VMAs and any existing mappings
will be cleared, that is zapped, before installing the poisoned page table
mappings.
There is no concept of 'nested' poisoning, multiple attempts to poison a
range will, after the first poisoning, have no effect.
Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
memory, so a user can safely unpoison a range of memory and clear only
poison page table mappings leaving the rest intact.
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the poisoning if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests are provided which ensure behaviour is as
expected and additionally self-documents expected behaviour of poisoned
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suze.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
v1
* Un-RFC'd as appears no major objections to approach but rather debate on
implementation.
* Fixed issue with arches which need mmu_context.h and
tlbfush.h. header imports in pagewalker logic to be able to use
update_mmu_cache() as reported by the kernel test bot.
* Added comments in page walker logic to clarify who can use
ops->install_pte and why as well as adding a check_ops_valid() helper
function, as suggested by Christoph.
* Pass false in full parameter in pte_clear_not_present_full() as suggested
by Jann.
* Stopped erroneously requiring a write lock for the poison operation as
suggested by Jann and Suren.
* Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
consistent with how this is used elsewhere in the kernel as suggested by
Jann.
* Avoid returning -EAGAIN if we are raced on page faults, just keep looping
and duck out if a fatal signal is pending or a conditional reschedule is
needed, as suggested by Jann.
* Avoid needlessly splitting huge PUDs and PMDs by specifying
ACTION_CONTINUE, as suggested by Jann.
RFC
https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (4):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 26 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 3 +
mm/internal.h | 6 +
mm/madvise.c | 168 ++++
mm/memory.c | 18 +-
mm/mprotect.c | 3 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 200 ++--
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1168 ++++++++++++++++++++++
18 files changed, 1564 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.46.2
On Android arm, pthread_create followed by a fork caused a deadlock in
the case where the fork required work to be completed by the created
thread.
The previous patches incorrectly assumed that the parent would
always initialize the pthread_barrier for the child thread. This
reverts the change and replaces the fix for wp-fork-with-event with the
original use of atomic_bool.
Edward Liaw (3):
Revert "selftests/mm: fix deadlock for fork after pthread_create on
ARM"
Revert "selftests/mm: replace atomic_bool with pthread_barrier_t"
selftests/mm: fix deadlock for fork after pthread_create with
atomic_bool
tools/testing/selftests/mm/uffd-common.c | 5 ++--
tools/testing/selftests/mm/uffd-common.h | 3 ++-
tools/testing/selftests/mm/uffd-unit-tests.c | 24 ++++++++------------
3 files changed, 14 insertions(+), 18 deletions(-)
--
2.47.0.105.g07ac214952-goog
Changes since V2:
- V2: https://lore.kernel.org/all/cover.1726164080.git.reinette.chatre@intel.com/
- Add fix to protect against buffer overflow when parsing text from sysfs files.
- Add cleanup patch to address use of magic constants as pointed out by
Ilpo.
- Add Reviewed-by tags where received, except for "selftests/resctrl: Use cache
size to determine "fill_buf" buffer size" that changed too much since
receiving the Reviewed-by tag.
- Please see individual patches for detailed changes.
Changes since V1:
- V1: https://lore.kernel.org/cover.1724970211.git.reinette.chatre@intel.com/
- V2 contains the same general solutions to stated problem as V1 but these
are now preceded by more fixes (patches 1 to 5) and improved robustness
(patches 6 to 9) to existing tests before the series gets back
to solving the original problem with more confidence in patches 10 to 13.
- The posibility of making "memflush = false" for CMT test was discussed
during V1. Modifying this setting does not have a significant impact on the
observed results that are already well within acceptable range and this
version thus keeps original default. If performance was a goal it may
be possible to do further experimentation where "memflush = false" could
eliminate the need for the sleep(1) within the test wrapper, but
improving the performance is not a goal of this work.
- (New) Support what seems to be unintended ability for user space to provide
parameters to "fill_buf" by making the parsing robust and only support
changing parameters that are supported to be changed. Drop support for
"write" operation since it has never been measured.
- (New) Improve wraparound handling. (Ilpo)
- (New) A couple of new fixes addressing issues discovered during development.
- (Change from V1) To support fill_buf parameters provided by user space as
well as test specific fill_buf parameters struct fill_buf_param is no longer
just a member of struct resctrl_val_param, instead there could be at most
two instances of struct fill_buf_param, the immutable parameters provided
by user space and the parameters used by individual tests. (Ilpo)
- Please see individual patches for detailed changes.
V1 cover:
The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory
Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald
Rapids systems. The test failures result from the following two
properties of these systems:
1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl
MBA and MBM selftests measure memory traffic for which a hardcoded
250MB buffer has been sufficient so far. On platforms with L3 cache
larger than the buffer, the buffer fits in the L3 cache and thus
no/very little memory traffic is generated during the "memory
bandwidth" tests.
2) Some platform features, for example RAS features or memory
performance features that generate memory traffic may drive accesses
that are counted differently by performance counters and MBM
respectively, for instance generating "overhead" traffic which is not
counted against any specific RMID. Until now these counting
differences have always been "in the noise". On Emerald Rapids
systems the maximum MBA throttling (10% memory bandwidth)
throttles memory bandwidth to where memory accesses by these other
platform features push the memory bandwidth difference between
memory controller performance counters and resctrl (MBM) beyond the
tests' hardcoded tolerance.
Make the tests more robust against platform variations:
1) Let the buffer used by memory bandwidth tests be guided by the size
of the L3 cache.
2) Larger buffers require longer initialization time before the buffer can
be used to measurement. Rework the tests to ensure that buffer
initialization is complete before measurements start.
3) Do not compare performance counters and MBM measurements at low
bandwidth. The value of "low" is hardcoded to 750MiB based on
measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake
systems. This limit is not applicable to AMD systems since it
only applies to the MBA and MBM tests that are isolated to Intel.
[1]
https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-plat…
Reinette Chatre (15):
selftests/resctrl: Make functions only used in same file static
selftests/resctrl: Print accurate buffer size as part of MBM results
selftests/resctrl: Fix memory overflow due to unhandled wraparound
selftests/resctrl: Protect against array overrun during iMC config
parsing
selftests/resctrl: Protect against array overflow when reading strings
selftests/resctrl: Make wraparound handling obvious
selftests/resctrl: Remove "once" parameter required to be false
selftests/resctrl: Only support measured read operation
selftests/resctrl: Remove unused measurement code
selftests/resctrl: Make benchmark parameter passing robust
selftests/resctrl: Ensure measurements skip initialization of default
benchmark
selftests/resctrl: Use cache size to determine "fill_buf" buffer size
selftests/resctrl: Do not compare performance counters and resctrl at
low bandwidth
selftests/resctrl: Keep results from first test run
selftests/resctrl: Replace magic constants used as array size
tools/testing/selftests/resctrl/cmt_test.c | 37 +-
tools/testing/selftests/resctrl/fill_buf.c | 45 +-
tools/testing/selftests/resctrl/mba_test.c | 54 ++-
tools/testing/selftests/resctrl/mbm_test.c | 37 +-
tools/testing/selftests/resctrl/resctrl.h | 79 +++-
.../testing/selftests/resctrl/resctrl_tests.c | 95 +++-
tools/testing/selftests/resctrl/resctrl_val.c | 447 +++++-------------
tools/testing/selftests/resctrl/resctrlfs.c | 19 +-
8 files changed, 354 insertions(+), 459 deletions(-)
--
2.46.2
On Android arm, pthread_create followed by a fork caused a deadlock in
the case where the fork required work to be completed by the created
thread.
Updated the synchronization primitive to use pthread_barrier instead of
atomic_bool.
Applied the same fix to the wp-fork-with-event test.
Edward Liaw (2):
selftests/mm: replace atomic_bool with pthread_barrier_t
selftests/mm: fix deadlock for fork after pthread_create on ARM
tools/testing/selftests/mm/uffd-common.c | 5 +++--
tools/testing/selftests/mm/uffd-common.h | 3 +--
tools/testing/selftests/mm/uffd-unit-tests.c | 21 ++++++++++++++------
3 files changed, 19 insertions(+), 10 deletions(-)
--
2.46.1.824.gd892dcdcdd-goog
If the kunit being run generates a WARN for some reason kunit.py ignores it
and declares the tested PASSED. This is very much not desirable, as tests that
are hitting WARN's are probably actually failing.
Take the simple approach to reducing this by setting panic_on_warn when
running the kernel. The kernel crashes and kunit.py shows the WARN and reports
the test fails.
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
---
tools/testing/kunit/kunit_kernel.py | 2 ++
1 file changed, 2 insertions(+)
I saw there was an earlier series working to make tests that deliberately made
WARNs not do that, so this would be consistent with that idea, tests should not
make WARNs, and WARNs should not be ignored..
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index 61931c4926fd66..7a4228568dd73c 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -342,6 +342,8 @@ class LinuxSourceTree:
if filter_action:
args.append('kunit.filter_action=' + filter_action)
args.append('kunit.enable=1')
+ args.append('panic_on_warn=1')
+ args.append('panic=-1')
process = self._ops.start(args, build_dir)
assert process.stdout is not None # tell mypy it's set
base-commit: 2872987b1d009df556c0061ecdeede6a5f9bf42c
--
2.46.2
1. In order to make rtctest more explicit and robust, we propose to use
RTC_PARAM_GET ioctl interface to check rtc alarm feature state before
running alarm related tests.
2. The rtctest requires the read permission on /dev/rtc0. The rtctest will
be skipped if the /dev/rtc0 is not readable.
Joseph Jang (2):
selftest: rtc: Add to check rtc alarm status for alarm related test
selftest: rtc: Check if could access /dev/rtc0 before testing
tools/testing/selftests/rtc/Makefile | 2 +-
tools/testing/selftests/rtc/rtctest.c | 71 ++++++++++++++++++++++++++-
2 files changed, 71 insertions(+), 2 deletions(-)
--
2.34.1
Introduce a new kselftest to identify slowdowns in key boot events.
This test uses ftrace to monitor the start and end times, as well as
the durations of all initcalls, and compares these timings to reference
values to identify significant slowdowns.
The script functions in two modes: the 'generate' mode allows to create
a JSON file containing initial reference timings for all initcalls from
a known stable kernel. The 'test' mode can be used during subsequent
boots to assess current timings against the reference values and
determine if there are any significant differences.
The test ships with a bootconfig file for setting up ftrace and a
configuration fragment for the necessary kernel configs.
Signed-off-by: Laura Nao <laura.nao(a)collabora.com>
---
Hello,
This v2 is a follow-up to RFCv1[1] and includes changes based on feedback
from the LPC 2024 session [2], along with some other fixes.
[1] https://lore.kernel.org/all/20240725110622.96301-1-laura.nao@collabora.com/
[2] https://www.youtube.com/watch?v=rWhW2-Vzi40
After reviewing other available tests and considering the feedback from
discussions at Plumbers, I decided to stick with the bootconfig file
approach but extend it to track all initcalls instead of a fixed set of
functions or events. The bootconfig file can be expanded and adapted to
track additional functions if needed for specific use cases.
I also defined a synthetic event to calculate initcall durations, while
still tracking their start and end times. Users are then allowed to choose
whether to compare start times, end times, or durations. Support for
specifying different rules for different initcalls has also been added.
In RFCv1, there was some discussion about using existing tools like
bootgraph.py. However, the output from these tools is mainly for manual
inspection (e.g., HTML visual output), whereas this test is designed to run
in automated CI environments too. The kselftest proposed here combines the
process of generating reference data and running tests into a single script
with two modes, making it easy to integrate into automated workflows.
Many of the features in this v2 (e.g., generating a JSON reference file,
comparing timings, and reporting results in KTAP format) could potentially
be integrated into bootgraph.py with some effort.
However, since this test is intended for automated execution rather than
manual use, I've decided to keep it separate for now and explore the
options suggested at LPC, such as using ftrace histograms for initcall
latencies. I'm open to revisiting this decision and working toward
integrating the changes into bootgraph.py if there's a strong preference
for unifying the tools.
Let me know your thoughts.
A comprehensive changelog is reported below.
Thanks,
Laura
---
Changes in v2:
- Updated ftrace configuration to track all initcall start times, end
times, and durations, and generate a histogram.
- Modified test logic to compare initcall durations by default, with the
option to compare start or end times if needed.
- Added warnings if the initcalls in the reference file differ from those
detected in the running system.
- Combined the scripts into a single script with two modes: one for
generating the reference file and one for running the test.
- Added support for specifying different rules for individual initcalls.
- Switched the reference format from YAML to JSON.
- Added metadata to the reference file, including kernel version, kernel
configuration, and cmdline.
- Link to v1: https://lore.kernel.org/all/20240725110622.96301-1-laura.nao@collabora.com/
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/boot-time/Makefile | 16 ++
tools/testing/selftests/boot-time/bootconfig | 15 +
tools/testing/selftests/boot-time/config | 6 +
.../selftests/boot-time/test_boot_time.py | 265 ++++++++++++++++++
5 files changed, 303 insertions(+)
create mode 100644 tools/testing/selftests/boot-time/Makefile
create mode 100644 tools/testing/selftests/boot-time/bootconfig
create mode 100644 tools/testing/selftests/boot-time/config
create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b38199965f99..1bb20d1e3854 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -3,6 +3,7 @@ TARGETS += acct
TARGETS += alsa
TARGETS += amd-pstate
TARGETS += arm64
+TARGETS += boot-time
TARGETS += bpf
TARGETS += breakpoints
TARGETS += cachestat
diff --git a/tools/testing/selftests/boot-time/Makefile b/tools/testing/selftests/boot-time/Makefile
new file mode 100644
index 000000000000..cdcdc1bbe779
--- /dev/null
+++ b/tools/testing/selftests/boot-time/Makefile
@@ -0,0 +1,16 @@
+PY3 = $(shell which python3 2>/dev/null)
+
+ifneq ($(PY3),)
+
+TEST_PROGS := test_boot_time.py
+
+include ../lib.mk
+
+else
+
+all: no_py3_warning
+
+no_py3_warning:
+ @echo "Missing python3. This test will be skipped."
+
+endif
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/bootconfig b/tools/testing/selftests/boot-time/bootconfig
new file mode 100644
index 000000000000..e4b89a33b7a3
--- /dev/null
+++ b/tools/testing/selftests/boot-time/bootconfig
@@ -0,0 +1,15 @@
+ftrace.event {
+ synthetic.initcall_latency {
+ # Synthetic event to record initcall latency, start, and end times
+ fields = "unsigned long func", "u64 lat", "u64 start", "u64 end"
+ actions = "hist:keys=func.sym,start,end:vals=lat:sort=lat"
+ }
+ initcall.initcall_start {
+ # Capture the start time (ts0) when initcall starts
+ actions = "hist:keys=func:ts0=common_timestamp.usecs"
+ }
+ initcall.initcall_finish {
+ # Capture the end time, calculate latency, and trigger synthetic event
+ actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:start=$ts0:end=common_timestamp.usecs:onmatch(initcall.initcall_start).initcall_latency(func,$lat,$start,$end)"
+ }
+}
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/config b/tools/testing/selftests/boot-time/config
new file mode 100644
index 000000000000..bcb646ec3cd8
--- /dev/null
+++ b/tools/testing/selftests/boot-time/config
@@ -0,0 +1,6 @@
+CONFIG_TRACING=y
+CONFIG_BOOTTIME_TRACING=y
+CONFIG_BOOT_CONFIG_EMBED=y
+CONFIG_BOOT_CONFIG_EMBED_FILE="tools/testing/selftests/boot-time/bootconfig"
+CONFIG_SYNTH_EVENTS=y
+CONFIG_HIST_TRIGGERS=y
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/test_boot_time.py b/tools/testing/selftests/boot-time/test_boot_time.py
new file mode 100755
index 000000000000..556dacf04b6d
--- /dev/null
+++ b/tools/testing/selftests/boot-time/test_boot_time.py
@@ -0,0 +1,265 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2024 Collabora Ltd
+#
+# This script reads the
+# /sys/kernel/debug/tracing/events/synthetic/initcall_latency/hist file,
+# extracts function names and timings, and compares them against reference
+# timings provided in an input JSON file to identify significant boot
+# slowdowns.
+# The script operates in two modes:
+# - Generate Mode: parses initcall timings from the current kernel's ftrace
+# event histogram and generates a JSON reference file with function
+# names, start times, end times, and latencies.
+# - Test Mode: compares current initcall timings against the reference
+# file, allowing users to define a maximum allowed difference between the
+# values (delta). Users can also apply custom delta thresholds for
+# specific initcalls using regex-based overrides. The comparison can be
+# done on latency, start, or end times.
+#
+
+import os
+import sys
+import argparse
+import gzip
+import json
+import re
+import subprocess
+
+this_dir = os.path.dirname(os.path.realpath(__file__))
+sys.path.append(os.path.join(this_dir, "../kselftest/"))
+
+import ksft
+
+def load_reference_from_json(file_path):
+ """
+ Load reference data from a JSON file and returns the parsed data.
+ @file_path: path to the JSON file.
+ """
+
+ try:
+ with open(file_path, 'r', encoding="utf-8") as file:
+ return json.load(file)
+ except FileNotFoundError:
+ ksft.print_msg(f"Error: File {file_path} not found.")
+ ksft.exit_fail()
+ except json.JSONDecodeError:
+ ksft.print_msg(f"Error: Failed to decode JSON from {file_path}.")
+ ksft.exit_fail()
+
+
+def mount_debugfs(path):
+ """
+ Mount debugfs at the specified path if it is not already mounted.
+ @path: path where debugfs should be mounted
+ """
+ # Check if debugfs is already mounted
+ with open('/proc/mounts', 'r', encoding="utf-8") as mounts:
+ for line in mounts:
+ if 'debugfs' in line and path in line:
+ print(f"debugfs is already mounted at {path}")
+ return True
+
+ # Mount debugfs
+ try:
+ subprocess.run(['mount', '-t', 'debugfs', 'none', path], check=True)
+ return True
+ except subprocess.CalledProcessError as e:
+ print(f"Failed to mount debugfs: {e.stderr}")
+ return False
+
+
+def ensure_unique_function_name(func, initcall_entries):
+ """
+ Ensure the function name is unique by appending a suffix if necessary.
+ @func: the original function name.
+ @initcall_entries: a dictionary containing parsed initcall entries.
+ """
+ i = 2
+ base_func = func
+ while func in initcall_entries:
+ func = f'{base_func}[{i}]'
+ i += 1
+ return func
+
+
+def parse_initcall_latency_hist():
+ """
+ Parse the ftrace histogram for the initcall_latency event, extracting
+ function names, start times, end times, and latencies. Return a
+ dictionary where each entry is structured as follows:
+ {
+ <function symbolic name>: {
+ "start": <start time>,
+ "end": <end time>,
+ "latency": <latency>
+ }
+ }
+ """
+
+ pattern = re.compile(r'\{ func: \[\w+\] ([\w_]+)\s*, start: *(\d+), end: *(\d+) \} hitcount: *\d+ lat: *(\d+)')
+ initcall_entries = {}
+
+ try:
+ with open('/sys/kernel/debug/tracing/events/synthetic/initcall_latency/hist', 'r', encoding="utf-8") as hist_file:
+ for line in hist_file:
+ match = pattern.search(line)
+ if match:
+ func = match.group(1).strip()
+ start = int(match.group(2))
+ end = int(match.group(3))
+ latency = int(match.group(4))
+
+ # filter out unresolved names
+ if not func.startswith("0x"):
+ func = ensure_unique_function_name(func, initcall_entries)
+
+ initcall_entries[func] = {
+ "start": start,
+ "end": end,
+ "latency": latency
+ }
+ except FileNotFoundError:
+ print("Error: Histogram file not found.")
+
+ return initcall_entries
+
+
+def compare_initcall_list(ref_initcall_entries, cur_initcall_entries):
+ """
+ Compare the current list of initcall functions against the reference
+ file. Print warnings if there are unique entries in either.
+ @ref_initcall_entries: reference initcall entries.
+ @cur_initcall_entries: current initcall entries.
+ """
+ ref_entries = set(ref_initcall_entries.keys())
+ cur_entries = set(cur_initcall_entries.keys())
+
+ unique_to_ref = ref_entries - cur_entries
+ unique_to_cur = cur_entries - ref_entries
+
+ if (unique_to_ref):
+ ksft.print_msg(
+ f"Warning: {list(unique_to_ref)} not found in current data. Consider updating reference file.")
+ if unique_to_cur:
+ ksft.print_msg(
+ f"Warning: {list(unique_to_cur)} not found in reference data. Consider updating reference file.")
+
+
+def run_test(ref_file_path, delta, overrides, mode):
+ """
+ Run the test comparing the current timings with the reference values.
+ @ref_file_path: path to the JSON file containing reference values.
+ @delta: default allowed difference between reference and current
+ values.
+ @overrides: override rules in the form of regex:threshold.
+ @mode: the comparison mode (either 'start', 'end', or 'latency').
+ """
+
+ ref_data = load_reference_from_json(ref_file_path)
+
+ ref_initcall_entries = ref_data['data']
+ cur_initcall_entries = parse_initcall_latency_hist()
+
+ compare_initcall_list(ref_initcall_entries, cur_initcall_entries)
+
+ ksft.set_plan(len(ref_initcall_entries))
+
+ for func_name in ref_initcall_entries:
+ effective_delta = delta
+ for regex, override_delta in overrides.items():
+ if re.match(regex, func_name):
+ effective_delta = override_delta
+ break
+ if (func_name in cur_initcall_entries):
+ ref_metric = ref_initcall_entries[func_name].get(mode)
+ cur_metric = cur_initcall_entries[func_name].get(mode)
+ if (cur_metric > ref_metric and (cur_metric - ref_metric) >= effective_delta):
+ ksft.test_result_fail(func_name)
+ ksft.print_msg(f"'{func_name}' {mode} differs by "
+ f"{(cur_metric - ref_metric)} usecs.")
+ else:
+ ksft.test_result_pass(func_name)
+ else:
+ ksft.test_result_skip(func_name)
+
+
+def generate_reference_file(file_path):
+ """
+ Generate a reference file in JSON format, containing kernel metadata
+ and initcall timing data.
+ @file_path: output file path.
+ """
+ metadata = {}
+
+ config_file = "/proc/config.gz"
+ if os.path.isfile(config_file):
+ with gzip.open(config_file, "rt", encoding="utf-8") as f:
+ config = f.read()
+ metadata["config"] = config
+
+ metadata["version"] = os.uname().release
+
+ cmdline_file = "/proc/cmdline"
+ if os.path.isfile(cmdline_file):
+ with open(cmdline_file, "r", encoding="utf-8") as f:
+ cmdline = f.read().strip()
+ metadata["cmdline"] = cmdline
+
+ ref_data = {
+ "metadata": metadata,
+ "data": parse_initcall_latency_hist(),
+ }
+
+ with open(file_path, "w", encoding='utf-8') as f:
+ json.dump(ref_data, f, indent=4)
+ print(f"Generated {file_path}")
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(
+ description="")
+
+ subparsers = parser.add_subparsers(dest='mode', required=True, help='Choose between generate or test modes')
+
+ generate_parser = subparsers.add_parser('generate', help="Generate a reference file")
+ generate_parser.add_argument('out_ref_file', nargs='?', default='reference_initcall_timings.json',
+ help='Path to output JSON reference file (default: reference_initcall_timings.json)')
+
+ compare_parser = subparsers.add_parser('test', help='Test against a reference file')
+ compare_parser.add_argument('in_ref_file', help='Path to JSON reference file')
+ compare_parser.add_argument(
+ 'delta', type=int, help='Maximum allowed delta between the current and the reference timings (usecs)')
+ compare_parser.add_argument('--override', '-o', action='append', type=str,
+ help="Specify regex-based rules as regex:delta (e.g., '^acpi_.*:50')")
+ compare_parser.add_argument('--mode', '-m', default='latency', choices=[
+ 'start', 'end', 'latency'],
+ help="Comparison mode: 'latency' (default) for latency, 'start' for start times, or 'end' for end times.")
+
+ args = parser.parse_args()
+
+ if args.mode == 'generate':
+ generate_reference_file(args.out_ref_file)
+ sys.exit(0)
+
+ # Process overrides
+ overrides = {}
+ if args.override:
+ for override in args.override:
+ try:
+ pattern, delta = override.split(":")
+ overrides[pattern] = int(delta)
+ except ValueError:
+ print(f"Invalid override format: {override}. Expected format is 'regex:delta'.")
+ sys.exit(1)
+
+ # Ensure debugfs is mounted
+ if not mount_debugfs("/sys/kernel/debug"):
+ ksft.exit_fail()
+
+ ksft.print_header()
+
+ run_test(args.in_ref_file, args.delta, overrides, args.mode)
+
+ ksft.finished()
--
2.30.2
This patch series adds a some not yet picked selftests to the kvm s390x
selftest suite.
The additional test cases are covering:
* Assert KVM_EXIT_S390_UCONTROL exit on not mapped memory access
* Assert functionality of storage keys in ucontrol VM
* Assert that memory region operations are rejected for ucontrol VMs
Running the test cases requires sys_admin capabilities to start the
ucontrol VM.
This can be achieved by running as root or with a command like:
sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
./ucontrol_test
---
The patches in this series have been part of the previous patch series.
The test cases added here do depend on the fixture added in the earlier
patches.
From v5 PATCH 7-9 the segment and page table generation has been removed
and DAT
has been disabled. Since DAT is not necessary to validate the KVM code.
https://lore.kernel.org/kvm/20240807154512.316936-1-schlameuss@linux.ibm.co…
v6:
- add instruction intercept handling for skey specific instructions
(iske, sske, rrbe) in addition to kss intercept to work properly on
all machines
- reorder local variables
- fixup some method comments
- add a patch correcting the IP.b value length a debug message
v5:
- rebased to current upstream master
- corrected assertion on 0x00 to 0
- reworded fixup commit so that it can be merged on top of current
upstream
v4:
- fix whitespaces in pointer function arguments (thanks Claudio)
- fix whitespaces in comments (thanks Janosch)
v3:
- fix skey assertion (thanks Claudio)
- introduce a wrapper around UCAS map and unmap ioctls to improve
readability (Claudio)
- add an displacement to accessed memory to assert translation
intercepts actually point to segments to the uc_map_unmap test
- add an misaligned failing mapping try to the uc_map_unmap test
v2:
- Reenable KSS intercept and handle it within skey test.
- Modify the checked register between storing (sske) and reading (iske)
it within the test program to make sure the.
- Add an additional state assertion in the end of uc_skey
- Fix some typos and white spaces.
v1:
- Remove segment and page table generation and disable DAT. This is not
necessary to validate the KVM code.
Christoph Schlameuss (5):
selftests: kvm: s390: Add uc_map_unmap VM test case
selftests: kvm: s390: Add uc_skey VM test case
selftests: kvm: s390: Verify reject memory region operations for
ucontrol VMs
selftests: kvm: s390: Fix whitespace confusion in ucontrol test
selftests: kvm: s390: correct IP.b length in uc_handle_sieic debug
output
.../selftests/kvm/include/s390x/processor.h | 6 +
.../selftests/kvm/s390x/ucontrol_test.c | 307 +++++++++++++++++-
2 files changed, 305 insertions(+), 8 deletions(-)
base-commit: eca631b8fe808748d7585059c4307005ca5c5820
--
2.47.0
virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Introduce the code to compute hashes to the kernel in order to overcome
thse challenges.
An alternative solution is to extend the eBPF steering program so that it
will be able to report to the userspace, but it is based on context
rewrites, which is in feature freeze. We can adopt kfuncs, but they will
not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
and vhost_net).
The patches for QEMU to use this new feature was submitted as RFC and
is available at:
https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
This work was presented at LPC 2024:
https://lpc.events/event/18/contributions/1963/
V1 -> V2:
Changed to introduce a new BPF program type.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v5:
- Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE.
- Optimized the calculation of the hash value according to:
https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca…
- Added patch "tun: Unify vnet implementation".
- Dropped patch "tap: Pad virtio header with zero".
- Added patch "selftest: tun: Test vnet ioctls without device".
- Reworked selftests to skip for older kernels.
- Documented the case when the underlying device is deleted and packets
have queue_mapping set by TC.
- Reordered test harness arguments.
- Added code to handle fragmented packets.
- Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com
Changes in v4:
- Moved tun_vnet_hash_ext to if_tun.h.
- Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc().
- Replaced htons() with cpu_to_be16().
- Changed virtio_net_hash_rss() to return void.
- Reordered variable declarations in virtio_net_hash_rss().
- Removed virtio_net_hdr_v1_hash_from_skb().
- Updated messages of "tap: Pad virtio header with zero" and
"tun: Pad virtio header with zero".
- Fixed vnet_hash allocation size.
- Ensured to free vnet_hash when destructing tun_struct.
- Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com
Changes in v3:
- Reverted back to add ioctl.
- Split patch "tun: Introduce virtio-net hashing feature" into
"tun: Introduce virtio-net hash reporting feature" and
"tun: Introduce virtio-net RSS".
- Changed to reuse hash values computed for automq instead of performing
RSS hashing when hash reporting is requested but RSS is not.
- Extracted relevant data from struct tun_struct to keep it minimal.
- Added kernel-doc.
- Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF.
- Initialized num_buffers with 1.
- Added a test case for unclassified packets.
- Fixed error handling in tests.
- Changed tests to verify that the queue index will not overflow.
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com
---
Akihiko Odaki (10):
virtio_net: Add functions for hashing
skbuff: Introduce SKB_EXT_TUN_VNET_HASH
net: flow_dissector: Export flow_keys_dissector_symmetric
tun: Unify vnet implementation
tun: Pad virtio header with zero
tun: Introduce virtio-net hash reporting feature
tun: Introduce virtio-net RSS
selftest: tun: Test vnet ioctls without device
selftest: tun: Add tests for virtio-net hashing
vhost/net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/networking/tuntap.rst | 7 +
MAINTAINERS | 1 +
drivers/net/Kconfig | 1 +
drivers/net/tap.c | 218 ++++--------
drivers/net/tun.c | 293 ++++++----------
drivers/net/tun_vnet.h | 342 +++++++++++++++++++
drivers/vhost/net.c | 16 +-
include/linux/if_tap.h | 2 +
include/linux/skbuff.h | 3 +
include/linux/virtio_net.h | 188 +++++++++++
include/net/flow_dissector.h | 1 +
include/uapi/linux/if_tun.h | 75 +++++
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 4 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tun.c | 630 ++++++++++++++++++++++++++++++++++-
16 files changed, 1430 insertions(+), 356 deletions(-)
---
base-commit: 752ebcbe87aceeb6334e846a466116197711a982
change-id: 20240403-rss-e737d89efa77
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>