Hello,
this series aims to bring test_tcp_check_syncookie.sh scope into
test_progs to make sure that the corresponding tests are also run
automatically in CI. This script tests for bpf_tcp_{gen,check}_syncookie
and bpf_skc_lookup_tcp, in different contexts (ipv4, v6 or dual, and
with tc and xdp programs).
Some other tests like btf_skc_cls_ingress have some overlapping tests with
test_tcp_check_syncookie.sh, so this series moves the missing bits from
test_tcp_check_syncookie.sh into btf_skc_cls_ingress, which is already
integrated into test_progs.
- the first three commits bring some minor improvements to
btf_skc_cls_ingress without changing its testing scope
- fourth and fifth commits bring test_tcp_check_syncookie.sh features
into btf_skc_cls_ingress
- last commit removes test_tcp_check_syncookie.sh
The only topic for which I am not sure for this integration is the
necessity or not to run the tests with different program types:
test_tcp_check_syncookie.sh runs tests with both tc and xdp programs, but
btf_skc_cls_ingress currently tests those helpers only with a tc
program. Would it make sense to also make sure that btf_skc_cls_ingress
is tested with all the programs types supported by those helpers ?
The series has been tested both in CI and in a local x86_64 qemu
environment:
# ./test_progs -a btf_skc_cls_ingress
#38/1 btf_skc_cls_ingress/conn_ipv4:OK
#38/2 btf_skc_cls_ingress/conn_ipv6:OK
#38/3 btf_skc_cls_ingress/conn_dual:OK
#38/4 btf_skc_cls_ingress/syncookie_ipv4:OK
#38/5 btf_skc_cls_ingress/syncookie_ipv6:OK
#38/6 btf_skc_cls_ingress/syncookie_dual:OK
#38 btf_skc_cls_ingress:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (6):
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: remove test_tcp_check_syncookie
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 9 +-
.../selftests/bpf/prog_tests/btf_skc_cls_ingress.c | 265 +++++++++++++--------
.../selftests/bpf/progs/test_btf_skc_cls_ingress.c | 83 +++++--
.../bpf/progs/test_tcp_check_syncookie_kern.c | 167 -------------
.../selftests/bpf/test_tcp_check_syncookie.sh | 85 -------
.../selftests/bpf/test_tcp_check_syncookie_user.c | 213 -----------------
7 files changed, 222 insertions(+), 601 deletions(-)
---
base-commit: 030207b7fce8bad6827615cfc2c6592916e2c336
change-id: 20241015-syncookie-ea7686264586
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Hi Linus,
Please pull the following kselftest fixes update for Linux 6.12-rc4.
-- fixes test makefile to install tests directory without which
the test fails with errors.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 4ee5ca9a29384fcf3f18232fdf8474166dea8dca:
ftrace/selftest: Test combination of function_graph tracer and function profiler
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.12-rc4
for you to fetch changes up to fe05c40ca9c18cfdb003f639a30fc78a7ab49519:
selftest: hid: add the missing tests directory (2024-10-16 15:55:14 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.12-rc4
kselftest fixes for Linux 6.12-rc4
-- fixes test makefile to install tests directory without which
the test fails with errors.
----------------------------------------------------------------
Yun Lu (1):
selftest: hid: add the missing tests directory
tools/testing/selftests/hid/Makefile | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------
Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
The current means by which these are implemented is via a PROT_NONE mmap()
mapping, which provides the required semantics however incur an overhead of
a VMA for each such region.
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
This series takes a different approach - an idea suggested by Vlasimil
Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
provenance becomes a little tricky to ascertain after this - please forgive
any omissions!) - rather than locating the guard pages at the VMA layer,
instead placing them in page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5 times
speed up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle android
system and unoptimised code.
We expect with optimisation and a loaded system with a larger number of
guard pages this could significantly increase, but in any case these
numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, instead we have a VMA spanning the entire range of
memory a user is permitted to access and including ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the mappings being torn down when the
mappings are ultimately unmapped and everything works simply as if the
memory were not faulted in, from the point of view of the containing VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
Any poisoned region of memory is also able to 'unpoisoned', that is, to
have its poison markers removed.
The mechanism is implemented via madvise() behaviour - MADV_GUARD_POISON
which simply poisons ranges - and MADV_GUARD_UNPOISON - which clears this
poisoning.
Poisoning can be performed across multiple VMAs and any existing mappings
will be cleared, that is zapped, before installing the poisoned page table
mappings.
There is no concept of 'nested' poisoning, multiple attempts to poison a
range will, after the first poisoning, have no effect.
Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
memory, so a user can safely unpoison a range of memory and clear only
poison page table mappings leaving the rest intact.
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the poisoning if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests are provided which ensure behaviour is as
expected and additionally self-documents expected behaviour of poisoned
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suze.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
v1
* Un-RFC'd as appears no major objections to approach but rather debate on
implementation.
* Fixed issue with arches which need mmu_context.h and
tlbfush.h. header imports in pagewalker logic to be able to use
update_mmu_cache() as reported by the kernel test bot.
* Added comments in page walker logic to clarify who can use
ops->install_pte and why as well as adding a check_ops_valid() helper
function, as suggested by Christoph.
* Pass false in full parameter in pte_clear_not_present_full() as suggested
by Jann.
* Stopped erroneously requiring a write lock for the poison operation as
suggested by Jann and Suren.
* Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
consistent with how this is used elsewhere in the kernel as suggested by
Jann.
* Avoid returning -EAGAIN if we are raced on page faults, just keep looping
and duck out if a fatal signal is pending or a conditional reschedule is
needed, as suggested by Jann.
* Avoid needlessly splitting huge PUDs and PMDs by specifying
ACTION_CONTINUE, as suggested by Jann.
RFC
https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (4):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 26 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 3 +
mm/internal.h | 6 +
mm/madvise.c | 168 ++++
mm/memory.c | 18 +-
mm/mprotect.c | 3 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 200 ++--
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1168 ++++++++++++++++++++++
18 files changed, 1564 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.46.2
On Android arm, pthread_create followed by a fork caused a deadlock in
the case where the fork required work to be completed by the created
thread.
The previous patches incorrectly assumed that the parent would
always initialize the pthread_barrier for the child thread. This
reverts the change and replaces the fix for wp-fork-with-event with the
original use of atomic_bool.
Edward Liaw (3):
Revert "selftests/mm: fix deadlock for fork after pthread_create on
ARM"
Revert "selftests/mm: replace atomic_bool with pthread_barrier_t"
selftests/mm: fix deadlock for fork after pthread_create with
atomic_bool
tools/testing/selftests/mm/uffd-common.c | 5 ++--
tools/testing/selftests/mm/uffd-common.h | 3 ++-
tools/testing/selftests/mm/uffd-unit-tests.c | 24 ++++++++------------
3 files changed, 14 insertions(+), 18 deletions(-)
--
2.47.0.105.g07ac214952-goog
Changes since V2:
- V2: https://lore.kernel.org/all/cover.1726164080.git.reinette.chatre@intel.com/
- Add fix to protect against buffer overflow when parsing text from sysfs files.
- Add cleanup patch to address use of magic constants as pointed out by
Ilpo.
- Add Reviewed-by tags where received, except for "selftests/resctrl: Use cache
size to determine "fill_buf" buffer size" that changed too much since
receiving the Reviewed-by tag.
- Please see individual patches for detailed changes.
Changes since V1:
- V1: https://lore.kernel.org/cover.1724970211.git.reinette.chatre@intel.com/
- V2 contains the same general solutions to stated problem as V1 but these
are now preceded by more fixes (patches 1 to 5) and improved robustness
(patches 6 to 9) to existing tests before the series gets back
to solving the original problem with more confidence in patches 10 to 13.
- The posibility of making "memflush = false" for CMT test was discussed
during V1. Modifying this setting does not have a significant impact on the
observed results that are already well within acceptable range and this
version thus keeps original default. If performance was a goal it may
be possible to do further experimentation where "memflush = false" could
eliminate the need for the sleep(1) within the test wrapper, but
improving the performance is not a goal of this work.
- (New) Support what seems to be unintended ability for user space to provide
parameters to "fill_buf" by making the parsing robust and only support
changing parameters that are supported to be changed. Drop support for
"write" operation since it has never been measured.
- (New) Improve wraparound handling. (Ilpo)
- (New) A couple of new fixes addressing issues discovered during development.
- (Change from V1) To support fill_buf parameters provided by user space as
well as test specific fill_buf parameters struct fill_buf_param is no longer
just a member of struct resctrl_val_param, instead there could be at most
two instances of struct fill_buf_param, the immutable parameters provided
by user space and the parameters used by individual tests. (Ilpo)
- Please see individual patches for detailed changes.
V1 cover:
The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory
Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald
Rapids systems. The test failures result from the following two
properties of these systems:
1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl
MBA and MBM selftests measure memory traffic for which a hardcoded
250MB buffer has been sufficient so far. On platforms with L3 cache
larger than the buffer, the buffer fits in the L3 cache and thus
no/very little memory traffic is generated during the "memory
bandwidth" tests.
2) Some platform features, for example RAS features or memory
performance features that generate memory traffic may drive accesses
that are counted differently by performance counters and MBM
respectively, for instance generating "overhead" traffic which is not
counted against any specific RMID. Until now these counting
differences have always been "in the noise". On Emerald Rapids
systems the maximum MBA throttling (10% memory bandwidth)
throttles memory bandwidth to where memory accesses by these other
platform features push the memory bandwidth difference between
memory controller performance counters and resctrl (MBM) beyond the
tests' hardcoded tolerance.
Make the tests more robust against platform variations:
1) Let the buffer used by memory bandwidth tests be guided by the size
of the L3 cache.
2) Larger buffers require longer initialization time before the buffer can
be used to measurement. Rework the tests to ensure that buffer
initialization is complete before measurements start.
3) Do not compare performance counters and MBM measurements at low
bandwidth. The value of "low" is hardcoded to 750MiB based on
measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake
systems. This limit is not applicable to AMD systems since it
only applies to the MBA and MBM tests that are isolated to Intel.
[1]
https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-plat…
Reinette Chatre (15):
selftests/resctrl: Make functions only used in same file static
selftests/resctrl: Print accurate buffer size as part of MBM results
selftests/resctrl: Fix memory overflow due to unhandled wraparound
selftests/resctrl: Protect against array overrun during iMC config
parsing
selftests/resctrl: Protect against array overflow when reading strings
selftests/resctrl: Make wraparound handling obvious
selftests/resctrl: Remove "once" parameter required to be false
selftests/resctrl: Only support measured read operation
selftests/resctrl: Remove unused measurement code
selftests/resctrl: Make benchmark parameter passing robust
selftests/resctrl: Ensure measurements skip initialization of default
benchmark
selftests/resctrl: Use cache size to determine "fill_buf" buffer size
selftests/resctrl: Do not compare performance counters and resctrl at
low bandwidth
selftests/resctrl: Keep results from first test run
selftests/resctrl: Replace magic constants used as array size
tools/testing/selftests/resctrl/cmt_test.c | 37 +-
tools/testing/selftests/resctrl/fill_buf.c | 45 +-
tools/testing/selftests/resctrl/mba_test.c | 54 ++-
tools/testing/selftests/resctrl/mbm_test.c | 37 +-
tools/testing/selftests/resctrl/resctrl.h | 79 +++-
.../testing/selftests/resctrl/resctrl_tests.c | 95 +++-
tools/testing/selftests/resctrl/resctrl_val.c | 447 +++++-------------
tools/testing/selftests/resctrl/resctrlfs.c | 19 +-
8 files changed, 354 insertions(+), 459 deletions(-)
--
2.46.2
On Android arm, pthread_create followed by a fork caused a deadlock in
the case where the fork required work to be completed by the created
thread.
Updated the synchronization primitive to use pthread_barrier instead of
atomic_bool.
Applied the same fix to the wp-fork-with-event test.
Edward Liaw (2):
selftests/mm: replace atomic_bool with pthread_barrier_t
selftests/mm: fix deadlock for fork after pthread_create on ARM
tools/testing/selftests/mm/uffd-common.c | 5 +++--
tools/testing/selftests/mm/uffd-common.h | 3 +--
tools/testing/selftests/mm/uffd-unit-tests.c | 21 ++++++++++++++------
3 files changed, 19 insertions(+), 10 deletions(-)
--
2.46.1.824.gd892dcdcdd-goog
If the kunit being run generates a WARN for some reason kunit.py ignores it
and declares the tested PASSED. This is very much not desirable, as tests that
are hitting WARN's are probably actually failing.
Take the simple approach to reducing this by setting panic_on_warn when
running the kernel. The kernel crashes and kunit.py shows the WARN and reports
the test fails.
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
---
tools/testing/kunit/kunit_kernel.py | 2 ++
1 file changed, 2 insertions(+)
I saw there was an earlier series working to make tests that deliberately made
WARNs not do that, so this would be consistent with that idea, tests should not
make WARNs, and WARNs should not be ignored..
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index 61931c4926fd66..7a4228568dd73c 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -342,6 +342,8 @@ class LinuxSourceTree:
if filter_action:
args.append('kunit.filter_action=' + filter_action)
args.append('kunit.enable=1')
+ args.append('panic_on_warn=1')
+ args.append('panic=-1')
process = self._ops.start(args, build_dir)
assert process.stdout is not None # tell mypy it's set
base-commit: 2872987b1d009df556c0061ecdeede6a5f9bf42c
--
2.46.2
1. In order to make rtctest more explicit and robust, we propose to use
RTC_PARAM_GET ioctl interface to check rtc alarm feature state before
running alarm related tests.
2. The rtctest requires the read permission on /dev/rtc0. The rtctest will
be skipped if the /dev/rtc0 is not readable.
Joseph Jang (2):
selftest: rtc: Add to check rtc alarm status for alarm related test
selftest: rtc: Check if could access /dev/rtc0 before testing
tools/testing/selftests/rtc/Makefile | 2 +-
tools/testing/selftests/rtc/rtctest.c | 71 ++++++++++++++++++++++++++-
2 files changed, 71 insertions(+), 2 deletions(-)
--
2.34.1
Introduce a new kselftest to identify slowdowns in key boot events.
This test uses ftrace to monitor the start and end times, as well as
the durations of all initcalls, and compares these timings to reference
values to identify significant slowdowns.
The script functions in two modes: the 'generate' mode allows to create
a JSON file containing initial reference timings for all initcalls from
a known stable kernel. The 'test' mode can be used during subsequent
boots to assess current timings against the reference values and
determine if there are any significant differences.
The test ships with a bootconfig file for setting up ftrace and a
configuration fragment for the necessary kernel configs.
Signed-off-by: Laura Nao <laura.nao(a)collabora.com>
---
Hello,
This v2 is a follow-up to RFCv1[1] and includes changes based on feedback
from the LPC 2024 session [2], along with some other fixes.
[1] https://lore.kernel.org/all/20240725110622.96301-1-laura.nao@collabora.com/
[2] https://www.youtube.com/watch?v=rWhW2-Vzi40
After reviewing other available tests and considering the feedback from
discussions at Plumbers, I decided to stick with the bootconfig file
approach but extend it to track all initcalls instead of a fixed set of
functions or events. The bootconfig file can be expanded and adapted to
track additional functions if needed for specific use cases.
I also defined a synthetic event to calculate initcall durations, while
still tracking their start and end times. Users are then allowed to choose
whether to compare start times, end times, or durations. Support for
specifying different rules for different initcalls has also been added.
In RFCv1, there was some discussion about using existing tools like
bootgraph.py. However, the output from these tools is mainly for manual
inspection (e.g., HTML visual output), whereas this test is designed to run
in automated CI environments too. The kselftest proposed here combines the
process of generating reference data and running tests into a single script
with two modes, making it easy to integrate into automated workflows.
Many of the features in this v2 (e.g., generating a JSON reference file,
comparing timings, and reporting results in KTAP format) could potentially
be integrated into bootgraph.py with some effort.
However, since this test is intended for automated execution rather than
manual use, I've decided to keep it separate for now and explore the
options suggested at LPC, such as using ftrace histograms for initcall
latencies. I'm open to revisiting this decision and working toward
integrating the changes into bootgraph.py if there's a strong preference
for unifying the tools.
Let me know your thoughts.
A comprehensive changelog is reported below.
Thanks,
Laura
---
Changes in v2:
- Updated ftrace configuration to track all initcall start times, end
times, and durations, and generate a histogram.
- Modified test logic to compare initcall durations by default, with the
option to compare start or end times if needed.
- Added warnings if the initcalls in the reference file differ from those
detected in the running system.
- Combined the scripts into a single script with two modes: one for
generating the reference file and one for running the test.
- Added support for specifying different rules for individual initcalls.
- Switched the reference format from YAML to JSON.
- Added metadata to the reference file, including kernel version, kernel
configuration, and cmdline.
- Link to v1: https://lore.kernel.org/all/20240725110622.96301-1-laura.nao@collabora.com/
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/boot-time/Makefile | 16 ++
tools/testing/selftests/boot-time/bootconfig | 15 +
tools/testing/selftests/boot-time/config | 6 +
.../selftests/boot-time/test_boot_time.py | 265 ++++++++++++++++++
5 files changed, 303 insertions(+)
create mode 100644 tools/testing/selftests/boot-time/Makefile
create mode 100644 tools/testing/selftests/boot-time/bootconfig
create mode 100644 tools/testing/selftests/boot-time/config
create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b38199965f99..1bb20d1e3854 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -3,6 +3,7 @@ TARGETS += acct
TARGETS += alsa
TARGETS += amd-pstate
TARGETS += arm64
+TARGETS += boot-time
TARGETS += bpf
TARGETS += breakpoints
TARGETS += cachestat
diff --git a/tools/testing/selftests/boot-time/Makefile b/tools/testing/selftests/boot-time/Makefile
new file mode 100644
index 000000000000..cdcdc1bbe779
--- /dev/null
+++ b/tools/testing/selftests/boot-time/Makefile
@@ -0,0 +1,16 @@
+PY3 = $(shell which python3 2>/dev/null)
+
+ifneq ($(PY3),)
+
+TEST_PROGS := test_boot_time.py
+
+include ../lib.mk
+
+else
+
+all: no_py3_warning
+
+no_py3_warning:
+ @echo "Missing python3. This test will be skipped."
+
+endif
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/bootconfig b/tools/testing/selftests/boot-time/bootconfig
new file mode 100644
index 000000000000..e4b89a33b7a3
--- /dev/null
+++ b/tools/testing/selftests/boot-time/bootconfig
@@ -0,0 +1,15 @@
+ftrace.event {
+ synthetic.initcall_latency {
+ # Synthetic event to record initcall latency, start, and end times
+ fields = "unsigned long func", "u64 lat", "u64 start", "u64 end"
+ actions = "hist:keys=func.sym,start,end:vals=lat:sort=lat"
+ }
+ initcall.initcall_start {
+ # Capture the start time (ts0) when initcall starts
+ actions = "hist:keys=func:ts0=common_timestamp.usecs"
+ }
+ initcall.initcall_finish {
+ # Capture the end time, calculate latency, and trigger synthetic event
+ actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:start=$ts0:end=common_timestamp.usecs:onmatch(initcall.initcall_start).initcall_latency(func,$lat,$start,$end)"
+ }
+}
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/config b/tools/testing/selftests/boot-time/config
new file mode 100644
index 000000000000..bcb646ec3cd8
--- /dev/null
+++ b/tools/testing/selftests/boot-time/config
@@ -0,0 +1,6 @@
+CONFIG_TRACING=y
+CONFIG_BOOTTIME_TRACING=y
+CONFIG_BOOT_CONFIG_EMBED=y
+CONFIG_BOOT_CONFIG_EMBED_FILE="tools/testing/selftests/boot-time/bootconfig"
+CONFIG_SYNTH_EVENTS=y
+CONFIG_HIST_TRIGGERS=y
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/test_boot_time.py b/tools/testing/selftests/boot-time/test_boot_time.py
new file mode 100755
index 000000000000..556dacf04b6d
--- /dev/null
+++ b/tools/testing/selftests/boot-time/test_boot_time.py
@@ -0,0 +1,265 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2024 Collabora Ltd
+#
+# This script reads the
+# /sys/kernel/debug/tracing/events/synthetic/initcall_latency/hist file,
+# extracts function names and timings, and compares them against reference
+# timings provided in an input JSON file to identify significant boot
+# slowdowns.
+# The script operates in two modes:
+# - Generate Mode: parses initcall timings from the current kernel's ftrace
+# event histogram and generates a JSON reference file with function
+# names, start times, end times, and latencies.
+# - Test Mode: compares current initcall timings against the reference
+# file, allowing users to define a maximum allowed difference between the
+# values (delta). Users can also apply custom delta thresholds for
+# specific initcalls using regex-based overrides. The comparison can be
+# done on latency, start, or end times.
+#
+
+import os
+import sys
+import argparse
+import gzip
+import json
+import re
+import subprocess
+
+this_dir = os.path.dirname(os.path.realpath(__file__))
+sys.path.append(os.path.join(this_dir, "../kselftest/"))
+
+import ksft
+
+def load_reference_from_json(file_path):
+ """
+ Load reference data from a JSON file and returns the parsed data.
+ @file_path: path to the JSON file.
+ """
+
+ try:
+ with open(file_path, 'r', encoding="utf-8") as file:
+ return json.load(file)
+ except FileNotFoundError:
+ ksft.print_msg(f"Error: File {file_path} not found.")
+ ksft.exit_fail()
+ except json.JSONDecodeError:
+ ksft.print_msg(f"Error: Failed to decode JSON from {file_path}.")
+ ksft.exit_fail()
+
+
+def mount_debugfs(path):
+ """
+ Mount debugfs at the specified path if it is not already mounted.
+ @path: path where debugfs should be mounted
+ """
+ # Check if debugfs is already mounted
+ with open('/proc/mounts', 'r', encoding="utf-8") as mounts:
+ for line in mounts:
+ if 'debugfs' in line and path in line:
+ print(f"debugfs is already mounted at {path}")
+ return True
+
+ # Mount debugfs
+ try:
+ subprocess.run(['mount', '-t', 'debugfs', 'none', path], check=True)
+ return True
+ except subprocess.CalledProcessError as e:
+ print(f"Failed to mount debugfs: {e.stderr}")
+ return False
+
+
+def ensure_unique_function_name(func, initcall_entries):
+ """
+ Ensure the function name is unique by appending a suffix if necessary.
+ @func: the original function name.
+ @initcall_entries: a dictionary containing parsed initcall entries.
+ """
+ i = 2
+ base_func = func
+ while func in initcall_entries:
+ func = f'{base_func}[{i}]'
+ i += 1
+ return func
+
+
+def parse_initcall_latency_hist():
+ """
+ Parse the ftrace histogram for the initcall_latency event, extracting
+ function names, start times, end times, and latencies. Return a
+ dictionary where each entry is structured as follows:
+ {
+ <function symbolic name>: {
+ "start": <start time>,
+ "end": <end time>,
+ "latency": <latency>
+ }
+ }
+ """
+
+ pattern = re.compile(r'\{ func: \[\w+\] ([\w_]+)\s*, start: *(\d+), end: *(\d+) \} hitcount: *\d+ lat: *(\d+)')
+ initcall_entries = {}
+
+ try:
+ with open('/sys/kernel/debug/tracing/events/synthetic/initcall_latency/hist', 'r', encoding="utf-8") as hist_file:
+ for line in hist_file:
+ match = pattern.search(line)
+ if match:
+ func = match.group(1).strip()
+ start = int(match.group(2))
+ end = int(match.group(3))
+ latency = int(match.group(4))
+
+ # filter out unresolved names
+ if not func.startswith("0x"):
+ func = ensure_unique_function_name(func, initcall_entries)
+
+ initcall_entries[func] = {
+ "start": start,
+ "end": end,
+ "latency": latency
+ }
+ except FileNotFoundError:
+ print("Error: Histogram file not found.")
+
+ return initcall_entries
+
+
+def compare_initcall_list(ref_initcall_entries, cur_initcall_entries):
+ """
+ Compare the current list of initcall functions against the reference
+ file. Print warnings if there are unique entries in either.
+ @ref_initcall_entries: reference initcall entries.
+ @cur_initcall_entries: current initcall entries.
+ """
+ ref_entries = set(ref_initcall_entries.keys())
+ cur_entries = set(cur_initcall_entries.keys())
+
+ unique_to_ref = ref_entries - cur_entries
+ unique_to_cur = cur_entries - ref_entries
+
+ if (unique_to_ref):
+ ksft.print_msg(
+ f"Warning: {list(unique_to_ref)} not found in current data. Consider updating reference file.")
+ if unique_to_cur:
+ ksft.print_msg(
+ f"Warning: {list(unique_to_cur)} not found in reference data. Consider updating reference file.")
+
+
+def run_test(ref_file_path, delta, overrides, mode):
+ """
+ Run the test comparing the current timings with the reference values.
+ @ref_file_path: path to the JSON file containing reference values.
+ @delta: default allowed difference between reference and current
+ values.
+ @overrides: override rules in the form of regex:threshold.
+ @mode: the comparison mode (either 'start', 'end', or 'latency').
+ """
+
+ ref_data = load_reference_from_json(ref_file_path)
+
+ ref_initcall_entries = ref_data['data']
+ cur_initcall_entries = parse_initcall_latency_hist()
+
+ compare_initcall_list(ref_initcall_entries, cur_initcall_entries)
+
+ ksft.set_plan(len(ref_initcall_entries))
+
+ for func_name in ref_initcall_entries:
+ effective_delta = delta
+ for regex, override_delta in overrides.items():
+ if re.match(regex, func_name):
+ effective_delta = override_delta
+ break
+ if (func_name in cur_initcall_entries):
+ ref_metric = ref_initcall_entries[func_name].get(mode)
+ cur_metric = cur_initcall_entries[func_name].get(mode)
+ if (cur_metric > ref_metric and (cur_metric - ref_metric) >= effective_delta):
+ ksft.test_result_fail(func_name)
+ ksft.print_msg(f"'{func_name}' {mode} differs by "
+ f"{(cur_metric - ref_metric)} usecs.")
+ else:
+ ksft.test_result_pass(func_name)
+ else:
+ ksft.test_result_skip(func_name)
+
+
+def generate_reference_file(file_path):
+ """
+ Generate a reference file in JSON format, containing kernel metadata
+ and initcall timing data.
+ @file_path: output file path.
+ """
+ metadata = {}
+
+ config_file = "/proc/config.gz"
+ if os.path.isfile(config_file):
+ with gzip.open(config_file, "rt", encoding="utf-8") as f:
+ config = f.read()
+ metadata["config"] = config
+
+ metadata["version"] = os.uname().release
+
+ cmdline_file = "/proc/cmdline"
+ if os.path.isfile(cmdline_file):
+ with open(cmdline_file, "r", encoding="utf-8") as f:
+ cmdline = f.read().strip()
+ metadata["cmdline"] = cmdline
+
+ ref_data = {
+ "metadata": metadata,
+ "data": parse_initcall_latency_hist(),
+ }
+
+ with open(file_path, "w", encoding='utf-8') as f:
+ json.dump(ref_data, f, indent=4)
+ print(f"Generated {file_path}")
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(
+ description="")
+
+ subparsers = parser.add_subparsers(dest='mode', required=True, help='Choose between generate or test modes')
+
+ generate_parser = subparsers.add_parser('generate', help="Generate a reference file")
+ generate_parser.add_argument('out_ref_file', nargs='?', default='reference_initcall_timings.json',
+ help='Path to output JSON reference file (default: reference_initcall_timings.json)')
+
+ compare_parser = subparsers.add_parser('test', help='Test against a reference file')
+ compare_parser.add_argument('in_ref_file', help='Path to JSON reference file')
+ compare_parser.add_argument(
+ 'delta', type=int, help='Maximum allowed delta between the current and the reference timings (usecs)')
+ compare_parser.add_argument('--override', '-o', action='append', type=str,
+ help="Specify regex-based rules as regex:delta (e.g., '^acpi_.*:50')")
+ compare_parser.add_argument('--mode', '-m', default='latency', choices=[
+ 'start', 'end', 'latency'],
+ help="Comparison mode: 'latency' (default) for latency, 'start' for start times, or 'end' for end times.")
+
+ args = parser.parse_args()
+
+ if args.mode == 'generate':
+ generate_reference_file(args.out_ref_file)
+ sys.exit(0)
+
+ # Process overrides
+ overrides = {}
+ if args.override:
+ for override in args.override:
+ try:
+ pattern, delta = override.split(":")
+ overrides[pattern] = int(delta)
+ except ValueError:
+ print(f"Invalid override format: {override}. Expected format is 'regex:delta'.")
+ sys.exit(1)
+
+ # Ensure debugfs is mounted
+ if not mount_debugfs("/sys/kernel/debug"):
+ ksft.exit_fail()
+
+ ksft.print_header()
+
+ run_test(args.in_ref_file, args.delta, overrides, args.mode)
+
+ ksft.finished()
--
2.30.2
This patch series adds a some not yet picked selftests to the kvm s390x
selftest suite.
The additional test cases are covering:
* Assert KVM_EXIT_S390_UCONTROL exit on not mapped memory access
* Assert functionality of storage keys in ucontrol VM
* Assert that memory region operations are rejected for ucontrol VMs
Running the test cases requires sys_admin capabilities to start the
ucontrol VM.
This can be achieved by running as root or with a command like:
sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
./ucontrol_test
---
The patches in this series have been part of the previous patch series.
The test cases added here do depend on the fixture added in the earlier
patches.
From v5 PATCH 7-9 the segment and page table generation has been removed
and DAT
has been disabled. Since DAT is not necessary to validate the KVM code.
https://lore.kernel.org/kvm/20240807154512.316936-1-schlameuss@linux.ibm.co…
v6:
- add instruction intercept handling for skey specific instructions
(iske, sske, rrbe) in addition to kss intercept to work properly on
all machines
- reorder local variables
- fixup some method comments
- add a patch correcting the IP.b value length a debug message
v5:
- rebased to current upstream master
- corrected assertion on 0x00 to 0
- reworded fixup commit so that it can be merged on top of current
upstream
v4:
- fix whitespaces in pointer function arguments (thanks Claudio)
- fix whitespaces in comments (thanks Janosch)
v3:
- fix skey assertion (thanks Claudio)
- introduce a wrapper around UCAS map and unmap ioctls to improve
readability (Claudio)
- add an displacement to accessed memory to assert translation
intercepts actually point to segments to the uc_map_unmap test
- add an misaligned failing mapping try to the uc_map_unmap test
v2:
- Reenable KSS intercept and handle it within skey test.
- Modify the checked register between storing (sske) and reading (iske)
it within the test program to make sure the.
- Add an additional state assertion in the end of uc_skey
- Fix some typos and white spaces.
v1:
- Remove segment and page table generation and disable DAT. This is not
necessary to validate the KVM code.
Christoph Schlameuss (5):
selftests: kvm: s390: Add uc_map_unmap VM test case
selftests: kvm: s390: Add uc_skey VM test case
selftests: kvm: s390: Verify reject memory region operations for
ucontrol VMs
selftests: kvm: s390: Fix whitespace confusion in ucontrol test
selftests: kvm: s390: correct IP.b length in uc_handle_sieic debug
output
.../selftests/kvm/include/s390x/processor.h | 6 +
.../selftests/kvm/s390x/ucontrol_test.c | 307 +++++++++++++++++-
2 files changed, 305 insertions(+), 8 deletions(-)
base-commit: eca631b8fe808748d7585059c4307005ca5c5820
--
2.47.0
virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Introduce the code to compute hashes to the kernel in order to overcome
thse challenges.
An alternative solution is to extend the eBPF steering program so that it
will be able to report to the userspace, but it is based on context
rewrites, which is in feature freeze. We can adopt kfuncs, but they will
not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
and vhost_net).
The patches for QEMU to use this new feature was submitted as RFC and
is available at:
https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
This work was presented at LPC 2024:
https://lpc.events/event/18/contributions/1963/
V1 -> V2:
Changed to introduce a new BPF program type.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v5:
- Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE.
- Optimized the calculation of the hash value according to:
https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca…
- Added patch "tun: Unify vnet implementation".
- Dropped patch "tap: Pad virtio header with zero".
- Added patch "selftest: tun: Test vnet ioctls without device".
- Reworked selftests to skip for older kernels.
- Documented the case when the underlying device is deleted and packets
have queue_mapping set by TC.
- Reordered test harness arguments.
- Added code to handle fragmented packets.
- Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com
Changes in v4:
- Moved tun_vnet_hash_ext to if_tun.h.
- Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc().
- Replaced htons() with cpu_to_be16().
- Changed virtio_net_hash_rss() to return void.
- Reordered variable declarations in virtio_net_hash_rss().
- Removed virtio_net_hdr_v1_hash_from_skb().
- Updated messages of "tap: Pad virtio header with zero" and
"tun: Pad virtio header with zero".
- Fixed vnet_hash allocation size.
- Ensured to free vnet_hash when destructing tun_struct.
- Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com
Changes in v3:
- Reverted back to add ioctl.
- Split patch "tun: Introduce virtio-net hashing feature" into
"tun: Introduce virtio-net hash reporting feature" and
"tun: Introduce virtio-net RSS".
- Changed to reuse hash values computed for automq instead of performing
RSS hashing when hash reporting is requested but RSS is not.
- Extracted relevant data from struct tun_struct to keep it minimal.
- Added kernel-doc.
- Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF.
- Initialized num_buffers with 1.
- Added a test case for unclassified packets.
- Fixed error handling in tests.
- Changed tests to verify that the queue index will not overflow.
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com
---
Akihiko Odaki (10):
virtio_net: Add functions for hashing
skbuff: Introduce SKB_EXT_TUN_VNET_HASH
net: flow_dissector: Export flow_keys_dissector_symmetric
tun: Unify vnet implementation
tun: Pad virtio header with zero
tun: Introduce virtio-net hash reporting feature
tun: Introduce virtio-net RSS
selftest: tun: Test vnet ioctls without device
selftest: tun: Add tests for virtio-net hashing
vhost/net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/networking/tuntap.rst | 7 +
MAINTAINERS | 1 +
drivers/net/Kconfig | 1 +
drivers/net/tap.c | 218 ++++--------
drivers/net/tun.c | 293 ++++++----------
drivers/net/tun_vnet.h | 342 +++++++++++++++++++
drivers/vhost/net.c | 16 +-
include/linux/if_tap.h | 2 +
include/linux/skbuff.h | 3 +
include/linux/virtio_net.h | 188 +++++++++++
include/net/flow_dissector.h | 1 +
include/uapi/linux/if_tun.h | 75 +++++
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 4 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tun.c | 630 ++++++++++++++++++++++++++++++++++-
16 files changed, 1430 insertions(+), 356 deletions(-)
---
base-commit: 752ebcbe87aceeb6334e846a466116197711a982
change-id: 20240403-rss-e737d89efa77
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
This fixes several smaller issues I faced when compiling the arm64
kselftests on my machine.
Patch 1 avoids a warning about the double definition of GNU_SOURCE,
for the arm64/signal tests. Patch 2 fixes a typo, where the f8dp2 hwcap
feature test was looking at the f8dp*4* cpuinfo name. Patch 3 adjusts
the output of the MTE tests when MTE is not available, so that tools
parsing the TAP output don't get confused and report errors.
The remaining patches are about wrong printf format specifiers. I grouped
them by type of error, in patch 4-8.
Please have a look!
Cheers,
Andre
Andre Przywara (8):
kselftest/arm64: signal: drop now redundant GNU_SOURCE definition
kselftest/arm64: hwcap: fix f8dp2 cpuinfo name
kselftest/arm64: mte: use proper SKIP syntax
kselftest/arm64: mte: use string literal for printf-style functions
kselftest/arm64: mte: fix printf type warning about mask
kselftest/arm64: mte: fix printf type warnings about __u64
kselftest/arm64: mte: fix printf type warnings about pointers
kselftest/arm64: mte: fix printf type warnings about longs
tools/testing/selftests/arm64/abi/hwcap.c | 2 +-
.../selftests/arm64/mte/check_buffer_fill.c | 4 ++--
tools/testing/selftests/arm64/mte/check_prctl.c | 4 ++--
.../selftests/arm64/mte/check_tags_inclusion.c | 4 ++--
.../testing/selftests/arm64/mte/mte_common_util.c | 15 +++++++--------
.../testing/selftests/arm64/mte/mte_common_util.h | 6 +++---
tools/testing/selftests/arm64/signal/Makefile | 2 +-
7 files changed, 18 insertions(+), 19 deletions(-)
--
2.25.1
Currently, the situation when guest accesses MMIO during event delivery
is handled differently in VMX and SVM: on VMX KVM returns internal error
with suberror = KVM_INTERNAL_ERROR_DELIVERY_EV, when SVM simply goes
into infinite loop trying to deliver an event again and again.
This patch series eliminates this difference by returning a KVM internal
error with suberror = KVM_INTERNAL_ERROR_DELIVERY_EV when guest is
performing MMIO during event delivery, for both VMX and SVM.
Also, it introduces a selftest test case which covers the MMIO during
event delivery error handling.
Ivan Orlov (3):
KVM: x86, vmx: Add function for event delivery error generation
KVM: vmx, svm, mmu: Process MMIO during event delivery
selftests: KVM: Add test case for MMIO during event delivery
arch/x86/include/asm/kvm_host.h | 8 ++++
arch/x86/kvm/mmu/mmu.c | 15 +++++-
arch/x86/kvm/svm/svm.c | 4 ++
arch/x86/kvm/vmx/vmx.c | 32 ++++---------
arch/x86/kvm/x86.c | 22 +++++++++
.../selftests/kvm/set_memory_region_test.c | 46 +++++++++++++++++++
6 files changed, 104 insertions(+), 23 deletions(-)
--
2.43.0
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, for threads, when it does a pthread_exit(&exit_status) call, the
kernel is not aware of the exit status with which pthread_exit is called.
It is sent by child thread to the parent process, if it is waiting in
pthread_join(). Hence, for a thread exiting abnormally, kernel cannot
send notifications to any listening processes.
The exception to this is if the thread is sent a signal which it has not
handled, and dies along with it's process as a result; for eg. SIGSEGV or
SIGKILL. In this case, kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have parent wait in pthread_join, one of the
main reasons for this being that we do not want to track normal
pthread_exit(), which could be a very large number. We only want to be
notified of any abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to proc connector
API, which allows a thread to send it's exit status to kernel either when
it needs to call pthread_exit() with non-zero value to indicate some
error or from signal handler before pthread_exit().
We also need to filter packets with non-zero exit notifications futher
based on instances, which can be identified by task names. Hence, added a
comm field to the packet's struct proc_event, in which task->comm is
stored.
v3->v4 changes:
- Reduce size of exit.log by removing unnecessary text.
v2->v3 changes:
- Handled comment by Liam Howlett to set hdev to NULL and add comment on
it
- Handled comment by Liam Howlett to combine functions for deleting+get
and deleting into one.
- Handled comment by Liam Howlett to remove extern in the functions
defined in cn_hash_test.h
- Some nits by Liam Howlett fixed.
- Made threads.c automated, by having an exit.log file created by
proc_filter.c, which threads.c checks to see if the values reported
for thread exits are correct. This was for a comment by Liam Howlett to
make the tests automated.
- Added "comm" field to struct proc_event, to copy the task's name to
the packet to allow further filtering by packets.
v1->v2 changes:
- Handled comment by Peter Zijlstra to remove locking for PF_EXIT_NOTIFY
task->flags.
- Added error handling in thread.c
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 221 ++++++++++++++++++
drivers/connector/cn_proc.c | 62 ++++-
drivers/connector/connector.c | 75 +++++-
include/linux/connector.h | 35 +++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 5 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 +++++++++++++
lib/cn_hash_test.h | 10 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 34 ++-
tools/testing/selftests/connector/thread.c | 202 ++++++++++++++++
.../selftests/connector/thread_filter.c | 96 ++++++++
15 files changed, 937 insertions(+), 15 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, for threads, when it does a pthread_exit(&exit_status) call, the
kernel is not aware of the exit status with which pthread_exit is called.
It is sent by child thread to the parent process, if it is waiting in
pthread_join(). Hence, for a thread exiting abnormally, kernel cannot
send notifications to any listening processes.
The exception to this is if the thread is sent a signal which it has not
handled, and dies along with it's process as a result; for eg. SIGSEGV or
SIGKILL. In this case, kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have parent wait in pthread_join, one of the
main reasons for this being that we do not want to track normal
pthread_exit(), which could be a very large number. We only want to be
notified of any abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to proc connector
API, which allows a thread to send it's exit status to kernel either when
it needs to call pthread_exit() with non-zero value to indicate some
error or from signal handler before pthread_exit().
We also need to filter packets with non-zero exit notifications futher
based on instances, which can be identified by task names. Hence, added a
comm field to the packet's struct proc_event, in which task->comm is
stored.
v2->v3 changes:
- Handled comment by Liam Howlett to set hdev to NULL and add comment on
it
- Handled comment by Liam Howlett to combine functions for deleting+get
and deleting into one.
- Handled comment by Liam Howlett to remove extern in the functions
defined in cn_hash_test.h
- Some nits by Liam Howlett fixed.
- Made threads.c automated, by having an exit.log file created by
proc_filter.c, which threads.c checks to see if the values reported
for thread exits are correct. This was for a comment by Liam Howlett to
make the tests automated.
- Added "comm" field to struct proc_event, to copy the task's name to
the packet to allow further filtering by packets.
v1->v2 changes:
- Handled comment by Peter Zijlstra to remove locking for PF_EXIT_NOTIFY
task->flags.
- Added error handling in thread.c
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 221 ++++++++++++++++++
drivers/connector/cn_proc.c | 62 ++++-
drivers/connector/connector.c | 75 +++++-
include/linux/connector.h | 35 +++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 5 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 +++++++++++++
lib/cn_hash_test.h | 10 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 34 ++-
tools/testing/selftests/connector/thread.c | 202 ++++++++++++++++
.../selftests/connector/thread_filter.c | 96 ++++++++
15 files changed, 937 insertions(+), 15 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
Hi all,
This series implements the Permission Overlay Extension introduced in 2022
VMSA enhancements [1]. It is based on v6.11-rc4.
Changes since v4[2]:
- Added Acks and R-bs, thanks!
- KVM:
- Move POR_EL{0,1} handling inside TCR_EL2 blocks
- Add visibility functions for registers [4]
- Make ID_AA64MMFR3_EL1 writable
- use system_supports_poe() more consistently
- use BIT instead of hex constants
- fix off-by-one in arch_max_pkey() macro
- add PKEY_DISABLE_EXECUTE and PKEY_DISABLE_READ
- Update some comments and commit messages.
- No change to when we save/restore POR_EL0 for signals!
Conflicts with GCS:
- Uses the same (last) bit in HWCAP2
- Uses the same VM_HIGH_ARCH_5
Conflicts with arm64 KVM:
- Maz has taken patch 8 into one of his own series
- I have taken and modified a patch from Maz (patch 9)
The Permission Overlay Extension allows to constrain permissions on memory
regions. This can be used from userspace (EL0) without a system call or TLB
invalidation.
POE is used to implement the Memory Protection Keys [3] Linux syscall.
The first few patches add the basic framework, then the PKEYS interface is
implemented, and then the selftests are made to work on arm64.
I have tested the modified protection_keys test on x86_64, but not PPC.
I haven't build tested the x86/ppc arch changes.
Thanks,
Joey
[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processor…
[2] https://lore.kernel.org/linux-arm-kernel/20240503130147.1154804-1-joey.goul…
[3] Documentation/core-api/protection-keys.rst
[4] https://lore.kernel.org/linux-arm-kernel/20240806-kvm-arm64-get-reg-list-v2…
Joey Gouly (30):
powerpc/mm: add ARCH_PKEY_BITS to Kconfig
x86/mm: add ARCH_PKEY_BITS to Kconfig
mm: use ARCH_PKEY_BITS to define VM_PKEY_BITN
arm64: disable trapping of POR_EL0 to EL2
arm64: cpufeature: add Permission Overlay Extension cpucap
arm64: context switch POR_EL0 register
KVM: arm64: Save/restore POE registers
KVM: arm64: make kvm_at() take an OP_AT_*
KVM: arm64: use `at s1e1a` for POE
KVM: arm64: Sanitise ID_AA64MMFR3_EL1
arm64: enable the Permission Overlay Extension for EL0
arm64: re-order MTE VM_ flags
arm64: add POIndex defines
arm64: convert protection key into vm_flags and pgprot values
arm64: mask out POIndex when modifying a PTE
arm64: handle PKEY/POE faults
arm64: add pte_access_permitted_no_overlay()
arm64: implement PKEYS support
arm64: add POE signal support
arm64/ptrace: add support for FEAT_POE
arm64: enable POE and PIE to coexist
arm64: enable PKEY support for CPUs with S1POE
arm64: add Permission Overlay Extension Kconfig
kselftest/arm64: move get_header()
selftests: mm: move fpregs printing
selftests: mm: make protection_keys test work on arm64
kselftest/arm64: add HWCAP test for FEAT_S1POE
kselftest/arm64: parse POE_MAGIC in a signal frame
kselftest/arm64: Add test case for POR_EL0 signal frame records
KVM: selftests: get-reg-list: add Permission Overlay registers
Documentation/arch/arm64/elf_hwcaps.rst | 2 +
arch/arm64/Kconfig | 23 +++
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 10 +-
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_asm.h | 3 +-
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/mman.h | 10 +-
arch/arm64/include/asm/mmu.h | 1 +
arch/arm64/include/asm/mmu_context.h | 46 +++++-
arch/arm64/include/asm/pgtable-hwdef.h | 10 ++
arch/arm64/include/asm/pgtable-prot.h | 8 +-
arch/arm64/include/asm/pgtable.h | 34 ++++-
arch/arm64/include/asm/pkeys.h | 108 ++++++++++++++
arch/arm64/include/asm/por.h | 33 +++++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/include/asm/sysreg.h | 3 +
arch/arm64/include/asm/traps.h | 1 +
arch/arm64/include/asm/vncr_mapping.h | 1 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/mman.h | 9 ++
arch/arm64/include/uapi/asm/sigcontext.h | 7 +
arch/arm64/kernel/cpufeature.c | 23 +++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/process.c | 28 ++++
arch/arm64/kernel/ptrace.c | 46 ++++++
arch/arm64/kernel/signal.c | 62 ++++++++
arch/arm64/kernel/traps.c | 6 +
arch/arm64/kvm/hyp/include/hyp/fault.h | 5 +-
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 27 ++++
arch/arm64/kvm/sys_regs.c | 25 +++-
arch/arm64/mm/fault.c | 55 ++++++-
arch/arm64/mm/mmap.c | 11 ++
arch/arm64/mm/mmu.c | 45 ++++++
arch/arm64/tools/cpucaps | 1 +
arch/powerpc/Kconfig | 4 +
arch/x86/Kconfig | 4 +
fs/proc/task_mmu.c | 2 +
include/linux/mm.h | 20 ++-
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 14 ++
.../testing/selftests/arm64/signal/.gitignore | 1 +
.../arm64/signal/testcases/poe_siginfo.c | 86 +++++++++++
.../arm64/signal/testcases/testcases.c | 27 +---
.../arm64/signal/testcases/testcases.h | 28 +++-
.../selftests/kvm/aarch64/get-reg-list.c | 14 ++
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/pkey-arm64.h | 139 ++++++++++++++++++
tools/testing/selftests/mm/pkey-helpers.h | 8 +
tools/testing/selftests/mm/pkey-powerpc.h | 3 +
tools/testing/selftests/mm/pkey-x86.h | 4 +
tools/testing/selftests/mm/protection_keys.c | 109 ++++++++++++--
52 files changed, 1060 insertions(+), 63 deletions(-)
create mode 100644 arch/arm64/include/asm/pkeys.h
create mode 100644 arch/arm64/include/asm/por.h
create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
create mode 100644 tools/testing/selftests/mm/pkey-arm64.h
--
2.25.1
The mount option of tmpfs should be huge=advise, not madvise
which is not supported and may mislead the users.
Fixes: 1b03d0d558a2 ("selftests/vm: add thp collapse file and tmpfs testing")
Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
---
tools/testing/selftests/mm/khugepaged.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 56d4480e8d3c..8a4d34cce36b 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -1091,7 +1091,7 @@ static void usage(void)
fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n");
fprintf(stderr, "\tCONFIG_READ_ONLY_THP_FOR_FS=y\n");
fprintf(stderr, "\n\tif [dir] is a (sub)directory of a tmpfs mount, tmpfs must be\n");
- fprintf(stderr, "\tmounted with huge=madvise option for khugepaged tests to work\n");
+ fprintf(stderr, "\tmounted with huge=advise option for khugepaged tests to work\n");
fprintf(stderr, "\n\tSupported Options:\n");
fprintf(stderr, "\t\t-h: This help message.\n");
fprintf(stderr, "\t\t-s: mTHP size, expressed as page order.\n");
--
2.33.0
Commit 160c826b4dd0 ("selftest: hid: add missing run-hid-tools-tests.sh")
has added the run-hid-tools-tests.sh script for it to be installed, but
I forgot to add the tests directory together.
If running the test case without the tests directory, will results in
the following error message:
make -C tools/testing/selftests/ TARGETS=hid install \
INSTALL_PATH=$KSFT_INSTALL_PATH
cd $KSFT_INSTALL_PATH
./run_kselftest.sh -t hid:hid-core.sh
/usr/lib/python3.11/site-packages/_pytest/config/__init__.py:331: PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: helpconfig, Hook: pytest_cmdline_parse
UsageError: usage: __main__.py [options] [file_or_dir] [file_or_dir] [...]
__main__.py: error: unrecognized arguments: --udevd
inifile: None
rootdir: /root/linux/kselftest_install/hid
In fact, the run-hid-tools-tests.sh script uses the scripts in the tests
directory to run tests. The tests directory also needs to be added to be
installed.
v2: add the error message
Fixes: ffb85d5c9e80 ("selftests: hid: import hid-tools hid-core tests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yun Lu <luyun(a)kylinos.cn>
---
tools/testing/selftests/hid/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/hid/Makefile b/tools/testing/selftests/hid/Makefile
index 38ae31bb07b5..662209f5fabc 100644
--- a/tools/testing/selftests/hid/Makefile
+++ b/tools/testing/selftests/hid/Makefile
@@ -18,6 +18,7 @@ TEST_PROGS += hid-usb_crash.sh
TEST_PROGS += hid-wacom.sh
TEST_FILES := run-hid-tools-tests.sh
+TEST_FILES += tests
CXX ?= $(CROSS_COMPILE)g++
--
2.27.0
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process thread.
This series introduces sentinels for this purposes which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in userland, and a userland process is a thread group (more
specifically the thread group leader from the kernel perspective). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.
We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.
v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…
RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…
Lorenzo Stoakes (3):
pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
selftests: pidfd: add tests for PIDFD_SELF_*
include/linux/pid.h | 43 +++++-
include/uapi/linux/pidfd.h | 15 ++
kernel/exit.c | 3 +-
kernel/nsproxy.c | 1 +
kernel/pid.c | 73 ++++++---
kernel/signal.c | 22 +--
tools/testing/selftests/pidfd/pidfd.h | 8 +
.../selftests/pidfd/pidfd_getfd_test.c | 141 ++++++++++++++++++
.../selftests/pidfd/pidfd_setns_test.c | 11 ++
tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++--
10 files changed, 341 insertions(+), 52 deletions(-)
--
2.46.2
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process thread.
This series introduces sentinels for this purposes which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in userland, and a userland process is a thread group (more
specifically the thread group leader from the kernel perspective). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.
We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
RFC version:
https://lore.kernel.org/linux-mm/1d19f18c-5a60-44b5-a96f-9d0e74f2b02c@lucif…
Lorenzo Stoakes (3):
pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
selftests: pidfd: add tests for PIDFD_SELF_*
include/linux/pid.h | 43 +++++-
include/uapi/linux/pidfd.h | 15 ++
kernel/exit.c | 3 +-
kernel/nsproxy.c | 1 +
kernel/pid.c | 73 +++++++---
kernel/signal.c | 22 +--
tools/testing/selftests/pidfd/pidfd.h | 8 ++
.../selftests/pidfd/pidfd_getfd_test.c | 136 ++++++++++++++++++
.../selftests/pidfd/pidfd_setns_test.c | 11 ++
tools/testing/selftests/pidfd/pidfd_test.c | 67 +++++++--
10 files changed, 330 insertions(+), 49 deletions(-)
--
2.46.2
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, for threads, when it does a pthread_exit(&exit_status) call, the
kernel is not aware of the exit status with which pthread_exit is called.
It is sent by child thread to the parent process, if it is waiting in
pthread_join(). Hence, for a thread exiting abnormally, kernel cannot
send notifications to any listening processes.
The exception to this is if the thread is sent a signal which it has not
handled, and dies along with it's process as a result; for eg. SIGSEGV or
SIGKILL. In this case, kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have parent wait in pthread_join, one of the
main reasons for this being that we do not want to track normal
pthread_exit(), which could be a very large number. We only want to be
notified of any abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to proc connector
API, which allows a thread to send it's exit status to kernel either when
it needs to call pthread_exit() with non-zero value to indicate some
error or from signal handler before pthread_exit().
v1->v2 changes:
- Handled comment by Peter Zijlstra to remove locking for PF_EXIT_NOTIFY
task->flags.
- Added error handling in thread.c
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 240 ++++++++++++++++++
drivers/connector/cn_proc.c | 55 +++-
drivers/connector/connector.c | 96 ++++++-
include/linux/connector.h | 47 ++++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 4 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 ++++++++++++
lib/cn_hash_test.h | 12 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 5 +
tools/testing/selftests/connector/thread.c | 116 +++++++++
.../selftests/connector/thread_filter.c | 96 +++++++
15 files changed, 873 insertions(+), 10 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
The GCS stress test program currently uses the PID of the threads it
creates in the test names it reports, resulting in unstable test names
between runs. Fix this by using a thread number instead.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/gcs/gcs-stress.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/arm64/gcs/gcs-stress.c b/tools/testing/selftests/arm64/gcs/gcs-stress.c
index bdec7ee8cfd5..03222c36c436 100644
--- a/tools/testing/selftests/arm64/gcs/gcs-stress.c
+++ b/tools/testing/selftests/arm64/gcs/gcs-stress.c
@@ -56,7 +56,7 @@ static int num_processors(void)
return nproc;
}
-static void start_thread(struct child_data *child)
+static void start_thread(struct child_data *child, int id)
{
int ret, pipefd[2], i;
struct epoll_event ev;
@@ -132,7 +132,7 @@ static void start_thread(struct child_data *child)
ev.events = EPOLLIN | EPOLLHUP;
ev.data.ptr = child;
- ret = asprintf(&child->name, "Thread-%d", child->pid);
+ ret = asprintf(&child->name, "Thread-%d", id);
if (ret == -1)
ksft_exit_fail_msg("asprintf() failed\n");
@@ -437,7 +437,7 @@ int main(int argc, char **argv)
tests);
for (i = 0; i < gcs_threads; i++)
- start_thread(&children[i]);
+ start_thread(&children[i], i);
/*
* All children started, close the startup pipe and let them
---
base-commit: bb9ae1a66c85eeb626864efd812c62026e126ec0
change-id: 20241011-arm64-gcs-stress-stable-name-8550519fe152
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When TCP over IPv4 via INET6 API, sk->sk_family is AF_INET6, but it is a v4 pkt.
inet_csk(sk)->icsk_af_ops is ipv6_mapped and use ip_queue_xmit. Some sockopt did
not take effect, such as tos.
0001: Use sk_is_inet helper to fix it.
0002: Setget_sockopt add a test for tcp over ipv4 via ipv6.
Changelog:
v2->v3: Addressed comments from Eric Dumazet
- Use sk_is_inet() helper
Details in here:
https://lore.kernel.org/bpf/CANn89i+9GmBLCdgsfH=WWe-tyFYpiO27wONyxaxiU6aOBC…
v1->v2: Addressed comments from kernel test robot
- Fix compilation error
Details in here:
https://lore.kernel.org/bpf/202408152058.YXAnhLgZ-lkp@intel.com/T/
Feng Zhou (2):
bpf: Fix bpf_get/setsockopt to tos not take effect when TCP over IPv4
via INET6 API
selftests/bpf: Setget_sockopt add a test for tcp over ipv4 via ipv6
net/core/filter.c | 7 +++-
.../selftests/bpf/prog_tests/setget_sockopt.c | 33 +++++++++++++++++++
.../selftests/bpf/progs/setget_sockopt.c | 13 ++++++--
3 files changed, 49 insertions(+), 4 deletions(-)
--
2.30.2
From: Jeff Xu <jeffxu(a)chromium.org>
Pedro Falcato's optimization [1] for checking sealed VMAs, which replaces
the can_modify_mm() function with an in-loop check, necessitates an update
to the mseal.rst documentation to reflect this change.
Furthermore, the document has received offline comments regarding the code
sample and suggestions for sentence clarification to enhance reader
comprehension.
[1] https://lore.kernel.org/linux-mm/20240817-mseal-depessimize-v3-0-d8d2e037df…
History:
V3: update according to Randy Dunlap's comment
V2: update according to Randy Dunlap's comments.
https://lore.kernel.org/all/20241001002628.2239032-1-jeffxu@chromium.org/
V1: initial version
https://lore.kernel.org/all/20240927185211.729207-1-jeffxu@chromium.org/
Jeff Xu (1):
mseal: update mseal.rst
Documentation/userspace-api/mseal.rst | 307 +++++++++++++-------------
1 file changed, 148 insertions(+), 159 deletions(-)
--
2.47.0.rc0.187.ge670bccf7e-goog
This series introduces a new ioctl KVM_HYPERV_SET_TLB_FLUSH_INHIBIT. It
allows hypervisors to inhibit remote TLB flushing of a vCPU coming from
Hyper-V hyper-calls (namely HvFlushVirtualAddressSpace(Ex) and
HvFlushirtualAddressList(Ex)). It is required to implement the
HvTranslateVirtualAddress hyper-call as part of the ongoing effort to
emulate VSM within KVM and QEMU. The hyper-call requires several new KVM
APIs, one of which is KVM_HYPERV_SET_TLB_FLUSH_INHIBIT.
Once the inhibit flag is set, any processor attempting to flush the TLB on
the marked vCPU, with a HyperV hyper-call, will be suspended until the
flag is cleared again. During the suspension the vCPU will not run at all,
neither receiving events nor running other code. It will wake up from
suspension once the vCPU it is waiting on clears the inhibit flag. This
behaviour is specified in Microsoft's "Hypervisor Top Level Functional
Specification" (TLFS).
The vCPU will block execution during the suspension, making it transparent
to the hypervisor. An alternative design to what is proposed here would be
to exit from the Hyper-V hypercall upon finding an inhibited vCPU. We
decided against it, to allow for a simpler and more performant
implementation. Exiting to user space would create an additional
synchronisation burden and make the resulting code more complex.
Additionally, since the suspension is specific to HyperV events, it
wouldn't provide any functional benefits.
The TLFS specifies that the instruction pointer is not moved during the
suspension, so upon unsuspending the hyper-calls is re-executed. This
means that, if the vCPU encounters another inhibited TLB and is
resuspended, any pending events and interrupts are still executed. This is
identical to the vCPU receiving such events right before the hyper-call.
This inhibiting of TLB flushes is necessary, to securely implement
intercepts. These allow a higher "Virtual Trust Level" (VTL) to react to
a lower VTL accessing restricted memory. In such an intercept the VTL may
want to emulate a memory access in software, however, if another processor
flushes the TLB during that operation, incorrect behaviour can result.
The patch series includes basic testing of the ioctl and suspension state.
All previously passing KVM selftests and KVM unit tests still pass.
Series overview:
- 1: Document the new ioctl
- 2: Implement the suspension state
- 3: Update TLB flush hyper-call in preparation
- 4-5: Implement the ioctl
- 6: Add traces
- 7: Implement testing
As the suspension state is transparent to the hypervisor, testing is
complicated. The current version makes use of a set time intervall to give
the vCPU time to enter the hyper-call and get suspended. Ideas for
improvement on this are very welcome.
This series, alongside my series [1] implementing KVM_TRANSLATE2, the
series by Nicolas Saenz Julienne [2] implementing the core building blocks
for VSM and the accompanying QEMU implementation [3], is capable of
booting Windows Server 2019 with VSM/CredentialGuard enabled.
All three series are also available on GitHub [4].
[1] https://lore.kernel.org/linux-kernel/20240910152207.38974-1-nikwip@amazon.d…
[2] https://lore.kernel.org/linux-hyperv/20240609154945.55332-1-nsaenz@amazon.c…
[3] https://github.com/vianpl/qemu/tree/vsm/next
[4] https://github.com/vianpl/linux/tree/vsm/next
Best,
Nikolas
Nikolas Wipper (7):
KVM: Add API documentation for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Implement Hyper-V's vCPU suspended state
KVM: x86: Check vCPUs before enqueuing TLB flushes in
kvm_hv_flush_tlb()
KVM: Introduce KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Implement KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Add trace events to track Hyper-V suspensions
KVM: selftests: Add tests for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
Documentation/virt/kvm/api.rst | 41 +++
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/kvm/hyperv.c | 86 +++++-
arch/x86/kvm/hyperv.h | 17 ++
arch/x86/kvm/trace.h | 39 +++
arch/x86/kvm/x86.c | 41 ++-
include/uapi/linux/kvm.h | 15 +
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/x86_64/hyperv_tlb_flush_inhibit.c | 274 ++++++++++++++++++
9 files changed, 503 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush_inhibit.c
--
2.40.1
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
MPTCP connection requests toward a listening socket created by the
in-kernel PM for a port based signal endpoint will never be accepted,
they need to be explicitly rejected.
- Patch 1: Explicitly reject such requests. A fix for >= v5.12.
- Patch 2: Cover this case in the MPTCP selftests to avoid regressions.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- This new version fixes the root cause for the issue Cong Wang sent a
patch for a few weeks ago, see the v1, and the explanations below. The
new version is very different from the v1, from a different author.
Thanks to Cong Wang for the first analysis, and to Paolo for having
spot the root cause, and sent a fix for it.
- Link to v1: https://lore.kernel.org/r/20240908180620.822579-1-xiyou.wangcong@gmail.com
- Link: https://lore.kernel.org/r/a5289a0d-2557-40b8-9575-6f1a0bbf06e4@redhat.com
---
Paolo Abeni (2):
mptcp: prevent MPC handshake on port-based signal endpoints
selftests: mptcp: join: test for prohibited MPC to port-based endp
net/mptcp/mib.c | 1 +
net/mptcp/mib.h | 1 +
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 11 +++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 117 +++++++++++++++++-------
6 files changed, 101 insertions(+), 31 deletions(-)
---
base-commit: 174714f0e505070a16be6fbede30d32b81df789f
change-id: 20241014-net-mptcp-mpc-port-endp-4f88bd428ec7
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
DAMON debugfs interface was the only user interface of DAMON at the
beginning[1]. However, it turned out the interface would be not good
enough for long-term flexibility and stability.
In Feb 2022[2], we therefore introduced DAMON sysfs interface as an
alternative user interface that aims long-term flexibility and
stability. With its introduction, DAMON debugfs interface has announced
to be deprecated in near future.
In Feb 2023[3], we announced the official deprecation of DAMON debugfs
interface. In Jan 2024[4], we further made the deprecation difficult to
be ignored.
And as of this writing (2024-10-14), no problem or concerns about the
deprecation have reported. Apparently users are already moved to the
alternative, or made good plans for the change.
Remove the DAMON debugfs interface code from the tree. Given the past
timeline and the absence of reported problems or concerns, it is safe
enough to be done. That said, we will not drop the RFC tag of this
patch series at least until the end of this year, to use this as the
real last call for users.
[1] https://lore.kernel.org/20210716081449.22187-1-sj38.park@gmail.com
[2] https://lore.kernel.org/20220228081314.5770-1-sj@kernel.org
[3] https://lore.kernel.org/20230209192009.7885-1-sj@kernel.org
[4] https://lore.kernel.org/20240130013549.89538-1-sj@kernel.org
SeongJae Park (7):
Docs/admin-guide/mm/damon/usage: remove DAMON debugfs interface
documentation
Docs/mm/damon/design: update for removal of DAMON debugfs interface
selftests/damon/config: remove configs for DAMON debugfs interface
selftests
selftests/damon: remove tests for DAMON debugfs interface
kunit: configs: remove configs for DAMON debugfs interface tests
mm/damon: remove DAMON debugfs interface kunit tests
mm/damon: remove DAMON debugfs interface
Documentation/admin-guide/mm/damon/usage.rst | 309 -----
Documentation/mm/damon/design.rst | 23 +-
mm/damon/Kconfig | 30 -
mm/damon/Makefile | 1 -
mm/damon/dbgfs.c | 1148 -----------------
mm/damon/tests/.kunitconfig | 7 -
mm/damon/tests/dbgfs-kunit.h | 173 ---
tools/testing/kunit/configs/all_tests.config | 3 -
tools/testing/selftests/damon/.gitignore | 3 -
tools/testing/selftests/damon/Makefile | 11 +-
tools/testing/selftests/damon/config | 1 -
.../testing/selftests/damon/debugfs_attrs.sh | 17 -
.../debugfs_duplicate_context_creation.sh | 27 -
.../selftests/damon/debugfs_empty_targets.sh | 21 -
.../damon/debugfs_huge_count_read_write.sh | 22 -
.../damon/debugfs_rm_non_contexts.sh | 19 -
.../selftests/damon/debugfs_schemes.sh | 19 -
.../selftests/damon/debugfs_target_ids.sh | 19 -
.../damon/debugfs_target_ids_pid_leak.c | 68 -
.../damon/debugfs_target_ids_pid_leak.sh | 22 -
...fs_target_ids_read_before_terminate_race.c | 80 --
...s_target_ids_read_before_terminate_race.sh | 14 -
.../selftests/damon/huge_count_read_write.c | 48 -
23 files changed, 11 insertions(+), 2074 deletions(-)
delete mode 100644 mm/damon/dbgfs.c
delete mode 100644 mm/damon/tests/dbgfs-kunit.h
delete mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_duplicate_context_creation.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_empty_targets.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_huge_count_read_write.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_rm_non_contexts.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_schemes.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids.sh
delete mode 100644 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.c
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.sh
delete mode 100644 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.c
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.sh
delete mode 100644 tools/testing/selftests/damon/huge_count_read_write.c
base-commit: 5ef943709a1b88304aa6e8cb8683a25bf81874f0
--
2.39.5
PACKET socket can retain its fanout membership through link down and up
and leave a fanout while closed regardless of link state.
However, socket was forbidden from joining a fanout while it was not
RUNNING.
This scenario was identified while studying DPDK pmd_af_packet_drv.
Since sockets are only created during initialization, there is no reason
to fail the initialization if a single link is temporarily down.
This patch allows PACKET socket to join a fanout while not RUNNING.
Selftest psock_fanout is extended to test this "fanout while link down"
scenario.
Selftest psock_fanout is also extended to test fanout create/join by
socket that did not bind or specified a protocol, which carries an
implicit bind.
This is the only test that was performed.
Changes:
V04:
* Minimized code change.
* Removed test of ifindex. A socket that went through bind "unlisted" race can
join a fanout.
V03: https://lore.kernel.org/netdev/cover.1728555449.git.gur.stavi@huawei.com
* psock_fanout: add test for joining fanout with unbound socket.
* Test that socket can receive packets before adding it to a fanout match.
This is kind of replaces the RUNNING test that was removed.
* Initialize po->ifindex in packet_create. To -1 if no protocol is specified
and add an explicit initialization to 0 if protocol is specified.
* Refactor relevant code in fanout_add within bind_lock, as a sequence of
if {} else if {}, in order to reduce indentation of nested if statements and
provide specific error codes.
V02: https://lore.kernel.org/netdev/cover.1728382839.git.gur.stavi@huawei.com
* psock_fanout: use explicit loopback up/down instead of toggle.
* psock_fanout: don't try to restore loopback state on failure.
* Rephrase commit message about "leaving a fanout".
V01: https://lore.kernel.org/netdev/cover.1728303615.git.gur.stavi@huawei.com/
Gur Stavi (3):
af_packet: allow fanout_add when socket is not RUNNING
selftests: net/psock_fanout: socket joins fanout when link is down
selftests: net/psock_fanout: unbound socket fanout
net/packet/af_packet.c | 9 +--
tools/testing/selftests/net/psock_fanout.c | 78 +++++++++++++++++++++-
2 files changed, 80 insertions(+), 7 deletions(-)
base-commit: c531f2269a53db5cf64b24baf785ccbcda52970f
--
2.45.2