Hello!
KUnit offers a parameterized testing framework, where tests can be
run multiple times with different inputs. However, the current
implementation uses the same `struct kunit` for each parameter run.
After each run, the test context gets cleaned up, which creates
the following limitations:
a. There is no way to store resources that are accessible across
the individual parameter runs.
b. It's not possible to pass additional context, besides the previous
parameter (and potentially anything else that is stored in the current
test context), to the parameter generator function.
c. Test users are restricted to using pre-defined static arrays
of parameter objects or generate_params() to define their
parameters. There is no flexibility to make a custom dynamic
array without using generate_params(), which can be complex if
generating the next parameter depends on more than just the single
previous parameter.
This patch series resolves these limitations by:
1. [P 1] Giving each parameterized run its own `struct kunit`. It will
remove the need to manage state, such as resetting the `test->priv`
field or the `test->status_comment` after every parameter run.
2. [P 1] Introducing parameterized test context available to all
parameter runs through the parent pointer of type `struct kunit`.
This context won't be used to execute any test logic, but will
instead be used for storing shared resources. Each parameter run
context will have a reference to that parent instance and thus,
have access to those resources.
3. [P 2] Introducing param_init() and param_exit() functions that can
initialize and exit the parameterized test context. They will run once
before and after the parameterized test. param_init() can be used to add
resources to share between parameter runs, pass parameter arrays, and
perform any other setup logic, while param_exit() can be used to clean up
resources that were not managed by the parameterized test and to perform
any other teardown logic.
4. [P 3] Passing the parameterized test context as an additional argument
to generate_params(). This provides generate_params() with more context,
making parameter generation much more flexible. The generate_params()
implementations in the KCSAN and drm/xe tests have been adapted to match
the new function pointer signature.
5. [P 4] Introducing a `params_array` field in `struct kunit`.
This will allow the parameterized test context to have direct
storage of the parameter array, enabling features like using
dynamic parameter arrays or using context beyond just the
previous parameter. This will also enable outputting the KTAP
test plan for a parameterized test when the parameter count is
available.
Patches 5 and 6 add example tests to lib/kunit/kunit-example-test.c to
showcase the new features, and patch 7 updates the KUnit documentation
to reflect all the framework changes.
Thank you!
-Marie
---
Changes in v2:
Link to v1 of this patch series:
https://lore.kernel.org/all/20250729193647.3410634-1-marievic@google.com/
- Establish parameterized testing terminology:
- "parameterized test" will refer to the group of all runs of a single test
function with different parameters.
- "parameter run" will refer to the execution of the test case function with
a single parameter.
- "parameterized test context" is the `struct kunit` that holds the context
for the entire parameterized test.
- "parameter run context" is the `struct kunit` that holds the context of the
individual parameter run.
- A test is defined to be a parameterized test if it was registered with a
generator function.
- Make comment edits to reflect the established terminology.
- If the parameter array was registered in param_init(), require users to
manually pass kunit_array_gen_params() to KUNIT_CASE_PARAM_WITH_INIT() as
the generator function, unless they want to provide their own generator
function. This is to be consistent with the definition of a
parameterized test, i.e. generate_params() is never NULL if it's
a parameterized test.
- Change name of kunit_get_next_param_and_desc() to
kunit_array_gen_params().
- Other minor function name changes such as removing the "__" prefix in front
of internal functions.
- Change signature of get_description() in `struct params_array` to accept
the parameterized test context, as well.
- Output the KTAP test plan for a parameterized test when the parameter count
is available.
- Cover letter was made more concise.
- Edits to the example tests.
- Fix bug of parameterized test init/exit logic being done outside of the
parameterized test check.
- Fix bugs identified by the kernel test robot.
---
Marie Zhussupova (7):
kunit: Add parent kunit for parameterized test context
kunit: Introduce param_init/exit for parameterized test context
management
kunit: Pass parameterized test context to generate_params()
kunit: Enable direct registration of parameter arrays to a KUnit test
kunit: Add example parameterized test with shared resource management
using the Resource API
kunit: Add example parameterized test with direct dynamic parameter
array setup
Documentation: kunit: Document new parameterized test features
Documentation/dev-tools/kunit/usage.rst | 342 +++++++++++++++++++++++-
drivers/gpu/drm/xe/tests/xe_pci.c | 2 +-
include/kunit/test.h | 95 ++++++-
kernel/kcsan/kcsan_test.c | 2 +-
lib/kunit/kunit-example-test.c | 222 +++++++++++++++
lib/kunit/test.c | 87 ++++--
rust/kernel/kunit.rs | 4 +
7 files changed, 726 insertions(+), 28 deletions(-)
--
2.51.0.rc0.205.g4a044479a3-goog
Skipped tests reported by kselftest.h use a different format than KTAP:
there is no explicit test name. Instead, the test name is normally part of
the free-form string after the SKIP keyword:
ok 3 # SKIP test: some reason
Extend the parser to handle those correctly. Use the free-form string as
the test name instead.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: David Gow <davidgow(a)google.com>
---
These patches were originally part of my series "kunit: Introduce UAPI
testing framework" [0], but that isn't going anywhere right now and the
patches are useful on their own.
Both series would go in through the KUnit tree in any case, so there is
no potential for conflicts.
[0] https://lore.kernel.org/lkml/20250717-kunit-kselftests-v5-0-442b711cde2e@li…
---
tools/testing/kunit/kunit_parser.py | 8 +++++---
tools/testing/kunit/test_data/test_is_test_passed-kselftest.log | 3 ++-
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index c176487356e6c94882046b19ea696d750905b8d5..333cd3a4a56b6f26c10aa1a5ecec9858bc57fbd7 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -352,9 +352,9 @@ def parse_test_plan(lines: LineStream, test: Test) -> bool:
lines.pop()
return True
-TEST_RESULT = re.compile(r'^\s*(ok|not ok) ([0-9]+) (- )?([^#]*)( # .*)?$')
+TEST_RESULT = re.compile(r'^\s*(ok|not ok) ([0-9]+) ?(- )?([^#]*)( # .*)?$')
-TEST_RESULT_SKIP = re.compile(r'^\s*(ok|not ok) ([0-9]+) (- )?(.*) # SKIP(.*)$')
+TEST_RESULT_SKIP = re.compile(r'^\s*(ok|not ok) ([0-9]+) ?(- )?(.*) # SKIP ?(.*)$')
def peek_test_name_match(lines: LineStream, test: Test) -> bool:
"""
@@ -379,6 +379,8 @@ def peek_test_name_match(lines: LineStream, test: Test) -> bool:
if not match:
return False
name = match.group(4)
+ if not name:
+ return False
return name == test.name
def parse_test_result(lines: LineStream, test: Test,
@@ -416,7 +418,7 @@ def parse_test_result(lines: LineStream, test: Test,
# Set name of test object
if skip_match:
- test.name = skip_match.group(4)
+ test.name = skip_match.group(4) or skip_match.group(5)
else:
test.name = match.group(4)
diff --git a/tools/testing/kunit/test_data/test_is_test_passed-kselftest.log b/tools/testing/kunit/test_data/test_is_test_passed-kselftest.log
index 65d3f27feaf22a3f47ed831c4c24f6f11c625a92..30d9ef18bcec177067288d5242771236f29b7d56 100644
--- a/tools/testing/kunit/test_data/test_is_test_passed-kselftest.log
+++ b/tools/testing/kunit/test_data/test_is_test_passed-kselftest.log
@@ -1,5 +1,5 @@
TAP version 13
-1..2
+1..3
# selftests: membarrier: membarrier_test_single_thread
# TAP version 13
# 1..2
@@ -12,3 +12,4 @@ ok 1 selftests: membarrier: membarrier_test_single_thread
# ok 1 sys_membarrier available
# ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected
ok 2 selftests: membarrier: membarrier_test_multi_thread
+ok 3 # SKIP selftests: membarrier: membarrier_test_multi_thread
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250813-kunit-kselftesth-skip-e289becd9746
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
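The name fallback in the kunit_parser.py patch above can be tried in isolation. The sketch below copies the patched TEST_RESULT_SKIP pattern from the diff; the helper `skipped_test_name()` and the sample KTAP line are hypothetical and only serve to exercise the pattern.

```python
import re

# Pattern copied from the patched kunit_parser.py in the diff above.
TEST_RESULT_SKIP = re.compile(r'^\s*(ok|not ok) ([0-9]+) ?(- )?(.*) # SKIP ?(.*)$')


def skipped_test_name(line):
    """Return the test name for a SKIP result line, or None if it is not one."""
    match = TEST_RESULT_SKIP.match(line)
    if not match:
        return None
    # Group 4 is the explicit KTAP name; fall back to the free-form SKIP text.
    return match.group(4) or match.group(5)


# KTAP style: explicit name before the SKIP directive (hypothetical sample line).
ktap_name = skipped_test_name("ok 3 - example_test # SKIP not supported")
# kselftest.h style: no explicit name, so the free-form string is used.
kselftest_name = skipped_test_name("ok 3 # SKIP test: some reason")
# A plain result line is not a SKIP line at all.
not_skipped = skipped_test_name("ok 1 - example_test")
```

For the kselftest.h line, group 4 matches the empty string, which is why the patch falls back to group 5 with `or`.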
For kbuild to properly clean up these build artifacts in the subdirectory,
even after CONFIG_KUNIT is changed to disabled, the directory always needs
to be processed.
Pushing the special logic for hook.o into the kunit Makefile also makes the
logic easier to understand.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: David Gow <davidgow(a)google.com>
---
This patch was originally part of my series "kunit: Introduce UAPI
testing framework" [0], but that isn't going anywhere right now and the
patch is useful on its own.
Changes to the original series:
* Make the commit message more general, since the same issue affects all
build artifacts.
[0] https://lore.kernel.org/lkml/20250717-kunit-kselftests-v5-0-442b711cde2e@li…
---
lib/Makefile | 4 ----
lib/kunit/Makefile | 2 +-
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/lib/Makefile b/lib/Makefile
index 392ff808c9b90210849e397356d1aa435a47bd07..15a03f4c16e2cd6c75297005e71fa2108c1f41f2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -109,11 +109,7 @@ test_fpu-y := test_fpu_glue.o test_fpu_impl.o
CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU)
CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU)
-# Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
-# so we can't just use obj-$(CONFIG_KUNIT).
-ifdef CONFIG_KUNIT
obj-y += kunit/
-endif
ifeq ($(CONFIG_DEBUG_KOBJECT),y)
CFLAGS_kobject.o += -DDEBUG
diff --git a/lib/kunit/Makefile b/lib/kunit/Makefile
index 5aa51978e456ab3bb60c12071a26cf2bdcb1b508..656f1fa35abcc635e67d5b4cb1bc586b48415ac5 100644
--- a/lib/kunit/Makefile
+++ b/lib/kunit/Makefile
@@ -17,7 +17,7 @@ kunit-objs += debugfs.o
endif
# KUnit 'hooks' are built-in even when KUnit is built as a module.
-obj-y += hooks.o
+obj-$(if $(CONFIG_KUNIT),y) += hooks.o
obj-$(CONFIG_KUNIT_TEST) += kunit-test.o
obj-$(CONFIG_KUNIT_TEST) += platform-test.o
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250813-kunit-always-descend-39e502a8b2b9
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
This series introduces NUMA-aware memory placement support for KVM guests
with guest_memfd memory backends. It builds upon Fuad Tabba's work (V17)
that enabled host-mapping for guest_memfd memory [1].
== Background ==
KVM's guest_memfd memory backend currently lacks support for NUMA policy
enforcement, causing guest memory allocations to be distributed across host
nodes according to the kernel's default behavior, irrespective of any policy
specified by the VMM. This limitation arises because conventional userspace
NUMA control mechanisms like mbind(2) don't work since the memory isn't
directly mapped to userspace when allocations occur.
Fuad's work [1] provides the necessary mmap capability, and this series
leverages it to enable mbind(2).
== Implementation ==
This series implements proper NUMA policy support for guest_memfd by:
1. Adding mempolicy-aware allocation APIs to the filemap layer.
2. Introducing custom inodes (via a dedicated slab-allocated inode cache,
kvm_gmem_inode_info) to store NUMA policy and metadata for guest memory.
3. Implementing get/set_policy vm_ops in guest_memfd to support NUMA
policy.
With these changes, VMMs can now control guest memory placement by mapping
the guest_memfd file descriptor and using mbind(2) to specify:
- Policy modes: default, bind, interleave, or preferred
- Host NUMA nodes: List of target nodes for memory allocation
These policies affect only future allocations and do not migrate existing
memory. This matches mbind(2)'s default behavior, which affects only new
allocations unless overridden with MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags (not
supported for guest_memfd as it is unmovable by design).
== Upstream Plan ==
Phased approach as per David's guest_memfd extension overview [2] and
community calls [3]:
Phase 1 (this series):
1. Focuses on shared guest_memfd support (non-CoCo VMs).
2. Builds on Fuad's host-mapping work.
Phase 2 (future work):
1. NUMA support for private guest_memfd (CoCo VMs).
2. Depends on SNP in-place conversion support [4].
This series provides a clean integration path for NUMA-aware memory
management for guest_memfd and lays the groundwork for future confidential
computing NUMA capabilities.
Please review and provide feedback!
Thanks,
Shivank
== Changelog ==
- v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy.
- v3: Introduced fbind() syscall for VMM memory-placement configuration.
- v4-v6: Current approach using shared_policy support and vm_ops (based on
suggestions from David [5] and guest_memfd bi-weekly upstream
call discussion [6]).
- v7: Use inodes to store NUMA policy instead of file [7].
- v8: Rebase on top of Fuad's V12: Host mmapping for guest_memfd memory.
- v9: Rebase on top of Fuad's V13 and incorporate review comments.
- v10: Rebase on top of Fuad's V17. Use latest guest_memfd inode patch
from Ackerley (with David's review comments). Use newer kmem_cache_create()
API variant with arg parameter (Vlastimil).
[1] https://lore.kernel.org/all/20250729225455.670324-1-seanjc@google.com
[2] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com
[3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAo…
[4] https://lore.kernel.org/all/20250613005400.3694904-1-michael.roth@amd.com
[5] https://lore.kernel.org/all/6fbef654-36e2-4be5-906e-2a648a845278@redhat.com
[6] https://lore.kernel.org/all/2b77e055-98ac-43a1-a7ad-9f9065d7f38f@amd.com
[7] https://lore.kernel.org/all/diqzbjumm167.fsf@ackerleytng-ctop.c.googlers.com
Ackerley Tng (1):
KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes
Matthew Wilcox (Oracle) (2):
mm/filemap: Add NUMA mempolicy support to filemap_alloc_folio()
mm/filemap: Extend __filemap_get_folio() to support NUMA memory
policies
Shivank Garg (4):
mm/mempolicy: Export memory policy symbols
KVM: guest_memfd: Add slab-allocated inode cache
KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
KVM: guest_memfd: selftests: Add tests for mmap and NUMA policy
support
fs/bcachefs/fs-io-buffered.c | 2 +-
fs/btrfs/compression.c | 4 +-
fs/btrfs/verity.c | 2 +-
fs/erofs/zdata.c | 2 +-
fs/f2fs/compress.c | 2 +-
include/linux/pagemap.h | 18 +-
include/uapi/linux/magic.h | 1 +
mm/filemap.c | 23 +-
mm/mempolicy.c | 6 +
mm/readahead.c | 2 +-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 121 ++++++++
virt/kvm/guest_memfd.c | 260 ++++++++++++++++--
virt/kvm/kvm_main.c | 7 +-
virt/kvm/kvm_mm.h | 9 +-
15 files changed, 410 insertions(+), 50 deletions(-)
--
2.43.0
_common.sh was recently introduced but is not installed and then
triggers an error when trying to run the damon selftests:
selftests: damon: sysfs.sh
./sysfs.sh: line 4: _common.sh: No such file or directory
Install this file to avoid this error.
Fixes: 511914506d19 ("selftests/damon: introduce _common.sh to host shared function")
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
---
tools/testing/selftests/damon/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile
index 5b230deb19e8ee6cee56eb8f18c35e12f331e8b7..ddc69e8bde2905ff1c461a08f2ad008e6b28ac87 100644
--- a/tools/testing/selftests/damon/Makefile
+++ b/tools/testing/selftests/damon/Makefile
@@ -4,6 +4,7 @@
TEST_GEN_FILES += access_memory access_memory_even
TEST_FILES = _damon_sysfs.py
+TEST_FILES += _common.sh
# functionality tests
TEST_PROGS += sysfs.sh
---
base-commit: 2754d549af31f8f029f02d02cd8e574676229b3d
change-id: 20250812-alex-fixes_manual-aed3ef75dd83
Best regards,
--
Alexandre Ghiti <alexghiti(a)rivosinc.com>
TLS expects that it owns the receive queue of the TCP socket.
This cannot be guaranteed in case the reader of the TCP socket
entered before the TLS ULP was installed, or uses some non-standard
read API (e.g. zerocopy ones). Replace the WARN_ON() and a buggy
early exit (which leaves anchor pointing to a freed skb) with real
error handling. Wipe the parsing state and tell the reader to retry.
We already reload the anchor every time we (re)acquire the socket lock,
so the only condition we need to avoid is an out of bounds read
(not having enough bytes in the socket for previously parsed record len).
If some data was read from under TLS but there's enough in the queue,
we'll reload and decrypt what is most likely not a valid TLS record.
This leads to some undefined behavior from the TLS perspective (corrupting
a stream? missing an alert? missing an attack?) but no kernel crash
should take place.
Reported-by: William Liu <will(a)willsroot.io>
Reported-by: Savino Dicanosa <savy(a)syst3mfailure.io>
Link: https://lore.kernel.org/tFjq_kf7sWIG3A7CrCg_egb8CVsT_gsmHAK0_wxDPJXfIzxFAMx…
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
v2:
- fix the reporter tags
- drop the copied_seq nonsense, just correct the error handling
v1: https://lore.kernel.org/20250806180510.3656677-1-kuba@kernel.org
---
net/tls/tls.h | 2 +-
net/tls/tls_strp.c | 11 ++++++++---
net/tls/tls_sw.c | 3 ++-
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index 774859b63f0d..4e077068e6d9 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -196,7 +196,7 @@ void tls_strp_msg_done(struct tls_strparser *strp);
int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb);
void tls_rx_msg_ready(struct tls_strparser *strp);
-void tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh);
+bool tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh);
int tls_strp_msg_cow(struct tls_sw_context_rx *ctx);
struct sk_buff *tls_strp_msg_detach(struct tls_sw_context_rx *ctx);
int tls_strp_msg_hold(struct tls_strparser *strp, struct sk_buff_head *dst);
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index 095cf31bae0b..d71643b494a1 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -475,7 +475,7 @@ static void tls_strp_load_anchor_with_queue(struct tls_strparser *strp, int len)
strp->stm.offset = offset;
}
-void tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh)
+bool tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh)
{
struct strp_msg *rxm;
struct tls_msg *tlm;
@@ -484,8 +484,11 @@ void tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh)
DEBUG_NET_WARN_ON_ONCE(!strp->stm.full_len);
if (!strp->copy_mode && force_refresh) {
- if (WARN_ON(tcp_inq(strp->sk) < strp->stm.full_len))
- return;
+ if (unlikely(tcp_inq(strp->sk) < strp->stm.full_len)) {
+ WRITE_ONCE(strp->msg_ready, 0);
+ memset(&strp->stm, 0, sizeof(strp->stm));
+ return false;
+ }
tls_strp_load_anchor_with_queue(strp, strp->stm.full_len);
}
@@ -495,6 +498,8 @@ void tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh)
rxm->offset = strp->stm.offset;
tlm = tls_msg(strp->anchor);
tlm->control = strp->mark;
+
+ return true;
}
/* Called with lock held on lower socket */
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 549d1ea01a72..51c98a007dda 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1384,7 +1384,8 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
return sock_intr_errno(timeo);
}
- tls_strp_msg_load(&ctx->strp, released);
+ if (unlikely(!tls_strp_msg_load(&ctx->strp, released)))
+ return tls_rx_rec_wait(sk, psock, nonblock, false);
return 1;
}
--
2.50.1
A few tweaks to the devmem test to make it more "NIPA-compatible".
We still need a fix to make sure that the test sets hds threshold
to 0. Taehee is presumably already/still working on that:
https://lore.kernel.org/20250702104249.1665034-1-ap420073@gmail.com
so I'm not including my version.
# ./tools/testing/selftests/drivers/net/hw/devmem.py
TAP version 13
1..3
ok 1 devmem.check_rx
ok 2 devmem.check_tx
ok 3 devmem.check_tx_chunks
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski (5):
selftests: drv-net: add configs for zerocopy Rx
selftests: drv-net: devmem: remove sudo from system() calls
selftests: drv-net: devmem: add / correct the IPv6 support
selftests: net: terminate bkg() commands on exception
selftests: drv-net: devmem: flip the direction of Tx tests
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 14 +++++++-------
tools/testing/selftests/drivers/net/hw/config | 2 ++
tools/testing/selftests/drivers/net/hw/devmem.py | 14 +++++++-------
tools/testing/selftests/net/lib/py/utils.py | 5 ++++-
4 files changed, 20 insertions(+), 15 deletions(-)
--
2.50.1
__kunit_add_resource() currently does the following
things in order: initializes the resource refcount to 1,
initializes the resource, and adds the resource to
the test's resource list. Currently, __kunit_add_resource()
only fails if the resource initialization fails.
The kunit_alloc_and_get_resource()
and kunit_alloc_resource() functions allocate memory
for `struct kunit_resource`. However, if the subsequent
call to __kunit_add_resource() fails, the functions
return NULL without releasing the memory.
This patch adds calls to kunit_put_resource() in these
functions before returning NULL to decrease the refcount
of the resource that failed to initialize to 0. This will
trigger kunit_release_resource(), which will both call
kunit_resource->free and kfree() on the resource.
Since kunit_resource->free is user defined, comments
were added to note that kunit_resource->free()
should be able to handle any inconsistent state
that may result from the resource init failure.
Signed-off-by: Marie Zhussupova <marievic(a)google.com>
---
include/kunit/resource.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/kunit/resource.h b/include/kunit/resource.h
index 4ad69a2642a5..2585e9a5242d 100644
--- a/include/kunit/resource.h
+++ b/include/kunit/resource.h
@@ -216,7 +216,9 @@ static inline int kunit_add_named_resource(struct kunit *test,
* kunit_alloc_and_get_resource() - Allocates and returns a *test managed resource*.
* @test: The test context object.
* @init: a user supplied function to initialize the resource.
- * @free: a user supplied function to free the resource (if needed).
+ * @free: a user supplied function to free the resource (if needed). Note that,
+ * if supplied, @free will run even if @init fails: Make sure it can handle any
+ * inconsistent state which may result.
* @internal_gfp: gfp to use for internal allocations, if unsure, use GFP_KERNEL
* @context: for the user to pass in arbitrary data to the init function.
*
@@ -258,6 +260,7 @@ kunit_alloc_and_get_resource(struct kunit *test,
kunit_get_resource(res);
return res;
}
+ kunit_put_resource(res);
return NULL;
}
@@ -265,7 +268,9 @@ kunit_alloc_and_get_resource(struct kunit *test,
* kunit_alloc_resource() - Allocates a *test managed resource*.
* @test: The test context object.
* @init: a user supplied function to initialize the resource.
- * @free: a user supplied function to free the resource (if needed).
+ * @free: a user supplied function to free the resource (if needed). Note that,
+ * if supplied, @free will run even if @init fails: Make sure it can handle any
+ * inconsistent state which may result.
* @internal_gfp: gfp to use for internal allocations, if unsure, use GFP_KERNEL
* @context: for the user to pass in arbitrary data to the init function.
*
@@ -293,6 +298,7 @@ static inline void *kunit_alloc_resource(struct kunit *test,
if (!__kunit_add_resource(test, init, free, res, context))
return res->data;
+ kunit_put_resource(res);
return NULL;
}
--
2.51.0.rc0.205.g4a044479a3-goog
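The refcount behaviour described in the commit message above can be modelled in Python. This is a conceptual sketch only, not the kernel implementation: the `Resource` class and the `put_resource()` and `alloc_resource()` helpers are hypothetical stand-ins for `struct kunit_resource`, kunit_put_resource() and kunit_alloc_resource(), and the model ignores the reference that the test's resource list would hold after a successful add.

```python
class Resource:
    """Toy stand-in for `struct kunit_resource` (hypothetical model)."""

    def __init__(self, free=None):
        self.refcount = 1      # starts at 1, as the commit message describes
        self.data = None
        self.free = free
        self.released = False  # stands in for the final kfree()


def put_resource(res):
    """Drop one reference; run the user's free callback when it reaches 0."""
    res.refcount -= 1
    if res.refcount == 0:
        if res.free:
            res.free(res)
        res.released = True


def alloc_resource(init, free, context):
    """Return the resource data, or None if init fails (0 means success)."""
    res = Resource(free)
    if init(res, context) == 0:
        return res.data
    put_resource(res)  # the fix: release the resource instead of leaking it
    return None


freed = []


def init_ok(res, context):
    res.data = bytearray(context)
    return 0


def init_fail(res, context):
    return -12  # models an -ENOMEM style failure; res.data was never set


def free_buf(res):
    # Must tolerate res.data being None, because init may have failed.
    if res.data is not None:
        res.data.clear()
    freed.append(res)


ok = alloc_resource(init_ok, free_buf, 4)
failed = alloc_resource(init_fail, free_buf, 4)
```

The `free_buf()` callback checks for missing data before touching it, which is the kind of defensive handling the new kernel-doc note asks for.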
On fast machines the tests run in quick succession, so even
when tests clean up after themselves the carrier may need
some time to come back.
Specifically, in NIPA, when ping.py runs right after netpoll_basic.py
the first ping command fails.
Since the context manager callbacks are now common, NetDrvEpEnv
gets an ip link up call as well.
Fixes: b4db9f840283 ("selftests: drivers: add scaffolding for Netlink tests in Python")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: willemb(a)google.com
CC: petrm(a)nvidia.com
CC: linux-kselftest(a)vger.kernel.org
---
.../selftests/drivers/net/lib/py/__init__.py | 2 +-
.../selftests/drivers/net/lib/py/env.py | 38 +++++++++----------
tools/testing/selftests/net/lib/py/utils.py | 18 +++++++++
3 files changed, 36 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/lib/py/__init__.py b/tools/testing/selftests/drivers/net/lib/py/__init__.py
index 8711c67ad658..a07b56a75c8a 100644
--- a/tools/testing/selftests/drivers/net/lib/py/__init__.py
+++ b/tools/testing/selftests/drivers/net/lib/py/__init__.py
@@ -15,7 +15,7 @@ KSFT_DIR = (Path(__file__).parent / "../../../..").resolve()
NlError, RtnlFamily, DevlinkFamily
from net.lib.py import CmdExitFailure
from net.lib.py import bkg, cmd, bpftool, bpftrace, defer, ethtool, \
- fd_read_timeout, ip, rand_port, tool, wait_port_listen
+ fd_read_timeout, ip, rand_port, tool, wait_port_listen, wait_file
from net.lib.py import fd_read_timeout
from net.lib.py import KsftSkipEx, KsftFailEx, KsftXfailEx
from net.lib.py import ksft_disruptive, ksft_exit, ksft_pr, ksft_run, \
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index 1b8bd648048f..1de63734ddec 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -4,7 +4,7 @@ import os
import time
from pathlib import Path
from lib.py import KsftSkipEx, KsftXfailEx
-from lib.py import ksft_setup
+from lib.py import ksft_setup, wait_file
from lib.py import cmd, ethtool, ip, CmdExitFailure
from lib.py import NetNS, NetdevSimDev
from .remote import Remote
@@ -25,6 +25,9 @@ from .remote import Remote
self.env = self._load_env_file()
+ # Following attrs must be set be inheriting classes
+ self.dev = None
+
def _load_env_file(self):
env = os.environ.copy()
@@ -48,6 +51,19 @@ from .remote import Remote
env[pair[0]] = pair[1]
return ksft_setup(env)
+ def __enter__(self):
+ ip(f"link set dev {self.dev['ifname']} up")
+ wait_file(f"/sys/class/net/{self.dev['ifname']}/carrier",
+ lambda x: x.strip() == "1")
+
+ return self
+
+ def __exit__(self, ex_type, ex_value, ex_tb):
+ """
+ __exit__ gets called at the end of a "with" block.
+ """
+ self.__del__()
+
class NetDrvEnv(NetDrvEnvBase):
"""
@@ -72,17 +88,6 @@ from .remote import Remote
self.ifname = self.dev['ifname']
self.ifindex = self.dev['ifindex']
- def __enter__(self):
- ip(f"link set dev {self.dev['ifname']} up")
-
- return self
-
- def __exit__(self, ex_type, ex_value, ex_tb):
- """
- __exit__ gets called at the end of a "with" block.
- """
- self.__del__()
-
def __del__(self):
if self._ns:
self._ns.remove()
@@ -219,15 +224,6 @@ from .remote import Remote
raise Exception("Can't resolve remote interface name, multiple interfaces match")
return v6[0]["ifname"] if v6 else v4[0]["ifname"]
- def __enter__(self):
- return self
-
- def __exit__(self, ex_type, ex_value, ex_tb):
- """
- __exit__ gets called at the end of a "with" block.
- """
- self.__del__()
-
def __del__(self):
if self._ns:
self._ns.remove()
diff --git a/tools/testing/selftests/net/lib/py/utils.py b/tools/testing/selftests/net/lib/py/utils.py
index f395c90fb0f1..c42bffea0d87 100644
--- a/tools/testing/selftests/net/lib/py/utils.py
+++ b/tools/testing/selftests/net/lib/py/utils.py
@@ -249,3 +249,21 @@ global_defer_queue = []
if time.monotonic() > end:
raise Exception("Waiting for port listen timed out")
time.sleep(sleep)
+
+
+def wait_file(fname, test_fn, sleep=0.005, deadline=5, encoding='utf-8'):
+ """
+ Wait for file contents on the local system to satisfy a condition.
+ test_fn() should take one argument (file contents) and return whether
+ condition is met.
+ """
+ end = time.monotonic() + deadline
+
+ with open(fname, "r", encoding=encoding) as fp:
+ while True:
+ if test_fn(fp.read()):
+ break
+ fp.seek(0)
+ if time.monotonic() > end:
+ raise TimeoutError("Wait for file contents failed", fname)
+ time.sleep(sleep)
--
2.50.1
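The wait_file() helper added by the patch above can be exercised without a network device. In this sketch the function body is copied from the diff, while the temporary file standing in for /sys/class/net/<ifname>/carrier and the short 0.05 second deadline are assumptions made for the demonstration.

```python
import os
import tempfile
import time


def wait_file(fname, test_fn, sleep=0.005, deadline=5, encoding='utf-8'):
    """
    Wait for file contents on the local system to satisfy a condition.
    test_fn() should take one argument (file contents) and return whether
    condition is met. (Copied from the patch above.)
    """
    end = time.monotonic() + deadline

    with open(fname, "r", encoding=encoding) as fp:
        while True:
            if test_fn(fp.read()):
                break
            fp.seek(0)
            if time.monotonic() > end:
                raise TimeoutError("Wait for file contents failed", fname)
            time.sleep(sleep)


# A temporary file stands in for the sysfs carrier attribute.
with tempfile.NamedTemporaryFile("w", suffix=".carrier", delete=False) as tmp:
    tmp.write("1\n")
    path = tmp.name

try:
    # Condition already holds, so this returns on the first read.
    wait_file(path, lambda x: x.strip() == "1")

    # Condition never holds, so the helper gives up after the deadline.
    try:
        wait_file(path, lambda x: x.strip() == "0", deadline=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True
finally:
    os.unlink(path)
```

The helper keeps one file handle open and rewinds it with seek(0) between polls instead of reopening the file each time.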
Extend the existing netconsole cmdline selftest to also validate that
interface selection can be performed via MAC address.
The test now validates that netconsole works with both interface name
and MAC address, improving test coverage.
Suggested-by: Breno Leitao <leitao(a)debian.org>
Signed-off-by: Andre Carvalho <asantostc(a)gmail.com>
---
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 10 +++-
.../selftests/drivers/net/netcons_cmdline.sh | 55 +++++++++++++---------
2 files changed, 42 insertions(+), 23 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
index b6071e80ebbb6a33283ab6cd6bcb7b925aefdb43..8e1085e896472d5c87ec8b236240878a5b2d00d2 100644
--- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
+++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
@@ -148,12 +148,20 @@ function create_dynamic_target() {
# Generate the command line argument for netconsole following:
# netconsole=[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
function create_cmdline_str() {
+ local BINDMODE=${1:-"ifname"}
+ if [ "${BINDMODE}" == "ifname" ]
+ then
+ SRCDEV=${SRCIF}
+ else
+ SRCDEV=$(mac_get "${SRCIF}")
+ fi
+
DSTMAC=$(ip netns exec "${NAMESPACE}" \
ip link show "${DSTIF}" | awk '/ether/ {print $2}')
SRCPORT="1514"
TGTPORT="6666"
- echo "netconsole=\"+${SRCPORT}@${SRCIP}/${SRCIF},${TGTPORT}@${DSTIP}/${DSTMAC}\""
+ echo "netconsole=\"+${SRCPORT}@${SRCIP}/${SRCDEV},${TGTPORT}@${DSTIP}/${DSTMAC}\""
}
# Do not append the release to the header of the message
diff --git a/tools/testing/selftests/drivers/net/netcons_cmdline.sh b/tools/testing/selftests/drivers/net/netcons_cmdline.sh
index ad2fb8b1c46326c69af20f2c9d68e80fa8eb894f..a15149f3a905d7287258cd17f0e806fb50604cf4 100755
--- a/tools/testing/selftests/drivers/net/netcons_cmdline.sh
+++ b/tools/testing/selftests/drivers/net/netcons_cmdline.sh
@@ -17,10 +17,6 @@ source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
check_netconsole_module
modprobe netdevsim 2> /dev/null || true
-rmmod netconsole 2> /dev/null || true
-
-# The content of kmsg will be save to the following file
-OUTPUT_FILE="/tmp/${TARGET}"
# Check for basic system dependency and exit if not found
# check_for_dependencies
@@ -30,23 +26,38 @@ echo "6 5" > /proc/sys/kernel/printk
trap do_cleanup EXIT
# Create one namespace and two interfaces
set_network
-# Create the command line for netconsole, with the configuration from the
-# function above
-CMDLINE="$(create_cmdline_str)"
-
-# Load the module, with the cmdline set
-modprobe netconsole "${CMDLINE}"
-
-# Listed for netconsole port inside the namespace and destination interface
-listen_port_and_save_to "${OUTPUT_FILE}" &
-# Wait for socat to start and listen to the port.
-wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
-# Send the message
-echo "${MSG}: ${TARGET}" > /dev/kmsg
-# Wait until socat saves the file to disk
-busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
-# Make sure the message was received in the dst part
-# and exit
-validate_msg "${OUTPUT_FILE}"
+
+# Run the test twice, with different cmdline parameters
+for BINDMODE in "ifname" "mac"
+do
+ echo "Running with bind mode: ${BINDMODE}"
+ # Create the command line for netconsole, with the configuration from the
+ # function above
+ CMDLINE="$(create_cmdline_str "${BINDMODE}")"
+
+ # The content of kmsg will be save to the following file
+ OUTPUT_FILE="/tmp/${TARGET}-${BINDMODE}"
+
+ # Unload the module, if present
+ rmmod netconsole 2> /dev/null || true
+ # Load the module, with the cmdline set
+ modprobe netconsole "${CMDLINE}"
+
+ # Listed for netconsole port inside the namespace and destination interface
+ listen_port_and_save_to "${OUTPUT_FILE}" &
+ # Wait for socat to start and listen to the port.
+ wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+ # Send the message
+ echo "${MSG}: ${TARGET}" > /dev/kmsg
+ # Wait until socat saves the file to disk
+ busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+ # Make sure the message was received in the dst part
+ # and exit
+ validate_msg "${OUTPUT_FILE}"
+
+ # kill socat in case it is still running
+ pkill_socat
+ echo "${BINDMODE} : Test passed" >&2
+done
exit "${ksft_pass}"
---
base-commit: 37816488247ddddbc3de113c78c83572274b1e2e
change-id: 20250807-netcons-cmdline-selftest-b32e27a4bd16
Best regards,
--
Andre Carvalho <asantostc(a)gmail.com>
Running the test added with a recent fix on a driver with persistent
NAPI config leads to a deadlock. The deadlock is fixed by patch 3;
patch 2 addresses what I think is a more fundamental problem with the
way we implemented the config.
I hope the fix makes sense; my own thinking is definitely colored
by my preference (IOW how the per-queue config RFC was implemented).
v2: add missing kdoc
v1: https://lore.kernel.org/20250808014952.724762-1-kuba@kernel.org
Jakub Kicinski (3):
selftests: drv-net: don't assume device has only 2 queues
net: update NAPI threaded config even for disabled NAPIs
net: prevent deadlocks when enabling NAPIs with mixed kthread config
include/linux/netdevice.h | 5 ++++-
net/core/dev.h | 8 ++++++++
net/core/dev.c | 12 +++++++++---
tools/testing/selftests/drivers/net/napi_threaded.py | 10 ++++++----
4 files changed, 27 insertions(+), 8 deletions(-)
--
2.50.1
I looked at the fchmodat2() tests since I've been experiencing some
random intermittent segfaults with them in my test systems. While doing
so I noticed these two issues. Unfortunately I haven't figured out the
original problem yet, unless I managed to fix it unwittingly.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.17-rc1.
- Link to v1: https://lore.kernel.org/r/20250714-selftests-fchmodat2-v1-0-b74f3ee0d09c@ke…
---
Mark Brown (2):
selftests/fchmodat2: Clean up temporary files and directories
selftests/fchmodat2: Use ksft_finished()
tools/testing/selftests/fchmodat2/fchmodat2_test.c | 166 ++++++++++++++-------
1 file changed, 112 insertions(+), 54 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250711-selftests-fchmodat2-c30374c376f8
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Currently, testing of userspace and in-kernel APIs uses two different
frameworks: kselftests for the userspace ones and Kunit for the
in-kernel ones. Besides their different scopes, both have different
strengths and limitations:
Kunit:
* Tests are normal kernel code.
* They use the regular kernel toolchain.
* They can be packaged and distributed as modules conveniently.
Kselftests:
* Tests are normal userspace code
* They need a userspace toolchain.
A kernel cross toolchain is likely not enough.
* A fair amount of userland is required to run the tests,
which means a full distro or handcrafted rootfs.
* There is no way to conveniently package and run kselftests with a
given kernel image.
* The kselftests makefiles are not as powerful as regular kbuild.
For example they are missing proper header dependency tracking or more
complex compiler option modifications.
Therefore kunit is much easier to run against different kernel
configurations and architectures.
This series aims to combine kselftests and kunit, avoiding both their
limitations. It works by compiling the userspace kselftests as part of
the regular kernel build, embedding them into the kunit kernel or module
and executing them from there. If the kernel toolchain is not fit to
produce userspace because of a missing libc, the kernel's own nolibc can
be used instead.
The structured TAP output from the kselftest is integrated into the
kunit KTAP output transparently, so the kunit parser can parse the
combined logs together.
Further room for improvements:
* Call each test in its completely dedicated namespace
* Handle additional test files besides the test executable through
archives. CPIO, cramfs, etc.
* Compatibility with kselftest_harness.h (in progress)
* Expose the blobs in debugfs
* Provide some convenience wrappers around compat userprogs
* Figure out a migration path/coexistence solution for
kunit UAPI and tools/testing/selftests/
Output from the kunit example testcase, note the output of
"example_uapi_tests".
$ ./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit example
...
Running tests with:
$ .kunit/linux kunit.filter_glob=example kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[11:53:53] ================== example (10 subtests) ===================
[11:53:53] [PASSED] example_simple_test
[11:53:53] [SKIPPED] example_skip_test
[11:53:53] [SKIPPED] example_mark_skipped_test
[11:53:53] [PASSED] example_all_expect_macros_test
[11:53:53] [PASSED] example_static_stub_test
[11:53:53] [PASSED] example_static_stub_using_fn_ptr_test
[11:53:53] [PASSED] example_priv_test
[11:53:53] =================== example_params_test ===================
[11:53:53] [SKIPPED] example value 3
[11:53:53] [PASSED] example value 2
[11:53:53] [PASSED] example value 1
[11:53:53] [SKIPPED] example value 0
[11:53:53] =============== [PASSED] example_params_test ===============
[11:53:53] [PASSED] example_slow_test
[11:53:53] ======================= (4 subtests) =======================
[11:53:53] [PASSED] procfs
[11:53:53] [PASSED] userspace test 2
[11:53:53] [SKIPPED] userspace test 3: some reason
[11:53:53] [PASSED] userspace test 4
[11:53:53] ================ [PASSED] example_uapi_test ================
[11:53:53] ===================== [PASSED] example =====================
[11:53:53] ============================================================
[11:53:53] Testing complete. Ran 16 tests: passed: 11, skipped: 5
[11:53:53] Elapsed time: 67.543s total, 1.823s configuring, 65.655s building, 0.058s running
Based on v6.16-rc1.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v5:
- Initialize output variable of kernel_wait()
- Fix .incbin with in-tree builds
- Keep requirement of KTAP tests to have a number which was removed accidentally
- Only synthesize KTAP subtest failure if the outer one is TestStatus.FAILURE
- Use -I instead of -isystem in NOLIBC_USERCFLAGS to populate dependency files
- +To filesystem developers to all patches
- +To Luis Chamberlain for discussions about usage of usermodehelper
(see patches 6 and 12)
- Link to v4: https://lore.kernel.org/r/20250626-kunit-kselftests-v4-0-48760534fef5@linut…
Changes in v4:
- Move Kconfig.nolibc from tools/ to init/
- Drop generic userprogs nolibc integration
- Drop generic blob framework
- Pick up review tags from David
- Extend new kunit TAP parser tests
- Add MAINTAINERS entry
- Allow CONFIG_KUNIT_UAPI=m
- Split /proc validation into dedicated UAPI test
- Trim recipient list a bit
- Use KUNIT_FAIL_AND_ABORT() over KUNIT_FAIL()
- Link to v3: https://lore.kernel.org/r/20250611-kunit-kselftests-v3-0-55e3d148cbc6@linut…
Changes in v3:
- Reintroduce CONFIG_CC_CAN_LINK_STATIC
- Enable CONFIG_ARCH_HAS_NOLIBC for m68k and SPARC
- Properly handle 'clean' target for userprogs
- Use ramfs over tmpfs to reduce dependencies
- Inherit userprogs byte order and ABI from kernel
- Drop now unnecessary "#ifndef NOLIBC"
- Pick up review tags
- Drop usage of __private in blob.h,
sparse complains and it is not really necessary
- Fix execution on loongarch when using clang
- Drop userprogs libgcc handling, it was ugly and is not yet necessary
- Link to v2: https://lore.kernel.org/r/20250407-kunit-kselftests-v2-0-454114e287fd@linut…
Changes in v2:
- Rebase onto v6.15-rc1
- Add documentation and kernel docs
- Resolve invalid kconfig breakages
- Drop already applied patch "kbuild: implement CONFIG_HEADERS_INSTALL for Usermode Linux"
- Drop userprogs CONFIG_WERROR integration, it doesn't need to be part of this series
- Replace patch prefix "kconfig" with "kbuild"
- Rename kunit_uapi_run_executable() to kunit_uapi_run_kselftest()
- Generate private, conflict-free symbols in the blob framework
- Handle kselftest exit codes
- Handle SIGABRT
- Forward output also to kunit debugfs log
- Install a fd=0 stdin filedescriptor
- Link to v1: https://lore.kernel.org/r/20250217-kunit-kselftests-v1-0-42b4524c3b0a@linut…
---
Thomas Weißschuh (15):
kbuild: userprogs: avoid duplication of flags inherited from kernel
kbuild: userprogs: also inherit byte order and ABI from kernel
kbuild: doc: add label for userprogs section
init: re-add CONFIG_CC_CAN_LINK_STATIC
init: add nolibc build support
fs,fork,exit: export symbols necessary for KUnit UAPI support
kunit: tool: Add test for nested test result reporting
kunit: tool: Don't overwrite test status based on subtest counts
kunit: tool: Parse skipped tests from kselftest.h
kunit: Always descend into kunit directory during build
kunit: qemu_configs: loongarch: Enable LSX/LSAX
kunit: Introduce UAPI testing framework
kunit: uapi: Add example for UAPI tests
kunit: uapi: Introduce preinit executable
kunit: uapi: Validate usability of /proc
Documentation/dev-tools/kunit/api/index.rst | 5 +
Documentation/dev-tools/kunit/api/uapi.rst | 14 +
Documentation/kbuild/makefiles.rst | 2 +
MAINTAINERS | 11 +
Makefile | 7 +-
fs/exec.c | 2 +
fs/file.c | 1 +
fs/filesystems.c | 2 +
fs/fs_struct.c | 1 +
fs/pipe.c | 2 +
include/kunit/uapi.h | 77 ++++++
init/Kconfig | 7 +
init/Kconfig.nolibc | 15 +
init/Makefile.nolibc | 13 +
kernel/exit.c | 3 +
kernel/fork.c | 2 +
lib/Makefile | 4 -
lib/kunit/Kconfig | 14 +
lib/kunit/Makefile | 30 +-
lib/kunit/kunit-example-test.c | 15 +
lib/kunit/kunit-example-uapi.c | 22 ++
lib/kunit/kunit-test-uapi.c | 51 ++++
lib/kunit/kunit-test.c | 23 +-
lib/kunit/kunit-uapi.c | 305 +++++++++++++++++++++
lib/kunit/uapi-preinit.c | 63 +++++
tools/testing/kunit/kunit_parser.py | 11 +-
tools/testing/kunit/kunit_tool_test.py | 11 +
tools/testing/kunit/qemu_configs/loongarch.py | 2 +
.../test_is_test_passed-failure-nested.log | 10 +
.../test_data/test_is_test_passed-kselftest.log | 3 +-
30 files changed, 715 insertions(+), 13 deletions(-)
---
base-commit: 9d5898b413d17510b2a41664a42390a2c79f8bf4
change-id: 20241015-kunit-kselftests-56273bc40442
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Some cleanups for the vDSO selftests.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Also drop vdso_test_clock_getres from .gitignore
- Move patch to fix -Wunitialized in powerpc VDSO_CALL() into this series
- Rebase on v6.17-rc1
- Add test for clock_gettime64()
- Link to v1: https://lore.kernel.org/r/20250707-vdso-tests-fixes-v1-0-545be9781b0c@linut…
---
Thomas Weißschuh (8):
selftests: vDSO: fix -Wunitialized in powerpc VDSO_CALL() wrapper
selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
selftests: vDSO: vdso_test_abi: Use ksft_finished()
selftests: vDSO: vdso_test_abi: Drop clock availability tests
selftests: vDSO: vdso_test_abi: Use explicit indices for name array
selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
selftests: vDSO: Drop vdso_test_clock_getres
tools/testing/selftests/vDSO/.gitignore | 1 -
tools/testing/selftests/vDSO/Makefile | 2 -
tools/testing/selftests/vDSO/vdso_call.h | 7 +-
tools/testing/selftests/vDSO/vdso_test_abi.c | 101 +++++++++--------
.../selftests/vDSO/vdso_test_clock_getres.c | 123 ---------------------
5 files changed, 59 insertions(+), 175 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250707-vdso-tests-fixes-7e4ddffd7f27
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
This patchset introduces a new per-port bonding option: `ad_actor_port_prio`.
It allows users to configure the actor's port priority, which can then be used
by the bonding driver for aggregator selection based on port priority.
This provides finer control over LACP aggregator choice, especially in setups
with multiple eligible aggregators over 2 switches.
Hangbin Liu (3):
bonding: add support for per-port LACP actor priority
bonding: support aggregator selection based on port priority
selftests: bonding: add test for LACP actor port priority
Documentation/networking/bonding.rst | 18 ++++-
drivers/net/bonding/bond_3ad.c | 31 ++++++++
drivers/net/bonding/bond_netlink.c | 16 ++++
drivers/net/bonding/bond_options.c | 36 +++++++++
include/net/bond_3ad.h | 2 +
include/net/bond_options.h | 1 +
include/uapi/linux/if_link.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_lacp_prio.sh | 73 +++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 24 ------
tools/testing/selftests/net/lib.sh | 24 ++++++
11 files changed, 203 insertions(+), 26 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh
--
2.46.0
The get_next_frame() function in psock_tpacket.c was missing a return
statement in its default switch case, leading to a compiler warning.
This was caused by a `bug_on(1)` call, which is defined as an
`assert()`, being compiled out because NDEBUG is defined during the
build.
Instead of adding a `return NULL;` which would silently hide the error
and could lead to crashes later, this change restores the original
author's intent. By adding `#undef NDEBUG` before including <assert.h>,
we ensure the assertion is active and will cause the test to abort if
this unreachable code is ever executed.
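For illustration, the effect can be sketched in isolation as follows. This is a minimal sketch and not the code from psock_tpacket.c: frame_kind() is a hypothetical stand-in for get_next_frame(), and bug_on() is defined here the way the description above says it is defined (as an assert()).

```c
/* Sketch only: frame_kind() is a hypothetical stand-in for get_next_frame(). */
#undef NDEBUG		/* must precede <assert.h>, even if the build passes -DNDEBUG */
#include <assert.h>

#define bug_on(cond) assert(!(cond))

static const char *frame_kind(int version)
{
	switch (version) {
	case 1:
		return "v1";
	case 2:
		return "v2";
	case 3:
		return "v3";
	default:
		/* assert() is live, so this aborts instead of falling through */
		bug_on(1);
	}
}
```

Because the assertion is active, the default branch ends in a call that does not return, which is why no dummy return value is needed there.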
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/net/psock_tpacket.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 0dd909e325d9..2938045c5cf9 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -22,6 +22,7 @@
* - TPACKET_V3: RX_RING
*/
+#undef NDEBUG
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
--
2.50.1.703.g449372360f-goog
The `__WORDSIZE` macro, defined in the non-standard `<bits/wordsize.h>`
header, is a GNU extension and not universally available with all
toolchains, such as Clang when used with musl libc.
This can lead to build failures in environments where this header is
missing.
The intention of the code is to determine the bit width of a C `long`.
Replace the non-portable `__WORDSIZE` with the standard and portable
`sizeof(long) * 8` expression to achieve the same result.
This change also removes the inclusion of the now-unused
`<bits/wordsize.h>` header.
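As a standalone sketch of the same idea: the version below uses CHAR_BIT from <limits.h> instead of the literal 8 used in the patch, which is a variation for illustration and not what the patch does. The sanity check on the result is likewise an assumption of this sketch (Linux ABIs use a 32-bit or 64-bit long).

```c
#include <assert.h>
#include <limits.h>	/* CHAR_BIT */

/* Width of a C long in bits, computed without <bits/wordsize.h>. */
static int user_bit_width(void)
{
	int bits = (int)(sizeof(long) * CHAR_BIT);

	/* Assumption of this sketch: long is either 32 or 64 bits wide. */
	assert(bits == 32 || bits == 64);
	return bits;
}
```

On every platform where CHAR_BIT is 8 this yields the same value as `sizeof(long) * 8`.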
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/net/psock_tpacket.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 221270cee3ea..0dd909e325d9 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -33,7 +33,6 @@
#include <ctype.h>
#include <fcntl.h>
#include <unistd.h>
-#include <bits/wordsize.h>
#include <net/ethernet.h>
#include <netinet/ip.h>
#include <arpa/inet.h>
@@ -785,7 +784,7 @@ static int test_kernel_bit_width(void)
static int test_user_bit_width(void)
{
- return __WORDSIZE;
+ return sizeof(long) * 8;
}
static const char *tpacket_str[] = {
--
2.50.1.703.g449372360f-goog
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.
Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.
Reported-by: David Wilder <wilder(a)us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv(a)jvosburgh.net>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
drivers/net/bonding/bond_main.c | 1 -
.../drivers/net/bonding/bond_options.sh | 59 +++++++++++++++++++
2 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 257333c88710..30cf97f4e814 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3355,7 +3355,6 @@ static void bond_ns_send_all(struct bonding *bond, struct slave *slave)
/* Find out through which dev should the packet go */
memset(&fl6, 0, sizeof(struct flowi6));
fl6.daddr = targets[i];
- fl6.flowi6_oif = bond->dev->ifindex;
dst = ip6_route_output(dev_net(bond->dev), NULL, &fl6);
if (dst->error) {
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
index 7bc148889ca7..b3eb8a919c71 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
@@ -7,6 +7,7 @@ ALL_TESTS="
prio
arp_validate
num_grat_arp
+ vlan_over_bond
"
lib_dir=$(dirname "$0")
@@ -376,6 +377,64 @@ num_grat_arp()
done
}
+vlan_over_bond_arp()
+{
+ local mode="$1"
+ RET=0
+
+ bond_reset "mode $mode arp_interval 100 arp_ip_target 192.0.3.10"
+ ip -n "${s_ns}" link add bond0.3 link bond0 type vlan id 3
+ ip -n "${s_ns}" link set bond0.3 up
+ ip -n "${s_ns}" addr add 192.0.3.1/24 dev bond0.3
+ ip -n "${s_ns}" addr add 2001:db8::3:1/64 dev bond0.3
+
+ slowwait_for_counter 5 5 tc_rule_handle_stats_get \
+ "dev eth0.3 ingress" 101 ".packets" "-n ${c_ns}" || RET=1
+ log_test "vlan over bond arp" "$mode"
+}
+
+vlan_over_bond_ns()
+{
+ local mode="$1"
+ RET=0
+
+ if skip_ns; then
+ log_test_skip "vlan_over_bond ns" "$mode"
+ return 0
+ fi
+
+ bond_reset "mode $mode arp_interval 100 ns_ip6_target 2001:db8::3:10"
+ ip -n "${s_ns}" link add bond0.3 link bond0 type vlan id 3
+ ip -n "${s_ns}" link set bond0.3 up
+ ip -n "${s_ns}" addr add 192.0.3.1/24 dev bond0.3
+ ip -n "${s_ns}" addr add 2001:db8::3:1/64 dev bond0.3
+
+ slowwait_for_counter 5 5 tc_rule_handle_stats_get \
+ "dev eth0.3 ingress" 102 ".packets" "-n ${c_ns}" || RET=1
+ log_test "vlan over bond ns" "$mode"
+}
+
+vlan_over_bond()
+{
+ # add vlan 3 for client
+ ip -n "${c_ns}" link add eth0.3 link eth0 type vlan id 3
+ ip -n "${c_ns}" link set eth0.3 up
+ ip -n "${c_ns}" addr add 192.0.3.10/24 dev eth0.3
+ ip -n "${c_ns}" addr add 2001:db8::3:10/64 dev eth0.3
+
+ # Add tc rule to check the vlan pkts
+ tc -n "${c_ns}" qdisc add dev eth0.3 clsact
+ tc -n "${c_ns}" filter add dev eth0.3 ingress protocol arp \
+ handle 101 flower skip_hw arp_op request \
+ arp_sip 192.0.3.1 arp_tip 192.0.3.10 action pass
+ tc -n "${c_ns}" filter add dev eth0.3 ingress protocol ipv6 \
+ handle 102 flower skip_hw ip_proto icmpv6 \
+ type 135 src_ip 2001:db8::3:1 action pass
+
+ vlan_over_bond_arp "active-backup"
+ vlan_over_bond_ns "active-backup"
+}
+
trap cleanup EXIT
setup_prepare
--
2.50.1
Currently the number of hugepages that check_huge_anon() checks for is
hard-coded, but it would be more reasonable to do the check based
on a number passed in.
Pass in the hugepage number and do the check based on it.
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
---
v2:
* use mm-new
* add back nr_hpages, which was removed by an earlier commit
* adjust the change log a little
* drop RB and resend
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 5ab488fab1cd..63ac82f0b9e0 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -105,12 +105,12 @@ static char *allocate_zero_filled_hugepage(size_t len)
return result;
}
-static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, size_t len)
+static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, int nr_hpages, size_t len)
{
unsigned long rss_anon_before, rss_anon_after;
size_t i;
- if (!check_huge_anon(one_page, 4, pmd_pagesize))
+ if (!check_huge_anon(one_page, nr_hpages, pmd_pagesize))
ksft_exit_fail_msg("No THP is allocated\n");
rss_anon_before = rss_anon();
@@ -141,7 +141,7 @@ void split_pmd_zero_pages(void)
size_t len = nr_hpages * pmd_pagesize;
one_page = allocate_zero_filled_hugepage(len);
- verify_rss_anon_split_huge_page_all_zeroes(one_page, len);
+ verify_rss_anon_split_huge_page_all_zeroes(one_page, nr_hpages, len);
ksft_test_result_pass("Split zero filled huge pages successful\n");
free(one_page);
}
--
2.34.1
This patchset uses kpageflags to get after-split folio orders for a better
split_huge_page_test result check[1]. The added gather_folio_orders() scans
through a VPN range and collects the numbers of folios at different orders.
check_folio_orders() compares the result of gather_folio_orders() to
a given list of numbers of different orders.
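The gather-then-compare idea can be sketched roughly as below. This is not the helper code from the series: the function names, the MAX_ORDER_SKETCH cap and the input format (a plain array with one order per folio, standing in for what would be read from kpageflags) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ORDER_SKETCH 9	/* hypothetical cap chosen for this sketch */

/* Count how many folios of each order appear in folio_order[0..n). */
static void gather_orders_sketch(const int *folio_order, size_t n,
				 int counts[MAX_ORDER_SKETCH + 1])
{
	assert(folio_order != NULL || n == 0);

	memset(counts, 0, sizeof(int) * (MAX_ORDER_SKETCH + 1));
	for (size_t i = 0; i < n; i++)
		if (folio_order[i] >= 0 && folio_order[i] <= MAX_ORDER_SKETCH)
			counts[folio_order[i]]++;
}

/* Return 0 when the gathered counts match the expected ones, -1 otherwise. */
static int check_orders_sketch(const int *folio_order, size_t n,
			       const int expected[MAX_ORDER_SKETCH + 1])
{
	int counts[MAX_ORDER_SKETCH + 1];

	gather_orders_sketch(folio_order, n, counts);
	return memcmp(counts, expected, sizeof(counts)) ? -1 : 0;
}
```

A test would then pass the expected per-order counts after a split and fail if any order has a different number of folios.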
This patchset also adds the new order and the in-folio offset to the
pr_debug()s of the split huge page debugfs interface.
Changelog
===
From V1[2]:
1. Dropped split_huge_pages_pid() for loop step change to avoid messing
up with PTE-mapped THP handling. split_huge_page_test.c is changed to
perform split at [addr, addr + pagesize) range to limit one
folio_split() per folio.
2. Moved pr_debug changes in Patch 2 to Patch 1.
3. Moved KPF_* to vm_util.h and used PAGEMAP_PFN instead of local PFN_MASK.
4. Used pagemap_get_pfn() helper.
5. Used char *vaddr and size_t len as inputs to gather_folio_orders() and
check_folio_orders() instead of vpn and nr_pages.
6. Removed variable length variables and used malloc instead.
[1] https://lore.kernel.org/linux-mm/e2f32bdb-e4a4-447c-867c-31405cbba151@redha…
[2] https://lore.kernel.org/linux-mm/20250806022045.342824-1-ziy@nvidia.com/
Zi Yan (3):
mm/huge_memory: add new_order and offset to split_huge_pages*()
pr_debug.
selftests/mm: add check_folio_orders() helper.
selftests/mm: check after-split folio orders in split_huge_page_test.
mm/huge_memory.c | 8 +-
.../selftests/mm/split_huge_page_test.c | 102 ++++++++++----
tools/testing/selftests/mm/vm_util.c | 133 ++++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 7 +
4 files changed, 217 insertions(+), 33 deletions(-)
--
2.47.2
This series introduces NUMA-aware memory placement support for KVM guests
with guest_memfd memory backends. It builds upon Fuad Tabba's work that
enabled host-mapping for guest_memfd memory [1].
== Background ==
KVM's guest-memfd memory backend currently lacks support for NUMA policy
enforcement, causing guest memory allocations to be distributed across host
nodes according to the kernel's default behavior, irrespective of any policy
specified by the VMM. This limitation arises because conventional userspace
NUMA control mechanisms like mbind(2) don't work since the memory isn't
directly mapped to userspace when allocations occur.
Fuad's work [1] provides the necessary mmap capability, and this series
leverages it to enable mbind(2).
== Implementation ==
This series implements proper NUMA policy support for guest-memfd by:
1. Adding mempolicy-aware allocation APIs to the filemap layer.
2. Introducing custom inodes (via a dedicated slab-allocated inode cache,
kvm_gmem_inode_info) to store NUMA policy and metadata for guest memory.
3. Implementing get/set_policy vm_ops in guest_memfd to support NUMA
policy.
With these changes, VMMs can now control guest memory placement by mapping
the guest_memfd file descriptor and using mbind(2) to specify:
- Policy modes: default, bind, interleave, or preferred
- Host NUMA nodes: List of target nodes for memory allocation
These policies affect only future allocations and do not migrate existing
memory. This matches mbind(2)'s default behavior, which affects only new
allocations unless overridden with the MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags
(not supported for guest_memfd, as it is unmovable by design).
== Upstream Plan ==
Phased approach as per David's guest_memfd extension overview [2] and
community calls [3]:
Phase 1 (this series):
1. Focuses on shared guest_memfd support (non-CoCo VMs).
2. Builds on Fuad's host-mapping work.
Phase 2 (future work):
1. NUMA support for private guest_memfd (CoCo VMs).
2. Depends on SNP in-place conversion support [4].
This series provides a clean integration path for NUMA-aware memory
management for guest_memfd and lays the groundwork for future confidential
computing NUMA capabilities.
Please review and provide feedback!
Thanks,
Shivank
== Changelog ==
- v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy.
- v3: Introduced fbind() syscall for VMM memory-placement configuration.
- v4-v6: Current approach using shared_policy support and vm_ops (based on
suggestions from David [5] and guest_memfd bi-weekly upstream
call discussion [6]).
- v7: Use inodes to store NUMA policy instead of file [7].
- v8: Rebase on top of Fuad's V12: Host mmapping for guest_memfd memory.
- v9: Rebase on top of Fuad's V13 and incorporate review comments
[1] https://lore.kernel.org/all/20250709105946.4009897-1-tabba@google.com
[2] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com
[3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAo…
[4] https://lore.kernel.org/all/20250613005400.3694904-1-michael.roth@amd.com
[5] https://lore.kernel.org/all/6fbef654-36e2-4be5-906e-2a648a845278@redhat.com
[6] https://lore.kernel.org/all/2b77e055-98ac-43a1-a7ad-9f9065d7f38f@amd.com
[7] https://lore.kernel.org/all/diqzbjumm167.fsf@ackerleytng-ctop.c.googlers.com
Ackerley Tng (1):
KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes
Matthew Wilcox (Oracle) (2):
mm/filemap: Add NUMA mempolicy support to filemap_alloc_folio()
mm/filemap: Extend __filemap_get_folio() to support NUMA memory
policies
Shivank Garg (4):
mm/mempolicy: Export memory policy symbols
KVM: guest_memfd: Add slab-allocated inode cache
KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
KVM: guest_memfd: selftests: Add tests for mmap and NUMA policy
support
fs/bcachefs/fs-io-buffered.c | 2 +-
fs/btrfs/compression.c | 4 +-
fs/btrfs/verity.c | 2 +-
fs/erofs/zdata.c | 2 +-
fs/f2fs/compress.c | 2 +-
include/linux/pagemap.h | 18 +-
include/uapi/linux/magic.h | 1 +
mm/filemap.c | 23 +-
mm/mempolicy.c | 6 +
mm/readahead.c | 2 +-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 122 ++++++++-
virt/kvm/guest_memfd.c | 255 ++++++++++++++++--
virt/kvm/kvm_main.c | 7 +-
virt/kvm/kvm_mm.h | 10 +-
15 files changed, 408 insertions(+), 49 deletions(-)
--
2.43.0
---
== Earlier Postings ==
v8: https://lore.kernel.org/all/20250618112935.7629-1-shivankg@amd.com
v7: https://lore.kernel.org/all/20250408112402.181574-1-shivankg@amd.com
v6: https://lore.kernel.org/all/20250226082549.6034-1-shivankg@amd.com
v5: https://lore.kernel.org/all/20250219101559.414878-1-shivankg@amd.com
v4: https://lore.kernel.org/all/20250210063227.41125-1-shivankg@amd.com
v3: https://lore.kernel.org/all/20241105164549.154700-1-shivankg@amd.com
v2: https://lore.kernel.org/all/20240919094438.10987-1-shivankg@amd.com
v1: https://lore.kernel.org/all/20240916165743.201087-1-shivankg@amd.com
Resolve a conflict between
commit 6a68d28066b6 ("selftests/coredump: Fix "socket_detect_userspace_client" test failure")
and
commit 994dc26302ed ("selftests/coredump: fix build")
The first commit adds a read() to wait for write() from another thread to
finish. But the second commit removes the write().
Now that the two commits are in the same tree, the read() gets EOF and
the test fails.
Remove this read() so that the test passes.
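The failure mode can be reproduced in isolation with the sketch below. It uses a pipe as a stand-in for the coredump socket, which is an assumption made to keep the example self-contained, and the function names are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* What read() reports when the peer goes away without writing anything. */
static ssize_t read_after_silent_close(void)
{
	int fds[2];
	char c;
	ssize_t n;

	if (pipe(fds) != 0)
		return -1;

	close(fds[1]);			/* writer exits without a write() */
	n = read(fds[0], &c, 1);	/* no data and no writer left: end-of-file */
	close(fds[0]);

	assert(n <= 0);			/* nothing was written, so no byte can arrive */
	return n;
}

/* The condition the patch removes: bail out unless one byte was read. */
static int removed_check_bails_out(void)
{
	return read_after_silent_close() < 1;
}
```

With no write() on the other side, read() returns 0, so a `< 1` check always takes the failure path.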
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
---
tools/testing/selftests/coredump/stackdump_test.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/tools/testing/selftests/coredump/stackdump_test.c b/tools/testing/selftests/coredump/stackdump_test.c
index 5a5a7a5f7e1d..a4ac80bb1003 100644
--- a/tools/testing/selftests/coredump/stackdump_test.c
+++ b/tools/testing/selftests/coredump/stackdump_test.c
@@ -446,9 +446,6 @@ TEST_F(coredump, socket_detect_userspace_client)
if (info.coredump_mask & PIDFD_COREDUMPED)
goto out;
- if (read(fd_coredump, &c, 1) < 1)
- goto out;
-
exit_code = EXIT_SUCCESS;
out:
if (fd_peer_pidfd >= 0)
--
2.39.5
Various KUnit tests require PCI infrastructure to work. All normal
platforms enable PCI by default, but UML does not. Enabling PCI from
.kunitconfig files is problematic as it would not be portable. So in
commit 6fc3a8636a7b ("kunit: tool: Enable virtio/PCI by default on UML")
PCI was enabled by way of CONFIG_UML_PCI_OVER_VIRTIO=y. However,
CONFIG_UML_PCI_OVER_VIRTIO requires additional configuration of
CONFIG_UML_PCI_OVER_VIRTIO_DEVICE_ID and will otherwise trigger a WARN() in
virtio_pcidev_init(), and there is no one correct value for
UML_PCI_OVER_VIRTIO_DEVICE_ID which could be used by default.
This warning is confusing when debugging test failures.
On the other hand, the functionality of CONFIG_UML_PCI_OVER_VIRTIO is not
used at all, given that it is completely non-functional as indicated by
the WARN() in question. Instead it is only used as a way to enable
CONFIG_UML_PCI which itself is not directly configurable.
Instead of going through CONFIG_UML_PCI_OVER_VIRTIO, introduce a custom
configuration option which enables CONFIG_UML_PCI without triggering
warnings or building dead code.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: Johannes Berg <johannes(a)sipsolutions.net>
---
Changes in v2:
- Rebase onto v6.17-rc1
- Pick up review from Johannes
- Link to v1: https://lore.kernel.org/r/20250627-kunit-uml-pci-v1-1-a622fa445e58@linutron…
---
lib/kunit/Kconfig | 7 +++++++
tools/testing/kunit/configs/arch_uml.config | 5 ++---
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index c10ede4b1d2201d5f8cddeb71cc5096e21be9b6a..1823539e96da30e165fa8d395ccbd3f6754c836e 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -106,4 +106,11 @@ config KUNIT_DEFAULT_TIMEOUT
If unsure, the default timeout of 300 seconds is suitable for most
cases.
+config KUNIT_UML_PCI
+ bool "KUnit UML PCI Support"
+ depends on UML
+ select UML_PCI
+ help
+ Enables the PCI subsystem on UML for use by KUnit tests.
+
endif # KUNIT
diff --git a/tools/testing/kunit/configs/arch_uml.config b/tools/testing/kunit/configs/arch_uml.config
index 54ad8972681a2cc724e6122b19407188910b9025..28edf816aa70e6f408d9486efff8898df79ee090 100644
--- a/tools/testing/kunit/configs/arch_uml.config
+++ b/tools/testing/kunit/configs/arch_uml.config
@@ -1,8 +1,7 @@
# Config options which are added to UML builds by default
-# Enable virtio/pci, as a lot of tests require it.
-CONFIG_VIRTIO_UML=y
-CONFIG_UML_PCI_OVER_VIRTIO=y
+# Enable pci, as a lot of tests require it.
+CONFIG_KUNIT_UML_PCI=y
# Enable FORTIFY_SOURCE for wider checking.
CONFIG_FORTIFY_SOURCE=y
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250626-kunit-uml-pci-a2b687553746
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Hello,
kernel test robot noticed "kernel-selftests.filesystems/mount-notify.make.fail" on:
commit: c6d9775c2066a37385e784ee2e0ce83bd6644610 ("selftests/fs/mount-notify: build with tools include dir")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master 6e64f4580381e32c06ee146ca807c555b8f73e24]
[test failed on linux-next/master 442d93313caebc8ccd6d53f4572c50732a95bc48]
in testcase: kernel-selftests
version: kernel-selftests-x86_64-186f3edfdd41-1_20250803
with following parameters:
group: filesystems
config: x86_64-rhel-9.4-kselftests
compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (Skylake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang(a)intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508110628.65069d92-lkp@intel.com
2025-08-06 18:21:58 make -j36 TARGETS=filesystems/mount-notify
make[1]: Entering directory '/usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/tools/testing/selftests/filesystems/mount-notify'
CC mount-notify_test
mount-notify_test.c:21:3: error: conflicting types for ‘__kernel_fsid_t’; have ‘struct <anonymous>’
21 | } __kernel_fsid_t;
| ^~~~~~~~~~~~~~~
In file included from /usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/usr/include/asm/posix_types_64.h:18,
from /usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/usr/include/asm/posix_types.h:7,
from /usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/usr/include/linux/posix_types.h:36,
from /usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/usr/include/linux/types.h:9,
from /usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/usr/include/linux/stat.h:5,
from /usr/include/x86_64-linux-gnu/bits/statx.h:31,
from /usr/include/x86_64-linux-gnu/sys/stat.h:465,
from mount-notify_test.c:9:
/usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/usr/include/asm-generic/posix_types.h:81:3: note: previous declaration of ‘__kernel_fsid_t’ with type ‘__kernel_fsid_t’
81 | } __kernel_fsid_t;
| ^~~~~~~~~~~~~~~
make[1]: *** [../../lib.mk:222: /usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/tools/testing/selftests/filesystems/mount-notify/mount-notify_test] Error 1
make[1]: Leaving directory '/usr/src/perf_selftests-x86_64-rhel-9.4-kselftests-c6d9775c2066a37385e784ee2e0ce83bd6644610/tools/testing/selftests/filesystems/mount-notify'
make: *** [Makefile:203: all] Error 2
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250811/202508110628.65069d92-lkp@…
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
check_huge_anon() is currently called with a hard-coded number of
hugepages, even though the caller already receives that number as a
parameter. Base the check on the number of hugepages passed in
instead.
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 44a3f8a58806..bf40e6b121ab 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -111,7 +111,7 @@ static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, int nr_hp
unsigned long rss_anon_before, rss_anon_after;
size_t i;
- if (!check_huge_anon(one_page, 4, pmd_pagesize))
+ if (!check_huge_anon(one_page, nr_hpages, pmd_pagesize))
ksft_exit_fail_msg("No THP is allocated\n");
rss_anon_before = rss_anon();
--
2.34.1
The SBI Firmware Feature extension allows the S-mode to request some
specific features (either hardware or software) to be enabled. This
series uses this extension to request misaligned access exception
delegation to S-mode in order to let the kernel handle it. It also adds
support for the KVM FWFT SBI extension based on the misaligned access
handling infrastructure.
FWFT SBI extension is part of the SBI V3.0 specifications [1]. It can be
tested using the qemu provided at [2] which contains the series from
[3]. Upstream kvm-unit-tests can be used inside kvm to test the correct
delegation of misaligned exceptions. Upstream OpenSBI can be used.
Note: Since SBI V3.0 is not yet ratified, the FWFT extension API is split
into an interface part and an implementation part, which allows picking only
the interface, which does not have hard dependencies on SBI.
The tests can be run using the kselftest from series [4].
$ qemu-system-riscv64 \
-cpu rv64,trap-misaligned-access=true,v=true \
-M virt \
-m 1024M \
-bios fw_dynamic.bin \
-kernel Image
...
# ./misaligned
TAP version 13
1..23
# Starting 23 tests from 1 test cases.
# RUN global.gp_load_lh ...
# OK global.gp_load_lh
ok 1 global.gp_load_lh
# RUN global.gp_load_lhu ...
# OK global.gp_load_lhu
ok 2 global.gp_load_lhu
# RUN global.gp_load_lw ...
# OK global.gp_load_lw
ok 3 global.gp_load_lw
# RUN global.gp_load_lwu ...
# OK global.gp_load_lwu
ok 4 global.gp_load_lwu
# RUN global.gp_load_ld ...
# OK global.gp_load_ld
ok 5 global.gp_load_ld
# RUN global.gp_load_c_lw ...
# OK global.gp_load_c_lw
ok 6 global.gp_load_c_lw
# RUN global.gp_load_c_ld ...
# OK global.gp_load_c_ld
ok 7 global.gp_load_c_ld
# RUN global.gp_load_c_ldsp ...
# OK global.gp_load_c_ldsp
ok 8 global.gp_load_c_ldsp
# RUN global.gp_load_sh ...
# OK global.gp_load_sh
ok 9 global.gp_load_sh
# RUN global.gp_load_sw ...
# OK global.gp_load_sw
ok 10 global.gp_load_sw
# RUN global.gp_load_sd ...
# OK global.gp_load_sd
ok 11 global.gp_load_sd
# RUN global.gp_load_c_sw ...
# OK global.gp_load_c_sw
ok 12 global.gp_load_c_sw
# RUN global.gp_load_c_sd ...
# OK global.gp_load_c_sd
ok 13 global.gp_load_c_sd
# RUN global.gp_load_c_sdsp ...
# OK global.gp_load_c_sdsp
ok 14 global.gp_load_c_sdsp
# RUN global.fpu_load_flw ...
# OK global.fpu_load_flw
ok 15 global.fpu_load_flw
# RUN global.fpu_load_fld ...
# OK global.fpu_load_fld
ok 16 global.fpu_load_fld
# RUN global.fpu_load_c_fld ...
# OK global.fpu_load_c_fld
ok 17 global.fpu_load_c_fld
# RUN global.fpu_load_c_fldsp ...
# OK global.fpu_load_c_fldsp
ok 18 global.fpu_load_c_fldsp
# RUN global.fpu_store_fsw ...
# OK global.fpu_store_fsw
ok 19 global.fpu_store_fsw
# RUN global.fpu_store_fsd ...
# OK global.fpu_store_fsd
ok 20 global.fpu_store_fsd
# RUN global.fpu_store_c_fsd ...
# OK global.fpu_store_c_fsd
ok 21 global.fpu_store_c_fsd
# RUN global.fpu_store_c_fsdsp ...
# OK global.fpu_store_c_fsdsp
ok 22 global.fpu_store_c_fsdsp
# RUN global.gen_sigbus ...
[12797.988647] misaligned[618]: unhandled signal 7 code 0x1 at 0x0000000000014dc0 in misaligned[4dc0,10000+76000]
[12797.988990] CPU: 0 UID: 0 PID: 618 Comm: misaligned Not tainted 6.13.0-rc6-00008-g4ec4468967c9-dirty #51
[12797.989169] Hardware name: riscv-virtio,qemu (DT)
[12797.989264] epc : 0000000000014dc0 ra : 0000000000014d00 sp : 00007fffe165d100
[12797.989407] gp : 000000000008f6e8 tp : 0000000000095760 t0 : 0000000000000008
[12797.989544] t1 : 00000000000965d8 t2 : 000000000008e830 s0 : 00007fffe165d160
[12797.989692] s1 : 000000000000001a a0 : 0000000000000000 a1 : 0000000000000002
[12797.989831] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffffdeadbeef
[12797.989964] a5 : 000000000008ef61 a6 : 626769735f6e0000 a7 : fffffffffffff000
[12797.990094] s2 : 0000000000000001 s3 : 00007fffe165d838 s4 : 00007fffe165d848
[12797.990238] s5 : 000000000000001a s6 : 0000000000010442 s7 : 0000000000010200
[12797.990391] s8 : 000000000000003a s9 : 0000000000094508 s10: 0000000000000000
[12797.990526] s11: 0000555567460668 t3 : 00007fffe165d070 t4 : 00000000000965d0
[12797.990656] t5 : fefefefefefefeff t6 : 0000000000000073
[12797.990756] status: 0000000200004020 badaddr: 000000000008ef61 cause: 0000000000000006
[12797.990911] Code: 8793 8791 3423 fcf4 3783 fc84 c737 dead 0713 eef7 (c398) 0001
# OK global.gen_sigbus
ok 23 global.gen_sigbus
# PASSED: 23 / 23 tests passed.
# Totals: pass:23 fail:0 xfail:0 xpass:0 skip:0 error:0
With kvm-tools:
# lkvm run -k sbi.flat -m 128
Info: # lkvm run -k sbi.flat -m 128 -c 1 --name guest-97
Info: Removed ghost socket file "/root/.lkvm//guest-97.sock".
##########################################################################
# kvm-unit-tests
##########################################################################
... [test messages elided]
PASS: sbi: fwft: FWFT extension probing no error
PASS: sbi: fwft: get/set reserved feature 0x6 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x3fffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x80000000 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0xbfffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: misaligned_deleg: Get misaligned deleg feature no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 0
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 1
PASS: sbi: fwft: misaligned_deleg: Verify misaligned load exception trap in supervisor
SUMMARY: 50 tests, 2 unexpected failures, 12 skipped
This series is available at [5].
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/vv3.0-rc2/… [1]
Link: https://github.com/rivosinc/qemu/tree/dev/cleger/misaligned [2]
Link: https://lore.kernel.org/all/20241211211933.198792-3-fkonrad@amd.com/T/ [3]
Link: https://lore.kernel.org/linux-riscv/20250414123543.1615478-1-cleger@rivosin… [4]
Link: https://github.com/rivosinc/linux/tree/dev/cleger/fwft [5]
---
V8:
- Move misaligned_access_speed under CONFIG_RISCV_MISALIGNED and add a
separate commit for that.
V7:
- Fix ifdefery build problems
- Move sbi_fwft_is_supported with fwft_set_req struct
- Added Atish Reviewed-by
- Updated KVM vcpu cfg hedeleg value in set_delegation
- Changed SBI ETIME error mapping to ETIMEDOUT
- Fixed a few typos reported by Alok
V6:
- Rename FWFT interface to remove "_local"
- Fix test for MEDELEG values in KVM FWFT support
- Add __init for unaligned_access_init()
- Rebased on master
V5:
- Return ERANGE as mapping for SBI_ERR_BAD_RANGE
- Removed unused sbi_fwft_get()
- Fix kernel for sbi_fwft_local_set_cpumask()
- Fix indentation for sbi_fwft_local_set()
- Remove spurious space in kvm_sbi_fwft_ops.
- Rebased on origin/master
- Remove fixes commits and sent them as a separate series [4]
V4:
- Check SBI version 3.0 instead of 2.0 for FWFT presence
- Use long for kvm_sbi_fwft operation return value
- Init KVM sbi extension even if default_disabled
- Remove revert_on_fail parameter for sbi_fwft_feature_set().
- Fix comments for sbi_fwft_set/get()
- Only handle local features (there are no globals yet in the spec)
- Add new SBI errors to sbi_err_map_linux_errno()
V3:
- Added comment about kvm sbi fwft supported/set/get callback
requirements
- Move struct kvm_sbi_fwft_feature in kvm_sbi_fwft.c
- Add a FWFT interface
V2:
- Added Kselftest for misaligned testing
- Added get_user() usage instead of __get_user()
- Reenable interrupt when possible in misaligned access handling
- Document that riscv supports unaligned-traps
- Fix KVM extension state when an init function is present
- Rework SBI misaligned accesses trap delegation code
- Added support for CPU hotplugging
- Added KVM SBI reset callback
- Added reset for KVM SBI FWFT lock
- Return SBI_ERR_DENIED_LOCKED when LOCK flag is set
Clément Léger (14):
riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitions
riscv: sbi: remove useless parenthesis
riscv: sbi: add new SBI error mappings
riscv: sbi: add FWFT extension interface
riscv: sbi: add SBI FWFT extension calls
riscv: misaligned: request misaligned exception from SBI
riscv: misaligned: use on_each_cpu() for scalar misaligned access
probing
riscv: misaligned: declare misaligned_access_speed under
CONFIG_RISCV_MISALIGNED
riscv: misaligned: move emulated access uniformity check in a function
riscv: misaligned: add a function to check misalign trap delegability
RISC-V: KVM: add SBI extension init()/deinit() functions
RISC-V: KVM: add SBI extension reset callback
RISC-V: KVM: add support for FWFT SBI extension
RISC-V: KVM: add support for SBI_FWFT_MISALIGNED_DELEG
arch/riscv/include/asm/cpufeature.h | 14 +-
arch/riscv/include/asm/kvm_host.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_sbi.h | 12 +
arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h | 29 +++
arch/riscv/include/asm/sbi.h | 60 +++++
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/sbi.c | 81 ++++++-
arch/riscv/kernel/traps_misaligned.c | 112 ++++++++-
arch/riscv/kernel/unaligned_access_speed.c | 8 +-
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/vcpu.c | 4 +-
arch/riscv/kvm/vcpu_sbi.c | 54 +++++
arch/riscv/kvm/vcpu_sbi_fwft.c | 257 +++++++++++++++++++++
arch/riscv/kvm/vcpu_sbi_sta.c | 3 +-
14 files changed, 620 insertions(+), 21 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h
create mode 100644 arch/riscv/kvm/vcpu_sbi_fwft.c
--
2.49.0
Hi all,
This patch series addresses false positives in the generic mm selftests
and skips tests that cannot run correctly due to missing features or system
limitations.
v2: https://lore.kernel.org/all/20250703060656.54345-1-aboorvad@linux.ibm.com/
Changes in v3:
- Rebased onto the latest mm-new branch, top commit of the base is commit 0709ddf8951f ("mm: add zblock allocator").
- Minor refactor based on the review comments.
- Included the tags from the previous version.
---
v1: https://lore.kernel.org/all/20250616160632.35250-1-aboorvad@linux.ibm.com/
Changes in v2:
- Rebased onto the mm-new branch, top commit of the base is commit 3b4a8ad89f7e ("mm: add zblock allocator").
- Split some patches for clarity.
- Updated virtual_address_range test to support testing 4PB VA on PPC64.
- Added proper Fixes: tags.
- Included a patch to skip a failing userfaultfd test when unsupported,
instead of reporting a failure.
---
Please let us know if you have any further comments.
Thanks,
Aboorva
Aboorva Devarajan (3):
selftests/mm: Fix child process exit codes in ksm_functional_tests
selftests/mm: Skip thuge-gen test if system is not setup properly
selftests/mm: Skip hugepage-mremap test if userfaultfd unavailable
Donet Tom (4):
mm/selftests: Fix incorrect pointer being passed to mark_range()
selftests/mm: Add support to test 4PB VA on PPC64
selftest/mm: Fix ksm_funtional_test failures
mm/selftests: Fix split_huge_page_test failure on systems with 64KB
page size
tools/testing/selftests/mm/hugepage-mremap.c | 16 +++++++++--
.../selftests/mm/ksm_functional_tests.c | 28 +++++++++++++------
.../selftests/mm/split_huge_page_test.c | 23 +++++++++------
tools/testing/selftests/mm/thuge-gen.c | 11 +++++---
.../selftests/mm/virtual_address_range.c | 13 ++++++++-
5 files changed, 67 insertions(+), 24 deletions(-)
--
2.47.1
Changes in v2:
- Restore RET_FAIL assignments in error paths to ensure the test's exit
code accurately reflects the failure status.
Wake Liu (1):
selftests/futex: Check for shmget support at runtime
.../selftests/futex/functional/futex_wait.c | 49 +++++++------
.../selftests/futex/functional/futex_waitv.c | 73 ++++++++++++-------
2 files changed, 73 insertions(+), 49 deletions(-)
--
2.50.1.703.g449372360f-goog
The binderfs selftests, specifically `binderfs_stress` and
`binderfs_test_unprivileged`, depend on user namespaces to run.
On kernels built without user namespace support (CONFIG_USER_NS=n),
these tests will fail.
To prevent these failures, add a check for the availability of user
namespaces by testing for the existence of "/proc/self/ns/user". If
the check fails, skip the tests and print an informative message.
Signed-off-by: Wake Liu <wakel(a)google.com>
---
.../selftests/filesystems/binderfs/binderfs_test.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/filesystems/binderfs/binderfs_test.c b/tools/testing/selftests/filesystems/binderfs/binderfs_test.c
index 81db85a5cc16..e77ed34ebd06 100644
--- a/tools/testing/selftests/filesystems/binderfs/binderfs_test.c
+++ b/tools/testing/selftests/filesystems/binderfs/binderfs_test.c
@@ -291,6 +291,11 @@ static int write_id_mapping(enum idmap_type type, pid_t pid, const char *buf,
return 0;
}
+static bool has_userns(void)
+{
+ return (access("/proc/self/ns/user", F_OK) == 0);
+}
+
static void change_userns(struct __test_metadata *_metadata, int syncfds[2])
{
int ret;
@@ -378,6 +383,9 @@ static void *binder_version_thread(void *data)
*/
TEST(binderfs_stress)
{
+ if (!has_userns())
+ SKIP(return, "%s: user namespace not supported\n", __func__);
+
int fds[1000];
int syncfds[2];
pid_t pid;
@@ -502,6 +510,8 @@ TEST(binderfs_test_privileged)
TEST(binderfs_test_unprivileged)
{
+ if (!has_userns())
+ SKIP(return, "%s: user namespace not supported\n", __func__);
int ret;
int syncfds[2];
pid_t pid;
--
2.50.1.703.g449372360f-goog
The futex_waitv() syscall was introduced in Linux 5.16. The existing
test in futex_wait_timeout.c will fail on kernels older than 5.16
due to the syscall not being implemented.
Modify the test_timeout() function to check if the error returned
is ENOSYS. If it is, skip the test and report it as such, rather than
failing. This ensures the selftests can be run on a wider range of
kernel versions without false negatives.
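The pass/skip/fail decision described above can be sketched in isolation. This is a minimal model, not the selftest code: `enum outcome` and `classify()` are hypothetical names introduced for illustration, and the real test reports through the ksft_* helpers instead of returning a value.

```c
#include <errno.h>

/* Hypothetical outcome type; the selftest reports via ksft_* helpers. */
enum outcome { OUTCOME_PASS, OUTCOME_FAIL, OUTCOME_SKIP };

/*
 * Classify a syscall result: a non-zero result with the expected errno
 * passes, ENOSYS (syscall not implemented) is a skip, anything else fails.
 */
static enum outcome classify(int res, int err, int expected_err)
{
	if (res && err == expected_err)
		return OUTCOME_PASS;
	if (err == ENOSYS)
		return OUTCOME_SKIP;
	return OUTCOME_FAIL;
}
```

A caller would pass the syscall return value, the saved errno, and the errno the test expects, for example `classify(res, errno, ETIMEDOUT)`.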
Signed-off-by: Wake Liu <wakel(a)google.com>
---
.../selftests/futex/functional/futex_wait_timeout.c | 11 ++++++++---
.../testing/selftests/futex/functional/futex_waitv.c | 8 ++++++++
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/futex/functional/futex_wait_timeout.c b/tools/testing/selftests/futex/functional/futex_wait_timeout.c
index d183f878360b..323cab339814 100644
--- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c
@@ -64,9 +64,14 @@ void *get_pi_lock(void *arg)
static void test_timeout(int res, int *ret, char *test_name, int err)
{
if (!res || errno != err) {
- ksft_test_result_fail("%s returned %d\n", test_name,
- res < 0 ? errno : res);
- *ret = RET_FAIL;
+ if (errno == ENOSYS) {
+ ksft_test_result_skip("%s: %s\n",
+ test_name, strerror(errno));
+ } else {
+ ksft_test_result_fail("%s returned %d\n", test_name,
+ res < 0 ? errno : res);
+ *ret = RET_FAIL;
+ }
} else {
ksft_test_result_pass("%s succeeds\n", test_name);
}
diff --git a/tools/testing/selftests/futex/functional/futex_waitv.c b/tools/testing/selftests/futex/functional/futex_waitv.c
index 034dbfef40cb..2a86fd3ea657 100644
--- a/tools/testing/selftests/futex/functional/futex_waitv.c
+++ b/tools/testing/selftests/futex/functional/futex_waitv.c
@@ -59,6 +59,14 @@ void *waiterfn(void *arg)
int main(int argc, char *argv[])
{
+ if (!ksft_min_kernel_version(5, 16)) {
+ ksft_print_header();
+ ksft_set_plan(0);
+ ksft_print_msg("%s: FUTEX_WAITV not implemented until 5.16\n",
+ basename(argv[0]));
+ ksft_print_cnts();
+ return KSFT_SKIP;
+ }
pthread_t waiter;
int res, ret = RET_PASS;
struct timespec to;
--
2.50.1.703.g449372360f-goog
On heterogeneous arm64 systems, KVM's PMU emulation is based on the
features of a single host PMU instance. When a vCPU is migrated to a
pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
incrementing.
Although this behavior is permitted by the architecture, Windows does
not handle it gracefully and may crash with a division-by-zero error.
The current workaround requires VMMs to pin vCPUs to a set of pCPUs
that share a compatible PMU. This is difficult to implement correctly in
QEMU/libvirt, where pinning occurs after vCPU initialization, and it
also restricts the guest to a subset of available pCPUs.
This patch introduces the KVM_ARM_VCPU_PMU_V3_COMPOSITION attribute to
create a "composite" PMU. When set, KVM exposes a PMU that is compatible
with all pCPUs by advertising only a single cycle counter, a feature
common to all PMUs.
This allows Windows guests to run reliably on heterogeneous systems
without crashing, even without vCPU pinning, and enables VMMs to
schedule vCPUs across all available pCPUs, making full use of the host
hardware.
A QEMU patch that demonstrates the usage of the new attribute is
available at:
https://lore.kernel.org/qemu-devel/20250806-kvm-v1-1-d1d50b7058cd@rsg.ci.i.…
("[PATCH RFC] target/arm/kvm: Choose PMU backend")
Signed-off-by: Akihiko Odaki <odaki(a)rsg.ci.i.u-tokyo.ac.jp>
---
Changes in v2:
- Added the KVM_ARM_VCPU_PMU_V3_COMPOSITION attribute to opt in the
feature.
- Added code to handle overflow.
- Link to v1: https://lore.kernel.org/r/20250319-hybrid-v1-1-4d1ada10e705@daynix.com
---
Akihiko Odaki (2):
KVM: arm64: PMU: Introduce KVM_ARM_VCPU_PMU_V3_COMPOSITION
KVM: arm64: selftests: Test guest PMUv3 composition
Documentation/virt/kvm/devices/vcpu.rst | 30 ++
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/arm.c | 5 +-
arch/arm64/kvm/pmu-emul.c | 495 +++++++++++++--------
arch/arm64/kvm/sys_regs.c | 2 +-
include/kvm/arm_pmu.h | 12 +-
.../selftests/kvm/arm64/vpmu_counter_access.c | 148 ++++--
8 files changed, 461 insertions(+), 234 deletions(-)
---
base-commit: 8ec6d99a41e3d1dbdff2bdb3aa42951681e1e76c
change-id: 20250224-hybrid-01d5ff47edd2
Best regards,
--
Akihiko Odaki <odaki(a)rsg.ci.i.u-tokyo.ac.jp>
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find DUALPI2 iproute2 patch v12.
For more details of DualPI2, please refer to IETF RFC9332
(https://datatracker.ietf.org/doc/html/rfc9332).
Best Regards,
Chia-Yu
---
v12 (04-Aug-2025)
- Split into 3 patches: one move get_float(), one add get_float_min_max(), one for dualpi2 (David Ahern <dsahern(a)kernel.org>)
- Replace matches() with strcmp() within get_packets() (David Ahern <dsahern(a)kernel.org>)
- Apply reverse xmas tree listing of variables (David Ahern <dsahern(a)kernel.org>)
v11 (18-Jul-2025)
- Replace TCA_DUALPI2 prefix with TC_DUALPI2 prefix for enums (Jakub Kicinski <kuba(a)kernel.org>)
v10 (02-Jul-2025)
- Replace STEP_THRESH and STEP_PACKETS w/ STEP_THRESH_PKTS and STEP_THRESH_US of net-next patch (Jakub Kicinski <kuba(a)kernel.org>)
v9 (13-Jun-2025)
- Fix space issue and typos (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Change 'rtt_typical' to 'typical_rtt' in tc/q_dualpi2.c (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Add the num of enum used by DualPI2 in pkt_sched.h
v8 (09-May-2025)
- Update pkt_sched.h with the one in net-next
- Correct a typo in the comment within pkt_sched.h (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update manual content in man/man8/tc-dualpi2.8 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update tc/q_dualpi2.c to fix missing blank lines and add missing case (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
v7 (05-May-2025)
- Align pkt_sched.h with the v14 version of net-next due to spec modification in tc.yaml
- Reorganize dualpi2_print_opt() to match the order in tc.yaml
- Remove credit-queue in PRINT_JSON
v6 (26-Apr-2025)
- Update JSON file output due to spec modification in tc.yaml of net-next
v5 (25-Mar-2025)
- Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>)
- Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>)
v4 (16-Mar-2025)
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step marking.
v3 (21-Feb-2025)
- Add memlimit to the dualpi2 attribute, and add memory_used, max_memory_used, and memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update the manual to align with the latest implementation and clarify the queue naming and default unit
- Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2
v2 (23-Oct-2024)
- Rename get_float in dualpi2 to get_float_min_max in utils.c
- Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>)
---
Chia-Yu Chang (3):
Move get_float() from ip/iplink_can.c to lib/utils.c
Add get_float_min_max() in lib/utils.c
tc: add dualpi2 scheduler module
bash-completion/tc | 11 +-
include/utils.h | 2 +
ip/iplink_can.c | 14 --
lib/utils.c | 30 +++
man/man8/tc-dualpi2.8 | 249 ++++++++++++++++++++
tc/Makefile | 1 +
tc/q_dualpi2.c | 535 ++++++++++++++++++++++++++++++++++++++++++
7 files changed, 827 insertions(+), 15 deletions(-)
create mode 100644 man/man8/tc-dualpi2.8
create mode 100644 tc/q_dualpi2.c
--
2.34.1
Basics and overview
===================
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain control (by making their payload executable). Attackers are
able to perform such attacks by leveraging call-sites which rely on indirect
calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect
calls must land on a landing pad instruction `lpad` else cpu will raise software
check exception (a new cpu exception cause code on riscv).
Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if the comparison above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
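The push and pop-and-check behaviour listed above can be illustrated with a small software model. This is only a sketch under stated assumptions: `struct shadow_stack`, `ss_push()` and `ss_popchk()` are hypothetical names, the depth is arbitrary, and a `false` return stands in for the software check exception that the hardware would raise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SS_DEPTH 64	/* arbitrary depth for this model */

/* Hypothetical software model of a shadow stack, not the hardware layout. */
struct shadow_stack {
	uintptr_t slots[SS_DEPTH];
	size_t top;	/* index of the next free slot */
};

/* Models `sspush`: record the return address at call time. */
static bool ss_push(struct shadow_stack *ss, uintptr_t ra)
{
	if (ss->top == SS_DEPTH)
		return false;
	ss->slots[ss->top++] = ra;
	return true;
}

/*
 * Models `sspopchk`: pop the saved return address and compare it with
 * the one read back from the regular stack. A false return stands in
 * for the software check exception.
 */
static bool ss_popchk(struct shadow_stack *ss, uintptr_t ra_from_stack)
{
	if (ss->top == 0)
		return false;
	return ss->slots[--ss->top] == ra_from_stack;
}
```

If an attacker overwrites the return address on the regular stack, the value handed to `ss_popchk()` no longer matches the protected copy and the check fails.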
More information and details can be found at the extensions' github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel
CET [3] and branch target identification (BTI) [4] on arm.
Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control
stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are
being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel
doesn't enable control flow integrity cpu extensions on binary by default.
Instead exposes a prctl interface to enable, disable and lock the shadow stack
or landing pad feature for a task. This allows userspace (loader) to enumerate
if all objects in its address space are compiled with shadow stack and landing
pad support and accordingly enable the feature. Additionally if a subsequent
`dlopen` happens on a library, user mode can take a decision again to disable
the feature (if incoming library is not compiled with support) OR terminate the
task (if user mode policy is strict to have all objects in address space to be
compiled with control flow integrity cpu feature). prctl to enable shadow stack
results in allocating shadow stack from virtual memory and activating for user
address space. x86 and arm64 are also following same direction due to similar
reason(s).
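As a rough illustration of the prctl interface, the sketch below only queries the current shadow stack status and changes nothing. It assumes the series uses the architecture-agnostic PR_GET_SHADOW_STACK_STATUS prctl found in mainline include/uapi/linux/prctl.h; the cover letter does not name the prctl values, so treat that as an assumption. `shadow_stack_enabled()` is a hypothetical helper name.

```c
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values as in mainline prctl.h. */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Returns 1 if the shadow stack is enabled for the calling thread,
 * 0 if it is disabled, and -1 if the kernel rejects the query
 * (errno is set by prctl() in that case).
 */
static int shadow_stack_enabled(void)
{
	unsigned long status = 0;

	if (prctl(PR_GET_SHADOW_STACK_STATUS, &status, 0UL, 0UL, 0UL) != 0)
		return -1;
	return (status & PR_SHADOW_STACK_ENABLE) ? 1 : 0;
}
```

Enabling the feature is deliberately left out: turning on a shadow stack in the middle of a running function is normally the loader's job, since frames already on the regular stack have no matching shadow stack entries.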
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is
part of virtual memory and is a writeable memory from kernel perspective
(writeable via a restricted set of instructions aka shadow stack instructions)
Thus kernel changes ensure that this memory is converted into read-only when
fork/clone happens and COWed when fault is taken due to sspush, sspopchk or
ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled,
kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly
map shadow stack memory in its address space. It is useful to allocate shadow
stacks for different contexts managed by a single thread (green threads or contexts).
risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control
flow diversion to deliver the signal and eventually expects userspace to issue
sigreturn so that original execution can be resumed. Even though resume context
is prepared by kernel, it is in user space memory and is subject to memory
corruption and corruption bugs can be utilized by attacker in this race window
to perform arbitrary sigreturn and eventually bypass cfi mechanism.
Another issue is how to ensure that cfi related state on sigcontext area is not
trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacking, kernel prepares a token and places
it on shadow stack before signal delivery and places address of token in
sigcontext structure. During sigreturn, kernel obtains address of token from
sigcontext structure, reads token from shadow stack and validates it and only
then allows sigreturn to succeed. Compatibility issue is solved by adopting
dynamic sigcontext management introduced for vector extension. This series
refactors the code a little bit to make future sigcontext management easier (as
proposed by Andy Chiu from SiFive).
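The token handshake can be modelled in plain C. This is a sketch, not the kernel implementation: all names are hypothetical, slot indices stand in for shadow stack addresses, and the token encoding in `make_token()` is an assumption made for illustration, since the text does not specify the token format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SS_SLOTS 16	/* arbitrary size for this model */

/* Hypothetical per-task state: shadow stack memory, not user-writable. */
struct task_model {
	uintptr_t shadow_stack[SS_SLOTS];
	size_t ssp;	/* index of the next free slot */
};

/* Hypothetical sigcontext field: lives in user memory, may be corrupted. */
struct sigcontext_model {
	size_t token_slot;
};

/* Assumed encoding: the token is derived from its own slot, plus a marker bit. */
static uintptr_t make_token(size_t slot)
{
	return ((uintptr_t)slot << 1) | 1;
}

/* Signal delivery: place a token on the shadow stack, record where it is. */
static bool deliver_signal(struct task_model *t, struct sigcontext_model *sc)
{
	if (t->ssp == SS_SLOTS)
		return false;
	sc->token_slot = t->ssp;
	t->shadow_stack[t->ssp] = make_token(t->ssp);
	t->ssp++;
	return true;
}

/* sigreturn: succeed only if a valid token sits at the recorded location. */
static bool sigreturn_ok(struct task_model *t, const struct sigcontext_model *sc)
{
	size_t slot = sc->token_slot;

	if (slot >= SS_SLOTS || t->shadow_stack[slot] != make_token(slot))
		return false;
	t->shadow_stack[slot] = 0;	/* consume the token, so replay fails */
	t->ssp = slot;
	return true;
}
```

A forged sigcontext that points at a slot without a token is rejected, and a consumed token cannot be reused for a second sigreturn.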
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this
config option picks the kernel support for user control flow integrity. This
option is presented only if toolchain has shadow stack and landing pad support.
And is on purpose guarded by toolchain support. Reason being that eventually
vDSO also needs to be compiled in with shadow stack and landing pad support.
vDSO compile patches are not included as of now because landing pad labeling
scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to
zicfilp and zicfiss, patch series adds documentation for
`zicfilp` and `zicfiss` in following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst
How to test this series
=======================
Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
Qemu
----
Get the latest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
In case you're building your own rootfs using toolchain, please make sure you
pick following patch to ensure that vDSO is compiled with lpad and shadow stack.
"arch/riscv: compile vdso with landing pad"
Branch where above patch can be picked
https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1
Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
vDSO related Opens (in the flux)
=================================
I am listing these opens for laying out plan and what to expect in future
patch sets. And of course for the sake of discussion.
Shadow stack and landing pad enabling in vDSO
----------------------------------------------
vDSO must have shadow stack and landing pad support compiled in for task
to have shadow stack and landing pad support. This patch series doesn't
enable that (yet). Enabling shadow stack support in vDSO should be
straightforward (intend to do that in next versions of patch set). Enabling
landing pad support in vDSO requires some collaboration with toolchain folks
to follow a single label scheme for all object binaries. This is necessary to
ensure that all indirect call-sites are setting correct label and target landing
pads are decorated with same label scheme.
How many vDSOs
---------------
Shadow stack instructions are carved out of zimop (may-be-operations) and if CPU
doesn't implement zimop, they're illegal instructions. Kernel could be running on
a CPU which may or may not implement zimop. And thus kernel will have to carry 2
different vDSOs and expose the appropriate one depending on whether CPU implements
zimop or not.
References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/
changelog
---------
v19:
- riscv_nousercfi was `int`. changed it to unsigned long.
Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restore ssp back on return to usermode was being done
before `riscv_v_context_nesting_end` on trap exit path.
If kernel shadow stack were enabled this would result in
kernel operating on user shadow stack and panic (as I found
in my testing of kcfi patch series). So fixed that.
v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in sstatus image in pt_regs
- vdso was missing shadow stack elf note for object files.
added that. Additional asm file for vdso needed the elf marker
flag. toolchain should complain if `-fcf-protection=full` and
marker is missing for object generated from asm file. Asked
toolchain folks to fix this. Although no reason to gate the merge
on that.
- Split up compile options for march and fcf-protection in vdso
Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu
Added `arch/riscv/configs/hardening.config` fragment which selects
CONFIG_RISCV_USER_CFI
v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with
"riscv/traps: Introduce software check exception and uprobe handling"
https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/
v16:
- If FWFT is not implemented or returns error for shadow stack activation, then
no_usercfi is set to disable shadow stack. Although this should be picked up
by extension validation and activation. Fixed this bug for zicfilp and zicfiss
both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by
Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to
keep it off till we have more hardware availability with RVA23 profile and
zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in
asm-offsets.c error.
v15:
- Toolchain has been updated to include `-fcf-protection` flag. This
exists for x86 as well. Updated kernel patches to compile vDSO and
selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USER_CFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind
CONFIG_RISCV_USER_CFI and CONFIG_RISCV_SBI. fixed that.
v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches
Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation
is not disturbed with respect to member fields of thread_info in single
cacheline.
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses
riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently.
updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace
kselftest.
- cosmetic and grammatical changes to documentation.
v12:
- It seems like I had accidentally squashed arch agnostic indirect branch
tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU
support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally
selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to
`lpad 0`.
v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in
arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch
to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of
zero labeled landing pad is least common denominator and should work with all
objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped Samuel Holland's `envcfg` context switch patches.
they are in palmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up.
see here for more context
https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should
be setup.
- `mm/mmap.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context
management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT
(https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
(Note: I had an issue in my workflow due to which version number wasn't
picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is
in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
---
Changes in v19:
- Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri…
Changes in v18:
- Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri…
Changes in v17:
- Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri…
Changes in v16:
- Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri…
Changes in v15:
- changelog posted just below cover letter
- Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri…
Changes in v14:
- changelog posted just below cover letter
- Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri…
Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri…
Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri…
Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri…
---
Andy Chiu (1):
riscv: signal: abstract header saving for setup_sigcontext
Deepak Gupta (25):
mm: VM_SHADOW_STACK definition for riscv
dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
riscv: zicfiss / zicfilp enumeration
riscv: zicfiss / zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv/mm: manufacture shadow stack pte
riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs
riscv/mm: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
riscv: Implements arch agnostic shadow stack prctls
prctl: arch-agnostic prctl for indirect branch tracking
riscv: Implements arch agnostic indirect branch tracking prctls
riscv/traps: Introduce software check exception and uprobe handling
riscv/signal: save and restore of shadow stack for signal
riscv/kernel: update __show_regs to print shadow stack register
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
riscv: kernel command line option to opt out of user cfi
riscv: enable kernel access to shadow stack memory via FWFT sbi call
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Jim Shu (1):
arch/riscv: compile vdso with landing pad and shadow stack note
Documentation/admin-guide/kernel-parameters.txt | 8 +
Documentation/arch/riscv/index.rst | 2 +
Documentation/arch/riscv/zicfilp.rst | 115 +++++
Documentation/arch/riscv/zicfiss.rst | 179 +++++++
.../devicetree/bindings/riscv/extensions.yaml | 14 +
arch/riscv/Kconfig | 21 +
arch/riscv/Makefile | 5 +-
arch/riscv/configs/hardening.config | 4 +
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/assembler.h | 44 ++
arch/riscv/include/asm/cpufeature.h | 12 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/entry-common.h | 2 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 26 +
arch/riscv/include/asm/mmu_context.h | 7 +
arch/riscv/include/asm/pgtable.h | 30 +-
arch/riscv/include/asm/processor.h | 1 +
arch/riscv/include/asm/thread_info.h | 3 +
arch/riscv/include/asm/usercfi.h | 95 ++++
arch/riscv/include/asm/vector.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/include/uapi/asm/ptrace.h | 34 ++
arch/riscv/include/uapi/asm/sigcontext.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 10 +
arch/riscv/kernel/cpufeature.c | 27 +
arch/riscv/kernel/entry.S | 38 ++
arch/riscv/kernel/head.S | 27 +
arch/riscv/kernel/process.c | 27 +-
arch/riscv/kernel/ptrace.c | 95 ++++
arch/riscv/kernel/signal.c | 148 +++++-
arch/riscv/kernel/sys_hwprobe.c | 2 +
arch/riscv/kernel/sys_riscv.c | 10 +
arch/riscv/kernel/traps.c | 54 ++
arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++
arch/riscv/kernel/vdso/Makefile | 11 +-
arch/riscv/kernel/vdso/flush_icache.S | 4 +
arch/riscv/kernel/vdso/getcpu.S | 4 +
arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +-
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 16 +
include/linux/cpu.h | 4 +
include/linux/mm.h | 7 +
include/uapi/linux/elf.h | 2 +
include/uapi/linux/prctl.h | 27 +
kernel/sys.c | 30 ++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 3 +
tools/testing/selftests/riscv/cfi/Makefile | 16 +
tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++
tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++
tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++
tools/testing/selftests/riscv/cfi/shadowstack.h | 27 +
56 files changed, 2389 insertions(+), 30 deletions(-)
---
base-commit: a2a05801de77ca5122fc34e3eb84d6359ef70389
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug
The "nohz_full" and "rcu_nocbs" boot command parameters can be used to
remove a lot of kernel overhead on a specific set of isolated CPUs which
can be used to run some latency/bandwidth sensitive workloads with as
little kernel disturbance/noise as possible. The problem with this mode
of operation is the fact that it is a static configuration which cannot
be changed after boot to adjust for changes in application loading.
There is always a desire to enable runtime modification of the number
of isolated CPUs that can be dedicated to this type of demanding
workloads. This patchset is an attempt to do just that with an amount of
CPU isolation close to what can be done with the nohz_full and rcu_nocbs
boot kernel parameters.
This patch series provides the ability to change the set of housekeeping
CPUs at run time via the cpuset isolated partition functionality.
Currently, the cpuset isolated partition is able to disable scheduler
load balancing and the CPU affinity of the unbound workqueue to avoid the
isolated CPUs. This patch series will extend that with other kernel noises
associated with the nohz_full boot command line parameter which has the
following sub-categories:
- tick
- timer
- RCU
- MISC
- WQ
- kthread
The rcu_nocbs is actually a subset of nohz_full focusing just on the
RCU part of the kernel noises. The WQ part has already been handled by
the current cpuset code.
This series focuses on the tick and RCU part of the kernel noises by
actively changing their internal data structures to track changes in
the list of isolated CPUs used by cpuset isolated partitions.
The dynamic update of the lists of housekeeping CPUs at run time will
also have impact on the other part of the kernel noises that reference
the lists of housekeeping CPUs at run time.
The pending patch series on timer migration[1], when properly integrated,
will support the timer part too.
The CPU hotplug functionality of the Linux kernel is used to facilitate
the runtime change of the nohz_full isolated CPUs with minimal code
changes. The CPUs that need to be switched from non-isolated to
isolated or vice versa will be brought offline first. The necessary
changes are then made and the CPUs are brought back online afterward.
The use of CPU hotplug, however, does have a slight drawback of
freezing all the other CPUs in part of the offlining process using
the stop machine feature of the kernel. That will cause noticeable
latency spikes in other running applications which may be significant
to sensitive applications running on isolated CPUs in other isolated
partitions at the time. Hopefully we can find a way to solve this
problem in the future.
One possible workaround for this is to reserve a set of nohz_full
isolated CPUs at boot time using the nohz_full boot command parameter.
The bringing of those nohz_full reserved CPUs into and out of isolated
partitions will not invoke CPU hotplug and hence will not cause
unexpected latency spikes. These reserved CPUs will only be needed
if there are other existing isolated partitions running critical
applications at the time when an isolated partition needs to be created.
Patches 1-4 update the CPU isolation code at kernel/sched/isolation.c
to enable dynamic update of the lists of housekeeping CPUs.
Patch 5 introduces a new cpuhp_offline_cb() API for shutting down the
given set of CPUs, running the given callback method and then bringing
those CPUs back online again. This new API will block any incoming
hotplug events from interfering with this operation.
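The offline, callback, online sequence can be sketched as a user-space model
over CPU bitmasks. Everything here is an assumption made for illustration:
`cpu_model`, `model_offline_cb()` and `isolate_cpus()` are hypothetical names,
and the signature, the error values and the choice to bring CPUs back online
even when the callback fails are not taken from the actual cpuhp_offline_cb()
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model state: one bit per CPU. */
struct cpu_model {
	uint64_t online_mask;
	uint64_t housekeeping_mask;
};

typedef int (*offline_cb_t)(struct cpu_model *m, void *arg);

/*
 * Model of the offline -> callback -> online sequence. The name and
 * signature are illustrative, not the API added by patch 5.
 */
int model_offline_cb(struct cpu_model *m, uint64_t cpus,
		     offline_cb_t cb, void *arg)
{
	uint64_t to_restore = m->online_mask & cpus;
	int ret;

	assert(cb != NULL);
	/* Refuse to take every online CPU down. */
	if (to_restore == m->online_mask)
		return -1;

	m->online_mask &= ~to_restore;	/* "offline" the requested CPUs */
	ret = cb(m, arg);		/* update masks while they are down */
	m->online_mask |= to_restore;	/* bring them back regardless of ret */
	return ret;
}

/* Example callback: drop the CPUs in *arg from the housekeeping mask. */
int isolate_cpus(struct cpu_model *m, void *arg)
{
	uint64_t isolated = *(uint64_t *)arg;

	/* The CPUs being changed must be offline while the mask is updated. */
	if (m->online_mask & isolated)
		return -1;
	m->housekeeping_mask &= ~isolated;
	return 0;
}
```

In this model, isolating CPUs 2-3 out of eight online CPUs leaves the online
mask unchanged afterwards and clears only those two bits from the housekeeping
mask.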
Patches 6-9 update the cpuset partition code to use the new cpuhp API
to shut down the affected CPUs, make changes to the housekeeping
cpumasks and then bring those CPUs online afterward.
Patch 10 works around an issue in the DL server code that blocks the
hotplug operation under certain configurations.
Patches 11-14 update the timer tick and related code to enable proper
updates to the set of CPUs requiring nohz_full dynticks support.
Patch 15 enables runtime modification to the set of isolated CPUs
requiring RCU NO-CB CPU support with minor changes to the RCU code.
Patches 16-18 include other miscellaneous updates to cpuset code and
documentation.
This patch series is applied on top of some other cpuset patches[2]
posted upstream recently.
[1] https://lore.kernel.org/lkml/20250806093855.86469-1-gmonaco@redhat.com/
[2] https://lore.kernel.org/lkml/20250806172430.1155133-1-longman@redhat.com/
Waiman Long (18):
sched/isolation: Enable runtime update of housekeeping cpumasks
sched/isolation: Call sched_tick_offload_init() when
HK_FLAG_KERNEL_NOISE is first set
sched/isolation: Use RCU to delay successive housekeeping cpumask
updates
sched/isolation: Add a debugfs file to dump housekeeping cpumasks
cpu/hotplug: Add a new cpuhp_offline_cb() API
cgroup/cpuset: Introduce a new top level isolcpus_update_mutex
cgroup/cpuset: Allow overwriting HK_TYPE_DOMAIN housekeeping cpumask
cgroup/cpuset: Use CPU hotplug to enable runtime nohz_full
modification
cgroup/cpuset: Revert "Include isolated cpuset CPUs in
cpu_is_isolated() check"
sched/core: Ignore DL BW deactivation error if in
cpuhp_offline_cb_mode
tick/nohz: Make nohz_full parameter optional
tick/nohz: Introduce tick_nohz_full_update_cpus() to update
tick_nohz_full_mask
tick/nohz: Allow runtime changes in full dynticks CPUs
tick: Pass timer tick job to an online HK CPU in tick_cpu_dying()
cgroup/cpuset: Enable RCU NO-CB CPU offloading of newly isolated CPUs
cgroup/cpuset: Don't set have_boot_nohz_full without any boot time
nohz_full CPU
cgroup/cpuset: Documentation updates & don't use CPU 0 for isolated
partition
cgroup/cpuset: Add pr_debug() statements for cpuhp_offline_cb() call
Documentation/admin-guide/cgroup-v2.rst | 33 +-
.../admin-guide/kernel-parameters.txt | 19 +-
include/linux/context_tracking.h | 8 +-
include/linux/cpuhplock.h | 9 +
include/linux/cpuset.h | 6 -
include/linux/rcupdate.h | 2 +
include/linux/sched/isolation.h | 9 +-
include/linux/tick.h | 2 +
kernel/cgroup/cpuset.c | 344 ++++++++++++------
kernel/context_tracking.c | 21 +-
kernel/cpu.c | 47 +++
kernel/rcu/tree_nocb.h | 7 +-
kernel/sched/core.c | 8 +-
kernel/sched/debug.c | 32 ++
kernel/sched/isolation.c | 151 +++++++-
kernel/sched/sched.h | 2 +-
kernel/time/tick-common.c | 15 +-
kernel/time/tick-sched.c | 24 +-
.../selftests/cgroup/test_cpuset_prs.sh | 15 +-
19 files changed, 583 insertions(+), 171 deletions(-)
--
2.50.0
David asked me if there is a way of checking split_huge_page_test
results instead of the existing smap check[1]. This patchset uses
kpageflags to get after-split folio orders for a better
split_huge_page_test result check. The added gather_folio_orders() scans
through a VPN range and collects the numbers of folios at different orders.
check_folio_orders() compares the result of gather_folio_orders() to
a given list of numbers of different orders.
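The gather-and-compare idea can be sketched without kpageflags as follows.
This is a simplified model, not the selftest code: the input is a hypothetical
array giving, for each page in the range, the order of the folio it belongs
to, and `model_gather_folio_orders()`, `model_check_folio_orders()` and
`MAX_ORDER_MODEL` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ORDER_MODEL 10

/*
 * Hypothetical input: for each page in the range, the order of the folio
 * it belongs to. A folio of order n covers 2^n consecutive entries.
 * Fills counts[order] with the number of folios of that order.
 * Returns 0 on success, -1 if the range is malformed.
 */
int model_gather_folio_orders(const int *page_order, size_t nr_pages,
			      int counts[MAX_ORDER_MODEL + 1])
{
	size_t i = 0;

	assert(page_order != NULL && counts != NULL);
	memset(counts, 0, sizeof(int) * (MAX_ORDER_MODEL + 1));

	while (i < nr_pages) {
		int order = page_order[i];
		size_t span;

		if (order < 0 || order > MAX_ORDER_MODEL)
			return -1;
		span = (size_t)1 << order;
		if (span > nr_pages - i)
			return -1;	/* folio runs past the range */
		/* Every page of the folio must report the same order. */
		for (size_t j = 1; j < span; j++)
			if (page_order[i + j] != order)
				return -1;
		counts[order]++;
		i += span;	/* skip the rest of this folio */
	}
	return 0;
}

/* Return 0 when the gathered counts match the expected per-order counts. */
int model_check_folio_orders(const int *page_order, size_t nr_pages,
			     const int expected[MAX_ORDER_MODEL + 1])
{
	int counts[MAX_ORDER_MODEL + 1];

	if (model_gather_folio_orders(page_order, nr_pages, counts))
		return -1;
	return memcmp(counts, expected, sizeof(counts)) ? -1 : 0;
}
```

For an 8-page range holding one order-2 folio, one order-1 folio and two
order-0 folios, the check passes only when the expected list is exactly
{2, 1, 1} for orders 0, 1 and 2 and zero elsewhere.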
split_huge_page_test needs the FORCE_READ fix in [2] to work correctly.
This patchset also:
1. added new order and in-folio offset to the split huge page debugfs's
pr_debug()s;
2. changed split_huge_pages_pid() to skip the rest of a folio if it is
split by folio_split() (not changing split_folio_to_order() part
since split_pte_mapped_thp test relies on its behavior).
[1] https://lore.kernel.org/linux-mm/e2f32bdb-e4a4-447c-867c-31405cbba151@redha…
[2] https://lore.kernel.org/linux-mm/20250805175140.241656-1-ziy@nvidia.com/
Zi Yan (4):
mm/huge_memory: add new_order and offset to split_huge_pages*()
pr_debug.
mm/huge_memory: move to next folio after folio_split() succeeds.
selftests/mm: add check_folio_orders() helper.
selftests/mm: check after-split folio orders in split_huge_page_test.
mm/huge_memory.c | 22 +--
.../selftests/mm/split_huge_page_test.c | 67 ++++++---
tools/testing/selftests/mm/vm_util.c | 139 ++++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 2 +
4 files changed, 200 insertions(+), 30 deletions(-)
--
2.47.2