In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct kunit_suite **" but the returned type will
be "struct kunit_suite * const *". Since it isn't generally possible
to remove the const qualifier, adjust the allocation type to match
the assignment.
Signed-off-by: Kees Cook <kees(a)kernel.org>
---
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Cc: <linux-kselftest(a)vger.kernel.org>
Cc: <kunit-dev(a)googlegroups.com>
---
lib/kunit/executor.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
index 3f39955cb0f1..0061d4c7e351 100644
--- a/lib/kunit/executor.c
+++ b/lib/kunit/executor.c
@@ -177,7 +177,7 @@ kunit_filter_suites(const struct kunit_suite_set *suite_set,
const size_t max = suite_set->end - suite_set->start;
- copy = kcalloc(max, sizeof(*filtered.start), GFP_KERNEL);
+ copy = kcalloc(max, sizeof(*copy), GFP_KERNEL);
if (!copy) { /* won't be able to run anything, return an empty set */
return filtered;
}
--
2.34.1
From: Alexander Shatalin <sashatalin03(a)gmail.com>
This patch improves the safety when restoring WRITES_STRICT by ensuring:
- The variable is not empty (`-n`)
- The target file is writable (`-w`)
Also improves formatting by quoting variables and following shell best
practices.
Signed-off-by: Alexander Shatalin <sashatalin03(a)gmail.com>
From: Madhavan Srinivasan <maddy(a)linux.ibm.com>
Commit 50910acd6f615 ("selftests/mm: use sys_pkey helpers consistently")
added a pkey_util.c to refactor some of the protection_keys functions accessible
by other tests. But this broken the build in powerpc in two ways,
pkey-powerpc.h: In function ‘arch_is_powervm’:
pkey-powerpc.h:73:21: error: storage size of ‘buf’ isn’t known
73 | struct stat buf;
| ^~~
pkey-powerpc.h:75:14: error: implicit declaration of function ‘stat’; did you mean ‘strcat’? [-Wimplicit-function-declaration]
75 | if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 0) &&
| ^~~~
| strcat
Since pkey_util.c includes pkeys-helper.h, which in turn includes pkeys-powerpc.h,
stat.h including is missing for "struct stat". This is fixed by adding "sys/stat.h"
in pkeys-powerpc.h
Secondly,
pkey-powerpc.h:55:18: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘u64’ {aka ‘long unsigned int’} [-Wformat=]
55 | dprintf4("%s() changing %016llx to %016llx\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __func__, __read_pkey_reg(), pkey_reg);
| ~~~~~~~~~~~~~~~~~
| |
| u64 {aka long unsigned int}
pkey-helpers.h:63:32: note: in definition of macro ‘dprintf_level’
63 | sigsafe_printf(args); \
| ^~~~
These format specifier related warning are removed by adding
"__SANE_USERSPACE_TYPES__" to pkeys_utils.c.
Fixes: 50910acd6f615 ("selftests/mm: use sys_pkey helpers consistently")
Signed-off-by: Madhavan Srinivasan <maddy(a)linux.ibm.com>
Signed-off-by: Nysal Jan K.A. <nysal(a)linux.ibm.com>
---
tools/testing/selftests/mm/pkey-powerpc.h | 2 ++
tools/testing/selftests/mm/pkey_util.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/tools/testing/selftests/mm/pkey-powerpc.h b/tools/testing/selftests/mm/pkey-powerpc.h
index 1bad310d282a..d8ec906b8120 100644
--- a/tools/testing/selftests/mm/pkey-powerpc.h
+++ b/tools/testing/selftests/mm/pkey-powerpc.h
@@ -3,6 +3,8 @@
#ifndef _PKEYS_POWERPC_H
#define _PKEYS_POWERPC_H
+#include <sys/stat.h>
+
#ifndef SYS_pkey_alloc
# define SYS_pkey_alloc 384
# define SYS_pkey_free 385
diff --git a/tools/testing/selftests/mm/pkey_util.c b/tools/testing/selftests/mm/pkey_util.c
index ca4ad0d44ab2..255b332f7a08 100644
--- a/tools/testing/selftests/mm/pkey_util.c
+++ b/tools/testing/selftests/mm/pkey_util.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
+#define __SANE_USERSPACE_TYPES__
#include <sys/syscall.h>
#include <unistd.h>
--
2.47.0
A few functions used by different selftests.
Adding them now avoids later conflicts between different selftest serieses.
Also add full support for nolibc-test.c on riscv32.
All unsupported syscalls have been replaced.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Rebase onto latest nolibc next branch
- Move #include "nolibc.h" to the top of the headers
- Include linux/random.h from sys/random.h
- Don't block on missing entropy in getrandom() selftest
- Also test negative result of difftime()
- Simplify fopen()
- Link to v1: https://lore.kernel.org/r/20250423-nolibc-misc-v1-0-a925bf40297b@linutronix…
---
Thomas Weißschuh (15):
tools/nolibc: add strstr()
tools/nolibc: add %m printf format
tools/nolibc: add more stat() variants
tools/nolibc: add mremap()
tools/nolibc: add getrandom()
tools/nolibc: add abs() and friends
tools/nolibc: add support for access() and faccessat()
tools/nolibc: add clock_getres(), clock_gettime() and clock_settime()
tools/nolibc: add timer functions
tools/nolibc: add timerfd functionality
tools/nolibc: add difftime()
tools/nolibc: add namespace functionality
tools/nolibc: add fopen()
tools/nolibc: fall back to sys_clock_gettime() in gettimeofday()
tools/nolibc: implement wait() in terms of waitpid()
tools/include/nolibc/Makefile | 4 +
tools/include/nolibc/math.h | 31 ++++
tools/include/nolibc/nolibc.h | 4 +
tools/include/nolibc/sched.h | 50 +++++
tools/include/nolibc/stdio.h | 27 +++
tools/include/nolibc/stdlib.h | 18 ++
tools/include/nolibc/string.h | 20 ++
tools/include/nolibc/sys/mman.h | 19 ++
tools/include/nolibc/sys/random.h | 34 ++++
tools/include/nolibc/sys/stat.h | 25 ++-
tools/include/nolibc/sys/time.h | 15 +-
tools/include/nolibc/sys/timerfd.h | 87 +++++++++
tools/include/nolibc/sys/wait.h | 12 +-
tools/include/nolibc/time.h | 185 ++++++++++++++++++
tools/include/nolibc/types.h | 3 +
tools/include/nolibc/unistd.h | 28 +++
tools/testing/selftests/nolibc/Makefile | 2 +
tools/testing/selftests/nolibc/nolibc-test.c | 268 ++++++++++++++++++++++++++-
18 files changed, 820 insertions(+), 12 deletions(-)
---
base-commit: 2051d3b830c0889ae55e37e9e8ff0d43a4acd482
change-id: 20250415-nolibc-misc-f2548ccc5ce1
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM.
However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0
indicates that MTE_ASYNC is supported. On a host with
ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the
MTE capability enabled will incorrectly see MTE_ASYNC advertised as
supported. This series fixes that.
This was found by inspection and the current behaviour is not known to
break anything. Linux doesn't check MTE_frac, and wrongly, assumes
MTE async faults can be generated whenever MTE is supported. This is
a separate problem and not addressed here.
I am looking for feedback on whether this change is valuable or
otherwise.
Ben Horgan (3):
arm64/sysreg: Expose MTE_frac so that it is visible to KVM
KVM: arm64: Make MTE_frac masking conditional on MTE capability
KVM: selftests: Confirm exposing MTE_frac does not break migration
arch/arm64/kernel/cpufeature.c | 1 +
arch/arm64/kvm/sys_regs.c | 26 ++++++-
.../testing/selftests/kvm/arm64/set_id_regs.c | 77 ++++++++++++++++++-
3 files changed, 101 insertions(+), 3 deletions(-)
base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
--
2.43.0
Hello everyone,
A heads up!
Work on moving some aspects of the KernelCI configuration into an in-tree
(or per-project, if you wish) file called ".kernelci.yaml" has started and
is currently in the early design and implementation stage.
This is why we need your feedback to ensure that the feature implementation
moves in the right direction and will meet the user's needs.
Please have a look at the following GitHub issue:
https://github.com/kernelci/kernelci-pipeline/issues/1126
The linked issue is where anyone interested in this feature can get
involved to discuss implementation details, ask questions, etc. You can,
of course, ask questions here, too.
This new feature will eventually pave the way for the introduction of
a push model to KernelCI, along with the pull model we currently have,
allowing for the integration of different Git hosting platforms, such as
GitHub, GitLab, etc., where events sent to us from these platforms will
trigger an appropriate behaviour on the KernelCI's side.
However, this new functionality will be added as a mid-term deliverable,
where the ability to configure KernelCI via the ".kernelci.yaml" file is
one of the crucial dependencies.
Please don't hesitate to get involved and provide feedback.
Thank you!
On behalf of the KernelCI maintainers,
Krzysztof
Change the type of 'dwRegionSize' in wp_init() and wp_free()
from int to long to match callers that pass long or
unsigned long long values.
wp_addr_range function is left unchanged because it passes
'dwRegionSize' parameter directly to pagemap_ioctl, which expects
an int.
Signed-off-by: Siddarth G <siddarthsgml(a)gmail.com>
---
tools/testing/selftests/mm/pagemap_ioctl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/pagemap_ioctl.c b/tools/testing/selftests/mm/pagemap_ioctl.c
index 57b4bba2b45f..5773666f07ea 100644
--- a/tools/testing/selftests/mm/pagemap_ioctl.c
+++ b/tools/testing/selftests/mm/pagemap_ioctl.c
@@ -112,7 +112,7 @@ int init_uffd(void)
return 0;
}
-int wp_init(void *lpBaseAddress, int dwRegionSize)
+int wp_init(void *lpBaseAddress, long dwRegionSize)
{
struct uffdio_register uffdio_register;
struct uffdio_writeprotect wp;
@@ -136,7 +136,7 @@ int wp_init(void *lpBaseAddress, int dwRegionSize)
return 0;
}
-int wp_free(void *lpBaseAddress, int dwRegionSize)
+int wp_free(void *lpBaseAddress, long dwRegionSize)
{
struct uffdio_register uffdio_register;
--
2.43.0
The kunit test that checks the longests symbol length [1], has triggered
warnings in some CI pilelines when symbol prefixes are used [2]. The test
is adjusted to depend on !CONFIG_PREFIX_SYMBOLS as sujested in [3].
[1] https://lore.kernel.org/rust-for-linux/CABVgOSm=5Q0fM6neBhxSbOUHBgNzmwf2V22…
[2] https://lore.kernel.org/all/20250328112156.2614513-1-arnd@kernel.org/T/#u
[3] https://lore.kernel.org/all/ycgbf7jcq7nc62ndqiynogt6hkabgl3hld4uyelgo7rksyl…
Fixes: c104c16073b7 ("Kunit to check the longest symbol length")
Signed-off-by: Sergio González Collado <sergio.collado(a)gmail.com>
---
lib/Kconfig.debug | 2 +-
lib/tests/longest_symbol_kunit.c | 3 +--
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f9051ab610d5..6937dedce04d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2885,7 +2885,7 @@ config FORTIFY_KUNIT_TEST
config LONGEST_SYM_KUNIT_TEST
tristate "Test the longest symbol possible" if !KUNIT_ALL_TESTS
- depends on KUNIT && KPROBES
+ depends on KUNIT && KPROBES && !CONFIG_PREFIX_SYMBOLS
default KUNIT_ALL_TESTS
help
Tests the longest symbol possible
diff --git a/lib/tests/longest_symbol_kunit.c b/lib/tests/longest_symbol_kunit.c
index e3c28ff1807f..b183fb92d1b2 100644
--- a/lib/tests/longest_symbol_kunit.c
+++ b/lib/tests/longest_symbol_kunit.c
@@ -3,8 +3,7 @@
* Test the longest symbol length. Execute with:
* ./tools/testing/kunit/kunit.py run longest-symbol
* --arch=x86_64 --kconfig_add CONFIG_KPROBES=y --kconfig_add CONFIG_MODULES=y
- * --kconfig_add CONFIG_RETPOLINE=n --kconfig_add CONFIG_CFI_CLANG=n
- * --kconfig_add CONFIG_MITIGATION_RETPOLINE=n
+ * --kconfig_add CONFIG_CPU_MITIGATIONS=n
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
base-commit: f1a3944c860b0615d0513110d8cf62bb94adbb41
--
2.39.2
From: Neill Kapron <nkapron(a)google.com>
The inconsistencies in the systcall ABI between arm and arm-compat can
can cause a failure in the syscall_restart test due to the logic
attempting to work around the differences. The 'machine' field for an
ARM64 device running in compat mode can report 'armv8l' or 'armv8b'
which matches with the string 'arm' when only examining the first three
characters of the string.
This change adds additional validation to the workaround logic to make
sure we only take the arm path when running natively, not in arm-compat.
Fixes: 256d0afb11d6 ("selftests/seccomp: build and pass on arm64")
Signed-off-by: Neill Kapron <nkapron(a)google.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index b2f76a52215a..53bf6a9c801f 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -3166,12 +3166,15 @@ TEST(syscall_restart)
ret = get_syscall(_metadata, child_pid);
#if defined(__arm__)
/*
- * FIXME:
* - native ARM registers do NOT expose true syscall.
* - compat ARM registers on ARM64 DO expose true syscall.
+ * - values of utsbuf.machine include 'armv8l' or 'armb8b'
+ * for ARM64 running in compat mode.
*/
ASSERT_EQ(0, uname(&utsbuf));
- if (strncmp(utsbuf.machine, "arm", 3) == 0) {
+ if ((strncmp(utsbuf.machine, "arm", 3) == 0) &&
+ (strncmp(utsbuf.machine, "armv8l", 6) != 0) &&
+ (strncmp(utsbuf.machine, "armv8b", 6) != 0)) {
EXPECT_EQ(__NR_nanosleep, ret);
} else
#endif
--
2.49.0.850.g28803427d3-goog
This patchset adds KVM selftests for LoongArch system, currently only
some common test cases are supported and pass to run. These test cases
are listed as following:
coalesced_io_test
demand_paging_test
dirty_log_perf_test
dirty_log_test
guest_print_test
hardware_disable_test
kvm_binary_stats_test
kvm_create_max_vcpus
kvm_page_table_test
memslot_modification_stress_test
memslot_perf_test
set_memory_region_test
---
Changes in v10:
1. Add PS_64K and remove PS_8K in file include/loongarch/processor.h
2. Fix a typo issue in file lib/loongarch/processor.c
3. Update file MAINTAINERS about LoongArch KVM selftests
Changes in v9:
1. Add vm mode VM_MODE_P47V47_16K, LoongArch VM uses this mode by
default, rather than VM_MODE_P36V47_16K.
2. Refresh some spelling issues in changelog.
Changes in v8:
1. Porting patch based on the latest version.
2. For macro PC_OFFSET_EXREGS, offsetof() method is used for C header file,
still hardcoded definition for assemble language.
Changes in v7:
1. Refine code to add LoongArch support in test case
set_memory_region_test.
Changes in v6:
1. Refresh the patch based on latest kernel 6.8-rc1, add LoongArch
support about testcase set_memory_region_test.
2. Add hardware_disable_test test case.
3. Drop modification about macro DEFAULT_GUEST_TEST_MEM, it is problem
of LoongArch binutils, this issue is raised to LoongArch binutils owners.
Changes in v5:
1. In LoongArch kvm self tests, the DEFAULT_GUEST_TEST_MEM could be
0x130000000, it is different from the default value in memstress.h.
So we Move the definition of DEFAULT_GUEST_TEST_MEM into LoongArch
ucall.h, and add 'ifndef' condition for DEFAULT_GUEST_TEST_MEM
in memstress.h.
Changes in v4:
1. Remove the based-on flag, as the LoongArch KVM patch series
have been accepted by Linux kernel, so this can be applied directly
in kernel.
Changes in v3:
1. Improve implementation of LoongArch VM page walk.
2. Add exception handler for LoongArch.
3. Add dirty_log_test, dirty_log_perf_test, guest_print_test
test cases for LoongArch.
4. Add __ASSEMBLER__ macro to distinguish asm file and c file.
5. Move ucall_arch_do_ucall to the header file and make it as
static inline to avoid function calls.
6. Change the DEFAULT_GUEST_TEST_MEM base addr for LoongArch.
Changes in v2:
1. We should use ".balign 4096" to align the assemble code with 4K in
exception.S instead of "align 12".
2. LoongArch only supports 3 or 4 levels page tables, so we remove the
hanlders for 2-levels page table.
3. Remove the DEFAULT_LOONGARCH_GUEST_STACK_VADDR_MIN and use the common
DEFAULT_GUEST_STACK_VADDR_MIN to allocate stack memory in guest.
4. Reorganize the test cases supported by LoongArch.
5. Fix some code comments.
6. Add kvm_binary_stats_test test case into LoongArch KVM selftests.
---
Bibo Mao (5):
KVM: selftests: Add VM_MODE_P47V47_16K vm mode
KVM: selftests: Add KVM selftests header files for LoongArch
KVM: selftests: Add core KVM selftests support for LoongArch
KVM: selftests: Add ucall test support for LoongArch
KVM: selftests: Add test cases for LoongArch
MAINTAINERS | 2 +
tools/testing/selftests/kvm/Makefile | 2 +-
tools/testing/selftests/kvm/Makefile.kvm | 18 +
.../testing/selftests/kvm/include/kvm_util.h | 6 +
.../kvm/include/loongarch/kvm_util_arch.h | 7 +
.../kvm/include/loongarch/processor.h | 141 ++++++++
.../selftests/kvm/include/loongarch/ucall.h | 20 +
tools/testing/selftests/kvm/lib/kvm_util.c | 3 +
.../selftests/kvm/lib/loongarch/exception.S | 59 +++
.../selftests/kvm/lib/loongarch/processor.c | 342 ++++++++++++++++++
.../selftests/kvm/lib/loongarch/ucall.c | 38 ++
.../selftests/kvm/set_memory_region_test.c | 2 +-
12 files changed, 638 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
create mode 100644 tools/testing/selftests/kvm/include/loongarch/processor.h
create mode 100644 tools/testing/selftests/kvm/include/loongarch/ucall.h
create mode 100644 tools/testing/selftests/kvm/lib/loongarch/exception.S
create mode 100644 tools/testing/selftests/kvm/lib/loongarch/processor.c
create mode 100644 tools/testing/selftests/kvm/lib/loongarch/ucall.c
base-commit: 9d7a0577c9db35c4cc52db90bc415ea248446472
--
2.39.3
v12: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c…
====
No changes in v12, just restored the selftests patch I accidentally dropped in
v11
v11: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c…
====
Addressed a couple of nits and collected Acked-by from Harshitha
(thanks!)
v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.…
====
Addressed comments following conversations with Pavel, Stan, and
Harshitha. Thank you guys for the reviews again. Overall minor changes:
Changelog:
- Check for !niov->pp in io_zcrx_recv_frag, just in case we end up with
a TX niov in that path (Pavel).
- Fix locking case in !netif_device_present (Jakub/Stan).
v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c…
===
Changelog:
- Use priv->bindings list instead of sock_bindings_list. This was missed
during the rebase as the bindings have been updated to use
priv->bindings recently (thanks Stan!)
v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.…
===
Only address minor comments on V7
Changelog:
- Use netdev locking instead of rtnl_locking to match rx path.
- Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of
NET_IOV_UNSPECIFIED.
- Post send binding to net_devmem_dmabuf_bindings after it's been fully
initialized (Stan).
v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.…
===
Changelog:
- Check the dmabuf net_iov binding belongs to the device the TX is going
out on. (Jakub)
- Provide detailed inspection of callsites of
__skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub)
v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===
v6 has no major changes. Addressed a few issues from Paolo and David,
and collected Acks from Stan. Thank you everyone for the review!
Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is
invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label).
(Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the
same one which implements the TX path so we don't run into future
errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works
(David).
v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===
v5 has no major changes; it clears up the relatively minor issues
pointed out to in v4, and rebases the series on top of net-next to
resolve the conflict with a patch that raced to the tree. It also
collects the review tags from v4.
Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).
v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===
v4 mainly addresses the critical driver support issue surfaced in v3 by
Paolo and Stan. Drivers aiming to support netmem_tx should make sure not
to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs
may come from dma-bufs.
Additionally other feedback from v3 is addressed.
Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for
netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the
tx binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).
v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===
Address minor comments from RFCv2 and fix a few build warnings and
ynl-regen issues. No major changes.
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======
RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.
Major changes:
--------------
- much improved UAPI as suggested by Stan. We now interpret the iov_base
of the passed in iov from userspace as the offset into the dmabuf to
send from. This removes the need to set iov.iov_base = NULL which may
be confusing to users, and enables us to send multiple iovs in the
same sendmsg() call. ncdevmem and the docs show a sample use of that.
- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
this is good improvment as it was confusing to keep track of
2 iterators for the same sendmsg, and mistracking both iterators
caused a couple of bugs reported in the last iteration that are now
resolved with this streamlining.
- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested,
and sending multiple iovs in the same sendmsg() is tested.
- Fixed issue where dmabuf unmapping was happening in invalid context
(Stan).
====================================================================
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Full outline on usage of the TX path is detailed in the documentation
included with this series.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
1. Add netmem refcounting needed for the TX path.
2. Devmem TX netlink API.
3. Devmem TX net stack implementation.
4. Make dma-buf unbinding scheduled work to handle TX cases where it gets
freed from contexts where we can't sleep.
5. Add devmem TX documentation.
6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver
feature flag, and docs to enable drivers to declare netmem_tx support.
7. Guard netmem_tx against being enabled against drivers that don't
support it.
8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
devmem.py.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Mina Almasry (8):
netmem: add niov->type attribute to distinguish different net_iov
types
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
net: add devmem TCP TX documentation
net: enable driver support for netmem TX
gve: add netmem TX support to GVE DQO-RDA mode
net: check for driver support in netmem TX
selftests: ncdevmem: Implement devmem TCP TX
Stanislav Fomichev (1):
net: devmem: TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 150 ++++++++-
.../networking/net_cachelines/net_device.rst | 1 +
Documentation/networking/netdev-features.rst | 5 +
Documentation/networking/netmem.rst | 23 +-
drivers/net/ethernet/google/gve/gve_main.c | 3 +
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +-
include/linux/netdevice.h | 2 +
include/linux/skbuff.h | 17 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 34 +-
include/net/sock.h | 1 +
include/uapi/linux/netdev.h | 1 +
io_uring/zcrx.c | 3 +-
net/core/datagram.c | 48 ++-
net/core/dev.c | 34 +-
net/core/devmem.c | 133 ++++++--
net/core/devmem.h | 83 ++++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 80 ++++-
net/core/skbuff.c | 48 ++-
net/core/sock.c | 6 +
net/ipv4/ip_output.c | 3 +-
net/ipv4/tcp.c | 50 ++-
net/ipv6/ip6_output.c | 3 +-
net/vmw_vsock/virtio_transport_common.c | 5 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/devmem.py | 26 +-
.../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++-
30 files changed, 1010 insertions(+), 88 deletions(-)
base-commit: 4acf6d4f6afc3478753e49c495132619667549d9
--
2.49.0.850.g28803427d3-goog
kernel test robot <lkp(a)intel.com> writes:
> All errors (new ones prefixed by >>):
>
> arch/powerpc/net/bpf_jit_comp64.c: In function 'bpf_jit_build_body':
>>> arch/powerpc/net/bpf_jit_comp64.c:814:4: error: a label can only be part of a statement and a declaration is not a statement
> 814 | bool sync_emitted = false;
> | ^~~~
>>> arch/powerpc/net/bpf_jit_comp64.c:815:4: error: expected expression before 'bool'
> 815 | bool ori31_emitted = false;
> | ^~~~
>>> arch/powerpc/net/bpf_jit_comp64.c:833:6: error: 'ori31_emitted' undeclared (first use in this function)
> 833 | ori31_emitted = true;
> | ^~~~~~~~~~~~~
> arch/powerpc/net/bpf_jit_comp64.c:833:6: note: each undeclared identifier is reported only once for each function it appears in
Fixed this for v3. For the other archs, the patches also don't add
declarations in a switch/case.
I also checked that there are no new W=2 warnings for the touched C
files on x86 with the vmtest bpf config.
I have not checked that all files that include a touched header don't
have new warnings. When doing -j $(nproc) the diff does not work and
with -j 1 it takes forever (e.g., because bpf.h is touched). If you
think this is required just let me know (and if you have a tip on how to
do it more quickly that would be great too).
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad-hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log. However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.
Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum. Architecture specific changes are kept at minimum by
retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
CONFIG_KUNIT are enabled.
The first patch of the series introduces the necessary infrastructure.
The second patch introduces support for counting suppressed backtraces.
This capability is used in patch three to implement unit tests.
Patch four documents the new API.
The next two patches add support for suppressing backtraces in drm_rect
and dev_addr_lists unit tests. These patches are intended to serve as
examples for the use of the functionality introduced with this series.
The remaining patches implement the necessary changes for all
architectures with GENERIC_BUG support.
With CONFIG_KUNIT enabled, image size increase with this series applied is
approximately 1%. The image size increase (and with it the functionality
introduced by this series) can be avoided by disabling
CONFIG_KUNIT_SUPPRESS_BACKTRACE.
This series is based on the RFC patch and subsequent discussion at
https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b…
and offers a more comprehensive solution of the problem discussed there.
Design note:
Function pointers are only added to the __bug_table section if both
CONFIG_KUNIT_SUPPRESS_BACKTRACE and CONFIG_DEBUG_BUGVERBOSE are enabled
to avoid image size increases if CONFIG_KUNIT is disabled. There would be
some benefits to adding those pointers all the time (reduced complexity,
ability to display function names in BUG/WARNING messages). That change,
if desired, can be made later.
Checkpatch note:
Remaining checkpatch errors and warnings were deliberately ignored.
Some are triggered by matching coding style or by comments interpreted
as code, others by assembler macros which are disliked by checkpatch.
Suggestions for improvements are welcome.
Changes since RFC:
- Introduced CONFIG_KUNIT_SUPPRESS_BACKTRACE
- Minor cleanups and bug fixes
- Added support for all affected architectures
- Added support for counting suppressed warnings
- Added unit tests using those counters
- Added patch to suppress warning backtraces in dev_addr_lists tests
Changes since v1:
- Rebased to v6.9-rc1
- Added Tested-by:, Acked-by:, and Reviewed-by: tags
[I retained those tags since there have been no functional changes]
- Introduced KUNIT_SUPPRESS_BACKTRACE configuration option, enabled by
default.
Changes since v2:
- Rebased to v6.9-rc2
- Added comments to drm warning suppression explaining why it is needed.
- Added patch to move conditional code in arch/sh/include/asm/bug.h
to avoid kerneldoc warning
- Added architecture maintainers to Cc: for architecture specific patches
- No functional changes
Changes since v3:
- Rebased to v6.14-rc6
- Dropped net: "kunit: Suppress lock warning noise at end of dev_addr_lists tests"
since 3db3b62955cd6d73afde05a17d7e8e106695c3b9
- Added __kunit_ and KUNIT_ prefixes.
- Tested on interessed architectures.
----
Guenter Roeck (14):
bug/kunit: Core support for suppressing warning backtraces
kunit: bug: Count suppressed warning backtraces
kunit: Add test cases for backtrace warning suppression
kunit: Add documentation for warning backtrace suppression API
drm: Suppress intentional warning backtraces in scaling unit tests
x86: Add support for suppressing warning backtraces
arm64: Add support for suppressing warning backtraces
loongarch: Add support for suppressing warning backtraces
parisc: Add support for suppressing warning backtraces
s390: Add support for suppressing warning backtraces
sh: Add support for suppressing warning backtraces
sh: Move defines needed for suppressing warning backtraces
riscv: Add support for suppressing warning backtraces
powerpc: Add support for suppressing warning backtraces
Documentation/dev-tools/kunit/usage.rst | 30 ++++++-
arch/arm64/include/asm/asm-bug.h | 27 ++++--
arch/arm64/include/asm/bug.h | 8 +-
arch/loongarch/include/asm/bug.h | 42 +++++++---
arch/parisc/include/asm/bug.h | 29 +++++--
arch/powerpc/include/asm/bug.h | 37 +++++++--
arch/riscv/include/asm/bug.h | 38 ++++++---
arch/s390/include/asm/bug.h | 17 +++-
arch/sh/include/asm/bug.h | 28 ++++++-
arch/x86/include/asm/bug.h | 21 +++--
drivers/gpu/drm/tests/drm_rect_test.c | 16 ++++
include/asm-generic/bug.h | 16 +++-
include/kunit/bug.h | 56 +++++++++++++
include/kunit/test.h | 1 +
include/linux/bug.h | 13 +++
lib/bug.c | 51 +++++++++++-
lib/kunit/Kconfig | 9 ++
lib/kunit/Makefile | 7 +-
lib/kunit/backtrace-suppression-test.c | 104 ++++++++++++++++++++++++
lib/kunit/bug.c | 42 ++++++++++
20 files changed, 519 insertions(+), 73 deletions(-)
create mode 100644 include/kunit/bug.h
create mode 100644 lib/kunit/backtrace-suppression-test.c
create mode 100644 lib/kunit/bug.c
--
2.34.1
The following set of commands:
ip link add br0 type bridge vlan_filtering 1 # vlan_default_pvid 1 is implicit
ip link set swp0 master br0
bridge vlan add dev swp0 vid 1
should result in the dropping of untagged and 802.1p-tagged traffic, but
we see that it continues to be accepted. Whereas, had we deleted VID 1
instead, the aforementioned dropping would have worked
This is because the ANA_PORT_DROP_CFG update logic doesn't run, because
ocelot_vlan_add() only calls ocelot_port_set_pvid() if the new VLAN has
the BRIDGE_VLAN_INFO_PVID flag.
Similar to other drivers like mt7530_port_vlan_add() which handle this
case correctly, we need to test whether the VLAN we're changing used to
have the BRIDGE_VLAN_INFO_PVID flag, but lost it now. That amounts to a
PVID deletion and should be treated as such.
Regarding blame attribution: this never worked properly since the
introduction of bridge VLAN filtering in commit 7142529f1688 ("net:
mscc: ocelot: add VLAN filtering"). However, there was a significant
paradigm shift which aligned the ANA_PORT_DROP_CFG register with the
PVID concept rather than with the native VLAN concept, and that change
wasn't targeted for 'stable'. Realistically, that is as far as this fix
needs to be propagated to.
Fixes: be0576fed6d3 ("net: mscc: ocelot: move the logic to drop 802.1p traffic to the pvid deletion")
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
drivers/net/ethernet/mscc/ocelot.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index ef93df520887..08bee56aea35 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -830,6 +830,7 @@ EXPORT_SYMBOL(ocelot_vlan_prepare);
int ocelot_vlan_add(struct ocelot *ocelot, int port, u16 vid, bool pvid,
bool untagged)
{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
int err;
/* Ignore VID 0 added to our RX filter by the 8021q module, since
@@ -849,6 +850,11 @@ int ocelot_vlan_add(struct ocelot *ocelot, int port, u16 vid, bool pvid,
ocelot_bridge_vlan_find(ocelot, vid));
if (err)
return err;
+ } else if (ocelot_port->pvid_vlan &&
+ ocelot_bridge_vlan_find(ocelot, vid) == ocelot_port->pvid_vlan) {
+ err = ocelot_port_set_pvid(ocelot, port, NULL);
+ if (err)
+ return err;
}
/* Untagged egress vlan clasification */
--
2.43.0
This patch series introduces the Hornet LSM. The goal of Hornet is to
provide a signature verification mechanism for eBPF programs.
eBPF has similar requirements to that of modules when it comes to
loading: find symbol addresses, fix up ELF relocations, some struct
field offset handling stuff called CO-RE (compile-once run-anywhere),
and some other miscellaneous bookkeeping. During eBPF program
compilation, pseudo-values get written to the immediate operands of
instructions. During loading, those pseudo-values get rewritten with
concrete addresses or data applicable to the currently running system,
e.g., a kallsyms address or an fd for a map. This needs to happen
before the instructions for a bpf program are loaded into the kernel
via the bpf() syscall. Unlike modules, an in-kernel loader
unfortunately doesn't exist. Typically, the instruction rewriting is
done dynamically in userspace via libbpf. Since the relocations and
instruction modifications are happening in userspace, and their values
may change depending upon the running system, this breaks known
signature verification mechanisms.
Light skeleton programs were introduced in order to support early
loading of eBPF programs along with user-mode drivers. They utilize a
separate eBPF program that can load a target eBPF program and perform
all necessary relocations in-kernel without needing a working
userspace. Light skeletons were mentioned as a possible path forward
for signature verification.
Hornet takes a simple approach to light-skeleton-based eBPF signature
verification. A PKCS#7 signature of a data buffer containing the raw
instructions of an eBPF program, followed by the initial values of any
maps used by the program is used. A utility script is provided to
parse and extract the contents of autogenerated header files created
via bpftool. That payload can then be signed and appended to the light
skeleton executable.
Maps are frozen to prevent TOCTOU bugs where a sufficiently privileged
user could rewrite map data between the calls to BPF_PROG_LOAD and
BPF_PROG_RUN. Additionally, both sparse-array-based and
fd_array_cnt-based map fd arrays are supported for signature
verification.
References:
[1] https://lore.kernel.org/bpf/20220209054315.73833-1-alexei.starovoitov@gmail…
[2] https://lore.kernel.org/bpf/CAADnVQ+wPK1KKZhCgb-Nnf0Xfjk8M1UpX5fnXC=cBzdEYb…
Change list:
- v1 -> v2
- Jargon clarification, maintainer entry and a few cosmetic fixes
Revisions:
- v1
https://lore.kernel.org/bpf/20250321164537.16719-1-bboscaccy@linux.microsof…
Blaise Boscaccy (4):
security: Hornet LSM
hornet: Introduce sign-ebpf
hornet: Add a light-skeleton data extactor script
selftests/hornet: Add a selftest for the Hornet LSM
Documentation/admin-guide/LSM/Hornet.rst | 53 +++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 9 +
crypto/asymmetric_keys/pkcs7_verify.c | 10 +
include/linux/kernel_read_file.h | 1 +
include/linux/verification.h | 1 +
include/uapi/linux/lsm.h | 1 +
scripts/Makefile | 1 +
scripts/hornet/Makefile | 5 +
scripts/hornet/extract-skel.sh | 29 ++
scripts/hornet/sign-ebpf.c | 411 +++++++++++++++++++
security/Kconfig | 3 +-
security/Makefile | 1 +
security/hornet/Kconfig | 11 +
security/hornet/Makefile | 4 +
security/hornet/hornet_lsm.c | 239 +++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/hornet/Makefile | 51 +++
tools/testing/selftests/hornet/loader.c | 21 +
tools/testing/selftests/hornet/trivial.bpf.c | 33 ++
20 files changed, 885 insertions(+), 1 deletion(-)
create mode 100644 Documentation/admin-guide/LSM/Hornet.rst
create mode 100644 scripts/hornet/Makefile
create mode 100755 scripts/hornet/extract-skel.sh
create mode 100644 scripts/hornet/sign-ebpf.c
create mode 100644 security/hornet/Kconfig
create mode 100644 security/hornet/Makefile
create mode 100644 security/hornet/hornet_lsm.c
create mode 100644 tools/testing/selftests/hornet/Makefile
create mode 100644 tools/testing/selftests/hornet/loader.c
create mode 100644 tools/testing/selftests/hornet/trivial.bpf.c
--
2.48.1
Greetings:
Welcome to v4.
This series fixes netdevsim to correctly set the NAPI ID on the skb.
This is helpful for writing tests around features that use
SO_INCOMING_NAPI_ID.
In addition to the netdevsim fix in patch 1, patches 2 & 3 do some self
test refactoring and add a test for NAPI IDs. The test itself (patch 3)
introduces a C helper because apparently python doesn't have
socket.SO_INCOMING_NAPI_ID.
Thanks,
Joe
v4:
- Updated the macro guard in patch 2
- Removed the remote deploy from patch 3
v3: https://lore.kernel.org/netdev/20250418013719.12094-1-jdamato@fastly.com/
- Dropped patch 3 from v2 as it is no longer necessary.
- Patch 3 from this series (which was patch 4 in the v2)
- Sorted .gitignore alphabetically
- added cfg.remote_deploy so the test supports real remote machines
- Dropped the NetNSEnter as it is unnecessary
- Fixed a string interpolation issue that Paolo hit with his Python
version
v2: https://lore.kernel.org/netdev/20250417013301.39228-1-jdamato@fastly.com/
- No longer an RFC
- Minor whitespace change in patch 1 (no functional change).
- Patches 2-4 new in v2
rfcv1: https://lore.kernel.org/netdev/20250329000030.39543-1-jdamato@fastly.com/
Joe Damato (3):
netdevsim: Mark NAPI ID on skb in nsim_rcv
selftests: drv-net: Factor out ksft C helpers
selftests: drv-net: Test that NAPI ID is non-zero
drivers/net/netdevsim/netdev.c | 2 +
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 6 +-
tools/testing/selftests/drivers/net/ksft.h | 56 +++++++++++++
.../testing/selftests/drivers/net/napi_id.py | 23 +++++
.../selftests/drivers/net/napi_id_helper.c | 83 +++++++++++++++++++
.../selftests/drivers/net/xdp_helper.c | 49 +----------
7 files changed, 172 insertions(+), 48 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/ksft.h
create mode 100755 tools/testing/selftests/drivers/net/napi_id.py
create mode 100644 tools/testing/selftests/drivers/net/napi_id_helper.c
base-commit: cd7276ecac9c64c80433fbcff2e35aceaea6f477
--
2.43.0
Basics and overview
===================
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain control (by making their payload executable). Attackers are
able to perform such attacks by leveraging call-sites which rely on indirect
calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect
calls must land on a landing pad instruction `lpad` else cpu will raise software
check exception (a new cpu exception cause code on riscv).
Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel
CET [3] and branch target identification (BTI) [4] on arm.
Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control
stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are
being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel
doesn't enable control flow integrity cpu extensions on binary by default.
Instead exposes a prctl interface to enable, disable and lock the shadow stack
or landing pad feature for a task. This allows userspace (loader) to enumerate
if all objects in its address space are compiled with shadow stack and landing
pad support and accordingly enable the feature. Additionally if a subsequent
`dlopen` happens on a library, user mode can take a decision again to disable
the feature (if incoming library is not compiled with support) OR terminate the
task (if user mode policy is strict to have all objects in address space to be
compiled with control flow integirty cpu feature). prctl to enable shadow stack
results in allocating shadow stack from virtual memory and activating for user
address space. x86 and arm64 are also following same direction due to similar
reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is
part of virtual memory and is a writeable memory from kernel perspective
(writeable via a restricted set of instructions aka shadow stack instructions)
Thus kernel changes ensure that this memory is converted into read-only when
fork/clone happens and COWed when fault is taken due to sspush, sspopchk or
ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled,
kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly
map shadow stack memory in its address space. It is useful to allocate shadow
for different contexts managed by a single thread (green threads or contexts)
risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control
flow diversion to deliver the signal and eventually expects userspace to issue
sigreturn so that original execution can be resumed. Even though resume context
is prepared by kernel, it is in user space memory and is subject to memory
corruption and corruption bugs can be utilized by attacker in this race window
to perform arbitrary sigreturn and eventually bypass cfi mechanism.
Another issue is how to ensure that cfi related state on sigcontext area is not
trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place
it on shadow stack before signal delivery and places address of token in
sigcontext structure. During sigreturn, kernel obtains address of token from
sigcontext struture, reads token from shadow stack and validates it and only
then allow sigreturn to succeed. Compatiblity issue is solved by adopting
dynamic sigcontext management introduced for vector extension. This series
re-factor the code little bit to allow future sigcontext management easy (as
proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this
config option picks the kernel support for user control flow integrity. This
optin is presented only if toolchain has shadow stack and landing pad support.
And is on purpose guarded by toolchain support. Reason being that eventually
vDSO also needs to be compiled in with shadow stack and landing pad support.
vDSO compile patches are not included as of now because landing pad labeling
scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to
zicfilp and zicfiss, patch series adds documentation for
`zicfilp` and `zicfiss` in following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst
How to test this series
=======================
Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
Qemu
----
Get the lastest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
In case you're building your own rootfs using toolchain, please make sure you
pick following patch to ensure that vDSO compiled with lpad and shadow stack.
"arch/riscv: compile vdso with landing pad"
Branch where above patch can be picked
https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1
Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
vDSO related Opens (in the flux)
=================================
I am listing these opens for laying out plan and what to expect in future
patch sets. And of course for the sake of discussion.
Shadow stack and landing pad enabling in vDSO
----------------------------------------------
vDSO must have shadow stack and landing pad support compiled in for task
to have shadow stack and landing pad support. This patch series doesn't
enable that (yet). Enabling shadow stack support in vDSO should be
straight forward (intend to do that in next versions of patch set). Enabling
landing pad support in vDSO requires some collaboration with toolchain folks
to follow a single label scheme for all object binaries. This is necessary to
ensure that all indirect call-sites are setting correct label and target landing
pads are decorated with same label scheme.
How many vDSOs
---------------
Shadow stack instructions are carved out of zimop (may be operations) and if CPU
doesn't implement zimop, they're illegal instructions. Kernel could be running on
a CPU which may or may not implement zimop. And thus kernel will have to carry 2
different vDSOs and expose the appropriate one depending on whether CPU implements
zimop or not.
References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/
---
changelog
---------
v12:
- It seems like I had accidently squashed arch agnostic indirect branch
tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU
support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally
selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to
to `lpad 0`.
v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in
arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch
to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of
zero labeled landing pad is least common denominator and should work with all
objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up.
see here for more context
https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should
be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context
management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT
(https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
(Note: I had an issue in my workflow due to which version number wasn't
picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is
in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
---
---
Changes in v12:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri…
Changes in v11:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri…
---
Andy Chiu (1):
riscv: signal: abstract header saving for setup_sigcontext
Clément Léger (1):
riscv: Add Firmware Feature SBI extensions definitions
Deepak Gupta (25):
mm: VM_SHADOW_STACK definition for riscv
dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
riscv: zicfiss / zicfilp enumeration
riscv: zicfiss / zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv mm: manufacture shadow stack pte
riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
riscv mmu: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
riscv: Implements arch agnostic shadow stack prctls
prctl: arch-agnostic prctl for indirect branch tracking
riscv: Implements arch agnostic indirect branch tracking prctls
riscv/traps: Introduce software check exception
riscv/signal: save and restore of shadow stack for signal
riscv/kernel: update __show_regs to print shadow stack register
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
riscv: enable kernel access to shadow stack memory via FWFT sbi call
riscv: kernel command line option to opt out of user cfi
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Jim Shu (1):
arch/riscv: compile vdso with landing pad
Documentation/arch/riscv/index.rst | 2 +
Documentation/arch/riscv/zicfilp.rst | 115 +++++
Documentation/arch/riscv/zicfiss.rst | 176 +++++++
.../devicetree/bindings/riscv/extensions.yaml | 14 +
arch/riscv/Kconfig | 20 +
arch/riscv/Makefile | 5 +-
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/assembler.h | 44 ++
arch/riscv/include/asm/cpufeature.h | 13 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/entry-common.h | 2 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 25 +
arch/riscv/include/asm/mmu_context.h | 7 +
arch/riscv/include/asm/pgtable.h | 30 +-
arch/riscv/include/asm/processor.h | 2 +
arch/riscv/include/asm/sbi.h | 26 +
arch/riscv/include/asm/thread_info.h | 3 +
arch/riscv/include/asm/usercfi.h | 89 ++++
arch/riscv/include/asm/vector.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/include/uapi/asm/ptrace.h | 22 +
arch/riscv/include/uapi/asm/sigcontext.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 8 +
arch/riscv/kernel/cpufeature.c | 13 +
arch/riscv/kernel/entry.S | 31 +-
arch/riscv/kernel/head.S | 12 +
arch/riscv/kernel/process.c | 26 +-
arch/riscv/kernel/ptrace.c | 83 ++++
arch/riscv/kernel/signal.c | 142 +++++-
arch/riscv/kernel/sys_hwprobe.c | 2 +
arch/riscv/kernel/sys_riscv.c | 10 +
arch/riscv/kernel/traps.c | 43 ++
arch/riscv/kernel/usercfi.c | 530 +++++++++++++++++++++
arch/riscv/kernel/vdso/Makefile | 12 +
arch/riscv/kernel/vdso/flush_icache.S | 4 +
arch/riscv/kernel/vdso/getcpu.S | 4 +
arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 17 +
include/linux/cpu.h | 4 +
include/linux/mm.h | 7 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 27 ++
kernel/sys.c | 30 ++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 3 +
tools/testing/selftests/riscv/cfi/Makefile | 10 +
tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++
tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 78 +++
tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++
tools/testing/selftests/riscv/cfi/shadowstack.h | 27 ++
54 files changed, 2195 insertions(+), 29 deletions(-)
---
base-commit: 39a803b754d5224a3522016b564113ee1e4091b2
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug
Drivers that are told to allocate RX buffers from pools of DMA memory
should have enough memory in the pool to satisfy projected allocation
requests (a function of ring size, MTU & other parameters). If there's
not enough memory, RX ring refill might fail later at inconvenient times
(e.g. during NAPI poll).
This commit adds a check at dmabuf pool init time that compares the
amount of memory in the underlying chunk pool (configured by the user
space application providing dmabuf memory) with the desired pool size
(previously set by the driver) and fails with an error message if chunk
memory isn't enough.
Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
Signed-off-by: Cosmin Ratiu <cratiu(a)nvidia.com>
---
net/core/devmem.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 6e27a47d0493..651cd55ebb28 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -299,6 +299,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
int mp_dmabuf_devmem_init(struct page_pool *pool)
{
struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
+ size_t size;
if (!binding)
return -EINVAL;
@@ -312,6 +313,16 @@ int mp_dmabuf_devmem_init(struct page_pool *pool)
if (pool->p.order != 0)
return -E2BIG;
+ /* Validate that the underlying dmabuf has enough memory to satisfy
+ * requested pool size.
+ */
+ size = gen_pool_size(binding->chunk_pool) >> PAGE_SHIFT;
+ if (size < pool->p.pool_size) {
+ pr_warn("%s: Insufficient dmabuf memory (%zu pages) to satisfy pool_size (%u pages)\n",
+ __func__, size, pool->p.pool_size);
+ return -ENOMEM;
+ }
+
net_devmem_dmabuf_binding_get(binding);
return 0;
}
--
2.45.0
The Automated Testing Summit (ATS) 2025 will be held as a co-located event at the Open Source Summit North America, and we’re now accepting talk proposals!
https://events.linuxfoundation.org/open-source-summit-north-america/feature…
📅 Date: June 26, 2025
📍 Location: Denver, CO, USA
Hosted by KernelCI, ATS is a technical summit focused on the challenges of testing and quality assurance in the Linux ecosystem — especially in upstream kernel development, embedded systems, cloud environments, and CI integration.
This is a great opportunity to share your work on:
* Kernel and userspace test frameworks
* Lab infrastructure and automation
* CI/CD pipelines for Linux
* Fuzzing, performance testing, and debugging tools
* Sharing and standardizing test results across systems
Whether you’re working on kernel testing, running tests on hardware labs, developing QA tools, or building infrastructure that scales across projects, ATS is the place to collaborate and move the ecosystem forward.
Submit your talk by May 18, 2025:
👉 Call for Proposals (CFP): https://sessionize.com/atsna2025
We hope to see you in Denver!
— The KernelCI Team
Basics and overview
===================
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain control (by making their payload executable). Attackers are
able to perform such attacks by leveraging call-sites which rely on indirect
calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect
calls must land on a landing pad instruction `lpad` else cpu will raise software
check exception (a new cpu exception cause code on riscv).
Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel
CET [3] and branch target identification (BTI) [4] on arm.
Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control
stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are
being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel
doesn't enable control flow integrity cpu extensions on binary by default.
Instead exposes a prctl interface to enable, disable and lock the shadow stack
or landing pad feature for a task. This allows userspace (loader) to enumerate
if all objects in its address space are compiled with shadow stack and landing
pad support and accordingly enable the feature. Additionally if a subsequent
`dlopen` happens on a library, user mode can take a decision again to disable
the feature (if incoming library is not compiled with support) OR terminate the
task (if user mode policy is strict to have all objects in address space to be
compiled with control flow integirty cpu feature). prctl to enable shadow stack
results in allocating shadow stack from virtual memory and activating for user
address space. x86 and arm64 are also following same direction due to similar
reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is
part of virtual memory and is a writeable memory from kernel perspective
(writeable via a restricted set of instructions aka shadow stack instructions)
Thus kernel changes ensure that this memory is converted into read-only when
fork/clone happens and COWed when fault is taken due to sspush, sspopchk or
ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled,
kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly
map shadow stack memory in its address space. It is useful to allocate shadow
for different contexts managed by a single thread (green threads or contexts)
risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control
flow diversion to deliver the signal and eventually expects userspace to issue
sigreturn so that original execution can be resumed. Even though resume context
is prepared by kernel, it is in user space memory and is subject to memory
corruption and corruption bugs can be utilized by attacker in this race window
to perform arbitrary sigreturn and eventually bypass cfi mechanism.
Another issue is how to ensure that cfi related state on sigcontext area is not
trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place
it on shadow stack before signal delivery and places address of token in
sigcontext structure. During sigreturn, kernel obtains address of token from
sigcontext struture, reads token from shadow stack and validates it and only
then allow sigreturn to succeed. Compatiblity issue is solved by adopting
dynamic sigcontext management introduced for vector extension. This series
re-factor the code little bit to allow future sigcontext management easy (as
proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this
config option picks the kernel support for user control flow integrity. This
optin is presented only if toolchain has shadow stack and landing pad support.
And is on purpose guarded by toolchain support. Reason being that eventually
vDSO also needs to be compiled in with shadow stack and landing pad support.
vDSO compile patches are not included as of now because landing pad labeling
scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to
zicfilp and zicfiss, patch series adds documentation for
`zicfilp` and `zicfiss` in following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst
How to test this series
=======================
Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
Qemu
----
Get the lastest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
In case you're building your own rootfs using toolchain, please make sure you
pick following patch to ensure that vDSO compiled with lpad and shadow stack.
"arch/riscv: compile vdso with landing pad"
Branch where above patch can be picked
https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1
Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
vDSO related Opens (in the flux)
=================================
I am listing these opens for laying out plan and what to expect in future
patch sets. And of course for the sake of discussion.
Shadow stack and landing pad enabling in vDSO
----------------------------------------------
vDSO must have shadow stack and landing pad support compiled in for task
to have shadow stack and landing pad support. This patch series doesn't
enable that (yet). Enabling shadow stack support in vDSO should be
straight forward (intend to do that in next versions of patch set). Enabling
landing pad support in vDSO requires some collaboration with toolchain folks
to follow a single label scheme for all object binaries. This is necessary to
ensure that all indirect call-sites are setting correct label and target landing
pads are decorated with same label scheme.
How many vDSOs
---------------
Shadow stack instructions are carved out of zimop (may be operations) and if CPU
doesn't implement zimop, they're illegal instructions. Kernel could be running on
a CPU which may or may not implement zimop. And thus kernel will have to carry 2
different vDSOs and expose the appropriate one depending on whether CPU implements
zimop or not.
References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/
To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Paul Walmsley <paul.walmsley(a)sifive.com>
To: Palmer Dabbelt <palmer(a)dabbelt.com>
To: Albert Ou <aou(a)eecs.berkeley.edu>
To: Conor Dooley <conor(a)kernel.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Christian Brauner <brauner(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Oleg Nesterov <oleg(a)redhat.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Kees Cook <kees(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
To: Jann Horn <jannh(a)google.com>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Miguel Ojeda <ojeda(a)kernel.org>
To: Alex Gaynor <alex.gaynor(a)gmail.com>
To: Boqun Feng <boqun.feng(a)gmail.com>
To: Gary Guo <gary(a)garyguo.net>
To: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
To: Benno Lossin <benno.lossin(a)proton.me>
To: Andreas Hindborg <a.hindborg(a)kernel.org>
To: Alice Ryhl <aliceryhl(a)google.com>
To: Trevor Gross <tmgross(a)umich.edu>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-riscv(a)lists.infradead.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: alistair.francis(a)wdc.com
Cc: richard.henderson(a)linaro.org
Cc: jim.shu(a)sifive.com
Cc: andybnac(a)gmail.com
Cc: kito.cheng(a)sifive.com
Cc: charlie(a)rivosinc.com
Cc: atishp(a)rivosinc.com
Cc: evan(a)rivosinc.com
Cc: cleger(a)rivosinc.com
Cc: alexghiti(a)rivosinc.com
Cc: samitolvanen(a)google.com
Cc: broonie(a)kernel.org
Cc: rick.p.edgecombe(a)intel.com
Cc: rust-for-linux(a)vger.kernel.org
changelog
---------
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses
riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently.
updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace
kselftest.
- cosmetic and grammatical changes to documentation.
v12:
- It seems like I had accidently squashed arch agnostic indirect branch
tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU
support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally
selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to
to `lpad 0`.
v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in
arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch
to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of
zero labeled landing pad is least common denominator and should work with all
objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up.
see here for more context
https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should
be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context
management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT
(https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
(Note: I had an issue in my workflow due to which version number wasn't
picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is
in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
---
Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri…
Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri…
Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri…
---
Andy Chiu (1):
riscv: signal: abstract header saving for setup_sigcontext
Clément Léger (1):
riscv: Add Firmware Feature SBI extensions definitions
Deepak Gupta (25):
mm: VM_SHADOW_STACK definition for riscv
dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
riscv: zicfiss / zicfilp enumeration
riscv: zicfiss / zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv mm: manufacture shadow stack pte
riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
riscv mmu: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
riscv: Implements arch agnostic shadow stack prctls
prctl: arch-agnostic prctl for indirect branch tracking
riscv: Implements arch agnostic indirect branch tracking prctls
riscv/traps: Introduce software check exception
riscv/signal: save and restore of shadow stack for signal
riscv/kernel: update __show_regs to print shadow stack register
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
riscv: kernel command line option to opt out of user cfi
riscv: enable kernel access to shadow stack memory via FWFT sbi call
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Jim Shu (1):
arch/riscv: compile vdso with landing pad
Documentation/admin-guide/kernel-parameters.txt | 8 +
Documentation/arch/riscv/index.rst | 2 +
Documentation/arch/riscv/zicfilp.rst | 115 +++++
Documentation/arch/riscv/zicfiss.rst | 179 +++++++
.../devicetree/bindings/riscv/extensions.yaml | 14 +
arch/riscv/Kconfig | 20 +
arch/riscv/Makefile | 5 +-
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/assembler.h | 44 ++
arch/riscv/include/asm/cpufeature.h | 12 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/entry-common.h | 2 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 25 +
arch/riscv/include/asm/mmu_context.h | 7 +
arch/riscv/include/asm/pgtable.h | 30 +-
arch/riscv/include/asm/processor.h | 2 +
arch/riscv/include/asm/sbi.h | 26 +
arch/riscv/include/asm/thread_info.h | 3 +
arch/riscv/include/asm/usercfi.h | 96 ++++
arch/riscv/include/asm/vector.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/include/uapi/asm/ptrace.h | 34 ++
arch/riscv/include/uapi/asm/sigcontext.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 8 +
arch/riscv/kernel/cpufeature.c | 13 +
arch/riscv/kernel/entry.S | 28 +-
arch/riscv/kernel/head.S | 23 +
arch/riscv/kernel/process.c | 26 +-
arch/riscv/kernel/ptrace.c | 95 ++++
arch/riscv/kernel/signal.c | 142 +++++-
arch/riscv/kernel/sys_hwprobe.c | 2 +
arch/riscv/kernel/sys_riscv.c | 10 +
arch/riscv/kernel/traps.c | 43 ++
arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++
arch/riscv/kernel/vdso/Makefile | 6 +
arch/riscv/kernel/vdso/flush_icache.S | 4 +
arch/riscv/kernel/vdso/getcpu.S | 4 +
arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 17 +
include/linux/cpu.h | 4 +
include/linux/mm.h | 7 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 27 +
kernel/sys.c | 30 ++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 3 +
tools/testing/selftests/riscv/cfi/Makefile | 10 +
tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++
tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 171 +++++++
tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++
tools/testing/selftests/riscv/cfi/shadowstack.h | 27 +
55 files changed, 2346 insertions(+), 29 deletions(-)
---
base-commit: 39a803b754d5224a3522016b564113ee1e4091b2
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug
The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (a VM mostly)
in a way of HW acceleration. This extended the HWPT-based design for more
advanced virtualization feature.
A vCMDQ introduced by this series as a part of the vIOMMU infrastructure
represents a HW supported queue/buffer for VM to use exclusively, e.g.
- NVIDIA's virtual command queue
- AMD vIOMMU's command buffer
either of which is an IOMMU HW feature to directly load and execute cache
invalidation commands issued by a guest kernel, to shoot down TLB entries
that HW cached for guest-owned stage-1 page table entries. This is a big
improvement since there is no VM Exit during an invalidation, compared to
the traditional invalidation pathway by trapping a guest-own invalidation
queue and forwarding those commands/requests to the host kernel that will
eventually fill a HW-owned queue to execute those commands.
Thus, a vCMDQ object, as an initial use case, is all about a guest-owned
HW command queue that VMM can allocate/configure depending on the request
from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its allocator
IOMMUFD_CMD_VCMDQ_ALLOC allowing VMM to forward the IOMMU-specific queue
info, such as queue base address, size, and etc.
Meanwhile, a guest-owned command queue needs the kernel (a command queue
driver) to control the queue by reading/writing its consumer and producer
indexes, which means the command queue HW allows the guest kernel to get
a direct R/W access to those registers. Introduce an mmap infrastructure
to the iommufd core so as to support pass through a piece of MMIO region
from the host physical address space to the guest physical address space.
The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during
the IOMMUFD_CMD_VCMDQ_ALLOC and given those info to the user space as an
output driver-data by the IOMMUFD_CMD_VCMDQ_ALLOC. So, this requires a
driver-specific user data support by a vIOMMU object.
As a real-world use case, this series implements a vCMDQ support to the
tegra241-cmdqv driver for the vCMDQ on NVIDIA Grace CPU. In another word,
this is also the Tegra CMDQV series Part-2 (user-space support), reworked
from Previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_vcmdq-v1
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vcmdq-v1
Thanks
Nicolin
Nicolin Chen (16):
iommu: Pass in a driver-level user data structure to viommu_alloc op
iommufd/viommu: Allow driver-specific user data for a vIOMMU object
iommu: Add iommu_copy_struct_to_user helper
iommufd: Add iommufd_struct_destroy to revert iommufd_viommu_alloc
iommufd/selftest: Support user_data in mock_viommu_alloc
iommufd/selftest: Add covearge for viommu data
iommufd/viommu: Add driver-allocated vDEVICE support
iommufd/viommu: Introduce IOMMUFD_OBJ_VCMDQ and its related struct
iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC ioctl
iommufd: Add mmap interface
iommufd/selftest: Add coverage for the new mmap interface
Documentation: userspace-api: iommufd: Update vCMDQ
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/arm-smmu-v3: Add vsmmu_alloc impl op
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 24 +-
drivers/iommu/iommufd/iommufd_private.h | 20 +-
drivers/iommu/iommufd/iommufd_test.h | 17 +
include/linux/iommu.h | 43 ++-
include/linux/iommufd.h | 93 +++++
include/uapi/linux/iommufd.h | 87 +++++
tools/testing/selftests/iommu/iommufd_utils.h | 21 +-
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 26 +-
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 349 +++++++++++++++++-
drivers/iommu/iommufd/driver.c | 54 +++
drivers/iommu/iommufd/main.c | 54 ++-
drivers/iommu/iommufd/selftest.c | 58 ++-
drivers/iommu/iommufd/viommu.c | 78 +++-
tools/testing/selftests/iommu/iommufd.c | 34 +-
.../selftests/iommu/iommufd_fail_nth.c | 5 +-
Documentation/userspace-api/iommufd.rst | 11 +
16 files changed, 912 insertions(+), 62 deletions(-)
--
2.43.0
The SBI Firmware Feature extension allows the S-mode to request some
specific features (either hardware or software) to be enabled. This
series uses this extension to request misaligned access exception
delegation to S-mode in order to let the kernel handle it. It also adds
support for the KVM FWFT SBI extension based on the misaligned access
handling infrastructure.
FWFT SBI extension is part of the SBI V3.0 specifications [1]. It can be
tested using the qemu provided at [2] which contains the series from
[3]. Upstream kvm-unit-tests can be used inside kvm to tests the correct
delegation of misaligned exceptions. Upstream OpenSBI can be used.
Note: Since SBI V3.0 is not yet ratified, FWFT extension API is split
between interface only and implementation, allowing to pick only the
interface which do not have hard dependencies on SBI.
The tests can be run using the kselftest from series [4].
$ qemu-system-riscv64 \
-cpu rv64,trap-misaligned-access=true,v=true \
-M virt \
-m 1024M \
-bios fw_dynamic.bin \
-kernel Image
...
# ./misaligned
TAP version 13
1..23
# Starting 23 tests from 1 test cases.
# RUN global.gp_load_lh ...
# OK global.gp_load_lh
ok 1 global.gp_load_lh
# RUN global.gp_load_lhu ...
# OK global.gp_load_lhu
ok 2 global.gp_load_lhu
# RUN global.gp_load_lw ...
# OK global.gp_load_lw
ok 3 global.gp_load_lw
# RUN global.gp_load_lwu ...
# OK global.gp_load_lwu
ok 4 global.gp_load_lwu
# RUN global.gp_load_ld ...
# OK global.gp_load_ld
ok 5 global.gp_load_ld
# RUN global.gp_load_c_lw ...
# OK global.gp_load_c_lw
ok 6 global.gp_load_c_lw
# RUN global.gp_load_c_ld ...
# OK global.gp_load_c_ld
ok 7 global.gp_load_c_ld
# RUN global.gp_load_c_ldsp ...
# OK global.gp_load_c_ldsp
ok 8 global.gp_load_c_ldsp
# RUN global.gp_load_sh ...
# OK global.gp_load_sh
ok 9 global.gp_load_sh
# RUN global.gp_load_sw ...
# OK global.gp_load_sw
ok 10 global.gp_load_sw
# RUN global.gp_load_sd ...
# OK global.gp_load_sd
ok 11 global.gp_load_sd
# RUN global.gp_load_c_sw ...
# OK global.gp_load_c_sw
ok 12 global.gp_load_c_sw
# RUN global.gp_load_c_sd ...
# OK global.gp_load_c_sd
ok 13 global.gp_load_c_sd
# RUN global.gp_load_c_sdsp ...
# OK global.gp_load_c_sdsp
ok 14 global.gp_load_c_sdsp
# RUN global.fpu_load_flw ...
# OK global.fpu_load_flw
ok 15 global.fpu_load_flw
# RUN global.fpu_load_fld ...
# OK global.fpu_load_fld
ok 16 global.fpu_load_fld
# RUN global.fpu_load_c_fld ...
# OK global.fpu_load_c_fld
ok 17 global.fpu_load_c_fld
# RUN global.fpu_load_c_fldsp ...
# OK global.fpu_load_c_fldsp
ok 18 global.fpu_load_c_fldsp
# RUN global.fpu_store_fsw ...
# OK global.fpu_store_fsw
ok 19 global.fpu_store_fsw
# RUN global.fpu_store_fsd ...
# OK global.fpu_store_fsd
ok 20 global.fpu_store_fsd
# RUN global.fpu_store_c_fsd ...
# OK global.fpu_store_c_fsd
ok 21 global.fpu_store_c_fsd
# RUN global.fpu_store_c_fsdsp ...
# OK global.fpu_store_c_fsdsp
ok 22 global.fpu_store_c_fsdsp
# RUN global.gen_sigbus ...
[12797.988647] misaligned[618]: unhandled signal 7 code 0x1 at 0x0000000000014dc0 in misaligned[4dc0,10000+76000]
[12797.988990] CPU: 0 UID: 0 PID: 618 Comm: misaligned Not tainted 6.13.0-rc6-00008-g4ec4468967c9-dirty #51
[12797.989169] Hardware name: riscv-virtio,qemu (DT)
[12797.989264] epc : 0000000000014dc0 ra : 0000000000014d00 sp : 00007fffe165d100
[12797.989407] gp : 000000000008f6e8 tp : 0000000000095760 t0 : 0000000000000008
[12797.989544] t1 : 00000000000965d8 t2 : 000000000008e830 s0 : 00007fffe165d160
[12797.989692] s1 : 000000000000001a a0 : 0000000000000000 a1 : 0000000000000002
[12797.989831] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffffdeadbeef
[12797.989964] a5 : 000000000008ef61 a6 : 626769735f6e0000 a7 : fffffffffffff000
[12797.990094] s2 : 0000000000000001 s3 : 00007fffe165d838 s4 : 00007fffe165d848
[12797.990238] s5 : 000000000000001a s6 : 0000000000010442 s7 : 0000000000010200
[12797.990391] s8 : 000000000000003a s9 : 0000000000094508 s10: 0000000000000000
[12797.990526] s11: 0000555567460668 t3 : 00007fffe165d070 t4 : 00000000000965d0
[12797.990656] t5 : fefefefefefefeff t6 : 0000000000000073
[12797.990756] status: 0000000200004020 badaddr: 000000000008ef61 cause: 0000000000000006
[12797.990911] Code: 8793 8791 3423 fcf4 3783 fc84 c737 dead 0713 eef7 (c398) 0001
# OK global.gen_sigbus
ok 23 global.gen_sigbus
# PASSED: 23 / 23 tests passed.
# Totals: pass:23 fail:0 xfail:0 xpass:0 skip:0 error:0
With kvm-tools:
# lkvm run -k sbi.flat -m 128
Info: # lkvm run -k sbi.flat -m 128 -c 1 --name guest-97
Info: Removed ghost socket file "/root/.lkvm//guest-97.sock".
##########################################################################
# kvm-unit-tests
##########################################################################
... [test messages elided]
PASS: sbi: fwft: FWFT extension probing no error
PASS: sbi: fwft: get/set reserved feature 0x6 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x3fffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x80000000 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0xbfffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: misaligned_deleg: Get misaligned deleg feature no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 0
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 1
PASS: sbi: fwft: misaligned_deleg: Verify misaligned load exception trap in supervisor
SUMMARY: 50 tests, 2 unexpected failures, 12 skipped
This series is available at [5].
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/vv3.0-rc2/… [1]
Link: https://github.com/rivosinc/qemu/tree/dev/cleger/misaligned [2]
Link: https://lore.kernel.org/all/20241211211933.198792-3-fkonrad@amd.com/T/ [3]
Link: https://lore.kernel.org/linux-riscv/20250414123543.1615478-1-cleger@rivosin… [4]
Link: https://github.com/rivosinc/linux/tree/dev/cleger/fwft [5]
---
V5:
- Return ERANGE as mapping for SBI_ERR_BAD_RANGE
- Removed unused sbi_fwft_get()
- Fix kernel for sbi_fwft_local_set_cpumask()
- Fix indentation for sbi_fwft_local_set()
- Remove spurious space in kvm_sbi_fwft_ops.
- Rebased on origin/master
- Remove fixes commits and sent them as a separate series [4]
V4:
- Check SBI version 3.0 instead of 2.0 for FWFT presence
- Use long for kvm_sbi_fwft operation return value
- Init KVM sbi extension even if default_disabled
- Remove revert_on_fail parameter for sbi_fwft_feature_set().
- Fix comments for sbi_fwft_set/get()
- Only handle local features (there are no globals yet in the spec)
- Add new SBI errors to sbi_err_map_linux_errno()
V3:
- Added comment about kvm sbi fwft supported/set/get callback
requirements
- Move struct kvm_sbi_fwft_feature in kvm_sbi_fwft.c
- Add a FWFT interface
V2:
- Added Kselftest for misaligned testing
- Added get_user() usage instead of __get_user()
- Reenable interrupt when possible in misaligned access handling
- Document that riscv supports unaligned-traps
- Fix KVM extension state when an init function is present
- Rework SBI misaligned accesses trap delegation code
- Added support for CPU hotplugging
- Added KVM SBI reset callback
- Added reset for KVM SBI FWFT lock
- Return SBI_ERR_DENIED_LOCKED when LOCK flag is set
Clément Léger (13):
riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitions
riscv: sbi: add new SBI error mappings
riscv: sbi: add FWFT extension interface
riscv: sbi: add SBI FWFT extension calls
riscv: misaligned: request misaligned exception from SBI
riscv: misaligned: use on_each_cpu() for scalar misaligned access
probing
riscv: misaligned: use correct CONFIG_ ifdef for
misaligned_access_speed
riscv: misaligned: move emulated access uniformity check in a function
riscv: misaligned: add a function to check misalign trap delegability
RISC-V: KVM: add SBI extension init()/deinit() functions
RISC-V: KVM: add SBI extension reset callback
RISC-V: KVM: add support for FWFT SBI extension
RISC-V: KVM: add support for SBI_FWFT_MISALIGNED_DELEG
arch/riscv/include/asm/cpufeature.h | 8 +-
arch/riscv/include/asm/kvm_host.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_sbi.h | 12 +
arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h | 29 +++
arch/riscv/include/asm/sbi.h | 60 +++++
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/sbi.c | 75 ++++++
arch/riscv/kernel/traps_misaligned.c | 110 ++++++++-
arch/riscv/kernel/unaligned_access_speed.c | 8 +-
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/vcpu.c | 7 +-
arch/riscv/kvm/vcpu_sbi.c | 54 +++++
arch/riscv/kvm/vcpu_sbi_fwft.c | 252 +++++++++++++++++++++
arch/riscv/kvm/vcpu_sbi_sta.c | 3 +-
14 files changed, 610 insertions(+), 15 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h
create mode 100644 arch/riscv/kvm/vcpu_sbi_fwft.c
--
2.49.0
v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.…
Addressed comments following conversations with Pavel, Stan, and
Harshitha. Thank you guys for the reviews again. Overall minor changes:
Changelog:
- Check for !niov->pp in io_zcrx_recv_frag, just in case we end up with
a TX niov in that path (Pavel).
- Fix locking case in !netif_device_present (Jakub/Stan).
v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c…
Changelog:
- Use priv->bindings list instead of sock_bindings_list. This was missed
during the rebase as the bindings have been updated to use
priv->bindings recently (thanks Stan!)
v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.…
Only address minor comments on V7
Changelog:
- Use netdev locking instead of rtnl_locking to match rx path.
- Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of
NET_IOV_UNSPECIFIED.
- Post send binding to net_devmem_dmabuf_bindings after it's been fully
initialized (Stan).
v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.…
===
Changelog:
- Check the dmabuf net_iov binding belongs to the device the TX is going
out on. (Jakub)
- Provide detailed inspection of callsites of
__skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub)
v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===
v6 has no major changes. Addressed a few issues from Paolo and David,
and collected Acks from Stan. Thank you everyone for the review!
Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is
invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label).
(Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the
same one which implements the TX path so we don't run into future
errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works
(David).
v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===
v5 has no major changes; it clears up the relatively minor issues
pointed out to in v4, and rebases the series on top of net-next to
resolve the conflict with a patch that raced to the tree. It also
collects the review tags from v4.
Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).
v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===
v4 mainly addresses the critical driver support issue surfaced in v3 by
Paolo and Stan. Drivers aiming to support netmem_tx should make sure not
to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs
may come from dma-bufs.
Additionally other feedback from v3 is addressed.
Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for
netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the
tx binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).
v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===
Address minor comments from RFCv2 and fix a few build warnings and
ynl-regen issues. No major changes.
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======
RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.
Major changes:
--------------
- much improved UAPI as suggested by Stan. We now interpret the iov_base
of the passed in iov from userspace as the offset into the dmabuf to
send from. This removes the need to set iov.iov_base = NULL which may
be confusing to users, and enables us to send multiple iovs in the
same sendmsg() call. ncdevmem and the docs show a sample use of that.
- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
this is good improvment as it was confusing to keep track of
2 iterators for the same sendmsg, and mistracking both iterators
caused a couple of bugs reported in the last iteration that are now
resolved with this streamlining.
- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested,
and sending multiple iovs in the same sendmsg() is tested.
- Fixed issue where dmabuf unmapping was happening in invalid context
(Stan).
====================================================================
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Full outline on usage of the TX path is detailed in the documentation
included with this series.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
1. Add netmem refcounting needed for the TX path.
2. Devmem TX netlink API.
3. Devmem TX net stack implementation.
4. Make dma-buf unbinding scheduled work to handle TX cases where it gets
freed from contexts where we can't sleep.
5. Add devmem TX documentation.
6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver
feature flag, and docs to enable drivers to declare netmem_tx support.
7. Guard netmem_tx against being enabled against drivers that don't
support it.
8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
devmem.py.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Mina Almasry (8):
netmem: add niov->type attribute to distinguish different net_iov
types
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
net: add devmem TCP TX documentation
net: enable driver support for netmem TX
gve: add netmem TX support to GVE DQO-RDA mode
net: check for driver support in netmem TX
selftests: ncdevmem: Implement devmem TCP TX
Stanislav Fomichev (1):
net: devmem: TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 150 ++++++++-
.../networking/net_cachelines/net_device.rst | 1 +
Documentation/networking/netdev-features.rst | 5 +
Documentation/networking/netmem.rst | 23 +-
drivers/net/ethernet/google/gve/gve_main.c | 4 +
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +-
include/linux/netdevice.h | 2 +
include/linux/skbuff.h | 17 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 34 +-
include/net/sock.h | 1 +
include/uapi/linux/netdev.h | 1 +
io_uring/zcrx.c | 3 +-
net/core/datagram.c | 48 ++-
net/core/dev.c | 34 +-
net/core/devmem.c | 139 ++++++--
net/core/devmem.h | 83 ++++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 80 ++++-
net/core/skbuff.c | 48 ++-
net/core/sock.c | 6 +
net/ipv4/ip_output.c | 3 +-
net/ipv4/tcp.c | 50 ++-
net/ipv6/ip6_output.c | 3 +-
net/vmw_vsock/virtio_transport_common.c | 5 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/devmem.py | 26 +-
.../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++-
30 files changed, 1015 insertions(+), 90 deletions(-)
base-commit: 21b01cb8e88ea200a834a2c114b5dc6aa378ac56
--
2.49.0.805.g082f7c87e0-goog
When running mm selftest to verify mm patches, 'compaction_test' case
failed on an x86 server with 1TB memory. And the root cause is that it
has too much free memory than what the test supports.
The test case tries to allocate 100000 huge pages, which is about 200 GB
for that x86 server, and when it succeeds, it expects it's large than
1/3 of 80% of the free memory in system. This logic only works for
platform with 750 GB ( 200 / (1/3) / 80% ) or less free memory, and may
raise false alarm for others.
Fix it by changing the fixed page number to self-adjustable number
according to the real number of free memory.
Fixes: bd67d5c15cc19 ("Test compaction of mlocked memory")
Signed-off-by: Feng Tang <feng.tang(a)linux.alibaba.com>
---
tools/testing/selftests/mm/compaction_test.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 2c3a0eb6b22d..9bc4591c7b16 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -90,6 +90,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
int compaction_index = 0;
char nr_hugepages[20] = {0};
char init_nr_hugepages[24] = {0};
+ char target_nr_hugepages[24] = {0};
+ int slen;
snprintf(init_nr_hugepages, sizeof(init_nr_hugepages),
"%lu", initial_nr_hugepages);
@@ -106,11 +108,18 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
goto out;
}
- /* Request a large number of huge pages. The Kernel will allocate
- as much as it can */
- if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
- ksft_print_msg("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
+ /*
+ * Request huge pages for about half of the free memory. The Kernel
+ * will allocate as much as it can, and we expect it will get at least 1/3
+ */
+ nr_hugepages_ul = mem_free / hugepage_size / 2;
+ snprintf(target_nr_hugepages, sizeof(target_nr_hugepages),
+ "%lu", nr_hugepages_ul);
+
+ slen = strlen(target_nr_hugepages);
+ if (write(fd, target_nr_hugepages, slen) != slen) {
+ ksft_print_msg("Failed to write %lu to /proc/sys/vm/nr_hugepages: %s\n",
+ nr_hugepages_ul, strerror(errno));
goto close_fd;
}
--
2.43.5
Greetings:
Welcome to v3.
This series fixes netdevsim to correctly set the NAPI ID on the skb.
This is helpful for writing tests around features that use
SO_INCOMING_NAPI_ID.
In addition to the netdevsim fix in patch 1, patches 2 & 3 do some self
test refactoring and add a test for NAPI IDs. The test itself (patch 4)
introduces a C helper because apparently python doesn't have
socket.SO_INCOMING_NAPI_ID.
Thanks,
Joe
v3:
- Dropped patch 3 from v2 as it is no longer necessary.
- Patch 3 from this series (which was patch 4 in the v2)
- Sorted .gitignore alphabetically
- added cfg.remote_deploy so the test supports real remote machines
- Dropped the NetNSEnter as it is unnecessary
- Fixed a string interpolation issue that Paolo hit with his Python
version
v2: https://lore.kernel.org/netdev/20250417013301.39228-1-jdamato@fastly.com/
- No longer an RFC
- Minor whitespace change in patch 1 (no functional change).
- Patches 2-4 new in v2
rfcv1: https://lore.kernel.org/netdev/20250329000030.39543-1-jdamato@fastly.com/
Joe Damato (3):
netdevsim: Mark NAPI ID on skb in nsim_rcv
selftests: drv-net: Factor out ksft C helpers
selftests: drv-net: Test that NAPI ID is non-zero
drivers/net/netdevsim/netdev.c | 2 +
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 6 +-
tools/testing/selftests/drivers/net/ksft.h | 56 +++++++++++++
.../testing/selftests/drivers/net/napi_id.py | 24 ++++++
.../selftests/drivers/net/napi_id_helper.c | 83 +++++++++++++++++++
.../selftests/drivers/net/xdp_helper.c | 49 +----------
7 files changed, 173 insertions(+), 48 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/ksft.h
create mode 100755 tools/testing/selftests/drivers/net/napi_id.py
create mode 100644 tools/testing/selftests/drivers/net/napi_id_helper.c
base-commit: 22ab6b9467c1822291a1175a0eb825b7ec057ef9
--
2.43.0
The selftest I provided can reproduce a panic:
'./test_progs -a cgroup_storage_update'
When we attach a program to cgroup and if prog->aux->cgroup_storage
exists, which means the cgroup_storage map is used in the program, we
will then allocate storage by bpf_cgroup_storages_alloc() and assign it
to pl->storage.
At the end, pl->storage will be assigned to
cgrp->bpf.effective[atype]->cgroup_storage by xxx_effective_progs().
But when we attach a program without the cgroup_storage map being used
(prog->aux->cgroup_storage is empty), the cgroup_storage in struct
bpf_prog_array_item is empty.
Then, if we use BPF_LINK_UPDATE to replace the old program with a new one
that uses the cgroup_storage map, we miss the cgroup_storage being
initialized.
This causes a panic when accessing storage in bpf_get_local_storage.
Jiayuan Chen (2):
bpf: Create cgroup storage if needed when updating link
selftests/bpf: Add link update test for cgroup_storage
kernel/bpf/cgroup.c | 24 +++++++---
.../selftests/bpf/prog_tests/cgroup_storage.c | 45 +++++++++++++++++++
.../selftests/bpf/progs/cgroup_storage.c | 6 +++
3 files changed, 70 insertions(+), 5 deletions(-)
--
2.47.1
If we try to access argument which is pointer to const void, it's an
UNKNOWN type, verifier will fail to load.
Use is_void_or_int_ptr to check if type is void or int pointer.
Add a selftest to check it.
---
KaFai Wan (2):
bpf: Allow access to const void pointer arguments in tracing programs
selftests/bpf: Add test to access const void pointer argument in
tracing program
kernel/bpf/btf.c | 13 +++----------
net/bpf/test_run.c | 8 +++++++-
.../selftests/bpf/progs/verifier_btf_ctx_access.c | 12 ++++++++++++
3 files changed, 22 insertions(+), 11 deletions(-)
Changelog:
v3->v4: Addressed comments from Alexei Starovoitov
- change SOB to match From email address
- add Acked-by from jirka
Details in here:
https://lore.kernel.org/all/20250417151548.1276279-1-kafai.wan@hotmail.com/
v2->v3: Addressed comments from jirka
- remove duplicate checks for void pointer
Details in here:
https://lore.kernel.org/bpf/20250416161756.1079178-1-kafai.wan@hotmail.com/
v1->v2: Addressed comments from jirka
- use btf_type_is_void to check if type is void
- merge is_void_ptr and is_int_ptr to is_void_or_int_ptr
- fix selftests
Details in here:
https://lore.kernel.org/all/20250412170626.3638516-1-kafai.wan@hotmail.com/
--
2.43.0
Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to
perform per-buffer accounting with debugfs which is not suitable for
production environments. Eventually we discovered the overhead with
per-buffer sysfs file creation/removal was significantly impacting
allocation and free times, and exacerbated kernfs lock contention. [2]
dma_buf_stats_setup() is responsible for 39% of single-page buffer
creation duration, or 74% of single-page dma_buf_export() duration when
stressing dmabuf allocations and frees.
I prototyped a change from per-buffer to per-exporter statistics with a
RCU protected list of exporter allocations that accommodates most (but
not all) of our use-cases and avoids almost all of the sysfs overhead.
While that adds less overhead than per-buffer sysfs, and less even than
the maintenance of the dmabuf debugfs_list, it's still *additional*
overhead on top of the debugfs_list and doesn't give us per-buffer info.
This series uses the existing dmabuf debugfs_list to implement a BPF
dmabuf iterator, which adds no overhead to buffer allocation/free and
provides per-buffer info. While the kernel must have CONFIG_DEBUG_FS for
the dmabuf_iter to be available, debugfs does not need to be mounted.
The BPF program loaded by userspace that extracts per-buffer information
gets to define its own interface which avoids the lack of ABI stability
with debugfs (even if it were mounted).
As this is a replacement for our use of CONFIG_DMABUF_SYSFS_STATS, the
last patch is a RFC for removing it from the kernel. Please see my
suggestion there regarding the timeline for that.
[1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.…
[2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com/
T.J. Mercier (4):
dma-buf: Rename and expose debugfs symbols
bpf: Add dmabuf iterator
selftests/bpf: Add test for dmabuf_iter
RFC: dma-buf: Remove DMA-BUF statistics
.../ABI/testing/sysfs-kernel-dmabuf-buffers | 24 ---
Documentation/driver-api/dma-buf.rst | 5 -
drivers/dma-buf/Kconfig | 15 --
drivers/dma-buf/Makefile | 1 -
drivers/dma-buf/dma-buf-sysfs-stats.c | 202 ------------------
drivers/dma-buf/dma-buf-sysfs-stats.h | 35 ---
drivers/dma-buf/dma-buf.c | 40 +---
include/linux/btf_ids.h | 1 +
include/linux/dma-buf.h | 6 +
kernel/bpf/Makefile | 3 +
kernel/bpf/dmabuf_iter.c | 130 +++++++++++
tools/testing/selftests/bpf/config | 1 +
.../selftests/bpf/prog_tests/dmabuf_iter.c | 116 ++++++++++
.../testing/selftests/bpf/progs/dmabuf_iter.c | 31 +++
14 files changed, 299 insertions(+), 311 deletions(-)
delete mode 100644 Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
delete mode 100644 drivers/dma-buf/dma-buf-sysfs-stats.c
delete mode 100644 drivers/dma-buf/dma-buf-sysfs-stats.h
create mode 100644 kernel/bpf/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/dmabuf_iter.c
--
2.49.0.604.gff1f9ca942-goog
On Tue, Apr 22, 2025 at 07:58:56PM -0400, Waiman Long <llong(a)redhat.com> wrote:
> Am I correct to assume that the purpose of 1d09069f5313f ("selftests:
> memcg: expect no low events in unprotected sibling") is to force a
> failure in the test_memcg_low test to force a change in the current
> behavior? Or was it the case that it didn't fail when you submit your
> patch?
Yes, the failure had been intended to mark unexpected mode of reclaim
(there's still a reproducer somewhere in the references). However, I
learnt that:
a) it ain't easy to fix,
b) the only occurence of the troublesome behavior was in the test and
never reported by users in real life.
I've started to prefer the variant where the particular check is
indefinite since that.
HTH,
Michal
v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c…
Changelog:
- Use priv->bindings list instead of sock_bindings_list. This was missed
during the rebase as the bindings have been updated to use
priv->bindings recently (thanks Stan!)
v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.…
Only address minor comments on V7
Changelog:
- Use netdev locking instead of rtnl_locking to match rx path.
- Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of
NET_IOV_UNSPECIFIED.
- Post send binding to net_devmem_dmabuf_bindings after it's been fully
initialized (Stan).
v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.…
===
Changelog:
- Check the dmabuf net_iov binding belongs to the device the TX is going
out on. (Jakub)
- Provide detailed inspection of callsites of
__skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub)
v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===
v6 has no major changes. Addressed a few issues from Paolo and David,
and collected Acks from Stan. Thank you everyone for the review!
Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is
invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label).
(Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the
same one which implements the TX path so we don't run into future
errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works
(David).
v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===
v5 has no major changes; it clears up the relatively minor issues
pointed out to in v4, and rebases the series on top of net-next to
resolve the conflict with a patch that raced to the tree. It also
collects the review tags from v4.
Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).
v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===
v4 mainly addresses the critical driver support issue surfaced in v3 by
Paolo and Stan. Drivers aiming to support netmem_tx should make sure not
to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs
may come from dma-bufs.
Additionally other feedback from v3 is addressed.
Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for
netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the
tx binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).
v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===
Address minor comments from RFCv2 and fix a few build warnings and
ynl-regen issues. No major changes.
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======
RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.
Major changes:
--------------
- much improved UAPI as suggested by Stan. We now interpret the iov_base
of the passed in iov from userspace as the offset into the dmabuf to
send from. This removes the need to set iov.iov_base = NULL which may
be confusing to users, and enables us to send multiple iovs in the
same sendmsg() call. ncdevmem and the docs show a sample use of that.
- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
this is good improvment as it was confusing to keep track of
2 iterators for the same sendmsg, and mistracking both iterators
caused a couple of bugs reported in the last iteration that are now
resolved with this streamlining.
- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested,
and sending multiple iovs in the same sendmsg() is tested.
- Fixed issue where dmabuf unmapping was happening in invalid context
(Stan).
====================================================================
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Full outline on usage of the TX path is detailed in the documentation
included with this series.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
1. Add netmem refcounting needed for the TX path.
2. Devmem TX netlink API.
3. Devmem TX net stack implementation.
4. Make dma-buf unbinding scheduled work to handle TX cases where it gets
freed from contexts where we can't sleep.
5. Add devmem TX documentation.
6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver
feature flag, and docs to enable drivers to declare netmem_tx support.
7. Guard netmem_tx against being enabled against drivers that don't
support it.
8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
devmem.py.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Mina Almasry (8):
netmem: add niov->type attribute to distinguish different net_iov
types
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
net: add devmem TCP TX documentation
net: enable driver support for netmem TX
gve: add netmem TX support to GVE DQO-RDA mode
net: check for driver support in netmem TX
selftests: ncdevmem: Implement devmem TCP TX
Stanislav Fomichev (1):
net: devmem: TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 150 ++++++++-
.../networking/net_cachelines/net_device.rst | 1 +
Documentation/networking/netdev-features.rst | 5 +
Documentation/networking/netmem.rst | 23 +-
drivers/net/ethernet/google/gve/gve_main.c | 4 +
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +-
include/linux/netdevice.h | 2 +
include/linux/skbuff.h | 17 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 34 +-
include/net/sock.h | 1 +
include/uapi/linux/netdev.h | 1 +
io_uring/zcrx.c | 1 +
net/core/datagram.c | 48 ++-
net/core/dev.c | 34 +-
net/core/devmem.c | 139 ++++++--
net/core/devmem.h | 83 ++++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 75 ++++-
net/core/skbuff.c | 48 ++-
net/core/sock.c | 6 +
net/ipv4/ip_output.c | 3 +-
net/ipv4/tcp.c | 50 ++-
net/ipv6/ip6_output.c | 3 +-
net/vmw_vsock/virtio_transport_common.c | 5 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/devmem.py | 26 +-
.../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++-
30 files changed, 1009 insertions(+), 89 deletions(-)
base-commit: 240ce924d2718b8f6f622f2a9a9c219b9da736e8
--
2.49.0.805.g082f7c87e0-goog
This patch series introduces UFFDIO_MOVE feature to userfaultfd, which
has long been implemented and maintained by Andrea in his local tree [1],
but was not upstreamed due to lack of use cases where this approach would
be better than allocating a new page and copying the contents. Previous
upstraming attempts could be found at [6] and [7].
UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
needs pages to be allocated [2]. However, with UFFDIO_MOVE, if pages are
available (in userspace) for recycling, as is usually the case in heap
compaction algorithms, then we can avoid the page allocation and memcpy
(done by UFFDIO_COPY). Also, since the pages are recycled in the
userspace, we avoid the need to release (via madvise) the pages back to
the kernel [3].
We see over 40% reduction (on a Google pixel 6 device) in the compacting
thread’s completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was
measured using a benchmark that emulates a heap compaction implementation
using userfaultfd (to allow concurrent accesses by application threads).
More details of the usecase are explained in [3].
Furthermore, UFFDIO_MOVE enables moving swapped-out pages without
touching them within the same vma. Today, it can only be done by mremap,
however it forces splitting the vma.
TODOs for follow-up improvements:
- cross-mm support. Known differences from single-mm and missing pieces:
- memcg recharging (might need to isolate pages in the process)
- mm counters
- cross-mm deposit table moves
- cross-mm test
- document the address space where src and dest reside in struct
uffdio_move
- TLB flush batching. Will require extensive changes to PTL locking in
move_pages_pte(). OTOH that might let us reuse parts of mremap code.
Changes since v5 [10]:
- added logic to split large folios in move_pages_pte(),
per David Hildenbrand
- added check for PAE before split_huge_pmd() to avoid the split if the
move operation can't be done
- replaced calls to default_huge_page_size() with read_pmd_pagesize() in
uffd_move_pmd test, per David Hildenbrand
- fixed the condition in uffd_move_test_common() checking if area
alignment is needed
Changes since v4 [9]:
- added Acked-by in patch 1, per Peter Xu
- added description for ctx, mm and mode parameters of move_pages(),
per kernel test robot
- added Reviewed-by's, per Peter Xu and Axel Rasmussen
- removed unused operations in uffd_test_case_ops
- refactored uffd-unit-test changes to avoid using global variables and
handle pmd moves without page size overrides, per Peter Xu
Changes since v3 [8]:
- changed retry path in folio_lock_anon_vma_read() to unlock and then
relock RCU, per Peter Xu
- removed cross-mm support from initial patchset, per David Hildenbrand
- replaced BUG_ONs with VM_WARN_ON or WARN_ON_ONCE, per David Hildenbrand
- added missing cache flushing, per Lokesh Gidra and Peter Xu
- updated manpage text in the patch description, per Peter Xu
- renamed internal functions from "remap" to "move", per Peter Xu
- added mmap_changing check after taking mmap_lock, per Peter Xu
- changed uffd context check to ensure dst_mm is registered onto uffd we
are operating on, Peter Xu and David Hildenbrand
- changed to non-maybe variants of maybe*_mkwrite(), per David Hildenbrand
- fixed warning for CONFIG_TRANSPARENT_HUGEPAGE=n, per kernel test robot
- comments cleanup, per David Hildenbrand and Peter Xu
- checks for VM_IO,VM_PFNMAP,VM_HUGETLB,..., per David Hildenbrand
- prevent moving pinned pages, per Peter Xu
- changed uffd tests to call move uffd_test_ctx_clear() at the end of the
test run instead of in the beginning of the next run
- added support for testcase-specific ops
- added test for moving PMD-aligned blocks
Changes since v2 [5]:
- renamed UFFDIO_REMAP to UFFDIO_MOVE, per David Hildenbrand
- rebase over mm-unstable to use folio_move_anon_rmap(),
per David Hildenbrand
- added text for manpage explaining DONTFORK and KSM requirements for this
feature, per David Hildenbrand
- check for anon_vma changes in the fast path of folio_lock_anon_vma_read,
per Peter Xu
- updated the title and description of the first patch,
per David Hildenbrand
- updating comments in folio_lock_anon_vma_read() explaining the need for
anon_vma checks, per David Hildenbrand
- changed all mapcount checks to PageAnonExclusive, per Jann Horn and
David Hildenbrand
- changed counters in remap_swap_pte() from MM_ANONPAGES to MM_SWAPENTS,
per Jann Horn
- added a check for PTE change after folio is locked in remap_pages_pte(),
per Jann Horn
- added handling of PMD migration entries and bailout when pmd_devmap(),
per Jann Horn
- added checks to ensure both src and dst VMAs are writable, per Peter Xu
- added UFFD_FEATURE_MOVE, per Peter Xu
- removed obsolete comments, per Peter Xu
- renamed remap_anon_pte to remap_present_pte, per Peter Xu
- added a comment for folio_get_anon_vma() explaining the need for
anon_vma checks, per Peter Xu
- changed error handling in remap_pages() to make it more clear,
per Peter Xu
- changed EFAULT to EAGAIN to retry when a hugepage appears or disappears
from under us, per Peter Xu
- added links to previous upstreaming attempts, per David Hildenbrand
Changes since v1 [4]:
- add mmget_not_zero in userfaultfd_remap, per Jann Horn
- removed extern from function definitions, per Matthew Wilcox
- converted to folios in remap_pages_huge_pmd, per Matthew Wilcox
- use PageAnonExclusive in remap_pages_huge_pmd, per David Hildenbrand
- handle pgtable transfers between MMs, per Jann Horn
- ignore concurrent A/D pte bit changes, per Jann Horn
- split functions into smaller units, per David Hildenbrand
- test for folio_test_large in remap_anon_pte, per Matthew Wilcox
- use pte_swp_exclusive for swapcount check, per David Hildenbrand
- eliminated use of mmu_notifier_invalidate_range_start_nonblock,
per Jann Horn
- simplified THP alignment checks, per Jann Horn
- refactored the loop inside remap_pages, per Jann Horn
- additional clarifying comments, per Jann Horn
Main changes since Andrea's last version [1]:
- Trivial translations from page to folio, mmap_sem to mmap_lock
- Replace pmd_trans_unstable() with pte_offset_map_nolock() and handle its
possible failure
- Move pte mapping into remap_pages_pte to allow for retries when source
page or anon_vma is contended. Since pte_offset_map_nolock() start RCU
read section, we can't block anymore after mapping a pte, so have to unmap
the ptesm do the locking and retry.
- Add and use anon_vma_trylock_write() to avoid blocking while in RCU
read section.
- Accommodate changes in mmu_notifier_range_init() API, switch to
mmu_notifier_invalidate_range_start_nonblock() to avoid blocking while in
RCU read section.
- Open-code now removed __swp_swapcount()
- Replace pmd_read_atomic() with pmdp_get_lockless()
- Add new selftest for UFFDIO_MOVE
[1] https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc…
[2] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redha…
[3] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyj…
[4] https://lore.kernel.org/all/20230914152620.2743033-1-surenb@google.com/
[5] https://lore.kernel.org/all/20230923013148.1390521-1-surenb@google.com/
[6] https://lore.kernel.org/all/1425575884-2574-21-git-send-email-aarcange@redh…
[7] https://lore.kernel.org/all/cover.1547251023.git.blake.caldwell@colorado.ed…
[8] https://lore.kernel.org/all/20231009064230.2952396-1-surenb@google.com/
[9] https://lore.kernel.org/all/20231028003819.652322-1-surenb@google.com/
[10] https://lore.kernel.org/all/20231121171643.3719880-1-surenb@google.com/
Andrea Arcangeli (2):
mm/rmap: support move to different root anon_vma in
folio_move_anon_rmap()
userfaultfd: UFFDIO_MOVE uABI
Suren Baghdasaryan (3):
selftests/mm: call uffd_test_ctx_clear at the end of the test
selftests/mm: add uffd_test_case_ops to allow test case-specific
operations
selftests/mm: add UFFDIO_MOVE ioctl test
Documentation/admin-guide/mm/userfaultfd.rst | 3 +
fs/userfaultfd.c | 72 +++
include/linux/rmap.h | 5 +
include/linux/userfaultfd_k.h | 11 +
include/uapi/linux/userfaultfd.h | 29 +-
mm/huge_memory.c | 122 ++++
mm/khugepaged.c | 3 +
mm/rmap.c | 30 +
mm/userfaultfd.c | 614 +++++++++++++++++++
tools/testing/selftests/mm/uffd-common.c | 39 +-
tools/testing/selftests/mm/uffd-common.h | 9 +
tools/testing/selftests/mm/uffd-stress.c | 5 +-
tools/testing/selftests/mm/uffd-unit-tests.c | 192 ++++++
13 files changed, 1130 insertions(+), 4 deletions(-)
--
2.43.0.rc2.451.g8631bc7472-goog
If we try to access argument which is pointer to const void, it's an
UNKNOWN type, verifier will fail to load.
Use is_void_or_int_ptr to check if type is void or int pointer.
Add a selftest to check it.
---
KaFai Wan (2):
bpf: Allow access to const void pointer arguments in tracing programs
selftests/bpf: Add test to access const void pointer argument in
tracing program
kernel/bpf/btf.c | 13 +++----------
net/bpf/test_run.c | 8 +++++++-
.../selftests/bpf/progs/verifier_btf_ctx_access.c | 12 ++++++++++++
3 files changed, 22 insertions(+), 11 deletions(-)
Changelog:
v2->v3: Addressed comments from jirka
- remove duplicate checks for void pointer
Details in here:
https://lore.kernel.org/bpf/20250416161756.1079178-1-kafai.wan@hotmail.com/
v1->v2: Addressed comments from jirka
- use btf_type_is_void to check if type is void
- merge is_void_ptr and is_int_ptr to is_void_or_int_ptr
- fix selftests
Details in here:
https://lore.kernel.org/all/20250412170626.3638516-1-kafai.wan@hotmail.com/
--
2.43.0
When the select of PRIME_MUMBERS was removed from it's KUnit test
Kconfig nothing was added to the KUnit configs, meaning that when run
via the KUnit runner the tests are neither built nor run. Add
PRIME_NUMBERS to all_tests.config so they are enabled when the KUnit
runner builds the kernel.
Fixes: 3f2925174f8b ("lib/prime_numbers: KUnit test should not select PRIME_NUMBERS")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/kunit/configs/all_tests.config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index cdd9782f9646..7bb885b0c32d 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -43,6 +43,8 @@ CONFIG_REGMAP_BUILD=y
CONFIG_AUDIT=y
+CONFIG_PRIME_NUMBERS=y
+
CONFIG_SECURITY=y
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_LANDLOCK=y
---
base-commit: 9c32cda43eb78f78c73aee4aa344b777714e259b
change-id: 20250422-lib-fix-prime-numbers-kunit-323659c2cfe2
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Steven Rostedt <rostedt(a)goodmis.org>
Running the following commands was broken:
# cd /sys/kernel/tracing
# echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter
# echo 1 > events/syscalls/sys_enter_openat/enable
# ls /proc/$$/maps
# cat trace
And would produce nothing when it should have produced something like:
ls-1192 [007] ..... 8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0)
Add a test to check this case so that it will be caught if it breaks
again.
Link: https://lore.kernel.org/linux-trace-kernel/20250417183003.505835fb@gandalf.…
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v1: https://lore.kernel.org/20250417223323.3edb4f6c@batman.local.home
- Use $TMPDIR instead of $TESTDIR as test file (Masami Hiramatsu)
.../test.d/filter/event-filter-function.tc | 20 +++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 118247b8dd84..c62165fabd0c 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -80,6 +80,26 @@ if [ $misscnt -gt 0 ]; then
exit_fail
fi
+# Check strings too
+if [ -f events/syscalls/sys_enter_openat/filter ]; then
+ DIRNAME=`basename $TMPDIR`
+ echo "filename.ustring ~ \"*$DIRNAME*\"" > events/syscalls/sys_enter_openat/filter
+ echo 1 > events/syscalls/sys_enter_openat/enable
+ echo 1 > tracing_on
+ ls /bin/sh
+ nocnt=`grep openat trace | wc -l`
+ ls $TMPDIR
+ echo 0 > tracing_on
+ hitcnt=`grep openat trace | wc -l`;
+ echo 0 > events/syscalls/sys_enter_openat/enable
+ if [ $nocnt -gt 0 ]; then
+ exit_fail
+ fi
+ if [ $hitcnt -eq 0 ]; then
+ exit_fail
+ fi
+fi
+
reset_events_filter
exit 0
--
2.47.2