August 2024 - Linux-kselftest-mirror

[PATCH v4 0/2] selftests/resctrl: SNC kernel support discovery

by Maciej Wieczor-Retman

Changes v4: - Printing SNC warnings at the start of every test. - Printing SNC warnings at the end of every relevant test. - Remove global snc_mode variable, consolidate snc detection functions into one. - Correct minor mistakes. Changes v3: - Reworked patch 2. - Changed minor things in patch 1 like function name and made corrections to the patch message. Changes v2: - Removed patches 2 and 3 since now this part will be supported by the kernel. Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory into multiple NUMA nodes. When enabled, NUMA-aware applications can achieve better performance on bigger server platforms. SNC support in the kernel was merged into x86/cache [1]. With SNC enabled and kernel support in place all the tests will function normally (aside from effective cache size). There might be a problem when SNC is enabled but the system is still using an older kernel version without SNC support. Currently the only message displayed in that situation is a guess that SNC might be enabled and is causing issues. That message also is displayed whenever the test fails on an Intel platform. Add a mechanism to discover kernel support for SNC which will add more meaning and certainty to the error message. Add runtime SNC mode detection and verify how reliable that information is. Series was tested on Ice Lake server platforms with SNC disabled, SNC-2 and SNC-4. The tests were also ran with and without kernel support for SNC. Series applies cleanly on kselftest/next. [1] https://lore.kernel.org/all/20240628215619.76401-1-tony.luck@intel.com/ Previous versions of this series: [v1] https://lore.kernel.org/all/cover.1709721159.git.maciej.wieczor-retman@inte… [v2] https://lore.kernel.org/all/cover.1715769576.git.maciej.wieczor-retman@inte… [v3] https://lore.kernel.org/all/cover.1719842207.git.maciej.wieczor-retman@inte… Maciej Wieczor-Retman (2): selftests/resctrl: Adjust effective L3 cache size with SNC enabled selftests/resctrl: Adjust SNC support messages tools/testing/selftests/resctrl/cat_test.c | 8 ++ tools/testing/selftests/resctrl/cmt_test.c | 10 +- tools/testing/selftests/resctrl/mba_test.c | 7 + tools/testing/selftests/resctrl/mbm_test.c | 9 +- tools/testing/selftests/resctrl/resctrl.h | 7 + .../testing/selftests/resctrl/resctrl_tests.c | 8 +- tools/testing/selftests/resctrl/resctrlfs.c | 130 ++++++++++++++++++ 7 files changed, 174 insertions(+), 5 deletions(-) -- 2.45.2

1 year, 1 month

2
8
0 0

[PATCH] selftests: riscv: Allow mmap test to compile on 32-bit

by Charlie Jenkins

Macros needed for 32-bit compilations were hidden behind 64-bit riscv ifdefs. Fix the 32-bit compilations by moving macros to allow the memory_layout test to run on 32-bit. Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com> Fixes: 73d05262a2ca ("selftests: riscv: Generalize mm selftests") --- tools/testing/selftests/riscv/mm/mmap_test.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/riscv/mm/mmap_test.h b/tools/testing/selftests/riscv/mm/mmap_test.h index 3b29ca3bb3d4..1c3313c201d5 100644 --- a/tools/testing/selftests/riscv/mm/mmap_test.h +++ b/tools/testing/selftests/riscv/mm/mmap_test.h @@ -48,11 +48,11 @@ uint32_t random_addresses[] = { }; #endif -// Only works on 64 bit -#if __riscv_xlen == 64 #define PROT (PROT_READ | PROT_WRITE) #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS) +// Only works on 64 bit +#if __riscv_xlen == 64 /* mmap must return a value that doesn't use more bits than the hint address. */ static inline unsigned long get_max_value(unsigned long input) { @@ -80,6 +80,8 @@ static inline unsigned long get_max_value(unsigned long input) }) #endif /* __riscv_xlen == 64 */ +#define TEST_MMAPS do { } while (0) + static inline int memory_layout(void) { void *value1 = mmap(NULL, sizeof(int), PROT, FLAGS, 0, 0); --- base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b change-id: 20240807-mmap_tests__fixes-651cc2b5fead -- - Charlie

1 year, 1 month

2
1
0 0

[PATCH bpf-next v4 0/8] libbpf, selftests/bpf: Support cross-endian usage

by Tony Ambardar

Hello all, This patch series targets a long-standing BPF usability issue - the lack of general cross-compilation support - by enabling cross-endian usage of libbpf and bpftool, as well as supporting cross-endian build targets for selftests/bpf. Benefits include improved BPF development and testing for embedded systems based on e.g. big-endian MIPS, more build options e.g for s390x systems, and better accessibility to the very latest test tools e.g. 'test_progs'. Initial development and testing used mips64, since this arch makes switching the build byte-order trivial and is thus very handy for A/B testing. However, it lacks some key features (bpf2bpf call, kfuncs, etc) making for poor selftests/bpf coverage. Final testing takes the kernel and selftests/bpf cross-built from x86_64 to s390x, and runs the result under QEMU/s390x. That same configuration could also be used on kernel-patches/bpf CI for regression testing endian support or perhaps load-sharing s390x builds across x86_64 systems. This thread includes some background regarding testing on QEMU/s390x and the generally favourable results: https://lore.kernel.org/bpf/ZsEcsaa3juxxQBUf@kodidev-ubuntu/ Feedback and suggestions are welcome! Best regards, Tony Changelog: --------- v3 -> v4: - fix a use-after-free ELF data-handling error causing rare CI failures - move bswap functions for func/line/core-relo records to internal header - use bswap functions also for info blobs in light skeleton v2 -> v3: (feedback from Andrii) - improve some log and commit message formatting - restructure BTF.ext endianness safety checks and byte-swapping - use BTF.ext info record definitions for swapping, require BTF v1 - follow BTF API implementation more closely for BTF.ext - explicitly reject loading non-native endianness program into kernel - simplify linker output byte-order setting - drop redundant safety checks during linking - simplify endianness macro and improve blob setup code for light skel - no unexpected test failures after cross-compiling x86_64 -> s390x v1 -> v2: - fixed a light skeleton bug causing test_progs 'map_ptr' failure - simplified some BTF.ext related endianness logic - remove an 'inline' usage related to CI checkpatch failure - improve some formatting noted by checkpatch warnings - unexpected 'test_progs' failures drop 3 -> 2 (x86_64 to s390x cross) Tony Ambardar (8): libbpf: Improve log message formatting libbpf: Fix header comment typos for BTF.ext libbpf: Fix output .symtab byte-order during linking libbpf: Support BTF.ext loading and output in either endianness libbpf: Support opening bpf objects of either endianness libbpf: Support linking bpf objects of either endianness libbpf: Support creating light skeleton of either endianness selftests/bpf: Support cross-endian building tools/lib/bpf/bpf_gen_internal.h | 1 + tools/lib/bpf/btf.c | 196 ++++++++++++++++++++++++--- tools/lib/bpf/btf.h | 3 + tools/lib/bpf/btf_dump.c | 2 +- tools/lib/bpf/btf_relocate.c | 2 +- tools/lib/bpf/gen_loader.c | 187 +++++++++++++++++++------ tools/lib/bpf/libbpf.c | 54 ++++++-- tools/lib/bpf/libbpf.map | 2 + tools/lib/bpf/libbpf_internal.h | 48 ++++++- tools/lib/bpf/linker.c | 92 ++++++++++--- tools/lib/bpf/relo_core.c | 2 +- tools/lib/bpf/skel_internal.h | 3 +- tools/testing/selftests/bpf/Makefile | 7 +- 13 files changed, 502 insertions(+), 97 deletions(-) -- 2.34.1

1 year, 2 months

4
27
0 0

[PATCH] selftests: mm: Fix build errors on armhf

by Muhammad Usama Anjum

The __NR_mmap isn't found on armhf. The mmap() is commonly available system call and its wrapper is presnet on all architectures. So it should be used directly. It solves problem for armhf and doesn't create problem for architectures as well. Remove sys_mmap() functions as they aren't doing anything else other than calling mmap(). There is no need to set errno = 0 manually as glibc always resets it. For reference errors are as following: CC seal_elf seal_elf.c: In function 'sys_mmap': seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function) 39 | sret = (void *) syscall(__NR_mmap, addr, len, prot, | ^~~~~~~~~ mseal_test.c: In function 'sys_mmap': mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function) 90 | sret = (void *) syscall(__NR_mmap, addr, len, prot, | ^~~~~~~~~ Cc: stable(a)vger.kernel.org Fixes: 4926c7a52de7 ("selftest mm/mseal memory sealing") Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> --- tools/testing/selftests/mm/mseal_test.c | 37 +++++++++---------------- tools/testing/selftests/mm/seal_elf.c | 13 +-------- 2 files changed, 14 insertions(+), 36 deletions(-) diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c index a818f010de479..bfcea5cf9a484 100644 --- a/tools/testing/selftests/mm/mseal_test.c +++ b/tools/testing/selftests/mm/mseal_test.c @@ -81,17 +81,6 @@ static int sys_mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot, return sret; } -static void *sys_mmap(void *addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long fd, unsigned long offset) -{ - void *sret; - - errno = 0; - sret = (void *) syscall(__NR_mmap, addr, len, prot, - flags, fd, offset); - return sret; -} - static int sys_munmap(void *ptr, size_t size) { int sret; @@ -172,7 +161,7 @@ static void setup_single_address(int size, void **ptrOut) { void *ptr; - ptr = sys_mmap(NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + ptr = mmap(NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); *ptrOut = ptr; } @@ -181,7 +170,7 @@ static void setup_single_address_rw(int size, void **ptrOut) void *ptr; unsigned long mapflags = MAP_ANONYMOUS | MAP_PRIVATE; - ptr = sys_mmap(NULL, size, PROT_READ | PROT_WRITE, mapflags, -1, 0); + ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, mapflags, -1, 0); *ptrOut = ptr; } @@ -205,7 +194,7 @@ bool seal_support(void) void *ptr; unsigned long page_size = getpagesize(); - ptr = sys_mmap(NULL, page_size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + ptr = mmap(NULL, page_size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); if (ptr == (void *) -1) return false; @@ -481,8 +470,8 @@ static void test_seal_zero_address(void) int prot; /* use mmap to change protection. */ - ptr = sys_mmap(0, size, PROT_NONE, - MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); + ptr = mmap(0, size, PROT_NONE, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); FAIL_TEST_IF_FALSE(ptr == 0); size = get_vma_size(ptr, &prot); @@ -1209,8 +1198,8 @@ static void test_seal_mmap_overwrite_prot(bool seal) } /* use mmap to change protection. */ - ret2 = sys_mmap(ptr, size, PROT_NONE, - MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); + ret2 = mmap(ptr, size, PROT_NONE, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); if (seal) { FAIL_TEST_IF_FALSE(ret2 == MAP_FAILED); FAIL_TEST_IF_FALSE(errno == EPERM); @@ -1240,8 +1229,8 @@ static void test_seal_mmap_expand(bool seal) } /* use mmap to expand. */ - ret2 = sys_mmap(ptr, size, PROT_READ, - MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); + ret2 = mmap(ptr, size, PROT_READ, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); if (seal) { FAIL_TEST_IF_FALSE(ret2 == MAP_FAILED); FAIL_TEST_IF_FALSE(errno == EPERM); @@ -1268,8 +1257,8 @@ static void test_seal_mmap_shrink(bool seal) } /* use mmap to shrink. */ - ret2 = sys_mmap(ptr, 8 * page_size, PROT_READ, - MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); + ret2 = mmap(ptr, 8 * page_size, PROT_READ, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); if (seal) { FAIL_TEST_IF_FALSE(ret2 == MAP_FAILED); FAIL_TEST_IF_FALSE(errno == EPERM); @@ -1650,7 +1639,7 @@ static void test_seal_discard_ro_anon_on_filebacked(bool seal) ret = fallocate(fd, 0, 0, size); FAIL_TEST_IF_FALSE(!ret); - ptr = sys_mmap(NULL, size, PROT_READ, mapflags, fd, 0); + ptr = mmap(NULL, size, PROT_READ, mapflags, fd, 0); FAIL_TEST_IF_FALSE(ptr != MAP_FAILED); if (seal) { @@ -1680,7 +1669,7 @@ static void test_seal_discard_ro_anon_on_shared(bool seal) int ret; unsigned long mapflags = MAP_ANONYMOUS | MAP_SHARED; - ptr = sys_mmap(NULL, size, PROT_READ, mapflags, -1, 0); + ptr = mmap(NULL, size, PROT_READ, mapflags, -1, 0); FAIL_TEST_IF_FALSE(ptr != (void *)-1); if (seal) { diff --git a/tools/testing/selftests/mm/seal_elf.c b/tools/testing/selftests/mm/seal_elf.c index 7aa1366063e4e..d9f8ba8d5050b 100644 --- a/tools/testing/selftests/mm/seal_elf.c +++ b/tools/testing/selftests/mm/seal_elf.c @@ -30,17 +30,6 @@ static int sys_mseal(void *start, size_t len) return sret; } -static void *sys_mmap(void *addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long fd, unsigned long offset) -{ - void *sret; - - errno = 0; - sret = (void *) syscall(__NR_mmap, addr, len, prot, - flags, fd, offset); - return sret; -} - static inline int sys_mprotect(void *ptr, size_t size, unsigned long prot) { int sret; @@ -56,7 +45,7 @@ static bool seal_support(void) void *ptr; unsigned long page_size = getpagesize(); - ptr = sys_mmap(NULL, page_size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + ptr = mmap(NULL, page_size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); if (ptr == (void *) -1) return false; -- 2.39.2

1 year, 2 months

4
6
0 0

[PATCH] kunit: tool: Build compile_commands.json

by Brendan Jackman

compile_commands.json is used by clangd[1] to provide code navigation and completion functionality to editors. See [2] for an example configuration that includes this functionality for VSCode. It can currently be built manually when using kunit.py, by running: ./scripts/clang-tools/gen_compile_commands.py -d .kunit With this change however, it's built automatically so you don't need to manually keep it up to date. Unlike the manual approach, having make build the compile_commands.json means that it appears in the build output tree instead of at the root of the source tree, so you'll need to add --compile-commands-dir= to your clangd args for it to be found. [1] https://clangd.llvm.org/ [2] https://github.com/FlorentRevest/linux-kernel-vscode Signed-off-by: Brendan Jackman <jackmanb(a)google.com> --- tools/testing/kunit/kunit_kernel.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py index 7254c110ff23..61931c4926fd 100644 --- a/tools/testing/kunit/kunit_kernel.py +++ b/tools/testing/kunit/kunit_kernel.py @@ -72,7 +72,8 @@ class LinuxSourceTreeOperations: raise ConfigError(e.output.decode()) def make(self, jobs: int, build_dir: str, make_options: Optional[List[str]]) -> None: - command = ['make', 'ARCH=' + self._linux_arch, 'O=' + build_dir, '--jobs=' + str(jobs)] + command = ['make', 'all', 'compile_commands.json', 'ARCH=' + self._linux_arch, + 'O=' + build_dir, '--jobs=' + str(jobs)] if make_options: command.extend(make_options) if self._cross_compile: --- base-commit: 3c999d1ae3c75991902a1a7dad0cb62c2a3008b4 change-id: 20240516-kunit-compile-commands-d994074fc2be Best regards, -- Brendan Jackman <jackmanb(a)google.com>

1 year, 2 months

3
2
0 0

[PATCH v2] selftest/powerpc/benchmark: remove requirement libc-dev

by Madhavan Srinivasan

Currently exec-target.c file is linked as static and this post a requirement to install libc dev package to build. Without it, build-break when compiling selftest/powerpc/benchmark. CC exec_target /usr/bin/ld: cannot find -lc: No such file or directory collect2: error: ld returned 1 exit status exec_target.c is using "syscall" library function which could be replaced with a inline assembly and the same is proposed as a fix here. Suggested-by: Michael Ellerman <mpe(a)ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy(a)linux.ibm.com> --- Chnagelog v1: - Add comment for clobber register and proper list of clobber registers as suggested by Michael Ellerman and Christophe Leroy .../selftests/powerpc/benchmarks/Makefile | 2 +- .../selftests/powerpc/benchmarks/exec_target.c | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/powerpc/benchmarks/Makefile b/tools/testing/selftests/powerpc/benchmarks/Makefile index 1321922038d0..ca4483c238b9 100644 --- a/tools/testing/selftests/powerpc/benchmarks/Makefile +++ b/tools/testing/selftests/powerpc/benchmarks/Makefile @@ -18,4 +18,4 @@ $(OUTPUT)/context_switch: LDLIBS += -lpthread $(OUTPUT)/fork: LDLIBS += -lpthread -$(OUTPUT)/exec_target: CFLAGS += -static -nostartfiles +$(OUTPUT)/exec_target: CFLAGS += -nostartfiles diff --git a/tools/testing/selftests/powerpc/benchmarks/exec_target.c b/tools/testing/selftests/powerpc/benchmarks/exec_target.c index c14b0fc1edde..a6408d3f26cd 100644 --- a/tools/testing/selftests/powerpc/benchmarks/exec_target.c +++ b/tools/testing/selftests/powerpc/benchmarks/exec_target.c @@ -7,10 +7,22 @@ */ #define _GNU_SOURCE -#include <unistd.h> #include <sys/syscall.h> void _start(void) { - syscall(SYS_exit, 0); + asm volatile ( + "li %%r0, %[sys_exit];" + "li %%r3, 0;" + "sc;" + : + : [sys_exit] "i" (SYS_exit) + /* + * "sc" will clobber r0, r3-r13, cr0, ctr, xer and memory. + * Even though sys_exit never returns, handle clobber + * registers. + */ + : "r0", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", + "r11", "r12", "r13", "cr0", "ctr", "xer", "memory" + ); } -- 2.45.2

1 year, 2 months

3
2
0 0

[PATCH v2 0/2] Exposing nice CPU usage to userspace

by Joshua Hahn

From: Joshua Hahn <joshua.hahn6(a)gmail.com> v1 -> v2: Edited commit messages for clarity. Niced CPU usage is a metric reported in host-level /prot/stat, but is not reported in cgroup-level statistics in cpu.stat. However, when a host contains multiple tasks across different workloads, it becomes difficult to gauge how much of the task is being spent on niced processes based on /proc/stat alone, since host-level metrics do not provide this cgroup-level granularity. Exposing this metric will allow users to accurately probe the niced CPU metric for each workload, and make more informed decisions when directing higher priority tasks. Joshua Hahn (2): Tracking cgroup-level niced CPU time Selftests for niced CPU statistics include/linux/cgroup-defs.h | 1 + kernel/cgroup/rstat.c | 16 ++++- tools/testing/selftests/cgroup/test_cpu.c | 72 +++++++++++++++++++++++ 3 files changed, 86 insertions(+), 3 deletions(-) -- 2.43.5

1 year, 2 months

3
6
0 0

[PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests

by Reinette Chatre

The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald Rapids systems. The test failures result from the following two properties of these systems: 1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl MBA and MBM selftests measure memory traffic for which a hardcoded 250MB buffer has been sufficient so far. On platforms with L3 cache larger than the buffer, the buffer fits in the L3 cache and thus no/very little memory traffic is generated during the "memory bandwidth" tests. 2) Some platform features, for example RAS features or memory performance features that generate memory traffic may drive accesses that are counted differently by performance counters and MBM respectively, for instance generating "overhead" traffic which is not counted against any specific RMID. Until now these counting differences have always been "in the noise". On Emerald Rapids systems the maximum MBA throttling (10% memory bandwidth) throttles memory bandwidth to where memory accesses by these other platform features push the memory bandwidth difference between memory controller performance counters and resctrl (MBM) beyond the tests' hardcoded tolerance. Make the tests more robust against platform variations: 1) Let the buffer used by memory bandwidth tests be guided by the size of the L3 cache. 2) Larger buffers require longer initialization time before the buffer can be used to measurement. Rework the tests to ensure that buffer initialization is complete before measurements start. 3) Do not compare performance counters and MBM measurements at low bandwidth. The value of "low" is hardcoded to 750MiB based on measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake systems. This limit is not applicable to AMD systems since it only applies to the MBA and MBM tests that are isolated to Intel. [1] https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-plat… Reinette Chatre (6): selftests/resctrl: Fix sparse warnings selftests/resctrl: Ensure measurements skip initialization of default benchmark selftests/resctrl: Simplify benchmark parameter passing selftests/resctrl: Use cache size to determine "fill_buf" buffer size selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth selftests/resctrl: Keep results from first test run tools/testing/selftests/resctrl/cmt_test.c | 33 +-- tools/testing/selftests/resctrl/fill_buf.c | 19 +- tools/testing/selftests/resctrl/mba_test.c | 26 +- tools/testing/selftests/resctrl/mbm_test.c | 25 +- tools/testing/selftests/resctrl/resctrl.h | 57 +++-- .../testing/selftests/resctrl/resctrl_tests.c | 15 +- tools/testing/selftests/resctrl/resctrl_val.c | 223 +++++------------- 7 files changed, 152 insertions(+), 246 deletions(-) -- 2.46.0

1 year, 2 months

2
29
0 0

[PATCH net-next v24 00/13] Device Memory TCP

by Mina Almasry

v24: https://patchwork.kernel.org/project/netdevbpf/list/?series=884556&state=* ==== Changes: - Fix failing ynl regen error. - Error path fixes & extack error messages in dmabuf binding. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v24/ v23: https://patchwork.kernel.org/project/netdevbpf/list/?series=882978&state=* ==== Fixing relatively minor issues called out in v22. (thanks again!) Mostly code cleanups, extack error messages, and minor reworks. Nothing major really changed, so the exact changes per commit is called in the commit messages. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v23/ v22: https://patchwork.kernel.org/project/netdevbpf/list/?series=881158&state=* ==== v22 aims to resolve the pending issue pointed to in v21, which is the interaction with xdp. In this series I rebase on top of the minor refactor which refactors propagating xdp configuration to slave devices: https://patchwork.kernel.org/project/netdevbpf/list/?series=881994&state=* I then disable setting xdp on devices using memory providers, and propagating xdp configuration to devices using memory providers. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v22/ v21: https://patchwork.kernel.org/project/netdevbpf/list/?series=880735&state=* ==== v20 addressed some comments and resolved a test failure, but introduced an unfortunate build error with a config edge case I wasn't testing. v21 simply resolves that error. Major Changes: - Resolve build error with CONFIG_PAGE_POOL=n && CONFIG_NET=y Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v21/ v20: https://patchwork.kernel.org/project/netdevbpf/list/?series=879373&state=* ==== v20 aims to resolve a couple of bug reports against v19, and addresses some review comments around the page_pool_check_memory_provider mechanism. Major changes: - Test edge cases such as header split disabled in selftest. - Change `offset = 0` back to `offset = offset - start` to resolve issue found in RX path by Taehee (thanks!) - Address a few comments around page_pool_check_memory_provider() from Pavel & Jakub. - Removed some unnecessary includes across various patches in the series. - Removed unnecessary EXPORT_SYMBOL(page_pool_mem_providers) (Jakub). - Fix regression caused by incorrect dev_get_max_mp_channel check, along with rename (Jakub). Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v20/ v19: https://patchwork.kernel.org/project/netdevbpf/list/?series=876852&state=* ==== v18 got a thorough review (thanks!), and this iteration addresses the feedback. Major changes: - Prevent deactivating mp bound queues. - Prevent installing xdp on mp bound netdevs, or installing mps on xdp installed netdevs. - Fix corner cases in netlink API vis-a-vis missing attributes. - Iron out the unreadable netmem driver support story. To be honest, the conversation with Jakub & Pavel got a bit confusing for me. I've implemented an approach in this set that makes sense to me, and AFAICT, addresses the requirements. It may be good as-is, or it may be a conversation starter/continuer. To be honest IMO there are many ways to skin this cat and I don't see an extremely strong reason to go for one approach over another. Here is one approach you may like. - Don't reset niov dma_addr on allocation & free. - Add some tests to the selftest that catches some of the issues around missing netlink attributes or deactivating mp-bound queues. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v19/ v18: https://patchwork.kernel.org/project/netdevbpf/list/?series=874848&state=* ==== v17 got minor feedback: (a) to beef up the description on patch 1 and (b) to remove the leading underscores in the header definition. I applied (a). (b) seems to be against current conventions so I did not apply before further discussion. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v17/ v17: https://patchwork.kernel.org/project/netdevbpf/list/?series=869900&state=* ==== v16 also got a very thorough review and some testing (thanks again!). Thes version addresses all the concerns reported on v15, in terms of feedback and issues reported. Major changes: - Use ASSERT_RTNL. - Moved around some of the page_pool helpers definitions so I can hide some netmem helpers in private files as Jakub suggested. - Don't make every net_iov hold a ref on the binding as Jakub suggested. - Fix issue reported by Taehee where we access queues after they have been freed. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v17/ v16: https://patchwork.kernel.org/project/netdevbpf/list/?series=866353&state=* ==== v15 got a thorough review and some testing, and this version addresses almost all the feedback. Some more minor comments where the authors said it could be done later, I left out. Major changes: - Addition of dma-buf introspection to page-pool-get and queue-get. - Fixes to selftests suggested by Taehee. - Fixes to documentation suggested by Donald. - A couple of suggestions and fixes to TCP patches by Eric and David. - Fixes to number assignements suggested by Arnd. - Use rtnl_lock()ing to guard against queue reconfiguration while the page_pool initialization is happening. (Jakub). - Fixes to a few warnings reproduced by Taehee. - Fixes to dma-buf binding suggested by Taehee and Jakub. - Fixes to netlink UAPI suggested by Jakub - Applied a number of Reviewed-bys and Acked-bys (including ones I lost from v13+). Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v16/ One caveat: Taehee reproduced a KASAN warning and reported it here: https://lore.kernel.org/netdev/CAMArcTUdCxOBYGF3vpbq=eBvqZfnc44KBaQTN7H-wqd… I estimate the issue to be minor and easily fixable: https://lore.kernel.org/netdev/CAHS8izNgaqC--GGE2xd85QB=utUnOHmioCsDd1TNxJW… I hope to be able to follow up with a fix to net tree as net-next closes imminently, but if this iteration doesn't make it in, I will repost with a fix squashed after net-next reopens, no problem. v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=* ==== No material changes in this version, only a fix to linking against libynl.a from the last version. Per Jakub's instructions I've pulled one of his patches into this series, and now use the new libynl.a correctly, I hope. As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v15/ v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=… ==== No material changes in this version. Only rebase and re-verification on top of net-next. v13, I think, raced with commit ebad6d0334793 ("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to net-next that caused a patchwork failure to apply. This series should apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded IP_GRE config"). I did not wait the customary 24hr as Jakub said it's OK to repost as soon as I build test the rebased version: https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/ v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=… ==== Major changes: -------------- This iteration addresses Pavel's review comments, applies his reviewed-by's, and seeks to fix the patchwork build error (sorry!). As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v13/ v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=* ==== Major changes: -------------- This iteration only addresses one minor comment from Pavel with regards to the trace printing of netmem, and the patchwork build error introduced in v11 because I missed doing an allmodconfig build, sorry. Other than that v11, AFAICT, received no feedback. There is one discussion about how the specifics of plugging io uring memory through the page pool, but not relevant to content in this particular patchset, AFAICT. As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v12/ v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=* ==== Major Changes: -------------- v11 addresses feedback received in v10. The major change is the removal of the memory provider ops as requested by Christoph. We still accomplish the same thing, but utilizing direct function calls with if statements rather than generic ops. Additionally address sparse warnings, bugs and review comments from folks that reviewed. As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v11/ Detailed changelog: ------------------- - Fixes in netdev_rx_queue_restart() from Pavel & David. - Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for custom page providers") from the series to address Christoph's feedback and rebased other patches on the series on this change. - Fixed build errors with CONFIG_DMA_SHARED_BUFFER && !CONFIG_GENERIC_ALLOCATOR build. - Fixed sparse warnings pointed out by Paolo. - Drop unnecessary gro_pull_from_frag0 checks. - Added Bagas reviewed-by to docs. v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=* ==== Major Changes: -------------- v9 was sent right before the merge window closed (sorry!). v10 is almost a re-send of the series now that the merge window re-opened. Only rebased to latest net-next and addressed some minor iterative comments received on v9. As usual, the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v10/ Detailed changelog: ------------------- - Fixed tokens leaking in DONTNEED setsockopt (Nikolay). - Moved net_iov_dma_addr() to devmem.c and made it a devmem specific helpers (David). - Rename hook alloc_pages to alloc_netmems as alloc_pages is now preprocessor macro defined and causes a build error. v9: === Major Changes: -------------- GVE queue API has been merged. Submitting this version as non-RFC after rebasing on top of the merged API, and dropped the out of tree queue API I was carrying on github. Addressed the little feedback v8 has received. Detailed changelog: ------------------ - Added new patch from David Wei to this series for netdev_rx_queue_restart() - Fixed sparse error. - Removed CONFIG_ checks in netmem_is_net_iov() - Flipped skb->readable to skb->unreadable - Minor fixes to selftests & docs. RFC v8: ======= Major Changes: -------------- - Fixed build error generated by patch-by-patch build. - Applied docs suggestions from Randy. RFC v7: ======= Major Changes: -------------- This revision largely rebases on top of net-next and addresses the feedback RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel. The series remains in RFC because the queue-API ndos defined in this series are not yet implemented. I have a GVE implementation I carry out of tree for my testing. A upstreamable GVE implementation is in the works. Aside from that, in my estimation all the patches are ready for review/merge. Please do take a look. As usual the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v7/ Detailed changelog: - Use admin-perm in netlink API. - Addressed feedback from Jakub with regards to netlink API implementation. - Renamed devmem.c functions to something more appropriate for that file. - Improve the performance seen through the page_pool benchmark. - Fix the value definition of all the SO_DEVMEM_* uapi. - Various fixes to documentation. Perf - page-pool benchmark: --------------------------- Improved performance of bench_page_pool_simple.ko tests compared to v6: https://pastebin.com/raw/v5dYRg8L net-next base: 8 cycle fast path. RFC v6: 10 cycle fast path. RFC v7: 9 cycle fast path. RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path, same as baseline. Perf - Devmem TCP benchmark: --------------------- Perf is about the same regardless of the changes in v7, namely the removal of the static_branch_unlikely to improve the page_pool benchmark performance: 189/200gbps bi-directional throughput with RX devmem TCP and regular TCP TX i.e. ~95% line rate. RFC v6: ======= Major Changes: -------------- This revision largely rebases on top of net-next and addresses the little feedback RFCv5 received. The series remains in RFC because the queue-API ndos defined in this series are not yet implemented. I have a GVE implementation I carry out of tree for my testing. A upstreamable GVE implementation is in the works. Aside from that, in my estimation all the patches are ready for review/merge. Please do take a look. As usual the full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v6/ This version also comes with some performance data recorded in the cover letter (see below changelog). Detailed changelog: - Rebased on top of the merged netmem_ref changes. - Converted skb->dmabuf to skb->readable (Pavel). Pavel's original suggestion was to remove the skb->dmabuf flag entirely, but when I looked into it closely, I found the issue that if we remove the flag we have to dereference the shinfo(skb) pointer to obtain the first frag to tell whether an skb is readable or not. This can cause a performance regression if it dirties the cache line when the shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf flag into a generic skb->readable flag which can be re-used by io_uring 0-copy RX. - Squashed a few locking optimizations from Eric Dumazet in the RX path and the DEVMEM_DONTNEED setsockopt. - Expanded the tests a bit. Added validation for invalid scenarios and added some more coverage. Perf - page-pool benchmark: --------------------------- bench_page_pool_simple.ko tests with and without these changes: https://pastebin.com/raw/ncHDwAbn AFAIK the number that really matters in the perf tests is the 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8 cycles without the changes but there is some 1 cycle noise in some results. With the patches this regresses to 9 cycles with the changes but there is 1 cycle noise occasionally running this test repeatedly. Lastly I tried disable the static_branch_unlikely() in netmem_is_net_iov() check. To my surprise disabling the static_branch_unlikely() check reduces the fast path back to 8 cycles, but the 1 cycle noise remains. Perf - Devmem TCP benchmark: --------------------- 189/200gbps bi-directional throughput with RX devmem TCP and regular TCP TX i.e. ~95% line rate. Major changes in RFC v5: ======================== 1. Rebased on top of 'Abstract page from net stack' series and used the new netmem type to refer to LSB set pointers instead of re-using struct page. 2. Downgraded this series back to RFC and called it RFC v5. This is because this series is now dependent on 'Abstract page from net stack'[1] and the queue API. Both are removed from the series to reduce the patch # and those bits are fairly independent or pre-requisite work. 3. Reworked the page_pool devmem support to use netmem and for some more unified handling. 4. Reworked the reference counting of net_iov (renamed from page_pool_iov) to use pp_ref_count for refcounting. The full changes including the dependent series and GVE page pool support is here: https://github.com/mina/linux/commits/tcpdevmem-rfcv5/ [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774 Major changes in v1: ==================== 1. Implemented MVP queue API ndos to remove the userspace-visible driver reset. 2. Fixed issues in the napi_pp_put_page() devmem frag unref path. 3. Removed RFC tag. Many smaller addressed comments across all the patches (patches have individual change log). Full tree including the rest of the GVE driver changes: https://github.com/mina/linux/commits/tcpdevmem-v1 Changes in RFC v3: ================== 1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the series reviewable and mergeable. 2. Implemented multi-rx-queue binding which was a todo in v2. 3. Fix to cmsg handling. The sticking point in RFC v2[2] was the device reset required to refill the device rx-queues after the dmabuf bind/unbind. The solution suggested as I understand is a subset of the per-queue management ops Jakub suggested or similar: https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/ This is not addressed in this revision, because: 1. This point was discussed at netconf & netdev and there is openness to using the current approach of requiring a device reset. 2. Implementing individual queue resetting seems to be difficult for my test bed with GVE. My prototype to test this ran into issues with the rx-queues not coming back up properly if reset individually. At the moment I'm unsure if it's a mistake in the POC or a genuine issue in the virtualization stack behind GVE, which currently doesn't test individual rx-queue restart. 3. Our usecases are not bothered by requiring a device reset to refill the buffer queues, and we'd like to support NICs that run into this limitation with resetting individual queues. My thought is that drivers that have trouble with per-queue configs can use the support in this series, while drivers that support new netdev ops to reset individual queues can automatically reset the queue as part of the dma-buf bind/unbind. The same approach with device resets is presented again for consideration with other sticking points addressed. This proposal includes the rx devmem path only proposed for merge. For a snapshot of my entire tree which includes the GVE POC page pool support & device memory support: https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3 [1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.… [2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4… Changes in RFC v2: ================== The sticking point in RFC v1[1] was the dma-buf pages approach we used to deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept that attempts to resolve this by implementing scatterlist support in the networking stack, such that we can import the dma-buf scatterlist directly. This is the approach proposed at a high level here[2]. Detailed changes: 1. Replaced dma-buf pages approach with importing scatterlist into the page pool. 2. Replace the dma-buf pages centric API with a netlink API. 3. Removed the TX path implementation - there is no issue with implementing the TX path with scatterlist approach, but leaving out the TX path makes it easier to review. 4. Functionality is tested with this proposal, but I have not conducted perf testing yet. I'm not sure there are regressions, but I removed perf claims from the cover letter until they can be re-confirmed. 5. Added Signed-off-by: contributors to the implementation. 6. Fixed some bugs with the RX path since RFC v1. Any feedback welcome, but specifically the biggest pending questions needing feedback IMO are: 1. Feedback on the scatterlist-based approach in general. 2. Netlink API (Patch 1 & 2). 3. Approach to handle all the drivers that expect to receive pages from the page pool (Patch 6). [1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c… [2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX… ================== * TL;DR: Device memory TCP (devmem TCP) is a proposal for transferring data to and/or from device memory efficiently, without bouncing the data to a host memory buffer. * Problem: A large amount of data transfers have device memory as the source and/or destination. Accelerators drastically increased the volume of such transfers. Some examples include: - ML accelerators transferring large amounts of training data from storage into GPU/TPU memory. In some cases ML training setup time can be as long as 50% of TPU compute time, improving data transfer throughput & efficiency can help improving GPU/TPU utilization. - Distributed training, where ML accelerators, such as GPUs on different hosts, exchange data among them. - Distributed raw block storage applications transfer large amounts of data with remote SSDs, much of this data does not require host processing. Today, the majority of the Device-to-Device data transfers the network are implemented as the following low level operations: Device-to-Host copy, Host-to-Host network transfer, and Host-to-Device copy. The implementation is suboptimal, especially for bulk data transfers, and can put significant strains on system resources, such as host memory bandwidth, PCIe bandwidth, etc. One important reason behind the current state is the kernel’s lack of semantics to express device to network transfers. * Proposal: In this patch series we attempt to optimize this use case by implementing socket APIs that enable the user to: 1. send device memory across the network directly, and 2. receive incoming network packets directly into device memory. Packet _payloads_ go directly from the NIC to device memory for receive and from device memory to NIC for transmit. Packet _headers_ go to/from host memory and are processed by the TCP/IP stack normally. The NIC _must_ support header split to achieve this. Advantages: - Alleviate host memory bandwidth pressure, compared to existing network-transfer + device-copy semantics. - Alleviate PCIe BW pressure, by limiting data transfer to the lowest level of the PCIe tree, compared to traditional path which sends data through the root complex. * Patch overview: ** Part 1: netlink API Gives user ability to bind dma-buf to an RX queue. ** Part 2: scatterlist support Currently the standard for device memory sharing is DMABUF, which doesn't generate struct pages. On the other hand, networking stack (skbs, drivers, and page pool) operate on pages. We have 2 options: 1. Generate struct pages for dmabuf device memory, or, 2. Modify the networking stack to process scatterlist. Approach #1 was attempted in RFC v1. RFC v2 implements approach #2. ** part 3: page pool support We piggy back on page pool memory providers proposal: https://github.com/kuba-moo/linux/tree/pp-providers It allows the page pool to define a memory provider that provides the page allocation and freeing. It helps abstract most of the device memory TCP changes from the driver. ** part 4: support for unreadable skb frags Page pool iovs are not accessible by the host; we implement changes throughput the networking stack to correctly handle skbs with unreadable frags. ** Part 5: recvmsg() APIs We define user APIs for the user to send and receive device memory. Not included with this series is the GVE devmem TCP support, just to simplify the review. Code available here if desired: https://github.com/mina/linux/tree/tcpdevmem This series is built on top of net-next with Jakub's pp-providers changes cherry-picked. * NIC dependencies: 1. (strict) Devmem TCP require the NIC to support header split, i.e. the capability to split incoming packets into a header + payload and to put each into a separate buffer. Devmem TCP works by using device memory for the packet payload, and host memory for the packet headers. 2. (optional) Devmem TCP works better with flow steering support & RSS support, i.e. the NIC's ability to steer flows into certain rx queues. This allows the sysadmin to enable devmem TCP on a subset of the rx queues, and steer devmem TCP traffic onto these queues and non devmem TCP elsewhere. The NIC I have access to with these properties is the GVE with DQO support running in Google Cloud, but any NIC that supports these features would suffice. I may be able to help reviewers bring up devmem TCP on their NICs. * Testing: The series includes a udmabuf kselftest that show a simple use case of devmem TCP and validates the entire data path end to end without a dependency on a specific dmabuf provider. ** Test Setup Kernel: net-next with this series and memory provider API cherry-picked locally. Hardware: Google Cloud A3 VMs. NIC: GVE with header split & RSS & flow steering support. Cc: Pavel Begunkov <asml.silence(a)gmail.com> Cc: David Wei <dw(a)davidwei.uk> Cc: Jason Gunthorpe <jgg(a)ziepe.ca> Cc: Yunsheng Lin <linyunsheng(a)huawei.com> Cc: Shailend Chand <shailend(a)google.com> Cc: Harshitha Ramamurthy <hramamurthy(a)google.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Jeroen de Borst <jeroendb(a)google.com> Cc: Praveen Kaligineedi <pkaligineedi(a)google.com> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com> Cc: Steven Rostedt <rostedt(a)goodmis.org> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: Nikolay Aleksandrov <razor(a)blackwall.org> Cc: Taehee Yoo <ap420073(a)gmail.com> Cc: Donald Hunter <donald.hunter(a)gmail.com> Mina Almasry (13): netdev: add netdev_rx_queue_restart() net: netdev netlink api to bind dma-buf to a net device netdev: support binding dma-buf to netdevice netdev: netdevice devmem allocator page_pool: devmem support memory-provider: dmabuf devmem memory provider net: support non paged skb frags net: add support for skbs with unreadable frags tcp: RX path for devmem TCP net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags net: add devmem TCP documentation selftests: add ncdevmem, netcat for devmem TCP netdev: add dmabuf introspection Documentation/netlink/specs/netdev.yaml | 61 +++ Documentation/networking/devmem.rst | 269 +++++++++++ Documentation/networking/index.rst | 1 + arch/alpha/include/uapi/asm/socket.h | 6 + arch/mips/include/uapi/asm/socket.h | 6 + arch/parisc/include/uapi/asm/socket.h | 6 + arch/sparc/include/uapi/asm/socket.h | 6 + include/linux/netdevice.h | 2 + include/linux/skbuff.h | 61 ++- include/linux/skbuff_ref.h | 9 +- include/linux/socket.h | 1 + include/net/devmem.h | 136 ++++++ include/net/mp_dmabuf_devmem.h | 44 ++ include/net/netdev_rx_queue.h | 5 + include/net/netmem.h | 163 ++++++- include/net/page_pool/helpers.h | 39 +- include/net/page_pool/types.h | 22 +- include/net/sock.h | 2 + include/net/tcp.h | 5 +- include/trace/events/page_pool.h | 12 +- include/uapi/asm-generic/socket.h | 6 + include/uapi/linux/netdev.h | 13 + include/uapi/linux/uio.h | 17 + net/Kconfig | 5 + net/core/Makefile | 2 + net/core/datagram.c | 6 + net/core/dev.c | 28 +- net/core/devmem.c | 388 ++++++++++++++++ net/core/gro.c | 3 +- net/core/netdev-genl-gen.c | 23 + net/core/netdev-genl-gen.h | 6 + net/core/netdev-genl.c | 137 +++++- net/core/netdev_rx_queue.c | 81 ++++ net/core/netmem_priv.h | 31 ++ net/core/page_pool.c | 117 +++-- net/core/page_pool_priv.h | 46 ++ net/core/page_pool_user.c | 31 +- net/core/skbuff.c | 77 +++- net/core/sock.c | 68 +++ net/ethtool/common.c | 8 + net/ipv4/esp4.c | 3 +- net/ipv4/tcp.c | 261 ++++++++++- net/ipv4/tcp_input.c | 13 +- net/ipv4/tcp_ipv4.c | 16 + net/ipv4/tcp_minisocks.c | 2 + net/ipv4/tcp_output.c | 5 +- net/ipv6/esp6.c | 3 +- net/packet/af_packet.c | 4 +- net/xdp/xsk_buff_pool.c | 5 + tools/include/uapi/linux/netdev.h | 13 + tools/net/ynl/lib/.gitignore | 1 + tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 9 + tools/testing/selftests/net/ncdevmem.c | 570 ++++++++++++++++++++++++ 54 files changed, 2731 insertions(+), 124 deletions(-) create mode 100644 Documentation/networking/devmem.rst create mode 100644 include/net/devmem.h create mode 100644 include/net/mp_dmabuf_devmem.h create mode 100644 net/core/devmem.c create mode 100644 net/core/netdev_rx_queue.c create mode 100644 net/core/netmem_priv.h create mode 100644 tools/testing/selftests/net/ncdevmem.c -- 2.46.0.469.g59c65b2a67-goog

1 year, 2 months

3
21
0 0

[PATCH] kunit: Fix kernel-doc for EXPORT_SYMBOL_IF_KUNIT

by Michal Wajdeczko

While kunit/visibility.h is today not included in any generated kernel documentation, also likely due to the fact that none of the existing comments are correctly recognized as kernel-doc, but once we decide to add this header and fix the tool, there will be: ../include/kunit/visibility.h:61: warning: Function parameter or struct member 'symbol' not described in 'EXPORT_SYMBOL_IF_KUNIT' Signed-off-by: Michal Wajdeczko <michal.wajdeczko(a)intel.com> --- Cc: Rae Moar <rmoar(a)google.com> Cc: David Gow <davidgow(a)google.com> --- include/kunit/visibility.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/kunit/visibility.h b/include/kunit/visibility.h index 0dfe35feeec6..efff77b58dd6 100644 --- a/include/kunit/visibility.h +++ b/include/kunit/visibility.h @@ -22,6 +22,7 @@ * EXPORTED_FOR_KUNIT_TESTING namespace only if CONFIG_KUNIT is * enabled. Must use MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING) * in test file in order to use symbols. + * @symbol: the symbol identifier to export */ #define EXPORT_SYMBOL_IF_KUNIT(symbol) EXPORT_SYMBOL_NS(symbol, \ EXPORTED_FOR_KUNIT_TESTING) -- 2.43.0

1 year, 2 months

3
5
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror August 2024