April 2025 - Linux-kselftest-mirror

[PATCH v3 0/3] introduce PIDFD_SELF* sentinels

by Lorenzo Stoakes

If you wish to utilise a pidfd interface to refer to the current process or
thread, it is rather cumbersome, requiring something like:

	int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);

	...

	close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process or thread.

This series introduces sentinels for this purpose, which can be passed as
the pidfd in this instance rather than having to establish a dummy fd.

It is useful to refer to both the current thread from the userland's
perspective, for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.

There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process from the kernel perspective, and a userland process is a thread
group (more specifically the thread group leader) from the kernel
perspective. We therefore alias things thusly:

* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP aliased by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.

In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.

This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.

We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.

In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it would
require work to implement this functionality there that would be of no use.

We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.

v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.

v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl…

Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
  a good idea, but perhaps some debate to be had on implementation. It seems
  sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
  PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…

RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…

Lorenzo Stoakes (3):
  pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
  pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
  selftests: pidfd: add tests for PIDFD_SELF_*

 include/linux/pid.h                           |  43 +++++-
 include/uapi/linux/pidfd.h                    |  15 ++
 kernel/exit.c                                 |   3 +-
 kernel/nsproxy.c                              |   1 +
 kernel/pid.c                                  |  73 ++++++---
 kernel/signal.c                               |  26 +---
 tools/testing/selftests/pidfd/pidfd.h         |   8 +
 .../selftests/pidfd/pidfd_getfd_test.c        | 141 ++++++++++++++++++
 .../selftests/pidfd/pidfd_setns_test.c        |  11 ++
 tools/testing/selftests/pidfd/pidfd_test.c    |  76 ++++++++--
 10 files changed, 342 insertions(+), 55 deletions(-)

-- 
2.46.2

7 months, 1 week


[PATCH RFC v7 0/8] Add NUMA mempolicy support for KVM guest-memfd

by Shivank Garg

KVM's guest-memfd memory backend currently lacks support for NUMA policy
enforcement, causing guest memory allocations to be distributed arbitrarily
across host NUMA nodes regardless of the policy specified by the VMM. This
occurs because conventional userspace NUMA control mechanisms like mbind()
are ineffective with guest-memfd, as the memory isn't directly mapped to
userspace when allocations occur.

This patch-series adds NUMA-aware memory placement for guest_memfd backed
KVM guests. Based on community feedback, the approach has evolved as follows:

- v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy.
- v3: Introduced fbind() syscall for VMM memory-placement configuration.
- v4-v6: Current approach using shared_policy support and vm_ops (based on
  suggestions from David[1] and guest_memfd biweekly upstream calls[2][4]).
- v7: Use inodes to store NUMA policy instead of file[5].

== Implementation ==

This series implements proper NUMA policy support for guest-memfd by:

1. Adding mempolicy-aware allocation APIs to the filemap layer.
2. Adding custom inodes (via a dedicated slab-allocated inode cache,
   kvm_gmem_inode_info) to store NUMA policy and metadata for guest memory.
3. Implementing get/set_policy vm_ops in guest_memfd to support shared policy.

With these changes, VMMs can now control guest memory placement by
specifying:

- Policy modes: default, bind, interleave, or preferred
- Host NUMA nodes: List of target nodes for memory allocation

Policies only affect future allocations and do not migrate existing memory.
This matches mbind(2)'s default behavior, which affects only new allocations
unless overridden with the MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags (not
supported for guest_memfd as it is unmovable).

This series builds on the existing guest-memfd support in KVM and provides a
clean integration path for NUMA-aware memory management in confidential
computing environments. The work is primarily focused on supporting SEV-SNP
requirements, though the benefits extend to any VMM using the guest-memfd
backend that needs control over guest memory placement.

== Example usage with QEMU (requires patched QEMU from [3]) ==

Snippet of the QEMU changes[3] needed to support this feature:

	/* Create and map guest-memfd region */
	new_block->guest_memfd = kvm_create_guest_memfd(
			new_block->max_length, 0, errp);
	...
	void *ptr_memfd = mmap(NULL, new_block->max_length,
			       PROT_READ | PROT_WRITE, MAP_SHARED,
			       new_block->guest_memfd, 0);
	...
	/* Apply NUMA policy */
	int ret = mbind(ptr_memfd, new_block->max_length,
			backend->policy, backend->host_nodes,
			maxnode+1, 0);
	...

QEMU command to run an SEV-SNP guest with interleaved memory across nodes 0
and 1 of the host:

$ qemu-system-x86_64 \
   -enable-kvm \
  ...
   -machine memory-encryption=sev0,vmport=off \
   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
   -numa node,nodeid=0,memdev=ram0,cpus=0-15 \
   -object memory-backend-memfd,id=ram0,host-nodes=0-1,policy=interleave,size=1024M,share=true,prealloc=false

== Experiment and Analysis ==

SEV-SNP enabled host, AMD Zen 3, 2 socket 2 NUMA node system
NUMA policy for guest node 0: policy=interleave, host-node=0-1

Test: Allocate and touch 50GB inside guest on node=0.

* Generic Kernel (without NUMA supported guest-memfd):
                          Node 0          Node 1           Total
  Before running Test:
  MemUsed                9981.60         3312.00        13293.60
  After running Test:
  MemUsed               61451.72         3201.62        64653.34

  Arbitrary allocations: all ~50GB allocated on node 0.

* With NUMA supported guest-memfd:
                          Node 0          Node 1           Total
  Before running Test:
  MemUsed                5003.88         3963.07         8966.94
  After running Test:
  MemUsed               30607.55        29670.00        60277.55

  Balanced memory distribution: Equal increase (~25GB) on both nodes.

== Conclusion ==

Adding NUMA-aware memory management to guest_memfd makes sense: it improves
the performance of memory-intensive and locality-sensitive workloads by
giving fine-grained control over guest memory allocations, as shown in the
analysis above.

Please review and provide feedback!

Thanks,
Shivank

[1] https://lore.kernel.org/all/6fbef654-36e2-4be5-906e-2a648a845278@redhat.com
[2] https://lore.kernel.org/all/6f2bfac2-d9e7-4e4a-9298-7accded16b4f@redhat.com
[3] https://github.com/shivankgarg98/qemu/tree/guest_memfd_mbind_NUMA
[4] https://lore.kernel.org/all/2b77e055-98ac-43a1-a7ad-9f9065d7f38f@amd.com
[5] https://lore.kernel.org/all/diqzbjumm167.fsf@ackerleytng-ctop.c.googlers.com

== Earlier postings and changelogs ==

v7 (current):
- Add fixes suggested by Vlastimil and Ackerley.
- Store NUMA policy in custom inode struct instead of file.

v6:
- https://lore.kernel.org/all/20250226082549.6034-1-shivankg@amd.com
- Rebase to linux mainline
- Drop RFC tag
- Add selftests to ensure NUMA support for guest_memfd works correctly.

v5:
- https://lore.kernel.org/all/20250219101559.414878-1-shivankg@amd.com
- Fix documentation and style issues.
- Use EXPORT_SYMBOL_GPL
- Split preparatory change in separate patch

v4:
- https://lore.kernel.org/all/20250210063227.41125-1-shivankg@amd.com
- Dropped fbind() approach in favor of shared policy support.

v3:
- https://lore.kernel.org/all/20241105164549.154700-1-shivankg@amd.com
- Introduce fbind() syscall and drop the IOCTL-based approach.

v2:
- https://lore.kernel.org/all/20240919094438.10987-1-shivankg@amd.com
- Add fixes suggested by Matthew Wilcox.

v1:
- https://lore.kernel.org/all/20240916165743.201087-1-shivankg@amd.com
- Proposed IOCTL based approach to pass NUMA mempolicy.

Ackerley Tng (1):
  KVM: guest_memfd: Make guest mem use guest mem inodes instead of
    anonymous inodes

Shivank Garg (6):
  mm/mempolicy: Export memory policy symbols
  security: Export security_inode_init_security_anon for KVM guest_memfd
  KVM: Add kvm_gmem_exit() cleanup function
  KVM: guest_memfd: Add slab-allocated inode cache
  KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
  KVM: guest_memfd: selftests: Add tests for mmap and NUMA policy support

Shivansh Dhiman (1):
  mm/filemap: Add mempolicy support to the filemap layer

 include/linux/pagemap.h                       |  41 +++
 include/uapi/linux/magic.h                    |   1 +
 mm/filemap.c                                  |  27 +-
 mm/mempolicy.c                                |   6 +
 security/security.c                           |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  |  86 +++++-
 virt/kvm/guest_memfd.c                        | 261 ++++++++++++++++--
 virt/kvm/kvm_main.c                           |   2 +
 virt/kvm/kvm_mm.h                             |   6 +
 9 files changed, 402 insertions(+), 29 deletions(-)

-- 
2.34.1

7 months, 1 week


[PATCH] clk: test: Forward-declare struct of_phandle_args in kunit/clk.h

by Richard Fitzgerald

Add a forward-declare of struct of_phandle_args to prevent the compiler
warning:

../include/kunit/clk.h:29:63: warning: ‘struct of_phandle_args’ declared
inside parameter list will not be visible outside of this definition or
declaration
   struct clk_hw *(*get)(struct of_phandle_args *clkspec, void *data),

Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
---
 include/kunit/clk.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/kunit/clk.h b/include/kunit/clk.h
index 0afae7688157..f226044cc78d 100644
--- a/include/kunit/clk.h
+++ b/include/kunit/clk.h
@@ -6,6 +6,7 @@ struct clk;
 struct clk_hw;
 struct device;
 struct device_node;
+struct of_phandle_args;
 struct kunit;
 
 struct clk *
-- 
2.43.0

7 months, 1 week


[PATCH v5 net-next 00/15] AccECN protocol patch series

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>

Hello,

Please find the v5:

v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)

v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)

v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)

v2 (18-Mar-2025)
- Add one missing patch from previous AccECN protocol preparation patch
  series to this patch series

The full patch series can be found in
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28

Best regards,
Chia-Yu

Chia-Yu Chang (1):
  tcp: accecn: AccECN option failure handling

Ilpo Järvinen (14):
  tcp: reorganize SYN ECN code
  tcp: fast path functions later
  tcp: AccECN core
  tcp: accecn: AccECN negotiation
  tcp: accecn: add AccECN rx byte counters
  tcp: accecn: AccECN needs to know delivered bytes
  tcp: allow embedding leftover into option padding
  tcp: sack option handling improvements
  tcp: accecn: AccECN option
  tcp: accecn: AccECN option send control
  tcp: accecn: AccECN option ceb/cep heuristic
  tcp: accecn: AccECN ACE field multi-wrap heuristic
  tcp: accecn: try to fit AccECN option with SACK
  tcp: try to avoid safer when ACKs are thinned

 include/linux/tcp.h        |  27 +-
 include/net/netns/ipv4.h   |   2 +
 include/net/tcp.h          | 198 +++++++++++--
 include/uapi/linux/tcp.h   |   7 +
 net/ipv4/syncookies.c      |   3 +
 net/ipv4/sysctl_net_ipv4.c |  19 ++
 net/ipv4/tcp.c             |  26 +-
 net/ipv4/tcp_input.c       | 591 +++++++++++++++++++++++++++++++++++--
 net/ipv4/tcp_ipv4.c        |   5 +-
 net/ipv4/tcp_minisocks.c   |  92 +++++-
 net/ipv4/tcp_output.c      | 302 +++++++++++++++++--
 net/ipv6/syncookies.c      |   1 +
 net/ipv6/tcp_ipv6.c        |   1 +
 13 files changed, 1178 insertions(+), 96 deletions(-)

-- 
2.34.1

7 months, 1 week


[PATCH v8 00/10] Basic SEV-SNP Selftests

by Pratik R. Sampat

This patch series extends the sev_init2 and the sev_smoke test to exercise
the SEV-SNP VM launch workflow.

Primarily, it introduces the architectural defines, their support in the SEV
library and extends the tests to interact with the SEV-SNP ioctl() wrappers.

Patch 1  - Do not advertise SNP on initialization failure
Patch 2  - SNP test for KVM_SEV_INIT2
Patch 3  - Add vmgexit helper
Patch 4  - Add SMT control interface helper
Patch 5  - Replace assert() with TEST_ASSERT_EQ()
Patch 6  - Introduce SEV+ VM type check
Patch 7  - SNP ioctl() plumbing for the SEV library
Patch 8  - Force set GUEST_MEMFD for SNP
Patch 9  - Cleanups of smoke test - Decouple policy from type
Patch 10 - SNP smoke test

The series is based on
	git.kernel.org/pub/scm/virt/kvm/kvm.git next

v7..v8:
* Dropped exporting the SNP initialized API from ccp to KVM. Instead call
  SNP_PLATFORM_STATUS within KVM to query the initialization. (Tom)

  While it may be cheaper to query sev->snp_initialized from ccp, making the
  SNP platform call within KVM does away with any dependencies.

v6..v7:
https://lore.kernel.org/kvm/20250221210200.244405-7-prsampat@amd.com/
Based on comments from Sean -
* Replaced FW check with sev->snp_initialized
* Dropped the patch which removes SEV+ KVM advertisement if INIT fails.
  This should now be resolved by the combination of the patches [1,2] from
  Ashish.
* Change vmgexit to an inline function
* Export SMT control parsing interface to kvm_util
  Note: hyperv_cpuid KST only compile tested
* Replace assert() with TEST_ASSERT_EQ() within SEV library
* Define KVM_SEV_PAGE_TYPE_INVALID for SEV call of encrypt_region()
* Parameterize encrypt_region() to include privatize_region()
* Deduplication of sev test calls between SEV, SEV-ES and SNP
* Removed FW version tests for SNP
* Included testing of SNP_POLICY_DBG
* Dropped most tags from patches that have been changed or indirectly
  affected

[1] https://lore.kernel.org/all/d6d08c6b-9602-4f3d-92c2-8db6d50a1b92@amd.com
[2] https://lore.kernel.org/all/f78ddb64087df27e7bcb1ae0ab53f55aa0804fab.173922…

v5..v6:
https://lore.kernel.org/kvm/ab433246-e97c-495b-ab67-b0cb1721fb99@amd.com/
* Rename is_sev_platform_init to sev_fw_initialized (Nikunj)
* Rename KVM CPU feature X86_FEATURE_SNP to X86_FEATURE_SEV_SNP (Nikunj)
* Collected Tags from Nikunj, Pankaj, Srikanth.

v4..v5:
https://lore.kernel.org/kvm/8e7d8172-879e-4a28-8438-343b1c386ec9@amd.com/
* Introduced a check to disable advertising support for SEV, SEV-ES and SNP
  when platform initialization fails (Nikunj)
* Remove the redundant SNP check within is_sev_vm() (Nikunj)
* Cleanup of the encrypt_region flow for better readability (Nikunj)
* Refactor paths to use the canonical $(ARCH) to rebase for kvm/next

v3..v4:
https://lore.kernel.org/kvm/20241114234104.128532-1-pratikrajesh.sampat@amd…
* Remove SNP FW API version check in the test and ensure the KVM capability
  advertises the presence of the feature. Retain the minimum version
  definitions to exercise these API versions in the smoke test
* Retained only the SNP smoke test and SNP_INIT2 test
* The SNP architectural defines merged with SNP_INIT2 test patch
* SNP shutdown merged with SNP smoke test patch
* Add SEV VM type check to abstract comparisons and reduce clutter
* Define a SNP default policy which sets bits based on the presence of SMT
* Decouple privatization and encryption for it to be SNP agnostic
* Assert for only positive tests using vm_ioctl()
* Dropped tested-by tags

In summary - based on comments from Sean, I have primarily reduced the scope
of this patch series to focus on breaking down the SNP smoke test patch
(v3 - patch2) to first introduce SEV-SNP support and use this interface to
extend the sev_init2 and the sev_smoke test.

The rest of the v3 patchset, which introduces ioctl, pre fault, fallocate
and negative tests, will be re-worked and re-introduced in a future patch
series after addressing the issues discussed.

v2..v3:
https://lore.kernel.org/kvm/20240905124107.6954-1-pratikrajesh.sampat@amd.c…
* Remove the assignments for the prefault and fallocate test type enums.
* Fix error message for sev launch measure and finish.
* Collect tested-by tags [Peter, Srikanth]

Pratik R. Sampat (10):
  KVM: SEV: Disable SEV-SNP support on initialization failure
  KVM: selftests: SEV-SNP test for KVM_SEV_INIT2
  KVM: selftests: Add vmgexit helper
  KVM: selftests: Add SMT control state helper
  KVM: selftests: Replace assert() with TEST_ASSERT_EQ()
  KVM: selftests: Introduce SEV VM type check
  KVM: selftests: Add library support for interacting with SNP
  KVM: selftests: Force GUEST_MEMFD flag for SNP VM type
  KVM: selftests: Abstractions for SEV to decouple policy from type
  KVM: selftests: Add a basic SEV-SNP smoke test

 arch/x86/include/uapi/asm/kvm.h               |  1 +
 arch/x86/kvm/svm/sev.c                        | 30 +++++-
 tools/arch/x86/include/uapi/asm/kvm.h         |  1 +
 .../testing/selftests/kvm/include/kvm_util.h  | 35 +++++++
 .../selftests/kvm/include/x86/processor.h     |  1 +
 tools/testing/selftests/kvm/include/x86/sev.h | 42 ++++++++-
 tools/testing/selftests/kvm/lib/kvm_util.c    |  7 +-
 .../testing/selftests/kvm/lib/x86/processor.c |  4 +-
 tools/testing/selftests/kvm/lib/x86/sev.c     | 93 +++++++++++++++++--
 .../testing/selftests/kvm/x86/hyperv_cpuid.c  | 19 ----
 .../selftests/kvm/x86/sev_init2_tests.c       | 13 +++
 .../selftests/kvm/x86/sev_smoke_test.c        | 75 +++++++++------
 12 files changed, 261 insertions(+), 60 deletions(-)

-- 
2.43.0

7 months, 1 week


[PATCH net-next v3] selftests/vsock: add initial vmtest.sh for vsock

by Bobby Eshleman

This commit introduces a new vmtest.sh runner for vsock.

It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G,
and loopback. The testing tools from tools/testing/vsock/ are reused.
Currently, only vsock_test is used.

VMCI and hyperv support is automatically built, though not used.

Only tested on x86.

To run:

  $ tools/testing/selftests/vsock/vmtest.sh

or

  $ make -C tools/testing/selftests TARGETS=vsock run_tests

Results:
	# linux/tools/testing/selftests/vsock/vmtest.log
	setup: Building kernel and tests
	setup: Booting up VM
	setup: VM booted up
	test:vm_server_host_client:guest: Control socket listening on 0.0.0.0:51000
	test:vm_server_host_client:guest: Control socket connection accepted...
	[...]
	test:vm_loopback:guest: 30 - SOCK_STREAM retry failed connect()...ok
	test:vm_loopback:guest: 31 - SOCK_STREAM SO_LINGER null-ptr-deref...ok
	test:vm_loopback:guest: 31 - SOCK_STREAM SO_LINGER null-ptr-deref...ok

Future work can include vsock_diag_test.

vmtest.sh is loosely based off of tools/testing/selftests/net/pmtu.sh,
which was picked out of the bag of tests I knew to work with NIPA.

Because vsock requires a VM to test anything other than loopback, this
patch adds vmtest.sh as a kselftest itself. This is different than other
systems that have a "vmtest.sh", where it is used as a utility script to
spin up a VM to run the selftests as a guest (but isn't hooked into
kselftest). This aspect is worth review, as I'm not aware of all of the
environments where this would run.

Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
---
Changes in v3:
- use common conditional syntax for checking variables
- use return value instead of global rc
- fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER
- use SIGTERM instead of SIGKILL on cleanup
- use peer-cid=1 for loopback
- change sleep delay times into globals
- fix test_vm_loopback logging
- add test selection in arguments
- make QEMU an argument
- check that vng binary is on path
- use QEMU variable
- change <tab><backslash> to <space><backslash>
- fix hardcoded file paths
- add comment in commit msg about script that vmtest.sh was based off of
- Add tools/testing/selftest/vsock/Makefile for kselftest
- Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com

Changes in v2:
- add kernel oops and warnings checker
- change testname variable to use FUNCNAME
- fix spacing in test_vm_server_host_client
- add -s skip build option to vmtest.sh
- add test_vm_loopback
- pass port to vm_wait_for_listener
- fix indentation in vmtest.sh
- add vmci and hyperv to config
- changed whitespace from tabs to spaces in help string
- Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com
---
 MAINTAINERS                                |   1 +
 tools/testing/selftests/vsock/.gitignore   |   1 +
 tools/testing/selftests/vsock/Makefile     |   9 +
 tools/testing/selftests/vsock/config.vsock |  10 +
 tools/testing/selftests/vsock/settings     |   1 +
 tools/testing/selftests/vsock/vmtest.sh    | 354 +++++++++++++++++++++++++++++
 6 files changed, 376 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25751,6 +25751,7 @@ F:	include/uapi/linux/vm_sockets.h
 F:	include/uapi/linux/vm_sockets_diag.h
 F:	include/uapi/linux/vsockmon.h
 F:	net/vmw_vsock/
+F:	tools/testing/selftests/vsock/
 F:	tools/testing/vsock/
 
 VMALLOC
diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..1950aa8ac68c0831c12c1aaa429da45bbe41e60f
--- /dev/null
+++ b/tools/testing/selftests/vsock/.gitignore
@@ -0,0 +1 @@
+vsock_selftests.log
diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..6fded8c4d593541a6f7462147bffcb719def378f
--- /dev/null
+++ b/tools/testing/selftests/vsock/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+.PHONY: all
+all:
+
+TEST_PROGS := vmtest.sh
+EXTRA_CLEAN := vmtest.log
+
+include ../lib.mk
+
diff --git a/tools/testing/selftests/vsock/config.vsock b/tools/testing/selftests/vsock/config.vsock
new file mode 100644
index 0000000000000000000000000000000000000000..9e0fb2270e6a2fc0beb5f0d9f0bc37158d0a9d23
--- /dev/null
+++ b/tools/testing/selftests/vsock/config.vsock
@@ -0,0 +1,10 @@
+CONFIG_VSOCKETS=y
+CONFIG_VSOCKETS_DIAG=y
+CONFIG_VSOCKETS_LOOPBACK=y
+CONFIG_VMWARE_VMCI_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS_COMMON=y
+CONFIG_HYPERV_VSOCKETS=y
+CONFIG_VMWARE_VMCI=y
+CONFIG_VHOST_VSOCK=y
+CONFIG_HYPERV=y
diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings
new file mode 100644
index 0000000000000000000000000000000000000000..e7b9417537fbc4626153b72e8f295ab4594c844b
--- /dev/null
+++ b/tools/testing/selftests/vsock/settings
@@ -0,0 +1 @@
+timeout=0
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
new file mode 100755
index 0000000000000000000000000000000000000000..d70b9446e531d6d20beb24ddeda2cf0a9f7e9a39
--- /dev/null
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -0,0 +1,354 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2025 Meta Platforms, Inc. and affiliates
+#
+# Dependencies:
+#		* virtme-ng
+#		* busybox-static (used by virtme-ng)
+#		* qemu	(used by virtme-ng)
+
+SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
+KERNEL_CHECKOUT=$(realpath ${SCRIPT_DIR}/../../../..)
+QEMU=$(command -v qemu-system-$(uname -m))
+VERBOSE=0
+SKIP_BUILD=0
+VSOCK_TEST=${KERNEL_CHECKOUT}/tools/testing/vsock/vsock_test
+
+TEST_GUEST_PORT=51000
+TEST_HOST_PORT=50000
+TEST_HOST_PORT_LISTENER=50001
+SSH_GUEST_PORT=22
+SSH_HOST_PORT=2222
+VSOCK_CID=1234
+WAIT_PERIOD=3
+WAIT_PERIOD_MAX=20
+
+QEMU_PIDFILE=/tmp/qemu.pid
+
+# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
+# control port forwarded for vsock_test. Because virtme-ng doesn't support
+# adding an additional port to forward to the device created from "--ssh" and
+# virtme-init mistakenly sets identical IPs to the ssh device and additional
+# devices, we instead opt out of using --ssh, add the device manually, and also
+# add the kernel cmdline options that virtme-init uses to setup the interface.
+QEMU_OPTS=""
+QEMU_OPTS="${QEMU_OPTS} -netdev user,id=n0,hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
+QEMU_OPTS="${QEMU_OPTS},hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
+QEMU_OPTS="${QEMU_OPTS} -device virtio-net-pci,netdev=n0"
+QEMU_OPTS="${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${VSOCK_CID}"
+QEMU_OPTS="${QEMU_OPTS} --pidfile ${QEMU_PIDFILE}"
+KERNEL_CMDLINE="virtme.dhcp net.ifnames=0 biosdevname=0 virtme.ssh virtme_ssh_user=$USER"
+
+LOG=${SCRIPT_DIR}/vmtest.log
+
+#	Name	Description
+avail_tests="
+	vm_server_host_client	Run vsock_test in server mode on the VM and in client mode on the host.
+	vm_client_host_server	Run vsock_test in client mode on the VM and in server mode on the host.
+	vm_loopback	Run vsock_test using the loopback transport in the VM.
+"
+
+usage() {
+	echo
+	echo "$0 [OPTIONS] [TEST]..."
+	echo "If no TEST argument is given, all tests will be run."
+	echo
+	echo "Options"
+	echo " -v: verbose output"
+	echo " -s: skip build"
+	echo
+	echo "Available tests${avail_tests}"
+	exit 1
+}
+
+die() {
+	echo "$*" >&2
+	exit 1
+}
+
+vm_ssh() {
+	ssh -q -o UserKnownHostsFile=/dev/null -p 2222 localhost $*
+	return $?
+}
+
+cleanup() {
+	if [[ -f "${QEMU_PIDFILE}" ]]; then
+		pkill -SIGTERM -F ${QEMU_PIDFILE} 2>&1 >/dev/null
+	fi
+}
+
+build() {
+	log_setup "Building kernel and tests"
+
+	pushd ${KERNEL_CHECKOUT} >/dev/null
+	vng \
+		--kconfig \
+		--config ${KERNEL_CHECKOUT}/tools/testing/selftests/vsock/config.vsock
+	make -j$(nproc)
+	make -C ${KERNEL_CHECKOUT}/tools/testing/vsock
+	popd >/dev/null
+	echo
+}
+
+vm_setup() {
+	local VNG_OPTS=""
+	if [[ "${VERBOSE}" = 1 ]]; then
+		VNG_OPTS="--verbose"
+	fi
+	vng \
+		$VNG_OPTS \
+		--run ${KERNEL_CHECKOUT} \
+		--qemu-opts="${QEMU_OPTS}" \
+		--qemu="${QEMU}" \
+		--user root \
+		--append "${KERNEL_CMDLINE}" \
+		--rw 2>&1 >/dev/null &
+}
+
+vm_wait_for_ssh() {
+	i=0
+	while [[ true ]]; do
+		if [[ ${i} > ${WAIT_PERIOD_MAX} ]]; then
+			die "Timed out waiting for guest ssh"
+		fi
+		vm_ssh -- true
+		if [[ $? -eq 0 ]]; then
+			break
+		fi
+		i=$(( i + 1 ))
+		sleep ${WAIT_PERIOD}
+	done
+}
+
+wait_for_listener() {
+	local PORT=$1
+	local i=0
+	while ! ss -ltn | grep -q ":${PORT}"; do
+		if [[ ${i} > ${WAIT_PERIOD_MAX} ]]; then
+			die "Timed out waiting for listener on port ${PORT}"
+		fi
+		i=$(( i + 1 ))
+		sleep ${WAIT_PERIOD}
+	done
+}
+
+vm_wait_for_listener() {
+	local port=$1
+	vm_ssh -- "$(declare -f wait_for_listener); wait_for_listener ${port}"
+}
+
+host_wait_for_listener() {
+	wait_for_listener ${TEST_HOST_PORT_LISTENER}
+}
+
+log() {
+	local prefix="$1"
+	shift
+
+	if [[ "$#" -eq 0 ]]; then
+		cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' | tee -a ${LOG}
+	else
+		echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' | tee -a ${LOG}
+	fi
+}
+
+log_setup() {
+	log "setup" "$@"
+}
+
+log_host() {
+	testname=$1
+	shift
+	log "test:${testname}:host" "$@"
+}
+
+log_guest() {
+	testname=$1
+	shift
+	log "test:${testname}:guest" "$@"
+}
+
+test_vm_server_host_client() {
+	local testname="${FUNCNAME[0]#test_}"
+
+	vm_ssh -- "${VSOCK_TEST}" \
+		--mode=server \
+		--control-port="${TEST_GUEST_PORT}" \
+		--peer-cid=2 \
+		2>&1 | log_guest "${testname}" &
+
+	vm_wait_for_listener ${TEST_GUEST_PORT}
+
+	${VSOCK_TEST} \
+		--mode=client \
+		--control-host=127.0.0.1 \
+		--peer-cid="${VSOCK_CID}" \
+		--control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}"
+
+	return $?
+}
+
+test_vm_client_host_server() {
+	local testname="${FUNCNAME[0]#test_}"
+
+	${VSOCK_TEST} \
+		--mode "server" \
+		--control-port "${TEST_HOST_PORT_LISTENER}" \
+		--peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" &
+
+	host_wait_for_listener
+
+	vm_ssh -- "${VSOCK_TEST}" \
+		--mode=client \
+		--control-host=10.0.2.2 \
+		--peer-cid=2 \
+		--control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}"
+
+	return $?
+}
+
+test_vm_loopback() {
+	local testname="${FUNCNAME[0]#test_}"
+	local port=60000 # non-forwarded local port
+
+	vm_ssh -- ${VSOCK_TEST} \
+		--mode=server \
+		--control-port="${port}" \
+		--peer-cid=1 2>&1 | log_guest "${testname}" &
+
+	vm_wait_for_listener ${port}
+
+	vm_ssh -- ${VSOCK_TEST} \
+		--mode=client \
+		--control-host="127.0.0.1" \
+		--control-port="${port}" \
+		--peer-cid=1 2>&1 | log_guest "${testname}"
+
+	return $?
+}
+
+run_test() {
+	unset IFS
+	local host_oops_cnt_before
+	local host_warn_cnt_before
+	local vm_oops_cnt_before
+	local vm_warn_cnt_before
+	local host_oops_cnt_after
+	local host_warn_cnt_after
+	local vm_oops_cnt_after
+	local vm_warn_cnt_after
+	local name
+	local rc
+
+	host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
+	host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+	vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
+	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+
+	name=$(echo "${1}" | awk '{ print $1 }')
+	eval test_"${name}"
+
+	host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
+	if [[ ${host_oops_cnt_after} > ${host_oops_cnt_before} ]]; then
+		echo "${name}: kernel oops detected on host" | log_host ${name}
+		rc=1
+	fi
+
+	host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+	if [[ ${host_warn_cnt_after} > ${host_warn_cnt_before} ]]; then
+		echo "${name}: kernel warning detected on host" | log_host ${name}
+		rc=1
+	fi
+
+	vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+	if [[ ${vm_oops_cnt_after} > ${vm_oops_cnt_before} ]]; then
+		echo "${name}: kernel oops detected on vm" | log_host ${name}
+		rc=1
+	fi
+
+	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
+	if [[ ${vm_warn_cnt_after} > ${vm_warn_cnt_before} ]]; then
+		echo "${name}: kernel warning detected on vm" | log_host ${name}
+		rc=1
+	fi
+
+	return ${rc}
+}
+
+while getopts :hvsq: o
+do
+	case $o in
+	v) VERBOSE=1;;
+	s) SKIP_BUILD=1;;
+	q) QEMU=$OPTARG;;
+	h|*) usage;;
+	esac
+done
+shift $((OPTIND-1))
+
+trap cleanup EXIT
+
+if [[ ! -x "$(command -v vng)" ]]; then
+	die "vng not found."
+fi
+
+if [[ ! -x "${QEMU}" ]]; then
+	die "${QEMU} not found."
+fi
+
+rm -f "${LOG}"
+if [[ "${SKIP_BUILD}" != 1 ]]; then
+	build
+fi
+log_setup "Booting up VM"
+vm_setup
+vm_wait_for_ssh
+log_setup "VM booted up"
+
+for arg in "$@"; do
+	if ! command -v > /dev/null "test_${arg}"; then
+		echo "Test ${arg} not found"
+		die "${usage}"
+	fi
+done
+
+IFS="	
+"
+cnt=0
+name=""
+desc=""
+for t in ${avail_tests}; do
+	[ "${name}" = "" ] && name="${t}" && continue
+	# desc is unused, but we need to eat it.
+	[ "${desc}" = "" ] && desc="${t}"
+
+	run_this=0
+	if [[ "${#}" -eq 0 ]]; then
+		run_this=1
+	else
+		for arg in "$@"; do
+			if [[ "${arg}" = "${name}" ]]; then
+				run_this=1
+			fi
+		done
+	fi
+
+	if [[ "${run_this}" = 1 ]]; then
+		run_test "${name}"
+		rc=$?
+		if [[ ${rc} != 0 ]]; then
+			cnt=$(( cnt + 1 ))
+		fi
+	fi
+	name=""
+	desc=""
+done
+
+if [[ ${cnt} = 0 ]]; then
+	echo OK
+else
+	echo FAILED: ${cnt}
+fi
+echo "Log: ${LOG}"
+exit ${cnt}

---
base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a
change-id: 20250325-vsock-vmtest-b3a21d2102c2

Best regards,
-- 
Bobby Eshleman <bobbyeshleman(a)gmail.com>


[RFC PATCH 00/11] New KVM ioctl to link a gmem inode to a new gmem file

by Ackerley Tng

Hello,

This patchset builds upon the code at
https://lore.kernel.org/lkml/20230718234512.1690985-1-seanjc@google.com/T/.

This code is available at
https://github.com/googleprodkernel/linux-cc/tree/kvm-gmem-link-migrate-rfc….

In guest_mem v11, a split file/inode model was proposed, where memslot
bindings belong to the file and pages belong to the inode. This model
lends itself well to having different VMs use separate files pointing
to the same inode.

This RFC proposes an ioctl, KVM_LINK_GUEST_MEMFD, that takes a VM and
a gmem fd, and returns another gmem fd referencing a different file
and associated with VM. This RFC also includes an update to
KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to migrate memory context
(slot->arch.lpage_info and kvm->mem_attr_array) from source to
destination vm, intra-host.

Intended usage of the two ioctls:

1. Source VM’s fd is passed to destination VM via unix sockets
2. Destination VM uses new ioctl KVM_LINK_GUEST_MEMFD to link source
   VM’s fd to a new fd.
3. Destination VM will pass new fds to KVM_SET_USER_MEMORY_REGION,
   which will bind the new file, pointing to the same inode that the
   source VM’s file points to, to memslots
4. Use KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to move kvm->mem_attr_array
   and slot->arch.lpage_info to the destination VM.
5. Run the destination VM as per normal

Some other approaches considered were:

+ Using the linkat() syscall, but that requires a mount/directory for
  a source fd to be linked to
+ Using the dup() syscall, but that only duplicates the fd, and both
  fds point to the same file

---

Ackerley Tng (11):
  KVM: guest_mem: Refactor out kvm_gmem_alloc_file()
  KVM: guest_mem: Add ioctl KVM_LINK_GUEST_MEMFD
  KVM: selftests: Add tests for KVM_LINK_GUEST_MEMFD ioctl
  KVM: selftests: Test transferring private memory to another VM
  KVM: x86: Refactor sev's flag migration_in_progress to kvm struct
  KVM: x86: Refactor common code out of sev.c
  KVM: x86: Refactor common migration preparation code out of
    sev_vm_move_enc_context_from
  KVM: x86: Let moving encryption context be configurable
  KVM: x86: Handle moving of memory context for intra-host migration
  KVM: selftests: Generalize migration functions from
    sev_migrate_tests.c
  KVM: selftests: Add tests for migration of private mem

 arch/x86/include/asm/kvm_host.h               |   4 +-
 arch/x86/kvm/svm/sev.c                        |  85 ++-----
 arch/x86/kvm/svm/svm.h                        |   3 +-
 arch/x86/kvm/x86.c                            | 221 +++++++++++++++++-
 arch/x86/kvm/x86.h                            |   6 +
 include/linux/kvm_host.h                      |  18 ++
 include/uapi/linux/kvm.h                      |   8 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  |  42 ++++
 .../selftests/kvm/include/kvm_util_base.h     |  31 +++
 .../kvm/x86_64/private_mem_migrate_tests.c    |  93 ++++++++
 .../selftests/kvm/x86_64/sev_migrate_tests.c  |  48 ++--
 virt/kvm/guest_mem.c                          | 151 ++++++++++--
 virt/kvm/kvm_main.c                           |  10 +
 virt/kvm/kvm_mm.h                             |   7 +
 15 files changed, 596 insertions(+), 132 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_migrate_tests.c

-- 
2.41.0.640.ga95def55d0-goog


[PATCH] selftests/run_kselftest.sh: Use readlink if realpath is not available

by Yosry Ahmed

'realpath' is not always available; fall back to 'readlink -f' if it is
not available. They seem to work equally well in this context.

Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
 tools/testing/selftests/run_kselftest.sh | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 50e03eefe7ac7..0443beacf3621 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -3,7 +3,14 @@
 #
 # Run installed kselftest tests.
 #
-BASE_DIR=$(realpath $(dirname $0))
+
+# Fallback to readlink if realpath is not available
+if which realpath > /dev/null; then
+	BASE_DIR=$(realpath $(dirname $0))
+else
+	BASE_DIR=$(readlink -f $(dirname $0))
+fi
+
 cd $BASE_DIR
 TESTS="$BASE_DIR"/kselftest-list.txt
 if [ ! -r "$TESTS" ] ; then
-- 
2.49.0.rc1.451.g8f38331e32-goog
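The fallback idea in this patch can also be written as a standalone helper. This is a minimal sketch under stated assumptions, not the patch itself: resolve_dir is a hypothetical name, the tool lookup uses the POSIX 'command -v' builtin instead of 'which' (which is itself not guaranteed to be installed), and a final 'cd' plus 'pwd -P' branch is added for systems that have neither tool.

```shell
#!/bin/sh
# Sketch only: resolve_dir is a hypothetical helper, not part of the patch.

# Print the canonical absolute path of directory $1. Prefer realpath,
# fall back to readlink -f, and finally to cd + pwd -P in a subshell.
resolve_dir() {
	if command -v realpath > /dev/null 2>&1; then
		realpath "$1"
	elif command -v readlink > /dev/null 2>&1; then
		readlink -f "$1"
	else
		# The subshell keeps the caller's working directory unchanged.
		(cd "$1" && pwd -P)
	fi
}

# Resolve the directory containing this script, as run_kselftest.sh does.
BASE_DIR=$(resolve_dir "$(dirname "$0")")
echo "${BASE_DIR}"
```

Quoting "$1" and "$0" keeps the helper working when the install path contains spaces, which the unquoted form would split.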


[PATCH] KVM: selftests: add test for SVE host corruption

by Mark Brown

This test program, originally written by Mark Rutland and lightly
modified by me for upstream, verifies that we do not have the issues
with host SVE state being discarded which were fixed in

  fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state")

by running a simple VM while checking the SVE register state for
corruption.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
 tools/testing/selftests/kvm/Makefile.kvm     |   1 +
 tools/testing/selftests/kvm/arm64/host_sve.c | 127 +++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..d37072054a3d 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -147,6 +147,7 @@ TEST_GEN_PROGS_arm64 = $(TEST_GEN_PROGS_COMMON)
 TEST_GEN_PROGS_arm64 += arm64/aarch32_id_regs
 TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
 TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
+TEST_GEN_PROGS_arm64 += arm64/host_sve
 TEST_GEN_PROGS_arm64 += arm64/hypercalls
 TEST_GEN_PROGS_arm64 += arm64/mmio_abort
 TEST_GEN_PROGS_arm64 += arm64/page_fault_test
diff --git a/tools/testing/selftests/kvm/arm64/host_sve.c b/tools/testing/selftests/kvm/arm64/host_sve.c
new file mode 100644
index 000000000000..3826772fd470
--- /dev/null
+++ b/tools/testing/selftests/kvm/arm64/host_sve.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Host SVE: Check FPSIMD/SVE/SME save/restore over KVM_RUN ioctls.
+ *
+ * Copyright 2025 Arm, Ltd
+ */
+
+#include <errno.h>
+#include <signal.h>
+#include <sys/auxv.h>
+#include <asm/kvm.h>
+#include <kvm_util.h>
+
+#include "ucall_common.h"
+
+static void guest_code(void)
+{
+	for (int i = 0; i < 10; i++) {
+		GUEST_UCALL_NONE();
+	}
+
+	GUEST_DONE();
+}
+
+void handle_sigill(int sig, siginfo_t *info, void *ctx)
+{
+	ucontext_t *uctx = ctx;
+
+	printf("  < host signal %d >\n", sig);
+
+	/*
+	 * Skip the UDF
+	 */
+	uctx->uc_mcontext.pc += 4;
+}
+
+void register_sigill_handler(void)
+{
+	struct sigaction sa = {
+		.sa_sigaction = handle_sigill,
+		.sa_flags = SA_SIGINFO,
+	};
+	sigaction(SIGILL, &sa, NULL);
+}
+
+static void do_sve_roundtrip(void)
+{
+	unsigned long before, after;
+
+	/*
+	 * Set all bits in a predicate register, force a save/restore via a
+	 * SIGILL (which handle_sigill() will recover from), then report
+	 * whether the value has changed.
+	 */
+	asm volatile(
+	"	.arch_extension sve\n"
+	"	ptrue	p0.B\n"
+	"	cntp	%[before], p0, p0.B\n"
+	"	udf #0\n"
+	"	cntp	%[after], p0, p0.B\n"
+	: [before] "=r" (before),
+	  [after] "=r" (after)
+	:
+	: "p0"
+	);
+
+	if (before != after) {
+		TEST_FAIL("Signal roundtrip discarded predicate bits (%ld => %ld)\n",
+			  before, after);
+	} else {
+		printf("Signal roundtrip preserved predicate bits (%ld => %ld)\n",
+		       before, after);
+	}
+}
+
+static void test_run(void)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	struct ucall uc;
+	bool guest_done = false;
+
+	register_sigill_handler();
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+	do_sve_roundtrip();
+
+	while (!guest_done) {
+
+		printf("Running VCPU...\n");
+		vcpu_run(vcpu);
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_NONE:
+			do_sve_roundtrip();
+			do_sve_roundtrip();
+			break;
+		case UCALL_DONE:
+			guest_done = true;
+			break;
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+			break;
+		default:
+			TEST_FAIL("Unexpected guest exit");
+		}
+	}
+
+	kvm_vm_free(vm);
+}
+
+int main(void)
+{
+	/*
+	 * This is testing the host environment, we don't care about
+	 * guest SVE support.
+	 */
+	if (!(getauxval(AT_HWCAP) & HWCAP_SVE)) {
+		printf("SVE not supported\n");
+		return KSFT_SKIP;
+	}
+
+	test_run();
+	return 0;
+}

---
base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
change-id: 20250226-kvm-selftest-sve-signal-1add0d9d716c

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>


[PATCH net-next v13 0/9] Device memory TCP TX

by Mina Almasry

v13: https://lore.kernel.org/netdev/20250425204743.617260-1-almasrymina@google.c…
===

Changelog:
- Fix unneeded error label pointed out by Christoph, and addressed
  nitpick.

v12: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c…
====

No changes in v12, just restored the selftests patch I accidentally
dropped in v11

v11: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c…
====

Addressed a couple of nits and collected Acked-by from Harshitha
(thanks!)

v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.…
====

Addressed comments following conversations with Pavel, Stan, and
Harshitha. Thank you guys for the reviews again. Overall minor changes:

Changelog:
- Check for !niov->pp in io_zcrx_recv_frag, just in case we end up with
  a TX niov in that path (Pavel).
- Fix locking case in !netif_device_present (Jakub/Stan).

v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c…
===

Changelog:
- Use priv->bindings list instead of sock_bindings_list. This was missed
  during the rebase as the bindings have been updated to use
  priv->bindings recently (thanks Stan!)

v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.…
===

Only address minor comments on V7

Changelog:
- Use netdev locking instead of rtnl_locking to match rx path.
- Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of
  NET_IOV_UNSPECIFIED.
- Post send binding to net_devmem_dmabuf_bindings after it's been fully
  initialized (Stan).

v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.…
===

Changelog:
- Check the dmabuf net_iov binding belongs to the device the TX is going
  out on. (Jakub)
- Provide detailed inspection of callsites of
  __skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub)

v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===

v6 has no major changes. Addressed a few issues from Paolo and David,
and collected Acks from Stan. Thank you everyone for the review!

Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is
  invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label).
  (Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the
  same one which implements the TX path so we don't run into future
  errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works
  (David).

v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===

v5 has no major changes; it clears up the relatively minor issues
pointed out in v4, and rebases the series on top of net-next to resolve
the conflict with a patch that raced to the tree. It also collects the
review tags from v4.

Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).

v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===

v4 mainly addresses the critical driver support issue surfaced in v3 by
Paolo and Stan. Drivers aiming to support netmem_tx should make sure not
to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs
may come from dma-bufs.

Additionally other feedback from v3 is addressed.

Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for
  netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the tx
  binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).

v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===

Address minor comments from RFCv2 and fix a few build warnings and
ynl-regen issues. No major changes.

RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======

RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.

Major changes:
--------------

- much improved UAPI as suggested by Stan. We now interpret the iov_base
  of the passed in iov from userspace as the offset into the dmabuf to
  send from. This removes the need to set iov.iov_base = NULL which may
  be confusing to users, and enables us to send multiple iovs in the
  same sendmsg() call. ncdevmem and the docs show a sample use of that.

- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
  this is a good improvement as it was confusing to keep track of 2
  iterators for the same sendmsg, and mistracking both iterators caused
  a couple of bugs reported in the last iteration that are now resolved
  with this streamlining.

- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested,
  and sending multiple iovs in the same sendmsg() is tested.

- Fixed issue where dmabuf unmapping was happening in invalid context
  (Stan).

====================================================================

The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].

Full outline on usage of the TX path is detailed in the documentation
included with this series.

Test example is available via the kselftest included in the series as
well.

The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.

Patch Overview:
---------------

1. Documentation & tests to give high level overview of the feature
   being added.

1. Add netmem refcounting needed for the TX path.

2. Devmem TX netlink API.

3. Devmem TX net stack implementation.

4. Make dma-buf unbinding scheduled work to handle TX cases where it
   gets freed from contexts where we can't sleep.

5. Add devmem TX documentation.

6. Add scaffolding enabling driver support for netmem_tx. Add helpers,
   driver feature flag, and docs to enable drivers to declare netmem_tx
   support.

7. Guard netmem_tx against being enabled against drivers that don't
   support it.

8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
   devmem.py.

Testing:
--------

Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augmented with client functionality to test TX
path.

* Test Setup:

Kernel: net-next with this RFC and memory provider API cherry-picked
locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.

Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.

Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.

[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…

Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>

Mina Almasry (8):
  netmem: add niov->type attribute to distinguish different net_iov
    types
  net: add get_netmem/put_netmem support
  net: devmem: Implement TX path
  net: add devmem TCP TX documentation
  net: enable driver support for netmem TX
  gve: add netmem TX support to GVE DQO-RDA mode
  net: check for driver support in netmem TX
  selftests: ncdevmem: Implement devmem TCP TX

Stanislav Fomichev (1):
  net: devmem: TCP tx netlink api

 Documentation/netlink/specs/netdev.yaml       |  12 +
 Documentation/networking/devmem.rst           | 150 ++++++++-
 .../networking/net_cachelines/net_device.rst  |   1 +
 Documentation/networking/netdev-features.rst  |   5 +
 Documentation/networking/netmem.rst           |  23 +-
 drivers/net/ethernet/google/gve/gve_main.c    |   3 +
 drivers/net/ethernet/google/gve/gve_tx_dqo.c  |   8 +-
 include/linux/netdevice.h                     |   2 +
 include/linux/skbuff.h                        |  17 +-
 include/linux/skbuff_ref.h                    |   4 +-
 include/net/netmem.h                          |  34 +-
 include/net/sock.h                            |   1 +
 include/uapi/linux/netdev.h                   |   1 +
 io_uring/zcrx.c                               |   3 +-
 net/core/datagram.c                           |  48 ++-
 net/core/dev.c                                |  34 +-
 net/core/devmem.c                             | 131 ++++++--
 net/core/devmem.h                             |  83 ++++-
 net/core/netdev-genl-gen.c                    |  13 +
 net/core/netdev-genl-gen.h                    |   1 +
 net/core/netdev-genl.c                        |  80 ++++-
 net/core/skbuff.c                             |  48 ++-
 net/core/sock.c                               |   6 +
 net/ipv4/ip_output.c                          |   3 +-
 net/ipv4/tcp.c                                |  50 ++-
 net/ipv6/ip6_output.c                         |   3 +-
 net/vmw_vsock/virtio_transport_common.c       |   5 +-
 tools/include/uapi/linux/netdev.h             |   1 +
 .../selftests/drivers/net/hw/devmem.py        |  26 +-
 .../selftests/drivers/net/hw/ncdevmem.c       | 300 +++++++++++++++++-
 30 files changed, 1008 insertions(+), 88 deletions(-)

base-commit: 0d15a26b247d25cd012134bf8825128fedb15cc9
-- 
2.49.0.901.g37484f566f-goog

