Since commit e87412e621f1 ("integrate Zaamo and Zalrsc text (#1304)"),
the A extension has been described as a set of instructions provided by
Zaamo and Zalrsc. Add these two extensions.
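Once parsed and exported through hwprobe, user space can probe for the two
extensions with the riscv_hwprobe syscall. A minimal sketch follows, assuming
the bit names added by this series are RISCV_HWPROBE_EXT_ZAAMO and
RISCV_HWPROBE_EXT_ZALRSC under RISCV_HWPROBE_KEY_IMA_EXT_0:

/*
 * Sketch only: probe for Zaamo/Zalrsc from user space.  The bit names are
 * assumed to match what this series adds to the hwprobe uapi header.
 */
#include <asm/hwprobe.h>
#include <asm/unistd.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 };

	if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0))
		return 1;

	printf("Zaamo:  %s\n", pair.value & RISCV_HWPROBE_EXT_ZAAMO ? "yes" : "no");
	printf("Zalrsc: %s\n", pair.value & RISCV_HWPROBE_EXT_ZALRSC ? "yes" : "no");
	return 0;
}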
This series is based on the Zc one [1].
Link: https://lore.kernel.org/linux-riscv/20240619113529.676940-1-cleger@rivosinc…
---
Clément Léger (5):
dt-bindings: riscv: add Zaamo and Zalrsc ISA extension description
riscv: add parsing for Zaamo and Zalrsc extensions
riscv: hwprobe: export Zaamo and Zalrsc extensions
RISC-V: KVM: Allow Zaamo/Zalrsc extensions for Guest/VM
KVM: riscv: selftests: Add Zaamo/Zalrsc extensions to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 8 ++++++++
.../devicetree/bindings/riscv/extensions.yaml | 19 +++++++++++++++++++
arch/riscv/include/asm/hwcap.h | 2 ++
arch/riscv/include/uapi/asm/hwprobe.h | 2 ++
arch/riscv/include/uapi/asm/kvm.h | 2 ++
arch/riscv/kernel/cpufeature.c | 9 ++++++++-
arch/riscv/kernel/sys_hwprobe.c | 2 ++
arch/riscv/kvm/vcpu_onereg.c | 4 ++++
.../selftests/kvm/riscv/get-reg-list.c | 8 ++++++++
9 files changed, 55 insertions(+), 1 deletion(-)
--
2.45.2
This series introduces a new ioctl KVM_TRANSLATE2, which expands on
KVM_TRANSLATE. It is required to implement Hyper-V's
HvTranslateVirtualAddress hyper-call as part of the ongoing effort to
emulate Hyper-V's Virtual Secure Mode (VSM) within KVM and QEMU. The hyper-
call requires several new KVM APIs, one of which is KVM_TRANSLATE2, which
implements the core functionality of the hyper-call. The rest of the
required functionality will be implemented in subsequent series.
Other than translating guest virtual addresses, the ioctl allows the
caller to control whether the accessed and dirty bits are set during the
page walk. It also allows specifying an access mode instead of returning
viable access modes, which enables setting the bits up to the level that
caused a failure. Additionally, the ioctl provides more information about
why the page walk failed, and which page table is responsible. This
functionality is not available within KVM_TRANSLATE, and can't be added
without breaking backwards compatibility, thus a new ioctl is required.
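To make the intended call pattern concrete, here is a rough sketch of how a VMM
could drive the new ioctl. The struct layout, ioctl number and flag names below
are placeholders (loosely modelled on the PWALK_* names in the patch titles);
the authoritative definitions live in the uapi header and documentation added
by this series.

/* Hypothetical sketch only -- not the uapi defined by this series. */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>

struct kvm_translation2_sketch {        /* placeholder layout */
	uint64_t linear_address;        /* in:  GVA to translate */
	uint32_t flags;                 /* in:  page-walk behaviour */
	uint32_t access;                /* in:  requested access mode */
	uint64_t physical_address;      /* out: GPA on success */
	uint32_t error_code;            /* out: why the walk failed */
	uint8_t  valid;                 /* out: non-zero on success */
};

/* Placeholder values; cf. PWALK_SET_ACCESSED / PWALK_SET_DIRTY in the series. */
#define SKETCH_SET_ACCESSED	(1u << 0)
#define SKETCH_SET_DIRTY	(1u << 1)
#define SKETCH_KVM_TRANSLATE2	_IOWR('k', 0xff, struct kvm_translation2_sketch)

static int translate_gva(int vcpu_fd, uint64_t gva, uint64_t *gpa)
{
	struct kvm_translation2_sketch tr = {
		.linear_address = gva,
		.flags = SKETCH_SET_ACCESSED | SKETCH_SET_DIRTY,
	};

	if (ioctl(vcpu_fd, SKETCH_KVM_TRANSLATE2, &tr) < 0 || !tr.valid) {
		fprintf(stderr, "translation failed, error_code=%u\n",
			tr.error_code);
		return -1;
	}

	*gpa = tr.physical_address;
	return 0;
}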
The ioctl was designed to facilitate as many other use cases as possible
apart from VSM. The error codes were intentionally chosen to be broad
enough to avoid exposing architecture specific details. Even though
HvTranslateVirtualAddress only really needs one flag to set the accessed
and dirty bits whenever possible, that was split into several flags so
that future users can choose more granularly when these bits should be set.
Furthermore, as much information as possible is provided to the caller.
The patch series includes selftests for the ioctl, as well as fuzz
testing on random garbage guest page table entries. All previously passing
KVM selftests and KVM unit tests still pass.
Series overview:
- 1: Document the new ioctl
- 2-11: Update the page walker in preparation
- 12-14: Implement the ioctl
- 15: Implement testing
This series, alongside the series by Nicolas Saenz Julienne [1]
introducing the core building blocks for VSM and the accompanying QEMU
implementation [2], is capable of booting Windows Server 2019.
Both series are also available on GitHub [3].
[1] https://lore.kernel.org/linux-hyperv/20240609154945.55332-1-nsaenz@amazon.c…
[2] https://github.com/vianpl/qemu/tree/vsm/next
[3] https://github.com/vianpl/linux/tree/vsm/next
Best,
Nikolas
Nikolas Wipper (15):
KVM: Add API documentation for KVM_TRANSLATE2
KVM: x86/mmu: Abort page walk if permission checks fail
KVM: x86/mmu: Introduce exception flag for unmapped GPAs
KVM: x86/mmu: Store GPA in exception if applicable
KVM: x86/mmu: Introduce flags parameter to page walker
KVM: x86/mmu: Implement PWALK_SET_ACCESSED in page walker
KVM: x86/mmu: Implement PWALK_SET_DIRTY in page walker
KVM: x86/mmu: Implement PWALK_FORCE_SET_ACCESSED in page walker
KVM: x86/mmu: Introduce status parameter to page walker
KVM: x86/mmu: Implement PWALK_STATUS_READ_ONLY_PTE_GPA in page walker
KVM: x86: Introduce generic gva to gpa translation function
KVM: Introduce KVM_TRANSLATE2
KVM: Add KVM_TRANSLATE2 stub
KVM: x86: Implement KVM_TRANSLATE2
KVM: selftests: Add test for KVM_TRANSLATE2
Documentation/virt/kvm/api.rst | 131 ++++++++
arch/x86/include/asm/kvm_host.h | 18 +-
arch/x86/kvm/hyperv.c | 3 +-
arch/x86/kvm/kvm_emulate.h | 8 +
arch/x86/kvm/mmu.h | 10 +-
arch/x86/kvm/mmu/mmu.c | 7 +-
arch/x86/kvm/mmu/paging_tmpl.h | 80 +++--
arch/x86/kvm/x86.c | 123 ++++++-
include/linux/kvm_host.h | 6 +
include/uapi/linux/kvm.h | 33 ++
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/kvm_translate2.c | 310 ++++++++++++++++++
virt/kvm/kvm_main.c | 41 +++
13 files changed, 724 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_translate2.c
--
2.40.1
`MFD_NOEXEC_SEAL` should remove the executable bits and set `F_SEAL_EXEC`
to prevent further modifications to the executable bits as per the comment
in the uapi header file:
not executable and sealed to prevent changing to executable
However, commit 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
that introduced this feature made it so that `MFD_NOEXEC_SEAL` unsets
`F_SEAL_SEAL`, essentially acting as a superset of `MFD_ALLOW_SEALING`.
Nothing implies that it should be so, and indeed up until the second version
of the patchset[0] that introduced `MFD_EXEC` and `MFD_NOEXEC_SEAL`,
`F_SEAL_SEAL` was not removed; that changed in the third revision of the
patchset[1] without a clear explanation.
This behaviour is surprising for application developers: there is no
documentation that would reveal that `MFD_NOEXEC_SEAL` has the additional
effect of `MFD_ALLOW_SEALING`. Additionally, combined with `vm.memfd_noexec=2`,
it has the effect of making all memfds initially sealable.
So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested,
thereby returning to the pre-Linux 6.3 behaviour of only allowing
sealing when `MFD_ALLOW_SEALING` is specified.
Now, this is technically a uapi break. However, the damage is expected
to be minimal. To trigger user visible change, a program has to do the
following steps:
- create memfd:
- with `MFD_NOEXEC_SEAL`,
- without `MFD_ALLOW_SEALING`;
- try to add seals / check the seals.
But that seems unlikely to happen intentionally since this change
essentially reverts the kernel's behaviour to that of Linux <6.3,
so if a program worked correctly on those older kernels, it will
likely work correctly after this change.
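For illustration, here is a minimal reproducer of the user-visible difference;
memfd_create(), F_GET_SEALS/F_ADD_SEALS and the MFD_*/F_SEAL_* constants are
the existing uapi, and only the F_ADD_SEALS outcome changes with this patch:

/*
 * With the current behaviour, MFD_NOEXEC_SEAL implicitly cleared F_SEAL_SEAL,
 * so F_ADD_SEALS below succeeds.  With this patch (as on pre-6.3 kernels),
 * F_SEAL_SEAL stays set because MFD_ALLOW_SEALING was not requested, and
 * F_ADD_SEALS fails with EPERM.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif

int main(void)
{
	int fd = memfd_create("demo", MFD_CLOEXEC | MFD_NOEXEC_SEAL);

	if (fd < 0) {
		perror("memfd_create");
		return 1;
	}

	printf("initial seals: %#x\n", fcntl(fd, F_GET_SEALS));

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_GROW) < 0)
		perror("F_ADD_SEALS");		/* expected with this patch */
	else
		printf("F_SEAL_GROW added\n");

	close(fd);
	return 0;
}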
I have used Debian Code Search and GitHub to try to find potential
breakages, and I could only find a single one. dbus-broker's
memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING`
behaviour, and tries to work around it[2]. This workaround will
break. Luckily, this only affects the test suite, it does not affect
the normal operations of dbus-broker. There is a PR with a fix[3].
I also carried out a smoke test by building a kernel with this change
and booting an Arch Linux system into GNOME and Plasma sessions.
There was also a previous attempt to address this peculiarity by
introducing a new flag[4].
[0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
[1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
[2]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a…
[3]: https://github.com/bus1/dbus-broker/pull/366
[4]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
Cc: stable@vger.kernel.org
Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
---
* v3: https://lore.kernel.org/linux-mm/20240611231409.3899809-1-jeffxu@chromium.o…
* v2: https://lore.kernel.org/linux-mm/20240524033933.135049-1-jeffxu@google.com/
* v1: https://lore.kernel.org/linux-mm/20240513191544.94754-1-pobrn@protonmail.co…
This fourth version returns to removing the inconsistency, as opposed to documenting
its existence, with the same code change as v1 but with a somewhat extended commit
message. It is sent because I believe it is worth at least a try; it can easily be
reverted if more application breakage is discovered than initially anticipated.
---
mm/memfd.c | 9 ++++-----
tools/testing/selftests/memfd/memfd_test.c | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/mm/memfd.c b/mm/memfd.c
index 7d8d3ab3fa37..8b7f6afee21d 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111;
file_seals = memfd_file_seals_ptr(file);
- if (file_seals) {
- *file_seals &= ~F_SEAL_SEAL;
+ if (file_seals)
*file_seals |= F_SEAL_EXEC;
- }
- } else if (flags & MFD_ALLOW_SEALING) {
- /* MFD_EXEC and MFD_ALLOW_SEALING are set */
+ }
+
+ if (flags & MFD_ALLOW_SEALING) {
file_seals = memfd_file_seals_ptr(file);
if (file_seals)
*file_seals &= ~F_SEAL_SEAL;
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 95af2d78fd31..7b78329f65b6 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -1151,7 +1151,7 @@ static void test_noexec_seal(void)
mfd_def_size,
MFD_CLOEXEC | MFD_NOEXEC_SEAL);
mfd_assert_mode(fd, 0666);
- mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
}
--
2.45.2
Hello,
This patchset is our exploration of how to support 1G pages in guest_memfd, and
how the pages will be used in Confidential VMs.
The patchset covers:
+ How to get 1G pages
+ Allowing mmap() of guest_memfd to userspace so that both private and shared
memory can use the same physical pages
+ Splitting and reconstructing pages to support conversions and mmap()
+ How the VM, userspace and guest_memfd interact to support conversions
+ Selftests to test all the above
+ Selftests also demonstrate the conversion flow between VM, userspace and
guest_memfd.
Why 1G pages in guest_memfd?
To bring guest_memfd to performance and memory-savings parity with VMs that are
backed by HugeTLBfs.
+ Performance is improved with 1G pages through more TLB hits and faster page walks
on TLB misses.
+ Memory savings from 1G pages come from HugeTLB Vmemmap Optimization (HVO).
Options for 1G page support:
1. HugeTLB
2. Contiguous Memory Allocator (CMA)
3. Other suggestions are welcome!
Comparison between options:
1. HugeTLB
+ Refactor HugeTLB to separate allocator from the rest of HugeTLB
+ Pro: Graceful transition for VMs backed with HugeTLB to guest_memfd
+ Near term: Allows co-tenancy of HugeTLB and guest_memfd backed VMs
+ Pro: Can provide iterative steps toward new future allocator
+ Unexplored: Managing userspace-visible changes
+ e.g. HugeTLB's free_hugepages will decrease if HugeTLB is used,
but not when future allocator is used
2. CMA
+ Port some HugeTLB features to be applied on CMA
+ Pro: Clean slate
What would refactoring HugeTLB involve?
(Some refactoring was done in this RFC, more can be done.)
1. Broadly involves separating the HugeTLB allocator from the rest of HugeTLB
+ Brings more modularity to HugeTLB
+ No functionality change intended
+ Likely step towards HugeTLB's integration into core-mm
2. guest_memfd will use just the allocator component of HugeTLB, not including
the complex parts of HugeTLB like
+ Userspace reservations (resv_map)
+ Shared PMD mappings
+ Special page walkers
What features would need to be ported to CMA?
+ Improved allocation guarantees
+ Per NUMA node pool of huge pages
+ Subpools per guest_memfd
+ Memory savings
+ Something like HugeTLB Vmemmap Optimization
+ Configuration/reporting features
+ Configuration of number of pages available (and per NUMA node) at and
after host boot
+ Reporting of memory usage/availability statistics at runtime
HugeTLB was picked as the source of 1G pages for this RFC because it allows a
graceful transition, and retains memory savings from HVO.
To illustrate this, if a host machine uses HugeTLBfs to back VMs, and a
confidential VM were to be scheduled on that host, some HugeTLBfs pages would
have to be given up and returned to CMA so that guest_memfd pages could be
rebuilt from that memory. Removing HVO and reapplying it on the new guest_memfd
memory requires memory to be held in reserve, which not only slows down memory
allocation but also trims the benefits of HVO. Memory would have to be reserved
on the host to facilitate these transitions.
Improving how guest_memfd uses the allocator in a future revision of this RFC:
To provide an easier transition away from HugeTLB, guest_memfd's use of HugeTLB
should be limited to these allocator functions:
+ reserve(node, page_size, num_pages) => opaque handle
+ Used when a guest_memfd inode is created to reserve memory from backend
allocator
+ allocate(handle, mempolicy, page_size) => folio
+ To allocate a folio from guest_memfd's reservation
+ split(handle, folio, target_page_size) => void
+ To take a huge folio, and split it to smaller folios, restore to filemap
+ reconstruct(handle, first_folio, nr_pages) => void
+ To take a folio, and reconstruct a huge folio out of nr_pages from the
first_folio
+ free(handle, folio) => void
+ To return folio to guest_memfd's reservation
+ error(handle, folio) => void
+ To handle memory errors
+ unreserve(handle) => void
+ To return guest_memfd's reservation to allocator backend
Userspace should only provide a page size when creating a guest_memfd and should
not have to specify HugeTLB.
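As a rough illustration of the shape such an interface could take, the list
above might be rendered as an ops table that guest_memfd calls into. The names
and signatures below are only a paraphrase of the list, not code from this RFC:

/* Sketch only: a possible C rendering of the allocator interface above. */
#include <linux/types.h>

struct folio;
struct mempolicy;

struct guestmem_allocator_ops {
	/* Reserve num_pages pages of page_size on node; returns an opaque handle. */
	void *(*reserve)(int node, size_t page_size, unsigned long num_pages);

	/* Allocate one folio of page_size from the reservation. */
	struct folio *(*allocate)(void *handle, struct mempolicy *mpol,
				  size_t page_size);

	/* Split a huge folio into target_page_size folios, restore to filemap. */
	void (*split)(void *handle, struct folio *folio, size_t target_page_size);

	/* Rebuild a huge folio from nr_pages starting at first_folio. */
	void (*reconstruct)(void *handle, struct folio *first_folio,
			    unsigned long nr_pages);

	/* Return a folio to the reservation. */
	void (*free)(void *handle, struct folio *folio);

	/* Handle a memory error on a folio. */
	void (*error)(void *handle, struct folio *folio);

	/* Return the whole reservation to the backend allocator. */
	void (*unreserve)(void *handle);
};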
Overview of patches:
+ Patches 01-12
+ Many small changes to HugeTLB, mostly to separate HugeTLBfs concepts from
HugeTLB, and to expose HugeTLB functions.
+ Patches 13-16
+ Letting guest_memfd use HugeTLB
+ Creation of each guest_memfd reserves pages from HugeTLB's global hstate
and puts them into the guest_memfd inode's subpool
+ Each folio allocation takes a page from the guest_memfd inode's subpool
+ Patches 17-21
+ Selftests for new HugeTLB features in guest_memfd
+ Patches 22-24
+ More small changes on the HugeTLB side to expose functions needed by
guest_memfd
+ Patch 25:
+ Uses the newly available functions from patches 22-24 to split HugeTLB
pages. In this patch, HugeTLB folios are always split to 4K before any
usage, private or shared.
+ Patches 26-28
+ Allow mmap() in guest_memfd and faulting in shared pages
+ Patch 29
+ Enables conversion between private/shared pages
+ Patch 30
+ Required to zero folios after conversions to avoid leaking initialized
kernel memory
+ Patches 31-38
+ Add selftests to test mapping pages to userspace, guest/host memory
sharing, and update conversion tests
+ Patch 33 illustrates the conversion flow between VM/userspace/guest_memfd
+ Patch 39
+ Dynamically split and reconstruct HugeTLB pages instead of always
splitting before use. All earlier selftests are expected to still pass.
TODOs:
+ Add logic to wait for safe_refcount [1]
+ Look into lazy splitting/reconstruction of pages
+ Currently, when the KVM_SET_MEMORY_ATTRIBUTES ioctl is invoked, not only are
the mem_attr_array and faultability updated, the pages in the requested range
are also split/reconstructed as necessary. We want to look into delaying
splitting/reconstruction to fault time.
+ Solve race between folios being faulted in and being truncated
+ When running private_mem_conversions_test with more than 1 vCPU, a folio
getting truncated may get faulted in by another process, causing elevated
mapcounts when the folio is freed (VM_BUG_ON_FOLIO).
+ Add intermediate splits (1G should first split to 2M and not split directly to
4K)
+ Use guest's lock instead of hugetlb_lock
+ Use multi-index xarray/replace xarray with some other data struct for
faultability flag
+ Refactor HugeTLB better, present generic allocator interface
Please let us know your thoughts on:
+ HugeTLB as the choice of transitional allocator backend
+ Refactoring HugeTLB to provide generic allocator interface
+ Shared/private conversion flow
+ Requiring user to request kernel to unmap pages from userspace using
madvise(MADV_DONTNEED)
+ Failing conversion on elevated mapcounts/pincounts/refcounts
+ Process of splitting/reconstructing page
+ Anything else!
[1] https://lore.kernel.org/all/20240829-guest-memfd-lib-v2-0-b9afc1ff3656@quic…
Ackerley Tng (37):
mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv()
mm: hugetlb: Remove unnecessary check for avoid_reserve
mm: mempolicy: Refactor out policy_node_nodemask()
mm: hugetlb: Refactor alloc_buddy_hugetlb_folio_with_mpol() to
interpret mempolicy instead of vma
mm: hugetlb: Refactor dequeue_hugetlb_folio_vma() to use mpol
mm: hugetlb: Refactor out hugetlb_alloc_folio
mm: truncate: Expose preparation steps for truncate_inode_pages_final
mm: hugetlb: Expose hugetlb_subpool_{get,put}_pages()
mm: hugetlb: Add option to create new subpool without using surplus
mm: hugetlb: Expose hugetlb_acct_memory()
mm: hugetlb: Move and expose hugetlb_zero_partial_page()
KVM: guest_memfd: Make guest mem use guest mem inodes instead of
anonymous inodes
KVM: guest_memfd: hugetlb: initialization and cleanup
KVM: guest_memfd: hugetlb: allocate and truncate from hugetlb
KVM: guest_memfd: Add page alignment check for hugetlb guest_memfd
KVM: selftests: Add basic selftests for hugetlb-backed guest_memfd
KVM: selftests: Support various types of backing sources for private
memory
KVM: selftests: Update test for various private memory backing source
types
KVM: selftests: Add private_mem_conversions_test.sh
KVM: selftests: Test that guest_memfd usage is reported via hugetlb
mm: hugetlb: Expose vmemmap optimization functions
mm: hugetlb: Expose HugeTLB functions for promoting/demoting pages
mm: hugetlb: Add functions to add/move/remove from hugetlb lists
KVM: guest_memfd: Track faultability within a struct kvm_gmem_private
KVM: guest_memfd: Allow mmapping guest_memfd files
KVM: guest_memfd: Use vm_type to determine default faultability
KVM: Handle conversions in the SET_MEMORY_ATTRIBUTES ioctl
KVM: guest_memfd: Handle folio preparation for guest_memfd mmap
KVM: selftests: Allow vm_set_memory_attributes to be used without
asserting return value of 0
KVM: selftests: Test using guest_memfd memory from userspace
KVM: selftests: Test guest_memfd memory sharing between guest and host
KVM: selftests: Add notes in private_mem_kvm_exits_test for mmap-able
guest_memfd
KVM: selftests: Test that pinned pages block KVM from setting memory
attributes to PRIVATE
KVM: selftests: Refactor vm_mem_add to be more flexible
KVM: selftests: Add helper to perform madvise by memslots
KVM: selftests: Update private_mem_conversions_test for mmap()able
guest_memfd
Vishal Annapurve (2):
KVM: guest_memfd: Split HugeTLB pages for guest_memfd use
KVM: guest_memfd: Dynamically split/reconstruct HugeTLB page
fs/hugetlbfs/inode.c | 35 +-
include/linux/hugetlb.h | 54 +-
include/linux/kvm_host.h | 1 +
include/linux/mempolicy.h | 2 +
include/linux/mm.h | 1 +
include/uapi/linux/kvm.h | 26 +
include/uapi/linux/magic.h | 1 +
mm/hugetlb.c | 346 ++--
mm/hugetlb_vmemmap.h | 11 -
mm/mempolicy.c | 36 +-
mm/truncate.c | 26 +-
tools/include/linux/kernel.h | 4 +-
tools/testing/selftests/kvm/Makefile | 3 +
.../kvm/guest_memfd_hugetlb_reporting_test.c | 222 +++
.../selftests/kvm/guest_memfd_pin_test.c | 104 ++
.../selftests/kvm/guest_memfd_sharing_test.c | 160 ++
.../testing/selftests/kvm/guest_memfd_test.c | 238 ++-
.../testing/selftests/kvm/include/kvm_util.h | 45 +-
.../testing/selftests/kvm/include/test_util.h | 18 +
tools/testing/selftests/kvm/lib/kvm_util.c | 443 +++--
tools/testing/selftests/kvm/lib/test_util.c | 99 ++
.../kvm/x86_64/private_mem_conversions_test.c | 158 +-
.../x86_64/private_mem_conversions_test.sh | 91 +
.../kvm/x86_64/private_mem_kvm_exits_test.c | 11 +-
virt/kvm/guest_memfd.c | 1563 ++++++++++++++++-
virt/kvm/kvm_main.c | 17 +
virt/kvm/kvm_mm.h | 16 +
27 files changed, 3288 insertions(+), 443 deletions(-)
create mode 100644 tools/testing/selftests/kvm/guest_memfd_hugetlb_reporting_test.c
create mode 100644 tools/testing/selftests/kvm/guest_memfd_pin_test.c
create mode 100644 tools/testing/selftests/kvm/guest_memfd_sharing_test.c
create mode 100755 tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.sh
--
2.46.0.598.g6f2099f65c-goog
The include.sh file is generated for inclusion by other scripts and should not be
an executable test; otherwise, it will be added to kselftest-list.txt. Additionally,
add the executable bit to test.py at the same time to ensure it runs properly.
Fixes: 3ade6ce1255e ("selftests: rds: add testing infrastructure")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
tools/testing/selftests/net/rds/Makefile | 3 ++-
tools/testing/selftests/net/rds/test.py | 0
2 files changed, 2 insertions(+), 1 deletion(-)
mode change 100644 => 100755 tools/testing/selftests/net/rds/test.py
diff --git a/tools/testing/selftests/net/rds/Makefile b/tools/testing/selftests/net/rds/Makefile
index da9714bc7aad..cf30307a829b 100644
--- a/tools/testing/selftests/net/rds/Makefile
+++ b/tools/testing/selftests/net/rds/Makefile
@@ -4,9 +4,10 @@ all:
@echo mk_build_dir="$(shell pwd)" > include.sh
TEST_PROGS := run.sh \
- include.sh \
test.py
+TEST_FILES := include.sh
+
EXTRA_CLEAN := /tmp/rds_logs
include ../../lib.mk
diff --git a/tools/testing/selftests/net/rds/test.py b/tools/testing/selftests/net/rds/test.py
old mode 100644
new mode 100755
--
2.39.3 (Apple Git-146)