October 2025 - Linux-kselftest-mirror

[PATCH 0/7] KVM: x86: Improve the handling of debug exceptions during instruction emulation

by Hou Wenlong

During my testing, I found that guest debugging with 'DR6.BD' does not work in instruction emulation, as the current code only considers the guest's DR7. Upon reviewing the code, I also observed that the checks for the userspace guest debugging feature and the guest's own debugging feature are repeated in different places during instruction emulation, but the overall logic is the same. If guest debugging is enabled, it needs to exit to userspace; otherwise, a #DB exception needs to be injected into the guest. Therefore, as suggested by Jiangshan Lai, some cleanup has been done for #DB handling in instruction emulation in this patchset. A new function named 'kvm_inject_emulated_db()' is introduced to consolidate all the checking logic. Moreover, I hope we can make the #DB interception path use the same function as well. Additionally, when I looked into the single-step #DB handling in instruction emulation, I noticed that the interrupt shadow is toggled, but it is not considered in the single-step #DB injection. This oversight causes VM entry to fail on VMX (due to pending debug exceptions checking) or breaks the 'MOV SS' suppressed #DB. For the latter, I have kept the behavior for now in my patchset, as I need some suggestions. Hou Wenlong (7): KVM: x86: Set guest DR6 by kvm_queue_exception_p() in instruction emulation KVM: x86: Check guest debug in DR access instruction emulation KVM: x86: Only check effective code breakpoint in emulation KVM: x86: Consolidate KVM_GUESTDBG_SINGLESTEP check into the kvm_inject_emulated_db() KVM: VMX: Set 'BS' bit in pending debug exceptions during instruction emulation KVM: selftests: Verify guest debug DR7.GD checking during instruction emulation KVM: selftests: Verify 'BS' bit checking in pending debug exception during VM entry arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/emulate.c | 14 +-- arch/x86/kvm/kvm_emulate.h | 7 +- arch/x86/kvm/vmx/main.c | 9 ++ arch/x86/kvm/vmx/vmx.c | 14 ++- arch/x86/kvm/vmx/x86_ops.h | 1 + arch/x86/kvm/x86.c | 109 +++++++++++------- arch/x86/kvm/x86.h | 7 ++ .../selftests/kvm/include/x86/processor.h | 3 +- tools/testing/selftests/kvm/x86/debug_regs.c | 64 +++++++++- 11 files changed, 167 insertions(+), 63 deletions(-) base-commit: ecbcc2461839e848970468b44db32282e5059925 -- 2.31.1

1 week, 3 days

2
6
0 0

[PATCH v4 0/3] VMM can handle guest SEA via KVM_EXIT_ARM_SEA

by Jiaqi Yan

Problem ======= When host APEI is unable to claim a synchronous external abort (SEA) during guest abort, today KVM directly injects an asynchronous SError into the VCPU then resumes it. The injected SError usually results in unpleasant guest kernel panic. One of the major situation of guest SEA is when VCPU consumes recoverable uncorrected memory error (UER), which is not uncommon at all in modern datacenter servers with large amounts of physical memory. Although SError and guest panic is sufficient to stop the propagation of corrupted memory, there is room to recover from an UER in a more graceful manner. Proposed Solution ================= The idea is, we can replay the SEA to the faulting VCPU. If the memory error consumption or the fault that cause SEA is not from guest kernel, the blast radius can be limited to the poison-consuming guest process, while the VM can keep running. In addition, instead of doing under the hood without involving userspace, there are benefits to redirect the SEA to VMM: - VM customers care about the disruptions caused by memory errors, and VMM usually has the responsibility to start the process of notifying the customers of memory error events in their VMs. For example some cloud provider emits a critical log in their observability UI [1], and provides a playbook for customers on how to mitigate disruptions to their workloads. - VMM can protect future memory error consumption by unmapping the poisoned pages from stage-2 page table with KVM userfault [2], or by splitting the memslot that contains the poisoned pages. - VMM can keep track of SEA events in the VM. When VMM thinks the status on the host or the VM is bad enough, e.g. number of distinct SEAs exceeds a threshold, it can restart the VM on another healthy host. - Behavior parity with x86 architecture. When machine check exception (MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to let VMM either recover from the MCE, or terminate itself with VM. The prior RFC proposes to implement SIGBUS on arm64 as well, but Marc preferred KVM exit over signal [3]. However, implementation aside, returning SEA to VMM is on par with returning MCE to VMM. Once SEA is redirected to VMM, among other actions, VMM is encouraged to inject external aborts into the faulting VCPU. New UAPIs ========= This patchset introduces following userspace-visible changes to empower VMM to control what happens for SEA on guest memory: - KVM_CAP_ARM_SEA_TO_USER. While taking SEA, if userspace has enabled this new capability at VM creation, and the SEA is not owned by kernel allocated memory, instead of injecting SError, return KVM_EXIT_ARM_SEA to userspace. - KVM_EXIT_ARM_SEA. This is the VM exit reason VMM gets. The details about the SEA is provided in arm_sea as much as possible, including sanitized ESR value at EL2, faulting guest virtual and physical addresses if available. * From v3 [4] - Rebased on commit 3a8660878839 ("Linux 6.18-rc1"). - In selftest, print a message if GVA or GPA expects to be valid. * From v2 [5]: - Rebased on "[PATCH] KVM: arm64: nv: Handle SEAs due to VNCR redirection" [6] and kvmarm/next commit 7b8346bd9fce6 ("KVM: arm64: Don't attempt vLPI mappings when vPE allocation is disabled") - Took the host_owns_sea implementation from Oliver [7, 8]. - Excluded the guest SEA injection patches. - Updated selftest. * From v1 [9]: - Rebased on commit 4d62121ce9b5 ("KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer"). - Sanitize ESR_EL2 before reporting it to userspace. - Do not do KVM_EXIT_ARM_SEA when SEA is caused by memory allocated to stage-2 translation table. [1] https://cloud.google.com/solutions/sap/docs/manage-host-errors [2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com [3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org [4] https://lore.kernel.org/kvmarm/20250731205844.1346839-1-jiaqiyan@google.com [5] https://lore.kernel.org/kvm/20250604050902.3944054-1-jiaqiyan@google.com [6] https://lore.kernel.org/kvmarm/20250729182342.3281742-1-oliver.upton@linux.… [7] https://lore.kernel.org/kvm/aHFohmTb9qR_JG1E@linux.dev [8] https://lore.kernel.org/kvm/aHK-DPufhLy5Dtuk@linux.dev [9] https://lore.kernel.org/kvm/20250505161412.1926643-1-jiaqiyan@google.com Jiaqi Yan (3): KVM: arm64: VM exit to userspace to handle SEA KVM: selftests: Test for KVM_EXIT_ARM_SEA Documentation: kvm: new UAPI for handling SEA Documentation/virt/kvm/api.rst | 61 ++++ arch/arm64/include/asm/kvm_host.h | 2 + arch/arm64/kvm/arm.c | 5 + arch/arm64/kvm/mmu.c | 68 +++- include/uapi/linux/kvm.h | 10 + tools/arch/arm64/include/asm/esr.h | 2 + tools/testing/selftests/kvm/Makefile.kvm | 1 + .../testing/selftests/kvm/arm64/sea_to_user.c | 331 ++++++++++++++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 1 + 9 files changed, 480 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c -- 2.51.0.760.g7b8bcc2412-goog

2 weeks, 2 days

9
22
0 0

[PATCH v5 0/7] platform/chrome: Fix a possible UAF via revocable

by Tzung-Bi Shih

This is a follow-up series of [1]. It tries to fix a possible UAF in the fops of cros_ec_chardev after the underlying protocol device has gone by using revocable. The 1st patch introduces the revocable which is an implementation of ideas from the talk [2]. The 2nd and 3rd patches add test cases for revocable in Kunit and selftest. The 4th patch converts existing protocol devices to resource providers of cros_ec_device. The 5th - 7th are PoC patches for showing the use case of "Replace file operations" below. --- I came out with 2 possible usages of revocable. 1. Use primitive APIs Use the primitive APIs of revocable directly. The file operations make sure the resources are available when using them. This is what the series original proposed[3][4]. Even though it has the finest grain for accessing the resources, it makes the user code verbose. Per feedback from the community, I'm looking for some subsystem level helpers so that user code can be simlper. 2. Replace file operations Replace filp->f_op to revocable-aware warppers. The warppers make sure the resources are available in the file operations. The user code needs to provide a callback .try_access() to tell the wrappers where/how to *save* the pointers of resources. Known drawback: - The warppers reserve the resources for all file operations even if they might be unused. - The user code still needs to be revocable-aware. - The whole file operation becomes a SRCU read-side critical section. Are there any functions can't be called in the critical section? If there is, the file operations may not be awared of that. See 5th - 7th patches for an example usage. [1] https://lore.kernel.org/chrome-platform/20250721044456.2736300-6-tzungbi@ke… [2] https://lpc.events/event/17/contributions/1627/ [3] https://lore.kernel.org/chrome-platform/20250912081718.3827390-5-tzungbi@ke… [4] https://lore.kernel.org/chrome-platform/20250912081718.3827390-6-tzungbi@ke… v5: - Rebase onto next-20251015. - Add more context about the PoC. - Support multiple revocable providers in the PoC. v4: https://lore.kernel.org/chrome-platform/20250923075302.591026-1-tzungbi@ker… - Rebase onto next-20250922. - Remove the 5th patch from v3. - Add fops replacement PoC in 5th - 7th patches. v3: https://lore.kernel.org/chrome-platform/20250912081718.3827390-1-tzungbi@ke… - Rebase onto https://lore.kernel.org/chrome-platform/20250828083601.856083-1-tzungbi@ker… and next-20250912. - The 4th patch changed accordingly. v2: https://lore.kernel.org/chrome-platform/20250820081645.847919-1-tzungbi@ker… - Rename "ref_proxy" -> "revocable". - Add test cases in Kunit and selftest. v1: https://lore.kernel.org/chrome-platform/20250814091020.1302888-1-tzungbi@ke… Tzung-Bi Shih (7): revocable: Revocable resource management revocable: Add Kunit test cases selftests: revocable: Add kselftest cases platform/chrome: Protect cros_ec_device lifecycle with revocable revocable: Add fops replacement char: misc: Leverage revocable fops replacement platform/chrome: cros_ec_chardev: Secure cros_ec_device via revocable .../driver-api/driver-model/index.rst | 1 + .../driver-api/driver-model/revocable.rst | 87 +++++++ MAINTAINERS | 9 + drivers/base/Kconfig | 8 + drivers/base/Makefile | 5 +- drivers/base/revocable.c | 233 ++++++++++++++++++ drivers/base/revocable_test.c | 110 +++++++++ drivers/char/misc.c | 8 + drivers/platform/chrome/cros_ec.c | 5 + drivers/platform/chrome/cros_ec_chardev.c | 22 +- fs/Makefile | 2 +- fs/fs_revocable.c | 154 ++++++++++++ include/linux/fs.h | 2 + include/linux/fs_revocable.h | 21 ++ include/linux/miscdevice.h | 4 + include/linux/platform_data/cros_ec_proto.h | 4 + include/linux/revocable.h | 53 ++++ tools/testing/selftests/Makefile | 1 + .../selftests/drivers/base/revocable/Makefile | 7 + .../drivers/base/revocable/revocable_test.c | 116 +++++++++ .../drivers/base/revocable/test-revocable.sh | 39 +++ .../base/revocable/test_modules/Makefile | 10 + .../revocable/test_modules/revocable_test.c | 188 ++++++++++++++ 23 files changed, 1086 insertions(+), 3 deletions(-) create mode 100644 Documentation/driver-api/driver-model/revocable.rst create mode 100644 drivers/base/revocable.c create mode 100644 drivers/base/revocable_test.c create mode 100644 fs/fs_revocable.c create mode 100644 include/linux/fs_revocable.h create mode 100644 include/linux/revocable.h create mode 100644 tools/testing/selftests/drivers/base/revocable/Makefile create mode 100644 tools/testing/selftests/drivers/base/revocable/revocable_test.c create mode 100755 tools/testing/selftests/drivers/base/revocable/test-revocable.sh create mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/Makefile create mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/revocable_test.c -- 2.51.0.788.g6d19910ace-goog

2 weeks, 2 days

7
41
0 0

[PATCH kvm-next V11 0/7] Add NUMA mempolicy support for KVM guest-memfd

by Shivank Garg

This series introduces NUMA-aware memory placement support for KVM guests with guest_memfd memory backends. It builds upon Fuad Tabba's work (V17) that enabled host-mapping for guest_memfd memory [1] and can be applied directly applied on KVM tree [2] (branch kvm-next, base commit: a6ad5413, Merge branch 'guest-memfd-mmap' into HEAD) == Background == KVM's guest-memfd memory backend currently lacks support for NUMA policy enforcement, causing guest memory allocations to be distributed across host nodes according to kernel's default behavior, irrespective of any policy specified by the VMM. This limitation arises because conventional userspace NUMA control mechanisms like mbind(2) don't work since the memory isn't directly mapped to userspace when allocations occur. Fuad's work [1] provides the necessary mmap capability, and this series leverages it to enable mbind(2). == Implementation == This series implements proper NUMA policy support for guest-memfd by: 1. Adding mempolicy-aware allocation APIs to the filemap layer. 2. Introducing custom inodes (via a dedicated slab-allocated inode cache, kvm_gmem_inode_info) to store NUMA policy and metadata for guest memory. 3. Implementing get/set_policy vm_ops in guest_memfd to support NUMA policy. With these changes, VMMs can now control guest memory placement by mapping guest_memfd file descriptor and using mbind(2) to specify: - Policy modes: default, bind, interleave, or preferred - Host NUMA nodes: List of target nodes for memory allocation These Policies affect only future allocations and do not migrate existing memory. This matches mbind(2)'s default behavior which affects only new allocations unless overridden with MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags (Not supported for guest_memfd as it is unmovable by design). == Upstream Plan == Phased approach as per David's guest_memfd extension overview [3] and community calls [4]: Phase 1 (this series): 1. Focuses on shared guest_memfd support (non-CoCo VMs). 2. Builds on Fuad's host-mapping work [1]. Phase2 (future work): 1. NUMA support for private guest_memfd (CoCo VMs). 2. Depends on SNP in-place conversion support [5]. This series provides a clean integration path for NUMA-aware memory management for guest_memfd and lays the groundwork for future confidential computing NUMA capabilities. Thanks, Shivank == Changelog == - v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy. - v3: Introduced fbind() syscall for VMM memory-placement configuration. - v4-v6: Current approach using shared_policy support and vm_ops (based on suggestions from David [6] and guest_memfd bi-weekly upstream call discussion [7]). - v7: Use inodes to store NUMA policy instead of file [8]. - v8: Rebase on top of Fuad's V12: Host mmaping for guest_memfd memory. - v9: Rebase on top of Fuad's V13 and incorporate review comments - V10: Rebase on top of Fuad's V17. Use latest guest_memfd inode patch from Ackerley (with David's review comments). Use newer kmem_cache_create() API variant with arg parameter (Vlastimil) - V11: Rebase on kvm-next, remove RFC tag, use Ackerley's latest patch and fix a rcu race bug during kvm module unload. [1] https://lore.kernel.org/all/20250729225455.670324-1-seanjc@google.com [2] https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=next [3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com [4] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAo… [5] https://lore.kernel.org/all/20250613005400.3694904-1-michael.roth@amd.com [6] https://lore.kernel.org/all/6fbef654-36e2-4be5-906e-2a648a845278@redhat.com [7] https://lore.kernel.org/all/2b77e055-98ac-43a1-a7ad-9f9065d7f38f@amd.com [8] https://lore.kernel.org/all/diqzbjumm167.fsf@ackerleytng-ctop.c.googlers.com Ackerley Tng (1): KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes Matthew Wilcox (Oracle) (2): mm/filemap: Add NUMA mempolicy support to filemap_alloc_folio() mm/filemap: Extend __filemap_get_folio() to support NUMA memory policies Shivank Garg (4): mm/mempolicy: Export memory policy symbols KVM: guest_memfd: Add slab-allocated inode cache KVM: guest_memfd: Enforce NUMA mempolicy using shared policy KVM: guest_memfd: selftests: Add tests for mmap and NUMA policy support fs/bcachefs/fs-io-buffered.c | 2 +- fs/btrfs/compression.c | 4 +- fs/btrfs/verity.c | 2 +- fs/erofs/zdata.c | 2 +- fs/f2fs/compress.c | 2 +- include/linux/pagemap.h | 18 +- include/uapi/linux/magic.h | 1 + mm/filemap.c | 23 +- mm/mempolicy.c | 6 + mm/readahead.c | 2 +- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../testing/selftests/kvm/guest_memfd_test.c | 121 ++++++++ virt/kvm/guest_memfd.c | 262 ++++++++++++++++-- virt/kvm/kvm_main.c | 7 +- virt/kvm/kvm_mm.h | 9 +- 15 files changed, 412 insertions(+), 50 deletions(-) -- 2.43.0 --- == Earlier Postings == v10: https://lore.kernel.org/all/20250811090605.16057-2-shivankg@amd.com v9: https://lore.kernel.org/all/20250713174339.13981-2-shivankg@amd.com v8: https://lore.kernel.org/all/20250618112935.7629-1-shivankg@amd.com v7: https://lore.kernel.org/all/20250408112402.181574-1-shivankg@amd.com v6: https://lore.kernel.org/all/20250226082549.6034-1-shivankg@amd.com v5: https://lore.kernel.org/all/20250219101559.414878-1-shivankg@amd.com v4: https://lore.kernel.org/all/20250210063227.41125-1-shivankg@amd.com v3: https://lore.kernel.org/all/20241105164549.154700-1-shivankg@amd.com v2: https://lore.kernel.org/all/20240919094438.10987-1-shivankg@amd.com v1: https://lore.kernel.org/all/20240916165743.201087-1-shivankg@amd.com

2 weeks, 5 days

10
38
0 0

[PATCH v7 00/12] Direct Map Removal Support for guest_memfd

by Patrick Roy

From: Patrick Roy <roypat(a)amazon.co.uk> [ based on kvm/next ] Unmapping virtual machine guest memory from the host kernel's direct map is a successful mitigation against Spectre-style transient execution issues: If the kernel page tables do not contain entries pointing to guest memory, then any attempted speculative read through the direct map will necessarily be blocked by the MMU before any observable microarchitectural side-effects happen. This means that Spectre-gadgets and similar cannot be used to target virtual machine memory. Roughly 60% of speculative execution issues fall into this category [1, Table 1]. This patch series extends guest_memfd with the ability to remove its memory from the host kernel's direct map, to be able to attain the above protection for KVM guests running inside guest_memfd. Additionally, a Firecracker branch with support for these VMs can be found on GitHub [2]. For more details, please refer to the v5 cover letter [v5]. No substantial changes in design have taken place since. === Changes Since v6 === - Drop patch for passing struct address_space to ->free_folio(), due to possible races with freeing of the address_space. (Hugh) - Stop using PG_uptodate / gmem preparedness tracking to keep track of direct map state. Instead, use the lowest bit of folio->private. (Mike, David) - Do direct map removal when establishing mapping of gmem folio instead of at allocation time, due to impossibility of handling direct map removal errors in kvm_gmem_populate(). (Patrick) - Do TLB flushes after direct map removal, and provide a module parameter to opt out from them, and a new patch to export flush_tlb_kernel_range() to KVM. (Will) [1]: https://download.vusec.net/papers/quarantine_raid23.pdf [2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hidi… [RFCv1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/ [RFCv2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/ [RFCv3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/ [v4]: https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk/ [v5]: https://lore.kernel.org/kvm/20250828093902.2719-1-roypat@amazon.co.uk/ [v6]: https://lore.kernel.org/kvm/20250912091708.17502-1-roypat@amazon.co.uk/ Patrick Roy (12): arch: export set_direct_map_valid_noflush to KVM module x86/tlb: export flush_tlb_kernel_range to KVM module mm: introduce AS_NO_DIRECT_MAP KVM: guest_memfd: Add stub for kvm_arch_gmem_invalidate KVM: guest_memfd: Add flag to remove from direct map KVM: guest_memfd: add module param for disabling TLB flushing KVM: selftests: load elf via bounce buffer KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1 KVM: selftests: Add guest_memfd based vm_mem_backing_src_types KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing selftests KVM: selftests: stuff vm_mem_backing_src_type into vm_shape KVM: selftests: Test guest execution from direct map removed gmem Documentation/virt/kvm/api.rst | 5 ++ arch/arm64/include/asm/kvm_host.h | 12 ++++ arch/arm64/mm/pageattr.c | 1 + arch/loongarch/mm/pageattr.c | 1 + arch/riscv/mm/pageattr.c | 1 + arch/s390/mm/pageattr.c | 1 + arch/x86/include/asm/tlbflush.h | 3 +- arch/x86/mm/pat/set_memory.c | 1 + arch/x86/mm/tlb.c | 1 + include/linux/kvm_host.h | 9 +++ include/linux/pagemap.h | 16 +++++ include/linux/secretmem.h | 18 ----- include/uapi/linux/kvm.h | 2 + lib/buildid.c | 4 +- mm/gup.c | 19 ++---- mm/mlock.c | 2 +- mm/secretmem.c | 8 +-- .../testing/selftests/kvm/guest_memfd_test.c | 2 + .../testing/selftests/kvm/include/kvm_util.h | 37 ++++++++--- .../testing/selftests/kvm/include/test_util.h | 8 +++ tools/testing/selftests/kvm/lib/elf.c | 8 +-- tools/testing/selftests/kvm/lib/io.c | 23 +++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 61 +++++++++-------- tools/testing/selftests/kvm/lib/test_util.c | 8 +++ tools/testing/selftests/kvm/lib/x86/sev.c | 1 + .../selftests/kvm/pre_fault_memory_test.c | 1 + .../selftests/kvm/set_memory_region_test.c | 50 ++++++++++++-- .../kvm/x86/private_mem_conversions_test.c | 7 +- virt/kvm/guest_memfd.c | 66 +++++++++++++++++-- virt/kvm/kvm_main.c | 8 +++ 30 files changed, 290 insertions(+), 94 deletions(-) base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383 -- 2.51.0

3 weeks, 2 days

11
54
0 0

[PATCH] selftests/seccomp: fix pointer type mismatch in UPROBE test

by Nirbhay Sharma

Fix compilation error in UPROBE_setup caused by pointer type mismatch in ternary expression. The probed_uretprobe and probed_uprobe function pointers have different type attributes (__attribute__((nocf_check))), which causes the conditional operator to fail with: seccomp_bpf.c:5175:74: error: pointer type mismatch in conditional expression [-Wincompatible-pointer-types] Cast both function pointers to 'const void *' to match the expected parameter type of get_uprobe_offset(), resolving the type mismatch while preserving the function selection logic. Signed-off-by: Nirbhay Sharma <nirbhay.lkd(a)gmail.com> --- tools/testing/selftests/seccomp/seccomp_bpf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 874f17763536..e13ffe18ef95 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -5172,7 +5172,8 @@ FIXTURE_SETUP(UPROBE) ASSERT_GE(bit, 0); } - offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe); + offset = get_uprobe_offset(variant->uretprobe ? + (const void *)probed_uretprobe : (const void *)probed_uprobe); ASSERT_GE(offset, 0); if (variant->uretprobe) -- 2.48.1

3 weeks, 4 days

4
8
0 0

[PATCH v12 00/23] TDX KVM selftests

by Sagi Shahar

This is v12 of the TDX selftests. This series is based on v6.18-rc3 Changes from v11 [1]: - Rebased on top of v6.18-rc3. - Hook vm_tdx_finalize() into kvm_arch_vm_finalize_vcpus instead of calling it as part of vm_tdx_create_with_one_vcpu. See "KVM: selftests: Finalize TD memory as part of kvm_arch_vm_finalize_vcpus" which was added to this series. - Replaced vm_tdx_create_with_one_vcpu with vm_create_shape_with_one_vcpu following Sean's patch to simplify creating VM shapes. [1] https://lore.kernel.org/lkml/20250925172851.606193-1-sagis@google.com/ Ackerley Tng (2): KVM: selftests: Add helpers to init TDX memory and finalize VM KVM: selftests: Add ucall support for TDX Erdem Aktas (2): KVM: selftests: Add TDX boot code KVM: selftests: Add support for TDX TDCALL from guest Isaku Yamahata (2): KVM: selftests: Update kvm_init_vm_address_properties() for TDX KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs' attribute configuration Sagi Shahar (16): KVM: selftests: Allocate pgd in virt_map() as necessary KVM: selftests: Expose functions to get default sregs values KVM: selftests: Expose function to allocate guest vCPU stack KVM: selftests: Expose segment definitons to assembly files KVM: selftests: Add kbuild definitons KVM: selftests: Define structs to pass parameters to TDX boot code KVM: selftests: Set up TDX boot code region KVM: selftests: Set up TDX boot parameters region KVM: selftests: Add helper to initialize TDX VM KVM: selftests: Call TDX init when creating a new TDX vm KVM: selftests: Setup memory regions for TDX on vm creation KVM: selftests: Call KVM_TDX_INIT_VCPU when creating a new TDX vcpu KVM: selftests: Set entry point for TDX guest code KVM: selftests: Finalize TD memory as part of kvm_arch_vm_finalize_vcpus KVM: selftests: Add wrapper for TDX MMIO from guest KVM: selftests: Add TDX lifecycle test Sean Christopherson (1): KVM: selftests: Add macros so simplify creating VM shapes for non-default types tools/include/linux/kbuild.h | 18 + tools/testing/selftests/kvm/Makefile.kvm | 32 ++ .../testing/selftests/kvm/include/kvm_util.h | 14 + .../selftests/kvm/include/ucall_common.h | 1 + .../selftests/kvm/include/x86/processor.h | 40 +++ .../selftests/kvm/include/x86/processor_asm.h | 12 + tools/testing/selftests/kvm/include/x86/sev.h | 2 - .../selftests/kvm/include/x86/tdx/td_boot.h | 74 ++++ .../kvm/include/x86/tdx/td_boot_asm.h | 16 + .../selftests/kvm/include/x86/tdx/tdcall.h | 34 ++ .../selftests/kvm/include/x86/tdx/tdx.h | 14 + .../selftests/kvm/include/x86/tdx/tdx_util.h | 84 +++++ .../testing/selftests/kvm/include/x86/ucall.h | 6 - tools/testing/selftests/kvm/lib/kvm_util.c | 10 +- .../testing/selftests/kvm/lib/x86/processor.c | 99 ++++-- tools/testing/selftests/kvm/lib/x86/sev.c | 16 - .../selftests/kvm/lib/x86/tdx/td_boot.S | 60 ++++ .../kvm/lib/x86/tdx/td_boot_offsets.c | 21 ++ .../selftests/kvm/lib/x86/tdx/tdcall.S | 93 +++++ .../kvm/lib/x86/tdx/tdcall_offsets.c | 16 + tools/testing/selftests/kvm/lib/x86/tdx/tdx.c | 23 ++ .../selftests/kvm/lib/x86/tdx/tdx_util.c | 330 ++++++++++++++++++ tools/testing/selftests/kvm/lib/x86/ucall.c | 46 ++- .../selftests/kvm/x86/sev_smoke_test.c | 40 +-- tools/testing/selftests/kvm/x86/tdx_vm_test.c | 33 ++ 25 files changed, 1056 insertions(+), 78 deletions(-) create mode 100644 tools/include/linux/kbuild.h create mode 100644 tools/testing/selftests/kvm/include/x86/processor_asm.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdcall.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx.h create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.h create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot_offsets.c create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall.S create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall_offsets.c create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c create mode 100644 tools/testing/selftests/kvm/x86/tdx_vm_test.c -- 2.51.1.851.g4ebd6896fd-goog

3 weeks, 6 days

5
70
0 0

[PATCH v8 00/29] KVM: arm64: Implement support for SME

by Mark Brown

I've removed the RFC tag from this version of the series, but the items that I'm looking for feedback on remains the same: - The userspace ABI, in particular: - The vector length used for the SVE registers, access to the SVE registers and access to ZA and (if available) ZT0 depending on the current state of PSTATE.{SM,ZA}. - The use of a single finalisation for both SVE and SME. - The addition of control for enabling fine grained traps in a similar manner to FGU but without the UNDEF, I'm not clear if this is desired at all and at present this requires symmetric read and write traps like FGU. That seemed like it might be desired from an implementation point of view but we already have one case where we enable an asymmetric trap (for ARM64_WORKAROUND_AMPERE_AC03_CPU_38) and it seems generally useful to enable asymmetrically. This series implements support for SME use in non-protected KVM guests. Much of this is very similar to SVE, the main additional challenge that SME presents is that it introduces a new vector length similar to the SVE vector length and two new controls which change the registers seen by guests: - PSTATE.ZA enables the ZA matrix register and, if SME2 is supported, the ZT0 LUT register. - PSTATE.SM enables streaming mode, a new floating point mode which uses the SVE register set with the separately configured SME vector length. In streaming mode implementation of the FFR register is optional. It is also permitted to build systems which support SME without SVE, in this case when not in streaming mode no SVE registers or instructions are available. Further, there is no requirement that there be any overlap in the set of vector lengths supported by SVE and SME in a system, this is expected to be a common situation in practical systems. Since there is a new vector length to configure we introduce a new feature parallel to the existing SVE one with a new pseudo register for the streaming mode vector length. Due to the overlap with SVE caused by streaming mode rather than finalising SME as a separate feature we use the existing SVE finalisation to also finalise SME, a new define KVM_ARM_VCPU_VEC is provided to help make user code clearer. Finalising SVE and SME separately would introduce complication with register access since finalising SVE makes the SVE registers writeable by userspace and doing multiple finalisations results in an error being reported. Dealing with a state where the SVE registers are writeable due to one of SVE or SME being finalised but may have their VL changed by the other being finalised seems like needless complexity with minimal practical utility, it seems clearer to just express directly that only one finalisation can be done in the ABI. Access to the floating point registers follows the architecture: - When both SVE and SME are present: - If PSTATE.SM == 0 the vector length used for the Z and P registers is the SVE vector length. - If PSTATE.SM == 1 the vector length used for the Z and P registers is the SME vector length. - If only SME is present: - If PSTATE.SM == 0 the Z and P registers are inaccessible and the floating point state accessed via the encodings for the V registers. - If PSTATE.SM == 1 the vector length used for the Z and P registers - The SME specific ZA and ZT0 registers are only accessible if SVCR.ZA is 1. The VMM must understand this, in particular when loading state SVCR should be configured before other state. It should be noted that while the architecture refers to PSTATE.SM and PSTATE.ZA these PSTATE bits are not preserved in SPSR_ELx, they are only accessible via SVCR. There are a large number of subfeatures for SME, most of which only offer additional instructions but some of which (SME2 and FA64) add architectural state. These are configured via the ID registers as per usual. Protected KVM supported, with the implementation maintaining the existing restriction that the hypervisor will refuse to run if streaming mode or ZA is enabled. This both simplfies the code and avoids the need to allocate storage for host ZA and ZT0 state, there seems to be little practical use case for supporting this and the memory usage would be non-trivial. The new KVM_ARM_VCPU_VEC feature and ZA and ZT0 registers have not been added to the get-reg-list selftest, the idea of supporting additional features there without restructuring the program to generate all possible feature combinations has been rejected. I will post a separate series which does that restructuring. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Changes in v8: - Small fixes in ABI documentation. - Link to v7: https://lore.kernel.org/r/20250822-kvm-arm64-sme-v7-0-7a65d82b8b10@kernel.o… Changes in v7: - Rebase onto v6.17-rc1. - Handle SMIDR_EL1 as a VM wide ID register and use this in feat_sme_smps(). - Expose affinity fields in SMIDR_EL1. - Remove SMPRI_EL1 from vcpu_sysreg, the value is always 0 currently. - Prevent userspace writes to SMPRIMAP_EL2. - Link to v6: https://lore.kernel.org/r/20250625-kvm-arm64-sme-v6-0-114cff4ffe04@kernel.o… Changes in v6: - Rebase onto v6.16-rc3. - Link to v5: https://lore.kernel.org/r/20250417-kvm-arm64-sme-v5-0-f469a2d5f574@kernel.o… Changes in v5: - Rebase onto v6.15-rc2. - Add pKVM guest support. - Always restore SVCR. - Link to v4: https://lore.kernel.org/r/20250214-kvm-arm64-sme-v4-0-d64a681adcc2@kernel.o… Changes in v4: - Rebase onto v6.14-rc2 and Mark Rutland's fixes. - Expose SME to nested guests. - Additional cleanups and test fixes following on from the rebase. - Flush register state on VMM PSTATE.{SM,ZA}. - Link to v3: https://lore.kernel.org/r/20241220-kvm-arm64-sme-v3-0-05b018c1ffeb@kernel.o… Changes in v3: - Rebase onto v6.12-rc2. - Link to v2: https://lore.kernel.org/r/20231222-kvm-arm64-sme-v2-0-da226cb180bb@kernel.o… Changes in v2: - Rebase onto v6.7-rc3. - Configure subfeatures based on host system only. - Complete nVHE support. - There was some snafu with sending v1 out, it didn't make it to the lists but in case it hit people's inboxes I'm sending as v2. --- Mark Brown (29): arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06 arm64/fpsimd: Update FA64 and ZT0 enables when loading SME state arm64/fpsimd: Decide to save ZT0 and streaming mode FFR at bind time arm64/fpsimd: Check enable bit for FA64 when saving EFI state arm64/fpsimd: Determine maximum virtualisable SME vector length KVM: arm64: Introduce non-UNDEF FGT control KVM: arm64: Pay attention to FFR parameter in SVE save and load KVM: arm64: Pull ctxt_has_ helpers to start of sysreg-sr.h KVM: arm64: Move SVE state access macros after feature test macros KVM: arm64: Rename SVE finalization constants to be more general KVM: arm64: Document the KVM ABI for SME KVM: arm64: Define internal features for SME KVM: arm64: Rename sve_state_reg_region KVM: arm64: Store vector lengths in an array KVM: arm64: Implement SME vector length configuration KVM: arm64: Support SME control registers KVM: arm64: Support TPIDR2_EL0 KVM: arm64: Support SME identification registers for guests KVM: arm64: Support SME priority registers KVM: arm64: Provide assembly for SME register access KVM: arm64: Support userspace access to streaming mode Z and P registers KVM: arm64: Flush register state on writes to SVCR.SM and SVCR.ZA KVM: arm64: Expose SME specific state to userspace KVM: arm64: Context switch SME state for guests KVM: arm64: Handle SME exceptions KVM: arm64: Expose SME to nested guests KVM: arm64: Provide interface for configuring and enabling SME for guests KVM: arm64: selftests: Add SME system registers to get-reg-list KVM: arm64: selftests: Add SME to set_id_regs test Documentation/virt/kvm/api.rst | 115 ++++++++--- arch/arm64/include/asm/fpsimd.h | 26 +++ arch/arm64/include/asm/kvm_emulate.h | 6 + arch/arm64/include/asm/kvm_host.h | 169 ++++++++++++--- arch/arm64/include/asm/kvm_hyp.h | 5 +- arch/arm64/include/asm/kvm_pkvm.h | 2 +- arch/arm64/include/asm/vncr_mapping.h | 2 + arch/arm64/include/uapi/asm/kvm.h | 33 +++ arch/arm64/kernel/cpufeature.c | 2 - arch/arm64/kernel/fpsimd.c | 89 ++++---- arch/arm64/kvm/arm.c | 10 + arch/arm64/kvm/config.c | 8 +- arch/arm64/kvm/fpsimd.c | 28 ++- arch/arm64/kvm/guest.c | 252 ++++++++++++++++++++--- arch/arm64/kvm/handle_exit.c | 14 ++ arch/arm64/kvm/hyp/fpsimd.S | 28 ++- arch/arm64/kvm/hyp/include/hyp/switch.h | 175 ++++++++++++++-- arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 110 ++++++---- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 86 ++++++-- arch/arm64/kvm/hyp/nvhe/pkvm.c | 85 ++++++-- arch/arm64/kvm/hyp/nvhe/switch.c | 4 +- arch/arm64/kvm/hyp/nvhe/sys_regs.c | 6 + arch/arm64/kvm/hyp/vhe/switch.c | 17 +- arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 7 + arch/arm64/kvm/nested.c | 3 +- arch/arm64/kvm/reset.c | 156 ++++++++++---- arch/arm64/kvm/sys_regs.c | 141 ++++++++++++- arch/arm64/tools/sysreg | 8 +- include/uapi/linux/kvm.h | 1 + tools/testing/selftests/kvm/arm64/get-reg-list.c | 15 +- tools/testing/selftests/kvm/arm64/set_id_regs.c | 27 ++- 31 files changed, 1327 insertions(+), 303 deletions(-) --- base-commit: 062b3e4a1f880f104a8d4b90b767788786aa7b78 change-id: 20230301-kvm-arm64-sme-06a1246d3636 Best regards, -- Mark Brown <broonie(a)kernel.org>

1 month

4
37
0 0

[RFC PATCH v2 0/3] Add testable code specifications

by Gabriele Paoloni

[1] was an initial proposal defining testable code specifications for some functions in /drivers/char/mem.c. However a Guideline to write such specifications was missing and test cases tracing to such specifications were missing. This patchset represents a next step and is organised as follows: - patch 1/3 contains the Guideline for writing code specifications - patch 2/3 contains examples of code specfications defined for some functions of drivers/char/mem.c - patch 3/3 contains examples of selftests that map to some code specifications of patch 2/3 [1] https://lore.kernel.org/all/20250821170419.70668-1-gpaoloni@redhat.com/ --- Changes from v1: 1) Added a Guideline to write code specifications in the Linux Kernel Documentation 2) Addressed Greg KH comments in /drivers/char/mem.c 3) Added example of test cases mapping to the code specifications in /drivers/char/mem.c --- Alessandro Carminati (1): selftests/devmem: initial testset Gabriele Paoloni (2): Documentation: add guidelines for writing testable code specifications /dev/mem: Add initial documentation of memory_open() and mem_fops .../doc-guide/code-specifications.rst | 208 +++++++ Documentation/doc-guide/index.rst | 1 + drivers/char/mem.c | 231 ++++++- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/devmem/Makefile | 13 + tools/testing/selftests/devmem/debug.c | 25 + tools/testing/selftests/devmem/debug.h | 14 + tools/testing/selftests/devmem/devmem.c | 200 ++++++ tools/testing/selftests/devmem/ram_map.c | 250 ++++++++ tools/testing/selftests/devmem/ram_map.h | 38 ++ tools/testing/selftests/devmem/secret.c | 46 ++ tools/testing/selftests/devmem/secret.h | 13 + tools/testing/selftests/devmem/tests.c | 569 ++++++++++++++++++ tools/testing/selftests/devmem/tests.h | 45 ++ tools/testing/selftests/devmem/utils.c | 379 ++++++++++++ tools/testing/selftests/devmem/utils.h | 119 ++++ 16 files changed, 2146 insertions(+), 6 deletions(-) create mode 100644 Documentation/doc-guide/code-specifications.rst create mode 100644 tools/testing/selftests/devmem/Makefile create mode 100644 tools/testing/selftests/devmem/debug.c create mode 100644 tools/testing/selftests/devmem/debug.h create mode 100644 tools/testing/selftests/devmem/devmem.c create mode 100644 tools/testing/selftests/devmem/ram_map.c create mode 100644 tools/testing/selftests/devmem/ram_map.h create mode 100644 tools/testing/selftests/devmem/secret.c create mode 100644 tools/testing/selftests/devmem/secret.h create mode 100644 tools/testing/selftests/devmem/tests.c create mode 100644 tools/testing/selftests/devmem/tests.h create mode 100644 tools/testing/selftests/devmem/utils.c create mode 100644 tools/testing/selftests/devmem/utils.h -- 2.48.1

1 month

7
23
0 0

[PATCH 00/32] ns: support file handles

by Christian Brauner

For a while now we have supported file handles for pidfds. This has proven to be very useful. Extend the concept to cover namespaces as well. After this patchset it is possible to encode and decode namespace file handles using the commong name_to_handle_at() and open_by_handle_at() apis. Namespaces file descriptors can already be derived from pidfds which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory as they could always just derive the namespace based on a pidfd already. It has the same advantage as pidfds. It's possible to reliably and for the lifetime of the system refer to a namespace without pinning any resources and to compare them. Permission checking is kept simple. If the caller is located in the namespace the file handle refers to they are able to open it otherwise they must hold privilege over the owning namespace of the relevant namespace. Both the network namespace and the mount namespace already have an associated cookie that isn't recycled and is fully exposed to userspace. Move this into ns_common and use the same id space for all namespaces so they can trivially and reliably be compared. There's more coming based on the iterator infrastructure but the series is large enough and focuses on file handles. Extensive selftests included. I still have various other test-suites to run but it holds up so far. Signed-off-by: Christian Brauner <brauner(a)kernel.org> --- Christian Brauner (32): pidfs: validate extensible ioctls nsfs: validate extensible ioctls block: use extensible_ioctl_valid() ns: move to_ns_common() to ns_common.h nsfs: add nsfs.h header ns: uniformly initialize ns_common mnt: use ns_common_init() ipc: use ns_common_init() cgroup: use ns_common_init() pid: use ns_common_init() time: use ns_common_init() uts: use ns_common_init() user: use ns_common_init() net: use ns_common_init() ns: remove ns_alloc_inum() nstree: make iterator generic mnt: support iterator cgroup: support iterator ipc: support iterator net: support iterator pid: support iterator time: support iterator userns: support iterator uts: support iterator ns: add to_<type>_ns() to respective headers nsfs: add current_in_namespace() nsfs: support file handles nsfs: support exhaustive file handles nsfs: add missing id retrieval support tools: update nsfs.h uapi header selftests/namespaces: add identifier selftests selftests/namespaces: add file handle selftests block/blk-integrity.c | 8 +- fs/fhandle.c | 6 + fs/internal.h | 1 + fs/mount.h | 10 +- fs/namespace.c | 156 +-- fs/nsfs.c | 266 +++- fs/pidfs.c | 2 +- include/linux/cgroup.h | 5 + include/linux/exportfs.h | 6 + include/linux/fs.h | 14 + include/linux/ipc_namespace.h | 5 + include/linux/ns_common.h | 29 + include/linux/nsfs.h | 40 + include/linux/nsproxy.h | 11 - include/linux/nstree.h | 89 ++ include/linux/pid_namespace.h | 5 + include/linux/proc_ns.h | 32 +- include/linux/time_namespace.h | 9 + include/linux/user_namespace.h | 5 + include/linux/utsname.h | 5 + include/net/net_namespace.h | 6 + include/uapi/linux/fcntl.h | 1 + include/uapi/linux/nsfs.h | 12 +- init/main.c | 2 + ipc/msgutil.c | 1 + ipc/namespace.c | 12 +- ipc/shm.c | 2 + kernel/Makefile | 2 +- kernel/cgroup/cgroup.c | 2 + kernel/cgroup/namespace.c | 24 +- kernel/nstree.c | 233 ++++ kernel/pid_namespace.c | 13 +- kernel/time/namespace.c | 23 +- kernel/user_namespace.c | 17 +- kernel/utsname.c | 28 +- net/core/net_namespace.c | 59 +- tools/include/uapi/linux/nsfs.h | 23 +- tools/testing/selftests/namespaces/.gitignore | 2 + tools/testing/selftests/namespaces/Makefile | 7 + tools/testing/selftests/namespaces/config | 7 + .../selftests/namespaces/file_handle_test.c | 1410 ++++++++++++++++++++ tools/testing/selftests/namespaces/nsid_test.c | 986 ++++++++++++++ 42 files changed, 3306 insertions(+), 270 deletions(-) --- base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585 change-id: 20250905-work-namespace-c68826dda0d4

1 month

15
80
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror October 2025