Linux-kselftest-mirror September 2023

linux-kselftest-mirror@lists.linaro.org

165 participants
257 discussions

[PATCH v4 00/17] iommufd: Add nesting infrastructure

by Yi Liu

Nested translation is a hardware feature supported by many modern IOMMUs. It uses two stages of address translation (stage-1 and stage-2) to reach the physical address. The stage-1 translation table is owned by userspace (e.g. by a guest OS), while the stage-2 table is owned by the kernel. Changes to the stage-1 translation table should be followed by an IOTLB invalidation.

Take Intel VT-d as an example: the stage-1 translation table is the I/O page table. As the diagram below shows, the guest I/O page table pointer in GPA (guest physical address) is passed to the host and used to perform the stage-1 address translation. Along with it, modifications to present mappings in the guest I/O page table should be followed by an IOTLB invalidation.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables
<Intel VT-d Nested translation>

In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt) and each has an iommu_domain allocated from the iommu driver. So in this series hw_pagetable and iommu_domain mean the same thing unless noted otherwise. IOMMUFD already supports allocating a hw_pagetable that is linked with an IOAS. However, nesting requires IOMMUFD to allow allocating a hw_pagetable with driver-specific parameters, plus an interface to sync the stage-1 IOTLB, as the user owns the stage-1 translation table.

This series is based on the iommu hw info reporting series [1]. It first extends domain_alloc_user to allocate domains with user data and adds a new op to invalidate the stage-1 IOTLB for user-managed domains. It then extends the IOMMUFD internal infrastructure to accept user_data and a parent hwpt, and relays the user_data/parent to the iommu core to allocate a user-managed iommu_domain. After that, it extends the ioctl IOMMU_HWPT_ALLOC to accept user data and a stage-2 hwpt ID. Along with it, the ioctl IOMMU_HWPT_INVALIDATE is added to invalidate the stage-1 IOTLB, which is needed for user-managed hwpts. A selftest is added as well to cover the new ioctls.

Complete code can be found in [2]; the QEMU code can be found in [3].

Finally, this is teamwork together with Nicolin Chen and Lu Baolu. Thanks to them for the help. ^_^ Looking forward to your feedback.

[1] https://lore.kernel.org/linux-iommu/20230818101033.4100-1-yi.l.liu@intel.co… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1

Change log:

v4:
 - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs and kernel-managed HWPTs
 - Rework invalidate uAPI to be a multi-request array-based design
 - Add a struct iommu_user_data_array and a helper for driver to sanitize and copy the entry data from user space invalidation array
 - Add a patch fixing TEST_LENGTH() in selftest program
 - Drop IOMMU_RESV_IOVA_RANGES patches
 - Update kdoc and inline comments
 - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation, this does not change the rule that resv regions should only be added to the kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series as it is needed only by SMMU so far.

v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
 - Add new uAPI things in alphabetical order
 - Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for sanity, replacing the previous op->domain_alloc_user_data_len solution
 - Return ERR_PTR from domain_alloc_user instead of NULL
 - Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
 - Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O page table). (Kevin)
 - Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
 - Minor changes per Kevin's inputs

v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
 - Add union iommu_domain_user_data to include all user data structures to avoid passing void * in kernel APIs.
 - Add iommu op to return user data length for user domain allocation
 - Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
 - Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
 - Convert cache_invalidate_user op to be int instead of void
 - Remove @data_type in struct iommu_hwpt_invalidate
 - Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1

v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…

Thanks,
Yi Liu

Lu Baolu (1):
  iommu: Add nested domain support

Nicolin Chen (12):
  iommufd: Unite all kernel-managed members into a struct
  iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  iommufd: Add shared alloc_fn function pointer and mutex pointer
  iommufd: Add user-managed hw_pagetable support
  iommufd: Always setup MSI and anforce cc on kernel-managed domains
  iommufd/device: Add helpers to enforce/remove device reserved regions
  iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
  iommufd/selftest: Add nested domain allocation for mock domain
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
  iommufd/selftest: Add mock_domain_cache_invalidate_user support
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (4):
  iommu: Add hwpt_type with user_data for domain_alloc_user op
  iommufd: Pass in hwpt_type/user_data to iommufd_hw_pagetable_alloc()
  iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  iommufd: Add IOMMU_HWPT_INVALIDATE

 drivers/iommu/intel/iommu.c | 5 +-
 drivers/iommu/iommufd/device.c | 51 +++-
 drivers/iommu/iommufd/hw_pagetable.c | 257 ++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h | 59 +++-
 drivers/iommu/iommufd/iommufd_test.h | 40 +++
 drivers/iommu/iommufd/main.c | 3 +
 drivers/iommu/iommufd/selftest.c | 184 ++++++++++++-
 include/linux/iommu.h | 110 +++++++-
 include/uapi/linux/iommufd.h | 60 +++-
 tools/testing/selftests/iommu/iommufd.c | 209 +++++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c | 3 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 91 ++++++-
 12 files changed, 998 insertions(+), 74 deletions(-)

-- 
2.34.1
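The requirement that a stage-1 change be followed by an IOTLB invalidation can be illustrated with a toy model. The sketch below is not kernel code: the struct, the single-level tables and every toy_* name are assumptions made up for illustration. It only shows that a cached GIOVA->HPA result goes stale when the user-owned stage-1 table changes, until the entry is invalidated.

```c
#include <stdint.h>

/* Toy model only: single-level tables indexed by page number. */
#define TOY_PAGES   8
#define TOY_INVALID UINT32_MAX

struct toy_nested {
	uint32_t s1[TOY_PAGES];    /* stage-1: GIOVA page -> GPA page (user owned) */
	uint32_t s2[TOY_PAGES];    /* stage-2: GPA page -> HPA page (kernel owned) */
	uint32_t iotlb[TOY_PAGES]; /* cached GIOVA page -> HPA page */
};

static void toy_init(struct toy_nested *n)
{
	for (int i = 0; i < TOY_PAGES; i++) {
		n->s1[i] = TOY_INVALID;
		n->s2[i] = TOY_INVALID;
		n->iotlb[i] = TOY_INVALID;
	}
}

/* Walk stage-1 then stage-2; a cache hit skips both walks. */
static uint32_t toy_translate(struct toy_nested *n, uint32_t giova_pg)
{
	uint32_t gpa_pg, hpa_pg;

	if (giova_pg >= TOY_PAGES)
		return TOY_INVALID;
	if (n->iotlb[giova_pg] != TOY_INVALID)
		return n->iotlb[giova_pg];	/* hit: tables are not consulted */

	gpa_pg = n->s1[giova_pg];
	if (gpa_pg == TOY_INVALID || gpa_pg >= TOY_PAGES)
		return TOY_INVALID;
	hpa_pg = n->s2[gpa_pg];
	n->iotlb[giova_pg] = hpa_pg;		/* fill the cache on a miss */
	return hpa_pg;
}

/* Stand-in for a stage-1 IOTLB invalidation of one page. */
static void toy_invalidate(struct toy_nested *n, uint32_t giova_pg)
{
	if (giova_pg < TOY_PAGES)
		n->iotlb[giova_pg] = TOY_INVALID;
}
```

In this model, rewriting s1[] after a successful toy_translate() keeps returning the old HPA page until toy_invalidate() is called, which is the role the series gives to IOMMU_HWPT_INVALIDATE for user-managed hwpts.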

2 years, 2 months

[PATCH v4 00/36] arm64/gcs: Provide support for GCS in userspace

by Mark Brown

The arm64 Guarded Control Stack (GCS) feature provides support for hardware protected stacks of return addresses, intended to provide hardening against return oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling.

When GCS is active a secondary stack called the Guarded Control Stack is maintained, protected with a memory attribute which means that it can only be written with specific GCS operations. The current GCS pointer can not be directly written to by userspace. When a BL is executed the value stored in LR is also pushed onto the GCS, and when a RET is executed the top of the GCS is popped and compared to LR with a fault being raised if the values do not match. GCS operations may only be performed on GCS pages, a data abort is generated if they are not.

The combination of hardware enforcement and lack of extra instructions in the function entry and exit paths should result in something which has less overhead and is more difficult to attack than a purely software implementation like clang's shadow stacks.

This series implements support for use of GCS by userspace, along with support for use of GCS within KVM guests. It does not enable use of GCS by either EL1 or EL2, this will be implemented separately. Executables are started without GCS and must use a prctl() to enable it, it is expected that this will be done very early in application execution by the dynamic linker or other startup code.

x86 has an equivalent feature called shadow stacks, this series depends on the x86 patches for generic memory management support for the new guarded/shadow stack page type and shares APIs as much as possible. As there has been extensive discussion with the wider community around the ABI for shadow stacks I have as far as practical kept implementation decisions close to those for x86, anticipating that review would lead to similar conclusions in the absence of strong reasoning for divergence.

The main divergence I am conscious of is that x86 allows shadow stack to be enabled and disabled repeatedly, freeing the shadow stack for the thread whenever disabled, while this implementation keeps the GCS allocated after disable but refuses to reenable it. This is to avoid races with things actively walking the GCS during a disable, we do anticipate that some systems will wish to disable GCS at runtime but are not aware of any demand for subsequently reenabling it.

x86 uses an arch_prctl() to manage enable and disable, since only x86 and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a patch set for the equivalent RISC-V zisslpcfi feature which I initially adopted fairly directly but following review feedback has been revised quite a bit.

There is an open issue with support for CRIU, on x86 this required the ability to set the GCS mode via ptrace. This series supports configuring mode bits other than enable/disable via ptrace but it needs to be confirmed if this is sufficient.

There's a few bits where I'm not convinced with where I've placed things, in particular the GCS write operation is in the GCS header not in uaccess.h, I wasn't sure what was clearest there and am probably too close to the code to have a clear opinion. The reporting of GCS in /proc/PID/smaps is also a bit awkward.

The series depends on the x86 shadow stack support:

   https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.…

I've rebased this onto v6.5-rc4 but not included it in the series in order to avoid confusion with Rick's work and cut down the size of the series, you can see the branch at:

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs

[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org

Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org

Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org

---
Mark Brown (36):
      prctl: arch-agnostic prctl for shadow stack
      arm64: Document boot requirements for Guarded Control Stacks
      arm64/gcs: Document the ABI for Guarded Control Stacks
      arm64/sysreg: Add new system registers for GCS
      arm64/sysreg: Add definitions for architected GCS caps
      arm64/gcs: Add manual encodings of GCS instructions
      arm64/gcs: Provide copy_to_user_gcs()
      arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
      arm64/mm: Allocate PIE slots for EL0 guarded control stack
      mm: Define VM_SHADOW_STACK for arm64 when we support GCS
      arm64/mm: Map pages for guarded control stack
      KVM: arm64: Manage GCS registers for guests
      arm64/gcs: Allow GCS usage at EL0 and EL1
      arm64/idreg: Add overrride for GCS
      arm64/hwcap: Add hwcap for GCS
      arm64/traps: Handle GCS exceptions
      arm64/mm: Handle GCS data aborts
      arm64/gcs: Context switch GCS state for EL0
      arm64/gcs: Allocate a new GCS for threads with GCS enabled
      arm64/gcs: Implement shadow stack prctl() interface
      arm64/mm: Implement map_shadow_stack()
      arm64/signal: Set up and restore the GCS context for signal handlers
      arm64/signal: Expose GCS state in signal frames
      arm64/ptrace: Expose GCS via ptrace and core files
      arm64: Add Kconfig for Guarded Control Stack (GCS)
      kselftest/arm64: Verify the GCS hwcap
      kselftest/arm64: Add GCS as a detected feature in the signal tests
      kselftest/arm64: Add framework support for GCS to signal handling tests
      kselftest/arm64: Allow signals tests to specify an expected si_code
      kselftest/arm64: Always run signals tests with GCS enabled
      kselftest/arm64: Add very basic GCS test program
      kselftest/arm64: Add a GCS test program built with the system libc
      kselftest/arm64: Add test coverage for GCS mode locking
      selftests/arm64: Add GCS signal tests
      kselftest/arm64: Add a GCS stress test
      kselftest/arm64: Enable GCS for the FP stress tests

 Documentation/admin-guide/kernel-parameters.txt | 3 +
 Documentation/arch/arm64/booting.rst | 22 +
 Documentation/arch/arm64/elf_hwcaps.rst | 3 +
 Documentation/arch/arm64/gcs.rst | 228 +++++++++
 Documentation/arch/arm64/index.rst | 1 +
 Documentation/filesystems/proc.rst | 2 +-
 arch/arm64/Kconfig | 19 +
 arch/arm64/include/asm/cpufeature.h | 6 +
 arch/arm64/include/asm/el2_setup.h | 17 +
 arch/arm64/include/asm/esr.h | 28 +-
 arch/arm64/include/asm/exception.h | 2 +
 arch/arm64/include/asm/gcs.h | 106 ++++
 arch/arm64/include/asm/hwcap.h | 1 +
 arch/arm64/include/asm/kvm_arm.h | 4 +-
 arch/arm64/include/asm/kvm_host.h | 12 +
 arch/arm64/include/asm/pgtable-prot.h | 14 +-
 arch/arm64/include/asm/processor.h | 7 +
 arch/arm64/include/asm/sysreg.h | 20 +
 arch/arm64/include/asm/uaccess.h | 42 ++
 arch/arm64/include/uapi/asm/hwcap.h | 1 +
 arch/arm64/include/uapi/asm/ptrace.h | 8 +
 arch/arm64/include/uapi/asm/sigcontext.h | 9 +
 arch/arm64/kernel/cpufeature.c | 19 +
 arch/arm64/kernel/cpuinfo.c | 1 +
 arch/arm64/kernel/entry-common.c | 23 +
 arch/arm64/kernel/idreg-override.c | 2 +
 arch/arm64/kernel/process.c | 85 ++++
 arch/arm64/kernel/ptrace.c | 59 +++
 arch/arm64/kernel/signal.c | 237 ++++++++-
 arch/arm64/kernel/traps.c | 11 +
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
 arch/arm64/kvm/sys_regs.c | 22 +
 arch/arm64/mm/Makefile | 1 +
 arch/arm64/mm/fault.c | 78 ++-
 arch/arm64/mm/gcs.c | 234 +++++++++
 arch/arm64/mm/mmap.c | 12 +-
 arch/arm64/tools/cpucaps | 1 +
 arch/arm64/tools/sysreg | 55 +++
 fs/proc/task_mmu.c | 3 +
 include/linux/mm.h | 16 +-
 include/linux/syscalls.h | 1 +
 include/uapi/asm-generic/unistd.h | 5 +-
 include/uapi/linux/elf.h | 1 +
 include/uapi/linux/prctl.h | 22 +
 kernel/sys.c | 30 ++
 kernel/sys_ni.c | 1 +
 tools/testing/selftests/arm64/Makefile | 2 +-
 tools/testing/selftests/arm64/abi/hwcap.c | 19 +
 tools/testing/selftests/arm64/fp/assembler.h | 15 +
 tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
 tools/testing/selftests/arm64/fp/sve-test.S | 2 +
 tools/testing/selftests/arm64/fp/za-test.S | 2 +
 tools/testing/selftests/arm64/fp/zt-test.S | 2 +
 tools/testing/selftests/arm64/gcs/.gitignore | 5 +
 tools/testing/selftests/arm64/gcs/Makefile | 24 +
 tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
 tools/testing/selftests/arm64/gcs/basic-gcs.c | 356 ++++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++++
 .../selftests/arm64/gcs/gcs-stress-thread.S | 311 ++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-util.h | 87 ++++
 tools/testing/selftests/arm64/gcs/libc-gcs.c | 500 +++++++++++++++++++
 tools/testing/selftests/arm64/signal/.gitignore | 1 +
 .../testing/selftests/arm64/signal/test_signals.c | 17 +-
 .../testing/selftests/arm64/signal/test_signals.h | 6 +
 .../selftests/arm64/signal/test_signals_utils.c | 32 +-
 .../selftests/arm64/signal/test_signals_utils.h | 39 ++
 .../arm64/signal/testcases/gcs_exception_fault.c | 59 +++
 .../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++
 .../arm64/signal/testcases/gcs_write_fault.c | 67 +++
 .../selftests/arm64/signal/testcases/testcases.c | 7 +
 .../selftests/arm64/signal/testcases/testcases.h | 1 +
 72 files changed, 3823 insertions(+), 34 deletions(-)
---
base-commit: ed0e1456f04be7a93c9a186e8e13aed78b555617
change-id: 20230303-arm64-gcs-e311ab0d8729

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>
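The BL/RET behaviour described in the cover letter can be modelled in a few lines of plain C. This is a hedged software sketch, not the hardware mechanism and not code from the series: struct toy_gcs, toy_bl() and toy_ret() are hypothetical names, and a false return stands in for the fault that real hardware would raise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GCS_DEPTH 16

/* Toy stand-in for a Guarded Control Stack: return addresses only. */
struct toy_gcs {
	uint64_t slots[TOY_GCS_DEPTH];
	size_t top;		/* number of entries currently in use */
};

/* BL: the return address written to LR is also pushed onto the GCS. */
static bool toy_bl(struct toy_gcs *g, uint64_t lr)
{
	if (g->top == TOY_GCS_DEPTH)
		return false;	/* toy overflow, unrelated to real hardware */
	g->slots[g->top++] = lr;
	return true;
}

/* RET: pop the GCS and compare with LR; false models the mismatch fault. */
static bool toy_ret(struct toy_gcs *g, uint64_t lr)
{
	if (g->top == 0)
		return false;
	return g->slots[--g->top] == lr;
}
```

A caller that overwrites its saved LR, as a ROP chain would, sees toy_ret() return false because the popped GCS entry no longer matches, while ordinary call/return pairs always match.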

2 years, 2 months

[PATCH v2 0/6] iommufd support allocating nested parent domain

by Yi Liu

IOMMU hardware that supports nested translation has two stages of address translation (normally referred to as stage-1 and stage-2). The page table formats of stage-1 and stage-2 can be different, e.g. VT-d has different page table formats for stage-1 and stage-2.

The nested parent domain is the iommu domain used to represent the stage-2 translation. In IOMMUFD, both stage-1 and stage-2 translations are tracked as HWPTs (a.k.a. iommu domains). The stage-2 HWPT is the parent of the stage-1 HWPT, as stage-1 cannot work alone in nested translation. When the stage-1 and stage-2 page table formats differ, the parent HWPT must use exactly the stage-2 page table format. However, the existing kernel hides the format selection in iommu drivers, so the domain allocated via IOMMU_HWPT_ALLOC can use either the stage-1 or the stage-2 page table format; there is no guarantee either way.

To enforce the page table format of the nested parent domain, this series introduces a new iommu op (domain_alloc_user) which accepts user flags to allocate the domain as userspace requires. It also converts IOMMUFD to use the new domain_alloc_user op for domain allocation if supported, then extends the IOMMU_HWPT_ALLOC ioctl to pass down a NEST_PARENT flag to allocate a HWPT which can be used as a parent. This series implements the new op in the Intel iommu driver to give a complete picture. It is a preparation for adding nesting support in IOMMUFD/IOMMU.

Complete code can be found:
https://github.com/yiliu1765/iommufd/tree/iommufd_alloc_user_v2

Change log:

v2:
 - Require domain_alloc_user op if IOMMU_HWPT_ALLOC passes non-zero flags (Kevin)
 - IOMMUFD core should check kernel known flags while iommu driver needs to check supported flags as well (Jason)
 - Minor tweaks per Baolu's comment

v1: https://lore.kernel.org/linux-iommu/20230919092523.39286-1-yi.l.liu@intel.c…

Regards,
Yi Liu

Yi Liu (6):
  iommu: Add new iommu op to create domains owned by userspace
  iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  iommufd/hw_pagetable: Accepts user flags for domain allocation
  iommufd/hw_pagetable: Support allocating nested parent domain
  iommufd/selftest: Add domain_alloc_user() support in iommu mock
  iommu/vt-d: Add domain_alloc_user op

 drivers/iommu/intel/iommu.c | 28 +++++++++++++++++
 drivers/iommu/iommufd/device.c | 2 +-
 drivers/iommu/iommufd/hw_pagetable.c | 31 ++++++++++++++-----
 drivers/iommu/iommufd/iommufd_private.h | 3 +-
 drivers/iommu/iommufd/selftest.c | 19 ++++++++++++
 include/linux/iommu.h | 11 ++++++-
 include/uapi/linux/iommufd.h | 12 ++++++-
 tools/testing/selftests/iommu/iommufd.c | 24 +++++++++++---
 .../selftests/iommu/iommufd_fail_nth.c | 2 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 11 +++++--
 10 files changed, 124 insertions(+), 19 deletions(-)

-- 
2.34.1
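The v2 change log describes a layered flag check: the IOMMUFD core rejects flags the kernel does not know, non-zero flags require the domain_alloc_user op, and the driver additionally rejects flags it does not support. A minimal sketch of that layering follows. The flag values, struct toy_driver, toy_hwpt_alloc() and the choice of -EOPNOTSUPP are assumptions for illustration, not the uAPI definitions or the series' actual code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the uAPI definitions. */
#define TOY_ALLOC_NEST_PARENT	(1u << 0)
#define TOY_ALLOC_FUTURE	(1u << 1)	/* known to core, not to every driver */
#define TOY_CORE_KNOWN_FLAGS	(TOY_ALLOC_NEST_PARENT | TOY_ALLOC_FUTURE)

struct toy_driver {
	uint32_t supported_flags;	/* flags this driver can honour */
	int has_alloc_user;		/* non-zero if domain_alloc_user is implemented */
};

/* Returns 0 if allocation may proceed, a negative errno otherwise. */
static int toy_hwpt_alloc(const struct toy_driver *drv, uint32_t flags)
{
	/* Core check: refuse flags the kernel does not know at all. */
	if (flags & ~TOY_CORE_KNOWN_FLAGS)
		return -EOPNOTSUPP;

	/* Non-zero flags can only be honoured through the new op. */
	if (flags && !drv->has_alloc_user)
		return -EOPNOTSUPP;

	/* Driver check: known to the core is not enough, the driver must support it. */
	if (drv->has_alloc_user && (flags & ~drv->supported_flags))
		return -EOPNOTSUPP;

	return 0;
}
```

With flags == 0 a driver without the op still succeeds in this model, which mirrors the "if supported" wording above: the legacy allocation path stays usable.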

2 years, 2 months

[PATCH v5 00/11] Add Intel VT-d nested translation

by Yi Liu

This series adds Intel VT-d nested translation based on the IOMMUFD nesting infrastructure. As described in the iommufd nesting infrastructure series [1], the iommu core supports new ops to report iommu hardware information, allocate domains with user data and invalidate the stage-1 IOTLB when a mapping changes in the stage-1 page table. The data required in the three paths is vendor-specific, so:

1) IOMMU_HWPT_TYPE_VTD_S1 is defined for the Intel VT-d stage-1 page table; it will be used in the stage-1 domain allocation and IOTLB syncing paths. struct iommu_hwpt_vtd_s1 is defined to pass user_data for the Intel VT-d stage-1 domain allocation. struct iommu_hwpt_vtd_s1_invalidate is defined to pass the data for the Intel VT-d stage-1 IOTLB invalidation.
2) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_hw_info_vtd are defined to report iommu hardware information for Intel VT-d.

With the above IOMMUFD extensions, the intel iommu driver implements the three paths to support nested translation.

The first Intel platform supporting nested translation is Sapphire Rapids which, unfortunately, has a hardware erratum [2] requiring special treatment. The erratum occurs when a stage-1 page table page (either level) is located in a stage-2 read-only region. In that case the IOMMU hardware may ignore the stage-2 RO permission and still set the A/D bit in stage-1 page table entries during page table walking.

A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report this erratum to userspace. With that restriction the user should either disable nested translation to favor RO stage-2 mappings or ensure there are no RO stage-2 mappings to enable nested translation. The intel-iommu driver is armed with the necessary checks to prevent such a mix in patch12 of this series.

Qemu currently does add RO mappings though. The vfio agent in Qemu simply maps all valid regions in the GPA address space, which certainly includes RO regions, e.g. vbios. In reality we don't know of a usage relying on DMA reads from the BIOS region.
Hence finding a way to skip RO regions (e.g. via a discard manager) in Qemu might be an acceptable tradeoff. The actual change needs more discussion in the Qemu community. For now we just hacked Qemu to test.

Complete code can be found in [3]; the corresponding QEMU code can be found in [4].

[1] https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
[2] https://www.intel.com/content/www/us/en/content-details/772415/content-deta…
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[4] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1

Change log:

v5:
 - Add Kevin's r-b for patch 2, 3, 5, 8, 10
 - Drop enforce_cache_coherency callback from the nested type domain ops (Kevin)
 - Remove duplicate agaw check in patch 04 (Kevin)
 - Remove duplicate domain_update_iommu_cap() in patch 06 (Kevin)
 - Check parent's force_snooping to set pgsnp in the pasid entry (Kevin)
 - uapi data structure check (Kevin)
 - Simplify the errata handling as user can allocate nested parent domain

v4: https://lore.kernel.org/linux-iommu/20230724111335.107427-1-yi.l.liu@intel.…
 - Remove ascii art tables (Jason)
 - Drop EMT (Tina, Jason)
 - Drop MTS and related definitions (Kevin)
 - Rename macro IOMMU_VTD_PGTBL_ to IOMMU_VTD_S1_ (Kevin)
 - Rename struct iommu_hwpt_intel_vtd_ to iommu_hwpt_vtd_ (Kevin)
 - Rename struct iommu_hwpt_intel_vtd to iommu_hwpt_vtd_s1 (Kevin)
 - Put the vendor specific hwpt alloc data structure before enum iommu_hwpt_type (Kevin)
 - Do not trim the higher page levels of S2 domain in nested domain attachment as the S2 domain may have been used independently. (Kevin)
 - Remove the first-stage pgd check against the maximum address of s2_domain as hw can check it anyhow. It makes sense to check every pfns used in the stage-1 page table. But it cannot make it. So just leave it to hw. (Kevin)
 - Split the iotlb flush part into an order of uapi, helper and callback implementation (Kevin)
 - Change the policy of VT-d nesting errata, disallow RO mapping once a domain is used as parent domain of a nested domain. This removes the nested_users counting. (Kevin)
 - Minor fix for "make htmldocs"

v3: https://lore.kernel.org/linux-iommu/20230511145110.27707-1-yi.l.liu@intel.c…
 - Further split the patches into an order of adding helpers for nested domain, iotlb flush, nested domain attachment and nested domain allocation callback, then report the hw_info to userspace.
 - Add batch support in cache invalidation from userspace
 - Disallow nested translation usage if RO mappings exists in stage-2 domain due to errata on readonly mappings on Sapphire Rapids platform.

v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.…
 - The iommufd infrastructure is split to be separate series.

v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.c…

Regards,
Yi Liu

Lu Baolu (5):
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommu/vt-d: Add helper for nested domain allocation
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add nested domain allocation
  iommu/vt-d: Disallow read-only mappings to nest parent domain

Yi Liu (6):
  iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  iommu/vt-d: Make domain attach helpers to be extern
  iommu/vt-d: Set the nested domain to a device
  iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  iommu/vt-d: Make iotlb flush helpers to be extern
  iommu/vt-d: Add iotlb flush for nested domain

 drivers/iommu/intel/Makefile | 2 +-
 drivers/iommu/intel/iommu.c | 60 +++++++++----
 drivers/iommu/intel/iommu.h | 51 +++++++++--
 drivers/iommu/intel/nested.c | 162 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.c | 125 +++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h | 2 +
 include/uapi/linux/iommufd.h | 76 +++++++++++++++-
 7 files changed, 452 insertions(+), 26 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c

-- 
2.34.1
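The read-only mapping policy named in the patch title "Disallow read-only mappings to nest parent domain" can be pictured as a check on the map path. The sketch below is an assumption about the shape of such a check, not the driver's code: struct toy_domain, its fields, toy_map() and the -EINVAL return value are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical domain state for illustration only. */
struct toy_domain {
	bool nest_parent;	/* allocated to serve as a stage-2 parent */
	bool has_errata;	/* hardware reports the read-only mapping errata */
};

/* Map request: refuse read-only mappings on an affected nest parent. */
static int toy_map(const struct toy_domain *d, bool writable)
{
	if (d->nest_parent && d->has_errata && !writable)
		return -EINVAL;
	return 0;
}
```

Under this model userspace picks one of the two options from the cover letter: keep RO stage-2 mappings and avoid nest parent domains, or allocate a nest parent and map everything writable.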

2 years, 2 months

[PATCH] selftests/powerpc: Fix emit_tests to work with run_kselftest.sh

by Michael Ellerman

In order to use run_kselftest.sh the list of tests must be emitted to populate kselftest-list.txt.

The powerpc Makefile is written to use EMIT_TESTS. But support for EMIT_TESTS was dropped in commit d4e59a536f50 ("selftests: Use runner.sh for emit targets"). Although prior to that commit a548de0fe8e1 ("selftests: lib.mk: add test execute bit check to EMIT_TESTS") had already broken run_kselftest.sh for powerpc due to the executable check using the wrong path.

It can be fixed by replacing the EMIT_TESTS definitions with actual emit_tests rules in the powerpc Makefiles. This makes run_kselftest.sh able to run powerpc tests:

  $ cd linux
  $ export ARCH=powerpc
  $ export CROSS_COMPILE=powerpc64le-linux-gnu-
  $ make headers
  $ make -j -C tools/testing/selftests install
  $ grep -c "^powerpc" tools/testing/selftests/kselftest_install/kselftest-list.txt
  182

Fixes: d4e59a536f50 ("selftests: Use runner.sh for emit targets")
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
 tools/testing/selftests/powerpc/Makefile | 7 +++----
 tools/testing/selftests/powerpc/pmu/Makefile | 11 ++++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

I'll plan to merge this via the powerpc tree.

cheers

diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 49f2ad1793fd..7ea42fa02eab 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -59,12 +59,11 @@ override define INSTALL_RULE
 	done;
 endef
 
-override define EMIT_TESTS
+emit_tests:
 	+@for TARGET in $(SUB_DIRS); do \
 		BUILD_TARGET=$(OUTPUT)/$$TARGET;	\
-		$(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests;\
+		$(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET $@;\
 	done;
-endef
 
 override define CLEAN
 	+@for TARGET in $(SUB_DIRS); do \
@@ -77,4 +76,4 @@ endef
 tags:
 	find . -name '*.c' -o -name '*.h' | xargs ctags
 
-.PHONY: tags $(SUB_DIRS)
+.PHONY: tags $(SUB_DIRS) emit_tests
diff --git a/tools/testing/selftests/powerpc/pmu/Makefile b/tools/testing/selftests/powerpc/pmu/Makefile
index 2b95e44d20ff..a284fa874a9f 100644
--- a/tools/testing/selftests/powerpc/pmu/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/Makefile
@@ -30,13 +30,14 @@ override define RUN_TESTS
 	+TARGET=event_code_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests
 endef
 
-DEFAULT_EMIT_TESTS := $(EMIT_TESTS)
-override define EMIT_TESTS
-	$(DEFAULT_EMIT_TESTS)
+emit_tests:
+	for TEST in $(TEST_GEN_PROGS); do \
+		BASENAME_TEST=`basename $$TEST`;	\
+		echo "$(COLLECTION):$$BASENAME_TEST";	\
+	done
 	+TARGET=ebb; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
 	+TARGET=sampling_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
 	+TARGET=event_code_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
-endef
 
 DEFAULT_INSTALL_RULE := $(INSTALL_RULE)
 override define INSTALL_RULE
@@ -64,4 +65,4 @@ sampling_tests:
 event_code_tests:
 	TARGET=$@; BUILD_TARGET=$$OUTPUT/$$TARGET; mkdir -p $$BUILD_TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -k -C $$TARGET all
 
-.PHONY: all run_tests ebb sampling_tests event_code_tests
+.PHONY: all run_tests ebb sampling_tests event_code_tests emit_tests
-- 
2.41.0

2 years, 2 months

[PATCH 0/5] KVM: x86: Fix breakage in KVM_SET_XSAVE's ABI

by Sean Christopherson

Rework how KVM limits guest-unsupported xfeatures to effectively hide only when saving state for userspace (KVM_GET_XSAVE), i.e. to let userspace load all host-supported xfeatures (via KVM_SET_XSAVE) irrespective of what features have been exposed to the guest.

The effect on KVM_SET_XSAVE was knowingly done by commit ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0"):

    As a bonus, it will also fail if userspace tries to set fpu features
    (with the KVM_SET_XSAVE ioctl) that are not compatible to the guest
    configuration. Such features will never be returned by KVM_GET_XSAVE
    or KVM_GET_XSAVE2.

Preventing userspace from doing stupid things is usually a good idea, but in this case restricting KVM_SET_XSAVE actually exacerbated the problem that commit ad856280ddea was fixing. As reported by Tyler, rejecting KVM_SET_XSAVE for guest-unsupported xfeatures breaks live migration from a kernel without commit ad856280ddea, to a kernel with ad856280ddea. I.e. from a kernel that saves guest-unsupported xfeatures to a kernel that doesn't allow loading guest-unsupported xfeatures.

To make matters even worse, QEMU doesn't terminate if KVM_SET_XSAVE fails, and so the end result is that the live migration results in (possibly silent) guest data corruption instead of a failed migration.

Patch 1 refactors the FPU code to let KVM pass in a mask of which xfeatures to save, patch 2 fixes KVM by passing in guest_supported_xcr0 instead of modifying user_xfeatures directly.

Patches 3-5 are regression tests.

I have no objection if anyone wants patches 1 and 2 squashed together, I split them purely to make review easier.

Note, this doesn't fix the scenario where a guest is migrated from a "bad" to a "good" kernel and the target host doesn't support the over-saved set of xfeatures. I don't see a way to safely handle that in the kernel without an opt-in, which more or less defeats the purpose of handling it in KVM.

Sean Christopherson (5):
  x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
  KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
  KVM: selftests: Touch relevant XSAVE state in guest for state test
  KVM: selftests: Load XSAVE state into untouched vCPU during state test
  KVM: selftests: Force load all supported XSAVE state in state test

 arch/x86/include/asm/fpu/api.h | 3 +-
 arch/x86/kernel/fpu/core.c | 5 +-
 arch/x86/kernel/fpu/xstate.c | 12 +-
 arch/x86/kernel/fpu/xstate.h | 3 +-
 arch/x86/kvm/cpuid.c | 8 --
 arch/x86/kvm/x86.c | 37 +++---
 .../selftests/kvm/include/x86_64/processor.h | 23 ++++
 .../testing/selftests/kvm/x86_64/state_test.c | 110 +++++++++++++++++-
 8 files changed, 168 insertions(+), 33 deletions(-)

base-commit: 5804c19b80bf625c6a9925317f845e497434d6d3
-- 
2.42.0.582.g8ccd20d70d-goog
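The asymmetry this series aims for, constraining only the save path while letting the load path accept any host-supported feature, can be sketched with plain bitmasks. This is a toy model under stated assumptions: struct toy_vcpu, toy_get_xsave(), toy_set_xsave() and the -EINVAL return are invented for illustration and are not KVM's implementation.

```c
#include <errno.h>
#include <stdint.h>

/* Toy vCPU: each bit stands for one xfeature. */
struct toy_vcpu {
	uint64_t host_supported;	/* xfeatures the host can save/restore */
	uint64_t guest_supported;	/* subset exposed to the guest */
	uint64_t state;			/* xfeatures currently held in vCPU state */
};

/* KVM_GET_XSAVE-like path: hide features the guest cannot use. */
static uint64_t toy_get_xsave(const struct toy_vcpu *v)
{
	return v->state & v->guest_supported;
}

/* KVM_SET_XSAVE-like path: accept anything the host supports. */
static int toy_set_xsave(struct toy_vcpu *v, uint64_t xfeatures)
{
	if (xfeatures & ~v->host_supported)
		return -EINVAL;
	v->state = xfeatures;
	return 0;
}
```

A state blob saved by an older kernel with guest-unsupported bits set loads successfully in this model, and a later save reports only the guest-supported subset, which is the migration case the cover letter describes.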

2 years, 2 months

[RFC v3 0/2] CPU-Idle latency selftest framework

by Aboorva Devarajan

Changelog: v2 -> v3
* Minimal code refactoring
* Rebased on v6.6-rc1

RFC v1: https://lore.kernel.org/all/20210611124154.56427-1-psampat@linux.ibm.com/
RFC v2: https://lore.kernel.org/all/20230828061530.126588-2-aboorvad@linux.vnet.ibm…
Other related RFC: https://lore.kernel.org/all/20210430082804.38018-1-psampat@linux.ibm.com/
Userspace selftest: https://lkml.org/lkml/2020/9/2/356

----

A kernel module + userspace driver to estimate the wakeup latency caused by going into stop states. The motivation behind this program is to find significant deviations from the advertised latency and residency values.

The patchset measures latencies for two kinds of events: IPIs and timers.

As this is a software-only mechanism, there will be additional latencies from the kernel-firmware-hardware interactions. To account for that, the program also measures a baseline latency on a 100 percent loaded CPU, and the latencies achieved must be viewed relative to that baseline.

To achieve this, we introduce a kernel module and expose its control knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within /sys/kernel/debug/powerpc/latency_test/:

IPI test:
    ipi_cpu_dest = Destination CPU for the IPI
    ipi_cpu_src = Origin of the IPI
    ipi_latency_ns = Measured latency time in ns
Timeout test:
    timeout_cpu_src = CPU on which the timer is to be queued
    timeout_expected_ns = Timer duration
    timeout_diff_ns = Difference of actual duration vs expected timer

Sample output is as follows:

# --IPI Latency Test---
# Baseline Avg IPI latency(ns): 2720
# Observed Avg IPI latency(ns) - State snooze: 2565
# Observed Avg IPI latency(ns) - State stop0_lite: 3856
# Observed Avg IPI latency(ns) - State stop0: 3670
# Observed Avg IPI latency(ns) - State stop1: 3872
# Observed Avg IPI latency(ns) - State stop2: 17421
# Observed Avg IPI latency(ns) - State stop4: 1003922
# Observed Avg IPI latency(ns) - State stop5: 1058870
#
# --Timeout Latency Test--
# Baseline Avg timeout diff(ns): 1435
# Observed Avg timeout diff(ns) - State snooze: 1709
# Observed Avg timeout diff(ns) - State stop0_lite: 2028
# Observed Avg timeout diff(ns) - State stop0: 1954
# Observed Avg timeout diff(ns) - State stop1: 1895
# Observed Avg timeout diff(ns) - State stop2: 14556
# Observed Avg timeout diff(ns) - State stop4: 873988
# Observed Avg timeout diff(ns) - State stop5: 959137

Aboorva Devarajan (2):
  powerpc/cpuidle: cpuidle wakeup latency based on IPI and timer events
  powerpc/selftest: Add support for cpuidle latency measurement

 arch/powerpc/Kconfig.debug                    |  10 +
 arch/powerpc/kernel/Makefile                  |   1 +
 arch/powerpc/kernel/test_cpuidle_latency.c    | 154 ++++++
 tools/testing/selftests/powerpc/Makefile      |   1 +
 .../powerpc/cpuidle_latency/.gitignore        |   2 +
 .../powerpc/cpuidle_latency/Makefile          |   6 +
 .../cpuidle_latency/cpuidle_latency.sh        | 443 ++++++++++++++++++
 .../powerpc/cpuidle_latency/settings          |   1 +
 8 files changed, 618 insertions(+)
 create mode 100644 arch/powerpc/kernel/test_cpuidle_latency.c
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/Makefile
 create mode 100755 tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/settings

--
2.25.1

2 years, 2 months

[PATCH v6] selftests/clone3: Fix broken test under !CONFIG_TIME_NS

by Tiezhu Yang

When executing the following command to test clone3 under !CONFIG_TIME_NS:

  # make headers && cd tools/testing/selftests/clone3 && make && ./clone3

we can see the following error info:

  # [7538] Trying clone3() with flags 0x80 (size 0)
  # Invalid argument - Failed to create new process
  # [7538] clone3() with flags says: -22 expected 0
  not ok 18 [7538] Result (-22) is different than expected (0)
  ...
  # Totals: pass:18 fail:1 xfail:0 xpass:0 skip:0 error:0

This is because if CONFIG_TIME_NS is not set, but the flag CLONE_NEWTIME (0x80) is used to clone a time namespace, it will return -EINVAL in copy_time_ns().

If the kernel does not support CONFIG_TIME_NS, /proc/self/ns/time will not exist, and then we should skip the clone3() test with CLONE_NEWTIME.

With this patch under !CONFIG_TIME_NS:

  # make headers && cd tools/testing/selftests/clone3 && make && ./clone3
  ...
  # Time namespaces are not supported
  ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
  ...
  # Totals: pass:18 fail:0 xfail:0 xpass:0 skip:1 error:0

Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
---
v6: Rebase on 6.5-rc1 and update the commit message

 tools/testing/selftests/clone3/clone3.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index e60cf4d..1c61e3c 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -196,7 +196,12 @@ int main(int argc, char *argv[])
 			CLONE3_ARGS_NO_TEST);
 
 	/* Do a clone3() in a new time namespace */
-	test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+	if (access("/proc/self/ns/time", F_OK) == 0) {
+		test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+	} else {
+		ksft_print_msg("Time namespaces are not supported\n");
+		ksft_test_result_skip("Skipping clone3() with CLONE_NEWTIME\n");
+	}
 
 	/* Do a clone3() with exit signal (SIGCHLD) in flags */
 	test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST);
--
2.1.0

2 years, 2 months

[PATCH v2 0/2] Modify vDSO selftests

by Tiezhu Yang

v2: Rebase on 6.5-rc1 and update the commit message

Tiezhu Yang (2):
  selftests/vDSO: Add support for LoongArch
  selftests/vDSO: Get version and name for all archs

 tools/testing/selftests/vDSO/vdso_config.h         |  6 ++++-
 tools/testing/selftests/vDSO/vdso_test_getcpu.c    | 16 +++++--------
 .../selftests/vDSO/vdso_test_gettimeofday.c        | 26 ++++++----------------
 3 files changed, 18 insertions(+), 30 deletions(-)

--
2.1.0

2 years, 2 months

[PATCH v1 00/20] Permission Overlay Extension

by Joey Gouly

Hi all,

This series implements the Permission Overlay Extension introduced in 2022 VMSA enhancements [1]. It is based on v6.6-rc3.

The Permission Overlay Extension allows constraining permissions on memory regions. This can be used from userspace (EL0) without a system call or TLB invalidation.

POE is used to implement the Memory Protection Keys [2] Linux syscall.

The first few patches add the basic framework, then the PKEYS interface is implemented, and then the selftests are made to work on arm64.

There was discussion about what the 'default' protection key value should be. I used disallow-all (apart from pkey 0), which matches what x86 does.

Patch 15 contains a call to cpus_have_const_cap(), which I couldn't avoid until Mark's patch to re-order when the alternatives were applied [3] is committed.

The KVM part isn't tested yet.

I have tested the modified protection_keys test on x86_64 [4], but not PPC.

Hopefully I have CC'd everyone correctly.

Thanks,
Joey

Joey Gouly (20):
  arm64/sysreg: add system register POR_EL{0,1}
  arm64/sysreg: update CPACR_EL1 register
  arm64: cpufeature: add Permission Overlay Extension cpucap
  arm64: disable trapping of POR_EL0 to EL2
  arm64: context switch POR_EL0 register
  KVM: arm64: Save/restore POE registers
  arm64: enable the Permission Overlay Extension for EL0
  arm64: add POIndex defines
  arm64: define VM_PKEY_BIT* for arm64
  arm64: mask out POIndex when modifying a PTE
  arm64: enable ARCH_HAS_PKEYS on arm64
  arm64: handle PKEY/POE faults
  arm64: stop using generic mm_hooks.h
  arm64: implement PKEYS support
  arm64: add POE signal support
  arm64: enable PKEY support for CPUs with S1POE
  arm64: enable POE and PIE to coexist
  kselftest/arm64: move get_header()
  selftests: mm: move fpregs printing
  selftests: mm: make protection_keys test work on arm64

 Documentation/arch/arm64/elf_hwcaps.rst       |   3 +
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/el2_setup.h            |  10 +-
 arch/arm64/include/asm/hwcap.h                |   1 +
 arch/arm64/include/asm/kvm_host.h             |   4 +
 arch/arm64/include/asm/mman.h                 |   6 +-
 arch/arm64/include/asm/mmu.h                  |   2 +
 arch/arm64/include/asm/mmu_context.h          |  51 ++++++-
 arch/arm64/include/asm/pgtable-hwdef.h        |  10 ++
 arch/arm64/include/asm/pgtable-prot.h         |   8 +-
 arch/arm64/include/asm/pgtable.h              |  28 +++-
 arch/arm64/include/asm/pkeys.h                | 110 ++++++++++++++
 arch/arm64/include/asm/por.h                  |  33 +++++
 arch/arm64/include/asm/processor.h            |   1 +
 arch/arm64/include/asm/sysreg.h               |  16 ++
 arch/arm64/include/asm/traps.h                |   1 +
 arch/arm64/include/uapi/asm/hwcap.h           |   1 +
 arch/arm64/include/uapi/asm/sigcontext.h      |   7 +
 arch/arm64/kernel/cpufeature.c                |  15 ++
 arch/arm64/kernel/cpuinfo.c                   |   1 +
 arch/arm64/kernel/process.c                   |  16 ++
 arch/arm64/kernel/signal.c                    |  51 +++++++
 arch/arm64/kernel/traps.c                     |  12 +-
 arch/arm64/kvm/sys_regs.c                     |   2 +
 arch/arm64/mm/fault.c                         |  44 +++++-
 arch/arm64/mm/mmap.c                          |   7 +
 arch/arm64/mm/mmu.c                           |  38 +++++
 arch/arm64/tools/cpucaps                      |   1 +
 arch/arm64/tools/sysreg                       |  11 +-
 fs/proc/task_mmu.c                            |   2 +
 include/linux/mm.h                            |  11 +-
 .../arm64/signal/testcases/testcases.c        |  23 ---
 .../arm64/signal/testcases/testcases.h        |  26 +++-
 tools/testing/selftests/mm/Makefile           |   2 +-
 tools/testing/selftests/mm/pkey-arm64.h       | 138 ++++++++++++++++++
 tools/testing/selftests/mm/pkey-helpers.h     |   8 +
 tools/testing/selftests/mm/pkey-powerpc.h     |   3 +
 tools/testing/selftests/mm/pkey-x86.h         |   4 +
 tools/testing/selftests/mm/protection_keys.c  |  29 ++--
 39 files changed, 685 insertions(+), 52 deletions(-)
 create mode 100644 arch/arm64/include/asm/pkeys.h
 create mode 100644 arch/arm64/include/asm/por.h
 create mode 100644 tools/testing/selftests/mm/pkey-arm64.h

--
2.25.1

2 years, 2 months
