Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications, such as Go and Java, depend on unused bits in the virtual
address space, so this patch provides more flexibility for application developers.
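As a rough sketch (not taken from the patch set), this is how an application
could keep the sv48 default versus opt into a larger address space once this
change is in place; the high hint value is only illustrative and assumes
sv57-capable hardware:

  #include <stdio.h>
  #include <sys/mman.h>

  int main(void)
  {
          /* NULL hint: the mapping stays within the default sv48 range. */
          void *lo = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          /* A hint above the sv48 limit lets the kernel place the mapping
           * in the larger sv57 space when the hardware supports it. */
          void *hi = mmap((void *)(1UL << 50), 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          printf("default hint: %p, high hint: %p\n", lo, hi);
          return 0;
  }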
-Charlie
---
v10:
- Move pgtable.h definitions into a non-__ASSEMBLY__ region to resolve compilation
conflicts (pointed out by Conor)
- Will now compile with allmodconfig
v9:
- Raise the mmap_end default to STACK_TOP_MAX to allow the address space to grow
beyond the default of sv48 on sv57 machines as suggested by Alexandre
- Some of the mmap macros had unnecessary conditionals that I have removed
v8:
- Fix RV32 and the RV32 compat mode of RV64 (suggested by Conor)
- Extract out addr and base from the mmap macros (suggested by Alexandre)
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readable by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 33 ++++++++--
arch/riscv/include/asm/processor.h | 52 +++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 261 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.34.1
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
cannot be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages; a data abort is generated if they are not.
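A conceptual model of that push/pop behaviour, purely as illustration (this is
not kernel code and the real checks are performed in hardware):

  #include <stdint.h>

  static uint64_t gcs[1024];              /* stands in for a GCS page       */
  static uint64_t *gcspr = &gcs[1024];    /* GCS pointer, grows downwards   */

  static void on_bl(uint64_t lr)
  {
          *--gcspr = lr;                  /* BL also pushes LR onto the GCS */
  }

  static int on_ret(uint64_t lr)
  {
          uint64_t expected = *gcspr++;   /* RET pops the top of the GCS    */
          return expected == lr;          /* mismatch -> a fault is raised  */
  }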
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2; this will be implemented separately. Executables
are started without GCS and must use a prctl() to enable it; it is
expected that this will be done very early in application execution by
the dynamic linker or other startup code. For dynamic linking this will
be done by checking that everything in the executable is marked as GCS
compatible.
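As a sketch of that early-enable step, assuming the option and flag names from
the proposed generic shadow stack prctl() (PR_SET_SHADOW_STACK_STATUS,
PR_SHADOW_STACK_ENABLE); the fallback numeric value below is purely
illustrative and the final ABI may differ:

  #include <sys/prctl.h>

  #ifndef PR_SET_SHADOW_STACK_STATUS
  #define PR_SET_SHADOW_STACK_STATUS 75        /* value assumed for illustration */
  #endif
  #ifndef PR_SHADOW_STACK_ENABLE
  #define PR_SHADOW_STACK_ENABLE     (1UL << 0)
  #endif

  /* Called very early, e.g. by the dynamic linker once it has checked that
   * everything in the executable is marked as GCS compatible. */
  static int enable_gcs(void)
  {
          return prctl(PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE,
                       0, 0, 0);
  }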
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am conscious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable; we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable; since only x86
and S/390 use arch_prctl(), a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V Zicfiss feature, which I initially
adopted fairly directly but which has been revised quite a bit following
review feedback.
We currently maintain the x86 pattern of implicitly allocating a shadow
stack for threads started with shadow stack enabled; there has been some
discussion of removing this support and requiring the use of clone3()
with explicit allocation of shadow stacks instead. I have no strong
feelings either way: implicit allocation is not really consistent with
anything else we do and creates the potential for errors around thread
exit, but on the other hand it is existing ABI on x86 and minimises the
changes needed in userspace code.
There is an open issue with support for CRIU; on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace, but it needs
to be confirmed whether this is sufficient.
The series depends on support for shadow stacks in clone3(); that series
includes the addition of ARCH_HAS_USER_SHADOW_STACK.
https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
It also depends on the addition of more waitpid() flags to nolibc:
https://lore.kernel.org/r/20231023-nolibc-waitpid-flags-v2-1-b09d096f091f@k…
You can see a branch with the full set of dependencies against Linus'
tree at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there; the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org
Changes in v5:
- Don't map any permissions for user GCSs; we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller, not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (39):
arm64/mm: Restructure arch_validate_flags() for extensibility
prctl: arch-agnostic prctl for shadow stack
mman: Add map_shadow_stack() flags
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide put_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/gcs: Allow GCS usage at EL0 and EL1
arm64/idreg: Add override for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Allocate a new GCS for threads with GCS enabled
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
kselftest/clone3: Enable GCS in the clone3 selftests
Documentation/admin-guide/kernel-parameters.txt | 6 +
Documentation/arch/arm64/booting.rst | 22 +
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 233 +++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 20 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 107 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/mman.h | 23 +-
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 40 ++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 81 +++
arch/arm64/kernel/ptrace.c | 59 ++
arch/arm64/kernel/signal.c | 236 ++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 79 ++-
arch/arm64/mm/gcs.c | 259 +++++++
arch/arm64/mm/mmap.c | 13 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 55 ++
arch/x86/include/uapi/asm/mman.h | 3 -
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/uapi/asm-generic/mman.h | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 428 ++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 742 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 ++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/clone3/clone3.c | 37 +
73 files changed, 4234 insertions(+), 40 deletions(-)
---
base-commit: 3d0134d322380292c055454d9633738733992d61
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This extends the KVM RISC-V ONE_REG interface to report more ISA extensions,
namely: Zbc, scalar crypto, vector crypto, Zfh[min], Zihintntl, Zvfh[min],
and Zfa.
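As a sketch of how a VMM could probe one of the new extensions from userspace,
assuming a riscv64 host built against this series' uapi header (the
KVM_RISCV_ISA_EXT_ZBC id is added here; the register encoding follows the
existing ISA-extension ONE_REG layout and is meant as illustration only):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int vcpu_has_zbc(int vcpu_fd)
  {
          uint64_t val = 0;
          struct kvm_one_reg reg = {
                  .id = KVM_REG_RISCV | KVM_REG_SIZE_ULONG |
                        KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE |
                        KVM_RISCV_ISA_EXT_ZBC,
                  .addr = (uint64_t)&val,
          };

          /* A non-zero value means the extension is enabled for the vCPU. */
          if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
                  return 0;
          return val != 0;
  }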
This series depends upon the "riscv: report more ISA extensions through
hwprobe" series.from Clement.
(Link: https://lore.kernel.org/lkml/20231114141256.126749-1-cleger@rivosinc.com/)
To test these patches, use KVMTOOL from the riscv_more_exts_v1 branch at:
https://github.com/avpatel/kvmtool.git
These patches can also be found in the riscv_kvm_more_exts_v1 branch at:
https://github.com/avpatel/linux.git
Anup Patel (15):
KVM: riscv: selftests: Generate ISA extension reg_list using macros
RISC-V: KVM: Allow Zbc extension for Guest/VM
KVM: riscv: selftests: Add Zbc extension to get-reg-list test
RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
KVM: riscv: selftests: Add scalar crypto extensions to get-reg-list
test
RISC-V: KVM: Allow vector crypto extensions for Guest/VM
KVM: riscv: selftests: Add vector crypto extensions to get-reg-list
test
RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
RISC-V: KVM: Allow Zihintntl extension for Guest/VM
KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
RISC-V: KVM: Allow Zfa extension for Guest/VM
KVM: riscv: selftests: Add Zfa extension to get-reg-list test
arch/riscv/include/uapi/asm/kvm.h | 27 ++
arch/riscv/kvm/vcpu_onereg.c | 54 +++
.../selftests/kvm/riscv/get-reg-list.c | 439 ++++++++----------
3 files changed, 265 insertions(+), 255 deletions(-)
--
2.34.1
Hi folks,
This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework for nested translation. Nested
translation is a hardware feature that supports two-stage translation
tables for IOMMU. The second-stage translation table is managed by the
host VMM, while the first-stage translation table is owned by user
space. This allows user space to control the IOMMU mappings for its
devices.
When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device top-down through the IOMMUFD. This allows user space to
implement its own IO page fault handling policies.
User space indicates its capability of handling IO page faults by
setting the IOMMU_HWPT_ALLOC_IOPF_CAPABLE flag when allocating a
hardware page table (HWPT). IOMMUFD will then set up its infrastructure
for page fault delivery. On a successful return of HWPT allocation, the
user can retrieve and respond to page faults by reading and writing to
the file descriptor (FD) returned in out_fault_fd.
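A rough sketch of the intended userspace flow; the message layouts below
(struct iommu_hwpt_pgfault / iommu_hwpt_page_response) are placeholders
invented for illustration from the description above, not the proposed uapi:

  #include <stdint.h>
  #include <unistd.h>

  struct iommu_hwpt_pgfault {             /* hypothetical fault message    */
          uint32_t dev_id;
          uint32_t pasid;
          uint32_t grpid;
          uint32_t flags;
          uint64_t addr;
  };

  struct iommu_hwpt_page_response {       /* hypothetical response message */
          uint32_t dev_id;
          uint32_t pasid;
          uint32_t grpid;
          uint32_t code;                  /* e.g. 0 = success, retry       */
  };

  static void handle_one_fault(int fault_fd)
  {
          struct iommu_hwpt_pgfault fault;
          struct iommu_hwpt_page_response resp = {};

          /* Block until the kernel queues a fault on this HWPT's fd. */
          if (read(fault_fd, &fault, sizeof(fault)) != sizeof(fault))
                  return;

          /* ... fix up the stage-1 mapping covering fault.addr here ... */

          resp.dev_id = fault.dev_id;
          resp.pasid  = fault.pasid;
          resp.grpid  = fault.grpid;
          resp.code   = 0;
          write(fault_fd, &resp, sizeof(resp));
  }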
The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.
This series is based on the latest implementation of nested translation
under discussion [1] and the page fault handling framework refactoring in
the IOMMU core [2].
The series and related patches are available on GitHub: [3]
[1] https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
[2] https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.i…
[3] https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-…
Best regards,
baolu
Change log:
v2:
- Move all iommu refactoring patches into a separate series and discuss
it in a different thread. The latest patch series [v6] is available at
https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.i…
- We discussed the timeout of the pending page fault messages. We
agreed that we shouldn't apply any timeout policy for the page fault
handling in user space.
https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
- Jason suggested that we adopt a simple file descriptor interface for
reading and responding to I/O page requests, so that user space
applications can improve performance using io_uring.
https://lore.kernel.org/linux-iommu/ZJWjD1ajeem6pK3I@ziepe.ca/
v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.…
Lu Baolu (6):
iommu: Add iommu page fault cookie helpers
iommufd: Add iommu page fault uapi data
iommufd: Initializing and releasing IO page fault data
iommufd: Deliver fault messages to user space
iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_IOPF test support
iommufd/selftest: Add coverage for IOMMU_TEST_OP_TRIGGER_IOPF
include/linux/iommu.h | 9 +
drivers/iommu/iommu-priv.h | 15 +
drivers/iommu/iommufd/iommufd_private.h | 12 +
drivers/iommu/iommufd/iommufd_test.h | 8 +
include/uapi/linux/iommufd.h | 65 +++++
tools/testing/selftests/iommu/iommufd_utils.h | 66 ++++-
drivers/iommu/io-pgfault.c | 50 ++++
drivers/iommu/iommufd/device.c | 69 ++++-
drivers/iommu/iommufd/hw_pagetable.c | 260 +++++++++++++++++-
drivers/iommu/iommufd/selftest.c | 56 ++++
tools/testing/selftests/iommu/iommufd.c | 24 +-
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
12 files changed, 620 insertions(+), 16 deletions(-)
--
2.34.1
This adds the pasid attach/detach uAPIs for userspace to attach/detach
a PASID of a device to/from a given ioas/hwpt. Only the vfio-pci driver is
enabled in this series. After this series, PASID-capable devices bound
to vfio-pci can report the PASID capability to userspace and the VM to enable
PASID usages like Shared Virtual Addressing (SVA).
This series first adds the helpers for pasid attach in the vfio core, then
adds the device cdev ioctls for pasid attach/detach, and finally exposes the
device PASID capability to userspace. It depends on the iommufd pasid
attach/detach series [1].
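As a sketch of the attach path, assuming this series' updated <linux/vfio.h>
(the ioctl name comes from the shortlog; the structure field names used below
are an assumption for illustration, not the final layout):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  static int attach_pasid(int device_fd, uint32_t pasid, uint32_t pt_id)
  {
          struct vfio_device_pasid_attach_iommufd_pt attach = {
                  .argsz = sizeof(attach),
                  .flags = 0,
                  .pasid = pasid,
                  .pt_id = pt_id,   /* ioas or hwpt to attach the PASID to */
          };

          /* Undone later with the matching detach ioctl from the shortlog. */
          return ioctl(device_fd, VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT, &attach);
  }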
Complete code can be found at [2], tested with a draft QEMU branch [3]
[1] https://lore.kernel.org/linux-iommu/20231127063428.127436-1-yi.l.liu@intel.…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1%…
Change log:
v1:
- Report PASID capability via VFIO_DEVICE_FEATURE (Alex)
rfc: https://lore.kernel.org/linux-iommu/20230926093121.18676-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Kevin Tian (1):
vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
Yi Liu (2):
vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT
vfio: Report PASID capability via VFIO_DEVICE_FEATURE ioctl
drivers/vfio/device_cdev.c | 45 +++++++++++++++++++++
drivers/vfio/iommufd.c | 48 ++++++++++++++++++++++
drivers/vfio/pci/vfio_pci.c | 2 +
drivers/vfio/pci/vfio_pci_core.c | 47 ++++++++++++++++++++++
drivers/vfio/vfio.h | 4 ++
drivers/vfio/vfio_main.c | 8 ++++
include/linux/vfio.h | 11 ++++++
include/uapi/linux/vfio.h | 68 ++++++++++++++++++++++++++++++++
8 files changed, 233 insertions(+)
--
2.34.1
From: Christoph Müllner <christoph.muellner(a)vrull.eu>
When building the RISC-V selftests with a riscv32 compiler I ran into
a couple of compiler warnings. While riscv32 support for these tests is
questionable, the fixes are so trivial that it is probably best to simply
apply them.
Note that the missing-include patch and some format string warnings
are also relevant for riscv64.
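For reference, the format-string warnings typically come from printing a
64-bit value with a long-sized conversion; a portable fix (shown here as a
generic illustration, not the actual patch) is to use the <inttypes.h> macros:

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  static void print_pair(uint64_t key, uint64_t value)
  {
          /* PRIu64/PRIx64 expand correctly on both riscv32 and riscv64. */
          printf("key %" PRIu64 ": value 0x%" PRIx64 "\n", key, value);
  }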
Christoph Müllner (5):
tools: selftests: riscv: Fix compile warnings in hwprobe
tools: selftests: riscv: Fix compile warnings in cbo
tools: selftests: riscv: Add missing include for vector test
tools: selftests: riscv: Fix compile warnings in vector tests
tools: selftests: riscv: Fix compile warnings in mm tests
tools/testing/selftests/riscv/hwprobe/cbo.c | 6 +++---
tools/testing/selftests/riscv/hwprobe/hwprobe.c | 4 ++--
tools/testing/selftests/riscv/mm/mmap_test.h | 3 +++
tools/testing/selftests/riscv/vector/v_initval_nolibc.c | 2 +-
tools/testing/selftests/riscv/vector/vstate_exec_nolibc.c | 3 +++
tools/testing/selftests/riscv/vector/vstate_prctl.c | 4 ++--
6 files changed, 14 insertions(+), 8 deletions(-)
--
2.41.0
This is the second part of adding Intel VT-d nested translation based on the
IOMMUFD nesting infrastructure. As in the iommufd nesting infrastructure series [1],
the iommu core supports new ops to invalidate the cache after modifications
to the stage-1 page table. So far, the cache invalidation data is vendor specific,
so the data_type (IOMMU_HWPT_DATA_VTD_S1) defined for the vendor-specific HWPT
allocation is reused in the cache invalidation path. Userspace should provide a
data_type that matches the type used in the HWPT allocation.
The IOMMU_HWPT_INVALIDATE ioctl returns an error in @out_driver_error_code. However,
Intel VT-d does not define error codes so far, so it is not easy to pre-define them
in iommufd either. As a result, this field should just be ignored on VT-d platforms.
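As a sketch of how a VMM might issue such an invalidation, assuming the
structure and field names added by this series and the companion
IOMMU_HWPT_INVALIDATE uapi (iommu_hwpt_vtd_s1_invalidate, iommu_hwpt_invalidate);
the layouts shown are illustrative, not copied from the final header:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/iommufd.h>   /* needs the headers added by these series */

  static int invalidate_s1_range(int iommufd, uint32_t hwpt_id,
                                 uint64_t addr, uint64_t npages)
  {
          struct iommu_hwpt_vtd_s1_invalidate inv = {
                  .addr = addr,
                  .npages = npages,
                  .flags = 0,                          /* non-leaf invalidation */
          };
          struct iommu_hwpt_invalidate cmd = {
                  .size = sizeof(cmd),
                  .hwpt_id = hwpt_id,
                  .data_type = IOMMU_HWPT_DATA_VTD_S1, /* same type as at alloc */
                  .entry_len = sizeof(inv),
                  .entry_num = 1,
                  .data_uptr = (uintptr_t)&inv,
          };

          /* @out_driver_error_code is ignored on VT-d, as noted above. */
          return ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd);
  }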
Complete code can be found in [2]; the corresponding QEMU code can be found in [3].
[1] https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v7:
- Not much change, just a rebase on top of 6.7-rc1
v6: https://lore.kernel.org/linux-iommu/20231020093719.18725-1-yi.l.liu@intel.c…
- Address comments from Kevin
- Split the VT-d nesting series into two parts (Jason)
v5: https://lore.kernel.org/linux-iommu/20230921075431.125239-1-yi.l.liu@intel.…
- Add Kevin's r-b for patches 2, 3, 5, 8, 10
- Drop enforce_cache_coherency callback from the nested type domain ops (Kevin)
- Remove duplicate agaw check in patch 04 (Kevin)
- Remove duplicate domain_update_iommu_cap() in patch 06 (Kevin)
- Check parent's force_snooping to set pgsnp in the pasid entry (Kevin)
- uapi data structure check (Kevin)
- Simplify the errata handling as user can allocate nested parent domain
v4: https://lore.kernel.org/linux-iommu/20230724111335.107427-1-yi.l.liu@intel.…
- Remove ascii art tables (Jason)
- Drop EMT (Tina, Jason)
- Drop MTS and related definitions (Kevin)
- Rename macro IOMMU_VTD_PGTBL_ to IOMMU_VTD_S1_ (Kevin)
- Rename struct iommu_hwpt_intel_vtd_ to iommu_hwpt_vtd_ (Kevin)
- Rename struct iommu_hwpt_intel_vtd to iommu_hwpt_vtd_s1 (Kevin)
- Put the vendor specific hwpt alloc data structure before enum iommu_hwpt_type (Kevin)
- Do not trim the higher page levels of S2 domain in nested domain attachment as the
S2 domain may have been used independently. (Kevin)
- Remove the first-stage pgd check against the maximum address of s2_domain as hw
can check it anyhow. It would make sense to check every pfn used in the stage-1 page
table, but that is not feasible, so just leave it to hw. (Kevin)
- Split the iotlb flush part into an order of uapi, helper and callback implementation (Kevin)
- Change the policy of VT-d nesting errata, disallow RO mapping once a domain is used
as parent domain of a nested domain. This removes the nested_users counting. (Kevin)
- Minor fix for "make htmldocs"
v3: https://lore.kernel.org/linux-iommu/20230511145110.27707-1-yi.l.liu@intel.c…
- Further split the patches into an order of adding helpers for nested
domain, iotlb flush, nested domain attachment and nested domain allocation
callback, then report the hw_info to userspace.
- Add batch support in cache invalidation from userspace
- Disallow nested translation usage if RO mappings exists in stage-2 domain
due to errata on readonly mappings on Sapphire Rapids platform.
v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.…
- The iommufd infrastructure is split to be separate series.
v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Yi Liu (3):
iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
iommu/vt-d: Make iotlb flush helpers to be extern
iommu/vt-d: Add iotlb flush for nested domain
drivers/iommu/intel/iommu.c | 10 +++----
drivers/iommu/intel/iommu.h | 6 ++++
drivers/iommu/intel/nested.c | 54 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/iommufd.h | 36 ++++++++++++++++++++++++
4 files changed, 101 insertions(+), 5 deletions(-)
--
2.34.1
Nested translation is a hardware feature supported by many modern
IOMMUs. It uses two stages (stage-1, stage-2) of address translation
to reach the physical address. The stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by the kernel. Changes
to the stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.
    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  | FS for GIOVA->GPA      |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
This series adds the cache invalidation path for userspace to invalidate the
cache after modifying the stage-1 page table. This is based on the first part
of nesting [1].
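For this uAPI the invalidation requests are batched in a user-supplied array;
as a sketch from userspace's point of view (the VT-d entry type from the
driver-side series is reused for illustration; all structure and field names,
and the in/out semantics of entry_num, are assumptions about the proposed
uapi rather than confirmed definitions):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/iommufd.h>   /* needs the headers added by this series */

  static int invalidate_two_ranges(int iommufd, uint32_t hwpt_id)
  {
          /* Several invalidation requests queued in a single ioctl. */
          struct iommu_hwpt_vtd_s1_invalidate entries[2] = {
                  { .addr = 0x100000, .npages = 16 },
                  { .addr = 0x800000, .npages = 1  },
          };
          struct iommu_hwpt_invalidate cmd = {
                  .size = sizeof(cmd),
                  .hwpt_id = hwpt_id,
                  .data_type = IOMMU_HWPT_DATA_VTD_S1,
                  .entry_len = sizeof(entries[0]),
                  .entry_num = 2,
                  .data_uptr = (uintptr_t)entries,
          };

          /* entry_num reports how many entries were handled (assumed). */
          return ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd);
  }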
Complete code can be found in [2]; QEMU code can be found in [3].
Lastly, this is joint work with Nicolin Chen and Lu Baolu; thanks to them for
the help. ^_^ Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v6:
- Not much change, just a rebase on top of 6.7-rc1 as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (1):
iommu: Add cache_invalidate_user op
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (1):
iommufd: Add IOMMU_HWPT_INVALIDATE
drivers/iommu/iommufd/hw_pagetable.c | 35 ++++++++
drivers/iommu/iommufd/iommufd_private.h | 9 ++
drivers/iommu/iommufd/iommufd_test.h | 22 +++++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 69 +++++++++++++++
include/linux/iommu.h | 84 +++++++++++++++++++
include/uapi/linux/iommufd.h | 35 ++++++++
tools/testing/selftests/iommu/iommufd.c | 75 +++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 63 ++++++++++++++
9 files changed, 395 insertions(+)
--
2.34.1
This patchset moves the current kernel testing livepatch modules from
lib/livepatch to tools/testing/selftests/livepatch/test_modules, and compiles
them as out-of-tree modules before testing.
There is also a new test being added. This new test exercises multiple processes
calling a syscall while a livepatch patches that syscall.
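A minimal sketch of the shape of such a test program (not the actual
test_klp-call_getpid source): each process keeps entering the syscall so the
livepatch transition has something to converge on, and stops when told:

  #include <signal.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  static volatile sig_atomic_t stop;

  static void handler(int sig)
  {
          stop = 1;
  }

  int main(void)
  {
          signal(SIGTERM, handler);

          /* Keep calling getpid() until the test script tells us to stop;
           * the return value is ignored, we only care that the process
           * keeps entering the (livepatched) syscall. */
          while (!stop)
                  syscall(SYS_getpid);

          return 0;
  }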
Why this move is an improvement:
* The modules are now compiled as out-of-tree modules against the current
running kernel, making them capable of being tested on different systems with
newer or older kernels.
* This approach now requires the kernel-devel package to be installed, since the
modules are built out-of-tree. These packages can be generated by running
"make rpm-pkg" in the kernel source.
What needs to be solved:
* Currently gen_tar only packages the resulting binaries of the tests, and not
the sources. For the current approach, the newly added modules would be
compiled and then packaged. It works when testing on a system with the same
kernel version, but it will fail when running on a machine with a different
kernel version, since the module was compiled against the currently running
kernel. This is not a new problem, just aligning expectations. For the current
approach to be truly system agnostic, gen_tar would need to include the module
and program sources so they can be compiled on the target systems.
I'm sending the patches now so they can be discussed before Plumbers.
Thanks in advance!
Marcos
To: Shuah Khan <shuah(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Heiko Carstens <hca(a)linux.ibm.com>
To: Vasily Gorbik <gor(a)linux.ibm.com>
To: Alexander Gordeev <agordeev(a)linux.ibm.com>
To: Christian Borntraeger <borntraeger(a)linux.ibm.com>
To: Sven Schnelle <svens(a)linux.ibm.com>
To: Josh Poimboeuf <jpoimboe(a)kernel.org>
To: Jiri Kosina <jikos(a)kernel.org>
To: Miroslav Benes <mbenes(a)suse.cz>
To: Petr Mladek <pmladek(a)suse.com>
To: Joe Lawrence <joe.lawrence(a)redhat.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-s390(a)vger.kernel.org
Cc: live-patching(a)vger.kernel.org
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
Changes in v3:
* Rebased on top of v6.6-rc5
* The commit messages were improved (Thanks Petr!)
* Created TEST_GEN_MODS_DIR variable to point to a directory that contains kernel
modules, and adapted selftests to build it before running the test.
* Moved test_klp-call_getpid out of test_programs, since the gen_tar
would just copy the generated test programs to the livepatches dir,
and so scripts relying on test_programs/test_klp-call_getpid will fail.
* Added a module_param for klp_pids, describing its usage.
* Simplified the call_getpid program to ignore the return value of the getpid
syscall, since we only want to make sure the process transitions correctly to
the patched state
* The test-syscall.sh now prints a log message showing the number of remaining
processes to transition to the livepatched state, and check_output expects it
to be 0.
* Added MODULE_AUTHOR and MODULE_DESCRIPTION to test_klp_syscall.c
The v2 can be seen here:
https://lore.kernel.org/linux-kselftest/20220630141226.2802-1-mpdesouza@sus…
---
Marcos Paulo de Souza (3):
kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
livepatch: Move tests from lib/livepatch to selftests/livepatch
selftests: livepatch: Test livepatching a heavily called syscall
Documentation/dev-tools/kselftest.rst | 4 +
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
lib/Kconfig.debug | 22 ----
lib/Makefile | 2 -
lib/livepatch/Makefile | 14 ---
tools/testing/selftests/lib.mk | 20 +++-
tools/testing/selftests/livepatch/Makefile | 5 +-
tools/testing/selftests/livepatch/README | 17 +--
tools/testing/selftests/livepatch/config | 1 -
tools/testing/selftests/livepatch/functions.sh | 34 +++---
.../testing/selftests/livepatch/test-callbacks.sh | 50 ++++-----
tools/testing/selftests/livepatch/test-ftrace.sh | 6 +-
.../testing/selftests/livepatch/test-livepatch.sh | 10 +-
.../selftests/livepatch/test-shadow-vars.sh | 2 +-
tools/testing/selftests/livepatch/test-state.sh | 18 ++--
tools/testing/selftests/livepatch/test-syscall.sh | 53 ++++++++++
tools/testing/selftests/livepatch/test-sysfs.sh | 6 +-
.../selftests/livepatch/test_klp-call_getpid.c | 44 ++++++++
.../selftests/livepatch/test_modules/Makefile | 20 ++++
.../test_modules}/test_klp_atomic_replace.c | 0
.../test_modules}/test_klp_callbacks_busy.c | 0
.../test_modules}/test_klp_callbacks_demo.c | 0
.../test_modules}/test_klp_callbacks_demo2.c | 0
.../test_modules}/test_klp_callbacks_mod.c | 0
.../livepatch/test_modules}/test_klp_livepatch.c | 0
.../livepatch/test_modules}/test_klp_shadow_vars.c | 0
.../livepatch/test_modules}/test_klp_state.c | 0
.../livepatch/test_modules}/test_klp_state2.c | 0
.../livepatch/test_modules}/test_klp_state3.c | 0
.../livepatch/test_modules/test_klp_syscall.c | 116 +++++++++++++++++++++
31 files changed, 325 insertions(+), 121 deletions(-)
---
base-commit: 6489bf2e1df1c84e9bcd4694029ff35b39fd3397
change-id: 20231031-send-lp-kselftests-4c917dcd4565
Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>
The test is inspired by the pmu_event_filter_test implemented on x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute, so add the test for arm64.
The series first moves some common PMU code from vpmu_counter_access to
lib/aarch64/vpmu.c and include/aarch64/vpmu.h, so it can be used by
pmu_event_filter_test. It then fixes a bug related to [enable|disable]_counter,
and finally implements the test itself.
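For context, setting such a filter from userspace uses the existing vcpu
device attribute; a minimal sketch, where the event number and the deny action
are just examples (struct kvm_pmu_event_filter and the attribute names come
from the arm64 KVM uapi):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int deny_event(int vcpu_fd, __u16 event)
  {
          struct kvm_pmu_event_filter filter = {
                  .base_event = event,
                  .nevents = 1,
                  .action = 1,          /* 1: deny the listed events */
          };
          struct kvm_device_attr attr = {
                  .group = KVM_ARM_VCPU_PMU_V3_CTRL,
                  .attr = KVM_ARM_VCPU_PMU_V3_FILTER,
                  .addr = (__u64)&filter,
          };

          /* Applied per vCPU before the PMU is initialized. */
          return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
  }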
Changelog:
----------
v1->v2:
- Improve the commit message. [Eric]
- Fix the bug in [enable|disable]_counter. [Raghavendra & Marc]
- Add a check for whether KVM has the KVM_ARM_VCPU_PMU_V3_FILTER attribute.
- Add a check for whether the host PMU supports the test event through PMCEID0.
- Split test_invalid_filter() out into another patch. [Eric]
v1: https://lore.kernel.org/all/20231123063750.2176250-1-shahuang@redhat.com/
Shaoqin Huang (5):
KVM: selftests: aarch64: Make the [create|destroy]_vpmu_vm() public
KVM: selftests: aarch64: Move pmu helper functions into vpmu.h
KVM: selftests: aarch64: Fix the buggy [enable|disable]_counter
KVM: selftests: aarch64: Introduce pmu_event_filter_test
KVM: selftests: aarch64: Add invalid filter test in
pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 2 +
.../kvm/aarch64/pmu_event_filter_test.c | 267 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 218 ++------------
.../selftests/kvm/include/aarch64/vpmu.h | 135 +++++++++
.../testing/selftests/kvm/lib/aarch64/vpmu.c | 74 +++++
5 files changed, 502 insertions(+), 194 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
create mode 100644 tools/testing/selftests/kvm/lib/aarch64/vpmu.c
--
2.40.1