Real-time setups try hard to ensure proper isolation between time
critical applications and e.g. network processing performed by the
network stack in softirq and RPS is used to move the softirq
activity away from the isolated core.
If the network configuration is dynamic, with netns and devices
routinely created at run-time, enforcing the correct RPS setting
on each newly created device allowing to transient bad configuration
became complex.
These series try to address the above, introducing a new
sysctl knob: rps_default_mask. The new sysctl entry allows
configuring a systemwide RPS mask, to be enforced since receive
queue creation time without any fourther per device configuration
required.
Additionally, a simple self-test is introduced to check the
rps_default_mask behavior.
v1 -> v2:
- fix sparse warning in patch 2/3
Paolo Abeni (3):
net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
net/core: introduce default_rps_mask netns attribute
self-tests: introduce self-tests for RPS default mask
Documentation/admin-guide/sysctl/net.rst | 6 ++
include/linux/netdevice.h | 1 +
net/core/net-sysfs.c | 73 +++++++++++--------
net/core/sysctl_net_core.c | 58 +++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 3 +
.../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++
7 files changed, 169 insertions(+), 30 deletions(-)
create mode 100755 tools/testing/selftests/net/rps_default_mask.sh
--
2.26.2
[
This has been in linux-next for a little while now, and we've completed
the syzkaller run. 1300 hours of CPU time have been invested since the
last report with no improvement in coverage or new detections. syzkaller
coverage reached 69%(75%), and review of the misses show substantial
amounts are WARN_ON's and other debugging which are not expected to be
covered.
]
iommufd is the user API to control the IOMMU subsystem as it relates to
managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU device
specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance the
combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a
guest. Dirty tracking enables VM live migration with SRIOV devices and
PASID support allow creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs, which
is currently VFIO and VDPA.
The pre-v1 series proposed re-using the VFIO type 1 data structure,
however it was suggested that if we are doing this big update then we
should also come with an improved data structure that solves the
limitations that VFIO type1 has. Notably this addresses:
- Multiple IOAS/'containers' and multiple domains inside a single FD
- Single-pin operation no matter how many domains and containers use
a page
- A fine grained locking scheme supporting user managed concurrency for
multi-threaded map/unmap
- A pre-registration mechanism to optimize vIOMMU use cases by
pre-pinning pages
- Extended ioctl API that can manage these new objects and exposes
domains directly to user space
- domains are sharable between subsystems, eg VFIO and VDPA
The bulk of this code is a new data structure design to track how the
IOVAs are mapped to PFNs.
iommufd intends to be general and consumable by any driver that wants to
DMA to userspace. From a driver perspective it can largely be dropped in
in-place of iommu_attach_device() and provides a uniform full feature set
to all consumers.
As this is a larger project this series is the first step. This series
provides the iommfd "generic interface" which is designed to be suitable
for applications like DPDK and VMM flows that are not optimized to
specific HW scenarios. It is close to being a drop in replacement for the
existing VFIO type 1 and supports existing qemu based VM flows.
Several follow-on series are being prepared:
- Patches integrating with qemu in native mode:
https://github.com/yiliu1765/qemu/commits/qemu-iommufd-6.0-rc2
- A completed integration with VFIO now exists that covers "emulated" mdev
use cases now, and can pass testing with qemu/etc in compatability mode:
https://github.com/jgunthorpe/linux/commits/vfio_iommufd
- A draft providing system iommu dirty tracking on top of iommufd,
including iommu driver implementations:
https://github.com/jpemartins/linux/commits/x86-iommufd
This pairs with patches for providing a similar API to support VFIO-device
tracking to give a complete vfio solution:
https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@nvidia.com/
- Userspace page tables aka 'nested translation' for ARM and Intel iommu
drivers:
https://github.com/nicolinc/iommufd/commits/iommufd_nesting
- "device centric" vfio series to expose the vfio_device FD directly as a
normal cdev, and provide an extended API allowing dynamically changing
the IOAS binding:
https://github.com/yiliu1765/iommufd/commits/iommufd-v6.0-rc2-nesting-0901
- Drafts for PASID and PRI interfaces are included above as well
Overall enough work is done now to show the merit of the new API design
and at least draft solutions to many of the main problems.
Several people have contributed directly to this work: Eric Auger, Joao
Martins, Kevin Tian, Lu Baolu, Nicolin Chen, Yi L Liu. Many more have
participated in the discussions that lead here, and provided ideas. Thanks
to all!
The v1/v2 iommufd series has been used to guide a large amount of preparatory
work that has now been merged. The general theme is to organize things in
a way that makes injecting iommufd natural:
- VFIO live migration support with mlx5 and hisi_acc drivers.
These series need a dirty tracking solution to be really usable.
https://lore.kernel.org/kvm/20220224142024.147653-1-yishaih@nvidia.com/https://lore.kernel.org/kvm/20220308184902.2242-1-shameerali.kolothum.thodi…
- Significantly rework the VFIO gvt mdev and remove struct
mdev_parent_ops
https://lore.kernel.org/lkml/20220411141403.86980-1-hch@lst.de/
- Rework how PCIe no-snoop blocking works
https://lore.kernel.org/kvm/0-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia…
- Consolidate dma ownership into the iommu core code
https://lore.kernel.org/linux-iommu/20220418005000.897664-1-baolu.lu@linux.…
- Make all vfio driver interfaces use struct vfio_device consistently
https://lore.kernel.org/kvm/0-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nv…
- Remove the vfio_group from the kvm/vfio interface
https://lore.kernel.org/kvm/0-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@n…
- Simplify locking in vfio
https://lore.kernel.org/kvm/0-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nv…
- Remove the vfio notifiter scheme that faces drivers
https://lore.kernel.org/kvm/0-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidi…
- Improve the driver facing API for vfio pin/unpin pages to make the
presence of struct page clear
https://lore.kernel.org/kvm/20220723020256.30081-1-nicolinc@nvidia.com/
- Clean up in the Intel IOMMU driver
https://lore.kernel.org/linux-iommu/20220301020159.633356-1-baolu.lu@linux.…https://lore.kernel.org/linux-iommu/20220510023407.2759143-1-baolu.lu@linux…https://lore.kernel.org/linux-iommu/20220514014322.2927339-1-baolu.lu@linux…https://lore.kernel.org/linux-iommu/20220706025524.2904370-1-baolu.lu@linux…https://lore.kernel.org/linux-iommu/20220702015610.2849494-1-baolu.lu@linux…
- Rework s390 vfio drivers
https://lore.kernel.org/kvm/20220707135737.720765-1-farman@linux.ibm.com/
- Normalize vfio ioctl handling
https://lore.kernel.org/kvm/0-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidi…
- VFIO API for dirty tracking (aka dma logging) managed inside a PCI
device, with mlx5 implementation
https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@nvidia.com
- Introduce a struct device sysfs presence for struct vfio_device
https://lore.kernel.org/kvm/20220901143747.32858-1-kevin.tian@intel.com/
- Complete restructuring the vfio mdev model
https://lore.kernel.org/kvm/20220822062208.152745-1-hch@lst.de/
- Isolate VFIO container code in preperation for iommufd to provide an
alternative implementation of it all
https://lore.kernel.org/kvm/0-v1-a805b607f1fb+17b-vfio_container_split_jgg@…
- Simplify and consolidate iommu_domain/device compatability checking
https://lore.kernel.org/linux-iommu/cover.1666042872.git.nicolinc@nvidia.co…
- Align iommu SVA support with the domain-centric model
https://lore.kernel.org/all/20221031005917.45690-1-baolu.lu@linux.intel.com/
This is about 233 patches applied since March, thank you to everyone
involved in all this work!
Currently there are a number of supporting series still in progress:
- DMABUF exporter support for VFIO to allow PCI P2P with VFIO
https://lore.kernel.org/r/0-v2-472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com
- Start to provide iommu_domain ops for POWER
https://lore.kernel.org/all/20220714081822.3717693-1-aik@ozlabs.ru/
However, these are not necessary for this series to advance.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd
v4:
- Rebase to v6.1-rc3, include the iommu branch with the needed EINVAL
patch series and also the SVA rework
- All bug fixes and comments with no API or behavioral changes
- gvt tests are passing again
- Syzkaller is no longer finding issues and achieved high coverage of
69%(75%)
- Coverity has been run by two people
- new "nth failure" test that systematically sweeps all error unwind paths
looking for splats
- All fixes noted in the mailing list
If you sent an email and I didn't reply please ping it, I have lost it.
- The selftest patch has been broken into three to make the additional
modification to the main code clearer
- The interdiff is 1.8k lines for the main code, with another 3k of
test suite changes
v3: https://lore.kernel.org/r/0-v3-402a7d6459de+24b-iommufd_jgg@nvidia.com
- Rebase to v6.1-rc1
- Improve documentation
- Use EXPORT_SYMBOL_NS
- Fix W1, checkpatch stuff
- Revise pages.c to resolve the FIXMEs. Create a
interval_tree_double_span_iter which allows a simple expression of the
previously problematic algorithms
- Consistently use the word 'access' instead of user to refer to an
access from an in-kernel user (eg vfio mdev)
- Support two forms of rlimit accounting and make the vfio compatible one
the default in compatability mode (following series)
- Support old VFIO type1 by disabling huge pages and implementing a
simple algorithm to split a struct iopt_area
- Full implementation of access support, test coverage and optimizations
- Complete COPY to be able to copy across contiguous areas. Improve
all the algorithms around contiguous areas with a dedicated iterator
- Functional ENFORCED_COHERENT support
- Support multi-device groups
- Lots of smaller changes (the interdiff is 5k lines)
v2: https://lore.kernel.org/r/0-v2-f9436d0bde78+4bb-iommufd_jgg@nvidia.com
- Rebase to v6.0-rc3
- Improve comments
- Change to an iterative destruction approach to avoid cycles
- Near rewrite of the vfio facing implementation, supported by a complete
implementation on the vfio side
- New IOMMU_IOAS_ALLOW_IOVAS API as discussed. Allows userspace to
assert that ranges of IOVA must always be mappable. To be used by a VMM
that has promised a guest a certain availability of IOVA. May help
guide PPC's multi-window implementation.
- Rework how unmap_iova works, user can unmap the whole ioas now
- The no-snoop / wbinvd support is implemented
- Bug fixes
- Test suite improvements
- Lots of smaller changes (the interdiff is 3k lines)
v1: https://lore.kernel.org/r/0-v1-e79cd8d168e8+6-iommufd_jgg@nvidia.com
Jason Gunthorpe (15):
iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY
interval-tree: Add a utility to iterate over spans in an interval tree
iommufd: File descriptor, context, kconfig and makefiles
kernel/user: Allow user::locked_vm to be usable for iommufd
iommufd: PFN handling for iopt_pages
iommufd: Algorithms for PFN storage
iommufd: Data structure to provide IOVA to PFN mapping
iommufd: IOCTLs for the io_pagetable
iommufd: Add a HW pagetable object
iommufd: Add kAPI toward external drivers for physical devices
iommufd: Add kAPI toward external drivers for kernel access
iommufd: vfio container FD ioctl compatibility
iommufd: Add a selftest
iommufd: Add some fault injection points
iommufd: Add additional invariant assertions
Kevin Tian (1):
iommufd: Document overview of iommufd
Lu Baolu (1):
iommu: Add device-centric DMA ownership interfaces
.clang-format | 3 +
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/ioctl/ioctl-number.rst | 1 +
Documentation/userspace-api/iommufd.rst | 222 ++
MAINTAINERS | 12 +
drivers/iommu/Kconfig | 1 +
drivers/iommu/Makefile | 2 +-
drivers/iommu/amd/iommu.c | 2 +
drivers/iommu/intel/iommu.c | 16 +-
drivers/iommu/iommu.c | 124 +-
drivers/iommu/iommufd/Kconfig | 23 +
drivers/iommu/iommufd/Makefile | 13 +
drivers/iommu/iommufd/device.c | 748 +++++++
drivers/iommu/iommufd/double_span.h | 98 +
drivers/iommu/iommufd/hw_pagetable.c | 57 +
drivers/iommu/iommufd/io_pagetable.c | 1214 +++++++++++
drivers/iommu/iommufd/io_pagetable.h | 241 +++
drivers/iommu/iommufd/ioas.c | 390 ++++
drivers/iommu/iommufd/iommufd_private.h | 307 +++
drivers/iommu/iommufd/iommufd_test.h | 93 +
drivers/iommu/iommufd/main.c | 419 ++++
drivers/iommu/iommufd/pages.c | 1884 +++++++++++++++++
drivers/iommu/iommufd/selftest.c | 853 ++++++++
drivers/iommu/iommufd/vfio_compat.c | 452 ++++
include/linux/interval_tree.h | 58 +
include/linux/iommu.h | 17 +
include/linux/iommufd.h | 102 +
include/linux/sched/user.h | 2 +-
include/uapi/linux/iommufd.h | 332 +++
kernel/user.c | 1 +
lib/Kconfig | 4 +
lib/interval_tree.c | 132 ++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/iommu/.gitignore | 3 +
tools/testing/selftests/iommu/Makefile | 12 +
tools/testing/selftests/iommu/config | 2 +
tools/testing/selftests/iommu/iommufd.c | 1627 ++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 580 +++++
tools/testing/selftests/iommu/iommufd_utils.h | 278 +++
39 files changed, 10294 insertions(+), 33 deletions(-)
create mode 100644 Documentation/userspace-api/iommufd.rst
create mode 100644 drivers/iommu/iommufd/Kconfig
create mode 100644 drivers/iommu/iommufd/Makefile
create mode 100644 drivers/iommu/iommufd/device.c
create mode 100644 drivers/iommu/iommufd/double_span.h
create mode 100644 drivers/iommu/iommufd/hw_pagetable.c
create mode 100644 drivers/iommu/iommufd/io_pagetable.c
create mode 100644 drivers/iommu/iommufd/io_pagetable.h
create mode 100644 drivers/iommu/iommufd/ioas.c
create mode 100644 drivers/iommu/iommufd/iommufd_private.h
create mode 100644 drivers/iommu/iommufd/iommufd_test.h
create mode 100644 drivers/iommu/iommufd/main.c
create mode 100644 drivers/iommu/iommufd/pages.c
create mode 100644 drivers/iommu/iommufd/selftest.c
create mode 100644 drivers/iommu/iommufd/vfio_compat.c
create mode 100644 include/linux/iommufd.h
create mode 100644 include/uapi/linux/iommufd.h
create mode 100644 tools/testing/selftests/iommu/.gitignore
create mode 100644 tools/testing/selftests/iommu/Makefile
create mode 100644 tools/testing/selftests/iommu/config
create mode 100644 tools/testing/selftests/iommu/iommufd.c
create mode 100644 tools/testing/selftests/iommu/iommufd_fail_nth.c
create mode 100644 tools/testing/selftests/iommu/iommufd_utils.h
base-commit: 69e61edebea030f177de7a23b8d5d9b8c4a90bda
--
2.38.1
Arm have recently released versions 2 and 2.1 of the SME extension.
Among the features introduced by SME 2 is some new architectural state,
the ZT0 register. This series adds support for this and all the other
features of the new SME versions.
Since the architecture has been designed with the possibility of adding
further ZTn registers in mind the interfaces added for ZT0 are done with
this possibility in mind. As ZT0 is a simple fixed size register these
interfaces are all fairly simple, the main complication is that ZT0 is
only accessible when PSTATE.ZA is enabled. The memory allocation that we
already do for PSTATE.ZA is extended to include space for ZT0.
Due to textual collisions especially around the addition of hwcaps this
is based on the recently merged series "arm64: Support for 2022 data
processing instructions" but there is no meaningful interaction.
v3:
- Rebase onto merged series for the 2022 architectur extensions.
- Clarifications and typo fixes in the ABI documentation.
v2:
- Add missing initialisation of user->zt in signal context parsing.
- Change the magic for ZT signal frames to 0x5a544e01 (ZTN0).
Mark Brown (21):
arm64/sme: Rename za_state to sme_state
arm64: Document boot requirements for SME 2
arm64/sysreg: Update system registers for SME 2 and 2.1
arm64/sme: Document SME 2 and SME 2.1 ABI
arm64/esr: Document ISS for ZT0 being disabled
arm64/sme: Manually encode ZT0 load and store instructions
arm64/sme: Enable host kernel to access ZT0
arm64/sme: Add basic enumeration for SME2
arm64/sme: Provide storage for ZT0
arm64/sme: Implement context switching for ZT0
arm64/sme: Implement signal handling for ZT
arm64/sme: Implement ZT0 ptrace support
arm64/sme: Add hwcaps for SME 2 and 2.1 features
kselftest/arm64: Add a stress test program for ZT0
kselftest/arm64: Cover ZT in the FP stress test
kselftest/arm64: Enumerate SME2 in the signal test utility code
kselftest/arm64: Teach the generic signal context validation about ZT
kselftest/arm64: Add test coverage for ZT register signal frames
kselftest/arm64: Add SME2 coverage to syscall-abi
kselftest/arm64: Add coverage of the ZT ptrace regset
kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps
Documentation/arm64/booting.rst | 10 +
Documentation/arm64/elf_hwcaps.rst | 18 +
Documentation/arm64/sme.rst | 52 ++-
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/include/asm/fpsimd.h | 28 +-
arch/arm64/include/asm/fpsimdmacros.h | 22 ++
arch/arm64/include/asm/hwcap.h | 6 +
arch/arm64/include/asm/processor.h | 2 +-
arch/arm64/include/uapi/asm/hwcap.h | 6 +
arch/arm64/include/uapi/asm/sigcontext.h | 19 +
arch/arm64/kernel/cpufeature.c | 27 ++
arch/arm64/kernel/cpuinfo.c | 6 +
arch/arm64/kernel/entry-fpsimd.S | 30 +-
arch/arm64/kernel/fpsimd.c | 53 ++-
arch/arm64/kernel/hyp-stub.S | 6 +
arch/arm64/kernel/idreg-override.c | 1 +
arch/arm64/kernel/process.c | 21 +-
arch/arm64/kernel/ptrace.c | 60 ++-
arch/arm64/kernel/signal.c | 113 +++++-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 26 +-
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 115 ++++++
.../selftests/arm64/abi/syscall-abi-asm.S | 43 ++-
.../testing/selftests/arm64/abi/syscall-abi.c | 40 +-
tools/testing/selftests/arm64/fp/.gitignore | 2 +
tools/testing/selftests/arm64/fp/Makefile | 5 +
tools/testing/selftests/arm64/fp/fp-stress.c | 29 +-
tools/testing/selftests/arm64/fp/sme-inst.h | 20 +
tools/testing/selftests/arm64/fp/zt-ptrace.c | 365 ++++++++++++++++++
tools/testing/selftests/arm64/fp/zt-test.S | 324 ++++++++++++++++
.../testing/selftests/arm64/signal/.gitignore | 1 +
.../selftests/arm64/signal/test_signals.h | 2 +
.../arm64/signal/test_signals_utils.c | 3 +
.../arm64/signal/testcases/testcases.c | 36 ++
.../arm64/signal/testcases/testcases.h | 1 +
.../arm64/signal/testcases/zt_no_regs.c | 51 +++
.../arm64/signal/testcases/zt_regs.c | 85 ++++
39 files changed, 1564 insertions(+), 73 deletions(-)
create mode 100644 tools/testing/selftests/arm64/fp/zt-ptrace.c
create mode 100644 tools/testing/selftests/arm64/fp/zt-test.S
create mode 100644 tools/testing/selftests/arm64/signal/testcases/zt_no_regs.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/zt_regs.c
base-commit: c5195b027d29bbaae0c9668704ab996773788d23
--
2.30.2
Hello,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing will be executed
and test results will be confusing.
There is no behavior change in test themselves.
[patch 1] Make write_schemata() run to set up shemata with 100% allocation
on first run in MBM test.
[patch 2] The MBA test result message is always output as "ok",
make output message to be "not ok" if MBA check result is failed.
[patch 3] When a child process is created by fork(), the buffer of the
parent process is also copied. Flush the buffer before
executing fork().
[patch 4] Add a signal handler to cleanup properly before exiting the
parent process if there is an error occurs after creating
a child process with fork() in the CAT test.
[patch 5] Before exiting each test CMT/CAT/MBM/MBA, clear test result
files function cat/cmt/mbm/mba_test_cleanup() are called
twice. Delete once.
This patch series is based on Linux v6.1-rc5
Difference from v3:
[patch 2]
Rename "failed" to "ret" to avoid confusion.
[patch 4]
- Use sigaction(2) instead of signal().
- Add a description of using global bm_pid in commit message.
- Add comments to clarify why let the child continue to its
infinite loop after the write() failed.
[patch 5]
Ensure to run cat/cmt/mbm/mba_test_cleanup() to clear test result
file before return if an error occurs.
Pervious versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (5):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 31 +++++++++++++------
tools/testing/selftests/resctrl/cmt_test.c | 7 ++---
tools/testing/selftests/resctrl/mba_test.c | 23 +++++++-------
tools/testing/selftests/resctrl/mbm_test.c | 20 ++++++------
.../testing/selftests/resctrl/resctrl_tests.c | 4 ---
tools/testing/selftests/resctrl/resctrl_val.c | 1 +
tools/testing/selftests/resctrl/resctrlfs.c | 5 ++-
7 files changed, 50 insertions(+), 41 deletions(-)
--
2.27.0
Changes in v6:
- Updated the interface and made cosmetic changes
Original Cover Letter in v5:
Hello,
This patch series implements IOCTL on the pagemap procfs file to get the
information about the page table entries (PTEs). The following operations
are supported in this ioctl:
- Get the information if the pages are soft-dirty, file mapped, present
or swapped.
- Clear the soft-dirty PTE bit of the pages.
- Get and clear the soft-dirty PTE bit of the pages atomically.
Soft-dirty PTE bit of the memory pages can be read by using the pagemap
procfs file. The soft-dirty PTE bit for the whole memory range of the
process can be cleared by writing to the clear_refs file. There are other
methods to mimic this information entirely in userspace with poor
performance:
- The mprotect syscall and SIGSEGV handler for bookkeeping
- The userfaultfd syscall with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty PTE bit status and clear operation
possible.
- The soft-dirty PTE bit of only a part of memory cannot be cleared.
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows. This syscall is used by games to
keep track of dirty pages to process only the dirty pages.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project[2][3]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project[2].
The IOCTL returns the addresses of the pages which match the specific masks.
The page addresses are returned in struct page_region in a compact form.
The max_pages is needed to support a use case where user only wants to get
a specific number of pages. So there is no need to find all the pages of
interest in the range when max_pages is specified. The IOCTL returns when
the maximum number of the pages are found. The max_pages is optional. If
max_pages is specified, it must be equal or greater than the vec_size.
This restriction is needed to handle worse case when one page_region only
contains info of one page and it cannot be compacted. This is needed to
emulate the Windows getWriteWatch() syscall.
Some non-dirty pages get marked as dirty because of the kernel's
internal activity (such as VMA merging as soft-dirty bit difference isn't
considered while deciding to merge VMAs). The dirty bit of the pages is
stored in the VMA flags and in the per page flags. If any of these two bits
are set, the page is considered to be soft dirty. Suppose you have cleared
the soft dirty bit of half of VMA which will be done by splitting the VMA
and clearing soft dirty bit flag in the half VMA and the pages in it. Now
kernel may decide to merge the VMAs again. So the half VMA becomes dirty
again. This splitting/merging costs performance. The application receives
a lot of pages which aren't dirty in reality but marked as dirty.
Performance is lost again here. Also sometimes user doesn't want the newly
allocated memory to be marked as dirty. PAGEMAP_NO_REUSED_REGIONS flag
solves both the problems. It is used to not depend on the soft dirty flag
in the VMA flags. So VMA splitting and merging doesn't happen. It only
depends on the soft dirty bit of the individual pages. Thus by using this
flag, there may be a scenerio such that the new memory regions which are
just created, doesn't look dirty when seen with the IOCTL, but look dirty
when seen from procfs. This seems okay as the user of this flag know the
implication of using it.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[3] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (3):
fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit
fs/proc/task_mmu: Implement IOCTL to get and/or the clear info about PTEs
selftests: vm: add pagemap ioctl tests
fs/proc/task_mmu.c | 410 +++++++++++-
include/uapi/linux/fs.h | 56 ++
tools/include/uapi/linux/fs.h | 56 ++
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 5 +-
tools/testing/selftests/vm/pagemap_ioctl.c | 698 +++++++++++++++++++++
6 files changed, 1193 insertions(+), 33 deletions(-)
create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
--
2.30.2
This series implements selftests targeting the feature floated by Chao
via:
https://lore.kernel.org/linux-mm/20221109041358.GA118963@chaop.bj.intel.com…
Below changes aim to test the fd based approach for guest private memory
in context of normal (non-confidential) VMs executing on non-confidential
platforms.
private_mem_test.c file adds selftest to access private memory from the
guest via private/shared accesses and checking if the contents can be
leaked to/accessed by vmm via shared memory view before/after conversions.
Updates in V1 (Compared to RFC v3 patches):
1) Incorporated suggestions from Sean around simplifying KVM changes
2) Addressed comments from Sean
3) Added private mem test with shared memory backed by 2MB hugepages.
RFC v3 series:
https://lore.kernel.org/lkml/20220819174659.2427983-1-vannapurve@google.com…
This series has dependency on following patches:
1) V9 series patches from Chao mentioned above.
Github link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/priv_memfd_selftests-v1
Vishal Annapurve (6):
KVM: x86: Add support for testing private memory
KVM: Selftests: Add support for private memory
KVM: selftests: x86: Add IS_ALIGNED/IS_PAGE_ALIGNED helpers
KVM: selftests: x86: Execute VMs with private memory
KVM: selftests: Add get_free_huge_2m_pages
KVM: selftests: x86: Add selftest for private memory
arch/x86/kvm/mmu/mmu.c | 4 +
arch/x86/kvm/mmu/mmu_internal.h | 4 +-
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 2 +
.../selftests/kvm/include/kvm_util_base.h | 15 +-
.../testing/selftests/kvm/include/test_util.h | 5 +
.../kvm/include/x86_64/private_mem.h | 37 +++
.../selftests/kvm/include/x86_64/processor.h | 1 +
tools/testing/selftests/kvm/lib/kvm_util.c | 58 ++++-
tools/testing/selftests/kvm/lib/test_util.c | 30 +++
.../selftests/kvm/lib/x86_64/private_mem.c | 211 ++++++++++++++++++
.../selftests/kvm/x86_64/private_mem_test.c | 190 ++++++++++++++++
virt/kvm/Kconfig | 4 +
virt/kvm/kvm_main.c | 2 +-
14 files changed, 555 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/private_mem.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/private_mem.c
create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_test.c
--
2.38.1.431.g37b22c650d-goog
When the async test case was introduced, despite being a completely
independent test case, the command to run it was added to the same shell
script as the smoke test case. Since a shell script implicitly returns
the error code from the last run command, this effectively caused the
script to only return as error code the result from the async test case,
hiding the smoke test result (which could then only be seen from the
python unittest logs).
Move the async test case call to its own shell script runner to avoid
the aforementioned issue. This also makes the output clearer to read,
since each kselftest KTAP result now matches with one python unittest
report.
While at it, also make it so the async test case is skipped if
/dev/tpmrm0 doesn't exist, since commit 8335adb8f9d3 ("selftests: tpm:
add async space test with noneexisting handle") added a test that relies
on it.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
tools/testing/selftests/tpm2/Makefile | 2 +-
tools/testing/selftests/tpm2/test_async.sh | 10 ++++++++++
tools/testing/selftests/tpm2/test_smoke.sh | 1 -
3 files changed, 11 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/tpm2/test_async.sh
diff --git a/tools/testing/selftests/tpm2/Makefile b/tools/testing/selftests/tpm2/Makefile
index 1a5db1eb8ed5..a9bf9459fb25 100644
--- a/tools/testing/selftests/tpm2/Makefile
+++ b/tools/testing/selftests/tpm2/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
include ../lib.mk
-TEST_PROGS := test_smoke.sh test_space.sh
+TEST_PROGS := test_smoke.sh test_space.sh test_async.sh
TEST_PROGS_EXTENDED := tpm2.py tpm2_tests.py
diff --git a/tools/testing/selftests/tpm2/test_async.sh b/tools/testing/selftests/tpm2/test_async.sh
new file mode 100755
index 000000000000..43bf5bd772fd
--- /dev/null
+++ b/tools/testing/selftests/tpm2/test_async.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+[ -e /dev/tpm0 ] || exit $ksft_skip
+[ -e /dev/tpmrm0 ] || exit $ksft_skip
+
+python3 -m unittest -v tpm2_tests.AsyncTest
diff --git a/tools/testing/selftests/tpm2/test_smoke.sh b/tools/testing/selftests/tpm2/test_smoke.sh
index 3e5ff29ee1dd..58af963e5b55 100755
--- a/tools/testing/selftests/tpm2/test_smoke.sh
+++ b/tools/testing/selftests/tpm2/test_smoke.sh
@@ -7,4 +7,3 @@ ksft_skip=4
[ -e /dev/tpm0 ] || exit $ksft_skip
python3 -m unittest -v tpm2_tests.SmokeTest
-python3 -m unittest -v tpm2_tests.AsyncTest
--
2.38.1
hi,
fw_fallback.sh test failed.
The error may caused by failed to write /sys/devices/virtual/misc/test_firmware/trigger_request.
diff firmware/fw_fallback.sh_org firmware/fw_fallback.sh
165c164,165
< echo -n "nope-$NAME" >"$DIR"/trigger_request 2>/dev/null &
---
> echo "echo -n \"nope-$NAME\" >\"$DIR\"/trigger_request &"
> echo -n "nope-$NAME" >"$DIR"/trigger_request &
# Test request_partial_firmware_into_buf() off=1 size=6 nofile: OK
# Test request_partial_firmware_into_buf() off=2 size=10 nofile: OK
# echo -n "nope-test-firmware.bin" >"/sys/devices/virtual/misc/test_firmware"/trigger_request &
# ./fw_fallback.sh: line 165: echo: write error: No such file or directory
# ./fw_fallback.sh: fallback mechanism immediately cancelled
$ echo -n "nope-test-firmware.bin" >/sys/devices/virtual/misc/test_firmware/trigger_request
-bash: echo: write error: No such file or directory
test OS: Debian 11
test kernel: v6.1-rc6
test output:
# ./fw_fallback.sh: fallback mechanism immediately cancelled
#
# The file never appeared: /sys/devices/virtual/misc/test_firmware/nope-test-firmware.bin/loading
#
# This might be a distribution udev rule setup by your distribution
# to immediately cancel all fallback requests, this must be
# removed before running these tests. To confirm look for
# a firmware rule like /lib/udev/rules.d/50-firmware.rules
# and see if you have something like this:
#
# SUBSYSTEM=="firmware", ACTION=="add", ATTR{loading}="-1"
#
# If you do remove this file or comment out this line before
# proceeding with these tests.
not ok 1 selftests: firmware: fw_run_tests.sh # exit=1
There is not /lib/udev/rules.d/50-firmware.rules in Debian 11.
How can fw_run_tests.sh run successfully in Debian 11?
best regards,