From: Ashish Kalra <ashish.kalra(a)amd.com>
During platform init, SNP initialization may fail for several reasons,
such as firmware command failures and incompatible versions. However,
the KVM capability may continue to advertise support for it.
The platform may have SNP enabled but if SNP_INIT fails then SNP is
not supported by KVM.
During KVM module initialization query the SNP platform status to obtain
the SNP initialization state and use it as an additional condition to
determine support for SEV-SNP.
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Co-developed-by: Pratik R. Sampat <prsampat(a)amd.com>
Signed-off-by: Pratik R. Sampat <prsampat(a)amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
---
arch/x86/kvm/svm/sev.c | 43 +++++++++++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ada53f04158c..a6abdb26f877 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2934,6 +2934,32 @@ void __init sev_set_cpu_caps(void)
}
}
+static bool sev_is_snp_initialized(void)
+{
+ struct sev_user_data_snp_status *status;
+ struct sev_data_snp_addr buf;
+ bool initialized = false;
+ void *data;
+ int error;
+
+ data = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!data)
+ return initialized;
+
+ buf.address = __psp_pa(data);
+ if (sev_do_cmd(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &error))
+ goto out;
+
+ status = (struct sev_user_data_snp_status *)data;
+ if (status->state)
+ initialized = true;
+
+out:
+ snp_free_firmware_page(data);
+
+ return initialized;
+}
+
void __init sev_hardware_setup(void)
{
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
@@ -3038,6 +3064,14 @@ void __init sev_hardware_setup(void)
sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);
out:
+ if (sev_enabled) {
+ init_args.probe = true;
+ if (sev_platform_init(&init_args))
+ sev_supported = sev_es_supported = sev_snp_supported = false;
+ else
+ sev_snp_supported &= sev_is_snp_initialized();
+ }
+
if (boot_cpu_has(X86_FEATURE_SEV))
pr_info("SEV %s (ASIDs %u - %u)\n",
sev_supported ? min_sev_asid <= max_sev_asid ? "enabled" :
@@ -3064,15 +3098,6 @@ void __init sev_hardware_setup(void)
sev_supported_vmsa_features = 0;
if (sev_es_debug_swap_enabled)
sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
-
- if (!sev_enabled)
- return;
-
- /*
- * Do both SNP and SEV initialization at KVM module load.
- */
- init_args.probe = true;
- sev_platform_init(&init_args);
}
void sev_hardware_unsetup(void)
--
2.34.1
Poll program is a helper to ftracetest, thus make it a
generic file and remove it from being run as a test.
Currently when executing tests using
$ make run_tests
CC poll
TAP version 13
1..2
# timeout set to 0
# selftests: ftrace: poll
# Error: Polling file is not specified
not ok 1 selftests: ftrace: poll # exit=255
Fix this by using TEST_GEN_FILES to build the 'poll' binary as a helper
rather than as a test.
Fixes: 80c3e28528ff ("selftests/tracing: Add hist poll() support test")
Signed-off-by: Ayush Jain <Ayush.jain3(a)amd.com>
---
tools/testing/selftests/ftrace/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index 49d96bb16355..7c12263f8260 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -6,6 +6,6 @@ TEST_PROGS := ftracetest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
-TEST_GEN_PROGS = poll
+TEST_GEN_FILES := poll
include ../lib.mk
--
2.34.1
The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (a VM mostly)
in a way of HW acceleration. This extended the HWPT-based design for more
advanced virtualization feature.
A vQUEUE introduced by this series as a part of the vIOMMU infrastructure
represents a HW accelerated queue/buffer for VM to use exclusively, e.g.
- NVIDIA's Virtual Command Queue
- AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
each of which is an IOMMU HW feature to directly access the virtual queue
in the guest address space, to avoid VM Exits to improve the performance.
As an initial use case, it adds support for guest-owned HW virtual queues
that VMM can allocate per request from a guest OS writing the VM register.
Introduce IOMMUFD_OBJ_VQUEUE and its allocator IOMMUFD_CMD_VQUEUE_ALLOC,
allowing VMM to forward the IOMMU-specific queue info, such as queue base
address, size, and etc.
Meanwhile, a guest-owned virtual queue needs the kernel (a virtual queue
driver) to control the queue by reading/writing its consumer and producer
indexes, which means the virtual queue HW allows the guest kernel to get
a direct R/W access to those registers. Introduce an mmap infrastructure
to the iommufd core so as to support pass through a piece of MMIO region
from the host physical address space to the guest physical address space.
The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during
the IOMMUFD_CMD_VQUEUE_ALLOC and returned to the user space as an output
driver-data carried via the IOMMUFD_CMD_VQUEUE_ALLOC. So, this requires a
driver-specific user data support in the vIOMMU allocation flow.
As a real-world use case, this series implements a vQUEUE support to the
tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it
is also the Tegra CMDQV series Part-2 (user-space support), reworked from
Previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to
the standard SMMUv3 operating in the nested translation mode trapping CMDQ
for TLBI and ATC_INV commands, this gives a huge performance improvement:
70% to 90% reductions of invalidation time were measured by various DMA
unmap tests running in a guest OS.
// Unmap latencies from "dma_map_benchmark -g @granule -t @threads",
// by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq"
@granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0
4KB 1 35.7 us 5.3 us
16KB 1 41.8 us 6.8 us
64KB 1 68.9 us 9.9 us
128KB 1 109.0 us 12.6 us
256KB 1 187.1 us 18.0 us
4KB 2 96.9 us 6.8 us
16KB 2 97.8 us 7.5 us
64KB 2 151.5 us 10.7 us
128KB 2 257.8 us 12.7 us
256KB 2 443.0 us 17.9 us
This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_vqueue-v3
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vqueue-v3
Changelog
v3
* Add Reviewed-by from Baolu, Pranjal, and Alok
* Revise kdocs, uAPI docs, and commit logs
* Rename "vCMDQ" back to "vQUEUE" for AMD cases
* [tegra] Add tegra241_vcmdq_hw_flush_timeout()
* [tegra] Rename vsmmu_alloc to alloc_vintf_user
* [tegra] Use writel for SID replacement registers
* [tegra] Move mmap removal call to vsmmu_destroy op
* [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user()
* [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED()
* [iommufd] Add an object-type "owner" to immap structure
* [iommufd] Drop the ictx input in the new for-driver APIs
* [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle
* [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers
* [iommufd] Rename iommufd_ctx_alloc/free_mmap to
_iommufd_alloc/destroy_mmap
v2
https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/
* Add Reviewed-by from Jason
* [smmu] Fix vsmmu initial value
* [smmu] Support impl for hw_info
* [tegra] Rename "slot" to "vsid"
* [tegra] Update kdocs and commit logs
* [tegra] Map/unmap LVCMDQ dynamically
* [tegra] Refcount the previous LVCMDQ
* [tegra] Return -EEXIST if LVCMDQ exists
* [tegra] Simplify VINTF cleanup routine
* [tegra] Use vmid and s2_domain in vsmmu
* [tegra] Rename "mmap_pgoff" to "immap_id"
* [tegra] Add more addr and length validation
* [iommufd] Add more narrative to mmap's kdoc
* [iommufd] Add iommufd_struct_depend/undepend()
* [iommufd] Rename vcmdq_free op to vcmdq_destroy
* [iommufd] Fix bug in iommu_copy_struct_to_user()
* [iommufd] Drop is_io from iommufd_ctx_alloc_mmap()
* [iommufd] Test the queue memory for its contiguity
* [iommufd] Return -ENXIO if address or length fails
* [iommufd] Do not change @min_last in mock_viommu_alloc()
* [iommufd] Generalize TEGRA241_VCMDQ data in core structure
* [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC
* [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping
v1
https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/
Thanks
Nicolin
Nicolin Chen (23):
iommufd/viommu: Add driver-allocated vDEVICE support
iommu: Pass in a driver-level user data structure to viommu_alloc op
iommufd/viommu: Allow driver-specific user data for a vIOMMU object
iommu: Add iommu_copy_struct_to_user helper
iommufd/driver: Let iommufd_viommu_alloc helper save ictx to
viommu->ictx
iommufd/driver: Add iommufd_struct_destroy to revert
iommufd_viommu_alloc
iommufd/selftest: Support user_data in mock_viommu_alloc
iommufd/selftest: Add covearge for viommu data
iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers
iommufd/viommu: Introduce IOMMUFD_OBJ_VQUEUE and its related struct
iommufd/viommu: Add IOMMUFD_CMD_VQUEUE_ALLOC ioctl
iommufd/driver: Add iommufd_vqueue_depend/undepend() helpers
iommufd/selftest: Add coverage for IOMMUFD_CMD_VQUEUE_ALLOC
iommufd: Add mmap interface
iommufd/selftest: Add coverage for the new mmap interface
Documentation: userspace-api: iommufd: Update vQUEUE
iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op
iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/tegra241-cmdqv: Simplify deinit flow in
tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 25 +-
drivers/iommu/iommufd/io_pagetable.h | 8 +
drivers/iommu/iommufd/iommufd_private.h | 28 +-
drivers/iommu/iommufd/iommufd_test.h | 20 +
include/linux/iommu.h | 43 +-
include/linux/iommufd.h | 184 ++++++-
include/uapi/linux/iommufd.h | 117 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 52 +-
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 42 +-
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 481 +++++++++++++++++-
drivers/iommu/iommufd/device.c | 117 +----
drivers/iommu/iommufd/driver.c | 88 ++++
drivers/iommu/iommufd/io_pagetable.c | 95 ++++
drivers/iommu/iommufd/main.c | 84 ++-
drivers/iommu/iommufd/selftest.c | 126 ++++-
drivers/iommu/iommufd/viommu.c | 116 ++++-
tools/testing/selftests/iommu/iommufd.c | 96 +++-
.../selftests/iommu/iommufd_fail_nth.c | 11 +-
Documentation/userspace-api/iommufd.rst | 15 +
19 files changed, 1555 insertions(+), 193 deletions(-)
--
2.43.0
I'd like to cut down the memory usage of parsing vmlinux BTF in ebpf-go.
With some upcoming changes the library is sitting at 5MiB for a parse.
Most of that memory is simply copying the BTF blob into user space.
By allowing vmlinux BTF to be mmapped read-only into user space I can
cut memory usage by about 75%.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Changes in v3:
- Remove slightly confusing calculation of trailing (Alexei)
- Use vm_insert_page (Alexei)
- Simplified libbpf code
- Link to v2: https://lore.kernel.org/r/20250502-vmlinux-mmap-v2-0-95c271434519@isovalent…
Changes in v2:
- Use btf__new in selftest
- Avoid vm_iomap_memory in btf_vmlinux_mmap
- Add VM_DONTDUMP
- Add support to libbpf
- Link to v1: https://lore.kernel.org/r/20250501-vmlinux-mmap-v1-0-aa2724572598@isovalent…
---
Lorenz Bauer (3):
btf: allow mmap of vmlinux btf
selftests: bpf: add a test for mmapable vmlinux BTF
libbpf: Use mmap to parse vmlinux BTF from sysfs
include/asm-generic/vmlinux.lds.h | 3 +-
kernel/bpf/sysfs_btf.c | 37 ++++++++++
tools/lib/bpf/btf.c | 83 +++++++++++++++++++---
tools/testing/selftests/bpf/prog_tests/btf_sysfs.c | 83 ++++++++++++++++++++++
4 files changed, 194 insertions(+), 12 deletions(-)
---
base-commit: 38d976c32d85ef12dcd2b8a231196f7049548477
change-id: 20250501-vmlinux-mmap-2ec5563c3ef1
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
Revert fbbf93556f0c ("selftests: nic_performance: Add selftest for performance of NIC driver")
Revert c087dc54394b ("selftests: nic_link_layer: Add selftest case for speed and duplex states")
Revert 6116075e18f7 ("selftests: nic_link_layer: Add link layer selftest for NIC driver")
These tests don't clean up after themselves, don't use the disruptive
annotations, don't get included in make install etc. etc. The tests
were added before we have any "HW" runner, so the issues were missed.
Our CI doesn't have any way of excluding broken tests, remove these
for now to stop the random pollution of results due to broken env.
We can always add them back once / if fixed.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: willemb(a)google.com
CC: sdf(a)fomichev.me
CC: mohan.prasad(a)microchip.com
CC: dw(a)davidwei.uk
CC: petrm(a)nvidia.com
CC: linux-kselftest(a)vger.kernel.org
---
.../testing/selftests/drivers/net/hw/Makefile | 2 -
.../drivers/net/hw/lib/py/__init__.py | 1 -
.../drivers/net/hw/lib/py/linkconfig.py | 222 ------------------
.../drivers/net/hw/nic_link_layer.py | 113 ---------
.../drivers/net/hw/nic_performance.py | 137 -----------
.../selftests/drivers/net/lib/py/load.py | 20 +-
6 files changed, 1 insertion(+), 494 deletions(-)
delete mode 100644 tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py
delete mode 100644 tools/testing/selftests/drivers/net/hw/nic_link_layer.py
delete mode 100644 tools/testing/selftests/drivers/net/hw/nic_performance.py
diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index 5447785c286e..df2c047ffa90 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -15,8 +15,6 @@ TEST_PROGS = \
iou-zcrx.py \
irq.py \
loopback.sh \
- nic_link_layer.py \
- nic_performance.py \
pp_alloc_fail.py \
rss_ctx.py \
rss_input_xfrm.py \
diff --git a/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py b/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
index 399789a9676a..b582885786f5 100644
--- a/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
+++ b/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
@@ -9,7 +9,6 @@ KSFT_DIR = (Path(__file__).parent / "../../../../..").resolve()
sys.path.append(KSFT_DIR.as_posix())
from net.lib.py import *
from drivers.net.lib.py import *
- from .linkconfig import LinkConfig
except ModuleNotFoundError as e:
ksft_pr("Failed importing `net` library from kernel sources")
ksft_pr(str(e))
diff --git a/tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py b/tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py
deleted file mode 100644
index 79fde603cbbc..000000000000
--- a/tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py
+++ /dev/null
@@ -1,222 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-
-from lib.py import cmd, ethtool, ip
-from lib.py import ksft_pr, ksft_eq, KsftSkipEx
-from typing import Optional
-import re
-import time
-import json
-
-#The LinkConfig class is implemented to handle the link layer configurations.
-#Required minimum ethtool version is 6.10
-
-class LinkConfig:
- """Class for handling the link layer configurations"""
- def __init__(self, cfg: object) -> None:
- self.cfg = cfg
- self.partner_netif = self.get_partner_netif_name()
-
- """Get the initial link configuration of local interface"""
- self.common_link_modes = self.get_common_link_modes()
-
- def get_partner_netif_name(self) -> Optional[str]:
- partner_netif = None
- try:
- if not self.verify_link_up():
- return None
- """Get partner interface name"""
- partner_json_output = ip("addr show", json=True, host=self.cfg.remote)
- for interface in partner_json_output:
- for addr in interface.get('addr_info', []):
- if addr.get('local') == self.cfg.remote_addr:
- partner_netif = interface['ifname']
- ksft_pr(f"Partner Interface name: {partner_netif}")
- if partner_netif is None:
- ksft_pr("Unable to get the partner interface name")
- except Exception as e:
- print(f"Unexpected error occurred while getting partner interface name: {e}")
- self.partner_netif = partner_netif
- return partner_netif
-
- def verify_link_up(self) -> bool:
- """Verify whether the local interface link is up"""
- with open(f"/sys/class/net/{self.cfg.ifname}/operstate", "r") as fp:
- link_state = fp.read().strip()
-
- if link_state == "down":
- ksft_pr(f"Link state of interface {self.cfg.ifname} is DOWN")
- return False
- else:
- return True
-
- def reset_interface(self, local: bool = True, remote: bool = True) -> bool:
- ksft_pr("Resetting interfaces in local and remote")
- if remote:
- if self.verify_link_up():
- if self.partner_netif is not None:
- ifname = self.partner_netif
- link_up_cmd = f"ip link set up {ifname}"
- link_down_cmd = f"ip link set down {ifname}"
- reset_cmd = f"{link_down_cmd} && sleep 5 && {link_up_cmd}"
- try:
- cmd(reset_cmd, host=self.cfg.remote)
- except Exception as e:
- ksft_pr(f"Unexpected error occurred while resetting remote: {e}")
- else:
- ksft_pr("Partner interface not available")
- if local:
- ifname = self.cfg.ifname
- link_up_cmd = f"ip link set up {ifname}"
- link_down_cmd = f"ip link set down {ifname}"
- reset_cmd = f"{link_down_cmd} && sleep 5 && {link_up_cmd}"
- try:
- cmd(reset_cmd)
- except Exception as e:
- ksft_pr(f"Unexpected error occurred while resetting local: {e}")
- time.sleep(10)
- if self.verify_link_up() and self.get_ethtool_field("link-detected"):
- ksft_pr("Local and remote interfaces reset to original state")
- return True
- else:
- ksft_pr("Error occurred after resetting interfaces. Link is DOWN.")
- return False
-
- def set_speed_and_duplex(self, speed: str, duplex: str, autoneg: bool = True) -> bool:
- """Set the speed and duplex state for the interface"""
- autoneg_state = "on" if autoneg is True else "off"
- process = None
- try:
- process = ethtool(f"--change {self.cfg.ifname} speed {speed} duplex {duplex} autoneg {autoneg_state}")
- except Exception as e:
- ksft_pr(f"Unexpected error occurred while setting speed/duplex: {e}")
- if process is None or process.ret != 0:
- return False
- else:
- ksft_pr(f"Speed: {speed} Mbps, Duplex: {duplex} set for Interface: {self.cfg.ifname}")
- return True
-
- def verify_speed_and_duplex(self, expected_speed: str, expected_duplex: str) -> bool:
- if not self.verify_link_up():
- return False
- """Verifying the speed and duplex state for the interface"""
- with open(f"/sys/class/net/{self.cfg.ifname}/speed", "r") as fp:
- actual_speed = fp.read().strip()
- with open(f"/sys/class/net/{self.cfg.ifname}/duplex", "r") as fp:
- actual_duplex = fp.read().strip()
-
- ksft_eq(actual_speed, expected_speed)
- ksft_eq(actual_duplex, expected_duplex)
- return True
-
- def set_autonegotiation_state(self, state: str, remote: bool = False) -> bool:
- common_link_modes = self.common_link_modes
- speeds, duplex_modes = self.get_speed_duplex_values(self.common_link_modes)
- speed = speeds[0]
- duplex = duplex_modes[0]
- if not speed or not duplex:
- ksft_pr("No speed or duplex modes found")
- return False
-
- speed_duplex_cmd = f"speed {speed} duplex {duplex}" if state == "off" else ""
- if remote:
- if not self.verify_link_up():
- return False
- """Set the autonegotiation state for the partner"""
- command = f"-s {self.partner_netif} {speed_duplex_cmd} autoneg {state}"
- partner_autoneg_change = None
- """Set autonegotiation state for interface in remote pc"""
- try:
- partner_autoneg_change = ethtool(command, host=self.cfg.remote)
- except Exception as e:
- ksft_pr(f"Unexpected error occurred while changing auto-neg in remote: {e}")
- if partner_autoneg_change is None or partner_autoneg_change.ret != 0:
- ksft_pr(f"Not able to set autoneg parameter for interface {self.partner_netif}.")
- return False
- ksft_pr(f"Autoneg set as {state} for {self.partner_netif}")
- else:
- """Set the autonegotiation state for the interface"""
- try:
- process = ethtool(f"-s {self.cfg.ifname} {speed_duplex_cmd} autoneg {state}")
- if process.ret != 0:
- ksft_pr(f"Not able to set autoneg parameter for interface {self.cfg.ifname}")
- return False
- except Exception as e:
- ksft_pr(f"Unexpected error occurred while changing auto-neg in local: {e}")
- return False
- ksft_pr(f"Autoneg set as {state} for {self.cfg.ifname}")
- return True
-
- def check_autoneg_supported(self, remote: bool = False) -> bool:
- if not remote:
- local_autoneg = self.get_ethtool_field("supports-auto-negotiation")
- if local_autoneg is None:
- ksft_pr(f"Unable to fetch auto-negotiation status for interface {self.cfg.ifname}")
- """Return autoneg status of the local interface"""
- return local_autoneg
- else:
- if not self.verify_link_up():
- raise KsftSkipEx("Link is DOWN")
- """Check remote auto-negotiation support status"""
- partner_autoneg = False
- if self.partner_netif is not None:
- partner_autoneg = self.get_ethtool_field("supports-auto-negotiation", remote=True)
- if partner_autoneg is None:
- ksft_pr(f"Unable to fetch auto-negotiation status for interface {self.partner_netif}")
- return partner_autoneg
-
- def get_common_link_modes(self) -> set[str]:
- common_link_modes = []
- """Populate common link modes"""
- link_modes = self.get_ethtool_field("supported-link-modes")
- partner_link_modes = self.get_ethtool_field("link-partner-advertised-link-modes")
- if link_modes is None:
- raise KsftSkipEx(f"Link modes not available for {self.cfg.ifname}")
- if partner_link_modes is None:
- raise KsftSkipEx(f"Partner link modes not available for {self.cfg.ifname}")
- common_link_modes = set(link_modes) and set(partner_link_modes)
- return common_link_modes
-
- def get_speed_duplex_values(self, link_modes: list[str]) -> tuple[list[str], list[str]]:
- speed = []
- duplex = []
- """Check the link modes"""
- for data in link_modes:
- parts = data.split('/')
- speed_value = re.match(r'\d+', parts[0])
- if speed_value:
- speed.append(speed_value.group())
- else:
- ksft_pr(f"No speed value found for interface {self.ifname}")
- return None, None
- duplex.append(parts[1].lower())
- return speed, duplex
-
- def get_ethtool_field(self, field: str, remote: bool = False) -> Optional[str]:
- process = None
- if not remote:
- """Get the ethtool field value for the local interface"""
- try:
- process = ethtool(self.cfg.ifname, json=True)
- except Exception as e:
- ksft_pr("Required minimum ethtool version is 6.10")
- ksft_pr(f"Unexpected error occurred while getting ethtool field in local: {e}")
- return None
- else:
- if not self.verify_link_up():
- return None
- """Get the ethtool field value for the remote interface"""
- self.cfg.require_cmd("ethtool", remote=True)
- if self.partner_netif is None:
- ksft_pr(f"Partner interface name is unavailable.")
- return None
- try:
- process = ethtool(self.partner_netif, json=True, host=self.cfg.remote)
- except Exception as e:
- ksft_pr("Required minimum ethtool version is 6.10")
- ksft_pr(f"Unexpected error occurred while getting ethtool field in remote: {e}")
- return None
- json_data = process[0]
- """Check if the field exist in the json data"""
- if field not in json_data:
- raise KsftSkipEx(f'Field {field} does not exist in the output of interface {json_data["ifname"]}')
- return json_data[field]
diff --git a/tools/testing/selftests/drivers/net/hw/nic_link_layer.py b/tools/testing/selftests/drivers/net/hw/nic_link_layer.py
deleted file mode 100644
index efd921180532..000000000000
--- a/tools/testing/selftests/drivers/net/hw/nic_link_layer.py
+++ /dev/null
@@ -1,113 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: GPL-2.0
-
-#Introduction:
-#This file has basic link layer tests for generic NIC drivers.
-#The test comprises of auto-negotiation, speed and duplex checks.
-#
-#Setup:
-#Connect the DUT PC with NIC card to partner pc back via ethernet medium of your choice(RJ45, T1)
-#
-# DUT PC Partner PC
-#┌───────────────────────┐ ┌──────────────────────────┐
-#│ │ │ │
-#│ │ │ │
-#│ ┌───────────┐ │ │
-#│ │DUT NIC │ Eth │ │
-#│ │Interface ─┼─────────────────────────┼─ any eth Interface │
-#│ └───────────┘ │ │
-#│ │ │ │
-#│ │ │ │
-#└───────────────────────┘ └──────────────────────────┘
-#
-#Configurations:
-#Required minimum ethtool version is 6.10 (supports json)
-#Default values:
-#time_delay = 8 #time taken to wait for transitions to happen, in seconds.
-
-import time
-import argparse
-from lib.py import ksft_run, ksft_exit, ksft_pr, ksft_eq
-from lib.py import KsftFailEx, KsftSkipEx
-from lib.py import NetDrvEpEnv
-from lib.py import LinkConfig
-
-def _pre_test_checks(cfg: object, link_config: LinkConfig) -> None:
- if link_config.partner_netif is None:
- KsftSkipEx("Partner interface is not available")
- if not link_config.check_autoneg_supported() or not link_config.check_autoneg_supported(remote=True):
- KsftSkipEx(f"Auto-negotiation not supported for interface {cfg.ifname} or {link_config.partner_netif}")
- if not link_config.verify_link_up():
- raise KsftSkipEx(f"Link state of interface {cfg.ifname} is DOWN")
-
-def verify_autonegotiation(cfg: object, expected_state: str, link_config: LinkConfig) -> None:
- if not link_config.verify_link_up():
- raise KsftSkipEx(f"Link state of interface {cfg.ifname} is DOWN")
- """Verifying the autonegotiation state in partner"""
- partner_autoneg_output = link_config.get_ethtool_field("auto-negotiation", remote=True)
- if partner_autoneg_output is None:
- KsftSkipEx(f"Auto-negotiation state not available for interface {link_config.partner_netif}")
- partner_autoneg_state = "on" if partner_autoneg_output is True else "off"
-
- ksft_eq(partner_autoneg_state, expected_state)
-
- """Verifying the autonegotiation state of local"""
- autoneg_output = link_config.get_ethtool_field("auto-negotiation")
- if autoneg_output is None:
- KsftSkipEx(f"Auto-negotiation state not available for interface {cfg.ifname}")
- actual_state = "on" if autoneg_output is True else "off"
-
- ksft_eq(actual_state, expected_state)
-
- """Verifying the link establishment"""
- link_available = link_config.get_ethtool_field("link-detected")
- if link_available is None:
- KsftSkipEx(f"Link status not available for interface {cfg.ifname}")
- if link_available != True:
- raise KsftSkipEx("Link not established at interface {cfg.ifname} after changing auto-negotiation")
-
-def test_autonegotiation(cfg: object, link_config: LinkConfig, time_delay: int) -> None:
- _pre_test_checks(cfg, link_config)
- for state in ["off", "on"]:
- if not link_config.set_autonegotiation_state(state, remote=True):
- raise KsftSkipEx(f"Unable to set auto-negotiation state for interface {link_config.partner_netif}")
- if not link_config.set_autonegotiation_state(state):
- raise KsftSkipEx(f"Unable to set auto-negotiation state for interface {cfg.ifname}")
- time.sleep(time_delay)
- verify_autonegotiation(cfg, state, link_config)
-
-def test_network_speed(cfg: object, link_config: LinkConfig, time_delay: int) -> None:
- _pre_test_checks(cfg, link_config)
- common_link_modes = link_config.common_link_modes
- if not common_link_modes:
- KsftSkipEx("No common link modes exist")
- speeds, duplex_modes = link_config.get_speed_duplex_values(common_link_modes)
-
- if speeds and duplex_modes and len(speeds) == len(duplex_modes):
- for idx in range(len(speeds)):
- speed = speeds[idx]
- duplex = duplex_modes[idx]
- if not link_config.set_speed_and_duplex(speed, duplex):
- raise KsftFailEx(f"Unable to set speed and duplex parameters for {cfg.ifname}")
- time.sleep(time_delay)
- if not link_config.verify_speed_and_duplex(speed, duplex):
- raise KsftSkipEx(f"Error occurred while verifying speed and duplex states for interface {cfg.ifname}")
- else:
- if not speeds or not duplex_modes:
- KsftSkipEx(f"No supported speeds or duplex modes found for interface {cfg.ifname}")
- else:
- KsftSkipEx("Mismatch in the number of speeds and duplex modes")
-
-def main() -> None:
- parser = argparse.ArgumentParser(description="Run basic link layer tests for NIC driver")
- parser.add_argument('--time-delay', type=int, default=8, help='Time taken to wait for transitions to happen(in seconds). Default is 8 seconds.')
- args = parser.parse_args()
- time_delay = args.time_delay
- with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
- link_config = LinkConfig(cfg)
- ksft_run(globs=globals(), case_pfx={"test_"}, args=(cfg, link_config, time_delay,))
- link_config.reset_interface()
- ksft_exit()
-
-if __name__ == "__main__":
- main()
diff --git a/tools/testing/selftests/drivers/net/hw/nic_performance.py b/tools/testing/selftests/drivers/net/hw/nic_performance.py
deleted file mode 100644
index 201403b76ea3..000000000000
--- a/tools/testing/selftests/drivers/net/hw/nic_performance.py
+++ /dev/null
@@ -1,137 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: GPL-2.0
-
-#Introduction:
-#This file has basic performance test for generic NIC drivers.
-#The test comprises of throughput check for TCP and UDP streams.
-#
-#Setup:
-#Connect the DUT PC with NIC card to partner pc back via ethernet medium of your choice(RJ45, T1)
-#
-# DUT PC Partner PC
-#┌───────────────────────┐ ┌──────────────────────────┐
-#│ │ │ │
-#│ │ │ │
-#│ ┌───────────┐ │ │
-#│ │DUT NIC │ Eth │ │
-#│ │Interface ─┼─────────────────────────┼─ any eth Interface │
-#│ └───────────┘ │ │
-#│ │ │ │
-#│ │ │ │
-#└───────────────────────┘ └──────────────────────────┘
-#
-#Configurations:
-#To prevent interruptions, Add ethtool, ip to the sudoers list in remote PC and get the ssh key from remote.
-#Required minimum ethtool version is 6.10
-#Change the below configuration based on your hw needs.
-# """Default values"""
-#time_delay = 8 #time taken to wait for transitions to happen, in seconds.
-#test_duration = 10 #performance test duration for the throughput check, in seconds.
-#send_throughput_threshold = 80 #percentage of send throughput required to pass the check
-#receive_throughput_threshold = 50 #percentage of receive throughput required to pass the check
-
-import time
-import json
-import argparse
-from lib.py import ksft_run, ksft_exit, ksft_pr, ksft_true
-from lib.py import KsftFailEx, KsftSkipEx, GenerateTraffic
-from lib.py import NetDrvEpEnv, bkg, wait_port_listen
-from lib.py import cmd
-from lib.py import LinkConfig
-
-class TestConfig:
- def __init__(self, time_delay: int, test_duration: int, send_throughput_threshold: int, receive_throughput_threshold: int) -> None:
- self.time_delay = time_delay
- self.test_duration = test_duration
- self.send_throughput_threshold = send_throughput_threshold
- self.receive_throughput_threshold = receive_throughput_threshold
-
-def _pre_test_checks(cfg: object, link_config: LinkConfig) -> None:
- if not link_config.verify_link_up():
- KsftSkipEx(f"Link state of interface {cfg.ifname} is DOWN")
- common_link_modes = link_config.common_link_modes
- if common_link_modes is None:
- KsftSkipEx("No common link modes found")
- if link_config.partner_netif == None:
- KsftSkipEx("Partner interface is not available")
- if link_config.check_autoneg_supported():
- KsftSkipEx("Auto-negotiation not supported by local")
- if link_config.check_autoneg_supported(remote=True):
- KsftSkipEx("Auto-negotiation not supported by remote")
- cfg.require_cmd("iperf3", remote=True)
-
-def check_throughput(cfg: object, link_config: LinkConfig, test_config: TestConfig, protocol: str, traffic: GenerateTraffic) -> None:
- common_link_modes = link_config.common_link_modes
- speeds, duplex_modes = link_config.get_speed_duplex_values(common_link_modes)
- """Test duration in seconds"""
- duration = test_config.test_duration
-
- ksft_pr(f"{protocol} test")
- test_type = "-u" if protocol == "UDP" else ""
-
- send_throughput = []
- receive_throughput = []
- for idx in range(0, len(speeds)):
- if link_config.set_speed_and_duplex(speeds[idx], duplex_modes[idx]) == False:
- raise KsftFailEx(f"Not able to set speed and duplex parameters for {cfg.ifname}")
- time.sleep(test_config.time_delay)
- if not link_config.verify_link_up():
- raise KsftSkipEx(f"Link state of interface {cfg.ifname} is DOWN")
-
- send_command=f"{test_type} -b 0 -t {duration} --json"
- receive_command=f"{test_type} -b 0 -t {duration} --reverse --json"
-
- send_result = traffic.run_remote_test(cfg, command=send_command)
- if send_result.ret != 0:
- raise KsftSkipEx("Error occurred during data transmit: {send_result.stdout}")
-
- send_output = send_result.stdout
- send_data = json.loads(send_output)
-
- """Convert throughput to Mbps"""
- send_throughput.append(round(send_data['end']['sum_sent']['bits_per_second'] / 1e6, 2))
- ksft_pr(f"{protocol}: Send throughput: {send_throughput[idx]} Mbps")
-
- receive_result = traffic.run_remote_test(cfg, command=receive_command)
- if receive_result.ret != 0:
- raise KsftSkipEx("Error occurred during data receive: {receive_result.stdout}")
-
- receive_output = receive_result.stdout
- receive_data = json.loads(receive_output)
-
- """Convert throughput to Mbps"""
- receive_throughput.append(round(receive_data['end']['sum_received']['bits_per_second'] / 1e6, 2))
- ksft_pr(f"{protocol}: Receive throughput: {receive_throughput[idx]} Mbps")
-
- """Check whether throughput is not below the threshold (default values set at start)"""
- for idx in range(0, len(speeds)):
- send_threshold = float(speeds[idx]) * float(test_config.send_throughput_threshold / 100)
- receive_threshold = float(speeds[idx]) * float(test_config.receive_throughput_threshold / 100)
- ksft_true(send_throughput[idx] >= send_threshold, f"{protocol}: Send throughput is below threshold for {speeds[idx]} Mbps in {duplex_modes[idx]} duplex")
- ksft_true(receive_throughput[idx] >= receive_threshold, f"{protocol}: Receive throughput is below threshold for {speeds[idx]} Mbps in {duplex_modes[idx]} duplex")
-
-def test_tcp_throughput(cfg: object, link_config: LinkConfig, test_config: TestConfig, traffic: GenerateTraffic) -> None:
- _pre_test_checks(cfg, link_config)
- check_throughput(cfg, link_config, test_config, 'TCP', traffic)
-
-def test_udp_throughput(cfg: object, link_config: LinkConfig, test_config: TestConfig, traffic: GenerateTraffic) -> None:
- _pre_test_checks(cfg, link_config)
- check_throughput(cfg, link_config, test_config, 'UDP', traffic)
-
-def main() -> None:
- parser = argparse.ArgumentParser(description="Run basic performance test for NIC driver")
- parser.add_argument('--time-delay', type=int, default=8, help='Time taken to wait for transitions to happen(in seconds). Default is 8 seconds.')
- parser.add_argument('--test-duration', type=int, default=10, help='Performance test duration for the throughput check, in seconds. Default is 10 seconds.')
- parser.add_argument('--stt', type=int, default=80, help='Send throughput Threshold: Percentage of send throughput upon actual throughput required to pass the throughput check (in percentage). Default is 80.')
- parser.add_argument('--rtt', type=int, default=50, help='Receive throughput Threshold: Percentage of receive throughput upon actual throughput required to pass the throughput check (in percentage). Default is 50.')
- args=parser.parse_args()
- test_config = TestConfig(args.time_delay, args.test_duration, args.stt, args.rtt)
- with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
- traffic = GenerateTraffic(cfg)
- link_config = LinkConfig(cfg)
- ksft_run(globs=globals(), case_pfx={"test_"}, args=(cfg, link_config, test_config, traffic, ))
- link_config.reset_interface()
- ksft_exit()
-
-if __name__ == "__main__":
- main()
diff --git a/tools/testing/selftests/drivers/net/lib/py/load.py b/tools/testing/selftests/drivers/net/lib/py/load.py
index da5af2c680fa..d9c10613ae67 100644
--- a/tools/testing/selftests/drivers/net/lib/py/load.py
+++ b/tools/testing/selftests/drivers/net/lib/py/load.py
@@ -2,7 +2,7 @@
import time
-from lib.py import ksft_pr, cmd, ip, rand_port, wait_port_listen, bkg
+from lib.py import ksft_pr, cmd, ip, rand_port, wait_port_listen
class GenerateTraffic:
def __init__(self, env, port=None):
@@ -23,24 +23,6 @@ from lib.py import ksft_pr, cmd, ip, rand_port, wait_port_listen, bkg
self.stop(verbose=True)
raise Exception("iperf3 traffic did not ramp up")
- def run_remote_test(self, env: object, port=None, command=None):
- if port is None:
- port = rand_port()
- try:
- server_cmd = f"iperf3 -s 1 -p {port} --one-off"
- with bkg(server_cmd, host=env.remote):
- #iperf3 opens TCP connection as default in server
- #-u to be specified in client command for UDP
- wait_port_listen(port, host=env.remote)
- except Exception as e:
- raise Exception(f"Unexpected error occurred while running server command: {e}")
- try:
- client_cmd = f"iperf3 -c {env.remote_addr} -p {port} {command}"
- proc = cmd(client_cmd)
- return proc
- except Exception as e:
- raise Exception(f"Unexpected error occurred while running client command: {e}")
-
def _wait_pkts(self, pkt_cnt=None, pps=None):
"""
Wait until we've seen pkt_cnt or until traffic ramps up to pps.
--
2.49.0
Greetings,
We're looking to start doing test development for portions of kernel code "standalone" mocked out and would like to do it in userspace. Are there any existing patch sets we could review or help extend to define this concept? We have checked out David Gow's LPC talk [1] from last year that did point out a few patch series that hinted at userspace kunit.
Regards,
Matt
[1] https://lpc.events/event/18/contributions/1790/attachments/1400/3007/LPC202…
The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (a VM mostly)
in a way of HW acceleration. This extended the HWPT-based design for more
advanced virtualization feature.
A vCMDQ introduced by this series as a part of the vIOMMU infrastructure
represents a HW supported queue/buffer for VM to use exclusively, e.g.
- NVIDIA's virtual command queue
- AMD vIOMMU's command buffer
either of which is an IOMMU HW feature to directly load and execute cache
invalidation commands issued by a guest kernel, to shoot down TLB entries
that HW cached for guest-owned stage-1 page table entries. This is a big
improvement since there is no VM Exit during an invalidation, compared to
the traditional invalidation pathway by trapping a guest-own invalidation
queue and forwarding those commands/requests to the host kernel that will
eventually fill a HW-owned queue to execute those commands.
Thus, a vCMDQ object, as an initial use case, is all about a guest-owned
HW command queue that VMM can allocate/configure depending on the request
from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its allocator
IOMMUFD_CMD_VCMDQ_ALLOC allowing VMM to forward the IOMMU-specific queue
info, such as queue base address, size, and etc.
Meanwhile, a guest-owned command queue needs the kernel (a command queue
driver) to control the queue by reading/writing its consumer and producer
indexes, which means the command queue HW allows the guest kernel to get
a direct R/W access to those registers. Introduce an mmap infrastructure
to the iommufd core so as to support pass through a piece of MMIO region
from the host physical address space to the guest physical address space.
The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during
the IOMMUFD_CMD_VCMDQ_ALLOC and given those info to the user space as an
output driver-data by the IOMMUFD_CMD_VCMDQ_ALLOC. So, this requires a
driver-specific user data support by a vIOMMU object.
As a real-world use case, this series implements a vCMDQ support to the
tegra241-cmdqv driver for the vCMDQ on NVIDIA Grace CPU. In another word,
this is also the Tegra CMDQV series Part-2 (user-space support), reworked
from Previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to
the standard SMMUv3 operating in the nested translation mode trapping CMDQ
for TLBI and ATC_INV commands, this gives a huge performance improvement:
70% to 90% reductions of invalidation time were measured by various DMA
unmap tests running in a guest OS.
This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_vcmdq-v2
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vcmdq-v2
Changelog
v2
* Add Reviewed-by from Jason
* [smmu] Fix vsmmu initial value
* [smmu] Support impl for hw_info
* [tegra] Rename "slot" to "vsid"
* [tegra] Update kdocs and commit logs
* [tegra] Map/unmap LVCMDQ dynamically
* [tegra] Refcount the previous LVCMDQ
* [tegra] Return -EEXIST if LVCMDQ exists
* [tegra] Simplify VINTF cleanup routine
* [tegra] Use vmid and s2_domain in vsmmu
* [tegra] Rename "mmap_pgoff" to "immap_id"
* [tegra] Add more addr and length validation
* [iommufd] Add more narrative to mmap's kdoc
* [iommufd] Add iommufd_struct_depend/undepend()
* [iommufd] Rename vcmdq_free op to vcmdq_destroy
* [iommufd] Fix bug in iommu_copy_struct_to_user()
* [iommufd] Drop is_io from iommufd_ctx_alloc_mmap()
* [iommufd] Test the queue memory for its contiguity
* [iommufd] Return -ENXIO if address or length fails
* [iommufd] Do not change @min_last in mock_viommu_alloc()
* [iommufd] Generalize TEGRA241_VCMDQ data in core structure
* [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC
* [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping
v1
https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/
Thanks
Nicolin
Nicolin Chen (22):
iommufd/viommu: Add driver-allocated vDEVICE support
iommu: Pass in a driver-level user data structure to viommu_alloc op
iommufd/viommu: Allow driver-specific user data for a vIOMMU object
iommu: Add iommu_copy_struct_to_user helper
iommufd: Add iommufd_struct_destroy to revert iommufd_viommu_alloc
iommufd/selftest: Support user_data in mock_viommu_alloc
iommufd/selftest: Add covearge for viommu data
iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers
iommufd/viommu: Introduce IOMMUFD_OBJ_VCMDQ and its related struct
iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC ioctl
iommufd: Add for-driver helpers iommufd_vcmdq_depend/undepend()
iommufd/selftest: Add coverage for IOMMUFD_CMD_VCMDQ_ALLOC
iommufd: Add mmap interface
iommufd/selftest: Add coverage for the new mmap interface
Documentation: userspace-api: iommufd: Update vCMDQ
iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op
iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/tegra241-cmdqv: Simplify deinit flow in
tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 25 +-
drivers/iommu/iommufd/io_pagetable.h | 8 +
drivers/iommu/iommufd/iommufd_private.h | 25 +-
drivers/iommu/iommufd/iommufd_test.h | 20 +
include/linux/iommu.h | 43 +-
include/linux/iommufd.h | 146 ++++++
include/uapi/linux/iommufd.h | 113 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 51 +-
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 42 +-
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 451 +++++++++++++++++-
drivers/iommu/iommufd/device.c | 117 +----
drivers/iommu/iommufd/driver.c | 81 ++++
drivers/iommu/iommufd/io_pagetable.c | 95 ++++
drivers/iommu/iommufd/main.c | 58 ++-
drivers/iommu/iommufd/selftest.c | 123 ++++-
drivers/iommu/iommufd/viommu.c | 111 ++++-
tools/testing/selftests/iommu/iommufd.c | 93 +++-
.../selftests/iommu/iommufd_fail_nth.c | 11 +-
Documentation/userspace-api/iommufd.rst | 14 +
19 files changed, 1436 insertions(+), 191 deletions(-)
--
2.43.0
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process thread.
This series introduces sentinels for this purposes which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in userland, and a userland process is a thread group (more
specifically the thread group leader from the kernel perspective). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.
We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.
v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl…
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…
RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…
Lorenzo Stoakes (3):
pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
selftests: pidfd: add tests for PIDFD_SELF_*
include/linux/pid.h | 43 +++++-
include/uapi/linux/pidfd.h | 15 ++
kernel/exit.c | 3 +-
kernel/nsproxy.c | 1 +
kernel/pid.c | 73 ++++++---
kernel/signal.c | 26 +---
tools/testing/selftests/pidfd/pidfd.h | 8 +
.../selftests/pidfd/pidfd_getfd_test.c | 141 ++++++++++++++++++
.../selftests/pidfd/pidfd_setns_test.c | 11 ++
tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++--
10 files changed, 342 insertions(+), 55 deletions(-)
--
2.46.2