As part 3 of the vIOMMU infrastructure, this series introduces a vIRQ object. The existing FAULT object already provides a notification pathway to user space, so let vIRQ reuse that infrastructure.
Mimicking the HWPT structure, add a common EVENTQ structure to support its derivatives: IOMMUFD_OBJ_FAULT (existing) and IOMMUFD_OBJ_VIRQ (new).
IOMMUFD_CMD_VIRQ_ALLOC is introduced to allocate vIRQ objects for vIOMMUs. One vIOMMU can have multiple vIRQs of different types but cannot have multiple vIRQs of the same type.
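The one-vIRQ-per-type rule can be illustrated with a small user-space model. This is only a sketch: struct model_viommu, model_virq_alloc() and MAX_VIRQS are hypothetical names invented for illustration, and the fixed-size array stands in for the list that the kernel code keeps per vIOMMU. The error codes mirror the checks in the allocator added later in this series.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_VIRQS 8 /* hypothetical capacity, for this sketch only */

/* Hypothetical stand-in for a vIOMMU tracking its allocated vIRQ types */
struct model_viommu {
	unsigned int types[MAX_VIRQS];
	size_t nr;
};

/* Returns 0 on success or a negative errno, mirroring the series' checks */
static int model_virq_alloc(struct model_viommu *v, unsigned int type)
{
	size_t i;

	if (type == 0) /* type 0 models IOMMU_VIRQ_TYPE_NONE, which is invalid */
		return -EOPNOTSUPP;

	for (i = 0; i < v->nr; i++) {
		if (v->types[i] == type)
			return -EEXIST; /* at most one vIRQ per type */
	}

	if (v->nr == MAX_VIRQS) /* limit exists only in this sketch */
		return -ENOMEM;

	v->types[v->nr++] = type;
	return 0;
}
```

A second allocation of the same type fails with -EEXIST, while a different type on the same model vIOMMU succeeds.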
The forwarding part is fairly simple but might need to replace a physical device ID with a virtual device ID in a driver-level IRQ data structure. So, this comes with some helpers for drivers to use.
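A rough user-space sketch of that ID replacement follows. All names here (model_vdev, model_event, model_get_vdev_id, model_forward_event) are hypothetical and exist only for illustration. The real helpers in this series operate on iommufd objects rather than a flat table, and the layout of the IRQ data is driver specific.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping of one physical device ID to its virtual ID */
struct model_vdev {
	uint32_t phys_id;
	uint64_t virt_id;
};

/* Hypothetical driver-level IRQ record carrying a device ID */
struct model_event {
	uint64_t dev_id;
	uint32_t code;
};

/* Look up the virtual ID for a physical ID; -ENOENT if not found */
static int model_get_vdev_id(const struct model_vdev *tbl, size_t n,
			     uint32_t phys_id, uint64_t *virt_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].phys_id == phys_id) {
			*virt_id = tbl[i].virt_id;
			return 0;
		}
	}
	return -ENOENT;
}

/* Rewrite the record in place so user space only sees the virtual ID */
static int model_forward_event(const struct model_vdev *tbl, size_t n,
			       struct model_event *evt)
{
	uint64_t virt_id;
	int rc;

	rc = model_get_vdev_id(tbl, n, (uint32_t)evt->dev_id, &virt_id);
	if (rc)
		return rc; /* unknown device: leave the record untouched */

	evt->dev_id = virt_id;
	return 0;
}
```

In this model, a record for a device that has no virtual ID is not modified and the lookup error is returned to the caller.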
As usual, this series comes with the selftest coverage for this new vIRQ, and with a real world use case in the ARM SMMUv3 driver.
This is on GitHub: https://github.com/nicolinc/iommufd/commits/iommufd_virq-v3
Testing with RMR patches for MSI: https://github.com/nicolinc/iommufd/commits/iommufd_virq-v3-with-rmr
Pairing QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_virq-v3
Changelog
v3
 * Rebase on Will's for-joerg/arm-smmu/updates for arm_smmu_event series
 * Add "Reviewed-by" lines from Kevin
 * Fix typos in comments, kdocs, and jump tags
 * Add a patch to sort struct iommufd_ioctl_op
 * Update iommufd's userspace-api documentation
 * Update uAPI kdoc to quote SMMUv3 official spec
 * Drop the unused workqueue in struct iommufd_virq
 * Drop might_sleep() in iommufd_viommu_report_irq() helper
 * Add missing "break" in iommufd_viommu_get_vdev_id() helper
 * Shrink the scope of the vmaster's read lock in SMMUv3 driver
 * Pass in two arguments to iommufd_eventq_virq_handler() helper
 * Move "!ops || !ops->read" validation into iommufd_eventq_init()
 * Move "fault->ictx = ictx" closer to iommufd_ctx_get(fault->ictx)
 * Update commit message for arm_smmu_attach_prepare/commit_vmaster()
 * Keep "iommufd_fault" as-is and rename "iommufd_eventq_virq" to just "iommufd_virq"
v2
 https://lore.kernel.org/all/cover.1733263737.git.nicolinc@nvidia.com/
 * Rebase on v6.13-rc1
 * Add IOPF and vIRQ in iommufd.rst (userspace-api)
 * Add a proper locking in iommufd_event_virq_destroy
 * Add iommufd_event_virq_abort with a lockdep_assert_held
 * Rename "EVENT_*" to "EVENTQ_*" to describe the objects better
 * Reorganize flows in iommufd_eventq_virq_alloc for abort() to work
 * Add struct arm_smmu_vmaster to store vSID upon attaching to a nested domain, calling a newly added iommufd_viommu_get_vdev_id helper
 * Add an arm_vmaster_report_event helper in arm-smmu-v3-iommufd file to simplify the routine in arm_smmu_handle_evt() of the main driver
v1
 https://lore.kernel.org/all/cover.1724777091.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Nicolin Chen (14):
  iommufd: Keep IOCTL list in an alphabetical order
  iommufd/fault: Add an iommufd_fault_init() helper
  iommufd/fault: Move iommufd_fault_iopf_handler() to header
  iommufd: Abstract an iommufd_eventq from iommufd_fault
  iommufd: Rename fault.c to eventq.c
  iommufd: Add IOMMUFD_OBJ_VIRQ and IOMMUFD_CMD_VIRQ_ALLOC
  iommufd/viommu: Add iommufd_viommu_get_vdev_id helper
  iommufd/viommu: Add iommufd_viommu_report_irq helper
  iommufd/selftest: Require vdev_id when attaching to a nested domain
  iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VIRQ for vIRQ coverage
  iommufd/selftest: Add IOMMU_VIRQ_ALLOC test coverage
  Documentation: userspace-api: iommufd: Update FAULT and VIRQ
  iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster
  iommu/arm-smmu-v3: Report IRQs that belong to devices attached to vIOMMU
 drivers/iommu/iommufd/Makefile | 2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 30 ++
 drivers/iommu/iommufd/iommufd_private.h | 115 ++++++-
 drivers/iommu/iommufd/iommufd_test.h | 10 +
 include/linux/iommufd.h | 20 ++
 include/uapi/linux/iommufd.h | 46 +++
 tools/testing/selftests/iommu/iommufd_utils.h | 63 ++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 65 ++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 90 ++++--
 drivers/iommu/iommufd/driver.c | 57 ++++
 drivers/iommu/iommufd/{fault.c => eventq.c} | 298 ++++++++++++++----
 drivers/iommu/iommufd/hw_pagetable.c | 6 +-
 drivers/iommu/iommufd/main.c | 20 +-
 drivers/iommu/iommufd/selftest.c | 53 ++++
 drivers/iommu/iommufd/viommu.c | 2 +
 tools/testing/selftests/iommu/iommufd.c | 27 ++
 .../selftests/iommu/iommufd_fail_nth.c | 6 +
 Documentation/userspace-api/iommufd.rst | 16 +
 18 files changed, 809 insertions(+), 117 deletions(-)
 rename drivers/iommu/iommufd/{fault.c => eventq.c} (55%)
base-commit: 376ce8b35ed15d5deee57bdecd8449f6a4df4c42
Move VDEVICE upward to keep the alphabetical order. Also run clang-format to keep the coding style consistent at line wrappings. No functional change.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/main.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 97c5e3567d33..cfbdf7b0e3c1 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -333,8 +333,8 @@ struct iommufd_ioctl_op { } static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), - IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc, - out_fault_fd), + IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_fault_alloc, + struct iommu_fault_alloc, out_fault_fd), IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info, __reserved), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, @@ -355,20 +355,18 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { src_iova), IOCTL_OP(IOMMU_IOAS_IOVA_RANGES, iommufd_ioas_iova_ranges, struct iommu_ioas_iova_ranges, out_iova_alignment), - IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, - iova), + IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova), IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file, struct iommu_ioas_map_file, iova), IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap, length), - IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, - val64), + IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64), + IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl, + struct iommu_vdevice_alloc, virt_id), IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas, __reserved), IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, struct iommu_viommu_alloc, out_viommu_id), - IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl, - struct iommu_vdevice_alloc, virt_id), #ifdef CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif
On Tue, Dec 17, 2024 at 09:00:14PM -0800, Nicolin Chen wrote:
> Move VDEVICE upward to keep the order. Also run clang-format keep the same coding style at line wrappings. No functional change.
It should fix the order in ucmd_buffer too. Will include in v4.
Nicolin
The infrastructure of a fault object will be shared with a new vIRQ object in a following change. Add a helper for a vIRQ allocator to call it too.
Reorder the iommufd_ctx_get and refcount_inc to keep them symmetrical with the iommufd_fault_fops_release().
Since the new vIRQ object doesn't need "response", leave the xa_init_flags in its original location.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/fault.c | 48 ++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index 1fe804e28a86..1d1095fc8224 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -367,11 +367,35 @@ static const struct file_operations iommufd_fault_fops = { .release = iommufd_fault_fops_release, };
+static int iommufd_fault_init(struct iommufd_fault *fault, char *name, + struct iommufd_ctx *ictx) +{ + struct file *filep; + int fdno; + + mutex_init(&fault->mutex); + INIT_LIST_HEAD(&fault->deliver); + init_waitqueue_head(&fault->wait_queue); + + filep = anon_inode_getfile(name, &iommufd_fault_fops, fault, O_RDWR); + if (IS_ERR(filep)) + return PTR_ERR(filep); + + fault->ictx = ictx; + iommufd_ctx_get(fault->ictx); + fault->filep = filep; + refcount_inc(&fault->obj.users); + + fdno = get_unused_fd_flags(O_CLOEXEC); + if (fdno < 0) + fput(filep); + return fdno; +} + int iommufd_fault_alloc(struct iommufd_ucmd *ucmd) { struct iommu_fault_alloc *cmd = ucmd->cmd; struct iommufd_fault *fault; - struct file *filep; int fdno; int rc;
@@ -382,27 +406,12 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd) if (IS_ERR(fault)) return PTR_ERR(fault);
- fault->ictx = ucmd->ictx; - INIT_LIST_HEAD(&fault->deliver); xa_init_flags(&fault->response, XA_FLAGS_ALLOC1); - mutex_init(&fault->mutex); - init_waitqueue_head(&fault->wait_queue); - - filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops, - fault, O_RDWR); - if (IS_ERR(filep)) { - rc = PTR_ERR(filep); - goto out_abort; - }
- refcount_inc(&fault->obj.users); - iommufd_ctx_get(fault->ictx); - fault->filep = filep; - - fdno = get_unused_fd_flags(O_CLOEXEC); + fdno = iommufd_fault_init(fault, "[iommufd-pgfault]", ucmd->ictx); if (fdno < 0) { rc = fdno; - goto out_fput; + goto out_abort; }
cmd->out_fault_id = fault->obj.id; @@ -418,8 +427,7 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd) return 0; out_put_fdno: put_unused_fd(fdno); -out_fput: - fput(filep); + fput(fault->filep); out_abort: iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
The new vIRQ object will need a similar function for drivers to report the vIOMMU related interrupts. Split the common part out to a smaller helper, and place it in the header so that CONFIG_IOMMUFD_DRIVER_CORE can include that in the driver.c file for drivers to use.
Then keep iommufd_fault_iopf_handler() in the header too, since it's quite simple after all.
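The split-out helper boils down to "append the new entry to the tail of the delivery list, then wake any reader". A minimal user-space sketch of that FIFO behaviour is below. struct fifo_node, struct fifo_queue and the fifo_*() functions are hypothetical names for illustration only. The comments mark where the kernel helper takes the mutex, and the wake-up is modelled as a plain counter because this sketch is single-threaded.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, in the spirit of the kernel's list_head */
struct fifo_node {
	struct fifo_node *prev, *next;
};

/* Hypothetical stand-in for the queue: a delivery list plus a wake-up counter */
struct fifo_queue {
	struct fifo_node deliver;
	unsigned int wakeups;
};

static void fifo_init(struct fifo_queue *q)
{
	q->deliver.prev = q->deliver.next = &q->deliver;
	q->wakeups = 0;
}

/* Producer side: queue at the tail so that readers see events in order */
static int fifo_notify(struct fifo_queue *q, struct fifo_node *ev)
{
	/* the kernel helper holds the queue mutex around the list update */
	ev->prev = q->deliver.prev;
	ev->next = &q->deliver;
	q->deliver.prev->next = ev;
	q->deliver.prev = ev;

	/* the kernel helper then wakes up the readers' wait queue */
	q->wakeups++;
	return 0;
}

/* Consumer side: take the oldest entry, or NULL when the list is empty */
static struct fifo_node *fifo_pop(struct fifo_queue *q)
{
	struct fifo_node *ev = q->deliver.next;

	if (ev == &q->deliver)
		return NULL;

	q->deliver.next = ev->next;
	ev->next->prev = &q->deliver;
	ev->prev = ev->next = ev;
	return ev;
}
```

Because the list node is intrusive, the same notify routine can queue any structure that embeds a node, which is what lets a second object type share it.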
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 20 +++++++++++++++++++-
 drivers/iommu/iommufd/fault.c | 17 -----------------
 2 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index b6d706cf2c66..8b378705ee71 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -451,6 +451,17 @@ struct iommufd_fault { struct wait_queue_head wait_queue; };
+static inline int iommufd_fault_notify(struct iommufd_fault *fault, + struct list_head *new_fault) +{ + mutex_lock(&fault->mutex); + list_add_tail(new_fault, &fault->deliver); + mutex_unlock(&fault->mutex); + + wake_up_interruptible(&fault->wait_queue); + return 0; +} + struct iommufd_attach_handle { struct iommu_attach_handle handle; struct iommufd_device *idev; @@ -469,7 +480,14 @@ iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); void iommufd_fault_destroy(struct iommufd_object *obj); -int iommufd_fault_iopf_handler(struct iopf_group *group); + +static inline int iommufd_fault_iopf_handler(struct iopf_group *group) +{ + struct iommufd_hw_pagetable *hwpt = + group->attach_handle->domain->fault_data; + + return iommufd_fault_notify(hwpt->fault, &group->node); +}
int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev); diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index 1d1095fc8224..d188994e4e84 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -433,20 +433,3 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
return rc; } - -int iommufd_fault_iopf_handler(struct iopf_group *group) -{ - struct iommufd_hw_pagetable *hwpt; - struct iommufd_fault *fault; - - hwpt = group->attach_handle->domain->fault_data; - fault = hwpt->fault; - - mutex_lock(&fault->mutex); - list_add_tail(&group->node, &fault->deliver); - mutex_unlock(&fault->mutex); - - wake_up_interruptible(&fault->wait_queue); - - return 0; -}
The fault object was designed exclusively for hwpt's IO page faults (PRI). But its queue implementation can be reused for other purposes too, such as hardware IRQ and event injections to user space.
Meanwhile, a fault object holds a list of faults, so it is more accurate to call it a "fault queue". Combined with the reuse idea above, abstract a new iommufd_eventq as a common structure embedded into struct iommufd_fault, similar to hwpt_paging holding a common hwpt.
Add a common iommufd_eventq_ops and iommufd_eventq_init to prepare for an IOMMUFD_OBJ_VIRQ.
Also, add missing xa_destroy and mutex_destroy in iommufd_fault_destroy().
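The embedding pattern can be sketched in plain user-space C as follows. The model_* types are hypothetical stand-ins, not the kernel structures, and container_of is re-defined locally with offsetof() because this sketch does not use kernel headers. The second derived type deliberately places the common member after another field, only to show that the conversion does not rely on the member being first.

```c
#include <stddef.h>

/* Local re-definition for this sketch; the kernel provides its own macro */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical common part shared by every derived queue type */
struct model_eventq {
	int fd;
};

/* Derived type with the common part first, as struct iommufd_fault does */
struct model_fault {
	struct model_eventq common;
	unsigned long response;
};

/* Derived type with the common part not first (sketch-only layout) */
struct model_virq {
	unsigned int type;
	struct model_eventq common;
};

/* Recover the derived object from a pointer to its embedded common part */
static struct model_fault *model_eventq_to_fault(struct model_eventq *eventq)
{
	return container_of(eventq, struct model_fault, common);
}

static struct model_virq *model_eventq_to_virq(struct model_eventq *eventq)
{
	return container_of(eventq, struct model_virq, common);
}
```

Common code only ever handles the embedded struct, and each type-specific callback converts back to its own derived type, which is the role eventq_to_fault() plays in this patch.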
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 52 ++++++---
 drivers/iommu/iommufd/fault.c | 142 +++++++++++++++---------
 drivers/iommu/iommufd/hw_pagetable.c | 6 +-
 3 files changed, 130 insertions(+), 70 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 8b378705ee71..dfbc5cfbd164 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -18,6 +18,7 @@ struct iommu_domain; struct iommu_group; struct iommu_option; struct iommufd_device; +struct iommufd_eventq;
struct iommufd_ctx { struct file *file; @@ -433,32 +434,35 @@ void iopt_remove_access(struct io_pagetable *iopt, u32 iopt_access_list_id); void iommufd_access_destroy_object(struct iommufd_object *obj);
-/* - * An iommufd_fault object represents an interface to deliver I/O page faults - * to the user space. These objects are created/destroyed by the user space and - * associated with hardware page table objects during page-table allocation. - */ -struct iommufd_fault { +struct iommufd_eventq_ops { + ssize_t (*read)(struct iommufd_eventq *eventq, char __user *buf, + size_t count, loff_t *ppos); /* Mandatory op */ + ssize_t (*write)(struct iommufd_eventq *eventq, const char __user *buf, + size_t count, loff_t *ppos); /* Optional op */ +}; + +struct iommufd_eventq { struct iommufd_object obj; struct iommufd_ctx *ictx; struct file *filep;
- /* The lists of outstanding faults protected by below mutex. */ + const struct iommufd_eventq_ops *ops; + + /* The lists of outstanding events protected by below mutex. */ struct mutex mutex; struct list_head deliver; - struct xarray response;
struct wait_queue_head wait_queue; };
-static inline int iommufd_fault_notify(struct iommufd_fault *fault, - struct list_head *new_fault) +static inline int iommufd_eventq_notify(struct iommufd_eventq *eventq, + struct list_head *new_event) { - mutex_lock(&fault->mutex); - list_add_tail(new_fault, &fault->deliver); - mutex_unlock(&fault->mutex); + mutex_lock(&eventq->mutex); + list_add_tail(new_event, &eventq->deliver); + mutex_unlock(&eventq->mutex);
- wake_up_interruptible(&fault->wait_queue); + wake_up_interruptible(&eventq->wait_queue); return 0; }
@@ -470,12 +474,28 @@ struct iommufd_attach_handle { /* Convert an iommu attach handle to iommufd handle. */ #define to_iommufd_handle(hdl) container_of(hdl, struct iommufd_attach_handle, handle)
+/* + * An iommufd_fault object represents an interface to deliver I/O page faults + * to the user space. These objects are created/destroyed by the user space and + * associated with hardware page table objects during page-table allocation. + */ +struct iommufd_fault { + struct iommufd_eventq common; + struct xarray response; +}; + +static inline struct iommufd_fault * +eventq_to_fault(struct iommufd_eventq *eventq) +{ + return container_of(eventq, struct iommufd_fault, common); +} + static inline struct iommufd_fault * iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id) { return container_of(iommufd_get_object(ucmd->ictx, id, IOMMUFD_OBJ_FAULT), - struct iommufd_fault, obj); + struct iommufd_fault, common.obj); }
int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); @@ -486,7 +506,7 @@ static inline int iommufd_fault_iopf_handler(struct iopf_group *group) struct iommufd_hw_pagetable *hwpt = group->attach_handle->domain->fault_data;
- return iommufd_fault_notify(hwpt->fault, &group->node); + return iommufd_eventq_notify(&hwpt->fault->common, &group->node); }
int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index d188994e4e84..e386b6c3e6ab 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -17,6 +17,8 @@ #include "../iommu-priv.h" #include "iommufd_private.h"
+/* IOMMUFD_OBJ_FAULT Functions */ + static int iommufd_fault_iopf_enable(struct iommufd_device *idev) { struct device *dev = idev->dev; @@ -108,8 +110,8 @@ static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt, if (!fault) return;
- mutex_lock(&fault->mutex); - list_for_each_entry_safe(group, next, &fault->deliver, node) { + mutex_lock(&fault->common.mutex); + list_for_each_entry_safe(group, next, &fault->common.deliver, node) { if (group->attach_handle != &handle->handle) continue; list_del(&group->node); @@ -124,7 +126,7 @@ static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt, iopf_group_response(group, IOMMU_PAGE_RESP_INVALID); iopf_free_group(group); } - mutex_unlock(&fault->mutex); + mutex_unlock(&fault->common.mutex); }
static struct iommufd_attach_handle * @@ -211,7 +213,8 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
void iommufd_fault_destroy(struct iommufd_object *obj) { - struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj); + struct iommufd_eventq *eventq = + container_of(obj, struct iommufd_eventq, obj); struct iopf_group *group, *next;
/* @@ -220,11 +223,13 @@ void iommufd_fault_destroy(struct iommufd_object *obj) * accessing this pointer. Therefore, acquiring the mutex here * is unnecessary. */ - list_for_each_entry_safe(group, next, &fault->deliver, node) { + list_for_each_entry_safe(group, next, &eventq->deliver, node) { list_del(&group->node); iopf_group_response(group, IOMMU_PAGE_RESP_INVALID); iopf_free_group(group); } + xa_destroy(&eventq_to_fault(eventq)->response); + mutex_destroy(&eventq->mutex); }
static void iommufd_compose_fault_message(struct iommu_fault *fault, @@ -242,11 +247,12 @@ static void iommufd_compose_fault_message(struct iommu_fault *fault, hwpt_fault->cookie = cookie; }
-static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf, - size_t count, loff_t *ppos) +static ssize_t iommufd_fault_fops_read(struct iommufd_eventq *eventq, + char __user *buf, size_t count, + loff_t *ppos) { size_t fault_size = sizeof(struct iommu_hwpt_pgfault); - struct iommufd_fault *fault = filep->private_data; + struct iommufd_fault *fault = eventq_to_fault(eventq); struct iommu_hwpt_pgfault data; struct iommufd_device *idev; struct iopf_group *group; @@ -257,10 +263,10 @@ static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf, if (*ppos || count % fault_size) return -ESPIPE;
- mutex_lock(&fault->mutex); - while (!list_empty(&fault->deliver) && count > done) { - group = list_first_entry(&fault->deliver, - struct iopf_group, node); + mutex_lock(&eventq->mutex); + while (!list_empty(&eventq->deliver) && count > done) { + group = list_first_entry(&eventq->deliver, struct iopf_group, + node);
if (group->fault_count * fault_size > count - done) break; @@ -285,16 +291,17 @@ static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
list_del(&group->node); } - mutex_unlock(&fault->mutex); + mutex_unlock(&eventq->mutex);
return done == 0 ? rc : done; }
-static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf, - size_t count, loff_t *ppos) +static ssize_t iommufd_fault_fops_write(struct iommufd_eventq *eventq, + const char __user *buf, size_t count, + loff_t *ppos) { size_t response_size = sizeof(struct iommu_hwpt_page_response); - struct iommufd_fault *fault = filep->private_data; + struct iommufd_fault *fault = eventq_to_fault(eventq); struct iommu_hwpt_page_response response; struct iopf_group *group; size_t done = 0; @@ -303,7 +310,7 @@ static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *b if (*ppos || count % response_size) return -ESPIPE;
- mutex_lock(&fault->mutex); + mutex_lock(&eventq->mutex); while (count > done) { rc = copy_from_user(&response, buf + done, response_size); if (rc) @@ -329,62 +336,93 @@ static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *b iopf_free_group(group); done += response_size; } - mutex_unlock(&fault->mutex); + mutex_unlock(&eventq->mutex);
return done == 0 ? rc : done; }
-static __poll_t iommufd_fault_fops_poll(struct file *filep, - struct poll_table_struct *wait) +static const struct iommufd_eventq_ops iommufd_fault_ops = { + .read = &iommufd_fault_fops_read, + .write = &iommufd_fault_fops_write, +}; + +/* Common Event Queue Functions */ + +static ssize_t iommufd_eventq_fops_read(struct file *filep, char __user *buf, + size_t count, loff_t *ppos) { - struct iommufd_fault *fault = filep->private_data; + struct iommufd_eventq *eventq = filep->private_data; + + return eventq->ops->read(eventq, buf, count, ppos); +} + +static ssize_t iommufd_eventq_fops_write(struct file *filep, + const char __user *buf, size_t count, + loff_t *ppos) +{ + struct iommufd_eventq *eventq = filep->private_data; + + if (!eventq->ops->write) + return -EOPNOTSUPP; + return eventq->ops->write(eventq, buf, count, ppos); +} + +static __poll_t iommufd_eventq_fops_poll(struct file *filep, + struct poll_table_struct *wait) +{ + struct iommufd_eventq *eventq = filep->private_data; __poll_t pollflags = EPOLLOUT;
- poll_wait(filep, &fault->wait_queue, wait); - mutex_lock(&fault->mutex); - if (!list_empty(&fault->deliver)) + poll_wait(filep, &eventq->wait_queue, wait); + mutex_lock(&eventq->mutex); + if (!list_empty(&eventq->deliver)) pollflags |= EPOLLIN | EPOLLRDNORM; - mutex_unlock(&fault->mutex); + mutex_unlock(&eventq->mutex);
return pollflags; }
-static int iommufd_fault_fops_release(struct inode *inode, struct file *filep) +static int iommufd_eventq_fops_release(struct inode *inode, struct file *filep) { - struct iommufd_fault *fault = filep->private_data; + struct iommufd_eventq *eventq = filep->private_data;
- refcount_dec(&fault->obj.users); - iommufd_ctx_put(fault->ictx); + refcount_dec(&eventq->obj.users); + iommufd_ctx_put(eventq->ictx); return 0; }
-static const struct file_operations iommufd_fault_fops = { +static const struct file_operations iommufd_eventq_fops = { .owner = THIS_MODULE, .open = nonseekable_open, - .read = iommufd_fault_fops_read, - .write = iommufd_fault_fops_write, - .poll = iommufd_fault_fops_poll, - .release = iommufd_fault_fops_release, + .read = iommufd_eventq_fops_read, + .write = iommufd_eventq_fops_write, + .poll = iommufd_eventq_fops_poll, + .release = iommufd_eventq_fops_release, };
-static int iommufd_fault_init(struct iommufd_fault *fault, char *name, - struct iommufd_ctx *ictx) +static int iommufd_eventq_init(struct iommufd_eventq *eventq, char *name, + struct iommufd_ctx *ictx, + const struct iommufd_eventq_ops *ops) { struct file *filep; int fdno;
- mutex_init(&fault->mutex); - INIT_LIST_HEAD(&fault->deliver); - init_waitqueue_head(&fault->wait_queue); + if (WARN_ON_ONCE(!ops || !ops->read)) + return -EINVAL; + + mutex_init(&eventq->mutex); + INIT_LIST_HEAD(&eventq->deliver); + init_waitqueue_head(&eventq->wait_queue);
- filep = anon_inode_getfile(name, &iommufd_fault_fops, fault, O_RDWR); + filep = anon_inode_getfile(name, &iommufd_eventq_fops, eventq, O_RDWR); if (IS_ERR(filep)) return PTR_ERR(filep);
- fault->ictx = ictx; - iommufd_ctx_get(fault->ictx); - fault->filep = filep; - refcount_inc(&fault->obj.users); + eventq->ops = ops; + eventq->ictx = ictx; + iommufd_ctx_get(eventq->ictx); + refcount_inc(&eventq->obj.users); + eventq->filep = filep;
fdno = get_unused_fd_flags(O_CLOEXEC); if (fdno < 0) @@ -402,34 +440,36 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd) if (cmd->flags) return -EOPNOTSUPP;
- fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT); + fault = __iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT, + common.obj); if (IS_ERR(fault)) return PTR_ERR(fault);
xa_init_flags(&fault->response, XA_FLAGS_ALLOC1);
- fdno = iommufd_fault_init(fault, "[iommufd-pgfault]", ucmd->ictx); + fdno = iommufd_eventq_init(&fault->common, "[iommufd-pgfault]", + ucmd->ictx, &iommufd_fault_ops); if (fdno < 0) { rc = fdno; goto out_abort; }
- cmd->out_fault_id = fault->obj.id; + cmd->out_fault_id = fault->common.obj.id; cmd->out_fault_fd = fdno;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); if (rc) goto out_put_fdno; - iommufd_object_finalize(ucmd->ictx, &fault->obj); + iommufd_object_finalize(ucmd->ictx, &fault->common.obj);
- fd_install(fdno, fault->filep); + fd_install(fdno, fault->common.filep);
return 0; out_put_fdno: put_unused_fd(fdno); - fput(fault->filep); + fput(fault->common.filep); out_abort: - iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj); + iommufd_object_abort_and_destroy(ucmd->ictx, &fault->common.obj);
return rc; } diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index ce03c3804651..12a576f1f13d 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -14,7 +14,7 @@ static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt) iommu_domain_free(hwpt->domain);
if (hwpt->fault) - refcount_dec(&hwpt->fault->obj.users); + refcount_dec(&hwpt->fault->common.obj.users); }
void iommufd_hwpt_paging_destroy(struct iommufd_object *obj) @@ -403,8 +403,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) hwpt->fault = fault; hwpt->domain->iopf_handler = iommufd_fault_iopf_handler; hwpt->domain->fault_data = hwpt; - refcount_inc(&fault->obj.users); - iommufd_put_object(ucmd->ictx, &fault->obj); + refcount_inc(&fault->common.obj.users); + iommufd_put_object(ucmd->ictx, &fault->common.obj); }
cmd->out_hwpt_id = hwpt->obj.id;
Rename the file, aligning with the new eventq object.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/Makefile | 2 +-
 drivers/iommu/iommufd/{fault.c => eventq.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename drivers/iommu/iommufd/{fault.c => eventq.c} (100%)
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index cb784da6cddc..71d692c9a8f4 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only iommufd-y := \ device.o \ - fault.o \ + eventq.o \ hw_pagetable.o \ io_pagetable.o \ ioas.o \ diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/eventq.c similarity index 100% rename from drivers/iommu/iommufd/fault.c rename to drivers/iommu/iommufd/eventq.c
Allow a vIOMMU object to allocate vIRQ Event Queues, on the condition that each vIOMMU can have at most one vIRQ event queue per type.
Add iommufd_eventq_virq_alloc with an iommufd_eventq_virq_ops for this new ioctl.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 57 +++++++++++
 include/linux/iommufd.h | 3 +
 include/uapi/linux/iommufd.h | 31 ++++++
 drivers/iommu/iommufd/eventq.c | 129 ++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c | 6 ++
 drivers/iommu/iommufd/viommu.c | 2 +
 6 files changed, 228 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index dfbc5cfbd164..fab3b21ac687 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -547,6 +547,49 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); }
+/* + * An iommufd_virq object represents an interface to deliver vIOMMU interrupts + * to the user space. These objects are created/destroyed by the user space and + * associated with vIOMMU object(s) during the allocations. + */ +struct iommufd_virq { + struct iommufd_eventq common; + struct iommufd_viommu *viommu; + struct list_head node; + + unsigned int type; +}; + +static inline struct iommufd_virq *eventq_to_virq(struct iommufd_eventq *eventq) +{ + return container_of(eventq, struct iommufd_virq, common); +} + +static inline struct iommufd_virq *iommufd_get_virq(struct iommufd_ucmd *ucmd, + u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_VIRQ), + struct iommufd_virq, common.obj); +} + +int iommufd_virq_alloc(struct iommufd_ucmd *ucmd); +void iommufd_virq_destroy(struct iommufd_object *obj); +void iommufd_virq_abort(struct iommufd_object *obj); + +/* An iommufd_virq_header packs a vIOMMU interrupt in an iommufd_virq queue */ +struct iommufd_virq_header { + struct list_head node; + ssize_t irq_len; + void *irq_data; +}; + +static inline int iommufd_virq_handler(struct iommufd_virq *virq, + struct iommufd_virq_header *virq_header) +{ + return iommufd_eventq_notify(&virq->common, &virq_header->node); +} + static inline struct iommufd_viommu * iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) { @@ -555,6 +598,20 @@ iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) struct iommufd_viommu, obj); }
+static inline struct iommufd_virq * +iommufd_viommu_find_virq(struct iommufd_viommu *viommu, u32 type) +{ + struct iommufd_virq *virq, *next; + + lockdep_assert_held(&viommu->virqs_rwsem); + + list_for_each_entry_safe(virq, next, &viommu->virqs, node) { + if (virq->type == type) + return virq; + } + return NULL; +} + int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd); diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 11110c749200..b082676c9e43 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -34,6 +34,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_FAULT, IOMMUFD_OBJ_VIOMMU, IOMMUFD_OBJ_VDEVICE, + IOMMUFD_OBJ_VIRQ, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif @@ -93,6 +94,8 @@ struct iommufd_viommu { const struct iommufd_viommu_ops *ops;
struct xarray vdevs; + struct list_head virqs; + struct rw_semaphore virqs_rwsem;
unsigned int type; }; diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 34810f6ae2b5..cdf2dba28d4a 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -55,6 +55,7 @@ enum { IOMMUFD_CMD_VIOMMU_ALLOC = 0x90, IOMMUFD_CMD_VDEVICE_ALLOC = 0x91, IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92, + IOMMUFD_CMD_VIRQ_ALLOC = 0x93, };
/** @@ -1012,4 +1013,34 @@ struct iommu_ioas_change_process { #define IOMMU_IOAS_CHANGE_PROCESS \ _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_CHANGE_PROCESS)
+/** + * enum iommu_virq_type - Virtual IRQ Type + * @IOMMU_VIRQ_TYPE_NONE: INVALID type + */ +enum iommu_virq_type { + IOMMU_VIRQ_TYPE_NONE = 0, +}; + +/** + * struct iommu_virq_alloc - ioctl(IOMMU_VIRQ_ALLOC) + * @size: sizeof(struct iommu_virq_alloc) + * @flags: Must be 0 + * @viommu: virtual IOMMU ID to associate the virtual IRQ with + * @type: Type of the virtual IRQ. Must be defined in enum iommu_virq_type + * @out_virq_id: The ID of the new virtual IRQ + * @out_virq_fd: The fd of the new virtual IRQ. User space must close the + * successfully returned fd after using it + * + * Explicitly allocate a virtual IRQ interface for a vIOMMU. A vIOMMU can have + * multiple FDs for different @type, but is confined to one FD per @type. + */ +struct iommu_virq_alloc { + __u32 size; + __u32 flags; + __u32 viommu_id; + __u32 type; + __u32 out_virq_id; + __u32 out_virq_fd; +}; +#define IOMMU_VIRQ_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIRQ_ALLOC) #endif diff --git a/drivers/iommu/iommufd/eventq.c b/drivers/iommu/iommufd/eventq.c index e386b6c3e6ab..a8921c745d36 100644 --- a/drivers/iommu/iommufd/eventq.c +++ b/drivers/iommu/iommufd/eventq.c @@ -346,6 +346,73 @@ static const struct iommufd_eventq_ops iommufd_fault_ops = { .write = &iommufd_fault_fops_write, };
+/* IOMMUFD_OBJ_VIRQ Functions */ + +void iommufd_virq_abort(struct iommufd_object *obj) +{ + struct iommufd_eventq *eventq = + container_of(obj, struct iommufd_eventq, obj); + struct iommufd_virq *virq = eventq_to_virq(eventq); + struct iommufd_viommu *viommu = virq->viommu; + struct iommufd_virq_header *cur, *next; + + lockdep_assert_held_write(&viommu->virqs_rwsem); + + list_for_each_entry_safe(cur, next, &eventq->deliver, node) { + list_del(&cur->node); + kfree(cur); + } + + refcount_dec(&viommu->obj.users); + mutex_destroy(&eventq->mutex); + list_del(&virq->node); +} + +void iommufd_virq_destroy(struct iommufd_object *obj) +{ + struct iommufd_virq *virq = + eventq_to_virq(container_of(obj, struct iommufd_eventq, obj)); + + down_write(&virq->viommu->virqs_rwsem); + iommufd_virq_abort(obj); + up_write(&virq->viommu->virqs_rwsem); +} + +static ssize_t iommufd_virq_fops_read(struct iommufd_eventq *eventq, + char __user *buf, size_t count, + loff_t *ppos) +{ + size_t done = 0; + int rc = 0; + + if (*ppos) + return -ESPIPE; + + mutex_lock(&eventq->mutex); + while (!list_empty(&eventq->deliver) && count > done) { + struct iommufd_virq_header *cur = list_first_entry( + &eventq->deliver, struct iommufd_virq_header, node); + + if (cur->irq_len > count - done) + break; + + if (copy_to_user(buf + done, cur->irq_data, cur->irq_len)) { + rc = -EFAULT; + break; + } + done += cur->irq_len; + list_del(&cur->node); + kfree(cur); + } + mutex_unlock(&eventq->mutex); + + return done == 0 ? rc : done; +} + +static const struct iommufd_eventq_ops iommufd_virq_ops = { + .read = &iommufd_virq_fops_read, +}; + /* Common Event Queue Functions */
static ssize_t iommufd_eventq_fops_read(struct file *filep, char __user *buf, @@ -473,3 +540,65 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
return rc; } + +int iommufd_virq_alloc(struct iommufd_ucmd *ucmd) +{ + struct iommu_virq_alloc *cmd = ucmd->cmd; + struct iommufd_viommu *viommu; + struct iommufd_virq *virq; + int fdno; + int rc; + + if (cmd->flags || cmd->type == IOMMU_VIRQ_TYPE_NONE) + return -EOPNOTSUPP; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + down_write(&viommu->virqs_rwsem); + + if (iommufd_viommu_find_virq(viommu, cmd->type)) { + rc = -EEXIST; + goto out_unlock_virqs; + } + + virq = __iommufd_object_alloc(ucmd->ictx, virq, IOMMUFD_OBJ_VIRQ, + common.obj); + if (IS_ERR(virq)) { + rc = PTR_ERR(virq); + goto out_unlock_virqs; + } + + virq->type = cmd->type; + virq->viommu = viommu; + refcount_inc(&viommu->obj.users); + list_add_tail(&virq->node, &viommu->virqs); + + fdno = iommufd_eventq_init(&virq->common, "[iommufd-viommu-irq]", + ucmd->ictx, &iommufd_virq_ops); + if (fdno < 0) { + rc = fdno; + goto out_abort; + } + + cmd->out_virq_id = virq->common.obj.id; + cmd->out_virq_fd = fdno; + + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_put_fdno; + + iommufd_object_finalize(ucmd->ictx, &virq->common.obj); + fd_install(fdno, virq->common.filep); + goto out_unlock_virqs; + +out_put_fdno: + put_unused_fd(fdno); + fput(virq->common.filep); +out_abort: + iommufd_object_abort_and_destroy(ucmd->ictx, &virq->common.obj); +out_unlock_virqs: + up_write(&viommu->virqs_rwsem); + iommufd_put_object(ucmd->ictx, &viommu->obj); + return rc; +} diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index cfbdf7b0e3c1..9d15978ef882 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -367,6 +367,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { __reserved), IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, struct iommu_viommu_alloc, out_viommu_id), + IOCTL_OP(IOMMU_VIRQ_ALLOC, iommufd_virq_alloc, struct iommu_virq_alloc, + out_virq_fd), #ifdef 
CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif @@ -502,6 +504,10 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { [IOMMUFD_OBJ_FAULT] = { .destroy = iommufd_fault_destroy, }, + [IOMMUFD_OBJ_VIRQ] = { + .destroy = iommufd_virq_destroy, + .abort = iommufd_virq_abort, + }, [IOMMUFD_OBJ_VIOMMU] = { .destroy = iommufd_viommu_destroy, }, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 69b88e8c7c26..075b6aed79bc 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -59,6 +59,8 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) viommu->ictx = ucmd->ictx; viommu->hwpt = hwpt_paging; refcount_inc(&viommu->hwpt->common.obj.users); + INIT_LIST_HEAD(&viommu->virqs); + init_rwsem(&viommu->virqs_rwsem); /* * It is the most likely case that a physical IOMMU is unpluggable. A * pluggable IOMMU instance (if exists) is responsible for refcounting
On Tue, Dec 17, 2024 at 09:00:19PM -0800, Nicolin Chen wrote:
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index cfbdf7b0e3c1..9d15978ef882 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -367,6 +367,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 __reserved),
 	IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
 		 struct iommu_viommu_alloc, out_viommu_id),
+	IOCTL_OP(IOMMU_VIRQ_ALLOC, iommufd_virq_alloc, struct iommu_virq_alloc,
+		 out_virq_fd),
This is missing the "struct iommu_virq_alloc" in union ucmd_buffer. Will include in v4.
Nicolin
On Tue, Dec 17, 2024 at 09:00:19PM -0800, Nicolin Chen wrote:
Allow a vIOMMU object to allocate vIRQ Event Queues, with a condition that each vIOMMU can only have one single vIRQ event queue per type.
I suggest you should tend to use the eventq as the primary naming not vIRQ, I think that will be a bit clearer.
The virq in the VM is edge triggered by an event queue FD becoming readable, but the event queue is the file descriptor that reports a batch of events on read().
The virq name evokes similarities to the virq in vfio which is purely about conveying if an IRQ edge has happened through an eventfd and has no event queue associated with it.
Jason
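To illustrate the event queue semantics described above (the FD becomes readable, then one read() returns a batch of events), here is a minimal user space sketch. It is not taken from the series: struct demo_event and demo_drain_events() are hypothetical names, and the fixed-size record layout is an assumption, since each real queue type would define its own record format.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed-size record; a real queue type defines its own layout. */
struct demo_event {
	uint32_t virt_id;
};

/*
 * Wait up to timeout_ms for fd to become readable, then read as many whole
 * records as fit into out[] with a single read().
 * Returns the number of records read, 0 on timeout, or -1 on error.
 */
static int demo_drain_events(int fd, struct demo_event *out, size_t max,
			     int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	ssize_t bytes;
	int ret;

	assert(out != NULL && max > 0);

	/* The "edge": the FD turns readable once events are queued. */
	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	if (ret == 0 || !(pfd.revents & POLLIN))
		return 0;

	/* The "queue": one read() may deliver several records at once. */
	bytes = read(fd, out, max * sizeof(*out));
	if (bytes < 0)
		return -1;
	return (int)(bytes / (ssize_t)sizeof(*out));
}
```

A VMM would typically call such a helper from its event loop and inject one virtual interrupt per batch rather than one per record.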
On Thu, Jan 02, 2025 at 04:45:07PM -0400, Jason Gunthorpe wrote:
On Tue, Dec 17, 2024 at 09:00:19PM -0800, Nicolin Chen wrote:
Allow a vIOMMU object to allocate vIRQ Event Queues, with a condition that each vIOMMU can only have one single vIRQ event queue per type.
I suggest you should tend to use the eventq as the primary naming not vIRQ, I think that will be a bit clearer.
The virq in the VM is edge triggered by an event queue FD becoming readable, but the event queue is the file descriptor that reports a batch of events on read().
The virq name evokes similarities to the virq in vfio which is purely about conveying if an IRQ edge has happened through an eventfd and has no event queue associated with it.
Ack. Since "Part-3: vEVENTQ" names one specific type of queue, I think Part-4 should likewise be "vCMDQ" rather than "vQUEUE".
Thanks Nicolin
On Tue, Dec 17, 2024 at 09:00:19PM -0800, Nicolin Chen wrote:
+/* An iommufd_virq_header packs a vIOMMU interrupt in an iommufd_virq queue */
+struct iommufd_virq_header {
+	struct list_head node;
+	ssize_t irq_len;
+	void *irq_data;
+};

Based on how it is used in iommufd_viommu_report_irq()

+	header = kzalloc(sizeof(*header) + irq_len, GFP_KERNEL);
+	header->irq_data = (void *)header + sizeof(*header);
+	memcpy(header->irq_data, irq_ptr, irq_len);

It should be a flex array and use the various flexarray tools

struct iommufd_virq_header {
	ssize_t irq_len;
	u64 irq_data[] __counted_by(irq_len);
}
Jason
On Thu, Jan 02, 2025 at 04:52:46PM -0400, Jason Gunthorpe wrote:
On Tue, Dec 17, 2024 at 09:00:19PM -0800, Nicolin Chen wrote:
+/* An iommufd_virq_header packs a vIOMMU interrupt in an iommufd_virq queue */
+struct iommufd_virq_header {
+	struct list_head node;
+	ssize_t irq_len;
+	void *irq_data;
+};

Based on how it is used in iommufd_viommu_report_irq()

+	header = kzalloc(sizeof(*header) + irq_len, GFP_KERNEL);
+	header->irq_data = (void *)header + sizeof(*header);
+	memcpy(header->irq_data, irq_ptr, irq_len);

It should be a flex array and use the various flexarray tools

struct iommufd_virq_header {
	ssize_t irq_len;
	u64 irq_data[] __counted_by(irq_len);
}

Changed to
-------------------------------------------------------------------------
/* An iommufd_vevent represents a vIOMMU event in an iommufd_veventq */
struct iommufd_vevent {
	struct list_head node;
	ssize_t data_len;
	u64 event_data[] __counted_by(data_len);
};
[...]
	vevent = kmalloc(struct_size(vevent, event_data, data_len), GFP_KERNEL);
-	header->irq_data = (void *)header + sizeof(*header);
-------------------------------------------------------------------------
Thanks Nicolin
On Thu, Jan 02, 2025 at 07:30:21PM -0800, Nicolin Chen wrote:
On Thu, Jan 02, 2025 at 04:52:46PM -0400, Jason Gunthorpe wrote:
On Tue, Dec 17, 2024 at 09:00:19PM -0800, Nicolin Chen wrote:
+/* An iommufd_virq_header packs a vIOMMU interrupt in an iommufd_virq queue */
+struct iommufd_virq_header {
+	struct list_head node;
+	ssize_t irq_len;
+	void *irq_data;
+};

Based on how it is used in iommufd_viommu_report_irq()

+	header = kzalloc(sizeof(*header) + irq_len, GFP_KERNEL);
+	header->irq_data = (void *)header + sizeof(*header);
+	memcpy(header->irq_data, irq_ptr, irq_len);

It should be a flex array and use the various flexarray tools

struct iommufd_virq_header {
	ssize_t irq_len;
	u64 irq_data[] __counted_by(irq_len);
}

Changed to

/* An iommufd_vevent represents a vIOMMU event in an iommufd_veventq */
struct iommufd_vevent {
	struct list_head node;
	ssize_t data_len;
	u64 event_data[] __counted_by(data_len);
};
[...]
	vevent = kmalloc(struct_size(vevent, event_data, data_len), GFP_KERNEL);
-	header->irq_data = (void *)header + sizeof(*header);
Yeah, that's right
Jason
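For readers less familiar with the flexible-array pattern discussed above, the following is a user space sketch of the same idea, not the kernel code. struct demo_vevent and demo_vevent_alloc() are hypothetical names; the explicit overflow check stands in for the kernel's struct_size(), and __counted_by() is omitted because it is a kernel/compiler annotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User space stand-in for the reworked event: the payload trails the header. */
struct demo_vevent {
	size_t data_len;
	unsigned char event_data[]; /* flexible array member */
};

/* Allocate one event holding a copy of data; NULL on size overflow or OOM. */
static struct demo_vevent *demo_vevent_alloc(const void *data, size_t data_len)
{
	struct demo_vevent *ev;

	assert(data != NULL);

	/*
	 * struct_size() in the kernel saturates on overflow so that the
	 * allocation fails; this explicit check has the same effect here.
	 */
	if (data_len > SIZE_MAX - sizeof(*ev))
		return NULL;

	ev = malloc(sizeof(*ev) + data_len);
	if (!ev)
		return NULL;

	/* No separate data pointer to fix up: the array is part of the object. */
	ev->data_len = data_len;
	memcpy(ev->event_data, data, data_len);
	return ev;
}
```

Compared with the "void *irq_data" version, there is no pointer to initialize by hand, and the header and payload are freed with a single free().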
This is the reverse lookup of iommufd_viommu_find_dev, as drivers may want to convert a (physical) struct device pointer to its virtual device ID when injecting an event into the user space VM.

Again, this avoids exposing more core structures to the drivers than the iommufd_viommu alone.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h        |  8 ++++++++
 drivers/iommu/iommufd/driver.c | 20 ++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index b082676c9e43..ac1f1897d290 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -190,6 +190,8 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     enum iommufd_object_type type);
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 				       unsigned long vdev_id);
+unsigned long iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+					 struct device *dev);
 #else /* !CONFIG_IOMMUFD_DRIVER_CORE */
 static inline struct iommufd_object *
 _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
@@ -203,6 +205,12 @@ iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
 {
 	return NULL;
 }
+
+static inline unsigned long
+iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu, struct device *dev)
+{
+	return 0;
+}
 #endif /* CONFIG_IOMMUFD_DRIVER_CORE */
 
 /*
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 2d98b04ff1cb..e5d7397c0a6c 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -49,5 +49,25 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, "IOMMUFD");
 
+/* Return 0 if device is not associated to the vIOMMU */
+unsigned long iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+					 struct device *dev)
+{
+	struct iommufd_vdevice *vdev;
+	unsigned long vdev_id = 0;
+	unsigned long index;
+
+	xa_lock(&viommu->vdevs);
+	xa_for_each(&viommu->vdevs, index, vdev) {
+		if (vdev && vdev->dev == dev) {
+			vdev_id = (unsigned long)vdev->id;
+			break;
+		}
+	}
+	xa_unlock(&viommu->vdevs);
+	return vdev_id;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_viommu_get_vdev_id, "IOMMUFD");
+
 MODULE_DESCRIPTION("iommufd code shared with builtin modules");
 MODULE_LICENSE("GPL");
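The reverse lookup in this patch can be pictured with a small user space sketch. This is only an illustration under simplifying assumptions: a plain array replaces the xarray, there is no locking, and struct demo_vdev and demo_get_vdev_id() are hypothetical names that do not exist in iommufd.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry pairing a physical device with its virtual ID. */
struct demo_vdev {
	const void *dev;  /* stands in for struct device * */
	unsigned long id; /* per-vIOMMU virtual device ID */
};

/*
 * Walk the table and return the virtual ID of the entry whose device pointer
 * matches dev, or 0 if the device is not in the table.
 */
static unsigned long demo_get_vdev_id(const struct demo_vdev *tbl, size_t n,
				      const void *dev)
{
	size_t i;

	assert(tbl != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (tbl[i].dev == dev)
			return tbl[i].id;
	}
	return 0;
}
```

The forward direction (virtual ID to device) can index the container directly, while this direction has to scan every entry.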
On 12/18/24 13:00, Nicolin Chen wrote:
This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want to convert a struct device pointer (physical) to its virtual device ID for an event injection to the user space VM.
Again, this avoids exposing more core structures to the drivers, than the iommufd_viommu alone.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h        |  8 ++++++++
 drivers/iommu/iommufd/driver.c | 20 ++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index b082676c9e43..ac1f1897d290 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -190,6 +190,8 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     enum iommufd_object_type type);
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 				       unsigned long vdev_id);
+unsigned long iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+					 struct device *dev);
Hi Nicolin,
This series overall looks good to me. But I have a question that might be irrelevant to this series itself.
The iommufd provides both IOMMUFD_OBJ_DEVICE and IOMMUFD_OBJ_VDEVICE objects. What is the essential difference between these two from userspace's perspective? And, which object ID should the IOMMU device driver provide when reporting other events in the future?
Currently, the IOMMUFD uAPI reports IOMMUFD_OBJ_DEVICE in the page fault message, and IOMMUFD_OBJ_VDEVICE (if I understand it correctly) in the vIRQ message. It will be more future-proof if this could be defined clearly.
Thanks, baolu
On Thu, Dec 19, 2024 at 10:05:53AM +0800, Baolu Lu wrote:
On 12/18/24 13:00, Nicolin Chen wrote:
This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want to convert a struct device pointer (physical) to its virtual device ID for an event injection to the user space VM.
Again, this avoids exposing more core structures to the drivers, than the iommufd_viommu alone.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h        |  8 ++++++++
 drivers/iommu/iommufd/driver.c | 20 ++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index b082676c9e43..ac1f1897d290 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -190,6 +190,8 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     enum iommufd_object_type type);
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 				       unsigned long vdev_id);
+unsigned long iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+					 struct device *dev);
Hi Nicolin,
This series overall looks good to me. But I have a question that might be irrelevant to this series itself.
The iommufd provides both IOMMUFD_OBJ_DEVICE and IOMMUFD_OBJ_VDEVICE objects. What is the essential difference between these two from userspace's perspective?
A quick answer: an IOMMUFD_OBJ_DEVICE is a host physical device, and an IOMMUFD_OBJ_VDEVICE is an IOMMUFD_OBJ_DEVICE associated with an IOMMUFD_OBJ_VIOMMU. The two sit in two different layers. You may refer to this graph: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Docu...
And, which object ID should the IOMMU device driver provide when reporting other events in the future?
Currently, the IOMMUFD uAPI reports IOMMUFD_OBJ_DEVICE in the page fault message, and IOMMUFD_OBJ_VDEVICE (if I understand it correctly) in the vIRQ message. It will be more future-proof if this could be defined clearly.
A vIRQ is actually reported per-vIOMMU in this design. Although in this series the SMMU driver seems to report a per-device vIRQ, it internally converts the vDEVICE to a virtual device ID and packs the virtual device ID into a per-vIOMMU event:

+/**
+ * struct iommu_virq_arm_smmuv3 - ARM SMMUv3 Virtual IRQ
+ *                                (IOMMU_VIRQ_TYPE_ARM_SMMUV3)
+ * @evt: 256-bit ARM SMMUv3 Event record, little-endian.
+ *       (Refer to "7.3 Event records" in SMMUv3 HW Spec)
+ *
+ * StreamID field reports a virtual device ID. To receive a virtual IRQ for a
+ * device, a vDEVICE must be allocated via IOMMU_VDEVICE_ALLOC.
+ */
+struct iommu_virq_arm_smmuv3 {
+	__aligned_le64 evt[4];
+};
Thanks Nicolin
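As a rough illustration of packing a virtual device ID into the event record, the sketch below overwrites the StreamID field of a 4 x 64-bit record. It rests on two assumptions that should be checked against the SMMUv3 specification: that the StreamID occupies bits [63:32] of the first 64-bit word, and that the value is handled in host byte order (a real driver would convert to little-endian before copying to user space). The DEMO_* macros and demo_evt_set_vsid() are hypothetical names, not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the StreamID field in the first 64-bit word. */
#define DEMO_EVT0_SID_SHIFT 32
#define DEMO_EVT0_SID_MASK  (0xffffffffULL << DEMO_EVT0_SID_SHIFT)

/*
 * Replace the (physical) StreamID in an event record with a virtual one,
 * leaving every other bit of the record untouched.
 */
static void demo_evt_set_vsid(uint64_t evt[4], uint32_t vsid)
{
	assert(evt != NULL);

	evt[0] &= ~DEMO_EVT0_SID_MASK;
	evt[0] |= (uint64_t)vsid << DEMO_EVT0_SID_SHIFT;
}
```

With this rewrite the guest sees the same stream identifier it programmed into its virtual stream table, not the host's physical one.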
On 12/19/24 13:06, Nicolin Chen wrote:
On Thu, Dec 19, 2024 at 10:05:53AM +0800, Baolu Lu wrote:
On 12/18/24 13:00, Nicolin Chen wrote:
This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want to convert a struct device pointer (physical) to its virtual device ID for an event injection to the user space VM.
Again, this avoids exposing more core structures to the drivers, than the iommufd_viommu alone.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h        |  8 ++++++++
 drivers/iommu/iommufd/driver.c | 20 ++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index b082676c9e43..ac1f1897d290 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -190,6 +190,8 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     enum iommufd_object_type type);
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 				       unsigned long vdev_id);
+unsigned long iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+					 struct device *dev);
Hi Nicolin,
This series overall looks good to me. But I have a question that might be irrelevant to this series itself.
The iommufd provides both IOMMUFD_OBJ_DEVICE and IOMMUFD_OBJ_VDEVICE objects. What is the essential difference between these two from userspace's perspective?
A quick answer: an IOMMUFD_OBJ_DEVICE is a host physical device, and an IOMMUFD_OBJ_VDEVICE is an IOMMUFD_OBJ_DEVICE associated with an IOMMUFD_OBJ_VIOMMU. The two sit in two different layers. You may refer to this graph: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/userspace-api/iommufd.rst?h=v6.13-rc3#n150
And, which object ID should the IOMMU device driver provide when reporting other events in the future?
Currently, the IOMMUFD uAPI reports IOMMUFD_OBJ_DEVICE in the page fault message, and IOMMUFD_OBJ_VDEVICE (if I understand it correctly) in the vIRQ message. It will be more future-proof if this could be defined clearly.
A vIRQ is actually reported per-vIOMMU in this design. Although in this series the SMMU driver seems to report a per-device vIRQ, it internally converts the vDEVICE to a virtual device ID and packs the virtual device ID into a per-vIOMMU event:

+/**
+ * struct iommu_virq_arm_smmuv3 - ARM SMMUv3 Virtual IRQ
+ *                                (IOMMU_VIRQ_TYPE_ARM_SMMUV3)
+ * @evt: 256-bit ARM SMMUv3 Event record, little-endian.
+ *       (Refer to "7.3 Event records" in SMMUv3 HW Spec)
+ *
+ * StreamID field reports a virtual device ID. To receive a virtual IRQ for a
+ * device, a vDEVICE must be allocated via IOMMU_VDEVICE_ALLOC.
+ */
+struct iommu_virq_arm_smmuv3 {
+	__aligned_le64 evt[4];
+};
Thanks for the explanation. Maybe I am a bit over-considering here.
Initially, my understanding is to report a virtual device ID when the object originates from a vIOMMU, and an iommufd device ID otherwise.
However, considering page fault scenarios, which are self-contained but linked to a hardware page table (hwpt), introduces ambiguity. Hwpt can be created with or without a vIOMMU. This raises the question: should the page fault message always report the iommufd device ID, or should the reporting depend on whether the hwpt was created from a vIOMMU?
Thanks, baolu
On Mon, Dec 23, 2024 at 10:28:32AM +0800, Baolu Lu wrote:
On 12/19/24 13:06, Nicolin Chen wrote:
On Thu, Dec 19, 2024 at 10:05:53AM +0800, Baolu Lu wrote:
On 12/18/24 13:00, Nicolin Chen wrote:
This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want to convert a struct device pointer (physical) to its virtual device ID for an event injection to the user space VM.
Again, this avoids exposing more core structures to the drivers, than the iommufd_viommu alone.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h        |  8 ++++++++
 drivers/iommu/iommufd/driver.c | 20 ++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index b082676c9e43..ac1f1897d290 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -190,6 +190,8 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     enum iommufd_object_type type);
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 				       unsigned long vdev_id);
+unsigned long iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+					 struct device *dev);
Hi Nicolin,
This series overall looks good to me. But I have a question that might be irrelevant to this series itself.
The iommufd provides both IOMMUFD_OBJ_DEVICE and IOMMUFD_OBJ_VDEVICE objects. What is the essential difference between these two from userspace's perspective?
A quick answer: an IOMMUFD_OBJ_DEVICE is a host physical device, and an IOMMUFD_OBJ_VDEVICE is an IOMMUFD_OBJ_DEVICE associated with an IOMMUFD_OBJ_VIOMMU. The two sit in two different layers. You may refer to this graph: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/userspace-api/iommufd.rst?h=v6.13-rc3#n150
And, which object ID should the IOMMU device driver provide when reporting other events in the future?
Currently, the IOMMUFD uAPI reports IOMMUFD_OBJ_DEVICE in the page fault message, and IOMMUFD_OBJ_VDEVICE (if I understand it correctly) in the vIRQ message. It will be more future-proof if this could be defined clearly.
A vIRQ is actually reported per-vIOMMU in this design. Although in this series the SMMU driver seems to report a per-device vIRQ, it internally converts the vDEVICE to a virtual device ID and packs the virtual device ID into a per-vIOMMU event:

+/**
+ * struct iommu_virq_arm_smmuv3 - ARM SMMUv3 Virtual IRQ
+ *                                (IOMMU_VIRQ_TYPE_ARM_SMMUV3)
+ * @evt: 256-bit ARM SMMUv3 Event record, little-endian.
+ *       (Refer to "7.3 Event records" in SMMUv3 HW Spec)
+ *
+ * StreamID field reports a virtual device ID. To receive a virtual IRQ for a
+ * device, a vDEVICE must be allocated via IOMMU_VDEVICE_ALLOC.
+ */
+struct iommu_virq_arm_smmuv3 {
+	__aligned_le64 evt[4];
+};
Thanks for the explanation. Maybe I am a bit over-considering here.
Initially, my understanding is to report a virtual device ID when the object originates from a vIOMMU, and an iommufd device ID otherwise.
However, considering page fault scenarios, which are self-contained but linked to a hardware page table (hwpt), introduces ambiguity. Hwpt can be created with or without a vIOMMU. This raises the question: should the page fault message always report the iommufd device ID, or should the reporting depend on whether the hwpt was created from a vIOMMU?
As you mentioned, a HWPT itself can report IO page faults regardless of whether it is vIOMMU-based or not, i.e. it should work fine with either a HWPT-based model or a vIOMMU-based model.

On the other hand, I think vIRQ can be seen as just a supplementary pathway to report non-HWPT faults, e.g. in arm-smmu-v3's interrupt handler, the logic is:
	if (pri_is_supported && fault_is_iopgfault)
		report via hwpt->fault;
	else if (virq_is_registered && fault_is_virq)
		report via virq;
	else
		print an unhandled irq;
Thanks Nicolin
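The routing described above can be written down as a tiny decision function. This is only a sketch of the stated logic: enum demo_route and demo_route_event() are hypothetical names, and the four booleans are assumed to be computed elsewhere by the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Where an event ends up, following the three branches described above. */
enum demo_route {
	DEMO_ROUTE_HWPT_FAULT, /* IO page fault, reported via hwpt->fault */
	DEMO_ROUTE_VIRQ,       /* non-HWPT fault, reported via the vIRQ queue */
	DEMO_ROUTE_UNHANDLED,  /* nobody is listening: log an unhandled IRQ */
};

static enum demo_route demo_route_event(bool pri_is_supported,
					bool fault_is_iopgfault,
					bool virq_is_registered,
					bool fault_is_virq)
{
	/* The HWPT fault path takes priority when it applies. */
	if (pri_is_supported && fault_is_iopgfault)
		return DEMO_ROUTE_HWPT_FAULT;
	/* Otherwise fall back to the vIOMMU's queue, if user space set one up. */
	if (virq_is_registered && fault_is_virq)
		return DEMO_ROUTE_VIRQ;
	return DEMO_ROUTE_UNHANDLED;
}
```

The ordering reflects that the vIRQ path is supplementary: it only sees what the HWPT fault path does not claim.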
On Mon, Dec 23, 2024 at 10:28:32AM +0800, Baolu Lu wrote:
However, considering page fault scenarios, which are self-contained but linked to a hardware page table (hwpt), introduces ambiguity. Hwpt can be created with or without a vIOMMU. This raises the question: should the page fault message always report the iommufd device ID, or should the reporting depend on whether the hwpt was created from a vIOMMU?
I think every single event record read from the FD needs to clearly specify what its fields are.
Page fault needs to clearly say its field is a device ID.
Jason
On 1/3/25 04:29, Jason Gunthorpe wrote:
On Mon, Dec 23, 2024 at 10:28:32AM +0800, Baolu Lu wrote:
However, considering page fault scenarios, which are self-contained but linked to a hardware page table (hwpt), introduces ambiguity. Hwpt can be created with or without a vIOMMU. This raises the question: should the page fault message always report the iommufd device ID, or should the reporting depend on whether the hwpt was created from a vIOMMU?
I think every single event record read from the FD needs to clearly specify what its fields are.
That would work.
Page fault needs to clearly say its field is a device ID.
Each field of fault message has been specified in uapi/linux/iommufd.h.
--- baolu
Similar to iommu_report_device_fault, this allows IOMMU drivers to report, from threaded IRQ handlers to user space hypervisors, IRQs or events that belong to a vIOMMU.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h        |  9 +++++++++
 drivers/iommu/iommufd/driver.c | 37 ++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index ac1f1897d290..c5909125775a 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -192,6 +192,8 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 				       unsigned long vdev_id);
 unsigned long iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
 					 struct device *dev);
+int iommufd_viommu_report_irq(struct iommufd_viommu *viommu, unsigned int type,
+			      void *irq_ptr, size_t irq_len);
 #else /* !CONFIG_IOMMUFD_DRIVER_CORE */
 static inline struct iommufd_object *
 _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
@@ -211,6 +213,13 @@ iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu, struct device *dev)
 {
 	return 0;
 }
+
+static inline int iommufd_viommu_report_irq(struct iommufd_viommu *viommu,
+					    unsigned int type, void *irq_ptr,
+					    size_t irq_len)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_IOMMUFD_DRIVER_CORE */
 
 /*
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index e5d7397c0a6c..2ab793f27f72 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -69,5 +69,42 @@ unsigned long iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_viommu_get_vdev_id, "IOMMUFD");
 
+/* Typically called in driver's threaded IRQ handler */
+int iommufd_viommu_report_irq(struct iommufd_viommu *viommu, unsigned int type,
+			      void *irq_ptr, size_t irq_len)
+{
+	struct iommufd_virq_header *header;
+	struct iommufd_virq *virq;
+	int rc = 0;
+
+	if (!viommu)
+		return -ENODEV;
+	if (WARN_ON_ONCE(!irq_len || !irq_ptr))
+		return -EINVAL;
+
+	down_read(&viommu->virqs_rwsem);
+
+	virq = iommufd_viommu_find_virq(viommu, type);
+	if (!virq) {
+		rc = -EOPNOTSUPP;
+		goto out_unlock_virqs;
+	}
+
+	header = kzalloc(sizeof(*header) + irq_len, GFP_KERNEL);
+	if (!header) {
+		rc = -ENOMEM;
+		goto out_unlock_virqs;
+	}
+	header->irq_data = (void *)header + sizeof(*header);
+	memcpy(header->irq_data, irq_ptr, irq_len);
+	header->irq_len = irq_len;
+
+	iommufd_virq_handler(virq, header);
+out_unlock_virqs:
+	up_read(&viommu->virqs_rwsem);
+	return rc;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_viommu_report_irq, "IOMMUFD");
+
 MODULE_DESCRIPTION("iommufd code shared with builtin modules");
 MODULE_LICENSE("GPL");
When attaching a device to a vIOMMU-based nested domain, vdev_id must be present. Add code that strictly requires it, in preparation for vIRQ support in the following patch. Then, update the TEST_F.

A HWPT-based nested domain will return a NULL new_viommu, so there is no such vDEVICE requirement.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/selftest.c        | 23 +++++++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c |  5 +++++
 2 files changed, 28 insertions(+)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index a0de6d6d4e68..d1438d81e664 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -161,7 +161,10 @@ enum selftest_obj_type {
struct mock_dev { struct device dev; + struct mock_viommu *viommu; + struct rw_semaphore viommu_rwsem; unsigned long flags; + unsigned long vdev_id; int id; u32 cache[MOCK_DEV_CACHE_NUM]; }; @@ -193,10 +196,29 @@ static int mock_domain_nop_attach(struct iommu_domain *domain, struct device *dev) { struct mock_dev *mdev = to_mock_dev(dev); + struct mock_viommu *new_viommu = NULL; + unsigned long vdev_id = 0;
if (domain->dirty_ops && (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY)) return -EINVAL;
+ iommu_group_mutex_assert(dev); + if (domain->type == IOMMU_DOMAIN_NESTED) { + new_viommu = to_mock_nested(domain)->mock_viommu; + if (new_viommu) { + vdev_id = iommufd_viommu_get_vdev_id(&new_viommu->core, + dev); + if (!vdev_id) + return -ENOENT; + } + } + if (new_viommu != mdev->viommu) { + down_write(&mdev->viommu_rwsem); + mdev->viommu = new_viommu; + mdev->vdev_id = vdev_id; + up_write(&mdev->viommu_rwsem); + } + return 0; }
@@ -861,6 +883,7 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags) if (!mdev) return ERR_PTR(-ENOMEM);
+ init_rwsem(&mdev->viommu_rwsem); device_initialize(&mdev->dev); mdev->flags = dev_flags; mdev->dev.release = mock_dev_release; diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index a1b2b657999d..212e5d62e13d 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -2736,6 +2736,7 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf) uint32_t iopf_hwpt_id; uint32_t fault_id; uint32_t fault_fd; + uint32_t vdev_id;
if (self->device_id) { test_ioctl_fault_alloc(&fault_id, &fault_fd); @@ -2752,6 +2753,10 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf) &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+ /* Must allocate vdevice before attaching to a nested hwpt */ + test_err_mock_domain_replace(ENOENT, self->stdev_id, + iopf_hwpt_id); + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id); EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, iopf_hwpt_id));
The handler will get vDEVICE object from the given mdev and convert it to its per-vIOMMU virtual ID to mimic a real IOMMU driver.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_test.h | 10 ++++++++++
 drivers/iommu/iommufd/selftest.c     | 30 ++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index a6b7a163f636..3037904f2e52 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -24,6 +24,7 @@ enum { IOMMU_TEST_OP_MD_CHECK_IOTLB, IOMMU_TEST_OP_TRIGGER_IOPF, IOMMU_TEST_OP_DEV_CHECK_CACHE, + IOMMU_TEST_OP_TRIGGER_VIRQ, };
enum { @@ -145,6 +146,9 @@ struct iommu_test_cmd { __u32 id; __u32 cache; } check_dev_cache; + struct { + __u32 dev_id; + } trigger_virq; }; __u32 last; }; @@ -212,4 +216,10 @@ struct iommu_viommu_invalidate_selftest { __u32 cache_id; };
+#define IOMMU_VIRQ_TYPE_SELFTEST 0xbeefbeef + +struct iommu_viommu_irq_selftest { + __u32 virt_id; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index d1438d81e664..0785c9447102 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -1631,6 +1631,34 @@ static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd, return 0; }
+static int iommufd_test_trigger_virq(struct iommufd_ucmd *ucmd, + struct iommu_test_cmd *cmd) +{ + struct iommu_viommu_irq_selftest test = {}; + struct iommufd_device *idev; + struct mock_dev *mdev; + int rc = -ENOENT; + + idev = iommufd_get_device(ucmd, cmd->trigger_virq.dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + mdev = to_mock_dev(idev->dev); + + down_read(&mdev->viommu_rwsem); + if (!mdev->viommu || !mdev->vdev_id) + goto out_unlock; + + test.virt_id = mdev->vdev_id; + rc = iommufd_viommu_report_irq(&mdev->viommu->core, + IOMMU_VIRQ_TYPE_SELFTEST, &test, + sizeof(test)); +out_unlock: + up_read(&mdev->viommu_rwsem); + iommufd_put_object(ucmd->ictx, &idev->obj); + + return rc; +} + void iommufd_selftest_destroy(struct iommufd_object *obj) { struct selftest_obj *sobj = to_selftest_obj(obj); @@ -1712,6 +1740,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd) cmd->dirty.flags); case IOMMU_TEST_OP_TRIGGER_IOPF: return iommufd_test_trigger_iopf(ucmd, cmd); + case IOMMU_TEST_OP_TRIGGER_VIRQ: + return iommufd_test_trigger_virq(ucmd, cmd); default: return -EOPNOTSUPP; }
Trigger a vIRQ for a given idev ID to test the loopback: the event read back from the vIRQ FD must carry the vdev_id that was set for that idev by the earlier test_cmd_vdevice_alloc() call.
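The user-space half of this loopback can be sketched as below. This is only an illustration, not the selftest helper itself: demo_read_virq() and struct demo_virq_selftest are hypothetical names, and any readable FD (a pipe in a quick experiment) can stand in for the vIRQ FD. The sketch also treats a poll() timeout as an error (-ETIMEDOUT), which is an assumption about the desired behavior.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for struct iommu_viommu_irq_selftest */
struct demo_virq_selftest {
	uint32_t virt_id;
};

/*
 * Wait up to timeout_ms for one record on event_fd and return its
 * virt_id through *virt_id. Returns 0 on success or a negative errno.
 */
static int demo_read_virq(int event_fd, int timeout_ms, uint32_t *virt_id)
{
	struct pollfd pfd = { .fd = event_fd, .events = POLLIN };
	struct demo_virq_selftest irq;
	ssize_t bytes;
	int ret;

	assert(virt_id);

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -ETIMEDOUT;	/* nothing was reported in time */

	bytes = read(event_fd, &irq, sizeof(irq));
	if (bytes != (ssize_t)sizeof(irq))
		return -EIO;		/* failed or short read */

	*virt_id = irq.virt_id;
	return 0;
}
```

A caller would compare the returned virt_id against the virtual ID it passed when allocating the vDEVICE, e.g. 0x99 or 0x88 in the vdevice_alloc test.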
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 tools/testing/selftests/iommu/iommufd_utils.h | 63 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 22 +++++++
 .../selftests/iommu/iommufd_fail_nth.c        |  6 ++
 3 files changed, 91 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index d979f5b0efe8..9f03955cb198 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -9,6 +9,7 @@ #include <sys/ioctl.h> #include <stdint.h> #include <assert.h> +#include <poll.h>
#include "../kselftest_harness.h" #include "../../../../drivers/iommu/iommufd/iommufd_test.h" @@ -936,3 +937,65 @@ static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id, EXPECT_ERRNO(_errno, \ _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ virt_id, vdev_id)) + +static int _test_cmd_virq_alloc(int fd, __u32 viommu_id, __u32 type, + __u32 *virq_id, __u32 *virq_fd) +{ + struct iommu_virq_alloc cmd = { + .size = sizeof(cmd), + .type = type, + .viommu_id = viommu_id, + }; + int ret; + + ret = ioctl(fd, IOMMU_VIRQ_ALLOC, &cmd); + if (ret) + return ret; + if (virq_id) + *virq_id = cmd.out_virq_id; + if (virq_fd) + *virq_fd = cmd.out_virq_fd; + return 0; +} + +#define test_cmd_virq_alloc(viommu_id, type, virq_id, virq_fd) \ + ASSERT_EQ(0, _test_cmd_virq_alloc(self->fd, viommu_id, type, virq_id, \ + virq_fd)) +#define test_err_virq_alloc(_errno, viommu_id, type, virq_id, virq_fd) \ + EXPECT_ERRNO(_errno, _test_cmd_virq_alloc(self->fd, viommu_id, type, \ + virq_id, virq_fd)) + +static int _test_cmd_trigger_virq(int fd, __u32 dev_id, __u32 event_fd, + __u32 virt_id) +{ + struct iommu_test_cmd trigger_virq_cmd = { + .size = sizeof(trigger_virq_cmd), + .op = IOMMU_TEST_OP_TRIGGER_VIRQ, + .trigger_virq = { + .dev_id = dev_id, + }, + }; + struct pollfd pollfd = { .fd = event_fd, .events = POLLIN }; + struct iommu_viommu_irq_selftest irq; + ssize_t bytes; + int ret; + + ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VIRQ), + &trigger_virq_cmd); + if (ret) + return ret; + + ret = poll(&pollfd, 1, 1000); + if (ret < 0) + return ret; + + bytes = read(event_fd, &irq, sizeof(irq)); + if (bytes <= 0) + return -EIO; + + return irq.virt_id == virt_id ? 
0 : -EINVAL; +} + +#define test_cmd_trigger_virq(dev_id, event_fd, vdev_id) \ + ASSERT_EQ(0, \ + _test_cmd_trigger_virq(self->fd, dev_id, event_fd, vdev_id)) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 212e5d62e13d..b15ebc963e56 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -2774,15 +2774,37 @@ TEST_F(iommufd_viommu, vdevice_alloc) uint32_t viommu_id = self->viommu_id; uint32_t dev_id = self->device_id; uint32_t vdev_id = 0; + uint32_t virq_id; + uint32_t virq_fd;
if (dev_id) { + /* Must allocate vdevice before attaching to a nested hwpt */ + test_err_mock_domain_replace(ENOENT, self->stdev_id, + self->nested_hwpt_id); + + test_cmd_virq_alloc(viommu_id, IOMMU_VIRQ_TYPE_SELFTEST, + &virq_id, &virq_fd); + test_err_virq_alloc(EEXIST, viommu_id, IOMMU_VIRQ_TYPE_SELFTEST, + NULL, NULL); /* Set vdev_id to 0x99, unset it, and set to 0x88 */ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + test_cmd_mock_domain_replace(self->stdev_id, + self->nested_hwpt_id); + test_cmd_trigger_virq(dev_id, virq_fd, 0x99); test_err_vdevice_alloc(EEXIST, viommu_id, dev_id, 0x99, &vdev_id); + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); test_ioctl_destroy(vdev_id); + + /* Try again with 0x88 */ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id); + test_cmd_mock_domain_replace(self->stdev_id, + self->nested_hwpt_id); + test_cmd_trigger_virq(dev_id, virq_fd, 0x88); + close(virq_fd); + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); test_ioctl_destroy(vdev_id); + test_ioctl_destroy(virq_id); } else { test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL); } diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index 64b1f8e1b0cf..442442de3a75 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -620,6 +620,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) }; struct iommu_test_hw_info info; uint32_t fault_id, fault_fd; + uint32_t virq_id, virq_fd; uint32_t fault_hwpt_id; uint32_t ioas_id; uint32_t ioas_id2; @@ -692,6 +693,11 @@ TEST_FAIL_NTH(basic_fail_nth, device) IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data))) return -1;
+ if (_test_cmd_virq_alloc(self->fd, viommu_id, IOMMU_VIRQ_TYPE_SELFTEST, + &virq_id, &virq_fd)) + return -1; + close(virq_fd); + return 0; }
With the introduction of the new objects, update the documentation to describe them.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 Documentation/userspace-api/iommufd.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index 70289d6815d2..5b4ef5d74fd1 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -63,6 +63,13 @@ Following IOMMUFD objects are exposed to userspace: space usually has mappings from guest-level I/O virtual addresses to guest- level physical addresses.
+- IOMMUFD_FAULT, representing a software queue for an HWPT reporting IO page + faults using the IOMMU HW's PRI (Page Request Interface). This queue object + provides user space an FD to poll the page fault events and also to respond + to those events. A FAULT object must be created first to get a fault_id that + could be then used to allocate a fault-enabled HWPT via the IOMMU_HWPT_ALLOC + command by setting the IOMMU_HWPT_FAULT_ID_VALID bit in its flags field. + - IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance, passed to or shared with a VM. It may be some HW-accelerated virtualization features and some SW resources used by the VM. For examples: @@ -109,6 +116,13 @@ Following IOMMUFD objects are exposed to userspace: vIOMMU, which is a separate ioctl call from attaching the same device to an HWPT_PAGING that the vIOMMU holds.
+- IOMMUFD_OBJ_VIRQ, representing a software queue for a vIOMMU reporting events + such as translation faults occurring on a nested stage-1 and HW-specific IRQs. + This queue object provides user space an FD to poll the vIOMMU events/vIRQs. + A vIOMMU object must be created first to get its viommu_id that could be then + used to allocate a VIRQ. Each vIOMMU can support multiple types of VIRQs, but + is confined to one VIRQ per vIRQ type. + All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel @@ -251,8 +265,10 @@ User visible objects are backed by following datastructures: - iommufd_device for IOMMUFD_OBJ_DEVICE. - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING. - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED. +- iommufd_fault for IOMMUFD_OBJ_FAULT. - iommufd_viommu for IOMMUFD_OBJ_VIOMMU. - iommufd_vdevice for IOMMUFD_OBJ_VDEVICE. +- iommufd_virq for IOMMUFD_OBJ_VIRQ.
Several terminologies when looking at these datastructures:
Introduce a struct arm_smmu_vmaster to store all vSMMU-related data of a master. The vSID (Virtual Stream ID) is the first use case. Also add a rw_semaphore (vmaster_rwsem) to protect the master->vmaster pointer.
Also add a pair of arm_smmu_attach_prepare_vmaster() and arm_smmu_attach_commit_vmaster() helpers to set or unset the master->vmaster pointer, and call them from the existing arm_smmu_attach_prepare() and arm_smmu_attach_commit(). Note that the identity and blocked ops don't call arm_smmu_attach_prepare/commit(), so call the new helpers directly at the top of those ops. This way, a device attaching to an identity/blocked domain unsets master->vmaster when it moves away from a nested domain.
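The prepare/commit split described above can be sketched in plain user-space C. This is a simplified model under stated assumptions, not the driver code: all demo_* names are hypothetical, calloc()/free() stand in for kzalloc()/kfree(), the "nested" flag stands in for the domain-type and vSTE checks, and the rw_semaphore that the driver takes around the pointer swap is only noted in a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures */
struct demo_vmaster {
	unsigned long vsid;
};

struct demo_master {
	struct demo_vmaster *vmaster;
};

struct demo_attach_state {
	struct demo_master *master;
	struct demo_vmaster *vmaster;	/* resulting state */
};

/* Prepare: may fail, and only touches the attach state */
static int demo_prepare_vmaster(struct demo_attach_state *state,
				int nested, unsigned long vsid)
{
	struct demo_vmaster *vmaster;

	assert(state && state->master);

	state->vmaster = NULL;
	if (!nested)
		return 0;	/* a NULL result makes commit unset the pointer */
	if (!vsid)
		return -ENOENT;	/* no virtual ID was set for this device */

	vmaster = calloc(1, sizeof(*vmaster));
	if (!vmaster)
		return -ENOMEM;
	vmaster->vsid = vsid;
	state->vmaster = vmaster;
	return 0;
}

/* Commit: cannot fail, publishes the prepared pointer */
static void demo_commit_vmaster(struct demo_attach_state *state)
{
	struct demo_master *master = state->master;

	/* The driver holds master->vmaster_rwsem for writing here */
	if (state->vmaster != master->vmaster) {
		free(master->vmaster);
		master->vmaster = state->vmaster;
	}
}
```

Keeping every failure in the prepare step means the commit step never has to roll back, which is why the identity and blocked paths can call both helpers back to back.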
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 23 +++++++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 49 +++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 32 +++++++++++-
 3 files changed, 103 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index bd9d7c85576a..4435ad7db776 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -799,6 +799,11 @@ struct arm_smmu_stream { struct rb_node node; };
+struct arm_smmu_vmaster { + struct arm_vsmmu *vsmmu; + unsigned long vsid; +}; + struct arm_smmu_event { u8 stall : 1, ssv : 1, @@ -824,6 +829,8 @@ struct arm_smmu_master { struct arm_smmu_device *smmu; struct device *dev; struct arm_smmu_stream *streams; + struct arm_smmu_vmaster *vmaster; + struct rw_semaphore vmaster_rwsem; /* Locked by the iommu core using the group mutex */ struct arm_smmu_ctx_desc_cfg cd_table; unsigned int num_streams; @@ -972,6 +979,7 @@ struct arm_smmu_attach_state { bool disable_ats; ioasid_t ssid; /* Resulting state */ + struct arm_smmu_vmaster *vmaster; bool ats_enabled; };
@@ -1055,9 +1063,24 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, struct iommu_domain *parent, struct iommufd_ctx *ictx, unsigned int viommu_type); +int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state, + struct iommu_domain *domain); +void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state); #else #define arm_smmu_hw_info NULL #define arm_vsmmu_alloc NULL + +static inline int +arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state, + struct iommu_domain *domain) +{ + return 0; /* NOP */ +} + +static inline void +arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state) +{ +} #endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
#endif /* _ARM_SMMU_V3_H */ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index c7cc613050d9..2b6253ef0e8f 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -85,6 +85,55 @@ static void arm_smmu_make_nested_domain_ste( } }
+int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state, + struct iommu_domain *domain) +{ + struct arm_smmu_nested_domain *nested_domain; + struct arm_smmu_vmaster *vmaster; + unsigned long vsid; + unsigned int cfg; + + iommu_group_mutex_assert(state->master->dev); + + if (domain->type != IOMMU_DOMAIN_NESTED) + return 0; + nested_domain = to_smmu_nested_domain(domain); + + /* Skip ABORT/BYPASS or invalid vSTE */ + cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0])); + if (cfg == STRTAB_STE_0_CFG_ABORT || cfg == STRTAB_STE_0_CFG_BYPASS) + return 0; + if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) + return 0; + + vsid = iommufd_viommu_get_vdev_id(&nested_domain->vsmmu->core, + state->master->dev); + /* Fail the attach if vSID is not correctly set by the user space */ + if (!vsid) + return -ENOENT; + + vmaster = kzalloc(sizeof(*vmaster), GFP_KERNEL); + if (!vmaster) + return -ENOMEM; + vmaster->vsmmu = nested_domain->vsmmu; + vmaster->vsid = vsid; + state->vmaster = vmaster; + + return 0; +} + +void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state) +{ + struct arm_smmu_master *master = state->master; + + down_write(&master->vmaster_rwsem); + if (state->vmaster != master->vmaster) { + kfree(master->vmaster); + master->vmaster = state->vmaster; + } + up_write(&master->vmaster_rwsem); +} + static int arm_smmu_attach_dev_nested(struct iommu_domain *domain, struct device *dev) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ea76f25c0661..686c171dd273 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2802,6 +2802,7 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(new_domain); unsigned long flags; + int ret;
/* * arm_smmu_share_asid() must not see two domains pointing to the same @@ -2831,9 +2832,15 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, }
if (smmu_domain) { + ret = arm_smmu_attach_prepare_vmaster(state, new_domain); + if (ret) + return ret; + master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL); - if (!master_domain) + if (!master_domain) { + kfree(state->vmaster); return -ENOMEM; + } master_domain->master = master; master_domain->ssid = state->ssid; if (new_domain->type == IOMMU_DOMAIN_NESTED) @@ -2860,6 +2867,7 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); kfree(master_domain); + kfree(state->vmaster); return -EINVAL; }
@@ -2892,6 +2900,8 @@ void arm_smmu_attach_commit(struct arm_smmu_attach_state *state)
lockdep_assert_held(&arm_smmu_asid_lock);
+ arm_smmu_attach_commit_vmaster(state); + if (state->ats_enabled && !master->ats_enabled) { arm_smmu_enable_ats(master); } else if (state->ats_enabled && master->ats_enabled) { @@ -3158,8 +3168,17 @@ static void arm_smmu_attach_dev_ste(struct iommu_domain *domain, static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, struct device *dev) { + int ret; struct arm_smmu_ste ste; struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_attach_state state = { + .master = master, + }; + + ret = arm_smmu_attach_prepare_vmaster(&state, domain); + if (ret) + return ret; + arm_smmu_attach_commit_vmaster(&state);
arm_smmu_make_bypass_ste(master->smmu, &ste); arm_smmu_attach_dev_ste(domain, dev, &ste, STRTAB_STE_1_S1DSS_BYPASS); @@ -3178,7 +3197,17 @@ static struct iommu_domain arm_smmu_identity_domain = { static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain, struct device *dev) { + int ret; struct arm_smmu_ste ste; + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_attach_state state = { + .master = master, + }; + + ret = arm_smmu_attach_prepare_vmaster(&state, domain); + if (ret) + return ret; + arm_smmu_attach_commit_vmaster(&state);
arm_smmu_make_abort_ste(&ste); arm_smmu_attach_dev_ste(domain, dev, &ste, @@ -3428,6 +3457,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
master->dev = dev; master->smmu = smmu; + init_rwsem(&master->vmaster_rwsem); dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master);
Aside from the IOPF framework, iommufd provides an additional pathway to report hardware events or IRQs: the vIRQ object of the vIOMMU infrastructure.
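How an event might be routed between the two pathways can be sketched as a small decision function. This is a simplified model of the dispatch in the event handler, assuming only three inputs (a stall flag, a stage-2 flag, and whether a vmaster is attached); the demo_* names and the enum are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the two reporting pathways */
enum demo_path {
	DEMO_PATH_NONE,
	DEMO_PATH_IOPF,	/* stalled faults go through the IOPF framework */
	DEMO_PATH_VIRQ,	/* other stage-1 events go to the vIRQ queue */
};

struct demo_event {
	bool stall;
	bool s2;	/* fault happened at stage 2 */
};

/* Pick a pathway; returns 0 on success or -EFAULT if nobody handles it */
static int demo_route_event(const struct demo_event *event, bool has_vmaster,
			    enum demo_path *path)
{
	assert(event && path);

	if (event->stall) {
		*path = DEMO_PATH_IOPF;
		return 0;
	}
	if (has_vmaster && !event->s2) {
		*path = DEMO_PATH_VIRQ;
		return 0;
	}
	*path = DEMO_PATH_NONE;
	return -EFAULT;	/* caller is expected to dump the event */
}
```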
Define an iommu_virq_arm_smmuv3 uAPI structure, and report stage-1 faults in the threaded IRQ handler.
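The record that reaches user space can be illustrated with a portable sketch that swaps the StreamID field for the virtual one and stores every 64-bit word in little-endian byte order. Assumptions are labeled in the code: the demo_* names are hypothetical, and the StreamID is taken to occupy bits [63:32] of the first word. Note that this sketch converts all four words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_EVT_DWORDS		4
/* Assumption: StreamID occupies bits [63:32] of the first event word */
#define DEMO_EVTQ_0_SID_SHIFT	32
#define DEMO_EVTQ_0_SID_MASK	(0xffffffffULL << DEMO_EVTQ_0_SID_SHIFT)

/* Hypothetical stand-in for struct iommu_virq_arm_smmuv3 */
struct demo_virq_smmuv3 {
	uint64_t evt[DEMO_EVT_DWORDS];	/* little-endian in memory */
};

/* Portable CPU-to-little-endian conversion, independent of host order */
static uint64_t demo_cpu_to_le64(uint64_t v)
{
	uint8_t b[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Copy a raw event, replacing the physical StreamID with the vSID */
static void demo_fill_virq(struct demo_virq_smmuv3 *out, const uint64_t *evt,
			   uint32_t vsid)
{
	int i;

	assert(out && evt);

	out->evt[0] = demo_cpu_to_le64((evt[0] & ~DEMO_EVTQ_0_SID_MASK) |
				       ((uint64_t)vsid << DEMO_EVTQ_0_SID_SHIFT));
	for (i = 1; i < DEMO_EVT_DWORDS; i++)
		out->evt[i] = demo_cpu_to_le64(evt[i]);
}
```

The low 32 bits of the first word are left untouched, so only the device identity changes between the physical and the virtual record.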
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  7 +++
 include/uapi/linux/iommufd.h                  | 15 +++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 16 +++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 58 +++++++++++--------
 4 files changed, 71 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 4435ad7db776..d24c3d8ee397 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -1066,6 +1066,7 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state, struct iommu_domain *domain); void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state); +int arm_vmaster_report_event(struct arm_smmu_vmaster *vmaster, u64 *evt); #else #define arm_smmu_hw_info NULL #define arm_vsmmu_alloc NULL @@ -1081,6 +1082,12 @@ static inline void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state) { } + +static inline int arm_vmaster_report_event(struct arm_smmu_vmaster *vmaster, + u64 *evt) +{ + return -EOPNOTSUPP; +} #endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
#endif /* _ARM_SMMU_V3_H */ diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index cdf2dba28d4a..579529ff6fa7 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -1016,9 +1016,24 @@ struct iommu_ioas_change_process { /** * enum iommu_virq_type - Virtual IRQ Type * @IOMMU_VIRQ_TYPE_NONE: INVALID type + * @IOMMU_VIRQ_TYPE_ARM_SMMUV3: ARM SMMUv3 Virtual Event */ enum iommu_virq_type { IOMMU_VIRQ_TYPE_NONE = 0, + IOMMU_VIRQ_TYPE_ARM_SMMUV3 = 1, +}; + +/** + * struct iommu_virq_arm_smmuv3 - ARM SMMUv3 Virtual IRQ + * (IOMMU_VIRQ_TYPE_ARM_SMMUV3) + * @evt: 256-bit ARM SMMUv3 Event record, little-endian. + * (Refer to "7.3 Event records" in SMMUv3 HW Spec) + * + * StreamID field reports a virtual device ID. To receive a virtual IRQ for a + * device, a vDEVICE must be allocated via IOMMU_VDEVICE_ALLOC. + */ +struct iommu_virq_arm_smmuv3 { + __aligned_le64 evt[4]; };
/** diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 2b6253ef0e8f..e85456c7ff52 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -447,4 +447,20 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, return &vsmmu->core; }
+int arm_vmaster_report_event(struct arm_smmu_vmaster *vmaster, u64 *evt) +{ + struct iommu_virq_arm_smmuv3 virq_data = + *(struct iommu_virq_arm_smmuv3 *)evt; + + virq_data.evt[0] &= ~EVTQ_0_SID; + virq_data.evt[0] |= FIELD_PREP(EVTQ_0_SID, vmaster->vsid); + + virq_data.evt[0] = cpu_to_le64(virq_data.evt[0]); + virq_data.evt[1] = cpu_to_le64(virq_data.evt[1]); + + return iommufd_viommu_report_irq(&vmaster->vsmmu->core, + IOMMU_VIRQ_TYPE_ARM_SMMUV3, &virq_data, + sizeof(virq_data)); +} + MODULE_IMPORT_NS("IOMMUFD"); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 686c171dd273..59fbc342a095 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1812,8 +1812,8 @@ static void arm_smmu_decode_event(struct arm_smmu_device *smmu, u64 *raw, mutex_unlock(&smmu->streams_mutex); }
-static int arm_smmu_handle_event(struct arm_smmu_device *smmu, - struct arm_smmu_event *event) +static int arm_smmu_handle_event(struct arm_smmu_device *smmu, u64 *evt, + struct arm_smmu_event *event) { int ret = 0; u32 perm = 0; @@ -1831,31 +1831,30 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu, return -EOPNOTSUPP; }
- if (!event->stall) - return -EOPNOTSUPP; - - if (event->read) - perm |= IOMMU_FAULT_PERM_READ; - else - perm |= IOMMU_FAULT_PERM_WRITE; + if (event->stall) { + if (event->read) + perm |= IOMMU_FAULT_PERM_READ; + else + perm |= IOMMU_FAULT_PERM_WRITE;
- if (event->instruction) - perm |= IOMMU_FAULT_PERM_EXEC; + if (event->instruction) + perm |= IOMMU_FAULT_PERM_EXEC;
- if (event->privileged) - perm |= IOMMU_FAULT_PERM_PRIV; + if (event->privileged) + perm |= IOMMU_FAULT_PERM_PRIV;
- flt->type = IOMMU_FAULT_PAGE_REQ; - flt->prm = (struct iommu_fault_page_request) { - .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE, - .grpid = event->stag, - .perm = perm, - .addr = event->iova, - }; + flt->type = IOMMU_FAULT_PAGE_REQ; + flt->prm = (struct iommu_fault_page_request){ + .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE, + .grpid = event->stag, + .perm = perm, + .addr = event->iova, + };
- if (event->ssv) { - flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; - flt->prm.pasid = event->ssid; + if (event->ssv) { + flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; + flt->prm.pasid = event->ssid; + } }
mutex_lock(&smmu->streams_mutex); @@ -1865,7 +1864,16 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu, goto out_unlock; }
- ret = iommu_report_device_fault(master->dev, &fault_evt); + if (event->stall) { + ret = iommu_report_device_fault(master->dev, &fault_evt); + } else { + down_read(&master->vmaster_rwsem); + if (master->vmaster && !event->s2) + ret = arm_vmaster_report_event(master->vmaster, evt); + else + ret = -EFAULT; /* Unhandled events should be pinned */ + up_read(&master->vmaster_rwsem); + } out_unlock: mutex_unlock(&smmu->streams_mutex); return ret; @@ -1943,7 +1951,7 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev) do { while (!queue_remove_raw(q, evt)) { arm_smmu_decode_event(smmu, evt, &event); - if (arm_smmu_handle_event(smmu, &event)) + if (arm_smmu_handle_event(smmu, evt, &event)) arm_smmu_dump_event(smmu, evt, &event, &rs);
put_device(event.dev);