As part 2 of the VIOMMU infrastructure, this series introduces a VIRQ object, built by repurposing the existing FAULT object, which already provides a notification pathway to user space. So, the first step is to rework the FAULT object.
Mimicking the HWPT structures, add a common EVENT structure to support its derivatives: EVENT_IOPF (the prior FAULT object) and EVENT_VIRQ (a new one). IOMMUFD_CMD_VIRQ_ALLOC is introduced to allocate an EVENT_VIRQ for a VIOMMU. One VIOMMU can have multiple VIRQs of different types, but cannot have multiple VIRQs of the same type.
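For reference, the resulting object layout roughly mirrors the HWPT hierarchy; abridged from the structures added by the patches below (some fields omitted):

struct iommufd_event {                       /* common, like iommufd_hw_pagetable */
        struct iommufd_object obj;
        const struct iommufd_event_ops *ops; /* per-type read()/write() */
        struct mutex mutex;
        struct list_head deliver;            /* pending items for user space */
        struct wait_queue_head wait_queue;
};

struct iommufd_event_iopf {                  /* the prior FAULT object */
        struct iommufd_event common;
        struct xarray response;
};

struct iommufd_event_virq {                  /* new; one per (vIOMMU, type) */
        struct iommufd_event common;
        struct iommufd_viommu *viommu;
        struct list_head node;               /* entry in viommu->virqs */
        unsigned int type;
};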
Drivers might need the VIOMMU's vdev_id list, or the exact vdev_id link of a passthrough device, to forward IRQs/events via the VIOMMU framework. Thus, extend the set/unset_vdev_id ioctls down to the drivers via new VIOMMU ops. This allows drivers to take control of a vdev_id's lifecycle.
The forwarding part is fairly simple, but it might need to replace a physical device ID with a virtual device ID. So, this series adds some helpers for drivers to use.
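For instance, an IRQ handler in a driver would roughly do the following to swap in the virtual ID and forward the event (a simplified sketch of the SMMUv3 change at the end of this series; the exact struct layout and helper form are in that patch):

        struct iommu_virq_arm_smmuv3 virq_data;

        /* Replace the physical Stream ID with the per-vIOMMU virtual one */
        memcpy(virq_data.evt, evt, sizeof(virq_data.evt));
        virq_data.evt[0] &= ~EVTQ_0_SID;
        virq_data.evt[0] |= FIELD_PREP(EVTQ_0_SID, vdev_id);

        iommufd_viommu_report_irq(viommu, IOMMU_VIRQ_TYPE_ARM_SMMUV3,
                                  &virq_data, sizeof(virq_data));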
As usual, this series comes with selftest coverage for the new VIRQ, and with a real-world use case in the ARM SMMUv3 driver.
This must be based on the VIOMMU Part-1 series. It's on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_virq-v1

Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_virq-v1
Thanks! Nicolin
Nicolin Chen (10):
  iommufd: Rename IOMMUFD_OBJ_FAULT to IOMMUFD_OBJ_EVENT_IOPF
  iommufd: Rename fault.c to event.c
  iommufd: Add IOMMUFD_OBJ_EVENT_VIRQ and IOMMUFD_CMD_VIRQ_ALLOC
  iommufd/viommu: Allow drivers to control vdev_id lifecycle
  iommufd/viommu: Add iommufd_vdev_id_to_dev helper
  iommufd/viommu: Add iommufd_viommu_report_irq helper
  iommufd/selftest: Implement mock_viommu_set/unset_vdev_id
  iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VIRQ for VIRQ coverage
  iommufd/selftest: Add EVENT_VIRQ test coverage
  iommu/arm-smmu-v3: Report virtual IRQ for device in user space
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 109 +++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   2 +
 drivers/iommu/iommufd/Makefile                |   2 +-
 drivers/iommu/iommufd/device.c                |   2 +
 drivers/iommu/iommufd/event.c                 | 613 ++++++++++++++++++
 drivers/iommu/iommufd/fault.c                 | 443 -------------
 drivers/iommu/iommufd/hw_pagetable.c          |  12 +-
 drivers/iommu/iommufd/iommufd_private.h       | 147 ++++-
 drivers/iommu/iommufd/iommufd_test.h          |  10 +
 drivers/iommu/iommufd/main.c                  |  13 +-
 drivers/iommu/iommufd/selftest.c              |  66 ++
 drivers/iommu/iommufd/viommu.c                |  25 +-
 drivers/iommu/iommufd/viommu_api.c            |  54 ++
 include/linux/iommufd.h                       |  28 +
 include/uapi/linux/iommufd.h                  |  46 ++
 tools/testing/selftests/iommu/iommufd.c       |  11 +
 tools/testing/selftests/iommu/iommufd_utils.h |  64 ++
 17 files changed, 1130 insertions(+), 517 deletions(-)
 create mode 100644 drivers/iommu/iommufd/event.c
 delete mode 100644 drivers/iommu/iommufd/fault.c
A fault object was designed exclusively for hwpt's IO page faults, but its implementation could actually serve other purposes too, such as hardware IRQs and events.
Define a common event structure and embed it in a new iommufd_event_iopf structure, similar to how hwpt_paging holds a common hwpt.
Roll out a minimal set of renames and abstractions. Add a common event ops structure to prepare for IOMMUFD_OBJ_EVENT_VIRQ. Also move the event handlers to the header, since they will be called by the viommu_api module for IOMMUFD_DRIVER.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/fault.c           | 270 +++++++++++++-----------
 drivers/iommu/iommufd/hw_pagetable.c    |  12 +-
 drivers/iommu/iommufd/iommufd_private.h |  87 +++++---
 drivers/iommu/iommufd/main.c            |   8 +-
 4 files changed, 224 insertions(+), 153 deletions(-)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index df03411c8728..8fea142e1ac2 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -16,7 +16,9 @@ #include "../iommu-priv.h" #include "iommufd_private.h"
-static int iommufd_fault_iopf_enable(struct iommufd_device *idev) +/* IOMMUFD_OBJ_EVENT_IOPF Functions */ + +static int iommufd_event_iopf_enable(struct iommufd_device *idev) { struct device *dev = idev->dev; int ret; @@ -45,7 +47,7 @@ static int iommufd_fault_iopf_enable(struct iommufd_device *idev) return ret; }
-static void iommufd_fault_iopf_disable(struct iommufd_device *idev) +static void iommufd_event_iopf_disable(struct iommufd_device *idev) { mutex_lock(&idev->iopf_lock); if (!WARN_ON(idev->iopf_enabled == 0)) { @@ -55,8 +57,8 @@ static void iommufd_fault_iopf_disable(struct iommufd_device *idev) mutex_unlock(&idev->iopf_lock); }
-static int __fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, - struct iommufd_device *idev) +static int __event_iopf_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) { struct iommufd_attach_handle *handle; int ret; @@ -74,37 +76,37 @@ static int __fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, return ret; }
-int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, - struct iommufd_device *idev) +int iommufd_event_iopf_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) { int ret;
if (!hwpt->fault) return -EINVAL;
- ret = iommufd_fault_iopf_enable(idev); + ret = iommufd_event_iopf_enable(idev); if (ret) return ret;
- ret = __fault_domain_attach_dev(hwpt, idev); + ret = __event_iopf_domain_attach_dev(hwpt, idev); if (ret) - iommufd_fault_iopf_disable(idev); + iommufd_event_iopf_disable(idev);
return ret; }
-static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt, - struct iommufd_attach_handle *handle) +static void iommufd_event_iopf_auto_response(struct iommufd_hw_pagetable *hwpt, + struct iommufd_attach_handle *handle) { - struct iommufd_fault *fault = hwpt->fault; + struct iommufd_event_iopf *fault = hwpt->fault; struct iopf_group *group, *next; unsigned long index;
if (!fault) return;
- mutex_lock(&fault->mutex); - list_for_each_entry_safe(group, next, &fault->deliver, node) { + mutex_lock(&fault->common.mutex); + list_for_each_entry_safe(group, next, &fault->common.deliver, node) { if (group->attach_handle != &handle->handle) continue; list_del(&group->node); @@ -119,7 +121,7 @@ static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt, iopf_group_response(group, IOMMU_PAGE_RESP_INVALID); iopf_free_group(group); } - mutex_unlock(&fault->mutex); + mutex_unlock(&fault->common.mutex); }
static struct iommufd_attach_handle * @@ -134,21 +136,21 @@ iommufd_device_get_attach_handle(struct iommufd_device *idev) return to_iommufd_handle(handle); }
-void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt, - struct iommufd_device *idev) +void iommufd_event_iopf_domain_detach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) { struct iommufd_attach_handle *handle;
handle = iommufd_device_get_attach_handle(idev); iommu_detach_group_handle(hwpt->domain, idev->igroup->group); - iommufd_auto_response_faults(hwpt, handle); - iommufd_fault_iopf_disable(idev); + iommufd_event_iopf_auto_response(hwpt, handle); + iommufd_event_iopf_disable(idev); kfree(handle); }
-static int __fault_domain_replace_dev(struct iommufd_device *idev, - struct iommufd_hw_pagetable *hwpt, - struct iommufd_hw_pagetable *old) +static int __event_iopf_domain_replace_dev(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, + struct iommufd_hw_pagetable *old) { struct iommufd_attach_handle *handle, *curr = NULL; int ret; @@ -171,43 +173,44 @@ static int __fault_domain_replace_dev(struct iommufd_device *idev, }
if (!ret && curr) { - iommufd_auto_response_faults(old, curr); + iommufd_event_iopf_auto_response(old, curr); kfree(curr); }
return ret; }
-int iommufd_fault_domain_replace_dev(struct iommufd_device *idev, - struct iommufd_hw_pagetable *hwpt, - struct iommufd_hw_pagetable *old) +int iommufd_event_iopf_domain_replace_dev(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, + struct iommufd_hw_pagetable *old) { bool iopf_off = !hwpt->fault && old->fault; bool iopf_on = hwpt->fault && !old->fault; int ret;
if (iopf_on) { - ret = iommufd_fault_iopf_enable(idev); + ret = iommufd_event_iopf_enable(idev); if (ret) return ret; }
- ret = __fault_domain_replace_dev(idev, hwpt, old); + ret = __event_iopf_domain_replace_dev(idev, hwpt, old); if (ret) { if (iopf_on) - iommufd_fault_iopf_disable(idev); + iommufd_event_iopf_disable(idev); return ret; }
if (iopf_off) - iommufd_fault_iopf_disable(idev); + iommufd_event_iopf_disable(idev);
return 0; }
-void iommufd_fault_destroy(struct iommufd_object *obj) +void iommufd_event_iopf_destroy(struct iommufd_object *obj) { - struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj); + struct iommufd_event *event = + container_of(obj, struct iommufd_event, obj); struct iopf_group *group, *next;
/* @@ -216,17 +219,17 @@ void iommufd_fault_destroy(struct iommufd_object *obj) * accessing this pointer. Therefore, acquiring the mutex here * is unnecessary. */ - list_for_each_entry_safe(group, next, &fault->deliver, node) { + list_for_each_entry_safe(group, next, &event->deliver, node) { list_del(&group->node); iopf_group_response(group, IOMMU_PAGE_RESP_INVALID); iopf_free_group(group); } }
-static void iommufd_compose_fault_message(struct iommu_fault *fault, - struct iommu_hwpt_pgfault *hwpt_fault, - struct iommufd_device *idev, - u32 cookie) +static void iommufd_compose_iopf_message(struct iommu_fault *fault, + struct iommu_hwpt_pgfault *hwpt_fault, + struct iommufd_device *idev, + u32 cookie) { hwpt_fault->flags = fault->prm.flags; hwpt_fault->dev_id = idev->obj.id; @@ -238,11 +241,12 @@ static void iommufd_compose_fault_message(struct iommu_fault *fault, hwpt_fault->cookie = cookie; }
-static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf, - size_t count, loff_t *ppos) +static ssize_t iommufd_event_iopf_fops_read(struct iommufd_event *event, + char __user *buf, size_t count, + loff_t *ppos) { + struct iommufd_event_iopf *fault = to_event_iopf(event); size_t fault_size = sizeof(struct iommu_hwpt_pgfault); - struct iommufd_fault *fault = filep->private_data; struct iommu_hwpt_pgfault data; struct iommufd_device *idev; struct iopf_group *group; @@ -253,9 +257,9 @@ static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf, if (*ppos || count % fault_size) return -ESPIPE;
- mutex_lock(&fault->mutex); - while (!list_empty(&fault->deliver) && count > done) { - group = list_first_entry(&fault->deliver, + mutex_lock(&event->mutex); + while (!list_empty(&event->deliver) && count > done) { + group = list_first_entry(&event->deliver, struct iopf_group, node);
if (group->fault_count * fault_size > count - done) @@ -268,9 +272,8 @@ static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
idev = to_iommufd_handle(group->attach_handle)->idev; list_for_each_entry(iopf, &group->faults, list) { - iommufd_compose_fault_message(&iopf->fault, - &data, idev, - group->cookie); + iommufd_compose_iopf_message(&iopf->fault, &data, + idev, group->cookie); if (copy_to_user(buf + done, &data, fault_size)) { xa_erase(&fault->response, group->cookie); rc = -EFAULT; @@ -281,16 +284,17 @@ static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
list_del(&group->node); } - mutex_unlock(&fault->mutex); + mutex_unlock(&event->mutex);
return done == 0 ? rc : done; }
-static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf, - size_t count, loff_t *ppos) +static ssize_t iommufd_event_iopf_fops_write(struct iommufd_event *event, + const char __user *buf, + size_t count, loff_t *ppos) { size_t response_size = sizeof(struct iommu_hwpt_page_response); - struct iommufd_fault *fault = filep->private_data; + struct iommufd_event_iopf *fault = to_event_iopf(event); struct iommu_hwpt_page_response response; struct iopf_group *group; size_t done = 0; @@ -299,7 +303,7 @@ static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *b if (*ppos || count % response_size) return -ESPIPE;
- mutex_lock(&fault->mutex); + mutex_lock(&event->mutex); while (count > done) { rc = copy_from_user(&response, buf + done, response_size); if (rc) @@ -325,119 +329,149 @@ static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *b iopf_free_group(group); done += response_size; } - mutex_unlock(&fault->mutex); + mutex_unlock(&event->mutex);
return done == 0 ? rc : done; }
-static __poll_t iommufd_fault_fops_poll(struct file *filep, +static const struct iommufd_event_ops iommufd_event_iopf_ops = { + .read = &iommufd_event_iopf_fops_read, + .write = &iommufd_event_iopf_fops_write, +}; + +/* Common Event Functions */ + +static ssize_t iommufd_event_fops_read(struct file *filep, char __user *buf, + size_t count, loff_t *ppos) +{ + struct iommufd_event *event = filep->private_data; + + if (!event->ops || !event->ops->read) + return -EOPNOTSUPP; + return event->ops->read(event, buf, count, ppos); +} + +static ssize_t iommufd_event_fops_write(struct file *filep, + const char __user *buf, + size_t count, loff_t *ppos) +{ + struct iommufd_event *event = filep->private_data; + + if (!event->ops || !event->ops->write) + return -EOPNOTSUPP; + return event->ops->write(event, buf, count, ppos); +} + +static __poll_t iommufd_event_fops_poll(struct file *filep, struct poll_table_struct *wait) { - struct iommufd_fault *fault = filep->private_data; + struct iommufd_event *event = filep->private_data; __poll_t pollflags = EPOLLOUT;
- poll_wait(filep, &fault->wait_queue, wait); - mutex_lock(&fault->mutex); - if (!list_empty(&fault->deliver)) + poll_wait(filep, &event->wait_queue, wait); + mutex_lock(&event->mutex); + if (!list_empty(&event->deliver)) pollflags |= EPOLLIN | EPOLLRDNORM; - mutex_unlock(&fault->mutex); + mutex_unlock(&event->mutex);
return pollflags; }
-static int iommufd_fault_fops_release(struct inode *inode, struct file *filep) +static void iommufd_event_deinit(struct iommufd_event *event) { - struct iommufd_fault *fault = filep->private_data; + refcount_dec(&event->obj.users); + iommufd_ctx_put(event->ictx); + mutex_destroy(&event->mutex); +}
- refcount_dec(&fault->obj.users); - iommufd_ctx_put(fault->ictx); +static int iommufd_event_fops_release(struct inode *inode, struct file *filep) +{ + iommufd_event_deinit((struct iommufd_event *)filep->private_data); return 0; }
-static const struct file_operations iommufd_fault_fops = { +static const struct file_operations iommufd_event_fops = { .owner = THIS_MODULE, .open = nonseekable_open, - .read = iommufd_fault_fops_read, - .write = iommufd_fault_fops_write, - .poll = iommufd_fault_fops_poll, - .release = iommufd_fault_fops_release, + .read = iommufd_event_fops_read, + .write = iommufd_event_fops_write, + .poll = iommufd_event_fops_poll, + .release = iommufd_event_fops_release, .llseek = no_llseek, };
-int iommufd_fault_alloc(struct iommufd_ucmd *ucmd) +static int iommufd_event_init(struct iommufd_event *event, char *name, + struct iommufd_ctx *ictx, int *out_fdno, + const struct iommufd_event_ops *ops) { - struct iommu_fault_alloc *cmd = ucmd->cmd; - struct iommufd_fault *fault; struct file *filep; int fdno; + + event->ops = ops; + event->ictx = ictx; + INIT_LIST_HEAD(&event->deliver); + mutex_init(&event->mutex); + init_waitqueue_head(&event->wait_queue); + + filep = anon_inode_getfile(name, &iommufd_event_fops, + event, O_RDWR); + if (IS_ERR(filep)) + return PTR_ERR(filep); + + refcount_inc(&event->obj.users); + iommufd_ctx_get(event->ictx); + event->filep = filep; + + fdno = get_unused_fd_flags(O_CLOEXEC); + if (fdno < 0) { + fput(filep); + iommufd_event_deinit(event); + return fdno; + } + if (out_fdno) + *out_fdno = fdno; + return 0; +} + +int iommufd_event_iopf_alloc(struct iommufd_ucmd *ucmd) +{ + struct iommu_fault_alloc *cmd = ucmd->cmd; + struct iommufd_event_iopf *event_iopf; + int fdno; int rc;
if (cmd->flags) return -EOPNOTSUPP;
- fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT); - if (IS_ERR(fault)) - return PTR_ERR(fault); + event_iopf = __iommufd_object_alloc(ucmd->ictx, event_iopf, + IOMMUFD_OBJ_EVENT_IOPF, common.obj); + if (IS_ERR(event_iopf)) + return PTR_ERR(event_iopf);
- fault->ictx = ucmd->ictx; - INIT_LIST_HEAD(&fault->deliver); - xa_init_flags(&fault->response, XA_FLAGS_ALLOC1); - mutex_init(&fault->mutex); - init_waitqueue_head(&fault->wait_queue); + xa_init_flags(&event_iopf->response, XA_FLAGS_ALLOC1);
- filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops, - fault, O_RDWR); - if (IS_ERR(filep)) { - rc = PTR_ERR(filep); + rc = iommufd_event_init(&event_iopf->common, "[iommufd-pgfault]", + ucmd->ictx, &fdno, &iommufd_event_iopf_ops); + if (rc) goto out_abort; - } - - refcount_inc(&fault->obj.users); - iommufd_ctx_get(fault->ictx); - fault->filep = filep;
- fdno = get_unused_fd_flags(O_CLOEXEC); - if (fdno < 0) { - rc = fdno; - goto out_fput; - } - - cmd->out_fault_id = fault->obj.id; + cmd->out_fault_id = event_iopf->common.obj.id; cmd->out_fault_fd = fdno;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); if (rc) goto out_put_fdno; - iommufd_object_finalize(ucmd->ictx, &fault->obj); + iommufd_object_finalize(ucmd->ictx, &event_iopf->common.obj);
- fd_install(fdno, fault->filep); + fd_install(fdno, event_iopf->common.filep);
return 0; out_put_fdno: put_unused_fd(fdno); -out_fput: - fput(filep); - refcount_dec(&fault->obj.users); - iommufd_ctx_put(fault->ictx); + fput(event_iopf->common.filep); + iommufd_event_deinit(&event_iopf->common); out_abort: - iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj); + iommufd_object_abort_and_destroy(ucmd->ictx, &event_iopf->common.obj);
return rc; } - -int iommufd_fault_iopf_handler(struct iopf_group *group) -{ - struct iommufd_hw_pagetable *hwpt; - struct iommufd_fault *fault; - - hwpt = group->attach_handle->domain->fault_data; - fault = hwpt->fault; - - mutex_lock(&fault->mutex); - list_add_tail(&group->node, &fault->deliver); - mutex_unlock(&fault->mutex); - - wake_up_interruptible(&fault->wait_queue); - - return 0; -} diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 6aaec1b32abc..ca5c003a02da 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -14,7 +14,7 @@ static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt) iommu_domain_free(hwpt->domain);
if (hwpt->fault) - refcount_dec(&hwpt->fault->obj.users); + refcount_dec(&hwpt->fault->common.obj.users); }
void iommufd_hwpt_paging_destroy(struct iommufd_object *obj) @@ -342,18 +342,18 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) }
if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) { - struct iommufd_fault *fault; + struct iommufd_event_iopf *fault;
- fault = iommufd_get_fault(ucmd, cmd->fault_id); + fault = iommufd_get_event_iopf(ucmd, cmd->fault_id); if (IS_ERR(fault)) { rc = PTR_ERR(fault); goto out_hwpt; } hwpt->fault = fault; - hwpt->domain->iopf_handler = iommufd_fault_iopf_handler; + hwpt->domain->iopf_handler = iommufd_event_iopf_handler; hwpt->domain->fault_data = hwpt; - refcount_inc(&fault->obj.users); - iommufd_put_object(ucmd->ictx, &fault->obj); + refcount_inc(&fault->common.obj.users); + iommufd_put_object(ucmd->ictx, &fault->common.obj); }
cmd->out_hwpt_id = hwpt->obj.id; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 7831b0ca6528..c22d72c981c7 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -19,6 +19,7 @@ struct iommu_domain; struct iommu_group; struct iommu_option; struct iommufd_device; +struct iommufd_event;
struct iommufd_ctx { struct file *file; @@ -131,7 +132,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_HWPT_NESTED, IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, - IOMMUFD_OBJ_FAULT, + IOMMUFD_OBJ_EVENT_IOPF, IOMMUFD_OBJ_VIOMMU, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, @@ -297,7 +298,7 @@ int iommufd_check_iova_range(struct io_pagetable *iopt, struct iommufd_hw_pagetable { struct iommufd_object obj; struct iommu_domain *domain; - struct iommufd_fault *fault; + struct iommufd_event_iopf *fault; };
struct iommufd_hwpt_paging { @@ -456,24 +457,42 @@ void iopt_remove_access(struct io_pagetable *iopt, u32 iopt_access_list_id); void iommufd_access_destroy_object(struct iommufd_object *obj);
+struct iommufd_event_ops { + ssize_t (*read)(struct iommufd_event *event, char __user *buf, + size_t count, loff_t *ppos); + ssize_t (*write)(struct iommufd_event *event, const char __user *buf, + size_t count, loff_t *ppos); +}; + /* - * An iommufd_fault object represents an interface to deliver I/O page faults - * to the user space. These objects are created/destroyed by the user space and - * associated with hardware page table objects during page-table allocation. + * An iommufd_event object represents an interface to deliver IOMMU events + * to the user space. These objects are created/destroyed by the user space. */ -struct iommufd_fault { +struct iommufd_event { struct iommufd_object obj; struct iommufd_ctx *ictx; struct file *filep;
- /* The lists of outstanding faults protected by below mutex. */ + const struct iommufd_event_ops *ops; + + /* The lists of outstanding events protected by below mutex. */ struct mutex mutex; struct list_head deliver; - struct xarray response;
struct wait_queue_head wait_queue; };
+static inline int iommufd_event_notify(struct iommufd_event *event, + struct list_head *node) +{ + mutex_lock(&event->mutex); + list_add_tail(node, &event->deliver); + mutex_unlock(&event->mutex); + + wake_up_interruptible(&event->wait_queue); + return 0; +} + struct iommufd_attach_handle { struct iommu_attach_handle handle; struct iommufd_device *idev; @@ -482,31 +501,49 @@ struct iommufd_attach_handle { /* Convert an iommu attach handle to iommufd handle. */ #define to_iommufd_handle(hdl) container_of(hdl, struct iommufd_attach_handle, handle)
-static inline struct iommufd_fault * -iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id) +struct iommufd_event_iopf { + struct iommufd_event common; + struct xarray response; +}; + +static inline struct iommufd_event_iopf * +to_event_iopf(struct iommufd_event *event) +{ + return container_of(event, struct iommufd_event_iopf, common); +} + +static inline struct iommufd_event_iopf * +iommufd_get_event_iopf(struct iommufd_ucmd *ucmd, u32 id) { return container_of(iommufd_get_object(ucmd->ictx, id, - IOMMUFD_OBJ_FAULT), - struct iommufd_fault, obj); + IOMMUFD_OBJ_EVENT_IOPF), + struct iommufd_event_iopf, common.obj); }
-int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); -void iommufd_fault_destroy(struct iommufd_object *obj); -int iommufd_fault_iopf_handler(struct iopf_group *group); +int iommufd_event_iopf_alloc(struct iommufd_ucmd *ucmd); +void iommufd_event_iopf_destroy(struct iommufd_object *obj); + +static inline int iommufd_event_iopf_handler(struct iopf_group *group) +{ + struct iommufd_hw_pagetable *hwpt = + group->attach_handle->domain->fault_data; + + return iommufd_event_notify(&hwpt->fault->common, &group->node); +}
-int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, - struct iommufd_device *idev); -void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt, - struct iommufd_device *idev); -int iommufd_fault_domain_replace_dev(struct iommufd_device *idev, - struct iommufd_hw_pagetable *hwpt, - struct iommufd_hw_pagetable *old); +int iommufd_event_iopf_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev); +void iommufd_event_iopf_domain_detach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev); +int iommufd_event_iopf_domain_replace_dev(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, + struct iommufd_hw_pagetable *old);
static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev) { if (hwpt->fault) - return iommufd_fault_domain_attach_dev(hwpt, idev); + return iommufd_event_iopf_domain_attach_dev(hwpt, idev);
return iommu_attach_group(hwpt->domain, idev->igroup->group); } @@ -515,7 +552,7 @@ static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev) { if (hwpt->fault) - iommufd_fault_domain_detach_dev(hwpt, idev); + iommufd_event_iopf_domain_detach_dev(hwpt, idev);
iommu_detach_group(hwpt->domain, idev->igroup->group); } @@ -525,7 +562,7 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, struct iommufd_hw_pagetable *old) { if (old->fault || hwpt->fault) - return iommufd_fault_domain_replace_dev(idev, hwpt, old); + return iommufd_event_iopf_domain_replace_dev(idev, hwpt, old);
return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); } diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 199ad90fa36b..015f492afab1 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -359,8 +359,8 @@ struct iommufd_ioctl_op { } static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), - IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc, - out_fault_fd), + IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_event_iopf_alloc, + struct iommu_fault_alloc, out_fault_fd), IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info, __reserved), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, @@ -525,8 +525,8 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { .destroy = iommufd_hwpt_nested_destroy, .abort = iommufd_hwpt_nested_abort, }, - [IOMMUFD_OBJ_FAULT] = { - .destroy = iommufd_fault_destroy, + [IOMMUFD_OBJ_EVENT_IOPF] = { + .destroy = iommufd_event_iopf_destroy, }, [IOMMUFD_OBJ_VIOMMU] = { .destroy = iommufd_viommu_destroy,
Rename the file, aligning with the new event object.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/Makefile             | 2 +-
 drivers/iommu/iommufd/{fault.c => event.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename drivers/iommu/iommufd/{fault.c => event.c} (100%)
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index 288ef3e895e3..baabb714b56c 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only iommufd-y := \ device.o \ - fault.o \ + event.o \ hw_pagetable.o \ io_pagetable.o \ ioas.o \ diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/event.c similarity index 100% rename from drivers/iommu/iommufd/fault.c rename to drivers/iommu/iommufd/event.c
Allow a VIOMMU object to allocate VIRQ events. Each VIOMMU is allowed to have multiple VIRQ events, but no two of them may share the same type.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/event.c           | 136 ++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  54 ++++++++++
 drivers/iommu/iommufd/main.c            |   5 +
 drivers/iommu/iommufd/viommu.c          |   2 +
 include/uapi/linux/iommufd.h            |  32 ++++++
 5 files changed, 229 insertions(+)
diff --git a/drivers/iommu/iommufd/event.c b/drivers/iommu/iommufd/event.c index 8fea142e1ac2..f10827ce9cbd 100644 --- a/drivers/iommu/iommufd/event.c +++ b/drivers/iommu/iommufd/event.c @@ -339,6 +339,67 @@ static const struct iommufd_event_ops iommufd_event_iopf_ops = { .write = &iommufd_event_iopf_fops_write, };
+/* IOMMUFD_OBJ_EVENT_VIRQ Functions */ + +void iommufd_event_virq_destroy(struct iommufd_object *obj) +{ + struct iommufd_event *event = + container_of(obj, struct iommufd_event, obj); + struct iommufd_event_virq *event_virq = to_event_virq(event); + struct iommufd_viommu_irq *virq, *next; + + /* + * The iommufd object's reference count is zero at this point. + * We can be confident that no other threads are currently + * accessing this pointer. Therefore, acquiring the mutex here + * is unnecessary. + */ + list_for_each_entry_safe(virq, next, &event->deliver, node) { + list_del(&virq->node); + kfree(virq); + } + destroy_workqueue(event_virq->irq_wq); + list_del(&event_virq->node); + refcount_dec(&event_virq->viommu->obj.users); +} + +static ssize_t +iommufd_event_virq_fops_read(struct iommufd_event *event, + char __user *buf, size_t count, loff_t *ppos) +{ + size_t done = 0; + int rc = 0; + + if (*ppos) + return -ESPIPE; + + mutex_lock(&event->mutex); + while (!list_empty(&event->deliver) && count > done) { + struct iommufd_viommu_irq *virq = + list_first_entry(&event->deliver, + struct iommufd_viommu_irq, node); + void *virq_data = (void *)virq + sizeof(*virq); + + if (virq->irq_len > count - done) + break; + + if (copy_to_user(buf + done, virq_data, virq->irq_len)) { + rc = -EFAULT; + break; + } + done += virq->irq_len; + list_del(&virq->node); + kfree(virq); + } + mutex_unlock(&event->mutex); + + return done == 0 ? rc : done; +} + +static const struct iommufd_event_ops iommufd_event_virq_ops = { + .read = &iommufd_event_virq_fops_read, +}; + /* Common Event Functions */
static ssize_t iommufd_event_fops_read(struct file *filep, char __user *buf, @@ -475,3 +536,78 @@ int iommufd_event_iopf_alloc(struct iommufd_ucmd *ucmd)
return rc; } + +int iommufd_event_virq_alloc(struct iommufd_ucmd *ucmd) +{ + struct iommu_virq_alloc *cmd = ucmd->cmd; + struct iommufd_event_virq *event_virq; + struct workqueue_struct *irq_wq; + struct iommufd_viommu *viommu; + int fdno; + int rc; + + if (cmd->flags) + return -EOPNOTSUPP; + if (cmd->type == IOMMU_VIRQ_TYPE_NONE) + return -EINVAL; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + down_write(&viommu->virqs_rwsem); + + if (iommufd_viommu_find_event_virq(viommu, cmd->type)) { + rc = -EEXIST; + goto out_unlock_virqs; + } + + event_virq = __iommufd_object_alloc(ucmd->ictx, event_virq, + IOMMUFD_OBJ_EVENT_VIRQ, common.obj); + if (IS_ERR(event_virq)) { + rc = PTR_ERR(event_virq); + goto out_unlock_virqs; + } + + irq_wq = alloc_workqueue("viommu_irq/%d", WQ_UNBOUND, 0, + event_virq->common.obj.id); + if (!irq_wq) { + rc = -ENOMEM; + goto out_abort; + } + + rc = iommufd_event_init(&event_virq->common, "[iommufd-viommu-irq]", + ucmd->ictx, &fdno, &iommufd_event_virq_ops); + if (rc) + goto out_irq_wq; + + event_virq->irq_wq = irq_wq; + event_virq->viommu = viommu; + event_virq->type = cmd->type; + cmd->out_virq_id = event_virq->common.obj.id; + cmd->out_virq_fd = fdno; + + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_put_fdno; + iommufd_object_finalize(ucmd->ictx, &event_virq->common.obj); + + fd_install(fdno, event_virq->common.filep); + + list_add_tail(&event_virq->node, &viommu->virqs); + refcount_inc(&viommu->obj.users); + + goto out_unlock_virqs; +out_put_fdno: + put_unused_fd(fdno); + fput(event_virq->common.filep); + iommufd_event_deinit(&event_virq->common); +out_irq_wq: + destroy_workqueue(irq_wq); +out_abort: + iommufd_object_abort_and_destroy(ucmd->ictx, &event_virq->common.obj); +out_unlock_virqs: + up_write(&viommu->virqs_rwsem); + iommufd_put_object(ucmd->ictx, &viommu->obj); + + return rc; +} diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index c22d72c981c7..be1f1813672e 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -133,6 +133,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, IOMMUFD_OBJ_EVENT_IOPF, + IOMMUFD_OBJ_EVENT_VIRQ, IOMMUFD_OBJ_VIOMMU, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, @@ -567,6 +568,43 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); }
+struct iommufd_event_virq { + struct iommufd_event common; + struct iommufd_viommu *viommu; + struct workqueue_struct *irq_wq; + struct list_head node; + + unsigned int type; +}; + +static inline struct iommufd_event_virq * +to_event_virq(struct iommufd_event *event) +{ + return container_of(event, struct iommufd_event_virq, common); +} + +static inline struct iommufd_event_virq * +iommufd_get_event_virq(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_EVENT_VIRQ), + struct iommufd_event_virq, common.obj); +} + +int iommufd_event_virq_alloc(struct iommufd_ucmd *ucmd); +void iommufd_event_virq_destroy(struct iommufd_object *obj); + +struct iommufd_viommu_irq { + struct iommufd_event_virq *event_virq; + struct list_head node; + ssize_t irq_len; +}; + +static inline int iommufd_event_virq_handler(struct iommufd_viommu_irq *virq) +{ + return iommufd_event_notify(&virq->event_virq->common, &virq->node); +} + struct iommufd_viommu { struct iommufd_object obj; struct iommufd_ctx *ictx; @@ -575,6 +613,8 @@ struct iommufd_viommu { /* The locking order is vdev_ids_rwsem -> igroup::lock */ struct rw_semaphore vdev_ids_rwsem; struct xarray vdev_ids; + struct rw_semaphore virqs_rwsem; + struct list_head virqs;
const struct iommufd_viommu_ops *ops;
@@ -595,6 +635,20 @@ iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) struct iommufd_viommu, obj); }
+static inline struct iommufd_event_virq * +iommufd_viommu_find_event_virq(struct iommufd_viommu *viommu, u32 type) +{ + struct iommufd_event_virq *event_virq, *next; + + lockdep_assert_held(&viommu->virqs_rwsem); + + list_for_each_entry_safe(event_virq, next, &viommu->virqs, node) { + if (event_virq->type == type) + return event_virq; + } + return NULL; +} + int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 015f492afab1..22381ba031b5 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -361,6 +361,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_event_iopf_alloc, struct iommu_fault_alloc, out_fault_fd), + IOCTL_OP(IOMMU_VIRQ_ALLOC, iommufd_event_virq_alloc, + struct iommu_virq_alloc, out_virq_fd), IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info, __reserved), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, @@ -528,6 +530,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { [IOMMUFD_OBJ_EVENT_IOPF] = { .destroy = iommufd_event_iopf_destroy, }, + [IOMMUFD_OBJ_EVENT_VIRQ] = { + .destroy = iommufd_event_virq_destroy, + }, [IOMMUFD_OBJ_VIOMMU] = { .destroy = iommufd_viommu_destroy, }, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index a4ba8bff4a26..9adc9c62ada9 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -67,6 +67,8 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
xa_init(&viommu->vdev_ids); init_rwsem(&viommu->vdev_ids_rwsem); + INIT_LIST_HEAD(&viommu->virqs); + init_rwsem(&viommu->virqs_rwsem);
refcount_inc(&viommu->hwpt->common.obj.users);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 0d973486b604..f9ec07efed8d 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -54,6 +54,7 @@ enum { IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, IOMMUFD_CMD_VIOMMU_SET_VDEV_ID = 0x90, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID = 0x91, + IOMMUFD_CMD_VIRQ_ALLOC = 0x92, };
/** @@ -951,4 +952,35 @@ struct iommu_viommu_unset_vdev_id { __aligned_u64 vdev_id; }; #define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID) + +/** + * enum iommu_virq_type - Virtual IRQ Type + * @IOMMU_VIRQ_TYPE_NONE: INVALID type + */ +enum iommu_virq_type { + IOMMU_VIRQ_TYPE_NONE = 0, +}; + +/** + * struct iommu_virq_alloc - ioctl(IOMMU_VIRQ_ALLOC) + * @size: sizeof(struct iommu_virq_alloc) + * @flags: Must be 0 + * @viommu_id: viommu ID to associate the virtual IRQ with + * @type: Type of the virtual IRQ. Must be defined in enum iommu_virq_type + * @out_virq_id: The ID of the new VIRQ + * @out_virq_fd: The fd of the new VIRQ + * + * Explicitly allocate a virtual IRQ handler for a VIOMMU. A VIOMMU can have + * multiple FDs for different @type, but is confined to have only one FD per + * @type. + */ +struct iommu_virq_alloc { + __u32 size; + __u32 flags; + __u32 viommu_id; + __u32 type; + __u32 out_virq_id; + __u32 out_virq_fd; +}; +#define IOMMU_VIRQ_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIRQ_ALLOC) #endif
The iommufd core provides a lookup helper for an IOMMU driver to find a device pointer by the device's per-viommu virtual ID. Yet a driver may need an inverted lookup to find a device's per-viommu virtual ID by a device pointer, e.g. when reporting virtual IRQs/events back to the user space. In this case, it'd be unsafe for the iommufd core to do such an inverted lookup, as the driver can't track the lifecycle of a viommu object or a vdev_id object.

Meanwhile, some HW can even support virtual device ID lookup by its HW-accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports executing vanilla guest-issued SMMU commands containing a virtual Stream ID, but requires software to configure a link between the virtual Stream ID and the physical Stream ID via HW registers. So not only does the iommufd core need a vdev_id lookup table, drivers will want one too.

Given the two justifications above, it's best to provide a pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so a driver can implement them to control a vdev_id's lifecycle, and configure the HW properly if required.
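For illustration, a driver implementation of these ops might look roughly like this (hypothetical driver structure and helper names; the selftest and SMMUv3 patches carry the real versions):

struct my_vdev_id {
        struct iommufd_vdev_id core;    /* kept first so the core's kfree() frees the whole struct */
        /* ... driver-private data, e.g. a HW vSID->pSID link handle ... */
};

static struct iommufd_vdev_id *
my_viommu_set_vdev_id(struct iommufd_viommu *viommu, struct device *dev, u64 id)
{
        struct my_vdev_id *my_vdev_id;

        my_vdev_id = kzalloc(sizeof(*my_vdev_id), GFP_KERNEL);
        if (!my_vdev_id)
                return ERR_PTR(-ENOMEM);
        /*
         * Program the HW to link the virtual ID to the physical device here.
         * The core then fills in the idev/viommu/id fields of the returned
         * iommufd_vdev_id and tracks it in the viommu's xarray.
         */
        return &my_vdev_id->core;
}

static void my_viommu_unset_vdev_id(struct iommufd_vdev_id *vdev_id)
{
        struct my_vdev_id *my_vdev_id =
                container_of(vdev_id, struct my_vdev_id, core);

        /* Tear down the HW link; the core frees the memory afterwards */
        my_driver_unlink_vsid(my_vdev_id);      /* hypothetical HW teardown helper */
}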
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c          |  2 ++
 drivers/iommu/iommufd/iommufd_private.h |  6 ------
 drivers/iommu/iommufd/viommu.c          | 23 +++++++++++++++++++----
 include/linux/iommufd.h                 | 13 +++++++++++++
 4 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 3ad759971b32..01bb5c9f415b 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -145,6 +145,8 @@ void iommufd_device_destroy(struct iommufd_object *obj) old = xa_cmpxchg(&viommu->vdev_ids, vdev_id->id, vdev_id, NULL, GFP_KERNEL); WARN_ON(old != vdev_id); + if (vdev_id->viommu->ops && vdev_id->viommu->ops->unset_vdev_id) + vdev_id->viommu->ops->unset_vdev_id(vdev_id); kfree(vdev_id); idev->vdev_id = NULL; } diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index be1f1813672e..4cb1555991b8 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -621,12 +621,6 @@ struct iommufd_viommu { unsigned int type; };
-struct iommufd_vdev_id { - struct iommufd_viommu *viommu; - struct iommufd_device *idev; - u64 id; -}; - static inline struct iommufd_viommu * iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) { diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 9adc9c62ada9..b1eb900b7fbf 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -13,6 +13,8 @@ void iommufd_viommu_destroy(struct iommufd_object *obj)
xa_for_each(&viommu->vdev_ids, index, vdev_id) { /* Unlocked since there should be no race in a destroy() */ + if (viommu->ops && viommu->ops->unset_vdev_id) + viommu->ops->unset_vdev_id(vdev_id); vdev_id->idev->vdev_id = NULL; kfree(vdev_id); } @@ -116,10 +118,18 @@ int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd) goto out_unlock_igroup; }
- vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); - if (!vdev_id) { - rc = -ENOMEM; - goto out_unlock_igroup; + if (viommu->ops && viommu->ops->set_vdev_id) { + vdev_id = viommu->ops->set_vdev_id(viommu, idev->dev, cmd->vdev_id); + if (IS_ERR(vdev_id)) { + rc = PTR_ERR(vdev_id); + goto out_unlock_igroup; + } + } else { + vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); + if (!vdev_id) { + rc = -ENOMEM; + goto out_unlock_igroup; + } }
vdev_id->idev = idev; @@ -137,6 +147,8 @@ int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd) goto out_unlock_igroup;
out_free: + if (viommu->ops && viommu->ops->unset_vdev_id) + viommu->ops->unset_vdev_id(vdev_id); kfree(vdev_id); out_unlock_igroup: mutex_unlock(&idev->igroup->lock); @@ -185,6 +197,9 @@ int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd) rc = xa_err(old); goto out_unlock_igroup; } + + if (viommu->ops && viommu->ops->unset_vdev_id) + viommu->ops->unset_vdev_id(old); kfree(old); idev->vdev_id = NULL;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index f7c265c6de7c..c89583c7c792 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -56,8 +56,18 @@ void iommufd_access_detach(struct iommufd_access *access);
void iommufd_ctx_get(struct iommufd_ctx *ictx);
+struct iommufd_vdev_id { + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + u64 id; +}; + /** * struct iommufd_viommu_ops - viommu specific operations + * @set_vdev_id: Set a virtual device id for a device assigned to a viommu. + * Driver allocates an iommufd_vdev_id and return its pointer. + * @unset_vdev_id: Unset a virtual device id for a device assigned to a viommu. + * iommufd core frees the memory pointed by an iommufd_vdev_id. * @cache_invalidate: Flush hardware cache used by a viommu. It can be used for * any IOMMU hardware specific cache as long as a viommu has * enough information to identify it: for example, a VMID or @@ -69,6 +79,9 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx); * include/uapi/linux/iommufd.h */ struct iommufd_viommu_ops { + struct iommufd_vdev_id *(*set_vdev_id)(struct iommufd_viommu *viommu, + struct device *dev, u64 id); + void (*unset_vdev_id)(struct iommufd_vdev_id *vdev_id); int (*cache_invalidate)(struct iommufd_viommu *viommu, struct iommu_user_data_array *array); };
On Tue, Aug 27, 2024 at 10:02:06AM -0700, Nicolin Chen wrote:
The iommufd core provides a lookup helper for an IOMMU driver to find a device pointer by device's per-viommu virtual ID. Yet a driver may need an inverted lookup to find a device's per-viommu virtual ID by a device pointer, e.g. when reporting virtual IRQs/events back to the user space. In this case, it'd be unsafe for iommufd core to do an inverted lookup, as the driver can't track the lifecycle of a viommu object or a vdev_id object.
Meanwhile, some HW can even support virtual device ID lookup by its HW- accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports to execute vanilla guest-issued SMMU commands containing virtual Stream ID but requires software to configure a link between virtual Stream ID and physical Stream ID via HW registers. So not only the iommufd core needs a vdev_id lookup table, drivers will want one too.
Given the two justifications above, it's the best practice to provide a a pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so a driver can implement them to control a vdev_id's lifecycle, and configure the HW properly if required.
I think the lifecycle rules should be much simpler.
If a nested domain is attached to a STE/RID/device then the vIOMMU affiliated with that nested domain is pinned while the STE is in place
So the driver only need to provide locking around attach changing the STE's vIOMMU vs async operations translating from a STE to a vIOMMU. This can be a simple driver lock of some kind, ie a rwlock across the STE table.
Generally that is how all the async events should work, go from the STE to the VIOMMU to a iommufd callback to the iommufd event queue. iommufd will translate the struct device from the driver to an idev_id (or maybe even a vdevid) the same basic way the PRI code works
Need to check the attach struct lifecycle carefully
Jason
Sorry for the late reply. Just sat down and started to look at this series.
On Thu, Sep 05, 2024 at 03:01:19PM -0300, Jason Gunthorpe wrote:
On Tue, Aug 27, 2024 at 10:02:06AM -0700, Nicolin Chen wrote:
The iommufd core provides a lookup helper for an IOMMU driver to find a device pointer by device's per-viommu virtual ID. Yet a driver may need an inverted lookup to find a device's per-viommu virtual ID by a device pointer, e.g. when reporting virtual IRQs/events back to the user space. In this case, it'd be unsafe for iommufd core to do an inverted lookup, as the driver can't track the lifecycle of a viommu object or a vdev_id object.
Meanwhile, some HW can even support virtual device ID lookup by its HW- accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports to execute vanilla guest-issued SMMU commands containing virtual Stream ID but requires software to configure a link between virtual Stream ID and physical Stream ID via HW registers. So not only the iommufd core needs a vdev_id lookup table, drivers will want one too.
Given the two justifications above, it's the best practice to provide a a pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so a driver can implement them to control a vdev_id's lifecycle, and configure the HW properly if required.
I think the lifecycle rules should be much simpler.
If a nested domain is attached to a STE/RID/device then the vIOMMU affiliated with that nested domain is pinned while the STE is in place
So the driver only need to provide locking around attach changing the STE's vIOMMU vs async operations translating from a STE to a vIOMMU. This can be a simple driver lock of some kind, ie a rwlock across the STE table.
I was worried about the asynchrony between a vdev link (idev<=>vIOMMU) and a regular attach link (idev<=>nested_domain).
Though in practice the vdev link wouldn't break until the regular attach link breaks first, there was still nothing to guarantee that. So, having a master->lock to protect master->vdev could ensure it.
Now, with the vdev being an object refcounting idev/vIOMMU objs, I think we can simply pin the vdev in iommufd_hw_pagetable_attach to ensure the vdev won't disappear. Then, a driver wouldn't need to worry about that. [1]
Meanwhile, a nested_domain pins the vIOMMU object in the iommufd level upon allocation. [2]
So, what's missing seems to be the attach between the master->dev and the nested_domain itself [3]
idev <----------- vdev [1] ---------> vIOMMU
(dev) ---[3]--> nested_domain ----[2]-----^
A lock protecting the attach(), in parallel with the iommu core's group->mutex, could help, I think?
Generally that is how all the async events should work, go from the STE to the VIOMMU to a iommufd callback to the iommufd event queue. iommufd will translate the struct device from the driver to an idev_id (or maybe even a vdevid) the same basic way the PRI code works
My first attempt was putting the vdev into the attach_handle that was created for the PRI code. Yet later on I found the links were too long and some of them weren't safe (perhaps with the new vdev and those pins mentioned above, it's worth another try). So letting the driver hold the vdev itself became much simpler.
Thanks Nicolin
On Thu, Sep 05, 2024 at 03:01:19PM -0300, Jason Gunthorpe wrote:
On Tue, Aug 27, 2024 at 10:02:06AM -0700, Nicolin Chen wrote:
The iommufd core provides a lookup helper for an IOMMU driver to find a device pointer by device's per-viommu virtual ID. Yet a driver may need an inverted lookup to find a device's per-viommu virtual ID by a device pointer, e.g. when reporting virtual IRQs/events back to the user space. In this case, it'd be unsafe for iommufd core to do an inverted lookup, as the driver can't track the lifecycle of a viommu object or a vdev_id object.
Meanwhile, some HW can even support virtual device ID lookup by its HW- accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports to execute vanilla guest-issued SMMU commands containing virtual Stream ID but requires software to configure a link between virtual Stream ID and physical Stream ID via HW registers. So not only the iommufd core needs a vdev_id lookup table, drivers will want one too.
Given the two justifications above, it's the best practice to provide a a pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so a driver can implement them to control a vdev_id's lifecycle, and configure the HW properly if required.
I think the lifecycle rules should be much simpler.
If a nested domain is attached to a STE/RID/device then the vIOMMU affiliated with that nested domain is pinned while the STE is in place
So the driver only need to provide locking around attach changing the STE's vIOMMU vs async operations translating from a STE to a vIOMMU. This can be a simple driver lock of some kind, ie a rwlock across the STE table.
Generally that is how all the async events should work, go from the STE to the VIOMMU to a iommufd callback to the iommufd event queue. iommufd will translate the struct device from the driver to an idev_id (or maybe even a vdevid) the same basic way the PRI code works
I am trying to draft something following this, and here is what it would look like:
------------------------draft---------------------------
struct arm_smmu_master {
	....
+	struct rw_semaphore lock;
+	struct arm_vsmmu *vsmmu;
	....
};

->attach_dev() {
	down_write(&master->lock);
	if (domain->type == IOMMU_DOMAIN_NESTED)
		master->vsmmu = to_smmu_domain(domain)->vsmmu;
	else
		master->vsmmu = NULL;
	up_write(&master->lock);
}

isr() {
	down_read(&master->lock);
	if (master->vsmmu) {
		xa_lock(&master->vsmmu->core.vdevs);
		vdev = iommufd_viommu_dev_to_vdev(&master->vsmmu->core,
						  master->dev);
		if (vdev) {
			struct iommu_virq_arm_smmuv3 virq_data = evt;

			virq_data.evt[0] &= ~EVTQ_0_SID;
			virq_data.evt[0] |= FIELD_PREP(EVTQ_0_SID, vdev->id);
			return iommufd_viommu_report_irq(
					vdev->viommu, IOMMU_VIRQ_TYPE_ARM_SMMUV3,
					&virq_data, sizeof(virq_data));
		} else {
			rc = -ENOENT;
		}
		xa_unlock(&master->vsmmu->core.vdevs);
	}
	up_read(&master->lock);
}
--------------------------------------------------------
[Comparison]
                   | This v1                 | Draft
1. Adds to master  | A lock and vdev ptr     | A lock and viommu ptr
2. Set/unset ptr   | In ->vdevice_alloc/free | In all ->attach_dev
3. Do dev_to_vdev  | master->vdev->id        | attach_handle?
Both solutions need a driver-level lock and an extra pointer in the master structure. And both ISR routines require that driver-level lock to avoid racing attach_dev against vdev alloc/free. Overall, taking step 3 into consideration, it seems that letting the master lock & hold the vdev pointer (i.e. this v1) is simpler?
As for the implementation of iommufd_viommu_dev_to_vdev(), I read the attach_handle part in the PRI code, yet I don't see the lock that protects the handle returned by iommu_attach_handle_get() in iommu_report_device_fault/find_fault_handler().
And I see the kdoc of iommu_attach_handle_get() mentioning: "Callers are required to synchronize the call of iommu_attach_handle_get() with domain attachment and detachment. The attach handle can only be used during its life cycle." But the caller iommu_report_device_fault() is an async event that cannot guarantee the lifecycle. Would you please shed some light?
Thanks Nicolin
On Wed, Oct 23, 2024 at 12:22:15AM -0700, Nicolin Chen wrote:
On Thu, Sep 05, 2024 at 03:01:19PM -0300, Jason Gunthorpe wrote:
On Tue, Aug 27, 2024 at 10:02:06AM -0700, Nicolin Chen wrote:
The iommufd core provides a lookup helper for an IOMMU driver to find a device pointer by device's per-viommu virtual ID. Yet a driver may need an inverted lookup to find a device's per-viommu virtual ID by a device pointer, e.g. when reporting virtual IRQs/events back to the user space. In this case, it'd be unsafe for iommufd core to do an inverted lookup, as the driver can't track the lifecycle of a viommu object or a vdev_id object.
Meanwhile, some HW can even support virtual device ID lookup by its HW- accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports to execute vanilla guest-issued SMMU commands containing virtual Stream ID but requires software to configure a link between virtual Stream ID and physical Stream ID via HW registers. So not only the iommufd core needs a vdev_id lookup table, drivers will want one too.
Given the two justifications above, it's the best practice to provide a a pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so a driver can implement them to control a vdev_id's lifecycle, and configure the HW properly if required.
I think the lifecycle rules should be much simpler.
If a nested domain is attached to a STE/RID/device then the vIOMMU affiliated with that nested domain is pinned while the STE is in place
So the driver only need to provide locking around attach changing the STE's vIOMMU vs async operations translating from a STE to a vIOMMU. This can be a simple driver lock of some kind, ie a rwlock across the STE table.
Generally that is how all the async events should work, go from the STE to the VIOMMU to a iommufd callback to the iommufd event queue. iommufd will translate the struct device from the driver to an idev_id (or maybe even a vdevid) the same basic way the PRI code works
I am trying to draft something following this, and here is what it would look like:
------------------------draft--------------------------- struct arm_smmu_master { ....
- struct rw_semaphore lock;
It would be called vsmmu_lock
+	struct arm_vsmmu *vsmmu;
	....
};
->attach_dev() {
	down_write(&master->lock);
	if (domain->type == IOMMU_DOMAIN_NESTED)
		master->vsmmu = to_smmu_domain(domain)->vsmmu;
	else
		master->vsmmu = NULL;
	up_write(&master->lock);
}
isr() {
	down_read(&master->lock);
	if (master->vsmmu) {
		xa_lock(&master->vsmmu->core.vdevs);
		vdev = iommufd_viommu_dev_to_vdev(&master->vsmmu->core,
						  master->dev);
		if (vdev) {
			struct iommu_virq_arm_smmuv3 virq_data = evt;

			virq_data.evt[0] &= ~EVTQ_0_SID;
			virq_data.evt[0] |= FIELD_PREP(EVTQ_0_SID, vdev->id);
			return iommufd_viommu_report_irq(
					vdev->viommu, IOMMU_VIRQ_TYPE_ARM_SMMUV3,
					&virq_data, sizeof(virq_data));
		} else {
			rc = -ENOENT;
		}
		xa_unlock(&master->vsmmu->core.vdevs);
	}
	up_read(&master->lock);
}
This looks reasonable
[Comparison]
                   | This v1                 | Draft
1. Adds to master  | A lock and vdev ptr     | A lock and viommu ptr
2. Set/unset ptr   | In ->vdevice_alloc/free | In all ->attach_dev
3. Do dev_to_vdev  | master->vdev->id        | attach_handle?
The set/unset ops have the major issue that they can get out of sync with the domain. The only time things should be routed to the viommu is when a viommu related domain is attached.
The lock on attach can be reduced:
iommu_group_mutex_assert(dev)
if (domain->type == IOMMU_DOMAIN_NESTED)
	new_vsmmu = to_smmu_domain(domain)->vsmmu;
else
	new_vsmmu = NULL;
if (new_vsmmu != master->vsmmu) {
	down_write(&master->lock);
	master->vsmmu = new_vsmmu;
	up_write(&master->lock);
}
And you'd stick this in prepare or commit..
Both solutions needs a driver-level lock and an extra pointer in the master structure. And both ISR routines require that driver- level lock to avoid race against attach_dev v.s. vdev alloc/free. Overall, taking step.3 into consideration, it seems that letting master lock&hold the vdev pointer (i.e. this v1) is simpler?
I'm not sure the vdev pointer should even be visible to the drivers..
As for the implementation of iommufd_viommu_dev_to_vdev(), I read the attach_handle part in the PRI code, yet I don't see the lock that protects the handle returned by iommu_attach_handle_get() in iommu_report_device_fault/find_fault_handler().
It is the xa_lock and some rules about flushing irqs and work queues before allowing the dev to disappear:
"Callers are required to synchronize the call of iommu_attach_handle_get() with domain attachment and detachment. The attach handle can only be used during its life cycle."
But the caller iommu_report_device_fault() is an async event that cannot guarantee the lifecycle. Would you please shed some light?
The iopf detach function will act as a barrier to ensure that all the async work has completed, sort of like how RCU works.
But here, I think it is pretty simple, isn't it?
When you update the master->vsmmu you can query the vsmmu to get the vdev id of that master, then store it in the master struct and forward it to the iommufd_viommu_report_irq(). That could even search the xarray since attach is not a performance path.
Then it is locked under the master->lock
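Roughly like this (field and helper names made up, just to illustrate):

	/* attach path, group mutex held -- not a fast path */
	if (new_vsmmu != master->vsmmu) {
		u32 vsid = 0;

		if (new_vsmmu)
			/* hypothetical lookup, e.g. walking the viommu's
			 * vdev_ids xarray for master->dev */
			vsid = arm_vsmmu_find_vsid(new_vsmmu, master->dev);

		down_write(&master->lock);
		master->vsmmu = new_vsmmu;
		master->vsid = vsid;
		up_write(&master->lock);
	}

	/* isr() then only reads master->vsid under down_read(&master->lock) */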
Jason
On Wed, Oct 23, 2024 at 01:59:05PM -0300, Jason Gunthorpe wrote:
[Comparison]
                   | This v1                 | Draft
1. Adds to master  | A lock and vdev ptr     | A lock and viommu ptr
2. Set/unset ptr   | In ->vdevice_alloc/free | In all ->attach_dev
3. Do dev_to_vdev  | master->vdev->id        | attach_handle?
The set/unset ops have the major issue that they can get out of sync with the domain. The only time things should be routed to the viommu is when a viommu related domain is attached.
Ah, I missed that point.
The lock on attach can be reduced:
iommu_group_mutex_assert(dev)
if (domain->type == IOMMU_DOMAIN_NESTED)
	new_vsmmu = to_smmu_domain(domain)->vsmmu;
else
	new_vsmmu = NULL;
if (new_vsmmu != master->vsmmu) {
	down_write(&master->lock);
	master->vsmmu = new_vsmmu;
	up_write(&master->lock);
}
And you'd stick this in prepare or commit..
Ack.
Both solutions need a driver-level lock and an extra pointer in the master structure. And both ISR routines require that driver-level lock to avoid racing against attach_dev vs. vdev alloc/free. Overall, taking step 3 into consideration, it seems that letting the master lock and hold the vdev pointer (i.e. this v1) is simpler?
I'm not sure the vdev pointer should even be visible to the drivers..
With the iommufd_vdevice_alloc allocator, we already expose the vdevice structure to the drivers, along with the vdevice_alloc/vdevice_free ops, which makes it easier for the vCMDQ driver to allocate and hold its own pSID/vSID structure.

And vsid_to_psid() requires looking up the viommu->vdevs xarray. If we hid the vDEVICE structure, we'd need something other than vdev_to_dev(). Maybe iommufd_viommu_find_dev_by_virt_id()?
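To make that concrete, a rough sketch of what such a lookup helper might look like. This is purely illustrative: it assumes the viommu->vdevs xarray from part-1 is indexed by the virtual ID, and the helper name and locking shown here are hypothetical, not part of this series:

/*
 * Hypothetical helper (not in this series): look up the physical device
 * behind a guest-visible virtual device ID. The xa_lock only protects the
 * lookup itself; the caller still needs its own guarantee that the device
 * stays alive while it uses the returned pointer.
 */
static struct device *
iommufd_viommu_find_dev_by_virt_id(struct iommufd_viommu *viommu, u64 virt_id)
{
	struct iommufd_vdev_id *vdev_id;
	struct device *dev = NULL;

	xa_lock(&viommu->vdevs);
	vdev_id = xa_load(&viommu->vdevs, (unsigned long)virt_id);
	if (vdev_id)
		dev = iommufd_vdev_id_to_dev(vdev_id);
	xa_unlock(&viommu->vdevs);
	return dev;
}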
As for the implementation of iommufd_viommu_dev_to_vdev(), I read the attach_handle part in the PRI code, yet I don't see the lock that protects the handle returned by iommu_attach_handle_get() in iommu_report_device_fault/find_fault_handler().
It is the xa_lock and some rules about flushing irqs and work queues before allowing the dev to disappear:
"Callers are required to synchronize the call of iommu_attach_handle_get() with domain attachment and detachment. The attach handle can only be used during its life cycle."
But the caller iommu_report_device_fault() is an async event that cannot guarantee the lifecycle. Would you please shed some light?
The iopf detach function will act as a barrier to ensure that all the async work has completed, sort of like how RCU works.
The xa_lock(&group->pasid_array) is released once the handle is returned to the iommu_attach_handle_get() callers, so it protects only a very short window (T0 below). What if:

   | detach()                        | isr=>iommu_report_device_fault()
T0 | Get attach_handle [xa_lock]     | Get attach_handle [xa_lock]
T1 | Clean deliver Q [fault->mutex]  | Waiting for fault->mutex
T2 | iommufd_eventq_iopf_disable()   | Add new fault to the deliver Q
T3 | kfree(handle)                   | ??
But here, I think it is pretty simple, isn't it?
When you update the master->vsmmu you can query the vsmmu to get the vdev id of that master, then store it in the master struct and forward it to the iommufd_viommu_report_irq(). That could even search the xarray since attach is not a performance path.
Then it is locked under the master->lock
Yes! I didn't see that coming. vdev->id must be set before the attach to a nested domain, and can be cleared after the device detaches. Maybe an attach to a vIOMMU-based nested domain should just fail if idev->vdev isn't ready?

Then perhaps we can have a struct arm_smmu_vstream to hold all of these, such as the vsmmu pointer and vdev->id. If vCMDQ needs an extra structure, it can be stored there too.
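Not part of this series, just a rough sketch of what such a container could look like (the type and field names are hypothetical), caching the virtual StreamID looked up once at attach time as suggested above:

/* Hypothetical per-master state, set up when a nested domain is attached */
struct arm_smmu_vstream {
	struct arm_smmu_vsmmu *vsmmu;	/* vIOMMU this stream is routed to */
	u32 vsid;			/* cached virtual StreamID, i.e. vdev->id */
	/* vCMDQ-specific per-device state could live here too */
};

The ISR would then only need to take master->lock, check vstream.vsmmu, and forward the event with vstream.vsid, without touching the xarray in the IRQ path.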
Thanks! Nicolin
On Wed, Oct 23, 2024 at 11:54:54AM -0700, Nicolin Chen wrote:
The iopf detach function will act as a barrier to ensure that all the async work has completed, sort of like how RCU works.
The xa_lock(&group->pasid_array) is released once the handle is returned to the iommu_attach_handle_get() callers, so it protects only a very short window (T0 below). What if:

   | detach()                        | isr=>iommu_report_device_fault()
T0 | Get attach_handle [xa_lock]     | Get attach_handle [xa_lock]
T1 | Clean deliver Q [fault->mutex]  | Waiting for fault->mutex
T2 | iommufd_eventq_iopf_disable()   | Add new fault to the deliver Q
T3 | kfree(handle)                   | ??
Prior to iommufd_eventq_iopf_disable() the driver has to ensure the threads calling isr->iommu_report_device_fault() are fenced.
New threads that start running cannot see the attach_handle() because it is not in the xarray anymore. Old threads are completed because of the fence.
But here, I think it is pretty simple, isn't it?
When you update the master->vsmmu you can query the vsmmu to get the vdev id of that master, then store it in the master struct and forward it to the iommufd_viommu_report_irq(). That could even search the xarray since attach is not a performance path.
Then it is locked under the master->lock
Yes! I didn't see that coming. vdev->id must be set before the attach to a nested domain, and can be cleared after the device detaches. Maybe an attach to a vIOMMU-based nested domain should just fail if idev->vdev isn't ready?
That would make sense
Jason
This helps drivers to get the dev pointer held by the vdev_id structure.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/viommu_api.c | 14 ++++++++++++++ include/linux/iommufd.h | 7 +++++++ 2 files changed, 21 insertions(+)
diff --git a/drivers/iommu/iommufd/viommu_api.c b/drivers/iommu/iommufd/viommu_api.c
index 3772a5892a6c..82eb33e047cf 100644
--- a/drivers/iommu/iommufd/viommu_api.c
+++ b/drivers/iommu/iommufd/viommu_api.c
@@ -51,3 +51,17 @@ iommufd_viommu_to_parent_domain(struct iommufd_viommu *viommu)
 	return viommu->hwpt->common.domain;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_viommu_to_parent_domain, IOMMUFD);
+
+/*
+ * Fetch the dev pointer in the vdev_id structure. Caller must ensure the
+ * lifecycle of the vdev_id structure, likely by adding a driver-level lock to
+ * protect the passed-in vdev_id for any race against a potential unset_vdev_id
+ * callback.
+ */
+struct device *iommufd_vdev_id_to_dev(struct iommufd_vdev_id *vdev_id)
+{
+	if (!vdev_id || !vdev_id->viommu)
+		return NULL;
+	return vdev_id->idev->dev;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_vdev_id_to_dev, IOMMUFD);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index c89583c7c792..88d6586a424f 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -99,6 +99,7 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 				unsigned long iova, unsigned long length);
 int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		      void *data, size_t len, unsigned int flags);
+struct device *iommufd_vdev_id_to_dev(struct iommufd_vdev_id *vdev_id);
 int iommufd_vfio_compat_ioas_get_id(struct iommufd_ctx *ictx, u32 *out_ioas_id);
 int iommufd_vfio_compat_ioas_create(struct iommufd_ctx *ictx);
 int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx);
@@ -138,6 +139,12 @@ static inline int iommufd_access_rw(struct iommufd_access *access, unsigned long
 	return -EOPNOTSUPP;
 }
 
+static inline struct device *
+iommufd_vdev_id_to_dev(struct iommufd_vdev_id *vdev_id)
+{
+	return NULL;
+}
+
 static inline int iommufd_vfio_compat_ioas_create(struct iommufd_ctx *ictx)
 {
 	return -EOPNOTSUPP;
This allows IOMMU drivers to report IRQs/events belonging to a viommu to user space hypervisors.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/viommu_api.c | 40 ++++++++++++++++++++++++++++++ include/linux/iommufd.h | 8 ++++++ 2 files changed, 48 insertions(+)
diff --git a/drivers/iommu/iommufd/viommu_api.c b/drivers/iommu/iommufd/viommu_api.c
index 82eb33e047cf..d075727a1b38 100644
--- a/drivers/iommu/iommufd/viommu_api.c
+++ b/drivers/iommu/iommufd/viommu_api.c
@@ -65,3 +65,43 @@ struct device *iommufd_vdev_id_to_dev(struct iommufd_vdev_id *vdev_id)
 	return vdev_id->idev->dev;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_vdev_id_to_dev, IOMMUFD);
+
+/**
+ * IOMMU drivers can call this helper to report a per-VIOMMU virtual IRQ. Caller
+ * must ensure the lifecycle of the viommu object, likely by passing it from a
+ * vdev_id structure that was set via a set_vdev_id callback and by holding the
+ * same driver-level lock to protect the passed-in vdev_id from any race against
+ * a potential unset_vdev_id callback.
+ */
+void iommufd_viommu_report_irq(struct iommufd_viommu *viommu, unsigned int type,
+			       void *irq_ptr, size_t irq_len)
+{
+	struct iommufd_event_virq *event_virq;
+	struct iommufd_viommu_irq *virq;
+	void *irq_data;
+
+	might_sleep();
+
+	if (!viommu)
+		return;
+
+	down_read(&viommu->virqs_rwsem);
+
+	event_virq = iommufd_viommu_find_event_virq(viommu, type);
+	if (!event_virq)
+		goto out_unlock_vdev_ids;
+
+	virq = kzalloc(sizeof(*virq) + irq_len, GFP_KERNEL);
+	if (!virq)
+		goto out_unlock_vdev_ids;
+	irq_data = (void *)virq + sizeof(*virq);
+	memcpy(irq_data, irq_ptr, irq_len);
+
+	virq->event_virq = event_virq;
+	virq->irq_len = irq_len;
+
+	iommufd_event_virq_handler(virq);
+out_unlock_vdev_ids:
+	up_read(&viommu->virqs_rwsem);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_viommu_report_irq, IOMMUFD);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 88d6586a424f..346a6257ed0c 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -108,6 +108,8 @@ void iommufd_viommu_unlock_vdev_id(struct iommufd_viommu *viommu);
 struct device *iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id);
 struct iommu_domain *
 iommufd_viommu_to_parent_domain(struct iommufd_viommu *viommu);
+void iommufd_viommu_report_irq(struct iommufd_viommu *viommu, unsigned int type,
+			       void *irq_ptr, size_t irq_len);
 #else /* !CONFIG_IOMMUFD */
 static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file)
 {
@@ -173,5 +175,11 @@ iommufd_viommu_to_parent_domain(struct iommufd_viommu *viommu)
 {
 	return NULL;
 }
+
+static inline void
+iommufd_viommu_report_irq(struct iommufd_viommu *viommu, unsigned int type,
+			  void *irq_ptr, size_t irq_len)
+{
+}
 #endif /* CONFIG_IOMMUFD */
 #endif
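For illustration only, a minimal sketch of how a driver might combine iommufd_vdev_id_to_dev() and iommufd_viommu_report_irq() in a sleepable (e.g. threaded) event handler. The mydrv_master structure, its lock, and its vdev_id field are hypothetical; the real SMMUv3 user is in the last patch:

/*
 * Hypothetical driver event path. master->lock is the driver-level lock that
 * also serializes the set_vdev_id/unset_vdev_id callbacks, so vdev_id cannot
 * be freed underneath this function. It must run in sleepable context since
 * iommufd_viommu_report_irq() may sleep.
 */
static void mydrv_forward_event(struct mydrv_master *master, unsigned int type,
				void *evt, size_t evt_len)
{
	mutex_lock(&master->lock);
	if (master->vdev_id) {
		struct device *dev = iommufd_vdev_id_to_dev(master->vdev_id);

		dev_dbg(dev, "forwarding event to user space\n");
		iommufd_viommu_report_irq(master->vdev_id->viommu, type,
					  evt, evt_len);
	}
	mutex_unlock(&master->lock);
}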
So that the driver can take control of the vdev_id's lifecycle. This will be used by the VIRQ feature in the following patches.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/selftest.c | 36 ++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index f512874105ac..ea2861d34b4a 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -137,6 +137,8 @@ enum selftest_obj_type {
struct mock_dev { struct device dev; + struct mutex lock; + struct iommufd_vdev_id *vdev_id; unsigned long flags; int id; u32 cache[MOCK_DEV_CACHE_NUM]; @@ -541,6 +543,36 @@ static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features fea return 0; }
+static struct iommufd_vdev_id * +mock_viommu_set_vdev_id(struct iommufd_viommu *viommu, struct device *dev, + u64 id) +{ + struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct iommufd_vdev_id *vdev_id; + + vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); + if (!vdev_id) + return ERR_PTR(-ENOMEM); + + mutex_lock(&mdev->lock); + mdev->vdev_id = vdev_id; + mutex_unlock(&mdev->lock); + + return vdev_id; +} + +static void mock_viommu_unset_vdev_id(struct iommufd_vdev_id *vdev_id) +{ + struct device *dev = iommufd_vdev_id_to_dev(vdev_id); + struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + + mutex_lock(&mdev->lock); + mdev->vdev_id = NULL; + mutex_unlock(&mdev->lock); + + /* IOMMUFD core frees the memory of vdev_id */ +} + static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu, struct iommu_user_data_array *array) { @@ -636,6 +668,8 @@ static const struct iommu_ops mock_ops = { .unmap_pages = mock_domain_unmap_pages, .iova_to_phys = mock_domain_iova_to_phys, .default_viommu_ops = &(struct iommufd_viommu_ops){ + .set_vdev_id = mock_viommu_set_vdev_id, + .unset_vdev_id = mock_viommu_unset_vdev_id, .cache_invalidate = mock_viommu_cache_invalidate, }, }, @@ -757,6 +791,7 @@ static void mock_dev_release(struct device *dev) struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
ida_free(&mock_dev_ida, mdev->id); + mutex_destroy(&mdev->lock); kfree(mdev); }
@@ -773,6 +808,7 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags) if (!mdev) return ERR_PTR(-ENOMEM);
+ mutex_init(&mdev->lock); device_initialize(&mdev->dev); mdev->flags = dev_flags; mdev->dev.release = mock_dev_release;
The handler will get the vdev_id structure from the given mdev and convert it to its per-viommu virtual device ID, to mimic a real IOMMU driver.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 10 ++++++++++ drivers/iommu/iommufd/selftest.c | 30 ++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 56bade6146ff..736ae5f8152e 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -24,6 +24,7 @@ enum { IOMMU_TEST_OP_MD_CHECK_IOTLB, IOMMU_TEST_OP_TRIGGER_IOPF, IOMMU_TEST_OP_DEV_CHECK_CACHE, + IOMMU_TEST_OP_TRIGGER_VIRQ, };
enum { @@ -145,6 +146,9 @@ struct iommu_test_cmd { __u32 id; __u32 cache; } check_dev_cache; + struct { + __u32 dev_id; + } trigger_virq; }; __u32 last; }; @@ -210,4 +214,10 @@ struct iommu_viommu_invalidate_selftest { __u32 cache_id; };
+#define IOMMU_VIRQ_TYPE_SELFTEST 0xbeefbeef + +struct iommu_viommu_irq_selftest { + __u32 vdev_id; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index ea2861d34b4a..75fccd466018 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -1560,6 +1560,34 @@ static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd, return 0; }
+static int iommufd_test_trigger_virq(struct iommufd_ucmd *ucmd, + struct iommu_test_cmd *cmd) +{ + struct iommufd_device *idev; + struct mock_dev *mdev; + + idev = iommufd_get_device(ucmd, cmd->trigger_virq.dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + mdev = container_of(idev->dev, struct mock_dev, dev); + + mutex_lock(&mdev->lock); + if (mdev->vdev_id) { + struct iommu_viommu_irq_selftest test = { + .vdev_id = mdev->vdev_id->id, + }; + + iommufd_viommu_report_irq(mdev->vdev_id->viommu, + IOMMU_VIRQ_TYPE_SELFTEST, + &test, sizeof(test)); + } + mutex_unlock(&mdev->lock); + + iommufd_put_object(ucmd->ictx, &idev->obj); + + return 0; +} + void iommufd_selftest_destroy(struct iommufd_object *obj) { struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj); @@ -1641,6 +1669,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd) cmd->dirty.flags); case IOMMU_TEST_OP_TRIGGER_IOPF: return iommufd_test_trigger_iopf(ucmd, cmd); + case IOMMU_TEST_OP_TRIGGER_VIRQ: + return iommufd_test_trigger_virq(ucmd, cmd); default: return -EOPNOTSUPP; }
Trigger an IRQ given an idev ID, to test whether the loopback receives the vdev_id that was set on the idev by the line above.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- tools/testing/selftests/iommu/iommufd.c | 11 ++++ tools/testing/selftests/iommu/iommufd_utils.h | 64 +++++++++++++++++++ 2 files changed, 75 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 6f1014cc208b..11208f53fdce 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -564,6 +564,8 @@ TEST_F(iommufd_ioas, viommu_default) uint32_t nested_hwpt_id = 0, hwpt_id = 0; uint32_t dev_id = self->device_id; uint32_t viommu_id = 0; + uint32_t virq_id; + uint32_t virq_fd;
if (dev_id) { /* Negative test -- invalid hwpt */ @@ -595,16 +597,25 @@ TEST_F(iommufd_ioas, viommu_default) sizeof(data)); test_cmd_mock_domain_replace(self->stdev_id, nested_hwpt_id);
+ test_cmd_virq_alloc(viommu_id, IOMMU_VIRQ_TYPE_SELFTEST, + &virq_id, &virq_fd); + test_err_virq_alloc(EEXIST, viommu_id, IOMMU_VIRQ_TYPE_SELFTEST, + &virq_id, &virq_fd); + /* Set vdev_id to 0x99, unset it, and set to 0x88 */ test_cmd_viommu_set_vdev_id(viommu_id, dev_id, 0x99); + test_cmd_trigger_virq(dev_id, virq_fd, 0x99); test_err_viommu_set_vdev_id(EEXIST, viommu_id, dev_id, 0x99); test_err_viommu_unset_vdev_id(EINVAL, viommu_id, dev_id, 0x88); test_cmd_viommu_unset_vdev_id(viommu_id, dev_id, 0x99); test_cmd_viommu_set_vdev_id(viommu_id, dev_id, 0x88); + test_cmd_trigger_virq(dev_id, virq_fd, 0x88); + close(virq_fd);
test_cmd_mock_domain_replace(self->stdev_id, hwpt_id); test_ioctl_destroy(nested_hwpt_id); test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); + test_ioctl_destroy(virq_id); test_ioctl_destroy(viommu_id); test_ioctl_destroy(hwpt_id); } else { diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 0a81827b903f..9fec38f45e0e 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -9,6 +9,7 @@ #include <sys/ioctl.h> #include <stdint.h> #include <assert.h> +#include <poll.h>
#include "../kselftest_harness.h" #include "../../../../drivers/iommu/iommufd/iommufd_test.h" @@ -888,3 +889,66 @@ static int _test_cmd_viommu_unset_vdev_id(int fd, __u32 viommu_id, EXPECT_ERRNO(_errno, \ _test_cmd_viommu_unset_vdev_id(self->fd, viommu_id, \ idev_id, vdev_id)) + +static int _test_ioctl_virq_alloc(int fd, __u32 viommu_id, __u32 type, + __u32 *virq_id, __u32 *virq_fd) +{ + struct iommu_virq_alloc cmd = { + .size = sizeof(cmd), + .type = type, + .viommu_id = viommu_id, + }; + int ret; + + ret = ioctl(fd, IOMMU_VIRQ_ALLOC, &cmd); + if (ret) + return ret; + if (virq_id) + *virq_id = cmd.out_virq_id; + if (virq_fd) + *virq_fd = cmd.out_virq_fd; + return 0; +} + +#define test_cmd_virq_alloc(viommu_id, type, virq_id, virq_fd) \ + ASSERT_EQ(0, _test_ioctl_virq_alloc(self->fd, viommu_id, type, \ + virq_id, virq_fd)) +#define test_err_virq_alloc(_errno, viommu_id, type, virq_id, virq_fd) \ + EXPECT_ERRNO(_errno, \ + _test_ioctl_virq_alloc(self->fd, viommu_id, type, \ + virq_id, virq_fd)) + +static int _test_cmd_trigger_virq(int fd, __u32 dev_id, + __u32 event_fd, __u32 vdev_id) +{ + struct iommu_test_cmd trigger_virq_cmd = { + .size = sizeof(trigger_virq_cmd), + .op = IOMMU_TEST_OP_TRIGGER_VIRQ, + .trigger_virq = { + .dev_id = dev_id, + }, + }; + struct pollfd pollfd = { .fd = event_fd, .events = POLLIN }; + struct iommu_viommu_irq_selftest irq; + ssize_t bytes; + int ret; + + ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VIRQ), + &trigger_virq_cmd); + if (ret) + return ret; + + ret = poll(&pollfd, 1, 1000); + if (ret < 0) + return ret; + + bytes = read(event_fd, &irq, sizeof(irq)); + if (bytes <= 0) + return -EIO; + + return irq.vdev_id == vdev_id ? 0 : -EINVAL; +} + +#define test_cmd_trigger_virq(dev_id, event_fd, vdev_id) \ + ASSERT_EQ(0, _test_cmd_trigger_virq(self->fd, dev_id, \ + event_fd, vdev_id))
Aside from the IOPF framework, iommufd provides an additional pathway to report a hardware event or IRQ, via the VIRQ of the VIOMMU infrastructure.
Implement the set/unset_vdev_id viommu ops to take control of the vdev_id's lifecycle. Lock it properly so the threaded IRQ handler can read out the viommu pointer and the virtual SID to call iommufd_viommu_report_irq().
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 109 +++++++++++++++----- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 + include/uapi/linux/iommufd.h | 14 +++ 3 files changed, 97 insertions(+), 28 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ad43351145d0..28daa0b253c6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1810,6 +1810,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) { int ret = 0; u32 perm = 0; + struct iommu_domain *domain; struct arm_smmu_master *master; bool ssid_valid = evt[0] & EVTQ_0_SSV; u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]); @@ -1830,41 +1831,59 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) if (evt[1] & EVTQ_1_S2) return -EFAULT;
- if (!(evt[1] & EVTQ_1_STALL)) - return -EOPNOTSUPP; - - if (evt[1] & EVTQ_1_RnW) - perm |= IOMMU_FAULT_PERM_READ; - else - perm |= IOMMU_FAULT_PERM_WRITE; - - if (evt[1] & EVTQ_1_InD) - perm |= IOMMU_FAULT_PERM_EXEC; - - if (evt[1] & EVTQ_1_PnU) - perm |= IOMMU_FAULT_PERM_PRIV; - - flt->type = IOMMU_FAULT_PAGE_REQ; - flt->prm = (struct iommu_fault_page_request) { - .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE, - .grpid = FIELD_GET(EVTQ_1_STAG, evt[1]), - .perm = perm, - .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]), - }; - - if (ssid_valid) { - flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; - flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]); - } - mutex_lock(&smmu->streams_mutex); master = arm_smmu_find_master(smmu, sid); if (!master) { ret = -EINVAL; goto out_unlock; } + domain = iommu_get_domain_for_dev(master->dev); + + if (evt[1] & EVTQ_1_STALL) { + if (evt[1] & EVTQ_1_RnW) + perm |= IOMMU_FAULT_PERM_READ; + else + perm |= IOMMU_FAULT_PERM_WRITE; + + if (evt[1] & EVTQ_1_InD) + perm |= IOMMU_FAULT_PERM_EXEC; + + if (evt[1] & EVTQ_1_PnU) + perm |= IOMMU_FAULT_PERM_PRIV; + + flt->type = IOMMU_FAULT_PAGE_REQ; + flt->prm = (struct iommu_fault_page_request) { + .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE, + .grpid = FIELD_GET(EVTQ_1_STAG, evt[1]), + .perm = perm, + .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]), + }; + + if (ssid_valid) { + flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; + flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]); + } + + iommu_report_device_fault(master->dev, &fault_evt); + } else if (domain && domain->type == IOMMU_DOMAIN_NESTED) { + mutex_lock(&master->lock); + if (master->vdev_id) { + struct iommu_virq_arm_smmuv3 virq_data = + *(struct iommu_virq_arm_smmuv3 *)evt;
- iommu_report_device_fault(master->dev, &fault_evt); + virq_data.evt[0] &= ~EVTQ_0_SID; + virq_data.evt[0] |= + FIELD_PREP(EVTQ_0_SID, master->vdev_id->id); + + iommufd_viommu_report_irq(master->vdev_id->viommu, + IOMMU_VIRQ_TYPE_ARM_SMMUV3, + &virq_data, sizeof(virq_data)); + } + mutex_unlock(&master->lock); + } else { + /* Unhandled events should be pinned */ + ret = -EFAULT; + } out_unlock: mutex_unlock(&smmu->streams_mutex); return ret; @@ -3750,6 +3769,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
master->dev = dev; master->smmu = smmu; + mutex_init(&master->lock); dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master); @@ -3802,6 +3822,7 @@ static void arm_smmu_release_device(struct device *dev) arm_smmu_remove_master(master); if (arm_smmu_cdtab_allocated(&master->cd_table)) arm_smmu_free_cd_tables(master); + mutex_destroy(&master->lock); kfree(master); }
@@ -3937,6 +3958,36 @@ static int arm_smmu_def_domain_type(struct device *dev) return 0; }
+static struct iommufd_vdev_id * +arm_smmu_viommu_set_vdev_id(struct iommufd_viommu *viommu, struct device *dev, + u64 id) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct iommufd_vdev_id *vdev_id; + + vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); + if (!vdev_id) + return ERR_PTR(-ENOMEM); + + mutex_lock(&master->lock); + master->vdev_id = vdev_id; + mutex_unlock(&master->lock); + + return vdev_id; +} + +static void arm_smmu_viommu_unset_vdev_id(struct iommufd_vdev_id *vdev_id) +{ + struct device *dev = iommufd_vdev_id_to_dev(vdev_id); + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + + mutex_lock(&master->lock); + master->vdev_id = NULL; + mutex_unlock(&master->lock); + + /* IOMMUFD core frees the memory of vdev_id */ +} + static int arm_smmu_viommu_cache_invalidate(struct iommufd_viommu *viommu, struct iommu_user_data_array *array) { @@ -3977,6 +4028,8 @@ static struct iommu_ops arm_smmu_ops = { .iova_to_phys = arm_smmu_iova_to_phys, .free = arm_smmu_domain_free_paging, .default_viommu_ops = &(const struct iommufd_viommu_ops) { + .set_vdev_id = arm_smmu_viommu_set_vdev_id, + .unset_vdev_id = arm_smmu_viommu_unset_vdev_id, .cache_invalidate = arm_smmu_viommu_cache_invalidate, } } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 6930810b85cb..5d20a67683e6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -788,6 +788,8 @@ struct arm_smmu_master { struct arm_smmu_device *smmu; struct device *dev; struct arm_smmu_stream *streams; + struct mutex lock; + struct iommufd_vdev_id *vdev_id; /* Locked by the iommu core using the group mutex */ struct arm_smmu_ctx_desc_cfg cd_table; unsigned int num_streams; diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index f9ec07efed8d..1dc2c0b05af7 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -956,9 +956,23 @@ struct iommu_viommu_unset_vdev_id { /** * enum iommu_virq_type - Virtual IRQ Type * @IOMMU_VIRQ_TYPE_NONE: INVALID type + * @IOMMU_VIRQ_TYPE_ARM_SMMUV3: ARM SMMUv3 Virtual Event */ enum iommu_virq_type { IOMMU_VIRQ_TYPE_NONE = 0, + IOMMU_VIRQ_TYPE_ARM_SMMUV3 = 1, +}; + +/** + * struct iommu_virq_arm_smmuv3 - ARM SMMUv3 Virtual IRQ + * (IOMMU_VIRQ_TYPE_ARM_SMMUV3) + * @evt: 256-bit ARM SMMUv3 Event record, little-endian. + * + * StreamID field reports a virtual device ID. To receive a virtual IRQ for a + * device, it must set its virtual device ID via IOMMU_VIOMMU_SET_VDEV_ID. + */ +struct iommu_virq_arm_smmuv3 { + __aligned_u64 evt[4]; };
/**