This is the basic functionality for iommufd to support iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the device to a new IOAS or HWPT. Replace does this in a way that failure leaves things unchanged, and uses the iommu_group_replace_domain() API to allow the iommu driver to perform an optional non-disruptive change.
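For a consuming driver the call is symmetric with attach; a rough sketch of the intended usage (the surrounding variables are hypothetical, error handling condensed):

	u32 pt_id = new_ioas_or_hwpt_id;
	int rc;

	rc = iommufd_device_replace(idev, &pt_id);
	if (rc)
		return rc;	/* failure left the old HWPT attached */
	/* on success pt_id reports the HWPT the device is now using */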
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and used by attach or replace. At this point it isn't very useful since the HWPT is the same as the automatically managed HWPT from the IOAS. However a following series will allow userspace to customize the created HWPT.
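A rough userspace sketch of the new ioctl, using the uapi added by this series (error handling omitted):

	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.dev_id = dev_id,	/* device the HWPT must be compatible with */
		.pt_id = ioas_id,	/* IOAS providing the IOVA mappings */
	};

	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd))
		return -1;
	/* cmd.out_hwpt_id can then be used as a pt_id for attach/replace */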
The implementation is complicated because we have to introduce some per-iommu_group memory in iommufd and redo how we think about multi-device groups to be more explicit. This solves all the locking problems in the prior attempts.
This series is infrastructure work for the following series which:
 - Add replace for attach
 - Expose replace through VFIO APIs
 - Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and accumulate the following series when they are ready so we can have a stable base and make more incremental progress. When we have all the parts together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v7:
 - Rebase to v6.4-rc2, update to new signature of iommufd_get_ioas()

v6: https://lore.kernel.org/r/0-v6-fdb604df649a+369-iommufd_alloc_jgg@nvidia.com
 - Go back to the v4 locking arrangement with both the attach/detach
   igroup->locks inside the functions; Kevin says he needs this for a
   followup series. This still fixes the syzkaller bug
 - Fix two more error unwind locking bugs where
   iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
   Make sure fail_nth will catch these mistakes
 - Add a patch allowing objects to have a different abort than destroy
   function; it allows hwpt abort to require the caller to continue to
   hold the lock and enforces this with lockdep.

v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
 - Go back to the v3 version of the code, keeping the comment changes
   from v4. Syzkaller says the group lock change in v4 didn't work.
 - Adjust the fail_nth test to cover the path syzkaller found. We need to
   have an ioas with a mapped page installed to inject a failure during
   domain attachment.

v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.co...
 - Refine comments and commit messages
 - Move the group lock into iommufd_hw_pagetable_attach()
 - Fix error unwind in iommufd_device_do_replace()

v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
 - Refine comments and commit messages
 - Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
   set on success
 - Reject replace on non-attached devices
 - Add missing __reserved check for IOMMU_HWPT_ALLOC

v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.co...
 - Use WARN_ON for the igroup->group test and move that logic to a
   function iommufd_group_try_get()
 - Change igroup->devices to an igroup->device_list; replace will need to
   iterate over all attached idevs
 - Rename to iommufd_group_setup_msi()
 - New patch to export iommu_get_resv_regions()
 - New patch to use per-device reserved regions instead of per-group
   regions
 - Split out the reorganizing of iommufd_device_change_pt() from the
   replace patch
 - Replace uses the per-dev reserved regions
 - Use stdev_id in a few more places in the selftest
 - Fix error handling in IOMMU_HWPT_ALLOC
 - Clarify comments
 - Rebase on v6.3-rc1

v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia....
Jason Gunthorpe (17):
  iommufd: Move isolated msi enforcement to iommufd_device_bind()
  iommufd: Add iommufd_group
  iommufd: Replace the hwpt->devices list with iommufd_group
  iommu: Export iommu_get_resv_regions()
  iommufd: Keep track of each device's reserved regions instead of groups
  iommufd: Use the iommufd_group to avoid duplicate MSI setup
  iommufd: Make sw_msi_start a group global
  iommufd: Move putting a hwpt to a helper function
  iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
  iommufd: Allow a hwpt to be aborted after allocation
  iommufd: Fix locking around hwpt allocation
  iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt
  iommufd: Add iommufd_device_replace()
  iommufd: Make destroy_rwsem use a lock class per object type
  iommufd: Add IOMMU_HWPT_ALLOC
  iommufd/selftest: Return the real idev id from selftest mock_domain
  iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC

Nicolin Chen (2):
  iommu: Introduce a new iommu_group_replace_domain() API
  iommufd/selftest: Test iommufd_device_replace()

 drivers/iommu/iommu-priv.h                    |  10 +
 drivers/iommu/iommu.c                         |  41 +-
 drivers/iommu/iommufd/device.c                | 553 +++++++++++++-----
 drivers/iommu/iommufd/hw_pagetable.c          | 112 +++-
 drivers/iommu/iommufd/io_pagetable.c          |  32 +-
 drivers/iommu/iommufd/iommufd_private.h       |  52 +-
 drivers/iommu/iommufd/iommufd_test.h          |   6 +
 drivers/iommu/iommufd/main.c                  |  24 +-
 drivers/iommu/iommufd/selftest.c              |  40 ++
 include/linux/iommufd.h                       |   1 +
 include/uapi/linux/iommufd.h                  |  26 +
 tools/testing/selftests/iommu/iommufd.c       |  67 ++-
 .../selftests/iommu/iommufd_fail_nth.c        |  67 ++-
 tools/testing/selftests/iommu/iommufd_utils.h |  63 +-
 14 files changed, 868 insertions(+), 226 deletions(-)
 create mode 100644 drivers/iommu/iommu-priv.h
base-commit: f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6
With the recent rework this no longer needs to be done at domain attachment time; we know whether the device is usable by iommufd when we bind it.
The value of msi_device_has_isolated_msi() is not allowed to change while a driver is bound.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 38 ++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4f9b2142274ceb..927f2051715f64 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -60,6 +60,26 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	if (!group)
 		return ERR_PTR(-ENODEV);

+	/*
+	 * For historical compat with VFIO the insecure interrupt path is
+	 * allowed if the module parameter is set. Secure/Isolated means that a
+	 * MemWr operation from the device (eg a simple DMA) cannot trigger an
+	 * interrupt outside this iommufd context.
+	 */
+	if (!iommufd_selftest_is_mock_dev(dev) &&
+	    !iommu_group_has_isolated_msi(group)) {
+		if (!allow_unsafe_interrupts) {
+			rc = -EPERM;
+			goto out_group_put;
+		}
+
+		dev_warn(
+			dev,
+			"MSI interrupts are not secure, they cannot be isolated by the platform. "
+			"Check that platform features like interrupt remapping are enabled. "
+			"Use the \"allow_unsafe_interrupts\" module parameter to override\n");
+	}
+
 	rc = iommu_device_claim_dma_owner(dev, ictx);
 	if (rc)
 		goto out_group_put;
@@ -146,24 +166,6 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 		 */
 		hwpt->msi_cookie = true;
 	}
-
-	/*
-	 * For historical compat with VFIO the insecure interrupt path is
-	 * allowed if the module parameter is set. Insecure means that a MemWr
-	 * operation from the device (eg a simple DMA) cannot trigger an
-	 * interrupt outside this iommufd context.
-	 */
-	if (!iommufd_selftest_is_mock_dev(idev->dev) &&
-	    !iommu_group_has_isolated_msi(idev->group)) {
-		if (!allow_unsafe_interrupts)
-			return -EPERM;
-
-		dev_warn(
-			idev->dev,
-			"MSI interrupts are not secure, they cannot be isolated by the platform. "
-			"Check that platform features like interrupt remapping are enabled. "
-			"Use the \"allow_unsafe_interrupts\" module parameter to override\n");
-	}
 	return 0;
 }
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
With the recent rework this no longer needs to be done at domain attachment time; we know whether the device is usable by iommufd when we bind it.
The value of msi_device_has_isolated_msi() is not allowed to change while a driver is bound.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Best regards, baolu
When the hwpt to device attachment is fairly static we could get away with the simple approach of keeping track of the groups via a device list. But with replace this is infeasible.
Add an automatically managed struct that is 1:1 with the iommu_group per-ictx so we can store the necessary tracking information there.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 141 +++++++++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h |   9 +-
 drivers/iommu/iommufd/main.c            |   2 +
 3 files changed, 135 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 927f2051715f64..8c27f6901446e8 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -15,13 +15,121 @@ MODULE_PARM_DESC(
 	"Allow IOMMUFD to bind to devices even if the platform cannot isolate "
 	"the MSI interrupt window. Enabling this is a security weakness.");

+static void iommufd_group_release(struct kref *kref)
+{
+	struct iommufd_group *igroup =
+		container_of(kref, struct iommufd_group, ref);
+
+	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
+		   NULL, GFP_KERNEL);
+	iommu_group_put(igroup->group);
+	kfree(igroup);
+}
+
+static void iommufd_put_group(struct iommufd_group *group)
+{
+	kref_put(&group->ref, iommufd_group_release);
+}
+
+static bool iommufd_group_try_get(struct iommufd_group *igroup,
+				  struct iommu_group *group)
+{
+	if (!igroup)
+		return false;
+	/*
+	 * group IDs cannot be re-used until the group is put back, which does
+	 * not happen if we could get an igroup pointer under the xa_lock.
+	 */
+	if (WARN_ON(igroup->group != group))
+		return false;
+	return kref_get_unless_zero(&igroup->ref);
+}
+
+/*
+ * iommufd needs to store some more data for each iommu_group, we keep a
+ * parallel xarray indexed by iommu_group id to hold this instead of putting it
+ * in the core structure. To keep things simple the iommufd_group memory is
+ * unique within the iommufd_ctx. This makes it easy to check there are no
+ * memory leaks.
+ */
+static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
+					       struct device *dev)
+{
+	struct iommufd_group *new_igroup;
+	struct iommufd_group *cur_igroup;
+	struct iommufd_group *igroup;
+	struct iommu_group *group;
+	unsigned int id;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return ERR_PTR(-ENODEV);
+
+	id = iommu_group_id(group);
+
+	xa_lock(&ictx->groups);
+	igroup = xa_load(&ictx->groups, id);
+	if (iommufd_group_try_get(igroup, group)) {
+		xa_unlock(&ictx->groups);
+		iommu_group_put(group);
+		return igroup;
+	}
+	xa_unlock(&ictx->groups);
+
+	new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+	if (!new_igroup) {
+		iommu_group_put(group);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	kref_init(&new_igroup->ref);
+	/* group reference moves into new_igroup */
+	new_igroup->group = group;
+
+	/*
+	 * The ictx is not additionally refcounted here because all objects
+	 * using an igroup must put it before their destroy completes.
+	 */
+	new_igroup->ictx = ictx;
+
+	/*
+	 * We dropped the lock so igroup is invalid. NULL is a safe and likely
+	 * value to assume for the xa_cmpxchg algorithm.
+	 */
+	cur_igroup = NULL;
+	xa_lock(&ictx->groups);
+	while (true) {
+		igroup = __xa_cmpxchg(&ictx->groups, id, cur_igroup, new_igroup,
+				      GFP_KERNEL);
+		if (xa_is_err(igroup)) {
+			xa_unlock(&ictx->groups);
+			iommufd_put_group(new_igroup);
+			return ERR_PTR(xa_err(igroup));
+		}
+
+		/* new_igroup was successfully installed */
+		if (cur_igroup == igroup) {
+			xa_unlock(&ictx->groups);
+			return new_igroup;
+		}
+
+		/* Check again if the current group is any good */
+		if (iommufd_group_try_get(igroup, group)) {
+			xa_unlock(&ictx->groups);
+			iommufd_put_group(new_igroup);
+			return igroup;
+		}
+		cur_igroup = igroup;
+	}
+}
+
 void iommufd_device_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
 	iommu_device_release_dma_owner(idev->dev);
-	iommu_group_put(idev->group);
+	iommufd_put_group(idev->igroup);
 	if (!iommufd_selftest_is_mock_dev(idev->dev))
 		iommufd_ctx_put(idev->ictx);
 }
@@ -46,7 +154,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id)
 {
 	struct iommufd_device *idev;
-	struct iommu_group *group;
+	struct iommufd_group *igroup;
 	int rc;

 	/*
@@ -56,9 +164,9 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
 		return ERR_PTR(-EINVAL);

-	group = iommu_group_get(dev);
-	if (!group)
-		return ERR_PTR(-ENODEV);
+	igroup = iommufd_get_group(ictx, dev);
+	if (IS_ERR(igroup))
+		return ERR_CAST(igroup);

 	/*
 	 * For historical compat with VFIO the insecure interrupt path is
@@ -67,7 +175,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	 * interrupt outside this iommufd context.
 	 */
 	if (!iommufd_selftest_is_mock_dev(dev) &&
-	    !iommu_group_has_isolated_msi(group)) {
+	    !iommu_group_has_isolated_msi(igroup->group)) {
 		if (!allow_unsafe_interrupts) {
 			rc = -EPERM;
 			goto out_group_put;
@@ -97,8 +205,8 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 		device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
 	/* The calling driver is a user until iommufd_device_unbind() */
 	refcount_inc(&idev->obj.users);
-	/* group refcount moves into iommufd_device */
-	idev->group = group;
+	/* igroup refcount moves into iommufd_device */
+	idev->igroup = igroup;

 	/*
 	 * If the caller fails after this success it must call
@@ -113,7 +221,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 out_release_owner:
 	iommu_device_release_dma_owner(dev);
 out_group_put:
-	iommu_group_put(group);
+	iommufd_put_group(igroup);
 	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
@@ -170,14 +278,14 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 }
 static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
-					   struct iommu_group *group)
+					   struct iommufd_group *igroup)
 {
 	struct iommufd_device *cur_dev;

 	lockdep_assert_held(&hwpt->devices_lock);

 	list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
-		if (cur_dev->group == group)
+		if (cur_dev->igroup->group == igroup->group)
 			return true;
 	return false;
 }
@@ -211,7 +319,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	}

 	rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt, idev->dev,
-						   idev->group, &sw_msi_start);
+						   idev->igroup->group,
+						   &sw_msi_start);
 	if (rc)
 		return rc;

@@ -223,8 +332,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * FIXME: Hack around missing a device-centric iommu api, only attach to
 	 * the group once for the first device that is in the group.
 	 */
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
-		rc = iommu_attach_group(hwpt->domain, idev->group);
+	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup)) {
+		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
 	}
@@ -237,8 +346,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
 				 struct iommufd_device *idev)
 {
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group))
-		iommu_detach_group(hwpt->domain, idev->group);
+	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup))
+		iommu_detach_group(hwpt->domain, idev->igroup->group);
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
 }
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index b38e67d1988bdb..f45615f19798e6 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -17,6 +17,7 @@ struct iommufd_device;
 struct iommufd_ctx {
 	struct file *file;
 	struct xarray objects;
+	struct xarray groups;

 	u8 account_mode;
 	/* Compatibility with VFIO no iommu */
@@ -262,6 +263,12 @@ void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
 				 struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);

+struct iommufd_group {
+	struct kref ref;
+	struct iommufd_ctx *ictx;
+	struct iommu_group *group;
+};
+
 /*
  * A iommufd_device object represents the binding relationship between a
  * consuming driver and the iommufd. These objects are created/destroyed by
@@ -270,12 +277,12 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 struct iommufd_device {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
+	struct iommufd_group *igroup;
 	struct iommufd_hw_pagetable *hwpt;
 	/* Head at iommufd_hw_pagetable::devices */
 	struct list_head devices_item;
 	/* always the physical device */
 	struct device *dev;
-	struct iommu_group *group;
 	bool enforce_cache_coherency;
 };

diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3fbe636c3d8a69..e5ed5dfa91a0b5 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -183,6 +183,7 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
 	}

 	xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
+	xa_init(&ictx->groups);
 	ictx->file = filp;
 	filp->private_data = ictx;
 	return 0;
@@ -218,6 +219,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
 		if (WARN_ON(!destroyed))
 			break;
 	}
+	WARN_ON(!xa_empty(&ictx->groups));
 	kfree(ictx);
 	return 0;
 }
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
void iommufd_device_destroy(struct iommufd_object *obj)
{
	struct iommufd_device *idev =
		container_of(obj, struct iommufd_device, obj);

	iommu_device_release_dma_owner(idev->dev);
-	iommu_group_put(idev->group);
+	iommufd_put_group(idev->igroup);
	if (!iommufd_selftest_is_mock_dev(idev->dev))
		iommufd_ctx_put(idev->ictx);
}

@@ -46,7 +154,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
					   struct device *dev, u32 *id)
{
	struct iommufd_device *idev;
-	struct iommu_group *group;
+	struct iommufd_group *igroup;
	int rc;

	/*
@@ -56,9 +164,9 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
		return ERR_PTR(-EINVAL);

-	group = iommu_group_get(dev);
-	if (!group)
-		return ERR_PTR(-ENODEV);
+	igroup = iommufd_get_group(ictx, dev);
+	if (IS_ERR(igroup))
+		return ERR_CAST(igroup);

	/*
	 * For historical compat with VFIO the insecure interrupt path is
Hi Jason,
Perhaps I am asking a silly question. The iommufd_group is get in iommufd_device_bind(), but put in iommufd_device_destroy(). Why not put it in iommufd_device_unbind()?
Best regards, baolu
On Tue, May 16, 2023 at 10:43:23AM +0800, Baolu Lu wrote:
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
void iommufd_device_destroy(struct iommufd_object *obj)
{
	struct iommufd_device *idev =
		container_of(obj, struct iommufd_device, obj);

	iommu_device_release_dma_owner(idev->dev);
-	iommu_group_put(idev->group);
+	iommufd_put_group(idev->igroup);
	if (!iommufd_selftest_is_mock_dev(idev->dev))
		iommufd_ctx_put(idev->ictx);
}

@@ -46,7 +154,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
					   struct device *dev, u32 *id)
{
	struct iommufd_device *idev;
-	struct iommu_group *group;
+	struct iommufd_group *igroup;
	int rc;

	/*
@@ -56,9 +164,9 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
		return ERR_PTR(-EINVAL);

-	group = iommu_group_get(dev);
-	if (!group)
-		return ERR_PTR(-ENODEV);
+	igroup = iommufd_get_group(ictx, dev);
+	if (IS_ERR(igroup))
+		return ERR_CAST(igroup);

	/*
	 * For historical compat with VFIO the insecure interrupt path is
Hi Jason,
Perhaps I am asking a silly question. The iommufd_group is get in iommufd_device_bind(), but put in iommufd_device_destroy(). Why not put it in iommufd_device_unbind()?
It basically is like that; iommufd_device_destroy() is a helper that is only called by iommufd_device_unbind() through the usual destruction mechanism.
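Roughly (condensed from the existing code, not part of this series' diffs):

	void iommufd_device_unbind(struct iommufd_device *idev)
	{
		bool was_destroyed;

		/* Invokes iommufd_device_destroy() via the object type's ops */
		was_destroyed = iommufd_object_destroy_user(idev->ictx, &idev->obj);
		WARN_ON(!was_destroyed);
	}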
Jason
On 5/16/23 8:54 PM, Jason Gunthorpe wrote:
On Tue, May 16, 2023 at 10:43:23AM +0800, Baolu Lu wrote:
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
void iommufd_device_destroy(struct iommufd_object *obj)
{
	struct iommufd_device *idev =
		container_of(obj, struct iommufd_device, obj);

	iommu_device_release_dma_owner(idev->dev);
-	iommu_group_put(idev->group);
+	iommufd_put_group(idev->igroup);
	if (!iommufd_selftest_is_mock_dev(idev->dev))
		iommufd_ctx_put(idev->ictx);
}

@@ -46,7 +154,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
					   struct device *dev, u32 *id)
{
	struct iommufd_device *idev;
-	struct iommu_group *group;
+	struct iommufd_group *igroup;
	int rc;

	/*
@@ -56,9 +164,9 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
		return ERR_PTR(-EINVAL);

-	group = iommu_group_get(dev);
-	if (!group)
-		return ERR_PTR(-ENODEV);
+	igroup = iommufd_get_group(ictx, dev);
+	if (IS_ERR(igroup))
+		return ERR_CAST(igroup);

	/*
	 * For historical compat with VFIO the insecure interrupt path is
Hi Jason,
Perhaps I am asking a silly question. The iommufd_group is get in iommufd_device_bind(), but put in iommufd_device_destroy(). Why not put it in iommufd_device_unbind()?
It basically is like that, iommufd_device_destroy() is a helper that is only called by iommufd_device_unbind() through the usual destruction mechanism.
Oh, yes. Thank you for the explanation.
Best regards, baolu
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT.
The iommufd_group stores the currently assigned HWPT for the entire group and we can manage the per-device attach/detach with a list in the iommufd_group.
For destruction the flow is organized to make the following patches easier: the actual call to iommufd_object_destroy_user() is done at the top of the call chain without holding any locks. The HWPT to be destroyed is returned out of the locked region to make this possible. Later patches create locking that requires this.
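In caller terms the flow becomes (a condensed view of the iommufd_device_detach() hunk below):

	hwpt = iommufd_hw_pagetable_detach(idev);	/* igroup->lock taken/dropped inside */
	if (hwpt->auto_domain)
		/* no locks held at this point */
		iommufd_object_destroy_user(idev->ictx, &hwpt->obj);
	else
		refcount_dec(&hwpt->obj.users);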
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 100 +++++++++++------------
 drivers/iommu/iommufd/hw_pagetable.c    |  22 +----
 drivers/iommu/iommufd/iommufd_private.h |  13 ++-
 3 files changed, 54 insertions(+), 81 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 8c27f6901446e8..23ebf2065e20de 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -20,9 +20,12 @@ static void iommufd_group_release(struct kref *kref)
 	struct iommufd_group *igroup =
 		container_of(kref, struct iommufd_group, ref);

+	WARN_ON(igroup->hwpt || !list_empty(&igroup->device_list));
+
 	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
 		   NULL, GFP_KERNEL);
 	iommu_group_put(igroup->group);
+	mutex_destroy(&igroup->lock);
 	kfree(igroup);
 }

@@ -83,6 +86,8 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	}

 	kref_init(&new_igroup->ref);
+	mutex_init(&new_igroup->lock);
+	INIT_LIST_HEAD(&new_igroup->device_list);
 	/* group reference moves into new_igroup */
 	new_igroup->group = group;
@@ -277,29 +282,18 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 	return 0;
 }

-static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
-					   struct iommufd_group *igroup)
-{
-	struct iommufd_device *cur_dev;
-
-	lockdep_assert_held(&hwpt->devices_lock);
-
-	list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
-		if (cur_dev->igroup->group == igroup->group)
-			return true;
-	return false;
-}
-
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev)
 {
 	phys_addr_t sw_msi_start = PHYS_ADDR_MAX;
 	int rc;

-	lockdep_assert_held(&hwpt->devices_lock);
+	mutex_lock(&idev->igroup->lock);

-	if (WARN_ON(idev->hwpt))
-		return -EINVAL;
+	if (idev->igroup->hwpt != NULL && idev->igroup->hwpt != hwpt) {
+		rc = -EINVAL;
+		goto err_unlock;
+	}

 	/*
 	 * Try to upgrade the domain we have, it is an iommu driver bug to
@@ -313,8 +307,9 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 			hwpt->domain->ops->enforce_cache_coherency(
 				hwpt->domain);
 		if (!hwpt->enforce_cache_coherency) {
-			WARN_ON(list_empty(&hwpt->devices));
-			return -EINVAL;
+			WARN_ON(list_empty(&idev->igroup->device_list));
+			rc = -EINVAL;
+			goto err_unlock;
 		}
 	}

@@ -322,51 +317,52 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 					idev->igroup->group,
 					&sw_msi_start);
 	if (rc)
-		return rc;
+		goto err_unlock;

 	rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start);
 	if (rc)
 		goto err_unresv;

 	/*
-	 * FIXME: Hack around missing a device-centric iommu api, only attach to
-	 * the group once for the first device that is in the group.
+	 * Only attach to the group once for the first device that is in the
+	 * group. All the other devices will follow this attachment. The user
+	 * should attach every device individually to the hwpt as the per-device
+	 * reserved regions are only updated during individual device
+	 * attachment.
 	 */
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup)) {
+	if (list_empty(&idev->igroup->device_list)) {
 		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
+		idev->igroup->hwpt = hwpt;
 	}
+	refcount_inc(&hwpt->obj.users);
+	list_add_tail(&idev->group_item, &idev->igroup->device_list);
+	mutex_unlock(&idev->igroup->lock);
 	return 0;
 err_unresv:
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+err_unlock:
+	mutex_unlock(&idev->igroup->lock);
 	return rc;
 }
-void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
-				 struct iommufd_device *idev)
+struct iommufd_hw_pagetable *
+iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 {
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup))
+	struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
+
+	mutex_lock(&idev->igroup->lock);
+	list_del(&idev->group_item);
+	if (list_empty(&idev->igroup->device_list)) {
 		iommu_detach_group(hwpt->domain, idev->igroup->group);
+		idev->igroup->hwpt = NULL;
+	}
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
-}
+	mutex_unlock(&idev->igroup->lock);

-static int iommufd_device_do_attach(struct iommufd_device *idev,
-				    struct iommufd_hw_pagetable *hwpt)
-{
-	int rc;
-
-	mutex_lock(&hwpt->devices_lock);
-	rc = iommufd_hw_pagetable_attach(hwpt, idev);
-	if (rc)
-		goto out_unlock;
-
-	idev->hwpt = hwpt;
-	refcount_inc(&hwpt->obj.users);
-	list_add(&idev->devices_item, &hwpt->devices);
-out_unlock:
-	mutex_unlock(&hwpt->devices_lock);
-	return rc;
+	/* Caller must destroy hwpt */
+	return hwpt;
 }
 /*
@@ -375,7 +371,7 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
  * Automatic domain selection will never pick a manually created domain.
  */
 static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
-					  struct iommufd_ioas *ioas)
+					  struct iommufd_ioas *ioas, u32 *pt_id)
 {
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
@@ -392,7 +388,7 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,

 		if (!iommufd_lock_obj(&hwpt->obj))
 			continue;
-		rc = iommufd_device_do_attach(idev, hwpt);
+		rc = iommufd_hw_pagetable_attach(hwpt, idev);
 		iommufd_put_object(&hwpt->obj);

 		/*
@@ -402,6 +398,7 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		 */
 		if (rc == -EINVAL)
 			continue;
+		*pt_id = hwpt->obj.id;
 		goto out_unlock;
 	}

@@ -411,6 +408,7 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		goto out_unlock;
 	}
 	hwpt->auto_domain = true;
+	*pt_id = hwpt->obj.id;

 	mutex_unlock(&ioas->mutex);
 	iommufd_object_finalize(idev->ictx, &hwpt->obj);
@@ -446,7 +444,7 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_hw_pagetable *hwpt =
 			container_of(pt_obj, struct iommufd_hw_pagetable, obj);

-		rc = iommufd_device_do_attach(idev, hwpt);
+		rc = iommufd_hw_pagetable_attach(hwpt, idev);
 		if (rc)
 			goto out_put_pt_obj;
 		break;
@@ -455,7 +453,7 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_ioas *ioas =
 			container_of(pt_obj, struct iommufd_ioas, obj);

-		rc = iommufd_device_auto_get_domain(idev, ioas);
+		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
 		if (rc)
 			goto out_put_pt_obj;
 		break;
@@ -466,7 +464,6 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 	}

 	refcount_inc(&idev->obj.users);
-	*pt_id = idev->hwpt->obj.id;
 	rc = 0;

 out_put_pt_obj:
@@ -484,14 +481,9 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
  */
 void iommufd_device_detach(struct iommufd_device *idev)
 {
-	struct iommufd_hw_pagetable *hwpt = idev->hwpt;
-
-	mutex_lock(&hwpt->devices_lock);
-	list_del(&idev->devices_item);
-	idev->hwpt = NULL;
-	iommufd_hw_pagetable_detach(hwpt, idev);
-	mutex_unlock(&hwpt->devices_lock);
+	struct iommufd_hw_pagetable *hwpt;

+	hwpt = iommufd_hw_pagetable_detach(idev);
 	if (hwpt->auto_domain)
 		iommufd_object_destroy_user(idev->ictx, &hwpt->obj);
 	else
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 6cdb6749d359f3..bdb76cdb1dc347 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -11,8 +11,6 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
-	WARN_ON(!list_empty(&hwpt->devices));
-
 	if (!list_empty(&hwpt->hwpt_item)) {
 		mutex_lock(&hwpt->ioas->mutex);
 		list_del(&hwpt->hwpt_item);
@@ -25,7 +23,6 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	iommu_domain_free(hwpt->domain);

 	refcount_dec(&hwpt->ioas->obj.users);
-	mutex_destroy(&hwpt->devices_lock);
 }

 /**
@@ -52,9 +49,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	if (IS_ERR(hwpt))
 		return hwpt;

-	INIT_LIST_HEAD(&hwpt->devices);
 	INIT_LIST_HEAD(&hwpt->hwpt_item);
-	mutex_init(&hwpt->devices_lock);
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
@@ -65,8 +60,6 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_abort;
 	}

-	mutex_lock(&hwpt->devices_lock);
-
 	/*
 	 * immediate_attach exists only to accommodate iommu drivers that cannot
 	 * directly allocate a domain. These drivers do not finish creating the
@@ -76,29 +69,18 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	if (immediate_attach) {
 		rc = iommufd_hw_pagetable_attach(hwpt, idev);
 		if (rc)
-			goto out_unlock;
+			goto out_abort;
 	}

 	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
 	if (rc)
 		goto out_detach;
 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
-
-	if (immediate_attach) {
-		/* See iommufd_device_do_attach() */
-		refcount_inc(&hwpt->obj.users);
-		idev->hwpt = hwpt;
-		list_add(&idev->devices_item, &hwpt->devices);
-	}
-
-	mutex_unlock(&hwpt->devices_lock);
 	return hwpt;

 out_detach:
 	if (immediate_attach)
-		iommufd_hw_pagetable_detach(hwpt, idev);
-out_unlock:
-	mutex_unlock(&hwpt->devices_lock);
+		iommufd_hw_pagetable_detach(idev);
 out_abort:
 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
 	return ERR_PTR(rc);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index f45615f19798e6..e2eb1db5f8f8ce 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -250,8 +250,6 @@ struct iommufd_hw_pagetable {
 	bool msi_cookie : 1;
 	/* Head at iommufd_ioas::hwpt_list */
 	struct list_head hwpt_item;
-	struct mutex devices_lock;
-	struct list_head devices;
 };

 struct iommufd_hw_pagetable *
@@ -259,14 +257,17 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, bool immediate_attach);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev);
-void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
-				 struct iommufd_device *idev);
+struct iommufd_hw_pagetable *
+iommufd_hw_pagetable_detach(struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);

 struct iommufd_group {
 	struct kref ref;
+	struct mutex lock;
 	struct iommufd_ctx *ictx;
 	struct iommu_group *group;
+	struct iommufd_hw_pagetable *hwpt;
+	struct list_head device_list;
 };

 /*
@@ -278,9 +279,7 @@ struct iommufd_device {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_group *igroup;
-	struct iommufd_hw_pagetable *hwpt;
-	/* Head at iommufd_hw_pagetable::devices */
-	struct list_head devices_item;
+	struct list_head group_item;
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT.
The iommufd_group stores the currently assigned HWPT for the entire group and we can manage the per-device attach/detach with a list in the iommufd_group.
I am preparing the patches to route I/O page faults to user space through iommufd. The iommufd page fault handler knows the hwpt and the device pointer, but it needs to convert the device pointer into its iommufd object id and pass the id to user space.
It's fine that we remove the hwpt->devices here, but perhaps I need to add the context pointer in ioas later,
struct iommufd_ioas {
	struct io_pagetable iopt;
	struct mutex mutex;
	struct list_head hwpt_list;
+	struct iommufd_ctx *ictx;
};
and, use below helper to look up the device id.
+u32 iommufd_get_device_id(struct iommufd_ctx *ictx, struct device *dev)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+	u32 dev_id = IOMMUFD_INVALID_OBJ_ID;
+	struct iommufd_group *igroup;
+	struct iommufd_device *cur;
+	unsigned int id;
+
+	if (!group)
+		return IOMMUFD_INVALID_OBJ_ID;
+
+	id = iommu_group_id(group);
+	xa_lock(&ictx->groups);
+	igroup = xa_load(&ictx->groups, id);
+	if (!iommufd_group_try_get(igroup, group)) {
+		xa_unlock(&ictx->groups);
+		iommu_group_put(group);
+		return IOMMUFD_INVALID_OBJ_ID;
+	}
+	xa_unlock(&ictx->groups);
+
+	mutex_lock(&igroup->lock);
+	list_for_each_entry(cur, &igroup->device_list, group_item) {
+		if (cur->dev == dev) {
+			dev_id = cur->obj.id;
+			break;
+		}
+	}
+	mutex_unlock(&igroup->lock);
+
+	iommufd_put_group(igroup);
+	iommu_group_put(group);
+
+	return dev_id;
+}
and, use it like this in the fault handler:
+	dev_id = iommufd_get_device_id(hwpt->ioas->ictx, dev);
+	if (dev_id == IOMMUFD_INVALID_OBJ_ID)
+		return IOMMU_PAGE_RESP_FAILURE;
Will this look good to you?
For destruction the flow is organized to make the following patches easier: the actual call to iommufd_object_destroy_user() is done at the top of the call chain without holding any locks. The HWPT to be destroyed is returned out of the locked region to make this possible. Later patches create locking that requires this.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

 drivers/iommu/iommufd/device.c          | 100 +++++++++++------------
 drivers/iommu/iommufd/hw_pagetable.c    |  22 +----
 drivers/iommu/iommufd/iommufd_private.h |  13 ++-
 3 files changed, 54 insertions(+), 81 deletions(-)
Best regards, baolu
On Tue, May 16, 2023 at 11:00:16AM +0800, Baolu Lu wrote:
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT.
The iommufd_group stores the currently assigned HWPT for the entire group and we can manage the per-device attach/detach with a list in the iommufd_group.
I am preparing the patches to route I/O page faults to user space through iommufd. The iommufd page fault handler knows the hwpt and the device pointer, but it needs to convert the device pointer into its iommufd object id and pass the id to user space.
It's fine that we remove the hwpt->devices here, but perhaps I need to add the context pointer in ioas later,
struct iommufd_ioas {
	struct io_pagetable iopt;
	struct mutex mutex;
	struct list_head hwpt_list;
+	struct iommufd_ctx *ictx;
};
and, use below helper to look up the device id.
+u32 iommufd_get_device_id(struct iommufd_ctx *ictx, struct device *dev)
+{
struct iommu_group *group = iommu_group_get(dev);
u32 dev_id = IOMMUFD_INVALID_OBJ_ID;
struct iommufd_group *igroup;
struct iommufd_device *cur;
unsigned int id;
if (!group)
return IOMMUFD_INVALID_OBJ_ID;
id = iommu_group_id(group);
xa_lock(&ictx->groups);
igroup = xa_load(&ictx->groups, id);
if (!iommufd_group_try_get(igroup, group)) {
xa_unlock(&ictx->groups);
iommu_group_put(group);
return IOMMUFD_INVALID_OBJ_ID;
}
xa_unlock(&ictx->groups);
mutex_lock(&igroup->lock);
list_for_each_entry(cur, &igroup->device_list, group_item) {
if (cur->dev == dev) {
dev_id = cur->obj.id;
break;
}
}
I dislike how slow this is on something resembling a fastish path :\
Maybe we should stash something in the dev_iommu instead?
Or can the PRI stuff provide a cookie per-device?
But it will work like this
dev_id = iommufd_get_device_id(hwpt->ioas->ictx, dev);
Where did the hwpt come from?
Jason
On 5/16/23 8:27 PM, Jason Gunthorpe wrote:
On Tue, May 16, 2023 at 11:00:16AM +0800, Baolu Lu wrote:
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT.
The iommufd_group stores the currently assigned HWPT for the entire group and we can manage the per-device attach/detach with a list in the iommufd_group.
I am preparing the patches to route I/O page faults to user space through iommufd. The iommufd page fault handler knows the hwpt and the device pointer, but it needs to convert the device pointer into its iommufd object id and pass the id to user space.
It's fine that we remove the hwpt->devices here, but perhaps I need to add the context pointer in ioas later,
struct iommufd_ioas {
	struct io_pagetable iopt;
	struct mutex mutex;
	struct list_head hwpt_list;
+	struct iommufd_ctx *ictx;
};
and, use below helper to look up the device id.
+u32 iommufd_get_device_id(struct iommufd_ctx *ictx, struct device *dev)
+{
struct iommu_group *group = iommu_group_get(dev);
u32 dev_id = IOMMUFD_INVALID_OBJ_ID;
struct iommufd_group *igroup;
struct iommufd_device *cur;
unsigned int id;
if (!group)
return IOMMUFD_INVALID_OBJ_ID;
id = iommu_group_id(group);
xa_lock(&ictx->groups);
igroup = xa_load(&ictx->groups, id);
if (!iommufd_group_try_get(igroup, group)) {
xa_unlock(&ictx->groups);
iommu_group_put(group);
return IOMMUFD_INVALID_OBJ_ID;
}
xa_unlock(&ictx->groups);
mutex_lock(&igroup->lock);
list_for_each_entry(cur, &igroup->device_list, group_item) {
if (cur->dev == dev) {
dev_id = cur->obj.id;
break;
}
}
I dislike how slow this is on something resembling a fastish path :\
Yes, agreed.
Maybe we should stash something in the dev_iommu instead?
Or can the PRI stuff provide a cookie per-device?
We already have a per-device fault cookie:
/**
 * struct iommu_fault_param - per-device IOMMU fault data
 * @handler: Callback function to handle IOMMU faults at device level
 * @data: handler private data
 * @faults: holds the pending faults which needs response
 * @lock: protect pending faults list
 */
struct iommu_fault_param {
	iommu_dev_fault_handler_t handler;
	void *data;
	struct list_head faults;
	struct mutex lock;
};

Perhaps we can add a @dev_id member here?
But it will work like this
dev_id = iommufd_get_device_id(hwpt->ioas->ictx, dev);
Where did the hwpt come from?
It is installed when setting up the iopf handler for the hwpt.
+	iommu_domain_set_iopf_handler(hwpt->domain,
+				      iommufd_hw_pagetable_iopf_handler,
+				      hwpt);
Best regards, baolu
From: Baolu Lu <baolu.lu@linux.intel.com>
Sent: Wednesday, May 17, 2023 12:15 PM
On 5/16/23 8:27 PM, Jason Gunthorpe wrote:
On Tue, May 16, 2023 at 11:00:16AM +0800, Baolu Lu wrote:
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT.
The iommufd_group stores the currently assigned HWPT for the entire
group
and we can manage the per-device attach/detach with a list in the iommufd_group.
I am preparing the patches to route I/O page faults to user space through iommufd. The iommufd page fault handler knows the hwpt and
the
device pointer, but it needs to convert the device pointer into its iommufd object id and pass the id to user space.
It's fine that we remove the hwpt->devices here, but perhaps I need to add the context pointer in ioas later,
struct iommufd_ioas {
	struct io_pagetable iopt;
	struct mutex mutex;
	struct list_head hwpt_list;
+	struct iommufd_ctx *ictx;
};
and, use below helper to look up the device id.
+u32 iommufd_get_device_id(struct iommufd_ctx *ictx, struct device *dev)
+{
struct iommu_group *group = iommu_group_get(dev);
u32 dev_id = IOMMUFD_INVALID_OBJ_ID;
struct iommufd_group *igroup;
struct iommufd_device *cur;
unsigned int id;
if (!group)
return IOMMUFD_INVALID_OBJ_ID;
id = iommu_group_id(group);
xa_lock(&ictx->groups);
igroup = xa_load(&ictx->groups, id);
if (!iommufd_group_try_get(igroup, group)) {
xa_unlock(&ictx->groups);
iommu_group_put(group);
return IOMMUFD_INVALID_OBJ_ID;
}
xa_unlock(&ictx->groups);
mutex_lock(&igroup->lock);
list_for_each_entry(cur, &igroup->device_list, group_item) {
if (cur->dev == dev) {
dev_id = cur->obj.id;
break;
}
}
I dislike how slow this is on something resembling a fastish path :\
Yes, agreed.
Maybe we should stash something in the dev_iommu instead?
Or can the PRI stuff provide a cookie per-device?
We already have a per-device fault cookie:
/**
 * struct iommu_fault_param - per-device IOMMU fault data
 * @handler: Callback function to handle IOMMU faults at device level
 * @data: handler private data
 * @faults: holds the pending faults which needs response
 * @lock: protect pending faults list
 */
struct iommu_fault_param {
	iommu_dev_fault_handler_t handler;
	void *data;
	struct list_head faults;
	struct mutex lock;
};

Perhaps we can add a @dev_id member here?
what about SIOV? There is only one cookie per parent device.
On Wed, May 17, 2023 at 06:33:30AM +0000, Tian, Kevin wrote:
From: Baolu Lu <baolu.lu@linux.intel.com>
Sent: Wednesday, May 17, 2023 12:15 PM
On 5/16/23 8:27 PM, Jason Gunthorpe wrote:
On Tue, May 16, 2023 at 11:00:16AM +0800, Baolu Lu wrote:
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT.
The iommufd_group stores the currently assigned HWPT for the entire
group
and we can manage the per-device attach/detach with a list in the iommufd_group.
I am preparing the patches to route I/O page faults to user space through iommufd. The iommufd page fault handler knows the hwpt and
the
device pointer, but it needs to convert the device pointer into its iommufd object id and pass the id to user space.
It's fine that we remove the hwpt->devices here, but perhaps I need to add the context pointer in ioas later,
struct iommufd_ioas {
	struct io_pagetable iopt;
	struct mutex mutex;
	struct list_head hwpt_list;
+	struct iommufd_ctx *ictx;
};
and, use below helper to look up the device id.
+u32 iommufd_get_device_id(struct iommufd_ctx *ictx, struct device *dev)
+{
struct iommu_group *group = iommu_group_get(dev);
u32 dev_id = IOMMUFD_INVALID_OBJ_ID;
struct iommufd_group *igroup;
struct iommufd_device *cur;
unsigned int id;
if (!group)
return IOMMUFD_INVALID_OBJ_ID;
id = iommu_group_id(group);
xa_lock(&ictx->groups);
igroup = xa_load(&ictx->groups, id);
if (!iommufd_group_try_get(igroup, group)) {
xa_unlock(&ictx->groups);
iommu_group_put(group);
return IOMMUFD_INVALID_OBJ_ID;
}
xa_unlock(&ictx->groups);
mutex_lock(&igroup->lock);
list_for_each_entry(cur, &igroup->device_list, group_item) {
if (cur->dev == dev) {
dev_id = cur->obj.id;
break;
}
}
I dislike how slow this is on something resembling a fastish path :\
Yes, agreed.
Maybe we should stash something in the dev_iommu instead?
Or can the PRI stuff provide a cookie per-device?
We already have a per-device fault cookie:
/**
 * struct iommu_fault_param - per-device IOMMU fault data
 * @handler: Callback function to handle IOMMU faults at device level
 * @data: handler private data
 * @faults: holds the pending faults which needs response
 * @lock: protect pending faults list
 */
struct iommu_fault_param {
	iommu_dev_fault_handler_t handler;
	void *data;
	struct list_head faults;
	struct mutex lock;
};

Perhaps we can add a @dev_id member here?
what about SIOV? There is only one cookie per parent device.
It doesn't make any sense to store a struct like that in dev_iommu.
The fault handler should come from the domain and we should be able to have a unique 'void *data' cookie linked to the (dev,PASID) to go along with the fault handler.
This is all going to need some revising before we can expose it to iommufd
Jason
On 2023/5/17 20:43, Jason Gunthorpe wrote:
On Wed, May 17, 2023 at 06:33:30AM +0000, Tian, Kevin wrote:
From: Baolu Lu <baolu.lu@linux.intel.com>
Sent: Wednesday, May 17, 2023 12:15 PM
On 5/16/23 8:27 PM, Jason Gunthorpe wrote:
On Tue, May 16, 2023 at 11:00:16AM +0800, Baolu Lu wrote:
On 5/15/23 10:00 PM, Jason Gunthorpe wrote:
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT.
The iommufd_group stores the currently assigned HWPT for the entire
group
and we can manage the per-device attach/detach with a list in the iommufd_group.
I am preparing the patches to route I/O page faults to user space through iommufd. The iommufd page fault handler knows the hwpt and
the
device pointer, but it needs to convert the device pointer into its iommufd object id and pass the id to user space.
It's fine that we remove the hwpt->devices here, but perhaps I need to add the context pointer in ioas later,
struct iommufd_ioas {
	struct io_pagetable iopt;
	struct mutex mutex;
	struct list_head hwpt_list;
+	struct iommufd_ctx *ictx;
};
and, use below helper to look up the device id.
+u32 iommufd_get_device_id(struct iommufd_ctx *ictx, struct device *dev)
+{
struct iommu_group *group = iommu_group_get(dev);
u32 dev_id = IOMMUFD_INVALID_OBJ_ID;
struct iommufd_group *igroup;
struct iommufd_device *cur;
unsigned int id;
if (!group)
return IOMMUFD_INVALID_OBJ_ID;
id = iommu_group_id(group);
xa_lock(&ictx->groups);
igroup = xa_load(&ictx->groups, id);
if (!iommufd_group_try_get(igroup, group)) {
xa_unlock(&ictx->groups);
iommu_group_put(group);
return IOMMUFD_INVALID_OBJ_ID;
}
xa_unlock(&ictx->groups);
mutex_lock(&igroup->lock);
list_for_each_entry(cur, &igroup->device_list, group_item) {
if (cur->dev == dev) {
dev_id = cur->obj.id;
break;
}
}
I dislike how slow this is on something resembling a fastish path :\
Yes, agreed.
Maybe we should stash something in the dev_iommu instead?
Or can the PRI stuff provide a cookie per-device?
We already have a per-device fault cookie:
/**
 * struct iommu_fault_param - per-device IOMMU fault data
 * @handler: Callback function to handle IOMMU faults at device level
 * @data: handler private data
 * @faults: holds the pending faults which needs response
 * @lock: protect pending faults list
 */
struct iommu_fault_param {
	iommu_dev_fault_handler_t handler;
	void *data;
	struct list_head faults;
	struct mutex lock;
};

Perhaps we can add a @dev_id member here?
what about SIOV? There is only one cookie per parent device.
It doesn't make any sense to store a struct like that in dev_iommu.
The fault handler should come from the domain and we should be able to have a unique 'void *data' cookie linked to the (dev,PASID) to go along with the fault handler.
If I get your point correctly, the iommu core should provide some places for the iommufd to put a cookie for each pair of {device, pasid}, and provide interfaces to manage it. For example,
void iommu_set_device_fault_cookie(struct device *dev, ioasid_t pasid,
				   void *fault_cookie);

void *iommu_get_device_fault_cookie(struct device *dev, ioasid_t pasid);
If so, perhaps we need some special treatment for ARM as a user hwpt actually presents the PASID table of the device and the guest setting pasid table entry will not be propagated to host. Then, the @pasid in above interfaces is meaningless.
This is all going to need some revising before we can expose it to iommufd
Yes, agreed. I will post a preparation series to do this. Besides the fault cookie, at least, I want to do the following preparation.

1) Move iommu faults uapi from uapi/linux/iommu.h to uapi/linux/iommufd.h and remove the former.
2) Add a device id in the iommu_fault structure.

struct iommu_fault {
	__u32 type;
-	__u32 padding;
+	__u32 dev_id;
	union {
		struct iommu_fault_unrecoverable event;
		struct iommu_fault_page_request prm;
3) Add the device pointer to the parameters of domain fault handler.
4) Decouple I/O page fault handling from IOMMU_SVA in the iommu core and the drivers.
Best regards, baolu
On Thu, May 18, 2023 at 03:05:23PM +0800, Baolu Lu wrote:
It doesn't make any sense to store a struct like that in dev_iommu.
The fault handler should come from the domain and we should be able to have a unique 'void *data' cookie linked to the (dev,PASID) to go along with the fault handler.
If I get your point correctly, the iommu core should provide some places for the iommufd to put a cookie for each pair of {device, pasid}, and provide interfaces to manage it. For example,
I'd say when you attach a PRI capable domain (eg to a PASID) then provide a 'void * data' during the attach.
If so, perhaps we need some special treatment for ARM as a user hwpt actually presents the PASID table of the device and the guest setting pasid table entry will not be propagated to host. Then, the @pasid in above interfaces is meaningless.
As above, when attaching to a RID you'd still pass in the *data
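A hypothetical sketch of that shape (none of these signatures exist today; the data cookie is the new part):

	/* Hypothetical: attach a PRI capable domain to a PASID with a cookie */
	int iommu_attach_device_pasid_data(struct iommu_domain *domain,
					   struct device *dev, ioasid_t pasid,
					   void *data);

	/* Hypothetical: the RID attach case carries the same cookie */
	int iommu_attach_group_data(struct iommu_domain *domain,
				    struct iommu_group *group, void *data);

	/* The domain's fault handler then gets the cookie back per fault */
	enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
						      void *data);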
- Move iommu faults uapi from uapi/linux/iommu.h to uapi/linux/iommufd.h and remove the former.
Please no, delete all the dead code from here and move whatever is still in use into include/linux/
Then we can talk about what parts of it become uAPI and how best to manage that on a patch by patch basis.
- Add a device id in the iommu_fault structure.
struct iommu_fault {
	__u32 type;
-	__u32 padding;
+	__u32 dev_id;
Why? This is iommufd internal, the void * above should cover it.
- Add the device pointer to the parameters of domain fault handler.
That makes some sense
- Decouple I/O page fault handling from IOMMU_SVA in the iommu core and the drivers.
Definately, this SVA stuff need to be scrubbed out.
SVA is only a special domain type that takes the page table top from an mm_struct and a shared fault handler in the core code to do handle_mm_fault()
It should not be in drivers any more deeply than that. We still have a ways to go.
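Loosely, the core-owned piece reduces to something like this (names abbreviated here, not the exact upstream code):

	static vm_fault_t sva_handle_fault(struct mm_struct *mm,
					   unsigned long addr, unsigned int flags)
	{
		struct vm_area_struct *vma;
		vm_fault_t ret = VM_FAULT_ERROR;

		if (!mmget_not_zero(mm))
			return ret;	/* the mm is already exiting */
		mmap_read_lock(mm);
		vma = vma_lookup(mm, addr);
		if (vma)
			ret = handle_mm_fault(vma, addr, flags, NULL);
		mmap_read_unlock(mm);
		mmput(mm);
		return ret;
	}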
Jason
On 2023/5/18 20:02, Jason Gunthorpe wrote:
On Thu, May 18, 2023 at 03:05:23PM +0800, Baolu Lu wrote:
It doesn't make any sense to store a struct like that in dev_iommu.
The fault handler should come from the domain and we should be able to have a unique 'void *data' cookie linked to the (dev,PASID) to go along with the fault handler.
If I get your point correctly, the iommu core should provide some places for the iommufd to put a cookie for each pair of {device, pasid}, and provide interfaces to manage it. For example,
I'd say when you attach a PRI capable domain (eg to a PASID) then provide a 'void * data' during the attach.
If so, perhaps we need some special treatment for ARM as a user hwpt actually presents the PASID table of the device and the guest setting pasid table entry will not be propagated to host. Then, the @pasid in above interfaces is meaningless.
As above, when attaching to a RID you'd still pass in the *data
Yes! Merging these with hwpt attach/detach would be more logical.
- Move iommu faults uapi from uapi/linux/iommu.h to uapi/linux/iommufd.h and remove the former.
Please no, delete all the dead code from here and move whatever is still in use into include/linux/
Then we can talk about what parts of it become uAPI and how best to manage that on a patch by patch basis.
Okay, let's rebuild it from the ground up.
- Add a device id in the iommu_fault structure.
struct iommu_fault {
	__u32 type;
-	__u32 padding;
+	__u32 dev_id;
Why? This is iommufd internal, the void * above should cover it.
This is actually part of 1). :-)
- Add the device pointer to the parameters of domain fault handler.
That makes some sense
- Decouple I/O page fault handling from IOMMU_SVA in the iommu core and the drivers.
Definately, this SVA stuff need to be scrubbed out.
SVA is only a special domain type that takes the page table top from an mm_struct and a shared fault handler in the core code to do handle_mm_fault()
It should not be in drivers any more deeply than that. We still have a ways to go.
Agreed. In addition to the fault handler, SVA usually needs to register a callback with the mmu_notifier to keep the IO or device TLB cache consistent. This part of the code could also be consolidated into the core.
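For example, the shared piece could be roughly this (struct and callback names invented here; only the IOTLB/device-TLB flush hook stays driver-specific):

	struct sva_binding {			/* invented name */
		struct mmu_notifier mn;
		void (*flush_range)(struct sva_binding *b,
				    unsigned long start, unsigned long end);
	};

	static void sva_mn_invalidate_range(struct mmu_notifier *mn,
					    struct mm_struct *mm,
					    unsigned long start,
					    unsigned long end)
	{
		struct sva_binding *b = container_of(mn, struct sva_binding, mn);

		/* keep the IO/device TLB coherent with the CPU page table */
		b->flush_range(b, start, end);
	}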
There are still things to do here, but the priority (from my point of view) is to make the iopf handling framework in the iommu core more generic, rather than just serving SVA.
Best regards, baolu
From: Baolu Lu <baolu.lu@linux.intel.com>
Sent: Friday, May 19, 2023 10:04 AM
On 2023/5/18 20:02, Jason Gunthorpe wrote:
On Thu, May 18, 2023 at 03:05:23PM +0800, Baolu Lu wrote:
If so, perhaps we need some special treatment for ARM as a user hwpt actually presents the PASID table of the device and the guest setting pasid table entry will not be propagated to host. Then, the @pasid in above interfaces is meaningless.
As above, when attaching to a RID you'd still pass in the *data
Yes! Merging these with hwpt attach/detach would be more logical.
Probably we still need a way in iommu core to differentiate whether to look at *data according to {RID} alone or {RID, PASID} when receiving a fault request tagged by {struct device, pasid} from the underlying iommu driver.
That might be just implementation detail, though.
- Move iommu faults uapi from uapi/linux/iommu.h to uapi/linux/iommufd.h and remove the former.
Please no, delete all the dead code from here and move whatever is still in use into include/linux/
Then we can talk about what parts of it become uAPI and how best to manage that on a patch by patch basis.
Okay, let's rebuild it from the ground up.
Yes. Actually that uapi is incomplete. It contains only definitions of data structures but no actual command. It's hard to say that definition is sufficient w/o actually involving the command.
On Fri, May 19, 2023 at 10:03:57AM +0800, Baolu Lu wrote:
Agreed. In addition to fault handler, SVA usually needs to register a callback to mm_notifier to keep the IO or device TLB cache consistent. This part of code could also be consolidated to the core.
Yes, that should not be in drivers at all..
Jason
iommufd wants to use this in the next patch. For some reason the iommu_put_resv_regions() was already exported.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f1dcfa3f1a1b48..280585965bfec5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2669,6 +2669,14 @@ int iommu_set_pgtable_quirks(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);

+/**
+ * iommu_get_resv_regions - get reserved regions
+ * @dev: device for which to get reserved regions
+ * @list: reserved region list for device
+ *
+ * This returns a list of reserved IOVA regions specific to this device.
+ * A domain user should not map IOVA in these ranges.
+ */
 void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
@@ -2676,9 +2684,10 @@ void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 	if (ops->get_resv_regions)
 		ops->get_resv_regions(dev, list);
 }
+EXPORT_SYMBOL_GPL(iommu_get_resv_regions);

 /**
- * iommu_put_resv_regions - release resered regions
+ * iommu_put_resv_regions - release reserved regions
  * @dev: device for which to free reserved regions
  * @list: reserved region list for device
  *
The driver facing API in the iommu core makes the reserved regions per-device. An algorithm in the core code consolidates the regions of all the devices in a group to return the group view.
To allow for devices to be hotplugged into the group iommufd would re-load the entire group's reserved regions for each device, just in case they changed.
Further, iommufd already has to deal with duplicated/overlapping reserved regions as it must union all the groups together.
Thus simplify all of this to just use the device reserved regions interface directly from the iommu driver.
Suggested-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/device.c | 5 ++--- drivers/iommu/iommufd/io_pagetable.c | 27 ++++++++++--------------- drivers/iommu/iommufd/iommufd_private.h | 7 +++---- 3 files changed, 16 insertions(+), 23 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 23ebf2065e20de..cd8ded265495cc 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -313,9 +313,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, } }
- rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt, idev->dev, - idev->igroup->group, - &sw_msi_start); + rc = iopt_table_enforce_dev_resv_regions( + &hwpt->ioas->iopt, idev->dev, &sw_msi_start); if (rc) goto err_unlock;
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c index e0ae72b9e67f86..f842768b2e250b 100644 --- a/drivers/iommu/iommufd/io_pagetable.c +++ b/drivers/iommu/iommufd/io_pagetable.c @@ -1161,25 +1161,22 @@ void iopt_remove_access(struct io_pagetable *iopt, up_write(&iopt->domains_rwsem); }
-/* Narrow the valid_iova_itree to include reserved ranges from a group. */ -int iopt_table_enforce_group_resv_regions(struct io_pagetable *iopt, - struct device *device, - struct iommu_group *group, - phys_addr_t *sw_msi_start) +/* Narrow the valid_iova_itree to include reserved ranges from a device. */ +int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt, + struct device *dev, + phys_addr_t *sw_msi_start) { struct iommu_resv_region *resv; - struct iommu_resv_region *tmp; - LIST_HEAD(group_resv_regions); + LIST_HEAD(resv_regions); unsigned int num_hw_msi = 0; unsigned int num_sw_msi = 0; int rc;
down_write(&iopt->iova_rwsem); - rc = iommu_get_group_resv_regions(group, &group_resv_regions); - if (rc) - goto out_unlock; + /* FIXME: drivers allocate memory but there is no failure propogated */ + iommu_get_resv_regions(dev, &resv_regions);
- list_for_each_entry(resv, &group_resv_regions, list) { + list_for_each_entry(resv, &resv_regions, list) { if (resv->type == IOMMU_RESV_DIRECT_RELAXABLE) continue;
@@ -1191,7 +1188,7 @@ int iopt_table_enforce_group_resv_regions(struct io_pagetable *iopt, }
rc = iopt_reserve_iova(iopt, resv->start, - resv->length - 1 + resv->start, device); + resv->length - 1 + resv->start, dev); if (rc) goto out_reserved; } @@ -1206,11 +1203,9 @@ int iopt_table_enforce_group_resv_regions(struct io_pagetable *iopt, goto out_free_resv;
out_reserved: - __iopt_remove_reserved_iova(iopt, device); + __iopt_remove_reserved_iova(iopt, dev); out_free_resv: - list_for_each_entry_safe(resv, tmp, &group_resv_regions, list) - kfree(resv); -out_unlock: + iommu_put_resv_regions(dev, &resv_regions); up_write(&iopt->iova_rwsem); return rc; } diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index e2eb1db5f8f8ce..d5c3c627c0dff3 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -76,10 +76,9 @@ int iopt_table_add_domain(struct io_pagetable *iopt, struct iommu_domain *domain); void iopt_table_remove_domain(struct io_pagetable *iopt, struct iommu_domain *domain); -int iopt_table_enforce_group_resv_regions(struct io_pagetable *iopt, - struct device *device, - struct iommu_group *group, - phys_addr_t *sw_msi_start); +int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt, + struct device *dev, + phys_addr_t *sw_msi_start); int iopt_set_allow_iova(struct io_pagetable *iopt, struct rb_root_cached *allowed_iova); int iopt_reserve_iova(struct io_pagetable *iopt, unsigned long start,
The MSI setup only needs to be done once per group, not once per device. Doing it once per device was a way to make the device list work. Since we are abandoning this approach we can optimize things a bit.
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/device.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index cd8ded265495cc..548288d9a5c167 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -318,10 +318,6 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, if (rc) goto err_unlock;
- rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start); - if (rc) - goto err_unresv; - /* * Only attach to the group once for the first device that is in the * group. All the other devices will follow this attachment. The user @@ -330,6 +326,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, * attachment. */ if (list_empty(&idev->igroup->device_list)) { + rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start); + if (rc) + goto err_unresv; + rc = iommu_attach_group(hwpt->domain, idev->igroup->group); if (rc) goto err_unresv;
The sw_msi_start is only set by the ARM drivers and it is always constant. Due to the way vfio/iommufd allow domains to be re-used between devices we have a built-in assumption that there is only one value for sw_msi_start and it is global to the system.
To make replace simpler, where we may not re-parse iommu_get_resv_regions(), move sw_msi_start to the iommufd_group so it is always available once any HWPT has been attached.
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/device.c | 14 +++++++------- drivers/iommu/iommufd/iommufd_private.h | 1 + 2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 548288d9a5c167..b6bf447a6b1752 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -88,6 +88,7 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx, kref_init(&new_igroup->ref); mutex_init(&new_igroup->lock); INIT_LIST_HEAD(&new_igroup->device_list); + new_igroup->sw_msi_start = PHYS_ADDR_MAX; /* group reference moves into new_igroup */ new_igroup->group = group;
@@ -249,10 +250,10 @@ void iommufd_device_unbind(struct iommufd_device *idev) } EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
-static int iommufd_device_setup_msi(struct iommufd_device *idev, - struct iommufd_hw_pagetable *hwpt, - phys_addr_t sw_msi_start) +static int iommufd_group_setup_msi(struct iommufd_group *igroup, + struct iommufd_hw_pagetable *hwpt) { + phys_addr_t sw_msi_start = igroup->sw_msi_start; int rc;
/* @@ -285,7 +286,6 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev, int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev) { - phys_addr_t sw_msi_start = PHYS_ADDR_MAX; int rc;
mutex_lock(&idev->igroup->lock); @@ -313,8 +313,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, } }
- rc = iopt_table_enforce_dev_resv_regions( - &hwpt->ioas->iopt, idev->dev, &sw_msi_start); + rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev, + &idev->igroup->sw_msi_start); if (rc) goto err_unlock;
@@ -326,7 +326,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, * attachment. */ if (list_empty(&idev->igroup->device_list)) { - rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start); + rc = iommufd_group_setup_msi(idev->igroup, hwpt); if (rc) goto err_unresv;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index d5c3c627c0dff3..9ede238908e7b3 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -267,6 +267,7 @@ struct iommufd_group { struct iommu_group *group; struct iommufd_hw_pagetable *hwpt; struct list_head device_list; + phys_addr_t sw_msi_start; };
/*
The next patch will need to call this from two places.
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/device.c | 6 +----- drivers/iommu/iommufd/iommufd_private.h | 10 ++++++++++ 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index b6bf447a6b1752..69536f6c7be832 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -483,11 +483,7 @@ void iommufd_device_detach(struct iommufd_device *idev) struct iommufd_hw_pagetable *hwpt;
hwpt = iommufd_hw_pagetable_detach(idev); - if (hwpt->auto_domain) - iommufd_object_destroy_user(idev->ictx, &hwpt->obj); - else - refcount_dec(&hwpt->obj.users); - + iommufd_hw_pagetable_put(idev->ictx, hwpt); refcount_dec(&idev->obj.users); } EXPORT_SYMBOL_NS_GPL(iommufd_device_detach, IOMMUFD); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 9ede238908e7b3..63b2587a37f23d 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -260,6 +260,16 @@ struct iommufd_hw_pagetable * iommufd_hw_pagetable_detach(struct iommufd_device *idev); void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
+static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, + struct iommufd_hw_pagetable *hwpt) +{ + lockdep_assert_not_held(&hwpt->ioas->mutex); + if (hwpt->auto_domain) + iommufd_object_destroy_user(ictx, &hwpt->obj); + else + refcount_dec(&hwpt->obj.users); +} + struct iommufd_group { struct kref ref; struct mutex lock;
Logically the HWPT should have its coherency set properly for the device it is being created for, at the time it is created.
This was happening implicitly if immediate_attach was set, because iommufd_hw_pagetable_attach() does it as the first thing.
Do it unconditionally so !immediate_attach works properly.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/device.c | 19 ++++------------- drivers/iommu/iommufd/hw_pagetable.c | 27 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 1 + 3 files changed, 32 insertions(+), 15 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 69536f6c7be832..2eec53c5d00e74 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -295,22 +295,11 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, goto err_unlock; }
- /* - * Try to upgrade the domain we have, it is an iommu driver bug to - * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail - * enforce_cache_coherency when there are no devices attached to the - * domain. - */ - if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) { - if (hwpt->domain->ops->enforce_cache_coherency) - hwpt->enforce_cache_coherency = - hwpt->domain->ops->enforce_cache_coherency( - hwpt->domain); - if (!hwpt->enforce_cache_coherency) { - WARN_ON(list_empty(&idev->igroup->device_list)); - rc = -EINVAL; + /* Try to upgrade the domain we have */ + if (idev->enforce_cache_coherency) { + rc = iommufd_hw_pagetable_enforce_cc(hwpt); + if (rc) goto err_unlock; - } }
rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev, diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index bdb76cdb1dc347..e0699d7f4c64af 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -25,6 +25,20 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) refcount_dec(&hwpt->ioas->obj.users); }
+int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) +{ + if (hwpt->enforce_cache_coherency) + return 0; + + if (hwpt->domain->ops->enforce_cache_coherency) + hwpt->enforce_cache_coherency = + hwpt->domain->ops->enforce_cache_coherency( + hwpt->domain); + if (!hwpt->enforce_cache_coherency) + return -EINVAL; + return 0; +} + /** * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device * @ictx: iommufd context @@ -60,6 +74,19 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, goto out_abort; }
+ /* + * Set the coherency mode before we do iopt_table_add_domain() as some + * iommus have a per-PTE bit that controls it and need to decide before + * doing any maps. It is an iommu driver bug to report + * IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail enforce_cache_coherency on + * a new domain. + */ + if (idev->enforce_cache_coherency) { + rc = iommufd_hw_pagetable_enforce_cc(hwpt); + if (WARN_ON(rc)) + goto out_abort; + } + /* * immediate_attach exists only to accommodate iommu drivers that cannot * directly allocate a domain. These drivers do not finish creating the diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 63b2587a37f23d..5ce9eaf5fb88e3 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -254,6 +254,7 @@ struct iommufd_hw_pagetable { struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, struct iommufd_device *idev, bool immediate_attach); +int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt); int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev); struct iommufd_hw_pagetable *
During creation the hwpt must have the ioas->mutex held until the object is finalized. This means we need to be able to call iommufd_object_abort_and_destroy() while holding the mutex.
Since iommufd_hw_pagetable_destroy() also needs the mutex this is problematic.
Fix it by creating a special abort op for the object that can assume the caller is holding the lock, as required by the contract.
The next patch will add another caller of iommufd_object_abort_and_destroy() on a hwpt.
Fixes: e8d57210035b ("iommufd: Add kAPI toward external drivers for physical devices") Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/hw_pagetable.c | 19 +++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 1 + drivers/iommu/iommufd/main.c | 7 ++++++- 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index e0699d7f4c64af..2087b51d9807ad 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -25,6 +25,21 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) refcount_dec(&hwpt->ioas->obj.users); }
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj) +{ + struct iommufd_hw_pagetable *hwpt = + container_of(obj, struct iommufd_hw_pagetable, obj); + + /* The ioas->mutex must be held until finalize is called. */ + lockdep_assert_held(&hwpt->ioas->mutex); + + if (!list_empty(&hwpt->hwpt_item)) { + list_del_init(&hwpt->hwpt_item); + iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain); + } + iommufd_hw_pagetable_destroy(obj); +} + int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) { if (hwpt->enforce_cache_coherency) @@ -49,6 +64,10 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT * will be linked to the given ioas and upon return the underlying iommu_domain * is fully popoulated. + * + * The caller must hold the ioas->mutex until after + * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on + * the returned hwpt. */ struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 5ce9eaf5fb88e3..07d60d0640fee3 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -260,6 +260,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_hw_pagetable * iommufd_hw_pagetable_detach(struct iommufd_device *idev); void iommufd_hw_pagetable_destroy(struct iommufd_object *obj); +void iommufd_hw_pagetable_abort(struct iommufd_object *obj);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index e5ed5dfa91a0b5..24f30f384df6f9 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -24,6 +24,7 @@
struct iommufd_object_ops { void (*destroy)(struct iommufd_object *obj); + void (*abort)(struct iommufd_object *obj); }; static const struct iommufd_object_ops iommufd_object_ops[]; static struct miscdevice vfio_misc_dev; @@ -95,7 +96,10 @@ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj) void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx, struct iommufd_object *obj) { - iommufd_object_ops[obj->type].destroy(obj); + if (iommufd_object_ops[obj->type].abort) + iommufd_object_ops[obj->type].abort(obj); + else + iommufd_object_ops[obj->type].destroy(obj); iommufd_object_abort(ictx, obj); }
@@ -401,6 +405,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { }, [IOMMUFD_OBJ_HW_PAGETABLE] = { .destroy = iommufd_hw_pagetable_destroy, + .abort = iommufd_hw_pagetable_abort, }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = {
Due to the auto_domains mechanism the ioas->mutex must be held until the hwpt is completely set up by iommufd_object_abort_and_destroy() or iommufd_object_finalize().
This prevents a concurrent iommufd_device_auto_get_domain() from seeing an incompletely initialized object through the ioas->hwpt_list.
To make this more consistent, move the unlock to after finalize.
Fixes: e8d57210035b ("iommufd: Add kAPI toward external drivers for physical devices") Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 2eec53c5d00e74..dc437409e70ffe 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -398,8 +398,8 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev, hwpt->auto_domain = true; *pt_id = hwpt->obj.id;
- mutex_unlock(&ioas->mutex); iommufd_object_finalize(idev->ictx, &hwpt->obj); + mutex_unlock(&ioas->mutex); return 0; out_unlock: mutex_unlock(&ioas->mutex);
The code flow for attaching a PT for the first time and for replacing a PT is very similar, except for the lowest do_attach step.
Reorganize this so that the do_attach step is a function pointer.
Replace requires destroying the old HWPT once it is replaced. This destruction cannot be done under all the locks that are held in the function pointer, so the signature allows returning a HWPT which will be destroyed by the caller after everything is unlocked.
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/device.c | 165 +++++++++++++++++++++++---------- 1 file changed, 114 insertions(+), 51 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index dc437409e70ffe..b7868c877d1c1c 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -353,16 +353,41 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev) return hwpt; }
+static struct iommufd_hw_pagetable * +iommufd_device_do_attach(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt) +{ + int rc; + + rc = iommufd_hw_pagetable_attach(hwpt, idev); + if (rc) + return ERR_PTR(rc); + return NULL; +} + +typedef struct iommufd_hw_pagetable *(*attach_fn)( + struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt); + /* * When automatically managing the domains we search for a compatible domain in * the iopt and if one is found use it, otherwise create a new domain. * Automatic domain selection will never pick a manually created domain. */ -static int iommufd_device_auto_get_domain(struct iommufd_device *idev, - struct iommufd_ioas *ioas, u32 *pt_id) +static struct iommufd_hw_pagetable * +iommufd_device_auto_get_domain(struct iommufd_device *idev, + struct iommufd_ioas *ioas, u32 *pt_id, + attach_fn do_attach) { + /* + * iommufd_hw_pagetable_attach() is called by + * iommufd_hw_pagetable_alloc() in immediate attachment mode, same as + * iommufd_device_do_attach(). So if we are in this mode then we prefer + * to use the immediate_attach path as it supports drivers that can't + * directly allocate a domain. + */ + bool immediate_attach = do_attach == iommufd_device_do_attach; + struct iommufd_hw_pagetable *destroy_hwpt; struct iommufd_hw_pagetable *hwpt; - int rc;
/* * There is no differentiation when domains are allocated, so any domain @@ -376,38 +401,101 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
if (!iommufd_lock_obj(&hwpt->obj)) continue; - rc = iommufd_hw_pagetable_attach(hwpt, idev); - iommufd_put_object(&hwpt->obj); - - /* - * -EINVAL means the domain is incompatible with the device. - * Other error codes should propagate to userspace as failure. - * Success means the domain is attached. - */ - if (rc == -EINVAL) - continue; + destroy_hwpt = (*do_attach)(idev, hwpt); + if (IS_ERR(destroy_hwpt)) { + iommufd_put_object(&hwpt->obj); + /* + * -EINVAL means the domain is incompatible with the + * device. Other error codes should propagate to + * userspace as failure. Success means the domain is + * attached. + */ + if (PTR_ERR(destroy_hwpt) == -EINVAL) + continue; + goto out_unlock; + } *pt_id = hwpt->obj.id; + iommufd_put_object(&hwpt->obj); goto out_unlock; }
- hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, true); + hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, + immediate_attach); if (IS_ERR(hwpt)) { - rc = PTR_ERR(hwpt); + destroy_hwpt = ERR_CAST(hwpt); goto out_unlock; } + + if (!immediate_attach) { + destroy_hwpt = (*do_attach)(idev, hwpt); + if (IS_ERR(destroy_hwpt)) + goto out_abort; + } else { + destroy_hwpt = NULL; + } + hwpt->auto_domain = true; *pt_id = hwpt->obj.id;
iommufd_object_finalize(idev->ictx, &hwpt->obj); mutex_unlock(&ioas->mutex); - return 0; + return destroy_hwpt; + +out_abort: + iommufd_object_abort_and_destroy(idev->ictx, &hwpt->obj); out_unlock: mutex_unlock(&ioas->mutex); - return rc; + return destroy_hwpt; +} + +static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, + attach_fn do_attach) +{ + struct iommufd_hw_pagetable *destroy_hwpt; + struct iommufd_object *pt_obj; + + pt_obj = iommufd_get_object(idev->ictx, *pt_id, IOMMUFD_OBJ_ANY); + if (IS_ERR(pt_obj)) + return PTR_ERR(pt_obj); + + switch (pt_obj->type) { + case IOMMUFD_OBJ_HW_PAGETABLE: { + struct iommufd_hw_pagetable *hwpt = + container_of(pt_obj, struct iommufd_hw_pagetable, obj); + + destroy_hwpt = (*do_attach)(idev, hwpt); + if (IS_ERR(destroy_hwpt)) + goto out_put_pt_obj; + break; + } + case IOMMUFD_OBJ_IOAS: { + struct iommufd_ioas *ioas = + container_of(pt_obj, struct iommufd_ioas, obj); + + destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id, + do_attach); + if (IS_ERR(destroy_hwpt)) + goto out_put_pt_obj; + break; + } + default: + destroy_hwpt = ERR_PTR(-EINVAL); + goto out_put_pt_obj; + } + iommufd_put_object(pt_obj); + + /* This destruction has to be after we unlock everything */ + if (destroy_hwpt) + iommufd_hw_pagetable_put(idev->ictx, destroy_hwpt); + return 0; + +out_put_pt_obj: + iommufd_put_object(pt_obj); + return PTR_ERR(destroy_hwpt); }
/** - * iommufd_device_attach - Connect a device from an iommu_domain + * iommufd_device_attach - Connect a device to an iommu_domain * @idev: device to attach * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE * Output the IOMMUFD_OBJ_HW_PAGETABLE ID @@ -420,43 +508,18 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev, */ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id) { - struct iommufd_object *pt_obj; int rc;
- pt_obj = iommufd_get_object(idev->ictx, *pt_id, IOMMUFD_OBJ_ANY); - if (IS_ERR(pt_obj)) - return PTR_ERR(pt_obj); - - switch (pt_obj->type) { - case IOMMUFD_OBJ_HW_PAGETABLE: { - struct iommufd_hw_pagetable *hwpt = - container_of(pt_obj, struct iommufd_hw_pagetable, obj); - - rc = iommufd_hw_pagetable_attach(hwpt, idev); - if (rc) - goto out_put_pt_obj; - break; - } - case IOMMUFD_OBJ_IOAS: { - struct iommufd_ioas *ioas = - container_of(pt_obj, struct iommufd_ioas, obj); - - rc = iommufd_device_auto_get_domain(idev, ioas, pt_id); - if (rc) - goto out_put_pt_obj; - break; - } - default: - rc = -EINVAL; - goto out_put_pt_obj; - } + rc = iommufd_device_change_pt(idev, pt_id, &iommufd_device_do_attach); + if (rc) + return rc;
+ /* + * Pairs with iommufd_device_detach() - catches caller bugs attempting + * to destroy a device with an attachment. + */ refcount_inc(&idev->obj.users); - rc = 0; - -out_put_pt_obj: - iommufd_put_object(pt_obj); - return rc; + return 0; } EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
From: Nicolin Chen nicolinc@nvidia.com
qemu has a need to replace the translations associated with a domain when the guest does large-scale operations like switching between an IDENTITY domain and, say, dma-iommu.c.
Currently, it does this by replacing all the mappings in a single domain, but this is very inefficient and means that domains have to be per-device rather than per-translation.
Provide a high-level API to allow replacements of one domain with another. This is similar to a detach/attach cycle except it doesn't force the group to go to the blocking domain in-between.
By removing this forced blocking domain the iommu driver has the opportunity to implement a non-disruptive replacement of the domain to the greatest extent its hardware allows. This allows the qemu emulation of the vIOMMU to be more complete, as real hardware often has a non-disruptive replacement capability.
It could be possible to address this by simply removing the protection from the iommu_attach_group(), but it is not so clear if that is safe for the few users. Thus, add a new API to serve this new purpose.
All drivers are already required to support changing between active UNMANAGED domains when using their attach_dev ops.
This API is expected to be used only by IOMMUFD, so add it to the iommu-priv header and mark it as IOMMUFD_INTERNAL.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommu-priv.h | 10 ++++++++++ drivers/iommu/iommu.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) create mode 100644 drivers/iommu/iommu-priv.h
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h new file mode 100644 index 00000000000000..7c8011bfd15374 --- /dev/null +++ b/drivers/iommu/iommu-priv.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __LINUX_IOMMU_PRIV_H +#define __LINUX_IOMMU_PRIV_H + +#include <linux/iommu.h> + +int iommu_group_replace_domain(struct iommu_group *group, + struct iommu_domain *new_domain); + +#endif /* __LINUX_IOMMU_PRIV_H */ diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 280585965bfec5..91573efd94881a 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -34,6 +34,7 @@ #include <linux/msi.h>
#include "dma-iommu.h" +#include "iommu-priv.h"
#include "iommu-sva.h"
@@ -2221,6 +2222,35 @@ int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group) } EXPORT_SYMBOL_GPL(iommu_attach_group);
+/** + * iommu_group_replace_domain - replace the domain that a group is attached to + * @new_domain: new IOMMU domain to replace with + * @group: IOMMU group that will be attached to the new domain + * + * This API allows the group to switch domains without being forced to go to + * the blocking domain in-between. + * + * If the currently attached domain is a core domain (e.g. a default_domain), + * it will act just like the iommu_attach_group(). + */ +int iommu_group_replace_domain(struct iommu_group *group, + struct iommu_domain *new_domain) +{ + int ret; + + if (!new_domain) + return -EINVAL; + + mutex_lock(&group->mutex); + ret = __iommu_group_set_domain(group, new_domain); + if (ret) + __iommu_group_for_each_dev(group, group->domain, + iommu_group_do_attach_device); + mutex_unlock(&group->mutex); + return ret; +} +EXPORT_SYMBOL_NS_GPL(iommu_group_replace_domain, IOMMUFD_INTERNAL); + static int iommu_group_do_set_platform_dma(struct device *dev, void *data) { const struct iommu_ops *ops = dev_iommu_ops(dev);
Replace allows all the devices in a group to move in one step to a new HWPT. Further, the HWPT move is done without going through a blocking domain so that the IOMMU driver can implement some level of non-disruption to ongoing DMA if that has meaning for it (e.g. for future special driver domains).
Replace uses a lot of the same logic as normal attach, except the actual domain changeover has different restrictions, and we are careful to sequence things so that a failure leaves everything the way it was, without getting trapped in a blocking domain if there is an ENOMEM.
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/device.c | 99 ++++++++++++++++++++++++++++++++++ drivers/iommu/iommufd/main.c | 1 + 2 files changed, 100 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index b7868c877d1c1c..ce758fbe3c525d 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -4,6 +4,7 @@ #include <linux/iommufd.h> #include <linux/slab.h> #include <linux/iommu.h> +#include "../iommu-priv.h"
#include "io_pagetable.h" #include "iommufd_private.h" @@ -365,6 +366,84 @@ iommufd_device_do_attach(struct iommufd_device *idev, return NULL; }
+static struct iommufd_hw_pagetable * +iommufd_device_do_replace(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt) +{ + struct iommufd_group *igroup = idev->igroup; + struct iommufd_hw_pagetable *old_hwpt; + unsigned int num_devices = 0; + struct iommufd_device *cur; + int rc; + + mutex_lock(&idev->igroup->lock); + + if (igroup->hwpt == NULL) { + rc = -EINVAL; + goto err_unlock; + } + + if (hwpt == igroup->hwpt) { + mutex_unlock(&idev->igroup->lock); + return NULL; + } + + /* Try to upgrade the domain we have */ + list_for_each_entry(cur, &igroup->device_list, group_item) { + num_devices++; + if (cur->enforce_cache_coherency) { + rc = iommufd_hw_pagetable_enforce_cc(hwpt); + if (rc) + goto err_unlock; + } + } + + old_hwpt = igroup->hwpt; + if (hwpt->ioas != old_hwpt->ioas) { + list_for_each_entry(cur, &igroup->device_list, group_item) { + rc = iopt_table_enforce_dev_resv_regions( + &hwpt->ioas->iopt, cur->dev, NULL); + if (rc) + goto err_unresv; + } + } + + rc = iommufd_group_setup_msi(idev->igroup, hwpt); + if (rc) + goto err_unresv; + + rc = iommu_group_replace_domain(igroup->group, hwpt->domain); + if (rc) + goto err_unresv; + + if (hwpt->ioas != old_hwpt->ioas) { + list_for_each_entry(cur, &igroup->device_list, group_item) + iopt_remove_reserved_iova(&old_hwpt->ioas->iopt, + cur->dev); + } + + igroup->hwpt = hwpt; + + /* + * Move the refcounts held by the device_list to the new hwpt. Retain a + * refcount for this thread as the caller will free it. + */ + refcount_add(num_devices, &hwpt->obj.users); + if (num_devices > 1) + WARN_ON(refcount_sub_and_test(num_devices - 1, + &old_hwpt->obj.users)); + mutex_unlock(&idev->igroup->lock); + + /* Caller must destroy old_hwpt */ + return old_hwpt; +err_unresv: + list_for_each_entry(cur, &igroup->device_list, group_item) + iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev); +err_unlock: + mutex_unlock(&idev->igroup->lock); + return ERR_PTR(rc); +} + typedef struct iommufd_hw_pagetable *(*attach_fn)( struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt);
@@ -523,6 +602,26 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id) } EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
+/** + * iommufd_device_replace - Change the device's iommu_domain + * @idev: device to change + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE + * Output the IOMMUFD_OBJ_HW_PAGETABLE ID + * + * This is the same as + * iommufd_device_detach(); + * iommufd_device_attach(); + * If it fails then no change is made to the attachment. The iommu driver may + * implement this so there is no disruption in translation. This can only be + * called if iommufd_device_attach() has already succeeded. + */ +int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id) +{ + return iommufd_device_change_pt(idev, pt_id, + &iommufd_device_do_replace); +} +EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD); + /** * iommufd_device_detach - Disconnect a device to an iommu_domain * @idev: device to detach diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 24f30f384df6f9..5b7f70543adb24 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -466,5 +466,6 @@ module_exit(iommufd_exit); MODULE_ALIAS_MISCDEV(VFIO_MINOR); MODULE_ALIAS("devname:vfio/vfio"); #endif +MODULE_IMPORT_NS(IOMMUFD_INTERNAL); MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices"); MODULE_LICENSE("GPL");
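As a sketch of how an in-kernel consumer (e.g. the VFIO support in the following series) would use this, based on the kernel-doc above; idev and new_pt_id are assumed to come from an earlier iommufd_device_bind()/iommufd_device_attach():

/* Without replace: disruptive, the group passes through the blocking
 * domain between the two steps. */
iommufd_device_detach(idev);
rc = iommufd_device_attach(idev, &new_pt_id);

/* With replace: one step, no blocking domain in-between, and on
 * failure the original attachment is left unchanged. */
rc = iommufd_device_replace(idev, &new_pt_id);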
From: Jason Gunthorpe jgg@nvidia.com Sent: Monday, May 15, 2023 10:00 PM
+/**
+ * iommufd_device_replace - Change the device's iommu_domain
+ * @idev: device to change
+ * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This is the same as
+ *   iommufd_device_detach();
+ *   iommufd_device_attach();
One blank line here would fix a warning as below in "make htmldocs".
Documentation/userspace-api/iommufd:184: ./drivers/iommu/iommufd/device.c:665: WARNING: Definition list ends without a blank line; unexpected unindent.
Regards, Yi Liu
On Fri, Jul 07, 2023 at 08:00:37AM +0000, Liu, Yi L wrote:
One blank line here would fix a warning as below in "make htmldocs".
Documentation/userspace-api/iommufd:184: ./drivers/iommu/iommufd/device.c:665: WARNING: Definition list ends without a blank line; unexpected unindent.
Yes, I did it a while ago, it will be in v8
Thanks, Jason
The selftest invokes things like replace under the object lock of its idev, which protects the idev in a similar way to a real user. Unfortunately this triggers lockdep. A lock class per object type solves the problem.
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/iommufd_private.h | 1 + drivers/iommu/iommufd/main.c | 11 ++++++++++- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 07d60d0640fee3..79942934a4d69a 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -119,6 +119,7 @@ enum iommufd_object_type { #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif + IOMMUFD_OBJ_MAX, };
/* Base struct for all objects with a userspace ID handle. */ diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 5b7f70543adb24..f219e47d882525 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -33,6 +33,7 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size, enum iommufd_object_type type) { + static struct lock_class_key obj_keys[IOMMUFD_OBJ_MAX]; struct iommufd_object *obj; int rc;
@@ -40,7 +41,15 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, if (!obj) return ERR_PTR(-ENOMEM); obj->type = type; - init_rwsem(&obj->destroy_rwsem); + /* + * In most cases the destroy_rwsem is obtained with try so it doesn't + * interact with lockdep, however on destroy we have to sleep. This + * means if we have to destroy an object while holding a get on another + * object it triggers lockdep. Using one locking class per object type + * is a simple and reasonable way to avoid this. + */ + __init_rwsem(&obj->destroy_rwsem, "iommufd_object::destroy_rwsem", + &obj_keys[type]); refcount_set(&obj->users, 1);
/*
From: Nicolin Chen nicolinc@nvidia.com
Allow the selftest to call iommufd_device_replace() on the mock idev, and add some tests to exercise it.
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 4 ++ drivers/iommu/iommufd/selftest.c | 39 +++++++++++++++++ include/linux/iommufd.h | 1 + tools/testing/selftests/iommu/iommufd.c | 37 ++++++++++++++-- .../selftests/iommu/iommufd_fail_nth.c | 42 +++++++++++++++++++ tools/testing/selftests/iommu/iommufd_utils.h | 30 +++++++++++++ 6 files changed, 149 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index b3d69cca77295c..e3f1035cbd6464 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -17,6 +17,7 @@ enum { IOMMU_TEST_OP_ACCESS_PAGES, IOMMU_TEST_OP_ACCESS_RW, IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT, + IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE, };
enum { @@ -52,6 +53,9 @@ struct iommu_test_cmd { __u32 out_stdev_id; __u32 out_hwpt_id; } mock_domain; + struct { + __u32 pt_id; + } mock_domain_replace; struct { __aligned_u64 iova; __aligned_u64 length; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 74c2076105d486..eb33dffb955dcf 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -455,6 +455,42 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd, return rc; }
+/* Replace the mock domain with a manually allocated hw_pagetable */ +static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd, + unsigned int device_id, u32 pt_id, + struct iommu_test_cmd *cmd) +{ + struct iommufd_object *dev_obj; + struct selftest_obj *sobj; + int rc; + + /* + * Prefer to use the OBJ_SELFTEST because the destroy_rwsem will ensure + * it doesn't race with detach, which is not allowed. + */ + dev_obj = + iommufd_get_object(ucmd->ictx, device_id, IOMMUFD_OBJ_SELFTEST); + if (IS_ERR(dev_obj)) + return PTR_ERR(dev_obj); + + sobj = container_of(dev_obj, struct selftest_obj, obj); + if (sobj->type != TYPE_IDEV) { + rc = -EINVAL; + goto out_dev_obj; + } + + rc = iommufd_device_replace(sobj->idev.idev, &pt_id); + if (rc) + goto out_dev_obj; + + cmd->mock_domain_replace.pt_id = pt_id; + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + +out_dev_obj: + iommufd_put_object(dev_obj); + return rc; +} + /* Add an additional reserved IOVA to the IOAS */ static int iommufd_test_add_reserved(struct iommufd_ucmd *ucmd, unsigned int mockpt_id, @@ -948,6 +984,9 @@ int iommufd_test(struct iommufd_ucmd *ucmd) cmd->add_reserved.length); case IOMMU_TEST_OP_MOCK_DOMAIN: return iommufd_test_mock_domain(ucmd, cmd); + case IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE: + return iommufd_test_mock_domain_replace( + ucmd, cmd->id, cmd->mock_domain_replace.pt_id, cmd); case IOMMU_TEST_OP_MD_CHECK_MAP: return iommufd_test_md_check_pa( ucmd, cmd->id, cmd->check_map.iova, diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 1129a36a74c442..5e417868f575b9 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -22,6 +22,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, void iommufd_device_unbind(struct iommufd_device *idev);
int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id); +int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id); void iommufd_device_detach(struct iommufd_device *idev);
struct iommufd_access_ops { diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index e4a6b33cfde48f..96881ecca5c757 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -9,9 +9,6 @@
#include "iommufd_utils.h"
-static void *buffer; - -static unsigned long PAGE_SIZE; static unsigned long HUGEPAGE_SIZE;
#define MOCK_PAGE_SIZE (PAGE_SIZE / 2) @@ -1035,6 +1032,7 @@ FIXTURE(iommufd_mock_domain) uint32_t ioas_id; uint32_t hwpt_id; uint32_t hwpt_ids[2]; + uint32_t stdev_ids[2]; int mmap_flags; size_t mmap_buf_size; }; @@ -1056,7 +1054,8 @@ FIXTURE_SETUP(iommufd_mock_domain) ASSERT_GE(ARRAY_SIZE(self->hwpt_ids), variant->mock_domains);
for (i = 0; i != variant->mock_domains; i++) - test_cmd_mock_domain(self->ioas_id, NULL, &self->hwpt_ids[i]); + test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i], + &self->hwpt_ids[i]); self->hwpt_id = self->hwpt_ids[0];
self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS; @@ -1308,6 +1307,36 @@ TEST_F(iommufd_mock_domain, user_copy) test_ioctl_destroy(ioas_id); }
+TEST_F(iommufd_mock_domain, replace) +{ + uint32_t ioas_id; + + test_ioctl_ioas_alloc(&ioas_id); + + test_cmd_mock_domain_replace(self->stdev_ids[0], ioas_id); + + /* + * Replacing the IOAS causes the prior HWPT to be deallocated, thus we + * should get enoent when we try to use it. + */ + if (variant->mock_domains == 1) + test_err_mock_domain_replace(ENOENT, self->stdev_ids[0], + self->hwpt_ids[0]); + + test_cmd_mock_domain_replace(self->stdev_ids[0], ioas_id); + if (variant->mock_domains >= 2) { + test_cmd_mock_domain_replace(self->stdev_ids[0], + self->hwpt_ids[1]); + test_cmd_mock_domain_replace(self->stdev_ids[0], + self->hwpt_ids[1]); + test_cmd_mock_domain_replace(self->stdev_ids[0], + self->hwpt_ids[0]); + } + + test_cmd_mock_domain_replace(self->stdev_ids[0], self->ioas_id); + test_ioctl_destroy(ioas_id); +} + /* VFIO compatibility IOCTLs */
TEST_F(iommufd, simple_ioctls) diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index d9afcb23810e1a..96fb2f04623f23 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -41,6 +41,8 @@ static int writeat(int dfd, const char *fn, const char *val)
static __attribute__((constructor)) void setup_buffer(void) { + PAGE_SIZE = sysconf(_SC_PAGE_SIZE); + BUFFER_SIZE = 2*1024*1024;
buffer = mmap(0, BUFFER_SIZE, PROT_READ | PROT_WRITE, @@ -569,4 +571,44 @@ TEST_FAIL_NTH(basic_fail_nth, access_pin_domain) return 0; }
+/* device.c */ +TEST_FAIL_NTH(basic_fail_nth, device) +{ + uint32_t ioas_id; + uint32_t ioas_id2; + uint32_t stdev_id; + __u64 iova; + + self->fd = open("/dev/iommu", O_RDWR); + if (self->fd == -1) + return -1; + + if (_test_ioctl_ioas_alloc(self->fd, &ioas_id)) + return -1; + + if (_test_ioctl_ioas_alloc(self->fd, &ioas_id2)) + return -1; + + iova = MOCK_APERTURE_START; + if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, PAGE_SIZE, &iova, + IOMMU_IOAS_MAP_FIXED_IOVA | + IOMMU_IOAS_MAP_WRITEABLE | + IOMMU_IOAS_MAP_READABLE)) + return -1; + if (_test_ioctl_ioas_map(self->fd, ioas_id2, buffer, PAGE_SIZE, &iova, + IOMMU_IOAS_MAP_FIXED_IOVA | + IOMMU_IOAS_MAP_WRITEABLE | + IOMMU_IOAS_MAP_READABLE)) + return -1; + + fail_nth_enable(); + + if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL)) + return -1; + + if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL)) + return -1; + return 0; +} + TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 85d6662ef8e867..8b11bb70095269 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -19,6 +19,8 @@ static void *buffer; static unsigned long BUFFER_SIZE;
+static unsigned long PAGE_SIZE; + /* * Have the kernel check the refcount on pages. I don't know why a freshly * mmap'd anon non-compound page starts out with a ref of 3 @@ -66,6 +68,34 @@ static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id, EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, \ stdev_id, hwpt_id))
+static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id, + __u32 *hwpt_id) +{ + struct iommu_test_cmd cmd = { + .size = sizeof(cmd), + .op = IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE, + .id = stdev_id, + .mock_domain_replace = { + .pt_id = pt_id, + }, + }; + int ret; + + ret = ioctl(fd, IOMMU_TEST_CMD, &cmd); + if (ret) + return ret; + if (hwpt_id) + *hwpt_id = cmd.mock_domain_replace.pt_id; + return 0; +} + +#define test_cmd_mock_domain_replace(stdev_id, pt_id) \ + ASSERT_EQ(0, _test_cmd_mock_domain_replace(self->fd, stdev_id, pt_id, \ + NULL)) +#define test_err_mock_domain_replace(_errno, stdev_id, pt_id) \ + EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \ + pt_id, NULL)) + static int _test_cmd_create_access(int fd, unsigned int ioas_id, __u32 *access_id, unsigned int flags) {
This allows userspace to manually create HWPTs on IOASs and then use those HWPTs as inputs to iommufd_device_attach/replace().
A following series will extend this to allow creating iommu_domains with driver-specific parameters.
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/hw_pagetable.c | 46 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 9 +++++ drivers/iommu/iommufd/main.c | 3 ++ include/uapi/linux/iommufd.h | 26 ++++++++++++++ 4 files changed, 84 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 2087b51d9807ad..cf2c1504e20d84 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -3,6 +3,7 @@ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES */ #include <linux/iommu.h> +#include <uapi/linux/iommufd.h>
#include "iommufd_private.h"
@@ -131,3 +132,48 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, iommufd_object_abort_and_destroy(ictx, &hwpt->obj); return ERR_PTR(rc); } + +int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) +{ + struct iommu_hwpt_alloc *cmd = ucmd->cmd; + struct iommufd_hw_pagetable *hwpt; + struct iommufd_device *idev; + struct iommufd_ioas *ioas; + int rc; + + if (cmd->flags || cmd->__reserved) + return -EOPNOTSUPP; + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + + ioas = iommufd_get_ioas(ucmd->ictx, cmd->pt_id); + if (IS_ERR(ioas)) { + rc = PTR_ERR(ioas); + goto out_put_idev; + } + + mutex_lock(&ioas->mutex); + hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false); + if (IS_ERR(hwpt)) { + rc = PTR_ERR(hwpt); + goto out_unlock; + } + + cmd->out_hwpt_id = hwpt->obj.id; + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_hwpt; + iommufd_object_finalize(ucmd->ictx, &hwpt->obj); + goto out_unlock; + +out_hwpt: + iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj); +out_unlock: + mutex_unlock(&ioas->mutex); + iommufd_put_object(&ioas->obj); +out_put_idev: + iommufd_put_object(&idev->obj); + return rc; +} diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 79942934a4d69a..54916fcf8d2a3c 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -262,6 +262,7 @@ struct iommufd_hw_pagetable * iommufd_hw_pagetable_detach(struct iommufd_device *idev); void iommufd_hw_pagetable_destroy(struct iommufd_object *obj); void iommufd_hw_pagetable_abort(struct iommufd_object *obj); +int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) @@ -298,6 +299,14 @@ struct iommufd_device { bool enforce_cache_coherency; };
+static inline struct iommufd_device * +iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_DEVICE), + struct iommufd_device, obj); +} + void iommufd_device_destroy(struct iommufd_object *obj);
struct iommufd_access { diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index f219e47d882525..73a91e96896252 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -265,6 +265,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd)
union ucmd_buffer { struct iommu_destroy destroy; + struct iommu_hwpt_alloc hwpt; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy; @@ -296,6 +297,8 @@ struct iommufd_ioctl_op { } static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), + IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, + __reserved), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 98ebba80cfa1fc..8245c01adca673 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -45,6 +45,7 @@ enum { IOMMUFD_CMD_IOAS_UNMAP, IOMMUFD_CMD_OPTION, IOMMUFD_CMD_VFIO_IOAS, + IOMMUFD_CMD_HWPT_ALLOC, };
/** @@ -344,4 +345,29 @@ struct iommu_vfio_ioas { __u16 __reserved; }; #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS) + +/** + * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC) + * @size: sizeof(struct iommu_hwpt_alloc) + * @flags: Must be 0 + * @dev_id: The device to allocate this HWPT for + * @pt_id: The IOAS to connect this HWPT to + * @out_hwpt_id: The ID of the new HWPT + * @__reserved: Must be 0 + * + * Explicitly allocate a hardware page table object. This is the same object + * type that is returned by iommufd_device_attach() and represents the + * underlying iommu driver's iommu_domain kernel object. + * + * A HWPT will be created with the IOVA mappings from the given IOAS. + */ +struct iommu_hwpt_alloc { + __u32 size; + __u32 flags; + __u32 dev_id; + __u32 pt_id; + __u32 out_hwpt_id; + __u32 __reserved; +}; +#define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC) #endif
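A minimal userspace sketch of the new ioctl, built from the uapi struct above; the iommufd fd, ioas_id and dev_id are assumed to come from earlier IOMMU_IOAS_ALLOC and device bind steps:

#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int alloc_hwpt(int iommufd, __u32 dev_id, __u32 ioas_id,
		      __u32 *out_hwpt_id)
{
	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.flags = 0,		/* must be 0 */
		.dev_id = dev_id,	/* device the HWPT is created for */
		.pt_id = ioas_id,	/* IOAS supplying the IOVA mappings */
	};

	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd))
		return -1;
	*out_hwpt_id = cmd.out_hwpt_id;	/* usable with attach/replace */
	return 0;
}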
Now that we actually call iommufd_device_bind() we can return the idev_id from that function to userspace for use in other APIs.
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 2 ++ drivers/iommu/iommufd/selftest.c | 1 + tools/testing/selftests/iommu/iommufd.c | 17 +++++++++-------- .../testing/selftests/iommu/iommufd_fail_nth.c | 18 ++++++++++-------- tools/testing/selftests/iommu/iommufd_utils.h | 12 +++++++----- 5 files changed, 29 insertions(+), 21 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index e3f1035cbd6464..dd9168a20ddf4c 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -52,6 +52,8 @@ struct iommu_test_cmd { struct { __u32 out_stdev_id; __u32 out_hwpt_id; + /* out_idev_id is the standard iommufd_bind object */ + __u32 out_idev_id; } mock_domain; struct { __u32 pt_id; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index eb33dffb955dcf..9d43334e4faf16 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -443,6 +443,7 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd, /* Userspace must destroy the device_id to destroy the object */ cmd->mock_domain.out_hwpt_id = pt_id; cmd->mock_domain.out_stdev_id = sobj->obj.id; + cmd->mock_domain.out_idev_id = idev_id; iommufd_object_finalize(ucmd->ictx, &sobj->obj); return iommufd_ucmd_respond(ucmd, sizeof(*cmd));
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 96881ecca5c757..78841aba47d038 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -211,7 +211,7 @@ FIXTURE_SETUP(iommufd_ioas)
 	for (i = 0; i != variant->mock_domains; i++) {
 		test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
-				     &self->hwpt_id);
+				     &self->hwpt_id, NULL);
 		self->base_iova = MOCK_APERTURE_START;
 	}
 }
@@ -262,7 +262,7 @@ TEST_F(iommufd_ioas, hwpt_attach)
 {
 	/* Create a device attached directly to a hwpt */
 	if (self->stdev_id) {
-		test_cmd_mock_domain(self->hwpt_id, NULL, NULL);
+		test_cmd_mock_domain(self->hwpt_id, NULL, NULL, NULL);
 	} else {
 		test_err_mock_domain(ENOENT, self->hwpt_id, NULL, NULL);
 	}
@@ -681,7 +681,7 @@ TEST_F(iommufd_ioas, access_pin)
 					_IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES),
 					&access_cmd));
 		test_cmd_mock_domain(self->ioas_id, &mock_stdev_id,
-				     &mock_hwpt_id);
+				     &mock_hwpt_id, NULL);
 		check_map_cmd.id = mock_hwpt_id;
 		ASSERT_EQ(0, ioctl(self->fd,
 				   _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_MAP),
@@ -836,7 +836,7 @@ TEST_F(iommufd_ioas, fork_gone)
 		 * If a domain already existed then everything was pinned within
 		 * the fork, so this copies from one domain to another.
 		 */
-		test_cmd_mock_domain(self->ioas_id, NULL, NULL);
+		test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
 		check_access_rw(_metadata, self->fd, access_id,
 				MOCK_APERTURE_START, 0);
@@ -885,7 +885,7 @@ TEST_F(iommufd_ioas, fork_present)
 	ASSERT_EQ(8, read(efd, &tmp, sizeof(tmp)));
 
 	/* Read pages from the remote process */
-	test_cmd_mock_domain(self->ioas_id, NULL, NULL);
+	test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
 	check_access_rw(_metadata, self->fd, access_id, MOCK_APERTURE_START,
 			0);
 
 	ASSERT_EQ(0, close(pipefds[1]));
@@ -1033,6 +1033,7 @@ FIXTURE(iommufd_mock_domain)
 	uint32_t hwpt_id;
 	uint32_t hwpt_ids[2];
 	uint32_t stdev_ids[2];
+	uint32_t idev_ids[2];
 	int mmap_flags;
 	size_t mmap_buf_size;
 };
@@ -1055,7 +1056,7 @@ FIXTURE_SETUP(iommufd_mock_domain)
 	for (i = 0; i != variant->mock_domains; i++)
 		test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i],
-				     &self->hwpt_ids[i]);
+				     &self->hwpt_ids[i], &self->idev_ids[i]);
 	self->hwpt_id = self->hwpt_ids[0];
 
 	self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
@@ -1249,7 +1250,7 @@ TEST_F(iommufd_mock_domain, all_aligns_copy)
 			/* Add and destroy a domain while the area exists */
 			old_id = self->hwpt_ids[1];
 			test_cmd_mock_domain(self->ioas_id, &mock_stdev_id,
-					     &self->hwpt_ids[1]);
+					     &self->hwpt_ids[1], NULL);
 
 			check_mock_iova(buf + start, iova, length);
 			check_refs(buf + start / PAGE_SIZE * PAGE_SIZE,
@@ -1458,7 +1459,7 @@ FIXTURE_SETUP(vfio_compat_mock_domain)
 
 	/* Create what VFIO would consider a group */
 	test_ioctl_ioas_alloc(&self->ioas_id);
-	test_cmd_mock_domain(self->ioas_id, NULL, NULL);
+	test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
 
 	/* Attach it to the vfio compat */
 	vfio_ioas_cmd.ioas_id = self->ioas_id;
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 96fb2f04623f23..8ab20df4edc8c7 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -315,7 +315,7 @@ TEST_FAIL_NTH(basic_fail_nth, map_domain)
fail_nth_enable();
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, 262144, &iova,
@@ -326,7 +326,7 @@ TEST_FAIL_NTH(basic_fail_nth, map_domain)
 	if (_test_ioctl_destroy(self->fd, stdev_id))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
 	return 0;
 }
@@ -350,12 +350,13 @@ TEST_FAIL_NTH(basic_fail_nth, map_two_domains)
 	if (_test_ioctl_set_temp_memory_limit(self->fd, 32))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
fail_nth_enable();
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2,
+				  NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, 262144, &iova,
@@ -369,9 +370,10 @@ TEST_FAIL_NTH(basic_fail_nth, map_two_domains)
 	if (_test_ioctl_destroy(self->fd, stdev_id2))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2,
+				  NULL))
 		return -1;
 	return 0;
 }
@@ -528,7 +530,7 @@ TEST_FAIL_NTH(basic_fail_nth, access_pin_domain)
 	if (_test_ioctl_set_temp_memory_limit(self->fd, 32))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, BUFFER_SIZE, &iova,
@@ -603,7 +605,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
fail_nth_enable();
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL, NULL))
 		return -1;
 
 	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 8b11bb70095269..62e01412a7679e 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -41,7 +41,7 @@ static unsigned long PAGE_SIZE;
 })
 
 static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id,
-				 __u32 *hwpt_id)
+				 __u32 *hwpt_id, __u32 *idev_id)
 {
 	struct iommu_test_cmd cmd = {
 		.size = sizeof(cmd),
@@ -59,14 +59,16 @@ static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id,
 	assert(cmd.id != 0);
 	if (hwpt_id)
 		*hwpt_id = cmd.mock_domain.out_hwpt_id;
+	if (idev_id)
+		*idev_id = cmd.mock_domain.out_idev_id;
 	return 0;
 }
-#define test_cmd_mock_domain(ioas_id, stdev_id, hwpt_id)                \
-	ASSERT_EQ(0,                                                    \
-		  _test_cmd_mock_domain(self->fd, ioas_id, stdev_id, hwpt_id))
+#define test_cmd_mock_domain(ioas_id, stdev_id, hwpt_id, idev_id)       \
+	ASSERT_EQ(0, _test_cmd_mock_domain(self->fd, ioas_id, stdev_id, \
+					   hwpt_id, idev_id))
 #define test_err_mock_domain(_errno, ioas_id, stdev_id, hwpt_id)      \
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, \
-						   stdev_id, hwpt_id))
+						   stdev_id, hwpt_id, NULL))
 
 static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
					 __u32 *hwpt_id)
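With the extra out parameter in place, a caller that needs the device ID can fetch all three IDs in one call, roughly like this (a sketch using the helper above; error handling elided):

	__u32 stdev_id, hwpt_id, idev_id;

	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id,
				  &idev_id))
		return -1;
	/* idev_id names the iommufd device object, usable as a dev_id */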
Test the basic flow of the new IOMMU_HWPT_ALLOC ioctl.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/io_pagetable.c          |  3 +++
 tools/testing/selftests/iommu/iommufd.c       | 15 +++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        | 11 +++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h | 21 +++++++++++++++++++
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index f842768b2e250b..21052f64f95649 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -1172,6 +1172,9 @@ int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt,
 	unsigned int num_sw_msi = 0;
 	int rc;
 
+	if (iommufd_should_fail())
+		return -EINVAL;
+
 	down_write(&iopt->iova_rwsem);
 	/* FIXME: drivers allocate memory but there is no failure propogated */
 	iommu_get_resv_regions(dev, &resv_regions);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 78841aba47d038..dc09c1de319f05 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1338,6 +1338,21 @@ TEST_F(iommufd_mock_domain, replace)
 	test_ioctl_destroy(ioas_id);
 }
 
+TEST_F(iommufd_mock_domain, alloc_hwpt)
+{
+	int i;
+
+	for (i = 0; i != variant->mock_domains; i++) {
+		uint32_t stddev_id;
+		uint32_t hwpt_id;
+
+		test_cmd_hwpt_alloc(self->idev_ids[0], self->ioas_id, &hwpt_id);
+		test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+		test_ioctl_destroy(stddev_id);
+		test_ioctl_destroy(hwpt_id);
+	}
+}
+
 /* VFIO compatibility IOCTLs */
 
 TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 8ab20df4edc8c7..d4c552e5694812 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -579,6 +579,8 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	uint32_t ioas_id;
 	uint32_t ioas_id2;
 	uint32_t stdev_id;
+	uint32_t idev_id;
+	uint32_t hwpt_id;
 	__u64 iova;
 
 	self->fd = open("/dev/iommu", O_RDWR);
@@ -605,11 +607,18 @@ TEST_FAIL_NTH(basic_fail_nth, device)
fail_nth_enable();
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
+				  &idev_id))
+		return -1;
+
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, &hwpt_id))
 		return -1;
 
 	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
 		return -1;
+
+	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL))
+		return -1;
 	return 0;
 }
 
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 62e01412a7679e..53b4d3f2d9fc6c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -98,6 +98,27 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \
 							   pt_id, NULL))
 
+static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
+				__u32 *hwpt_id)
+{
+	struct iommu_hwpt_alloc cmd = {
+		.size = sizeof(cmd),
+		.dev_id = device_id,
+		.pt_id = pt_id,
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_HWPT_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	if (hwpt_id)
+		*hwpt_id = cmd.out_hwpt_id;
+	return 0;
+}
+
+#define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id))
+
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
				    __u32 *access_id, unsigned int flags)
 {
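For reference, the new helper composes with the existing mock domain helpers in the pattern the alloc_hwpt test above exercises (a sketch; idev_id and ioas_id would come from the test fixture):

	uint32_t hwpt_id, stdev_id;

	test_cmd_hwpt_alloc(idev_id, ioas_id, &hwpt_id);
	/* an explicitly allocated HWPT attaches the same way an IOAS does */
	test_cmd_mock_domain(hwpt_id, &stdev_id, NULL, NULL);
	test_ioctl_destroy(stdev_id);
	test_ioctl_destroy(hwpt_id);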
On Mon, May 15, 2023 at 11:00:15AM -0300, Jason Gunthorpe wrote:
> This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
>
> v7:
I built my latest nesting branch on top of this branch by git-merging cdev v11 and rebasing the nesting v2 patches (replace, hw_info, set_dev_data, and the SMMU series): https://github.com/nicolinc/iommufd/commits/wip/iommufd_nesting-05152023-nic
Things work fine with the nested SMMUv3. So,
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Hopefully we can merge this and cdev soon.
Thanks!