[ This series depends on the VFIO device cdev series ]
Changelog v10: * Added Reviewed-by from Jason * Replaced the iommufd_ref_to_user call with refcount_inc * Added a wrapper iommufd_access_change_ioas_id and used it in the iommufd_access_attach() and iommufd_access_replace() APIs v9: https://lore.kernel.org/linux-iommu/cover.1690440730.git.nicolinc@nvidia.com... * Rebased on top of Jason's iommufd for-next tree * Added Reviewed-by from Jason and Alex * Reworked the replace API patches * Added a new patch allowing passing in to iopt_remove_access * Added a new patch of a helper function following Jason's design, mainly by blocking any concurrent detach/replace and keeping the refcount_dec at the end of the function * Added a call of the new helper in iommufd_access_destroy_object() to reduce race condition * Simplified the replace API patch v8: https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/ * Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series: https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.c... https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/ * Changed the order of detach() and attach() in replace(), to fix a bug v7: https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/ * Rebased on top of v6.4-rc1 and cdev v11 candidate * Fixed a wrong file in replace() API patch * Added Kevin's "Reviewed-by" to replace() API patch v6: https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/ * Rebased on top of cdev v8 series https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/ * Added "Reviewed-by" from Kevin to PATCH-4 * Squashed access->ioas updating lines into iommufd_access_change_pt(), and changed function return type accordingly for simplification. v5: https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/ * Kept the cmd->id in the iommufd_test_create_access() so the access can be created with an ioas by default. Then, renamed the previous ioctl IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so it would be used to replace an access->ioas pointer. * Added iommufd_access_replace() API after the introductions of the other two APIs iommufd_access_attach() and iommufd_access_detach(). * Since vdev->iommufd_attached is also set in emulated pathway too, call iommufd_access_update(), similar to the physical pathway. v4: https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/ * Rebased on top of Jason's series adding replace() and hwpt_alloc() https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.... * Rebased on top of cdev series v6 https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/ * Dropped the patch that's moved to cdev series. * Added unmap function pointer sanity before calling it. * Added "Reviewed-by" from Kevin and Yi. * Added back the VFIO change updating the ATTACH uAPI. v3: https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/ * Rebased on top of Jason's iommufd_hwpt branch: https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.com... * Dropped patches from this series accordingly. There were a couple of VFIO patches that will be submitted after the VFIO cdev series. Also, renamed the series to be "emulated". * Moved dma_unmap sanity patch to the first in the series. * Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways. * Added Kevin's "Reviewed-by" to two of the patches. * Fixed a NULL pointer bug in vfio_iommufd_emulated_bind(). * Moved unmap() call to the common place in iommufd_access_set_ioas(). v2: https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/ * Rebased on top of vfio_device cdev v2 series. * Update the kdoc and commit message of iommu_group_replace_domain(). * Dropped revert-to-core-domain part in iommu_group_replace_domain(). * Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas(). * Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the iommufd_access_set_ioas() call. * Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if vdev->ops->dma_unmap is not implemented. * Added a __iommmufd_device_detach helper and let the replace routine do a partial detach(). * Added restriction on auto_domains to use the replace feature. * Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric" from the has_group removal series. v1: https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group() for callers to attach a device from the default_domain (NULL if not being supported) to a given iommu domain, and iommu_detach_group() for callers to detach a device from a given domain to the default_domain. Internally, the detach_dev op is deprecated for the newer drivers with default_domain. This means that those drivers likely can switch an attaching domain to another one, without stagging the device at a blocking or default domain, for use cases such as: 1) vPASID mode, when a guest wants to replace a single pasid (PASID=0) table with a larger table (PASID=N) 2) Nesting mode, when switching the attaching device from an S2 domain to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD "physical" devices. The IOMMUFD "emulated" deivces will need some extra steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github: https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v10
Thank you Nicolin Chen
Nicolin Chen (6): vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages() iommufd: Allow passing in iopt_access_list_id to iopt_remove_access() iommufd: Add iommufd_access_change_ioas(_id) helpers iommufd: Add iommufd_access_replace() API iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 132 ++++++++++++------ drivers/iommu/iommufd/io_pagetable.c | 6 +- drivers/iommu/iommufd/iommufd_private.h | 3 +- drivers/iommu/iommufd/iommufd_test.h | 4 + drivers/iommu/iommufd/selftest.c | 19 +++ drivers/vfio/iommufd.c | 11 +- drivers/vfio/vfio_main.c | 4 + include/linux/iommufd.h | 1 + include/uapi/linux/vfio.h | 6 + tools/testing/selftests/iommu/iommufd.c | 29 +++- tools/testing/selftests/iommu/iommufd_utils.h | 19 +++ 11 files changed, 184 insertions(+), 50 deletions(-)
A driver that doesn't implement ops->dma_unmap shouldn't be allowed to do vfio_pin/unpin_pages(), though it can use vfio_dma_rw() to access an iova range. Deny !ops->dma_unmap cases in vfio_pin/unpin_pages().
Suggested-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Yi Liu yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/vfio/vfio_main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 902f06e52c48..0da8ed81a97d 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1483,6 +1483,8 @@ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova, /* group->container cannot change while a vfio device is open */ if (!pages || !npage || WARN_ON(!vfio_assert_device_open(device))) return -EINVAL; + if (!device->ops->dma_unmap) + return -EINVAL; if (vfio_device_has_container(device)) return vfio_device_container_pin_pages(device, iova, npage, prot, pages); @@ -1520,6 +1522,8 @@ void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage) { if (WARN_ON(!vfio_assert_device_open(device))) return; + if (WARN_ON(!device->ops->dma_unmap)) + return;
if (vfio_device_has_container(device)) { vfio_device_container_unpin_pages(device, iova, npage);
This is a preparatory change for ioas replacement support for access. The replacement routine does an iopt_add_access() at a new IOAS first and then iopt_remove_access() at the old IOAS upon the success of the first call. However, the first call overrides the iopt_access_list_id in the access struct, resulting in that id un-erased in the xarray.
Add an iopt_access_list_id in iopt_remove_access, so the replacement routine can save the id before it gets overwritten, and pass the id in iopt_remove_access() for a proper cleanup.
The existing callers should just pass in access->iopt_access_list_id.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/device.c | 6 ++++-- drivers/iommu/iommufd/io_pagetable.c | 6 +++--- drivers/iommu/iommufd/iommufd_private.h | 3 ++- 3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 57c0e81f5073..7a3e8660b902 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -690,7 +690,8 @@ void iommufd_access_destroy_object(struct iommufd_object *obj) container_of(obj, struct iommufd_access, obj);
if (access->ioas) { - iopt_remove_access(&access->ioas->iopt, access); + iopt_remove_access(&access->ioas->iopt, access, + access->iopt_access_list_id); refcount_dec(&access->ioas->obj.users); access->ioas = NULL; } @@ -776,7 +777,8 @@ void iommufd_access_detach(struct iommufd_access *access) access->ops->unmap(access->data, 0, ULONG_MAX); mutex_lock(&access->ioas_lock); } - iopt_remove_access(&cur_ioas->iopt, access); + iopt_remove_access(&cur_ioas->iopt, access, + access->iopt_access_list_id); refcount_dec(&cur_ioas->obj.users); out: access->ioas_unpin = NULL; diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c index 4d095115c2d0..3a598182b761 100644 --- a/drivers/iommu/iommufd/io_pagetable.c +++ b/drivers/iommu/iommufd/io_pagetable.c @@ -1158,12 +1158,12 @@ int iopt_add_access(struct io_pagetable *iopt, struct iommufd_access *access) }
void iopt_remove_access(struct io_pagetable *iopt, - struct iommufd_access *access) + struct iommufd_access *access, + u32 iopt_access_list_id) { down_write(&iopt->domains_rwsem); down_write(&iopt->iova_rwsem); - WARN_ON(xa_erase(&iopt->access_list, access->iopt_access_list_id) != - access); + WARN_ON(xa_erase(&iopt->access_list, iopt_access_list_id) != access); WARN_ON(iopt_calculate_iova_alignment(iopt)); up_write(&iopt->iova_rwsem); up_write(&iopt->domains_rwsem); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index dba730129b8c..8ba786bc95ff 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -323,7 +323,8 @@ struct iommufd_access {
int iopt_add_access(struct io_pagetable *iopt, struct iommufd_access *access); void iopt_remove_access(struct io_pagetable *iopt, - struct iommufd_access *access); + struct iommufd_access *access, + u32 iopt_access_list_id); void iommufd_access_destroy_object(struct iommufd_object *obj);
#ifdef CONFIG_IOMMUFD_TEST
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, July 28, 2023 4:25 AM
This is a preparatory change for ioas replacement support for access. The replacement routine does an iopt_add_access() at a new IOAS first and then iopt_remove_access() at the old IOAS upon the success of the first call. However, the first call overrides the iopt_access_list_id in the access struct, resulting in that id un-erased in the xarray.
Add an iopt_access_list_id in iopt_remove_access, so the replacement routine can save the id before it gets overwritten, and pass the id in iopt_remove_access() for a proper cleanup.
The existing callers should just pass in access->iopt_access_list_id.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
The complication of the mutex and refcount will be amplified after we introduce the replace support for access. So, add a preparatory change of a constitutive helper iommufd_access_change_ioas() and its wrapper iommufd_access_change_ioas_id(). They can simply take care of existing iommufd_access_attach() and iommufd_access_detach(), with a less risk of race condition.
Also, update the unprotect routine in iommufd_access_destroy_object() to calling the new iommufd_access_change_ioas() helper.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/device.c | 123 +++++++++++++++++++++------------ 1 file changed, 80 insertions(+), 43 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 7a3e8660b902..e79cbedd8626 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -684,17 +684,82 @@ void iommufd_device_detach(struct iommufd_device *idev) } EXPORT_SYMBOL_NS_GPL(iommufd_device_detach, IOMMUFD);
+/* + * On success, it will refcount_inc() at a valid new_ioas and refcount_dec() at + * a valid cur_ioas (access->ioas). A caller passing in a valid new_ioas should + * call iommufd_put_object() if it does an iommufd_get_object() for a new_ioas. + */ +static int iommufd_access_change_ioas(struct iommufd_access *access, + struct iommufd_ioas *new_ioas) +{ + u32 iopt_access_list_id = access->iopt_access_list_id; + struct iommufd_ioas *cur_ioas = access->ioas; + int rc; + + lockdep_assert_held(&access->ioas_lock); + + /* We are racing with a concurrent detach, bail */ + if (cur_ioas != access->ioas_unpin) + return -EBUSY; + + if (IS_ERR(new_ioas)) + return PTR_ERR(new_ioas); + + if (cur_ioas == new_ioas) + return 0; + + /* + * Set ioas to NULL to block any further iommufd_access_pin_pages(). + * iommufd_access_unpin_pages() can continue using access->ioas_unpin. + */ + access->ioas = NULL; + + if (new_ioas) { + rc = iopt_add_access(&new_ioas->iopt, access); + if (rc) { + access->ioas = cur_ioas; + return rc; + } + refcount_inc(&new_ioas->obj.users); + } + + if (cur_ioas) { + if (access->ops->unmap) { + mutex_unlock(&access->ioas_lock); + access->ops->unmap(access->data, 0, ULONG_MAX); + mutex_lock(&access->ioas_lock); + } + iopt_remove_access(&cur_ioas->iopt, access, iopt_access_list_id); + refcount_dec(&cur_ioas->obj.users); + } + + access->ioas = new_ioas; + access->ioas_unpin = new_ioas; + + return 0; +} + +static int iommufd_access_change_ioas_id(struct iommufd_access *access, u32 id) +{ + struct iommufd_ioas *ioas = iommufd_get_ioas(access->ictx, id); + int rc; + + if (IS_ERR(ioas)) + return PTR_ERR(ioas); + rc = iommufd_access_change_ioas(access, ioas); + iommufd_put_object(&ioas->obj); + return rc; +} + void iommufd_access_destroy_object(struct iommufd_object *obj) { struct iommufd_access *access = container_of(obj, struct iommufd_access, obj);
- if (access->ioas) { - iopt_remove_access(&access->ioas->iopt, access, - access->iopt_access_list_id); - refcount_dec(&access->ioas->obj.users); - access->ioas = NULL; - } + mutex_lock(&access->ioas_lock); + if (access->ioas) + WARN_ON(iommufd_access_change_ioas(access, NULL)); + mutex_unlock(&access->ioas_lock); iommufd_ctx_put(access->ictx); }
@@ -761,60 +826,32 @@ EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
void iommufd_access_detach(struct iommufd_access *access) { - struct iommufd_ioas *cur_ioas = access->ioas; + int rc;
mutex_lock(&access->ioas_lock); - if (WARN_ON(!access->ioas)) - goto out; - /* - * Set ioas to NULL to block any further iommufd_access_pin_pages(). - * iommufd_access_unpin_pages() can continue using access->ioas_unpin. - */ - access->ioas = NULL; - - if (access->ops->unmap) { + if (WARN_ON(!access->ioas)) { mutex_unlock(&access->ioas_lock); - access->ops->unmap(access->data, 0, ULONG_MAX); - mutex_lock(&access->ioas_lock); + return; } - iopt_remove_access(&cur_ioas->iopt, access, - access->iopt_access_list_id); - refcount_dec(&cur_ioas->obj.users); -out: - access->ioas_unpin = NULL; + rc = iommufd_access_change_ioas(access, NULL); + WARN_ON(rc); mutex_unlock(&access->ioas_lock); } EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);
int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id) { - struct iommufd_ioas *new_ioas; - int rc = 0; + int rc;
mutex_lock(&access->ioas_lock); - if (WARN_ON(access->ioas || access->ioas_unpin)) { + if (WARN_ON(access->ioas)) { mutex_unlock(&access->ioas_lock); return -EINVAL; }
- new_ioas = iommufd_get_ioas(access->ictx, ioas_id); - if (IS_ERR(new_ioas)) { - mutex_unlock(&access->ioas_lock); - return PTR_ERR(new_ioas); - } - - rc = iopt_add_access(&new_ioas->iopt, access); - if (rc) { - mutex_unlock(&access->ioas_lock); - iommufd_put_object(&new_ioas->obj); - return rc; - } - iommufd_ref_to_users(&new_ioas->obj); - - access->ioas = new_ioas; - access->ioas_unpin = new_ioas; + rc = iommufd_access_change_ioas_id(access, ioas_id); mutex_unlock(&access->ioas_lock); - return 0; + return rc; } EXPORT_SYMBOL_NS_GPL(iommufd_access_attach, IOMMUFD);
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, July 28, 2023 4:25 AM
+static int iommufd_access_change_ioas(struct iommufd_access *access,
struct iommufd_ioas *new_ioas)
+{
- u32 iopt_access_list_id = access->iopt_access_list_id;
- struct iommufd_ioas *cur_ioas = access->ioas;
- int rc;
- lockdep_assert_held(&access->ioas_lock);
- /* We are racing with a concurrent detach, bail */
- if (cur_ioas != access->ioas_unpin)
return -EBUSY;
- if (IS_ERR(new_ioas))
return PTR_ERR(new_ioas);
iommufd_access_change_ioas_id() already checks errors.
void iommufd_access_destroy_object(struct iommufd_object *obj) { struct iommufd_access *access = container_of(obj, struct iommufd_access, obj);
- if (access->ioas) {
iopt_remove_access(&access->ioas->iopt, access,
access->iopt_access_list_id);
refcount_dec(&access->ioas->obj.users);
access->ioas = NULL;
- }
- mutex_lock(&access->ioas_lock);
- if (access->ioas)
WARN_ON(iommufd_access_change_ioas(access, NULL));
- mutex_unlock(&access->ioas_lock); iommufd_ctx_put(access->ictx);
}
this changes the behavior of destroy. Previously it always removes the access w/o detecting race while now it will give up and throw out a warning. While I'm fine with this change from bisec p.o.v. it might be good to split this into a separate patch.
void iommufd_access_detach(struct iommufd_access *access) {
- struct iommufd_ioas *cur_ioas = access->ioas;
int rc;
mutex_lock(&access->ioas_lock);
- if (WARN_ON(!access->ioas))
goto out;
- /*
* Set ioas to NULL to block any further iommufd_access_pin_pages().
* iommufd_access_unpin_pages() can continue using access-
ioas_unpin.
*/
- access->ioas = NULL;
- if (access->ops->unmap) {
- if (WARN_ON(!access->ioas)) { mutex_unlock(&access->ioas_lock);
access->ops->unmap(access->data, 0, ULONG_MAX);
mutex_lock(&access->ioas_lock);
}return;
- iopt_remove_access(&cur_ioas->iopt, access,
access->iopt_access_list_id);
- refcount_dec(&cur_ioas->obj.users);
-out:
- access->ioas_unpin = NULL;
- rc = iommufd_access_change_ioas(access, NULL);
- WARN_ON(rc);
'rc' can be removed.
Just "WARN_ON(iommufd_access_change_ioas(access, NULL));"
otherwise looks good to me,
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Jul 28, 2023 at 04:23:03AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, July 28, 2023 4:25 AM
+static int iommufd_access_change_ioas(struct iommufd_access *access,
struct iommufd_ioas *new_ioas)
+{
u32 iopt_access_list_id = access->iopt_access_list_id;
struct iommufd_ioas *cur_ioas = access->ioas;
int rc;
lockdep_assert_held(&access->ioas_lock);
/* We are racing with a concurrent detach, bail */
if (cur_ioas != access->ioas_unpin)
return -EBUSY;
if (IS_ERR(new_ioas))
return PTR_ERR(new_ioas);
iommufd_access_change_ioas_id() already checks errors.
I've thought about that: given that iommufd_access_change_ioas is a standalone API, though it's not used anywhere else at the moment, it might be safer to have this check again. Otherwise, we would need a line of comments saying that "caller must make sure that the input new_ioas is not holding an error code" or so?
void iommufd_access_destroy_object(struct iommufd_object *obj) { struct iommufd_access *access = container_of(obj, struct iommufd_access, obj);
if (access->ioas) {
iopt_remove_access(&access->ioas->iopt, access,
access->iopt_access_list_id);
refcount_dec(&access->ioas->obj.users);
access->ioas = NULL;
}
mutex_lock(&access->ioas_lock);
if (access->ioas)
WARN_ON(iommufd_access_change_ioas(access, NULL));
mutex_unlock(&access->ioas_lock); iommufd_ctx_put(access->ictx);
}
this changes the behavior of destroy. Previously it always removes the access w/o detecting race while now it will give up and throw out a warning.
You mean the -EBUSY case? That's a good catch..
While I'm fine with this change from bisec p.o.v. it might be good to split this into a separate patch.
Yea, I can do that.
void iommufd_access_detach(struct iommufd_access *access) {
struct iommufd_ioas *cur_ioas = access->ioas;
int rc; mutex_lock(&access->ioas_lock);
if (WARN_ON(!access->ioas))
goto out;
/*
* Set ioas to NULL to block any further iommufd_access_pin_pages().
* iommufd_access_unpin_pages() can continue using access-
ioas_unpin.
*/
access->ioas = NULL;
if (access->ops->unmap) {
if (WARN_ON(!access->ioas)) { mutex_unlock(&access->ioas_lock);
access->ops->unmap(access->data, 0, ULONG_MAX);
mutex_lock(&access->ioas_lock);
return; }
iopt_remove_access(&cur_ioas->iopt, access,
access->iopt_access_list_id);
refcount_dec(&cur_ioas->obj.users);
-out:
access->ioas_unpin = NULL;
rc = iommufd_access_change_ioas(access, NULL);
WARN_ON(rc);
'rc' can be removed.
Just "WARN_ON(iommufd_access_change_ioas(access, NULL));"
Will do that in v11.
otherwise looks good to me,
Reviewed-by: Kevin Tian kevin.tian@intel.com
Thanks! Nic
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, July 28, 2023 12:37 PM
On Fri, Jul 28, 2023 at 04:23:03AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, July 28, 2023 4:25 AM
+static int iommufd_access_change_ioas(struct iommufd_access *access,
struct iommufd_ioas *new_ioas)
+{
u32 iopt_access_list_id = access->iopt_access_list_id;
struct iommufd_ioas *cur_ioas = access->ioas;
int rc;
lockdep_assert_held(&access->ioas_lock);
/* We are racing with a concurrent detach, bail */
if (cur_ioas != access->ioas_unpin)
return -EBUSY;
if (IS_ERR(new_ioas))
return PTR_ERR(new_ioas);
iommufd_access_change_ioas_id() already checks errors.
I've thought about that: given that iommufd_access_change_ioas is a standalone API, though it's not used anywhere else at the moment, it might be safer to have this check again. Otherwise, we would need a line of comments saying that "caller must make sure that the input new_ioas is not holding an error code" or so?
I don't think it's a common practice for the caller to pass in an error pointer when it already knows it's an error...
On Fri, Jul 28, 2023 at 04:41:18AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, July 28, 2023 12:37 PM
On Fri, Jul 28, 2023 at 04:23:03AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, July 28, 2023 4:25 AM
+static int iommufd_access_change_ioas(struct iommufd_access *access,
struct iommufd_ioas *new_ioas)
+{
u32 iopt_access_list_id = access->iopt_access_list_id;
struct iommufd_ioas *cur_ioas = access->ioas;
int rc;
lockdep_assert_held(&access->ioas_lock);
/* We are racing with a concurrent detach, bail */
if (cur_ioas != access->ioas_unpin)
return -EBUSY;
if (IS_ERR(new_ioas))
return PTR_ERR(new_ioas);
iommufd_access_change_ioas_id() already checks errors.
I've thought about that: given that iommufd_access_change_ioas is a standalone API, though it's not used anywhere else at the moment, it might be safer to have this check again. Otherwise, we would need a line of comments saying that "caller must make sure that the input new_ioas is not holding an error code" or so?
I don't think it's a common practice for the caller to pass in an error pointer when it already knows it's an error...
OK. I will just drop it then.
Taking advantage of the new iommufd_access_change_ioas_id helper, add an iommufd_access_replace() API for VFIO emulated pathway to use.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/device.c | 15 +++++++++++++++ include/linux/iommufd.h | 1 + 2 files changed, 16 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index e79cbedd8626..7a35503b0123 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -855,6 +855,21 @@ int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id) } EXPORT_SYMBOL_NS_GPL(iommufd_access_attach, IOMMUFD);
+int iommufd_access_replace(struct iommufd_access *access, u32 ioas_id) +{ + int rc; + + mutex_lock(&access->ioas_lock); + if (!access->ioas) { + mutex_unlock(&access->ioas_lock); + return -ENOENT; + } + rc = iommufd_access_change_ioas_id(access, ioas_id); + mutex_unlock(&access->ioas_lock); + return rc; +} +EXPORT_SYMBOL_NS_GPL(iommufd_access_replace, IOMMUFD); + /** * iommufd_access_notify_unmap - Notify users of an iopt to stop using it * @iopt: iopt to work on diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 0ac60256b659..ffc3a949f837 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -49,6 +49,7 @@ iommufd_access_create(struct iommufd_ctx *ictx, const struct iommufd_access_ops *ops, void *data, u32 *id); void iommufd_access_destroy(struct iommufd_access *access); int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id); +int iommufd_access_replace(struct iommufd_access *access, u32 ioas_id); void iommufd_access_detach(struct iommufd_access *access);
void iommufd_ctx_get(struct iommufd_ctx *ictx);
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, July 28, 2023 4:25 AM
Taking advantage of the new iommufd_access_change_ioas_id helper, add an iommufd_access_replace() API for VFIO emulated pathway to use.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
Add a new IOMMU_TEST_OP_ACCESS_REPLACE_IOAS to allow replacing the access->ioas, corresponding to the iommufd_access_replace() helper.
Then add a replace coverage as a part of user_copy test case, which basically repeats the copy test after replacing the old ioas with a new one.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 4 +++ drivers/iommu/iommufd/selftest.c | 19 ++++++++++++ tools/testing/selftests/iommu/iommufd.c | 29 +++++++++++++++++-- tools/testing/selftests/iommu/iommufd_utils.h | 19 ++++++++++++ 4 files changed, 69 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index dd9168a20ddf..258de2253b61 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -18,6 +18,7 @@ enum { IOMMU_TEST_OP_ACCESS_RW, IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT, IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE, + IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, };
enum { @@ -91,6 +92,9 @@ struct iommu_test_cmd { struct { __u32 limit; } memory_limit; + struct { + __u32 ioas_id; + } access_replace_ioas; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 9d43334e4faf..bb2cd54ca7b6 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -785,6 +785,22 @@ static int iommufd_test_create_access(struct iommufd_ucmd *ucmd, return rc; }
+static int iommufd_test_access_replace_ioas(struct iommufd_ucmd *ucmd, + unsigned int access_id, + unsigned int ioas_id) +{ + struct selftest_access *staccess; + int rc; + + staccess = iommufd_access_get(access_id); + if (IS_ERR(staccess)) + return PTR_ERR(staccess); + + rc = iommufd_access_replace(staccess->access, ioas_id); + fput(staccess->file); + return rc; +} + /* Check that the pages in a page array match the pages in the user VA */ static int iommufd_test_check_pages(void __user *uptr, struct page **pages, size_t npages) @@ -1000,6 +1016,9 @@ int iommufd_test(struct iommufd_ucmd *ucmd) case IOMMU_TEST_OP_CREATE_ACCESS: return iommufd_test_create_access(ucmd, cmd->id, cmd->create_access.flags); + case IOMMU_TEST_OP_ACCESS_REPLACE_IOAS: + return iommufd_test_access_replace_ioas( + ucmd, cmd->id, cmd->access_replace_ioas.ioas_id); case IOMMU_TEST_OP_ACCESS_PAGES: return iommufd_test_access_pages( ucmd, cmd->id, cmd->access_pages.iova, diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index dc09c1de319f..8acd0af37aa5 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -1283,7 +1283,13 @@ TEST_F(iommufd_mock_domain, user_copy) .dst_iova = MOCK_APERTURE_START, .length = BUFFER_SIZE, }; - unsigned int ioas_id; + struct iommu_ioas_unmap unmap_cmd = { + .size = sizeof(unmap_cmd), + .ioas_id = self->ioas_id, + .iova = MOCK_APERTURE_START, + .length = BUFFER_SIZE, + }; + unsigned int new_ioas_id, ioas_id;
/* Pin the pages in an IOAS with no domains then copy to an IOAS with domains */ test_ioctl_ioas_alloc(&ioas_id); @@ -1301,11 +1307,30 @@ TEST_F(iommufd_mock_domain, user_copy) ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, ©_cmd)); check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
+ /* Now replace the ioas with a new one */ + test_ioctl_ioas_alloc(&new_ioas_id); + test_ioctl_ioas_map_id(new_ioas_id, buffer, BUFFER_SIZE, + ©_cmd.src_iova); + test_cmd_access_replace_ioas(access_cmd.id, new_ioas_id); + + /* Destroy the old ioas and cleanup copied mapping */ + ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_UNMAP, &unmap_cmd)); + test_ioctl_destroy(ioas_id); + + /* Then run the same test again with the new ioas */ + access_cmd.access_pages.iova = copy_cmd.src_iova; + ASSERT_EQ(0, + ioctl(self->fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES), + &access_cmd)); + copy_cmd.src_ioas_id = new_ioas_id; + ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, ©_cmd)); + check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE); + test_cmd_destroy_access_pages( access_cmd.id, access_cmd.access_pages.out_access_pages_id); test_cmd_destroy_access(access_cmd.id);
- test_ioctl_destroy(ioas_id); + test_ioctl_destroy(new_ioas_id); }
TEST_F(iommufd_mock_domain, replace) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 53b4d3f2d9fc..70353e68e599 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -119,6 +119,25 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, #define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \ ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id))
+static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, + unsigned int ioas_id) +{ + struct iommu_test_cmd cmd = { + .size = sizeof(cmd), + .op = IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, + .id = access_id, + .access_replace_ioas = { .ioas_id = ioas_id }, + }; + int ret; + + ret = ioctl(fd, IOMMU_TEST_CMD, &cmd); + if (ret) + return ret; + return 0; +} +#define test_cmd_access_replace_ioas(access_id, ioas_id) \ + ASSERT_EQ(0, _test_cmd_access_replace_ioas(self->fd, access_id, ioas_id)) + static int _test_cmd_create_access(int fd, unsigned int ioas_id, __u32 *access_id, unsigned int flags) {
Now both the physical path and the emulated path should support an IO page table replacement. Call iommufd_device_replace/iommufd_access_replace(), when vdev->iommufd_attached is true.
Also update the VFIO_DEVICE_ATTACH_IOMMUFD_PT kdoc in the uAPI header.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/vfio/iommufd.c | 11 ++++++----- include/uapi/linux/vfio.h | 6 ++++++ 2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c index 4d84904fd927..82eba6966fa5 100644 --- a/drivers/vfio/iommufd.c +++ b/drivers/vfio/iommufd.c @@ -146,9 +146,9 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id) return -EINVAL;
if (vdev->iommufd_attached) - return -EBUSY; - - rc = iommufd_device_attach(vdev->iommufd_device, pt_id); + rc = iommufd_device_replace(vdev->iommufd_device, pt_id); + else + rc = iommufd_device_attach(vdev->iommufd_device, pt_id); if (rc) return rc; vdev->iommufd_attached = true; @@ -223,8 +223,9 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id) lockdep_assert_held(&vdev->dev_set->lock);
if (vdev->iommufd_attached) - return -EBUSY; - rc = iommufd_access_attach(vdev->iommufd_access, *pt_id); + rc = iommufd_access_replace(vdev->iommufd_access, *pt_id); + else + rc = iommufd_access_attach(vdev->iommufd_access, *pt_id); if (rc) return rc; vdev->iommufd_attached = true; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index fa06e3eb4955..537157ff8670 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -939,6 +939,12 @@ struct vfio_device_bind_iommufd { * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close. This is only * allowed on cdev fds. * + * If a vfio device is currently attached to a valid hw_pagetable, without doing + * a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl + * passing in another hw_pagetable (hwpt) id is allowed. This action, also known + * as a hw_pagetable replacement, will replace the device's currently attached + * hw_pagetable with a new hw_pagetable corresponding to the given pt_id. + * * Return: 0 on success, -errno on failure. */ struct vfio_device_attach_iommufd_pt {
linux-kselftest-mirror@lists.linaro.org