PASID (Process Address Space ID) is a PCIe extension to tag the DMA transactions out of a physical device, and most modern IOMMU hardware have supported PASID granular address translation. So a PASID-capable device can be attached to multiple hwpts (a.k.a. domains), each attachment is tagged with a pasid.
This series first adds a missing iommu API to replace domain for a pasid, then adds iommufd APIs for device drivers to attach/replace/detach pasid to/from hwpt per userspace's request, and adds selftest to validate the iommufd APIs.
pasid attach/replace is mandatory on Intel VT-d given the PASID table locates in the physical address space hence must be managed by the kernel, both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD which allow configuring the PASID/CD table either in host physical address space or nested on top of an GPA address space. This series only add VT-d support as the minimal requirement.
Complete code can be found in below link:
https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
Change log:
v1: - Implemnet iommu_replace_device_pasid() to fall back to the original domain if this replacement failed (Kevin) - Add check in do_attach() to check corressponding attach_fn per the pasid value.
rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.co...
Regards, Yi Liu
Kevin Tian (1): iommufd: Support attach/replace hwpt per pasid
Lu Baolu (2): iommu: Introduce a replace API for device pasid iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (5): iommufd: replace attach_fn with a structure iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu iommufd/selftest: Add a helper to get test device iommufd/selftest: Add test ops to test pasid attach/detach iommufd/selftest: Add coverage for iommufd pasid attach/detach
drivers/iommu/intel/nested.c | 47 +++++ drivers/iommu/iommu-priv.h | 2 + drivers/iommu/iommu.c | 82 ++++++-- drivers/iommu/iommufd/Makefile | 1 + drivers/iommu/iommufd/device.c | 50 +++-- drivers/iommu/iommufd/iommufd_private.h | 23 +++ drivers/iommu/iommufd/iommufd_test.h | 24 +++ drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++ drivers/iommu/iommufd/selftest.c | 176 ++++++++++++++++-- include/linux/iommufd.h | 6 + tools/testing/selftests/iommu/iommufd.c | 172 +++++++++++++++++ .../selftests/iommu/iommufd_fail_nth.c | 28 ++- tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++ 13 files changed, 785 insertions(+), 42 deletions(-) create mode 100644 drivers/iommu/iommufd/pasid.c
From: Lu Baolu baolu.lu@linux.intel.com
Provide a high-level API to allow replacements of one domain with another for specific pasid of a device. This is similar to iommu_group_replace_domain() and it is also expected to be used only by IOMMUFD.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommu-priv.h | 2 + drivers/iommu/iommu.c | 82 +++++++++++++++++++++++++++++++------- 2 files changed, 70 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h index 2024a2313348..5c32637f6325 100644 --- a/drivers/iommu/iommu-priv.h +++ b/drivers/iommu/iommu-priv.h @@ -19,6 +19,8 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
int iommu_group_replace_domain(struct iommu_group *group, struct iommu_domain *new_domain); +int iommu_replace_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid);
int iommu_device_register_bus(struct iommu_device *iommu, const struct iommu_ops *ops, struct bus_type *bus, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 9d573e971aff..ec213ebd5ecc 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3430,6 +3430,27 @@ static void __iommu_remove_group_pasid(struct iommu_group *group, } }
+static int __iommu_group_attach_pasid(struct iommu_domain *domain, + struct iommu_group *group, ioasid_t pasid) +{ + void *curr; + int ret; + + lockdep_assert_held(&group->mutex); + + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); + if (curr) + return xa_err(curr) ? : -EBUSY; + + ret = __iommu_set_group_pasid(domain, group, pasid); + if (ret) { + __iommu_remove_group_pasid(group, pasid); + xa_erase(&group->pasid_array, pasid); + } + + return ret; +} + /* * iommu_attach_device_pasid() - Attach a domain to pasid of device * @domain: the iommu domain. @@ -3453,19 +3474,9 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, return -ENODEV;
mutex_lock(&group->mutex); - curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); - if (curr) { - ret = xa_err(curr) ? : -EBUSY; - goto out_unlock; - } - - ret = __iommu_set_group_pasid(domain, group, pasid); - if (ret) { - __iommu_remove_group_pasid(group, pasid); - xa_erase(&group->pasid_array, pasid); - } -out_unlock: + ret = __iommu_group_attach_pasid(domain, group, pasid); mutex_unlock(&group->mutex); + return ret; } EXPORT_SYMBOL_GPL(iommu_attach_device_pasid); @@ -3479,8 +3490,8 @@ EXPORT_SYMBOL_GPL(iommu_attach_device_pasid); * The @domain must have been attached to @pasid of the @dev with * iommu_attach_device_pasid(). */ -void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev, - ioasid_t pasid) +void iommu_detach_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) { /* Caller must be a probed driver on dev */ struct iommu_group *group = dev->iommu_group; @@ -3492,6 +3503,49 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev, } EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
+/** + * iommu_replace_device_pasid - replace the domain that a pasid is attached to + * @domain: new IOMMU domain to replace with + * @dev: the physical device + * @pasid: pasid that will be attached to the new domain + * + * This API allows the pasid to switch domains. Return 0 on success, or an + * error. The pasid will roll back to use the old domain if failure. The + * caller could call iommu_detach_device_pasid() before free the old domain + * in order to avoid use-after-free case. + */ +int iommu_replace_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ + struct iommu_group *group = dev->iommu_group; + struct iommu_domain *old_domain; + int ret; + + if (!domain) + return -EINVAL; + + if (!group) + return -ENODEV; + + mutex_lock(&group->mutex); + __iommu_remove_group_pasid(group, pasid); + old_domain = xa_erase(&group->pasid_array, pasid); + ret = __iommu_group_attach_pasid(domain, group, pasid); + if (ret) + goto err_rollback; + mutex_unlock(&group->mutex); + + return 0; + +err_rollback: + if (old_domain) + __iommu_group_attach_pasid(old_domain, group, pasid); + mutex_unlock(&group->mutex); + + return ret; +} +EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, IOMMUFD_INTERNAL); + /* * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev * @dev: the queried device
Most of the core logic before conducting the actual device attach/ replace operation can be shared with pasid attach/replace. Create a new structure so more information (e.g. pasid) can be later added along with the attach_fn.
Signed-off-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 35 ++++++++++++++++--------- drivers/iommu/iommufd/iommufd_private.h | 8 ++++++ 2 files changed, 30 insertions(+), 13 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 59d3a07300d9..0992d9d46af9 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -531,8 +531,11 @@ iommufd_device_do_replace(struct iommufd_device *idev, return ERR_PTR(rc); }
-typedef struct iommufd_hw_pagetable *(*attach_fn)( - struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt); +static struct iommufd_hw_pagetable *do_attach(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, struct attach_data *data) +{ + return data->attach_fn(idev, hwpt); +}
/* * When automatically managing the domains we search for a compatible domain in @@ -542,7 +545,7 @@ typedef struct iommufd_hw_pagetable *(*attach_fn)( static struct iommufd_hw_pagetable * iommufd_device_auto_get_domain(struct iommufd_device *idev, struct iommufd_ioas *ioas, u32 *pt_id, - attach_fn do_attach) + struct attach_data *data) { /* * iommufd_hw_pagetable_attach() is called by @@ -551,7 +554,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, * to use the immediate_attach path as it supports drivers that can't * directly allocate a domain. */ - bool immediate_attach = do_attach == iommufd_device_do_attach; + bool immediate_attach = data->attach_fn == iommufd_device_do_attach; struct iommufd_hw_pagetable *destroy_hwpt; struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_hw_pagetable *hwpt; @@ -569,7 +572,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, hwpt = &hwpt_paging->common; if (!iommufd_lock_obj(&hwpt->obj)) continue; - destroy_hwpt = (*do_attach)(idev, hwpt); + destroy_hwpt = do_attach(idev, hwpt, data); if (IS_ERR(destroy_hwpt)) { iommufd_put_object(&hwpt->obj); /* @@ -596,7 +599,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, hwpt = &hwpt_paging->common;
if (!immediate_attach) { - destroy_hwpt = (*do_attach)(idev, hwpt); + destroy_hwpt = do_attach(idev, hwpt, data); if (IS_ERR(destroy_hwpt)) goto out_abort; } else { @@ -617,8 +620,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, return destroy_hwpt; }
-static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, - attach_fn do_attach) +int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, + struct attach_data *data) { struct iommufd_hw_pagetable *destroy_hwpt; struct iommufd_object *pt_obj; @@ -633,7 +636,7 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, struct iommufd_hw_pagetable *hwpt = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
- destroy_hwpt = (*do_attach)(idev, hwpt); + destroy_hwpt = do_attach(idev, hwpt, data); if (IS_ERR(destroy_hwpt)) goto out_put_pt_obj; break; @@ -643,7 +646,7 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, container_of(pt_obj, struct iommufd_ioas, obj);
destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id, - do_attach); + data); if (IS_ERR(destroy_hwpt)) goto out_put_pt_obj; break; @@ -679,8 +682,11 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id) { int rc; + struct attach_data data = { + .attach_fn = &iommufd_device_do_attach, + };
- rc = iommufd_device_change_pt(idev, pt_id, &iommufd_device_do_attach); + rc = iommufd_device_change_pt(idev, pt_id, &data); if (rc) return rc;
@@ -710,8 +716,11 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD); */ int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id) { - return iommufd_device_change_pt(idev, pt_id, - &iommufd_device_do_replace); + struct attach_data data = { + .attach_fn = &iommufd_device_do_replace, + }; + + return iommufd_device_change_pt(idev, pt_id, &data); } EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 160521800d9b..24fee2c37ce8 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -363,6 +363,14 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id) void iommufd_device_destroy(struct iommufd_object *obj); int iommufd_get_hw_info(struct iommufd_ucmd *ucmd);
+struct attach_data { + union { + struct iommufd_hw_pagetable *(*attach_fn)( + struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt); + }; +}; + struct iommufd_access { struct iommufd_object obj; struct iommufd_ctx *ictx;
From: Kevin Tian kevin.tian@intel.com
This introduces three APIs for device drivers to manage pasid attach/ replace/detach.
int iommufd_device_pasid_attach(struct iommufd_device *idev, u32 pasid, u32 *pt_id); int iommufd_device_pasid_replace(struct iommufd_device *idev, u32 pasid, u32 *pt_id); void iommufd_device_pasid_detach(struct iommufd_device *idev, u32 pasid);
pasid operations have different implications when comparing to device operations:
- No connection to iommufd_group since pasid is a device capability and can be enabled only in singleton group;
- no reserved region per pasid otherwise SVA architecture is already broken (CPU address space doesn't count device reserved regions);
- accordingly no sw_msi trick;
- immediated_attach is not supported, expecting that arm-smmu driver will already remove that requirement before supporting this pasid operation. This avoids unnecessary change in iommufd_hw_pagetable_alloc() to carry the pasid from device.c.
With above differences, this puts all pasid related logics into a new pasid.c file.
Cache coherency enforcement is still applied to pasid operations since it is about memory accesses post page table walking (no matter the walk is per RID or per PASID).
Since the attach is per PASID, this introduces a pasid_hwpts xarray to track the per-pasid attach data.
Signed-off-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/Makefile | 1 + drivers/iommu/iommufd/device.c | 17 ++- drivers/iommu/iommufd/iommufd_private.h | 15 +++ drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++++++++++++ include/linux/iommufd.h | 6 ++ 5 files changed, 176 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/iommufd/pasid.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index 34b446146961..4b4d516b025c 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -6,6 +6,7 @@ iommufd-y := \ ioas.o \ main.o \ pages.o \ + pasid.o \ vfio_compat.o
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 0992d9d46af9..a7574d4d5ffa 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -136,6 +136,7 @@ void iommufd_device_destroy(struct iommufd_object *obj) struct iommufd_device *idev = container_of(obj, struct iommufd_device, obj);
+ WARN_ON(!xa_empty(&idev->pasid_hwpts)); iommu_device_release_dma_owner(idev->dev); iommufd_put_group(idev->igroup); if (!iommufd_selftest_is_mock_dev(idev->dev)) @@ -216,6 +217,8 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, /* igroup refcount moves into iommufd_device */ idev->igroup = igroup;
+ xa_init(&idev->pasid_hwpts); + /* * If the caller fails after this success it must call * iommufd_unbind_device() which is safe since we hold this refcount. @@ -534,7 +537,17 @@ iommufd_device_do_replace(struct iommufd_device *idev, static struct iommufd_hw_pagetable *do_attach(struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt, struct attach_data *data) { - return data->attach_fn(idev, hwpt); + if (data->pasid == IOMMU_PASID_INVALID) { + BUG_ON((data->attach_fn != iommufd_device_do_attach) && + (data->attach_fn != iommufd_device_do_replace)); + return data->attach_fn(idev, hwpt); + } else { + BUG_ON((data->pasid_attach_fn != + iommufd_device_pasid_do_attach) && + (data->pasid_attach_fn != + iommufd_device_pasid_do_replace)); + return data->pasid_attach_fn(idev, data->pasid, hwpt); + } }
/* @@ -684,6 +697,7 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id) int rc; struct attach_data data = { .attach_fn = &iommufd_device_do_attach, + .pasid = IOMMU_PASID_INVALID, };
rc = iommufd_device_change_pt(idev, pt_id, &data); @@ -718,6 +732,7 @@ int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id) { struct attach_data data = { .attach_fn = &iommufd_device_do_replace, + .pasid = IOMMU_PASID_INVALID, };
return iommufd_device_change_pt(idev, pt_id, &data); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 24fee2c37ce8..d37b7d0bfffe 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -349,6 +349,7 @@ struct iommufd_device { struct list_head group_item; /* always the physical device */ struct device *dev; + struct xarray pasid_hwpts; bool enforce_cache_coherency; };
@@ -368,9 +369,23 @@ struct attach_data { struct iommufd_hw_pagetable *(*attach_fn)( struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt); + struct iommufd_hw_pagetable *(*pasid_attach_fn)( + struct iommufd_device *idev, u32 pasid, + struct iommufd_hw_pagetable *hwpt); }; + u32 pasid; };
+int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, + struct attach_data *data); + +struct iommufd_hw_pagetable * +iommufd_device_pasid_do_attach(struct iommufd_device *idev, u32 pasid, + struct iommufd_hw_pagetable *hwpt); +struct iommufd_hw_pagetable * +iommufd_device_pasid_do_replace(struct iommufd_device *idev, u32 pasid, + struct iommufd_hw_pagetable *hwpt); + struct iommufd_access { struct iommufd_object obj; struct iommufd_ctx *ictx; diff --git a/drivers/iommu/iommufd/pasid.c b/drivers/iommu/iommufd/pasid.c new file mode 100644 index 000000000000..75499a1d92a1 --- /dev/null +++ b/drivers/iommu/iommufd/pasid.c @@ -0,0 +1,138 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2023, Intel Corporation + */ +#include <linux/iommufd.h> +#include <linux/iommu.h> +#include "../iommu-priv.h" + +#include "iommufd_private.h" + +static int __iommufd_device_pasid_do_attach(struct iommufd_device *idev, + u32 pasid, + struct iommufd_hw_pagetable *hwpt, + bool replace) +{ + int rc; + + if (!replace) + rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid); + else + rc = iommu_replace_device_pasid(hwpt->domain, idev->dev, pasid); + if (rc) + return rc; + + refcount_inc(&hwpt->obj.users); + xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL); + return 0; +} + +struct iommufd_hw_pagetable * +iommufd_device_pasid_do_attach(struct iommufd_device *idev, u32 pasid, + struct iommufd_hw_pagetable *hwpt) +{ + struct iommufd_hw_pagetable *old_hwpt; + int rc; + + old_hwpt = xa_load(&idev->pasid_hwpts, pasid); + if (old_hwpt) { + /* Attach does not allow overwrite */ + if (old_hwpt == hwpt) + return NULL; + else + return ERR_PTR(-EINVAL); + } + + rc = __iommufd_device_pasid_do_attach(idev, pasid, hwpt, false); + return rc ? ERR_PTR(rc) : NULL; +} + +struct iommufd_hw_pagetable * +iommufd_device_pasid_do_replace(struct iommufd_device *idev, u32 pasid, + struct iommufd_hw_pagetable *hwpt) +{ + struct iommufd_hw_pagetable *old_hwpt; + int rc; + + old_hwpt = xa_load(&idev->pasid_hwpts, pasid); + if (!old_hwpt) + return ERR_PTR(-EINVAL); + + if (hwpt == old_hwpt) + return NULL; + + rc = __iommufd_device_pasid_do_attach(idev, pasid, hwpt, true); + /* Caller must destroy old_hwpt */ + return rc ? ERR_PTR(rc) : old_hwpt; +} + +/** + * iommufd_device_pasid_attach - Connect a {device, pasid} to an iommu_domain + * @idev: device to attach + * @pasid: pasid to attach + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE + * Output the IOMMUFD_OBJ_HW_PAGETABLE ID + * + * This connects a pasid of the device to an iommu_domain. Once this + * completes the device could do DMA with the pasid. + * + * This function is undone by calling iommufd_device_detach_pasid(). + */ +int iommufd_device_pasid_attach(struct iommufd_device *idev, + u32 pasid, u32 *pt_id) +{ + struct attach_data data = { + .pasid_attach_fn = &iommufd_device_pasid_do_attach, + .pasid = pasid, + }; + + return iommufd_device_change_pt(idev, pt_id, &data); +} +EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_attach, IOMMUFD); + +/** + * iommufd_device_pasid_replace- Change the {device, pasid}'s iommu_domain + * @idev: device to change + * @pasid: pasid to change + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE + * Output the IOMMUFD_OBJ_HW_PAGETABLE ID + * + * This is the same as + * iommufd_device_pasid_detach(); + * iommufd_device_pasid_attach(); + * + * If it fails then no change is made to the attachment. The iommu driver may + * implement this so there is no disruption in translation. This can only be + * called if iommufd_device_pasid_attach() has already succeeded. + */ +int iommufd_device_pasid_replace(struct iommufd_device *idev, + u32 pasid, u32 *pt_id) +{ + struct attach_data data = { + .pasid_attach_fn = &iommufd_device_pasid_do_replace, + .pasid = pasid, + }; + + return iommufd_device_change_pt(idev, pt_id, &data); +} +EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_replace, IOMMUFD); + +/** + * iommufd_device_pasid_detach - Disconnect a {device, pasid} to an iommu_domain + * @idev: device to detach + * @pasid: pasid to detach + * + * Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid from + * the previously attached pt_id. + */ +void iommufd_device_pasid_detach(struct iommufd_device *idev, u32 pasid) +{ + struct iommufd_hw_pagetable *hwpt; + + hwpt = xa_load(&idev->pasid_hwpts, pasid); + if (!hwpt) + return; + xa_erase(&idev->pasid_hwpts, pasid); + iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid); + iommufd_hw_pagetable_put(idev->ictx, hwpt); +} +EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_detach, IOMMUFD); diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index ffc3a949f837..0b007c376306 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -26,6 +26,12 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id); int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id); void iommufd_device_detach(struct iommufd_device *idev);
+int iommufd_device_pasid_attach(struct iommufd_device *idev, + u32 pasid, u32 *pt_id); +int iommufd_device_pasid_replace(struct iommufd_device *idev, + u32 pasid, u32 *pt_id); +void iommufd_device_pasid_detach(struct iommufd_device *idev, u32 pasid); + struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev); u32 iommufd_device_to_id(struct iommufd_device *idev);
The two callbacks are needed to make pasid_attach/detach path complete for mock device. A nop is enough for set_dev_pasid, a domain type check in the remove_dev_pasid is also helpful.
Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/selftest.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index f2d4599f701b..221a206bf38f 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -440,6 +440,31 @@ static struct iommu_device *mock_probe_device(struct device *dev) return &mock_iommu_device; }
+static void mock_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) +{ + struct iommu_domain *domain; + + /* Domain type specific cleanup: */ + domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0); + if (domain) { + switch (domain->type) { + case IOMMU_DOMAIN_NESTED: + case IOMMU_DOMAIN_UNMANAGED: + break; + default: + /* should never reach here */ + WARN_ON(1); + break; + } + } +} + +static int mock_domain_set_dev_pasid_nop(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ + return 0; +} + static const struct iommu_ops mock_ops = { /* * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type() @@ -455,6 +480,7 @@ static const struct iommu_ops mock_ops = { .capable = mock_domain_capable, .device_group = generic_device_group, .probe_device = mock_probe_device, + .remove_dev_pasid = mock_iommu_remove_dev_pasid, .default_domain_ops = &(struct iommu_domain_ops){ .free = mock_domain_free, @@ -462,6 +488,7 @@ static const struct iommu_ops mock_ops = { .map_pages = mock_domain_map_pages, .unmap_pages = mock_domain_unmap_pages, .iova_to_phys = mock_domain_iova_to_phys, + .set_dev_pasid = mock_domain_set_dev_pasid_nop, }, };
@@ -519,6 +546,7 @@ static struct iommu_domain_ops domain_nested_ops = { .free = mock_domain_free_nested, .attach_dev = mock_domain_nop_attach, .cache_invalidate_user = mock_domain_cache_invalidate_user, + .set_dev_pasid = mock_domain_set_dev_pasid_nop, };
static inline struct iommufd_hw_pagetable *
There is need to get the selftest device (sobj->type == TYPE_IDEV) in multiple places, so have a helper to for it.
Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/selftest.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 221a206bf38f..9b9fd3f50264 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -720,29 +720,39 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd, return rc; }
-/* Replace the mock domain with a manually allocated hw_pagetable */ -static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd, - unsigned int device_id, u32 pt_id, - struct iommu_test_cmd *cmd) +static struct selftest_obj * +iommufd_test_get_self_test_device(struct iommufd_ctx *ictx, u32 id) { struct iommufd_object *dev_obj; struct selftest_obj *sobj; - int rc;
/* * Prefer to use the OBJ_SELFTEST because the destroy_rwsem will ensure * it doesn't race with detach, which is not allowed. */ - dev_obj = - iommufd_get_object(ucmd->ictx, device_id, IOMMUFD_OBJ_SELFTEST); + dev_obj = iommufd_get_object(ictx, id, IOMMUFD_OBJ_SELFTEST); if (IS_ERR(dev_obj)) - return PTR_ERR(dev_obj); + return (struct selftest_obj *)dev_obj;
sobj = container_of(dev_obj, struct selftest_obj, obj); if (sobj->type != TYPE_IDEV) { - rc = -EINVAL; - goto out_dev_obj; + iommufd_put_object(dev_obj); + return ERR_PTR(-EINVAL); } + return sobj; +} + +/* Replace the mock domain with a manually allocated hw_pagetable */ +static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd, + unsigned int device_id, u32 pt_id, + struct iommu_test_cmd *cmd) +{ + struct selftest_obj *sobj; + int rc; + + sobj = iommufd_test_get_self_test_device(ucmd->ictx, device_id); + if (IS_ERR(sobj)) + return PTR_ERR(sobj);
rc = iommufd_device_replace(sobj->idev.idev, &pt_id); if (rc) @@ -752,7 +762,7 @@ static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd, rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
out_dev_obj: - iommufd_put_object(dev_obj); + iommufd_put_object(&sobj->obj); return rc; }
This adds 4 test ops for pasid attach/replace/detach testing. There are ops to attach/detach pasid, and also op to check the attached domain of a pasid.
Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/iommufd_test.h | 24 ++++++ drivers/iommu/iommufd/selftest.c | 116 +++++++++++++++++++++++++++ 2 files changed, 140 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 35d6207a96dd..67c2a8508b92 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -22,6 +22,10 @@ enum { IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS, IOMMU_TEST_OP_DIRTY, IOMMU_TEST_OP_MD_CHECK_IOTLB, + IOMMU_TEST_OP_PASID_ATTACH, + IOMMU_TEST_OP_PASID_REPLACE, + IOMMU_TEST_OP_PASID_DETACH, + IOMMU_TEST_OP_PASID_CHECK_DOMAIN, };
enum { @@ -126,6 +130,26 @@ struct iommu_test_cmd { __u32 id; __u32 iotlb; } check_iotlb; + struct { + __u32 pasid; + __u32 pt_id; + /* @id is stdev_id for IOMMU_TEST_OP_PASID_ATTACH */ + } pasid_attach; + struct { + __u32 pasid; + __u32 pt_id; + /* @id is stdev_id for IOMMU_TEST_OP_PASID_ATTACH */ + } pasid_replace; + struct { + __u32 pasid; + /* @id is stdev_id for IOMMU_TEST_OP_PASID_DETACH */ + } pasid_detach; + struct { + __u32 pasid; + __u32 hwpt_id; + __u64 out_result_ptr; + /* @id is stdev_id for IOMMU_TEST_OP_HWPT_GET_DOMAIN */ + } pasid_check; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 9b9fd3f50264..a3139ad534a1 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -1340,6 +1340,114 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id, return rc; }
+static int iommufd_test_pasid_attach(struct iommufd_ucmd *ucmd, + struct iommu_test_cmd *cmd) +{ + struct selftest_obj *sobj; + int rc; + + sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id); + if (IS_ERR(sobj)) + return PTR_ERR(sobj); + + rc = iommufd_device_pasid_attach(sobj->idev.idev, + cmd->pasid_attach.pasid, + &cmd->pasid_attach.pt_id); + iommufd_put_object(&sobj->obj); + return rc; +} + +static int iommufd_test_pasid_replace(struct iommufd_ucmd *ucmd, + struct iommu_test_cmd *cmd) +{ + struct selftest_obj *sobj; + int rc; + + sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id); + if (IS_ERR(sobj)) + return PTR_ERR(sobj); + + rc = iommufd_device_pasid_replace(sobj->idev.idev, + cmd->pasid_attach.pasid, + &cmd->pasid_attach.pt_id); + iommufd_put_object(&sobj->obj); + return rc; +} + +static int iommufd_test_pasid_detach(struct iommufd_ucmd *ucmd, + struct iommu_test_cmd *cmd) +{ + struct selftest_obj *sobj; + + sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id); + if (IS_ERR(sobj)) + return PTR_ERR(sobj); + + iommufd_device_pasid_detach(sobj->idev.idev, + cmd->pasid_detach.pasid); + iommufd_put_object(&sobj->obj); + return 0; +} + +static inline struct iommufd_hw_pagetable * +iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id) +{ + struct iommufd_object *pt_obj; + + pt_obj = iommufd_get_object(ucmd->ictx, id, IOMMUFD_OBJ_ANY); + if (IS_ERR(pt_obj)) + return ERR_CAST(pt_obj); + + if (pt_obj->type != IOMMUFD_OBJ_HWPT_NESTED && + pt_obj->type != IOMMUFD_OBJ_HWPT_PAGING) { + iommufd_put_object(pt_obj); + return ERR_PTR(-EINVAL); + } + + return container_of(pt_obj, struct iommufd_hw_pagetable, obj); +} + +static int iommufd_test_pasid_check_domain(struct iommufd_ucmd *ucmd, + struct iommu_test_cmd *cmd) +{ + struct iommu_domain *attached_domain, *expect_domain = NULL; + struct iommufd_hw_pagetable *hwpt = NULL; + struct selftest_obj *sobj; + struct mock_dev *mdev; + bool result; + int rc = 0; + + sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id); + if (IS_ERR(sobj)) + return PTR_ERR(sobj); + + mdev = sobj->idev.mock_dev; + + attached_domain = iommu_get_domain_for_dev_pasid(&mdev->dev, + cmd->pasid_check.pasid, 0); + if (IS_ERR(attached_domain)) + attached_domain = NULL; + + if (cmd->pasid_check.hwpt_id) { + hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id); + if (IS_ERR(hwpt)) { + rc = PTR_ERR(hwpt); + goto out_put_dev; + } + expect_domain = hwpt->domain; + } + + result = (attached_domain == expect_domain) ? 1 : 0; + if (copy_to_user(u64_to_user_ptr(cmd->pasid_check.out_result_ptr), + &result, sizeof(result))) + rc = -EFAULT; + if (hwpt) + iommufd_put_object(&hwpt->obj); +out_put_dev: + iommufd_put_object(&sobj->obj); + return rc; +} + void iommufd_selftest_destroy(struct iommufd_object *obj) { struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj); @@ -1415,6 +1523,14 @@ int iommufd_test(struct iommufd_ucmd *ucmd) cmd->dirty.page_size, u64_to_user_ptr(cmd->dirty.uptr), cmd->dirty.flags); + case IOMMU_TEST_OP_PASID_ATTACH: + return iommufd_test_pasid_attach(ucmd, cmd); + case IOMMU_TEST_OP_PASID_REPLACE: + return iommufd_test_pasid_replace(ucmd, cmd); + case IOMMU_TEST_OP_PASID_DETACH: + return iommufd_test_pasid_detach(ucmd, cmd); + case IOMMU_TEST_OP_PASID_CHECK_DOMAIN: + return iommufd_test_pasid_check_domain(ucmd, cmd); default: return -EOPNOTSUPP; }
This tests iommufd pasid attach/replace/detach.
Signed-off-by: Yi Liu yi.l.liu@intel.com --- tools/testing/selftests/iommu/iommufd.c | 172 ++++++++++++++++++ .../selftests/iommu/iommufd_fail_nth.c | 28 ++- tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++ 3 files changed, 274 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 2781d5bc6309..4dfd2d33c0a0 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -2221,4 +2221,176 @@ TEST_F(vfio_compat_mock_domain, huge_map) } }
+FIXTURE(iommufd_device_pasid) +{ + int fd; + uint32_t ioas_id; + uint32_t hwpt_id; + uint32_t stdev_id; + uint32_t device_id; +}; + +FIXTURE_SETUP(iommufd_device_pasid) +{ + self->fd = open("/dev/iommu", O_RDWR); + ASSERT_NE(-1, self->fd); + test_ioctl_ioas_alloc(&self->ioas_id); + + test_cmd_mock_domain(self->ioas_id, &self->stdev_id, + &self->hwpt_id, &self->device_id); +} + +FIXTURE_TEARDOWN(iommufd_device_pasid) +{ + teardown_iommufd(self->fd, _metadata); +} + +TEST_F(iommufd_device_pasid, pasid_attach) +{ + if (self->device_id) { + struct iommu_hwpt_selftest data = { + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, + }; + uint32_t nested_hwpt_id[2] = {}; + uint32_t parent_hwpt_id = 0; + uint32_t pasid = 100; + bool result; + + /* Allocate two nested hwpts sharing one common parent hwpt */ + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + &parent_hwpt_id); + + test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0, + &nested_hwpt_id[0], + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0, + &nested_hwpt_id[1], + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + + /* + * Attach ioas to pasid 100, should succeed, domain should + * be valid. + */ + test_cmd_pasid_attach(pasid, self->ioas_id); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, self->hwpt_id, + &result)); + EXPECT_EQ(1, result); + + /* + * Try attach pasid 100 with self->ioas_id, should succeed + * as it is the same with existing hwpt. + */ + test_cmd_pasid_attach(pasid, self->ioas_id); + + /* + * Try attach pasid 100 with another hwpt, should FAIL + * as attach does not allow overwrite, use REPLACE instead. + */ + test_err_cmd_pasid_attach(EINVAL, pasid, nested_hwpt_id[0]); + + /* + * Detach hwpt from pasid 100, and check if the pasid 100 + * has null domain. Should be done before the next attach. + */ + test_cmd_pasid_detach(pasid); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, 0, &result)); + EXPECT_EQ(1, result); + + /* + * Attach nested hwpt to pasid 100, should succeed, domain + * should be valid. + */ + test_cmd_pasid_attach(pasid, nested_hwpt_id[0]); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, nested_hwpt_id[0], + &result)); + EXPECT_EQ(1, result); + + /* + * Detach hwpt from pasid 100, and check if the pasid 100 + * has null domain + */ + test_cmd_pasid_detach(pasid); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, 0, &result)); + EXPECT_EQ(1, result); + + /* Replace tests */ + pasid = 200; + + /* + * Replace pasid 200 without attaching it first, should + * fail with -EINVAL. + */ + test_err_cmd_pasid_replace(EINVAL, pasid, parent_hwpt_id); + + /* + * Attach a s2 hwpt to pasid 200, should succeed, domain should + * be valid. + */ + test_cmd_pasid_attach(pasid, parent_hwpt_id); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, parent_hwpt_id, + &result)); + EXPECT_EQ(1, result); + + /* + * Replace pasid 200 with self->ioas_id, should succeed, + * and have valid domain. + */ + test_cmd_pasid_replace(pasid, self->ioas_id); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, self->hwpt_id, + &result)); + EXPECT_EQ(1, result); + + /* + * Replace a nested hwpt for pasid 200, should succeed, + * and have valid domain. + */ + test_cmd_pasid_replace(pasid, nested_hwpt_id[0]); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, nested_hwpt_id[0], + &result)); + EXPECT_EQ(1, result); + + /* + * Replace with another nested hwpt for pasid 200, should + * succeed, and have valid domain. + */ + test_cmd_pasid_replace(pasid, nested_hwpt_id[1]); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, nested_hwpt_id[1], + &result)); + EXPECT_EQ(1, result); + + /* + * Detach hwpt from pasid 200, and check if the pasid 200 + * has null domain. + */ + test_cmd_pasid_detach(pasid); + ASSERT_EQ(0, + test_cmd_pasid_check_domain(self->fd, self->stdev_id, + pasid, 0, &result)); + EXPECT_EQ(1, result); + + test_ioctl_destroy(nested_hwpt_id[0]); + test_ioctl_destroy(nested_hwpt_id[1]); + test_ioctl_destroy(parent_hwpt_id); + } +} + TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index f590417cd67a..6d1b03e73b9d 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -206,12 +206,16 @@ FIXTURE(basic_fail_nth) { int fd; uint32_t access_id; + uint32_t stdev_id; + uint32_t pasid; };
FIXTURE_SETUP(basic_fail_nth) { self->fd = -1; self->access_id = 0; + self->stdev_id = 0; + self->pasid = 0; //test should use a non-zero value }
FIXTURE_TEARDOWN(basic_fail_nth) @@ -223,6 +227,8 @@ FIXTURE_TEARDOWN(basic_fail_nth) rc = _test_cmd_destroy_access(self->access_id); assert(rc == 0); } + if (self->pasid && self->stdev_id) + _test_cmd_pasid_detach(self->fd, self->stdev_id, self->pasid); teardown_iommufd(self->fd, _metadata); }
@@ -579,7 +585,6 @@ TEST_FAIL_NTH(basic_fail_nth, device) struct iommu_test_hw_info info; uint32_t ioas_id; uint32_t ioas_id2; - uint32_t stdev_id; uint32_t idev_id; uint32_t hwpt_id; __u64 iova; @@ -608,7 +613,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
fail_nth_enable();
- if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL, + if (_test_cmd_mock_domain(self->fd, ioas_id, &self->stdev_id, NULL, &idev_id)) return -1;
@@ -619,11 +624,26 @@ TEST_FAIL_NTH(basic_fail_nth, device) IOMMU_HWPT_DATA_NONE, 0, 0)) return -1;
- if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL)) + if (_test_cmd_mock_domain_replace(self->fd, self->stdev_id, ioas_id2, NULL)) + return -1; + + if (_test_cmd_mock_domain_replace(self->fd, self->stdev_id, hwpt_id, NULL)) return -1;
- if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL)) + self->pasid = 200; + + /* Tests for pasid attach/replace/detach */ + if (_test_cmd_pasid_attach(self->fd, self->stdev_id, self->pasid, ioas_id)) return -1; + + if (_test_cmd_pasid_replace(self->fd, self->stdev_id, self->pasid, ioas_id2)) + return -1; + + if (_test_cmd_pasid_detach(self->fd, self->stdev_id, self->pasid)) + return -1; + + self->pasid = 0; + return 0; }
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index d4a169a2278e..890d622cde86 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -682,3 +682,81 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
#define test_cmd_get_hw_capabilities(device_id, caps, mask) \ ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps)) + +static int _test_cmd_pasid_attach(int fd, __u32 stdev_id, __u32 pasid, __u32 pt_id) +{ + struct iommu_test_cmd test_attach = { + .size = sizeof(test_attach), + .op = IOMMU_TEST_OP_PASID_ATTACH, + .id = stdev_id, + .pasid_attach = { + .pasid = pasid, + .pt_id = pt_id, + }, + }; + + return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_ATTACH), &test_attach); +} + +#define test_cmd_pasid_attach(pasid, hwpt_id) \ + ASSERT_EQ(0, _test_cmd_pasid_attach(self->fd, self->stdev_id, pasid, hwpt_id)) + +#define test_err_cmd_pasid_attach(_errno, pasid, hwpt_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_pasid_attach(self->fd, self->stdev_id, pasid, hwpt_id)) + +static int _test_cmd_pasid_replace(int fd, __u32 stdev_id, __u32 pasid, __u32 pt_id) +{ + struct iommu_test_cmd test_replace = { + .size = sizeof(test_replace), + .op = IOMMU_TEST_OP_PASID_REPLACE, + .id = stdev_id, + .pasid_replace = { + .pasid = pasid, + .pt_id = pt_id, + }, + }; + + return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_REPLACE), &test_replace); +} + +#define test_cmd_pasid_replace(pasid, hwpt_id) \ + ASSERT_EQ(0, _test_cmd_pasid_replace(self->fd, self->stdev_id, pasid, hwpt_id)) + +#define test_err_cmd_pasid_replace(_errno, pasid, hwpt_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_pasid_replace(self->fd, self->stdev_id, pasid, hwpt_id)) + +static int _test_cmd_pasid_detach(int fd, __u32 stdev_id, __u32 pasid) +{ + struct iommu_test_cmd test_detach = { + .size = sizeof(test_detach), + .op = IOMMU_TEST_OP_PASID_DETACH, + .id = stdev_id, + .pasid_detach = { + .pasid = pasid, + }, + }; + + return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_DETACH), &test_detach); +} + +#define test_cmd_pasid_detach(pasid) \ + ASSERT_EQ(0, _test_cmd_pasid_detach(self->fd, self->stdev_id, pasid)) + +static int test_cmd_pasid_check_domain(int fd, __u32 stdev_id, __u32 pasid, + __u32 hwpt_id, bool *result) +{ + struct iommu_test_cmd test_pasid_check = { + .size = sizeof(test_pasid_check), + .op = IOMMU_TEST_OP_PASID_CHECK_DOMAIN, + .id = stdev_id, + .pasid_check = { + .pasid = pasid, + .hwpt_id = hwpt_id, + .out_result_ptr = (__u64)result, + }, + }; + + return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_CHECK_DOMAIN), &test_pasid_check); +}
From: Lu Baolu baolu.lu@linux.intel.com
This allows the upper layers to set a nested type domain to a PASID of a device if the PASID feature is supported by the IOMMU hardware.
The set_dev_pasid callback for non-nest domain has already be there, so this only needs to add it for nested domains.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index 44ad48db7ea0..f6f687750104 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -68,6 +68,52 @@ static int intel_nested_attach_dev(struct iommu_domain *domain, return 0; }
+static int intel_nested_set_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ + struct device_domain_info *info = dev_iommu_priv_get(dev); + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct intel_iommu *iommu = info->iommu; + struct dev_pasid_info *dev_pasid; + unsigned long flags; + int ret = 0; + + if (!pasid_supported(iommu)) + return -EOPNOTSUPP; + + if (iommu->agaw < dmar_domain->s2_domain->agaw) + return -EINVAL; + + ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev); + if (ret) + return ret; + + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL); + if (!dev_pasid) + return -ENOMEM; + + ret = domain_attach_iommu(dmar_domain, iommu); + if (ret) + goto err_free; + + ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain); + if (ret) + goto err_detach_iommu; + + dev_pasid->dev = dev; + dev_pasid->pasid = pasid; + spin_lock_irqsave(&dmar_domain->lock, flags); + list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); + spin_unlock_irqrestore(&dmar_domain->lock, flags); + + return 0; +err_detach_iommu: + domain_detach_iommu(dmar_domain, iommu); +err_free: + kfree(dev_pasid); + return ret; +} + static void intel_nested_domain_free(struct iommu_domain *domain) { kfree(to_dmar_domain(domain)); @@ -128,6 +174,7 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
static const struct iommu_domain_ops intel_nested_domain_ops = { .attach_dev = intel_nested_attach_dev, + .set_dev_pasid = intel_nested_set_dev_pasid, .free = intel_nested_domain_free, .cache_invalidate_user = intel_nested_cache_invalidate_user, };
On 11/27/2023 2:34 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
This allows the upper layers to set a nested type domain to a PASID of a device if the PASID feature is supported by the IOMMU hardware.
The set_dev_pasid callback for non-nest domain has already be there, so this only needs to add it for nested domains.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index 44ad48db7ea0..f6f687750104 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -68,6 +68,52 @@ static int intel_nested_attach_dev(struct iommu_domain *domain, return 0; } +static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
+{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct intel_iommu *iommu = info->iommu;
- struct dev_pasid_info *dev_pasid;
- unsigned long flags;
- int ret = 0;
- if (!pasid_supported(iommu))
return -EOPNOTSUPP;
- if (iommu->agaw < dmar_domain->s2_domain->agaw)
return -EINVAL;
- ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
- if (ret)
return ret;
- dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
- if (!dev_pasid)
return -ENOMEM;
- ret = domain_attach_iommu(dmar_domain, iommu);
- if (ret)
goto err_free;
- ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
- if (ret)
goto err_detach_iommu;
- dev_pasid->dev = dev;
- dev_pasid->pasid = pasid;
- spin_lock_irqsave(&dmar_domain->lock, flags);
- list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
- spin_unlock_irqrestore(&dmar_domain->lock, flags);
---> list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
dev_pasid is linked at later time, this leads to domain->has_iotlb_device is not correctly set, which finally results into a missing of device iotlb flush in iommu_flush_dev_iotlb()when it's called. Check this call path: domain_attach_iommu()->domain_update_iommu_cap()->domain_update_iotlb()->domain->has_iotlb_device = has_iotlb_device; The ugly fixup is to call domain_update_iommu_cap() or domain_update_iotlb() here again before return. The similar issue is in intel_iommu_set_dev_pasid() and intel_nested_attach_dev().
- return 0;
+err_detach_iommu:
- domain_detach_iommu(dmar_domain, iommu);
+err_free:
- kfree(dev_pasid);
- return ret;
+}
- static void intel_nested_domain_free(struct iommu_domain *domain) { kfree(to_dmar_domain(domain));
@@ -128,6 +174,7 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain, static const struct iommu_domain_ops intel_nested_domain_ops = { .attach_dev = intel_nested_attach_dev,
- .set_dev_pasid = intel_nested_set_dev_pasid, .free = intel_nested_domain_free, .cache_invalidate_user = intel_nested_cache_invalidate_user, };
On 2023/12/14 10:55, Yang, Weijiang wrote:
On 11/27/2023 2:34 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
This allows the upper layers to set a nested type domain to a PASID of a device if the PASID feature is supported by the IOMMU hardware.
The set_dev_pasid callback for non-nest domain has already be there, so this only needs to add it for nested domains.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index 44ad48db7ea0..f6f687750104 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -68,6 +68,52 @@ static int intel_nested_attach_dev(struct iommu_domain *domain, return 0; } +static int intel_nested_set_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ + struct device_domain_info *info = dev_iommu_priv_get(dev); + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct intel_iommu *iommu = info->iommu; + struct dev_pasid_info *dev_pasid; + unsigned long flags; + int ret = 0;
+ if (!pasid_supported(iommu)) + return -EOPNOTSUPP;
+ if (iommu->agaw < dmar_domain->s2_domain->agaw) + return -EINVAL;
+ ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev); + if (ret) + return ret;
+ dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL); + if (!dev_pasid) + return -ENOMEM;
+ ret = domain_attach_iommu(dmar_domain, iommu); + if (ret) + goto err_free;
+ ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain); + if (ret) + goto err_detach_iommu;
+ dev_pasid->dev = dev; + dev_pasid->pasid = pasid; + spin_lock_irqsave(&dmar_domain->lock, flags); + list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); + spin_unlock_irqrestore(&dmar_domain->lock, flags);
---> list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
dev_pasid is linked at later time, this leads to domain->has_iotlb_device is not correctly set, which finally results into a missing of device iotlb flush in iommu_flush_dev_iotlb()when it's called. Check this call path: domain_attach_iommu()->domain_update_iommu_cap()->domain_update_iotlb()->domain->has_iotlb_device = has_iotlb_device; The ugly fixup is to call domain_update_iommu_cap() or domain_update_iotlb() here again before return. The similar issue is in intel_iommu_set_dev_pasid() and intel_nested_attach_dev().
Yes, domain->has_iotlb_device must be updated whenever a domain is attached to (or removed from) a RID or PASID. I would be grateful if you could post some patches to fix the set_device_pasid and nested_attach_dev paths.
I assume Yi can fix this series in the next version.
Best regards, baolu
On 12/14/2023 9:33 PM, Baolu Lu wrote:
On 2023/12/14 10:55, Yang, Weijiang wrote:
On 11/27/2023 2:34 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
This allows the upper layers to set a nested type domain to a PASID of a device if the PASID feature is supported by the IOMMU hardware.
The set_dev_pasid callback for non-nest domain has already be there, so this only needs to add it for nested domains.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index 44ad48db7ea0..f6f687750104 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -68,6 +68,52 @@ static int intel_nested_attach_dev(struct iommu_domain *domain, return 0; } +static int intel_nested_set_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ + struct device_domain_info *info = dev_iommu_priv_get(dev); + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct intel_iommu *iommu = info->iommu; + struct dev_pasid_info *dev_pasid; + unsigned long flags; + int ret = 0;
+ if (!pasid_supported(iommu)) + return -EOPNOTSUPP;
+ if (iommu->agaw < dmar_domain->s2_domain->agaw) + return -EINVAL;
+ ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev); + if (ret) + return ret;
+ dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL); + if (!dev_pasid) + return -ENOMEM;
+ ret = domain_attach_iommu(dmar_domain, iommu); + if (ret) + goto err_free;
+ ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain); + if (ret) + goto err_detach_iommu;
+ dev_pasid->dev = dev; + dev_pasid->pasid = pasid; + spin_lock_irqsave(&dmar_domain->lock, flags); + list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); + spin_unlock_irqrestore(&dmar_domain->lock, flags);
---> list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
dev_pasid is linked at later time, this leads to domain->has_iotlb_device is not correctly set, which finally results into a missing of device iotlb flush in iommu_flush_dev_iotlb()when it's called. Check this call path: domain_attach_iommu()->domain_update_iommu_cap()->domain_update_iotlb()->domain->has_iotlb_device = has_iotlb_device; The ugly fixup is to call domain_update_iommu_cap() or domain_update_iotlb() here again before return. The similar issue is in intel_iommu_set_dev_pasid() and intel_nested_attach_dev().
Yes, domain->has_iotlb_device must be updated whenever a domain is attached to (or removed from) a RID or PASID. I would be grateful if you could post some patches to fix the set_device_pasid and nested_attach_dev paths.
Sure, I'll post fixup patches for the paths, thanks for confirmation!
I assume Yi can fix this series in the next version.
Best regards, baolu
On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
+int iommu_replace_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
+{
- struct iommu_group *group = dev->iommu_group;
- struct iommu_domain *old_domain;
- int ret;
- if (!domain)
return -EINVAL;
- if (!group)
return -ENODEV;
- mutex_lock(&group->mutex);
- __iommu_remove_group_pasid(group, pasid);
It is not replace if you do remove first.
Replace must just call set_dev_pasid and nothing much else..
Jason
On Sun, Nov 26, 2023 at 10:34:28PM -0800, Yi Liu wrote:
+static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
+{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct intel_iommu *iommu = info->iommu;
- struct dev_pasid_info *dev_pasid;
- unsigned long flags;
- int ret = 0;
- if (!pasid_supported(iommu))
return -EOPNOTSUPP;
- if (iommu->agaw < dmar_domain->s2_domain->agaw)
return -EINVAL;
- ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
- if (ret)
return ret;
- dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
- if (!dev_pasid)
return -ENOMEM;
- ret = domain_attach_iommu(dmar_domain, iommu);
- if (ret)
goto err_free;
- ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
- if (ret)
goto err_detach_iommu;
- dev_pasid->dev = dev;
- dev_pasid->pasid = pasid;
- spin_lock_irqsave(&dmar_domain->lock, flags);
- list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
- spin_unlock_irqrestore(&dmar_domain->lock, flags);
- return 0;
+err_detach_iommu:
- domain_detach_iommu(dmar_domain, iommu);
+err_free:
- kfree(dev_pasid);
- return ret;
+}
This seems alot longer than I'd think it should be, why isn't it exactly the same code as the other set_dev_pasid's?
Jason
On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
@@ -534,7 +537,17 @@ iommufd_device_do_replace(struct iommufd_device *idev, static struct iommufd_hw_pagetable *do_attach(struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt, struct attach_data *data) {
- return data->attach_fn(idev, hwpt);
- if (data->pasid == IOMMU_PASID_INVALID) {
BUG_ON((data->attach_fn != iommufd_device_do_attach) &&
(data->attach_fn != iommufd_device_do_replace));
return data->attach_fn(idev, hwpt);
- } else {
BUG_ON((data->pasid_attach_fn !=
iommufd_device_pasid_do_attach) &&
(data->pasid_attach_fn !=
iommufd_device_pasid_do_replace));
return data->pasid_attach_fn(idev, data->pasid, hwpt);
- }
Seems like the BUG_ON's are pointless
+/**
- iommufd_device_pasid_detach - Disconnect a {device, pasid} to an iommu_domain
- @idev: device to detach
- @pasid: pasid to detach
- Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid from
- the previously attached pt_id.
- */
+void iommufd_device_pasid_detach(struct iommufd_device *idev, u32 pasid) +{
- struct iommufd_hw_pagetable *hwpt;
- hwpt = xa_load(&idev->pasid_hwpts, pasid);
- if (!hwpt)
return;
- xa_erase(&idev->pasid_hwpts, pasid);
- iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
- iommufd_hw_pagetable_put(idev->ictx, hwpt);
+}
None of this xarray stuff looks locked properly
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, January 16, 2024 1:25 AM
On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
+/**
- iommufd_device_pasid_detach - Disconnect a {device, pasid} to an
iommu_domain
- @idev: device to detach
- @pasid: pasid to detach
- Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid
from
- the previously attached pt_id.
- */
+void iommufd_device_pasid_detach(struct iommufd_device *idev, u32
pasid)
+{
- struct iommufd_hw_pagetable *hwpt;
- hwpt = xa_load(&idev->pasid_hwpts, pasid);
- if (!hwpt)
return;
- xa_erase(&idev->pasid_hwpts, pasid);
- iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
- iommufd_hw_pagetable_put(idev->ictx, hwpt);
+}
None of this xarray stuff looks locked properly
I had an impression from past discussions that the caller should not race attach/detach/replace on same device or pasid, otherwise it is already a problem in a higher level.
and the original intention of the group lock was to ensure all devices in the group have a same view. Not exactly to guard concurrent attach/detach.
If this understanding is incorrect we can add a lock for sure. 😊
On Tue, Jan 16, 2024 at 01:18:12AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, January 16, 2024 1:25 AM
On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
+/**
- iommufd_device_pasid_detach - Disconnect a {device, pasid} to an
iommu_domain
- @idev: device to detach
- @pasid: pasid to detach
- Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid
from
- the previously attached pt_id.
- */
+void iommufd_device_pasid_detach(struct iommufd_device *idev, u32
pasid)
+{
- struct iommufd_hw_pagetable *hwpt;
- hwpt = xa_load(&idev->pasid_hwpts, pasid);
- if (!hwpt)
return;
- xa_erase(&idev->pasid_hwpts, pasid);
- iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
- iommufd_hw_pagetable_put(idev->ictx, hwpt);
+}
None of this xarray stuff looks locked properly
I had an impression from past discussions that the caller should not race attach/detach/replace on same device or pasid, otherwise it is already a problem in a higher level.
I thought that was just at the iommu layer? We want VFIO to do the same? Then why do we need the dual xarrays?
Still, it looks really wrong to have code like this, we don't need to - it can be locked properly, it isn't a performance path..
and the original intention of the group lock was to ensure all devices in the group have a same view. Not exactly to guard concurrent attach/detach.
We don't have a group lock here, this is in iommufd.
Use the xarray lock..
eg
hwpt = xa_erase(&idev->pasid_hwpts, pasid); if (WARN_ON(!hwpt)) return
xa_erase is atomic.
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, January 16, 2024 8:58 PM
On Tue, Jan 16, 2024 at 01:18:12AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, January 16, 2024 1:25 AM
On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
+/**
- iommufd_device_pasid_detach - Disconnect a {device, pasid} to an
iommu_domain
- @idev: device to detach
- @pasid: pasid to detach
- Undo iommufd_device_pasid_attach(). This disconnects the
idev/pasid
from
- the previously attached pt_id.
- */
+void iommufd_device_pasid_detach(struct iommufd_device *idev, u32
pasid)
+{
- struct iommufd_hw_pagetable *hwpt;
- hwpt = xa_load(&idev->pasid_hwpts, pasid);
- if (!hwpt)
return;
- xa_erase(&idev->pasid_hwpts, pasid);
- iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
- iommufd_hw_pagetable_put(idev->ictx, hwpt);
+}
None of this xarray stuff looks locked properly
I had an impression from past discussions that the caller should not race attach/detach/replace on same device or pasid, otherwise it is already a problem in a higher level.
I thought that was just at the iommu layer? We want VFIO to do the same? Then why do we need the dual xarrays?
Still, it looks really wrong to have code like this, we don't need to
- it can be locked properly, it isn't a performance path..
OK, let's add a lock for this.
and the original intention of the group lock was to ensure all devices in the group have a same view. Not exactly to guard concurrent attach/detach.
We don't have a group lock here, this is in iommufd.
I meant the lock in iommufd_group.
Use the xarray lock..
eg
hwpt = xa_erase(&idev->pasid_hwpts, pasid); if (WARN_ON(!hwpt)) return
xa_erase is atomic.
yes, that's better.
On 2024/1/16 1:22, Jason Gunthorpe wrote:
On Sun, Nov 26, 2023 at 10:34:28PM -0800, Yi Liu wrote:
+static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
+{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct intel_iommu *iommu = info->iommu;
- struct dev_pasid_info *dev_pasid;
- unsigned long flags;
- int ret = 0;
- if (!pasid_supported(iommu))
return -EOPNOTSUPP;
- if (iommu->agaw < dmar_domain->s2_domain->agaw)
return -EINVAL;
- ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
- if (ret)
return ret;
- dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
- if (!dev_pasid)
return -ENOMEM;
- ret = domain_attach_iommu(dmar_domain, iommu);
- if (ret)
goto err_free;
- ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
- if (ret)
goto err_detach_iommu;
- dev_pasid->dev = dev;
- dev_pasid->pasid = pasid;
- spin_lock_irqsave(&dmar_domain->lock, flags);
- list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
- spin_unlock_irqrestore(&dmar_domain->lock, flags);
- return 0;
+err_detach_iommu:
- domain_detach_iommu(dmar_domain, iommu);
+err_free:
- kfree(dev_pasid);
- return ret;
+}
This seems alot longer than I'd think it should be, why isn't it exactly the same code as the other set_dev_pasid's?
Yes. It should be. The only difference is how to configure the pasid entry.
Best regards, baolu
On 2024/1/17 12:17, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, January 16, 2024 8:58 PM
On Tue, Jan 16, 2024 at 01:18:12AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, January 16, 2024 1:25 AM
On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
+/**
- iommufd_device_pasid_detach - Disconnect a {device, pasid} to an
iommu_domain
- @idev: device to detach
- @pasid: pasid to detach
- Undo iommufd_device_pasid_attach(). This disconnects the
idev/pasid
from
- the previously attached pt_id.
- */
+void iommufd_device_pasid_detach(struct iommufd_device *idev, u32
pasid)
+{
- struct iommufd_hw_pagetable *hwpt;
- hwpt = xa_load(&idev->pasid_hwpts, pasid);
- if (!hwpt)
return;
- xa_erase(&idev->pasid_hwpts, pasid);
- iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
- iommufd_hw_pagetable_put(idev->ictx, hwpt);
+}
None of this xarray stuff looks locked properly
I had an impression from past discussions that the caller should not race attach/detach/replace on same device or pasid, otherwise it is already a problem in a higher level.
I thought that was just at the iommu layer? We want VFIO to do the same? Then why do we need the dual xarrays?
Still, it looks really wrong to have code like this, we don't need to
- it can be locked properly, it isn't a performance path..
OK, let's add a lock for this.
and the original intention of the group lock was to ensure all devices in the group have a same view. Not exactly to guard concurrent attach/detach.
We don't have a group lock here, this is in iommufd.
I meant the lock in iommufd_group.
Use the xarray lock..
eg
hwpt = xa_erase(&idev->pasid_hwpts, pasid); if (WARN_ON(!hwpt)) return
xa_erase is atomic.
yes, that's better.
Above indeed makes more sense if there can be concurrent attach/replace/detach on a single pasid. Just have one doubt should we add lock to protect the whole attach/replace/detach paths. In the attach/replace path[1] [2], the xarray entry is verified firstly, and then updated after returning from iommu attach/replace API. It is uneasy to protect the xarray operations only with xa_lock as a detach path can acquire xa_lock right after attach/replace path checks the xarray. To avoid it, may need a mutex to protect the whole attach/replace/detach path to avoid race. Or maybe the attach/replace path should mark the corresponding entry as a special state that can block the other path like detach until the attach/replace path update the final hwpt to the xarray. Is there such state in xarray?
[1] iommufd_device_pasid_attach() -> iommufd_device_pasid_do_attach() -> __iommufd_device_pasid_do_attach() [2] iommufd_device_pasid_replace -> iommufd_device_pasid_do_replace -> __iommufd_device_pasid_do_attach
Regards, Yi Liu
On Wed, Jan 17, 2024 at 04:24:24PM +0800, Yi Liu wrote:
Above indeed makes more sense if there can be concurrent attach/replace/detach on a single pasid. Just have one doubt should we add lock to protect the whole attach/replace/detach paths. In the attach/replace path[1] [2], the xarray entry is verified firstly, and then updated after returning from iommu attach/replace API. It is uneasy to protect the xarray operations only with xa_lock as a detach path can acquire xa_lock right after attach/replace path checks the xarray. To avoid it, may need a mutex to protect the whole attach/replace/detach path to avoid race. Or maybe the attach/replace path should mark the corresponding entry as a special state that can block the other path like detach until the attach/replace path update the final hwpt to the xarray. Is there such state in xarray?
If the caller is not allowed to make concurrent attaches/detaches to the same pasid then you can document that in a comment, but it is still better to use xarray in a self-consistent way.
Jason
On 2024/1/17 20:56, Jason Gunthorpe wrote:
On Wed, Jan 17, 2024 at 04:24:24PM +0800, Yi Liu wrote:
Above indeed makes more sense if there can be concurrent attach/replace/detach on a single pasid. Just have one doubt should we add lock to protect the whole attach/replace/detach paths. In the attach/replace path[1] [2], the xarray entry is verified firstly, and then updated after returning from iommu attach/replace API. It is uneasy to protect the xarray operations only with xa_lock as a detach path can acquire xa_lock right after attach/replace path checks the xarray. To avoid it, may need a mutex to protect the whole attach/replace/detach path to avoid race. Or maybe the attach/replace path should mark the corresponding entry as a special state that can block the other path like detach until the attach/replace path update the final hwpt to the xarray. Is there such state in xarray?
If the caller is not allowed to make concurrent attaches/detaches to the same pasid then you can document that in a comment,
yes. I can document it. Otherwise, we may need a mutex for pasid to allow concurrent attaches/detaches.
but it is still better to use xarray in a self-consistent way.
sure. I'll try. At least in the detach path, xarray should be what you've suggested in prior email. Currently in the attach path, the logic is as below. Perhaps I can skip the check on old_hwpt since iommu_attach_device_pasid() should fail if there is an existing domain attached on the pasid. Then the xarray should be more consistent. what about your opinion?
old_hwpt = xa_load(&idev->pasid_hwpts, pasid); if (old_hwpt) { /* Attach does not allow overwrite */ if (old_hwpt == hwpt) return NULL; else return ERR_PTR(-EINVAL); }
rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid); if (rc) return ERR_PTR(rc);
refcount_inc(&hwpt->obj.users); xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
On Thu, Jan 18, 2024 at 05:28:01PM +0800, Yi Liu wrote:
On 2024/1/17 20:56, Jason Gunthorpe wrote:
On Wed, Jan 17, 2024 at 04:24:24PM +0800, Yi Liu wrote:
Above indeed makes more sense if there can be concurrent attach/replace/detach on a single pasid. Just have one doubt should we add lock to protect the whole attach/replace/detach paths. In the attach/replace path[1] [2], the xarray entry is verified firstly, and then updated after returning from iommu attach/replace API. It is uneasy to protect the xarray operations only with xa_lock as a detach path can acquire xa_lock right after attach/replace path checks the xarray. To avoid it, may need a mutex to protect the whole attach/replace/detach path to avoid race. Or maybe the attach/replace path should mark the corresponding entry as a special state that can block the other path like detach until the attach/replace path update the final hwpt to the xarray. Is there such state in xarray?
If the caller is not allowed to make concurrent attaches/detaches to the same pasid then you can document that in a comment,
yes. I can document it. Otherwise, we may need a mutex for pasid to allow concurrent attaches/detaches.
but it is still better to use xarray in a self-consistent way.
sure. I'll try. At least in the detach path, xarray should be what you've suggested in prior email. Currently in the attach path, the logic is as below. Perhaps I can skip the check on old_hwpt since iommu_attach_device_pasid() should fail if there is an existing domain attached on the pasid. Then the xarray should be more consistent. what about your opinion?
old_hwpt = xa_load(&idev->pasid_hwpts, pasid); if (old_hwpt) { /* Attach does not allow overwrite */ if (old_hwpt == hwpt) return NULL; else return ERR_PTR(-EINVAL); }
rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid); if (rc) return ERR_PTR(rc);
refcount_inc(&hwpt->obj.users); xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
Use xa_cmpxchg()
Jason
On 2024/1/18 21:38, Jason Gunthorpe wrote:
On Thu, Jan 18, 2024 at 05:28:01PM +0800, Yi Liu wrote:
On 2024/1/17 20:56, Jason Gunthorpe wrote:
On Wed, Jan 17, 2024 at 04:24:24PM +0800, Yi Liu wrote:
Above indeed makes more sense if there can be concurrent attach/replace/detach on a single pasid. Just have one doubt should we add lock to protect the whole attach/replace/detach paths. In the attach/replace path[1] [2], the xarray entry is verified firstly, and then updated after returning from iommu attach/replace API. It is uneasy to protect the xarray operations only with xa_lock as a detach path can acquire xa_lock right after attach/replace path checks the xarray. To avoid it, may need a mutex to protect the whole attach/replace/detach path to avoid race. Or maybe the attach/replace path should mark the corresponding entry as a special state that can block the other path like detach until the attach/replace path update the final hwpt to the xarray. Is there such state in xarray?
If the caller is not allowed to make concurrent attaches/detaches to the same pasid then you can document that in a comment,
yes. I can document it. Otherwise, we may need a mutex for pasid to allow concurrent attaches/detaches.
but it is still better to use xarray in a self-consistent way.
sure. I'll try. At least in the detach path, xarray should be what you've suggested in prior email. Currently in the attach path, the logic is as below. Perhaps I can skip the check on old_hwpt since iommu_attach_device_pasid() should fail if there is an existing domain attached on the pasid. Then the xarray should be more consistent. what about your opinion?
old_hwpt = xa_load(&idev->pasid_hwpts, pasid); if (old_hwpt) { /* Attach does not allow overwrite */ if (old_hwpt == hwpt) return NULL; else return ERR_PTR(-EINVAL); }
rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid); if (rc) return ERR_PTR(rc);
refcount_inc(&hwpt->obj.users); xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
Use xa_cmpxchg()
sure.
On 2024/1/16 01:19, Jason Gunthorpe wrote:
On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
+int iommu_replace_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
+{
- struct iommu_group *group = dev->iommu_group;
- struct iommu_domain *old_domain;
- int ret;
- if (!domain)
return -EINVAL;
- if (!group)
return -ENODEV;
- mutex_lock(&group->mutex);
- __iommu_remove_group_pasid(group, pasid);
It is not replace if you do remove first.
Replace must just call set_dev_pasid and nothing much else..
Seems uneasy to do it so far. The VT-d driver needs to get the old domain first in order to do replacement. However, VT-d driver does not track the attached domains of pasids. It gets domain of a pasid by iommu_get_domain_for_dev_pasid(). Like intel_iommu_remove_dev_pasid) in link [1]. While the iommu layer exchanges the domain in the group->pasid_array before calling into VT-d driver. So when calling into VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is already the new domain. To solve it, we may need to let VT-d driver have its own tracking on the domains. How about your thoughts? @Jason, @Kevin, @Baoplu?
[1] https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu.c#L4...
From: Liu, Yi L yi.l.liu@intel.com Sent: Sunday, March 10, 2024 9:06 PM
On 2024/1/16 01:19, Jason Gunthorpe wrote:
On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
+int iommu_replace_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
+{
- struct iommu_group *group = dev->iommu_group;
- struct iommu_domain *old_domain;
- int ret;
- if (!domain)
return -EINVAL;
- if (!group)
return -ENODEV;
- mutex_lock(&group->mutex);
- __iommu_remove_group_pasid(group, pasid);
It is not replace if you do remove first.
Replace must just call set_dev_pasid and nothing much else..
Seems uneasy to do it so far. The VT-d driver needs to get the old domain first in order to do replacement. However, VT-d driver does not track the attached domains of pasids. It gets domain of a pasid by iommu_get_domain_for_dev_pasid(). Like intel_iommu_remove_dev_pasid) in link [1]. While the iommu layer exchanges the domain in the group->pasid_array before calling into VT-d driver. So when calling into VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is already the new domain. To solve it, we may need to let VT-d driver have its own tracking on the domains. How about your thoughts? @Jason, @Kevin, @Baoplu?
[1] https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu .c#L4621C19-L4621C20
Jason's point was that the core helper should directly call set_dev_pasid and underlying iommu driver decides whether atomic replacement can be implemented inside the callback. If you first remove in the core then one can never implement a replace semantics.
On 2024/3/11 17:26, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Sunday, March 10, 2024 9:06 PM
On 2024/1/16 01:19, Jason Gunthorpe wrote:
On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
+int iommu_replace_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
+{
- struct iommu_group *group = dev->iommu_group;
- struct iommu_domain *old_domain;
- int ret;
- if (!domain)
return -EINVAL;
- if (!group)
return -ENODEV;
- mutex_lock(&group->mutex);
- __iommu_remove_group_pasid(group, pasid);
It is not replace if you do remove first.
Replace must just call set_dev_pasid and nothing much else..
Seems uneasy to do it so far. The VT-d driver needs to get the old domain first in order to do replacement. However, VT-d driver does not track the attached domains of pasids. It gets domain of a pasid by iommu_get_domain_for_dev_pasid(). Like intel_iommu_remove_dev_pasid) in link [1]. While the iommu layer exchanges the domain in the group->pasid_array before calling into VT-d driver. So when calling into VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is already the new domain. To solve it, we may need to let VT-d driver have its own tracking on the domains. How about your thoughts? @Jason, @Kevin, @Baoplu?
[1] https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu .c#L4621C19-L4621C20
Jason's point was that the core helper should directly call set_dev_pasid and underlying iommu driver decides whether atomic replacement can be implemented inside the callback. If you first remove in the core then one can never implement a replace semantics.
yeah, I got Jason's point. I'm raising an open to make the set_dev_pasid callback to handle domain replacement. The existing intel iommu driver does not track attached domains for pasid. But it's needed to know the attached domain of a pasid and find the dev_pasid under the domain, hence be able to clean up the old attachment and do the new attachment. Existing code cannot work as I mentioned above. The group->pasid_xarray is updated before calling set_dev_pasid callback. This means the underlying iommu driver cannot get the old domain in the set_dev_pasid callback by the iommu_get_domain_for_dev_pasid() helper.
As above, I'm considering the possibility to track attached domains for pasid by an xarray in the struct device_domain_info. Hence, intel iommu driver could store/retrieve attached domain for pasids. If it's replacement, the set_dev_pasid callback could find the attached domain and the dev_pasid instance in the domain. Then be able to do detach and clean up the tracking structures (e.g. dev_pasid).
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index e58263dcfadd..03a847a406e1 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4398,6 +4398,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
info->dev = dev; info->iommu = iommu; + xa_init(&info->pasid_array); if (dev_is_pci(dev)) { if (ecap_dev_iotlb_support(iommu->ecap) && pci_ats_supported(pdev) && @@ -4464,6 +4465,7 @@ static void intel_iommu_release_device(struct device *dev) mutex_unlock(&iommu->iopf_lock); dmar_remove_one_dev_info(dev); intel_pasid_free_table(dev); + WARN_ON(!xa_empty(&info->pasid_array)); intel_iommu_debugfs_remove_dev(info); kfree(info); set_dma_ops(dev, NULL); @@ -4707,7 +4709,13 @@ static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain, return 0; }
-static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) +/* + * Clear the pasid table entries so that all DMA requests with specific + * PASID from the device are blocked. If the page table has been set, clean + * up the data structures. + */ +static void device_pasid_block_translation(struct device *dev, ioasid_t pasid, + bool remove) { struct device_domain_info *info = dev_iommu_priv_get(dev); struct dev_pasid_info *curr, *dev_pasid = NULL; @@ -4716,9 +4724,11 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) struct iommu_domain *domain; unsigned long flags;
- domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0); - if (WARN_ON_ONCE(!domain)) + domain = xa_erase(&info->pasid_array, pasid); + if (!domain) { + WARN_ON_ONCE(remove); return; + }
/* * The SVA implementation needs to handle its own stuffs like the mm @@ -4753,6 +4763,11 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) intel_drain_pasid_prq(dev, pasid); }
+static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) +{ + device_pasid_block_translation(dev, pasid, true); +} + static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid) { @@ -4772,6 +4787,9 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, if (context_copied(iommu, info->bus, info->devfn)) return -EBUSY;
+ /* Try to block translation if it already has */ + device_pasid_block_translation(dev, pasid, false); + ret = prepare_domain_attach_device(domain, dev); if (ret) return ret; @@ -4801,6 +4819,8 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ xa_store(&info->pasid_array, pasid, dmar_domain, GFP_KERNEL); + if (domain->type & __IOMMU_DOMAIN_PAGING) intel_iommu_debugfs_create_dev_pasid(dev_pasid);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 94e2ecc3c73f..a466555f61fe 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -767,6 +767,7 @@ struct device_domain_info { #ifdef CONFIG_INTEL_IOMMU_DEBUGFS struct dentry *debugfs_dentry; /* pointer to device directory dentry */ #endif + struct xarray pasid_array; /* Attached domains per PASID */ };
struct dev_pasid_info {
On 2024/3/12 11:07, Yi Liu wrote:
On 2024/3/11 17:26, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Sunday, March 10, 2024 9:06 PM
On 2024/1/16 01:19, Jason Gunthorpe wrote:
On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
+int iommu_replace_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ + struct iommu_group *group = dev->iommu_group; + struct iommu_domain *old_domain; + int ret;
+ if (!domain) + return -EINVAL;
+ if (!group) + return -ENODEV;
+ mutex_lock(&group->mutex); + __iommu_remove_group_pasid(group, pasid);
It is not replace if you do remove first.
Replace must just call set_dev_pasid and nothing much else..
Seems uneasy to do it so far. The VT-d driver needs to get the old domain first in order to do replacement. However, VT-d driver does not track the attached domains of pasids. It gets domain of a pasid by iommu_get_domain_for_dev_pasid(). Like intel_iommu_remove_dev_pasid) in link [1]. While the iommu layer exchanges the domain in the group->pasid_array before calling into VT-d driver. So when calling into VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is already the new domain. To solve it, we may need to let VT-d driver have its own tracking on the domains. How about your thoughts? @Jason, @Kevin, @Baoplu?
[1] https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu .c#L4621C19-L4621C20
Jason's point was that the core helper should directly call set_dev_pasid and underlying iommu driver decides whether atomic replacement can be implemented inside the callback. If you first remove in the core then one can never implement a replace semantics.
yeah, I got Jason's point. I'm raising an open to make the set_dev_pasid callback to handle domain replacement. The existing intel iommu driver does not track attached domains for pasid. But it's needed to know the attached domain of a pasid and find the dev_pasid under the domain, hence be able to clean up the old attachment and do the new attachment. Existing code cannot work as I mentioned above. The group->pasid_xarray is updated before calling set_dev_pasid callback. This means the underlying iommu driver cannot get the old domain in the set_dev_pasid callback by the iommu_get_domain_for_dev_pasid() helper.
As above, I'm considering the possibility to track attached domains for pasid by an xarray in the struct device_domain_info. Hence, intel iommu driver could store/retrieve attached domain for pasids. If it's replacement, the set_dev_pasid callback could find the attached domain and the dev_pasid instance in the domain. Then be able to do detach and clean up the tracking structures (e.g. dev_pasid).
Maintaining the same data structure in both core and individual iommu drivers seems to be duplicate effort. It appears that what you want here is just to get the currently used domain in the set_dev_pasid path. Is it possible to adjust the core code so that the pasid array is updated after the driver's set_dev_pasid callback returns success?
Best regards, baolu
On 2024/3/13 11:13, Baolu Lu wrote:
On 2024/3/12 11:07, Yi Liu wrote:
On 2024/3/11 17:26, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Sunday, March 10, 2024 9:06 PM
On 2024/1/16 01:19, Jason Gunthorpe wrote:
On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
+int iommu_replace_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ + struct iommu_group *group = dev->iommu_group; + struct iommu_domain *old_domain; + int ret;
+ if (!domain) + return -EINVAL;
+ if (!group) + return -ENODEV;
+ mutex_lock(&group->mutex); + __iommu_remove_group_pasid(group, pasid);
It is not replace if you do remove first.
Replace must just call set_dev_pasid and nothing much else..
Seems uneasy to do it so far. The VT-d driver needs to get the old domain first in order to do replacement. However, VT-d driver does not track the attached domains of pasids. It gets domain of a pasid by iommu_get_domain_for_dev_pasid(). Like intel_iommu_remove_dev_pasid) in link [1]. While the iommu layer exchanges the domain in the group->pasid_array before calling into VT-d driver. So when calling into VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is already the new domain. To solve it, we may need to let VT-d driver have its own tracking on the domains. How about your thoughts? @Jason, @Kevin, @Baoplu?
[1] https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu .c#L4621C19-L4621C20
Jason's point was that the core helper should directly call set_dev_pasid and underlying iommu driver decides whether atomic replacement can be implemented inside the callback. If you first remove in the core then one can never implement a replace semantics.
yeah, I got Jason's point. I'm raising an open to make the set_dev_pasid callback to handle domain replacement. The existing intel iommu driver does not track attached domains for pasid. But it's needed to know the attached domain of a pasid and find the dev_pasid under the domain, hence be able to clean up the old attachment and do the new attachment. Existing code cannot work as I mentioned above. The group->pasid_xarray is updated before calling set_dev_pasid callback. This means the underlying iommu driver cannot get the old domain in the set_dev_pasid callback by the iommu_get_domain_for_dev_pasid() helper.
As above, I'm considering the possibility to track attached domains for pasid by an xarray in the struct device_domain_info. Hence, intel iommu driver could store/retrieve attached domain for pasids. If it's replacement, the set_dev_pasid callback could find the attached domain and the dev_pasid instance in the domain. Then be able to do detach and clean up the tracking structures (e.g. dev_pasid).
Maintaining the same data structure in both core and individual iommu drivers seems to be duplicate effort. It appears that what you want here is just to get the currently used domain in the set_dev_pasid path. Is it possible to adjust the core code so that the pasid array is updated after the driver's set_dev_pasid callback returns success?
yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback and pasid_array update is under the group->lock, so update it should be fine to adjust the order to update pasid_array after set_dev_pasid returns.
On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback and pasid_array update is under the group->lock, so update it should be fine to adjust the order to update pasid_array after set_dev_pasid returns.
Yes, it makes some sense
But, also I would like it very much if we just have the core pass in the actual old domain as a an addition function argument.
I think we have some small mistakes in multi-device group error unwinding for remove because the global xarray can't isn't actually going to be correct in all scenarios.
Jason
On 2024/3/19 00:52, Jason Gunthorpe wrote:
On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback and pasid_array update is under the group->lock, so update it should be fine to adjust the order to update pasid_array after set_dev_pasid returns.
Yes, it makes some sense
But, also I would like it very much if we just have the core pass in the actual old domain as a an addition function argument.
ok, this works too. For normal attach, just pass in a NULL old domain.
I think we have some small mistakes in multi-device group error unwinding for remove because the global xarray can't isn't actually going to be correct in all scenarios.
do you mean the __iommu_remove_group_pasid() call in the below function? Currently, it is called when __iommu_set_group_pasid() failed. However, __iommu_set_group_pasid() may need to do remove itself when error happens, so the helper can be more self-contained. Or you mean something else?
int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid) { /* Caller must be a probed driver on dev */ struct iommu_group *group = dev->iommu_group; void *curr; int ret;
if (!domain->ops->set_dev_pasid) return -EOPNOTSUPP;
if (!group) return -ENODEV;
if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner) return -EINVAL;
mutex_lock(&group->mutex); curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); if (curr) { ret = xa_err(curr) ? : -EBUSY; goto out_unlock; }
ret = __iommu_set_group_pasid(domain, group, pasid); if (ret) { __iommu_remove_group_pasid(group, pasid); xa_erase(&group->pasid_array, pasid); } out_unlock: mutex_unlock(&group->mutex); return ret; } EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
https://github.com/torvalds/linux/blob/b3603fcb79b1036acae10602bffc4855a4b9a...
On Tue, Mar 19, 2024 at 03:29:39PM +0800, Yi Liu wrote:
On 2024/3/19 00:52, Jason Gunthorpe wrote:
On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback and pasid_array update is under the group->lock, so update it should be fine to adjust the order to update pasid_array after set_dev_pasid returns.
Yes, it makes some sense
But, also I would like it very much if we just have the core pass in the actual old domain as a an addition function argument.
ok, this works too. For normal attach, just pass in a NULL old domain.
I think we have some small mistakes in multi-device group error unwinding for remove because the global xarray can't isn't actually going to be correct in all scenarios.
do you mean the __iommu_remove_group_pasid() call in the below function? Currently, it is called when __iommu_set_group_pasid() failed. However, __iommu_set_group_pasid() may need to do remove itself when error happens, so the helper can be more self-contained. Or you mean something else?
Yes..
int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid) { /* Caller must be a probed driver on dev */ struct iommu_group *group = dev->iommu_group; void *curr; int ret;
if (!domain->ops->set_dev_pasid) return -EOPNOTSUPP;
if (!group) return -ENODEV;
if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner) return -EINVAL;
mutex_lock(&group->mutex); curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); if (curr) { ret = xa_err(curr) ? : -EBUSY; goto out_unlock; }
ret = __iommu_set_group_pasid(domain, group, pasid);
So here we have the xa set to the new domain
if (ret) { __iommu_remove_group_pasid(group, pasid);
And here we still have it set to the new domain even though some of the devices within the group failed to attach. The logic needs to be more like the main domain attach path where iterate and then undo only what we did
And the whole thing is easier to reason about if an input argument specifies the current attached domain instead of having the driver read it from the xarray.
Jason
On 2024/3/20 20:38, Jason Gunthorpe wrote:
On Tue, Mar 19, 2024 at 03:29:39PM +0800, Yi Liu wrote:
On 2024/3/19 00:52, Jason Gunthorpe wrote:
On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback and pasid_array update is under the group->lock, so update it should be fine to adjust the order to update pasid_array after set_dev_pasid returns.
Yes, it makes some sense
But, also I would like it very much if we just have the core pass in the actual old domain as a an addition function argument.
ok, this works too. For normal attach, just pass in a NULL old domain.
I think we have some small mistakes in multi-device group error unwinding for remove because the global xarray can't isn't actually going to be correct in all scenarios.
do you mean the __iommu_remove_group_pasid() call in the below function? Currently, it is called when __iommu_set_group_pasid() failed. However, __iommu_set_group_pasid() may need to do remove itself when error happens, so the helper can be more self-contained. Or you mean something else?
Yes..
int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid) { /* Caller must be a probed driver on dev */ struct iommu_group *group = dev->iommu_group; void *curr; int ret;
if (!domain->ops->set_dev_pasid) return -EOPNOTSUPP;
if (!group) return -ENODEV;
if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner) return -EINVAL;
mutex_lock(&group->mutex); curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); if (curr) { ret = xa_err(curr) ? : -EBUSY; goto out_unlock; }
ret = __iommu_set_group_pasid(domain, group, pasid);
So here we have the xa set to the new domain
aha, yes. The underlying driver won't be able to get the correct domain in the remove_dev_pasid callback.
if (ret) { __iommu_remove_group_pasid(group, pasid);
And here we still have it set to the new domain even though some of the devices within the group failed to attach. The logic needs to be more like the main domain attach path where iterate and then undo only what we did
yes, the correct way is to undo what have been done before the fail device. However, I somehow remember that pasid capability is only available when the group is singleton. So iterate all devices of the devices just means one device in fact. If this is true, then the current code is fine although a bit confusing.
And the whole thing is easier to reason about if an input argument specifies the current attached domain instead of having the driver read it from the xarray.
yep, will correct it as a fix patch.
On 2024/3/21 14:16, Yi Liu wrote:
On 2024/3/20 20:38, Jason Gunthorpe wrote:
On Tue, Mar 19, 2024 at 03:29:39PM +0800, Yi Liu wrote:
On 2024/3/19 00:52, Jason Gunthorpe wrote:
On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback and pasid_array update is under the group->lock, so update it should be fine to adjust the order to update pasid_array after set_dev_pasid returns.
Yes, it makes some sense
But, also I would like it very much if we just have the core pass in the actual old domain as a an addition function argument.
ok, this works too. For normal attach, just pass in a NULL old domain.
I think we have some small mistakes in multi-device group error unwinding for remove because the global xarray can't isn't actually going to be correct in all scenarios.
do you mean the __iommu_remove_group_pasid() call in the below function? Currently, it is called when __iommu_set_group_pasid() failed. However, __iommu_set_group_pasid() may need to do remove itself when error happens, so the helper can be more self-contained. Or you mean something else?
Yes..
int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid) { /* Caller must be a probed driver on dev */ struct iommu_group *group = dev->iommu_group; void *curr; int ret;
if (!domain->ops->set_dev_pasid) return -EOPNOTSUPP;
if (!group) return -ENODEV;
if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner) return -EINVAL;
mutex_lock(&group->mutex); curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); if (curr) { ret = xa_err(curr) ? : -EBUSY; goto out_unlock; }
ret = __iommu_set_group_pasid(domain, group, pasid);
So here we have the xa set to the new domain
aha, yes. The underlying driver won't be able to get the correct domain in the remove_dev_pasid callback.
if (ret) { __iommu_remove_group_pasid(group, pasid);
And here we still have it set to the new domain even though some of the devices within the group failed to attach. The logic needs to be more like the main domain attach path where iterate and then undo only what we did
yes, the correct way is to undo what have been done before the fail device. However, I somehow remember that pasid capability is only available when the group is singleton. So iterate all devices of the devices just means one device in fact. If this is true, then the current code is fine although a bit confusing.
And the whole thing is easier to reason about if an input argument specifies the current attached domain instead of having the driver read it from the xarray.
yep, will correct it as a fix patch.
Hi Jason,
It appears there are two solutions here.
First, only undo the devices that have set_dev_pasid successfully in the __iommu_set_group_pasid(), so the problematic __iommu_remove_group_pasid() call at line 3378 [1] would go away. This also makes the helper more self-contained. Draft patch in [2]
Second, pass in the domain to remove_dev_pasid(). Draft patch in [3]
Either of the above two should be able to solve the mistake you mentioned. BTW. They are orthogonal, so it's also possible to apply both of them. Which one is your preference then?
[1] https://github.com/torvalds/linux/blob/b3603fcb79b1036acae10602bffc4855a4b9a... [2] https://github.com/yiliu1765/iommufd/commit/bb83270767ab460978a3452750f5e032... [3] https://github.com/yiliu1765/iommufd/commit/bd270b7b6619b8ba35dfaf9870df8728...
Regards, Yi Liu
On Thu, Mar 21, 2024 at 07:26:41PM +0800, Yi Liu wrote:
yes, the correct way is to undo what have been done before the fail device. However, I somehow remember that pasid capability is only available when the group is singleton. So iterate all devices of the devices just means one device in fact. If this is true, then the current code is fine although a bit confusing.
Platform devicse don't have that limitation.. It is PCI only.
And the whole thing is easier to reason about if an input argument specifies the current attached domain instead of having the driver read it from the xarray.
yep, will correct it as a fix patch.
Hi Jason,
It appears there are two solutions here.
First, only undo the devices that have set_dev_pasid successfully in the __iommu_set_group_pasid(), so the problematic __iommu_remove_group_pasid() call at line 3378 [1] would go away. This also makes the helper more self-contained. Draft patch in [2]
Second, pass in the domain to remove_dev_pasid(). Draft patch in [3]
Either of the above two should be able to solve the mistake you mentioned. BTW. They are orthogonal, so it's also possible to apply both of them. Which one is your preference then?
I would do both because I also think it is not nice that the drivers always have to have the boiler plate to read the xarray in their remove..
Jason
On 2024/3/21 20:20, Jason Gunthorpe wrote:
On Thu, Mar 21, 2024 at 07:26:41PM +0800, Yi Liu wrote:
yes, the correct way is to undo what have been done before the fail device. However, I somehow remember that pasid capability is only available when the group is singleton. So iterate all devices of the devices just means one device in fact. If this is true, then the current code is fine although a bit confusing.
Platform devicse don't have that limitation.. It is PCI only.
ok.
And the whole thing is easier to reason about if an input argument specifies the current attached domain instead of having the driver read it from the xarray.
yep, will correct it as a fix patch.
Hi Jason,
It appears there are two solutions here.
First, only undo the devices that have set_dev_pasid successfully in the __iommu_set_group_pasid(), so the problematic __iommu_remove_group_pasid() call at line 3378 [1] would go away. This also makes the helper more self-contained. Draft patch in [2]
Second, pass in the domain to remove_dev_pasid(). Draft patch in [3]
Either of the above two should be able to solve the mistake you mentioned. BTW. They are orthogonal, so it's also possible to apply both of them. Which one is your preference then?
I would do both because I also think it is not nice that the drivers always have to have the boiler plate to read the xarray in their remove..
sure. I'll send the two patches as Fix series.
linux-kselftest-mirror@lists.linaro.org