 
            On 2024/3/11 17:26, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Sunday, March 10, 2024 9:06 PM
On 2024/1/16 01:19, Jason Gunthorpe wrote:
On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
+int iommu_replace_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)+{
- struct iommu_group *group = dev->iommu_group;
- struct iommu_domain *old_domain;
- int ret;
- if (!domain)
return -EINVAL;- if (!group)
return -ENODEV;- mutex_lock(&group->mutex);
- __iommu_remove_group_pasid(group, pasid);
It is not replace if you do remove first.
Replace must just call set_dev_pasid and nothing much else..
Seems uneasy to do it so far. The VT-d driver needs to get the old domain first in order to do replacement. However, VT-d driver does not track the attached domains of pasids. It gets domain of a pasid by iommu_get_domain_for_dev_pasid(). Like intel_iommu_remove_dev_pasid) in link [1]. While the iommu layer exchanges the domain in the group->pasid_array before calling into VT-d driver. So when calling into VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is already the new domain. To solve it, we may need to let VT-d driver have its own tracking on the domains. How about your thoughts? @Jason, @Kevin, @Baoplu?
[1] https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu .c#L4621C19-L4621C20
Jason's point was that the core helper should directly call set_dev_pasid and underlying iommu driver decides whether atomic replacement can be implemented inside the callback. If you first remove in the core then one can never implement a replace semantics.
yeah, I got Jason's point. I'm raising an open to make the set_dev_pasid callback to handle domain replacement. The existing intel iommu driver does not track attached domains for pasid. But it's needed to know the attached domain of a pasid and find the dev_pasid under the domain, hence be able to clean up the old attachment and do the new attachment. Existing code cannot work as I mentioned above. The group->pasid_xarray is updated before calling set_dev_pasid callback. This means the underlying iommu driver cannot get the old domain in the set_dev_pasid callback by the iommu_get_domain_for_dev_pasid() helper.
As above, I'm considering the possibility to track attached domains for pasid by an xarray in the struct device_domain_info. Hence, intel iommu driver could store/retrieve attached domain for pasids. If it's replacement, the set_dev_pasid callback could find the attached domain and the dev_pasid instance in the domain. Then be able to do detach and clean up the tracking structures (e.g. dev_pasid).
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index e58263dcfadd..03a847a406e1 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4398,6 +4398,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
info->dev = dev; info->iommu = iommu; + xa_init(&info->pasid_array); if (dev_is_pci(dev)) { if (ecap_dev_iotlb_support(iommu->ecap) && pci_ats_supported(pdev) && @@ -4464,6 +4465,7 @@ static void intel_iommu_release_device(struct device *dev) mutex_unlock(&iommu->iopf_lock); dmar_remove_one_dev_info(dev); intel_pasid_free_table(dev); + WARN_ON(!xa_empty(&info->pasid_array)); intel_iommu_debugfs_remove_dev(info); kfree(info); set_dma_ops(dev, NULL); @@ -4707,7 +4709,13 @@ static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain, return 0; }
-static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) +/* + * Clear the pasid table entries so that all DMA requests with specific + * PASID from the device are blocked. If the page table has been set, clean + * up the data structures. + */ +static void device_pasid_block_translation(struct device *dev, ioasid_t pasid, + bool remove) { struct device_domain_info *info = dev_iommu_priv_get(dev); struct dev_pasid_info *curr, *dev_pasid = NULL; @@ -4716,9 +4724,11 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) struct iommu_domain *domain; unsigned long flags;
- domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0); - if (WARN_ON_ONCE(!domain)) + domain = xa_erase(&info->pasid_array, pasid); + if (!domain) { + WARN_ON_ONCE(remove); return; + }
/* * The SVA implementation needs to handle its own stuffs like the mm @@ -4753,6 +4763,11 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) intel_drain_pasid_prq(dev, pasid); }
+static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) +{ + device_pasid_block_translation(dev, pasid, true); +} + static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid) { @@ -4772,6 +4787,9 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, if (context_copied(iommu, info->bus, info->devfn)) return -EBUSY;
+ /* Try to block translation if it already has */ + device_pasid_block_translation(dev, pasid, false); + ret = prepare_domain_attach_device(domain, dev); if (ret) return ret; @@ -4801,6 +4819,8 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ xa_store(&info->pasid_array, pasid, dmar_domain, GFP_KERNEL); + if (domain->type & __IOMMU_DOMAIN_PAGING) intel_iommu_debugfs_create_dev_pasid(dev_pasid);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 94e2ecc3c73f..a466555f61fe 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -767,6 +767,7 @@ struct device_domain_info { #ifdef CONFIG_INTEL_IOMMU_DEBUGFS struct dentry *debugfs_dentry; /* pointer to device directory dentry */ #endif + struct xarray pasid_array; /* Attached domains per PASID */ };
struct dev_pasid_info {