On Mon, Feb 06, 2023 at 06:57:35AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, February 3, 2023 11:03 PM
On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Thursday, February 2, 2023 3:05 PM
All drivers are already required to support changing between active UNMANAGED domains when using their attach_dev ops.
All drivers which don't have *broken* UNMANAGED domain?
No, all drivers.. It has always been used by VFIO.
existing iommu_attach_group() doesn't support changing between two UNMANAGED domains. only from default->unmanaged or blocking->unmanaged.
Yes, but before we added the blocking domains VFIO was changing between unmanaged domains. Blocking domains are so new that no driver could have suddenly started to depend on this.
Can you elaborate the error handling here? Ideally if __iommu_group_set_domain() fails then group->domain shouldn't be changed.
That isn't what it implements though. The internal helper leaves things in a mess, it is for the caller to fix it, and it depends on the caller what that means.
I didn't see any warning of the mess and the caller's responsibility in __iommu_group_set_domain(). Can it be documented clearly so if someone wants to add a new caller on it he can clearly know what to do?
That would be nice..
and why doesn't iommu_attach_group() need to do anything when an attach attempt fails? In the end it calls the same iommu_group_do_attach_device() as __iommu_group_set_domain() does.
That's a bug for sure.
btw looking at the code __iommu_group_set_domain():
* Note that this is called in error unwind paths, attaching to a * domain that has already been attached cannot fail. */
ret = __iommu_group_for_each_dev(group, new_domain, iommu_group_do_attach_device);
with that we don't need fall back to core domain in above error unwinding per this comment.
That does make some sense.
I tried to make a patch to consolidate all this error handling once, that would be the better way to approach this.
In this case the API cannot retain a hidden reference to the new domain, so it must be purged, one way or another.
Could you elaborate where the hidden reference is retained?
Inside the driver, it can keep track of the domain pointer if attach_dev succeeds
Jason