From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, April 11, 2023 10:31 PM
On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
If no oversight then we can directly put the lock in iommufd_hw_pagetable_attach/detach() which can also simplify a bit on its callers in device.c.
So, I did this, and syzkaller explains why this can't be done:
https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com
We can't allow the hwpt to be discovered by a parallel iommufd_hw_pagetable_attach() until it is done being setup, otherwise if we fail to set it up we can't destroy the hwpt.
if (immediate_attach) { rc = iommufd_hw_pagetable_attach(hwpt, idev); if (rc) goto out_abort; }
rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); if (rc) goto out_detach; list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list); return hwpt;
out_detach: if (immediate_attach) iommufd_hw_pagetable_detach(idev); out_abort: iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
As some other idev could be pointing at it too now.
How could this happen before this object is finalized? iirc you pointed to me this fact in previous discussion.
For this specific lockdep issue isn't the simple fix is to move the group lock into iommufd_hw_pagetable_detach() just like done in attach()?
So the lock has to come back out..
Jason