From: Jason Gunthorpe jgg@nvidia.com Sent: Monday, October 16, 2023 7:58 PM
On Mon, Oct 16, 2023 at 08:48:03AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 14, 2023 8:45 AM
On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
From: Nicolin Chen nicolinc@nvidia.com
Now enforce_cache_coherency and msi_cookie are kernel-managed
hwpt
things. So, they should be only setup on kernel-managed domains. If the
attaching
domain is a user-managed domain, redirect the hwpt to hwpt->parent
to
do
it correctly.
No redirection. The parent should already have the configuration done when it's created. It shouldn't be triggered in the nesting path.
iommufd_hw_pagetable_enforce_cc() is not only called in alloc(), but also in hwpt_attach/replace() if cc is not enforced by the alloc() because the idev that initiates the hwpt_alloc() might not have idev->enforce_cache_coherency. Only when another idev that has idev->enforce_cache_coherency attaches to the shared hwpt, the cc configuration would be done.
is this a bug already? If the 1st device doesn't have enforce_cc in its iommu, setting the snp bit in the hwpt would lead to reserved bit violation.
I suspect there are technically some gaps in the intel driver, yes..
double checked. intel driver is doing right thing now:
intel_iommu_enforce_cache_coherency() domain_support_force_snooping()
static bool domain_support_force_snooping(struct dmar_domain *domain) { struct device_domain_info *info; bool support = true;
assert_spin_locked(&domain->lock); list_for_each_entry(info, &domain->devices, link) { if (!ecap_sc_support(info->iommu->ecap)) { support = false; break; } }
return support; }
another problem is that intel_iommu_enforce_cache_coherency() doesn't update existing entries. It only sets a domain flag to affect future mappings. so it means the 2nd idev is also broken.
This is such a gap, intel driver should not permit that.
yes. @Baolu, can you add a fix?
The simplest option is to follow vfio type1 i.e. don't mix devices with different enforce_cc in one domain.
This is why I wanted to get rid of this bad mechanism going forward.
Manually created hwpt should have a manual specification of cc and then we don't have so many problems.
It means userspace needs to compute if they want to use CC or not, but userspace already needs to figure this out since without autodomains it must create two hwpts manually anyhow.
Now there is no interface reporting enforce_cc to userspace.
Actually the user doesn't need to know enforce_cc. As long as there is an attach incompatibility error the user has to create a 2nd hwpt for the to-be-attached device then enforce_cc will be handled automatically in hwpt_alloc.
I prefer to removing enforce_cc in attach fn completely then no parent trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to figure out the attaching compatibility:
- attaching a idev with matching enforce_cc as hwpt is always allowed. - attaching a idev w/o enforce_cc to a hwpt with enforce_cc is rejected. - attaching a idev w/ enforce_cc to a hwpt w/o enforce_cc succeeds with hwpt continuing to be w/o enforce_cc. No promotion.
above has been guaranteed by intel iommu driver.