From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 20, 2023 7:54 AM
On Thu, Oct 19, 2023 at 01:56:01AM +0000, Tian, Kevin wrote:
Otherwise we have a problem where the order devices are attached to the domain decides how many domains you get. ie the first device attached does not want CC (but is compatible with it) so we create a non-CC domain
in autodetect model this won't happen. If the first device is capable of enforce_cc then the domain will be created with enforce_cc.
there is no "does not want CC" input in autodetect.
Then later we attach a device that does want CC and now we are forced to create a second iommu domain when upgrading the first domain
would
have been fine.
then in this case the 2nd device will reuse the domain.
Then you have the reverse problem where the domain will not be CC when it should be.
If the domain has been non-CC it's perfectly fine for the 2nd device with CC to reuse it. As long as there is one domain with non-CC then KVM has to have special treatment on wbinvd. In this case there is actually a benefit to use just one non-CC domain for all devices (either non-CC or CC).
What we want to prevent is attaching a non-CC device to a CC domain or upgrade a non-CC domain to CC since in both case the non-CC device will be broken due to incompatible page table format.
Hotplug is another scenario (though Intel driver does not support it, and it looks broken)
Can you elaborate how hotplug is broken? If device is hotplugged and failed to attach to an existing domain, then a new one will be created for it.
A non-cc domain will ask to be upgraded and the driver will let it happen even though it doesn't/can't fix the existing IOPTEs
iommufd should not ask for upgrade at all. The CC attribute of domain should be fixed since creation time.
Baolu will fix the intel-iommu driver accordingly.
there is indeed a broken case in KVM which cannot handle dynamic change of coherency due to hotplug. But that one is orthogonal to what we discussed here about how to decide cc in iommufd.
That too
Really, I hate this CC mechanism. It is only for Intel, can we just
Intel and AMD.
Nope, AMD just hardwires their IOMMU to always do CC enforcing. All this mess is only for Intel and their weird IOMMU that can only do the enforcement for a GPU.
punt it to userspace and have an intel 'want cc flag' for the entire nesting path and forget about it?
I dislike it too. But still not get your point why adding such a flag can really simplify things. As explained before the only difference between autodetect and having a user flag just affects the decision of cc when creating a hwpt. whether we should upgrade in the attach path is an orthogonal open which imho is unnecessary and Nicoline's patches to remove that check then also remove this patch makes lot of sense to me.
I don't think we can remove it, it is supposed to provide consistency of result regardless of ordering.
Who cares about such consistency? sure the result is different due to order:
1) creating hwpt for dev1 (non-CC) then later attaching hwpt to dev2 (CC) will succeed;
2) creating hwpt for dev2 (CC) then later attaching hwpt to dev1 (non-CC) will fail then the user should create a new hwpt for dev1;
But the user shouldn't assume such explicit consistency since it's not defined in our uAPI. All we defined is that the attaching may fail due to incompatibility for whatever reason then the user can always try creating a new hwpt for the to-be-attached device. From this regard I don't see providing consistency of result is necessary. 😊