On Wed, Sep 07, 2022 at 01:12:52PM -0300, Jason Gunthorpe wrote:
The PCI offset is some embedded thing - I've never seen it in a server platform.
That's not actually true, e.g. some power systems definitely had it, although I don't know if the current ones do.
But that's not the point. The offset is a configuration fully supported by Linux, and something that just works when using the proper APIs. Handwaving about it being embedded-only or bad design doesn't matter. There is a reason why we have these proper APIs, and no one has any business bypassing them.
I also seem to remember that iommu and PCI offset don't play nice together - so for the VFIO use case where the iommu is present I'm pretty sure we can very safely assume 0 offset. That seems confirmed by the fact that VFIO has never handled PCI offset in its own P2P path and P2P works fine in VMs across a wide range of platforms.
I think the offset is one of the reasons why IOVA windows can be reserved (and maybe also why ppc is so weird).
So, would you be OK with this series if I try to make a dma_map_p2p() that resolves the offset issue?
Well, if it also solves the other issue of invalid scatterlists leaking outside of drm we can think about it.
Last but not least I don't really see how the code would even work when an IOMMU is used, as dma_map_resource will return an IOVA that is only understood by the IOMMU itself, and not the other endpoint.
I don't understand this.
__iommu_dma_map() will put the given phys into the iommu_domain associated with 'dev' and return the IOVA it picked.
Yes, __iommu_dma_map creates an IOVA for the mapped remote BAR. That is the right thing if the I/O goes through the host bridge, but it is the wrong thing if the I/O goes through the switch - in that case the IOVA generated is not something that the endpoint that owns the BAR can even understand.
Take a look at iommu_dma_map_sg and pci_p2pdma_map_segment to see how this is handled.
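To make the distinction concrete, here is a minimal userspace sketch of the decision described above. It is not kernel code: every toy_* name is hypothetical, the "IOMMU" is just a bump allocator, and the offset convention (bus address = CPU physical address + bus_offset) is an assumption of this model. It only illustrates that a transfer staying below a switch needs the PCI bus address of the BAR, while a transfer routed through the host bridge needs an IOVA from the initiator's IOMMU domain.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: none of these types or helpers are kernel APIs. */
enum toy_map_type {
	TOY_MAP_BUS_ADDR,		/* traffic stays below a PCIe switch */
	TOY_MAP_THRU_HOST_BRIDGE,	/* traffic goes up through the host bridge */
};

struct toy_bar {
	uint64_t cpu_phys;	/* CPU physical address of the BAR */
	int64_t bus_offset;	/* assumed: bus address minus CPU physical address */
};

struct toy_iommu {
	uint64_t next_iova;	/* bump allocator standing in for an IOVA allocator */
};

/* Hand out a fresh IOVA range for 'len' bytes, rounded up to 4 KiB. */
static uint64_t toy_iommu_map(struct toy_iommu *mmu, uint64_t len)
{
	uint64_t iova = mmu->next_iova;

	assert(len > 0);
	mmu->next_iova += (len + 0xfffULL) & ~0xfffULL;
	return iova;
}

/*
 * Pick the address the initiating device must put on the wire.
 *
 * Below a switch the TLP never reaches the IOMMU, so the address has to
 * be the bus address that the BAR's owner decodes.  Through the host
 * bridge the IOMMU translates, so the initiator needs an IOVA instead.
 */
static uint64_t toy_p2p_dma_addr(struct toy_iommu *mmu,
				 const struct toy_bar *bar,
				 uint64_t len, enum toy_map_type type)
{
	if (type == TOY_MAP_BUS_ADDR)
		return bar->cpu_phys + (uint64_t)bar->bus_offset;

	return toy_iommu_map(mmu, len);
}
```

With a BAR at CPU physical 0x80000000 and a bus offset of -0x40000000, the switch case yields 0x40000000 and never touches the IOVA allocator; the host-bridge case returns whatever IOVA the allocator picks, which is meaningless to the device that owns the BAR if the traffic never reaches the IOMMU.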