On Wed, Jul 30, 2025 at 03:49:45PM +0100, Robin Murphy wrote:
> On 2025-07-29 9:13 pm, Jason Gunthorpe wrote:
> > On Tue, Jul 29, 2025 at 08:44:21PM +0100, Robin Murphy wrote:
> In this case with just one single contiguous mapping, it is clearly objectively worse to have to bounce in and out of the IOMMU layer 3 separate times and store a dma_map_state,
The non-contiguous mappings are coming back; they were in earlier drafts of this. Regardless, the point is to show how to use the general API that we would want to bring into the DRM drivers, which don't have contiguity, even though VFIO is a bit special.
> Oh yeah, and mapping MMIO with regular memory attributes (IOMMU_CACHE) rather than appropriate ones (IOMMU_MMIO), as this will end up doing, isn't guaranteed not to end badly either (e.g. if the system interconnect ends up merging consecutive write bursts and exceeding the target root port's MPS).
Yes, I recently noticed this too, it should be fixed.
But so we are all on the same page, a lot of the PCI P2P systems are set up so P2P does not transit through the iommu. It either takes the ACS path through a switch, or it uses ATS and takes a different ACS path through a switch. It only transits through the iommu in misconfigured systems or in the rarer case of P2P between root ports.
> For non-ATS (and ATS Untranslated traffic), my understanding is that we rely on ACS upstream redirect to send transactions all the way up to the root port for translation (and without that then they are indeed pure bus addresses, take the pci_p2pdma_bus_addr_map() case,
My point is it is common for real systems to take the pci_p2pdma_bus_addr_map() path. Going through the RP is too slow.
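
To make that concrete, the importer's mapping logic ends up dispatching roughly like this (a sketch based on the two-phase p2pdma helpers; the exact signatures used by this series may differ, and DMA_ATTR_MMIO is only proposed at this point):

	switch (pci_p2pdma_state(&p2p_state, dev, page)) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
		/* Common case: traffic stays below the switch/root port
		 * and never transits the IOMMU */
		dma = pci_p2pdma_bus_addr_map(&p2p_state, paddr);
		break;
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		/* Only this path goes up through the RP and the IOMMU */
		dma = dma_map_phys(dev, paddr, size, dir, DMA_ATTR_MMIO);
		break;
	default:
		return -EINVAL;
	}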
> all irrelevant). In Arm system terms, simpler root ports may well have to run that traffic out to an external SMMU TBU, at which point any P2P would loop back externally through the memory space window in the system
Many real systems simply don't support this at all :(
> But of course, if it's not dma-direct because we're on POWER with TCE, rather than VFIO Type1 implying an iommu-dma/dma-direct arch, then who knows? I imagine the complete absence of any mention means this hasn't been tried, or possibly even considered?
POWER uses dma_ops, and the point of this design is that dma_map_phys() will still call the dma_ops. See below.
> I don't get what you mean by "not be a full no-op", can you clarify exactly what you think it should be doing? Even if it's just the dma_capable() mask check equivalent to dma_direct_map_resource(), you don't actually want that here either - in that case you'd want to fail the entire attachment to begin with since it can never work.
The expectation would be that if the DMA mapping can't succeed then the phys map should fail. So if dma_capable() or whatever is not OK, then fail inside the loop and unwind back to failing the whole attach.
It should fail for cases where it is not supported (ie swiotlb=force), it should still call the legacy dma_ops, and it should undo any CC mangling of the address. (The pci_p2pdma_bus_addr_map() path needs to deal with any CC issues too.)
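
Ie, in the importer, something like this rough sketch (dma_map_phys()/dma_unmap_phys() per this series, DMA_ATTR_MMIO as proposed):

	for (i = 0; i != nranges; i++) {
		dma[i] = dma_map_phys(dev, phys[i], len[i], dir,
				      DMA_ATTR_MMIO);
		if (dma_mapping_error(dev, dma[i])) {
			/* Unwind and fail the whole attach */
			while (i--)
				dma_unmap_phys(dev, dma[i], len[i], dir,
					       DMA_ATTR_MMIO);
			return -EIO;
		}
	}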
> Um, my whole point is that the "legacy DMA ops" cannot be called, because they still assume page-backed memory, so at best are guaranteed to fail; any "CC mangling" assumed for memory is most likely wrong for MMIO, and there simply is no "deal with" at this point.
I think we all agreed it should use the resource path. So legacy DMA ops, including POWER, should end up calling:

struct dma_map_ops {
	dma_addr_t (*map_resource)(struct device *dev, phys_addr_t phys_addr,
			size_t size, enum dma_data_direction dir,
			unsigned long attrs);
And if that is NULL it should fail.
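
Ie the MMIO branch of dma_map_phys() would mirror what dma_map_resource() does today, something like this sketch following kernel/dma/mapping.c:

	const struct dma_map_ops *ops = get_dma_ops(dev);

	if (dma_map_direct(dev, ops))
		return dma_direct_map_resource(dev, phys, size, dir, attrs);
	if (!ops->map_resource)
		return DMA_MAPPING_ERROR;	/* caller fails the attach */
	return ops->map_resource(dev, phys, size, dir, attrs);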
> A device BAR is simply not under control of the trusted hypervisor the same way memory is;
I'm not sure what you mean? I think it is; at least for CC I expect ACS to be set up to force translation, and this squarely puts access to the MMIO BAR under control of the S2 translation.
In ARM terms I expect that the RMM's S2 will contain the MMIO BAR at the shared IPA (ie top bit set), which will match where the CPU should access it? Linux's IOMMU S2 should mirror this and put the MMIO BAR at the shared IPA. Meaning upon locking the MMIO phys_addr_t effectively moves?
At least I would be surprised to hear that shared MMIO was placed in the private IPA space??
Outside CC we do have a rare configuration where the ACS is not forcing translation, and then your remarks are true. The hypervisor must enforce IPA == GPA == bus addr. It's a painful configuration to make work.
> Sticking to Arm CCA terminology for example, if a device in shared state tries to import a BAR from a device in locked/private state, there is no notion of touching the shared alias and hoping it somehow magically works (at best it might throw the exporting device into TDISP error state terminally);
Right, we don't support T=1 DMA yet, or locked devices, but when we do the p2pdma layer needs to be fixed up to catch this and reject it.
I think it is pretty easy, the p2pdma_provider struct can record if the exporting struct device has shared or private MMIO. Then when doing the mapping we require that private MMIO be accessed from T=1.
This should be addressed as part of enabling PCI T=1 support, eg in ARM terms along with Aneesh's series "ARM CCA Device Assignment support"
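
Sketch of the p2pdma side (the mmio_private field and the dev_dma_is_trusted() helper are made up here, to be defined by that enablement work):

	struct p2pdma_provider {
		struct device	*owner;
		u64		bus_offset;
		bool		mmio_private;	/* BAR is in locked/private state */
	};

	/* At map time: private MMIO is only reachable with T=1 DMA */
	if (provider->mmio_private && !dev_dma_is_trusted(dev))
		return -EPERM;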
> simply cannot be allowed. If a shared resource exists in the shared IPA space to begin with, dma_to_phys() will do the wrong thing, and even phys_to_dma() would technically not walk dma_range_map correctly, because both assume "phys" represents kernel memory.
As above, for CC I am expecting that translation will always be required. The S2 in both the RMM and hypervisor SMMUs should both have shared accessibility for whatever phys_addr the CPU is using.
So phys_to_dma() just needs to return the normal CPU phys_addr_t to work, and this looks believable to me. ARM forces the shared IPA through dma_addr_unencrypted(), but it is already wrong for the core code to call that function for "encrypted" MMIO.
Not sure about the ranges or dma_to_phys(), I doubt anyone has ever tested this so it probably doesn't work - but I don't see anything architecturally catastrophic here, just some bugs.
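
For illustration, the sort of shape I'd imagine on the dma-direct side (a hedged sketch; phys_to_dma_mmio() is a made-up name, translate_phys_to_dma() is the existing dma_range_map walker):

	static inline dma_addr_t phys_to_dma_mmio(struct device *dev,
						  phys_addr_t paddr)
	{
		/*
		 * The CPU's phys_addr_t for shared MMIO is already the
		 * shared IPA, so only apply the bus offset and skip the
		 * dma_addr_unencrypted() fixup used for kernel memory.
		 */
		if (dev->dma_range_map)
			return translate_phys_to_dma(dev, paddr);
		return paddr;
	}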
> However it's also all moot since any attempt at any combination will fail anyway due to SWIOTLB being forced by is_realm_world().
Yep.
Basically, P2P for ARM CCA today needs some bug fixing and testing - not surprising. ARM CCA is still rare, and we don't even use P2P under any CC architecture today.
I'm sure it will be fixed as a separate work, at least we will soon care about P2P on ARM CCA working.
Regardless, from a driver perspective none of the CC detail should leak into VFIO. The P2P APIs and the DMA APIs are the right place to abstract it away, and yes they probably fail to do so right now.
I'm guessing that if DMA_ATTR_MMIO is agreed then a DMA_ATTR_MMIO_ENCRYPTED would be the logical step. That should provide enough detail that the DMA API can compute correct addressing.
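
Roughly, at the point where the DMA address is computed (entirely hypothetical; dma_addr_canonical() is the existing mem_encrypt.h helper):

	if (attrs & DMA_ATTR_MMIO) {
		dma_addr_t addr = paddr;

		if (dev->dma_range_map)
			addr = translate_phys_to_dma(dev, paddr);
		/*
		 * Private (T=1) MMIO keeps the canonical protected
		 * address; shared MMIO already carries the shared alias
		 * in its CPU phys_addr_t. Neither goes through
		 * dma_addr_unencrypted() the way kernel memory does.
		 */
		if (attrs & DMA_ATTR_MMIO_ENCRYPTED)
			addr = dma_addr_canonical(addr);
		return addr;
	}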
Maybe this whole discussion improves the case for DMA_ATTR_MMIO.
Jason