On 11/7/25 8:01 AM, Leon Romanovsky wrote:
On Thu, Nov 06, 2025 at 10:15:07PM -0800, Randy Dunlap wrote:
On 11/6/25 6:16 AM, Leon Romanovsky wrote:
From: Jason Gunthorpe <jgg@nvidia.com>
Reflect the latest changes in the p2p implementation to support the DMABUF lifecycle.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 Documentation/driver-api/pci/p2pdma.rst | 95 +++++++++++++++++++++++++--------
 1 file changed, 72 insertions(+), 23 deletions(-)

diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
index d0b241628cf1..69adea45f73e 100644
--- a/Documentation/driver-api/pci/p2pdma.rst
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -9,22 +9,47 @@
 between two devices on the bus. This type of transaction is henceforth
 called Peer-to-Peer (or P2P). However, there are a number of issues that
 make P2P transactions tricky to do in a perfectly safe way.
 
-One of the biggest issues is that PCI doesn't require forwarding
-transactions between hierarchy domains, and in PCIe, each Root Port
-defines a separate hierarchy domain. To make things worse, there is no
-simple way to determine if a given Root Complex supports this or not.
-(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
-only supports doing P2P when the endpoints involved are all behind the
-same PCI bridge, as such devices are all in the same PCI hierarchy
-domain, and the spec guarantees that all transactions within the
-hierarchy will be routable, but it does not require routing
-between hierarchies.
 
-The second issue is that to make use of existing interfaces in Linux,
-memory that is used for P2P transactions needs to be backed by struct
-pages. However, PCI BARs are not typically cache coherent so there are
-a few corner case gotchas with these pages so developers need to
-be careful about what they do with them.
+For PCIe the routing of TLPs is well defined up until they reach a host bridge
Define what TLP means?
In the PCIe "world", TLP is a very well-known and well-defined acronym; it means Transaction Layer Packet.
It's your choice (or Bjorn's). I'm just reviewing...
Thanks.
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
index 69adea45f73e..7530296a5dea 100644
--- a/Documentation/driver-api/pci/p2pdma.rst
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -9,17 +9,17 @@
 between two devices on the bus. This type of transaction is henceforth
 called Peer-to-Peer (or P2P). However, there are a number of issues that
 make P2P transactions tricky to do in a perfectly safe way.
 
-For PCIe the routing of TLPs is well defined up until they reach a host bridge
-or root port. If the path includes PCIe switches then based on the ACS settings
-the transaction can route entirely within the PCIe hierarchy and never reach the
-root port. The kernel will evaluate the PCIe topology and always permit P2P
-in these well defined cases.
+For PCIe the routing of Transaction Layer Packets (TLPs) is well-defined up
+until they reach a host bridge or root port. If the path includes PCIe switches
+then based on the ACS settings the transaction can route entirely within
+the PCIe hierarchy and never reach the root port. The kernel will evaluate
+the PCIe topology and always permit P2P in these well-defined cases.
 
 However, if the P2P transaction reaches the host bridge then it might have to
 hairpin back out the same root port, be routed inside the CPU SOC to another
 PCIe root port, or routed internally to the SOC.
 
-As this is not well defined or well supported in real HW the kernel defaults to
+As this is not well-defined or well supported in real HW the kernel defaults to
Nit: well-supported
The rest of it looks good. Thanks.
 blocking such routing. There is an allow list to allow detecting known-good HW,
 in which case P2P between any two PCIe devices will be permitted.
@@ -39,7 +39,7 @@
 delegates lifecycle management to the providing driver. It is expected that
 drivers using this option will wrap their MMIO memory in DMABUF and use DMABUF
 to provide an invalidation shutdown. These MMIO pages have no struct page, and
 if used with mmap() must create special PTEs. As such there are very few
-kernel uAPIs that can accept pointers to them, in particular they cannot be used
+kernel uAPIs that can accept pointers to them; in particular they cannot be used
 with read()/write(), including O_DIRECT.
 
 Building on this, the subsystem offers a layer to wrap the MMIO in a ZONE_DEVICE
 
@@ -154,7 +154,7 @@ access happens.
 Usage With DMABUF
 =================
 
-DMABUF provides an alternative to the above struct page based
+DMABUF provides an alternative to the above struct page-based
 client/provider/orchestrator system. In this mode the exporting driver will
 wrap some of its MMIO in a DMABUF and give the DMABUF FD to userspace.
 
@@ -162,10 +162,10 @@
 Userspace can then pass the FD to an importing driver which will ask the
 exporting driver to map it.
 
 In this case the initiator and target pci_devices are known and the P2P subsystem
-is used to determine the mapping type. The phys_addr_t based DMA API is used to
+is used to determine the mapping type. The phys_addr_t-based DMA API is used to
 establish the dma_addr_t.
 
-Lifecycle is controlled by DMABUF move_notify(), when the exporting driver wants
+Lifecycle is controlled by DMABUF move_notify(). When the exporting driver wants
 to remove() it must deliver an invalidation shutdown to all DMABUF importing
 drivers through move_notify() and synchronously DMA unmap all the MMIO.