Hi,
This series is the follow-up of the discussion that John and I had some
time ago here:
https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrr…
The initial problem we were discussing was that I'm currently working on
a platform which has a memory layout with ECC enabled. However, enabling
the ECC has a number of drawbacks on that platform: lower performance,
increased memory usage, etc. So for things like framebuffers, the
trade-off isn't great and thus there's a memory region with ECC disabled
to allocate from for such use cases.
After a suggestion from John, I first chose to use heap allocation
flags to let userspace ask for a particular ECC setup. This was then
backed by a new heap type that allocates from reserved memory chunks
flagged as such, and by the existing DT properties that specify the ECC
properties.
After further discussion, it was considered that flags were not the
right solution, and relying on the names of the heaps would be enough to
let userspace know the kind of buffer it deals with.
Thus, even though the uAPI part of it has been dropped in this second
version, we still need a driver to create heaps out of carved-out memory
regions. In addition to the original use case, a similar driver can be
found in BSPs from most vendors, so I believe it would be a useful
addition to the kernel.
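To illustrate what relying on heap names means for userspace, here is a
minimal sketch that is not part of this series. It uses the existing
dma-heap uAPI (DMA_HEAP_IOCTL_ALLOC from <linux/dma-heap.h>); the helper
names heap_path() and heap_alloc() are made up for the example, and the
heap name is whatever the platform exposes under /dev/dma_heap.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

/*
 * Build "/dev/dma_heap/<name>" into @out (hypothetical helper).
 * Returns 0 on success, -1 if the name is empty, contains a '/',
 * or the resulting path does not fit in @len bytes.
 */
int heap_path(char *out, size_t len, const char *name)
{
	int n;

	assert(out && name);
	if (name[0] == '\0' || strchr(name, '/'))
		return -1;

	n = snprintf(out, len, "/dev/dma_heap/%s", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Allocate @size bytes from the heap called @name (hypothetical helper).
 * Returns a dma-buf file descriptor, or -1 on any failure.
 */
int heap_alloc(const char *name, size_t size)
{
	struct dma_heap_allocation_data data = {
		.len = size,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	char path[128];
	int heap_fd, ret;

	if (heap_path(path, sizeof(path), name))
		return -1;

	/* The heap is selected purely by its name; no allocation flags. */
	heap_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0)
		return -1;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);

	return ret < 0 ? -1 : (int)data.fd;
}
```

The returned fd is a regular dma-buf, so the caller can mmap() it or
hand it to an importing driver, and must close() it when done.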
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v5:
- Rebased on 6.16-rc2
- Switch from property to dedicated binding
- Link to v4: https://lore.kernel.org/r/20250520-dma-buf-ecc-heap-v4-1-bd2e1f1bb42c@kerne…
Changes in v4:
- Rebased on 6.15-rc7
- Map buffers only when map is actually called, not at allocation time
- Deal with restricted-dma-pool and shared-dma-pool
- Reword Kconfig options
- Properly report dma_map_sgtable failures
- Link to v3: https://lore.kernel.org/r/20250407-dma-buf-ecc-heap-v3-0-97cdd36a5f29@kerne…
Changes in v3:
- Reworked global variable patch
- Link to v2: https://lore.kernel.org/r/20250401-dma-buf-ecc-heap-v2-0-043fd006a1af@kerne…
Changes in v2:
- Add vmap/vunmap operations
- Drop ECC flags uapi
- Rebase on top of 6.14
- Link to v1: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-0-54cbbd049511@kerne…
---
Maxime Ripard (2):
dt-bindings: reserved-memory: Introduce carved-out memory region binding
dma-buf: heaps: Introduce a new heap for reserved memory
.../bindings/reserved-memory/carved-out.yaml | 49 +++
drivers/dma-buf/heaps/Kconfig | 8 +
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/carveout_heap.c | 362 +++++++++++++++++++++
4 files changed, 420 insertions(+)
---
base-commit: d076bed8cb108ba2236d4d49c92303fda4036893
change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Changelog:
v1:
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Returned support for multiple DMA ranges per-DMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v1 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1754292567.git.leon@kernel.org series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO regions
from PCI device BARs as dma-buf objects, enabling safe sharing of non-struct
page memory with controlled lifetime management. This allows RDMA and other
subsystems to import dma-buf FDs and build them into memory regions for PCI
P2P operations.
The series supports a use case for SPDK where an NVMe device will be owned
by SPDK through VFIO but interacting with an RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
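As a rough illustration of the revocation semantics described above,
here is a small userspace model. It is only a sketch: every model_*
name is invented for the example, there is no locking, and it does not
reproduce the code in the series. It only shows the intended behaviour:
once the exporter revokes the buffer, importers are notified to drop
their mappings and new attach/map requests fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ATTACH 4

struct model_attachment {
	bool mapped;        /* importer currently holds a mapping */
	int invalidations;  /* number of move notifications received */
};

struct model_dmabuf {
	bool revoked;
	size_t nr_attach;
	struct model_attachment *attach[MODEL_MAX_ATTACH];
};

/* Attaching to a revoked buffer is refused. */
int model_attach(struct model_dmabuf *buf, struct model_attachment *a)
{
	assert(buf && a);
	if (buf->revoked)
		return -ENODEV;
	if (buf->nr_attach == MODEL_MAX_ATTACH)
		return -ENOSPC;
	buf->attach[buf->nr_attach++] = a;
	return 0;
}

/* Mapping is refused as well once access has been revoked. */
int model_map(struct model_dmabuf *buf, struct model_attachment *a)
{
	if (buf->revoked)
		return -ENODEV;
	a->mapped = true;
	return 0;
}

/* Importer side: drop the mapping when the exporter says so. */
void model_move_notify(struct model_attachment *a)
{
	a->mapped = false;
	a->invalidations++;
}

/* Exporter side: in this model, called on device close or reset. */
void model_revoke(struct model_dmabuf *buf)
{
	size_t i;

	if (buf->revoked)
		return;

	buf->revoked = true;
	for (i = 0; i < buf->nr_attach; i++)
		model_move_notify(buf->attach[i]);
}
```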
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
Leon Romanovsky (8):
PCI/P2PDMA: Remove redundant bus_offset from map state
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory
allocation
PCI/P2PDMA: Export pci_p2pdma_map_type() function
types: move phys_vec definition to common header
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature
functions
block/blk-mq-dma.c | 7 +-
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 154 ++++++++----
drivers/vfio/pci/Kconfig | 20 ++
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/vfio_pci_config.c | 22 +-
drivers/vfio/pci/vfio_pci_core.c | 59 +++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 390 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 ++
drivers/vfio/vfio_main.c | 2 +
include/linux/dma-buf.h | 1 +
include/linux/pci-p2pdma.h | 114 +++++----
include/linux/types.h | 5 +
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 4 +
include/uapi/linux/vfio.h | 25 ++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
18 files changed, 715 insertions(+), 125 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
--
2.50.1
Hi Ling,
kernel test robot noticed the following build warnings:
[auto build test WARNING on char-misc/char-misc-testing]
[also build test WARNING on char-misc/char-misc-next char-misc/char-misc-linus linus/master v6.16 next-20250806]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ling-Xu/misc-fastrpc-Save-ac…
base: char-misc/char-misc-testing
patch link: https://lore.kernel.org/r/20250806115114.688814-5-quic_lxu5%40quicinc.com
patch subject: [PATCH v2 4/4] misc: fastrpc: Skip reference for DMA handles
config: hexagon-randconfig-002-20250807 (https://download.01.org/0day-ci/archive/20250807/202508070731.S30957lV-lkp@…)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 7b8dea265e72c3037b6b1e54d5ab51b7e14f328b)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250807/202508070731.S30957lV-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508070731.S30957lV-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/misc/fastrpc.c:368:30: warning: unused variable 'sess' [-Wunused-variable]
368 | struct fastrpc_session_ctx *sess = fl->sctx;
| ^~~~
1 warning generated.
vim +/sess +368 drivers/misc/fastrpc.c
c68cfb718c8f97 Srinivas Kandagatla 2019-02-08 363
8f6c1d8c4f0cc3 Vamsi Krishna Gattupalli 2022-02-14 364
8f6c1d8c4f0cc3 Vamsi Krishna Gattupalli 2022-02-14 365 static int fastrpc_map_lookup(struct fastrpc_user *fl, int fd,
1922c68c56c660 Ling Xu 2025-08-06 366 struct fastrpc_map **ppmap)
c68cfb718c8f97 Srinivas Kandagatla 2019-02-08 367 {
9446fa1683a7e3 Abel Vesa 2022-11-24 @368 struct fastrpc_session_ctx *sess = fl->sctx;
c68cfb718c8f97 Srinivas Kandagatla 2019-02-08 369 struct fastrpc_map *map = NULL;
d259063578ed76 Ling Xu 2025-08-06 370 struct dma_buf *buf;
9446fa1683a7e3 Abel Vesa 2022-11-24 371 int ret = -ENOENT;
c68cfb718c8f97 Srinivas Kandagatla 2019-02-08 372
d259063578ed76 Ling Xu 2025-08-06 373 buf = dma_buf_get(fd);
d259063578ed76 Ling Xu 2025-08-06 374 if (IS_ERR(buf))
d259063578ed76 Ling Xu 2025-08-06 375 return PTR_ERR(buf);
d259063578ed76 Ling Xu 2025-08-06 376
9446fa1683a7e3 Abel Vesa 2022-11-24 377 spin_lock(&fl->lock);
c68cfb718c8f97 Srinivas Kandagatla 2019-02-08 378 list_for_each_entry(map, &fl->maps, node) {
d259063578ed76 Ling Xu 2025-08-06 379 if (map->fd != fd || map->buf != buf)
9446fa1683a7e3 Abel Vesa 2022-11-24 380 continue;
9446fa1683a7e3 Abel Vesa 2022-11-24 381
9446fa1683a7e3 Abel Vesa 2022-11-24 382 *ppmap = map;
9446fa1683a7e3 Abel Vesa 2022-11-24 383 ret = 0;
9446fa1683a7e3 Abel Vesa 2022-11-24 384 break;
c68cfb718c8f97 Srinivas Kandagatla 2019-02-08 385 }
9446fa1683a7e3 Abel Vesa 2022-11-24 386 spin_unlock(&fl->lock);
8f6c1d8c4f0cc3 Vamsi Krishna Gattupalli 2022-02-14 387
8f6c1d8c4f0cc3 Vamsi Krishna Gattupalli 2022-02-14 388 return ret;
8f6c1d8c4f0cc3 Vamsi Krishna Gattupalli 2022-02-14 389 }
8f6c1d8c4f0cc3 Vamsi Krishna Gattupalli 2022-02-14 390
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Mon, Aug 04, 2025 at 10:10:32AM -0400, Benjamin LaHaise wrote:
> FYI: this entire patch series was rejected as spam by large numbers of
> linux-mm subscribers using @gmail.com email addresses.
Thanks for the heads-up. Are you aware of any issues from my side?
I'm sending patches with git-send-email through mail.kernel.org SMTP.
Thanks
>
> -ben (owner-linux-mm)
From: Leon Romanovsky <leonro(a)nvidia.com>
---------------------------------------------------------------------------
Based on blk and DMA patches which will be sent during coming merge window.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO regions
from PCI device BARs as dma-buf objects, enabling safe sharing of non-struct
page memory with controlled lifetime management. This allows RDMA and other
subsystems to import dma-buf FDs and build them into memory regions for PCI
P2P operations.
The series supports a use case for SPDK where an NVMe device will be owned
by SPDK through VFIO but interacting with an RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
This is based on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
Leon Romanovsky (8):
PCI/P2PDMA: Remove redundant bus_offset from map state
PCI/P2PDMA: Introduce p2pdma_provider structure for cleaner
abstraction
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory
allocation
PCI/P2PDMA: Export pci_p2pdma_map_type() function
types: move phys_vec definition to common header
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature
functions
block/blk-mq-dma.c | 7 +-
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 144 +++++++++----
drivers/vfio/pci/Kconfig | 20 ++
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/vfio_pci_config.c | 22 +-
drivers/vfio/pci/vfio_pci_core.c | 59 ++++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 +++
drivers/vfio/vfio_main.c | 2 +
include/linux/dma-buf.h | 1 +
include/linux/pci-p2pdma.h | 114 +++++-----
include/linux/types.h | 5 +
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 4 +
include/uapi/linux/vfio.h | 19 ++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
18 files changed, 631 insertions(+), 124 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
--
2.50.1
On Tue, Jul 29, 2025 at 02:54:13PM -0600, Logan Gunthorpe wrote:
>
>
> On 2025-07-28 17:11, Jason Gunthorpe wrote:
> >> If the dma mapping for P2P memory doesn't need to create an iommu
> >> mapping then that's fine. But it should be the dma-iommu layer to decide
> >> that.
> >
> > So above, we can't use dma-iommu.c, it might not be compiled into the
> > kernel but the dma_map_phys() path is still valid.
>
> This is an easily solved problem. I did a very rough sketch below to say
> it's really not that hard. (Note it has some rough edges that could be
> cleaned up and I based it off Leon's git repo which appears to not be
> the same as what was posted, but the core concept is sound).
I started to prepare v2, which is why the posted version is slightly
different from the dmabuf-vfio branch.
In addition to what Jason wrote, there is extra complexity in using
state. The wrappers which operate on dma_iova_state assume that all the
memory to be mapped is of the same type: either p2p or not.
This is not the case for HMM/RDMA users: there you create the state in
advance and get pages of mixed types.
Thanks
On Tue, Jul 29, 2025 at 02:54:13PM -0600, Logan Gunthorpe wrote:
>
>
> On 2025-07-28 17:11, Jason Gunthorpe wrote:
> >> If the dma mapping for P2P memory doesn't need to create an iommu
> >> mapping then that's fine. But it should be the dma-iommu layer to decide
> >> that.
> >
> > So above, we can't use dma-iommu.c, it might not be compiled into the
> > kernel but the dma_map_phys() path is still valid.
>
> This is an easily solved problem. I did a very rough sketch below to say
> it's really not that hard. (Note it has some rough edges that could be
> cleaned up and I based it off Leon's git repo which appears to not be
> the same as what was posted, but the core concept is sound).
I did hope for something like this in the early days, but it proved
not so easy to get agreements on details :(
My feeling was we should get some actual examples of using this thing
and then it is far easier to discuss ideas, like yours here, to
improve it. Many of the discussions kind of got confused without
enough actual user code for everyone to refer to.
For instance the nvme use case is a big driver for the API design, and
it is quite different from these simpler flows, this idea needs to see
how it would work there.
Maybe this idea could also have provider = NULL meaning it is CPU
cachable memory?
> +static inline void dma_iova_try_alloc_p2p(struct p2pdma_provider *provider,
> + struct device *dev, struct dma_iova_state *state, phys_addr_t phys,
> + size_t size)
> +{
> +}
Can't be empty - PCI_P2PDMA_MAP_THRU_HOST_BRIDGE vs
PCI_P2PDMA_MAP_BUS_ADDR still matters so it still must set
dma_iova_state::bus_addr to get dma_map_phys_prealloc() to do the
right thing.
Still, it would make sense to put something like that in dma/mapping.c
and rely on the static inline stub for dma_iova_try_alloc().
> for (i = 0; i < priv->nr_ranges; i++) {
> - if (!state) {
> - addr = pci_p2pdma_bus_addr_map(provider,
> - phys_vec[i].paddr);
> - } else if (dma_use_iova(state)) {
> - ret = dma_iova_link(attachment->dev, state,
> - phys_vec[i].paddr, 0,
> - phys_vec[i].len, dir, attrs);
> - if (ret)
> - goto err_unmap_dma;
> -
> - mapped_len += phys_vec[i].len;
> - } else {
> - addr = dma_map_phys(attachment->dev, phys_vec[i].paddr,
> - phys_vec[i].len, dir, attrs);
> - ret = dma_mapping_error(attachment->dev, addr);
> - if (ret)
> - goto err_unmap_dma;
> - }
> + addr = dma_map_phys_prealloc(attachment->dev, phys_vec[i].paddr,
> + phys_vec[i].len, dir, attrs, state,
> + provider);
There was a draft of something like this at some point. The
DMA_MAPPING_USE_IOVA is a new twist though
> #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
> struct dma_iova_state {
> dma_addr_t addr;
> u64 __size;
> + bool bus_addr;
> };
Growing this structure has been strongly pushed back on. This probably
can be solved in some other way, a bitfield on size perhaps.
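To make the "bitfield on size" idea concrete, here is a userspace
sketch, under the assumption that a high bit of the 64-bit size field
can be spared for a flag. The model_* names and the choice of bit 62
are invented for illustration and are not taken from any posted patch.
The point is only that the structure stays at two 64-bit words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit stolen from the size field. */
#define MODEL_IOVA_BUS_ADDR	(1ULL << 62)
#define MODEL_IOVA_FLAGS	MODEL_IOVA_BUS_ADDR

struct model_iova_state {
	uint64_t addr;
	uint64_t size_and_flags;
};

/* Store a size while preserving whatever flags are already set. */
void model_set_size(struct model_iova_state *s, uint64_t size)
{
	assert(!(size & MODEL_IOVA_FLAGS));
	s->size_and_flags = (s->size_and_flags & MODEL_IOVA_FLAGS) | size;
}

/* Read the size back with the flag bits masked out. */
uint64_t model_size(const struct model_iova_state *s)
{
	return s->size_and_flags & ~MODEL_IOVA_FLAGS;
}

void model_set_bus_addr(struct model_iova_state *s, bool on)
{
	if (on)
		s->size_and_flags |= MODEL_IOVA_BUS_ADDR;
	else
		s->size_and_flags &= ~MODEL_IOVA_BUS_ADDR;
}

bool model_is_bus_addr(const struct model_iova_state *s)
{
	return s->size_and_flags & MODEL_IOVA_BUS_ADDR;
}
```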
> +dma_addr_t dma_map_phys_prealloc(struct device *dev, phys_addr_t phys,
> size_t size,
> + enum dma_data_direction dir, unsigned long attrs,
> + struct dma_iova_state *state, struct p2pdma_provider *provider)
> +{
> + int ret;
> +
> + if (state->bus_addr)
> + return pci_p2pdma_bus_addr_map(provider, phys);
> +
> + if (dma_use_iova(state)) {
> + ret = dma_iova_link(dev, state, phys, 0, size, dir, attrs);
> + if (ret)
> + return DMA_MAPPING_ERROR;
> +
> + return DMA_MAPPING_USE_IOVA;
> + }
> +
> + return dma_map_phys(dev, phys, size, dir, attrs);
> +}
> +EXPORT_SYMBOL_GPL(dma_map_phys_prealloc);
I would be tempted to inline this
Overall, yeah I would certainly welcome improvements like this if
everyone can agree, but I'd really like to see nvme merged before we
start working on ideas. That way the proposal can be properly
evaluated by all the stakeholders.
Jason
On Mon, Jul 28, 2025 at 11:07:34AM -0600, Logan Gunthorpe wrote:
>
>
> On 2025-07-28 10:41, Leon Romanovsky wrote:
> > On Mon, Jul 28, 2025 at 10:12:31AM -0600, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2025-07-27 13:05, Jason Gunthorpe wrote:
> >>> On Fri, Jul 25, 2025 at 10:30:46AM -0600, Logan Gunthorpe wrote:
> >>>>
> >>>>
> >>>> On 2025-07-24 02:13, Leon Romanovsky wrote:
> >>>>> On Thu, Jul 24, 2025 at 10:03:13AM +0200, Christoph Hellwig wrote:
> >>>>>> On Wed, Jul 23, 2025 at 04:00:06PM +0300, Leon Romanovsky wrote:
> >>>>>>> From: Leon Romanovsky <leonro(a)nvidia.com>
> >>>>>>>
> >>>>>>> Export the pci_p2pdma_map_type() function to allow external modules
> >>>>>>> and subsystems to determine the appropriate mapping type for P2PDMA
> >>>>>>> transfers between a provider and target device.
> >>>>>>
> >>>>>> External modules have no business doing this.
> >>>>>
> >>>>> VFIO PCI code is built as module. There is no way to access PCI p2p code
> >>>>> without exporting functions in it.
> >>>>
> >>>> The solution that would make more sense to me would be for either
> >>>> dma_iova_try_alloc() or another helper in dma-iommu.c to handle the
> >>>> P2PDMA case.
> >>>
> >>> This has nothing to do with dma-iommu.c, the decisions here still need
> >>> to be made even if dma-iommu.c is not compiled in.
> >>
> >> Doesn't it though? Every single call in patch 10 to the newly exported
> >> PCI functions calls into the the dma-iommu functions.
Patch 10 has lots of flows, only one will end up in dma-iommu.c
vfio_pci_dma_buf_map() calls pci_p2pdma_bus_addr_map(),
dma_iova_link(), dma_map_phys().
Only dma_iova_link() would call into dma-iommu.c - if dma_map_phys() is
called we know that dma-iommu.c won't be called by it.
> >> If there were non-iommu paths then I would expect the code would
> >> use the regular DMA api directly which would then call in to
> >> dma-iommu.
> >
> > If p2p type is PCI_P2PDMA_MAP_BUS_ADDR, there will no dma-iommu and DMA
> > at all.
>
> I understand that and it is completely beside my point.
>
> If the dma mapping for P2P memory doesn't need to create an iommu
> mapping then that's fine. But it should be the dma-iommu layer to decide
> that.
So above, we can't use dma-iommu.c, it might not be compiled into the
kernel but the dma_map_phys() path is still valid.
> It's not a decision that should be made by every driver doing this
> kind of thing.
Sort of, I think we are trying to get to some place where there are
subsystem, or at least data structure specific helpers that do this
(ie nvme has BIO helpers), but the helpers should be running this
logic directly for performance. Leon hasn't done it but I think we
should see helpers for DMABUF too encapsulating the logic shown in
patch 10. I think we need to prove out these basic points first
before trying to go and convert a bunch of GPU drivers.
The vfio in patch 10 is not the full example since it only has a
single "scatter/gather" effectively, but the generalized version loops
over pci_p2pdma_bus_addr_map(), dma_iova_link(), dma_map_phys() for
each page.
Part of the new API design is to only do one kind of mapping operation
at once, and part of the design is we know that the P2P type is fixed.
It makes no performance sense to check the type inside the
pci_p2pdma_bus_addr_map()/ dma_iova_link()/dma_map_phys() within the
per-page loop.
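A userspace sketch of that structure, for illustration only: the
mapping kind is chosen once and each kind then runs its own tight loop,
so nothing is re-checked per range. The model_* names are invented, the
three loop bodies are trivial stand-ins for pci_p2pdma_bus_addr_map(),
dma_iova_link() and dma_map_phys(), and the bus offset is an arbitrary
number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary offset standing in for a PCI bus address translation. */
#define MODEL_BUS_OFFSET 0x10000000ULL

enum model_map_type {
	MODEL_MAP_BUS_ADDR,	/* stand-in for pci_p2pdma_bus_addr_map() */
	MODEL_MAP_IOVA,		/* stand-in for dma_iova_link() */
	MODEL_MAP_PHYS,		/* stand-in for dma_map_phys() */
};

struct model_range {
	uint64_t paddr;
	uint64_t len;
};

/*
 * Map @n ranges into @out. The type is tested once, outside the
 * loops, because it is fixed for the whole operation.
 */
void model_map_ranges(enum model_map_type type,
		      const struct model_range *r, size_t n,
		      uint64_t iova_base, uint64_t *out)
{
	uint64_t off = 0;
	size_t i;

	assert(r && out);

	switch (type) {
	case MODEL_MAP_BUS_ADDR:
		for (i = 0; i < n; i++)
			out[i] = r[i].paddr + MODEL_BUS_OFFSET;
		break;
	case MODEL_MAP_IOVA:
		/* Ranges are linked back to back in one IOVA window. */
		for (i = 0; i < n; i++) {
			out[i] = iova_base + off;
			off += r[i].len;
		}
		break;
	case MODEL_MAP_PHYS:
		for (i = 0; i < n; i++)
			out[i] = r[i].paddr;
		break;
	}
}
```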
I do think some level of abstraction has been lost here in pursuit of
performance. If someone does have a better way to structure this
without a performance hit then fantastic, but that's going back and
revising the new DMA API. This just builds on top of that, and yes, it
is not so abstract.
Jason
On Mon, Jul 28, 2025 at 10:12:31AM -0600, Logan Gunthorpe wrote:
>
>
> On 2025-07-27 13:05, Jason Gunthorpe wrote:
> > On Fri, Jul 25, 2025 at 10:30:46AM -0600, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2025-07-24 02:13, Leon Romanovsky wrote:
> >>> On Thu, Jul 24, 2025 at 10:03:13AM +0200, Christoph Hellwig wrote:
> >>>> On Wed, Jul 23, 2025 at 04:00:06PM +0300, Leon Romanovsky wrote:
> >>>>> From: Leon Romanovsky <leonro(a)nvidia.com>
> >>>>>
> >>>>> Export the pci_p2pdma_map_type() function to allow external modules
> >>>>> and subsystems to determine the appropriate mapping type for P2PDMA
> >>>>> transfers between a provider and target device.
> >>>>
> >>>> External modules have no business doing this.
> >>>
> >>> VFIO PCI code is built as module. There is no way to access PCI p2p code
> >>> without exporting functions in it.
> >>
> >> The solution that would make more sense to me would be for either
> >> dma_iova_try_alloc() or another helper in dma-iommu.c to handle the
> >> P2PDMA case.
> >
> > This has nothing to do with dma-iommu.c, the decisions here still need
> > to be made even if dma-iommu.c is not compiled in.
>
> Doesn't it though? Every single call in patch 10 to the newly exported
> PCI functions calls into the the dma-iommu functions. If there were
> non-iommu paths then I would expect the code would use the regular DMA
> api directly which would then call in to dma-iommu.
If p2p type is PCI_P2PDMA_MAP_BUS_ADDR, there will be no dma-iommu and
DMA at all.
+static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attachment)
+{
+ struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+ if (!attachment->peer2peer)
+ return -EOPNOTSUPP;
+
+ if (priv->revoked)
+ return -ENODEV;
+
+ switch (pci_p2pdma_map_type(priv->vdev->provider, attachment->dev)) {
+ case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+ break;
+ case PCI_P2PDMA_MAP_BUS_ADDR:
+ /*
+ * There is no need in IOVA at all for this flow.
+ * We rely on attachment->priv == NULL as a marker
+ * for this mode.
+ */
+ return 0;
+ default:
+ return -EINVAL;
+ }
+
+ attachment->priv = kzalloc(sizeof(struct dma_iova_state), GFP_KERNEL);
+ if (!attachment->priv)
+ return -ENOMEM;
+
+ dma_iova_try_alloc(attachment->dev, attachment->priv, 0, priv->phys_vec.len);
+ return 0;
+}
We've discussed a number of times how some heap names are bad, but
not really what makes a good heap name.
Let's document what we expect the heap names to look like.
Reviewed-by: Andrew Davis <afd(a)ti.com>
Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v4:
- Dropped *all* the cacheable mentions
- Link to v3: https://lore.kernel.org/r/20250717-dma-buf-heap-names-doc-v3-1-d2dbb4b95ef6…
Changes in v3:
- Grammar, spelling fixes
- Remove the cacheable / uncacheable name suggestion
- Link to v2: https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf…
Changes in v2:
- Added justifications for each requirement / suggestions
- Added a mention and example of buffer attributes
- Link to v1: https://lore.kernel.org/r/20250520-dma-buf-heap-names-doc-v1-1-ab31f74809ee…
---
Documentation/userspace-api/dma-buf-heaps.rst | 35 +++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
index 535f49047ce6450796bf4380c989e109355efc05..1ced2720f929432661182f1a3a88aa1ff80bd6af 100644
--- a/Documentation/userspace-api/dma-buf-heaps.rst
+++ b/Documentation/userspace-api/dma-buf-heaps.rst
@@ -21,5 +21,40 @@ following heaps:
usually created either through the kernel commandline through the
`cma` parameter, a memory region Device-Tree node with the
`linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or
`CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it
might be called ``reserved``, ``linux,cma``, or ``default-pool``.
+
+Naming Convention
+=================
+
+``dma-buf`` heap names should meet a number of constraints:
+
+- The name must be stable, and must not change from one version to the other.
+ Userspace identifies heaps by their name, so if the names ever change, we
+ would be likely to introduce regressions.
+
+- The name must describe the memory region the heap will allocate from, and
+ must uniquely identify it in a given platform. Since userspace applications
+ use the heap name as the discriminant, it must be able to tell which heap it
+ wants to use reliably if there's multiple heaps.
+
+- The name must not mention implementation details, such as the allocator. The
+ heap driver will change over time, and implementation details when it was
+ introduced might not be relevant in the future.
+
+- The name should describe properties of the buffers that would be allocated.
+ Doing so will make heap identification easier for userspace. Such properties
+ are:
+
+ - ``contiguous`` for physically contiguous buffers;
+
+ - ``protected`` for encrypted buffers not accessible to the OS;
+
+- The name may describe intended usage. Doing so will make heap identification
+ easier for userspace applications and users.
+
+For example, assuming a platform with a reserved memory region located
+at the RAM address 0x42000000, intended to allocate video framebuffers,
+physically contiguous, and backed by the CMA kernel allocator, good
+names would be ``memory@42000000-contiguous`` or ``video@42000000``, but
+``cma-video`` wouldn't.
---
base-commit: 038d61fd642278bab63ee8ef722c50d10ab01e8f
change-id: 20250520-dma-buf-heap-names-doc-31261aa0cfe6
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>