Changelog:
v3:
* Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
* Cache provider in vfio_pci_dma_buf struct instead of BAR index.
* Removed misleading comment from pcim_p2pdma_provider().
* Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
* Added extra patch which adds new CONFIG, so next patches can reuse it.
* Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
into the other patch.
* Fixed revoke calls to be aligned with true->false semantics.
* Extended p2pdma_providers to be per-BAR and not global to whole device.
* Fixed possible race between dmabuf states and revoke.
* Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Returned support for multiple DMA ranges per-DMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.
The series supports a use case for SPDK where an NVMe device will be
owned by SPDK through VFIO while interacting with an RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can also be used by iommufd for generic
and safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
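The revoke behaviour described above can be pictured with a small userspace
model. This is only a sketch under stated assumptions: every name in it
(exported_region, importer, region_map, region_set_revoked,
importer_move_notify) is hypothetical and does not come from the series,
and it models a single importer with no locking.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical importer state; not a kernel structure. */
struct importer {
	bool mapped;        /* importer currently holds a mapping */
	int invalidations;  /* times it was asked to drop its mapping */
};

/* Hypothetical exported MMIO region with one attached importer. */
struct exported_region {
	bool revoked;
	struct importer *imp;
};

/* Stand-in for a move notification: the importer drops its mapping. */
void importer_move_notify(struct importer *imp)
{
	imp->mapped = false;
	imp->invalidations++;
}

/* Mapping is refused while the region is revoked. */
int region_map(struct exported_region *r)
{
	if (r->revoked)
		return -ENODEV;
	r->imp->mapped = true;
	return 0;
}

/* Flip the revoked state; on a false->true change, tell the importer. */
void region_set_revoked(struct exported_region *r, bool revoked)
{
	if (r->revoked == revoked)
		return;
	r->revoked = revoked;
	if (revoked && r->imp->mapped)
		importer_move_notify(r->imp);
}
```

In this model a device close or reset would call region_set_revoked(r, true),
after which region_map() fails until the region is made available again.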
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
The series was originally based on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but has been heavily rewritten to be based on the DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
Leon Romanovsky (8):
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory
allocation
PCI/P2PDMA: Export pci_p2pdma_map_type() function
types: move phys_vec definition to common header
vfio/pci: Add dma-buf export config for MMIO regions
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature
functions
block/blk-mq-dma.c | 7 +-
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 176 +++++++++----
drivers/vfio/pci/Kconfig | 20 ++
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/vfio_pci_config.c | 22 +-
drivers/vfio/pci/vfio_pci_core.c | 58 +++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 394 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 ++
drivers/vfio/vfio_main.c | 2 +
include/linux/pci-p2pdma.h | 115 +++++----
include/linux/types.h | 5 +
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 4 +
include/uapi/linux/vfio.h | 25 ++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
17 files changed, 741 insertions(+), 124 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
--
2.51.0
We've discussed a number of times how some heap names are bad, but
not really what makes a good heap name.
Let's document what we expect the heap names to look like.
Reviewed-by: Andrew Davis <afd(a)ti.com>
Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v4:
- Dropped *all* the cacheable mentions
- Link to v3: https://lore.kernel.org/r/20250717-dma-buf-heap-names-doc-v3-1-d2dbb4b95ef6…
Changes in v3:
- Grammar, spelling fixes
- Remove the cacheable / uncacheable name suggestion
- Link to v2: https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf…
Changes in v2:
- Added justifications for each requirement / suggestions
- Added a mention and example of buffer attributes
- Link to v1: https://lore.kernel.org/r/20250520-dma-buf-heap-names-doc-v1-1-ab31f74809ee…
---
Documentation/userspace-api/dma-buf-heaps.rst | 35 +++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
index 535f49047ce6450796bf4380c989e109355efc05..1ced2720f929432661182f1a3a88aa1ff80bd6af 100644
--- a/Documentation/userspace-api/dma-buf-heaps.rst
+++ b/Documentation/userspace-api/dma-buf-heaps.rst
@@ -21,5 +21,40 @@ following heaps:
usually created either through the kernel commandline through the
`cma` parameter, a memory region Device-Tree node with the
`linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or
`CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it
might be called ``reserved``, ``linux,cma``, or ``default-pool``.
+
+Naming Convention
+=================
+
+``dma-buf`` heap names should meet a number of constraints:
+
+- The name must be stable, and must not change from one version to the other.
+ Userspace identifies heaps by their name, so if the names ever change, we
+ would be likely to introduce regressions.
+
+- The name must describe the memory region the heap will allocate from, and
+ must uniquely identify it in a given platform. Since userspace applications
+ use the heap name as the discriminant, they must be able to tell which heap
+ they want to use reliably if there are multiple heaps.
+
+- The name must not mention implementation details, such as the allocator. The
+ heap driver will change over time, and implementation details when it was
+ introduced might not be relevant in the future.
+
+- The name should describe properties of the buffers that would be allocated.
+ Doing so will make heap identification easier for userspace. Such properties
+ are:
+
+ - ``contiguous`` for physically contiguous buffers;
+
+ - ``protected`` for encrypted buffers not accessible to the OS;
+
+- The name may describe intended usage. Doing so will make heap identification
+ easier for userspace applications and users.
+
+For example, assuming a platform with a reserved memory region located
+at the RAM address 0x42000000, intended to allocate video framebuffers,
+physically contiguous, and backed by the CMA kernel allocator, good
+names would be ``memory@42000000-contiguous`` or ``video@42000000``, but
+``cma-video`` wouldn't.
---
base-commit: 038d61fd642278bab63ee8ef722c50d10ab01e8f
change-id: 20250520-dma-buf-heap-names-doc-31261aa0cfe6
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
On 16.09.25 13:58, Pierre-Eric Pelloux-Prayer wrote:
>
>
> On 16/09/2025 at 12:52, Christian König wrote:
>> On 16.09.25 11:46, Pierre-Eric Pelloux-Prayer wrote:
>>>
>>>
>>> On 16/09/2025 at 11:25, Christian König wrote:
>>>> On 16.09.25 09:08, Pierre-Eric Pelloux-Prayer wrote:
>>>>> amdgpu_ttm_copy_mem_to_mem has a single caller, make sure the out
>>>>> fence is non-NULL to simplify the code.
>>>>> Since none of the pointers should be NULL, we can enable
>>>>> __attribute__((nonnull)).
>>>>>
>>>>> While at it make the function static since it's only used from
>>>>> amdgpu_ttm.c.
>>>>>
>>>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 ++++++++---------
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 6 ------
>>>>> 2 files changed, 8 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index 27ab4e754b2a..70b817b5578d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -284,12 +284,13 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
>>>>> * move and different for a BO to BO copy.
>>>>> *
>>>>> */
>>>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> - const struct amdgpu_copy_mem *src,
>>>>> - const struct amdgpu_copy_mem *dst,
>>>>> - uint64_t size, bool tmz,
>>>>> - struct dma_resv *resv,
>>>>> - struct dma_fence **f)
>>>>> +__attribute__((nonnull))
>>>>
>>>> That looks fishy.
>>>>
>>>>> +static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> + const struct amdgpu_copy_mem *src,
>>>>> + const struct amdgpu_copy_mem *dst,
>>>>> + uint64_t size, bool tmz,
>>>>> + struct dma_resv *resv,
>>>>> + struct dma_fence **f)
>>>>
>>>> I'm not an expert on those, but looking at other examples, the attribute should go here and look something like:
>>>>
>>>> __attribute__((nonnull(7)))
>>>
>>> Both syntaxes are valid. The GCC docs say:
>>>
>>> If no arg-index is given to the nonnull attribute, all pointer arguments are marked as non-null
>>
>> Never seen that before. Is that GCC-specific or standardized?
>
> clang supports it:
>
> https://clang.llvm.org/docs/AttributeReference.html#id10
>
> And both syntaxes are already used in the drm subtree by i915.
Ok in that case Reviewed-by: Christian König <christian.koenig(a)amd.com>.
Regards,
Christian.
>
> Pierre-Eric
>
>>
>>>
>>>
>>>>
>>>> But I think in this case it is also not a must-have.
>>>
>>> I can remove it if you prefer, but it doesn't hurt to have the compiler validate usage of the functions.
>>
>> Yeah it's clearly useful, but I'm worried that clang won't like it.
>>
>> Christian.
>>
>>>
>>> Pierre-Eric
>>>
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> {
>>>>> struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>>>> struct amdgpu_res_cursor src_mm, dst_mm;
>>>>> @@ -363,9 +364,7 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> }
>>>>> error:
>>>>> mutex_unlock(&adev->mman.gtt_window_lock);
>>>>> - if (f)
>>>>> - *f = dma_fence_get(fence);
>>>>> - dma_fence_put(fence);
>>>>> + *f = fence;
>>>>> return r;
>>>>> }
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> index bb17987f0447..07ae2853c77c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> @@ -170,12 +170,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>>>>> struct dma_resv *resv,
>>>>> struct dma_fence **fence, bool direct_submit,
>>>>> bool vm_needs_flush, uint32_t copy_flags);
>>>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> - const struct amdgpu_copy_mem *src,
>>>>> - const struct amdgpu_copy_mem *dst,
>>>>> - uint64_t size, bool tmz,
>>>>> - struct dma_resv *resv,
>>>>> - struct dma_fence **f);
>>>>> int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>>>> struct dma_resv *resv,
>>>>> struct dma_fence **fence);
>>
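The two spellings of the nonnull attribute discussed in this thread can be
compared in a standalone sketch. The function names below are invented for
illustration and are unrelated to amdgpu. The index form counts parameters
from 1, which is why nonnull(7) in the review comment would refer to the
seventh parameter, f.

```c
#include <stddef.h>

/* No index list: every pointer parameter (dst and src) is marked non-null. */
__attribute__((nonnull))
int copy_all_checked(int *dst, const int *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
	return 0;
}

/*
 * Index list: only parameter 3 (out) is marked non-null, so vals may
 * still legitimately be NULL and is checked at run time.
 */
__attribute__((nonnull(3)))
int sum_into(const int *vals, size_t n, int *out)
{
	int total = 0;

	if (vals) {
		for (size_t i = 0; i < n; i++)
			total += vals[i];
	}
	*out = total;
	return 0;
}
```

With either form, GCC and Clang can warn (-Wnonnull) when a literal NULL is
passed for a marked parameter. The compiler may also assume marked
parameters are never NULL, so checks on them inside the function can be
optimized away.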
On 16.09.25 11:46, Pierre-Eric Pelloux-Prayer wrote:
>
>
> On 16/09/2025 at 11:25, Christian König wrote:
>> On 16.09.25 09:08, Pierre-Eric Pelloux-Prayer wrote:
>>> amdgpu_ttm_copy_mem_to_mem has a single caller, make sure the out
>>> fence is non-NULL to simplify the code.
>>> Since none of the pointers should be NULL, we can enable
>>> __attribute__((nonnull)).
>>>
>>> While at it make the function static since it's only used from
>>> amdgpu_ttm.c.
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 ++++++++---------
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 6 ------
>>> 2 files changed, 8 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 27ab4e754b2a..70b817b5578d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -284,12 +284,13 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
>>> * move and different for a BO to BO copy.
>>> *
>>> */
>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> - const struct amdgpu_copy_mem *src,
>>> - const struct amdgpu_copy_mem *dst,
>>> - uint64_t size, bool tmz,
>>> - struct dma_resv *resv,
>>> - struct dma_fence **f)
>>> +__attribute__((nonnull))
>>
>> That looks fishy.
>>
>>> +static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> + const struct amdgpu_copy_mem *src,
>>> + const struct amdgpu_copy_mem *dst,
>>> + uint64_t size, bool tmz,
>>> + struct dma_resv *resv,
>>> + struct dma_fence **f)
>>
>> I'm not an expert on those, but looking at other examples, the attribute should go here and look something like:
>>
>> __attribute__((nonnull(7)))
>
> Both syntaxes are valid. The GCC docs say:
>
> If no arg-index is given to the nonnull attribute, all pointer arguments are marked as non-null
Never seen that before. Is that GCC-specific or standardized?
>
>
>>
>> But I think in this case it is also not a must-have.
>
> I can remove it if you prefer, but it doesn't hurt to have the compiler validate usage of the functions.
Yeah it's clearly useful, but I'm worried that clang won't like it.
Christian.
>
> Pierre-Eric
>
>
>>
>> Regards,
>> Christian.
>>
>>> {
>>> struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>> struct amdgpu_res_cursor src_mm, dst_mm;
>>> @@ -363,9 +364,7 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> }
>>> error:
>>> mutex_unlock(&adev->mman.gtt_window_lock);
>>> - if (f)
>>> - *f = dma_fence_get(fence);
>>> - dma_fence_put(fence);
>>> + *f = fence;
>>> return r;
>>> }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> index bb17987f0447..07ae2853c77c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> @@ -170,12 +170,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>>> struct dma_resv *resv,
>>> struct dma_fence **fence, bool direct_submit,
>>> bool vm_needs_flush, uint32_t copy_flags);
>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> - const struct amdgpu_copy_mem *src,
>>> - const struct amdgpu_copy_mem *dst,
>>> - uint64_t size, bool tmz,
>>> - struct dma_resv *resv,
>>> - struct dma_fence **f);
>>> int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>> struct dma_resv *resv,
>>> struct dma_fence **fence);
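The last hunk of amdgpu_ttm.c in the quoted patch replaces a get-then-put
pair with a plain assignment. A toy refcount model shows why the two leave
the same reference count when the out pointer is non-NULL. The toy_fence
type and helpers below are stand-ins invented for this sketch, not the
kernel's dma_fence API, and nothing is freed when the count hits zero.

```c
#include <stddef.h>

/* Toy stand-in for a refcounted fence; not struct dma_fence. */
struct toy_fence {
	int refcount;
};

struct toy_fence *toy_fence_get(struct toy_fence *fence)
{
	if (fence)
		fence->refcount++;
	return fence;
}

void toy_fence_put(struct toy_fence *fence)
{
	if (fence)
		fence->refcount--;
}

/* Old shape: optional out pointer, take a new reference, drop the local one. */
void finish_with_get_put(struct toy_fence *fence, struct toy_fence **f)
{
	if (f)
		*f = toy_fence_get(fence);
	toy_fence_put(fence);
}

/* New shape: out pointer is mandatory, the local reference moves to the caller. */
void finish_with_move(struct toy_fence *fence, struct toy_fence **f)
{
	*f = fence;
}
```

The only behavioural difference in the model is the NULL case: the old
shape drops the reference when f is NULL, while the new shape requires a
valid f, which is what the nonnull annotation is meant to express.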
Hi,
On Fri, Sep 12, 2025 at 6:07 AM Amirreza Zarrabi
<amirreza.zarrabi(a)oss.qualcomm.com> wrote:
>
> This patch series introduces a Trusted Execution Environment (TEE)
> driver for Qualcomm TEE (QTEE). QTEE enables Trusted Applications (TAs)
> and services to run securely. It uses an object-based interface, where
> each service is an object with sets of operations. Clients can invoke
> these operations on objects, which can generate results, including other
> objects. For example, an object can load a TA and return another object
> that represents the loaded TA, allowing access to its services.
>
[snip]
I'm OK with the TEE patches, Sumit and I have reviewed them.
There were some minor conflicts with other patches I have in the pipe
for this merge window, so this patchset is on top of what I have to
avoid merge conflicts.
However, the firmware patches are for code maintained by Björn.
Björn, how would you like to do this? Can I take them via my tree, or
what do you suggest?
It's urgent to get this patchset into linux-next if it's to make it
for the coming merge window. Ideally, I'd like to send my pull request
to arm-soc during this week.
Cheers,
Jens
>
> ---
> Amirreza Zarrabi (11):
> firmware: qcom: tzmem: export shm_bridge create/delete
> firmware: qcom: scm: add support for object invocation
> tee: allow a driver to allocate a tee_device without a pool
> tee: add close_context to TEE driver operation
> tee: add TEE_IOCTL_PARAM_ATTR_TYPE_UBUF
> tee: add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF
> tee: increase TEE_MAX_ARG_SIZE to 4096
> tee: add Qualcomm TEE driver
> tee: qcom: add primordial object
> tee: qcom: enable TEE_IOC_SHM_ALLOC ioctl
> Documentation: tee: Add Qualcomm TEE driver
>
> Documentation/tee/index.rst | 1 +
> Documentation/tee/qtee.rst | 96 ++++
> MAINTAINERS | 7 +
> drivers/firmware/qcom/qcom_scm.c | 119 ++++
> drivers/firmware/qcom/qcom_scm.h | 7 +
> drivers/firmware/qcom/qcom_tzmem.c | 63 ++-
> drivers/tee/Kconfig | 1 +
> drivers/tee/Makefile | 1 +
> drivers/tee/qcomtee/Kconfig | 12 +
> drivers/tee/qcomtee/Makefile | 9 +
> drivers/tee/qcomtee/async.c | 182 ++++++
> drivers/tee/qcomtee/call.c | 820 +++++++++++++++++++++++++++
> drivers/tee/qcomtee/core.c | 915 +++++++++++++++++++++++++++++++
> drivers/tee/qcomtee/mem_obj.c | 169 ++++++
> drivers/tee/qcomtee/primordial_obj.c | 113 ++++
> drivers/tee/qcomtee/qcomtee.h | 185 +++++++
> drivers/tee/qcomtee/qcomtee_msg.h | 304 ++++++++++
> drivers/tee/qcomtee/qcomtee_object.h | 316 +++++++++++
> drivers/tee/qcomtee/shm.c | 150 +++++
> drivers/tee/qcomtee/user_obj.c | 692 +++++++++++++++++++++++
> drivers/tee/tee_core.c | 127 ++++-
> drivers/tee/tee_private.h | 6 -
> include/linux/firmware/qcom/qcom_scm.h | 6 +
> include/linux/firmware/qcom/qcom_tzmem.h | 15 +
> include/linux/tee_core.h | 54 +-
> include/linux/tee_drv.h | 12 +
> include/uapi/linux/tee.h | 56 +-
> 27 files changed, 4410 insertions(+), 28 deletions(-)
> ---
> base-commit: 8b8aefa5a5c7d4a65883e5653cf12f94c0b68dbf
> change-id: 20241202-qcom-tee-using-tee-ss-without-mem-obj-362c66340527
>
> Best regards,
> --
> Amirreza Zarrabi <amirreza.zarrabi(a)oss.qualcomm.com>
>
Now that we're getting close to reaching the finish line for upstreaming
the rust gem shmem bindings, we've got another batch of patches that
have been reviewed and can be safely pushed to drm-rust-next
independently of the rest of the series.
These patches of course apply against the drm-rust-next branch, and are
part of the gem shmem series, the latest version of which can be found
here:
https://patchwork.freedesktop.org/series/146465/
Lyude Paul (3):
drm/gem/shmem: Extract drm_gem_shmem_init() from
drm_gem_shmem_create()
drm/gem/shmem: Extract drm_gem_shmem_release() from
drm_gem_shmem_free()
rust: Add dma_buf stub bindings
drivers/gpu/drm/drm_gem_shmem_helper.c | 98 ++++++++++++++++++--------
include/drm/drm_gem_shmem_helper.h | 2 +
rust/kernel/dma_buf.rs | 40 +++++++++++
rust/kernel/lib.rs | 1 +
4 files changed, 111 insertions(+), 30 deletions(-)
create mode 100644 rust/kernel/dma_buf.rs
base-commit: cf4fd52e323604ccfa8390917593e1fb965653ee
--
2.51.0