On 16.09.25 13:58, Pierre-Eric Pelloux-Prayer wrote:
>
>
> Le 16/09/2025 à 12:52, Christian König a écrit :
>> On 16.09.25 11:46, Pierre-Eric Pelloux-Prayer wrote:
>>>
>>>
>>> Le 16/09/2025 à 11:25, Christian König a écrit :
>>>> On 16.09.25 09:08, Pierre-Eric Pelloux-Prayer wrote:
>>>>> amdgpu_ttm_copy_mem_to_mem has a single caller, make sure the out
>>>>> fence is non-NULL to simplify the code.
>>>>> Since none of the pointers should be NULL, we can enable
>>>>> __attribute__((nonnull))__.
>>>>>
>>>>> While at it make the function static since it's only used from
>>>>> amdgpuu_ttm.c.
>>>>>
>>>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 ++++++++---------
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 6 ------
>>>>> 2 files changed, 8 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index 27ab4e754b2a..70b817b5578d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -284,12 +284,13 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
>>>>> * move and different for a BO to BO copy.
>>>>> *
>>>>> */
>>>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> - const struct amdgpu_copy_mem *src,
>>>>> - const struct amdgpu_copy_mem *dst,
>>>>> - uint64_t size, bool tmz,
>>>>> - struct dma_resv *resv,
>>>>> - struct dma_fence **f)
>>>>> +__attribute__((nonnull))
>>>>
>>>> That looks fishy.
>>>>
>>>>> +static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> + const struct amdgpu_copy_mem *src,
>>>>> + const struct amdgpu_copy_mem *dst,
>>>>> + uint64_t size, bool tmz,
>>>>> + struct dma_resv *resv,
>>>>> + struct dma_fence **f)
>>>>
>>>> I'm not an expert for those, but looking at other examples that should be here and look something like:
>>>>
>>>> __attribute__((nonnull(7)))
>>>
>>> Both syntax are valid. The GCC docs says:
>>>
>>> If no arg-index is given to the nonnull attribute, all pointer arguments are marked as non-null
>>
>> Never seen that before. Is that gcc specifc or standardized?
>
> clang supports it:
>
> https://clang.llvm.org/docs/AttributeReference.html#id10
>
> And both syntaxes are already used in the drm subtree by i915.
Ok in that case Reviewed-by: Christian König <christian.koenig(a)amd.com>.
Regards,
Christian.
>
> Pierre-Eric
>
>>
>>>
>>>
>>>>
>>>> But I think for this case here it is also not a must have to have that.
>>>
>>> I can remove it if you prefer, but it doesn't hurt to have the compiler validate usage of the functions.
>>
>> Yeah it's clearly useful, but I'm worried that clang won't like it.
>>
>> Christian.
>>
>>>
>>> Pierre-Eric
>>>
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> {
>>>>> struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>>>> struct amdgpu_res_cursor src_mm, dst_mm;
>>>>> @@ -363,9 +364,7 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> }
>>>>> error:
>>>>> mutex_unlock(&adev->mman.gtt_window_lock);
>>>>> - if (f)
>>>>> - *f = dma_fence_get(fence);
>>>>> - dma_fence_put(fence);
>>>>> + *f = fence;
>>>>> return r;
>>>>> }
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> index bb17987f0447..07ae2853c77c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> @@ -170,12 +170,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>>>>> struct dma_resv *resv,
>>>>> struct dma_fence **fence, bool direct_submit,
>>>>> bool vm_needs_flush, uint32_t copy_flags);
>>>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> - const struct amdgpu_copy_mem *src,
>>>>> - const struct amdgpu_copy_mem *dst,
>>>>> - uint64_t size, bool tmz,
>>>>> - struct dma_resv *resv,
>>>>> - struct dma_fence **f);
>>>>> int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>>>> struct dma_resv *resv,
>>>>> struct dma_fence **fence);
>>
On 16.09.25 11:46, Pierre-Eric Pelloux-Prayer wrote:
>
>
> Le 16/09/2025 à 11:25, Christian König a écrit :
>> On 16.09.25 09:08, Pierre-Eric Pelloux-Prayer wrote:
>>> amdgpu_ttm_copy_mem_to_mem has a single caller, make sure the out
>>> fence is non-NULL to simplify the code.
>>> Since none of the pointers should be NULL, we can enable
>>> __attribute__((nonnull))__.
>>>
>>> While at it make the function static since it's only used from
>>> amdgpuu_ttm.c.
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 ++++++++---------
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 6 ------
>>> 2 files changed, 8 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 27ab4e754b2a..70b817b5578d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -284,12 +284,13 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
>>> * move and different for a BO to BO copy.
>>> *
>>> */
>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> - const struct amdgpu_copy_mem *src,
>>> - const struct amdgpu_copy_mem *dst,
>>> - uint64_t size, bool tmz,
>>> - struct dma_resv *resv,
>>> - struct dma_fence **f)
>>> +__attribute__((nonnull))
>>
>> That looks fishy.
>>
>>> +static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> + const struct amdgpu_copy_mem *src,
>>> + const struct amdgpu_copy_mem *dst,
>>> + uint64_t size, bool tmz,
>>> + struct dma_resv *resv,
>>> + struct dma_fence **f)
>>
>> I'm not an expert for those, but looking at other examples that should be here and look something like:
>>
>> __attribute__((nonnull(7)))
>
> Both syntax are valid. The GCC docs says:
>
> If no arg-index is given to the nonnull attribute, all pointer arguments are marked as non-null
Never seen that before. Is that gcc specifc or standardized?
>
>
>>
>> But I think for this case here it is also not a must have to have that.
>
> I can remove it if you prefer, but it doesn't hurt to have the compiler validate usage of the functions.
Yeah it's clearly useful, but I'm worried that clang won't like it.
Christian.
>
> Pierre-Eric
>
>
>>
>> Regards,
>> Christian.
>>
>>> {
>>> struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>> struct amdgpu_res_cursor src_mm, dst_mm;
>>> @@ -363,9 +364,7 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> }
>>> error:
>>> mutex_unlock(&adev->mman.gtt_window_lock);
>>> - if (f)
>>> - *f = dma_fence_get(fence);
>>> - dma_fence_put(fence);
>>> + *f = fence;
>>> return r;
>>> }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> index bb17987f0447..07ae2853c77c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> @@ -170,12 +170,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>>> struct dma_resv *resv,
>>> struct dma_fence **fence, bool direct_submit,
>>> bool vm_needs_flush, uint32_t copy_flags);
>>> -int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> - const struct amdgpu_copy_mem *src,
>>> - const struct amdgpu_copy_mem *dst,
>>> - uint64_t size, bool tmz,
>>> - struct dma_resv *resv,
>>> - struct dma_fence **f);
>>> int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>> struct dma_resv *resv,
>>> struct dma_fence **fence);
Hi,
On Fri, Sep 12, 2025 at 6:07 AM Amirreza Zarrabi
<amirreza.zarrabi(a)oss.qualcomm.com> wrote:
>
> This patch series introduces a Trusted Execution Environment (TEE)
> driver for Qualcomm TEE (QTEE). QTEE enables Trusted Applications (TAs)
> and services to run securely. It uses an object-based interface, where
> each service is an object with sets of operations. Clients can invoke
> these operations on objects, which can generate results, including other
> objects. For example, an object can load a TA and return another object
> that represents the loaded TA, allowing access to its services.
>
[snip]
I'm OK with the TEE patches, Sumit and I have reviewed them.
There were some minor conflicts with other patches I have in the pipe
for this merge window, so this patchset is on top of what I have to
avoid merge conflicts.
However, the firmware patches are for code maintained by Björn.
Björn, how would you like to do this? Can I take them via my tree, or
what do you suggest?
It's urgent to get this patchset into linux-next if it's to make it
for the coming merge window. Ideally, I'd like to send my pull request
to arm-soc during this week.
Cheers,
Jens
>
> ---
> Amirreza Zarrabi (11):
> firmware: qcom: tzmem: export shm_bridge create/delete
> firmware: qcom: scm: add support for object invocation
> tee: allow a driver to allocate a tee_device without a pool
> tee: add close_context to TEE driver operation
> tee: add TEE_IOCTL_PARAM_ATTR_TYPE_UBUF
> tee: add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF
> tee: increase TEE_MAX_ARG_SIZE to 4096
> tee: add Qualcomm TEE driver
> tee: qcom: add primordial object
> tee: qcom: enable TEE_IOC_SHM_ALLOC ioctl
> Documentation: tee: Add Qualcomm TEE driver
>
> Documentation/tee/index.rst | 1 +
> Documentation/tee/qtee.rst | 96 ++++
> MAINTAINERS | 7 +
> drivers/firmware/qcom/qcom_scm.c | 119 ++++
> drivers/firmware/qcom/qcom_scm.h | 7 +
> drivers/firmware/qcom/qcom_tzmem.c | 63 ++-
> drivers/tee/Kconfig | 1 +
> drivers/tee/Makefile | 1 +
> drivers/tee/qcomtee/Kconfig | 12 +
> drivers/tee/qcomtee/Makefile | 9 +
> drivers/tee/qcomtee/async.c | 182 ++++++
> drivers/tee/qcomtee/call.c | 820 +++++++++++++++++++++++++++
> drivers/tee/qcomtee/core.c | 915 +++++++++++++++++++++++++++++++
> drivers/tee/qcomtee/mem_obj.c | 169 ++++++
> drivers/tee/qcomtee/primordial_obj.c | 113 ++++
> drivers/tee/qcomtee/qcomtee.h | 185 +++++++
> drivers/tee/qcomtee/qcomtee_msg.h | 304 ++++++++++
> drivers/tee/qcomtee/qcomtee_object.h | 316 +++++++++++
> drivers/tee/qcomtee/shm.c | 150 +++++
> drivers/tee/qcomtee/user_obj.c | 692 +++++++++++++++++++++++
> drivers/tee/tee_core.c | 127 ++++-
> drivers/tee/tee_private.h | 6 -
> include/linux/firmware/qcom/qcom_scm.h | 6 +
> include/linux/firmware/qcom/qcom_tzmem.h | 15 +
> include/linux/tee_core.h | 54 +-
> include/linux/tee_drv.h | 12 +
> include/uapi/linux/tee.h | 56 +-
> 27 files changed, 4410 insertions(+), 28 deletions(-)
> ---
> base-commit: 8b8aefa5a5c7d4a65883e5653cf12f94c0b68dbf
> change-id: 20241202-qcom-tee-using-tee-ss-without-mem-obj-362c66340527
>
> Best regards,
> --
> Amirreza Zarrabi <amirreza.zarrabi(a)oss.qualcomm.com>
>
Now that we're getting close to reaching the finish line for upstreaming
the rust gem shmem bindings, we've got another batch of patches that
have been reviewed and can be safely pushed to drm-rust-next
independently of the rest of the series.
These patches of course apply against the drm-rust-next branch, and are
part of the gem shmem series, the latest version of which can be found
here:
https://patchwork.freedesktop.org/series/146465/
Lyude Paul (3):
drm/gem/shmem: Extract drm_gem_shmem_init() from
drm_gem_shmem_create()
drm/gem/shmem: Extract drm_gem_shmem_release() from
drm_gem_shmem_free()
rust: Add dma_buf stub bindings
drivers/gpu/drm/drm_gem_shmem_helper.c | 98 ++++++++++++++++++--------
include/drm/drm_gem_shmem_helper.h | 2 +
rust/kernel/dma_buf.rs | 40 +++++++++++
rust/kernel/lib.rs | 1 +
4 files changed, 111 insertions(+), 30 deletions(-)
create mode 100644 rust/kernel/dma_buf.rs
base-commit: cf4fd52e323604ccfa8390917593e1fb965653ee
--
2.51.0
On Thu, 11 Sep 2025 21:07:39 -0700, Amirreza Zarrabi wrote:
> This patch series introduces a Trusted Execution Environment (TEE)
> driver for Qualcomm TEE (QTEE). QTEE enables Trusted Applications (TAs)
> and services to run securely. It uses an object-based interface, where
> each service is an object with sets of operations. Clients can invoke
> these operations on objects, which can generate results, including other
> objects. For example, an object can load a TA and return another object
> that represents the loaded TA, allowing access to its services.
>
> [...]
Applied, thanks!
[01/11] firmware: qcom: tzmem: export shm_bridge create/delete
commit: 8aa1e3a6f0ffbcfdf3bd7d87feb9090f96c54bc4
[02/11] firmware: qcom: scm: add support for object invocation
commit: 4b700098c0fc4a76c5c1e54465c8f35e13755294
Best regards,
--
Bjorn Andersson <andersson(a)kernel.org>
Hi,
This patch set allocates the protected DMA-bufs from a DMA-heap
instantiated from the TEE subsystem.
The TEE subsystem handles the DMA-buf allocations since it is the TEE
(OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QTEE) which sets up the
protection for the memory used for the DMA-bufs.
The DMA-heap uses a protected memory pool provided by the backend TEE
driver, allowing it to choose how to allocate the protected physical
memory.
The allocated DMA-bufs must be imported with a new TEE_IOC_SHM_REGISTER_FD
before they can be passed as arguments when requesting services from the
secure world.
Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
Recording) have been identified so far to serve as examples of what can be
expected. The use-cases have predefined DMA-heap names,
"protected,secure-video", "protected,trusted-ui", and
"protected,secure-video-record". The backend driver registers protected
memory pools for the use-cases it supports.
Each use-case has its own protected memory pool since different use-cases
require isolation from different parts of the system. A protected memory
pool can be based on a static carveout instantiated while probing the TEE
backend driver, or dynamically allocated from CMA (dma_alloc_pages()) and
made protected as needed by the TEE.
This can be tested on a RockPi 4B+ with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m rockpi4.xml \
-b prototype/sdp-v12
repo sync -j8
cd build
make toolchains -j$(nproc)
make all -j$(nproc)
# Copy ../out/rockpi4.img to an SD card and boot the RockPi from that
# Connect a monitor to the RockPi
# login and at the prompt:
gst-launch-1.0 videotestsrc ! \
aesenc key=1f9423681beb9a79215820f6bda73d0f \
iv=e9aa8e834d8d70b7e0d254ff670dd718 serialize-iv=true ! \
aesdec key=1f9423681beb9a79215820f6bda73d0f ! \
kmssink
The aesdec module has been hacked to use an OP-TEE TA to decrypt the stream
into protected DMA-bufs which are consumed by the kmssink.
The primitive QEMU tests from previous patch sets can be tested on RockPi
in the same way using:
xtest --sdp-basic
The primitive tests are tested on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b prototype/sdp-v12
repo sync -j8
cd build
make toolchains -j$(nproc)
make SPMC_AT_EL=1 all -j$(nproc)
make SPMC_AT_EL=1 run-only
# login and at the prompt:
xtest --sdp-basic
The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
S-EL1 inside OP-TEE. The parameter can be changed to SPMC_AT_EL=n to test
without FF-A using the original SMC ABI instead. Please remember to do
%make arm-tf-clean
for TF-A to be rebuilt properly using the new configuration.
https://optee.readthedocs.io/en/latest/building/prerequisites.html
list dependencies required to build the above.
The primitive tests are pretty basic, mostly checking that a Trusted
Application in the secure world can access and manipulate the memory. There
are also some negative tests for out of bounds buffers, etc.
Thanks,
Jens
Changes since V11:
* In "dma-buf: dma-heap: export declared functions":
- use EXPORT_SYMBOL_NS_GPL()
- Added TJ's R-B and Sumit's Ack
* In "tee: implement protected DMA-heap", import the namespaces "DMA_BUF"
and "DMA_BUF_HEAP" as needed.
Changes since V10:
* Changed the new ABI OPTEE_MSG_CMD_GET_PROTMEM_CONFIG to report a list
of u32 memory attributes instead of u16 endpoints to make room for both
endpoint and access permissions in each entry.
* In "tee: new ioctl to a register tee_shm from a dmabuf file descriptor",
remove the unused path for DMA-bufs allocated by other means than the on
in the TEE SS.
* In "tee: implement protected DMA-heap", handle unloading of the
backend driver module implementing the heap. The heap is reference
counted and also calls tee_device_get() to guarantee that the module
remains available while the heap is instantiated.
* In "optee: support protected memory allocation", use
dma_coerce_mask_and_coherent() instead of open-coding the function.
* Added Sumit's R-B to
- "optee: smc abi: dynamic protected memory allocation"
- "optee: FF-A: dynamic protected memory allocation"
- "optee: support protected memory allocation"
- "tee: implement protected DMA-heap"
- "dma-buf: dma-heap: export declared functions"
Changes since V9:
* Adding Sumit's R-B to "optee: sync secure world ABI headers"
* Update commit message as requested for "dma-buf: dma-heap: export
declared functions".
* In "tee: implement protected DMA-heap":
- add the hidden config option TEE_DMABUF_HEAPS to tell if the TEE
subsystem can support DMA heaps
- add a pfn_valid() to check that the passed physical address can be
used by __pfn_to_page() and friends
- remove the memremap() call, the caller is should do that instead if
needed
* In "tee: add tee_shm_alloc_dma_mem()" guard the calls to
dma_alloc_pages() and dma_free_pages() with TEE_DMABUF_HEAPS to avoid
linking errors in some configurations
* In "optee: support protected memory allocation":
- add the hidden config option OPTEE_STATIC_PROTMEM_POOL to tell if the
driver can support a static protected memory pool
- optee_protmem_pool_init() is slightly refactored to make the patches
that follow easier
- Call devm_memremap() before calling tee_protmem_static_pool_alloc()
Changes since V8:
* Using dma_alloc_pages() instead of cma_alloc() so the direct dependency on
CMA can be removed together with the patches
"cma: export cma_alloc() and cma_release()" and
"dma-contiguous: export dma_contiguous_default_area". The patch
* Renaming the patch "tee: add tee_shm_alloc_cma_phys_mem()" to
"tee: add tee_shm_alloc_dma_mem()"
* Setting DMA mask for the OP-TEE TEE device based on input from the secure
world instead of relying on the parent device so following patches are
removed: "tee: tee_device_alloc(): copy dma_mask from parent device" and
"optee: pass parent device to tee_device_alloc()".
* Adding Sumit Garg's R-B to "tee: refactor params_from_user()"
* In the patch "tee: implement protected DMA-heap", map the physical memory
passed to tee_protmem_static_pool_alloc().
Changes since V7:
* Adding "dma-buf: dma-heap: export declared functions",
"cma: export cma_alloc() and cma_release()", and
"dma-contiguous: export dma_contiguous_default_area" to export the symbols
needed to keep the TEE subsystem as a load module.
* Removing CONFIG_TEE_DMABUF_HEAP and CONFIG_TEE_CMA since they aren't
needed any longer.
* Addressing review comments in "optee: sync secure world ABI headers"
* Better align protected memory pool initialization between the smc-abi and
ffa-abi parts of the optee driver.
* Removing the patch "optee: account for direction while converting parameters"
Changes since V6:
* Restricted memory is now known as protected memory since to use the same
term as https://docs.vulkan.org/guide/latest/protected.html. Update all
patches to consistently use protected memory.
* In "tee: implement protected DMA-heap" add the hidden config option
TEE_DMABUF_HEAP to tell if the DMABUF_HEAPS functions are available
for the TEE subsystem
* Adding "tee: refactor params_from_user()", broken out from the patch
"tee: new ioctl to a register tee_shm from a dmabuf file descriptor"
* For "tee: new ioctl to a register tee_shm from a dmabuf file descriptor":
- Update commit message to mention protected memory
- Remove and open code tee_shm_get_parent_shm() in param_from_user_memref()
* In "tee: add tee_shm_alloc_cma_phys_mem" add the hidden config option
TEE_CMA to tell if the CMA functions are available for the TEE subsystem
* For "tee: tee_device_alloc(): copy dma_mask from parent device" and
"optee: pass parent device to tee_device_alloc", added
Reviewed-by: Sumit Garg <sumit.garg(a)kernel.org>
Changes since V5:
* Removing "tee: add restricted memory allocation" and
"tee: add TEE_IOC_RSTMEM_FD_INFO"
* Adding "tee: implement restricted DMA-heap",
"tee: new ioctl to a register tee_shm from a dmabuf file descriptor",
"tee: add tee_shm_alloc_cma_phys_mem()",
"optee: pass parent device to tee_device_alloc()", and
"tee: tee_device_alloc(): copy dma_mask from parent device"
* The two TEE driver OPs "rstmem_alloc()" and "rstmem_free()" are replaced
with a struct tee_rstmem_pool abstraction.
* Replaced the the TEE_IOC_RSTMEM_ALLOC user space API with the DMA-heap API
Changes since V4:
* Adding the patch "tee: add TEE_IOC_RSTMEM_FD_INFO" needed by the
GStreamer demo
* Removing the dummy CPU access and mmap functions from the dma_buf_ops
* Fixing a compile error in "optee: FF-A: dynamic restricted memory allocation"
reported by kernel test robot <lkp(a)intel.com>
Changes since V3:
* Make the use_case and flags field in struct tee_shm u32's instead of
u16's
* Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
* Import namespace DMA_BUF in module tee, reported by lkp(a)intel.com
* Added a note in the commit message for "optee: account for direction
while converting parameters" why it's needed
* Factor out dynamic restricted memory allocation from
"optee: support restricted memory allocation" into two new commits
"optee: FF-A: dynamic restricted memory allocation" and
"optee: smc abi: dynamic restricted memory allocation"
* Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
restricted memory allocate if CMA isn't configured
Changes since the V2 RFC:
* Based on v6.12
* Replaced the flags for SVP and Trusted UID memory with a u32 field with
unique id for each use case
* Added dynamic allocation of restricted memory pools
* Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
* Added support for FF-A with FFA_LEND
Changes since the V1 RFC:
* Based on v6.11
* Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of dma-buf handling is done in
the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
support"
* Replaced the word "secure" with "restricted" where applicable
Etienne Carriere (1):
tee: new ioctl to a register tee_shm from a dmabuf file descriptor
Jens Wiklander (8):
optee: sync secure world ABI headers
dma-buf: dma-heap: export declared functions
tee: implement protected DMA-heap
tee: refactor params_from_user()
tee: add tee_shm_alloc_dma_mem()
optee: support protected memory allocation
optee: FF-A: dynamic protected memory allocation
optee: smc abi: dynamic protected memory allocation
drivers/dma-buf/dma-heap.c | 4 +
drivers/tee/Kconfig | 5 +
drivers/tee/Makefile | 1 +
drivers/tee/optee/Kconfig | 5 +
drivers/tee/optee/Makefile | 1 +
drivers/tee/optee/core.c | 7 +
drivers/tee/optee/ffa_abi.c | 146 ++++++++-
drivers/tee/optee/optee_ffa.h | 27 +-
drivers/tee/optee/optee_msg.h | 84 ++++-
drivers/tee/optee/optee_private.h | 15 +-
drivers/tee/optee/optee_smc.h | 37 ++-
drivers/tee/optee/protmem.c | 335 ++++++++++++++++++++
drivers/tee/optee/smc_abi.c | 141 ++++++++-
drivers/tee/tee_core.c | 158 +++++++---
drivers/tee/tee_heap.c | 500 ++++++++++++++++++++++++++++++
drivers/tee/tee_private.h | 14 +
drivers/tee/tee_shm.c | 157 +++++++++-
include/linux/tee_core.h | 59 ++++
include/linux/tee_drv.h | 10 +
include/uapi/linux/tee.h | 31 ++
20 files changed, 1670 insertions(+), 67 deletions(-)
create mode 100644 drivers/tee/optee/protmem.c
create mode 100644 drivers/tee/tee_heap.c
base-commit: c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9
--
2.43.0
Hi,
Here's another attempt at supporting user-space allocations from a
specific carved-out reserved memory region.
The initial problem we were discussing was that I'm currently working on
a platform which has a memory layout with ECC enabled. However, enabling
the ECC has a number of drawbacks on that platform: lower performance,
increased memory usage, etc. So for things like framebuffers, the
trade-off isn't great and thus there's a memory region with ECC disabled
to allocate from for such use cases.
After a suggestion from John, I chose to first start using heap
allocations flags to allow for userspace to ask for a particular ECC
setup. This is then backed by a new heap type that runs from reserved
memory chunks flagged as such, and the existing DT properties to specify
the ECC properties.
After further discussion, it was considered that flags were not the
right solution, and relying on the names of the heaps would be enough to
let userspace know the kind of buffer it deals with.
Thus, even though the uAPI part of it had been dropped in this second
version, we still needed a driver to create heaps out of carved-out memory
regions. In addition to the original usecase, a similar driver can be
found in BSPs from most vendors, so I believe it would be a useful
addition to the kernel.
Some extra discussion with Rob Herring [1] came to the conclusion that
some specific compatible for this is not great either, and as such an
new driver probably isn't called for either.
Some other discussions we had with John [2] also dropped some hints that
multiple CMA heaps might be a good idea, and some vendors seem to do
that too.
So here's another attempt that doesn't affect the device tree at all and
will just create a heap for every CMA reserved memory region.
It also falls nicely into the current plan we have to support cgroups in
DRM/KMS and v4l2, which is an additional benefit.
Let me know what you think,
Maxime
1: https://lore.kernel.org/all/20250707-cobalt-dingo-of-serenity-dbf92c@houat/
2: https://lore.kernel.org/all/CANDhNCroe6ZBtN_o=c71kzFFaWK-fF5rCdnr9P5h1sgPOW…
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v7:
- Invert the logic and register CMA heap from the reserved memory /
dma contiguous code, instead of iterating over them from the CMA heap.
- Link to v6: https://lore.kernel.org/r/20250709-dma-buf-ecc-heap-v6-0-dac9bf80f35d@kerne…
Changes in v6:
- Drop the new driver and allocate a CMA heap for each region now
- Dropped the binding
- Rebased on 6.16-rc5
- Link to v5: https://lore.kernel.org/r/20250617-dma-buf-ecc-heap-v5-0-0abdc5863a4f@kerne…
Changes in v5:
- Rebased on 6.16-rc2
- Switch from property to dedicated binding
- Link to v4: https://lore.kernel.org/r/20250520-dma-buf-ecc-heap-v4-1-bd2e1f1bb42c@kerne…
Changes in v4:
- Rebased on 6.15-rc7
- Map buffers only when map is actually called, not at allocation time
- Deal with restricted-dma-pool and shared-dma-pool
- Reword Kconfig options
- Properly report dma_map_sgtable failures
- Link to v3: https://lore.kernel.org/r/20250407-dma-buf-ecc-heap-v3-0-97cdd36a5f29@kerne…
Changes in v3:
- Reworked global variable patch
- Link to v2: https://lore.kernel.org/r/20250401-dma-buf-ecc-heap-v2-0-043fd006a1af@kerne…
Changes in v2:
- Add vmap/vunmap operations
- Drop ECC flags uapi
- Rebase on top of 6.14
- Link to v1: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-0-54cbbd049511@kerne…
---
Maxime Ripard (5):
doc: dma-buf: List the heaps by name
dma-buf: heaps: cma: Register list of CMA regions at boot
dma: contiguous: Register reusable CMA regions at boot
dma: contiguous: Reserve default CMA heap
dma-buf: heaps: cma: Create CMA heap for each CMA reserved region
Documentation/userspace-api/dma-buf-heaps.rst | 24 ++++++++------
MAINTAINERS | 1 +
drivers/dma-buf/heaps/Kconfig | 10 ------
drivers/dma-buf/heaps/cma_heap.c | 47 +++++++++++++++++----------
include/linux/dma-buf/heaps/cma.h | 16 +++++++++
kernel/dma/contiguous.c | 11 +++++++
6 files changed, 72 insertions(+), 37 deletions(-)
---
base-commit: 47633099a672fc7bfe604ef454e4f116e2c954b1
change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e
prerequisite-message-id: <20250610131231.1724627-1-jkangas(a)redhat.com>
prerequisite-patch-id: bc44be5968feb187f2bc1b8074af7209462b18e7
prerequisite-patch-id: f02a91b723e5ec01fbfedf3c3905218b43d432da
prerequisite-patch-id: e944d0a3e22f2cdf4d3b3906e5603af934696deb
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Changelog:
v2:
* Added extra patch which adds new CONFIG, so next patches can reuse it.
* Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
into the other patch.
* Fixed revoke calls to be aligned with true->false semantics.
* Extended p2pdma_providers to be per-BAR and not global to whole
device.
* Fixed possible race between dmabuf states and revoke.
* Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Returned support for multiple DMA ranges per-dMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.
The series supports a use case for SPDK where a NVMe device will be
owned by SPDK through VFIO but interacting with a RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
Leon Romanovsky (8):
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory
allocation
PCI/P2PDMA: Export pci_p2pdma_map_type() function
types: move phys_vec definition to common header
vfio/pci: Add dma-buf export config for MMIO regions
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
:wqa
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature
functions
block/blk-mq-dma.c | 7 +-
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 165 ++++++++----
drivers/vfio/pci/Kconfig | 20 ++
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/vfio_pci_config.c | 22 +-
drivers/vfio/pci/vfio_pci_core.c | 59 +++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 398 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 ++
drivers/vfio/vfio_main.c | 2 +
include/linux/pci-p2pdma.h | 114 +++++----
include/linux/types.h | 5 +
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 4 +
include/uapi/linux/vfio.h | 25 ++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
17 files changed, 734 insertions(+), 124 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
--
2.51.0