Linaro-mm-sig February 2025

linaro-mm-sig@lists.linaro.org

11 participants
25 discussions

[PATCH v5 0/7] TEE subsystem for restricted dma-buf allocations

by Jens Wiklander

Hi, [Now with a Gstreamer demo, see below.] This patch set allocates the restricted DMA-bufs via the TEE subsystem. The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs. I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory. TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also a use-case parameter. This is used by the backend TEE driver to decide on allocation policy and which devices should be able to access the memory. Three use-cases (Secure Video Playback, Trusted UI, and Secure Video Recording) has been identified so far to serve as examples of what can be expected. More use-cases can be added in userspace ABI, but it's up to the backend TEE drivers to provide the implementation. Each use-case has it's own restricted memory pool since different use-cases requires isolation from different parts of the system. A restricted memory pool can be based on a static carveout instantiated while probing the TEE backend driver, or dynamically allocated from CMA and made restricted as needed by the TEE. This can be tested on a RockPi 4B+ with the following steps: repo init -u https://github.com/jenswi-linaro/manifest.git -m rockpi4.xml \ -b prototype/sdp-v5 repo sync -j8 cd build make toolchains -j$(nproc) make all -j$(nproc) # Copy ../out/rockpi4.img to an SD card and boot the RockPi from that # Connect a monitor to the RockPi # login and at the prompt: gst-launch-1.0 videotestsrc ! \ aesenc key=1f9423681beb9a79215820f6bda73d0f \ iv=e9aa8e834d8d70b7e0d254ff670dd718 serialize-iv=true ! \ aesdec key=1f9423681beb9a79215820f6bda73d0f ! \ kmssink The aesdec module has been hacked to use an OP-TEE TA to decrypt the stream into restricted DMA-bufs which are consumed by the kmssink. The primitive QEMU tests from previous patch set can be tested on RockPi in the same way with: xtest --sdp-basic The primitive test are tested on QEMU with the following steps: repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \ -b prototype/sdp-v5 repo sync -j8 cd build make toolchains -j$(nproc) make SPMC_AT_EL=1 all -j$(nproc) make SPMC_AT_EL=1 run-only # login and at the prompt: xtest --sdp-basic The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test without FF-A using the original SMC ABI instead. Please remember to do %rm -rf ../trusted-firmware-a/build/qemu for TF-A to be rebuilt properly using the new configuration. https://optee.readthedocs.io/en/latest/building/prerequisites.html list dependencies needed to build the above. The tests are pretty basic, mostly checking that a Trusted Application in the secure world can access and manipulate the memory. There are also some negative tests for out of bounds buffers etc. Thanks, Jens Changes since V4: * Adding the patch "tee: add TEE_IOC_RSTMEM_FD_INFO" needed by the GStreamer demo * Removing the dummy CPU access and mmap functions from the dma_buf_ops * Fixing a compile error in "optee: FF-A: dynamic restricted memory allocation" reported by kernel test robot <lkp(a)intel.com> Changes since V3: * Make the use_case and flags field in struct tee_shm u32's instead of u16's * Add more description for TEE_IOC_RSTMEM_ALLOC in the header file * Import namespace DMA_BUF in module tee, reported by lkp(a)intel.com * Added a note in the commit message for "optee: account for direction while converting parameters" why it's needed * Factor out dynamic restricted memory allocation from "optee: support restricted memory allocation" into two new commits "optee: FF-A: dynamic restricted memory allocation" and "optee: smc abi: dynamic restricted memory allocation" * Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic restricted memory allocate if CMA isn't configured Changes since the V2 RFC: * Based on v6.12 * Replaced the flags for SVP and Trusted UID memory with a u32 field with unique id for each use case * Added dynamic allocation of restricted memory pools * Added OP-TEE ABI both with and without FF-A for dynamic restricted memory * Added support for FF-A with FFA_LEND Changes since the V1 RFC: * Based on v6.11 * Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC Changes since Olivier's post [2]: * Based on Yong Wu's post [1] where much of dma-buf handling is done in the generic restricted heap * Simplifications and cleanup * New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap support" * Replaced the word "secure" with "restricted" where applicable Jens Wiklander (7): tee: add restricted memory allocation tee: add TEE_IOC_RSTMEM_FD_INFO optee: account for direction while converting parameters optee: sync secure world ABI headers optee: support restricted memory allocation optee: FF-A: dynamic restricted memory allocation optee: smc abi: dynamic restricted memory allocation drivers/tee/Makefile | 1 + drivers/tee/optee/Makefile | 1 + drivers/tee/optee/call.c | 10 +- drivers/tee/optee/core.c | 1 + drivers/tee/optee/ffa_abi.c | 179 +++++++++++++- drivers/tee/optee/optee_ffa.h | 27 ++- drivers/tee/optee/optee_msg.h | 65 ++++- drivers/tee/optee/optee_private.h | 75 ++++-- drivers/tee/optee/optee_smc.h | 71 +++++- drivers/tee/optee/rpc.c | 31 ++- drivers/tee/optee/rstmem.c | 389 ++++++++++++++++++++++++++++++ drivers/tee/optee/smc_abi.c | 215 +++++++++++++++-- drivers/tee/tee_core.c | 69 +++++- drivers/tee/tee_private.h | 4 + drivers/tee/tee_rstmem.c | 188 +++++++++++++++ drivers/tee/tee_shm.c | 2 + drivers/tee/tee_shm_pool.c | 69 +++++- include/linux/tee_core.h | 15 ++ include/linux/tee_drv.h | 2 + include/uapi/linux/tee.h | 71 +++++- 20 files changed, 1409 insertions(+), 76 deletions(-) create mode 100644 drivers/tee/optee/rstmem.c create mode 100644 drivers/tee/tee_rstmem.c base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3 -- 2.43.0

1 year, 2 months

Re: [PATCH 4/4] dma-buf: drop caching of sg_tables

by Christian König

Am 17.02.25 um 17:28 schrieb Simona Vetter: > On Tue, Feb 11, 2025 at 05:31:09PM +0100, Christian König wrote: >> That was purely for the transition from static to dynamic dma-buf >> handling and can be removed again now. >> >> Signed-off-by: Christian König <christian.koenig(a)amd.com> > Yay! > > Might uncover some fun if people have meanwhile started to rely on this > for perf or something. But we'll figure that out when it happens. Yeah that was exactly the reason why I made this a separate patch and didn't mixed it into patch #3. On the other hand I'm fine to keep it a bit longer if anybody complains. Christian. > > Reviewed-by: Simona Vetter <simona.vetter(a)ffwll.ch> > > >> --- >> drivers/dma-buf/dma-buf.c | 34 -------------------------- >> drivers/dma-buf/udmabuf.c | 1 - >> drivers/gpu/drm/drm_prime.c | 1 - >> drivers/gpu/drm/virtio/virtgpu_prime.c | 1 - >> include/linux/dma-buf.h | 13 ---------- >> 5 files changed, 50 deletions(-) >> >> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c >> index 357b94a3dbaa..35221c4ddbf5 100644 >> --- a/drivers/dma-buf/dma-buf.c >> +++ b/drivers/dma-buf/dma-buf.c >> @@ -636,10 +636,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) >> || !exp_info->ops->release)) >> return ERR_PTR(-EINVAL); >> >> - if (WARN_ON(exp_info->ops->cache_sgt_mapping && >> - (exp_info->ops->pin || exp_info->ops->unpin))) >> - return ERR_PTR(-EINVAL); >> - >> if (WARN_ON(!exp_info->ops->pin != !exp_info->ops->unpin)) >> return ERR_PTR(-EINVAL); >> >> @@ -964,17 +960,7 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach) >> return; >> >> dma_resv_lock(dmabuf->resv, NULL); >> - >> - if (attach->sgt) { >> - mangle_sg_table(attach->sgt); >> - attach->dmabuf->ops->unmap_dma_buf(attach, attach->sgt, >> - attach->dir); >> - >> - if (dma_buf_pin_on_map(attach)) >> - dma_buf_unpin(attach); >> - } >> list_del(&attach->node); >> - >> dma_resv_unlock(dmabuf->resv); >> >> if (dmabuf->ops->detach) >> @@ -1069,18 +1055,6 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach, >> >> dma_resv_assert_held(attach->dmabuf->resv); >> >> - if (attach->sgt) { >> - /* >> - * Two mappings with different directions for the same >> - * attachment are not allowed. >> - */ >> - if (attach->dir != direction && >> - attach->dir != DMA_BIDIRECTIONAL) >> - return ERR_PTR(-EBUSY); >> - >> - return attach->sgt; >> - } >> - >> if (dma_buf_pin_on_map(attach)) { >> ret = attach->dmabuf->ops->pin(attach); >> if (ret) >> @@ -1105,11 +1079,6 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach, >> } >> mangle_sg_table(sg_table); >> >> - if (attach->dmabuf->ops->cache_sgt_mapping) { >> - attach->sgt = sg_table; >> - attach->dir = direction; >> - } >> - >> #ifdef CONFIG_DMA_API_DEBUG >> { >> struct scatterlist *sg; >> @@ -1190,9 +1159,6 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach, >> >> dma_resv_assert_held(attach->dmabuf->resv); >> >> - if (attach->sgt == sg_table) >> - return; >> - >> mangle_sg_table(sg_table); >> attach->dmabuf->ops->unmap_dma_buf(attach, sg_table, direction); >> >> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c >> index cc7398cc17d6..2fa2c9135eac 100644 >> --- a/drivers/dma-buf/udmabuf.c >> +++ b/drivers/dma-buf/udmabuf.c >> @@ -285,7 +285,6 @@ static int end_cpu_udmabuf(struct dma_buf *buf, >> } >> >> static const struct dma_buf_ops udmabuf_ops = { >> - .cache_sgt_mapping = true, >> .map_dma_buf = map_udmabuf, >> .unmap_dma_buf = unmap_udmabuf, >> .release = release_udmabuf, >> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c >> index 32a8781cfd67..c284f306d597 100644 >> --- a/drivers/gpu/drm/drm_prime.c >> +++ b/drivers/gpu/drm/drm_prime.c >> @@ -810,7 +810,6 @@ int drm_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) >> EXPORT_SYMBOL(drm_gem_dmabuf_mmap); >> >> static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = { >> - .cache_sgt_mapping = true, >> .attach = drm_gem_map_attach, >> .detach = drm_gem_map_detach, >> .map_dma_buf = drm_gem_map_dma_buf, >> diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c b/drivers/gpu/drm/virtio/virtgpu_prime.c >> index fe6a0b018571..c6f3be3cb914 100644 >> --- a/drivers/gpu/drm/virtio/virtgpu_prime.c >> +++ b/drivers/gpu/drm/virtio/virtgpu_prime.c >> @@ -75,7 +75,6 @@ static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach, >> >> static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops = { >> .ops = { >> - .cache_sgt_mapping = true, >> .attach = virtio_dma_buf_attach, >> .detach = drm_gem_map_detach, >> .map_dma_buf = virtgpu_gem_map_dma_buf, >> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h >> index c54ff2dda8cb..544f8f8c3f44 100644 >> --- a/include/linux/dma-buf.h >> +++ b/include/linux/dma-buf.h >> @@ -34,15 +34,6 @@ struct dma_buf_attachment; >> * @vunmap: [optional] unmaps a vmap from the buffer >> */ >> struct dma_buf_ops { >> - /** >> - * @cache_sgt_mapping: >> - * >> - * If true the framework will cache the first mapping made for each >> - * attachment. This avoids creating mappings for attachments multiple >> - * times. >> - */ >> - bool cache_sgt_mapping; >> - >> /** >> * @attach: >> * >> @@ -493,8 +484,6 @@ struct dma_buf_attach_ops { >> * @dmabuf: buffer for this attachment. >> * @dev: device attached to the buffer. >> * @node: list of dma_buf_attachment, protected by dma_resv lock of the dmabuf. >> - * @sgt: cached mapping. >> - * @dir: direction of cached mapping. >> * @peer2peer: true if the importer can handle peer resources without pages. >> * @priv: exporter specific attachment data. >> * @importer_ops: importer operations for this attachment, if provided >> @@ -514,8 +503,6 @@ struct dma_buf_attachment { >> struct dma_buf *dmabuf; >> struct device *dev; >> struct list_head node; >> - struct sg_table *sgt; >> - enum dma_data_direction dir; >> bool peer2peer; >> const struct dma_buf_attach_ops *importer_ops; >> void *importer_priv; >> -- >> 2.34.1 >>

1 year, 2 months

Bunch of DMA-buf cleanup patches

by Christian König

Hello everyone, just a few DMA-buf cleanup patches. The first one is fixing an incorrect documentation which has annoyed me for quite a while. The rest basically just removes stuff we no longer need or which was just added as abstraction which was never used. Please review and/or voice objection if some of that is till needed for something. Regards, Christian.

1 year, 2 months

Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations

by Daniel Stone

Hi, On Thu, 13 Feb 2025 at 12:40, Boris Brezillon <boris.brezillon(a)collabora.com> wrote: > On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg <sumit.garg(a)linaro.org> wrote: > > Yeah but all the prior vendor specific secure/restricted DMA heaps > > relied on DT information. > > Right, but there's nothing in the DMA heap provider API forcing that. Yeah. DMA heaps are just a way to allocate memory from a specific place. It allows people to settle on having a single way to do allocations from weird platform-specific places; the only weird platform-specific part userspace needs to deal with is figuring out the name to use. The rest is at least a unified API: the point of dma-heaps was exactly to have a single coherent API for userspace, not to create one API for ZONE_CMA and DT ranges and everyone else doing their own thing. > > Rather than that it's better > > for the user to directly ask the TEE device to allocate restricted > > memory without worrying about how the memory restriction gets > > enforced. > > If the consensus is that restricted/protected memory allocation should > always be routed to the TEE, sure, but I had the feeling this wasn't as > clear as that. OTOH, using a dma-heap to expose the TEE-SDP > implementation provides the same benefits, without making potential > future non-TEE based implementations a pain for users. The dma-heap > ioctl being common to all implementations, it just becomes a > configuration matter if we want to change the heap we rely on for > protected/restricted buffer allocation. And because heaps have > unique/well-known names, users can still default to (or rely solely on) > the TEE-SPD implementation if they want. > > > There have been several attempts with DMA heaps in the past which all > > resulted in a very vendor specific vertically integrated solution. But > > the solution with TEE subsystem aims to make it generic and vendor > > agnostic. > > Just because all previous protected/restricted dma-heap effort > failed to make it upstream, doesn't mean dma-heap is the wrong way of > exposing this feature IMHO. To be fair, having a TEE implementation does give us a much better chance of having a sensible cross-vendor plan. And the fact it's already (sort of accidentally and only on one platform AFAICT) ready for a 'test' interface, where we can still exercise protected allocation paths but without having to go through all the platform-specific setup that is inaccessible to most people, is also really great! That's probably been the biggest barrier to having this tested outside of IHVs and OEMs. But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT? How does userspace pick which TEE device to use? What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86? I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes. Cheers, Daniel

1 year, 2 months

Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations

by Sumit Garg

On Thu, 13 Feb 2025 at 14:06, Boris Brezillon <boris.brezillon(a)collabora.com> wrote: > > On Thu, 13 Feb 2025 12:11:52 +0530 > Sumit Garg <sumit.garg(a)linaro.org> wrote: > > > Hi Boris, > > > > On Thu, 13 Feb 2025 at 01:26, Boris Brezillon > > <boris.brezillon(a)collabora.com> wrote: > > > > > > +Florent, who's working on protected-mode support in Panthor. > > > > > > Hi Jens, > > > > > > On Tue, 17 Dec 2024 11:07:36 +0100 > > > Jens Wiklander <jens.wiklander(a)linaro.org> wrote: > > > > > > > Hi, > > > > > > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem. > > > > > > We're currently working on protected-mode support for Panthor [1] and it > > > looks like your series (and the OP-TEE implementation that goes with > > > it) would allow us to have a fully upstream/open solution for the > > > protected content use case we're trying to support. I need a bit more > > > time to play with the implementation but this looks very promising > > > (especially the lend rstmem feature, which might help us allocate our > > > FW sections that are supposed to execute code accessing protected > > > content). > > > > Glad to hear that, if you can demonstrate an open source use case > > based on this series then it will help to land it. We really would > > love to see support for restricted DMA-buf consumers be it GPU, crypto > > accelerator, media pipeline etc. > > > > > > > > > > > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE > > > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the > > > > restrictions for the memory used for the DMA-bufs. > > > > > > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted > > > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose > > > > how to allocate the restricted physical memory. > > > > > > I'll probably have more questions soon, but here's one to start: any > > > particular reason you didn't go for a dma-heap to expose restricted > > > buffer allocation to userspace? I see you already have a cdev you can > > > take ioctl()s from, but my understanding was that dma-heap was the > > > standard solution for these device-agnostic/central allocators. > > > > This series started with the DMA heap approach only here [1] but later > > discussions [2] lead us here. To point out specifically: > > > > - DMA heaps require reliance on DT to discover static restricted > > regions carve-outs whereas via the TEE implementation driver (eg. > > OP-TEE) those can be discovered dynamically. > > Hm, the system heap [1] doesn't rely on any DT information AFAICT. Yeah but all the prior vendor specific secure/restricted DMA heaps relied on DT information. > The dynamic allocation scheme, where the TEE implementation allocates a > chunk of protected memory for us would have a similar behavior, I guess. In a dynamic scheme, the allocation will still be from CMA or system heap depending on TEE implementation capabilities but the restriction will be enforced via interaction with TEE. > > > - Dynamic allocation of buffers and making them restricted requires > > vendor specific driver hooks with DMA heaps whereas the TEE subsystem > > abstracts that out with underlying TEE implementation (eg. OP-TEE) > > managing the dynamic buffer restriction. > > Yeah, the lend rstmem feature is clearly something tee specific, and I > think that's okay to assume the user knows the protection request > should go through the tee subsystem in that case. Yeah but how will the user discover that? Rather than that it's better for the user to directly ask the TEE device to allocate restricted memory without worrying about how the memory restriction gets enforced. > > > - TEE subsystem already has a well defined user-space interface for > > managing shared memory buffers with TEE and restricted DMA buffers > > will be yet another interface managed along similar lines. > > Okay, so the very reason I'm asking about the dma-buf heap interface is > because there might be cases where the protected/restricted allocation > doesn't go through the TEE (Mediatek has a TEE-free implementation > for instance, but I realize vendor implementations are probably not the > best selling point :-/). You can always have a system with memory and peripheral access permissions setup during boot (or even have a pre-configured hardware as a special case) prior to booting up the kernel too. But that even gets somehow configured by a TEE implementation during boot, so calling it a TEE-free implementation seems over-simplified and not a scalable solution. However, this patchset [1] from Mediatek requires runtime TEE interaction too. [1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@med… > If we expose things as a dma-heap, we have > a solution where integrators can pick the dma-heap they think is > relevant for protected buffer allocations without the various drivers > (GPU, video codec, ...) having to implement a dispatch function for all > possible implementations. The same goes for userspace allocations, > where passing a dma-heap name, is simpler than supporting different > ioctl()s based on the allocation backend. There have been several attempts with DMA heaps in the past which all resulted in a very vendor specific vertically integrated solution. But the solution with TEE subsystem aims to make it generic and vendor agnostic. > > [1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/sys… -Sumit

1 year, 2 months

Re: [PATCH v4 0/6] TEE subsystem for restricted dma-buf allocations

by Sumit Garg

Hi Boris, On Thu, 13 Feb 2025 at 01:26, Boris Brezillon <boris.brezillon(a)collabora.com> wrote: > > +Florent, who's working on protected-mode support in Panthor. > > Hi Jens, > > On Tue, 17 Dec 2024 11:07:36 +0100 > Jens Wiklander <jens.wiklander(a)linaro.org> wrote: > > > Hi, > > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem. > > We're currently working on protected-mode support for Panthor [1] and it > looks like your series (and the OP-TEE implementation that goes with > it) would allow us to have a fully upstream/open solution for the > protected content use case we're trying to support. I need a bit more > time to play with the implementation but this looks very promising > (especially the lend rstmem feature, which might help us allocate our > FW sections that are supposed to execute code accessing protected > content). Glad to hear that, if you can demonstrate an open source use case based on this series then it will help to land it. We really would love to see support for restricted DMA-buf consumers be it GPU, crypto accelerator, media pipeline etc. > > > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the > > restrictions for the memory used for the DMA-bufs. > > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose > > how to allocate the restricted physical memory. > > I'll probably have more questions soon, but here's one to start: any > particular reason you didn't go for a dma-heap to expose restricted > buffer allocation to userspace? I see you already have a cdev you can > take ioctl()s from, but my understanding was that dma-heap was the > standard solution for these device-agnostic/central allocators. This series started with the DMA heap approach only here [1] but later discussions [2] lead us here. To point out specifically: - DMA heaps require reliance on DT to discover static restricted regions carve-outs whereas via the TEE implementation driver (eg. OP-TEE) those can be discovered dynamically. - Dynamic allocation of buffers and making them restricted requires vendor specific driver hooks with DMA heaps whereas the TEE subsystem abstracts that out with underlying TEE implementation (eg. OP-TEE) managing the dynamic buffer restriction. - TEE subsystem already has a well defined user-space interface for managing shared memory buffers with TEE and restricted DMA buffers will be yet another interface managed along similar lines. [1] https://lore.kernel.org/lkml/mzur3odofwwrdqnystozjgf3qtvb73wqjm6g2vf5dfsqie… [2] https://lore.kernel.org/lkml/CAFA6WYPtp3H5JhxzgH9=z2EvNL7Kdku3EmG1aDkTS-gjF… -Sumit > > Regards, > > Boris > > [1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t

1 year, 2 months

Re: [RFC PATCH 1/5] dt-bindings: dma: Add CMA Heap bindings

by Maxime Ripard

On Wed, Feb 12, 2025 at 10:29:32AM +0000, Florent Tomasin wrote: > > > On 12/02/2025 10:01, Maxime Ripard wrote: > > On Wed, Feb 12, 2025 at 09:49:56AM +0000, Florent Tomasin wrote: > >> Note that the CMA patches were initially shared to help reproduce my > >> environment of development, I can isolate them in a separate patch > >> series and include a reference or "base-commit:" tag to it in the > >> Panthor protected mode RFC, to help progress this review in another > >> thread. It will avoid overlapping these two topics: > >> > >> - Multiple standalone CMA heaps support > >> - Panthor protected mode handling > > > > You keep insisting on using CMA here, but it's really not clear to me > > why you would need CMA in the first place. > > > > By CMA, do you mean the CMA allocator, and thus would provide buffers > > through the usual dma_alloc_* API, or would any allocator providing > > physically contiguous memory work? > > You are correct only the CMA allocator is relevant. I needed a way to > sub-allocate from a carved-out memory. I'm still confused, sorry. You're saying that you require CMA but... > > In the latter case, would something like this work: > > https://lore.kernel.org/all/20240515-dma-buf-ecc-heap-v1-1-54cbbd049511@ker… > > Thanks for sharing this link, I was not aware previous work was done > on this aspect. The new carveout heap introduced in the series could > probably be a good alternative. I will play-around with it and share > some updates. ... you seem to be ok with a driver that doesn't use it? Maxime

1 year, 2 months

Re: [RFC PATCH 1/5] dt-bindings: dma: Add CMA Heap bindings

by Maxime Ripard

On Wed, Feb 12, 2025 at 09:49:56AM +0000, Florent Tomasin wrote: > Note that the CMA patches were initially shared to help reproduce my > environment of development, I can isolate them in a separate patch > series and include a reference or "base-commit:" tag to it in the > Panthor protected mode RFC, to help progress this review in another > thread. It will avoid overlapping these two topics: > > - Multiple standalone CMA heaps support > - Panthor protected mode handling You keep insisting on using CMA here, but it's really not clear to me why you would need CMA in the first place. By CMA, do you mean the CMA allocator, and thus would provide buffers through the usual dma_alloc_* API, or would any allocator providing physically contiguous memory work? In the latter case, would something like this work: https://lore.kernel.org/all/20240515-dma-buf-ecc-heap-v1-1-54cbbd049511@ker… Maxime

1 year, 2 months

Re: [RFC PATCH 0/5] drm/panthor: Protected mode support for Mali CSF GPUs

by Maxime Ripard

Hi Boris, On Fri, Feb 07, 2025 at 04:02:53PM +0100, Boris Brezillon wrote: > Sorry for joining the party late, a couple of comments to back Akash > and Nicolas' concerns. > > On Wed, 05 Feb 2025 13:14:14 -0500 > Nicolas Dufresne <nicolas(a)ndufresne.ca> wrote: > > > Le mercredi 05 février 2025 à 15:52 +0100, Maxime Ripard a écrit : > > > On Mon, Feb 03, 2025 at 04:43:23PM +0000, Florent Tomasin wrote: > > > > Hi Maxime, Nicolas > > > > > > > > On 30/01/2025 17:47, Nicolas Dufresne wrote: > > > > > Le jeudi 30 janvier 2025 à 17:38 +0100, Maxime Ripard a écrit : > > > > > > Hi Nicolas, > > > > > > > > > > > > On Thu, Jan 30, 2025 at 10:59:56AM -0500, Nicolas Dufresne wrote: > > > > > > > Le jeudi 30 janvier 2025 à 14:46 +0100, Maxime Ripard a écrit : > > > > > > > > Hi, > > > > > > > > > > > > > > > > I started to review it, but it's probably best to discuss it here. > > > > > > > > > > > > > > > > On Thu, Jan 30, 2025 at 01:08:56PM +0000, Florent Tomasin wrote: > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > This is a patch series covering the support for protected mode execution in > > > > > > > > > Mali Panthor CSF kernel driver. > > > > > > > > > > > > > > > > > > The Mali CSF GPUs come with the support for protected mode execution at the > > > > > > > > > HW level. This feature requires two main changes in the kernel driver: > > > > > > > > > > > > > > > > > > 1) Configure the GPU with a protected buffer. The system must provide a DMA > > > > > > > > > heap from which the driver can allocate a protected buffer. > > > > > > > > > It can be a carved-out memory or dynamically allocated protected memory region. > > > > > > > > > Some system includes a trusted FW which is in charge of the protected memory. > > > > > > > > > Since this problem is integration specific, the Mali Panthor CSF kernel > > > > > > > > > driver must import the protected memory from a device specific exporter. > > > > > > > > > > > > > > > > Why do you need a heap for it in the first place? My understanding of > > > > > > > > your series is that you have a carved out memory region somewhere, and > > > > > > > > you want to allocate from that carved out memory region your buffers. > > > > > > > > > > > > > > > > How is that any different from using a reserved-memory region, adding > > > > > > > > the reserved-memory property to the GPU device and doing all your > > > > > > > > allocation through the usual dma_alloc_* API? > > > > > > > > > > > > > > How do you then multiplex this region so it can be shared between > > > > > > > GPU/Camera/Display/Codec drivers and also userspace ? > > > > > > > > > > > > You could point all the devices to the same reserved memory region, and > > > > > > they would all allocate from there, including for their userspace-facing > > > > > > allocations. > > > > > > > > > > I get that using memory region is somewhat more of an HW description, and > > > > > aligned with what a DT is supposed to describe. One of the challenge is that > > > > > Mediatek heap proposal endup calling into their TEE, meaning knowing the region > > > > > is not that useful. You actually need the TEE APP guid and its IPC protocol. If > > > > > we can dell drivers to use a head instead, we can abstract that SoC specific > > > > > complexity. I believe each allocated addressed has to be mapped to a zone, and > > > > > that can only be done in the secure application. I can imagine similar needs > > > > > when the protection is done using some sort of a VM / hypervisor. > > > > > > > > > > Nicolas > > > > > > > > > > > > > The idea in this design is to abstract the heap management from the > > > > Panthor kernel driver (which consumes a DMA buffer from it). > > > > > > > > In a system, an integrator would have implemented a secure heap driver, > > > > and could be based on TEE or a carved-out memory with restricted access, > > > > or else. This heap driver would be responsible of implementing the > > > > logic to: allocate, free, refcount, etc. > > > > > > > > The heap would be retrieved by the Panthor kernel driver in order to > > > > allocate protected memory to load the FW and allow the GPU to enter/exit > > > > protected mode. This memory would not belong to a user space process. > > > > The driver allocates it at the time of loading the FW and initialization > > > > of the GPU HW. This is a device globally owned protected memory. > > > > > > The thing is, it's really not clear why you absolutely need to have the > > > Panthor driver involved there. It won't be transparent to userspace, > > > since you'd need an extra flag at allocation time, and the buffers > > > behave differently. If userspace has to be aware of it, what's the > > > advantage to your approach compared to just exposing a heap for those > > > secure buffers, and letting userspace allocate its buffers from there? > > > > Unless I'm mistaken, the Panthor driver loads its own firmware. Since loading > > the firmware requires placing the data in a protected memory region, and that > > this aspect has no exposure to userspace, how can Panthor not be implicated ? > > Right, the very reason we need protected memory early is because some > FW sections need to be allocated from the protected pool, otherwise the > TEE will fault as soon at the FW enters the so-called 'protected mode'. How does that work if you don't have some way to allocate the protected memory? You can still submit jobs to the GPU, but you can't submit / execute "protected jobs"? > Now, it's not impossible to work around this limitation. For instance, > we could load the FW without this protected section by default (what we > do right now), and then provide a DRM_PANTHOR_ENABLE_FW_PROT_MODE > ioctl that would take a GEM object imported from a dmabuf allocated > from the protected dma-heap by userspace. We can then reset the FW and > allow it to operate in protected mode after that point. Urgh, I'd rather avoid that dance if possible :) > This approach has two downsides though: > > 1. We have no way of checking that the memory we're passed is actually > suitable for FW execution in a protected context. If we're passed > random memory, this will likely hang the platform as soon as we enter > protected mode. It's a current limitation of dma-buf in general, and you'd have the same issue right now if someone imports a buffer, or misconfigure the heap for a !protected heap. I'd really like to have some way to store some metadata in dma_buf, if only to tell that the buffer is protected. I suspect you'd also need that if you do things like do protected video playback through a codec, get a protected frame, and want to import that into the GPU. Depending on how you allocate it, either the codec or the GPU or both will want to make sure it's protected. > 2. If the driver already boot the FW and exposed a DRI node, we might > have GPU workloads running, and doing a FW reset might incur a slight > delay in GPU jobs execution. > > I think #1 is a more general issue that applies to suspend buffers > allocated for GPU contexts too. If we expose ioctls where we take > protected memory buffers that can possibly lead to crashes if they are > not real protected memory regions, and we have no way to ensure the > memory is protected, we probably want to restrict these ioctls/modes to > some high-privilege CAP_SYS_. > > For #2, that's probably something we can live with, since it's a > one-shot thing. If it becomes an issue, we can even make sure we enable > the FW protected-mode before the GPU starts being used for real. > > This being said, I think the problem applies outside Panthor, and it > might be that the video codec can't reset the FW/HW block to switch to > protected mode as easily as Panthor. > > Note that there's also downsides to the reserved-memory node approach, > where some bootloader stage would ask the secure FW to reserve a > portion of mem and pass this through the DT. This sort of things tend to > be an integration mess, where you need all the pieces of the stack (TEE, > u-boot, MTK dma-heap driver, gbm, ...) to be at a certain version to > work properly. If we go the ioctl() way, we restrict the scope to the > TEE, gbm/mesa and the protected-dma-heap driver, which is still a lot, > but we've ripped the bootloader out of the equation at least. Yeah. I also think there's two discussions in parallel here: 1) Being able to allocate protected buffers from the driver 2) Exposing an interface to allocate those to userspace I'm not really convinced we need 2, but 1 is obviously needed from what you're saying. Maxime

1 year, 2 months

Re: [RFC PATCH 3/5] dt-bindings: gpu: Add protected heap name to Mali Valhall CSF binding

by Krzysztof Kozlowski

On 03/02/2025 16:31, Florent Tomasin wrote: > Hi Krzysztof > > On 30/01/2025 13:25, Krzysztof Kozlowski wrote: >> On 30/01/2025 14:08, Florent Tomasin wrote: >>> Allow mali-valhall-csf driver to retrieve a protected >>> heap at probe time by passing the name of the heap >>> as attribute to the device tree GPU node. >> >> Please wrap commit message according to Linux coding style / submission >> process (neither too early nor over the limit): >> https://elixir.bootlin.com/linux/v6.4-rc1/source/Documentation/process/subm… > Apologies, I think I made quite few other mistakes in the style of the > patches I sent. I will work on improving this aspect, appreciated > >> Why this cannot be passed by phandle, just like all reserved regions? >> >> From where do you take these protected heaps? Firmware? This would >> explain why no relation is here (no probe ordering, no device links, >> nothing connecting separate devices). > > The protected heap is generaly obtained from a firmware (TEE) and could > sometimes be a carved-out memory with restricted access. Which is a reserved memory, isn't it? > > The Panthor CSF kernel driver does not own or manage the protected heap > and is instead a consumer of it (assuming the heap is made available by > the system integrator). > > I initially used a phandle, but then I realised it would introduce a new > API to share the heap across kernel driver. In addition I found this > patch series: > - > https://lore.kernel.org/lkml/20230911023038.30649-1-yong.wu@mediatek.com/#t > > which introduces a DMA Heap API to the rest of the kernel to find a > heap by name: > - dma_heap_find() > > I then decided to follow that approach to help isolate the heap > management from the GPU driver code. In the Panthor driver, if the > heap is not found at probe time, the driver will defer the probe until > the exporter made it available. I don't talk here really about the driver but even above mediatek patchset uses reserved memory bindings. You explained some things about driver yet you did not answer the question. This looks like reserved memory. If it does not, bring arguments why this binding cannot be a reserved memory, why hardware is not a carve out memory. Best regards, Krzysztof

1 year, 3 months

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig February 2025