On Mon, Feb 03, 2025 at 04:43:23PM +0000, Florent Tomasin wrote:
> Hi Maxime, Nicolas
>
> On 30/01/2025 17:47, Nicolas Dufresne wrote:
> > Le jeudi 30 janvier 2025 à 17:38 +0100, Maxime Ripard a écrit :
> >> Hi Nicolas,
> >>
> >> On Thu, Jan 30, 2025 at 10:59:56AM -0500, Nicolas Dufresne wrote:
> >>> Le jeudi 30 janvier 2025 à 14:46 +0100, Maxime Ripard a écrit :
> >>>> Hi,
> >>>>
> >>>> I started to review it, but it's probably best to discuss it here.
> >>>>
> >>>> On Thu, Jan 30, 2025 at 01:08:56PM +0000, Florent Tomasin wrote:
> >>>>> Hi,
> >>>>>
> >>>>> This is a patch series covering the support for protected mode execution in
> >>>>> Mali Panthor CSF kernel driver.
> >>>>>
> >>>>> The Mali CSF GPUs come with the support for protected mode execution at the
> >>>>> HW level. This feature requires two main changes in the kernel driver:
> >>>>>
> >>>>> 1) Configure the GPU with a protected buffer. The system must provide a DMA
> >>>>> heap from which the driver can allocate a protected buffer.
> >>>>> It can be a carved-out memory or dynamically allocated protected memory region.
> >>>>> Some system includes a trusted FW which is in charge of the protected memory.
> >>>>> Since this problem is integration specific, the Mali Panthor CSF kernel
> >>>>> driver must import the protected memory from a device specific exporter.
> >>>>
> >>>> Why do you need a heap for it in the first place? My understanding of
> >>>> your series is that you have a carved out memory region somewhere, and
> >>>> you want to allocate from that carved out memory region your buffers.
> >>>>
> >>>> How is that any different from using a reserved-memory region, adding
> >>>> the reserved-memory property to the GPU device and doing all your
> >>>> allocation through the usual dma_alloc_* API?
> >>>
> >>> How do you then multiplex this region so it can be shared between
> >>> GPU/Camera/Display/Codec drivers and also userspace ?
> >>
> >> You could point all the devices to the same reserved memory region, and
> >> they would all allocate from there, including for their userspace-facing
> >> allocations.
> >
> > I get that using memory region is somewhat more of an HW description, and
> > aligned with what a DT is supposed to describe. One of the challenge is that
> > Mediatek heap proposal endup calling into their TEE, meaning knowing the region
> > is not that useful. You actually need the TEE APP guid and its IPC protocol. If
> > we can dell drivers to use a head instead, we can abstract that SoC specific
> > complexity. I believe each allocated addressed has to be mapped to a zone, and
> > that can only be done in the secure application. I can imagine similar needs
> > when the protection is done using some sort of a VM / hypervisor.
> >
> > Nicolas
> >
>
> The idea in this design is to abstract the heap management from the
> Panthor kernel driver (which consumes a DMA buffer from it).
>
> In a system, an integrator would have implemented a secure heap driver,
> and could be based on TEE or a carved-out memory with restricted access,
> or else. This heap driver would be responsible of implementing the
> logic to: allocate, free, refcount, etc.
>
> The heap would be retrieved by the Panthor kernel driver in order to
> allocate protected memory to load the FW and allow the GPU to enter/exit
> protected mode. This memory would not belong to a user space process.
> The driver allocates it at the time of loading the FW and initialization
> of the GPU HW. This is a device globally owned protected memory.
The thing is, it's really not clear why you absolutely need to have the
Panthor driver involved there. It won't be transparent to userspace,
since you'd need an extra flag at allocation time, and the buffers
behave differently. If userspace has to be aware of it, what's the
advantage to your approach compared to just exposing a heap for those
secure buffers, and letting userspace allocate its buffers from there?
> When I came across this patch series:
> -
> https://lore.kernel.org/lkml/20230911023038.30649-1-yong.wu@mediatek.com/#t
> I found it could help abstract the interface between the secure heap and
> the integration of protected memory in Panthor.
>
> A kernel driver would have to find the heap: `dma_heap_find()`, then
> request allocation of a DMA buffer from it. The heap driver would deal
> with the specifities of the protected memory on the system.
Sure, but we still have to address *why* it would be a good idea for the
driver to do it in the first place. The mediatek series had the same
feedback.
Maxime
Le lundi 03 février 2025 à 16:43 +0000, Florent Tomasin a écrit :
> Hi Maxime, Nicolas
>
> On 30/01/2025 17:47, Nicolas Dufresne wrote:
> > Le jeudi 30 janvier 2025 à 17:38 +0100, Maxime Ripard a écrit :
> > > Hi Nicolas,
> > >
> > > On Thu, Jan 30, 2025 at 10:59:56AM -0500, Nicolas Dufresne wrote:
> > > > Le jeudi 30 janvier 2025 à 14:46 +0100, Maxime Ripard a écrit :
> > > > > Hi,
> > > > >
> > > > > I started to review it, but it's probably best to discuss it here.
> > > > >
> > > > > On Thu, Jan 30, 2025 at 01:08:56PM +0000, Florent Tomasin wrote:
> > > > > > Hi,
> > > > > >
> > > > > > This is a patch series covering the support for protected mode execution in
> > > > > > Mali Panthor CSF kernel driver.
> > > > > >
> > > > > > The Mali CSF GPUs come with the support for protected mode execution at the
> > > > > > HW level. This feature requires two main changes in the kernel driver:
> > > > > >
> > > > > > 1) Configure the GPU with a protected buffer. The system must provide a DMA
> > > > > > heap from which the driver can allocate a protected buffer.
> > > > > > It can be a carved-out memory or dynamically allocated protected memory region.
> > > > > > Some system includes a trusted FW which is in charge of the protected memory.
> > > > > > Since this problem is integration specific, the Mali Panthor CSF kernel
> > > > > > driver must import the protected memory from a device specific exporter.
> > > > >
> > > > > Why do you need a heap for it in the first place? My understanding of
> > > > > your series is that you have a carved out memory region somewhere, and
> > > > > you want to allocate from that carved out memory region your buffers.
> > > > >
> > > > > How is that any different from using a reserved-memory region, adding
> > > > > the reserved-memory property to the GPU device and doing all your
> > > > > allocation through the usual dma_alloc_* API?
> > > >
> > > > How do you then multiplex this region so it can be shared between
> > > > GPU/Camera/Display/Codec drivers and also userspace ?
> > >
> > > You could point all the devices to the same reserved memory region, and
> > > they would all allocate from there, including for their userspace-facing
> > > allocations.
> >
> > I get that using memory region is somewhat more of an HW description, and
> > aligned with what a DT is supposed to describe. One of the challenge is that
> > Mediatek heap proposal endup calling into their TEE, meaning knowing the region
> > is not that useful. You actually need the TEE APP guid and its IPC protocol. If
> > we can dell drivers to use a head instead, we can abstract that SoC specific
> > complexity. I believe each allocated addressed has to be mapped to a zone, and
> > that can only be done in the secure application. I can imagine similar needs
> > when the protection is done using some sort of a VM / hypervisor.
> >
> > Nicolas
> >
>
> The idea in this design is to abstract the heap management from the
> Panthor kernel driver (which consumes a DMA buffer from it).
>
> In a system, an integrator would have implemented a secure heap driver,
> and could be based on TEE or a carved-out memory with restricted access,
> or else. This heap driver would be responsible of implementing the
> logic to: allocate, free, refcount, etc.
>
> The heap would be retrieved by the Panthor kernel driver in order to
> allocate protected memory to load the FW and allow the GPU to enter/exit
> protected mode. This memory would not belong to a user space process.
> The driver allocates it at the time of loading the FW and initialization
> of the GPU HW. This is a device globally owned protected memory.
This use case also applies well for codec. The Mediatek SCP firmware needs to be
loaded with a restricted memory too, its a very similar scenario, plus Mediatek
chips often include a Mali. On top of that, V4L2 codecs (in general) do need to
allocate internal scratch buffer for the IP to write to for things like motion
vectors, reconstruction frames, entropy statistics, etc. The IP will only be
able to write if the memory is restricted.
Nicolas
>
> When I came across this patch series:
> -
> https://lore.kernel.org/lkml/20230911023038.30649-1-yong.wu@mediatek.com/#t
> I found it could help abstract the interface between the secure heap and
> the integration of protected memory in Panthor.
>
> A kernel driver would have to find the heap: `dma_heap_find()`, then
> request allocation of a DMA buffer from it. The heap driver would deal
> with the specifities of the protected memory on the system.
>
> > >
> > > > Also, how the secure memory is allocted / obtained is a process that
> > > > can vary a lot between SoC, so implementation details assumption
> > > > should not be coded in the driver.
> > >
> > > But yeah, we agree there, it's also the point I was trying to make :)
> > >
> > > Maxime
> >
>
> Agree with your point, the Panthor kernel driver may not be aware of the
> heap management logic. As an alternative to the DMA heap API used here,
> I also tried to expose the heap by passing the phandle of a "heap"
> device to Panthor. The reference to the DMA heap was stored as a private
> data of the heap device as a new type: `struct dma_heap_import_info`,
> similar to the existing type: `struct dma_heap_export_info`.
> This made me think it could be problematic, as the private data type
> would have to be cast before accessing it from the importer driver. I
> worried about a mis-use of the types with this approach.
>
> Regards,
> Florent
Hi Florent,
Le lundi 03 février 2025 à 13:36 +0000, Florent Tomasin a écrit :
>
> On 30/01/2025 13:28, Maxime Ripard wrote:
> > Hi,
> >
> > On Thu, Jan 30, 2025 at 01:08:57PM +0000, Florent Tomasin wrote:
> > > Introduce a CMA Heap dt-binding allowing custom
> > > CMA heap registrations.
> > >
> > > * Note to the reviewers:
> > > The patch was used for the development of the protected mode
Just to avoid divergence in nomenclature, and because this is not a new subject,
perhaps you should also adhere to the name "restricted". Both Linaro and
Mediatek have moved from "secure" to that name in their proposal. As you are the
third proposing this (at least for the proposal that are CCed on linux-media), I
would have expected in your cover letter a summary of how the other requirement
have been blended in your proposal.
regards,
Nicolas
> > > feature in Panthor CSF kernel driver and is not initially thought
> > > to land in the Linux kernel. It is mostly relevant if someone
> > > wants to reproduce the environment of testing. Please, raise
> > > interest if you think the patch has value in the Linux kernel.
> > >
> > > Signed-off-by: Florent Tomasin <florent.tomasin(a)arm.com>
> > > ---
> > > .../devicetree/bindings/dma/linux,cma.yml | 43 +++++++++++++++++++
> > > 1 file changed, 43 insertions(+)
> > > create mode 100644 Documentation/devicetree/bindings/dma/linux,cma.yml
> > >
> > > diff --git a/Documentation/devicetree/bindings/dma/linux,cma.yml b/Documentation/devicetree/bindings/dma/linux,cma.yml
> > > new file mode 100644
> > > index 000000000000..c532e016bbe5
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/dma/linux,cma.yml
> > > @@ -0,0 +1,43 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/dma/linux-cma.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: Custom Linux CMA heap
> > > +
> > > +description:
> > > + The custom Linux CMA heap device tree node allows registering
> > > + of multiple CMA heaps.
> > > +
> > > + The CMA heap name will match the node name of the "memory-region".
> > > +
> > > +properties:
> > > + compatible:
> > > + enum:
> > > + - linux,cma
> > > +
> > > + memory-region:
> > > + maxItems: 1
> > > + description: |
> > > + Phandle to the reserved memory node associated with the CMA Heap.
> > > + The reserved memory node must follow this binding convention:
> > > + - Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > +
> > > +examples:
> > > + - |
> > > + reserved-memory {
> > > + #address-cells = <2>;
> > > + #size-cells = <2>;
> > > +
> > > + custom_cma_heap: custom-cma-heap {
> > > + compatible = "shared-dma-pool";
> > > + reg = <0x0 0x90600000 0x0 0x1000000>;
> > > + reusable;
> > > + };
> > > + };
> > > +
> > > + device_cma_heap: device-cma-heap {
> > > + compatible = "linux,cma";
> > > + memory-region = <&custom_cma_heap>;
> > > + };
> >
> > Isn't it redundant with the linux,cma-default shared-dma-pool property
> > already?
> >
> > Maxime
>
> Hi Maxime,
>
> Please correct me if my understanding is wrong,
>
> The existing properties: linux,cma-default and shared-dma-pool, do not
> allow the creations of multiple standalone CMA heaps, those will create
> a single CMA heap: `dma_contiguous_default_area`? Other CMA heaps will
> be bound to a driver.
>
> I introduced the "linux,cma" to allow creating multiple standalone CMA
> heaps, with the intention of validating the protected mode support on
> Mali CSG GPUs. It was included in the RFC in there are interests in
> this approach.
>
> Since the Panthor CSF kernel driver does not own or manage a heap,
> I needed a way to create a standalone heap. The idea here is for the
> kernel driver to be an importer. I relied on a patch series to retrieve
> the heap and allocate a DMA buffer from it:
> - dma_heap_find()
> - dma_heap_buffer_alloc()
> - dma_heap_put()
>
> Ref:
> https://lore.kernel.org/lkml/20230911023038.30649-1-yong.wu@mediatek.com/#t
>
>
> Since the protected/secure memory management is integration specific,
> I needed a generic way for Panthor to allocate from such heap.
>
> In some scenarios it might be a carved-out memory, in others a FW will
> reside in the system (TEE) and require a secure heap driver to allocate
> memory (e.g: a similar approach is followd by MTK). Such driver would
> implement the allocation and free logic.
>
> Florent
>
>
>
I can't see patch #1 in my inbox for some reason, but I already know
what it does from your repository.
Feel free to add Reviewed-by: Christian König <christian.koenig(a)amd.com>
to the entire series.
Regards,
Christian.
Am 31.01.25 um 12:02 schrieb Pierre-Eric Pelloux-Prayer:
> Hi,
>
> The initial goal of this series was to improve the drm and amdgpu
> trace events to be able to expose more of the inner workings of
> the scheduler and drivers to developers via tools.
>
> Then, the series evolved to become focused only on gpu_scheduler.
> The changes around vblank events will be part of a different
> series, as well as the amdgpu ones.
>
> Moreover Sima suggested to make some trace events stable uAPI,
> so tools can rely on them long term.
>
> The first patches extend and cleanup the gpu scheduler events.
>
> The last one adds a documentation entry in drm-uapi.rst.
>
> Changes since v6:
> * Addressed comments from Philipp, Tvrtko, Steven
> * The commit storing drm_client_id in sched_fence adds an extra
> field which looks like a duplicate of the owner field. Currently
> amdgpu uses the owner field with some magic values, so we have to
> have 2 separate fields for now, but ultimately one could be removed.
> Similarly storing the drm_client_id in the sched_entity is not
> really possible as there's nothing preventing a driver to use a
> sched_entity with multiple drm_file instances.
>
>
> Useful links:
> - userspace tool using the updated events:
> https://gitlab.freedesktop.org/tomstdenis/umr/-/merge_requests/37
> - v6:
> https://lists.freedesktop.org/archives/dri-devel/2024-November/477644.html
>
> Pierre-Eric Pelloux-Prayer (7):
> drm/debugfs: output client_id in in drm_clients_info
> drm/sched: store the drm client_id in drm_sched_fence
> drm/sched: add device name to the drm_sched_process_job event
> drm/sched: cleanup gpu_scheduler trace events
> drm/sched: trace dependencies for gpu jobs
> drm/sched: add the drm_client_id to the drm_sched_run/exec_job events
> drm/doc: document some tracepoints as uAPI
>
> Documentation/gpu/drm-uapi.rst | 19 +++
> drivers/accel/amdxdna/aie2_ctx.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 3 +-
> drivers/gpu/drm/drm_debugfs.c | 10 +-
> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 2 +-
> drivers/gpu/drm/imagination/pvr_job.c | 2 +-
> drivers/gpu/drm/imagination/pvr_queue.c | 5 +-
> drivers/gpu/drm/imagination/pvr_queue.h | 2 +-
> drivers/gpu/drm/lima/lima_gem.c | 2 +-
> drivers/gpu/drm/lima/lima_sched.c | 6 +-
> drivers/gpu/drm/lima/lima_sched.h | 3 +-
> drivers/gpu/drm/msm/msm_gem_submit.c | 8 +-
> drivers/gpu/drm/nouveau/nouveau_sched.c | 3 +-
> drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +-
> drivers/gpu/drm/panthor/panthor_drv.c | 3 +-
> drivers/gpu/drm/panthor/panthor_mmu.c | 2 +-
> drivers/gpu/drm/panthor/panthor_sched.c | 5 +-
> drivers/gpu/drm/panthor/panthor_sched.h | 3 +-
> .../gpu/drm/scheduler/gpu_scheduler_trace.h | 123 ++++++++++++++----
> drivers/gpu/drm/scheduler/sched_entity.c | 8 +-
> drivers/gpu/drm/scheduler/sched_fence.c | 4 +-
> drivers/gpu/drm/scheduler/sched_main.c | 8 +-
> drivers/gpu/drm/v3d/v3d_submit.c | 2 +-
> drivers/gpu/drm/xe/xe_sched_job.c | 3 +-
> include/drm/gpu_scheduler.h | 12 +-
> 28 files changed, 192 insertions(+), 64 deletions(-)
>
On Thu, Jan 30, 2025 at 01:08:57PM +0000, Florent Tomasin wrote:
> Introduce a CMA Heap dt-binding allowing custom
> CMA heap registrations.
>
> * Note to the reviewers:
> The patch was used for the development of the protected mode
> feature in Panthor CSF kernel driver and is not initially thought
> to land in the Linux kernel. It is mostly relevant if someone
> wants to reproduce the environment of testing. Please, raise
> interest if you think the patch has value in the Linux kernel.
Why would panthor need CMA, it has an MMU.
In any case, I agree with Maxime that this is redundant.
Rob
Hi,
I started to review it, but it's probably best to discuss it here.
On Thu, Jan 30, 2025 at 01:08:56PM +0000, Florent Tomasin wrote:
> Hi,
>
> This is a patch series covering the support for protected mode execution in
> Mali Panthor CSF kernel driver.
>
> The Mali CSF GPUs come with the support for protected mode execution at the
> HW level. This feature requires two main changes in the kernel driver:
>
> 1) Configure the GPU with a protected buffer. The system must provide a DMA
> heap from which the driver can allocate a protected buffer.
> It can be a carved-out memory or dynamically allocated protected memory region.
> Some system includes a trusted FW which is in charge of the protected memory.
> Since this problem is integration specific, the Mali Panthor CSF kernel
> driver must import the protected memory from a device specific exporter.
Why do you need a heap for it in the first place? My understanding of
your series is that you have a carved out memory region somewhere, and
you want to allocate from that carved out memory region your buffers.
How is that any different from using a reserved-memory region, adding
the reserved-memory property to the GPU device and doing all your
allocation through the usual dma_alloc_* API?
Or do you expect to have another dma-buf heap that would call into the
firmware to perform the allocations?
The semantics of the CMA heap allocations is a concern too.
Another question is how would you expect something like a secure
video-playback pipeline to operate with that kind of interface. Assuming
you have a secure codec, you would likely get that protected buffer from
the codec, right? So the most likely scenario would be to then import
that dma-buf into the GPU driver, but not allocate the buffer from it.
Overall, I think a better abstraction would be to have a heap indeed to
allocate your protected buffers from, and then import them in the
devices that need them. The responsibility would be on the userspace to
do so, but it already kind of does with your design anyway.
Maxime
Hi,
On Thu, Jan 30, 2025 at 01:08:58PM +0000, Florent Tomasin wrote:
> This patch introduces a cma-heap probe function, allowing
> users to register custom cma heaps in the device tree.
>
> A "memory-region" is bound to the cma heap at probe time
> allowing allocation of DMA buffers from that heap.
>
> Use cases:
> - registration of carved out secure heaps. Some devices
> are implementing secure memory by reserving a specific
> memory regions for that purpose. For example, this is the
> case of platforms making use of early version of
> ARM TrustZone.
In such a case, the CMA heap would de-facto become un-mappable for
userspace, right?
> - registration of multiple memory regions at different
> locations for efficiency or HW integration reasons.
> For example, a peripheral may expect to share data at a
> specific location in RAM. This information could have been
> programmed by a FW prior to the kernel boot.
How would you differentiate between them?
Maxime
On 30/01/2025 14:08, Florent Tomasin wrote:
> Allow mali-valhall-csf driver to retrieve a protected
> heap at probe time by passing the name of the heap
> as attribute to the device tree GPU node.
Please wrap commit message according to Linux coding style / submission
process (neither too early nor over the limit):
https://elixir.bootlin.com/linux/v6.4-rc1/source/Documentation/process/subm…
Why this cannot be passed by phandle, just like all reserved regions?
From where do you take these protected heaps? Firmware? This would
explain why no relation is here (no probe ordering, no device links,
nothing connecting separate devices).
Best regards,
Krzysztof