Hi,
On Thu, 13 Feb 2025 at 12:40, Boris Brezillon <boris.brezillon@collabora.com> wrote:
> On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg <sumit.garg@linaro.org> wrote:
>> Yeah but all the prior vendor specific secure/restricted DMA heaps relied on DT information.
> Right, but there's nothing in the DMA heap provider API forcing that.
Yeah. DMA heaps are just a way to allocate memory from a specific place. They let people settle on a single way to do allocations from weird platform-specific places; the only platform-specific part userspace needs to deal with is figuring out the name to use. The rest is a unified API: the point of dma-heaps was exactly to have a single coherent API for userspace, not one API for ZONE_CMA, another for DT ranges, and everyone else doing their own thing.
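For the record, the entire userspace contract is roughly the below (untested sketch; the heap name is the only platform-specific input):

/* Untested sketch: allocate a dma-buf from a named heap. The heap name
 * is the only platform-specific part; the ioctl is the same for all. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

int alloc_from_heap(const char *heap_name, size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	char path[128];
	int heap_fd, ret;

	snprintf(path, sizeof(path), "/dev/dma_heap/%s", heap_name);
	heap_fd = open(path, O_RDONLY);
	if (heap_fd < 0)
		return -errno;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);

	return ret ? -errno : data.fd; /* a dma-buf fd, whatever the heap */
}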
>> Rather than that it's better for the user to directly ask the TEE device to allocate restricted memory without worrying about how the memory restriction gets enforced.
> If the consensus is that restricted/protected memory allocation should always be routed to the TEE, sure, but I had the feeling this wasn't as clear as that. OTOH, using a dma-heap to expose the TEE-SDP implementation provides the same benefits, without making potential future non-TEE based implementations a pain for users. Since the dma-heap ioctl is common to all implementations, it just becomes a configuration matter if we want to change the heap we rely on for protected/restricted buffer allocation. And because heaps have unique/well-known names, users can still default to (or rely solely on) the TEE-SDP implementation if they want.
>> There have been several attempts with DMA heaps in the past, which all resulted in very vendor-specific, vertically integrated solutions. But the solution with the TEE subsystem aims to be generic and vendor-agnostic.
> Just because all previous protected/restricted dma-heap efforts failed to make it upstream doesn't mean dma-heap is the wrong way of exposing this feature, IMHO.
To be fair, having a TEE implementation does give us a much better chance of having a sensible cross-vendor plan. And the fact it's already (sort of accidentally and only on one platform AFAICT) ready for a 'test' interface, where we can still exercise protected allocation paths but without having to go through all the platform-specific setup that is inaccessible to most people, is also really great! That's probably been the biggest barrier to having this tested outside of IHVs and OEMs.
But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT? How does userspace pick which TEE device to use? What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
Cheers,
Daniel
Hi,
On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote:
> But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT?
The TEE may very well use a predefined range; that part is abstracted by the interface.
> How does userspace pick which TEE device to use?
There's normally only one, and even if there is more than one, it should be safe to assume that only one of them should be used when allocating restricted memory (TEE_GEN_CAP_RSTMEM from TEE_IOC_VERSION).
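Something like this (sketch; TEE_GEN_CAP_RSTMEM is the capability bit added by this series, the rest is existing uapi):

/* Sketch: pick the first TEE device advertising restricted-memory
 * support. TEE_GEN_CAP_RSTMEM is added by this series and is not in
 * mainline linux/tee.h yet; TEE_IOC_VERSION and
 * struct tee_ioctl_version_data are existing uapi. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/tee.h>

int find_rstmem_tee(void)
{
	struct tee_ioctl_version_data vers;
	char path[32];
	int i, fd;

	for (i = 0; i < 16; i++) {
		snprintf(path, sizeof(path), "/dev/tee%d", i);
		fd = open(path, O_RDWR);
		if (fd < 0)
			continue;
		if (!ioctl(fd, TEE_IOC_VERSION, &vers) &&
		    (vers.gen_caps & TEE_GEN_CAP_RSTMEM))
			return fd; /* use this device for allocations */
		close(fd);
	}
	return -1;
}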
> What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
>
> I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
Cheers,
Jens
Hi,
On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote:
> On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote:
>> But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT?
>
> The TEE may very well use a predefined range; that part is abstracted by the interface.
Of course. But you can also (and this has been shipped on real devices) handle this without any per-allocation TEE involvement, by simply allocating from a memory range which is predefined within DT.
From the userspace point of view, why should there be one ABI to allocate memory from a predefined range which is delivered by DT to the kernel, and one ABI to allocate memory from a predefined range which is mediated by TEE?
>> What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
>>
>> I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
>
> From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
It is a problem for userspace, because we need to expose acceptable parameters for allocation through the entire stack. If you look at the dmabuf documentation in the kernel for how buffers should be allocated and exchanged, you can see the negotiation flow for modifiers. This permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
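As a concrete example of that negotiation, the EGL side of a modifier query looks roughly like this (sketch using EGL_EXT_image_dma_buf_import_modifiers):

/* Sketch: query which modifiers a driver can import for a format;
 * one link in the allocation-parameter negotiation chain. */
#include <EGL/egl.h>
#include <EGL/eglext.h>

EGLint query_modifiers(EGLDisplay dpy, EGLint fourcc,
		       EGLuint64KHR *mods, EGLint max_mods)
{
	EGLint count = 0;
	PFNEGLQUERYDMABUFMODIFIERSEXTPROC query_dmabuf_modifiers =
		(PFNEGLQUERYDMABUFMODIFIERSEXTPROC)
			eglGetProcAddress("eglQueryDmaBufModifiersEXT");

	if (!query_dmabuf_modifiers ||
	    !query_dmabuf_modifiers(dpy, fourcc, max_mods, mods,
				    NULL, &count))
		return -1;

	/* each component intersects such lists before allocating */
	return count;
}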
Standardising on heaps allows us to add those in a similar way. If we have to add different allocation mechanisms, then the complexity increases, permeating not only into all the different userspace APIs, but also into the drivers which need to support every different allocation mechanism even if they have no opinion on it - e.g. Mali doesn't care in any way whether the allocation comes from a heap or TEE or ACPI or whatever, it cares only that the memory is protected.
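And the import side is the same call no matter where the buffer came from, e.g. with DRM PRIME (sketch):

/* Sketch: a consuming driver only ever sees a dma-buf fd at import;
 * whether it came from a heap, the TEE, or elsewhere is invisible. */
#include <stdint.h>
#include <xf86drm.h>

int import_buffer(int drm_fd, int dmabuf_fd, uint32_t *handle)
{
	/* identical call for protected and unprotected buffers alike */
	return drmPrimeFDToHandle(drm_fd, dmabuf_fd, handle);
}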
Does that help?
Cheers,
Daniel
Hi,
On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote:
> Of course. But you can also (and this has been shipped on real devices) handle this without any per-allocation TEE involvement, by simply allocating from a memory range which is predefined within DT.
>
> From the userspace point of view, why should there be one ABI to allocate memory from a predefined range which is delivered by DT to the kernel, and one ABI to allocate memory from a predefined range which is mediated by TEE?
We need some way to specify the protection profile (or use case, as I've called it in the ABI) required for the buffer. Whether it's defined in DT seems irrelevant.
>> From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
> It is a problem for userspace, because we need to expose acceptable parameters for allocation through the entire stack. If you look at the dmabuf documentation in the kernel for how buffers should be allocated and exchanged, you can see the negotiation flow for modifiers. This permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
What dma-buf properties are you referring to? dma_heap_ioctl_allocate() accepts a few flags for the resulting file descriptor and no flags for the heap itself.
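For reference, this is everything the allocation ioctl takes today:

/* From include/uapi/linux/dma-heap.h */
struct dma_heap_allocation_data {
	__u64 len;        /* size of the allocation */
	__u32 fd;         /* returned dma-buf fd */
	__u32 fd_flags;   /* flags for the returned fd (O_RDWR, O_CLOEXEC) */
	__u64 heap_flags; /* heap-specific flags; none are defined today */
};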
> Standardising on heaps allows us to add those in a similar way.
How would you solve this with heaps? Would you use one heap for each protection profile (use case), add heap_flags, or do a bit of both?
> If we have to add different allocation mechanisms, then the complexity increases, permeating not only into all the different userspace APIs, but also into the drivers which need to support every different allocation mechanism even if they have no opinion on it - e.g. Mali doesn't care in any way whether the allocation comes from a heap or TEE or ACPI or whatever, it cares only that the memory is protected.
>
> Does that help?
I think you're missing the stage where an unprotected buffer is received and decrypted into a protected buffer. If you use the TEE for decryption or to configure the involved devices for the use case, it makes sense to let the TEE allocate the buffers, too. A TEE doesn't have to be an OS in the secure world; it can be an abstraction to support the use case, depending on the design. So the restricted buffer is already allocated before we reach Mali in your example.
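To sketch that flow (illustrative only; the ioctl and struct names below approximate the uapi proposed in this series, see the patches for the exact layout):

/* Illustrative sketch only: TEE_IOC_RSTMEM_ALLOC, the struct layout,
 * and the use-case constant approximate the uapi proposed in this
 * series; they are not mainline. */
int alloc_restricted(int tee_fd, size_t size)
{
	struct tee_ioctl_rstmem_alloc_data alloc = {
		.size = size,
		/* the protection profile / use case mentioned above */
		.use_case = TEE_IOC_UC_SECURE_VIDEO_PLAY,
	};

	/* returns a dma-buf fd backed by restricted memory; the TEE (or
	 * whatever it abstracts) decrypts into it, and the fd then flows
	 * to the decoder, GPU, and display like any other dma-buf */
	return ioctl(tee_fd, TEE_IOC_RSTMEM_ALLOC, &alloc);
}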
Allocating restricted buffers from the TEE subsystem saves us from maintaining proxy dma-buf heaps.
Cheers,
Jens
On Fri, 14 Feb 2025 at 15:37, Jens Wiklander <jens.wiklander@linaro.org> wrote:
>> Standardising on heaps allows us to add those in a similar way.
>
> How would you solve this with heaps? Would you use one heap for each protection profile (use case), add heap_flags, or do a bit of both?
Christian gave some historical background here [1] on why that hasn't worked out with DMA heaps in the past, given the scalability issues.
[1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail...
>> If we have to add different allocation mechanisms, then the complexity increases, permeating not only into all the different userspace APIs, but also into the drivers which need to support every different allocation mechanism even if they have no opinion on it - e.g. Mali doesn't care in any way whether the allocation comes from a heap or TEE or ACPI or whatever, it cares only that the memory is protected.
>>
>> Does that help?
>
> I think you're missing the stage where an unprotected buffer is received and decrypted into a protected buffer. If you use the TEE for decryption or to configure the involved devices for the use case, it makes sense to let the TEE allocate the buffers, too. A TEE doesn't have to be an OS in the secure world; it can be an abstraction to support the use case, depending on the design. So the restricted buffer is already allocated before we reach Mali in your example.
>
> Allocating restricted buffers from the TEE subsystem saves us from maintaining proxy dma-buf heaps.
+1
-Sumit