On Thu, Jun 20, 2024 at 3:52 PM Hans Verkuil <hverkuil-cisco(a)xs4all.nl> wrote:
>
> On 19/06/2024 06:19, Tomasz Figa wrote:
> > On Wed, Jun 19, 2024 at 1:24 AM Nicolas Dufresne <nicolas(a)ndufresne.ca> wrote:
> >>
> >> Le mardi 18 juin 2024 à 16:47 +0900, Tomasz Figa a écrit :
> >>> Hi TaoJiang,
> >>>
> >>> On Tue, Jun 18, 2024 at 4:30 PM TaoJiang <tao.jiang_2(a)nxp.com> wrote:
> >>>>
> >>>> From: Ming Qian <ming.qian(a)nxp.com>
> >>>>
> >>>> When the memory type is VB2_MEMORY_DMABUF, the v4l2 device can't know
> >>>> whether the dma buffer is coherent or synchronized.
> >>>>
> >>>> The videobuf2-core will skip cache syncs as it think the DMA exporter
> >>>> should take care of cache syncs
> >>>>
> >>>> But in fact it's likely that the client doesn't
> >>>> synchronize the dma buf before qbuf() or after dqbuf(). and it's
> >>>> difficult to find this type of error directly.
> >>>>
> >>>> I think it's helpful that videobuf2-core can call
> >>>> dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() to handle the
> >>>> cache syncs.
> >>>>
> >>>> Signed-off-by: Ming Qian <ming.qian(a)nxp.com>
> >>>> Signed-off-by: TaoJiang <tao.jiang_2(a)nxp.com>
> >>>> ---
> >>>> .../media/common/videobuf2/videobuf2-core.c | 22 +++++++++++++++++++
> >>>> 1 file changed, 22 insertions(+)
> >>>>
> >>>
> >>> Sorry, that patch is incorrect. I believe you're misunderstanding the
> >>> way DMA-buf buffers should be managed in the userspace. It's the
> >>> userspace responsibility to call the DMA_BUF_IOCTL_SYNC ioctl [1] to
> >>> signal start and end of CPU access to the kernel and imply necessary
> >>> cache synchronization.
> >>>
> >>> [1] https://docs.kernel.org/driver-api/dma-buf.html#dma-buffer-ioctls
> >>>
> >>> So, really sorry, but it's a NAK.
> >>
> >>
> >>
> >> This patch *could* make sense if it was inside UVC Driver as an example, as this
> >> driver can import dmabuf, to CPU memcpy, and does omits the required sync calls
> >> (unless that got added recently, I can easily have missed it).
> >
> > Yeah, currently V4L2 drivers don't call the in-kernel
> > dma_buf_{begin,end}_cpu_access() when they need to access the buffers
> > from the CPU, while my quick grep [1] reveals that we have 68 files
> > retrieving plane vaddr by calling vb2_plane_vaddr() (not necessarily a
> > 100% guarantee of CPU access being done, but rather likely so).
> >
> > I also repeated the same thing with VB2_DMABUF [2] and tried to
> > attribute both lists to specific drivers (by retaining the path until
> > the first - or _ [3]; which seemed to be relatively accurate), leading
> > to the following drivers that claim support for DMABUF while also
> > retrieving plane vaddr (without proper synchronization - no drivers
> > currently call any begin/end CPU access):
> >
> > i2c/video
> > pci/bt8xx/bttv
> > pci/cobalt/cobalt
> > pci/cx18/cx18
> > pci/tw5864/tw5864
> > pci/tw686x/tw686x
> > platform/allegro
> > platform/amphion/vpu
> > platform/chips
> > platform/intel/pxa
> > platform/marvell/mcam
> > platform/mediatek/jpeg/mtk
> > platform/mediatek/vcodec/decoder/mtk
> > platform/mediatek/vcodec/encoder/mtk
> > platform/nuvoton/npcm
> > platform/nvidia/tegra
> > platform/nxp/imx
> > platform/renesas/rcar
> > platform/renesas/vsp1/vsp1
> > platform/rockchip/rkisp1/rkisp1
> > platform/samsung/exynos4
> > platform/samsung/s5p
> > platform/st/sti/delta/delta
> > platform/st/sti/hva/hva
> > platform/verisilicon/hantro
> > usb/au0828/au0828
> > usb/cx231xx/cx231xx
> > usb/dvb
> > usb/em28xx/em28xx
> > usb/gspca/gspca.c
> > usb/hackrf/hackrf.c
> > usb/stk1160/stk1160
> > usb/uvc/uvc
> >
> > which means we potentially have ~30 drivers which likely don't handle
> > imported DMABUFs correctly (there is still a chance that DMABUF is
> > advertised for one queue, while vaddr is used for another).
> >
> > I think we have two options:
> > 1) add vb2_{begin/end}_cpu_access() helpers, carefully audit each
> > driver and add calls to those
>
> I actually started on that 9 (!) years ago:
>
> https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=vb2-cpu-access
>
> If memory serves, the main problem was that there were some drivers where
> it wasn't clear what should be done. In the end I never continued this
> work since nobody complained about it.
>
> This patch series adds vb2_plane_begin/end_cpu_access() functions,
> replaces all calls to vb2_plane_vaddr() in drivers to the new functions,
> and at the end removes vb2_plane_vaddr() altogether.
>
> > 2) take a heavy gun approach and just call vb2_begin_cpu_access()
> > whenever vb2_plane_vaddr() is called and then vb2_end_cpu_access()
> > whenever vb2_buffer_done() is called (if begin was called before).
> >
> > The latter has the disadvantage of drivers not having control over the
> > timing of the cache sync, so could end up with less than optimal
> > performance. Also there could be some more complex cases, where the
> > driver needs to mix DMA and CPU accesses to the buffer, so the fixed
> > sequence just wouldn't work for them. (But then they just wouldn't
> > work today either.)
> >
> > Hans, Marek, do you have any thoughts? (I'd personally just go with 2
> > and if any driver in the future needs something else, they could call
> > begin/end CPU access manually.)
>
> I prefer 1. If nothing else, that makes it easy to identify drivers
> that do such things.
>
> But perhaps a mix is possible: if a VB2 flag is set by the driver, then
> approach 2 is used. That might help with the drivers where it isn't clear
> what they should do. Although perhaps this can all be done in the driver
> itself: instead of vb2_plane_vaddr they call vb2_begin_cpu_access for the
> whole buffer, and at buffer_done time they call vb2_end_cpu_access. Should
> work just as well for the very few drivers that need this.
That's a good point. I guess we don't really need to dig so much into
those drivers in this case. Just mechanically do the same for all of
them (+/- maybe checking for some obvious corner cases which don't
need the extra calls). Let me see if I can give it a stab.
Best,
Tomasz
>
> Regards,
>
> Hans
>
> >
> > [1] git grep vb2_plane_vaddr | cut -d":" -f 1 | sort | uniq
> > [2] git grep VB2_DMABUF | cut -d":" -f 1 | sort | uniq
> > [3] by running [1] and [2] through | cut -d"-" -f 1 | cut -d"_" -f 1 | uniq
> >
> > Best,
> > Tomasz
> >
> >>
> >> But generally speaking, bracketing all driver with CPU access synchronization
> >> does not make sense indeed, so I second the rejection.
> >>
> >> Nicolas
> >>
> >>>
> >>> Best regards,
> >>> Tomasz
> >>>
> >>>> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
> >>>> index 358f1fe42975..4734ff9cf3ce 100644
> >>>> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> >>>> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> >>>> @@ -340,6 +340,17 @@ static void __vb2_buf_mem_prepare(struct vb2_buffer *vb)
> >>>> vb->synced = 1;
> >>>> for (plane = 0; plane < vb->num_planes; ++plane)
> >>>> call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> >>>> +
> >>>> + if (vb->memory != VB2_MEMORY_DMABUF)
> >>>> + return;
> >>>> + for (plane = 0; plane < vb->num_planes; ++plane) {
> >>>> + struct dma_buf *dbuf = vb->planes[plane].dbuf;
> >>>> +
> >>>> + if (!dbuf)
> >>>> + continue;
> >>>> +
> >>>> + dma_buf_end_cpu_access(dbuf, vb->vb2_queue->dma_dir);
> >>>> + }
> >>>> }
> >>>>
> >>>> /*
> >>>> @@ -356,6 +367,17 @@ static void __vb2_buf_mem_finish(struct vb2_buffer *vb)
> >>>> vb->synced = 0;
> >>>> for (plane = 0; plane < vb->num_planes; ++plane)
> >>>> call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> >>>> +
> >>>> + if (vb->memory != VB2_MEMORY_DMABUF)
> >>>> + return;
> >>>> + for (plane = 0; plane < vb->num_planes; ++plane) {
> >>>> + struct dma_buf *dbuf = vb->planes[plane].dbuf;
> >>>> +
> >>>> + if (!dbuf)
> >>>> + continue;
> >>>> +
> >>>> + dma_buf_begin_cpu_access(dbuf, vb->vb2_queue->dma_dir);
> >>>> + }
> >>>> }
> >>>>
> >>>> /*
> >>>> --
> >>>> 2.43.0-rc1
> >>>>
> >>
> >
>
Am 10.07.24 um 16:35 schrieb Lei Liu:
>
> 在 2024/7/10 22:14, Christian König 写道:
>> Am 10.07.24 um 15:57 schrieb Lei Liu:
>>> Use vm_insert_page to establish a mapping for the memory allocated
>>> by dmabuf, thus supporting direct I/O read and write; and fix the
>>> issue of incorrect memory statistics after mapping dmabuf memory.
>>
>> Well big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
>
> Hello! Could you explain why direct_io is disabled on DMABUF? Is there
> any historical reason for this?
It's basically one of the most fundamental design decision of DMA-Buf.
The attachment/map/fence model DMA-buf uses is not really compatible
with direct I/O on the underlying pages.
>>
>> We already discussed enforcing that in the DMA-buf framework and this
>> patch probably means that we should really do that.
>>
>> Regards,
>> Christian.
>
> Thank you for your response. With the application of AI large model
> edgeification, we urgently need support for direct_io on DMABUF to
> read some very large files. Do you have any new solutions or plans for
> this?
We have seen similar projects over the years and all of those turned out
to be complete shipwrecks.
There is currently a patch set under discussion to give the network
subsystem DMA-buf support. If you are interest in network direct I/O
that could help.
Additional to that a lot of GPU drivers support userptr usages, e.g. to
import malloced memory into the GPU driver. You can then also do direct
I/O on that malloced memory and the kernel will enforce correct handling
with the GPU driver through MMU notifiers.
But as far as I know a general DMA-buf based solution isn't possible.
Regards,
Christian.
>
> Regards,
> Lei Liu.
>
>>
>>>
>>> Lei Liu (2):
>>> mm: dmabuf_direct_io: Support direct_io for memory allocated by
>>> dmabuf
>>> mm: dmabuf_direct_io: Fix memory statistics error for dmabuf
>>> allocated
>>> memory with direct_io support
>>>
>>> drivers/dma-buf/heaps/system_heap.c | 5 +++--
>>> fs/proc/task_mmu.c | 8 +++++++-
>>> include/linux/mm.h | 1 +
>>> mm/memory.c | 15 ++++++++++-----
>>> mm/rmap.c | 9 +++++----
>>> 5 files changed, 26 insertions(+), 12 deletions(-)
>>>
>>
On Mon, Jul 8, 2024 at 6:47 AM Zenghui Yu <yuzenghui(a)huawei.com> wrote:
>
> Even if a vgem device is configured in, we will skip the import_vgem_fd()
> test almost every time.
>
> TAP version 13
> 1..11
> # Testing heap: system
> # =======================================
> # Testing allocation and importing:
> ok 1 # SKIP Could not open vgem -1
>
> The problem is that we use the DRM_IOCTL_VERSION ioctl to query the driver
> version information but leave the name field a non-null-terminated string.
> Terminate it properly to actually test against the vgem device.
Hm yeah. Looks like drm_copy_field resets version.name to the actual
size of the name in the case of truncation, so maybe worth checking
that too in case there is a name like "vgemfoo" that gets converted to
"vgem\0" by this?
>
> Signed-off-by: Zenghui Yu <yuzenghui(a)huawei.com>
> ---
> tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c b/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
> index 5f541522364f..2fcc74998fa9 100644
> --- a/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
> +++ b/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
> @@ -32,6 +32,8 @@ static int check_vgem(int fd)
> if (ret)
> return 0;
>
> + name[4] = '\0';
> +
> return !strcmp(name, "vgem");
> }
>
> --
> 2.33.0
>
On Thu, 4 Jul 2024 at 00:40, Amirreza Zarrabi <quic_azarrabi(a)quicinc.com> wrote:
>
>
>
> On 7/3/2024 10:13 PM, Dmitry Baryshkov wrote:
> > On Tue, Jul 02, 2024 at 10:57:36PM GMT, Amirreza Zarrabi wrote:
> >> Qualcomm TEE hosts Trusted Applications and Services that run in the
> >> secure world. Access to these resources is provided using object
> >> capabilities. A TEE client with access to the capability can invoke
> >> the object and request a service. Similarly, TEE can request a service
> >> from nonsecure world with object capabilities that are exported to secure
> >> world.
> >>
> >> We provide qcom_tee_object which represents an object in both secure
> >> and nonsecure world. TEE clients can invoke an instance of qcom_tee_object
> >> to access TEE. TEE can issue a callback request to nonsecure world
> >> by invoking an instance of qcom_tee_object in nonsecure world.
> >
> > Please see Documentation/process/submitting-patches.rst on how to write
> > commit messages.
>
> Ack.
>
> >
> >>
> >> Any driver in nonsecure world that is interested to export a struct (or a
> >> service object) to TEE, requires to embed an instance of qcom_tee_object in
> >> the relevant struct and implements the dispatcher function which is called
> >> when TEE invoked the service object.
> >>
> >> We also provids simplified API which implements the Qualcomm TEE transport
> >> protocol. The implementation is independent from any services that may
> >> reside in nonsecure world.
> >
> > "also" usually means that it should go to a separate commit.
>
> I will split this patch to multiple smaller ones.
>
[...]
> >
> >> + } in, out;
> >> +};
> >> +
> >> +int qcom_tee_object_do_invoke(struct qcom_tee_object_invoke_ctx *oic,
> >> + struct qcom_tee_object *object, unsigned long op, struct qcom_tee_arg u[], int *result);
> >
> > What's the difference between a result that gets returned by the
> > function and the result that gets retuned via the pointer?
>
> The function result, is local to kernel, for instance memory allocation failure,
> or failure to issue the smc call. The result in pointer, is the remote result,
> for instance return value from TA, or the TEE itself.
>
> I'll use better name, e.g. 'remote_result'?
See how this is handled by other parties. For example, PSCI. If you
have a standard set of return codes, translate them to -ESOMETHING in
your framework and let everybody else see only the standard errors.
--
With best wishes
Dmitry
On Mon, Jul 01, 2024 at 11:26:34PM -0700, Andrew Morton wrote:
> No, I do think the cast is useful:
>
> struct page *page = dma_fence_chain_alloc();
>
> will presently generate a warning. We want this. Your change will
> remove that useful warning.
>
>
> Unrelatedly: there is no earthly reason why this is implemented as a
> macro. A static inline function would be so much better. Why do we
> keep doing this.
Agreed with all of the above. Adding the dmabuf maintainers.
On Tue, Jul 02, 2024 at 10:57:36PM GMT, Amirreza Zarrabi wrote:
> Qualcomm TEE hosts Trusted Applications and Services that run in the
> secure world. Access to these resources is provided using object
> capabilities. A TEE client with access to the capability can invoke
> the object and request a service. Similarly, TEE can request a service
> from nonsecure world with object capabilities that are exported to secure
> world.
>
> We provide qcom_tee_object which represents an object in both secure
> and nonsecure world. TEE clients can invoke an instance of qcom_tee_object
> to access TEE. TEE can issue a callback request to nonsecure world
> by invoking an instance of qcom_tee_object in nonsecure world.
Please see Documentation/process/submitting-patches.rst on how to write
commit messages.
>
> Any driver in nonsecure world that is interested to export a struct (or a
> service object) to TEE, requires to embed an instance of qcom_tee_object in
> the relevant struct and implements the dispatcher function which is called
> when TEE invoked the service object.
>
> We also provids simplified API which implements the Qualcomm TEE transport
> protocol. The implementation is independent from any services that may
> reside in nonsecure world.
"also" usually means that it should go to a separate commit.
>
> Signed-off-by: Amirreza Zarrabi <quic_azarrabi(a)quicinc.com>
> ---
> drivers/firmware/qcom/Kconfig | 14 +
> drivers/firmware/qcom/Makefile | 2 +
> drivers/firmware/qcom/qcom_object_invoke/Makefile | 4 +
> drivers/firmware/qcom/qcom_object_invoke/async.c | 142 +++
> drivers/firmware/qcom/qcom_object_invoke/core.c | 1139 ++++++++++++++++++++
> drivers/firmware/qcom/qcom_object_invoke/core.h | 186 ++++
> .../qcom/qcom_object_invoke/qcom_scm_invoke.c | 22 +
> .../firmware/qcom/qcom_object_invoke/release_wq.c | 90 ++
> include/linux/firmware/qcom/qcom_object_invoke.h | 233 ++++
> 9 files changed, 1832 insertions(+)
>
> diff --git a/drivers/firmware/qcom/Kconfig b/drivers/firmware/qcom/Kconfig
> index 7f6eb4174734..103ab82bae9f 100644
> --- a/drivers/firmware/qcom/Kconfig
> +++ b/drivers/firmware/qcom/Kconfig
> @@ -84,4 +84,18 @@ config QCOM_QSEECOM_UEFISECAPP
> Select Y here to provide access to EFI variables on the aforementioned
> platforms.
>
> +config QCOM_OBJECT_INVOKE_CORE
> + bool "Secure TEE Communication Support"
tristate
> + help
> + Various Qualcomm SoCs have a Trusted Execution Environment (TEE) running
> + in the Trust Zone. This module provides an interface to that via the
> + capability based object invocation, using SMC calls.
> +
> + OBJECT_INVOKE_CORE allows capability based secure communication between
> + TEE and VMs. Using OBJECT_INVOKE_CORE, kernel can issue calls to TEE or
> + TAs to request a service or exposes services to TEE and TAs. It implements
> + the necessary marshaling of messages with TEE.
> +
> + Select Y here to provide access to TEE.
> +
> endmenu
> diff --git a/drivers/firmware/qcom/Makefile b/drivers/firmware/qcom/Makefile
> index 0be40a1abc13..dd5e00215b2e 100644
> --- a/drivers/firmware/qcom/Makefile
> +++ b/drivers/firmware/qcom/Makefile
> @@ -8,3 +8,5 @@ qcom-scm-objs += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
> obj-$(CONFIG_QCOM_TZMEM) += qcom_tzmem.o
> obj-$(CONFIG_QCOM_QSEECOM) += qcom_qseecom.o
> obj-$(CONFIG_QCOM_QSEECOM_UEFISECAPP) += qcom_qseecom_uefisecapp.o
> +
> +obj-y += qcom_object_invoke/
> diff --git a/drivers/firmware/qcom/qcom_object_invoke/Makefile b/drivers/firmware/qcom/qcom_object_invoke/Makefile
> new file mode 100644
> index 000000000000..6ef4d54891a5
> --- /dev/null
> +++ b/drivers/firmware/qcom/qcom_object_invoke/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +obj-$(CONFIG_QCOM_OBJECT_INVOKE_CORE) += object-invoke-core.o
> +object-invoke-core-objs := qcom_scm_invoke.o release_wq.o async.o core.o
> diff --git a/drivers/firmware/qcom/qcom_object_invoke/async.c b/drivers/firmware/qcom/qcom_object_invoke/async.c
> new file mode 100644
> index 000000000000..dd022ec68d8b
> --- /dev/null
> +++ b/drivers/firmware/qcom/qcom_object_invoke/async.c
> @@ -0,0 +1,142 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/kobject.h>
> +#include <linux/slab.h>
> +#include <linux/mutex.h>
> +
> +#include "core.h"
> +
> +/* Async handlers and providers. */
> +struct async_msg {
> + struct {
> + u32 version; /* Protocol version: top 16b major, lower 16b minor. */
> + u32 op; /* Async operation. */
> + } header;
> +
> + /* Format of the Async data field is defined by the specified operation. */
> +
> + struct {
> + u32 count; /* Number of objects that should be released. */
> + u32 obj[];
> + } op_release;
> +};
Another generic comment: please select some prefix (like QTEE_ / qtee_)
and use it for _all_ defines and all names in the driver.
`struct async_msg` means that it is some genric code that is applicable
to the whole kernel.
> +
> +/* Async Operations and header information. */
> +
> +#define ASYNC_HEADER_SIZE sizeof(((struct async_msg *)(0))->header)
Extract struct definition. Use sizeof(struct qtee_async_msg_header).
> +
> +/* ASYNC_OP_x: operation.
> + * ASYNC_OP_x_HDR_SIZE: header size for the operation.
> + * ASYNC_OP_x_SIZE: size of each entry in a message for the operation.
> + * ASYNC_OP_x_MSG_SIZE: size of a message with n entries.
> + */
> +
> +#define ASYNC_OP_RELEASE QCOM_TEE_OBJECT_OP_RELEASE /* Added in minor version 0x0000. **/
Anything before minor version 0x0000 ?
> +#define ASYNC_OP_RELEASE_HDR_SIZE offsetof(struct async_msg, op_release.obj)
> +#define ASYNC_OP_RELEASE_SIZE sizeof(((struct async_msg *)(0))->op_release.obj[0])
sizeof(u32) is much better
> +#define ASYNC_OP_RELEASE_MSG_SIZE(n) \
> + (ASYNC_OP_RELEASE_HDR_SIZE + ((n) * ASYNC_OP_RELEASE_SIZE))
struct_size(). But I think you should be able to inline and/or drop most
of these defines.
> +
> +/* async_qcom_tee_buffer return the available async buffer in the output buffer. */
> +
> +static struct qcom_tee_buffer async_qcom_tee_buffer(struct qcom_tee_object_invoke_ctx *oic)
Why do you need to return struct instance?
> +{
> + int i;
> + size_t offset;
> +
> + struct qcom_tee_callback *msg = (struct qcom_tee_callback *)oic->out.msg.addr;
> +
> + if (!(oic->flags & OIC_FLAG_BUSY))
> + return oic->out.msg;
> +
> + /* Async requests are appended to the output buffer after the CB message. */
> +
> + offset = OFFSET_TO_BUFFER_ARGS(msg, counts_total(msg->counts));
> +
> + for_each_input_buffer(i, msg->counts)
> + offset += align_offset(msg->args[i].b.size);
> +
> + for_each_output_buffer(i, msg->counts)
> + offset += align_offset(msg->args[i].b.size);
> +
> + if (oic->out.msg.size > offset) {
> + return (struct qcom_tee_buffer)
> + { { oic->out.msg.addr + offset }, oic->out.msg.size - offset };
> + }
> +
> + pr_err("no space left for async messages! or malformed message.\n");
No spamming on the kmsg.
> +
> + return (struct qcom_tee_buffer) { { 0 }, 0 };
This doesn't look correct.
> +}
> +
What does this function return?
> +static size_t async_release_handler(struct qcom_tee_object_invoke_ctx *oic,
> + struct async_msg *async_msg, size_t size)
Please ident the code properly, this should be aligned to the open
bracket.
> +{
> + int i;
> +
> + /* We need space for at least a single entry. */
> + if (size < ASYNC_OP_RELEASE_MSG_SIZE(1))
> + return 0;
> +
> + for (i = 0; i < async_msg->op_release.count; i++) {
> + struct qcom_tee_object *object;
> +
> + /* Remove the object from xa_qcom_tee_objects so that the object_id
> + * becomes invalid for further use. However, call put_qcom_tee_object
> + * to schedule the actual release if there is no user.
> + */
> +
> + object = erase_qcom_tee_object(async_msg->op_release.obj[i]);
> +
> + put_qcom_tee_object(object);
> + }
> +
> + return ASYNC_OP_RELEASE_MSG_SIZE(i);
> +}
> +
> +/* '__fetch__async_reqs' is a handler dispatcher (from TEE). */
> +
> +void __fetch__async_reqs(struct qcom_tee_object_invoke_ctx *oic)
> +{
> + size_t consumed, used = 0;
> +
> + struct qcom_tee_buffer async_buffer = async_qcom_tee_buffer(oic);
> +
> + while (async_buffer.size - used > ASYNC_HEADER_SIZE) {
> + struct async_msg *async_msg = (struct async_msg *)(async_buffer.addr + used);
> +
> + /* TEE assumes unused buffer is set to zero. */
> + if (!async_msg->header.version)
> + goto out;
> +
> + switch (async_msg->header.op) {
> + case ASYNC_OP_RELEASE:
> + consumed = async_release_handler(oic,
> + async_msg, async_buffer.size - used);
> +
> + break;
> + default: /* Unsupported operations. */
> + consumed = 0;
> + }
> +
> + used += align_offset(consumed);
> +
> + if (!consumed) {
> + pr_err("Drop async buffer (context_id %d): buffer %p, (%p, %zx), processed %zx\n",
Should it really go to the kmsg?
> + oic->context_id,
> + oic->out.msg.addr, /* Address of Output buffer. */
> + async_buffer.addr, /* Address of beginning of async buffer. */
> + async_buffer.size, /* Available size of async buffer. */
> + used); /* Processed async buffer. */
> +
> + goto out;
> + }
> + }
> +
> + out:
> +
> + memset(async_buffer.addr, 0, async_buffer.size);
Why?
> +}
> diff --git a/drivers/firmware/qcom/qcom_object_invoke/core.c b/drivers/firmware/qcom/qcom_object_invoke/core.c
> new file mode 100644
> index 000000000000..37dde8946b08
> --- /dev/null
> +++ b/drivers/firmware/qcom/qcom_object_invoke/core.c
> @@ -0,0 +1,1139 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/kobject.h>
> +#include <linux/sysfs.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/delay.h>
> +#include <linux/mm.h>
> +#include <linux/xarray.h>
> +
> +#include "core.h"
> +
> +/* Static 'Primordial Object' operations. */
> +
> +#define OBJECT_OP_YIELD 1
> +#define OBJECT_OP_SLEEP 2
> +
> +/* static_qcom_tee_object_primordial always exists. */
> +/* primordial_object_register and primordial_object_release extends it. */
> +
> +static struct qcom_tee_object static_qcom_tee_object_primordial;
> +
> +static int primordial_object_register(struct qcom_tee_object *object);
> +static void primordial_object_release(struct qcom_tee_object *object);
> +
> +/* Marshaling API. */
> +/*
> + * prepare_msg - Prepares input buffer for sending to TEE.
> + * update_args - Parses TEE response in input buffer.
> + * prepare_args - Parses TEE request from output buffer.
> + * update_msg - Updates output buffer with response for TEE request.
> + *
> + * prepare_msg and update_args are used in direct TEE object invocation.
> + * prepare_args and update_msg are used for TEE requests (callback or async).
> + */
> +
> +static int prepare_msg(struct qcom_tee_object_invoke_ctx *oic,
> + struct qcom_tee_object *object, unsigned long op, struct qcom_tee_arg u[]);
> +static int update_args(struct qcom_tee_arg u[], struct qcom_tee_object_invoke_ctx *oic);
> +static int prepare_args(struct qcom_tee_object_invoke_ctx *oic);
> +static int update_msg(struct qcom_tee_object_invoke_ctx *oic);
Please reorder the functions so that you don't need forward
declarations.
> +
> +static int next_arg_type(struct qcom_tee_arg u[], int i, enum qcom_tee_arg_type type)
> +{
> + while (u[i].type != QCOM_TEE_ARG_TYPE_END && u[i].type != type)
> + i++;
> +
> + return i;
> +}
> +
> +/**
> + * args_for_each_type - Iterate over argument of given type.
> + * @i: index in @args.
> + * @args: array of arguments.
> + * @at: type of argument.
> + */
> +#define args_for_each_type(i, args, at) \
> + for (i = 0, i = next_arg_type(args, i, at); \
> + args[i].type != QCOM_TEE_ARG_TYPE_END; i = next_arg_type(args, ++i, at))
> +
> +#define arg_for_each_input_buffer(i, args) args_for_each_type(i, args, QCOM_TEE_ARG_TYPE_IB)
> +#define arg_for_each_output_buffer(i, args) args_for_each_type(i, args, QCOM_TEE_ARG_TYPE_OB)
> +#define arg_for_each_input_object(i, args) args_for_each_type(i, args, QCOM_TEE_ARG_TYPE_IO)
> +#define arg_for_each_output_object(i, args) args_for_each_type(i, args, QCOM_TEE_ARG_TYPE_OO)
> +
> +/* Outside this file we use struct qcom_tee_object to identify an object. */
> +
> +/* We only allocate IDs with QCOM_TEE_OBJ_NS_BIT set in range
> + * [QCOM_TEE_OBJECT_ID_START .. QCOM_TEE_OBJECT_ID_END]. qcom_tee_object
> + * represents non-secure object. The first ID with QCOM_TEE_OBJ_NS_BIT set is reserved
> + * for primordial object.
> + */
> +
> +#define QCOM_TEE_OBJECT_PRIMORDIAL (QCOM_TEE_OBJ_NS_BIT)
> +#define QCOM_TEE_OBJECT_ID_START (QCOM_TEE_OBJECT_PRIMORDIAL + 1)
> +#define QCOM_TEE_OBJECT_ID_END (UINT_MAX)
> +
> +#define SET_QCOM_TEE_OBJECT(p, type, ...) __SET_QCOM_TEE_OBJECT(p, type, ##__VA_ARGS__, 0UL)
> +#define __SET_QCOM_TEE_OBJECT(p, type, optr, ...) do { \
> + (p)->object_type = (type); \
> + (p)->info.object_ptr = (unsigned long)(optr); \
> + (p)->release = NULL; \
> + } while (0)
> +
> +/* ''TEE Object Table''. */
> +static DEFINE_XARRAY_ALLOC(xa_qcom_tee_objects);
> +
> +struct qcom_tee_object *allocate_qcom_tee_object(void)
> +{
> + struct qcom_tee_object *object;
I thought that struct qcom_tee_object should be embedded into other
struct definitions. Here you are just allocing it. Did I misunderstand
something?
> +
> + object = kzalloc(sizeof(*object), GFP_KERNEL);
> + if (object)
> + SET_QCOM_TEE_OBJECT(object, QCOM_TEE_OBJECT_TYPE_NULL);
> +
> + return object;
> +}
> +EXPORT_SYMBOL_GPL(allocate_qcom_tee_object);
kerneldoc for all exported functions.
> +
> +void free_qcom_tee_object(struct qcom_tee_object *object)
> +{
> + kfree(object);
> +}
> +EXPORT_SYMBOL_GPL(free_qcom_tee_object);
If qcom_tee_object is refcounted, then such API is defintely forbidden.
> +
> +/* 'get_qcom_tee_object' and 'put_qcom_tee_object'. */
> +
> +static int __free_qcom_tee_object(struct qcom_tee_object *object);
What is the difference between free_qcom_tee_object() and
__free_qcom_tee_object() ?
> +static void ____destroy_qcom_tee_object(struct kref *refcount)
> +{
> + struct qcom_tee_object *object = container_of(refcount, struct qcom_tee_object, refcount);
> +
> + __free_qcom_tee_object(object);
> +}
> +
> +int get_qcom_tee_object(struct qcom_tee_object *object)
> +{
> + if (object != NULL_QCOM_TEE_OBJECT &&
> + object != ROOT_QCOM_TEE_OBJECT)
> + return kref_get_unless_zero(&object->refcount);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(get_qcom_tee_object);
> +
> +static struct qcom_tee_object *qcom_tee__get_qcom_tee_object(unsigned int object_id)
> +{
> + XA_STATE(xas, &xa_qcom_tee_objects, object_id);
> + struct qcom_tee_object *object;
> +
> + rcu_read_lock();
> + do {
> + object = xas_load(&xas);
> + if (xa_is_zero(object))
> + object = NULL_QCOM_TEE_OBJECT;
> +
> + } while (xas_retry(&xas, object));
If you are just looping over the objects, why do you need XArray instead
of list?
> +
> + /* Sure object still exists. */
Why?
> + if (!get_qcom_tee_object(object))
> + object = NULL_QCOM_TEE_OBJECT;
> +
> + rcu_read_unlock();
> +
> + return object;
> +}
> +
> +struct qcom_tee_object *qcom_tee_get_qcom_tee_object(unsigned int object_id)
> +{
> + switch (object_id) {
> + case QCOM_TEE_OBJECT_PRIMORDIAL:
> + return &static_qcom_tee_object_primordial;
> +
> + default:
> + return qcom_tee__get_qcom_tee_object(object_id);
> + }
> +}
> +
> +void put_qcom_tee_object(struct qcom_tee_object *object)
> +{
> + if (object != &static_qcom_tee_object_primordial &&
> + object != NULL_QCOM_TEE_OBJECT &&
> + object != ROOT_QCOM_TEE_OBJECT)
misaligned
> + kref_put(&object->refcount, ____destroy_qcom_tee_object);
> +}
> +EXPORT_SYMBOL_GPL(put_qcom_tee_object);
> +
> +/* 'alloc_qcom_tee_object_id' and 'erase_qcom_tee_object'. */
huh?
I think I'm going to stop here. Please:
- Split the driver into logical chunks and more logical pieces.
- Rename functions and structures to follow generic scheme. Usually it
is prefix_object_do_something().
- Add documentation that would allow us to understand what is going on.
- Also document some general design decisions. Usage of XArray. API
choice. Refcounting.
[skipped]
> diff --git a/include/linux/firmware/qcom/qcom_object_invoke.h b/include/linux/firmware/qcom/qcom_object_invoke.h
> new file mode 100644
> index 000000000000..9e6acd0f4db0
> --- /dev/null
> +++ b/include/linux/firmware/qcom/qcom_object_invoke.h
> @@ -0,0 +1,233 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef __QCOM_OBJECT_INVOKE_H
> +#define __QCOM_OBJECT_INVOKE_H
> +
> +#include <linux/kref.h>
> +#include <linux/completion.h>
> +#include <linux/workqueue.h>
> +#include <uapi/misc/qcom_tee.h>
This header doesn't exist yet. This obviously means that you haven't
actually tried building this patch. Please make sure that kernel
compiles successfully after each commit.
> +
> +struct qcom_tee_object;
> +
> +/* Primordial Object */
> +
> +/* It is used for bootstrapping the IPC connection between a VM and TEE.
> + *
> + * Each side (both the VM and the TEE) starts up with no object received from the
> + * other side. They both ''assume'' the other side implements a permanent initial
> + * object in the object table.
> + *
> + * TEE's initial object is typically called the ''root client env'', and it's
> + * invoked by VMs when they want to get a new clientEnv. The initial object created
> + * by the VMs is invoked by TEE, it's typically called the ''primordial object''.
> + *
> + * VM can register a primordial object using 'init_qcom_tee_object_user' with
> + * 'QCOM_TEE_OBJECT_TYPE_ROOT' type.
> + */
> +
> +enum qcom_tee_object_type {
> + QCOM_TEE_OBJECT_TYPE_USER = 0x1, /* TEE object. */
> + QCOM_TEE_OBJECT_TYPE_CB_OBJECT = 0x2, /* Callback Object. */
> + QCOM_TEE_OBJECT_TYPE_ROOT = 0x8, /* ''Root client env.'' or 'primordial' Object. */
> + QCOM_TEE_OBJECT_TYPE_NULL = 0x10, /* NULL object. */
> +};
> +
> +enum qcom_tee_arg_type {
> + QCOM_TEE_ARG_TYPE_END = 0,
> + QCOM_TEE_ARG_TYPE_IB = 0x80, /* Input Buffer (IB). */
> + QCOM_TEE_ARG_TYPE_OB = 0x1, /* Output Buffer (OB). */
> + QCOM_TEE_ARG_TYPE_IO = 0x81, /* Input Object (IO). */
> + QCOM_TEE_ARG_TYPE_OO = 0x2 /* Output Object (OO). */
> +};
> +
> +#define QCOM_TEE_ARG_TYPE_INPUT_MASK 0x80
> +
> +/* Maximum specific type of arguments (i.e. IB, OB, IO, and OO) that can fit in a TEE message. */
> +#define QCOM_TEE_ARGS_PER_TYPE 16
Why is it 16?
> +
> +/* Maximum arguments that can fit in a TEE message. */
> +#define QCOM_TEE_ARGS_MAX (QCOM_TEE_ARGS_PER_TYPE * 4)
> +
> +/**
> + * struct qcom_tee_arg - Argument for TEE object invocation.
> + * @type: type of argument
> + * @flags: extra flags.
> + * @b: address and size if type of argument is buffer.
> + * @o: qcom_tee_object instance if type of argument is object.
> + *
> + * @flags only accept QCOM_TEE_ARG_FLAGS_UADDR for now which states that @b
> + * contains userspace address in uaddr.
> + *
> + */
> +struct qcom_tee_arg {
> + enum qcom_tee_arg_type type;
> +
> +/* 'uaddr' holds a __user address. */
> +#define QCOM_TEE_ARG_FLAGS_UADDR 1
> + char flags;
This is not a character.
> + union {
> + struct qcom_tee_buffer {
> + union {
> + void *addr;
> + void __user *uaddr;
> + };
> + size_t size;
> + } b;
> + struct qcom_tee_object *o;
> + };
How can the code distinguish between the qcom_tee_object and
qcom_tee_buffer here?
> +};
> +
> +static inline int size_of_arg(struct qcom_tee_arg u[])
length, not size.
> +{
> + int i = 0;
> +
> + while (u[i].type != QCOM_TEE_ARG_TYPE_END)
> + i++;
> +
> + return i;
> +}
> +
> +/* Context ID - It is a unique ID assigned to a invocation which is in progress.
> + * Objects's dispatcher can use the ID to differentiate between concurrent calls.
> + * ID [0 .. 10) are reserved, i.e. never passed to object's dispatcher.
Is 10 included or excluded here? Why does it have a different bracket
type?
> + */
> +
> +struct qcom_tee_object_invoke_ctx {
> + unsigned int context_id;
> +
> +#define OIC_FLAG_BUSY 1 /* Context is busy, i.e. callback is in progress. */
> +#define OIC_FLAG_NOTIFY 2 /* Context needs to notify the current object. */
> +#define OIC_FLAG_QCOM_TEE 4 /* Context has objects shared with TEE. */
BIT(n)
> + unsigned int flags;
> +
> + /* Current object invoked in this callback context. */
> + struct qcom_tee_object *object;
> +
> + /* Arguments passed to dispatch callback. */
> + struct qcom_tee_arg u[QCOM_TEE_ARGS_MAX + 1];
> +
> + int errno;
> +
> + /* Inbound and Outbound buffers shared with TEE. */
> + struct {
> + struct qcom_tee_buffer msg;
Please define struct qcom_tee_buffer in a top-level definition instead
of having it nested somewhere in another struct;
> + } in, out;
> +};
> +
> +int qcom_tee_object_do_invoke(struct qcom_tee_object_invoke_ctx *oic,
> + struct qcom_tee_object *object, unsigned long op, struct qcom_tee_arg u[], int *result);
What's the difference between a result that gets returned by the
function and the result that gets retuned via the pointer?
> +
> +#define QCOM_TEE_OBJECT_OP_METHOD_MASK 0x0000FFFFU
> +#define QCOM_TEE_OBJECT_OP_METHOD_ID(op) ((op) & QCOM_TEE_OBJECT_OP_METHOD_MASK)
> +
> +/* Reserved Operations. */
> +
> +#define QCOM_TEE_OBJECT_OP_RELEASE (QCOM_TEE_OBJECT_OP_METHOD_MASK - 0)
> +#define QCOM_TEE_OBJECT_OP_RETAIN (QCOM_TEE_OBJECT_OP_METHOD_MASK - 1)
> +#define QCOM_TEE_OBJECT_OP_NO_OP (QCOM_TEE_OBJECT_OP_METHOD_MASK - 2)
> +
> +struct qcom_tee_object_operations {
> + void (*release)(struct qcom_tee_object *object);
> +
> + /**
> + * @op_supported:
> + *
> + * Query made to make sure the requested operation is supported. If defined,
> + * it is called before marshaling of the arguments (as optimisation).
> + */
> + int (*op_supported)(unsigned long op);
> +
> + /**
> + * @notify:
> + *
> + * After @dispatch returned, it is called to notify the status of the transport;
> + * i.e. transport errors or success. This allows the client to cleanup, if
> + * the transport fails after @dispatch submits a SUCCESS response.
> + */
> + void (*notify)(unsigned int context_id, struct qcom_tee_object *object, int status);
> +
> + int (*dispatch)(unsigned int context_id, struct qcom_tee_object *object,
> + unsigned long op, struct qcom_tee_arg args[]);
> +
> + /**
> + * @param_to_object:
> + *
> + * Called by core to do the object dependent marshaling from @param to an
> + * instance of @object (NOT IMPLEMENTED YET).
> + */
> + int (*param_to_object)(struct qcom_tee_param *param, struct qcom_tee_object *object);
> +
> + int (*object_to_param)(struct qcom_tee_object *object, struct qcom_tee_param *param);
> +};
> +
> +struct qcom_tee_object {
> + const char *name;
> + struct kref refcount;
> +
> + enum qcom_tee_object_type object_type;
> + union object_info {
> + unsigned long object_ptr;
> + } info;
> +
> + struct qcom_tee_object_operations *ops;
> +
> + /* see release_wq.c. */
> + struct work_struct work;
> +
> + /* Callback for any internal cleanup before the object's release. */
> + void (*release)(struct qcom_tee_object *object);
> +};
> +
> +/* Static instances of qcom_tee_object objects. */
> +
> +#define NULL_QCOM_TEE_OBJECT ((struct qcom_tee_object *)(0))
> +
> +/* ROOT_QCOM_TEE_OBJECT aka ''root client env''. */
> +#define ROOT_QCOM_TEE_OBJECT ((struct qcom_tee_object *)(1))
My gut feeling is that an invalid non-null pointer is a path to
disaster.
> +
> +static inline enum qcom_tee_object_type typeof_qcom_tee_object(struct qcom_tee_object *object)
> +{
> + if (object == NULL_QCOM_TEE_OBJECT)
> + return QCOM_TEE_OBJECT_TYPE_NULL;
> +
> + if (object == ROOT_QCOM_TEE_OBJECT)
> + return QCOM_TEE_OBJECT_TYPE_ROOT;
> +
> + return object->object_type;
> +}
> +
> +static inline const char *qcom_tee_object_name(struct qcom_tee_object *object)
> +{
> + if (object == NULL_QCOM_TEE_OBJECT)
> + return "null";
> +
> + if (object == ROOT_QCOM_TEE_OBJECT)
> + return "root";
> +
> + if (!object->name)
> + return "noname";
> +
> + return object->name;
> +}
> +
> +struct qcom_tee_object *allocate_qcom_tee_object(void);
> +void free_qcom_tee_object(struct qcom_tee_object *object);
> +
> +/**
> + * init_qcom_tee_object_user - Initialize an instance of qcom_tee_object.
> + * @object: object being initialized.
> + * @ot: type of object.
> + * @ops: sets of callback opeartions.
> + * @fmt: object name.
> + */
> +int init_qcom_tee_object_user(struct qcom_tee_object *object, enum qcom_tee_object_type ot,
> + struct qcom_tee_object_operations *ops, const char *fmt, ...);
> +
> +int get_qcom_tee_object(struct qcom_tee_object *object);
> +void put_qcom_tee_object(struct qcom_tee_object *object);
> +
> +#endif /* __QCOM_OBJECT_INVOKE_H */
>
> --
> 2.34.1
>
--
With best wishes
Dmitry
On Tue, Jul 02, 2024 at 10:57:35PM GMT, Amirreza Zarrabi wrote:
> Qualcomm TEE hosts Trusted Applications (TAs) and services that run in
> the secure world. Access to these resources is provided using MinkIPC.
> MinkIPC is a capability-based synchronous message passing facility. It
> allows code executing in one domain to invoke objects running in other
> domains. When a process holds a reference to an object that lives in
> another domain, that object reference is a capability. Capabilities
> allow us to separate implementation of policies from implementation of
> the transport.
>
> As part of the upstreaming of the object invoke driver (called SMC-Invoke
> driver), we need to provide a reasonable kernel API and UAPI. The clear
> option is to use TEE subsystem and write a back-end driver, however the
> TEE subsystem doesn't fit with the design of Qualcomm TEE.
>
> Does TEE subsystem fit requirements of a capability based system?
> -----------------------------------------------------------------
> In TEE subsystem, to invoke a function:
> - client should open a device file "/dev/teeX",
> - create a session with a TA, and
> - invoke the functions in that session.
>
> 1. The privilege to invoke a function is determined by a session. If a
> client has a session, it cannot share it with other clients. Even if
> it does, it is not fine-grained enough, i.e. either all accessible
> functions/resources in a session or none. Assume a scenario when a client
> wants to grant a permission to invoke just a function that it has the rights,
> to another client.
>
> The "all or nothing" for sharing sessions is not in line with our
> capability system: "if you own a capability, you should be able to grant
> or share it".
Can you please be more specific here? What kind of sharing is expected
on the user side of it?
> 2. In TEE subsystem, resources are managed in a context. Every time a
> client opens "/dev/teeX", a new context is created to keep track of
> the allocated resources, including opened sessions and remote objects. Any
> effort for sharing resources between two independent clients requires
> involvement of context manager, i.e. the back-end driver. This requires
> implementing some form of policy in the back-end driver.
What kind of resource sharing?
> 3. The TEE subsystem supports two type of memory sharing:
> - per-device memory pools, and
> - user defined memory references.
> User defined memory references are private to the application and cannot
> be shared. Memory allocated from per-device "shared" pools are accessible
> using a file descriptor. It can be mapped by any process if it has
> access to it. This means, we cannot provide the resource isolation
> between two clients. Assume a scenario when a client wants to allocate a
> memory (which is shared with TEE) from an "isolated" pool and share it
> with another client, without the right to access the contents of memory.
This doesn't explain, why would it want to share such memory with
another client.
> 4. The kernel API provided by TEE subsystem does not support a kernel
> supplicant. Adding support requires an execution context (e.g. a
> kernel thread) due to the TEE subsystem design. tee_driver_ops supports
> only "send" and "receive" callbacks and to deliver a request, someone
> should wait on "receive".
There is nothing wrong here, but maybe I'm misunderstanding something.
> We need a callback to "dispatch" or "handle" a request in the context of
> the client thread. It should redirect a request to a kernel service or
> a user supplicant. In TEE subsystem such requirement should be implemented
> in TEE back-end driver, independent from the TEE subsystem.
>
> 5. The UAPI provided by TEE subsystem is similar to the GPTEE Client
> interface. This interface is not suitable for a capability system.
> For instance, there is no session in a capability system which means
> either its should not be used, or we should overload its definition.
General comment: maybe adding more detailed explanation of how the
capabilities are aquired and how they can be used might make sense.
BTW. It might be my imperfect English, but each time I see the word
'capability' I'm thinking that some is capable of doing something. I
find it hard to use 'capability' for the reference to another object.
>
> Can we use TEE subsystem?
> -------------------------
> There are workarounds for some of the issues above. The question is if we
> should define our own UAPI or try to use a hack-y way of fitting into
> the TEE subsystem. I am using word hack-y, as most of the workaround
> involves:
>
> - "diverging from the definition". For instance, ignoring the session
> open and close ioctl calls or use file descriptors for all remote
> resources (as, fd is the closet to capability) which undermines the
> isolation provided by the contexts,
>
> - "overloading the variables". For instance, passing object ID as file
> descriptors in a place of session ID, or
>
> - "bypass TEE subsystem". For instance, extensively rely on meta
> parameters or push everything (e.g. kernel services) to the back-end
> driver, which means leaving almost all TEE subsystem unused.
>
> We cannot take the full benefits of TEE subsystem and may need to
> implement most of the requirements in the back-end driver. Also, as
> discussed above, the UAPI is not suitable for capability-based use cases.
> We proposed a new set of ioctl calls for SMC-Invoke driver.
>
> In this series we posted three patches. We implemented a transport
> driver that provides qcom_tee_object. Any object on secure side is
> represented with an instance of qcom_tee_object and any struct exposed
> to TEE should embed an instance of qcom_tee_object. Any, support for new
> services, e.g. memory object, RPMB, userspace clients or supplicants are
> implemented independently from the driver.
>
> We have a simple memory object and a user driver that uses
> qcom_tee_object.
Could you please point out any user for the uAPI? I'd like to understand
how does it from from the userspace point of view.
>
> Signed-off-by: Amirreza Zarrabi <quic_azarrabi(a)quicinc.com>
> ---
> Amirreza Zarrabi (3):
> firmware: qcom: implement object invoke support
> firmware: qcom: implement memory object support for TEE
> firmware: qcom: implement ioctl for TEE object invocation
>
> drivers/firmware/qcom/Kconfig | 36 +
> drivers/firmware/qcom/Makefile | 2 +
> drivers/firmware/qcom/qcom_object_invoke/Makefile | 12 +
> drivers/firmware/qcom/qcom_object_invoke/async.c | 142 +++
> drivers/firmware/qcom/qcom_object_invoke/core.c | 1139 ++++++++++++++++++
> drivers/firmware/qcom/qcom_object_invoke/core.h | 186 +++
> .../qcom/qcom_object_invoke/qcom_scm_invoke.c | 22 +
> .../firmware/qcom/qcom_object_invoke/release_wq.c | 90 ++
> .../qcom/qcom_object_invoke/xts/mem_object.c | 406 +++++++
> .../qcom_object_invoke/xts/object_invoke_uapi.c | 1231 ++++++++++++++++++++
> include/linux/firmware/qcom/qcom_object_invoke.h | 233 ++++
> include/uapi/misc/qcom_tee.h | 117 ++
> 12 files changed, 3616 insertions(+)
> ---
> base-commit: 74564adfd3521d9e322cfc345fdc132df80f3c79
> change-id: 20240702-qcom-tee-object-and-ioctls-6f52fde03485
>
> Best regards,
> --
> Amirreza Zarrabi <quic_azarrabi(a)quicinc.com>
>
--
With best wishes
Dmitry
Am 27.06.24 um 05:21 schrieb Jason-JH Lin (林睿祥):
>
> On Wed, 2024-06-26 at 19:56 +0200, Daniel Vetter wrote:
> >
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> > On Wed, Jun 26, 2024 at 12:49:02PM +0200, Christian König wrote:
> > > Am 26.06.24 um 10:05 schrieb Jason-JH Lin (林睿祥):
> > > > > > I think I have the same problem as the ECC_FLAG mention in:
> > > > > > > >
> > https://lore.kernel.org/linux-media/20240515-dma-buf-ecc-heap-v1-0-54cbbd04…
> > > > > > > > I think it would be better to have the user configurable
> > private
> > > > > > information in dma-buf, so all the drivers who have the same
> > > > > > requirement can get their private information from dma-buf
> > directly
> > > > > > and
> > > > > > no need to change or add the interface.
> > > > > > > > What's your opinion in this point?
> > > > > > Well of hand I don't see the need for that.
> > > > > > What happens if you get a non-secure buffer imported in your
> > secure
> > > > > device?
> > > >
> > > > We use the same mediatek-drm driver for secure and non-secure
> > buffer.
> > > > If non-secure buffer imported to mediatek-drm driver, it's go to
> > the
> > > > normal flow with normal hardware settings.
> > > >
> > > > We use different configurations to make hardware have different
> > > > permission to access the buffer it should access.
> > > >
> > > > So if we can't get the information of "the buffer is allocated
> > from
> > > > restricted_mtk_cma" when importing the buffer into the driver, we
> > won't
> > > > be able to configure the hardware correctly.
> > >
> > > Why can't you get this information from userspace?
> >
> > Same reason amd and i915/xe also pass this around internally in the
> > kernel, it's just that for those gpus the render and kms node are the
> > same
> > driver so this is easy.
> >
The reason I ask is that encryption here looks just like another
parameter for the buffer, e.g. like format, stride, tilling etc..
So instead of this during buffer import:
mtk_gem->secure = (!strncmp(attach->dmabuf->exp_name, "restricted", 10));
mtk_gem->dma_addr = sg_dma_address(sg->sgl);
mtk_gem->size = attach->dmabuf->size;
mtk_gem->sg = sg;
You can trivially say during use hey this buffer is encrypted.
At least that's my 10 mile high view, maybe I'm missing some extensive
key exchange or something like that.
>
> > But on arm you have split designs everywhere and dma-buf
> > import/export, so
> > something else is needed. And neither current kms uapi nor
> > protocols/extensions have provisions for this (afaik) because it
> > works on
> > the big gpus, and on android it's just hacked up with backchannels.
> >
> > So yeah essentially I think we probably need something like this, as
> > much
> > as it sucks. I see it somewhat similar to handling pcip2pdma
> > limitations
> > in the kernel too.
> >
> > Not sure where/how it should be handled though, and maybe I've missed
> > something around protocols, in which case I guess we should add some
> > secure buffer flags to the ADDFB2 ioctl.
>
> Thanks for your hint, I'll try to add the secure flag to the ADDFB2
> ioctl. If it works, I'll send the patch.
Yeah, exactly what I would suggest as well.
I'm not an expert for that part, but as far as I know we already have
bunch of device specific tilling flags in there.
Adding an MTK_ENCRYPTED flag should be trivial.
Regards,
Christian.
>
> Regards,
> Jason-JH.Lin
>
> > -Sima
>
> ************* MEDIATEK Confidentiality Notice ********************
> The information contained in this e-mail message (including any
> attachments) may be confidential, proprietary, privileged, or otherwise
> exempt from disclosure under applicable laws. It is intended to be
> conveyed only to the designated recipient(s). Any use, dissemination,
> distribution, printing, retaining or copying of this e-mail (including its
> attachments) by unintended recipient(s) is strictly prohibited and may
> be unlawful. If you are not an intended recipient of this e-mail, or believe
> that you have received this e-mail in error, please notify the sender
> immediately (by replying to this e-mail), delete any and all copies of
> this e-mail (including any attachments) from your system, and do not
> disclose the content of this e-mail to any other person. Thank you!