Hi,
On Mon, Oct 23, 2023 at 10:25:50AM -0700, Doug Anderson wrote:
> On Mon, Oct 23, 2023 at 9:31 AM Yuran Pereira <yuran.pereira(a)hotmail.com> wrote:
> >
> > Since "Clean up checks for already prepared/enabled in panels" has
> > already been done and merged [1], I think there is no longer a need
> > for this item to be in the gpu TODO.
> >
> > [1] https://patchwork.freedesktop.org/patch/551421/
> >
> > Signed-off-by: Yuran Pereira <yuran.pereira(a)hotmail.com>
> > ---
> > Documentation/gpu/todo.rst | 25 -------------------------
> > 1 file changed, 25 deletions(-)
>
> It's not actually all done. It's in a bit of a limbo state right now,
> unfortunately. I landed all of the "simple" cases where panels were
> needlessly tracking prepare/enable, but the less simple cases are
> still outstanding.
>
> Specifically the issue is that many panels have code to properly power
> cycle themselves off at shutdown time and in order to do that they
> need to keep track of the prepare/enable state. After a big, long
> discussion [1] it was decided that we could get rid of all the panel
> code handling shutdown if only all relevant DRM KMS drivers would
> properly call drm_atomic_helper_shutdown().
>
> I made an attempt to get DRM KMS drivers to call
> drm_atomic_helper_shutdown() [2] [3] [4]. I was able to land the
> patches that went through drm-misc, but currently many of the
> non-drm-misc ones are blocked waiting for attention.
>
> ...so things that could be done to help out:
>
> a) Could review patches that haven't landed in [4]. Maybe adding a
> Reviewed-by tag would help wake up maintainers?
>
> b) Could see if you can identify panels that are exclusively used w/
> DRM drivers that have already been converted and then we could post
> patches for just those panels. I have no idea how easy this task would
> be. Is it enough to look at upstream dts files by "compatible" string?
I think it is, yes.
Maxime
Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
This a big update compared to the previous patch set [1].
The TEE subsystem handles the DMA-buf allocations since it is the TEE
(OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QTEE) which sets up the
restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
how to allocate the restricted physical memory.
TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also
a use-case parameter. This is used by the backend TEE driver to decide on
allocation policy and which devices should be able to access the memory.
Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
Recording) has been identified so far to serve as examples of what can be
expected. More use-cases can be added in userspace ABI, but it's up to the
backend TEE drivers to provide the implementation.
Each use-case has it's own restricted memory pool since different use-cases
requires isolation from different parts of the system. A restricted memory
pool can be based on a static carveout instantiated while probing the TEE
backend driver, or dynamically allocated from CMA and made restricted as
needed by the TEE.
This can be tested on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b prototype/sdp-v3
repo sync -j8
cd build
make toolchains -j$(nproc)
make SPMC_AT_EL=1 all -j$(nproc)
make SPMC_AT_EL=1 run-only
# login and at the prompt:
xtest --sdp-basic
The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
without FF-A using the original SMC ABI instead. Please remember to do
%rm -rf ../trusted-firmware-a/build/qemu
for TF-A to be rebuilt properly using the new configuration.
https://optee.readthedocs.io/en/latest/building/prerequisites.html
list dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in
the secure world can access and manipulate the memory. There are also some
negative tests for out of bounds buffers etc.
Thanks,
Jens
[1] https://lore.kernel.org/lkml/20241015101716.740829-1-jens.wiklander@linaro.…
Changes since the V2 RFC:
* Based on v6.12
* Replaced the flags for SVP and Trusted UID memory with a u32 field with
unique id for each use case
* Added dynamic allocation of restricted memory pools
* Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
* Added support for FF-A with FFA_LEND
Changes since the V1 RFC:
* Based on v6.11
* Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of dma-buf handling is done in
the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
support"
* Replaced the word "secure" with "restricted" where applicable
Jens Wiklander (4):
tee: add restricted memory allocation
optee: account for direction while converting parameters
optee: sync secure world ABI headers
optee: support restricted memory allocation
drivers/tee/Makefile | 1 +
drivers/tee/optee/Makefile | 1 +
drivers/tee/optee/call.c | 10 +-
drivers/tee/optee/core.c | 1 +
drivers/tee/optee/ffa_abi.c | 178 +++++++++++++-
drivers/tee/optee/optee_ffa.h | 27 ++-
drivers/tee/optee/optee_msg.h | 65 ++++-
drivers/tee/optee/optee_private.h | 75 ++++--
drivers/tee/optee/optee_smc.h | 71 +++++-
drivers/tee/optee/rpc.c | 31 ++-
drivers/tee/optee/rstmem.c | 380 ++++++++++++++++++++++++++++++
drivers/tee/optee/smc_abi.c | 214 +++++++++++++++--
drivers/tee/tee_core.c | 37 ++-
drivers/tee/tee_private.h | 2 +
drivers/tee/tee_rstmem.c | 201 ++++++++++++++++
drivers/tee/tee_shm.c | 2 +
drivers/tee/tee_shm_pool.c | 69 +++++-
include/linux/tee_core.h | 15 ++
include/linux/tee_drv.h | 4 +-
include/uapi/linux/tee.h | 37 ++-
20 files changed, 1344 insertions(+), 77 deletions(-)
create mode 100644 drivers/tee/optee/rstmem.c
create mode 100644 drivers/tee/tee_rstmem.c
--
2.43.0
On Thu, Nov 21, 2024 at 06:31:34PM +0530, Jyothi Kumar Seerapu wrote:
> I2C functionality has dependencies on the GPI driver.
> Ensure that the GPI driver is enabled when using the I2C
> driver functionality.
> Therefore, update the I2C GENI driver to depend on the GPI driver.
>
> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu(a)quicinc.com>
> ---
> v2 -> v3:
> - Moved this change to patch3.
> - Updated commit description.
>
> v1 -> v2:
> - This patch is added in v2 to address the kernel test robot
> reported compilation error.
> ERROR: modpost: "gpi_multi_desc_process" [drivers/i2c/busses/i2c-qcom-geni.ko] undefined!
>
> drivers/i2c/busses/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
> index 0aa948014008..87634a682855 100644
> --- a/drivers/i2c/busses/Kconfig
> +++ b/drivers/i2c/busses/Kconfig
> @@ -1049,6 +1049,7 @@ config I2C_QCOM_GENI
> tristate "Qualcomm Technologies Inc.'s GENI based I2C controller"
> depends on ARCH_QCOM || COMPILE_TEST
> depends on QCOM_GENI_SE
> + depends on QCOM_GPI_DMA
So... without this change the previous patch is broken, which is a
no-go. And anyway, adding dependency onto a particular DMA driver is a
bad idea. Please make use of the DMA API instead.
> help
> This driver supports GENI serial engine based I2C controller in
> master mode on the Qualcomm Technologies Inc.'s SoCs. If you say
> --
> 2.17.1
>
--
With best wishes
Dmitry
On Mon, Nov 18, 2024 at 1:15 AM Tvrtko Ursulin <tursulin(a)ursulin.net> wrote:
>
>
> On 17/11/2024 17:03, T.J. Mercier wrote:
> > The arguments for __dma_buf_debugfs_list_del do not match for both the
> > CONFIG_DEBUG_FS case and the !CONFIG_DEBUG_FS case. The !CONFIG_DEBUG_FS
> > case should take a struct dma_buf *, but it's currently struct file *.
> > This can lead to the build error:
> >
> > error: passing argument 1 of ‘__dma_buf_debugfs_list_del’ from
> > incompatible pointer type [-Werror=incompatible-pointer-types]
> >
> > dma-buf.c:63:53: note: expected ‘struct file *’ but argument is of
> > type ‘struct dma_buf *’
> > 63 | static void __dma_buf_debugfs_list_del(struct file *file)
> >
> > Fixes: bfc7bc539392 ("dma-buf: Do not build debugfs related code when !CONFIG_DEBUG_FS")
> > Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
> > ---
> > drivers/dma-buf/dma-buf.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 8892bc701a66..afb8c1c50107 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -60,7 +60,7 @@ static void __dma_buf_debugfs_list_add(struct dma_buf *dmabuf)
> > {
> > }
> >
> > -static void __dma_buf_debugfs_list_del(struct file *file)
> > +static void __dma_buf_debugfs_list_del(struct dma_buf *dmabuf)
> > {
> > }
> > #endif
>
> Huh I wonder how this sneaked by until now.. thanks for fixing!
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
>
> Regards,
>
> Tvrtko
Thanks Tvrtko. Upstream there is currently only the one use where it's
called with a void pointer which doesn't generate the error, but
KernelCI caught the problem on an Android branch where it's also
called with a dma_buf pointer:
https://dashboard.kernelci.org/tree/5a4c93e2f794001a5efa13c0dec931235240d38…
The arguments for __dma_buf_debugfs_list_del do not match for both the
CONFIG_DEBUG_FS case and the !CONFIG_DEBUG_FS case. The !CONFIG_DEBUG_FS
case should take a struct dma_buf *, but it's currently struct file *.
This can lead to the build error:
error: passing argument 1 of ‘__dma_buf_debugfs_list_del’ from
incompatible pointer type [-Werror=incompatible-pointer-types]
dma-buf.c:63:53: note: expected ‘struct file *’ but argument is of
type ‘struct dma_buf *’
63 | static void __dma_buf_debugfs_list_del(struct file *file)
Fixes: bfc7bc539392 ("dma-buf: Do not build debugfs related code when !CONFIG_DEBUG_FS")
Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
---
drivers/dma-buf/dma-buf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 8892bc701a66..afb8c1c50107 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -60,7 +60,7 @@ static void __dma_buf_debugfs_list_add(struct dma_buf *dmabuf)
{
}
-static void __dma_buf_debugfs_list_del(struct file *file)
+static void __dma_buf_debugfs_list_del(struct dma_buf *dmabuf)
{
}
#endif
--
2.47.0.338.g60cca15819-goog
Am 14.11.24 um 12:14 schrieb Tvrtko Ursulin:
> From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
>
> One alternative to the fix Christian proposed in
> https://lore.kernel.org/dri-devel/20241024124159.4519-3-christian.koenig@am…
> is to replace the rather complex open coded sorting loops with the kernel
> standard sort followed by a context squashing pass.
>
> Proposed advantage of this would be readability but one concern Christian
> raised was that there could be many fences, that they are typically mostly
> sorted, and so the kernel's heap sort would be much worse by the proposed
> algorithm.
>
> I had a look running some games and vkcube to see what are the typical
> number of input fences. Tested scenarios:
>
> 1) Hogwarts Legacy under Gamescope
>
> 450 calls per second to __dma_fence_unwrap_merge.
>
> Percentages per number of fences buckets, before and after checking for
> signalled status, sorting and flattening:
>
> N Before After
> 0 0.91%
> 1 69.40%
> 2-3 28.72% 9.4% (90.6% resolved to one fence)
> 4-5 0.93%
> 6-9 0.03%
> 10+
>
> 2) Cyberpunk 2077 under Gamescope
>
> 1050 calls per second, amounting to 0.01% CPU time according to perf top.
>
> N Before After
> 0 1.13%
> 1 52.30%
> 2-3 40.34% 55.57%
> 4-5 1.46% 0.50%
> 6-9 2.44%
> 10+ 2.34%
>
> 3) vkcube under Plasma
>
> 90 calls per second.
>
> N Before After
> 0
> 1
> 2-3 100% 0% (Ie. all resolved to a single fence)
> 4-5
> 6-9
> 10+
>
> In the case of vkcube all invocations in the 2-3 bucket were actually
> just two input fences.
>
> From these numbers it looks like the heap sort should not be a
> disadvantage, given how the dominant case is <= 2 input fences which heap
> sort solves with just one compare and swap. (And for the case of one input
> fence we have a fast path in the previous patch.)
>
> A complementary possibility is to implement a different sorting algorithm
> under the same API as the kernel's sort() and so keep the simplicity,
> potentially moving the new sort under lib/ if it would be found more
> widely useful.
>
> v2:
> * Hold on to fence references and reduce commentary. (Christian)
> * Record and use latest signaled timestamp in the 2nd loop too.
> * Consolidate zero or one fences fast paths.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
> Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3617
> Cc: Christian König <christian.koenig(a)amd.com>
> Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> Cc: Gustavo Padovan <gustavo(a)padovan.org>
> Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
> Cc: linux-media(a)vger.kernel.org
> Cc: dri-devel(a)lists.freedesktop.org
> Cc: linaro-mm-sig(a)lists.linaro.org
> Cc: <stable(a)vger.kernel.org> # v6.0+
> ---
> drivers/dma-buf/dma-fence-unwrap.c | 129 ++++++++++++++---------------
> 1 file changed, 64 insertions(+), 65 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
> index 628af51c81af..26cad03340ce 100644
> --- a/drivers/dma-buf/dma-fence-unwrap.c
> +++ b/drivers/dma-buf/dma-fence-unwrap.c
> @@ -12,6 +12,7 @@
> #include <linux/dma-fence-chain.h>
> #include <linux/dma-fence-unwrap.h>
> #include <linux/slab.h>
> +#include <linux/sort.h>
>
> /* Internal helper to start new array iteration, don't use directly */
> static struct dma_fence *
> @@ -59,6 +60,25 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor)
> }
> EXPORT_SYMBOL_GPL(dma_fence_unwrap_next);
>
> +
> +static int fence_cmp(const void *_a, const void *_b)
> +{
> + struct dma_fence *a = *(struct dma_fence **)_a;
> + struct dma_fence *b = *(struct dma_fence **)_b;
> +
> + if (a->context < b->context)
> + return -1;
> + else if (a->context > b->context)
> + return 1;
> +
> + if (dma_fence_is_later(b, a))
> + return -1;
> + else if (dma_fence_is_later(a, b))
> + return 1;
> +
> + return 0;
> +}
> +
> /* Implementation for the dma_fence_merge() marco, don't use directly */
> struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
> struct dma_fence **fences,
> @@ -67,8 +87,7 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
> struct dma_fence_array *result;
> struct dma_fence *tmp, **array;
> ktime_t timestamp;
> - unsigned int i;
> - size_t count;
> + int i, j, count;
>
> count = 0;
> timestamp = ns_to_ktime(0);
> @@ -96,78 +115,58 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
> if (!array)
> return NULL;
>
> - /*
> - * This trashes the input fence array and uses it as position for the
> - * following merge loop. This works because the dma_fence_merge()
> - * wrapper macro is creating this temporary array on the stack together
> - * with the iterators.
> - */
> - for (i = 0; i < num_fences; ++i)
> - fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
> -
> count = 0;
> - do {
> - unsigned int sel;
> -
> -restart:
> - tmp = NULL;
> - for (i = 0; i < num_fences; ++i) {
> - struct dma_fence *next;
> -
> - while (fences[i] && dma_fence_is_signaled(fences[i]))
> - fences[i] = dma_fence_unwrap_next(&iter[i]);
> -
> - next = fences[i];
> - if (!next)
> - continue;
> -
> - /*
> - * We can't guarantee that inpute fences are ordered by
> - * context, but it is still quite likely when this
> - * function is used multiple times. So attempt to order
> - * the fences by context as we pass over them and merge
> - * fences with the same context.
> - */
> - if (!tmp || tmp->context > next->context) {
> - tmp = next;
> - sel = i;
> -
> - } else if (tmp->context < next->context) {
> - continue;
> -
> - } else if (dma_fence_is_later(tmp, next)) {
> - fences[i] = dma_fence_unwrap_next(&iter[i]);
> - goto restart;
> + for (i = 0; i < num_fences; ++i) {
> + dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
> + if (!dma_fence_is_signaled(tmp)) {
> + array[count++] = dma_fence_get(tmp);
> } else {
> - fences[sel] = dma_fence_unwrap_next(&iter[sel]);
> - goto restart;
> + ktime_t t = dma_fence_timestamp(tmp);
> +
> + if (ktime_after(t, timestamp))
> + timestamp = t;
> }
> }
> + }
>
> - if (tmp) {
> - array[count++] = dma_fence_get(tmp);
> - fences[sel] = dma_fence_unwrap_next(&iter[sel]);
> + if (count == 0 || count == 1)
> + goto return_fastpath;
> +
> + sort(array, count, sizeof(*array), fence_cmp, NULL);
> +
> + /*
> + * Only keep the most recent fence for each context.
> + */
> + j = 0;
> + tmp = array[0];
> + for (i = 1; i < count; i++) {
> + if (array[i]->context != tmp->context)
> + array[j++] = tmp;
> + else
> + dma_fence_put(tmp);
If I'm not completely mistaken that can result in dropping the first
element but not assigning it again.
E.g. array[0] is potentially invalid after the loop.
> + tmp = array[i];
> + }
> + if (j == 0 || tmp->context != array[j - 1]->context) {
> + array[j++] = tmp;
> + }
Maybe adjust the sort criteria so that the highest seqno comes first.
This reduces the whole loop to something like this:
j = 0;
for (i = 1; i < count; i++) {
if (array[i]->context == array[j]->context)
dma_fence_put(array[i]);
else
array[++j] = array[i];
}
count = ++j;
Regards,
Christian.
> + count = j;
> +
> + if (count > 1) {
> + result = dma_fence_array_create(count, array,
> + dma_fence_context_alloc(1),
> + 1, false);
> + if (!result) {
> + tmp = NULL;
> + goto return_tmp;
> }
> - } while (tmp);
> -
> - if (count == 0) {
> - tmp = dma_fence_allocate_private_stub(ktime_get());
> - goto return_tmp;
> + return &result->base;
> }
>
> - if (count == 1) {
> +return_fastpath:
> + if (count == 0)
> + tmp = dma_fence_allocate_private_stub(timestamp);
> + else
> tmp = array[0];
> - goto return_tmp;
> - }
> -
> - result = dma_fence_array_create(count, array,
> - dma_fence_context_alloc(1),
> - 1, false);
> - if (!result) {
> - tmp = NULL;
> - goto return_tmp;
> - }
> - return &result->base;
>
> return_tmp:
> kfree(array);