Hi Amir,
On Fri, Mar 28, 2025 at 3:48 AM Amirreza Zarrabi
<amirreza.zarrabi(a)oss.qualcomm.com> wrote:
>
> The tee_context can be used to manage TEE user resources, including
> those allocated by the driver for the TEE on behalf of the user.
> The release() callback is invoked only when all resources, such as
> tee_shm, are released and there are no references to the tee_context.
>
> When a user closes the device file, the driver should notify the
> TEE to release any resources it may hold and drop the context
> references. To achieve this, a close_context() callback is
> introduced to initiate resource release in the TEE driver when
> the device file is closed.
>
> Relocate the teedev_ctx_get(), teedev_ctx_put(), tee_device_get(), and
> tee_device_put() functions to tee_drv.h to make them accessible
> outside the TEE subsystem.
>
> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi(a)oss.qualcomm.com>
> ---
> drivers/tee/tee_core.c | 39 +++++++++++++++++++++++++++++++++++++++
> drivers/tee/tee_private.h | 6 ------
> include/linux/tee_core.h | 11 +++++++++--
> include/linux/tee_drv.h | 40 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 88 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
> index 24edce4cdbaa..22cc7d624b0c 100644
> --- a/drivers/tee/tee_core.c
> +++ b/drivers/tee/tee_core.c
> @@ -72,6 +72,20 @@ struct tee_context *teedev_open(struct tee_device *teedev)
> }
> EXPORT_SYMBOL_GPL(teedev_open);
>
> +/**
> + * teedev_ctx_get() - Increment the reference count of a context
> + *
> + * This function increases the refcount of the context, which is tied to
> + * resources shared by the same tee_device. During the unregistration process,
> + * the context may remain valid even after tee_device_unregister() has returned.
> + *
> + * Users should ensure that the context's refcount is properly decreased before
> + * calling tee_device_put(), typically within the context's release() function.
> + * Alternatively, users can call tee_device_get() and teedev_ctx_get() together
> + * and release them simultaneously (see shm_alloc_helper()).
> + *
> + * @ctx: Pointer to the context
Please move this @ctx line to before the verbose description of the function.
Cheers,
Jens
> + */
> void teedev_ctx_get(struct tee_context *ctx)
> {
> if (ctx->releasing)
> @@ -79,6 +93,7 @@ void teedev_ctx_get(struct tee_context *ctx)
>
> kref_get(&ctx->refcount);
> }
> +EXPORT_SYMBOL_GPL(teedev_ctx_get);
>
> static void teedev_ctx_release(struct kref *ref)
> {
> @@ -89,6 +104,10 @@ static void teedev_ctx_release(struct kref *ref)
> kfree(ctx);
> }
>
> +/**
> + * teedev_ctx_put() - Decrease reference count on a context
> + * @ctx: pointer to the context
> + */
> void teedev_ctx_put(struct tee_context *ctx)
> {
> if (ctx->releasing)
> @@ -96,11 +115,15 @@ void teedev_ctx_put(struct tee_context *ctx)
>
> kref_put(&ctx->refcount, teedev_ctx_release);
> }
> +EXPORT_SYMBOL_GPL(teedev_ctx_put);
>
> void teedev_close_context(struct tee_context *ctx)
> {
> struct tee_device *teedev = ctx->teedev;
>
> + if (teedev->desc->ops->close_context)
> + teedev->desc->ops->close_context(ctx);
> +
> teedev_ctx_put(ctx);
> tee_device_put(teedev);
> }
> @@ -1024,6 +1047,10 @@ int tee_device_register(struct tee_device *teedev)
> }
> EXPORT_SYMBOL_GPL(tee_device_register);
>
> +/**
> + * tee_device_put() - Decrease the user count for a tee_device
> + * @teedev: pointer to the tee_device
> + */
> void tee_device_put(struct tee_device *teedev)
> {
> mutex_lock(&teedev->mutex);
> @@ -1037,7 +1064,18 @@ void tee_device_put(struct tee_device *teedev)
> }
> mutex_unlock(&teedev->mutex);
> }
> +EXPORT_SYMBOL_GPL(tee_device_put);
>
> +/**
> + * tee_device_get() - Increment the user count for a tee_device
> + * @teedev: Pointer to the tee_device
> + *
> + * If tee_device_unregister() has been called and the final user of @teedev
> + * has already released the device, this function will fail to prevent new users
> + * from accessing the device during the unregistration process.
> + *
> + * Returns: true if @teedev remains valid, otherwise false
> + */
> bool tee_device_get(struct tee_device *teedev)
> {
> mutex_lock(&teedev->mutex);
> @@ -1049,6 +1087,7 @@ bool tee_device_get(struct tee_device *teedev)
> mutex_unlock(&teedev->mutex);
> return true;
> }
> +EXPORT_SYMBOL_GPL(tee_device_get);
>
> /**
> * tee_device_unregister() - Removes a TEE device
> diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h
> index 9bc50605227c..d3f40a03de36 100644
> --- a/drivers/tee/tee_private.h
> +++ b/drivers/tee/tee_private.h
> @@ -14,12 +14,6 @@
>
> int tee_shm_get_fd(struct tee_shm *shm);
>
> -bool tee_device_get(struct tee_device *teedev);
> -void tee_device_put(struct tee_device *teedev);
> -
> -void teedev_ctx_get(struct tee_context *ctx);
> -void teedev_ctx_put(struct tee_context *ctx);
> -
> struct tee_shm *tee_shm_alloc_user_buf(struct tee_context *ctx, size_t size);
> struct tee_shm *tee_shm_register_user_buf(struct tee_context *ctx,
> unsigned long addr, size_t length);
> diff --git a/include/linux/tee_core.h b/include/linux/tee_core.h
> index a38494d6b5f4..8a4c9e30b652 100644
> --- a/include/linux/tee_core.h
> +++ b/include/linux/tee_core.h
> @@ -65,8 +65,9 @@ struct tee_device {
> /**
> * struct tee_driver_ops - driver operations vtable
> * @get_version: returns version of driver
> - * @open: called when the device file is opened
> - * @release: release this open file
> + * @open: called for a context when the device file is opened
> + * @close_context: called when the device file is closed
> + * @release: called to release the context
> * @open_session: open a new session
> * @close_session: close a session
> * @system_session: declare session as a system session
> @@ -76,11 +77,17 @@ struct tee_device {
> * @supp_send: called for supplicant to send a response
> * @shm_register: register shared memory buffer in TEE
> * @shm_unregister: unregister shared memory buffer in TEE
> + *
> + * The context given to @open might last longer than the device file if it is
> + * tied to other resources in the TEE driver. @close_context is called when the
> + * client closes the device file, even if there are existing references to the
> + * context. The TEE driver can use @close_context to start cleaning up.
> */
> struct tee_driver_ops {
> void (*get_version)(struct tee_device *teedev,
> struct tee_ioctl_version_data *vers);
> int (*open)(struct tee_context *ctx);
> + void (*close_context)(struct tee_context *ctx);
> void (*release)(struct tee_context *ctx);
> int (*open_session)(struct tee_context *ctx,
> struct tee_ioctl_open_session_arg *arg,
> diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
> index a54c203000ed..ce23fd42c5d4 100644
> --- a/include/linux/tee_drv.h
> +++ b/include/linux/tee_drv.h
> @@ -96,6 +96,46 @@ struct tee_param {
> } u;
> };
>
> +/**
> + * tee_device_get() - Increment the user count for a tee_device
> + * @teedev: Pointer to the tee_device
> + *
> + * If tee_device_unregister() has been called and the final user of @teedev
> + * has already released the device, this function will fail to prevent new users
> + * from accessing the device during the unregistration process.
> + *
> + * Returns: true if @teedev remains valid, otherwise false
> + */
> +bool tee_device_get(struct tee_device *teedev);
> +
> +/**
> + * tee_device_put() - Decrease the user count for a tee_device
> + * @teedev: pointer to the tee_device
> + */
> +void tee_device_put(struct tee_device *teedev);
> +
> +/**
> + * teedev_ctx_get() - Increment the reference count of a context
> + *
> + * This function increases the refcount of the context, which is tied to
> + * resources shared by the same tee_device. During the unregistration process,
> + * the context may remain valid even after tee_device_unregister() has returned.
> + *
> + * Users should ensure that the context's refcount is properly decreased before
> + * calling tee_device_put(), typically within the context's release() function.
> + * Alternatively, users can call tee_device_get() and teedev_ctx_get() together
> + * and release them simultaneously (see shm_alloc_helper()).
> + *
> + * @ctx: Pointer to the context
> + */
> +void teedev_ctx_get(struct tee_context *ctx);
> +
> +/**
> + * teedev_ctx_put() - Decrease reference count on a context
> + * @ctx: pointer to the context
> + */
> +void teedev_ctx_put(struct tee_context *ctx);
> +
> /**
> * tee_shm_alloc_kernel_buf() - Allocate kernel shared memory for a
> * particular TEE client driver
>
> --
> 2.34.1
>
On Mon, Apr 07, 2025 at 02:43:20PM +0800, Muchun Song wrote:
> By the way, in case you truly struggle to comprehend the fundamental
> aspects of HVO, I would like to summarize for you the user-visible
> behaviors in comparison to the situation where HVO is disabled.
>
> HVO Status    Tail Page Structures    Head Page Structures
> Enabled       Read-Only (RO)          Read-Write (RW)
> Disabled      Read-Write (RW)         Read-Write (RW)
>
> The sole distinction between the two scenarios lies in whether the
> tail page structures are allowed to be written or not. Please refrain
> from getting bogged down in the details of the implementation of HVO.
This feels extremely fragile to me. I doubt many people know which
operations need read vs. write access to tail pages, or whether a
higher-level operation needs access to tail pages at all.
Hi,
This patch set allocates the protected DMA-bufs from a DMA-heap
instantiated from the TEE subsystem.
The TEE subsystem handles the DMA-buf allocations since it is the TEE
(OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QTEE) which sets up the
protection for the memory used for the DMA-bufs.
The DMA-heap uses a protected memory pool provided by the backend TEE
driver, allowing it to choose how to allocate the protected physical
memory.
The allocated DMA-bufs must be imported with a new TEE_IOC_SHM_REGISTER_FD
before they can be passed as arguments when requesting services from the
secure world.
Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
Recording) have been identified so far to serve as examples of what can be
expected. The use-cases have predefined DMA-heap names,
"protected,secure-video", "protected,trusted-ui", and
"protected,secure-video-record". The backend driver registers protected
memory pools for the use-cases it supports.
Each use-case has its own protected memory pool since different use-cases
require isolation from different parts of the system. A protected memory
pool can be based on a static carveout instantiated while probing the TEE
backend driver, or dynamically allocated from CMA and made protected as
needed by the TEE.
This can be tested on a RockPi 4B+ with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m rockpi4.xml \
-b prototype/sdp-v7
repo sync -j8
cd build
make toolchains -j$(nproc)
make all -j$(nproc)
# Copy ../out/rockpi4.img to an SD card and boot the RockPi from that
# Connect a monitor to the RockPi
# login and at the prompt:
gst-launch-1.0 videotestsrc ! \
aesenc key=1f9423681beb9a79215820f6bda73d0f \
iv=e9aa8e834d8d70b7e0d254ff670dd718 serialize-iv=true ! \
aesdec key=1f9423681beb9a79215820f6bda73d0f ! \
kmssink
The aesdec module has been hacked to use an OP-TEE TA to decrypt the stream
into protected DMA-bufs which are consumed by the kmssink.
The primitive QEMU tests from the previous patch set can be run on RockPi
in the same way with:
xtest --sdp-basic
The primitive tests can be run on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b prototype/sdp-v7
repo sync -j8
cd build
make toolchains -j$(nproc)
make SPMC_AT_EL=1 all -j$(nproc)
make SPMC_AT_EL=1 run-only
# login and at the prompt:
xtest --sdp-basic
The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
without FF-A using the original SMC ABI instead. Please remember to do
%rm -rf ../trusted-firmware-a/build/qemu
for TF-A to be rebuilt properly using the new configuration.
https://optee.readthedocs.io/en/latest/building/prerequisites.html
lists the dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in
the secure world can access and manipulate the memory. There are also some
negative tests for out-of-bounds buffers etc.
Thanks,
Jens
Changes since V6:
* Restricted memory is now known as protected memory, to use the same
  term as https://docs.vulkan.org/guide/latest/protected.html. Update all
  patches to consistently use protected memory.
* In "tee: implement protected DMA-heap" add the hidden config option
TEE_DMABUF_HEAP to tell if the DMABUF_HEAPS functions are available
for the TEE subsystem
* Adding "tee: refactor params_from_user()", broken out from the patch
"tee: new ioctl to a register tee_shm from a dmabuf file descriptor"
* For "tee: new ioctl to a register tee_shm from a dmabuf file descriptor":
- Update commit message to mention protected memory
- Remove and open code tee_shm_get_parent_shm() in param_from_user_memref()
* In "tee: add tee_shm_alloc_cma_phys_mem" add the hidden config option
TEE_CMA to tell if the CMA functions are available for the TEE subsystem
* For "tee: tee_device_alloc(): copy dma_mask from parent device" and
"optee: pass parent device to tee_device_alloc", added
Reviewed-by: Sumit Garg <sumit.garg(a)kernel.org>
Changes since V5:
* Removing "tee: add restricted memory allocation" and
"tee: add TEE_IOC_RSTMEM_FD_INFO"
* Adding "tee: implement restricted DMA-heap",
"tee: new ioctl to a register tee_shm from a dmabuf file descriptor",
"tee: add tee_shm_alloc_cma_phys_mem()",
"optee: pass parent device to tee_device_alloc()", and
"tee: tee_device_alloc(): copy dma_mask from parent device"
* The two TEE driver OPs "rstmem_alloc()" and "rstmem_free()" are replaced
with a struct tee_rstmem_pool abstraction.
* Replaced the TEE_IOC_RSTMEM_ALLOC user space API with the DMA-heap API
Changes since V4:
* Adding the patch "tee: add TEE_IOC_RSTMEM_FD_INFO" needed by the
GStreamer demo
* Removing the dummy CPU access and mmap functions from the dma_buf_ops
* Fixing a compile error in "optee: FF-A: dynamic restricted memory allocation"
reported by kernel test robot <lkp(a)intel.com>
Changes since V3:
* Make the use_case and flags field in struct tee_shm u32's instead of
u16's
* Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
* Import namespace DMA_BUF in module tee, reported by lkp(a)intel.com
* Added a note in the commit message for "optee: account for direction
while converting parameters" why it's needed
* Factor out dynamic restricted memory allocation from
"optee: support restricted memory allocation" into two new commits
"optee: FF-A: dynamic restricted memory allocation" and
"optee: smc abi: dynamic restricted memory allocation"
* Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
restricted memory allocate if CMA isn't configured
Changes since the V2 RFC:
* Based on v6.12
* Replaced the flags for SVP and Trusted UID memory with a u32 field with
unique id for each use case
* Added dynamic allocation of restricted memory pools
* Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
* Added support for FF-A with FFA_LEND
Changes since the V1 RFC:
* Based on v6.11
* Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of dma-buf handling is done in
the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
support"
* Replaced the word "secure" with "restricted" where applicable
Etienne Carriere (1):
tee: new ioctl to a register tee_shm from a dmabuf file descriptor
Jens Wiklander (10):
tee: tee_device_alloc(): copy dma_mask from parent device
optee: pass parent device to tee_device_alloc()
optee: account for direction while converting parameters
optee: sync secure world ABI headers
tee: implement protected DMA-heap
tee: refactor params_from_user()
tee: add tee_shm_alloc_cma_phys_mem()
optee: support protected memory allocation
optee: FF-A: dynamic protected memory allocation
optee: smc abi: dynamic protected memory allocation
drivers/tee/Kconfig | 10 +
drivers/tee/Makefile | 1 +
drivers/tee/optee/Makefile | 1 +
drivers/tee/optee/call.c | 10 +-
drivers/tee/optee/core.c | 1 +
drivers/tee/optee/ffa_abi.c | 195 ++++++++++++-
drivers/tee/optee/optee_ffa.h | 27 +-
drivers/tee/optee/optee_msg.h | 83 +++++-
drivers/tee/optee/optee_private.h | 55 +++-
drivers/tee/optee/optee_smc.h | 71 ++++-
drivers/tee/optee/protmem.c | 330 +++++++++++++++++++++
drivers/tee/optee/rpc.c | 31 +-
drivers/tee/optee/smc_abi.c | 192 ++++++++++--
drivers/tee/tee_core.c | 157 +++++++---
drivers/tee/tee_heap.c | 469 ++++++++++++++++++++++++++++++
drivers/tee/tee_private.h | 16 +
drivers/tee/tee_shm.c | 164 ++++++++++-
include/linux/tee_core.h | 70 +++++
include/linux/tee_drv.h | 10 +
include/uapi/linux/tee.h | 31 ++
20 files changed, 1792 insertions(+), 132 deletions(-)
create mode 100644 drivers/tee/optee/protmem.c
create mode 100644 drivers/tee/tee_heap.c
base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
--
2.43.0
Am 03.04.25 um 16:40 schrieb Philipp Stanner:
>>>>>
>>>>
>>>> That looks like a really really awkward approach. The driver
>>>> basically uses the DMA fence infrastructure as a middle layer and
>>>> calls back into itself to clean up its own structures.
>>>>
>>>
>>> What else are callbacks good for, if not to do something
>>> automatically
>>> when the fence gets signaled?
>>>
>>
>> Well if you add a callback for a signal you issued yourself then
>> that's kind of awkward.
>>
>> E.g. you call into the DMA fence code, just for the DMA fence code
>> to call yourself back again.
> Now we're entering CS-Philosophy, because it depends on who "you" and
> "yourself" are. In case of the driver, yes, naturally it registers a
> callback because at some other place (e.g., in the driver's interrupt
> handler) the fence will be signaled and the driver wants the callback
> stuff to be done.
>
> If that's not dma_fences' callbacks' purpose, then I'd be interested in
> knowing what their purpose is, because from my POV this discussion
> seems to imply that we effectively must never use them for anything.
>
> How could it ever be different? Who, for example, registers dma_fence
> callbacks while not signaling them "himself"?
Let me try to improve that explanation.
First of all we have components; they can be drivers, frameworks, helpers like the DRM scheduler, or generally any code which is more or less standalone.
The definition of a component is a bit tricky, but in general people usually get what is meant. E.g. in this case here we have nouveau as a single component.
Now the DMA fence interface allows sending signals between different components in a standardized way: one component can send a signal to another, and they don't necessarily need to know anything about each other except that both are using the DMA fence framework in the documented manner.
When a component is both the provider and the consumer at the same time you actually need a reason for that. It could be, for example, that it wants to consume signals both from itself and from others, but that doesn't apply to this use case here.
In pool or billiards you can of course do a trick shot and try to hit the 8-ball, but going straight for it simply has a better chance of succeeding.
>
>
>>
>>
>>>
>>>>
>>>> In addition to that we don't guarantee any callback order for the
>>>> DMA fence, so the callback's cleanup can get mixed with other
>>>> work, which is certainly not good when you want to guarantee that
>>>> the cleanup happens under the same lock.
>>>>
>>>
>>> Isn't my perception correct that the primary issue you have with
>>> this
>>> approach is that dma_fence_put() is called from within the
>>> callback? Or
>>> do you also take issue with deleting from the list?
>>>
>>
>> Well kind of both. The issue is that the caller of
>> dma_fence_signal() or dma_fence_signal_locked() must hold the
>> reference until the function returns.
>>
>> When you do the list cleanup and the drop inside the callback it is
>> perfectly possible that the fence pointer becomes stale before you
>> return and that's really not a good idea.
> In other words, you would prefer if this patch would have a function
> with my callback's code in a function, and that function would be
> called at every place where the driver signals a fence?
>
> If that's your opinion, then, IOW, it would mean for us to go almost
> back to status quo, with nouveau_fence_signal2.0, but with the
> dma_fence_is_signaled() part fixed.
Well it could potentially be cleaned up more, but as far as I can see only the two lines I pointed out in the other mail need to move to the right place, yes.
I mean it's just two lines. Whether you open code that or make a nouveau_cleanup_list_ref() function (or similar) is perfectly up to you.
Regards,
Christian.
>
>
> P.
>
>>
>>
>>>
>>>
>>>
>>>>
>>>> Instead the call to dma_fence_signal_locked() should probably be
>>>> removed from nouveau_fence_signal() and into
>>>> nouveau_fence_context_kill() and nouveau_fence_update().
>>>>
>>>> This way nouveau_fence_is_signaled() can call this function as
>>>> well.
>>>>
>>>
>>> Which "this function"? dma_fence_signal_locked()
>>>
>>
>> No, the cleanup function for the list entry. Whatever you call that
>> then, the name nouveau_fence_signal() is probably not appropriate any
>> more.
>>
>>
>>>
>>>
>>>
>>>>
>>>> BTW: nouveau_fence_no_signaling() looks completely broken as
>>>> well. It
>>>> calls nouveau_fence_is_signaled() and then list_del() on the
>>>> fence
>>>> head.
>>>>
>>>
>>> I can assure you that a great many things in Nouveau look
>>> completely
>>> broken.
>>>
>>> The question for us is always the cost-benefit-ratio when fixing
>>> bugs.
>>> There are fixes that solve the bug with reasonable effort, and
>>> there
>>> are great reworks towards an ideal state.
>>>
>>
>> I would just simply drop that function. As far as I can see it
>> serves no purpose other than doing exactly what the common DMA fence
>> code does anyway.
>>
>> Just one less thing which could fail.
>>
>> Christian.
>>
>>
>>>
>>>
>>> P.
>>>
>>>
>>>
>>>>
>>>> As far as I can see that is completely superfluous and should
>>>> probably be dropped. IIRC I once had a patch to clean that up but
>>>> it
>>>> was dropped for some reason.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>
>>>>>
>>>>> + dma_fence_put(&fence->base);
>>>>> + if (ret)
>>>>> + return ret;
>>>>> +
>>>>> ret = fctx->emit(fence);
>>>>> if (!ret) {
>>>>> dma_fence_get(&fence->base);
>>>>> @@ -246,8 +251,7 @@ nouveau_fence_emit(struct nouveau_fence
>>>>> *fence)
>>>>> return -ENODEV;
>>>>> }
>>>>>
>>>>> - if (nouveau_fence_update(chan, fctx))
>>>>> - nvif_event_block(&fctx->event);
>>>>> + nouveau_fence_update(chan, fctx);
>>>>>
>>>>> list_add_tail(&fence->head, &fctx->pending);
>>>>> spin_unlock_irq(&fctx->lock);
>>>>> @@ -270,8 +274,8 @@ nouveau_fence_done(struct nouveau_fence
>>>>> *fence)
>>>>>
>>>>> spin_lock_irqsave(&fctx->lock, flags);
>>>>> chan = rcu_dereference_protected(fence-
>>>>>> channel,
>>>>> lockdep_is_held(&fctx->lock));
>>>>> - if (chan && nouveau_fence_update(chan, fctx))
>>>>> - nvif_event_block(&fctx->event);
>>>>> + if (chan)
>>>>> + nouveau_fence_update(chan, fctx);
>>>>> spin_unlock_irqrestore(&fctx->lock, flags);
>>>>> }
>>>>> return dma_fence_is_signaled(&fence->base);
>>>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h
>>>>> b/drivers/gpu/drm/nouveau/nouveau_fence.h
>>>>> index 8bc065acfe35..e6b2df7fdc42 100644
>>>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.h
>>>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
>>>>> @@ -10,6 +10,7 @@ struct nouveau_bo;
>>>>>
>>>>> struct nouveau_fence {
>>>>> struct dma_fence base;
>>>>> + struct dma_fence_cb cb;
>>>>>
>>>>> struct list_head head;
>>>>>
>>>>>
>>>>
>>>
>>
>>
After the btrfs compressed bio discussion I think the hugetlb changes that
skip the tail pages are fundamentally unsafe in the current kernel.
That is because the bio_vec representation assumes tail pages do exist, so
as soon as you are doing direct I/O that generates a bvec starting beyond
the present head page things will blow up. Other users of bio_vecs might
do the same, but the way the block bio_vecs are generated is very
susceptible to that. So we'll first need to sort that out and a few other things
before we can even think of enabling such a feature.
Am 03.04.25 um 15:15 schrieb Danilo Krummrich:
> On Thu, Apr 03, 2025 at 02:58:13PM +0200, Philipp Stanner wrote:
>> On Thu, 2025-04-03 at 14:08 +0200, Christian König wrote:
>>> Am 03.04.25 um 12:13 schrieb Philipp Stanner:
>>> BTW: nouveau_fence_no_signaling() looks completely broken as well. It
>>> calls nouveau_fence_is_signaled() and then list_del() on the fence
>>> head.
>> I can assure you that a great many things in Nouveau look completely
>> broken.
>>
>> The question for us is always the cost-benefit-ratio when fixing bugs.
>> There are fixes that solve the bug with reasonable effort, and there
>> are great reworks towards an ideal state.
> That's just an additional thing that Christian noticed. It isn't really directly
> related to what you want to fix with your patch.
Well there is some relation. From nouveau_fence_no_signaling():
if (nouveau_fence_is_signaled(f)) {
list_del(&fence->head);
dma_fence_put(&fence->base);
return false;
}
That looks like somebody realized that the fence needs to be removed from the pending list and the reference dropped.
It's just that this was added to the wrong function, e.g. those lines need to be in nouveau_fence_is_signaled() and not here.
Regards,
Christian.
>
> I think the function can simply be dropped.
Am 03.04.25 um 14:58 schrieb Philipp Stanner:
> On Thu, 2025-04-03 at 14:08 +0200, Christian König wrote:
>> Am 03.04.25 um 12:13 schrieb Philipp Stanner:
>>> Nouveau currently relies on the assumption that dma_fences will
>>> only
>>> ever get signalled through nouveau_fence_signal(), which takes care
>>> of
>>> removing a signalled fence from the list
>>> nouveau_fence_chan.pending.
>>>
>>> This self-imposed rule is violated in nouveau_fence_done(), where
>>> dma_fence_is_signaled() can signal the fence without removing it
>>> from
>>> the list. This enables accesses to already signalled fences through
>>> the
>>> list, which is a bug.
>>>
>>> Furthermore, it must always be possible to use standard dma_fence
>>> methods an a dma_fence and observe valid behavior. The canonical
>>> way of
>>> ensuring that signalling a fence has additional effects is to add
>>> those
>>> effects to a callback and register it on that fence.
>>>
>>> Move the code from nouveau_fence_signal() into a dma_fence
>>> callback.
>>> Register that callback when creating the fence.
>>>
>>> Cc: <stable(a)vger.kernel.org> # 4.10+
>>> Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
>>> ---
>>> Changes in v2:
>>> - Remove Fixes: tag. (Danilo)
>>> - Remove integer "drop" and call nvif_event_block() in the fence
>>> callback. (Danilo)
>>> ---
>>> drivers/gpu/drm/nouveau/nouveau_fence.c | 52 +++++++++++++--------
>>> ----
>>> drivers/gpu/drm/nouveau/nouveau_fence.h | 1 +
>>> 2 files changed, 29 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> index 7cc84472cece..cf510ef9641a 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> @@ -50,24 +50,24 @@ nouveau_fctx(struct nouveau_fence *fence)
>>> return container_of(fence->base.lock, struct
>>> nouveau_fence_chan, lock);
>>> }
>>>
>>> -static int
>>> -nouveau_fence_signal(struct nouveau_fence *fence)
>>> +static void
>>> +nouveau_fence_cleanup_cb(struct dma_fence *dfence, struct
>>> dma_fence_cb *cb)
>>> {
>>> - int drop = 0;
>>> + struct nouveau_fence_chan *fctx;
>>> + struct nouveau_fence *fence;
>>> +
>>> + fence = container_of(dfence, struct nouveau_fence, base);
>>> + fctx = nouveau_fctx(fence);
>>>
>>> - dma_fence_signal_locked(&fence->base);
>>> list_del(&fence->head);
>>> rcu_assign_pointer(fence->channel, NULL);
>>>
>>> if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence-
>>>> base.flags)) {
>>> - struct nouveau_fence_chan *fctx =
>>> nouveau_fctx(fence);
>>> -
>>> if (!--fctx->notify_ref)
>>> - drop = 1;
>>> + nvif_event_block(&fctx->event);
>>> }
>>>
>>> dma_fence_put(&fence->base);
>>> - return drop;
>>> }
>>>
>>> static struct nouveau_fence *
>>> @@ -93,8 +93,7 @@ nouveau_fence_context_kill(struct
>>> nouveau_fence_chan *fctx, int error)
>>> if (error)
>>> dma_fence_set_error(&fence->base, error);
>>>
>>> - if (nouveau_fence_signal(fence))
>>> - nvif_event_block(&fctx->event);
>>> + dma_fence_signal_locked(&fence->base);
>>> }
>>> fctx->killed = 1;
>>> spin_unlock_irqrestore(&fctx->lock, flags);
>>> @@ -127,11 +126,10 @@ nouveau_fence_context_free(struct
>>> nouveau_fence_chan *fctx)
>>> kref_put(&fctx->fence_ref, nouveau_fence_context_put);
>>> }
>>>
>>> -static int
>>> +static void
>>> nouveau_fence_update(struct nouveau_channel *chan, struct
>>> nouveau_fence_chan *fctx)
>>> {
>>> struct nouveau_fence *fence;
>>> - int drop = 0;
>>> u32 seq = fctx->read(chan);
>>>
>>> while (!list_empty(&fctx->pending)) {
>>> @@ -140,10 +138,8 @@ nouveau_fence_update(struct nouveau_channel
>>> *chan, struct nouveau_fence_chan *fc
>>> if ((int)(seq - fence->base.seqno) < 0)
>>> break;
>>>
>>> - drop |= nouveau_fence_signal(fence);
>>> + dma_fence_signal_locked(&fence->base);
>>> }
>>> -
>>> - return drop;
>>> }
>>>
>>> static void
>>> @@ -152,7 +148,6 @@ nouveau_fence_uevent_work(struct work_struct
>>> *work)
>>> struct nouveau_fence_chan *fctx = container_of(work,
>>> struct nouveau_fence_chan,
>>>
>>> uevent_work);
>>> unsigned long flags;
>>> - int drop = 0;
>>>
>>> spin_lock_irqsave(&fctx->lock, flags);
>>> if (!list_empty(&fctx->pending)) {
>>> @@ -161,11 +156,8 @@ nouveau_fence_uevent_work(struct work_struct
>>> *work)
>>>
>>> fence = list_entry(fctx->pending.next,
>>> typeof(*fence), head);
>>> chan = rcu_dereference_protected(fence->channel,
>>> lockdep_is_held(&fctx->lock));
>>> - if (nouveau_fence_update(chan, fctx))
>>> - drop = 1;
>>> + nouveau_fence_update(chan, fctx);
>>> }
>>> - if (drop)
>>> - nvif_event_block(&fctx->event);
>>>
>>> spin_unlock_irqrestore(&fctx->lock, flags);
>>> }
>>> @@ -235,6 +227,19 @@ nouveau_fence_emit(struct nouveau_fence
>>> *fence)
>>> &fctx->lock, fctx->context, ++fctx-
>>>> sequence);
>>> kref_get(&fctx->fence_ref);
>>>
>>> + fence->cb.func = nouveau_fence_cleanup_cb;
>>> + /* Adding a callback runs into
>>> __dma_fence_enable_signaling(), which will
>>> + * ultimately run into nouveau_fence_no_signaling(), where
>>> a WARN_ON
>>> + * would fire because the refcount can be dropped there.
>>> + *
>>> + * Increment the refcount here temporarily to work around
>>> that.
>>> + */
>>> + dma_fence_get(&fence->base);
>>> + ret = dma_fence_add_callback(&fence->base, &fence->cb,
>>> nouveau_fence_cleanup_cb);
>> That looks like a really really awkward approach. The driver
>> basically uses the DMA fence infrastructure as a middle layer and
>> calls back into itself to clean up its own structures.
> What else are callbacks good for, if not to do something automatically
> when the fence gets signaled?
Well if you add a callback for a signal you issued yourself then that's kind of awkward.
E.g. you call into the DMA fence code, just for the DMA fence code to call yourself back again.
>> In addition to that we don't guarantee any callback order for the DMA
>> fence, so the callback's cleanup can get mixed with other
>> work, which is certainly not good when you want to guarantee that the
>> cleanup happens under the same lock.
> Is my understanding correct that your primary issue with this
> approach is that dma_fence_put() is called from within the callback? Or
> do you also take issue with deleting from the list?
Well, kind of both. The issue is that the caller of dma_fence_signal() or dma_fence_signal_locked() must hold the reference until the function returns.
When you do the list cleanup and the drop inside the callback, it is perfectly possible that the fence pointer becomes stale before you return, and that's really not a good idea.
>
>> Instead, the call to dma_fence_signal_locked() should probably be
>> moved from nouveau_fence_signal() into
>> nouveau_fence_context_kill() and nouveau_fence_update().
>>
>> This way nouveau_fence_is_signaled() can call this function as well.
> Which "this function"? dma_fence_signal_locked()?
No, the cleanup function for the list entry. Whatever you call it then, the name nouveau_fence_signal() is probably not appropriate any more.
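For illustration, that refactor might look roughly like the following. This is an untested, hand-written sketch based only on the code visible in this thread; the helper name nouveau_fence_cleanup() is made up here, and the callers shown in the comment are paraphrased, not a real patch:

```c
/* Sketch only, not a real patch: keep the list/notify teardown in one
 * helper, and let the signaling callers invoke it directly.
 */
static void
nouveau_fence_cleanup(struct nouveau_fence *fence)
{
	struct nouveau_fence_chan *fctx = nouveau_fctx(fence);

	list_del(&fence->head);
	rcu_assign_pointer(fence->channel, NULL);

	if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags)) {
		if (!--fctx->notify_ref)
			nvif_event_block(&fctx->event);
	}
}

/*
 * Callers such as nouveau_fence_update() and nouveau_fence_context_kill()
 * would then, per signaled fence, do roughly:
 *
 *	dma_fence_signal_locked(&fence->base);
 *	nouveau_fence_cleanup(fence);
 *	dma_fence_put(&fence->base);	// drop the pending-list reference last
 *
 * and the .signaled path could reuse nouveau_fence_cleanup() as well.
 */
```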
>
>> BTW: nouveau_fence_no_signaling() looks completely broken as well. It
>> calls nouveau_fence_is_signaled() and then list_del() on the fence
>> head.
> I can assure you that a great many things in Nouveau look completely
> broken.
>
> The question for us is always the cost-benefit-ratio when fixing bugs.
> There are fixes that solve the bug with reasonable effort, and there
> are great reworks towards an ideal state.
I would just drop that function. As far as I can see it serves no purpose other than doing exactly what the common DMA fence code does anyway.
Just one less thing which could fail.
Christian.
>
> P.
>
>
>> As far as I can see that is completely superfluous and should
>> probably be dropped. IIRC I once had a patch to clean that up but it
>> was dropped for some reason.
>>
>> Regards,
>> Christian.
>>
>>
>>> + dma_fence_put(&fence->base);
>>> + if (ret)
>>> + return ret;
>>> +
>>> ret = fctx->emit(fence);
>>> if (!ret) {
>>> dma_fence_get(&fence->base);
>>> @@ -246,8 +251,7 @@ nouveau_fence_emit(struct nouveau_fence *fence)
>>> return -ENODEV;
>>> }
>>>
>>> - if (nouveau_fence_update(chan, fctx))
>>> - nvif_event_block(&fctx->event);
>>> + nouveau_fence_update(chan, fctx);
>>>
>>> list_add_tail(&fence->head, &fctx->pending);
>>> spin_unlock_irq(&fctx->lock);
>>> @@ -270,8 +274,8 @@ nouveau_fence_done(struct nouveau_fence *fence)
>>>
>>> spin_lock_irqsave(&fctx->lock, flags);
>>> chan = rcu_dereference_protected(fence->channel,
>>> lockdep_is_held(&fctx->lock));
>>> - if (chan && nouveau_fence_update(chan, fctx))
>>> - nvif_event_block(&fctx->event);
>>> + if (chan)
>>> + nouveau_fence_update(chan, fctx);
>>> spin_unlock_irqrestore(&fctx->lock, flags);
>>> }
>>> return dma_fence_is_signaled(&fence->base);
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
>>> index 8bc065acfe35..e6b2df7fdc42 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.h
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
>>> @@ -10,6 +10,7 @@ struct nouveau_bo;
>>>
>>> struct nouveau_fence {
>>> struct dma_fence base;
>>> + struct dma_fence_cb cb;
>>>
>>> struct list_head head;
>>>
On 03.04.25 12:25, Danilo Krummrich wrote:
> On Thu, Apr 03, 2025 at 12:17:29PM +0200, Philipp Stanner wrote:
>> On Thu, 2025-04-03 at 12:13 +0200, Philipp Stanner wrote:
>>> -static int
>>> -nouveau_fence_signal(struct nouveau_fence *fence)
>>> +static void
>>> +nouveau_fence_cleanup_cb(struct dma_fence *dfence, struct dma_fence_cb *cb)
>>> {
>>> - int drop = 0;
>>> + struct nouveau_fence_chan *fctx;
>>> + struct nouveau_fence *fence;
>>> +
>>> + fence = container_of(dfence, struct nouveau_fence, base);
>>> + fctx = nouveau_fctx(fence);
>>>
>>> - dma_fence_signal_locked(&fence->base);
>>> list_del(&fence->head);
>>> rcu_assign_pointer(fence->channel, NULL);
>>>
>>> if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags)) {
>>> - struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
>>> -
>>> if (!--fctx->notify_ref)
>>> - drop = 1;
>>> + nvif_event_block(&fctx->event);
>>> }
>>>
>>> dma_fence_put(&fence->base);
>> What I realized while coding this v2 is that we might want to think
>> about whether we really want the dma_fence_put() in the fence
>> callback.
>> It should work fine, since it's exactly identical to the previous
>> code's behavior – but effectively it means that the driver's reference
>> will be dropped whenever it signals that fence.
> Not quite, it's the reference of the fence context's pending list.
>
> When the fence is emitted, dma_fence_init() is called, which initializes the
> reference count to 1. Subsequently, another reference is taken, when the fence
> is added to the pending list. Once the fence is signaled and hence removed from
> the pending list, we can (and have to) drop this reference.
The general idea is that the caller must hold the reference until the signaling is completed.
So for signaling from the interrupt handler it means that you need to call dma_fence_put() for the list reference *after* you called dma_fence_signal_locked().
For signaling from the .enable_signaling or .signaled callback you need to remove the fence from the linked list and call dma_fence_put() *before* you return (because the caller is holding the potential last reference).
That's why I'm pretty sure that the approach with installing the callback won't work. As far as I know no other DMA fence implementation is doing that.
Regards,
Christian.