Hi,
Here's preliminary work to enable dmem tracking for heavy users of DMA
allocations on behalf of userspace: v4l2, DRM, and dma-buf heaps.
It's not really meant for inclusion at the moment, because I really
don't like it that much, and would like to discuss solutions on how to
make it nicer.
In particular, the dma dmem region accessors don't feel that great to
me. It duplicates the logic to select the proper accessor in
dma_alloc_attrs(), and it looks fragile and potentially buggy to me.
One solution I tried is to do the accounting in dma_alloc_attrs()
directly, depending on a flag being set, similar to what __GFP_ACCOUNT
is doing.
It didn't work because dmem initialises a state pointer when charging an
allocation to a region, and expects that state pointer to be passed back
when uncharging. Since dma_alloc_attrs() returns a void pointer to the
allocated buffer, we need to put that state into a higher-level
structure, such as drm_gem_object, or dma_buf.
Since we can't share the region selection logic, we need to get the
region through some other mean. Another thing I consider was to return
the region as part of the allocated buffer (through struct page or
folio), but those are lost across the calls and dma_alloc_attrs() will
only get a void pointer. So that's not doable without some heavy
rework, if it's a good idea at all.
So yeah, I went for the dumbest possible solution with the accessors,
hoping you could suggest a much smarter idea :)
Thanks,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Maxime Ripard (12):
cma: Register dmem region for each cma region
cma: Provide accessor to cma dmem region
dma: coherent: Register dmem region for each coherent region
dma: coherent: Provide accessor to dmem region
dma: contiguous: Provide accessor to dmem region
dma: direct: Provide accessor to dmem region
dma: Create default dmem region for DMA allocations
dma: Provide accessor to dmem region
dma-buf: Clear cgroup accounting on release
dma-buf: cma: Account for allocations in dmem cgroup
drm/gem: Add cgroup memory accounting
media: videobuf2: Track buffer allocations through the dmem cgroup
drivers/dma-buf/dma-buf.c | 7 ++++
drivers/dma-buf/heaps/cma_heap.c | 18 ++++++++--
drivers/gpu/drm/drm_gem.c | 5 +++
drivers/gpu/drm/drm_gem_dma_helper.c | 6 ++++
.../media/common/videobuf2/videobuf2-dma-contig.c | 19 +++++++++++
include/drm/drm_device.h | 1 +
include/drm/drm_gem.h | 2 ++
include/linux/cma.h | 9 +++++
include/linux/dma-buf.h | 5 +++
include/linux/dma-direct.h | 2 ++
include/linux/dma-map-ops.h | 32 ++++++++++++++++++
include/linux/dma-mapping.h | 11 ++++++
kernel/dma/coherent.c | 26 +++++++++++++++
kernel/dma/direct.c | 8 +++++
kernel/dma/mapping.c | 39 ++++++++++++++++++++++
mm/cma.c | 21 +++++++++++-
mm/cma.h | 3 ++
17 files changed, 211 insertions(+), 3 deletions(-)
---
base-commit: 55a2aa61ba59c138bd956afe0376ec412a7004cf
change-id: 20250307-dmem-cgroups-73febced0989
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Hi Amirreza,
kernel test robot noticed the following build warnings:
[auto build test WARNING on db8da9da41bced445077925f8a886c776a47440c]
url: https://github.com/intel-lab-lkp/linux/commits/Amirreza-Zarrabi/tee-allow-a…
base: db8da9da41bced445077925f8a886c776a47440c
patch link: https://lore.kernel.org/r/20250327-qcom-tee-using-tee-ss-without-mem-obj-v3…
patch subject: [PATCH v3 08/11] tee: add Qualcomm TEE driver
config: s390-allyesconfig (https://download.01.org/0day-ci/archive/20250329/202503290620.2KJEcZM6-lkp@…)
compiler: s390-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250329/202503290620.2KJEcZM6-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503290620.2KJEcZM6-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/tee/qcomtee/core.c:310: warning: Function parameter or struct member 'oic' not described in 'qcomtee_object_qtee_init'
vim +310 drivers/tee/qcomtee/core.c
298
299 /**
300 * qcomtee_object_qtee_init() - Initialize an object for QTEE.
301 * @object: object returned.
302 * @object_id: object ID received from QTEE.
303 *
304 * Return: On failure, returns < 0 and sets @object to %NULL_QCOMTEE_OBJECT.
305 * On success, returns 0
306 */
307 static int qcomtee_object_qtee_init(struct qcomtee_object_invoke_ctx *oic,
308 struct qcomtee_object **object,
309 unsigned int object_id)
> 310 {
311 int ret = 0;
312
313 switch (qcomtee_object_type(object_id)) {
314 case QCOMTEE_OBJECT_TYPE_NULL:
315 *object = NULL_QCOMTEE_OBJECT;
316
317 break;
318 case QCOMTEE_OBJECT_TYPE_CB:
319 *object = qcomtee_local_object_get(object_id);
320 if (*object == NULL_QCOMTEE_OBJECT)
321 ret = -EINVAL;
322
323 break;
324
325 default: /* QCOMTEE_OBJECT_TYPE_TEE */
326 *object = qcomtee_qtee_object_alloc(oic, object_id);
327 if (*object == NULL_QCOMTEE_OBJECT)
328 ret = -ENOMEM;
329
330 break;
331 }
332
333 return ret;
334 }
335
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
From: Rob Clark <robdclark(a)chromium.org>
Add support for exporting a dma_fence fd for a specific point on a
timeline. This is needed for vtest/vpipe[1][2] to implement timeline
syncobj support, as it needs a way to turn a point on a timeline back
into a dma_fence fd. It also closes an odd omission from the syncobj
UAPI.
[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33433
[2] https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/805
v2: Add DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_TIMELINE
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
---
drivers/gpu/drm/drm_syncobj.c | 18 +++++++++++++-----
include/uapi/drm/drm.h | 2 ++
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 4f2ab8a7b50f..bc57d6f1a22e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -762,7 +762,7 @@ static int drm_syncobj_import_sync_file_fence(struct drm_file *file_private,
}
static int drm_syncobj_export_sync_file(struct drm_file *file_private,
- int handle, int *p_fd)
+ int handle, u64 point, int *p_fd)
{
int ret;
struct dma_fence *fence;
@@ -772,7 +772,7 @@ static int drm_syncobj_export_sync_file(struct drm_file *file_private,
if (fd < 0)
return fd;
- ret = drm_syncobj_find_fence(file_private, handle, 0, 0, &fence);
+ ret = drm_syncobj_find_fence(file_private, handle, point, 0, &fence);
if (ret)
goto err_put_fd;
@@ -869,6 +869,9 @@ drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
struct drm_syncobj_handle *args = data;
+ unsigned valid_flags = DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_TIMELINE |
+ DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE;
+ u64 point = 0;
if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
return -EOPNOTSUPP;
@@ -876,13 +879,18 @@ drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, void *data,
if (args->pad)
return -EINVAL;
- if (args->flags != 0 &&
- args->flags != DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE)
+ if (args->flags != 0 && (args->flags & ~valid_flags))
return -EINVAL;
+ if (args->flags & DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_TIMELINE)
+ point = args->point;
+
if (args->flags & DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE)
return drm_syncobj_export_sync_file(file_private, args->handle,
- &args->fd);
+ point, &args->fd);
+
+ if (args->point)
+ return -EINVAL;
return drm_syncobj_handle_to_fd(file_private, args->handle,
&args->fd);
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 7fba37b94401..c71a8f4439f2 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -912,6 +912,8 @@ struct drm_syncobj_handle {
__s32 fd;
__u32 pad;
+
+ __u64 point;
};
struct drm_syncobj_transfer {
--
2.49.0
Am 27.03.25 um 09:42 schrieb Philipp Stanner:
> Nouveau currently relies on the assumption that dma_fences will only
> ever get signalled through nouveau_fence_signal(), which takes care of
> removing a signalled fence from the list nouveau_fence_chan.pending.
>
> This self-imposed rule is violated in nouveau_fence_done(), where
> dma_fence_is_signaled() can signal the fence without removing it from
> the list. This enables accesses to already signalled fences through the
> list, which is a bug.
>
> Furthermore, it must always be possible to use standard dma_fence
> methods an a dma_fence and observe valid behavior. The canonical way of
> ensuring that signalling a fence has additional effects is to add those
> effects to a callback and register it on the fence.
Good catch.
>
> Move the code from nouveau_fence_signal() into a dma_fence callback.
> Register that callback when creating the fence.
But that's a really ugly approach.
Either nouveau shouldn't implement the signaled callback or make sure that when returning true from the signaled callback the fence is also removed from the pending list.
>
> Cc: <stable(a)vger.kernel.org> # 4.10+
> Fixes: f54d1867005c ("dma-buf: Rename struct fence to dma_fence")
> Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
> ---
> I'm not entirely sure what Fixes-Tag is appropriate. The last time the
> line causing the signalled fence in the list was touched is the commit
> listed above.
Yeah, that's most likely not correct. My educated guess is that the bug was always there just never discovered.
Regards,
Christian.
> ---
> drivers/gpu/drm/nouveau/nouveau_fence.c | 41 ++++++++++++++++---------
> drivers/gpu/drm/nouveau/nouveau_fence.h | 1 +
> 2 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 7cc84472cece..b2c2241a8803 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -50,24 +50,22 @@ nouveau_fctx(struct nouveau_fence *fence)
> return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
> }
>
> -static int
> -nouveau_fence_signal(struct nouveau_fence *fence)
> +static void
> +nouveau_fence_cleanup_cb(struct dma_fence *dfence, struct dma_fence_cb *cb)
> {
> - int drop = 0;
> + struct nouveau_fence_chan *fctx;
> + struct nouveau_fence *fence;
> +
> + fence = container_of(dfence, struct nouveau_fence, base);
> + fctx = nouveau_fctx(fence);
>
> - dma_fence_signal_locked(&fence->base);
> list_del(&fence->head);
> rcu_assign_pointer(fence->channel, NULL);
>
> - if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags)) {
> - struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
> -
> - if (!--fctx->notify_ref)
> - drop = 1;
> - }
> + if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags))
> + --fctx->notify_ref;
>
> dma_fence_put(&fence->base);
> - return drop;
> }
>
> static struct nouveau_fence *
> @@ -93,7 +91,8 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, int error)
> if (error)
> dma_fence_set_error(&fence->base, error);
>
> - if (nouveau_fence_signal(fence))
> + dma_fence_signal_locked(&fence->base);
> + if (fctx->notify_ref == 0)
> nvif_event_block(&fctx->event);
> }
> fctx->killed = 1;
> @@ -131,7 +130,6 @@ static int
> nouveau_fence_update(struct nouveau_channel *chan, struct nouveau_fence_chan *fctx)
> {
> struct nouveau_fence *fence;
> - int drop = 0;
> u32 seq = fctx->read(chan);
>
> while (!list_empty(&fctx->pending)) {
> @@ -140,10 +138,10 @@ nouveau_fence_update(struct nouveau_channel *chan, struct nouveau_fence_chan *fc
> if ((int)(seq - fence->base.seqno) < 0)
> break;
>
> - drop |= nouveau_fence_signal(fence);
> + dma_fence_signal_locked(&fence->base);
> }
>
> - return drop;
> + return fctx->notify_ref == 0 ? 1 : 0;
> }
>
> static void
> @@ -235,6 +233,19 @@ nouveau_fence_emit(struct nouveau_fence *fence)
> &fctx->lock, fctx->context, ++fctx->sequence);
> kref_get(&fctx->fence_ref);
>
> + fence->cb.func = nouveau_fence_cleanup_cb;
> + /* Adding a callback runs into __dma_fence_enable_signaling(), which will
> + * ultimately run into nouveau_fence_no_signaling(), where a WARN_ON
> + * would fire because the refcount can be dropped there.
> + *
> + * Increment the refcount here temporarily to work around that.
> + */
> + dma_fence_get(&fence->base);
> + ret = dma_fence_add_callback(&fence->base, &fence->cb, nouveau_fence_cleanup_cb);
> + dma_fence_put(&fence->base);
> + if (ret)
> + return ret;
> +
> ret = fctx->emit(fence);
> if (!ret) {
> dma_fence_get(&fence->base);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
> index 8bc065acfe35..e6b2df7fdc42 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
> @@ -10,6 +10,7 @@ struct nouveau_bo;
>
> struct nouveau_fence {
> struct dma_fence base;
> + struct dma_fence_cb cb;
>
> struct list_head head;
>