This is a note to let you know that I've just added the patch titled
drm/gem: Acquire references on GEM handles for framebuffers
to the 6.12-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
drm-gem-acquire-references-on-gem-handles-for-framebuffers.patch
and it can be found in the queue-6.12 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 5307dce878d4126e1b375587318955bd019c3741 Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Mon, 30 Jun 2025 10:36:47 +0200
Subject: drm/gem: Acquire references on GEM handles for framebuffers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Thomas Zimmermann <tzimmermann(a)suse.de>
commit 5307dce878d4126e1b375587318955bd019c3741 upstream.
A GEM handle can be released while the GEM buffer object is attached
to a DRM framebuffer. This leads to the release of the dma-buf backing
the buffer object, if any. [1] Trying to use the framebuffer in further
mode-setting operations leads to a segmentation fault. Most easily
happens with driver that use shadow planes for vmap-ing the dma-buf
during a page flip. An example is shown below.
[ 156.791968] ------------[ cut here ]------------
[ 156.796830] WARNING: CPU: 2 PID: 2255 at drivers/dma-buf/dma-buf.c:1527 dma_buf_vmap+0x224/0x430
[...]
[ 156.942028] RIP: 0010:dma_buf_vmap+0x224/0x430
[ 157.043420] Call Trace:
[ 157.045898] <TASK>
[ 157.048030] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.052436] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.056836] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.061253] ? drm_gem_shmem_vmap+0x74/0x710
[ 157.065567] ? dma_buf_vmap+0x224/0x430
[ 157.069446] ? __warn.cold+0x58/0xe4
[ 157.073061] ? dma_buf_vmap+0x224/0x430
[ 157.077111] ? report_bug+0x1dd/0x390
[ 157.080842] ? handle_bug+0x5e/0xa0
[ 157.084389] ? exc_invalid_op+0x14/0x50
[ 157.088291] ? asm_exc_invalid_op+0x16/0x20
[ 157.092548] ? dma_buf_vmap+0x224/0x430
[ 157.096663] ? dma_resv_get_singleton+0x6d/0x230
[ 157.101341] ? __pfx_dma_buf_vmap+0x10/0x10
[ 157.105588] ? __pfx_dma_resv_get_singleton+0x10/0x10
[ 157.110697] drm_gem_shmem_vmap+0x74/0x710
[ 157.114866] drm_gem_vmap+0xa9/0x1b0
[ 157.118763] drm_gem_vmap_unlocked+0x46/0xa0
[ 157.123086] drm_gem_fb_vmap+0xab/0x300
[ 157.126979] drm_atomic_helper_prepare_planes.part.0+0x487/0xb10
[ 157.133032] ? lockdep_init_map_type+0x19d/0x880
[ 157.137701] drm_atomic_helper_commit+0x13d/0x2e0
[ 157.142671] ? drm_atomic_nonblocking_commit+0xa0/0x180
[ 157.147988] drm_mode_atomic_ioctl+0x766/0xe40
[...]
[ 157.346424] ---[ end trace 0000000000000000 ]---
Acquiring GEM handles for the framebuffer's GEM buffer objects prevents
this from happening. The framebuffer's cleanup later puts the handle
references.
Commit 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object
instance") triggers the segmentation fault easily by using the dma-buf
field more widely. The underlying issue with reference counting has
been present before.
v2:
- acquire the handle instead of the BO (Christian)
- fix comment style (Christian)
- drop the Fixes tag (Christian)
- rename err_ gotos
- add missing Link tag
Suggested-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/drm_gem.c#L241 # [1]
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Anusha Srivatsa <asrivats(a)redhat.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://lore.kernel.org/r/20250630084001.293053-1-tzimmermann@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/gpu/drm/drm_gem.c | 44 ++++++++++++++++++++++++---
drivers/gpu/drm/drm_gem_framebuffer_helper.c | 16 +++++----
drivers/gpu/drm/drm_internal.h | 2 +
3 files changed, 51 insertions(+), 11 deletions(-)
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -186,6 +186,35 @@ void drm_gem_private_object_fini(struct
}
EXPORT_SYMBOL(drm_gem_private_object_fini);
+static void drm_gem_object_handle_get(struct drm_gem_object *obj)
+{
+ struct drm_device *dev = obj->dev;
+
+ drm_WARN_ON(dev, !mutex_is_locked(&dev->object_name_lock));
+
+ if (obj->handle_count++ == 0)
+ drm_gem_object_get(obj);
+}
+
+/**
+ * drm_gem_object_handle_get_unlocked - acquire reference on user-space handles
+ * @obj: GEM object
+ *
+ * Acquires a reference on the GEM buffer object's handle. Required
+ * to keep the GEM object alive. Call drm_gem_object_handle_put_unlocked()
+ * to release the reference.
+ */
+void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj)
+{
+ struct drm_device *dev = obj->dev;
+
+ guard(mutex)(&dev->object_name_lock);
+
+ drm_WARN_ON(dev, !obj->handle_count); /* first ref taken in create-tail helper */
+ drm_gem_object_handle_get(obj);
+}
+EXPORT_SYMBOL(drm_gem_object_handle_get_unlocked);
+
/**
* drm_gem_object_handle_free - release resources bound to userspace handles
* @obj: GEM object to clean up.
@@ -216,8 +245,14 @@ static void drm_gem_object_exported_dma_
}
}
-static void
-drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj)
+/**
+ * drm_gem_object_handle_put_unlocked - releases reference on user-space handles
+ * @obj: GEM object
+ *
+ * Releases a reference on the GEM buffer object's handle. Possibly releases
+ * the GEM buffer object and associated dma-buf objects.
+ */
+void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj)
{
struct drm_device *dev = obj->dev;
bool final = false;
@@ -242,6 +277,7 @@ drm_gem_object_handle_put_unlocked(struc
if (final)
drm_gem_object_put(obj);
}
+EXPORT_SYMBOL(drm_gem_object_handle_put_unlocked);
/*
* Called at device or object close to release the file's
@@ -363,8 +399,8 @@ drm_gem_handle_create_tail(struct drm_fi
int ret;
WARN_ON(!mutex_is_locked(&dev->object_name_lock));
- if (obj->handle_count++ == 0)
- drm_gem_object_get(obj);
+
+ drm_gem_object_handle_get(obj);
/*
* Get the user-visible handle using idr. Preload and perform
--- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c
+++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
@@ -99,7 +99,7 @@ void drm_gem_fb_destroy(struct drm_frame
unsigned int i;
for (i = 0; i < fb->format->num_planes; i++)
- drm_gem_object_put(fb->obj[i]);
+ drm_gem_object_handle_put_unlocked(fb->obj[i]);
drm_framebuffer_cleanup(fb);
kfree(fb);
@@ -182,8 +182,10 @@ int drm_gem_fb_init_with_funcs(struct dr
if (!objs[i]) {
drm_dbg_kms(dev, "Failed to lookup GEM object\n");
ret = -ENOENT;
- goto err_gem_object_put;
+ goto err_gem_object_handle_put_unlocked;
}
+ drm_gem_object_handle_get_unlocked(objs[i]);
+ drm_gem_object_put(objs[i]);
min_size = (height - 1) * mode_cmd->pitches[i]
+ drm_format_info_min_pitch(info, i, width)
@@ -193,22 +195,22 @@ int drm_gem_fb_init_with_funcs(struct dr
drm_dbg_kms(dev,
"GEM object size (%zu) smaller than minimum size (%u) for plane %d\n",
objs[i]->size, min_size, i);
- drm_gem_object_put(objs[i]);
+ drm_gem_object_handle_put_unlocked(objs[i]);
ret = -EINVAL;
- goto err_gem_object_put;
+ goto err_gem_object_handle_put_unlocked;
}
}
ret = drm_gem_fb_init(dev, fb, mode_cmd, objs, i, funcs);
if (ret)
- goto err_gem_object_put;
+ goto err_gem_object_handle_put_unlocked;
return 0;
-err_gem_object_put:
+err_gem_object_handle_put_unlocked:
while (i > 0) {
--i;
- drm_gem_object_put(objs[i]);
+ drm_gem_object_handle_put_unlocked(objs[i]);
}
return ret;
}
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -153,6 +153,8 @@ void drm_sysfs_lease_event(struct drm_de
/* drm_gem.c */
int drm_gem_init(struct drm_device *dev);
+void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj);
+void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj);
int drm_gem_handle_create_tail(struct drm_file *file_priv,
struct drm_gem_object *obj,
u32 *handlep);
Patches currently in stable-queue which might be from tzimmermann(a)suse.de are
queue-6.12/drm-gem-fix-race-in-drm_gem_handle_create_tail.patch
queue-6.12/drm-gem-acquire-references-on-gem-handles-for-framebuffers.patch
This is a note to let you know that I've just added the patch titled
drm/gem: Acquire references on GEM handles for framebuffers
to the 6.6-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
drm-gem-acquire-references-on-gem-handles-for-framebuffers.patch
and it can be found in the queue-6.6 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 5307dce878d4126e1b375587318955bd019c3741 Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Mon, 30 Jun 2025 10:36:47 +0200
Subject: drm/gem: Acquire references on GEM handles for framebuffers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Thomas Zimmermann <tzimmermann(a)suse.de>
commit 5307dce878d4126e1b375587318955bd019c3741 upstream.
A GEM handle can be released while the GEM buffer object is attached
to a DRM framebuffer. This leads to the release of the dma-buf backing
the buffer object, if any. [1] Trying to use the framebuffer in further
mode-setting operations leads to a segmentation fault. Most easily
happens with driver that use shadow planes for vmap-ing the dma-buf
during a page flip. An example is shown below.
[ 156.791968] ------------[ cut here ]------------
[ 156.796830] WARNING: CPU: 2 PID: 2255 at drivers/dma-buf/dma-buf.c:1527 dma_buf_vmap+0x224/0x430
[...]
[ 156.942028] RIP: 0010:dma_buf_vmap+0x224/0x430
[ 157.043420] Call Trace:
[ 157.045898] <TASK>
[ 157.048030] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.052436] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.056836] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.061253] ? drm_gem_shmem_vmap+0x74/0x710
[ 157.065567] ? dma_buf_vmap+0x224/0x430
[ 157.069446] ? __warn.cold+0x58/0xe4
[ 157.073061] ? dma_buf_vmap+0x224/0x430
[ 157.077111] ? report_bug+0x1dd/0x390
[ 157.080842] ? handle_bug+0x5e/0xa0
[ 157.084389] ? exc_invalid_op+0x14/0x50
[ 157.088291] ? asm_exc_invalid_op+0x16/0x20
[ 157.092548] ? dma_buf_vmap+0x224/0x430
[ 157.096663] ? dma_resv_get_singleton+0x6d/0x230
[ 157.101341] ? __pfx_dma_buf_vmap+0x10/0x10
[ 157.105588] ? __pfx_dma_resv_get_singleton+0x10/0x10
[ 157.110697] drm_gem_shmem_vmap+0x74/0x710
[ 157.114866] drm_gem_vmap+0xa9/0x1b0
[ 157.118763] drm_gem_vmap_unlocked+0x46/0xa0
[ 157.123086] drm_gem_fb_vmap+0xab/0x300
[ 157.126979] drm_atomic_helper_prepare_planes.part.0+0x487/0xb10
[ 157.133032] ? lockdep_init_map_type+0x19d/0x880
[ 157.137701] drm_atomic_helper_commit+0x13d/0x2e0
[ 157.142671] ? drm_atomic_nonblocking_commit+0xa0/0x180
[ 157.147988] drm_mode_atomic_ioctl+0x766/0xe40
[...]
[ 157.346424] ---[ end trace 0000000000000000 ]---
Acquiring GEM handles for the framebuffer's GEM buffer objects prevents
this from happening. The framebuffer's cleanup later puts the handle
references.
Commit 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object
instance") triggers the segmentation fault easily by using the dma-buf
field more widely. The underlying issue with reference counting has
been present before.
v2:
- acquire the handle instead of the BO (Christian)
- fix comment style (Christian)
- drop the Fixes tag (Christian)
- rename err_ gotos
- add missing Link tag
Suggested-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/drm_gem.c#L241 # [1]
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Anusha Srivatsa <asrivats(a)redhat.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://lore.kernel.org/r/20250630084001.293053-1-tzimmermann@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/gpu/drm/drm_gem.c | 44 ++++++++++++++++++++++++---
drivers/gpu/drm/drm_gem_framebuffer_helper.c | 16 +++++----
drivers/gpu/drm/drm_internal.h | 2 +
3 files changed, 51 insertions(+), 11 deletions(-)
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -186,6 +186,35 @@ void drm_gem_private_object_fini(struct
}
EXPORT_SYMBOL(drm_gem_private_object_fini);
+static void drm_gem_object_handle_get(struct drm_gem_object *obj)
+{
+ struct drm_device *dev = obj->dev;
+
+ drm_WARN_ON(dev, !mutex_is_locked(&dev->object_name_lock));
+
+ if (obj->handle_count++ == 0)
+ drm_gem_object_get(obj);
+}
+
+/**
+ * drm_gem_object_handle_get_unlocked - acquire reference on user-space handles
+ * @obj: GEM object
+ *
+ * Acquires a reference on the GEM buffer object's handle. Required
+ * to keep the GEM object alive. Call drm_gem_object_handle_put_unlocked()
+ * to release the reference.
+ */
+void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj)
+{
+ struct drm_device *dev = obj->dev;
+
+ guard(mutex)(&dev->object_name_lock);
+
+ drm_WARN_ON(dev, !obj->handle_count); /* first ref taken in create-tail helper */
+ drm_gem_object_handle_get(obj);
+}
+EXPORT_SYMBOL(drm_gem_object_handle_get_unlocked);
+
/**
* drm_gem_object_handle_free - release resources bound to userspace handles
* @obj: GEM object to clean up.
@@ -216,8 +245,14 @@ static void drm_gem_object_exported_dma_
}
}
-static void
-drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj)
+/**
+ * drm_gem_object_handle_put_unlocked - releases reference on user-space handles
+ * @obj: GEM object
+ *
+ * Releases a reference on the GEM buffer object's handle. Possibly releases
+ * the GEM buffer object and associated dma-buf objects.
+ */
+void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj)
{
struct drm_device *dev = obj->dev;
bool final = false;
@@ -242,6 +277,7 @@ drm_gem_object_handle_put_unlocked(struc
if (final)
drm_gem_object_put(obj);
}
+EXPORT_SYMBOL(drm_gem_object_handle_put_unlocked);
/*
* Called at device or object close to release the file's
@@ -363,8 +399,8 @@ drm_gem_handle_create_tail(struct drm_fi
int ret;
WARN_ON(!mutex_is_locked(&dev->object_name_lock));
- if (obj->handle_count++ == 0)
- drm_gem_object_get(obj);
+
+ drm_gem_object_handle_get(obj);
/*
* Get the user-visible handle using idr. Preload and perform
--- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c
+++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
@@ -99,7 +99,7 @@ void drm_gem_fb_destroy(struct drm_frame
unsigned int i;
for (i = 0; i < fb->format->num_planes; i++)
- drm_gem_object_put(fb->obj[i]);
+ drm_gem_object_handle_put_unlocked(fb->obj[i]);
drm_framebuffer_cleanup(fb);
kfree(fb);
@@ -182,8 +182,10 @@ int drm_gem_fb_init_with_funcs(struct dr
if (!objs[i]) {
drm_dbg_kms(dev, "Failed to lookup GEM object\n");
ret = -ENOENT;
- goto err_gem_object_put;
+ goto err_gem_object_handle_put_unlocked;
}
+ drm_gem_object_handle_get_unlocked(objs[i]);
+ drm_gem_object_put(objs[i]);
min_size = (height - 1) * mode_cmd->pitches[i]
+ drm_format_info_min_pitch(info, i, width)
@@ -193,22 +195,22 @@ int drm_gem_fb_init_with_funcs(struct dr
drm_dbg_kms(dev,
"GEM object size (%zu) smaller than minimum size (%u) for plane %d\n",
objs[i]->size, min_size, i);
- drm_gem_object_put(objs[i]);
+ drm_gem_object_handle_put_unlocked(objs[i]);
ret = -EINVAL;
- goto err_gem_object_put;
+ goto err_gem_object_handle_put_unlocked;
}
}
ret = drm_gem_fb_init(dev, fb, mode_cmd, objs, i, funcs);
if (ret)
- goto err_gem_object_put;
+ goto err_gem_object_handle_put_unlocked;
return 0;
-err_gem_object_put:
+err_gem_object_handle_put_unlocked:
while (i > 0) {
--i;
- drm_gem_object_put(objs[i]);
+ drm_gem_object_handle_put_unlocked(objs[i]);
}
return ret;
}
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -155,6 +155,8 @@ void drm_sysfs_lease_event(struct drm_de
/* drm_gem.c */
int drm_gem_init(struct drm_device *dev);
+void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj);
+void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj);
int drm_gem_handle_create_tail(struct drm_file *file_priv,
struct drm_gem_object *obj,
u32 *handlep);
Patches currently in stable-queue which might be from tzimmermann(a)suse.de are
queue-6.6/drm-gem-fix-race-in-drm_gem_handle_create_tail.patch
queue-6.6/drm-gem-acquire-references-on-gem-handles-for-framebuffers.patch
Xu Yilun wrote:
> On Sat, Jun 21, 2025 at 11:07:24AM +1000, Alexey Kardashevskiy wrote:
> >
> >
> > On 11/6/25 11:55, Alexey Kardashevskiy wrote:
> > > Hi,
> > >
> > > Is there a QEMU tree using this somewhere?
> >
> > Ping? Thanks,
>
> Sorry for late. I've finally got a public tree.
>
> https://github.com/yiliu1765/qemu/tree/zhenzhong/devsec_tsm
>
> Again, I think the changes are far from good, just work for enabling.
At some point I want to stage a merge tree QEMU bits here:
https://git.kernel.org/pub/scm/linux/kernel/git/devsec/qemu.git/ (not
created yet)
...unless Paolo or others in QEMU community are open to running a
staging branch in qemu.git. At some point we need to collide all the
QEMU POC branches, and I expect that needs to happen and show some
success before the upstream projects start ingesting all these changes.
On 11/07/2025 5:00 pm, Tomeu Vizoso wrote:
> On Tue, Jun 24, 2025 at 3:50 PM Robin Murphy <robin.murphy(a)arm.com> wrote:
>>
>> On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
>> [...]
>>> diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
>>> index 10acfe8534f00a7985d40a93f4b2f7f69d43caee..50e46f0516bd1615b5f826c5002a6c0ecbf9aed4 100644
>>> --- a/drivers/accel/rocket/rocket_device.h
>>> +++ b/drivers/accel/rocket/rocket_device.h
>>> @@ -13,6 +13,8 @@
>>> struct rocket_device {
>>> struct drm_device ddev;
>>>
>>> + struct mutex sched_lock;
>>> +
>>> struct mutex iommu_lock;
>>
>> Just realised I missed this in the last patch, but iommu_lock appears to
>> be completely unnecessary now.
>>
>>> struct rocket_core *cores;
>> [...]
>>> +static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
>>> +{
>>> + struct rocket_task *task;
>>> + bool task_pp_en = 1;
>>> + bool task_count = 1;
>>> +
>>> + /* GO ! */
>>> +
>>> + /* Don't queue the job if a reset is in progress */
>>> + if (atomic_read(&core->reset.pending))
>>> + return;
>>> +
>>> + task = &job->tasks[job->next_task_idx];
>>> + job->next_task_idx++;
>>> +
>>> + rocket_pc_writel(core, BASE_ADDRESS, 0x1);
>>> +
>>> + rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
>>> + rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
>>
>> Those really look like bitfield operations rather than actual arithmetic
>> to me.
>>
>>> +
>>> + rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
>>
>> I don't see how regcmd is created (I guess that's in userspace?), but
>> given that it's explicitly u64 all the way through - and especially
>> since you claim to support 40-bit DMA addresses - it definitely seems
>> suspicious that the upper 32 bits never seem to be consumed anywhere :/
>
> Yeah, but there's no other register for BASE_ADDRESS address in the TRM.
That only reaffirms the question then - if this value is only ever
written verbatim to a 32-bit register, why is it 64-bit?
Thanks,
Robin.
Hi,
Here's another attempt at supporting user-space allocations from a
specific carved-out reserved memory region.
The initial problem we were discussing was that I'm currently working on
a platform which has a memory layout with ECC enabled. However, enabling
the ECC has a number of drawbacks on that platform: lower performance,
increased memory usage, etc. So for things like framebuffers, the
trade-off isn't great and thus there's a memory region with ECC disabled
to allocate from for such use cases.
After a suggestion from John, I chose to first start using heap
allocations flags to allow for userspace to ask for a particular ECC
setup. This is then backed by a new heap type that runs from reserved
memory chunks flagged as such, and the existing DT properties to specify
the ECC properties.
After further discussion, it was considered that flags were not the
right solution, and relying on the names of the heaps would be enough to
let userspace know the kind of buffer it deals with.
Thus, even though the uAPI part of it had been dropped in this second
version, we still needed a driver to create heaps out of carved-out memory
regions. In addition to the original usecase, a similar driver can be
found in BSPs from most vendors, so I believe it would be a useful
addition to the kernel.
Some extra discussion with Rob Herring [1] came to the conclusion that
some specific compatible for this is not great either, and as such an
new driver probably isn't called for either.
Some other discussions we had with John [2] also dropped some hints that
multiple CMA heaps might be a good idea, and some vendors seem to do
that too.
So here's another attempt that doesn't affect the device tree at all and
will just create a heap for every CMA reserved memory region.
It also falls nicely into the current plan we have to support cgroups in
DRM/KMS and v4l2, which is an additional benefit.
Let me know what you think,
Maxime
1: https://lore.kernel.org/all/20250707-cobalt-dingo-of-serenity-dbf92c@houat/
2: https://lore.kernel.org/all/CANDhNCroe6ZBtN_o=c71kzFFaWK-fF5rCdnr9P5h1sgPOW…
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v6:
- Drop the new driver and allocate a CMA heap for each region now
- Dropped the binding
- Rebased on 6.16-rc5
- Link to v5: https://lore.kernel.org/r/20250617-dma-buf-ecc-heap-v5-0-0abdc5863a4f@kerne…
Changes in v5:
- Rebased on 6.16-rc2
- Switch from property to dedicated binding
- Link to v4: https://lore.kernel.org/r/20250520-dma-buf-ecc-heap-v4-1-bd2e1f1bb42c@kerne…
Changes in v4:
- Rebased on 6.15-rc7
- Map buffers only when map is actually called, not at allocation time
- Deal with restricted-dma-pool and shared-dma-pool
- Reword Kconfig options
- Properly report dma_map_sgtable failures
- Link to v3: https://lore.kernel.org/r/20250407-dma-buf-ecc-heap-v3-0-97cdd36a5f29@kerne…
Changes in v3:
- Reworked global variable patch
- Link to v2: https://lore.kernel.org/r/20250401-dma-buf-ecc-heap-v2-0-043fd006a1af@kerne…
Changes in v2:
- Add vmap/vunmap operations
- Drop ECC flags uapi
- Rebase on top of 6.14
- Link to v1: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-0-54cbbd049511@kerne…
---
Maxime Ripard (2):
dma/contiguous: Add helper to test reserved memory type
dma-buf: heaps: cma: Create CMA heap for each CMA reserved region
drivers/dma-buf/heaps/cma_heap.c | 52 +++++++++++++++++++++++++++++++++++++++-
include/linux/dma-map-ops.h | 13 ++++++++++
kernel/dma/contiguous.c | 7 ++++++
3 files changed, 71 insertions(+), 1 deletion(-)
---
base-commit: 47633099a672fc7bfe604ef454e4f116e2c954b1
change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e
prerequisite-message-id: <20250610131231.1724627-1-jkangas(a)redhat.com>
prerequisite-patch-id: bc44be5968feb187f2bc1b8074af7209462b18e7
prerequisite-patch-id: f02a91b723e5ec01fbfedf3c3905218b43d432da
prerequisite-patch-id: e944d0a3e22f2cdf4d3b3906e5603af934696deb
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
On Thu, Jul 10, 2025 at 10:49:19AM +0200, Pavel Machek wrote:
> Hi!
>
> > > memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> > > DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> > > 760p video recording. Plus, copying full-resolution photo buffer takes
> > > more than 200msec!
> > >
> > > There's possibility to do some processing on GPU, and its implemented here:
> > >
> > > https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
> > >
> > > but that hits the same problem in the end -- data is in DMA-BUF,
> > > uncached, and takes way too long to copy out.
> > >
> > > And that's ... wrong. DMA ended seconds ago, complete cache flush
> > > would be way cheaper than copying single frame out, and I still have
> > > to deal with uncached frames.
> > >
> > > So I have two questions:
> > >
> > > 1) Is my analysis correct that, no matter how I get frame from v4l and
> > > process it on GPU, I'll have to copy it from uncached memory in the
> > > end?
> >
> > If you need to touch the buffers using the CPU then you are either
> > stuck with uncached memory or you need to implement bracketed access to
> > do the necessary cache maintenance. Be aware that completely flushing
> > the cache is not really an option, as that would impact other
> > workloads, so you have to flush the cache by walking the virtual
> > address space of the buffer, which may take a significant amount of CPU
> > time.
>
> What kind of "significant amount of CPU time" are we talking here?
> Millisecond?
It really depends on the platform, the type of cache, and the size of
the buffer. I remember that back in the N900 days a selective cash clean
of a large buffer for full resolution images took several dozens of
milliseconds, possibly close to 100ms. We had to clean the whole D-cache
to make it fast enough, but you can't always do that as Lucas mentioned.
> Bracketed access is fine with me.
>
> Flushing a cache should be an option. I'm root, there's no other
> significant workload, and copying out the buffer takes 200msec+. There
> are lot of cache flushes that can be done in quarter a second!
>
> > However, if you are only going to use the buffer with the GPU I see no
> > reason to touch it from the CPU side. Why would you even need to copy
> > the content? After all dma-bufs are meant to enable zero-copy between
> > DMA capable accelerators. You can simply import the V4L2 buffer into a
> > GL texture using EGL_EXT_image_dma_buf_import. Using this path you
> > don't need to bother with the cache at all, as the GPU will directly
> > read the video buffers from RAM.
>
> Yes, so GPU will read video buffer from RAM, then debayer it, and then
> what? Then I need to store a data into raw file, or use CPU to turn it
> into JPEG file, or maybe run video encoder on it. That are all tasks
> that are done on CPU...
--
Regards,
Laurent Pinchart
Hi Pavel,
Le jeudi 10 juillet 2025 à 10:24 +0200, Pavel Machek a écrit :
> Hi!
>
> It seems that DMA-BUFs are always uncached on arm64... which is a
> problem.
>
> I'm trying to get useful camera support on Librem 5, and that includes
> recording vidos (and taking photos).
>
> memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> 760p video recording. Plus, copying full-resolution photo buffer takes
> more than 200msec!
>
> There's possibility to do some processing on GPU, and its implemented here:
>
> https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
>
> but that hits the same problem in the end -- data is in DMA-BUF,
> uncached, and takes way too long to copy out.
>
> And that's ... wrong. DMA ended seconds ago, complete cache flush
> would be way cheaper than copying single frame out, and I still have
> to deal with uncached frames.
>
> So I have two questions:
>
> 1) Is my analysis correct that, no matter how I get frame from v4l and
> process it on GPU, I'll have to copy it from uncached memory in the
> end?
>
> 2) Does anyone have patches / ideas / roadmap how to solve that? It
> makes GPU unusable for computing, and camera basically unusable for
> video.
If CPU access is strictly required for your use case, the way forward is to
implement V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINT in the capture driver. Very
little drivers enable that.
Once your driver have that capability, you will be able to set
V4L2_MEMORY_FLAG_NON_COHERENT while doing REQBUFS or CREATE_BUFS ioctl. That
gives you allocation with CPU cache working, but you'll get the invalidation (or
flush) overhead by default. When capture data have not been read by CPU, you can
always queue it back with the V4L2_BUF_FLAG_NO_CACHE_INVALIDATE. But for your
use case, it seems that you want the invalidation to take place, otherwise your
software will endup reading old cache data instead of the next frame data.
Please note that the integration in the DMABuf SYNC ioctl was missing for a
while, so make sure you have recent enough kernel or get ready for backports.
The feature itself was commonly used with CPU only access, notably on ChromeOS
using libyuv. No DMABuf was involved initially.
regards,
Nicolas
[0] https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/vidioc-reqbu…
>
> Best regards,
> Pavel
We've discussed a number of times of how some heap names are bad, but
not really what makes a good heap name.
Let's document what we expect the heap names to look like.
Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v2:
- Added justifications for each requirement / suggestions
- Added a mention and example of buffer attributes
- Link to v1: https://lore.kernel.org/r/20250520-dma-buf-heap-names-doc-v1-1-ab31f74809ee…
---
Documentation/userspace-api/dma-buf-heaps.rst | 38 +++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
index 535f49047ce6450796bf4380c989e109355efc05..835ad1c3a65bc07b6f41d387d85c57162909e859 100644
--- a/Documentation/userspace-api/dma-buf-heaps.rst
+++ b/Documentation/userspace-api/dma-buf-heaps.rst
@@ -21,5 +21,43 @@ following heaps:
usually created either through the kernel commandline through the
`cma` parameter, a memory region Device-Tree node with the
`linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or
`CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it
might be called ``reserved``, ``linux,cma``, or ``default-pool``.
+
+Naming Convention
+=================
+
+``dma-buf`` heaps name should meet a number of constraints:
+
+- That name must be stable, and must not change from one version to the
+ other. Userspace identifies heaps by their name, so if the names ever
+ changes, we would be likely to introduce regressions.
+
+- That name must describe the memory region the heap will allocate from,
+ and must uniquely identify it in a given platform. Since userspace
+ applications use the heap name as the discriminant, it must be able to
+ tell which heap it wants to use reliably if there's multiple heaps.
+
+- That name must not mention implementation details, such as the
+ allocator. The heap driver will change over time, and implementation
+ details when it was introduced might not be relevant in the future.
+
+- The name should describe properties of the buffers that would be
+ allocated. Doing so will make heap identification easier for
+ userspace. Such properties are:
+
+ - ``cacheable`` / ``uncacheable`` for buffers with CPU caches enabled
+ or disabled;
+
+ - ``contiguous`` for physically contiguous buffers;
+
+ - ``protected`` for encrypted buffers not accessible the OS;
+
+- The name may describe intended usage. Doing so will make heap
+ identification easier for userspace applications and users.
+
+For example, assuming a platform with a reserved memory region located
+at the RAM address 0x42000000, intended to allocate video framebuffers,
+physically contiguous, and backed by the CMA kernel allocator. Good
+names would be ``memory@42000000-cacheable-contiguous`` or
+``video@42000000``, but ``cma-video`` wouldn't.
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250520-dma-buf-heap-names-doc-31261aa0cfe6
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>