From: Jiri Pirko <jiri(a)nvidia.com>
Confidential computing (CoCo) VMs/guests, such as AMD SEV and Intel TDX,
run with private/encrypted memory which creates a challenge
for devices that do not support DMA to it (no TDISP support).
For kernel-only DMA operations, swiotlb bounce buffering provides a
transparent solution by copying data through shared memory.
However, the only way to get this memory into userspace is via the DMA
API's dma_alloc_pages()/dma_mmap_pages() type interfaces which limits
the use of the memory to a single DMA device, and is incompatible with
pin_user_pages().
These limitations are particularly problematic for the RDMA subsystem
which makes heavy use of pin_user_pages() and expects flexible memory
usage between many different DMA devices.
This patch series enables userspace to explicitly request shared
(decrypted) memory allocations from new dma-buf system_cc_shared heap.
Userspace can mmap this memory and pass the dma-buf fd to other
existing importers such as RDMA or DRM devices to access the
memory. The DMA API is improved to allow the dma heap exporter to DMA
map the shared memory to each importing device.
Based on dma-mapping-for-next e7442a68cd1ee797b585f045d348781e9c0dde0d
Jiri Pirko (2):
dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory
dma-buf: heaps: system: add system_cc_shared heap for explicitly
shared memory
drivers/dma-buf/heaps/system_heap.c | 103 ++++++++++++++++++++++++++--
include/linux/dma-mapping.h | 10 +++
include/trace/events/dma.h | 3 +-
kernel/dma/direct.h | 14 +++-
kernel/dma/mapping.c | 13 +++-
5 files changed, 132 insertions(+), 11 deletions(-)
--
2.51.1
Wave Rider surprised me more than I expected. The game looks simple at first, but once the speed increases and the waves become unstable, every run starts feeling intense. Unlike many endless runner games that rely only on fast movement, Wave Rider creates pressure through unpredictable water physics and constantly changing obstacle patterns. Play Wave Rider online at https://wave-rider.io
The gameplay is built around survival. Your board moves forward automatically while you focus on steering through dangerous sections of water. Floating barriers, underwater mines, sharp turns, and narrow lanes appear quickly, forcing players to react within seconds. What I liked most is how the game balances smooth surfing movement with sudden moments of chaos. Sometimes the water feels calm, then everything changes when multiple obstacles appear together.
Wave Rider also rewards risky decisions. Pearls are often hidden near dangerous routes, making players choose between safe movement and higher scores. I found myself taking bigger risks after every run just to see how far I could push the distance counter. The durability system adds even more tension because repeated small crashes slowly reduce your chance of surviving longer sessions.
Game Controls
Left Arrow / A: Move left
Right Arrow / D: Move right
Spacebar or Up Arrow: Jump or glide over obstacles
The controls are easy to learn, but mastering the timing takes practice. Quick reactions help, but smooth movement and patience usually lead to better runs. Players who panic and change direction too aggressively often crash faster than expected.
Overall, Wave Rider feels like a solid arcade experience for anyone who enjoys reaction-based gameplay. The combination of flowing water, rising speed, and nonstop danger makes every session feel fresh and competitive.
When dumping IB contents from a hung job, amdgpu_devcoredump_format()
acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid()
and then, for each IB referenced by the job, calls amdgpu_bo_reserve()
on the BO that backs the IB. Both reservations are taken on
reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx,
which trips lockdep:
WARNING: possible recursive locking detected
--------------------------------------------
kworker/u128:0 is trying to acquire lock:
ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
but task is already holding lock:
ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
Possible unsafe locking scenario:
CPU0
----
lock(reservation_ww_class_mutex);
lock(reservation_ww_class_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
Call Trace:
__ww_mutex_lock.constprop.0
ww_mutex_lock
amdgpu_bo_reserve
amdgpu_devcoredump_format+0x1594 [amdgpu]
amdgpu_devcoredump_deferred_work+0xea [amdgpu]
process_one_work
worker_thread
kthread
The two reservations are on different BOs in the captured trace, so the
splat is a lockdep-correctness warning, not an observed deadlock. It
becomes a real self-deadlock whenever the IB BO shares its dma_resv
with the root PD (the always-valid case, see
amdgpu_vm_is_bo_always_valid()): amdgpu_bo_reserve(abo) re-acquires the
same ww_mutex without a ticket and blocks forever.
With amdgpu.gpu_recovery=0 the timeout handler refires every ~2 s and
each invocation produces this splat, drowning the kernel ring buffer.
Fix it by collecting the per-IB BO references under the root PD's
reservation, then releasing the root before reserving each IB BO
individually. The walk over the VM mapping tree must remain under the
root lock (mappings can be torn down without it), but the actual
content copies do not need to nest inside it. Each per-IB reservation
is now an independent top-level acquire, eliminating the nested
ww_mutex.
The collect/release logic is factored out into two small helpers
(amdgpu_devcoredump_collect_ib_refs / amdgpu_devcoredump_release_ib_refs)
to keep the main function's indentation reasonable.
This also fixes a BO refcount leak in the original code: when
amdgpu_bo_reserve() failed, control jumped to free_ib_content without
running amdgpu_bo_unref(). In the new structure the per-IB BO refs
are released unconditionally in the cleanup helper.
Reproducer (~150 LoC libdrm_amdgpu): submit a single GFX IB containing
PACKET3_INDIRECT_BUFFER chained at GPU VA 0 and wait for the fence.
The TDR fires within ~10 s and the deferred coredump worker produces
the splat above on every invocation.
Fixes: 7b15fc2d1f1a ("drm/amdgpu: dump job ibs in the devcoredump")
Cc: stable(a)vger.kernel.org # 7.1
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 147 +++++++++++++-----
1 file changed, 110 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
index d386bc775d03..f6bb968de756 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
@@ -207,6 +207,72 @@ static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev,
}
}
+struct amdgpu_devcoredump_ib_ref {
+ struct amdgpu_bo *bo;
+ u64 offset;
+};
+
+/*
+ * Walk the VM's mapping tree under the root PD's reservation to obtain the BO
+ * that backs each IB and pin it with a refcount. The root PD reservation is
+ * dropped before this function returns; the caller can then reserve each IB
+ * BO individually without nesting ww_mutex acquires on
+ * reservation_ww_class_mutex.
+ *
+ * Returns an array of num_ibs entries (each ib_refs[i].bo may be NULL if its
+ * mapping was not found), or NULL on allocation failure / VM lookup failure.
+ * The caller must release the BO refs and free the array.
+ */
+static struct amdgpu_devcoredump_ib_ref *
+amdgpu_devcoredump_collect_ib_refs(struct amdgpu_device *adev,
+ struct amdgpu_coredump_info *coredump)
+{
+ struct amdgpu_devcoredump_ib_ref *ib_refs;
+ struct amdgpu_bo_va_mapping *mapping;
+ struct amdgpu_bo *root;
+ struct amdgpu_vm *vm;
+ u64 va_start;
+
+ ib_refs = kcalloc(coredump->num_ibs, sizeof(*ib_refs), GFP_KERNEL);
+ if (!ib_refs)
+ return NULL;
+
+ vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
+ if (!vm) {
+ kfree(ib_refs);
+ return NULL;
+ }
+
+ for (int i = 0; i < coredump->num_ibs; i++) {
+ va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
+ mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
+ if (!mapping)
+ continue;
+
+ ib_refs[i].bo = amdgpu_bo_ref(mapping->bo_va->base.bo);
+ ib_refs[i].offset = va_start -
+ mapping->start * AMDGPU_GPU_PAGE_SIZE;
+ }
+
+ amdgpu_bo_unreserve(root);
+ amdgpu_bo_unref(&root);
+
+ return ib_refs;
+}
+
+static void
+amdgpu_devcoredump_release_ib_refs(struct amdgpu_devcoredump_ib_ref *ib_refs,
+ int num_ibs)
+{
+ if (!ib_refs)
+ return;
+
+ for (int i = 0; i < num_ibs; i++)
+ if (ib_refs[i].bo)
+ amdgpu_bo_unref(&ib_refs[i].bo);
+ kfree(ib_refs);
+}
+
static ssize_t
amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_info *coredump)
{
@@ -214,13 +280,11 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
struct drm_printer p;
struct drm_print_iterator iter;
struct amdgpu_vm_fault_info *fault_info;
- struct amdgpu_bo_va_mapping *mapping;
struct amdgpu_ip_block *ip_block;
struct amdgpu_res_cursor cursor;
- struct amdgpu_bo *abo, *root;
- uint64_t va_start, offset;
+ struct amdgpu_bo *abo;
+ uint64_t offset;
struct amdgpu_ring *ring;
- struct amdgpu_vm *vm;
u32 *ib_content;
uint8_t *kptr;
int ver, i, j, r;
@@ -343,43 +407,52 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
drm_printf(&p, "VRAM is lost due to GPU reset!\n");
if (coredump->num_ibs) {
- /* Don't try to lookup the VM or map the BOs when calculating the
- * size required to store the devcoredump.
+ struct amdgpu_devcoredump_ib_ref *ib_refs = NULL;
+
+ /*
+ * Snapshot per-IB BO references under the root PD's reservation,
+ * then release the root before reserving each IB BO individually
+ * to copy its contents.
+ *
+ * Reserving an IB BO while the root PD is still reserved would
+ * be a nested ww_mutex acquire on reservation_ww_class_mutex
+ * without a ww_acquire_ctx, which trips lockdep's recursive-
+ * locking check and self-deadlocks for IB BOs that share their
+ * dma_resv with the root PD (always-valid BOs).
+ *
+ * Skip lookup/reservation entirely on the sizing pass: it does
+ * not write IB content, and the size estimate doesn't depend on
+ * whether the BOs are reachable.
*/
- if (sizing_pass)
- vm = NULL;
- else
- vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
+ if (!sizing_pass)
+ ib_refs = amdgpu_devcoredump_collect_ib_refs(adev, coredump);
- for (int i = 0; i < coredump->num_ibs && (sizing_pass || vm); i++) {
+ for (int i = 0; i < coredump->num_ibs; i++) {
ib_content = kvmalloc_array(coredump->ibs[i].ib_size_dw, 4,
GFP_KERNEL);
if (!ib_content)
continue;
- /* vm=NULL can only happen when 'sizing_pass' is true. Skip to the
- * drm_printf() calls (ib_content doesn't need to be initialized
- * as its content won't be written anywhere).
- */
- if (!vm)
+ if (sizing_pass)
goto output_ib_content;
- va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
- mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
- if (!mapping)
- goto free_ib_content;
+ if (!ib_refs || !ib_refs[i].bo)
+ goto output_ib_content;
+
+ abo = ib_refs[i].bo;
+ offset = ib_refs[i].offset;
- offset = va_start - (mapping->start * AMDGPU_GPU_PAGE_SIZE);
- abo = amdgpu_bo_ref(mapping->bo_va->base.bo);
r = amdgpu_bo_reserve(abo, false);
if (r)
- goto free_ib_content;
+ goto output_ib_content;
if (abo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS) {
off = 0;
- if (abo->tbo.resource->mem_type != TTM_PL_VRAM)
- goto unreserve_abo;
+ if (abo->tbo.resource->mem_type != TTM_PL_VRAM) {
+ amdgpu_bo_unreserve(abo);
+ goto output_ib_content;
+ }
amdgpu_res_first(abo->tbo.resource, offset,
coredump->ibs[i].ib_size_dw * 4,
@@ -395,8 +468,10 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
r = ttm_bo_kmap(&abo->tbo, 0,
PFN_UP(abo->tbo.base.size),
&abo->kmap);
- if (r)
- goto unreserve_abo;
+ if (r) {
+ amdgpu_bo_unreserve(abo);
+ goto output_ib_content;
+ }
kptr = amdgpu_bo_kptr(abo);
kptr += offset;
@@ -406,21 +481,19 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
amdgpu_bo_kunmap(abo);
}
+ amdgpu_bo_unreserve(abo);
+
output_ib_content:
drm_printf(&p, "\nIB #%d 0x%llx %d dw\n",
i, coredump->ibs[i].gpu_addr, coredump->ibs[i].ib_size_dw);
- for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
- drm_printf(&p, "0x%08x\n", ib_content[j]);
-unreserve_abo:
- if (vm)
- amdgpu_bo_unreserve(abo);
-free_ib_content:
+ if (!sizing_pass && ib_refs && ib_refs[i].bo) {
+ for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
+ drm_printf(&p, "0x%08x\n", ib_content[j]);
+ }
kvfree(ib_content);
}
- if (vm) {
- amdgpu_bo_unreserve(root);
- amdgpu_bo_unref(&root);
- }
+
+ amdgpu_devcoredump_release_ib_refs(ib_refs, coredump->num_ibs);
}
return count - iter.remain;
--
2.54.0
virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush() lock
the framebuffer BO's dma_resv via virtio_gpu_array_lock_resv() and
ignore its return value. The function can fail with -EINTR from
dma_resv_lock_interruptible() (signal during lock wait) or with
-ENOMEM from dma_resv_reserve_fences() (fence slot allocation),
leaving the resv lock not held. The queue path then walks the object
array and calls dma_resv_add_fence(), which requires the lock held;
with lockdep enabled this trips dma_resv_assert_held():
WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840
Call Trace:
virtio_gpu_array_add_fence
virtio_gpu_queue_ctrl_sgs
virtio_gpu_queue_fenced_ctrl_buffer
virtio_gpu_cursor_plane_update
drm_atomic_helper_commit_planes
drm_atomic_helper_commit_tail
commit_tail
drm_atomic_helper_commit
drm_atomic_commit
drm_atomic_helper_update_plane
__setplane_atomic
drm_mode_cursor_universal
drm_mode_cursor_common
drm_mode_cursor_ioctl
drm_ioctl
__x64_sys_ioctl
Beyond the WARN, mutating the dma_resv fence list without the lock
races with concurrent readers/writers and can corrupt the list.
Both call sites run inside the .atomic_update plane callback, which
DRM atomic helpers do not allow to fail (by the time it runs, the
commit has been signed off to userspace and there is no clean
rollback path). Moving the lock acquisition to .prepare_fb was
rejected because the broader lock scope deadlocks against other BO
locking paths in the same atomic commit.
Introduce virtio_gpu_lock_one_resv_uninterruptible() that uses
dma_resv_lock() instead of dma_resv_lock_interruptible(). This
eliminates the -EINTR failure mode -- the realistic syzbot trigger
-- without extending the lock hold across the commit. The helper
locks a single BO and rejects nents > 1 with -EINVAL; both fix
sites lock exactly one BO.
Use it from virtio_gpu_cursor_plane_update() and
virtio_gpu_resource_flush(); check the return value to handle the
remaining -ENOMEM case from dma_resv_reserve_fences() by freeing
the objs and skipping the plane update for that frame. The
framebuffer BOs touched here are not shared with other contexts
and lock contention is expected to be brief, so the loss of
signal-interruptibility is acceptable.
Other callers of virtio_gpu_array_lock_resv() (the ioctl paths)
continue to use the interruptible variant.
The bug was reported by syzbot, triggered via fault injection
(fail_nth) on the DRM_IOCTL_MODE_CURSOR path, which forces the
-ENOMEM branch in dma_resv_reserve_fences().
Reported-by: syzbot+72bd3dd3a5d5f39a0271(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=72bd3dd3a5d5f39a0271
Fixes: 5cfd31c5b3a3 ("drm/virtio: fix virtio_gpu_cursor_plane_update().")
Cc: stable(a)vger.kernel.org
Signed-off-by: Deepanshu Kartikey <kartikey406(a)gmail.com>
---
v4: Rename the helper to virtio_gpu_lock_one_resv_uninterruptible()
and reject objs->nents > 1 with -EINVAL. The v3 helper's
multi-object branch used drm_gem_lock_reservations(), which is
interruptible, contradicting the "uninterruptible" name; both
fix sites lock a single BO so the multi-object path is dropped.
(Dmitry Osipenko)
v3: Drop the prepare_fb/cleanup_fb approach from v2 (it deadlocked
against virtio_gpu_resource_flush(), which also locks the BO in
the same atomic commit). Instead add an uninterruptible variant
of the resv lock helper and use it in both
virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush().
(Dmitry Osipenko)
v2: Move resv lock acquisition from .atomic_update (which must not
fail) to .prepare_fb (which may), per maintainer review of v1.
The v1 approach of silently skipping the cursor update on lock
failure violated the atomic-commit contract with userspace.
---
drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
drivers/gpu/drm/virtio/virtgpu_gem.c | 17 +++++++++++++++++
drivers/gpu/drm/virtio/virtgpu_plane.c | 10 ++++++++--
3 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index f17660a71a3e..2f3531950aa4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -317,6 +317,7 @@ virtio_gpu_array_from_handles(struct drm_file *drm_file, u32 *handles, u32 nents
void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs,
struct drm_gem_object *obj);
int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs);
+int virtio_gpu_lock_one_resv_uninterruptible(struct virtio_gpu_object_array *objs);
void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs);
void virtio_gpu_array_add_fence(struct virtio_gpu_object_array *objs,
struct dma_fence *fence);
diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
index f22dc5c21cd4..435d37d36034 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -238,6 +238,23 @@ int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs)
return ret;
}
+int virtio_gpu_lock_one_resv_uninterruptible(struct virtio_gpu_object_array *objs)
+{
+ int ret;
+
+ if (objs->nents != 1)
+ return -EINVAL;
+
+ dma_resv_lock(objs->objs[0]->resv, NULL);
+
+ ret = dma_resv_reserve_fences(objs->objs[0]->resv, 1);
+ if (ret) {
+ virtio_gpu_array_unlock_resv(objs);
+ return ret;
+ }
+ return 0;
+}
+
void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs)
{
if (objs->nents == 1) {
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a126d1b25f46..652352424744 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -215,7 +215,10 @@ static void virtio_gpu_resource_flush(struct drm_plane *plane,
if (!objs)
return;
virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
- virtio_gpu_array_lock_resv(objs);
+ if (virtio_gpu_lock_one_resv_uninterruptible(objs)) {
+ virtio_gpu_array_put_free(objs);
+ return;
+ }
virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle, x, y,
width, height, objs,
vgplane_st->fence);
@@ -459,7 +462,10 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane,
if (!objs)
return;
virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
- virtio_gpu_array_lock_resv(objs);
+ if (virtio_gpu_lock_one_resv_uninterruptible(objs)) {
+ virtio_gpu_array_put_free(objs);
+ return;
+ }
virtio_gpu_cmd_transfer_to_host_2d
(vgdev, 0,
plane->state->crtc_w,
--
2.43.0
In FNAF 2, players work as a night security guard at Freddy Fazbear’s Pizza, a children’s restaurant filled with animatronic characters. Unlike normal robots, these animatronics become dangerous at night and try to attack anyone they see. The player must survive from midnight until 6 a.m. by monitoring security cameras, using a flashlight, and wearing a Freddy Fazbear mask to trick the animatronics. Play now: https://fnaf-2.io
Hi,
The recent introduction of heaps in the optee driver [1] made possible
the creation of heaps as modules.
It's generally a good idea if possible, including for the already
existing system and CMA heaps.
The system one is pretty trivial, the CMA is now easy too with the
reworks we got in 7.1-r1.
Let me know what you think,
Maxime
1: https://lore.kernel.org/dri-devel/20250911135007.1275833-4-jens.wiklander@l…
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v5:
- Rebase on 7.1-rc1
- Add a patch to enable the heaps in arm64 defconfig
- Link to v4: https://lore.kernel.org/r/20260331-dma-buf-heaps-as-modules-v4-0-e18fda5044…
Changes in v4:
- Fix compilation failure
- Rework to take into account OF_RESERVED_MEM
- Fix regression making the default CMA area disappear if not created
through the DT
- Added some documentation and comments
- Link to v3: https://lore.kernel.org/r/20260303-dma-buf-heaps-as-modules-v3-0-24344812c7…
Changes in v3:
- Squashed cma_get_name and cma_alloc/release patches
- Fixed typo in Export dev_get_cma_area commit title
- Fixed compilation failure with DMA_CMA but not OF_RESERVED_MEM
- Link to v2: https://lore.kernel.org/r/20260227-dma-buf-heaps-as-modules-v2-0-454aee7e06…
Changes in v2:
- Collect tags
- Don't export dma_contiguous_default_area anymore, but export
dev_get_cma_area instead
- Mentioned that heap modules can't be removed
- Link to v1: https://lore.kernel.org/r/20260225-dma-buf-heaps-as-modules-v1-0-2109225a09…
---
Maxime Ripard (4):
dma-buf: heaps: Export mem_accounting parameter
dma-buf: heaps: cma: Turn the heap into a module
dma-buf: heaps: system: Turn the heap into a module
arm64: defconfig: Enable dma-buf heaps
arch/arm64/configs/defconfig | 3 +++
drivers/dma-buf/dma-heap.c | 1 +
drivers/dma-buf/heaps/Kconfig | 4 ++--
drivers/dma-buf/heaps/cma_heap.c | 3 +++
drivers/dma-buf/heaps/system_heap.c | 5 +++++
5 files changed, 14 insertions(+), 2 deletions(-)
---
base-commit: 5e9b7d093f3f77cb0af4409559e3d139babfb443
change-id: 20260225-dma-buf-heaps-as-modules-1034b3ec9f2a
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
When sharing a dma-buf between components of different trust levels, the
allocator may need to hand a consumer a read-only view of a buffer it
holds with read-write access. An example is a camera pipeline where the
capture component writes frames into a buffer and needs to pass a
read-only handle to a downstream processing component that should not be
able to modify the data.
However, no such mechanism exists today. The access mode of a dma-buf
file descriptor is fixed at export time, and the standard POSIX
interfaces for duplicating or changing file descriptors (i.e., dup(2),
dup3(2), and fcntl(F_SETFL)) cannot alter the read/write access mode of
the copy.
One natural candidate would be reopening via /proc/self/fd/<N> with
O_RDONLY, which works for regular files. For dma-buf this would fail
(that is, if we were to add a new handler for open f_op) with ENXIO
because the dmabuf pseudo-filesystem carries SB_NOUSER, which prevents
the VFS from opening its files through path-based resolution from
userspace.
Alternatively, exporting the buffer twice would produce two independent
dma_buf instances, which breaks fence synchronization.
Therefore we add a new DMA_BUF_IOCTL_DERIVE ioctl, which produces a new
file descriptor for an existing dma-buf with a caller-specified subset
of the original permissions:
```
struct dma_buf_derive { __u32 flags; __s32 fd; };
struct dma_buf_derive req = { .flags = O_RDONLY | O_CLOEXEC };
ioctl(rw_fd, DMA_BUF_IOCTL_DERIVE, &req);
/* req.fd is now a read-only alias of the same buffer */
```
Permission escalation is rejected with -EACCES. The new fd aliases the
same struct dma_buf as the original, same dma_resv, same exporter ops,
same underlying memory; so importers attaching to either fd see the same
fence timeline and operate on the same object. Access control for which
components may receive or pass on restricted descriptors can be layered on
top via SELinux file:read and file:write permissions.
A shared writable mapping (PROT_WRITE | MAP_SHARED) on the read-only fd is
rejected with -EACCES in dma_buf_mmap_internal().
Two small internal adjustments accompany the ioctl:
- __dma_buf_list_del() is moved to dma_buf_release() so it fires exactly
once on dentry destruction rather than on every file close.
- dma_buf_file_release() is updated to call dma_buf_put() only for
files that are not the primary dma-buf file.
This may not be the best approach, but after considering different
options and alternatives (as described above), we decided to raise the
discussion upstream. Thus, we welcome any alternative proposal or ideas.
The series is structured as:
- Patch 1 adds the new ioctl implementation.
- Patch 2 adds selftests covering the new ioctl.
Signed-off-by: Albert Esteve <aesteve(a)redhat.com>
---
Albert Esteve (2):
dma-buf: add DMA_BUF_IOCTL_DERIVE for reduced-permission aliases
selftests: dma-buf: add DERIVE ioctl tests
drivers/dma-buf/dma-buf.c | 58 ++++++++++-
include/uapi/linux/dma-buf.h | 28 +++++
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 114 ++++++++++++++++++++-
3 files changed, 198 insertions(+), 2 deletions(-)
---
base-commit: ab5fce87a778cb780a05984a2ca448f2b41aafbf
change-id: 20260520-dmabuf-limit-access-73261353841a
Best regards,
--
Albert Esteve <aesteve(a)redhat.com>