This series adds a new device /dev/syncobj that can be used to create
and manipulate DRM syncobjs. Previously, these operations required the
use of a DRM device and the device needed to support the DRIVER_SYNCOBJ
and DRIVER_SYNCOBJ_TIMELINE features.
There are several issues with the existing API:
- Syncobjs are the only explicit sync mechanism available on Wayland.
Most compositors do not use GPU waits. Instead, they use the
DRM_IOCTL_SYNCOBJ_EVENTFD ioctl to perform a CPU wait. Being tied to
DRM devices means that compositors cannot consistently offer this
feature even though no device-specific logic is involved.
- llvmpipe currently cannot offer syncobj interop because it does not
have access to a DRM device. This means that applications using
llvmpipe cannot present images before they have finished rendering,
despite llvmpipe using threaded rendering.
- Clients that do not use the Vulkan WSI need to manually probe /dev/dri
for devices that support the syncobj ioctls in order to use the
Wayland syncobj protocol.
- Similarly, clients that want to use screen capture have no equivalent
to the WSI and are therefore also forced to probe /dev/dri manually.
- Having to keep a DRM device open has potentially negative interactions
with GPU hotplug.
- Having to translate between syncobj FDs and handles is troublesome in
the compositor use case since syncobjs come and go frequently and need
to be cleaned up when clients disconnect.
/dev/syncobj solves these issues by providing all syncobj ioctls under a
consistent path that is not tied to any DRM device. It also operates
directly on file descriptors instead of syncobj handles.
The series starts with a number of small refactorings in drm_syncobj.c
to make its functionality available outside of the file and without the
need for drm_file/handle pairs.
The last commit adds the /dev/syncobj module. I've added it as a misc
device but maybe this should instead live somewhere under gpu/drm.
An application using the new interface can be found at [1].
[1]: https://github.com/mahkoh/jay/pull/947
---
Julian Orth (12):
drm/syncobj: add drm_syncobj_from_fd
drm/syncobj: add drm_syncobj_fence_lookup
drm/syncobj: make drm_syncobj_array_wait_timeout public
drm/syncobj: add drm_syncobj_register_eventfd
drm/syncobj: have transfer functions accept drm_syncobj directly
drm/syncobj: add drm_syncobj_transfer
drm/syncobj: add drm_syncobj_timeline_signal
drm/syncobj: add drm_syncobj_query
drm/syncobj: fix resource leak in drm_syncobj_import_sync_file_fence
drm/syncobj: add drm_syncobj_import_sync_file
drm/syncobj: add drm_syncobj_export_sync_file
misc/syncobj: add new device
Documentation/userspace-api/ioctl/ioctl-number.rst | 1 +
drivers/gpu/drm/drm_syncobj.c | 374 ++++++++++++++-----
drivers/misc/Kconfig | 10 +
drivers/misc/Makefile | 1 +
drivers/misc/syncobj.c | 404 +++++++++++++++++++++
include/drm/drm_syncobj.h | 21 ++
include/uapi/linux/syncobj.h | 75 ++++
7 files changed, 795 insertions(+), 91 deletions(-)
---
base-commit: 6916d5703ddf9a38f1f6c2cc793381a24ee914c6
change-id: 20260516-jorth-syncobj-d4d374c8c61b
Best regards,
--
Julian Orth <ju.orth(a)gmail.com>
This RFC builds on T.J. Mercier's earlier series [1] which added
a memory.stat counter for exported dma-bufs and a binder-backed
mechanism to transfer charges between cgroups.
The first commit is taken almost verbatim from TJ's series:
it introduces MEMCG_DMABUF as a dedicated per-cgroup stat, so that
the total exported dma-buf footprint is visible both system-wide
(via the root cgroup) and per-application (via per-process cgroups).
This avoids the overhead of DMABUF_SYSFS_STATS and integrates
naturally into the existing cgroup memory hierarchy.
The rest of the series departs from TJ's approach. While the first
commit introduces the memcg stat infrastructure for dmabufs, the
export-time charging it introduces in dma_buf_export() is then
superseded: we charge at dma_heap_ioctl_allocate() time, using a
new charge_pid_fd field in struct dma_heap_allocation_data. The
allocator opens a pidfd for its client (e.g., from binder's
sender_pid), passes it to the ioctl, and the kernel charges the
buffer directly to the client's cgroup at allocation time, so no
transfer step is needed.
This decouples the accounting path from binder entirely:
any allocator that knows its client's PID can use the pid_fd
mechanism regardless of the IPC transport in use.
The cross-cgroup charging capability requires access control.
Patches #3 and #4 add a generic LSM hook (security_dma_heap_alloc)
and an SELinux implementation based on a new dma_heap object class
with a charge_to permission, so policy authors can express which
domains are allowed to charge memory to another domain's cgroup.
The last patch adds tests to verify the new charge_pid_fd field.
We are sending it as an RFC to spark broader discussion. It may or
may not be the right path forward, and we welcome feedback on the
trade-offs.
Collision note: Eric Chanudet's series [2] adds __GFP_ACCOUNT to
system_heap page allocations as an opt-in module parameter. That
approach charges pages to the allocator's own kmem, which overlaps with
MEMCG_DMABUF. This series explicitly removes __GFP_ACCOUNT from system
heap allocations and routes all accounting through the MEMCG_DMABUF
path to avoid double-counting.
[1] https://lore.kernel.org/cgroups/20230109213809.418135-1-tjmercier@google.co…
[2] https://lore.kernel.org/r/20260113-dmabuf-heap-system-memcg-v2-0-e85722cc2f…
Signed-off-by: Albert Esteve <aesteve(a)redhat.com>
---
Albert Esteve (4):
dma-heap: charge dma-buf memory via explicit memcg
security: dma-heap: Add dma_heap_alloc LSM hook
selinux: Restrict cross-cgroup dma-heap charging
selftests/dmabuf-heaps: Add dma-buf memcg accounting tests
T.J. Mercier (1):
memcg: Track exported dma-buffers
Documentation/admin-guide/cgroup-v2.rst | 5 +
drivers/dma-buf/dma-buf.c | 7 +
drivers/dma-buf/dma-heap.c | 54 +++++-
drivers/dma-buf/heaps/system_heap.c | 2 -
include/linux/dma-buf.h | 4 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/memcontrol.h | 37 ++++
include/linux/security.h | 7 +
include/uapi/linux/dma-heap.h | 6 +
mm/memcontrol.c | 19 ++
security/security.c | 16 ++
security/selinux/hooks.c | 7 +
security/selinux/include/classmap.h | 1 +
tools/testing/selftests/cgroup/Makefile | 2 +-
tools/testing/selftests/cgroup/test_memcontrol.c | 143 +++++++++++++-
tools/testing/selftests/dmabuf-heaps/config | 1 +
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 126 ++++++++++++-
tools/testing/selftests/dmabuf-heaps/vmtest.sh | 205 +++++++++++++++++++++
18 files changed, 633 insertions(+), 10 deletions(-)
---
base-commit: 74fe02ce122a6103f207d29fafc8b3a53de6abaf
change-id: 20260508-v2_20230123_tjmercier_google_com-f44fcfb16530
Best regards,
--
Albert Esteve <aesteve(a)redhat.com>
virtio_gpu_cursor_plane_update() allocates a virtio_gpu_object_array,
locks its dma_resv, and queues a fenced transfer to the host. The
lock acquisition can fail in two ways:
- dma_resv_lock_interruptible() returns -EINTR when a signal is
delivered while waiting for the reservation lock.
- dma_resv_reserve_fences() returns -ENOMEM if it fails to allocate
a fence slot; in this case virtio_gpu_array_lock_resv() unlocks before returning.
The return value was ignored, so the cursor path could proceed with
the resv lock not held. The queue path then walks the object array
and calls dma_resv_add_fence(), which requires the lock; with lockdep
enabled this trips dma_resv_assert_held():
WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840
Call Trace:
virtio_gpu_array_add_fence
virtio_gpu_queue_ctrl_sgs
virtio_gpu_queue_fenced_ctrl_buffer
virtio_gpu_cursor_plane_update
drm_atomic_helper_commit_planes
drm_atomic_helper_commit_tail
commit_tail
drm_atomic_helper_commit
drm_atomic_commit
drm_atomic_helper_update_plane
__setplane_atomic
drm_mode_cursor_universal
drm_mode_cursor_common
drm_mode_cursor_ioctl
drm_ioctl
__x64_sys_ioctl
Beyond the WARN, mutating the dma_resv fence list without the lock
races with concurrent readers/writers and can corrupt the list.
The DRM atomic helpers do not allow .atomic_update to fail: by the
time it runs, the commit has been signed off to userspace and there
is no clean rollback path. Move the fallible work -- objs allocation,
dma_resv locking, and fence slot reservation -- into
virtio_gpu_plane_prepare_fb, which is the designated callback for
resource acquisition and may return errors that the framework
handles by rolling back the commit. Stash the prepared object array
on virtio_gpu_plane_state so the update step can consume it.
Make virtio_gpu_plane_cleanup_fb release the objs if the commit was
rolled back before update ran (i.e., objs not consumed). The queue
path already unlocks the resv after attaching the fence (vq.c:411)
and frees the array via put_free_delayed after host completion
(vq.c:271), so the update step only needs to clear vgplane_st->objs
to transfer ownership.
Simplify virtio_gpu_cursor_plane_update to a no-fail queue submission
that hands the prepared, locked objs to the queue path.
The bug was reported by syzbot, triggered via fault injection
(fail_nth) on the DRM_IOCTL_MODE_CURSOR path, which forces the
-ENOMEM branch in dma_resv_reserve_fences().
Reported-by: syzbot+72bd3dd3a5d5f39a0271(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=72bd3dd3a5d5f39a0271
Fixes: 5cfd31c5b3a3 ("drm/virtio: fix virtio_gpu_cursor_plane_update().")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20260510053025.100224-1-kartikey406@gmail.com/T/ [v1]
Signed-off-by: Deepanshu Kartikey <kartikey406(a)gmail.com>
---
v2: Move resv lock acquisition from .atomic_update (which must not
fail) to .prepare_fb (which may), per maintainer review of v1.
The previous approach of silently skipping the cursor update on
lock failure violated the atomic-commit contract with userspace.
---
drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
drivers/gpu/drm/virtio/virtgpu_plane.c | 38 ++++++++++++++++++++------
2 files changed, 30 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index f17660a71a3e..e51f959dce46 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -198,6 +198,7 @@ struct virtio_gpu_framebuffer {
struct virtio_gpu_plane_state {
struct drm_plane_state base;
struct virtio_gpu_fence *fence;
+ struct virtio_gpu_object_array *objs;
};
#define to_virtio_gpu_plane_state(x) \
container_of(x, struct virtio_gpu_plane_state, base)
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a126d1b25f46..b0511ace89e6 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -381,6 +381,23 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
goto err_fence;
}
+ if (plane->type == DRM_PLANE_TYPE_CURSOR && bo->dumb) {
+ struct virtio_gpu_object_array *objs;
+
+ objs = virtio_gpu_array_alloc(1);
+ if (!objs) {
+ ret = -ENOMEM;
+ goto err_fence;
+ }
+ virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
+ ret = virtio_gpu_array_lock_resv(objs);
+ if (ret) {
+ virtio_gpu_array_put_free(objs);
+ goto err_fence;
+ }
+ vgplane_st->objs = objs;
+ }
+
return 0;
err_fence:
@@ -417,6 +434,12 @@ static void virtio_gpu_plane_cleanup_fb(struct drm_plane *plane,
vgplane_st->fence = NULL;
}
+ if (vgplane_st->objs) {
+ virtio_gpu_array_unlock_resv(vgplane_st->objs);
+ virtio_gpu_array_put_free(vgplane_st->objs);
+ vgplane_st->objs = NULL;
+ }
+
obj = state->fb->obj[0];
if (drm_gem_is_imported(obj))
virtio_gpu_cleanup_imported_obj(obj);
@@ -452,21 +475,18 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane,
}
if (bo && bo->dumb && (plane->state->fb != old_state->fb)) {
- /* new cursor -- update & wait */
- struct virtio_gpu_object_array *objs;
-
- objs = virtio_gpu_array_alloc(1);
- if (!objs)
- return;
- virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
- virtio_gpu_array_lock_resv(objs);
+ /* objs and fence were prepared in virtio_gpu_plane_prepare_fb;
+ * the resv is already locked. The queue path takes ownership
+ * of objs and unlocks the resv after attaching the fence.
+ */
virtio_gpu_cmd_transfer_to_host_2d
(vgdev, 0,
plane->state->crtc_w,
plane->state->crtc_h,
- 0, 0, objs, vgplane_st->fence);
+ 0, 0, vgplane_st->objs, vgplane_st->fence);
virtio_gpu_notify(vgdev);
dma_fence_wait(&vgplane_st->fence->f, true);
+ vgplane_st->objs = NULL;
}
if (plane->state->fb != old_state->fb) {
--
2.43.0
virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush() lock
the framebuffer BO's dma_resv via virtio_gpu_array_lock_resv() and
ignore its return value. The function can fail with -EINTR from
dma_resv_lock_interruptible() (signal during lock wait) or with
-ENOMEM from dma_resv_reserve_fences() (fence slot allocation),
leaving the resv lock not held. The queue path then walks the object
array and calls dma_resv_add_fence(), which requires the lock held;
with lockdep enabled this trips dma_resv_assert_held():
WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840
Call Trace:
virtio_gpu_array_add_fence
virtio_gpu_queue_ctrl_sgs
virtio_gpu_queue_fenced_ctrl_buffer
virtio_gpu_cursor_plane_update
drm_atomic_helper_commit_planes
drm_atomic_helper_commit_tail
commit_tail
drm_atomic_helper_commit
drm_atomic_commit
drm_atomic_helper_update_plane
__setplane_atomic
drm_mode_cursor_universal
drm_mode_cursor_common
drm_mode_cursor_ioctl
drm_ioctl
__x64_sys_ioctl
Beyond the WARN, mutating the dma_resv fence list without the lock
races with concurrent readers/writers and can corrupt the list.
Both call sites run inside the .atomic_update plane callback, which
DRM atomic helpers do not allow to fail (by the time it runs, the
commit has been signed off to userspace and there is no clean
rollback path). Moving the lock acquisition to .prepare_fb (v2) was
rejected because the broader lock scope deadlocks against other
BO locking paths in the same atomic commit.
Introduce virtio_gpu_array_lock_resv_uninterruptible() that uses
dma_resv_lock() instead of dma_resv_lock_interruptible() on the
nents==1 path. This eliminates the -EINTR failure mode (the one
realistically reachable outside fault injection) without extending
the lock hold across the commit. Use it from both virtio_gpu_cursor_plane_update() and
virtio_gpu_resource_flush(); check the return value to handle the
remaining -ENOMEM case by freeing the objs and skipping the plane
update for that frame. The framebuffer BOs touched here are not
shared with other contexts and lock contention is expected to be
brief, so the loss of signal-interruptibility is acceptable.
Other callers of virtio_gpu_array_lock_resv() (the ioctl paths)
continue to use the interruptible variant.
The bug was reported by syzbot, triggered via fault injection
(fail_nth) on the DRM_IOCTL_MODE_CURSOR path, which forces the
-ENOMEM branch in dma_resv_reserve_fences().
Reported-by: syzbot+72bd3dd3a5d5f39a0271(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=72bd3dd3a5d5f39a0271
Fixes: 5cfd31c5b3a3 ("drm/virtio: fix virtio_gpu_cursor_plane_update().")
Cc: stable(a)vger.kernel.org
Signed-off-by: Deepanshu Kartikey <kartikey406(a)gmail.com>
---
v3: Per maintainer feedback on v2 (lockup caused by the broader
lock scope in prepare_fb conflicting with other BO locking in
the same atomic commit): drop the prepare_fb/cleanup_fb
approach, introduce an uninterruptible variant of
virtio_gpu_array_lock_resv(), and use it in both
virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush().
v2: Move resv lock acquisition from .atomic_update (which must not
fail) to .prepare_fb (which may), per maintainer review of v1.
The previous approach of silently skipping the cursor update on
lock failure violated the atomic-commit contract with userspace.
---
drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
drivers/gpu/drm/virtio/virtgpu_gem.c | 24 ++++++++++++++++++++++++
drivers/gpu/drm/virtio/virtgpu_plane.c | 10 ++++++++--
3 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index f17660a71a3e..43a7eb568e15 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -317,6 +317,7 @@ virtio_gpu_array_from_handles(struct drm_file *drm_file, u32 *handles, u32 nents
void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs,
struct drm_gem_object *obj);
int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs);
+int virtio_gpu_array_lock_resv_uninterruptible(struct virtio_gpu_object_array *objs);
void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs);
void virtio_gpu_array_add_fence(struct virtio_gpu_object_array *objs,
struct dma_fence *fence);
diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
index f22dc5c21cd4..08c4b7ef8d44 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -238,6 +238,30 @@ int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs)
return ret;
}
+int virtio_gpu_array_lock_resv_uninterruptible(struct virtio_gpu_object_array *objs)
+{
+ unsigned int i;
+ int ret = 0;
+
+ if (objs->nents == 1) {
+ dma_resv_lock(objs->objs[0]->resv, NULL);
+ } else {
+ ret = drm_gem_lock_reservations(objs->objs, objs->nents,
+ &objs->ticket);
+ if (ret)
+ return ret;
+ }
+
+ for (i = 0; i < objs->nents; ++i) {
+ ret = dma_resv_reserve_fences(objs->objs[i]->resv, 1);
+ if (ret) {
+ virtio_gpu_array_unlock_resv(objs);
+ return ret;
+ }
+ }
+ return 0;
+}
+
void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs)
{
if (objs->nents == 1) {
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a126d1b25f46..ef118cb4f0fa 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -215,7 +215,10 @@ static void virtio_gpu_resource_flush(struct drm_plane *plane,
if (!objs)
return;
virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
- virtio_gpu_array_lock_resv(objs);
+ if (virtio_gpu_array_lock_resv_uninterruptible(objs)) {
+ virtio_gpu_array_put_free(objs);
+ return;
+ }
virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle, x, y,
width, height, objs,
vgplane_st->fence);
@@ -459,7 +462,10 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane,
if (!objs)
return;
virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
- virtio_gpu_array_lock_resv(objs);
+ if (virtio_gpu_array_lock_resv_uninterruptible(objs)) {
+ virtio_gpu_array_put_free(objs);
+ return;
+ }
virtio_gpu_cmd_transfer_to_host_2d
(vgdev, 0,
plane->state->crtc_w,
--
2.43.0
The kerneldoc comments on dma_fence_init() and dma_fence_init64() describe
the legacy reason to pass an external lock as a need to prevent multiple
fences "from signaling out of order". However, this wording is a bit
misleading: a shared spinlock does not (and cannot) prevent the signaler
from signaling out of order. Signaling order is the driver's responsibility
regardless of whether the lock is shared or per-fence.
What a shared lock actually provides is serialization of signaling and
observation across fences in a given context, so that observers never
see a later fence signaled while an earlier one is not.
Reword both comments to describe this more accurately.
Signed-off-by: Maíra Canal <mcanal(a)igalia.com>
---
v1 -> v2: https://lore.kernel.org/dri-devel/20260411185756.1887119-4-mcanal@igalia.co…
- Be more explicit about not allowing new users to use an external lock.
- De-duplicate the explanation in dma_fence_init64() by pointing to the
dma_fence_init() documentation.
---
drivers/dma-buf/dma-fence.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1c1eaecaf1b0..63b349ba9a34 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -1102,9 +1102,11 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
* context and seqno are used for easy comparison between fences, allowing
* to check which fence is later by simply using dma_fence_later().
*
- * It is strongly discouraged to provide an external lock because this couples
- * lock and fence life time. This is only allowed for legacy use cases when
- * multiple fences need to be prevented from signaling out of order.
+ * External locks are a relic from legacy use cases that needed a shared lock
+ * to serialize signaling and observation of fences within a context, so that
+ * observers never see a later fence signaled while an earlier one isn't. New
+ * users MUST NOT use external locks, as they force the issuer to outlive all
+ * fences that reference the lock.
*/
void
dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
@@ -1129,9 +1131,8 @@ EXPORT_SYMBOL(dma_fence_init);
* Context and seqno are used for easy comparison between fences, allowing
* to check which fence is later by simply using dma_fence_later().
*
- * It is strongly discouraged to provide an external lock because this couples
- * lock and fence life time. This is only allowed for legacy use cases when
- * multiple fences need to be prevented from signaling out of order.
+ * New users MUST NOT use external locks. Check the documentation in
+ * dma_fence_init() to understand the motives behind the legacy use cases.
*/
void
dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
--
2.53.0
Turn the mixed bag of manual locks and guards into something
more consistent.
This patchset takes care of locks that already have guards
available, but also adds new guards for resv, drm_dev_enter/exit
and the custom panthor_device_resume_and_get() helper we have
around runtime PM.
I've intentionally placed the patch transitioning all locks with
readily available guards first so we can merge it even if the
new guards face some controversy.
Signed-off-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
Boris Brezillon (6):
drm/panthor: Driver-wide xxx_[un]lock -> [scoped_]guard replacement
dma-resv: Define guards for context-less dma_resv locks
drm: Define a conditional guard for drm_dev_{enter,exit}()
drm/panthor: Use guards for resv locking
drm/panthor: Use the drm_dev_access guard
drm/panthor: Add a new guard for our custom resume_and_get() PM helper
drivers/gpu/drm/panthor/panthor_devfreq.c | 29 +-
drivers/gpu/drm/panthor/panthor_device.c | 163 +++++-----
drivers/gpu/drm/panthor/panthor_device.h | 10 +-
drivers/gpu/drm/panthor/panthor_drv.c | 62 ++--
drivers/gpu/drm/panthor/panthor_gem.c | 102 +++----
drivers/gpu/drm/panthor/panthor_gpu.c | 40 +--
drivers/gpu/drm/panthor/panthor_heap.c | 139 ++++-----
drivers/gpu/drm/panthor/panthor_mmu.c | 480 ++++++++++++++----------------
drivers/gpu/drm/panthor/panthor_pwr.c | 8 +-
drivers/gpu/drm/panthor/panthor_sched.c | 254 ++++++++--------
include/drm/drm_drv.h | 9 +
include/linux/dma-resv.h | 5 +
12 files changed, 589 insertions(+), 712 deletions(-)
---
base-commit: ac5ac0acf11df04295eb1811066097b7022d6c7f
change-id: 20260512-panthor-guard-refactor-f1c6bc30c321
prerequisite-message-id: 20260512-panthor-signal-from-irq-v2-0-95c614a739cb(a)collabora.com
prerequisite-patch-id: e3cfd6399b2dc5439687932c6e961d845369562a
prerequisite-patch-id: 79820e6740c0c456efc1dfa273de04e495515a1c
prerequisite-patch-id: a3611a7c9551c606aaf87125782e6d18b6a6549e
prerequisite-patch-id: 6e9dc83a60e53e7b0d84030727ad9b1921e4b2ca
prerequisite-patch-id: eabd36064a01418a6ada3176b996a4038a314c21
prerequisite-patch-id: ca3a30182b71bf66c51ed2b6411d7ed8dc761c8e
prerequisite-patch-id: 6e549dd0ee9e3e0c8866da72dcabc82209d88360
prerequisite-patch-id: 5217700df7026ef533a2f273ea2535f9fc1274ac
prerequisite-patch-id: 8d57abec9f92bcbb21108d3005805b7c155a48f6
prerequisite-patch-id: 0bf98de955fce577ff8d4fb82c02dc04684beca6
prerequisite-patch-id: a9e0d90a64dfd5950a69b857af3867404be1ab45
Best regards,
--
Boris Brezillon <boris.brezillon(a)collabora.com>
Hi all,
This series is based on previous RFCs/discussions:
Tech topic: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
RFCv1: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
RFCv2: https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
The background/rationale is covered in more detail in the RFC cover
letters. The TL;DR is:
The goal is to enable userspace driver designs that use VFIO to export
DMABUFs representing subsets of PCI device BARs, and "vend" those
buffers from a primary process to other subordinate processes by fd.
These processes then mmap() the buffers and their access to the device
is isolated to the exported ranges. This is an improvement on sharing
the VFIO device fd to subordinate processes, which would allow
unfettered access .
This is achieved, first, by enabling mmap() of vfio-pci DMABUFs. Second, a
new ioctl()-based revocation mechanism is added to allow the primary
process to forcibly revoke access to previously-shared BAR spans, even
if the subordinate processes haven't cleanly exited.
(The related topic of safe delegation of iommufd control to the
subordinate processes is not addressed here, and is follow-up work.)
As well as isolation and revocation, another advantage to accessing a
BAR through a VMA backed by a DMABUF is that it's straightforward to
create the buffer with access attributes, such as write-combining.
Notes on patches
================
Feedback from the RFCs requested that, instead of creating
DMABUF-specific vm_ops and .fault paths, we go the whole way and
migrate the existing VFIO PCI BAR mmap() to be backed by a DMABUF too,
resulting in a common vm_ops and fault handler for mmap()s of both the
VFIO device and explicitly-exported DMABUFs. This has been done for
vfio-pci, but not sub-drivers (nvgrace-gpu's special-case mappings are
unchanged).
vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put
A bug fix to a related area, whose context is a dependency for later
patches.
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
The first lets a DMABUF VMA fault handler determine
arbitrary-sized PFNs from ranges in a DMABUF. The second refactors
DMABUF export for use by the existing export feature and adds a new
helper that creates a DMABUF corresponding to a VFIO BAR mmap()
request.
vfio/pci: Convert BAR mmap() to use a DMABUF
The vfio-pci core mmap() creates a DMABUF with the helper, and the
vm_ops fault handler uses the other helper to resolve the fault.
Because this depends on DMABUF structs/code, CONFIG_VFIO_PCI_CORE
needs to depend on CONFIG_DMA_SHARED_BUFFER. The
CONFIG_VFIO_PCI_DMABUF option still conditionally enables the export
support code.
NOTE: The user mmap()s a device fd, but the resulting VMA's vm_file
becomes that of the DMABUF which takes ownership of the device and
puts it on release. This maintains the existing behaviour of a VMA
keeping the VFIO device open.
BAR zapping then happens via the existing vfio_pci_dma_buf_move()
path, which now needs to unmap PTEs in the DMABUF's address_space.
vfio/pci: Provide a user-facing name for BAR mappings
There was a request for decent debug naming in /proc/<pid>/maps
etc. comparable to the existing VFIO names: since the VMAs are
DMABUFs, they have a "dmabuf:" prefix and can't be 100% identical
to before. This is a user-visible change, but this patch at least
now gives us extra info on the BDF & BAR being mapped.
vfio/pci: Clean up BAR zap and revocation
In general (see NOTE!), vfio_pci_zap_bars() is now obsolete,
since it unmaps PTEs in the VFIO device address_space which is now
unused. This consolidates all calls (e.g. around reset) with the
neighbouring vfio_pci_dma_buf_move()s into new functions, to
revoke-zap/unrevoke.
NOTE: the nvgrace-gpu driver continues to use its own private
vm_ops, fault handler, etc. for its special memregions, and these
DO still add PTEs to the VFIO device address_space. So, a
temporary flag, vdev->bar_needs_zap, maintains the old behaviour
for this use. At least this patch's consolidation makes it easy
to remove the remaining zap when this need goes away.
A FIXME is added: if nvgrace-gpu is converted to DMABUFs, remove
the flag and final zap.
vfio/pci: Support mmap() of a VFIO DMABUF
Adds mmap() for a DMABUF fd exported from vfio-pci.
It was a goal to keep the VFIO device fd lifetime behaviour
unchanged with respect to the DMABUFs. An application can close
all device fds, and this will revoke/clean up all DMABUFs; no
mappings or other access can be performed now. When enabling
mmap() of the DMABUFs, this means access through the VMA is also
revoked. This complicates the fault handler because whilst the
DMABUF exists, it has no guarantee that the corresponding VFIO
device is still alive. Adds synchronisation ensuring the vdev is
available before vdev->memory_lock is touched.
(I decided against the alternative of preventing cleanup by holding
the VFIO device open if any DMABUFs exist, because it's both a
change of behaviour and less clean overall.)
I've added a chonky comment in place, happy to clarify more if you
have ideas.
vfio/pci: Permanently revoke a DMABUF on request
By weight, this is mostly a rename of the 'revoked' flag to an enum,
'status'. There are now 3 states for a buffer: usable,
revoked-temporary, and revoked-permanent. A new VFIO device ioctl is added,
VFIO_DEVICE_PCI_DMABUF_REVOKE, which passes a DMABUF (exported from
that device) and permanently revokes it. Thus a userspace driver
can guarantee any downstream consumers of a shared fd are prevented
from accessing a BAR range, and that range can be reused.
The code doing revocation in vfio_pci_dma_buf_move() is moved,
unchanged, to a common function for use by _move() and the new
ioctl path.
Q: I can't think of a good reason to temporarily revoke/unrevoke
buffers from userspace, so didn't add a 'flags' field to the ioctl
struct. Easy to add if people think it's worthwhile for future
use.
vfio/pci: Add mmap() attributes to DMABUF feature
Reserves bits [31:28] in vfio_device_feature_dma_buf to allow a
(CPU) mapping attribute to be specified for an exported set of
ranges. The default is the current UC, and a new flag can specify
CPU access as WC.
Q: I've taken 4 bits; the intention is for this field to be a
scalar not a bitmap (i.e. mutually-exclusive access properties).
Perhaps 4 is a bit too many?
Testing
=======
(The [RFC ONLY] userspace test program, for QEMU edu-plus, has been
dropped, but can be found in the GitHub branch below.)
This code has been tested with mapping DMABUFs of single/multiple
ranges, aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff >
0, revocation, shutdown/cleanup scenarios, and hugepage mappings; all
seem to work correctly. I've lightly tested WC mappings also (by observing
resulting PTEs as having the correct attributes...). No regressions
observed on the VFIO selftests, or on our internal vfio-pci
applications.
End
===
This is based on -next (next-20260414 but will merge earlier), as it
depends on Leon's series "vfio: Wait for dma-buf invalidation to
complete":
https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566a…
These commits are on GitHub, along with "[RFC ONLY] selftests: vfio: Add
standalone vfio_dmabuf_mmap_test":
https://github.com/metamev/linux/compare/next-20260414...metamev:linux:dev/…
Thanks for reading,
Matt
================================================================================
Change log:
v1:
- Cleanup of the common DMABUF-aware VMA vm_ops fault handler and
export code.
- Fixed a lot of races, particularly faults racing with DMABUF
cleanup (if the VFIO device fds close, for example).
- Added nicer human-readable names for VFIO mmap() VMAs
RFCv2: Respin based on the feedback/suggestions:
https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
- Transform the existing VFIO BAR mmap path to also use DMABUFs
behind the scenes, and then simply share that code for
explicitly-mapped DMABUFs. Jason wanted to go that direction to
enable iommufd VFIO type 1 emulation to pick up a DMABUF for an IO
mapping.
- Revoke buffers using a VFIO device fd ioctl
RFCv1:
https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
Matt Evans (9):
vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
vfio/pci: Convert BAR mmap() to use a DMABUF
vfio/pci: Provide a user-facing name for BAR mappings
vfio/pci: Clean up BAR zap and revocation
vfio/pci: Support mmap() of a VFIO DMABUF
vfio/pci: Permanently revoke a DMABUF on request
vfio/pci: Add mmap() attributes to DMABUF feature
drivers/vfio/pci/Kconfig | 3 +-
drivers/vfio/pci/Makefile | 3 +-
drivers/vfio/pci/nvgrace-gpu/main.c | 5 +
drivers/vfio/pci/vfio_pci_config.c | 30 +-
drivers/vfio/pci/vfio_pci_core.c | 224 ++++++++++---
drivers/vfio/pci/vfio_pci_dmabuf.c | 500 +++++++++++++++++++++++-----
drivers/vfio/pci/vfio_pci_priv.h | 49 ++-
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 42 ++-
9 files changed, 690 insertions(+), 167 deletions(-)
--
2.47.3