virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush() lock the framebuffer BO's dma_resv via virtio_gpu_array_lock_resv() and ignore its return value. The function can fail with -EINTR from dma_resv_lock_interruptible() (signal during lock wait) or with -ENOMEM from dma_resv_reserve_fences() (fence slot allocation), leaving the resv lock not held. The queue path then walks the object array and calls dma_resv_add_fence(), which requires the lock held; with lockdep enabled this trips dma_resv_assert_held():
WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840 Call Trace: virtio_gpu_array_add_fence virtio_gpu_queue_ctrl_sgs virtio_gpu_queue_fenced_ctrl_buffer virtio_gpu_cursor_plane_update drm_atomic_helper_commit_planes drm_atomic_helper_commit_tail commit_tail drm_atomic_helper_commit drm_atomic_commit drm_atomic_helper_update_plane __setplane_atomic drm_mode_cursor_universal drm_mode_cursor_common drm_mode_cursor_ioctl drm_ioctl __x64_sys_ioctl
Beyond the WARN, mutating the dma_resv fence list without the lock races with concurrent readers/writers and can corrupt the list.
Both call sites run inside the .atomic_update plane callback, which DRM atomic helpers do not allow to fail (by the time it runs, the commit has been signed off to userspace and there is no clean rollback path). Moving the lock acquisition to .prepare_fb was rejected because the broader lock scope deadlocks against other BO locking paths in the same atomic commit.
Introduce virtio_gpu_lock_one_resv_uninterruptible() that uses dma_resv_lock() instead of dma_resv_lock_interruptible(). This eliminates the -EINTR failure mode -- the realistic syzbot trigger -- without extending the lock hold across the commit. The helper locks a single BO and rejects nents > 1 with -EINVAL; both fix sites lock exactly one BO.
Use it from virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush(); check the return value to handle the remaining -ENOMEM case from dma_resv_reserve_fences() by freeing the objs and skipping the plane update for that frame. The framebuffer BOs touched here are not shared with other contexts and lock contention is expected to be brief, so the loss of signal-interruptibility is acceptable.
Other callers of virtio_gpu_array_lock_resv() (the ioctl paths) continue to use the interruptible variant.
The bug was reported by syzbot, triggered via fault injection (fail_nth) on the DRM_IOCTL_MODE_CURSOR path, which forces the -ENOMEM branch in dma_resv_reserve_fences().
Reported-by: syzbot+72bd3dd3a5d5f39a0271@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=72bd3dd3a5d5f39a0271 Fixes: 5cfd31c5b3a3 ("drm/virtio: fix virtio_gpu_cursor_plane_update().") Cc: stable@vger.kernel.org Signed-off-by: Deepanshu Kartikey kartikey406@gmail.com --- v4: Rename the helper to virtio_gpu_lock_one_resv_uninterruptible() and reject objs->nents > 1 with -EINVAL. The v3 helper's multi-object branch used drm_gem_lock_reservations(), which is interruptible, contradicting the "uninterruptible" name; both fix sites lock a single BO so the multi-object path is dropped. (Dmitry Osipenko) v3: Drop the prepare_fb/cleanup_fb approach from v2 (it deadlocked against virtio_gpu_resource_flush(), which also locks the BO in the same atomic commit). Instead add an uninterruptible variant of the resv lock helper and use it in both virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush(). (Dmitry Osipenko) v2: Move resv lock acquisition from .atomic_update (which must not fail) to .prepare_fb (which may), per maintainer review of v1. The v1 approach of silently skipping the cursor update on lock failure violated the atomic-commit contract with userspace. --- drivers/gpu/drm/virtio/virtgpu_drv.h | 1 + drivers/gpu/drm/virtio/virtgpu_gem.c | 17 +++++++++++++++++ drivers/gpu/drm/virtio/virtgpu_plane.c | 10 ++++++++-- 3 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h index f17660a71a3e..2f3531950aa4 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.h +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h @@ -317,6 +317,7 @@ virtio_gpu_array_from_handles(struct drm_file *drm_file, u32 *handles, u32 nents void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs, struct drm_gem_object *obj); int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs); +int virtio_gpu_lock_one_resv_uninterruptible(struct virtio_gpu_object_array *objs); void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs); void virtio_gpu_array_add_fence(struct virtio_gpu_object_array *objs, struct dma_fence *fence); diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c index f22dc5c21cd4..435d37d36034 100644 --- a/drivers/gpu/drm/virtio/virtgpu_gem.c +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c @@ -238,6 +238,23 @@ int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs) return ret; }
+int virtio_gpu_lock_one_resv_uninterruptible(struct virtio_gpu_object_array *objs) +{ + int ret; + + if (objs->nents != 1) + return -EINVAL; + + dma_resv_lock(objs->objs[0]->resv, NULL); + + ret = dma_resv_reserve_fences(objs->objs[0]->resv, 1); + if (ret) { + virtio_gpu_array_unlock_resv(objs); + return ret; + } + return 0; +} + void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs) { if (objs->nents == 1) { diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c index a126d1b25f46..652352424744 100644 --- a/drivers/gpu/drm/virtio/virtgpu_plane.c +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c @@ -215,7 +215,10 @@ static void virtio_gpu_resource_flush(struct drm_plane *plane, if (!objs) return; virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]); - virtio_gpu_array_lock_resv(objs); + if (virtio_gpu_lock_one_resv_uninterruptible(objs)) { + virtio_gpu_array_put_free(objs); + return; + } virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle, x, y, width, height, objs, vgplane_st->fence); @@ -459,7 +462,10 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane, if (!objs) return; virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]); - virtio_gpu_array_lock_resv(objs); + if (virtio_gpu_lock_one_resv_uninterruptible(objs)) { + virtio_gpu_array_put_free(objs); + return; + } virtio_gpu_cmd_transfer_to_host_2d (vgdev, 0, plane->state->crtc_w,
On 5/19/26 10:22, Deepanshu Kartikey wrote:
virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush() lock the framebuffer BO's dma_resv via virtio_gpu_array_lock_resv() and ignore its return value. The function can fail with -EINTR from dma_resv_lock_interruptible() (signal during lock wait) or with -ENOMEM from dma_resv_reserve_fences() (fence slot allocation), leaving the resv lock not held. The queue path then walks the object array and calls dma_resv_add_fence(), which requires the lock held; with lockdep enabled this trips dma_resv_assert_held():
WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840 Call Trace: virtio_gpu_array_add_fence virtio_gpu_queue_ctrl_sgs virtio_gpu_queue_fenced_ctrl_buffer virtio_gpu_cursor_plane_update drm_atomic_helper_commit_planes drm_atomic_helper_commit_tail commit_tail drm_atomic_helper_commit drm_atomic_commit drm_atomic_helper_update_plane __setplane_atomic drm_mode_cursor_universal drm_mode_cursor_common drm_mode_cursor_ioctl drm_ioctl __x64_sys_ioctl
Beyond the WARN, mutating the dma_resv fence list without the lock races with concurrent readers/writers and can corrupt the list.
Well why are you trying to add a fence on an atomic mode set in the first place?
That is usually an illegal operation here.
Regards, Christian.
Both call sites run inside the .atomic_update plane callback, which DRM atomic helpers do not allow to fail (by the time it runs, the commit has been signed off to userspace and there is no clean rollback path). Moving the lock acquisition to .prepare_fb was rejected because the broader lock scope deadlocks against other BO locking paths in the same atomic commit.
Introduce virtio_gpu_lock_one_resv_uninterruptible() that uses dma_resv_lock() instead of dma_resv_lock_interruptible(). This eliminates the -EINTR failure mode -- the realistic syzbot trigger -- without extending the lock hold across the commit. The helper locks a single BO and rejects nents > 1 with -EINVAL; both fix sites lock exactly one BO.
Use it from virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush(); check the return value to handle the remaining -ENOMEM case from dma_resv_reserve_fences() by freeing the objs and skipping the plane update for that frame. The framebuffer BOs touched here are not shared with other contexts and lock contention is expected to be brief, so the loss of signal-interruptibility is acceptable.
Other callers of virtio_gpu_array_lock_resv() (the ioctl paths) continue to use the interruptible variant.
The bug was reported by syzbot, triggered via fault injection (fail_nth) on the DRM_IOCTL_MODE_CURSOR path, which forces the -ENOMEM branch in dma_resv_reserve_fences().
Reported-by: syzbot+72bd3dd3a5d5f39a0271@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=72bd3dd3a5d5f39a0271 Fixes: 5cfd31c5b3a3 ("drm/virtio: fix virtio_gpu_cursor_plane_update().") Cc: stable@vger.kernel.org Signed-off-by: Deepanshu Kartikey kartikey406@gmail.com
v4: Rename the helper to virtio_gpu_lock_one_resv_uninterruptible() and reject objs->nents > 1 with -EINVAL. The v3 helper's multi-object branch used drm_gem_lock_reservations(), which is interruptible, contradicting the "uninterruptible" name; both fix sites lock a single BO so the multi-object path is dropped. (Dmitry Osipenko) v3: Drop the prepare_fb/cleanup_fb approach from v2 (it deadlocked against virtio_gpu_resource_flush(), which also locks the BO in the same atomic commit). Instead add an uninterruptible variant of the resv lock helper and use it in both virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush(). (Dmitry Osipenko) v2: Move resv lock acquisition from .atomic_update (which must not fail) to .prepare_fb (which may), per maintainer review of v1. The v1 approach of silently skipping the cursor update on lock failure violated the atomic-commit contract with userspace.
drivers/gpu/drm/virtio/virtgpu_drv.h | 1 + drivers/gpu/drm/virtio/virtgpu_gem.c | 17 +++++++++++++++++ drivers/gpu/drm/virtio/virtgpu_plane.c | 10 ++++++++-- 3 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h index f17660a71a3e..2f3531950aa4 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.h +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h @@ -317,6 +317,7 @@ virtio_gpu_array_from_handles(struct drm_file *drm_file, u32 *handles, u32 nents void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs, struct drm_gem_object *obj); int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs); +int virtio_gpu_lock_one_resv_uninterruptible(struct virtio_gpu_object_array *objs); void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs); void virtio_gpu_array_add_fence(struct virtio_gpu_object_array *objs, struct dma_fence *fence); diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c index f22dc5c21cd4..435d37d36034 100644 --- a/drivers/gpu/drm/virtio/virtgpu_gem.c +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c @@ -238,6 +238,23 @@ int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs) return ret; } +int virtio_gpu_lock_one_resv_uninterruptible(struct virtio_gpu_object_array *objs) +{
- int ret;
- if (objs->nents != 1)
return -EINVAL;- dma_resv_lock(objs->objs[0]->resv, NULL);
- ret = dma_resv_reserve_fences(objs->objs[0]->resv, 1);
- if (ret) {
virtio_gpu_array_unlock_resv(objs);return ret;- }
- return 0;
+}
void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs) { if (objs->nents == 1) { diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c index a126d1b25f46..652352424744 100644 --- a/drivers/gpu/drm/virtio/virtgpu_plane.c +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c @@ -215,7 +215,10 @@ static void virtio_gpu_resource_flush(struct drm_plane *plane, if (!objs) return; virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
virtio_gpu_array_lock_resv(objs);
if (virtio_gpu_lock_one_resv_uninterruptible(objs)) {virtio_gpu_array_put_free(objs);return; virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle, x, y, width, height, objs, vgplane_st->fence);}@@ -459,7 +462,10 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane, if (!objs) return; virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
virtio_gpu_array_lock_resv(objs);
if (virtio_gpu_lock_one_resv_uninterruptible(objs)) {virtio_gpu_array_put_free(objs);return; virtio_gpu_cmd_transfer_to_host_2d (vgdev, 0, plane->state->crtc_w,}
On 5/19/26 11:27, Christian König wrote:
On 5/19/26 10:22, Deepanshu Kartikey wrote:
virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush() lock the framebuffer BO's dma_resv via virtio_gpu_array_lock_resv() and ignore its return value. The function can fail with -EINTR from dma_resv_lock_interruptible() (signal during lock wait) or with -ENOMEM from dma_resv_reserve_fences() (fence slot allocation), leaving the resv lock not held. The queue path then walks the object array and calls dma_resv_add_fence(), which requires the lock held; with lockdep enabled this trips dma_resv_assert_held():
WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840 Call Trace: virtio_gpu_array_add_fence virtio_gpu_queue_ctrl_sgs virtio_gpu_queue_fenced_ctrl_buffer virtio_gpu_cursor_plane_update drm_atomic_helper_commit_planes drm_atomic_helper_commit_tail commit_tail drm_atomic_helper_commit drm_atomic_commit drm_atomic_helper_update_plane __setplane_atomic drm_mode_cursor_universal drm_mode_cursor_common drm_mode_cursor_ioctl drm_ioctl __x64_sys_ioctl
Beyond the WARN, mutating the dma_resv fence list without the lock races with concurrent readers/writers and can corrupt the list.
Well why are you trying to add a fence on an atomic mode set in the first place?
That is usually an illegal operation here.
That is pre-existing in the driver. It performs draw operation and in some cases waits for the completion during atomic. Whether all that syncing is correct is hard to say immediately as some of it may be historical edge cases.
On 5/20/26 08:50, Dmitry Osipenko wrote:
On 5/19/26 11:27, Christian König wrote:
On 5/19/26 10:22, Deepanshu Kartikey wrote:
virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush() lock the framebuffer BO's dma_resv via virtio_gpu_array_lock_resv() and ignore its return value. The function can fail with -EINTR from dma_resv_lock_interruptible() (signal during lock wait) or with -ENOMEM from dma_resv_reserve_fences() (fence slot allocation), leaving the resv lock not held. The queue path then walks the object array and calls dma_resv_add_fence(), which requires the lock held; with lockdep enabled this trips dma_resv_assert_held():
WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840 Call Trace: virtio_gpu_array_add_fence virtio_gpu_queue_ctrl_sgs virtio_gpu_queue_fenced_ctrl_buffer virtio_gpu_cursor_plane_update drm_atomic_helper_commit_planes drm_atomic_helper_commit_tail commit_tail drm_atomic_helper_commit drm_atomic_commit drm_atomic_helper_update_plane __setplane_atomic drm_mode_cursor_universal drm_mode_cursor_common drm_mode_cursor_ioctl drm_ioctl __x64_sys_ioctl
Beyond the WARN, mutating the dma_resv fence list without the lock races with concurrent readers/writers and can corrupt the list.
Well why are you trying to add a fence on an atomic mode set in the first place?
That is usually an illegal operation here.
That is pre-existing in the driver. It performs draw operation and in some cases waits for the completion during atomic. Whether all that syncing is correct is hard to say immediately as some of it may be historical edge cases.
I'm not not so deeply in the atomic mode setting stuff but it strongly sounds like that this is seriously broken.
The background is that the atomic mode set framework allows an output dma_fence which is signaled when the commit is finished.
So when you allocate a fence slot and add a new fence to finish the atomic commit it is trivially possible that this cycles back and waits for the atomic commit to finish. In other words you have a deadlock.
You probably need specially crafted userspace with the right timing to trigger that, but such issues are usually a rather big no-no and need to be fixed in the long term.
Try to add dma_fence_begin_signaling() and dma_fence_end_signaling() annotation and enable lockdep, the tool should be able to point out if and what exactly goes wrong.
The usual fix is to prepare everything before commit_tail is called (alloc memory, create, reserve slot, add dma_fence etc....) and then just send out the prepared commands later on.
Regards, Christian.
On 5/20/26 10:05, Christian König wrote:
On 5/20/26 08:50, Dmitry Osipenko wrote:
On 5/19/26 11:27, Christian König wrote:
On 5/19/26 10:22, Deepanshu Kartikey wrote:
virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush() lock the framebuffer BO's dma_resv via virtio_gpu_array_lock_resv() and ignore its return value. The function can fail with -EINTR from dma_resv_lock_interruptible() (signal during lock wait) or with -ENOMEM from dma_resv_reserve_fences() (fence slot allocation), leaving the resv lock not held. The queue path then walks the object array and calls dma_resv_add_fence(), which requires the lock held; with lockdep enabled this trips dma_resv_assert_held():
WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840 Call Trace: virtio_gpu_array_add_fence virtio_gpu_queue_ctrl_sgs virtio_gpu_queue_fenced_ctrl_buffer virtio_gpu_cursor_plane_update drm_atomic_helper_commit_planes drm_atomic_helper_commit_tail commit_tail drm_atomic_helper_commit drm_atomic_commit drm_atomic_helper_update_plane __setplane_atomic drm_mode_cursor_universal drm_mode_cursor_common drm_mode_cursor_ioctl drm_ioctl __x64_sys_ioctl
Beyond the WARN, mutating the dma_resv fence list without the lock races with concurrent readers/writers and can corrupt the list.
Well why are you trying to add a fence on an atomic mode set in the first place?
That is usually an illegal operation here.
That is pre-existing in the driver. It performs draw operation and in some cases waits for the completion during atomic. Whether all that syncing is correct is hard to say immediately as some of it may be historical edge cases.
I'm not not so deeply in the atomic mode setting stuff but it strongly sounds like that this is seriously broken.
The background is that the atomic mode set framework allows an output dma_fence which is signaled when the commit is finished.
So when you allocate a fence slot and add a new fence to finish the atomic commit it is trivially possible that this cycles back and waits for the atomic commit to finish. In other words you have a deadlock.
You probably need specially crafted userspace with the right timing to trigger that, but such issues are usually a rather big no-no and need to be fixed in the long term.
Try to add dma_fence_begin_signaling() and dma_fence_end_signaling() annotation and enable lockdep, the tool should be able to point out if and what exactly goes wrong.
The usual fix is to prepare everything before commit_tail is called (alloc memory, create, reserve slot, add dma_fence etc....) and then just send out the prepared commands later on.
We tried with moving resv alloc to prepare_fb() in a previous patch version, it resulted in a non-trivial deadlocks. The goal of this patch is to fix immediate problem with a minimal code change.
What you're saying is correct, but it may require a rather big refactoring of the code. In general, everything works okay today, so not really an urgent problem.
linaro-mm-sig@lists.linaro.org