On Mon, 2025-10-13 at 20:07 +0100, Tvrtko Ursulin wrote:
When adding dependencies with drm_sched_job_add_dependency(), that
I'm sorry if there was confusion about upper case style. I'm quite sure I'd think I'd never ask for a function name to be written uppercase, but maybe that was a -ENOCOFFEE version of me last year or sth :O
function consumes the fence reference both on success and failure, so in the latter case the dma_fence_put() on the error path (xarray failed to expand) is a double free.
Interestingly this bug appears to have been present ever since ebd5f74255b9 ("drm/sched: Add dependency tracking"), since the code back then looked like this:
drm_sched_job_add_implicit_dependencies(): ... for (i = 0; i < fence_count; i++) { ret = drm_sched_job_add_dependency(job, fences[i]); if (ret) break; }
for (; i < fence_count; i++) dma_fence_put(fences[i]);
Which means for the failing 'i' the dma_fence_put was already a double free. Possibly there were no users at that time, or the test cases were insufficient to hit it.
The bug was then only noticed and fixed after 9c2ba265352a ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2") landed, with its fixup of 4eaf02d6076c ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies").
At that point it was a slightly different flavour of a double free, which 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder") noticed and attempted to fix.
But it only moved the double free from happening inside the drm_sched_job_add_dependency(), when releasing the reference not yet obtained, to the caller, when releasing the reference already released by the former in the failure case.
As such it is not easy to identify the right target for the fixes tag so lets keep it simple and just continue the chain.
While fixing we also improve the comment and explain the reason for taking the reference and not dropping it.
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder") Reported-by: Dan Carpenter dan.carpenter@linaro.org Reference: https://lore.kernel.org/dri-devel/aNFbXq8OeYl3QSdm@stanley.mountain/
Any objection with me changing that to Closes: when applying?
As I read it that's a real bug report by Dan and this patches closes that report.
P.
Cc: Christian König christian.koenig@amd.com Cc: Rob Clark robdclark@chromium.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Matthew Brost matthew.brost@intel.com Cc: Danilo Krummrich dakr@kernel.org Cc: Philipp Stanner phasta@kernel.org Cc: "Christian König" ckoenig.leichtzumerken@gmail.com Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.16+
v2: * Re-arrange commit text so discussion around sentences starting with capital letters in all cases can be avoided. * Keep double return for now. * Improved comment instead of dropping it.
drivers/gpu/drm/scheduler/sched_main.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 46119aacb809..c39f0245e3a9 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -965,13 +965,14 @@ int drm_sched_job_add_resv_dependencies(struct drm_sched_job *job, dma_resv_assert_held(resv); dma_resv_for_each_fence(&cursor, resv, usage, fence) {
/* Make sure to grab an additional ref on the added fence */
dma_fence_get(fence);
ret = drm_sched_job_add_dependency(job, fence);
if (ret) {
dma_fence_put(fence);
/*
* As drm_sched_job_add_dependency always consumes the fence
* reference (even when it fails), and dma_resv_for_each_fence
* is not obtaining one, we need to grab one before calling.
*/
ret = drm_sched_job_add_dependency(job, dma_fence_get(fence));
if (ret)
return ret;
}
} return 0; }