In rocket_job_run(), after taking an extra fence reference for job->done_fence via dma_fence_get(), the error paths have three bugs:
- The dma_fence reference held by job->done_fence is never released, causing a reference leak.
- pm_runtime_get_sync() increments the usage counter even on failure, but the error path does not decrement it, leaking the runtime PM reference and preventing the NPU from suspending.
- A valid but unsignaled fence is returned to the DRM scheduler, which triggers WARN("Fence ... released with pending signals!") when the scheduler drops its reference.

Fix by replacing pm_runtime_get_sync() with pm_runtime_resume_and_get(), which balances the usage counter itself on failure, by releasing both fence references on error, and by returning ERR_PTR(ret) instead of the unsignaled fence.

Cc: stable@vger.kernel.org
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
---
 drivers/accel/rocket/rocket_job.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
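
The usage-counter difference described in the commit message can be sketched in plain userspace C. This is only a model, not kernel code: struct model_dev, model_get_sync(), model_resume_and_get() and model_demo() are hypothetical names invented for illustration, and -EIO stands in for whatever error a real resume might return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a device's runtime PM state. */
struct model_dev {
	int usage_count;   /* models the runtime PM usage counter */
	bool resume_fails; /* forces the simulated resume to fail */
};

/* Models pm_runtime_get_sync(): the counter is bumped even on failure. */
static int model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

/* Models pm_runtime_resume_and_get(): drops the count again on failure. */
static int model_resume_and_get(struct model_dev *dev)
{
	int ret = model_get_sync(dev);

	if (ret < 0) {
		dev->usage_count--;
		return ret;
	}
	return 0;
}

/* Runs both helpers against a failing resume and checks the counters. */
static int model_demo(void)
{
	struct model_dev leaky = { .usage_count = 0, .resume_fails = true };
	struct model_dev balanced = { .usage_count = 0, .resume_fails = true };

	assert(model_get_sync(&leaky) == -EIO);
	assert(leaky.usage_count == 1);    /* caller would still owe a put */

	assert(model_resume_and_get(&balanced) == -EIO);
	assert(balanced.usage_count == 0); /* nothing left to undo */
	return 0;
}
```

In the model, a caller of model_get_sync() still has to drop the count on failure, which is the step the old error path skipped. With model_resume_and_get() the error path has nothing to undo for runtime PM and only needs to release the fence references.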

diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index ac51bff39833..e8a073e22ac2 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
 	dma_fence_put(job->done_fence);
 	job->done_fence = dma_fence_get(fence);
 
-	ret = pm_runtime_get_sync(core->dev);
-	if (ret < 0)
-		return fence;
+	ret = pm_runtime_resume_and_get(core->dev);
+	if (ret < 0) {
+		dma_fence_put(job->done_fence);
+		job->done_fence = NULL;
+		dma_fence_put(fence);
+		return ERR_PTR(ret);
+	}
 
 	ret = iommu_attach_group(job->domain->domain, core->iommu_group);
-	if (ret < 0)
-		return fence;
+	if (ret < 0) {
+		pm_runtime_put(core->dev);
+		dma_fence_put(job->done_fence);
+		job->done_fence = NULL;
+		dma_fence_put(fence);
+		return ERR_PTR(ret);
+	}
 
 	scoped_guard(mutex, &core->job_lock) {
 		core->in_flight_job = job;