On Thu, Oct 16, 2025 at 11:25:34PM -0700, Matthew Brost wrote:
On Thu, Oct 16, 2025 at 04:06:05PM -0500, Rob Herring (Arm) wrote:
Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a relatively simple interface with a single command stream to describe buffers, operation settings, and network operations. It supports up to 8 memory regions (though there are no h/w bounds on a region). The Ethos NPUs are designed to use an SRAM for scratch memory. Region 2 is reserved for SRAM (matching the downstream driver stack and compiler). Userspace doesn't need access to the SRAM.
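As a rough illustration of that layout (these constant names are mine, not from the patch):

	/* The command stream can address up to 8 base regions. */
	#define ETHOSU_NUM_REGIONS	8
	/* Region 2 is reserved for scratch SRAM; never mapped to userspace. */
	#define ETHOSU_SRAM_REGION	2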
Thanks for the review.
[...]
+static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
+{
+	struct ethosu_job *job = to_ethosu_job(sched_job);
+	struct ethosu_device *dev = job->dev;
+	struct dma_fence *fence = NULL;
+	int ret;
+
+	if (unlikely(job->base.s_fence->finished.error))
+		return NULL;
+
+	fence = ethosu_fence_create(dev);
Another reclaim issue: ethosu_fence_create allocates memory using GFP_KERNEL. Since we're already in the DMA fence signaling path (reclaim), this can lead to a deadlock.
Without too much thought, you likely want to move this allocation to ethosu_job_do_push, but before taking dev->sched_lock or calling drm_sched_job_arm.
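Roughly along these lines, as an untested sketch (I'm guessing at the surrounding code; whether sched_lock is a mutex and the exact push sequence are assumptions on my part):

	static int ethosu_job_do_push(struct ethosu_job *job)
	{
		struct ethosu_device *dev = job->dev;
		struct dma_fence *fence;

		/* Allocate in process context, before drm_sched_job_arm()
		 * puts us on the fence signalling path and before any lock
		 * that may be taken under reclaim. */
		fence = ethosu_fence_create(dev);
		if (IS_ERR(fence))
			return PTR_ERR(fence);
		job->done_fence = fence;

		mutex_lock(&dev->sched_lock);
		drm_sched_job_arm(&job->base);
		/* ... push job to the scheduler entity ... */
		mutex_unlock(&dev->sched_lock);

		return 0;
	}

ethosu_job_run() then just returns dma_fence_get(job->done_fence) instead of allocating.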
We really should fix the DRM scheduler work queue to be tainted with reclaim; if I recall correctly, we'd need to update the work queue layer. Let me look into that. I've seen this type of bug several times, and lockdep should be able to catch it.
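In the meantime the path can be annotated by hand; a minimal sketch using the existing helpers from <linux/dma-fence.h> (the function name here is illustrative, the real work item lives in the scheduler):

	static void run_job_work(struct work_struct *w)
	{
		/* lockdep treats everything between these two calls as a
		 * dma-fence signalling critical section, i.e. as if it ran
		 * under memory reclaim, so a GFP_KERNEL allocation in here
		 * becomes an immediate splat instead of a rare deadlock. */
		bool cookie = dma_fence_begin_signalling();

		/* ... run_job() callback, fence signalling ... */

		dma_fence_end_signalling(cookie);
	}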
Likely the rocket driver suffers from the same issues...
+	if (IS_ERR(fence))
+		return fence;
+
+	if (job->done_fence)
+		dma_fence_put(job->done_fence);
+	job->done_fence = dma_fence_get(fence);
+
+	ret = pm_runtime_get_sync(dev->base.dev);
I haven't looked at your PM design, but this generally looks quite dangerous with respect to reclaim. For example, if your PM resume paths allocate memory or take locks that allocate memory underneath, you're likely to run into issues.
A better approach would be to attach a PM reference to your job upon creation and release it upon job destruction. That would be safer and save you headaches in the long run.
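A sketch of what I mean (ethosu_job_create()/ethosu_job_cleanup() are made-up names; the point is only where the get/put live):

	static struct ethosu_job *ethosu_job_create(struct ethosu_device *dev)
	{
		struct ethosu_job *job;
		int ret;

		job = kzalloc(sizeof(*job), GFP_KERNEL);
		if (!job)
			return ERR_PTR(-ENOMEM);

		/* Hold a PM reference for the whole lifetime of the job,
		 * taken in process context where resume may allocate or
		 * take locks. */
		ret = pm_runtime_resume_and_get(dev->base.dev);
		if (ret) {
			kfree(job);
			return ERR_PTR(ret);
		}

		job->dev = dev;
		return job;
	}

	static void ethosu_job_cleanup(struct ethosu_job *job)
	{
		/* Dropped only here, never from the run_job() path. */
		pm_runtime_put(job->dev->base.dev);
		kfree(job);
	}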
Our PM is nothing more than clock enable/disable and register init.
If the runtime PM API doesn't work and needs special driver wrappers, then I'm inclined to just not use it and manage clocks directly (as that's all it is doing).
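For the record, everything our resume/suspend hooks do amounts to roughly this (a sketch; dev->clk and the init helper are stand-ins for the actual fields):

	static int ethosu_rpm_resume(struct device *d)
	{
		struct ethosu_device *dev = dev_get_drvdata(d);
		int ret;

		ret = clk_prepare_enable(dev->clk);
		if (ret)
			return ret;

		ethosu_hw_init(dev);	/* register init */
		return 0;
	}

	static int ethosu_rpm_suspend(struct device *d)
	{
		struct ethosu_device *dev = dev_get_drvdata(d);

		clk_disable_unprepare(dev->clk);
		return 0;
	}

so managing the clock directly instead would be a handful of lines.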
This is what we do in Xe [1] [2].
Also, in general, this driver has been reviewed (RB’d), but it's not great that I spotted numerous issues within just five minutes. I suggest taking a step back and thoroughly evaluating everything this driver is doing.
Well, if it is hard to get simple drivers right, then it's a problem with the subsystem APIs IMO.
Rob