On Tue, Oct 21, 2025 at 4:43 PM Matthew Brost matthew.brost@intel.com wrote:
On Sat, Oct 18, 2025 at 12:42:30AM -0700, Matthew Brost wrote:
On Fri, Oct 17, 2025 at 11:43:51PM -0700, Matthew Brost wrote:
On Fri, Oct 17, 2025 at 10:37:46AM -0500, Rob Herring wrote:
On Thu, Oct 16, 2025 at 11:25:34PM -0700, Matthew Brost wrote:
On Thu, Oct 16, 2025 at 04:06:05PM -0500, Rob Herring (Arm) wrote:
Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a relatively simple interface with a single command stream that describes buffers, operation settings, and network operations. It supports up to 8 memory regions (though there are no h/w bounds on a region). The Ethos NPUs are designed to use an SRAM for scratch memory. Region 2 is reserved for SRAM (matching the downstream driver stack and compiler). Userspace doesn't need access to the SRAM.
Thanks for the review.
[...]
+static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
+{
+	struct ethosu_job *job = to_ethosu_job(sched_job);
+	struct ethosu_device *dev = job->dev;
+	struct dma_fence *fence = NULL;
+	int ret;
+
+	if (unlikely(job->base.s_fence->finished.error))
+		return NULL;
+
+	fence = ethosu_fence_create(dev);

Another reclaim issue: ethosu_fence_create() allocates memory using GFP_KERNEL. Since we're already in the DMA fence signaling path (reclaim), this can lead to a deadlock.
Without too much thought, you likely want to move this allocation to ethosu_job_do_push, but before taking dev->sched_lock or calling drm_sched_job_arm.
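Something along these lines is what I mean (untested sketch; I'm guessing at the exact shape of ethosu_job_do_push() and assuming dev->sched_lock is a mutex):

	static int ethosu_job_do_push(struct ethosu_job *job)
	{
		struct dma_fence *fence;

		/*
		 * Allocate the fence here, while we can still fail cleanly
		 * and are not yet in the DMA fence signaling (reclaim) path.
		 */
		fence = ethosu_fence_create(job->dev);
		if (IS_ERR(fence))
			return PTR_ERR(fence);
		job->done_fence = fence;

		mutex_lock(&job->dev->sched_lock);
		drm_sched_job_arm(&job->base);
		/* ... add dependencies, attach fences to resvs, etc. ... */
		drm_sched_entity_push_job(&job->base);
		mutex_unlock(&job->dev->sched_lock);

		return 0;
	}

ethosu_job_run() then just returns a reference to the pre-allocated job->done_fence instead of allocating anything.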
We really should fix the DRM scheduler so its work queue is tainted with reclaim. If I recall correctly, we'd need to update the work queue layer. Let me look into that; I've seen this type of bug several times, and lockdep should be able to catch it.
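Roughly what I have in mind on the scheduler side (hand-wavy sketch; the surrounding drm_sched internals are from memory): wrap the run_job callback in the existing dma-fence signalling annotations so lockdep flags allocations that can recurse into reclaim underneath it.

	bool cookie;

	/*
	 * Mark the run_job callback as a fence signalling critical section;
	 * with lockdep enabled, a GFP_KERNEL allocation (or a lock that
	 * allocates) under here will produce a splat.
	 */
	cookie = dma_fence_begin_signalling();
	fence = sched->ops->run_job(sched_job);
	dma_fence_end_signalling(cookie);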
Likely the rocket driver suffers from the same issues...
I am not surprised by this statement.
+	if (IS_ERR(fence))
+		return fence;
+
+	if (job->done_fence)
+		dma_fence_put(job->done_fence);
+	job->done_fence = dma_fence_get(fence);
+
+	ret = pm_runtime_get_sync(dev->base.dev);

I haven't looked at your PM design, but this generally looks quite dangerous with respect to reclaim. For example, if your PM resume paths allocate memory or take locks that allocate memory underneath, you're likely to run into issues.
A better approach would be to attach a PM reference to your job upon creation and release it upon job destruction. That would be safer and save you headaches in the long run.
Our PM is nothing more than clock enable/disable and register init.
If the runtime PM API doesn't work and needs special driver wrappers, then I'm inclined to just not use it and manage clocks directly (as that's all it is doing).
Yes, then you’re probably fine. More complex drivers can do all sorts of things during a PM wake, which is why PM wakes should generally be the outermost layer. I still suggest, to future-proof your code, that you move the PM reference to an outer layer.
Also, taking a PM reference in a function call, as opposed to tying it to an object's lifetime, is risky. It can quickly lead to imbalances in PM references if things go sideways or function calls become unbalanced. Depending on how your driver uses the DRM scheduler, this seems like a real possibility.
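Concretely, something like this is what I mean by tying the PM reference to the job's lifetime (untested; ethosu_job_create()/ethosu_job_free() are my guesses at the driver's structure):

	static struct ethosu_job *ethosu_job_create(struct ethosu_device *dev)
	{
		struct ethosu_job *job;
		int ret;

		job = kzalloc(sizeof(*job), GFP_KERNEL);
		if (!job)
			return ERR_PTR(-ENOMEM);

		/* Hold a PM reference for as long as the job object exists. */
		ret = pm_runtime_resume_and_get(dev->base.dev);
		if (ret) {
			kfree(job);
			return ERR_PTR(ret);
		}

		job->dev = dev;
		return job;
	}

	static void ethosu_job_free(struct ethosu_job *job)
	{
		/* Dropped exactly once, when the job goes away. */
		pm_runtime_put(job->dev->base.dev);
		kfree(job);
	}

That way the get/put are balanced by construction and nothing PM-related happens in the run path.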
Matt
This is what we do in Xe [1] [2].
Also, in general, this driver has been reviewed (RB’d), but it's not great that I spotted numerous issues within just five minutes. I suggest taking a step back and thoroughly evaluating everything this driver is doing.
Well, if it is hard to get simple drivers right, then it's a problem with the subsystem APIs IMO.
Yes, agreed. We should have assertions and lockdep annotations in place to catch driver-side misuses. This is the second driver I've randomly looked at over the past year that breaks the DMA fencing and reclaim rules. I'll take an action item to fix this in the DRM scheduler, but I'm afraid I'll likely break multiple drivers in the process, as the misuse assertions and lockdep will complain.
I've posted a series [1] for the DRM scheduler which will complain about the things I've pointed out here.
Thanks. I ran v6 with that series applied and saw no lockdep splats.
Rob