On Wed, Jun 17, 2020 at 9:27 AM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jun 16, 2020 at 02:07:19PM +0200, Daniel Vetter wrote:
I've pinged a bunch of armsoc gpu driver people and ask them how much this hurts, so that we have a clear answer. On x86 I don't think we have much of a choice on this, with userptr in amd and i915 and hmm work in nouveau (but nouveau I think doesn't use dma_fence in there).
Right, nor will RDMA ODP.
Hm, what's the context here? I thought RDMA side you really don't want dma_fence in mmu_notifiers, so not clear to me what you're agreeing on here.
rdma does not use dma_fence at all, and though it is hard to tell, I didn't notice a dma_fence in the nouveau invalidation call path.
Nouveau for compute has hw page faults. It doesn't have hw page faults for non-compute fixed function blocks afaik, so there's a hybrid model going on. But nouveau also doesn't support userspace memory (instead of driver-allocated buffer objects) for these fixed function blocks, so no need to have a dma_fence_wait in there.
At the very least I think there should be some big warning that dma_fence in notifiers should be avoided.
Yeah I'm working on documentation, and also the notifiers here hopefully make it clear it's massive pain. I think we could even make a hard rule that dma_fence in mmu notifier outside of drivers/gpu is a bug/misfeature.
Might be a good idea to add a MAINTAINERS entry with a K: regex pattern, so that you can catch such modifiers. We do already have such a pattern for dma-fence, to catch abuse. So if you want I could type up a documentation patch for this, get your and others acks and the dri-devel folks would enforce that the dma_fence_wait madness doesn't leak beyond drivers/gpu
Ie it is strange that the new totally-not-a-gpu drivers use dma_fence, they surely don't have the same constraints as the existing GPU world, and it would be annoying to see dma_fence notifiers spring up in them
If you mean drivers/misc/habanalabs, that's going to get taken care of:
commit ed65bfd9fd86dec3772570b0320ca85b9fb69f2e Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Mon May 11 11:11:42 2020 +0200
habanalabs: don't set default fence_ops->wait
It's the default.
Also so much for "we're not going to tell the graphics people how to review their code", dma_fence is a pretty core piece of gpu driver infrastructure. And it's very much uapi relevant, including piles of corresponding userspace protocols and libraries for how to pass these around.
Would be great if habanalabs would not use this (from a quick look it's not needed at all), since open source the userspace and playing by the usual rules isn't on the table. If that's not possible (because it's actually using the uapi part of dma_fence to interact with gpu drivers) then we have exactly what everyone promised we'd want to avoid.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Reviewed-by: Oded Gabbay oded.gabbay@gmail.com Signed-off-by: Oded Gabbay oded.gabbay@gmail.com
Oded has agreed to remove the dma-fence usage, since they really don't need it (and all the baggage that comes with it), plain old completion is enough for their use. This use is also why I added the regex to MAINTAINERS, so that in the future we can catch people who try to use dma_fence because it looks cute and useful, and are completely oblivious to all the pain and headaches involved.
Cheers, Daniel