On Tue, May 07, 2024 at 01:48:38PM -0300, Jason Gunthorpe wrote:
> On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote:
> > - Align with devmem TCP to use udmabuf for your io_uring memory. I
> > think in the past you said it's a uapi you don't like, but in the
> > face of this pushback you may want to reconsider.
>
> dmabuf does not force a uapi, you can acquire your pages however you
> want and wrap them up in a dmabuf. No uapi at all.
>
> The point is that dmabuf already provides ops that do basically what
> is needed here. We don't need ops calling ops just because dmabuf's
> ops are not understood or not perfect. Fix up dmabuf.
>
> If io_uring wants to take its existing memory pre-registration it can
> wrap that in a dmabuf, and somehow pass it to the netstack. Userspace
> doesn't need to know a dmabuf is being used in the background.
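For illustration, an in-kernel wrapper could look roughly like the sketch below: no new uapi, just dma_buf_export() over pages something else already pinned. This is not compilable as-is; `struct my_region` and the `my_*` helpers are hypothetical stand-ins for io_uring's registered-buffer bookkeeping, and the op bodies are elided.

```c
/* Hedged sketch, not a complete driver. */
#include <linux/dma-buf.h>

static struct sg_table *my_map(struct dma_buf_attachment *attach,
			       enum dma_data_direction dir)
{
	struct my_region *r = attach->dmabuf->priv;

	/* build + dma_map an sg_table over r's pre-registered pages */
	...
}

static void my_unmap(struct dma_buf_attachment *attach,
		     struct sg_table *sgt, enum dma_data_direction dir)
{
	...
}

static void my_release(struct dma_buf *dmabuf)
{
	/* drop the reference the export took on the region */
	...
}

static const struct dma_buf_ops my_ops = {
	.map_dma_buf	= my_map,
	.unmap_dma_buf	= my_unmap,
	.release	= my_release,
};

struct dma_buf *my_region_to_dmabuf(struct my_region *r)
{
	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);

	exp_info.ops   = &my_ops;
	exp_info.size  = r->len;
	exp_info.flags = O_RDWR;
	exp_info.priv  = r;
	/* handle can stay in-kernel; userspace never sees an fd */
	return dma_buf_export(&exp_info);
}
```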
So roughly, the current dma-buf design considerations for users of the dma-api interfaces are:
- It's a memory buffer of fixed length.
- Either that memory is permanently nailed into place with dma_buf_pin (and if we add more users than just drm display then we should probably figure out the mlock accounting question for these). For locking hierarchy, dma_buf_pin uses dma_resv_lock, which nests within mmap_sem/vma locks but outside of any reclaim/alloc contexts.
- Or the memory is more dynamic, in which case you need to be able to dma_resv_lock when you need the memory and make a promise (as a dma_fence) that you'll release the memory within finite time and without any further allocations once you've unlocked the dma_buf (because dma_fence is in GFP_NORECLAIM). That promise can be waiting for memory access to finish, but it can also be a pte invalidate+tlb flush, or some kind of preemption, or whatever your hw can do really.
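A rough sketch of that dynamic model on the importer side, with error paths trimmed and `my_start_dma` as a hypothetical driver hook: the importer maps under dma_resv_lock and installs a dma_fence as the finite-time release promise before unlocking.

```c
/* Hedged sketch; not compilable outside a kernel tree. */
#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/dma-fence.h>
#include <linux/err.h>

int my_use_dynamic_buf(struct dma_buf_attachment *attach)
{
	struct sg_table *sgt;
	struct dma_fence *fence;

	dma_resv_lock(attach->dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		dma_resv_unlock(attach->dmabuf->resv);
		return PTR_ERR(sgt);
	}

	/* Kick off hw work. The returned fence must signal in finite
	 * time and must not allocate on its signalling path. */
	fence = my_start_dma(attach, sgt);	/* hypothetical */
	dma_resv_add_fence(attach->dmabuf->resv, fence,
			   DMA_RESV_USAGE_WRITE);
	dma_resv_unlock(attach->dmabuf->resv);
	return 0;
}
```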
Also, if you do this dynamic model and need to atomically reserve more than one dma_buf, you get to do the wait/wound mutex dance, but that's really just a bunch of funny looking error handling code and not really impacting the overall design or locking hierarchy.
Everything else we can adjust, but I think the above three are not really changeable, or dma-buf becomes unusable for gpu drivers.
Note that exporters of dma-buf can pretty much do whatever they feel like, including rejecting all the generic interfaces/ops, because we also use dma-buf as userspace handles for some really special memory.

-Sima