On Sun, Sep 11, 2011 at 10:32:20AM -0500, Clark, Rob wrote:
On Sat, Sep 10, 2011 at 6:45 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
On Fri, Sep 09, 2011 at 06:36:23PM -0500, Clark, Rob wrote:
With this sort of approach, if a new device is attached after the first get_scatterlist(), the buffer can, if needed, be migrated using the union of all the attached devices' requirements, at a point in time when no DMA is active to/from the buffer. But if all the devices are known up front, then you never need to migrate unnecessarily.
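To sketch the idea (all names here are hypothetical, just to illustrate accumulating per-device requirements at attach time and migrating against their union):

struct dmabuf_constraints {
	unsigned long allowed_placements;	/* bitmask of reachable memory regions */
	unsigned int  max_segments;		/* scatterlist length limit */
};

/* called on each attach; the buffer-wide constraints end up as the
 * "union of all the devices' requirements", i.e. the intersection of
 * the placements every attached device can reach */
static void dmabuf_merge_constraints(struct dmabuf_constraints *buf,
				     const struct dmabuf_constraints *dev)
{
	buf->allowed_placements &= dev->allowed_placements;
	buf->max_segments = min(buf->max_segments, dev->max_segments);
}

If a new attach shrinks allowed_placements so the current placement drops out, the exporter migrates the pages the next time no mapping is outstanding.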
Well, the problem is with devices that hang onto mappings for way too long, so just waiting for all DMA to finish to be able to fix up the buffer placement is a no-go. But I think we can postpone that issue a bit, especially since the drivers that tend to do this (gpus) can also evict objects willy-nilly, so that should be fixable with some explicit kill_your_mappings callback attached to dma_buf_attachment (or full-blown sync objects à la ttm).
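Something like this is all I have in mind for now (purely hypothetical, none of it exists yet):

struct dma_buf_attachment;

/* importer-supplied ops the exporter can use to ask for the buffer
 * back: the importer tears down its cached mapping (and, for a gpu,
 * unbinds the buffer from its address space) as soon as its
 * outstanding dma has completed */
struct dma_buf_attach_ops {
	void (*kill_your_mappings)(struct dma_buf_attachment *attach);
};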
I'm ok if the weird fallback cases aren't fast... I just don't want things to explode catastrophically in those cases.
I guess in the GPU / deep-pipeline case, you can at least set things up to get an interrupt back when the GPU is done with some surface (i.e. when it gets to a certain point in the command stream)? I think it is ok if things stall in this case until the GPU pipeline is drained (and if you are targeting 60fps, that is probably still faster than video, which is likely at 30fps). Again, this is just for the cases where userspace doesn't do what we want, to avoid complete failure...
If the GPU is the one importing the dmabuf, its driver just calls put_scatterlist() once it gets the interrupt back from the GPU. If the GPU is the one exporting the dmabuf, then get_scatterlist() just blocks until the exporting driver gets the interrupt from the GPU. (Well, I guess then you need a get_scatterlist_interruptible()?)
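Roughly like this, I'd imagine (hypothetical names and fields, modeled on the usual wait_event pattern; assumes the exporter keeps a busy flag and a waitqueue that its interrupt handler kicks):

static struct scatterlist *
get_scatterlist_interruptible(struct dma_buf_attachment *attach)
{
	struct dma_buf *buf = attach->dmabuf;
	int ret;

	/* sleep until the exporting gpu signals completion, but let
	 * signals abort the wait instead of hanging the caller */
	ret = wait_event_interruptible(buf->wq, !buf->busy);
	if (ret)
		return ERR_PTR(ret);	/* -ERESTARTSYS */

	return buf->sglist;
}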
The problem with gpus is that they eat through data so _fast_ that not caching mappings kills performance. Now for simpler gpus we could shovel the mapping code into the dma/dma_buf subsystem and cache things there.
But desktop gpus already have (or will get) support for per-process gpu address spaces, and I don't think it makes sense to put that complexity into generic layers (nor is it imo feasible across different gpus; per-process stuff tends to integrate tightly with command submission). So I think we need some explicit unmap_ASAP callback support, but definitely not for v1 of dma_buf. But with attach separated from get_scatterlist and an explicit struct dma_buf_attachment around, such an extension should be pretty straightforward to implement.

-Daniel
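As a rough sketch of that split (only get_scatterlist()/put_scatterlist() come from the discussion above; the attach/detach names are hypothetical):

	/* once, up front, so the exporter knows about every device */
	attach = dma_buf_attach(dmabuf, dev);

	/* around each actual dma operation */
	sgl = get_scatterlist(attach);
	/* ... dma to/from the buffer ... */
	put_scatterlist(attach, sgl);

	/* when the device is done with the buffer for good */
	dma_buf_detach(dmabuf, attach);

A later unmap_ASAP callback would then just force the importer back to the unmapped state between put_scatterlist() and the next get_scatterlist().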