[Linaro-mm-sig] [RFC] dma-shared-buf: Add buffer sharing framework
Clark, Rob
rob at ti.com
Mon Sep 12 14:51:37 UTC 2011
On Mon, Sep 12, 2011 at 9:07 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Sun, Sep 11, 2011 at 10:32:20AM -0500, Clark, Rob wrote:
>> On Sat, Sep 10, 2011 at 6:45 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
>> > On Fri, Sep 09, 2011 at 06:36:23PM -0500, Clark, Rob wrote:
>> >> with this sort of approach, if a new device is attached after the
>> >> first get_scatterlist the buffer can be, if needed, migrated using the
>> >> union of all the devices requirements at a point in time when no DMA
>> >> is active to/from the buffer. But if all the devices are known up
>> >> front, then you never need to migrate unnecessarily.
>> >
>> > Well, the problem is with devices that hang onto mappings for way too long
>> > so just waiting for all dma to finish to be able to fix up the buffer
>> > placement is a no-go. But I think we can postpone that issue a bit,
>> > especially since the drivers that tend to do this (gpus) can also evict
>> > objects willy-nilly, so that should be fixable with some explicit
>> > kill_your_mappings callback attached to drm_buf_attachment (or full-blown
>> > sync objects à la ttm).
>>
>> I'm ok if the weird fallback cases aren't fast.. I just don't want
>> things to explode catastrophically in weird cases.
>>
>> I guess in the GPU / deep pipeline case, you can at least set up to
>> get an interrupt back when the GPU is done with some surface (ie. when
>> it gets to a certain point in the command-stream)? I think it is ok
>> if things stall in this case until the GPU pipeline is drained (and if
>> you are targeting 60fps, that is probably still faster than video,
>> likely at 30fps). Again, this is just for the cases where userspace
>> doesn't do what we want, to avoid just complete failure..
>>
>> If the GPU is the one importing the dmabuf, it just calls
>> put_scatterlist() once it gets some interrupt from the GPU. If the
>> GPU is the one exporting the dmabuf, then get_scatterlist() just
>> blocks until the driver gets the interrupt from the GPU. (Well, I guess
>> then do you need get_scatterlist_interruptable()?)
>
> The problem with gpus is that they eat through data so _fast_ that not
> caching mappings kills performance. Now for simpler gpus we could shovel
> the mapping code into the dma/dma_buf subsystem and cache things there.
>
> But desktop gpus already have (or will get) support for per-process gpu
> address spaces and I don't think it makes sense to put that complexity
> into generic layers (nor is it imo feasible across different gpus -
> per-process stuff tends to highly integrate with command submission). So I
> think we need some explicit unmap_ASAP callback support, but definitely not
> for v1 of dma_buf. But with attach separated from get_scatterlist and an
> explicit struct dma_buf_attachment around, such an extension should be
> pretty straightforward to implement.

hmm, I was thinking we could somehow make the GPU MMU look something
like an IOMMU and do the caching behind that.. but I guess if you
start talking about per-process address spaces and that sort of thing
(GPU/command-stream driven context switches), then maybe that starts
to stretch the limits of generic interfaces..

OTOH, I think the GPU subsystem would normally be the exporter of the
buf.. other subsystems like v4l2 cameras or encoders/decoders that
import the dmabuf would be substantially simpler.

I was thinking about whether or not to add a GEM ioctl to import
dmabufs, but I think the main use-case of that would be to re-import
dmabufs that are already backed by a GEM object.

BR,
-R

> -Daniel
> --
> Daniel Vetter
> Mail: daniel at ffwll.ch
> Mobile: +41 (0)79 365 57 48
>
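
The attach / get_scatterlist split discussed in this thread can be
sketched as a small user-space mock. This is only an illustration under
stated assumptions, not the RFC's code: every name prefixed with mock_,
the struct dev_constraints fields, and the -EBUSY return (where a real
implementation would presumably block or call back into the holder) are
hypothetical. It shows the buffer being placed using the union of all
attached devices' requirements, and being migrated only when a device
attaches late and no DMA is active.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device requirements; not taken from the RFC. */
struct dev_constraints {
    unsigned long long dma_mask; /* highest address the device can reach */
    int needs_contiguous;        /* 1 if the device cannot scatter-gather */
};

#define MAX_ATTACH 8

struct mock_dmabuf;

struct mock_attachment {
    struct mock_dmabuf *buf;
    struct dev_constraints req;
    int mapped; /* 1 between get_scatterlist and put_scatterlist */
};

struct mock_dmabuf {
    struct mock_attachment att[MAX_ATTACH];
    size_t n_att;
    struct dev_constraints placed; /* what the current backing satisfies */
    int has_backing;
    int migrations;
};

/* Attaching only records the device's requirements; nothing is mapped. */
static struct mock_attachment *mock_attach(struct mock_dmabuf *buf,
                                           struct dev_constraints req)
{
    if (buf->n_att == MAX_ATTACH)
        return NULL;
    struct mock_attachment *a = &buf->att[buf->n_att++];
    a->buf = buf;
    a->req = req;
    a->mapped = 0;
    return a;
}

/* Strictest combination of all attached devices' requirements. */
static struct dev_constraints union_of_requirements(const struct mock_dmabuf *buf)
{
    struct dev_constraints u = { ~0ULL, 0 };
    for (size_t i = 0; i < buf->n_att; i++) {
        if (buf->att[i].req.dma_mask < u.dma_mask)
            u.dma_mask = buf->att[i].req.dma_mask;
        if (buf->att[i].req.needs_contiguous)
            u.needs_contiguous = 1;
    }
    return u;
}

static int satisfies(struct dev_constraints placed, struct dev_constraints need)
{
    return placed.dma_mask <= need.dma_mask &&
           (placed.needs_contiguous || !need.needs_contiguous);
}

static int dma_active(const struct mock_dmabuf *buf)
{
    for (size_t i = 0; i < buf->n_att; i++)
        if (buf->att[i].mapped)
            return 1;
    return 0;
}

/* Place the buffer on first use; migrate later only if a new attachment
 * tightened the requirements and nobody currently holds a mapping. */
static int mock_get_scatterlist(struct mock_attachment *a)
{
    struct mock_dmabuf *buf = a->buf;
    struct dev_constraints need = union_of_requirements(buf);

    if (!buf->has_backing) {
        buf->placed = need;
        buf->has_backing = 1;
    } else if (!satisfies(buf->placed, need)) {
        if (dma_active(buf))
            return -EBUSY; /* assumption: a real version would wait here */
        buf->placed = need;
        buf->migrations++;
    }
    a->mapped = 1;
    return 0;
}

static void mock_put_scatterlist(struct mock_attachment *a)
{
    a->mapped = 0;
}

/* Both devices attach before the first map: no migration is needed. */
static int demo_known_up_front(void)
{
    struct mock_dmabuf buf = {0};
    struct mock_attachment *gpu = mock_attach(&buf, (struct dev_constraints){ ~0ULL, 0 });
    struct mock_attachment *cam = mock_attach(&buf, (struct dev_constraints){ 0xffffffffULL, 1 });
    mock_get_scatterlist(gpu);
    mock_put_scatterlist(gpu);
    mock_get_scatterlist(cam);
    mock_put_scatterlist(cam);
    return buf.migrations;
}

/* The camera attaches after the first map. With idle set, the GPU has
 * dropped its mapping and the buffer migrates; otherwise the call fails. */
static int demo_late_attach(int idle, int *migrations)
{
    struct mock_dmabuf buf = {0};
    struct mock_attachment *gpu = mock_attach(&buf, (struct dev_constraints){ ~0ULL, 0 });
    mock_get_scatterlist(gpu);
    if (idle)
        mock_put_scatterlist(gpu);
    struct mock_attachment *cam = mock_attach(&buf, (struct dev_constraints){ 0xffffffffULL, 1 });
    int ret = mock_get_scatterlist(cam);
    *migrations = buf.migrations;
    return ret;
}
```

The busy case is where the long-lived GPU mappings mentioned above
become a problem: something like the proposed unmap_ASAP callback on the
attachment would be needed for the exporter to make progress there.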