[Linaro-mm-sig] [RFC v2] dma-buf: Add buffer sharing framework

Daniel Vetter daniel at ffwll.ch
Tue Sep 27 14:19:56 UTC 2011

Hi Hans,

I'll try to explain a bit; after all, I've been pushing this attachment
business quite a bit.

On Tue, Sep 27, 2011 at 03:24:24PM +0200, Hans Verkuil wrote:
> OK, it is not clear to me what the purpose is of the attachments.
> If I understand the discussion from this list correctly, then the idea is
> that each device that wants to use this buffer first attaches itself by
> calling dma_buf_attach().
> Then at some point the application asks some driver to export the buffer.
> So the driver calls dma_buf_export() and passes its own dma_buf_ops. In
> other words, this becomes the driver that 'controls' the memory, right?

Actually, the ordering is the other way round. First, some driver calls
dma_buf_export; userspace then passes the fd around to all the other
drivers, which import it and call attach. While all this happens, the driver
that exported the dma_buf does not (yet) allocate any backing storage.
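To make the ordering concrete, here is a minimal userspace model in C. The names sketch_export, sketch_attach and struct sketch_buf are hypothetical and are not the kernel's dma-buf API. The model only shows that export creates the shared object and attach records the participating devices, while the backing storage stays unallocated.

```c
#include <stddef.h>

/* Hypothetical userspace model, not the kernel API: an exported buffer
 * records attached devices but allocates no backing storage yet. */
#define MAX_ATTACH 8

struct sketch_buf {
    size_t size;
    void *backing;                      /* stays NULL until first access */
    const char *devices[MAX_ATTACH];    /* bookkeeping: who takes part */
    int nattached;
};

/* Exporter side: create the shared object without allocating memory. */
static void sketch_export(struct sketch_buf *buf, size_t size)
{
    buf->size = size;
    buf->backing = NULL;
    buf->nattached = 0;
}

/* Importer side: after receiving the fd, each driver attaches itself. */
static int sketch_attach(struct sketch_buf *buf, const char *dev)
{
    if (buf->nattached == MAX_ATTACH)
        return -1;
    buf->devices[buf->nattached++] = dev;
    return 0;
}
```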

> Another driver that receives the fd will call dma_buf_get() and can then
> call e.g. get_scatterlist from dma_buf->ops. (As an aside: I would make
> inline functions that take a dma_buf pointer and call the corresponding
> op, rather than requiring drivers to go through ops directly)

Well, drivers should only call get_scatterlist when they actually need to
access the memory. This way the originating driver can go through the list
of all attached devices and decide where to allocate backing storage on
the first get_scatterlist.
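A hedged sketch of this lazy-allocation idea, again as a hypothetical userspace model: each attachment carries invented constraints (a contiguity flag and a DMA address limit), and the first lazy_get_scatterlist call walks the attachment list and combines them before allocating. malloc() only stands in for whatever allocator a real exporter would pick.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-device constraints, recorded at attach time. */
struct lazy_attachment {
    bool needs_contiguous;          /* e.g. a device without an IOMMU */
    uint64_t dma_mask;              /* highest address the device reaches */
    struct lazy_attachment *next;
};

struct lazy_buf {
    size_t size;
    void *backing;                  /* NULL until the first map */
    bool contiguous;                /* placement decided at first map */
    uint64_t dma_mask;
    struct lazy_attachment *attachments;
};

static void lazy_attach(struct lazy_buf *buf, struct lazy_attachment *a)
{
    a->next = buf->attachments;
    buf->attachments = a;
}

/* The first call decides placement from all attached devices and then
 * allocates; later calls reuse the existing backing storage. */
static void *lazy_get_scatterlist(struct lazy_buf *buf)
{
    if (!buf->backing) {
        buf->contiguous = false;
        buf->dma_mask = UINT64_MAX;
        for (struct lazy_attachment *a = buf->attachments; a; a = a->next) {
            buf->contiguous |= a->needs_contiguous;
            if (a->dma_mask < buf->dma_mask)
                buf->dma_mask = a->dma_mask;
        }
        /* Stand-in for choosing a pool that satisfies the constraints. */
        buf->backing = malloc(buf->size);
    }
    return buf->backing;
}
```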

> But what I miss in this picture is the role of dma_buf_attachment. I'm
> passing it to get_scatterlist, but which attachment is that? That of the
> calling driver? And what is the get_scatterlist implementation supposed
> to do with it?

See above: essentially, an attachment is just list bookkeeping for all the
devices that take part in the buffer sharing.

> I also read some discussion about what is supposed to happen if another
> device is attached after get_scatterlist was already called. Apparently
> the idea was that the old scatterlist is somehow migrated to a new one
> if that should be necessary? Although I got the impression that that
> involved a lot of hand-waving with a pinch of wishful thinking. But I
> may be wrong about that.

Now this is where it gets "interesting". If all drivers guard their
usage with get_scatterlist/put_scatterlist, and we add a new driver, and
all drivers that currently hold onto a mapping are known to release it
with put_scatterlist in finite time, we can do Cool Stuff (tm). First,
that scenario actually happens, e.g. for a video pipe, where we cycle
through buffers.

Now when adding a new device with stricter backing storage constraints, the
originator can just stall in the get_scatterlist call until all
outstanding access has completed (signalled by put_scatterlist), move the
object around, and let things continue merrily. The video pipe might stutter
a bit when e.g. switching on the encoder until all buffers have settled
into the new place, but it should Just Work (tm).
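The stall-and-migrate scheme can be modelled with a mutex and a condition variable. This is a sketch under stated assumptions: mig_get_scatterlist, mig_put_scatterlist and the integer "location" are hypothetical stand-ins, not kernel interfaces. As in the paragraph above, it assumes every user releases its mapping in finite time; it also assumes each user holds at most one mapping at a time, since a get issued while still holding a mapping would wait on itself.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of an exported buffer that may migrate once all
 * outstanding mappings have been released. Not the kernel dma-buf API. */
struct mig_buf {
    pthread_mutex_t lock;
    pthread_cond_t idle;        /* signalled when map_count drops to 0 */
    int map_count;              /* get_scatterlist calls without a put */
    int location;               /* stand-in for where the storage lives */
    int required_location;      /* updated when a stricter device attaches */
};

static void mig_init(struct mig_buf *b, int location)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->idle, NULL);
    b->map_count = 0;
    b->location = location;
    b->required_location = location;
}

/* A device with stricter constraints attaches: only record the need. */
static void mig_attach_strict(struct mig_buf *b, int location)
{
    pthread_mutex_lock(&b->lock);
    b->required_location = location;
    pthread_mutex_unlock(&b->lock);
}

/* Stall until outstanding access is done, move if needed, then map. */
static int mig_get_scatterlist(struct mig_buf *b)
{
    int loc;

    pthread_mutex_lock(&b->lock);
    while (b->location != b->required_location && b->map_count > 0)
        pthread_cond_wait(&b->idle, &b->lock);
    if (b->location != b->required_location)
        b->location = b->required_location;    /* "move the object" */
    b->map_count++;
    loc = b->location;
    pthread_mutex_unlock(&b->lock);
    return loc;
}

/* Release a mapping; wake any caller waiting to migrate the buffer. */
static void mig_put_scatterlist(struct mig_buf *b)
{
    pthread_mutex_lock(&b->lock);
    if (--b->map_count == 0)
        pthread_cond_broadcast(&b->idle);
    pthread_mutex_unlock(&b->lock);
}

static void mig_destroy(struct mig_buf *b)
{
    pthread_cond_destroy(&b->idle);
    pthread_mutex_destroy(&b->lock);
}
```

The waiting loop re-checks the location after every wakeup, so if several callers stall at once, only the first one to see the buffer idle performs the move and the rest simply map the new location.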

> Anyway, I guess my main point is that this patch does not explain the
> role of the attachments and how they should be used (and who uses them).

I agree.

> One other thing: once you call REQBUFS on a V4L device the V4L spec says that
> the memory should be allocated at that time. Because V4L often needs a lot of
> memory that behavior makes sense: you know immediately if you can get the memory
> or not. In addition, that memory is mmap-ed before the DMA is started.

If that is actually a fixed requirement for v4l, that's a good reason for
mmap support on the dma_buf object. We could hide all the complexity of
shooting down userspace mmappings on buffer movements from the drivers.
Can you elaborate a bit on this?

> This behavior may pose a problem if the idea is to wait with actually
> allocating memory until the pipeline is started.

I think you're looking at v4lv3 ;-)

More seriously, all modern Linux APIs for pushing frames out use one of
two modes:
- gimme the next frame to draw into (dri2)
- here's the next frame I've drawn into (wayland)

To make that fast, we obviously need to recycle buffers. But from a
semantic point of view, you only ever have one buffer, namely the current
one. All the other N buffers needed to keep the graphics pipeline from
stuttering are transparently in flight somewhere.

IMO such a dynamic scheme has a few advantages:
- there's just no way to know the number of buffers you need up front on
  any reasonably complex graphics pipeline. As soon as a GPU is in the
  mix, it's best effort. With a dynamic limit on the in-flight buffers you
  can cope with latencies until you hit -ENOMEM. With a fixed set you
  always have to make a compromise and can't really allocate for the
  worst case, as that would hinder stuff running in the background.
- in the usual case you need far fewer buffers to make any given pipeline
  run stutter-free than in the worst case. No point wasting that memory.
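A dynamic pool of the kind described could look roughly like this. It is a hypothetical sketch, not code from any existing driver: pool_acquire models the "gimme the next frame to draw into" mode, buffers returned by the consumer are recycled first, and the limit field is an invented stand-in for memory pressure so that the -ENOMEM path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical dynamic buffer pool: the buffer count is not fixed up
 * front; it grows with demand until allocation fails. */
struct pool_buf {
    struct pool_buf *next;          /* free-list link */
    size_t size;
    unsigned char data[];           /* frame contents */
};

struct pool {
    size_t buf_size;
    int allocated;                  /* buffers created so far */
    int limit;                      /* stand-in for running out of memory */
    struct pool_buf *free_list;     /* buffers handed back by the consumer */
};

/* Recycle a returned buffer if one exists, otherwise allocate a new one.
 * Fails only with -ENOMEM, never because of a preset buffer count. */
static int pool_acquire(struct pool *p, struct pool_buf **out)
{
    struct pool_buf *b = p->free_list;

    if (b) {
        p->free_list = b->next;
    } else {
        if (p->allocated >= p->limit)
            b = NULL;
        else
            b = malloc(sizeof(*b) + p->buf_size);
        if (!b) {
            *out = NULL;
            return -ENOMEM;
        }
        b->size = p->buf_size;
        p->allocated++;
    }
    b->next = NULL;
    *out = b;
    return 0;
}

/* The consumer is done with a frame: put it back for reuse. */
static void pool_release(struct pool *p, struct pool_buf *b)
{
    b->next = p->free_list;
    p->free_list = b;
}

/* Free every buffer; assumes all buffers have been released first. */
static void pool_destroy(struct pool *p)
{
    while (p->free_list) {
        struct pool_buf *b = p->free_list;
        p->free_list = b->next;
        free(b);
    }
    p->allocated = 0;
}
```

In steady state the pool stops growing once enough buffers are in flight to cover the pipeline's actual latency, which is usually far below the worst case.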

Now I have no idea how you could shoe-horn that onto the current v4l API.

> Hmm, I'm rambling a bit, but I hope the gist of my mail is clear.

It's clear and I think you're raising good points.

Cheers, Daniel
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48
