Hi Hans,
I'll try to explain a bit; after all, I've been pushing this attachment business quite a bit.
On Tue, Sep 27, 2011 at 03:24:24PM +0200, Hans Verkuil wrote:
OK, it is not clear to me what the purpose is of the attachments.
If I understand the discussion from this list correctly, then the idea is that each device that wants to use this buffer first attaches itself by calling dma_buf_attach().
Then at some point the application asks some driver to export the buffer. So the driver calls dma_buf_export() and passes its own dma_buf_ops. In other words, this becomes the driver that 'controls' the memory, right?
Actually, the ordering is the other way round. First, some driver calls dma_buf_export; userspace then passes the fd around to all the other drivers, which do an import and call attach. While all this happens, the driver that exported the dma_buf does not (yet) allocate any backing storage.
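To make that concrete, here's a minimal sketch of the flow as I see it; the exact signatures (dma_buf_export, dma_buf_fd, dma_buf_get, dma_buf_attach) and all the my_* names are assumptions based on this discussion, not a settled API:

/* Exporting driver: publish the buffer, allocate nothing yet. */
static int my_export(struct my_buffer *buf)
{
	buf->dmabuf = dma_buf_export(buf, &my_dma_buf_ops,
				     buf->size, O_RDWR);
	if (IS_ERR(buf->dmabuf))
		return PTR_ERR(buf->dmabuf);
	/* this fd goes to userspace, which passes it on */
	return dma_buf_fd(buf->dmabuf);
}

/* Importing driver: userspace handed us the fd of a shared buffer. */
static int my_import(struct device *dev, int fd)
{
	struct dma_buf *dmabuf = dma_buf_get(fd);
	struct dma_buf_attachment *attach;

	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);
	/* attach only registers the device; still no memory allocated */
	attach = dma_buf_attach(dmabuf, dev);
	return IS_ERR(attach) ? PTR_ERR(attach) : 0;
}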
Another driver that receives the fd will call dma_buf_get() and can then call e.g. get_scatterlist from dma_buf->ops. (As an aside: I would make inline functions that take a dma_buf pointer and call the corresponding op, rather than requiring drivers to go through ops directly)
Well, drivers should only call get_scatterlist when they actually need to access the memory. This way the originating driver can go through the list of all attached devices and decide where to allocate backing storage on the first get_scatterlist.
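As a sketch (my_buffer, constrain_allocation, allocate_backing and map_for_device are all made-up exporter internals), the deferred allocation could look like this:

/* Lazily allocate backing storage on the first get_scatterlist, once
 * all attached devices are known. */
static struct sg_table *my_get_scatterlist(struct dma_buf_attachment *attach)
{
	struct my_buffer *buf = attach->dmabuf->priv;
	struct dma_buf_attachment *a;

	if (!buf->backing) {
		/* Walk every attached device and narrow down where the
		 * storage may live (DMA masks, contiguity, ...). */
		list_for_each_entry(a, &attach->dmabuf->attachments, node)
			constrain_allocation(buf, a->dev);
		buf->backing = allocate_backing(buf);
	}
	return map_for_device(buf, attach->dev);
}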
But what I miss in this picture is the role of dma_buf_attachment. I'm passing it to get_scatterlist, but which attachment is that? That of the calling driver? And what is the get_scatterlist implementation supposed to do with it?
See above, essentially an attachment is just list bookkeeping for all the devices that take part in a buffer sharing.
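Roughly this (field names are my guess, but it captures the gist):

/* An attachment is little more than a list entry tying a device to a
 * shared buffer, plus a slot for exporter-private per-device state. */
struct dma_buf_attachment {
	struct dma_buf *dmabuf;		/* the shared buffer */
	struct device *dev;		/* the device taking part */
	struct list_head node;		/* entry in dmabuf->attachments */
	void *priv;			/* exporter bookkeeping */
};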
I also read some discussion about what is supposed to happen if another device is attached after get_scatterlist was already called. Apparently the idea was that the old scatterlist is somehow migrated to a new one if that should be necessary? Although I got the impression that that involved a lot of hand-waving with a pinch of wishful thinking. But I may be wrong about that.
Now this is where it's getting "interesting". If all drivers guard their usage with get_scatterlist/put_scatterlist, and all drivers that currently hold onto a mapping are known to release it with put_scatterlist in finite time, then when we add a new driver we can do Cool Stuff (tm). And that scenario actually happens, e.g. for a video pipe where we cycle through buffers.
Now when adding a new device with stricter backing storage constraints, the originator can simply stall in the get_scatterlist call until all outstanding access has completed (signalled by put_scatterlist), move the buffer around and let things continue merrily. The video pipe might stutter a bit when e.g. switching on the encoder, until all buffers have settled into their new place, but it should Just Work (tm).
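A sketch of that stall-and-migrate path, with locking and error handling omitted; device_can_reach, migrate_backing and the other helpers are invented for illustration:

static struct sg_table *my_get_scatterlist(struct dma_buf_attachment *attach)
{
	struct my_buffer *buf = attach->dmabuf->priv;

	/* New device can't reach the current storage: wait until all
	 * outstanding mappings are dropped, then move the buffer. */
	if (buf->backing && !device_can_reach(attach->dev, buf->backing)) {
		wait_event(buf->idle, buf->outstanding_maps == 0);
		migrate_backing(buf, attach->dev);
	}
	buf->outstanding_maps++;
	return map_for_device(buf, attach->dev);
}

static void my_put_scatterlist(struct dma_buf_attachment *attach,
			       struct sg_table *sg)
{
	struct my_buffer *buf = attach->dmabuf->priv;

	unmap_for_device(buf, attach->dev, sg);
	if (--buf->outstanding_maps == 0)
		wake_up(&buf->idle);	/* unblock a pending migration */
}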
Anyway, I guess my main point is that this patch does not explain the role of the attachments and how they should be used (and who uses them).
I agree.
One other thing: once you call REQBUFS on a V4L device the V4L spec says that the memory should be allocated at that time. Because V4L often needs a lot of memory that behavior makes sense: you know immediately if you can get the memory or not. In addition, that memory is mmap-ed before the DMA is started.
If that is actually a fixed requirement for v4l, that's a good reason for mmap support on the dma_buf object. We could then hide all the complexity of shooting down userspace mappings on buffer moves from the drivers. Can you elaborate a bit on this?
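For reference, the shoot-down I have in mind could be as simple as the sketch below; unmap_mapping_range() is an existing kernel helper, while the surrounding structure (dmabuf->file, buf->pages) is assumed:

/* On a buffer move, zap all userspace ptes covering the mapping; the
 * next fault re-populates from the new backing storage. */
static void dma_buf_zap_mappings(struct dma_buf *dmabuf)
{
	unmap_mapping_range(dmabuf->file->f_mapping, 0, dmabuf->size, 1);
}

/* Fault handler re-inserting pages from wherever the buffer lives now. */
static int my_dmabuf_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct my_buffer *buf = vma->vm_private_data;

	if (!buf->backing || vmf->pgoff >= buf->num_pages)
		return VM_FAULT_SIGBUS;
	vmf->page = buf->pages[vmf->pgoff];
	get_page(vmf->page);
	return 0;
}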
This behavior may pose a problem if the idea is to wait with actually allocating memory until the pipeline is started.
I think you're looking at v4l v3 ;-)
More seriously, all modern Linux APIs for pushing frames out use one of two modes:
- gimme the next frame to draw into (dri2)
- here's the next frame I've drawn into (wayland)
To make that fast, we obviously need to recycle buffers. But from a semantic point of view, you only ever have one buffer, namely the current one. All the other N buffers that keep the graphics pipeline from stuttering are transparently in flight somewhere.
Imo such a dynamic scheme has a few advantages:
- There's just no way to know the number of buffers you need up front on any reasonably complex graphics pipeline. As soon as a gpu is in the mix, it's best effort. With a dynamic limit on the in-flight buffers you can cope with latencies until you hit -ENOMEM; with a fixed set you always have to make a compromise, and you can't really allocate for the worst case because that would hinder stuff running in the background.
- In the usual case you need far fewer buffers to make a given pipeline run stutter-free than in the worst case, so there's no point wasting that memory.
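As a userspace sketch of the "gimme the next frame" mode with such a dynamic limit; every name here is made up:

/* Grab a buffer to draw into: prefer a recycled one, grow the pool on
 * demand, and only stall once allocation fails (the -ENOMEM case). */
struct buffer *acquire_buffer(struct pool *p)
{
	struct buffer *b = pop_released(p);	/* in-flight buffer returned? */

	if (!b)
		b = allocate_buffer(p);		/* grow the pool */
	if (!b)
		b = wait_for_release(p);	/* out of memory: stall */
	return b;				/* the one "current" buffer */
}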
Now I have no idea how you could shoe-horn that onto the current v4l interfaces.
Hmm, I'm rambling a bit, but I hope the gist of my mail is clear.
It's clear and I think you're raising good points.
Cheers, Daniel