On Thu, Sep 08, 2011 at 09:18:30AM -0500, Clark, Rob wrote:
On Thu, Sep 8, 2011 at 4:57 AM, Sumit Semwal sumit.semwal@linaro.org wrote:
Ok, so do you think the following as dma_buf_ops would be ok?
- create,
- attach_device,
- rename {get, put}_scatterlist to "buffer_{map, unmap}", adding struct device*
actually, I think all we needed was to add 'struct device *' to get_scatterlist (and it probably doesn't hurt for put).. I'm not entirely sure why we need create/attach_device? The create fxn especially seems wrong.. what if you are the 2nd device to import the buffer?
Well, the create would just create the abstract handle, so you can attach devices.
All the discussion about deferring allocation of backing pages could still be handled in the first get_scatterlist() call.
Well, the original patch only has the get_scatterlist in dma_buf_ops. So you'll only know about that one device (and maybe the originator), which is not enough. Hence the attach_device. And after the first get_scatterlist we can't move the backing storage anymore.
Now I think we could always change that (userspace just sees some abstract handle), but I fear that switching (kernel-internally) to a two-step approach will be quite a bit of work after some drivers/subsystems start to use dma_buf.
Something else that got lost a bit in the attach_device discussion: I think we should directly return the scatterlist mapped into the address space of the mapping device (and hence call it map/unmap to keep in line with the existing dma api). That hopefully gives us a bit more leeway to do insane stuff with the backing storage (like device->device dma for e.g. ip blocks that can directly access gpu address space). Hence my proposal to add a struct device * parameter.
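To make that concrete, roughly what I have in mind (just a sketch, all names purely illustrative, nothing here is a final API):

struct dma_buf;

struct dma_buf_ops {
	/* importer announces itself before any backing storage is
	 * pinned, so the exporter knows all devices up front */
	int (*attach)(struct dma_buf *buf, struct device *dev);
	void (*detach)(struct dma_buf *buf, struct device *dev);

	/* hands back the backing storage already mapped into dev's
	 * dma address space; the first map may allocate/pin the pages */
	struct scatterlist *(*map)(struct dma_buf *buf,
				   struct device *dev, int *nents);
	void (*unmap)(struct dma_buf *buf, struct device *dev,
		      struct scatterlist *sg, int nents);

	void (*release)(struct dma_buf *buf);
};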
I think the interesting questions are:
- what to add to 'struct device_dma_parameters'
I think that would be highly (and imo should be) platform specific, e.g. on ARM a few things cascade like
- normal pages -> large pages (due to puny tlbs) -> contiguous
- cached -> uncached memory
- anything else totally insane (like all of mem -> only from one bank)
We could implement this by a chain of allocators with page_alloc at the top and cma_alloc at the bottom. Arbitration could then take all known devices and walk down the allocator list until one fits.
btw, recent Intel gpus can use large pages (32kb), so that allocator infrastructure could perhaps be shared.
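Just to illustrate the allocator chain idea (completely made up: dma_buf_allocator, dma_buf_merge_constraints and the allocator list don't exist anywhere, and the constraints handling is hand-waved):

struct dma_buf_allocator {
	const char *name;
	/* can this allocator satisfy the merged constraints of all
	 * attached devices? */
	bool (*suitable)(const struct device_dma_parameters *constraints);
	struct page *(*alloc)(size_t size);
	struct list_head list;
};

/* ordered from most generic (page_alloc) to most restrictive (cma) */
static LIST_HEAD(dma_buf_allocators);

static struct page *dma_buf_alloc_backing(struct dma_buf *buf, size_t size)
{
	struct device_dma_parameters constraints;
	struct dma_buf_allocator *a;

	/* fold the dma_parms of every attached device into one set of
	 * constraints (contiguous-only, uncached, bank restrictions, ...) */
	dma_buf_merge_constraints(buf, &constraints);

	/* walk down the chain until an allocator fits all constraints */
	list_for_each_entry(a, &dma_buf_allocators, list)
		if (a->suitable(&constraints))
			return a->alloc(size);

	return NULL;
}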
- how to handle things like "I need luma over here, chroma over there".. maybe this could be handled by making luma and chroma two buffers, and having some sub-devices?
Yeah, I think for devices with extremely stringent requirements splitting planar yuv formats into 2 or 3 buffers is the only way to go. If it's getting more extreme, I think it's approaching the time to throw in the towel and tell the hw engineers to fix up their mess.
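I.e. something along these lines on the kernel side (illustrative only, not proposing this as an actual struct):

/* purely illustrative: carry each plane of a planar yuv frame as its
 * own dma_buf, so luma and chroma can live in different memory */
struct yuv_frame {
	struct dma_buf *planes[3];	/* Y, U, V (or Y + interleaved CbCr) */
	unsigned int num_planes;
	unsigned int pitches[3];
	unsigned int offsets[3];
};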
Obviously, the 'release' callback should then take care of doing all the book-keeping related to detaching struct device* as well. Then we can have a 'central allocator' device which will use this framework to allow negotiation and syncing. Also, in the absence of this central allocator device, one of the subsystems can take up the role of allocator in the earlier-suggested way?
central allocator or not, I don't think that the importing device should know or care.. *who* allocates and *who* imports doesn't really seem relevant to me for the dmabuf API..
Yeah, I agree. As long as we shovel the get_scatterlist through an internal api and so separate the allocator of the backing storage from dma_buf api users, we should be free to change things without too much work.
I think for that it'd be good to have an inline encapsulating the abstract function call, so we can later change it to e.g. a central platform-specific allocator. I.e.
static inline struct scatterlist *dma_buf_get_scatterlist(struct dma_buf *dma_buf)
{
	return dma_buf->ops->get_scatterlist(dma_buf);
}
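so driver code never pokes at the ops directly, e.g. something like (hand-waving, no error handling, and the put-side wrapper is just assumed to exist too):

	sg = dma_buf_get_scatterlist(dma_buf);
	/* ... dma to/from the buffer ... */
	dma_buf_put_scatterlist(dma_buf, sg);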
Cheers, Daniel