[RFC 0/2] Introduce dma buffer sharing mechanism

List overview All Threads
Download

newer

older

[PATCH 0/8 v3] ARM: DMA-mapping...

[PATCH] add debug trace to CMA

Sumit Semwal

11 Oct 2011 11 Oct '11

9:23 a.m.

Hello Everyone,

Various subsystems - V4L2, GPU-accessors, DRI to name a few - have felt the need to have a common mechanism to share memory buffers across different devices - ARM, video hardware, GPU.

This need comes forth from a variety of use cases including cameras, image processing, video recorders, sound processing, DMA engines, GPU and display buffers, and others.

This RFC is the first attempt in defining such a buffer sharing mechanism- it is the result of discussions from a couple of memory-management mini-summits held by Linaro to understand and address common needs around memory management. [1]

A new dma_buf buffer object is added, with operations and API to allow easy sharing of this buffer object across devices.

The framework allows: - a new buffer-object to be created with fixed size. - different devices to 'attach' themselves to this buffer, to facilitate backing storage negotiation, using dma_buf_attach() API. - association of a file pointer with each user-buffer and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - this exported buffer-object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist using get_scatterlist and put_scatterlist operations.

Documentation present in the patch-set gives more details.

This is based on design suggestions from many people at the mini-summits, most notably from Arnd Bergmann arnd@arndb.de, Rob Clark rob@ti.com and Daniel Vetter daniel@ffwll.ch.

The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski t.stanislaws@samsung.com, who demonstrated buffer sharing between two v4l2 devices. [2]

References: [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389

Sumit Semwal (2): dma-buf: Introduce dma buffer sharing mechanism dma-buf: Documentation for buffer sharing framework

Documentation/dma-buf-sharing.txt | 210 ++++++++++++++++++++++++++++++++ drivers/base/Kconfig | 10 ++ drivers/base/Makefile | 1 + drivers/base/dma-buf.c | 242 +++++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 162 +++++++++++++++++++++++++ 5 files changed, 625 insertions(+), 0 deletions(-) create mode 100644 Documentation/dma-buf-sharing.txt create mode 100644 drivers/base/dma-buf.c create mode 100644 include/linux/dma-buf.h

-- 1.7.4.1

Show replies by date

Sumit Semwal

11 Oct 11 Oct

9:23 a.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

This is the first step in defining a dma buffer sharing mechanism.

A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices.

Atleast one 'attach()' call is required to be made prior to calling the get_scatterlist() operation.

Couple of building blocks in get_scatterlist() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter.

mmap() file operation is provided for the associated 'fd', as wrapper over the optional allocator defined mmap(), to be used by devices that might need one.

More details are there in the documentation patch.

This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann arnd@arndb.de, Rob Clark rob@ti.com and Daniel Vetter daniel@ffwll.ch.

The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski t.stanislaws@samsung.com, who demonstrated buffer sharing between two v4l2 devices. [2]

[1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389

Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Sumit Semwal sumit.semwal@ti.com --- drivers/base/Kconfig | 10 ++ drivers/base/Makefile | 1 + drivers/base/dma-buf.c | 242 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 162 +++++++++++++++++++++++++++++++ 4 files changed, 415 insertions(+), 0 deletions(-) create mode 100644 drivers/base/dma-buf.c create mode 100644 include/linux/dma-buf.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 21cf46f..07d8095 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -174,4 +174,14 @@ config SYS_HYPERVISOR

source "drivers/base/regmap/Kconfig"

+config DMA_SHARED_BUFFER + bool "Buffer framework to be shared between drivers" + default n + depends on ANON_INODES + help + This option enables the framework for buffer-sharing between + multiple drivers. A buffer is associated with a file using driver + APIs extension; the file's descriptor can then be passed on to other + driver. + endmenu diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 99a375a..d0df046 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER) += firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c new file mode 100644 index 0000000..58c51a0 --- /dev/null +++ b/drivers/base/dma-buf.c @@ -0,0 +1,242 @@ +/* + * Framework for buffer objects that can be shared across devices/subsystems. + * + * Copyright(C) 2011 Linaro Limited. All rights reserved. + * Author: Sumit Semwal sumit.semwal@ti.com + * + * Many thanks to linaro-mm-sig list, and specially + * Arnd Bergmann arnd@arndb.de, Rob Clark rob@ti.com and + * Daniel Vetter daniel@ffwll.ch for their support in creation and + * refining of this idea. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/fs.h> +#include <linux/slab.h> +#include <linux/dma-buf.h> +#include <linux/anon_inodes.h> + +static inline int is_dma_buf_file(struct file *); + +static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct dma_buf *dmabuf; + + if (!is_dma_buf_file(file)) + return -EINVAL; + + dmabuf = file->private_data; + + if (!dmabuf->ops->mmap) + return -EINVAL; + + return dmabuf->ops->mmap(dmabuf, vma); +} + +static int dma_buf_release(struct inode *inode, struct file *file) +{ + struct dma_buf *dmabuf; + + if (!is_dma_buf_file(file)) + return -EINVAL; + + dmabuf = file->private_data; + + dmabuf->ops->release(dmabuf); + kfree(dmabuf); + return 0; +} + +static const struct file_operations dma_buf_fops = { + .mmap = dma_buf_mmap, + .release = dma_buf_release, +}; + +/* + * is_dma_buf_file - Check if struct file* is associated with dma_buf + */ +static inline int is_dma_buf_file(struct file *file) +{ + return file->f_op == &dma_buf_fops; +} + +/** + * dma_buf_export - Creates a new dma_buf, and associates an anon file + * with this buffer,so it can be exported. + * Also connect the allocator specific data and ops to the buffer. + * + * @priv: [in] Attach private data of allocator to this buffer + * @ops: [in] Attach allocator-defined dma buf ops to the new buffer. + * @flags: [in] mode flags for the file. + * + * Returns, on success, a newly created dma_buf object, which wraps the + * supplied private data and operations for dma_buf_ops. On failure to + * allocate the dma_buf object, it can return NULL. + * + */ +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops, + int flags) +{ + struct dma_buf *dmabuf; + struct file *file; + + BUG_ON(!priv || !ops); + + dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL); + if (dmabuf == NULL) + return dmabuf; + + dmabuf->priv = priv; + dmabuf->ops = ops; + + file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, flags); + + dmabuf->file = file; + + mutex_init(&dmabuf->lock); + INIT_LIST_HEAD(&dmabuf->attachments); + + return dmabuf; +} +EXPORT_SYMBOL(dma_buf_export); + + +/** + * dma_buf_fd - returns a file descriptor for the given dma_buf + * @dmabuf: [in] pointer to dma_buf for which fd is required. + * + * On success, returns an associated 'fd'. Else, returns error. + */ +int dma_buf_fd(struct dma_buf *dmabuf) +{ + int error, fd; + + if (!dmabuf->file) + return -EINVAL; + + error = get_unused_fd_flags(0); + if (error < 0) + return error; + fd = error; + + fd_install(fd, dmabuf->file); + + return fd; +} +EXPORT_SYMBOL(dma_buf_fd); + +/** + * dma_buf_get - returns the dma_buf structure related to an fd + * @fd: [in] fd associated with the dma_buf to be returned + * + * On success, returns the dma_buf structure associated with an fd; uses + * file's refcounting done by fget to increase refcount. returns ERR_PTR + * otherwise. + */ +struct dma_buf *dma_buf_get(int fd) +{ + struct file *file; + + file = fget(fd); + + if (!file) + return ERR_PTR(-EBADF); + + if (!is_dma_buf_file(file)) { + fput(file); + return ERR_PTR(-EINVAL); + } + + return file->private_data; +} +EXPORT_SYMBOL(dma_buf_get); + +/** + * dma_buf_put - decreases refcount of the buffer + * @dmabuf: [in] buffer to reduce refcount of + * + * Uses file's refcounting done implicitly by fput() + */ +void dma_buf_put(struct dma_buf *dmabuf) +{ + BUG_ON(!dmabuf->file); + + fput(dmabuf->file); + + return; +} +EXPORT_SYMBOL(dma_buf_put); + +/** + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally, + * calls attach() of dma_buf_ops to allow device-specific attach functionality + * @dmabuf: [in] buffer to attach device to. + * @dev: [in] device to be attached. + * + * Returns struct dma_buf_attachment * for this attachment; may return NULL. + * + */ +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, + struct device *dev) +{ + struct dma_buf_attachment *attach; + int ret; + + BUG_ON(!dmabuf || !dev); + + mutex_lock(&dmabuf->lock); + + attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL); + if (attach == NULL) + goto err_alloc; + + attach->dev = dev; + if (dmabuf->ops->attach) { + ret = dmabuf->ops->attach(dmabuf, dev, attach); + if (!ret) + goto err_attach; + } + list_add(&attach->node, &dmabuf->attachments); + +err_alloc: + mutex_unlock(&dmabuf->lock); + return attach; +err_attach: + kfree(attach); + mutex_unlock(&dmabuf->lock); + return ERR_PTR(ret); +} +EXPORT_SYMBOL(dma_buf_attach); + +/** + * dma_buf_detach - Remove the given attachment from dmabuf's attachments list; + * optionally calls detach() of dma_buf_ops for device-specific detach + * @dmabuf: [in] buffer to detach from. + * @attach: [in] attachment to be detached; is free'd after this call. + * + */ +void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach) +{ + BUG_ON(!dmabuf || !attach); + + mutex_lock(&dmabuf->lock); + list_del(&attach->node); + if (dmabuf->ops->detach) + dmabuf->ops->detach(dmabuf, attach); + + kfree(attach); + mutex_unlock(&dmabuf->lock); + return; +} +EXPORT_SYMBOL(dma_buf_detach); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h new file mode 100644 index 0000000..5bdf16a --- /dev/null +++ b/include/linux/dma-buf.h @@ -0,0 +1,162 @@ +/* + * Header file for dma buffer sharing framework. + * + * Copyright(C) 2011 Linaro Limited. All rights reserved. + * Author: Sumit Semwal sumit.semwal@ti.com + * + * Many thanks to linaro-mm-sig list, and specially + * Arnd Bergmann arnd@arndb.de, Rob Clark rob@ti.com and + * Daniel Vetter daniel@ffwll.ch for their support in creation and + * refining of this idea. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ +#ifndef __DMA_BUF_H__ +#define __DMA_BUF_H__ + +#include <linux/file.h> +#include <linux/err.h> +#include <linux/device.h> +#include <linux/scatterlist.h> +#include <linux/list.h> +#include <linux/dma-mapping.h> + +struct dma_buf; + +/** + * struct dma_buf_attachment - holds device-buffer attachment data + * @dmabuf: buffer for this attachment. + * @dev: device attached to the buffer. + * @node: list_head to allow manipulation of list of dma_buf_attachment. + * @priv: exporter-specific attachment data. + */ +struct dma_buf_attachment { + struct dma_buf *dmabuf; + struct device *dev; + struct list_head node; + void *priv; +}; + +/** + * struct dma_buf_ops - operations possible on struct dma_buf + * @create: creates a struct dma_buf of a fixed size. Actual allocation + * does not happen here. + * @attach: allows different devices to 'attach' themselves to the given + * buffer. It might return -EBUSY to signal that backing storage + * is already allocated and incompatible with the requirements + * of requesting device. [optional] + * @detach: detach a given device from this buffer. [optional] + * @get_scatterlist: returns list of scatter pages allocated, increases + * usecount of the buffer. Requires atleast one attach to be + * called before. Returned sg list should already be mapped + * into _device_ address space. + * @put_scatterlist: decreases usecount of buffer, might deallocate scatter + * pages. + * @mmap: memory map this buffer - optional. + * @release: release this buffer; to be called after the last dma_buf_put. + * @sync_sg_for_cpu: sync the sg list for cpu. + * @sync_sg_for_device: synch the sg list for device. + */ +struct dma_buf_ops { + int (*attach)(struct dma_buf *, struct device *, + struct dma_buf_attachment *); + + void (*detach)(struct dma_buf *, struct dma_buf_attachment *); + + /* For {get,put}_scatterlist below, any specific buffer attributes + * required should get added to device_dma_parameters accessible + * via dev->dma_params. + */ + struct scatterlist * (*get_scatterlist)(struct dma_buf_attachment *, + enum dma_data_direction, + int *nents); + void (*put_scatterlist)(struct dma_buf_attachment *, + struct scatterlist *, + int nents); + /* TODO: Add interruptible and interruptible_timeout versions */ + + /* allow mmap optionally for devices that need it */ + int (*mmap)(struct dma_buf *, struct vm_area_struct *); + /* after final dma_buf_put() */ + void (*release)(struct dma_buf *); + + /* allow allocator to take care of cache ops */ + void (*sync_sg_for_cpu) (struct dma_buf *, struct device *); + void (*sync_sg_for_device)(struct dma_buf *, struct device *); +}; + +/** + * struct dma_buf - shared buffer object + * @file: file pointer used for sharing buffers across, and for refcounting. + * @attachments: list of dma_buf_attachment that denotes all devices attached. + * @ops: dma_buf_ops associated with this buffer object + * @priv: user specific private data + */ +struct dma_buf { + size_t size; + struct file *file; + struct list_head attachments; + const struct dma_buf_ops *ops; + /* mutex to serialize list manipulation and other ops */ + struct mutex lock; + void *priv; +}; + +#ifdef CONFIG_DMA_SHARED_BUFFER +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, + struct device *dev); +void dma_buf_detach(struct dma_buf *dmabuf, + struct dma_buf_attachment *dmabuf_attach); +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops, int flags); +int dma_buf_fd(struct dma_buf *dmabuf); +struct dma_buf *dma_buf_get(int fd); +void dma_buf_put(struct dma_buf *dmabuf); + +#else + +static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, + struct device *dev) +{ + return ERR_PTR(-ENODEV); +} + +static inline void dma_buf_detach(struct dma_buf *dmabuf, + struct dma_buf_attachment *dmabuf_attach) +{ + return; +} + +static inline struct dma_buf *dma_buf_export(void *priv, + struct dma_buf_ops *ops, + int flags) +{ + return ERR_PTR(-ENODEV); +} + +static inline int dma_buf_fd(struct dma_buf *dmabuf) +{ + return -ENODEV; +} + +static inline struct dma_buf *dma_buf_get(int fd) +{ + return ERR_PTR(-ENODEV); +} + +static inline void dma_buf_put(struct dma_buf *dmabuf) +{ + return; +} +#endif /* CONFIG_DMA_SHARED_BUFFER */ + +#endif /* __DMA_BUF_H__ */

-- 1.7.4.1

Dave Airlie

12 Oct 12 Oct

12:41 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Tue, Oct 11, 2011 at 10:23 AM, Sumit Semwal sumit.semwal@ti.com wrote:

...

This is the first step in defining a dma buffer sharing mechanism.

A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices.

The framework allows:

a new buffer-object to be created with fixed size.

different devices to 'attach' themselves to this buffer, to facilitate

backing storage negotiation, using dma_buf_attach() API.

association of a file pointer with each user-buffer and associated

allocator-defined operations on that buffer. This operation is called the 'export' operation.

this exported buffer-object to be shared with the other entity by asking for

its 'file-descriptor (fd)', and sharing the fd across.

a received fd to get the buffer object back, where it can be accessed using

the associated exporter-defined operations.

the exporter and user to share the scatterlist using get_scatterlist and

put_scatterlist operations.

Atleast one 'attach()' call is required to be made prior to calling the get_scatterlist() operation.

Couple of building blocks in get_scatterlist() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter.

mmap() file operation is provided for the associated 'fd', as wrapper over the optional allocator defined mmap(), to be used by devices that might need one.

Why is this needed? it really doesn't make sense to be mmaping objects independent of some front-end like drm or v4l.

how will you know what contents are in them, how will you synchronise access. Unless someone has a hard use-case for this I'd say we drop it until someone does.

Dave.

Rob Clark

1:28 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Wed, Oct 12, 2011 at 7:41 AM, Dave Airlie airlied@gmail.com wrote:

...

On Tue, Oct 11, 2011 at 10:23 AM, Sumit Semwal sumit.semwal@ti.com wrote:

...
This is the first step in defining a dma buffer sharing mechanism.

A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices.

The framework allows:

a new buffer-object to be created with fixed size.

different devices to 'attach' themselves to this buffer, to facilitate

backing storage negotiation, using dma_buf_attach() API.

association of a file pointer with each user-buffer and associated

allocator-defined operations on that buffer. This operation is called the 'export' operation.

this exported buffer-object to be shared with the other entity by asking for

its 'file-descriptor (fd)', and sharing the fd across.

a received fd to get the buffer object back, where it can be accessed using

the associated exporter-defined operations.

the exporter and user to share the scatterlist using get_scatterlist and

put_scatterlist operations.

Atleast one 'attach()' call is required to be made prior to calling the get_scatterlist() operation.

Couple of building blocks in get_scatterlist() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter.

mmap() file operation is provided for the associated 'fd', as wrapper over the optional allocator defined mmap(), to be used by devices that might need one.

Why is this needed? it really doesn't make sense to be mmaping objects independent of some front-end like drm or v4l.

well, the mmap is actually implemented by the buffer allocator (v4l/drm).. although not sure if this was the point

...

how will you know what contents are in them, how will you synchronise access. Unless someone has a hard use-case for this I'd say we drop it until someone does.

The intent was that this is for well defined formats.. ie. it would need to be a format that both v4l and drm understood in the first place for sharing to make sense at all..

Anyways, the basic reason is to handle random edge cases where you need sw access to the buffer. For example, you are decoding video and pull out a frame to generate a thumbnail w/ a sw jpeg encoder..

On gstreamer 0.11 branch, for example, there is already a map/unmap virtual method on the gst buffer for sw access (ie. same purpose as PrepareAccess/FinishAccess in EXA). The idea w/ dmabuf mmap() support is that we could implement support to mmap()/munmap() before/after sw access.

With this current scheme, synchronization could be handled in dmabufops->mmap() and vm_ops->close().. it is perhaps a bit heavy to require mmap/munmap for each sw access, but I suppose this isn't really for the high-performance use case. It is just so that some random bit of sw that gets passed a dmabuf handle without knowing who allocated it can have sw access if really needed.

BR, -R

...

Dave.

To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

Dave Airlie

1:35 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

...

well, the mmap is actually implemented by the buffer allocator (v4l/drm).. although not sure if this was the point

Then why not use the correct interface? doing some sort of not-quite generic interface isn't really helping anyone except adding an ABI that we have to support.

If someone wants to bypass the current kernel APIs we should add a new API for them not shove it into this generic buffer sharing layer.

...

The intent was that this is for well defined formats.. ie. it would need to be a format that both v4l and drm understood in the first place for sharing to make sense at all..

How will you know the stride to take a simple example? The userspace had to create this buffer somehow and wants to share it with "something", you sound like you really needs another API that is a simple accessor API that can handle mmaps.

...

Anyways, the basic reason is to handle random edge cases where you need sw access to the buffer. For example, you are decoding video and pull out a frame to generate a thumbnail w/ a sw jpeg encoder..

Again, doesn't sound like it should be part of this API, and also sounds like the sw jpeg encoder will need more info about the buffer anyways like stride and format.

...

With this current scheme, synchronization could be handled in dmabufops->mmap() and vm_ops->close().. it is perhaps a bit heavy to require mmap/munmap for each sw access, but I suppose this isn't really for the high-performance use case. It is just so that some random bit of sw that gets passed a dmabuf handle without knowing who allocated it can have sw access if really needed.

So I think thats fine, write a sw accessor providers, don't go overloading the buffer sharing code.

This API will limit what people can use this buffer sharing for with pure hw accessors, you might say, oh buts its okay to fail the mmap then, but the chances of sw handling that I'm not so sure off.

Dave.

Rob Clark

1:50 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Wed, Oct 12, 2011 at 8:35 AM, Dave Airlie airlied@gmail.com wrote:

...

...
well, the mmap is actually implemented by the buffer allocator (v4l/drm).. although not sure if this was the point

Then why not use the correct interface? doing some sort of not-quite generic interface isn't really helping anyone except adding an ABI that we have to support.

But what if you don't know who allocated the buffer? How do you know what interface to use to mmap?

...

If someone wants to bypass the current kernel APIs we should add a new API for them not shove it into this generic buffer sharing layer.

...
The intent was that this is for well defined formats.. ie. it would need to be a format that both v4l and drm understood in the first place for sharing to make sense at all..

How will you know the stride to take a simple example? The userspace had to create this buffer somehow and wants to share it with "something", you sound like you really needs another API that is a simple accessor API that can handle mmaps.

Well, things like stride, width, height, color format, userspace needs to know all this already, even for malloc()'d sw buffers. The assumption is userspace already has a way to pass this information around so it was not required to be duplicated by dmabuf.

...

...
Anyways, the basic reason is to handle random edge cases where you need sw access to the buffer. For example, you are decoding video and pull out a frame to generate a thumbnail w/ a sw jpeg encoder..

Again, doesn't sound like it should be part of this API, and also sounds like the sw jpeg encoder will need more info about the buffer anyways like stride and format.

...
With this current scheme, synchronization could be handled in dmabufops->mmap() and vm_ops->close().. it is perhaps a bit heavy to require mmap/munmap for each sw access, but I suppose this isn't really for the high-performance use case. It is just so that some random bit of sw that gets passed a dmabuf handle without knowing who allocated it can have sw access if really needed.

So I think thats fine, write a sw accessor providers, don't go overloading the buffer sharing code.

But then we'd need a different set of accessors for every different drm/v4l/etc driver, wouldn't we?

...

This API will limit what people can use this buffer sharing for with pure hw accessors, you might say, oh buts its okay to fail the mmap then, but the chances of sw handling that I'm not so sure off.

I'm not entirely sure the case you are worried about.. sharing buffers between multiple GPU's that understand same tiled formats? I guess that is a bit different from a case like a jpeg encoder that is passed a dmabuf handle without any idea where it came from..

I guess if sharing a buffer between multiple drm devices, there is nothing stopping you from having some NOT_DMABUF_MMAPABLE flag you pass when the buffer is allocated, then you don't have to support dmabuf->mmap(), and instead mmap via device and use some sort of DRM_CPU_PREP/FINI ioctls for synchronization..

BR, -R

...

Dave.

Dave Airlie

2:01 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

...

But then we'd need a different set of accessors for every different drm/v4l/etc driver, wouldn't we?

Not any more different than you need for this, you just have a new interface that you request a sw object from, then mmap that object, and underneath it knows who owns it in the kernel.

mmap just feels wrong in this API, which is a buffer sharing API not a buffer mapping API.

...

I guess if sharing a buffer between multiple drm devices, there is nothing stopping you from having some NOT_DMABUF_MMAPABLE flag you pass when the buffer is allocated, then you don't have to support dmabuf->mmap(), and instead mmap via device and use some sort of DRM_CPU_PREP/FINI ioctls for synchronization..

Or we could make a generic CPU accessor that we don't have to worry about.

Dave.

Rob Clark

2:24 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Wed, Oct 12, 2011 at 9:01 AM, Dave Airlie airlied@gmail.com wrote:

...

...
But then we'd need a different set of accessors for every different drm/v4l/etc driver, wouldn't we?

Not any more different than you need for this, you just have a new interface that you request a sw object from, then mmap that object, and underneath it knows who owns it in the kernel.

oh, ok, so you are talking about a kernel level interface, rather than userspace..

but I guess in this case I don't quite see the difference. It amounts to which fd you call mmap (or ioctl[*]) on.. If you use the dmabuf fd directly then you don't have to pass around a 2nd fd.

[*] there is nothing stopping defining some dmabuf ioctls (such as for synchronization).. although the thinking was to keep it simple for first version of dmabuf

BR, -R

...

mmap just feels wrong in this API, which is a buffer sharing API not a buffer mapping API.

...
I guess if sharing a buffer between multiple drm devices, there is nothing stopping you from having some NOT_DMABUF_MMAPABLE flag you pass when the buffer is allocated, then you don't have to support dmabuf->mmap(), and instead mmap via device and use some sort of DRM_CPU_PREP/FINI ioctls for synchronization..

Or we could make a generic CPU accessor that we don't have to worry about.

Dave.

Dave Airlie

2:34 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Wed, Oct 12, 2011 at 3:24 PM, Rob Clark robdclark@gmail.com wrote:

...

On Wed, Oct 12, 2011 at 9:01 AM, Dave Airlie airlied@gmail.com wrote:

...
...
But then we'd need a different set of accessors for every different drm/v4l/etc driver, wouldn't we?

Not any more different than you need for this, you just have a new interface that you request a sw object from, then mmap that object, and underneath it knows who owns it in the kernel.

oh, ok, so you are talking about a kernel level interface, rather than userspace..

but I guess in this case I don't quite see the difference. It amounts to which fd you call mmap (or ioctl[*]) on.. If you use the dmabuf fd directly then you don't have to pass around a 2nd fd.

[*] there is nothing stopping defining some dmabuf ioctls (such as for synchronization).. although the thinking was to keep it simple for first version of dmabuf

Yes a separate kernel level interface.

Well I'd like to keep it even simpler. dmabuf is a buffer sharing API, shoehorning in a sw mapping API isn't making it simpler.

The problem I have with implementing mmap on the sharing fd, is that nothing says this should be purely optional and userspace shouldn't rely on it.

In the Intel GEM space alone you have two types of mapping, one direct to shmem one via GTT, the GTT could be even be a linear view. The intel guys initially did GEM mmaps direct to the shmem pages because it seemed simple, up until they had to do step two which was do mmaps on the GTT copy and ended up having two separate mmap methods. I think the problem here is it seems deceptively simple to add this to the API now because the API is simple, however I think in the future it'll become a burden that we'll have to workaround.

Dave.

Daniel Vetter

2:49 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Wed, Oct 12, 2011 at 03:34:54PM +0100, Dave Airlie wrote:

...

On Wed, Oct 12, 2011 at 3:24 PM, Rob Clark robdclark@gmail.com wrote:

...
On Wed, Oct 12, 2011 at 9:01 AM, Dave Airlie airlied@gmail.com wrote:

...
...
But then we'd need a different set of accessors for every different drm/v4l/etc driver, wouldn't we?

Not any more different than you need for this, you just have a new interface that you request a sw object from, then mmap that object, and underneath it knows who owns it in the kernel.

oh, ok, so you are talking about a kernel level interface, rather than userspace..

but I guess in this case I don't quite see the difference. It amounts to which fd you call mmap (or ioctl[*]) on.. If you use the dmabuf fd directly then you don't have to pass around a 2nd fd.

[*] there is nothing stopping defining some dmabuf ioctls (such as for synchronization).. although the thinking was to keep it simple for first version of dmabuf

Yes a separate kernel level interface.

Well I'd like to keep it even simpler. dmabuf is a buffer sharing API, shoehorning in a sw mapping API isn't making it simpler.

The problem I have with implementing mmap on the sharing fd, is that nothing says this should be purely optional and userspace shouldn't rely on it.

In the Intel GEM space alone you have two types of mapping, one direct to shmem one via GTT, the GTT could be even be a linear view. The intel guys initially did GEM mmaps direct to the shmem pages because it seemed simple, up until they had to do step two which was do mmaps on the GTT copy and ended up having two separate mmap methods. I think the problem here is it seems deceptively simple to add this to the API now because the API is simple, however I think in the future it'll become a burden that we'll have to workaround.

Yeah, that's my feeling, too. Adding mmap sounds like a neat, simple idea, that could simplify things for simple devices like v4l. But as soon as you're dealing with a real gpu, nothing is simple. Those who don't believe this, just take a look at the data upload/download paths in the open-source i915,nouveau,radeon drivers. Making this fast (and for gpus, it needs to be fast) requires tons of tricks, special-cases and jumping through loops.

You absolutely want the device-specific ioctls to do that. Adding a generic mmap just makes matters worse, especially if userspace expects this to work synchronized with everything else that is going on.

Cheers, Daniel

-- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48

Rob Clark

3:15 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Wed, Oct 12, 2011 at 9:34 AM, Dave Airlie airlied@gmail.com wrote:

...

On Wed, Oct 12, 2011 at 3:24 PM, Rob Clark robdclark@gmail.com wrote:

...
On Wed, Oct 12, 2011 at 9:01 AM, Dave Airlie airlied@gmail.com wrote:

...
...
But then we'd need a different set of accessors for every different drm/v4l/etc driver, wouldn't we?

Not any more different than you need for this, you just have a new interface that you request a sw object from, then mmap that object, and underneath it knows who owns it in the kernel.

oh, ok, so you are talking about a kernel level interface, rather than userspace..

but I guess in this case I don't quite see the difference. It amounts to which fd you call mmap (or ioctl[*]) on.. If you use the dmabuf fd directly then you don't have to pass around a 2nd fd.

[*] there is nothing stopping defining some dmabuf ioctls (such as for synchronization).. although the thinking was to keep it simple for first version of dmabuf

Yes a separate kernel level interface.

I'm not against it, but if it is a device-independent interface, it just seems like six of one, half-dozen of the other..

Ie. how does it differ if the dmabuf fd is the fd used for ioctl/mmap, vs if some other /dev/buffer-sharer file that you open?

But I think maybe I'm misunderstanding what you have in mind?

BR, -R

...

Well I'd like to keep it even simpler. dmabuf is a buffer sharing API, shoehorning in a sw mapping API isn't making it simpler.

The problem I have with implementing mmap on the sharing fd, is that nothing says this should be purely optional and userspace shouldn't rely on it.

In the Intel GEM space alone you have two types of mapping, one direct to shmem one via GTT, the GTT could be even be a linear view. The intel guys initially did GEM mmaps direct to the shmem pages because it seemed simple, up until they had to do step two which was do mmaps on the GTT copy and ended up having two separate mmap methods. I think the problem here is it seems deceptively simple to add this to the API now because the API is simple, however I think in the future it'll become a burden that we'll have to workaround.

Dave.

Tomasz Stanislawski

14 Oct 14 Oct

10 a.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

Hi Mr. Sumit Semwal, Thank you for taking care of the framework for buffer sharing. The support of buffer sharing in V4L2, both exporting and importing was posted in shrbuf proof-of-concept patch. It should be easy to port it to dmabuf.

http://lists.linaro.org/pipermail/linaro-mm-sig/2011-August/000485.html

Please refer to the comments below:

On 10/11/2011 11:23 AM, Sumit Semwal wrote:

...

This is the first step in defining a dma buffer sharing mechanism.

A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices.

The framework allows:

a new buffer-object to be created with fixed size.

different devices to 'attach' themselves to this buffer, to facilitate backing storage negotiation, using dma_buf_attach() API.

association of a file pointer with each user-buffer and associated allocator-defined operations on that buffer. This operation is called the 'export' operation.

this exported buffer-object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across.

a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations.

the exporter and user to share the scatterlist using get_scatterlist and put_scatterlist operations.

Atleast one 'attach()' call is required to be made prior to calling the get_scatterlist() operation.

Couple of building blocks in get_scatterlist() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter.

mmap() file operation is provided for the associated 'fd', as wrapper over the optional allocator defined mmap(), to be used by devices that might need one.

More details are there in the documentation patch.

This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmannarnd@arndb.de, Rob Clarkrob@ti.com and Daniel Vetterdaniel@ffwll.ch.

The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawskit.stanislaws@samsung.com, who demonstrated buffer sharing between two v4l2 devices. [2]

Signed-off-by: Sumit Semwalsumit.semwal@linaro.org Signed-off-by: Sumit Semwalsumit.semwal@ti.com

drivers/base/Kconfig | 10 ++ drivers/base/Makefile | 1 + drivers/base/dma-buf.c | 242 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 162 +++++++++++++++++++++++++++++++ 4 files changed, 415 insertions(+), 0 deletions(-) create mode 100644 drivers/base/dma-buf.c create mode 100644 include/linux/dma-buf.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 21cf46f..07d8095 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -174,4 +174,14 @@ config SYS_HYPERVISOR

source "drivers/base/regmap/Kconfig"

+config DMA_SHARED_BUFFER
bool "Buffer framework to be shared between drivers"

default n

depends on ANON_INODES

help
 This option enables the framework for buffer-sharing between
 multiple drivers. A buffer is associated with a file using driver
 APIs extension; the file's descriptor can then be passed on to other
 driver.
endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 99a375a..d0df046 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER) += firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c new file mode 100644 index 0000000..58c51a0 --- /dev/null +++ b/drivers/base/dma-buf.c @@ -0,0 +1,242 @@ +/*

Framework for buffer objects that can be shared across devices/subsystems.

Copyright(C) 2011 Linaro Limited. All rights reserved.

Author: Sumit Semwalsumit.semwal@ti.com

Many thanks to linaro-mm-sig list, and specially

Arnd Bergmannarnd@arndb.de, Rob Clarkrob@ti.com and

Daniel Vetterdaniel@ffwll.ch for their support in creation and

refining of this idea.

This program is free software; you can redistribute it and/or modify it

under the terms of the GNU General Public License version 2 as published by

the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT

ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or

FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for

more details.

You should have received a copy of the GNU General Public License along with

this program. If not, seehttp://www.gnu.org/licenses/.

*/

+#include<linux/fs.h> +#include<linux/slab.h> +#include<linux/dma-buf.h> +#include<linux/anon_inodes.h>

+static inline int is_dma_buf_file(struct file *);

+static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma) +{
struct dma_buf *dmabuf;

if (!is_dma_buf_file(file))
return -EINVAL;
dmabuf = file->private_data;

if (!dmabuf->ops->mmap)
return -EINVAL;
return dmabuf->ops->mmap(dmabuf, vma);
+}

+static int dma_buf_release(struct inode *inode, struct file *file) +{
struct dma_buf *dmabuf;

if (!is_dma_buf_file(file))
return -EINVAL;
dmabuf = file->private_data;

dmabuf->ops->release(dmabuf);

kfree(dmabuf);

return 0;
+}

+static const struct file_operations dma_buf_fops = {

.mmap = dma_buf_mmap,

.release = dma_buf_release,

+};

+/*

is_dma_buf_file - Check if struct file* is associated with dma_buf

*/

+static inline int is_dma_buf_file(struct file *file) +{

return file->f_op ==&dma_buf_fops;

+}

+/**

dma_buf_export - Creates a new dma_buf, and associates an anon file

with this buffer,so it can be exported.

Also connect the allocator specific data and ops to the buffer.

@priv: [in] Attach private data of allocator to this buffer

@ops: [in] Attach allocator-defined dma buf ops to the new buffer.

@flags: [in] mode flags for the file.

What is the purpose of these flags? The file is not visible to any process by any file system, is it?

...

Returns, on success, a newly created dma_buf object, which wraps the

supplied private data and operations for dma_buf_ops. On failure to

allocate the dma_buf object, it can return NULL.

*/

+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
		int flags)
+{

struct dma_buf *dmabuf;

struct file *file;

why priv is not allowed to be NULL?

...

BUG_ON(!priv || !ops);

dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);

if (dmabuf == NULL)
return dmabuf;
dmabuf->priv = priv;

dmabuf->ops = ops;

file = anon_inode_getfile("dmabuf",&dma_buf_fops, dmabuf, flags);

dmabuf->file = file;

mutex_init(&dmabuf->lock);

INIT_LIST_HEAD(&dmabuf->attachments);

return dmabuf;
+} +EXPORT_SYMBOL(dma_buf_export);

+/**

dma_buf_fd - returns a file descriptor for the given dma_buf

@dmabuf: [in] pointer to dma_buf for which fd is required.

On success, returns an associated 'fd'. Else, returns error.

*/

+int dma_buf_fd(struct dma_buf *dmabuf) +{
int error, fd;

if (!dmabuf->file)
return -EINVAL;
error = get_unused_fd_flags(0);

if (error< 0)
return error;
fd = error;

fd_install(fd, dmabuf->file);

return fd;
+} +EXPORT_SYMBOL(dma_buf_fd);

+/**

dma_buf_get - returns the dma_buf structure related to an fd

@fd: [in] fd associated with the dma_buf to be returned

On success, returns the dma_buf structure associated with an fd; uses

file's refcounting done by fget to increase refcount. returns ERR_PTR

otherwise.

*/

+struct dma_buf *dma_buf_get(int fd) +{
struct file *file;

file = fget(fd);

if (!file)
return ERR_PTR(-EBADF);
if (!is_dma_buf_file(file)) {
fput(file);
return ERR_PTR(-EINVAL);
}

return file->private_data;
+} +EXPORT_SYMBOL(dma_buf_get);

+/**

dma_buf_put - decreases refcount of the buffer

@dmabuf: [in] buffer to reduce refcount of

Uses file's refcounting done implicitly by fput()

*/

+void dma_buf_put(struct dma_buf *dmabuf) +{

BUG_ON(!dmabuf->file);

fput(dmabuf->file);

return is not needed

...

return;

+} +EXPORT_SYMBOL(dma_buf_put);

+/**

dma_buf_attach - Add the device to dma_buf's attachments list; optionally,

calls attach() of dma_buf_ops to allow device-specific attach functionality

@dmabuf: [in] buffer to attach device to.

@dev: [in] device to be attached.

Returns struct dma_buf_attachment * for this attachment; may return NULL.

*/

+struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
				struct device *dev)
+{

struct dma_buf_attachment *attach;

int ret;

BUG_ON(!dmabuf || !dev);

mutex_lock(&dmabuf->lock);

There is no need to call kzalloc inside critical section protected by dmabuf->lock. The code would be simpler if the allocation is moved outside the section.

...

attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);

if (attach == NULL)
goto err_alloc;
attach->dev = dev;

if (dmabuf->ops->attach) {
ret = dmabuf->ops->attach(dmabuf, dev, attach);
if (!ret)
	goto err_attach;
}

list_add(&attach->node,&dmabuf->attachments);
+err_alloc:

mutex_unlock(&dmabuf->lock);

return attach;

+err_attach:

kfree(attach);

mutex_unlock(&dmabuf->lock);

return ERR_PTR(ret);

+} +EXPORT_SYMBOL(dma_buf_attach);

+/**

dma_buf_detach - Remove the given attachment from dmabuf's attachments list;

optionally calls detach() of dma_buf_ops for device-specific detach

@dmabuf: [in] buffer to detach from.

@attach: [in] attachment to be detached; is free'd after this call.

*/

+void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach) +{
BUG_ON(!dmabuf || !attach);

mutex_lock(&dmabuf->lock);

list_del(&attach->node);

if (dmabuf->ops->detach)
dmabuf->ops->detach(dmabuf, attach);

[as above]

...

kfree(attach);

mutex_unlock(&dmabuf->lock);

return;

+} +EXPORT_SYMBOL(dma_buf_detach); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h new file mode 100644 index 0000000..5bdf16a --- /dev/null +++ b/include/linux/dma-buf.h @@ -0,0 +1,162 @@ +/*

Header file for dma buffer sharing framework.

Copyright(C) 2011 Linaro Limited. All rights reserved.

Author: Sumit Semwalsumit.semwal@ti.com

Many thanks to linaro-mm-sig list, and specially

Arnd Bergmannarnd@arndb.de, Rob Clarkrob@ti.com and

Daniel Vetterdaniel@ffwll.ch for their support in creation and

refining of this idea.

This program is free software; you can redistribute it and/or modify it

under the terms of the GNU General Public License version 2 as published by

the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT

ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or

FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for

more details.

You should have received a copy of the GNU General Public License along with

this program. If not, seehttp://www.gnu.org/licenses/.

*/

+#ifndef __DMA_BUF_H__ +#define __DMA_BUF_H__

+#include<linux/file.h> +#include<linux/err.h> +#include<linux/device.h> +#include<linux/scatterlist.h> +#include<linux/list.h> +#include<linux/dma-mapping.h>

+struct dma_buf;

+/**

struct dma_buf_attachment - holds device-buffer attachment data

@dmabuf: buffer for this attachment.

@dev: device attached to the buffer.

@node: list_head to allow manipulation of list of dma_buf_attachment.

@priv: exporter-specific attachment data.

*/

+struct dma_buf_attachment {

struct dma_buf *dmabuf;

struct device *dev;

struct list_head node;

void *priv;

+};

+/**
struct dma_buf_ops - operations possible on struct dma_buf

@create: creates a struct dma_buf of a fixed size. Actual allocation
   does not happen here.

The 'create' ops is not present in dma_buf_ops.

...

@attach: allows different devices to 'attach' themselves to the given
   buffer. It might return -EBUSY to signal that backing storage
   is already allocated and incompatible with the requirements
   of requesting device. [optional]
@detach: detach a given device from this buffer. [optional]

@get_scatterlist: returns list of scatter pages allocated, increases
     usecount of the buffer. Requires atleast one attach to be
     called before. Returned sg list should already be mapped
     into _device_ address space.

You must add a comment that this call 'may sleep'.

I like the get_scatterlist idea. It allows the exported to create a valid scatterlist for a client in a elegant way.

I do not like this whole attachment idea. The problem is that currently there is no support in DMA framework for allocation for multiple devices. As long as no such a support exists, there is no generic way to handle attribute negotiations and buffer allocations that involve multiple devices. So the exporter drivers would have to implement more or less hacky solutions to handle memory requirements and choosing the device that allocated memory.

Currently, AFAIK there is even no generic way for a driver to acquire its own DMA memory requirements.

Therefore all logic hidden beneath 'attachment' is pointless. I think that support for attach/detach (and related stuff) should be postponed until support for multi-device allocation is added to DMA framework.

I don't say the attachment list idea is wrong but adding attachment stuff creates an illusion that problem of multi-device allocations is somehow magically solved. We should not force the developers of exporter drivers to solve the problem that is not solvable yet.

The other problem are the APIs. For example, the V4L2 subsystem assumes that memory is allocated after successful VIDIOC_REQBUFS with V4L2_MEMORY_MMAP memory type. Therefore attach would be automatically followed by get_scatterlist, blocking possibility of any buffer migrations in future.

The same situation happens if buffer sharing is added to framebuffer API.

The buffer sharing mechanism is dedicated to improve cooperation between multiple APIs. Therefore the common denominator strategy should be applied that is buffer-creation == buffer-allocation.

...

@put_scatterlist: decreases usecount of buffer, might deallocate scatter
     pages.
@mmap: memory map this buffer - optional.

@release: release this buffer; to be called after the last dma_buf_put.

@sync_sg_for_cpu: sync the sg list for cpu.

@sync_sg_for_device: synch the sg list for device.

*/
+struct dma_buf_ops {
int (*attach)(struct dma_buf *, struct device *,
	struct dma_buf_attachment *);
void (*detach)(struct dma_buf *, struct dma_buf_attachment *);

/* For {get,put}_scatterlist below, any specific buffer attributes
* required should get added to device_dma_parameters accessible
* via dev->dma_params.
*/
struct scatterlist * (*get_scatterlist)(struct dma_buf_attachment *,
				enum dma_data_direction,
				int *nents);
void (*put_scatterlist)(struct dma_buf_attachment *,
				struct scatterlist *,
				int nents);
/* TODO: Add interruptible and interruptible_timeout versions */

I don't agree the interruptible and interruptible_timeout versions are needed. I think that get_scatterlist should alway be interruptible. You can add try_get_scatterlist callback that returns ERR_PTR(-EBUSY) if the call would be blocking.

...

/* allow mmap optionally for devices that need it */

int (*mmap)(struct dma_buf *, struct vm_area_struct *);

The mmap is not needed for inital version. It could be added at any time in the future. The dmabuf client should not be allowed to create mapping of the dmabuf from the scatterlist.

...

/* after final dma_buf_put() */

void (*release)(struct dma_buf *);

/* allow allocator to take care of cache ops */

void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);

void (*sync_sg_for_device)(struct dma_buf *, struct device *);

+};

+/**

struct dma_buf - shared buffer object

@file: file pointer used for sharing buffers across, and for refcounting.

@attachments: list of dma_buf_attachment that denotes all devices attached.

@ops: dma_buf_ops associated with this buffer object

@priv: user specific private data

*/

+struct dma_buf {

size_t size;

struct file *file;

struct list_head attachments;

const struct dma_buf_ops *ops;

/* mutex to serialize list manipulation and other ops */

struct mutex lock;

void *priv;

+};

+#ifdef CONFIG_DMA_SHARED_BUFFER +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
					struct device *dev);
+void dma_buf_detach(struct dma_buf *dmabuf,
		struct dma_buf_attachment *dmabuf_attach);
+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops, int flags); +int dma_buf_fd(struct dma_buf *dmabuf); +struct dma_buf *dma_buf_get(int fd); +void dma_buf_put(struct dma_buf *dmabuf);

+#else

+static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
					struct device *dev)
+{

return ERR_PTR(-ENODEV);

+}

+static inline void dma_buf_detach(struct dma_buf *dmabuf,
		  struct dma_buf_attachment *dmabuf_attach)
+{

return;

+}

+static inline struct dma_buf *dma_buf_export(void *priv,
				struct dma_buf_ops *ops,
				int flags)
+{

return ERR_PTR(-ENODEV);

+}

+static inline int dma_buf_fd(struct dma_buf *dmabuf) +{

return -ENODEV;

+}

+static inline struct dma_buf *dma_buf_get(int fd) +{

return ERR_PTR(-ENODEV);

+}

+static inline void dma_buf_put(struct dma_buf *dmabuf) +{

return;

+} +#endif /* CONFIG_DMA_SHARED_BUFFER */

+#endif /* __DMA_BUF_H__ */

I hope you find my comments useful.

Yours sincerely, Tomasz Stanislawski

Sumit Semwal

2:13 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On 14 October 2011 15:30, Tomasz Stanislawski t.stanislaws@samsung.com wrote:

...

Hi Mr. Sumit Semwal,

Hello Mr. Tomasz Stanislawski :),

...

Thank you for taking care of the framework for buffer sharing. The support of buffer sharing in V4L2, both exporting and importing was posted in shrbuf proof-of-concept patch. It should be easy to port it to dmabuf.

http://lists.linaro.org/pipermail/linaro-mm-sig/2011-August/000485.html

I should thank you for the wonderful proof-of-concept patch, and the idea behind it! I am currently working on the V4L2 side patch for it, and would send out the RFC soon. Also, thanks for a good review and some pertinent points; replies inline.

...

Please refer to the comments below:

On 10/11/2011 11:23 AM, Sumit Semwal wrote:

...

<snip>

...

...
+/**

dma_buf_export - Creates a new dma_buf, and associates an anon file

with this buffer,so it can be exported.

Also connect the allocator specific data and ops to the buffer.

@priv: [in] Attach private data of allocator to this buffer

@ops: [in] Attach allocator-defined dma buf ops to the new

buffer.

@flags: [in] mode flags for the file.

What is the purpose of these flags? The file is not visible to any process by any file system, is it?

These are the standard file mode flags, which can be used to do file-level access-type control by the exporter, so for example write-access can be denied for a buffer exported as a read-only buffer.

...

...

Returns, on success, a newly created dma_buf object, which wraps the

supplied private data and operations for dma_buf_ops. On failure to

allocate the dma_buf object, it can return NULL.

*/

+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,

int flags)

+{

struct dma_buf *dmabuf;

struct file *file;

why priv is not allowed to be NULL?

priv will be used by the exporter to attach its own context to the dma buf; I couldn't think of any case where it could be NULL?

...

...

BUG_ON(!priv || !ops);

<snip>

...

...

BUG_ON(!dmabuf->file);

fput(dmabuf->file);

return is not needed

Right; will correct this.

...

...

return;

+} +EXPORT_SYMBOL(dma_buf_put);

+/**

dma_buf_attach - Add the device to dma_buf's attachments list;

optionally,

calls attach() of dma_buf_ops to allow device-specific attach

functionality

@dmabuf: [in] buffer to attach device to.

@dev: [in] device to be attached.

Returns struct dma_buf_attachment * for this attachment; may return

NULL.

*/

+struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,

struct device *dev)

+{

struct dma_buf_attachment *attach;

int ret;

BUG_ON(!dmabuf || !dev);

mutex_lock(&dmabuf->lock);

There is no need to call kzalloc inside critical section protected by dmabuf->lock. The code would be simpler if the allocation is moved outside the section.

Yes, you're right; will correct it.

...

...

attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);

if (attach == NULL)

goto err_alloc;

attach->dev = dev;

if (dmabuf->ops->attach) {

ret = dmabuf->ops->attach(dmabuf, dev, attach);

if (!ret)

goto err_attach;

}

list_add(&attach->node,&dmabuf->attachments);

+err_alloc:

mutex_unlock(&dmabuf->lock);

return attach;

+err_attach:

kfree(attach);

mutex_unlock(&dmabuf->lock);

return ERR_PTR(ret);

+} +EXPORT_SYMBOL(dma_buf_attach);

+/**

dma_buf_detach - Remove the given attachment from dmabuf's attachments

list;

optionally calls detach() of dma_buf_ops for device-specific detach

@dmabuf: [in] buffer to detach from.

@attach: [in] attachment to be detached; is free'd after this

call.

*/

+void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach) +{

BUG_ON(!dmabuf || !attach);

mutex_lock(&dmabuf->lock);

list_del(&attach->node);

if (dmabuf->ops->detach)

dmabuf->ops->detach(dmabuf, attach);

[as above]

Ok.

...

...

kfree(attach);

mutex_unlock(&dmabuf->lock);

return;

+} +EXPORT_SYMBOL(dma_buf_detach); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h new file mode 100644 index 0000000..5bdf16a --- /dev/null +++ b/include/linux/dma-buf.h @@ -0,0 +1,162 @@ +/*

Header file for dma buffer sharing framework.

Copyright(C) 2011 Linaro Limited. All rights reserved.

Author: Sumit Semwalsumit.semwal@ti.com

Many thanks to linaro-mm-sig list, and specially

Arnd Bergmannarnd@arndb.de, Rob Clarkrob@ti.com and

Daniel Vetterdaniel@ffwll.ch for their support in creation and

refining of this idea.

This program is free software; you can redistribute it and/or modify

it

under the terms of the GNU General Public License version 2 as

published by

the Free Software Foundation.

This program is distributed in the hope that it will be useful, but

WITHOUT

ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or

FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License

for

more details.

You should have received a copy of the GNU General Public License

along with

this program. If not, seehttp://www.gnu.org/licenses/.

*/

+#ifndef __DMA_BUF_H__ +#define __DMA_BUF_H__

+#include<linux/file.h> +#include<linux/err.h> +#include<linux/device.h> +#include<linux/scatterlist.h> +#include<linux/list.h> +#include<linux/dma-mapping.h>

+struct dma_buf;

+/**

struct dma_buf_attachment - holds device-buffer attachment data

@dmabuf: buffer for this attachment.

@dev: device attached to the buffer.

@node: list_head to allow manipulation of list of dma_buf_attachment.

@priv: exporter-specific attachment data.

*/

+struct dma_buf_attachment {

struct dma_buf *dmabuf;

struct device *dev;

struct list_head node;

void *priv;

+};

+/**

struct dma_buf_ops - operations possible on struct dma_buf

@create: creates a struct dma_buf of a fixed size. Actual allocation

does not happen here.

The 'create' ops is not present in dma_buf_ops.

Yes, this is a copy-paste mistake; will correct.

...

...

@attach: allows different devices to 'attach' themselves to the given

buffer. It might return -EBUSY to signal that backing storage

is already allocated and incompatible with the requirements

of requesting device. [optional]

@detach: detach a given device from this buffer. [optional]

@get_scatterlist: returns list of scatter pages allocated, increases

usecount of the buffer. Requires atleast one attach

to be

called before. Returned sg list should already be

mapped

into _device_ address space.

You must add a comment that this call 'may sleep'.

Thanks, I will add this.

...

I like the get_scatterlist idea. It allows the exported to create a valid scatterlist for a client in a elegant way.

I do not like this whole attachment idea. The problem is that currently there is no support in DMA framework for allocation for multiple devices. As long as no such a support exists, there is no generic way to handle attribute negotiations and buffer allocations that involve multiple devices. So the exporter drivers would have to implement more or less hacky solutions to handle memory requirements and choosing the device that allocated memory.

You are right, there is currently no generic way for attribute negotiation for buffers of multiple devices. However, since buffer sharing idea itself is about allowing for buffer allocation and migration based on 'some' negotiation, the attachment idea makes it easier to do so.

What you choose to call 'more or less hacky solution', I would call 'platform-specific mechanism' for negotiation, allocation (and migration if possible and required). :)

...

Currently, AFAIK there is even no generic way for a driver to acquire its own DMA memory requirements.

Therefore all logic hidden beneath 'attachment' is pointless. I think that support for attach/detach (and related stuff) should be postponed until support for multi-device allocation is added to DMA framework.

I don't say the attachment list idea is wrong but adding attachment stuff creates an illusion that problem of multi-device allocations is somehow magically solved. We should not force the developers of exporter drivers to solve the problem that is not solvable yet.

I quite like what you said - the problem of having a 'generic' mechanism of DMA attribute negotiation might not have been solved yet, and so the exporter drivers shouldn't need to try to do that. However, exporter drivers would most likely know about the negotiation etc needed and possible on their given platform - and the attachment mechanism gives them a way to gather requirements from participating DMA users, and use it for such decisions.

...

The other problem are the APIs. For example, the V4L2 subsystem assumes that memory is allocated after successful VIDIOC_REQBUFS with V4L2_MEMORY_MMAP memory type. Therefore attach would be automatically followed by get_scatterlist, blocking possibility of any buffer migrations in future.

That is correct for V4L2_MEMORY_MMAP; should that need to be a requirement even for V4L2_MEMORY_DMABUF_USER memory type? [I am deliberately distinguishing a V4L2 driver which is a shared buffer exporter from a V4L2 shared-buffer-user driver] - could we not do a get_scatterlist on each VIDIOC_QBUF, and put_scatterlist on each VIDIOC_DQBUF?

...

The same situation happens if buffer sharing is added to framebuffer API.

The buffer sharing mechanism is dedicated to improve cooperation between multiple APIs. Therefore the common denominator strategy should be applied that is buffer-creation == buffer-allocation.

It might be prudent to do so in the media drivers. But the buffer sharing mechanism is also useful for other, non-media usecases? I would say that we should be able to leave enough flexibility.

...

...

@put_scatterlist: decreases usecount of buffer, might deallocate

scatter

pages.

@mmap: memory map this buffer - optional.

@release: release this buffer; to be called after the last

dma_buf_put.

@sync_sg_for_cpu: sync the sg list for cpu.

@sync_sg_for_device: synch the sg list for device.

*/

+struct dma_buf_ops {

int (*attach)(struct dma_buf *, struct device *,

struct dma_buf_attachment *);

void (*detach)(struct dma_buf *, struct dma_buf_attachment *);

/* For {get,put}_scatterlist below, any specific buffer attributes

* required should get added to device_dma_parameters accessible

* via dev->dma_params.

*/

struct scatterlist * (*get_scatterlist)(struct dma_buf_attachment

*,

enum dma_data_direction,

int *nents);

void (*put_scatterlist)(struct dma_buf_attachment *,

struct scatterlist *,

int nents);

/* TODO: Add interruptible and interruptible_timeout versions */

I don't agree the interruptible and interruptible_timeout versions are needed. I think that get_scatterlist should alway be interruptible. You can add try_get_scatterlist callback that returns ERR_PTR(-EBUSY) if the call would be blocking.

Sure, that's a good idea - I could do that.

...

...

/* allow mmap optionally for devices that need it */

int (*mmap)(struct dma_buf *, struct vm_area_struct *);

The mmap is not needed for inital version. It could be added at any time in the future. The dmabuf client should not be allowed to create mapping of the dmabuf from the scatterlist.

There's already a discussion between Rob and David on this mail-chain earlier; I guess we'll need to wait a little before deciding to remove this.

...

<snip>

...

I hope you find my comments useful.

Like I said Tomasz, I thank you for a very good review! Hope my replies satisfy some of the concerns you raised?

...

Yours sincerely, Tomasz Stanislawski

-- Thanks and regards, Sumit Semwal Linaro Kernel Engineer - Graphics working group Linaro.org │ Open source software for ARM SoCs

Rob Clark

3:34 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Fri, Oct 14, 2011 at 5:00 AM, Tomasz Stanislawski t.stanislaws@samsung.com wrote:

...

...

@attach: allows different devices to 'attach' themselves to the given

buffer. It might return -EBUSY to signal that backing storage

is already allocated and incompatible with the requirements

of requesting device. [optional]

@detach: detach a given device from this buffer. [optional]

@get_scatterlist: returns list of scatter pages allocated, increases

usecount of the buffer. Requires atleast one attach

to be

called before. Returned sg list should already be

mapped

into _device_ address space.

You must add a comment that this call 'may sleep'.

I like the get_scatterlist idea. It allows the exported to create a valid scatterlist for a client in a elegant way.

I do not like this whole attachment idea. The problem is that currently there is no support in DMA framework for allocation for multiple devices. As long as no such a support exists, there is no generic way to handle attribute negotiations and buffer allocations that involve multiple devices. So the exporter drivers would have to implement more or less hacky solutions to handle memory requirements and choosing the device that allocated memory.

Currently, AFAIK there is even no generic way for a driver to acquire its own DMA memory requirements.

dev->dma_params (struct device_dma_parameters).. for example

it would need to be expanded a bit to have a way to say "it needs to be physically contiguous"..

...

Therefore all logic hidden beneath 'attachment' is pointless. I think that support for attach/detach (and related stuff) should be postponed until support for multi-device allocation is added to DMA framework.

I don't say the attachment list idea is wrong but adding attachment stuff creates an illusion that problem of multi-device allocations is somehow magically solved. We should not force the developers of exporter drivers to solve the problem that is not solvable yet.

The other problem are the APIs. For example, the V4L2 subsystem assumes that memory is allocated after successful VIDIOC_REQBUFS with V4L2_MEMORY_MMAP memory type. Therefore attach would be automatically followed by get_scatterlist, blocking possibility of any buffer migrations in future.

But this problem really only applies if v4l is your buffer allocator. I don't think a v4l limitation is a valid argument to remove the attachment stuff.

...

The same situation happens if buffer sharing is added to framebuffer API.

The buffer sharing mechanism is dedicated to improve cooperation between multiple APIs. Therefore the common denominator strategy should be applied that is buffer-creation == buffer-allocation.

I think it would be sufficient if buffer creators that cannot defer the allocation just take a worst-case approach and allocate physically contiguous buffers. No need to penalize other potential buffer allocators. This allows buffer creators with more flexibility the option for deferring the allocation until it knows whether the buffer really needs to be contiguous.

...

...

@put_scatterlist: decreases usecount of buffer, might deallocate

scatter

pages.

@mmap: memory map this buffer - optional.

@release: release this buffer; to be called after the last

dma_buf_put.

@sync_sg_for_cpu: sync the sg list for cpu.

@sync_sg_for_device: synch the sg list for device.

*/

+struct dma_buf_ops {

int (*attach)(struct dma_buf *, struct device *,

struct dma_buf_attachment *);

void (*detach)(struct dma_buf *, struct dma_buf_attachment *);

/* For {get,put}_scatterlist below, any specific buffer attributes

* required should get added to device_dma_parameters accessible

* via dev->dma_params.

*/

struct scatterlist * (*get_scatterlist)(struct dma_buf_attachment

*,

enum dma_data_direction,

int *nents);

void (*put_scatterlist)(struct dma_buf_attachment *,

struct scatterlist *,

int nents);

/* TODO: Add interruptible and interruptible_timeout versions */

I don't agree the interruptible and interruptible_timeout versions are needed. I think that get_scatterlist should alway be interruptible. You can add try_get_scatterlist callback that returns ERR_PTR(-EBUSY) if the call would be blocking.

...

/* allow mmap optionally for devices that need it */

int (*mmap)(struct dma_buf *, struct vm_area_struct *);

The mmap is not needed for inital version. It could be added at any time in the future. The dmabuf client should not be allowed to create mapping of the dmabuf from the scatterlist.

fwiw, this wasn't intended for allowing the client to create the mapping.. the intention was that the buffer creator always be the one that implements the mmap'ing. This was just to implement fops->mmap() for the dmabuf handle, ie. so userspace could mmap the buffer without having to know *who* allocated it. Otherwise you have to also pass around the fd of the allocator and an offset.

BR, -R

Daniel Vetter

3:35 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Fri, Oct 14, 2011 at 12:00:58PM +0200, Tomasz Stanislawski wrote:

...

...
+/**
struct dma_buf_ops - operations possible on struct dma_buf

@create: creates a struct dma_buf of a fixed size. Actual allocation
   does not happen here.
The 'create' ops is not present in dma_buf_ops.

...
@attach: allows different devices to 'attach' themselves to the given
   buffer. It might return -EBUSY to signal that backing storage
   is already allocated and incompatible with the requirements
   of requesting device. [optional]
@detach: detach a given device from this buffer. [optional]

@get_scatterlist: returns list of scatter pages allocated, increases
     usecount of the buffer. Requires atleast one attach to be
     called before. Returned sg list should already be mapped
     into _device_ address space.
You must add a comment that this call 'may sleep'.

I like the get_scatterlist idea. It allows the exported to create a valid scatterlist for a client in a elegant way.

I do not like this whole attachment idea. The problem is that currently there is no support in DMA framework for allocation for multiple devices. As long as no such a support exists, there is no generic way to handle attribute negotiations and buffer allocations that involve multiple devices. So the exporter drivers would have to implement more or less hacky solutions to handle memory requirements and choosing the device that allocated memory.

Currently, AFAIK there is even no generic way for a driver to acquire its own DMA memory requirements.

Therefore all logic hidden beneath 'attachment' is pointless. I think that support for attach/detach (and related stuff) should be postponed until support for multi-device allocation is added to DMA framework.

Imo we clearly need this to make the multi-device-driver with insane dma requirements work on arm. And rewriting the buffer handling in participating subsystem twice isn't really a great plan. I envision that on platforms where we need this madness, the driver must call back to the dma subsytem to create a dma_buf. The dma subsytem should be already aware of all the requirements and hence should be able to handle them..

...

I don't say the attachment list idea is wrong but adding attachment stuff creates an illusion that problem of multi-device allocations is somehow magically solved. We should not force the developers of exporter drivers to solve the problem that is not solvable yet.

Well, this is why we need to create a decent support infrastructure for platforms (= arm madness) that needs this, so that device drivers and subsystem don't need to invent that wheel on their own. Which as you point out, they actually can't.

...

The other problem are the APIs. For example, the V4L2 subsystem assumes that memory is allocated after successful VIDIOC_REQBUFS with V4L2_MEMORY_MMAP memory type. Therefore attach would be automatically followed by get_scatterlist, blocking possibility of any buffer migrations in future.

Well, pardon to break the news, but v4l needs to rework the buffer handling. If you want to share buffers with a gpu driver, you _have_ to life with the fact that gpus do fully dynamic buffer management, meaning: - buffers get allocated and destroyed on the fly, meaning static reqbuf just went out the window (we obviously cache buffer objects and reuse them for performance, as long as the processing pipeline doesn't really change). - buffers get moved around in memory, meaning you either need full-blown sync-objects with a callback to drivers to tear-down mappings on-demand, or every driver needs to guarnatee to call put_scatterlist in a reasonable short time. The latter is probably the more natural thing for v4l devices.

...

The same situation happens if buffer sharing is added to framebuffer API.

You can fix that by using the gem/ttm infrastructure of drm (or whatever the blob gpu drivers are using). Which is why I think fb should just die, please.

...

The buffer sharing mechanism is dedicated to improve cooperation between multiple APIs. Therefore the common denominator strategy should be applied that is buffer-creation == buffer-allocation.

No.

Really, there's just no way gpu's will be moving back to static buffer management. And I know, for many use-cases we could get away with a bunch of static buffers (e.g. a video processing pipe). But in drm-land even scanout-buffers can get moved around - currently only when they're not being used, but strictly speaking nothing prevents us from copying the scanout to a new location and issueing a pageflip and so even move the buffer around even when it's in use.

But let's look quickly at an OpenCL usecase, moving Gb's of date per second around between the cpu and a bunch of add-on gpus (or other special purpose processing units). We'd also need to extend dma_buf with sync objects to make this work well, but there's simply no way this is gonna work with statically allocated objects.

Also, gem buffer objects that are currently unused can be swapped out.

Cheers, Daniel

-- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48

Marek Szyprowski

3 Nov 3 Nov

8:04 a.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

Hello,

I'm sorry for a late reply, but after Kernel Summit/ELC I have some comments.

On Friday, October 14, 2011 5:35 PM Daniel Vetter wrote:

...

On Fri, Oct 14, 2011 at 12:00:58PM +0200, Tomasz Stanislawski wrote:

...
...
+/**
struct dma_buf_ops - operations possible on struct dma_buf

@create: creates a struct dma_buf of a fixed size. Actual allocation
   does not happen here.
The 'create' ops is not present in dma_buf_ops.

...
@attach: allows different devices to 'attach' themselves to the given
   buffer. It might return -EBUSY to signal that backing storage
   is already allocated and incompatible with the requirements
   of requesting device. [optional]
@detach: detach a given device from this buffer. [optional]

@get_scatterlist: returns list of scatter pages allocated, increases
     usecount of the buffer. Requires atleast one attach to be
     called before. Returned sg list should already be mapped
     into _device_ address space.
You must add a comment that this call 'may sleep'.

I like the get_scatterlist idea. It allows the exported to create a valid scatterlist for a client in a elegant way.

I do not like this whole attachment idea. The problem is that currently there is no support in DMA framework for allocation for multiple devices. As long as no such a support exists, there is no generic way to handle attribute negotiations and buffer allocations that involve multiple devices. So the exporter drivers would have to implement more or less hacky solutions to handle memory requirements and choosing the device that allocated memory.

Currently, AFAIK there is even no generic way for a driver to acquire its own DMA memory requirements.

Therefore all logic hidden beneath 'attachment' is pointless. I think that support for attach/detach (and related stuff) should be postponed until support for multi-device allocation is added to DMA framework.
Imo we clearly need this to make the multi-device-driver with insane dma requirements work on arm. And rewriting the buffer handling in participating subsystem twice isn't really a great plan. I envision that on platforms where we need this madness, the driver must call back to the dma subsytem to create a dma_buf. The dma subsytem should be already aware of all the requirements and hence should be able to handle them..

...
I don't say the attachment list idea is wrong but adding attachment stuff creates an illusion that problem of multi-device allocations is somehow magically solved. We should not force the developers of exporter drivers to solve the problem that is not solvable yet.

Well, this is why we need to create a decent support infrastructure for platforms (= arm madness) that needs this, so that device drivers and subsystem don't need to invent that wheel on their own. Which as you point out, they actually can't.

The real question is whether it is possible to create any generic support infrastructure. I really doubt. IMHO this is something that will be hacked for each 'product release' and will never read the mainline...

...

...
The other problem are the APIs. For example, the V4L2 subsystem assumes that memory is allocated after successful VIDIOC_REQBUFS with V4L2_MEMORY_MMAP memory type. Therefore attach would be automatically followed by get_scatterlist, blocking possibility of any buffer migrations in future.

Well, pardon to break the news, but v4l needs to rework the buffer handling. If you want to share buffers with a gpu driver, you _have_ to life with the fact that gpus do fully dynamic buffer management, meaning:

buffers get allocated and destroyed on the fly, meaning static reqbuf just went out the window (we obviously cache buffer objects and reuse them for performance, as long as the processing pipeline doesn't really change).

buffers get moved around in memory, meaning you either need full-blown sync-objects with a callback to drivers to tear-down mappings on-demand, or every driver needs to guarnatee to call put_scatterlist in a reasonable short time. The latter is probably the more natural thing for v4l devices.

I'm really not convinced if it is possible to go for the completely dynamic buffer management, especially if we are implementing a proof-of-concept solution. Please notice the following facts:

1. all v4l2 drivers do the 'static' buffer management - memory is being allocated on REQBUF() call and then mapped permanently into both userspace and dma (io) address space.

2. dma-mapping api is very limited in the area of the dynamic buffer management, this API has been designed definitely for static buffer allocation and mapping.

It looks that fully dynamic buffer management requires a complete change of v4l2 api principles (V4L3?) and a completely new DMA API interface. That's probably the reason by none of the GPU driver relies on the DMA-mapping API and implements custom solution for managing the mappings.

This reminds me one more issue I've noticed in the current dma buf proof-of- concept. You assumed that the exporter will be responsible for mapping the buffer into io address space of all the client devices. What if the device needs additional custom hooks/hacks during the mappings? This will be a serious problem for the current GPU drivers for example. IMHO the API will be much clearer if each client driver will map the scatter list gathered from the dma buf by itself. Only the client driver has the complete knowledge how to do this correctly for this particular device. This way it will also work with devices that don't do the real DMA (like for example USB devices that copy all data from usb packets to the target buffer with the cpu).

Best regards

-- Marek Szyprowski Samsung Poland R&D Center

Clark, Rob

8 Nov 8 Nov

4:59 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Thu, Nov 3, 2011 at 3:04 AM, Marek Szyprowski m.szyprowski@samsung.com wrote:

...

Hello,

I'm sorry for a late reply, but after Kernel Summit/ELC I have some comments.

On Friday, October 14, 2011 5:35 PM Daniel Vetter wrote:

...
On Fri, Oct 14, 2011 at 12:00:58PM +0200, Tomasz Stanislawski wrote:

...
...
+/**

struct dma_buf_ops - operations possible on struct dma_buf

@create: creates a struct dma_buf of a fixed size. Actual allocation

does not happen here.

The 'create' ops is not present in dma_buf_ops.

...

@attach: allows different devices to 'attach' themselves to the given

buffer. It might return -EBUSY to signal that backing storage

is already allocated and incompatible with the requirements

of requesting device. [optional]

@detach: detach a given device from this buffer. [optional]

@get_scatterlist: returns list of scatter pages allocated, increases

usecount of the buffer. Requires atleast one attach to be

called before. Returned sg list should already be mapped

into _device_ address space.

You must add a comment that this call 'may sleep'.

I like the get_scatterlist idea. It allows the exported to create a valid scatterlist for a client in a elegant way.

I do not like this whole attachment idea. The problem is that currently there is no support in DMA framework for allocation for multiple devices. As long as no such a support exists, there is no generic way to handle attribute negotiations and buffer allocations that involve multiple devices. So the exporter drivers would have to implement more or less hacky solutions to handle memory requirements and choosing the device that allocated memory.

Currently, AFAIK there is even no generic way for a driver to acquire its own DMA memory requirements.

Therefore all logic hidden beneath 'attachment' is pointless. I think that support for attach/detach (and related stuff) should be postponed until support for multi-device allocation is added to DMA framework.

Imo we clearly need this to make the multi-device-driver with insane dma requirements work on arm. And rewriting the buffer handling in participating subsystem twice isn't really a great plan. I envision that on platforms where we need this madness, the driver must call back to the dma subsytem to create a dma_buf. The dma subsytem should be already aware of all the requirements and hence should be able to handle them..

...
I don't say the attachment list idea is wrong but adding attachment stuff creates an illusion that problem of multi-device allocations is somehow magically solved. We should not force the developers of exporter drivers to solve the problem that is not solvable yet.

Well, this is why we need to create a decent support infrastructure for platforms (= arm madness) that needs this, so that device drivers and subsystem don't need to invent that wheel on their own. Which as you point out, they actually can't.

The real question is whether it is possible to create any generic support infrastructure. I really doubt. IMHO this is something that will be hacked for each 'product release' and will never read the mainline...

...
...
The other problem are the APIs. For example, the V4L2 subsystem assumes that memory is allocated after successful VIDIOC_REQBUFS with V4L2_MEMORY_MMAP memory type. Therefore attach would be automatically followed by get_scatterlist, blocking possibility of any buffer migrations in future.

Well, pardon to break the news, but v4l needs to rework the buffer handling. If you want to share buffers with a gpu driver, you _have_ to life with the fact that gpus do fully dynamic buffer management, meaning:

buffers get allocated and destroyed on the fly, meaning static reqbuf

just went out the window (we obviously cache buffer objects and reuse them for performance, as long as the processing pipeline doesn't really change).

buffers get moved around in memory, meaning you either need full-blown

sync-objects with a callback to drivers to tear-down mappings on-demand, or every driver needs to guarnatee to call put_scatterlist in a reasonable short time. The latter is probably the more natural thing for v4l devices.

I'm really not convinced if it is possible to go for the completely dynamic buffer management, especially if we are implementing a proof-of-concept solution. Please notice the following facts:

all v4l2 drivers do the 'static' buffer management - memory is being

allocated on REQBUF() call and then mapped permanently into both userspace and dma (io) address space.

Is this strictly true if we are introducing a new 'enum v4l2_memory' for dmabuf's? Shouldn't that give us some flexibility, especially if the v4l2 device is only the importer, not the allocator, of the memory.

and a couple more notes: 1) for V4L2_MEMORY_DMABUF we don't want the importing v4l2 device to create a userspace memory mapping 2) there is nothing stopping a dmabuf exporter from being more strict about static allocation of buffers.. we just want the API to be flexible enough so that exporting devices which are more dynamic can continue to do so.

...

dma-mapping api is very limited in the area of the dynamic buffer management,

this API has been designed definitely for static buffer allocation and mapping.

It looks that fully dynamic buffer management requires a complete change of v4l2 api principles (V4L3?) and a completely new DMA API interface. That's probably the reason by none of the GPU driver relies on the DMA-mapping API and implements custom solution for managing the mappings.

This reminds me one more issue I've noticed in the current dma buf proof-of- concept. You assumed that the exporter will be responsible for mapping the buffer into io address space of all the client devices. What if the device needs additional custom hooks/hacks during the mappings? This will be a serious problem for the current GPU drivers for example. IMHO the API will be much clearer if each client driver will map the scatter list gathered from the dma buf by itself. Only the client driver has the complete knowledge how to do this correctly for this particular device. This way it will also work with devices that don't do the real DMA (like for example USB devices that copy all data from usb packets to the target buffer with the cpu).

The exporter doesn't map.. it returns a scatterlist to the importer. But the exporter does allocate and pin backing pages. And it is preferable if the exporter has the opportunity to wait until as much is known about the various importing devices to know if it must allocate contiguous pages, or pages in a certain range.

That said, on a platform where everything had iommu's or somehow didn't have any particular memory requirements, or where the exporter had the strictest requirements (or at least knew of the strictest requirements), then the exporter is free to allocate/pin the backing pages earlier, such as even before the buffer is exported.

BR, -R

...

Best regards

Marek Szyprowski Samsung Poland R&D Center

Daniel Vetter

5:42 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanismch

On Tue, Nov 08, 2011 at 10:59:56AM -0600, Clark, Rob wrote:

...

On Thu, Nov 3, 2011 at 3:04 AM, Marek Szyprowski

...

dma-mapping api is very limited in the area of the dynamic buffer management,

this API has been designed definitely for static buffer allocation and mapping.

It looks that fully dynamic buffer management requires a complete change of v4l2 api principles (V4L3?) and a completely new DMA API interface. That's probably the reason by none of the GPU driver relies on the DMA-mapping API and implements custom solution for managing the mappings.

This reminds me one more issue I've noticed in the current dma buf proof-of- concept. You assumed that the exporter will be responsible for mapping the buffer into io address space of all the client devices. What if the device needs additional custom hooks/hacks during the mappings? This will be a serious problem for the current GPU drivers for example. IMHO the API will be much clearer if each client driver will map the scatter list gathered from the dma buf by itself. Only the client driver has the complete knowledge how to do this correctly for this particular device. This way it will also work with devices that don't do the real DMA (like for example USB devices that copy all data from usb packets to the target buffer with the cpu).

The exporter doesn't map.. it returns a scatterlist to the importer. But the exporter does allocate and pin backing pages. And it is preferable if the exporter has the opportunity to wait until as much is known about the various importing devices to know if it must allocate contiguous pages, or pages in a certain range.

Actually I think the importer should get a _mapped_ scatterlist when it calls get_scatterlist. The simple reason is that for strange stuff like memory remapped into e.g. omaps TILER doesn't have any sensible notion of an address in physical memory. For the USB-example I think the right approach is to attach the usb hci to the dma_buf, after all that is the device that will read the data and move over the usb bus to the udl device. Similar for any other device that sits behind a bus that can't do dma (or it doesn't make sense to do dma).

Imo if there's a use-case where the client needs to frob the sg_list before calling dma_map_sg, we have an issue with the dma subsystem in general.

...

That said, on a platform where everything had iommu's or somehow didn't have any particular memory requirements, or where the exporter had the strictest requirements (or at least knew of the strictest requirements), then the exporter is free to allocate/pin the backing pages earlier, such as even before the buffer is exported.

Yeah, I think the important thing is that the dma_buf api should allow decent buffer management. If certain subsystems ignore that and just allocate up-front, no problem for me. But given how all graphics drivers for essentially all OS have moved to dynamic buffer management, I expect decoders, encoders, v4l devices and whatever else might sit in a graphics pipeline to follow.

Yours, Daniel

-- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48

Russell King - ARM Linux

5:55 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanismch

On Tue, Nov 08, 2011 at 06:42:27PM +0100, Daniel Vetter wrote:

...

Actually I think the importer should get a _mapped_ scatterlist when it calls get_scatterlist. The simple reason is that for strange stuff like memory remapped into e.g. omaps TILER doesn't have any sensible notion of an address in physical memory. For the USB-example I think the right approach is to attach the usb hci to the dma_buf, after all that is the device that will read the data and move over the usb bus to the udl device. Similar for any other device that sits behind a bus that can't do dma (or it doesn't make sense to do dma).

Imo if there's a use-case where the client needs to frob the sg_list before calling dma_map_sg, we have an issue with the dma subsystem in general.

Let's clear something up about the DMA API, which I think is causing some misunderstanding here. For this purpose, I'm going to talk about dma_map_single(), but the same applies to the scatterlist and _page variants as well.

dma = dma_map_single(dev, cpuaddr, size, dir);

dev := the device _performing_ the DMA operation. You are quite correct that in the case of a USB peripheral device, the device is normally the USB HCI device.

dma := dma address to be programmed into 'dev' which corresponds (by some means) with 'cpuaddr'. This may not be the physical address due to bus offset translations or mappings setup in IOMMUs.

Therefore, it is wrong to talk about a 'physical address' when talking about the DMA API.

We can take this one step further. Lets say that the USB HCI is not capable of performing memory accesses itself, but it is connected to a separate DMA engine device:

mem <---> dma engine <---> usb hci <---> usb peripheral

(such setups do exist, but despite having such implementations I've never tried to support it.)

In this case, the dma engine, in response to control signals from the USB host controller, will generate the appropriate bus address to access memory and transfer the data into the USB HCI device.

So, in this case, the struct device to be used for mapping memory for transfers to the usb peripheral is the DMA engine device, not the USB HCI device nor the USB peripheral device.

Daniel Vetter

6:43 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanismch

On Tue, Nov 08, 2011 at 05:55:17PM +0000, Russell King - ARM Linux wrote:

...

On Tue, Nov 08, 2011 at 06:42:27PM +0100, Daniel Vetter wrote:

...
Actually I think the importer should get a _mapped_ scatterlist when it calls get_scatterlist. The simple reason is that for strange stuff like memory remapped into e.g. omaps TILER doesn't have any sensible notion of an address in physical memory. For the USB-example I think the right approach is to attach the usb hci to the dma_buf, after all that is the device that will read the data and move over the usb bus to the udl device. Similar for any other device that sits behind a bus that can't do dma (or it doesn't make sense to do dma).

Imo if there's a use-case where the client needs to frob the sg_list before calling dma_map_sg, we have an issue with the dma subsystem in general.

Let's clear something up about the DMA API, which I think is causing some misunderstanding here. For this purpose, I'm going to talk about dma_map_single(), but the same applies to the scatterlist and _page variants as well.

dma = dma_map_single(dev, cpuaddr, size, dir);

dev := the device _performing_ the DMA operation. You are quite correct that in the case of a USB peripheral device, the device is normally the USB HCI device.

dma := dma address to be programmed into 'dev' which corresponds (by some means) with 'cpuaddr'. This may not be the physical address due to bus offset translations or mappings setup in IOMMUs.

Therefore, it is wrong to talk about a 'physical address' when talking about the DMA API.

We can take this one step further. Lets say that the USB HCI is not capable of performing memory accesses itself, but it is connected to a separate DMA engine device:

mem <---> dma engine <---> usb hci <---> usb peripheral

(such setups do exist, but despite having such implementations I've never tried to support it.)

In this case, the dma engine, in response to control signals from the USB host controller, will generate the appropriate bus address to access memory and transfer the data into the USB HCI device.

So, in this case, the struct device to be used for mapping memory for transfers to the usb peripheral is the DMA engine device, not the USB HCI device nor the USB peripheral device.

Thanks for the clarification. I think this is another reason why get_scatterlist should return the sg_list already mapped into the device address space - it's more consisten with the other dma apis. Another reason to completely hide everything but mapped addresses is crazy stuff like this

mem <---> tiling iommu <-+-> gpu | +-> scanout engine | +-> mpeg decoder

where it doesn't really make sense to talk about the memory backing the dma buffer because that's smeared all over the place due to tiling. IIRC for the case of omap these devices can also access memory through other paths and iommut that don't tile (but just remap like a normal iommu)

-Daniel

-- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48

Marek Szyprowski

28 Nov 28 Nov

7:47 a.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanismch

Hello,

I'm sorry for the late reply, I must have missed this mail...

On Tuesday, November 08, 2011 7:43 PM Daniel Vetter wrote:

...

On Tue, Nov 08, 2011 at 05:55:17PM +0000, Russell King - ARM Linux wrote:

...
On Tue, Nov 08, 2011 at 06:42:27PM +0100, Daniel Vetter wrote:

...
Actually I think the importer should get a _mapped_ scatterlist when it calls get_scatterlist. The simple reason is that for strange stuff like memory remapped into e.g. omaps TILER doesn't have any sensible notion of an address in physical memory. For the USB-example I think the right approach is to attach the usb hci to the dma_buf, after all that is the device that will read the data and move over the usb bus to the udl device. Similar for any other device that sits behind a bus that can't do dma (or it doesn't make sense to do dma).

Imo if there's a use-case where the client needs to frob the sg_list before calling dma_map_sg, we have an issue with the dma subsystem in general.

Let's clear something up about the DMA API, which I think is causing some misunderstanding here. For this purpose, I'm going to talk about dma_map_single(), but the same applies to the scatterlist and _page variants as well.

dma = dma_map_single(dev, cpuaddr, size, dir);

dev := the device _performing_ the DMA operation. You are quite correct that in the case of a USB peripheral device, the device is normally the USB HCI device.

dma := dma address to be programmed into 'dev' which corresponds (by some means) with 'cpuaddr'. This may not be the physical address due to bus offset translations or mappings setup in IOMMUs.

Therefore, it is wrong to talk about a 'physical address' when talking about the DMA API.

We can take this one step further. Lets say that the USB HCI is not capable of performing memory accesses itself, but it is connected to a separate DMA engine device:

mem <---> dma engine <---> usb hci <---> usb peripheral

(such setups do exist, but despite having such implementations I've never tried to support it.)

In this case, the dma engine, in response to control signals from the USB host controller, will generate the appropriate bus address to access memory and transfer the data into the USB HCI device.

So, in this case, the struct device to be used for mapping memory for transfers to the usb peripheral is the DMA engine device, not the USB HCI device nor the USB peripheral device.

Thanks for the clarification. I think this is another reason why get_scatterlist should return the sg_list already mapped into the device address space - it's more consisten with the other dma apis. Another reason to completely hide everything but mapped addresses is crazy stuff like this

mem <---> tiling iommu <-+-> gpu | +-> scanout engine | +-> mpeg decoder

where it doesn't really make sense to talk about the memory backing the dma buffer because that's smeared all over the place due to tiling. IIRC for the case of omap these devices can also access memory through other paths and iommut that don't tile (but just remap like a normal iommu)

I really don't get why you want to force the exporter to map the buffer into clients dma address space. Only the client device might know all the quirks required to do this correctly. Exporter should only provide a scatter-list with the memory that belongs to the exported buffer (might be pinned). How do you want to solve the following case - the gpu hardware from your diagram and a simple usb webcam with generic driver. The application would like to export a buffer from the webcam to scanout engine. How the generic webcam driver might know HOW to set up the tiller to create correct mappings for the GPU/scanout? IMHO only a GPU driver is capable of doing that assuming it got just a scatter list from the webcam driver.

Best regards

-- Marek Szyprowski Samsung Poland R&D Center

Daniel Vetter

10:34 a.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanismch

On Mon, Nov 28, 2011 at 08:47:31AM +0100, Marek Szyprowski wrote:

...

On Tuesday, November 08, 2011 7:43 PM Daniel Vetter wrote:

...
Thanks for the clarification. I think this is another reason why get_scatterlist should return the sg_list already mapped into the device address space - it's more consisten with the other dma apis. Another reason to completely hide everything but mapped addresses is crazy stuff like this

mem <---> tiling iommu <-+-> gpu | +-> scanout engine | +-> mpeg decoder

where it doesn't really make sense to talk about the memory backing the dma buffer because that's smeared all over the place due to tiling. IIRC for the case of omap these devices can also access memory through other paths and iommut that don't tile (but just remap like a normal iommu)

I really don't get why you want to force the exporter to map the buffer into clients dma address space. Only the client device might know all the quirks required to do this correctly. Exporter should only provide a scatter-list with the memory that belongs to the exported buffer (might be pinned). How do you want to solve the following case - the gpu hardware from your diagram and a simple usb webcam with generic driver. The application would like to export a buffer from the webcam to scanout engine. How the generic webcam driver might know HOW to set up the tiller to create correct mappings for the GPU/scanout? IMHO only a GPU driver is capable of doing that assuming it got just a scatter list from the webcam driver.

You're correct that only the gpu knows how to put things into the tiler (and maybe other devices that have access to it). Let me expand my diagram so that you're webcam fits into the picture.

A few notes: - might not be exactly how omap really looks like - the devices behind tiler have different device address space windows to access the different paths to main memory. No other device can access the tiler, iirc. - your webcam doesn't exist on this because we can't dma from it's memory, we can only zero-copy from the memory the usb hci transferred the frame to.

Now when when e.g. the scanout engine calls get_scatterlist you only call dma_map_sg (which does nothing, because there's no iommu that's managed by the core kernel code for it). The scanout engine will then complain that your stuff is not contiguous and bail out. Or it is indeed contiguous and things Just Work.

The much more interesting case is when the mpeg decoder and the gpu share a buffer (think video on rotating cube or whatever other gui transition you fancy). Then the omap tiler code can check whether the the device sits behind the tiler (e.g. with some omap-specific device tree attribute) and hand out a linear view to a tiled buffer.

In other words, whereever you're currently calling one of the map/unmap dma api variants, you would call get/put_scatterlist (or better the new name I'm proposing). So I also don't see your argument about only the client knows how to map something into address space.

Yours, Daniel

-- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48

Dave Airlie

25 Nov 25 Nov

2:13 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Tue, Oct 11, 2011 at 10:23 AM, Sumit Semwal sumit.semwal@ti.com wrote:

...

This is the first step in defining a dma buffer sharing mechanism.

A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices.

The framework allows:

a new buffer-object to be created with fixed size.

different devices to 'attach' themselves to this buffer, to facilitate

backing storage negotiation, using dma_buf_attach() API.

association of a file pointer with each user-buffer and associated

allocator-defined operations on that buffer. This operation is called the 'export' operation.

this exported buffer-object to be shared with the other entity by asking for

its 'file-descriptor (fd)', and sharing the fd across.

a received fd to get the buffer object back, where it can be accessed using

the associated exporter-defined operations.

the exporter and user to share the scatterlist using get_scatterlist and

put_scatterlist operations.

Atleast one 'attach()' call is required to be made prior to calling the get_scatterlist() operation.

Couple of building blocks in get_scatterlist() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter.

mmap() file operation is provided for the associated 'fd', as wrapper over the optional allocator defined mmap(), to be used by devices that might need one.

More details are there in the documentation patch.

Some questions, I've started playing around with using this framework to do buffer sharing between DRM devices,

Why struct scatterlist and not struct sg_table? it seems like I really want to use an sg_table,

I'm not convinced fd's are really useful over just some idr allocated handle, so far I'm just returning the "fd" to userspace as a handle, and passing it back in the other side, so I'm not really sure what an fd wins us here, apart from the mmap thing which I think shouldn't be here anyways. (if fd's do win us more we should probably record that in the docs patch).

Dave.

Daniel Vetter

4:02 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Fri, Nov 25, 2011 at 02:13:22PM +0000, Dave Airlie wrote:

...

On Tue, Oct 11, 2011 at 10:23 AM, Sumit Semwal sumit.semwal@ti.com wrote:

...
This is the first step in defining a dma buffer sharing mechanism.

A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices.

The framework allows:

a new buffer-object to be created with fixed size.

different devices to 'attach' themselves to this buffer, to facilitate

backing storage negotiation, using dma_buf_attach() API.

association of a file pointer with each user-buffer and associated

allocator-defined operations on that buffer. This operation is called the 'export' operation.

this exported buffer-object to be shared with the other entity by asking for

its 'file-descriptor (fd)', and sharing the fd across.

a received fd to get the buffer object back, where it can be accessed using

the associated exporter-defined operations.

the exporter and user to share the scatterlist using get_scatterlist and

put_scatterlist operations.

Atleast one 'attach()' call is required to be made prior to calling the get_scatterlist() operation.

Couple of building blocks in get_scatterlist() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter.

mmap() file operation is provided for the associated 'fd', as wrapper over the optional allocator defined mmap(), to be used by devices that might need one.

More details are there in the documentation patch.

Some questions, I've started playing around with using this framework to do buffer sharing between DRM devices,

Why struct scatterlist and not struct sg_table? it seems like I really want to use an sg_table,

No reason at all besides that intel-gtt is using scatterlist internally (and only kludges the sg_table together in an ad-hoc fashion) and so I haven't noticed. sg_table for more consistency with the dma api sounds good.

...

I'm not convinced fd's are really useful over just some idr allocated handle, so far I'm just returning the "fd" to userspace as a handle, and passing it back in the other side, so I'm not really sure what an fd wins us here, apart from the mmap thing which I think shouldn't be here anyways. (if fd's do win us more we should probably record that in the docs patch).

Imo fds are nice because their known and there's already all the preexisting infrastructure for them around. And if we ever get fancy with e.g. sync objects we can easily add poll support (or some insane ioctls). But I agree that "we can mmap" is bust as a reason and should just die. -Daniel

-- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48

Dave Airlie

4:15 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

...

+struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,

struct device *dev)

+{

struct dma_buf_attachment *attach;

int ret;

BUG_ON(!dmabuf || !dev);

mutex_lock(&dmabuf->lock);

attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);

if (attach == NULL)

goto err_alloc;

attach->dev = dev;

if (dmabuf->ops->attach) {

ret = dmabuf->ops->attach(dmabuf, dev, attach);

if (!ret)

goto err_attach;

}

list_add(&attach->node, &dmabuf->attachments);

I would assume at some point this needed at attach->dmabuf = dmabuf; added.

Dave.

Dave Airlie

4:28 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

I've rebuilt my PRIME interface on top of dmabuf to see how it would work,

I've got primed gears running again on top, but I expect all my object lifetime and memory ownership rules need fixing up (i.e. leaks like a sieve).

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

has the i915/nouveau patches for the kernel to produce the prime interface.

Dave.

Daniel Vetter

26 Nov 26 Nov

2 p.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Fri, Nov 25, 2011 at 17:28, Dave Airlie airlied@gmail.com wrote:

...

I've rebuilt my PRIME interface on top of dmabuf to see how it would work,

I've got primed gears running again on top, but I expect all my object lifetime and memory ownership rules need fixing up (i.e. leaks like a sieve).

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

has the i915/nouveau patches for the kernel to produce the prime interface.

I've noticed that your implementations for get_scatterlist (at least for the i915 driver) doesn't return the sg table mapped into the device address space. I've checked and the documentation makes it clear that this should be the case (and we really need this to support certain insane hw), but the get/put_scatterlist names are a bit misleading. Proposal:

- use struct sg_table instead of scatterlist like you've already done in you branch. Simply more consistent with the dma api.

- rename get/put_scatterlist into map/unmap for consistency with all the map/unmap dma api functions. The attachement would then serve as the abstract cookie to the backing storage, similar to how struct page * works as an abstract cookie for dma_map/unmap_page. The only special thing is that struct device * parameter because that's already part of the attachment.

- add new wrapper functions dma_buf_map_attachment and dma_buf_unmap_attachement to hide all the pointer/vtable-chasing that we currently expose to users of this interface.

Comments?

Cheers, Daniel

-- Daniel Vetter daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch

Rob Clark

27 Nov 27 Nov

6:59 a.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

On Sat, Nov 26, 2011 at 8:00 AM, Daniel Vetter daniel@ffwll.ch wrote:

...

On Fri, Nov 25, 2011 at 17:28, Dave Airlie airlied@gmail.com wrote:

...
I've rebuilt my PRIME interface on top of dmabuf to see how it would work,

I've got primed gears running again on top, but I expect all my object lifetime and memory ownership rules need fixing up (i.e. leaks like a sieve).

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

has the i915/nouveau patches for the kernel to produce the prime interface.

I've noticed that your implementations for get_scatterlist (at least for the i915 driver) doesn't return the sg table mapped into the device address space. I've checked and the documentation makes it clear that this should be the case (and we really need this to support certain insane hw), but the get/put_scatterlist names are a bit misleading. Proposal:

use struct sg_table instead of scatterlist like you've already done

in you branch. Simply more consistent with the dma api.

yup

...

rename get/put_scatterlist into map/unmap for consistency with all

the map/unmap dma api functions. The attachement would then serve as the abstract cookie to the backing storage, similar to how struct page

works as an abstract cookie for dma_map/unmap_page. The only special

thing is that struct device * parameter because that's already part of the attachment.

yup

...

add new wrapper functions dma_buf_map_attachment and

dma_buf_unmap_attachement to hide all the pointer/vtable-chasing that we currently expose to users of this interface.

I thought that was one of the earlier comments on the initial dmabuf patch, but either way: yup

BR, -R

...

Comments?

Cheers, Daniel

Daniel Vetter daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

Semwal, Sumit

1 Dec 1 Dec

5:51 a.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

Hi Dave, Daniel, Rob,

On Sun, Nov 27, 2011 at 12:29 PM, Rob Clark robdclark@gmail.com wrote:

...

On Sat, Nov 26, 2011 at 8:00 AM, Daniel Vetter daniel@ffwll.ch wrote:

...
On Fri, Nov 25, 2011 at 17:28, Dave Airlie airlied@gmail.com wrote:

...
I've rebuilt my PRIME interface on top of dmabuf to see how it would

work,

...
...
I've got primed gears running again on top, but I expect all my object lifetime and memory ownership rules need fixing up (i.e. leaks like a sieve).

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

has the i915/nouveau patches for the kernel to produce the prime

interface.

...
I've noticed that your implementations for get_scatterlist (at least for the i915 driver) doesn't return the sg table mapped into the device address space. I've checked and the documentation makes it clear that this should be the case (and we really need this to support certain insane hw), but the get/put_scatterlist names are a bit misleading. Proposal:

use struct sg_table instead of scatterlist like you've already done

in you branch. Simply more consistent with the dma api.

yup

...

rename get/put_scatterlist into map/unmap for consistency with all

the map/unmap dma api functions. The attachement would then serve as the abstract cookie to the backing storage, similar to how struct page

works as an abstract cookie for dma_map/unmap_page. The only special

thing is that struct device * parameter because that's already part of the attachment.

yup

...

add new wrapper functions dma_buf_map_attachment and

dma_buf_unmap_attachement to hide all the pointer/vtable-chasing that we currently expose to users of this interface.

I thought that was one of the earlier comments on the initial dmabuf patch, but either way: yup

Thanks for your comments; I will incorporate all of these in the next version I'll send out.

...

BR, -R

BR, Sumit.

...

...
Comments?

Cheers, Daniel

Daniel Vetter daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

Semwal, Sumit

5:55 a.m.

New subject: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

Hi Dave, Daniel, Rob,

...

On Sun, Nov 27, 2011 at 12:29 PM, Rob Clark robdclark@gmail.com wrote:

...
On Sat, Nov 26, 2011 at 8:00 AM, Daniel Vetter daniel@ffwll.ch wrote:

...
On Fri, Nov 25, 2011 at 17:28, Dave Airlie airlied@gmail.com wrote:

...
I've rebuilt my PRIME interface on top of dmabuf to see how it would work,

I've got primed gears running again on top, but I expect all my object lifetime and memory ownership rules need fixing up (i.e. leaks like a sieve).

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

has the i915/nouveau patches for the kernel to produce the prime interface.

I've noticed that your implementations for get_scatterlist (at least for the i915 driver) doesn't return the sg table mapped into the device address space. I've checked and the documentation makes it clear that this should be the case (and we really need this to support certain insane hw), but the get/put_scatterlist names are a bit misleading. Proposal:

use struct sg_table instead of scatterlist like you've already done

in you branch. Simply more consistent with the dma api.

yup

...

rename get/put_scatterlist into map/unmap for consistency with all

the map/unmap dma api functions. The attachement would then serve as the abstract cookie to the backing storage, similar to how struct page

works as an abstract cookie for dma_map/unmap_page. The only special

thing is that struct device * parameter because that's already part of the attachment.

yup

...

add new wrapper functions dma_buf_map_attachment and

dma_buf_unmap_attachement to hide all the pointer/vtable-chasing that we currently expose to users of this interface.

I thought that was one of the earlier comments on the initial dmabuf patch, but either way: yup

Thanks for your comments; I will incorporate all of these in the next version I'll send out.

...

...
BR, -R

BR, Sumit.

...

...
...
Comments?

Cheers, Daniel

Daniel Vetter daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

Sumit Semwal

11 Oct 11 Oct

9:23 a.m.

New subject: [RFC 2/2] dma-buf: Documentation for buffer sharing framework

Add documentation for dma buffer sharing framework, explaining the various operations, members and API of the dma buffer sharing framework.

Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Sumit Semwal sumit.semwal@ti.com --- Documentation/dma-buf-sharing.txt | 210 +++++++++++++++++++++++++++++++++++++ 1 files changed, 210 insertions(+), 0 deletions(-) create mode 100644 Documentation/dma-buf-sharing.txt

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt new file mode 100644 index 0000000..4da6644 --- /dev/null +++ b/Documentation/dma-buf-sharing.txt @@ -0,0 +1,210 @@ + DMA Buffer Sharing API Guide + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Sumit Semwal + <sumit dot semwal at linaro dot org> + <sumit dot semwal at ti dot com> + +This document serves as a guide to device-driver writers on what is the dma-buf +buffer sharing API, how to use it for exporting and using shared buffers. + +Any device driver which wishes to be a part of dma buffer sharing, can do so as +either the 'exporter' of buffers, or the 'user' of buffers. + +Say a driver A wants to use buffers created by driver B, then we call B as the +exporter, and B as buffer-user. + +The exporter +- implements and manages operations[1] for the buffer +- allows other users to share the buffer by using dma_buf sharing APIs, +- manages the details of buffer allocation, +- decides about the actual backing storage where this allocation happens, +- takes care of any migration of scatterlist - for all (shared) users of this + buffer, +- optionally, provides mmap capability for drivers that need it. + +The buffer-user +- is one of (many) sharing users of the buffer. +- doesn't need to worry about how the buffer is allocated, or where. +- needs a mechanism to get access to the scatterlist that makes up this buffer + in memory, mapped into its own address space, so it can access the same area + of memory. + + +The dma_buf buffer sharing API usage contains the following steps: + +1. Exporter announces that it wishes to export a buffer +2. Userspace gets the file descriptor associated with the exported buffer, and + passes it around to potential buffer-users based on use case +3. Each buffer-user 'connects' itself to the buffer +4. When needed, buffer-user requests access to the buffer from exporter +5. When finished with its use, the buffer-user notifies end-of-dma to exporter +6. when buffer-user is done using this buffer completely, it 'disconnects' + itself from the buffer. + + +1. Exporter's announcement of buffer export + + The buffer exporter announces its wish to export a buffer. In this, it + connects its own private buffer data, provides implementation for operations + that can be performed on the exported dma_buf, and flags for the file + associated with this buffer. + + Interface: + struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops, + int flags) + + If this succeeds, dma_buf_export allocates a dma_buf structure, and returns a + pointer to the same. It also associates an anon file with this buffer, so it + can be exported. On failure to allocate the dma_buf object, it returns NULL. + +2. Userspace gets a handle to pass around to potential buffer-users + + Userspace entity requests for a file-descriptor (fd) which is a handle to the + anon file associated with the buffer. It can then share the fd with other + drivers and/or processes. + + Interface: + int dma_buf_fd(struct dma_buf *dmabuf) + + This API installs an fd for the anon file associated with this buffer; + returns either 'fd', or error. + +3. Each buffer-user 'connects' itself to the buffer + + Each buffer-user now gets a reference to the buffer, using the fd passed to + it. + + Interface: + struct dma_buf *dma_buf_get(int fd) + + This API will return a reference to the dma_buf, and increment refcount for + it. + + After this, the buffer-user needs to attach its device with the buffer, which + helps the exporter to know of device buffer constraints. + + Interface: + struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, + struct device *dev) + + This API returns reference to an attachment structure, which is then used + for scatterlist operations. It will optionally call the 'attach' dma_buf + operation, if provided by the exporter. + + The dma-buf sharing framework does the book-keeping bits related to keeping + the list of all attachments to a buffer. + +Till this stage, the buffer-exporter has the option to choose not to actually +allocate the backing storage for this buffer, but wait for the first buffer-user +to request use of buffer for allocation. + + +4. When needed, buffer-user requests access to the buffer + + Whenever a buffer-user wants to use the buffer for any dma, it asks for + access to the buffer using dma_buf->ops->get_scatterlist operation. Atleast + one attach to the buffer should have happened before get_scatterlist can be + called. + + Interface: [member of struct dma_buf_ops] + struct scatterlist * (*get_scatterlist)(struct dma_buf_attachment *, + enum dma_data_direction, + int* nents); + + It is one of the buffer operations that must be implemented by the exporter. + It should return the scatterlist for this buffer, mapped into caller's address + space. + + If this is being called for the first time, the exporter can now choose to + scan through the list of attachments for this buffer, collate the requirements + of the attached devices, and choose an appropriate backing storage for the + buffer. + + Based on enum dma_data_direction, it might be possible to have multiple users + accessing at the same time (for reading, maybe), or any other kind of sharing + that the exporter might wish to make available to buffer-users. + + +5. When finished, the buffer-user notifies end-of-dma to exporter + + Once the dma for the current buffer-user is over, it signals 'end-of-dma' to + the exporter using the dma_buf->ops->put_scatterlist() operation. + + Interface: + void (*put_scatterlist)(struct dma_buf_attachment *, struct scatterlist *, + int nents); + + put_scatterlist signifies the end-of-dma for the attachment provided. + + +6. when buffer-user is done using this buffer, it 'disconnects' itself from the + buffer. + + After the buffer-user has no more interest in using this buffer, it should + disconnect itself from the buffer: + + - it first detaches itself from the buffer. + + Interface: + void dma_buf_detach(struct dma_buf *dmabuf, + struct dma_buf_attachment *dmabuf_attach); + + This API removes the attachment from the list in dmabuf, and optionally calls + dma_buf->ops->detach(), if provided by exporter, for any housekeeping bits. + + - Then, the buffer-user returns the buffer reference to exporter. + + Interface: + void dma_buf_put(struct dma_buf *dmabuf); + + This API then reduces the refcount for this buffer. + + If, as a result of this call, the refcount becomes 0, the 'release' file + operation related to this fd is called. It calls the dmabuf->ops->release() + operation in turn, and frees the memory allocated for dmabuf when exported. + +NOTES: +- Importance of attach-detach and {get,put}_scatterlist operation pairs + The attach-detach calls allow the exporter to figure out backing-storage + constraints for the currently-interested devices. This allows preferential + allocation, and/or migration of pages across different types of storage + available, if possible. + + Bracketing of dma access with {get,put}_scatterlist operations is essential + to allow just-in-time backing of storage, and migration mid-way through a + use-case. + +- Migration of backing storage if needed + After + - atleast one get_scatterlist has happened, + - and the backing storage has been allocated for this buffer, + If another new buffer-user intends to attach itself to this buffer, it might + be allowed, if possible for the exporter. + + In case it is allowed by the exporter: + if the new buffer-user has stricter 'backing-storage constraints', and the + exporter can handle these constraints, the exporter can just stall on the + get_scatterlist till all outstanding access is completed (as signalled by + put_scatterlist). + Once all ongoing access is completed, the exporter could potentially move + the buffer to the stricter backing-storage, and then allow further + {get,put}_scatterlist operations from any buffer-user from the migrated + backing-storage. + + If the exporter cannot fulfill the backing-storage constraints of the new + buffer-user device as requested, dma_buf_attach() would return an error to + denote non-compatibility of the new buffer-sharing request with the current + buffer. + + If the exporter chooses not to allow an attach() operation once a + get_scatterlist has been called, it simply returns an error. + +- mmap file operation + An mmap() file operation is provided for the fd associated with the buffer. + If the exporter defines an mmap operation, the mmap() fop calls this to allow + mmap for devices that might need it; if not, it returns an error. + +References: +[1] struct dma_buf_ops in include/linux/dma-buf.h +[2] All interfaces mentioned above defined in include/linux/dma-buf.h

-- 1.7.4.1

Randy Dunlap

12 Oct 12 Oct

10:30 p.m.

New subject: [RFC 2/2] dma-buf: Documentation for buffer sharing framework

On 10/11/2011 02:23 AM, Sumit Semwal wrote:

...

Add documentation for dma buffer sharing framework, explaining the various operations, members and API of the dma buffer sharing framework.

Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Sumit Semwal sumit.semwal@ti.com

Documentation/dma-buf-sharing.txt | 210 +++++++++++++++++++++++++++++++++++++ 1 files changed, 210 insertions(+), 0 deletions(-) create mode 100644 Documentation/dma-buf-sharing.txt

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt new file mode 100644 index 0000000..4da6644 --- /dev/null +++ b/Documentation/dma-buf-sharing.txt @@ -0,0 +1,210 @@
               DMA Buffer Sharing API Guide
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                       Sumit Semwal
           <sumit dot semwal at linaro dot org>
            <sumit dot semwal at ti dot com>
+This document serves as a guide to device-driver writers on what is the dma-buf +buffer sharing API, how to use it for exporting and using shared buffers.

+Any device driver which wishes to be a part of dma buffer sharing, can do so as

Please use DMA instead of dma (except combinations like dma-buf are OK). [multiple]

...

+either the 'exporter' of buffers, or the 'user' of buffers.

+Say a driver A wants to use buffers created by driver B, then we call B as the +exporter, and B as buffer-user.

and A

...

+The exporter +- implements and manages operations[1] for the buffer +- allows other users to share the buffer by using dma_buf sharing APIs, +- manages the details of buffer allocation, +- decides about the actual backing storage where this allocation happens, +- takes care of any migration of scatterlist - for all (shared) users of this

buffer,

+- optionally, provides mmap capability for drivers that need it.

+The buffer-user +- is one of (many) sharing users of the buffer. +- doesn't need to worry about how the buffer is allocated, or where. +- needs a mechanism to get access to the scatterlist that makes up this buffer

in memory, mapped into its own address space, so it can access the same area

of memory.

+The dma_buf buffer sharing API usage contains the following steps:

+1. Exporter announces that it wishes to export a buffer +2. Userspace gets the file descriptor associated with the exported buffer, and

passes it around to potential buffer-users based on use case

+3. Each buffer-user 'connects' itself to the buffer +4. When needed, buffer-user requests access to the buffer from exporter +5. When finished with its use, the buffer-user notifies end-of-dma to exporter +6. when buffer-user is done using this buffer completely, it 'disconnects'

itself from the buffer.

+1. Exporter's announcement of buffer export
The buffer exporter announces its wish to export a buffer. In this, it

connects its own private buffer data, provides implementation for operations

that can be performed on the exported dma_buf, and flags for the file

associated with this buffer.

Interface:
 struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
                           int flags)
If this succeeds, dma_buf_export allocates a dma_buf structure, and returns a

pointer to the same. It also associates an anon file with this buffer, so it

s/anon/anonymous/ (multiple)

...

can be exported. On failure to allocate the dma_buf object, it returns NULL.

+2. Userspace gets a handle to pass around to potential buffer-users
Userspace entity requests for a file-descriptor (fd) which is a handle to the

anon file associated with the buffer. It can then share the fd with other

drivers and/or processes.

Interface:
 int dma_buf_fd(struct dma_buf *dmabuf)
This API installs an fd for the anon file associated with this buffer;

returns either 'fd', or error.
+3. Each buffer-user 'connects' itself to the buffer
Each buffer-user now gets a reference to the buffer, using the fd passed to

it.

Interface:
 struct dma_buf *dma_buf_get(int fd)
This API will return a reference to the dma_buf, and increment refcount for

it.

After this, the buffer-user needs to attach its device with the buffer, which

helps the exporter to know of device buffer constraints.

Interface:
 struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
                                           struct device *dev)
This API returns reference to an attachment structure, which is then used

for scatterlist operations. It will optionally call the 'attach' dma_buf

operation, if provided by the exporter.

The dma-buf sharing framework does the book-keeping bits related to keeping

bookkeeping

...

the list of all attachments to a buffer.

+Till this stage, the buffer-exporter has the option to choose not to actually

Until

...

+allocate the backing storage for this buffer, but wait for the first buffer-user +to request use of buffer for allocation.

+4. When needed, buffer-user requests access to the buffer

Whenever a buffer-user wants to use the buffer for any dma, it asks for

access to the buffer using dma_buf->ops->get_scatterlist operation. Atleast

At least

...

one attach to the buffer should have happened before get_scatterlist can be

called.

Interface: [member of struct dma_buf_ops]
 struct scatterlist * (*get_scatterlist)(struct dma_buf_attachment *,
                                           enum dma_data_direction,
                                           int* nents);
It is one of the buffer operations that must be implemented by the exporter.

It should return the scatterlist for this buffer, mapped into caller's address

space.

If this is being called for the first time, the exporter can now choose to

scan through the list of attachments for this buffer, collate the requirements

of the attached devices, and choose an appropriate backing storage for the

buffer.

Based on enum dma_data_direction, it might be possible to have multiple users

accessing at the same time (for reading, maybe), or any other kind of sharing

that the exporter might wish to make available to buffer-users.
+5. When finished, the buffer-user notifies end-of-dma to exporter
Once the dma for the current buffer-user is over, it signals 'end-of-dma' to

the exporter using the dma_buf->ops->put_scatterlist() operation.

Interface:
 void (*put_scatterlist)(struct dma_buf_attachment *, struct scatterlist *,
                         int nents);
put_scatterlist signifies the end-of-dma for the attachment provided.
+6. when buffer-user is done using this buffer, it 'disconnects' itself from the
buffer.

After the buffer-user has no more interest in using this buffer, it should

disconnect itself from the buffer:

it first detaches itself from the buffer.

Interface:
 void dma_buf_detach(struct dma_buf *dmabuf,
                     struct dma_buf_attachment *dmabuf_attach);
This API removes the attachment from the list in dmabuf, and optionally calls

dma_buf->ops->detach(), if provided by exporter, for any housekeeping bits.

Then, the buffer-user returns the buffer reference to exporter.

Interface:
void dma_buf_put(struct dma_buf *dmabuf);
This API then reduces the refcount for this buffer.

If, as a result of this call, the refcount becomes 0, the 'release' file

operation related to this fd is called. It calls the dmabuf->ops->release()

operation in turn, and frees the memory allocated for dmabuf when exported.
+NOTES: +- Importance of attach-detach and {get,put}_scatterlist operation pairs

The attach-detach calls allow the exporter to figure out backing-storage

constraints for the currently-interested devices. This allows preferential

allocation, and/or migration of pages across different types of storage

available, if possible.

Bracketing of dma access with {get,put}_scatterlist operations is essential

to allow just-in-time backing of storage, and migration mid-way through a

use-case.

+- Migration of backing storage if needed

After

atleast one get_scatterlist has happened,

at least

...

and the backing storage has been allocated for this buffer,

If another new buffer-user intends to attach itself to this buffer, it might

be allowed, if possible for the exporter.

In case it is allowed by the exporter:

if the new buffer-user has stricter 'backing-storage constraints', and the

exporter can handle these constraints, the exporter can just stall on the

get_scatterlist till all outstanding access is completed (as signalled by

until

...

put_scatterlist).

Once all ongoing access is completed, the exporter could potentially move

the buffer to the stricter backing-storage, and then allow further

{get,put}_scatterlist operations from any buffer-user from the migrated

backing-storage.

If the exporter cannot fulfill the backing-storage constraints of the new

buffer-user device as requested, dma_buf_attach() would return an error to

denote non-compatibility of the new buffer-sharing request with the current

buffer.

If the exporter chooses not to allow an attach() operation once a

get_scatterlist has been called, it simply returns an error.

+- mmap file operation

An mmap() file operation is provided for the fd associated with the buffer.

If the exporter defines an mmap operation, the mmap() fop calls this to allow

mmap for devices that might need it; if not, it returns an error.

+References: +[1] struct dma_buf_ops in include/linux/dma-buf.h +[2] All interfaces mentioned above defined in include/linux/dma-buf.h

-- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code ***

Semwal, Sumit

13 Oct 13 Oct

4:48 a.m.

New subject: [RFC 2/2] dma-buf: Documentation for buffer sharing framework

Hi Randy, On Thu, Oct 13, 2011 at 4:00 AM, Randy Dunlap rdunlap@xenotime.net wrote:

...

On 10/11/2011 02:23 AM, Sumit Semwal wrote:

...
Add documentation for dma buffer sharing framework, explaining the various operations, members and API of the dma buffer sharing framework.

Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Sumit Semwal sumit.semwal@ti.com

Documentation/dma-buf-sharing.txt | 210 +++++++++++++++++++++++++++++++++++++

<snip>

...

...

if the new buffer-user has stricter 'backing-storage constraints', and the

exporter can handle these constraints, the exporter can just stall on the

get_scatterlist till all outstanding access is completed (as signalled by

until

Thanks for your review; I will update all these in the next version.

...

...

put_scatterlist).

Once all ongoing access is completed, the exporter could potentially move

the buffer to the stricter backing-storage, and then allow further

{get,put}_scatterlist operations from any buffer-user from the migrated

backing-storage.

If the exporter cannot fulfill the backing-storage constraints of the new

buffer-user device as requested, dma_buf_attach() would return an error to

denote non-compatibility of the new buffer-sharing request with the current

buffer.

If the exporter chooses not to allow an attach() operation once a

get_scatterlist has been called, it simply returns an error.

+- mmap file operation

An mmap() file operation is provided for the fd associated with the buffer.

If the exporter defines an mmap operation, the mmap() fop calls this to allow

mmap for devices that might need it; if not, it returns an error.

+References: +[1] struct dma_buf_ops in include/linux/dma-buf.h +[2] All interfaces mentioned above defined in include/linux/dma-buf.h

-- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code ***

Best regards, ~Sumit.

5070

days inactive

5121

days old

linaro-mm-sig@lists.linaro.org

32 comments

participants

tags (0)

participants (12)

Clark, Rob
Daniel Vetter
Dave Airlie
Marek Szyprowski
Randy Dunlap
Rob Clark
Rob Clark
Russell King - ARM Linux
Semwal, Sumit
Sumit Semwal
Sumit Semwal
Tomasz Stanislawski